Using and Reusing Data: why data curation matters in the age of information abundance
From data explosion to data responsibility
In Library and Information Science (LIS), using data usually means applying it for the purpose for which it was first collected. Reusing data means taking an existing dataset and asking new questions of it, sometimes in a different field or with a different method. Pasquetto, Randles, and Borgman (2017) make an important point here: reuse is not just a technical matter of downloading a file. It depends on context, documentation, and interpretation. For LIS students and data professionals, that is where curation becomes practical rather than abstract.
Why reusing data matters
Scientific value: reproducibility and new questions
Collaborative value: working across fields
Data rarely stays neatly inside one discipline. A public health dataset may interest sociologists, economists, geographers, or policy researchers. In that kind of environment, libraries, archives, and information centers do more than store files. They help people find datasets, judge whether those datasets are suitable, and understand the conditions under which reuse is allowed.
How curation makes reuse possible
The data lifecycle
The data lifecycle describes the stages data passes through, from creation to preservation and later reuse. The Digital Curation Centre describes curation as an ongoing process rather than a final step that happens after a project is finished. In practice, the lifecycle usually includes:
- Creating or collecting data
- Processing and analyzing it
- Writing documentation and metadata
- Storing and preserving files
- Managing access, sharing, and reuse
Each stage affects what happens later. A file saved in an unstable format, a missing codebook, or an unclear rights statement can make an otherwise valuable dataset difficult or even impossible to reuse.
Preservation, quality, and honest documentation
Preservation is more than putting files in cloud storage. It includes protecting data from corruption, format obsolescence, unauthorized access, and loss of context. Quality also matters, but quality does not mean pretending that data is perfect. The Inter-university Consortium for Political and Social Research (ICPSR, n.d.) stresses the need for clear documentation, preparation, and confidentiality management. In practice, good curation tells future users what the data can do, what it cannot do, and how those judgments were made.
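One small, concrete piece of protecting data from corruption is a fixity check: record a checksum when a file is deposited, then recompute it later to confirm the file has not changed. The sketch below is a minimal illustration using Python's standard library. The function names `sha256_of` and `fixity_ok` are assumptions made for this example, not part of any repository's tooling.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 65536) -> str:
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        # Read fixed-size chunks so large files do not need to fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def fixity_ok(path: Path, recorded_digest: str) -> bool:
    """Return True if the file still matches the digest recorded at deposit."""
    return sha256_of(path) == recorded_digest.strip().lower()
```

A checksum only detects change. It does not address format obsolescence, access control, or loss of context, which still depend on documentation and policy.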
Standards and frameworks for reuse
The FAIR principles
FAIR stands for Findable, Accessible, Interoperable, and Reusable:
- Findable data has clear descriptions, searchable records, and persistent identifiers.
- Accessible data can be obtained through clearly described access conditions, even when restrictions apply.
- Interoperable data uses formats, vocabularies, and structures that other systems and communities can work with.
- Reusable data includes licenses, provenance, methods, and enough context for responsible secondary use.
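The four principles above can be turned into a rough self-check for a dataset record. The sketch below is only an illustration: the field names in `FAIR_FIELDS` (such as `access_conditions` and `vocabulary`) are hypothetical labels chosen for this example, not an official FAIR schema, and a filled-in field says nothing about whether its content is adequate.

```python
from typing import Dict, List

# Hypothetical mapping from each FAIR principle to the record fields
# that would support it. These names are assumptions for illustration.
FAIR_FIELDS: Dict[str, List[str]] = {
    "Findable": ["identifier", "description"],
    "Accessible": ["access_conditions"],
    "Interoperable": ["file_format", "vocabulary"],
    "Reusable": ["license", "provenance", "methods"],
}


def fair_gaps(record: Dict[str, str]) -> Dict[str, List[str]]:
    """Return, for each principle, the fields that are missing or blank."""
    gaps: Dict[str, List[str]] = {}
    for principle, fields in FAIR_FIELDS.items():
        missing = [f for f in fields if not str(record.get(f) or "").strip()]
        if missing:
            gaps[principle] = missing
    return gaps
```

Calling `fair_gaps` on a record that has an identifier, a description, a license, and a file format, but nothing else, would report gaps under Accessible, Interoperable, and Reusable. A curator could use that list as a prompt for follow-up questions to the data creator.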
Metadata as the language of reuse
Metadata is often introduced as data about data, but that definition is too thin for reuse. Metadata is the information that lets someone decide whether a dataset is understandable, trustworthy, and appropriate for their purpose. The Dublin Core Metadata Initiative (2020) provides standard terms for describing digital resources, including title, creator, date, and format. DataCite extends this idea for research outputs by supporting dataset citation, persistent identification, and relationships among creators, funders, and outputs (DataCite Metadata Working Group, 2026).
At its best, metadata answers the questions a careful user would naturally ask: Who created this? How was it collected? What methods shaped it? What restrictions apply? How should it be cited?
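To make this concrete, here is a minimal sketch of a Dublin Core style description serialized as XML with Python's standard library. The wrapper element `metadata` and the helper name `dublin_core_record` are assumptions for this example, and the sample values in the usage note are invented. The namespace is the Dublin Core Elements 1.1 namespace.

```python
import xml.etree.ElementTree as ET

# Dublin Core Metadata Element Set 1.1 namespace.
DC_NS = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("dc", DC_NS)


def dublin_core_record(fields: dict) -> str:
    """Serialize term/value pairs (e.g. title, creator) as dc: elements."""
    # "metadata" is a hypothetical wrapper element chosen for this sketch.
    root = ET.Element("metadata")
    for term, value in fields.items():
        element = ET.SubElement(root, f"{{{DC_NS}}}{term}")
        element.text = value
    return ET.tostring(root, encoding="unicode")
```

For example, `dublin_core_record({"title": "Example survey", "creator": "Example Lab", "date": "2020", "format": "text/csv"})` yields one `dc:` element per term. A record this small answers who created the data and in what format, but not how it was collected or what restrictions apply, which is why richer documentation is still needed.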
Barriers to reuse
Data friction and weak documentation
Ethics, privacy, and consent
Not every dataset should be openly reused. Human-subject data, health records, personal identifiers, and sensitive cultural knowledge all require careful handling. The World Health Organization (2022) argues for clear rules around the onward sharing of health-related data. For LIS professionals, responsible curation means balancing access with privacy, consent, and respect for the people represented in the data. Openness is useful only when it is also lawful and ethical.
The information professional's responsibility
References
Digital Curation Centre. (n.d.). Curation lifecycle model.