Using and Reusing Data: why data curation matters in the age of information abundance

From data explosion to data responsibility

Almost every part of academic and public life now produces data. Universities collect research datasets, governments release census and administrative records, and libraries continue to digitize collections that once lived only in boxes, cabinets, and reading rooms. Creating data, however, is only the beginning. Data becomes useful when someone else can understand it, trust it, preserve it, and use it without having to reconstruct the whole story from scratch.

In Library and Information Science (LIS), using data usually means applying it for the purpose for which it was first collected. Reusing data means taking an existing dataset and asking new questions of it, sometimes in a different field or with a different method. Pasquetto, Randles, and Borgman (2017) make an important point here: reuse is not just a technical matter of downloading a file. It depends on context, documentation, and interpretation. For LIS students and data professionals, that is where curation becomes practical rather than abstract.

Why reusing data matters

Economic value: getting more from research investment

Collecting data takes money, time, expertise, and often the trust of communities or research participants. When datasets are prepared well enough to be reused, institutions avoid repeating work that has already been done and get more value from public or grant-funded research. Tenopir et al. (2015) found that data sharing and reuse have become part of how many scientists understand the research lifecycle, even if everyday practice still varies across fields. Good reuse does not mean treating a dataset as free raw material. It means making sure the people who come after the original project can see how the data was produced, what it can support, and where its limits are.

Scientific value: reproducibility and new questions

Reusable data also supports more transparent research. If a dataset is well documented and accessible, other researchers can check findings, test methods, or apply new forms of analysis. Sometimes a dataset collected for one study can help answer a question the original team never had in mind. That possibility depends on the future user knowing enough about the data to avoid misreading it (Pasquetto et al., 2017).

Collaborative value: working across fields

Data rarely stays neatly inside one discipline. A public health dataset may interest sociologists, economists, geographers, or policy researchers. In that kind of environment, libraries, archives, and information centers do more than store files. They help people find datasets, judge whether those datasets are suitable, and understand the conditions under which reuse is allowed.

How curation makes reuse possible

The data lifecycle

The data lifecycle describes the stages data passes through, from creation to preservation and later reuse. The Digital Curation Centre (n.d.) describes curation as an ongoing process rather than a final step that happens after a project is finished. In practice, the lifecycle usually includes:

  • Creating or collecting data
  • Processing and analyzing it
  • Writing documentation and metadata
  • Storing and preserving files
  • Managing access, sharing, and reuse

Each stage affects what happens later. A file saved in an unstable format, a missing codebook, or an unclear rights statement can make an otherwise valuable dataset difficult or even impossible to reuse.

Preservation, quality, and honest documentation

Preservation is more than putting files in cloud storage. It includes protecting data from corruption, format obsolescence, unauthorized access, and loss of context. Quality also matters, but quality does not mean pretending that data is perfect. The Inter-university Consortium for Political and Social Research (ICPSR, n.d.) stresses the need for clear documentation, preparation, and confidentiality management. In practice, good curation tells future users what the data can do, what it cannot do, and how those judgments were made.
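One common way to guard against silent corruption is a fixity check: record a checksum when a file is deposited, then recompute it later and compare. The sketch below is a minimal illustration using Python's standard library. The function names `sha256_of` and `verify_fixity` are made up for this example, and SHA-256 is an assumption rather than something the sources cited here prescribe.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 65536) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        # Read fixed-size chunks so large datasets do not need to fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_fixity(path: Path, recorded_checksum: str) -> bool:
    """Compare the file's current checksum with the one recorded at deposit."""
    return sha256_of(path) == recorded_checksum
```

A mismatch does not say what went wrong, only that the file is no longer byte-for-byte what was deposited. It addresses corruption, not format obsolescence or loss of context, which still depend on documentation.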

Standards and frameworks for reuse

The FAIR principles

The FAIR principles are one of the most widely used frameworks in research data management. Wilkinson et al. (2016) introduced FAIR as a way to improve the value of digital research objects, and the UK Data Service (n.d.) presents the same principles as central to responsible data management. For LIS work, FAIR gives curators a practical checklist:
  • Findable data has clear descriptions, searchable records, and persistent identifiers.
  • Accessible data can be obtained through clearly described access conditions, even when restrictions apply.
  • Interoperable data uses formats, vocabularies, and structures that other systems and communities can work with.
  • Reusable data includes licenses, provenance, methods, and enough context for responsible secondary use.
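To make the checklist idea concrete, here is a rough sketch of how a curator might flag gaps in a dataset record. The field names (`identifier`, `access_conditions`, and so on) and the mapping from each principle to fields are assumptions chosen for illustration. They are not part of the FAIR principles themselves, and a real assessment involves judgment that a field check cannot capture.

```python
# Hypothetical mapping from each FAIR principle to record fields a curator
# might expect. The field names are illustrative, not a standard.
FAIR_CHECKS = {
    "findable": ["identifier", "title", "description"],
    "accessible": ["access_conditions"],
    "interoperable": ["file_format", "vocabulary"],
    "reusable": ["license", "provenance", "methods"],
}


def fair_gaps(record: dict) -> dict:
    """Return, per principle, the expected fields that are missing or blank."""
    gaps = {}
    for principle, fields in FAIR_CHECKS.items():
        missing = []
        for field in fields:
            value = record.get(field)
            if value is None or not str(value).strip():
                missing.append(field)
        if missing:
            gaps[principle] = missing
    return gaps


# A made-up record with no rights or provenance information.
example_record = {
    "identifier": "example-dataset-001",
    "title": "Household survey, wave 1",
    "description": "Anonymized responses from a regional household survey.",
    "access_conditions": "Registered users only",
    "file_format": "CSV",
    "vocabulary": "",
}

print(fair_gaps(example_record))
```

For this made-up record, the check reports a blank vocabulary under "interoperable" and the missing license, provenance, and methods under "reusable". Those are the gaps a curator would raise with the depositor before release.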

Metadata as the language of reuse

Metadata is often introduced as data about data, but that definition is too thin for reuse. Metadata is the information that lets someone decide whether a dataset is understandable, trustworthy, and appropriate for their purpose. The Dublin Core Metadata Initiative (2020) provides standard terms for describing digital resources, including title, creator, date, and format. DataCite extends this idea for research outputs by supporting dataset citation, persistent identification, and relationships among creators, funders, and outputs (DataCite Metadata Working Group, 2026).

At its best, metadata answers the questions a careful user would naturally ask: Who created this? How was it collected? What methods shaped it? What restrictions apply? How should it be cited?
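As a small illustration, the four Dublin Core terms named above can be written out as a simple XML record. This is a sketch only: it uses the Dublin Core elements namespace, the wrapper element `metadata` and the function name `dublin_core_record` are assumptions for the example, and the sample values are invented. A production record would follow whatever schema the repository requires.

```python
import xml.etree.ElementTree as ET

# Namespace of the Dublin Core Metadata Element Set, version 1.1.
DC_NS = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("dc", DC_NS)


def dublin_core_record(fields: dict) -> str:
    """Serialize a mapping of Dublin Core element names to values as XML."""
    root = ET.Element("metadata")  # wrapper element name is illustrative
    for name, value in fields.items():
        element = ET.SubElement(root, f"{{{DC_NS}}}{name}")
        element.text = value
    return ET.tostring(root, encoding="unicode")


# Invented sample values for a hypothetical dataset.
record_xml = dublin_core_record({
    "title": "Household survey, wave 1",
    "creator": "Example Research Group",
    "date": "2021-06-30",
    "format": "text/csv",
})
print(record_xml)
```

A record like this answers "who created this?" and "what format is it in?", but not "how was it collected?" or "what restrictions apply?". Those questions need richer description, which is part of why data-specific schemas exist alongside general-purpose terms.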

Barriers to reuse

Data friction and weak documentation

One of the biggest obstacles to reuse is data friction: the small and large difficulties that make data hard to move, interpret, or apply in a new context. Friction can come from incompatible file formats, missing metadata, unclear variable names, incomplete codebooks, or rights information that nobody can interpret with confidence. A dataset without documentation is a little like a library book with no title page, no catalog record, and no indication of where it came from.
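One form of friction, variables that appear in a file but not in its codebook, is easy to check for mechanically. The sketch below assumes the data is a CSV file with a header row and that the codebook is available as a simple mapping from variable name to description. Both are simplifying assumptions, and the function name `undocumented_columns` is made up for this example.

```python
import csv
import io


def undocumented_columns(csv_text: str, codebook: dict) -> list:
    """Return header names in the CSV that have no entry in the codebook."""
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader, [])  # empty input yields an empty header
    return [name for name in header if name not in codebook]


# Invented sample data: the variable "q7_b" is not described anywhere.
sample_csv = "id,age,q7_b\n1,34,2\n2,51,4\n"
sample_codebook = {
    "id": "Respondent identifier",
    "age": "Age in years at time of interview",
}

print(undocumented_columns(sample_csv, sample_codebook))
```

Here the check flags `q7_b`, a variable a future user could not interpret without tracking down the original questionnaire. A check like this finds missing entries only. Whether the existing descriptions are clear enough still takes a human reader.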

Ethics, privacy, and consent

Not every dataset should be openly reused. Human-subject data, health records, personal identifiers, and sensitive cultural knowledge all require careful handling. The World Health Organization (2022) argues for clear rules around the onward sharing of health-related data. For LIS professionals, responsible curation means balancing access with privacy, consent, and respect for the people represented in the data. Openness is useful only when it is also lawful and ethical.

The information professional's responsibility

Using and reusing data now sits at the center of research, accountability, and evidence-based decision making. But reuse does not happen just because a file is uploaded somewhere. It depends on curation, preservation, metadata, ethical review, and the steady application of frameworks such as FAIR.

Information professionals are no longer only custodians of books and archival materials. They also help maintain the digital knowledge infrastructure that makes future research possible. That work is practical, technical, and ethical all at once. The question worth asking is simple but demanding: how do we curate data today so that tomorrow's communities can still use it with confidence?

References

DataCite Metadata Working Group. (2026). DataCite metadata schema documentation for the publication and citation of research data and other research outputs, Version 4.7. DataCite e.V.

Digital Curation Centre. (n.d.). Curation lifecycle model.

Dublin Core Metadata Initiative. (2020). DCMI metadata terms.

Inter-university Consortium for Political and Social Research (ICPSR). (n.d.). Guide to social science data preparation and archiving.

Pasquetto, I. V., Randles, B. M., & Borgman, C. L. (2017). On the reuse of scientific data. Data Science Journal, 16, Article 8.

Tenopir, C., Dalton, E. D., Allard, S., Frame, M., Pjesivac, I., Birch, B., Pollock, D., & Dorsett, K. (2015). Changes in data sharing and data reuse practices and perceptions among scientists worldwide. PLOS ONE, 10(8), e0134826.

UK Data Service. (n.d.). Research data management.

Wilkinson, M. D., Dumontier, M., Aalbersberg, I. J., Appleton, G., Axton, M., Baak, A., Blomberg, N., et al. (2016). The FAIR Guiding Principles for scientific data management and stewardship. Scientific Data, 3, Article 160018.

World Health Organization. (2022). Sharing and reuse of health-related data for research purposes: WHO policy and implementation guidance.
