Data Curation: The Art of Selecting and Appraising Data

May 13, 2026

Here is the truth nobody likes to hear: not all data is worth saving. We live in a time that generates information at a breathtaking pace, yet the instinct to keep everything can do more harm than good. Without thoughtful selection, repositories become cluttered warehouses where valuable datasets get buried under mountains of noise. That is why selection and appraisal sit at the very heart of good data curation.

What Do Selection and Appraisal Actually Mean?

In the DCC Curation Lifecycle Model, appraisal is described as the process of evaluating data to determine what merits long-term curation and preservation. Selection is the decision that follows: which datasets stay, and which ones do not. Together, they act as a quality filter, ensuring that the data we invest in preserving is genuinely worth the effort.

Think of it like editing a book. A first draft has raw material, some brilliant and some not. An editor does not keep every sentence out of loyalty to the writer. They keep what serves the reader. Appraisal works the same way, except the reader is every future researcher who might depend on that data.

The DCC Curation Lifecycle Model: appraisal and selection are central activities in the cycle.

Why We Cannot Keep Everything

Storage is cheap, but curation is not. Creating proper metadata, ensuring file formats remain accessible, and maintaining documentation all cost time and expertise. The Digital Curation Centre makes a compelling point: reducing the quantity of data we maintain actually improves the quality of what we preserve. Less clutter means better metadata, more reliable verification, and stronger trust in the collection.

Deciding what not to keep is just as important as deciding what to keep. Good curation requires the courage to let go.

What Makes Data Worth Keeping?

There is no single checklist that fits every discipline, but common criteria tend to surface across the literature. Tallman and Work developed a practical framework rooted in traditional collection development principles, asking questions about a dataset's uniqueness, its potential for reuse, its technical sustainability, and its alignment with an institution's mission.

Cornell University Library, for instance, runs each submission through a curation review that includes file inventory, risk assessment, and appraisal before anything goes live. It is a careful, human process, not an automated sorting machine.
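To make the file inventory step a little more concrete, here is a minimal sketch in Python. It is not Cornell's actual tooling, which the source does not describe. The function names build_inventory and summarize_by_extension are hypothetical, and the sketch assumes a submission arrives as an ordinary folder of files.

```python
from collections import Counter
from pathlib import Path


def build_inventory(root):
    """Return one record per file under root: relative path, size, extension."""
    root = Path(root)
    records = []
    for path in sorted(root.rglob("*")):
        if path.is_file():
            records.append({
                "path": path.relative_to(root).as_posix(),
                "bytes": path.stat().st_size,
                # Lowercase the extension so .CSV and .csv count as one format.
                "extension": path.suffix.lower() or "(none)",
            })
    return records


def summarize_by_extension(records):
    """Count files per extension so a curator can see the format mix at a glance."""
    return Counter(record["extension"] for record in records)
```

An inventory like this only tells a curator what is in the folder. Whether any of it deserves a place in the collection is still a question for a person.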

The Human Judgment Behind It All

What I find most compelling about appraisal is that it cannot be fully automated. Algorithms can flag duplicates or outdated formats, but deciding whether a dataset holds long-term scientific or cultural significance requires human judgment. It requires understanding context, recognizing potential, and sometimes making difficult calls about what to let go.
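The part an algorithm can handle is easy to sketch. The Python below is an illustration under stated assumptions, not a recommended tool: it treats files with identical SHA-256 checksums as duplicates, and it checks extensions against AT_RISK_EXTENSIONS, a hypothetical watch list invented for this example. The function names are hypothetical too.

```python
import hashlib
from collections import defaultdict
from pathlib import Path

# Hypothetical watch list for illustration only. A real list would come from
# the repository's own format policy.
AT_RISK_EXTENSIONS = {".wpd", ".wk1", ".dbf"}


def sha256_of(path, chunk_size=1 << 20):
    """Hash a file in chunks so large datasets do not have to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def flag_for_review(root):
    """Group byte-identical files and list files whose extension is on the watch list."""
    root = Path(root)
    by_checksum = defaultdict(list)
    at_risk = []
    for path in sorted(root.rglob("*")):
        if not path.is_file():
            continue
        relative = path.relative_to(root).as_posix()
        by_checksum[sha256_of(path)].append(relative)
        if path.suffix.lower() in AT_RISK_EXTENSIONS:
            at_risk.append(relative)
    # Only checksums shared by more than one file indicate duplicates.
    duplicates = [paths for paths in by_checksum.values() if len(paths) > 1]
    return {"duplicates": duplicates, "at_risk_formats": at_risk}
```

Notice what the output is: a list of things for a curator to look at. Nothing here deletes a file or decides what a dataset is worth.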

In that sense, selection and appraisal are not cold bureaucratic steps. They are acts of stewardship. Every time a curator decides a dataset is worth preserving, they are making a quiet promise: this matters, and we will take care of it.

FAIR Data Principles: Findable, Accessible, Interoperable, Reusable