Where Does Your Data Live? Storing Data in Data Curation
You have cleaned your data, labelled every column, and written documentation you are proud of. Now comes a question that is far less glamorous but every bit as important: where are you going to put it? In the context of curation, how and where you store your data can determine whether it survives five years or fifty. A study published in Library Management found that researchers at a Malawian public university stored data primarily on personal computers, flash disks, and email, all of them high-risk, fragile options. The data was being created, but it had nowhere safe to live.
Storage Is Not the Same as Preservation
Data storage is not just an IT concern. It is a core pillar of data curation. Where data is kept determines whether it can be found, accessed, verified, and reused years from now. As Hart et al. argue, poor storage contributes directly to "data entropy": the slow, quiet decay of information that becomes less accessible over time. Good curation means choosing storage that protects data not just today, but for decades.
A Malawian Success Story
Not all the news is bleak. Malawi's Ministry of Health has used DHIS2 since 2012 as its central Health Management Information System, storing health data from facilities nationwide. When COVID-19 arrived, the country leveraged this infrastructure to track cases and coordinate responses, precisely because the data already had a proper home. The National Statistical Office in Zomba also maintains an open data portal, making census and demographic datasets publicly accessible under national standards.
A dataset without a home is a dataset waiting to be lost. Storage is not where curation ends; it is where curation begins.
What Good Storage Looks Like
Whether you are a student at Chancellor College or a researcher at the Malawi-Liverpool-Wellcome Trust, the principles are the same. Good storage means using platforms that assign persistent identifiers, enforce backup protocols, and support open formats. Global repositories like Zenodo and Dryad accept data from anywhere in the world; Zenodo is free to use, while Dryad normally charges a publishing fee that it may waive for researchers in lower-income countries. Locally, institutional repositories, even simple ones, can make an enormous difference if paired with clear metadata and version control.
What This Means for Us
Malawi generates rich, valuable research data every day in agriculture, health, education, and climate science. But if that data lives on a single flash drive or an unprotected laptop, it is as fragile as the device holding it. Moving towards structured, curated storage is not about buying expensive servers. It is about building habits: naming files consistently, writing documentation, choosing stable formats, and depositing data somewhere it will survive beyond the project that created it.
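Two of these habits are easy to automate. What follows is a minimal Python sketch, not a prescribed standard: the naming pattern (project, description, date, version), the function names, and the `MANIFEST.sha256` file name are all assumptions chosen for illustration. It builds consistent file names and writes a checksum for every file in a folder, so that a copy on another drive or in a repository can later be compared with the original.

```python
import hashlib
import re
from datetime import date
from pathlib import Path


def standard_name(project: str, description: str, when: date, version: int, ext: str) -> str:
    """Build a name such as 'maize-survey_household-yields_2024-03-15_v02.csv'.

    The pattern is an illustrative convention, not an official one.
    """
    def slug(text: str) -> str:
        # Lowercase, then collapse every run of other characters into one hyphen.
        return re.sub(r"[^a-z0-9]+", "-", text.lower()).strip("-")

    extension = ext.lstrip(".").lower()
    return f"{slug(project)}_{slug(description)}_{when.isoformat()}_v{version:02d}.{extension}"


def sha256_of(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def write_manifest(folder: Path, manifest_name: str = "MANIFEST.sha256") -> Path:
    """Write one 'checksum  filename' line per file in the folder (not recursive)."""
    manifest = folder / manifest_name
    lines = [
        f"{sha256_of(item)}  {item.name}"
        for item in sorted(folder.iterdir())
        if item.is_file() and item.name != manifest_name
    ]
    manifest.write_text("\n".join(lines) + "\n", encoding="utf-8")
    return manifest
```

Running `write_manifest` again on a backup copy and comparing the two manifest files is a simple way to confirm that nothing changed in transit. A checksum only detects damage; it does not replace keeping more than one copy.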
The question is not whether your data matters. It does. The question is: does it have a home that will last?
>> Learn More
Short videos on data storage, curation, and preservation in practice.
Let me paint you a picture. A researcher spends two years collecting data, publishes a groundbreaking paper, and then moves on. Five years later, someone tries to build on that work. They find the dataset, download it, and open it, only to discover unlabeled columns, missing documentation, and file formats nobody uses anymore. All that effort, quietly slipping into irrelevance. This happens far more often than we like to admit, and it is exactly why data curation deserves our attention.
What Data Curation Really Means
Data curation is the ongoing practice of collecting, organizing, cleaning, documenting, and preserving data so it remains useful over time. The goal is to make data FAIR (Findable, Accessible, Interoperable, and Reusable). It sounds technical, but at its heart, it is about respect: respect for the work that produced the data, and respect for the people who might need it next. A survey published in PLOS ONE found that 97% of researchers who used curation service...
Here is the truth nobody likes to hear: not all data is worth saving. We live in a time that generates information at a breathtaking pace, yet the instinct to keep everything can do more harm than good. Without thoughtful selection, repositories become cluttered warehouses where valuable datasets get buried under mountains of noise. That is why selection and appraisal sit at the very heart of good data curation.
What Do Selection and Appraisal Actually Mean?
In the DCC Curation Lifecycle Model, appraisal is described as the process of evaluating data to determine what merits long-term curation and preservation. Selection is the decision that follows: which datasets stay, and which ones do not. Together, they act as a quality filter, ensuring that the data we invest in preserving is genuinely worth the effort. Think of it like editing a book. A first draft has raw material, some brilliant and some not. An editor does not keep every sentence out of loyalty to the writer. They keep ...
In today's digital environment, organizations generate and manage vast amounts of information that must remain accessible and reliable over time. Data curation refers to the active management, preservation, and enhancement of digital assets throughout their lifecycle to ensure their long-term usability (Harvey, 2018). While technological threats often dominate discussions about digital preservation, organizational issues are equally significant and can greatly affect the success of preservation initiatives. One of the most common organizational challenges is the lack of a comprehensive digital preservation strategy. Many organizations create and store digital records without establishing clear policies, procedures, and responsibilities for long-term preservation. According to the Digital Preservation Coalition (2024), the absence of formal preservation frameworks often leads to inconsistent practices, making digital materials vulnerable to loss, corruption, or inaccessibility. Ins...
Nice work
Great overview of digital preservation vs storage.
Good one