6 reasons higher education should think about preserving research data

What does it mean?
Digital preservation is the planning and management needed to ensure the long-term continuity and survival of important digital objects such as documents, images, other digital files, and research data outputs. Although it’s often perceived as an exclusively technical matter, it goes a lot further than an IT project.

It is more like a very specialist form of project management. Long-term preservation is extremely relevant for those working in higher education, particularly research data managers.

What sort of things would you want to preserve?
Broadly, there are two main objects in this context:

Published outputs from the project, such as reports and findings, probably peer-reviewed and maybe published in an academic journal
The raw data itself

Why should you care about it?
There are a number of compelling reasons why preserving research data is relevant to HE institutions right now. Here are six:

1. Funders require it
Individual funders, such as the EPSRC and the other UK Research Councils, have now made digital preservation an explicit requirement. The expectation is that datasets will be preserved for at least 10 years after the project’s completion. This expectation is now becoming a condition of funding. To put this at its starkest, one might even say: “no preservation, no funding”.

2. Openness
There is a national and international drive towards openness and sharing of data. While this can be achieved with a well-managed institutional repository, the task is made easier through a digital preservation plan that includes a well-defined and persistent metadata set and adherence to descriptive standards.

3. Your institution cares about its research
The university will have an interest in showcasing certain high-profile projects, or in presenting them as examples of best practice. Preservation of these high-quality datasets will be useful in securing funding for new research projects.

4. Data owners care about their research
The academic who created the dataset will have a strong interest in preserving their own work; they will be aware of the value it has for others in their discipline. Keeping the dataset in a state of preservation will support their current and future research work. It is extremely expensive, and well-nigh impossible, to recreate that dataset from scratch.

5. Users of the data have an interest
Through the drive towards open datasets, increasingly we find that a dataset can be reused, repurposed, and cited in new research. This is another strong driver for preservation.

In all, datasets have demonstrable meaning and value beyond the life of the project. That value persists as the life of a dataset extends beyond the local concerns of the institution and it passes into the collaborative research community.

6. Ever-changing types of data
While some raw data can be static digital objects in an easily preservable format (e.g. word-processed files and spreadsheets), a lot of it is dynamic digital content which is harder to capture and harder to preserve.

It can commonly take the form of databases, but increasingly it also takes the form of blogs, websites, social media outputs, and other forms of dynamic data. Some data creates very large files, which is a known problem for storage, for repositories, and for preservation.

Given the nature of these datasets – which can be complex, collaborative, large, and frequently updated – the time to start thinking about their preservation is at the start of the project, not at the end of the project’s life. This is just one of the problems which preservation can help to address.
