Do FAIR data ever become heritage?

Since they were first launched in 2014, the FAIR data principles have been getting more and more attention in the research and open science community. The principles were developed in reaction to incidents involving fraudulent research data: by making data Findable, Accessible, Interoperable and Reusable, such incidents should be avoided.

For preservationists it was long unclear whether the “R” included preservation. Reusable data at least require some form of preservation, even if only for the 10 years required by many funding bodies. The latest EU publication, “Turning FAIR into reality”, is clear about this.

Paragraph 2.4.2, under the heading “Long-term preservation and stewardship”, states:

The FAIR principles focus on access to the data and do not explicitly address the long-term preservation needed to ensure that this access endures. Data should be stored in a trusted and sustainable digital repository to provide reassurances about the standard of stewardship and the commitment to preserve.

So FAIR is not explicitly about long-term preservation. The report sets out recommendations for implementing the FAIR principles, centred on two elements: the FAIR Digital Object and the FAIR ecosystem, which consists of services and infrastructure. These services and infrastructure are based on policies, data management plans, identifiers, standards and repositories. The repositories will take care of the data:

“Repositories manage access to valuable data and metadata and offer services to support access and reuse. They also take responsibility for long-term data stewardship by curating data and metadata.”

This curation takes place during the research life cycle, as chapter 5 states:

“Data stewardship is a set of skills to ensure data are properly managed, shared and preserved throughout the research lifecycle and in subsequent storage.”

For me, as a preservationist in a memory organisation, the question is: what is the relation between the FAIR data principles and us, the memory organisations? Do the principles apply to us, and if so, when?

Introducing a time scale into FAIR might make things clearer. One could conclude that data curated according to the FAIR principles may be worth preserving for the long term (phase 2) once the research life cycle (phase 1) has ended. After all, if the FAIR repositories have no responsibility for long-term preservation, then at the moment it is decided that the data need to be preserved for the long term, that responsibility will have to be handed over (though not necessarily to a different institution). During the research life cycle the data are “production data”. In phase 2 the FAIR repositories become “Producers” in OAIS terminology, and the data are submitted to the long-term archive as Submission Information Packages. The long-term preservation organisation in phase 2 will follow the OAIS reference model as the standard for digital preservation. Having implemented the FAIR principles during the preceding research life cycle will certainly benefit the quality of these data.

Seen from this point of view, the question of certification also becomes easier. The report argues that repositories should be trusted and therefore certified. A candidate certification instrument is the CoreTrustSeal (CTS), which currently does not certify against the FAIR principles, so the report recommends adapting the CTS and developing metrics to certify FAIR services. The remark on page 44 that “ISO/OAIS [BS: I think ISO 16363 is meant here] is very heavyweight for most repositories” is, by my reasoning, superfluous, because ISO 16363 is intended for long-term archives and not for repositories that operate as part of the research life cycle.

The report does not describe which organisations will be responsible for long-term preservation, and it seems that few memory organisations were involved in its making. Although this view at least helps me distinguish between data as production data in the research life cycle and data as heritage data, I would regret it if this way of thinking led to reinventing the wheel and ignoring the good work the digital preservation community has done over the last 20 years. Given the challenges in both curating and preserving research data, collaboration would benefit everyone. We could start by investigating whether FAIR Digital Objects will lead to sustainable Archival Information Packages.


© 2024 Barbara Sierman