Digital Preservation Seeds

by Barbara Sierman

Why reinvent the wheel for FAIR?


A message to FAIR-ists

The current discussions around FAIR (which stands for Findable, Accessible, Interoperable, Reusable) datasets are causing a lot of confusion. It strikes me that discussions about FAIR in various forums often seem to start inventing solutions for digital preservation from scratch, describing the problem without calling it a digital preservation issue. Digital preservation has existed for more than 20 years, and for many of these problems solutions already exist. There is no need to start wondering afresh how to keep FAIR data findable, accessible, interoperable and reusable over the years: this is the daily work of digital preservationists. The explicit exclusion of digital preservation in the EU report “Turning FAIR into reality” was a clear statement, but it is not helpful in solving this range of problems. FAIR should be connected with OAIS and with digital preservation in general. We would all benefit from better interaction between FAIR and digital preservation.

The start of a dataset

When a researcher creates a dataset, the next step is to add the relevant information to make this dataset FAIR-ready: metadata, persistent identifiers, relevant software and so on. The dataset in itself is not FAIR; it is the infrastructure around the dataset, such as a website, search keys, persistent identifiers and software, that can make it FAIR. As long as the researcher is the only custodian of the dataset, he or she is responsible for making and keeping it FAIR. A researcher can take care of the accessibility of the dataset, but might change workplace or die, and then someone else needs to uphold the FAIR principles. That is the moment a repository comes into view, ideally right after the dataset has been made FAIR-ready.
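
As a rough illustration of what "FAIR-ready" can mean in practice, the sketch below builds a minimal machine-readable metadata record covering the elements mentioned above. It is a hypothetical example, not an official FAIR metadata profile; all field names and values are made up.

```python
import json

# A minimal, hypothetical metadata record for a FAIR-ready dataset.
# The field names are illustrative, not taken from any official FAIR profile.
record = {
    "identifier": "doi:10.1234/example-dataset",  # persistent identifier (Findable)
    "title": "Example measurement series 2019",
    "creator": "J. Researcher",
    "landing_page": "https://repository.example.org/example-dataset",  # where to get it (Accessible)
    "format": "text/csv",                         # an open format (Interoperable)
    "license": "CC-BY-4.0",                       # clear reuse conditions (Reusable)
    "software": "analysis-scripts v1.2",          # software needed to interpret the data
}

# Stored alongside the data, this record travels with the dataset when it
# is later handed over to a repository.
with open("metadata.json", "w") as f:
    json.dump(record, f, indent=2)
```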

The trusted repository

The moment the dataset is handed over to a repository, it becomes the responsibility of the repository to assess the FAIRness of the dataset and to decide whether it will be able to keep the dataset FAIR over the longer term. The researcher becomes the Producer and the dataset, in OAIS terminology, becomes a Submission Information Package. From a preservationist's point of view, it does not matter whether the repository aims to keep the dataset FAIR for one year or for ten years: digital preservation (or curation, or stewardship) is the solution for keeping the data FAIR in the long run. Digital preservation starts when the dataset is deposited in a trustworthy repository with a preservation policy.
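
To make the OAIS handover concrete: a Submission Information Package is, in essence, the dataset plus its metadata plus enough fixity information for the repository to verify what it received. The sketch below is a simplified, hypothetical illustration, loosely inspired by BagIt-style packaging rather than a literal OAIS implementation; all file names and paths are made up.

```python
import hashlib
import json
from pathlib import Path

def build_sip(content_files: list[Path], sip_dir: Path) -> None:
    """Bundle files into a simplified Submission Information Package:
    the content itself plus a manifest of SHA-256 checksums, so the
    repository can verify on ingest that nothing was damaged in transit."""
    data_dir = sip_dir / "data"
    data_dir.mkdir(parents=True, exist_ok=True)
    manifest = {}
    for src in content_files:
        dest = data_dir / src.name
        dest.write_bytes(src.read_bytes())
        manifest[dest.name] = hashlib.sha256(dest.read_bytes()).hexdigest()
    (sip_dir / "manifest-sha256.json").write_text(json.dumps(manifest, indent=2))

# Hypothetical usage, assuming the files exist (metadata.json is the record
# from the earlier sketch):
build_sip([Path("measurements.csv"), Path("metadata.json")], Path("sip-example"))
```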

The added value: authenticity for the long term

This repository will not only keep the dataset FAIR, but will do two more crucial things with it: keep the dataset authentic, and do so over the years. Authenticity and time are crucial aspects, fully covered by OAIS but not covered by the FAIR principles. Nor are they covered in the recently published white paper on TRUST, which proposes to assess trusted repositories based on Transparency, Responsibility, User Community, Sustainability and Technology. But we do not need new criteria to assess repositories; we already have a framework for trustworthy repositories. Based on OAIS we have the CoreTrustSeal, the nestor Seal and the ISO 16363 standard, together also known as the European Framework (and although the three partners seem to be no longer collaborating, in the preservation world this framework is still a starting point). Paragraph 3.1 of OAIS, on the responsibilities of an OAIS archive, fully covers the five topics of TRUST, with the important addition of authenticity for the long term. It is high time that the worlds of FAIR and digital preservation embraced each other, to avoid costly reinventions of wheels.
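
In daily practice, one small but essential building block of authenticity over time is routine fixity checking: periodically recomputing checksums and comparing them with the manifest recorded at ingest. The sketch below assumes the hypothetical manifest layout from the packaging example above. Fixity alone is of course not authenticity in the full OAIS sense, which also covers provenance and context, but it illustrates the kind of operational routine preservationists have long had in place.

```python
import hashlib
import json
from pathlib import Path

def verify_fixity(sip_dir: Path) -> list[str]:
    """Recompute checksums of the stored files and compare them with the
    manifest recorded at ingest; return the names of any damaged files."""
    manifest = json.loads((sip_dir / "manifest-sha256.json").read_text())
    damaged = []
    for name, expected in manifest.items():
        actual = hashlib.sha256((sip_dir / "data" / name).read_bytes()).hexdigest()
        if actual != expected:
            damaged.append(name)
    return damaged

# Run periodically, e.g. from a scheduled job; an empty list means the
# bitstreams are still exactly what the Producer deposited.
print(verify_fixity(Path("sip-example")))
```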

So FAIR-ists, you can reuse the vast amount of knowledge and experience of the preservationists!

1 Comment

  1. Hervé L'Hours May 7, 2019

    Hi Barbara,
    I’ve suggested we catch up when you’re around, but in the meantime wanted to pop in a small response here.
    The success and adoption of FAIR is understandable. The central simplicity speaks to a wide range of actors in the world of data. But the sheer range of projects and approaches does risk confusion if we don’t cooperate and align.
    To me, one of the key risks is that some FAIR work seems to focus on judging the ‘snapshot’ of an object. But as you say, an object can’t be evaluated in isolation from its context (including its custodians). Preservation might not be explicit enough in FAIR for some tastes, but Making FAIR Data a Reality does mention it, and long-term stewardship, a number of times. My current view is that FAIR plus Time/Change = Preservation.
    But preservationistas aren’t the only source of data expertise we need to consider. For full lifecycle management of the data/metadata, and of the organisations that look after them, we can learn from records managers, data managers, information theorists, technologists, business process modellers and others.
    OAIS and trustworthy digital repository standards are one source of that information, but while OAIS (particularly the functional model) has been repeatedly referenced, it’s not clear to me that it (or TRAC/ISO 16363) has gone beyond the ISO standard governance model to become an adopted, engaged and dynamic part of the operational research data lifecycle. That’s not to say that FAIR is guaranteed to do so.
    In moving from a high-level driver of policy direction to the top tier of an implementation plan, FAIR will face challenges. The attractive simplicity of the acronym will have to deal with the real-world challenges of operational data management. Many of the questions that FAIR efforts are beginning to ask have answers in prior work, including past preservation efforts. Let’s talk, I know the FAIRsFAIR project would value your insight here.
    Best,
    Hervé

© 2019 Barbara Sierman
