Last week I attended an online meeting of the Research Data Alliance FAIR Data Assessment Working Group. The concept of the FAIR data principles is becoming more and more popular, but the description of each principle (findable, accessible, interoperable and re-usable) is short and rather vague. My own interest is in the R of Reusable, and in a previous blog post I noted the risk of either excluding preservation (how can data be reusable for 10 years without preservation?) or including preservation on as-yet-unidentified terms, without referring to the work that has been done in preservation over the past 20 years. And I’m not the only one who is worried about this situation.
Brian Lavoie mentioned it in his blog post for World Digital Preservation Day: “But one part of the research life cycle has been relatively slow to this point in attracting RDM service support, best practices, and policies: long-term preservation.” Dennis Wehrle and Klaus Rechert from the University of Freiburg asked themselves “Are research data sets FAIR in the long run?” at the IDCC conference in 2018, and analyzed a large data set with the preservation tool FITS to show that a careful choice of file format needs to be made in order to preserve these data sets for the long term.
With the popularity and the vagueness comes a desire to assess the fairness, to compare and to measure. But on what basis? The Working Group has already identified 17 initiatives, in different domains, that created an assessment tool to give an answer. Most of these tools are in a prototype phase, under development or published in a beta version. Most are intended for self-assessment, although some claim to be fit for external assessment. The scope of the assessment varies: from the maturity level of the data set to the quality of the researcher’s research. Currently only a few tools are said to lead to certification. It is not clear from the survey results whether the tools are focused on data sets in a specific domain, although this might be unavoidable given the differences between domains.
The intention of the RDA Working Group is to come up with a recommendation of core assessment criteria in relation to the FAIR principles, distilled from the existing tools. The focus is on “Datasets and Data-related aspects (e.g. algorithms, tools, workflows)”. It will be difficult to keep the scope clear as various aspects of research are becoming more intertwined — see for example the concept of the Research Object, where data is only part of the object. But this working group might shed some light on the meaning and practical implications of the FAIR principles, and hopefully the R will be connected to preservation.
All information about the RDA FAIR Data Assessment Working Group can be found on GitHub: https://github.com/RDA-FAIR/FAIR-data-maturity-model-WG