The FAIR principles, in their broadest sense, were at the heart of the conference of the Digital Curation Centre, this time held in Barcelona. The FAIR principles, which state that data should be Findable, Accessible, Interoperable and Re-usable, have seen a massive uptake in the research community, but translating these principles into practice is another matter. This was the topic of the conference: “from principles to practice to global join up”.
There were many interesting presentations, but in this blog post I will focus on the re-usability of data, as I see this as an important link to digital preservation.
Sabina Leonelli, Professor in Philosophy and History of Science at the University of Exeter, was the first keynote speaker. She mentioned that there is still disagreement about what “open science” means, but she also saw this as an opportunity to debate what counts as science. The FAIR principles can be seen as a cornerstone, and their implementation is challenging. As a study she carried out for the EU Working Group showed, researchers are not at all aware of EU initiatives related to Open Science and are not prepared to implement the FAIR principles. They need support to really implement them (a message that was heard quite often at the conference), and this is a role for research libraries and their data curators. With colleagues she initiated a project called the Epistemology of Data-intensive Science, to investigate the consequences of re-using datasets in different contexts. One of the findings is that the key to data re-use is context-specific curation by people with field-specific knowledge. Another is that long-term preservation of the data is essential; otherwise nobody will even think about re-using the dataset. Re-use is often done by scientists who were already involved in the project, and links between datasets and other materials, such as publications, are not systematically created. So far, nothing new under the sun, but it was interesting to hear this from a life sciences point of view, from a community working together to achieve this in the DISSCO project.
Several practical approaches were presented. Carolyn Hank of the University of Tennessee showed how the FAIR principles can be measured qualitatively by operationalizing findability, accessibility, interoperability and reusability from a re-user’s perspective. Amy Koshoffer (University of Cincinnati) investigated the role of institutional repositories and their curation efforts: do they lead to better datasets, fit for re-use? The answer: in a way, yes, as datasets submitted to a repository with pre- or post-ingest curation more often included documentation. Preservation came up in the presentation by Dennis Wehrle and Klaus Rechert (University of Freiburg), who analysed the datasets underlying the dissertations in their repository and found a large number of file formats that could not be identified with FITS (File Information Tool Set). The re-usability of these datasets is questionable, and more effort should be made, in collaboration with the researcher, to create a dataset that really is re-usable, be it via migration or emulation and, if needed, together with the preservation of the software.
Privacy and the re-usability of data are challenged by the EU GDPR, and DANS presented a prototype of a tool to evaluate whether datasets can be shared and under which conditions. The tool was inspired by the Harvard Data Tag System.
The final keynote speaker was Nancy McGovern, Director of Digital Preservation at MIT Libraries. I was asked to introduce her, and in my opinion she chose the ideal topic to send everyone home with enough food for thought: collaboration. Now that new groups are involved in the preservation of data, collaboration between different communities is important to avoid reinventing the wheel. In the past 20 years a lot of progress has been made in digital preservation that could be re-used by the curation community. Early adopters have a wealth of knowledge about what works and what does not. “Radical collaboration”, in which everyone respects different opinions but is at the same time willing to share insights and experience, will help build and expand an inclusive community.
Finally, Clifford Lynch summarized the conference and concluded that IDCC is no longer restricted to the institutional context of the sciences and has now grown relatively mature, although there are still gaps. Preservation and its cost are still unsolved, policies are unclear, and the division of effort over who preserves what deserves more attention. Links between physical objects and digital information are becoming more important, partly because of the growth of digital humanities in the research landscape. He also agreed with Nancy McGovern that quarrelling about terminology (curation, preservation, data stewards… yes, it still happens) is not helpful either.
According to Twitter, the slides from this conference will soon be available on the DCC website.