This blog post is based on a lightning talk given at iDCC 2021, which Ingrid Dillo (DANS) and I prepared together. We want to draw your attention to an overlooked element in the open research data and preservation infrastructure: the (dis)satisfied user.
Open science and re-use
Open research and open data are hot topics. Achieving the overall goal of Open Science requires several components: transparent scientific practices, scientific integrity and open access publishing. Another fundamental component of Open Science is data sharing and re-use. Before data and other digital objects can be re-used, they need to meet certain standards for scientific quality, fitness for purpose, technical quality and ease of access. Data curation and preservation are needed to keep the data re-usable over time.
Infrastructure for Open Science
An infrastructure has been developed to support Open Science and to address a range of issues. The FAIR principles were published as a high-level framework for improving the Findability, Accessibility, Interoperability and Reusability of digital objects. They are widely accepted in the research domain and in other domains, such as digital heritage.
The FAIR principles can be seen as a guide to improving the management of research data. They are also a first step in preparing digital data for the longer term: formulating and implementing requirements that improve the usability of digital data leaves datasets better prepared for long-term preservation. Practical guidelines have been developed to implement the principles. Within the Research Data Alliance (RDA), the FAIR Data Maturity Model was developed. This model defines a set of criteria for each of the FAIR principles, so that the maturity level of a dataset can be tested. The European FAIRsFAIR project used these criteria to build an automated FAIR data assessment tool called F-UJI. The tool is designed for machine interaction and can, for example, be used by repositories. A web-based graphical user interface is also available, so people can run F-UJI assessments in a human-friendly way. In addition to F-UJI, the FAIR-Aware tool helps researchers make their datasets FAIR before they upload them to a repository.
However, data are FAIR only at a certain moment in time. To keep them FAIR, the repositories that hold the data need to act continuously until the data are disposed of. This is where the profession of digital preservation plays an important role, but only if the FAIR principles have been implemented well. As early as 2005, the Open Archival Information System (OAIS) defined starting points for keeping information accessible over time. These have been translated into requirements and assessment criteria such as CoreTrustSeal and ISO 16363. The TRUST principles summarise these requirements and act as a counterpart to the FAIR principles: Transparency, Responsibility, User Focus, Sustainability and Technology, in that order of priority.
So with these two mechanisms, FAIR and TRUST, both the data and the organisations that curate and preserve them seem to be covered. But one party is missing: the users.
Where are the users?
An important topic for long-term preservation is the concept of the Designated Community in the OAIS model. The idea is that a repository pays attention to the users of the data and takes appropriate measures to help them understand the data. Yet this concept hardly comes up in discussions about the re-use of digital data. That is no surprise. The concept is vague, and it is only easy to grasp if you have a very clear view of the people or machines that will re-use your data. In practice, you often will not. Open science will invite new users to re-use the data, and these users are currently out of scope for the repositories that take care of the data.
Digital preservation also needs to pay attention to future users. It is, after all, long-term preservation, and the designated community may change over the years. These new users and their expectations may be even less well known than today's unknown users and their knowledge base. How can we get to grips with this?
Collaboration between Open Science Repositories & Preservationists
Open Science repositories and digital preservationists can work together on how to serve users, now and in the future, drawing on current experience. In digital preservation, this is part of Preservation Watch. Monitoring the current situation, with both expected and unexpected users, shows us how to improve current approaches and prepare for the future. In other words, by serving our current users well, we can learn at least part of what future users will need.
According to the State of Open Data report, the re-use of data is expected to increase because of the COVID-19 pandemic, both within and between domains. This means that new users will need to familiarise themselves with digital data from other domains.
Actual re-use can be counted, and there are several ways to do so, such as the COUNTER Code of Practice for Research Data Usage Metrics. Recently, the RDA published a set of recommendations on data usage metrics for public review. But these approaches only produce figures. They do not measure how satisfied the re-user is with the data.
Measuring user satisfaction, why?
A satisfied user is the ultimate goal of digital preservation. We preserve in order to give users access. The user's satisfaction with the data is the ultimate test of sustainability. It tells us whether all the effort put into digital preservation was worthwhile, and it is how repositories can account for their expenses.
Flaws in re-use show us how to improve the preservation process. It is important to showcase success stories of dataset re-use, as OpenAIRE does. It is even more important to record when re-use went wrong and why, especially when users come from a domain other than the one the repository focuses on. The actual state of re-use, including both positive and negative outcomes, can help repositories steer their digital preservation processes. It can also alert the preservation community to issues that are likely to arise in the near future.
This talk is an invitation to work together on collecting practical information about successful and unsuccessful re-use of digital data. A methodical evaluation of current practices will help us derive well-founded insights. Those insights can then be used to validate and improve the current preservation infrastructure, and to show that sustainable repositories lead to satisfied users, now and in the future. Please get in touch if you want to discuss this topic: firstname.lastname@example.org