Reusability is the R in the FAIR principles. The scope of “reusability” in the FAIR principles is still under discussion, as are the mandatory requirements for achieving reusable datasets. Knowing the consequences of dataset reuse is also important for memory organizations, which as data producers create and preserve large datasets that will be used by researchers.
But is there a definition of “reuse” that can guide us, one that can support the wish to make reuse “measurable” in an objective way? “The desire that a certain impact or even the value of resources can be measured by its reuse count can only be fulfilled if reuse can easily be broken down into attributes or characteristics.”
Four authors (Van de Sandt et al., 2019) recently published an article in which they report on their search for a workable definition of reuse.
They created a literature set of around 65 articles by querying Google Scholar, Scopus and LISA for articles in which the terms “reuse”, “re-use” or “secondary use” were mentioned in the title, and by adding the relevant literature references mentioned in these articles. From these articles they collected the definitions of reuse and looked for the characteristics in those definitions that would distinguish reuse from use.
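The title filter described above can be illustrated with a small sketch. This is not the authors' actual search procedure, which ran inside the databases themselves; the pattern, the function name `mentions_reuse` and the sample titles (apart from the title of the article under discussion) are assumptions made purely for illustration.

```python
import re

# Hypothetical pattern for the three search terms named in the text:
# "reuse", "re-use" and "secondary use", matched as whole words.
REUSE_TERMS = re.compile(r"\b(?:re-?use|secondary use)\b", re.IGNORECASE)


def mentions_reuse(title: str) -> bool:
    """Return True if the title contains one of the three search terms."""
    return REUSE_TERMS.search(title) is not None


# Sample titles; all but the first are invented for this example.
titles = [
    "The Definition of Reuse",
    "Re-use of survey data in the social sciences",
    "Secondary use of clinical records",
    "Data sharing practices in ecology",
]

selected = [t for t in titles if mentions_reuse(t)]
print(selected)
```

A filter like this only finds articles that name the concept in the title, which is presumably why the authors also followed the references in the articles they found.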
One definition of reuse was cited often, that of A. Zimmerman: “Thus, I define reuse as the use of data collected for one purpose to study a new problem. I also use the phrase secondary use, which I intend to be synonymous with the term reuse.”
But the authors wanted to check this definition against reality in the research world. In their opinion, research practice is more complicated than the “linear model”, in which one author collects research data and produces an article, and the next researcher uses the same dataset as the basis for new research. From the definitions in the articles, the authors distilled four characteristics related to the reuse of data:
- The data (as new research can be based on a combination of one or more datasets, reuse is not restricted to the use of a single dataset)
- The user (one user versus a collaboration of users)
- The purpose of the reuse (does data have to be repurposed for a new research question to count as reuse, or not?)
- The time in which the research based on the data takes place
“Each characteristic was tested by comparing it to models that represent certain aspects of the current research landscape in order to find out if the characteristics resulting from theoretical definitions would also play a role in practice and if the difference between reuse and use is really measurable”.
The authors conclude that no single characteristic can measurably define reuse. This finally leads to an intriguing reuse model, which shows the complexity of research practice in reusing datasets and the difficulty of measuring reuse.
I found it interesting to see how an analysis of an often-used concept like reuse can lead to a complex model of reuse in practice. For the creators of datasets, like memory institutions, this model might be helpful in their thinking about other concepts like “designated community” and about the metadata that needs to accompany the datasets in order to support future research. It also shows that it will be very difficult to measure the (re)use of the data, which might be problematic for measuring impact and for receiving credit and financial resources.
Van de Sandt, S, Dallmeier-Tiessen, S, Lavasa, A and Petras, V. 2019. The Definition of Reuse. Data Science Journal, 18: 22, pp. 1–19. DOI: https://doi.org/10.5334/dsj-2019-022