Do FAIR data ever become heritage?

Since their launch in 2014, the FAIR data principles have attracted more and more attention in the research and open science community. They were developed in reaction to incidents involving fraudulent research data; making data Findable, Accessible, Interoperable and Reusable should help avoid such incidents.

For preservationists it was long unclear whether the “R” included preservation. Reusable data require at least some form of preservation, even if the data are kept for only 10 years, as many funding bodies require. The latest EU publication, “Turning FAIR into reality”, is clear about this.

The Research Data Alliance in Amsterdam and the KB

Originally posted at

Prof. C. Borgman Image: Inge Angevaare KB-NL

Our colleagues from DANS organized the 4th Plenary Meeting of the Research Data Alliance (RDA: research data sharing without barriers), held in Amsterdam over the past three days. I was there representing the KB, one of the few national libraries present.

The idea that national libraries have “research data” needs some explanation. There are repositories that collect data sets resulting from research, often underpinning an article. DANS and 3TU are good examples of this. But there are also repositories that hold “collections” to facilitate research, such as sensor data, astronomical data and climate data. This is similar to what the KB offers researchers: a vast amount of digitized historical texts and a web archive (with restricted access). Researchers use these sets; see for example the Webart project. With the growing attention for digital scholarship or e-Humanities, we can expect more use. And to complete the cycle, the results of research done on KB collections might end up as a publication in the KB and a data set at DANS. An NCDD working group on Enhanced Publications is looking into ways to present both outputs smoothly to the user as one integral entity. In short, there are good reasons for libraries to be at RDA!

The opening of the conference featured several speakers from the European Commission. Both Robert Jan Smits (Director General DG Research) and, via video, Neelie Kroes, Vice-President of the European Commission, stressed that the European Commission expects RDA to contribute to the increasingly important task of sharing and preserving research data, as open access to research data is a key message in the Horizon 2020 Programme. With a new cohort of EU politicians, some canvassing will be necessary to convince them of the ins and outs of this and of the role of RDA.

Prof. dr. Barend Mons from Leiden University, founder of the “fair data” initiative, was asked to give his views on the matter.
FAIR data are Findable, Accessible, Interoperable and Re-usable, for both humans and computers. With the motto “Bringing data to Broadway” he pleaded for professionalism in data publishing: a good infrastructure for data, a reward system for researchers (data should have the same “status” as a publication) and real data stewardship. Difficulties in hiring and keeping competent data scientists, for example, are a barrier. Are publishers ready for data publishing or will the data end up in a black hole? Despite the trend of putting data at the centre, he believes that there will always be “a narrative” explaining the findings (read: articles, books). To improve professional data stewardship, he pleaded for reserving 5% of research budgets to achieve the goals of FAIR data.

Prof. Christine Borgman of UCLA gave an interesting talk in which she criticized some assumptions about research data. Take data sharing: it is not common practice in every discipline and (again) as long as researchers are not rewarded for it, it will not happen. The emphasis on data might not be fair: publications are not simply “containers for data” but arguments, supported by the data. The carefully designed conventions of publications (for example the order in which authors appear) do not yet exist for data sets. More of this will be described in her new book, to be published by the end of the year.

The rest of the work these days was done in a variety of Interest Groups (IGs) and Working Groups (WGs). The KB participates in the activities on Certification of Digital Repositories and on Data Publishing (about workflows in data publishing, of interest for our (inter)national e-Depot, and about costs for data centers). All information is available from the RDA website.

At the final meeting an interesting announcement was made: in December a follow-up to the Riding the Wave report will be published, with the working title Harvesting the Data.
Knowing the immense impact the Riding the Wave report had, this is something to look forward to.

The Research Data Alliance started as a small group and now has over 2500 members, with a large range of Interest Groups and Working Groups. The time has come to streamline the activities in order to integrate the results and to think about the sustainability of the RDA itself. The results of this process will be discussed at the next Plenary Meeting in San Diego, 9-11 March 2015.

“Retracted”, so: no longer accessible?

Some time ago I blogged about a fraud case in the Netherlands. The author was a well-known expert in his field and published in various scientific journals. Many of his articles turned out to be based on fraudulent data and should not have been published. I briefly described publishers' existing policies for retracting information from their databases in these kinds of cases.

Stapel in Science Direct


Well, this has now happened: the Science Direct database no longer shows a number of articles by Diederik Stapel. Instead you are told that the article has been retracted, and why. If you request the article itself, you will only see part of the first page.


What Elsevier did is in line with their policy. But there is another side to the coin. Other scientists based publications on the insights that Stapel described in his articles, and cited these articles. Those citations can no longer be checked via Science Direct. If, say in 20 years' time, someone wants to investigate what all the fuss was about in 2012 and study Stapel's scientific publications, he or she will not find the original articles in Science Direct, only perhaps the censored versions. It is unlikely that his own university repository preserved the original digital articles, as the university only has a subscription to the Science Direct e-journals and does not own a digital copy.

This is exactly one of the reasons why some major players are collecting the “world's” digital scientific output. Organizations like LOCKSS, CLOCKSS, Portico and the International e-Depot of the National Library of the Netherlands all have the mission to preserve these e-journals and their articles. In these collections one should be able to find the original articles. They will have a policy of not deleting articles once they have acquired them for long-term preservation. The future researcher/detective should go to one of these repositories for his or her investigation.