“Retracted”, so: no longer accessible?

Some time ago I blogged about a fraud case in the Netherlands. The author was a well-known expert in his field and published in various scientific journals. Many of his articles turned out to be based on fraudulent data and should never have been published. I briefly described the existing policies of publishers for retracting information from their databases in these kinds of cases.

Stapel in Science Direct

Well, this has now happened: the Science Direct database no longer shows a number of Diederik Stapel's articles. Instead, you are warned that the article has been retracted and told why. If you request the article itself, you will only see part of the first page.


What Elsevier did is in line with its policy. But there is another side to the coin. Other scientists based publications on the insights that Stapel described in his articles, and cited these articles. These citations can no longer be checked via Science Direct. If, say, in 20 years' time someone wants to investigate what all the fuss in 2012 was about and study Stapel's scientific publications, he or she will not find the original articles in Science Direct, only perhaps the censored versions. It is unlikely that Stapel's own university repository has preserved the original digital articles, as the university only has a subscription to the Science Direct e-journals and does not own a digital copy.

This is exactly one of the reasons why some major players are collecting the world's digital scientific output. Organizations like LOCKSS, CLOCKSS, Portico and the International e-Depot of the National Library of the Netherlands all have the mission to preserve these e-journals and their articles. In these collections one should be able to find the original articles. They generally have a policy of not deleting articles once these have been acquired for long-term preservation. The future researcher or detective should go to one of these repositories for his or her investigation.

Thou shalt not delete …

In the Netherlands we recently had an unpleasant affair in the scientific world. It became clear that a famous professor in the social sciences had based his publications and conclusions on faked data. This is called "scientific misconduct," which can take the form of fabrication, falsification or plagiarism. In this case he fabricated the data himself. A newspaper euphemistically called it "data massaging". The fraud went on over the past 10 years and went unnoticed by his colleagues. His articles were published in various scientific journals, including journals from Springer and Elsevier. Undoubtedly these articles are now permanently stored in various long-term archives.

A special commission is investigating the complete list of this professor's articles to determine which of them were fraudulent and can no longer be called "scientific". Recently the commission published its first findings, which immediately led to a lively discussion in the newspapers. Some suggested deleting the affected articles from the university library repositories immediately. Others wanted librarians to add metadata to the articles indicating that the conclusions were based on faked data. Some said publishers should do this, but the publishers replied that they were waiting for the commission's final report and would respond then.

How are publishers handling these cases? When you read the versioning recommendations on the NISO site, it becomes clear that (in theory) publishers have a special procedure for published articles, or "Versions of Record". This is shown in Use Case #10, "Corrected Version": "Corrections to the published version are posted as the equivalent of errata or corrigenda, or the article has these corrections inserted [CVoR], or the errors are so serious (technically or legally) that the article is retracted or removed from publication [VoRs may still exist on various sites but they are no longer formally recognized; the formal publication site identifies the article as having been retracted or removed]."

A recent article in PNAS, cited in the Dutch newspaper NRC Handelsblad of 3 October 2012, describes how researchers analysed 2,047 retracted articles indexed in PubMed over the past 40 years to determine why they were retracted. Retractions are often attributed to erroneous data, but in fact only 21% appear to have been. By contrast, 43.3% were based on fraud and 9.8% on plagiarism. In 11.3% of the cases the reason for retraction was unclear. Quite often the researchers had to base their conclusions on additional information taken from contemporary sources rather than from the publisher's site. Long-term repositories will normally have added new versions of articles to the existing records in the digital archive. But what do they do when a publisher retracts an article?

Long-term archives based on OAIS all share one of the "responsibilities" of an OAIS archive: "There should be no ad-hoc deletions" without a related policy (http://public.ccsds.org/publications/archive/650x0m2.pdf p. 3.1). So even if a long-term archive intended to delete these fraudulent articles, the deletion would have to be based on a policy. I'm not aware of any long-term archive with a policy of deleting articles that have been retracted by their publishers.

The archive "simply" takes responsibility for the long-term accessibility of all articles ingested into the digital archive.
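A small model makes the difference between retracting and deleting concrete. Below is a minimal sketch in Python that assumes a much simplified archive. The classes `Version`, `ArchiveRecord` and `Archive`, the label strings, and the `deletion_policy` parameter are all hypothetical; they do not come from OAIS, NISO or any real archive system. A retraction is stored as an extra version next to the original Version of Record. Deletion is refused unless some deletion policy exists, and the sketch does not model what such a policy would say.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Version:
    """One stored manifestation of an article (hypothetical model)."""
    label: str    # hypothetical labels, e.g. "VoR", "CVoR" or "Retraction notice"
    content: str  # stand-in for the actual digital object


@dataclass
class ArchiveRecord:
    """All versions ever ingested for one article, in arrival order."""
    identifier: str
    versions: List[Version] = field(default_factory=list)

    @property
    def retracted(self) -> bool:
        # A retraction is recorded as one more version, not as a deletion.
        return any(v.label == "Retraction notice" for v in self.versions)


class Archive:
    """Append-only archive: ingest adds versions, delete requires a policy."""

    def __init__(self, deletion_policy: Optional[str] = None) -> None:
        self.records: Dict[str, ArchiveRecord] = {}
        # None means "no deletion policy", so every deletion request is refused.
        # This sketch only checks that a policy exists, not what it says.
        self.deletion_policy = deletion_policy

    def ingest(self, identifier: str, version: Version) -> ArchiveRecord:
        record = self.records.setdefault(identifier, ArchiveRecord(identifier))
        record.versions.append(version)  # earlier versions stay untouched
        return record

    def delete(self, identifier: str) -> None:
        # Mirrors the OAIS responsibility: no ad-hoc deletions without a policy.
        if self.deletion_policy is None:
            raise PermissionError(f"no deletion policy; {identifier!r} is kept")
        del self.records[identifier]


if __name__ == "__main__":
    archive = Archive()
    archive.ingest("article-1", Version("VoR", "original article"))
    record = archive.ingest("article-1", Version("Retraction notice", "reason"))
    print(record.retracted)                    # True
    print([v.label for v in record.versions])  # ['VoR', 'Retraction notice']
    try:
        archive.delete("article-1")
    except PermissionError as err:
        print(err)
```

Run as a script, it prints `True` and the two version labels, then reports that the deletion request was refused. The append-only design is the point: the retracted article stays readable alongside the notice that explains why it was retracted.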

In my opinion it is not the role of the archive to judge. The public should be able to see the history of an article and use other sources to become fully informed. One could argue that a future researcher who uses an article from a long-term archive for his or her own research should investigate the validity of that article.

Forgeries happen all the time; sometimes they are discovered, sometimes not. These kinds of affairs (and they happen more often, not only in the Netherlands) show the value of long-term archives as a true safe haven for scientific publications, regardless of their content. Perhaps eventually such archives will even serve as the place to investigate fraudulent articles.