In the Netherlands we recently had an unpleasant affair in the scientific world. It became clear that a famous professor in the social sciences had based his publications and conclusions on faked data. This is called ‘scientific misconduct’, which can take the form of fabrication, falsification or plagiarism. In this case he fabricated the data himself. A newspaper euphemistically called it “data massaging”. The fraud went on for the past 10 years and went unnoticed by his colleagues. His articles were published in various scientific journals, including journals from Springer and Elsevier. Undoubtedly these articles are now permanently stored in various long-term archives.

A special commission is investigating the complete list of this professor’s articles to determine which of them were fraudulent and can no longer be called “scientific”. Recently this commission published its first findings, which immediately led to a lively discussion in the newspapers. Some suggested deleting the affected articles immediately from the university library repositories. Others wanted librarians to add metadata to the articles to indicate that the conclusions were based on faked data. Some said publishers should do this, but the publishers replied that they were waiting for the final report of the special commission and would react then.

How are publishers handling these cases? When you read the versioning recommendations in the guidelines on the NISO site, it becomes clear that (in theory) publishers have a special procedure for published articles, or “Versions of Record”. This is shown in “Use Case #10: Corrected Version”: “Corrections to the published version are posted as the equivalent of errata or corrigenda, or the article has these corrections inserted [CVoR], or the errors are so serious (technically or legally) that the article is retracted or removed from publication [VoRs may still exist on various sites but they are no longer formally recognized; the formal publication site identifies the article as having been retracted or removed].”

In a recent article in PNAS, as cited in the Dutch newspaper NRC Handelsblad of 3 October 2012, two researchers analysed 2,047 articles in PubMed that had been retracted in the past 40 years to find out the reasons for retraction. Although retraction is often said to be due to erroneous data, in fact only 21% of the retractions appear to be, while 43.3% were due to fraud and 9.8% to plagiarism. In 11.3% the reason for retraction was unclear. Quite often the researchers had to base their conclusions on extra information, not from the publisher’s site but from contemporary sources. In the case of corrections, the long-term repositories would have added new versions of articles to the already existing records in the digital archive. But what do they do when the publisher has retracted the article?

Long-term archives that are based on OAIS all know one of the “responsibilities” of an OAIS archive: “There should be no ad-hoc deletions” without a related policy (section 3.1). So even if a long-term archive intended to delete these fraudulent articles, the deletion would have to be based on a policy. I’m not aware of any long-term archive with a policy to delete articles that were retracted by publishers.

The archive “simply” takes responsibility for the long-term accessibility of all articles that are ingested into the digital archive.

In my opinion it is not the role of the archive to judge. The public should be able to see the history of the article and use other sources to become fully informed. One could argue that future researchers who use an article from the long-term archive for their own research should investigate its validity themselves.

Forgeries happen all the time; sometimes they are discovered, sometimes not. These kinds of affairs (and they happen more often, not only in the Netherlands) show the value of long-term archives: to really be a safe haven for scientific publications, regardless of their content. And perhaps, eventually, to make it possible to investigate fraudulent articles.

