Whitt's cure for preservationists' despair?

After Christmas I tried to reduce my digital pile of recent articles, conference papers, presentations, etc. on digital preservation. Interesting initiatives ("a pan-European AIP" in the E-ARK project: wow!) could not prevent me from ending up slightly in despair after a few days of reading: so many small initiatives, but shouldn't we march together in a shared direction to get the most out of them? Where is our vision of this road? David Rosenthal's blog post offered a potential medicine for my mood.

He referred to Richard Whitt's article "'Through A Glass, Darkly': Technical, Policy, and Financial Actions to Avert the Coming Digital Dark Ages", 33 Santa Clara High Tech. L.J. 117 (2017). http://digitalcommons.law.scu.edu/chtlj/vol33/iss2/1

Developments in Preservation Policies

Often it is unclear whether results from European projects have any follow-up after the project is finished, and if so, how one can monitor this. With regard to our work in SCAPE, however, including the Catalogue of Preservation Policy Elements and the list of Published Preservation Policies, I am under the impression that these tools are still supporting organisations in creating preservation policies. People sometimes tell me this directly and sometimes I see references in articles and presentations.


One initiative I'm involved in myself is a Dutch working group under the flag of the Network Digital Heritage, which will use the SCAPE Catalogue to create Dutch Guidelines for creating preservation policies, with a focus on smaller organisations in various domains. Not only libraries and data centres – which were involved in the creation of the SCAPE version – but also archives, museums and organisations collecting digital art and architectural materials. These Guidelines should support these organisations and help them not only to write preservation policies, but also to implement them in their organisations (often it is the other way around: policies are not written down, but actions are based on implicit "policies").

The Institute for Sound and Vision is a partner in this working group. Annemieke de Jong, whom I mentioned earlier in a blog post about their work to become a TDR, created Preservation Policies for her institute. I've read all the preservation policies collected here, but this policy is exemplary and should be high on the list of Best Preservation Policies. It is the first preservation policy that looks good, reads well and covers all the main topics mentioned in the SCAPE catalogue. The design of this policy shows that the document is not seen as an obligatory task, but as a way of communicating with the Producers and Consumers of the content of the digital archive. From what I've seen of policies so far, they are seldom attractively designed. In this case the text itself is understandable and clear, without too much jargon, explaining the concepts and approaches in plain language. And, as said, it covers all the topics we identified as Guidance Policies in the SCAPE Preservation Policy Model and adds much information that belongs to the Procedure Policies, the middle level in which the high-level policies are translated into practical approaches. Based on this policy you will get a good overview of what the Institute is collecting and how it is preserved. With the additional internal guidelines referred to in the text, it should be clear to the employees of the Institute what is expected of them, and as I mentioned earlier at iPRES 2014, this is one of the goals of a good policy. A new item on your reading list!

Crystal clear digital preservation: a management issue

Raising awareness for digital preservation was a frequently used phrase when I started in this field ten years ago (never regretted it, hurray!). We preservationists have made progress, but the story still does not explain itself. So I like reading how others persuade and convince people. Recently I found a book that really does the job. In crystal clear language, without beating about the bush and based on extensive, up-to-date (until 2014) literature, digital preservation is explained and almost every aspect of it is touched upon. Edward M. Corrado and Heather Lea Moulaison have done a great job with their Digital Preservation for Libraries, Archives and Museums, Rowman and Littlefield, 2014. ISBN 978-0-8108-8712-1 (pbk.) — ISBN 978-0-8108-8713-8 (ebook)

In fact, I should have started this blog post with "Dear manager, I have found a book that tells you all you need to know about digital preservation. Spare some time and read the chapter that is dedicated to you (part II), the sooner the better" [preservationists, please forward this to your manager, they might even read the rest of the book!]

The book starts by explaining what digital preservation is not (like "backup and recovery", access, "an afterthought"), followed almost immediately by the (positively phrased) starting point that guides the whole book:

“ensuring ongoing access to digital content over time requires careful reflection and planning. In terms of technology, digital preservation is possible today. It might be difficult and require extensive, institution-wide planning, but digital preservation is an achievable goal given the proper resources. In short, digital preservation is in many ways primarily a management issue”.

The common thread and metaphor of the book is the authors' "Digital Preservation Triad". The triad is a new variant of Nancy McGovern's three-legged stool and is symbolized by a Celtic knot, chosen to better express how interrelated the activities are.


These activities are divided into:

  • Management-related activities,
  • Technological activities and
  • Content-centred activities.

Each set of activities is further explained in a dedicated chapter. The chapter about management activities immediately starts by explaining the basics of the OAIS model, clearly showing that this is the essence of digital preservation. Knowledge of OAIS should be present at the management level of an organisation; only then can management deal properly with aspects like human resources (skills and training) and sustainable digital preservation (costs, etc.).

The Technology part is more concerned with metadata and file formats, and with the technical infrastructure or repository, which is closely related to mechanisms of trust (audit and certification).

The last part of the book discusses aspects related to the Content, like collection development.

The text is based on a large literature list in which many recently published conference papers, (EU) project results and reports are used. The authors are well informed about what is going on and do not restrict themselves to the US.

What I liked in this book is the very practical approach and the unvarnished description of digital preservation ('not easy but doable'). The authors stress that preservationists should convince management over and over again "that digital preservation is important to the overall mission of the organization", and not just "an experimental technology project", and should "communicate the multiple ways in which digital preservation brings value to the organization."

One of the barriers in this process, at least in my experience, is that people often try to connect their experience in analogue preservation with that of digital preservation. Sometimes this leads to monstrous analogies. This book does not try to map the two worlds, but clearly states:

“The digital item created and made accessible as part of a digital preservation system is fundamentally different from an analogue item. Period.”

Unavoidably some recent developments are missing, like the Cost model work that was done in the 4C project and the work on Preservation Planning and Policies in SCAPE.

But if you still need to convince your management, point them to this book – also available as an epub!

Preservation Policies & Maturity levels


Recently, at the iPRES 2014 conference in Melbourne, I gave a presentation on the SCAPE Preservation Policies. Not only did I explain the SCAPE Preservation Policy Model, but I also summarized my findings after analysing 40 real-life preservation policies. You can read the detailed information in my article (to be published soon). Basically I think that organisations quite often overstretch themselves by formulating preservation policies that are not in line with their maturity. I therefore propose to extend the SCAPE Catalogue of Preservation Policy Elements with information indicating at which maturity level each policy element becomes relevant. The 5 levels are based on the Maturity Model of C. Dollar and L. Ashley.
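
To make this proposal a bit more concrete, here is a minimal sketch (in Python) of what such an annotated catalogue could look like and how an organisation might use it. The element names, categories and level assignments below are invented for illustration; they are not the official SCAPE Catalogue or the Dollar and Ashley model itself.

    # Hypothetical sketch: policy elements annotated with the maturity level
    # (1-5, after the Dollar and Ashley model) at which they become relevant.
    # Element names, categories and level assignments are illustrative only,
    # not the official SCAPE Catalogue.

    POLICY_ELEMENTS = [
        {"element": "Bit preservation", "category": "Authenticity", "min_level": 1},
        {"element": "Preservation watch", "category": "Functional preservation", "min_level": 3},
        {"element": "Audit and certification", "category": "Organisation", "min_level": 4},
    ]

    def relevant_elements(maturity_level):
        """Return the policy elements an organisation at this maturity level should address."""
        return [e for e in POLICY_ELEMENTS if e["min_level"] <= maturity_level]

    # Example: an organisation at level 2 would, in this sketch, only be
    # expected to cover bit preservation in its policy document.
    for entry in relevant_elements(2):
        print(entry["element"])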

Now that the SCAPE project is finished, I could use your input. The wiki on the Open Preservation Foundation site will be open to OPF account holders, and you will be able to help by adding these maturity levels to the policy elements. This way the result will reflect a collaborative view, rather than my own opinion.

Currently the OPF website is undergoing some changes, but when this is finished, I'll remind you!

nestor Preservation Policy guideline (as yet: for German readers only)


People who read German should have a look at the recent publication of nestor, the German digital preservation coalition, about creating an institutional preservation policy: the Leitfaden zur Erstellung einer institutionellen Policy zur digitalen Langzeitarchivierung.

This 26-page guideline describes various aspects of preservation policies, like the usefulness of a policy, whom to address with the policy and what to put into it. Here a warning is given not to mix up the real situation in the preservation policy with the situation an organisation would like to achieve (the "Ist und Soll", the current versus the desired situation). In practice this happens frequently, I think; a topic I will discuss further at iPRES 2014 now that my paper has been accepted.

Each chapter formulates some questions related to its topic, I assume to foster internal discussion.

I looked at this guideline with special interest, as I was involved in creating a Catalogue of Preservation Policy Elements in the European project SCAPE. This catalogue gives an overview of elements that should be part of a preservation policy. The elements of the SCAPE catalogue and the nestor guideline overlap for the most part, sometimes differing in granularity, for example when describing the technical environment of the repository. The nestor guideline has one interesting addition, however, and that is the chapter about collaborative digital preservation, where more than one organisation is involved.

There is no reference to the policy work in SCAPE, which is a pity in my (slightly biased) opinion. A reason might be that the SCAPE Catalogue is not in German but in English. Still, a reference to this work would have been a useful additional source for organisations creating their preservation policies.

Taking notice of the outcomes of European projects could also have extended some topics, like the "Technology and Community Watch", which is now often referred to as Preservation Watch. Preservation Watch is not limited to technology and community – the two elements mentioned in OAIS – but also takes into account, among other things, changes in the organisation itself; see for example the Planets Functional Model report on this and the uptake of the concept in the SCOUT tool.

The guideline ends with a list of examples of preservation policies, a list that bears a striking resemblance to the SCAPE site of Published Preservation Policies.

It would be nice if the nestor people made an English translation of this document (of course incorporating some SCAPE work as well). Many organisations, and not only small ones, are currently working on preservation policies. This could be a worthwhile supporting document and would reach a wider audience if it were available in English.

Update 2015: the English translation is there!

Too early for audits?

I never realized that the procedure for getting to an ISO standard could take several years, but this is true for two standards related to audit and certification of trustworthy digital repositories. Although we have had the ISO 16363 standard on Audit and Certification since 2012, official audits cannot take place against this standard until the related standard Requirements for bodies providing audit and certification (ISO 16919), which regulates the appointment of auditors, is approved. This standard, like ISO 16363 compiled by the PTAB group in which I participate, was already finished a few years ago, but the ISO review procedure, especially when revisions need to be made, takes a long time. The latest prediction is that ISO 16919 will be approved this summer (2014), after which national standardization bodies can train the future (official) auditors. How many organizations will then apply for official certification against the ISO standard is not yet clear, but if you're planning to do so, it might be worthwhile to have a look at the recent report of the European 4C project, Quality and trustworthiness as economic determinants in digital curation.

The 4C project (Collaboration to Clarify the Costs of Curation) is looking at the costs and benefits of digital curation. Trustworthiness is one of the 15 "economic determinants" they distinguish. As quality is seen as a precondition for trustworthiness, the 4C project focuses in this report on the costs and benefits of "standards based quality assurance" and looks at the five current standards related to audit and certification: DSA, DRAMBORA, DIN 31644 of the German nestor group, TRAC and TDR. The first part of the report gives an overview of the current status of these standards. Woven into this overview are some interesting thoughts about audit and certification. It all starts with the Open Archival Information System (OAIS) Reference Model. The report suggests that the OAIS model is there to help organisations create processes and workflows (page 18), but I think this does not do justice to the OAIS model. If one really reads the OAIS standard from cover to cover (and shouldn't we all do that regularly?), one will recognize that the OAIS model expects a repository to do more than design workflows and processes. Instead, a repository needs to develop a vision on how to do digital preservation, and the OAIS model gives directions. But the OAIS model is not a book of recipes, and we are all trying to find the best way to translate OAIS into practice.

It is this lack of evidence about which approach will deliver the best preserved digital objects that made the authors of the report wonder whether an audit taking place now might lead to a risky outcome (either too much confidence in the repository or too little). They use the phrase "dispositional trust": "It is the trustor's belief that it will have a certain goal B in the future and, whenever it will have such a goal and certain conditions obtain, the trustee will perform A and thereby will ensure B." (p. 22). We expect that our actions will lead to a good result in the future, but this is uncertain, as we don't have an agreed common approach with evidence that it will be successful. This is a good point to keep in mind, I think, as is the fact that there are many more standards applicable to digital preservation than only the ones mentioned above: security standards, records management standards and standards related to the creation of the digital object, to name just a few.

Based on publicly available audit reports (mainly TRAC and DSA, and test audits on TDR) the report describes the main benefits of audits for organisations as:

  • to improve the work processes,
  • to meet a contractual obligation and
  • to provide a publicly understandable statement of quality and reliability (p. 29).

These benefits are rather vague but one could argue that these vague notions might lead to more tangible benefits in the future like more (paying) depositors, more funding, etc. By the way, one of the benefits recognized in the test audits was the process of peer review in itself and the ability for the repository management to discuss the daily practices with knowledgeable people.

The authors also tried to get more information about the costs related to audit and certification, but had to admit in the end that there is currently hardly any information about the actual costs of doing an audit and/or getting certified (why they mention, on page 23, financial figures of two specific audits without any context is unclear to me). They base themselves mainly on information that was collected during the test audits the APARSEN project performed and on the taxonomy of costs that was created. For real cost figures we need to wait for more audits and for repositories that are willing to publish all their costs related to this exercise.

Reading between the lines, one could easily conclude that it is not advisable to perform audits yet. But especially now that the DP community is working hard to discover the best way to protect digital material, it is important for any repository to protect its investments and to avoid that funding organizations (often taxpayers) back off because of costly mistakes. The APARSEN trial audits were performed by experts in the field, and the audited organizations (and these experts) found the discussions and recommendations valuable. As standards evolve and best practices and tools are developed, a regular audit by experts in the field can certainly help organizations to minimize the risks to their material. These expert auditors need to be aware of the current state of digital preservation: the uncertainties, the risks, the lack of tools and the best practices that do exist. The audit results will help the community to understand the issues encountered by the audited organizations, as audit results will be published.

As I noticed while reading a lot of preservation policies for SCAPE, many organisations want to get certified and put this aim in their policies. Publishers want to have their data and publications in trustworthy, certified repositories. But all stakeholders (funders, auditors, repository management) should realise that the outcomes of an audit should be seen in the light of the current state of digital preservation: that of pioneering.

BVIM and digital preservation policies


Organizations must evaluate their activities and show their relevance to their funders. It is no exception that organizations like libraries and archives are facing severe budget cuts, which affect current activities like their digitization projects. Simon Tanner of the Department of Digital Humanities, King's College London, wrote an interesting report in which he explains the Balanced Value Impact Model. This model supports organizations (especially memory institutions) in doing an impact analysis of their digital resources in order to show how the use of these resources benefits and changes people. Not with vague notions, but in an evidence-based approach. The results can be valuable input for further plans and can support decision makers at various levels. Decisions could then be made not only on economic grounds, but also by taking the impact values into account.

Tanner distinguishes the following impact areas:

  • Social and Audience impacts: "the audience, the beneficial stakeholders and wider society has been affected and changes in a beneficial fashion"
  • Economic impacts: "the activity is demonstrating economic benefits to the organisation or to society"
  • Innovation impacts: "that the digital resource is enabling innovation which is supporting the social and economic benefits accrued"
  • Internal process impacts: "that the organisation creating/delivering the digital resources have been benefitted within its internal processes by the innovation demonstrated". (p. 45)

The model consists of 5 stages, of which the first two are "Context" and "Analysis and Design". In these steps the digital environment in which the organization operates ("the digital ecosystem") is described, as well as the stakeholders who either benefit from or are at least affected by the digital resources.

It is not my intention to explain the model here and I would advise you to read the report. But it occurred to me that this exercise could benefit the case of digital preservation in an organisation as well. Part of the digital resources will be preserved for the long term after all.

As digital preservation is a costly activity, it is important to show its value. Why are we keeping all this digital material for an undefined number of years? The Balanced Value Impact Model could be very helpful here, as the exercise leads to an overview of the current ecosystem and the current stakeholders of the digital resources. It will also show the value the stakeholders attach to the digital collections: values for society and for individuals, economic values and values for the organization itself.

The information collected for the Balanced Value Impact Model can help the organization to identify the areas it needs to monitor in its Preservation Watch, to safeguard that this ecosystem and the identified stakeholders will be served over the years. The Designated Community – for many memory institutions quite a vague notion – will be described better, as well as the value this Designated Community derives from the digital resources. These values could be an ingredient for the organization in establishing its preservation policies, in which it describes whether and how it will keep these values present in the digital collection.

Creating a Balanced Value Impact model will not be an easy task for an organization. But it could be a very useful exercise to support the preservation policies too.

Thou shalt not delete …

In the Netherlands we recently had an unpleasant affair in the scientific world. It became clear that a famous professor in the social sciences had based his publications and conclusions on faked data. This is called 'scientific misconduct' and can consist of fabrication, falsification or plagiarism. In this case he fabricated the data himself. A newspaper euphemistically called it "data massaging". This fraud happened over the past 10 years and went unnoticed by his colleagues. His articles were published in various scientific journals, including those of Springer and Elsevier. Undoubtedly these articles are now permanently stored in various long term archives.

A special commission is investigating the complete list of this professor's articles to determine which of them can no longer be called "scientific" and were fraudulent. Recently this commission published its first findings, which immediately led to a lively discussion in the newspapers. Some suggested deleting the affected articles immediately from the university library repositories. Others wanted librarians to add metadata to the articles to indicate that the conclusions were based on faked data. Some said publishers should do this, but the publishers replied that they were waiting for the final report of the special commission and would react then.

How are publishers handling these cases? When you read the guidelines on the NISO site about versioning recommendations, it becomes clear that (in theory) publishers have a special procedure for published articles or "Versions of Record". As is shown in "Use Case #10: Corrected Version": "Corrections to the published version are posted as the equivalent of errata or corrigenda, or the article has these corrections inserted [CVoR], or the errors are so serious (technically or legally) that the article is retracted or removed from publication [VoRs may still exist on various sites but they are no longer formally recognized; the formal publication site identifies the article as having been retracted or removed]."

In a recent article in PNAS, as cited in the Dutch newspaper NRC Handelsblad of 3-10-2012, two researchers analysed 2,047 articles that were retracted from PubMed in the past 40 years to see what the reason for retraction was. Although retraction is often stated to be based on erroneous data, in fact only 21% appears to be, while 43.3% was based on fraud and 9.8% on plagiarism. In 11.3% of cases the reason for retraction was unclear. Quite often the researchers needed to base their conclusion on extra information, not from the publisher's site but from contemporary sources. The long term repositories would have added new versions of articles to already existing records in the digital archive. But what do they do when the publisher retracts an article?

The long term archives that are based on OAIS all know one of the "responsibilities" of an OAIS archive: "There should be no ad-hoc deletions" without a related policy (http://public.ccsds.org/publications/archive/650x0m2.pdf p. 3.1). So even if a long term archive had the intention of deleting these fraudulent articles, it should be based on a policy. I'm not aware of long term archives with a policy to delete articles that were retracted by publishers.

The archive "simply" takes responsibility for the long term accessibility of all articles that are ingested into the digital archive.

In my opinion it is not the role of the archive to judge. The public should be able to see the history of the article and use other sources to get fully informed. One could argue that the future researcher, using the article from the long term archive for their own research, should investigate the validity of the article.

Forgeries happen all the time, and sometimes they are discovered, sometimes not. These kinds of affairs (and they happen more often, not only in the Netherlands) show the value of long term archives: to really be a safe haven for scientific publications, whatever their content. Maybe eventually even a place to investigate fraudulent articles.

Policies: necessary and beneficial

The enormous growth of digital data will require memory organizations to develop a clear vision of their role in taking care of a fair piece of this data cake, with respect to their mission and goals. It is not enough for an organization to say that it will adhere to the ISO 14721 OAIS model. It needs to translate this model into policies that are relevant for its specific organisation, its goals and mission, and its specific collections. So a variety of policies will be developed, for example a collection policy to make the right selection, and access policies to give the public (also defined in the policies) the opportunity to use these data.

In my opinion there are at least three reasons for developing preservation policies:

  1. Organizational sustainability
  2. A professional dialogue with 3rd parties
  3. Better prepared for new developments

Organizational sustainability

Written policies, becoming part of the institutional memory, will reduce the risk that a change of staff or management influences the approach to digital preservation in an organisation. In the coming years the workforce of many organizations, like libraries, will change dramatically due to an ageing population. Often these people were the first to be involved in digital preservation in an organisation. Transfer of knowledge is important, but not enough to achieve a sustainable preservation approach.

A professional dialogue with 3rd parties

Organizations nowadays are deliberating whether they need to outsource certain tasks, because they lack the professionalism to perform these tasks themselves. Think of outsourcing storage, sometimes to the "cloud", but also of outsourcing the digitization of collections, web harvesting or the creation of access tools. Whatever the approach, you cannot outsource your responsibility. So it is important to have a clear idea of what the organization wants to achieve (policies!) and to derive from that what to expect from these 3rd parties.

Digital Preservation Research developments

Since 2001 the European Commission has supported research in digital preservation with 94 million euros. Projects like Shaman, Planets, DL.org and SCAPE are some examples of projects where digital preservation policies played a role. One of the areas of research is related to the fact that the sheer amount of digital material will force organizations to introduce automated ways of handling this material: for example, an automated ingest procedure or an automated migration action, including its quality control. This is only possible if there are clear policies, not only at a high level, but also at a very detailed level. This is a goal of the SCAPE project.
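
To give an idea of what a policy "at a very detailed level" could mean in practice, here is a minimal sketch (in Python) of a machine-actionable policy rule that could drive automated quality control at ingest. The rule, field names and thresholds are invented for illustration and are not SCAPE's actual machine-readable policy format; in a real workflow the format and resolution values would come from characterisation tools rather than being typed in.

    # Hypothetical sketch of a machine-actionable, detailed policy rule:
    # "ingested master images must be TIFF with a resolution of at least 300 dpi".
    # Field names and thresholds are invented for illustration; this is not
    # SCAPE's actual machine-readable policy format.

    from dataclasses import dataclass

    @dataclass
    class ImagePolicy:
        allowed_formats: tuple = ("image/tiff",)
        min_resolution_dpi: int = 300

    def check_against_policy(mime_type, resolution_dpi, policy):
        """Return a list of policy violations for one ingested file (empty list = compliant)."""
        violations = []
        if mime_type not in policy.allowed_formats:
            violations.append("format %s is not allowed by the policy" % mime_type)
        if resolution_dpi < policy.min_resolution_dpi:
            violations.append("resolution of %d dpi is below the policy minimum" % resolution_dpi)
        return violations

    # Example: in practice the values would come from a characterisation tool,
    # not be typed in by hand.
    print(check_against_policy("image/jpeg", 240, ImagePolicy()))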

Policies: not just paperwork

To work well, organizational policies should be implemented in workflows and processes, to become part of the "organizational genes" and to be more than a paperwork exercise. This ideal situation, however, will need some more work and research!

This blog post is an abbreviated version of my presentation at the 2nd Liber preservation workshop in Florence on 6-7 May 2012 (for the slides, see http://www.rinascimento-digitale.it/Liber2012_slide/Liber2012_Sierman.pdf).