A new Board for the Core Certification now established

What was called the “basic certification” in the pyramid of standards for certifying your digital repository will now be called “Core Certification”. This renaming is not the only outcome of merging the Data Seal of Approval criteria with the World Data System guidelines, which resulted in the new standard: the Core Trustworthy Data Repositories Requirements.

The new standard covers the same topics as ISO 16363 (at the top of the pyramid): Organizational Infrastructure, Digital Object Management and Technology. There are still 16 requirements to follow, although some new ones have been added, like “Security” and “Expert Guidance”, and others must have been removed, but this is not easy to discover: there is no document comparing the old and the new. The reference to the Open Archival Information System (OAIS), prominent in the former DSA Guidelines, has now been turned into a requirement for “repositories with a preservation remit” and focuses only on “archival storage”. I personally regret this limitation, as in practice every digital repository could and should use the guidance of OAIS.

New colleagues in DP?

Recently one of our newspapers, De Correspondent, published an article by Marian Cousijn about people collecting art ‘that you cannot touch’: born-digital art, like websites. They buy the website from the artist and put it on the web under their own URL (and name), so that everyone knows it belongs to their art collection and that they are the owner of this piece of art. One of them is the Swedish artist Hampus Lindwall; see for example his collection. What I found interesting is that they are not only collecting, but also see it as their responsibility to take care of the continued accessibility of their collection. This goes beyond paying the hosting costs: Lindwall had his artwork adapted so that it can be shown on smartphones and tablets. Maintenance is not a problem in his view: “a programmer from China will fix it for you for 50 dollars”. Uhh?

What a different point of view! Instinctively I would scream: What about authenticity? What about your original look and feel? But on the other hand: these are interesting developments in personal archiving. We can learn from these people. What do they think is important to preserve? And we can assist them with our knowledge. At least in the Netherlands, preservation of born-digital art is in its infancy, as a recent report (sorry, only in Dutch, but Google Translate and Bing Translator might help) shows; see http://ncdd.nl/site/wp-content/uploads/2014/06/Born_Digital_erfgoed_is_bedreigd_erfgoed.pdf

By the way, the maintenance is not always done correctly. One piece of art simply shows an error message:

[Screenshot of the error message]

And the collector’s own website http://www.hampuslindwall.com/ is temporarily out of order…

 

“Without archiving there is no scholarly record”


“Without archiving there is no scholarly record”, said Herbert van de Sompel at the OCLC/DANS Evolving Scholarly Record and Stewardship Ecosystem Workshop in Amsterdam yesterday. Unfortunately I could only attend the morning session and did not participate in the discussions in the afternoon, but the presentations by Ricky Erway, Natasa Milić-Frayling and Herbert van de Sompel gave enough food for thought.

Ricky Erway, Senior Program Officer at OCLC Research, introduced their recent publication in which the authors attempt to create a framework for the scholarly record that could be used as a common reference, just as the OAIS model is in the preservation community. It is not about the scholarly process but about the end product, about “the stuff” as Ricky called it, or the “Outcomes” as it is called in the diagram below. The authors envisage that this way of presenting the scholarly record, including the process and the aftermath, will better represent the collecting and preserving context of the scholarly record. Various stakeholders play their role in what is called the “stakeholder ecosystem”, which can lead to different perspectives on the scholarly record. Roles include “create” (the author), “fix” (publishers), “collect” (for example libraries) and “use” (researchers, the public, etc.). The model does not define the scholarly record itself, so as a library collecting this material you still need to decide what to collect. This was also a topic in the DRIVER project, in which we (pragmatically) gave the author a major role in deciding what should be part of the scholarly record, or of the “enhanced publication”.

Natasa Milić-Frayling, principal researcher at Microsoft Research in Cambridge, UK, started with the statement that “digital is different”, and that this difference [from analogue] will require another approach if we want to preserve things. She sees a big role for virtualization, as Microsoft has already demonstrated for older operating systems: migrating files is no longer necessary if you keep the original and run it on a virtualization platform, as long as you know how to handle the old software tools. She invited the preservation community to discuss with the software industry how to keep older software alive in a virtualized environment, for example via UNESCO as an independent party.

To support researchers well, Microsoft Research tries to analyse scientific practices in order to build supporting software tools. Experience in a Cambridge research environment shows that this is not an easy task, as the same material often needs to be presented at different granularities and in a different order, depending on the viewer and the stage of the research process. Linking documents within a computer environment (files) as well as outside it (hyperlinks) will require a different information architecture than is currently available.

Herbert van de Sompel, scientist at Los Alamos National Laboratory, gave his vision of how scholarship is changing, taking as a basis the essential functions of scholarly communication as described by H.E. Roosendaal & P.A.Th.M. Geurts. As his talk was, in his own words, a slightly updated version of a previous one, you can read about it here.

As the boundaries of the “scholarly record” are changing rapidly, it is important to think about the implications for libraries, archives and data centres. We might make mistakes in deciding what to preserve and what not, but at least we should record our coherent view on what we want to preserve.

Save our preservation tool kit!

Jan Luyken Tea and coffy tool kit. Courtesy Rijksmuseum, Netherlands

The recent US government shutdown should make all people involved in digital preservation think, if not worry. I gave some feedback on Simon Tanner’s blog post, but the weekend helped me ponder this topic a bit more.

We have always said that digital preservation is an international activity, and we act accordingly, through international collaboration in various areas. Sometimes one organisation starts a very good initiative and we all like to make use of the results: PRONOM (TNA), FITS (Harvard), JHOVE, OPF, NDSA, DCC, DPC, and PREMIS at the Library of Congress. Oops… due to the US government shutdown, that last one was no longer available via its well-known URL. Although the LoC website has recently been restored (5 October 2013), many more sites are still unavailable, such as data.gov. So we can say that the digital preservation community is affected by the US government shutdown, and not only because we cannot have our regular meetings with the preservation people at the Library of Congress.

Have we been naïve as digital preservationists? This is not the first time the US government has shut down; it also happened in 1995 and 1996, and before that. But at that time the web had less influence on our daily activities and we were less dependent on it. Things have changed and we work with the web all day. We might have been a little naïve in expecting things to be there, while our daily job is based on the expectation that things will not always be there. We try to save things, but we don’t have a rescue plan for the information we depend on in our own processing activities. We need registries at ingest and during transformations, and reference works when doing risk assessments of file formats and new object types. Yet we don’t have an overview of these vital sources that together make up our digital preservation toolkit: standards, registries, software, reference works, etc. Everything that is only accessible from one place is in principle in danger – the same rule we apply to our preserved digital collections. What happened in the US can also happen elsewhere.

I would suggest creating an overview of our digital preservation toolkit, the vital elements without which we cannot work, and making sure that their availability is no longer restricted to one place: they should be present on various mirror sites. This really requires collaboration, but we already have the infrastructure for that; it is now important to make use of it. We should see this US government shutdown as a wake-up call. It is not too late yet.
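Purely as an illustration of the kind of check such an overview could drive – the short list of URLs below is my own example, not an authoritative inventory – a few lines of Python are enough to report whether some of these vital sources are still reachable:

import urllib.request
import urllib.error

# An illustrative, incomplete list of vital sources; a real inventory
# would be community-maintained and far longer.
TOOLKIT = {
    "PRONOM": "https://www.nationalarchives.gov.uk/PRONOM/",
    "PREMIS": "https://www.loc.gov/standards/premis/",
    "data.gov": "https://www.data.gov/",
}

def is_reachable(url, timeout=10):
    """Return True if the URL answers with an HTTP status below 400."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status < 400
    except (urllib.error.URLError, OSError):
        return False

for name, url in TOOLKIT.items():
    print(f"{name}: {'reachable' if is_reachable(url) else 'NOT reachable'} ({url})")

Run regularly, and from a few different places, even something this simple would have flagged the PREMIS outage the moment it happened.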

A deadly sin

At last week’s iPRES2013 conference in Lisbon, Tessella gave a talk about an experiment on the migration of WARC files, called Studies on the scalability of web preservation. One remark in the talk caused some commotion: the presenter suggested adapting the WARC file and deviating from the standard. Why? Because – as we were told – the current version of the Wayback Machine software, which renders the WARC file format, is not optimal for rendering WARC files with conversion records. But tweaking the format of the Archival Information Package and storing it like that for the long term is not the way we should go. We preserve information for the long term. Our future custodians will not understand this (unless they are told so via metadata, and even then) and will assume that if they see a WARC format, all the rules of the standard have been followed. Deviating from this is wrong; in fact it is almost a deadly sin.
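For readers who have not come across them: conversion records are the WARC standard’s own mechanism for storing a migrated version of a harvested resource alongside the original record. Roughly, a conversion record header looks like the illustration below (the identifiers, URI and length are hypothetical; the WARC-Refers-To field points to the record the content was converted from, and the record body carries the converted payload itself):

WARC/1.0
WARC-Type: conversion
WARC-Date: 2013-09-06T10:15:00Z
WARC-Record-ID: <urn:uuid:00000000-0000-0000-0000-000000000002>
WARC-Refers-To: <urn:uuid:00000000-0000-0000-0000-000000000001>
WARC-Target-URI: http://www.example.com/image.gif
Content-Type: image/png
Content-Length: 3052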

After reading the corresponding publication (the conference papers are published by the Portuguese National Library as a free e-book), I saw that things were less straightforward. The approach Tessella chose was to create two WARC files: a correct WARC according to the standard, and an adapted WARC for access. From the article:

This required the development of two different workflows for creating migrated WARC files: one, which is formally correct according to the WARC standard, and maintains the integrity of the WARC schema, and a second which is more pragmatic, and produces a file that can be displayed correctly by current WARC viewers. This pragmatic workflow can also be used for the migration of container formats which do not support conversion records, such as ARC files.

So what should one do when an ISO standard does not meet one’s requirements? In this case the WARC standard is maintained by the BnF, which is easy to find out if one looks up the standard itself; this is explicitly mentioned online so that people can get in touch. Another approach is to look for the parties with a stake in the Wayback Machine software, which, as everyone involved in web archiving knows, is the Internet Archive. And there is the IIPC, the International Internet Preservation Consortium, which is currently setting up a developers’ working group to improve the Wayback Machine software. So if you have problems with a standard, think about the millions of precious digital objects that need to be preserved in that format and get in touch with the community. But don’t tweak the format itself!

Sustainability is more than saving the bits


The subject of the JISC/SCA report Sustaining our Digital Future: Institutional Strategies for Digital Content, by Nancy L. Maron, Jason Yun and Sarah Pickle (2013), is the sustainability of digitised collections in general, illustrated with the experiences of three different organisations: University College London, the Imperial War Museum and the National Library of Wales. I was especially interested in the fact that the report mentions digital preservation, but not as a goal in itself (“saving the bits”). Instead, the authors broaden the scope of digital preservation to activities that go beyond bit preservation, or even beyond “functional preservation”.

Nowadays many digitisation projects are undertaken and interesting material comes to life for a large audience, often with a fancy website, a press release, a blog (and a big investment), immediately attracting an interested public. But the problematic phase starts when the project is finished. In organisations like universities, with a variety of digitisation projects, a lack of central coordination can cause project results to “disappear”, simply because hardly anyone knows about them. We all know these stories, and this report describes how these three organisations try to avoid that risk.

Internal coordination seems to be a key factor in this process. One organisation integrated more than a hundred databases into a central catalogue; another drew together several teaching collections. Both efforts resulted in better visibility of the collections. But this is not enough to achieve permanent (long-term) access. The data will be stored safely, but who is taking care of all the related products that support the visibility of the data? In other words (digital preservation jargon), who is monitoring the Designated Community and its changing environment?

The report describes interesting activities. Take for example this one: the intended public needs to be reminded constantly of the existence of the digitised material through promotional actions, otherwise the collections will not be used at all. Who plans this activity as part of digital preservation? That we need to respond to a changing environment sounds familiar, but there are more than technical reasons to do so: websites need to be redesigned to stay attractive and to adapt to changing user expectations. And who is monitoring whether there might be a new group of interested visitors?

Or, as Lyn Lewis Dafgis of the National Library of Wales said, there is an assumption that

once digitised, the content is sustainable just by virtue of living in the digital asset management system and by living in the central catalogue.

And this needs to change.

Digital preservation is often seen as something that deals with access to digital collections at some point in the future. Permanent access, which is the goal of digital preservation, is then regarded as solved by “bit preservation” and, if you do a really good job, “functional preservation”. This report illustrates with some good examples what more needs to be done, and gives colour to the not always well understood OAIS phrase “monitoring the Designated Community”.

Peeking over the wall

Braga, Biscainhos Museum

Recently a very interesting report was published in the DPC Technology Watch Report series: Digital Forensics and Preservation, by Jeremy Leighton John of the British Library. I knew the phrase “digital forensics” and its potential for digital preservation, especially in the archival community. This report shows clearly, with practical examples, what digital preservationists can learn from digital forensics. One might think that, as this is often related to personal archives, it is of no interest to organisations that don’t have personal archives in their collections. But this report shows that there is enough common ground to raise their interest.

The digital forensics process has some starting points that are very similar to what the digital preservation community refers to as “authenticity” and “the original”:

1. Acquire the evidence without altering or damaging the original;
2. Establish and demonstrate that the examined evidence is the same as that which was originally obtained; and
3. Analyse the evidence in an accountable and repeatable fashion (p. 14).

The fact that forensic practice has close links with legal authorities forces practitioners to work to criteria that are not always present in the environments of cultural heritage organisations. This may make the tools and approaches used in digital forensics more rigorous and reliable.
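The second principle in particular maps directly onto what we in digital preservation call fixity checking. As a minimal sketch of my own (not taken from the report, and with a hypothetical file name and hash value), one could record a cryptographic hash of a disk image at acquisition and verify it again after every handling step:

import hashlib

def sha256_of(path, chunk_size=1 << 20):
    """Compute the SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

# Hash recorded when the disk image was first acquired (hypothetical value).
recorded_at_acquisition = "9f86d081884c7d659a2feaa0c55ad015a3bf4f1b2b0b822cd15d6c15b0f00a08"

# Re-compute after examination and compare, demonstrating the evidence is unchanged.
current = sha256_of("donor_laptop.dd")
print("fixity intact" if current == recorded_at_acquisition else "fixity FAILED")

Forensic imaging tools typically record exactly this kind of digest in their case files; preservation systems record the same values as fixity metadata at ingest.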


As noted in the report (p. 17), a distinction between digital forensics and digital preservation is that the latter aims to keep material accessible over time and for many different users, while digital forensics often focuses on one specific goal: the court case. Of course this influences the methodology used in digital forensics, referred to in this report as the “lifecycle”, but the similarities in approach between digital preservation and digital forensics are striking (p. 21), especially in the steps that prepare the material before “archival storage”, i.e. the ingest and pre-ingest steps. As digital forensics is often confronted with handhelds, smartphones, tablets, etc. – a relatively new category for libraries – the methods and insights they have developed could be of tremendous help to libraries, especially those with personal collections.

The strength of this report is its practical approach, with references to and descriptions of (open source) tools used in the digital forensics community. A wide range of examples underpinning the case for digital forensics is described, and I had frequent “aha” moments: recognition of similar challenges and areas of interest, such as cloud computing, large scale, emulation, privacy, and the need for test environments with reliable corpora (always difficult for libraries because of copyright). The report summarises a long list of conclusions (one of them the “inertia” of libraries and archives in preserving personal archives, though maybe we can extend that to a hesitation to preserve offline digital material in general) and finishes with a set of Recommended Actions, whose overarching theme, I conclude, is collaboration.

Collaboration is most beneficial when the parties involved are aware of their needs and of what they want to achieve. I think that although we have talked a lot about digital preservation, and much is still unclear, we have a set of starting points that will support us in collaboration. The OAIS model still offers a very clear and understandable set of coherent concepts, and for those who need a more practical explanation, the series of audit materials like DSA, DIN and RAC can support them. In a way digital preservation has grown up and is able to look around in other, less obviously adjacent disciplines. People interested in emulation have learned a lot from the open source community that rescued games (EU project KEEP – website no longer available). Data visualisation, as mentioned in a blog post at The Signal, could help us identify patterns in collections and perhaps identify risks, if applied in a clever way. Human-Computer Interaction (HCI) was mentioned by Luciana Duranti as a discipline involved in her research into Records in the Cloud.

Sometimes people wonder where all the investments in digital preservation (research) have brought us so far; there seems to be no end to the challenges posed by rapid technological change. But I like the view that seems to be emerging: there are rich opportunities for collaboration between (established) disciplines. Peeking over the wall around your own garden into the neighbour’s courtyard can offer some interesting views. Picking up some of the seeds could make your own border a stunning one!

 

OAIS explained in Dutch

For everyone interested in an article in Dutch explaining the main topics of the OAIS model: please read my recently published article
Het OAIS-model, een leidraad voor duurzame toegankelijkheid (The OAIS model: a guideline for long-term accessibility).
The article was published in the Handboek Informatiewetenschap, December 2012, part IV B 690-1, pp. 1-27. It will also appear in the IWA database, at www.iwabase.nl, and later I’ll add it to this blog. Happy reading!

The Atlas continued

Thanks for the positive feedback on my blog post proposing an Atlas of Digital Damages. The upcoming iPRES conference in Toronto might be a good opportunity to exchange ideas and maybe to initiate something in (international) collaboration. I’m happy to discuss this.

One possible place to put this information could be the File Format Issue Registry that will be developed in the European SCAPE project (of which I am a member). Other ideas are welcome; maybe people from the NDSA are also interested.

In the meantime, input is needed, so all your horror stories with scary pictures are welcome! If you have pictures of damaged MS Word files, PDF files that look odd, or Excel files with missing information, and you as a preservation expert know the cause, send me a link to this information. This will help make digital preservation “visible”.

Original look not appreciated

While it is a primary goal in digital preservation to preserve the significant properties of the original digital object over the years, I recently realised that it could also be beneficial not to show the original properties on access, but to present the reader with a version that uses a contemporary layout (although not necessarily in the “Comic Sans” font).

I was reading, on my tablet, a bundle of columns by a deceased Dutch writer who was very famous in the sixties but is now seen as a bit old-fashioned. The columns had been digitised and OCR-ed, and all the original layout was gone; even worse, the header and footer were placed in the middle of the text! Obviously this was not professionally digitised!

But anyhow, while reading this, I totally lost the sense of the period in which these columns were originally created. In the analogue version, the book cover and the typeface were typical of the sixties, the paper and the layout were typical of that period, and so on. Now that this “original look and feel” was gone, I appreciated the content of the columns better, leaving all my prejudices behind me. I was even able to compare the columns with those of another, contemporary author as if they were written under equal circumstances! And I appreciated this voice from 50 years ago as if he were a contemporary fellow.

The Dutch language has a drawback in that its spelling standards have changed several times over the years, so spelling still hints at the time of creation. Other languages, like English, seem to have changed less over the years. For that reason, so they say, English readers can still read Shakespeare today, while in the Netherlands it is very difficult for an inexperienced reader to understand an 18th-century text.

Libraries are digitising their collections on a large scale, and for copyright reasons this is mostly older material. Quite often they create a copy of the original version, OCR the text to make it searchable, and present readers with a PDF file with the original look and feel. But not showing the original look and feel, and instead just presenting the plain text, might attract more readers. (New tools might even update the original spelling to a modern version!)

Wouldn’t this contribute to a greater appreciation of old texts? And to more use of the digitised materials?

It would be nice if all those millions of digitised items were beneficial not only to researchers, but also to the general public, helping them gain a better understanding of their heritage.

In the meantime, the digital preservation people will keep studying how to preserve the original characteristics – for the other group of people, those interested in the original look and feel.