30 years of digital preservation, issue 3: web archiving

It was 1996 when digital preservation practitioners worldwide took the first steps toward becoming a profession. We, as digital preservationists, can celebrate 30 years of experience in an evolving domain inspired by a tradition of libraries and archives: preserving heritage material and keeping it accessible for the long term.

With the start of the World Wide Web, a new publication medium was created. National libraries, with their legal deposit mandate to collect the published material of their country, needed to handle the implications of this new medium. “Web archiving” was a way to collect this material, the so-called “online resources”, and to avoid gaps in their collections. In the early days the goal was not so much to harvest whole websites as to capture publications and periodicals on the web. Some pioneers started web archiving 30 years ago. They were already familiar with electronic publications on, for example, CD-ROMs, but the web was something new. Time to give credit to the following pioneering libraries.

EPPP project of the National Library of Canada

The National Library of Canada was the first, with its Electronic Publications Pilot Project (EPPP), which ran from 1994 to 1996 and harvested online publications from the internet, with both access and preservation in scope. The project was led by Bill Newman. Along the way the team discovered several issues for which no solution existed at the time. Web archiving required new skills and regulations. Staff of the National Library of Canada, for example, first received “internet training” to become familiar with the WWW phenomenon. Copyright was another hurdle, which the pioneers either postponed or handled manually. One of the conclusions was: “The EPPP found that the lack of appropriate standards to be a major obstacle to the long-term accessibility and preservation of electronic publications.” Times have changed indeed…

Kulturarw3 project of the Royal Library – National Library of Sweden

Starting in 1996, Johan Mannerheim of the Royal Library led a team to create a web archive called Kulturarw3, the 3 referring to the three W’s of the World Wide Web. This was the outcome of a meeting on 6 August 1996, when a group of 12 people gathered to discuss an initiative by Frans Lettenström, former IT coordinator at the Royal Library, who at that time worked for BIBSAM, a consortium of Swedish universities, research institutions and public agencies. He proposed archiving the Swedish web. The group agreed on what was called “the comprehensive scope”: no selection, but harvesting “everything”.

PANDORA of the National Library of Australia

After a small preliminary project in 1995, the National Library of Australia started its PANDORA project (Preserving and Accessing Networked Documentary Resources of Australia) in June 1996. Wendy Smith led the project. The Australian approach was a selective one, and a special Selection Committee on On-line Australian Publications (SCOAP) developed a set of selection guidelines. Publishers were asked for permission before a website was harvested, a very labour-intensive approach.

Brewster Kahle with the Internet Archive

In April 1996, Brewster Kahle (who always calls himself a librarian) wrote an article about the newly founded Internet Archive, which was published in the March 1997 issue of Scientific American. His ambition was even greater than that of the national libraries: “This collection will include all publicly accessible World Wide Web pages, the Gopher hierarchy, the Netnews bulletin board system, and downloadable software.”

Collaboration

All of these pioneers knew that collaboration was essential to make progress in this new field, and they wanted to involve other interested parties as well. During the WWW conference in Brisbane in 1998, the national libraries of Australia, Sweden and the Netherlands initiated PREWEB, an international discussion list and literature collection on the preservation of the World Wide Web. It is still available via the Internet Archive: http://kulturarw3.kb.se/html/preweb.html

© 2026 Barbara Sierman

b-s-i-e-r-m-a-n-@-d-i-g-i-t-a-l-p-r-e-s-e-r-v-a-t-i-o-n-.-n-l