Media, Traditional and Alternative

Archiving Web Resources 2004

Virtual Remote Control, STORS, and Digital Format Repositories

Nancy McGovern from Cornell University Library will begin this session, with some more information about Virtual Remote Control (VRC), which we have already heard something about over the last few days - it will be good to see more on this. (She's also putting in a plug for RLG Diginews, a Research Libraries Group publication she co-edits.) VRC's purpose is in both risk and records management, and it moves from passive monitoring to active capture. It offers lifecycle support from selection to capture, and supports the human curator by providing relevant tools. There are guidelines for increasing Website longevity and promulgating preservation practices, by understanding Web resources and risks.

IIPC Tools and LoC Crawling

The second session this morning once again returns us to the International Internet Preservation Consortium (IIPC). Julien Masanès, IIPC coordinator from the French National Library, will team up with Monica Berko, Director of the Applications Branch at the National Library of Australia. Julien begins by speaking on the IIPC's techniques for deep Web acquisition - the archiving of resources which are deeply hidden within Websites and which often constitute the richest content on the Web (so capturing them is a crucial task for Web archiving). Originally, much of this material was inaccessible to Web crawlers, but smarter tools have now changed this.

Talking Tech

The End of a Conference, the Start of More Challenges

On to the final session now - and in fact the final session of the conference proper (tomorrow is billed as an information day on the various archiving projects). Speaking now is David Seaman from the Digital Library Foundation (DLF); his organisation is involved in a wide range of projects across the many topics and issues raised in the conference.

He notes that 'the chaos isn't slowing down': new and possibly important formats and genres of Web content are constantly arising, but it is difficult to work out which are relevant and likely to last and which aren't. Libraries, at least, may have some degree of expertise in this field and will be able to make some useful guesses if nothing else. There is therefore also an imperative to collaborate; it really is a survival skill for libraries and related organisations, but that doesn't necessarily make it any easier.

Web Archiving and Legal Deposit

Archiving and Recordkeeping

The next session is chaired by Ross Gibbs, the Director-General of the National Archives of Australia. We're now moving into issues that especially concern archives (as opposed to libraries). Hans Jansen from the Royal Dutch Library makes a start. Like many others, the library is charged with preserving all publications by Dutch publishers, but there are no legal deposit requirements in the country, so voluntary agreements with publishers have been made. More recently, of course, the rise of electronic publishing has further complicated the library's activities. Since 1994, it has been involved in developing e-Depot, a deposit system, in partnership with IBM (the system is also commercially available under the name DIAS). It now has a load capacity of some 50,000 articles per day, and contains some 4 million electronic journal articles. (So, the focus here is on archiving deposited materials, not the wider Web as such.)

Repository Collaborations

We're back now for day three of the conference, following the lavish dinner at New Parliament House last night. Robin Dale from the Research Libraries Group begins the day's proceedings, which focus this morning on the topic of collaboration. She points out that in the current environment collaboration is increasingly important, and in such collaborations, the issue of mutual trust, and trust in the content repositories, becomes particularly crucial. How can trust be established, and trustworthiness assessed?

Formats for Archiving?

The last session for today has started. Colin Webb (how appropriate!), Director of the National Library of Australia's Preservation Services, sets the scene, noting that 'preservation' means maintaining the ability to access content. Layers of responsibility include byte stream integrity, byte stream identity, and the preservation of intellectual content for each digital object that is preserved, but also the preservation of original context, current context, and 'significant properties' or essential characteristics.

However, there are some reasons for hope here: the incentive is one of taking steps and building collections, and this has driven some very promising projects already. Also, the preservation problem may break down into some more manageable segments: byte stream protection, means of access, and metadata and systems. Additionally, it is possible to make informed decisions given the limitations of known means of access; we can work on specifics and push towards automation and towards a collaboration beyond research (building networks of capacity).

Making Metadata

We're on to the post-lunch session now, and have had our group photo taken as well. Tom Delsey begins by discussing issues around resource discovery and archived resources. A lot of this is connected to metadata - and importantly, given the sheer size of their Web resource collections, are archives and libraries able to sustain the creation of metadata about collected resources as they have done in an offline context?


