
Archiving Web Resources 2004

Media, Traditional and Alternative

Spent most of the ANZAC Day public holiday on Monday working on a paper for the 2005 Wiki Symposium in San Diego. My colleague, the soon-to-be Dr Sal Humphreys, has done much of the legwork for this paper, which we'll be submitting before Friday; it details the use of wikis in my New Media Technologies unit at QUT and discusses the overall frameworks for using wikis in teaching. I'll post it here once it's done.

Virtual Remote Control, STORS, and Digital Format Repositories

Moving on now to the first of two post-lunch sessions - because I have a plane to catch later in the afternoon, though, this will be my last one for what has been a truly exciting conference. It's been great being able to cover the proceedings, and of course I should point out that all errors or mistakes here are mine and not the presenters'. At least this conference had only one track, so I was able to get to everything without missing any other papers being given simultaneously.

Nancy McGovern from Cornell University Library will begin this session, with some more information about Virtual Remote Control (VRC), which we've already heard something about over the last few days - it will be good to see more on this. (She's also putting in a plug for RLG Diginews, a Research Libraries Group publication she co-edits.) VRC's purpose is in both risk and records management, and it moves from passive monitoring to active capture. It offers lifecycle support from selection to capture, and supports the human curator by providing relevant tools. There are guidelines for increasing Website longevity and promulgating preservation practices, based on an understanding of Web resources and risks.

IIPC Tools and LoC Crawling

The second session this morning once again returns us to the International Internet Preservation Consortium (IIPC). Julien Masanès, IIPC coordinator from the French National Library, will team up with Monica Berko, Director of the Applications Branch at the National Library of Australia. Julien begins by speaking on the IIPC's techniques for deep Web acquisition - the archiving of resources which are deeply hidden within Websites and which often constitute the richest content on the Web (and therefore form a crucial task for Web archiving). Originally, much of this material was inaccessible to Web crawlers, but smarter tools have now changed this.

Talking Tech

While the major part of this conference finished yesterday, we've still got another day to go. Billed as the 'information day', today will cover many of the technologies and projects which have been mentioned over the last few days. I'll try and take in as much of this as I can, but I do have to run off to the airport by 4 p.m.; this means I will miss some of the talks on what's happening at the National Library of Australia which very humbly have been placed last on the programme. Turnout today is somewhat smaller than the 200 or so delegates over the last few days, but still very good - I also have a feeling we'll be suffering from acronym overload by the end of the day…

The End of a Conference, the Start of More Challenges

On to the final session now - and in fact the final session of the conference proper (tomorrow is billed as an information day on the various archiving projects). Speaking now is David Seaman from the Digital Library Foundation (DLF); his organisation is involved in a wide range of projects across the many topics and issues raised in the conference.

He notes that 'the chaos isn't slowing down': new and possibly important formats and genres of Web content are constantly arising (but it is difficult to work out what is relevant and likely to continue further and what isn't). Libraries, at least, may have some degree of expertise in this field and will be able to make some useful guesses if nothing else. There is therefore also an imperative to collaborate: it really is a survival skill for libraries and related organisations, but that doesn't necessarily make it any easier.

Web Archiving and Legal Deposit

Some very interesting discussions over lunch, especially with Ian Oi from Blake Dawson Waldron (lawyers to the National Library of Australia as well as key collaborators with QUT on the translation of the Creative Commons framework into the Australian legal context). Talking to Paul Koerbin from the NLA also reminded me that the recent changes we've made to the server setup of M/C - Media and Culture (now using the three sub-domains more effectively) may mean that the archiving of the site by PANDORA, as it's happened so far, may not work so well any more - I'll have to check back with the NLA to make sure there are no longer-term problems.

Archiving and Recordkeeping

The next session is chaired by Ross Gibbs, the Director-General of the National Archives of Australia. We're now moving into issues that particularly concern archives (as opposed to libraries). Hans Jansen from the Royal Dutch Library makes a start. Like many others, the library is charged with preserving all publications by Dutch publishers, but there are no legal deposit requirements in the country, so voluntary agreements with publishers have been made. More recently, of course, the rise of electronic publishing has further complicated the library's activities. Since 1994, it has been involved in developing e-Depot, a deposit system, in partnership with IBM (the system is also commercially available under the name DIAS). It now has a load capacity of some 50,000 articles per day, and contains some 4 million electronic journal articles. (So, the focus here is on archiving deposited materials, not the wider Web as such.)

Repository Collaborations

Some Dinner Venue!

We're back now for day three of the conference, following the lavish dinner at New Parliament House last night. Robin Dale from the Research Libraries Group begins the day's proceedings, which focus this morning on the topic of collaboration. She points out that in the current environment collaboration is increasingly important, and in such collaborations, the issue of mutual trust, and trust in the content repositories, becomes particularly crucial. How can trust be established, and trustworthiness assessed?

Formats for Archiving?

The last session for today has started. Colin Webb (how appropriate!), Director of the National Library of Australia's Preservation Services, sets the scene, noting that 'preservation' means maintaining the ability to access content. Layers of responsibility include byte stream integrity, byte stream identity, and the preservation of intellectual content for each digital object that is preserved, but also the preservation of original context, current context, and 'significant properties' or essential characteristics.

However, there are some reasons for hope here: the incentive is one of taking steps and building collections, and this has driven some very promising projects already. Also, the preservation problem may break down into some more manageable segments: byte stream protection, means of access, and metadata and systems. Additionally, it is possible to make informed decisions given the limitations of known means of access; we can work on specifics and push towards automation and towards a collaboration beyond research (building networks of capacity).

Making Metadata

Nice Shot?

We're on to the post-lunch session, now, and had our group photo taken as well. Tom Delsey begins by discussing issues around resource discovery and archived resources. A lot of this is connected to metadata - and importantly, given the sheer size of their Web resource collections, are archives and libraries able to sustain the creation of metadata about collected resources as they have done it in an offline context?

