
Snurb's blog

Web Archiving and Legal Deposit

Some very interesting discussions over lunch, especially with Ian Oi from Blake Dawson Waldron (lawyers to the National Library of Australia as well as key collaborators with QUT on the translation of the Creative Commons framework into the Australian legal context). Talking to Paul Koerbin from the NLA also reminded me that our recent changes to the server setup of M/C - Media and Culture (which now uses its three sub-domains more effectively) may mean that PANDORA's archiving of the site no longer works as well as it has so far - I'll have to check back with the NLA to make sure there are no longer-term problems.

Archiving and Recordkeeping

The next session is chaired by Ross Gibbs, the Director-General of the National Archives of Australia. We're now moving into issues that relate especially to archives (as opposed to libraries). Hans Jansen from the Royal Dutch Library makes a start. Like many others, the library is charged with preserving all publications by Dutch publishers, but there are no legal deposit requirements in the country, so voluntary agreements with publishers have been made. More recently, of course, the rise of electronic publishing has further complicated the library's activities. Since 1994, it has been involved in developing e-Depot, a deposit system, in partnership with IBM (the system is also commercially available under the name DIAS). The system now has a load capacity of some 50,000 articles per day, and contains some 4 million electronic journal articles. (So, the focus here is on archiving deposited materials, not the wider Web as such.)

Repository Collaborations

Some Dinner Venue!

We're back now for day three of the conference, following the lavish dinner at New Parliament House last night. Robin Dale from the Research Libraries Group begins the day's proceedings, which focus this morning on the topic of collaboration. She points out that in the current environment collaboration is increasingly important, and in such collaborations, the issue of mutual trust, and trust in the content repositories, becomes particularly crucial. How can trust be established, and trustworthiness assessed?

Formats for Archiving?

The last session for today has started. Colin Webb (how appropriate!), Director of the National Library of Australia's Preservation Services, sets the scene, noting that 'preservation' means maintaining the ability to access content. Layers of responsibility include byte stream integrity, byte stream identity, and the preservation of intellectual content for each digital object that is preserved, but also the preservation of original context, current context, and 'significant properties' or essential characteristics.

However, there are some reasons for hope here: the incentive is one of taking steps and building collections, and this has driven some very promising projects already. Also, the preservation problem may break down into some more manageable segments: byte stream protection, means of access, and metadata and systems. Additionally, it is possible to make informed decisions given the limitations of known means of access; we can work on specifics and push towards automation and towards a collaboration beyond research (building networks of capacity).

Making Metadata

Nice Shot?

We're on to the post-lunch session now, and had our group photo taken as well. Tom Delsey begins by discussing issues around resource discovery and archived resources. A lot of this is connected to metadata - and importantly, given the sheer size of their Web resource collections, are archives and libraries able to sustain the creation of metadata about collected resources as they have done in an offline context?

Of Thematic Harvesting, Virtual Remote Controls, and a Heritrix

And we continue with another session on harvesting approaches for Web content archiving and preservation. We begin with Martha Anderson from the Office of Strategic Initiatives at the Library of Congress. Its problems are probably larger than those of most other libraries - there is no clearly defined country domain, and the volume of material is of course significantly higher than in most other countries.

The library therefore takes a thematic approach to its collection - identifying both ongoing themes and time-bounded issues (elections, the 11 September attacks, etc.). The goal of such selection is to save as much as possible with limited resources, and to preserve the context of content as far as this can be achieved. At this point, the LoC is required to seek permission to display and (in the case of event-based harvests) to collect. In doing so, it attempts to leverage its institutional resources to achieve a higher volume of coverage.

Approaches to Archiving

We've now moved into the second day in Canberra; this is kicked off by Abby Smith, Director of the Council on Library and Information Resources in the US, speaking on the future of Web resources. She suggests that the strategies for selection and preservation by libraries will need to be rethought: here, the barriers to the creation of content are now unprecedentedly low, while those to the persistence of information are unusually high. Library approaches so far, by contrast, have been based on a scarcity of archivable material, but relatively easy archivability.

The Web is massive in scale, highly dynamic and unstable, and riddled with hardware dependencies. How is it to be dealt with - what to preserve, for how long, and for whom? Cooperation and coordination between collecting institutions here will be difficult, even if it may be desirable; access is always a service for a specific community, and there may be no universal, global needs upon which to build. Cooperation is highly problematic in collecting, and at best it may mean that the aggregation of local collections will enable a solid combination of material in a broad range of fields.

Back to the Big Picture

The next session is chaired by Alex Byrne, the president-elect of the International Federation of Library Associations and Institutions. His role here is to frame the next session, which will present some responses from cultural institutions to the problems raised today. IFLA itself is interested in forging partnerships for the preservation of networked resources. Let's see how long my battery lasts through his and the following talks.

Jim Michalko, President of the Research Libraries Group, is the first speaker. He notes that what's available on the Web now are the current and future materials of research, and it is necessary to make and keep this material available through preservation and archiving mechanisms. Collective action is the appropriate way forward to achieve this, and he notes that the NLA has taken some very useful steps which should be taken up far more widely than they have.

Uses of the Web

And we're on to the next session - I'm speaking in this group, so of course I'm only blogging the other two (my PowerPoint for the presentation is already online). Timothy Hart from Museum Victoria is the first speaker; he notes that museums do receive a significant number of Web visitors who engage in some detail with their content. What about preserving their sites, though? While the museum collections are of course valued, the Websites aren't necessarily.

Australian Museums On Line (AMOL) started much of the drive to put museums on the Web (and will soon become Collections Australia Network (CAN)); there are also bodies like Museums Australia which contribute to the process. Early on, there was much experimentation, but a reduction in funding has contributed to a gradual slowdown here.

Mapping on the Internet

Tim Mackey of Geoscience Australia is next, speaking on maps and the impact of the Internet. His organisation has a significant amount of data (some 500 terabytes, growing by 150 TB per year), and aims to make a significant quantity of this material available online. Historically, of course, such spatial data would have been available in hardcopy and updated on a yearly basis, while now it is electronic and Internet-accessible and updated almost daily. Archive and versioning management therefore become crucial issues. Every client of Geoscience Australia, in effect, will receive a different map.
