Archiving Web Resources 2004

Of Thematic Harvesting, Virtual Remote Controls, and a Heritrix

And we continue with another session on harvesting approaches for Web content archiving and preservation. We begin with Martha Anderson from the Office of Strategic Initiatives at the Library of Congress. The Library's problems are probably larger than those of most other libraries - there is no clearly defined country domain, and the volume of material is of course significantly higher than in most other countries.

The library therefore takes a thematic approach to its collection - identifying both ongoing themes and time-bounded issues (elections, the 11 September attacks, etc.). The goal of such selection is to save as much as possible with limited resources, and to preserve the context of content as far as this can be achieved. At this point, the LoC is required to seek permission to display and (in the case of event-based harvests) to collect. In doing so, it attempts to leverage its institutional resources to achieve a higher volume of coverage.

Approaches to Archiving

We've now moved into the second day in Canberra; this is kicked off by Abby Smith, Director of the Council on Library and Information Resources in the US, speaking on the future of Web resources. She suggests that the strategies for selection and preservation by libraries will need to be rethought: on the Web, the barriers to the creation of content are now unprecedentedly low, while those to the persistence of information are unusually high. Library approaches so far, by contrast, have been based on a scarcity of archivable material but relatively easy archivability.

The Web is massive in scale, highly dynamic and unstable, and riddled with hardware dependencies. How is it to be dealt with - what to preserve, for how long, and for whom? Cooperation and coordination between collecting institutions here will be difficult, even if it may be desirable; access is always a service for a specific community, and there may be no universal, global needs upon which to build. Cooperation is highly problematic in collecting, and at best it may mean that the aggregation of local collections will enable a solid combination of material in a broad range of fields.

Back to the Big Picture

The next session is chaired by Alex Byrne, the president-elect of the International Federation of Library Associations and Institutions. His role here is to frame the next session which will present some responses from cultural institutions to the problems raised today. IFLA itself is interested in forging partnerships involved in the preservation of networked resources. Let's see how long my battery lasts through his and the following talks.

Jim Michalko, President of the Research Libraries Group, is the first speaker. He notes that what's available on the Web now is the current and future material of research, and that it is necessary to make and keep this material available through preservation and archiving mechanisms. Collective action is the appropriate way forward to achieve this, and he notes that the NLA has taken some very useful steps which should be taken up far more widely than they have been.

Uses of the Web

And we're on to the next session - I'm speaking in this group, so of course I'm only blogging the other two (my PowerPoint for the presentation is already online). Timothy Hart from Museum Victoria is the first speaker; he notes that museums do receive a significant number of Web visitors who engage in some detail with their content. What about preserving their sites, though? While the museum collections are of course valued, the Websites aren't necessarily.

Australian Museums On Line (AMOL) started much of the drive to put museums on the Web (and will soon become Collections Australia Network (CAN)); there are also bodies like Museums Australia which contribute to the process. Early on, there was much experimentation, but a reduction in funding has contributed to a gradual slowdown here.

Mapping on the Internet

Tim Mackey of Geoscience Australia is next, speaking on maps and the impact of the Internet. His organisation has a significant amount of data (some 500 terabytes, growing by 150 TB per year), and aims to make a significant quantity of this material available online. Historically, of course, such spatial data would have been available in hardcopy and updated on a yearly basis, while now it is electronic and Internet-accessible and updated almost daily. Archive and versioning management therefore become crucial issues. Every client of Geoscience Australia, in effect, will receive a different map.

Science and the Web

Up next is Australian Chief Scientist Robin Batterham, speaking on the use of the Web by Australian scientists. The nature of science is changing, not least because of the impact of the Web. The best direction for it isn't quite clear yet.

There is now a significant level of information on the impact of scientific research, and Robin shows what's called Australia's research footprint (measured in terms of output, citations, researchers as percentage of the workforce, etc.), where Australia rates well, but not necessarily at the very top. However, the nature of scientific work has changed towards much larger-scale collaboration across national boundaries, as well as towards a concentration of research around a small number of particularly excellent institutions. The information used to measure such outcomes is itself a result of the information revolution which the Web and other related tools have brought about. Science is also increasingly cross- or transdisciplinary.

Preservation vs. Accessibility in Audiovisual Materials

Paolo Cherchi Usai, the Director of ScreenSound Australia at the National Screen and Sound Archive, begins the next session. He points to what he calls the death of cinema - the move from traditional audiovisual to digital production in screen and sound. Thus, most of the material produced today can be viewed electronically, via Websites. Audiovisual materials have never been this widely accessible before. This raises problems as well as opportunities, however:

  1. Is it true that the Web is going to make accessible more moving images and audiovisual works?
  2. Is the Web going to improve the quality of access?
  3. Is the issue of accessibility going to interfere with a mandate of preserving materials?

The first answer is yes; however, the growth of Web archiving will exacerbate the conflict between the urge to create material and the imperative of legal ownership. Archives are being strangled by legal frameworks here; today, their right to archive is being challenged (or indeed besieged), and openly contested. A possible solution for the future may be in the library rather than archive framework; in libraries, freedom of access has been protected from the implications of copyright and legal ownership.

Challenges for Web Archiving

The keynote speaker this morning is Malcolm Gillies, the DVC (Education) at Australian National University. He provides a brief history of cultural transmission, from remembered oral tradition to the emergence of the written word (which suffered its first tragedy with the demise of the library at Alexandria). Massive duplication through printing made text less vulnerable to loss, and gave information a tangible form. Now, however, digitisation has made information intangible again, as well as flexible and ephemeral.

Forms of communication have multiplied and diversified with the new electronic and digital networks, and only a few of these are being covered by archives and libraries so far. This may constitute an 'archival dark age', Malcolm suggests. (This, indeed, already started with earlier electronic forms - the thermal fax, early digital music recordings, etc.) The Web only intensifies this problem - and there is also a problem of overlap between it and other forms of communication.

Archiving Web Resources

The conference begins with a welcome from Jan Fullerton, the Director-General of the National Library of Australia. She sets the scene by noting the relevance of Web materials as yet another slice of contemporary culture which needs to be archived by national libraries - but of course the archiving of such material is complex and unprecedented, especially because of the significant increase in the volume of material. Therefore, cooperative approaches to archiving are required.

The NLA's PANDORA archive of Web resources is now being recognised as a significant resource by UNESCO, and has been nominated for the UNESCO Memory of the World register. This is a major achievement and points to the importance of what the NLA has already managed to do. Other national libraries around the world are also involved in Web archiving now, of course, often with some very different approaches to the process - this variety is interesting in itself. Now, of course, there is the International Internet Preservation Consortium (IIPC), which aims to make such approaches interoperable.

Peace in Canberra

Well, here I am at the National Library in Canberra - and the first thing that greeted me is a large peace sign mowed into the lawns across Lake Burley Griffin from Parliament House. Nice work.