Making Metadata

Nice Shot?

We're on to the post-lunch session now, and have had our group photo taken as well. Tom Delsey begins by discussing issues around resource discovery and archived resources. Much of this is connected to metadata - and importantly, given the sheer size of their Web resource collections, can archives and libraries sustain the creation of metadata about collected resources as they have done in an offline context?

This really is a question of two interfaces: between users and metadata, and between metadata and resources. The same applies in a non-digital environment, of course, but there the system is essentially closed, with collecting institutions setting the parameters for the metadata system. There is a two-step process: from the user into the metadata, and from the metadata into the resources. Online, however, these interfaces are usually not controlled by the collecting institution (but instead by ISPs, for example), and indeed users are able to access resources directly without resorting to the metadata at all.

Resources are highly fluid in an online context, of course, and metadata descriptions need to deal with this. The sources of content are extremely varied as well, and (to the extent that they apply metadata themselves) may do so very differently. The metadata applied by collecting institutions also need to be kept up to date as the live sites change. Further, the use and interpretation of metadata may vary from case to case. Ultimately, there is a significant question around authentication and verification both of and through metadata (the application of metadata by a collecting institution may be seen as authenticating the resource to which it is applied).

Some issues for metadata, then: on the resource side, resolution (there need to be persistent identifiers for identified resources) and access control (rights management information needs to be included in the metadata set); on the user side, search queries (who is the target audience, and how do users discover resources? - metadata structures need to be formulated for queriability), context (the field of research and related resources should be discoverable through metadata comparison across larger collections of resources) and the need to support future research interests (such as longitudinal studies of Web content and other forms of researching the Web).

On the side of the user interface, there are issues around the retrieval software used (from network search engines to locally available software), the metadata schema in place (what standards and protocols are used?) and content descriptors. Ultimately, what needs to be examined is what other sources of metadata may already be available (about resources themselves, their preservation, and the digital rights attached), what systems might maximise exposure for a collection, and how retrieval potential can be exploited (e.g. through software enhancements); additionally, how can the collecting institution add value to the content it gathers (again, the idea of archiving as adding authenticity to the content that is gathered, and of preserving the wider context of an archived resource)?

Next up is the Librarian and Archivist of Canada, Ian Wilson, who heads a relatively new institution, Library and Archives Canada. Speaking especially from an archivist's perspective, he stresses the importance of maintaining the integrity and context of archives - 'the past is a holistic place', and so it is important to avoid separating content (usually done for reasons of more effective preservation) where this is possible. The business of archives, he suggests, is 'shaping memory', and sometimes archivists are very much alone in the decisions they make.

Library and Archives Canada considers itself a new knowledge institution (and neither of the two terms quite captures what it aims to be). It is an amalgamation of the two previously separate institutions, attempting to harness the synergy which this makes possible. What the project aims to do is to reconstitute the memory of a nation, of a people, through its operations; it already houses a significant collection of collections (including the emails of the outgoing prime minister!). The new Digital Collections Catalytic Initiative will include both inherently digital content and digitised, previously analogue material; the Metadata Catalytic Initiative is to harness harmonised metadata and on this basis seamlessly deliver content across all forms (printed, audiovisual, electronic - within the Library and Archives and beyond this to other collections).

The third speaker is Brian Lavoie from the Office of Research at the Online Computer Library Center. He divides metadata into three forms: descriptive, structural, and administrative. Preservation metadata, then, as a subset of all metadata, directly support the archiving and preservation process (and also divide into the same three groups). Such metadata cover provenance, authenticity, preservation activity, technical environment, and rights management. Preservation metadata need to document the technology used to access digital objects, any changes made in the archiving process, and what intellectual property rights apply to digital objects - preservation metadata, then, make digital objects self-documenting over longer time periods.
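To make Lavoie's three-way division concrete, here is a minimal sketch of what a preservation metadata record along these lines might look like - the field names and values are my own illustration, not a formal schema such as PREMIS:

```python
# A hypothetical preservation metadata record, illustrating the three-way
# division into descriptive, structural, and administrative metadata.
# All field names and values are invented for illustration only.
record = {
    "descriptive": {          # what the object is and is about
        "title": "Example archived web page",
        "creator": "example.org",
    },
    "structural": {           # how the object's parts fit together
        "files": ["index.html", "style.css", "logo.png"],
        "entry_point": "index.html",
    },
    "administrative": {       # preservation-oriented information
        "provenance": "Harvested 2004-06-15 by the archive's crawler",
        "technical_environment": "Renderable with an HTML 4.01 browser",
        "rights": "Archived under legal deposit; copyright example.org",
    },
}

def is_self_documenting(rec):
    """Check that the record covers the preservation concerns listed above:
    provenance, technical environment, and rights management."""
    admin = rec.get("administrative", {})
    return all(k in admin for k in ("provenance", "technical_environment", "rights"))

print(is_self_documenting(record))  # prints True
```

The point of the check at the end is the 'self-documenting' idea: an archived object whose administrative metadata answers these questions can still be interpreted long after its original context has disappeared.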

The OCLC (with other partners) has begun to develop a framework for preservation metadata. Further, it has now moved on to PREMIS: the development of preservation metadata implementation strategies, which both identifies a core set of key metadata elements and develops strategies for the implementation of such metadata elements in a wide range of existing digital content repositories. (This project is expected to conclude in December 2004.) There is therefore a focus both on internal guidance within, and on interoperability between, different stakeholders in this process.

Issues for this project include the question of what tools are available for the creation and application of metadata (ideally without work-intensive human mediation) - examples include the JSTOR / Harvard Object Validation Environment and the NLNZ Metadata Extraction Tool. There is also the economics of metadata: a need to develop economical ways of acquiring and maintaining preservation metadata (such as the PRONOM File Format Registry or RLG's Automatic Exposure), including the sharing and re-use of metadata and the capture of metadata at the most appropriate point in the information lifecycle (rather than after the fact, upon archiving). Finally, there is the packaging of metadata with the digital objects they refer to - relevant ideas here are the OAIS Information Package and the Metadata Encoding and Transmission Standard (METS).
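The idea of packaging metadata together with the digital objects they describe can be sketched roughly as follows - this is my own simplification of the Information Package concept, not the actual OAIS or METS specification:

```python
import hashlib
import io
import json
import zipfile

def make_information_package(files: dict, metadata: dict) -> bytes:
    """Bundle content files and their metadata into one self-describing
    zip archive - a rough analogue of an OAIS Information Package,
    not an implementation of the OAIS or METS standards themselves."""
    # Record a checksum for each file, so the package documents its own
    # integrity alongside whatever preservation metadata it carries.
    metadata = dict(metadata)
    metadata["fixity"] = {
        name: hashlib.sha256(data).hexdigest() for name, data in files.items()
    }
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        for name, data in files.items():
            zf.writestr("content/" + name, data)
        zf.writestr("metadata.json", json.dumps(metadata, indent=2))
    return buf.getvalue()

package = make_information_package(
    {"index.html": b"<html>...</html>"},
    {"provenance": "Harvested 2004-06-15", "rights": "Legal deposit"},
)
```

Because the metadata travels inside the same package as the content, the object remains interpretable even when it is moved between repositories - which is precisely the interoperability concern that METS addresses with a formal XML schema.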

What's needed here, then, are tools that address other forms of preservation metadata and support formal preservation metadata schemata (such as PREMIS); there is also a need for a better division of labour (for example, metadata could already be created at other stages of the information lifecycle, such as at the point of creation of a digital still image); and further - especially if there is thus a variety of sources for metadata - quality assurance will become increasingly important.