Paul Koerbin from the National Library of Australia now speaks on how to select content for an archive of the Internet (and on whether selection is even necessary, given that wholesale archivists such as the Internet Archive exist). The NLA, however, operates under a specific statute that steers its PANDORA project towards archiving selected content.
But how do we know what will be of research value in the future? The NLA has set some parameters, which include an interest in original and substantial information (though again, one could ask 'for whom?'). Significance and potential for future research are the key prerequisites for archiving; the usability and accuracy of the archived copies also matter.
Because funding and time are limited, priorities must be set. This may mean missing some important new forms before they become mainstream and generally accepted: there is no way of knowing 'what's round the corner'. And when events change the course of what takes place on the Web (Australian elections, acts of terror, etc.), the archiving system needs to be flexible enough to capture the content created around them.
PANDORA has a broad definition of 'publications': anything that is online is defined as published. But then how do you identify specific content? PANDORA may choose to archive only particular significant subsets of content. These choices are made by a small number of project staff, which could introduce bias or cause areas for which there is no in-house expertise to be overlooked; the NLA therefore engages various scholarly indexing agencies (which again may mean that only small areas of scholarship are covered).
It has also partnered with other agencies: the mainland state libraries, ScreenSound Australia, and the Australian Institute of Aboriginal and Torres Strait Islander Studies. It then becomes important that these agencies work in ways that are compatible with the Library's objectives.
PANDORA's selection is generally content-driven: it targets material that is available only online and is of substance. It used to try to establish whether print versions of the content existed, but has moved away from this because the status of some online materials was very difficult to ascertain.
The problem with cataloguing Websites is that it decontextualises them from their original Web environment (perhaps this is particularly critical for the blog universe?). Decisions made today will shape the future collection, so they are crucial.
In response, Julien Masanés points out the various approaches to archiving taken by national libraries around the world. The NLA clearly uses a selective approach; the Royal Library of Sweden, on the other hand, tries to archive as much as possible of its national (.se) top-level domain. Neither approach is necessarily better; both raise their own problematic questions.