Alex Halavais is next; he also presented at AoIR last year. He points out that a few years back hyperlinks rarely crossed national borders (other than into the US, I suppose), but that this barrier has been weakening over time. Language borders persist, of course, and there is still little linking across them.
But more centrally, Alex is talking about blogs - and for him they're the vanguard of the Web. While blogs as such might be a fad, he says, they'll become a far more widely distributed phenomenon on the Web as they evolve. Many blogs have very little substance and are ephemeral; yet future historians may still be interested in such content (cf. the archiving of pulp fiction or zines as a cultural phenomenon). (A nice image here: "archiving Shakespeare's blog" would have been very interesting.) But Weblogs are very hard to archive - the modal update frequency is once a day, and there is a huge number of blogs out there, so archiving them all is virtually impossible. (There is a Blog Census project, by the way, which crawls blog index pages to collect some statistics about them.)
Continually evolving projects (like wikis) will be particularly difficult to track: there won't be editions, and thus no distinct record of how their field developed, in the way that successive editions of dictionaries or encyclopedias track the development of their disciplines.
Also, much of the material on Weblogs is redundant - they reference each other (or themselves), so an archiving crawler might come across the same content multiple times. At the same time, because individual blogs come, go, and move, such connections frequently break - especially if the content hasn't been archived somewhere. Finally, the structures of Weblogs vary - there is no universally accepted structure, but rather a multitude of variations on the theme; so how can they be parsed by a crawler?
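(An aside on the redundancy point: one common way an archiving crawler could avoid storing the same content several times is to fingerprint each post body and keep only one copy per fingerprint. The sketch below is my own illustration, not something Alex presented; the names `fingerprint` and `DedupStore` are hypothetical, and it only catches copies that are identical after whitespace and case are normalised, not partial quotes.)

```python
import hashlib
import re


def fingerprint(text: str) -> str:
    """Hash the text after lowercasing and collapsing runs of whitespace."""
    normalized = re.sub(r"\s+", " ", text).strip().lower()
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


class DedupStore:
    """Hypothetical archive store: one copy per distinct body, plus every URL seen."""

    def __init__(self) -> None:
        self.bodies = {}  # fingerprint -> first copy of the text
        self.urls = {}    # fingerprint -> list of URLs carrying that text

    def add(self, url: str, text: str) -> bool:
        """Record a crawled page; return True only if its content is new."""
        key = fingerprint(text)
        is_new = key not in self.bodies
        if is_new:
            self.bodies[key] = text
        # Always remember where the content appeared, even for repeats.
        self.urls.setdefault(key, []).append(url)
        return is_new
```

Adding the same post under two different (made-up) addresses would store the text once and record both URLs, which also helps a little with the broken-link problem, since the archive knows every place a piece of content used to live.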
Blogs are also important as technology adopters and as examples of advanced concepts - the Semantic Web, RSS/RDF, and Creative Commons licensing are all particularly prominent in the blogosphere. Even though they are very difficult to deal with from an archivist's perspective, they are too significant as a resource to ignore. (And blog metrics sites like Blogdex and Delicious are significant sources of information on what's hot on the Web.) Bloggers are also archivists in their own right, in a way - as annotators and commentators on content, and sometimes also as preservers of content which might go out of 'print'. (There is a service called FURL which archives sites as suggested by users; also note FreeCache as a caching mechanism for multimedia content, and FreeNet as a way of developing a distributed archive.)