
Internet Content Preservation

Situating Digital Methods

Our Digital Methods pre-conference workshop at AoIR 2016, which combines presenters from the Digital Methods Initiative at the University of Amsterdam and the Digital Media Research Centre at Queensland University of Technology, starts with a presentation by Richard Rogers on the recent history of digital methods. He points out the gradual transition from a conceptualisation of the Internet and the Web as cyberspace, or as a virtual space, to an understanding of the Web as inherently linked with the 'real' world: online rather than offline becomes the baseline, and there is an increasing sense of online groundedness.

Non-Content Features of Material in the Internet Archive

The third presenter in this Web Science 2016 session is Tu Ngoc Nguyen, who reintroduces us to the Internet Archive's Wayback Machine. This is a useful service, but searching it is not necessarily straightforward. Is it possible to draw on the non-content features of archived material to improve search results?

New Publications, and Coming Attractions

I’m delighted to share a couple of new publications written with my esteemed colleagues in the QUT Digital Media Research Centre – and as if we weren’t working on enough research projects already, this year is about to get an awful lot busier, too. First, though, to the latest articles:

Axel Bruns, Brenda Moon, Avijit Paul, and Felix Münch. “Towards a Typology of Hashtag Publics: A Large-Scale Comparative Study of User Engagement across Trending Topics.” Communication Research and Practice 2.1 (2016): 20-46.

This article, in a great special issue of Communication Research and Practice on digital media research methods that was edited by my former PhD student Jonathon Hutchinson, updates my previous work with Stefan Stieglitz that explored some key metrics for a broad range of hashtag datasets and identified some possible types of hashtags using those metrics. In this new work, we find that the patterns we documented then still hold today, and add some further pointers towards other types of hashtags. We’re particularly thankful to our colleagues Jan Schmidt, Fabio Giglietto, Steven McDermott, Till Keyling, Xi Cui, Steffen Lemke, Isabella Peters, Athanasios Mazarakis, Yu-Chung Cheng, and Pailin Chen, who contributed some of their own datasets to our analysis.

Folker Hanusch and Axel Bruns. “Journalistic Branding on Twitter: A Representative Study of Australian Journalists’ Profile Descriptions.” Digital Journalism (2016).

Archiving Our Personal Digital Milieux

The final presenter in this morning session at "Compromised Data" is Yuk Hui, who will present a social media self-archiving project. He has worked for years on audiovisual archives, but much of the work in this field has focussed on institutional rather than personal archives, with the latter often concerned mainly with privacy issues.

But another set of problems relates to data management instead: we are working with multiple cloud-based systems, but rarely archive our digital objects effectively. Archiving is not just about storing, but about preserving the context of digital objects as well: the digital milieu.

The Challenges of Mapping Archival Web Content

The next speaker in this AoIR 2012 panel is Niels Brügger, who steps back from online social networks to present some more general observations about network analysis. His specific interest is in Web historiography – how can network analysis be applied to archival Web material, then?

Approaches to Internet Content Preservation

The final speakers in this DHA 2012 session are Monica Omodei and Gordon Mohr. Monica, from the National Library of Australia, begins by pointing out the importance of Internet content as raw data for humanities research – and even when the live Web is the object of study, its ephemeral nature means that archives of Web content are absolutely crucial for verifiability and reproducibility.

Relevant examples of such research include social network research, lexicography, linguistics, network science, and political science, amongst many others. Common collection strategies for developing archives of online content include thematic and topical archiving, resource-specific archiving (e.g. audiovisual materials), broad surveys (e.g. domain-wide), exhaustive archiving (closure crawls of a specific Web space), and frequency-based archiving. Such captures will draw on input from domain experts, operate iteratively, and use registry data or trusted directories to determine what to capture.

Preserving Our Memory of the First Draft of History

Rio de Janeiro.
The next keynote speaker at SBPJor is Marcos Palacios (whose speech I hear in live translation, so we’ll see how this liveblog goes…). Marcos suggests that there are hurrahs as well as uh-ohs in the transformation of journalism for the digital media environment: in the first place, as we venture into a digital environment, we learn that media have memory – that there are more uses for yesterday’s newspaper than to wrap today’s fish.

News has been called the first draft of history, of course: journalism feeds into both historiography and the formation of the collective memory of societies. Such journalistic memory could be used only in limited ways during the pre-digital age; today, it is much more widely and permanently available. The place of memory in journalism production is therefore growing; memory becomes the fabric from which the journalism we are coming to know today is built, and is embodied in it. This enables historical analogies and nostalgia, for example, but also has many other uses.

A Call to Action on Social Media Archiving (and More)

Briefly back in Australia, yesterday I went down to Sydney to speak at the Australian Society of Archivists’ 2011 Symposium (staged at the fabulous Luna Park venue). My paper was meant as an urgent call to action on the question of archiving public activities in social media spaces – so much material that will be of immense value to future researchers is being lost every day, and we need to get our act together very soon; we can’t wait for the lumbering beast that is the U.S. Library of Congress to do the job for us, however fulsomely they’ve promised to archive the full public Twitter firehose. The truth is that here in Australia we already have the technologies for capturing and archiving large datasets of public communication on Twitter and elsewhere – but someone with the necessary public standing and archival expertise (the National Library, the National Archives, …) must now take the initiative; the sooner, the better.
