Internet Content Preservation

Towards Better Frameworks for Social Media Data Archiving

The final keynote speaker at this iCS Symposium today is the wonderful Katrin Weller, whose focus is on what we do with social media research data: datasets that have been collected by researchers and have already been utilised in scholarly analysis. How are such datasets shared and archived by these researchers? Sharing here means directly passing these datasets on for use by others, while archiving preserves them for potential future uses. Both practices potentially advance reproducibility and comparability, reduce digital divides in data accessibility between researchers and research groups, and save time and money in data collection; they are also increasingly important as the platforms lock down access to their data.

Researchers frequently lament the general absence of established data sharing and archiving protocols. These remain underdeveloped in part because of the ethical and legal challenges inherent in sharing datasets; the difficulty of establishing clearly defined and described archives for social media data in the absence of universally accepted standards; the lack of search functionality for archived datasets; the diversity of social media datasets, which are collected using different methods and from various, continuously evolving platforms; and, in some cases, even a lack of motivation among researchers to share their data.

Situating Digital Methods

Our Digital Methods pre-conference workshop at AoIR 2016, combining presenters from the Digital Methods Initiative at the University of Amsterdam and the Digital Media Research Centre at Queensland University of Technology, starts with a presentation by Richard Rogers on the recent history of digital methods. He points out the gradual transition from a conceptualisation of the Internet and the Web as cyberspace or as a virtual space to an understanding of the Web as inherently linked with the 'real' world: online rather than offline becomes the baseline, and there is an increasing sense of online groundedness.

Non-Content Features of Material in the Internet Archive

The third presenter in this Web Science 2016 session is Tu Ngoc Nguyen, who reintroduces us to the Internet Archive's Wayback Machine. This is a useful service, but searching it is not necessarily straightforward. Is it possible to draw on the non-content features of archived material to improve search results?

New Publications, and Coming Attractions

I’m delighted to share a couple of new publications written with my esteemed colleagues in the QUT Digital Media Research Centre – and as if we weren’t working on enough research projects already, this year is about to get an awful lot busier, too. First, though, to the latest articles:

Axel Bruns, Brenda Moon, Avijit Paul, and Felix Münch. “Towards a Typology of Hashtag Publics: A Large-Scale Comparative Study of User Engagement across Trending Topics.” Communication Research and Practice 2.1 (2016): 20–46.

This article, in a great special issue of Communication Research and Practice on digital media research methods that was edited by my former PhD student Jonathon Hutchinson, updates my previous work with Stefan Stieglitz that explored some key metrics for a broad range of hashtag datasets and identified some possible types of hashtags using those metrics. In this new work, we find that the patterns we documented then still hold today, and add some further pointers towards other types of hashtags. We’re particularly thankful to our colleagues Jan Schmidt, Fabio Giglietto, Steven McDermott, Till Keyling, Xi Cui, Steffen Lemke, Isabella Peters, Athanasios Mazarakis, Yu-Chung Cheng, and Pailin Chen, who contributed some of their own datasets to our analysis.
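To make the idea of comparing hashtag datasets through a shared set of metrics a little more concrete, here is a minimal Python sketch. It assumes a hypothetical `Tweet` record and computes a handful of illustrative metrics (the shares of retweets, @replies, and URL-bearing tweets, plus the share of all tweets posted by the most active 1% of users). The exact metrics used in the article may differ, so treat this as an illustration of the general approach rather than the published method.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Tweet:
    # Hypothetical minimal tweet record; real datasets carry many more fields.
    user: str
    text: str
    is_retweet: bool
    has_url: bool


def hashtag_metrics(tweets: List[Tweet]) -> Dict[str, float]:
    """Compute a few illustrative activity metrics for one hashtag dataset."""
    n = len(tweets)
    if n == 0:
        raise ValueError("empty dataset")
    retweets = sum(t.is_retweet for t in tweets)
    # Simplification (assumption): a non-retweet starting with '@' counts as an @reply.
    replies = sum(not t.is_retweet and t.text.startswith("@") for t in tweets)
    urls = sum(t.has_url for t in tweets)

    # Rank users by activity and measure the share of tweets from the top 1%
    # (at least one user, so that small datasets still yield a value).
    counts = Counter(t.user for t in tweets)
    ranked = [c for _, c in counts.most_common()]
    top_n = max(1, round(len(ranked) * 0.01))
    top_share = sum(ranked[:top_n]) / n

    return {
        "tweets": n,
        "users": len(counts),
        "retweet_share": retweets / n,
        "reply_share": replies / n,
        "url_share": urls / n,
        "top1pct_user_share": top_share,
    }


if __name__ == "__main__":
    sample = [
        Tweet("a", "RT @b: news #tag", True, False),
        Tweet("a", "@c what do you think? #tag", False, False),
        Tweet("b", "Read this #tag http://example.com", False, True),
        Tweet("c", "Agreed #tag", False, False),
    ]
    print(hashtag_metrics(sample))
    # {'tweets': 4, 'users': 3, 'retweet_share': 0.25, 'reply_share': 0.25,
    #  'url_share': 0.25, 'top1pct_user_share': 0.5}
```

Because every dataset is reduced to the same small set of ratios, hashtags of very different sizes can be placed side by side and grouped by the patterns those ratios reveal.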

Folker Hanusch and Axel Bruns. “Journalistic Branding on Twitter: A Representative Study of Australian Journalists’ Profile Descriptions.” Digital Journalism (2016).

Archiving Our Personal Digital Milieux

The final presenter in this morning session at "Compromised Data" is Yuk Hui, who will present a social media self-archiving project. He has worked for years on audiovisual archives, but much of the work in this field has focussed on institutional rather than personal archives, with the latter often concerned mainly with privacy issues.

But another set of problems relates to data management instead: we are working with multiple cloud-based systems, but rarely archive our digital objects effectively. Archiving is not just about storing, but also about preserving the context of those digital objects: the digital milieu.

The Challenges of Mapping Archival Web Content

The next speaker in this AoIR 2012 panel is Niels Brügger, who steps back from online social networks to present some more general observations about network analysis. His specific interest is in Web historiography – how can network analysis be applied to archival Web material, then?

Approaches to Internet Content Preservation

Canberra.
The final speakers in this DHA 2012 session are Monica Omodei and Gordon Mohr. Monica, from the National Library of Australia, begins by pointing out the importance of Internet content as raw data for humanities research – and even when the live Web is the object of study, its ephemeral nature means that archives of Web content are absolutely crucial for verifiability and reproducibility.

Relevant examples of such research include social network research, lexicography, linguistics, network science, and political science, amongst many others. Common collection strategies for developing archives of online content include thematic and topical archiving, resource-specific archiving (e.g. of audiovisual materials), broad surveys (e.g. domain-wide), exhaustive archiving (closure crawls of a specific Web space), and frequency-based archiving. Such captures will draw on input from domain experts, operate iteratively, and use registry data or trusted directories to determine what to capture, among other considerations.
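The "exhaustive" strategy can be pictured as a breadth-first closure crawl: start from a set of seed URLs and keep following links that stay inside the target Web space until no new in-scope URLs turn up. The Python below is a minimal sketch under stated assumptions: the in-memory `links` dictionary is a hypothetical stand-in for fetching each page and extracting its outlinks, and "Web space" is assumed to mean a domain plus its subdomains.

```python
from collections import deque
from typing import Dict, List, Set
from urllib.parse import urlparse


def in_scope(url: str, domain: str) -> bool:
    """True if the URL's host is the target domain or one of its subdomains."""
    host = urlparse(url).hostname or ""
    return host == domain or host.endswith("." + domain)


def closure_crawl(seeds: List[str], links: Dict[str, List[str]], domain: str) -> Set[str]:
    """Breadth-first crawl that follows in-scope links until no new URLs appear.

    `links` (hypothetical) maps each URL to its outlinks; a real crawler
    would download and parse each page at this point instead.
    """
    captured: Set[str] = set()
    frontier = deque(u for u in seeds if in_scope(u, domain))
    while frontier:
        url = frontier.popleft()
        if url in captured:
            continue  # already captured via another path
        captured.add(url)
        for out in links.get(url, []):
            # Out-of-scope links are recorded nowhere; they define the closure boundary.
            if in_scope(out, domain) and out not in captured:
                frontier.append(out)
    return captured


if __name__ == "__main__":
    web = {
        "https://example.org/": ["https://example.org/a", "https://other.net/"],
        "https://example.org/a": ["https://news.example.org/b", "https://example.org/"],
        "https://news.example.org/b": [],
    }
    print(sorted(closure_crawl(["https://example.org/"], web, "example.org")))
    # ['https://example.org/', 'https://example.org/a', 'https://news.example.org/b']
```

Production crawlers such as Heritrix add politeness delays, deduplication, and WARC output on top of this basic loop, but the closure logic (crawl until the in-scope frontier is empty) is the part that makes the strategy "exhaustive".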

Preserving Our Memory of the First Draft of History

Rio de Janeiro.
The next keynote speaker at SBPJor is Marcos Palacios (whose speech I hear in live translation, so we’ll see how this liveblog goes…). Marcos suggests that there are hurrahs as well as uh-ohs in the transformation of journalism for the digital media environment: in the first place, as we venture into a digital environment, we learn that media have memory – that there are more uses for yesterday’s newspaper than to wrap today’s fish.

News has been called the first draft of history, of course – journalism contributes both to historiography and to the formation of the collective memory of societies. In the pre-digital age, such journalistic memory could be used only in limited ways; today, it is far more widely and permanently available. The place of memory in journalism production is therefore growing: memory becomes the fabric from which the journalism we are coming to know today is built, and it is embodied in that journalism. This enables historical analogies and nostalgia, for example, but also has many other uses.
