Internet Content Preservation

Snurb — Thursday 19 March 2026 21:28

A Process for Creating Derivative Archival Datasets from Sensitive Hacktivist Content

Internet Content Preservation | Social Media Access Days 2026 | Liveblog |

The final session at the Social Media Access Days at the German National Library focusses on the question of systemic risks, which is a key criterion for the approval of DSA data access requests under EU regulations. We start with Hanna Gawel, whose interest is in archiving hacktivism as a part of digital heritage. Hacktivism may be regarded as a high-risk activity, and therefore requires particular care from archivists.

The content to be archived here might include screenshots of defacements, leaked manifestos, memes, protest videos, and various other forms of often very ephemeral materials; there is a need to create …


Snurb — Thursday 19 March 2026 20:23

Developing the Comprehensive TeleScope Dataset of Public Telegram Content

The final speaker in this session at the Social Media Access Days at the German National Library is Susmita Gangopadhyay, presenting a project that has engaged in a continuous crawl of Telegram’s public channels. Telegram is a platform that has grown substantially in recent years, with some 950 million users.

Telegram has an API which can be used to gather data, and such data gathering tends to focus on groups (which are many-to-many, may be public or private, and have distinct administrators) and channels (which are one-to-many only, with named administrators). Otherwise there are some functional similarities between …


Snurb — Thursday 19 March 2026 18:54

Towards Better Metadata Profiles for Social Media Platforms

The third and final day at the Social Media Access Days at the German National Library starts with a paper by Alexander König, whose focus is on metadata for social media data. Creating quality metadata is critical for the appropriate storage and reuse of datasets, and especially so for social media datasets, as the conditions of the data gathering and the state of the platform at the time of gathering also need to be captured in the metadata.

The CLARIN Knowledge Centre for Computer-Mediated Communication and Social Media Corpora (CKCMC) is currently developing a metadata standard for such data, incorporating …


Snurb — Wednesday 18 March 2026 23:27

How Declining Social Media Data Access Affects National Memory Institutions

And the next speaker at the Social Media Access Days at the German National Library is Beatrice Cannelli, whose interest is in how national memory institutions’ social media archiving initiatives have been affected by changing data access regimes. Such activities are affected by national legal frameworks, available resources, collection policies and scope, technical limitations, and the Terms of Service of the various platforms.

The latter are justified by user privacy concerns and the protection of sensitive information, but in practice mostly protect the platforms’ own business interests. How these are formulated influences the extent to which content from such platforms …


Snurb — Wednesday 18 March 2026 02:11

What to Do with an Author’s Data Donation of His Twitter History?

The next speakers in this session at the Social Media Access Days at the German National Library are Gabriel Viehhauser and Carl Friedrich Haak, whose interest is in making use of donated social media data – the concrete context here is that the Austrian author Clemens J. Setz, who has at times posted some of his short-form work on Twitter, has donated his archive of tweets to a library in Vienna, which was unsure about what to do with this gift.

Such work is diverse in its formats; further, Setz is author, but also interlocutor, curator, recipient, object of mentions …


Snurb — Wednesday 18 March 2026 01:41

Ensuring the Long-Term Archival of Scholarly Blogs in Germany

Blogs and Blogging | Internet Content Preservation | Social Media Access Days 2026 | Liveblog |

The next speaker in this session at the Social Media Access Days at the German National Library is Catharina Ochsner, whose focus is on the archiving of scholarly blogs. Such blogs are engaged in science communication and thereby introduce more transparency into the scientific process; they exist in many different formats and across various major and minor platforms, and frequently link to each other and to other external resources.

But their long-term availability is limited, and depends on the blogger’s continued activity. There is a need for long-term archival of such resources in their original form, which also implies a …


Snurb — Sunday 28 October 2018 22:51

Towards Better Frameworks for Social Media Data Archiving

'Big Data' | Social Media | Internet Content Preservation | iCS 2018 |

The final keynote speaker at this iCS Symposium today is the wonderful Katrin Weller, whose focus is on what we do with social media research data: datasets that have been collected by researchers and have already been utilised in scholarly analysis. How are such datasets shared on and archived by these researchers? Sharing here means directly passing these datasets on for use by others, while archiving preserves them for potential future uses. Both practices potentially advance reproducibility and comparability, reduce digital divides in data accessibility between researchers and research groups, and save time and money in data collection; they are …


Snurb — Wednesday 5 October 2016 17:56

Situating Digital Methods

Our Digital Methods pre-conference workshop at AoIR 2016, combining presenters from the Digital Methods Initiative at the University of Amsterdam and the Digital Media Research Centre at Queensland University of Technology, starts with a presentation by Richard Rogers on the recent history of digital methods. He points out the gradual transition from a conceptualisation of the Internet and the Web as cyberspace or as a virtual space to an understanding of the Web as inherently linked with the 'real' world: online rather than offline becomes the baseline, and there is an increasing sense of online groundedness.

In the process …


Snurb — Tuesday 24 May 2016 01:13

Twitter as a First Draft of the Present

And the final paper in this Web Science 2016 session is by Katrin Weller and me. Below are our slides and abstract, and a link to the full paper:

Twitter as a First Draft of the Present – and the Challenges of Preserving It for the Future from Axel Bruns

This paper provides a framework for understanding Twitter as a historical source. We address digital humanities scholars to enable the transfer of concepts from traditional source criticism to new media formats, and to encourage the preservation of Twitter as a cultural artefact. Twitter has established itself as a key social …


Snurb — Tuesday 24 May 2016 01:11

Non-Content Features of Material in the Internet Archive

Internet Content Preservation | WebSci '16 |

The third presenter in this Web Science 2016 session is Tu Ngoc Nguyen, who reintroduces us to the Internet Archive's Wayback Machine. This is a useful service, but searching it is not necessarily straightforward. Is it possible to draw on the non-content features to improve search results?

The project drew on the full archive for the German Web, and used a number of techniques to assess and rank documents based on twenty non-content features. I'm frankly unable to understand the numerical data presented in the tables here, but from what I do understand the use of these additional …

