And the lucky last speaker in the final session at the Social Media Access Days at the German National Library in Frankfurt is the excellent Giada Marino, whose interest is in the operation of data access provisions under DSA Article 40(12). The focus here is on very large online marketplace platforms and the systemic risks they pose to minors.
Article 40(12) addresses access to publicly available data from Very Large Online Platforms and Search Engines; this is coordinated across EU jurisdictions by the Expert Group on Access to Publicly Available Data (ECAT). The present project focusses on marketplace platforms such as …
The final session at the Social Media Access Days at the German National Library focusses on the question of systemic risks, which is a key criterion for the approval of DSA data access requests under EU regulations. We start with Hanna Gawel, whose interest is in archiving hacktivism as a part of digital heritage. Hacktivism may be regarded as a high-risk activity, and archiving it therefore requires particular care.
The content to be archived here might include screenshots of defacements, leaked manifestos, memes, protest videos, and various other forms of often very ephemeral materials; there is a need to create …
The final speaker in this session at the Social Media Access Days at the German National Library is Susmita Gangopadhyay, presenting a project that has been continuously crawling Telegram’s public channels. Telegram has grown substantially in recent years and now has some 950 million users.
Telegram offers an API that can be used to gather data from the platform, and such data gathering tends to focus on groups (which are many-to-many, may be public or private, and have distinct administrators) and channels (which are one-to-many only, with named administrators). Otherwise there are some functional similarities between …
Up next at the Social Media Access Days at the German National Library is Ramin Soleymani, whose focus is on the analysis of social media content from a social-ecological perspective. The use of social media has enabled digital nature experiences, partly compensating for an overall decline in direct nature experiences among an increasingly urban population; this might lead to the emergence of digital relational values including seeding, spreading, and grounding.
Relational values are a relatively new concept in the science policy arena; they complement nature’s instrumental (extractive) and intrinsic (inherent) values. Relational values refer to the relationships …
The next speaker in the morning session at the Social Media Access Days at the German National Library is Veronica Batzendorfer, whose interest is in the Xitter-adjacent Grokipedia project as an ‘alternative’ to Wikipedia. Specifically, her focus is on semantic drift across Grokipedia’s versions, starting with material drawn in part from Wikipedia and then rewritten by LLMs.
Such semantic drift can be examined through a geometric framework. This compares two points in time and explores the changes between them across Wikipedia and Grokipedia; Wikipedia content serves as training data for the LLM that generates Grokipedia’s articles, so …
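As a rough illustration only, and not necessarily the framework presented in the talk, drift between two versions of an article can be measured geometrically as the cosine distance between their vector representations. The sketch below swaps in simple word-count vectors to stay self-contained (real analyses would more likely use text embeddings); the function names and example sentences are hypothetical.

```python
import math
from collections import Counter


def bag_of_words(text: str) -> Counter:
    """Hypothetical helper: lower-cased whitespace tokens and their counts."""
    return Counter(text.lower().split())


def cosine_distance(a: Counter, b: Counter) -> float:
    """1 minus the cosine similarity of two sparse count vectors (0 = same direction)."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 1.0  # treat an empty text as maximally distant
    return 1.0 - dot / (norm_a * norm_b)


def semantic_drift(version_t0: str, version_t1: str) -> float:
    """Drift between two versions of the same article at two points in time."""
    return cosine_distance(bag_of_words(version_t0), bag_of_words(version_t1))


# Hypothetical example texts, not drawn from Wikipedia or Grokipedia.
wiki_t0 = "the city is the capital of the region"
groki_t1 = "the city is the largest and oldest capital of the region"
print(round(semantic_drift(wiki_t0, groki_t1), 3))
```

A distance of 0 means the two versions point in the same direction in word space; values closer to 1 indicate greater drift.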
The third and final day at the Social Media Access Days at the German National Library starts with a paper by Alexander König, whose focus is on metadata for social media data. Creating quality metadata is critical for the appropriate storage and reuse of datasets, and especially so for social media datasets, as the conditions of the data gathering and the state of the platform at the time of gathering also need to be captured in the metadata.
The CLARIN Knowledge Centre for Computer-Mediated Communication and Social Media Corpora (CKCMC) is currently developing a metadata standard for such data, incorporating …
The final speakers at the Social Media Access Days at the German National Library for today are Oliver Watteler and Jan Schwalbach, whose interest is in the legal conditions for sharing platform data; platforms’ developer policies and Terms of Service are in constant flux, so it is important to keep track of how they evolve over time.
Researchers often have a strong interest in sharing the datasets they have collected with others; data sharing aids replicability, speeds up the research process, and enables new work. But researchers are rarely aware of the frameworks the platforms have imposed on such sharing …
And the next speaker at the Social Media Access Days at the German National Library is Beatrice Cannelli, whose interest is in how national memory institutions’ social media archiving initiatives have been affected by changing data access regimes. Such activities are shaped by national legal frameworks, available resources, collection policies and scope, technical limitations, and the Terms of Service of the various platforms.
The latter are justified by user privacy concerns and the protection of sensitive information, but in practice mostly protect the platforms’ own business interests. How these are formulated influences the extent to which content from such platforms …
The post-lunch session at the Social Media Access Days at the German National Library starts with LK Seiling and Sophia Graf, who discuss the Weizenbaum-Institut’s DSA40 Collaboratory project. The EU’s Digital Services Act provides for research access to public and non-public data via its Articles 40(12) and 40(4), respectively. In both cases, access is limited to research that investigates what the DSA calls ‘systemic risks’, and to Very Large Online Platforms that serve at least 10% of the EU population, or some 45 million users.
If platforms are found to have failed to provide such access, the EU can (and …
The next speaker at the Social Media Access Days at the German National Library is Robert Jäschke. He begins by noting the legal constraints on social media data sharing, including Terms of Service, copyright, and other restrictions. One way of managing these was Twitter’s approach: sharing datasets as lists of tweet IDs without any further content was allowed, and researchers then needed to ‘rehydrate’ them by regathering the tweet data. Other approaches are to share only aggregate metrics rather than the source data themselves, or to share derived datasets (like term matrices, n-gram datasets, or word embeddings) …
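A minimal sketch of two of these options, assuming a hypothetical collection of (tweet ID, text) records: the first function produces an ID-only list that recipients would need to rehydrate, while the second produces a derived bigram-count dataset that shares aggregate term patterns rather than the original posts. The function names, sample data, and the `min_count` threshold (a hypothetical safeguard that drops rare bigrams) are illustrative assumptions, not part of any specific tool discussed in the talk.

```python
from collections import Counter

# Hypothetical collected records: (tweet_id, text). Illustrative data only.
tweets = [
    ("1001", "data access for research"),
    ("1002", "research data sharing"),
    ("1003", "data sharing and access"),
]


def id_list(records):
    """ID-only dataset: identifiers without content; recipients must rehydrate them."""
    return [tweet_id for tweet_id, _ in records]


def bigram_counts(records, min_count=1):
    """Derived dataset: bigram frequencies aggregated across all texts.

    Bigrams seen fewer than min_count times are dropped (a hypothetical
    safeguard against sharing rare, potentially identifying phrases).
    """
    counts = Counter()
    for _, text in records:
        tokens = text.lower().split()
        # zip pairs each token with its successor, yielding bigram tuples
        counts.update(zip(tokens, tokens[1:]))
    return Counter({gram: n for gram, n in counts.items() if n >= min_count})


print(id_list(tweets))
print(bigram_counts(tweets, min_count=2))
```

Neither output contains any complete post, which is the point of such derived formats; how far they actually satisfy a given platform’s Terms of Service is a separate legal question.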