The second session at the Social Media Access Days at the German National Library begins with a paper by Mia Berg and Oliver Vettermann on social media data scraping, with a particular focus on TikTok. TikTok does offer an API for data access (at least in Europe), but unfortunately it remains severely limited and unreliable; this is problematic given that many user practices and content formats are in urgent need of further analysis. One example of such a content genre is AI-generated video content, such as POV videos that purport to imagine historical situations.
Manual data gathering, API access, scraping, and the use of third-party services are all current options for Web-based data gathering and will be explored here; other options include DSA-mandated data access, data donations from users, or the use of existing datasets. Scraping likely violates TikTok’s Terms of Service, however, and some available scraping tools lack transparency or have limited data quality. Research API access, meanwhile, also comes with significant Terms of Service conditions and gives TikTok the right to review scholarly work, and the data accessible through the API (for videos as well as comments) are limited and not necessarily reliable. Data are provided within a cleanroom environment. Scraping-based access has its own limitations; popular scraping tools like Zeeschuimer also fail to gather all visible comments, for instance.
This situation also raises significant legal questions, and the legal offices at different research institutions often provide very different interpretations of the options available to researchers. Various services that offer research data may also operate under US ‘fair use’ legal frameworks that do not translate directly to European or other legal contexts; this is especially problematic in the context of TikTok, where so much of the research will necessarily need to examine the content of the videos themselves.
Under the EU Digital Services Act, researchers have a right of access to data from Very Large Online Platforms for the analysis of systemic risks, which limits the Act’s applicability to a specific range of research interests and agendas. Under the DSGVO (the GDPR, as it is known in Germany), the persons whose activities are captured in datasets could also have a right of consent. Further, data storage can also become an issue when cloud services are being used.
There is a real need here for further discussions with scholarly legal offices, and for greater interdisciplinary collaboration. We need more initiatives to address the many legal questions that remain in this field.











