The Painful Process of Gaining Cross-Platform Social Media Data Access via EU Digital Services Act Mechanisms

Snurb — Thursday 19 March 2026 19:51

Up next at the Social Media Access Days at the German National Library is Ramin Soleymani, whose focus is on the analysis of social media content from a social-ecological perspective. Social media use has enabled digital nature experiences, partly making up for the overall decline in direct nature experiences among an increasingly urban population; this might lead to the emergence of digital relational values, including seeding, spreading, and grounding.

Relational values are a relatively new concept in the science policy arena; they complement nature’s instrumental (extractive) and intrinsic (inherent) values. Relational values refer to the relationships between people and nature, including stewardship for nature and other important aspects.

This project explores these values across six social media platforms. Because it seeks to estimate the prevalence of nature-related content across these platforms, it does not rely on convenience sampling based on hashtags and keywords (which would introduce substantial bias), but instead takes a stratified sampling approach across various timeframes (seasons, weekends, times of day, etc.).
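The post does not spell out how the strata were defined. As a minimal sketch, assuming meteorological seasons, a weekday/weekend split, and four six-hour dayparts (all hypothetical choices, as are the function and parameter names), a stratified sampling frame of random time windows might be drawn like this in Python:

```python
import random
from datetime import date, datetime, timedelta
from itertools import product

# Hypothetical strata: the post names seasons, weekends, and time of day as
# examples; the exact definitions below are illustrative assumptions.
SEASONS = {  # meteorological seasons (Northern Hemisphere), by month
    "winter": {12, 1, 2},
    "spring": {3, 4, 5},
    "summer": {6, 7, 8},
    "autumn": {9, 10, 11},
}
DAYPARTS = {"night": (0, 6), "morning": (6, 12), "afternoon": (12, 18), "evening": (18, 24)}


def stratum_of(day: date) -> tuple[str, str]:
    """Return the (season, day type) stratum a calendar day belongs to."""
    season = next(name for name, months in SEASONS.items() if day.month in months)
    day_type = "weekend" if day.weekday() >= 5 else "weekday"
    return season, day_type


def sample_windows(start: date, end: date, per_stratum: int,
                   window_minutes: int = 60, seed: int = 0) -> list[dict]:
    """Draw the same number of random time windows from every
    season x day-type x daypart stratum between start and end (inclusive)."""
    rng = random.Random(seed)  # seeded so the sampling frame is reproducible

    # Group all calendar days in the study period by (season, day type).
    days_by_stratum: dict[tuple[str, str], list[date]] = {}
    d = start
    while d <= end:
        days_by_stratum.setdefault(stratum_of(d), []).append(d)
        d += timedelta(days=1)

    windows = []
    for (season, day_type), daypart in product(days_by_stratum, DAYPARTS):
        lo, hi = DAYPARTS[daypart]
        for _ in range(per_stratum):
            day = rng.choice(days_by_stratum[(season, day_type)])
            # Random start minute inside the daypart, leaving room for the window.
            minute = rng.randrange(lo * 60, hi * 60 - window_minutes + 1)
            begin = datetime(day.year, day.month, day.day) + timedelta(minutes=minute)
            windows.append({
                "season": season, "day_type": day_type, "daypart": daypart,
                "start": begin, "end": begin + timedelta(minutes=window_minutes),
            })
    return windows
```

For a full calendar year, sample_windows(date(2025, 1, 1), date(2025, 12, 31), per_stratum=5) yields 160 one-hour windows, five in each of the 32 strata. Equal allocation keeps quiet periods (such as weekday nights) represented; a real design might instead allocate windows in proportion to expected posting volume.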

To do so, the project applied to several Very Large Online Platforms via the EU DSA mechanism, which required the identification of a broad systemic risk to which the research relates; where the DSA application forms asked for this, the project linked its work to climate change and biodiversity challenges. YouTube approved the application after two months and some back-and-forth follow-up queries; X made it difficult even to find its DSA application form, and quickly sent a generic rejection (so the project is now scraping data from the platform instead); TikTok's DSA form was long, but the platform asked no follow-up questions and approved access after three months; Meta's application process was cumbersome, requiring substantial background information on all researchers and institutions, and access was finally approved a year later.

This is only the start, however: once access is granted, researchers still need to develop the data gathering process itself. Access was partly via databases and partly via APIs; client packages for data gathering are only available for some platforms; embedded media content (images, videos) often still needs to be gathered separately; and all of these access mechanisms may change during the gathering process.

For YouTube, this worked fairly smoothly; for X, a scraper was used that required ongoing supervision and maintenance; for TikTok, the available client libraries were virtually unusable at first and had to be forked and fixed; and for Meta, the graphical user interface of the Content Library was straightforward to use but very limited in its coverage, while the Content Library API was prohibitively restrictive in its access modalities and did not allow any content export.

Each platform also required a different approach to stratified temporal sampling: YouTube, for instance, has a ‘preferred language’ filter whose functionality is unclear, and its time range filters are limited as well; X provides language and date/time filters as part of its search function; TikTok has no language filter but offers a date filter and a randomisation option for search results; and the Meta Content Library GUI has language and date/time filters, but requires cumbersome manual downloads whose embedded media URLs expire after some time.
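Because the platforms' search filters differ, each sampled time window has to be translated into whatever constraints a given platform can actually apply, with the remaining constraints enforced after collection. The following Python sketch encodes a simplified, hypothetical reading of the filter capabilities described above; the FilterCaps, CAPS, and plan_query names are illustrative, YouTube's entry is an assumption given its unclear language filter and limited time range filters, and no real platform API is called.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class FilterCaps:
    """Simplified, hypothetical reading of a platform's search filters."""
    language: bool     # can the query itself be restricted by language?
    time_of_day: bool  # can the query target a sub-day time window?


# Capabilities as summarised in the post; YouTube's entry is an assumption.
# TikTok's randomisation option is not modelled here.
CAPS = {
    "youtube": FilterCaps(language=False, time_of_day=False),
    "x": FilterCaps(language=True, time_of_day=True),
    "tiktok": FilterCaps(language=False, time_of_day=False),
    "meta_gui": FilterCaps(language=True, time_of_day=True),
}


def plan_query(platform: str, start: datetime, end: datetime, lang: str) -> dict:
    """Translate one sampled window into what the platform can filter on,
    and record which constraints must instead be applied after collection."""
    caps = CAPS[platform]
    plan = {"platform": platform, "post_filter": []}
    if caps.time_of_day:
        plan["since"], plan["until"] = start, end
    else:
        # Widen to whole calendar days, then drop out-of-window posts later.
        plan["since"] = datetime(start.year, start.month, start.day)
        plan["until"] = datetime(end.year, end.month, end.day) + timedelta(days=1)
        plan["post_filter"].append("timestamp")
    if caps.language:
        plan["lang"] = lang
    else:
        # No query-side language filter: detect language on collected posts.
        plan["post_filter"].append("language")
    return plan
```

Widening to whole days trades extra collection volume for coverage; the post_filter list makes explicit which parts of the sampling design still have to be enforced on the collected data.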

The DSA thus provides a basis for accessing data, but platforms still control which applications they approve and what data they provide; API functionality is often limited and diverges from the documentation, and changelogs are rarely available. Access to embedded media content remains especially problematic.
