
A Framework for Data Donations from YouTube Users

The second day of our P³: Power, Propaganda, Polarisation ICA 2024 postconference focusses on research methods, and starts with a presentation by the excellent Jessica Gabriele Walter. Her focus is on YouTube data donations. Conventionally, researchers have accessed social media data via platform APIs and third-party initiatives like Social Science One; the alternative is user-centric approaches like browser tracking or data donation, which are growing in prominence.

Such work is important because of the increase in media fragmentation and the growing importance of social media especially during times of crisis; but data access has been declining markedly at least since the Cambridge Analytica scandal. This has led researchers to explore alternative options for data access: collaborating with users through data donations; collaborating with other societal stakeholders (journalists, civil society organisations); and continuing to argue for greater data access requirements in public policy-making.

Data donations are built on collaborations with users, and can draw for instance on the facilities that allow individual users to download their own data from social platforms; such services are offered – if sometimes grudgingly and in a very user-unfriendly way – by various platforms including Facebook and YouTube. Such data donations involve four major steps: users must be recruited to the research project; researchers must instruct them about how to access their data from the platform; users must then actually follow those instructions; and finally users must donate these data to researchers.

The result, if all steps are followed, is a set of relatively structured and trustworthy datasets; however, the process needs to overcome users’ privacy concerns, and might lead to biased datasets if certain types of users are more likely to participate in data donations than others. Data donations are also frustrated by the long wait times between data request and actual data access that various platforms have implemented, and the data that users are provided with do not necessarily cover all of their activities on a given platform.

The Social Media Influence project at the Aarhus University Datalab focusses on YouTube: it is interested in the influence of algorithms on content engagements over time, and combines a data donation component with a user survey and further data gathering on the videos that users were found to have engaged with. It involves a panel of some 1,000 Danish regular YouTube users, with engagement data and user demographics from 2019 onwards and further fieldwork planned for August 2024.

The scale of the project is intended to compensate for some measurement errors, but the project may still be affected by high attrition or non-response rates. It has been implemented as a process with several steps: an initial online agreement form; instructions on the data request, provided as screenshots and videos (which need to be tailored to the various combinations of Windows and Mac systems and major Web browsers); and facilities for uploading the data retrieved from YouTube. These data captured a surprising number of videos: even an initial tester cohort of 22 participants had watched 300,000 unique videos (including ads) since 2009.
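As a rough illustration of how a unique-video count like the one above might be derived from a single donated file, here is a minimal Python sketch. It assumes the donation is a Google Takeout-style watch-history.json export: a JSON list of entries with an optional titleUrl field, and with ads marked by a details item named "From Google Ads". These field names, the helper functions, and the sample data are all assumptions for illustration, not the project's actual pipeline.

```python
import json
from urllib.parse import urlparse, parse_qs


def video_id(url):
    """Return the `v` query parameter of a YouTube watch URL, or None."""
    if not url:
        return None
    values = parse_qs(urlparse(url).query).get("v")
    return values[0] if values else None


def is_ad(entry):
    # Assumption: ad views carry a "details" item named "From Google Ads".
    return any(d.get("name") == "From Google Ads" for d in entry.get("details", []))


def summarise(entries):
    """Count logged views, unique video IDs, and unique ad video IDs."""
    ids, ad_ids = set(), set()
    for entry in entries:
        vid = video_id(entry.get("titleUrl"))
        if vid is None:
            continue  # e.g. removed videos, which have no usable URL
        ids.add(vid)
        if is_ad(entry):
            ad_ids.add(vid)
    return {"views": len(entries), "unique_videos": len(ids), "unique_ads": len(ad_ids)}


# Hypothetical sample mimicking the assumed export structure.
sample = json.loads("""
[
  {"title": "Watched A", "titleUrl": "https://www.youtube.com/watch?v=abc", "time": "2023-01-01T10:00:00Z"},
  {"title": "Watched A", "titleUrl": "https://www.youtube.com/watch?v=abc", "time": "2023-01-02T10:00:00Z"},
  {"title": "Watched B", "titleUrl": "https://www.youtube.com/watch?v=xyz", "time": "2023-01-03T10:00:00Z",
   "details": [{"name": "From Google Ads"}]},
  {"title": "Watched a video that has been removed", "time": "2023-01-04T10:00:00Z"}
]
""")

print(summarise(sample))
```

Deduplicating on the video ID rather than the title keeps repeat views and renamed videos from inflating the count; whether ads should be counted separately or excluded would depend on the research question.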

This was also supported by a further survey on attitudes towards data donation processes, which provided insights into which people are likely to participate in such studies, and what their motivations are; some 36% indicated their unwillingness to participate, mainly due to a lack of trust in scholarly research. Somewhat surprisingly, women were less likely to participate, as were older people. Trust issues, a lack of belief in the importance of such research or of their own contributions, privacy concerns, and the technical complexity of participation emerged as key issues.

The university ethics board also required the study to note the sensitivity of the data in the study information; this had to be expressed carefully in order not to scare away participants. Other lessons learnt from this pilot were the complexity of instructions, the challenges of working with a survey company, the greater amount of data than expected, the need to educate participants about the data donation concept, and the potential sample biases towards more trusting and compliant participants.