Using LLMs to Code Problematic Content in the Brazilian Manosphere

The second speaker in this final session at the AoIR 2024 conference is Bruna Silveira de Oliveira, whose focus is on using LLMs to study content in the Brazilian manosphere. Extremist groups in this space seek legitimisation, and the question here is whether LLMs can be used productively to analyse such content.

This analysis focusses on some 2,500 episodes of Brazilian masculinist podcasts across ten streaming platforms. It engages in an LLM-assisted content analysis using OpenAI’s GPT-4 model, and explores whether this can identify detailed variables in the content. The podcast episodes were transcribed using automated tools, and 52 episodes were coded by human coders against a total of 18 categories covering the podcasters’ targets and types of intolerance, as well as the podcasters’ perceptions of harm against men in the spheres of love, rights, and social esteem; reliability was tested both between human coders and between human and machine coding.
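
The talk did not share code, but a minimal sketch of what such an LLM-assisted coding step might look like, assuming the current OpenAI Python SDK, placeholder category labels, and Cohen’s kappa as the human–machine reliability statistic (the study’s actual codebook and reliability measure are not specified here), could be:

```python
# A minimal, hypothetical sketch of LLM-assisted content coding;
# category labels, prompt wording, and example data are illustrative
# assumptions, not the study's actual codebook.
from openai import OpenAI
from sklearn.metrics import cohen_kappa_score

client = OpenAI()  # expects OPENAI_API_KEY in the environment

CATEGORIES = ["misogyny", "anti-feminism", "none"]  # placeholder labels

def code_segment(segment: str) -> str:
    """Ask GPT-4 to assign exactly one category to a transcript segment."""
    response = client.chat.completions.create(
        model="gpt-4",
        temperature=0,  # keep the coding as deterministic as possible
        messages=[
            {"role": "system",
             "content": ("You are a content analyst. Classify the segment "
                         f"into exactly one of: {', '.join(CATEGORIES)}. "
                         "Reply with the category name only.")},
            {"role": "user", "content": segment},
        ],
    )
    return response.choices[0].message.content.strip().lower()

# Human-machine reliability on a human-coded subset: compare the model's
# labels against one human coder's labels for the same segments.
segments = ["example transcript segment one", "example transcript segment two"]
human_labels = ["misogyny", "none"]  # from the manual coding round
machine_labels = [code_segment(s) for s in segments]
print("Cohen's kappa:", cohen_kappa_score(human_labels, machine_labels))
```

In practice each episode would be split into many such segments and coded across all 18 categories; the point is simply that machine labels can be compared directly against the human-coded subset to test reliability.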

Coding quality was affected by the complexity of the concepts involved, difficulties in coding nuanced expressions, the definition of the units of analysis, limitations in the refinement of specific claims, limitations in prompt crafting, difficulties in reaching acceptable levels of intercoder reliability, and the need for repeated adjustments to coding variables.
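
To illustrate where prompt crafting becomes a limitation: the prompt must fix the unit of analysis and spell out every category definition, and ironic or ambiguous speech still resists clean classification. A purely hypothetical prompt fragment, not the authors’ actual instrument, might read:

```python
# Purely hypothetical coding prompt fragment; the actual category
# definitions and instructions used in the study are not public.
CODING_PROMPT = """\
Unit of analysis: one transcript segment of up to 200 words.

Categories (choose exactly one):
- misogyny: hostility or contempt towards women as a group.
- anti-feminism: criticism aimed specifically at feminist movements or ideas.
- none: neither of the above applies.

If a segment is ironic or reports someone else's speech, code the
podcaster's own stance, not the quoted material.
Reply with the category name only."""
```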

These challenges should not stop us from paying more attention to the analysis of podcasts. In this and other contexts, manually coding such problematic content can also take a toll on the mental health of researchers, and here the use of LLMs may be especially valuable, not least in large-scale analysis projects; this is complicated by the interpretative flexibility of such content, however. Effective human-machine collaboration therefore remains critical.