Snurblog — Axel Bruns

Towards an LLM-Enhanced Pipeline for Better Stance Detection in News Content

Snurb — Saturday 2 November 2024 22:28
Politics | Polarisation | Journalism | Industrial Journalism | Internet Technologies | 'Big Data' | Artificial Intelligence | Dynamics of Partisanship and Polarisation in Online Public Debate (ARC Laureate Fellowship) | AoIR 2024

The next speaker in this session at the AoIR 2024 conference is my QUT colleague Tariq Choucair, whose focus is especially on the use of LLMs in stance detection in news content. A stance is a public act by a social actor, achieved dialogically through communication, which evaluates objects, positions the self and other subjects, and aligns with other subjects within a sociocultural field.

Here, the focus is broadly on stances towards issues, persons, groups, and organisations. There are some tools for detecting such stances, but they mainly focus on English-language content, are designed for specific types of data, and tend to work best for a narrow range of clearly identifiable targets. Stances towards ambivalent or ambiguous classes of targets are much harder to detect.

Large Language Models can help here: they work across a broader range of languages, even though they still have an English-language bias, and their use requires less computational knowledge; however, they may also exhibit harmful biases, have unclear privacy and safety features, and come with concerns about environmental and community impacts, ownership, authorship, reliability, and validity.

Tariq’s work explored the use of these models in the context of Facebook posts during election campaigns across four countries, as well as during the Voice to Parliament campaign in Australia. This requires fine-tuning for these use cases, and the processes for such fine-tuning remain rather unclear at this stage. Key principles may be to question whether very large language models are necessary, whether the relevant languages are appropriately covered, and which models to apply in this analysis.

Sometimes it might be best to begin by defining interpretative goals rather than concrete tasks; disagreements between humans and LLMs, but also amongst human coders themselves, should be recognised. Such work might not always be done best using random samples; purposive samples might be more effective. The approach should be tested using multiple LLMs at first, narrowing down later towards the best-performing options. All of this should be done through an iterative and reflexive process.
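One way to make such disagreements visible is to compute a chance-corrected agreement score for every pair of coders, human or machine. The following is a minimal sketch using Cohen's kappa, not the method used in the talk; the coder names, stance labels, and toy data are all hypothetical.

```python
from collections import Counter
from itertools import combinations


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two equal-length sequences of categorical labels."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("label sequences must be non-empty and equal in length")
    n = len(labels_a)
    # Share of items on which the two coders chose the same label.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Agreement expected by chance, from each coder's label frequencies.
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both coders used the same single label throughout
    return (observed - expected) / (1 - expected)


def pairwise_agreement(codings):
    """Map each pair of coder names to the kappa between their codings."""
    return {
        (a, b): cohen_kappa(codings[a], codings[b])
        for a, b in combinations(sorted(codings), 2)
    }


# Hypothetical toy data: stance labels assigned to the same six posts.
codings = {
    "human_1": ["favour", "against", "none", "favour", "against", "none"],
    "human_2": ["favour", "against", "favour", "favour", "against", "none"],
    "llm_a": ["favour", "against", "favour", "favour", "against", "against"],
}

for pair, kappa in pairwise_agreement(codings).items():
    print(pair, round(kappa, 2))
```

Adding further candidate models as extra keys in `codings` gives a simple basis for narrowing down to the best-performing options, while the human-to-human scores show how much disagreement exists before any model is involved.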

Fine-tuning of models generally leads to better results; this is also true for smaller LLMs other than those offered by OpenAI. In languages like Portuguese, however, OpenAI models still perform much better than smaller LLMs.

It is especially tricky to deal with unclear and ambiguous cases, which humans might classify as showing no clear stance towards a specific group; LLMs might bring further context to the question, and end up making clearer but not always correct selections. But we should also be cautious about always taking human-coded data as the ground truth, especially if we do not also consider potential biases in the composition of the coding team.
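A simple cross-tabulation can show how often a model assigns a definite stance to items that human coders left unclassified. This is only an illustrative sketch: the label `"none"` for "no clear stance", the other label names, and the toy data are assumptions, and the resulting figure shows where the two codings diverge, not which of them is correct.

```python
from collections import Counter


def crosstab(human_labels, llm_labels):
    """Count (human label, LLM label) pairs across the same items."""
    if len(human_labels) != len(llm_labels):
        raise ValueError("label sequences must be equal in length")
    return Counter(zip(human_labels, llm_labels))


def resolved_share(table, unclear="none"):
    """Share of human-'unclear' items that the LLM gave a definite stance."""
    unclear_total = sum(n for (h, _), n in table.items() if h == unclear)
    if unclear_total == 0:
        return 0.0
    resolved = sum(
        n for (h, m), n in table.items() if h == unclear and m != unclear
    )
    return resolved / unclear_total


# Hypothetical toy data: six posts coded by one human and one LLM.
human = ["none", "none", "favour", "against", "none", "favour"]
llm = ["favour", "none", "favour", "against", "against", "favour"]

table = crosstab(human, llm)
print(dict(table))
print(round(resolved_share(table), 2))
```

Items counted in the "resolved" cells are natural candidates for a second round of human review, which fits the iterative and reflexive process described above.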


Except where otherwise noted, this work is licensed under a Creative Commons BY-NC-SA 4.0 Licence.