The next speaker in this session at the AoIR 2024 conference is my QUT colleague Tariq Choucair, whose focus is especially on the use of LLMs for stance detection in news content. A stance is a public act by a social actor, achieved dialogically through communication, which evaluates objects, positions the self and other subjects, and aligns with other subjects within a sociocultural field.
Here, the focus is broadly on stances towards issues, persons, groups, and organisations. There are some existing tools for detecting such stances, but they mainly focus on English-language content, are designed for specific types of data, and tend to work best for a narrow range of clearly identifiable targets. Stances towards ambivalent or ambiguous classes of targets are much harder to detect.
Large Language Models can help here: they work across a broader range of languages, even though they still have an English-language bias, and their use requires less computational knowledge; however, they may also exhibit harmful biases, have unclear privacy and safety features, and come with concerns about environmental and community impacts, ownership, authorship, reliability, and validity.
Tariq’s work explored the use of these models in the context of Facebook posts during election campaigns across four countries, as well as during the Voice to Parliament campaign in Australia. This requires fine-tuning for these use cases, and the processes for such fine-tuning remain rather unclear at this stage. Key principles may be to question whether very large language models are actually necessary, whether the relevant languages are appropriately covered, and which models are best suited to the analysis.
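To make the fine-tuning step more concrete, here is a minimal sketch of what fine-tuning a smaller, openly available model for stance classification might look like, using the Hugging Face transformers library; the model name, label set, and data columns are my own illustrative placeholders rather than details from Tariq’s study:

```python
# Hypothetical sketch: fine-tuning a small open model for three-way stance
# classification (favour / against / none) with Hugging Face transformers.
# Model name, label set, and data files are illustrative placeholders only.
from datasets import load_dataset
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

MODEL_NAME = "xlm-roberta-base"          # a multilingual encoder, as an example
LABELS = ["favour", "against", "none"]   # assumed stance label set

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForSequenceClassification.from_pretrained(
    MODEL_NAME, num_labels=len(LABELS))

# Assumes CSVs with 'text', 'target', and 'label' columns of hand-coded posts.
dataset = load_dataset("csv", data_files={"train": "train.csv", "test": "test.csv"})

def preprocess(batch):
    # Pair each post with its stance target so the model sees both.
    return tokenizer(batch["text"], batch["target"],
                     truncation=True, padding="max_length", max_length=256)

encoded = dataset.map(preprocess, batched=True)
encoded = encoded.map(lambda b: {"labels": [LABELS.index(l) for l in b["label"]]},
                      batched=True)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="stance-model", num_train_epochs=3,
                           per_device_train_batch_size=16),
    train_dataset=encoded["train"],
    eval_dataset=encoded["test"],
)
trainer.train()
```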
Sometimes it might be best to begin by defining interpretative goals rather than concrete tasks; disagreements between humans and LLMs, but also amongst human coders themselves, should be recognised. Such work might not always be done best with random samples: purposive samples might be more effective. The analysis should initially be tested using multiple LLMs, before narrowing down to the best-performing options. All of this should be done through an iterative and reflexive process.
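A minimal sketch of what such a model comparison on a purposively sampled, human-coded set of posts might look like follows; the column names and model identifiers are assumptions for illustration, not the models actually tested in this project:

```python
# Hypothetical sketch: comparing several candidate LLMs against a purposively
# sampled, human-coded set of posts before committing to one model.
# File and column names are assumptions for illustration only.
import pandas as pd
from sklearn.metrics import classification_report, cohen_kappa_score

# Assumes one human label column plus one label column per candidate model,
# all coded on the same posts.
sample = pd.read_csv("purposive_sample.csv")
candidate_columns = ["gpt4_label", "llama_label", "mistral_label"]

for column in candidate_columns:
    kappa = cohen_kappa_score(sample["human_label"], sample[column])
    print(f"\n=== {column} ===")
    print(f"Cohen's kappa vs. human coding: {kappa:.2f}")
    print(classification_report(sample["human_label"], sample[column],
                                zero_division=0))
```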
Fine-tuning of models generally leads to better results; this is true also for smaller LLMs other than those offered by OpenAI. In languages like Portuguese, however, OpenAI’s models still perform much better than smaller LLMs.
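For the hosted models, a zero-shot approach can be as simple as prompting the API directly; the following sketch assumes the current OpenAI Python client, and the model name, prompt wording, and example post are my own illustrations rather than the prompt used in this study:

```python
# Hypothetical sketch: zero-shot stance classification of a Portuguese-language
# post with a hosted OpenAI model. Model name and prompt wording are
# illustrative. Assumes OPENAI_API_KEY is set in the environment.
from openai import OpenAI

client = OpenAI()

post = "O projeto vai destruir empregos na região e ninguém foi consultado."
target = "the proposed project"

response = client.chat.completions.create(
    model="gpt-4o-mini",
    temperature=0,
    messages=[
        {"role": "system",
         "content": "You classify the stance a social media post takes towards "
                    "a given target. Answer with exactly one word: "
                    "favour, against, or none."},
        {"role": "user",
         "content": f"Target: {target}\nPost: {post}\nStance:"},
    ],
)
print(response.choices[0].message.content.strip())
```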
It is especially tricky to deal with unclear and ambiguous cases, which humans might classify as showing no clear stance towards a specific group; LLMs might bring further context to the question, and end up making clearer but not always correct classifications. But we should also be cautious about always taking human-coded data as the ground truth, especially if we do not also consider potential biases in the composition of the coding team.
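One way to make such biases visible is to treat the LLM as just another coder and check pairwise agreement across the whole coding team, rather than only against a pooled ground truth; a brief sketch, with hypothetical file and column names, might look like this:

```python
# Hypothetical sketch: pairwise agreement between individual human coders and
# the LLM on the same double-coded sample. Column names are assumptions; each
# column holds one coder's (or the LLM's) labels for the same set of posts.
from itertools import combinations

import pandas as pd
from sklearn.metrics import cohen_kappa_score

labels = pd.read_csv("double_coded_sample.csv")
coders = ["coder_1", "coder_2", "coder_3", "llm"]

# Pairwise Cohen's kappa shows whether the LLM disagrees with the humans more
# than the human coders disagree amongst themselves.
for a, b in combinations(coders, 2):
    kappa = cohen_kappa_score(labels[a], labels[b])
    print(f"{a} vs {b}: kappa = {kappa:.2f}")
```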