LLMs and Transformer Models in News Content Coding

The next speaker in this final AoIR 2024 conference session is the great Hendrik Meyer, whose interest is in detecting stances in climate change coverage. His focus is especially on climate change debates in German news media: on climate protests, discussions about speed limits, and debates about heating and heat pump regulations.

Stances here might be better understood as evaluations of a given issue or policy, and Large Language Models can be useful tools for assessing them; however, this requires considerable prompt crafting in order to generate consistent results. The computational costs of doing so (especially with complex prompts) are high, and the use of transformer-based approaches like SetFit may at times be preferable; the balance keeps shifting as new models are released, though.
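To illustrate what such prompt crafting might look like in practice, here is a minimal sketch of LLM-based stance classification using the OpenAI Python client; the prompt wording, the label set, and the example sentence are illustrative assumptions, not the actual codebook used in this study.

```python
# Minimal sketch of LLM-based stance classification, assuming the current
# OpenAI Python client; prompt wording, label set, and example sentence
# are illustrative, not the study's actual codebook.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

SYSTEM_PROMPT = (
    "You are a content analysis coder. Classify the stance of the following "
    "German news text towards a general motorway speed limit. "
    "Answer with exactly one label: pro, contra, or neutral."
)

def classify_stance(text: str) -> str:
    """Return a single stance label for one news snippet."""
    response = client.chat.completions.create(
        model="gpt-4o",
        temperature=0,  # deterministic output improves consistency across runs
        messages=[
            {"role": "system", "content": SYSTEM_PROMPT},
            {"role": "user", "content": text},
        ],
    )
    return response.choices[0].message.content.strip().lower()

print(classify_stance("Ein Tempolimit würde den CO2-Ausstoß sofort senken."))
```

Pinning the temperature to zero and demanding exactly one label from a closed set are the kinds of constraints that help generate the consistent results mentioned above.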

Hendrik’s work proceeded by using human coders for codebook development and testing; the codebook was then translated into an LLM prompt for GPT-4o through an iterative process, and assessed against a human-coded dataset that deliberately retained some of the cases that human coders had also found difficult to assess. Once an acceptable level of agreement had been reached, the validated labels were used to fine-tune a SetFit transformer model, which was then applied to a much larger corpus of data.
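A hedged sketch of that final SetFit step might look as follows: a small validated sample fine-tunes a sentence-transformer, which can then classify the full corpus locally, without per-document API calls. The checkpoint and the toy training examples here are assumptions for illustration, not the study's actual setup.

```python
# Sketch of fine-tuning a SetFit model on a small validated sample and then
# classifying a larger corpus; checkpoint and examples are illustrative.
from datasets import Dataset
from setfit import SetFitModel, Trainer, TrainingArguments

# A handful of validated examples per class is enough for SetFit
# (0 = contra, 1 = neutral, 2 = pro).
train_dataset = Dataset.from_dict({
    "text": [
        "Ein Tempolimit schadet nur der Wirtschaft.",
        "Das Heizungsgesetz ist ein bürokratisches Monster.",
        "Der Bundestag debattiert heute über das Heizungsgesetz.",
        "Die Anhörung zum Tempolimit beginnt am Montag.",
        "Wärmepumpen sind ein zentraler Baustein der Energiewende.",
        "Ein Tempolimit würde Leben retten und Emissionen senken.",
    ],
    "label": [0, 0, 1, 1, 2, 2],
})

# A multilingual sentence-transformer checkpoint is a plausible choice
# for German news text.
model = SetFitModel.from_pretrained(
    "sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2"
)

trainer = Trainer(
    model=model,
    args=TrainingArguments(batch_size=16, num_epochs=1),
    train_dataset=train_dataset,
)
trainer.train()

# The fine-tuned model now labels the much larger corpus cheaply.
predictions = model.predict([
    "Die Klimaproteste legten den Berufsverkehr in Berlin lahm.",
])
print(predictions)
```

This is where the cost advantage over repeated LLM prompting comes from: after a single fine-tuning run, classification of millions of documents no longer depends on an external API.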

All of this depends critically on the quality of the initial step of codebook development and validation by human coders, however. There are critical questions about the ‘ground truth’ emerging from human coding, and we must recognise that human coders will often also disagree about appropriate classifications. Such tough and messy cases must be retained, rather than pressed into a single code through overzealous coder moderation.
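To make the agreement point concrete, here is a minimal sketch of how such coder disagreement might be quantified and inspected, assuming Cohen's kappa as the measure (the talk did not specify one) and using invented labels:

```python
# Sketch of an inter-coder agreement check; Cohen's kappa is one common
# choice (the measure used here is an assumption), and labels are invented.
from sklearn.metrics import cohen_kappa_score

coder_a = ["pro", "contra", "neutral", "pro", "contra"]
coder_b = ["pro", "contra", "pro", "pro", "neutral"]

print(f"Cohen's kappa: {cohen_kappa_score(coder_a, coder_b):.2f}")

# The items where coders disagree are exactly the tough, messy cases worth
# keeping in the validation set rather than resolving away.
disagreements = [i for i, (a, b) in enumerate(zip(coder_a, coder_b)) if a != b]
print("Disagreement indices:", disagreements)
```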