The final speaker in this excellent opening session at I-POLHYS 2024 is the equally excellent Fabio Giglietto from the Vera.AI project, whose focus is on political partisanship and polarisation in the Italian media. Especially noteworthy here is that his project explores the use of Large Language Models (LLMs) in news and social media research – a new approach that also demands a great deal of new validation work.
The project focusses on the 2022 Italian election, starting with 100,000 posts from 224 Italian news media Facebook pages. These were sampled down to some 12,600 posts on political topics from 12 selected Italian news outlets; a specially trained OpenAI LLM was already used at this stage to identify whether posts addressed political topics or not.
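The filtering step above can be sketched as follows. This is a minimal illustration only: the trivial keyword stub stands in for the fine-tuned OpenAI model the project actually used, and in practice the `classify` callable would wrap an API call to that model; all names here are hypothetical.

```python
# Sketch of the post-filtering step: a classifier decides whether a
# Facebook post addresses political topics. A keyword stub stands in
# for the project's fine-tuned OpenAI LLM.

def filter_political_posts(posts, classify):
    """Keep only the posts that the classifier labels as political."""
    return [p for p in posts if classify(p)]

# Hypothetical stand-in for the fine-tuned LLM classifier.
POLITICAL_KEYWORDS = {"election", "party", "government", "minister"}

def keyword_stub(post):
    words = set(post.lower().split())
    return bool(words & POLITICAL_KEYWORDS)

posts = [
    "The government announced new election rules",
    "Ten pasta recipes for summer",
]
political = filter_political_posts(posts, keyword_stub)
```

Injecting the classifier as a callable keeps the pipeline testable: the stub can later be swapped for a real model call without changing the filtering logic.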
These posts then underwent an embedding-based topic modelling process, again drawing on LLMs, to identify common topics; LLMs were also used to label the 94 topic clusters that emerged from this step. To do so, the posts from each cluster that aligned most closely with its topic were sampled for the labelling process; human intervention then merged similar emerging clusters to arrive at a final list of 56 topics.
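The exemplar-selection step – picking the posts in each cluster most aligned with its topic, to hand to an LLM for labelling – might look roughly like this. The random vectors here are toy stand-ins for real LLM embeddings, and the function name is hypothetical.

```python
import numpy as np

# Sketch: for one topic cluster, find the posts whose embeddings sit
# closest (by cosine similarity) to the cluster centroid. These
# exemplars would then be sent to an LLM for topic labelling.

def top_aligned(embeddings, labels, cluster, k=3):
    """Indices of the k posts most aligned with the cluster centroid."""
    idx = np.where(labels == cluster)[0]
    vecs = embeddings[idx]
    centroid = vecs.mean(axis=0)
    sims = vecs @ centroid / (
        np.linalg.norm(vecs, axis=1) * np.linalg.norm(centroid) + 1e-12
    )
    return idx[np.argsort(-sims)[:k]]

rng = np.random.default_rng(0)
emb = rng.normal(size=(10, 8))          # toy embeddings for 10 posts
labels = np.array([0, 0, 0, 0, 0, 1, 1, 1, 1, 1])  # toy cluster labels
exemplars = top_aligned(emb, labels, cluster=0, k=3)
```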
This was then combined with Fabio’s Multi-Party Media Partisanship Attention Score (MP-MPAS) to determine how much attention each topic received from the supporters of the various parties in the Italian political system. In the end, this produced a distinction between topics aligned with centre-right, centre-left, Five Star Movement, and ItalExit-related partisan groups. Further analysis then connected this with the level of positive or negative sentiment that these Facebook posts received: campaign-related topics received mostly positive reactions, while issue-based topics received more negative reactions.
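To illustrate the general idea of a party-by-topic attention table – what share of each partisan audience's engagement goes to each topic – here is a toy aggregation. This is emphatically not Giglietto's actual MP-MPAS formula, just a sketch of the kind of table such a score builds on; the data and function name are invented.

```python
from collections import defaultdict

# Toy sketch of a partisan attention table: for each partisan audience,
# the share of its total engagement that each topic receives. This is
# NOT the actual MP-MPAS formula.

def attention_shares(engagements):
    """engagements: list of (party, topic, count) -> {party: {topic: share}}."""
    totals = defaultdict(int)
    by_topic = defaultdict(lambda: defaultdict(int))
    for party, topic, count in engagements:
        totals[party] += count
        by_topic[party][topic] += count
    return {
        party: {t: c / totals[party] for t, c in topics.items()}
        for party, topics in by_topic.items()
    }

data = [  # invented engagement counts
    ("centre-right", "immigration", 300),
    ("centre-right", "economy", 100),
    ("centre-left", "economy", 200),
    ("centre-left", "immigration", 200),
]
shares = attention_shares(data)
```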
There is a lot more to do in the development, testing, and verification of these LLM-assisted techniques, of course – but there are also substantial opportunities here. Significant challenges remain in accessing sufficient data for such work, however, with respect to both news and social media data. And for all the justified concerns about the use of LLMs in this work, we should also be careful about uncritically treating human coding as a gold standard against which we measure LLM results, given the vagaries of human coding in its own right.