The third speaker in this AoIR 2023 session is the excellent Fabio Giglietto, who also works with the URL shares dataset provided by Facebook via Social Science One. He uses the generative artificial intelligence tools now provided by OpenAI to examine the themes in, and the partisan attention to, the discourse surrounding the 2018 and 2022 Italian election campaigns.
The URL shares dataset is centred on users’ engagement with URLs, and contains some random Gaussian noise designed to prevent the re-identification of users. The present project extracted the title and description of political URLs mainly viewed by Italian users within the three months before each election day (around 60,000 in 2018, and only 8,000 in 2022); political URLs were selected from the somewhat larger total datasets by using a manually coded sample dataset to fine-tune an OpenAI-operated URL filter.
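To illustrate what such noise means for analysis, here is a minimal sketch in Python. The noise scale (`sigma`), the click counts, and the function name are all invented for illustration; the talk did not state the actual noise parameters. The point is only that individual noisy values can be far off (even negative), while totals over many URLs remain comparatively stable.

```python
import random

def add_gaussian_noise(values, sigma, seed=0):
    """Perturb each value with independent zero-mean Gaussian noise."""
    rng = random.Random(seed)
    return [v + rng.gauss(0.0, sigma) for v in values]

# Invented example: 2,000 URLs that each truly received 5 clicks.
true_clicks = [5] * 2000
# sigma=10.0 is an assumption, not the dataset's real noise scale.
noisy_clicks = add_gaussian_noise(true_clicks, sigma=10.0)

total_true = sum(true_clicks)
total_noisy = sum(noisy_clicks)
relative_error = abs(total_noisy - total_true) / total_true

print(f"smallest noisy value: {min(noisy_clicks):.1f}")
print(f"relative error of the total: {relative_error:.3f}")
```

This is why analyses of such data tend to work with aggregates (per cluster, per time period) rather than with the figures for any single URL.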
The project then used the OpenAI API’s embedding end-point to extract word embeddings that describe the texts (titles and blurbs) associated with these URLs, and used these embeddings to group similar texts together through k-means clustering, producing 27 and 24 clusters for the two election periods, respectively. Finally, Fabio’s team also used OpenAI to generate descriptive labels for each of these clusters, which required the selection of a sample of texts to be fed to the OpenAI classifier system.
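The clustering step can be sketched as follows. This is not the team's code: the three-dimensional vectors are toy stand-ins for real embeddings (which have far more dimensions and would come from the embedding end-point), and the deterministic farthest-point initialisation is a simplification chosen so that the example is reproducible. In practice one would more likely reach for a library implementation such as scikit-learn's `KMeans`.

```python
import math

def kmeans(vectors, k, max_iterations=50):
    """Plain k-means with a deterministic farthest-point initialisation."""
    # Start from the first vector, then repeatedly add the vector that is
    # farthest from all centroids chosen so far.
    centroids = [list(vectors[0])]
    while len(centroids) < k:
        farthest = max(
            vectors, key=lambda v: min(math.dist(v, c) for c in centroids)
        )
        centroids.append(list(farthest))

    labels = None
    for _ in range(max_iterations):
        # Assignment step: each vector joins its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: math.dist(v, centroids[c]))
            for v in vectors
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [v for v, lab in zip(vectors, new_labels) if lab == c]
            if members:
                centroids[c] = [sum(dim) / len(members) for dim in zip(*members)]
        if new_labels == labels:  # assignments stable: converged
            break
        labels = new_labels
    return labels, centroids

# Toy "embeddings": two invented groups of three texts each.
economy_texts = [[0.90, 0.10, 0.00], [0.80, 0.20, 0.10], [0.85, 0.15, 0.05]]
health_texts = [[0.10, 0.90, 0.20], [0.00, 0.80, 0.30], [0.05, 0.85, 0.25]]
vectors = economy_texts + health_texts

labels, centroids = kmeans(vectors, k=2)
print(labels)
```

Note that k-means needs the number of clusters as an input; how the values of 27 and 24 were arrived at was not covered here.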
This process worked remarkably well, and the results of the classification process were then merged back with the additional interaction data for these URLs that are available from the URL shares dataset. Cluster labels differ widely depending on the specific OpenAI model being used, however, and this requires further investigation.
In the end, this produces a set of insights into the election themes, pointing to a range of similarities and differences across the two elections that is surprising given the substantial disruption of the COVID-19 pandemic that occurred between them. It was also possible to investigate the click-through rates for the links being encountered: information about candidates and parties, and COVID-19 scepticism, in fact produced particularly high click-through rates despite generally low view counts. Partisan attention was also unevenly distributed: supporters of the ItalExit party, for instance, were especially interested in COVID-19 scepticism. There was also a surprising overall decrease in the use of Facebook between the two elections, with a strong demographic shift in views towards older users, and a substantial presence of antagonistic content in both elections.
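For readers wondering how cluster labels and interaction data combine into such click-through rates, here is a small sketch. All URLs, field names, and numbers are invented for illustration and do not reflect the actual dataset schema or the project's results; the click-through rate is taken here simply as total clicks divided by total views per cluster.

```python
from collections import defaultdict

# Hypothetical output of the labelling step: URL -> cluster label.
cluster_labels = {
    "https://example.org/a": "Candidates and parties",
    "https://example.org/b": "Candidates and parties",
    "https://example.org/c": "Economy",
    "https://example.org/d": "Economy",
}

# Hypothetical interaction records (field names are assumptions).
interactions = [
    {"url": "https://example.org/a", "views": 1000, "clicks": 150},
    {"url": "https://example.org/b", "views": 500, "clicks": 60},
    {"url": "https://example.org/c", "views": 20000, "clicks": 600},
    {"url": "https://example.org/d", "views": 10000, "clicks": 300},
]

def click_through_rates(labels_by_url, rows):
    """Join rows to cluster labels by URL, then compute clicks / views."""
    views = defaultdict(int)
    clicks = defaultdict(int)
    for row in rows:
        label = labels_by_url.get(row["url"])
        if label is None:  # skip URLs that were not clustered
            continue
        views[label] += row["views"]
        clicks[label] += row["clicks"]
    return {label: clicks[label] / views[label]
            for label in views if views[label] > 0}

rates = click_through_rates(cluster_labels, interactions)
for label, rate in sorted(rates.items()):
    print(f"{label}: {rate:.1%}")
```

In this invented example the smaller cluster has far fewer views but a much higher click-through rate, which is the kind of pattern described above.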