Reviewing the Performance of Automated Incivility Classifiers

The next speaker in this I-POLHYS 2024 session is Patrícia Rossini, who is also focussing on incivility. She begins by noting that incivility is a feature, not a bug, of social media, and that conventional empirical research into incivility on social media tends to examine its blatant forms (name-calling, profanity) rather than adopting more sophisticated perspectives.

Off-the-shelf solutions for such research, like the Google Perspective API, also tend to implement these fairly generic ideas, and often produce little more than a binary score that shows whether incivility is or is not present. Such tools are often trained for industry rather than scholarly purposes, but are nonetheless used as shortcuts in research projects; we need to consider their utility in scholarly contexts carefully, and should not take their results at face value.
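
To illustrate how such tools typically enter research pipelines, here is a minimal Python sketch of a Perspective API query; the endpoint, TOXICITY attribute, and response structure follow the public API, but the key placeholder, the sample text, and the 0.7 cut-off are purely illustrative assumptions:

```python
import requests

API_KEY = "YOUR_API_KEY"  # placeholder: a real key must be obtained from Google
URL = ("https://commentanalyzer.googleapis.com/v1alpha1/"
       "comments:analyze?key=" + API_KEY)

def toxicity_score(text: str) -> float:
    """Return the Perspective API's summary TOXICITY score (0-1) for a text."""
    payload = {
        "comment": {"text": text},
        "requestedAttributes": {"TOXICITY": {}},
    }
    response = requests.post(URL, json=payload, timeout=10)
    response.raise_for_status()
    return response.json()["attributeScores"]["TOXICITY"]["summaryScore"]["value"]

# Research projects frequently reduce this continuous score to a binary
# civil/uncivil label; the 0.7 cut-off here is an arbitrary illustration.
score = toxicity_score("Example tweet text goes here.")
print(score, "uncivil" if score >= 0.7 else "civil")
```

Note that the API itself returns a continuous score; it is the routine thresholding into a civil/uncivil binary, as sketched above, that discards much of the conceptual nuance.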

With such oversimplified approaches the concept of incivility loses its meaning, and their dictionary-based approaches tend to emphasise ‘bad language’ rather than more complex forms of toxicity. Patrícia therefore tested these tools against a human-coded, gold-standard dataset of uncivil interactions, in order to evaluate their performance and to focus more directly on inherently harmful forms of incivility, such as intolerance.
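
A minimal sketch of what such an evaluation might look like, assuming a list of binary human gold-standard labels and the corresponding thresholded classifier judgements (both invented here for illustration), using scikit-learn:

```python
from sklearn.metrics import classification_report

# Hypothetical gold-standard labels from the human coders (1 = uncivil, 0 = civil)
gold = [1, 0, 1, 1, 0, 0, 1, 0]

# Hypothetical binary judgements from an off-the-shelf classifier,
# e.g. Perspective API scores thresholded at some cut-off
predicted = [1, 0, 0, 1, 0, 1, 0, 0]

# Per-class precision and recall reveal which forms of incivility the
# automated tool systematically misses relative to the human gold standard
print(classification_report(gold, predicted, target_names=["civil", "uncivil"]))
```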

The project, funded by Twitter, was able to sample from the full Twitter firehose, and selected tweets related to immigration discussions; the training data of over 18,000 US and 24,000 UK tweets were coded by undergraduate students, and the same tweets were also run through a toxicity classifier. The project also produced a longitudinal analysis of some 5 million tweets over time.
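
One plausible shape for such a longitudinal analysis, sketched here in pandas with an invented toy corpus and an assumed three-way labelling scheme (civil / uncivil / intolerant), is to track the share of each label per month:

```python
import pandas as pd

# Hypothetical classified corpus: one row per tweet, with a timestamp and a
# three-way label assigned by the trained classifier
tweets = pd.DataFrame({
    "created_at": pd.to_datetime(
        ["2021-01-05", "2021-01-20", "2021-02-03", "2021-02-17"]),
    "label": ["civil", "uncivil", "civil", "intolerant"],
})

# Longitudinal view: the share of each label per calendar month
counts = (tweets
          .groupby([tweets["created_at"].dt.to_period("M"), "label"])
          .size()
          .unstack(fill_value=0))
print(counts.div(counts.sum(axis=1), axis=0))
```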

Profanity and insults produced the highest toxicity scores from the automated classifiers, while more sophisticated forms of incivility (discrimination, hostility, character assassination) were not rated highly by these classifiers. The longitudinal study, using newly trained and more sophisticated classifiers, showed that some 75% of all tweets were not uncivil, while incivility and intolerance together made up the remainder.

This shows that simplistic measures of toxicity tend to fail to pick up on actually harmful content. Existing approaches mainly detect profanity and insults, but it is not clear that these forms of incivility actually harm democracies; there is a need to zoom in much more purposefully on actually harmful forms of uncivil online discourse. The recent emergence of Large Language Models provides a significant opportunity to do so – but this needs to be done carefully, so that they do not simply reproduce existing biases about what is considered uncivil. Additionally, we must move beyond studies of Twitter, and beyond a focus on textual content only – this research needs to extend to visual and audiovisual materials.