The final speaker in this session at the workshop of the Bots Building Bridges project is Holger Heppner, whose focus is on counterspeech to problematic information. Counterspeech techniques include behavioural approaches (referencing social norms and warning of the consequences of breaking them), emotional approaches (empathy, humour, retaliation), and cognitive approaches (debating, and pointing out inconsistencies); in addition, there are also more mechanical approaches like direct regulation and indirect interventions like downvoting or flagging problematic content.
Picking some of these options – highlighting inappropriateness (behavioural), evoking compassion for targets (emotional), and presenting additional facts (cognitive) – Holger then tested the effectiveness of counterspeech in reducing the potential harm of toxic comments. This was operationalised as a ‘harm score’ for toxic comments, as elicited from participants, which points to the most effective techniques.
Each of the three selected approaches generally produced a small but significant reduction in the harm score, yet the patterns remain uneven, and the effect also depends on the nature of the original toxic comment.
The generation of such counterspeech might also be automated via Large Language Models, but in that case the question of whether the counterspeech is flagged as computationally generated also becomes important; as it turns out, the source of the counterspeech does not seem to make any difference to its effectiveness. The automation of counterspeech also holds other potentially problematic aspects, however.