Snurblog — Axel Bruns

Challenges in Building Moderation Bots

Snurb — Thursday 12 June 2025 22:18
Artificial Intelligence | Social Media | Bots Building Bridges 2025 | Liveblog

The next speakers at the Bots Building Bridges project workshop are Zlata Kikteva and Arthur Romazanov, representing the DeLab (or Deliberation Laboratory) team at the University of Passau. Their team has developed a bot to take on moderation tasks.

This builds on research by members of the team on how humans moderate online discussions, which has explored key moderation strategies: soft moderation such as probing for elaborations, tone policing, social norm policing, agenda control, fact-checking, and inviting experts to contribute, as well as hard moderation such as removing content and users from the discussion.

But can we delegate such moderation effectively to bots? Such moderation might need diverse background and contextual knowledge on the discussion topic, the nature of the community, and the specific participants involved in order to determine appropriate courses of action; could bots ever model these human attributes?

At the same time, automated moderation remains desirable since human moderation is labour-intensive, expensive, challenging for large content volumes, not available around the clock, and especially difficult to sustain for small platforms; human moderators can also be perceived to be more biased than automated systems.

This project conducted ethnographic field work on the small-scale FoodLog network as well as a self-moderation study on Xitter and intervention studies on Reddit and Mastodon; this produced some 211 conversation samples, around 41% of which carried 'true' labels.

From this, it was possible to identify nine features that could be operationalised in the making of moderation decisions; these features are not all necessarily equivalent to each other (e.g. the detection of hate speech would always lead to an intervention, irrespective of other features), but a combination of these assessments can then be used to determine whether a given comment reaches a set threshold for intervention.
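Purely as an illustration of that kind of decision logic (the feature names, weights, and threshold below are hypothetical, not the DeLab team's actual operationalisation), a combined assessment with a hard override for hate speech might be sketched along these lines:

```python
# Hypothetical sketch: combine weighted feature assessments against a threshold,
# with some features (e.g. hate speech) always triggering an intervention.

FEATURE_WEIGHTS = {
    "needs_elaboration": 0.4,
    "off_topic": 0.5,
    "uncivil_tone": 0.7,
    "unsupported_claim": 0.6,
    # ... further features as operationalised by the project
}

AUTO_TRIGGER_FEATURES = {"hate_speech"}  # always intervene, regardless of other scores
INTERVENTION_THRESHOLD = 1.0             # illustrative cut-off


def should_intervene(feature_scores: dict[str, float]) -> bool:
    """Return True if a comment's feature assessments warrant an intervention."""
    # Hard override: some detections always lead to an intervention.
    if any(feature_scores.get(f, 0.0) > 0 for f in AUTO_TRIGGER_FEATURES):
        return True
    # Otherwise, combine the weighted feature scores and compare to the threshold.
    combined = sum(FEATURE_WEIGHTS.get(f, 0.0) * s for f, s in feature_scores.items())
    return combined >= INTERVENTION_THRESHOLD
```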

If the threshold is met, this triggers a prompt for a Large Language Model to produce the content of the intervention in response to the preceding conversation.
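A minimal sketch of this final step, assuming (for illustration only) an OpenAI-style chat completions client; the model name and prompt wording here are assumptions rather than the project's actual implementation:

```python
from openai import OpenAI

client = OpenAI()  # assumes an API key is available in the environment


def generate_intervention(conversation: list[str]) -> str:
    """Generate a moderation comment in response to the preceding conversation."""
    thread = "\n".join(conversation)
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # illustrative model choice
        messages=[
            {"role": "system",
             "content": "You are a discussion moderator. Write a brief, polite "
                        "intervention that addresses the problems in the latest comment."},
            {"role": "user", "content": thread},
        ],
    )
    return response.choices[0].message.content
```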
