The final session of the Bots Building Bridges workshop project in Bielefeld starts with Gabriella Lapesa and Julia Romberg, whose focus is on e-deliberation as a digitally augmented version of direct democracy. This is said to have huge potential, but as yet does not scale up effectively: quality declines as scale increases. One solution to this is moderation, but human moderation is time-consuming and therefore costly.
Moderators must first decide whether to intervene in the discussion at all; if so, what type of intervention – policing, quality control, fact-checking, mediation, the introduction of new ideas, summarisation – is required, and how it should be delivered. The moderator then also needs to monitor the outcome of the intervention, and potentially intervene again later.
These are initially evaluation and classification tasks, followed by a content generation task. Natural Language Processing tools may be able to support the evaluation tasks, but the signals available from online discussions are noisy because they are free-form and human-generated; focussing on past moderator actions alone is also insufficient, because an absence of moderation does not necessarily mean that no moderation was needed. Training data will also be highly domain-specific, and judgments about which interventions are appropriate and successful are necessarily subjective.
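As a minimal sketch of the evaluation side, the first step might be framed as a supervised text classification problem: does this comment need a moderator intervention at all? The comments, labels, and model choice below are invented for illustration only; real systems would train on annotated deliberation corpora.

```python
# Hypothetical sketch: "does this comment need moderation?" as binary
# text classification. Toy comments and labels are invented examples.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

comments = [
    "You clearly haven't read the proposal, this is nonsense.",
    "I support the new cycle lanes because they reduce traffic.",
    "Source? That statistic sounds made up.",
    "Has anyone considered how this affects renters specifically?",
]
needs_moderation = [1, 0, 1, 0]  # 1 = moderator intervention advisable

classifier = make_pipeline(TfidfVectorizer(), LogisticRegression())
classifier.fit(comments, needs_moderation)

print(classifier.predict(["This idea is idiotic and so are you."]))
```

A second-stage classifier of the same kind could then assign the type of intervention (policing, fact-checking, summarisation, and so on), with the noisy-signal and missing-label problems noted above applying to both stages.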
How do we proceed here, then? First, it is possible to draw on annotated datasets of argument and deliberation quality to assess ‘good’ interventions, and connect this with theories of argument quality and deliberative quality. Such theories might highlight logical, rhetorical, and dialectical dimensions, but do not always agree on which attributes are desirable (at the rhetorical level, for example, persuasive argumentation vs. personal narratives).
An application of these ideas shows that moderators react to both low-quality and high-quality comments, so quality ratings alone are insufficient; composite assessments of various comment attributes are more useful for predicting which comments require moderation.
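A composite assessment might look something like the following sketch, where each comment is described by several attribute ratings rather than a single quality score; the attribute names, values, and model are purely illustrative assumptions.

```python
# Hypothetical sketch of a composite assessment: several per-comment
# attribute ratings (invented here) feed a classifier that predicts
# whether a moderator reaction is likely.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# columns: justification level, constructiveness, emotionality, relevance
comment_attributes = np.array([
    [0.9, 0.8, 0.2, 0.9],   # well-argued, on-topic
    [0.1, 0.2, 0.9, 0.4],   # heated, poorly justified
    [0.8, 0.9, 0.3, 0.9],   # high quality -- may still attract moderation
    [0.2, 0.1, 0.8, 0.3],
])
moderator_reacted = np.array([0, 1, 1, 1])

model = RandomForestClassifier(n_estimators=100, random_state=0)
model.fit(comment_attributes, moderator_reacted)
print(model.feature_importances_)  # which attributes drive the prediction
```

The appeal of this design is that a single overall quality score would conflate the very different reasons why moderators step in, whereas separate attributes let the model learn that both highly emotional and highly constructive comments can attract attention.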
Discourse quality assessment remains highly subjective, however, and is often related to our stance towards a comment’s argument, which in turn may also be influenced by our own personal attributes, experiences, and positionality. Standard NLP approaches tend to adopt a majority perspective and generate a single assessment label; a more useful approach here might be to generate multiple competing labels by training classifiers on non-aggregated datasets that represent diverse human perspectives, which could then be used differently in different contexts (e.g. for moderating different groups and themes).
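A minimal sketch of such a ‘perspectivist’ setup, assuming a small non-aggregated dataset where each annotator has labelled every comment (all data below is invented), might train one classifier per annotator and report their possibly conflicting verdicts side by side:

```python
# Hypothetical sketch: one classifier per annotator perspective, trained
# on non-aggregated labels, so a new comment gets multiple assessments.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

comments = [
    "Immigrants are a burden on the welfare system.",
    "We should fund more public transport.",
    "Anyone who disagrees with this plan is a fool.",
    "My family was affected by this policy, and here is what happened.",
]
# Non-aggregated labels: each annotator judged every comment (1 = problematic).
labels_by_annotator = {
    "annotator_A": [1, 0, 1, 0],
    "annotator_B": [0, 0, 1, 0],
    "annotator_C": [1, 0, 1, 1],
}

perspective_models = {}
for annotator, labels in labels_by_annotator.items():
    model = make_pipeline(TfidfVectorizer(), LogisticRegression())
    model.fit(comments, labels)
    perspective_models[annotator] = model

new_comment = ["This proposal ignores people like me entirely."]
for annotator, model in perspective_models.items():
    print(annotator, model.predict(new_comment)[0])
```

Which of the competing labels is acted upon could then be decided downstream, depending on the community, topic, or moderation policy in question.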
Only a minority of widely accepted training datasets are available in non-aggregated form, however, and those that are may not be entirely reliable; and because reliability here cannot be measured by inter-annotator agreement, it remains difficult to assess at all.
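To illustrate why the standard check breaks down (with toy numbers only): the usual reliability measure is inter-annotator agreement, but on deliberately non-aggregated, perspective-dependent labels a low score may simply reflect legitimate differences in positionality rather than careless annotation.

```python
# Illustrative only: a low agreement score (e.g. Cohen's kappa) between
# two annotators need not indicate an unreliable dataset when their
# disagreement reflects genuine differences in perspective.
from sklearn.metrics import cohen_kappa_score

annotator_A = [1, 0, 1, 0, 1, 1, 0, 0]
annotator_B = [0, 0, 1, 1, 0, 1, 0, 1]  # disagrees on half the items

print(cohen_kappa_score(annotator_A, annotator_B))  # low, yet possibly
# a sign of perspective diversity rather than annotation noise
```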
This means that we need more datasets to work with. These could be generated by identifying cases where users act like moderators even though they have no special moderation powers (e.g. by asking others for further detail); reliable annotation of such user-led moderation actions remains complicated by annotators’ own subjectivity, however.
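A very rough first pass at spotting such user-led moderation might rely on cue phrases like the ones below; these patterns are invented for illustration, and a real pipeline would replace them with annotated examples and a trained classifier, with all the subjectivity caveats just noted.

```python
# Hypothetical sketch: flagging comments in which ordinary users act like
# moderators, e.g. by asking others for further detail or evidence.
# Cue phrases are illustrative assumptions, not an established lexicon.
import re

USER_MODERATION_CUES = [
    r"\bcan you (explain|elaborate|clarify)\b",
    r"\b(do you have|is there) a source\b",
    r"\bwhat do you mean by\b",
    r"\blet'?s (stay|keep) on topic\b",
]

def looks_like_user_moderation(comment: str) -> bool:
    """Return True if the comment matches any moderator-like cue phrase."""
    text = comment.lower()
    return any(re.search(pattern, text) for pattern in USER_MODERATION_CUES)

print(looks_like_user_moderation("Can you elaborate on how this would be funded?"))  # True
print(looks_like_user_moderation("Great idea, I fully agree."))                      # False
```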