
How Trolls Emerge: Do Community Evaluations Generate Negativity?

Day two of Web Science 2016 begins with a keynote by Jure Leskovec, whose interest is in antisocial behaviour in social media spaces. He begins by noting that the Web has moved from a document repository or library to a social space, where users contribute content and provide feedback to each other. Platforms for this include the main social media spaces, as well as Reddit, StackOverflow, and the comment sections of news sites.

These two metaphors for the Web – as a library and as a social space – are very different from each other, especially in how users are policed and controlled. In the latter model, one user's experience is a function of other users' experiences, and the question thus becomes how to keep users engaged and promote positive, constructive behaviour – and how to police the small groups of users who disrupt the community and have a disproportionately large effect on all other users' experiences.

Such trolls sow discord in online communities by starting arguments, upsetting people, and posting inflammatory, extraneous, or off-topic material. Jure has studied this using data from large online community spaces such as CNN, IGN, and Breitbart, covering some 1.7 million users posting 40 million comments. On CNN, some 21% of user comments were deleted, while the other sites deleted only around 2.5% of comments; user banning was used to police between 1% and 3% of accounts.

So who are these trolls, and what can be done about them – including, for instance, voting or early detection mechanisms? Prior research suggests that trolls are sociopaths: that they are antisocial even before they come to a community site. Is this true, or could anyone turn into a troll? We might instead suspect that a person's current mood affects their likelihood of trolling, and that the discussion context also affects whether someone turns to trolling.

But it is difficult to test this. Jure designed a Mechanical Turk experiment that first sought to put participants into a particular mood, and then required them to participate in an online discussion about US politics. The mood manipulation did indeed have the intended effect on participants' mood.

The comments left by these participants were then assessed by another set of Mechanical Turk workers, and this assessment showed that participants in a negative mood were significantly more likely to write negative comments; further, participants exposed to already highly negative discussions were 60% more likely to contribute negative material themselves. In short, bad moods and bad contexts make people write bad posts, Jure says.

One way for community sites to address this has been to implement functionality for readers to vote other participants' contributions up or down – and to use such voting data in ordering the display of comments on a site. When users evaluate comments in this way, in essence they also evaluate the authors of such comments – does this help those authors write 'better' posts in future, then?

The first question here is how to interpret up- and downvotes: is the relevant measure the number of upvotes, the difference between up- and downvotes, or the ratio between them? Each of these metrics overlooks key aspects of the voting results – and so Jure again used crowdworking tools to assess how contributors would perceive different voting outcomes. The result of this assessment was that the proportion of positive votes among all votes – P/(P+N) – best matched the feedback perceived by the crowdworkers.
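To make the difference between these candidate metrics concrete, here is a minimal sketch (my own illustration, not code from the talk) that scores a comment with P upvotes and N downvotes under each of the three measures:

```python
# Three candidate ways of scoring a comment from P upvotes and N downvotes.

def upvote_count(p, n):
    return p                                   # raw number of upvotes

def vote_difference(p, n):
    return p - n                               # net score

def vote_ratio(p, n):
    return p / (p + n) if (p + n) else 0.5     # proportion of positive votes, P/(P+N)

# Two comments with the same net score can look very different by ratio:
# 10 up / 0 down and 100 up / 90 down both have a difference of +10,
# but ratios of 1.0 and roughly 0.53 respectively.
for p, n in [(10, 0), (100, 90)]:
    print(p, n, upvote_count(p, n), vote_difference(p, n), round(vote_ratio(p, n), 2))
```

Two comments with the same net score can thus carry very different proportions of positive votes, which is presumably why the ratio tracks the feedback that contributors actually perceive more closely.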

Upvotes can thus be seen as a reward, and downvotes as a punishment by the community. But feedback can also have negative effects: persistent positive feedback might lead to complacency; persistent negative feedback could lead to alienation from the community.

How might this be tested? The study identified pairs of users who posted very similar content over a series of three posts but received widely differing evaluations for their fourth post; it then examined the voting evaluations of the following three posts as well. What emerges from this is that positive feedback on the fourth post does not affect the future trajectory, but negative feedback on the fourth post leads to follow-up posts that are themselves evaluated much more negatively: text quality drops significantly after a negative evaluation, but does not change after a positive evaluation. There is also a community bias effect at work here: users who have been evaluated negatively continue to receive significantly more negative evaluations for their subsequent posts.
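As a rough reconstruction of that matched-pairs design (my own sketch, not the authors' code), the pairing step might look something like this, assuming each user record carries per-post text-quality scores and per-post vote ratios:

```python
from itertools import combinations

def similar(user_a, user_b, threshold=0.05):
    """Treat two users as comparable if the mean text-quality scores of
    their first three posts are within `threshold` of each other."""
    qa = sum(user_a["quality"][:3]) / 3
    qb = sum(user_b["quality"][:3]) / 3
    return abs(qa - qb) < threshold

def matched_pairs(users, vote_gap=0.4):
    """Yield (negatively evaluated user, positively evaluated user) pairs whose
    fourth posts received widely differing vote ratios despite similar content."""
    for a, b in combinations(users, 2):
        if not similar(a, b):
            continue
        ra, rb = a["ratio"][3], b["ratio"][3]   # vote ratio of the 4th post
        if rb - ra > vote_gap:
            yield a, b
        elif ra - rb > vote_gap:
            yield b, a
```

The point of the matching is that the two users in each pair are essentially indistinguishable before the fourth post, so any divergence in their later posts can be attributed to the differing feedback rather than to pre-existing differences between them.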

This may also affect a user's future posting frequency – but users with more negative feedback actually come to post more frequently than users who receive more positive feedback, potentially because they are more likely to become engaged in arguments with other participants. And negatively evaluated users also turn more negative in their evaluations of other users' posts.

This seems to point towards a downward spiral in online communities: negatively evaluated users turn more negative, and also evaluate others more negatively; this may lead those users to turn more negative as well, and in turn to affect yet more others. Across the three sites Jure studied, in fact, there was a gradual increase in downvotes, negative sentiment, and other indicators over the six months of the study – and on a range of other sites, this has led to comment features being shut down altogether.

Can such behaviours be detected automatically, and can such downward spiralling be prevented? What signals are available for this? Post content, posting frequency, community reactions, and moderator responses might all be marshalled for this analysis; previous user activity, community reaction, and moderator response all emerge as valuable indicators, and even after a small number of posts it becomes possible to predict relatively reliably which users are likely to be banned in future. Such learning models even transfer reasonably well across communities: a model trained on CNN data still works relatively well for IGN or Breitbart data, for instance.
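A hedged sketch of what such a ban-prediction model might look like in practice follows; the feature names and the choice of logistic regression here are my assumptions for illustration, not details taken from the talk:

```python
# A sketch under assumed feature names; not code from the study.
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import roc_auc_score

def train_ban_predictor(X, y):
    """Fit a simple classifier that predicts future bans from early-activity features.

    X is assumed to hold one row per user, with columns such as text quality of
    the first few posts (content), posts per day (frequency), mean vote ratio
    P/(P+N) (community reaction), and the fraction of posts deleted by
    moderators (moderator response); y marks whether the user was later banned.
    """
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.25, random_state=0)
    model = LogisticRegression(max_iter=1000).fit(X_train, y_train)
    print("held-out AUC:", roc_auc_score(y_test, model.predict_proba(X_test)[:, 1]))
    return model

# Cross-community transfer would then amount to fitting on one community's
# feature table and scoring another's, e.g.:
# model = train_ban_predictor(X_cnn, y_cnn)
# print(roc_auc_score(y_ign, model.predict_proba(X_ign)[:, 1]))
```

The cross-community result reported in the talk corresponds to the commented-out lines: a model fitted on one community's users is scored, without retraining, on another community's users.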

One take-away from this study, then, is that trolls are not necessarily 'special people' who are predestined to disrupt community spaces; rather, mood and context may lead a broad range of users to engage in trolling. Further, (negative) community responses and evaluations may turn other participants into trolls, and this can become contagious through social feedback loops. There are many setup and design parameters in online communities that could be explored to see how different features might affect the development of these downward spirals.