The final iCS Symposium session continues the bot theme with a presentation by Pascal Jürgens. Pascal begins by outlining our current dilemma: threats of communicative manipulation via social media are rising, yet our access to the platform data we need to understand these activities is declining. We may be able to address this dilemma, at least in part, by employing new methodologies.
Interestingly, in Germany there are now moves towards a law that would require bots to be labelled. Yet such a law is unlikely to be effective without a clear definition of what a bot is in the first place, and some agreement about what constitutes a harmful bot. From one perspective, both bots and the manual production of manipulative content are problematic because both attempt to bypass conventional political advertising systems and their regulations.
As bots become more human-like, and as some humans behave in increasingly bot-like ways, it becomes ever more difficult to tell the two apart. Low-activity bots may appear human, for instance, while high-activity humans may appear to be bots (and may indeed operate in a bot-like capacity). Machine learning techniques also tend to be trained only for very specific applications, and bot detection methods likewise tend to be platform-specific.
One new approach to detecting bot comments on Facebook may be locality-sensitive hashing (LSH). This generates hash values for different parts of each comment and then compares the similarity of these hashes across different comments. If the similarity rises above a certain threshold, the comments can be considered (near-)duplicates of each other.
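To make the idea more concrete, here is a minimal sketch of one common LSH scheme for near-duplicate text, MinHash with banding. The talk did not specify which LSH variant was used, so everything here is an assumption for illustration: the character shingle size, the number of hash functions, the band layout, the 0.8 similarity threshold, and the helper names (`shingles`, `minhash`, `near_duplicate_pairs`) are all hypothetical choices rather than the study's actual parameters.

```python
import hashlib
import itertools
from collections import defaultdict

# All parameters below are illustrative assumptions, not the study's settings.
NUM_PERM = 64        # number of MinHash functions
BANDS = 16           # LSH bands; NUM_PERM must be divisible by BANDS
ROWS = NUM_PERM // BANDS
SHINGLE_SIZE = 5     # characters per shingle
THRESHOLD = 0.8      # estimated Jaccard similarity needed to count as near-duplicate


def shingles(text, k=SHINGLE_SIZE):
    """Split a comment into overlapping character k-grams (the 'parts' that get hashed)."""
    text = " ".join(text.lower().split())  # normalise case and whitespace
    if len(text) < k:
        return {text}
    return {text[i:i + k] for i in range(len(text) - k + 1)}


def _hash(seed, s):
    """Deterministic 64-bit hash of a shingle; a different seed gives a different hash function."""
    digest = hashlib.blake2b(f"{seed}:{s}".encode("utf-8"), digest_size=8).digest()
    return int.from_bytes(digest, "little")


def minhash(text):
    """MinHash signature: for each hash function, the minimum hash over all shingles."""
    sh = shingles(text)
    return [min(_hash(seed, s) for s in sh) for seed in range(NUM_PERM)]


def estimated_similarity(sig_a, sig_b):
    """The fraction of matching signature slots estimates the Jaccard similarity."""
    return sum(a == b for a, b in zip(sig_a, sig_b)) / len(sig_a)


def near_duplicate_pairs(comments):
    """Return sorted (i, j) index pairs of comments judged to be near-duplicates."""
    sigs = [minhash(c) for c in comments]
    # Banding: comments whose signatures agree on every row of some band share a bucket.
    buckets = defaultdict(set)
    for idx, sig in enumerate(sigs):
        for b in range(BANDS):
            buckets[(b, tuple(sig[b * ROWS:(b + 1) * ROWS]))].add(idx)
    candidates = set()
    for members in buckets.values():
        candidates.update(itertools.combinations(sorted(members), 2))
    # Only candidate pairs are checked against the similarity threshold.
    return sorted(
        (i, j) for i, j in candidates
        if estimated_similarity(sigs[i], sigs[j]) >= THRESHOLD
    )


if __name__ == "__main__":
    comments = [
        "Die Regierung belügt uns alle, wacht endlich auf!",
        "Die Regierung belügt uns alle, wacht endlich auf!!",
        "Schönes Wetter heute in Berlin.",
    ]
    print(near_duplicate_pairs(comments))
```

Here the first two comments differ only by one character and should be reported as a near-duplicate pair, while the third should not match either. The banding step is what makes LSH scale: rather than comparing every comment with every other, only comments that collide in at least one bucket are compared in full.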
The study found a number of significant clusters of near-duplicate comments on German Facebook pages. In some cases these are concentrated on a single Facebook page, while others, even more suspiciously, are distributed across a substantial number of pages. In total, the study detected nearly 5,000 bot(-like) accounts that posted nearly 300,000 comments.
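The talk did not describe how individual near-duplicate pairs were grouped into clusters. One plausible way, sketched here purely as an assumption, is to treat each pair as a link, merge linked comments transitively with a union-find structure, and then count how many distinct pages and accounts each cluster spans. The functions `cluster_comments` and `summarise` and the `page_of`/`account_of` lookups are hypothetical names for illustration.

```python
from collections import defaultdict


def _find(parent, x):
    """Find the cluster root of x, compressing the path as we go."""
    while parent[x] != x:
        parent[x] = parent[parent[x]]
        x = parent[x]
    return x


def cluster_comments(n, pairs):
    """Group n comments into clusters, linking every near-duplicate (i, j) pair transitively."""
    parent = list(range(n))
    for i, j in pairs:
        ri, rj = _find(parent, i), _find(parent, j)
        if ri != rj:
            parent[rj] = ri
    clusters = defaultdict(list)
    for idx in range(n):
        clusters[_find(parent, idx)].append(idx)
    # Singletons are not duplicates of anything, so drop them.
    return [members for members in clusters.values() if len(members) > 1]


def summarise(clusters, page_of, account_of):
    """For each cluster, count its comments, distinct pages, and distinct accounts."""
    return [
        {
            "comments": len(members),
            "pages": len({page_of[i] for i in members}),
            "accounts": len({account_of[i] for i in members}),
        }
        for members in clusters
    ]


if __name__ == "__main__":
    # Hypothetical data: five comments, with near-duplicate pairs already detected.
    pairs = [(0, 1), (1, 2), (3, 4)]
    page_of = ["page_A", "page_B", "page_C", "page_D", "page_D"]
    account_of = ["acct_1", "acct_2", "acct_1", "acct_3", "acct_3"]
    clusters = cluster_comments(5, pairs)
    print(clusters)
    print(summarise(clusters, page_of, account_of))
```

In this toy example, the first cluster is spread across three pages (the more suspicious pattern), while the second is concentrated on a single page.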
The same method could also be applied to data from many different social media platforms, and the longer comment texts thus identified could then be traced through further forensic analysis to their possible place of origin.