Auditing the Responses of Generative AI Chatbots to Conspiracist Questioning

Snurb — Friday 28 November 2025 14:33

The next speaker in this session at the AANZCA 2025 conference is again my QUT colleague Kate FitzGerald, this time presenting our research into how generative AI chatbots respond to queries about conspiracy theories. We have already seen how engagement with such chatbots can create harm, and it is important to examine what safety guardrails are in place to prevent chatbots from supporting conspiracy theories.

We examined this by assuming the persona of a casually curious chatbot user, asking a series of questions related to various conspiracy theories. These include long-standing stories such as those surrounding the assassination of John F. Kennedy and claims about Barack Obama’s birth certificate, as well as more recent claims about the origins of Hurricane Milton or the 2024 assassination attempt on Donald Trump. None of these claims are true, of course.

We designed between 5 and 15 questions for each of the nine conspiracy theories, and queried seven chatbots during November 2024. We coded their responses into several categories, grouped as neutral, constructive, or problematic – from merely describing the conspiracy theory to pushing back or even supporting it.

Responses to these queries varied widely. For recent events, several of the chatbots refused to engage, both because of these events’ connection with electoral matters and because of the recency of the information; Grok Mini 2 “Fun Mode”, however, consistently made fun of such stories rather than pushing back. It produced by far the most problematic responses, including bothsidesing and encouragement to engage further with conspiracist claims. Perplexity, by contrast, responded most reliably to these queries by countering with factual statements and providing verified sources.

Some guardrails appear to be in place, perhaps especially for queries related to racist conspiracy theories and to issues of national trauma such as 9/11. Grok, especially in its seriously unfunny “Fun Mode”, is the most problematic performer by far. There are also significant variations between the conspiracy theories, however, with older conspiracy theories being treated by the chatbots as less problematic.

How do we want chatbots to respond, though? Gemini’s persistent pattern of avoiding controversy is perhaps not helpful either, as it leaves the space open for conspiratorial content to circulate unchecked; however, hard pushback against conspiracist ideation might also backfire by further entrenching the beliefs of the already curious.

There is also a need to look beyond this limited study and explore chatbot performance across a wider range of languages other than English, and for a broader range of conspiracy theories. Fewer guardrails may be in place for these contexts. But such audits are time-consuming, and need to be repeated frequently – this cannot be the job of scholarly researchers alone, but needs broader buy-in from other independent, critical, public-interest watchdog organisations.
