
Using LLMs to Assess Bullying in the Australian Parliament?

The next speaker in this ACSPRI 2024 conference session is Sair Buckle, whose interest is in the use of Large Language Models to detect bullying language in organisational contexts. Bullying is of course a major societal problem, including in companies, and presents a psychosocial hazard. Several approaches have been proposed to address it: surveys, interviews, and manual linguistic classification (e.g. in federal parliament), which are subjective and labour-intensive; pulse surveys and self-labelling questionnaires (e.g. in companies), which are also subjective and limited in their data access; and technology-first approaches that use LLMs and machine learning to detect patterns (e.g. on social media platforms), which are computationally intensive and trained on open-source data.

The problem with the latter approach is that LLMs trained on social media posts will not necessarily detect the forms of bullying found in the workplace or in other contexts, where different language and communicative strategies are likely to be used. Such LLMs might serve as a foundation, but they need to be substantially adjusted to become useful in other contexts.

Sair’s project worked with Hansard transcripts from question time in the Australian federal parliament, and had them annotated by two to three clinical psychologists from diverse backgrounds in order to detect cases of bullying as defined by SafeWork standards. Some 18% of cases contained bullying language, and these human annotations were then compared against zero-shot coding of the same content by the language models RoBERTa and Llama 3.
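
For illustration, a zero-shot coding step along these lines might look something like the sketch below, using Hugging Face's zero-shot-classification pipeline with an NLI-fine-tuned RoBERTa model. The specific model, candidate labels, and example utterance are my own assumptions for demonstration, not details of Sair's actual pipeline.

```python
from transformers import pipeline

# Minimal sketch of zero-shot classification with a RoBERTa model fine-tuned on NLI.
# Model choice, labels, and example text are illustrative assumptions only.
classifier = pipeline(
    "zero-shot-classification",
    model="roberta-large-mnli",
)

utterance = "The member opposite is clearly too incompetent to understand the question."
result = classifier(
    utterance,
    candidate_labels=["bullying language", "robust but acceptable debate"],
)

# The pipeline returns labels sorted by score; flag the utterance if
# "bullying language" ranks first.
predicted_label = result["labels"][0]
print(predicted_label, round(result["scores"][0], 3))
```

A generative model such as Llama 3 would instead be prompted with the utterance and the coding instructions, but the overall logic of assigning a bullying / not-bullying label per transcript segment is the same.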

The results were very limited: in addition to true positives and true negatives, there were also many false positives and false negatives. RoBERTa’s misclassifications appeared to be sentiment-based; Llama 3’s errors did not seem to follow any particular pattern. The models clearly need further training, then, and this is the next step in the process.
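
For readers less familiar with this evaluation terminology, the comparison against the psychologists' annotations can be summarised in a confusion matrix and the usual precision/recall scores, roughly as in the sketch below; the label vectors here are invented purely for illustration.

```python
from sklearn.metrics import confusion_matrix, precision_recall_fscore_support

# Toy comparison of model codings against human (psychologist) annotations.
# 1 = bullying language, 0 = no bullying language; values are made up.
human = [1, 0, 0, 1, 0, 1, 0, 0]
model = [1, 1, 0, 0, 0, 1, 1, 0]

tn, fp, fn, tp = confusion_matrix(human, model).ravel()
precision, recall, f1, _ = precision_recall_fscore_support(
    human, model, average="binary", zero_division=0
)
print(f"TP={tp} FP={fp} FN={fn} TN={tn}")
print(f"precision={precision:.2f} recall={recall:.2f} f1={f1:.2f}")
```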