Snurblog — Axel Bruns


Assessing Semantic Drift in Grokipedia as Its LLM Rewrites Wikipedia Content

Snurb — Thursday 19 March 2026 19:20
Wikipedia | Internet Technologies | Artificial Intelligence | Social Media Access Days 2026 | Liveblog

The next speaker in the morning session at the Social Media Access Days at the German National Library is Veronica Batzendorfer, whose interest is in the Xitter-adjacent Grokipedia project as an ‘alternative’ to Wikipedia. Specifically, her focus is on semantic drift across Grokipedia’s versions, starting with material drawn in part from Wikipedia and then rewritten by LLMs.

Such semantic drift can be examined through a geometric framework. This works with two points in time and explores the changes between them, across Wikipedia and Grokipedia. Wikipedia content serves as the training data for the LLM which generates articles for Grokipedia, so how similar or different are the input and output articles on the same topics?

The focus here is on entries for the 1,000 largest cities in the US and Germany: cities are in principle apolitical ground, but they have political issues which may be covered differently in the two encyclopaedias. Semantic drift between articles might also result from the length of the source Wikipedia article (a longer article provides more opportunity for difference), but versions may also converge again over time.

Such comparisons are done via sentence embeddings in a high-dimensional space, and article-level comparisons within a lower-dimensional manifold; in the end, this produces a semantic drift vector with a specific magnitude and direction. This allows for comparisons between Wikipedia and Grokipedia versions 0.1 and 0.2, and between the two Grokipedia versions themselves.
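To make the idea of a drift vector with a magnitude and a direction more concrete, here is a minimal sketch. It rests on assumptions that the talk did not confirm: sentence embeddings are simply averaged into one article-level vector, the projection onto a lower-dimensional manifold is skipped entirely, and the function names and toy two-dimensional embeddings are invented for illustration only.

```python
import math

def centroid(sentence_embeddings):
    """Mean-pool sentence embeddings into one article-level vector (an assumption)."""
    n = len(sentence_embeddings)
    dim = len(sentence_embeddings[0])
    return [sum(vec[i] for vec in sentence_embeddings) / n for i in range(dim)]

def drift_vector(source_sentences, rewritten_sentences):
    """Difference between the two article-level vectors: rewritten minus source."""
    a = centroid(source_sentences)
    b = centroid(rewritten_sentences)
    return [y - x for x, y in zip(a, b)]

def magnitude(vec):
    """Euclidean length of the drift vector: how far the article moved."""
    return math.sqrt(sum(x * x for x in vec))

def direction(vec):
    """Unit vector of the drift: where the article moved, independent of distance."""
    norm = magnitude(vec)
    if norm == 0:
        return [0.0] * len(vec)  # no drift, so no meaningful direction
    return [x / norm for x in vec]

# Toy 2-d "embeddings" (hypothetical values; real ones have hundreds of dimensions).
wikipedia_sentences = [[1.0, 0.0], [1.0, 2.0]]   # centroid is [1.0, 1.0]
grokipedia_sentences = [[4.0, 5.0], [4.0, 5.0]]  # centroid is [4.0, 5.0]

drift = drift_vector(wikipedia_sentences, grokipedia_sentences)
print(drift)             # [3.0, 4.0]
print(magnitude(drift))  # 5.0
print(direction(drift))  # [0.6, 0.8]
```

The same function could in principle be applied to each pair of versions (Wikipedia and Grokipedia 0.1, Wikipedia and Grokipedia 0.2, and the two Grokipedia versions), yielding one drift vector per article and comparison.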

Such drift is highly directional, and the direction differs for each of these comparisons; it also relates to whether or not Wikipedia was formally attributed in Grokipedia articles. Longer articles produce more drift between Wikipedia and Grokipedia; the update of the Grokipedia platform reduced drift from version 0.1 to 0.2. When articles mentioned political topics (refugees, housing, etc.), drift was substantially larger than when they did not. When articles decoupled from the Wikipedia source, drift was considerably greater, too.
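One common way to check whether two drift vectors point in the same direction is cosine similarity. The sketch below is only an illustration of that general measure, not necessarily the one used in the study; the function name and the toy vectors are made up for the example and are not results from the talk.

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two vectors: 1 = same direction, 0 = orthogonal."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_u * norm_v)

# Hypothetical drift vectors for three comparisons (toy values only).
drift_a = [3.0, 4.0]
drift_b = [-4.0, 3.0]  # same length as drift_a, but at a right angle to it
drift_c = [6.0, 8.0]   # twice as long as drift_a, but pointing the same way

print(cosine_similarity(drift_a, drift_b))  # 0.0
print(cosine_similarity(drift_a, drift_c))  # 1.0
```

Because cosine similarity ignores length, it separates the question of where articles drift from the question of how far they drift, which the magnitude captures.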

But this is just a first snapshot of such content evolution, for a specific subset of entries; a great deal more exploration and analysis is required here to develop more comprehensive findings. These early results are also important as other encyclopaedias are exploring the use of generative AI technologies for producing dynamic content.


Except where otherwise noted, this work is licensed under a Creative Commons BY-NC-SA 4.0 Licence.