Snurblog — Axel Bruns

Testing the Validity of Twitter API Data

Snurb — Friday 20 October 2017 00:12
'Big Data' | Social Media | Twitter | AoIR 2017

The next speaker in this AoIR 2017 session is Rebekah Tromble, whose focus is on the impact of digital data collection methods on scientific inference. When we collect data from social media APIs, how can we know whether we have 'good', valid data?

Twitter, for instance, provides a range of open APIs as well as commercial-quality data access via its subsidiary GNIP. The open streaming API offers up to 1% of the total global Twitter throughput, but can potentially deliver 100% of the tweets matching specific keywords or hashtags; the open search API offers access to historical tweets, but also with significant limitations.
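As a rough illustration of how such a throughput cap plays out, the sketch below models the largest share of matching tweets that a streaming capture could deliver. This is a simplified model under stated assumptions, not Twitter's actual rate-limiting logic; the function name `max_delivered_share` and the example volumes are hypothetical.

```python
def max_delivered_share(matching_per_min: float,
                        global_per_min: float,
                        cap: float = 0.01) -> float:
    """Upper bound on the share of matching tweets delivered under a cap.

    Assumption (not Twitter's documented algorithm): in any period the
    stream may deliver at most `cap` times the global tweet volume.
    """
    if matching_per_min <= 0:
        return 1.0  # nothing matches, so nothing can be lost
    allowance = cap * global_per_min
    return min(1.0, allowance / matching_per_min)


# Hypothetical volumes, for illustration only.
print(max_delivered_share(2_000, 350_000))  # matching volume below the cap
print(max_delivered_share(7_000, 350_000))  # matching volume above the cap
```

Under this model a low-volume hashtag is captured in full, while a high-volume event loses a growing share of its tweets as the matching volume rises above the allowance.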

Rebekah's project ran a number of different data captures across these three data sources: the #jointsession hashtag for President Trump's first address to Congress, the #ahca hashtag for the House of Representatives' failed vote on healthcare, and the #fomc hashtag for the Federal Open Market Committee. It also captured all tweets mentioning @realdonaldtrump on Trump's inauguration day.

For some of these events, the streaming API was substantially rate-limited (at around 65% of all tweets). Search also resulted in only a limited (but larger) subset of tweets for these events. The project then tested the variables that potentially influenced which tweets from the total set of matching tweets (as captured via GNIP) were delivered via the rate-limited open APIs – do user properties or tweet properties influence which tweets are selected?
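A comparison of this kind could begin with something like the following sketch, which measures how much of a full reference set a rate-limited capture covers and then compares one tweet property between delivered and missing tweets. It is not the project's actual method, which is not described in detail here; the record layout, the `followers` field, and the toy data are all hypothetical, and a real analysis would more likely fit a regression model over many variables.

```python
from statistics import mean


def coverage(full_ids, sample_ids):
    """Share of the full reference set that also appears in the sample."""
    full = set(full_ids)
    if not full:
        raise ValueError("full reference set is empty")
    return len(full & set(sample_ids)) / len(full)


def split_means(full, sample_ids, field):
    """Mean of `field` for delivered vs. missing tweets (None if a group is empty)."""
    delivered_ids = set(sample_ids)
    delivered = [t[field] for t in full if t["id"] in delivered_ids]
    missing = [t[field] for t in full if t["id"] not in delivered_ids]
    return (mean(delivered) if delivered else None,
            mean(missing) if missing else None)


# Toy records standing in for a full reference capture (hypothetical data).
full = [
    {"id": 1, "followers": 120},
    {"id": 2, "followers": 15000},
    {"id": 3, "followers": 40},
    {"id": 4, "followers": 900},
]
streamed = [1, 2, 4]  # IDs that a rate-limited capture happened to return

print(coverage([t["id"] for t in full], streamed))
print(split_means(full, streamed, "followers"))
```

A large gap between the two means would be a first hint that delivery is not random with respect to that property, though it would not establish this on its own.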

Search appears to be influenced by a range of variables, while streaming shows a more limited set of factors. Overall, when rate limits do not apply, the streaming API approximates the full tweet population. But for short-term, rate-limited data, the API may well introduce important biases in the dataset collected.

Except where otherwise noted, this work is licensed under a Creative Commons BY-NC-SA 4.0 Licence.