Easy Data, Hard Data? Twitter Research and the Politics of Data Access (CD 2013)

Compromised Data 2013

Axel Bruns and Jean Burgess

Over the first years in the life of the platform, research into the uses of Twitter has evolved and grown rapidly, helped especially by an initially comparatively liberal regime of providing access to large volumes of data on public communication via Twitter through the platform’s Application Programming Interface (API). The API’s capabilities, and the data access it provided, quickly generated a diverse and active community of Twitter researchers exploring and documenting the uses of the platform in a wide range of contexts, and driving the development of new methods for the study of public communication through social media (Rogers, 2013).

Changes to what forms and volumes of data are available through the API, and at what retrieval speeds, have gradually channelled the energy of most Twitter researchers into a number of prevalent directions, however. For example, while during the early years of the platform researchers were able to request and gain ‘whitelisted’, premium access to the API at the discretion of Twitter support staff, enabling them to retrieve a larger amount of data at greater speeds, such access (or the equivalent thereof) is now available only to the paying subscribers of third-party Twitter data resellers such as Gnip or DataSift, at significant cost. Under current API terms and frameworks, standard, unpaid access is suitable in effect only for gathering comparatively small, limited datasets. At the same time, Twitter’s business model has relied progressively more on providing licensed data access to market research companies, in keeping with the rise of the ‘social data’ market (Puschmann & Burgess, 2013), not only further complicating the practical accessibility of the platform to researchers, but also introducing new dimensions of the politics of such research, particularly in the reflexive traditions of the social sciences and humanities (Burgess & Bruns, 2012; Langlois & Elmer, 2013).

As a result, there is now a growing divide between the majority of researchers who are forced to work with ‘easy data’ – pursuing the low-hanging fruit in Twitter research because a lack of funding for data access and research tools development prevents them from doing otherwise – and a minority of researchers and institutes who have the funds to pay Gnip or DataSift, or the technical skills to partially circumvent API restrictions, in order to access more difficult ‘hard data’ about public communication on Twitter. Due to underlying differences in funding patterns, this divide is also one between different disciplinary perspectives: prominent commercial as well as university-based market research and computer science institutes dominate in the latter group, while their poorer cousins from the humanities and social sciences are more likely to be found in the former.

Whether intended by Twitter, Inc. or a collateral outcome of internal priorities, this growing imbalance significantly skews the trajectory of Twitter research, and is thereby also likely to affect the public perception of Twitter as a tool for public communication. Two key interventions are therefore necessary to address this, both of which we will pursue in this paper.

First, we must critically re-examine what is and is not covered by conventional Twitter research as it is practiced by the majority of researchers who are able only to access the ‘easy data’. For example, such work often focusses exclusively on datasets which contain the tweets that match certain keywords and hashtags, to the exclusion of the follow-on and ancillary conversations which do not contain these specific hashtags. What is usually absent from such studies is a more thorough contextualisation of these datasets as a part of everyday Twitter use: for example, was a specific topical hashtag able to attract a broad range of participants, or only the usual suspects? Did its activity constitute a significant portion of the total volume of tweets at the time (overall, or from a given location or group of users)? Did the hashtag join together previously unconnected users, or was it used by an established network of followers and followees? An investigation of such questions provides significantly more context for studies which otherwise remain disconnected, or at least highlights what is missing from much contemporary Twitter scholarship.
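Two of the contextual questions above can be made concrete with a minimal sketch. It is an illustration under stated assumptions rather than a method proposed in this paper: it assumes a hypothetical sample of tweets given as dictionaries with `user` and `text` fields, and a hypothetical set of `(follower, followee)` pairs describing who follows whom. All names, fields, and sample values are invented for the example. The first function estimates what share of the sampled tweets carried a given hashtag; the second estimates what share of the pairs of hashtag participants were already connected by a follower relationship.

```python
from itertools import combinations

PUNCTUATION = ".,!?:;\"'()"


def contains_hashtag(text, hashtag):
    """True if the hashtag appears as a whole token (case-insensitive)."""
    tokens = {word.strip(PUNCTUATION).lower() for word in text.split()}
    return hashtag.lower() in tokens


def hashtag_share(tweets, hashtag):
    """Fraction of the sampled tweets that contain the hashtag."""
    if not tweets:
        return 0.0
    matching = sum(1 for t in tweets if contains_hashtag(t["text"], hashtag))
    return matching / len(tweets)


def hashtag_participants(tweets, hashtag):
    """Set of users who posted at least one tweet containing the hashtag."""
    return {t["user"] for t in tweets if contains_hashtag(t["text"], hashtag)}


def connected_pair_share(participants, follows):
    """Fraction of participant pairs linked by a follow in either direction."""
    pairs = list(combinations(sorted(participants), 2))
    if not pairs:
        return 0.0
    connected = sum(
        1 for a, b in pairs if (a, b) in follows or (b, a) in follows
    )
    return connected / len(pairs)


# Hypothetical sample data, invented purely for illustration.
sample_tweets = [
    {"user": "a", "text": "Flood warning issued #qldfloods"},
    {"user": "b", "text": "Stay safe everyone #QLDfloods!"},
    {"user": "c", "text": "Lunch was great"},
    {"user": "a", "text": "@b thanks, will do"},  # follow-on reply, no hashtag
    {"user": "d", "text": "Roads closed near the river #qldfloods"},
]
sample_follows = {("a", "b"), ("c", "a")}  # (follower, followee)

share = hashtag_share(sample_tweets, "#qldfloods")
participants = hashtag_participants(sample_tweets, "#qldfloods")
prior_ties = connected_pair_share(participants, sample_follows)
```

Note that the fourth tweet in the sample, a reply without the hashtag, would be invisible to a keyword- or hashtag-based collection. Both measures also presuppose a comprehensive background sample of tweets and follower relationships, which is precisely the kind of ‘hard data’ that standard API access makes difficult to obtain.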

Second, in order to address these gaps, we must also continue to problematise the material politics of social media data and the data analytics industry which feeds on such data, and to formulate responses to these politics on behalf of the social media research community. Such responses can lie neither in mere acquiescence nor in a wholesale flouting of Twitter’s Terms of Service; rather, we must seek to highlight the politics of Twitter, Inc. as an example of a wider critique of the politics of platforms (Gillespie, 2013a; 2013b).


Burgess, J., & Bruns, A. (2012). Twitter Archives and the Challenges of “Big Social Data” for Media and Communication Research. M/C Journal, 15(5).

Gillespie, T. (2013a). The politics of “platforms”. A Companion to New Media Dynamics, Eds John Hartley, Jean Burgess and Axel Bruns. London: Wiley-Blackwell, pp. 407-16.

Gillespie, T. (2013b). The relevance of algorithms. Media Technologies, Eds Tarleton Gillespie, Pablo Boczkowski, and Kirsten Foot. Cambridge: MIT Press.

Langlois, G., & Elmer, G. (2013). The research politics of social media platforms. Culture Machine, 14.

Puschmann, C., & Burgess, J. (2013). The politics of Twitter data. HIIG Discussion Paper Series No. 2013-01.

Rogers, R. (2013). Debanalizing Twitter: The transformation of an object of study.