Easy Data, Hard Data, Compromised Data

My QUT DMRC colleague Jean Burgess and I are next at AoIR 2015, presenting the core points from our chapter "Easy Data, Hard Data" in the Compromised Data collection. (The slides are below.) The chapter thinks through the pragmatics and politics of being social media researchers in a complex and precarious environment, and thus builds on David Berry's work on the computational turn in humanities and social science research.

This turn towards large data is instrumental as well as transformational – it has exciting practical dimensions as new but unevenly distributed and challenging research opportunities arise, but it also requires us to confront some of the new political dimensions that emerge from working with proprietary data whose availability is governed by commercial providers.

Scientific research is being shaped by the politics of platforms, creating specific relationships between actors; the impact of Twitter's evolving business model on the possibilities and conduct of Twitter research serves as just one instructive example here. Sudden API changes have completely transformed what Twitter research can and cannot do, for instance. Researchers are inherently entangled with these changing and unpredictable politics.

Our chapter traces these changes, from the comparative openness and collaboration with developers and users through the gradual closure of large-scale data access opportunities to the current highly restricted and commercialised data access regime.

The net result is that much of the research on Twitter has focussed on hashtag data, which remain relatively easy to capture; what's harder is to go beyond this macro level of Twitter communication, as Hallvard Moe and I have called it, and to examine what happens at the meso level (everyday communication through follower/followee networks – arguably the most common form of communication on Twitter) and the micro level (direct conversations through @replies).

Our research has tried to push past those boundaries, into the field of 'hard' data: we've been exploring the global Twitter userbase, filtering this for Australian users, mapping their follower/followee networks, and tracking their day-to-day activities independent of whether they use hashtags or not.

One early observation from this, for example, is that even the well-known #auspol hashtag (for Australian politics) – the most consistently prominent hashtag in Australia, with an average of some 9,000 tweets per day – accounts for only 1% of the total volume of around 900,000 tweets per day by the 2.8 million Australian Twitter accounts we have identified to date. A focus only on this hashtag as a representation of political discussion in the Australian Twittersphere would therefore be likely to miss an awful lot of communicative activity.
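The arithmetic behind that observation can be checked with a minimal sketch. It uses only the approximate daily averages quoted above; the variable names are illustrative, and the tweets-per-account figure is a simple derived average rather than a number reported in the chapter.

```python
# Approximate figures as quoted in the post (daily averages, not exact counts).
AUSPOL_TWEETS_PER_DAY = 9_000
TOTAL_TWEETS_PER_DAY = 900_000
AUSTRALIAN_ACCOUNTS = 2_800_000

# Share of all Australian tweets that carry the #auspol hashtag.
auspol_share = AUSPOL_TWEETS_PER_DAY / TOTAL_TWEETS_PER_DAY

# Derived average: tweets per identified account per day (illustrative only).
tweets_per_account = TOTAL_TWEETS_PER_DAY / AUSTRALIAN_ACCOUNTS

print(f"#auspol share of daily volume: {auspol_share:.1%}")        # 1.0%
print(f"Average tweets per account per day: {tweets_per_account:.2f}")  # 0.32
```

Put differently, roughly 99 in every 100 tweets from the identified accounts fall outside even the most prominent national hashtag.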

But we've been able to do this only because we've had relatively good institutional backing for our work. This is not the case for the majority of Twitter researchers, and the situation therefore perpetuates inequalities between researchers and their institutions. There are few solutions to date: Twitter's own data grants, with six recipients hand-picked out of 1,300 applications through an opaque process, amount to nothing more than a lottery. Nor is this just about Twitter: the Twitter API in fact remains one of the more open APIs amongst mainstream social media platforms. Was its early openness just a historical blip – are we doomed to a social media research future of strict proprietary data enclosures?