Developing Alternative Approaches to Sampling Social Media Data

The next speaker at "Compromised Data" today is Carolin Gerlitz, who begins by suggesting that social media data are both standardised and vague at the same time. She points to the German Twitter community that is focussed on favouriting: the Favstar sphere sees favourites as a sign of importance and validation, and taking away favourites is therefore a serious affront.

This is an example of how the communicative affordances of social media platforms are being utilised by their users; these standardised activities mark the grammar of action on such platforms, and are specific to the particular platforms. Twitter's grammar has been comparatively stable, while Facebook has modified its available actions on a continuous basis, which destabilised the meaning of such activities.

Standardisation produces comparable and countable numbers, but such actions remain vague because users may regard them as having different meanings - activities are therefore standardised in form, but vague in meaning. Twitter favourites are an example of this: introduced in 2006, favourites were seen at first as an unwanted step-child of the service, as somewhere between bookmarking and Facebook-style liking. Favourites were difficult to organise or manage, or to systematically explore.

Only when third-party developers developed further favouriting functionality did the utility of favourites increase. Feed reader tools began to use them; Favotter and other bookmark ranking tools began to deploy favourites as a measure of popularity for tweets, showing "tweets of the day" and highlighting the most favourited users.

This also supported the emergence of the Favstar scene, demonstrating the effects of third-party tools on the platform itself. This is a process of de- as well as recontextualisation of platform data; the same process is visible in the popularity of Klout scores, which turn platform data into a new measure of "influence" ("the ability to drive action") across different social media platforms.

This, then, connects the interaction grammars of a variety of social media platforms into one score; they are de- and recontextualised by a proprietary algorithm, and the results of this scoring feed back into the activities of the users who actually care about their Klout scores. Users are reminded of the repercussions of their activities, and may shape their platform engagement strategically around the effects it may have on their Klout score.

Klout further offers perks to users with high scores in specific topic activities - it thus incentivises such strategic activities. Users' social media activities thus become valuable to the service's financial parties. Klout partners also take into account users' Klout scores as a measure of importance, privileging high-ranking users in job interviews, tech support, or a variety of other situations.

The original social media data are thereby turned into new metrics which themselves move into new fourth- and fifth-party contexts and become multivalent. Numbers are standardised and yet remain vague; activities such as favouriting become partible: they are still meaningful to the originating users, but also take on new meaning through the repurposing of such data in other contexts. Social media become multivalence machines.

Who can realise these multiple forms of value, then? Empirical engagement with such platforms proceeds largely from sampling, but meaningful samples are difficult to create given the diverse meaningful practices which take place on such platforms. Most Twitter research draws on topical, non-representative samples, for example, building on a priori assumptions about Twitter use; representative sampling, on the other hand, draws on random or cluster samples which make it possible to study emergent and variant use practices.

Carolin and Bernhard Rieder examined this through a random sample of one percent of all Twitter activity over a 24-hour period, finding a variety of hashtag uses, for example: combination hashtags (#lol, #yolo), topic markers, shoutout hashtags (#thingsidowhenigetbored), and spam hashtags. Sampling thus needs to align platform features with use practices and the sample used, in order to be meaningful.
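
To make the difference between topical and random sampling concrete, here is a minimal Python sketch. It is not the method Carolin and Bernhard used (how their sample was drawn is not described here); it assumes the tweets are already available as a list of plain-text strings, and the function names random_sample, topical_sample, and hashtag_counts are hypothetical, chosen only for illustration.

```python
import random
import re
from collections import Counter

# Assumption: a hashtag is "#" followed by word characters.
# Real platform tokenisation rules are more involved than this.
HASHTAG_PATTERN = re.compile(r"#\w+")


def extract_hashtags(text):
    """Return the lower-cased hashtags found in one tweet text."""
    return [tag.lower() for tag in HASHTAG_PATTERN.findall(text)]


def random_sample(tweets, rate=0.01, seed=0):
    """Keep each tweet independently with probability `rate`.

    No assumption about topic is built in, so whatever uses
    occur in the data can surface in the sample.
    """
    rng = random.Random(seed)
    return [tweet for tweet in tweets if rng.random() < rate]


def topical_sample(tweets, hashtag):
    """Keep only tweets that contain the given hashtag.

    This encodes an a priori assumption: that the hashtag
    marks the practice the researcher is interested in.
    """
    wanted = hashtag.lower()
    return [tweet for tweet in tweets if wanted in extract_hashtags(tweet)]


def hashtag_counts(tweets):
    """Count how often each hashtag occurs across the given tweets."""
    counts = Counter()
    for tweet in tweets:
        counts.update(extract_hashtags(tweet))
    return counts
```

Running hashtag_counts over the output of random_sample gives a view of whatever hashtags happen to circulate, while topical_sample only ever returns tweets matching the hashtag chosen in advance.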

Platform providers thus have good platform-political reasons for bad platform data; research creates new relations between social media data, also participating in the de- and recontextualisation of data.