'Big Data'

Generating Representative Samples from Search Engine Results?

The next plenary speaker at Digital Methods is Martin Emmer, whose focus is on sampling methods in digital contexts. Online media are now important public fora, and conventional media increasingly use digital channels to distribute their content as well; this leads to a shift in media usage, of course, part of which is driven by generational change.

If we need to examine the digital space to understand current debates in the public sphere, then, how do we generate representative samples of online content and activities? With traditional mass media, it was possible to draw on comprehensive lists of media providers, with a small handful of alternative media; in the digital environment, channels and platforms have multiplied massively, and it is no longer trivial to select a small number of sites and spaces which represent all online activity.

The Impact of Social Sharing on Google Search Results

The next session at Digital Methods is a plenary panel which begins with Christina Schumann, whose focus is on Google and other search engines as technological actors on the Internet. Search engines are especially important as they now serve as a kind of gatekeeper on the Net - but the criteria they use for ranking and structuring information are often far from transparent.

The basic approach of search engines is to crawl or otherwise gather Internet data which are then indexed and processed into a database; this database is queried as a search query is entered into the search engine. Factors in returning search results include on-page information (content, programming, and design of Web pages) as well as off-page metadata (especially the link networks surrounding each page, relative to the theme of the query).
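The crawl, index, and query pipeline described above can be sketched in a few lines. This is a deliberately minimal illustration, not any real search engine's algorithm: the toy corpus, the field layout, and the 0.5 weighting of the off-page link factor are all hypothetical assumptions.

```python
# Minimal sketch of the crawl -> index -> query pipeline described above.
# Page data and scoring weights are hypothetical illustrations only.
from collections import defaultdict

# Toy "crawled" corpus: page -> (on-page text, outbound links to other pages)
pages = {
    "a.example": ("big data research methods", ["b.example", "c.example"]),
    "b.example": ("social media data analysis", ["c.example"]),
    "c.example": ("cooking recipes and more", []),
}

# Indexing step: build an inverted index mapping terms to pages.
index = defaultdict(set)
for url, (text, _links) in pages.items():
    for term in text.split():
        index[term].add(url)

def search(query):
    """Rank pages by on-page term matches plus an off-page inbound-link count."""
    terms = query.split()
    scores = {}
    for url, (text, _links) in pages.items():
        on_page = sum(term in text.split() for term in terms)
        if on_page:
            # Off-page factor: how many other pages link to this one.
            inbound = sum(url in out for _, (_t, out) in pages.items())
            scores[url] = on_page + 0.5 * inbound
    return sorted(scores, key=scores.get, reverse=True)
```

A query such as `search("big data")` would then return the matching pages ordered by this combined on-page and off-page score.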

The Opportunities and Challenges of 'Big Data' Research

At the end of an extended trip to a range of conferences and symposia I've made my way to Vienna, where I'm attending the DGPuK Digital Methods conference at the University of Vienna. The conference is in German, but I'll try to blog the presentations in English nonetheless - wish me luck... We begin with a keynote by Jürgen Pfeffer, addressing - not surprisingly - the question of 'big data' in communications research.

Jürgen begins by asking what's different about 'big data' research. In our field, we're using 'big data' on communication and interaction to work towards a real-time analysis of large-scale, dynamic sociocultural systems, necessarily relying especially on computational approaches - this draws on the data available from major social networks and other participative sites, but it aims not to research "the Internet", but society, by examining communication patterns on the Internet (and elsewhere).

Distinguishing Chain and Name Networks in Social Network Analysis

The final speaker in this "Compromised Data" session is Anatoliy Gruzd, whose interest is in the automated discovery and visualisation of communication networks from social media data. (He's also just launched a new journal in this field, Big Data & Society.) How can such networks be discovered and visualised, and how can we evaluate the sense of community which may exist in them?

Social network analysis enables us to investigate the connections between users in social networks. It reduces large quantities of messages to a smaller number of nodes exchanging communication; it can track longitudinal developments over time; it can show the social dynamics of interaction around specific topics and events; and it can differentiate between different types of network formation in social interaction.
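The reduction of many messages to a smaller network of nodes and weighted ties can be sketched as follows. The message data and its (sender, recipient) format are hypothetical, standing in for reply or mention pairs extracted from social media data.

```python
# Minimal sketch of reducing a set of messages to a communication network,
# as described above. The (sender, recipient) pairs are hypothetical.
from collections import Counter

# Toy social media data: directed (sender, recipient) pairs from replies.
messages = [
    ("alice", "bob"), ("alice", "bob"), ("bob", "alice"),
    ("carol", "alice"), ("carol", "bob"),
]

# Reduce many messages to a smaller set of weighted, directed edges.
edges = Counter(messages)

# Simple activity measures: messages sent and received per user.
out_degree = Counter(sender for sender, _ in messages)
in_degree = Counter(recipient for _, recipient in messages)
```

Timestamping each message pair and binning the counts by period would extend the same reduction to the longitudinal, over-time view mentioned above.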

Bottom-Up Measurements of Network Performance

The next session at "Compromised Data" starts with Fenwick McKelvey, who begins with a reference to the emergence of digitised methods for the study of the Web during the mid-2000s. This was the time around which the latest generation of social media emerged, enabling us to begin thinking about society through the study of the Internet, requiring the development of new research methods by repurposing computer science methods for social science research.

In Toronto, Infoscape Labs developed a number of tools for the exploration of political discourse in Web 2.0, including the Blogometer. This is the emergence of platform studies, paying attention to the platform itself - but this also introduces challenges about how to study the platform, as the core object of research itself intervenes in its study, e.g. through the politics of APIs. This work also required compromises around data access and utilisation, and a growing bifurcation between scholarly and commercial research activities emerged.

Archiving Our Personal Digital Milieux

The final presenter in this morning session at "Compromised Data" is Yuk Hui, who will present a social media self-archiving project. He has worked for years on audiovisual archives, but much of the work in this field has focussed on institutional rather than personal archives, with the latter often concerned mainly with privacy issues.

But another set of problems relates to data management instead: we are working with multiple cloud-based systems, but rarely archive our digital objects effectively - archiving is not just about storing, but about preserving the context of digital objects as well: the digital milieu.

Social Media Data and Their Utopian Assumptions

The next speaker at "Compromised Data" is Ingrid Hoofd, whose interest is in how new technologies make certain types of representation possible or impossible. The neoliberalisation of universities, for example, leads to a quantification of research data which generates poor research. This is the violence of numbers: how do we assess the way new media technologies change the face of social sciences research, then?

Social media data mining methodology provides an allegory of the technological apparatuses that use it. This hinges on these technologies' propensity to speed up, and on the associated notion of change. There is a strong emphasis on objectivity, generating coverage of the conditions of the real that is at once more 'true' and more questionable. Social science via datamining tools is implicated in a push towards an idealised data-driven utopia.

Haunted Data in Cross-Media Controversies

The second day of "Compromised Data" starts with Lisa Blackman, who is tracking social media controversies and mapping information contagion. Can we use quantitative methods in non-positivist ways to understand these processes?

Lisa introduces the idea of haunted data, and suggests that we need to think about digital methods as performative: we need to move beyond infographics when thinking about visualising data. Part of this is about priming: creating an experimental apparatus that makes people feel that their actions are self-directed, but actually generates such actions through the interventions of the apparatus. Such research is controversial because of its early ties to research into psychic phenomena; it is nonetheless useful for exploring information contagion and virality, especially in the context of social media controversies.

The Push towards Niche Geosocial Data

The final speaker on this first day of "Compromised Data" is Sidneyeve Matrix, who shifts our focus towards geosocial information as generated by smartphones and other mobile devices. Only 12% of US users as surveyed by the Pew Research Center posted Foursquare check-ins in 2013, for example, down from 18% in 2011 - but this may mask a greater take-up of other location-based services, not least the Frequent Locations functionality in iOS 7.

There is a continuing trend towards the consumerisation of geodata. Geosocial cultural arrangements are explored through the use of mobile communication patterns, but such analysis is notoriously difficult - not because of a lack of data, but because of the difficulties in assigning meaning to the geolocated information which is available from a variety of platforms.
