The next speaker at the CCC symposium is the fabulous Nancy Baym, who begins by noting how overwhelming the buzz about 'big data' has become. There's a great deal of fascination just with the things we can do with big data sources - tracing interesting patterns, attempting to predict future processes, making sense of data by using algorithmic tools.
But the outcomes of such research often remain predictable: they show what we already knew (that various social factors influence each other, for example), and the close focus of such studies means that the wider context is often missed. Internet studies has always been very site-specific - but what's happening on one site is also influenced by what happens on other sites, and offline, and this needs to be understood as well. (For a start, big data usually excludes private communication.)
Twitter studies, for example, often ignore that Twitter is far from representative of what Internet users (or indeed, society as a whole) do - they simply pick the low-hanging fruit of the data which are readily accessible, and ignore the 'dark' social data which (thankfully) are not as easy to capture. What's necessary is to keep in mind the larger context in which these data and metadata are embedded.
But at the same time, there's value in the close studies as well. Tiny grains of data and metadata have their own value, texture, and context - these need to be recognised as well, and that recognition can rarely be achieved quantitatively. There's nothing inherently wrong with 'big data' analysis, then - but it's exceedingly important that we understand the limits of such data, and complement this analysis with other methods that can shed greater light on the context and detail of communication and interaction.