The final presenter at AoIR 2015 today is Anissa Tanweer, whose interest is in the shift towards big data. "Big data" has become a major buzzword and has enabled the rise of data science; her team has conducted an ethnography of the data science environment.
One of the major challenges in working with big data is how researchers envision their data. It is no longer possible to visually review the entire dataset, for example, and the picture of the dataset is constantly evolving. This becomes especially crucial in moments of breakdown that require the dataset to be repaired, and such moments may also lead to innovation.
The setting for this work brings together domain experts and data science mentors. This interdisciplinary setup makes it possible to observe how these researchers work through the challenges of a data breakdown, for example as they bump up against the limits of established processes and practices (which are themselves shaped by how the scientists imagine their data).
One example of this is a researcher who ran out of space for data processing on his own computer because the raw data were poorly compressed; the lack of compression stemmed from the specific computational tools he was using. The solution was to hack the tools, drawing at first on ad hoc fixes found through Google searches. This activity opened the black box and revealed the interdependent elements of the research process, which helped to reconceptualise both the data and the research aims.
What, then, does observing such activities tell us? What we are seeing here are the effects of cultural norms and work practices, processes of problem-solving and technological adaptation, intersections between career trajectories and institutional frameworks, and epistemological challenges. All of this helps us understand the transformations associated with the emergence of big data.