The next session at AoIR 2015 is starting with Catherine Brooks, whose interest is in scientific collaboration: how do scientists organise themselves and manage their data? This is an increasingly crucial question in (big) data-enabled science. One problem is that of dark data – unused, overlooked, rejected data from past research projects, which could be placed on cloud-based storage platforms to make them useful again.
These questions could be understood from actor-network theory or human and technology studies perspectives – the ultimate goal could be the democratisation of data, to increase their further usefulness. Data often follow the paths of the researchers: they disperse as research teams disperse and move on, and this makes the data tough to synthesise again. The problem is even more pronounced for highly interdisciplinary teams.
So what are the data-sharing possibilities in the cloud? How might we break down the institutional barriers that keep us from sharing our data? Some of these barriers are driven by the competitive and corporatised nature of research, which works against collaboration; others stem from doubts about the reliability of Amazon and other cloud-based services.
Conversely, there is a belief that greater sharing of data means better collaboration and a more democratic approach – but we are still reluctant to trust others with access to our data. At the same time, some researchers are dealing with datasets that are so large and so globally dispersed that cloud-based sharing is the only feasible option. The only question for scientists here is which software-as-a-service options to use, not whether to use them.
Further, there are legal issues relating to the ownership of data – scientists or their institutions may own the datasets, and see them as assets of commercial value. At the same time, public funding bodies are increasingly pushing for data from the research they fund to become publicly available.