The next speaker in our AoIR 2018 session is Ericka Menchen-Trevino, whose research focusses on selective exposure; this is often studied through surveys or lab experiments, but can be usefully complemented with Web history data. Such an integration of conventional social science data with digital trace data also provides a blueprint for new possibilities across a range of research interests.
Conventional social science broadly distinguishes between quantitative and qualitative data and methods, but this distinction is not particularly useful when working with digital trace data. These data are usually collected by the researcher for a particular purpose, and we usually have large, detailed datasets about small groups or small datasets about large groups. Even before the arrival of digital datasets, however, there were experiments with unobtrusive large-scale data collections (the wearing down of floor tiles in front of specific museum exhibits or the nose prints on the glass boxes surrounding exhibits, for instance).
Today, the digital traces that are collected are both rich and thick: they are exceptionally detailed and also cover a large population of users. But they are also fragmented across the many different platforms and services that collect such data, which complicates their use in research. Ethnographic embedding may also be required to fully understand the practices that users engage in.
If we are aiming to use such data to investigate the attitudes of users (the sum of all they feel and think), then surveys have been the traditional tool for assessing these, but such indirect measures are far from perfect or reliable; put simply, people are bad at self-reporting. Behavioural data provide an alternative, but they too are complicated by the level of control a researcher has over how they are gathered in real-world settings.
Digital traces alone are useful for exploration and for studies of in-platform behaviour, while social science data complement these in important ways. For digital traces there is also a distinction between platform-level and person-level data; the latter remain significantly underdeveloped as a resource at this stage.