Big Data before Big Data

The third speaker in this AoIR 2018 session is Harsh Taneja, who promises to present an alternative history of big data. At present, many big datasets are highly platform-specific; such data can generally be accessed via platform APIs or scraped from platform websites. But big data research existed before the Internet: Harsh points here to the early days of advertising-supported broadcasting, when advertisers first required audience measurements.

This was done at first through self-reporting, for instance via phone surveys. Soon, however, people like Arthur C. Nielsen developed audience measurement devices, which produced a first kind of big data on audience behaviours (and this is the genesis of the Nielsen market research company, too). The approach here is to recruit an (almost) representative audience sample and install audience meters in their homes to record their media use on a continuous, comprehensive basis.

Such tools began to generate massive amounts of highly granular data, but no theory was yet available to analyse them, and Nielsen made available only high-level data patterns rather than more detailed raw data. Subsequently, statisticians began to use such data to analyse audience duplication across competing stations or programmes, and techniques such as factor analysis and cluster analysis were developed that became early precursors of machine learning.
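
To illustrate what such audience duplication analysis involves, here is a minimal sketch of the calculation on hypothetical panel data (the household records and station names are my own invented example, not anything presented in the talk):

```python
from itertools import combinations

# Minimal sketch of audience duplication analysis on hypothetical panel data.
# Each record lists the stations one panel household watched in a given week.
viewing = {
    "hh01": {"ABC", "CBS"},
    "hh02": {"CBS", "NBC"},
    "hh03": {"ABC", "CBS", "NBC"},
    "hh04": {"NBC"},
    "hh05": {"ABC"},
}

n = len(viewing)
stations = sorted(set().union(*viewing.values()))

# Rating of each station: the share of panel households that watched it.
rating = {s: sum(s in w for w in viewing.values()) / n for s in stations}

# Duplication for each pair of stations: the share of households watching both.
for a, b in combinations(stations, 2):
    dup = sum(a in w and b in w for w in viewing.values()) / n
    print(f"{a}/{b}: duplication {dup:.2f} "
          f"(ratings {rating[a]:.2f} and {rating[b]:.2f})")
```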

Much of this research remained atheoretical, however; it extracted patterns in media use from the data, but theories that would explain or even predict such patterns still needed to be developed. The ‘law of audience duplication’ is one such theory, but it emerged only some twenty years after the introduction of audience measurement devices. The companies that gather the data underpinning such theory development also have an important role to play here, of course – much as Nielsen has done in the development of media audience theories.
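
For reference, the duplication of viewing law associated with Goodhardt and Ehrenberg (a published formulation I am adding here for context, not one stated in the talk) holds that the share of the population watching two different programmes is roughly proportional to the product of their individual ratings:

```latex
% Duplication of viewing law (Goodhardt/Ehrenberg formulation):
%   r_{AB} -- share of the population watching both programmes A and B
%   r_A, r_B -- the programmes' individual ratings
%   k -- an empirically near-constant coefficient for a given market
\[
  r_{AB} \approx k \, r_A \, r_B
\]
```

In other words, the overlap between any two audiences is driven largely by their sizes rather than by shared content preferences.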