I’ve now moved on to an ICA 2018 high-density session on computational methods, which starts with Rebekah Tromble. She begins by noting the uncertainty about what Twitter data actually represent; her project set out to explore this question.
Keyword query data collected via the Twitter API are not representative of the underlying population: the API returns relevant, but not necessarily complete data. When the rate limits are hit, the data are truncated, though not on the basis of specific features. The biases that result from such selection are likely to be substantial.
What factors drive such search API sampling, then? Content …