Finally in this AoIR 2015 session, we move on to Greg Elmer, one of the editors of Compromised Data: From Social Media to Big Data. His contribution is focussed on the practice of collecting data from social media sites, some of which is done using very simple Web scraping tools (as Edward Snowden did at the NSA, apparently).
Scraping is now a common practice in a number of contexts; some sites scrape from mainstream news sites in order to gain better search rankings, for example. Google briefly introduced a tool to identify where site content had been scraped and reused – but this was discontinued when it revealed that Google itself regularly scrapes substantial amounts of content from other sites.
Can scraping be used as an analytical practice, however? One problem with this is that scraped data are always already compromised because of their personalised nature. First, screen scraping, the earliest form of scraping, is inherently based on mirroring and capturing the viewing experience of a real user, and thus depends on how that first person is emulated.
Second, crawler-based scraping through tools such as IssueCrawler navigates the Web by following and mapping out hyperlinks, but this again has problems: not all links are accessible, some are closed to crawlers through robots.txt files, and the accessibility of all of them is determined by the server location of the crawler itself.
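To make the robots.txt point concrete, here is a minimal sketch of how a crawler might decide which of the links it finds on a page it is permitted to follow. It is an illustration only, not how IssueCrawler itself works: the page markup, the robots.txt rules, the `example.org` host, the `example-crawler` user agent, and the names `LinkExtractor` and `allowed_links` are all assumptions made up for this example, and nothing is fetched from the network.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for a single host: everything under /private/ is closed.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""

# Hypothetical page content with two same-host links.
PAGE_HTML = '<a href="/about">About</a> <a href="/private/report">Report</a>'


class LinkExtractor(HTMLParser):
    """Collects the absolute URLs of all <a href="..."> links on a page."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(urljoin(self.base_url, value))


def allowed_links(base_url, html, robots_txt, user_agent="example-crawler"):
    """Pair each link on the page with whether robots.txt lets us fetch it.

    Assumes every link points at the host the robots.txt rules belong to.
    """
    rules = RobotFileParser()
    rules.parse(robots_txt.splitlines())  # parse rules from text, no HTTP request
    extractor = LinkExtractor(base_url)
    extractor.feed(html)
    return [(link, rules.can_fetch(user_agent, link)) for link in extractor.links]


for link, ok in allowed_links("https://example.org/", PAGE_HTML, ROBOTS_TXT):
    print(link, "allowed" if ok else "blocked by robots.txt")
```

A crawler that respects these rules never sees the blocked page at all, so any map built from its output already reflects the site owner's decisions about what crawlers may reach.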
Third, API-based scraping provides comparative ease of access to large-scale data sources, but even such tools are still affected by the ways that APIs often shape access depending on the positioning of the scraper app's user in the social network.
Is this the cost of doing digital methods? Are all digital data inherently compromised in this way? How can we envision data mining from outside the first-person perspective – in other words, what would zero-person scraping look like? This could proceed by moving away from platform studies, or at least from first-person platform studies; by enumerating a politics of data points; and by moving from mapping to sensing.