
Methods for Tracking Viral Video Dissemination across the U.S. Blogosphere

The final speaker in this session at AoIR 2011 is Shawn Walker, whose interest is in the viral diffusion of information. He focusses here on the viral diffusion of videos during the last U.S. presidential election, a case which illustrates the dynamics of viral information flows online: some videos generated millions of views in a very short time. Shawn’s project compared the diffusion of a number of videos across the blogosphere over the course of a year and a half.

How is this done methodologically? How can relevant data be gathered and analysed? Shawn generated data for some 125 videos across 10,000 blogs; this involved substantial data scraping and capturing, as well as (hand-)coding and analysing the data. Extracting data on videos from YouTube is far from easy, and it’s impossible to predict which videos will go viral; instead, the project used a tool called Viral Video Chart to determine the top viral election videos, and explored YouTube manually to identify different versions and mashups of the same video. Shawn also used paid access to viewing data provided by TubeMogul, although these data were not always comprehensive or entirely accurate.

There was also a need to gather data on how these videos were disseminated across blogs, which turned out to be very difficult given the number of blogs. In the end, scripts which queried Google Blog Search for a defined list of known U.S. political blogs were used to identify where the videos were referenced on these blogs (though even this only finds blog posts which link to YouTube, not posts which include video embeds). This generated some 14,000 posts from 10,000 blogs.
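To make the link-versus-embed distinction concrete, here is a minimal Python sketch of how YouTube references might be pulled out of a blog post's HTML. It is not the project's actual script, which was not shown in the talk: the function names (`youtube_id`, `find_video_refs`), the list of recognised hosts and the assumption that video IDs are 11 characters long are all illustrative choices.

```python
import re
from html.parser import HTMLParser
from urllib.parse import urlparse, parse_qs

# Assumption: YouTube video IDs are 11 characters from this alphabet.
VIDEO_ID = re.compile(r"^[A-Za-z0-9_-]{11}$")
# Assumption: an illustrative, not exhaustive, list of YouTube hosts.
YOUTUBE_HOSTS = {"youtube.com", "m.youtube.com", "youtube-nocookie.com"}


def youtube_id(url):
    """Return the video ID found in a YouTube URL, or None."""
    parsed = urlparse(url)
    host = (parsed.hostname or "").lower()
    if host.startswith("www."):
        host = host[4:]
    candidate = None
    if host == "youtu.be":
        candidate = parsed.path.lstrip("/")
    elif host in YOUTUBE_HOSTS:
        if parsed.path == "/watch":
            candidate = parse_qs(parsed.query).get("v", [None])[0]
        elif parsed.path.startswith(("/embed/", "/v/")):
            # Older embed URLs append parameters with "&" inside the path.
            candidate = re.split(r"[&?]", parsed.path.split("/")[2])[0]
    if candidate and VIDEO_ID.match(candidate):
        return candidate
    return None


class VideoRefParser(HTMLParser):
    """Collect (video_id, kind) pairs, where kind is "link" or "embed"."""

    def __init__(self):
        super().__init__()
        self.refs = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "a":
            kind, url = "link", attrs.get("href")
        elif tag in ("iframe", "embed"):
            kind, url = "embed", attrs.get("src")
        elif tag == "param" and attrs.get("name") == "movie":
            kind, url = "embed", attrs.get("value")
        else:
            return
        video = youtube_id(url) if url else None
        if video:
            self.refs.append((video, kind))


def find_video_refs(html):
    """Return all YouTube references in an HTML fragment, in document order."""
    parser = VideoRefParser()
    parser.feed(html)
    parser.close()
    return parser.refs
```

A search that only matches links to YouTube would return the "link" entries here and miss the "embed" ones, which is the gap described above.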

It was also necessary to rank these blogs by popularity. Technorati rankings turned out to be insufficient; instead, the project used data from traffic tracker Compete, though these still had to be processed further. In the end, it turned out that leading blogs linked to viral videos more frequently than the long tail of less visited blogs.
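The comparison between leading blogs and the long tail can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `links_per_blog`, the input shapes and the `top_n` cut-off are hypothetical, and the talk did not specify how the head of the distribution was separated from the tail.

```python
from collections import Counter


def links_per_blog(ranks, posts, top_n):
    """Compare average video-linking posts per blog for head and tail.

    ranks: dict mapping blog -> traffic rank (1 = most visited)
    posts: iterable of (blog, video_id) pairs, one per linking post
    top_n: hypothetical cut-off; ranks <= top_n count as leading blogs
    Returns (mean for leading blogs, mean for long-tail blogs).
    """
    # Count linking posts per blog, ignoring blogs without a rank.
    counts = Counter(blog for blog, _ in posts if blog in ranks)
    head = [blog for blog, rank in ranks.items() if rank <= top_n]
    tail = [blog for blog, rank in ranks.items() if rank > top_n]

    def mean(blogs):
        # Blogs with no linking posts contribute zero to the average.
        return sum(counts[b] for b in blogs) / len(blogs) if blogs else 0.0

    return mean(head), mean(tail)
```

A head average noticeably above the tail average would correspond to the pattern reported here, with the caveat that the result depends on where the cut-off is placed.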

This involved process ultimately enabled the development of the RetroV website, which makes these data publicly available and allows users to explore the patterns of video dissemination.