
Towards a Better Methodology for Mapping and Measuring Blog Interaction

I'm crossposting this from Gatewatching.org. A discussion there about the influence of Australian political bloggers on wider political processes, kicked off by Jason Wilson's recent posts on Tim Blair's move to the Daily Telegraph and Christian Kerr's summary dismissal of Ozblogistan's political combatants in The Australian, has prompted me to finally post some more information about the research we're currently engaged in at QUT, in collaboration with our excellent colleagues at the University of St. Gallen in Switzerland. I'm also attaching a discussion paper which documents our methodological model in more detail - we'd love to get further feedback on this, from fellow researchers and interested bloggers alike. (For a more condensed version of this material, please see our paper for the ISEA 2008 conference in Singapore.)

The main problem with tracking what happens in the blogosphere is two-fold: on the one hand, we've yet to agree what metrics may be appropriate for measuring bloggers' actions and their influence on the wider political sphere (what do we mean by 'influence'?), and on the other, even if we come up with an appropriate working model of what those metrics may be, how do we document them in a reliable, testable fashion across a potentially large population of sites?

A first step towards solving these problems lies in the use of automated tools, and as I've mentioned here and elsewhere before, I did some work during 2007 in exploring network crawlers as a way of tracing patterns of interlinkage and clusters of interconnection in the Australian blogosphere. I'll be the first to point out that this approach has limitations (indeed, I did so in my article in First Monday last May), but the results were nonetheless instructive - my crawls on a number of then-current issues showed a relatively stable structure in the Australian political blogosphere that had a number of pretty clearly defined hubs (and leading sites within these hubs), and I would still argue that these hubs are formed largely on the basis of shared ideology. While this necessarily cuts short a more detailed story, I'd argue that these hubs can be seen as broadly left-of-centre and right-of-centre. More recent information about Larvatus Prodeo's position in the Australian blogosphere certainly seems to support general observations from that research.

But there are significant problems with such research, too - most of all, current tools remain fairly blunt in their analysis of blog-based content. Link crawling and network mapping tools are generally unable to distinguish between the different forms of content that may be found on a blog: they simply crawl the entire page, or the entire Website, which means that navigational and other functional links on a page are lumped together with links in blogrolls, posts, and comments (and even, in the case of some sites, ads). Often this happens for the site as a whole, combining today's links with those from last week, last month, or last year. When we map these data, we therefore get only a fairly smudgy, composite picture. The same goes for textual analysis, of course - running something like Leximancer over a blog's contents takes in the actual blog posts themselves as well as static pages, functional information, and other material. Without further manual correction, the most prominent term that is found for any one blog in this way is probably "Submit Comment" - and that's not very useful...

By contrast, it would be much more interesting to focus only on the links and texts included in the blog entries themselves (and possibly - perhaps in a separate category - on user comments), discarding all the ancillary material that is of no direct relevance to the topical coverage of a blog. RSS feeds aren't sufficient here: many of them don't contain links, for example, and some (like the LP feed) include only excerpts of the whole post.

So, what's necessary is a scraper that's smart enough to tell salient content (posts, comments) from ancillary material, and that can process this material for further automated as well as manual analysis. That's what our fabulous colleagues in Switzerland have begun to build for our project - and we're now starting to see what we may do with the material they've gathered so far.
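To make the idea a little more concrete, here's a minimal sketch of what "telling salient content from ancillary material" could look like at its very simplest. This is not the scraper our colleagues are building, just an illustration using Python's standard library: it collects only those links that sit inside a post container and ignores navigation, blogrolls, and everything else on the page. The class name "entry-content", the names PostLinkExtractor and extract_post_links, and the sample HTML are all assumptions made up for this example; real blogs use many different templates, and each would need its own rule.

```python
from html.parser import HTMLParser

# Tags that never take a closing tag, so they must not affect nesting depth.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img",
             "input", "link", "meta", "source", "track", "wbr"}


class PostLinkExtractor(HTMLParser):
    """Collect hrefs only from inside elements carrying a given CSS class.

    Assumption: the blog template wraps each post body in an element with
    class `post_class` (hypothetical default: "entry-content").
    """

    def __init__(self, post_class="entry-content"):
        super().__init__()
        self.post_class = post_class
        self.depth = 0    # > 0 while we are inside a post container
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        attrs = dict(attrs)
        classes = (attrs.get("class") or "").split()
        if self.depth > 0:
            # Already inside a post: track nesting and keep any link found.
            self.depth += 1
            if tag == "a" and attrs.get("href"):
                self.links.append(attrs["href"])
        elif self.post_class in classes:
            # Entering a post container.
            self.depth = 1

    def handle_endtag(self, tag):
        if tag in VOID_TAGS:
            return
        if self.depth > 0:
            self.depth -= 1


def extract_post_links(html, post_class="entry-content"):
    """Return the links found inside post containers, in document order."""
    parser = PostLinkExtractor(post_class)
    parser.feed(html)
    parser.close()
    return parser.links


# Made-up sample page: one navigation link, one post, one blogroll link.
SAMPLE = """
<div class="nav"><a href="/about">About</a></div>
<div class="entry-content"><p>See <a href="http://example.org/post">this post</a><br>
and <a href="http://example.net/">that one</a>.</p></div>
<div class="blogroll"><a href="http://example.com/">A friend</a></div>
"""

if __name__ == "__main__":
    for link in extract_post_links(SAMPLE):
        print(link)
```

Run on the sample page, this keeps the two links in the post and drops the navigation and blogroll links. It relies on reasonably well-formed markup (an unclosed tag inside the post would throw off the depth count), which is one reason a production scraper needs to be considerably smarter than this.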

As we continue this process, we hope that ultimately our research will

  • indicate the shape of the networked public sphere overall, and of the individual public spherules which we assume may constitute it;
  • show the level of polarisation of or interconnection between the participants in public debate within any such public spherule;
  • indicate similarities and differences between various subsets of the overall blogosphere, as defined for example by topic, nationality, or language;
  • track the evolution and dissemination of individual memes (terms, themes, concepts) across the blogosphere, thereby providing a quantitative basis for the application of extant communications theories to communication in the blogosphere;
  • show evidence of the collective knowledge distributed across the blog network.

None of this is likely to happen any time soon, but we do think that we've got a good chance of making a significant contribution towards these goals through our work. We'll use Gatewatching.org to update you on our progress along the way, and we invite your feedback on our research plans as we've outlined them so far!
