Approaches to Blog Analysis

The last session for today starts with a massively multi-authored paper on 'Conversation and connectivity in the blogosphere' from a group of researchers at Indiana University - I'm counting some seven names on the by-line. Elijah Wright is the first to present.

BROG Group: What Is the Level of Conversations in Blogs?

Elijah begins with some basic definitions: of the blogosphere as the intellectual cyberspace inhabited by bloggers, of blogs as community, and of blogging as social interaction. Some very significant claims have therefore been made about the conversational potential of blogging - but how much conversation is actually taking place in the blogosphere, and how much social interaction do blogs therefore support?

The study is based on data extracted from, ultimately identifying some 5,500 unique URLs, from which some 50 blogs were sampled for more in-depth study. John Paolillo now takes over to describe more of the study process: it applied social network analysis to the 5,500 blogs, developed a visualisation of the 254 most densely interlinked blogs (using the Pajek software), and performed a content analysis of the 50 randomly selected blogs.
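The write-up doesn't say how the 254 most densely interlinked blogs were selected. Here is a minimal sketch of one common way to pull out such a core: k-core decomposition. It assumes links are available as (source, target) pairs. The technique is swapped in for illustration and is not necessarily what the Indiana group did. The result is then written out in Pajek's plain-text .net format for visualisation. The function names and the choice of k are hypothetical.

```python
from collections import defaultdict


def undirected_neighbours(links):
    """Build a symmetric neighbour map from (source, target) link pairs."""
    nbrs = defaultdict(set)
    for src, dst in links:
        if src != dst:  # ignore self-links
            nbrs[src].add(dst)
            nbrs[dst].add(src)
    return nbrs


def k_core(links, k):
    """Return the blogs left after repeatedly pruning those with fewer than k neighbours."""
    nbrs = {n: set(s) for n, s in undirected_neighbours(links).items()}
    while True:
        weak = [n for n, s in nbrs.items() if len(s) < k]
        if not weak:
            return set(nbrs)
        for node in weak:
            # Remove the weak blog and detach it from its remaining neighbours.
            for other in nbrs.pop(node):
                if other in nbrs:
                    nbrs[other].discard(node)


def to_pajek(nodes, links):
    """Serialise the subgraph induced by nodes as a Pajek .net string (1-based ids, directed arcs)."""
    order = sorted(nodes)
    index = {n: i + 1 for i, n in enumerate(order)}
    lines = [f"*Vertices {len(order)}"]
    lines += [f'{index[n]} "{n}"' for n in order]
    lines.append("*Arcs")
    for src, dst in sorted(set(links)):
        if src in index and dst in index and src != dst:
            lines.append(f"{index[src]} {index[dst]}")
    return "\n".join(lines)


# Toy data: a triangle of blogs a, b, c plus a blog d linked only from c.
links = [("a", "b"), ("b", "c"), ("c", "a"), ("c", "d")]
core = k_core(links, 2)  # d has only one neighbour, so it is pruned
print(to_pajek(core, links))
```

Raising k shrinks the core towards the most tightly interlinked blogs. The k that would yield a set of around 254 blogs depends entirely on the data.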

Three main topic clusters were analysed in the visualisation process: Catholicism, homeschooling, and A-list bloggers (mainly political commentary and humour), and John now shows a graphical representation of some of the network structure. In this case, the most heavily linked blogs were contained in the Catholic cluster. Blogs can also be analysed according to the ratio of their in- to out-links, which can indicate the level of conversation: a highly uneven ratio would suggest simple one-directional linking, and thus an absence of conversation.
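As an illustration of the in-/out-link ratio, here is a minimal sketch. It assumes the link data are (source, target) pairs between blogs and counts duplicate links once; the function name is hypothetical. A balanced ratio does not by itself prove conversation, since a blog's in- and out-links may involve entirely different blogs. Checking for reciprocated pairs is a stricter test.

```python
from collections import Counter


def link_ratios(links):
    """Map each blog to (in-links, out-links, in/out ratio), counting distinct pairs once."""
    unique = {(src, dst) for src, dst in links if src != dst}
    out_deg = Counter(src for src, _ in unique)
    in_deg = Counter(dst for _, dst in unique)
    ratios = {}
    for blog in set(out_deg) | set(in_deg):
        i, o = in_deg[blog], out_deg[blog]
        # A blog that only receives links has no finite ratio.
        ratios[blog] = (i, o, i / o if o else float("inf"))
    return ratios


# Toy data: a is linked by three blogs but links out only once.
links = [("a", "b"), ("b", "a"), ("c", "a"), ("d", "a")]
for blog, (i, o, r) in sorted(link_ratios(links).items()):
    print(blog, i, o, r)
```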

Ben Clark now takes over. In the content analysis, references to other blogs in each blog's posts and comments were coded, distinguishing between references within and outside the sample; Ben shows some examples illustrating different types of linking (direct or indirect reference to the blogger or blog, etc.). In all, the study coded some 582 conversational units, including 135 references in posts to other bloggers; 48% of these were hyperlinked, and another 10% were made through trackbacks.
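The percentages are shares of the 135 post references. So 48% corresponds to roughly 65 references and 10% to roughly 13 or 14; the published percentages are rounded. Below is a minimal sketch of how such coded references might be tallied. The Reference record, its field names, and the mechanism codes are hypothetical and are not the study's actual coding scheme.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Reference:
    """One coded reference in a post (hypothetical coding scheme)."""
    source_blog: str
    target: str
    mechanism: str  # e.g. "hyperlink", "trackback" or "none"
    in_sample: bool  # does the referenced blog belong to the sampled set?


def mechanism_shares(refs):
    """Return the share of references made through each linking mechanism."""
    total = len(refs)
    if total == 0:
        return {}
    counts = Counter(r.mechanism for r in refs)
    return {mech: n / total for mech, n in counts.items()}


def sample_split(refs):
    """Count references pointing inside vs outside the sample."""
    inside = sum(1 for r in refs if r.in_sample)
    return inside, len(refs) - inside


# Toy records, not the study's data.
refs = [
    Reference("b1", "b2", "hyperlink", True),
    Reference("b1", "x", "hyperlink", False),
    Reference("b3", "y", "trackback", False),
    Reference("b2", "z", "none", False),
]
print(mechanism_shares(refs), sample_split(refs))
```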

None of the references in posts replied to another post within the sample. Comments, of course, are replies by definition: 83% responded directly to the previous post, the rest to an earlier post in the same thread. There was also a significant amount of referencing specifically by respondents who were already on the blogroll of the blog they responded to. (Sorry, this is getting complex, and I may not capture all the details here.)

Sarah Mercure now takes over. She notes, then, that comments participate in conversations more than references in posts do. Comments address other bloggers directly, while references mention other bloggers and/or link to their content. Thus, not all blogs interconnect, though some do. Conversing blogs don't necessarily link to each other, and most conversations take place in the comment threads. At the same time, Sarah suggests, the study's findings are limited by its small sample size. There is a need for more large-scale, systematic study over a longer timeframe.

Thomas Ciszek: Towards a Taxonomy of Hyperlinks

Up next is Thomas Ciszek from the University of North Carolina at Chapel Hill, presenting a paper co-authored with Xin Fu. The paper investigates in particular the direction in which hyperlinks are evolving: is there structural evidence for a social hyperlink, and how can social hyperlinking be explained? Annotation behaviour through hyperlinking evolves with the development and adoption of relevant technology, and builds on a history of hypertext models from Ted Nelson's Xanadu onwards. More recently, links have also moved beyond purely functional, navigational linking towards more open hypermedia and the use of link-making as annotation, decentring authority and interpreting content. Hyperlinking is operationalised for this purpose. Hyperlinks are themselves a form of communication, and a number of taxonomies attempt to identify different qualities of linking.

Blogs, too, exist in a number of different forms, which use hyperlinks in different ways. Thomas points to an 'ipsativity' in blogging - enabling a narcissistic, reflective look at what bloggers have made. A growing number of other communication technologies are also being connected to blogging and hyperlinked into the information network. Annotation is also automated through hyperlinks (for example through folksonomic tagging, and in future perhaps more through the automatic extraction of metadata).

There are, then, user-based and non-user-based forms of hyperlinking: hyperlinks can be created by people or by machines, or can be provided to link to other media. Link types include archival, informational, social, commentary, category, and feed links. Thomas conducted an ethnography of 21 bloggers to study their use of hyperlinking and identify which types of links they used. Six of the blogs did not offer RSS feeds; nine had no category features. In order of frequency, the links were archival, informational, social, commentary, category, and feed; overall, most were informational (over 1,200), against just under 600 communicative (i.e. conversational) links in the sample. At the same time, many bloggers stated that they blogged for communicative and conversational reasons, rather than merely to provide information.
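Here is a minimal sketch that tallies hand-coded links, to make the link types and the informational/communicative split concrete. The six type labels come from the taxonomy above. Mapping 'social' and 'commentary' to the communicative side is an assumption made here for illustration; the write-up doesn't say which types the study counted as communicative.

```python
from collections import Counter

LINK_TYPES = ("archival", "informational", "social", "commentary", "category", "feed")
# Assumption for illustration only: which link types count as "communicative".
COMMUNICATIVE = {"social", "commentary"}


def tally_links(coded_links):
    """Count coded links by type and by the assumed informational/communicative split."""
    counts = Counter(coded_links)
    unknown = set(counts) - set(LINK_TYPES)
    if unknown:
        raise ValueError(f"unknown link types: {sorted(unknown)}")
    communicative = sum(counts[t] for t in COMMUNICATIVE)
    return {
        "by_type": dict(counts.most_common()),  # most frequent type first
        "communicative": communicative,
        "informational": sum(counts.values()) - communicative,
    }


# Toy codes, not the study's data.
print(tally_links(["archival", "archival", "informational", "social", "commentary", "feed"]))
```

Keeping the mapping in one constant makes it easy to rerun the tally under a different reading of which types are conversational.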

Further research, then, will be required into social networking, and into the duality of the network (where a link is not necessarily equal to a node). Social network analysis can also be performed, studying clustering, friend-of-a-friend networks, etc.; the economy of the network also needs further investigation.
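Here is a minimal sketch of the clustering and friend-of-a-friend measures mentioned here, on an undirected blog graph. The sketch assumes link direction is ignored, and the function names are hypothetical. A blog's local clustering coefficient is 2e / (k(k - 1)), where k is its number of neighbours and e the number of links among those neighbours. Friends-of-friends are the blogs exactly two steps away.

```python
from collections import defaultdict
from itertools import combinations


def build_neighbours(links):
    """Symmetric neighbour map from (source, target) pairs, ignoring direction."""
    nbrs = defaultdict(set)
    for src, dst in links:
        if src != dst:
            nbrs[src].add(dst)
            nbrs[dst].add(src)
    return nbrs


def local_clustering(nbrs, node):
    """Fraction of possible links among node's neighbours that actually exist."""
    ns = nbrs.get(node, set())
    k = len(ns)
    if k < 2:
        return 0.0
    e = sum(1 for a, b in combinations(ns, 2) if b in nbrs.get(a, set()))
    return 2 * e / (k * (k - 1))


def friends_of_friends(nbrs, node):
    """Blogs reachable in two steps that are not already direct neighbours."""
    direct = nbrs.get(node, set())
    fof = set()
    for friend in direct:
        fof |= nbrs.get(friend, set())
    return fof - direct - {node}


# Toy graph: triangle a-b-c plus d attached to c.
nbrs = build_neighbours([("a", "b"), ("b", "c"), ("c", "a"), ("c", "d")])
print(local_clustering(nbrs, "a"), local_clustering(nbrs, "c"), friends_of_friends(nbrs, "a"))
```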

Lilia Efimova: Discovering Weblog Communities

The final speaker is Lilia Efimova from the Telematica Instituut in the Netherlands, presenting a paper co-authored with Stephanie Hendrick and Anjo Anjewierden on defining a Weblog community. She begins by pointing to existing work on virtual communities, which distinguishes between cyber-communities and their virtual settlements and notes that these communities produce artefacts that are indicators of community norms and practices. Bloggers in the highly distributed blogosphere, however, share a space without a clearly defined boundary: interaction in the blogosphere takes place in 'a space between buildings' (the individual blogs).

So how can the social structures of Weblog communities be discovered? (This is also an issue for people attempting to commercialise blogging, of course.) The challenges here are that the structure of the blogosphere is uneven, and that data need to be teased out through various means; at the same time, there are social clusters and subcultural communities with their own artefacts, but it is difficult to discover them and their meanings. Social interactions within these subcultures also depend on their specific interests, values, and norms. So, what can be used as community indicators? Lilia suggests a number of options: shared memes and meme paths; Weblog reading patterns; linking patterns (and link types!); Weblog conversations; indicators of events; and 'tribe' marks, group spaces, and blogger directories.
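One of the proposed indicators, meme paths, can be roughly approximated from post data. Here is a minimal sketch. It assumes each post is reduced to a (blog, timestamp, set of linked URLs) record and treats a shared URL as a stand-in for a meme. That is a simplification, since memes can also be phrases or ideas. The sketch lists blogs in the order they first picked up a given URL, and finds URLs shared by several blogs. Function names, the example.org URLs and the timestamps are hypothetical toy data.

```python
from collections import defaultdict
from datetime import datetime


def meme_path(posts, url):
    """Return blogs in the order in which they first linked to url."""
    first_seen = {}
    for blog, when, urls in posts:
        if url in urls and (blog not in first_seen or when < first_seen[blog]):
            first_seen[blog] = when
    return [blog for blog, _ in sorted(first_seen.items(), key=lambda kv: kv[1])]


def shared_memes(posts, min_blogs=2):
    """Return URLs linked by at least min_blogs distinct blogs."""
    blogs_per_url = defaultdict(set)
    for blog, _, urls in posts:
        for url in urls:
            blogs_per_url[url].add(blog)
    return {url for url, blogs in blogs_per_url.items() if len(blogs) >= min_blogs}


# Toy records; the input order deliberately differs from the chronological order.
posts = [
    ("b", datetime(2005, 10, 6, 9), {"http://example.org/meme"}),
    ("a", datetime(2005, 10, 5, 18), {"http://example.org/meme"}),
    ("c", datetime(2005, 10, 7, 12), {"http://example.org/meme", "http://example.org/other"}),
]
print(meme_path(posts, "http://example.org/meme"))
print(shared_memes(posts))
```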

Lilia's study began, then, with her own Weblog community: it attempted to discern the community's boundaries based on its artefacts, and to develop a method for mapping the community. A starting point for this could be the links in the text (analysed for their frequency and reciprocity), with social network analysis at the core of the study, cross-checked against blogrolls, text analysis, RSS reading, and relations as reported by the bloggers themselves.
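Here is a minimal sketch of one such cross-check: comparing the blogs a blogger links to in posts with the blogs on their blogroll. It assumes both are available as sets of blog identifiers. Jaccard overlap is used as the comparison measure; that choice and the function names are hypothetical, not the study's published method.

```python
def jaccard(a, b):
    """Size of the intersection over size of the union; 0.0 when both sets are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def crosscheck(post_links, blogrolls):
    """Compare, per blog, the blogs linked from its posts with those on its blogroll."""
    report = {}
    for blog in set(post_links) | set(blogrolls):
        linked = post_links.get(blog, set())
        rolled = blogrolls.get(blog, set())
        report[blog] = {
            "jaccard": jaccard(linked, rolled),
            "linked_not_in_blogroll": linked - rolled,
            "blogroll_never_linked": rolled - linked,
        }
    return report


# Toy data for a single blog.
report = crosscheck({"me": {"a", "b", "c"}}, {"me": {"b", "c", "d"}})
print(report["me"])
```

The two difference sets are often more informative than the score itself. Blogs that are linked but not on the blogroll, and blogroll entries that are never linked, are candidates to raise with the bloggers themselves.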

To begin with, then, Lilia and her colleagues collected some post-level data manually, and analysed these further using common spidering and link analysis tools. Among the parameters used were the frequency with which bloggers linked to another blog, and the reciprocity of links between bloggers (both, of course, indicators of community). Lilia shows a number of graphical representations of these networks, in which the core blogs are very heavily interconnected; the problem, however, is that the raw link graph may layer several different topical interest communities over one another, and there is no effective, non-manual way of separating out these subsets.
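Here is a minimal sketch of the two parameters named here, link frequency and reciprocity. It assumes the post-level data are reduced to one (source blog, target blog) event per link in a post. The function names and the min_each_way threshold are hypothetical. Filtering for frequent, reciprocated ties highlights strong relationships. It does not solve the problem of separating overlapping topical communities.

```python
from collections import Counter


def link_frequencies(link_events):
    """Count how often each blog links to each other blog (one event per link in a post)."""
    return Counter((src, dst) for src, dst in link_events if src != dst)


def reciprocal_ties(freq, min_each_way=1):
    """Return unordered pairs linked at least min_each_way times in both directions."""
    ties = {}
    for (src, dst), n in freq.items():
        back = freq.get((dst, src), 0)
        # src < dst keeps each unordered pair only once.
        if src < dst and n >= min_each_way and back >= min_each_way:
            ties[(src, dst)] = (n, back)
    return ties


def reciprocity(freq):
    """Share of distinct directed links whose reverse link also exists."""
    edges = set(freq)
    if not edges:
        return 0.0
    return sum(1 for src, dst in edges if (dst, src) in edges) / len(edges)


# Toy events: a links to b three times, b links back once, a links to c twice.
events = [("a", "b")] * 3 + [("b", "a")] + [("a", "c")] * 2
freq = link_frequencies(events)
print(reciprocal_ties(freq), reciprocal_ties(freq, min_each_way=2), reciprocity(freq))
```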