The final speaker in this "Compromised Data" session is Anatoliy Gruzd, whose interest is in the automated discovery and visualisation of communication networks from social media data. (He's also just launched a new journal in this field, Big Data and Society.) How can such networks be discovered and visualised, and how can we evaluate the sense of community which may exist in them?
Social network analysis enables us to investigate the connections between users in social networks. It reduces large quantities of messages to a smaller number of nodes exchanging communication; it can track developments over time; it can show the social dynamics of interaction around specific topics and events; and it can differentiate between different types of network formation in social interaction.
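To make the "messages to nodes" reduction concrete, here is a minimal sketch in Python. The message log and the `build_edges` helper are hypothetical illustrations, not anything shown in the talk; the point is only that many individual messages collapse into a few weighted ties.

```python
from collections import Counter

# Hypothetical message log: one (sender, recipient) pair per message.
messages = [
    ("ana", "ben"),
    ("ben", "ana"),
    ("ana", "ben"),
    ("cy", "ana"),
    ("ana", "ben"),
]

def build_edges(messages):
    """Collapse individual messages into weighted directed edges."""
    return Counter(messages)

edges = build_edges(messages)

# The node set is every user who appears at either end of an edge.
nodes = {user for pair in edges for user in pair}

for (sender, recipient), weight in sorted(edges.items()):
    print(f"{sender} -> {recipient}: {weight} message(s)")
```

Five messages become three weighted edges among three nodes here; on real data the reduction is of course far more dramatic.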
Traditionally, social networks have been identified using surveys or interviews, with such self-reported data, which are often unreliable, then translated through a time-consuming process into network visualisations. Online social networks, by contrast, can be discovered from the digital traces they generate; this is more complicated for many-to-many networks like social media than for predominantly one-to-one networks like email.
But network connections are shown not only by the technical addressing of one user by another in a reply, but also in the free-form textual data that may show references to other users. This moves us from a chain network (based on formal reply chains) to a name network (based on references to others in the messages), and from technologically obvious to textually inferred connections.
Names are more complex here because of their divergent uses: names are used to directly address another person, but also to refer to them without intending to address them as a participant in the conversation. Name networks often have more connections than chain networks, pointing to a greater volume of social interactions as a result. Interestingly, self-reported (survey-based) networks turn out to be more similar to name networks than to chain networks.
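The difference between the two network types can be sketched in a few lines of Python. Everything here is an assumption for illustration: the sample thread, the field names, and especially the shortcut of matching against a known list of participant names. Real name discovery has to cope with nicknames, misspellings and ambiguous references, and this sketch makes no attempt to tell addressing apart from mere mention.

```python
import re

# Hypothetical thread: each post has an author, the post it replies to, and text.
posts = [
    {"id": 1, "author": "Ann", "reply_to": None,
     "text": "Prices are up again."},
    {"id": 2, "author": "Bob", "reply_to": 1,
     "text": "Ann, I agree. Cara said the same last week."},
    {"id": 3, "author": "Cara", "reply_to": 2,
     "text": "Thanks Bob. Ann has the data."},
]

def chain_network(posts):
    """Edges from formal reply chains: author -> author of the parent post."""
    author_of = {p["id"]: p["author"] for p in posts}
    edges = set()
    for p in posts:
        parent = p["reply_to"]
        if parent is not None and author_of[parent] != p["author"]:
            edges.add((p["author"], author_of[parent]))
    return edges

def name_network(posts, names):
    """Edges from textual references: author -> every known name in the text."""
    edges = set()
    for p in posts:
        for name in names:
            if name == p["author"]:
                continue  # ignore self-references
            if re.search(rf"\b{re.escape(name)}\b", p["text"]):
                edges.add((p["author"], name))
    return edges

participants = ["Ann", "Bob", "Cara"]  # assumed to be known in advance
chain = chain_network(posts)
named = name_network(posts, participants)
print(len(chain), "chain edges;", len(named), "name edges")
```

In this toy thread the reply structure yields two ties, while the text yields four, including Bob's reference to Cara and Cara's reference to Ann, neither of which the reply chain captures.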
Anatoliy has explored this, for example, for the commenter community of a real estate blog, to determine the shape and development of the community and indeed to explore whether a real community has emerged here at all. This showed that participants became more connected, and gradually increased their involvement in the community.
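One simple way to put a number on "more connected" is to compare network density across time periods. The sketch below is only an assumption about how this could be done, using made-up ties; the notes do not record which measures were actually used in the study.

```python
def density(edges):
    """Density of an undirected graph: observed ties / possible ties.

    Only users who appear in at least one tie are counted as nodes.
    """
    ties = {frozenset(e) for e in edges if e[0] != e[1]}
    nodes = {user for tie in ties for user in tie}
    n = len(nodes)
    if n < 2:
        return 0.0
    return len(ties) / (n * (n - 1) / 2)

# Hypothetical ties among four commenters in two consecutive periods.
period_1 = [("a", "b"), ("c", "d")]
period_2 = [("a", "b"), ("b", "c"), ("c", "d"), ("a", "d"), ("a", "c")]

for label, edges in [("period 1", period_1), ("period 2", period_2)]:
    print(f"{label}: density = {density(edges):.2f}")
```

A rising density across periods would be consistent with participants becoming more connected, though on its own it says nothing about whether they experience the group as a community.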