
Gathering Internet Statistics

Lars Kirchhoff at AoIR 2007, Vancouver.
The post-lunch session on this first day here at AoIR 2007 starts with a paper by Lars Kirchhoff, Thomas Nicolai, and me, which Lars is here to present with me. The PDF is already online; I'm recording our presentation and will add it to our slides below as soon as I can.

Up next is Ben Anderson, whose interest is in small-area estimates of e-use, especially in England. He notes that England is usually divided into nine statistical regions, and Internet use and other e-statistics are commonly measured for these regions. These regional figures mask far more diverse patterns at much finer geographic levels, however, and we need to develop ways of getting at this information. Such information is of interest to telecommunications companies, distribution companies, policy makers, and many other operators in these areas. A related problem is that ICT data has so far been poorly incorporated into the official UK Census; so Ben has been working on developing simulated census information by combining data from the official Census and smaller surveys.

This is done by incorporating the survey information into the Census data with specific weights determined in a process of iterative proportional fitting. So, for example, Ben would identify those characteristics of the population which are related to online time use, and use these as predictors for more specific localised data. There are also models for validating such information, both internally (checking whether specific results average out again to what would be expected across the region) and externally (comparing the simulated data estimate with reliable external data sources).
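Iterative proportional fitting itself is a standard technique: a seed table (for instance, a survey cross-tabulation) is alternately rescaled row by row and column by column until its margins match known totals (for instance, Census counts for one small area), while the seed's internal association structure, its cross-product ratios, is preserved. The sketch below shows only the classic two-dimensional form. Ben's spatial microsimulation reweights individual survey records against several Census constraints, so treat this as a simplified illustration, not his implementation. The function name `ipf`, the categories (age band by employment status), and all numbers are hypothetical.

```python
def ipf(seed, row_targets, col_targets, max_iter=100, tol=1e-9):
    """Classic two-dimensional iterative proportional fitting (sketch).

    seed: list of lists of positive cell counts, e.g. a hypothetical
          survey cross-tabulation of age band x employment status.
    row_targets / col_targets: known marginal totals for one small area,
          e.g. hypothetical Census counts; both must sum to the same total.
    Returns a table whose row and column sums match the targets while
    keeping the seed's cross-product (odds) ratios.
    """
    table = [[float(v) for v in row] for row in seed]
    for _ in range(max_iter):
        # Row step: scale each row so that it sums to its target.
        for i, target in enumerate(row_targets):
            s = sum(table[i])
            if s > 0:
                table[i] = [v * target / s for v in table[i]]
        # Column step: scale each column so that it sums to its target.
        for j, target in enumerate(col_targets):
            s = sum(row[j] for row in table)
            if s > 0:
                for row in table:
                    row[j] *= target / s
        # Stop once the rows still match after the column step.
        if all(abs(sum(table[i]) - t) < tol for i, t in enumerate(row_targets)):
            break
    return table


# Hypothetical seed: rows = age bands (under 35, 35-64, 65+),
# columns = (in employment, not in employment).
seed = [[40, 10], [30, 20], [5, 15]]
row_targets = [300, 500, 200]   # hypothetical small-area totals by age band
col_targets = [550, 450]        # hypothetical small-area totals by employment
fitted = ipf(seed, row_targets, col_targets)
print([round(sum(r)) for r in fitted])                     # [300, 500, 200]
print([round(sum(r[j] for r in fitted)) for j in range(2)])  # [550, 450]
```

Checking that the fitted margins reproduce the targets is only the most basic consistency check; the external validation Ben describes would instead compare such estimates against independent, reliable data sources.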

Additionally, this micro-data can also support micro-level experimentation with changing parameters (for example changing the level of Internet uptake). This also requires the establishment of a forecasting timeframe, however, and a recognition of the interdependence of variables in a linked system. It is limited by the availability of current data, of course. Ben and his colleagues are now also in the process of publishing some of this information in a mash-up with Google Maps.
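As a rough illustration of this kind of what-if experimentation, the sketch below takes hypothetical small-area microdata (household types, each with a weight and a baseline probability of being online), applies an assumed uniform uplift in uptake, and re-aggregates the results to area and regional level. The record layout, the `uplift` parameter, the area names, and all figures are invented for illustration; this is not Ben's model, which would also need the forecasting timeframe and the linked variables mentioned above.

```python
from typing import Dict, List, Tuple

# One record = (weight, baseline probability of being online).
# weight: hypothetical number of households of this type in the area.
Record = Tuple[float, float]


def area_uptake(records: List[Record], uplift: float = 0.0) -> float:
    """Share of households online in one small area under a scenario.

    uplift is a hypothetical what-if parameter: an absolute change in every
    household type's probability of Internet use, clipped to [0, 1].
    """
    total = sum(w for w, _ in records)
    online = sum(w * min(1.0, max(0.0, p + uplift)) for w, p in records)
    return online / total


def region_uptake(areas: Dict[str, List[Record]], uplift: float = 0.0) -> float:
    """Household-weighted average over all areas, i.e. the small-area
    estimates aggregated back up to the regional level."""
    total = sum(w for recs in areas.values() for w, _ in recs)
    weighted = sum(area_uptake(recs, uplift) * sum(w for w, _ in recs)
                   for recs in areas.values())
    return weighted / total


# Hypothetical areas, each with two household types.
areas = {
    "Area A": [(600, 0.7), (400, 0.3)],
    "Area B": [(200, 0.8), (800, 0.2)],
}
print(f"{region_uptake(areas):.2f}")       # 0.43 (baseline)
print(f"{region_uptake(areas, 0.1):.2f}")  # 0.53 (10-point uplift)
```

Clipping at 1.0 is one simple way of keeping probabilities valid under large uplifts; a more realistic scenario would vary the change by household type rather than apply it uniformly.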

Finally, on to Tang Tang. She begins by highlighting some basic statistics of U.S. Internet use (about half of adult Americans now have broadband access at home, and more than $4 billion is being spent on Internet advertising), and follows up with the question of how to characterise Internet user behaviour in this context. Two theoretical schools offer possible answers: uses and gratifications theory, which holds that audience exposure is determined by audience motivations, characteristics, and tastes; and structural factors theory, which holds that audience behaviours are best understood by examining audience availability, accessibility, and other structural factors.

In the face of this, then: which factors do predict Internet usage? Tang conducted some 1,500 online surveys, with respondents skewed about 2:1 towards females and 2:1 towards students, and a median age of around 30. The mean time spent online by this group was 20 hours per week, compared with only 12 hours per week spent watching TV (heavy Internet users were also heavy TV users, however). Factors influencing Internet usage were time availability, exposure to television, Web activities, income, and instrumental motivations. This may indicate that younger media users in particular increasingly use two or more media forms simultaneously (there is no longer a question of either/or). Both individual and structural factors influenced Internet usage, which suggests that it may be valuable to bridge the two theoretical models available to us. However, there is also a need to conduct similar studies in other cities and with different survey methods to see how the resulting data differ.
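The post does not say which statistical model Tang used to identify these factors; multiple regression is a common choice for "which factors predict usage" questions, so the following is only a hedged sketch of that approach. It runs ordinary least squares on entirely synthetic respondents (not Tang's data); the predictor names and the "true" coefficients used to generate the fake outcome are assumptions, chosen so that TV viewing has a positive coefficient in line with the direction reported above.

```python
import numpy as np


def fit_ols(X: np.ndarray, y: np.ndarray) -> np.ndarray:
    """Ordinary least squares with an intercept.

    Returns [intercept, b1, b2, ...], one slope per column of X.
    """
    X1 = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(X1, y, rcond=None)
    return coef


# Synthetic respondents (NOT Tang's data). Hypothetical predictors:
# free time (hours/day), TV viewing (hours/week), income ($1000s/year).
rng = np.random.default_rng(0)
n = 1500
free_time = rng.uniform(0, 6, n)
tv_hours = rng.uniform(0, 30, n)
income = rng.uniform(10, 150, n)

# Assumed "true" model, used only to generate the fake outcome.
hours_online = (2.0 + 1.5 * free_time + 0.3 * tv_hours + 0.05 * income
                + rng.normal(0, 2, n))

X = np.column_stack([free_time, tv_hours, income])
coef = fit_ols(X, hours_online)
print(coef)  # estimates close to [2.0, 1.5, 0.3, 0.05]
```

Note that a sample skewed towards females and students, as in Tang's survey, would limit how far any such coefficients generalise, which is one reason for the call to replicate the study elsewhere and with other survey methods.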
