Predicting Twitter-Based Information Cascades

The next session at Web Science 2016 starts with a paper by Jure Leskovec on information cascades. Such cascades emerge as users of social media platforms (re)share content through their networks, and the prediction of such processes is traditionally very difficult.

One question in such predictions is whether a given cascade will reach the median size observed in historical cascades; because of how the median is defined, even a blind guess on this question will have a 50% success rate.
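To see why the median question has a 50% floor, here is a minimal sketch using synthetic cascade sizes. The heavy-tailed size distribution and the function names `simulate_sizes` and `median_baseline_accuracy` are illustrative assumptions, not taken from the paper.

```python
import random
import statistics


def simulate_sizes(n, seed=0):
    """Draw n synthetic cascade sizes from a heavy-tailed distribution.

    Assumption: a Pareto draw stands in for real cascade sizes; the
    paper's actual data is not reproduced here.
    """
    rng = random.Random(seed)
    return [rng.paretovariate(1.5) for _ in range(n)]


def median_baseline_accuracy(sizes, seed=0):
    """Score a coin-flip guesser on the 'will it exceed the median?' task."""
    rng = random.Random(seed)
    median = statistics.median(sizes)
    # Ground truth: did each cascade grow beyond the historical median?
    labels = [size > median for size in sizes]
    # Blind guess: predict "above median" with probability 0.5.
    guesses = [rng.random() < 0.5 for _ in sizes]
    positive_rate = sum(labels) / len(labels)
    accuracy = sum(g == l for g, l in zip(guesses, labels)) / len(labels)
    return median, positive_rate, accuracy


median, positive_rate, accuracy = median_baseline_accuracy(simulate_sizes(10_000))
print(f"share above median: {positive_rate:.3f}, coin-flip accuracy: {accuracy:.3f}")
```

Because the two classes are balanced by construction, any useful predictor has to beat roughly 0.5 accuracy on this task. With real cascade data, many cascades may tie at exactly the median size, so the split can be less even than in this synthetic case.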

But cascades on a platform like Twitter can consist of multiple cascade trees sharing the same information, as pieces of information also travel outside of Twitter and may appear simultaneously in different parts of the underlying network. This makes the assessment and prediction of cascades even more difficult.

What kinds of signals can be used to predict such cascades, then? On Twitter we may see many broadcast trees of depth 1, along with a number of cascades with greater depths. This paper draws on some 30 million cascades observed in 250 tweets within a single month. For all cascades sharing the same information, it creates an artificial meta-node that serves as a unified root for those cascades, and this substantially improves the quality of the cascade forecasts. Several other underlying features can also be used to improve the predictions.
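The meta-node idea can be sketched as follows. This is only an illustration under stated assumptions: the input format, the `("meta", content_id)` node naming, and the functions `unify_cascades` and `size_and_depth` are hypothetical, and the paper's actual construction and feature set may differ.

```python
from collections import defaultdict


def unify_cascades(trees):
    """Attach every cascade tree to one artificial root per piece of content.

    trees: iterable of (content_id, root, edges), where edges is a list of
    (parent, child) pairs. Assumption: node ids (e.g. tweet ids) are unique
    across all trees.
    """
    children = defaultdict(list)
    for content_id, root, edges in trees:
        meta_node = ("meta", content_id)  # hypothetical naming scheme
        children[meta_node].append(root)
        for parent, child in edges:
            children[parent].append(child)
    return children


def size_and_depth(children, root):
    """Return (node count, maximum depth) of the tree below root, root included."""
    size, depth = 0, 0
    stack = [(root, 0)]
    while stack:
        node, level = stack.pop()
        size += 1
        depth = max(depth, level)
        for child in children.get(node, []):
            stack.append((child, level + 1))
    return size, depth


# Two separate trees share "url1"; one broadcast-only tree carries "url2".
trees = [
    ("url1", "t1", [("t1", "t2"), ("t1", "t3"), ("t3", "t4")]),
    ("url1", "t5", [("t5", "t6")]),
    ("url2", "t7", []),
]
children = unify_cascades(trees)
print(size_and_depth(children, ("meta", "url1")))  # (7, 3), meta-node included
```

Treating the two `url1` trees as one structure lets size and depth be measured for the information as a whole rather than for each tree in isolation, which is the intuition behind the unified root.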