Spatial Correlations in Social Media Data: Identification and Quantification of Spatial Correlation Structures in Georeferenced Twitter Feeds

Professor Dr. Alexander Zipf
Ruprecht-Karls-Universität Heidelberg, Geographisches Institut, Professur für Geoinformatik, Berliner Straße 48, 69120 Heidelberg

Social media feeds are one of a growing number of sources of volunteered geographic information. In recent years, this kind of data has proven to be a rich source of information for many areas of research. This proposal aims to contribute methodological advancements, focusing on Twitter data. Specifically, we aim to explore novel ways to derive spatial correlation structures within social media feeds. Our work builds upon the mature theory of spatial autocorrelation, which is the traditional way of measuring spatial structure.

The first research question is concerned with integrating the theory of spatial autocorrelation with the geometric stochasticity of tweets, that is, the random nature of the locations at which tweets occur. The latter is typically investigated by means of stochastic geometry. We aim to combine principles from both fields in order to derive more accurate correlation structures within tweets. In a first step, we investigate the effect of stochastic geometries on spatial autocorrelation measures. This includes point pattern modelling and a Monte Carlo simulation study. The investigation will support a better interpretation of autocorrelation results. Moreover, the knowledge gained will allow detailed insights into the variability of inter-tweet correlations for certain social activities. After this exploratory study, we investigate a measure of spatial autocorrelation that acknowledges the stochasticity of the underlying geometric structure and is thus able to reveal meaningful patterns within social media data.
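To illustrate the kind of Monte Carlo study meant here, the following is a minimal sketch, not the method of the proposal. It assumes Moran's I as the autocorrelation measure, binary distance-band weights, and uniformly random point locations in the unit square as a stand-in for tweet positions. The function names `morans_i` and `simulate`, the `trend` parameter, and all numeric settings are hypothetical choices made for illustration.

```python
import math
import random
import statistics


def morans_i(points, values, radius):
    """Moran's I with binary distance-band weights (w_ij = 1 if d_ij <= radius)."""
    n = len(points)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    cross = 0.0  # sum over neighbour pairs of dev_i * dev_j
    w_sum = 0.0  # sum of all weights
    for i in range(n):
        for j in range(n):
            if i != j and math.dist(points[i], points[j]) <= radius:
                cross += dev[i] * dev[j]
                w_sum += 1.0
    denom = sum(d * d for d in dev)
    if w_sum == 0.0 or denom == 0.0:
        return float("nan")  # undefined without neighbours or without variance
    return (n / w_sum) * (cross / denom)


def simulate(n_points=50, radius=0.2, n_runs=200, trend=0.0, seed=42):
    """Recompute Moran's I over many randomly generated point geometries.

    Assumption: each value is trend * (x + y) plus Gaussian noise, so
    trend = 0.0 gives spatially independent values and trend > 0 gives
    a smooth spatial gradient.
    """
    rng = random.Random(seed)
    results = []
    for _ in range(n_runs):
        points = [(rng.random(), rng.random()) for _ in range(n_points)]
        values = [trend * (x + y) + rng.gauss(0.0, 0.1) for x, y in points]
        value = morans_i(points, values, radius)
        if not math.isnan(value):
            results.append(value)
    return results


null_runs = simulate(trend=0.0)
trend_runs = simulate(trend=1.0)
for label, runs in (("no trend", null_runs), ("trend", trend_runs)):
    print(f"{label}: mean I = {statistics.mean(runs):.3f}, "
          f"sd = {statistics.stdev(runs):.3f}")
```

The spread of the simulated values indicates how much of the variation in the measure stems from the random geometry alone, which is the effect the first step sets out to examine.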

Secondly, we investigate the mutually overlapping character of the phenomena that are reflected within the tweets. This overlap is caused by the autonomous behaviour of the users, who report on multiple phenomena simultaneously in space and time. We aim to explore ways of separating relevant tweets from non-relevant ones by means of Dempster-Shafer theory and Dirichlet processes. The challenge here is to disentangle the geometrically overlapping neighbourhoods. In a second step, we extend spatial autocorrelation measures to acknowledge this overlapping character by means of partial autocorrelation functions. This will prevent different phenomena from being mixed and will lead to more realistic dependency structures.
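As a rough illustration of how Dempster-Shafer theory could weigh evidence about a tweet's relevance, the sketch below implements Dempster's rule of combination for two mass functions. The two-element frame {relevant, irrelevant}, the two evidence sources and their mass values are invented for this example and are not taken from the proposal.

```python
from itertools import product

# Hypotheses are subsets of the frame of discernment (assumed here to
# contain just two elements).
RELEVANT = frozenset({"relevant"})
IRRELEVANT = frozenset({"irrelevant"})
UNKNOWN = RELEVANT | IRRELEVANT  # mass on the whole frame expresses ignorance


def combine(m1, m2):
    """Dempster's rule: multiply masses, intersect hypotheses, renormalise."""
    combined = {}
    conflict = 0.0  # total mass assigned to contradictory pairs
    for (a, mass_a), (b, mass_b) in product(m1.items(), m2.items()):
        overlap = a & b
        if overlap:
            combined[overlap] = combined.get(overlap, 0.0) + mass_a * mass_b
        else:
            conflict += mass_a * mass_b
    if conflict >= 1.0:
        raise ValueError("sources are in total conflict")
    return {h: m / (1.0 - conflict) for h, m in combined.items()}


# Hypothetical evidence: one source based on tweet text, one on location.
text_evidence = {RELEVANT: 0.6, UNKNOWN: 0.4}
location_evidence = {RELEVANT: 0.5, IRRELEVANT: 0.3, UNKNOWN: 0.2}

belief = combine(text_evidence, location_evidence)
for hypothesis, mass in belief.items():
    print(sorted(hypothesis), round(mass, 4))
```

Mass left on the whole frame represents remaining ignorance, which is what distinguishes this approach from assigning a single probability of relevance to each tweet.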

While the first two work packages focus on the point level, the third addresses suitable aggregation strategies. These strategies involve traditional clustering techniques and indices from point pattern analysis. This allows dependencies between different kinds of compound social activities to be analysed. Further, aggregating tweets makes it possible to investigate the relationship between social processes and their immediate surroundings. This will be the second step of this work package.
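One example of an index from point pattern analysis is the Clark-Evans nearest-neighbour ratio, sketched below under simplifying assumptions. The proposal does not name a specific index, so this choice is purely illustrative. The sketch applies no edge correction, fixes the study area to the unit square, and uses synthetic points in place of tweets.

```python
import math
import random


def clark_evans(points, area):
    """Ratio of the observed mean nearest-neighbour distance to the value
    expected under complete spatial randomness (no edge correction)."""
    n = len(points)
    nearest = [
        min(math.dist(p, q) for j, q in enumerate(points) if j != i)
        for i, p in enumerate(points)
    ]
    observed = sum(nearest) / n
    expected = 0.5 / math.sqrt(n / area)
    return observed / expected


rng = random.Random(7)

# Synthetic pattern 1: 200 points scattered uniformly over the unit square.
scattered = [(rng.random(), rng.random()) for _ in range(200)]

# Synthetic pattern 2: 200 points concentrated around 5 random centres.
centres = [(rng.random(), rng.random()) for _ in range(5)]
clustered = [
    (cx + rng.gauss(0.0, 0.02), cy + rng.gauss(0.0, 0.02))
    for cx, cy in centres
    for _ in range(40)
]

r_scattered = clark_evans(scattered, area=1.0)
r_clustered = clark_evans(clustered, area=1.0)
print(f"scattered: R = {r_scattered:.2f}")
print(f"clustered: R = {r_clustered:.2f}")
```

Values of the ratio near 1 are consistent with a random pattern, while values well below 1 indicate clustering. Such an index could therefore help to decide whether a set of tweets forms a compound activity worth aggregating.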

Overall, our research will enable a more detailed understanding of social activities and their respective spatial mechanisms, through improved methods for analysing how these activities are represented within socio-technical systems.