VGIscience | Heidelberg: A common data structure concept

At the VGIscience Collaborative Research Week in Heidelberg, we worked on a common conceptual data model for analyzing, comparing and relating information from social networks. In doing so, we focused on three directions:

Case studies applying this structure, with example visualizations
Theoretical description of the concept and guidelines for its technical implementation
Example base functions and algorithms for information retrieval and for interlinking data entities

One example output is a common interactive interface based on Metabase, a framework that allows analysts to interactively explore and visualize data based on live connections to multiple types of databases.

Here is one visualization from our example dataset for Heidelberg. It shows all other places visited by users who went to the bar “Hörnchen”. This could be an entry point for analysts studying which places a specific user type (e.g. “Hörnchen visitors”) frequents in Heidelberg.

Below is the underlying table, showing other places sorted by descending user count:
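The query behind such a table can be sketched in a few lines of Python. This is only an illustration of the idea, not the Metabase query we used: the function name, the (user, place) tuple layout and the sample visits are all made up for the example.

```python
from collections import defaultdict


def co_visited_places(visits, seed_place):
    """Rank other places by the number of distinct users shared with seed_place.

    visits is an iterable of (user, place) pairs (hypothetical layout).
    """
    users_by_place = defaultdict(set)
    for user, place in visits:
        users_by_place[place].add(user)

    seed_users = users_by_place.get(seed_place, set())
    ranked = []
    for place, users in users_by_place.items():
        if place == seed_place:
            continue
        shared = len(users & seed_users)
        if shared > 0:
            ranked.append((place, shared))

    # Descending user count; ties are broken alphabetically by place name.
    ranked.sort(key=lambda item: (-item[1], item[0]))
    return ranked


# Invented sample data, for illustration only.
visits = [
    ("u1", "Hörnchen"), ("u1", "University"), ("u1", "Library"),
    ("u2", "Hörnchen"), ("u2", "University"),
    ("u3", "Castle"), ("u3", "University"),
]
ranked = co_visited_places(visits, "Hörnchen")
```

Counting distinct users rather than posts keeps a single very active user from dominating the ranking.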

Our (brief) interpretation: the initial “Hörnchen” filter seems to select a lot of students (University, Library, etc.)!

For privacy reasons, the original place GUIDs are anonymized to hashes. Furthermore, only places visited by more than 5 users are listed.
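As a rough sketch, these two privacy steps could look like the following in Python. The hash function is an assumption on our side for this example: a salted SHA-256 from the standard library is used here, and the helper names, the salt and the sample counts are hypothetical.

```python
import hashlib

MIN_USERS = 5  # a place is listed only if MORE than this many users visited it


def anonymize_guid(guid, salt):
    """Return a salted SHA-256 hex digest of a place GUID (assumed scheme)."""
    return hashlib.sha256((salt + guid).encode("utf-8")).hexdigest()


def publishable_rows(user_counts, salt, min_users=MIN_USERS):
    """Hash place GUIDs and drop places at or below the user threshold."""
    rows = [
        (anonymize_guid(guid, salt), count)
        for guid, count in user_counts.items()
        if count > min_users
    ]
    rows.sort(key=lambda row: -row[1])  # descending user count
    return rows


# Invented sample data: place GUID -> number of distinct users.
example_counts = {"place-a": 12, "place-b": 5, "place-c": 7}
rows = publishable_rows(example_counts, salt="example-salt")
```

Note that "place-b" is dropped: with exactly 5 users it does not exceed the threshold.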

This all links to our common, standardized lbsn data structure. We decided to describe the concept with Protocol Buffers, a platform- and language-independent data description language. It can be used to describe the proposed structure of our data, which can then be compiled and implemented in many programming languages, such as Python, Java or C++.

Here is an example description of the data structure Place:

message Place {
    /* Primary Key. The unique identifier of the object in the database. */
    CompositeKey place_pkey = 1;
    /* Optional Attributes */
    string name = 2;
    int64 post_count = 3;
    string url = 4;   
    string geom_center = 5; //WKT Point
    string geom_area = 6; //WKT Polygon
    City city_pkey = 7;
}

We also demonstrated how to implement a mapping algorithm that relates two different sources for information enrichment. It can be used, for example, to link OSM polygons to Twitter, Instagram or Facebook places. Such algorithms are key to relating information from different data sources, which helps us get a better overall understanding of relationships.
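To give an idea of what such a mapping can look like, here is a minimal Python sketch that links place points to polygons by spatial containment. This is an assumption for illustration, not necessarily the algorithm we demonstrated; the identifiers, coordinates and function names are invented, and a real implementation would more likely use a geometry library and a spatial index.

```python
def point_in_polygon(x, y, polygon):
    """Ray-casting test; polygon is a list of (x, y) vertices."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Only edges that straddle the horizontal line through y can be crossed.
        if (y1 > y) != (y2 > y):
            # x coordinate at which the edge crosses that horizontal line.
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def link_places_to_polygons(places, polygons):
    """Map each place id to the first polygon id containing it, else None."""
    links = {}
    for place_id, (lon, lat) in places.items():
        links[place_id] = next(
            (poly_id for poly_id, ring in polygons.items()
             if point_in_polygon(lon, lat, ring)),
            None,
        )
    return links


# Invented sample data: one rectangular polygon and two place points (lon, lat).
polygons = {"osm-1": [(8.70, 49.40), (8.72, 49.40), (8.72, 49.42), (8.70, 49.42)]}
places = {"insta-1": (8.71, 49.41), "tw-1": (8.60, 49.30)}
links = link_places_to_polygons(places, polygons)
```

Places that fall into no polygon are kept with a value of None, so unmatched entities stay visible to the analyst instead of silently disappearing.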

Finally, we provided a specific case study for applying our structure. Topic modeling visualizations originally developed for Twitter data can be directly applied to and tested with other data, in this case the example datasets of Heidelberg taken from Flickr and Instagram. This lets analysts test and compare the suitability of algorithms across different networks and datasets.