Social media data is widely used to gain insights about social incidents, whether on a local or global scale. In the process, the data is commonly downloaded and stored locally, while privacy considerations are often neglected. However, protecting the privacy of social media users is required by both law and ethics. To prevent subsequent abuse, theft or public exposure of collected datasets, privacy-aware data processing is crucial.
In this project we developed a data storage concept for processing social media data with the privacy of social media users in mind. Based on the cardinality estimator HyperLogLog (HLL), the concept stores social media data in such a way that individual items cannot be extracted from it. It is only possible to estimate the cardinality of items within a certain set and to run set operations over multiple sets to extend the analytical range. However, applying this method requires defining the scope of the result before the data is even gathered. This prevents the data from being misused for other purposes at a later point in time and thus follows the principles of privacy by design.
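The idea of storing only an estimator, never the items themselves, can be illustrated with a minimal sketch. This is not the project's implementation: the class name `HyperLogLog`, the precision `p = 12`, the SHA-1 based hash and the `post-…` IDs are assumptions chosen for illustration. The sketch keeps one small integer per register, estimates the number of distinct items, and merges two sets by taking register-wise maxima.

```python
import hashlib
import math


class HyperLogLog:
    """Illustrative HLL sketch: keeps register maxima, never the items."""

    def __init__(self, p: int = 12):
        self.p = p                      # number of index bits (assumed value)
        self.m = 1 << p                 # number of registers
        self.registers = [0] * self.m

    @staticmethod
    def _hash64(item: str) -> int:
        # First 64 bits of SHA-1; the choice of hash is an assumption.
        digest = hashlib.sha1(item.encode("utf-8")).digest()
        return int.from_bytes(digest[:8], "big")

    def add(self, item: str) -> None:
        x = self._hash64(item)
        idx = x >> (64 - self.p)                   # leading p bits pick a register
        rest = x & ((1 << (64 - self.p)) - 1)      # remaining 64 - p bits
        # Rank: 1-based position of the leftmost 1-bit in the remaining bits.
        rank = (64 - self.p) - rest.bit_length() + 1
        self.registers[idx] = max(self.registers[idx], rank)

    def count(self) -> float:
        alpha = 0.7213 / (1 + 1.079 / self.m)      # bias constant for m >= 128
        raw = alpha * self.m ** 2 / sum(2.0 ** -r for r in self.registers)
        zeros = self.registers.count(0)
        if raw <= 2.5 * self.m and zeros:
            # Small-range correction (linear counting).
            return self.m * math.log(self.m / zeros)
        return raw

    def union(self, other: "HyperLogLog") -> "HyperLogLog":
        if self.p != other.p:
            raise ValueError("sketches must use the same precision")
        merged = HyperLogLog(self.p)
        merged.registers = [max(a, b) for a, b in zip(self.registers, other.registers)]
        return merged


# Two overlapping sets of hypothetical post IDs.
a, b = HyperLogLog(), HyperLogLog()
for i in range(1000):
    a.add(f"post-{i}")
for i in range(500, 1500):
    b.add(f"post-{i}")

both = a.union(b)
print(round(a.count()), round(both.count()))
```

Because only register maxima are kept, adding the same ID twice changes nothing, and the union of two sketches is identical to a sketch built from all items directly.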
Social media data processing graph. (a) Example post. (b) The post’s social, temporal, spatial (green), and topical data, and its hidden unique ID (red). (c) Encode the corresponding geohash from the geo-coordinates. The result represents the area plotted by the rectangle over the outlines of Dresden. (d) Store the post ID in the HLL set of the database record matching the geohash.
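Steps (c) and (d) of the figure can be sketched roughly as follows. This is an illustrative sketch under stated assumptions, not the project's code: the function names `geohash_encode`, `hll_add` and `store_post`, the in-memory `records` dictionary standing in for the database, the geohash precision of 5, the register precision `P = 10` and the example posts are all hypothetical.

```python
import hashlib
from collections import defaultdict

BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"  # geohash alphabet
P = 10                                       # HLL index bits (assumed value)


def geohash_encode(lat: float, lon: float, precision: int = 5) -> str:
    """Step (c): encode coordinates as a geohash of `precision` characters."""
    lat_rng, lon_rng = [-90.0, 90.0], [-180.0, 180.0]
    chars, bits, bit_count = [], 0, 0
    use_lon = True  # bits alternate between longitude and latitude
    while len(chars) < precision:
        rng, val = (lon_rng, lon) if use_lon else (lat_rng, lat)
        mid = (rng[0] + rng[1]) / 2
        if val >= mid:
            bits = (bits << 1) | 1
            rng[0] = mid          # keep the upper half of the interval
        else:
            bits <<= 1
            rng[1] = mid          # keep the lower half of the interval
        use_lon = not use_lon
        bit_count += 1
        if bit_count == 5:        # every 5 bits form one base32 character
            chars.append(BASE32[bits])
            bits, bit_count = 0, 0
    return "".join(chars)


def hll_add(registers: list, item: str, p: int = P) -> None:
    """Update HLL registers with the hash of an item; the item is not kept."""
    x = int.from_bytes(hashlib.sha1(item.encode("utf-8")).digest()[:8], "big")
    idx = x >> (64 - p)
    rest = x & ((1 << (64 - p)) - 1)
    rank = (64 - p) - rest.bit_length() + 1
    registers[idx] = max(registers[idx], rank)


# Stand-in for the database: one register array per geohash cell.
records = defaultdict(lambda: [0] * (1 << P))


def store_post(post_id: str, lat: float, lon: float, precision: int = 5) -> str:
    """Step (d): add the post ID to the HLL set of the matching geohash record."""
    cell = geohash_encode(lat, lon, precision)
    hll_add(records[cell], post_id)
    return cell


# Hypothetical posts: two in Dresden, one in Berlin.
cells = [
    store_post("post-1", 51.0504, 13.7373),
    store_post("post-2", 51.0504, 13.7373),
    store_post("post-3", 52.5200, 13.4050),
]
print(cells)
```

A shorter geohash covers a larger rectangle, so the chosen precision fixes the spatial resolution of every later analysis at collection time.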
Proof-of-concept implementation to store social media data using the cardinality estimator HyperLogLog
Containerized database environment for the VGIsink project
Leaflet application for VGIsink location cardinalities
The EVA-VGI project studies quality, heterogeneity, subjectivity, spatial resolution and temporal relevance of geo-referenced social media data. Focusing on the integration of spatial, temporal, topical and social dimensions, combined with an explicit link between events and reactions, the project presents a number of conceptual approaches and methods that enable a privacy-aware visual analysis of VGI in general and geo-social media data in particular. The project has drawn on the results of our research by applying HLL to datasets related to its publications.
- Dunkel et al., 2020
- Visualizing opinions, emotions and perceptions in social media data - while preserving user privacy
Together with the DVCHA and VA4VGI projects we formed a Young Research Group and carried out a case study in which we explored the deployment of HLL in disaster management processes. We developed and conducted a focus group discussion with VOST members, in which we identified challenges and opportunities of working with HLL and compared the process with conventional techniques. The findings showed that deploying HLL in the data acquisition process of VOST operations does not disrupt their data analysis process. Instead, several benefits, such as improved handling of very large datasets, may contribute to a more widespread use and adoption of the presented technique, which provides a basis for better integration of privacy considerations in disaster management.
- Löchner, M., Dunkel, A., & Burghardt, D. (2018). A privacy-aware model to process data from location-based social media. VGI Geovisual Analytics Workshop, Colocated with Big Data Visual and Immersive Analytics Symposium.
- Löchner, M., Dunkel, A., & Burghardt, D. (2019). Protecting privacy using HyperLogLog to process data from Location Based Social Networks. LESSON 2019 - 1st International Workshop on Legal and Ethical Issues in Crowdsourced Geographic Information.
- Löchner, M., Fathi, R., Schmid, D., Dunkel, A., Burghardt, D., Fiedrich, F., & Koch, S. (2020). Case Study on Privacy-Aware Social Media Data Processing in Disaster Management. ISPRS International Journal of Geo-Information, 9(12), 709. DOI: 10.3390/ijgi9120709
- Dunkel, A., Löchner, M., & Burghardt, D. (2020). Privacy-aware visualization of volunteered geographic information (VGI) to analyze spatial activity: A benchmark implementation. ISPRS International Journal of Geo-Information.
- Löchner, M., & Burghardt, D. (2022). Using HyperLogLog to Prevent Data Retention in Social Media Streaming Data Analytics. ISPRS International Journal of Geo-Information, 12(2). DOI: 10.3390/ijgi12020060