Privacy-aware Flickr YFCC grid aggregation using HyperLogLog
Through volunteering data, people can help assess information on various aspects of their surrounding environment. Particularly in natural resource management, Volunteered Geographic Information (VGI) is increasingly recognized as a significant resource, for example, supporting visitation pattern analysis to evaluate collective values and improve natural well-being. In recent years, however, user privacy has become an increasingly important consideration. Potential conflicts often emerge from the fact that VGI can be re-used in contexts not originally considered by volunteers. Addressing these privacy conflicts is particularly problematic in natural resource management, where visualizations are often explorative, with multifaceted and sometimes initially unknown sets of analysis outcomes. In this paper, we present an integrated and component-based approach to privacy-aware visualization of VGI, specifically suited for application to natural resource management. As a key component, HyperLogLog (HLL), a data abstraction format, is used to allow estimation of results, instead of more accurate measurements. While HLL alone cannot preserve privacy, it can be combined with existing approaches to improve privacy while, at the same time, maintaining some flexibility of analysis. Together, these components make it possible to gradually reduce privacy risks for volunteers at various steps of the analytical process. A specific use case demonstration is provided, based on a global, publicly available dataset that contains 100 million photos shared by 581,099 users under Creative Commons licenses. Both the data processing pipeline and resulting dataset are made available, allowing transparent benchmarking of the privacy–utility tradeoffs.
This is the git repository containing the supplementary materials for the paper:
Dunkel, A., Löchner, M., & Burghardt, D. (2020). Privacy-aware visualization of volunteered geographic information (VGI) to analyze spatial activity: A benchmark implementation. ISPRS International Journal of Geo-Information. DOI / PDF
The notebooks are stored as markdown files with jupytext for better git compatibility.
These notebooks can be run with jupyterlab-docker.
In addition to the release file, the latest HTML exports of the notebooks are available here:
The notebooks also produce two additional HTML files:
- yfcc_compare_raw_hll.html includes all figures and lets you compare raw and HLL results through HTML tabs.
- yfcc_usercount_est.html is an interactive map of worldwide user counts, for comparing raw and HLL values per bin.
Convert to ipynb files
First, either download the release files or convert the markdown files to working Jupyter notebooks.
To convert jupytext markdown files:
If you’re using the Docker image, open a terminal inside Jupyter and run these commands:

    conda activate jupyter_env && cd /home/jovyan/work/

Afterwards, re-create the .ipynb notebook(s) with:

    mkdir notebooks
    jupytext --set-formats notebooks///ipynb,md///md,py///_/.py --sync md/01_preparations.md
    jupytext --set-formats notebooks///ipynb,md///md,py///_/.py --sync md/02_yfcc_gridagg_raw.md
    jupytext --set-formats notebooks///ipynb,md///md,py///_/.py --sync md/03_yfcc_gridagg_hll.md
    jupytext --set-formats notebooks///ipynb,md///md,py///_/.py --sync md/04_interpretation.md

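The jupytext commands above differ only in the file name, so they could also be collapsed into a loop. The following is a minimal sketch, not part of the repository: `sync_all` and `DRY_RUN` are hypothetical names introduced here, and it assumes that the `md/` folder contains only jupytext notebook files. The format string is copied unchanged from the commands above.

```shell
# Pairing spec copied verbatim from the commands above.
FORMATS='notebooks///ipynb,md///md,py///_/.py'

# sync_all DIR: run the jupytext pairing/sync command for every .md file in DIR.
# sync_all and DRY_RUN are hypothetical names introduced for this sketch.
sync_all() {
    for f in "$1"/*.md; do
        # An unmatched glob stays literal; skip it instead of passing it on.
        [ -e "$f" ] || continue
        if [ "${DRY_RUN:-0}" = "1" ]; then
            # Print the command instead of running it.
            echo "jupytext --set-formats $FORMATS --sync $f"
        else
            jupytext --set-formats "$FORMATS" --sync "$f"
        fi
    done
}
```

With the function defined, `mkdir -p notebooks && sync_all md` run from `/home/jovyan/work/` should process all markdown files in one pass, while `DRY_RUN=1 sync_all md` only prints the commands it would run.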
- v.0.1.1: Minor refactor as part of the submission process.
- v.0.1.0: First release submitted alongside the paper.