Improvement of Task-Oriented Visual Interpretation of VGI Point Data (TOVIP)

Jochen Schiewe
g2lab / HafenCity University Hamburg
Martin Knura
g2lab / HafenCity University Hamburg

Volunteered Geographic Information (VGI) is very often generated as point data (e.g. Points of Interest, locations where photos were taken). Among its main characteristics, VGI shows an enormous volume as well as semantic and temporal heterogeneity. At a certain map scale and data volume, this leads to point clutter, which not only hides important information but also makes the map unreadable. Reducing geometric and thematic clutter and improving the interpretability of static, multi-scale or multi-temporal visualizations of VGI points is therefore a task of major relevance. Instead of looking at isolated generalization operations only, the project TOVIP ("Improvement of task-oriented visual interpretation of VGI point data") focuses on optimizing generalization workflows designed for specific high-level visual interpretation tasks, especially the identification and preservation of spatial patterns.

Normally, generalization operations such as aggregation, selection or simplification are applied to overcome these clutter problems, merging the user-generated information by reducing the number of visible point symbols. Under certain conditions, however, these generalization methods disperse spatial patterns, which reduces their usability for visual presentation and exploration, especially when high-level patterns (e.g. hot spots, extreme values) are to be interpreted. The TOVIP project therefore focuses on optimizing generalization workflows for these specific visual interpretation tasks.

Modelling and optimizing generalization workflows is often done with a constraint-based approach, in which constraints are requirements that should be fulfilled and which therefore need related quantitative measures. When constraint-based approaches are used to interpret spatial patterns in a generalized visualization, two potentially contradictory aspects have to be considered: preservation constraints ensure that the generalized data inherit existing patterns such as clusters or extreme values, while legibility constraints ensure that these patterns remain readable for users. However, measures for evaluating synoptic interpretation tasks based on generalized visualizations are still difficult to define. The first step of the research project was therefore a user study to better understand user behavior during high-level interpretation tasks, which can then be used to define constraints and measures for static, multi-scale or multi-temporal visualizations of VGI points. In the second part of the project, the generalization workflow is processed and controlled through an agent-based modeling approach using the previously defined constraints.

SIDE NOTE: A new method using think-aloud interviews and techniques from visual analytics

Study design

Due to the COVID-19 pandemic at the time of the study, we had to find remote and contactless substitutes for our planned eye-tracking study. As a result, we developed an alternative method that is fully feasible under COVID-19 restrictions. The main technique is the think-aloud interview, in which participants continuously verbalize their thoughts as they work through a test. We record the screen and the mouse movements during the interviews and afterwards analyse both the statements and the mouse positions, encoding the approximate map position of the user's attention for each second of the interview. This allows us to use the same visual methods as in eye-tracking studies, such as attention maps or trajectory maps. The whole method is described in [1], and the code is published here.
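Once each second of an interview has been coded with an approximate map position, an attention map is essentially a histogram of those positions, and a trajectory or flow map counts the moves between them. The following minimal sketch illustrates this under our own assumptions: positions are (x, y) map coordinates within a known rectangular extent, and attention is binned into square grid cells. The function names and sample positions are hypothetical and not taken from the published code.

```python
import math
from collections import Counter

def cell_of(x, y, extent, cell_size):
    """Grid cell (col, row) of a map position, clamped to the map extent."""
    xmin, ymin, xmax, ymax = extent
    ncols = max(1, math.ceil((xmax - xmin) / cell_size))
    nrows = max(1, math.ceil((ymax - ymin) / cell_size))
    col = min(max(int((x - xmin) // cell_size), 0), ncols - 1)
    row = min(max(int((y - ymin) // cell_size), 0), nrows - 1)
    return col, row

def attention_map(positions, extent, cell_size):
    """Seconds of attention per grid cell (one coded position per second)."""
    return Counter(cell_of(x, y, extent, cell_size) for x, y in positions)

def flow_map(positions, extent, cell_size):
    """Counts of moves between consecutive, distinct cells (input for a flow map)."""
    cells = [cell_of(x, y, extent, cell_size) for x, y in positions]
    return Counter((a, b) for a, b in zip(cells, cells[1:]) if a != b)

# Hypothetical coded positions of one participant, one per second
positions = [(12, 8), (14, 9), (15, 11), (55, 60), (57, 62), (13, 10)]
extent = (0, 0, 100, 100)
print(attention_map(positions, extent, 20))  # Counter({(0, 0): 4, (2, 3): 2})
print(flow_map(positions, extent, 20))
```

A finer grid or a kernel density estimate would give smoother attention maps; the grid simply keeps the counting transparent.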

Figure: attention map (left) and flow map (right)

User study

The first goal of the project was to get a better understanding of task-specific user behavior. To this end, we conducted a user study with the method described above, in which participants performed different interpretation tasks, such as finding clusters within a dataset, comparing point densities, or finding areas with a specific point distribution. In the following, we describe our findings on the task-solving strategies of the participants and the implications of the study results for defining constraints for our agent-based model. A more detailed description of our study can be found in [2].

Task-solving strategies

Although we had different categories of interpretation tasks, such as pattern identification, pattern comparison and relation seeking, the task-solving strategies did not differ significantly between the kinds of tasks. For further analysis, we divide the overall task-solving strategy for each task and participant into three sequential actions:

Finding a start position
Obtaining information
Decision-making

We found that point density has the biggest impact on user behavior in all three steps. It was the most important factor when selecting a starting position on the map, and denser clusters were described and analysed earlier and more often (see also the attention map in the left image above). Furthermore, participants discussed interrelations both between clusters of different densities and between different classes of points within the same cluster. Because point density was also the main evaluation criterion in comparison tasks and during decision-making, it has to be addressed first when defining constraints.

Implications for defining constraints for map generalization

Following the results of the study, there are two main aspects to consider. First, it is of major importance to preserve the original pattern proportions during the generalization process. More specifically, the agent-based model should:

Retain the proportion of points between areas with different densities
Preserve the ranking of densities between different areas
Preserve proportions between classes while maintaining at least one point per class
Preserve Gestalt Law Rules regarding similarity and proximity of clusters
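The first three requirements can be checked or enforced with simple counting once points are assigned to areas and classes. The sketch below is illustrative only: it uses the largest remainder method to keep class proportions with at least one point per class, and compares the strict density order and the area shares before and after generalization. Function names, class labels and counts are hypothetical, not part of the TOVIP model.

```python
import math

def allocate_per_class(class_counts, target_total):
    """Split target_total points over classes in proportion to class_counts
    (largest remainder method) while keeping at least one point per class."""
    total = sum(class_counts.values())
    if not len(class_counts) <= target_total <= total:
        raise ValueError("need one point per class and no more points than the source")
    quotas = {c: target_total * n / total for c, n in class_counts.items()}
    alloc = {c: math.floor(q) for c, q in quotas.items()}
    # Hand the points lost by rounding down to the largest fractional remainders.
    leftover = target_total - sum(alloc.values())
    for c in sorted(quotas, key=lambda k: quotas[k] - alloc[k], reverse=True)[:leftover]:
        alloc[c] += 1
    # Classes rounded down to zero borrow one point from the largest allocation.
    for c in alloc:
        if alloc[c] == 0:
            donor = max(alloc, key=alloc.get)
            alloc[donor] -= 1
            alloc[c] = 1
    return alloc

def density_order(densities):
    """Area names sorted from highest to lowest density."""
    return sorted(densities, key=densities.get, reverse=True)

def ranking_preserved(before, after):
    """True if all areas keep their density rank after generalization."""
    return density_order(before) == density_order(after)

def max_share_deviation(before, after):
    """Largest change of any area's share of all points (0 = proportions kept)."""
    total_before, total_after = sum(before.values()), sum(after.values())
    return max(abs(before[a] / total_before - after[a] / total_after) for a in before)

print(allocate_per_class({"bike": 60, "car": 30, "bus": 9, "tram": 1}, 20))
# {'bike': 11, 'car': 6, 'bus': 2, 'tram': 1}
```

Guaranteeing one point per class slightly distorts the proportions of very small classes; this is the trade-off the requirement implies.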

Second, it could be helpful to use cartographic techniques to guide the interpretation of the data, e.g.:

Use cartographic style elements where pattern preservation is difficult to manage
Optimize the guiding effect of the background map (e.g. preservation of other map objects in close proximity to point clusters)

Measures

The next step in the project is to define constraints and respective measures based on the findings of the study. To this end, we collect a list of different approaches, both from the literature and from our own experiments, and test them on exemplary point distributions.

The list of candidates consists of macro measures (such as the Radical Law, the amount of information as the number of all map objects, and an index characterizing the spatial distribution of points based on Voronoi regions), micro measures (such as the object-oriented density, the number of natural neighbors and the occurrence of local extreme values) and meso measures (such as the number of cluster members, the presence of different point categories in a cluster and the shape of a cluster). We further subdivide the constraints and respective measures into three groups:

Measures describing the overall distribution of points and the density ranking between different areas of the map
Measures preserving pattern-specific characteristics like hot spots, extreme values, cluster density etc.
Measures describing Gestalt Law Rules
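Among the macro measures, the Radical Law is the simplest to express: according to Töpfer's rule, the number of objects n_f to retain at the target scale 1:M_f, given n_a objects at the source scale 1:M_a, is n_f = n_a * sqrt(M_a / M_f). A minimal Python version follows; rounding to whole points and using the ratio of kept points to the Radical Law target as a measure value are our own assumptions.

```python
import math

def radical_law(n_source, source_denominator, target_denominator):
    """Töpfer's Radical Law: n_f = n_a * sqrt(M_a / M_f), rounded to whole points."""
    if target_denominator < source_denominator:
        raise ValueError("the target scale must be smaller (larger denominator)")
    return round(n_source * math.sqrt(source_denominator / target_denominator))

def radical_law_ratio(n_kept, n_source, source_denominator, target_denominator):
    """Kept points relative to the Radical Law target (1.0 = exact match)."""
    return n_kept / radical_law(n_source, source_denominator, target_denominator)

# 1000 points at 1:15,000 generalized to 1:35,000
print(radical_law(1000, 15000, 35000))  # 655
```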

Next, we compare the measures and their performance on different point distributions to identify redundancies, and examine their robustness with respect to point cardinality, which is essential when they are applied in map generalization operations. We create a series of experimental point distributions with 100, 200, 500 and 1000 points and different characteristics: a regular and a random distribution, distributions with predefined regular (gridded distribution) and irregular areas (pattern distribution), and distributions with loose and clear clusters (see the example with 200 points):
Figure: experimental point distributions (example with 200 points)
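Test distributions of this kind can be generated with a few lines of code. The sketch below covers only the regular, random and clustered cases (the gridded and pattern distributions with predefined areas are omitted); the unit-square extent, the Gaussian scatter around cluster centers and the spread values for clear and loose clusters are our own assumptions, not the parameters used in the project.

```python
import math
import random

def regular(n, size=1.0):
    """n points on a square lattice, filled row by row."""
    k = math.ceil(math.sqrt(n))
    step = size / k
    return [((i % k + 0.5) * step, (i // k + 0.5) * step) for i in range(n)]

def uniform_random(n, size=1.0, seed=0):
    """n points drawn uniformly at random (complete spatial randomness)."""
    rng = random.Random(seed)
    return [(rng.uniform(0, size), rng.uniform(0, size)) for _ in range(n)]

def clustered(n, centers, spread, size=1.0, seed=0):
    """n points scattered normally around the given centers; a small spread
    yields clear clusters, a large one loose clusters. Points stay on the map."""
    rng = random.Random(seed)

    def clamp(v):
        return min(max(v, 0.0), size)

    points = []
    for i in range(n):
        cx, cy = centers[i % len(centers)]
        points.append((clamp(rng.gauss(cx, spread)), clamp(rng.gauss(cy, spread))))
    return points

centers = [(0.25, 0.25), (0.7, 0.6), (0.3, 0.8)]  # hypothetical cluster centers
series = {
    n: {
        "regular": regular(n),
        "random": uniform_random(n),
        "clear clusters": clustered(n, centers, spread=0.03),
        "loose clusters": clustered(n, centers, spread=0.12),
    }
    for n in (100, 200, 500, 1000)
}
```

Fixed seeds make the series reproducible, so every measure is evaluated on identical inputs across cardinalities.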

The final list of measures that we initially implement in the agent-based model can be seen below. Because most of the measures are defined in code blocks outside the actual agent-based model, it is possible to adopt measures from other scale levels during model optimization.

Minimum set of measures to control the agent-based model

1. Overall distribution of points/cluster rankings

spatial distribution of points
cluster density ranking

2. Pattern-specific characteristics

local extreme values
point category preservation
mean distance to cluster members
distance to the origin location

3. Gestalt Law

maximum number of points
shape of a cluster
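Several of the pattern-specific measures reduce to distance computations. The following sketch shows one possible reading of three of them; defining "local" through the k nearest neighbours, and the default k = 3, are our assumptions rather than the project's definitions, and all function names are hypothetical.

```python
import math

def k_nearest(i, points, k):
    """Indices of the k points closest to points[i] (excluding i itself)."""
    others = [j for j in range(len(points)) if j != i]
    return sorted(others, key=lambda j: math.dist(points[i], points[j]))[:k]

def is_local_extreme(i, points, values, k=3):
    """True if point i carries a higher value than each of its k nearest neighbours."""
    return all(values[i] > values[j] for j in k_nearest(i, points, k))

def mean_distance_to_members(p, members):
    """Average distance from p to the other members of its cluster."""
    others = [m for m in members if m != p]
    return sum(math.dist(p, m) for m in others) / len(others) if others else 0.0

def displacement(current, origin):
    """Distance between a point's current position and its origin location."""
    return math.dist(current, origin)

points = [(0, 0), (1, 0), (0, 1), (5, 5)]
values = [2, 7, 3, 1]  # hypothetical attribute, e.g. detected bicycles per photo
print(is_local_extreme(1, points, values, k=2))  # True
print(mean_distance_to_members((0, 0), [(0, 0), (3, 4), (6, 8)]))  # 7.5
```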

Model

We implement our agent-based model using the Mesa framework. It includes four core components (Model, Agent, Schedule and Space) along with additional components for analysis and visualization. We combined the framework of Mesa with the requirements of a map generalization model, leading to the following architecture of our TOVIP agent-based model:

Figure: architecture of the TOVIP agent-based model
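To make the component roles concrete, the sketch below mirrors the model, schedule and space roles in plain Python with the two agent types PointAgent and ClusterAgent. It deliberately does not use Mesa's own classes, whose API differs between Mesa versions; the class layout, the random activation order and the stop criterion (all agents satisfied) are our assumptions, and the agents' step methods are placeholders.

```python
import random

class PointAgent:
    """Map agent for a single point object (placeholder behaviour)."""
    def __init__(self, uid, pos):
        self.uid, self.pos, self.satisfied = uid, pos, False

    def step(self, model):
        # A real agent would update its measures and pick an operation here.
        self.satisfied = True

class ClusterAgent:
    """Map agent for a predefined spatial pattern made up of point agents."""
    def __init__(self, uid, members):
        self.uid, self.members, self.satisfied = uid, members, False

    def step(self, model):
        self.satisfied = all(m.satisfied for m in self.members)

class GeneralizationModel:
    """Bundles the roles that Mesa splits into model, schedule and space."""
    def __init__(self, positions, clusters, seed=0):
        self.rng = random.Random(seed)
        self.points = [PointAgent(i, p) for i, p in enumerate(positions)]
        self.clusters = [ClusterAgent(f"c{j}", [self.points[i] for i in idx])
                         for j, idx in enumerate(clusters)]
        self.schedule = self.points + self.clusters       # agents activated each step
        self.space = {a.uid: a.pos for a in self.points}  # uid -> map position

    def step(self):
        # Random activation: every agent acts once per step, in shuffled order.
        order = self.schedule[:]
        self.rng.shuffle(order)
        for agent in order:
            agent.step(self)

    def run(self, max_steps=100):
        """Step until all agents are satisfied; return the steps used, or None."""
        for n in range(1, max_steps + 1):
            self.step()
            if all(a.satisfied for a in self.schedule):
                return n
        return None

model = GeneralizationModel([(0, 0), (1, 0), (5, 5)], clusters=[[0, 1]])
print(model.run())
```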

We have two types of map agents in our model: a PointAgent represents a point object in the map, while a ClusterAgent represents a predefined spatial pattern. For both types, we follow the approach of Duchêne et al. (2018) and decompose the 'brain' of the agent into three main components: capacities, mental representation and procedural knowledge. In each step of the model simulation, all measures are updated for each agent. The measures are then translated into a Likert-like satisfaction scale ranging from 1 ("unacceptable") to 8 ("perfect"), which represents the mental state of each agent, i.e. the degree to which the agent satisfies its constraints. Each measure has its own translation method, which has to be defined in advance as a measure satisfaction function. Based on its constraint satisfaction and its knowledge of past steps, each agent decides which operation to execute in the next step.
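The translation from measures to the eight-level satisfaction scale, and a simple decision rule built on it, can be sketched as follows. Class dividers are looked up with Python's bisect module; the threshold of 6, the divider values, the operation names and the rule of skipping operations that failed before are hypothetical choices for illustration, not the procedural knowledge of the actual TOVIP agents.

```python
from bisect import bisect_right

def satisfaction(value, dividers, higher_is_better=True):
    """Translate a measure value into the 1 ('unacceptable') to 8 ('perfect') scale.
    dividers: seven ascending class boundaries between the eight levels."""
    if len(dividers) != 7 or list(dividers) != sorted(dividers):
        raise ValueError("eight levels need seven ascending dividers")
    level = bisect_right(dividers, value) + 1  # 1..8, growing with the value
    return level if higher_is_better else 9 - level

def choose_operation(scores, operations, failed, threshold=6):
    """Pick an operation for the least satisfied constraint, skipping
    (constraint, operation) pairs that did not help in earlier steps.
    Returns None when every constraint reaches the threshold or nothing is left."""
    worst = min(scores, key=scores.get)
    if scores[worst] >= threshold:
        return None
    for op in operations.get(worst, []):
        if (worst, op) not in failed:
            return op
    return None

# Hypothetical displacement measure in metres: smaller displacement is better.
dividers = [10, 20, 40, 60, 80, 120, 160]
scores = {"displacement": satisfaction(35, dividers, higher_is_better=False),
          "legibility": 2}
operations = {"legibility": ["aggregate", "displace", "eliminate"],
              "displacement": ["move_back"]}
print(scores["displacement"])  # 6
print(choose_operation(scores, operations, failed={("legibility", "aggregate")}))  # displace
```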

Map specifications and model setup

In addition, some global map specifications and characteristics, relevant to generalization in general and to point generalization in particular, have to be defined in advance, such as:

the scale of the source map
the target scale of the map
whether the source map satisfies all legibility constraints regarding the point symbols (i.e. the source map has no point clutter and is readable)
the (pixel) size of the point symbols
whether the point data set contains different classes (if so, their scale of measurement)

While these global map specifications are predetermined in most use cases for point generalization (e.g. the target map scale via predefined zoom levels), they can also be changed in the model setup. For this setup, we use an approach proposed by Taillandier and Gaffuri (2012) that supports the user in parameterisation through a human-machine dialogue. We offer a guided user interface, currently part of a Jupyter Notebook, in which parameter adjustments are visualized via samples on a map. It allows the user to adjust the satisfaction scales by modifying the class dividers, which are predefined as a function of the respective global map attributes such as scale and point cardinality. See below for the visualization of parameter adjustments for point size (left) and for the distance measure classes (right):

Figure: parameter setup for point size (left) and distance measure classes (right)
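Class dividers that follow the global map specifications could, for instance, be expressed as multiples of the symbol size on the ground at the target scale. The sketch below assumes a screen pixel of 0.28 mm (the OGC standardized rendering pixel size) and hypothetical multiplication factors that a user would adjust in the setup dialogue; the dependency on point cardinality mentioned above is not modelled, and the MapSpec fields are our own naming.

```python
from dataclasses import dataclass
from typing import Optional

PIXEL_MM = 0.28  # assumed screen pixel size (OGC "standardized rendering pixel")

@dataclass
class MapSpec:
    source_denominator: int            # 15000 for 1:15,000
    target_denominator: int            # 35000 for 1:35,000
    symbol_px: float                   # diameter of a point symbol in pixels
    source_legible: bool = True        # source map free of point clutter?
    class_scale: Optional[str] = None  # e.g. "nominal"; None if unclassified

    def symbol_ground_size(self) -> float:
        """Symbol diameter in ground metres at the target scale."""
        return self.symbol_px * PIXEL_MM / 1000 * self.target_denominator

DEFAULT_FACTORS = (0.5, 1, 1.5, 2, 3, 4, 6)  # hypothetical, user-adjustable

def distance_dividers(spec, factors=DEFAULT_FACTORS):
    """Seven class dividers (metres) for a distance measure, expressed as
    multiples of the symbol's ground size so they follow the target scale."""
    unit = spec.symbol_ground_size()
    return [round(f * unit, 1) for f in factors]

spec = MapSpec(source_denominator=15000, target_denominator=35000, symbol_px=10)
print(distance_dividers(spec))  # [49.0, 98.0, 147.0, 196.0, 294.0, 392.0, 588.0]
```

Changing the target scale or symbol size in the dialogue would then shift all dividers consistently, while the factors remain the user's fine-tuning handle.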

Results

Results of the TOVIP model

Note that the scale value was given as an input to the model and may not be accurate for the images as displayed.

The data snippet we use to demonstrate the capability of the TOVIP model results from our work on the Yahoo Flickr Creative Commons 100 Million Dataset (YFCC100M), carried out in the Young Research Group "Using Object Detection on Social Media Images for Urban Planning (Bikes in Cities)". The example image above shows the number of bicycles detected on social media images in the city of Dresden at a scale of 1:15,000 (far left) and the output generalized by the TOVIP model at a scale of 1:35,000 (far right). In between are the original data at 1:35,000 without generalization, and the generalized data at the scale of the original map (1:15,000). The image below highlights the ability of the model to preserve point densities and dense areas (green area), extreme values (examples marked in red) and eye-catching shapes (marked in blue) while reducing clutter effects (yellow area).

Constraint fulfillment

Note that the scale value was given as an input to the model and may not be accurate for the images as displayed.

Code

Code is available here.

Publications

[1] Knura, M., & Schiewe, J. (2021). Map Evaluation under COVID-19 restrictions: A new visual approach based on think aloud interviews. Proceedings of the ICA, 4, 60. DOI: 10.5194/ica-proc-4-60-2021
[2] Knura, M., & Schiewe, J. (2022). Analysis of User Behaviour While Interpreting Spatial Patterns in Point Data Sets. KN - Journal of Cartography and Geographic Information. DOI: 10.1007/s42489-022-00111-9