Uncertainty-Aware Enrichment of Animal Movement Trajectories by VGI (BirdTrace)

Philipp Meschenmoser
Konstanz
Prof. Dr. Daniel Keim
Konstanz
Matthias Miller, Research Associate
University of Konstanz
Yannick Metz, PhD Student
University of Konstanz

Summary of the project

Recent advances in remote sensing technology (e.g., miniaturization, longer battery life) enable the wholesale, reliable, and accurate recording of animal movements. Further, sensor-based tracking is strongly promoted by wide-ranging projects such as the [Movebank] data repository or the [ICARUS-Initiative], for which an antenna on the ISS gathers animal locations on a global basis. With the resulting movement trajectories, researchers want to better understand and address some of the most urgent problems of our time: climate change, species decline, natural crises, and disease transmission, to name only a few. However, sensor-based trajectories commonly lack population quantification and do not provide a thorough view of the highly complex, spatiotemporal movement context. In this project, we address these two issues. We use human observations of animals to enrich the trajectories and improve their understanding. For this enrichment, we can consider observation descriptions, photos, and even audio and video recordings from several volunteered geographic information (VGI) web portals. Initially, this project addresses data integration, uncertainty assessment, interactive matching, and trajectory annotation. Once suitable VGI is identified, we will enrich existing movement prediction models with VGI and enable a bidirectional verification process. Throughout this project, we realize multiple visual-interactive applications that support uncertainty-aware and semi-automated enrichment and analysis by movement ecologists. While VGI portals already provide several uncertainty and quality measures (e.g., accuracy, community-based rankings), we will also derive our own measures, e.g., from text-derived features or by automated comparison with species distribution maps. Moreover, matching quality is of particular interest, for instance with respect to spatiotemporal distances or taxon epithet mismatches.
With this project, we aim to fill distinct research gaps and hope to improve the thorough understanding of animal movements by combining benefits from sensor recordings and human observations. This project benefits from a direct Movebank linkage, as well as from the close collaboration with the [Max Planck Institute of Animal Behavior] and well-established VGI portals.

Overview

As part of the project, we present 5 individual contributions:

MultiSegVA: Visual Analytics for high-quality biologging GPS trajectory data
Integration of User-Generated Content: Integrating Flickr images into curated citizen platforms
BirdTrace: Enriching bird movement data with VGI contributions incorporating uncertainty
RIVA: Supporting imitation and reinforcement learning with visual analytics
Skirouting: Visual Exploration of Preference-based Routes in Ski Resorts

Analyzing biologging GPS trajectory data

Segmenting biologging time series of animals on multiple temporal scales is an essential step that requires complex techniques with careful parameterization and possibly cross-domain expertise. Yet, there is a lack of visual-interactive tools that strongly support such multi-scale segmentation. To close this gap, we present our MultiSegVA platform for interactively defining segmentation techniques and parameters on multiple temporal scales. MultiSegVA primarily contributes tailored, visual-interactive means and visual analytics paradigms for segmenting unlabeled time series on multiple scales. Further, to flexibly compose the multi-scale segmentation, the platform contributes a new visual query language that links a variety of segmentation techniques. To illustrate our approach, we present a domain-oriented set of segmentation techniques derived in collaboration with movement ecologists. We demonstrate the applicability and usefulness of MultiSegVA in two real-world use cases from movement ecology, related to behavior analysis after environment-aware segmentation, and after progressive clustering. Expert feedback from movement ecologists shows the effectiveness of tailored visual-interactive means and visual analytics paradigms at segmenting multi-scale data, enabling them to perform semantically meaningful analyses. A third use case demonstrates that MultiSegVA is generalizable to other domains.

MultiSegVA, Meschenmoser et al. (2021)

Visualization of Multi-Scale Time Series Segmentation

MultiSegVA supports a workflow targeted at domain experts, from loading time series data, exploring individual dimensions, to interactively defining segmentation techniques and parameters on multiple temporal scales.

An overview of the main user interface of MultiSegVA. The upper icicle plot visualizes the results of the multi-scale segmentation. The bottom part contains a plot of individual time series dimensions, Meschenmoser et al. (2021).

Visual Query Language

The visual query language enables the user to flexibly compose multi-scale segmentation techniques. The user defines a query by selecting a set of techniques and parameters and then applies the query to the data. The query is executed and the results are visualized in the overview panel.
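To make the idea of composing techniques across scales more concrete, the following minimal sketch applies a list of segmentation functions in order, each one subdividing the segments produced by the previous one. It is not MultiSegVA's query language or implementation: the names `fixed_window`, `threshold_crossing`, and `multi_scale_segment`, as well as the nested-dictionary result, are assumptions made for illustration only.

```python
from typing import Callable, Dict, List, Sequence, Tuple

Segment = Tuple[int, int]  # half-open index range [start, end)
Technique = Callable[[Sequence[float], int, int], List[Segment]]


def fixed_window(size: int) -> Technique:
    """Hypothetical coarse technique: cut the range into windows of `size` samples."""
    def technique(values, start, end):
        return [(s, min(s + size, end)) for s in range(start, end, size)]
    return technique


def threshold_crossing(threshold: float) -> Technique:
    """Hypothetical fine technique: cut wherever the series crosses `threshold`."""
    def technique(values, start, end):
        cuts = [
            i for i in range(start + 1, end)
            if (values[i] >= threshold) != (values[i - 1] >= threshold)
        ]
        bounds = [start, *cuts, end]
        return list(zip(bounds[:-1], bounds[1:]))
    return technique


def multi_scale_segment(values: Sequence[float],
                        techniques: List[Technique],
                        start: int = 0,
                        end: int = None) -> List[Dict]:
    """Apply the techniques in order; each one subdivides its parent's segments."""
    end = len(values) if end is None else end
    if not techniques:
        return []
    first, *rest = techniques
    return [
        {"range": seg, "children": multi_scale_segment(values, rest, *seg)}
        for seg in first(values, start, end)
    ]


# Example: speed samples, first cut into windows of four, then by a speed threshold.
speeds = [0.1, 0.2, 5.0, 6.0, 0.1, 0.3, 7.0, 0.2]
tree = multi_scale_segment(speeds, [fixed_window(4), threshold_crossing(1.0)])
```

The nested result mirrors the kind of hierarchy that an icicle plot can display: each level of the tree corresponds to one temporal scale.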

Summary of Discussion

This work ties into our overall goal of linking high-quality but sparse trajectory data with abundant VGI data of varying quality. On its own, such biologging data can already provide significant insights.

Integration of User-Generated Content

In a collaborative effort within the SPP, we investigate the potential of integrating user-generated content (UGC) from different data sources. The goal is to provide a new data source for movement ecologists that can complement the analysis of biologging data.

Combining red kite observations from Flickr, eBird and iNaturalist with a text and image analysis workflow, Hartmann et al. (2022)

There is an urgent need to develop new methods to monitor the state of the environment. One potential approach is to use new data sources, such as User-Generated Content, to augment existing approaches. However, to date, studies typically focus on a single data source and modality. We take a new approach, using citizen science records of red kite (Milvus milvus) sightings to train and validate a Convolutional Neural Network (CNN) capable of identifying images containing red kites. This CNN is integrated in a sequential workflow which also uses an off-the-shelf bird classifier and text metadata to retrieve observations of red kites in the Chilterns, England. Our workflow reduces an initial set of more than 600,000 images to just 3065 candidate images. Manual inspection of these images shows that our approach has a precision of 0.658. A workflow using only text identifies 14% fewer images than that including image content analysis, and by combining image and text classifiers we achieve almost perfect precision of 0.992. Images retrieved from social media records complement those recorded by citizen scientists spatially and temporally, and our workflow is sufficiently generic that it can easily be transferred to other species.
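The following sketch illustrates how a sequential image filter, a text filter, and a precision measurement fit together. It is a simplified illustration under stated assumptions, not the workflow of Hartmann et al. (2022): the record fields, the thresholds of 0.5, the keyword list, and the order of the filters are all hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable, List, Set


@dataclass(frozen=True)
class ImageRecord:
    image_id: str
    text: str              # title, tags, and description of the post
    bird_score: float      # assumed output of a generic bird classifier
    red_kite_score: float  # assumed output of a species-specific CNN


def mentions_species(record: ImageRecord,
                     keywords=("red kite", "milvus milvus")) -> bool:
    """Text filter: does the metadata mention the species? (keywords are assumptions)"""
    text = record.text.lower()
    return any(keyword in text for keyword in keywords)


def sequential_image_filter(records: Iterable[ImageRecord],
                            bird_threshold: float = 0.5,
                            kite_threshold: float = 0.5) -> List[ImageRecord]:
    """Keep images that pass the generic bird filter, then the species filter."""
    birds = [r for r in records if r.bird_score >= bird_threshold]
    return [r for r in birds if r.red_kite_score >= kite_threshold]


def precision(candidates: List[ImageRecord], relevant_ids: Set[str]) -> float:
    """Share of retrieved candidates that were manually confirmed as relevant."""
    if not candidates:
        return 0.0
    hits = sum(1 for r in candidates if r.image_id in relevant_ids)
    return hits / len(candidates)
```

In this sketch, requiring both `sequential_image_filter` and `mentions_species` to accept a record trades recall for precision, which is the general effect the combined image and text result above describes.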

Discussion

The work gives particular care to a detailed analysis of the collected contributions, including factors such as their distribution, temporal patterns, and quality. This analysis is important for understanding the potential of the data source and for identifying potential biases.

A visualization of contributions to eBird by top contributor (BirdTrace).

Our BirdTrace application, which we present in more detail in the next section, also provides a visualization of user contributions. This visualization can be used to identify areas of high user activity, which could point to areas of high biodiversity. Furthermore, the interactive visualization presents temporal trends and attributes observations to different users and species.

While the presented approach can increase the amount of data available for analysis, it also introduces new challenges. For example, the data is often of lower quality than curated and verified data. This work therefore fits well into our overall goal of enriching GPS tracking data with VGI contributions. The presented workflow outputs confidence scores for each observation, which could be used to quantify uncertainty in the data. In the next step, we investigate the joint analysis of biologging data and VGI contributions, incorporating uncertainty into the analysis.

Enriching Bird Movement Data with VGI Contributions

Having discussed biologging time series and VGI contributions in isolation, we now turn to the question of how to combine these two data sources. The goal is to enrich the biologging data with VGI contributions that complement its analysis. We take a two-directional matching approach.

Data Processing Pipeline & Matching

Our goal is to enable the enrichment of trajectory data with multi-media VGI contributions (e.g., images, video, audio, or text descriptions). This matching can help the analysis of VGI data focus on relevant instances. Here, relevance refers to how well a domain expert can use the found VGI contributions to answer specific questions. The criteria for relevance therefore depend on the problem. We address this by letting users choose between different matching criteria, e.g., based on spatial or temporal distance, as well as potential classified behaviors like breeding. Including additional data sources might increase the number of possible matching criteria in the future. We enable automated matching of VGI contributions with trajectory data as follows:

We assume the availability of GPS trajectory data for individuals of a species of interest. As our data source, we utilize Movebank.
We collect and locally store multi-media VGI data from citizen science platforms, and potentially Flickr (as described in the previous section).
Based on a user query and selected matching criteria, we match individual VGI contributions with GPS trajectory data.
Trajectories and VGI contributions are jointly visualized in an interactive visual analytics application, which facilitates analysis by a domain expert.
The user can use additional interactive tools to search, filter and highlight the matched contributions.
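The matching step in the list above can be sketched as follows, assuming two simple criteria: a maximum great-circle distance and a maximum time gap between a VGI observation and a GPS fix. The class names, the default thresholds of 5 km and 6 hours, and the nearest-fix rule are assumptions for illustration, not the matching logic of BirdTrace.

```python
import math
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List, NamedTuple


@dataclass(frozen=True)
class Fix:
    """One GPS fix of a tracked animal."""
    time: datetime
    lat: float
    lon: float


@dataclass(frozen=True)
class Observation:
    """One VGI contribution with a location and a timestamp."""
    obs_id: str
    time: datetime
    lat: float
    lon: float


class Match(NamedTuple):
    observation: Observation
    fix: Fix
    distance_km: float
    time_gap: timedelta


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in kilometres on a sphere of radius 6371 km."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * 6371.0 * math.asin(math.sqrt(a))


def match_observations(track: List[Fix],
                       observations: List[Observation],
                       max_km: float = 5.0,
                       max_gap: timedelta = timedelta(hours=6)) -> List[Match]:
    """Pair each observation with its spatially nearest fix within both thresholds."""
    matches = []
    for obs in observations:
        best = None
        for fix in track:
            gap = abs(obs.time - fix.time)
            if gap > max_gap:
                continue  # too far apart in time
            dist = haversine_km(obs.lat, obs.lon, fix.lat, fix.lon)
            if dist <= max_km and (best is None or dist < best.distance_km):
                best = Match(obs, fix, dist, gap)
        if best is not None:
            matches.append(best)
    return matches
```

The returned distance and time gap could serve as simple inputs to a match-uncertainty measure. The brute-force loop is quadratic, so a spatial index would be a natural replacement for larger datasets.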

Different data sources are integrated in a joint pipeline and can be analyzed in a common visual analytics workspace (BirdTrace).

To facilitate the matching process, we implemented a data processing pipeline that applies appropriate pre-processing to both the GPS trajectory data and the VGI contributions. We apply steps like line simplification, motif discovery, and outlier detection to the trajectory data to reduce its size and to simplify the matching computations. VGI contributions from portals like eBird, iNaturalist, and GBIF are collected and processed. To speed up analysis, we precompute and cache intermediate data. This enables efficient clustering of VGI contributions and fast matching of VGI contributions with trajectory data.
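As an illustration of the line simplification step, the sketch below implements the Douglas-Peucker algorithm on planar points. The text above does not name the simplification method used, so the choice of Douglas-Peucker is an assumption, as are the function names and the premise that coordinates have already been projected to a planar system in which Euclidean distance is meaningful.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]  # projected (x, y) coordinates, an assumption


def _point_segment_distance(p: Point, a: Point, b: Point) -> float:
    """Euclidean distance from point p to the line segment a-b."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    if dx == 0 and dy == 0:
        return math.hypot(px - ax, py - ay)
    # Project p onto the segment and clamp to its end points.
    t = ((px - ax) * dx + (py - ay) * dy) / (dx * dx + dy * dy)
    t = max(0.0, min(1.0, t))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))


def simplify(points: List[Point], tolerance: float) -> List[Point]:
    """Douglas-Peucker: drop points closer than `tolerance` to the simplified line."""
    if len(points) < 3:
        return list(points)
    first, last = points[0], points[-1]
    index, max_dist = 0, 0.0
    for i in range(1, len(points) - 1):
        dist = _point_segment_distance(points[i], first, last)
        if dist > max_dist:
            index, max_dist = i, dist
    if max_dist <= tolerance:
        return [first, last]
    # Keep the farthest point and simplify both halves recursively.
    left = simplify(points[: index + 1], tolerance)
    right = simplify(points[index:], tolerance)
    return left[:-1] + right
```

A larger tolerance removes more points and therefore speeds up later matching, at the cost of positional detail. The recursion depth grows with the number of retained points, so very long trajectories may call for an iterative variant.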

Visual Analytics Application

We present the results in an interactive visual analytics application. Care is taken to visually represent data from different sources, incorporating characteristics such as the uncertainty of matches and of data sources.

Glyph designs for different data characteristics: uncertainty of match, data source, and observable behavior (BirdTrace).

Supporting Imitation and Reinforcement Learning with Visual Analytics

A Workflow for Imitation and Reinforcement Learning, Metz et al. (2022)

Multiple challenges hinder the application of reinforcement learning algorithms in experimental and real-world use cases, even with recent successes in such areas. These challenges occur at different stages of the development and deployment of such models. While reinforcement learning workflows share similarities with machine learning approaches, we argue that distinct challenges can be tackled and overcome using visual analytics concepts. Thus, we propose a comprehensive workflow for reinforcement learning and present an implementation that incorporates visual analytics concepts, integrating tailored views and visualizations for the different stages and tasks of the workflow.

The presented workflow for visual analytics in imitation and reinforcement learning, Metz et al. (2022).

Approach

We describe a workflow for integrating reinforcement and imitation learning with visual analytics, and introduce a web-based application called RIVA that operationalizes this workflow. The proposed workflow involves three stages: Setup and Debugging, Model Training, and Evaluation. The application supports these three stages via different visualizations and guided user interactions. RIVA is tightly integrated with existing community-driven frameworks for reinforcement and imitation learning, as well as with more generic experiment tracking tools like TensorBoard. The application is kept use-case agnostic and highly modular to accommodate a wide range of use cases. The article presents one use case involving data-driven learning of the behavior of fish schools using both reinforcement and imitation learning, and explains how RIVA was effective in maintaining a high level of productivity and consistency throughout an iterative design process.

A screenshot of the developed application RIVA, comprising different views to support setup, training, and evaluation of reinforcement learning models, Metz et al. (2022).

Relation to the project

Going forward, the promising results of state-of-the-art deep learning methods, specifically for reinforcement and imitation learning, could help to improve the quality of prediction models, e.g., for animal movement. However, developing such models is a complex process that requires considerable expertise and time. The presented workflow and the RIVA application can reduce this complexity and increase productivity during development. The application is designed to be highly modular and can be adapted to different use cases. We therefore strive to contribute to future development efforts by simplifying the application of reinforcement and imitation learning to new use cases, including in the field of VGI and derived applications. The application is open-source and can be found at: https://github.com/ymetz/rlworkbench

Visual Exploration of Preference-based Ski Routes

Preference-based Ski Routing, Rauscher et al. (2022)

Optimal ski route selection is a challenge based on a multitude of factors, such as the steepness, compass direction, or crowdedness. The personal preferences of every skier towards these factors require individual adaptations, which aggravate this task. Current approaches within this domain do not combine automated routing capabilities with user preferences, missing out on the possibility of integrating domain knowledge in the analysis process.

Visualization

The interface of SkiVis, Rauscher et al. (2022)

We introduce SkiVis, a visual analytics application to interactively explore ski slopes and provide routing recommendations based on user preferences. In a case study on the resort of Ski Arlberg, we illustrate how to leverage volunteered geographic information to enable a numerical comparison between slopes.
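One way to think about preference-based routing is as a shortest-path search in which each slope's cost is a weighted sum of its attributes, with the weights set by the skier. The sketch below uses Dijkstra's algorithm for this. It is not SkiVis's routing method: the graph layout, the attribute names `steepness` and `crowdedness`, and the linear cost function are assumptions for illustration, and all attribute values and weights are assumed to be non-negative.

```python
import heapq
from typing import Dict, List, Optional, Tuple

Attributes = Dict[str, float]
Graph = Dict[str, List[Tuple[str, Attributes]]]  # node -> [(neighbor, slope attributes)]


def edge_cost(attrs: Attributes, prefs: Dict[str, float]) -> float:
    """Weighted sum of slope attributes; a higher weight means 'avoid this more'."""
    return sum(prefs.get(name, 0.0) * value for name, value in attrs.items())


def best_route(graph: Graph, start: str, goal: str,
               prefs: Dict[str, float]) -> Optional[Tuple[float, List[str]]]:
    """Dijkstra's algorithm over preference-weighted costs (non-negative only)."""
    queue = [(0.0, start, [start])]
    settled = set()
    while queue:
        cost, node, path = heapq.heappop(queue)
        if node == goal:
            return cost, path
        if node in settled:
            continue
        settled.add(node)
        for neighbor, attrs in graph.get(node, []):
            if neighbor not in settled:
                heapq.heappush(
                    queue, (cost + edge_cost(attrs, prefs), neighbor, path + [neighbor])
                )
    return None  # goal not reachable


# Hypothetical miniature resort with two ways down from the summit.
resort: Graph = {
    "summit": [("mid", {"steepness": 0.9, "crowdedness": 0.1}),
               ("lift_top", {"steepness": 0.3, "crowdedness": 0.8})],
    "mid": [("valley", {"steepness": 0.8, "crowdedness": 0.2})],
    "lift_top": [("valley", {"steepness": 0.2, "crowdedness": 0.9})],
    "valley": [],
}
gentle = best_route(resort, "summit", "valley", {"steepness": 1.0})
quiet = best_route(resort, "summit", "valley", {"crowdedness": 1.0})
```

With these example values, a skier who only penalizes steepness is routed via `lift_top`, while one who only penalizes crowdedness is routed via `mid`.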

Usage of User-generated Route data

We obtained information about ski resorts' slopes and lifts from OpenStreetMap (OSM), which provides attributes such as the name of a slope, its difficulty classification, its reference number, and its grooming conditions. Information about traffic conditions is more difficult to obtain, so we opted for volunteered geographic information (VGI) from Strava, where users can record and share exercise data from leisure activities such as skiing. We extracted trajectory data from more than 15,000 activities to gain insights into crowdedness, as well as an estimate of the required skiing time for each individual slope.
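A minimal sketch of how per-slope statistics could be derived from recorded activities follows. It assumes that each activity has already been map-matched to slopes and reduced to `(slope_id, duration_seconds)` pairs. That input format, the function name, and the use of the traversal count as a crowdedness proxy and of the median as a time estimate are illustrative assumptions, not the method of Rauscher et al. (2022).

```python
from collections import defaultdict
from statistics import median
from typing import Dict, Iterable, Tuple


def summarize_slopes(traversals: Iterable[Tuple[str, float]]) -> Dict[str, Dict[str, float]]:
    """Aggregate slope traversals into a usage count and a typical duration."""
    durations = defaultdict(list)
    for slope_id, seconds in traversals:
        durations[slope_id].append(seconds)
    return {
        slope_id: {
            "traversals": len(values),         # simple proxy for crowdedness
            "median_seconds": median(values),  # robust estimate of skiing time
        }
        for slope_id, values in durations.items()
    }
```

The median is used here instead of the mean so that a few very long traversals, for example with a rest stop on the slope, do not distort the time estimate.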

VGI Portals

Publications

Meschenmoser, P., Buchmüller, J. F., Seebacher, D., Wikelski, M., & Keim, D. A. (2021). MultiSegVA: Using Visual Analytics to Segment Biologging Time Series on Multiple Scales [Journal Article]. IEEE Transactions on Visualization and Computer Graphics, 27(2), 1623–1633. DOI: 10.1109/TVCG.2020.3030386
Rauscher, J., Miller, M., & Keim, D. A. (2022). Visual Exploration of Preference-based Routes in Ski Resorts. In M. Krone, S. Lenti, & J. Schmidt (Eds.), EuroVis 2022 - Posters. The Eurographics Association. DOI: 10.2312/evp.20221123
Metz, Y., Schlegel, U., Seebacher, D., El-Assady, M., & Keim, D. (2022). A Comprehensive Workflow for Effective Imitation and Reinforcement Learning with Visual Analytics. EuroVis Workshop on Visual Analytics (EuroVA). DOI: 10.2312/eurova.20221074
Hartmann, M. C., Schott, M., Dsouza, A., Metz, Y., Volpi, M., & Purves, R. S. (2022). A text and image analysis workflow using citizen science data to extract relevant social media records: Combining red kite observations from Flickr, eBird and iNaturalist. Ecological Informatics, 71, 101782. DOI: 10.1016/j.ecoinf.2022.101782
Metz, Y., Bykovets, E., Joos, L., Keim, D. A., & El-Assady, M. (2023). VISITOR: Visual Interactive State Sequence Exploration for Reinforcement Learning. EuroVis 2023, under review.