Extraction and Visually Driven Analysis of VGI for Understanding People’s Behavior in Relation to Multi-Faceted Context (EVA-VGI)

Moris Zahtila
TU Dresden
Maximilian Hartmann
University of Zurich, Geocomputation
Prof. Dr. Ross Purves
Zürich
Prof. Dr. Stefan Wrobel
Bonn
Prof. Dr.-Ing. Dirk Burghardt
TU Dresden
Dr.-Ing. Alexander Dunkel
TU Dresden
Prof. Dr. Gennady Andrienko
Bonn
Prof. Dr. Natalia Andrienko
Bonn

Summary of the project

Volunteered Geographic Information (VGI) in the form of actively and passively generated spatial content offers great potential to understand people's activities, emotional perceptions and mobility behavior. Realizing this potential requires methods that take into account the specific properties of such data, for example its heterogeneity, subjectivity and spatial resolution, but also its temporal relevance and bias. The aim of the project was to develop visual methods for analyzing human behavior from location-based social media and movement data.

Theoretical and conceptual frameworks

The facet framework allows characterization and comparison of collective reactions based on the following dimensions: spatial, temporal, social, thematic and interlinkage (Dunkel et al. 2018).

Facet Framework, Dunkel et al. (2018)

Behavior model, Burghardt et al. (in press)

A behavioral model was introduced that summarizes people’s reactions under the influence of one or more events (Burghardt et al., in press). In addition, influencing factors are described using a context model, which makes it possible to analyze visitation and mobility patterns with regard to spatial, temporal and thematic-attribute changes.

Illustration of a referent event E and all collective reactions *R(E)* with two example partitions R₂ and R₃. During the information spread that occurs in response to E, a new referent event E₂ is formed by partition R₃, Dunkel et al. (2018).

In analyzing behaviors, we aim at identifying general or repeated patterns, but what do we mean by “patterns”? In our theoretical model for pattern discovery, we proposed a formal definition of the concept of a pattern in data, explained how patterns are formed by relationships between data items, discussed different types of patterns, described operations that can be applied to discovered patterns, and interpreted the established principles of visual representation of data from the perspective of enabling correct and effective pattern discovery (Andrienko et al. 2021).

Theoretical model for pattern discovery, Andrienko et al. (2021)

Seeking Patterns of Visual Pattern Discovery for Knowledge Building, Andrienko et al. (2022)

Privacy-aware model, Löchner et al. (2018)

We performed an exploratory empirical study of how people identify and interpret data patterns in complex cartographic representations of spatial distributions and how they involve these patterns in reasoning and knowledge building. Eye tracking and voice recording were used to capture this process (Andrienko et al. 2022). We considered several existing theoretical models of visually supported reasoning and knowledge building and found that none of them taken alone can adequately describe the processes we observed, but that a combination of three particular models, including the pattern discovery model, may provide sufficient expressive power.

Additional Resources:

LBSN Structure - A common, language-independent, privacy-aware and cross-network social media data scheme, implementing the four facets of the conceptual framework (Dunkel et al. 2018)

Bibliography: Theoretical and conceptual frameworks

Dunkel, A.; Andrienko, G.; Andrienko, N.; Burghardt, D.; Hauthal, E. and Purves, R. (2018). A conceptual framework for studying collective reactions to events in location-based social media. International Journal of Geographical Information Science, 33:4, 780-804. https://doi.org/10.1080/13658816.2018.1546390
Burghardt, D.; Dunkel, A.; Hauthal, E.; Shirato, G.; Andrienko, N.; Andrienko, G.; Hartmann, M.; and Purves, R. (in press). Extraction and Visually Driven Analysis of VGI for Understanding People’s Behavior in Relation to Multi-Faceted Context. Springer, 241-270.
Andrienko, N., Andrienko, G., Miksch, S., Schumann, H., & Wrobel, S. (2021). A theoretical model for pattern discovery in visual analytics. Visual Informatics, 5(1), 23–42. https://doi.org/10.1016/j.visinf.2020.12.002
Andrienko, N., G. Andrienko, S. Chen, and B. Fisher. 2022. “Seeking Patterns of Visual Pattern Discovery for Knowledge Building.” Computer Graphics Forum 41 (6): 124–48. https://doi.org/10.1111/cgf.14515.
Löchner, M.; Dunkel, A. and Burghardt, D. (2018). A privacy-aware model to process data from location-based social media. VGI Geovisual Analytics Workshop, colocated with BDVA 2018. Konstanz, Germany, 19 Oct 2018. In: Burghardt, D.; Chen, S.; Andrienko, G.; Andrienko, N.; Purves, R. and Diehl, A. (eds.). VGI Geovisual Analytics Workshop. http://nbn-resolving.de/urn:nbn:de:bsz:352-2-1tc0wl382uqkr0

Generic Methods

From sunrise to sunset, Dunkel et al. (2023a)

Geosocial media is increasingly recognized as an important resource, for example, for analyzing visitation patterns, assessing collective values, or improving human well-being through fair and equitable design of public green spaces. To this end, analysts must first assess "what" is collectively valued, "where", by "whom" and "when", to understand the how and why of human behavior. However, the reproducibility of human behavior research is often impaired by several biases affecting geosocial media. VGI and geosocial media are often noisy, of limited representativeness, difficult to sample fully, and shared through incompletely documented and opaque application programming interfaces (APIs). This means that samples, populations, and the phenomena being observed often change between studies. For this reason, we sought to develop a robust and transferable ‘workflow template’ for assessing human activities and subjective landscape values.

Chi-value merged - Instagram, user count for "sunrise" (blue) and "sunset" (red), Aug-Dec 2017. Focus on positive chi values, normalized to 1-1000 range, Head-Tail-Breaks, 100 km grid. Most significant five grid cells highlighted for sunrise (diamond) and sunset (square) (Dunkel et al. 2023a).

In a study by Dunkel et al. (2023a), we explicitly limited the initial set of collected data to a narrow thematic filter: worldwide reactions to sunset and sunrise. This allowed us to compare parameter effects in isolation, test the robustness of existing measures and identify opportunities for improvement. Our results show that it is possible to disconnect the study of landscape preference from overall visitation frequencies, a common bias that analysts encounter in VGI and geosocial media analysis.

Signed chi equation used in Dunkel et al. 2023a, as adapted from Wood et al. (2007).

Code: Signed chi implementation (Python)

import math

import geopandas as gp
import numpy as np

DOF = 1
CHI_CRIT_VAL = 3.84
CHI_COLUMN = "usercount_est"

def calc_norm(
    grid_expected: gp.GeoDataFrame,
    grid_observed: gp.GeoDataFrame,
    chi_column: str = CHI_COLUMN):
    """Fetch the number of data points for the observed and
    expected dataset by the relevant column
    and calculate the normalisation value
    """
    v_expected = grid_expected[chi_column].sum()
    v_observed = grid_observed[chi_column].sum()
    norm_val = (v_expected / v_observed)
    return norm_val

def chi_calc(
        x_observed: float, x_expected: float,
        x_normalized: float) -> float:
    """Apply chi calculation based on observed (normalized)
    and expected value
    """
    value_observed_normalised = x_observed * x_normalized
    a = value_observed_normalised - x_expected
    b = math.sqrt(x_expected)
    chi_value = a / b if b else 0
    return chi_value

def apply_chi_calc(
        grid: gp.GeoDataFrame, norm_val: float,
        chi_column: str = CHI_COLUMN,
        chi_crit_val: float = CHI_CRIT_VAL):
    """Calculate chi-values based on two GeoDataFrames
    (expected and observed values)
    and return new grid with results
    """
    grid['chi_value'] = grid.apply(
        lambda x: chi_calc(
           x[chi_column],
           x[f'{chi_column}_expected'],
           norm_val),
        axis=1)
    # add significance column, default False
    grid['significant'] = False
    # calculate significance for both negative and positive chi_values
    grid.loc[np.abs(grid['chi_value']) > chi_crit_val, 'significant'] = True
    return grid

Python code snippet from the Jupyter Notebook with the implementation of the signed-chi equation (Dunkel et al. 2023a), as published in a separate data repository (Dunkel et al. 2023b).
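Read directly off the listing above, the per-cell computation can be written compactly as follows. This is a sketch derived from the code only; the symbols o_i (observed value in grid cell i), e_i (expected value in grid cell i) and n (normalisation value) are notation introduced here and are not taken from the publication.

```latex
n = \frac{\sum_j e_j}{\sum_j o_j},
\qquad
\chi_i =
\begin{cases}
  \dfrac{o_i \, n - e_i}{\sqrt{e_i}} & \text{if } e_i > 0 \\[1.5ex]
  0 & \text{if } e_i = 0
\end{cases}
```

A cell is flagged as significant in the listing when |\chi_i| exceeds CHI_CRIT_VAL = 3.84; the sign of \chi_i indicates over-representation (positive) or under-representation (negative) relative to the expected value.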

By using the signed chi square test (with respect to the sample topic of reactions to the sunset and sunrise), we can identify collectively important places and areas, independent of overall user frequencies. The illustrated process can be seen as a blueprint, offering a workflow that can be adapted and transferred to other contexts beyond reactions to the sunset and sunrise. To this end, the code for data processing and creation of figures is fully provided in several notebooks shared in a separate data repository (Dunkel et al. 2023b). Furthermore, the use of abstracted, estimated, non-personal data based on HyperLogLog demonstrates a practically viable solution, supporting a shift towards privacy-preserving and ethically aware data analytics in research on human preferences.
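To illustrate why the measure is independent of overall user frequencies, the following minimal sketch applies the same calculation as the listing above to three grid cells, using plain Python numbers instead of GeoDataFrames. The cell names and all counts are invented for illustration and do not come from the study.

```python
import math

CHI_CRIT_VAL = 3.84  # same threshold as in the listing above


def signed_chi(observed: float, expected: float, norm: float) -> float:
    """Signed chi for one grid cell; 0 when nothing is expected."""
    root = math.sqrt(expected)
    return (observed * norm - expected) / root if root else 0.0


# Hypothetical user counts per grid cell:
# expected = all users, observed = users reacting to the topic.
expected = {"A": 1000, "B": 1000, "C": 50}
observed = {"A": 100, "B": 20, "C": 30}

# Normalisation value: scales the observed totals to the expected totals.
norm = sum(expected.values()) / sum(observed.values())

chi = {cell: signed_chi(observed[cell], expected[cell], norm) for cell in expected}
significant = {cell: abs(value) > CHI_CRIT_VAL for cell, value in chi.items()}

for cell in sorted(chi, key=chi.get, reverse=True):
    print(cell, round(chi[cell], 1), significant[cell])
```

In this toy example, cell C has by far the fewest users overall, yet it receives the highest positive chi value because a large share of its users reacted to the topic. Cell B, although heavily visited, is strongly under-represented.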

Bibliography: Representativity and bias in location based social media

Dunkel, Alexander, Maximilian C. Hartmann, Eva Hauthal, Dirk Burghardt, and Ross S. Purves. 2023a. “From Sunrise to Sunset: Exploring Landscape Preference through Global Reactions to Ephemeral Events Captured in Georeferenced Social Media.” PLoS ONE 18 (1). https://doi.org/10.1371/journal.pone.0280423.
Dunkel, Alexander, Maximilian C. Hartmann, Eva Hauthal, Dirk Burghardt, and Ross S. Purves. 2023b. “Supplementary Materials for the Publication ‘From Sunrise to Sunset: Exploring Landscape Preference through Global Reactions to Ephemeral Events Captured in Georeferenced Social Media.’” OpARA. https://doi.org/10.25532/OPARA-200.

Methods for comparative analyses

Time series shapes, Shirato et al. (2023)

Tactical Analysis in Football, Andrienko et al. (2021)

Co-Bridges, Chen et al. (2021)

Identifying, exploring, and interpreting time series shapes in multivariate time intervals

To analyze a behavior unfolding over a long time period, we may need to divide it into parts called episodes. We developed an approach to analyzing episodes of behaviors described by multivariate numeric time series data (Shirato et al. 2023). It involves recognition of predefined types of patterns in the temporal variation of the individual variables within episodes and visually supported discovery of more complex patterns formed by temporal relationships between the simple patterns.

Constructing Spaces and Times for comparative analysis

We developed a generic visual analytics framework for identifying, exploring, and comparing patterns of collective movement in different classes of situations (Andrienko et al. 2021). It includes a combination of visual query techniques for flexible selection of episodes of situation development, a method for dynamic aggregation of data from selected groups of episodes, and a data structure for representing the aggregates that enables their exploration and use in further analysis. The approach was tested in application to tracking data from football games. It enabled detection and interpretation of interesting general patterns of team behaviors and revealing behavior differences between classes of game situations.

Comparison for Multi-item Data Streams

Examples of comparing the tweets of Hillary Clinton (pink) and Donald Trump (blue) during the presidential election of 2016 with the use of Co-Bridges. Discussions on issues “America”, “families”, “great”, and “AmericaFirst” are juxtaposed for comparison (Chen et al. 2021).

For comparing data streams involving multiple items (e.g., words in texts, actors or action types in action sequences, visited places in itineraries, etc.), we proposed Co-Bridges (Chen et al. 2021), a visual design that uses river and bridge metaphors: the two sides of a river represent the data streams, and bridges connecting temporally or sequentially aligned segments of the streams show commonalities and differences between the segments.

Bibliography

Gota Shirato, Natalia Andrienko, Gennady Andrienko (2023) Identifying, exploring, and interpreting time series shapes in multivariate time intervals, Visual Informatics, 2023, https://doi.org/10.1016/j.visinf.2023.01.001
Gennady Andrienko, Natalia Andrienko, Gabriel Anzer, Pascal Bauer, Guido Budziak, Georg Fuchs, Dirk Hecker, Hendrik Weber, and Stefan Wrobel (2021). Constructing Spaces and Times for Tactical Analysis in Football. IEEE Transactions on Visualization and Computer Graphics, 2021, vol. 27(4), pp.2280-2297 https://doi.org/10.1109/TVCG.2019.2952129
S. Chen, N. Andrienko, G. Andrienko, J. Li and X. Yuan, "Co-Bridges: Pair-wise Visual Connection and Comparison for Multi-item Data Streams," in IEEE Transactions on Visualization and Computer Graphics, vol. 27, no. 2, pp. 1612-1622, Feb. 2021, doi: 10.1109/TVCG.2020.3030411.

Sentiment, emotion and activity analysis

Since geosocial media are used to state opinions, express emotions, or document experiences, they contain a lot of subjective information. Such subjective phenomena are usually recognized via natural language processing, which is by now quite sophisticated but still struggles to recognize irony or sarcasm, for example, and is often limited to one or a few languages. Promising solutions have been achieved in this context with emojis, which have become extremely popular in geosocial media and are available in steadily growing numbers.

Emojis as contextual indicants

Emojis as contextual indicants, Hauthal et al. (2021)

Sentiment Analysis, Hauthal et al. (2020)

A use of emojis to investigate subjectivity was implemented in a study by Hauthal et al. (2021), proposing the measure of typicality. Typicality is a relative measure specifically tailored for geo-social media that determines how typical a particular object of interest (e.g., emoji or hashtag) is within a sub-dataset compared to the total dataset. Sub-datasets may be formed spatially, temporally, thematically, etc. Typicality is calculated by the normalized difference of two relative frequencies and returns a positive (= typical) or negative (= atypical) value.
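A minimal sketch of such a measure is given below. It assumes that the difference between the two relative frequencies is normalized by the relative frequency in the total dataset; this choice of denominator, the function name `typicality` and the toy data are assumptions made for illustration and should be checked against Hauthal et al. (2021).

```python
from collections import Counter


def typicality(item: str, sub_items: list, total_items: list) -> float:
    """Relative over- or under-representation of `item` in a sub-dataset.

    Assumption: normalisation by the relative frequency in the total dataset.
    """
    if not sub_items or not total_items:
        raise ValueError("both datasets must be non-empty")
    f_sub = Counter(sub_items)[item] / len(sub_items)
    f_total = Counter(total_items)[item] / len(total_items)
    if f_total == 0:
        raise ValueError("item does not occur in the total dataset")
    return (f_sub - f_total) / f_total


# Hypothetical emoji occurrences: total dataset vs. a coastal sub-dataset.
total = ["🌊"] * 20 + ["🌅"] * 10 + ["🌲"] * 10
coast = ["🌊"] * 6 + ["🌅"] * 2 + ["🌲"] * 2

for emoji in ("🌊", "🌅", "🌲"):
    print(emoji, round(typicality(emoji, coast, total), 2))
```

With these invented counts, the wave emoji is typical for the coastal subset (positive value), while the other two emojis are atypical (negative values).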

Correlation between the typicality of environmental groups of location-specific emojis within each country and the four geographical attributes (bold: respective representative geographical attribute; green: correlation coefficient for representative geographical attribute is positive and the maximum; yellow: correlation coefficient for representative geographical attribute is positive, but not the maximum; red: correlation coefficient for representative geographical attribute is negative), Hauthal et al. 2021.

Typicality was used to identify emojis in the previously mentioned global Instagram dataset that provide information about the context of the user while observing the event. On the one hand, these emojis deliver information about activities performed and, on the other hand, about perceived landscape features in the immediate surroundings. It was found that emojis provide more detailed information in this regard than the hashtags contained in the same dataset. Moreover, location-specific emojis were identified, which are chosen depending on the location and match the features of the physical environment, as shown by matching them with geographic attributes. This indicates that emojis are not randomly chosen, but provide insights not only into the user's situational context, but also into their perception and thus appreciation of certain aspects of the environment.

Bibliography

Hauthal, E., Dunkel, A., & Burghardt, D. (2021). Emojis as contextual indicants in location-based social media posts. ISPRS International Journal of Geo-Information, 10(6). https://doi.org/10.3390/ijgi10060407
Hauthal, E.; Burghardt, D.; Fish, C. and Griffin, A. (2020). Sentiment Analysis. International Encyclopedia of Human Geography (2nd Edition), 169-177, https://doi.org/10.1016/B978-0-08-102295-5.10593-1

Application-oriented workflows

Landscape character assessment

Landscape Value Patterns across Europe, Stahl Olafsson et al. (2022)

From Online Texts to Landscape Character Assessment, Koblet and Purves (2020)

Bibliography

Stahl Olafsson, Anton, Ross S. Purves, Flurina M. Wartmann, Maria Garcia-Martin, Nora Fagerholm, Mario Torralba, Christian Albert, et al. 2022. “Comparing Landscape Value Patterns between Participatory Mapping and Geolocated Social Media Content across Europe.” Landscape and Urban Planning 226 (October): 104511. https://doi.org/10.1016/j.landurbplan.2022.104511.
Koblet, Olga, and Ross S. Purves. 2020. “From Online Texts to Landscape Character Assessment: Collecting and Analysing First-Person Landscape Perception Computationally.” Landscape and Urban Planning 197 (May): 103757. https://doi.org/10.1016/j.landurbplan.2020.103757.

Activity analysis for landscape and urban planning

Even though individual people perceive landscapes and their attributed values differently, there are landscapes which the majority of people perceive as scenic and beautiful. These popular landscapes (e.g. Preikestolen in Norway or Wildkirchli in Switzerland) are often depicted by characteristic motif images, which are clusters of images all taken from a similar viewpoint and angle. This is not a new phenomenon: travel guides and art from the Romantic era already propagated landscape and nature appreciation and thereby popularized a selective subset of landscapes. Today, tourism agencies and other influencers (e.g. celebrities, companies, movies, songs) can shape which landscapes become popular through social media promotion, by planting seed images that people will try to recreate and thereby form new motifs.

Automated motif identification, Hartmann et al. (2022)

Tag Maps in der Landschaftsplanung, Dunkel (2021)

Privacy-Aware Visualization of VGI to Analyze Spatial Activity, Dunkel et al. (2020)

Analyzing Flickr images to identify popular viewpoints

Zooming in on Le Mont-Saint-Michel, Hartmann et al. (2022) illustrate how two motifs are extracted (coloured circles with dot), depicting the tidal island from two different viewpoints which make up only a small subset of all existing data at the given location.

By reaching millions of people and potentially influencing their future visiting plans, this social-media-induced tourism can have drastic physical consequences for the local environment, infrastructure and people. In a paper by Hartmann et al. (2022), we created an operationalizable conceptual model of motifs that makes it possible to identify, extract and monitor landscapes prone to such pressure based on geotagged social media data. More specifically, the proposed pipeline leverages Creative Commons Flickr images from the YFCC100M dataset within the European Natura 2000 protected areas, which represent a network of breeding and resting sites within important landscapes for rare and threatened species. Analysis of the motifs revealed that 65% depict cultural elements such as castles and bridges, whereas the remaining 35% contain natural features that were biased towards coastal elements like cliffs. Ultimately, the early detection of emerging motifs and their monitoring allows the identification of locations subject to increased pressure, which enables managers to explore why sites are being visited and to take timely and appropriate actions (e.g. allocation of infrastructure such as toilets and rubbish disposals, or visitor routing).

Privacy-Aware Visualization of VGI to Analyze Spatial Activity

Illustration of the system model and the two cases of possible adversaries discussed in Dunkel et al. 2020.

In recent years, user privacy has become an increasingly important consideration. Potential conflicts often emerge from the fact that VGI can be re-used in contexts not originally considered by volunteers. Addressing these privacy conflicts is particularly problematic in natural resource management, where visualizations are often explorative, with multifaceted and sometimes initially unknown sets of analysis outcomes. In a paper by Dunkel et al. (2020), we present an integrated and component-based approach to privacy-aware visualization of VGI, specifically suited for application to natural resource management. As a key component, HyperLogLog (HLL), a data abstraction format, is used to allow estimation of results in place of exact measurements based on raw data.
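The general idea behind HLL can be sketched in a few lines of Python. The class below is a simplified, illustrative implementation of the standard HyperLogLog estimator and is not the implementation used by Dunkel et al. (2020); the choice of SHA-1 as hash function, the precision `p = 14`, and the user IDs and set names are assumptions made for this example.

```python
import hashlib
import math


class HyperLogLog:
    """Minimal HyperLogLog sketch with 2**p registers (illustrative only)."""

    def __init__(self, p: int = 14):
        self.p = p
        self.m = 1 << p
        self.registers = [0] * self.m

    def add(self, item: str) -> None:
        # Hash the item to 64 bits; the raw item itself is never stored.
        digest = hashlib.sha1(item.encode("utf-8")).digest()
        h = int.from_bytes(digest[:8], "big")
        index = h >> (64 - self.p)              # first p bits select a register
        rest = h & ((1 << (64 - self.p)) - 1)   # remaining 64 - p bits
        # Rank = position of the leftmost 1-bit in the remaining bits.
        rank = (64 - self.p) - rest.bit_length() + 1
        self.registers[index] = max(self.registers[index], rank)

    def union(self, other: "HyperLogLog") -> "HyperLogLog":
        # Both sketches are assumed to use the same precision p.
        merged = HyperLogLog(self.p)
        merged.registers = [max(a, b) for a, b in zip(self.registers, other.registers)]
        return merged

    def count(self) -> float:
        alpha = 0.7213 / (1 + 1.079 / self.m)   # bias correction for m >= 128
        raw = alpha * self.m ** 2 / sum(2.0 ** -r for r in self.registers)
        zeros = self.registers.count(0)
        if raw <= 2.5 * self.m and zeros:
            return self.m * math.log(self.m / zeros)  # small-range correction
        return raw


# Hypothetical user sets for two regions with 1000 users in common.
france, germany = HyperLogLog(), HyperLogLog()
for i in range(3000):
    france.add(f"user_{i}")
for i in range(2000, 5000):
    germany.add(f"user_{i}")

both = france.union(germany)
# Intersection via inclusion-exclusion on the three estimates.
common = france.count() + germany.count() - both.count()
print(round(both.count()), round(common))
```

Only the register array is kept, so individual user IDs cannot be read back from the sketch, yet sketches can still be merged and counted. The results are estimates: the union should be close to 5000 and the intersection close to 1000, with the intersection carrying the larger relative error because it combines three estimates.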

Analyzing spatial relationships with HLL intersection, based on incremental union of user sets from benchmark data (100 km-grid) for France, Germany and the United Kingdom (**left**). The Venn Diagram (**right**) shows estimation of common user counts for different groups, and the percentage of error compared to raw data (Dunkel et al. 2020).

Additional Resources:

Tag Maps Python package
lbsntransform Python package, for transforming raw data to privacy-aware format (Dunkel et al. 2020)
Gitlab Repository with source code for "Privacy-Aware Visualization of VGI to Analyze Spatial Activity", Dunkel et al. (2020)
Shared Benchmark data (Dunkel et al. 2020)

Assessing experienced tranquillity

Assessing experienced tranquillity, Wartmann et al. (2021)

Mapping indicators of cultural ecosystem services, Gugulica and Burghardt (2023)

Investigating sense of place, Wartmann et al. (2018)

Identifying tranquil areas is important for landscape planning and policy-making. Research has demonstrated discrepancies between modelled potential tranquil areas and the places where people experience tranquillity according to field surveys. Because surveys are resource-intensive, user-generated text data offers potential for extracting where people experience tranquillity. In a study by Wartmann et al. (2021), we explore and model the relationship between landscape ecological measures and experienced tranquillity extracted from user-generated text descriptions.

Number of terms per facet of sense of place at ten different study sites (Wartmann et al. 2018).

Evaluation of potential keywords yielded six keywords associated with experienced tranquillity, resulting in 15,350 extracted tranquillity descriptions. The two most common land cover classes associated with tranquillity were arable and horticulture, and improved grassland, followed by urban and suburban. In the logistic regression model across all land cover classes, freshwater, elevation and naturalness were positive predictors of tranquillity. Built-up area was a negative predictor. Descriptions of tranquillity were most similar between improved grassland and arable and horticulture, and most dissimilar between arable and horticulture and urban. This study highlights the potential of applying natural language processing to extract experienced tranquillity from text, and demonstrates links between landscape ecological measures and tranquillity as a perceived landscape quality.

Indicators of cultural ecosystem services

In our increasingly urbanized world, the cultural ecosystem services (CES) provided by urban nature play a crucial role in enabling and maintaining the well-being of urban dwellers. Despite the increased number of studies leveraging geosocial media data for more efficient and socio-culturally oriented CES assessment, the high complexity and costs associated with existing methods such as manual or automated image classification hinder their application in urban planning and ecosystem management. A study by Gugulica and Burghardt (2023) introduces a novel method that draws on the semantic similarity between word2vec word embeddings to classify large volumes of geosocial media textual metadata and quantify indicators of CES use. We demonstrated the applicability of our approach by quantifying spatial patterns of aesthetic appreciation and wildlife recreation in the green spaces of the city of Dresden based on the classification of >50,000 geotagged Instagram and Flickr posts. Moreover, we analyzed and mapped semantic patterns embedded in geosocial media and gained essential insights that can contribute toward a context-dependent assessment of CES use, which in turn can help inform decision making for more sustainable planning and management of urban ecosystems. The performance evaluation of the classification supports the validity of the proposed unsupervised text classification approach as a practical, reliable, and more efficient alternative to the laborious and expensive annotation efforts required by manual or supervised classification methods.
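The principle of classifying text by embedding similarity can be sketched as follows. This is not the published pipeline: the three-dimensional vectors stand in for real word2vec embeddings (which typically have a hundred or more dimensions), and the seed words, the averaging of word vectors per post and the similarity threshold of 0.8 are all assumptions chosen for illustration.

```python
import math

# Hypothetical toy "embeddings"; real word2vec vectors are learned from text.
EMBEDDINGS = {
    "beautiful": (0.9, 0.1, 0.0),
    "sunset": (0.8, 0.2, 0.1),
    "view": (0.7, 0.3, 0.1),
    "bird": (0.1, 0.9, 0.1),
    "squirrel": (0.0, 0.8, 0.2),
    "pizza": (0.1, 0.0, 0.9),
}

# Hypothetical seed words describing each CES indicator.
SEEDS = {
    "aesthetic appreciation": ["beautiful"],
    "wildlife recreation": ["bird"],
}


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def mean_vector(words):
    """Average the embeddings of all known words; None if no word is known."""
    vectors = [EMBEDDINGS[w] for w in words if w in EMBEDDINGS]
    if not vectors:
        return None
    return tuple(sum(dim) / len(vectors) for dim in zip(*vectors))


def classify(post: str, threshold: float = 0.8):
    """Return the most similar CES indicator, or None below the threshold."""
    post_vector = mean_vector(post.lower().split())
    if post_vector is None:
        return None
    scores = {label: cosine(post_vector, mean_vector(seeds)) for label, seeds in SEEDS.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] >= threshold else None


for text in ("sunset view", "squirrel", "pizza"):
    print(text, "->", classify(text))
```

Because no labelled training data is needed, only seed words and a threshold, this kind of approach avoids the annotation effort that supervised classification requires.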

Social Media Images for Urban Bicycle Infrastructure Planning

Object Detection for Urban Bicycle Infrastructure Planning, Knura et al. (2021)

Visualizing Point Density on Geometry Objects, Zahtila et al. (2022)

In addition to descriptive textual information and emojis, the image content of geosocial media posts can also be analyzed directly. As an application for urban bicycle infrastructure planning, an object recognition algorithm based on convolutional neural networks was used to identify bicycles and potential parking spaces. The research and development work was carried out in cooperation with a Young Research Group within the framework of the priority program VGIscience (Knura et al. 2021). The research on object recognition was carried out in the COVMAP project, while the processing of social media data and the development of methods for visual analysis were realized by the projects EVA-VGI and TOVIP.

Examples of correctly identified stationary (top, cyan boxes) and moving (bottom, yellow boxes) bicycles, with all detected persons marked in magenta boxes (Knura et al. 2021).

Bibliography

Dunkel, Alexander. 2021. “Tag Maps in Der Landschaftsplanung.” In Handbuch Methoden Visueller Kommunikation in Der Räumlichen Planung, edited by Diedrich Bruns, Boris Stemmer, Daniel Münderlein, and Simone Theile, 137–66. Wiesbaden: Springer Fachmedien Wiesbaden. https://doi.org/10.1007/978-3-658-29862-3_8.
Dunkel, Alexander, Marc Löchner, and Dirk Burghardt. 2020. “Privacy-Aware Visualization of Volunteered Geographic Information (VGI) to Analyze Spatial Activity: A Benchmark Implementation.” ISPRS International Journal of Geo-Information 9 (10): 607. https://doi.org/10.3390/ijgi9100607.
Hartmann, M. C., Koblet, O., Baer, M. F., & Purves, R. S. (2022). Automated motif identification: Analysing Flickr images to identify popular viewpoints in Europe’s protected areas. Journal of Outdoor Recreation and Tourism, 37 (January). https://doi.org/10.1016/j.jort.2021.100479
Wartmann, Flurina M., Olga Koblet, and Ross S. Purves. 2021. “Assessing Experienced Tranquillity through Natural Language Processing and Landscape Ecology Measures.” Landscape Ecology 36 (8): 2347–65. https://doi.org/10.1007/s10980-020-01181-8.
Gugulica, M. & Burghardt, D. (2023). Mapping indicators of cultural ecosystem services use in urban green spaces based on text classification of geosocial media data. Ecosystem Services, Volume 60, https://doi.org/10.1016/j.ecoser.2022.101508
Wartmann, Flurina M., and Ross S. Purves. 2018. “Investigating Sense of Place as a Cultural Ecosystem Service in Different Landscapes through the Lens of Language.” Landscape and Urban Planning 175 (July): 169–83. https://doi.org/10.1016/j.landurbplan.2018.03.021.
Knura, Martin, Florian Kluger, Moris Zahtila, Jochen Schiewe, Bodo Rosenhahn, and Dirk Burghardt. 2021. “Using Object Detection on Social Media Images for Urban Bicycle Infrastructure Planning: A Case Study of Dresden.” ISPRS International Journal of Geo-Information 10 (11): 733. https://doi.org/10.3390/ijgi10110733.
Zahtila, M., Knura, M. Visualizing Point Density on Geometry Objects: Application in an Urban Area Using Social Media VGI. KN J. Cartogr. Geogr. Inf. 72, 187–200 (2022). https://doi.org/10.1007/s42489-022-00113-7

Exploring people’s mobility behavior

Extracting time-series topics from game episodes, Shirato et al. (2021)

Exploring Twitter to understand traffic events, Das et al. (2020)

Extracting time-series topics from football games

We explored the potential of topic modelling as a tool for analyzing episodes of behaviors described by multivariate time series data (Shirato et al. 2021). The basic idea is to represent data variation by symbolic tokens and treat episodes as pseudo-texts to which topic modelling methods can be applied. We tested this idea on data describing collective movements in episodes from football games. The results showed good potential of the approach.
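The first step, turning numeric variation into symbolic tokens, might look like the sketch below. The encoding into "up", "down" and "flat" tokens, the tolerance `eps` and the variable names `team_width` and `team_depth` are assumptions for illustration; Shirato et al. (2021) may encode the data differently.

```python
def to_tokens(name: str, values: list, eps: float = 0.5) -> list:
    """Encode each step of one variable as an up/down/flat token."""
    tokens = []
    for previous, current in zip(values, values[1:]):
        delta = current - previous
        trend = "up" if delta > eps else "down" if delta < -eps else "flat"
        tokens.append(f"{name}_{trend}")
    return tokens


def episode_to_pseudo_text(episode: dict, eps: float = 0.5) -> str:
    """Concatenate the tokens of all variables into one pseudo-text."""
    words = []
    for name, values in episode.items():
        words.extend(to_tokens(name, values, eps))
    return " ".join(words)


# Hypothetical episode with two variables sampled at four time steps.
episode = {
    "team_width": [30, 32, 35, 35],
    "team_depth": [40, 38, 38, 41],
}
print(episode_to_pseudo_text(episode))
# team_width_up team_width_up team_width_flat team_depth_down team_depth_flat team_depth_up
```

A collection of such pseudo-texts, one per episode, can then be passed to a standard topic modelling method in the same way as a corpus of ordinary documents.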

Exploring the potential of Twitter to understand traffic events

A hybrid multi-layered geoparser. Layer 1 consists of a supervised location retriever. Layer 2 consists of spatial rules based on spatial prepositions, vernacular placenames and spatial objects (Das et al. 2020).

Detecting traffic events and their locations is important for an effective transportation management system and better urban policy making. Traffic events include traffic accidents, congestion and parking issues, to name a few. Currently, traffic events are detected through static sensors, e.g. CCTV cameras and loop detectors. However, these have limited spatial coverage and high maintenance costs, especially in developing regions. We investigated whether Twitter, a social media platform, can be useful for understanding urban traffic events from tweets in India (Das et al. 2020). The results show that an SVM-based model performs best at detecting traffic-related tweets. For extracting location information, a hybrid georeferencing model consisting of a supervised learning algorithm and a number of spatial rules outperforms other models. The results suggest that people in India, especially in Greater Mumbai, often share traffic information along with location mentions, which can be used to complement existing physical transport infrastructure in a cost-effective manner to manage transport services in the urban environment.
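As a rough illustration of what a rule based on spatial prepositions can look like, the sketch below extracts capitalized phrases that follow a preposition. It is a deliberately simple stand-in and not the geoparser of Das et al. (2020): the preposition list, the capitalization heuristic and the example tweets are assumptions, and the published model additionally relies on a supervised location retriever, vernacular placenames and spatial objects.

```python
import re

# Hypothetical list of spatial prepositions used as triggers.
SPATIAL_PREPOSITIONS = ("near", "at", "towards", "on", "opposite")

# A preposition followed by one or more capitalized words.
PATTERN = re.compile(
    r"\b(?:%s)\s+([A-Z]\w*(?:\s+[A-Z]\w*)*)" % "|".join(SPATIAL_PREPOSITIONS)
)


def extract_location_mentions(tweet: str) -> list:
    """Return candidate location phrases that follow a spatial preposition."""
    return [match.group(1) for match in PATTERN.finditer(tweet)]


tweets = [
    "Heavy traffic near Andheri Station, accident on Western Express Highway",
    "stuck at home again",
]
for tweet in tweets:
    print(extract_location_mentions(tweet))
```

The candidate phrases would still have to be resolved to coordinates, for example by looking them up in a gazetteer, which is outside the scope of this sketch.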

Bibliography

Shirato, G., Andrienko, N., Andrienko, G. (2021). What are the topics in football? Extracting time-series topics from game episodes. IEEE VIS 2021
Das, Rahul Deb, and Ross S. Purves. 2020. “Exploring the Potential of Twitter to Understand Traffic Events and Their Locations in Greater Mumbai, India.” IEEE Transactions on Intelligent Transportation Systems 21 (12): 5213–22. https://doi.org/10.1109/TITS.2019.2950782.

Supporting comparative visual analytics for political science research

Analyzing and Visualizing Emotional Reactions Expressed by Emojis, Hauthal et al. (2019)

Human migration: the big data perspective, Sîrbu et al. (2020)

Studying the Brexit-Referendum

Hauthal et al. (2019) used a Twitter dataset to investigate reactions to the political event Brexit in terms of opinions and emotions using emojis in two different approaches. In the first approach, emojis and hashtags were combined. Hashtags, established in political campaigns before the referendum, indicate which sub-topic of the overall Brexit debate is addressed in a tweet, i.e. leave or remain. A spatial comparison of the analysis results with the actual referendum results on NUTS1 level (the highest level in the hierarchical classification used to clearly identify and classify the spatial reference units of official statistics in the Member States of the European Union) showed a higher consistency than a pure hashtag-based consideration without including emojis.

Migration analysis

Focusing on human migration in Sîrbu et al. (2020), we consider three stages of migration: the journey, the stay, and the return. For each stage, we discuss the traditional and novel sources and types of data that can be used in analysis, paying particular attention to the opportunities created by big data and challenges involved in their analysis.

Superdiversity index (left) and immigration levels (right) across UK regions at NUTS2 level (Sîrbu et al. 2020).

Bibliography

Hauthal, E.; Burghardt, D.; Dunkel, A. Analyzing and Visualizing Emotional Reactions Expressed by Emojis in Location-Based Social Media. ISPRS Int. J. Geo-Inf. 2019, 8, 113. https://doi.org/10.3390/ijgi8030113
Alina Sîrbu, Gennady Andrienko, Natalia Andrienko, Chiara Boldrini, Marco Conti, Fosca Giannotti, Riccardo Guidotti, Simone Bertoli, Jisu Kim, Cristina Ioana Muntean, Luca Pappalardo, Andrea Passarella, Dino Pedreschi, Laura Pollacci, Francesca Pratesi & Rajesh Sharma (2020) Human migration: the big data perspective. International Journal of Data Science and Analytics, 2020, vol. 11(4), pp.341-360 https://doi.org/10.1007/s41060-020-00213-5