World-Scale Completion of Geographic Knowledge (WorldKG)



Knowledge graphs provide rich semantic representations of real-world entities and their relations. Although popular general-purpose knowledge graphs such as Wikidata and DBpedia contain selected geographic entities, their coverage of geographic information is limited. OpenStreetMap (OSM) is an essential source of openly available geographic information. In contrast to knowledge graphs, however, OSM lacks a clear semantic representation of the rich geographic information it contains. Generating semantic representations of OSM entities and interlinking them with knowledge graphs is inherently challenging due to the large, heterogeneous, ambiguous, and flat OSM schema and the sparsity of annotations.

The objectives of the WorldKG project include:

  1. Development of methods for world-scale alignment of OSM datasets and knowledge graphs at the schema and instance level.
  2. Preparation and release of data resulting from these methods in the form of the WorldKG knowledge graph that provides a comprehensive semantic representation of geographical information and its context.

The results of the WorldKG project can substantially benefit applications in the mobility, transportation, tourism, and logistics domains. Through their high-quality semantic representation of geographic and contextual information, they can also provide a basis for informational maps and services.

Project Results

In the WorldKG project, we developed knowledge graph completion methods that establish links between OSM and knowledge graphs at the entity and schema levels. In particular, we created OSM2KG, an approach for interlinking geographic entities in OSM and knowledge graphs, and the novel Neural Class Alignment (NCA) algorithm, which links the OSM tags that carry entity-type information to the corresponding knowledge graph classes. Using these methods, we semantically enriched geospatial entities in OSM and made them available in WorldKG, a novel geographic knowledge graph. In addition, we developed methods and datasets that improve data quality in OSM and make OSM data available to machine learning applications, and we built a number of predictive models and applications on top of these datasets.

Overview of the pipeline for creating WorldKG from OSM, consisting of three main steps: (i) geographic entity linking with OSM2KG, (ii) geographic class alignment with NCA, and (iii) WorldKG geographic knowledge graph creation.
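
The three pipeline steps can be illustrated with a toy Python sketch. It assumes that steps (i) and (ii) have already produced their outputs, which are hardcoded here as lookup tables, and shows only how step (iii) could combine them into triples for a single OSM node. The `ex:` names, the node id, and the helper `create_triples` are hypothetical and do not reflect the actual WorldKG ontology. `wd:Q64` (Berlin) and `wd:Q515` (city) are Wikidata identifiers used only as example targets, and the aligned class is attached with `rdf:type` purely for simplicity.

```python
from dataclasses import dataclass, field


@dataclass
class OsmNode:
    """Toy OSM node: an id, coordinates, and a flat set of key=value tags."""

    osm_id: int
    lat: float
    lon: float
    tags: dict[str, str] = field(default_factory=dict)


# Stand-in for step (i): an identity link that OSM2KG would predict (hardcoded here).
ENTITY_LINKS = {1: "wd:Q64"}

# Stand-in for step (ii): a tag-to-class alignment that NCA would produce (hardcoded here).
TAG_TO_CLASS = {("place", "city"): "wd:Q515"}


def create_triples(node: OsmNode) -> list[tuple[str, str, str]]:
    """Step (iii): turn one OSM node into (subject, predicate, object) triples."""
    subject = f"ex:node{node.osm_id}"  # hypothetical IRI scheme
    triples = [
        (subject, "ex:latitude", str(node.lat)),
        (subject, "ex:longitude", str(node.lon)),
    ]
    # Type the node with every knowledge graph class aligned to one of its tags.
    for key, value in node.tags.items():
        kg_class = TAG_TO_CLASS.get((key, value))
        if kg_class is not None:
            triples.append((subject, "rdf:type", kg_class))
    # Add an identity link if one was predicted for this node.
    linked = ENTITY_LINKS.get(node.osm_id)
    if linked is not None:
        triples.append((subject, "owl:sameAs", linked))
    return triples


if __name__ == "__main__":
    node = OsmNode(1, 52.52, 13.405, {"place": "city", "name": "Berlin"})
    for triple in create_triples(node):
        print(triple)
```

Tags without an alignment, such as `name` here, simply produce no type triple in this sketch. A real pipeline would map them to properties rather than drop them.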

Selected Contributions

  • WorldKG Knowledge Graph
    • Description: WorldKG is a geographic knowledge graph providing a comprehensive semantic representation of geographic entities in OpenStreetMap. The WorldKG knowledge graph is built according to the WorldKG ontology, providing its semantic backbone.
    • Paper PDF
    • Website
    • Code
  • Neural Class Alignment (NCA)
    • Description: Interlinking OSM entities with knowledge graphs is inherently difficult due to the large, heterogeneous, ambiguous, and flat OSM schema and the sparsity of annotations. NCA holistically aligns OSM tags with the corresponding knowledge graph classes by jointly considering the schema and instance layers. It trains a novel neural architecture that builds on a shared latent space, created from entities already linked between OSM and knowledge graphs, to align tags with classes.
    • Paper PDF
    • Code
  • Linking OSM to Knowledge Graphs (OSM2KG)
    • Description: OSM2KG is a link discovery approach that predicts identity links between OSM nodes and geographic entities in a knowledge graph. At its core is a latent representation of OSM nodes that captures semantic node similarity in an embedding. OSM2KG uses this latent representation to train a supervised link prediction model, taking existing links between OSM and knowledge graphs as training data.
    • Paper PDF
    • Code
  • GeoVectors corpus of OSM entity embeddings
    • Description: GeoVectors is a comprehensive, world-scale linked open corpus of OSM entity embeddings. It covers the entire OSM dataset and provides latent representations of over 980 million geographic entities in 180 countries. The GeoVectors corpus captures the semantic and geographic dimensions of OSM entities and makes these entities directly accessible to machine learning algorithms and semantic applications.
    • Paper PDF
    • Website
    • Code
  • Attention-Based Vandalism Detection in OpenStreetMap (Ovid)
    • Description: Ovid is an attention-based method for vandalism detection in OSM. It relies on a neural architecture that uses a multi-head attention mechanism to summarize the information in OSM changesets that indicates vandalism. To facilitate automated vandalism detection, Ovid introduces a set of original features that capture changeset, user, and edit information. Furthermore, a dataset of real-world vandalism incidents from the OSM edit history is provided as open data.
    • Paper PDF
    • Code
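
The final step of NCA can be pictured as nearest-neighbour matching in the shared latent space. The sketch below assumes that the tag and class embeddings already exist and assigns each tag the most similar class by cosine similarity. The toy vectors, the `kg:` class names, and the 0.5 threshold are hypothetical. It does not reproduce NCA's neural architecture or its training on linked entities. It only shows how a shared space turns alignment into a similarity lookup.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity of two vectors (0.0 if either has zero length)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def align_tags(
    tag_vecs: dict[str, list[float]],
    class_vecs: dict[str, list[float]],
    threshold: float = 0.5,
) -> dict[str, str]:
    """Map each OSM tag to its most similar class if the similarity clears the threshold."""
    alignment: dict[str, str] = {}
    if not class_vecs:
        return alignment
    for tag, tag_vec in tag_vecs.items():
        best_class = max(class_vecs, key=lambda c: cosine(tag_vec, class_vecs[c]))
        if cosine(tag_vec, class_vecs[best_class]) >= threshold:
            alignment[tag] = best_class
    return alignment


# Toy vectors in an assumed, already-trained shared latent space.
TAGS = {
    "amenity=restaurant": [0.9, 0.1, 0.0],
    "railway=station": [0.0, 0.2, 0.95],
    "building=yes": [0.1, 0.9, 0.1],
}
CLASSES = {"kg:Restaurant": [1.0, 0.0, 0.0], "kg:RailwayStation": [0.0, 0.0, 1.0]}

if __name__ == "__main__":
    print(align_tags(TAGS, CLASSES))
    # {'amenity=restaurant': 'kg:Restaurant', 'railway=station': 'kg:RailwayStation'}
```

Tags whose best match falls below the threshold, such as the generic `building=yes` here, stay unaligned. This is one simple way to avoid forcing ambiguous tags onto a class.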
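
OSM2KG's link prediction can be approximated, very loosely, by a loop that first generates candidates and then classifies them. In the sketch below, knowledge graph entities are first restricted to a radius around the OSM node and are then scored by a logistic model. Several choices here are assumptions made for illustration: the two features (name similarity via `difflib` and haversine distance), the 2.5 km radius, and the hand-set weights. OSM2KG itself builds on latent representations of OSM nodes and learns its model from existing links, which this sketch does not attempt.

```python
import math
from dataclasses import dataclass
from difflib import SequenceMatcher
from typing import Optional


@dataclass
class Entity:
    ident: str
    name: str
    lat: float
    lon: float


def haversine_km(a: Entity, b: Entity) -> float:
    """Great-circle distance between two entities in kilometres."""
    earth_radius_km = 6371.0
    phi1, phi2 = math.radians(a.lat), math.radians(b.lat)
    d_phi = phi2 - phi1
    d_lambda = math.radians(b.lon - a.lon)
    h = math.sin(d_phi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2
    return 2 * earth_radius_km * math.asin(math.sqrt(h))


# Hypothetical weights standing in for a classifier trained on existing links.
WEIGHTS = {"bias": -2.0, "name_similarity": 5.0, "distance_km": -0.5}


def link_probability(osm: Entity, kg: Entity) -> float:
    """Logistic score over two hand-picked features: name similarity and distance."""
    name_sim = SequenceMatcher(None, osm.name.lower(), kg.name.lower()).ratio()
    z = (
        WEIGHTS["bias"]
        + WEIGHTS["name_similarity"] * name_sim
        + WEIGHTS["distance_km"] * haversine_km(osm, kg)
    )
    return 1.0 / (1.0 + math.exp(-z))


def predict_link(
    osm: Entity, kg_entities: list[Entity], radius_km: float = 2.5, threshold: float = 0.5
) -> Optional[str]:
    """Return the id of the best-scoring nearby KG entity, or None if no link is predicted."""
    # Blocking: only entities within radius_km are considered as candidates.
    candidates = [e for e in kg_entities if haversine_km(osm, e) <= radius_km]
    if not candidates:
        return None
    best = max(candidates, key=lambda e: link_probability(osm, e))
    return best.ident if link_probability(osm, best) >= threshold else None


if __name__ == "__main__":
    node = Entity("osm:1", "Central Station", 52.3767, 9.7410)
    kg = [
        Entity("kg:A", "Central Station", 52.3766, 9.7414),  # same name, very close
        Entity("kg:B", "City Museum", 52.3745, 9.7385),  # close, different name
        Entity("kg:C", "Central Station", 53.5530, 10.0066),  # same name, far away
    ]
    print(predict_link(node, kg))  # kg:A
```

The radius filter keeps the number of scored pairs small. It also prevents a distant entity with an identical name (`kg:C`) from being linked.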
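
The following sketch illustrates, in a deliberately simplified way, what it means for one vector to capture both the semantic and geographic dimensions of an OSM entity. It is not the GeoVectors embedding model. The semantic part simply averages hypothetical per-tag vectors. The geographic part maps latitude and longitude onto the unit sphere, a common encoding chosen here because it keeps points on either side of the ±180° meridian close together. The tag vectors and function names are assumptions.

```python
import math

# Hypothetical per-tag vectors; GeoVectors derives its embeddings from OSM data.
TAG_VECTORS = {
    "amenity=cafe": [0.8, 0.1],
    "cuisine=coffee_shop": [0.7, 0.3],
    "shop=bakery": [0.6, 0.4],
}
SEMANTIC_DIM = 2


def semantic_part(tags: dict[str, str]) -> list[float]:
    """Average the vectors of all known key=value tags (zeros if none is known)."""
    known = [TAG_VECTORS[f"{k}={v}"] for k, v in tags.items() if f"{k}={v}" in TAG_VECTORS]
    if not known:
        return [0.0] * SEMANTIC_DIM
    return [sum(column) / len(known) for column in zip(*known)]


def geographic_part(lat: float, lon: float) -> list[float]:
    """Place the coordinates on the unit sphere so that lon=-180 and lon=180 coincide."""
    phi, lam = math.radians(lat), math.radians(lon)
    return [math.cos(phi) * math.cos(lam), math.cos(phi) * math.sin(lam), math.sin(phi)]


def embed(tags: dict[str, str], lat: float, lon: float) -> list[float]:
    """Concatenate the semantic and geographic parts into one fixed-size vector."""
    return semantic_part(tags) + geographic_part(lat, lon)


if __name__ == "__main__":
    print(embed({"amenity": "cafe", "cuisine": "coffee_shop", "name": "Example"}, 0.0, 0.0))
```

Every entity yields a vector of the same length, regardless of how many tags it carries. As a result, such vectors can be passed directly to standard machine learning models.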
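
In Ovid, multi-head attention turns a variable number of edits into a single changeset-level summary. That role can be sketched in plain Python. This is generic multi-head attention pooling, not Ovid's architecture. The features, the fixed query vectors, the identity key/value projections, and the output weights are all hypothetical, whereas Ovid's parameters are learned and its inputs combine changeset, user, and edit features.

```python
import math


def softmax(scores: list[float]) -> list[float]:
    """Numerically stable softmax."""
    largest = max(scores)
    exps = [math.exp(s - largest) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def multi_head_pool(edits: list[list[float]], queries: list[list[float]]) -> list[float]:
    """Summarise a variable number of edit vectors into one fixed-size vector.

    Each head attends over its own slice of the feature dimensions using a query
    vector; keys and values are the raw slices (identity projections), and the
    head outputs are concatenated.
    """
    num_heads = len(queries)
    dim = len(edits[0])
    head_dim = dim // num_heads
    if head_dim * num_heads != dim:
        raise ValueError("feature size must be divisible by the number of heads")
    pooled: list[float] = []
    for h, query in enumerate(queries):
        lo, hi = h * head_dim, (h + 1) * head_dim
        slices = [edit[lo:hi] for edit in edits]
        # Scaled dot-product scores between this head's query and every edit.
        scores = [sum(q * x for q, x in zip(query, s)) / math.sqrt(head_dim) for s in slices]
        weights = softmax(scores)
        # Attention-weighted average of the edit slices for this head.
        pooled.extend(sum(w * s[j] for w, s in zip(weights, slices)) for j in range(head_dim))
    return pooled


def vandalism_probability(summary: list[float], weights: list[float], bias: float) -> float:
    """Logistic output layer on top of the pooled changeset summary."""
    z = bias + sum(w * x for w, x in zip(weights, summary))
    return 1.0 / (1.0 + math.exp(-z))


if __name__ == "__main__":
    # Three edits in one changeset, four hypothetical numeric features each.
    changeset = [[0.1, 0.0, 0.2, 0.1], [0.9, 1.0, 0.8, 0.7], [0.2, 0.1, 0.0, 0.3]]
    queries = [[1.0, 1.0], [1.0, 0.0]]  # two heads, head_dim = 2
    summary = multi_head_pool(changeset, queries)
    print(vandalism_probability(summary, [1.0, 1.0, 1.0, 1.0], bias=-2.0))
```

Each head uses its own query, so each head can weight the edits of a changeset differently before the results are combined for the final score.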

Publications

  1. Tempelmeier, N., & Demidova, E. (2021). Linking OpenStreetMap with knowledge graphs - Link discovery for schema-agnostic volunteered geographic information. Future Generation Computer Systems, 116, 349–364. DOI: 10.1016/j.future.2020.11.003
  2. Dsouza, A., Tempelmeier, N., & Demidova, E. (2021). Towards Neural Schema Alignment for OpenStreetMap and Knowledge Graphs. Proceedings of the 20th International Semantic Web Conference, ISWC 2021, Lecture Notes in Computer Science, vol. 12922, 56–73. DOI: 10.1007/978-3-030-88361-4_4
  3. Dsouza, A., Tempelmeier, N., Yu, R., Gottschalk, S., & Demidova, E. (2021). WorldKG: A World-Scale Geographic Knowledge Graph. Proceedings of the 30th ACM International Conference on Information and Knowledge Management, CIKM 2021, 4475–4484. DOI: 10.1145/3459637.3482023
  4. Tempelmeier, N., Gottschalk, S., & Demidova, E. (2021). GeoVectors: A Linked Open Corpus of OpenStreetMap Embeddings on World Scale. Proceedings of the 30th ACM International Conference on Information and Knowledge Management, CIKM 2021, 4604–4612. DOI: 10.1145/3459637.3482004
  5. Latif, S., Agarwal, S., Gottschalk, S., Chrosch, C., Feit, F., Jahn, J., Braun, T., Tchenko, Y. C., Demidova, E., & Beck, F. (2021). Visually Connecting Historical Figures Through Event Knowledge Graphs. Proceedings of the IEEE Visualization Conference, IEEE VIS 2021 - Short Papers, 156–160. DOI: 10.1109/VIS49827.2021.9623313
  6. Tempelmeier, N., & Demidova, E. (2021). Ovid: A Machine Learning Approach for Automated Vandalism Detection in OpenStreetMap. Proceedings of the 29th International Conference on Advances in Geographic Information Systems, SIGSPATIAL 2021, 415–418. DOI: 10.1145/3474717.3484204
  7. Tempelmeier, N., & Demidova, E. (2022). Attention-Based Vandalism Detection in OpenStreetMap. Proceedings of the ACM Web Conference 2022, 643–651. DOI: 10.1145/3485447.3512224
  8. Sao, A., Tempelmeier, N., & Demidova, E. (2021). Deep Information Fusion for Electric Vehicle Charging Station Occupancy Forecasting. Proceedings of the 24th IEEE International Intelligent Transportation Systems Conference, ITSC 2021, 3328–3333. DOI: 10.1109/ITSC48978.2021.9565097
  9. Demidova, E., Dsouza, A., Gottschalk, S., Tempelmeier, N., & Yu, R. (2022). Creating Knowledge Graphs for Geographic Data on the Web. SIGWEB Newsletter. DOI: 10.1145/3522598.3522602
  10. Gottschalk, S., & Demidova, E. (2022). Tab2KG: Semantic Table Interpretation with Lightweight Semantic Profiles. Semantic Web, 13(3), 571–597. DOI: 10.3233/SW-222993
  11. Dadwal, R., Funke, T., & Demidova, E. (2021). An Adaptive Clustering Approach for Accident Prediction. Proceedings of the 24th IEEE International Intelligent Transportation Systems Conference, ITSC 2021, 1405–1411. DOI: 10.1109/ITSC48978.2021.9564564
  12. Abdollahi, S., Gottschalk, S., & Demidova, E. (2023). LaSER: Language-Specific Event Recommendation. Journal of Web Semantics, 75, 100759. DOI: 10.1016/j.websem.2022.100759
  13. von Wahl, L., Tempelmeier, N., Sao, A., & Demidova, E. (2022). Reinforcement Learning-based Placement of Charging Stations in Urban Road Networks. Proceedings of the 28th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, KDD 2022, 3992–4000. DOI: 10.1145/3534678.3539154
  14. Dadwal, R., Funke, T., Nüsken, M., & Demidova, E. (2022). W-Trace: Robust and Effective Watermarking for GPS Trajectories. Proceedings of the 30th International Conference on Advances in Geographic Information Systems, SIGSPATIAL 2022, 77:1–77:4. DOI: 10.1145/3557915.3561474
  15. Gurtovoy, D., & Gottschalk, S. (2022). Linking Streets in OpenStreetMap to Persons in Wikidata. Proceedings of the Companion of The Web Conference 2022, 294–297. DOI: 10.1145/3487553.3524267
  16. Dsouza, A., Schott, M., & Lautenbach, S. (2022). Comparative Integration Potential Analyses of OSM and Wikidata – The Case Study of Railway Stations. Proceedings of the Academic Track at State of the Map 2022, 19–22. DOI: 10.5281/zenodo.7004483
  17. Hartmann, M. C., Schott, M., Dsouza, A., Metz, Y., Volpi, M., & Purves, R. S. (2022). A Text and Image Analysis Workflow Using Citizen Science Data to Extract Relevant Social Media Records: Combining Red Kite Observations from Flickr, eBird and iNaturalist. Ecological Informatics, 71, 101782. DOI: 10.1016/j.ecoinf.2022.101782