Algorithmically-Guided User Interaction: Smart Crowdsourcing and the Extraction of Metadata from Old Maps
Dr. Thomas Christian van Dijk & Prof. Dr. Alexander Wolff
Julius-Maximilians-Universität Würzburg, Institut für Informatik, Lehrstuhl für Informatik I, Am Hubland, 97074 Würzburg
Software and data
- Lineman is a tool for aligning GeoJSON LineStrings to bitmap images such as old maps.
- Glyph Miner is a system for extracting glyphs from early typeset prints.
- User Reputation provides data and analyses of the NYPL Building Inspector building-footprint data set. Joint work with the ENAP project.
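To give a flavour of what a reputation analysis involves, here is a minimal sketch of reputation-weighted label aggregation. The user names, weights, and labels are purely illustrative and are not taken from the Building Inspector data; the weighting scheme shown (summing per-user reputation scores per label) is one simple option, not necessarily the method used in the project.

```python
from collections import defaultdict

def weighted_vote(votes: list[tuple[str, str]],
                  reputation: dict[str, float]) -> str:
    """Return the label with the highest total reputation weight.

    votes      -- (user, label) pairs collected from the crowd
    reputation -- per-user weight; unknown users default to 1.0
    """
    scores: dict[str, float] = defaultdict(float)
    for user, label in votes:
        scores[label] += reputation.get(user, 1.0)
    return max(scores, key=scores.get)

# Illustrative example: two high-reputation users outvote one low-reputation user.
rep = {"alice": 2.0, "bob": 0.5, "carol": 1.5}
ballots = [("alice", "footprint_ok"), ("bob", "footprint_wrong"),
           ("carol", "footprint_ok")]
print(weighted_vote(ballots, rep))  # "footprint_ok" wins with weight 3.5
```

A plain majority vote is the special case where every user has weight 1; estimating the weights themselves from past agreement is where the actual analysis lies.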
There are many practical problems that currently cannot be solved by algorithmic means, not because our algorithms are too slow but because we have no satisfactory algorithm at all. An example of such a practical problem is the extraction of meaningful metadata from historical maps. The problem has so far defied automation, which is deplorable because old maps contain a wealth of useful information, e.g., for historians, economists, and city planners. Libraries are producing massive amounts of digitised maps by scanning their historical collections, but struggle to “understand” the contents of these collections: computer-vision algorithms are not up to the task of extracting the desired metadata, and there are too many maps to do it by hand. We will address these problems by developing smart crowdsourcing solutions. Together with our collaborators (such as three libraries and the Open Historical Data Map project), we will tackle real problems on real data.
Most crowdsourcing applications rely on the crowd’s most obvious power: its multitude. If we throw enough users at a problem, we may solve it by brute force. But just as brute-force algorithms are often grossly inefficient, so is the indiscriminate application of human effort. In the proposed project we will develop algorithmically-guided, adaptive user interaction in which the algorithm and (sets of) users cooperate to solve a problem instance efficiently and reliably. This is achieved through proper algorithmic modelling of the underlying computational problems, followed by techniques such as active learning (efficient task selection) and sensitivity analysis (quality assurance).
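As a concrete illustration of active learning for task selection, consider uncertainty sampling: rather than showing users arbitrary tasks, the system asks about the task whose current answer estimate is least certain. The sketch below is a hypothetical, simplified version of this idea; the task identifiers and probability estimates are made up, and the project's actual selection criteria may differ.

```python
import math

def entropy(p: float) -> float:
    """Shannon entropy (in bits) of a yes/no outcome with probability p."""
    if p in (0.0, 1.0):
        return 0.0
    return -(p * math.log2(p) + (1 - p) * math.log2(1 - p))

def next_task(estimates: dict[str, float]) -> str:
    """Pick the task whose current estimate is most uncertain.

    estimates -- task id -> current probability that its answer is "yes",
                 e.g. derived from earlier user responses
    """
    return max(estimates, key=lambda task: entropy(estimates[task]))

# Illustrative estimates: map_042 is closest to a coin flip, so it is
# the most informative task to show to the next user.
estimates = {"map_017": 0.95, "map_042": 0.55, "map_108": 0.10}
print(next_task(estimates))  # "map_042"
```

The same per-task scores support a simple form of quality assurance: tasks whose entropy stays high after many answers can be flagged for expert review instead of consuming further crowd effort.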
Through this project, we will arrive at a more refined understanding of how the power of crowdsourcing can be harnessed – for generating general geographic data, as well as specifically for metadata extraction from historical maps. What are the correct notions of efficiency and what are the algorithmic techniques to optimise it?