Craig Knoblock - Profile on Academia.edu

Papers by Craig Knoblock

Research paper thumbnail of Building Linked Data from Historical Maps

Historical maps provide a rich source of data for social science researchers since they contain detailed documentation of a wide variety of factors, such as land-use changes, development of transportation networks, changes in waterways, destruction of wetlands, etc. However, these maps are typically available only as scanned documents and it is labor intensive for a scientist to extract the needed data for a study. In this paper, we address the problem of how to convert vector data extracted from multiple historical maps into Linked Data. We describe the methods for efficiently finding the links across maps, converting the data into RDF, and querying the resulting knowledge graphs. We present preliminary results that demonstrate that our approach can be used to efficiently determine changes in the Los Angeles railroad network from data extracted from multiple maps.
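
As a rough illustration of the idea in the abstract above (representing extracted map features as RDF-style triples, linking features across maps, and querying the result), here is a minimal self-contained sketch. The namespace, predicate names, and segment identifiers are hypothetical; the actual vocabulary and tooling used by the authors are not specified in the abstract, and a real system would use an RDF library such as rdflib with SPARQL.

```python
# Hypothetical namespace for illustration only.
EX = "http://example.org/maps/"

# Triples describing railroad segments extracted from two historical maps,
# including a link asserting that the two segments are the same feature.
triples = [
    (EX + "seg1", "rdf:type", EX + "RailroadSegment"),
    (EX + "seg1", EX + "extractedFromMap", "Los Angeles 1928"),
    (EX + "seg2", "rdf:type", EX + "RailroadSegment"),
    (EX + "seg2", EX + "extractedFromMap", "Los Angeles 1953"),
    (EX + "seg1", EX + "sameFeatureAs", EX + "seg2"),  # link across maps
]

def query(triples, pattern):
    """Return triples matching an (s, p, o) pattern; None is a wildcard."""
    s, p, o = pattern
    return [t for t in triples
            if (s is None or t[0] == s)
            and (p is None or t[1] == p)
            and (o is None or t[2] == o)]

# All railroad segments in the small knowledge graph:
segments = [s for s, _, _ in query(triples, (None, "rdf:type", EX + "RailroadSegment"))]
print(segments)
```

Asking which maps a linked feature appears on then reduces to following the `sameFeatureAs` and `extractedFromMap` predicates, which is the kind of cross-map query the abstract describes.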

Research paper thumbnail of Towards the automated large-scale reconstruction of past road networks from historical maps

Computers, Environment and Urban Systems, Jun 1, 2022

Transportation infrastructure, such as road or railroad networks, represents a fundamental component of our civilization. For sustainable planning and informed decision making, a thorough understanding of the long-term evolution of transportation infrastructure such as road networks is crucial. However, spatially explicit, multi-temporal road network data covering large spatial extents are scarce and rarely available prior to the 2000s. Herein, we propose a framework that employs increasingly available scanned and georeferenced historical map series to reconstruct past road networks, by integrating abundant, contemporary road network data and color information extracted from historical maps. Specifically, our method uses contemporary road segments as analytical units and extracts historical roads by inferring their existence in historical map series based on image processing and clustering techniques. We tested our method on over 300,000 road segments representing more than 50,000 km of the road network in the United States, extending across three study areas that cover 53 historical topographic map sheets dated between 1890 and 1950. We evaluated our approach by comparison to other historical datasets and against manually created reference data, achieving F-1 scores of up to 0.95, and showed that the extracted road network statistics are highly plausible over time, i.e., following general growth patterns. We demonstrated that contemporary geospatial data integrated with information extracted from historical map series open up new avenues for the quantitative analysis of long-term urbanization processes and landscape changes far beyond the era of operational remote sensing and digital cartography.
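
For reference, the F-1 scores reported in the evaluation above are the harmonic mean of precision and recall over the extracted road segments. The counts below are illustrative, not from the paper:

```python
def f1_score(tp, fp, fn):
    """F-1 from true positives, false positives, and false negatives."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

# e.g., 95 correctly extracted segments, 5 spurious, 5 missed:
print(round(f1_score(95, 5, 5), 2))  # 0.95
```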

Research paper thumbnail of Automatic alignment of geographic features in contemporary vector data and historical maps

With large amounts of digital map archives becoming available, the capability to automatically extract information from historical maps is important for many domains that require long-term geographic data, such as understanding the development of the landscape and human activities. In previous work, we built a system to automatically recognize geographic features in historical maps using Convolutional Neural Networks (CNN). Our system uses contemporary vector data to automatically label examples of the geographic feature of interest in historical maps as training samples for the CNN model. The alignment between the vector data and the geographic features in the maps determines whether the system can generate representative training samples, which has a significant impact on the recognition performance of the system. Because the CNN model needs a large amount of training data and an archive can contain tens of thousands of maps to be processed, manually aligning the vector data to each map in an archive is not practical. In this paper, we present an algorithm that automatically aligns vector data with geographic features in historical maps. Existing alignment approaches focus on road features and imagery and are difficult to generalize to other geographic features. Our algorithm aligns various types of geographic features in document images with the corresponding vector data. In our experiments, the alignment algorithm increased the correctness and completeness of the extracted railroad and river vector data by about 100% and 20%, respectively. For feature recognition, the aligned vector data yielded a 100% improvement in precision while maintaining a similar recall.
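
The core alignment problem described above can be sketched in miniature: find the translation of the rasterized vector data that best overlaps the feature pixels detected in the map image. This toy exhaustive search over pixel offsets is purely illustrative; the authors' algorithm handles many feature types and is far more sophisticated than a brute-force shift.

```python
# A short "road" from vector data, and the same road as it appears in a
# (toy) map image, displaced by an unknown offset of (3, 2).
vector_pixels = {(0, 0), (1, 0), (2, 0), (3, 0)}
map_pixels = {(x + 3, y + 2) for x, y in vector_pixels}

def best_offset(vector_pixels, map_pixels, search=5):
    """Exhaustively search translations and return the one maximizing
    overlap between shifted vector pixels and map feature pixels."""
    best, best_score = (0, 0), -1
    for dx in range(-search, search + 1):
        for dy in range(-search, search + 1):
            shifted = {(x + dx, y + dy) for x, y in vector_pixels}
            score = len(shifted & map_pixels)
            if score > best_score:
                best, best_score = (dx, dy), score
    return best

print(best_offset(vector_pixels, map_pixels))  # (3, 2)
```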

Research paper thumbnail of Automated Extraction of Human Settlement Patterns From Historical Topographic Map Series Using Weakly Supervised Convolutional Neural Networks

IEEE Access, 2020

Information extraction from historical maps represents a persistent challenge due to inferior graphical quality and the large data volume of digital map archives, which can hold thousands of digitized map sheets. Traditional map processing techniques typically rely on manually collected templates of the symbol of interest, and thus are not suitable for large-scale information extraction. In order to digitally preserve such large amounts of valuable retrospective geographic information, high levels of automation are required. Herein, we propose an automated machine-learning-based framework to extract human settlement symbols, such as buildings and urban areas, from historical topographic maps in the absence of training data, employing contemporary geospatial data as ancillary data to guide the collection of training samples. These samples are then used to train a convolutional neural network for semantic image segmentation, allowing for the extraction of human settlement patterns in an analysis-ready geospatial vector data format. We test our method on United States Geological Survey historical topographic maps published between 1893 and 1954. The results are promising, indicating high degrees of completeness in the extracted settlement features (i.e., recall of up to 0.96, F-measure of up to 0.79) and will guide the next steps to provide a fully automated operational approach for large-scale geographic feature extraction from a variety of historical map series. Moreover, the proposed framework provides a robust approach for the recognition of objects which are small in size, generalizable to many kinds of visual documents.
INDEX TERMS: Convolutional neural networks, digital humanities, digital preservation, document analysis, geospatial analysis, geospatial artificial intelligence, human settlement patterns, image analysis, weakly supervised learning.

Research paper thumbnail of A Label Correction Algorithm Using Prior Information for Automatic and Accurate Geospatial Object Recognition

arXiv (Cornell University), Dec 10, 2021

Thousands of scanned historical topographic maps contain valuable information covering long periods of time, such as how the hydrography of a region has changed over time. Efficiently unlocking the information in these maps requires training a geospatial object recognition system, which needs a large amount of annotated data. Overlaying geo-referenced external vector data on topographic maps according to their coordinates can annotate the desired objects' locations in the maps automatically. However, directly overlaying the two datasets causes misaligned and false annotations because the publication years and coordinate projection systems of topographic maps differ from those of the external vector data. We propose a label correction algorithm, which leverages the color information of maps and the prior shape information of the external vector data to reduce misaligned and false annotations. The experiments show that the precision of annotations from the proposed algorithm is 10% higher than that of annotations from a state-of-the-art algorithm. Consequently, recognition results using the proposed algorithm's annotations achieve 9% higher correctness than using the annotations from the state-of-the-art algorithm.
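
A toy sketch of the color cue mentioned above: keep an annotation only if enough of the map pixels it covers match the expected feature color (e.g., blue for hydrography). The color values, tolerance, and majority threshold here are illustrative assumptions, not the paper's parameters, and the published algorithm additionally exploits prior shape information.

```python
BLUE = (40, 90, 200)      # assumed "water" ink color
WHITE = (255, 255, 255)   # blank paper

def looks_like_water(pixel, target=BLUE, tol=60):
    """True if an RGB pixel is within a per-channel tolerance of the target."""
    return all(abs(c - t) <= tol for c, t in zip(pixel, target))

def filter_annotations(annotations, image):
    """annotations: list of pixel-coordinate lists; image: dict (x, y) -> RGB.
    Keep annotations where a majority of covered pixels look like water."""
    kept = []
    for ann in annotations:
        hits = sum(looks_like_water(image.get(p, WHITE)) for p in ann)
        if hits / len(ann) >= 0.5:
            kept.append(ann)
    return kept

# A river annotation over blue pixels survives; a misaligned one over
# blank paper is discarded.
image = {(x, 0): BLUE for x in range(4)}
good = [(0, 0), (1, 0), (2, 0), (3, 0)]
bad = [(0, 5), (1, 5), (2, 5), (3, 5)]
print(len(filter_annotations([good, bad], image)))  # 1
```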

Research paper thumbnail of Training Deep Learning Models for Geographic Feature Recognition from Historical Maps

Training Deep Learning Models for Geographic Feature Recognition from Historical Maps

Springer briefs in geography, Nov 18, 2019

Historical map scans contain valuable information (e.g., historical locations of roads, buildings) enabling analyses that require long-term historical data of the natural and built environment. Many online archives now provide public access to a large number of historical map scans, such as the historical USGS (United States Geological Survey) topographic archive and the historical Ordnance Survey maps in the United Kingdom. Efficiently extracting information from these map scans remains a challenging task, which is typically achieved by manually digitizing the map content. In computer vision, the process of detecting and extracting the precise locations of objects from images is called semantic segmentation. Semantic segmentation processes take an image as input and classify each pixel of the image to an object class of interest. Machine learning models for semantic segmentation have been progressing rapidly with the emergence of Deep Convolutional Neural Networks (DCNNs or CNNs). A key factor in the success of CNNs is the wide availability of large amounts of (labeled) training data, but these training data are mostly for everyday images, not for historical (or any) maps. Today, generating training data requires a significant amount of manual labor that is often impractical for the application of historical map processing. One solution to the problem of training data scarcity is to transfer knowledge learned from a domain with a sufficient amount of labeled data to another domain lacking labeled data (i.e., transfer learning). This chapter presents an overview of deep-learning semantic segmentation models and discusses their strengths and weaknesses concerning geographic feature recognition from historical map scans. The chapter also examines a number of transfer learning strategies that can reuse state-of-the-art CNN models trained on publicly available training datasets for the task of recognizing geographic features from historical maps. Finally, this chapter presents a comprehensive experiment for extracting railroad features from USGS historical topographic maps as a case study.

Research paper thumbnail of An Automatic Approach for Generating Rich, Linked Geo-Metadata from Historical Map Images

arXiv (Cornell University), Dec 2, 2021

Historical maps contain detailed geographic information difficult to find elsewhere covering long periods of time (e.g., 125 years for the historical topographic maps in the US). However, these maps typically exist as scanned images without searchable metadata. Existing approaches for making historical maps searchable rely on tedious manual work (including crowd-sourcing) to generate the metadata (e.g., geolocations and keywords). Optical character recognition (OCR) software could alleviate the required manual work, but the recognition results are individual words instead of location phrases (e.g., "Black" and "Mountain" vs. "Black Mountain"). This paper presents an end-to-end approach to address the real-world problem of finding and indexing historical map images. This approach automatically processes historical map images to extract their text content and generates a set of metadata that is linked to large external geospatial knowledge bases. The linked metadata in the RDF (Resource Description Framework) format supports complex queries for finding and indexing historical maps, such as retrieving all historical maps covering mountain peaks higher than 1,000 meters in California. We have implemented the approach in a system called mapKurator. We have evaluated mapKurator using historical maps from several sources with various map styles, scales, and coverage. Our results show significant improvement over the state-of-the-art methods. The code has been made publicly available as modules of the Kartta Labs project at .
• Applied computing → Document analysis; Graphics recognition and interpretation; • Information systems → Digital libraries and archives.
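
The word-versus-phrase problem mentioned above (OCR returns "Black" and "Mountain" rather than "Black Mountain") can be sketched as grouping OCR word boxes by horizontal proximity on a text line. The box format, gap threshold, and single-line assumption below are simplifications for illustration; mapKurator's actual pipeline must cope with rotated and curved map text amid cartographic clutter.

```python
def group_words(words, max_gap=15):
    """words: list of (text, x_left, x_right, y_baseline) tuples.
    Merge neighboring words on the same line into location phrases."""
    words = sorted(words, key=lambda w: w[1])
    phrases, current = [], [words[0]]
    for w in words[1:]:
        prev = current[-1]
        same_line = abs(w[3] - prev[3]) < 5
        close = w[1] - prev[2] <= max_gap
        if same_line and close:
            current.append(w)
        else:
            phrases.append(" ".join(t for t, *_ in current))
            current = [w]
    phrases.append(" ".join(t for t, *_ in current))
    return phrases

words = [("Black", 100, 140, 50), ("Mountain", 150, 220, 50), ("Creek", 400, 440, 50)]
print(group_words(words))  # ['Black Mountain', 'Creek']
```

The resulting phrases, rather than isolated words, are what can be matched against entries in an external geospatial knowledge base.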

Research paper thumbnail of Summary and Discussion

Summary and Discussion

Springer briefs in geography, Nov 18, 2019

Research paper thumbnail of Creating Structured, Linked Geographic Data from Historical Maps: Challenges and Trends

Creating Structured, Linked Geographic Data from Historical Maps: Challenges and Trends

Springer briefs in geography, Nov 18, 2019

Historical geographic data are essential for a variety of studies of cancer and environmental epidemiology, urbanization, and landscape ecology. However, existing data sources typically contain only contemporary information. Historical maps hold a great deal of detailed geographic information at various times in the past. Yet, finding relevant maps is difficult, and the map content is not machine-readable. This chapter presents the challenges and trends in building a map processing, modeling, linking, and publishing framework. The framework will enable querying historical map collections as a unified and structured spatiotemporal source in which individual geographic phenomena (extracted from maps) are modeled (described) with semantic descriptions and linked to other data sources (e.g., DBpedia). The framework will make it possible to use historical geographic datasets from a variety of maps efficiently over large geographic extents. Realizing such a framework poses significant research challenges in multiple fields of computer science, including digital map processing, data integration, and Semantic Web technologies, as well as in other disciplines such as the spatial, social, and health sciences. Tackling these challenges will not only advance research in computer science and geographic information science but also present a unique opportunity for interdisciplinary research.

Research paper thumbnail of Extracting Human Settlement Footprint from Historical Topographic Map Series Using Context-Based Machine Learning

Information extraction from historical maps represents a persistent challenge due to inferior graphical quality and the large data volume of digital map archives, which can hold thousands of digitized map sheets. In this paper, we describe an approach to extract human settlement symbols in United States Geological Survey (USGS) historical topographic maps using contemporary building data as the contextual spatial layer. The presence of a building in the contemporary layer indicates a high probability that the same building can be found at that location on the historical map. We describe the design of an automatic sampling approach using these contemporary data to collect thousands of graphical examples of the symbol of interest. These graphical examples are then used to robustly train a model that carries out feature extraction across the entire map. We employ a Convolutional Neural Network (LeNet) for the recognition task. Results are promising and will guide the next steps in this research to provide an unsupervised approach to extracting features from historical maps.

Research paper thumbnail of Using Historical Maps in Scientific Studies: Applications, Challenges, and Best Practices

Using Historical Maps in Scientific Studies: Applications, Challenges, and Best Practices

Historical maps are fascinating to look at and contain valuable retrospective place information difficult to find elsewhere. However, the full potential of historical maps has not been realized because the users of scanned historical maps and the developers of digital map processing technologies are from a wide range of disciplines and often work in silos. This book aims to make the first connection between the map user community and the developers of digital map processing technologies by illustrating several applications, challenges, and best practices in working with historical maps. This chapter presents a brief introduction to various types of historical maps and the scientific studies that could benefit from using them. Further, the chapter summarizes the general considerations critical for building successful computational processes that can be used to analyze historical map content. Finally, the chapter provides an overview of the book structure, describing the connections between individual chapters.

Research paper thumbnail of Towards the large-scale extraction of historical land cover information from historical maps

Research paper thumbnail of Unmapped terrain and invisible communities: Analyzing topographic mapping disparities across settlements in the United States from 1885 to 2015

Zenodo (CERN European Organization for Nuclear Research), Jul 1, 2022

Mapping is an important and deeply political process. While much attention is now being devoted to the definition of boundaries (e.g., redlining, gerrymandering, redistricting), less is systematically known about where, when, and at what scales maps are first created. This is an area of key concern because the creation of maps is key to generating spatial, topographic, demographic, or socio-economic data, resources that are of great strategic and economic importance. The absence of such information can, among other processes, impede strategic planning, political transparency, and sustainable development. There is thus much to learn about where and when maps are created, and which communities are either prioritized or "undermapped" within this decision-making process.

Research paper thumbnail of A Label Correction Algorithm Using Prior Information for Automatic and Accurate Geospatial Object Recognition

2021 IEEE International Conference on Big Data (Big Data)

Thousands of scanned historical topographic maps contain valuable information covering long periods of time, such as how the hydrography of a region has changed over time. Efficiently unlocking the information in these maps requires training a geospatial object recognition system, which needs a large amount of annotated data. Overlaying geo-referenced external vector data on topographic maps according to their coordinates can annotate the desired objects' locations in the maps automatically. However, directly overlaying the two datasets causes misaligned and false annotations because the publication years and coordinate projection systems of topographic maps differ from those of the external vector data. We propose a label correction algorithm, which leverages the color information of maps and the prior shape information of the external vector data to reduce misaligned and false annotations. The experiments show that the precision of annotations from the proposed algorithm is 10% higher than that of annotations from a state-of-the-art algorithm. Consequently, recognition results using the proposed algorithm's annotations achieve 9% higher correctness than using the annotations from the state-of-the-art algorithm.

Research paper thumbnail of Coalition Agents Experiment: Multiagent Cooperation in International Coalitions

Coalition Agents Experiment: Multiagent Cooperation in International Coalitions

Research paper thumbnail of An Automatic Approach for Generating Rich, Linked Geo-Metadata from Historical Map Images

Proceedings of the 26th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, 2020

Historical maps contain detailed geographic information difficult to find elsewhere covering long periods of time (e.g., 125 years for the historical topographic maps in the US). However, these maps typically exist as scanned images without searchable metadata. Existing approaches for making historical maps searchable rely on tedious manual work (including crowd-sourcing) to generate the metadata (e.g., geolocations and keywords). Optical character recognition (OCR) software could alleviate the required manual work, but the recognition results are individual words instead of location phrases (e.g., "Black" and "Mountain" vs. "Black Mountain"). This paper presents an end-to-end approach to address the real-world problem of finding and indexing historical map images. This approach automatically processes historical map images to extract their text content and generates a set of metadata that is linked to large external geospatial knowledge bases. The linked metadata in the RDF (Resource Description Framework) format supports complex queries for finding and indexing historical maps, such as retrieving all historical maps covering mountain peaks higher than 1,000 meters in California. We have implemented the approach in a system called mapKurator. We have evaluated mapKurator using historical maps from several sources with various map styles, scales, and coverage. Our results show significant improvement over the state-of-the-art methods. The code has been made publicly available as modules of the Kartta Labs project at .
• Applied computing → Document analysis; Graphics recognition and interpretation; • Information systems → Digital libraries and archives.

Research paper thumbnail of Artificial Intelligence for Modeling Complex Systems: Taming the Complexity of Expert Models to Improve Decision Making

ACM Transactions on Interactive Intelligent Systems, 2021

Major societal and environmental challenges involve complex systems that have diverse multi-scale interacting processes. Consider, for example, how droughts and water reserves affect crop production and how agriculture and industrial needs affect water quality and availability. Preventive measures, such as delaying planting dates and adopting new agricultural practices in response to changing weather patterns, can reduce the damage caused by natural processes. Understanding how these natural and human processes affect one another allows forecasting the effects of undesirable situations and studying interventions to take preventive measures. For many of these processes, there are expert models that incorporate state-of-the-art theories and knowledge to quantify a system's response to a diversity of conditions. A major challenge for efficient modeling is the diversity of modeling approaches across disciplines and the wide variety of data sources available only in formats that require...

Research paper thumbnail of Historical Map Applications and Processing Technologies

Historical Map Applications and Processing Technologies

SpringerBriefs in Geography, 2019

Digital map processing has been an area of interest in the computer science and geographic information science communities since the early 1980s. With the increasing availability of map scans, researchers across the natural and social sciences have developed a growing interest in using historical maps in their studies. The lack of an understanding of how historical maps can be used in research, and of the capabilities of map processing technologies, creates a significant gap between the wide range of communities that could benefit from advances in digital map processing technologies and the disciplines in which the technologies are developed. As a result, researchers who intend to use historical maps in their studies still need a significant amount of resources to digitize their maps, while the existing digital map processing technologies are difficult to apply and understand and thus appear unpromising. In many cases, existing digital map processing technologies could help facilitate the digitization process, requiring only additional knowledge to select an appropriate technology given the problem scope (e.g., the number of maps for processing, map conditions, and style varieties). The result is that researchers waste time and resources building and testing various systems that partially duplicate prior work and cannot fully realize the potential of existing technology. This chapter presents real-world applications of historical maps and case studies of both semi-automatic and fully automatic approaches for geographic feature extraction from historical maps. These real-world applications illustrate and exemplify various needs and scopes of using historical maps in scientific studies (e.g., processing thousands of historical maps from a map series vs. a few historical maps from various publishers and with different cartographic styles).
The two example map processing technologies described here help illustrate current strengths and weaknesses. These examples also illustrate tremendous collaboration opportunities between and beyond the computer science and geographic information science communities to build advanced map processing technologies that are more effective in transforming the scientific studies that use historical maps.

Research paper thumbnail of Automatic alignment of contemporary vector data and georeferenced historical maps using reinforcement learning

International Journal of Geographical Information Science, 2019

With large amounts of digital map archives becoming available, automatically extracting information from scanned historical maps is needed for many domains that require long-term historical geographic data. Convolutional Neural Networks (CNN) are powerful techniques that can be used for extracting locations of geographic features from scanned maps if sufficient representative training data are available. Existing spatial data can provide the approximate locations of corresponding geographic features in historical maps and thus be useful to annotate training data automatically. However, the feature representations, publication date, production scales, and spatial reference systems of contemporary vector data are typically very different from those of historical maps. Hence, such auxiliary data cannot be directly used for annotation of the precise locations of the features of interest in the scanned historical maps. This research introduces an automatic vector-to-raster alignment algorithm based on reinforcement learning to annotate precise locations of geographic features on scanned maps. This paper models the alignment problem using the reinforcement learning framework, which enables informed, efficient searches for matching features without pre-processing steps, such as extracting specific feature signatures (e.g. road intersections). The experimental results show that our algorithm can be applied to various features (roads, water lines, and railroads) and achieve high accuracy.
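
The sequential-decision framing described above can be sketched in miniature: an agent shifts the vector overlay one pixel at a time, and the change in overlap with the map's feature pixels serves as the reward signal. For illustration, a greedy one-step lookahead stands in for the trained reinforcement-learning policy of the published method; the state and action spaces here are assumptions of this sketch, not the paper's formulation.

```python
ACTIONS = {"left": (-1, 0), "right": (1, 0), "up": (0, -1), "down": (0, 1)}

def overlap(offset, vector_pixels, map_pixels):
    """Number of shifted vector pixels coinciding with map feature pixels."""
    dx, dy = offset
    return len({(x + dx, y + dy) for x, y in vector_pixels} & map_pixels)

def align(vector_pixels, map_pixels, steps=20):
    """Greedily take the action that most increases overlap; stop when
    no single-pixel move improves it."""
    offset = (0, 0)
    for _ in range(steps):
        scored = {a: overlap((offset[0] + dx, offset[1] + dy),
                             vector_pixels, map_pixels)
                  for a, (dx, dy) in ACTIONS.items()}
        best = max(scored, key=scored.get)
        if scored[best] <= overlap(offset, vector_pixels, map_pixels):
            break
        dx, dy = ACTIONS[best]
        offset = (offset[0] + dx, offset[1] + dy)
    return offset

# A road from vector data, and the same road on the map displaced by (2, 1):
vector = {(0, 0), (1, 0), (2, 0)}
map_px = {(2, 1), (3, 1), (4, 1)}
print(align(vector, map_px))  # (2, 1)
```

A learned policy improves on this greedy baseline precisely where single-step overlap gives no gradient, e.g., when the correct alignment is far from the start or intermediate moves do not increase overlap.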

Research paper thumbnail of Spatialising uncertainty in image segmentation using weakly supervised convolutional neural networks: a case study from historical map processing

IET Image Processing, 2018

Convolutional neural networks (CNNs) such as encoder-decoder CNNs have increasingly been employed for semantic image segmentation at the pixel level, requiring pixel-level training labels, which are rarely available in real-world scenarios. In practice, weakly annotated training data at the image patch level are often used for pixel-level segmentation tasks, requiring further processing to obtain accurate results, mainly because the translation invariance of the CNN-based inference can turn into an impeding property leading to segmentation results of coarser spatial granularity compared with the original image. However, the inherent uncertainty in the segmented image and its relationships to translation invariance, CNN architecture, and classification scheme has never been analysed from an explicitly spatial perspective. Therefore, the authors propose measures to spatially visualise and assess class decision confidence based on spatially dense CNN predictions, resulting in continuous decision confidence surfaces. They find that such a visual-analytical method contributes to a better understanding of the spatial variability of class score confidence derived from weakly supervised CNN-based classifiers. They exemplify this approach by incorporating decision confidence surfaces into a processing chain for the extraction of human settlement features from historical map documents based on weakly annotated training data using different CNN architectures and classification schemes.
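
One simple way to turn dense class scores into a decision confidence surface, in the spirit of the abstract above, is to take the margin between the top two softmax probabilities at each pixel: low-margin pixels flag uncertain regions such as blurred symbol boundaries. The margin measure and the tiny score grid below are assumptions of this sketch, not necessarily the specific confidence measures proposed in the paper.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of class scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def confidence_surface(score_map):
    """score_map: 2D grid of per-class score lists.
    Returns a 2D grid of top-two softmax probability margins in [0, 1]."""
    surface = []
    for row in score_map:
        out_row = []
        for scores in row:
            probs = sorted(softmax(scores), reverse=True)
            out_row.append(probs[0] - probs[1])
        surface.append(out_row)
    return surface

# A confident pixel (large score gap) next to an ambiguous one (small gap):
scores = [[[4.0, 0.5], [1.0, 0.9]]]
surf = confidence_surface(scores)
print(surf)
```

Visualized over a whole map sheet, such a margin grid is a continuous surface whose low values highlight where the weakly supervised classifier is least certain.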

Research paper thumbnail of Building Linked Data from Historical Maps

Historical maps provide a rich source of data for social science researchers since they contain d... more Historical maps provide a rich source of data for social science researchers since they contain detailed documentation of a wide variety of factors, such as land-use changes, development of transportation networks, changes in waterways, destruction of wetlands, etc. However, these maps are typically available only as scanned documents and it is labor intensive for a scientist to extract the needed data for a study. In this paper, we address the problem of how to convert vector data extracted from multiple historical maps into Linked Data. We describe the methods for efficiently finding the links across maps, converting the data into RDF, and querying the resulting knowledge graphs. We present preliminary results that demonstrate that our approach can be used to efficiently determine changes in the Los Angeles railroad network from data extracted from multiple maps.

Research paper thumbnail of Automatic alignment of geographic features in contemporary vector data and historical maps

With large amounts of digital map archives becoming available, the capability to automatically extract information from historical maps is important for many domains that require long-term geographic data, such as understanding the development of the landscape and human activities. In previous work, we built a system to automatically recognize geographic features in historical maps using Convolutional Neural Networks (CNN). Our system uses contemporary vector data to automatically label examples of the geographic feature of interest in historical maps as training samples for the CNN model. The alignment between the vector data and the geographic features in the maps determines whether the system can generate representative training samples, which has a significant impact on the recognition performance of the system. Given the large amount of training data that the CNN model needs and the tens of thousands of maps to be processed in an archive, manually aligning the vector data to each map is not practical. In this paper, we present an algorithm that automatically aligns vector data with geographic features in historical maps. Existing alignment approaches focus on road features and imagery and are difficult to generalize to other geographic features. Our algorithm aligns various types of geographic features in document images with the corresponding vector data. In our experiments, the alignment algorithm increased the correctness and completeness of the extracted railroad and river vector data by about 100% and 20%, respectively. For feature recognition, the aligned vector data yielded a 100% improvement in precision while maintaining a similar recall.

Research paper thumbnail of Automated Extraction of Human Settlement Patterns From Historical Topographic Map Series Using Weakly Supervised Convolutional Neural Networks

IEEE Access, 2020

Information extraction from historical maps represents a persistent challenge due to inferior graphical quality and the large data volume of digital map archives, which can hold thousands of digitized map sheets. Traditional map processing techniques typically rely on manually collected templates of the symbol of interest and thus are not suitable for large-scale information extraction. In order to digitally preserve such large amounts of valuable retrospective geographic information, high levels of automation are required. Herein, we propose an automated machine-learning-based framework to extract human settlement symbols, such as buildings and urban areas, from historical topographic maps in the absence of training data, employing contemporary geospatial data as ancillary data to guide the collection of training samples. These samples are then used to train a convolutional neural network for semantic image segmentation, allowing for the extraction of human settlement patterns in an analysis-ready geospatial vector data format. We test our method on United States Geological Survey historical topographic maps published between 1893 and 1954. The results are promising, indicating high degrees of completeness in the extracted settlement features (i.e., recall of up to 0.96, F-measure of up to 0.79), and will guide the next steps to provide a fully automated operational approach for large-scale geographic feature extraction from a variety of historical map series. Moreover, the proposed framework provides a robust approach for the recognition of objects which are small in size, generalizable to many kinds of visual documents.

Index Terms: Convolutional neural networks, digital humanities, digital preservation, document analysis, geospatial analysis, geospatial artificial intelligence, human settlement patterns, image analysis, weakly supervised learning.

Research paper thumbnail of A Label Correction Algorithm Using Prior Information for Automatic and Accurate Geospatial Object Recognition

arXiv (Cornell University), Dec 10, 2021

Thousands of scanned historical topographic maps contain valuable information covering long periods of time, such as how the hydrography of a region has changed over time. Efficiently unlocking the information in these maps requires training a geospatial object recognition system, which needs a large amount of annotated data. Overlaying geo-referenced external vector data on topographic maps according to their coordinates can annotate the desired objects' locations in the maps automatically. However, directly overlapping the two datasets causes misaligned and false annotations because the publication years and coordinate projection systems of the topographic maps differ from those of the external vector data. We propose a label correction algorithm that leverages the color information of the maps and the prior shape information of the external vector data to reduce misaligned and false annotations. Experiments show that the precision of annotations from the proposed algorithm is 10% higher than that of annotations from a state-of-the-art algorithm. Consequently, recognition results using the proposed algorithm's annotations achieve 9% higher correctness than those using the annotations from the state-of-the-art algorithm.
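The color-based correction idea in this abstract can be illustrated with a minimal sketch: given candidate annotation pixels obtained by overlaying the vector data, keep only those whose map color is close to the expected printed color of the feature. The function name, RGB-distance test, and tolerance below are illustrative assumptions, not the paper's algorithm.

```python
import numpy as np

def correct_labels(map_rgb, candidate_mask, feature_color, tol=40):
    """Keep only candidate annotation pixels whose map color is close to
    the expected feature color (sketch of color-based label correction;
    `feature_color` and `tol` are illustrative assumptions)."""
    # Per-pixel L1 distance between the map color and the expected color.
    diff = np.abs(map_rgb.astype(int) - np.array(feature_color)).sum(axis=-1)
    return candidate_mask & (diff <= tol)
```

Here `feature_color` would be the map's printed color for the target feature (e.g., blue for hydrography), and `tol` controls how strict the color match is; pixels the vector overlay marks that do not look like the feature are dropped as likely misalignments.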

Research paper thumbnail of Training Deep Learning Models for Geographic Feature Recognition from Historical Maps

Springer briefs in geography, Nov 18, 2019

Historical map scans contain valuable information (e.g., historical locations of roads and buildings), enabling analyses that require long-term historical data of the natural and built environment. Many online archives now provide public access to a large number of historical map scans, such as the historical USGS (United States Geological Survey) topographic archive and the historical Ordnance Survey maps in the United Kingdom. Efficiently extracting information from these map scans remains a challenging task, which is typically achieved by manually digitizing the map content. In computer vision, the process of detecting and extracting the precise locations of objects from images is called semantic segmentation. Semantic segmentation processes take an image as input and classify each pixel of the image into an object class of interest. Machine learning models for semantic segmentation have been progressing rapidly with the emergence of Deep Convolutional Neural Networks (DCNNs or CNNs). A key factor in the success of CNNs is the wide availability of large amounts of (labeled) training data, but these training data are mostly for everyday images, not for historical (or any) maps. Today, generating training data requires a significant amount of manual labor that is often impractical for historical map processing. One solution to the problem of training data scarcity is transferring knowledge learned from a domain with a sufficient amount of labeled data to another domain lacking labeled data (i.e., transfer learning). This chapter presents an overview of deep-learning semantic segmentation models and discusses their strengths and weaknesses concerning geographic feature recognition from historical map scans. The chapter also examines a number of transfer learning strategies that can reuse state-of-the-art CNN models trained on publicly available datasets for the task of recognizing geographic features from historical maps. Finally, the chapter presents a comprehensive experiment for extracting railroad features from USGS historical topographic maps as a case study.

Research paper thumbnail of An Automatic Approach for Generating Rich, Linked Geo-Metadata from Historical Map Images

arXiv (Cornell University), Dec 2, 2021

Historical maps contain detailed geographic information that is difficult to find elsewhere and that covers long periods of time (e.g., 125 years for the historical topographic maps in the US). However, these maps typically exist as scanned images without searchable metadata. Existing approaches to making historical maps searchable rely on tedious manual work (including crowd-sourcing) to generate the metadata (e.g., geolocations and keywords). Optical character recognition (OCR) software could alleviate the required manual work, but the recognition results are individual words instead of location phrases (e.g., "Black" and "Mountain" vs. "Black Mountain"). This paper presents an end-to-end approach to the real-world problem of finding and indexing historical map images. The approach automatically processes historical map images to extract their text content and generates a set of metadata that is linked to large external geospatial knowledge bases. The linked metadata in the RDF (Resource Description Framework) format support complex queries for finding and indexing historical maps, such as retrieving all historical maps covering mountain peaks higher than 1,000 meters in California. We have implemented the approach in a system called mapKurator and evaluated it using historical maps from several sources with various map styles, scales, and coverage. Our results show significant improvement over the state-of-the-art methods. The code has been made publicly available as modules of the Kartta Labs project.

CCS Concepts: • Applied computing → Document analysis; Graphics recognition and interpretation; • Information systems → Digital libraries and archives.
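The gap this abstract points at, between word-level OCR output and location phrases, can be illustrated with a toy sketch that merges horizontally adjacent word boxes on the same baseline into phrases. The box format, thresholds, and function name are hypothetical; mapKurator's actual text linking is more sophisticated.

```python
def group_words_into_phrases(words, max_gap=10):
    """Merge horizontally adjacent OCR word boxes on the same baseline
    into multi-word phrases. Each word is (text, x0, y0, x1, y1).
    Minimal sketch; the thresholds are illustrative assumptions."""
    if not words:
        return []
    words = sorted(words, key=lambda w: (w[2], w[1]))  # sort by row, then column
    phrases, current = [], [words[0]]
    for w in words[1:]:
        prev = current[-1]
        same_line = abs(w[2] - prev[2]) < 5        # nearly equal top edge
        close = 0 <= w[1] - prev[3] <= max_gap     # small horizontal gap
        if same_line and close:
            current.append(w)
        else:
            phrases.append(" ".join(t for t, *_ in current))
            current = [w]
    phrases.append(" ".join(t for t, *_ in current))
    return phrases
```

Grouping "Black" and "Mountain" into "Black Mountain" is what makes the extracted text usable as a key for linking against a geospatial knowledge base.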

Research paper thumbnail of Summary and Discussion

Springer briefs in geography, Nov 18, 2019

Research paper thumbnail of Creating Structured, Linked Geographic Data from Historical Maps: Challenges and Trends

Springer briefs in geography, Nov 18, 2019

Historical geographic data are essential for a variety of studies in cancer and environmental epidemiology, urbanization, and landscape ecology. However, existing data sources typically contain only contemporary information. Historical maps hold a great deal of detailed geographic information at various times in the past. Yet finding relevant maps is difficult, and the map content is not machine-readable. This chapter presents the challenges and trends in building a map processing, modeling, linking, and publishing framework. The framework will enable querying historical map collections as a unified, structured spatiotemporal source in which individual geographic phenomena (extracted from maps) are modeled with semantic descriptions and linked to other data sources (e.g., DBpedia). This framework will allow historical geographic datasets from a variety of maps to be used efficiently over large geographic extents. Realizing such a framework poses significant research challenges in multiple fields of computer science, including digital map processing, data integration, and Semantic Web technologies, as well as in other disciplines such as the spatial, social, and health sciences. Tackling these challenges will not only advance research in computer science and geographic information science but also present a unique opportunity for interdisciplinary research.

Research paper thumbnail of Extracting Human Settlement Footprint from Historical Topographic Map Series Using Context-Based Machine Learning

Information extraction from historical maps represents a persistent challenge due to inferior graphical quality and the large data volume of digital map archives, which can hold thousands of digitized map sheets. In this paper, we describe an approach to extracting human settlement symbols in United States Geological Survey (USGS) historical topographic maps using contemporary building data as the contextual spatial layer. The presence of a building in the contemporary layer indicates a high probability that the same building can be found at that location on the historical map. We describe the design of an automatic sampling approach that uses these contemporary data to collect thousands of graphical examples of the symbol of interest. These graphical examples are then used to robustly train a model that carries out feature extraction across the entire map. We employ a Convolutional Neural Network (LeNet) for the recognition task. Results are promising and will guide the next steps in this research toward an unsupervised approach to extracting features from historical maps.
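The sampling idea, cutting training examples from the map scan at locations where the contemporary layer reports a building, can be sketched as follows. The patch size, bounds check, and function name are assumptions for illustration; the paper's pipeline additionally handles georeferencing and symbol variability.

```python
import numpy as np

def sample_patches(map_img, building_xy, size=8):
    """Extract square patches centered on contemporary building
    coordinates as weakly labeled positive samples (sketch only).
    Coordinates too close to the image border are skipped."""
    h = size // 2
    patches = []
    for x, y in building_xy:
        # Keep only centers whose full patch lies inside the image.
        if h <= y <= map_img.shape[0] - h and h <= x <= map_img.shape[1] - h:
            patches.append(map_img[y - h:y + h, x - h:x + h])
    return patches
```

The returned patches would serve as positive examples for training a small CNN such as LeNet, with negatives sampled away from any building location.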

Research paper thumbnail of Using Historical Maps in Scientific Studies: Applications, Challenges, and Best Practices

Historical maps are fascinating to look at and contain valuable retrospective place information difficult to find elsewhere. However, the full potential of historical maps has not been realized because the users of scanned historical maps and the developers of digital map processing technologies come from a wide range of disciplines and often work in silos. This book aims to make a first connection between the map user community and the developers of digital map processing technologies by illustrating several applications, challenges, and best practices in working with historical maps. This chapter presents a brief introduction to various types of historical maps and the scientific studies that could benefit from using them. Further, the chapter summarizes the general considerations critical for building successful computational processes that can be used to analyze historical map content. Finally, the chapter provides an overview of the book's structure, describing the connections between individual chapters.

Research paper thumbnail of Towards the large-scale extraction of historical land cover information from historical maps

Research paper thumbnail of Unmapped terrain and invisible communities: Analyzing topographic mapping disparities across settlements in the United States from 1885 to 2015

Zenodo (CERN European Organization for Nuclear Research), Jul 1, 2022

Mapping is an important and deeply political process. While much attention is now being devoted to the definition of boundaries (e.g., redlining, gerrymandering, redistricting), less is systematically known about where, when, and at what scales maps are first created. This is an area of key concern because the creation of maps is key to generating spatial, topographic, demographic, or socio-economic data, resources which are of great strategic and economic importance. The absence of such information can, among other processes, impede strategic planning, political transparency, and sustainable development. There is thus much to learn about where and when maps are created, and which communities are either prioritized or "undermapped" within this decision-making process.

Research paper thumbnail of A Label Correction Algorithm Using Prior Information for Automatic and Accurate Geospatial Object Recognition

2021 IEEE International Conference on Big Data (Big Data)

Thousands of scanned historical topographic maps contain valuable information covering long periods of time, such as how the hydrography of a region has changed over time. Efficiently unlocking the information in these maps requires training a geospatial object recognition system, which needs a large amount of annotated data. Overlaying geo-referenced external vector data on topographic maps according to their coordinates can annotate the desired objects' locations in the maps automatically. However, directly overlapping the two datasets causes misaligned and false annotations because the publication years and coordinate projection systems of the topographic maps differ from those of the external vector data. We propose a label correction algorithm that leverages the color information of the maps and the prior shape information of the external vector data to reduce misaligned and false annotations. Experiments show that the precision of annotations from the proposed algorithm is 10% higher than that of annotations from a state-of-the-art algorithm. Consequently, recognition results using the proposed algorithm's annotations achieve 9% higher correctness than those using the annotations from the state-of-the-art algorithm.

Research paper thumbnail of Coalition Agents Experiment: Multiagent in International Coalitions

Research paper thumbnail of An Automatic Approach for Generating Rich, Linked Geo-Metadata from Historical Map Images

Proceedings of the 26th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, 2020

Historical maps contain detailed geographic information that is difficult to find elsewhere and that covers long periods of time (e.g., 125 years for the historical topographic maps in the US). However, these maps typically exist as scanned images without searchable metadata. Existing approaches to making historical maps searchable rely on tedious manual work (including crowd-sourcing) to generate the metadata (e.g., geolocations and keywords). Optical character recognition (OCR) software could alleviate the required manual work, but the recognition results are individual words instead of location phrases (e.g., "Black" and "Mountain" vs. "Black Mountain"). This paper presents an end-to-end approach to the real-world problem of finding and indexing historical map images. The approach automatically processes historical map images to extract their text content and generates a set of metadata that is linked to large external geospatial knowledge bases. The linked metadata in the RDF (Resource Description Framework) format support complex queries for finding and indexing historical maps, such as retrieving all historical maps covering mountain peaks higher than 1,000 meters in California. We have implemented the approach in a system called mapKurator and evaluated it using historical maps from several sources with various map styles, scales, and coverage. Our results show significant improvement over the state-of-the-art methods. The code has been made publicly available as modules of the Kartta Labs project.

CCS Concepts: • Applied computing → Document analysis; Graphics recognition and interpretation; • Information systems → Digital libraries and archives.

Research paper thumbnail of Artificial Intelligence for Modeling Complex Systems: Taming the Complexity of Expert Models to Improve Decision Making

ACM Transactions on Interactive Intelligent Systems, 2021

Major societal and environmental challenges involve complex systems that have diverse multi-scale interacting processes. Consider, for example, how droughts and water reserves affect crop production, and how agriculture and industrial needs affect water quality and availability. Preventive measures, such as delaying planting dates and adopting new agricultural practices in response to changing weather patterns, can reduce the damage caused by natural processes. Understanding how these natural and human processes affect one another allows forecasting the effects of undesirable situations and studying interventions to take preventive measures. For many of these processes, there are expert models that incorporate state-of-the-art theories and knowledge to quantify a system's response to a diversity of conditions. A major challenge for efficient modeling is the diversity of modeling approaches across disciplines and the wide variety of data sources available only in formats that require...

Research paper thumbnail of Historical Map Applications and Processing Technologies

SpringerBriefs in Geography, 2019

Digital map processing has been of interest to the computer science and geographic information science communities since the early 1980s. With the increase in available map scans, researchers across the natural and social sciences have developed a growing interest in using historical maps in their studies. The lack of an understanding of how historical maps can be used in research, and of the capabilities of map processing technologies, creates a significant gap between the wide range of communities that could benefit from advances in digital map processing and the disciplines in which the technologies are developed. As a result, researchers who intend to use historical maps in their studies still need a significant amount of resources to digitize their maps, while the existing digital map processing technologies appear difficult to apply and understand. In many cases, existing digital map processing technologies could help facilitate the digitization process; selecting an appropriate technology just requires additional knowledge of the problem scope (e.g., the number of maps for processing, map conditions, and style varieties). The result is that researchers waste time and resources building and testing systems that partially duplicate prior work and cannot fully exploit the potential of existing technology. This chapter presents real-world applications of historical maps and case studies of both semi-automatic and fully automatic approaches for geographic feature extraction from historical maps. These real-world applications illustrate the various needs and scopes of using historical maps in scientific studies (e.g., processing thousands of historical maps from a single map series vs. a few historical maps from various publishers with different cartographic styles). The two example map processing technologies described help the reader understand current strengths and weaknesses. These examples also illustrate tremendous collaboration opportunities between and beyond the computer science and geographic information science communities to build advanced map processing technologies that are more effective in transforming the scientific studies that use historical maps.

Research paper thumbnail of Automatic alignment of contemporary vector data and georeferenced historical maps using reinforcement learning

International Journal of Geographical Information Science, 2019

With large amounts of digital map archives becoming available, automatically extracting information from scanned historical maps is needed for many domains that require long-term historical geographic data. Convolutional Neural Networks (CNN) are powerful techniques that can be used for extracting locations of geographic features from scanned maps if sufficient representative training data are available. Existing spatial data can provide the approximate locations of corresponding geographic features in historical maps and thus be useful to annotate training data automatically. However, the feature representations, publication date, production scales, and spatial reference systems of contemporary vector data are typically very different from those of historical maps. Hence, such auxiliary data cannot be directly used to annotate the precise locations of the features of interest in the scanned historical maps. This research introduces an automatic vector-to-raster alignment algorithm based on reinforcement learning to annotate precise locations of geographic features on scanned maps. This paper models the alignment problem using the reinforcement learning framework, which enables informed, efficient searches for matching features without pre-processing steps, such as extracting specific feature signatures (e.g., road intersections). The experimental results show that our algorithm can be applied to various features (roads, water lines, and railroads) and achieves high accuracy.

Research paper thumbnail of Spatialising uncertainty in image segmentation using weakly supervised convolutional neural networks: a case study from historical map processing

IET Image Processing, 2018

Convolutional neural networks (CNNs) such as encoder-decoder CNNs have increasingly been employed for semantic image segmentation at the pixel level, requiring pixel-level training labels, which are rarely available in real-world scenarios. In practice, weakly annotated training data at the image patch level are often used for pixel-level segmentation tasks, requiring further processing to obtain accurate results, mainly because the translation invariance of the CNN-based inference can turn into an impeding property leading to segmentation results of coarser spatial granularity compared with the original image. However, the inherent uncertainty in the segmented image and its relationships to translation invariance, CNN architecture, and classification scheme has never been analysed from an explicitly spatial perspective. Therefore, the authors propose measures to spatially visualise and assess class decision confidence based on spatially dense CNN predictions, resulting in continuous decision confidence surfaces. They find that such a visual-analytical method contributes to a better understanding of the spatial variability of class score confidence derived from weakly supervised CNN-based classifiers. They exemplify this approach by incorporating decision confidence surfaces into a processing chain for the extraction of human settlement features from historical map documents based on weakly annotated training data using different CNN architectures and classification schemes.
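One simple way to turn dense per-pixel class scores into the kind of decision confidence surface discussed in this abstract is the margin between the two highest class scores at each pixel. This is an illustrative measure, not necessarily the one used by the authors.

```python
import numpy as np

def confidence_surface(class_scores):
    """Per-pixel decision confidence: the margin between the highest and
    second-highest class score. Input shape (..., n_classes) -> (...)."""
    s = np.sort(class_scores, axis=-1)
    return s[..., -1] - s[..., -2]
```

Low margins highlight pixels where a weakly supervised classifier hesitates between classes; mapped over the image, they form a continuous surface that can be visualised alongside the segmentation.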