Craig Knoblock | University of Southern California
Papers by Craig Knoblock
2021 IEEE International Conference on Big Data (Big Data), 2021
The increasing availability and accessibility of numerous overhead images allows us to estimate and assess the spatial arrangement of groups of geospatial target objects, which can benefit many applications, such as traffic monitoring and agricultural monitoring. Spatial arrangement estimation is the process of identifying the areas which contain the desired objects in overhead images. Traditional supervised object detection approaches can estimate accurate spatial arrangement but require large amounts of bounding box annotations. Recent semi-supervised clustering approaches can reduce manual labeling but still require annotations for all object categories in the image. This paper presents the target-guided generative model (TGGM), under the Variational Auto-encoder (VAE) framework, which uses Gaussian Mixture Models (GMM) to estimate the distributions of both hidden and decoder variables in VAE. Modeling both hidden and decoder variables by GMM reduces the required manual annotations significantly for spatial arrangement estimation. Unlike existing approaches, in which the training process can only update the GMM as a whole in each optimization iteration (e.g., a "mini-batch"), TGGM allows individual GMM components to be updated separately in the same optimization iteration. Optimizing GMM components separately allows TGGM to exploit the semantic relationships in spatial data and requires only a few labels to initiate and guide the generative process. Our experiments show that TGGM achieves results comparable to the state-of-the-art semi-supervised methods and outperforms unsupervised methods by 10% based on the F1 scores, while requiring significantly less labeled data.
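The guidance-by-a-few-labels idea above can be illustrated with a much simpler stand-in: seeding one component of an ordinary Gaussian mixture with a handful of labeled target points. The sketch below uses scikit-learn and synthetic features; it is not the TGGM/VAE model itself, and all data and names are illustrative.

```python
# Hypothetical sketch: seeding one component of a Gaussian Mixture with a few
# labeled target examples, so the "target" cluster is guided rather than fully
# unsupervised. Features and variable names are illustrative only.
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)

# Toy latent features: 200 background points and 20 target-like points.
background = rng.normal(loc=0.0, scale=1.0, size=(200, 2))
targets = rng.normal(loc=3.0, scale=0.5, size=(20, 2))
features = np.vstack([background, targets])

# A handful of labeled target examples initialize the mean of one component.
labeled_targets = targets[:3]
init_means = np.vstack([background.mean(axis=0), labeled_targets.mean(axis=0)])

gmm = GaussianMixture(n_components=2, means_init=init_means, random_state=0)
gmm.fit(features)

# Points assigned to the seeded component approximate the target arrangement.
target_component = 1
assignments = gmm.predict(features)
print("points assigned to target component:",
      int((assignments == target_component).sum()))
```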
Semantic Web
Historical maps provide rich information for researchers in many areas, including the social and natural sciences. These maps contain detailed documentation of a wide variety of natural and human-made features and their changes over time, such as changes in transportation networks or the decline of wetlands or forest areas. Analyzing changes over time in such maps can be labor-intensive for a scientist, even after the geographic features have been digitized and converted to a vector format. Knowledge Graphs (KGs) are the appropriate representations to store and link such data and support semantic and temporal querying to facilitate change analysis. KGs combine expressivity, interoperability, and standardization in the Semantic Web stack, thus providing a strong foundation for querying and analysis. In this paper, we present an automatic approach to convert vector geographic features extracted from multiple historical maps into contextualized spatio-temporal KGs. The resulting graphs...
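As a rough illustration of what one such feature might look like as triples, the following sketch converts a single digitized map feature into GeoSPARQL-style RDF with rdflib. The namespaces, feature identifiers, and attributes are assumed for illustration and are not the paper's actual schema.

```python
# Minimal sketch (assumed namespaces and feature attributes) of turning one
# digitized map feature into GeoSPARQL-style triples with rdflib; the real
# pipeline builds far richer, contextualized spatio-temporal KGs.
from rdflib import Graph, Literal, Namespace, RDF, URIRef
from rdflib.namespace import RDFS, XSD

GEO = Namespace("http://www.opengis.net/ont/geosparql#")
EX = Namespace("http://example.org/maps/")            # hypothetical namespace

g = Graph()
feature = URIRef(EX["railroad_segment_42"])           # hypothetical feature id
geometry = URIRef(EX["railroad_segment_42_geom"])

g.add((feature, RDF.type, EX.RailroadSegment))
g.add((feature, RDFS.label, Literal("Railroad segment, 1920 map sheet")))
g.add((feature, EX.validTime, Literal("1920", datatype=XSD.gYear)))
g.add((feature, GEO.hasGeometry, geometry))
g.add((geometry, GEO.asWKT,
       Literal("LINESTRING(-118.40 34.02, -118.39 34.03)",
               datatype=GEO.wktLiteral)))

print(g.serialize(format="turtle"))
```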
Remote Sensing, 2021
Spatially explicit, fine-grained datasets describing historical urban extents are rarely available prior to the era of operational remote sensing. However, such data are necessary to better understand long-term urbanization and land development processes and for the assessment of coupled nature–human systems (e.g., the dynamics of the wildland–urban interface). Herein, we propose a framework that jointly uses remote-sensing-derived human settlement data (i.e., the Global Human Settlement Layer, GHSL) and scanned, georeferenced historical maps to automatically generate historical urban extents for the early 20th century. By applying unsupervised color space segmentation to the historical maps, spatially constrained to the urban extents derived from the GHSL, our approach generates historical settlement extents for seamless integration with the multi-temporal GHSL. We apply our method to study areas in countries across four continents, and evaluate our approach against historical buil...
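A minimal sketch of the masked color segmentation idea, assuming the map scan is an RGB array and the GHSL-derived urban extent is a boolean mask of the same size; the clustering here is plain k-means rather than the paper's exact method.

```python
# Illustrative sketch (not the paper's exact pipeline): k-means color
# clustering of a scanned map, restricted to pixels inside a present-day
# settlement mask such as one derived from the GHSL. Array shapes are assumed.
import numpy as np
from sklearn.cluster import KMeans

def segment_within_mask(map_rgb: np.ndarray, mask: np.ndarray, n_colors: int = 5):
    """map_rgb: (H, W, 3) uint8 scan; mask: (H, W) boolean settlement extent."""
    pixels = map_rgb[mask].reshape(-1, 3).astype(float)
    kmeans = KMeans(n_clusters=n_colors, n_init=10, random_state=0).fit(pixels)

    labels = np.full(mask.shape, -1, dtype=int)        # -1 outside the mask
    labels[mask] = kmeans.labels_
    return labels, kmeans.cluster_centers_

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    fake_map = rng.integers(0, 255, size=(100, 100, 3), dtype=np.uint8)
    fake_mask = np.zeros((100, 100), dtype=bool)
    fake_mask[20:80, 20:80] = True
    labels, centers = segment_within_mask(fake_map, fake_mask)
    print("color clusters found:", len(centers))
```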
Historical maps constitute unique sources of retrospective geographic information. Recently, several map archives containing map series covering large spatial and temporal extents have been systematically scanned and made available to the public. The geographic information contained in such data archives allows extending geospatial analysis retrospectively beyond the era of digital cartography. However, given the large data volumes of such archives and the low graphical quality of older map sheets, the processes to extract geographic information need to be automated to the highest degree possible. In order to understand the salient characteristics, data quality variation, and potential challenges in large-scale information extraction tasks, preparatory analytical steps are required to efficiently assess spatio-temporal coverage, approximate map content, and spatial accuracy of such georeferenced map archives across different cartographic scales. Such preparatory steps are often negl...
2014 IEEE International Conference on Data Mining Workshop, 2014
Data mining tasks typically require significant effort in data preparation to find, transform, integrate and prepare the data for the relevant data mining tools. In addition, the work performed in data preparation is often not recorded and is difficult to reproduce from the raw data. In this paper we present an integrated approach to data preparation and data mining that combines the two steps into a single integrated process and maintains detailed metadata about the data sources, the steps in the process, and the resulting learned classifier produced from data mining algorithms. We present results on an example scenario, which show that our approach significantly reduces the time it takes to perform a data mining task.
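The gist of carrying metadata alongside preparation and mining steps can be sketched as follows; the provenance fields, steps, and classifier are illustrative choices, not the system described in the paper.

```python
# Hedged sketch of the general idea: each preparation step records metadata so
# the path from raw data to the learned classifier is reproducible. The field
# names and steps here are illustrative.
import datetime
import json

from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

provenance = {"source": "sklearn.load_iris", "steps": []}

def record(step_name, **details):
    provenance["steps"].append({
        "step": step_name,
        "timestamp": datetime.datetime.now().isoformat(timespec="seconds"),
        **details,
    })

X, y = load_iris(return_X_y=True)
record("load", rows=int(X.shape[0]), columns=int(X.shape[1]))

X = X[:, :2]                      # example transformation: keep two features
record("select_columns", kept=[0, 1])

clf = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X, y)
record("train_classifier", model="DecisionTreeClassifier", max_depth=3)

print(json.dumps(provenance, indent=2))
```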
Lecture Notes in Computer Science, 2015
Despite the recent growth in the size of the Linked Data Cloud, the absence of links between the vocabularies of the sources has resulted in heterogeneous schemas. Our previous work found conceptual mappings between two sources, discovering alignments such as equivalence and subset relations using the instances that are linked as equal. By using existential concepts and their intersections to define specialized classes (restriction classes), we were able to find alignments where previously existing concepts in one source did not have corresponding equivalent concepts in the other source. Upon inspection, we found that though we were able to find a good number of alignments, we were unable to completely cover one source with the other. In many cases we observed that even though a larger class could be defined completely by the multiple smaller classes that it subsumed, we were unable to find these alignments because our definition of restriction classes did not contain the disjunction operator needed to define a union of concepts. In this paper we propose a method that discovers alignments such as these, where a (larger) concept of the first source is aligned to the union of the subsumed (smaller) concepts from the other source. We apply this new algorithm to the Geospatial, Biological Classification, and Genetics domains and show that this approach is able to discover numerous concept coverings, where (in most cases) the subsumed classes are disjoint. The resulting alignments are useful for determining the mappings between ontologies, refining existing ontologies, and finding inconsistencies that may indicate that some instances have been erroneously aligned.
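The extensional test behind such a concept covering can be sketched with simple set arithmetic over linked instances; the class names and instance identifiers below are hypothetical.

```python
# Minimal sketch of the extensional test behind a "concept covering": a larger
# class in one source is aligned with the union of smaller classes in another
# when their linked instance sets (nearly) coincide. Instance IDs are made up.
def covering_score(larger_extension, smaller_extensions):
    union = set().union(*smaller_extensions)
    overlap = larger_extension & union
    # Fraction of the larger class explained by the union of the smaller ones.
    return len(overlap) / len(larger_extension)

# Hypothetical linked instances (e.g., owl:sameAs-aligned entities).
geonames_populated_places = {"e1", "e2", "e3", "e4", "e5"}
dbpedia_cities = {"e1", "e2"}
dbpedia_towns = {"e3", "e4"}
dbpedia_villages = {"e5"}

score = covering_score(geonames_populated_places,
                       [dbpedia_cities, dbpedia_towns, dbpedia_villages])
print(f"covering score: {score:.2f}")   # 1.00 -> complete covering
```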
Joint Proceedings of the Workshop on AI Problems and Approaches for Intelligent Environments and Workshop on Semantic Cities, 2013
There is a tremendous amount of geospatial data available, and there are numerous methods for extracting, processing and integrating geospatial sources. However, end-users' ability to retrieve, combine, and integrate heterogeneous geospatial data is limited. This paper presents a new semantic approach that allows users to easily extract, link, and integrate geospatial data from various sources by demonstration in an interactive interface, which is implemented in a tool called Karma. First, we encapsulate the retrieval algorithms as web services and invoke the services to extract geospatial data from various sources. Then we model and publish the extracted geospatial data to RDF to eliminate data heterogeneity. Finally, we link the geospatial data (in RDF) from different sources using a semantic matching algorithm and integrate them using SPARQL queries. This approach empowers end users to rapidly extract geospatial data from diverse sources, to easily eliminate heterogeneity, and to semantically link and integrate sources.
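A toy sketch of the final linking and integration step, using rdflib: two small RDF sources are merged, linked with a naive label-equality rule (a stand-in for the semantic matching algorithm), and queried with SPARQL. The namespaces and data are assumed; Karma itself is not shown.

```python
# Rough sketch of linking two RDF sources and integrating them with SPARQL.
from rdflib import Graph, Literal, Namespace, RDF
from rdflib.namespace import OWL, RDFS

EX1 = Namespace("http://example.org/sourceA/")
EX2 = Namespace("http://example.org/sourceB/")

g = Graph()
g.add((EX1.pl_1, RDF.type, EX1.Place))
g.add((EX1.pl_1, RDFS.label, Literal("Los Angeles")))
g.add((EX2.city_9, RDF.type, EX2.City))
g.add((EX2.city_9, RDFS.label, Literal("Los Angeles")))
g.add((EX2.city_9, EX2.population, Literal(3898747)))

# Naive "semantic matching": link resources whose labels are identical.
labels = {}
for s, _, label in g.triples((None, RDFS.label, None)):
    labels.setdefault(str(label), []).append(s)
for resources in labels.values():
    if len(resources) == 2:
        a, b = resources
        g.add((a, OWL.sameAs, b))
        g.add((b, OWL.sameAs, a))

query = """
SELECT ?label ?population WHERE {
    ?a rdfs:label ?label .
    ?a owl:sameAs ?b .
    ?b <http://example.org/sourceB/population> ?population .
}
"""
for row in g.query(query, initNs={"rdfs": RDFS, "owl": OWL}):
    print(row.label, row.population)
```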
The Semantic Web – ISWC 2012, 2012
Despite the increase in the number of linked instances in the Linked Data Cloud in recent times, the absence of links at the concept level has resulted in heterogeneous schemas, challenging the interoperability goal of the Semantic Web. In this paper, we address this problem by finding alignments between concepts from multiple Linked Data sources. Instead of only considering the existing concepts present in each ontology, we hypothesize new composite concepts defined as disjunctions of conjunctions of (RDF) types and value restrictions, which we call restriction classes, and generate alignments between these composite concepts. This extended concept language enables us to find more complete definitions and to even align sources that have rudimentary ontologies, such as those that are simple renderings of relational databases. Our concept alignment approach is based on analyzing the extensions of these concepts and their linked instances. Having explored the alignment of conjunctive concepts in our previous work, in this paper, we focus on concept coverings (disjunctions of restriction classes). We present an evaluation of this new algorithm on the Geospatial, Biological Classification, and Genetics domains. The resulting alignments are useful for refining existing ontologies and determining the alignments between concepts in the ontologies, thus increasing the interoperability in the Linked Open Data Cloud.
Lecture Notes in Computer Science, 2010
The Web of Linked Data is characterized by linking structured data from different sources using equivalence statements, such as owl:sameAs, as well as other types of linked properties. The ontologies behind these sources, however, remain unlinked. This paper describes an extensional approach to generate alignments between these ontologies. Specifically, our algorithm produces equivalence and subsumption relationships between classes from ontologies of different Linked Data sources by exploring the space of hypotheses supported by the existing equivalence statements. We are also able to generate a complementary hierarchy of derived classes within an existing ontology or generate new classes for a second source where the ontology is not as refined as the first. We demonstrate our approach empirically using Linked Data sources from the geospatial, genetics, and zoology domains. Our algorithm discovered about 800 equivalences and 29,000 subset relationships in the alignment of five source pairs from these domains. Thus, we are able to model one Linked Data source in terms of another by aligning their ontologies and understand the semantic relationships between the two sources.
Recently, large amounts of data have been published using Semantic Web standards. Simultaneously, there has been a steady rise in links between objects from multiple sources. However, the ontologies behind these sources have remained largely disconnected, thereby challenging the interoperability goal of the Semantic Web. We address this problem by automatically finding alignments between concepts from multiple linked data sources. Instead of only considering the existing concepts in each ontology, we hypothesize new composite concepts, defined using conjunctions and disjunctions of (RDF) types and value restrictions, and generate alignments between them. In addition, our techniques provide a novel method for curating the linked data web by pointing to likely incorrect or missing assertions. Our approach provides a deeper understanding of the relationships between linked data sources and increases the interoperability among previously disconnected ontologies.
Proceedings of the 22nd ACM SIGSPATIAL International Conference on Advances in Geographic Information Systems, 2014
Given the increasing popularity and availability of location tracking devices, large quantities of spatiotemporal data are available from many different sources. Quick interactive analysis of such data is important in order to understand the data, identify patterns, and eventually make a marketable product. Since the data do not necessarily follow the relational model and may require flexible processing, possibly using advanced machine learning techniques, spatial databases or similar query tools are not the best means for such analysis. Moreover, the high complexity of geometric operations makes quick interactive analysis very difficult. In this paper, we present a highly flexible functional query engine that 1) works with multiple schema types, 2) provides low response times through spatiotemporal indexing and parallelization, 3) helps users understand the data using visualizations, and 4) is highly extensible, making it easy to add complex functionality. To demonstrate its usefulness, we use our tool to solve a real-world problem of crime pattern analysis in Los Angeles County and compare the process with some other well-known tools.
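A toy sketch of the composable, index-backed querying style described above, assuming a simple record schema and grid cell size; it is not the engine from the paper.

```python
# Records are filtered through chained predicates, with a simple spatial grid
# index to prune candidates. Schema and cell size are assumed.
from collections import defaultdict
from functools import reduce

events = [
    {"lat": 34.05, "lon": -118.25, "hour": 23, "type": "burglary"},
    {"lat": 34.06, "lon": -118.24, "hour": 2,  "type": "theft"},
    {"lat": 33.95, "lon": -118.40, "hour": 14, "type": "theft"},
]

CELL = 0.05  # grid cell size in degrees (illustrative)

def cell_of(rec):
    return (int(rec["lat"] / CELL), int(rec["lon"] / CELL))

index = defaultdict(list)
for rec in events:
    index[cell_of(rec)].append(rec)

def near(lat, lon):
    """Candidate records from the grid cell containing (lat, lon) and neighbors."""
    cx, cy = int(lat / CELL), int(lon / CELL)
    cells = [(cx + dx, cy + dy) for dx in (-1, 0, 1) for dy in (-1, 0, 1)]
    return [r for c in cells for r in index[c]]

def pipe(records, *predicates):
    return list(reduce(lambda rs, p: filter(p, rs), predicates, records))

night_thefts = pipe(near(34.05, -118.25),
                    lambda r: r["type"] == "theft",
                    lambda r: r["hour"] >= 22 or r["hour"] < 5)
print(night_thefts)
```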
The road network is one of the most important types of information on raster maps. In particular, the set of road intersection templates, which consists of the road intersection positions, the road connectivities, and the road orientations, represents an abstraction of the road network and is more accurate and easier to extract than the entire road network. To extract the road intersection templates from raster maps, the thinning operator is commonly used to find the basic structure of the road lines (i.e., to extract the skeletons of the lines). However, the thinning operator produces distorted lines near line intersections, especially at T-shaped intersections. Therefore, the extracted positions of the road intersections and the road orientations are not accurate. In this paper, we utilize our previous work on automatically extracting road intersection positions to identify the road lines that intersect at the intersections and then trace the road orientations and refine the positions of the road intersections. We compare the proposed approach with the thinning operator and show that our proposed approach extracts more accurate road intersection positions and road orientations than the previous approach.
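One simplified way to trace road orientations around a detected intersection is to bin the angles of nearby road pixels, as sketched below; the radius, bin width, and synthetic T-intersection are assumptions, not the paper's exact procedure.

```python
# Simplified sketch: given road pixels around a detected intersection, bin the
# angles of nearby pixels to estimate the directions of the intersecting roads.
import numpy as np

def road_orientations(road_pixels, intersection, radius=15, bin_deg=15):
    """road_pixels: (N, 2) array of (row, col); intersection: (row, col)."""
    offsets = road_pixels - np.asarray(intersection)
    dist = np.hypot(offsets[:, 0], offsets[:, 1])
    nearby = offsets[(dist > 2) & (dist <= radius)]        # skip the center blob
    angles = np.degrees(np.arctan2(nearby[:, 0], nearby[:, 1])) % 360

    hist, edges = np.histogram(angles, bins=int(360 / bin_deg), range=(0, 360))
    # Peaks in the angle histogram correspond to road branches.
    peak_bins = np.where(hist >= max(1, 0.5 * hist.max()))[0]
    return [(edges[b] + edges[b + 1]) / 2 for b in peak_bins]

# Synthetic T-intersection: branches toward roughly 0, 90, and 180 degrees.
pts = [(0, c) for c in range(-12, 13)] + [(r, 0) for r in range(1, 13)]
print(road_orientations(np.array(pts), intersection=(0, 0)))
```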
Raster maps are widely available and contain useful geographic features such as labels and road lines. To extract the geographic features, most research work relies on a manual step to first extract the foreground pixels from the maps using the distinctive colors or grayscale intensities of the pixels. This strategy requires user interaction for each map to select a set of thresholds. In this paper, we present a map classification technique that uses an image comparison feature called the luminance-boundary histogram and a nearest-neighbor classifier to identify raster maps with similar grayscale intensity usage. We can then apply previously learned thresholds to separate the foreground pixels from the raster maps that are classified in the same group instead of manually examining each map. We show that the luminance-boundary histogram achieves 95% accuracy in our map classification experiment compared to 13.33%, 86.67%, and 88.33% using three traditional image comparison features. The accurate map classification results make it possible to extract geographic features from previously unseen raster maps.
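A hedged sketch of the classification step, substituting a plain grayscale-intensity histogram for the luminance-boundary histogram feature; the synthetic map series and parameters are illustrative.

```python
# Maps are grouped by comparing grayscale-intensity histograms with a
# 1-nearest-neighbor classifier, so thresholds learned for one group can be
# reused for new maps in that group.
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

def luminance_histogram(gray_image: np.ndarray, bins: int = 32) -> np.ndarray:
    hist, _ = np.histogram(gray_image, bins=bins, range=(0, 256))
    return hist / hist.sum()

rng = np.random.default_rng(0)
# Two synthetic "map series" with different intensity usage.
series_a = [rng.integers(0, 128, size=(64, 64)) for _ in range(5)]
series_b = [rng.integers(128, 256, size=(64, 64)) for _ in range(5)]

X = np.array([luminance_histogram(m) for m in series_a + series_b])
y = np.array([0] * 5 + [1] * 5)

clf = KNeighborsClassifier(n_neighbors=1).fit(X, y)
new_map = rng.integers(0, 128, size=(64, 64))
print("predicted group:", clf.predict([luminance_histogram(new_map)])[0])
```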
Raster maps are an important source of road information. Because of the overlapping map features (e.g., roads and text labels) and the varying image quality, extracting road vector data from raster maps usually requires significant user input to achieve accurate results. In this paper, we present an accurate road vectorization technique that minimizes user input by combining our previous work on extracting road pixels and road-intersection templates to extract accurate road vector data from raster maps. Our approach enables GIS applications to exploit the road information in raster maps for the areas where the road vector data are otherwise not easily accessible, such as the countries of the Middle East. We show that our approach requires minimal user input and achieves an average of 93.2% completeness and 95.6% correctness in an experiment using raster maps from various sources.
Document Recognition and Retrieval Xvii, 2010
Maps can be a great source of information for a given geographic region, but they can be difficult to find and even harder to process. A significant problem is that many interesting and useful maps are only available in raster format, and even worse, many maps have been poorly scanned and are often compressed with lossy compression algorithms. Furthermore, for many of these maps there is no metadata providing the geographic coordinates, scale, or projection. Previous research on map processing has developed techniques that typically work on maps from a single map source. In contrast, we have developed a general approach to finding and processing street maps. This includes techniques for discovering maps online, extracting geographic and textual features from maps, using the extracted features to determine the geographic coordinates of the maps, and aligning the maps with imagery. The resulting system can find, register, and extract a variety of features from raster maps, which can then be used for various applications, such as annotating satellite imagery, creating and updating maps, or constructing detailed gazetteers.
Raster maps contain rich road information, such as the topology and names of roads, but this information is "locked" in images and inaccessible in a geographic information system (GIS). Previous approaches for road extraction from raster maps typically handle this problem as raster-to-vector conversion, and hence the extracted road vector data are line segments without knowledge of road names or of where a road starts and ends. This paper presents a technique that builds on the results from our previous road vectorization and text recognition work to generate named road vector data from raster maps. This technique first segments road vectorization results using road intersections to determine the lines that represent individual roads in the map. Then the technique exploits spatial relationships between roads and recognized text labels to generate road names for individual road segments. We implemented this approach in our map processing system, called Strabo, and demonstrate that the system generates accurate named road vector data on example maps with 92.83% accuracy.
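The name-assignment step can be approximated by giving each road segment the name of its nearest recognized text label, as in the sketch below; the segments, labels, and nearest-centroid rule are illustrative simplifications.

```python
# Simplified sketch of the name-assignment step: each road segment takes the
# name of the nearest recognized text label, here by centroid distance. Real
# maps need richer spatial reasoning (label orientation, offsets along roads).
import numpy as np

road_segments = {                       # hypothetical segment centroids (x, y)
    "seg_1": (10.0, 5.0),
    "seg_2": (42.0, 30.0),
}
text_labels = [                         # hypothetical recognized labels
    {"text": "MAIN ST", "center": (12.0, 7.0)},
    {"text": "OAK AVE", "center": (40.0, 28.0)},
]

def nearest_label(centroid, labels):
    dists = [np.hypot(centroid[0] - l["center"][0],
                      centroid[1] - l["center"][1]) for l in labels]
    return labels[int(np.argmin(dists))]["text"]

for seg_id, centroid in road_segments.items():
    print(seg_id, "->", nearest_label(centroid, text_labels))
```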
Historical maps contain rich cartographic information, such as road networks, but this information is "locked" in images and inaccessible to a geographic information system (GIS). Manual map digitization requires intensive user effort and cannot handle a large number of maps. Previous approaches for automatic map processing generally require expert knowledge in order to fine-tune parameters of the applied graphics recognition techniques and thus are not readily usable for non-expert users. This paper presents an efficient and effective graphics recognition technique that employs interactive user intervention procedures for processing historical raster maps with limited graphical quality. The interactive procedures are performed on color-segmented preprocessing results and are based on straightforward user training processes, which minimize the required user effort for map digitization. This graphics recognition technique eliminates the need for expert users in digitizing map images and provides opportunities to derive unique data for spatiotemporal research by facilitating time-consuming map digitization efforts. The described technique generated accurate road vector data from a historical map image and reduced the time for manual map digitization by 38%.
Raster maps are easily accessible and contain rich road information; however, converting the road information to vector format is challenging because of varying image quality, overlapping features, and the typical lack of metadata (e.g., map geocoordinates). Previous road vectorization approaches for raster maps typically handle a specific map series and require significant user effort. In this paper, we present a general road vectorization approach that exploits common geometric properties of roads in maps for processing heterogeneous raster maps while requiring minimal user intervention. In our experiments, we compared our approach to a widely used commercial product using 40 raster maps from 11 sources. We showed that, overall, our approach generated high-quality results with low redundancy while requiring considerably less user input than competing approaches.
2006 IEEE International Conference on Multimedia and Expo, 2006
The rapid increase in the availability of geospatial data has motivated the effort to seamlessly integrate this information into an information-rich and realistic 3D environment. However, heterogeneous data sources with varying degrees of consistency and accuracy pose a challenge to such efforts. We describe the Geospatial Decision Making (GeoDec) system, which accurately integrates satellite imagery, three-dimensional models, textures and video streams, road data, maps, point data and temporal data. The system also includes a glove-based user interface.
Proceedings of the 12th annual ACM international workshop on Geographic information systems, 2004
The recent growth of geospatial information on the web has made it possible to easily access various maps and orthoimagery. By integrating these maps and imagery, we can create intelligent images that combine the visual appeal and accuracy of imagery with the detailed attribution information often contained in diverse maps. However, accurately integrating maps and imagery from different data sources remains a challenging task. This is because spatial data obtained from various data sources may have different projections and different accuracy levels. Most of the existing algorithms only deal with vector-to-vector spatial data integration or require human intervention to accomplish imagery-to-map conflation. In this paper, we describe an information integration approach that utilizes common vector datasets as "glue" to automatically conflate imagery with street maps. We present efficient techniques to automatically extract road intersections from imagery and maps as control points. We also describe a specialized point pattern matching algorithm to align the two point sets and conflation techniques to align the imagery with maps. We show that these automatic conflation techniques can automatically and accurately align maps with images of the same area. In particular, using the approach described in this paper, our system automatically aligns a set of TIGER maps for an area in El Segundo, CA to the corresponding orthoimagery with an average error of 8.35 meters per pixel. This is a significant improvement considering that simply combining the TIGER maps with the corresponding imagery based on the geographic coordinates provided by the sources results in an error of 27 meters per pixel.
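Once intersection control points from the map and the imagery have been matched, aligning them amounts to estimating a transform; the sketch below fits a 2D affine transform by least squares over assumed control-point pairs, leaving the point pattern matching itself out.

```python
# Minimal sketch of one piece of the conflation idea: estimate a 2D affine
# transform from matched control-point pairs by least squares. The control
# points here are synthetic.
import numpy as np

map_points = np.array([[10.0, 20.0], [50.0, 22.0], [48.0, 80.0], [12.0, 78.0]])
# Corresponding imagery points: shifted, slightly scaled control points.
image_points = map_points * 1.02 + np.array([5.0, -3.0])

# Solve image = [x, y, 1] @ A for each point (affine model).
ones = np.ones((len(map_points), 1))
design = np.hstack([map_points, ones])
affine, *_ = np.linalg.lstsq(design, image_points, rcond=None)

aligned = design @ affine
rmse = np.sqrt(np.mean(np.sum((aligned - image_points) ** 2, axis=1)))
print("estimated affine:\n", affine.T)
print("alignment RMSE:", round(float(rmse), 6))
```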