K. Stock et al. Creating a corpus of geospatial language

Creating a corpus of geospatial language

2012

The description of location using natural language is of interest for a number of research activities in geography, linguistics and cognitive science, including the development of methods for automated interpretation and generation of natural language to ease interaction with geographic information systems, as well as a number of related endeavours. For such research activities, examples of geospatial language are usually collected from the personal knowledge of researchers, or in small scale collection activities specific to the project concerned. This paper describes the process used to develop a more generic corpus of geospatial language. While the motivation for development was the authors’ ongoing research into natural language geospatial querying, it also has wider applications across a range of research areas. The paper describes the development and evaluation of four methods for semiautomated harvesting of geospatial language clauses from text to create a corpus of geospatia...

Identifying Patterns in Geospatial Natural Language

The automated interpretation of geospatial natural language in order to identify the locations and geometric relationships described in natural language expressions is difficult because many spatial prepositions and other spatial words may be interpreted in many different ways, depending on the context and the underlying conceptual models that apply in that context.

On the Geo-Indicativeness of non-Georeferenced Text

Geographic location is a key component for information retrieval on the Web, recommendation systems in mobile computing and social networks, and place-based integration on the Linked Data cloud. Previous work has addressed how to estimate locations by named entity recognition, from images, and via structured data. In this paper, we estimate geographic regions from unstructured, non-georeferenced text by computing a probability distribution over the Earth's surface. Our methodology combines natural language processing, geostatistics, and a data-driven bottom-up semantics. We illustrate its potential for mapping geographic regions from non-georeferenced text.

MIRACLE at GeoCLEF Query Parsing 2007: Extraction and Classification of Geographical Information

Lecture Notes in Computer Science, 2008

This paper describes the participation of the MIRACLE research consortium in the Query Parsing task of GeoCLEF 2007. Our system is composed of three main modules. First, the Named Geo-entity Identifier performs geo-entity identification and tagging, i.e., it extracts the "where" component of the geographical query, should there be any. This module is based on a gazetteer built from the Geonames geographical database and carries out a sequential three-step process consisting of geo-entity recognition, geo-entity selection and query tagging. Then, the Query Analyzer parses this tagged query to identify the "what" and "geo-relation" components by means of a rule-based grammar. Finally, a two-level multiclassifier first decides whether the query is indeed a geographical query and, if so, then determines the query type according to the type of information that the user is supposed to be looking for: map, yellow page or information. According to a strict evaluation criterion where a match must have all fields correct, our system reaches a precision of 42.8% and a recall of 56.6%, and our submission is ranked 1st out of 6 participants in the task. A detailed evaluation of the confusion matrices reveals that some extra effort must be invested in "user-oriented" disambiguation techniques to improve the first-level binary classifier for detecting geographical queries, as it is a key component for eliminating many false positives.

Detecting the Geospatialness of Prepositions from Natural Language Text (Short Paper)

2019

There is increasing interest in detecting the presence of geospatial locative expressions that include spatial relation terms such as "near" or "within". Being able to do so provides a foundation for interpreting relative descriptions of location and for building corpora that facilitate the development of methods for spatial relation extraction and interpretation. Here we evaluate the use of a spatial role labelling procedure to distinguish geospatial uses of prepositions from other spatial and non-spatial uses and experiment with the use of additional machine learning features to improve the quality of detection of geospatial prepositions. An annotated corpus of nearly 2000 instances of preposition usage was created for training and testing the classifiers.

2012 ACM Subject Classification: Computing methodologies → Artificial intelligence; Computing methodologies → Natural language processing

Cognitive Characterization of Geographic Objects Based on Spatial Descriptions in Web Resources

This paper examines the effectiveness of geographic information retrieval using partial natural language analysis. In information retrieval, the two most popular methods are based on term frequencies and co-occurrences. However, these methods do not take grammatical and semantic structure into account, so the information that can be extracted is limited. We propose a method for geographic information retrieval based on case structures and modification relationships in sentences. Experiments performed on web resources showed the precision of our method.

Challenges in Creating an Annotated Set of Geospatial Natural Language Descriptions (Short Paper)

2018

In order to extract and map location information from natural language descriptions, a first step is to identify different language elements within the descriptions. In this paper, we describe a method and discuss the challenges faced in creating an annotated set of geospatial natural language descriptions using manual tagging, with the purpose of supporting validation and machine learning approaches to annotation and text interpretation.

2012 ACM Subject Classification: Applied computing → Annotation

Experiments with geographic evidence extracted from documents

2009

For the 2008 participation at GeoCLEF, we focused on improving the extraction of geographic signatures from documents and optimising their use for GIR. The results show that the detection of explicit geographic named entities for including their terms in a tuned weighted index field significantly improves retrieval performance when compared to classic text retrieval.

What's missing in geographical parsing

Geographical data can be obtained by converting place names from free-format text into geographical coordinates. The ability to geo-locate events in textual reports represents a valuable source of information in many real-world applications such as emergency responses, real-time social media geographical event analysis, understanding location instructions in auto-response systems and more. However, geoparsing is still widely regarded as a challenge because of domain language diversity, place name ambiguity, metonymic language and limited leveraging of context, as we show in our analysis. Results to date, whilst promising, are on laboratory data and, unlike in wider NLP, are often not cross-compared. In this study, we evaluate and analyse the performance of a number of leading geoparsers on a number of corpora and highlight the challenges in detail. We also publish an automatically geotagged Wikipedia corpus to alleviate the dearth of (open source) corpora in this domain.