Toponym Disambiguation in Natural Language Processing

Adaptive Geoparsing Method for Toponym Recognition and Resolution in Unstructured Text

Remote Sens., 2020

The automatic extraction of geospatial information is an important aspect of data mining. Computer systems capable of discovering geographic information from natural language involve a complex process called geoparsing, which includes two important tasks: geographic entity recognition and toponym resolution. The first task can be approached through machine learning, in which case a model is trained to recognize a sequence of characters (words) corresponding to geographic entities. The second task consists of assigning such entities to their most likely coordinates. Frequently, the latter process involves solving referential ambiguities. In this paper, we propose an extensible geoparsing approach including geographic entity recognition based on a neural network model and disambiguation based on what we have called dynamic context disambiguation. Once place names are recognized in an input text, they are resolved using a grammar, in which a set of rules specifies how ambigu...

On metonymy recognition for geographic information retrieval

International Journal of Geographical Information Science, 2007

Metonymically used location names (toponyms) refer to other, related entities and thus possess a meaning different from their literal, geographic sense. Metonymic uses are to be treated differently to improve the performance of geographic information retrieval (GIR). Statistics on toponym senses show that 75.06% of all location names are used in their literal sense, 17.05% are used metonymically, and 7.89% have a mixed sense. This article presents a method for disambiguating location names in texts between literal and metonymic senses, based on shallow features. The evaluation of this method is two‐fold. First, we use a memory‐based learner (TiMBL) to train a classifier and determine standard evaluation measures such as F‐score and accuracy. The classifier achieved an F‐score of 0.842 and an accuracy of 0.846 for identifying toponym senses in a subset of the CoNLL (Conference on Natural Language Learning) data. Second, we perform retrieval experiments based on the GeoCLEF data (newspaper article corpus and queries) from 2005 and 2006. We compare searching location names in a database index containing both their literal and metonymic senses with searching in an index containing their literal senses only. Evaluation results indicate that removing metonymic senses from the index yields a higher mean average precision (MAP) for GIR. In total, we observed a significant gain in MAP: an increase from 0.0704 to 0.0715 MAP for the GeoCLEF 2005 data, and an increase from 0.1944 to 0.2100 MAP for the GeoCLEF 2006 data.
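For readers unfamiliar with the retrieval metric, the following is a minimal sketch of how mean average precision is commonly computed, together with the relative gains implied by the MAP figures quoted in the abstract. The function names (`average_precision`, `mean_average_precision`, `relative_gain`) and the toy relevance lists are assumptions made for illustration. They are not taken from the paper or from the GeoCLEF evaluation tooling, whose exact procedure may differ.

```python
from typing import Sequence, Tuple


def average_precision(ranked_relevance: Sequence[bool], num_relevant: int) -> float:
    """Average precision for one query.

    ranked_relevance[i] is True if the document at rank i + 1 is relevant.
    num_relevant is the total number of relevant documents for the query,
    including any that were never retrieved.
    """
    if num_relevant == 0:
        return 0.0
    hits = 0
    precision_sum = 0.0
    for rank, is_relevant in enumerate(ranked_relevance, start=1):
        if is_relevant:
            hits += 1
            precision_sum += hits / rank  # precision at this rank
    return precision_sum / num_relevant


def mean_average_precision(runs: Sequence[Tuple[Sequence[bool], int]]) -> float:
    """Mean of the per-query average precision values."""
    return sum(average_precision(rel, n) for rel, n in runs) / len(runs)


def relative_gain(before: float, after: float) -> float:
    """Relative change from `before` to `after`, in percent."""
    return (after - before) / before * 100.0


# Toy ranked lists (hypothetical, not GeoCLEF data).
toy_runs = [
    ([True, False, True], 2),   # AP = (1/1 + 2/3) / 2
    ([False, True, False], 1),  # AP = (1/2) / 1
]
toy_map = mean_average_precision(toy_runs)

# MAP values as quoted in the abstract above.
gain_2005 = relative_gain(0.0704, 0.0715)
gain_2006 = relative_gain(0.1944, 0.2100)
```

On the quoted figures, the relative gain works out to roughly 1.6% for the GeoCLEF 2005 data and roughly 8.0% for the GeoCLEF 2006 data.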

Disambiguating toponyms in news

Proceedings of the Conference on Human …, 2005

This research is aimed at the problem of disambiguating toponyms (place names) in terms of a classification derived by merging information from two publicly available gazetteers. To establish the difficulty of the problem, we measured the degree of ambiguity, with respect to a gazetteer, for toponyms in news. We found that 67.82% of the toponyms found in a corpus that were ambiguous in a gazetteer lacked a local discriminator in the text. Given the scarcity of human-annotated data, our method used unsupervised machine ...