Place disambiguation with co-occurrence models (original) (raw)

Geographic Information Retrieval: Classification, Disambiguation and Modelling

My thesis aims to augment the Geographic Information Retrieval process with information extracted from world knowledge. This aim is approached from three directions: classifying world knowledge, disambiguating placenames and modelling users. Geographic information is becoming ubiquitous across the Internet, with a significant proportion of web documents and web searches containing geographic entities, and the proliferation of Internet enabled mobile devices. Traditional information retrieval treats these geographic entities in the same way as any other textual data. In this thesis I augment the retrieval process with geographic information, and show how methods built upon world knowledge outperform methods based on heuristic rules. The source of world knowledge used is Wikipedia. Wikipedia has become a phenomenon of the Internet age and needs little introduction. As a linked corpus of semi-structured data, it is unsurpassed. Two approaches to mining information from Wikipedia are rigorously explored: initially I classify Wikipedia articles into broad categories; this is followed by much finer classification where Wikipedia articles are disambiguated as specific locations. The thesis concludes with the proposal of the Steinberg hypothesis: By analysing a range of wikipedias in different languages I demonstrate that a localised view of the world is ubiquitous and inherently part of human nature. All people perceive closer places as larger and more important than distant ones. The core contributions of mythesis are in the areas of extracting information from Wikipedia, supervised placename disambiguation, and providing a quantitative model for how people view the world. The findings clearly have a direct impact for applications such as geographically aware search engines, but in a broader context documents can be automatically annotated with machine readable meta-data and dialogue enhanced with a model of how people view the world. This will reduce ambiguity and confusion in dialogue between people or computers.

On metonymy recognition for geographic information retrieval

International Journal of Geographical Information Science, 2007

Metonymically used location names (toponyms) refer to other, related entities and thus possess a meaning different from their literal, geographic sense. Metonymic uses are to be treated differently to improve the performance of geographic information retrieval (GIR). Statistics on toponym senses show that 75.06% of all location names are used in their literal sense, 17.05% are used metonymically, and 7.89% have a mixed sense. This article presents a method for disambiguating location names in texts between literal and metonymic senses, based on shallow features.The evaluation of this method is two‐fold. First, we use a memory‐based learner (TiMBL) to train a classifier and determine standard evaluation measures such as F‐score and accuracy. The classifier achieved an F‐score of 0.842 and an accuracy of 0.846 for identifying toponym senses in a subset of the CoNLL (Conference on Natural Language Learning) data.Second, we perform retrieval experiments based on the GeoCLEF data (newspaper article corpus and queries) from 2005 and 2006. We compare searching location names in a database index containing both their literal and metonymic senses with searching in an index containing their literal senses only. Evaluation results indicate that removing metonymic senses from the index yields a higher mean average precision (MAP) for GIR. In total, we observed a significant gain in MAP: an increase from 0.0704 to 0.0715 MAP for the GeoCLEF 2005 data, and an increase from 0.1944 to 0.2100 MAP for the GeoCLEF 2006 data.

On assigning place names to geography related web pages

2005

Abstract In this paper, we attempt to give spatial semantics to web pages by assigning them place names. The entire assignment task is divided into three sub-problems, namely place name extraction, place name disambiguation and place name assignment. We propose our approaches to address these sub-problems. In particular, we have modified GATE, a well-known named entity extraction software, to perform place name extraction using a US Census gazetteer.

University of Pittsburgh at GeoCLEF 2008: Towards effective geographic information retrieval

Abstract. This paper reports University of Pittsburgh's participation in GeoCLEF 2008. As the first time participants, we only worked on the monolingual GeoCLEF task and submitted four runs under two different methods. Our GCEC method aims to test the effectiveness of our online geographic coordinate extraction and clustering algorithm, and our WIKIGEO method wants to examine the usefulness of using the geo-coordinate information in Wikipedia for identifying geo-locations. Our experiments results show that: 1) our online geographic ...