Proposing a geographic co-occurrence model as a tool for GIR

Geographic Co-occurrence as a Tool for GIR.

In this paper we describe the development of a geographic co-occurrence model and how it can be applied to geographic information retrieval. The model is built by mining co-occurrences of placenames from Wikipedia and then mapping these placenames to locations in the Getty Thesaurus of Geographical Names. We begin by quantifying the accuracy of our model and computing theoretical bounds for the accuracy achievable when it is applied to placename disambiguation in free text. We conclude with a discussion of the improvement such a model could provide for placename disambiguation and geographic relevance ranking over traditional methods.
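As an illustration of what mining placename co-occurrences can look like, here is a minimal sketch in Python. It assumes each article has already been reduced to a list of placename strings; the function name `build_cooccurrence` and the toy data are hypothetical and not taken from the paper, and the step of mapping names to Getty Thesaurus locations is omitted.

```python
from collections import Counter
from itertools import combinations


def build_cooccurrence(articles):
    """Count how often each unordered pair of placenames shares an article.

    `articles` is an iterable of placename lists (an assumed input format).
    Pairs are stored as alphabetically sorted tuples so (a, b) == (b, a).
    """
    counts = Counter()
    for placenames in articles:
        # Deduplicate so a name repeated within one article counts once.
        unique = sorted(set(placenames))
        for a, b in combinations(unique, 2):
            counts[(a, b)] += 1
    return counts


# Toy input standing in for placenames extracted from three articles.
articles = [
    ["London", "Thames", "Westminster"],
    ["London", "Ontario", "Thames"],
    ["Westminster", "London"],
]
model = build_cooccurrence(articles)
```

Counting per article rather than per mention is one possible design choice; a model could equally weight pairs by proximity within the text.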

Geographic Information Retrieval: Classification, Disambiguation and Modelling

My thesis aims to augment the Geographic Information Retrieval process with information extracted from world knowledge. This aim is approached from three directions: classifying world knowledge, disambiguating placenames and modelling users. Geographic information is becoming ubiquitous across the Internet: a significant proportion of web documents and web searches contain geographic entities, and Internet-enabled mobile devices are proliferating. Traditional information retrieval treats these geographic entities in the same way as any other textual data. In this thesis I augment the retrieval process with geographic information, and show how methods built upon world knowledge outperform methods based on heuristic rules. The source of world knowledge used is Wikipedia. Wikipedia has become a phenomenon of the Internet age and needs little introduction. As a linked corpus of semi-structured data, it is unsurpassed. Two approaches to mining information from Wikipedia are rigorously explored: initially I classify Wikipedia articles into broad categories; this is followed by much finer classification, where Wikipedia articles are disambiguated as specific locations. The thesis concludes with the proposal of the Steinberg hypothesis: by analysing a range of Wikipedias in different languages I demonstrate that a localised view of the world is ubiquitous and inherently part of human nature. All people perceive closer places as larger and more important than distant ones. The core contributions of my thesis are in the areas of extracting information from Wikipedia, supervised placename disambiguation, and providing a quantitative model for how people view the world. The findings have a direct impact on applications such as geographically aware search engines; in a broader context, documents can be automatically annotated with machine-readable metadata, and dialogue can be enhanced with a model of how people view the world. This will reduce ambiguity and confusion in dialogue between people or computers.

Place disambiguation with co-occurrence models

CLEF 2006 Workshop, Working …, 2006

In this paper we describe the geographic information retrieval system developed by the Multimedia & Information Systems team for GeoCLEF 2006 and the results achieved. We detail our methods for generating and applying co-occurrence models for the purpose of place name disambiguation, and our use of named entity recognition tools and text indexing applications. The presented system is split into two stages: a batch text & geographic indexer and a real-time query engine. The query engine takes manually crafted queries in which the text component is separated from the geographic component. Two monolingual runs were submitted for the GeoCLEF evaluation: the first was constructed from the title and description, and the second also included the narrative. We explain in detail our use of co-occurrence models for place name disambiguation using a model generated from Wikipedia. The paper concludes with a full description of future work and ways in which the system could be optimised.
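To make the idea of applying a co-occurrence model to place name disambiguation concrete, the following Python sketch scores each candidate location by how often it co-occurs with the other place names in a document. This is an illustrative assumption about how such a model might be used, not the system described in the paper: the function `disambiguate`, the candidate table, and all counts are invented for the example.

```python
def disambiguate(name, context, candidates, cooc, default=None):
    """Return the candidate location with the highest summed co-occurrence
    count against the context place names, or `default` if none scores > 0.
    """
    best, best_score = default, 0
    for location in candidates.get(name, []):
        # Sum the evidence linking this candidate to each context name.
        score = sum(cooc.get((location, other), 0) for other in context)
        if score > best_score:
            best, best_score = location, score
    return best


# Hypothetical candidate locations for an ambiguous name.
candidates = {"London": ["London, England", "London, Ontario"]}

# Made-up (location, context name) co-occurrence counts.
cooc = {
    ("London, England", "Thames"): 40,
    ("London, England", "Westminster"): 25,
    ("London, Ontario", "Thames"): 5,
    ("London, Ontario", "Toronto"): 30,
}

uk = disambiguate("London", ["Westminster", "Thames"], candidates, cooc)
canada = disambiguate("London", ["Toronto"], candidates, cooc)
fallback = disambiguate("London", [], candidates, cooc, default="London, England")
```

The `default` argument stands in for a fallback rule, such as choosing the most commonly referenced location, when the context provides no evidence.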

University of Pittsburgh at GeoCLEF 2008: Towards effective geographic information retrieval

Abstract. This paper reports the University of Pittsburgh's participation in GeoCLEF 2008. As first-time participants, we worked only on the monolingual GeoCLEF task and submitted four runs under two different methods. Our GCEC method aims to test the effectiveness of our online geographic coordinate extraction and clustering algorithm, and our WIKIGEO method aims to examine the usefulness of the geo-coordinate information in Wikipedia for identifying geo-locations. Our experimental results show that: 1) our online geographic ...

Workshop on geographic information retrieval, SIGIR 2004

ACM SIGIR Forum, 2004

Geographic Information Retrieval is fast emerging as an interdisciplinary hot topic, in both an academic and a commercial sense. Retrieving data based not only on conceptual keywords but also on some notion of the locational relevance of the information requires research into a range of techniques, for example:
• the extraction of geographic terms from structured and, more challengingly, unstructured data;
• the identification and removal of ambiguities in such extraction procedures;
• methodologies for efficiently storing information about locations and their relationships;
• development of search engines and algorithms to take advantage of such geographic information;
• the combination of geographic and contextual relevance to give a meaningful combined relevance to documents; and
• techniques to allow the user to interact with and explore the results of queries to a geographically-aware IR system.