A Linked Open Data Platform for Historical Geographic Data
Recogito is an open source tool for the semi-automatic annotation of place references in maps and texts. It was developed as part of the Pelagios 3 research project, which aims to build up a comprehensive directory of places referred to in early maps and geographic writing predating the year 1492. Pelagios 3 focuses specifically on sources from the Classical Latin, Greek and Byzantine periods; on Mappae Mundi and narrative texts from the European Medieval period; on Late Medieval Portolans; and on maps and texts from the early Islamic and early Chinese traditions. Since the start of the project in September 2013, the team has harvested more than 120,000 toponyms, manually verifying almost 60,000 of them. Furthermore, the team held two public annotation workshops supported through the Open Humanities Awards 2014. In these workshops, a mixed audience of students and academics of different backgrounds used Recogito to add several thousand contributions on each workshop day. A number of benefits arise out of this work: on the one hand, the digital identification of places – and the names used for them – makes the documents' contents amenable to information retrieval technology, i.e. documents become more easily searchable and discoverable to users than through conventional metadata-based search alone. On the other hand, the documents are opened up to new forms of re-use. For example, it becomes possible to "map" and compare the narrative of texts and the contents of maps with modern-day tools like Web maps and GIS; or to analyze and contrast documents' geographic properties, toponymy and spatial relationships.
Seen in a wider context, we argue that initiatives such as ours contribute to the growing ecosystem of the "Graph of Humanities Data" that is gathering pace in the Digital Humanities (linking data about people, places, events, canonical references, etc.), which has the potential to open up new avenues for computational and quantitative research in a variety of fields including History, Geography, Archaeology, Classics, Genealogy and Modern Languages.
Towards semi-automatic annotation of toponyms on old maps. In: e-Perimetron 9.3 (2014) pp.105–112
Present-day map digitization methods produce data that is semantically opaque; that is, to a machine, a digitized map is merely a collection of bits and bytes. The area it depicts, the places it mentions, any text contained within legends or written on its margins remain unknown - unless a human appraises the image and manually adds this information to its metadata. This problem is especially severe in the case of old maps: these are typically handwritten, may contain text in varying orientations and sizes, and can be in a bad condition due to varying levels of deterioration or damage. As a result, searching for the contents of these documents remains challenging, which makes them hard to discover for users, unusable for machine processing and analysis, and thus effectively lost to many forms of public, scientific or commercial utilization. Fully automatic detection and transcription of place names and legends is likely not achievable with today's technology. We argue, however, that semi-automated methods can eliminate much of the tedious effort required to annotate map scans entirely by hand. In this paper, we showcase early work on semi-automatic place name annotation. In our experiment, we utilize open source tools to identify potential locations on the map representing toponyms. We present how, in next steps, we aim to extend our experiment by exploiting the spatial layout of identified candidates to deduce possible place names based on existing toponym lists. Ultimately, our goal is to combine this work with a toolset for manual image annotation into a convenient online environment. This will allow curators, researchers, and potentially also the general public to “tag” and annotate toponyms on digitized maps rapidly.
Annotating Geographical Entities
This paper describes a study based on the exploration of relations between geographical entities. We propose a new tool for the training and evaluation required by related annotation experiments. It relates to an annotator used for semi-automatic annotation, starting with the geography manual. We define fifteen types of entities: location, geo_position, geology, landform, clime, water, dimension, person, organization, URL, Timex, resource, industry, cultural, unknown, each with its specific subtypes. Moreover, we present the annotation conventions for three semantic relations: referential, structural and spatial, considered to be optimal operators in understanding a geographical manual. Part of the annotation is done manually, while the rest, such as tokens, lemmas and parts of speech, is done automatically. The study is intended to create a tool for the automatic detection of semantic relations in texts on geographic issues such as geography manuals, travel guides, geography atlases, etc., in order to help children, professors, guides and PR specialists, to be useful for tourists, and more generally to help readers discover the complexity and the beauty of nature.
Toponymy is a testimony to the “wisdom of the past”, the stratified cultural heritage of a community. It is therefore necessary to preserve toponyms as an expression of this heritage through a careful critical, interdisciplinary and “global” analysis of their geographical properties and their genesis. With this in mind, the Laboratorio di Cartografia e Toponomastica Storica (Laboratory of Cartography and Historical Toponymy) of the Università degli Studi di Salerno collects, catalogues, analyses and makes scientific use of old maps and toponyms at the national and European scale, for research and teaching in geography. The research, published so far in several essays, is based on an original model of analysis and classification that aims to consider all the different methods of geo-toponymic investigation, both diachronically and synchronically, with reference to the themes of identity and of “geo-culturally” sustainable territorial planning. This contribution proposes in particular the application of new technologies, through the creation of a geo-atlas of toponymy based on GIS and the semantic web, able to interface with other databases and open to interaction through the implementation of a barcode system (based on Web Tags, virtual guides and georeferenced maps).
2018
Bibliotheca Hertziana’s Biondo research group investigates an epistemology of spaces and their changes in early modern history. The focus is on relations between historical maps and texts, with the aim of exploring the historical understanding of spaces and the knowledge associated with it. We take up approaches from cognitive science and computational linguistics, arguing that cognitive maps depict culture-specific spatial knowledge and practices. Our interdisciplinary project combines cognitive-semantic parameters such as toponyms, landmarks, spatial frames of reference, geometric relations, gestalt principles and different perspectives with computational and cognitive linguistic analysis. Using new text and map markup and corpus-specific quantitative methods, historical geographical texts are processed and reinterpreted. Long-term research questions are: Which forms of knowledge represent spatial relations? How can spatial transformation processes be represented and analyzed? What is the con...
Semantically geo-annotating an ancient Greek "travel guide": Itineraries, Chronotopes, Networks, and Linked Data. In Proceedings of the 4th ACM SIGSPATIAL Workshop on Geospatial Humanities (GeoHumanities'20), 2020
Pausanias's second-century CE Periegesis Hellados presents a ten-volume grand tour of the Greek mainland. After the post-Enlightenment rediscovery of ancient Greek literature, his Description of Greece proved highly influential as a guidebook to Greece's antiquities, directing travellers and archaeologists alike to uncovering and interpreting major sites, notably at Athens, Corinth and Olympia. Recent studies focusing on his Description as a narrative, however, have drawn attention to the textual construction of space, and the different ways in which space and place are conceptualised and related to each other. This paper outlines the initial work of the Digital Periegesis project, which is using semantic geo-annotation to capture and analyse the forms of space within and the spatial form of this narrative. In particular, it discusses the challenges and affordances of using geo-parsing, spatio-temporal analysis, network analysis, and Linked Open Data (LOD) for rethinking the geographies of a non-modern literary text as based more on topological connections than topographic proximity.
Towards semi-automatic annotation of toponyms on old maps
Present-day digitization methods produce data that is semantically opaque; that is, to a machine, a digitized map is merely a collection of bits and bytes. The area it depicts, the places it mentions, any text contained within legends or written on its margins remain unknown - unless a human appraises the image and manually adds this information to its metadata. This problem is especially severe in the case of old maps: these are typically handwritten, may contain text in varying orientations and sizes, and can be in a bad condition due to varying levels of deterioration or damage. As a result, searching for the contents of these documents remains challenging, which makes them hard to discover for users, unusable for machine processing and analysis, and thus effectively lost to many forms of public, scientific or commercial utilization. Fully automatic detection and transcription of place names and legends is likely not achievable with today’s technology. We argue, however, that s...
Defining and identifying the roles of geographic references within text
Proceedings of the HLT-NAACL 2003 Workshop on Analysis of Geographic References, 2003
Reliably recognizing, disambiguating, normalizing, storing, and displaying geographic names poses many challenges. However, associating each name with a geographical point location cannot be the final stage. We also need to understand each name's role within the document, and its association with adjacent text. The paper develops these points through a discussion of two different types of historical texts, both rich in geographic names: descriptive gazetteer entries and travellers' narratives. It concludes by discussing the limitations of existing mark-up systems in this area.
2009 International Multiconference on Computer Science and Information Technology, 2009
The Language and Location: Map Annotation Project (LL-MAP) has been funded by the US National Science Foundation to build a database of linguistic information integrated into a Web-based geographical information system. LL-MAP embodies several innovative concepts of computational linguistics, such as spatial data engine driven architecture, dynamic joining of linguistic information with related cultural and geographic data, multi-layered and linked visualization, real time online data harvesting, collaborative toolboxes for linguistic studies, quick search of digital gazetteers, and toponymical analysis. This paper will demonstrate these LL-MAP functions and discuss their disciplinary implications in linguistic studies.
2013
The emergence of the Semantic Web and its underlying knowledge technologies has brought changes in data handling. Transferring expert knowledge to machines through knowledge formalization provides us with the required support in managing huge datasets like the information in the World Wide Web. In the field of geospatial technology, semantic technologies not only offer the capability to achieve a higher degree of data integration but also make it possible to infer semantics in order to discover new and hidden knowledge. This is of particular interest in the field of archaeology, where complex interrelations among heterogeneous datasets exist. Although research on semantics is an active area in geospatial communities, its initial use is mainly for spatial data integration. This article tries to go one step further and apply semantics to spatial knowledge discovery through spatial built-ins within SWRL and SPARQL. The work resembles the approach of the Open Geospatial Consortium (OGC) to define standards for...
A Survey of Textual Data & Geospatial Technology
Springer eBooks, 2020
The geographic realm can be viewed as a three-dimensional space projected onto the ellipsoid that represents planet Earth. For navigation purposes, this space has been projected down to two dimensions to create maps for centuries, and human communications and actions have been made more precise by using a grid of coordinates, latitude and longitude, to uniquely and exactly identify any point location on our planet of origin. But latitude/longitude pairs are not the first or only way to communicate about locations: human communication has used language to name and describe places and how to get there since before a grid coordinate system was conceived, and referring by name ("New York") or description ("the green hill") remains more popular for human-to-human communication than grid references: people name the most relevant locations they inhabit by assigning words to them (toponyms) by convention, and then use these to collaborate (e.g. to instruct another human how to reach a place using navigation instructions). In this chapter, we discuss how these two ways of referencing locations, the numeric, precise but less human-friendly one and our primary means of communication, languages like English and others, can be linked through automatic means, and we explore what applications are enabled now that this is possible. The remainder of this chapter is structured as follows. Section 16.2 dissects the notion of location from different perspectives and poses a list of research questions that we may ask when looking at the domain where geographic space and textual data intersect. Section 16.3 describes some data structures for spatial indexing, which permit fast computational operations. Section 16.4 describes
Premodern Geographical Description: Data Retrieval and Identification
Geographical and spatial descriptions in the premodern world are structurally different from those of the modern era, where spatial understanding is based on cartographic navigation. This paper presents an experimental process to tag, retrieve, and identify geographical information as described in premodern primary sources, together with the issues and possible solutions. The proposed method defines specific categories of geographical information and a markdown system to mark these categories in the source. Having tagged the data, we extract it, and geographical locations and their connections are identified through a heuristic approach: the extracted geographical entities are initially aligned with existing geographical references and secondary sources. String similarity approaches may provide fuzzy identifications, which need to be verified and disambiguated. In this paper, we describe the process of annotation and extraction of geographical descriptions, experiment with some toponym matching metrics, report the results, and offer possible solutions to handle disambiguation through the existing contextual information in the source. The process is applied to two different datasets, proposed as test cases: a classical Arabic geographical text and a Roman itinerary.
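The fuzzy matching step described above can be illustrated with a minimal Python sketch. The abstract does not say which similarity metrics were tested, so this example assumes the ratio from the standard library's `difflib.SequenceMatcher` as a stand-in; the gazetteer entries, the 0.8 threshold, and the function names `normalize` and `rank_candidates` are hypothetical and chosen for illustration only.

```python
import unicodedata
from difflib import SequenceMatcher


def normalize(name: str) -> str:
    """Lowercase a name and strip diacritics so spelling variants compare more fairly."""
    decomposed = unicodedata.normalize("NFKD", name)
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return stripped.lower().strip()


def rank_candidates(toponym, gazetteer, threshold=0.8):
    """Return (gazetteer name, score) pairs scoring at or above threshold, best first."""
    target = normalize(toponym)
    scored = [
        (entry, SequenceMatcher(None, target, normalize(entry)).ratio())
        for entry in gazetteer
    ]
    matches = [pair for pair in scored if pair[1] >= threshold]
    return sorted(matches, key=lambda pair: pair[1], reverse=True)


# Hypothetical gazetteer entries (assumption, not data from the paper).
GAZETTEER = ["Antioch", "Antiocheia", "Alexandria"]

candidates = rank_candidates("Antiochia", GAZETTEER)
for name, score in candidates:
    print(f"{name}: {score:.2f}")
```

The ranked list is only a set of candidates: as the abstract notes, fuzzy identifications still need to be verified and disambiguated, for example against contextual information in the source.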
Geospatial semantics is a broad field that involves a variety of research areas. The term semantics refers to the meaning of things, and is in contrast with the term syntactics. Accordingly, studies on geospatial semantics usually focus on understanding the meaning of geographic entities as well as their counterparts in the cognitive and digital world, such as cognitive geographic concepts and digital gazetteers. Geospatial semantics can also facilitate the design of geographic information systems (GIS) by enhancing the interoperability of distributed systems and developing more intelligent interfaces for user interactions. During the past years, a lot of research has been conducted, approaching geospatial semantics from different perspectives, using a variety of methods, and targeting different problems. Meanwhile, the arrival of big geo data, especially the large amount of unstructured text data on the Web, and the fast development of natural language processing methods enable new research directions in geospatial semantics. This chapter, therefore, provides a systematic review on the existing geospatial semantic research. Six major research areas are identified and discussed, including semantic interoperability, digital gazetteers, geographic information retrieval, geospatial Semantic Web, place semantics, and cognitive geographic concepts.
The two authors of this paper (a linguist and a historian) work together in a research team on synchronic toponymy (Lidile EA 3874): the aim of this joint work is the elaboration of a multilingual database (Dinopro) and of a systematic linguistic description of toponyms in the eleven languages covered by the team. The results of our research (see Löfström & Schnabel-Le Corre (in press) or Löfström & Schnabel-Le Corre 2005) made us consider the insertion of linguistic data on a cartographic medium, and therefore examine the implications of the insertion of toponyms in maps. The results of our analysis inspired this paper, in which we treat toponym localization, naming processes and inscription principles from a historical perspective and on the basis of linguistic criteria. Our contribution can be looked upon as the point of view of an enlightened map reader. By applying two scientific approaches to some examples of toponyms on maps, we wish to highlight a number of implications that are normally forgotten by users and sometimes not explicitly stated by producers. Thanks to these observations, we will be able to show cases in which cartographic conventions and techniques can collide with the formal and semantic complexities carried by toponyms. Linguistic specificities appear more clearly in a contrastive perspective: we will therefore adopt a multilingual approach in our discussion concerning the constraints on toponym selection. We will otherwise proceed by keeping in mind the particular role of toponyms in cartographic representations, omnipresent although always complementary (you can hardly imagine a map intended only to represent the toponyms of a given territory, except for certain pedagogical applications): the toponyms selected for a given map always depend on the overall aims of this map, as well as on its scale. Toponyms, although essential in a map, are hardly ever the main object and purpose of a cartographic document.
We will first discuss the implications of toponym insertion in maps through a series of historical examples and then proceed to the linguistic analysis of toponym structure and of its role for the inscription on a cartographic medium.
Toponyms and maps: historical notes
Historians of cartography normally study the conventions which shape the production of maps, their task being not only the analysis of the finished document, but also of the process that made this document exist, circulate and survive. The techniques available, the social expectations, the particular object of the document, the shared practices, the aim of the activity are all part of this process and therefore the object of the historian. When toponyms are inserted in cartography they are inserted in a knot of techniques, scientific practices and social conventions which are subject to change. The question for the historian is: how do toponyms interact with this changing basis? What are the issues of the interaction? We isolate three interdependent issues of the insertion of toponyms in maps: to locate, to produce, to name.
To locate. The process of location of the toponym depends on two preliminary choices: a decision has to be taken about the toponyms which will appear on the final document and the ones which will have to be excluded; then the selected toponyms have to be adequately placed. The techniques of representation of the relief have direct influence over these choices. Whenever objects in frontal projection are used to figure heights, like the stylized mountains in this 18th century map of Liguria (Blaeu 16..), the room available for toponyms becomes scarce.
Representing places in texts: a spatial investigation into Agathemerus
IJHAC: International Journal of Humanities and Arts Computing. 15.1-2, 2021
This article presents a case study for the digital mapping of an ancient Greek geographical compendium, the Sketch of Geography by Agathemerus. We examine various possibilities of investigation, including semantic annotation, georeferencing and network analysis, to verify how the digital mapping of a text can contribute to a better understanding of its underlying spatial perception. We examine the following aspects: spatial distribution, functionality and frequency of place types, semantic/symbolic definition of boundaries, place connectivity and problems of textual corruption. In the conclusion, we show that, while the general perspective of the work is programmatically speculative, Agathemerus' way of modelling the world is navigational and pragmatic. A predominantly non-cartographic perspective dictates a way of reasoning about space that is highly semantical in the definition of important landmarks and spatial relations. However, it also determines a strongly navigational approach in the treatment of geographical problems. Finally, we emphasize the value of an integrated semantic and mapped approach to the investigation of premodern geographies, and the opportunities of using these methods to address old and new research questions.
Annotation of toponyms in TEI digital literary editions and linking to the web of data
Matlit, 2016
This paper aims to discuss the challenges and benefits of the annotation of place names in literary texts and literary criticism. We shall first highlight the problems of encoding spatial information in digital editions using the TEI format by means of two manual annotation experiments and the discussion of various cases. This will lead to the question of how to use existing semantic web resources to complement and enrich toponym mark-up, in particular to provide mentions with precise geo-referencing. Finally the automatic annotation of a large corpus will show the potential of visualizing places from texts, by illustrating an analysis of the evolution of literary life from the spatial and geographical point of view. DOI: http://dx.doi.org/10.14195/2182-8830_4-2_3
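As a rough illustration of toponym mark-up in TEI, the following Python sketch extracts `placeName` elements and their `ref` attributes from a small TEI fragment using only the standard library. The sample sentence, the `https://example.org/...` gazetteer URI, and the function name `extract_place_names` are assumptions made for this example; they are not taken from the paper, which does not describe its encoding in this abstract.

```python
import xml.etree.ElementTree as ET

# The TEI P5 namespace; the sample document below is hypothetical.
TEI_NS = {"tei": "http://www.tei-c.org/ns/1.0"}

SAMPLE = """<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <text><body>
    <p>She left <placeName ref="https://example.org/gazetteer/1">Lisbon</placeName>
       for <placeName>Coimbra</placeName>.</p>
  </body></text>
</TEI>"""


def extract_place_names(tei_xml):
    """Return (surface form, ref or None) for each placeName, in document order."""
    root = ET.fromstring(tei_xml)
    results = []
    for element in root.iterfind(".//tei:placeName", TEI_NS):
        surface = "".join(element.itertext()).strip()
        results.append((surface, element.get("ref")))
    return results


places = extract_place_names(SAMPLE)
# Mentions without a ref are the ones still waiting to be linked to a gazetteer.
unlinked = [surface for surface, ref in places if ref is None]
print(places)
print(unlinked)
```

Listing the mentions that lack a `ref` is one simple way to see which toponyms still need to be linked to an external resource for geo-referencing.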
Revisiting Linking Early Geospatial Documents with Recogito
2019
Recogito is a web-based environment for collaborative semantic annotation. It is open source software, and provides support for working with either text or image documents, including those served via the IIIF protocol. Originally, the tool was designed for geographic annotation, i.e. the transcription, marking up and geo-resolving of maps and geographical texts (such as itineraries and travel reports) in the context of historical scholarship, e.g. to map or extract data from a source, or to prepare a digital edition. Over time, however, Recogito’s feature set has grown to provide more general annotation functionality, broadening the scope for further potential application areas. Following up on an earlier article we published in e-Perimetron in 2015, in which we first introduced Recogito, this article looks back on the past four years of use and development. We present how Recogito has technologically evolved; how it has been applied in practice in different projects and for ...