Map Text Annotation Guidelines
Introduction
While there are conventions in several disciplines for annotating images and others for annotating texts, there is little overlap between the two. In these guidelines, we propose a method for annotating text on scanned map images for the purpose of collecting similarly structured datasets across one or many collections of maps.
Map text data can be collected in an annotation environment of your choice. However, on Machines Reading Maps, we work with a map-specific version of the Recogito semantic annotation platform to support data collection. The main objective is to turn text on map images into structured data.
To support the Machine Learning methods being used on Machines Reading Maps, we need gold standard annotations created manually. To maximise their efficiency and reliability, it is important that the annotations we use in our project are created consistently, and at a very high standard.
To increase transparency and replicability of our work, we are sharing the annotation guidelines that we followed. They were elaborated to meet different needs in our multidisciplinary research: to test and improve machine learning models and to produce meaningful data from maps for humanities research.
Decisions about best practices for annotation are therefore based on a combination of factors:
- attention to accurate transcription of historical text,
- reuse of established text annotation practices where possible,
- and the requirements of computer vision tasks.
These guidelines are not (necessarily) prescriptive. If you are interested in reusing our tools, you are free to follow a set of guidelines specific to your research goals. On the other hand, these guidelines provide an example of how to bridge the concerns of computer vision and information retrieval communities with those of the humanities or other domains. We think that sharing our annotation methodology can help other researchers to find their own approach to computational research with historical maps, while also providing documentation for how we collected gold standard data specific to our experiments and map collections.
These guidelines will follow the four main steps of the annotation process, which mirror the different computational tasks in the Machines Reading Maps method:
- creating bounding polygons,
- transcribing the text,
- linking to external knowledge bases,
- and semantic type tagging.
1. DRAWING BOUNDING POLYGONS
Text on maps appears in many different styles, sizes and orientations, making it sometimes challenging to create bounding polygons. For this reason, Recogito offers a suite of annotation tools that are designed specifically for text on maps. You can find out more about each of the tools in our wiki. In this document, we will focus on three main annotation options:
Text aligned horizontally: if the text looks fairly straight, a good option is the basic rectangular annotation. It will enable you to create a bounding rectangle quickly and easily, but it won’t be possible to change the orientation.
Text with a different orientation: if the text is still on a straight line but it has a different orientation, you can take advantage of the tilted box, which enables you to draw boxes with any orientation.
Complex shapes: The rectangular and tilted box tools will be suitable for the majority of bounding polygons on most maps. However, there will be cases in which the words on the map have been arranged in a more complex way, to follow the course of a river or a street, for example. In these cases, the best choice is the polygon tool, which allows you to draw a bespoke bounding polygon by placing each of its points/vertices.
In general, we recommend choosing the simplest suitable tool and avoiding, for example, the polygon tool for very straightforward labels.
When drawing the bounding polygons around the text, it is important that all characters are visible within the polygon. Consider zooming into the map during the drawing process to be sure that you are very close to the edges of the characters/numerals, but not on top of or passing through them.
Pay special attention to letters that extend above or below the rest of the word (such as “f” or “p”) and to initials.
Overlaps: When the text on the map is particularly dense, it is likely that the polygons will slightly overlap, but that won't affect the quality of the annotation. When possible, try to avoid including any element in the bounding polygon that is not the annotated text (like icons, building outlines, or other words), although we are aware that this is not always possible.
How many polygons?
Maps feature different kinds of longer text such as compound names or additional information (distance, height, capacity, etc.). We want to be able to capture the relationship between these related components, while, at the same time, creating useful and precise data for evaluating the model.
Compound labels and grouping: for compound labels we create one separate polygon for each word (including short words, abbreviations, or digits). All components of the same label are then recorded as part of a single unit through the grouping function. You can read more about how grouping works at the end of this page of our wiki. We use the numbered group option, so that we know the order of the words in a multi-word label.
We apply the same principle when there is any other extra text attached to a place name, for example the height of a mountain or the capacity of a theatre. Each word is enclosed in a bounding polygon, and then they are all grouped together and ordered.
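As an illustration of numbered grouping, the sketch below models each word as its own annotation that carries a group identifier and a position within the group. The field names (`group_id`, `order`) and the `read_group` helper are hypothetical and do not reflect Recogito's actual data format; the label is the "St. Michael's Church" example used later in these guidelines.

```python
from dataclasses import dataclass


@dataclass
class WordAnnotation:
    """One bounding polygon with its transcription (hypothetical schema)."""
    text: str        # transcription exactly as printed on the map
    group_id: str    # identifier shared by all words of the same label
    order: int       # position of the word within the numbered group


def read_group(annotations, group_id):
    """Return the words of one group in reading order, joined by spaces."""
    members = [a for a in annotations if a.group_id == group_id]
    members.sort(key=lambda a: a.order)
    return " ".join(a.text for a in members)


# Annotations may be stored in any order; the group number restores the sequence.
annotations = [
    WordAnnotation("Church", "g1", 3),
    WordAnnotation("St.", "g1", 1),
    WordAnnotation("Michael's", "g1", 2),
]
label = read_group(annotations, "g1")
```

Because the order is stored with each word, the full label can be rebuilt without adding any space to the individual transcriptions.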
Punctuation: Signs such as parentheses, dots, or commas should be included with the word that they relate to. Do not annotate them separately.
Text with large spacing: Some words on maps are printed with a large gap between characters. This is usually the case for names of regions or counties that describe a large area of the map. When annotating these labels, for the purpose of evaluating and improving machine learning model performance, it is better to encase each character in its own polygon, and then group those characters together as a word. As a general rule, we annotate map text in this way when the space between characters is larger than the size of the characters themselves.
When the names with large spacing are also compound, they need a little extra element during the annotation process that is detailed in the "transcribing" section below.
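The rule of thumb for widely spaced text can be expressed as a simple comparison. The following is a minimal sketch, assuming that character widths and gaps have already been measured in pixels; the function name and the use of medians are choices made for this illustration, not part of the guidelines.

```python
from statistics import median


def needs_per_character_polygons(char_widths, gaps):
    """Apply the rule of thumb: one polygon per character when gaps exceed character size.

    char_widths: widths of the characters of a word, in pixels.
    gaps: widths of the spaces between consecutive characters, in pixels.
    Comparing medians (rather than every single gap) is an assumption of this sketch.
    """
    if not char_widths or not gaps:
        # A single character, or no measurements: nothing to split.
        return False
    return median(gaps) > median(char_widths)


# Hypothetical measurements for a widely spaced county name and a regular label.
spaced_out = needs_per_character_polygons([20, 22, 21, 20], [60, 58, 62])
regular = needs_per_character_polygons([20, 22, 21, 20], [3, 4, 3])
```

In practice annotators make this decision by eye; the sketch only makes the threshold explicit.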
2. TRANSCRIBING THE TEXT
We transcribe the text exactly as it appears on the map. That includes abbreviations, capitalisation, or name spellings that may vary from current ones. The text also needs to be transcribed completely, i.e. every single word, punctuation mark, and number.
Transcribing compound labels with large spacing
Ideally, when annotating this kind of map text, we would group the characters from each word first, and then create a higher-level group to link the words together, as we do with the regular compound labels. However, this function is not yet available in the current interface, so we need a suitable workaround. The issue is representing the space between words that disappears when all characters of a compound name are grouped together (like the space between “Saint” and “Louis” in our example). That space is crucial for the model to correctly identify the letters S A I N T L O U I S and understand that it is a name made up of two separate, but linked, words. To record that space, we add a “_” sign to represent the separation between words. The sign is arbitrary, but needs to be used consistently.
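The effect of the “_” convention can be sketched as follows. The example assumes that the character transcriptions of a group are available in reading order and that the separator was typed together with one of the characters; the function name is hypothetical. Concatenating the characters and splitting on the separator recovers the individual words.

```python
WORD_SEPARATOR = "_"  # arbitrary sign, as long as it is used consistently


def words_from_characters(char_transcriptions):
    """Rebuild the words of a compound label from its ordered character transcriptions."""
    joined = "".join(char_transcriptions)
    # Drop empty pieces so that a stray leading or trailing separator is harmless.
    return [word for word in joined.split(WORD_SEPARATOR) if word]


# One transcription per character polygon; the separator marks the end of the first word.
characters = ["S", "A", "I", "N", "T_", "L", "O", "U", "I", "S"]
words = words_from_characters(characters)
```

The same function works if the separator is recorded as its own element, since only the concatenated string matters.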
Transcribing incomplete or unclear text
Especially when dealing with older maps, it is not uncommon that the text is partly obscured or hard to read. Instead of excluding these labels from our corpus, we record as much information as possible, and document the gaps using conventions for accounting for uncertainty.
Known number of characters: When a word on the map has any number of characters that are missing or illegible, we substitute each of them in the transcription with the sign “#”. So, for example, in the map that we have selected, the annotated place name is hard to read, and the first character could easily be either a "V" or a "Y". In this case we would transcribe the place name as #ISALIA. For the purpose of these guidelines, we have also obscured the second character of the place name. If we were unable to clearly detect the first two characters of the place name, this time we would transcribe it as ##SALIA. Illegible characters can be in any position of the word, not necessarily at the beginning. As long as the total number of characters is known, you can substitute any unreadable or uncertain character with the symbol #.
Unknown number of characters: In some cases, it is not possible to guess exactly how many characters compose the unclear word. When transcribing this kind of label, we represent these more generic gaps with a pair of square brackets enclosing a blank space, “[ ]”.
In this second example, it is relatively easy to decipher the first part of the place name ("Tay"), but the second half looks more problematic, and it is difficult to guess how many characters follow. We would then transcribe it as "Tay[ ]".
If you feel confident in guessing what characters/numerals are hard to read (because of additional research or expert knowledge of the area), you can type your educated guess within the brackets, instead of the blank space. To go back to our example, if you believe that the place name partly obscured on the map reads Taylor's, you can transcribe the place name as Tay[lor's].
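Anyone processing these transcriptions later will need to recognise the three conventions above. The following is a minimal sketch of how that could be done; the function name and the keys of the returned dictionary are hypothetical, and it assumes that “#” and square brackets are never part of the text printed on the map.

```python
import re

# Text between a pair of square brackets, e.g. "[ ]" or "[lor's]".
BRACKETS = re.compile(r"\[([^\]]*)\]")


def describe_uncertainty(transcription):
    """Summarise the uncertainty markers found in one transcription."""
    bracket_contents = [m.strip() for m in BRACKETS.findall(transcription)]
    return {
        # Each "#" stands for exactly one missing or illegible character.
        "illegible_characters": transcription.count("#"),
        # Empty brackets mark a gap of unknown length.
        "open_gaps": sum(1 for content in bracket_contents if not content),
        # Non-empty brackets hold the annotator's educated guess.
        "guesses": [content for content in bracket_contents if content],
        "is_certain": "#" not in transcription and not bracket_contents,
    }


known_length = describe_uncertainty("##SALIA")
unknown_length = describe_uncertainty("Tay[ ]")
educated_guess = describe_uncertainty("Tay[lor's]")
```

A summary of this kind makes it possible to filter uncertain labels in or out of a dataset without altering the transcriptions themselves.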
Spacing
Generally, we suggest not adding blank spaces after punctuation marks, even where they would make the transcription more legible. In the case of single-word labels such as "Ch." (abbreviation of "Church"), the transcription should stop with (but include) the dot. When annotating multi-word labels, the necessary separation between the different components of the label is achieved through the use of groups. So, for example, "St. Michael's Church" will be annotated as three separate polygons ("St.", "Michael's" and "Church") linked via the grouping feature. No space should be added in the transcription process. This suggestion proved functional with the corpora of maps we annotated for our project, but different maps may have different typographic conventions that require an alternative approach.
Mistakes and incorrect text
To create reliable gold standard annotations while being philologically rigorous, it is important that no corrections are made during the transcription process. We can add clarifications in the comments field of the annotation interface. Errors and other idiosyncrasies that we find on the maps are actually very valuable pieces of information that can tell us a lot about place interpretation as well as the map making process itself.
Transcribing outside the neatline
The task of transcribing everything on each map includes all the text that is outside the neatline, i.e. text that is not part of the representation of the place. This kind of text may feature information about the cartographer, the printer, dates of publication, and so on. Text outside the neatline will be annotated following the same principles as those for the content inside the neatline. The words that make up a sentence outside the neatline don’t need to be grouped, unless they form a compound place name or a personal name: names with more than one word should be grouped.
Abbreviations
Abbreviated names need to be transcribed as they appear on the map. However, the comments field in the Recogito annotation pop-up can be used to add additional information, such as the expanded names.
3. LINKING TO EXTERNAL KNOWLEDGE BASES (KBs)
The linking process during the annotation of text on maps is still very experimental, and each KB requires a different approach because of its different nature. These guidelines will only discuss entity linking using a subset of WikiData, which is the knowledge base we have decided to test in this context. Linking to other KBs such as historical gazetteers or OpenStreetMap is possible, but it is not part of the gold standard for this stage of the project.
Links to WikiData: When dealing with historical maps it is important to account for the many changes that the landscape may have undergone. For this purpose we have chosen WikiData as our KB, and created a library selecting the entities categorised as "place" that fall within the boundary of our area of interest. Unlike other resources that exclusively feature very recent data, WikiData accounts, at least partially, for entities that may have disappeared, moved, or undergone transformation.
It is exactly the temporal dimension that makes some of the matches between place names on the map and WikiData not obvious and, perhaps, even more powerful. Links between past locations and modern places (especially buildings), or modern locations and past names and functions are not only considered correct but also encouraged. When WikiData shows two matches for a place on the map (for example an entry for a former military hospital and an entry for the boarding school it became after the war) both should be connected to the annotation. The statement we are making through this link is not that the entry in WikiData perfectly identifies the entity on the map, but that there is an informative relationship between the two; a relationship that researchers may want to investigate further.
WikiData is maintained by volunteers and it is not a consistent and systematic resource. Especially when dealing with very large-scale maps, like urban maps, it is likely that only the major landmarks will have a match in WikiData. Don't feel discouraged if only a handful of annotations can be successfully linked.
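One way to picture an annotation that carries more than one link is sketched below. The record layout and the `add_link` helper are hypothetical, and the two identifiers are placeholders rather than the actual entries for any hospital or school. The only fact relied on is that Wikidata item identifiers consist of the letter Q followed by a number.

```python
import re

# Wikidata item identifiers: "Q" followed by a positive integer.
QID_PATTERN = re.compile(r"Q[1-9]\d*")


def add_link(annotation, qid):
    """Attach a Wikidata item identifier to an annotation, keeping earlier links."""
    if not QID_PATTERN.fullmatch(qid):
        raise ValueError(f"not a Wikidata item identifier: {qid!r}")
    if qid not in annotation["links"]:
        annotation["links"].append(qid)
    return annotation


# Hypothetical annotation linked to two entries that both relate to the same building.
annotation = {"text": "Hospital", "links": []}
add_link(annotation, "Q100000001")  # placeholder for the former military hospital
add_link(annotation, "Q100000002")  # placeholder for the later boarding school
```

Storing the links as a list reflects the idea that each link records an informative relationship, not a claim of exact identity.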
4. SEMANTIC TYPE TAGGING
Transcriptions can be complemented by semantic tags, i.e. words that identify categories of interest to the annotator. (These might be types of things like streets or buildings, or they might represent more abstract concepts like graphical styles.) Creating tags in Recogito is very easy: just type a word in the “Add tag” area and then press the “enter” key to confirm. If you see the tag encased in a rectangle, the tag has been successfully added.
NB: Tags must be added separately for each word in a group.
Tags
Ideally, each annotation should state the type of map feature it identifies. The three categories that appear in the Recogito annotation interface are discussed in this section of our wiki. However, for this project we are annotating exclusively one kind of feature, i.e. labels. This initial focus makes it redundant to categorise each annotation as "label", although doing so would be useful if we were also annotating entities and symbols.
Each annotation should have at least two tags.
- Type of label: the first tag should state the type of label that is annotated.
  - a. name: if it is a unique name identifying a named entity (like the name of a hamlet or the name of a theatre).
  - b. type: if it is a generic name that indicates a type of feature, such as "hospital" or "spring".
  - c. numeral: if it is a number, like a postcode or the height of a mountain.
  - d. otherText: if it does not belong to any of the previous categories.
- Type of feature: the second tag should tell us something about the kind of feature that has been annotated. We have identified five macro categories, and we are in the process of gathering more data in order to refine our minimal taxonomy.
  - a. area: for all county names, district names, cities, towns and hamlets. In general, this is the appropriate tag for any kind of administrative division.
  - b. street: for all street and road names, but also squares, courtyards, crescents or, basically, anything that could be used as an address.
  - c. building: for any built structure, rural or urban: palaces, hospitals, theatres, but also windmills or warehouses.
  - d. natural: can be applied to all natural features, such as rivers, hills, mountains, creeks, bays and so on.
  - e. other: for all the cases that do not belong to any of the previous four categories.
Annotations can belong to more than one category. A label with the name of a bridge, for example, could be tagged as "street" as well as "building". Annotators could choose one or the other, depending on context, or use them both.
- Special tags: Extra tags can be added to focus on a particular research question. In our case study, we want to highlight text that does not pertain to the description of place, i.e. it's not about the object (city, state, landscape, and so on) that the map wants to depict and represent, but it is about the map itself as a cultural object. This kind of text includes, for example, the name of the printer or the cartographer, indexes, page numbers, library stamps, and even handwritten annotations. To highlight this information, we use the tag "meta".
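The tagging scheme above lends itself to a simple consistency check. The following is a minimal sketch, assuming that the tags of an annotation are available as an ordered list of strings; the `check_tags` function is hypothetical and is not a feature of Recogito.

```python
LABEL_TYPES = {"name", "type", "numeral", "otherText"}
FEATURE_TYPES = {"area", "street", "building", "natural", "other"}


def check_tags(tags):
    """Return a list of problems found in an annotation's tags (empty if none)."""
    problems = []
    if len(tags) < 2:
        problems.append("fewer than two tags")
    if not tags or tags[0] not in LABEL_TYPES:
        problems.append("first tag is not a label type")
    # More than one feature type is allowed, e.g. a bridge tagged as street and building.
    if not any(tag in FEATURE_TYPES for tag in tags[1:]):
        problems.append("no feature type after the first tag")
    return problems


bridge = check_tags(["name", "street", "building"])
printer = check_tags(["name", "other", "meta"])
incomplete = check_tags(["building"])
```

Special tags such as "meta" are accepted as extra tags and are not checked against the two controlled lists.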