Preprint: Handwritten Text Recognition (HTR) of Historical Documents as a Shared Task for Archivists, Computer Scientists and Humanities Scholars. The Model of a Transcription & Recognition Platform (TRP)

Transforming scholarship in the archives through handwritten text recognition

Journal of Documentation

Purpose: An overview of the current use of handwritten text recognition (HTR) on archival manuscript material, as provided by the EU H2020 funded Transkribus platform. It explains HTR, demonstrates Transkribus, gives examples of use cases, highlights the effect HTR may have on scholarship, and evidences this turning point in the advanced use of digitised heritage content. The paper aims to discuss these issues. Design/methodology/approach: This paper adopts a case study approach, using the development and delivery of the one openly available HTR platform for manuscript material. Findings: Transkribus has demonstrated that HTR is now a useable technology that can be employed in conjunction with mass digitisation to generate accurate transcripts of archival material. Use cases are demonstrated, and a cooperative model is suggested as a way to ensure sustainability and scaling of the platform. However, funding and resourcing issues are identified. Research limitations/implications: The pap...

Handwritten text recognition for historical documents in the transcriptorium project

Proceedings of the First International Conference on Digital Access to Textual Cultural Heritage - DATeCH '14, 2014

Transcription of historical handwritten documents is crucial for making these documents more accessible to the general public. Currently, huge amounts of historical handwritten documents are being made available by on-line portals worldwide. It is not realistic to transcribe these documents manually, and therefore automatic techniques have to be used. tranScriptorium is a project that aims to research modern Handwritten Text Recognition (HTR) technology for transcribing historical handwritten documents. The HTR technology used in tranScriptorium is based on models that are learnt automatically from examples. This HTR technology has been used on a Dutch collection from the 15th century selected for the tranScriptorium project. This paper provides preliminary HTR results on this Dutch collection that are very encouraging, taking into account that minimal resources have been deployed to develop the transcription system.

Transkribus: Handwritten Text Recognition technology for historical documents

2017

Transkribus is a platform for the automated recognition, transcription and searching of handwritten historical documents. Transkribus is part of the EU-funded Recognition and Enrichment of Archival Documents (READ) project. The core mission of the READ project is to make archival material more accessible through the development and dissemination of Handwritten Text Recognition (HTR) and other cutting-edge technologies. The workshop is aimed at researchers and students who are interested in the transcription, searching and publishing of historical documents. It will introduce participants to the technology behind the READ project and demonstrate the Transkribus transcription platform. Our team has already conducted 30 similar workshops over the course of 2016, including several sessions with digital humanities scholars and students. Transkribus can be freely downloaded from the Transkribus website. Participants will be instructed to create a Transkribus account and install Transkribus ...

H2020 Project READ (Recognition and Enrichment of Archival Documents) - 2016-2019

The overall objective of READ is to implement a Virtual Research Environment where archivists, humanities scholars, computer scientists and volunteers collaborate with the ultimate goal of boosting research, innovation, development and usage of cutting-edge technology for the automated recognition, transcription, indexing and enrichment of handwritten archival documents. This Virtual Research Environment will gather thirteen research organisations and archives and make the technology available to the public in various forms: Open Access, Open Research Data, Open Source. A prototype implementation can be explored at the Transkribus website.

Transforming scholarship in the archives through handwritten text recognition: Transkribus as a case study

Journal of Documentation, 2018

An overview of the current use of handwritten text recognition (HTR) on archival manuscript material, as provided by the EU H2020 funded Transkribus platform. It explains HTR, demonstrates Transkribus, gives examples of use cases, highlights the effect HTR may have on scholarship, and evidences this turning point in the advanced use of digitised heritage content. The paper aims to discuss these issues.

Handwritten Text Recognition for Ancient Documents

Journal of Machine Learning Research, 2010

Huge amounts of legacy documents are being published by on-line digital libraries worldwide. However, for these raw digital images to be really useful, they need to be transcribed into a textual electronic format that would allow unrestricted indexing, browsing and querying. In some cases, adequate transcriptions of the handwritten text images are already available. In this work three systems are presented to deal with this sort of document. The first two address two different approaches for semi-automatic transcription of document images. The third system implements an alignment method to find mappings between word images of a handwritten document and their respective words in its given transcription.

tranScriptorium (a European project on handwritten text recognition)

Proceedings of the 2013 ACM Symposium on Document Engineering, 2013

The tranScriptorium project aims to develop innovative, efficient and cost-effective solutions for annotating handwritten historical documents using modern, holistic Handwritten Text Recognition (HTR) technology. Three actions are planned in tranScriptorium: i) improve basic image preprocessing and holistic HTR techniques; ii) develop novel indexing and keyword searching approaches; and iii) capitalize on new, user-friendly interactive-predictive HTR approaches for computer-assisted operation.

Evaluation of an Automated Program for the Transcription of Handwritten Historic Documents (SOP Grant)

2020

The Libraries' project team, in partnership with the University of Florida’s initiative Storms of the Past, Stories for the Future: The Atsena Otie Archaeological Project, requests funding to evaluate an open-source platform, Transkribus, designed to assist with the automated recognition, transcription, and searching of digitized, handwritten, historic texts. The platform will be evaluated for efficacy and accuracy in its ability to generate transcriptions as well as metadata. Documents preselected from the Atsena Otie project will be transcribed and uploaded with metadata as a result of this project. Additional project tangibles include an assessment report, workshop, and a user-friendly manual.

anyOCR: An Open-Source OCR System for Historical Archives

2017 14th IAPR International Conference on Document Analysis and Recognition (ICDAR), 2017

A great deal of research is currently under way on digitizing historical archives, converting scanned page images into searchable full text. anyOCR is a new OCR system that mainly emphasizes the techniques required for digitizing a historical archive with high accuracy. It is an open-source system that the research community can easily apply to the digitization of a historical archive. The anyOCR system can also be used for contemporary document images containing diverse layouts, from simple to complex. This paper describes the current state of the anyOCR system, its architecture, and its major features. The anyOCR system supports a complete document processing pipeline, which includes layout analysis, training OCR models and text line prediction, along with fast, interactive web-based services for layout and OCR error correction.