Use of Computer and Corpus Tools in the Research of a 19th-Century German-Language Manuscript Book of Notes and Extracts
Related papers
Words in Contexts. Digital Editions of Literary Journals in the "AAC-Austrian Academy Corpus"
2007
This paper presents two highly innovative digital editions whose creation and implementation take into account the latest developments in corpus research. The digital editions of the historical literary journals "Die Fackel" (published by Karl Kraus in Vienna from 1899 to 1936) and "Der Brenner" (published by Ludwig Ficker in Innsbruck from 1910 to 1954) were developed within the corpus research framework of the "AAC-Austrian Academy Corpus" at the Austrian Academy of Sciences, by researchers and programmers of the AAC in Vienna in collaboration with the Los Angeles graphic designer Anne Burdick. In creating these scholarly digital editions, the AAC applied its edition philosophy and edition principles and used new corpus research methods to address questions of computational philology and textual studies in a digital environment. The online editions of "Die Fackel" and "Der Brenner" illustrate the potential and the benefits of making corpus research methods and techniques available for scholarly research into language and literature.
"Analysis of Ancient and Medieval Texts and Manuscripts: Digital Approaches" discusses the possibilities opened up by collaboration between classical studies and digital resources, in order to explore what the future of the digital humanities could be. The digital revolution has changed the approach to research, especially in the humanities. Over the last decades, image scanning, transcription and the creation of digital text archives have become established practice, but many scholars are not aware of what these resources can achieve for their own studies. For that reason, the volume aims to show the possibilities that computer-assisted methods offer for the analysis of ancient and medieval codices. In particular, it focuses on the combination and comparison of data, which can lead
Digital Editions for Corpus Linguistics: Representing manuscript reality in electronic corpora
Editing involves making decisions which are practical on the surface, but have underlying hermeneutic and theoretical implications (cf. e.g. Machan 1994: 2-5). When the aim is to create digital editions which encode a wide range of manuscript-related phenomena into standardised XML markup, the challenge to editorial principles is significant. The issue is further complicated by the heterogeneous target audience: historians and linguists can have widely differing assumptions about what constitutes data and how it should be presented. Consequently, it is necessary to outline the underlying theoretical orientations of the DECL project, and to place them in the context of theory and bibliographical practice within the field.
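The abstract does not spell out the DECL markup itself. As a hedged illustration of how one XML encoding can serve both historians and linguists, the Python sketch below assumes TEI P5's `<choice>`, `<abbr>` and `<expan>` elements (real TEI elements, but not confirmed here as DECL's scheme) to keep a manuscript abbreviation and its editorial expansion side by side. The helper names `encode_abbreviation` and `reading` and the sample abbreviation are hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import Optional

# TEI P5 namespace, registered as the default so the output carries no prefixes.
TEI_NS = "http://www.tei-c.org/ns/1.0"
ET.register_namespace("", TEI_NS)


def tei(tag: str) -> str:
    """Qualify a tag name with the TEI namespace (ElementTree's {uri}tag form)."""
    return f"{{{TEI_NS}}}{tag}"


def encode_abbreviation(abbr_text: str, expan_text: str) -> ET.Element:
    """Hypothetical helper: store the manuscript form and the editor's
    expansion as alternatives inside one TEI <choice> element."""
    choice = ET.Element(tei("choice"))
    ET.SubElement(choice, tei("abbr")).text = abbr_text    # as written in the manuscript
    ET.SubElement(choice, tei("expan")).text = expan_text  # editorial expansion
    return choice


def reading(choice: ET.Element, view: str) -> Optional[str]:
    """Return the diplomatic view ('abbr') or the normalised view ('expan')."""
    child = choice.find(tei(view))
    return None if child is None else child.text


# Illustrative example: the abbreviation "wt" expanded to "with".
word = encode_abbreviation("wt", "with")
print(ET.tostring(word, encoding="unicode"))
print(reading(word, "abbr"), "/", reading(word, "expan"))
```

In this sketch a linguistic tool can read the `expan` view while a manuscript-oriented reader keeps the `abbr` view, without the text being stored twice.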
06491 Summary -- Digital Historical Corpora Architecture, Annotation, and Retrieval
2006
The seminar brought together scholars from (historical) linguistics, (historical) philology, computational linguistics and computer science who work with collections of historical texts. These texts, digital libraries or corpora¹ are compiled for a number of different purposes, such as lexicography, history, linguistics and philology, which naturally leads to different decisions about their design and architecture. However, many issues are common to projects working with historical texts. These include:

Standards and methods of digitization: Historical texts have to be digitized from different sources. Sometimes it is necessary to digitize directly from a manuscript or early print. In these cases current OCR technology cannot be used, and the texts have to be double-keyed (for example, according to the standards developed at the Kompetenzzentrum Retrodigitalisierung in Trier). More recent texts can sometimes be scanned and OCRed, although even relatively 'clean' 19th-century newspaper texts are often problematic. Fraktur and some other scripts (e.g. old Cyrillic scripts) also pose problems for OCR. For some research questions it is possible to work with editions. In these cases digitization itself is not an issue (if the editions are recent); it has to be decided, however, how to deal with the critical apparatus.

Design (composition) of corpora: While literary scholars often work on one text (or a small number of related texts), many research questions in linguistics and lexicography require a collection of several texts. Corpus design is, of course, always an issue in corpus construction. Ideally, a matrix of the necessary parameters (text type, author, period, etc.) is constructed and every 'cell' is filled with appropriate texts. For older periods this is often impossible because the texts may not have survived, and a 'skewed' corpus only permits certain research questions.
Standards and methods of annotation: For many research questions the 'naked' text is not sufficient; the texts need to be annotated with further information: (a) header annotation (information about the whole text), (b) positional annotation (annotation for each token), and (c) structural annotation. The Text Encoding Initiative and other groups have developed recommendations for historical texts (the most detailed ones concern header annotation). Annotation often cannot be done automatically: older texts are less standardized than newer ones, which makes it difficult to develop statistical or rule-based methods. The possibilities for automation therefore need to be discussed, and good tools for manual or semi-automatic annotation need to be developed.

Corpus architecture: Most large modern corpora are stored in a table or tree format. Such architectures may not be the best option for historical corpora, since they cannot accommodate conflicting annotation. Alternatives such as multi-layer models or database models therefore have to be considered.

¹ Henceforth we will speak of corpora, even though some scholars would not consider some of these text collections to be corpora.
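The three annotation types and the multi-layer alternative can be sketched as a small stand-off data model: the tokens are stored once, the header as a dictionary, and every positional or structural layer as a list of spans pointing at token indices, so that overlapping or conflicting layers can coexist. This is a minimal Python sketch under assumptions; the class names, layer names, tokens and tags are hypothetical and not taken from any project discussed at the seminar.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass(frozen=True)
class Span:
    start: int  # index of the first token covered (inclusive)
    end: int    # index after the last token covered (exclusive)
    label: str


@dataclass
class MultiLayerText:
    tokens: List[str]
    header: Dict[str, str] = field(default_factory=dict)         # (a) whole-text metadata
    layers: Dict[str, List[Span]] = field(default_factory=dict)  # (b) positional, (c) structural

    def annotate(self, layer: str, start: int, end: int, label: str) -> Span:
        """Add a span to a layer; layers are independent, so spans may overlap or conflict."""
        if not 0 <= start < end <= len(self.tokens):
            raise ValueError("span must cover at least one token inside the text")
        span = Span(start, end, label)
        self.layers.setdefault(layer, []).append(span)
        return span

    def labels_at(self, index: int) -> Dict[str, List[str]]:
        """Every label covering token `index`, grouped by layer."""
        return {
            name: [s.label for s in spans if s.start <= index < s.end]
            for name, spans in self.layers.items()
        }


def crosses(a: Span, b: Span) -> bool:
    """True if two spans partially overlap, so they cannot be nested in one tree."""
    return a.start < b.start < a.end < b.end or b.start < a.start < b.end < a.end


# Hypothetical sample: six tokens written over two manuscript lines.
text = MultiLayerText(
    tokens=["Do", "sprach", "der", "kunig", "zu", "im"],
    header={"title": "sample text", "text_type": "chronicle"},
)
line1 = text.annotate("line", 0, 3, "1")
text.annotate("line", 3, 6, "2")
np_span = text.annotate("phrase", 2, 4, "NP")  # "der kunig" crosses the line break
# Two positional layers from annotators who disagree on token 3.
text.annotate("pos_A", 3, 4, "NN")
text.annotate("pos_B", 3, 4, "NE")

print(text.labels_at(3))
print(crosses(line1, np_span))
```

Here the line layer and the phrase layer cross each other, which a single tree could not hold, and the two disagreeing tag layers are kept rather than forced into one value.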
The Electronic Corpus of 17th- and 18th-century Polish Texts
Language Resources and Evaluation, 2021
The paper describes the process of building the electronic corpus of 17th- and 18th-century Polish texts, a relatively large, balanced, structurally and morphologically annotated resource for Middle Polish, available for searching at https://www.korba.edu.pl. The corpus consists of samples extracted from over seven hundred texts written and published between 1601 and 1772, totalling 13.5 million tokens, which makes it one of the largest historical corpora for a Slavic language.
AI-supported indexing of handwritten dialect lexis: The pilot study "DWA Austria" as a case study
Proceedings of Digital Humanities Conference, Graz, Austria, 14.07.2023, pp. 110–111
Digital Approaches to Historical Semantics: new research directions at Frankfurt University
Storicamente, 2015
[…], led by Professor Bernhard Jussen (Goethe-Universität Frankfurt, Historisches Seminar), was funded by the Gottfried Wilhelm Leibniz Award from 2008 to 2014. It aims to contribute to the history of political ideas by introducing computer-assisted methods of textual analysis. The project takes corpus-linguistic methods developed in research on modern history and applies them to medieval texts, by making use of
A geometrical approach to literary text analysis
It has often been noted that computer-based literary criticism still relies on concordances as they have been understood since the 13th century. The intermediate digital representations (storage, indexes, data structures or records) are not exploited, although they could play the role of a new literary "monster" (like the centaur Cheiron): a new meaningful, artistic and hermeneutic macro-unit. Indeed, the digital representation, its metadata and its digital derivatives (e.g. indexes, parse trees, semantic references to external dictionaries) are new and more complex forms of "concordance" and should be used by the literary scholar together with the original content. New processes of narrative analysis should therefore take all of this into account, exploiting the fruitful interactions among the parts of the monster within suitable software architectures (which are thus more complex than digital archives or catalogues). In the Natural...
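A traditional concordance is a key-word-in-context (KWIC) list. The minimal Python sketch below builds one from a positional index, one of the "digital derivatives" the abstract mentions; the regex tokenizer, the function names and the sample sentence are illustrative assumptions, not the paper's method.

```python
import re
from collections import defaultdict
from typing import Dict, List, Tuple


def tokenize(text: str) -> List[str]:
    """Lower-cased word tokens; the pattern is Unicode-aware, so umlauts survive."""
    return re.findall(r"\w+", text.lower())


def build_index(tokens: List[str]) -> Dict[str, List[int]]:
    """Positional index: each word form mapped to the positions where it occurs."""
    index: Dict[str, List[int]] = defaultdict(list)
    for pos, tok in enumerate(tokens):
        index[tok].append(pos)
    return dict(index)


def kwic(tokens: List[str], index: Dict[str, List[int]],
         keyword: str, width: int = 2) -> List[Tuple[str, str, str]]:
    """One (left context, keyword, right context) row per occurrence."""
    rows = []
    for i in index.get(keyword.lower(), []):
        left = " ".join(tokens[max(0, i - width):i])
        right = " ".join(tokens[i + 1:i + 1 + width])
        rows.append((left, tokens[i], right))
    return rows


tokens = tokenize("The monster was a centaur. The centaur taught heroes, and the centaur was wise.")
index = build_index(tokens)
for left, key, right in kwic(tokens, index, "Centaur"):
    print(f"{left:>15} [{key}] {right}")
```

The index is computed once and reused for every query, which is the kind of intermediate representation the abstract argues is usually discarded rather than exploited.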
The paper investigates the potential of applying the model for the automatic recognition of (Russian) Church Slavonic manuscripts to Serbian medieval manuscripts written in various types of Cyrillic script, using the Transkribus software platform. The analysis has shown: (a) that the existing generic model for the recognition of Church Slavonic manuscripts can yield rather good results when applied to Serbian medieval manuscripts written in uncial or semi-uncial script, (b) that manuscripts written in cursive script require a separate model, and (c) that creating a generic model for Serbian medieval manuscripts within the Transkribus platform would make digitization substantially faster, which in turn would speed up the completion of tasks in existing projects on Serbian historical corpus linguistics and lexicography.
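The abstract does not say how recognition quality was measured. Handwritten text recognition output is commonly evaluated with the character error rate (CER): the Levenshtein edit distance between a reference transcription and the model output, divided by the length of the reference. A minimal Python sketch, assuming plain Unicode strings with no normalization; the sample line pair is invented.

```python
def levenshtein(ref: str, hyp: str) -> int:
    """Minimum number of character insertions, deletions and substitutions."""
    prev = list(range(len(hyp) + 1))  # distances from the empty reference prefix
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            curr.append(min(
                prev[j] + 1,             # reference character deleted
                curr[j - 1] + 1,         # extra character inserted
                prev[j - 1] + (r != h),  # substitution (free if characters match)
            ))
        prev = curr
    return prev[-1]


def cer(ref: str, hyp: str) -> float:
    """Character error rate: edit distance divided by reference length."""
    if not ref:
        raise ValueError("reference transcription must not be empty")
    return levenshtein(ref, hyp) / len(ref)


# Invented example: ground truth vs. a recognition that confuses ъ and ь.
print(cer("азъ есмь", "азь есмъ"))  # 2 substitutions / 8 characters → 0.25
```

Computing CER separately for uncial or semi-uncial pages and for cursive pages would be one way to quantify the difference between scripts that the abstract reports.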