Unicode - the Unknown World Standard
Related papers
Character Encoding of Classical Languages
De Gruyter eBooks, 2019
Underlying any processing and analysis of texts is the need to represent the individual characters that make up those texts. For the first few decades, scholars pioneering digital classical philology had to adopt various workarounds for dealing with the scripts of historical languages on systems that were never intended for anything but English. The Unicode Standard addresses many of the issues with character encoding across the world's writing systems, including those used by historical languages, but its practical use in digital classical philology is not without challenges. This chapter starts with a conceptual overview of character encoding systems and the Unicode Standard in particular, then discusses practical issues relating to the input, interchange, processing, and display of classical texts. As well as providing guidelines for interoperability in text representation, it covers various aspects of text processing at the character level, including normalisation, search, regular expressions, collation, and alignment.
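The normalisation issue mentioned above can be sketched in a few lines of Python. The example below uses polytonic Greek: the same visible letter can be stored either as one precomposed code point or as a base letter plus combining marks, and the two only compare equal after normalisation (here NFC).

```python
import unicodedata

# Two ways to encode Greek small omega with psili and perispomeni:
precomposed = "\u1F66"             # single precomposed code point
decomposed = "\u03C9\u0313\u0342"  # omega + combining comma above + combining perispomeni

# The raw code point sequences differ, so a naive string comparison fails:
print(precomposed == decomposed)  # False

# After NFC normalisation the decomposed form composes back to the single code point:
print(unicodedata.normalize("NFC", decomposed) == precomposed)  # True
```

This is why search and collation over classical texts typically normalise both the query and the corpus to the same form before comparing.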
Editorial Introduction to Issue 11 of the Journal of the Text Encoding Initiative
Journal of the Text Encoding Initiative, 2019
This issue illustrates the variety of research being done within the TEI community. The call for papers was issued by the Program Committee without limitation to a specific topic. Participants submitted proposals reflecting their wide range of interests, disciplines, and fields of research, opening a huge range of possible approaches. The open call brought almost two hundred participants to Vienna to enjoy a rich conference program comprising thirty-eight papers, seventeen posters, five workshops, and four demos, which are described in the Book of Abstracts. The conference opened with a keynote address, and continued in seventeen plenary or parallel sessions, a poster session, and four special interest group meetings. As we write these introductory lines, at a time when conference attendance and travel have been limited by a global health crisis, we think back to the conference's social
Character encoding in corpus construction.
2005
This chapter first briefly reviews the history of character encoding. This is followed by a discussion of standard and non-standard native encoding systems, and an evaluation of the efforts to unify these character codes. We then discuss Unicode as well as the various Unicode Transformation Formats (UTFs). In conclusion, we recommend that Unicode (UTF-8, to be precise) be used in corpus construction.
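The practical difference between the Transformation Formats can be shown in Python. UTF-8 and UTF-16 encode the same characters to different byte lengths, and UTF-8 has the corpus-friendly property of being a strict superset of ASCII:

```python
text = "语料库"  # "corpus" in Chinese; three CJK characters

utf8 = text.encode("utf-8")       # each of these characters needs 3 bytes in UTF-8
utf16 = text.encode("utf-16-le")  # but only 2 bytes each in UTF-16
print(len(utf8), len(utf16))      # 9 6

# UTF-8 is ASCII-compatible, so plain-ASCII corpus files are already valid UTF-8:
print("corpus".encode("utf-8") == b"corpus")  # True
```

ASCII compatibility is one reason UTF-8 is the usual recommendation for corpus construction: legacy ASCII tools and data continue to work unchanged.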
Proceedings of the LREC 2018 Workshop “CCURL 2018 – Sustaining knowledge diversity in the digital age”, 2018
Across the world’s languages and cultures, most writing systems predate the use of computers. In the early years of ICT, standards and protocols for encoding and rendering the majority of the world’s writing systems were not in place. The opportunity to deploy less commonly used orthographies in cross-platform digital contexts has steadily increased since Unicode became the most widely used encoding on the web in late 2007 (Davis, 2008). But what happens to resources that were developed before Unicode standards became widespread? While many tools have been created to address this problem and other issues related to transliteration and character-level substitutions, this paper describes the process undertaken for the Indigenous and endangered Heiltsuk (Wakashan) language, and outlines a tool (Convertextract) designed to convert not only plain text but also Microsoft Office (pptx, xlsx, docx) documents, with the goals of updating and upgrading pre-existing digital textual resources to Unicode standards, and thus preserving the knowledge they contain for both the present and the future.
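The core of such a conversion can be sketched as a character-substitution table. Pre-Unicode fonts typically repurposed ASCII code points for special glyphs, so legacy files contain the wrong characters until those code points are remapped. The mapping below is invented purely for illustration; Convertextract's actual correspondence tables for Heiltsuk are different and much larger.

```python
# Hypothetical legacy-to-Unicode mapping: in the old font, "%" and "@"
# were drawn as special glyphs, so those ASCII code points must be
# replaced with the real Unicode characters they stood for.
LEGACY_TO_UNICODE = str.maketrans({
    "%": "\u03BB",  # stand-in slot -> GREEK SMALL LETTER LAMDA
    "@": "\u0313",  # stand-in slot -> COMBINING COMMA ABOVE
})

def convert(text: str) -> str:
    """Replace legacy code points with their Unicode equivalents."""
    return text.translate(LEGACY_TO_UNICODE)

print(convert("%@a"))
```

Handling Office formats then reduces to applying the same substitution to every text run inside the document's XML parts, which is essentially what a converter for pptx/xlsx/docx files has to do.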
Journal of the Text Encoding Initiative, Issue 2 | 2012
2016
It is incumbent upon libraries holding Arabic manuscripts to provide access to digitized surrogates of their holdings. Users require access both by authority list and by content; thus, an exhaustive cataloguing method is essential. The TEI P5 Manuscript Description module is a suitable tool for manuscript cataloguing, but it lacks certain features that would allow for an exhaustive description of ancient Arabic manuscripts. In this article we make several suggestions to augment the TEI P5 Manuscript Description module, allowing for a richer and more accurate description and cataloguing of ancient Arabic manuscripts.
Computerization of Local Language Characters
International Journal of Advanced Computer Science and Applications
The objective of this study is to provide an innovative model for language preservation. Indigenous languages must be maintained in order to avoid language death, and script applications for indigenous languages are one of the solutions being pursued. Such a script program facilitates written communication between speakers of indigenous languages. The study illustrates the implementation of the Lontara script (the letters and characters of the Bugis-Makassar local language). The script application is compatible with the Microsoft Windows operating system and the Hypertext Transfer Protocol (HTTP). The study employed the research and development (R&D) approach in six stages: 1) conducting a requirements analysis to determine the viability of the Bugis-Makassar indigenous languages in everyday life and ways to retain them; 2) designing and constructing Lontara scripts with hypertext-based applications; 3) producing Lontara scripts with hypertext-based applications; 4) validating the hypertext-based applications through one-to-one, small-group, and large-group testing; 5) revising the Lontara application; and 6) releasing the Lontara application as a finished product. The product is designed to be used in conjunction with other interactive applications.
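Because the Lontara (Buginese) script is already encoded in Unicode, in the block U+1A00 to U+1A1F, an application like the one described can target standard code points rather than a custom font encoding. A short Python sketch listing the first few letters of the block:

```python
import unicodedata

# Enumerate the first five code points of the Buginese (Lontara) block
# and look up their official Unicode character names.
for cp in range(0x1A00, 0x1A05):
    ch = chr(cp)
    print(f"U+{cp:04X}  {ch}  {unicodedata.name(ch)}")
```

Using these standard code points means text produced by the application remains searchable and interchangeable with any Unicode-aware software, provided a font covering the block is installed.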