A rationale for the TEI recommendations for feature-structure markup


In this paper, we address the issue of how to annotate discontinuous elements in XML. We take discontinuous multiwords as a case study to investigate different annotation possibilities, in the framework of the linguistic annotation of the MEANING Italian Corpus.

This paper focusses on the types of questions that are raised in the encoding of historical documents. Using the example of a 17th-century Scottish sasine, the authors show how TEI-based encoding can produce a text which will be of major value to a variety of future historical researchers. Firstly, they show how to produce a machine-readable transcription which would be comprehensible to a word-processor as a text stream filled with print and formatting instructions; to a text analysis package as a compilation of named text segments of some known structure; and to a statistical package as a set of observations, each of which comprises a number of defined and named variables. Secondly, they make provision for a machine-readable transcription in which the encoder's research agenda and assumptions can be reversed or altered by secondary analysts, who will have access to a maximum amount of the information contained in the original source.

Over its long history of preserving and storing knowledge, mankind has used a wide range of media. Over the centuries, clay tablets, papyri, parchment, paper and, most recently, all kinds of discs have been used to store and thus preserve information that was, at the time, considered worthy of preservation. Reference works have always been an integral part of this development (cf. McArthur, 1988). Nowadays we are facing another media revolution with a period of synchronous use of so ...

This paper describes the evolution of a lexical resource project for Nxaʔamxcin, an endangered Salish language, from the project’s inception in the 1990s, based on legacy materials recorded in the 1960s and 1970s, to its current form as an online database that is transformable into various print and web-based formats for varying uses. We illustrate how we are using TEI P5 for data-encoding and archiving and show that TEI is a mature, reliable, flexible standard which is a valuable tool for lexical and morphological markup and for the production of lexical resources. Lexical resource creation, as is the case with language documentation and description more generally, benefits from portability and thus from conformance to standards (Bird and Simons 2003, Thieberger 2011). This paper therefore also discusses standards-harmonization, focusing on our attempt to achieve interoperability in format and terminology between our database and standards proposed for LMF, RELISH and GOLD. We show...

This work presents the development and implementation of a full morphological analyzer for Basque, an agglutinative language. Several problems (phrase structure inside word-forms, noun ellipsis, multiplicity of values for the same feature and the use of complex linguistic representations) have forced us to go beyond the morphological segmentation of words, and to include an extra module that performs a full morphosyntactic parsing of each word-form. A unification-based word-level grammar has been defined for that purpose. The system has been integrated into a general environment for the automatic processing of corpora, using TEI-conformant SGML feature structures.