Chuvash and Linguistic Documentation
(2014) Collaborative Language Documentation: the Construction of the Huastec Corpus
Proceedings CCURL 2014. Collaboration and Computing for Under-Resourced Languages in the Linked Open Data Era. Workshop in the 9th Language Resources and Evaluation Conference (LREC 2014), p.67-70, 2014
In this paper, we describe the design and functioning of Nenek, a web-based platform that aims to be an ongoing language documentation project for the Huastec language. In Nenek, speakers, linguistic associations, government agencies, and researchers work together to build a centralized repository of materials about the Huastec language. Nenek not only organizes different types of content in repositories; it also uses this information to create online tools such as a searchable database of documents on Huastec language and culture, E-dictionaries, and spell checkers. Nenek is also a monolingual social network in which users discuss the content on the platform. So far, the speakers have created a monolingual E-dictionary, and we have begun building a repository of written texts in the Huastec language. In this context, we have been able to locate and digitally archive documents in other formats (audio, video, images), yet the retrieval, creation, storage, and documentation of these materials is still in a preliminary phase. Here we present the general methodology of the project.
on Language Documentation and Linguistic Theory 2. London: SOAS. or
2015
© 2009 The Authors. No part of this publication may be reproduced, stored in a retrieval system, or transmitted, in any form or by any means, electronic, mechanical, photocopying, recording or otherwise, without the prior permission of the author(s) of that part of the publication, except as permitted by UK copyright law. ISBN: 978-0-7286-0392-9. Printed in the United Kingdom. Hans Rausing Endangered Languages Project.
Language documentation in comparative Turkic linguistics (Eds. Éva Á. Csató, Birsel Karakoç)
Language documentation in comparative Turkic linguistics, 2024
This volume contains original analyses of thirteen, mostly endangered, Turkic languages, and as such is a valuable contribution not only to the field of Turkic studies but also to the general field of comparative linguistics. The book is a rich source of data for dialects otherwise not readily accessible for specialists in the structure of Turkic languages. For typologists looking for the linguistic intricacies of agglutinative languages, the book provides a wide spectrum of structural features from varieties of a large number of Turkic languages and as such is an invaluable data mine analyzed and presented with the insight of the experts in the field. (A. SUMRU ÖZSOY)

This is a welcome book on documentation in the Turkic language family. It advocates for standardization of text representations and descriptions based on the scholarly tradition in Turkic linguistics. The previously unpublished texts vary considerably in style, gender, and register. The accompanying audio recordings are accessible online, which is something quite new in linguistic publications. Different branches of Turkic are represented in the volume, e.g. endangered languages such as Southwest Karaim, Bayat Turkic in Iraq, Golan Turkic in Syria, and Yellow Uyghur in China, all with a very weak status. This type of new data from field work is essential both for historical linguistics and for the description of the Turkic language type. (LARS JOHANSON)

This very interesting volume presents new materials and analysis of 13 Turkic languages, most of which are endangered, ranging from Western Europe to China. It will contribute to Turkic synchronic and diachronic studies, and be a nice source for language typology, and historical and theoretical linguistics. The use of a uniform transcription and morphological representation in all chapters makes them more readily comparable and easier to use. The availability of downloadable sound files for the analysed texts is a plus. The editors are to be congratulated for putting together such a diverse and useful collection. (PETER AUSTIN)
Language documentation meets language technology
The paper describes work-in-progress by the Pite Saami, Kola Saami and Izhva Komi language documentation projects, all of which use similar data and technical frameworks and are carried out collaboratively in Uppsala, Tromsø, Syktyvkar and Freiburg. Our projects record and annotate spoken language data in order to provide comprehensive speech corpora as databases for future research on and for these endangered – and under-described – Uralic speech communities. Applying language technology in language documentation helps us to create more systematically annotated corpora, rather than eclectic data collections. Ultimately, the multimodal corpora created by our projects will be useful for scientifically significant quantitative investigations on these languages in the future.
(2018) Hamburg Corpora for Indigenous Northern Eurasian Languages
Tomsk Journal of Linguistics and Anthropology, 2018
The long-term INEL project (2016–2033), carried out at the University of Hamburg, aims to develop digital linguistic corpora and supporting infrastructure for a number of selected languages of Northern Eurasia. At present, corpora of Selkup, Kamas and Dolgan are being created. The project builds upon existing materials from various archive sources, including the Selkup archive of Angelina I. Kuzmina preserved at the University of Hamburg, Kamas audio recordings from the archives in Tartu and Helsinki, and Dolgan recordings provided by the House of the Cultures of Taimyr Peninsula. All the texts in the corpora are provided with a phonological transcription, morphological interlinear glossing, and free translations; selected subsets also bear additional annotations for semantic and syntactic features, information status of referents, borrowings, and code-switching. The corpora are intended for typologically aware grammatical research but may also be of interest to a wider audience. A number of satellite information resources are also being developed, contributing towards a more efficient research infrastructure.
An Annotated Bibliography of Language Documentation
Since the development of language documentation as a separate sub-field of Linguistics is relatively new, there are only a few reference works that deal with theoretical and practical issues. Gippert et al. 2006 covers definitional concepts, and the practicalities of data collection, analysis and archiving. Many of the authors are researchers associated with the DOBES (Documentation of Endangered Languages) program funded by the Volkswagen Foundation. Chapters vary in complexity but most will be useful for beginning researchers. A Spanish translation of the volume is available. Gippert et al. 2006 is critically reviewed by Evans 2008, who argues that the approach it takes, which excludes grammar writing, is detrimental to the field. Austin 2010 is a series of lectures from the 3L Summer School 2009 and is aimed at beginning students. Grenoble and Furbee 2010 originated in discussions at a series of meetings of concerned researchers in 2004-2006, and a conference at Harvard University in 2005. It addresses praxis and values in documentation, measures of documentary adequacy, technologies, collaboration models, and training needs. Its audience is more advanced practitioners. Austin and Sallabank 2011 deals with a wide range of endangered languages issues and is intended for students; Part II and Part IV of the book have seven chapters on language documentation. The edited series Language Documentation and Description, published since 2003 by the Hans Rausing Endangered Languages Project at SOAS, University of London, contains articles on language documentation theory and practice, mostly arising from workshops organized by the project.
Austin, Peter K. Language Documentation and Description, Volume 7. London: SOAS, 2010.
Pragmatic Description of Linguistic Units in English and Uzbek
CERN European Organization for Nuclear Research - Zenodo, 2022
This article discusses linguistic units in English and Uzbek and their pragmalinguistic aspects. The pragmatic features of linguistic units can help clarify their meaning in a text. Differences in the use of pragmatics in English and Uzbek, and the specific features of each, are analyzed.