Klamer, M., Trilsbeek, P., Hoogervorst, T. & Haskett, C. (2017). ‘Creating a language archive of Insular South East Asia and West New Guinea.’ In: Odijk, J. & van Hessen, A. (eds.), CLARIN in the Low Countries, pp. 113-21. London: Ubiquity Press.
Related papers
Boyd Michailovsky, Martine Mazaudon, Alexis Michaud, Séverine Guillaume, Alexandre François & Evangelia Adamou. 2014. Documenting and Researching Endangered Languages: The Pangloss Collection. _Language Documentation & Conservation_ 8, pp. 119-135.
The Pangloss Collection [http://lacito.vjf.cnrs.fr/pangloss/index_en.htm] is a language archive developed since 1994 at the Langues et Civilisations à Tradition Orale (LACITO) research group of the French Centre National de la Recherche Scientifique (CNRS). It contributes to the documentation and study of the world’s languages by providing free access to documents of connected, spontaneous speech, mostly in endangered or under-resourced languages, recorded in their cultural context and transcribed in consultation with native speakers. The Collection is an Open Archive containing media files (recordings), text annotations, and metadata; it currently contains over 1,400 recordings in 70 languages, including more than 400 transcribed and annotated documents. The annotations consist of transcription, free translation in English, French and/or other languages, and, in many cases, word or morpheme glosses; they are time-aligned with the recordings, usually at the utterance level. A web interface makes these annotations accessible online in an interlinear display format, in synchrony with the sound, using any standard browser. The structure of the XML documents makes them accessible to searching and indexing, always preserving the links to the recordings. Long-term preservation is guaranteed through a partnership with a digital archive. A guiding principle of the Pangloss Collection is that a close association between documentation and research is highly profitable to both. This article presents the collections currently available; it also aims to convey a sense of the range of possibilities they offer to the scientific and speaker communities and to the general public.
2013
With hundreds of endangered and under-documented languages, Papua New Guinea presents an enormous challenge to the documentary linguistics community. This article reports on a workshop held at the University of Goroka in May and June of 2012. The workshop aimed to collect written texts and their translations for several languages, while building local capacity through hands-on training, and improving our understanding of the appropriate use of technology. The majority of participants were mother tongue speakers who seek to preserve their languages through the preparation of written language resources.
How usable are digital collections for endangered languages? A review
Proceedings of the Linguistic Society of America, 2022
Here, we report on pilot research on the extent to which language collections in digital linguistic archives are discoverable, accessible, and usable for linguistic research. Using a test case of common tasks in phonetic and phonological documentation, we evaluate a small random sample of collections and find substantial, striking problems in all domains. Of the original 20 collections, only six had digitized audio files with associated transcripts (preferably phrase-aligned). That is, only 30% of the collections in our sample were even potentially suitable for any type of phonetic work (regardless of quality of recording). Information about the contents of the collection was usually discoverable, though there was variation in the types of information that could be easily searched for in the collection. Though eventually three collections were aligned, only one collection was successfully force-aligned from the archival materials without substantial intervention. We close with recommendations for archive depositors to facilitate discoverability, accessibility, and functionality of language collections. Consistency and accuracy in file-naming practices, data descriptions, and transcription practices are imperative. Providing a collection guide also helps. Including useful search terms about collection contents makes the materials more findable. Researchers need to be aware of the changes to collection structure that may result from archival uploads. Depositors need to consider how their metadata is included in collections and how items in the collection may be matched to each other and to metadata categories. Finally, if our random sample is indicative, linguistic documentation practices for future phonetic work need to change rapidly, if such work from archival collections is to be done in future.
Internet applications for endangered languages: a talking dictionary of Ainu
2011
There are an estimated 6,900 languages spoken in the world today and at least half of them are under threat of extinction. This is mainly because speakers of smaller languages are switching to other larger languages for economic, social or political reasons, or because they feel ashamed of their ancestral language. A language can thus be lost in one or two generations, often to the great regret of the speakers' descendants. Over the past ten years a new field of study called “language documentation” has developed. Language documentation is concerned with the methods, tools, and theoretical bases for compiling a representative and lasting multipurpose record of languages. It has developed in response to the urgent need to make an enduring record of the world’s many endangered languages and to support speakers of these languages in their desire to maintain them. It is also fueled by developments in information and media technologies which make documentation and the preservation and dissemin...
Developing a Living Archive of Aboriginal Languages
Language Documentation & Conservation, 2014
The fluctuating fortunes of Northern Territory bilingual education programs in Australian languages and English have put at risk thousands of books developed for these programs in remote schools. In an effort to preserve such a rich cultural and linguistic heritage, the Living Archive of Aboriginal Languages project is establishing an open access, online repository comprising digital versions of these materials. Using web technologies to store and access the resources makes them accessible to the communities of origin, the wider academic community, and the general public. The process of creating, populating, and implementing such an archive has posed many interesting technical, cultural and linguistic challenges, some of which are explored in this paper.
Linguistic Archives and Language Communities Questionnaire
Proceedings of the International Workshop on Digital Language Archives: LangArc 2021, 2021
Digital language archives hold vast amounts of materials in endangered or marginalised languages. However, due to limitations in technical infrastructure and the design of these archives, the materials are usually not easily accessible to speakers of the languages represented or their descendants. With the goal of establishing best practices for researchers archiving linguistic data, this paper presents a questionnaire designed to assess how archival materials can be made more readily available to language communities.