Combining Documentation And Research: Ongoing Work On An Endangered Language

Documenting and researching endangered languages: the Pangloss Collection

The Pangloss Collection is a language archive developed since 1994 at the Langues et Civilisations à Tradition Orale (LACITO) research group of the French Centre National de la Recherche Scientifique (CNRS). It contributes to the documentation and study of the world's languages by providing free access to documents of connected, spontaneous speech, mostly in endangered or under-resourced languages, recorded in their cultural context and transcribed in consultation with native speakers. The Collection is an Open Archive containing media files (recordings), text annotations, and metadata; it currently contains over 1,400 recordings in 70 languages, including more than 400 transcribed and annotated documents. The annotations consist of transcription, free translation in English, French and/or other languages, and, in many cases, word or morpheme glosses; they are time-aligned with the recordings, usually at the utterance level. A web interface makes these annotations accessible online in an interlinear display format, in synchrony with the sound, using any standard browser. The structure of the XML documents makes them accessible to searching and indexing, always preserving the links to the recordings. Long-term preservation is guaranteed through a partnership with a digital archive. A guiding principle of the Pangloss Collection is that a close association between documentation and research is highly profitable to both. This article presents the collections currently available; it also aims to convey a sense of the range of possibilities they offer to the scientific and speaker communities and to the general public.

Endangered languages documentation: from standardization to mobilization

Digital Resources for the Humanities, Cheltenham, …, 2003

Currently, the main arena for computer-based linguistic contribution towards endangered languages is in data encoding and standardization. This phase urgently needs to be complemented by a period of working out how to deliver computer-based language support to endangered language communities. Established linguistic practice has neither sufficiently documented nor strengthened endangered languages; Himmelmann (1998) identified this problem and proposed a new discipline he called documentary linguistics. Although this emerging discipline has stimulated digital projects for data standardisation, encoding, and archiving, it lacks two vital aspects: a methodology that provides roles for community members, and new genres of dissemination. Without identifying new ways to mobilize the products of documentation, documentary linguistics will remain indistinguishable from its predecessors in its ability to support language communities.

The Cambridge Handbook of Endangered Languages

2009

It is generally agreed that about 7,000 languages are spoken across the world today and at least half may no longer be spoken by the end of this century. This state-of-the-art Handbook examines the reasons behind this dramatic loss of linguistic diversity, why it matters, and what can be done to document and support endangered languages. The volume is relevant not only to researchers in language endangerment, language shift and language death, but to anyone interested in the languages and cultures of the world. It is accessible to both specialists and non-specialists: researchers will find cutting-edge contributions from acknowledged experts in their fields, while students, activists and other interested readers will find a wealth of readable, yet thorough and up-to-date, information. The Handbook covers the essentials of language documentation and archiving, and also includes hands-on chapters on advocacy and support for endangered languages, development of writing systems for previously unwritten languages, education, training the next generation of researchers and activists, dictionary making, the ecology of languages, language and culture, language and society, language policy, and harnessing technology and new media in support of endangered languages.

Proceedings of the 2nd Workshop on the Use of Computational Methods in the Study of Endangered Languages

2017

These proceedings contain the papers presented at the 2nd Workshop on the Use of Computational Methods in the Study of Endangered Languages held in Honolulu, March 6-7, 2017. The workshop itself was co-located with, and took place after, the 5th International Conference on Language Documentation and Conservation (ICLDC) at the University of Hawai'i at Mānoa. As the name implies, this is the second workshop held on the topic; the previous meeting was co-located with the ACL main conference in Baltimore, Maryland in 2014. The workshop covers a wide range of topics relevant to the study and documentation of endangered languages, ranging from technical papers on working systems and applications, to reports on community activities with supporting computational components. The purpose of the workshop is to bring together computational researchers, documentary linguists, and people involved with community efforts of language documentation and revitalization to take part in both formal and informal exchanges on how to integrate rapidly evolving language processing methods and tools into efforts of language description, documentation, and revitalization. The organizers are pleased with the range of papers, many of which highlight the importance of interdisciplinary work and interaction between the various communities that the workshop is aimed towards. We received 39 submissions as long papers, short papers, or extended abstracts, of which 23 were selected for this volume (59%). In the proceedings, all papers are either short (≤5 pages) or long (≤9 pages). In addition, the workshop also features presentations from representatives of the National Science Foundation (NSF). Two panel discussions on the topic of interaction between computational linguistics and the documentation and revitalization community as well as future planning of ComputEL underlined the demand and necessity of a workshop of this nature.

José Antonio Flores Farfán and Fernando Ramallo (Eds): New Perspectives on Endangered Languages

Language Policy, 2011

In their introduction, editors Farfán and Ramallo write that the purpose of their book "is to contribute to the debate regarding the perspectives of documenting languages," with language revitalization in mind (p. 7). The idea is to serve the language community being documented while also serving science in general. The editors note Lenore Grenoble's perspective in the fourth chapter that "the more oriented towards revitalization the [language] documentation is the more effective and relevant it will be" (p. 9). They conclude that a "sociolinguistics of development, in which the revitalization of linguistic communities is the priority, opens new perspectives for the emerging field of linguistic documentation, in which the societal aspects of the research have frequently been marginal" (p. 10). The ideal put forward by Grenoble in chapter 4 is of the linguist as "facilitator and collaborator" in well-planned documentation projects that include training and capacity building for the local community (p. 84; see also Grenoble 2009). This call for researchers to serve and give back to the Indigenous communities they study is not new, at least not in North America. In his 1969 book Custer Died for Your Sins: An Indian Manifesto, Vine Deloria, Jr. took to task anthropologists, and by implication linguists and other researchers who studied American Indians, and the issue has been roundly debated since then (see e.g., Biolsi and Zimmerman 1997). All linguists, the contributors to this volume take up a variety of issues surrounding language documentation and revitalization that have recently been discussed at length (see e.g., Kroskrity and Field 2009). The central question concerns the place of Indigenous languages in the modern world. Are they destined to become extinct one by one as is happening now, or can they find a place in our modern globalized world where only a few languages seem to be the "fittest" and able to survive? In the second chapter, Alexandra Y. Aikhenvald notes how Manambu, a language spoken by about 2,500 people in Papua New Guinea, is

Internet applications for endangered languages: a talking dictionary of Ainu

2011

There are an estimated 6,900 languages spoken in the world today and at least half of them are under threat of extinction. This is mainly because speakers of smaller languages are switching to other larger languages for economic, social or political reasons, or because they feel ashamed of their ancestral language. The language can thus be lost in one or two generations, often to the great regret of their descendants. Over the past ten years a new field of study called “language documentation” has developed. Language documentation is concerned with the methods, tools, and theoretical bases for compiling a representative and lasting multipurpose record of languages. It has developed in response to the urgent need to make an enduring record of the world’s many endangered languages and to support speakers of these languages in their desire to maintain them. It is also fueled by developments in information and media technologies which make documentation and the preservation and dissemin...

Utilizing Language Technology in the Documentation of Endangered Uralic Languages

The paper describes work-in-progress by the Pite Saami, Kola Saami and Izhva Komi language documentation projects, all of which record new spoken language data, digitize available recordings and annotate these multimedia data in order to provide comprehensive language corpora as databases for future research on and for endangered – and under-described – Uralic speech communities. Applying language technology in language documentation helps us to create more systematically annotated corpora, rather than eclectic data collections. Specifically, we describe a script providing interactivity between different morphosyntactic analysis modules implemented as Finite State Transducers and ELAN, a Graphical User Interface tool for annotating and presenting multimodal corpora. Ultimately, the spoken corpora created in our projects will be useful for scientifically significant quantitative investigations on these languages in the future. * The order of the authors' names is alphabetical.

[Michailovsky, Mazaudon, Michaud, Guillaume, François & Adamou] Documenting and Researching Endangered Languages: The Pangloss Collection

Boyd Michailovsky, Martine Mazaudon, Alexis Michaud, Séverine Guillaume, Alexandre François & Evangelia Adamou. 2014. Documenting and Researching Endangered Languages: The Pangloss Collection. Language Documentation & Conservation 8, pp. 119-135.

The Pangloss Collection [http://lacito.vjf.cnrs.fr/pangloss/index_en.htm] is a language archive developed since 1994 at the Langues et Civilisations à Tradition Orale (LACITO) research group of the French Centre National de la Recherche Scientifique (CNRS). It contributes to the documentation and study of the world’s languages by providing free access to documents of connected, spontaneous speech, mostly in endangered or under-resourced languages, recorded in their cultural context and transcribed in consultation with native speakers. The Collection is an Open Archive containing media files (recordings), text annotations, and metadata; it currently contains over 1,400 recordings in 70 languages, including more than 400 transcribed and annotated documents. The annotations consist of transcription, free translation in English, French and/or other languages, and, in many cases, word or morpheme glosses; they are time-aligned with the recordings, usually at the utterance level. A web interface makes these annotations accessible online in an interlinear display format, in synchrony with the sound, using any standard browser. The structure of the XML documents makes them accessible to searching and indexing, always preserving the links to the recordings. Long-term preservation is guaranteed through a partnership with a digital archive. A guiding principle of the Pangloss Collection is that a close association between documentation and research is highly profitable to both. This article presents the collections currently available; it also aims to convey a sense of the range of possibilities they offer to the scientific and speaker communities and to the general public.