A Brief History of Archiving in Language Documentation, With an Annotated Bibliography

The International Workshop on Language Preservation: An Experiment in Text Collection and Language Technology

2013

With hundreds of endangered and under-documented languages, Papua New Guinea presents an enormous challenge to the documentary linguistics community. This article reports on a workshop held at the University of Goroka in May and June of 2012. The workshop aimed to collect written texts and their translations for several languages, while building local capacity through hands-on training, and improving our understanding of the appropriate use of technology. The majority of participants were mother tongue speakers who seek to preserve their languages through the preparation of written language resources.

How usable are digital collections for endangered languages? A review

Proceedings of the Linguistic Society of America, 2022

Here, we report on pilot research on the extent to which language collections in digital linguistic archives are discoverable, accessible, and usable for linguistic research. Using a test case of common tasks in phonetic and phonological documentation, we evaluate a small random sample of collections and find substantial, striking problems in all domains. Of the original 20 collections, only six had digitized audio files with associated transcripts (preferably phrase-aligned). That is, only 30% of the collections in our sample were even potentially suitable for any type of phonetic work (regardless of quality of recording). Information about the contents of the collection was usually discoverable, though there was variation in the types of information that could be easily searched for in the collection. Though eventually three collections were aligned, only one collection was successfully force-aligned from the archival materials without substantial intervention. We close with recommendations for archive depositors to facilitate discoverability, accessibility, and functionality of language collections. Consistency and accuracy in file naming practices, data descriptions, and transcription practices are imperative. Providing a collection guide also helps. Including useful search terms about collection contents makes the materials more findable. Researchers need to be aware of the changes to collection structure that may result from archival uploads. Depositors need to consider how their metadata is included in collections and how items in the collection may be matched to each other and to metadata categories. Finally, if our random sample is indicative, linguistic documentation practices for future phonetic work need to change rapidly, if such work from archival collections is to be done in the future.

Endangered languages documentation: from standardization to mobilization

Digital Resources for the Humanities, Cheltenham, …, 2003

Currently, the main arena for computer-based linguistic contribution towards endangered languages is in data encoding and standardization. This phase urgently needs to be complemented by a period of working out how to deliver computer-based language support to endangered language communities. Established linguistic practice has neither sufficiently documented nor strengthened endangered languages; Himmelmann (1998) identified this problem and proposed a new discipline he called documentary linguistics. Although this emerging discipline has stimulated digital projects for data standardization, encoding, and archiving, it lacks two vital aspects: a methodology that provides roles for community members, and new genres of dissemination. Without identifying new ways to mobilize the products of documentation, documentary linguistics will remain indistinguishable from its predecessors in its ability to support language communities.

Applying Current Methods in Documentary Linguistics in the Documentation of Endangered Languages: A Case Study on Fieldwork in Arvanitic.

Athens Journal of Philology 2(4): 243-254, 2015

Arvanitic is a language of Greece also called Arberichte or Arvanitika. UNESCO has classified Arvanitic as a “severely endangered language” in Greece, which is in need of documentation as it is being used by the last generation of speakers. In the case of Arvanitic in Greece, it appears more weight has been given to the process of description at the expense of documentation proper. This paper will discuss how current methods in documentary linguistics are being applied in its documentation. It will report on a field study being carried out with the last native speakers in the community of Zarakas, Laconia, Greece. The aim of the fieldwork being carried out is to create a reliable, representative, comprehensive and lasting record of the language in this specific community, in light of new developments in information, communication and media technology which can aid not only its documentation but also its archiving, processing, preservation, and accessibility. It places importance on collaboration with the local native speakers as well as ethics involving the speakers’ needs and rights of privacy and ownership, while at the same time giving something back to the community.

Born archival: The ebb and flow of digital documents from the field

History and Anthropology, 2011

Facilitated by an infusion of funding from philanthropic sources, descriptive linguists have been galvanized to document the world’s languages before they disappear without record. Linguists have responded to the “crisis of documentation” (Dobrin, L. M. & Berson, J. (2011), “Speakers and Language Documentation”, in The Cambridge Handbook of Endangered Languages, P. K. Austin & J. Sallabank (eds), Cambridge University Press, Cambridge, pp. 187–211) by entering into increasingly collaborative partnerships with speech communities, producing “documents” that have both local relevance and academic integrity. The growth in access to digital recording technology has meant that contemporary research initiatives on endangered languages are not only born digital, but often birthed straight into an archive. Yet heritage collections of recordings made by ethnographers and linguists in the past are ever more endangered, becoming orphaned when their collectors die or fragmented into their component parts based on the medium of documentation when they are finally archived. Drawing on fieldwork in Nepal with a community speaking an endangered Tibeto–Burman language, and reflecting on the decade I have spent directing a digital humanities research initiative—the Digital Himalaya Project—I discuss how linguists and anthropologists are collecting, protecting and connecting their data, and how technology influences their relationship to documents.

The social lives of linguistic legacy materials

Language Documentation and Description, 2021

Documentary linguistic data may be acquired not only firsthand, but by consulting materials that were produced by scholars, missionaries, speakers, and others in the past. Such linguistic legacy materials may reside in an archive or in an individual’s private collection, or they may be embedded in published literature that was created for purposes other than linguistics. In this introduction to a special issue of Language Documentation and Description, we explore some of the reasons why linguistic legacy materials, while potentially treasure troves of evidence and insight, are nevertheless challenging to use. The main challenge, we argue, is inherent in the very nature of such materials: inasmuch as they are the products of past human meaning-making activity, they are invested with the goals, knowledge, points of view, and circumstances of those who were involved in their creation. To that extent, legacy materials can be said to possess social lives that originate in the past and that continue to unfold over time as they are accessed, analyzed, or put to new uses. The articles published together here tell the “biographies” of linguistic legacy materials in particular instances, drawing lessons for all who revisit and recirculate data from the past and offering perspective for documentary linguists working now to create the legacy collections of the future.