Endangered Languages and Cultures
In PARADISEC we store media files with their transcriptions whenever possible, typically in .eaf format, created by the standard transcription tool Elan. Best practice in language documentation includes creating a corpus of media with transcripts so that others can access it in future and locate what is in the files. Untranscribed files remain largely inaccessible, … Read more
Following the earlier discussion of creating collections for offline delivery (particularly on Raspberry Pi), we now have a simple method that indexes a set of items from the PARADISEC collection and generates an HTML view, which means that files are no longer disconnected from the catalog in the way they were in the past. To do … Read more
This week we launched a new way of viewing the PARADISEC catalog. We have been working towards this over the past two years, basing the approach to displaying our collections and items on the PILARS principles developed by the Language Data Commons of Australia. We can now separate the original catalog from the view of … Read more
Following on from the work reported two posts ago on speech recognition for Bislama and Nafsan, Aso Mahmudi has now created a desktop app (called Easper – Elan Automated Speech Recognition) that takes a wav file as input, segments it, does speaker diarisation, and transcribes it, delivering an Elan file as the … Read more
Of the 7,000 languages in the world today, most have little presence on the internet. What records there are in these languages are often religious translations from large languages and so, while being a valuable set of texts in the language, they have little local cultural content. For some of these languages, there are recordings … Read more
Some cassette collections we receive for digitisation are in good shape and digitisation is a relatively smooth process. Often, though, this is not the case. Some collections we are currently receiving were recorded over 50 years ago and have often been sitting for years in less-than-ideal conditions for maintaining their quality, e.g. high humidity, … Read more
This week I received a set of six collections from Masayuki Onishi. Three were from his fieldwork, mainly in Bougainville (with Baitsi, Naasioi, and Motuna (Siwai)), two were his reworking of Douglas Oliver’s records dating back to the 1930s, one in a range of languages and another on Siwai, and the sixth was from … Read more
In 2024, I initiated the “ENB Digitisation and Preservation Project”, which aims to collect old analogue tape recordings from my community in East New Britain in Papua New Guinea. Because I am a staff member at PARADISEC, community members were approaching me to let me know they had recordings of church choir songs, gospel songs, choral music, string … Read more
We wrote about dried out cassette tapes in an earlier blog post, and the problem they create for playback, screeching as they try to move through the playback machine’s mechanism and ultimately failing to play. You can hear an audio example in that post. To get the tapes into a playable form, they need to … Read more
PARADISEC contains, at an informed guess, tens of thousands of pages of handwritten notes relating to the languages and cultures of the Pacific region. Many of those pages pertain directly to audio-visual media also housed in the archive, such as audio or video files, and the pages might include transcriptions, translations, explanations, notes, etc., of … Read more