Opinion: Why should Cambodia promote "research" as vigorously as possible?

Research is a process, or a series of successive steps, whose purpose is to collect and analyze information in order to answer a specific question.

PHONOLOGICAL PRINCIPLES AND AUTOMATIC PHONEMIC AND PHONETIC TRANSCRIPTION OF KHMER WORDS

This thesis explores the use of Khmer phonological principles to build a model that automatically transduces orthographic native Khmer words into a phonemic transcription and a close phonetic transcription. The approach involves two processes: (1) converting the orthographic words into a phonemic transcription, which represents careful speech, and (2) converting the phonemic transcription into a phonetic transcription, which represents casual speech. Three datasets were created to manually train and test the model, and two Thrax grammars were written to carry out the two processes. Dataset 01 is a list of 140 manually selected words that covers most spelling and pronunciation cases in native Khmer words. Dataset 02 and Dataset 03 serve as testing datasets. Dataset 02 is a list of 7,654 words drawn from the official Khmer monolingual dictionary published in 1967. The phonemic and phonetic transcriptions of each word in Datasets 01 and 02 were created manually, based on phonological principles and regularities postulated by previous scholars. Dataset 03 is a list of 6,492 words drawn from the Cambodian-English dictionary, together with their phonemic transcriptions by Robert Headley, published in 1997. Ruby was used to prepare the data for Dataset 02 and Dataset 03 because its simple syntax and regular expressions make the data easier and quicker to manipulate. The first Thrax grammar was written and validated to convert the orthographic native words into a phonemic transcription. This process involves mapping each Khmer character to an IPA character, determining syllable parsing, and implementing phonological regularities. The second Thrax grammar was written to convert the phonemic transcription into a phonetic transcription by implementing known phonological regularities, such as schwa and aspiration transition in initial consonant clusters, pre-syllable reduction in disyllabic words, and others.
The results show that the error rate of the first Thrax grammar falls from 97.86% (using simple grapheme-to-phoneme mapping) to 2.14% (using the mapping together with phonological regularities) on Dataset 01. On Datasets 02 and 03, the average error rates are 0.89% and 2.87%, respectively. Most transcription errors stem from exceptional and irregular cases in which a word is pronounced differently from its spelling. All error cases are fully documented.
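To make the reported percentages concrete, the sketch below computes a word-level error rate in Python. It assumes that an "error" means a word whose generated transcription differs from the reference transcription; the abstract does not define the metric, so this is an assumption, and the function name `word_error_rate` and the placeholder data are hypothetical. Under that assumption, 2.14% on the 140 words of Dataset 01 would correspond to 3 mismatched words, and 97.86% to 137.

```python
def word_error_rate(predicted, gold):
    """Percentage of words whose predicted transcription differs from the gold one.

    Assumption: errors are counted per word (exact-match comparison),
    which the abstract does not state explicitly.
    """
    if len(predicted) != len(gold):
        raise ValueError("predicted and gold must have the same length")
    if not gold:
        raise ValueError("cannot compute an error rate on an empty dataset")
    errors = sum(1 for p, g in zip(predicted, gold) if p != g)
    return 100.0 * errors / len(gold)


# Placeholder data only: 140 dummy reference strings, not real Khmer transcriptions.
gold = [f"w{i}" for i in range(140)]

# Simulate a system that gets 3 of the 140 words wrong.
predicted = list(gold)
for i in (0, 1, 2):
    predicted[i] = "wrong"

rate = word_error_rate(predicted, gold)
print(f"{rate:.2f}%")  # prints "2.14%"
```

The same function applied to 137 mismatches out of 140 gives 97.86%, so the two Dataset 01 figures are consistent with a per-word count.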
