uniglot/korean-word-ipa-dictionary: Dictionary of Korean word and IPA pairs crawled from Wiktionary (Korean edition)
1. Getting List of Word Entries
From the latest Kowiktionary dump, I extracted the title of every entry in the main namespace. I then filtered out all entries that are not written in Hangul and stored the remaining Korean word entries in the file kodict_entry.txt.
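The filtering step could look roughly like the sketch below. It is not the code used to produce kodict_entry.txt: the function names `is_hangul_word` and `filter_hangul` are hypothetical, and the rule "every character is a precomposed Hangul syllable (U+AC00 to U+D7A3)" is an assumption about what "written in Hangul" means here.

```python
def is_hangul_word(word: str) -> bool:
    """Return True if every character is a precomposed Hangul syllable.

    Assumption: only the Hangul Syllables block (U+AC00..U+D7A3) counts;
    standalone jamo, Hanja, Latin letters, and digits are rejected.
    """
    return bool(word) and all("\uac00" <= ch <= "\ud7a3" for ch in word)


def filter_hangul(entries):
    """Keep only the entries that consist entirely of Hangul syllables."""
    return [word for word in entries if is_hangul_word(word)]


# Hypothetical page titles standing in for the dump contents.
titles = ["사과", "apple", "학교", "漢字", "K팝"]
korean_entries = filter_hangul(titles)
print(korean_entries)  # ['사과', '학교']
```

A stricter or looser rule (for example, also accepting spaces or compatibility jamo) would only require changing `is_hangul_word`.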
2. Crawling
I ran crawl.py simultaneously on 11 subsets of kodict_entry.txt, each consisting of 6000 words (except the last one), to extract IPA information and build a word-IPA dictionary for the Korean language. After all crawling processes had completed, I merged the results in alphabetical order and deleted the entries for which no IPA was extracted.
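The splitting and merging around the crawl can be sketched as follows. This is only an illustration under stated assumptions: `split_into_subsets` and `merge_results` are hypothetical helpers, the crawl output is modelled as plain Python dicts mapping a word to an IPA string (empty when nothing was extracted), and the word count and IPA values are placeholders rather than real data.

```python
def split_into_subsets(words, size=6000):
    """Cut the word list into consecutive chunks of `size` words.

    The last chunk holds whatever remains and may be shorter.
    """
    return [words[i:i + size] for i in range(0, len(words), size)]


def merge_results(partial_dicts):
    """Merge per-subset results, drop words without IPA, and sort by word."""
    merged = {}
    for part in partial_dicts:
        for word, ipa in part.items():
            if ipa:  # skip entries with no extracted IPA
                merged[word] = ipa
    # Sorting by code point matches dictionary order for precomposed Hangul.
    return dict(sorted(merged.items()))


# Placeholder word list; 61000 is a made-up count that yields 11 subsets.
words = [f"word{i}" for i in range(61000)]
subsets = split_into_subsets(words)
print(len(subsets), len(subsets[-1]))  # 11 1000

# Placeholder crawl results from two subsets.
results = merge_results([{"나무": "ipa-a", "가": ""}, {"가방": "ipa-b"}])
print(list(results))  # ['가방', '나무']
```

Each subset would then be handed to its own crawl.py process; how crawl.py reads its input is not specified here.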
3. Converting IPA to X-SAMPA
Any word-IPA dictionary file can be converted to a word-X-SAMPA dictionary.
from convert import Converter

conv = Converter()
conv.subst_dict()
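For readers unfamiliar with X-SAMPA, the idea behind such a conversion is a symbol-by-symbol substitution from IPA characters to ASCII equivalents. The sketch below is standalone and does not reproduce the internals of convert.Converter: the table `IPA_TO_XSAMPA` and the function `ipa_to_xsampa` are hypothetical names, the table covers only a handful of symbols, and the input strings are illustrative rather than taken from the dictionary.

```python
# A small, deliberately incomplete IPA -> X-SAMPA substitution table.
IPA_TO_XSAMPA = {
    "ɛ": "E",
    "ʌ": "V",
    "ɯ": "M",
    "ɐ": "6",
    "ŋ": "N",
    "ɾ": "4",
    "ɕ": "s\\",
    "ɡ": "g",        # IPA script g (U+0261) -> ASCII g
    "ː": ":",        # length mark
    "ʰ": "_h",       # aspiration
    "\u031a": "_}",  # combining "no audible release" diacritic
}


def ipa_to_xsampa(ipa: str) -> str:
    """Replace each IPA character found in the table; keep others unchanged."""
    return "".join(IPA_TO_XSAMPA.get(ch, ch) for ch in ipa)


print(ipa_to_xsampa("kʌŋ"))   # kVN
print(ipa_to_xsampa("pʰɛː"))  # p_hE:
```

Characters missing from the table pass through unchanged, so a complete converter would need a full table and a decision on how to handle symbols with no X-SAMPA counterpart.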
4. Licenses
The results of the scripts (i.e., the .dict files and kodict_entry.txt) are available under CC BY-SA. The scripts themselves are available under the MIT License.