GitHub - uniglot/korean-word-ipa-dictionary: Dictionary of pairs of Korean word and IPA crawled from Wiktionary (Korean edition) (original) (raw)

1. Getting List of Word Entries

From the latest Kowiktionary dump, I got the list of every word in main namespace. After getting this list, I filtered out all entries which are not written in Hangul, and stored Korean word entries in the file kodict_entry.txt.

2. Crawling

By running crawl.py simultaneously on 11 subsets of kodict_entry.txt, which consist of 6000 words (except the last one), I extracted IPA information, forming a word-IPA dictionary for Korean language. After the crawling processes are all completed, I appended the results in alphabetical order, and deleted entries with no extracted IPA.

3. Converting IPA to X-SAMPA

From any word-IPA dictionary files, you can convert it to word-X-SAMPA dictionary.

from convert import Converter

conv = Converter() conv.subst_dict()

4. Licenses

You can make use of the results of scripts (i.e., .dict files and kodict_entry.txt file) under CC BY-SA. You can use the scripts under MIT License.