A word segmentation system for handling space omission problem in Urdu script
2010, 23rd International Conference on Computational Linguistics
Word segmentation is an obligatory first step in almost all NLP applications, where the initial phase requires tokenizing the input into words. Like other Asian languages such as Chinese, Thai and Myanmar, Urdu also faces word segmentation challenges. The problem is less severe in Urdu than in these other languages because a space is used to delimit words; however, the space is not used consistently, which gives rise to both space omission and space insertion errors. In this ...