Using morphological analyzer to statistical POS Tagging on Persian Text

Due to the growing number of textual resources available in digital form, the ability to understand and process them automatically has become critical. The first fundamental step in understanding these resources is identifying the part of speech of each token or word in a sentence in order to disambiguate it. Part-of-speech (POS) tagging is one of the basic tools for understanding and processing natural language, and it is an infrastructural stage in many speech and text processing applications. Several methods have been proposed for POS tagging, each applied in taggers with the aim of achieving high performance and accuracy. Statistical methods have been among the primary techniques and have produced the most successful results in natural language processing in recent years. This success has carried over to other areas of natural language processing, where statistical methods are also very popular. One of the most important issues in POS tagging systems is handling unknown words. In this paper, we use a morphological analyzer to identify unknown words. Before tagging, each word is checked morphologically and an appropriate tag is assigned to it, which increases the overall accuracy. We use 5-fold cross-validation to evaluate the proposed tagger. According to the experimental results, text pre-processing and the morphological analyzer are very effective in the proposed POS tagger and improve the performance of the POS tagging system.
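The pipeline described above (a statistical tagger, a morphological check for unknown words, and 5-fold evaluation) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the abstract does not name the statistical model, so a most-frequent-tag (unigram) lexicon stands in for it; the suffix rules are toy, transliterated examples and not the paper's morphological analyzer; and the function names and tag labels (`N`, `ADJ`) are hypothetical.

```python
from collections import Counter, defaultdict

# Hypothetical, transliterated suffix rules (illustrative only);
# a real morphological analyzer would be far richer.
SUFFIX_RULES = [("tarin", "ADJ"), ("tar", "ADJ"), ("ha", "N")]


def morph_guess(word, default="N"):
    """Guess a tag for an unknown word from its suffix."""
    for suffix, tag in SUFFIX_RULES:
        if word.endswith(suffix) and len(word) > len(suffix):
            return tag
    return default


def train_unigram(tagged_sentences):
    """Map each training word to its most frequent tag (stand-in statistical model)."""
    counts = defaultdict(Counter)
    for sentence in tagged_sentences:
        for word, tag in sentence:
            counts[word][tag] += 1
    return {word: c.most_common(1)[0][0] for word, c in counts.items()}


def tag_words(words, lexicon):
    """Use the lexicon for known words and the suffix rules for unknown ones."""
    return [(w, lexicon[w] if w in lexicon else morph_guess(w)) for w in words]


def k_fold_accuracy(tagged_sentences, k=5):
    """Average token accuracy over k folds; fold i holds out every k-th sentence."""
    scores = []
    for i in range(k):
        test = tagged_sentences[i::k]
        train = [s for j, s in enumerate(tagged_sentences) if j % k != i]
        lexicon = train_unigram(train)
        correct = total = 0
        for sentence in test:
            predicted = tag_words([w for w, _ in sentence], lexicon)
            correct += sum(p == g for (_, p), (_, g) in zip(predicted, sentence))
            total += len(sentence)
        scores.append(correct / total if total else 0.0)
    return sum(scores) / k
```

In this sketch the morphological step is applied only to words missing from the training lexicon, which is one plausible reading of "before the tagging, the words are checked morphologically"; the actual system may integrate the analyzer differently.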