Automated Tagging System And Tagset Design For Arabic Text

This paper presents a diacritics rule-based part-of-speech (POS) tagger that automatically tags partially vocalized Arabic text. The aim is to remove ambiguity and to enable an accurate, fast automated tagging system. A tagset is being designed in support of this system. Tagset design is at an early stage of research in automatic morphosyntactic annotation of Arabic. Preliminary results of the tagset design are reported in this paper. Arabic has a valuable feature, diacritics: marks placed above and below the letters of an Arabic word. This feature plays a major role in adding linguistic attributes to Arabic words and in indicating their pronunciation and grammatical function. It enriches the language syntactically while removing a great deal of morphological and semantic ambiguity.
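To make the role of diacritics concrete, the following is a minimal sketch, not the tagger described in this paper. It assumes the diacritics are the Unicode combining marks U+064B to U+0652, and it applies a single illustrative rule: nunation (tanween) on a word signals a nominal. The function names `split_diacritics` and `toy_tag` and the tag labels `NOUN` and `UNK` are hypothetical and are not drawn from the paper's tagset.

```python
# Arabic diacritics as Unicode combining marks (assumed range U+064B..U+0652).
DIACRITICS = {chr(code) for code in range(0x064B, 0x0653)}

# Nunation (tanween) marks: fathatan, dammatan, kasratan.
TANWEEN = {"\u064B", "\u064C", "\u064D"}


def split_diacritics(word):
    """Return (bare letters, list of diacritics) for a possibly vocalized word."""
    letters = "".join(ch for ch in word if ch not in DIACRITICS)
    marks = [ch for ch in word if ch in DIACRITICS]
    return letters, marks


def toy_tag(word):
    """Hypothetical single-rule tagger: nunation implies a nominal."""
    _, marks = split_diacritics(word)
    if any(mark in TANWEEN for mark in marks):
        return "NOUN"
    # No rule fired, e.g. the word carries no usable diacritics.
    return "UNK"


# "kitabun" (a book), fully vocalized: kaf+kasra, ta+fatha, alef, ba+dammatan.
vocalized = "\u0643\u0650\u062A\u064E\u0627\u0628\u064C"
# The consonant skeleton k-t-b with no diacritics, which is ambiguous on its own.
bare = "\u0643\u062A\u0628"

print(toy_tag(vocalized))
print(toy_tag(bare))
```

A real rule-based tagger would combine many such rules with affix and context information; the sketch only shows how a diacritic can resolve a word that its bare letters leave ambiguous.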
