Arabic Natural Language Processing for Information Retrieval

2004

Abstract

Human language technology has played a major role in implementing information retrieval systems (IRS) for Latin-based languages. Two of the most cited techniques are stemming and truncation. Numerous studies have shown that the inflectional structure of words has a large impact on the retrieval accuracy of IRS for Latin-based languages. Stemming or truncation is done for two principal reasons: the reduction in index storage required and the increase in performance due to the use of word variants. Several stemming algorithms have been proposed, such as Porter's algorithm for English. While these studies were concerned with Latin-based languages, only a few studies have given attention to the Arabic language. In this paper we present a study of the characteristics of the Arabic language that can be useful to integrate into an information retrieval system, and of the kinds of stemming techniques that can be used for Arabic. We used the .ae domain as a case study. We present some characteristic...
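The two motivations named in the abstract (a smaller index and matching across word variants) can be illustrated with a minimal sketch. It is not the method of the paper: the affix lists, the function names truncate, light_stem and build_index, and the minimum-length threshold are all assumptions chosen only for illustration, and a real Arabic stemmer would need far more careful rules.

```python
from collections import defaultdict

# Hypothetical affix lists, for illustration only (not taken from the paper).
PREFIXES = ("وال", "ال", "و")
SUFFIXES = ("ات", "ون", "ين", "ة")


def truncate(word, n=4):
    """Truncation: keep only the first n characters of the word."""
    return word[:n]


def light_stem(word, min_len=3):
    """Strip at most one listed prefix and one listed suffix,
    but only if at least min_len characters remain afterwards."""
    for prefix in PREFIXES:
        if word.startswith(prefix) and len(word) - len(prefix) >= min_len:
            word = word[len(prefix):]
            break
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= min_len:
            word = word[:-len(suffix)]
            break
    return word


def build_index(docs, normalize=lambda w: w):
    """Build an inverted index mapping each normalized term to document ids."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for token in text.split():
            index[normalize(token)].add(doc_id)
    return dict(index)


if __name__ == "__main__":
    docs = {1: "المعلمون", 2: "معلمين"}
    raw = build_index(docs)
    stemmed = build_index(docs, normalize=light_stem)
    # The stemmed index has fewer terms, and one term now covers both documents.
    print(len(raw), len(stemmed))
```

With the identity normalizer each surface form gets its own index entry; with light_stem the variants collapse to one entry, which is the storage and recall effect the abstract describes. Truncation is cheaper but cruder, since it ignores Arabic prefixes entirely.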

Harmain Harmain hasn't uploaded this paper.
