Mahak: A Test Collection for Evaluation of Farsi Information Retrieval Systems
Related papers
An evaluation of retrieval performance using farsi text
A series of experiments on the Farsi language has been conducted at the University of Tehran during the past three years. The goal of these experiments was to establish the characteristics of Farsi text with respect to different retrieval models and weighting schemes. The experiments covered several fuzzy methods as well as vector space, probabilistic and N-gram retrieval models. The results show that the vector space model with ltu.Lnu weighting outperforms the other models and weighting schemes.
In this paper, we introduce a new approach for scoring Farsi (also called Persian) documents in a Persian search engine. This approach is based on a new stemming method for the Farsi language that works without any dictionary. Evaluation results show a significant improvement in the performance (precision/recall) of the Information Retrieval (IR) system using this stemmer. We have combined our stemming method with a mathematical scoring approach named FDS to obtain a powerful scoring policy for relevant documents in a Persian search engine.
Assessment of query reweighing, by rocchio method in farsi information retrieval
2008
Because users lack knowledge of the collections used by search engines and by retrieval systems in general, they cannot express their information needs appropriately in queries. In other words, they do not have enough experience to formulate their needs in a way that finds related documents. Query expansion aims to help users improve and correct their queries. Based on the feedback it receives from the user at the first stage, the retrieval system moves the query in the document space toward more relevant documents. Different approaches have been used in information retrieval systems; however, the efficacy of query expansion in Farsi information retrieval systems has not been assessed. This paper presents and assesses the basic Rocchio expansion model as a primary model for retrieving Farsi documents. The purpose of this study is to determine the effect of a standard, basic model of query expansion on Farsi document retrieval, so that researchers can compare their own query expansion results with the findings of this paper, which showed a straightforward and positive effect on Farsi document retrieval.
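The feedback step described in this abstract can be illustrated with a minimal sketch of the textbook Rocchio update. This is not the paper's implementation: the function name `rocchio`, the sparse term-to-weight dictionaries, and the default values of `alpha`, `beta` and `gamma` are all assumptions chosen for illustration.

```python
from collections import defaultdict


def rocchio(query, relevant, non_relevant, alpha=1.0, beta=0.75, gamma=0.15):
    """Return a reweighted query as a dict mapping term -> weight.

    query, and every document in relevant / non_relevant, is a sparse
    vector given as a dict of term -> weight. The alpha, beta and gamma
    defaults are illustrative assumptions, not values from the paper.
    """
    new_query = defaultdict(float)

    # Keep the original query, scaled by alpha.
    for term, weight in query.items():
        new_query[term] += alpha * weight

    # Move toward the centroid of the documents judged relevant.
    for doc in relevant:
        for term, weight in doc.items():
            new_query[term] += beta * weight / len(relevant)

    # Move away from the centroid of the documents judged non-relevant.
    for doc in non_relevant:
        for term, weight in doc.items():
            new_query[term] -= gamma * weight / len(non_relevant)

    # Terms whose weight is not positive are dropped, a common convention.
    return {term: w for term, w in new_query.items() if w > 0}
```

Terms that occur only in relevant documents enter the query with a positive weight, which is the expansion effect; terms that occur only in non-relevant documents end up negative and are discarded.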
Language model-based retrieval for Farsi documents
International Conference on Information Technology: Coding and Computing, 2004. Proceedings. ITCC 2004., 2004
Modeling techniques to the retrieval of Farsi documents. We discovered that Language Modeling improves the precision of retrieval when compared to a standard vector space model.
Experiments with English-Persian text retrieval
2008
As the number of non-English documents on the web is increasing dramatically, the study and design of information retrieval systems for these languages is very important. The Persian language is the official language of Iran, Afghanistan and Tajikistan and is also spoken in some other countries in the Middle East, so there is a significant amount of Persian documents available on the web. In this study, we present and compare our English-Persian cross-language text retrieval experiments on the Hamshahri text collection. We also present the Combinatorial Translation Probability (CTP) calculation method for query translation, which estimates translation probabilities based on the collection itself.
Development of Arabic evaluations in information retrieval
International Journal of ADVANCED AND APPLIED SCIENCES
The field of information retrieval has grown noticeably over the past decades in response to the prolonged use of the internet and users' pressing need to search huge amounts of digital information. Given the steady growth of Arabic e-content, intelligent information retrieval systems must be designed to suit the nature and needs of the Arabic language. This paper sheds light on current developments in the field of Arabic information retrieval, identifies the challenges that delay progress in this field, and proposes recommendations for further research. It uses an analytical approach to examine the state of Arabic studies in information retrieval and to identify the difficulties faced in this area. In particular, the earlier literature on information retrieval is reviewed by searching the relevant databases and websites.
Improving Information Retrieval Results for Persian Documents using FarsNet
Cornell University - arXiv, 2018
In this paper we propose a new method for query expansion, which uses FarsNet (Persian WordNet) to find tokens related to the query and expand its semantic meaning. For this purpose, we use the synonymy relations in FarsNet and extract synonyms of the query words. This algorithm is used to enhance information retrieval systems and improve search results. The overall evaluation of this system against the baseline method (without query expansion) shows an improvement of about 9 percent in Mean Average Precision (MAP).
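For readers unfamiliar with the metric quoted in this abstract, the following is a minimal sketch of how Mean Average Precision is commonly computed from ranked result lists and relevance judgments. The function names and the document identifiers in the usage note are hypothetical, and the sketch assumes each ranked list contains no duplicate documents; it does not reproduce the paper's evaluation setup.

```python
def average_precision(ranked_ids, relevant_ids):
    """Average precision of one ranked list against a set of relevant ids."""
    relevant_ids = set(relevant_ids)
    if not relevant_ids:
        return 0.0
    hits = 0
    precision_sum = 0.0
    for rank, doc_id in enumerate(ranked_ids, start=1):
        if doc_id in relevant_ids:
            hits += 1
            # Precision at the rank where a relevant document is found.
            precision_sum += hits / rank
    # Relevant documents that were never retrieved contribute zero.
    return precision_sum / len(relevant_ids)


def mean_average_precision(runs):
    """Mean of average precision over (ranked_ids, relevant_ids) pairs."""
    if not runs:
        return 0.0
    return sum(average_precision(r, rel) for r, rel in runs) / len(runs)
```

As a usage example, a query whose ranking is d1, d2, d3, d4 with d1 and d3 relevant has an average precision of (1/1 + 2/3) / 2, and averaging such values over all queries gives MAP.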
Assessment of a Modern Farsi Corpus
2004
The development of Language Engineering (LE) and Information Retrieval (IR) applications requires availability of sizeable, reliable and representative corpora. This paper describes how we have constructed a well-structured 345 MB tagged corpus of news, and presents some beneficial statistics of this corpus based upon the characteristics of Farsi language. It also goes into particular detail on the fitness of the frequency and rank of Farsi words with Zipf-Mandelbrot's law. We will then present our measurement of Entropy of Farsi for this corpus.
Arabic information retrieval perspectives
… Journées d'Etude sur la Parole …, 2004
Arabic IR (Information Retrieval) has recently become a focus of research and commercial development. Very few standards for evaluation of such tools are known and available. A concrete evaluation for Arabic IR systems is necessary for the advancement of this field.