Towards effective strategies for monolingual and bilingual information retrieval: Lessons learned from NTCIR-4

IJERT-Improve Cross Language Information Retrieval with Pseudo-Relevance Feedback

International Journal of Engineering Research and Technology (IJERT), 2015

https://www.ijert.org/improve-cross-language-information-retrieval-with-pseudo-relevance-feedback
https://www.ijert.org/research/improve-cross-language-information-retrieval-with-pseudo-relevance-feedback-IJERTV4IS060530.pdf

In dictionary-based cross-language information retrieval (CLIR) systems, structured query translation has been shown to be a useful method for improving system performance. In this paper, we examine the effects of using pseudo-relevance feedback to refine the structured query in the target language. We propose different term-weighting methods based on word distributions and on the mutual information between expanded terms and the original query terms. Our experimental results on a dictionary-based Vietnamese-English CLIR system show that re-weighting query terms improves precision, while query expansion improves recall. Combining the two techniques improves system performance by up to 12% in terms of Mean Average Precision (MAP).
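To make "structured query translation" concrete, here is a minimal sketch that assumes the common Pirkola-style formulation; the abstract does not spell out the paper's exact variant. All dictionary translations of one source word form a synonym set: the set's term frequency is the sum of its members' frequencies, and its document frequency counts documents containing any member. The function name `structured_query_score` and the log-tf times idf weighting are illustrative assumptions.

```python
import math
from collections import Counter


def structured_query_score(doc_tokens, query_translations, docs):
    """Score one document against a structured (synonym-grouped) query.

    query_translations: list of sets; each set holds the dictionary
    translations of one source-language query word (hypothetical input).
    docs: list of token lists forming the collection (used for df).
    """
    n_docs = len(docs)
    tf = Counter(doc_tokens)
    score = 0.0
    for group in query_translations:
        # Pseudo term frequency: sum of tf over every translation in the group.
        group_tf = sum(tf[t] for t in group)
        if group_tf == 0:
            continue
        # Pseudo document frequency: documents containing at least one translation.
        group_df = sum(1 for d in docs if group & set(d))
        idf = math.log((n_docs + 1) / max(group_df, 1))
        score += (1 + math.log(group_tf)) * idf
    return score
```

With `docs = [["bank", "river"], ["bank", "money", "loan"], ["river", "water"]]` and the structured query `[{"bank", "shore"}, {"money", "cash"}]`, the second document scores highest because it matches both synonym sets, while the third matches neither and scores zero.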

Improve Cross Language Information Retrieval with Pseudo-Relevance Feedback

FAIR - NGHIÊN CỨU CƠ BẢN VÀ ỨNG DỤNG CÔNG NGHỆ THÔNG TIN 2015 (Fundamental and Applied Information Technology Research), 2016

In dictionary-based cross-language information retrieval (CLIR) systems, structured query translation has been shown to be a useful method for improving system performance. In this paper, we examine the effects of using pseudo-relevance feedback to refine the structured query in the target language. We propose different term-weighting methods based on word distributions and on the mutual information between expanded terms and the original query terms. Our experimental results on a dictionary-based Vietnamese-English CLIR system show that re-weighting query terms improves precision, while query expansion improves recall. Combining the two techniques improves system performance by up to 12% in terms of Mean Average Precision (MAP).
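The abstract does not give the exact mutual-information formula, so the sketch below assumes document-level pointwise mutual information (PMI) computed over the pseudo-relevant (top-ranked) documents: each candidate term is scored by its summed PMI with the original query terms, and only positively associated terms are kept. The function `expansion_terms_by_pmi` and its `top_n` cut-off are hypothetical.

```python
import math
from collections import Counter


def expansion_terms_by_pmi(feedback_docs, query_terms, top_n=5):
    """Rank candidate expansion terms by summed PMI with the query terms.

    feedback_docs: token lists of the top-ranked (pseudo-relevant) documents.
    Co-occurrence is counted at the document level (an assumption).
    """
    n = len(feedback_docs)
    doc_sets = [set(d) for d in feedback_docs]
    df = Counter(t for s in doc_sets for t in s)
    weights = {}
    for term in df:
        if term in query_terms:
            continue
        total = 0.0
        for q in query_terms:
            joint = sum(1 for s in doc_sets if term in s and q in s)
            if joint == 0 or df[q] == 0:
                continue
            # PMI = log(P(t, q) / (P(t) * P(q))), probabilities over feedback docs.
            total += math.log((joint / n) / ((df[term] / n) * (df[q] / n)))
        if total > 0:
            weights[term] = total
    # Highest weight first; ties broken alphabetically for stable output.
    ranked = sorted(weights.items(), key=lambda kv: (-kv[1], kv[0]))
    return ranked[:top_n]
```

PMI is known to favour rare terms, so a practical system would usually combine it with a minimum frequency threshold or with the distribution-based weights the abstract also mentions.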

Enhancing query translation with relevance feedback in translingual information retrieval

2011

As a proven technique for improving retrieval effectiveness, relevance feedback (RF) has been widely studied in both monolingual and translingual information retrieval (TLIR). Studies of RF in TLIR have focused on query expansion (QE), in which queries are reformulated before and/or after they are translated. However, RF in TLIR can not only help select better query terms, ...
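The abstract breaks off, so the following only illustrates the visible claim that feedback can help select better query terms after translation. It is a minimal sketch, assuming each source term has a list of dictionary translations, that re-weights the candidates by how often they occur in the feedback documents. The function `reweight_translations` and its additive `smoothing` parameter are hypothetical, not the paper's method.

```python
from collections import Counter


def reweight_translations(candidates, feedback_docs, smoothing=0.5):
    """Re-weight dictionary translation candidates using feedback documents.

    candidates: dict mapping a source-language term to a list of
    target-language translations (hypothetical input format).
    smoothing must be positive so every source term gets a valid distribution.
    Returns, for each source term, a dict of translation -> weight summing to 1.
    """
    counts = Counter(t for doc in feedback_docs for t in doc)
    weighted = {}
    for source, translations in candidates.items():
        # Additive smoothing keeps unseen translations from dropping to zero.
        raw = {t: counts[t] + smoothing for t in translations}
        total = sum(raw.values())
        weighted[source] = {t: w / total for t, w in raw.items()}
    return weighted
```

For example, if the Spanish term "banco" has the candidates "bank" and "bench" and the feedback documents mention only "bank", the "bank" translation receives most of the weight while "bench" keeps a small smoothed share.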

Experiments with Monolingual, Bilingual, and Robust Retrieval

Lecture Notes in Computer Science, 2007

For our participation in the CLEF 2006 campaign, our first objective was to propose and evaluate a decompounding algorithm and a more aggressive stemmer for the Hungarian language. Our second objective was to obtain a better picture of the relative merits of various search engines for the French, Portuguese/Brazilian and Bulgarian languages. To achieve this, we evaluated the test collections using the Okapi approach, several models from the Divergence from Randomness (DFR) family and a language model (LM), as well as two vector-processing approaches. In the bilingual track, we evaluated the effectiveness of various machine translation (MT) systems for queries submitted in English and automatically translated into French and Portuguese. After blind query expansion, the MAP achieved by the best single MT system was around 95% of that of the corresponding monolingual search when French was the target language, and 83% with Portuguese. Finally, in the robust retrieval task, we investigated various techniques to improve retrieval performance on difficult topics.
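The Okapi approach mentioned above is usually implemented as BM25. The sketch below is a textbook BM25 ranking function, not the authors' code; `k1 = 1.2` and `b = 0.75` are common default values, and the idf form `log(1 + (N - df + 0.5) / (df + 0.5))` is one of several variants in use (the one adopted by Lucene's BM25 implementation), chosen here because it never goes negative.

```python
import math
from collections import Counter


def bm25_scores(query, docs, k1=1.2, b=0.75):
    """Okapi BM25 score of every document for a bag-of-words query.

    Uses the idf variant log(1 + (N - df + 0.5) / (df + 0.5)), which stays
    positive even for very common terms.
    """
    n = len(docs)
    avgdl = sum(len(d) for d in docs) / n
    df = Counter(t for d in docs for t in set(d))
    scores = []
    for d in docs:
        tf = Counter(d)
        s = 0.0
        for q in query:
            if tf[q] == 0:
                continue
            idf = math.log(1 + (n - df[q] + 0.5) / (df[q] + 0.5))
            # Document-length normalisation controlled by b.
            norm = tf[q] + k1 * (1 - b + b * len(d) / avgdl)
            s += idf * tf[q] * (k1 + 1) / norm
        scores.append(s)
    return scores
```

Because of the length normalisation, two documents with the same term frequency are not tied: the shorter one scores higher, which is the behaviour the `b` parameter tunes.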

Exeter at CLEF 2002: Experiments with Machine Translation for Monolingual and Bilingual Retrieval

Lecture Notes in Computer Science, 2003

This year, the University of Exeter participated in both the CLEF 2002 monolingual and bilingual tasks for two languages: Italian and Spanish. We submitted four ranked runs for each of the Italian and Spanish monolingual tasks and five for each of the bilingual tasks. We report results from our investigation of merging topic translations from two machine translation (MT) systems, together with recent results for query expansion and term weighting using alternative collections. Our results show that although query expansion and term weighting from a pilot collection have been shown to improve retrieval performance, performance can suffer if the lexicons of the pilot and test collections differ.
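The abstract does not say how the two MT outputs were merged. One simple strategy, shown here as an assumption rather than Exeter's method, is to pool the terms of both translations and weight each term by its summed relative frequency, so that terms both systems agree on count more. `merge_translations` is a hypothetical name.

```python
from collections import Counter


def merge_translations(translation_a, translation_b):
    """Merge two MT outputs of the same topic into one weighted query.

    Each translation is a list of tokens. A term's weight is the sum of its
    relative frequencies in the two outputs, so terms on which both systems
    agree end up weighted higher than terms only one system produced.
    """
    merged = Counter()
    for tokens in (translation_a, translation_b):
        counts = Counter(tokens)
        total = sum(counts.values())
        if total == 0:
            continue  # skip an empty translation instead of dividing by zero
        for term, c in counts.items():
            merged[term] += c / total
    return dict(merged)
```

For instance, if one system renders "elections" as "elecciones" and the other as "comicios" while both produce "generales", the shared term ends up with twice the weight of either alternative.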

UniNE at CLEF 2006: Experiments with Monolingual, Bilingual, Domain-Specific and Robust Retrieval

2015

For our participation in this CLEF evaluation campaign, the first objective was to propose and evaluate various indexing and search strategies for the Hungarian language that achieve better retrieval effectiveness than a language-independent approach (n-gram indexing). Using both a new stemmer that also removes some derivational suffixes and a more aggressive automatic decompounding scheme, we achieved better retrieval effectiveness than the corresponding 4-gram indexing scheme. Our second objective was to obtain a better picture of the relative merits of various search engines for the French, Brazilian/Portuguese and Bulgarian languages. To do so, we evaluated these test collections using the Okapi, Divergence from Randomness (DFR) and language model (LM) approaches, together with nine vector-processing approaches. After pseudo-relevance feedback, either the DFR or the LM approach tends to produce the best IR performance. For the Bulgarian language, we also found that word-based in...
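The language-independent baseline above indexes character n-grams instead of stems. A minimal sketch of 4-gram tokenisation follows; that n-grams do not cross word boundaries and that words no longer than n are kept whole are assumptions about the indexing scheme, and `char_ngrams` is a hypothetical name.

```python
def char_ngrams(text, n=4):
    """Split text into overlapping character n-grams, one word at a time.

    N-grams do not span word boundaries, and words no longer than n are
    kept whole (both are assumptions about the indexing scheme).
    """
    grams = []
    for word in text.lower().split():
        if len(word) <= n:
            grams.append(word)
        else:
            grams.extend(word[i:i + n] for i in range(len(word) - n + 1))
    return grams
```

For the Hungarian word "házak" ("houses"), this yields the 4-grams "háza" and "ázak", so a query containing "ház" in another inflected form can still share n-grams with the document without any stemmer.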

Toward a unified retrieval outcome analysis framework for cross-language information retrieval

Proceedings of the American Society for Information Science and Technology, 2006

This paper proposes a Retrieval Outcome Analysis Framework (ROA Framework) to systematically evaluate the retrieval performance of cross-language information retrieval (CLIR) systems. The ROA Framework goes beyond the TREC-style evaluation methodology by including procedures that focus on individual queries, especially difficult ones. The framework consists of four interrelated components: (1) Overall System Performance Evaluation, (2) Query Categorization, (3) Translation Analysis, and (4) Individual Query Analysis. An example of applying the framework is discussed in detail. The author believes the proposed framework would be especially useful for developing real-world CLIR systems, because evaluation guided by the framework can uncover the causes of poor retrieval performance.
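Focusing on individual queries starts from per-query scores rather than a single MAP figure. The sketch below computes non-interpolated average precision (AP) per query and flags queries whose AP falls below a fraction of MAP as candidates for closer analysis. The `ratio=0.5` threshold and both function names are hypothetical illustrations, not taken from the paper.

```python
def average_precision(ranked_ids, relevant_ids):
    """Non-interpolated average precision of one ranked list."""
    hits, precision_sum = 0, 0.0
    for rank, doc_id in enumerate(ranked_ids, start=1):
        if doc_id in relevant_ids:
            hits += 1
            precision_sum += hits / rank
    # Unretrieved relevant documents still count in the denominator.
    return precision_sum / len(relevant_ids) if relevant_ids else 0.0


def flag_difficult_queries(runs, qrels, ratio=0.5):
    """Return MAP, per-query AP, and queries whose AP is below ratio * MAP.

    runs: query id -> ranked list of doc ids; qrels: query id -> set of
    relevant doc ids. The ratio threshold is a hypothetical choice.
    """
    ap = {q: average_precision(runs.get(q, []), qrels[q]) for q in qrels}
    mean_ap = sum(ap.values()) / len(ap)
    difficult = sorted(q for q, v in ap.items() if v < ratio * mean_ap)
    return mean_ap, ap, difficult
```

A query that retrieves its only relevant document at rank 1 has AP 1.0, while one that finds one of its two relevant documents at rank 3 has AP 1/6; with a MAP of 7/12, only the second query is flagged.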

UniNE at CLEF 2006: Experiments with Monolingual, Bilingual, Domain-Specific and Robust Retrieval

2006

For our participation in this CLEF evaluation campaign, the first objective was to propose and evaluate various indexing and search strategies for the Hungarian language that achieve better retrieval effectiveness than a language-independent approach (n-gram indexing). Using both a new stemmer that also removes some derivational suffixes and a more aggressive automatic decompounding scheme, we achieved better retrieval effectiveness than the corresponding 4-gram indexing scheme. Our second objective was to obtain a better picture of the relative merits of various search engines for the French, Brazilian/Portuguese and Bulgarian languages. To do so, we evaluated these test collections using the Okapi, Divergence from Randomness (DFR) and language model (LM) approaches, together with nine vector-processing approaches. After pseudo-relevance feedback, either the DFR or the LM approach tends to produce the best IR performance. For the Bulgarian language, we also found that word-based ind...
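The abstract does not describe the decompounding algorithm itself. As a minimal sketch of dictionary-based decompounding in general, the function below recursively splits a word into lexicon entries, trying the longest head first; ignoring linking elements and inflection is a simplification, and `decompound`, `lexicon` and `min_len` are hypothetical names.

```python
def decompound(word, lexicon, min_len=3):
    """Split a compound into lexicon words, preferring the longest head first.

    Returns the list of parts, or [word] if no complete split is found.
    Linking elements and inflection are ignored (a simplification).
    """
    if word in lexicon:
        return [word]
    # Try cut points from the longest possible head down to min_len characters,
    # always leaving a tail of at least min_len characters.
    for cut in range(len(word) - min_len, min_len - 1, -1):
        head, tail = word[:cut], word[cut:]
        if head in lexicon:
            rest = decompound(tail, lexicon, min_len)
            if all(part in lexicon for part in rest):
                return [head] + rest
    return [word]
```

With a small lexicon containing "vasút", "állomás", "labda" and "rúgás", the compounds "vasútállomás" ("railway station") and "labdarúgás" ("football") split into their two components, while a word with no valid split is returned unchanged.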

Applying multiple characteristics and techniques to obtain high levels of performance in information retrieval

2004

Our information retrieval system exploits numerous characteristics of information and uses a number of sophisticated techniques. It uses Robertson's 2-Poisson model and Rocchio's formula, both of which are known to be effective. Characteristics of newspapers, such as locational information, are also used. We present our application of Fujita's method, in which the system uses longer terms in retrieval but de-emphasizes them relative to the shortest terms. This allows us to use both compound and single-word terms. We describe the statistical test used to expand queries through an automatic feedback process; it yields terms that have been statistically shown to be related to the top-ranked documents obtained in the first retrieval. We also use QIDF, an IDF computed over queries. QIDF lowers the scores of stop words that occur in many queries, which makes it very useful for foreign languages whose stop words we cannot determine. We also use web-based translation of unknown words for bilingual information retrieval. We participated in two monolingual information retrieval tasks (Korean and Japanese) and five bilingual information retrieval tasks (Chinese-Japanese, English-Japanese, Japanese-Korean, Korean-Japanese, and English-Korean) at NTCIR-6, and obtained good results in all of them.
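Rocchio's formula moves the query vector toward the centroid of relevant documents and away from the centroid of non-relevant ones; in automatic (pseudo-relevance) feedback the top-ranked documents play the relevant role and the non-relevant set is often left empty. Below is a minimal sketch using sparse dict vectors; `alpha=1.0, beta=0.75, gamma=0.15` are common textbook defaults, not the paper's values. The abstract describes QIDF only as an IDF for queries, so `qidf` assumes the smoothed form `log((Q + 1) / (qf + 1))`, where Q is the number of queries and qf the number containing the term.

```python
import math
from collections import Counter


def rocchio(query_vec, relevant_vecs, nonrelevant_vecs=(), alpha=1.0, beta=0.75, gamma=0.15):
    """Rocchio update: alpha*q + beta*mean(relevant) - gamma*mean(nonrelevant).

    Vectors are dicts term -> weight. Terms whose final weight is not
    positive are dropped from the returned query.
    """
    new = Counter({t: alpha * w for t, w in query_vec.items()})
    for vecs, coef in ((relevant_vecs, beta), (nonrelevant_vecs, -gamma)):
        if not vecs:
            continue
        for vec in vecs:
            for t, w in vec.items():
                # Adding coef * w / len(vecs) per vector accumulates coef * centroid.
                new[t] += coef * w / len(vecs)
    return {t: w for t, w in new.items() if w > 0}


def qidf(term, queries):
    """QIDF: an IDF computed over the query set instead of the documents.

    Assumed smoothed form log((Q + 1) / (qf + 1)); terms occurring in every
    query (typically stop words) receive zero weight.
    """
    qf = sum(1 for q in queries if term in q)
    return math.log((len(queries) + 1) / (qf + 1))
```

Under this assumed form, a word such as "the" that appears in every query gets a QIDF of exactly zero without any stop-word list, which matches the motivation given in the abstract for languages whose stop words cannot be determined.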