Universities of Alicante and Jaen at iCLEF

Interactive Cross-Language Document Selection

Information Retrieval, 2004

The problem of finding documents written in a language that the searcher cannot read is perhaps the most challenging application of cross-language information retrieval technology. In interactive applications, that task involves at least two steps: (1) the machine locates promising documents in a collection that is larger than the searcher could scan, and (2) the searcher recognizes documents relevant to their intended use from among those nominated by the machine. This article presents the results of experiments designed to explore three techniques for supporting interactive relevance assessment: (1) full machine translation, (2) rapid term-by-term translation, and (3) focused phrase translation. Machine translation was found to better support this task than term-by-term translation, and focused phrase translation further improved recall without an adverse effect on precision. The article concludes with an assessment of the strengths and weaknesses of the evaluation framework used in this study and some remarks on implications of these results for future evaluation campaigns.

iCLEF 2003 at Maryland: Translation Selection and Document Selection

2003

Maryland performed two sets of experiments for the 2003 Cross-Language Evaluation Forum’s interactive track, one focused on interactive selection of appropriate translations for query terms, the second focused on interactive selection of relevant documents. Translation selection was supported using possible synonyms discovered through back translation and two techniques for generating KeyWord In Context (KWIC) examples of usage. The results indicate that searchers typically achieved a similar search effectiveness using fewer query iterations when interactive translation selection was available. For document selection, a complete extract of the first 40 words of each news story was compared to a compressed extract generated using an automated parse-and-trim approach that approximates one way in which people can produce headlines. The results indicate that compressed “headlines” result in faster assessment, but with a 20% relative reduction in the F_α (α = 0.8) search effectiveness measure.
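The F measure with α = 0.8 mentioned in this abstract is commonly defined as van Rijsbergen's weighted harmonic mean of precision P and recall R, F_α = 1 / (α/P + (1 − α)/R), where α = 0.8 weights precision more heavily than recall. The sketch below assumes that definition; the function name `f_alpha` and the sample scores are illustrative and do not come from the paper.

```python
def f_alpha(precision: float, recall: float, alpha: float = 0.8) -> float:
    """Weighted harmonic mean of precision and recall (assumed definition).

    alpha close to 1 emphasises precision; alpha = 0.5 gives the balanced F1.
    """
    if precision <= 0.0 or recall <= 0.0:
        return 0.0  # the harmonic mean is taken to be 0 when either score is 0
    return 1.0 / (alpha / precision + (1.0 - alpha) / recall)


if __name__ == "__main__":
    # Hypothetical scores, chosen only to show the asymmetry of the measure.
    print(f_alpha(precision=1.0, recall=0.5))  # high precision, low recall
    print(f_alpha(precision=0.5, recall=1.0))  # low precision, high recall
```

With α = 0.8, a searcher who selects few but correct documents scores higher than one who selects many documents with the same number of errors in the other direction.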

A Novel Method for Cross-Language Retrieval of Chunks Using Monolingual and Bilingual Corpora

Communications in Computer and Information Science, 2010

Information retrieval (IR) is a crucial area of natural language processing (NLP) and can be defined as finding documents whose content is relevant to a user's information need. Cross-language information retrieval (CLIR) is a form of information retrieval in which the language of the query differs from that of the searched documents.

Passage Retrieval vs. Document Retrieval in the CLEF 2006 Ad Hoc Monolingual Tasks with the IR-n System

Lecture Notes in Computer Science, 2007

This paper discusses the impact of resource-driven stemming in information retrieval tasks. We conducted experiments in order to identify the relative benefit of various stemming strategies in a language with highly complex morphology. The results reveal the importance of various aspects of stemming in enhancing system performance in the IR task of the CLEF ad-hoc monolingual Hungarian track.

Document Expansion for Cross-Lingual Passage Retrieval

2010

This article describes the participation of the joint Elhuyar-IXA group in the ResPubliQA exercise at QA&CLEF 2010. In particular, we participated in the English–English monolingual task and in the Basque–English cross-lingual one. Our focus was threefold: (1) to check to what extent information retrieval (IR) can achieve good results in passage retrieval without question analysis and answer validation, (2) to check dictionary techniques for Basque to English retrieval when faced with the lack of parallel corpora for Basque in this domain, and (3) to check the contribution of semantic relatedness based on WordNet to expand the passages to related words. Our results show that IR provides good results in the monolingual task, that our performance drop in the cross-lingual system was much greater than in previous CLIR experiments, and that expansion improves the results in the monolingual task.

Corpus-based CLIR in retrieval of highly relevant documents

Journal of The American Society for Information Science and Technology, 2000

IR systems' ability to retrieve highly relevant documents has become more and more important in the age of extremely large collections, such as the WWW. Our aim was to find out how corpus-based CLIR manages in retrieving highly relevant documents. We created a Finnish-Swedish comparable corpus and used it as a source of knowledge for query translation. Finnish

A Review on Text Similarity Technique used in IR and its Application

International Journal of Computer Applications, 2015

With the large number of documents on the web, there is an increasing need to retrieve the most relevant documents. Several techniques exist for retrieving the most relevant documents from a large corpus. Similarity between words, sentences, paragraphs and documents is an important component in various tasks such as information retrieval, document clustering, word-sense disambiguation, automatic essay scoring, short answer grading, machine translation and text summarization. Text similarity means that the user's query text is matched against the document text, and on the basis of this matching the user retrieves the most relevant documents. Text similarity also plays an important role in the categorization of texts and documents. We can measure the similarity between words, sentences, paragraphs and documents to categorize them efficiently. On the basis of this categorization, we can retrieve the most relevant documents for the user's query. This paper describes different types of similarity, such as lexical similarity and semantic similarity.
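As a rough illustration of lexical similarity between a query and documents, the sketch below scores documents by the cosine of their raw term-frequency vectors. This is one common lexical measure, not necessarily one surveyed in the paper; the tokenizer, the function names, and the sample texts are assumptions made for the example.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into ASCII alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine_similarity(a: str, b: str) -> float:
    """Cosine of the angle between the term-frequency vectors of two texts."""
    va, vb = Counter(tokenize(a)), Counter(tokenize(b))
    dot = sum(va[t] * vb[t] for t in va.keys() & vb.keys())
    norm = math.sqrt(sum(c * c for c in va.values())) * math.sqrt(
        sum(c * c for c in vb.values())
    )
    return dot / norm if norm else 0.0


def rank(query: str, documents: list[str]) -> list[str]:
    """Return the documents ordered from most to least similar to the query."""
    return sorted(documents, key=lambda d: cosine_similarity(query, d), reverse=True)


if __name__ == "__main__":
    docs = ["passage retrieval for question answering", "stemming for Hungarian"]
    print(rank("question answering", docs)[0])
```

A purely lexical measure like this one ignores synonyms and word order, which is the gap that semantic similarity measures aim to close.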

Using a Passage Retrieval System to Support Question Answering Process

Lecture Notes in Computer Science, 2002

Previous work in Information Retrieval shows that using pieces of text as the basic unit to compare with the user's query obtains better results than using the whole document. This kind of IR system is usually called Passage Retrieval (PR). This paper discusses the use of our PR system in the question answering (QA) process. Our main objective is to examine whether a PR system provides better support for QA than Information Retrieval systems based on the whole document.

Arbitrary Passage Retrieval Based on Fixed-Length Window using Arabic Documents

2009

Retrieval systems accumulate a great range of documents, from abstracts, newspaper articles, and web pages to journal articles, books, court transcripts, and legislation. Collections of different types of documents represent shortcomings in current approaches toward ranking schemes. The use of short fragments of documents, called passages, instead of whole documents can overcome these shortcomings. Passage ranking provides suitable units of text, to be returned to the user, can avoid the difficulties of comparing documents of different lengths, and enables identification of short blocks of relevant material amongst otherwise irrelevant text. This paper proposes a new method of passage retrieval called fixed length arbitrary passage retrieval for Arabic documents. This method has been discussed, implemented, and evaluated. The experiment results show that ranking with fixed arbitrary passage gives substantial improvements in retrieval effectiveness over traditional document ranking s...
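The general idea of ranking by fixed-length passages that may start at any word can be sketched as follows. This is a minimal illustration under stated assumptions, not the method evaluated in the paper: it splits on whitespace, scores a window by counting query-term occurrences, and ranks each document by its best window. All function names and parameters are hypothetical.

```python
def best_passage(query: str, document: str, size: int = 50) -> tuple[int, int]:
    """Return (score, start) of the best fixed-length window in the document.

    A window is `size` consecutive tokens starting at any token position.
    Its score is the number of tokens that also occur in the query.
    """
    query_terms = set(query.lower().split())
    tokens = document.lower().split()
    # A document shorter than one window is treated as a single passage.
    last_start = max(len(tokens) - size, 0)
    best_score, best_start = -1, 0
    for start in range(last_start + 1):
        window = tokens[start:start + size]
        score = sum(1 for t in window if t in query_terms)
        if score > best_score:  # strict comparison keeps the earliest best window
            best_score, best_start = score, start
    return best_score, best_start


def rank_by_passage(query: str, documents: list[str], size: int = 50) -> list[str]:
    """Order documents by the score of their best passage, highest first."""
    return sorted(
        documents, key=lambda d: best_passage(query, d, size)[0], reverse=True
    )


if __name__ == "__main__":
    print(best_passage("a b", "x x x x a b x x", size=2))
```

Because every document is represented by a window of the same length, long and short documents are compared on equal terms, which is the property the abstract highlights.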