Applying Multiple Characteristics and Techniques to Obtain High Levels of Performance in Information Retrieval at NTCIR-4
Related papers
Applying multiple characteristics and techniques in the NICT information retrieval system in NTCIR-5
Our information retrieval system takes advantage of numerous characteristics of information and uses numerous sophisticated techniques. It uses Robertson's 2-Poisson model and Rocchio's formula, both of which are known to be effective. Characteristics of newspapers such as locational information are used. We present our application of Fujita's method, where longer terms are used in retrieval by the system but de-emphasized relative to the emphasis on the shortest terms. This allows us to use both compound and single-word terms. The statistical test used in expanding queries through an automatic feedback process is described. The method gives us terms that have been statistically shown to be related to the top-ranked documents obtained in the first retrieval. We also use a numerical term, QIDF, which is an IDF term for queries. QIDF decreases the scores for stop words that occur in many queries. It can be very useful for foreign languages for which we cannot determine...
Applying Multiple Characteristics and Techniques in the NICT Information Retrieval System at NTCIR-6
2004
Our information retrieval system takes advantage of numerous characteristics of information and uses numerous sophisticated techniques. It uses Robertson's 2-Poisson model and Rocchio's formula, both of which are known to be effective. Characteristics of newspapers such as locational information are used. We present our application of Fujita's method, where longer terms are used in retrieval by the system but de-emphasized relative to the emphasis on the shortest terms. This allows us to use both compound and single-word terms. The statistical test used in expanding queries through an automatic feedback process is described. The method gives us terms that have been statistically shown to be related to the top-ranked documents obtained in the first retrieval. We also use a numerical term, QIDF, which is an IDF term for queries. QIDF decreases the scores for stop words that occur in many queries. It can be very useful for foreign languages for which we cannot determine stop words. We also use web-based unknown word translation for bilingual information retrieval. We participated in two monolingual information retrieval tasks (Korean and Japanese) and five bilingual information retrieval tasks (Chinese-Japanese, English-Japanese, Japanese-Korean, Korean-Japanese, and English-Korean) at NTCIR-6. We obtained good results in all the tasks.
Information retrieval: an overview of system characteristics
International Journal of Medical Informatics, 1997
The paper gives an overview of characteristics of information retrieval (IR) systems. The characteristics are identified from the descriptions of 23 IR systems. Four IR models are discussed: the Boolean model, the vector model, the probabilistic model and the connectionistic model. Twelve other characteristics of IR models are identified: search intermediary, domain knowledge, relevance feedback, natural language interface, graphical query language, conceptual queries, full-text IR, field searching, fuzzy queries, hypertext integration, machine learning, and ranked output. Finally, the relevance of IR systems for the World Wide Web is established. © 1997 Elsevier Science B.V.
Information retrieval approaches
International journal of electrical and computer engineering systems, 2022
The area of information retrieval (IR) has taken on increasing importance in recent years. This field is now of interest to large communities in several application domains (security, medicine, aeronautics, etc.). IR studies how to find relevant information in semi-structured or unstructured data. As the information resources generated after the search can be extensive in quantity and different in quality, it is essential to rank these results according to the degree of relevance. This paper focuses on text information retrieval (TIR) and emphasizes the importance of each IR approach. This study presents insightful aspects of TIR and provides a comparative study between some proposed approaches and models. Each model offers IR advantages and suffers from several limitations.
Review of Information Retrieval models
International Journal of Research and Engineering, 2017
A large amount of information from all domains is available online as hypertext in web pages. People from different disciplines consult different websites to fetch information according to their needs. It is very difficult to remember the names of the websites for a specific domain that the user wants to search. A search engine is therefore a system that mines information from the World Wide Web and presents it to the user according to the query. An information retrieval system (IRs) working for a search engine arranges the web documents systematically and retrieves results according to the user query. In this paper we discuss the widely used information retrieval models, their evaluation parameters, and their applications.
Trends and issues in Modern Information Retrieval
This paper attempts to present an overview of modern information retrieval models. First, the classical IR models are discussed, and then their improved versions, commonly called alternative or modern information retrieval models, are briefly presented. Finally, the main issues and challenges in modern information retrieval techniques are summarized.
Improving the Effectiveness of Information Retrieval System
American Scientific Research Journal for Engineering, Technology, and Sciences, 2016
With the rapid growth of information and the ease of accessing it, in particular the boom of the World Wide Web, the problem of finding useful information and knowledge has become one of the most important topics in information and computer science. Information Retrieval (IR) systems, also called text retrieval systems, help users retrieve information that is relevant or close to their information needs. This research provides an effective IR system for retrieving not only relevant but also related documents. For retrieving relevant documents, a probabilistic model is applied. For retrieving related documents, a related index table is built that includes extracted keywords and lists of related documents. In constructing the related index table in the database, Shannon’s entropy difference between intrinsic and extrinsic mode is used to extract the highly significant keywords. Entropy threshold value was assigned to 0.5 of normalized entropy difference square ( ) according to the an...