A simple solution for improving the effectiveness of traditional information retrieval systems
Related papers
International Journal of Data Engineering (IJDE)
This paper discusses an information retrieval system for local databases. The approach is to search the web both semantically and syntactically. The proposal handles search queries from a user who wants focused results about a product with specific characteristics. The objective of the work is to find and retrieve accurate information from an available information warehouse that contains related data sharing common keywords. The system could eventually be used for accessing the internet as well. Accuracy in information retrieval, that is, achieving both high precision and high recall, is difficult. Semantic and syntactic search engines are therefore compared for information retrieval using two parameters: precision and recall.
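The two evaluation parameters named in this abstract can be illustrated with a minimal sketch. It assumes the standard set-based definitions (precision is the fraction of retrieved documents that are relevant, recall is the fraction of relevant documents that are retrieved). The function name and the document ids are hypothetical and do not come from the paper.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for a single query.

    retrieved: ids returned by the search engine
    relevant:  ids judged relevant for the query
    """
    retrieved_set = set(retrieved)
    relevant_set = set(relevant)
    hits = len(retrieved_set & relevant_set)
    # Guard against empty sets so the ratios are always defined.
    precision = hits / len(retrieved_set) if retrieved_set else 0.0
    recall = hits / len(relevant_set) if relevant_set else 0.0
    return precision, recall


# Hypothetical document ids, for illustration only.
retrieved = ["d1", "d2", "d3", "d4"]
relevant = ["d2", "d4", "d7"]
precision, recall = precision_recall(retrieved, relevant)
```

Here two of the four retrieved documents are relevant, and two of the three relevant documents were found. Returning more documents tends to raise recall and lower precision, which is why achieving both at once is difficult.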
A SURVEY ON VARIOUS ARCHITECTURES, MODELS AND METHODOLOGIES FOR INFORMATION RETRIEVAL
iaeme
The typical Information Retrieval (IR) model of the search process consists of three essentials: query, documents and search results. A user looking to fulfil an information need has to formulate a query, usually a small set of keywords summarizing that need. The goal of an IR system is to retrieve documents containing information that might be useful or relevant to the user. Throughout the search process there is a loss of focus: keyword queries entered by users often do not suitably summarize their complex information needs, and IR systems do not sufficiently interpret the contents of documents, leading to result lists containing irrelevant and redundant information. The short keyword query used as input to the retrieval system can be supplemented with topic categories from structured Web resources. The topic categories can be used as query context to retrieve documents that are not only relevant to the query but also belong to a relevant topic category. Category information is especially useful for the task of entity ranking, where the user is searching for a certain type of entity such as companies or persons. Category information can help to improve the search results by promoting pages that belong to relevant topic categories, or to categories similar to the relevant ones, in the ranking. Users may formulate various queries to describe the same information need. For example, to search for the National Board of Accreditation, queries such as “National Board of Accreditation (NBA)” or “NB Accreditation” may be formulated. Using individual queries directly to describe context cannot capture contexts concisely and accurately. Queries may also be ambiguous: “NBA” can be expanded as either “National Basketball Association” or “National Board of Accreditation”. Hence it becomes extremely important to use context-based queries that draw on the user's history and present requirements in that context.
In this paper, an extensive survey has been made of the different architectures, models and methodologies that various researchers have used in IR, along with a comparison of results against various performance metrics, also highlighting the need for context-based queries.
Fast LSI-based techniques for query expansion in text retrieval systems
2009
It is widely known that spectral techniques are very effective for document retrieval. Recently, researchers have spent a lot of effort on providing a formal mathematical explanation for this effectiveness [3]. Latent Semantic Indexing (LSI), in particular, is a text retrieval algorithm based on the spectral analysis of the occurrences of terms in text documents. Despite its value in improving the quality of a text search, LSI has the drawback of a high response time, which makes it unsuitable for on-line search in large collections of documents (e.g., web search engines). In this paper we present two approaches that aim to combine the effectiveness of latent semantic analysis with the efficiency of text-matching retrieval, through the technique of query expansion. We show that both approaches have relatively small computational cost, and we provide experimental evidence of their ability to improve document retrieval.
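For orientation, the spectral analysis behind LSI is usually written as a truncated singular value decomposition. The block below states that standard textbook formulation. It is not taken from this paper, and the symbols are assumptions: A is the m x n term-document matrix, k is the number of retained singular values, q is a query vector over the m terms, and \hat{d}_j is the j-th row of V_k written as a column vector.

```latex
A \approx A_k = U_k \Sigma_k V_k^{\top},
\qquad
\hat{q} = \Sigma_k^{-1} U_k^{\top} q,
\qquad
\operatorname{sim}(\hat{q}, \hat{d}_j)
  = \frac{\hat{q}^{\top} \hat{d}_j}{\lVert \hat{q} \rVert \, \lVert \hat{d}_j \rVert}
```

Projecting every query into the k-dimensional space and comparing it with all n document vectors is one plausible source of the response-time cost the abstract mentions, in contrast to plain text matching over an inverted index.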
A survey in traditional information retrieval models
2008 2nd IEEE International Conference on Digital Ecosystems and Technologies, 2008
As a matter of fact, many so-called semantic search algorithms are derived from the traditional index-term-based search models. In this paper, we survey the traditional information retrieval models by categorizing them into three main classes and eleven subclasses, and analyse their benefits and issues.
A New Approach in Query Expansion Methods for Improving Information Retrieval
2021
This research develops a new approach to query expansion by integrating Association Rules (AR) and Ontology. The proposed approach expands the query in several steps: (1) document retrieval; (2) query expansion using AR; (3) query expansion using Ontology. In the initial step, the system retrieves the top documents for the user's initial query. Next comes preprocessing (stopword removal, POS tagging, TF-IDF). A Frequent Itemset (FI) search is then run with FP-Growth on the list of terms generated by the previous step, and association rules are derived from the resulting frequent itemsets. The output of the AR step is expanded using Ontology, and the results of that expansion are used as new queries. The dataset is a collection of learning documents. Ten queries are used for testing, and the results are measured with three metrics: recall, precision, and F-measure. Based on testing and analysis results, in...
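The association-rule step described above can be sketched as follows. This is a simplified illustration, not the paper's implementation: brute-force counting of term pairs stands in for FP-Growth, only rules with a single antecedent and a single consequent are mined, and the ontology step is left out. The function names, thresholds and sample documents are all hypothetical.

```python
from collections import Counter
from itertools import combinations


def mine_pair_rules(docs, min_support=0.5, min_confidence=0.6):
    """Mine rules lhs -> rhs from term co-occurrence in the given documents.

    Brute-force pair counting replaces FP-Growth here (an assumption made
    for brevity); it only covers itemsets of size one and two.
    """
    n_docs = len(docs)
    term_count = Counter()
    pair_count = Counter()
    for doc in docs:
        terms = sorted(set(doc))  # each term counts once per document
        term_count.update(terms)
        pair_count.update(combinations(terms, 2))

    rules = {}
    for (a, b), count in pair_count.items():
        if count / n_docs < min_support:
            continue  # the pair is not a frequent itemset
        for lhs, rhs in ((a, b), (b, a)):
            confidence = count / term_count[lhs]
            if confidence >= min_confidence:
                rules.setdefault(lhs, {})[rhs] = confidence
    return rules


def expand_query(query_terms, rules):
    """Append rule consequents to the query, without duplicates."""
    expanded = list(query_terms)
    for term in query_terms:
        for candidate in sorted(rules.get(term, {})):
            if candidate not in expanded:
                expanded.append(candidate)
    return expanded


# Hypothetical, already preprocessed top-ranked documents (lists of terms).
top_docs = [
    ["neural", "network", "training"],
    ["neural", "network", "layer"],
    ["neural", "network", "training", "data"],
    ["database", "index"],
]
rules = mine_pair_rules(top_docs)
expanded = expand_query(["neural"], rules)
```

With these sample documents the query "neural" gains the terms "network" and "training", while rare terms such as "layer" fall below the support threshold. In the approach described by the abstract, the expanded terms would then be passed to the ontology step.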
Improving retrieval experience exploiting semantic representation of documents
… Web Applications and …, 2008
The traditional strategy performed by Information Retrieval (IR) systems is ranked keyword search: for a given query, a list of documents, ordered by relevance, is returned. Relevance computation is primarily driven by a basic string-matching operation. To date, several attempts have been made to deviate from the traditional keyword search paradigm, often by introducing techniques to capture word meanings in documents and queries. The general feeling is that dealing explicitly with only semantic information does not significantly improve the performance of text retrieval systems. This paper presents SENSE (SEmantic N-levels Search Engine), an IR system that tries to overcome the limitations of the ranked keyword approach by introducing semantic levels which integrate (and not simply replace) the lexical level represented by keywords. Semantic levels provide information about word meanings, as described in a reference dictionary, and named entities. We show how SENSE is able to manage documents indexed at three separate levels (keywords, word meanings, and entities), as well as to combine keyword search with semantic information provided by the two other indexing levels.
Performance Comparison between Keyword-based and WQCA-based Information Retrieval System
2018
Today, semantic logics are very important in query understanding for building successful web search engines. A user might not formalize the query when seeking information, even though he knows what he wants. As a result, understanding the nature of the information need behind a query is an important research problem. This system therefore proposes the Web Query Classification Algorithm (WQCA) for an efficient Information Retrieval (IR) system. In the WQCA process, the system first classifies the web queries by characteristic (taxonomies). Then it extracts the domain terms from the query. Using a NoSQL graph database, the system classifies each domain term into its relevant categories according to the WQCA algorithm. In the WQCA-based IR process, the system uses the classified query to find the relevant documents from the document collection. Finally, the system compares the performance of keyword-based IR and WQCA-based IR to show the effectiveness of the ...
Improving query precision using semantic expansion
Information Processing and Management, 2007
Query Expansion (QE) is one of the most important mechanisms in the information retrieval field. A typical short Internet query will go through a process of refinement to improve its retrieval power. Most of the existing QE techniques suffer from retrieval performance degradation due to an imprecise choice of additional query terms in the QE process. In this paper, we introduce a novel automated QE mechanism. The new expansion process is guided by the semantic relations between the original query and the expanding words, in the context of the utilized corpus. Experimental results of our “controlled” query expansion, using the Arabic TREC-10 data, show a significant enhancement of recall and precision over existing mechanisms in the field.
Information retrieval: an overview of system characteristics
International Journal of Medical Informatics, 1997
The paper gives an overview of characteristics of information retrieval (IR) systems. The characteristics are identified from the descriptions of 23 IR systems. Four IR models are discussed: the Boolean model, the vector model, the probabilistic model and the connectionistic model. Twelve other characteristics of IR models are identified: search intermediary, domain knowledge, relevance feedback, natural language interface, graphical query language, conceptual queries, full-text IR, field searching, fuzzy queries, hypertext integration, machine learning, and ranked output. Finally, the relevance of IR systems for the World Wide Web is established. © 1997 Elsevier Science B.V.
Sabio: Soft agent for extended information retrieval
Applied Artificial Intelligence, 2013
In the current study, an integrated system called SABIO is presented. The system applies Information Retrieval (IR) techniques developed for collections of textual documents to nontextual corpora. Its search algorithm improves IR efficiency and decreases the computational burden by using a fuzzy logic-based procedure for IR. This procedure is integrated in a flexible and fault-tolerant, human-reasoning-based search algorithm. The Accumulated Knowledge Set (AKS) of the system is sorted in a hierarchic, multilevel, tree-structure-like ontology. The objects in the AKS are represented using a novel human-reasoning-based method. This representation takes into account the occurrence of related terms. The system uses a novel fuzzy logic-based term-weighting (TW) method, which improves on the classical term frequency-inverse document frequency (TF-IDF) method generally used for TW. The system is the core of a wizard for searching the website of the University of Seville, www.us.es, which is currently in testing.
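As a point of reference, the classical TF-IDF baseline that the fuzzy term-weighting method is said to improve on can be sketched as below. This assumes one common textbook variant (raw term frequency multiplied by log(N / df)) and does not reproduce SABIO's fuzzy logic-based method. The function name and sample documents are hypothetical.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Return one {term: weight} dict per document.

    Uses raw term frequency and idf = log(N / df); other variants smooth
    or normalise these quantities differently.
    """
    n_docs = len(docs)
    doc_freq = Counter()
    for doc in docs:
        doc_freq.update(set(doc))  # count each term once per document
    weights = []
    for doc in docs:
        tf = Counter(doc)
        weights.append(
            {term: tf[term] * math.log(n_docs / doc_freq[term]) for term in tf}
        )
    return weights


# Hypothetical tokenised documents, for illustration only.
docs = [
    ["fuzzy", "logic", "search"],
    ["fuzzy", "search", "search"],
    ["library", "search"],
]
weights = tf_idf(docs)
```

A term that appears in every document, such as "search" here, receives a weight of zero no matter how often it occurs, while a term unique to one document receives the highest idf.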