AN INVESTIGATIVE SCHEME FOR KEYWORD SEARCH USING INVERTED KEY TACTIC
Related papers
Data Organization and Keyword Search: A Detailed Review
2018
Data is a part of every document; in other words, data is the building block of a document. This paper reviews how data is characterized and the concepts behind the various search techniques used for searching within a document.
An approach for document retrieval using cluster-based inverted indexing
Journal of Information Science, 2021
Document retrieval plays an important role in knowledge management, as it helps us discover relevant information in existing data. This article proposes a cluster-based inverted indexing algorithm for document retrieval. First, pre-processing removes unnecessary and redundant words from the documents. Then, the documents are indexed by the cluster-based inverted indexing algorithm, which is developed by integrating the piecewise fuzzy C-means (piFCM) clustering algorithm with inverted indexing. Once the documents are indexed, user queries are matched using the Bhattacharyya distance. Finally, query optimisation is done with the Pearson correlation coefficient, and the relevant documents are retrieved. The performance of the proposed algorithm is analysed on the WebKB data set and the Twenty Newsgroups data set. The analysis shows that the proposed algorithm offers high performance with a precision of ...
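As a rough illustration of the query-matching step, the Bhattacharyya distance between two discrete distributions can be computed as below. This is only a sketch: the abstract does not say how queries and documents are represented, so the normalized term-frequency vectors, the sample data, and the function name `bhattacharyya_distance` are assumptions, not the authors' implementation.

```python
import math

def bhattacharyya_distance(p, q):
    """Bhattacharyya distance between two non-negative weight vectors.

    Both vectors are normalized to sum to 1 first (an assumption made here).
    """
    if len(p) != len(q):
        raise ValueError("vectors must have the same length")
    total_p, total_q = sum(p), sum(q)
    if total_p <= 0 or total_q <= 0:
        raise ValueError("vectors must have positive total weight")
    # Bhattacharyya coefficient: sum over i of sqrt(p_i * q_i).
    coefficient = sum(
        math.sqrt((a / total_p) * (b / total_q)) for a, b in zip(p, q)
    )
    coefficient = min(coefficient, 1.0)  # guard against rounding above 1
    return math.inf if coefficient == 0 else -math.log(coefficient)

# Hypothetical term-frequency vectors over a shared 4-term vocabulary.
query = [1, 1, 0, 0]
doc_a = [2, 2, 0, 0]  # same distribution as the query
doc_b = [0, 0, 3, 1]  # no terms in common with the query
```

A smaller distance means the two distributions overlap more; vectors with no shared terms are infinitely far apart under this measure.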
Enhance Inverted Index Using in Information Retrieval
This paper proposes a method for the first step in information retrieval (IR): preparing the document set (preprocessing). In information retrieval systems, tokenization is an integral part whose prime objective is to identify the tokens and their counts. The paper presents a tokenization approach based on a proposed method called enhance inverted index (EII), and the results show the efficiency and effectiveness of the proposed algorithm. Tokenization of documents helps satisfy the user's information need more precisely and sharply reduces the search space. Preprocessing of the input document is an integral part of tokenization and generates the document's tokens; on the basis of these tokens, probabilistic IR generates its scoring and gives a reduced search space. The comparative analysis is based on two parameters: the time to reduce the search space and the preprocessing time.
Keywords: information retrieval (IR), enhance inverted index (EII).
INTRODUCTION
The amount of operational data has been increasing exponentially over the past few decades, and the expectations of data users are changing proportionally. Data users expect deeper, more exact, and more detailed results. Retrieval of relevant results is always affected by how the documents are stored and indexed. Various techniques have been designed to index documents based on the tokens identified within them, including newer techniques that use an inverted index [1]. Information retrieval (IR) handles the representation, storage, and organization of, and access to, information items. In IR, one of the main problems is to determine which documents are relevant to the user's needs and which are not. In practice, this problem is usually treated as a ranking problem, which aims to order documents according to the degree of relevance (matching) between each document and the user's query [1] [2] [3].
The general structure of information retrieval is shown in Figure (1).
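The tokenization and indexing step described in this abstract can be pictured with a plain inverted index. The sketch below is a generic textbook construction, not the EII method itself (whose details the abstract does not give); the regular-expression tokenizer and the names `tokenize` and `build_inverted_index` are assumptions chosen for illustration.

```python
import re
from collections import defaultdict

def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())

def build_inverted_index(docs):
    """Map each token to {doc_id: number of occurrences in that document}."""
    index = defaultdict(dict)
    for doc_id, text in docs.items():
        for token in tokenize(text):
            index[token][doc_id] = index[token].get(doc_id, 0) + 1
    return index

# Hypothetical two-document collection.
docs = {
    1: "Keyword search over documents",
    2: "Inverted index for keyword search, search again",
}
index = build_inverted_index(docs)
```

Looking up `index["search"]` then returns only the documents that contain the token, together with its count in each, which is what lets a query skip documents that cannot match.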
IEEE Data(base) Engineering Bulletin, 2001
Querying using keywords is easily the most widely used form of querying today. While keyword searching is widely used to search documents on the Web, querying of databases currently relies on complex query languages that are inappropriate for casual end-users, since they are complex and hard to learn. Given the popularity of keyword search, and the increasing
KSHI: Keyword Search using SQL in Database using Hybrid Indexing to Improve Efficiency
search for the information he is looking for. The input is information related to what the user requires. This gave good results initially, and more researchers took up keyword searching of databases. Scoring and filtering are used to find the most relevant records. Another approach, discussed in "Efficient relational keyword search system" [6], makes use of ranking. Ranking is maintained using user feedback, which helps less for keywords that are not searched frequently. Keyword search results are also affected by functional dependencies and foreign key constraints. This is discussed in "Referenced attribute Functional Dependency Database for visualizing web relational tables". The authors provide a method to eliminate duplicates.
IJERT-An Empirical Study of Effective and Versatile Keyword Query Search
International Journal of Engineering Research and Technology (IJERT), 2015
https://www.ijert.org/an-empirical-study-of-effective-and-versatile-keyword-query-search https://www.ijert.org/research/an-empirical-study-of-effective-and-versatile-keyword-query-search-IJERTV4IS050573.pdf In today's world, a huge amount of information is maintained and stored on the World Wide Web. In day-to-day life, every person needs some information that can be extracted from the web through various search engines. It is the simplest way to gain knowledge of an unfamiliar field that the user is eager to know about. The user expects to get relevant information in response to a query, and that information should be valid as well as relevant. To fulfil user requirements, a number of techniques are used to provide the best and expected results. Here, information searching is done with the help of a keyword, known as the query keyword, and the searching is called keyword searching. A large amount of research on keyword searching, retrieval, and query processing has been done for relational databases. The work in this field is scattered and diverse and needs to be collected and organized so that it can help further research. In this paper, a survey of work on keyword querying in databases is presented. In this context, the paper gives a brief description of various techniques for keyword searching, retrieval of relevant keywords, and query processing, along with their limitations.
Documents Retrieval Using the Combination of Two Keywords
In search engines, NLP (Natural Language Processing) and statistically based systems are used for processing the query. The statistical system recognizes the terms for searching and also provides the stems and the singular and plural forms of words. The statistically based system may also provide a weight for every term. The Natural Language Processing system tags parts of speech and identifies objects, verbs, subjects, agents, synonyms, and alternate forms of proper nouns. It can then create an unambiguous representation of the submitted query, and the term weights are computed. For a particular query, the search engine retrieves a list of documents from the database, using the keywords to obtain results for the submitted query. Stemming algorithms and stop-lists (stop-words) are used to reduce disk space consumption. 'the', 'is', and 'an' are examples of stop-words, and 'reading', 'playing', and 'watches' are examples of words that a stemming algorithm reduces to their stems. In information retrieval systems, the vector space model and the Boolean model are used for document ranking. Search engine optimization starts with submitting keywords to the search engine that are clear and understandable for query processing, and with knowing which keywords are more relevant and will perform well. So, in this paper, a new technique that combines 'two keywords' is proposed for retrieving documents from the database, and it rearranges the list of documents in order of weight.
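The stop-word, stemming, and two-keyword ideas in this abstract can be sketched as follows. The abstract does not define the weighting, so this example assumes a document's weight is the summed frequency of the two stemmed keywords and that both must occur. The suffix-stripping `naive_stem` is a deliberately crude stand-in for a real stemming algorithm, and every name and the sample data here are hypothetical.

```python
import re
from collections import Counter

# Small illustrative stop-list; real systems use much longer lists.
STOP_WORDS = {"the", "is", "an", "a", "of", "and"}

def naive_stem(word):
    """Strip a few common suffixes (a toy stand-in for a real stemmer)."""
    for suffix in ("ing", "es", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word

def terms(text):
    """Tokenize, drop stop-words, and stem what remains."""
    words = re.findall(r"[a-z]+", text.lower())
    return [naive_stem(w) for w in words if w not in STOP_WORDS]

def rank_by_two_keywords(docs, kw1, kw2):
    """Rank documents containing both keywords by combined frequency."""
    k1, k2 = naive_stem(kw1.lower()), naive_stem(kw2.lower())
    scored = []
    for doc_id, text in docs.items():
        counts = Counter(terms(text))
        if counts[k1] and counts[k2]:
            scored.append((doc_id, counts[k1] + counts[k2]))
    # Highest weight first; ties broken by document id.
    return sorted(scored, key=lambda item: (-item[1], item[0]))

docs = {
    "d1": "The cat is reading and the dog is playing",
    "d2": "Reading reading playing",
    "d3": "The dog watches",
}
ranking = rank_by_two_keywords(docs, "reading", "playing")
```

Dropping stop-words and collapsing word forms shrinks the set of distinct terms that has to be stored, which is the disk-space saving the abstract refers to.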
2010
Querying using keywords is easily the most widely used form of querying today. While keyword searching is widely used to search documents on the Web, querying of databases currently relies on complex query languages that are inappropriate for casual end-users, since they are complex and hard to learn. Given the popularity of keyword search, and the increasing use of databases as the back end for data published on the Web, the need for querying databases using keywords is being increasingly felt. One key problem in applying document or web keyword search techniques to databases is that information related to a single answer to a keyword query may be split across multiple tuples in different relations. In this paper, we first present a survey of work on keyword querying in databases. We then report on the BANKS system which we have developed. BANKS integrates keyword querying and interactive browsing of databases. By their very nature, keyword queries are imprecise, and we need a model for answering keyword queries. BANKS, like an earlier system called DataSpot, models a database as a graph. In the BANKS model, tuples correspond to nodes, and foreign key and other links between tuples correspond to edges. Answers to a query are modeled as rooted trees connecting tuples that match individual keywords in the query. Answers are ranked using a notion of proximity coupled with a notion of prestige of nodes based on inlinks, the latter being inspired by techniques developed for Web search. We illustrate the power of the model and our prototype through examples.
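The graph model in this abstract (tuples as nodes, foreign-key links as edges, answers as rooted trees connecting keyword matches) can be illustrated with a much simplified search. The sketch below treats edges as undirected and unweighted, scores a candidate root only by its total hop distance to the nearest match for each keyword, and ignores the prestige component entirely. It is an assumption-laden toy, not the BANKS algorithm, and all node names and function names are hypothetical.

```python
from collections import deque

def build_graph(edges):
    """Build an undirected adjacency map from (tuple_a, tuple_b) links."""
    graph = {}
    for a, b in edges:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    return graph

def bfs_distances(graph, sources):
    """Hop distance from every reachable node to its nearest source node."""
    dist = {s: 0 for s in sources}
    queue = deque(sorted(sources))
    while queue:
        node = queue.popleft()
        for neighbour in graph.get(node, ()):
            if neighbour not in dist:
                dist[neighbour] = dist[node] + 1
                queue.append(neighbour)
    return dist

def rank_roots(graph, keyword_matches):
    """Rank candidate answer roots by total distance to all keywords."""
    per_keyword = [bfs_distances(graph, nodes) for nodes in keyword_matches.values()]
    ranked = []
    for node in graph:
        # A root must be connected to a match for every keyword.
        if all(node in dist for dist in per_keyword):
            ranked.append((sum(dist[node] for dist in per_keyword), node))
    return sorted(ranked)

# Hypothetical schema: author tuples (a*) link to a paper tuple (p1)
# through "writes" tuples (w*), as foreign keys would.
edges = [("a1", "w1"), ("w1", "p1"), ("a2", "w2"),
         ("w2", "p1"), ("a3", "w3"), ("w3", "p1")]
graph = build_graph(edges)
keyword_matches = {"kw1": {"a1"}, "kw2": {"a2"}, "kw3": {"a3"}}
ranked = rank_roots(graph, keyword_matches)
```

Here the paper tuple `p1` comes out as the best root because it sits closest to all three matching author tuples, mirroring the idea that an answer tree joins information split across relations.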
A Proposed Method for Documents Indexing
In this paper, a new method is proposed for document indexing based on constructing two tables, namely a words-information table and a pages-information table. These two tables are used in the first step of information retrieval, which prepares the document set (preprocessing). In information retrieval systems, tokenization is an integral part whose prime objective is to identify the tokens and their counts. This paper proposes a tokenization approach based on the new document indexing method, and the results show the efficiency of the proposed algorithm. Tokenization of documents helps satisfy the user's information need more precisely and sharply reduces the search space. Preprocessing of the input document is an integral part of tokenization and generates the document's tokens; on the basis of these tokens, probabilistic IR generates its scoring and gives a reduced search space. The comparative analysis is based on the following parameters: the time to reduce the search space, the preprocessing time, and the memory size.
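One way to picture the two tables named in this abstract is shown below. The abstract does not list the columns of either table, so the fields chosen here (per-word totals and per-page counts, per-page token and distinct-word counts) and the function name `build_tables` are assumptions made purely for illustration.

```python
import re
from collections import Counter

def build_tables(pages):
    """Return (words_table, pages_table) for a dict of page_id -> text.

    Column choices are assumptions; the abstract does not specify them.
    """
    words_table = {}  # word -> {"total": int, "pages": {page_id: count}}
    pages_table = {}  # page_id -> {"tokens": int, "distinct": int}
    for page_id, text in pages.items():
        counts = Counter(re.findall(r"[a-z0-9]+", text.lower()))
        pages_table[page_id] = {
            "tokens": sum(counts.values()),
            "distinct": len(counts),
        }
        for word, count in counts.items():
            entry = words_table.setdefault(word, {"total": 0, "pages": {}})
            entry["total"] += count
            entry["pages"][page_id] = count
    return words_table, pages_table

# Hypothetical two-page collection.
pages = {"p1": "index the index", "p2": "search the index"}
words_table, pages_table = build_tables(pages)
```

Keeping page-level statistics in their own table means length-dependent scoring can read them directly instead of rescanning each page.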
A Virtual Document Approach for Keyword Search in Databases
Proceedings of the International Conference on Data Technologies and Applications, 2012
It is clear that in recent years the amount of information available in a variety of data sources, like those found on the Web, has grown at an accelerating rate. This information can be classified by its structure into three forms: unstructured (free text documents), semi-structured (XML documents), and structured (a relational database or XML database). A search technique that has gained wide acceptance for use in massive data sources, such as the Web, is keyword-based search, which is simple for people who are familiar with Web search engines. Keyword search has become an alternative for users without any knowledge of the formal query languages and schemas used in structured data. There are some traditional approaches to keyword search over relational databases, such as Steiner Trees, Candidate Networks, and, more recently, Tuple Units. Nevertheless, these methods have some limitations. In this paper we propose a Virtual Document (VD) approach for keyword search in databases. We represent the structured information as graphs and propose the use of an index that captures the structural relationships of the information. This approach produces fast and accurate results in search responses. We have conducted extensive experiments on large-scale real databases, and the results demonstrate that our approach achieves high search efficiency and high accuracy for keyword search in databases.