A Wikipedia powered state-based approach to automatic search query enhancement

A Comparison of Automatic Search Query Enhancement Algorithms That Utilise Wikipedia as a Source of A Priori Knowledge

Proceedings of the 9th Annual Meeting of the Forum for Information Retrieval Evaluation, 2017

This paper describes the benchmarking and analysis of five Automatic Search Query Enhancement (ASQE) algorithms that utilise Wikipedia as the sole source for a priori knowledge. The contributions of this paper include: 1) a comprehensive review of current ASQE algorithms that utilise Wikipedia as the sole source for a priori knowledge; 2) benchmarking of five existing ASQE algorithms using the TREC-9 Web Topics on the ClueWeb12 data set; and 3) analysis of the results from the benchmarking process to identify the strengths and weaknesses of each algorithm. During the benchmarking process, 2,500 relevance assessments were performed. Results of these tests are analysed using the Average Precision @10 per query and Mean Average Precision @10 per algorithm. From this analysis we show that the scope of a priori knowledge utilised during enhancement and the term weighting methods available from Wikipedia can further aid the ASQE process. Although the approaches taken by the algorithms are still relevant, an over-dependence on the weighting schemes and data sources used can easily impact the results of an ASQE algorithm.
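
The two measures named in the abstract above can be sketched as follows. This is a minimal illustration, not the paper's evaluation code: the function names are hypothetical, relevance judgements are assumed to be binary, and Average Precision @k is normalised by the number of relevant documents found in the top k, which is only one of several conventions in use.

```python
from typing import Sequence


def average_precision_at_k(relevance: Sequence[int], k: int = 10) -> float:
    """Average Precision @k for one query.

    `relevance` holds binary judgements (1 = relevant) in ranked order.
    Assumption: normalise by the relevant documents seen in the top k.
    """
    hits = 0
    precision_sum = 0.0
    for rank, rel in enumerate(relevance[:k], start=1):
        if rel:
            hits += 1
            # Precision at this rank, counted only at relevant positions.
            precision_sum += hits / rank
    return precision_sum / hits if hits else 0.0


def mean_average_precision_at_k(runs: Sequence[Sequence[int]], k: int = 10) -> float:
    """Mean of the per-query AP@k values for one algorithm."""
    if not runs:
        return 0.0
    return sum(average_precision_at_k(r, k) for r in runs) / len(runs)
```

Each inner list passed to `mean_average_precision_at_k` would correspond to one query's ranked results for a single algorithm, so one call yields that algorithm's overall score.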

A qualitative analysis of the Wikipedia N-Substate Algorithm's Enhancement Terms

Journal of Computer-Assisted Linguistic Research, 2019

Automatic Search Query Enhancement (ASQE) is the process of modifying a user-submitted search query and identifying terms that can be added or removed to enhance the relevance of documents retrieved from a search engine. ASQE differs from other enhancement approaches as no human interaction is required. ASQE algorithms typically rely on a source of a priori knowledge to aid the process of identifying relevant enhancement terms. This paper describes the results of a qualitative analysis of the enhancement terms generated by the Wikipedia N-Substate Algorithm (WNSSA) for ASQE. The WNSSA utilises Wikipedia as the sole source of a priori knowledge during the query enhancement process. As each Wikipedia article typically represents a single topic, during the enhancement process of the WNSSA, a mapping is performed between the user’s original search query and Wikipedia articles relevant to the query. If this mapping is performed correctly, a collection of potentially relevant terms and acr...

Improved concept-based query expansion using Wikipedia

Query formulation has always been a challenge for users. In this paper, we propose a novel interactive query expansion methodology that identifies and presents the potential directions (generalised concepts) for a given query, enabling the user to explore the topic of interest further. The proposed methodology is the concept-based direction (CoD) finder, which relies on an external knowledge repository for finding the directions. Wikipedia, the most important non-profit crowdsourcing project, is used as the external knowledge repository for the CoD finder methodology. CoD finder identifies the concepts for the given query and derives the generalised direction for each of the concepts, based on the content of the Wikipedia article and the categories it belongs to. The CoD finder methodology has been evaluated in the crowdsourcing marketplace Amazon's Mechanical Turk to measure the quality of the identified potential directions. The evaluation results show that the potential directions identified by the CoD finder methodology produce better precision and recall for the given queries.

Improving Retrieval Performance Based on Query Expansion with Wikipedia and Text Mining Technique

International Journal of Intelligent Engineering and Systems

A textual query is a simple means of communicating with a retrieval system. However, there is a risk of providing an incomplete query, which hinders the system from satisfying the user's information needs. Query expansion addresses this problem by reformulating queries, and it relies mainly on an accurate choice of the terms added to an initial query. It can yield a large number of irrelevant terms, which in turn negatively influences the quality of retained documents. In this paper, we propose a query expansion approach. It consists of reformulating queries with semantically related terms extracted from a semantic graph, called the query graph, derived from Wikipedia. Furthermore, we propose a similarity measure which computes the similarity between a candidate term and the initial query using the query graph, the Explicit Semantic Analysis (ESA) measure, and a text mining technique. The experiments on a Text Retrieval Conference (TREC) collection show that the proposed approach performs significantly better than the baseline system and some existing techniques.

Wiki-MetaSemantik: A Wikipedia-derived query expansion approach based on network properties

2017 5th International Conference on Cyber and IT Service Management (CITSM), 2017

This paper discusses the use of Wikipedia for building semantic ontologies to do Query Expansion (QE) in order to improve the search results of search engines. In this technique, selecting related Wikipedia concepts becomes important. We propose the use of network properties (degree, closeness, and PageRank) to build an ontology graph of the user query concept, which is derived directly from Wikipedia structures. The resulting expansion system is called Wiki-MetaSemantik. We tested this system against other online thesauruses and ontology-based QE in both individual and meta-search engine setups. Although our system has to build a Wikipedia ontology graph in order to do its work, the technique turns out to work very fast (1:281) compared to another ontology QE baseline (Persian Wikipedia ontology QE). It thus has the potential to be utilized online. Furthermore, it shows significant improvement in accuracy. Wiki-MetaSemantik shows better performance in a meta-search engine (MSE) setup than in an individual search engine setup.
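
The three network properties named in the abstract above (degree, closeness and PageRank) can be illustrated on a small concept graph. The sketch below is not the Wiki-MetaSemantik implementation: the graph, its article names and all function names are hypothetical, the graph is treated as undirected, and PageRank is approximated by a fixed number of power-iteration steps with an assumed damping factor of 0.85.

```python
from collections import deque

# Hypothetical undirected concept graph: article -> linked articles.
toy_graph = {
    "Information retrieval": ["Query expansion", "Search engine", "Relevance feedback"],
    "Query expansion": ["Information retrieval", "Search engine"],
    "Search engine": ["Information retrieval", "Query expansion"],
    "Relevance feedback": ["Information retrieval"],
}


def degree(graph):
    """Number of neighbours of each node."""
    return {node: len(nbrs) for node, nbrs in graph.items()}


def closeness(graph):
    """(reachable nodes) / (sum of shortest-path distances), via BFS."""
    scores = {}
    for source in graph:
        dist = {source: 0}
        queue = deque([source])
        while queue:
            node = queue.popleft()
            for nbr in graph[node]:
                if nbr not in dist:
                    dist[nbr] = dist[node] + 1
                    queue.append(nbr)
        total = sum(dist.values())
        scores[source] = (len(dist) - 1) / total if total else 0.0
    return scores


def pagerank(graph, damping=0.85, iterations=50):
    """Power-iteration PageRank; dangling nodes spread rank evenly."""
    n = len(graph)
    rank = {node: 1.0 / n for node in graph}
    for _ in range(iterations):
        new_rank = {node: (1.0 - damping) / n for node in graph}
        for node, nbrs in graph.items():
            targets = nbrs if nbrs else list(graph)
            share = damping * rank[node] / len(targets)
            for nbr in targets:
                new_rank[nbr] += share
        rank = new_rank
    return rank
```

On this toy graph all three measures rank "Information retrieval" highest, which is the kind of agreement a concept-selection step could use to pick central concepts for expansion.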

Ad-hoc Information Retrieval focused on Wikipedia based Query Expansion and Entropy Based Ranking

This paper presents the experiments carried out at Jadavpur University as part of the participation in the Forum for Information Retrieval Evaluation (FIRE) 2012 in the ad-hoc monolingual information retrieval task for the Bengali, Hindi and English languages. The experiments carried out by us for FIRE 2012 are based on query expansion and entropy-based ranking. The document collections for Bengali, Hindi and English contained 4,57,370, 3,31,599 and 3,92,577 documents respectively. Each query was specified using the title, narration and description format. 100 queries were used for training the system, while the system was tested with 50 queries in Bengali.

Web Query Expansion and Refinement using Query-Level Clustering

The objectives of this paper are to open a new dimension in Internet searching and to bring core semantic strategies to the forefront in order to add value to the search process. Put precisely, "the search must be what the user wishes, not what the user types". To understand the intricacy of the search process, we observed that the vocabulary contradiction and mismatch problems that arise during retrieval can account for irrelevant document matching. Generally, a term or vocabulary mismatch occurs in a search iteration only if the terms are not present in the fetched documents. Many techniques have been proposed, such as library science, pseudo relevance feedback and later semantic indexing, but while all of these algorithms tend to meet their objectives, they do not deal with an alternate process. Hence we propose a technique which sets out the implications of all these pitfalls and devises a new mechanism to address the mismatch problem. By bringing the semantic aspects and the word order of sentences to the core, we arrive at a proper solution to the sentence or term mismatch problem.

1. Introduction

It is observed that searches on search engines are conducted for learning, for entertainment or to carry out business transactions. Many searches, however, have a real purpose and influence important decisions about life, health or major purchases, or serve the business community's quest for an acquisition target. Although search engines have achieved remarkable success in recent years and reached new heights in bringing quality results to users, they are still poor at helping people find exactly what they want and need, especially when users do not have a clear idea of what they are actually looking for.
Both conventional and modern search engines simply attempt to find the best match between what users ask for and what is available in their indices. Search engines have not done a good job of assessing exactly what the user wants because they lack knowledge of the context that led the user to generate a poor search query. Besides, the ambiguities of language make it more difficult to understand the exact intent or absolute meaning of the user's query. Searching is an iterative process in which users reach the intended web pages by trial and error, using whichever query methods work best for the issue to be resolved. It might surprise most people to know that search engines index only a small percentage of the knowledge resources available. This occurs because many web pages are stored behind password-protected sites, pages are dynamically created and disappear once they have served their purpose, and several types of information are in formats that are not usable by search engines. Users search the web for information that meets their needs, and their queries are mostly explicit expressions of those needs. The information need in the web search process can be termed intent, and it demands more productive fetching of web pages. Often the user query contains only a few terms and is not adequate to describe the intent the user actually had in mind. This problem exists because of a lack of domain knowledge or insufficient skill in expressing intent. In addition, the intent resides primarily in the mind of the user and is thus difficult to observe. Despite all these hiccups, even if the user is obliged to reveal the actual intent, describing it accurately is a challenging task. Hence, users can reformulate the initial query in light of the search results shown to them, and their understanding becomes more specific as they extract clues from their search activities.
Basically, web users' queries are categorised as navigational, informational and transactional. A navigational query is used to reach a specific web site or web page for which the user does not have a clear indication. Navigational queries can take the user to different web pages which are all relevant to one another. Informational queries are very specific and demand relevant information about a given topic. The users want to learn or find information which might be scattered across various web pages or sites. Transactional queries are interactive and carry out a transaction with a website, such as downloading music, shopping online or playing online games. In order to make the search process more productive, we need to extract the semantics from the questions which users often pose on the web. The questions can be categorized in many ways: some queries are only of the yes or no type, some queries seek the reasons for a particular thing (like why-type questions), a few queries ask for an opinion about particular things, some queries want to know the details of the particular

Semantifying queries over large-scale Web search engines

Journal of Internet Services and Applications, 2012

In many situations, searching the web is synonymous with information seeking. Currently, web search engines are the most popular vehicle via which people get access to the web. Their popularity is partially due to the intrinsic way that people interact with them, i.e., by typing some keywords into the corresponding input box. Despite their popularity, search engines often fail to satisfy certain information needs, especially when the latter are hazy and poorly articulated. In this paper, we focus on the occasions when large-scale web search engines find it difficult to cope with specific information-seeking behaviors and we accordingly introduce a query construction service that is targeted towards the solution of this problem. The proposed service leverages information coming from various DBpedia datasets and provides an intuitive GUI via which searchers determine the semantic orientation of their queries before these are addressed to the underlying search engine. The evaluation of the query construction service justifies the motive of this paper and indicates that it can considerably improve the searchers' querying ability when search engines fail to provide adequate help.

Interactive Query Expansion Using Concept-Based Directions Finder Based on Wikipedia

Despite the advances in information retrieval, search engines still return imprecise or poor results, mainly due to the quality of the query being submitted. Formulating a query that expresses their information need has always been challenging for users. In this paper, we propose an interactive query expansion methodology using the Concept-Based Directions Finder (CBDF). The approach uses Explicit Semantic Analysis (ESA) to determine, for a given query, the directions in which the user can continue the search. The CBDF identifies the relevant terms, with a corresponding label, for each of the directions found, based on the content and link structure of Wikipedia. The relevant terms identified, along with their label, are suggested to the user for query expansion through the proposed new visual interface. The visual interface, named the terms mapper, accepts the query and displays the potential directions and a group of relevant terms, along with the label, for the direction chosen by the user. We evaluated the results of the proposed approach and the visual interface for the identified queries. The experimental results show that the approach produces a good Mean Average Precision (MAP) for the queries chosen.

Expansion of Single-word Weak Queries Using Wikipedia as External Data Resource

2013

Query expansion is an effective technique to improve the performance of weak web search queries. External data sources such as the Wikipedia data dump can be relied upon to provide reliable related terms for expanding queries. In this work we propose a query expansion approach based on Wikipedia as an external knowledge repository and WordNet to detect linguistically important terms. Pseudo Relevance Feedback is used to retrieve top documents from Wikipedia with respect to a query. These top documents serve as a pool of potential expansion terms. Moreover, we focus on the travel and lifestyle domain. Hence, intuitively, the words absent from WordNet are broadly categorized as Named Entities and a certain boost was given to those terms. We also incorporated some linguistic information extracted from Wikipedia articles within the scoring of the terms. By defining proper term weighting strategies, the query expansion performs effectively. Consequently, we observed that the expansion lea...
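
The general idea described in this abstract (pool terms from the top retrieved documents, then boost terms that are absent from a lexicon) can be sketched as below. This is an illustrative assumption, not the authors' scoring scheme: the function name, the plain term-frequency score, the boost factor of 2.0 and the small `lexicon` set standing in for WordNet are all hypothetical, and the retrieval step that produces `top_docs` is taken as given.

```python
from collections import Counter


def expansion_terms(query, top_docs, lexicon, boost=2.0, n_terms=3):
    """Rank candidate expansion terms from pseudo-relevant documents.

    query:    original query string
    top_docs: texts of the top-ranked documents (assumed already retrieved)
    lexicon:  set of known words (hypothetical stand-in for WordNet);
              terms missing from it are treated as likely named entities
    """
    query_terms = set(query.lower().split())
    counts = Counter()
    for doc in top_docs:
        # Pool every non-query term from the pseudo-relevant documents.
        counts.update(t for t in doc.lower().split() if t not in query_terms)
    # Raw frequency, multiplied by `boost` for terms outside the lexicon.
    scores = {
        term: freq * (boost if term not in lexicon else 1.0)
        for term, freq in counts.items()
    }
    # Sort by descending score, breaking ties alphabetically.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [term for term, _ in ranked[:n_terms]]
```

For a single-word query such as "goa", a place name that occurs in the top documents but not in the lexicon would outrank an equally frequent common noun, which mirrors the named-entity boosting the abstract describes.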