A personalized search engine based on Web‐snippet hierarchical clustering

Grouper: a dynamic clustering interface to Web search results

Computer Networks, 1999

Users of Web search engines are often forced to sift through the long ordered list of document 'snippets' returned by the engines. The IR community has explored document clustering as an alternative method of organizing retrieval results, but clustering has yet to be deployed on most major search engines. The NorthernLight search engine organizes its output into 'custom folders' based on pre-computed document labels, but does not reveal how the folders are generated or how well they correspond to users' interests. In this paper, we introduce Grouper, an interface to the results of the HuskySearch meta-search engine, which dynamically groups the search results into clusters labeled by phrases extracted from the snippets. In addition, we report on the first empirical comparison of user Web search behavior on a standard ranked-list presentation versus a clustered presentation. By analyzing HuskySearch logs, we are able to demonstrate substantial differences in the number of documents followed, and in the amount of time and effort expended by users accessing search results through these two interfaces.

A Survey On Web Search Result Clustering And Engines

2013

Nowadays the World Wide Web is a very large, distributed digital information space. The ability to search and retrieve information from the Web efficiently and effectively is an enabling technology for realizing its full potential. Current search tools retrieve too many documents, of which only a small fraction are relevant to the user query. Web clustering engines organize search results by topic, thus offering a complementary view to the flat-ranked list returned by conventional search engines. This paper highlights the main characteristics of a number of existing Web clustering engines and also discusses how to evaluate their retrieval performance.

Cluster Generation and Labeling for Web Snippets: A Fast, Accurate Hierarchical Solution

Internet Mathematics, 2006

This paper describes Armil, a meta-search engine that groups the web snippets returned by auxiliary search engines into disjoint labeled clusters. The cluster labels generated by Armil provide the user with a compact guide to assessing the relevance of each cluster to his/her information need. Striking the right balance between running time and cluster well-formedness was a key point in the design of our system. Both the clustering and the labeling tasks are performed on the fly by processing only the snippets provided by the auxiliary search engines, and they use no external sources of knowledge. Clustering is performed by means of a fast version of the furthest-point-first algorithm for metric k-center clustering. Cluster labeling is achieved by combining intra-cluster and inter-cluster term extraction based on a variant of the information gain measure. We have tested the clustering effectiveness of Armil against Vivisimo, the de facto industrial standard in web snippet clustering, using as benchmark a comprehensive set of snippets obtained from the Open Directory Project hierarchy. According to two widely accepted "external" metrics of clustering quality, Armil achieves better performance levels by 10%. We also report the results of a thorough user evaluation of both the clustering and the cluster labeling algorithms. On a standard desktop PC (AMD Athlon with a 1 GHz clock and 750 Mbytes of RAM), Armil clusters and labels up to 200 snippets in less than one second.
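The furthest-point-first heuristic named in the Armil abstract can be sketched as follows. This is a minimal illustration of the classic greedy procedure for metric k-center clustering, not Armil's fast variant, whose details the abstract does not give. The function name `furthest_point_first`, the Euclidean distance, and the sample points are all assumptions made for the example.

```python
import math


def furthest_point_first(points, k, dist=math.dist):
    """Greedy k-center: each new center is the point furthest from the centers so far.

    Returns (centers, assignment), where centers holds indices into `points`
    and assignment[i] is the position in `centers` of the center closest
    to points[i].
    """
    if not 1 <= k <= len(points):
        raise ValueError("k must be between 1 and len(points)")

    centers = [0]  # assumption: start from the first point (any point works)
    # nearest[i] = distance from points[i] to its closest chosen center
    nearest = [dist(p, points[0]) for p in points]
    assignment = [0] * len(points)

    while len(centers) < k:
        # Pick the point that is furthest from every center chosen so far.
        new_center = max(range(len(points)), key=nearest.__getitem__)
        centers.append(new_center)
        # Reassign points that are closer to the new center.
        for i, p in enumerate(points):
            d = dist(p, points[new_center])
            if d < nearest[i]:
                nearest[i] = d
                assignment[i] = len(centers) - 1
    return centers, assignment


# Hypothetical 2-D points standing in for snippet feature vectors.
points = [(0.0, 0.0), (0.1, 0.2), (5.0, 5.0), (5.1, 4.9), (10.0, 0.0)]
centers, assignment = furthest_point_first(points, k=3)
```

Each round costs one pass over the points, so the sketch runs in O(nk) distance computations. A real snippet clusterer would replace the Euclidean distance with a metric suited to short texts.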

Personalized Web Search via Query Expansion based on User’s Local Hierarchically-Organized Files

2017

Users of Web search engines generally express information needs with short and ambiguous queries, leading to irrelevant results. Personalized search methods improve users' experience by automatically reformulating queries before sending them to the search engine or rearranging received results, according to their specific interests. A user profile is often built from previous queries, clicked results or in general from the user's browsing history; different topics must be distinguished in order to obtain an accurate profile. It is quite common for a set of user files, stored locally in sub-directories, to be organized by the user into a coherent taxonomy corresponding to the user's own topics of interest, but only a few methods leverage this potentially useful source of knowledge. We propose a novel method where a user profile is built from those files, specifically considering their consistent arrangement in directories. A bag of keywords is extracted for each directory from the text documents within it. We can infer the topic of each query and expand it by adding the corresponding keywords, in order to obtain a more targeted formulation. Experiments are carried out using benchmark data through a repeatable systematic process, in order to evaluate objectively how much our method can improve the relevance of query results when applied on top of a third-party search engine.
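The expansion step described in this abstract (infer the query's topic from per-directory keyword bags, then append that directory's keywords) might look roughly like the sketch below. The abstract does not say how the topic is inferred, so the overlap count used here is an assumption, as are the function name `expand_query`, the `max_extra` parameter, and the sample profile.

```python
def expand_query(query, directory_keywords, max_extra=3):
    """Append keywords from the directory whose bag best overlaps the query.

    Assumption: the topic is the directory sharing the most terms with the
    query. The original method may infer the topic differently.
    """
    terms = query.lower().split()
    term_set = set(terms)

    # Score each directory by how many query terms appear in its keyword bag.
    best_dir, best_overlap = None, 0
    for directory, keywords in directory_keywords.items():
        overlap = len(term_set & set(keywords))
        if overlap > best_overlap:
            best_dir, best_overlap = directory, overlap

    if best_dir is None:
        return query  # no topic inferred, so leave the query unchanged

    # Add only keywords that the query does not already contain.
    extra = [kw for kw in directory_keywords[best_dir] if kw not in term_set]
    return " ".join(terms + extra[:max_extra])


# Hypothetical profile: one keyword bag per local directory.
profile = {
    "projects/astronomy": ["telescope", "orbit", "mercury", "planet"],
    "projects/chemistry": ["mercury", "element", "toxicity", "thermometer"],
}
expanded = expand_query("mercury orbit", profile)
```

With this profile, the ambiguous term "mercury" is resolved toward astronomy because "orbit" also appears in that directory's bag.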

Captain Nemo: A Metasearch Engine with Personalized Hierarchical Search Space

Personalization of search has gained a lot of publicity in recent years. Personalization features in search and metasearch engines are a follow-up to the research done. On the other hand, text categorization methods have been successfully applied to document collections. Specifically, text categorization methods can support the task of classifying Web content in thematic hierarchies. Combining these two research fields, we have developed Captain Nemo, a fully-functional metasearch engine with personalized hierarchical search spaces. Captain Nemo retrieves and presents search results according to personalized retrieval models and presentation styles. Here, we present the newly adopted hierarchical Web page classification approach. Captain Nemo lets users define a hierarchy of topics of interest. Search results are automatically classified into the hierarchy, exploiting hierarchical k-Nearest Neighbor classification techniques. The user study conducted demonstrates the effectiveness of our metasearch engine. Summary: The Captain Nemo metasearch engine is described.

A Novel Approach for Organizing Web Search Results using Ranking and Clustering

International Journal of Computer …, 2010

The World Wide Web is considered the most valuable place for Information Retrieval and Knowledge Discovery. When answering a user query, a search engine returns a large and unmanageable collection of documents. Web mining tools are used to classify, cluster and order the documents so that users can easily navigate through the search results and find the desired information content. A more efficient way to organize the documents can be a combination of clustering and ranking, where clustering groups the documents and ranking orders the pages within each cluster. Based on this approach, this paper proposes a mechanism that provides ordered results in the form of clusters in accordance with the user's query. An efficient page ranking method is also proposed that orders the results according to both the relevancy and the importance of documents. This approach helps users restrict their search to the top documents in the particular clusters that interest them.
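The combination of clustering and ranking outlined in this abstract can be illustrated with a short sketch: results are grouped by cluster label, then ordered by score within each group. The abstract does not specify the clustering or ranking methods, so the cluster labels and scores below are taken as given inputs, and the function name `organize` and the sample data are hypothetical.

```python
from collections import defaultdict


def organize(results):
    """Group (url, cluster, score) triples by cluster, best-scored first.

    Assumption: cluster labels and scores are computed elsewhere.
    """
    clusters = defaultdict(list)
    for url, cluster, score in results:
        clusters[cluster].append((url, score))
    # Within each cluster, order pages by descending score.
    return {
        label: [url for url, _ in sorted(members, key=lambda m: m[1], reverse=True)]
        for label, members in clusters.items()
    }


# Hypothetical results for the ambiguous query "jaguar".
results = [
    ("cars.example/xk", "car", 0.4),
    ("zoo.example/jaguar", "animal", 0.7),
    ("cars.example/history", "car", 0.9),
    ("wiki.example/panthera", "animal", 0.8),
]
organized = organize(results)
```

A user interested only in the animal sense can then read the top entries of that one cluster and ignore the rest.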

Uncovering User’s Search Patterns to Personalise Web Search

2018

In today’s world, search engines have become a very convenient method of searching and retrieving information. But this increasing use of search engines goes hand in hand with the ever-increasing data available on the internet. With such a large number of websites available, it is essential to have these websites sorted in decreasing order of their relevance to the user’s query for effective operation and retrieval of data. This paper explores various domains related to Computer Science and proposes a framework that seems the best fix for this problem. We have proposed a new system to provide personalized web search according to the user’s internet surfing patterns. The system extracts the user’s history and scrapes the web pages’ content (title, keywords, headings, sub-headings, meta tags). These documents are then clustered using a Word2Vec model and Latent Semantic Indexing to give better results. The user’s search query is mapped to the profile and an appropriate cluster is selected. The ...

Cluster Generation and Cluster Labelling for Web Snippets: A Fast and Accurate Hierarchical Solution

2006

This paper describes Armil, a meta-search engine that groups into disjoint labelled clusters the Web snippets returned by auxiliary search engines. The cluster labels generated by Armil provide the user with a compact guide to assessing the relevance of each cluster to her information need. Striking the right balance between running time and cluster well-formedness was a key point in the design of our system. Both the clustering and the labelling tasks are performed on the fly by processing only the snippets provided by the auxiliary search engines, and use no external sources of knowledge. Clustering is performed by means of a fast version of the furthest-point-first algorithm for metric k-center clustering. Cluster labelling is achieved by combining intra-cluster and inter-cluster term extraction based on a variant of the information gain measure. We have tested the clustering effectiveness of Armil against Vivisimo, the de facto industrial standard in Web snippet clustering, using as benchmark a comprehensive set of snippets obtained from the Open Directory Project hierarchy. According to two widely accepted “external” metrics of clustering quality, Armil achieves better performance levels by 10%. We also report the results of a thorough user evaluation of both the clustering and the cluster labelling algorithms.

PSSE: An Architecture For A Personalized Semantic Search Engine

International Journal on Advances in Information Sciences and Service Sciences, 2010

Semantic technologies promise a next generation of semantic search engines. General search engines do not take into consideration the semantic relationships between query terms and other concepts that might be significant to the user. Thus, the semantic web vision and its core ontologies are used to overcome this defect. The order in which the results are ranked is also important. Moreover, user preferences and interests must be taken into consideration so as to provide the user with a set of personalized results. In this paper we propose an architecture for a Personalized Semantic Search Engine (PSSE). PSSE is a crawler-based search engine that uses multiple crawlers to collect resources from both the semantic web and the traditional web. To reduce processing time, the graph of web pages is clustered, and the clusters are then annotated by document annotation agents that work in parallel. Annotation agents use ontology matching methods to find semantic web resources, as well as information extraction techniques to provide a good description of HTML documents. The system ranks resources by a final score calculated from traditional link analysis, content analysis and a weighted user profile for more personalized results. We believe that merging these techniques enhances search results.
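The final ranking score mentioned in the PSSE abstract combines link analysis, content analysis and a weighted user profile, but the abstract does not say how the three are combined. The sketch below assumes a simple weighted sum purely for illustration. The function name `final_score`, the weights, and the sample scores are hypothetical and are not taken from the paper.

```python
def final_score(link, content, profile, weights=(0.4, 0.4, 0.2)):
    """Combine three per-resource scores into one ranking score.

    Assumption: a linear combination with hand-picked weights. The PSSE
    paper may use a different formula.
    """
    w_link, w_content, w_profile = weights
    return w_link * link + w_content * content + w_profile * profile


# Hypothetical (link, content, profile) scores, each in [0, 1].
resources = {
    "page_a": (0.9, 0.2, 0.1),
    "page_b": (0.3, 0.8, 0.9),
    "page_c": (0.5, 0.5, 0.0),
}
ranked = sorted(resources, key=lambda name: final_score(*resources[name]), reverse=True)
```

Raising the profile weight pushes pages that match the user's interests further up, which is the personalization effect the abstract describes.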

Hierarchical Classification of Web Search Results Using Personalized Ontologies

2000

In this paper, we propose an approach to presenting web search results that supports personalization, taking into consideration users' perspectives. We developed a post-retrieval algorithm which uses document classification techniques to organize search results into a meaningful hierarchy of topics, based on the perspective of the user performing the search, represented as a taxonomic ontology. A demonstration system called WEBCLUSTERS