Learning to cluster web search results
Related papers
Clustering Web Search Results - A Review
The rapid growth of the Internet has made the Web a popular place for collecting information. Today, Internet users access billions of web pages through search engines. Information on the Web comes from many sources, including the websites of companies and organizations, communications, and personal homepages. Effective representation of Web search results remains an open problem in the Information Retrieval (IR) community. Web search result clustering has emerged as a method that overcomes the drawbacks of conventional IR approaches: it groups the results returned by search engines into meaningful, thematic clusters. This paper discusses the issues that must be addressed in developing a Web clustering engine and categorizes the techniques that have been used to cluster web search results.
Topical clustering of search results
2012
Search results clustering (SRC) is a challenging algorithmic problem that requires grouping the results returned by one or more search engines into topically coherent clusters and labeling the clusters with meaningful phrases that describe the topics of the results they contain.
Web Search Clustering and Labeling with Hidden Topics
ACM Transactions on Asian Language Information Processing, 2009
Web search clustering reorganizes search results (also called "snippets") into a form that is more convenient to browse. Such post-retrieval clustering systems have three key requirements: (1) the clustering algorithm should group similar documents together; (2) clusters should be labeled with descriptive phrases; and (3) the system should produce high-quality clusters without downloading the full Web pages.
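The three requirements can be illustrated with a minimal, snippet-only sketch. It assumes TF-IDF vectors over snippet words, a single-pass (leader) clustering with a hypothetical cosine threshold of 0.2, and labels built from each cluster's top-weighted terms. It does not model the hidden topics this paper uses, and every function name is hypothetical.

```python
import math
import re
from collections import Counter

# Hypothetical, deliberately tiny stopword list for the illustration.
STOPWORDS = {"the", "a", "an", "of", "and", "in", "for", "to", "is", "on", "with"}


def tokenize(text):
    """Lowercase, keep alphabetic runs, drop stopwords."""
    return [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOPWORDS]


def tfidf_vectors(snippets):
    """Sparse TF-IDF vectors (dicts) built from snippet text only, no page downloads."""
    docs = [Counter(tokenize(s)) for s in snippets]
    n = len(docs)
    df = Counter(term for d in docs for term in d)
    # Smoothed IDF, so a term present in every snippet still gets a nonzero weight.
    return [{t: tf * (math.log((1 + n) / (1 + df[t])) + 1.0) for t, tf in d.items()}
            for d in docs]


def cosine(u, v):
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


def cluster_snippets(snippets, threshold=0.2, label_terms=2):
    """Leader clustering: each snippet joins the most similar cluster whose
    similarity is at least `threshold`, otherwise it starts a new cluster."""
    clusters = []  # each entry: {"members": [indices], "centroid": {term: weight}}
    for i, vec in enumerate(tfidf_vectors(snippets)):
        best, best_sim = None, threshold
        for c in clusters:
            sim = cosine(vec, c["centroid"])
            if sim >= best_sim:
                best, best_sim = c, sim
        if best is None:
            clusters.append({"members": [i], "centroid": dict(vec)})
        else:
            best["members"].append(i)
            # Summed vector; cosine ignores scale, so there is no need to average.
            for t, w in vec.items():
                best["centroid"][t] = best["centroid"].get(t, 0.0) + w
    labeled = []
    for c in clusters:
        # Label = highest-weighted centroid terms (ties broken alphabetically).
        top = sorted(c["centroid"].items(), key=lambda kv: (-kv[1], kv[0]))[:label_terms]
        labeled.append((" ".join(t for t, _ in top), c["members"]))
    return labeled


snippets = [
    "Jaguar car dealer prices",
    "Jaguar car engine review",
    "The jaguar cat in the rainforest habitat",
    "Jaguar cat diet and prey in the rainforest",
]
clusters = cluster_snippets(snippets)
print(clusters)  # [('car jaguar', [0, 1]), ('cat rainforest', [2, 3])]
```

The ambiguous query term "jaguar" ends up in a label because it is weighted like any other term here; a real system would likely treat query terms specially.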
Improving Search Engines by Query Clustering
In this paper, we present a framework for clustering Web search engine queries, with the aim of identifying groups of queries used to search for similar information on the Web. The framework is based on a novel term vector model of queries that integrates user selections and the content of the selected documents, both extracted from search engine logs. This query representation allows query clustering to be treated like standard document clustering. We study the application of the clustering framework to two problems: boosting relevance ranking and query recommendation. Finally, we experimentally evaluate the effectiveness of our approach.
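A term vector model that combines user selections with document content can be sketched as follows. Each query is represented by the words of the documents clicked for it, weighted by click counts, and queries are then grouped the way documents would be. The click-log tuple format, the weighting, the 0.5 threshold, and the single-link grouping are all assumptions for illustration, not the paper's exact model.

```python
import math
import re
from collections import Counter, defaultdict


def terms(text):
    return re.findall(r"[a-z]+", text.lower())


def query_vectors(click_log, doc_text):
    """Build one term vector per query from the documents users selected for it.

    click_log: (query, doc_id, clicks) tuples; doc_text: doc_id -> document text.
    Hypothetical weighting: term frequency in the document times its click count.
    """
    vecs = defaultdict(Counter)
    for query, doc_id, clicks in click_log:
        for t, tf in Counter(terms(doc_text[doc_id])).items():
            vecs[query][t] += clicks * tf
    return dict(vecs)


def cosine(u, v):
    dot = sum(w * v[t] for t, w in u.items() if t in v)
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


def cluster_queries(vecs, threshold=0.5):
    """Single-link grouping: queries are merged (union-find) whenever their
    vectors' cosine similarity reaches `threshold`."""
    queries = sorted(vecs)
    parent = {q: q for q in queries}

    def find(q):
        while parent[q] != q:
            parent[q] = parent[parent[q]]  # path halving
            q = parent[q]
        return q

    for i, a in enumerate(queries):
        for b in queries[i + 1:]:
            if cosine(vecs[a], vecs[b]) >= threshold:
                parent[find(a)] = find(b)
    groups = defaultdict(list)
    for q in queries:
        groups[find(q)].append(q)
    return sorted(groups.values())


docs = {
    "d1": "cheap flights to Paris airline tickets",
    "d2": "airline tickets and cheap flights",
    "d3": "python list comprehension tutorial",
}
log = [
    ("cheap flights", "d1", 3),
    ("airline tickets", "d2", 2),
    ("airline tickets", "d1", 1),
    ("python lists", "d3", 4),
]
clusters = cluster_queries(query_vectors(log, docs))
print(clusters)  # [['airline tickets', 'cheap flights'], ['python lists']]
```

"cheap flights" and "airline tickets" share no words, yet they are grouped because users selected documents with overlapping content for both.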
Clustering of Web Search Results Based on Document Segmentation
Computer and Information Science, 2013
Producing accurate and compact document clusters becomes increasingly important given the vast amount of information on the Web. The problem is further complicated by the multi-topic nature of many of today's documents. In this paper, we address the problem of clustering documents retrieved by a search engine when each document covers multiple topics. Our approach segments each document into a number of segments and then clusters the segments of all documents using the Lingo algorithm. We compare the quality of the clusters obtained by clustering full documents directly with that obtained by clustering document segments, using the distance-based average intra-cluster similarity measure. Our results show that clustering document segments increases average intra-cluster similarity by approximately 75% compared with clustering the full documents retrieved by the search engine.
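The evaluation can be sketched under one reading of the measure: "average intra-cluster similarity" is taken to mean the mean pairwise cosine similarity of the items in each cluster, averaged over clusters. The paper's distance-based definition may differ. The fixed-size word windows are a hypothetical stand-in for the paper's segmentation, and Lingo itself is not reproduced.

```python
import math
from collections import Counter
from itertools import combinations


def segment(text, size=6):
    """Hypothetical segmentation: consecutive windows of `size` words."""
    words = text.lower().split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]


def cosine(a, b):
    """Cosine similarity of two texts treated as bags of words."""
    va, vb = Counter(a.split()), Counter(b.split())
    dot = sum(c * vb[t] for t, c in va.items())
    na = math.sqrt(sum(c * c for c in va.values()))
    nb = math.sqrt(sum(c * c for c in vb.values()))
    return dot / (na * nb) if na and nb else 0.0


def avg_intra_cluster_similarity(clusters):
    """Mean over clusters of the mean pairwise similarity of their members.
    Singleton clusters are skipped because they contain no pairs."""
    scores = []
    for members in clusters:
        pairs = list(combinations(members, 2))
        if pairs:
            scores.append(sum(cosine(a, b) for a, b in pairs) / len(pairs))
    return sum(scores) / len(scores) if scores else 0.0


doc = "Solar panels convert sunlight into power stock prices fell sharply on Monday"
print(segment(doc))
# Two hypothetical clusters of segments.
score = avg_intra_cluster_similarity([
    ["solar panels power", "solar power panels"],
    ["stock prices fell", "stock prices rose"],
])
print(round(score, 3))  # 0.833
```

Splitting the two-topic document first lets each topic's segments land in its own cluster, which is the intuition behind the reported gain in intra-cluster similarity.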
An empirical evaluation on textual results clustering for web search
Proceedings of the American Society for Information Science and Technology, 2009
Clustering web search results into dynamic clusters and cluster hierarchies has shown promise for reducing the information overload typical of ranked-list search engines. This study compared sixteen participants' search performance and subjective satisfaction when using textual clustering and ranked-list search interfaces to conduct assigned and self-designated search tasks. The results show that participants searched slightly faster and better, and were more satisfied, with the ranked-list interface. However, participants performed slightly better on easy questions with the clustering interface and obtained non-repetitive relevant results that they did not find with the ranked-list interface. The study shows that the clustering interface adds value by highlighting prominent concepts and offering richer context for exploring, learning, and discovering related concepts, yet it also induces a certain degree of information uncertainty, a sense of being lost, and anxiety. Contrasting views of clustering-based search are discussed, and suggestions for future studies are provided.
A Novel Approach for Organizing Web Search Results using Ranking and Clustering
International Journal of Computer …, 2010
The World Wide Web is considered the most valuable resource for information retrieval and knowledge discovery. When retrieving information in response to user queries, a search engine returns a large and unmanageable collection of documents. Web mining tools are used to classify, cluster, and order these documents so that users can easily navigate the search results and find the information they want. A more efficient way to organize the documents is to combine clustering and ranking: clustering groups the documents, and ranking orders the pages within each cluster. Based on this approach, this paper proposes a mechanism that returns ordered results in the form of clusters according to the user's query. An efficient page ranking method is also proposed that orders the results by both the relevance and the importance of documents. This approach helps users restrict their search to the top documents in the clusters that interest them.
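The combination of clustering and ranking can be sketched as a per-cluster sort. The linear score `alpha * relevance + (1 - alpha) * importance`, the weight `alpha = 0.7`, and the assumption that both inputs are normalized to [0, 1] are illustrative choices, not the page ranking method the paper proposes. The clusters are taken as given.

```python
from dataclasses import dataclass


@dataclass
class Page:
    url: str
    cluster: str
    relevance: float   # assumed query-document similarity in [0, 1]
    importance: float  # assumed link-based importance score in [0, 1]


def rank_within_clusters(pages, alpha=0.7):
    """Group pages by cluster, then order each group by the hypothetical
    combined score alpha * relevance + (1 - alpha) * importance, best first."""
    clusters = {}
    for p in pages:
        clusters.setdefault(p.cluster, []).append(p)

    def score(p):
        return alpha * p.relevance + (1 - alpha) * p.importance

    return {name: [p.url for p in sorted(group, key=score, reverse=True)]
            for name, group in clusters.items()}


pages = [
    Page("https://example.com/news-1", "news", relevance=0.9, importance=0.1),
    Page("https://example.com/news-2", "news", relevance=0.6, importance=0.9),
    Page("https://example.com/sports-1", "sports", relevance=0.5, importance=0.5),
]
ranked = rank_within_clusters(pages)
print(ranked)
```

With `alpha = 0.7`, news-2 scores 0.69 and outranks news-1 at 0.66: its higher importance outweighs its lower relevance. Raising `alpha` shifts the order back toward pure relevance.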
Term Ranking for Clustering Web Search Results
Clustering web search engine results for ambiguous keyword searches poses unique challenges. First, we show that the frequency-based feature ranking used in text document clustering cannot readily be imported to cluster web search results. Next, we present TermRank, a variation of the PageRank algorithm based on a relational graph representation of the content of web document collections. TermRank ranks discriminative terms above ambiguous terms, and ambiguous terms above common terms. We experiment with two clustering algorithms to demonstrate the efficacy of TermRank, which performs substantially better than classical frequency-based methods.
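Since TermRank is described as a PageRank variation over a term graph, a baseline helps show what it changes. The sketch below runs plain PageRank on a term co-occurrence graph. The graph construction, damping factor 0.85, and iteration count are assumptions, and TermRank's own relational graph and modifications are not reproduced.

```python
from collections import defaultdict
from itertools import combinations


def cooccurrence_graph(docs):
    """Undirected term graph: two terms are linked if they occur in the same document."""
    graph = defaultdict(set)
    for doc in docs:
        for a, b in combinations(sorted(set(doc.lower().split())), 2):
            graph[a].add(b)
            graph[b].add(a)
    return graph


def pagerank(graph, damping=0.85, iterations=50):
    """Plain PageRank by power iteration. Every node in this graph has at least
    one neighbor, so there are no dangling nodes to special-case."""
    nodes = sorted(graph)
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(iterations):
        new = {v: (1 - damping) / n for v in nodes}
        for v in nodes:
            share = damping * rank[v] / len(graph[v])
            for u in graph[v]:
                new[u] += share
        rank = new
    return rank


snippets = ["jaguar car engine", "jaguar car dealer", "jaguar cat habitat"]
ranks = pagerank(cooccurrence_graph(snippets))
for term, score in sorted(ranks.items(), key=lambda kv: -kv[1]):
    print(f"{term:8s} {score:.3f}")
```

Plain PageRank puts the ambiguous term "jaguar" first simply because it co-occurs with everything. The abstract's goal of ranking discriminative terms above ambiguous ones is what this baseline does not achieve.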
Semantic, Hierarchical, Online Clustering of Web Search Results
Lecture Notes in Computer Science, 2004
Today, search engines are the most commonly used tools for Web information retrieval; however, they are still far from satisfactory. This paper focuses on clustering Web search results to help users find relevant Web information more easily and quickly. Its main contributions are as follows. (1) The benefits of using key phrases as natural-language information features are discussed, and an effective and efficient suffix-array-based algorithm for key phrase discovery is presented. The method remains highly efficient regardless of the size of the language's alphabet. (2) The concept of orthogonal clustering is proposed for general clustering problems, and the paper rigorously proves why matrix SVD (Singular Value Decomposition) can provide a solution to orthogonal clustering. The orthogonal clustering algorithm has a solid mathematical foundation and many advantages over traditional heuristic clustering algorithms. (3) The WICE system is designed and implemented to automatically organize multilingual Web search results through a semantic, hierarchical, online clustering approach named SHOC.
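The suffix-array idea in contribution (1) can be illustrated with a naive word-level version: repeated phrases appear as common prefixes of suffixes that are adjacent in sorted order. This is a generic sketch, not the paper's algorithm. The O(n^2 log n) sort does not have the efficiency the paper claims, the per-snippet sentinel tokens and the "complete phrase" filter are assumptions, and SVD-based orthogonal clustering is not shown.

```python
def suffix_array(tokens):
    """Naive word-level suffix array: start positions sorted by the suffix that
    begins there. O(n^2 log n), fine for a page of snippets but not at scale."""
    return sorted(range(len(tokens)), key=lambda i: tokens[i:])


def lcp(a, b):
    """Length of the longest common prefix of two token lists."""
    n = 0
    while n < len(a) and n < len(b) and a[n] == b[n]:
        n += 1
    return n


def contains(longer, shorter):
    n = len(shorter)
    return any(longer[i:i + n] == shorter for i in range(len(longer) - n + 1))


def repeated_phrases(snippets, min_words=2):
    """Return sorted (phrase, count) pairs for repeated phrases of >= min_words words."""
    tokens = []
    for i, s in enumerate(snippets):
        tokens.extend(s.lower().split())
        tokens.append(f"\x00{i}")  # unique sentinel, so no phrase crosses snippets
    sa = suffix_array(tokens)
    counts = {}
    # Every right-maximal repeated phrase is the common prefix of some pair of
    # suffixes that are adjacent in the suffix array.
    for prev, cur in zip(sa, sa[1:]):
        k = lcp(tokens[prev:], tokens[cur:])
        if k >= min_words:
            phrase = tuple(tokens[cur:cur + k])
            counts[phrase] = sum(1 for j in sa if tuple(tokens[j:j + k]) == phrase)
    # Keep "complete" phrases only: drop one that sits inside a longer found
    # phrase with the same count (e.g. "phrase discovery" in "key phrase discovery").
    complete = {
        p: c for p, c in counts.items()
        if not any(len(q) > len(p) and counts[q] == c and contains(q, p) for q in counts)
    }
    return sorted((" ".join(p), c) for p, c in complete.items())


snippets = [
    "suffix array key phrase discovery",
    "fast key phrase discovery for web search",
    "web search results clustering",
]
print(repeated_phrases(snippets))  # [('key phrase discovery', 2), ('web search', 2)]
```

Phrases found this way can serve as cluster features and candidate labels. Working on word tokens rather than characters is one reason the alphabet size of the language matters little.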
An evaluation of techniques for clustering search results
2005
The ability to organize retrieval results effectively becomes more important as the focus of Information Retrieval (IR) shifts toward interactive search processes. Automatic classification techniques can provide the necessary organization by arranging the retrieved data into groups of documents with common subjects. In this paper, we compare classification methods from IR and Machine Learning (ML) for clustering search results.