Web Page Clustering Using Heuristic Search in the Web Graph

Web Pages Clustering: A New Approach

The rapid growth of the web has produced a vast volume of information, and users need fast access to it. English (like any natural language) contains a great deal of ambiguity in word usage, so there is no guarantee that a keyword-based search engine will return the required results. This paper introduces the use of a standardised dictionary to obtain the context in which a keyword is used and, in turn, to cluster the results based on this context. These ideas can be merged with a metasearch engine to enhance search efficiency.
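The dictionary-based idea in this abstract can be illustrated with a minimal sketch. It is not the paper's method: the `SENSES` table, the `cluster_by_sense` function, and the word-overlap scoring are all assumptions made for illustration, standing in for a real standardised dictionary.

```python
import re
from collections import defaultdict

# Hypothetical mini-dictionary (an assumption, not from the paper):
# each sense of an ambiguous keyword maps to typical context words.
SENSES = {
    "jaguar": {
        "animal": {"cat", "wild", "habitat", "prey", "species"},
        "car": {"vehicle", "engine", "sedan", "dealer", "drive"},
    }
}


def tokenize(text):
    """Lowercase the text and return its set of alphabetic words."""
    return set(re.findall(r"[a-z]+", text.lower()))


def cluster_by_sense(keyword, snippets, senses=SENSES):
    """Assign each snippet to the keyword sense whose context words
    overlap most with the snippet; snippets with no overlap go to 'unknown'."""
    clusters = defaultdict(list)
    for snippet in snippets:
        words = tokenize(snippet)
        best_sense, best_overlap = "unknown", 0
        for sense, context in senses.get(keyword, {}).items():
            overlap = len(words & context)
            if overlap > best_overlap:
                best_sense, best_overlap = sense, overlap
        clusters[best_sense].append(snippet)
    return dict(clusters)


snippets = [
    "The jaguar is a wild cat that stalks its prey",
    "Jaguar dealer offers a new sedan with a powerful engine",
    "Jaguar wins the award",
]
result = cluster_by_sense("jaguar", snippets)
for sense, members in result.items():
    print(sense, members)
```

Counting shared words is the simplest possible context signal; a real system would presumably use the dictionary's definitions and handle stemming and stop words.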

Clustering Web Search Results-A Review

The rapid growth of the Internet has made the Web a popular place for collecting information. Today, Internet users access billions of web pages online using search engines. Information on the Web comes from many sources, including the websites of companies and organizations, communications, and personal homepages. Effective representation of Web search results remains an open problem in the Information Retrieval (IR) community. Web search result clustering has emerged as a method that overcomes the drawbacks of conventional information retrieval. It clusters the results returned by search engines into meaningful, thematic groups. This paper presents the issues that must be addressed in the development of a Web clustering engine and categorizes the techniques that have been used to cluster web search results.

IJERT-Clustering of Web Search Results using Hybrid Algorithm

International Journal of Engineering Research and Technology (IJERT), 2016

https://www.ijert.org/clustering-of-web-search-results-using-hybrid-algorithm https://www.ijert.org/research/clustering-of-web-search-results-using-hybrid-algorithm-IJERTV4IS120183.pdf

Clustering web search results has become a very fascinating research area among scientific and academic groups involved in information retrieval. Systems that do this, also known as Web clustering engines, aim to improve the description of documents presented to the user for review while decreasing the time spent reviewing them. Many algorithms for web document clustering already exist, but results show there is room for more. Our project works on providing concise information for an ambiguous search. This allows the user to obtain precise information faster and reduces the time spent looking through thousands of pages for simple information. The information obtained will be segmented and sorted, and irrelevant information will be avoided.

Using Metaheuristic Approaches in Web Document Clustering in Web Search

2018

The Internet is a gigantic information resource that grows rapidly day by day as more and more data are added to the World Wide Web. With the rapid growth of web documents on the Internet, it is becoming difficult to organize, analyze and present these documents efficiently. Clustering can play a key role in organizing such a large number of documents into groups, and document clustering can improve the performance of an IR system. It has been found that HTML tags with particular meanings can be used to enhance the performance of an IR system. This paper provides a brief survey of the available literature on web search in which HTML tags have been used in information retrieval, and of the metaheuristic approaches used in web document clustering.

An Analysis of Web Document Clustering Algorithms

Evidently there is a tremendous increase in the amount of information found today on the largest shared information source, the World Wide Web. The process of finding relevant information on the web is overwhelming. Even with today's search engines that index the web, it is difficult to wade through the large number of documents returned in response to a user query. Furthermore, users without domain expertise are often unfamiliar with the appropriate terminology and so do not submit the right query terms, which leads to the retrieval of more irrelevant pages; the most relevant documents also do not necessarily appear at the top of the query output. Users of Web search engines are thus often forced to sift through the long ordered list of document "snippets" returned by the engines. This fact has led to the need to organize a large set of documents into categories through clustering. The Information Retrieval community has explored document clustering as an alternative method of organizing retrieval results. Grouping similar documents together into clusters helps users find relevant information more quickly and allows them to focus their search in the appropriate direction. Various web document clustering techniques are now being used to give meaningful search results on the web. This paper presents an analysis of the various categories of web document clustering, and of the existing web clustering engines together with their clustering techniques.
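As a rough illustration of grouping similar result snippets into clusters, the sketch below uses a greedy single-pass scheme with Jaccard similarity over word sets. This is an assumption chosen for brevity, not one of the techniques analyzed in the paper; the names `cluster_snippets` and `threshold` are hypothetical.

```python
import re


def tokenize(text):
    """Lowercase the text and return its set of alphabetic words."""
    return set(re.findall(r"[a-z]+", text.lower()))


def jaccard(a, b):
    """Jaccard similarity of two sets: |a & b| / |a | b| (0.0 if both are empty)."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def cluster_snippets(snippets, threshold=0.2):
    """Greedy single-pass clustering: each snippet joins the most similar
    existing cluster if the similarity reaches the threshold, otherwise
    it starts a new cluster."""
    clusters = []  # each entry: {"terms": set of words, "members": list of snippets}
    for snippet in snippets:
        words = tokenize(snippet)
        best, best_sim = None, threshold
        for cluster in clusters:
            sim = jaccard(words, cluster["terms"])
            if sim >= best_sim:
                best, best_sim = cluster, sim
        if best is None:
            clusters.append({"terms": set(words), "members": [snippet]})
        else:
            best["terms"] |= words
            best["members"].append(snippet)
    return [c["members"] for c in clusters]


snippets = [
    "python snake habitat",
    "python snake diet",
    "python programming language tutorial",
    "python programming tutorial video",
]
groups = cluster_snippets(snippets)
for group in groups:
    print(group)
```

The outcome depends on the order of the snippets and on the chosen threshold, which is one reason practical engines tend to use more elaborate algorithms and to generate readable cluster labels.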

Clustering of Web Search Results using Hybrid Algorithm

International Journal of Engineering Research and, 2015

Clustering web search results has become a very fascinating research area among scientific and academic groups involved in information retrieval. Systems that do this, also known as Web clustering engines, aim to improve the description of documents presented to the user for review while decreasing the time spent reviewing them. Many algorithms for web document clustering already exist, but results show there is room for more. Our project works on providing concise information for an ambiguous search. This allows the user to obtain precise information faster and reduces the time spent looking through thousands of pages for simple information. The information obtained will be segmented and sorted, and irrelevant information will be avoided.

Clustering of Web Page Search Results: A Full Text Based Approach

International Journal of …, 2008

With so much information available on the web, looking for relevant documents on the Internet has become a tough task. In this paper we present an approach that combines a query-based engine such as Google with a category-based one such as Yahoo. WISE is a web page hierarchical ...

Search result clustering based on clustering context extended abstract

This paper introduces a novel, interactive and exploratory approach to information retrieval (search engines) based on clustering. The presented method allows users to change the clustering structure by applying a free-text clustering context query that is treated as a criterion for document-to-cluster allocation. Exploration mechanisms are also provided by redefining the interaction scenario so that the user can interact with the data at the level of topic discovery or cluster labeling. In this paper, the presented idea is realized by a graph structure called the Query-Summarize Graph. This data structure is useful both in the definition of the similarity measure between snippets and in the snippet clustering algorithm. Experiments on real-world data show that the proposed solution has many interesting properties and can be an alternative approach to interactive information retrieval.

A Survey On Web Search Result Clustering And Engines

2013

Nowadays the World Wide Web is a very large distributed digital information space. The ability to search and retrieve information from the Web efficiently and effectively is an enabling technology for realizing its full potential. Current search tools retrieve too many documents, of which only a small fraction are relevant to the user query. Web clustering engines organize search results by topic, thus offering a complementary view to the flat ranked list returned by conventional search engines. This paper highlights the main characteristics of a number of existing Web clustering engines and also discusses how to evaluate their retrieval performance.

Web Data Clustering

Studies in Computational Intelligence, 2009

This chapter provides a survey of some clustering methods relevant to clustering Web elements for better information access. We start with classical methods of cluster analysis that seem relevant to the clustering of Web data. Graph clustering is also described, since its methods contribute significantly to clustering Web data. The use of artificial neural networks for clustering has the same motivation. Based on the previously presented material, the core of the chapter provides an overview of approaches to clustering in the Web environment. In particular, we focus on clustering Web search results, in which clustering search engines arrange the search results into groups around a common theme. We conclude with some general considerations concerning the justification of so many clustering algorithms and their application in the Web environment.