Searching the World Wide Web

Information Retrieval on the World Wide Web

IEEE Internet Computing, 1997

The World Wide Web is a very large distributed digital information space. From its origins in 1991 as an organization-wide collaborative environment at CERN for sharing research documents in nuclear physics, the Web has grown to encompass diverse information resources: personal home pages; online digital libraries; virtual museums; product and service catalogs; government information for public dissemination; research publications; and Gopher, FTP, Usenet news, and mail servers. Some estimates suggest that the Web currently includes about 150 million pages and that this number doubles every four months.

WEB SEARCH ENGINE

Abstract: In this paper, we present Google, a prototype of a Web search engine that makes heavy use of the structure present in hypertext. Google is designed to crawl and index the Web efficiently and to produce much more satisfying search results than existing systems. The prototype, with a full-text and hyperlink database of at least 24 million pages, is available at http://google.stanford.edu/. Engineering a search engine is a challenging task. Search engines index tens to hundreds of millions of web pages involving a comparable number of distinct terms. They answer tens of millions of queries every day. Despite the importance of large-scale search engines on the web, very little academic research has been done on them. Furthermore, due to rapid advances in technology and web proliferation, creating a web search engine today is very different from three years ago. This paper provides an in-depth description of our large-scale web search engine, the first such detailed public description we know of to date. Apart from the problems of scaling traditional search techniques to data of this magnitude, there are new technical challenges involved with using the additional information present in hypertext to produce better search results. This paper addresses the question of how to build a practical large-scale system that can exploit the additional information present in hypertext. In addition, we look at the problem of how to deal effectively with uncontrolled hypertext collections where anyone can publish anything they want.

Keywords: World Wide Web, Search Engines, Information Retrieval, PageRank, Google.

Title: WEB SEARCH ENGINE

Author: Raghav Arora, Rana Rahul Sathyaprakash, Saurabh Rauthan, Shrey Jakhetia

International Journal of Computer Science and Information Technology Research, ISSN 2348-120X (online), ISSN 2348-1196 (print)

Research Publish Journals

Web Search Engines: Mining Right Information

IJCA Proceedings on National Workshop Cum Conference on Recent Trends in Mathematics and Computing 2011, 2012

A Web search engine maintains and catalogs the content of Web pages in order to make them easier to find and browse. Many search engines are similar; each differs from the others in its methods for scouring, storing, and retrieving information from the Web. Usually, search engines search through Web pages for specified keywords and, in response, return a list of documents containing those keywords. This list is then sorted by relevance criteria that try to put the documents that best match the user's query in the first positions. The usefulness of a search engine to most people depends on the relevance of the results it retrieves from the Web. This paper addresses some of the major challenges faced by search engines as the size of the Web grows rapidly.
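The keyword lookup and relevance sorting described in this abstract can be illustrated with a minimal sketch. It is not the method of any particular search engine: the page contents, the function names (`tokenize`, `build_index`, `search`), and the scoring rule (total number of occurrences of the query terms on a page) are all assumptions chosen for illustration.

```python
from collections import defaultdict
import re


def tokenize(text):
    """Lowercase the text and split it into alphanumeric terms."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(pages):
    """Map each term to {page_id: number of occurrences on that page}."""
    index = defaultdict(dict)
    for page_id, text in pages.items():
        for term in tokenize(text):
            index[term][page_id] = index[term].get(page_id, 0) + 1
    return index


def search(index, query):
    """Return ids of pages containing every query term, best match first."""
    terms = tokenize(query)
    if not terms:
        return []
    # Keep only the pages that contain all of the query terms.
    candidates = set(index.get(terms[0], {}))
    for term in terms[1:]:
        candidates &= set(index.get(term, {}))

    # Toy relevance score (an assumption): total occurrences of query terms.
    def score(page_id):
        return sum(index[term][page_id] for term in terms)

    # Highest score first; ties are broken by page id so output is stable.
    return sorted(candidates, key=lambda page_id: (-score(page_id), page_id))


# Hypothetical three-page collection used only to exercise the sketch.
pages = {
    "a.html": "Web search engines index web pages",
    "b.html": "A search engine ranks pages by relevance to the query",
    "c.html": "Gopher and FTP servers predate the web",
}
index = build_index(pages)
results = search(index, "web pages")
```

Real engines replace the toy score with much richer relevance criteria, but the two stages (find the documents that contain the keywords, then sort them) match the description above.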

Information Retrieval Issues on the World Wide Web

The World Wide Web (Web) is the largest information repository, containing billions of interconnected documents (called web pages) authored by billions of people and organizations. The Web's huge size, diversity, unstructured or semi-structured and dynamic content, and multilingual nature make searching it effectively and efficiently a challenging research problem. In this paper we briefly explore the issues related to finding relevant information on the Web, such as crawling, indexing, and ranking.

Web Information Retrieval

2008

Searching for information is commonly an individual task that aims at satisfying an information need. To do that, one may go to a library or surf the Web in order to find relevant information. Indeed, due to the large number of available documents, the Web has become a favorite information source for solving daily information needs. An issue remains: the Web is in perpetual evolution, so the problem is less the existence of relevant information than the way users find it. One may compare searching for information on the Web with "looking for a needle in a haystack." Searching the Web thus suffers from many limits that can be reduced by using a search assistant. Such an assistant helps the user to find relevant information on the Web. At the beginning, those assistants mainly helped each user individually. Nowadays, we are witnessing the rise of social approaches in such systems. These systems help users to find relevant information by drawing on what other users contribute, such as their experience and shared information. Each user is thus helped by the crowd. This chapter traces this evolution of search assistants and is organized as follows. Section 1 introduces the underlying concepts and limits of the traditional information search process and its application to the Web. Section 2 explains the search assistant concept by detailing its evolution from individual to social approaches. Sections 3 to 5 present current approaches that search assistants may use to help any user to query and browse the Web as well as to improve search-related activities. To conclude, future trends for Web information assistants are discussed.

Web Information Resource Discovery: Past, Present, and Future

2003

In a time span of twelve years, the World Wide Web--only a computer and an internet connection away from anybody anywhere, and with abundant, diverse and sometimes incorrect, redundant, spam, and bad information--has become the major information repository for the masses and the world. The web is becoming all things to all people, totally oblivious to nation/country/continent boundaries, promising mostly free information to all, and quickly growing into a repository in all languages and all cultures. With large digital libraries and increasingly significant educational resources, the web is becoming an equalizer, a balancing force, and an opportunity for all, especially for underdeveloped/developing countries. The web is both exciting and overwhelming, changing the way the world communicates, from the way businesses are conducted to the way masses are educated, from the way research is performed to the way research results are disseminated. It is fair to say that the web will only get more diverse, larger and more chaotic in the near future.