In-Depth Guide to How Google Search Works
Google Search is a fully-automated search engine that uses software known as web crawlers that explore the web regularly to find pages to add to our index. In fact, the vast majority of pages listed in our results aren't manually submitted for inclusion, but are found and added automatically when our web crawlers explore the web. This document explains the stages of how Search works in the context of your website. Having this base knowledge can help you fix crawling issues, get your pages indexed, and learn how to optimize how your site appears in Google Search.
A few notes before we get started
Before we get into the details of how Search works, it's important to note that Google doesn't accept payment to crawl a site more frequently, or rank it higher. If anyone tells you otherwise, they're wrong.
Google doesn't guarantee that it will crawl, index, or serve your page, even if your page follows the Google Search Essentials.
Introducing the three stages of Google Search
Google Search works in three stages, and not all pages make it through each stage:
Crawling: Google downloads text, images, and videos from pages it found on the internet with automated programs called crawlers.
Indexing: Google analyzes the text, images, and video files on the page, and stores the information in the Google index, which is a large database.
Serving search results: When a user searches on Google, Google returns information that's relevant to the user's query.
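To make the flow through these stages concrete, here is a toy sketch in Python of a crawl, index, and serve pipeline. It is purely illustrative: every function name and data structure in it is made up for this example, and it bears no relation to how Google actually implements these stages.

```python
# A toy illustration of the three stages, not Google's actual pipeline.
# Every name and data structure here is hypothetical and simplified.

index = {}  # url -> extracted text; a stand-in for "the Google index"

def crawl(url, pages):
    """Stage 1 (crawling): fetch the page content; None if it isn't reachable."""
    return pages.get(url)  # stand-in for an HTTP fetch by a crawler

def index_page(url, content):
    """Stage 2 (indexing): analyze the content and store it in the index."""
    index[url] = content.lower()

def serve(query):
    """Stage 3 (serving): return URLs whose stored text matches the query."""
    return [url for url, text in index.items() if query.lower() in text]

# Tiny "web" to run the pipeline end to end.
web = {"https://example.com/": "Welcome to Example, a page about widgets."}
for url in web:
    content = crawl(url, web)
    if content is not None:        # not every discovered page gets crawled
        index_page(url, content)   # and not every crawled page gets indexed
print(serve("widgets"))            # ['https://example.com/']
```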
Crawling
The first stage is finding out what pages exist on the web. There isn't a central registry of all web pages, so Google must constantly look for new and updated pages and add them to its list of known pages. This process is called "URL discovery". Some pages are known because Google has already visited them. Other pages are discovered when Google follows a link from a known page to a new page: for example, a hub page, such as a category page, links to a new blog post. Still other pages are discovered when you submit a list of pages (a sitemap) for Google to crawl.
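As a sketch of these two discovery paths, the following Python snippet extracts links from a known hub page and reads URLs out of a sitemap. It uses only the standard library, the URLs are placeholders, and it is an illustration of the concept rather than how Googlebot itself is implemented.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin
from xml.etree import ElementTree

class LinkCollector(HTMLParser):
    """Collects href values from <a> tags, resolved against a base URL."""
    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = set()

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.add(urljoin(self.base_url, value))

def links_from_page(base_url, html):
    """Discovery path 1: new URLs found by following links on a known page."""
    collector = LinkCollector(base_url)
    collector.feed(html)
    return collector.links

def urls_from_sitemap(sitemap_xml):
    """Discovery path 2: URLs listed in a sitemap submitted by the site owner."""
    ns = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}
    root = ElementTree.fromstring(sitemap_xml)
    return {loc.text.strip() for loc in root.findall("sm:url/sm:loc", ns) if loc.text}

# Example: a hub (category) page linking to a new blog post.
hub_html = '<a href="/blog/new-post">New post</a>'
print(links_from_page("https://example.com/category/", hub_html))

sitemap = """<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://example.com/blog/new-post</loc></url>
</urlset>"""
print(urls_from_sitemap(sitemap))
```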
Once Google discovers a page's URL, it may visit (or "crawl") the page to find out what's on it. We use a huge set of computers to crawl billions of pages on the web. The program that does the fetching is called Googlebot (also known as a crawler, robot, bot, or spider). Googlebot uses an algorithmic process to determine which sites to crawl, how often, and how many pages to fetch from each site. Google's crawlers are also programmed to avoid crawling a site too fast so they don't overload it. This mechanism is based on the responses of the site (for example, HTTP 500 errors mean "slow down").
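The "slow down" behavior can be pictured with a small sketch like the one below, written with the Python standard library. Google does not publish the exact scheduling logic Googlebot uses, so the retry delays and limits here are invented purely for illustration.

```python
import time
import urllib.request
from urllib.error import HTTPError, URLError

def polite_fetch(url, base_delay=1.0, max_attempts=3):
    """Fetch a URL, backing off when the server signals overload (HTTP 5xx)."""
    delay = base_delay
    for attempt in range(max_attempts):
        try:
            with urllib.request.urlopen(url, timeout=10) as response:
                return response.read()
        except HTTPError as err:
            if 500 <= err.code < 600:
                # Server-side error such as HTTP 500: treat it as "slow down",
                # wait, and retry later with a longer delay.
                time.sleep(delay)
                delay *= 2
                continue
            return None  # 4xx and other client errors: don't retry
        except URLError:
            return None  # network-level failure: give up in this sketch
    return None

content = polite_fetch("https://example.com/")
```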
However, Googlebot doesn't crawl all the pages it has discovered. Some pages may be disallowed for crawling by the site owner, and other pages may not be accessible without logging in to the site.
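For example, a page can be disallowed for crawling through the site's robots.txt file. The sketch below uses Python's standard urllib.robotparser with a hypothetical robots.txt to show how a crawler decides whether it may fetch a URL.

```python
from urllib.robotparser import RobotFileParser

# A hypothetical robots.txt that disallows one directory for all crawlers.
robots_txt = """User-agent: *
Disallow: /private/
"""

robots = RobotFileParser()
robots.parse(robots_txt.splitlines())

# A crawler checks the rules for its user agent before fetching a page.
print(robots.can_fetch("Googlebot", "https://example.com/private/report.html"))  # False
print(robots.can_fetch("Googlebot", "https://example.com/blog/new-post"))        # True
```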
During the crawl, Google renders the page and runs any JavaScript it finds using a recent version of Chrome, similar to how your browser renders pages you visit. Rendering is important because websites often rely on JavaScript to bring content to the page, and without rendering Google might not see that content.
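As an analogy for what rendering means in practice, the sketch below drives headless Chromium through Playwright (assuming the playwright package and its Chromium browser are installed) to load a page, run its JavaScript, and read the resulting DOM. This is only a stand-in for the idea, not Google's rendering infrastructure, and the URL is a placeholder.

```python
from playwright.sync_api import sync_playwright

def rendered_html(url):
    """Load a page in headless Chromium, run its JavaScript, and return the DOM."""
    with sync_playwright() as p:
        browser = p.chromium.launch(headless=True)
        page = browser.new_page()
        page.goto(url, wait_until="networkidle")
        html = page.content()  # the DOM after scripts have run
        browser.close()
    return html

# The rendered DOM can include content that the raw HTML source does not.
print(rendered_html("https://example.com/")[:200])
```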
Crawling depends on whether Google's crawlers can access the site. Some common issues with Googlebot accessing sites include: