Crawl Stats report - Search Console Help

The Crawl Stats report shows you statistics about Google's crawling history on your website: for instance, how many requests were made and when, how your server responded, and any availability issues encountered. You can use this report to detect whether Google encounters serving problems when crawling your site.

This report is aimed at advanced users. If you have a site with fewer than a thousand pages, you should not need to use this report or worry about this level of crawling detail.

This report is available only for root-level properties. That is, the property must be either a Domain property (such as example.com or m.example.com) or a URL-prefix property at the root level (https://example.com, http://example.com, http://m.example.com).

You can reach the Crawl Stats report in Search Console by clicking Settings (Property settings) > Crawl stats.

Getting started

You should understand the following information before using this report:

About the data

Known issue: The Crawl Stats report currently reports most crawl requests, but some requests might not be counted for various reasons. We expect our coverage to increase over time to cover most, if not all, requests. Until then, you might see slight differences between your site's request logs and the numbers reported here.

The report shows the following crawl information about your site:

Click any table entry to get a detailed view for that item, including a list of example URLs; click a URL to get details for that specific crawl request. For example, in the table showing responses grouped by type, click the HTML row to see aggregated crawl information for all HTML pages crawled on your site, as well as details such as the crawl time, response code, response size, and more for an example selection of those URLs.

Hosts and child domains

If your property is at the domain level (example.com, http://example.com, https://m.example.com), and it contains two or more child domains (say fr.example.com and de.example.com), you can see data for the parent, which includes all children, or data scoped to a single child domain.

To see the report scoped to a specific child, click the child in the Hosts list on the landing page of the parent domain. Only the top 20 child domains that received traffic in the past 90 days are shown.

Example URLs

You can click into any of the grouped data type entries (response, file type, purpose, Googlebot type) to see a list of example URLs of that type.

The example URLs are a representative sample, not a comprehensive list. If you don't find a URL listed, it doesn't mean that we didn't request it. The number of examples can be weighted by day, so some types of requests might have more examples than others. This should balance out over time.

Total crawl requests

The total number of crawl requests issued for URLs on your site, whether successful or not. This includes requests for resources used by the page if these resources are on your site; requests to resources hosted outside of your site are not counted. Duplicate requests for the same URL are counted individually. If your robots.txt file is insufficiently available, the fetches that Google would otherwise have made are still counted.

Unsuccessful requests that are counted include the following:

Total download size

Total number of bytes downloaded from your site during crawling, for the specified time period. If Google cached a page resource that is used by multiple pages, the resource is only requested the first time (when it is cached).

Average response time

Average response time for all resources fetched from your site during the specified time period. Each resource linked by a page is counted as a separate response.

Host status

Host status describes whether or not Google encountered availability issues when trying to crawl your site. Status can be one of the following values:

What to look for

Ideally, your host status should be green. If your availability status is red, click to see availability details for robots.txt availability, DNS resolution, and host connectivity.

Host status details

Host availability status is assessed in the following categories. A significant error in any category can lead to a lowered availability status. Click on a category in the report to get more details.

For each category, you'll see a chart of crawl data for the time period. The chart has a dotted red line; if the metric was above the dotted line for this category (for example, if DNS resolution fails for more than 5% of requests on a given day), that is considered an issue for that category, and the status will reflect the recency of the last issue.
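The dotted-line check described above can be sketched as a simple per-day comparison. This is only an illustration: the function name, the input shape, and the sample numbers are hypothetical, and the 5% default is taken from the DNS example in the text rather than from any documented threshold for every category.

```python
def days_over_threshold(daily_counts, threshold=0.05):
    """Return the days whose failure rate is above the threshold.

    daily_counts maps a date string to (failed_requests, total_requests).
    The 5% default mirrors the DNS example in the text; it is an assumption
    that other categories use the same value.
    """
    flagged = []
    for day, (failed, total) in sorted(daily_counts.items()):
        # Skip days with no requests to avoid dividing by zero.
        if total > 0 and failed / total > threshold:
            flagged.append(day)
    return flagged


# Hypothetical sample data: (failed, total) per day.
sample = {
    "2024-01-01": (2, 100),  # 2% failed: below the line
    "2024-01-02": (8, 100),  # 8% failed: above the line
    "2024-01-03": (5, 100),  # exactly 5%: not above the line
}
print(days_over_threshold(sample))
```

Running the same check on counts derived from your own server logs may help you anticipate which days the report is likely to flag.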

More robots.txt availability details

Here is a more detailed description of how Google checks (and depends on) robots.txt files when crawling your site.

Your site is not required to have a robots.txt file, but it must return a successful response (as defined below) when asked for this file, or else Google might stop crawling your site.

Here is how Google requests and uses robots.txt files when crawling a site:

  1. Before Google crawls your site, it first checks if there's a recent successful robots.txt request (less than 24 hours old).
  2. If Google has a successful robots.txt response less than 24 hours old, Google uses that robots.txt file when crawling your site. (Remember that 404 Not Found is successful, and means that there is no robots.txt file, which means that Google can crawl any URLs on the site.)
  3. If the last response was unsuccessful or more than 24 hours old, Google requests your robots.txt file:
    • If successful, the crawl can start.
    • If not successful:
      * For the first 12 hours, Google will stop crawling your site, but will continue to request your robots.txt file.
      * From 12 hours to 30 days, Google will use the last successfully fetched robots.txt file, while still requesting your robots.txt file.
      * After 30 days:
        - If the site homepage is available, Google will act as if there is no robots.txt file, and crawl without restraints.
        - If the site homepage is not available, Google will stop crawling the site.
        - In either case, Google will continue to request your robots.txt file periodically.
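The steps above can be read as a small decision procedure. The following is a minimal sketch of that reading, not Google's implementation: the function, its parameters, and the `Action` names are hypothetical, and the exact handling of the 12-hour and 30-day boundaries is an assumption.

```python
from datetime import timedelta
from enum import Enum


class Action(Enum):
    USE_CACHED = "crawl using the cached robots.txt"
    USE_FRESH = "crawl using the newly fetched robots.txt"
    PAUSE = "stop crawling, keep requesting robots.txt"
    USE_LAST_GOOD = "crawl using the last successfully fetched robots.txt"
    NO_RESTRICTIONS = "crawl as if there is no robots.txt"


def decide(last_ok, last_age, refetch_ok, failing_for, homepage_ok):
    """Map the robots.txt state to a crawl action, following the steps above.

    last_ok:      whether the most recent robots.txt response was successful
                  (remember that 404 Not Found counts as successful)
    last_age:     timedelta since that response
    refetch_ok:   whether a new robots.txt request succeeds
    failing_for:  timedelta that robots.txt requests have been failing
    homepage_ok:  whether the site homepage is available
    """
    # Steps 1 and 2: a successful response under 24 hours old is reused.
    if last_ok and last_age < timedelta(hours=24):
        return Action.USE_CACHED
    # Step 3: otherwise the file is requested again.
    if refetch_ok:
        return Action.USE_FRESH
    if failing_for < timedelta(hours=12):
        return Action.PAUSE
    if failing_for <= timedelta(days=30):
        return Action.USE_LAST_GOOD
    # After 30 days, the outcome depends on the homepage.
    return Action.NO_RESTRICTIONS if homepage_ok else Action.PAUSE


print(decide(False, timedelta(days=2), False, timedelta(days=3), True).value)
```

In every failing branch, robots.txt requests continue in the background; the sketch models only the crawl decision.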

Any crawls that were abandoned because the robots.txt file was unavailable are counted in crawling totals. However, these crawls were not actually made, so some grouping reports (crawls by purpose, crawls by response, and so on) won't list those crawls, or may have limited information about them.

Crawl responses

This table shows the responses that Google received when crawling your site, grouped by response type, as a percentage of all crawl responses. Data is based on the total number of requests, not on the number of URLs, so if Google requested a URL twice and got Server error (500) the first time and OK (200) the second time, the response would be 50% Server error and 50% OK.
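To make the per-request counting concrete, here is a minimal sketch that reproduces the 50%/50% example. The function name and the input shape (a list of URL and response-label pairs) are hypothetical.

```python
from collections import Counter


def response_shares(requests):
    """Return the percentage of requests per response label.

    requests is a list of (url, response_label) pairs. Every request counts
    once, even when the same URL appears several times.
    """
    counts = Counter(label for _url, label in requests)
    total = sum(counts.values())
    return {label: 100 * n / total for label, n in counts.items()}


# The same URL requested twice with different outcomes.
requests = [
    ("/page", "Server error (500)"),
    ("/page", "OK (200)"),
]
print(response_shares(requests))
# {'Server error (500)': 50.0, 'OK (200)': 50.0}
```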

What to look for

Most responses should be 200 or other "Good"-type responses, unless you are doing a site reorganization or site move. See the list below to learn how to handle other response codes.

Here are some common response codes, and how to handle them:

Good response codes

These pages are fine and not causing any issues.

Possibly good response codes

These responses might be fine, but you might check to make sure that this is what you intended.

Bad response codes

You should fix pages returning these errors to improve your crawling.

Crawled file types

The file type returned by the request. The percentage value for each type is the percentage of responses of that type, not the percentage of bytes retrieved of that type.

Possible file type values:

What to look for

If you are seeing availability issues or slow response rates, check this table to get a feel for what types of resources Google is crawling, and why this might be slowing down your crawl. Is Google requesting many small images that should be blocked? Is Google requesting resources that are hosted on another, less responsive site? Click into different file types to see a chart of average response time by date, and number of requests by date, to see if spikes in slow responses of that type correspond to spikes in general slowness or unavailability.

Crawl purpose

If you have rapidly changing pages that are not being recrawled often enough, ensure that they are included in a sitemap. For pages that update less rapidly, you might need to specifically ask for a recrawl. If you recently added a lot of new content, or submitted a sitemap, you should ideally see a bump in discovery crawls on your site.

Googlebot type

The type of user agent used to make the crawl request. Google has a number of user agents that crawl for different reasons and have different behaviors.

Possible Googlebot type values:

If you are having crawling spikes, check the user agent type. If the spikes seem to be caused by the AdsBot crawler, see Why did my crawl rate spike.

Troubleshooting

Crawl rate too high

Googlebot has algorithms to prevent it from overloading your site during crawling. However, if you need to limit the crawl rate for some reason, learn how to do so here.

Why did my crawl rate spike?

If you put up a bunch of new information, or have some really useful information on your site, you might be crawled a bit more than you'd like. For example:

If your site is being crawled so heavily that your site is having availability issues, here is how to protect it:

  1. Determine which Google crawler is overcrawling your site. Look at your website logs or use the Crawl Stats report.
  2. Immediate relief:
    • If you want a simple solution, use robots.txt to block crawling for the overloading agent (googlebot, adsbot, etc.). Note that this can take up to a day to take effect. Don't block for too long, though, as this can have long-term effects on your crawling.
    • If you're able to detect and respond to increased load dynamically, return HTTP 503/429 when you're nearing your serving limit. Be sure not to return 503 or 429 for more than two or three days, though, or it can signal Google to crawl your site less frequently in the long term.
  3. Two or three days later, when Google's crawl rate has adapted, you can remove your robots.txt blocks or stop returning 503 or 429 error codes.
  4. If you are being overwhelmed by AdsBot crawls, the problem is likely that you have created too many targets for Dynamic Search Ads on your site using URL_Equals or page feeds. If you don't have the server capacity to handle these crawls you should either limit your ad targets, add URLs in smaller batches, or increase your serving capacity. Note that AdsBot will crawl your pages every 2 weeks, so you will need to fix the issue or it will recur.
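The dynamic option in step 2 (returning 503 when you are nearing your serving limit) can be sketched as follows. This is an illustration under assumptions: both function names, the 90% cutoff, and the way load is measured are hypothetical, and the two-day limit simply takes the lower end of the "two or three days" guidance above.

```python
from datetime import datetime, timedelta


def status_for_load(active_requests, capacity, shed_at=0.9):
    """Return 503 when load is near the serving limit, otherwise 200.

    shed_at is the fraction of capacity at which shedding starts; the 0.9
    default is an arbitrary choice for this sketch. 429 could be returned
    instead of 503.
    """
    if capacity <= 0:
        raise ValueError("capacity must be positive")
    return 503 if active_requests / capacity >= shed_at else 200


def shedding_too_long(started, now, limit=timedelta(days=2)):
    """Return True once error responses have been served for longer than limit."""
    return now - started > limit


print(status_for_load(50, 100))   # 200
print(status_for_load(95, 100))   # 503
print(shedding_too_long(datetime(2024, 1, 1), datetime(2024, 1, 4)))  # True
```

A check like `shedding_too_long` could trigger an alert so that the temporary measure is replaced by a capacity fix before it affects long-term crawling.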

Crawl rate seems too low

You can't tell Google to increase your crawl rate. However, you can learn more about how to manage your crawling for very large or frequently updated websites.

For small or medium websites, if you find that Google isn't crawling all of your site, try updating your website's sitemaps, and make sure that you're not blocking any pages.

Why did my crawl rate drop?

In general, your Google crawl rate should be relatively stable over the time span of a week or two; if you see a sudden drop, here are a few possible reasons:

Report crawling totals are much higher than your site's server logs totals

If the total crawl count shown in this report is much higher than Google crawling requests in your server logs, this can occur when Google cannot crawl your site because your robots.txt file is unavailable for too long. When this happens, Google counts crawls that it might have made if your robots.txt file were available, but doesn't actually make those calls. Check your robots.txt fetching status to confirm if this is the issue.
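To compare the report against your own logs, you first need a per-day count of Google's requests. The sketch below assumes access logs in the common "combined" format and matches only the `Googlebot` user-agent token; the function name and sample lines are hypothetical, and other Google crawlers (AdsBot, for example) identify themselves with different tokens and would need their own match.

```python
import re
from collections import Counter

# Captures the date from a combined-log timestamp such as
# [10/Oct/2023:13:55:36 +0000]
DATE_RE = re.compile(r"\[(\d{2}/\w{3}/\d{4}):")


def googlebot_requests_per_day(lines):
    """Count log lines per day whose text contains the Googlebot token."""
    per_day = Counter()
    for line in lines:
        if "Googlebot" not in line:
            continue
        match = DATE_RE.search(line)
        if match:
            per_day[match.group(1)] += 1
    return per_day


# Hypothetical sample log lines.
UA = "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
sample_lines = [
    f'66.249.66.1 - - [10/Oct/2023:13:55:36 +0000] "GET / HTTP/1.1" 200 512 "-" "{UA}"',
    f'66.249.66.1 - - [10/Oct/2023:14:02:10 +0000] "GET /a HTTP/1.1" 200 900 "-" "{UA}"',
    '203.0.113.7 - - [10/Oct/2023:14:03:00 +0000] "GET / HTTP/1.1" 200 512 "-" "Mozilla/5.0"',
]
print(googlebot_requests_per_day(sample_lines))
```

If the daily totals in the report are far above these counts on days when robots.txt was failing, that is consistent with the explanation above.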
