Crawl budget basics: Why Google isn’t indexing your pages—and what to do about it

As a marketer, you’ve spent hours adding value to your website. Now imagine a visitor drops by regularly to check what’s new and decide what’s worth showing in Google Search.

That visitor? It’s called Googlebot, and it’s the crawler responsible for discovering and indexing your content. It scans your pages to decide what should be included in Google Search and how often to return for updates.


But Googlebot doesn’t have unlimited resources to always crawl in-depth. Each site gets a set crawl budget, or an allowance of time and bandwidth for Googlebot to spend exploring your site.

The more efficiently you use your crawl budget, the easier it is for Googlebot to find and prioritize your most valuable content, which can help you rank.

Let’s start with the basics: What is crawl budget, and why does it matter?

What is crawl budget (and why does it matter)?

Crawl budget is the limit that Googlebot has for how many pages it’s willing to “crawl” on your website in a given timeframe.

Think of Googlebot as having a set amount of time and energy each day to explore your site. It flips through your site’s pages, deciding what to read and what to skip.

If your site has 10,000 URLs but Googlebot only has the energy to crawl 2,000 today, it has to prioritize. And you want it to prioritize the right things because without guidance, Googlebot might waste time on low-value pages.


Instead of indexing your latest blog post or your new campaign landing page, it could get stuck crawling 300 nearly identical filter URLs.

Let’s say you run an online shop with 6,000 pages. Now imagine half of those pages are variations—color filters, size options, slight duplicates.

To a customer, those variations are useful. But to Googlebot, they’re mostly the same.

So while it’s busy crawling those near-identical filter variations, it might skip pages like your new product launches, refreshed category pages, or latest blog posts.

Even if the content is ready, the most important pages might not be crawled—or indexed—soon enough. All because your crawl budget was spent elsewhere.

Crawlability vs. crawl budget: What’s the difference?

Crawlability and crawl budget sound similar, but they’re not the same thing.

Both matter because without access and priority, even your best pages can go unseen by Google and never show up in search.

1. Crawlability = Access

Crawlability answers a simple question: Can Googlebot access this page?

If the answer is no, it won’t crawl the page, no matter how important it is.

Example: You block a page in your robots.txt file. The page still exists, but Googlebot sees that block as a “Do not enter” sign.

It skips the page entirely, freeing up crawl budget for other areas.

2. Crawl budget = Priority and choice

Crawl budget comes after crawlability.

It’s no longer “Can I crawl this page?”—it’s:

“Do I have the time and energy to crawl this page soon?”

Even if a page is crawlable, Googlebot might decide it’s not worth its limited attention right now.

Example: You’ve got a crawlable event page from 2017 that’s still live. It isn’t blocked, but it’s outdated and gets no traffic.

Googlebot might think:

“Hmm. Not urgent. I’ll come back to it… eventually.”

So even though the page is crawlable, it might go untouched for months.

Crawlability vs. crawl budget isn’t an either/or choice. You need both working together.

If a page isn’t crawlable, it won’t be discovered.

If it’s crawlable but low priority, it might be ignored until it’s too late.

This helps show how they’re related, but not interchangeable.


Why crawl budget matters—and when it actually applies to your site

If Googlebot hasn’t crawled your page, it can’t rank it.

It might not even know it exists—or worse, it could be showing an outdated version in search results.

Your crawl budget decides whether Google sees your page and when, which has everything to do with your chances of showing up (and showing up well) in search.

For example, if you launch a new product page that hasn’t been crawled, it won’t appear in search. Or if you’ve updated pricing across service pages but Googlebot hasn’t had a chance to recrawl, users might still see outdated prices in the SERP.

This is where crawl budget gets serious.

When crawl budget becomes a real concern

While crawl budget affects every site, it’s especially critical for large sites with thousands of URLs, ecommerce stores with faceted navigation, publishers posting time-sensitive content, and sites that add or update pages frequently.

If Googlebot can’t keep up, your most important or time-sensitive content might be the very thing that gets missed.

Running a smaller site?

Larger sites are harder to manage, including from a crawl perspective. If your site has fewer than 500–1,000 indexable URLs, crawl budget likely isn’t your main issue. Googlebot can typically handle small and mid-sized sites with ease, reaching every part of your site.

In these cases, focus on what’s blocking indexing, not crawling. Common culprits include stray noindex tags, canonical tags pointing at other URLs, and thin or duplicate content.


Pro tip: Use the Pages report in Google Search Console to see which URLs are excluded and why. You might spot indexability problems faster than expected.


How Google calculates your crawl budget

Google looks at two main factors when deciding what, and how much, to crawl:

  1. Crawl Demand: How much Google wants to crawl from your site.
  2. Crawl Capacity Limit: How much your server can handle without performance issues.

Let’s look at what shapes them.

What drives crawl demand

Crawl demand reflects how valuable or fresh Google thinks your content is. With limited resources, it prioritizes pages that seem worth its time.

Here’s what affects that demand: how popular your URLs are around the web, how often your content actually changes (Google tries not to recrawl stale pages too often), and site-wide events like a migration, which can trigger a burst of recrawling.

What limits Google from crawling your site

Even if Google wants to crawl everything, it won’t if your site shows signs of instability. Crawl capacity generally comes down to two things: your site’s crawl health (slow responses, timeouts, and server errors make Googlebot back off) and Google’s own crawling resource limits.

Crawl signals: How to influence what Googlebot prioritizes

Google doesn’t just crawl everything on your site equally. It prioritizes pages that seem valuable, updated, or in demand.

Several signals influence whether and how often Google crawls a page. Some say “skip this,” while others flag content as important.

Signals that influence crawl budget

So, what exactly tells Google whether to pay attention to a page or skip it?

These signals behind the scenes shape how your crawl budget gets spent.

Robots.txt


This is a simple text file that sits in the root of your website. It tells Googlebot what not to crawl.

So if you block a page here, Google won’t waste any crawl budget trying to reach it. It’ll just move on.

Example: You might block your admin login page or thank-you pages after a form is submitted.
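
As a minimal sketch, that might look like the following robots.txt (the paths are placeholders for your own URLs):

```
# robots.txt lives at the site root, e.g. https://www.example.com/robots.txt
# Paths below are illustrative; swap in your own.
User-agent: *
Disallow: /admin/
Disallow: /thank-you/
```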

Noindex tags

This is a bit different. A noindex tag tells Google, “You can crawl this page, but don’t show it in search results.”

Google might still crawl it, but if it sees that noindex signal over time, it might decide not to crawl it much at all, since it’s not useful for search.

Example: A staging version of a landing page that’s not ready to go live.
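
In practice, noindex is usually a meta tag in the page’s <head> (it can also be sent as an X-Robots-Tag HTTP header). A minimal example:

```html
<!-- In the <head> of the page you want kept out of search results -->
<meta name="robots" content="noindex">
```

One caveat: for Google to see this tag, the page must not be blocked in robots.txt. A blocked page never gets crawled, so the noindex directive is never read.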

Canonicals

Canonicals tell Google which version of similar pages to treat as primary, preventing crawl budget waste across duplicates. So if you’ve got loads of near-identical versions (like product filters or UTM-tagged URLs), a canonical says: “Hey, treat this version as the real deal.”

If you have five filtered product pages for “pink shoes under $20,” but they all show similar items, you can set a canonical tag to point back to the main “pink shoes” page.

That way, you’re not wasting crawl budget on all the lookalikes.
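
As a sketch, each filtered URL would carry a canonical link element in its <head> pointing at the main category page (the URLs here are hypothetical):

```html
<!-- On /shoes?color=pink&maxprice=20 and similar filtered variants -->
<link rel="canonical" href="https://www.example.com/shoes/pink/">
```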

Sitemap entries

A sitemap is like a treasure map of your site. It tells Google: “These are all the key pages I want you to know about.”

If your sitemap is clean, well-structured, and updated regularly, it’s like giving Googlebot a guided tour.

Make sure your sitemap includes your blog posts, main product pages, and key categories—not broken pages or expired URLs.
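
For reference, a minimal XML sitemap entry looks like this (the URL and date are placeholders):

```xml
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>https://www.example.com/blog/crawl-budget-basics/</loc>
    <lastmod>2025-01-15</lastmod>
  </url>
</urlset>
```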

Internal linking depth

This just means: how many clicks does it take to get to a page from your homepage? If it takes six to seven clicks to find a page, Google might think: “This page must not be that important since it’s not easily accessible for customers.”

Example: Pages linked directly from your homepage, footer, or main menu tend to get crawled more than ones buried deep inside subfolders.

Quick comparison:

| Signal | What it tells Google | Crawl budget effect |
|---|---|---|
| Robots.txt | “Don’t crawl this” | Saves budget outright |
| Noindex tag | “Crawl it, but don’t index it” | Crawling often tapers off over time |
| Canonical | “Treat this version as the primary one” | Consolidates budget onto one URL |
| Sitemap entry | “These pages matter” | Aids discovery and prioritization |
| Internal linking depth | “Shallow, well-linked pages are important” | Pages closer to the homepage get crawled more often |

What wastes crawl budget (and how to fix it)

Think of it like this: Googlebot is flipping through the pages of your website with limited energy. The more it wastes on low-value pages, the less it spends on your top content.

Before we get into the biggest crawl budget wasters, it’s worth running a quick site audit to see if any of these issues are already showing up on your site.

So let’s look at the biggest offenders and how to spot and stop them.

1. Duplicate pages

These are different URLs that show the exact same or very similar content. Think HTTP vs. HTTPS versions of a page, URLs with and without trailing slashes, or UTM-tagged links.

All those pages might look the same to a person, but to Googlebot? They’re separate pages. So it reads the same content over and over.

Exhausting, right?

Why it’s a problem: Google is spending energy crawling versions of the same thing instead of using that energy on new or updated content.

How to fix it: Add canonical tags pointing to the primary version, and consolidate true duplicates with 301 redirects.

Think of canonicals as a gentle nudge saying, “Hey, this version’s the one that matters.”

These are pages that no longer exist but still appear in your internal links or XML sitemaps.

Examples: A deleted product page that still lives in your sitemap or a blog link that returns a vague “Sorry, page not found” message (aka a soft 404).

Why it’s a problem: Google will keep trying to visit these pages like knocking on a door that’s not there. Over and over.

A total waste of time.

How to fix it: Remove or update internal links that point to deleted pages, keep your XML sitemap free of dead URLs, and make sure missing pages return a proper 404 (or a 301 redirect to a relevant replacement, as sketched below) instead of a soft 404.

Think of it like tidying up the hallways so Google doesn’t keep bumping into locked doors.
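
For deleted pages that have a close replacement, a permanent redirect keeps Googlebot (and visitors) from dead-ending. A sketch in nginx config, with placeholder paths:

```nginx
# Send a retired product URL to its replacement with a 301
location = /products/old-widget {
    return 301 /products/new-widget;
}
```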

3. Orphan pages

These include pages that exist, but nothing links to them. They’re floating around your site with no clear way in, almost like a ghost floating around your website.

Example: An old blog post from 2019 that has no links from your homepage, no category page, and no tags. Just… lost.

Why it’s a problem: Google might stumble on it eventually, but it’s using crawl budget on a page that’s not helping your site in any way.

How to fix it: Link to these pages from a relevant category page, blog post, or menu, add them to your XML sitemap, or remove them if they no longer serve a purpose.

No one likes to be left out in the cold. Help Google find your content with proper links.

4. Faceted navigation

These endless combinations of filters or sort orders—think size, color, price, category—generate thousands of slightly different URLs.

Examples: /shoes?color=pink&size=7, /shoes?size=7&color=pink&sort=price-asc, and every other ordering of the same filters.

Why it’s a problem: Googlebot gets stuck in a loop. It keeps crawling tiny variations in URL parameters showing the same products, wasting budget on pages that offer nothing new.

How to fix it: Block crawl-wasting filter parameters in robots.txt (see the sketch below), canonicalize filtered pages to the main category page, and avoid linking to endless filter combinations in the first place.

Think of this as closing the door on an endless maze. By doing so, you’re helping Google get to the good stuff faster.
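
Here’s what blocking filter parameters might look like in robots.txt. This is a sketch with placeholder parameter names; Googlebot supports the * wildcard used here, but test carefully, because overly broad rules can hide pages you do want crawled:

```
# Keep Googlebot out of endless filter combinations (illustrative)
User-agent: Googlebot
Disallow: /*?*sort=
Disallow: /*?*color=
```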

How do you check crawl activity?

Once you understand crawl budget, the next step is monitoring it. Google Search Console (GSC) gives you direct insight into how Googlebot interacts with your site.

This tool gives you a behind-the-scenes look at how Google is crawling your site: how often it visits, what it requests, and whether your server is keeping up.

We’ll walk through where to find this info and what each part means.

1. GSC crawl stats overview

To get started, head over to your GSC property, open Settings from the left-hand menu, and click “Open report” next to Crawl stats.


You’ll now be in the Crawl Stats report. This is where the good stuff lives.

From here, you’ll get a 90-day snapshot of Google’s crawl activity across your site, including any red flags or changes worth noting. Think of it as a little health check for your crawl budget.

How do you know if you’re hitting your crawl budget limit?

One common sign is a high number of pages in Google Search Console marked as “Discovered – currently not indexed” or “Crawled – currently not indexed.”

These signals suggest Google knows the pages exist, but hasn’t prioritized them for crawling or indexing yet.


Pro tip: If these messages show up often and your site has thousands of URLs, it’s a strong sign your crawl budget needs attention.


2. Over-time charts (aka Google’s crawl timeline)


Right at the top, you’ll see a visual chart of crawl activity over the last 90 days. This helps you spot patterns or sudden drops or spikes in crawling. And underneath the chart, you’ll see three key stats: total crawl requests, total download size, and average response time.

3. Host status

This part shows you how well your site is handling Google’s crawling, especially from a technical or server perspective.

If everything’s smooth, you’ll see something like: “Hosts are healthy.”

If not, you might get a warning like: “Hosts had problems in the past.”

Click into the box to find more details. You’ll see Google’s recent checks for robots.txt availability, DNS resolution, and server connectivity.

Why it matters: If Google can’t reach your site reliably, it’ll crawl less often. You’ll want to address any of these issues quickly.

4. Crawl requests breakdown

This is the really meaty bit. Google breaks down what it’s crawling, how, and why. You’ll see four handy categories: by response (200, 404, and so on), by file type (HTML, image, CSS), by purpose (discovery vs. refresh), and by Googlebot type (smartphone, desktop, and others).

Clicking into any item shows you specific pages that match that type, like which URLs returned a 404 or which ones were crawled by a specific bot.

Google Search Console gives you the basics straight from the source.

For enterprise or ecommerce websites with tens of thousands of URLs, consider running a crawl budget audit using tools like Semrush Log File Analyzer, Botify, or OnCrawl.

These help uncover how Googlebot behaves over time, where it’s spending crawl budget, where it’s dropping off, and which sections of your site may be undercrawled. You can quickly pinpoint opportunities for crawl budget optimization.
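
If you want a quick first look before reaching for those tools, raw server logs can show where Googlebot spends its time. Here’s a minimal sketch in Python, assuming a combined-format access log at a hypothetical path (user agents can be spoofed, so verify real Googlebot traffic via reverse DNS before acting on the numbers):

```python
import re
from collections import Counter

LOG_PATH = "/var/log/nginx/access.log"  # hypothetical path; adjust for your server

# Capture the request path from log lines whose user agent mentions Googlebot
line_re = re.compile(r'"(?:GET|POST) (\S+) HTTP/[^"]*".*Googlebot')

sections = Counter()
with open(LOG_PATH, encoding="utf-8", errors="replace") as log:
    for line in log:
        match = line_re.search(line)
        if match:
            path = match.group(1)
            # Bucket by first path segment: /blog/my-post -> /blog
            top = path.lstrip("/").split("/", 1)[0].split("?")[0]
            sections["/" + top] += 1

# Show the ten site sections Googlebot hits most often
for section, hits in sections.most_common(10):
    print(f"{hits:6d}  {section}")
```

If your top-converting sections barely show up in this list, that’s exactly the gap the pro tip below describes.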


Pro tip: Use log file data to compare crawl activity against revenue-driving URLs. If top-converting pages aren’t getting regular crawls, you’ve got an optimization opportunity.


Want to see what Google’s seeing?

You don’t need to master crawl budget today, but it does play a key role in how your content gets discovered and ranked. When search engines focus on the right pages, you’re more likely to show up where it counts.

Crawl budget helps Google prioritize your most valuable content. Make sure it’s working in your favor.

Start by checking what’s already visible. Use our SERP Checker to see which pages are ranking and which ones aren’t. This can help you spot missed opportunities and make your digital marketing efforts more effective.

Search Engine Land is owned by Semrush. We remain committed to providing high-quality coverage of marketing topics. Unless otherwise noted, this page’s content was written by either an employee or a paid contractor of Semrush Inc.