
How to write and submit a robots.txt file

You can control which files crawlers may access on your site with a robots.txt file.

A robots.txt file lives at the root of your site. So, for site www.example.com, the robots.txt file lives at www.example.com/robots.txt. robots.txt is a plain text file that follows the Robots Exclusion Standard. A robots.txt file consists of one or more rules. Each rule blocks or allows access for all or a specific crawler to a specified file path on the domain or subdomain where the robots.txt file is hosted. Unless you specify otherwise in your robots.txt file, all files are implicitly allowed for crawling.

Here is a simple robots.txt file with two rules:

User-agent: Googlebot
Disallow: /nogooglebot/

User-agent: *
Allow: /

Sitemap: https://www.example.com/sitemap.xml

Here's what that robots.txt file means:

  1. The user agent named Googlebot is not allowed to crawl any URL that starts with https://example.com/nogooglebot/.
  2. All other user agents are allowed to crawl the entire site. This rule could have been omitted and the result would be the same; the default behavior is that user agents are allowed to crawl the entire site.
  3. The site's sitemap file is located at https://www.example.com/sitemap.xml.

See the syntax section for more examples.

Basic guidelines for creating a robots.txt file

Creating a robots.txt file and making it generally accessible and useful involves four steps:

  1. Create a file named robots.txt.
  2. Add rules to the robots.txt file.
  3. Upload the robots.txt file to the root of your site.
  4. Test the robots.txt file.

You can use almost any text editor to create a robots.txt file. For example, Notepad, TextEdit, vi, and emacs can create valid robots.txt files. Don't use a word processor; word processors often save files in a proprietary format and can add unexpected characters, such as curly quotes, which can cause problems for crawlers. If you're prompted to choose an encoding when saving the file, make sure to choose UTF-8.
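If you want a quick sanity check of the saved file, the following sketch is one way to do it. It's a hypothetical helper (not one of Google's tools) that assumes a file named robots.txt in the current directory, confirms it decodes as UTF-8, and flags curly quotes that word processors often insert:

from pathlib import Path

data = Path("robots.txt").read_bytes()

# Raises UnicodeDecodeError if the file isn't valid UTF-8.
text = data.decode("utf-8")

# Flag typographic ("curly") quotes, a common word-processor artifact.
for lineno, line in enumerate(text.splitlines(), start=1):
    if any(ch in line for ch in "\u2018\u2019\u201c\u201d"):
        print(f"line {lineno}: contains curly quotes: {line!r}")

print("robots.txt decoded as UTF-8")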

Format and location rules:

  1. The file must be named robots.txt.
  2. Your site can have only one robots.txt file.
  3. The robots.txt file must be located at the root of the host it applies to. For example, to control crawling on all URLs below https://www.example.com/, the robots.txt file must live at https://www.example.com/robots.txt; it can't be placed in a subdirectory (for example, at https://example.com/pages/robots.txt).
  4. A robots.txt file applies only to paths within the protocol, host, and port where it is posted; it can be posted on a subdomain (for example, https://site.example.com/robots.txt).
  5. A robots.txt file must be a UTF-8 encoded plain text file.
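The location rule above can also be expressed programmatically. This short sketch (a hypothetical helper using only Python's standard library, not part of Google's documentation) derives the robots.txt URL that governs any given page URL:

from urllib.parse import urlsplit, urlunsplit

def robots_txt_url(page_url: str) -> str:
    # The robots.txt file is always looked up at the root of the
    # page's protocol and host.
    parts = urlsplit(page_url)
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))

print(robots_txt_url("https://www.example.com/shop/item?id=1"))
# prints: https://www.example.com/robots.txt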

How to write robots.txt rules

Rules are instructions for crawlers about which parts of your site they can crawl. Follow these guidelines when adding rules to your robots.txt file:

  1. A robots.txt file consists of one or more groups of rules.
  2. Each group begins with one or more User-agent lines that name the crawlers the group applies to, followed by the allow and disallow rules for those crawlers.
  3. Unless you specify otherwise, a crawler may access any page or directory that isn't blocked by a disallow rule.
  4. Rules are case-sensitive. For example, Disallow: /file.asp applies to https://www.example.com/file.asp, but not to https://www.example.com/FILE.asp.
  5. The # character marks the beginning of a comment.

Google's crawlers support the following rules in robots.txt files: user-agent, allow, disallow, and sitemap. The examples below show how user-agent lines group rules for specific crawlers.

Example 1: Block only Googlebot

User-agent: Googlebot
Disallow: /

Example 2: Block Googlebot and Adsbot

User-agent: Googlebot
User-agent: AdsBot-Google
Disallow: /

Example 3: Block all crawlers except AdsBot (AdsBot crawlers must be named explicitly)

User-agent: *
Disallow: /

All rules, except sitemap, support the * wildcard for a path prefix, suffix, or entire string.

Lines that don't match any of these rules are ignored.

Read our page about Google's interpretation of the robots.txt specification for the complete description of each rule.

Upload the robots.txt file

Once you've saved your robots.txt file to your computer, you're ready to make it available to search engine crawlers. There's no single tool that can help you with this, because how you upload the robots.txt file to your site depends on your site and server architecture. Get in touch with your hosting company or search its documentation; for example, search for "upload files infomaniak".

After you upload the robots.txt file, test whether it's publicly accessible and whether Google can parse it.

Test robots.txt markup

To test whether your newly uploaded robots.txt file is publicly accessible, open a private browsing window (or equivalent) in your browser and navigate to the location of the robots.txt file, for example https://example.com/robots.txt. If you see the contents of your robots.txt file, you're ready to test the markup.
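You can also do this check from a script. The sketch below (the URL is a placeholder for your own site) uses Python's standard library to fetch the file and print the HTTP status and contents:

from urllib.request import urlopen

URL = "https://example.com/robots.txt"  # replace with your site's robots.txt URL

with urlopen(URL) as resp:
    # A 200 status means the file is publicly reachable.
    print("HTTP status:", resp.status)
    print(resp.read().decode("utf-8"))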

Google offers two options for testing robots.txt markup:

  1. The robots.txt report in Search Console. You can only use this report for robots.txt files that are already accessible on your site.
  2. If you're a developer, check out and build Google's open source robots.txt library, which is also used in Google Search. You can use this tool to test robots.txt files locally on your computer; a lighter-weight check using Python's standard library is sketched below.
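For a rough local check that doesn't require building anything, Python's standard library ships urllib.robotparser. It does not reproduce all of Google's matching behavior (for example, wildcard handling differs), so treat this as a quick sketch rather than an authoritative test. The rules below are the ones from the example file earlier on this page:

import urllib.robotparser

robots_txt = """\
User-agent: Googlebot
Disallow: /nogooglebot/

User-agent: *
Allow: /
"""

parser = urllib.robotparser.RobotFileParser()
parser.parse(robots_txt.splitlines())

# Googlebot is blocked from /nogooglebot/, but other paths stay crawlable.
print(parser.can_fetch("Googlebot", "https://www.example.com/nogooglebot/page.html"))  # False
print(parser.can_fetch("Googlebot", "https://www.example.com/page.html"))              # True
print(parser.can_fetch("Otherbot", "https://www.example.com/nogooglebot/page.html"))   # True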

Submit robots.txt file to Google

Once you've uploaded and tested your robots.txt file, Google's crawlers will automatically find and start using your robots.txt file. You don't have to do anything. If you updated your robots.txt file and you need to refresh Google's cached copy as soon as possible, learn how to submit an updated robots.txt file.

Useful robots.txt rules

Here are some common useful robots.txt rules:

Disallow crawling of the entire site. Keep in mind that in some situations URLs from the site may still be indexed, even if they haven't been crawled.

User-agent: *
Disallow: /

Disallow crawling of a directory and its contents. Append a forward slash to the directory name to disallow crawling of a whole directory.

User-agent: *
Disallow: /calendar/
Disallow: /junk/
Disallow: /books/fiction/contemporary/

Allow access to a single crawler. Only Googlebot-news may crawl the whole site.

User-agent: Googlebot-news
Allow: /

User-agent: *
Disallow: /

Allow access to all but a single crawler. Unnecessarybot may not crawl the site; all other bots may.

User-agent: Unnecessarybot
Disallow: /

User-agent: *
Allow: /

Disallow crawling of a single web page. For example, disallow the useless_file.html page located at https://example.com/useless_file.html, and other_useless_file.html in the junk directory.

User-agent: *
Disallow: /useless_file.html
Disallow: /junk/other_useless_file.html

Disallow crawling of the whole site except a subdirectory. Crawlers may only access the public subdirectory.

User-agent: *
Disallow: /
Allow: /public/

Block a specific image from Google Images. For example, disallow the dogs.jpg image.

User-agent: Googlebot-Image
Disallow: /images/dogs.jpg

Block all images on your site from Google Images. Google can't index images and videos without crawling them.

User-agent: Googlebot-Image
Disallow: /

Disallow crawling of files of a specific file type. For example, disallow crawling of all .gif files.

User-agent: Googlebot
Disallow: /*.gif$

Disallow crawling of an entire site, but allow Mediapartners-Google. This implementation hides your pages from search results, but the Mediapartners-Google web crawler can still analyze them to decide what ads to show visitors on your site.

User-agent: *
Disallow: /

User-agent: Mediapartners-Google
Allow: /

Use the * and $ wildcards to match URLs that end with a specific string. For example, disallow all .xls files.

User-agent: Googlebot
Disallow: /*.xls$