How to write a Robots.txt file (2023)

Learning how to write a robots.txt file for your website is part of basic SEO hygiene. It helps ensure that the content you want search engines to see and index actually is.

This is why I included checking over the file in my technical SEO checklist.

In this tutorial, I am going to show you how to write a robots.txt file.

I also included a section on what a robots.txt file is and why you should have one. You are welcome to skip it if you just need the instructions.

Step 1: Open a text editor and create a file named robots.txt

First, open a text editor. Any one will do; if you are using a Mac, a good one is TextEdit. Once you have your file, save it as plain text and name it robots.txt, including the .txt extension.

Step 2: In the text file include a User-agent, Disallow rule, and Sitemap reference

Once your file is created and named, you need to include a few directives and rules. First, you need a user-agent. The user-agent line names the web crawler that the rules beneath it apply to.

What is User-agent

A “star” or * after User-agent: means that the rules beneath it apply to all search crawlers.

Here’s what it may look like:

User-agent: *

But if you want to target a specific search crawler, then you would add another user-agent line with the name of that crawler.

Here are some common crawler names:

Google – Googlebot

Bing – Bingbot

Yahoo! – Slurp

DuckDuckGo – DuckDuckBot

A line for a specific crawler may look like this:

User-agent: Bingbot

Generally speaking, specific crawlers are only specified if the webmaster wishes to limit access to a website for a particular crawler.
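One way to see how user-agent groups behave is to feed a small file to Python's standard-library urllib.robotparser. This is only a sketch under assumptions: the example.com URL, the /author/ rule, and the choice to block Bingbot from the whole site with Disallow: / are made up for illustration and are not a recommendation for your site.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical file: one group for every crawler, one group only for Bingbot.
ROBOTS_TXT = """\
User-agent: *
Disallow: /author/

User-agent: Bingbot
Disallow: /
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

url = "https://example.com/blog/"  # placeholder URL

# Googlebot has no group of its own, so it falls back to the * group.
googlebot_allowed = parser.can_fetch("Googlebot", url)

# Bingbot matches its own group, which blocks the whole site.
bingbot_allowed = parser.can_fetch("Bingbot", url)

print(googlebot_allowed, bingbot_allowed)
```

Note that the lines of a group are kept together with no blank line between them; this parser treats a blank line as the end of a group.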

Next, you want to include an allow and disallow rule.

What are the Allow and Disallow Rules

The Disallow Rule tells crawlers which web folders or subdirectories they should not visit, that is, which ones they are “blocked from”. It can also be used to specify which crawlers should not visit the website at all, as I mentioned earlier.

You may also see an Allow Rule; this is the opposite of a disallow rule, as it tells the search crawlers which subdirectory they can visit inside another, blocked directory.

Essentially with the allow rule, a webmaster can allow crawlers to visit a specific secondary directory, even though the top-level directory may be blocked by the disallow rule.

For example, say I specify in the Disallow: rule that I want to block crawlers from visiting the /author/ top-level directory on my website, and in the Allow: rule that crawlers are able to visit the /archives/ second-level directory inside the /author/ directory. Crawlers would then skip everything under /author/ except /author/archives/.
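The Allow and Disallow pair described above can be checked with Python's standard-library urllib.robotparser. Treat this as a rough sketch: the example.com URLs and the page paths under /author/ are hypothetical. Python's parser applies the first rule that matches, so the Allow: line is listed before the Disallow: line here; crawlers such as Googlebot instead pick the most specific matching rule, so for them the order does not matter.

```python
from urllib.robotparser import RobotFileParser

# Block /author/ but carve out /author/archives/ as an exception.
ROBOTS_TXT = """\
User-agent: *
Allow: /author/archives/
Disallow: /author/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# Hypothetical URLs used only to exercise the rules.
archives_ok = parser.can_fetch("Googlebot", "https://example.com/author/archives/2023/")
author_ok = parser.can_fetch("Googlebot", "https://example.com/author/some-writer/")
blog_ok = parser.can_fetch("Googlebot", "https://example.com/blog/")

print(archives_ok, author_ok, blog_ok)
```

The last check shows that a path no rule mentions stays crawlable.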

It is not uncommon to see a robots.txt file that contains a disallow rule, but not an allow rule. The reason is that, generally speaking, the allow rule is implied: anything that is not blocked by a Disallow: rule can be crawled by default.

What I mean by that is, if a Disallow: rule is specified without an Allow: rule, the web crawler is being told that the whole directory is blocked from crawling, and that everything outside it is open.

Unless the webmaster wants a specific directory inside the blocked directory to be crawled, the Allow: rule is often left out.

For example, if I include only a disallow rule and specify the /author/ directory, that whole directory is blocked from crawling.

Additionally, you are able to specify multiple directories that you want to block or allow to be crawled. For instance, if I want to disallow multiple folders, I would simply add another disallow rule:

Disallow: /admin/

Disallow: /about/

As you can see in the code snippet example, I wanted to block both my /admin/ and /about/ directories. The same thing can be done for the allow rule. If I wanted to allow a lower-level directory to be crawled in each of these directories, my file would look like this:

Allow: /admin/plugins/

Allow: /about/bio-pages/

Disallow: /admin/

Disallow: /about/
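If you maintain a longer list of directories, the same lines can be generated rather than typed by hand. The helper below is a hypothetical sketch, not part of any library: build_group is a name I made up, and it simply writes the Allow: lines before the Disallow: lines, matching the order used above.

```python
def build_group(user_agent, disallow, allow=()):
    """Return one robots.txt group (user-agent plus its rules) as text."""
    lines = [f"User-agent: {user_agent}"]
    # Exceptions first, then the directories they are carved out of.
    lines += [f"Allow: {path}" for path in allow]
    lines += [f"Disallow: {path}" for path in disallow]
    return "\n".join(lines) + "\n"


robots_txt = build_group(
    "*",
    disallow=["/admin/", "/about/"],
    allow=["/admin/plugins/", "/about/bio-pages/"],
)
print(robots_txt)
```

The printed text can be pasted into the robots.txt file as one group.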

One last thing that I want to point out: it is very important to put a forward slash, or /, before and after each directory name that you are blocking or allowing to be crawled.

As shown below:

/directory-name/

The leading slash is needed because every path in a rule starts from the root of the site; without it, crawlers may ignore the rule. The trailing slash makes it clear to crawlers that a “folder” or directory is being specified. Rules match by prefix, so without the trailing slash, /directory-name would also match paths such as /directory-name-old/.
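To see what the slashes around a directory name change in practice, here is a small check with Python's standard-library urllib.robotparser. It is a sketch under assumptions: is_allowed is a hypothetical helper, the example.com URLs are placeholders, and the results describe how this particular parser matches paths, which is by simple prefix.

```python
from urllib.robotparser import RobotFileParser


def is_allowed(rule_path, url):
    """Check url against a one-rule file: Disallow: <rule_path> for all crawlers."""
    parser = RobotFileParser()
    parser.parse(["User-agent: *", f"Disallow: {rule_path}"])
    return parser.can_fetch("Googlebot", url)


other_dir = "https://example.com/administrator/"

# With the trailing slash, only the /admin/ directory is covered.
with_trailing = is_allowed("/admin/", other_dir)

# Without it, the prefix /admin also matches /administrator/.
without_trailing = is_allowed("/admin", other_dir)

# Without the leading slash, the rule never matches a real path here.
without_leading = is_allowed("admin/", "https://example.com/admin/")

print(with_trailing, without_trailing, without_leading)
```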

Lastly, you need to include a sitemap reference:

What is a sitemap reference

The Sitemap Reference tells crawlers the full URL of your sitemap.

To include the sitemap reference, type Sitemap: followed by the URL of the sitemap.

For my website Bounce Rank, the line would look like this:

Sitemap: https://bouncerank.com/sitemap_index.xml

And that is the last component you need to include in the file.
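Before uploading, you can confirm that the sitemap line is being picked up by parsing the finished file with Python's standard-library urllib.robotparser. This sketch assumes Python 3.8 or newer, where site_maps() is available, and the /author/ rule in the sample file is only an illustration.

```python
from urllib.robotparser import RobotFileParser

# Illustrative file: one group of rules, then the sitemap reference.
ROBOTS_TXT = """\
User-agent: *
Disallow: /author/

Sitemap: https://bouncerank.com/sitemap_index.xml
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# site_maps() returns a list of sitemap URLs, or None if there are none.
sitemaps = parser.site_maps()
print(sitemaps)
```

The Sitemap: line is not tied to any user-agent group, so it can sit on its own at the top or bottom of the file.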

Step 3: Save the file in the root directory of your website

The last step is to save the file to your website’s root directory. You can do this via cPanel if you built and host the website yourself, but if you are using Wix, Webflow, Squarespace, or Shopify, there will be a dashboard where you can add the file’s contents. For WordPress, a plugin such as Yoast SEO can help you do it.
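Saving the file in the root directory means crawlers will look for it at /robots.txt on your domain, no matter which page they start from. The snippet below is a minimal sketch of that idea using Python's standard-library urljoin; robots_url is a hypothetical helper and the blog post path is made up.

```python
from urllib.parse import urljoin


def robots_url(page_url):
    """Return the robots.txt address for the site that serves page_url."""
    # A path starting with / replaces the whole path of the page URL.
    return urljoin(page_url, "/robots.txt")


location = robots_url("https://bouncerank.com/blog/some-post/")
print(location)
```

Opening that address in a browser after uploading is a quick way to confirm the file landed in the right place.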

Last words about Robots.txt File

Despite the intricacies of the robots.txt file, a website is not required to have one. However, it is generally best practice to include one.

The reason is that the file gives webmasters the opportunity to tell search engines which folders they should and should not crawl. With that control over what is crawled, search crawlers can better prioritize content that matters to users and avoid wasting crawl resources on content that is not relevant.

Raj Clark

Raj Clark is an SEO professional with 9 years of experience and a career mentor. He is also the author of the books ABC's of SEO: Search Engine Optimization 101 and The Technical SEO Handbook. He has worked with a wide range of clients in many industries including B2B, SaaS, Fintech, Home improvement, Medical, and E-Commerce. He started the company, Bounce Rank, as a way to help business owners grow their website traffic and to help people who want to get a job in the SEO career field.