Learning how to write a robots.txt file for your website is part of basic SEO hygiene. It helps ensure that the content you want search engines to see and index actually gets crawled.
This is why I included checking over the file in my technical SEO checklist.
In this tutorial, I am going to show you how to write a robots.txt file. Below I created a video showing how to do it for WordPress using Yoast SEO, but the method I show is applicable to any website. You can watch the video below, or you can follow the written instructions.
I also included a section covering what a robots.txt file is and why you should have one; feel free to skip over it if you just need the instructions.
What is a robots.txt file?
Pioneered by ALIWEB creator Martijn Koster in 1994, the Robots Exclusion Protocol, now commonly known as the robots.txt file, is a plain text file generally stored on a website at the URL path /robots.txt.
It was invented as a way to tell search engine web crawlers which directories or web folders on the website they can access. The file can also be used to specify which folders crawlers are "blocked" from, meaning they should not access them.
In a nutshell, the robots.txt file contains "directives", "rules", and "references", which can be thought of as requests to search crawlers regarding the directories you wish to have visited or ignored.
I must emphasize that it is only a request, and there is no guarantee a search crawler will "obey" it.
Why should you have a robots.txt file?
You should have a robots.txt file because it tells search engines which page paths they are allowed to crawl. This can come in handy if you are doing a website migration and you don't want search engines visiting your new website before they are supposed to.
How to write a robots.txt file
Okay, now that you have a good idea of what a robots.txt file is, let us explore how to create one for your website.
Step 1: Open a text editor and name the file robots.txt
First, open a text editor; any one will do. If you are using a Mac, a good one is TextEdit. Once you have your file, name it robots.txt, with the .txt extension.
Step 2: In the text file include a User-agent, Disallow rule, and Sitemap reference
Once your file is created and named, you now need to include a few directives and rules. First, you need a user-agent. The user-agent defines the specific web crawler that you wish to direct or allow to visit your website.
What is a User-agent?
A "star" or asterisk (*) next to the user-agent indicates that the rules that follow apply to all search crawlers.
Here’s what it may look like:
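A minimal sketch of the wildcard form (an empty Disallow value means nothing is blocked, so every crawler may visit the whole site):

```
User-agent: *
Disallow:
```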
But if you want to address a specific search crawler, you would add another User-agent line with the name of the crawler you wish to add.
Here are some common crawler codes:
Google – Googlebot
Bing – Bingbot
Yahoo! – Slurp
DuckDuckGo – DuckDuckBot
To include a specific crawler in the file, it may look like this:
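For example, a file that addresses Google's crawler specifically might look like this (the blocked path here is a placeholder, not from my actual file):

```
User-agent: Googlebot
Disallow: /private/
```

With rules like these, only Googlebot is told to stay out of /private/; other crawlers are unaffected unless they have their own rules.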
Generally speaking, specific crawlers are only specified if the webmaster wishes to limit access to a website for a particular crawler.
Next, you want to include an allow and disallow rule.
What are the Allow and Disallow rules?
The Disallow rule tells crawlers which web folders or subdirectories they should not visit, or are "blocked" from. It can also be used to specify that certain crawlers should not visit the website at all, as I mentioned earlier.
You may also see an Allow rule; this is the opposite of a Disallow rule, as it tells search crawlers which subdirectory they can visit inside another, blocked subdirectory.
Essentially with the allow rule, a webmaster can allow crawlers to visit a specific secondary directory, even though the top-level directory may be blocked by the disallow rule.
Here is an example:
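A sketch of that pattern, using the /author/ and /archives/ paths described below:

```
User-agent: *
Disallow: /author/
Allow: /author/archives/
```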
As you can see, I specified in the Disallow: rule that I want to block crawlers from visiting the /author/ top-level directory on my website, but with the Allow: rule I specified that crawlers are able to visit the /archives/ second-level directory that sits inside the /author/ directory.
It is not uncommon to see a robots.txt file that contains a Disallow rule but not an Allow rule. The reason is that, generally speaking, the Allow rule is implied.
What I mean is that if a Disallow: rule is specified without an Allow: rule, the web crawler understands that the whole directory is blocked from crawling.
Unless the webmaster wants to open up a specific directory inside the blocked directory, the Allow rule is often left out.
In the example below, I included only a Disallow rule, specifying that I want the /author/ directory blocked from crawling.
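A minimal version of that file might look like:

```
User-agent: *
Disallow: /author/
```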
Additionally, with these rules you are able to specify multiple directories that you want blocked from or allowed to be crawled. For instance, if I wanted to disallow multiple folders, I would simply add another Disallow rule:
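Sketching the multi-directory case with the same placeholder paths:

```
User-agent: *
Disallow: /author/
Disallow: /admin/
Disallow: /about/
```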
As you can see in the code snippet example, I also wanted to block my /admin/ and /about/ directories. The same thing can be done with the Allow rule; if I wanted to allow a lower-level directory to be crawled inside each of these directories, my file would look like this:
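One way this could look, using hypothetical lower-level paths for illustration:

```
User-agent: *
Disallow: /author/
Allow: /author/archives/
Disallow: /admin/
Allow: /admin/tools/
Disallow: /about/
Allow: /about/team/
```

The /archives/, /tools/, and /team/ subfolders are placeholders; substitute the directories you actually want opened up for crawling.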
One last thing I want to point out: it is important to include the slashes around a directory name, one before it and one after, when you are blocking or allowing a folder, as in /author/.
As shown below:
The trailing slash makes it clear to crawlers that a "folder" or directory is being specified in the rule. Without it, the rule would match any page path that merely begins with that text, not just the directory.
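For example, both of these rules are syntactically valid, but they match differently:

```
Disallow: /author/
Disallow: /author
```

The first blocks only the /author/ directory and everything inside it; the second also blocks any page whose path simply starts with /author, such as /author-bios.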
Lastly, you need to include a sitemap reference:
What is a sitemap reference?
The Sitemap Reference tells crawlers the URL, or page path, to get to the sitemap.
As you can see in the example below, to include the sitemap reference you can simply type the full URL of the sitemap:
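In general form, with a placeholder domain:

```
Sitemap: https://www.example.com/sitemap.xml
```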
For my own website, Bounce Rank, the line follows the same format, pointing at my sitemap's full URL.
And that is the last component you need to include in the file.
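Putting the pieces together, a complete robots.txt file in the style of this tutorial might read (the paths and domain are placeholders):

```
User-agent: *
Disallow: /author/
Allow: /author/archives/

Sitemap: https://www.example.com/sitemap.xml
```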
Step 3: Save the file in the root directory of your website
The last step is to save the file to your website's root directory. You can do this via cPanel if you have a self-hosted website, but if you are using Wix, Webflow, Squarespace, or Shopify, there will be a dashboard where you can add the file. For WordPress, a plugin such as Yoast SEO can help you do it.
Last words about the robots.txt file
Despite the intricacies of the robots.txt file, it is not strictly required on a website. However, it is generally best practice to include one.
The reason is that the file gives webmasters the opportunity to tell search engines which folders they should and should not crawl. By controlling what gets crawled, you help search engines prioritize indexing the content that matters to users and avoid wasting crawl resources on content that is not relevant.