Google’s Updates To Robots.txt: What SEOs Need To Know

4th July 2019 | Botify News

If you’ve been following Google’s recent updates, you’ll likely already be aware that they’ve made a few robots.txt-related announcements. There are a few different components to these updates, so we wanted to break down what they are, why they matter, and how they affect you.

Definitions cheat sheet

You may find it helpful to familiarize yourself with these definitions before diving in!

Term: Definition
Robots Exclusion Protocol (REP): Created by Martijn Koster in 1994 to tell crawlers which parts of a website should and should not be accessed.
Internet Standard: Defines protocols and procedures for the internet.
Internet Engineering Task Force (IETF): An open, international community of individuals dedicated to the smooth operation of the internet. They produce technical documents that describe Internet Standards.
Open Source: Not proprietary; code that is freely available and can be redistributed or modified.
Request for Comments (RFC): Documents authored by engineers and computer scientists that describe methods and concepts, often for the purpose of being adopted by the IETF as an Internet Standard.

Google wants to make REP an official internet standard

On July 1, 2019, Google announced that they had worked together “with the original author of the protocol, webmasters, and other search engines” to document how the REP should be used on the modern web so they could submit it to the IETF and get it approved as an official Internet Standard.

The draft they created doesn’t change the original REP rules. Instead, drawing on 20 years of real-world experience with robots.txt, it outlines specific scenarios and makes the protocol applicable to the modern web.

Why is this significant? A few reasons:

This news doesn’t change anything about how robots.txt files should be formatted, but rather gives clearer direction.

View the IETF spec here.
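To make the basic REP format concrete, here is a minimal sketch that evaluates a small robots.txt file with Python's standard-library urllib.robotparser. This is not Google's parser, and its matching may differ from Google's in edge cases. The file contents, the "examplebot" and "otherbot" user agents, and the example.com URLs are all made up for illustration.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: one group for all crawlers, one for "examplebot".
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/

User-agent: examplebot
Disallow: /forum/
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if user_agent may fetch url under the given robots.txt."""
    parser = RobotFileParser()
    # parse() takes an iterable of lines, so no network request is made.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for agent in ("examplebot", "otherbot"):
        for path in ("/forum/thread-1", "/private/report"):
            url = "https://example.com" + path
            print(agent, path, is_allowed(ROBOTS_TXT, agent, url))
```

A crawler that has its own group follows only that group, so in this sketch "examplebot" is blocked from /forum/ but not from /private/, while every other crawler falls back to the "*" group.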

Google makes its robots.txt parser open source

On the same day as the REP news, Google announced that its robots.txt parser is now open source. They explained that, while attempting to make REP an internet standard was an important step, it also meant extra work for developers who parse robots.txt files. In response, Google open sourced the library that they use to parse robots.txt files.

Why is this significant? A few reasons:

Want the open source robots.txt parser? Find it on GitHub!

Google ditches unsupported robots.txt rules

The very next day, July 2, Google released more information on robots.txt. This time the update focused on unsupported rules. They said that open-sourcing their parser library allowed them to take a closer look at how robots.txt rules were being used, specifically focusing on usages that weren’t supported by the internet draft. Those included:

An example of a robots.txt file with a noindex rule.

They found that, when rules like noindex were used in robots.txt files, they contradicted other on-site rules “in all but 0.001% of all robots.txt files on the internet.” These types of conflicting signals can affect a website’s performance in search results in ways webmasters never intended.

Because unsupported robots.txt rules often contradict other rules, and in preparation for future open source releases, Google is retiring all code that handles unsupported and unpublished rules on September 1, 2019.
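If you want to check whether your own file relies on these rules before the cutoff, a rough audit could look like the sketch below. It is a simple line scanner, not Google's parser. The rule list is an assumption: noindex is the rule discussed above, while crawl-delay and nofollow are the other unsupported rules named in Google's announcement. The sample file and the find_unsupported_rules helper are hypothetical.

```python
# Rules assumed to be unsupported: noindex (discussed above), plus
# crawl-delay and nofollow as named in Google's announcement.
UNSUPPORTED_RULES = {"noindex", "nofollow", "crawl-delay"}

# Hypothetical robots.txt used only for illustration.
SAMPLE_ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
Noindex: /forum*
Crawl-delay: 10  # seconds
"""


def find_unsupported_rules(robots_txt: str) -> list[tuple[int, str, str]]:
    """Return (line number, rule name, value) for each unsupported rule."""
    findings = []
    for lineno, raw in enumerate(robots_txt.splitlines(), start=1):
        # Drop comments, then skip lines that are not "field: value" pairs.
        line = raw.split("#", 1)[0].strip()
        if ":" not in line:
            continue
        field, value = line.split(":", 1)
        field = field.strip().lower()  # rule names are case-insensitive
        if field in UNSUPPORTED_RULES:
            findings.append((lineno, field, value.strip()))
    return findings


if __name__ == "__main__":
    for lineno, rule, value in find_unsupported_rules(SAMPLE_ROBOTS_TXT):
        print(f"line {lineno}: unsupported rule '{rule}' with value '{value}'")
```

Any line this flags is worth reviewing, since the paths it covers are the ones whose behavior in search could change once the rule stops being honored.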

Why is this significant? A few reasons:

If you’re using this unsupported solution, we recommend monitoring activity on your robots.txt noindexed pages in September. For example, if you used robots.txt to noindex /forum*, you can use Botify to monitor page activity specifically in that segment (active pages being those that have generated at least one organic visit within the last 30 days).

If you’d like to learn more about how Botify can help you monitor your site after changes like these happen (and they happen often!), book a demo with us. We’d love to show you around!
