Welcome to the second article in our series on Crawl Budget and SEO. We explained Crawl Budget in Part 1, following Google’s acknowledgment of something many SEOs have long known: that Crawl Budget is real and SEOs can *and should* optimize it.
In Part 2 of the series, we’re discussing Crawl Ratio – an important SEO KPI pioneered by Botify.
Crawl Ratio is the percentage of unique URLs in your website structure that have been crawled by a search engine robot. SEOs are often most concerned about Googlebot, so for the purposes of this article, we’ll primarily focus on Google and Googlebot.
“But wait,” you might ask, “Doesn’t Google just crawl entire websites by default?” Based on our review of log files from thousands of websites, we find that it’s uncommon for Google to crawl an entire website within a 30-day period.
If Google isn’t crawling all of your content, then you may be missing opportunities to get indexed and drive traffic. No crawl – no index – no rank – no traffic.
So how can you determine Crawl Ratio for your website? You need to have two data sets at minimum: a crawl of your website structure, joined with Googlebot requests for your pages from your server logs. Much has been written about Log File Analysis for SEO, but for those who aren’t familiar with viewing log files, here is an image of part of a Googlebot request for a URL from a log file.
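For readers who prefer text to a screenshot, here is a minimal sketch of pulling the requested URL out of a single Googlebot log line. It assumes your server writes the common "combined" access log format. The sample line is made up for illustration, and `LOG_PATTERN` and `googlebot_path` are hypothetical names. Note that matching on the user agent alone can be fooled by spoofed bots, so treat this as a first pass rather than verification.

```python
import re

# Assumed layout (combined log format):
# host ident user [time] "method path protocol" status bytes "referer" "user-agent"
LOG_PATTERN = re.compile(
    r'^(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+)[^"]*" '
    r'(?P<status>\d{3}) \S+ "[^"]*" "(?P<agent>[^"]*)"'
)


def googlebot_path(line):
    """Return the requested path if the line looks like a Googlebot hit, else None."""
    match = LOG_PATTERN.match(line)
    if match is None:
        return None  # line does not follow the assumed format
    if "Googlebot" not in match.group("agent"):
        return None  # some other visitor or bot
    return match.group("path")


# Made-up sample line (documentation-reserved IP address, invented path).
sample = (
    '192.0.2.1 - - [01/Jan/2017:00:00:01 +0000] '
    '"GET /category/shoes HTTP/1.1" 200 5120 "-" '
    '"Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"'
)

print(googlebot_path(sample))  # /category/shoes
```

Running this over every line in a log and collecting the non-empty results into a set gives the list of URLs Googlebot requested in that period.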
It’s not enough to simply divide the number of pages crawled by Google by the number of pages in your website. That would produce a deceptive metric, because search engines are not limited to crawling just what’s in your site structure. They may crawl URLs that are:
To get a definitive metric, you need to check, page by page, whether each URL in your structure has been crawled by Google.
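As a rough sketch of that page-by-page comparison, assume you already have two collections of normalized URLs: one from a crawl of your site structure and one from Googlebot requests in your logs. The function name `crawl_ratio` and the sample URLs are hypothetical, not part of any Botify tooling.

```python
def crawl_ratio(structure_urls, googlebot_urls):
    """Percentage of URLs in the site structure that Googlebot also requested."""
    structure = set(structure_urls)
    if not structure:
        return 0.0  # avoid dividing by zero on an empty crawl
    # Only count Googlebot hits on URLs that are actually in the structure.
    crawled_in_structure = structure & set(googlebot_urls)
    return 100.0 * len(crawled_in_structure) / len(structure)


# Hypothetical data: 4 URLs in the structure, 4 URLs requested by Googlebot,
# but only 2 of the requested URLs belong to the structure.
structure = ["/", "/a", "/b", "/c"]
googlebot = ["/", "/a", "/old-page", "/a?sort=price"]

naive = 100.0 * len(set(googlebot)) / len(set(structure))
print(naive)                              # 100.0 (deceptive)
print(crawl_ratio(structure, googlebot))  # 50.0
```

The naive division reports a perfect score here, while the set intersection shows that half of the structure has not been crawled at all.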
Below are real examples of Crawl Ratios from several different types of websites in different industries.
Small publisher of about 10.7k URLs
Large site of at least 10 million URLs (10 million crawled)
Large e-commerce site with 10.1 million URLs (10 million crawled)
Understanding your sitewide Crawl Ratio is a great starting point, but Crawl Ratio can vary significantly in different parts of your website. Segmenting your URLs can make it much easier to understand where you have suboptimal crawl. Using the same examples as above, we can see major variation in Crawl Ratio between page segments. (All charts taken from the Botify report: Search Engines tab > Google > Conversion).
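If you want to approximate this kind of segmentation yourself, one simple approach is to group URLs by their first path directory and compute a ratio per group. This is only a sketch under that assumption. Real segments are often defined by page type or template rather than by directory, and `segment_of` and `crawl_ratio_by_segment` are hypothetical helper names.

```python
from collections import defaultdict
from urllib.parse import urlsplit


def segment_of(url):
    """Assumed segmentation rule: the first path directory, or 'home' for the root."""
    parts = [p for p in urlsplit(url).path.split("/") if p]
    return parts[0] if parts else "home"


def crawl_ratio_by_segment(structure_urls, googlebot_urls):
    """Crawl Ratio (in percent) for each segment of the site structure."""
    crawled = set(googlebot_urls)
    totals = defaultdict(int)
    hits = defaultdict(int)
    for url in set(structure_urls):
        segment = segment_of(url)
        totals[segment] += 1
        if url in crawled:
            hits[segment] += 1
    return {segment: 100.0 * hits[segment] / totals[segment] for segment in totals}


# Hypothetical data for illustration.
structure = ["/", "/products/1", "/products/2", "/products/3", "/products/4",
             "/blog/a", "/blog/b"]
googlebot = ["/", "/products/1", "/blog/a", "/blog/b"]

for segment, ratio in sorted(crawl_ratio_by_segment(structure, googlebot).items()):
    print(segment, ratio)
# blog 100.0
# home 100.0
# products 25.0
```

In this made-up example the sitewide ratio looks moderate, but the breakdown shows the shortfall is concentrated entirely in the product pages.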
Large E-Commerce Site
There are many reasons Google might not crawl your entire site, some of which include:
Below is an example of how some key metrics might relate to Crawl Ratio for a site with an overall high ratio. You can see the crawled URLs tend to be shallower, faster, and have content that is more unique than those not crawled. Most importantly, crawled pages have nearly all the traffic.
Knowing whether Google is completely crawling your website is just a starting point. You still need to know about the rest of the SEO funnel: are crawled pages being indexed, are they ranking, and are they getting visits?
It’s difficult to know exactly which pages are indexed unless they are getting organic visits, which is why we’ve created the active pages ratio, a subject we’ll revisit in a future article in our ongoing Crawl Budget series, along with topics such as:
Please leave your questions or feedback about crawl ratio for SEO in the comments below, and stay tuned for our next post on metrics relevant to Crawl Budget!
Did you miss our first post? Check it out here: Google Confirms SEOs Should Control Their Crawl Budget