Welcome to the second article in our series on Crawl Budget and SEO. We recently explained Crawl Budget in Part 1, after Google’s acknowledgment of something many SEOs have long known: that Crawl Budget is real and that SEOs can *and should* optimize it.
In Part 2 of the series, we're discussing Crawl Ratio - an important SEO KPI pioneered by Botify.
What Is Crawl Ratio For SEO?
Crawl Ratio is the percentage of unique URLs in your website structure that have been crawled by a search engine robot. SEOs are often most concerned about Googlebot, so for the purposes of the article we’ll primarily focus on Google or Googlebot.
“But wait,” you might ask, “doesn't Google just crawl entire websites by default?” Based on our review of server log files from thousands of websites, we find it's uncommon for Google to crawl an entire website within a month (30 days).
Why It Matters If Google Isn’t Crawling Your Whole Website
If Google isn't crawling all of your content, then you may be missing opportunities to get indexed and drive traffic. No crawl - no index - no rank - no traffic.
So how can you determine Crawl Ratio for your website? You need at least two data sets: a crawl of your website structure, and the Googlebot requests for your pages recorded in your server logs, joined on URL. Much has been written about Log File Analysis for SEO, but for those who aren't familiar with viewing log files, here is an image of part of a Googlebot request for a URL from a log file.
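As a rough illustration of the log side of that join, the sketch below pulls the paths requested by Googlebot out of raw log lines. It assumes the server writes the common Apache/Nginx "combined" log format; the helper name `googlebot_paths` and the two sample lines are invented for this example and are not real traffic.

```python
import re

# Pattern for the "combined" access log format (an assumption about your server setup).
COMBINED_LOG = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" '
    r'(?P<status>\d{3}) \S+ "[^"]*" "(?P<agent>[^"]*)"'
)

def googlebot_paths(log_lines):
    """Return the set of paths requested by a user agent that claims to be Googlebot."""
    paths = set()
    for line in log_lines:
        match = COMBINED_LOG.match(line)
        if match and "Googlebot" in match.group("agent"):
            paths.add(match.group("path"))
    return paths

# Synthetic log lines made up for this example.
sample_log = [
    '192.0.2.10 - - [01/Mar/2017:10:00:00 +0000] "GET /products/blue-widget HTTP/1.1" '
    '200 5120 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"',
    '198.51.100.7 - - [01/Mar/2017:10:00:05 +0000] "GET /about HTTP/1.1" '
    '200 2048 "-" "Mozilla/5.0 (Windows NT 10.0; Win64; x64)"',
]

crawled = googlebot_paths(sample_log)
print(crawled)  # {'/products/blue-widget'}
```

Matching on the user-agent string alone is a simplification: the string can be spoofed, so a production pipeline would typically also verify the requesting IP, for example with a reverse DNS lookup.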
It’s not enough to simply divide the number of pages crawled by Google by the number of pages in your website. That would produce a misleading metric, because search engines are not limited to crawling what's in your site structure. They may also crawl:
- URLs listed in your XML sitemaps
- Orphan URLs that used to be linked in the site
- Alternate URLs such as AMP HTML pages or a mobile subdomain
- Static resources like .pdf files
- Invalid URLs linked from other websites or caused by errors in your own site
To get a reliable metric, you need to check, page by page, whether each URL in your structure has been crawled by Google.
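The per-page comparison can be sketched as a set intersection. This is a minimal illustration rather than Botify's implementation: the function name `crawl_ratio` and the sample URLs are hypothetical, and it assumes both data sets use the same normalized URL form.

```python
def crawl_ratio(structure_urls, crawled_urls):
    """Share of URLs in the site structure that Googlebot requested."""
    structure = set(structure_urls)
    crawled = set(crawled_urls)
    crawled_in_structure = structure & crawled
    ratio = len(crawled_in_structure) / len(structure) if structure else 0.0
    return {
        "crawl_ratio": ratio,
        "not_crawled": structure - crawled,                # in structure, never requested
        "crawled_outside_structure": crawled - structure,  # orphans, sitemap-only URLs, etc.
    }

# Hypothetical data: 4 URLs found by the site crawl, 5 URLs seen in Googlebot log entries.
structure_urls = ["/", "/products/a", "/products/b", "/blog/x"]
crawled_urls = ["/", "/products/a", "/blog/x", "/old-page", "/sitemap-only"]

report = crawl_ratio(structure_urls, crawled_urls)
print(report["crawl_ratio"])  # 0.75

# The naive count-based division overstates coverage because it counts
# crawled URLs that are not part of the structure at all.
naive = len(set(crawled_urls)) / len(set(structure_urls))
print(naive)  # 1.25
```

In this example the naive figure exceeds 100% even though one structural page was never crawled, which is why the comparison has to be done URL by URL.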
Below are real examples of Crawl Ratios from several different types of websites in different industries.
Small publisher of about 10.7k URLs
Large site of at least 10 million URLs (10m crawled)
Large e-commerce site with 10.1 million URLs (10m crawled)
Crawl Ratio Varies By Page Type
Understanding your sitewide Crawl Ratio is a great starting point, but Crawl Ratio can vary significantly in different parts of your website. Segmenting your URLs can make it much easier to understand where you have suboptimal crawl. Using the same examples as above, we can see major variation in Crawl Ratio between page segments. (All charts taken from the Botify report: Search Engines tab > Google > Conversion).
Large E-Commerce Site
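One simple way to approximate this kind of segmentation outside of a reporting tool is to group URLs by their first path component and compute a ratio per group. The sketch below is an assumption-laden illustration: real segmentation rules are usually richer than a single path component, and `segment_of`, `crawl_ratio_by_segment`, and the sample URLs are hypothetical.

```python
from collections import defaultdict
from urllib.parse import urlsplit

def segment_of(url):
    """Use the first path component as the segment; the root page becomes 'home'."""
    parts = [p for p in urlsplit(url).path.split("/") if p]
    return parts[0] if parts else "home"

def crawl_ratio_by_segment(structure_urls, crawled_urls):
    """Crawl Ratio for each segment of the site structure."""
    crawled = set(crawled_urls)
    totals = defaultdict(int)
    hits = defaultdict(int)
    for url in set(structure_urls):
        segment = segment_of(url)
        totals[segment] += 1
        if url in crawled:
            hits[segment] += 1
    return {segment: hits[segment] / totals[segment] for segment in totals}

# Hypothetical site structure and Googlebot hits.
structure_urls = ["/", "/products/a", "/products/b", "/blog/x", "/blog/y", "/blog/z"]
crawled_urls = ["/", "/products/a", "/products/b", "/blog/x"]

ratios = crawl_ratio_by_segment(structure_urls, crawled_urls)
for segment, ratio in sorted(ratios.items()):
    print(f"{segment}: {ratio:.0%}")
```

Here the sitewide ratio would look reasonable, while the per-segment view shows that the blog pages are the ones being missed.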
Why Wouldn’t Google Crawl My Entire Website?
There are many reasons Google might not crawl your entire site, some of which include:
- You’ve purposely limited accessibility via your robots.txt file
- You have URLs for which all links to them include a rel=nofollow, or all links appear on pages with meta robots nofollow
- You forgot to link to a page, or you link to it too rarely to signal its importance (see step 2 here)
- Google deems the pages low value or low quality. Fortunately, Google provides explicit examples of what is meant by low value:
  a. Faceted navigation and session identifiers
  b. On-site duplicate content
  c. Soft error pages
  d. Hacked pages
  e. Infinite spaces and proxies
  f. Low quality and spam content
- Your crawl rate is limited because pages load too slowly, your servers are returning 5xx errors (unavailable), or you have set a lower crawl rate in Google Search Console
- Google attempts to keep the more popular pages from going stale, so prioritizes them over those with less demand
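The first cause in the list above, robots.txt rules, is straightforward to check for your uncrawled URLs. The sketch below uses Python's standard `urllib.robotparser`; the robots.txt content, the example.com URLs, and the helper name `blocked_by_robots` are made up for illustration. Note that the standard-library parser uses simple prefix matching, so its results may differ from Googlebot's own behavior, particularly for wildcard rules.

```python
from urllib.robotparser import RobotFileParser

def blocked_by_robots(robots_txt, urls, user_agent="Googlebot"):
    """Return the URLs that the given robots.txt content disallows for user_agent."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [url for url in urls if not parser.can_fetch(user_agent, url)]

# Hypothetical robots.txt content and uncrawled URLs.
robots_txt = """\
User-agent: *
Disallow: /search/
"""
uncrawled = [
    "https://www.example.com/search/?q=shoes",
    "https://www.example.com/blog/old-post",
]

blocked = blocked_by_robots(robots_txt, uncrawled)
print(blocked)  # ['https://www.example.com/search/?q=shoes']
```

URLs that are uncrawled but not blocked, like the blog post here, point to one of the other causes in the list, such as weak internal linking or low crawl demand.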
Below is an example of how some key metrics might relate to Crawl Ratio for a site with a high overall ratio. You can see the crawled URLs tend to be shallower and faster, and to have more unique content, than those not crawled. Most importantly, crawled pages receive nearly all the traffic.
Conclusion And Next Steps
Knowing whether Google is completely crawling your website is just a starting point. You still need to know about the rest of the SEO funnel: are crawled pages being indexed, are they ranking, and are they getting visits?
It’s difficult to know exactly which pages are indexed unless they are getting organic visits, which is why we’ve created the active pages ratio. We’ll revisit it in a future article in our ongoing Crawl Budget series, along with topics such as:
- Improving Crawl Budget by reducing access to low-value URLs
- Impact of site migration on Crawl Budget
- How to identify the relationship between URL popularity and crawl demand
- How to identify whether performance is inhibiting your Crawl Ratio
Please leave your questions or feedback about crawl ratio for SEO in the comments below, and stay tuned for our next post on metrics relevant to Crawl Budget!
Did you miss our first post? Check it out here: Google Confirms SEOs Should Control Their Crawl Budget