If a page is published on the web and Googlebot does not crawl it, will it get indexed and rank?
No. Every SEO knows this. “Crawling is the entry point for sites into Google’s search results,” Gary Illyes reaffirmed recently on the Google Webmaster Central Blog.
More valuable in Gary's blog post was its confirmation of what many SEOs have long experienced: despite appearances, Google has finite resources, so Googlebot has a crawl budget for each website, and that budget can be optimized to improve indexing and grow organic traffic.
In light of the importance of this public confirmation, consider this the introduction to a series on understanding and optimizing crawl budget.
Let’s start by visualizing some crawl budget concepts, including crawl ratio, crawl depth, time to crawl completeness, and crawl frequency.
Fact: Google doesn’t crawl 100% of your website. While this was recently confirmed in relation to Crawl Budget, even in 2009 Google acknowledged it could only find a percentage of the content online and that crawling should be optimized.
Remember the basics: there is an SEO funnel with Google’s crawl as a critical step on the path to organic visits.
To identify Crawl Ratio, you need to crawl the website and join that data with server log files processed and analyzed from an SEO point of view. Doing so tells you exactly what percentage of a website's pages are being crawled by Google.
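Conceptually, that join is a set intersection: the URLs found in the site structure versus the URLs Googlebot requested in the logs. Here is a minimal sketch in Python; the URLs are hypothetical, and in practice the log side would come from parsing access logs and verifying Googlebot hits (e.g. by reverse DNS), not from a hand-written set.

```python
# Sketch: computing Crawl Ratio by joining a site crawl with server logs.
# All URLs below are hypothetical examples.

# URLs discovered by crawling the site's internal links
site_urls = {
    "/", "/category/shoes", "/category/shoes/page-2",
    "/product/red-sneaker", "/product/blue-boot", "/orphaned-old-page",
}

# URLs Googlebot requested in last month's server logs
# (in practice, parsed from access logs and verified as genuine Googlebot)
googlebot_urls = {
    "/", "/category/shoes", "/product/red-sneaker", "/product/blue-boot",
}

# Crawl Ratio: share of structure URLs that Googlebot actually crawled
crawled = site_urls & googlebot_urls
crawl_ratio = len(crawled) / len(site_urls)
print(f"Crawl Ratio: {crawl_ratio:.0%}")  # 4 of 6 URLs -> 67%
```

The same join also surfaces the complement sets that matter for diagnosis: structure URLs Googlebot never visited, and log-only URLs Googlebot crawled that are no longer linked in the site.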
Crawl Ratio will vary by website, based on the many factors listed in the Google blog post. Below are a few examples from different types of websites.
The left column shows the percent of URLs in the site structure (found by crawling links in the website) that were also crawled by Googlebot in the preceding month. The right column shows the proportion of the URLs in the site structure that actually had organic visits from Google in that same period.
The range of crawl ratios represented here, from 31% up to 73%, underscores that Googlebot isn’t crawling entire websites in any given period and that even when crawled, not all the pages are driving traffic. This can be optimized!
SEOs know it’s important to have a well-organized site in which the most important content is easily accessible from the homepage and other important entry points. That means having URLs the shortest distance possible from the homepage via internal links.
Crawl Ratio varies significantly by the level of depth. In this chart, it’s clear that of the URLs crawled based on the site’s internal links, Google crawled diminishing percentages the deeper those URLs were in the site.
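Depth here means the minimum number of clicks from the homepage via internal links, which is a breadth-first search over the link graph. The sketch below assigns depths that way and reports Crawl Ratio per level; the link graph and Googlebot hits are hypothetical.

```python
from collections import deque

# Sketch: depth = clicks from the homepage, assigned via BFS over
# internal links; then Crawl Ratio is measured per depth level.
# The link graph and Googlebot hits are hypothetical examples.

links = {
    "/": ["/a", "/b"],
    "/a": ["/a/1", "/a/2"],
    "/b": ["/b/1"],
    "/a/1": ["/a/1/x"],
}
googlebot_urls = {"/", "/a", "/b", "/a/1"}

# BFS from the homepage assigns each URL its minimum click depth
depths = {"/": 0}
queue = deque(["/"])
while queue:
    url = queue.popleft()
    for child in links.get(url, []):
        if child not in depths:
            depths[child] = depths[url] + 1
            queue.append(child)

# Crawl Ratio per depth level typically shrinks as depth grows
for depth in sorted(set(depths.values())):
    level = [u for u, d in depths.items() if d == depth]
    crawled = sum(1 for u in level if u in googlebot_urls)
    print(f"depth {depth}: {crawled}/{len(level)} crawled")
```

In this toy graph, everything at depths 0 and 1 is crawled, but only one of three URLs at depth 2 and nothing at depth 3, mirroring the diminishing percentages in the chart.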
If Google isn’t crawling your entire website, then the next questions are:
The dotted line in this chart shows how many potentially indexable (Compliant) URLs are found by crawling links in the site structure. The green bar sums the number of those Compliant URLs Google crawled each day while the yellow bar shows how much of the crawl went to Non-Compliant (or non-indexable) URLs linked in the site structure.
Compliance, or indexability, is just one view into what Googlebot is crawling. Another way to view crawl completeness is by pagetype, which will be the subject of a future article in this series.
According to Gary’s blog post, two important factors in Google’s crawl budget are popularity and keeping content from going stale in the index. So one indicator of how important Google thinks it is to keep your content fresh in its index is the frequency with which it crawls those URLs.
This can be measured in terms of the number of days with at least one crawl by Googlebot in a month: Crawl Frequency. In the chart below we can see, for example, that 9% of the website was crawled at least once on 24 or more of the 30 days in the month (80% or more of the days). At the other end of the spectrum, nearly 48% of the site was crawled on six days or fewer (20% or less of the days in the month).
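Given per-URL crawl dates from the logs, Crawl Frequency is just a count of distinct crawl days per URL, which can then be bucketed. A minimal sketch, with hypothetical crawl data:

```python
# Sketch: Crawl Frequency = number of distinct days in the month on which
# Googlebot requested a URL at least once. The data below is hypothetical;
# in practice it comes from parsing a month of server logs.

crawl_days = {
    "/": set(range(1, 31)),                            # crawled every day
    "/category/shoes": {1, 5, 9, 13, 17, 21, 25, 29},  # roughly twice a week
    "/product/red-sneaker": {3, 18},                   # rarely crawled
    "/orphaned-old-page": set(),                       # never crawled
}

DAYS_IN_MONTH = 30
for url, days in crawl_days.items():
    share = len(days) / DAYS_IN_MONTH
    print(f"{url}: crawled on {len(days)} of {DAYS_IN_MONTH} days ({share:.0%})")

# Bucket URLs the way the chart does: high (80%+ of days) vs low (20% or less)
high = [u for u, d in crawl_days.items() if len(d) / DAYS_IN_MONTH >= 0.8]
low = [u for u, d in crawl_days.items() if len(d) / DAYS_IN_MONTH <= 0.2]
```

Aggregating those buckets across every URL in the site structure yields the distribution shown in the chart.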
A takeaway from this is that Google thinks a significant portion of the site isn’t important enough to crawl frequently. It’s likely the least frequently crawled group represents content targeting long-tail keywords or receiving infrequent visits (a topic that will be explored in depth in a future article).
At Botify we’re excited to see Google acknowledge that every website has a limited Crawl Budget and there are ways to optimize it. But Crawl Budget optimization is only a part of SEO. SEOs should still be focused on building great websites that deliver a great user experience for searchers.
But SEO continues to grow in technical complexity, so efficient use of Crawl Budget will become increasingly important for achieving the level of indexing you need to drive better rankings and more traffic. To do it right, SEOs need a web crawler and a log file analyzer built for SEO. What distinguishes those tools from scrapers or general-purpose server analytics is that they calculate metrics relevant only to SEO.
Future installments in this blog series will address the following subjects and how they relate to Crawl Budget:
How does Crawl Budget factor into your SEO strategy for 2017? What would you like to see illustrated in our next posts about Crawl Budget?
Read Part 2 here: What is Crawl Ratio, and why does it matter?