
Google Crawl Budget Optimization

27th July 2017 · Laura Scott

On average, only 40% of a site's pages are crawled regularly by Google.

SEOs know that optimizing crawl budget for large and complex websites is critical for organic performance. But how can you find your own crawl budget? And once you understand your site’s crawl budget, how can you impact it?

If you’re following our crawl budget series, you've learned about several factors that describe and affect Google's crawl, including crawl ratio, crawl errors, and crawl frequency.

In this post we explore how to define, measure, and optimize your Google crawl budget, and share insights from our recent on-demand webinar, Optimize your Google Crawl Budget.

Define Crawl Budget

Google defines crawl budget as the number of URLs Google can and wants to crawl.

Crawl budget is not a new concept and it is not something that will disappear tomorrow. In 2009, a post from Google explained that the search engine has a finite number of resources and that SEOs should strategically improve crawling and indexation of their sites to make the best use of those resources.

The management of crawling and indexation will only continue to grow in importance as we create higher volumes of content. Search engines will need to continue to prioritize how they spend resources to understand and organize content across the web.

Measure Crawl Budget

Your server logs will help you find your site’s crawl budget. Server logs contain a wealth of data about how bots and visitors are experiencing your site. That data is dense, but an enterprise-level platform can organize it in a way that helps you understand the full picture of how your site is crawled. From your log files you can see the number of URLs that Google is crawling on your site each month. This is your Google crawl budget.
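As a rough illustration of what that measurement involves, here is a minimal sketch in Python that counts a monthly Googlebot crawl budget from raw access logs. It assumes a standard combined log format and a file named access.log (both are assumptions for the example, not Botify's implementation), and simply counts the unique URLs Googlebot requests each month.

```python
import re
from collections import defaultdict

# Minimal combined-log-format parser: extracts the date, request path,
# and user agent from each access-log line (format assumed above).
LOG_PATTERN = re.compile(
    r'\S+ \S+ \S+ \[(?P<day>\d{2})/(?P<month>\w{3})/(?P<year>\d{4})[^\]]*\] '
    r'"(?:GET|HEAD) (?P<path>\S+) [^"]*" \d{3} \S+ "[^"]*" "(?P<agent>[^"]*)"'
)

def monthly_crawl_budget(log_path):
    """Count unique URLs crawled by Googlebot per month."""
    crawled = defaultdict(set)  # "YYYY/Mon" -> set of crawled URLs
    with open(log_path) as f:
        for line in f:
            m = LOG_PATTERN.match(line)
            if m and "Googlebot" in m.group("agent"):
                key = f'{m.group("year")}/{m.group("month")}'
                crawled[key].add(m.group("path"))
    return {month: len(urls) for month, urls in crawled.items()}

if __name__ == "__main__":
    for month, budget in sorted(monthly_crawl_budget("access.log").items()):
        print(f"{month}: {budget} unique URLs crawled")
```

Note that user-agent strings can be spoofed; for a reliable measurement you would also verify Googlebot hits via reverse DNS lookup, as Google recommends.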

Combine your log files with a full site crawl to understand how your crawl budget is being spent. Segment that data by pagetype to show which sections of your site are being crawled by search engines, and with what frequency. Consider: how are the most important sections of your site being crawled?
[Figure: Segmentation of server logs]
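Pagetype segmentation can be approximated with a handful of URL rules. The sketch below uses hypothetical patterns (product, category, blog); you would replace them with rules matching your own site's URL structure.

```python
import re
from collections import Counter

# Hypothetical pagetype rules mapping URL patterns to segment names;
# replace these with rules that match your own site's URL structure.
SEGMENTS = [
    (re.compile(r"^/product/"), "product"),
    (re.compile(r"^/category/"), "category"),
    (re.compile(r"^/blog/"), "blog"),
]

def segment(url):
    """Return the pagetype segment for a URL, or 'other' if none match."""
    for pattern, name in SEGMENTS:
        if pattern.match(url):
            return name
    return "other"

def crawl_by_segment(crawled_urls):
    """Count Googlebot-crawled URLs per pagetype segment."""
    return Counter(segment(url) for url in crawled_urls)

print(crawl_by_segment(["/product/boot-42", "/product/heel-7", "/blog/crawl-budget"]))
# Counter({'product': 2, 'blog': 1})
```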

Identify Your Opportunity

Take your crawl budget analysis a step further to understand the opportunity on your site.

Identify crawl ratio, or the percentage of unique URLs in your website structure that have been crawled by Google, to begin to understand your crawl budget opportunity.
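Computed directly, crawl ratio is simple set arithmetic over two URL lists: the URLs found by your own site crawl (the structure) and the URLs Googlebot requested (from your logs). A minimal sketch, assuming both inputs have been collected as described above:

```python
def crawl_ratio(structure_urls, crawled_urls):
    """Percentage of unique URLs in the site structure that Google crawled."""
    structure = set(structure_urls)
    crawled_in_structure = structure & set(crawled_urls)
    return 100.0 * len(crawled_in_structure) / len(structure)

# Example: 2 of the 4 structure URLs were crawled -> 50.0%
structure = ["/", "/a", "/b", "/c"]
crawled = ["/", "/a", "/old-page-not-in-structure"]
print(f"{crawl_ratio(structure, crawled):.1f}%")  # 50.0%
```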

Crawl ratio varies dramatically by site. Across industries, for unoptimized sites, an average of only 40% of strategic URLs are crawled by Google each month. That’s 60% of pages on a site that aren’t being regularly crawled and potentially aren’t indexed or being served to searchers. This offers a strong business case for measuring and optimizing your crawl budget.

[Figure: 40% of pages on a site are regularly crawled by Google]

Optimize Crawl Budget

Once you’ve understood how your site is being crawled by search engines, you can optimize that crawl.

Begin by focusing on the three crawl factors identified in the Webmaster Central Blog:

  • Crawl health
  • Popularity
  • Staleness

Crawl Health

Crawl health, or the responsiveness of a site, will impact the number of URLs Google can and wants to crawl. Page speed and crawl errors both indicate the health of your site's crawl.

Look at load times across your site and review speed by segment to understand where you have the biggest opportunities to optimize performance. Once you’ve detected slow-loading pages, the Botify platform helps identify their causes and the highest-opportunity areas for improving load times.

Also review crawl and server errors across your site. The Botify platform offers several charts to help webmasters evaluate HTTP status code distribution. View HTTP status codes by day across your site. Segment those responses by pagetype to understand the distribution of crawl errors on the site and identify key areas of the site to improve.
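To illustrate the underlying analysis, here is a minimal sketch that tallies HTTP status codes by day and by pagetype from parsed log hits. It assumes hits arrive as (date, url, status) tuples and takes a classify function (such as the hypothetical segment() from the earlier sketch); it is not Botify's implementation.

```python
from collections import Counter, defaultdict

def status_distribution(hits, classify):
    """Tally HTTP status codes per day and per pagetype segment.

    hits: iterable of (date, url, status) tuples parsed from your logs.
    classify: function mapping a URL to a segment name.
    """
    by_day = defaultdict(Counter)
    by_segment = defaultdict(Counter)
    for day, url, status in hits:
        by_day[day][status] += 1
        by_segment[classify(url)][status] += 1
    return by_day, by_segment

# Example usage with dummy data:
hits = [("2017-07-01", "/product/boot-42", 200),
        ("2017-07-01", "/category/shoes", 301),
        ("2017-07-02", "/product/boot-42", 404)]
by_day, by_segment = status_distribution(hits, lambda u: u.split("/")[1])
print(dict(by_segment))
# {'product': Counter({200: 1, 404: 1}), 'category': Counter({301: 1})}
```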

Optimizing for the way search engines are crawling your site will increase your crawl budget.

[Figure: Crawl health and crawl errors]

Popularity

URLs that are more popular on the internet tend to be crawled more often by Google. Use site structure to send signals about the popularity of your strategic pages.

Depth indicates the number of clicks needed to reach a page via the shortest path from the start page. The deeper a page sits in the site, the more clicks a user or search engine crawler has to make to discover it, and the less likely that content will be crawled and indexed.

We see compelling evidence of the importance of depth within the Botify platform. Combining server logs and crawl data, we see the number of URLs being crawled by Google drop significantly starting at approximately three levels deep.

[Figure: Popularity and PageRank dilution - depth]
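Depth is straightforward to compute yourself: it is a breadth-first search over the internal link graph from the start page. A minimal sketch, assuming you have extracted a url -> outlinks mapping from a site crawl:

```python
from collections import deque

def page_depths(links, start="/"):
    """Compute each page's depth (minimum clicks from the start page)
    via breadth-first search. `links` maps a URL to the URLs it links to."""
    depths = {start: 0}
    queue = deque([start])
    while queue:
        url = queue.popleft()
        for target in links.get(url, ()):
            if target not in depths:  # first visit = shortest path
                depths[target] = depths[url] + 1
                queue.append(target)
    return depths

# Pages that end up three or more levels deep risk being crawled less often.
links = {"/": ["/category/shoes"], "/category/shoes": ["/product/boot-42"]}
print(page_depths(links))  # {'/': 0, '/category/shoes': 1, '/product/boot-42': 2}
```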

Another signal of popularity on your site is internal linking. If a page is linked to several times, that implies the page is popular. Botify provides several charts that offer insight into internal linking.

In the example below, we compare the average number of links to pages Google isn’t crawling (5.9 links) with the average number of links to pages Google is crawling (42.3 links). There is a dramatic difference between the number of internal links to pages that are being crawled and those that are not.

[Figure: Popularity - internal linking]
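The same comparison can be reproduced from the link graph and your crawled-URL list. A minimal sketch, using the same assumed url -> outlinks mapping as above:

```python
from collections import Counter

def average_inlinks(links, crawled_urls):
    """Compare average inlink counts for pages Google crawled vs. pages it didn't.

    links: dict mapping each URL to the list of URLs it links out to.
    crawled_urls: set of URLs seen as Googlebot hits in your logs.
    """
    inlinks = Counter()
    for targets in links.values():
        inlinks.update(targets)  # each target URL gains one inlink
    crawled, uncrawled = [], []
    for url in links:
        (crawled if url in crawled_urls else uncrawled).append(inlinks[url])
    avg = lambda counts: sum(counts) / len(counts) if counts else 0.0
    return avg(crawled), avg(uncrawled)
```

A large gap between the two averages, like the one in the figure above, points to strategic pages that need more internal links.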

Use indicators like depth and internal linking to signal the popularity of your strategic pages and maximize the way crawl budget is spent on your site.

Staleness

Another factor affecting crawl is the staleness of a page. Google wants to prevent a page from becoming stale in its index.

Botify provides advanced reports for staleness detection within Botify Log Analyzer. Use staleness detection to set a benchmark for a normal crawl of the pages on your site. Understanding how often search engines crawl your website, and how often they need to, helps you answer some of the fundamental questions of crawl budget optimization.

Use the Botify platform to get a much longer view of how frequently URLs are being crawled and which categories are being crawled more often. This will help you make informed decisions about your crawl budget optimization strategy.
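The core measurement behind staleness is simple: for each URL, how long has it been since Googlebot last fetched it? A minimal sketch, assuming (crawl_date, url) tuples extracted from your logs; this is an illustration, not Botify's implementation.

```python
from datetime import date

def days_since_last_crawl(hits, today=None):
    """For each URL, days elapsed since Googlebot's most recent fetch.

    hits: iterable of (crawl_date, url) tuples from your logs,
    where crawl_date is a datetime.date.
    """
    today = today or date.today()
    last_seen = {}
    for crawl_date, url in hits:
        if url not in last_seen or crawl_date > last_seen[url]:
            last_seen[url] = crawl_date
    return {url: (today - seen).days for url, seen in last_seen.items()}

# URLs crawled long ago (or never) are candidates for staleness review.
hits = [(date(2017, 7, 1), "/a"), (date(2017, 6, 1), "/a"), (date(2017, 5, 20), "/b")]
print(days_since_last_crawl(hits, today=date(2017, 7, 27)))  # {'/a': 26, '/b': 68}
```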

Staleness detection is a newer feature in Botify. It will give you insight into a more controlled approach to long-term SEO. Speak to your customer success manager for a tutorial on this feature.

[Figure: Staleness detection - Google crawl budget]

Start Optimizing Your Google Crawl Budget

The web is going to continue growing exponentially in the coming years. Optimizing crawl budget is an important, ongoing project that SEOs need to prioritize to maximize their presence in search.

To get an in-depth look at these optimization strategies and learn more about Google crawl budget, view our webinar on demand.