
Botify sheds light on Crawl Errors in Google Search Console

20th April 2017, by Jeff

As part of our ongoing series in response to Google’s detailed explanation about Crawl Budget, we’re dedicating a few articles to showing how to complement Google Search Console crawl reports with Botify insights.

In this article we’ll look at adding context to crawl errors and prioritizing what to fix.

How To Contextualize and Prioritize Crawl Errors


In an ideal world, you wouldn’t have crawl errors, because you have great QA, IT, DevOps, and engineering teams, and no one ever links to your site incorrectly. Plus, you use Botify Analytics to crawl your site and find errors before Google even has a chance to report them in Search Console.

Botify Analytics HTTP Codes insights

And then there’s reality. Web technology is ever changing and people make mistakes. Below is a screenshot of the Crawl Errors overview in a Google Search Console account.

Google Search Console Crawl Errors Report

It’s great that you can see the errors and that Google has prioritized the top 1,000. But, what do you do next? If there aren’t many errors, you can start going through and addressing each URL case by case. But if there are a lot of errors and different types of errors, you may want more context in order to prioritize your efforts. That’s where Botify can help.

Get Perspective On Crawl Errors

In order to put the crawl errors in perspective, you may want to be able to answer the following questions.

  • How much of my total crawl do these errors represent? In other words, how big is this problem?
  • Is there a pattern to these errors? Are they in a particular segment of my site or all over?
  • How can I get the URLs beyond the top 1,000?
  • Did any of these URLs with errors ever drive traffic?

Determine The Scope Of The Problem

Getting a sense of proportion means seeing what share of total crawl volume is taken up by errors. The easiest way is to visit the HTTP Codes section of Botify Log Analyzer, where the amount of Crawl Budget going to errors is summarized in a data point and visualized with a pie chart. (The charts have been filtered to show the difference in crawl errors by Google user agent, search vs. smartphone.)

The chart of URLs crawled by Google by day by HTTP Code will give you a quick sense of whether there was a point in time where the errors began or increased in scale, or whether this is a steady ongoing issue. The chart below is filtered to show just the bad status codes.
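If you want to sanity-check the same trend outside of Botify, the idea can be sketched against your own parsed log data. This is only an illustration, not how Botify Log Analyzer works internally: the function name `bad_hits_by_day`, the sample records, and the ISO 8601 timestamp format are all assumptions, and "bad" here simply means any status other than 200 or 304.

```python
from collections import Counter
from datetime import datetime

GOOD_CODES = {200, 304}  # assumption: every other status counts as "bad"

def bad_hits_by_day(records):
    """Count bad-status hits per calendar day, in chronological order."""
    counts = Counter()
    for timestamp, status in records:
        if status not in GOOD_CODES:
            day = datetime.fromisoformat(timestamp).date().isoformat()
            counts[day] += 1
    # ISO dates sort chronologically when sorted as strings.
    return dict(sorted(counts.items()))

# Hypothetical Googlebot hits: (ISO timestamp, HTTP status).
sample_records = [
    ("2017-04-18T10:00:00", 404),
    ("2017-04-18T11:00:00", 200),
    ("2017-04-19T09:00:00", 500),
    ("2017-04-19T09:05:00", 404),
]

for day, count in bad_hits_by_day(sample_records).items():
    print(day, count)
```

A sudden jump from one day to the next usually points to a release or configuration change, while a flat line suggests a long-standing issue.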

As we saw above, about 20% of the Crawl Budget for this site is being used up by pages that do not give a 200 or 304 status code. This warrants further investigation.
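The share itself is simple arithmetic: bad-status hits divided by total hits for the bot you care about. Below is a minimal sketch of that calculation, assuming the log lines have already been parsed into (user agent, status) pairs. The function name `error_share` and the sample data are hypothetical, and matching on the substring "Googlebot" is a simplification (a production setup would also verify the bot).

```python
GOOD_CODES = {200, 304}  # statuses treated as healthy, as in the text above

def error_share(hits):
    """Return the fraction of Googlebot hits whose status is not 200 or 304."""
    statuses = [status for agent, status in hits if "Googlebot" in agent]
    if not statuses:
        return 0.0
    bad = sum(1 for status in statuses if status not in GOOD_CODES)
    return bad / len(statuses)

# Hypothetical parsed log hits: (user agent, HTTP status).
sample_hits = [
    ("Googlebot/2.1", 200),
    ("Googlebot/2.1", 200),
    ("Googlebot/2.1", 304),
    ("Googlebot/2.1", 404),
    ("Googlebot/2.1", 500),
    ("bingbot/2.0", 404),  # ignored: not Googlebot
]

share = error_share(sample_hits)
print(f"{share:.0%} of Googlebot hits returned a bad status code")
```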

Find Crawl Errors By Site Segment

Using the chart of URLs Crawled by Google by Day by Segment and filtering to just bad status codes, it becomes clear that, for this website, nearly all of the errors are happening on one particular page type.

A table filtered by user agent, status code family, and site segment can show the absolute numbers for the period, making it easier to decide which site segment and error type to prioritize. Below is a view of the table filtered to the classic Googlebot search user agent and to pages that returned status codes in the 500 family of server errors.
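To make the idea of "errors by segment" concrete, here is a rough sketch that groups hits by site segment and status code family. It assumes the first path component is a good enough stand-in for a segment, which will not hold for every site. The helper names `segment_of`, `status_family`, and `errors_by_segment` are made up for this example.

```python
from collections import Counter
from urllib.parse import urlsplit

def segment_of(url):
    """Use the first path component as a stand-in for a site segment."""
    parts = [part for part in urlsplit(url).path.split("/") if part]
    return parts[0] if parts else "(home)"

def status_family(status):
    """Map 503 to '5xx', 404 to '4xx', and so on."""
    return f"{status // 100}xx"

def errors_by_segment(hits, family="5xx"):
    """Count hits in one status family per segment, largest first."""
    counts = Counter()
    for url, status in hits:
        if status_family(status) == family:
            counts[segment_of(url)] += 1
    return counts.most_common()

# Hypothetical Googlebot hits: (URL path, HTTP status).
sample_hits = [
    ("/product/1", 500),
    ("/product/2", 503),
    ("/blog/a", 500),
    ("/product/3", 200),
    ("/", 404),
]

for segment, count in errors_by_segment(sample_hits):
    print(segment, count)
```

Sorting by count puts the segment with the most server errors at the top, which is usually the one to look at first.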

Beyond the Top 1,000 Pages With Errors

Use the URL Explorer in Botify Log Analyzer to get the full list of URLs with the Crawl Errors you want to prioritize, then export the list to CSV to share with the team that will fix the problem. There is no need to limit yourself to the top 1,000 pages when we can see there were more than 200,000 in the past 30 days alone.

Botify Log Analyzer URL explorer with filters
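For readers working directly from raw logs, an export along the same lines might look like the sketch below. It is not the Botify export itself: the function name `export_error_urls`, the column names, and the choice to treat any status of 400 or above as an error are assumptions made for illustration.

```python
import csv
import io
from collections import Counter

def export_error_urls(hits, out):
    """Write every URL with a 4xx/5xx status to CSV, most-crawled first.

    Returns the number of distinct (url, status) rows written.
    """
    counts = Counter((url, status) for url, status in hits if status >= 400)
    writer = csv.writer(out)
    writer.writerow(["url", "status", "googlebot_hits"])
    for (url, status), hit_count in counts.most_common():
        writer.writerow([url, status, hit_count])
    return len(counts)

# Hypothetical Googlebot hits: (URL path, HTTP status).
sample_hits = [
    ("/a", 404),
    ("/a", 404),
    ("/b", 500),
    ("/c", 200),
]

buffer = io.StringIO()  # swap for open("errors.csv", "w", newline="") to write a file
rows_written = export_error_urls(sample_hits, buffer)
print(buffer.getvalue())
```

Because the list is built from every logged hit, it is not capped at 1,000 rows the way the Search Console report is.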

Some Pages With Errors Are More Important Than Others

Some of the pages with Crawl Errors may have, at some point, driven traffic. Use visit data in the URL Explorer to further qualify your list. You could decide to redirect the URL to a current related page or even revive the content if it had value.

In the image below, we simply added a line to the filter requiring at least 1 Google organic visit in the history of the available log data. This narrowed the focus to a small subset of the URLs, which we can now easily export.
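The same qualification step can be sketched in a few lines, assuming you already have a list of error URLs and a separate lookup of Google organic visits per URL. Both inputs and the function name `errors_with_visits` are hypothetical. The filter keeps only error URLs that meet a minimum visit count and ranks them by visits.

```python
def errors_with_visits(error_urls, visits_by_url, min_visits=1):
    """Keep error URLs with at least `min_visits` organic visits, busiest first."""
    qualified = [
        url for url in error_urls
        if visits_by_url.get(url, 0) >= min_visits
    ]
    return sorted(qualified, key=lambda url: visits_by_url[url], reverse=True)

# Hypothetical inputs: URLs returning 404, and organic visits seen in the logs.
error_urls = ["/old-guide", "/tmp/test", "/retired-product", "/typo-link"]
visits_by_url = {"/old-guide": 12, "/retired-product": 340, "/home": 9000}

for url in errors_with_visits(error_urls, visits_by_url):
    print(url, visits_by_url[url])
```

URLs at the top of this list are the strongest candidates for a redirect to a related page or for reviving the content.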

Once you’ve addressed the errors you can try the Google Search Console API to mark the errors as fixed so they drop out of your Google Search Console reporting.

SEO Log Analysis sheds light on Crawl Errors

Using your server log files is a valuable way to get perspective on and prioritize solutions for Crawl Errors found in Google Search Console. The two tools complement each other to help you maintain a higher quality website, improve your SEO, and better use your Crawl Budget for strategic content.

In the next installment of the Crawl Budget series, we will look at how to use server log data along with your Botify crawl to investigate XML sitemap-related indexing issues.

What has been your experience managing Crawl Errors? Share with us in the comments below or leave suggestions for future posts!
