As part of our ongoing series in response to Google’s detailed explanation about Crawl Budget, we’re dedicating a few articles to showing how to complement Google Search Console crawl reports with Botify insights.
In this article we’ll look at adding context to crawl errors and prioritizing what to fix.
In an ideal world, you wouldn't have crawl errors because you have great QA, IT, DevOps, and engineering teams, and no one ever links to your site incorrectly. Plus, you use Botify Analytics to crawl your site and find errors before Google even has a chance to report them in Search Console.
And then there's reality. Web technology is ever-changing, and people make mistakes. Below is a screenshot of the Crawl Errors overview in a Google Search Console account.
It's great that you can see the errors and that Google has prioritized the top 1,000. But what do you do next? If there aren't many errors, you can go through and address each URL case by case. But if there are many errors of several different types, you may want more context to prioritize your efforts. That's where Botify can help.
To put the crawl errors in perspective, you'll want to answer a few questions: How much of your Crawl Budget is going to errors? When did the errors begin? And which parts of the site are affected?
Getting a sense of proportion requires seeing what share of total crawl volume is being taken up by errors. The easiest way is to visit the HTTP Codes section of Botify Log Analyzer, where the amount of Crawl Budget going to errors is summarized in a data point and visualized with a pie chart (the charts below are filtered to show the difference in crawl errors by Google user agent, search vs. smartphone).
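If you want to sanity-check the same breakdown outside any tool, you can approximate it directly from your own access logs. Here is a minimal sketch in Python; the hit list, URLs, and the assumption that you've already parsed each Googlebot log line down to a (URL, status code) pair are all hypothetical:

```python
from collections import Counter

def crawl_budget_by_status(hits):
    """Summarize crawl hits by HTTP status family (2xx, 3xx, 4xx, 5xx)."""
    families = Counter(f"{status // 100}xx" for _, status in hits)
    total = sum(families.values())
    return {family: count / total for family, count in sorted(families.items())}

# Hypothetical sample: (URL, status code) pairs parsed from Googlebot log lines.
hits = [
    ("/product/1", 200), ("/product/2", 200), ("/product/3", 200),
    ("/old-page", 404), ("/api/search", 500),
]
shares = crawl_budget_by_status(hits)
# shares -> {"2xx": 0.6, "4xx": 0.2, "5xx": 0.2}
```

In practice you'd feed this millions of lines and also split the counts by user agent (Googlebot search vs. smartphone), which is exactly the filter applied in the charts above.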
The chart of URLs crawled by Google by day by HTTP Code will give you a quick sense of when the errors began or spiked, or whether this is a steady, ongoing issue. The chart below is filtered to show just the bad status codes.
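The same day-by-day view can be reproduced from raw logs by counting bad-status hits per date. A small sketch, again assuming hypothetical pre-parsed (date, status) pairs:

```python
from collections import defaultdict

def daily_error_counts(hits, ok_statuses=frozenset({200, 304})):
    """Count hits with bad status codes per day, to spot when errors began or spiked."""
    counts = defaultdict(int)
    for day, status in hits:
        if status not in ok_statuses:
            counts[day] += 1
    return dict(sorted(counts.items()))

# Hypothetical sample: (date, status code) pairs from parsed Googlebot log lines.
hits = [
    ("2017-02-01", 200), ("2017-02-01", 404),
    ("2017-02-02", 200), ("2017-02-02", 500), ("2017-02-02", 500),
]
counts = daily_error_counts(hits)
# counts -> {"2017-02-01": 1, "2017-02-02": 2}
```

A sudden jump in one day's count usually points to a deploy or configuration change worth correlating with your release history.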
As we saw above, about 20% of the Crawl Budget for this site is being used up by pages that do not give a 200 or 304 status code. This warrants further investigation.
Using the chart of URLs Crawled by Google by Day by Segment and filtering to just bad status codes, it becomes clear that, for this website, nearly all of the errors occur on one particular pagetype.
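Botify derives segments from the rules you define; as a rough stand-in, you can approximate a segment breakdown from raw logs by bucketing error hits on the first URL path component. A sketch with hypothetical data:

```python
from collections import Counter

def bad_hits_by_segment(hits, segment_depth=1):
    """Group bad-status hits by a segment inferred from the leading URL path components."""
    bad = Counter()
    for url, status in hits:
        if status not in (200, 304):
            parts = [p for p in url.split("/") if p]
            segment = "/".join(parts[:segment_depth]) or "(root)"
            bad[segment] += 1
    return bad.most_common()

# Hypothetical sample: (URL, status code) pairs; real segments would come from your own rules.
hits = [
    ("/product/1", 500), ("/product/2", 500), ("/product/3", 200),
    ("/blog/post", 404), ("/category/shoes", 200),
]
segments = bad_hits_by_segment(hits)
# segments -> [("product", 2), ("blog", 1)]
```

Sorting the result puts the worst-offending pagetype first, which is the same prioritization the segment chart gives you visually.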
A table filtered by user agent, status code family, and site segment can show the absolute numbers for the period, making it easier to decide which site segment and error type to prioritize. Below is a view of the table filtered to focus on the classic Googlebot search user agent and pages with status codes in the 500 family of server errors.
Use the URL Explorer in Botify Log Analyzer to get the full list of URLs with Crawl Errors you want to prioritize and then export the list to CSV to share with the team that will fix the problem. No need to limit to the top 1,000 pages when we can see there were more than 200,000 just in the past 30 days.
Some of the pages with Crawl Errors may have, at some point, driven traffic. Use visit data in the URL Explorer to further qualify your list. You could decide to redirect the URL to a current related page or even revive the content if it had value.
In the image below, we simply added a filter condition requiring at least one Google organic visit in the history of the available log data. This narrowed the focus to a small subset of the URLs, and now we can easily export the list.
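The same visit-based filter and CSV export can be sketched in a few lines of Python. The joined rows below are hypothetical; in practice they would come from the URL Explorer export or from joining your log data with analytics data:

```python
import csv
import io

def export_error_urls(rows, min_visits=1):
    """Write error URLs with at least `min_visits` organic visits to CSV text."""
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow(["url", "status", "organic_visits"])
    for url, status, visits in rows:
        if visits >= min_visits:
            writer.writerow([url, status, visits])
    return buf.getvalue()

# Hypothetical joined data: (URL, last status code, Google organic visits in log history).
rows = [
    ("/old-guide", 404, 12),
    ("/tmp/preview", 404, 0),
    ("/api/search", 500, 3),
]
csv_text = export_error_urls(rows)
print(csv_text)
```

URLs with zero historical organic visits drop out, leaving a short list of pages worth redirecting or reviving.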
Once you’ve addressed the errors you can try the Google Search Console API to mark the errors as fixed so they drop out of your Google Search Console reporting.
Using your server log files is a valuable way to get perspective on and prioritize solutions for Crawl Errors found in Google Search Console. The two tools complement each other to help you maintain a higher quality website, improve your SEO, and better use your Crawl Budget for strategic content.
In the next installment of the Crawl Budget series, we will look at how to use server log data along with your Botify crawl to investigate XML sitemap-related indexing issues.
What has been your experience managing Crawl Errors? Share with us in the comments below or leave suggestions for future posts!