“Problem Solved” Indicator: HTTP 404 Becomes HTTP 200

Posted on 21st October 2014 by Annabelle

By mistake or simply because of scheduled maintenance, some pages become unavailable. Does everything go back to normal? Does any unresolved issue remain? Is there any lasting negative impact on Google’s crawl? Botify Log Analyzer answers these questions with its “Recovered Pages” indicator.

A “recovered page” is a page which, when crawled by Google:

  • Used to return an HTTP error (404, 503…) or a redirection (that is to say, any HTTP status code other than HTTP 200 – OK or HTTP 304 – Not Modified),
  • Started returning an HTTP 200 or 304 during the displayed period (the last 30 days by default, or any custom period),
  • Still returned an HTTP 200 or 304 when last crawled during the displayed period.

Simply put, it is a URL which returned an HTTP status code that was not OK, and is now OK.
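The three conditions above can be sketched in a few lines of Python. This is only one reading of the definition, not Botify's actual implementation: the function name `is_recovered`, the `(date, status)` input format and the sample dates are all assumptions made for illustration.

```python
from datetime import date

# Status codes considered "OK" in the definition above.
OK_STATUSES = {200, 304}

def is_recovered(hits, period_start, period_end):
    """Return True if a URL counts as recovered over the period.

    hits: chronologically sorted list of (date, status) tuples, one per
    Google crawl of the URL (hypothetical input format).
    """
    # Ignore anything crawled after the displayed period.
    hits = [(day, status) for day, status in hits if day <= period_end]
    if not hits:
        return False

    # The last crawl must fall inside the period and be OK.
    last_day, last_status = hits[-1]
    if last_day < period_start or last_status not in OK_STATUSES:
        return False

    # Walk back to the first crawl of the trailing run of OK responses.
    i = len(hits) - 1
    while i > 0 and hits[i - 1][1] in OK_STATUSES:
        i -= 1

    # i == 0 means no earlier non-OK crawl is known: nothing was recovered.
    if i == 0:
        return False

    # The URL must have started returning OK during the period.
    return hits[i][0] >= period_start

# Example: a 404 in September, then OK again in October.
example = [(date(2014, 9, 1), 404), (date(2014, 10, 5), 200), (date(2014, 10, 12), 304)]
print(is_recovered(example, date(2014, 10, 1), date(2014, 10, 21)))
```

Swapping the two tests (was OK, is now not OK) would give a comparable sketch for lost pages.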

This indicator complements the “Lost Pages” indicator – which indicates that a page’s HTTP status code went the other way, from OK to not OK. But while lost pages can be normal and reflect expired content on your website, or the removal of unwanted pages, recovered pages are never normal: the very existence of a recovery means there was a problem in the first place.

When “Recovered Pages” Mean “Problem Solved”

When there are unexpected lost pages, the Recovered Pages indicator is a great way to monitor how well and how fast things are going back to normal.

The graph below shows lost pages on a website with editorial content. Articles are in pink.

Lost articles and other lost pages were recovered in the next few days, as recovered URLs show:

Notice that the recovery is spread over a few days, as Google does not necessarily recrawl all URLs the very next day.

But sometimes, recovered pages are not so obvious or simple to understand – and may still point at key issues.

Is There an Ongoing Problem?

On this TV replay website, there is a relatively steady number of recovered pages, and almost all of them are video pages:

The upper right counter indicates that there is a high number of lost pages, which is normal, as videos are available in replay for a limited time only. The high number of HTTP 4XX and redirections corresponds to these expired videos (those lost over the displayed period, plus other pages lost earlier, that are still crawled).

If we click on the videos category in the table below the graph to look at this category only, we’ll see that the total number of URLs returning server errors (circled in red) is close to the number of recovered URLs:

Clicking on these server error status codes shows that they are spread over time, just like recovered URLs. We can confirm that they are the cause of recovered URLs by exporting URLs returning server errors and recovered URLs (from the Export tab), and intersecting the two lists.
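Intersecting the two exports takes only a few lines of Python. The sketch below assumes each export boils down to one URL per line, which may not match the real export format; the function names and sample URLs are made up for illustration.

```python
def load_urls(lines):
    """Build a set of URLs from an iterable of text lines, skipping blanks."""
    return {line.strip() for line in lines if line.strip()}

def overlap(server_error_lines, recovered_lines):
    """Return the URLs present in both lists, and the share of recovered
    URLs that also appear in the server error list."""
    errors = load_urls(server_error_lines)
    recovered = load_urls(recovered_lines)
    common = errors & recovered
    share = len(common) / len(recovered) if recovered else 0.0
    return common, share

# In-memory example; open file objects can be passed in the same way.
common, share = overlap(
    ["/video/1\n", "/video/2\n", "\n"],
    ["/video/2\n", "/video/3\n"],
)
print(sorted(common), share)
```

A share close to 1 would support the idea that server errors explain most recoveries; a much lower share means other causes need to be investigated.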

In this other example, which is an e-commerce website, recovered pages mainly include product pages (in light green), as well as some navigation pages, in particular search pages (in grey):

If we click on the products category in the table below the graph, we can see that server errors only account for a little less than half of recovered pages.

Possible causes for recovered products need to be investigated further – look at what’s happening with redirections.
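One way to start that investigation is to break recovered URLs down by the kind of response they returned before recovering. The sketch below assumes we already have, for each recovered URL, the last non-OK status code Google received; that mapping and every name in the code are hypothetical and are not part of a documented export.

```python
from collections import Counter

def status_class(status):
    """Group an HTTP status code into a broad class."""
    if status in (200, 304):
        return "ok"  # not expected in this input, kept for safety
    if 500 <= status <= 599:
        return "server error"
    if 400 <= status <= 499:
        return "client error"
    if 300 <= status <= 399:
        return "redirection"
    return "other"

def breakdown(previous_status_by_url):
    """Return the share of recovered URLs per class of previous status."""
    counts = Counter(status_class(s) for s in previous_status_by_url.values())
    total = sum(counts.values())
    return {cls: n / total for cls, n in counts.items()} if total else {}

# Made-up sample of four recovered product URLs.
sample = {"/p/1": 503, "/p/2": 302, "/p/3": 301, "/p/4": 404}
print(breakdown(sample))
```

If redirections make up the share that server errors leave unexplained, they are the next thing to look at.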

Let’s set this aside and look at recovered search pages. Server errors can’t explain everything here either, and there are virtually no 404 errors.

The only remaining explanation is temporary redirections (HTTP 302): the search pages are probably redirected when they don’t have results, and start responding “OK” (HTTP 200) when they are populated again. This is not good for SEO: pages which do not respond HTTP 200 at all times cannot perform well as organic search landing pages. This means that in our example, a significant portion of search pages are counter-productive for SEO. There should be business rules to define which search pages are good candidates for organic traffic – based not only on search queries, but also on the number of search results and how stable it is over time.
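As a rough illustration of such a business rule, the sketch below keeps a search page as a candidate only if it has been observed long enough and never dropped below a minimum number of results. The thresholds, the function name and the daily result counts are assumptions; real rules would depend on the website.

```python
def is_organic_candidate(daily_result_counts, min_results=5, min_days=30):
    """Return True if a search page looks stable enough to be a landing page.

    daily_result_counts: number of search results observed each day for
    the page (hypothetical data source).
    """
    # Not enough history to judge stability.
    if len(daily_result_counts) < min_days:
        return False
    # The page must never have fallen below the minimum number of results.
    return min(daily_result_counts) >= min_results

print(is_organic_candidate([12] * 30))        # steady results
print(is_organic_candidate([12] * 29 + [0]))  # went empty on one day
```

A page that fails the rule could be kept out of Google's crawl rather than redirected whenever it runs out of results.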
