A website is rarely completely static. New pages appear, old pages disappear. A couple of weeks ago, we talked about checking how well Google crawls brand new content.
Let’s now see how we can keep an eye on URLs which drop off Google’s radar: URLs that no longer return content, but return an error or a redirection instead. In some cases, this is the expected behavior. In other cases, it is not. For instance, on an editorial website, pages are not supposed to vanish; older articles are expected to remain available. On a classifieds website, on the other hand, ads expire. An e-commerce website also includes content that may expire, although usually at a slower pace than on a classifieds website.
In Botify Log Analyzer, pages which disappear are monitored through the “Lost pages” indicator. This indicator is found in the Crawl Distribution section of the log analyzer, in the Data Overview tab.
A “Lost page” is a page which, when crawled by Google:
As a result, the page was “lost” during the displayed period.
Here is an example with a website specialized in housing classifieds, where most lost pages correspond to expired ads which now return a 410 HTTP status code (Gone):
The distribution of lost pages provides valuable information:
On this other classifieds website for instance, there is a low, regular number of lost pages in the classifieds category (expired ad detail pages, in light green), combined with a surge of lost pages towards the end of the period, which includes classifieds as well as dealer (“Pro”) pages. In this example, lost URLs are now redirected.
From the data table below the graph, we can zoom in on a page category and see details within that category (click on the category name), or click on “View URLs samples” to go to the Export tab and get a CSV file.
In this case, the surge comes from an update of the dealer section, which impacts both dealer navigation pages and classifieds pages.
Here are the details for the classifieds category:
And here are the details for dealer pages:
Comparing the number of lost pages with the number of new pages crawled (“New Unique URLs crawled”, that is, pages which are crawled for the first time ever) gives a good picture of content rotation, typically for products or ads. With normal content rotation, the number of lost pages will be in the same ballpark as the number of new crawled pages.
This comparison needs to be applied to a long period (30 to 60 days), especially when dealing with a large number of crawled pages: Google needs time to crawl the content, and it doesn’t make sense to analyze content rotation on a partial view.
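To make the comparison concrete, here is a minimal Python sketch. It assumes you have exported daily counts of lost pages and of new pages crawled into two dictionaries keyed by date. The function name `rotation_summary`, the data layout, and the `tolerance` factor used to decide what counts as "the same ballpark" are all hypothetical choices made for illustration; they are not part of Botify Log Analyzer.

```python
from datetime import date, timedelta


def rotation_summary(daily_lost, daily_new, period_end, days=30, tolerance=2.0):
    """Compare lost pages with new pages crawled over a long window.

    daily_lost / daily_new: dicts mapping a date to a page count
    (hypothetical export format).
    tolerance: arbitrary factor; the totals are considered "in the same
    ballpark" when lost/new falls between 1/tolerance and tolerance.
    """
    start = period_end - timedelta(days=days - 1)

    def total(daily_counts):
        # Only count days inside the analysis window.
        return sum(n for day, n in daily_counts.items() if start <= day <= period_end)

    lost = total(daily_lost)
    new = total(daily_new)
    ratio = lost / new if new else None
    balanced = ratio is not None and 1 / tolerance <= ratio <= tolerance
    return {"lost": lost, "new": new, "ratio": ratio, "balanced": balanced}


summary = rotation_summary(
    daily_lost={date(2024, 3, 1): 120, date(2024, 3, 15): 80},
    daily_new={date(2024, 3, 2): 150, date(2024, 3, 20): 100},
    period_end=date(2024, 3, 30),
)
print(summary)
```

A ratio far from 1 over a 30 to 60 day window is a hint to look closer, not a verdict: the right tolerance depends on the website and on how fast its content normally rotates.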
The lost pages indicator is also very convenient to monitor page removal. Let’s say for instance that you just removed some useless pages that were generated by mistake. Google doesn’t know these pages are gone until the search engine crawls them again and gets the appropriate HTTP status code (HTTP 404 – Not Found or HTTP 410 – Gone). The fact that URLs appear as Lost Pages in Botify Log Analyzer indicates that Google got the information.
This is more precise than simply looking at HTTP status codes: if we just look at all the 404s crawled by Google, we may also see pages that have been returning HTTP 404 for some time and that Google still keeps checking. Lost pages tell us precisely which pages were lost during the time frame we are looking at.
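The difference can be sketched in a few lines of Python. This is an illustration only, not how Botify Log Analyzer is implemented. It assumes Googlebot hits have already been extracted from the server logs as `(day, url, status)` tuples, and that a page counts as "lost" when Google previously got content (HTTP 200 or 304 here, an assumption) and then gets an error or a redirection during the period. The function name `find_lost_pages` is hypothetical.

```python
from datetime import date

# Assumption: these statuses mean Google still gets the content.
OK_STATUSES = {200, 304}


def find_lost_pages(hits, period_start, period_end):
    """Return {url: (day, status)} for pages lost during the period.

    hits: iterable of (day, url, status) tuples for Googlebot requests
    (hypothetical format, in any order).
    """
    previous_status = {}
    lost = {}
    for day, url, status in sorted(hits):
        was_ok = previous_status.get(url) in OK_STATUSES
        in_period = period_start <= day <= period_end
        # Lost: content was served on the previous crawl, but not on this one.
        if in_period and was_ok and status not in OK_STATUSES:
            lost[url] = (day, status)
        previous_status[url] = status
    return lost


hits = [
    (date(2024, 1, 1), "/old-404", 404),   # already a 404 before the period
    (date(2024, 1, 20), "/old-404", 404),  # still crawled, but not "lost"
    (date(2024, 1, 2), "/ad-1", 200),
    (date(2024, 1, 18), "/ad-1", 410),     # expired during the period
    (date(2024, 1, 3), "/article", 200),
    (date(2024, 1, 19), "/article", 200),
]
print(find_lost_pages(hits, date(2024, 1, 15), date(2024, 1, 31)))
```

In this sample, a plain filter on status codes would report both `/old-404` and `/ad-1` for the period, while the transition-based check only reports `/ad-1`.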
Do you find this useful? Do you see other interesting usage scenarios? Let us know!