Could you be unknowingly cutting loose valuable pages? To put it another way, do you have orphan pages with existing or potential organic traffic? The answer is probably yes. It is for most websites. Re-attaching some of these pages to your website structure would allow you to tap into their full potential.
Orphan pages are pages explored by Google that users can’t find while navigating on your website: they are not linked anywhere on your website. As a result, the Botify crawler doesn’t find them either.
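The definition above boils down to a set difference: URLs that appear in Google's crawl (from the server logs) minus URLs reached by following links on the site. Here is a minimal sketch of that idea in Python. The function name and the sample URLs are hypothetical and only illustrate the principle; this is not how the Botify report is actually computed.

```python
def find_orphans(google_crawled, site_crawled):
    """Return URLs requested by Google that the site crawl never reached."""
    return set(google_crawled) - set(site_crawled)


# Hypothetical sample data: URLs seen in the logs vs. URLs found by the site crawl.
google_crawled = {"/", "/products/1", "/products/2", "/old-promo"}
site_crawled = {"/", "/products/1", "/products/2", "/about"}

orphans = find_orphans(google_crawled, site_crawled)
print(sorted(orphans))
```

Note that "/about" is found in the structure but not crawled by Google: that is the opposite problem, and it does not count as an orphan page.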
Orphan pages have weakened traffic potential, but that’s not all. The other problem, actually even more frequent, is enormous amounts of crawl waste from Google. We recently talked about pages in your site that Google doesn’t know exist – because the search engine can’t, or won’t explore them (the red part on the left, in the graph below). In the vast majority of cases, there are also orphan pages (the grey part on the right).
In the following example, more than 70% of the pages explored by Google on the website are orphan pages:
There are two kinds of orphan pages: the expected, inevitable, normal orphan pages resulting from known causes; and the unexpected.
So the first thing to do when we see a high volume of orphan pages is to check what they look like and if they are expected or not.
Expected reasons for orphan pages:
Frequent causes of orphan pages that shouldn’t exist but are crawled by Google:
Yes, you read that correctly! Expired content can generate both expected and unexpected orphan pages. The difference is in the HTTP status code. Both kinds were linked on the website when Google crawled them, and were no longer linked when the Botify crawler explored the website. But once the content has expired, the normal orphan page says it’s gone (it returns HTTP 404 or 410), while the abnormal one still exists (it returns HTTP 200). The difference will show in the logs analyzer. In the first case, the number of HTTP 404 responses grows steadily while the number of HTTP 200 responses stays relatively stable. With abnormal orphan pages, the number of HTTP 200 responses keeps growing over time.
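The status-code rule for expired content can be sketched in a few lines of Python. This is an illustration only: the function names and the (day, status) input format are assumptions, not the logs analyzer's actual data model.

```python
from collections import Counter


def classify_expired_orphan(status):
    """Classify an orphan page left behind by expired content, by HTTP status."""
    if status in (404, 410):
        return "expected"    # the page says it is gone
    if status == 200:
        return "unexpected"  # the page still exists but is no longer linked
    return "other"           # redirects, server errors, etc. need a separate look


def daily_status_counts(hits):
    """Count Google hits per day and per status code.

    `hits` is an iterable of (day, status) pairs, a hypothetical format.
    """
    counts = {}
    for day, status in hits:
        counts.setdefault(day, Counter())[status] += 1
    return counts
```

Plotting the result of `daily_status_counts` over several weeks would show which of the two trends described above applies: a growing count of 404s with stable 200s, or 200s that keep growing.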
So, what next? How do we know what we’re looking at?
The logs analyzer helps identify orphan pages. It also tells us whether some of them would be worth reintegrating into the website structure, with information about the visits generated by orphan pages.
Let’s go back to our example. There are approximately 800K orphan pages crawled by Google, far more than the 300K pages explored on the website. The Log Analyzer’s crawl report shows page distribution by type of page, for each group.
The distribution by type of page is very different from what Google finds in the website structure.
A quick look at the log analyzer’s daily history graphs tells us that the green pages that represent 61% of orphan pages in the graph above are redirected:
This graph shows Google’s daily crawl volume on this particular category of pages, by status code. The pages almost always return an HTTP 301 status code (permanent redirection), shown in orange.
The report also tells us which types of orphan pages are active (an active page is a page which generated at least one visit over the analyzed 30-day period), and how this compares to active pages in the website structure:
And most of all, the report indicates how this translates into organic visits. On this website, 5% of organic visits are generated by orphan pages.
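Both metrics are simple ratios. Here is a rough Python sketch, assuming we have a mapping from URL to its number of organic visits over the analyzed period. The function names and sample figures are made up for illustration; they are not the example website's data.

```python
def active_ratio(visits_by_url):
    """Share of pages with at least one organic visit over the period."""
    if not visits_by_url:
        return 0.0
    active = sum(1 for visits in visits_by_url.values() if visits >= 1)
    return active / len(visits_by_url)


def orphan_visit_share(orphan_visits, structure_visits):
    """Share of all organic visits that landed on orphan pages."""
    orphan_total = sum(orphan_visits.values())
    total = orphan_total + sum(structure_visits.values())
    return orphan_total / total if total else 0.0


# Hypothetical figures: URL -> organic visits over 30 days.
orphan_visits = {"/old-promo": 5, "/expired-ad": 0}
structure_visits = {"/": 80, "/products/1": 15, "/about": 0}

print(active_ratio(orphan_visits))
print(orphan_visit_share(orphan_visits, structure_visits))
```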
In this example, the type of page that generates 79% of organic traffic on the website (in the structure) also generates 7% of traffic on orphan pages. And the two categories of pages that generate the most traffic on orphan pages are actually broad buckets for “other” types of pages, which were not categorized more precisely because there are very few of them on the website (the graphs above combine all values below 1%, but the report can show finer details).
The full list of categorized orphan pages, along with their number of organic visits and number of crawls from Google, is provided with the report. This will allow you to investigate these orphan pages and decide how to treat them.
And if you happen to find a surprisingly large number of orphan pages with organic visits, you can bet these are mostly Google Adwords visits that were not properly identified, for instance because the Adwords identifier parameter is missing from the URL.
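One way to check this suspicion is to look at the landing URLs recorded for those orphan pages and see which ones carry the ad click identifier on at least some visits. The Python sketch below assumes the identifier is the `gclid` query parameter (the one Google's auto-tagging appends); if your campaigns use a different parameter, pass it explicitly. The function names and the input format, a plain list of landing URLs, are hypothetical.

```python
from urllib.parse import urlsplit, parse_qs


def has_ads_click_id(url, param="gclid"):
    """True if the landing URL carries a non-empty ad click identifier."""
    return param in parse_qs(urlsplit(url).query)


def pages_with_tagged_visits(landing_urls, param="gclid"):
    """Return the paths that received at least one visit tagged with `param`.

    Orphan pages in this set are likely ad landing pages, so their
    untagged "organic" visits deserve a second look.
    """
    return {
        urlsplit(url).path
        for url in landing_urls
        if has_ads_click_id(url, param)
    }
```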