
Top 3 Causes of Empty Pages (and Why It Matters for Your SEO)

25th September 2014, by Annabelle

Some empty pages may be lurking in your website, unnoticed, making robots' job harder. Being empty, they don't generate any organic visits. The main issue here is crawl waste: when Googlebot is exploring these pages, it's not crawling your actual content. That's why we should track these pages and prevent them from being crawled.

Beware of systematic, crawlable empty pages

A few empty pages don't seem like a red alert. But some empty pages are generated systematically from content pages, and then the problem reaches serious proportions.

This is potentially critical, for instance, for websites with large amounts of daily new content, such as forums. Google could be spending a significant part of its daily crawl budget on new empty pages and failing to discover some of the new content (while finding that half of the new pages are not interesting, which is not an incentive to come back for more as fast as possible). Perhaps the URL patterns are clear enough to give Google a hint, but perhaps they aren't, and Google may not go out of its way to figure it out. We shouldn't take that chance.

The bulk of empty pages are usually created by user action links, when managed through their own URL in an <a href> tag (as opposed to Javascript). These account for 2 of the top 3 causes listed below.

Top causes of empty pages

1) Links meant for registered users
Typically, any link that allows a user to act on a particular piece of content and requires them to be logged in, for instance:

  • Write a review about this product
  • Reply to this post / comment
  • Report abuse for this post / comment
  • Manage this ad (classifieds websites)

Resulting empty pages:
Login pages - as a robot is not logged in - with their distinct URL for each action and each content (product / post / article / ad).

2) Links to forms related to a page
In these cases as well, there is one URL per page, with a content ID parameter:

  • Contact us about this,
  • Email this (with email form hosted on the website)

Resulting empty pages:
Forms with their distinct URL, for each action and content

3) Interstitial pages for meta-redirects
Links do not go directly to the page with the actual content, but to an intermediary (interstitial) page which returns an HTTP 200 (OK) status code. The redirect is placed in a meta refresh tag in the interstitial page's code, which requests the page with the content after a predefined delay (or immediately). But a crawler only sees the interstitial page, which returns HTTP 200 (OK).

Resulting empty pages:
Pages saying "You are being redirected to…"

The related question is: do we want the target of the meta-redirect to be crawled? If so, we should use an HTTP 301 redirect or, even better, no redirect at all. If not, then the interstitial should not be crawlable either.
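For illustration, a meta-redirect interstitial typically looks like the sketch below (the target URL is hypothetical); because the HTTP status is 200, a crawler treats it as a page in its own right:

```html
<!-- Served with HTTP 200 (OK), so crawlers see a real (but empty) page -->
<html>
  <head>
    <meta http-equiv="refresh" content="0; url=/actual-content">
    <title>Redirecting…</title>
  </head>
  <body>You are being redirected…</body>
</html>
```

A server-side 301, by contrast, lives in the HTTP response itself (status code plus a Location header), so the crawler never sees an intermediate page at all.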

What can we do?
In all three cases, the answer is to code the links in JavaScript instead of using <a href> tags, and to disallow the URL patterns in the robots.txt file.
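For instance, the user-action URL patterns from the three causes above could be blocked with robots.txt rules along these lines (the patterns are hypothetical; adapt them to your site's actual URLs):

```
User-agent: *
# Hypothetical patterns for user-action and interstitial URLs
Disallow: /login
Disallow: /*/reply
Disallow: /*/report-abuse
Disallow: /contact-us
Disallow: /redirect
```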
If you are building new pages and don't need Google to "forget" empty pages that were already crawled, then the disallow is not necessary: JavaScript alone will do. Simply make sure the URL does not appear in the JavaScript call (neither in full nor as a relative URL), or Google may find it tempting to try it out. You might say, "but Google can now read some JavaScript". Yes, but not all of it, nor will it systematically try. What we want to avoid here is all these empty pages filling the search engine's crawl queue.
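As a sketch of the JavaScript approach (the URL scheme and data attributes are hypothetical), the action URL can be assembled from parts at click time so that it never appears verbatim in the page source:

```javascript
// Build the login-action URL from parts at click time,
// so the full URL never appears verbatim in the page source.
// The /user/<action>?id=<contentId> scheme is hypothetical.
function buildActionUrl(action, contentId) {
  var parts = ['/user/', action, '?id=', contentId];
  return parts.join('');
}

// Attach to non-link elements (e.g. <span data-action="reply" data-id="42">)
// instead of an <a href> tag. Guarded so the sketch also runs outside a browser.
if (typeof document !== 'undefined') {
  document.querySelectorAll('[data-action]').forEach(function (el) {
    el.addEventListener('click', function () {
      window.location.href = buildActionUrl(el.dataset.action, el.dataset.id);
    });
  });
}
```

Because the element carries no href and the URL is only assembled inside a function call, there is nothing for a crawler to extract and queue.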

Check if your website creates crawlable empty pages

In your Botify Analytics report

The first step is to check pages with only one incoming link, which is the case for the type of empty page we are looking for.

Go to the Inlinks section of the Botify Analytics report. Scroll down to the URLs zone and click on "URLs linked only once":

And then on "Explore all URLs" to enter the URL Explorer, which will show the following settings:

Note: the filter is set on "unique" incoming links, so that we also get pages which receive several links, all from the same page.
As pages with a single incoming link may also include actual potential SEO landing pages that simply don't receive enough links, we need additional information to identify empty pages. Some pages with only one incoming link may also be redirected.
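Outside the report, the same first cut can be sketched from a link export (the data shape is hypothetical): count unique source pages per destination URL, and keep destinations linked from exactly one page.

```javascript
// From a hypothetical export of links (source -> destination),
// count unique linking pages per destination and keep the
// destinations that are linked from exactly one page.
function urlsLinkedOnlyOnce(links) {
  var sources = {};
  links.forEach(function (link) {
    sources[link.to] = sources[link.to] || new Set();
    sources[link.to].add(link.from);
  });
  return Object.keys(sources).filter(function (url) {
    return sources[url].size === 1;
  });
}

var links = [
  { from: '/post/1', to: '/login?next=/post/1/reply' },
  { from: '/post/1', to: '/login?next=/post/1/reply' }, // same source twice
  { from: '/post/1', to: '/post/2' },
  { from: '/home',   to: '/post/2' }
];
// '/login?next=/post/1/reply' qualifies (one unique source); '/post/2' does not
```

Note how the duplicate link from the same source page still counts as a single unique incoming link, matching the "unique" filter above.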

We are going to select URLs returning HTTP 200 (OK) only, and display more fields to help figure out which may be empty pages:

  • Number of pages with the same Title
  • Number of pages with the same H1
  • Page code size
  • The page that holds the single link to the page we are looking at

In most cases, the title and/or H1 will be identical on all empty pages with the same cause. Page size can also give a hint if there are a number of pages with a similar size (size alone is not enough; there may be a heavy template).
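The same grouping can be sketched in code (assuming a hypothetical export of crawled page records): count pages per H1 and sort descending, so systematic empty pages rise to the top.

```javascript
// Group crawled pages by H1 and count, to surface systematic empty pages.
// `pages` stands in for a hypothetical export of crawl data.
function countByH1(pages) {
  var counts = {};
  pages.forEach(function (page) {
    counts[page.h1] = (counts[page.h1] || 0) + 1;
  });
  // Sort H1s by descending number of URLs
  return Object.keys(counts)
    .map(function (h1) { return { h1: h1, urls: counts[h1] }; })
    .sort(function (a, b) { return b.urls - a.urls; });
}

var sample = [
  { url: '/login?next=/post/1/reply', h1: 'Please log in' },
  { url: '/login?next=/post/2/reply', h1: 'Please log in' },
  { url: '/post/1', h1: 'How to fix crawl waste' }
];
// "Please log in" tops the list with 2 URLs: a likely systematic empty page
```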

Here are corresponding filters and displayed fields settings (remove unwanted settings by clicking on the cross; make your selections from the drop-down lists; for displayed fields, start typing a part of the field name in the "fields to display" area to narrow down the selection):

In this example (ordered by highest number of URLs per H1 - click twice on the H1 column header to sort), there is a meta-redirect on several thousand pages:

You can easily see more information about an empty page, or the page it is linked from, simply by clicking on the URL. To go directly to the page on your website, click on the blue arrow on the right of the URL.

Let's say there are a number of pages with the same H1, and you would like to be able to see only one example URL for each different H1. Add "First duplicate H1 found" to the filter rules and click on "Apply".

Out of those, we can also find out which may be already identified as useless for SEO and as a result have a noindex meta tag - but still create Google crawl waste because the search engine has to request the page to find out it should not be indexed.

Add a "has robots anchors as 'noindex' " filter set to "true" and click on "Apply".

See what happens when navigating the website through a robot's eyes

As causes are very specific, we can also approach the problem the other way around, and check what a robot gets while navigating the website. If we find empty pages, we can then search for URLs with the same pattern in Botify Analytics' URL Explorer.

Go to your website after disabling all Javascript, cookies and meta-redirects in your Web browser using a developer add-on (for instance, this Firefox extension). Click on all user action links you can find to see if you get a new page (new URL).
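If you prefer to check programmatically, a simplified sketch of detecting a meta-refresh interstitial in fetched HTML could look like this (in practice the page would be retrieved over HTTP first; the URLs are hypothetical):

```javascript
// Return the meta-refresh target URL in an HTML document, or null if none.
// Simplified regex check; a real implementation would parse the DOM.
function findMetaRefresh(html) {
  var match = html.match(
    /<meta[^>]+http-equiv=["']?refresh["']?[^>]*content=["']?\s*\d+\s*;\s*url=([^"'>\s]+)/i
  );
  return match ? match[1] : null;
}

var interstitial =
  '<html><head>' +
  '<meta http-equiv="refresh" content="0; url=/actual-content">' +
  '</head><body>You are being redirected…</body></html>';
// findMetaRefresh(interstitial) yields the redirect target, '/actual-content'
```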

This will also allow you to find empty pages that are disallowed to robots (a disallow directive in the website's robots.txt file, or a nofollow on the link to the page). This is not the worst-case scenario, but it's not ideal either. Although Google won't waste any crawl on these pages, they still cause significant link juice waste if links to empty pages are coded with <a href> tags: these links are assigned a portion of the PageRank of the page they are on, but don't transmit it to any page (it falls into a PageRank "black hole").

We won't be able to query these pages in a standard Botify Analytics report, as the Botify robot follows the same rules as Googlebot. But we will still get information about links to disallowed / nofollow pages in the "Outlinks" section of the report (from the perspective of the crawlable page they are linked from).

And if you wanted to know more about empty pages that are currently disallowed in the robots.txt file, you can still do another Botify Analytics crawl using the Virtual Robots.txt functionality: paste your robots.txt file's content and remove the line that disallows these pages before starting the crawl.

What's your experience? Do you see other causes of empty pages spoiling your SEO? We'd love to hear about it; don't hesitate to drop a comment on this post!