Check if you fell into some pagination traps! Here are some common mistakes. Go down the list and check “Yes” or “No”. What’s your score?
This is a coding mistake. Let’s say the first page of a list is www.mywebsite.com/list.html. It has a duplicate, www.mywebsite.com/list.html?p=1, and every paginated page (www.mywebsite.com/list.html?p=2, etc.) links to that duplicate instead of linking to www.mywebsite.com/list.html.
*How to check:*
By looking at the links on a paginated page. If the link to the duplicate is on one page, it will be on all of them.
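The same check can be scripted. Here is a minimal sketch using only the Python standard library. It assumes the page number is carried in a `p` query parameter, as in the example URLs above, and `LinkCollector` and `links_to_page_one_duplicate` are hypothetical names. It lists every link on one paginated page that points to a `p=1` URL instead of the clean first-page URL.

```python
from html.parser import HTMLParser
from urllib.parse import parse_qs, urljoin, urlsplit


class LinkCollector(HTMLParser):
    """Collects the href of every <a> tag in a page."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.hrefs.append(href)


def links_to_page_one_duplicate(html, page_url, param="p"):
    """Return absolute URLs of links whose pagination parameter equals 1.

    `param` is an assumption: adapt it to your site's pagination parameter.
    """
    collector = LinkCollector()
    collector.feed(html)
    duplicates = []
    for href in collector.hrefs:
        absolute = urljoin(page_url, href)  # resolve relative links
        values = parse_qs(urlsplit(absolute).query).get(param, [])
        if "1" in values:
            duplicates.append(absolute)
    return duplicates
```

Run it on the HTML of any page except the first (for example, page 2). A non-empty result means the pagination links to the duplicate of the first page.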
This is a server response problem: some paginated pages respond with an empty list and an HTTP 200 (OK) status code. For instance, a list has 3 pages, but www.mywebsite.com/list.html?p=4 responds with an empty template instead of returning HTTP 404 (Not found).
Normally, this URL is not linked from the website (and if it is, that is an additional problem!). But it can still cause trouble as soon as a list becomes shorter than it was before. Google crawled the last paginated pages while they existed, and the search engine keeps coming back to check on them even after they are no longer linked. As a result, Google crawls pages with no content and associates negative quality signals with the website.
*How to check:*
To check the server response, take an existing paginated URL, change the page number to one that doesn’t exist, and check the HTTP status code returned by the server. If it returns HTTP 200, we need to find out whether this behavior is already causing problems (empty pages are being crawled) or is just a problem waiting to happen. On some websites (those with editorial content, for instance), lists usually never get smaller; they just get longer as new articles are published. On others (such as e-commerce websites, and marketplaces in particular), lists often shrink.
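Here is a minimal Python sketch of this probe. It takes a paginated URL, swaps in a page number one past the known last page, and reports whether the server still answers HTTP 200. The `p` parameter and the function names are assumptions for illustration. You have to supply the real last page number yourself.

```python
from urllib.error import HTTPError
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit
from urllib.request import Request, urlopen


def with_page(url, page, param="p"):
    """Return `url` with its pagination parameter set to `page`."""
    parts = urlsplit(url)
    query = dict(parse_qsl(parts.query))
    query[param] = str(page)
    return urlunsplit(parts._replace(query=urlencode(query)))


def status_code(url, timeout=10):
    """Return the HTTP status code, including for 4xx/5xx responses."""
    try:
        with urlopen(Request(url, method="GET"), timeout=timeout) as response:
            return response.status
    except HTTPError as error:  # urlopen raises on non-2xx/3xx codes
        return error.code


def empty_pagination_returns_200(url, last_page, param="p"):
    """True if the page after the last real page is served with HTTP 200."""
    probe = with_page(url, last_page + 1, param)
    return status_code(probe) == 200
```

For a list with 3 pages, `empty_pagination_returns_200("http://www.mywebsite.com/list.html?p=2", 3)` requests `?p=4`. A `True` result reproduces the problem described above.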
If empty paginated pages are crawled, they will appear as orphans in the Botify Log Analyzer report. Filter the report to show pagination only, and go to the “overview” page.
In the example below, there is a large number of orphan pages:
You can confirm there is a problem if you know that the length of your lists fluctuates but, in the Log Analyzer’s monitoring interface, you don’t see Google crawling any paginated page that returns HTTP 404 (Not found).
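If you don’t have a log analysis tool at hand, you can approximate this check on raw server logs. Here is a rough Python sketch. It assumes Apache/Nginx “combined” log format and a `p` pagination parameter, and the function name is hypothetical. It counts the status codes that Googlebot received on paginated URLs. It matches the user-agent string only, which can be spoofed. Google recommends verifying genuine Googlebot hits with a reverse DNS lookup.

```python
import re
from collections import Counter

# Assumption: logs use the Apache/Nginx "combined" format.
LOG_LINE = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) \S+" (?P<status>\d{3}) \S+ '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)
# Assumption: pagination uses a "p" query parameter.
PAGINATION = re.compile(r"[?&]p=\d+")


def googlebot_pagination_statuses(lines):
    """Count HTTP status codes served to Googlebot on paginated URLs."""
    counts = Counter()
    for line in lines:
        match = LOG_LINE.match(line)
        if not match or "Googlebot" not in match["agent"]:
            continue  # unparsable line or not a Googlebot user agent
        if PAGINATION.search(match["path"]):
            counts[int(match["status"])] += 1
    return counts
```

Suppose your lists are known to shrink over time, yet the counts never include 404. That suggests empty pages are being served with HTTP 200.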
In the example below, we zoomed in on Google’s crawl of pagination only, and all pages return HTTP 200.
If some paginated pages had been removed (returning HTTP 404), they would also be counted as “lost URLs” (see the counter in the upper right corner of the example below). Lost URLs are URLs that now return a non-200 HTTP status code after previously returning HTTP 200.
Paginated pages include a rel=canonical tag pointing to the first page of the list. Such a tag would indicate that all paginated pages have content similar to the first page, and that the first page is the main version. That’s not the case. Worse, we would basically be telling Google not to bother looking at the paginated pages’ content, even though these pages link to key pages that may not be linked anywhere else.
This mistake is usually an attempt to persuade Google that the first page is the best candidate for search results. But there is no need: internal linking normally conveys that message already.
How to check:
By looking at a paginated page’s code. If the tag is on one page, it will be on all of them.
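Here is a minimal Python sketch of this code check. It reads the rel=canonical tag from the HTML of a paginated page (page 2 or later) and tells you whether the tag points to the first page of the list. The function and class names are hypothetical. A self-referencing canonical on the paginated page is not flagged.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class CanonicalFinder(HTMLParser):
    """Records the href of the first <link rel="canonical"> tag."""

    def __init__(self):
        super().__init__()
        self.canonical = None

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        rel_tokens = (attributes.get("rel") or "").lower().split()
        if tag == "link" and "canonical" in rel_tokens and self.canonical is None:
            self.canonical = attributes.get("href")


def canonicalizes_to_first_page(html, page_url, first_page_url):
    """True if a paginated page's canonical tag targets the first page."""
    finder = CanonicalFinder()
    finder.feed(html)
    if finder.canonical is None:
        return False  # no canonical tag at all
    return urljoin(page_url, finder.canonical) == first_page_url
```

Self-closing `<link ... />` tags are handled too, because `HTMLParser` routes them through `handle_starttag` by default.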
This mistake results from overlooking how navigation features meant for users affect the crawlable structure of the website.
Alternate versions of each list are available at their own URLs and can be crawled by robots: sorting options, more or fewer items per page, with or without images, or other display options. These create pseudo-duplicates.
This issue actually goes beyond pagination, as the first page of each alternate version of the list is also a duplicate from an SEO perspective. It targets the same traffic as the first page with the default sorting / display options.
However, we still need to mention it here, because pagination makes the problem reach enormous proportions. Each alternate sorting or display option comes with its own full set of paginated pages.
How to check:
By looking at a paginated page’s code. It will include links to alternate versions whose URLs contain display / sorting parameters. Refreshing the current page using AJAX instead of going to a different URL solves the problem.
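Here is a minimal Python sketch of this check. It lists the links on a paginated page whose query string contains a sorting or display parameter. The parameter names in `SUSPECT_PARAMS` are hypothetical placeholders; replace them with the ones your site actually uses.

```python
from html.parser import HTMLParser
from urllib.parse import parse_qs, urljoin, urlsplit

# Hypothetical parameter names: adapt to your own URLs.
SUSPECT_PARAMS = {"sort", "order", "limit", "per_page", "view"}


class LinkCollector(HTMLParser):
    """Collects the href of every <a> tag in a page."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.hrefs.append(href)


def display_variant_links(html, page_url, suspect_params=SUSPECT_PARAMS):
    """Return absolute URLs of links that carry a sorting/display parameter."""
    collector = LinkCollector()
    collector.feed(html)
    found = []
    for href in collector.hrefs:
        absolute = urljoin(page_url, href)
        params = set(parse_qs(urlsplit(absolute).query, keep_blank_values=True))
        if params & suspect_params:
            found.append(absolute)
    return found
```

An empty result on a page that visibly offers sorting or display options suggests those options don’t create new crawlable URLs, for example because they are handled with AJAX.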
This mistake usually results from a deliberate choice. Some site owners assume that items listed first receive more link juice than items listed last, so they introduce randomness into the order in which items are listed to distribute “link juice” more evenly. However, this won’t do any good. A changing structure hurts search engines’ crawl efficiency and blurs their view of the website. As a general rule, search engine robots need a stable navigation structure and a permanent path to content.
How to check:
Set up your web browser to use a Google user agent (you can use this Firefox extension, for instance) and reload a page several times to check whether items are always listed in the same order.
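Here is a minimal Python sketch that does the same thing without a browser extension. It downloads a page several times while announcing a Googlebot user agent, extracts the item links in order, and checks that the order never changes. The item pattern (for example `r"^/product/"`) and the function names are assumptions for illustration. The script only sees the order in the raw HTML, not any reordering done later by JavaScript.

```python
import re
from html.parser import HTMLParser
from urllib.request import Request, urlopen

# A common Googlebot user-agent string. Sending it only shows what the
# server returns to requests that claim to be Googlebot.
GOOGLEBOT_UA = "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"


def fetch_as_googlebot(url, timeout=10):
    """Download a page while announcing a Googlebot user agent."""
    request = Request(url, headers={"User-Agent": GOOGLEBOT_UA})
    with urlopen(request, timeout=timeout) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")


class LinkCollector(HTMLParser):
    """Collects the href of every <a> tag, in document order."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.hrefs.append(href)


def item_order(html, item_pattern):
    """Return the item links (those matching item_pattern) in page order."""
    collector = LinkCollector()
    collector.feed(html)
    return [href for href in collector.hrefs if re.search(item_pattern, href)]


def order_is_stable(url, item_pattern, attempts=5, fetch=fetch_as_googlebot):
    """Fetch the page several times and compare the item order each time."""
    orders = [item_order(fetch(url), item_pattern) for _ in range(attempts)]
    return all(order == orders[0] for order in orders)
```

For example, `order_is_stable("http://www.mywebsite.com/list.html?p=2", r"^/product/")` returns `False` if any of the five downloads lists the items in a different order. The `fetch` argument lets you swap in another download function.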
For more on pagination: