Some badly implemented canonical tags can cancel out the effect of appropriate ones.
First of all, why canonical tags? Once duplicate content has been identified, it’s not always easy to remove it, or to prevent search engines from crawling it. In that case, we implement canonical HTML tags that indicate, on each duplicate page, which URL holds the primary version of its content. It is a (hopefully temporary) solution that easily solves half the problem; the other half, Google’s crawl being wasted on the duplicates, remains.
However, all canonical tags need to be implemented the right way: if some of them link pages with completely different content, Google can choose to ignore them entirely. And not only those: it may also disregard all other canonical tags on the website, on the grounds that the rules behind the canonical implementation are not valid.
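As a reminder of what such a tag looks like in practice, here is a minimal Python sketch that reads the canonical URL declared in a page's HTML. The `example.com` URLs, the sample markup and the `canonical_of` helper are illustrative assumptions, not taken from any real website or from Botify.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class CanonicalParser(HTMLParser):
    """Collects the href of every <link rel="canonical"> found in a page."""

    def __init__(self):
        super().__init__()
        self.canonicals = []

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        attributes = dict(attrs)
        # rel may hold several space-separated values, and may be missing.
        rel_values = (attributes.get("rel") or "").lower().split()
        if "canonical" in rel_values and attributes.get("href"):
            self.canonicals.append(attributes["href"])


def canonical_of(page_url, html):
    """Return the absolute canonical URL declared in html, or None."""
    parser = CanonicalParser()
    parser.feed(html)
    if not parser.canonicals:
        return None
    # Resolve relative hrefs against the URL of the page itself.
    return urljoin(page_url, parser.canonicals[0])


# Hypothetical duplicate page (same content, extra parameters in the URL).
sample_url = "https://www.example.com/shoes/red-sneakers?color=red&sort=price"
sample_html = """
<html><head>
<link rel="canonical" href="https://www.example.com/shoes/red-sneakers">
</head><body>...</body></html>
"""
print(canonical_of(sample_url, sample_html))
```

In this sketch, the duplicate URL with parameters declares the clean URL as the primary version of its content.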
How well are your canonical tags implemented?
Let’s see. Go to the canonical tab of the Botify Analytics report. We are going to check pages with a canonical tag pointing to another page (which is the tag’s purpose), shown in yellow. You can of course do the same to export pages without any canonical tag, or with a tag pointing to themselves.
Either click on the yellow area of the chart above, or on the “URLs with canonical to other URL” block in the “Metrics” section:
You will get a sample list of pages, with their URL and the URL their canonical tag points to. Click on “Explore all URLs” on the upper right of the list, to explore and export the full list in the URL Explorer.
We now have the statistics and the full data to make site-specific verifications.
What else? Here are a couple of ideas: look for canonicals on pagination, and check if metadata and canonical tags are sending consistent messages.
Any canonical tags in pagination?
Let’s check for canonical tags in pagination that point to the first page of the list. That’s one of the top pagination mistakes.
When you entered the URL Explorer, the following filter was already selected, for pages with a canonical tag to another page:
Add a filter on the URL to select pagination only, with the regular expression (regex) that corresponds to paginated pages on your website:
Click on Apply. In our example, we get 203 results. A quick look at the results table indicates that pagination has canonical tags to the first page of a list (same URL without page number information).
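The same check can be run outside the tool on an exported list of URLs and their canonical targets. The sketch below is only an illustration: it assumes pagination is carried by a `page` query parameter, and the `rows` sample and both helper functions are hypothetical. Replace the regex with the one that matches paginated pages on your website.

```python
import re

# Assumed pagination pattern: a "page" query parameter with value 2 or higher.
# Replace it with the regex that matches paginated URLs on your own website.
PAGINATION = re.compile(r"([?&])page=(?:[2-9]|[1-9]\d+)(&|$)")


def strip_page_param(url):
    """Return the URL without its page parameter (first page of the list)."""
    def repl(match):
        separator, tail = match.group(1), match.group(2)
        # Keep the separator only if other parameters follow.
        return separator if tail == "&" else ""
    return PAGINATION.sub(repl, url, count=1)


def canonical_to_first_page(rows):
    """Return paginated rows whose canonical points to page one of the list."""
    flagged = []
    for url, canonical in rows:
        if PAGINATION.search(url) and canonical == strip_page_param(url):
            flagged.append((url, canonical))
    return flagged


# Hypothetical export rows: (URL, URL its canonical tag points to).
rows = [
    ("https://www.example.com/shoes?page=2", "https://www.example.com/shoes"),
    ("https://www.example.com/shoes?page=3", "https://www.example.com/shoes?page=3"),
    ("https://www.example.com/shoes/red?ref=home", "https://www.example.com/shoes/red"),
]
for url, canonical in canonical_to_first_page(rows):
    print(url, "->", canonical)
```

Only the first row is flagged here: the second is paginated but points to itself, and the third is not a paginated URL.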
Do canonical tags and content tags send consistent messages?
Let’s check how pages with duplicate metadata (which most likely include fully duplicated content) are managed through canonical tags. To do that, we’ll look at pages whose title, H1, and meta-description are all found on other pages.
Go to the URL Explorer, clear all filters and select the following (all three are found in the dropdown list’s “Metadata” section):
- Number of pages with same H1 > 0
- Number of pages with same description > 0
- Number of duplicate title > 0
Select the following fields to display (click in the fields area to open the drop-down list and select them one by one; “URL” is the first listed; start typing “H1” / “title” / “description” or “number of” in the fields zone to narrow the list down):
- Number of pages with the same H1
- Number of pages with the same description
- Number of duplicate title
Click on “Apply”. Let’s look at the result for our example:
That’s really significant, considering that the website has fewer than 5,000 pages.
It would be interesting to find out how many of these pages are identified as the primary version of duplicate content, through the use of canonical tags: we are going to select pages which have other pages pointing to them through their canonical tag (incoming canonical tags, from the page’s point of view).
Add the following filter:
- Number of incoming canonical > 0
And add the following field to display:
- Number of incoming canonical (start typing “canonical” in the fields zone)
Click on “Apply”. In our example, 244 pages are designated as the primary version of duplicates, out of the 1,366 pages with all content tags duplicated.
Note: in the table, the numbers of duplicate titles / descriptions / H1s include the current page. For instance, the 14 title and description duplicates for the first URL are the 13 incoming canonicals and the URL itself.
The rest of the pages with all content tags duplicated include pages with a canonical tag pointing to one of the primary versions we identified. But they could also include pages with no canonical tag, or with a tag pointing to themselves.
To find out, update the filters:
- Change the “Number of incoming canonical” filter to = 0 (to exclude the primary versions)
- Add “Canonical to” does not exist
Click on Apply. In our example, no page is missing a canonical tag.
Lastly, are there any pages that are not a primary version but have a tag pointing to themselves? Let’s see:
- Change the “Canonical to” filter to “exists”
- Add “Canonical is the same URL” is “true”.
Now we must verify whether these pages are duplicates that were overlooked when the canonical tags were implemented. If they are not, their title, meta-description and H1 tags clearly need to be more precise!
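The successive filters used in this section can be summarized with a rough Python sketch that sorts pages with fully duplicated metadata into the four cases we looked at. It assumes a crawl export with one record per page; the `Page` structure, the bucket names and the sample data are hypothetical, and this is not how Botify computes its metrics.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Optional


@dataclass
class Page:
    url: str
    title: str
    h1: str
    description: str
    canonical: Optional[str]  # None when the page has no canonical tag


def classify_duplicates(pages):
    """Group pages with fully duplicated metadata by canonical status."""
    titles = Counter(p.title for p in pages)
    h1s = Counter(p.h1 for p in pages)
    descriptions = Counter(p.description for p in pages)
    # Incoming canonicals: tags on other pages that point to this URL.
    incoming = Counter(
        p.canonical for p in pages if p.canonical and p.canonical != p.url
    )
    buckets = {
        "primary": [],
        "canonical_to_other": [],
        "no_canonical": [],
        "self_canonical": [],
    }
    for p in pages:
        # Keep only pages whose title, H1 and description all appear elsewhere.
        duplicated = (
            titles[p.title] > 1
            and h1s[p.h1] > 1
            and descriptions[p.description] > 1
        )
        if not duplicated:
            continue
        if incoming[p.url] > 0:
            buckets["primary"].append(p.url)
        elif p.canonical is None:
            buckets["no_canonical"].append(p.url)
        elif p.canonical == p.url:
            buckets["self_canonical"].append(p.url)
        else:
            buckets["canonical_to_other"].append(p.url)
    return buckets


# Hypothetical sample: four pages share all metadata, one page is unique.
meta = ("Red sneakers", "Red sneakers", "Buy red sneakers online")
pages = [
    Page("/a", *meta, canonical="/a"),
    Page("/b", *meta, canonical="/a"),
    Page("/c", *meta, canonical="/c"),
    Page("/d", *meta, canonical=None),
    Page("/e", "Blue boots", "Blue boots", "Buy blue boots online", canonical="/e"),
]
for status, urls in classify_duplicates(pages).items():
    print(status, urls)
```

Pages that land in the `self_canonical` and `no_canonical` buckets are the ones worth reviewing by hand.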