Maintaining Quality On Your AMP HTML Pages


December 13, 2016

It's a mobile world and you're staying on the leading edge: Your mobile page speed is fast and you've started publishing AMP HTML versions of your pages. Google's mobile-first index? You're ready.

What is AMP?

If you've tried to read an article on your smartphone over the past few years, you've probably been frustrated by slow loading times and ads that get in the way of the content you're trying to read. The Accelerated Mobile Pages (AMP) Project is an open-source initiative championed by Google to deliver a lightning-fast mobile content experience.

Google cares a lot about the mobile experience because the majority of its users are on mobile devices (Search Engine Land, 2015). Perhaps more importantly, it has identified key behavioral stats like this one: 53% of mobile users abandon sites that take 3 seconds or more to load (DoubleClick study, 2016). That directly leads to lower engagement and monetization for publishers.

We've been seeing incredible crawl trends that reflect Google's interest. Take, for example, this graph showing the rapid growth and volume of Google Smartphone crawl of AMP URLs soon after a site launched them.

Whether it's because Google has been championing AMP or because it's preparing for the mobile-first index, there's no doubt Google is giving a lot of attention to AMP URLs.

That means it's important to get your configuration right. Validating your AMP HTML code using tools provided by Google and the AMP Project isn't enough. You can't just set it and forget it.

Here is a list of questions to answer to make sure everything is going as expected with your AMP URLs.

AMP HTML Discovery

Let's start with discovery. The two primary methods to enable discovery of AMP pages are:

Include a <link rel="amphtml"> tag in the <head> of your non-AMP web pages
Include AMP URL entries in your XML sitemaps
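To audit the sitemap side of discovery, you can pull every <loc> entry from a sitemap and set aside the ones that look like AMP URLs. The Python sketch below assumes a standard <urlset> sitemap and that your AMP URLs share a recognizable path marker; AMP_MARKERS and the example.com URLs are hypothetical.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol for <urlset> documents.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}

# Hypothetical URL markers; adjust to match how your site names AMP URLs.
AMP_MARKERS = ("/amp/", "/amp.html")


def sitemap_locs(xml_text):
    """Return every <loc> value listed in a <urlset> sitemap."""
    root = ET.fromstring(xml_text)
    return [loc.text.strip() for loc in root.findall("sm:url/sm:loc", SITEMAP_NS)]


def looks_like_amp(url):
    """True if the URL contains one of the hypothetical AMP markers."""
    return any(marker in url for marker in AMP_MARKERS)


def split_amp_urls(locs):
    """Split sitemap URLs into (AMP-looking URLs, all other URLs)."""
    amp = [u for u in locs if looks_like_amp(u)]
    other = [u for u in locs if not looks_like_amp(u)]
    return amp, other


# Example sitemap with hypothetical example.com URLs.
sample = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://www.example.com/news/story-1/</loc></url>
  <url><loc>https://www.example.com/news/story-1/amp/</loc></url>
  <url><loc>https://www.example.com/news/story-2/</loc></url>
</urlset>"""

amp, other = split_amp_urls(sitemap_locs(sample))
print(amp)    # ['https://www.example.com/news/story-1/amp/']
print(other)
```

Comparing the AMP list against the pages that should have AMP versions shows which AMP URLs the sitemap is not exposing.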

How can you be sure the pages that should have AMP URLs actually do?

If you're not publishing AMP HTML URLs for every page on your website, odds are you have some logic to determine which pages get AMP versions (articles on a news website, for example). In an ideal world, you'd have a method for keeping track of which URLs should have an AMP version (maybe a table or flag in a database).

Identify your AMP URLs

The point here is to be able to generate a list of URLs that you can crawl to check whether the amphtml link (a <link rel="amphtml"> element whose href is the page's AMP URL) is present in the <head>. If you have a method to define which URLs should have it, then you can simply crawl that list. If not, you will need to crawl your website with a tool that can capture the amphtml link from the head of your pages.

Once you've crawled your list or your site and identified which pages have the AMP link and which do not, determine whether the results are as expected. If they aren't, you will need to find out why AMP links are missing. There could be many reasons your AMP links aren't showing when you expect them to, and those reasons are likely to be particular to each website. We recommend revisiting your logic for when AMP URLs should be published and made discoverable, and reviewing the implementation with your engineers to find the gap.
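If your crawler can export raw HTML, the capture and comparison can be sketched with Python's standard html.parser module. This is a minimal sketch, assuming you already have the crawled HTML in memory and a set of URLs that your own logic (for example, a hypothetical database flag) marks as needing an AMP version; fetching pages is left to your crawler, and the example.com URLs are illustrative.

```python
from html.parser import HTMLParser


class AmpHtmlFinder(HTMLParser):
    """Records the href of the first <link rel="amphtml"> found inside <head>.

    Assumes the page has an explicit <head> element.
    """

    def __init__(self):
        super().__init__()
        self.in_head = False
        self.amphtml = None

    def handle_starttag(self, tag, attrs):
        if tag == "head":
            self.in_head = True
        elif tag == "link" and self.in_head and self.amphtml is None:
            attributes = dict(attrs)
            rel_values = (attributes.get("rel") or "").lower().split()
            if "amphtml" in rel_values:
                self.amphtml = attributes.get("href")

    def handle_endtag(self, tag):
        if tag == "head":
            self.in_head = False


def find_amphtml(html):
    """Return the amphtml href for one page's HTML, or None if it has none."""
    finder = AmpHtmlFinder()
    finder.feed(html)
    finder.close()
    return finder.amphtml


def audit_amp_links(pages, should_have_amp):
    """Compare crawled pages with the URLs flagged as needing an AMP version.

    pages: dict mapping URL -> HTML from your crawl.
    should_have_amp: set of URLs your own logic says should have AMP versions.
    Returns (missing, unexpected) as sorted lists of URLs.
    """
    has_link = {url for url, html in pages.items() if find_amphtml(html)}
    missing = sorted(should_have_amp - has_link)
    unexpected = sorted(has_link - should_have_amp)
    return missing, unexpected


pages = {
    "https://www.example.com/news/story-1/": (
        '<html><head><link rel="amphtml" '
        'href="https://www.example.com/news/story-1/amp/"></head><body></body></html>'
    ),
    "https://www.example.com/news/story-2/": "<html><head><title>Story 2</title></head><body></body></html>",
}
missing, unexpected = audit_amp_links(
    pages, {"https://www.example.com/news/story-1/", "https://www.example.com/news/story-2/"}
)
print(missing)  # ['https://www.example.com/news/story-2/']
```

The "missing" list is the set of pages to take to your engineers; the "unexpected" list catches the opposite case, pages exposing an AMP link your logic didn't intend.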

Are your AMP pages canonicalized properly?

The second important step for AMP HTML discovery is ensuring that the AMP URLs have a rel canonical link in the <head> pointing to their base URL. The exception is if you're only publishing AMP pages, in which case they should have a canonical link that references themselves.

Crawl, Capture and Check

The process here is much the same as the first crawl, with one small difference. Here is our three-step process:

Crawl the AMP URLs
Capture the contents of the rel canonical link
Check to be sure the AMP URLs are canonical to the correct URL
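The capture and check steps can be sketched in Python as well. This sketch assumes step one (the crawl) has already produced a dict of AMP URL to HTML, and that you know which base URL each AMP URL should point to; the function names and example.com URLs are hypothetical. For an AMP-only site, the expected value is simply the AMP URL itself.

```python
from html.parser import HTMLParser


class CanonicalFinder(HTMLParser):
    """Captures the href of the first <link rel="canonical"> in a page."""

    def __init__(self):
        super().__init__()
        self.canonical = None

    def handle_starttag(self, tag, attrs):
        if tag == "link" and self.canonical is None:
            attributes = dict(attrs)
            if "canonical" in (attributes.get("rel") or "").lower().split():
                self.canonical = attributes.get("href")


def capture_canonical(html):
    """Step 2: capture the rel canonical value from one AMP page's HTML."""
    finder = CanonicalFinder()
    finder.feed(html)
    finder.close()
    return finder.canonical


def check_amp_canonicals(amp_pages, expected):
    """Step 3: flag AMP URLs whose canonical isn't the expected URL.

    amp_pages: dict mapping AMP URL -> HTML (step 1, from your crawler).
    expected: dict mapping AMP URL -> the URL it should be canonical to.
    Returns a list of (amp_url, captured, expected) tuples that don't match.
    """
    problems = []
    for amp_url, html in amp_pages.items():
        captured = capture_canonical(html)
        wanted = expected.get(amp_url)
        if captured != wanted:
            problems.append((amp_url, captured, wanted))
    return problems


amp_pages = {
    "https://www.example.com/news/story-1/amp/": (
        '<html><head><link rel="canonical" '
        'href="https://www.example.com/news/story-1/"></head></html>'
    ),
    "https://www.example.com/news/story-2/amp/": "<html><head></head></html>",
}
expected = {
    "https://www.example.com/news/story-1/amp/": "https://www.example.com/news/story-1/",
    "https://www.example.com/news/story-2/amp/": "https://www.example.com/news/story-2/",
}
problems = check_amp_canonicals(amp_pages, expected)
print(problems)
```

A captured value of None means the AMP page has no canonical link at all, which is usually the first thing to fix.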

You can either use the list of AMP URLs discovered from the first crawl or use another source, such as your log files or web analytics (filtering for URLs that contain a pattern such as /amp/, /amp.html or amp.domain.com).

In general, you want the canonical link on the AMP URL to:

Point to a URL that gives a 200 HTTP status code
Be indexable (doesn't have meta noindex, isn't disallowed by robots.txt)
Be canonical to itself
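Once you have a canonical target's status code, HTML and the site's robots.txt, the three criteria above can be checked per URL. This is a minimal Python sketch using html.parser and urllib.robotparser; fetching is assumed to happen elsewhere, the Googlebot user agent default and example URLs are illustrative, and treating a page with no canonical tag as acceptable is an assumption, not a rule stated in this article.

```python
from html.parser import HTMLParser
from urllib.robotparser import RobotFileParser


class HeadSignals(HTMLParser):
    """Collects the meta robots content and the rel=canonical href of a page."""

    def __init__(self):
        super().__init__()
        self.meta_robots = ""
        self.canonical = None

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if tag == "meta" and (attributes.get("name") or "").lower() == "robots":
            self.meta_robots = (attributes.get("content") or "").lower()
        elif tag == "link" and self.canonical is None:
            if "canonical" in (attributes.get("rel") or "").lower().split():
                self.canonical = attributes.get("href")


def canonical_target_issues(url, status, html, robots_txt, user_agent="Googlebot"):
    """List the reasons a canonical target URL fails the three checks.

    url, status, html: the target URL, its HTTP status code and its HTML.
    robots_txt: the text of the site's robots.txt file.
    An empty list means the target looks fine.
    """
    issues = []
    # Check 1: the target should return a 200.
    if status != 200:
        issues.append(f"status {status}")
    signals = HeadSignals()
    signals.feed(html)
    signals.close()
    # Check 2a: no meta robots noindex.
    if "noindex" in signals.meta_robots:
        issues.append("meta noindex")
    # Check 2b: not disallowed by robots.txt for the given user agent.
    robots = RobotFileParser()
    robots.parse(robots_txt.splitlines())
    if not robots.can_fetch(user_agent, url):
        issues.append("disallowed by robots.txt")
    # Check 3: canonical to itself. A missing canonical tag is treated as
    # acceptable here (an assumption), since it sends no conflicting signal.
    if signals.canonical is not None and signals.canonical != url:
        issues.append(f"canonical to {signals.canonical}")
    return issues


robots_txt = "User-agent: *\nDisallow: /private/\n"
good = canonical_target_issues(
    "https://www.example.com/news/story-1/",
    200,
    '<html><head><link rel="canonical" href="https://www.example.com/news/story-1/"></head></html>',
    robots_txt,
)
bad = canonical_target_issues(
    "https://www.example.com/private/story-2/",
    301,
    '<html><head><meta name="robots" content="noindex, follow"></head></html>',
    robots_txt,
)
print(good)  # []
print(bad)   # ['status 301', 'meta noindex', 'disallowed by robots.txt']
```

This only looks at the meta robots tag; a noindex sent through an X-Robots-Tag HTTP header would need a separate check on the response headers.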

This way, all the signals a search engine might use for indexing and ranking can be consolidated on the correct URL. Search engines haven't yet said that non-canonical URLs shouldn't link to AMP URLs. But AMP URLs should have a canonical link, and it makes the most sense for that link to point to an indexable URL.

Use a spreadsheet to check canonicalization

For this step, we used a process similar to checking whether your canonical tags contradict your internal linking.

Create a spreadsheet with a tab for "canonical" URLs (from your initial crawl)
The "canonical" tab should contain your base URL in column A, the URL it is canonical to in column B and its AMP URL in column C
Paste the results from your AMP URL crawl in a second tab and call it "amp". This tab could just have two columns, one for the AMP URLs and a second for the AMP URL's canonical link values.
In the "canonical" tab, label column D something like "AMP Canonical To"
In column D, use a VLOOKUP formula to bring in the AMP URL's canonical value from the "amp" tab. The formula should look something like this (for row 2, then filled down): =VLOOKUP(C2,amp!A:B,2,FALSE)
In the "canonical" tab, label column E something like "Canonicals Same?". This is where you'll check that the canonical values of the base URL and the AMP URL match.
In column E, use an IF formula to compare the values of column B and column D. Ideally the result will be TRUE, meaning the canonical URLs match. Here is an example of an IF formula: =IF(B2=D2,TRUE,FALSE)
The headings of your complete "canonical" tab could look something like this: A "Base URL" | B "Canonical To" | C "AMP URL" | D "AMP Canonical To" | E "Canonicals Same?"
Once your formulas are calculated, filter column E in the "canonical" tab to view only the results that aren't TRUE, meaning the canonicals don't match. This gives you a solid list of examples to debug with your QA or engineering team.
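If you'd rather script the spreadsheet check, the same lookup-and-compare logic fits in a few lines of Python. This sketch assumes your crawl exports have already been loaded as lists of tuples (for example, read from CSV files); the function names and example URLs are hypothetical.

```python
def add_amp_columns(canonical_rows, amp_rows):
    """Python equivalent of columns D and E in the "canonical" tab.

    canonical_rows: (base URL, canonical to, AMP URL) tuples, i.e. columns A-C.
    amp_rows: (AMP URL, AMP canonical to) tuples, i.e. the "amp" tab.
    Returns rows extended with D ("AMP Canonical To") and E ("Canonicals Same?").
    """
    amp_lookup = dict(amp_rows)  # plays the role of VLOOKUP(C2, amp!A:B, 2, FALSE)
    rows = []
    for base_url, canonical_to, amp_url in canonical_rows:
        amp_canonical_to = amp_lookup.get(amp_url)  # None stands in for #N/A
        # Plays the role of IF(B2=D2, TRUE, FALSE); a missing lookup never matches.
        same = amp_canonical_to is not None and canonical_to == amp_canonical_to
        rows.append((base_url, canonical_to, amp_url, amp_canonical_to, same))
    return rows


def not_matching(rows):
    """Equivalent of filtering column E to values that aren't TRUE."""
    return [row for row in rows if not row[4]]


canonical_rows = [
    ("https://www.example.com/news/story-1/", "https://www.example.com/news/story-1/",
     "https://www.example.com/news/story-1/amp/"),
    ("https://www.example.com/news/story-2/?page=2", "https://www.example.com/news/story-2/",
     "https://www.example.com/news/story-2/amp/"),
]
amp_rows = [
    ("https://www.example.com/news/story-1/amp/", "https://www.example.com/news/story-1/"),
    ("https://www.example.com/news/story-2/amp/", "https://www.example.com/news/story-2/?page=2"),
]
rows = add_amp_columns(canonical_rows, amp_rows)
for row in not_matching(rows):
    print(row[2], "is canonical to", row[3], "but its base is canonical to", row[1])
```

Scripting the check makes it easy to rerun after every crawl instead of rebuilding the spreadsheet by hand.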

Are my AMP pages delivering the correct HTTP status code?

We expect all AMP URLs to deliver a 200 status code. Websites change, of course, so that may not always be the case. The crawls described above will help you identify cases where the status code changed unintentionally.

If you have eliminated your base URLs, then we'd expect the AMP URLs to return the same response code as their base. By eliminating base URLs, we mean:

Removing them so that they return a 404 or 410 HTTP status code
Migrating them with a 301 (permanent) redirect

Most crawling tools record the HTTP status code of each URL, so you should already have this information from the earlier crawls you ran to QA AMP HTML discovery.
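Given the status codes from those crawls, the expectation above can be expressed as a small rule. A minimal Python sketch; the pairing of each base URL with its AMP URL is assumed to come from your earlier crawls, the example URLs are hypothetical, and expecting a 200 for any base status other than 301, 404 or 410 is a simplifying assumption. It compares status codes only and does not check where a 301 redirects to.

```python
# Status codes that, per this article, mean a base URL was eliminated.
ELIMINATED = {301, 404, 410}


def expected_amp_status(base_status):
    """AMP URLs should mirror an eliminated base URL's status; otherwise expect 200.

    Treating every other base status as "expect 200" is a simplifying assumption.
    """
    return base_status if base_status in ELIMINATED else 200


def status_mismatches(crawl_pairs):
    """crawl_pairs: (base URL, base status, AMP URL, AMP status) from your crawls.

    Returns (AMP URL, actual status, expected status) for each mismatch.
    """
    mismatches = []
    for _base_url, base_status, amp_url, amp_status in crawl_pairs:
        wanted = expected_amp_status(base_status)
        if amp_status != wanted:
            mismatches.append((amp_url, amp_status, wanted))
    return mismatches


pairs = [
    ("https://www.example.com/news/story-1/", 200, "https://www.example.com/news/story-1/amp/", 200),
    ("https://www.example.com/news/story-2/", 410, "https://www.example.com/news/story-2/amp/", 200),
    ("https://www.example.com/news/story-3/", 301, "https://www.example.com/news/story-3/amp/", 301),
]
print(status_mismatches(pairs))  # [('https://www.example.com/news/story-2/amp/', 200, 410)]
```

In this example, story-2 was removed with a 410 but its AMP URL still returns a 200, which is exactly the kind of leftover page this check is meant to surface.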

Advanced Scenarios

While it's great that you've published your AMP URLs and verified they are discoverable, there is more to do. Are they being crawled? Are they getting traffic?

Please share your comments below about your experience publishing AMP HTML pages. What problems have you encountered, and how did you resolve them? How do you monitor their performance? What use cases did we miss here?

Want to learn more? Connect with our team for a Botify demo!
