In SEO, a website migration is all about making the change as smooth as possible for search engine spiders.
Some migration mistakes are well known, yet they still happen from time to time: who hasn't heard a story about search engine robots being mistakenly blocked from, or allowed onto, a website? The second most frequent problems are related to redirects. Then there are other caveats, such as preserving sitelinks in Google results. Worth a check, right?
1) Robots Access: Failing to Prevent Robots from Exploring the New Website While Under Development
Minimum required: a robots.txt file with Disallow: / (all) for all user agents.
Good to have: require authentication (HTTP basic authentication) and return HTTP 401 (Unauthorized) if authentication fails; return HTTP 403 (Forbidden) to all user-agents except a white-listed, secret user-agent for internal use.
If we don't prevent search engine robots from crawling, the version of the website under development can be indexed, with development URLs, and potentially generate duplicates of the current website. You'll know if it has already happened by typing a site:[domain] command in Google with your pre-production domain and seeing if there are results. If there are, it's not too late to act (disallow in robots.txt, at least).
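The minimum requirement (a robots.txt that disallows everything for all user agents) is easy to verify automatically. Below is a minimal sketch using Python's standard urllib.robotparser; the helper name `blocks_all_crawlers`, the sample paths and the list of user agents are assumptions chosen for illustration.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical sample of user agents and paths to probe.
AGENTS = ("*", "Googlebot", "bingbot")
SAMPLE_PATHS = ("/", "/any/page.html")


def blocks_all_crawlers(robots_txt: str) -> bool:
    """Return True if every probed agent is denied every probed path."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return not any(
        parser.can_fetch(agent, path)
        for agent in AGENTS
        for path in SAMPLE_PATHS
    )


# What a pre-production robots.txt should look like.
STAGING_ROBOTS = "User-agent: *\nDisallow: /\n"
# An empty Disallow value means "allow everything".
OPEN_ROBOTS = "User-agent: *\nDisallow:\n"

print(blocks_all_crawlers(STAGING_ROBOTS))  # True
print(blocks_all_crawlers(OPEN_ROBOTS))     # False
```

The same helper, negated, can serve as a launch-day check that the production robots.txt no longer blocks everything.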
To crawl your pre-production website with Botify Analytics, you can use a Virtual Robots.txt option to instruct the Botify robot to ignore the robots.txt found online and follow specific rules instead, along with the Customized User-Agent option to instruct the crawler to use your white-listed user-agent. And soon, you will also be able to add authentication credentials in the crawler's HTTP header.
2) Robots Access: Failing to Allow Access When the New Website Goes Online
Of course, the new website won't get indexed (oops!).
Silly as it may sound, everybody is so busy at launch that it happens. You wouldn't be the first to forget to update your robots.txt file when the new website goes online.
3) Robots Access: Getting Rid of Existing Disallow Rules Too Quickly
In addition to inserting new disallow rules to match new URL formats, we need to check which existing rules are still needed and which should be removed. If we fail to remove some old rules, they may match some new pages and prevent search engines from crawling them. On the other hand, if we remove existing rules prematurely, while Google knows that many such URLs exist but never crawled them because they were disallowed, the search engine will most likely come running to check these old URLs (instead of spending time and energy on the new website).
To check the planned robots.txt file on the new website before it goes live, you can crawl the pre-production version with Botify Analytics with the planned robots.txt content as Virtual Robots.txt.
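As a lighter complement to a full crawl, the planned rules can be tested against a sample of new URLs that must remain crawlable. This is only a sketch: the rules, URLs and the `blocked_urls` helper are hypothetical, and Python's urllib.robotparser applies simple prefix matching, so it does not reproduce Google's handling of wildcards such as * and $.

```python
from urllib.robotparser import RobotFileParser


def blocked_urls(robots_txt, must_be_crawlable, user_agent="Googlebot"):
    """Return the URLs that should be crawlable but are disallowed."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [
        url for url in must_be_crawlable
        if not parser.can_fetch(user_agent, url)
    ]


# Hypothetical planned robots.txt: "/search" was written for the old
# internal search pages.
PLANNED_ROBOTS = """\
User-agent: *
Disallow: /search
Disallow: /cart
"""

# Hypothetical new pages that must stay open to robots.
NEW_PAGES = [
    "https://www.example.com/products/shoes",
    "https://www.example.com/search-tips",
]

# The old "/search" rule also matches the new "/search-tips" page.
print(blocked_urls(PLANNED_ROBOTS, NEW_PAGES))
```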
4) Mobile Website: Failing to Redirect All User-Agents Adequately
When migrating a mobile subdomain or folder (to another separate mobile website, or to the main website using responsive design or dynamic serving), verify that all user-agents are redirected correctly:
- If there are separate desktop and mobile websites: no redirection for generic bots such as Googlebot; mobile bots are redirected to the mobile version with page-to-page redirections (which means only if the mobile page exists).
- When migrating from separate websites to responsive design or dynamic serving: all user-agents are redirected to the single website.
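The expected behavior can be written down as a small decision function, which is handy as a reference when testing redirect rules. This is a sketch under assumptions: the host names, the `setup` labels and the `is_mobile_agent` flag are hypothetical, and real user-agent detection is out of scope.

```python
MOBILE_SITE = "https://m.example.com"    # hypothetical mobile website
SINGLE_SITE = "https://www.example.com"  # hypothetical single website


def expected_redirect(setup, is_mobile_agent, path, mobile_paths=frozenset()):
    """Return the URL a request for `path` should be redirected to, or None.

    setup "separate": the request hits the desktop site, and a separate
                      mobile site exists.
    setup "single":   the request hits an old mobile URL after a move to
                      responsive design or dynamic serving.
    """
    if setup == "separate":
        # Page-to-page only: redirect mobile agents if the mobile page exists.
        if is_mobile_agent and path in mobile_paths:
            return MOBILE_SITE + path
        return None  # generic bots such as Googlebot are not redirected
    if setup == "single":
        # Every user-agent is sent to the single website.
        return SINGLE_SITE + path
    raise ValueError(f"unknown setup: {setup!r}")


mobile_pages = {"/shoes"}
print(expected_redirect("separate", True, "/shoes", mobile_pages))
print(expected_redirect("separate", True, "/about", mobile_pages))
print(expected_redirect("single", False, "/shoes"))
```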
5) Redirects: Failing to Take Existing Redirects Into Account
We know we should do as many page-to-page redirects as possible during a migration, and this is, and should be, the priority. But when focusing on redirects from the old (current) site to the new one, we tend to overlook the importance of existing redirects. These redirects from previous migrations or updates catch older URLs linked on external websites or bookmarked by users (visits which also matter for SEO, through usage statistics).
- If older redirects are simply removed, these external links and bookmarks will lead to 404s (or worse, HTTP 500 server errors depending on how these old pages are handled).
- If these old redirects are kept as they are and new redirects are added without consolidating them with the existing ones, this creates redirect chains:
Old redirect: older-page-A → current-page-B
New redirect: current-page-B → new-page-C
While we could plan to have: older-page-A → new-page-C
This means that redirects must be reassessed as a whole and consolidated: this way, we will eliminate redirect chains and generate only one-hop redirects, even from older external links. This is much leaner for Google's crawl.
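Consolidation amounts to following each chain to its final destination and pointing every source directly at it. Here is a minimal sketch, assuming the old and new redirects have been merged into a single source-to-target dictionary; the `consolidate` helper and the paths are illustrative only.

```python
def consolidate(redirects):
    """Flatten redirect chains so that every source points to its final target."""
    flat = {}
    for source, target in redirects.items():
        seen = {source}
        # Follow the chain until the target is not itself redirected.
        while target in redirects:
            if target in seen:
                raise ValueError(f"redirect loop involving {target}")
            seen.add(target)
            target = redirects[target]
        flat[source] = target
    return flat


# Old and new redirects merged into one mapping (hypothetical paths).
merged = {
    "/older-page-A": "/current-page-B",   # old redirect
    "/current-page-B": "/new-page-C",     # new redirect
}

print(consolidate(merged))
# {'/older-page-A': '/new-page-C', '/current-page-B': '/new-page-C'}
```

Raising on loops is a design choice for this sketch: a loop in the merged mapping is a configuration error that is better caught before launch than silently skipped.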
The perfect data set to test consolidated redirects is the list of pages crawled by Google, ideally over a period of a few months, plus the top pages that generate direct visits.
6) Redirects: Failing to Take Canonical Tags Into Account
Hopefully, any duplicate content on the old website that was managed through canonical tags was removed from the new website, and this content is now present through unique pages. In most cases, rewrite rules for page-to-page redirects from old pages to new pages only take into account the main (canonical) version of each old content page.
To reap all SEO benefits, duplicates that had canonical tags pointing to their main version should also be redirected to the new page. This will help Google better understand the mapping between the old site and the new site. And in cases where canonical tags were not well implemented and possibly, as a result, ignored by Google, we make sure that the non-canonical page Google considered as the main page is redirected as it should be. For example, if most of the internal linking goes to a non-canonical page (non-canonical according to HTML tags), Google will still consider this one as the main page.
Botify Analytics allows you to identify all pages with canonical tags and the canonical page they point to.
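Once the list of duplicates and their canonical pages is available, extending the redirect plan is mechanical. The sketch below assumes two hypothetical inputs: `canonical_of`, mapping each old URL to the canonical URL declared in its tag, and `redirects`, the existing old-to-new mapping for canonical pages.

```python
def add_duplicate_redirects(canonical_of, redirects):
    """Redirect each old duplicate to the new page of its canonical version."""
    extended = dict(redirects)
    for duplicate, canonical in canonical_of.items():
        if duplicate == canonical:
            continue  # self-referencing canonical tag, nothing to add
        if duplicate in extended:
            continue  # an explicit redirect already exists, keep it
        if canonical in redirects:
            extended[duplicate] = redirects[canonical]
    return extended


# Hypothetical data: two duplicates of the same old product page.
canonical_of = {
    "/old/shoes?sort=price": "/old/shoes",
    "/old/shoes?ref=home": "/old/shoes",
    "/old/shoes": "/old/shoes",
}
redirects = {"/old/shoes": "/new/shoes"}

print(add_duplicate_redirects(canonical_of, redirects))
```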
7) Failing to Preserve Sitelinks in Google Results
Pages that appear as sitelinks in Google results below the website's home page, whether in the full or the compact version, deserve special attention during a migration.
This results from a decision made by Google, based on its analysis of the website's data and users' visit patterns. Wouldn't it be a shame to revert to plain results, when Google had decided to show sitelinks? This means that the new site has to continue meeting the criteria that made Google select these pages as sitelinks.
To maximize the chance of keeping your sitelinks:
- Make sure to implement page-to-page redirects for pages that appeared as sitelinks
- Link the new pages on the home page
- Encourage users to click on these links (highly visible, user-friendly navigation)
A delay of a few weeks before sitelinks reappear in Google search results can be considered normal.
8) Failing to Remove Old Sitemaps
In the case of a migration to a new domain, old sitemaps are sometimes forgotten, and remain active. They keep pointing Google to URLs with the old format. Don't forget to remove them!
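One way to spot a forgotten sitemap is to check whether it still lists URLs on the old host. The sketch below parses sitemap XML with Python's standard library; the host names, the sample sitemap and the `urls_on_host` helper are hypothetical, and fetching the file is left out.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlsplit

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def urls_on_host(sitemap_xml, host):
    """Return the <loc> URLs of a sitemap that belong to the given host."""
    root = ET.fromstring(sitemap_xml)
    locs = [
        el.text.strip()
        for el in root.findall(".//sm:loc", SITEMAP_NS)
        if el.text
    ]
    return [url for url in locs if urlsplit(url).hostname == host]


# Hypothetical sitemap still referencing the old domain.
SAMPLE_SITEMAP = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://www.old-example.com/products/shoes</loc></url>
  <url><loc>https://www.new-example.com/products/shoes</loc></url>
</urlset>"""

print(urls_on_host(SAMPLE_SITEMAP, "www.old-example.com"))
```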
9) Neglecting Page Performance Impact
If the migration involves technology changes, page performance (download time) should be measured and compared to previous performance, as page response times impact SEO. This is particularly true for large websites with a large proportion of long tail traffic: performance will have a direct impact on organic traffic, as traffic volume is directly related to crawl volume for the long tail audience.
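A simple way to compare performance before and after is to look at median response times per page type. The following is a sketch only: the sample timings, the page types and the 10% tolerance are assumptions, not recommended thresholds.

```python
from statistics import median


def slower_page_types(before_ms, after_ms, tolerance=1.10):
    """Return page types whose median response time grew beyond the tolerance.

    before_ms / after_ms map a page type to a list of response times in ms.
    The result maps each flagged page type to (old median, new median).
    """
    flagged = {}
    for page_type, old_samples in before_ms.items():
        new_samples = after_ms.get(page_type)
        if not old_samples or not new_samples:
            continue  # nothing to compare for this page type
        old_median = median(old_samples)
        new_median = median(new_samples)
        if new_median > old_median * tolerance:
            flagged[page_type] = (old_median, new_median)
    return flagged


# Hypothetical measurements, in milliseconds.
before = {"product": [200, 220, 240], "category": [300, 310, 320]}
after = {"product": [210, 230, 250], "category": [400, 420, 440]}

print(slower_page_types(before, after))  # {'category': (310, 420)}
```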
10) Failing to Prepare Monitoring Tools Ahead of the Migration
If the migration involves a domain or subdomain change, we need to create a new website in Google Webmaster Tools. The same goes for an HTTP to HTTPS migration, as GWT considers http://www.mywebsite.com and https://www.mywebsite.com as different websites. It's also best to inform Google that the domain has changed, if that is the case, with GWT's "move site" tool.
If you are using Botify Log Analyzer, then the URL categorization needs to be updated prior to the migration, to be able to monitor Google's crawl, HTTP status codes, new pages discovered by Google, lost pages, recovered pages, active pages and organic visits.
We need to monitor both pages from the old website, and pages from the new website, while making the distinction (which GWT doesn't allow, by the way).
Two different approaches are possible for the URL categorization tree structure (as you can get all indicators for each node or leaf node):
The short term approach: you want to monitor the migration over the next few weeks, and get a global view of metrics from the old site vs metrics from the new site.
The longer term approach: you want to monitor the migration of the old + new website as a whole, and still be able to drill down and get metrics for one or the other (old vs new), per page type.
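The two approaches differ mainly in which dimension sits at the top of the tree. The sketch below illustrates the idea in plain Python; it is not Botify's categorization syntax, and the host names and page-type rules are hypothetical.

```python
from urllib.parse import urlsplit

OLD_HOST = "www.old-example.com"  # hypothetical old domain
NEW_HOST = "www.new-example.com"  # hypothetical new domain


def site_of(url):
    """Tell whether a URL belongs to the old or the new website."""
    host = urlsplit(url).hostname
    if host == OLD_HOST:
        return "old"
    if host == NEW_HOST:
        return "new"
    return "other"


def page_type(url):
    """Very rough page typing based on the URL path (illustrative rules)."""
    path = urlsplit(url).path
    if path in ("", "/"):
        return "home"
    if path.startswith("/product"):
        return "product"
    if path.startswith("/category"):
        return "category"
    return "other"


def short_term_category(url):
    # Old vs new at the top of the tree, page type below.
    return (site_of(url), page_type(url))


def long_term_category(url):
    # Page type at the top of the tree, old vs new below.
    return (page_type(url), site_of(url))


print(short_term_category("https://www.old-example.com/product/42"))
print(long_term_category("https://www.new-example.com/category/shoes"))
```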
More generally, managing a website migration from an SEO perspective implies:
- Anticipation: prepare for the migration very early on
- Planning: inventory all pages that need to be redirected
- Testing: try out redirections on real data
- Monitoring: check the migration's impact
We covered these main website migration SEO steps.