The Botify crawler’s primary objective is to analyze your website as search engines see it – or as they would, if they could devote unlimited time to your site’s exploration.
It’s also a powerful ally in a number of other situations where you need some ‘real world’ testing: automated redirects, special treatment according to user characteristics, or, simply, a site under construction. You can now customize the Botify crawler’s user-agent. It will introduce itself to the web server wearing the hat you have chosen for it, instead of that of the Botify robot. This opens up a wide range of testing scenarios.
A custom user-agent allows you to:
1) Crawl the mobile version of a website which redirects users based on their user-agent
2) Analyze a website in a pre-production environment, before it goes live
3) Be treated as Google, when Googlebot receives special treatment
4) Transmit parameters to conduct specific tests: performance testing, user-language testing, or any other test
If your website redirects to its mobile version based on user-agent information, you can crawl your site using a mobile user’s user-agent (that of an iPhone, for instance) to check that redirects are triggered as planned. You will also be able to check the proportion of page-to-page redirects versus bulk redirects. The former help mobile pages perform well in search engines; the latter should be avoided. This shows in the Botify report, which provides the number of incoming redirects per crawled URL.
You can also crawl a second time using a Googlebot Mobile user-agent, to check that the mobile bot is redirected the same way and hence has the same vision of things as mobile internet users. This is a requirement for mobile pages to rank in search engines.
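As a rough illustration of the page-to-page versus bulk distinction, the sketch below counts incoming redirects per target URL from a small set of crawl results. The URLs, the function names and the threshold of two incoming redirects are assumptions made for this example; they do not describe how the Botify report is computed.

```python
from collections import Counter


def incoming_redirects(redirects):
    """Count how many crawled URLs redirect to each target URL."""
    return Counter(redirects.values())


def bulk_targets(redirects, threshold=2):
    """Return targets that receive at least `threshold` redirects.

    A target with many incoming redirects suggests a bulk redirect
    (for example, every desktop page sent to the mobile home page).
    The threshold is an arbitrary value chosen for this sketch.
    """
    counts = incoming_redirects(redirects)
    return sorted(target for target, n in counts.items() if n >= threshold)


# Hypothetical crawl results: desktop URL -> mobile URL it redirected to.
redirects = {
    "https://www.example.com/shoes": "https://m.example.com/shoes",
    "https://www.example.com/bags": "https://m.example.com/",
    "https://www.example.com/hats": "https://m.example.com/",
}

suspected_bulk = bulk_targets(redirects)
```

Here the mobile home page collects two incoming redirects and is flagged, while the `/shoes` page has a single page-to-page redirect.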
Being able to analyze a version of your website in a pre-production environment is of great value, not only for search engine optimization, but also for change management:
When Google’s bots are not treated the same way as other user-agents, we may want to crawl ‘as Googlebot’ to get a result that is in line with what the search engine sees.
This might be the case for different reasons, without implying that the site is using cloaking. Cloaking means that search engines are shown different content from that shown to users; depending on the nature of the differences, it might be considered deceptive and might be sanctioned by Google if considered abusive.
For instance, performance may have been optimized by eliminating tasks that are not applicable to search engines (such as creating a user session). In this case, we’ll want performance analysis results to match Google’s actual experience of the website.
To crawl as Googlebot, use a user-agent built from Googlebot’s user-agent, with an added character string that is specific to the project’s website. As a result:
Avoid ‘polluting’ log files with fake data!
The second point is key: without this additional element, log file analysis could be skewed, as some crawls could be counted as Googlebot crawls while they weren’t actually from Google.
That’s not all. People who manage and analyze log files need to know there are lines with a ‘Google-like’ user-agent that need to be removed before performing any analysis. That’s precisely why Botify needs to validate any custom user-agent that includes the name of one of the top search engines’ bots (Googlebot, Bingbot, Yandex, Baidu, Yahoo’s Slurp).
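The sketch below shows one way the site-specific string can be used to strip test crawls from log data before analysis. The token `example-audit`, the function names and the sample log lines are hypothetical; the exact format of a validated custom user-agent is not specified here, so treat this as an assumption-laden illustration.

```python
# Googlebot's widely published desktop user-agent string.
GOOGLEBOT_UA = (
    "Mozilla/5.0 (compatible; Googlebot/2.1; "
    "+http://www.google.com/bot.html)"
)

# Hypothetical site-specific string appended for test crawls.
SITE_TOKEN = "example-audit"


def custom_user_agent(base=GOOGLEBOT_UA, token=SITE_TOKEN):
    """Build a Google-like user-agent that stays identifiable in logs."""
    return f"{base} {token}"


def drop_test_crawl_lines(log_lines, token=SITE_TOKEN):
    """Keep only log lines that do not carry the site-specific token."""
    return [line for line in log_lines if token not in line]


# Made-up access log lines, for illustration only.
log_lines = [
    f'203.0.113.7 - - "GET / HTTP/1.1" 200 "{GOOGLEBOT_UA}"',
    f'198.51.100.9 - - "GET / HTTP/1.1" 200 "{custom_user_agent()}"',
]

clean_lines = drop_test_crawl_lines(log_lines)
```

Only the first line survives the filter, because the second one carries the token that marks it as a test crawl.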
Using a custom user-agent, any test is possible. You can add to the user-agent any element that can be detected by the web server and trigger a special treatment. Parameters could apply to technical or functional elements, such as:
We’re talking about manipulating user-agents, which are a sort of business card on the Internet. But a crawler’s behavior and speed have nothing to do with what can be expected from an Internet user. That’s why Internet politeness rules suggest including a link in the user-agent, so that the owner or manager of a website can quickly contact someone who has control over the crawler. We strongly advise following this politeness rule with custom user-agents.
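As a minimal sketch of this politeness rule, the snippet below appends a contact link to a user-agent and attaches it to an HTTP request built with Python’s standard library. The base string `ExampleTestAgent/1.0`, the contact URL and the `(+URL)` layout are assumptions for illustration; no request is actually sent.

```python
from urllib.request import Request


def polite_user_agent(base, contact_url):
    """Append a contact link so site owners can reach the crawl operator."""
    return f"{base} (+{contact_url})"


# Hypothetical values, chosen only for this example.
user_agent = polite_user_agent(
    "ExampleTestAgent/1.0",
    "https://www.example.com/crawler-contact",
)

# Build the request object; nothing is sent over the network here.
request = Request(
    "https://www.example.com/",
    headers={"User-Agent": user_agent},
)
```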
As for crawl speed, the Botify crawler does everything in its power to avoid straining the website it is crawling: it adjusts its crawl rate not only according to configured speed, but also according to the website’s response delay, which can indicate strain.
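The general idea of slowing down when a site responds slowly can be sketched as follows. This is not Botify’s actual algorithm, which is not described here; the function name, the 0.5 second baseline and the proportional scaling are all assumptions.

```python
def next_delay(configured_rate, response_time, baseline=0.5):
    """Return the pause, in seconds, before the next request.

    configured_rate: target number of URLs per second.
    response_time:   last observed response time, in seconds.
    baseline:        response time considered healthy (assumed value).
    """
    base_delay = 1.0 / configured_rate
    if response_time <= baseline:
        # The site responds quickly: crawl at the configured speed.
        return base_delay
    # The site responds slowly: lengthen the pause proportionally.
    return base_delay * (response_time / baseline)


fast_site_delay = next_delay(configured_rate=10, response_time=0.2)
slow_site_delay = next_delay(configured_rate=10, response_time=2.0)
```

With these assumed values, a response four times slower than the baseline makes the pause between requests four times longer.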
B&W illustrations : Simple Icons from The Noun Project