Yes, of course. My name is Simon Dollé, I’m a Research Engineer at Botify. I’m the lead dev for the Suggested Patterns functionality. I work on another project as well, which is top secret ( 🙂 ) and should yield results around the end of the second quarter of 2014.
As you know, Botify is a powerful SEO analytics application that performs automated audits of websites. The application’s goal is to quickly identify all SEO optimizations that could result in increased traffic and revenue for your website. These automated analyses are based on your website’s crawl data: during our crawl, we collect a huge amount of information, such as depth, number of pages, title tags, performance, etc.
This information represents a very large amount of data. One needs to be able to analyze this data and interpret results to adequately prioritize SEO optimizations.
This is precisely what we had in mind when we developed the Suggested Patterns. The idea is to quickly pinpoint patterns, or types of pages, that cause a given problem.
In the website analysis report, we suggest one or several URL patterns that you should focus on correcting first, for each indicator: these are Suggested Patterns.
Of course. For instance, Botify will allow you to discover that your website includes a high proportion of redirected pages.
What you will then want to know is where these pages are linked from, so that you can try to lower the number of redirections. The Suggested Patterns section for 3XX sources found in Botify’s report will allow you to display our suggestions and identify groups of redirected pages. In the following example, the Topic pages represent 86% of all redirected pages.
You will be able to use Suggested Patterns to identify which pages you should focus on when working on optimizing each indicator found in the Botify report (HTTP status codes, duplicated or empty tags, slow pages, deep pages, etc.).
The easiest way to answer this question is through an example. In the following example, 46% of pages have a duplicate Title tag.
If you go to the Suggested Patterns section, you will find the following table:
The first column describes the pattern using URL properties (protocol, domain name, etc.).
The ‘*’ characters are wildcards that can be replaced by other elements.
Here, we are showing the pattern for ‘/forum/divers/’ pages.
The second column indicates the number of pages with that pattern on the website. In this example, the ‘/forum/divers/’ pattern represents 525 pages.
The third and fourth columns indicate that 498 pages, or 94.9% of pages with this pattern, have a duplicate title tag.
The fifth column is extremely interesting, as it lets you prioritize SEO actions. It shows what proportion of the indicator’s total value is accounted for by the Suggested Pattern. In this example, you know that once you have updated the pages in the ‘/forum/divers/’ pattern, you will have solved a little over 39% of the duplicate title problem!
Suggested Patterns should save considerable time by allowing you to focus on areas that really need optimizing. That’s why patterns are sorted on that last column.
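To make the arithmetic behind those columns concrete, here is a minimal sketch in Python. It is only an illustration, not Botify's code: the `PatternStats` class, the pattern strings, the second pattern and the site-wide total of 1,270 duplicate titles are all assumptions (the interview gives 525 pages, 498 duplicates and "a little over 39%", but not the total).

```python
from dataclasses import dataclass


@dataclass
class PatternStats:
    pattern: str           # URL pattern, '*' is a wildcard (illustrative string)
    pages: int             # pages matching the pattern (second column)
    pages_with_issue: int  # matching pages that show the problem (third column)


def issue_rate(p: PatternStats) -> float:
    """Share of the pattern's pages that show the problem (fourth column)."""
    return p.pages_with_issue / p.pages


def coverage(p: PatternStats, total_with_issue: int) -> float:
    """Share of the site-wide problem explained by the pattern (fifth column)."""
    return p.pages_with_issue / total_with_issue


def rank(patterns, total_with_issue):
    """Sort patterns so the one covering most of the problem comes first."""
    return sorted(patterns, key=lambda p: coverage(p, total_with_issue), reverse=True)


# Figures quoted in the interview: 525 pages, 498 with a duplicate title.
forum = PatternStats("*/forum/divers/*", 525, 498)
# Hypothetical second pattern, for illustration only.
other = PatternStats("*/profil/*", 300, 120)
# Assumed site-wide total, chosen to be consistent with "a little over 39%".
TOTAL_DUPLICATE_TITLES = 1270

for p in rank([other, forum], TOTAL_DUPLICATE_TITLES):
    print(f"{p.pattern}: {issue_rate(p):.1%} of its pages affected, "
          f"{coverage(p, TOTAL_DUPLICATE_TITLES):.1%} of the whole problem")
```

Sorting on the coverage figure rather than on the per-pattern rate is what puts the pattern with the biggest payoff at the top: a tiny pattern can be 100% affected and still account for very little of the overall problem.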
We look for groups of words that are often found together. We use terms from the URL, but also title texts or even H1 tags. We achieve this using data mining techniques to explore millions of word combinations and identify relevant ones. We then select relevant groups for each indicator that is shown in the Botify report. We use fast algorithms with low memory consumption, which allow us to compute Suggested Patterns for 500,000 pages in less than 30 minutes on a regular desktop computer.
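The general idea of finding words that often appear together can be sketched in a few lines of Python. This is a deliberately naive illustration under stated assumptions, not the algorithm Botify uses: the function names, the tokenization rule and the sample pages are all made up, and the brute-force counting below would not scale to the volumes mentioned above.

```python
from collections import Counter
from itertools import combinations
import re


def tokens(url: str, title: str = "") -> frozenset:
    """Split a URL and an optional title into a set of lowercase word tokens."""
    text = (url + " " + title).lower()
    return frozenset(t for t in re.split(r"[^a-z0-9]+", text) if t)


def frequent_groups(pages, min_support, max_size=2):
    """Count every group of up to max_size tokens and keep the groups
    that appear on at least min_support pages."""
    counts = Counter()
    for url, title in pages:
        words = sorted(tokens(url, title))  # sorted so each group has one canonical form
        for size in range(1, max_size + 1):
            counts.update(combinations(words, size))
    return {group: n for group, n in counts.items() if n >= min_support}


# Made-up sample pages: (URL path, title text).
pages = [
    ("/forum/divers/topic-1", "Forum divers"),
    ("/forum/divers/topic-2", "Forum divers"),
    ("/forum/auto/topic-3", "Forum auto"),
    ("/blog/post-4", "Blog"),
]

groups = frequent_groups(pages, min_support=2)
for group, n in sorted(groups.items(), key=lambda item: -item[1]):
    print(" + ".join(group), "appears on", n, "pages")
```

Each group that survives the support threshold is a candidate pattern; a real system would then check, indicator by indicator, which candidates concentrate the problem, and would prune the search instead of enumerating every combination.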
Absolutely not. This suggested pattern discovery functionality is only just starting. For now, Botify only takes into account a limited number of parameters. As a result, it is entirely possible that a real problem is not yet diagnosed through suggested patterns. Imagine a physician who is faced with a very rare disease. He is aware that there is a problem, but may not be able to identify the disease. In this case, an expert’s point of view is needed. That is also why the URL Explorer* will be extremely helpful, as it allows you to perform advanced searches in your metadata.
Our goal is to be able to diagnose in an automated manner any problem affecting your website. To achieve this goal, we will use more and more parameters in our algorithm. For instance, we are not yet using URL query string parameter values, and we would like to integrate those into our algorithm in the future. We are also going to make sure we are able to discover smaller and smaller groups of pages, to be able to pinpoint problems even if they are affecting a limited number of pages.
(*) upcoming post on the URL Explorer