Why do we want to draw your attention to this? Because getting priorities right is key to any type of optimization, all the more so for SEO. And if search engines don’t know a significant part of your website, isn’t that the first SEO problem we should solve?
This may sound obvious, but we shouldn't take for granted that Google crawls everything that exists on the Internet. I've heard puzzled clients ask: "Google is so powerful and has virtually unlimited resources, surely that shouldn't be an issue?"
Truth is, however powerful the search engine is, it still has to manage priorities. It may focus on discovering new content rather than refreshing existing content. It will also typically explore highly popular pages hundreds or thousands of times a day, while completely ignoring other pages, either knowingly or simply because it never came across a link to them.
So where do your pages stand? Do they exist in Google’s world view? This type of graph can be a real eye-opener:
This graph confronts two views:
This example shows a very typical situation – although the size of each disk and the overlapping surface may vary.
The bottom line is that, in the vast majority of cases, Google has a very skewed view of your website:
Orphan pages, crawled by Google but not found on your website, may result from several causes:
Let’s focus on the first problem – making sure Google explores your website as thoroughly as possible.
Google uses a number of signals to decide which pages to crawl, and how often to crawl them. Among the top signals, besides website popularity and authority of course, are users' visits and behavior, as well as content quality. But these rich signals are only available for a comparatively small number of pages on the Internet. What about the rest? For those, Google only has Pagerank to fall back on.
Which is why the Botify report includes an Internal Pagerank indicator: it lets you see how pagerank flows through the website structure and how it is distributed among pages. Ideally, it primarily goes to important pages and accurately reflects each page's importance.
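To make the idea concrete, here is a minimal sketch of how an internal Pagerank can be computed from a site's link graph using power iteration. The link map and page paths are hypothetical examples, not Botify's actual data model or algorithm.

```python
# Minimal power-iteration Pagerank over an internal link graph.
# `links` maps each page to the pages it links to (hypothetical site).
def internal_pagerank(links, damping=0.85, iterations=50):
    pages = list(links)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page, outlinks in links.items():
            if not outlinks:  # dangling page: spread its rank evenly
                for p in pages:
                    new_rank[p] += damping * rank[page] / n
            else:
                for target in outlinks:
                    new_rank[target] += damping * rank[page] / len(outlinks)
        rank = new_rank
    return rank

site = {
    "/": ["/category-a", "/category-b"],
    "/category-a": ["/", "/product-1"],
    "/category-b": ["/", "/product-1"],
    "/product-1": ["/"],
}
ranks = internal_pagerank(site)
```

On this toy graph, the home page accumulates the most rank because every other page links back to it, which is exactly the "link juice concentrates at the top" effect discussed below.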
See below, an example of the percentage of pages crawled by Google on a website, shown by Internal Pagerank:
Now, the Internal Pagerank is not something you can directly tweak. It depends on the internal linking of the website (which is what you can adjust), and is heavily related to page depth. Most of a website’s “link juice” is at the top, and the deeper you go in the website, the less there is.
Page depth is measured as follows: The home page is at depth 0, pages linked on the home page are at depth 1, and so on. When there are several paths to reach a page, its depth is the number of clicks of the shortest path.
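The shortest-path definition above maps directly to a breadth-first traversal from the home page. The sketch below assumes a simple page-to-outlinks map with hypothetical paths:

```python
from collections import deque

# Page depth = number of clicks on the shortest path from the home page.
# `links` maps each page to its outlinks (hypothetical example site).
def page_depths(links, home="/"):
    depths = {home: 0}
    queue = deque([home])
    while queue:
        page = queue.popleft()
        for target in links.get(page, []):
            if target not in depths:  # first visit = shortest path in BFS
                depths[target] = depths[page] + 1
                queue.append(target)
    return depths

site = {
    "/": ["/category"],
    "/category": ["/product", "/"],
    "/product": ["/category", "/deep-page"],
    "/deep-page": [],
}
print(page_depths(site))  # {'/': 0, '/category': 1, '/product': 2, '/deep-page': 3}
```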
Let’s look at examples of the impact of page depth on Google’s crawl rate. The graphs below show, in blue, pages explored by Google, and in red, those that weren’t.
It's extremely rare to find close to 100% of pages explored beyond a certain depth, usually 3 (or perhaps 4 for high-volume websites; this also depends on the website's popularity), and the proportion of pages crawled by search engines steadily decreases with depth.
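The kind of breakdown shown in these graphs can be sketched as follows, assuming you have a map of URL to depth and the set of URLs Google crawled (both datasets here are hypothetical):

```python
from collections import defaultdict

# Crawl rate by depth: share of pages at each depth that Google crawled.
# `pages` maps URL -> depth; `crawled` is the set of URLs seen in server logs.
def crawl_rate_by_depth(pages, crawled):
    total = defaultdict(int)
    hit = defaultdict(int)
    for url, depth in pages.items():
        total[depth] += 1
        if url in crawled:
            hit[depth] += 1
    return {d: 100.0 * hit[d] / total[d] for d in sorted(total)}

pages = {"/": 0, "/a": 1, "/b": 1, "/a/1": 2, "/a/2": 2, "/a/1/x": 3}
crawled = {"/", "/a", "/b", "/a/1"}
print(crawl_rate_by_depth(pages, crawled))
# {0: 100.0, 1: 100.0, 2: 50.0, 3: 0.0}
```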
Rule of thumb: try to have most of the website’s volume no deeper than 5.
And of course, check that your key content (products for an e-commerce website, articles for editorial content, etc.) is not too deep. Read about most common causes of deep pages and solutions.
So… once you are aware of the global situation, what can you do?
First, look at the same information (overall crawl rate, crawl rate by depth, crawl rate by Internal Pagerank) template by template. Botify allows you to analyze a website by template by defining Segments, based on URL patterns, in the project settings prior to the analysis.
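Conceptually, URL-pattern segmentation amounts to matching each URL against an ordered list of rules, first match wins. The rules and paths below are hypothetical illustrations, not Botify's actual segment syntax:

```python
import re

# Hypothetical segment rules: first matching pattern wins.
SEGMENTS = [
    ("product", re.compile(r"^/product/")),
    ("category", re.compile(r"^/category/")),
    ("editorial", re.compile(r"^/blog/")),
]

def segment_of(url_path):
    """Return the name of the first segment whose pattern matches the path."""
    for name, pattern in SEGMENTS:
        if pattern.search(url_path):
            return name
    return "other"

print(segment_of("/product/red-shoes"))  # product
print(segment_of("/about-us"))           # other
```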
This view by template will allow you to define priorities and see which internal linking optimizations can be made: for instance, for product pages, you can add product-to-product links (horizontal navigation, with user justifications such as "Similar products", "Accessories for this product", etc.).
The graphs below show the distribution by segment for all pages crawled by Google (on the left), and for all active pages, those that generated visits from Google search results (on the right):
This graph shows Google’s crawl ratio, by segment:
You can also drill down into a given segment, using a report filter, to see Google's crawl ratio and active pages ratio by depth for that segment:
This other graph shows, for each segment, how often Google crawls pages (among all those found by the Botify crawler on the website). The percentage indicates the number of days with Google crawls over a 30-day period: for instance, >= 80% means at least 24 days out of the 30-day period considered for the analysis. This is a great indicator of the interest Google is showing in each of your website's templates:
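The crawl-days percentage described above is straightforward to compute from server log dates. This sketch assumes you have already extracted, per page or per segment, the set of dates on which Googlebot made at least one hit (the dates here are made up):

```python
from datetime import date, timedelta

# Percentage of days with at least one Google crawl over a rolling window.
# `crawl_dates` is a hypothetical set of dates extracted from server logs.
def crawl_day_ratio(crawl_dates, end_day, window=30):
    days = {end_day - timedelta(days=i) for i in range(window)}
    crawled = len(days & set(crawl_dates))
    return 100.0 * crawled / window

end = date(2024, 1, 30)
# crawled on 24 of the last 30 days -> 80%
dates = [end - timedelta(days=i) for i in range(24)]
print(crawl_day_ratio(dates, end))  # 80.0
```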
And for an overview of SEO efficiency for each of your templates, check out the graph below. Each horizontal bar represents a template on your website, the size of the bar on the left shows the number of distinct pages crawled by Google over 30 days, and the size of the bar on the right indicates the number of organic visits from Google over the same period.