Log File Analysis

Holy Crap, Google Is Blind!


12th December 2012, by Annabelle

Hey there Botify community!

Today’s post is a little provocative: is Google blind?

Let's first recap our findings from the first post: the Botify crawler showed us how the structure of the site was built. The distribution of pages by category and by depth was as follows:

[Figure: SEO structure by depth and dimension, by Botify]

This first bit of information shows how the pages are distributed, but it cannot tell us how many pages Google actually crawled.

We suspected that Google does not crawl every page, and we questioned how effective the Top Products pages really were.

We analysed server logs recovered from the website under study (30 days of logs were used). The objective was to identify the visits made by Googlebot and to compare the pages it requested with the pages the Botify crawler found in the structure of the site.
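To make the comparison concrete, here is a minimal sketch of the idea, not the way Botify actually does it. It assumes the logs are in the common "combined" access log format, and the function names, sample log lines and URL paths are all made up for illustration. It also trusts the user-agent string, which can be spoofed; a real analysis would additionally verify Googlebot, for example with a reverse DNS lookup.

```python
import re

# Combined Log Format (assumed):
# host ident user [time] "METHOD path PROTO" status size "referer" "user-agent"
LOG_LINE = re.compile(
    r'^\S+ \S+ \S+ \[[^\]]+\] "(?:GET|HEAD) (?P<path>\S+) [^"]*" '
    r'(?P<status>\d{3}) \S+ "(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)

def googlebot_paths(log_lines):
    """Return the set of URL paths requested by a user agent claiming to be Googlebot."""
    paths = set()
    for line in log_lines:
        match = LOG_LINE.match(line)
        if match and "Googlebot" in match.group("agent"):
            paths.add(match.group("path"))
    return paths

def crawl_ratio(structure_paths, crawled_paths):
    """Share of the pages found in the site structure that were also crawled."""
    structure = set(structure_paths)
    if not structure:
        return 0.0
    return len(structure & set(crawled_paths)) / len(structure)

# Hypothetical sample data: one Googlebot hit, one human visit.
sample_logs = [
    '66.249.66.1 - - [01/Dec/2012:10:00:00 +0000] "GET /category/shoes HTTP/1.1" '
    '200 5120 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"',
    '203.0.113.7 - - [01/Dec/2012:10:00:05 +0000] "GET /product/42 HTTP/1.1" '
    '200 2048 "http://www.google.com/search?q=shoes" "Mozilla/5.0 (Windows NT 6.1)"',
]
structure = ["/category/shoes", "/product/42", "/product/43", "/top-products/1"]

crawled = googlebot_paths(sample_logs)
ratio = crawl_ratio(structure, crawled)
print(f"{ratio:.0%} of the structure was crawled")
```

Only the pages that appear both in the crawler's view of the structure and in Googlebot's requests count as crawled, which is why the ratio is computed on the intersection of the two sets.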

Only 38% of the pages present in the structure are crawled by Google in a 30-day time frame.

The illustration below shows the number of pages in the structure that are crawled by Google. The remaining 62% of the pages received no visit from Google during those 30 days!

[Figure: crawled pages in the structure, by Botify]

These histograms also show that the crawl rate drops rapidly as page depth increases.
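The same breakdown can be sketched in a few lines. This is an illustration only: it assumes we already have a mapping from each page to its depth (as a crawler would report it) and the set of pages Google requested, and every path and number below is invented.

```python
from collections import Counter

def crawl_rate_by_depth(depth_by_path, crawled_paths):
    """Return {depth: share of pages at that depth that were crawled}."""
    total = Counter(depth_by_path.values())
    crawled = Counter(
        depth for path, depth in depth_by_path.items() if path in crawled_paths
    )
    return {depth: crawled[depth] / total[depth] for depth in sorted(total)}

# Hypothetical structure: depth of each page as found by a crawler.
depth_by_path = {
    "/": 0,
    "/category/shoes": 1,
    "/category/bags": 1,
    "/top-products/1": 7,
    "/top-products/2": 7,
    "/top-products/3": 7,
    "/top-products/4": 7,
}
# Hypothetical set of pages requested by Googlebot.
crawled_paths = {"/", "/category/shoes", "/top-products/1"}

rates = crawl_rate_by_depth(depth_by_path, crawled_paths)
for depth, rate in rates.items():
    print(f"depth {depth}: {rate:.0%} crawled")
```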

We saw in the previous post that more than 70% of the Top Products pages are at depth level 7. Mechanically, Google was not able to crawl more than 20% of these pages! The consequences for SEO are probably bad.

So let’s take a look at the SEO activity of the site. We consider a page to be active in SEO when it has received at least one visit from Google search results.

100% of SEO traffic is generated by only 19% of the site's pages!
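As a rough sketch of this definition, and assuming the visits have already been parsed out of the logs into records with `path`, `referer` and `agent` fields (hypothetical names), an active page can be spotted by looking for a human visit whose referrer is a Google domain. The host check below is a simple heuristic rather than a complete list of Google search domains, and all sample data is invented.

```python
from urllib.parse import urlparse

def is_google_referer(referer):
    """Heuristic: the referrer host looks like a Google domain."""
    host = urlparse(referer).hostname or ""
    return host.startswith(("www.google.", "google."))

def active_paths(visits):
    """Pages with at least one non-bot visit coming from Google."""
    return {
        visit["path"]
        for visit in visits
        if "Googlebot" not in visit["agent"] and is_google_referer(visit["referer"])
    }

def active_ratio(structure_paths, active):
    """Share of the pages in the structure that are active."""
    structure = set(structure_paths)
    if not structure:
        return 0.0
    return len(structure & set(active)) / len(structure)

# Hypothetical parsed log records.
visits = [
    {"path": "/product/42", "referer": "http://www.google.com/search?q=shoes",
     "agent": "Mozilla/5.0 (Windows NT 6.1)"},
    {"path": "/product/43", "referer": "-",
     "agent": "Mozilla/5.0 (Windows NT 6.1)"},
    {"path": "/category/shoes", "referer": "-",
     "agent": "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"},
]
structure = ["/", "/category/shoes", "/product/42", "/product/43", "/top-products/1"]

active = active_paths(visits)
share = active_ratio(structure, active)
print(f"{share:.0%} of pages are active")
```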

The following histograms represent the number of active pages by level of depth:

[Figure: active pages in the structure, by Botify]

These graphs show that the rate of active pages decreases as page depth increases. Pages are more often active when they are positioned near the top of the website.

At depth 7, where the Top Products are mainly positioned, no more than 10% of pages are active. Yet we are talking about the category that generates the most revenue out of all the marketing tools… There is probably an enormous lever here for increasing traffic and SEO revenue.

To finish up, several conclusions stand out:
– Google ignores entire sections of a website, and Botify reveals exactly which zones are ignored,
– A website's SEO traffic comes from only part of its page inventory,
– It is useless to try to optimise pages that are not crawled by Google, as they will be neither indexed nor active,
– A huge growth driver for traffic and SEO revenue lies in how best to present useful content to Google.

We will find out in the posts to come how crawling and page activity can be improved, and how Botify can help with this.

Thank you in advance for all your comments!
