Crawl & Render Budget Log File Analysis

How Incomplete Is Google’s View Of Your Website?


16th September 2014, Annabelle

“We never suspected Google crawled such a small portion of our website!” That’s the most frequent reaction when our customers see their first Botify Log Analyzer report. And that’s precisely why crawl rate is one of the key indicators you will want to work with.

There is a wide variety of SEO tools on the market, and an equally wide variety of indicators. Each is useful from one of the many SEO perspectives: ranking, visibility, traffic, robot crawling, external links, internal links, website structure, technical performance, duplicate content, semantics…

With so many indicators, many of them influencing others, it’s easy to lose sight of the big picture. What will have the biggest impact on traffic? Where should we start?

Locating lost opportunities

Let’s look at the sequence of events that leads to an organic visit from Google:

1) Google explores the page
2) Google indexes the page
3) The page appears in search results
4) The user clicks on the result

Each step involves a different type of optimization:

Opportunity lost at step 1: the page is not crawled by Google
This happens either because Google can’t find the page (is it technically inaccessible? too deep?) or because the search engine chooses not to explore it: it anticipates that the page won’t be interesting, because its URL looks like many others it has already crawled and found to be of no value.

Opportunity lost at step 2: the page is not indexed
This clearly indicates that Google did not like what it saw. The page may have little or no content, or its content may be similar to that of many other pages (duplicates or near duplicates).

Opportunity lost at step 3: the page does not appear in search results (or ranks too low)
This can be related to low content quality, to low website popularity, trust or authority (in general or compared to the competition), or to insufficient usage signals from Internet users. Alternatively, the website could actually compete, but the page is optimized for the wrong keywords, not those that Internet users search for.

Opportunity lost at step 4: the user does not click on the link in Google’s result page
Clicks can be encouraged through rich snippets (with picture, video thumbnail, sample prices, breadcrumbs…). That implies making metadata available to search engines in the page code so that they can use more attractive display formats in search result pages.

All levels of optimization are necessary to achieve a website’s full traffic potential. But if most opportunities are lost at the first step, working on the later steps first will yield only limited results until the first step’s issues are resolved.

Unleash your website’s potential

Information about Google’s crawl, with page-level details, can only be found by analyzing web server log files. That’s the unique value proposition of log analysis tools. However, log analysis alone provides a one-sided view: that of Google.

Website crawlers, on the other hand, draw a full picture of the website (or at least of its crawlable part), and that picture is usually quite different from what we expect. But these tools have no clue about Google’s crawl activity.

As it integrates both a log analyzer and a website crawler, Botify Log Analyzer provides an accurate vision of reality: it compares what’s in the website to what Google actually sees.
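To illustrate the log-analysis side of that comparison, here is a minimal Python sketch that extracts Googlebot requests from raw access logs. It assumes the common Apache/Nginx “combined” log format and identifies Googlebot by its user-agent string only. That string can be spoofed, so a production tool would also verify the requesting IP (for example with a reverse DNS lookup). The names `Hit`, `parse_hits` and `googlebot_hits` are hypothetical and are not part of any Botify API.

```python
import re
from datetime import datetime
from typing import Iterable, Iterator, List, NamedTuple

# Assumed "combined" log format, e.g.:
# 66.249.66.1 - - [16/Sep/2014:10:00:00 +0000] "GET /p/1 HTTP/1.1" 200 512 "-" "...Googlebot/2.1..."
LOG_RE = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" (?P<status>\d{3}) \S+ '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)


class Hit(NamedTuple):
    time: datetime
    path: str
    status: int
    referer: str
    agent: str


def parse_hits(lines: Iterable[str]) -> Iterator[Hit]:
    """Parse log lines into Hit records, silently skipping malformed lines."""
    for line in lines:
        m = LOG_RE.match(line)
        if not m:
            continue
        yield Hit(
            time=datetime.strptime(m["time"], "%d/%b/%Y:%H:%M:%S %z"),
            path=m["path"],
            status=int(m["status"]),
            referer=m["referer"],
            agent=m["agent"],
        )


def googlebot_hits(lines: Iterable[str]) -> List[Hit]:
    """Keep only requests whose user agent claims to be Googlebot.
    This is a user-agent check only; it does not verify the IP."""
    return [h for h in parse_hits(lines) if "Googlebot" in h.agent]
```

The output of `googlebot_hits` (URL plus timestamp) is the raw material for the crawl-based indicators below.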

Here are some of the questions Botify Log Analyzer answers, with specific indicators:

What part of your website does Google see, and what part doesn’t exist in its eyes?

Indicator: crawl rate.
The perimeter is defined by all the pages found by the Botify crawler, starting from the home page and systematically following every link found. The crawl rate indicates what portion of these pages Google explores over a 4-week period.
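As a rough sketch of the arithmetic behind this indicator (not Botify’s actual implementation), crawl rate is the share of crawler-found URLs that Googlebot requested at least once in the window. The sketch assumes Googlebot hits have already been reduced to hypothetical `(url, day)` pairs and that the window is 28 days ending on a given date, inclusive.

```python
from datetime import date, timedelta


def crawl_rate(site_urls, hits, end, days=28):
    """Share of crawler-found URLs requested by Googlebot at least once
    during the `days`-day window ending on `end` (inclusive).

    site_urls: URLs found by the site crawler (the perimeter).
    hits: (url, day) pairs extracted from Googlebot log lines.
    """
    start = end - timedelta(days=days - 1)
    crawled = {url for url, day in hits if start <= day <= end}
    site = set(site_urls)
    if not site:
        return 0.0
    # URLs Google crawled but the site crawler never found are ignored,
    # because the perimeter is defined by the site crawl.
    return len(site & crawled) / len(site)
```

For example, with four site pages of which two were requested by Googlebot inside the window, `crawl_rate` returns 0.5.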

Can search engines robots easily reach your content?

Botify Log Analyzer page depth distribution

Indicator: depth distribution.
It indicates how many clicks are needed to reach a page, starting from the home page.
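Depth in this sense is the shortest click path from the home page, which a breadth-first search over the internal link graph computes directly. The sketch below assumes the crawl has already produced a hypothetical `links` dictionary mapping each URL to the URLs it links to; it returns how many pages sit at each depth.

```python
from collections import Counter, deque


def depth_distribution(links, home="/"):
    """Breadth-first search from the home page. Each page's depth is the
    minimum number of clicks needed to reach it; returns {depth: page count}."""
    depth = {home: 0}
    queue = deque([home])
    while queue:
        page = queue.popleft()
        for target in links.get(page, ()):
            if target not in depth:          # first visit = shortest path in BFS
                depth[target] = depth[page] + 1
                queue.append(target)
    return Counter(depth.values())


links = {
    "/": ["/cat", "/about"],
    "/cat": ["/cat?page=2", "/p/1"],
    "/cat?page=2": ["/p/2"],
    "/p/1": ["/"],
}
print(depth_distribution(links))
```

Breadth-first order matters here: a depth-first walk could record a longer path first and overstate how deep a page is.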

What portion of your website generates organic visits?

Indicator: active pages rate.
It shows the portion of pages found by the Botify crawler that generate at least one organic visit over the analysis’s 4-week period (visit information is also extracted from server logs).
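A minimal sketch of this ratio follows, assuming visits have been extracted from the same 4-week log window as hypothetical `(url, referer, user_agent)` tuples. Treating a request as an organic Google visit when its referer host is a Google domain and its user agent is not a bot is a simplifying heuristic chosen for illustration, not Botify’s documented rule.

```python
import re
from urllib.parse import urlparse

# Heuristic: google.com, www.google.fr, www.google.co.uk, ...
GOOGLE_HOST = re.compile(r"(www\.)?google\.[a-z.]+")


def is_organic_google_visit(referer, agent):
    """Assumed rule: referer host is a Google domain and the agent is not a bot."""
    host = urlparse(referer).hostname or ""
    return bool(GOOGLE_HOST.fullmatch(host)) and "bot" not in agent.lower()


def active_pages_rate(site_urls, visits):
    """Share of crawler-found pages with at least one organic Google visit."""
    site = set(site_urls)
    active = {
        url
        for url, referer, agent in visits
        if url in site and is_organic_google_visit(referer, agent)
    }
    return len(active) / len(site) if site else 0.0
```

A page with several organic visits counts once: the indicator measures how many pages are active, not how much traffic they get.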

How fast does Google discover new pages? (and… does it?)

Botify Log Analyzer new crawl

Indicator: daily new crawl.
It shows the pages that Google crawled for the first time ever on a given day. This is extremely valuable for websites with a fast publication rate, or after adding a new section.
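The idea can be sketched by recording the earliest day each URL appears in Googlebot’s hits and counting first appearances per day. The sketch assumes hypothetical `(url, day)` pairs; “first time ever” can only be as accurate as the log history available, since a URL crawled before the oldest retained log would wrongly look new.

```python
from collections import Counter
from datetime import date


def daily_new_crawl(hits):
    """hits: (url, day) pairs for Googlebot, in any order.
    Returns {day: number of URLs crawled for the first time that day}."""
    first_seen = {}
    for url, day in hits:
        if url not in first_seen or day < first_seen[url]:
            first_seen[url] = day
    return Counter(first_seen.values())


hits = [
    ("/a", date(2014, 9, 1)),
    ("/a", date(2014, 9, 2)),
    ("/b", date(2014, 9, 2)),
    ("/c", date(2014, 9, 3)),
    ("/b", date(2014, 9, 1)),
]
new = daily_new_crawl(hits)
```

Here `/a` and `/b` are both first seen on 1 September and `/c` on 3 September; repeat visits on 2 September add nothing.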

Site-wide indicators give an idea of where we stand. But that’s just a start. We also need sharp diagnostic abilities and actionable data to identify high-ROI SEO projects, quick wins and longer-term projects alike. That means we need to be able to zoom in on different areas of the website. That’s where URL categorization comes in.

With URL categorization, each indicator becomes razor-sharp

An indicator that lumps the home page together with the rest of the website can’t be representative of those other pages. And the other pages shouldn’t be treated as one block either: on an e-commerce website, for instance, product detail pages can’t be placed in the same bucket as product section pages. There are more product pages than section pages, they are deeper by nature, and their internal linking patterns are very different. As a result, their indicators (and our expectations) will differ. We should also separate the first page of a list from its pagination, duplicates from valid pages, etc.

Here are a couple of examples from Botify Log Analyzer:

The graph below shows page volume distribution by type of page as well as crawl rate for each type of page: each bar represents the number of pages found in a specific area of the website, in green for pages which were crawled by Google, in red for pages which were not.

Botify Log Analyzer crawl rate by type of page
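The per-category bars in such a chart amount to splitting each category’s pages into crawled and not crawled. A minimal sketch, assuming a hypothetical `url_category` dictionary (each crawler-found URL mapped to its category) and the set of URLs Googlebot requested during the analysis window:

```python
from collections import defaultdict


def crawl_rate_by_category(url_category, crawled):
    """url_category: {url: category} for every page found by the site crawler.
    crawled: set of URLs Googlebot requested during the analysis window.
    Returns {category: (crawled_count, not_crawled_count, crawl_rate)}."""
    counts = defaultdict(lambda: [0, 0])
    for url, category in url_category.items():
        counts[category][0 if url in crawled else 1] += 1
    return {cat: (c, n, c / (c + n)) for cat, (c, n) in counts.items()}


url_category = {
    "/": "home",
    "/shoes": "section",
    "/p/1": "product",
    "/p/2": "product",
    "/p/3": "product",
    "/p/4": "product",
}
result = crawl_rate_by_category(url_category, {"/", "/shoes", "/p/1"})
```

In this toy data the site-wide crawl rate is 50%, yet sections are fully crawled while only a quarter of product pages are, which is exactly the kind of gap a single site-wide figure hides.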

Another example, page depth distribution broken down by page category (volumes on the left graph, the same information as percentages per depth on the right):

Botify Log Analyzer depth by type of page

This segmentation is achieved through URL categorization: URL pattern rules indicate which category a page belongs to – they can even take the form of a tree structure, if sub-categories are necessary.
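To make the rule-tree idea concrete, here is a minimal sketch in which each hypothetical `Rule` holds a regular expression tested against the URL path plus optional sub-rules. The first matching rule wins and matching descends into its children, producing category paths such as `list/paginated`. The rule names, patterns and `categorize` function are illustrative assumptions, not Botify’s actual rule syntax.

```python
import re
from dataclasses import dataclass, field


@dataclass
class Rule:
    name: str
    pattern: str                                # regex searched in the URL path
    children: list = field(default_factory=list)  # sub-rules (Rule instances)


def categorize(path, rules, default="other"):
    """Return the category path of the first matching rule, descending
    into its sub-rules; `default` if no top-level rule matches."""
    for rule in rules:
        if re.search(rule.pattern, path):
            sub = categorize(path, rule.children, default=None)
            return rule.name if sub is None else f"{rule.name}/{sub}"
    return default


RULES = [
    Rule("home", r"^/$"),
    Rule("product", r"^/p/\d+"),
    Rule("list", r"^/c/", [
        Rule("paginated", r"[?&]page=\d+"),
        Rule("first_page", r""),                # empty pattern: catch-all
    ]),
]

print(categorize("/c/shoes?page=3", RULES))
```

Because rules are evaluated in order, more specific patterns (such as `paginated`) must come before catch-alls within the same level.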

This means you will get answers to the questions above (and more) not only for the whole website, but also for each page category you choose to define. You get the level of detail you need to make decisions, and you’re all set to build the perfect SEO roadmap.

This is an illustrated version of the article we published in the print magazine of the Brighton SEO conference. Take a look at our presentation’s slides, and let us know what you think!
