Crawl & Render Budget Log File Analysis

Is Google Keeping Pace With Your Website, Or Lagging Behind?

30th September 2014, by Annabelle

How well does Google keep up with your new content? If, like many websites, yours regularly publishes new pages (weekly, daily, constantly…), you need to know. Are all new pages explored by Google immediately? Do new pages generate organic visits right away? Or do older pages keep generating the bulk of the traffic?

Figuring that out will help prioritize and target your SEO optimizations. That’s why Botify Log Analyzer includes the following indicators:

  • Daily new pages crawled (“new unique URLs crawled”): number of pages which were crawled that day for the first time since log analysis was activated.
  • Daily new crawl volume: total number of crawls, that day, on pages which had never been crawled before that day. This indicator is interesting when it is much higher than the number of new pages crawled: dividing one by the other gives the average number of crawls per new page.
  • Daily new active pages: number of distinct pages which generated their first organic visit that day.
  • Daily new visits (short for “visits on new active pages”): number of organic visits generated by pages which generated their first organic visit that day.
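
To make these definitions concrete, here is a minimal sketch of how the four indicators could be derived from log data. It is not how Botify Log Analyzer computes them; it assumes the logs have already been reduced to two lists of `(day, url)` tuples (one per Googlebot hit, one per organic visit), with days written as ISO date strings. The function and key names are made up for the illustration.

```python
from collections import defaultdict

def first_day_seen(events):
    """Map each URL to the earliest day on which it appears in the events."""
    first = {}
    for day, url in events:
        if url not in first or day < first[url]:
            first[url] = day
    return first

def daily_new_indicators(crawl_hits, organic_visits):
    """Compute the four 'new' indicators per day.

    Both arguments are lists of (day, url) tuples, one tuple per log line.
    Days are ISO strings ("2014-09-30"), so string comparison orders them.
    'New' means 'first seen in this data set' (an assumption standing in
    for 'since log analysis was activated').
    """
    stats = defaultdict(lambda: {
        "new_pages_crawled": 0,
        "new_crawl_volume": 0,
        "new_active_pages": 0,
        "new_visits": 0,
    })
    first_crawl = first_day_seen(crawl_hits)
    first_visit = first_day_seen(organic_visits)

    # One new page per URL, counted on the day of its first crawl.
    for day in first_crawl.values():
        stats[day]["new_pages_crawled"] += 1
    # Every crawl that happens on a URL's first crawl day counts as new volume.
    for day, url in crawl_hits:
        if first_crawl[url] == day:
            stats[day]["new_crawl_volume"] += 1
    # Same logic for organic visits.
    for day in first_visit.values():
        stats[day]["new_active_pages"] += 1
    for day, url in organic_visits:
        if first_visit[url] == day:
            stats[day]["new_visits"] += 1
    return dict(stats)
```

With the result for a given day, `new_crawl_volume / new_pages_crawled` gives the average number of crawls per new page mentioned above.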

Google’s crawl on new pages

New pages resulting from the website’s normal “life”

New content of an existing type is regularly published. For example:

  • Editorial websites: new articles are created on a daily basis. Most are placed in sections that already exist.
  • E-commerce websites: new products are regularly added (periodic product catalog updates).
  • Classifieds: new ads are constantly added by users.

New pages resulting from one-time changes

A new section or new type of page is added to the website, or a type of page which was disallowed to robots is now allowed.

New, unwanted or unexpected pages

Two sorts of unexpected new pages can appear in the new crawl:

  • URLs that we already know about or could anticipate (non-rewritten URLs for instance): we know these URLs exist and could be crawled, or they were crawled in the past and we want to verify that it does not happen again. These are categorized and tagged as “warning”, so that they appear in Botify Log Analyzer’s “Alerting crawl” graph.
  • URLs which are not categorized (as a result, they are not flagged as “warning” either), and appear in the “new crawl” graphs under “Other”.
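
As an illustration of the two cases above, URL categorization can be thought of as an ordered list of pattern rules, some of which carry a “warning” flag, with “Other” as the fallback. The sketch below is only an assumption about how such rules could look; the category names and URL patterns are hypothetical and are not Botify Log Analyzer's configuration format.

```python
import re

# Hypothetical rules: (category, pattern, is_warning). The first match wins.
RULES = [
    # Non-rewritten product URLs: known duplicates we want to be alerted about.
    ("product_duplicate", re.compile(r"^/product\.php\?id=\d+"), True),
    # Rewritten (canonical) product URLs.
    ("product", re.compile(r"^/p/[\w-]+$"), False),
    # Rewritten list (navigation) URLs.
    ("list", re.compile(r"^/c/[\w-]+$"), False),
]

def categorize(url):
    """Return (category, is_warning) for a URL path.

    URLs that match no rule fall into "Other" and are not flagged,
    so they only show up under "Other" in the new crawl.
    """
    for name, pattern, is_warning in RULES:
        if pattern.search(url):
            return name, is_warning
    return "Other", False
```

A growing “Other” bucket is a signal that a rule is missing: either a new legitimate page type, or a new family of unwanted URLs that deserves its own “warning” rule.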

The example below illustrates all three cases in an e-commerce website: normal periodical changes (product catalog updates), a one-time change, and unwanted pages.

First, let’s take a look at all URLs crawled every day:

Botify Log Analyzer, distinct URLs crawled

Mainly products and, to a lesser extent, product duplicates.

Now, let’s see new URLs crawled:
It’s interesting to compare the global typology of all crawled URLs with the typology of new crawled URLs. The former indicates what the website looks like from Google’s perspective, the latter how the search engine sees the website evolve.

Botify Log Analyzer, new URLs crawled

The vast majority of new URLs crawled are product duplicates.

If we take a closer look at our example:
New URLs crawled in yellow and bright green over the first days are new navigation pages.
URLs in darker green are duplicates of product pages. They are flagged as “warning” and will appear in the “Alerting Crawl” graph. The counter above the graph shows that 129.4K warning URLs were crawled, which accounts for most of the new URLs crawled.
New product pages appear in pink. There are very few actual new products compared to all the product duplicates Google keeps finding.

Let’s zoom in on product pages (select a page type at the top of the page, instead of showing “all website”):

New product pages are regularly published and crawled; they are simply drowned out by the large number of new duplicates.

Considering that, on this website, product pages represent 74% of active pages (see below) and 56% of organic visits, dealing with product duplicates is an absolute necessity and a top priority.

New active pages

All active pages

As expected, product duplicates don’t generate any organic visits.

New active pages

Page type distribution (distribution by page category or tag) can differ significantly between all active pages and new active pages, depending on the type of content, and the portion of organic traffic which is expected to be generated by fresh content.
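
One simple way to quantify this difference is to compute the share of each page type among all active pages and among new active pages, then look at the gap. The sketch below assumes each active page has already been labeled with a page type; the function names are illustrative only.

```python
from collections import Counter

def distribution(page_types):
    """Return the share of each page type, as a fraction between 0 and 1."""
    counts = Counter(page_types)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {page_type: n / total for page_type, n in counts.items()}

def distribution_gap(all_active_types, new_active_types):
    """Share among new active pages minus share among all active pages.

    A positive value means the page type is over-represented among
    new active pages; a negative value means it is under-represented.
    """
    all_dist = distribution(all_active_types)
    new_dist = distribution(new_active_types)
    page_types = set(all_dist) | set(new_dist)
    return {t: new_dist.get(t, 0.0) - all_dist.get(t, 0.0) for t in page_types}
```

Each argument is a list with one page-type label per active page, for example `["product", "product", "list"]`.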

In our example, content freshness is not key to generating organic traffic. New active pages represent 22% of all active pages, but visits on new active pages represent less than 5% of organic visits.

Here, list pages are new pages for the most part (as seen in the new crawl graph), so it is not surprising to see they generate new active pages. It would be interesting to see how active they remain over time.

Also, while Google seems to easily distinguish between products and product duplicates (the latter don’t generate organic visits), the search engine does not seem to distinguish as easily between lists and duplicate lists: duplicate lists are almost as active as “real” list pages.

If pages are mainly active when they are new, it usually means that they are active because they are new:

  • Google detected that the query they respond to deserves content freshness (news-oriented, for instance).
  • The pages benefit from a short-lived freshness boost: Google has just discovered these pages, does not yet know what to think of them, and tries them out to see if users like them. This often happens when there is a large volume of pages.

In this other example (a classifieds website), the distribution of new active pages is closer to the overall distribution of all active pages:

Differences still deserve to be examined.

Have you made any observations about your fresh content versus your older content? Don’t hesitate to let us know: leave a comment!
