How To Detect New URLs Crawled By Google

23rd January 2013, Annabelle

Hello Botify community,

Any SEO expert will confirm that an important part of their daily work is verifying that changes to the website do not undermine their SEO strategy.

Avoiding SEO regressions is not always easy, given both the changes made to the site and the changes made by Google.

Today’s post is therefore an opportunity to show you one of Botify’s log analysis features, which lets you monitor the new pages that Google crawls each day and check that those newly crawled pages are not useless for your SEO strategy.

Adding content to your site is, in principle, a good thing for your SEO… In principle!

We consider a page new if it has never previously been crawled by Google (that is, it has never previously appeared in the logs).
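The definition above can be sketched in a few lines of Python. This is only an illustration of the idea, not Botify's implementation: the `(day, url, user_agent)` tuple shape, the function name `new_urls_by_day` and the sample hits are all assumptions, and matching on the user-agent string is a simplification (a genuine Googlebot check would normally also verify the requesting IP address).

```python
from collections import defaultdict
from datetime import date


def new_urls_by_day(hits):
    """Group URLs by the first day Googlebot requested them.

    `hits` is an iterable of (day, url, user_agent) tuples, a hypothetical
    shape for entries already parsed out of the server logs.
    """
    seen = set()                    # every URL Googlebot has requested so far
    first_seen = defaultdict(set)   # day -> URLs crawled for the first time
    for day, url, user_agent in sorted(hits, key=lambda hit: hit[0]):
        if "Googlebot" not in user_agent:
            continue                # ignore visitors and other bots
        if url not in seen:
            seen.add(url)
            first_seen[day].add(url)
    return dict(first_seen)


# Illustrative data only, not taken from the site studied in this post.
GOOGLEBOT = "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
sample_hits = [
    (date(2012, 12, 1), "/a", GOOGLEBOT),
    (date(2012, 12, 1), "/b", "Mozilla/5.0 (X11; Linux) Firefox"),
    (date(2012, 12, 2), "/a", GOOGLEBOT),   # already seen, so not new
    (date(2012, 12, 2), "/c", GOOGLEBOT),
]

new_by_day = new_urls_by_day(sample_hits)
for day in sorted(new_by_day):
    print(day, len(new_by_day[day]))
```

Counting the size of each day's set gives the daily volume of new URLs, which is the figure discussed in the rest of this post.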

New pages crawled by the search engine are often good news, because they signal that there is more content on the site. The site also benefits from the positive effects of being updated. To be sure that the impact is truly positive, you need to ensure that these pages are of real interest to Internet users and that they will actually be visited if Google presents them in its results pages.

Google Webmaster Tools already gives you information on your crawl and your indexing. This information is usually incomplete, because you can neither change the analysis period nor segment URLs by dimension to see in detail how your crawled pages are evolving. You can hardly draw conclusions about the positive or negative effects of an increase or decrease in the number of indexed pages.

Botify detects new URLs that have been crawled by Google each day

The data presented today is taken from a different online store than the website studied in previous weeks (more than 100,000 pages in its structure).

One of the first views offered in Botify lets you check each morning that the volume of new pages crawled by Google is consistent with the number of pages published each day on the site.

This information is summed up in the following graph:

In this example, between the 1st and 19th of December, the number of new URLs crawled seems completely consistent with the number of pages published each day on the site (several hundred). On the 20th of December, however, something abnormal appears. Googlebot suddenly discovers 18,000 new pages in a single day, and a further 23,000 new pages the day after. In total, 41,000 new pages crawled in only two days? This represents nearly 50% of the volume of pages in the structure! Why is there suddenly such a change?

The answer is very clear: the volume of new pages published did not change during these days (no new categories were put online, no migration, etc.), so it very much seems that Google was discovering pages that it should not be exploring. It is therefore necessary to identify these pages quickly and to understand why they exist, in order to avoid consequences that could be problematic for SEO (Google downgrades, a smaller audience, more useless crawling, etc.).
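A check like this can also be automated. The sketch below flags any day whose count of new URLs is far above the median of the preceding days. It is one simple way to spot such a jump, not a description of how Botify does it: the function name `flag_spikes`, the `factor` and `min_history` thresholds, and the daily counts for the 1st to the 19th are assumptions made for illustration (only the 18,000 and 23,000 figures come from the example above).

```python
from statistics import median


def flag_spikes(daily_counts, factor=5.0, min_history=7):
    """Return the labels of days whose count exceeds factor x the median
    of all preceding days.

    `daily_counts` is a chronological list of (label, count) pairs.
    `factor` and `min_history` are arbitrary thresholds chosen for this
    sketch and would need tuning for a real site.
    """
    flagged = []
    for i, (label, count) in enumerate(daily_counts):
        history = [c for _, c in daily_counts[:i]]
        if len(history) < min_history:
            continue                  # not enough past days to form a baseline
        if count > factor * median(history):
            flagged.append(label)
    return flagged


# Days 1 to 19: invented values standing in for "several hundred" per day.
counts = [(day, 300 + 10 * (day % 5)) for day in range(1, 20)]
# Days 20 and 21: the two figures quoted in this post.
counts += [(20, 18_000), (21, 23_000)]

print(flag_spikes(counts))
```

The median is used as the baseline rather than the mean so that the first abnormal day does not inflate the baseline and mask the second one.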

In two clicks, Botify identifies new crawled URLs

The following graph gives the same information as the previous one, but also details the dimensions of the new pages crawled.

We find that the new crawled pages are almost exclusively pages with a search filter.

The sudden appearance of these pages is due to a production release that generates new parameters in the URLs and therefore produces many URLs that are useless and harmful to the efficiency of Google’s crawl.
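To make the idea of sorting new URLs by dimension concrete, here is a minimal sketch that separates URLs carrying a search-filter parameter from the rest. The parameter names in `FILTER_PARAMS`, the segment labels and the sample URLs are hypothetical; the real parameter names depend entirely on the site.

```python
from collections import Counter
from urllib.parse import parse_qs, urlsplit

# Hypothetical query-string parameters that mark a search-filter page.
FILTER_PARAMS = {"filter", "sort", "price"}


def segment(url):
    """Label a URL as 'search filter' if its query string carries any
    parameter listed in FILTER_PARAMS, otherwise as 'other'."""
    params = parse_qs(urlsplit(url).query, keep_blank_values=True)
    if any(name in FILTER_PARAMS for name in params):
        return "search filter"
    return "other"


def segment_counts(urls):
    """Count how many URLs fall into each segment."""
    return Counter(segment(url) for url in urls)


# Illustrative URLs only.
new_urls = [
    "/shoes?filter=red",
    "/shoes",
    "/shoes?page=2",
    "/bags?sort=asc&filter=blue",
]
print(segment_counts(new_urls))
```

Applied to each day's new URLs, a breakdown of this kind shows at a glance whether a jump comes from genuine new content or from parameterised duplicates.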

The following chart compares the distribution of the crawl (pages with/without potential) between the week in which this took place and the previous week.

Thanks to Botify, the SEO manager was able to react quickly and correct the problem immediately. There were therefore no consequences for traffic.

Had this problem not been corrected, the crawl of the site would have been completely disorganised, with some medium-term consequences:
– a lot of Google’s time spent on pages without potential
– probable indexing of some pages with low PageRank and TrustRank
– a ranking by Google that would be less effective and less used
– furthermore, lower relevance and usage criteria calculated by Google for the whole site, and therefore a lower ranking.

To conclude, Botify lets you check each morning that Google has not suddenly started crawling pages that you actually wanted to hide. You will save a lot of time and discover problem areas in just a few clicks (bugs to correct or new strategies to highlight), which will be really effective for your SEO audience.

Stop searching and spend more time on your strategy and optimisations!

Ideas? Comments? Thanks in advance!
