Hello there Botify community!
On the menu today: the mystery of Ghost pages!
At Botify, ‘Ghost’ pages are pages that Google crawls but that are not linked to the site’s structure.
So, on the programme: what exactly are these pages, and how can Botify help you identify them? How does Google still manage to crawl them, and how do they perform in SEO?
Don’t hesitate to leave us a comment, and enjoy reading!
Botify can identify Ghost pages in just two clicks!
As explained in previous posts, Botify is fed by two sources of information:
– the first is our crawler, which explores the client’s site to build an inventory of the pages present in the structure,
– the second is server log analysis, which tells us exactly where Google has been and which pages are generating SEO visits.
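To make the log side concrete, here is a minimal sketch of how Googlebot hits and SEO visits could be pulled out of a raw access log. It assumes the Apache/Nginx "combined" log format, recognises Googlebot by user-agent substring only (a production setup should also verify the bot, e.g. by reverse DNS), and counts a visit as an SEO visit when its referrer is a google.* domain. The names `parse_log` and `GOOGLE_HOST_RE` are hypothetical; this is an illustration, not how Botify processes logs internally.

```python
import re
from urllib.parse import urlsplit

# Assumed "combined" format:
# host ident user [time] "METHOD path PROTOCOL" status size "referer" "user-agent"
LINE_RE = re.compile(
    r'\S+ \S+ \S+ \[[^\]]*\] "\S+ (?P<path>\S+) [^"]*" '
    r'(?P<status>\d{3}) \S+ "(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)
# Assumption: google.com, www.google.fr, www.google.co.uk, ... count as search referrers.
GOOGLE_HOST_RE = re.compile(r"^(?:www\.)?google\.[a-z.]+$")


def parse_log(lines):
    """Return (paths crawled by Googlebot, paths that received SEO visits)."""
    crawled, seo_visits = set(), set()
    for line in lines:
        m = LINE_RE.match(line)
        if not m:
            continue  # skip lines that are not in the combined format
        if "Googlebot" in m["agent"]:
            # User-agent match only: spoofed bots are NOT filtered out here.
            crawled.add(m["path"])
        else:
            host = urlsplit(m["referer"]).hostname or ""
            if GOOGLE_HOST_RE.match(host):
                seo_visits.add(m["path"])
    return crawled, seo_visits
```

Working on sets of URL paths keeps the next step, comparing them with the crawl, down to simple set operations.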
As always on this blog, we will take an example from a real case.
The site we are looking at today is an e-commerce site. Our Botify crawler discovered nearly 130,000 pages in its structure, whereas Google had discovered 677,000 pages over the previous 30 days!
Big surprise… Google knew about far more pages than there actually were in the structure.
The Botify application shows the overlap between pages in the structure and pages crawled by Google as follows:
Botify found 129,694 pages and Google 677,813, but only 90,615 pages appear in both.
We therefore have:
– a 69.9% crawl rate for pages within the structure (90,615 pages found in both the logs and the structure / 129,694 total pages in the structure),
– an impressive 587,198 pages (677,813 – 90,615) that are not connected to the structure but were discovered by Google nonetheless.
These pages, unlinked from the structure, are what we call “Ghost” pages.
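The comparison above is plain set arithmetic. The sketch below assumes you have exported two URL lists, one from the crawl and one from the logs; the function name `compare_structure_and_logs` and its dictionary keys are hypothetical, not part of Botify’s interface. The last lines reapply the same formulas to the counts from this example.

```python
def compare_structure_and_logs(structure_urls, google_urls):
    """Split URLs into the groups of the structure vs. logs comparison."""
    structure, google = set(structure_urls), set(google_urls)
    both = structure & google          # linked in the structure AND crawled by Google
    ghosts = google - structure        # crawled by Google but not linked: "Ghost" pages
    not_crawled = structure - google   # linked but not crawled during the period
    crawl_rate = len(both) / len(structure) if structure else 0.0
    return {"both": both, "ghosts": ghosts,
            "not_crawled": not_crawled, "crawl_rate": crawl_rate}


# Same arithmetic applied to the counts reported in this example:
structure_count, google_count, both_count = 129_694, 677_813, 90_615
print(f"crawl rate: {both_count / structure_count:.1%}")  # crawl rate: 69.9%
print(f"ghost pages: {google_count - both_count:,}")      # ghost pages: 587,198
```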
This poses several questions:
– are Ghost pages as effective in SEO as the pages attached to the structure?
– why does Google crawl these Ghost pages?
– and, of course, what should an SEO manager do with these Ghost pages?
In short: is it serious, doctor?
Ghost pages are less effective in SEO than the pages which are present in the structure.
The following table, produced with Botify, compares the SEO efficiency of pages present in the structure with that of unlinked pages.
Ghost pages are proportionally 11 times less active than pages in the structure: only 4% of them generate SEO visits, versus 45% of pages in the structure!
This shows how much a page’s structural context matters for its ranking in Google: an unlinked page receives less PageRank and weaker semantic signals.
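The “active” comparison can be reproduced from the log-derived sets. Here a page is assumed to be active if it received at least one SEO visit during the period, matching how the logs are used above; `active_share` is a hypothetical helper, and the final line simply checks the “11 times” ratio from the table’s 45% and 4%.

```python
def active_share(pages, seo_visited):
    """Share of `pages` that received at least one SEO visit in the period."""
    pages = set(pages)
    if not pages:
        return 0.0
    return len(pages & set(seo_visited)) / len(pages)


# Ratio between the two shares reported in the table (45% vs 4%):
print(45 / 4)  # 11.25
```

Applied to the structure set and the Ghost set from the previous step, the two calls give the two percentages to compare.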
The real question is: why does Google crawl pages that are not connected to the structure?
The main explanations are the following (each deserves a post of its own on this blog):
– submitting a sitemap that contains pages not linked from the site,
– pages Google crawled in the past that still respond with a 200 code (so Google regularly goes back to check them),
– pages linked from another website (e.g. an article linking to a Ghost page),
– the influence of AdSense and its Mediapartners-Google robot (which we will discuss soon).
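The first explanation is easy to test on your own data: intersect the Ghost URLs with the URLs listed in your sitemap. The sketch below uses only Python’s standard library and assumes a single `<urlset>` sitemap (sitemap index files are not handled) and Ghost pages expressed as URL paths; the function names are hypothetical.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlsplit

# XML namespace defined by the sitemaps.org protocol.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def sitemap_paths(xml_text):
    """Return the URL paths listed in the <loc> elements of a <urlset> sitemap."""
    root = ET.fromstring(xml_text)
    return {
        urlsplit(loc.text.strip()).path
        for loc in root.findall("sm:url/sm:loc", SITEMAP_NS)
        if loc.text
    }


def ghosts_explained_by_sitemap(ghost_paths, xml_text):
    """Ghost pages that the sitemap itself hands to Google."""
    return set(ghost_paths) & sitemap_paths(xml_text)
```

Whatever Ghost pages remain after this check point towards the other explanations: old URLs still answering 200, external links, or AdSense.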
Ghost pages can come back to life, it’s possible!
In general, the best thing to do with Ghosts is either to raise them from the dead or to get rid of them…
It’s much the same in SEO with pages that are not linked to the structure.
With Botify you can list “Ghost” pages by dimension, to separate the pages that have potential from those that do not. Those with potential can be reinserted into the structure so that they once again benefit from PageRank and semantics.
As you may have guessed, you can base your linking strategy on the linking data Botify provides, and so set up the internal linking that best positions your pages for head, mid-tail or long-tail objectives.
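Sorting Ghost pages by dimension amounts to grouping and counting. As a stand-in for a real Botify dimension, the sketch below assumes the first directory of the URL path (e.g. `/product/…`) as a hypothetical dimension, and reports for each group how many Ghost pages it holds and how many of them are active (received SEO visits). All names here are illustrative.

```python
from collections import Counter
from urllib.parse import urlsplit


def first_directory(url):
    """Hypothetical dimension: first path segment, e.g. '/product/42' -> 'product'."""
    segments = [s for s in urlsplit(url).path.split("/") if s]
    return segments[0] if segments else "(root)"


def ghosts_by_dimension(ghost_urls, seo_visited, dimension=first_directory):
    """Map each dimension value to (number of Ghost pages, number of active ones)."""
    seo_visited = set(seo_visited)
    totals, active = Counter(), Counter()
    for url in ghost_urls:
        key = dimension(url)
        totals[key] += 1
        if url in seo_visited:
            active[key] += 1
    # Groups containing active pages are candidates for relinking;
    # groups with none may be candidates for removal.
    return {key: (totals[key], active[key]) for key in totals}
```

Passing a different `dimension` function (page type, template, product category) lets you slice the same Ghost set along whichever axis your site uses.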
As for Ghost pages with no SEO potential (duplicates, empty pages, error pages, etc.), it’s best to lay them to rest for good by serving them the good old 410 code!
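Returning a 410 is usually configured in the web server or in the application’s router. As a framework-neutral illustration, here is a minimal Python WSGI middleware that answers `410 Gone` for a list of retired Ghost paths and passes every other request through to the site. `gone_middleware`, `demo_site` and the example paths are hypothetical.

```python
def gone_middleware(app, retired_paths):
    """Wrap a WSGI app so that retired Ghost paths answer 410 Gone."""
    retired = frozenset(retired_paths)

    def wrapped(environ, start_response):
        if environ.get("PATH_INFO", "") in retired:
            start_response("410 Gone", [("Content-Type", "text/plain; charset=utf-8")])
            return [b"Gone\n"]
        return app(environ, start_response)  # every other URL is served normally

    return wrapped


def demo_site(environ, start_response):
    """Stand-in for the real site."""
    start_response("200 OK", [("Content-Type", "text/plain; charset=utf-8")])
    return [b"ok\n"]


if __name__ == "__main__":
    from wsgiref.simple_server import make_server

    app = gone_middleware(demo_site, {"/old-offer", "/tag/red"})
    with make_server("127.0.0.1", 8000, app) as server:
        server.serve_forever()
```

Unlike a 404, a 410 states explicitly that the page is gone for good.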
RIP Ghost pages!
Any questions? Comments? Don’t hesitate to ask! We will endeavour to respond.
P.S.: here is a little “Hall of fame” of comparisons between pages in the structure and pages known to Google. The situations vary widely, regardless of the size of the site. Enjoy!