As Mike King wrote in his tour de force article about the technical SEO renaissance, “it has always been a crapshoot as to whether that content actually gets crawled and, more importantly, indexed.” You need to be measuring and monitoring this content because search engines are using it for crawling, indexing, and ranking. It affects your bottom line.
Being able to crawl a website and join that data with other highly relevant information like server log files and web analytics is a core competency. With this breakthrough, you can get an accurate view of your site structure and how search engines are crawling it.
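As a rough illustration of what joining these sources can look like, here is a minimal sketch in Python. The URLs, field names, and sample numbers are all hypothetical, and a real pipeline would read crawl exports, log files, and analytics reports rather than inline lists.

```python
from collections import Counter

# Hypothetical sample data; URLs, field names, and values are illustrative only.
crawl_rows = [
    {"url": "/", "depth": 0},
    {"url": "/products/a", "depth": 2},
    {"url": "/products/b", "depth": 3},
]
log_hits = ["/", "/", "/products/a"]        # URLs requested by a search engine bot
analytics = {"/": 1200, "/products/b": 40}  # organic sessions per URL


def join_crawl_data(crawl_rows, log_hits, analytics):
    """Attach bot hit counts and session counts to each crawled URL."""
    bot_hits = Counter(log_hits)
    joined = []
    for row in crawl_rows:
        url = row["url"]
        joined.append({
            **row,
            "bot_hits": bot_hits.get(url, 0),    # 0 if the bot never requested it
            "sessions": analytics.get(url, 0),   # 0 if analytics saw no visits
        })
    return joined


report = join_crawl_data(crawl_rows, log_hits, analytics)

# Pages found in the crawl that the search engine bot never requested.
uncrawled = [r["url"] for r in report if r["bot_hits"] == 0]
print(uncrawled)
```

Keying everything on the URL keeps the join simple. In practice the URLs from each source usually need normalizing first (protocol, trailing slashes, query parameters) so that they match.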
In a normal crawl of the page, we found 141 outgoing links, none of them to other products.
We can grab one of the comments and search to see whether it’s indexed… and it is:
We created a custom HTML extract to capture the number of comments (using
) on the Last.fm page as well as the amount of content overall. In our normal crawl, we found no comments.
Success – 11 comments found! Now we can evaluate on-page content much more like search engines do.
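The comparison above can be sketched in a few lines of Python. This is not the crawler's own extraction feature, and the pattern used in the original test is not reproduced here. The class name `comment` and both HTML snippets are hypothetical stand-ins for the raw server response and the rendered DOM.

```python
from html.parser import HTMLParser


class ClassCounter(HTMLParser):
    """Counts start tags whose class attribute contains a given class name."""

    def __init__(self, class_name):
        super().__init__()
        self.class_name = class_name
        self.count = 0

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; value may be None.
        classes = (dict(attrs).get("class") or "").split()
        if self.class_name in classes:
            self.count += 1


def count_elements(html, class_name):
    parser = ClassCounter(class_name)
    parser.feed(html)
    return parser.count


# Hypothetical markup: the raw HTML ships an empty container that a script
# fills in later; the rendered HTML contains the injected comments.
raw_html = '<div id="comments"></div><script src="comments.js"></script>'
rendered_html = (
    '<div id="comments">'
    '<div class="comment">Great track</div>'
    '<div class="comment first">Love it</div>'
    '</div>'
)

raw_count = count_elements(raw_html, "comment")
rendered_count = count_elements(rendered_html, "comment")
print(raw_count, rendered_count)
```

A gap between the two counts is the signal to look for: it means the content exists only after JavaScript runs, so a crawl without rendering will miss it.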