
Crawling JavaScript For Technical SEO Audits

24th January 2017 · Jeff

Being the SEO manager for a website that uses JavaScript to render important content is especially challenging.

Your website has user-generated reviews, product recommendations, or even core site navigation that is critical to the user experience, but it's rendered with homegrown or third-party JavaScript. You want to know for certain whether search engines are using this content for indexing and ranking, but you haven't had a way to gather data about this content and analyze it at scale.


As Mike King wrote in his tour de force article about the technical SEO renaissance, "it has always been a crapshoot as to whether that content actually gets crawled and, more importantly, indexed." You need to be measuring and monitoring this content because search engines are using it for crawling, indexing, and ranking. It affects your bottom line.

Now, all the power of the insights Botify provides for technical SEO analysis is available to websites that depend on JavaScript to render links and content. Know for certain, and at scale, which of your JavaScript-rendered content is available to be indexed, and how much of it.

With the ability to execute JavaScript in the crawling process, you can now:

  • Know for certain your site structure and internal link distribution, with JavaScript-rendered href links included
  • Know how much text content is actually on the page, how unique it is, and how that content changes over time, with JavaScript-rendered content included
  • Pair this information with crawl metrics from server log files and traffic metrics from web analytics

Search Engines & JavaScript

Search engines long ago realized that important and valuable content wasn’t being indexed because many sites were using JavaScript to render content (and links!). The downside for search engines was they weren’t able to promote the best results because they weren't executing JavaScript and therefore didn’t completely understand the content.

You know that Googlebot is executing JavaScript and rendering pages - and has been since 2014, at least. The proof is in your indexed and ranking pages and in the Fetch and Render tool in Google Search Console. But replicating that view on your own has been very difficult. There has long been deep thought about how Google crawls JavaScript sites, and many people are running tests to see how Google executes JavaScript.

Being able to crawl a website and join that data with other highly relevant information like server Log Files and Web Analytics is a core competency. With this breakthrough, you can get an accurate view of your site structure and how search engines are crawling it.

The Benefits of Crawling JavaScript

Enterprise websites depend on JavaScript for critical user features like product reviews, article recommendations, and other content and links. Let's look at a couple of examples to see what a difference being able to execute JavaScript can make for SEO auditing.

In his excellent article about auditing JavaScript for SEO, Justin Briggs used an example from Kipling USA, which has a personalized related products section with 20 links to other similar products.

In a normal crawl of the page, we found 141 outgoing links, none of them to other products.

After Botify’s new JavaScript crawl, however, we found the full contingent of outgoing links from this page: 165, including the 20 related products links (plus a few extra)!
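The before-and-after difference can be sketched with a simple link counter run over the two versions of a page's HTML. This is an illustrative sketch, not Botify's implementation: the HTML fragments are hypothetical, and the "rendered" version stands in for the DOM you would capture from a headless browser after JavaScript runs.

```python
from html.parser import HTMLParser

class LinkCounter(HTMLParser):
    """Counts <a> tags that carry a non-empty href attribute."""
    def __init__(self):
        super().__init__()
        self.count = 0

    def handle_starttag(self, tag, attrs):
        if tag == "a" and any(name == "href" and value for name, value in attrs):
            self.count += 1

def count_links(html: str) -> int:
    parser = LinkCounter()
    parser.feed(html)
    return parser.count

# Hypothetical raw HTML as served, before JavaScript runs: navigation only.
raw_html = '<nav><a href="/bags">Bags</a><a href="/sale">Sale</a></nav>'

# The same page after rendering: JavaScript has injected a
# related-products module with additional href links.
rendered_html = raw_html + '<div id="related"><a href="/p/1">A</a><a href="/p/2">B</a></div>'

print(count_links(raw_html))       # 2
print(count_links(rendered_html))  # 4
```

Comparing the two counts per URL, across a whole crawl, is what surfaces sections of a site whose internal linking is invisible without JavaScript execution.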

Executing JavaScript To Find Text

Product reviews and user-generated comments are standard types of content users expect on websites in many verticals, from e-commerce to travel and entertainment. Search engines understand the value of being able to index this content, even if it’s only accessible after a user interaction via JavaScript.

Last.fm acquires and publishes user comments in its “Shoutbox”. On the left in the image below is Last.fm’s Phish page with comments showing when JavaScript is enabled; on the right is the same page without comments because JavaScript is disabled.

We can grab one of the comments and search to see whether it’s indexed… and it is:


We created a custom HTML extract to capture the number of comments (using <div class="shout-body">) on the Last.fm page as well as the amount of content overall. In our normal crawl, we found no comments.

But in our JavaScript crawl of the same page, we found:

Success - 11 comments found! Now we can evaluate on-page content in a way that is much closer to how search engines see it than we could before.
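The custom extract described above can be sketched in code. The snippet below is an illustrative stand-in, not Botify's actual extraction engine: it counts `<div class="shout-body">` elements in hypothetical raw and rendered versions of the page, mirroring the zero-versus-eleven result.

```python
from html.parser import HTMLParser

class CommentCounter(HTMLParser):
    """Counts <div> elements whose class attribute includes 'shout-body',
    mirroring the custom HTML extract rule described above."""
    def __init__(self):
        super().__init__()
        self.count = 0

    def handle_starttag(self, tag, attrs):
        classes = dict(attrs).get("class") or ""
        if tag == "div" and "shout-body" in classes.split():
            self.count += 1

def count_comments(html: str) -> int:
    parser = CommentCounter()
    parser.feed(html)
    return parser.count

# Hypothetical fragments: the raw response has no comment markup; the
# rendered DOM (as captured after JavaScript runs) contains the Shoutbox.
raw_html = '<div class="shoutbox"><p>Loading comments...</p></div>'
rendered_html = ('<div class="shoutbox">'
                 + '<div class="shout-body">Great set!</div>' * 11
                 + '</div>')

print(count_comments(raw_html))       # 0
print(count_comments(rendered_html))  # 11
```

Tracking a count like this over successive crawls is how you monitor whether user-generated content is actually reaching the indexable version of the page.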

Botify JavaScript Crawler

What Botify's JavaScript crawler can do:

  • Respect robots.txt rules
  • Capture and evaluate text content rendered with JavaScript
  • Capture and follow href links rendered with JavaScript
  • Be configured to not execute certain JS files, such as web analytics (to avoid inflating traffic metrics)
  • Cache resources to reduce load on the website
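The analytics-blocking behavior in the list above amounts to a blocklist check on each script URL before the crawler executes it. The sketch below is illustrative only, not Botify's configuration format, and the URL patterns are assumptions about common analytics endpoints.

```python
import re

# Hypothetical URL patterns for third-party analytics scripts; a crawler
# can fetch a page yet decline to execute scripts matching these, so
# crawl traffic never shows up in the site's analytics reports.
ANALYTICS_PATTERNS = [
    re.compile(r"google-analytics\.com"),
    re.compile(r"googletagmanager\.com"),
    re.compile(r"/analytics\.js$"),
]

def should_execute(script_url: str) -> bool:
    """Return False for scripts the crawler should skip executing."""
    return not any(p.search(script_url) for p in ANALYTICS_PATTERNS)

print(should_execute("https://www.google-analytics.com/analytics.js"))  # False
print(should_execute("https://example.com/js/products.js"))             # True
```

The same pattern generalizes to any script you want excluded from rendering, such as ad tags or chat widgets that add load without affecting indexable content.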

Coming soon, to get even more value from crawling JavaScript:

  • Capture and follow links created using onClick or other handlers
  • Render JS content that only loads in response to requests from specific user agents

Executing JavaScript to render a page takes longer and puts more stress on your website because of the multiple resource requests per page. We recommend beginning with the subsets of your website that depend most on JavaScript, to see what SEO opportunities are lying in wait!

What will you find when you crawl JavaScript-rendered content and links? Get in touch to learn more about crawling JavaScript with Botify.