JavaScript Rendering Q&A With Google’s Martin Splitt

Search is complex. 

To get the organic traffic and conversions we want, we first need to make sure our pages are crawled, rendered, indexed, and ranking. 

This is what we call the SEO funnel or “full-funnel methodology.”  

SEO funnel

As a funnel, this gets narrower at every step. In other words, there’s no guarantee that all the pages on your site will be crawled, no guarantee all your crawled pages will be indexed, and so on. 

While that makes sense in theory, we wanted to understand it in practice. This involved a study of 413 million unique web pages and 6.2 billion Googlebot requests over a 30-day period. 

how does Google crawl the web study by Botify

We found that:

  • 51% of those pages weren’t crawled by Googlebot
  • 37% weren’t indexable
  • 77% weren’t getting any organic search traffic  

Why is that?

While there’s no simple answer, we knew that the growing size and complexity of the web had some role to play, so we reached out to Google’s Martin Splitt to shed some light on these issues. 

You can keep scrolling to read the full interview, watch the conversation on-demand, or jump ahead to the question you’re interested in.

What role does JavaScript play in the indexing process?

“JavaScript only really plays a role in two spots (of the search process): crawling and rendering. It plays the biggest role in the rendering stage, but it has some implications on crawling as well.”

JavaScript's role in search

To help us understand the evolution of the search process, Martin walked us through the history of the web and how it’s evolved in the past 20 years.

“Originally, the web was a document platform. So you would have a bunch of documents like your homepage, a services page, etc. All of these things are informational. It’s basically like a page out of a book — a static document. That’s why it’s called a web page. Since then, we’ve introduced more interactivity to the web. The old web still exists, and it’s still perfectly fine to build static websites.”

“However, a lot of people want more. They want to have the opportunity to add comments, they want live chat, they might build entire applications. You could, for instance, build an application that allows you to manage all your household appliances or manage a shared shopping list. You can still put that application on the web but it’s more interactive. It’s not just informational. Someone doesn’t go there just to fetch information. Someone goes there to use the application.”

“The web is really transforming into an application platform rather than a purely document platform. That’s where things get challenging. The line between what is application and what is content blurs.”

In other words, HTML is for information (for people who want to know something) and JavaScript is for functionality (for people who want to do something). 

However, JavaScript has begun to blur the lines between information and functionality. Sometimes, it does contain information, and that’s what SEOs need to watch out for.

JavaScript vs. HTML

“If you’re creating an online photo album, for example, where you can upload photos and people can comment on them, that content is embedded within an interactive application. You might still use HTML to express the content, but that content might be hidden behind interactions, and that’s where JavaScript comes in. Maybe you’re only loading the comments and the descriptions for images, or maybe you’re only loading the images themselves with JavaScript as the user scrolls down the page (infinite scroll); in those cases, JavaScript needs to be executed. If you’re just crawling and taking the HTML that comes initially from the server, you’ll see a bunch of skeleton content, but you won’t see the actual images because those are only loaded by JavaScript.”

“Instead of building your own JavaScript interpreter, you can just use the browser for it, and that’s the rendering stage. So now we have this extra stage that some search engines are building (some don’t), where we basically open the website in the browser and execute the JavaScript, which usually generates additional content, and then we take that content into indexing. So now we have this additional step we didn’t need when the web was just a document platform. As the web keeps moving forward and the boundary between applications and documents blurs further, rendering is a necessary step.”
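
To make the photo album example concrete, here is a minimal sketch of the pattern Martin describes, with a hypothetical API endpoint and element IDs: the server sends skeleton HTML (say, an empty list), and the comments only exist in the DOM after this script runs, which is why a crawler that never executes JavaScript never sees them.

```javascript
// Minimal sketch: content that only exists after JavaScript runs.
// The "/api/photos/123/comments" endpoint and the element ID are hypothetical.
document.addEventListener('DOMContentLoaded', async () => {
  const response = await fetch('/api/photos/123/comments');
  const comments = await response.json();

  const list = document.getElementById('comments'); // empty <ul> in the initial HTML
  for (const comment of comments) {
    const item = document.createElement('li');
    item.textContent = comment.text; // assumes each comment object has a "text" field
    list.appendChild(item);
  }
});
```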

Does JavaScript have negative consequences for SEO?

We gave Martin this example of a client we worked with who changed their website from HTML to a JavaScript framework. As you can see, they immediately experienced a drop in organic search traffic. When they rolled back to an HTML framework, their traffic returned. 

JavaScript SEO risks

“A lot of people are very worried about this happening, and this does happen, but then again, this also happens if you change your server-side configuration, if you accidentally add a new robots.txt and disallow everything, etc.”

In other words, JavaScript isn’t the only reason why something like this could happen. 

“JavaScript is a tool, and there are many tools in your toolbox. There are different web servers, different content management systems — sometimes switching from one CMS to another doesn’t make any difference. However, if you misconfigure it, it might break, JavaScript or not. JavaScript is just one more tool. If you’re using it right, you’ll be fine. If you’re using it wrong, things can go wrong.”

So we decided to ask everyone in the audience during this conversation what they thought. Is JavaScript impacting your website’s organic search performance?

The results were split pretty evenly across the board:

  • 33% of attendees said JavaScript is not negatively impacting their performance
  • 31% said it is negatively impacting their performance
  • 20% said they had no idea how it might be affecting their performance
  • 16% said they hadn’t seen any impact, but they also hadn’t looked into it 
does JavaScript impact SEO

If you’re not sure how JavaScript might be impacting your organic search performance, you can do things like run a crawl on the HTML-only version of your site as well as the rendered HTML and compare the two (you can do this with a tool like SiteCrawler), as sketched below. You can also use a tool like LogAnalyzer to see how Googlebot is interacting with your JavaScript.
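
If you want to script a rough version of that comparison yourself, here is a sketch assuming Node.js 18+ (for the built-in fetch) and the puppeteer package; it is only an illustration of the raw vs. rendered difference, not how SiteCrawler or LogAnalyzer work under the hood.

```javascript
// Rough sketch: compare the server's raw HTML with the rendered DOM.
// Assumes Node.js 18+ and `npm install puppeteer`.
const puppeteer = require('puppeteer');

async function compare(url) {
  // 1. Raw HTML, roughly what a non-rendering crawler sees.
  const rawHtml = await (await fetch(url)).text();

  // 2. Rendered HTML, after JavaScript has executed in a headless browser.
  const browser = await puppeteer.launch();
  const page = await browser.newPage();
  await page.goto(url, { waitUntil: 'networkidle0' });
  const renderedHtml = await page.content();
  await browser.close();

  // A large gap suggests important content is only added by JavaScript.
  console.log('Raw length:', rawHtml.length, 'Rendered length:', renderedHtml.length);
}

compare('https://www.example.com/');
```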

How can we know if a page has been rendered? 

Since rendering is such an important step in the search process — it’s necessary in order for your JavaScript-loaded content to be indexed! — then what’s a good way to test if our page has been rendered?

Someone specifically asked Martin if viewing the cached version is a good way to know.

Google cache
An example of how you can see a cached version of your page in Google search results.

While Martin said there’s no dedicated tool where you can pop in a URL and get back a yes/no answer to “has this been rendered?”, we can safely assume that everything in the index has been rendered. There are other tools we can use to check, but the cache option isn’t one of them.

“Using the cache option to figure out if the page has been rendered is a bad idea because the cache feature is really old and hasn’t been maintained in the last couple of years. There’s no one actively working on it. The cache extracts information at some point during the search process, I believe sometimes it extracts it before rendering and sometimes after. So, it’s not a debug tool. It’s just a convenience feature so that if your server goes down, we have a copy saved of that page. That’s not necessarily what’s in the index.”

“If you want to know if we’ve seen your content, then you can use Google Search Console’s URL Inspection Tool — just click on ‘view crawled page’ and look at the HTML that we have rendered. If you want to test how that would look when we crawl, render, and index again, we can do a live test that does pretty much the same thing. There may be small differences because of the way we do caching, but for the most part it’s the most accurate depiction of what’s happening.”

URL inspection tool

“If you want to test something that you don’t have access to in terms of indexing, or it hasn’t been verified in GSC because it’s something you don’t want indexed like the development version of a property, then I highly recommend the Rich Results Test or the Mobile-Friendly Test. Both of these not only give you JavaScript error messages but also the rendered HTML. Again, the rendered HTML is what you want to look for to see if your JavaScript executes properly. If so, the rendered HTML should contain all the content you care about.”          

view rendered HTML

Martin also recommended this video, which outlines various methods for debugging JavaScript SEO issues — thanks, Martin!

Why was the structured data testing tool replaced?

Speaking of testing tools, someone in the audience was worried about the structured data testing tool being replaced.

💡 Structured data is code you can use to mark up your pages to help search engines better understand what they’re about. Some structured data even makes your page eligible to show up in special “rich” features in Google search results. Learn more about structured data here.

They were curious if this happened because Google’s recommendations around schema had changed — are SEOs supposed to only focus on schema that will work for getting rich snippets in Google?

structured data testing tool going away
Pop-up message in the structured data testing tool

“This exact question is the reason this happened. The structured data testing tool is a tool that isn’t Google specific, technically, because it uses a bunch of validators and rules that aren’t Google product specific. Basically, we were mixing things. The structured data testing tool showed things that wouldn’t necessarily make you eligible for rich results, but at the same time, also showed you validation rules that were not in schema.org but were specific to Google products.” 

“So if you want a product to show up in rich results, for instance, then I believe it has to have an image, but that’s not required by schema.org. So people would run their schema.org markup through the structured data testing tool and wonder ‘hey why is this a requirement? Schema doesn’t say it’s a requirement!’ but Google was showing it as required because it’s what they specifically require on top of schema.org standards to show it in rich results.”
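
As a purely illustrative sketch of that point (placeholder values, not an official requirements list), here is Product markup that includes the image field Martin mentions, injected as JSON-LD with JavaScript, since that is exactly the kind of markup a renderer has to execute before it can be seen:

```javascript
// Illustrative only: Product markup including an "image" value,
// added as a JSON-LD script tag via JavaScript (all values are placeholders).
const productData = {
  '@context': 'https://schema.org',
  '@type': 'Product',
  name: 'Example Product',
  image: 'https://www.example.com/photos/example-product.jpg',
  description: 'A placeholder product used for illustration.',
  offers: {
    '@type': 'Offer',
    price: '19.99',
    priceCurrency: 'USD',
  },
};

const script = document.createElement('script');
script.type = 'application/ld+json';
script.textContent = JSON.stringify(productData);
document.head.appendChild(script);
```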

rich results test
An example of test results in the Rich Results Test tool

“That’s why we decided it’s not good to mix testing structured data accuracy and testing if something qualifies for Google rich snippets. They’re not exactly the same. We decided to make something that was just specific to the Google side of things, which is the rich results test. The rich results test tells you how you are performing in terms of rich results eligibility.”

“When it comes to the structured data testing tool… we would have to make large, sweeping modifications to it to untangle the parts that are Google specific from the parts that aren’t. The structured data testing tool is not going away for a while, and there are other tools out there that can help you. Who knows where this is going. Maybe we can eventually open source a version or something like that — there are no announcements for that yet, but we’ll see where things go.”

Are indexing APIs the future of indexing?

Since Google’s inception, indexing the web has been accomplished through crawling. But recently, search engines like Bing have started to shift from this crawl-only approach to one that integrates an indexing API. 

💡 Indexing APIs allow webmasters to submit content directly to the search engine, rather than relying on the search engine to crawl and find the content.
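
For a sense of what submitting content directly can look like, here is a hedged sketch against Google’s Indexing API (which, as discussed below, only accepts job postings and livestream events). Obtaining the OAuth access token via a service account is assumed to happen elsewhere, and the sketch assumes Node.js 18+ for the built-in fetch.

```javascript
// Hedged sketch: notify Google's Indexing API that a URL was updated.
// Getting `accessToken` (OAuth 2.0, via a service account) is assumed to be handled elsewhere.
async function notifyGoogle(url, accessToken) {
  const response = await fetch(
    'https://indexing.googleapis.com/v3/urlNotifications:publish',
    {
      method: 'POST',
      headers: {
        'Content-Type': 'application/json',
        Authorization: `Bearer ${accessToken}`,
      },
      body: JSON.stringify({ url, type: 'URL_UPDATED' }), // or 'URL_DELETED'
    }
  );
  return response.json();
}
```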

Botify even has a partnership with Bing that allows us to render pages for our customers and push them directly into Bing’s index. This saves a lot of time for both bots and websites.

Since we knew that only job posts and livestream events are eligible for Google’s indexing API, we asked Martin for more information on how Google views indexing APIs and where they might be going in the future.

“We don’t have any plans to announce in this area, but yes our indexing API is allowing two content types: livestream events and job postings. These are both pretty real time-y. That’s allowed us to experiment with this new format.” 

“I’m looking forward to seeing the future of indexing APIs. I can see potential problems with it, and I’m pretty sure Bing has thought about those as well. For example, why not shove every URL that you have into these indexing APIs all the time? You basically go back to square one. No one can crawl and index and process content all the time for everything that is on the internet. The web is too large. If every website out there starts pushing every page every day into these indexing APIs created by whoever offers them, that’s going to be hard.”

Martin then drew a parallel between indexing APIs and sitemaps, which weren’t a thing when Google started crawling the web.

“In the beginning, all we did was find a URL somewhere, fetch that URL, get all the links on that page, and that’s how we discovered that your website had more pages, and we’d go from link to link, and we could understand the priority of your content roughly based on your structure.”

For a visual explanation of this concept, check out this throwback video of Matt Cutts explaining how Google crawls the web.

“So if something is linked on the home page, it’s probably more important than something you can only reach by clicking from the home page to a menu, then to another link in that text, and so on; you probably don’t care as much about that content as you do about something linked directly from the home page. Now that’s not complete, obviously, because if I have a bazillion products then they can’t all be on the home page, but they’re still important. Maybe the first thousand products are most important to me, but I don’t want to put a thousand products on the home page, so then the sitemap mechanism was invented, where you could tell us what you thought was most important on your site.”

“But as it turned out, eventually, a lot of people were saying everything was important, which didn’t help. That’s why that signal [the priority field in XML sitemaps] deteriorated in usefulness. So basically, we may run into the same problem with the indexing API where we have to give you a quota, then people are like ‘the quota is too small for me’ and everyone starts saying that, but we’d have to draw a line somewhere. I’m not sure if it’s the silver bullet that everyone hopes for. I do think it’s an interesting concept, and as you say, we are trying it out for certain types of content, but we’ll see.”

What are the dangers of infinite scroll? 

Next, we asked Martin about the dangers of infinite scroll.

💡 Infinite scroll is a JavaScript function that automatically loads additional content for the visitor to read once that visitor nears the end of the web page.
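
A minimal sketch of how infinite scroll is typically wired up (hypothetical endpoint and element IDs): nothing past the first batch of results exists in the initial HTML, so a crawler that does not render and scroll never sees it.

```javascript
// Minimal infinite-scroll sketch using IntersectionObserver.
// The "/api/results" endpoint and element IDs are hypothetical.
let nextPage = 2;
const list = document.getElementById('results');      // container for the items
const sentinel = document.getElementById('sentinel');  // empty element after the list

const observer = new IntersectionObserver(async (entries) => {
  if (!entries[0].isIntersecting) return; // sentinel not in view yet
  const response = await fetch(`/api/results?page=${nextPage}`);
  const items = await response.json();
  for (const item of items) {
    const li = document.createElement('li');
    li.textContent = item.title; // assumes each item has a "title" field
    list.appendChild(li);
  }
  nextPage += 1;
});

observer.observe(sentinel);
```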

At some point, we remembered hearing Google’s John Mueller saying something about Google rendering pages using a very tall viewport — something like 9,000 pixels — so we wanted to know if that was still the case today.

“Generally yes, it’s not limited to a certain number of pixels. There are other heuristics that we use, but yeah, generally we are using a viewport that allows us to make sure we see all your content. There are implementation-specific details that may change tomorrow, so I can’t give you a number of pixels, but I would just check with the testing tools if we can see your content.”

What are the different methods of rendering content?

There are many different methods for rendering content. If you’re newer to JavaScript, you may have just started hearing terms like “client-side rendering” vs. “server-side rendering” as well as terms like dynamic rendering and pre-rendering.

💡 Need a rundown of JavaScript? We wrote this breakdown specifically for SEOs! Read JavaScript 101 for SEOs on the Botify Blog.

So we asked Martin to give us an overview. 

“If you use a JavaScript framework, the default is client-side rendering. This means you send the bare-bones HTML over and then a piece of JavaScript, and the JavaScript fetches and assembles the content in the browser. It’s the equivalent of sending an IKEA set to someone and they have to assemble it in their home. As with IKEA, sometimes you end up with something that doesn’t look like the images, sometimes you end up with extra screws, and you wonder if your product is still safe. It’s the same in the browser. Things can go wrong during the network transmission, things could go wrong in the browser, the device could be low on battery, and you generally just have less control over the experience.” 

client-side rendering

“So then you could do something else. Say you have a blog. It doesn’t have comments and only changes when you make edits to existing blog posts or publish a new blog post. So there are very controlled moments in time where your website content changes, yet I might want to use a JavaScript framework for developer convenience, but it doesn’t really add anything to the website. So you don’t need to run JavaScript every time someone lands on the page because that’s wasting energy. So what you could do is prerendering. When you have very controlled moments of change on your website, you can basically run the JavaScript once, on your server or wherever, whenever the content changes, take the HTML that falls out of that, plus the CSS and maybe some additional decorative JavaScript, and put that on your server, so that whenever a browser downloads it, it just gets the complete HTML. The reader sees the article immediately. Prerendering is the equivalent of shipping the entire piece of furniture over, already assembled.”

prerendering
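
A rough sketch of what that “run the JavaScript once when the content changes” step could look like, assuming Node.js with the puppeteer package and hypothetical local URLs and paths:

```javascript
// Sketch of prerendering at publish time: render the client-side app once in a
// headless browser and save the resulting HTML as a static file to serve to everyone.
const fs = require('fs/promises');
const puppeteer = require('puppeteer');

async function prerender(url, outputFile) {
  const browser = await puppeteer.launch();
  const page = await browser.newPage();
  await page.goto(url, { waitUntil: 'networkidle0' }); // let the app finish rendering
  await fs.writeFile(outputFile, await page.content()); // save the assembled HTML
  await browser.close();
}

// Run once whenever a post is published or edited (hypothetical URL and path).
prerender('http://localhost:3000/blog/my-post', 'dist/blog/my-post.html');
```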

“But what do you do when you don’t know when the content changes? I can’t just run the JavaScript once a day because I wouldn’t get all the changes in. I have to run it more often. Server-side rendering happens dynamically whenever it needs to happen (there can be a cache, though there doesn’t need to be).”

“In both those scenarios, users and bots are treated the same way — they both get prerendered content from the server. But when you have problems that only concern bots, you can do dynamic rendering. So when the request comes in, you determine if it’s a user or a bot. If it’s a bot, you send that request to a dynamic rendering server that renders the page and then gives the static HTML back to the bot, whereas if a user makes the request, they just get the client-side rendered version, which they render on their own device.”

dynamic rendering
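
As a rough sketch of that routing decision (not a complete implementation: the user-agent list is simplified and the rendering service URL is hypothetical), dynamic rendering often boils down to a small piece of middleware like this, shown here with Express and Node.js 18+:

```javascript
// Hedged sketch of dynamic rendering: bots get static HTML from a rendering
// service, everyone else gets the normal client-side rendered app.
const express = require('express');
const app = express();

const BOT_PATTERN = /googlebot|bingbot|baiduspider/i; // simplified bot detection

app.use(async (req, res, next) => {
  if (!BOT_PATTERN.test(req.headers['user-agent'] || '')) {
    return next(); // regular users: serve the client-side rendered app as usual
  }
  // Bots: fetch prerendered HTML from a (hypothetical) rendering service.
  const target = 'https://renderer.example.com/render?url=' +
    encodeURIComponent('https://www.example.com' + req.originalUrl);
  const html = await (await fetch(target)).text();
  res.send(html);
});

app.listen(3000);
```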

How fast do we need to deliver content to bots?

We then asked Martin about how fast we need to respond to bots. This was based on something we had heard him say previously about “answering the bots as fast as possible to avoid timeouts.”

How fast is fast enough? 

“As fast as you can. One of the things that happens when you’re crawling the web is you’re running into a tradeoff you have to make. On the one hand, you want to make as many HTTP requests as possible to get as much content back from a website as possible. If you are an e-commerce site with a million products, optimally as a crawler, I would make a million HTTP requests in one go, get all the product information back, and then I can update my index based on that, and tomorrow I’ll do the same thing. But at the same time, web servers vary in capability.”

“Maybe you’re an e-commerce provider and it’s Black Friday and everyone wants to buy things from your website at the same time, so your web server is already heavily loaded. Now let’s say Googlebot comes along and makes 10 million requests when normally only 100 customers are shopping on your website at a given time. So maybe your server crashes and serves error pages to Googlebot or, even worse, to your visitors. That’s something you don’t want, and we at Google don’t want to overwhelm and crash your server either. So we’re in this tradeoff situation — we want to get all your content, but we don’t want to crash your server.”

how Google crawls the web

“So we look at things like whether your server responds with 5xx errors. When we see this, we know that maybe we need to make fewer HTTP requests, so we’ll slow down a bit. We’ll eventually start testing whether we can go back to making more requests, but we’ll be very careful. Another thing that happens right before a web server gets pushed over the edge is that it starts getting slower; that’s a good sign it’s about to tip over, so we make fewer requests then, too.”
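
One way a site can cooperate with that behavior (a hedged sketch, not something Martin prescribed) is to signal overload explicitly with a 503 and a Retry-After header rather than slowing to a crawl or timing out; the isOverloaded check below is a hypothetical stand-in for real load monitoring.

```javascript
// Hedged sketch (Express): answer with a 503 when overloaded so crawlers back off.
const express = require('express');
const app = express();

// Placeholder heuristic; in practice you'd watch CPU, queue depth, upstream latency, etc.
const isOverloaded = () => process.memoryUsage().heapUsed > 1.5e9;

app.use((req, res, next) => {
  if (isOverloaded()) {
    // A 5xx response tells Googlebot to reduce its request rate;
    // Retry-After hints at when it is safe to come back.
    return res.status(503).set('Retry-After', '120').send('Temporarily overloaded');
  }
  next();
});

app.get('/', (req, res) => res.send('OK'));
app.listen(3000);
```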

Is dynamic rendering cloaking? 

During our audience Q&A, Martin got quite a few questions about whether dynamic rendering would be considered cloaking in a variety of scenarios. 

According to Martin, no, you don’t have to worry about cloaking.

is dynamic rendering cloaking

“Generally speaking, no. If all you do is dynamic rendering, which is serving a pre-rendered version of your page to bots, and a client-side rendered version to your users, and it’s the same or roughly the same content, you would not risk a cloaking penalty.” 

can I hide my ads from googlebot

Someone then asked a similar question, but in the context of showing or hiding ads when Googlebot requests their page. Again, Martin said this wouldn’t be considered cloaking.

“Generally, Google is relatively good at spotting ads and not requesting or rendering them, so that’s not a concern I would have. If you do, I wouldn’t worry about them. The risk of introducing additional complexity to make it easier for the crawler is higher than the potential benefit. If you’re not seeing problems with your crawl budget, nor with how your website is rendered, it should be fine. It’s also not considered cloaking.”

is dynamic rendering cloaking

There was another question about whether serving Googlebot a single consolidated page rather than multiple parameter pages would be considered cloaking. Again, Martin responded that it would not be. 

“Generally speaking, most of the cases where people ask if something is cloaking, it’s definitely not cloaking. Cloaking is when you misdirect the user. People are a lot more nervous about cloaking than I think is warranted. Cloaking is specifically about spammy techniques or misleading users. If your website is about cats but for Googlebot you say it’s about dogs, that’s cloaking. But if it’s a matter of showing 5 cats to the user and 3 or 10 cats to the bot, that’s not a problem from our perspective in terms of cloaking. If your parameters show different content and you choose not to show your parameters to Googlebot, it just means we’re not going to see that content.”

One last question related to cloaking was about whether it was OK to use dynamic rendering to remove some URLs from your page, provided that those URLs were already blocked by robots.txt.

“Yes, you can do that, but I would advise against it because it adds complexity: you have a version of your website that’s different from what your users see, which means you have a harder time testing it, or you might forget to test it. That mechanism could go rogue and produce incorrect values or errors that you don’t see. It just seems like more complexity and risk than benefit.”

What are the pitfalls of JavaScript redirects?

Because rendering introduces an extra layer of complexity, and an extra step to the indexing process, someone was curious about the potential pitfalls of JavaScript redirects.

Martin’s reply?

“There are a bunch! There is a use case where JavaScript redirects make sense, which is if you have a client-side rendered, single-page application and you want to do 404 handling properly, you might have to redirect or at least noindex the page, but that’s a different topic. Generally the problem is that an HTTP redirect can be caught right in crawling. If Google crawls and they get a 3xx, we know very early on at the crawling stage that the page has moved somewhere else, which means we’re more quickly getting to the new URL, which is what you want. If the only moment that we can detect that there’s a redirect is in rendering, that means we have to crawl, we get the initial HTML, we then have to render, in rendering we then see the redirect, we then have to queue that URL for crawling — it’s a lot later in the process, so that’s not great.”

“And then it can get worse. What if the JavaScript is fetched from an additional file? That means we need to crawl the JavaScript file, go into rendering, then crawl the script to find the redirect. But what happens if that JavaScript file is blocked by robots.txt? We don’t get it, we see an empty page instead, and we never see the redirect. What if that JavaScript file 404s? That’s also not great. We would also cache the JavaScript file, so we might have an old version of it. In general, JavaScript redirects are a lot more fickle than an HTTP redirect.”
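
For contrast, here is the same move expressed both ways (illustrative sketch, Express for the server-side part): the HTTP redirect is visible to Googlebot at the crawling stage, while the JavaScript version is only discovered after the page, and the script it depends on, has been fetched and rendered.

```javascript
// (a) HTTP redirect served by the web server: Googlebot sees the 301 while
//     crawling and can queue the new URL right away.
const express = require('express');
const app = express();
app.get('/old-path', (req, res) => res.redirect(301, '/new-path'));
app.listen(3000);

// (b) JavaScript redirect shipped to the browser: only discovered after crawling
//     the page, fetching the script, and rendering it, e.g.
//
//       window.location.replace('/new-path');
```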

Should you consider reducing Googlebot’s crawl rate?

We then had a question from someone whose e-commerce platform was encouraging them to reduce Googlebot’s crawl rate. That seemed like an extreme measure, so they were wondering if that was necessary.

should we reduce Googlebot's crawl rate

“I would argue that if this platform says it’s an enterprise platform, they should be able to deal with the load. But then again, if you have a really large site and it can’t be handled, that’s just an argument to consider a different platform. You can limit the rate. It obviously introduces the risk that we crawl fewer pages than you might want us to, but if you are below, say, 1 million pages, then you should definitely not worry about this unless you have lots of really frequently updating content.”

Is it OK to have a server dedicated to Googlebot only?

“Sure! We don’t care. However, it might add complexity. Because if that server misbehaves, but the server you’re using to browse the website is fine, you might be confused when Google tells you something is a 5xx error. It’s just complexity that you should be careful with.”

How should you handle rendering elements on the page that are personalized, such as recommended product carousels?

“As far as the searcher goes, if they come to your website the first time through search results, they probably won’t get personalized results so I wouldn’t worry too much about it. Just make sure you have a good default experience.”

Will the Googlebot crawl time tool be migrated to the new GSC?

There’s a tool from the old Google Search Console that hasn’t been migrated, which is the Googlebot crawl time tool that shows how quickly Googlebot can access your pages. We asked Martin if this will be updated and added to the new (current) version of Google Search Console, and if so, if it would include rendering time. 

“It’s in progress! But no, it won’t include rendering time. The idea behind Google Search Console is to give insights that are actionable. There’s nothing you can do about rendering. It takes however long it takes. You do get an idea of how websites perform for the user — that’s what Core Web Vitals are about — but don’t worry about us. Basically, you can pretend that it’s instant.”

💡 Core Web Vitals are new metrics that Google will soon begin to consider as part of their ranking signals. Read more about those metrics and the page experience update here!
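
If you want to see one of those user-facing numbers for yourself, here is a small browser sketch that observes Largest Contentful Paint with the standard PerformanceObserver API (Google also publishes a web-vitals JavaScript library that wraps this kind of measurement):

```javascript
// Browser sketch: log Largest Contentful Paint (LCP), one of the Core Web Vitals.
new PerformanceObserver((entryList) => {
  const entries = entryList.getEntries();
  const lastEntry = entries[entries.length - 1]; // latest LCP candidate so far
  console.log('LCP (ms):', lastEntry.startTime);
}).observe({ type: 'largest-contentful-paint', buffered: true });
```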

Watch the full interview on-demand

If you’d like to watch the full interview with Martin Splitt, it’s available here!

A huge thanks to Martin for taking the time to talk with us and help us learn more about how Google treats JavaScript! 
