On Monday, November 4th, I traveled to Google’s Webmaster Conference in Mountain View, California, to dive into the topic of search with other SEOs, webmasters, and product managers from Google.
Community is essential for tackling challenges in SEO, and, as you can imagine, having Google lead the conversation was pretty incredible. Product managers responsible for Googlebot, rendering, and more talked about specific challenges their teams are facing, solutions they’re working towards, and helpful technical SEO tips.
Similar to Botify’s methodology, Google prioritizes the end-to-end processes of crawling, rendering, indexing, and converting. At Botify, we also plug real rank into the mix (a step between indexation and converting), which essentially means getting your content ranking for real queries. My team is always working on anticipating what’s next for Google and other search engines in order to develop more solutions for our community. The Webmaster Conference gave us a peek not only into the future but also at findings on a granular level that can help websites perform better in search.
Crawl factors: insights about load times, unreachables, and canonicals
As an SEO, one of my biggest concerns is getting Google to crawl as many important pages on a website as possible (a.k.a. maximizing crawl budget). When the product manager of Googlebot began presenting to us, I was pretty stoked.
Take a look at what he had to say:
0.5 seconds is the average load time
So, you know how fast your site is, but do you know how fast it is compared to the whole web? Googlebot’s download time continues to get faster and faster, reaching around 0.5 seconds on average. This means that 0.5 seconds is the average load time for the entire internet. So, if your load time is two seconds, you should definitely make some changes.
Just last week, Google added the beta version of the Speed Report to Search Console to indicate in real time how users are experiencing your website. I recommend taking a look to see how your site is doing, and then consider your load time in the context of the web. You can check load times directly in Botify or by using the new speed report.
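If you want a quick gut check before opening a report, here’s a minimal sketch in Python. It times one plain download of a URL and compares it to the 0.5-second average quoted in the talk. The function names are my own, and a single request from your own machine is only a rough stand-in for what Googlebot measures.

```python
import time
import urllib.request

# Average Googlebot download time quoted in the talk (seconds).
WEB_AVERAGE_SECONDS = 0.5


def measure_download_seconds(url: str, timeout: float = 10.0) -> float:
    """Time one full download of `url` (raw response only, no rendering)."""
    start = time.perf_counter()
    with urllib.request.urlopen(url, timeout=timeout) as response:
        response.read()
    return time.perf_counter() - start


def compare_to_web_average(seconds: float, average: float = WEB_AVERAGE_SECONDS) -> str:
    """Describe a measured download time relative to the quoted web average."""
    ratio = seconds / average
    if ratio <= 1.0:
        return f"{seconds:.2f}s: at or below the {average:.1f}s average"
    return f"{seconds:.2f}s: {ratio:.1f}x the {average:.1f}s average"


if __name__ == "__main__":
    # Hypothetical URL; replace it with a page from your own site.
    elapsed = measure_download_seconds("https://example.com/")
    print(compare_to_web_average(elapsed))
```

Run it a few times and on a few page templates, since one sample can be skewed by network noise.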
200s and 404s are O.K. for robots.txt files. Unreachables are not.
Googlebot checks the robots.txt file every time it starts crawling your website. If you serve a 200 or 404 on robots.txt, everything is A-OK. A 200 gives Google a set of rules and a 404 tells it there are no rules, so in both cases Google will go ahead and crawl. On the other hand, server error codes can cause the crawl to slow down.
Alternatively, if your robots.txt is “unreachable” (as in, the server doesn’t even respond), this creates an issue. If Google can’t find anything, it might not crawl your website at all. Interestingly enough, 20% of the time that Google crawls a robots.txt, the server is unreachable. C’mon guys, we’re better than this!
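To make those cases concrete, here’s a small Python sketch that fetches a site’s robots.txt and maps the result to the behavior described above. The function names and the wording of the outcomes are my own, and anything outside 200, 404, 5xx, and “no response” wasn’t covered in the talk, so the sketch says so rather than guessing.

```python
import urllib.error
import urllib.request
from typing import Optional


def fetch_robots_status(site: str, timeout: float = 10.0) -> Optional[int]:
    """Return the HTTP status of <site>/robots.txt, or None if the server never answers."""
    url = site.rstrip("/") + "/robots.txt"
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        # The server answered, just not with a success code (404, 503, ...).
        return err.code
    except OSError:
        # DNS failure, refused connection, timeout: no response at all.
        return None


def robots_txt_outcome(status: Optional[int]) -> str:
    """Map a robots.txt fetch result to the crawl behavior described in the talk."""
    if status is None:
        return "unreachable: Google might not crawl the site at all"
    if status == 200:
        return "ok: crawl using the rules in the file"
    if status == 404:
        return "ok: crawl with no rules"
    if 500 <= status <= 599:
        return "server error: crawl may slow down"
    return "not covered in the talk"


if __name__ == "__main__":
    # Hypothetical site; replace it with your own domain.
    print(robots_txt_outcome(fetch_robots_status("https://example.com")))
```

Scheduling a check like this is a cheap way to find out whether your site is part of that 20%.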
Google might not respect canonicals (but you should still use them wisely)
Keep in mind, canonicals are not an authoritative signal. Google might not respect a canonical tag and might index the non-canonical version instead. It actually only uses canonicals as a hint to verify something it already knows.
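Since the tag is only a hint, it’s worth at least making sure the hint you’re sending is the one you intend. Here’s a minimal Python sketch that pulls the declared canonical out of a page’s HTML using only the standard library. The class and function names are my own, and it only reads the first `<link rel="canonical">` it finds; it says nothing about which URL Google actually picks.

```python
from html.parser import HTMLParser
from typing import Optional


class CanonicalFinder(HTMLParser):
    """Remember the href of the first <link rel="canonical"> in a document."""

    def __init__(self) -> None:
        super().__init__()
        self.canonical: Optional[str] = None

    def handle_starttag(self, tag, attrs):
        if tag != "link" or self.canonical is not None:
            return
        attributes = dict(attrs)
        rel_values = (attributes.get("rel") or "").lower().split()
        if "canonical" in rel_values:
            self.canonical = attributes.get("href")


def declared_canonical(html: str) -> Optional[str]:
    """Return the canonical URL a page declares, or None if it declares none."""
    finder = CanonicalFinder()
    finder.feed(html)
    return finder.canonical


if __name__ == "__main__":
    page = '<head><link rel="canonical" href="https://example.com/shoes"></head>'
    print(declared_canonical(page))
```

Comparing this value with the URL that actually shows up in search results is one way to spot pages where Google has ignored your hint.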
“Deduplicating” content for the index is a big challenge
Next up, we heard from the indexation product manager. Did you know that indexing is the single most expensive component of search? Google has to pay to store the entire contents of the web! That means Google has to make a lot of decisions about what gets indexed and what doesn’t. It does a pretty good job of evaluating what’s duplicate and what’s not, but it’s still a challenge (and a lot of work).
Duplicate content is a pain, not only for SEOs to manage but also for Google. Could you imagine how big the internet would be if duplicate content wasn’t handled properly? Google has an entire team of product managers working on “deduplication” for SERPs. Their mission is to define and identify duplicate content and pick canonical URLs (a.k.a. authoritative content pages) to consolidate (or cluster) content.
How does Google predict duplicate content?
Google looks at URL strings to predict duplicate content, which to me seems like a clear indicator that it’s looking at more than internal/external links or pass signals to determine what it’s going to crawl. Google also tries to remember things about URLs. For example, if you use a noindex tag while updating content and then remove the noindex tag, Google will remember that the link was “bad” and likely won’t index it right away after the tag was removed. So, use the noindex tag with caution.
Meanwhile, the easiest way for Google to identify duplicate content is through redirects (301s and 302s). Redirects provide a clear signal that, although the content may not be the same, two URLs can be clustered into one. This is especially important to consider if you’re redesigning a website where the URLs change. Make sure you use redirects, because not using them can have a big impact on whether or not your pages get indexed. Soft error pages get clustered as well.
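For a redesign, that boils down to one question: does every old URL end up somewhere that exists on the new site? Here’s a minimal Python sketch of that check. The redirect map is a plain dictionary of hypothetical paths, and the function names are my own; in practice you’d build the map from your server config or a crawl.

```python
from typing import Dict, List, Optional


def final_target(url: str, redirects: Dict[str, str], max_hops: int = 10) -> Optional[str]:
    """Follow the redirect map from `url`.

    Returns the URL where the chain ends, or None if the chain loops
    or needs more than `max_hops` hops.
    """
    seen = {url}
    hops = 0
    while url in redirects:
        if hops == max_hops:
            return None
        url = redirects[url]
        hops += 1
        if url in seen:
            return None
        seen.add(url)
    return url


def unredirected(old_urls: List[str], new_urls: List[str], redirects: Dict[str, str]) -> List[str]:
    """List old URLs that do not end up on a URL that exists on the new site."""
    live = set(new_urls)
    return [url for url in old_urls if final_target(url, redirects) not in live]


if __name__ == "__main__":
    # Hypothetical paths for illustration only.
    redirect_map = {"/old-shoes": "/shoes", "/older-shoes": "/old-shoes"}
    old_site = ["/old-shoes", "/older-shoes", "/old-hats"]
    new_site = ["/shoes", "/hats"]
    print(unredirected(old_site, new_site, redirect_map))
```

Anything this returns is an old URL with no redirect (or a broken chain), which are exactly the pages at risk of dropping out of the index after a migration.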
Google has to fetch 50-60 resources while rendering the average page, and only 60-70% of those resources are retrieved at cached rate. Furthermore, Google will ignore caching rules and retrieve the newest version of a page if it detects new information, while minimizing the resources it’s fetching (despite robots.txt rules).
Crawl costs increase for Google by about 20x when it begins rendering. So, if a page maxes out or has a slow load time, Google will cut off rendering, interrupt scripts, and mark a page as unrenderable in order to preserve resources and CPU consumption.
Ranking: what does Google prioritize?
Relevance is the Big Kahuna
According to the product manager of ranking, relevance is one of the main signals Google measures when looking at rank performance. To better serve relevancy, Google promotes diversity in search results in hopes of better satisfying the searcher. In other words, it will surface results from different types of websites that may be interesting to its users. For example, a query for the movie The Joker can pull a result from a forum, a starred review, an article about the movie, and so on.
Google also uses dozens of on-page signals to determine what to include in snippets and sitelinks. It evaluates their relevance by analyzing the internal relationship of sitelinks. It also looks for parts of a page that seem important, like, for example, the number or menu of a restaurant, and serves those sitelinks to users so they can get to where they need to go faster.
Trustworthiness wins out
Another point that was particularly interesting was that Google doesn’t value the accuracy of a page that’s ranking as much as it values its trustworthiness. In fact, Google contracts 30K people every year to work on judging content for trustworthiness (as opposed to fact-checking).
How does language factor in?
Google is working on becoming better at understanding content in different languages. Unfortunately, for the time being, pages with content in less popular languages aren’t competing as well as pages with content in very common languages. When content is not in one of the main languages, it tends to perform worse because there’s less training data for the algorithm. This plays a role in the recent BERT update (which I’ll mention later).
The Future of Google Search
There are a lot of changes coming to Google’s SERPs, and I was excited to hear about them firsthand last week.
What does the future _look like_?
For one, Google is planning to further move away from the traditional list of ten blue links in organic search. Whether that means implementing a grid or some other framework, our SERPs may be getting a facelift.
Our friend BERT
You’ve probably heard the announcement about BERT, the recent algorithm update that uses AI to provide relevant results by reading the keywords in a query in combination with their context. BERT only works for English searches at the moment, as Google finds it harder to process other languages in the algorithm. However, they mentioned that their BERT-enabled technology will grow to encompass more language capabilities in the future.
And, finally, a few points about content
Google’s team covered tons of tips about content quality throughout the day, some of which I was already familiar with and others that were completely new.
Here are some of my favorites from the day:
Website security is more complex than you’d think
While 75% of the web is secure, there are still some sites using HTTP. However, even if you are on HTTPS, it’s still possible that Google could consider your page not-secure. That is, if you have any kind of not-secure content anywhere on your page, like an image file that’s served over HTTP, it can send a negative ranking signal to Google (despite your secure server). Essentially, Google wants to serve pages that are completely secure, not just pages that have HTTPS in the URL.
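Here’s a minimal Python sketch for spotting that kind of not-secure content in a page’s HTML. It looks for resources that a page loads over plain HTTP, such as images, scripts, and stylesheets. The tag list and the names are my own and are not exhaustive (it won’t see resources referenced from CSS or added by JavaScript), so treat it as a first pass rather than a full audit.

```python
from html.parser import HTMLParser
from typing import List, Tuple

# Tags whose `src` attribute makes the browser load a resource (not exhaustive).
SRC_TAGS = {"img", "script", "iframe", "video", "audio", "source", "embed"}


class InsecureResourceFinder(HTMLParser):
    """Collect (tag, url) pairs for resources loaded over plain HTTP."""

    def __init__(self) -> None:
        super().__init__()
        self.found: List[Tuple[str, str]] = []

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if tag in SRC_TAGS:
            url = attributes.get("src")
        elif tag == "link" and "stylesheet" in (attributes.get("rel") or "").lower().split():
            url = attributes.get("href")
        else:
            # Ordinary <a href> links are navigation, not loaded resources.
            return
        if url and url.strip().lower().startswith("http://"):
            self.found.append((tag, url))


def insecure_resources(html: str) -> List[Tuple[str, str]]:
    """Return every HTTP-loaded resource found in `html`."""
    finder = InsecureResourceFinder()
    finder.feed(html)
    return finder.found


if __name__ == "__main__":
    page = (
        '<img src="http://example.com/logo.png" alt="logo">'
        '<script src="https://example.com/app.js"></script>'
    )
    print(insecure_resources(page))
```

An empty list from a check like this doesn’t prove a page is fully secure, but a non-empty one tells you exactly which files to move to HTTPS first.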
Making the most of image search
If you want an image to rank, you’ve also got to think about the content of the page in which the image exists. In addition to including a good file name and using alt text, you also need to include text next to the image to help Google (and searchers) understand its meaning. The quality of the image and structured data also play a role. Meanwhile, Google does use some machine learning to understand the context of the image, but this is not a strong ranking factor.
APIs for better indexing
At Botify’s recent Crawl2Convert in New York, Bing shared the news that they’ve launched an API to enable instantaneous indexation (and the API is integrated in the Botify platform!). Google stated at the Webmaster Conference that, while they’re not going to move as fast as Bing, there are plans to launch something to help large enterprise sites. Woohoo!
The Google Webmaster Conference was a win all around. I was able to connect with SEOs and webmasters from across the globe, including some of our very own clients, and get deeper into the nitty-gritty of Google’s inner workings.
I’m looking forward to seeing how Google’s product managers innovate for the challenges they discussed and seeing some of their new ideas come to life. Hopefully you found my favorite insights from the day helpful for advancing your SEO knowledge and processes. There’s always more to learn, and I’m looking forward to many more discussions with the greater SEO community.
Psst! If you’re interested, Botify hosted a Twitter chat to discuss Google’s BERT update. If you’d like to answer a question or have other thoughts about BERT, here’s where you can do that. Pop in any time to help us pick apart this complex subject, and be on the lookout for future #BotifyChat discussions on Twitter.