As a technical SEO, there’s almost nothing more satisfying than digging into a complex indexing problem and finding a solution that helps search engines find your content — especially when your stakeholders are counting on that content to drive revenue.
In our most recent article on the Botify Blog, we covered the differences (and synergies) between the SEO funnel and the marketing funnel, and the cascading effects technical SEO can have on revenue. When we stop to think about the bottom-line metrics our execs care about, namely SEO ROI, crawl budget has a huge impact.
If you need a refresher, crawl budget is the maximum number of pages a search engine will crawl on any given website. Because search engines don’t have unlimited time and resources to crawl all the content on the web all the time, they prioritize what pages they’ll look at based on how healthy and popular a site is.
This budget is why search engines miss more than 51% of an enterprise site’s content. The good news is — you can do something about it. (Psst! If you’re curious about how to calculate your site’s crawl budget, take a look at our recap of TechSEO Boost 2019, where G2’s Jori Ford breaks it all down!)
Whether you have an e-commerce site with a huge faceted navigation or a publishing site that’s constantly adding new content, there’s almost always room for large websites to improve how their crawl budget is spent.
Here’s what to consider.
By using your site’s robots.txt file, you can tell search engine bots what to crawl and what to ignore. If you’re unfamiliar, robots.txt files live at the root of websites and look like this:
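Here’s a minimal sketch (the user-agent rules and paths are hypothetical, for illustration only):

```
# robots.txt — lives at https://www.example.com/robots.txt
User-agent: *
Disallow: /admin/
Allow: /

Sitemap: https://www.example.com/sitemap.xml
```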
Visit Google’s documentation for more information on creating robots.txt files.
So how do these files help preserve your crawl budget?
Let’s say, for example, you have a large e-commerce site with a faceted navigation that lets you sort the content without changing it (e.g. sorting by price, lowest to highest). You’d want to disallow search engines from crawling those sort pages because they’re duplicates of the original page. You don’t want search engines wasting time on them since you don’t want them in the index anyway.
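As a sketch, assuming those sort options are exposed as a `sort` query parameter (a hypothetical URL structure — yours may differ), the robots.txt rules might look like:

```
User-agent: *
# Block crawl of sort permutations that duplicate the base category page
Disallow: /*?sort=
Disallow: /*&sort=
```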
This reminds us of a story Ryan Ricketts, Technical SEO Manager at REI, shared at our Crawl2Convert conference: his team cut their website down from 34 million URLs to 300,000 and saw drastic crawl budget improvements. It also brings to mind how HubSpot’s Aja Frost cut down thin pages to increase traffic.
Your robots.txt file is an important tool for directing search engines away from your unimportant content and toward your critical content. If you’re a Botify customer, know that our crawler will follow the rules defined for Google in your website’s robots.txt file. However, you can also set up a virtual robots.txt file to override those rules.
It’s important to note that disallowing search engines from certain sections or pages on your site does not guarantee that search engines won’t index those pages. If there are links to those pages elsewhere, such as in your content or sitemap, search engines may still find and index them.
Which brings us to our second point.
To avoid wasting your crawl budget, make sure you’re linking to the live, preferred version of your URLs throughout your content. As a general rule, you should avoid linking to URLs if they’re not the final destination for your content.
For example, you should avoid linking to redirected (3xx) URLs, broken (4xx) pages, or non-canonical duplicates of your content.
Don’t waste your crawl budget by sending search engine bots through multiple middlemen (a.k.a. chains and loops) to find your content. Instead, link to the ultimate destination. You can learn more about finding and fixing 301 redirect errors in our recent blog post, which can be a big step toward improving your crawl budget.
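To illustrate why chains and loops waste crawl budget, here’s a small Python sketch (the URL map is made up) that follows a redirect map and counts how many extra hops a bot needs before reaching the final destination — every hop is a wasted request:

```python
def resolve(url, redirects, max_hops=10):
    """Follow a redirect map and return (final_url, hops).

    Raises ValueError on a redirect loop or an overly long chain.
    """
    seen = set()
    hops = 0
    while url in redirects:
        if url in seen or hops >= max_hops:
            raise ValueError(f"Redirect loop or chain too long at {url}")
        seen.add(url)
        url = redirects[url]
        hops += 1
    return url, hops

# Hypothetical chain: retired URL -> renamed URL -> current URL
redirects = {
    "/old-product": "/products/widget",
    "/products/widget": "/products/widget-2024",
}

final, hops = resolve("/old-product", redirects)
print(final, hops)  # /products/widget-2024 2
```

Linking directly to `/products/widget-2024` in your content drops those two hops to zero.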
Avoiding common XML sitemap mistakes is a great step to take if you want to improve your crawl budget. After all, it’s the map that search engines use to find your so-called treasure.
What mistakes are those?
It’s critical to include only live, preferred URLs and to make sure you’re not leaving out key pages that you want search engines to crawl and index. Have old product pages? Make sure to expire them and remove them from your sitemap.
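As a quick sketch of what such an audit checks, this Python snippet parses a sitemap and flags any URL that doesn’t match the preferred scheme and host (the sitemap content and domain here are hypothetical):

```python
import xml.etree.ElementTree as ET

# A made-up sitemap mixing preferred and non-preferred URL versions
SITEMAP = """<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://www.example.com/products/widget</loc></url>
  <url><loc>http://www.example.com/old-page</loc></url>
  <url><loc>https://example.com/blog/post</loc></url>
</urlset>"""

NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}
PREFERRED = "https://www.example.com/"  # your canonical scheme + host

root = ET.fromstring(SITEMAP)
flagged = [
    loc.text
    for loc in root.findall(".//sm:loc", NS)
    if not loc.text.startswith(PREFERRED)
]
print(flagged)  # ['http://www.example.com/old-page', 'https://example.com/blog/post']
```

Every flagged URL is one that should either be updated to its preferred version or removed from the sitemap.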
Botify can help you audit your sitemap for errors to reduce your crawl waste.
Consider rendering, for example: if search engines have to execute your JavaScript before they can see your content, that rendering work eats into the time they have to spend on your site. In other words, switching to server-side rendering can free up search engine bots to spend more time on your important pages.
Applying these optimizations on a site with millions of pages can open up a wealth of opportunity — not only for your crawl budget, but your site’s traffic and revenue, too!
When you cut out the excess (a.k.a. crawl waste), you’re not only opening the door for search engines to find your most critical content. You’re also increasing the likelihood that more searchers will discover (and convert on!) that content.
Crawl budget isn’t just a technical thing. It’s a revenue thing. So bring the bots – and visitors – only to the good stuff!