A sitemap is a file that provides information about the URLs and other content a website contains (in other words, a “map” of the site). It typically includes the site’s most important or valuable URLs.
There are different formats a sitemap can use; XML and HTML are the most common types you’ll see in an SEO context. As a general rule of thumb, the XML sitemap is primarily intended to help search engine crawlers, while an HTML sitemap is more of a tool for website users to navigate and understand the structure of the website.
An XML sitemap uses the XML (Extensible Markup Language) format to describe a list of URLs that exist on a given website. The simplest form of an XML sitemap lists the page locations (i.e., the URLs on the website). There are other types of information that can be included for these URLs, such as when the page was last modified; or, for sites with multiple country and language versions, you can include the international variants of that page (these are known as “hreflang” annotations).
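In its simplest form, an XML sitemap is just a `<urlset>` containing one `<url>` entry per page, optionally with a `<lastmod>` date. A minimal sketch (the example.com URLs and dates are placeholders):

```xml
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <!-- One <url> entry per page you want crawlers to discover -->
  <url>
    <loc>https://www.example.com/</loc>
    <lastmod>2023-01-15</lastmod>
  </url>
  <url>
    <loc>https://www.example.com/products/widget</loc>
  </url>
</urlset>
```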
The XML sitemap provides a guide to search engine crawlers for easy discovery of the URLs you want crawled and indexed. When you submit an XML sitemap to Google via Google Search Console, you’re basically telling the crawler that these are the pages you consider to be valuable for searchers, all in one convenient location. This means the crawler doesn’t have to rely on a crawl of the entire site to find these URLs.
It’s important to note that, while the XML sitemap allows us to tell crawlers which pages we want them to index, there is no guarantee that they will index every page in a sitemap. The sitemap makes it easier for crawlers to find the pages, but search engines will still analyze each page and determine its relevance and quality.
The XML sitemap is your chance to tell the crawlers about the pages they should pay attention to. Make sure you include all your search-friendly and valuable pages.
There is a limit of 50,000 URLs (and 50 MB uncompressed) per sitemap file. If you have a large site and need to include more pages than the limit, you can use multiple sitemaps and a “sitemap index” file to help crawlers find all the sitemaps and see all of your URLs.
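A sitemap index looks much like a regular sitemap, but it lists sitemap files instead of pages. A minimal sketch (the file names are placeholders):

```xml
<?xml version="1.0" encoding="UTF-8"?>
<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <!-- Each <sitemap> entry points to one child sitemap file -->
  <sitemap>
    <loc>https://www.example.com/sitemap-products.xml</loc>
    <lastmod>2023-01-15</lastmod>
  </sitemap>
  <sitemap>
    <loc>https://www.example.com/sitemap-categories.xml</loc>
  </sitemap>
</sitemapindex>
```

You submit the index file itself to Google Search Console, and the crawler follows it to each child sitemap.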
Keeping your sitemaps up to date ensures that new pages get seen by crawlers and that old / outdated content isn’t being revisited unnecessarily. Ideally, this process should be automated – and it doesn’t have to be hard! For sites using a CMS like WordPress, there are plugins that you can use to generate your XML sitemaps and keep them updated.
Most websites place their XML sitemap in the root directory of the domain, e.g. www.domain.com/sitemap.xml. While this does make it findable, it’s important to also submit the sitemap via platforms like Google Search Console and Bing Webmaster Tools. Submitting it makes sure crawlers know where to find it, and allows you to see metrics like how many of your submitted URLs actually got indexed.
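You can also point crawlers to your sitemap from robots.txt with a `Sitemap:` directive, which works regardless of where the sitemap file lives (the URL below is a placeholder):

```
Sitemap: https://www.example.com/sitemap.xml
```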
Crawlers don’t like to find a bunch of broken or redirecting URLs in your sitemap. The sitemap should only include live URLs with a 200 status code. Otherwise, if a high percentage of the URLs in the sitemap are broken or redirected, crawlers may view the sitemap as poorly maintained and begin to disregard it as a reliable source of information about which pages are important.
Unless images are a major element of your business (for instance, a photography showcase website or stock image platform), you don’t need a separate “image” sitemap. Images that are simply a background element of your overall page content will be crawled when those pages are crawled, and they typically don’t offer much standalone value or even relevance.
For blogs and even some ecommerce sites, auto-generated tag / category pages can multiply quickly with low value, thin content. While including tag or category pages might make sense if those pages are good quality and you want them to rank for specific keywords, as a rule of thumb these types of pages don’t need to be included in your sitemap.
There are some more advanced ways to use XML sitemaps, particularly if you have a very large site, a particular use case (image galleries, news publisher), or a lot of international versions of your content (hreflang). These include:
If you have a large site (or even if you simply want more granularity in understanding how crawlers are treating different page types), you may segment your sitemap into different files for different types of page. For instance, you could put your category pages into one sitemap, and your product pages into another; or you could have a sitemap for your men’s clothes products and another for your women’s clothes products. You can do this manually or use a tool like Botify to generate these segmented sitemaps and analyze them.
If you have multiple country / language versions of your pages, you can implement hreflang tags (which tell crawlers which version is intended for which country or language use case) in your XML sitemap.
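Hreflang annotations in a sitemap use `<xhtml:link>` elements: each `<url>` entry lists every language/country variant of that page, including itself. A sketch for a page with English and French versions (the URLs are placeholders):

```xml
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
        xmlns:xhtml="http://www.w3.org/1999/xhtml">
  <url>
    <loc>https://www.example.com/en/page</loc>
    <xhtml:link rel="alternate" hreflang="en" href="https://www.example.com/en/page"/>
    <xhtml:link rel="alternate" hreflang="fr" href="https://www.example.com/fr/page"/>
  </url>
  <!-- The French version repeats the full set of alternates -->
  <url>
    <loc>https://www.example.com/fr/page</loc>
    <xhtml:link rel="alternate" hreflang="en" href="https://www.example.com/en/page"/>
    <xhtml:link rel="alternate" hreflang="fr" href="https://www.example.com/fr/page"/>
  </url>
</urlset>
```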
If your pages are frequently updated (say, with new product information), you may find dynamic sitemaps a more effective way to keep your sitemaps fresh and up to date.
Depending on the type of website you have, you may want to use some of the special types of XML sitemaps for specific content types, such as image, video, or news sitemaps.
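These specialized sitemaps attach extra elements to each URL entry via a dedicated namespace. As one illustration, an image sitemap entry might look like this (the URLs are placeholders):

```xml
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
        xmlns:image="http://www.google.com/schemas/sitemap-image/1.1">
  <url>
    <loc>https://www.example.com/gallery</loc>
    <!-- Each <image:image> element describes one image on the page -->
    <image:image>
      <image:loc>https://www.example.com/photos/sunset.jpg</image:loc>
    </image:image>
  </url>
</urlset>
```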
Botify makes it easy to generate XML Sitemaps, even with hreflang!
Be sure to check out our full writeup Introducing Botify XML Sitemap Generator: Create Your Sitemap With Botify In One Click to see how we simplify this process for up to 25 million URLs.