An XML sitemap is a structured file that lists the important URLs on your website so search engines can discover and crawl them efficiently. It is the equivalent of handing Google a table of contents for your site. Sitemaps speed up indexing for new pages, surface deep content that internal links miss, and let you signal which URLs are canonical, when they were updated, and how they relate to alternate language versions. Every site that wants reliable Google coverage should have one.
What an XML Sitemap Actually Is
An XML sitemap is a plain text file written in a standardized XML schema. It lives at a public URL such as https://example.com/sitemap.xml and contains a list of <url> entries, each with a <loc> tag pointing to a canonical URL. Optional tags describe when the page was last updated, how frequently it changes, and how important it is relative to other pages.
Google, Bing, and other search engines read the sitemap during their regular crawl cycles. New or recently changed URLs in the sitemap tell crawlers which pages to prioritize for discovery. A sitemap does not guarantee indexing, since Google still decides what is worth keeping, but it makes pages much easier to find, especially on large or deep sites.
The Basic Sitemap Structure
A minimal valid sitemap looks like this:
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>https://example.com/</loc>
    <lastmod>2026-05-15</lastmod>
  </url>
  <url>
    <loc>https://example.com/blog/</loc>
    <lastmod>2026-05-20</lastmod>
  </url>
</urlset>
The <loc> tag holds the URL. The <lastmod> tag tells Google when the content last changed. Older optional tags like <changefreq> and <priority> exist but Google has explicitly said it ignores them.
Why XML Sitemaps Matter for SEO
Search engines find pages two ways: by following links and by reading sitemaps. Internal links work well for pages that are one or two clicks from the homepage, but they fail for deep content, newly published posts, or pages that have no incoming links yet. Sitemaps fill that gap.
- Faster discovery of new pages: a blog post added to your sitemap can be crawled within hours instead of days.
- Better coverage on large sites: sites with thousands of URLs get more pages indexed when a sitemap acts as the spine.
- Cleaner crawl budget: Google focuses on the URLs you signal as important rather than wasting crawls on filter pages or parameter combinations.
- Multilingual signals: hreflang annotations in the sitemap tell Google which language version to show to which audience.
- Last-mod hints: a fresh <lastmod> value nudges Google to recrawl pages you have updated.
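The multilingual point can be sketched as follows. This is a minimal example, assuming a hypothetical site with English and German versions of one pricing page; the URLs are placeholders. Each <url> entry lists every language version, including itself, through xhtml:link elements.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
        xmlns:xhtml="http://www.w3.org/1999/xhtml">
  <url>
    <loc>https://example.com/en/pricing/</loc>
    <!-- Every alternate is listed, including the page itself -->
    <xhtml:link rel="alternate" hreflang="en" href="https://example.com/en/pricing/"/>
    <xhtml:link rel="alternate" hreflang="de" href="https://example.com/de/preise/"/>
  </url>
  <url>
    <loc>https://example.com/de/preise/</loc>
    <!-- The German entry repeats the same set so the annotations are reciprocal -->
    <xhtml:link rel="alternate" hreflang="en" href="https://example.com/en/pricing/"/>
    <xhtml:link rel="alternate" hreflang="de" href="https://example.com/de/preise/"/>
  </url>
</urlset>
```

The annotations need to be reciprocal: if the English entry points to the German page, the German entry should point back.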
Types of Sitemaps
The standard URL sitemap is the most common, but there are several specialized formats for different content types.
URL Sitemap
The default. Lists pages, posts, category pages, and any other indexable HTML URL on the site. Most sites need only this format.
Image Sitemap
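Lists the image URLs that appear on each page. Google has deprecated the older caption and title tags in its image sitemap extension, so the image location is the part that matters. Useful for image-heavy sites like photographers, retailers, or stock libraries. Pair this with strong image optimization to maximize visibility in Google Images.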
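A minimal sketch of an image sitemap entry, assuming Google's image extension namespace and a hypothetical gallery page with two images. The URLs are placeholders, and only the image location tag is shown.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
        xmlns:image="http://www.google.com/schemas/sitemap-image/1.1">
  <url>
    <!-- The page that hosts the images -->
    <loc>https://example.com/gallery/harbor/</loc>
    <image:image>
      <image:loc>https://example.com/images/harbor-sunrise.jpg</image:loc>
    </image:image>
    <image:image>
      <image:loc>https://example.com/images/harbor-dusk.jpg</image:loc>
    </image:image>
  </url>
</urlset>
```

Image entries can also be added to an existing URL sitemap rather than kept in a separate file.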
Video Sitemap
Includes thumbnail URLs, durations, descriptions, and player URLs for embedded video content. Sites with significant original video should publish one.
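A video entry might look like the sketch below. It assumes Google's video extension namespace and a hypothetical page with one embedded video; all URLs, the title, and the duration are placeholder values.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
        xmlns:video="http://www.google.com/schemas/sitemap-video/1.1">
  <url>
    <!-- The page where the video is embedded -->
    <loc>https://example.com/videos/setup-guide/</loc>
    <video:video>
      <video:thumbnail_loc>https://example.com/thumbs/setup-guide.jpg</video:thumbnail_loc>
      <video:title>Setup guide</video:title>
      <video:description>A walkthrough of the initial setup.</video:description>
      <video:player_loc>https://example.com/player?video=setup-guide</video:player_loc>
      <!-- Duration is given in seconds -->
      <video:duration>312</video:duration>
    </video:video>
  </url>
</urlset>
```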
News Sitemap
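Built for Google News publishers. Each entry carries the publication name and language, the publication date, and the article title. It is not strictly required for Google News eligibility, but Google recommends one so that new articles are discovered quickly.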
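As a rough illustration, a news sitemap entry could look like this. The sketch assumes Google's news extension namespace; the publication name, headline, and URL are invented placeholders.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
        xmlns:news="http://www.google.com/schemas/sitemap-news/0.9">
  <url>
    <loc>https://example.com/news/city-council-vote/</loc>
    <news:news>
      <news:publication>
        <news:name>Example Daily</news:name>
        <news:language>en</news:language>
      </news:publication>
      <!-- Publication date in W3C format; a full timestamp also works -->
      <news:publication_date>2026-05-20</news:publication_date>
      <news:title>City council votes on transit plan</news:title>
    </news:news>
  </url>
</urlset>
```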
Sitemap Index
A sitemap of sitemaps. When you exceed the 50,000 URL or 50 MB limit per file, you split into multiple sitemaps and reference them all from a sitemap index. Large sites with categorized content often use one sitemap per content type — posts, pages, products, categories — under a master index.
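A sitemap index for the per-content-type layout described above might look like the following sketch. The child file names are hypothetical; any names work as long as each <loc> points to a real sitemap file.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <!-- One child sitemap per content type -->
  <sitemap>
    <loc>https://example.com/sitemap-posts.xml</loc>
    <lastmod>2026-05-20</lastmod>
  </sitemap>
  <sitemap>
    <loc>https://example.com/sitemap-pages.xml</loc>
    <lastmod>2026-05-15</lastmod>
  </sitemap>
  <sitemap>
    <loc>https://example.com/sitemap-products.xml</loc>
    <lastmod>2026-05-18</lastmod>
  </sitemap>
</sitemapindex>
```

You then submit only the index URL, and crawlers follow it to the child files.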
How to Create an XML Sitemap
You rarely have to write XML by hand. Most platforms generate sitemaps automatically.
WordPress
WordPress 5.5 and later includes a built-in sitemap at /wp-sitemap.xml. For more control, install a plugin like Rank Math, Yoast SEO, or All in One SEO. These plugins let you exclude specific content types, control which post statuses appear, and add multilingual support.
Framer
Framer generates and serves sitemap.xml automatically. Every published page and CMS entry is included. The sitemap updates the moment you publish, so there is no manual step.
Webflow, Squarespace, Wix, Shopify
All four hosted platforms auto-generate sitemaps. Find them at /sitemap.xml or under the SEO settings panel. Each has a checkbox or toggle to exclude individual pages from the sitemap if you do not want them indexed.
Custom Sites (Next.js, React, Astro, SvelteKit)
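For custom-built sites on Next.js, React, or other frameworks, generate the sitemap at build time or serve it from a route. Next.js supports an app/sitemap.ts (or sitemap.js) file whose default export returns an array of URL entries. Astro has an official sitemap integration, and SvelteKit can serve one from a server route. For plain React or other setups without such a convention, have the build pipeline write the file to /public/sitemap.xml so it is served as a static asset.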
How to Submit Your Sitemap to Google Search Console
Once your sitemap is live at a public URL, you should submit it explicitly to Google.
- Open Google Search Console and select your property.
- Click Sitemaps in the left navigation.
- Enter your sitemap URL (typically sitemap.xml or sitemap_index.xml) and click Submit.
- Wait 24 to 72 hours for Google to process. The status column will show Success with a count of discovered URLs.
Also reference your sitemap from robots.txt with a line like Sitemap: https://example.com/sitemap.xml. Other search engines like Bing and Yandex use this signal to find your sitemap without you submitting it manually.
What to Include — and Exclude
Your sitemap should list every URL you want indexed and nothing else. Inclusion in the sitemap is a signal that the URL is important and canonical.
Include
- Canonical versions of every important page.
- Blog posts, product pages, category pages, landing pages.
- Pages with at least some unique, valuable content.
- The current version, not historical or paginated archive pages unless they are uniquely valuable.
Exclude
- Non-canonical URLs: duplicate versions whose canonical tag points to a different URL.
- Pages blocked by robots.txt: Google cannot crawl them anyway.
- Pages with noindex directives.
- Search results pages, faceted navigation, filter pages.
- Thin content, thank-you pages, account pages, internal tools.
- Redirected URLs: list only the destination.
Common XML Sitemap Mistakes
Sitemaps look simple but are easy to break. Most issues fall into the same handful of patterns.
Listing Non-Canonical URLs
If your sitemap lists https://example.com/page but the canonical tag on that page points to https://www.example.com/page, Google sees a conflicting signal. Make sure every sitemap URL matches its self-referencing canonical exactly.
Including Blocked or Noindex Pages
Pages blocked by robots.txt or marked noindex should not appear in the sitemap. Listing them sends Google contradictory signals, wastes crawl requests, and triggers warnings in Search Console.
Stale Last-Mod Values
Either keep <lastmod> values accurate or omit them. Google has said it relies on lastmod only when the values are consistently and verifiably accurate, so a site that sets every URL to today’s date will see the signal ignored. An accurate signal is helpful; a fake one is worthless.
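For reference, the sitemap protocol expects <lastmod> in W3C Datetime format, so both a plain date and a full timestamp with a timezone offset are acceptable. The sketch below shows both forms; the URLs and dates are placeholders.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>https://example.com/pricing/</loc>
    <!-- Date only: enough when the exact time does not matter -->
    <lastmod>2026-05-20</lastmod>
  </url>
  <url>
    <loc>https://example.com/blog/launch-notes/</loc>
    <!-- Full timestamp with a UTC offset -->
    <lastmod>2026-05-20T09:30:00+00:00</lastmod>
  </url>
</urlset>
```

Whichever form you use, the value should reflect the last meaningful content change, not the time the sitemap file was regenerated.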
Exceeding Size Limits
Each sitemap is limited to 50,000 URLs and 50 MB uncompressed. Sites that exceed these limits need a sitemap index splitting URLs into multiple files. Most plugins handle this automatically.
404s and Server Errors in the Sitemap
If sitemap URLs return 404 or 5xx codes, Google flags them. Run a periodic crawl of your sitemap to catch broken URLs.
Forgetting to Reference It from Robots.txt
Adding a Sitemap: line to robots.txt takes thirty seconds and helps every crawler find your sitemap without you submitting it to each one individually.
Sitemaps and Crawl Budget
For most sites, crawl budget is not a concern. Google can comfortably crawl sites with thousands of pages. But for very large sites — ecommerce stores with millions of SKUs, news sites with decades of archives — every URL Google wastes on a stale or low-value page is a URL it does not spend on something important.
A well-curated sitemap that lists only canonical, valuable URLs concentrates Google’s attention where it matters. Combine this with strong internal linking and your important pages get crawled often and indexed reliably.
Monitoring Your Sitemap in Search Console
Submit and forget is not the play. Check the Sitemaps report in Search Console monthly to catch issues early.
- Discovered URLs vs Indexed URLs: if Google discovers 1,000 URLs but indexes only 400, investigate the gap. Thin content, duplication, or canonical conflicts are usually the culprit.
- Errors: fix any flagged sitemap errors immediately. A sitemap Google cannot fetch or parse does nothing for discovery.
- Recently Discovered: confirm that new pages appear in the sitemap quickly. If they do not, your sitemap generator is broken.
FAQ
Do I really need an XML sitemap if my site has good internal linking?
Yes. Internal linking handles most discovery for small sites, but a sitemap accelerates indexing for new pages, provides redundancy when links fail, and gives you a measurable feed in Search Console. The marginal cost of having one is zero on every modern platform, and the upside is real, so there is no reason to skip it.
How often should I update my sitemap?
Sitemaps should regenerate whenever content changes. Most platforms — WordPress, Framer, Webflow, Shopify — do this automatically on publish. If you maintain a custom sitemap, rebuild it daily at minimum, or whenever you publish, whichever is more frequent.
What is the difference between sitemap.xml and an HTML sitemap?
An XML sitemap is a machine-readable file for search engines. An HTML sitemap is a human-readable page that links to important sections of the site. They serve different audiences and you can have both, though XML is the one that matters for SEO.
