fix xml sitemap errors
Moderate 21 min 2025-01-05

title:: How to Fix XML Sitemap Errors That Block Google From Finding Your Pages
description:: XML sitemap errors prevent Google from discovering your pages. Fix broken sitemaps, invalid URLs, and submission errors with this step-by-step guide.
focus_keyword:: fix XML sitemap errors
category:: technical
author:: Victor Valentine Romo
date:: 2026.03.20

How to Fix XML Sitemap Errors That Block Google From Finding Your Pages

Quick Summary

  • What this covers: diagnosing and fixing XML sitemap errors, from fetch failures and invalid XML to non-indexable URLs
  • Who it's for: site owners and SEO practitioners
  • Key takeaway: Check your sitemap's status in Google Search Console first, then apply the fix that matches the error it reports.

An XML sitemap is a direct communication channel to Google — a file that says "here are my pages, please crawl them." When your sitemap contains errors, that channel breaks. Google either ignores the sitemap entirely or wastes crawl budget on URLs that return errors, redirect loops, or duplicate content.

SEMrush site audit data shows 68% of websites have sitemap errors. Most are trivially fixable once you know what to look for.

Why Your Sitemap Matters More Than You Think

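Your sitemap isn't just a nice-to-have. For sites with more than a few hundred pages, it's one of the main ways Googlebot discovers new and updated content. Without a functioning sitemap, Google has to find your pages by following links alone, which is slower and can miss pages that few other pages link to.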

What Google Expects From Your Sitemap

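Google's sitemap documentation specifies clear requirements:

  • The file is well-formed, UTF-8 encoded XML
  • Each sitemap file holds at most 50,000 URLs and is no larger than 50MB uncompressed
  • Every URL is a fully qualified, absolute URL (protocol and domain included)
  • Special characters in URLs, such as &, are entity-escaped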

Violate any of these and Google may partially or completely ignore your sitemap.

Step 1: Check Your Current Sitemap Status (3 Minutes)

In Google Search Console

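Navigate to Indexing > Sitemaps. For each submitted sitemap, this page shows:

  • The sitemap URL and its type
  • When it was submitted and when Google last read it
  • Its status (for example "Success", "Has errors", or "Couldn't fetch")
  • The number of pages discovered through it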

If your sitemap shows "Has errors" or "Couldn't fetch", you have an active problem that needs immediate attention.

Direct URL Check

Load your sitemap in a browser: https://yoursite.com/sitemap.xml

You should see properly formatted XML. If you see a 404 page, a blank page, or HTML instead of XML, your sitemap is broken at the most basic level.
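The same check can be scripted. The sketch below is a minimal Python illustration, not part of any official tooling: it assumes you have already downloaded the response body as text, and the function name classify_sitemap_body is hypothetical. It reports whether the body parses as a sitemap, a sitemap index, or neither.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def classify_sitemap_body(body: str) -> str:
    """Return 'urlset', 'sitemapindex', or 'not-a-sitemap' for a response body."""
    try:
        root = ET.fromstring(body.encode("utf-8"))
    except ET.ParseError:
        # Blank responses and most HTML error pages fail to parse as XML
        return "not-a-sitemap"
    for kind in ("urlset", "sitemapindex"):
        if root.tag == f"{{{SITEMAP_NS}}}{kind}":
            return kind
    # Well-formed XML (or XHTML) whose root element is not a sitemap element
    return "not-a-sitemap"
```

A result of "not-a-sitemap" corresponds to the 404, blank page, or HTML cases described above.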

Step 2: Identify Specific Errors (5 Minutes)

The Most Common Sitemap Error: No Sitemap At All

Before diagnosing complex errors, verify you have a sitemap in the first place. Many sites — especially custom-built ones or sites launched without SEO configuration — simply don't have one. If yoursite.com/sitemap.xml returns a 404, your first step is generating one, not debugging one. Your CMS or SEO plugin (like Yoast SEO or Rank Math for WordPress) should handle this automatically once activated.

Error: Sitemap Could Not Be Read

Cause: The sitemap URL returns a 404, 500, or is blocked by robots.txt.

Fix:

  1. Verify the sitemap exists at the URL you submitted
  2. Check your robots.txt — if it contains Disallow: /sitemap.xml, remove that line
  3. If the sitemap is generated by a plugin or CMS, regenerate it
  4. For WordPress sites using Yoast SEO, go to SEO > General > Features and toggle the XML sitemaps off then on
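To check the robots.txt step without reading the file by eye, a rough Python sketch using the standard library's urllib.robotparser is shown here. It assumes the robots.txt content is already available as a string, and the helper name is_blocked is hypothetical. Python's parser uses simple prefix matching and does not reproduce every detail of Google's wildcard handling, so treat the result as a first pass.

```python
from urllib.robotparser import RobotFileParser


def is_blocked(robots_txt: str, url: str, user_agent: str = "Googlebot") -> bool:
    """Return True if robots_txt disallows user_agent from fetching url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return not parser.can_fetch(user_agent, url)


# Example: a rule that blocks the sitemap itself
robots = "User-agent: *\nDisallow: /sitemap.xml\n"
print(is_blocked(robots, "https://yoursite.com/sitemap.xml"))
```

If the function returns True for your sitemap URL, the Disallow rule is the line to remove.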

Error: URLs Not Accessible

Cause: Your sitemap includes URLs that return 404, 410, 301/302, or 5xx responses.

Fix:

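# Check all sitemap URLs at once (Linux/Mac)
# Uses grep -o plus sed rather than grep -P, which the stock macOS grep does not support
curl -s https://yoursite.com/sitemap.xml | grep -o '<loc>[^<]*' | sed 's/<loc>//' | while IFS= read -r url; do
  # Redirects are not followed, so 301/302 responses are reported as-is
  status=$(curl -s -o /dev/null -w "%{http_code}" "$url")
  if [ "$status" != "200" ]; then
    echo "$status $url"
  fi
done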

Remove or fix every URL that doesn't return 200. Your sitemap should only contain live, indexable pages.
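Where a shell loop is not available (on Windows, for instance), the same idea can be sketched in Python. This is an illustrative sketch under stated assumptions: extract_locs, non_200, and http_status are hypothetical names, and http_status sends HEAD requests, which a few servers answer differently from GET. Connection failures are not handled and will raise.

```python
import urllib.error
import urllib.request
import xml.etree.ElementTree as ET
from typing import Callable, Iterable, List, Tuple

NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def extract_locs(sitemap_xml: str) -> List[str]:
    """Return every <loc> value found in a sitemap document."""
    root = ET.fromstring(sitemap_xml.encode("utf-8"))
    return [el.text.strip() for el in root.findall(".//sm:loc", NS) if el.text]


class _NoRedirect(urllib.request.HTTPRedirectHandler):
    """Stop urllib from following redirects so 301/302 are reported as-is."""

    def redirect_request(self, req, fp, code, msg, headers, newurl):
        return None


def http_status(url: str, timeout: float = 10.0) -> int:
    """Return the HTTP status code for url without following redirects."""
    opener = urllib.request.build_opener(_NoRedirect)
    request = urllib.request.Request(url, method="HEAD")
    try:
        with opener.open(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        return err.code  # 3xx, 4xx and 5xx responses arrive here


def non_200(urls: Iterable[str], get_status: Callable[[str], int]) -> List[Tuple[int, str]]:
    """Return (status, url) pairs for every URL that does not answer 200."""
    problems = []
    for url in urls:
        status = get_status(url)
        if status != 200:
            problems.append((status, url))
    return problems
```

Calling non_200(extract_locs(xml_text), http_status) yields the list of URLs to remove or fix. Passing the status function in as an argument also lets you test the logic against a fixed lookup table before pointing it at a live site.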

Error: Invalid XML Syntax

Cause: Malformed XML due to unescaped characters, missing closing tags, or encoding issues.

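Common culprits:

  • Unescaped special characters in URLs: & must be written as &amp;, and <, >, ' and " need their entity equivalents
  • A missing or mismatched closing tag such as </url> or </urlset>
  • Whitespace or stray output (for example PHP warnings) printed before the <?xml declaration
  • A file saved in an encoding other than the UTF-8 it declares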

Fix: Run your sitemap through an XML validator like xmlvalidation.com to pinpoint the exact line and character causing the error.
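If you would rather validate locally, Python's standard XML parser reports the same line and column information. The sketch below is a minimal illustration: find_xml_error is a hypothetical helper name, and it checks well-formedness only, not conformance to the sitemap schema.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape


def find_xml_error(xml_text: str):
    """Return (line, column, message) for the first syntax error, or None if well-formed."""
    try:
        ET.fromstring(xml_text.encode("utf-8"))
    except ET.ParseError as err:
        line, column = err.position
        return line, column, str(err)
    return None


# escape() converts &, < and > to entities before a URL is written into <loc>
print(escape("https://yoursite.com/?a=1&b=2"))  # https://yoursite.com/?a=1&amp;b=2
```

An unescaped ampersand in a query string is the classic trigger: the parser stops on the line containing it, and escaping the URL before writing it into the sitemap resolves the error.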

Error: Sitemap Contains Blocked URLs

Cause: Your sitemap includes URLs that are disallowed in robots.txt or have noindex meta tags.

Fix:

  1. Cross-reference sitemap URLs against your robots.txt rules
  2. Crawl your sitemap URLs with Screaming Frog — it flags URLs with noindex directives
  3. Remove any URL from the sitemap that you don't want Google to index

Rule: If a page has a noindex tag, it should NOT be in your sitemap. These are contradictory signals — your sitemap says "index this" while the page says "don't index this." Google finds this confusing, and confused Google is bad for your site.
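For a quick check without a crawler, the noindex test can be sketched in Python. The assumptions are that you already have the page HTML and, optionally, the value of its X-Robots-Tag response header as strings. The names has_noindex and _RobotsMetaParser are hypothetical, and this is not how Screaming Frog implements its check.

```python
from html.parser import HTMLParser


class _RobotsMetaParser(HTMLParser):
    """Record whether a robots or googlebot meta tag carries noindex."""

    def __init__(self):
        super().__init__()
        self.noindex = False

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = {name: (value or "") for name, value in attrs}
        if attr.get("name", "").lower() in ("robots", "googlebot"):
            directives = [d.strip().lower() for d in attr.get("content", "").split(",")]
            # "none" is shorthand for "noindex, nofollow"
            if "noindex" in directives or "none" in directives:
                self.noindex = True


def has_noindex(html: str, x_robots_tag: str = "") -> bool:
    """Return True if the page opts out of indexing via meta tag or header."""
    parser = _RobotsMetaParser()
    parser.feed(html)
    header = [d.strip().lower() for d in x_robots_tag.split(",")]
    return parser.noindex or "noindex" in header or "none" in header
```

Any sitemap URL for which has_noindex returns True is sending the contradictory signal described above and should be dropped from the sitemap.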

Error: Non-Canonical URLs in Sitemap

Cause: Your sitemap includes URLs that aren't the canonical version (e.g., including both http and https versions, or parameter variations).

Fix: Every URL in your sitemap must match the canonical URL exactly. If a page's canonical tag points to https://yoursite.com/page, then https://yoursite.com/page is the only version that belongs in the sitemap.
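The comparison can be automated along these lines. This Python sketch assumes the page HTML has already been fetched, and canonical_mismatch is a hypothetical helper name. It does an exact string comparison, matching the "must match exactly" rule above, so a trailing-slash difference counts as a mismatch.

```python
from html.parser import HTMLParser


class _CanonicalParser(HTMLParser):
    """Capture the href of the first <link rel="canonical"> tag."""

    def __init__(self):
        super().__init__()
        self.canonical = None

    def handle_starttag(self, tag, attrs):
        attr = dict(attrs)
        rels = (attr.get("rel") or "").lower().split()
        if tag == "link" and "canonical" in rels and self.canonical is None:
            self.canonical = attr.get("href")


def canonical_mismatch(sitemap_url: str, html: str):
    """Return the page's canonical URL if it differs from sitemap_url, else None.

    Also returns None when the page declares no canonical tag at all.
    """
    parser = _CanonicalParser()
    parser.feed(html)
    if parser.canonical is None or parser.canonical == sitemap_url:
        return None
    return parser.canonical
```

When the function returns a URL, that URL is the one that belongs in the sitemap in place of the entry you checked.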

Step 3: Fix Common Sitemap Structure Problems (15 Minutes)

Problem: Sitemap Too Large

If your sitemap exceeds 50,000 URLs or 50MB uncompressed, split it into multiple sitemaps and create a sitemap index file:

<?xml version="1.0" encoding="UTF-8"?>
<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <sitemap>
    <loc>https://yoursite.com/sitemap-posts.xml</loc>
    <lastmod>2026-02-07</lastmod>
  </sitemap>
  <sitemap>
    <loc>https://yoursite.com/sitemap-pages.xml</loc>
    <lastmod>2026-02-07</lastmod>
  </sitemap>
  <sitemap>
    <loc>https://yoursite.com/sitemap-products.xml</loc>
    <lastmod>2026-02-07</lastmod>
  </sitemap>
</sitemapindex>

Submit the index file to Google Search Console instead of individual sitemaps.
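As a rough illustration of the split, the Python sketch below chunks a URL list at the 50,000 limit and builds the index file. The function names are hypothetical, and the sketch enforces only the URL count, not the 50MB size limit, which you would need to check separately on the written files.

```python
import xml.etree.ElementTree as ET

NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
MAX_URLS = 50_000  # per-file URL limit from the sitemap protocol


def chunk_urls(urls, size=MAX_URLS):
    """Split a list of URLs into consecutive chunks of at most size items."""
    return [urls[i:i + size] for i in range(0, len(urls), size)]


def build_sitemap_index(sitemap_urls, lastmod):
    """Return a sitemap index document (as bytes) listing each child sitemap."""
    ET.register_namespace("", NS)  # serialize with a default xmlns, no prefix
    root = ET.Element(f"{{{NS}}}sitemapindex")
    for url in sitemap_urls:
        entry = ET.SubElement(root, f"{{{NS}}}sitemap")
        ET.SubElement(entry, f"{{{NS}}}loc").text = url
        ET.SubElement(entry, f"{{{NS}}}lastmod").text = lastmod
    return ET.tostring(root, encoding="UTF-8", xml_declaration=True)
```

A site with 120,001 URLs, for example, produces three chunks (50,000, 50,000, and 20,001), each written to its own sitemap file and listed once in the index.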

Problem: Missing lastmod Dates

The <lastmod> tag tells Google when a page was last modified. Without it, Google has to rely on its own signals to decide when to re-crawl.

<url>
  <loc>https://yoursite.com/page</loc>
  <lastmod>2026-02-07</lastmod>
</url>

Critical: Only update <lastmod> when the page content actually changes. Gaming this date with fake updates erodes Google's trust in your sitemap signals.
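The sitemap protocol expects <lastmod> in W3C Datetime format: either a plain date, as in the example above, or a full timestamp with a timezone offset. A small Python sketch of the timestamp form follows. The function name format_lastmod is hypothetical, and treating naive datetimes as UTC is an assumption you should adjust to match how your database stores modification times.

```python
from datetime import datetime, timedelta, timezone


def format_lastmod(modified_at: datetime) -> str:
    """Format a modification time as a W3C Datetime string in UTC."""
    if modified_at.tzinfo is None:
        # Assumption: naive datetimes are already in UTC
        modified_at = modified_at.replace(tzinfo=timezone.utc)
    return modified_at.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%S+00:00")


# A page edited at 14:30 in a UTC+2 timezone
cest = timezone(timedelta(hours=2))
print(format_lastmod(datetime(2026, 2, 7, 14, 30, tzinfo=cest)))  # 2026-02-07T12:30:00+00:00
```

Feed this function the timestamp of the last real content change, not the time the sitemap was generated.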

Problem: Including Non-Indexable URLs

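Your sitemap should NOT contain:

  • URLs that redirect (301/302) or return 404, 410, or 5xx
  • Pages with a noindex tag
  • URLs blocked by robots.txt
  • Non-canonical duplicates, such as parameter variations or http versions of https pages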

Strip all of these. A lean sitemap with only valuable, indexable pages outperforms a bloated one.

Problem: Stale URLs After Redesign

After a site redesign or URL restructure, your sitemap may still contain old URLs that now redirect or 404. The sitemap generator (your CMS or plugin) typically pulls from the current database, but manually added URLs, cached sitemaps, or plugin-generated sitemaps from the old structure may persist.

Diagnosis: Crawl every URL in your sitemap with Screaming Frog in List Mode. Filter for non-200 responses.

Fix:

  1. Regenerate the sitemap from scratch (disable and re-enable in your SEO plugin)
  2. Clear any server-level or CDN caching of the old sitemap
  3. Verify the regenerated sitemap only contains live, current URLs
  4. Resubmit in Google Search Console

Problem: Wrong Protocol or Domain

Every URL in your sitemap must use your site's canonical protocol and domain format. If your site resolves at https://yoursite.com (no www), every sitemap URL must match. Mixing http://www.yoursite.com with https://yoursite.com creates duplicate signals that confuse Google.
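A check for this is short enough to sketch in Python. It assumes you pass in the list of sitemap URLs and the one origin your site canonically resolves at. The function name off_origin_urls is hypothetical.

```python
from urllib.parse import urlsplit


def off_origin_urls(urls, canonical_origin):
    """Return the URLs whose scheme or host differs from canonical_origin."""
    expected = urlsplit(canonical_origin)
    mismatched = []
    for url in urls:
        parts = urlsplit(url)
        # Hostnames are case-insensitive, so compare them lowercased
        if (parts.scheme, parts.netloc.lower()) != (expected.scheme, expected.netloc.lower()):
            mismatched.append(url)
    return mismatched


urls = ["https://yoursite.com/a", "http://www.yoursite.com/b"]
print(off_origin_urls(urls, "https://yoursite.com"))  # ['http://www.yoursite.com/b']
```

Every URL the function returns should be rewritten to the canonical protocol and domain before the sitemap is resubmitted.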

Step 4: WordPress-Specific Sitemap Fixes (10 Minutes)

Yoast SEO Sitemap Issues

Yoast SEO generates sitemaps automatically at /sitemap_index.xml. Common issues:

Rank Math Sitemap Issues

Rank Math sitemaps live at /sitemap_index.xml. Configure included post types under Rank Math > Sitemap Settings. The same rules apply — exclude any post type or taxonomy that produces non-indexable pages.

Plugin Conflicts

Multiple SEO plugins generating competing sitemaps is a common WordPress problem. If you have both Yoast and Rank Math installed (don't do this), or a standalone sitemap plugin alongside an SEO suite, you'll have duplicate sitemaps confusing Google Search Console.

Rule: One sitemap generator per site. Deactivate all others.

Step 5: Submit and Validate (5 Minutes)

After fixing all errors:

  1. Open Google Search Console > Indexing > Sitemaps
  2. If your sitemap URL changed, remove the old submission and add the new one
  3. If the URL is the same, click into the submitted sitemap and look for the "Resubmit" option
  4. Wait for Google to reprocess — this typically happens within 24-48 hours

Robots.txt Reference

Add your sitemap URL to your robots.txt file:

Sitemap: https://yoursite.com/sitemap.xml

This ensures every crawler (not just Google) can discover your sitemap. For robots.txt best practices, see fixing robots.txt mistakes.
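To confirm the directive is picked up, Python's urllib.robotparser exposes the declared sitemaps through site_maps() (available from Python 3.8). The sketch below assumes the robots.txt content is already in a string, and declared_sitemaps is a hypothetical wrapper name.

```python
from urllib.robotparser import RobotFileParser


def declared_sitemaps(robots_txt: str):
    """Return the list of Sitemap: URLs declared in robots_txt (empty if none)."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.site_maps() or []  # site_maps() returns None when nothing is declared


robots = "User-agent: *\nDisallow:\n\nSitemap: https://yoursite.com/sitemap.xml\n"
print(declared_sitemaps(robots))  # ['https://yoursite.com/sitemap.xml']
```

An empty list means crawlers that rely on robots.txt have no pointer to your sitemap.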

Step 6: Monitor Ongoing Health

Weekly Check

Glance at the Sitemaps report in Google Search Console. The "Last read" date should be recent (within the past week for active sites). If Google hasn't read your sitemap in weeks, something is blocking access.

After Content Changes

Whenever you publish, delete, or redirect a significant number of pages, verify your sitemap reflects the changes. Most CMS plugins update automatically, but verify.

After Site Changes

Plugin updates, server migrations, and CMS upgrades can break sitemap generation. After any infrastructure change, load your sitemap in a browser and verify it renders valid XML with correct URLs.

Advanced: Sitemap Strategies for Large Sites

Dynamic Sitemap Generation

For sites with thousands of pages, manually maintaining a sitemap is impractical. Dynamic sitemaps auto-generate from your CMS database, ensuring every new page is included immediately and deleted pages are removed automatically.

WordPress: Yoast SEO, Rank Math, and All in One SEO all generate dynamic sitemaps that update whenever content changes. No manual intervention required.

Custom sites: Use a server-side script that queries your database for published pages and outputs valid XML. Schedule regeneration via cron job to run hourly or after content changes.

Node.js example using sitemap library:

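const { SitemapStream } = require('sitemap');
const { createWriteStream } = require('fs');

// Placeholder data: replace with the published pages queried from your database
const pages = [
  { slug: '/', updatedAt: '2026-02-07', priority: 1.0 },
  { slug: '/page', updatedAt: '2026-02-07', priority: 0.5 },
];

const sitemap = new SitemapStream({ hostname: 'https://yoursite.com' });
const writeStream = createWriteStream('./public/sitemap.xml');
sitemap.pipe(writeStream);

// Write one entry per page; each slug is resolved against hostname
pages.forEach(page => {
  sitemap.write({ url: page.slug, lastmod: page.updatedAt, priority: page.priority });
});

sitemap.end();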

Priority and Changefreq Tags

The <priority> and <changefreq> tags in sitemaps are advisory. Google has stated it largely ignores these values — it determines crawl priority through its own signals. However, accurate <changefreq> values don't hurt and may be respected by other search engines like Bing and Yandex.

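If you include them:

  • <priority> takes a value from 0.0 to 1.0 (the default is 0.5) and is relative to other pages on your own site
  • <changefreq> accepts always, hourly, daily, weekly, monthly, yearly, or never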

Image and Video Sitemaps

For content-heavy sites where image and video SEO drives significant traffic:

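Image sitemap extension (declare xmlns:image="http://www.google.com/schemas/sitemap-image/1.1" on the <urlset> element). Google no longer uses <image:caption>, <image:title>, <image:geo_location>, or <image:license>, so <image:loc> is the only tag needed:

<url>
  <loc>https://yoursite.com/page</loc>
  <image:image>
    <image:loc>https://yoursite.com/images/photo.webp</image:loc>
  </image:image>
</url>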

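Video sitemap extension (declare xmlns:video="http://www.google.com/schemas/sitemap-video/1.1" on the <urlset> element). Google also requires either <video:content_loc> or <video:player_loc>; the video file URL below is a placeholder:

<url>
  <loc>https://yoursite.com/page</loc>
  <video:video>
    <video:thumbnail_loc>https://yoursite.com/thumbs/video.jpg</video:thumbnail_loc>
    <video:title>Video Title</video:title>
    <video:description>Video description here</video:description>
    <video:content_loc>https://yoursite.com/videos/video.mp4</video:content_loc>
  </video:video>
</url>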

These extensions help Google discover media content that might not be found through standard HTML crawling alone.

Sitemap for Multiple Languages (Hreflang in Sitemaps)

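If your site serves content in multiple languages, you can include hreflang annotations directly in your sitemap. Declare xmlns:xhtml="http://www.w3.org/1999/xhtml" on the <urlset> element, and give every language version its own <url> entry that lists all alternates, including itself. The entry for the English page looks like this: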

<url>
  <loc>https://yoursite.com/page</loc>
  <xhtml:link rel="alternate" hreflang="en" href="https://yoursite.com/page"/>
  <xhtml:link rel="alternate" hreflang="es" href="https://yoursite.com/es/page"/>
  <xhtml:link rel="alternate" hreflang="fr" href="https://yoursite.com/fr/page"/>
</url>

This is often more reliable than implementing hreflang in HTML <head> tags because sitemaps are processed centrally rather than requiring Google to crawl every language variation individually.
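Because every language version needs its own entry carrying the full set of alternates, hand-writing these blocks gets error-prone quickly. The Python sketch below generates them from a single mapping. The function name build_hreflang_sitemap and the input shape (language code to URL) are assumptions for illustration, and it handles one page group per call.

```python
import xml.etree.ElementTree as ET

SM = "http://www.sitemaps.org/schemas/sitemap/0.9"
XHTML = "http://www.w3.org/1999/xhtml"


def build_hreflang_sitemap(alternates: dict) -> bytes:
    """Build a sitemap with one <url> per language version of a single page.

    alternates maps an hreflang code to the URL of that language version.
    Each <url> entry lists every alternate, including itself.
    """
    ET.register_namespace("", SM)
    ET.register_namespace("xhtml", XHTML)
    root = ET.Element(f"{{{SM}}}urlset")
    for own_url in alternates.values():
        url_el = ET.SubElement(root, f"{{{SM}}}url")
        ET.SubElement(url_el, f"{{{SM}}}loc").text = own_url
        for lang, href in alternates.items():
            ET.SubElement(
                url_el,
                f"{{{XHTML}}}link",
                {"rel": "alternate", "hreflang": lang, "href": href},
            )
    return ET.tostring(root, encoding="UTF-8", xml_declaration=True)
```

Called with the three URLs from the example above, it emits three <url> entries, each containing the same three xhtml:link elements.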

Sitemap Error Quick Reference

Error | Diagnosis Tool | Fix | Time
Can't fetch | Browser + robots.txt | Remove blocks, regenerate sitemap | 2 min
Invalid XML | XML validator | Fix encoding, escape special characters | 5 min
Non-200 URLs | curl batch check or Screaming Frog | Remove broken URLs | 10 min
Noindex URLs included | Screaming Frog crawl | Remove noindexed URLs from sitemap | 5 min
Non-canonical URLs | Manual comparison | Replace with canonical versions | 10 min
Too large | URL count check | Split into sitemap index | 15 min
Missing lastmod | View source | Add accurate modification dates | 10 min

FAQ

How often should I update my XML sitemap?

Your sitemap should update automatically whenever content is published, updated, or removed. Most CMS plugins handle this. If you're managing sitemaps manually, update after every batch of content changes — at minimum weekly for active sites.

Can I have multiple sitemaps?

Yes. Use a sitemap index file that references individual sitemaps split by content type (posts, pages, products). A sitemap index can reference up to 50,000 sitemaps of up to 50,000 URLs each, and Search Console accepts up to 500 sitemap index files per site. Even a single index covers 2.5 billion URLs, more than enough for any site.

Does submitting a sitemap guarantee indexing?

No. A sitemap is a suggestion, not a directive. Google may crawl the URLs, but it decides independently whether to index each page based on quality, relevance, and canonical signals. Pages with thin content, noindex tags, or duplicate issues won't be indexed regardless of sitemap inclusion.

Should I include images in my sitemap?

Google still supports <image:image> and <image:loc> in sitemaps, although in 2022 it dropped the optional caption, title, geo_location, and license tags. If your images are critical to your SEO strategy (product photos, infographics), including them adds a discovery signal. For most sites, standard image crawling through HTML is sufficient.

What's the difference between submitting a sitemap and the sitemap in robots.txt?

Submitting through Google Search Console only notifies Google. Adding the sitemap URL to robots.txt notifies every crawler that reads your robots.txt (Bing, Yandex, etc.). Do both for maximum coverage.

Sitemap Debugging Techniques

When Google Reads Your Sitemap But Doesn't Index Pages

If GSC shows your sitemap was successfully read but pages remain unindexed, the sitemap itself isn't the problem — Google is choosing not to index those pages for quality reasons. See why Google won't index your pages for the complete troubleshooting guide.

When Google Can't Read Your Sitemap At All

If the sitemap status shows "Couldn't fetch" repeatedly:

  1. Test the URL directly: Paste your sitemap URL in a browser. Does it load?
  2. Check server response: curl -sI https://yoursite.com/sitemap.xml — does it return 200?
  3. Check file permissions: Your web server needs read access to the sitemap file
  4. Check for .htaccess conflicts: Some security rules block access to XML files
  5. Check CDN caching: If your CDN caches a broken version, purge the cache and test again

Sitemap Index vs. Individual Sitemaps

Submit your sitemap index file to GSC, not individual sub-sitemaps. The index file acts as a master reference, and Google automatically discovers and processes all referenced sitemaps. If you submit individual sitemaps, you'll need to resubmit each time you add a new one.

Verify your sitemap index references all sub-sitemaps:

curl -s https://yoursite.com/sitemap_index.xml | grep "<loc>"

Each <loc> entry should point to a valid, accessible sub-sitemap. If any return 404 or errors, remove them from the index.

Your Direct Line to Google

Your XML sitemap is the single most direct way to tell Google which pages matter on your site. A clean, accurate sitemap accelerates discovery, prioritizes crawling, and ensures your new content reaches the index as fast as possible.

Fix the errors. Strip the non-indexable URLs. Submit the clean version. Then move on to the rest of your technical SEO cleanup.


When This Fix Isn't Your Priority

Skip this for now if:


Frequently Asked Questions

How long does this fix take to implement?

Most fixes in this article can be implemented in under an hour. Some require a staging environment for testing before deploying to production. The article flags which changes are safe to deploy immediately versus which need QA review first.

Will this fix work on WordPress, Shopify, and custom sites?

The underlying SEO principles are platform-agnostic. Implementation details differ — WordPress uses plugins and theme files, Shopify uses Liquid templates, custom sites use direct code changes. The article focuses on the what and why; platform-specific how-to links are provided where available.

How do I verify the fix actually worked?

Each fix includes a verification step. For most technical SEO changes: check the Page indexing report (formerly Coverage) in Google Search Console 48-72 hours after deployment, validate with a live URL inspection, and monitor the affected pages in your crawl tool. Ranking impact typically surfaces within 1-4 weeks depending on crawl frequency.


Built by Victor Romo (@b2bvic) — I build AI memory systems for businesses.
