title:: How to Fix XML Sitemap Errors That Block Google From Finding Your Pages
description:: XML sitemap errors prevent Google from discovering your pages. Fix broken sitemaps, invalid URLs, and submission errors with this step-by-step guide.
focus_keyword:: fix XML sitemap errors
category:: technical
author:: Victor Valentine Romo
date:: 2026.03.20
How to Fix XML Sitemap Errors That Block Google From Finding Your Pages
Quick Summary
- What this covers: diagnosing and fixing XML sitemap errors
- Who it's for: site owners and SEO practitioners
- Key takeaway: Read the first section for the core framework, then use the specific tactics that match your situation.
An XML sitemap is a direct communication channel to Google — a file that says "here are my pages, please crawl them." When your sitemap contains errors, that channel breaks. Google either ignores the sitemap entirely or wastes crawl budget on URLs that return errors, redirect loops, or duplicate content.
SEMrush site audit data shows 68% of websites have sitemap errors. Most are trivially fixable once you know what to look for.
Why Your Sitemap Matters More Than You Think
Your sitemap isn't just a nice-to-have. For sites with more than a few hundred pages, it's one of the main ways Googlebot discovers new and updated content, alongside following links. Without a functioning sitemap:
- New pages may take weeks to be discovered through link crawling alone
- Updated content won't be re-crawled promptly because Google doesn't know it changed
- Deep pages (more than 3-4 clicks from the homepage) may never be found
- Google has no signal for which pages you consider most important
What Google Expects From Your Sitemap
Google's sitemap documentation specifies clear requirements:
- Must be valid XML (proper encoding, no syntax errors)
- Maximum 50,000 URLs per sitemap file
- Maximum 50MB uncompressed file size
- All URLs must return 200 status codes
- All URLs should be the canonical version
- Must be accessible to crawlers (not blocked by robots.txt)
- Should be referenced in robots.txt or submitted via Google Search Console
Violate any of these and Google may partially or completely ignore your sitemap.
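The count and size limits are easy to check before you upload anything. A minimal sketch, assuming the sitemap XML is already loaded as a string; checkSitemapLimits is a hypothetical helper name, and counting <loc> tags with a regex is a shortcut rather than a full XML parse:

```javascript
// Limits taken from the requirements list above
const MAX_URLS = 50000;
const MAX_BYTES = 50 * 1024 * 1024; // 50MB, uncompressed

// Hypothetical helper: returns a list of limit violations (empty if none)
function checkSitemapLimits(xml) {
  const problems = [];
  // Count <loc> elements as a proxy for the number of URLs
  const urlCount = (xml.match(/<loc>/g) || []).length;
  // Measure the UTF-8 byte size, not the character count
  const byteSize = Buffer.byteLength(xml, 'utf8');
  if (urlCount > MAX_URLS) problems.push(`too many URLs: ${urlCount}`);
  if (byteSize > MAX_BYTES) problems.push(`file too large: ${byteSize} bytes`);
  return problems;
}
```

An empty array means the file is within both limits; anything else tells you which limit to fix first.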
Step 1: Check Your Current Sitemap Status (3 Minutes)
In Google Search Console
Navigate to Indexing > Sitemaps. This page shows:
- Every sitemap you've submitted
- The submission date and last read date
- Number of discovered URLs
- Processing status (Success, Has errors, Couldn't fetch)
If your sitemap shows "Has errors" or "Couldn't fetch", you have an active problem that needs immediate attention.
Direct URL Check
Load your sitemap in a browser: https://yoursite.com/sitemap.xml
You should see properly formatted XML. If you see a 404 page, a blank page, or HTML instead of XML, your sitemap is broken at the most basic level.
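The same check can be scripted. The sketch below only classifies a response you have already fetched; classifySitemapResponse is a hypothetical helper, and looking at the first 1,000 characters is an assumption that covers typical XML declarations and stylesheet lines:

```javascript
// Hypothetical helper: classify what a sitemap URL returned.
// status: HTTP status code, contentType: Content-Type header, body: response text
function classifySitemapResponse(status, contentType, body) {
  if (status !== 200) return `broken: HTTP ${status}`;
  const start = body.trimStart().slice(0, 1000).toLowerCase();
  if (start === '') return 'broken: empty response';
  if (start.startsWith('<!doctype html') || start.startsWith('<html')) {
    return 'broken: HTML instead of XML';
  }
  // A valid sitemap has <urlset> or <sitemapindex> as its root element
  if (start.includes('<urlset') || start.includes('<sitemapindex')) {
    return 'ok';
  }
  return `unclear: unexpected content (${contentType || 'no content type'})`;
}
```

On Node 18 or later, the inputs could come from `const res = await fetch(url)` followed by `res.status`, `res.headers.get('content-type')`, and `await res.text()`.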
Step 2: Identify Specific Errors (5 Minutes)
The Most Common Sitemap Error: No Sitemap At All
Before diagnosing complex errors, verify you have a sitemap in the first place. Many sites — especially custom-built ones or sites launched without SEO configuration — simply don't have one. If yoursite.com/sitemap.xml returns a 404, your first step is generating one, not debugging one. Your CMS or SEO plugin (like Yoast SEO or Rank Math for WordPress) should handle this automatically once activated.
Error: Sitemap Could Not Be Read
Cause: The sitemap URL returns a 404, 500, or is blocked by robots.txt.
Fix:
- Verify the sitemap exists at the URL you submitted
- Check your robots.txt: if it contains Disallow: /sitemap.xml, remove that line
- If the sitemap is generated by a plugin or CMS, regenerate it
- For WordPress sites using Yoast SEO, go to SEO > General > Features and toggle the XML sitemaps off then on
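To confirm whether robots.txt is the culprit, you can test the sitemap path against its rules. This is a deliberately simplified sketch: isBlockedByRobots is a hypothetical helper that only reads "User-agent: *" groups and plain prefix rules, so it ignores wildcards and Allow overrides that real crawlers honor:

```javascript
// Hypothetical helper: does robots.txt block the given path for all crawlers?
// Simplification: only "User-agent: *" groups and prefix-style Disallow rules.
function isBlockedByRobots(robotsTxt, path) {
  let inStarGroup = false;
  let lastWasAgent = false;
  for (const rawLine of robotsTxt.split(/\r?\n/)) {
    const line = rawLine.split('#')[0].trim(); // drop comments
    const sep = line.indexOf(':');
    if (sep === -1) continue;
    const field = line.slice(0, sep).trim().toLowerCase();
    const value = line.slice(sep + 1).trim();
    if (field === 'user-agent') {
      // Consecutive User-agent lines belong to the same group
      inStarGroup = lastWasAgent ? inStarGroup || value === '*' : value === '*';
      lastWasAgent = true;
    } else {
      lastWasAgent = false;
      // An empty Disallow value means "allow everything"
      if (field === 'disallow' && inStarGroup && value !== '' && path.startsWith(value)) {
        return true;
      }
    }
  }
  return false;
}
```

Calling isBlockedByRobots(robotsBody, '/sitemap.xml') with the text of your robots.txt gives a quick yes or no before you dig into server logs.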
Error: URLs Not Accessible
Cause: Your sitemap includes URLs that return 404, 410, 301/302, or 5xx responses.
Fix:
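# Check all sitemap URLs at once (Linux/Mac)
# grep -o plus sed is used instead of grep -P, which macOS grep does not support
curl -s https://yoursite.com/sitemap.xml | grep -o '<loc>[^<]*' | sed -e 's/<loc>//' -e 's/&amp;/\&/g' | while read -r url; do
  # Fetch each URL and keep only its HTTP status code
  status=$(curl -s -o /dev/null -w "%{http_code}" "$url")
  # Print anything that is not a plain 200 (redirects, 404s, 5xx)
  if [ "$status" != "200" ]; then
    echo "$status $url"
  fi
done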
Remove or fix every URL that doesn't return 200. Your sitemap should only contain live, indexable pages.
Error: Invalid XML Syntax
Cause: Malformed XML due to unescaped characters, missing closing tags, or encoding issues.
Common culprits:
- Ampersands (&) must be written as &amp; in XML
- URLs with query parameters: ?id=1&sort=asc must become ?id=1&amp;sort=asc
- Non-UTF-8 characters in URLs
- Missing XML declaration or namespace
Fix: Run your sitemap through an XML validator like xmlvalidation.com to pinpoint the exact line and character causing the error.
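If you generate the sitemap yourself, escaping at write time prevents most of these errors. A minimal sketch, assuming the input is a raw, unescaped URL; escapeXml and toLocValue are hypothetical helper names:

```javascript
// Escape the five characters XML reserves so a value is safe inside <loc>
function escapeXml(value) {
  return value
    .replace(/&/g, '&amp;') // must run first so later entities aren't double-escaped
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;')
    .replace(/'/g, '&apos;');
}

// Normalize a raw URL, then escape it for XML
function toLocValue(rawUrl) {
  // new URL() percent-encodes non-ASCII path characters as UTF-8
  return escapeXml(new URL(rawUrl).href);
}
```

Running already-escaped URLs through escapeXml would double-escape them, so apply it exactly once, at the point where the XML is written.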
Error: Sitemap Contains Blocked URLs
Cause: Your sitemap includes URLs that are disallowed in robots.txt or have noindex meta tags.
Fix:
- Cross-reference sitemap URLs against your robots.txt rules
- Crawl your sitemap URLs with Screaming Frog — it flags URLs with noindex directives
- Remove any URL from the sitemap that you don't want Google to index
Rule: If a page has a noindex tag, it should NOT be in your sitemap. The two signals contradict each other: the sitemap says "index this" while the page says "don't index this." Google honors the noindex, so the entry only wastes crawl requests and clutters your Search Console reports.
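A noindex directive can arrive in the HTML or in a response header, so a check needs to look at both. The sketch below is a rough approximation: hasNoindex is a hypothetical helper, and the regex assumes the name attribute comes before content inside the meta tag, which is common but not guaranteed:

```javascript
// Hypothetical helper: detect noindex in the X-Robots-Tag header value
// or in a <meta name="robots"> / <meta name="googlebot"> tag.
function hasNoindex(html, xRobotsTag = '') {
  if (/\bnoindex\b/i.test(xRobotsTag)) return true;
  const metaPattern =
    /<meta[^>]+name=["'](?:robots|googlebot)["'][^>]*content=["']([^"']*)["']/gi;
  for (const match of html.matchAll(metaPattern)) {
    // match[1] is the content attribute, e.g. "noindex, follow"
    if (/\bnoindex\b/i.test(match[1])) return true;
  }
  return false;
}
```

Any sitemap URL for which this returns true is a candidate for removal from the sitemap.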
Error: Non-Canonical URLs in Sitemap
Cause: Your sitemap includes URLs that aren't the canonical version (e.g., including both http and https versions, or parameter variations).
Fix: Every URL in your sitemap must match the canonical URL exactly. If a page's canonical tag points to https://yoursite.com/page, then https://yoursite.com/page is the only version that belongs in the sitemap.
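The exact-match rule can be checked per page. A minimal sketch, assuming you already have the page HTML as a string; both helper names are hypothetical, and the regex assumes rel appears before href in the link tag:

```javascript
// Hypothetical helper: pull the canonical URL out of a page's HTML
function extractCanonical(html) {
  const match = html.match(
    /<link[^>]+rel=["']canonical["'][^>]*href=["']([^"']+)["']/i
  );
  return match ? match[1] : null;
}

// A sitemap URL passes only if it matches the canonical exactly
function isCanonicalMatch(sitemapUrl, html) {
  const canonical = extractCanonical(html);
  // Assumption: a page without a canonical tag is treated as self-canonical
  return canonical === null || canonical === sitemapUrl;
}
```

String equality is intentionally strict here: a trailing slash, a different protocol, or an extra query parameter all count as mismatches.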
Step 3: Fix Common Sitemap Structure Problems (15 Minutes)
Problem: Sitemap Too Large
If your sitemap exceeds 50,000 URLs or 50MB, split it into multiple sitemaps and create a sitemap index file:
<?xml version="1.0" encoding="UTF-8"?>
<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <sitemap>
    <loc>https://yoursite.com/sitemap-posts.xml</loc>
    <lastmod>2026-02-07</lastmod>
  </sitemap>
  <sitemap>
    <loc>https://yoursite.com/sitemap-pages.xml</loc>
    <lastmod>2026-02-07</lastmod>
  </sitemap>
  <sitemap>
    <loc>https://yoursite.com/sitemap-products.xml</loc>
    <lastmod>2026-02-07</lastmod>
  </sitemap>
</sitemapindex>
Submit the index file to Google Search Console instead of individual sitemaps.
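When URLs come from a database rather than neat content types, splitting by count works just as well. A minimal sketch; the sitemap-N.xml file naming and both function names are assumptions for illustration:

```javascript
const URLS_PER_FILE = 50000;

// Split a flat list of URLs into chunks that respect the 50,000-URL limit
function chunkUrls(urls, size = URLS_PER_FILE) {
  const chunks = [];
  for (let i = 0; i < urls.length; i += size) {
    chunks.push(urls.slice(i, i + size));
  }
  return chunks;
}

// Build the sitemap index XML that points at each numbered child sitemap
function buildSitemapIndex(baseUrl, chunkCount, lastmod) {
  const entries = [];
  for (let n = 1; n <= chunkCount; n++) {
    entries.push(
      `  <sitemap>\n    <loc>${baseUrl}/sitemap-${n}.xml</loc>\n    <lastmod>${lastmod}</lastmod>\n  </sitemap>`
    );
  }
  return [
    '<?xml version="1.0" encoding="UTF-8"?>',
    '<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">',
    ...entries,
    '</sitemapindex>',
  ].join('\n');
}
```

For 120,000 URLs this produces three child sitemaps (50,000, 50,000, and 20,000 URLs) and an index with three entries.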
Problem: Missing lastmod Dates
The <lastmod> tag tells Google when a page was last modified. Without it, Google loses one of its clearer hints about when a page is worth re-crawling.
<url>
  <loc>https://yoursite.com/page</loc>
  <lastmod>2026-02-07</lastmod>
</url>
Critical: Only update <lastmod> when the page content actually changes. Gaming this date with fake updates erodes Google's trust in your sitemap signals.
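One way to keep <lastmod> honest is to tie it to a fingerprint of the content instead of the build time. The sketch below is one possible approach, not a standard: nextLastmod is a hypothetical helper, and the stored { hash, lastmod } record is assumed to be saved by your build between runs:

```javascript
const { createHash } = require('crypto');

// Fingerprint the content that matters to readers (body text, not timestamps)
function contentHash(content) {
  return createHash('sha256').update(content, 'utf8').digest('hex');
}

// Hypothetical helper: keep the stored lastmod unless the content hash changed.
// `stored` is the record from the previous build: { hash, lastmod } or null
function nextLastmod(stored, content, today) {
  const hash = contentHash(content);
  if (stored && stored.hash === hash) {
    return { hash, lastmod: stored.lastmod }; // unchanged: keep the old date
  }
  return { hash, lastmod: today }; // new or changed: use today's date
}
```

Hash only the parts of the page a reader would notice changing; including a rendered timestamp or a rotating widget would bump the date on every build.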
Problem: Including Non-Indexable URLs
Your sitemap should NOT contain:
- Paginated pages (/blog/page/2/, /blog/page/3/)
- Search result pages (/search?q=term)
- Filter/sort URLs (/category?sort=price)
- Login or admin pages
- Thank-you or confirmation pages
- Pages returning anything other than 200
Strip all of these. A lean sitemap with only valuable, indexable pages outperforms a bloated one.
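If you build the sitemap from a URL list, the exclusions can be applied as a filter. The patterns below are illustrative assumptions based on the list above, not a complete rule set, and treating every query string as non-indexable will be too strict for sites whose real pages use parameters:

```javascript
// Illustrative exclusion rules; adjust the paths to your own URL scheme
const EXCLUDE_PATTERNS = [
  /\/page\/\d+\/?$/,                   // paginated archives: /blog/page/2/
  /^\/search(\/|$)/,                   // internal search results
  /^\/(login|admin|wp-admin)(\/|$)/,   // login and admin areas
  /^\/(thank-you|confirmation)(\/|$)/, // post-conversion pages (assumed paths)
];

// Hypothetical helper: true if the URL belongs in the sitemap
function isSitemapWorthy(rawUrl) {
  const url = new URL(rawUrl);
  // Assumption: any query string marks a search, filter, or sort variation
  if (url.search !== '') return false;
  return !EXCLUDE_PATTERNS.some(pattern => pattern.test(url.pathname));
}
```

Usage is a single line: `const cleanUrls = allUrls.filter(isSitemapWorthy);`.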
Problem: Stale URLs After Redesign
After a site redesign or URL restructure, your sitemap may still contain old URLs that now redirect or 404. The sitemap generator (your CMS or plugin) typically pulls from the current database, but manually added URLs, cached sitemaps, or plugin-generated sitemaps from the old structure may persist.
Diagnosis: Crawl every URL in your sitemap with Screaming Frog in List Mode. Filter for non-200 responses.
Fix:
- Regenerate the sitemap from scratch (disable and re-enable in your SEO plugin)
- Clear any server-level or CDN caching of the old sitemap
- Verify the regenerated sitemap only contains live, current URLs
- Resubmit in Google Search Console
Problem: Wrong Protocol or Domain
Every URL in your sitemap must use your site's canonical protocol and domain format. If your site resolves at https://yoursite.com (no www), every sitemap URL must match. Mixing http://www.yoursite.com with https://yoursite.com creates duplicate signals that confuse Google.
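Protocol and host mismatches are simple to detect by comparing origins. A minimal sketch, assuming you have the sitemap URLs as an array; findOriginMismatches is a hypothetical helper name:

```javascript
// Return every sitemap URL whose protocol + host differs from the canonical origin
function findOriginMismatches(urls, canonicalOrigin) {
  const expected = new URL(canonicalOrigin).origin;
  return urls.filter(rawUrl => new URL(rawUrl).origin !== expected);
}
```

An empty result means every URL uses the same protocol and domain format as the canonical origin you passed in.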
Step 4: WordPress-Specific Sitemap Fixes (10 Minutes)
Yoast SEO Sitemap Issues
Yoast SEO generates sitemaps automatically at /sitemap_index.xml. Common issues:
- Post types included that shouldn't be — Go to SEO > Search Appearance and set non-indexable post types to "No"
- Taxonomy sitemaps with thin content — Disable tag sitemap if your tag pages are thin. Navigate to SEO > Search Appearance > Taxonomies
- Sitemap not updating — Clear the site cache and flush Yoast's sitemap cache by disabling and re-enabling the feature
Rank Math Sitemap Issues
Rank Math sitemaps live at /sitemap_index.xml. Configure included post types under Rank Math > Sitemap Settings. The same rules apply — exclude any post type or taxonomy that produces non-indexable pages.
Plugin Conflicts
Multiple SEO plugins generating competing sitemaps is a common WordPress problem. If you have both Yoast and Rank Math installed (don't do this), or a standalone sitemap plugin alongside an SEO suite, you'll have duplicate sitemaps confusing Google Search Console.
Rule: One sitemap generator per site. Deactivate all others.
Step 5: Submit and Validate (5 Minutes)
After fixing all errors:
- Open Google Search Console > Indexing > Sitemaps
- If your sitemap URL changed, remove the old submission and add the new one
- If the URL is the same, click into the submitted sitemap and look for the "Resubmit" option
- Wait for Google to reprocess — this typically happens within 24-48 hours
Robots.txt Reference
Add your sitemap URL to your robots.txt file:
Sitemap: https://yoursite.com/sitemap.xml
This ensures every crawler (not just Google) can discover your sitemap. For robots.txt best practices, see fixing robots.txt mistakes.
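To verify the reference is actually in place, you can scan robots.txt for Sitemap lines. A minimal sketch, assuming the robots.txt body is already loaded as a string; findSitemapDirectives is a hypothetical helper name:

```javascript
// Collect every "Sitemap:" URL declared in a robots.txt body.
// The directive name is matched case-insensitively.
function findSitemapDirectives(robotsTxt) {
  return robotsTxt
    .split(/\r?\n/)
    .map(line => line.match(/^\s*sitemap:\s*(\S+)/i))
    .filter(Boolean)
    .map(match => match[1]);
}
```

An empty array means crawlers reading robots.txt have no pointer to your sitemap.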
Step 6: Monitor Ongoing Health
Weekly Check
Glance at the Sitemaps report in Google Search Console. The "Last read" date should be recent (within the past week for active sites). If Google hasn't read your sitemap in weeks, something is blocking access.
After Content Changes
Whenever you publish, delete, or redirect a significant number of pages, verify your sitemap reflects the changes. Most CMS plugins update automatically, but verify.
After Site Changes
Plugin updates, server migrations, and CMS upgrades can break sitemap generation. After any infrastructure change, load your sitemap in a browser and verify it renders valid XML with correct URLs.
Advanced: Sitemap Strategies for Large Sites
Dynamic Sitemap Generation
For sites with thousands of pages, manually maintaining a sitemap is impractical. Dynamic sitemaps auto-generate from your CMS database, ensuring every new page is included immediately and deleted pages are removed automatically.
WordPress: Yoast SEO, Rank Math, and All in One SEO all generate dynamic sitemaps that update whenever content changes. No manual intervention required.
Custom sites: Use a server-side script that queries your database for published pages and outputs valid XML. Schedule regeneration via cron job to run hourly or after content changes.
Node.js example using sitemap library:
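const { SitemapStream } = require('sitemap');
const { createWriteStream } = require('fs');
// Placeholder rows; in practice, query published pages from your database
const pages = [
  { slug: '/', updatedAt: '2026-02-07', priority: 1.0 },
  { slug: '/blog/first-post', updatedAt: '2026-02-01', priority: 0.6 },
];
const sitemap = new SitemapStream({ hostname: 'https://yoursite.com' });
const writeStream = createWriteStream('./public/sitemap.xml');
// Stream the generated XML straight into the output file
sitemap.pipe(writeStream);
// Add one <url> entry per page
pages.forEach(page => {
  sitemap.write({ url: page.slug, lastmod: page.updatedAt, priority: page.priority });
});
// Close the stream so the closing </urlset> tag is written
sitemap.end();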
Priority and Changefreq Tags
The <priority> and <changefreq> tags in sitemaps are advisory. Google has stated it largely ignores these values — it determines crawl priority through its own signals. However, accurate <changefreq> values don't hurt and may be respected by other search engines like Bing and Yandex.
If you include them:
- Homepage: priority 1.0, changefreq daily
- Category pages: priority 0.8, changefreq weekly
- Blog posts: priority 0.6, changefreq monthly
- Static pages (about, contact): priority 0.3, changefreq yearly
Image and Video Sitemaps
For content-heavy sites where image and video SEO drives significant traffic:
Image sitemap extension (declare xmlns:image="http://www.google.com/schemas/sitemap-image/1.1" on <urlset>; Google deprecated <image:caption> in 2022, so only <image:loc> is shown):
<url>
  <loc>https://yoursite.com/page</loc>
  <image:image>
    <image:loc>https://yoursite.com/images/photo.webp</image:loc>
  </image:image>
</url>
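Video sitemap extension (declare xmlns:video="http://www.google.com/schemas/sitemap-video/1.1" on <urlset>; Google also requires either <video:content_loc> or <video:player_loc>):
<url>
  <loc>https://yoursite.com/page</loc>
  <video:video>
    <video:thumbnail_loc>https://yoursite.com/thumbs/video.jpg</video:thumbnail_loc>
    <video:title>Video Title</video:title>
    <video:description>Video description here</video:description>
    <video:content_loc>https://yoursite.com/videos/video.mp4</video:content_loc>
  </video:video>
</url>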
These extensions help Google discover media content that might not be found through standard HTML crawling alone.
Sitemap for Multiple Languages (Hreflang in Sitemaps)
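If your site serves content in multiple languages, you can include hreflang annotations directly in your sitemap. The <urlset> element must declare xmlns:xhtml="http://www.w3.org/1999/xhtml", and each language version needs its own <url> entry that lists every alternate, including itself. The entry for the English page looks like this:
<url>
  <loc>https://yoursite.com/page</loc>
  <xhtml:link rel="alternate" hreflang="en" href="https://yoursite.com/page"/>
  <xhtml:link rel="alternate" hreflang="es" href="https://yoursite.com/es/page"/>
  <xhtml:link rel="alternate" hreflang="fr" href="https://yoursite.com/fr/page"/>
</url>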
This is often more reliable than implementing hreflang in HTML <head> tags because sitemaps are processed centrally rather than requiring Google to crawl every language variation individually.
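Because every language version needs its own <url> entry carrying the full set of alternates, these entries are tedious to write by hand. A minimal sketch that generates them; buildHreflangEntries is a hypothetical helper, and it assumes the URLs passed in are already XML-escaped:

```javascript
// Build one <url> entry per language version, each listing ALL alternates
// (including itself). `versions` maps hreflang codes to URLs.
function buildHreflangEntries(versions) {
  const links = Object.entries(versions)
    .map(([lang, href]) =>
      `    <xhtml:link rel="alternate" hreflang="${lang}" href="${href}"/>`)
    .join('\n');
  return Object.values(versions)
    .map(loc => `  <url>\n    <loc>${loc}</loc>\n${links}\n  </url>`)
    .join('\n');
}
```

Passing two languages yields two <url> entries with two alternate links each; the output still needs to be wrapped in a <urlset> that declares the xhtml namespace.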
Sitemap Error Quick Reference
| Error | Diagnosis Tool | Fix | Time |
|---|---|---|---|
| Can't fetch | Browser + robots.txt | Remove blocks, regenerate sitemap | 2 min |
| Invalid XML | XML validator | Fix encoding, escape special characters | 5 min |
| Non-200 URLs | curl batch check or Screaming Frog | Remove broken URLs | 10 min |
| Noindex URLs included | Screaming Frog crawl | Remove noindexed URLs from sitemap | 5 min |
| Non-canonical URLs | Manual comparison | Replace with canonical versions | 10 min |
| Too large | URL count check | Split into sitemap index | 15 min |
| Missing lastmod | View source | Add accurate modification dates | 10 min |
FAQ
How often should I update my XML sitemap?
Your sitemap should update automatically whenever content is published, updated, or removed. Most CMS plugins handle this. If you're managing sitemaps manually, update after every batch of content changes — at minimum weekly for active sites.
Can I have multiple sitemaps?
Yes. Use a sitemap index file that references individual sitemaps split by content type (posts, pages, products). A sitemap index can reference up to 50,000 sitemaps, with 50,000 URLs each. That's 2.5 billion URLs per index, more than enough for any site. Search Console separately caps submissions at 500 sitemap index files per site.
Does submitting a sitemap guarantee indexing?
No. A sitemap is a suggestion, not a directive. Google may crawl the URLs, but it decides independently whether to index each page based on quality, relevance, and canonical signals. Pages with thin content, noindex tags, or duplicate issues won't be indexed regardless of sitemap inclusion.
Should I include images in my sitemap?
Google still supports <image:image> entries with <image:loc>, though it deprecated optional tags such as <image:caption> and <image:title> in 2022. If your images are critical to your SEO strategy (product photos, infographics), including them adds a discovery signal. For most sites, standard image crawling through HTML is sufficient.
What's the difference between submitting a sitemap and the sitemap in robots.txt?
Submitting through Google Search Console only notifies Google. Adding the sitemap URL to robots.txt notifies every crawler that reads your robots.txt (Bing, Yandex, etc.). Do both for maximum coverage.
Sitemap Debugging Techniques
When Google Reads Your Sitemap But Doesn't Index Pages
If GSC shows your sitemap was successfully read but pages remain unindexed, the sitemap itself isn't the problem — Google is choosing not to index those pages for quality reasons. See why Google won't index your pages for the complete troubleshooting guide.
When Google Can't Read Your Sitemap At All
If the sitemap status shows "Couldn't fetch" repeatedly:
- Test the URL directly: Paste your sitemap URL in a browser. Does it load?
- Check server response: run curl -sI https://yoursite.com/sitemap.xml and check the status line. Does it return 200?
- Check file permissions: Your web server needs read access to the sitemap file
- Check for .htaccess conflicts: Some security rules block access to XML files
- Check CDN caching: If your CDN caches a broken version, purge the cache and test again
Sitemap Index vs. Individual Sitemaps
Submit your sitemap index file to GSC, not individual sub-sitemaps. The index file acts as a master reference, and Google automatically discovers and processes all referenced sitemaps. If you submit individual sitemaps, you'll need to resubmit each time you add a new one.
Verify your sitemap index references all sub-sitemaps:
curl -s https://yoursite.com/sitemap_index.xml | grep "<loc>"
Each <loc> entry should point to a valid, accessible sub-sitemap. If any return 404 or errors, remove them from the index.
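The same verification can be scripted once you have a status code for each sub-sitemap. A minimal sketch; both helper names are hypothetical, the regex extraction assumes well-formed XML without CDATA sections, and statusByUrl is assumed to be a map you filled from your own HTTP checks:

```javascript
// Pull every <loc> value out of a sitemap or sitemap index
function extractLocs(xml) {
  return Array.from(xml.matchAll(/<loc>\s*([^<]+?)\s*<\/loc>/g), m =>
    m[1].replace(/&amp;/g, '&') // undo XML escaping of ampersands
  );
}

// Given a map of sub-sitemap URL -> HTTP status, list the ones to fix or remove
function findBrokenSubSitemaps(indexXml, statusByUrl) {
  return extractLocs(indexXml).filter(url => statusByUrl[url] !== 200);
}
```

Sub-sitemaps missing from statusByUrl are reported as broken too, which errs on the side of flagging anything you have not verified.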
Your Direct Line to Google
Your XML sitemap is the single most direct way to tell Google which pages matter on your site. A clean, accurate sitemap accelerates discovery, prioritizes crawling, and ensures your new content reaches the index as fast as possible.
Fix the errors. Strip the non-indexable URLs. Submit the clean version. Then move on to the rest of your technical SEO cleanup.
When This Fix Isn't Your Priority
Skip this for now if:
- Your site has fundamental crawling/indexing issues. Tidying sitemap entries is pointless if Google can't reach the pages they point to. Resolve server access, robots.txt, and crawl errors before fine-tuning the sitemap.
- You're mid-migration. During platform or domain migrations, freeze non-critical changes. The migration itself introduces enough variables — layer optimizations after the new environment stabilizes.
- The page gets zero impressions in Search Console. If Google shows no data for the page, the issue is likely discoverability or indexation, not on-page optimization. Investigate why the page isn't indexed first.
Frequently Asked Questions
How long does this fix take to implement?
Most fixes in this article can be implemented in under an hour. Some require a staging environment for testing before deploying to production. The article flags which changes are safe to deploy immediately versus which need QA review first.
Will this fix work on WordPress, Shopify, and custom sites?
The underlying SEO principles are platform-agnostic. Implementation details differ — WordPress uses plugins and theme files, Shopify uses Liquid templates, custom sites use direct code changes. The article focuses on the what and why; platform-specific how-to links are provided where available.
How do I verify the fix actually worked?
Each fix includes a verification step. For most technical SEO changes: check Google Search Console coverage report 48-72 hours after deployment, validate with a live URL inspection, and monitor the affected pages in your crawl tool. Ranking impact typically surfaces within 1-4 weeks depending on crawl frequency.