How to Fix Sitemap Errors in Google Search Console
Quick Summary
- What this covers: Diagnose and resolve sitemap errors in Google Search Console including parse errors, URL not found issues, and indexing blockers that prevent pages from ranking.
- Who it's for: site owners and SEO practitioners
- Key takeaway: Read the first section for the core framework, then use the specific tactics that match your situation.
Google Search Console reports sitemap errors when your XML file contains invalid URLs, malformed markup, or pages blocked by robots directives. These errors can slow Googlebot's discovery of new content and delay index updates. A broken sitemap doesn't directly trigger penalties, but it hurts crawl efficiency: pages that could be indexed in days may take weeks to appear.
Sitemap errors fall into three categories: structural problems (XML syntax, encoding), URL issues (redirects, 404s, canonicals), and robots blocks (noindex tags, disallow rules). This guide walks through diagnosis using Search Console, systematic fixes, and validation workflows that prevent recurrence.
Why Sitemaps Matter for Crawl Efficiency
Googlebot discovers URLs through three main mechanisms: internal links, external backlinks, and sitemaps. A comprehensive sitemap helps pages without strong internal linking (category archives, paginated series, dynamically generated product pages) get discovered and considered for indexing. Without a clean sitemap:
- New posts can take noticeably longer to be discovered and indexed
- Orphaned pages (no incoming links) may never be discovered at all
- Large sites waste crawl budget on error pages instead of fresh content
Search Console's Sitemaps report shows submission status, discovered URLs, and error counts. The "Couldn't fetch" status means Googlebot can't download the file. "Couldn't read" indicates parsing failures. "URL errors" flag individual entries that violate indexing rules.
A healthy sitemap has:
- 100% URLs successfully crawled
- Zero "Submitted URL not found (404)" errors
- Zero "Submitted URL marked 'noindex'"
- Zero "Submitted URL blocked by robots.txt"
Fixing errors restores crawl flow, allowing Google to allocate resources to indexing real content instead of chasing broken references.
Diagnosing Sitemap Errors in Search Console
Navigate to Search Console > Sitemaps. The overview panel lists submitted sitemaps with status indicators:
| Status | Meaning | Action |
|---|---|---|
| Success | Parsed without errors | Monitor for URL-level issues |
| Couldn't fetch | Server timeout or 404 | Verify sitemap URL is accessible |
| Couldn't read | XML syntax error | Validate with XML linter |
| URL errors | Individual URLs rejected | Drill into error details |
Click a sitemap to view the detailed report. The top section shows:
- Discovered URLs: Total URLs found in the file
- Last read: When Googlebot last parsed the sitemap
- Status: Error categories and counts
Common error types:
Submitted URL not found (404): The sitemap lists a URL that returns a 404. This usually happens when pages are deleted without updating the sitemap.
Submitted URL marked 'noindex': The URL has a <meta name="robots" content="noindex"> tag or an X-Robots-Tag: noindex header. This contradicts its inclusion in the sitemap: if you don't want the page indexed, remove it from the sitemap.
Submitted URL blocked by robots.txt: The URL matches a Disallow rule in robots.txt. Googlebot can't crawl it, so it shouldn't be in the sitemap.
Submitted URL redirects: The URL returns a 301/302. Sitemaps should list final destination URLs, not redirects.
Sitemap contains invalid URL: Malformed URLs (missing protocol, invalid characters, relative paths). The sitemap protocol requires fully qualified absolute URLs with proper encoding.
Unsupported file format: A non-XML file was submitted, or the server sends the wrong Content-Type header. XML sitemaps should be served as application/xml or text/xml.
Fixing "Couldn't Fetch" Errors
Search Console reports a fetch error when Googlebot can't download the sitemap file. Test accessibility first:
curl -I https://example.com/sitemap.xml
Look for:
- HTTP 200: File serves correctly
- HTTP 404: File missing or wrong path
- HTTP 500: Server error (PHP crash, timeout)
- HTTP 403: Permission denied
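If you'd rather script this check, here is a minimal Python sketch using only the standard library. `diagnose_sitemap_response` and `check_sitemap` are hypothetical helper names, and the Content-Type check assumes an uncompressed XML sitemap (add the gzip types if you serve .xml.gz files). Note that `urlopen` follows redirects, so a redirected sitemap is diagnosed on its final response.

```python
import urllib.error
import urllib.request

# Content types accepted here for XML sitemaps (assumption: uncompressed XML)
XML_TYPES = ("application/xml", "text/xml")

def diagnose_sitemap_response(status, content_type):
    """Map an HTTP status and Content-Type header to a short diagnosis."""
    if status == 404:
        return "404: file missing or wrong path"
    if status == 403:
        return "403: permission denied"
    if status >= 500:
        return f"{status}: server error"
    if status != 200:
        return f"{status}: unexpected status"
    # Ignore parameters such as "; charset=UTF-8"
    media_type = (content_type or "").split(";")[0].strip().lower()
    if media_type not in XML_TYPES:
        return f"200 but Content-Type is {media_type or 'missing'}"
    return "OK"

def check_sitemap(url):
    """Send a HEAD request for the sitemap and diagnose the response."""
    request = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=10) as response:
            return diagnose_sitemap_response(
                response.status, response.headers.get("Content-Type"))
    except urllib.error.HTTPError as err:
        # urlopen raises HTTPError for 4xx/5xx responses
        return diagnose_sitemap_response(err.code, err.headers.get("Content-Type"))
```

Usage, with network access: `print(check_sitemap("https://example.com/sitemap.xml"))`. Connection-level failures such as DNS errors or timeouts are raised as exceptions rather than returned as diagnoses.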
Verify Sitemap Location
WordPress sites typically generate sitemaps at /sitemap_index.xml (Yoast, RankMath) or /wp-sitemap.xml (core feature since 5.5). Check your plugin's settings to confirm the path.
Static sites place sitemaps at /sitemap.xml. If you moved your sitemap, remove the old URL from Search Console and submit the new one.
Check robots.txt Accessibility
Googlebot reads robots.txt before crawling a site, and a Sitemap: line there is one way Google finds your sitemap. If robots.txt can't be fetched, Google can't read that reference. Test:
curl https://example.com/robots.txt
Ensure the file includes:
User-agent: *
Sitemap: https://example.com/sitemap.xml
Use an absolute URL in the Sitemap: line; relative paths are not supported.
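Python's standard library can read the Sitemap: lines for you: `urllib.robotparser.RobotFileParser.site_maps()` (Python 3.8+) returns the declared URLs. A minimal sketch with hypothetical helper names; it parses a robots.txt body you have already downloaded (for example with the curl command above) rather than fetching it.

```python
from urllib.robotparser import RobotFileParser

def sitemaps_from_robots(robots_txt):
    """Return the Sitemap: URLs declared in a robots.txt body (empty list if none)."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    # site_maps() returns None when no Sitemap: line is present
    return parser.site_maps() or []

def relative_sitemaps(robots_txt):
    """Flag Sitemap: values that are not absolute http(s) URLs."""
    return [url for url in sitemaps_from_robots(robots_txt)
            if not url.startswith(("http://", "https://"))]
```

To check the live file instead, `RobotFileParser("https://example.com/robots.txt")` followed by `.read()` downloads and parses it.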
Fix Server Timeouts
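A single sitemap file is limited to 50,000 URLs and 50 MB uncompressed, and very large files can also time out. Split them into child sitemaps listed in a sitemap index file: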
<?xml version="1.0" encoding="UTF-8"?>
<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <sitemap>
    <loc>https://example.com/sitemap-posts.xml</loc>
    <lastmod>2026-02-08</lastmod>
  </sitemap>
  <sitemap>
    <loc>https://example.com/sitemap-pages.xml</loc>
    <lastmod>2026-02-08</lastmod>
  </sitemap>
</sitemapindex>
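Each child sitemap must contain no more than 50,000 URLs and be no larger than 50 MB uncompressed (gzip saves bandwidth but doesn't raise the limit). Submit the index file to Search Console rather than each child sitemap; Google reads the children from the index.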
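If you generate sitemaps yourself, splitting and indexing can be scripted. A sketch assuming you already have the full URL list and write each chunk to its own child file; `chunk_urls` and `build_sitemap_index` are hypothetical names, and the sketch doesn't check the 50 MB size limit.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
MAX_URLS = 50_000  # protocol limit per sitemap file

def chunk_urls(urls, size=MAX_URLS):
    """Split a URL list into consecutive chunks of at most `size` URLs."""
    return [urls[i:i + size] for i in range(0, len(urls), size)]

def build_sitemap_index(child_sitemap_urls, lastmod):
    """Return sitemap index XML listing each child sitemap with one lastmod date."""
    # Serialize with a default xmlns instead of an ns0: prefix
    ET.register_namespace("", SITEMAP_NS)
    index = ET.Element(f"{{{SITEMAP_NS}}}sitemapindex")
    for loc in child_sitemap_urls:
        entry = ET.SubElement(index, f"{{{SITEMAP_NS}}}sitemap")
        ET.SubElement(entry, f"{{{SITEMAP_NS}}}loc").text = loc
        ET.SubElement(entry, f"{{{SITEMAP_NS}}}lastmod").text = lastmod
    return ET.tostring(index, encoding="unicode")
```

Write each chunk as its own urlset file, then pass the child files' public URLs to `build_sitemap_index` and prepend an XML declaration when saving.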
Fixing "Couldn't Read" Parsing Errors
Parsing failures stem from invalid XML syntax. Download your sitemap and validate locally:
xmllint --noout sitemap.xml
xmllint reports line numbers and error descriptions. Common issues:
Invalid Characters
Special characters in <loc> values must be entity-escaped:
| Character | Escaped Form |
|---|---|
| & | &amp; |
| < | &lt; |
| > | &gt; |
| " | &quot; |
| ' | &apos; |
Example of incorrect URL:
<loc>https://example.com/tags?tag=SEO&category=blog</loc>
Corrected:
<loc>https://example.com/tags?tag=SEO&amp;category=blog</loc>
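When you build sitemap XML by string concatenation, escape each URL before it goes into <loc>. Python's `xml.sax.saxutils.escape` covers &, < and > by default; the extra mapping below adds the two quote characters from the table. `escape_loc` is a hypothetical helper name. If you build the XML with a library such as `xml.etree.ElementTree`, escaping happens automatically and this step isn't needed.

```python
from xml.sax.saxutils import escape

def escape_loc(url):
    """Entity-escape a URL for use inside <loc>.

    escape() replaces &, < and > by default; the extra mapping adds quotes.
    """
    return escape(url, {'"': "&quot;", "'": "&apos;"})
```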
Missing XML Namespace
Every sitemap needs the namespace declaration:
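<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>https://example.com/page</loc>
  </url>
</urlset>
Omitting xmlns (or misspelling the namespace URI) doesn't make the XML malformed, so xmllint --noout won't catch it, but Google won't recognize the file as a sitemap and reports an error.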
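Because a missing namespace is still well-formed XML, it needs its own check. A minimal sketch with Python's `xml.etree.ElementTree`, which records namespaced tags as `{uri}localname`; `check_root` is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"

def check_root(xml_text):
    """Report whether the root is a urlset/sitemapindex in the sitemap namespace."""
    root = ET.fromstring(xml_text)
    # ElementTree stores namespaced tags as "{namespace-uri}localname"
    expected = {f"{{{SITEMAP_NS}}}urlset", f"{{{SITEMAP_NS}}}sitemapindex"}
    if root.tag in expected:
        return "OK"
    return f"Unexpected root element {root.tag!r}: check the xmlns declaration"
```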
Mismatched Tags
Every <url> needs a closing </url>. Editors with syntax highlighting (VS Code, Sublime Text) flag these quickly. Automated sitemap generators rarely produce mismatched tags; manual edits are the usual culprit.
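Without an editor, Python's built-in parser reports the same class of error with a position. A minimal sketch; `find_parse_error` is a hypothetical helper name, and the line/column pair is whatever the underlying expat parser reports.

```python
import xml.etree.ElementTree as ET

def find_parse_error(xml_text):
    """Return (line, column, message) for the first well-formedness error, or None."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError as err:
        line, column = err.position  # as reported by expat
        return line, column, str(err)
    return None
```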
Incorrect Date Formats
The <lastmod> tag requires W3C Datetime format (ISO 8601):
<lastmod>2026-02-08T14:30:00+00:00</lastmod>
Date-only values (2026-02-08), which some WordPress plugins output, are also valid W3C Datetime. Avoid non-standard formats like 02/08/2026.
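A small Python sketch for producing and sanity-checking <lastmod> values. `format_lastmod` and `is_w3c_datetime` are hypothetical helper names; the regular expression only checks the shape of the W3C Datetime profile (year, year-month, date, or date plus time and timezone), not whether the month or day is in range.

```python
import re
from datetime import datetime, timezone

# YYYY, YYYY-MM, YYYY-MM-DD, or a full date plus hh:mm[:ss[.s]] and a
# timezone designator (Z or +hh:mm / -hh:mm)
W3C_DATETIME = re.compile(
    r"\d{4}(-\d{2}(-\d{2}(T\d{2}:\d{2}(:\d{2}(\.\d+)?)?(Z|[+-]\d{2}:\d{2}))?)?)?"
)

def format_lastmod(dt):
    """Format a timezone-aware datetime as a W3C Datetime string."""
    if dt.tzinfo is None:
        raise ValueError("lastmod datetimes should carry a timezone")
    return dt.isoformat(timespec="seconds")

def is_w3c_datetime(value):
    """True if the string has a W3C Datetime shape (values are not range-checked)."""
    return bool(W3C_DATETIME.fullmatch(value))
```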
Resolving "Submitted URL Not Found (404)"
Dead URLs in sitemaps waste crawl budget: Googlebot requests each one, receives a 404, and reports it as an error. Run a sitemap audit (grep -P requires GNU grep):
wget -qO- https://example.com/sitemap.xml | grep -oP '(?<=<loc>)[^<]+' | while read -r url; do
  status=$(curl -o /dev/null -s -w "%{http_code}" "$url")
  if [ "$status" != "200" ]; then
    echo "$url - $status"
  fi
done
This extracts all URLs from the sitemap and checks each one's HTTP status code. The output lists every non-200 response; redirects show up as 301/302 because curl doesn't follow them without -L.
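The grep pipeline treats a sitemap index like any other file, so pointed at /sitemap_index.xml it checks the child sitemap URLs rather than your pages. A minimal Python sketch that reports which kind of file it parsed so you can recurse into child sitemaps; `extract_locs` is a hypothetical helper name, and the input is the raw bytes of the downloaded file.

```python
import xml.etree.ElementTree as ET

NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}

def extract_locs(xml_bytes):
    """Return ("urlset" or "sitemapindex", [loc, ...]) for a sitemap document."""
    root = ET.fromstring(xml_bytes)
    kind = root.tag.rsplit("}", 1)[-1]  # strip the "{namespace}" prefix
    # <loc> lives under <url> in a urlset and under <sitemap> in an index
    child = "sm:url" if kind == "urlset" else "sm:sitemap"
    locs = [el.text.strip() for el in root.findall(f"{child}/sm:loc", NS) if el.text]
    return kind, locs
```

If `kind` is "sitemapindex", download each returned URL and call `extract_locs` again to reach the page URLs.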
Regenerate Dynamic Sitemaps
WordPress and JAMstack sites generate sitemaps automatically. If you deleted posts but the sitemap still lists them:
- WordPress: Clear caches (plugin and server), then visit your sitemap URL (such as /sitemap_index.xml or /wp-sitemap.xml) to force regeneration
- Next.js: Rebuild the site (npm run build) to update static sitemaps
- Shopify: Wait 24 hours for auto-regeneration or trigger manually via app settings
Update Static Sitemaps
Hand-maintained sitemaps require manual edits. Remove deleted URLs and add new pages:
nano /var/www/html/sitemap.xml
After edits, validate with xmllint before resubmitting to Search Console.
Handle Soft 404s
Soft 404s return HTTP 200 but display error content. Google detects thin-content patterns ("Page Not Found" text, empty templates) and flags them. Fix by:
- Serving a proper 404 status code, e.g. in PHP: header("HTTP/1.1 404 Not Found"); or http_response_code(404);
- Removing soft 404 URLs from the sitemap
- Adding redirects if replacement content exists
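Google doesn't publish its soft-404 detection rules, but a rough heuristic can shortlist URLs for manual review. A sketch under stated assumptions: the phrase list and the 200-character threshold are hypothetical values to tune for your own templates, and the tag stripping is deliberately crude (script and style contents count as text).

```python
import re

# Hypothetical phrase list; adjust it to your templates' error wording
SOFT_404_PATTERNS = re.compile(
    r"page not found|404 error|no longer available|nothing found", re.IGNORECASE
)

def looks_like_soft_404(status, html, min_text_chars=200):
    """Heuristic: a 200 response whose text is very short or reads like an error page."""
    if status != 200:
        return False  # a real 404/410 is not a soft 404
    text = re.sub(r"<[^>]+>", " ", html)  # crude tag strip
    text = " ".join(text.split())  # collapse whitespace
    return len(text) < min_text_chars or bool(SOFT_404_PATTERNS.search(text))
```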
Fixing "Submitted URL Marked 'noindex'"
This error means a URL in your sitemap has a noindex directive. When the signals conflict, the noindex wins: Google honors it and the page stays out of the index.
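Check both the HTML and the response headers for noindex:
curl -s https://example.com/page | grep -i noindex
curl -sI https://example.com/page | grep -i x-robots-tag
Look for:
- A meta tag: <meta name="robots" content="noindex">
- An HTTP header: X-Robots-Tag: noindex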
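To check both places at once across many URLs, here is a minimal Python sketch using the standard library's `html.parser`. `RobotsMetaParser` and `has_noindex` are hypothetical names; the sketch treats any noindex in an X-Robots-Tag header as applying to Google, even if the header is scoped to another crawler.

```python
from html.parser import HTMLParser

class RobotsMetaParser(HTMLParser):
    """Collect content values of <meta name="robots"> and <meta name="googlebot">."""

    def __init__(self):
        super().__init__()
        self.directives = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names; values keep their case
        values = {name: (value or "") for name, value in attrs}
        if tag == "meta" and values.get("name", "").lower() in ("robots", "googlebot"):
            self.directives.append(values.get("content", "").lower())

def has_noindex(headers, html):
    """True if an X-Robots-Tag header or a robots meta tag contains noindex."""
    header_value = " ".join(
        value for key, value in headers.items() if key.lower() == "x-robots-tag")
    parser = RobotsMetaParser()
    parser.feed(html)
    return ("noindex" in header_value.lower()
            or any("noindex" in d for d in parser.directives))
```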
Common Causes
Staging site leak: Developers forget to remove noindex at launch. In WordPress, check Settings > Reading ("Discourage search engines from indexing this site") and wp-config.php; in Node.js apps, check environment variables.
SEO plugin misconfiguration: Yoast and RankMath have per-page noindex toggles. Bulk edits sometimes apply noindex to public content by accident.
Pagination settings: Some themes noindex paginated archives (/page/2/). If you want these indexed, remove the directive and add them to the sitemap.
Resolution Steps
- Remove the noindex tag from the page source or HTTP headers
- Request reindexing via URL Inspection tool in Search Console
- Verify removal after Googlebot recrawls (usually 3-7 days)
If you intentionally noindexed the page, remove it from the sitemap instead of waiting for Google to ignore it.
Fixing "Submitted URL Blocked by robots.txt"
robots.txt Disallow rules prevent crawling, so pages blocked by robots.txt shouldn't appear in sitemaps. List the Disallow rules:
curl -s https://example.com/robots.txt | grep -i "disallow"
Example problem:
User-agent: *
Disallow: /admin/
Disallow: /private/
If /admin/settings appears in your sitemap, Googlebot can't crawl it. Solutions:
Remove Blocked URLs from Sitemap
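Edit the sitemap generator to exclude disallowed paths. In WordPress with Yoast, note that SEO > Tools > File editor edits robots.txt and .htaccess, not the sitemap. To drop individual posts from Yoast's sitemap:
- Set the post to noindex in its Yoast SEO settings (Yoast leaves noindexed posts out of its sitemap)
- Or exclude post IDs in code with the wpseo_exclude_from_sitemap_by_post_ids filter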
For static sitemaps, filter URLs programmatically:
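import xml.etree.ElementTree as ET
from urllib.parse import urlparse

SITEMAP_NS = 'http://www.sitemaps.org/schemas/sitemap/0.9'
ET.register_namespace('', SITEMAP_NS)  # keep a default xmlns on output instead of ns0: prefixes
namespace = {'ns': SITEMAP_NS}
blocked_paths = ['/admin/', '/private/']

tree = ET.parse('sitemap.xml')
root = tree.getroot()
for url in root.findall('ns:url', namespace):  # findall() returns a list, so removing while looping is safe
    loc = url.find('ns:loc', namespace)
    path = urlparse(loc.text.strip()).path if loc is not None and loc.text else ''
    if any(path.startswith(blocked) for blocked in blocked_paths):  # prefix match on the path, like robots.txt
        root.remove(url)
tree.write('sitemap-clean.xml', encoding='UTF-8', xml_declaration=True)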
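Hard-coded path lists drift out of sync with robots.txt. Python's `urllib.robotparser` can evaluate the actual rules instead. A minimal sketch; `blocked_urls` is a hypothetical helper name. Be aware that `urllib.robotparser` applies simple prefix matching and the first matching rule, without Google's `*`/`$` wildcards or longest-match precedence, so treat its output as a first pass.

```python
from urllib.robotparser import RobotFileParser

def blocked_urls(robots_txt, urls, user_agent="Googlebot"):
    """Return the URLs that robots.txt disallows for the given user agent.

    Caveat: urllib.robotparser uses prefix matching and the first matching
    rule; it does not implement Google's wildcard or precedence rules.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [url for url in urls if not parser.can_fetch(user_agent, url)]
```

Calling `blocked_urls(robots_body, sitemap_urls)` returns the entries to drop from the sitemap or to review.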
Adjust robots.txt Rules
If the URL should be public, remove the Disallow rule. Be cautious—opening admin sections to crawlers risks exposing sensitive URLs in search results.
Fixing "Submitted URL Redirects"
Sitemaps should list canonical URLs, not redirects. When Googlebot encounters a 301, it follows to the destination but flags the original URL as an error. Audit redirects:
wget -qO- https://example.com/sitemap.xml | grep -oP '(?<=<loc>)[^<]+' | while read -r url; do
  final=$(curl -Ls -o /dev/null -w "%{url_effective}" "$url")
  if [ "$url" != "$final" ]; then
    echo "$url -> $final"
  fi
done
This shows URLs that redirect. Update the sitemap to list final destinations:
Before:
<loc>https://example.com/old-page</loc>
After:
<loc>https://example.com/new-page</loc>
Fixing WWW vs. Non-WWW Mismatches
If your canonical domain is https://example.com but the sitemap lists https://www.example.com, every URL triggers a redirect. Standardize:
- Confirm which host your site redirects to; that is your canonical domain (Search Console no longer has a preferred-domain setting)
- Update sitemap URLs to match
- Ensure robots.txt lists the correct sitemap URL
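Rewriting the sitemap entries themselves can be scripted. A minimal sketch assuming `canonical_host` is the host your site redirects to and that the canonical scheme is https; `to_canonical` is a hypothetical helper name. URLs on unrelated hosts are left untouched, and the server-side redirects stay as they are.

```python
from urllib.parse import urlsplit, urlunsplit

def _bare_host(host):
    """Drop a leading "www." so www and non-www variants compare equal."""
    host = host.lower()
    return host[4:] if host.startswith("www.") else host

def to_canonical(url, canonical_host, scheme="https"):
    """Rewrite a URL onto the canonical host/scheme if it is a www/non-www variant."""
    parts = urlsplit(url)
    if _bare_host(parts.netloc) != _bare_host(canonical_host):
        return url  # different site: leave untouched
    return urlunsplit((scheme, canonical_host, parts.path, parts.query, parts.fragment))
```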
Fixing "Sitemap Contains Invalid URL"
Google rejects malformed URLs. Common mistakes:
Relative paths:
<loc>/blog/post-title</loc> ❌
<loc>https://example.com/blog/post-title</loc> ✅
Missing protocol:
<loc>example.com/page</loc> ❌
<loc>https://example.com/page</loc> ✅
Invalid characters:
<loc>https://example.com/blog/10 tips</loc> ❌ (space)
<loc>https://example.com/blog/10-tips</loc> ✅
Validate programmatically before submission:
from urllib.parse import urlparse

def is_valid_url(url):
    parsed = urlparse(url)
    return (
        parsed.scheme in ('http', 'https')          # absolute URL with a protocol
        and bool(parsed.netloc)                      # host is present
        and not any(ch.isspace() for ch in url)      # unencoded spaces are invalid
    )

urls = ['https://example.com/page', '/relative-path', 'example.com',
        'https://example.com/blog/10 tips']
for url in urls:
    print(f"{url}: {'Valid' if is_valid_url(url) else 'Invalid'}")
Verifying Fixes in Search Console
After correcting errors:
- Resubmit the sitemap via Search Console > Sitemaps
- Click Request Indexing for critical URLs using URL Inspection
- Monitor the Page indexing report (formerly Coverage) for changes in indexed URL counts
How quickly Google re-reads a sitemap varies: active sites often see it within a day or two, while large sites with infrequent updates may wait 5-7 days. To speed things up:
- Publish or update content with accurate <lastmod> values, which can prompt an earlier recrawl
- Submit individual URLs via URL Inspection
Search Console's crawl rate setting was retired in January 2024, and it could only lower Googlebot's crawl rate, never raise it.
Check the Sitemaps report weekly. Error counts should drop to zero within 7 days of fixes. Persistent errors indicate ongoing issues (plugin bugs, incorrect robots.txt, dynamic pages returning 404s).
Automating Sitemap Monitoring
Set up alerts to catch errors before they impact rankings.
Screaming Frog Scheduled Crawls
Screaming Frog can crawl sitemaps on a schedule:
- Go to Configuration > Spider > Crawl
- Enable Crawl Sitemaps
- Set Mode to List and input sitemap URL
- Save crawl config
- Schedule via File > Schedule
Configure alerts for:
- 404 errors exceeding 5% of URLs
- Redirect chains longer than 2 hops
- URLs with noindex tags
Python Monitoring Script
Run daily via cron:
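import requests
import xml.etree.ElementTree as ET

SITEMAP_NS = {'ns': 'http://www.sitemaps.org/schemas/sitemap/0.9'}

def validate_sitemap(url, limit=100):
    """Return a list of problems; an empty list means every checked URL returned 200."""
    response = requests.get(url, timeout=10)
    if response.status_code != 200:
        return [f"Sitemap fetch failed: {response.status_code}"]
    root = ET.fromstring(response.content)
    # For a sitemap index, these <loc> values are child sitemaps, not pages
    urls = [elem.text.strip() for elem in root.findall('.//ns:loc', SITEMAP_NS) if elem.text]
    errors = []
    for loc in urls[:limit]:  # check only the first `limit` URLs
        try:
            # HEAD without following redirects, so 3xx responses are reported too
            # (some servers answer HEAD with 405; switch to requests.get if so)
            r = requests.head(loc, timeout=5, allow_redirects=False)
            if r.status_code != 200:
                errors.append(f"{loc} - {r.status_code}")
        except requests.exceptions.RequestException as e:
            errors.append(f"{loc} - {e}")
    return errors

if __name__ == '__main__':
    problems = validate_sitemap('https://example.com/sitemap.xml')
    print('\n'.join(problems) if problems else 'All checked URLs returned 200')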
Email results or log to a monitoring dashboard.
WordPress-Specific Fixes
WordPress generates sitemaps via plugins or core functionality (5.5+). Common issues:
Yoast SEO Conflicts
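If Yoast and WordPress core sitemaps both run (recent Yoast versions normally switch the core sitemaps off, but custom code or older setups may not), you get duplicate URLs. Disable one: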
// Disable core sitemaps in functions.php
add_filter('wp_sitemaps_enabled', '__return_false');
Or disable Yoast sitemaps: SEO > General > Features → Toggle off XML Sitemaps.
Cache Plugin Interference
WP Rocket, W3 Total Cache, and LiteSpeed Cache sometimes serve stale sitemaps. After editing posts:
- Clear all caches (plugin and server)
- Manually visit your sitemap URL to regenerate it
- Resubmit it in Search Console
Custom Post Type Exclusion
Some themes and SEO plugin settings noindex custom post types, which also keeps them out of the sitemap. In Yoast, set each post type to Yes under SEO > Search Appearance > Content Types.
Shopify Sitemap Issues
Shopify auto-generates sitemaps at /sitemap.xml. Common errors:
Hidden products in sitemap: Products with "Online Store" channel disabled still appear. Fix by unpublishing completely or archiving.
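Collections with no products: Empty collections create thin-content pages. To deal with them:
- Unpublish empty collections from the Online Store sales channel (or delete them) so they drop out of the auto-generated sitemap
- If they must stay live, disallow them in the theme's robots.txt.liquid template (Online Store > Themes > Edit code); expect "blocked by robots.txt" reports while they remain in the sitemap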
Variant URLs: Shopify sometimes lists variant URLs (/products/shirt?variant=12345) which redirect. The sitemap should only include parent product URLs. This is a platform bug—report to Shopify Support for escalation.
FAQ
Q: How often does Googlebot check sitemaps? Active sites often see near-daily fetches. Low-traffic sites may wait 5-7 days. Publishing or updating content with accurate <lastmod> values can prompt an earlier recrawl, but nothing guarantees an immediate one.
Q: Can I submit multiple sitemaps? Yes. Submit up to 500 sitemaps per property in Search Console. Useful for splitting by content type (posts, pages, products).
Q: Should I include images in sitemaps? Yes. Use image extensions to help Google Images discover photos. Each URL can list up to 1,000 images.
Q: What happens if I don't fix sitemap errors? Pages may still index via other discovery methods (internal links, backlinks), but crawl inefficiency delays indexing and wastes budget on error pages.
Q: Do sitemap errors cause ranking drops? Not directly. But delays in indexing fresh content and wasted crawl budget on errors indirectly harm visibility.
Q: Can I remove the sitemap entirely? Yes, but only if your site has strong internal linking. Large sites (1k+ pages) or sites with deep navigation hierarchies need sitemaps for efficient discovery.
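Q: How do I handle paginated URLs in sitemaps? Google no longer uses rel="next"/rel="prev" as an indexing signal, so connect paginated pages with normal crawlable links. Include paginated URLs in the sitemap only if you want them indexed; otherwise list page 1 and let Google discover the rest via links.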
Q: Should I include nofollow links in sitemaps? No. Sitemaps declare URLs you want indexed. Nofollow doesn't prevent indexing (it prevents link equity transfer), but including URLs you don't want indexed creates confusion.
Q: How long until errors clear after fixes? Most errors resolve within 7 days. Expedite by requesting reindexing via URL Inspection for critical pages.
Q: Can I automate sitemap submission to Search Console? Yes, through the Search Console API's sitemaps.submit method. The old ping endpoint (https://www.google.com/ping?sitemap=https://example.com/sitemap.xml) was deprecated in 2023 and no longer works; keep the Sitemap: line in robots.txt and accurate <lastmod> values instead.
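A minimal sketch of that API call using only Python's standard library. It assumes you already have an OAuth 2.0 access token with the Search Console (webmasters) scope; obtaining one is out of scope here. `build_submit_request` is a hypothetical helper name; the endpoint is the REST path for `sitemaps.submit`, with both the property and the sitemap URL percent-encoded.

```python
import urllib.request
from urllib.parse import quote

API_BASE = "https://www.googleapis.com/webmasters/v3"

def build_submit_request(site_url, sitemap_url, access_token):
    """Build a PUT request for the Search Console API sitemaps.submit method.

    site_url is the property as shown in Search Console, e.g.
    "https://example.com/" or "sc-domain:example.com".
    access_token is an OAuth 2.0 token obtained elsewhere (assumption).
    """
    endpoint = (
        f"{API_BASE}/sites/{quote(site_url, safe='')}"
        f"/sitemaps/{quote(sitemap_url, safe='')}"
    )
    return urllib.request.Request(
        endpoint,
        data=b"",  # empty body, so urllib sends Content-Length: 0
        method="PUT",
        headers={"Authorization": f"Bearer {access_token}"},
    )
```

Send the request with `urllib.request.urlopen(req)`; a 2xx status indicates the submission was accepted.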
When This Fix Isn't Your Priority
Skip this for now if:
- Google can't reach your site at all. If robots.txt blocks the entire site or the server returns widespread 5xx errors, fix site-wide access first; sitemap cleanup won't help until Googlebot can crawl.
- You're mid-migration. During platform or domain migrations, freeze non-critical changes. The migration itself introduces enough variables — layer optimizations after the new environment stabilizes.
- The affected pages get zero impressions and don't need to rank. If errors point at URLs with no search value, removing them from the sitemap is quicker than repairing them; save deeper investigation for pages that matter.
Frequently Asked Questions
How long does this fix take to implement?
Most fixes in this article can be implemented in under an hour. Changes to robots.txt, redirects, or sitemap generation are worth testing in a staging environment before deploying to production.
Will this fix work on WordPress, Shopify, and custom sites?
The underlying SEO principles are platform-agnostic. Implementation details differ: WordPress uses plugins and theme files, Shopify uses Liquid templates, and custom sites use direct code changes. The article focuses on the what and why, with WordPress- and Shopify-specific steps in their own sections.
How do I verify the fix actually worked?
Each fix includes a verification step. For most technical SEO changes: check the Page indexing report in Google Search Console 48-72 hours after deployment, validate with a live URL inspection, and monitor the affected pages in your crawl tool. Ranking impact typically surfaces within 1-4 weeks depending on crawl frequency.