XML Sitemap Extractor
Extract every URL from any sitemap.xml. Follows sitemap indexes recursively, dumps the full URL list with path-prefix stats.
About the tool
What is the SBMM XML Sitemap Extractor?
The SBMM XML Sitemap Extractor is a free online tool that pulls every URL from any public sitemap.xml file. Paste a sitemap URL and the extractor recursively follows nested sitemap index files (the master sitemap.xml that links to blog-sitemap.xml, product-sitemap.xml, news-sitemap.xml) to any depth, decodes XML entities correctly, deduplicates URLs across child sitemaps, and dumps the complete URL list with path-prefix statistics.
Pulling URLs from a sitemap is the fastest way to map any website you do not own. Competitive content research, internal-linking audits before a redesign, broken-link checks on a migrated site, content-gap analysis against your own URL set, and topical-cluster reconnaissance all start with the same step: get the full URL list. This extractor delivers it in one paste with no install and no sign-up, and the free tier's five full extracts per day cover normal use.
The output is a flat list of every URL plus a path-prefix breakdown that shows the structure of the site's information architecture at a glance. You see immediately which subdirectories carry the most content, where the publication strategy concentrates, which sections are thin compared to the rest of the site, and, when you compare extracts taken at different times, which sections are growing or stagnating.
Step by step
How to use this tool in 3 steps
Step 01
Paste any sitemap URL
Drop a sitemap URL into the form. It can be the master sitemap.xml at the root of a domain or a child sitemap (blog-sitemap.xml, product-sitemap.xml). The extractor figures out the type automatically.
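One plausible way to tell the two file types apart, assuming the file follows the sitemaps.org 0.9 schema, is to inspect the root element: a sitemap index has a <sitemapindex> root, a regular sitemap has a <urlset> root. The Python sketch below is illustrative only, not the extractor's actual code, and the function name `sitemap_type` is hypothetical.

```python
import xml.etree.ElementTree as ET

def sitemap_type(xml_bytes: bytes) -> str:
    """Return 'index' for a <sitemapindex> root or 'urlset' for a <urlset> root (hypothetical helper)."""
    root = ET.fromstring(xml_bytes)
    # ElementTree reports namespaced tags as '{namespace}localname'; keep only the local name.
    local_name = root.tag.rsplit("}", 1)[-1]
    if local_name == "sitemapindex":
        return "index"
    if local_name == "urlset":
        return "urlset"
    raise ValueError(f"not a sitemap: root element is <{local_name}>")
```

Checking the root element rather than the file name avoids relying on naming conventions such as `sitemap_index.xml`, which sites are free to ignore.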
Step 02
Recursive sitemap-index follow
If the URL is a sitemap index file, the extractor follows every child sitemap link to any depth, fetches each child in parallel, parses the XML, decodes entities, and deduplicates URLs that appear in more than one child.
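The recursive follow can be sketched in a few lines of Python, assuming namespaced sitemaps.org 0.9 files. This version fetches children sequentially for clarity, whereas the step above describes parallel fetching. The `fetch` callable is a hypothetical stand-in for whatever HTTP client is used, and `extract_urls` is an illustrative name, not the tool's API.

```python
import xml.etree.ElementTree as ET
from typing import Callable

NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}

def extract_urls(start_url: str, fetch: Callable[[str], bytes]) -> list[str]:
    """Collect page URLs from a sitemap or sitemap index, following child sitemaps recursively."""
    seen_sitemaps: set[str] = set()  # guards against index files that link to each other
    urls: dict[str, None] = {}       # insertion-ordered set used for deduplication

    def visit(sitemap_url: str) -> None:
        if sitemap_url in seen_sitemaps:
            return
        seen_sitemaps.add(sitemap_url)
        root = ET.fromstring(fetch(sitemap_url))
        if root.tag == f"{{{NS['sm']}}}sitemapindex":
            # Index file: recurse into every <sitemap><loc> child.
            for loc in root.findall("sm:sitemap/sm:loc", NS):
                if loc.text:
                    visit(loc.text.strip())
        else:
            # Leaf sitemap: record every <url><loc> entry once.
            for loc in root.findall("sm:url/sm:loc", NS):
                if loc.text:
                    urls.setdefault(loc.text.strip(), None)

    visit(start_url)
    return list(urls)
```

For real use, `fetch` could wrap `urllib.request.urlopen(url).read()`; gzipped children would need decompressing before parsing.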
Step 03
Read URLs + path-prefix breakdown
See the full URL list, a per-prefix count (how many URLs sit under /blog/, /products/, /docs/), and a copy-and-paste-ready plain text dump. Use the breakdown to spot content concentration, gaps, and growth areas in seconds.
Why this tool
Why use this tool
Recursive sitemap-index support
Follows nested sitemap index files (master plus child sitemaps for blog, news, product, video) to any depth. Most large sites publish split sitemaps for performance, and the extractor handles them in one click.
Decodes XML entities correctly
XML uses entity references (&amp; for &, &quot; for ", &apos; for ') to represent special characters. The extractor decodes these before showing URLs, so the output is ready to paste into a crawler, a CSV, or a content-gap report without manual cleanup.
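A minimal sketch of why a real XML parser matters here: Python's ElementTree resolves the predefined entity references while parsing, so the text of each <loc> element is already decoded. A regex over the raw file would return `&amp;` verbatim. The helper name `decoded_locs` is hypothetical and the sample URLs are made up.

```python
import xml.etree.ElementTree as ET

NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}

def decoded_locs(xml_text: str) -> list[str]:
    """Return <loc> values; the parser has already replaced entity references."""
    root = ET.fromstring(xml_text)
    return [loc.text.strip() for loc in root.iterfind("sm:url/sm:loc", NS) if loc.text]

sample = (
    '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">'
    "<url><loc>https://example.com/search?q=a&amp;page=2</loc></url>"
    "<url><loc>https://example.com/it&apos;s-here</loc></url>"
    "</urlset>"
)
print(decoded_locs(sample))
# ['https://example.com/search?q=a&page=2', "https://example.com/it's-here"]
```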
Cross-child deduplication
URLs that appear in more than one child sitemap (a blog post referenced by both blog-sitemap.xml and tag-sitemap.xml) are deduplicated automatically so you do not see the same URL listed twice in the report.
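Cross-child deduplication can be as simple as merging lists into an insertion-ordered dict. The sketch below assumes exact string matching, so `https://example.com/a` and `https://example.com/a/` count as different URLs; whether the extractor normalizes URLs further is not stated here. The function name `merge_dedup` is hypothetical.

```python
from collections.abc import Iterable

def merge_dedup(child_lists: Iterable[list[str]]) -> list[str]:
    """Merge URL lists from several child sitemaps, keeping first-seen order and dropping repeats."""
    merged: dict[str, None] = {}
    for child in child_lists:
        for url in child:
            merged.setdefault(url, None)  # dicts preserve insertion order (Python 3.7+)
    return list(merged)

blog = ["https://example.com/blog/post-1", "https://example.com/blog/post-2"]
tags = ["https://example.com/blog/post-1", "https://example.com/tag/seo"]
print(merge_dedup([blog, tags]))
# ['https://example.com/blog/post-1', 'https://example.com/blog/post-2', 'https://example.com/tag/seo']
```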
Handles gzipped sitemaps
Sitemaps served with the .xml.gz extension are decompressed automatically before parsing. No extra step, no plugin, no command-line gunzip; the extractor handles the format detection internally.
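Format detection can be sketched by sniffing the gzip magic bytes instead of trusting the file extension. This is an assumption about one reasonable approach, not a description of the extractor's internals; `maybe_gunzip` is a hypothetical name.

```python
import gzip

GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of every gzip stream (RFC 1952)

def maybe_gunzip(body: bytes) -> bytes:
    """Decompress the body if it is gzip data; otherwise return it unchanged."""
    if body[:2] == GZIP_MAGIC:
        return gzip.decompress(body)
    return body
```

Sniffing the content also covers the case where a server returns plain XML at a `.xml.gz` URL, or gzip data at a plain `.xml` URL.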
Path-prefix statistics
Counts how many URLs live under each path prefix (/blog/, /products/, /docs/, /case-studies/) so the site's information architecture surfaces as a clean breakdown instead of a flat list of thousands of URLs.
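A per-prefix count takes only a `Counter` and `urllib.parse.urlsplit`. This sketch assumes "prefix" means the first path segment, so a single-segment page such as /about is counted as its own prefix /about/, and the home page falls under /. The real tool may group differently; `prefix_counts` is a hypothetical name.

```python
from collections import Counter
from urllib.parse import urlsplit

def prefix_counts(urls: list[str]) -> Counter:
    """Count URLs by their first path segment: '/blog/post-1' falls under '/blog/'."""
    counts: Counter = Counter()
    for url in urls:
        segments = [s for s in urlsplit(url).path.split("/") if s]
        # The home page (empty path or '/') is counted under '/'.
        prefix = f"/{segments[0]}/" if segments else "/"
        counts[prefix] += 1
    return counts

urls = [
    "https://example.com/",
    "https://example.com/blog/post-1",
    "https://example.com/blog/post-2",
    "https://example.com/products/widget",
]
for prefix, n in prefix_counts(urls).most_common():
    print(f"{prefix:<12}{n}")
```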
Free, five runs a day
Five full extracts per day on the free tier cover normal competitive-research and audit workflows. SBMM Pro lifts the cap, adds CSV export, and adds a URL diff between two sitemap extracts for tracking content publication over time.
FAQ
Frequently asked questions
What is a sitemap.xml file?
A sitemap.xml file is a machine-readable list of URLs on a website, published in the sitemaps.org 0.9 XML format. Search engines fetch it to discover URLs they may not have found by following links. Most well-maintained sites publish one, usually at the root of the domain or declared in robots.txt, so crawlers can discover their pages efficiently.
What is the difference between a sitemap and a sitemap index?
A sitemap is a single XML file listing up to 50,000 URLs and no larger than 50 MB uncompressed. A sitemap index is a parent file that lists multiple child sitemaps (sitemap.xml pointing to blog-sitemap.xml, product-sitemap.xml, news-sitemap.xml). Large sites use indexes because these per-file limits force them to split their content across many sitemap files.
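As a worked example of the per-file limit, a site with 120,000 URLs needs at least ceil(120,000 / 50,000) = 3 child sitemaps plus one index file. The sketch below counts only URLs and ignores the 50 MB size limit, which could force more files for very long URLs. The names are hypothetical.

```python
MAX_URLS_PER_SITEMAP = 50_000  # per-file URL limit in the sitemaps.org protocol

def child_sitemaps_needed(total_urls: int) -> int:
    """Minimum number of child sitemaps for a URL count (ignores the 50 MB size limit)."""
    return (total_urls + MAX_URLS_PER_SITEMAP - 1) // MAX_URLS_PER_SITEMAP  # ceiling division

def split_urls(urls: list[str]) -> list[list[str]]:
    """Chunk a URL list into groups that each fit in one sitemap file."""
    return [urls[i:i + MAX_URLS_PER_SITEMAP] for i in range(0, len(urls), MAX_URLS_PER_SITEMAP)]

print(child_sitemaps_needed(120_000))  # 3
```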
Why extract URLs from a sitemap?
It is the fastest way to get a complete URL list from any site you do not own: for competitive content research, content-gap analysis, internal-linking audits before a redesign, broken-link checks on a freshly relaunched site, or pulling the URL set for a topical-cluster reconnaissance report.
Can I extract URLs from a site that does not publish a sitemap?
Not with this tool. If the target site has no sitemap at the conventional locations (/sitemap.xml, /sitemap_index.xml, or declared in robots.txt), you need a crawler instead. Use our Site Audit Pro or any general web crawler to discover URLs by following internal links. To build a missing sitemap, run our Sitemap Generator on the same domain.
How do I find a site's sitemap URL?
Try /sitemap.xml or /sitemap_index.xml at the root of the domain first. If neither resolves, fetch the robots.txt file (/robots.txt) and look for a Sitemap directive line that points to the actual sitemap path. Most CMS platforms publish at the conventional locations by default.
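The lookup order above can be sketched with the standard library: try the conventional paths, then any Sitemap lines in robots.txt. `urllib.robotparser.RobotFileParser.site_maps()` (Python 3.8+) returns the declared sitemap URLs, or `None` if there are none. Fetching robots.txt is left out so the example stays offline; `candidate_sitemaps` is a hypothetical name.

```python
from urllib.parse import urljoin
from urllib.robotparser import RobotFileParser

def candidate_sitemaps(site_root: str, robots_txt: str) -> list[str]:
    """List sitemap URLs to try: conventional paths first, then robots.txt Sitemap lines."""
    conventional = [urljoin(site_root, "/sitemap.xml"), urljoin(site_root, "/sitemap_index.xml")]
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    declared = parser.site_maps() or []  # None when robots.txt has no Sitemap directive
    # Keep order and drop duplicates (robots.txt often points at /sitemap.xml itself).
    return list(dict.fromkeys(conventional + declared))

robots = "User-agent: *\nDisallow: /admin/\nSitemap: https://example.com/sitemaps/main.xml\n"
print(candidate_sitemaps("https://example.com", robots))
```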
What does the path-prefix breakdown show?
It counts URLs by their path prefix (/blog/, /products/, /docs/) so you can see at a glance how the site distributes content across sections. A site with 80 percent of its URLs under /blog/ is most likely a content-led business; a site with 80 percent under /products/ is most likely an e-commerce store. That makes the breakdown useful for sizing up a competitor in seconds.
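The 80 percent rule of thumb is simple arithmetic on the per-prefix counts. This sketch turns counts into shares and flags a dominant section; the 0.8 threshold mirrors the example above but is a hypothetical heuristic, not a documented feature, and both function names are illustrative.

```python
from typing import Optional

def prefix_shares(counts: dict[str, int]) -> dict[str, float]:
    """Convert per-prefix URL counts into fractions of the total."""
    total = sum(counts.values())
    if total == 0:
        return {}
    return {prefix: n / total for prefix, n in counts.items()}

def dominant_section(counts: dict[str, int], threshold: float = 0.8) -> Optional[str]:
    """Return the prefix holding at least `threshold` of all URLs, or None (hypothetical heuristic)."""
    for prefix, share in prefix_shares(counts).items():
        if share >= threshold:
            return prefix
    return None

counts = {"/blog/": 850, "/products/": 100, "/": 50}
print(dominant_section(counts))  # /blog/
```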
Does it follow sitemap nesting more than one level deep?
Yes, to any practical depth. The extractor recursively follows every sitemap index link until it reaches leaf sitemaps containing actual URL entries. Most real sites only nest one level, but the tool handles deeper nesting transparently.
Can I export the URL list?
The free tier shows the URL list in the browser, ready to copy. SBMM Pro adds direct CSV download, JSON export, and the ability to diff two sitemap extracts to see which URLs were added or removed between two crawls.
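Diffing two extracts reduces to set differences. The sketch below shows the general idea only, not how SBMM Pro implements its diff, and it assumes exact string matching between URLs; `diff_extracts` is a hypothetical name.

```python
def diff_extracts(old: list[str], new: list[str]) -> tuple[list[str], list[str]]:
    """Return (added, removed) URLs between two sitemap extracts, sorted for stable output."""
    old_set, new_set = set(old), set(new)
    added = sorted(new_set - old_set)
    removed = sorted(old_set - new_set)
    return added, removed

january = ["https://example.com/blog/a", "https://example.com/blog/b"]
march = ["https://example.com/blog/b", "https://example.com/blog/c"]
added, removed = diff_extracts(january, march)
print("added:", added)
print("removed:", removed)
```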