Robots.txt Analyzer

Free robots.txt analyzer. 22+ best-practice checks: sitemap declared, AI crawlers (GPTBot, ClaudeBot), blocked admin paths, WooCommerce rules.

What is the SBMM Robots.txt Analyzer?

The SBMM Robots.txt Analyzer is a free online tool that fetches any domain's robots.txt file and grades it against 22+ production-grade best-practice checks. It is a quick way to confirm that Googlebot can actually reach your important URLs, that admin endpoints are properly blocked, and that the AI crawler policy you intended is the one Google and OpenAI actually see.

Your robots.txt file is the first thing well-behaved search engines and AI crawlers read before crawling your site. A single misconfigured Disallow rule can wipe out organic traffic overnight by blocking Googlebot from your money pages. A missing rule can let compliant crawlers into admin paths, or quietly opt your content into AI training datasets you meant to stay out of. Keep in mind that robots.txt is advisory: it steers compliant crawlers but does not stop scrapers that ignore it.

The analyzer runs the same checks an experienced SEO would run by hand, including sitemap declaration, admin path coverage, AI crawler permissions (GPTBot, ClaudeBot, PerplexityBot, Google-Extended, Bytespider, Applebot-Extended, Meta-ExternalAgent and more), SEO research bot policy, crawl-delay sanity, WooCommerce cart-and-checkout exclusion, and WordPress wp-includes blocking. Every finding links back to the rule being tested.
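As an illustration, two of these checks (sitemap declaration and crawl-delay sanity) can be approximated with Python's standard-library `urllib.robotparser`. This is a minimal sketch, not the analyzer's implementation: the `check_sitemap_and_delay` helper and its 10-second threshold are hypothetical. Googlebot ignores Crawl-delay, so a check like this matters mainly for crawlers that honor it.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical threshold: flag any Crawl-delay above this many seconds.
MAX_SANE_DELAY = 10


def check_sitemap_and_delay(robots_txt: str, agent: str = "*") -> list[str]:
    """Return human-readable findings for two illustrative checks."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())

    findings = []
    # site_maps() returns None when no Sitemap: line is present (Python 3.8+).
    if parser.site_maps() is None:
        findings.append("error: no Sitemap directive declared")

    # crawl_delay() returns None when the matching group sets no Crawl-delay.
    delay = parser.crawl_delay(agent)
    if delay is not None and delay > MAX_SANE_DELAY:
        findings.append(f"warning: Crawl-delay of {delay}s for {agent!r} is unusually high")
    return findings


sample = """\
User-agent: *
Crawl-delay: 30
Disallow: /wp-admin/
"""
print(check_sitemap_and_delay(sample))
# → ['error: no Sitemap directive declared', "warning: Crawl-delay of 30s for '*' is unusually high"]
```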

How to use this tool in 3 steps

  1. Enter the domain you want to audit

    Drop any website URL into the form. You can pass a full URL or just the bare domain; the analyzer fetches the live robots.txt from the root regardless.

  2. 22+ best-practice checks run live

    The analyzer pulls the file fresh (no cached data), parses every User-agent group, then runs sitemap, admin path, AI crawler, SEO bot, WooCommerce, WordPress, and crawl-delay checks in one pass.

  3. Read the findings and ship fixes

    See passes, warnings, and errors grouped by severity. Each finding includes the matched rule, why it matters, and how to fix it. Re-run after changes to confirm the rule now passes.
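Steps 1 and 2 can be sketched in Python: reduce whatever was typed to the root robots.txt URL, then split the file into User-agent groups. The function names, the HTTPS default for bare domains, and the simplified parsing are assumptions for illustration; a full RFC 9309 implementation also handles rule matching (wildcards, longest-match precedence) and other edge cases.

```python
from urllib.parse import urlsplit


def robots_url(user_input: str) -> str:
    """Map a full URL or a bare domain to the robots.txt URL at its root."""
    text = user_input.strip()
    # Assumption: inputs without a scheme are treated as HTTPS.
    if "://" not in text:
        text = "https://" + text
    parts = urlsplit(text)
    # robots.txt applies per scheme, host, and port, so any path, query,
    # or fragment in the input is discarded.
    return f"{parts.scheme}://{parts.netloc}/robots.txt"


def parse_groups(robots_txt: str) -> list[dict]:
    """Split robots.txt text into groups of user-agents and their rules."""
    groups: list[dict] = []
    current: dict = {}
    last_was_agent = False
    for raw in robots_txt.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if ":" not in line:
            continue  # skip blank or malformed lines
        field, value = line.split(":", 1)
        field, value = field.strip().lower(), value.strip()
        if field == "user-agent":
            # Consecutive User-agent lines share one group; a User-agent
            # line after any other field starts a new group.
            if not last_was_agent:
                current = {"agents": [], "rules": []}
                groups.append(current)
            current["agents"].append(value)
        elif field in ("allow", "disallow") and groups:
            current["rules"].append((field, value))
        # Other fields (Sitemap, Crawl-delay, ...) are ignored in this sketch.
        last_was_agent = field == "user-agent"
    return groups


print(robots_url("https://www.example.com/blog/post?x=1"))
# → https://www.example.com/robots.txt
print(robots_url("example.com"))
# → https://example.com/robots.txt
```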

Why use this tool

  • 22+ production-grade checks

    Sitemap declared, blocked admin paths, AI crawler permissions, SEO research bots, WooCommerce cart and checkout, WordPress wp-includes and xmlrpc, crawl-delay, and legacy bad-bot coverage all surface in one report.

  • AI crawler policy verifier

    Tests your robots.txt against a list of 22+ known AI crawlers, including GPTBot, ClaudeBot, PerplexityBot, Google-Extended for Gemini training, Bytespider, Applebot-Extended, Meta-ExternalAgent, and CCBot.

  • WooCommerce and WordPress modes

    Toggle on the WooCommerce check to verify cart, checkout, and my-account paths are excluded. WordPress mode adds checks for wp-includes, xmlrpc.php, and the readme.html file that leaks the install version.

  • Per-rule fix guidance

    Every failed check returns the exact rule pattern to add, a copy-paste-ready snippet, and a link to the Google or vendor documentation page that backs the recommendation.

  • Live fetch, no cached data

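    The analyzer fetches the live robots.txt at audit time rather than reading a stale crawl index, so the result reflects what your server is serving right now, which is what crawlers will get on their next fetch. Google generally caches robots.txt for up to 24 hours, so Googlebot can take up to a day to pick up a change.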

  • Free, 10 runs a day

    Run it on your own site, a client site, or a competitor. Ten free runs per day cover any normal SEO workflow. SBMM Pro lifts the cap and adds bulk multi-domain audits.
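The WooCommerce and WordPress path checks boil down to asking whether a given crawler may still fetch each sensitive path. Below is a minimal sketch using `urllib.robotparser`, with hypothetical path lists that may not match the analyzer's own. One caveat: Python's parser uses plain prefix matching, applies the first matching rule in file order, and does not understand the `*` and `$` wildcards Google supports, so treat the result as an approximation of Googlebot's behavior.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical path lists for illustration; the analyzer's real lists may differ.
WOOCOMMERCE_PATHS = ["/cart/", "/checkout/", "/my-account/"]
WORDPRESS_PATHS = ["/wp-admin/", "/xmlrpc.php", "/readme.html"]


def unblocked_paths(robots_txt: str, paths: list[str], agent: str = "Googlebot") -> list[str]:
    """Return the paths that `agent` is still allowed to fetch."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    # can_fetch() accepts a bare path and checks it against the agent's group.
    return [path for path in paths if parser.can_fetch(agent, path)]


sample = """\
User-agent: *
Disallow: /wp-admin/
Disallow: /cart/
Disallow: /checkout/
"""
print(unblocked_paths(sample, WOOCOMMERCE_PATHS + WORDPRESS_PATHS))
# → ['/my-account/', '/xmlrpc.php', '/readme.html']
```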

Frequently asked questions

What is a robots.txt file?

Robots.txt is a plain-text file at the root of a website (your-site.com/robots.txt) that tells search-engine and AI crawlers which paths they may fetch and which they may not. Well-behaved bots read it before crawling, so its rules control what Google can crawl and what AI tools can read and cite. Note that it controls crawling, not indexing: a disallowed URL can still appear in Google results, usually without a description, if other pages link to it.
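To make the group logic concrete: a crawler follows only the group that matches its name and falls back to `User-agent: *` when no named group applies. The sketch below checks a made-up file with Python's standard-library parser.

```python
from urllib.robotparser import RobotFileParser

# Made-up example file with a catch-all group and a Googlebot-specific group.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/

User-agent: Googlebot
Disallow: /drafts/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# Googlebot matches its own group, so the * group's rules do not apply to it.
print(parser.can_fetch("Googlebot", "/private/page"))     # True
print(parser.can_fetch("Googlebot", "/drafts/page"))      # False
# Any other crawler falls back to the * group.
print(parser.can_fetch("SomeOtherBot", "/private/page"))  # False
```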

Why audit a robots.txt file?

A misconfigured Disallow rule can wipe out organic traffic overnight by blocking Googlebot from your money pages. A missing rule can let compliant crawlers into admin endpoints, or accidentally allow AI crawlers you wanted to keep from training on your content. An audit catches these issues before they cost you rankings or content rights.

Does this tool edit my robots.txt file?

No. It only fetches and reads the file. Nothing is uploaded, nothing is changed on your server. The recommendations tell you what to change, but you make the edits in your own CMS or by uploading a new file to your root.

Which AI crawlers does the audit test for?

It covers 22 plus bots: GPTBot, OAI-SearchBot, and ChatGPT-User from OpenAI; ClaudeBot, Claude-Web, anthropic-ai, and Claude-SearchBot from Anthropic; Google-Extended for Gemini training; PerplexityBot for live AI search; Applebot-Extended; Meta-ExternalAgent; Bytespider; CCBot; cohere-ai; and several more.

Should I block all AI crawlers?

It depends on your content rights strategy. Blocking GPTBot, ClaudeBot, and Google-Extended tells those vendors not to use your pages for training future model versions (it does not remove content they have already collected), and it does not affect live AI search citations, which come from separate crawlers. A common approach is to block training crawlers while leaving live search crawlers like PerplexityBot allowed, so your pages keep getting cited in answers.
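Here is a policy along those lines, checked with Python's `urllib.robotparser`. The bot list is an example rather than a recommendation, and the `policy_report` helper is hypothetical. Google-Extended is a control token rather than a separate crawler, so disallowing it does not change how Googlebot crawls for Search.

```python
from urllib.robotparser import RobotFileParser

# Example policy: opt out of three training crawlers, keep a live-search
# crawler allowed, and apply a normal catch-all group to everyone else.
POLICY = """\
User-agent: GPTBot
User-agent: ClaudeBot
User-agent: Google-Extended
Disallow: /

User-agent: PerplexityBot
Allow: /

User-agent: *
Disallow: /wp-admin/
"""


def policy_report(robots_txt: str, bots: list[str], path: str = "/") -> dict[str, bool]:
    """Map each bot name to whether it may fetch `path` under this file."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return {bot: parser.can_fetch(bot, path) for bot in bots}


bots = ["GPTBot", "ClaudeBot", "Google-Extended", "PerplexityBot", "Googlebot"]
for bot, allowed in policy_report(POLICY, bots, "/blog/some-article").items():
    print(f"{bot}: {'allowed' if allowed else 'blocked'}")
# The three training tokens print "blocked"; PerplexityBot and Googlebot print "allowed".
```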

Can I test a staging site?

Only if the staging site is publicly reachable. If it is behind HTTP auth or an IP allowlist, the analyzer cannot fetch its robots.txt. For private staging, copy the file contents into a local syntax checker before deploying it to production.
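For that local check, a hypothetical minimal linter in Python can catch the most common syntax slips: rules before any User-agent line, misspelled fields, paths that start with neither `/` nor `*`, and relative Sitemap URLs. It is a sketch with an assumed field list, so it will flag legitimate but less common fields (such as Yandex's Clean-param) as unknown.

```python
# Hypothetical, minimal field list; real validators recognize more fields.
KNOWN_FIELDS = {"user-agent", "allow", "disallow", "sitemap", "crawl-delay"}


def lint_robots(robots_txt: str) -> list[str]:
    """Return line-numbered problems found in a robots.txt draft."""
    problems = []
    seen_agent = False
    for number, raw in enumerate(robots_txt.splitlines(), start=1):
        line = raw.split("#", 1)[0].strip()
        if not line:
            continue  # blank or comment-only line
        if ":" not in line:
            problems.append(f"line {number}: missing ':' separator")
            continue
        field, value = line.split(":", 1)
        field, value = field.strip().lower(), value.strip()
        if field not in KNOWN_FIELDS:
            problems.append(f"line {number}: unknown field {field!r}")
        elif field == "user-agent":
            seen_agent = True
            if not value:
                problems.append(f"line {number}: empty User-agent")
        elif field in ("allow", "disallow"):
            if not seen_agent:
                problems.append(f"line {number}: rule appears before any User-agent")
            # An empty Disallow is valid (it allows everything).
            elif value and not value.startswith(("/", "*")):
                problems.append(f"line {number}: path should start with '/'")
        elif field == "sitemap" and not value.startswith(("http://", "https://")):
            problems.append(f"line {number}: Sitemap must be an absolute URL")
    return problems


draft = """\
Disallow: /tmp/
User-agent: *
Disalow: /x/
Disallow: private/
Sitemap: /sitemap.xml
"""
for problem in lint_robots(draft):
    print(problem)
```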

How is this different from the Robots.txt Generator tool?

The Analyzer grades an existing robots.txt. The Robots.txt Generator builds one from scratch using a wizard (Generic, WordPress, WooCommerce, Shopify templates plus AI crawler blocks and custom rules). Use the Generator to build, the Analyzer to verify what is live. For a focused AI-bot-only audit, run the AI Crawler Access Checker.

How often should I run this audit?

Run it after every robots.txt change, every CMS migration, every major site relaunch, and at least once a quarter on production as a regression check. A two-minute audit is a cheap way to catch the kind of mistake that costs a year of rankings. Pair it with a full site audit on the same domain so crawler-policy issues and indexability findings line up.