AI Crawler Access Check
Free tool to test which AI crawlers (GPTBot, ClaudeBot, PerplexityBot, Google-Extended) your robots.txt allows. 22+ bots covered.
About the tool
What is the SBMM AI Crawler Access Checker?
The SBMM AI Crawler Access Checker is a free online tool that tests any URL against 22+ known AI crawlers and returns a clear allowed-or-blocked verdict per bot. It shows which AI training crawlers are permitted to ingest your content, which AI search engines can fetch and cite your pages, and which crawlers your robots.txt is silently letting through when you thought it was blocking them.
AI crawlers are the bots that ChatGPT, Claude, Gemini, Perplexity, and other generative engines send to your website. They read your robots.txt to decide what they can fetch for training, for live search, and for on-demand answers to user queries. Many site owners do not realise that the same vendor often runs multiple bots (one for training, one for live search), each of which needs its own access rule.
This checker covers 22+ AI crawlers across the major vendors, classifies each bot by purpose (training versus search versus on-demand), reads your robots.txt, and tells you for each bot whether it is allowed or blocked, along with the exact rule that produced the verdict. You finish the audit knowing what your AI access policy actually is, not what you assumed it was.
Step by step
How to use this tool in 3 steps
-
Step 01
Enter the URL you want to check
Drop any public URL into the form. The checker fetches the robots.txt for that domain and parses every User-agent group to see how each known AI crawler is treated.
-
Step 02
Tested against 22+ AI crawlers
Each AI bot in the matrix is matched against the robots.txt rules in priority order: a User-agent group that names the bot takes precedence, and the wildcard group (User-agent: *) applies only when no named group matches. The checker records the matched rule and the resulting allow / block verdict.
-
Step 03
Read the per-bot verdict matrix
See a clear table: bot name, vendor, purpose (training / search / on-demand), allowed or blocked verdict, the exact robots.txt rule that decided it, and a link to the vendor documentation page so you can verify the bot identity.
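The three steps above can be sketched with Python's standard-library robots.txt parser. This is an illustration only, not the checker's actual implementation: the `AI_BOTS` list, the `check_bots` helper, and the sample rules are assumptions made for the example. Note also that `urllib.robotparser` applies the first matching rule within a group, whereas some crawlers use longest-match precedence, so results can differ on files with overlapping Allow / Disallow lines.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical subset of the bot matrix; the real tool covers 22+ crawlers.
AI_BOTS = ["GPTBot", "OAI-SearchBot", "ClaudeBot", "PerplexityBot", "Google-Extended"]


def check_bots(robots_txt: str, url: str, bots=AI_BOTS) -> dict:
    """Return {bot name: True if allowed} for one URL, given a robots.txt body."""
    parser = RobotFileParser()
    # parse() takes the file as a list of lines; to fetch a live file instead,
    # call parser.set_url("https://example.com/robots.txt") and parser.read().
    parser.parse(robots_txt.splitlines())
    return {bot: parser.can_fetch(bot, url) for bot in bots}


# Sample rules (an assumption for the example): GPTBot is blocked site-wide,
# every other bot is only kept out of /private/.
SAMPLE = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Disallow: /private/
"""

verdicts = check_bots(SAMPLE, "https://example.com/blog/post")
for bot, allowed in verdicts.items():
    print(f"{bot:16} {'allowed' if allowed else 'blocked'}")
```

With these sample rules, only GPTBot is blocked on the blog URL; the other bots fall through to the wildcard group, which blocks nothing outside /private/.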
Why this tool
Why use this tool
-
22+ AI crawlers covered
GPTBot, OAI-SearchBot, ChatGPT-User, ClaudeBot, Claude-Web, anthropic-ai, Claude-SearchBot, Google-Extended, PerplexityBot, Applebot-Extended, Meta-ExternalAgent, Bytespider, CCBot, cohere-ai, Omgilibot, FacebookBot, and more.
-
Training vs search bot classification
Each bot is tagged by purpose. Training bots ingest content for future model training. Search bots crawl pages so they can be indexed and cited in AI search results. On-demand bots fetch a page when a user's query calls for it. The right policy depends on which jobs you want to allow.
-
Per-rule diagnostic
Every verdict shows the exact robots.txt rule that produced it (User-agent group plus matched Disallow / Allow line) so you can fix the rule, not guess. No more "I thought I blocked GPTBot" surprises.
-
Vendor doc links per bot
Every bot in the matrix links back to the vendor documentation page that defines its user agent string, purpose, and intended behaviour, so you can check each entry against the vendor's latest published details.
-
Live robots.txt fetch
The checker fetches the live robots.txt at audit time, not from a stale cache, so the result reflects the rules your domain is serving right now.
-
Free, 10 runs a day
Run it on your own site, a client's, or a competitor's. Ten free audits per day cover any normal SEO or content-rights workflow. SBMM Pro lifts the cap and adds bulk multi-URL crawler audits.
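To make the per-rule diagnostic concrete, here is a minimal sketch of how a verdict can carry the group and rule that produced it. It is not the tool's real code: `Verdict`, `parse_groups`, and `diagnose` are hypothetical names, the matcher handles plain path prefixes only (no `*` or `$` wildcards), and it assumes longest-match precedence with Allow winning ties.

```python
from dataclasses import dataclass


@dataclass
class Verdict:
    bot: str
    allowed: bool
    group: str  # User-agent token of the group that applied ("" if none)
    rule: str   # matched Allow / Disallow line ("" if no rule matched)


def parse_groups(robots_txt: str) -> dict:
    """Map each lowercased User-agent token to its list of (directive, path)."""
    groups = {}
    current = []            # tokens of the group currently being read
    reading_agents = False  # True while consecutive User-agent lines run
    for raw in robots_txt.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments
        if ":" not in line:
            continue
        field, value = (part.strip() for part in line.split(":", 1))
        field = field.lower()
        if field == "user-agent":
            if not reading_agents:
                current = []  # a new group starts here
            reading_agents = True
            current.append(value.lower())
            groups.setdefault(value.lower(), [])
        elif field in ("allow", "disallow"):
            reading_agents = False
            for token in current:
                groups[token].append((field, value))
    return groups


def diagnose(robots_txt: str, bot: str, path: str) -> Verdict:
    groups = parse_groups(robots_txt)
    # A group naming the bot takes precedence; otherwise fall back to "*".
    token = bot.lower() if bot.lower() in groups else "*"
    rules = groups.get(token)
    if rules is None:
        return Verdict(bot, True, "", "")
    best = None
    for directive, prefix in rules:
        if prefix and path.startswith(prefix):
            key = (len(prefix), directive == "allow")  # longest wins, Allow breaks ties
            if best is None or key > best[0]:
                best = (key, directive, prefix)
    if best is None:
        return Verdict(bot, True, token, "")
    _, directive, prefix = best
    return Verdict(bot, directive == "allow", token, f"{directive.capitalize()}: {prefix}")


# Sample file (an assumption): the owner expects "Disallow: /" to cover GPTBot.
SAMPLE = """\
User-agent: *
Disallow: /

User-agent: GPTBot
Disallow: /drafts/
"""

print(diagnose(SAMPLE, "GPTBot", "/blog/post"))
print(diagnose(SAMPLE, "ClaudeBot", "/blog/post"))
```

In this sample, GPTBot is allowed on /blog/post because its own group replaces the wildcard group rather than adding to it. That is the kind of "I thought I blocked GPTBot" surprise a per-rule diagnostic is meant to expose.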
FAQ
Frequently asked questions
What is an AI crawler?
An AI crawler is an automated bot that visits your website on behalf of an AI vendor (OpenAI, Anthropic, Google, Perplexity, Meta, and others) to ingest your content for model training, live AI search, or on-demand answers to user queries. Each vendor runs one or more named crawlers with distinct user agent strings and purposes.
How are AI crawlers different from search engine crawlers?
Search engine crawlers like Googlebot exist to build a search index. AI crawlers exist to ingest content for training or for live model retrieval. They use different user agent strings, may crawl at different rates, and are matched by their own User-agent tokens in robots.txt, so a site owner who wants to treat them differently needs an explicit rule group for each.
Should I block AI crawlers from my site?
It depends on your content rights strategy. Disallowing GPTBot, ClaudeBot, and Google-Extended tells those vendors not to use your content to train future model versions, but it does not affect live AI search citations from PerplexityBot or OAI-SearchBot. A common approach among publishers is to block training crawlers while allowing live search crawlers, so they keep getting cited.
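As an illustration of that "block training, allow search" policy, the sketch below embeds one possible robots.txt and verifies it with Python's standard-library parser. The exact grouping of bots is an assumption for the example, not a recommendation for every site, and the `allowed` helper is a hypothetical name.

```python
from urllib.robotparser import RobotFileParser

# One possible policy (an assumption): training tokens blocked site-wide,
# live-search crawlers explicitly allowed.
POLICY = """\
User-agent: GPTBot
User-agent: ClaudeBot
User-agent: Google-Extended
Disallow: /

User-agent: OAI-SearchBot
User-agent: PerplexityBot
Allow: /
"""

parser = RobotFileParser()
parser.parse(POLICY.splitlines())


def allowed(bot: str, url: str = "https://example.com/article") -> bool:
    """True if the policy above lets `bot` fetch `url`."""
    return parser.can_fetch(bot, url)


for bot in ("GPTBot", "ClaudeBot", "Google-Extended", "OAI-SearchBot", "PerplexityBot"):
    print(f"{bot:16} {'allowed' if allowed(bot) else 'blocked'}")
```

Because this file has no wildcard group, any bot it does not name is allowed by default, so re-check the policy whenever a new crawler appears.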
What is the difference between GPTBot and OAI-SearchBot?
Both are OpenAI bots. GPTBot crawls your site to gather training data for OpenAI's future models. OAI-SearchBot crawls for live AI search results inside ChatGPT. Blocking GPTBot opts your content out of training; blocking OAI-SearchBot means your pages are unlikely to be cited in ChatGPT search answers when users ask questions in your niche.
What is Google-Extended?
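Google-Extended is a robots.txt product token rather than a separate crawler: Google fetches pages with its existing user agents and uses the Google-Extended token only as a control. Disallowing Google-Extended in robots.txt opts your content out of Gemini model training and of grounding in Gemini apps, without affecting normal Googlebot crawling or regular search ranking. It does not opt you out of AI Overviews, which are part of Google Search and follow Googlebot rules and Search preview controls.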
Do AI crawlers respect robots.txt?
The major vendors (OpenAI, Anthropic, Google, Perplexity, Apple, Meta) publicly commit to respecting robots.txt. A small set of lower-quality scrapers ignore it. If you need stronger enforcement than robots.txt, add server-side IP blocks or user-agent firewall rules on top.
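A user-agent firewall rule of that kind could look like the following WSGI middleware. Treat it as a sketch under stated assumptions: `BLOCKED_UA_TOKENS`, `is_blocked`, and `BlockAICrawlers` are hypothetical names, and the deny list is an example only. Because the User-Agent header is easy to spoof, this catches only bots that identify themselves honestly; IP-based blocks are needed for the rest.

```python
# Hypothetical deny list; tokens are matched as case-insensitive substrings
# of the User-Agent request header.
BLOCKED_UA_TOKENS = ("bytespider", "ccbot")


def is_blocked(user_agent: str) -> bool:
    """True if the User-Agent header contains any deny-listed token."""
    ua = user_agent.lower()
    return any(token in ua for token in BLOCKED_UA_TOKENS)


class BlockAICrawlers:
    """WSGI middleware that answers 403 to deny-listed user agents."""

    def __init__(self, app):
        self.app = app

    def __call__(self, environ, start_response):
        if is_blocked(environ.get("HTTP_USER_AGENT", "")):
            body = b"Forbidden"
            start_response("403 Forbidden", [
                ("Content-Type", "text/plain"),
                ("Content-Length", str(len(body))),
            ])
            return [body]
        # Everyone else is passed through to the wrapped application.
        return self.app(environ, start_response)
```

Wrap any WSGI application with `BlockAICrawlers(app)` to apply the rule before the request reaches your site code.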
Why does the checker need to read my robots.txt?
Robots.txt is the file compliant AI crawlers read to decide what they can fetch. The verdict for each bot depends on the matched rule in that file. Without reading the live robots.txt, the checker cannot tell you which bots are permitted to access your site.
How often should I run this audit?
Run it after any robots.txt change, after a major site migration, and at least once a quarter as a maintenance pass. New AI crawlers launch frequently, and existing rules sometimes break in unexpected ways after a CMS upgrade or theme switch, so quarterly is a sane cadence. After verifying access, consider publishing an llms.txt file (a proposed convention that not every vendor reads yet) so the bots you allow get a curated reading list.
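For reference, an llms.txt file is a small Markdown document: a title, a one-line summary in a blockquote, and sections of annotated links. The generator below is a rough sketch following the proposed llms.txt layout; `build_llms_txt` is a hypothetical helper, and the site name and URLs are placeholders.

```python
def build_llms_txt(title: str, summary: str, sections: dict) -> str:
    """Render an llms.txt body.

    `sections` maps a heading to a list of (link name, url, note) tuples.
    """
    lines = [f"# {title}", "", f"> {summary}", ""]
    for heading, links in sections.items():
        lines.append(f"## {heading}")
        lines.append("")
        for name, url, note in links:
            lines.append(f"- [{name}]({url}): {note}")
        lines.append("")
    return "\n".join(lines).rstrip() + "\n"


# Placeholder content for the example.
manifest = build_llms_txt(
    "Example Site",
    "Guides and reference pages for Example Site.",
    {
        "Docs": [
            ("Getting started", "https://example.com/start", "Setup walkthrough"),
            ("Pricing", "https://example.com/pricing", "Plans and limits"),
        ],
    },
)
print(manifest)
```

Serve the result as plain text at /llms.txt on your domain, and keep it in step with the pages your robots.txt actually allows.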