LLMs.txt Generator
Generate an llms.txt manifest that follows the llmstxt.org proposal. Point AI assistants such as ChatGPT, Claude, Gemini, and Perplexity at the pages you most want them to read. Free, unlimited use.
About the tool
What is the SBMM LLMs.txt Generator?
The SBMM LLMs.txt Generator is a free online tool that produces a spec-compliant llms.txt manifest for your website. LLMs.txt is a proposed standard, published at llmstxt.org in September 2024, for telling large language models and the tools built on them (ChatGPT, Claude, Gemini, Perplexity) which pages on your site are worth ingesting and citing. Drop in a domain, get back a clean manifest you can upload to the root of your site.
LLMs.txt aims to do for AI search what XML sitemaps do for traditional search. Instead of dumping every URL on your site at the crawler, you hand it a curated list of high-value pages grouped by section, each with a short description that helps the model understand what the page is about before ingesting it. The goal is more accurate citations and fewer hallucinated quotes; how much AI-search visibility you gain depends on which systems actually read the file.
The generator works in two modes. Crawl mode discovers your pages automatically by following internal links from the homepage. Sitemap mode reads your existing sitemap.xml and skips the crawl. Either way the output is a single llms.txt file, grouped by section, ready to upload to your-site.com/llms.txt. The free tier is unlimited.
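As a rough illustration of what that output looks like, the sketch below assembles a manifest in the shape the llmstxt.org proposal describes: an H1 with the site name, a blockquote summary, then H2 sections of Markdown links with optional descriptions. The `Page` class, the `render_llms_txt` function, and the example.com URLs are hypothetical names chosen for this example, not part of the SBMM generator.

```python
from dataclasses import dataclass


@dataclass
class Page:
    title: str
    url: str
    description: str = ""  # descriptions are optional in the proposal


def render_llms_txt(site_name: str, summary: str, sections: dict) -> str:
    """Build an llms.txt body: H1, blockquote summary, H2 sections of links."""
    lines = [f"# {site_name}", "", f"> {summary}"]
    for heading, pages in sections.items():
        lines += ["", f"## {heading}", ""]
        for page in pages:
            entry = f"- [{page.title}]({page.url})"
            if page.description:
                entry += f": {page.description}"
            lines.append(entry)
    return "\n".join(lines) + "\n"


# Hypothetical site data, for illustration only.
manifest = render_llms_txt(
    "Example Site",
    "Guides and documentation for the Example product.",
    {
        "Docs": [
            Page(
                "Quick start",
                "https://example.com/docs/quick-start",
                "Install and run in five minutes",
            )
        ],
        "Blog": [Page("Release notes", "https://example.com/blog/release-notes")],
    },
)
print(manifest)
```

Saving the returned string as llms.txt at the site root gives a file that both people and parsers can read, since it is ordinary Markdown.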
Step by step
How to use this tool in 3 steps
-
Step 01
Enter your domain
Drop your homepage URL into the form. Pick crawl mode (we discover pages by following internal links) or sitemap mode (we read your existing sitemap.xml) depending on which signal is cleaner for your site.
-
Step 02
We group your URLs by section
The generator pulls the page list, groups URLs by section using your URL structure and on-page H1 signals, drafts a one-line description per page, and assembles a clean spec-compliant llms.txt manifest.
-
Step 03
Download and upload to your root
Download the generated llms.txt file. Upload it to the root of your domain so it lives at your-site.com/llms.txt, exactly like robots.txt. Crawlers that support llms.txt pick it up on their next fetch with no further work needed from you.
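The grouping in Step 02 can be approximated with URL structure alone. The sketch below is a simplified assumption about how such grouping might work, not the generator's actual logic: it names each section after the first path segment and ignores the H1 signal entirely. `section_for` and `group_by_section` are hypothetical helper names.

```python
from collections import defaultdict
from urllib.parse import urlparse


def section_for(url: str) -> str:
    """Name a section after the first path segment; the homepage goes to 'Main'."""
    segments = [s for s in urlparse(url).path.split("/") if s]
    if not segments:
        return "Main"
    return segments[0].replace("-", " ").title()


def group_by_section(urls):
    """Bucket URLs by section name, keeping the input order within each bucket."""
    groups = defaultdict(list)
    for url in urls:
        groups[section_for(url)].append(url)
    return dict(groups)


# Hypothetical URLs, for illustration only.
groups = group_by_section(
    [
        "https://example.com/",
        "https://example.com/blog/llms-txt-guide",
        "https://example.com/blog/ai-search-basics",
        "https://example.com/docs/quick-start",
        "https://example.com/case-studies/acme",
    ]
)
for name, members in groups.items():
    print(name, len(members))
```

A path-only heuristic like this breaks down on sites with flat URL schemes, which is presumably why a second signal such as the on-page H1 is useful.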
Why this tool
Why use this tool
-
llmstxt.org spec compliant
Follows the September 2024 llms.txt specification exactly: an H1 with the site name, a blockquote summary, H2 section headings, and one Markdown link list item per URL with an optional description. Validates against the reference parser.
-
Crawl or sitemap source
Use crawl mode to discover pages from your homepage, or sitemap mode to read pages directly from your existing sitemap.xml. Sitemap mode is faster and respects whatever URL-set logic you already maintain.
-
Auto-grouped by section
URLs are bucketed into logical groups (Blog, Docs, Products, Pricing, About) using URL path structure plus H1 detection so the manifest reads like a curated table of contents, not a flat URL dump.
-
Per-URL descriptions
Each listed page gets a short description pulled from the page meta description or H1 so the AI crawler has context before it decides whether to fetch the full page.
-
AI search citation lift
Listing a page in your llms.txt can make it easier for ChatGPT, Claude, Gemini, and Perplexity to find and cite it, especially for niche or technical queries where AI search struggles to find authoritative sources. Results vary by vendor, because support for llms.txt is not yet universal.
-
Free, unlimited runs
No daily cap, no email gate, no paywall on the generator. Run as often as you publish new content. SBMM Pro is optional and adds multi-site management and automated re-generation on a schedule.
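The per-URL description fallback mentioned above (meta description first, H1 otherwise) can be sketched with Python's standard `html.parser` module. This is an illustrative assumption about the approach rather than the generator's real code; `DescriptionParser` and `describe` are hypothetical names.

```python
from html.parser import HTMLParser


class DescriptionParser(HTMLParser):
    """Collect the meta description and the text of the first H1."""

    def __init__(self):
        super().__init__()
        self.meta_description = ""
        self.h1 = ""
        self._in_h1 = False

    def handle_starttag(self, tag, attrs):
        attr = dict(attrs)
        if tag == "meta" and (attr.get("name") or "").lower() == "description":
            self.meta_description = (attr.get("content") or "").strip()
        elif tag == "h1" and not self.h1:
            self._in_h1 = True  # only the first H1 is recorded

    def handle_endtag(self, tag):
        if tag == "h1":
            self._in_h1 = False

    def handle_data(self, data):
        if self._in_h1:
            self.h1 += data


def describe(html: str) -> str:
    """Prefer the meta description; fall back to the first H1's text."""
    parser = DescriptionParser()
    parser.feed(html)
    return parser.meta_description or " ".join(parser.h1.split())


with_meta = describe(
    '<html><head><meta name="description" content="Pricing plans.">'
    "</head><body><h1>Pricing</h1></body></html>"
)
without_meta = describe("<h1>About <em>us</em></h1>")
print(with_meta)
print(without_meta)
```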
FAQ
Frequently asked questions
What is LLMs.txt?
LLMs.txt is a proposed web standard, published at llmstxt.org in 2024, that lets you publish a manifest of high-value pages on your site for large language model crawlers (ChatGPT, Claude, Gemini, Perplexity) to ingest. It is the AI search equivalent of an XML sitemap, optimised for retrieval rather than discovery.
Do I need both robots.txt and llms.txt?
Yes, they serve different jobs. Robots.txt is the access layer that tells crawlers what they may and may not fetch. LLMs.txt is the curation layer that tells crawlers what is worth fetching first. Both files coexist at the root of your site. Well-behaved crawlers honour robots.txt; llms.txt is read only by crawlers that have chosen to support it.
Will ChatGPT and Perplexity actually read my llms.txt?
Possibly, but it is not guaranteed. Support for llms.txt is uneven and not always documented, and AI vendors have not uniformly committed to reading the file, so treat it as a low-cost signal rather than a ranking lever. Publishing one today means your site is ready if more models adopt the proposal as it matures.
Where do I upload the file?
Upload it to the root of your domain so it lives at your-site.com/llms.txt. The proposal fixes the path at /llms.txt, much as robots.txt has a fixed location (a sitemap, by contrast, can live at any URL you declare). Crawlers that support llms.txt fetch it from that exact location with no extra discovery hint required.
Should I list every page on my site?
No. The point of llms.txt is curation. List the pages that genuinely deserve a citation: high-quality articles, original research, foundational guides, product documentation. Skip thin pages, paginated archives, tag pages, and low-value content the crawler does not need.
How often should I re-generate it?
Re-generate after any major content addition (new pillar article, new product launch, fresh research piece) and at least once a quarter as a maintenance pass. Crawlers re-fetch the file on their own schedules, so allow some lag before changes are picked up.
What if I block AI crawlers in robots.txt?
If your robots.txt blocks GPTBot, ClaudeBot, or other AI crawlers from your whole site, compliant crawlers will not fetch your llms.txt either, because it sits under the same blocked root. The two signals must align. Use our AI Crawler Access Checker to verify which bots are currently allowed, then either allow AI access to your high-value pages and publish llms.txt to curate that access, or block AI access entirely (via our Robots.txt Generator) and skip llms.txt.
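To check this alignment yourself, Python's standard `urllib.robotparser` can evaluate a robots.txt against a given user agent. The sketch below parses an in-memory robots.txt rather than fetching one; the rules shown, the `can_fetch_llms_txt` helper, and the example.com host are assumptions for illustration, and this is not how the AI Crawler Access Checker itself is implemented.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: GPTBot is blocked site-wide, everyone else is allowed.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Allow: /
"""


def can_fetch_llms_txt(robots_txt: str, user_agent: str,
                       site: str = "https://example.com") -> bool:
    """Return True if robots.txt lets this user agent fetch /llms.txt."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, f"{site}/llms.txt")


for bot in ("GPTBot", "ClaudeBot"):
    print(bot, can_fetch_llms_txt(ROBOTS_TXT, bot))
```

With these rules GPTBot is denied and ClaudeBot falls through to the wildcard entry, so only the latter could read the manifest.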
Does the generator follow my sitemap nesting?
Yes. Sitemap mode follows nested sitemap index files (the master sitemap.xml that links to sub-sitemaps like blog-sitemap.xml, product-sitemap.xml) to any depth so it picks up your full URL set even on large sites.
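Following nested sitemap index files is a short recursion over the sitemaps.org XML format. The sketch below is one possible approach, not the generator's actual code: `collect_urls` is a hypothetical function, and the `fetch` argument is a stand-in for an HTTP client so the example runs against in-memory XML with no network access.

```python
import xml.etree.ElementTree as ET
from typing import Callable, List, Optional, Set

NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def collect_urls(sitemap_url: str, fetch: Callable[[str], str],
                 seen: Optional[Set[str]] = None) -> List[str]:
    """Return page URLs from a sitemap, descending into sitemap index files."""
    seen = set() if seen is None else seen
    if sitemap_url in seen:  # guard against index files that reference each other
        return []
    seen.add(sitemap_url)
    root = ET.fromstring(fetch(sitemap_url))
    if root.tag.endswith("sitemapindex"):
        urls: List[str] = []
        for loc in root.findall("sm:sitemap/sm:loc", NS):
            urls += collect_urls(loc.text.strip(), fetch, seen)
        return urls
    return [loc.text.strip() for loc in root.findall("sm:url/sm:loc", NS)]


# Hypothetical sitemap documents keyed by URL, standing in for HTTP responses.
XMLNS = 'xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"'
FAKE_SITE = {
    "https://example.com/sitemap.xml": (
        f"<sitemapindex {XMLNS}>"
        "<sitemap><loc>https://example.com/blog-sitemap.xml</loc></sitemap>"
        "<sitemap><loc>https://example.com/product-sitemap.xml</loc></sitemap>"
        "</sitemapindex>"
    ),
    "https://example.com/blog-sitemap.xml": (
        f"<urlset {XMLNS}><url><loc>https://example.com/blog/post-1</loc></url></urlset>"
    ),
    "https://example.com/product-sitemap.xml": (
        f"<urlset {XMLNS}><url><loc>https://example.com/products/widget</loc></url></urlset>"
    ),
}

urls = collect_urls("https://example.com/sitemap.xml", FAKE_SITE.__getitem__)
print(urls)
```

In real use, `fetch` would wrap an HTTP request with timeouts and error handling, and very large sites might call for an explicit depth or URL-count limit.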