What Is robots.txt?
A robots.txt file is a plain text file at the root of your website that tells web crawlers which URLs they can and cannot access. It follows the Robots Exclusion Protocol (REP), which was created in 1994 as a de facto standard and formalized as RFC 9309 in 2022. Virtually all legitimate bots respect it, from search engine crawlers like Googlebot and Bingbot to AI training crawlers like GPTBot and ClaudeBot.
When a crawler arrives at your site, the first file it requests is /robots.txt. Based on the directives it finds, the crawler decides which paths to visit and which to skip. This gives site owners granular control over crawl behavior, crawl budget allocation, and — increasingly — whether AI companies can use their content for model training.
How robots.txt Works
The file consists of one or more rule groups, each targeting a specific user-agent (crawler). Each group contains Allow and Disallow directives that specify URL paths.
User-agent: *
Disallow: /admin/
Disallow: /api/
Allow: /api/docs/
User-agent: GPTBot
Disallow: /
Sitemap: https://example.com/sitemap.xml
The * wildcard matches all crawlers. Specific user-agent rules override the wildcard for that bot — so in the example above, GPTBot is blocked from everything while other crawlers can access most of the site.
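To see how a compliant crawler might read the example above, here is a minimal sketch using Python's standard urllib.robotparser module. The bot name SomeOtherBot and the example.com URLs are placeholders chosen for illustration. This parser is only one implementation of the protocol, and it can differ from Googlebot in edge cases.

```python
from urllib.robotparser import RobotFileParser

# The example rules from above; blank lines separate the groups.
ROBOTS_TXT = """\
User-agent: *
Disallow: /admin/
Disallow: /api/
Allow: /api/docs/

User-agent: GPTBot
Disallow: /

Sitemap: https://example.com/sitemap.xml
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())


def may_crawl(agent: str, url: str) -> bool:
    """Ask the parsed rules whether `agent` is permitted to fetch `url`."""
    return parser.can_fetch(agent, url)


# GPTBot has its own group, so it is blocked everywhere.
print(may_crawl("GPTBot", "https://example.com/blog/"))              # False
# A bot with no dedicated group falls back to the * group.
print(may_crawl("SomeOtherBot", "https://example.com/blog/"))        # True
print(may_crawl("SomeOtherBot", "https://example.com/admin/users"))  # False
```

In a real crawler you would typically call set_url() and read() on the parser to fetch the live file instead of parsing a string.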
How to Use This Generator
- Select a user-agent from the dropdown — includes Googlebot, Bingbot, GPTBot, ClaudeBot, and other common bots — or type a custom one.
- Add paths you want to allow or disallow for that user-agent.
- Add more user-agent groups if you need different rules for different crawlers.
- Enter your sitemap URL (optional but recommended for SEO).
- Click Generate (Ctrl+Enter) to build your robots.txt.
- Copy the result and upload it to your website’s root directory.
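The steps above amount to assembling rule groups and an optional sitemap line into text. The following is a rough sketch of that idea, not this generator's actual implementation. The Group class and build_robots_txt function are hypothetical names introduced for illustration.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Group:
    """One rule group: a user-agent plus its Allow/Disallow paths (hypothetical type)."""
    user_agent: str
    allow: list[str] = field(default_factory=list)
    disallow: list[str] = field(default_factory=list)


def build_robots_txt(groups: list[Group], sitemap: Optional[str] = None) -> str:
    """Render groups as robots.txt text, one blank line between groups."""
    blocks = []
    for g in groups:
        lines = [f"User-agent: {g.user_agent}"]
        # Allow lines are emitted first so order-sensitive parsers see them early.
        lines += [f"Allow: {p}" for p in g.allow]
        lines += [f"Disallow: {p}" for p in g.disallow]
        blocks.append("\n".join(lines))
    if sitemap:
        blocks.append(f"Sitemap: {sitemap}")
    return "\n\n".join(blocks) + "\n"


text = build_robots_txt(
    [
        Group("*", allow=["/api/docs/"], disallow=["/admin/", "/api/"]),
        Group("GPTBot", disallow=["/"]),
    ],
    sitemap="https://example.com/sitemap.xml",
)
print(text)
```

The resulting string can be saved as robots.txt and uploaded to the site root.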
robots.txt Directives Reference
| Directive | Purpose | Example |
|---|---|---|
| User-agent | Which crawler the following rules apply to | User-agent: Googlebot |
| Disallow | Block access to a path | Disallow: /admin/ |
| Allow | Permit access to a path inside a broader Disallow | Allow: /admin/public/ |
| Sitemap | Point crawlers to your XML sitemap | Sitemap: https://example.com/sitemap.xml |
| Crawl-delay | Seconds between requests (Bing, Yandex; not Google) | Crawl-delay: 10 |
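When Allow and Disallow rules conflict for the same path, most crawlers (including Google) apply the most specific rule: the one with the longest matching path wins. If an Allow rule and a Disallow rule match with equal length, Google applies the less restrictive Allow rule.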
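The longest-match rule can be sketched in a few lines. This is a simplified illustration: it treats each rule as a plain path prefix, ignores the * and $ wildcards, and assumes empty rule values have already been filtered out. The function name longest_match_allows is hypothetical. Resolving ties in favor of Allow follows the recommendation in RFC 9309.

```python
def longest_match_allows(rules, path):
    """Return True if `path` may be crawled under longest-prefix precedence.

    `rules` is a list of (directive, prefix) pairs from a single group,
    for example ("Disallow", "/api/").
    """
    best_len = -1
    allowed = True  # a path matched by no rule is crawlable
    for directive, prefix in rules:
        if not path.startswith(prefix):
            continue
        is_allow = directive.lower() == "allow"
        longer = len(prefix) > best_len
        tie_for_allow = len(prefix) == best_len and is_allow
        if longer or tie_for_allow:
            best_len, allowed = len(prefix), is_allow
    return allowed


rules = [("Disallow", "/api/"), ("Allow", "/api/docs/")]
print(longest_match_allows(rules, "/api/docs/intro"))  # True: the Allow prefix is longer
print(longest_match_allows(rules, "/api/users"))       # False: only the Disallow matches
print(longest_match_allows(rules, "/blog/"))           # True: no rule matches
```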
Blocking AI Crawlers in 2025–2026
Managing AI crawler access has become one of the primary reasons site owners update their robots.txt. Major AI companies have published official user-agent strings:
- GPTBot — OpenAI (ChatGPT, GPT model training)
- ClaudeBot — Anthropic (Claude model training)
- Google-Extended — Google (Gemini AI training, separate from Googlebot search indexing)
- CCBot — Common Crawl (dataset used by many AI labs)
- Bytespider — ByteDance (TikTok, AI models)
- Applebot-Extended — Apple (Apple Intelligence training)
Blocking these crawlers does not affect your search engine rankings — GPTBot and Google-Extended are distinct from Googlebot. You can block AI training while keeping full search visibility:
User-agent: Googlebot
Allow: /
User-agent: GPTBot
Disallow: /
User-agent: ClaudeBot
Disallow: /
User-agent: Google-Extended
Disallow: /
This generator includes all major AI crawler user-agents in its dropdown menu for quick selection.
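If you want to check which AI crawlers a given file actually blocks, a small audit along these lines may help. It is a sketch built on Python's standard urllib.robotparser; the function name blocked_ai_crawlers is hypothetical, and the token list simply mirrors the crawlers named above. "Blocked" here means the token cannot fetch the site root.

```python
from urllib.robotparser import RobotFileParser

# Tokens taken from the list above.
AI_TOKENS = [
    "GPTBot",
    "ClaudeBot",
    "Google-Extended",
    "CCBot",
    "Bytespider",
    "Applebot-Extended",
]


def blocked_ai_crawlers(robots_txt: str) -> list[str]:
    """Return the AI crawler tokens that may not fetch "/" under these rules."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [token for token in AI_TOKENS if not parser.can_fetch(token, "/")]


EXAMPLE = """\
User-agent: Googlebot
Allow: /

User-agent: GPTBot
Disallow: /

User-agent: ClaudeBot
Disallow: /

User-agent: Google-Extended
Disallow: /
"""

print(blocked_ai_crawlers(EXAMPLE))  # ['GPTBot', 'ClaudeBot', 'Google-Extended']
```

In this example CCBot, Bytespider, and Applebot-Extended are not reported, because the file has no group for them and no wildcard group.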
Common robots.txt Templates
Allow everything (default):
User-agent: *
Allow: /
Sitemap: https://example.com/sitemap.xml
Block specific directories:
User-agent: *
Disallow: /admin/
Disallow: /staging/
Sitemap: https://example.com/sitemap.xml
Allow search engines, block AI crawlers:
User-agent: *
Allow: /
User-agent: GPTBot
Disallow: /
User-agent: ClaudeBot
Disallow: /
User-agent: Google-Extended
Disallow: /
Sitemap: https://example.com/sitemap.xml
Block everything (development/staging sites):
User-agent: *
Disallow: /
Common Mistakes
- Blocking CSS and JS files. Search engines need access to render your pages properly. Blocking /assets/ or /static/ can hurt your search rankings because Googlebot cannot evaluate your page layout.
- Using robots.txt for security. The file is publicly readable and purely advisory. Never rely on it to protect sensitive data; use authentication, access controls, or server-side IP restrictions instead.
- Forgetting trailing slashes. Disallow: /admin matches any URL starting with /admin, including /administration and /admin-tools. Use Disallow: /admin/ to target only the /admin/ directory.
- Missing the wildcard user-agent. If you only write rules for specific bots (like Googlebot), all other crawlers see no restrictions and crawl everything. Always include a User-agent: * block as a baseline.
- Conflicting rule order. While rule order within a group does not matter for Google (it uses specificity), some older crawlers process rules top-to-bottom. Put more specific Allow rules before broader Disallow rules for maximum compatibility.
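Two of the mistakes above, trailing slashes and rule order, are easy to reproduce in code. The sketch below assumes plain prefix matching and a hypothetical first_match_allows function that mimics a top-to-bottom parser. Python's standard urllib.robotparser has historically behaved this way, though you should verify the behavior of whichever parser you target.

```python
def first_match_allows(rules, path):
    """Top-to-bottom evaluation: the first rule whose prefix matches decides."""
    for directive, prefix in rules:
        if path.startswith(prefix):
            return directive == "Allow"
    return True  # no rule matched, so crawling is permitted


# Trailing slash: "/admin" is a prefix of unrelated paths, "/admin/" is not.
print("/administration".startswith("/admin"))   # True
print("/administration".startswith("/admin/"))  # False

# Rule order: the same two rules give different answers under first-match.
broad_first = [("Disallow", "/api/"), ("Allow", "/api/docs/")]
specific_first = [("Allow", "/api/docs/"), ("Disallow", "/api/")]
print(first_match_allows(broad_first, "/api/docs/intro"))     # False
print(first_match_allows(specific_first, "/api/docs/intro"))  # True
```

Listing the specific Allow rule first gives the same result under both first-match and longest-match evaluation, which is why that ordering is the safer choice.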
robots.txt vs Meta Robots Tag vs HTTP Headers
| Method | Controls | Scope | Enforcement |
|---|---|---|---|
| robots.txt | Crawling (whether a bot visits a URL) | Entire directories or paths | Advisory; bots choose to comply |
| <meta name="robots"> | Indexing (noindex), following links (nofollow) | Individual pages | Respected by major search engines |
| X-Robots-Tag HTTP header | Same as meta robots | Individual URLs or file types (PDFs, images) | Respected by major search engines |
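For complete control, combine them: use robots.txt to manage crawl budget and block unwanted bots, and use meta tags or HTTP headers to control which pages appear in search results. A page blocked by robots.txt might still appear in search results (with no snippet) if other pages link to it; only noindex prevents that. For noindex to take effect, the page must remain crawlable, because a crawler that is blocked by robots.txt never fetches the page and so never sees the meta tag or header.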
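For file types that cannot carry a meta tag, such as PDFs, the X-Robots-Tag header has to be set by the server. Here is a minimal sketch as a WSGI application using only the Python standard library. The policy (noindex for every .pdf path) and the names x_robots_headers and app are assumptions made for illustration; a production site would more likely set this header in its web server or framework configuration.

```python
def x_robots_headers(path: str) -> list[tuple[str, str]]:
    """Return extra response headers for a request path (hypothetical policy)."""
    if path.lower().endswith(".pdf"):
        # Keep PDFs out of search results while leaving them crawlable.
        return [("X-Robots-Tag", "noindex")]
    return []


def app(environ, start_response):
    """Tiny WSGI app that attaches X-Robots-Tag where the policy asks for it."""
    path = environ.get("PATH_INFO", "/")
    headers = [("Content-Type", "text/plain; charset=utf-8")]
    headers += x_robots_headers(path)
    start_response("200 OK", headers)
    return [b"ok\n"]
```

To try it locally, the app can be served with wsgiref.simple_server.make_server("127.0.0.1", 8000, app).serve_forever() and the response headers inspected with any HTTP client.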
Other Ways to Generate robots.txt
- This tool: zero-install, visual editor in your browser with AI crawler presets. Best for quick generation without switching contexts.
- Google Search Console: its robots.txt report, which replaced the retired robots.txt Tester, shows the robots.txt files Google found for your site along with their fetch status and any warnings or errors. Does not generate files, only validates.
- Yoast SEO (WordPress): generates a basic robots.txt through the WordPress dashboard. Limited customization compared to manual editing.
- Manual editing: create a plain text file named robots.txt in any text editor. Full flexibility, but easy to introduce syntax errors.
Use this generator when you want a correct, ready-to-deploy file with proper syntax — especially when you need rules for multiple user-agents including AI crawlers.