robots.txt Generator

Build a robots.txt file with a visual editor, with rules for search engines and AI crawlers

What Is robots.txt?

A robots.txt file is a plain text file at the root of your website that tells web crawlers which URLs they can and cannot access. It follows the Robots Exclusion Protocol (REP), a convention introduced in 1994 and published as RFC 9309 in 2022. Virtually all legitimate bots respect it, from search engine crawlers like Googlebot and Bingbot to AI training crawlers like GPTBot and ClaudeBot.

When a crawler arrives at your site, the first file it requests is /robots.txt. Based on the directives it finds, the crawler decides which paths to visit and which to skip. This gives site owners granular control over crawl behavior, crawl budget allocation, and — increasingly — whether AI companies can use their content for model training.

How robots.txt Works

The file consists of one or more rule groups, each targeting a specific user-agent (crawler). Each group contains Allow and Disallow directives that specify URL paths.

User-agent: *
Disallow: /admin/
Disallow: /api/
Allow: /api/docs/

User-agent: GPTBot
Disallow: /

Sitemap: https://example.com/sitemap.xml

The * wildcard matches any crawler that has no group of its own. A crawler with a named group follows only that group and ignores the * rules, so in the example above GPTBot is blocked from everything, while other crawlers can access the whole site except /admin/ and /api/ (with /api/docs/ still allowed).
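
The group selection described here can be sketched with Python's standard-library urllib.robotparser, applied to a shortened version of the example. The example.com URLs and the check helper are placeholders for illustration. One caveat: this parser evaluates rules in file order (first match) rather than by longest match, so the sketch leaves out the Allow line and only shows which group each bot falls into.

```python
from urllib.robotparser import RobotFileParser

ROBOTS_TXT = """\
User-agent: *
Disallow: /admin/
Disallow: /api/

User-agent: GPTBot
Disallow: /

Sitemap: https://example.com/sitemap.xml
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())


def check(agent: str, path: str) -> bool:
    """Return True if `agent` may fetch `path` on the placeholder host."""
    return parser.can_fetch(agent, "https://example.com" + path)


print(check("Googlebot", "/blog/post"))    # no Googlebot group, so * applies
print(check("Googlebot", "/admin/users"))  # disallowed by the * group
print(check("GPTBot", "/blog/post"))       # GPTBot uses its own group only
```

Under these rules the three checks come out as allowed, blocked, and blocked, in that order.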

How to Use This Generator

  1. Select a user-agent from the dropdown — includes Googlebot, Bingbot, GPTBot, ClaudeBot, and other common bots — or type a custom one.
  2. Add paths you want to allow or disallow for that user-agent.
  3. Add more user-agent groups if you need different rules for different crawlers.
  4. Enter your sitemap URL (optional but recommended for SEO).
  5. Click Generate (Ctrl+Enter) to build your robots.txt.
  6. Copy the result and upload it to your website’s root directory.
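
What these steps produce can be approximated in a few lines of Python. This is a sketch of the general idea, not the generator's actual code (which runs as JavaScript in the browser); the build_robots_txt function and the shape of its groups argument are assumptions made for illustration.

```python
def build_robots_txt(groups, sitemaps=()):
    """Render rule groups and sitemap URLs as robots.txt text.

    `groups` is a list of (user_agent, rules) pairs, where `rules` is a
    list of ("Allow" | "Disallow", path) tuples. This data shape is an
    assumption made for the sketch.
    """
    blocks = []
    for user_agent, rules in groups:
        lines = [f"User-agent: {user_agent}"]
        lines += [f"{directive}: {path}" for directive, path in rules]
        blocks.append("\n".join(lines))
    if sitemaps:
        blocks.append("\n".join(f"Sitemap: {url}" for url in sitemaps))
    # Groups are separated by one blank line; the file ends with a newline.
    return "\n\n".join(blocks) + "\n"


text = build_robots_txt(
    groups=[
        ("*", [("Disallow", "/admin/"), ("Disallow", "/api/"), ("Allow", "/api/docs/")]),
        ("GPTBot", [("Disallow", "/")]),
    ],
    sitemaps=["https://example.com/sitemap.xml"],
)
print(text)
```

Saving the printed text as robots.txt in the site root corresponds to step 6.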

robots.txt Directives Reference

| Directive | Purpose | Example |
|---|---|---|
| User-agent | Which crawler the following rules apply to | User-agent: Googlebot |
| Disallow | Block access to a path | Disallow: /admin/ |
| Allow | Permit access to a path inside a broader Disallow | Allow: /admin/public/ |
| Sitemap | Point crawlers to your XML sitemap | Sitemap: https://example.com/sitemap.xml |
| Crawl-delay | Seconds between requests (Bing, Yandex; not Google) | Crawl-delay: 10 |

When Allow and Disallow rules both match the same URL, most crawlers (including Google) apply the most specific rule: the one with the longest matching path wins. If the matching Allow and Disallow paths are the same length, the less restrictive rule (Allow) is used.
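
A minimal sketch of longest-match resolution, assuming plain path prefixes only (the * and $ wildcards are left out). The is_allowed helper and its rule format are hypothetical; ties go to Allow, in line with RFC 9309 and Google's documented behavior.

```python
def is_allowed(rules, path):
    """Decide access for `path` using longest-prefix matching.

    `rules` is a list of ("Allow" | "Disallow", prefix) tuples taken from
    the group that applies to the crawler. Wildcards are not handled.
    """
    best_len = -1
    allowed = True  # no matching rule means the path is allowed
    for directive, prefix in rules:
        if not prefix or not path.startswith(prefix):
            continue  # an empty rule value matches nothing
        is_allow = directive.lower() == "allow"
        # A longer prefix wins; on equal length, Allow wins.
        if len(prefix) > best_len or (len(prefix) == best_len and is_allow):
            best_len = len(prefix)
            allowed = is_allow
    return allowed


rules = [("Disallow", "/admin/"), ("Disallow", "/api/"), ("Allow", "/api/docs/")]
print(is_allowed(rules, "/api/docs/auth"))  # Allow: /api/docs/ is longer than Disallow: /api/
print(is_allowed(rules, "/api/v1/users"))   # only Disallow: /api/ matches
print(is_allowed(rules, "/blog/"))          # nothing matches
```

Because the decision depends on prefix length rather than position, reordering the rules list does not change the result.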

Blocking AI Crawlers in 2025–2026

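Managing AI crawler access has become one of the primary reasons site owners update their robots.txt. Major AI companies have published official user-agent tokens for this purpose. Note that Google-Extended and Applebot-Extended are control tokens rather than separate crawlers: the pages are still fetched by Google's and Apple's existing crawlers, and the token only governs whether the content may be used for AI training.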

  • GPTBot — OpenAI (ChatGPT, GPT model training)
  • ClaudeBot — Anthropic (Claude model training)
  • Google-Extended — Google (Gemini AI training, separate from Googlebot search indexing)
  • CCBot — Common Crawl (dataset used by many AI labs)
  • Bytespider — ByteDance (TikTok, AI models)
  • Applebot-Extended — Apple (Apple Intelligence training)

Blocking these crawlers does not affect your search engine rankings — GPTBot and Google-Extended are distinct from Googlebot. You can block AI training while keeping full search visibility:

User-agent: Googlebot
Allow: /

User-agent: GPTBot
Disallow: /

User-agent: ClaudeBot
Disallow: /

User-agent: Google-Extended
Disallow: /

This generator includes all major AI crawler user-agents in its dropdown menu for quick selection.
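
To check which of the tokens listed above a given file still lets through, a rough audit can be built on Python's standard-library urllib.robotparser. The unblocked_ai_tokens helper is hypothetical, and the parser's user-agent matching is a loose substring test, so treat the result as an approximation of how each real crawler would read the file.

```python
from urllib.robotparser import RobotFileParser

# Tokens taken from the list above.
AI_TOKENS = ["GPTBot", "ClaudeBot", "Google-Extended", "CCBot",
             "Bytespider", "Applebot-Extended"]


def unblocked_ai_tokens(robots_txt: str) -> list[str]:
    """Return the AI tokens that may still fetch the site root."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [token for token in AI_TOKENS if parser.can_fetch(token, "/")]


sample = """\
User-agent: Googlebot
Allow: /

User-agent: GPTBot
Disallow: /

User-agent: ClaudeBot
Disallow: /

User-agent: Google-Extended
Disallow: /
"""
print(unblocked_ai_tokens(sample))
```

For the sample file, which mirrors the example above, the audit reports CCBot, Bytespider, and Applebot-Extended as not yet blocked.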

Common robots.txt Templates

Allow everything (default):

User-agent: *
Allow: /
Sitemap: https://example.com/sitemap.xml

Block specific directories:

User-agent: *
Disallow: /admin/
Disallow: /staging/
Sitemap: https://example.com/sitemap.xml

Allow search engines, block AI crawlers:

User-agent: *
Allow: /

User-agent: GPTBot
Disallow: /

User-agent: ClaudeBot
Disallow: /

User-agent: Google-Extended
Disallow: /

Sitemap: https://example.com/sitemap.xml

Block everything (development/staging sites):

User-agent: *
Disallow: /

Common Mistakes

  • Blocking CSS and JS files. Search engines need access to render your pages properly. Blocking /assets/ or /static/ can hurt your search rankings because Googlebot cannot evaluate your page layout.
  • Using robots.txt for security. The file is publicly readable and purely advisory. Never rely on it to protect sensitive data — use authentication, access controls, or server-side IP restrictions instead.
  • Forgetting trailing slashes. Disallow: /admin matches any URL starting with /admin, including /administration and /admin-tools. Use Disallow: /admin/ to target only the /admin/ directory.
  • Missing the wildcard user-agent. If you only write rules for specific bots (like Googlebot), all other crawlers see no restrictions and crawl everything. Always include a User-agent: * block as a baseline.
  • Conflicting rule order. While rule order within a group does not matter for Google (it uses specificity), some older crawlers process rules top-to-bottom. Put more specific Allow rules before broader Disallow rules for maximum compatibility.
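
The trailing-slash mistake is easy to see with a simplified model of path matching as a plain prefix test. The matches helper and the sample paths are illustrative only; real crawlers also handle the * and $ wildcards.

```python
def matches(rule_path: str, url_path: str) -> bool:
    """Simplified robots.txt path match: a plain prefix test (no wildcards)."""
    return url_path.startswith(rule_path)


paths = ["/admin", "/admin/", "/admin/users", "/administration", "/admin-tools"]

for rule in ("/admin", "/admin/"):
    blocked = [p for p in paths if matches(rule, p)]
    print(f"Disallow: {rule:<8} blocks {blocked}")
```

With /admin every sample path is blocked, including /administration and /admin-tools. With /admin/ only /admin/ and /admin/users are blocked, and the bare URL /admin is no longer covered.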

robots.txt vs Meta Robots Tag vs HTTP Headers

| Method | Controls | Scope | Enforcement |
|---|---|---|---|
| robots.txt | Crawling (whether a bot visits a URL) | Entire directories or paths | Advisory: bots choose to comply |
| <meta name="robots"> | Indexing (noindex), following links (nofollow) | Individual pages | Respected by major search engines |
| X-Robots-Tag HTTP header | Same as meta robots | Individual URLs or file types (PDFs, images) | Respected by major search engines |

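For complete control, combine the methods: robots.txt to manage crawl budget and block unwanted bots, and meta tags or HTTP headers to control which pages appear in search results. A page blocked by robots.txt might still appear in search results (with no snippet) if other pages link to it; only noindex prevents that. The two do not stack on the same URL, though: a crawler has to fetch a page to see its noindex, so a page you want removed from search results must not be disallowed in robots.txt.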
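
As a rough illustration of the header-based approach, the sketch below decides which responses should carry X-Robots-Tag: noindex. The policy (PDFs and a /drafts/ section) and the robots_headers helper are hypothetical; in practice the same rule is usually written in the web server or CDN configuration.

```python
import posixpath

# Hypothetical policy for this sketch: keep PDFs and everything under
# /drafts/ out of search results; other paths get no extra header.
NOINDEX_EXTENSIONS = {".pdf"}
NOINDEX_PREFIXES = ("/drafts/",)


def robots_headers(url_path: str) -> dict[str, str]:
    """Return the extra response headers to send for `url_path`."""
    extension = posixpath.splitext(url_path)[1].lower()
    if extension in NOINDEX_EXTENSIONS or url_path.startswith(NOINDEX_PREFIXES):
        return {"X-Robots-Tag": "noindex"}
    return {}


print(robots_headers("/files/report.PDF"))
print(robots_headers("/drafts/post-1"))
print(robots_headers("/blog/post-1"))
```

These paths have to stay crawlable in robots.txt, otherwise search engines never request them and never see the header.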

Other Ways to Generate robots.txt

  • This tool — zero-install, visual editor in your browser with AI crawler presets. Best for quick generation without switching contexts.
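  • Google Search Console: its robots.txt report (which replaced the older robots.txt Tester) shows the robots.txt files Google found for your site, when they were last crawled, and any warnings or errors. It does not generate files, only reports on them.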
  • Yoast SEO (WordPress) — generates a basic robots.txt through the WordPress dashboard. Limited customization compared to manual editing.
  • Manual editing — create a plain text file named robots.txt in any text editor. Full flexibility, but easy to introduce syntax errors.

Use this generator when you want a correct, ready-to-deploy file with proper syntax — especially when you need rules for multiple user-agents including AI crawlers.

Frequently Asked Questions

What is a robots.txt file?

A robots.txt file is a plain text file at the root of a website that tells search engine and AI crawlers which URLs they can and cannot access. It follows the Robots Exclusion Protocol (REP), a standard created in 1994. Every major crawler — Googlebot, Bingbot, GPTBot, ClaudeBot — checks robots.txt before crawling a site. The file is advisory: well-behaved bots respect it, but malicious scrapers may ignore it.

Where should I place my robots.txt file?

The robots.txt file must be at the exact root of your domain, accessible at https://yourdomain.com/robots.txt. It must be at the top-level directory — placing it in a subdirectory like /blog/robots.txt has no effect. For subdomains, each one needs its own robots.txt at its respective root (e.g., https://api.yourdomain.com/robots.txt).
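
Put differently, the robots.txt that governs a page is found by keeping the page URL's scheme and host and replacing everything after them. A small sketch using Python's urllib.parse (the robots_txt_url helper is an illustrative name):

```python
from urllib.parse import urlsplit, urlunsplit


def robots_txt_url(page_url: str) -> str:
    """Return the robots.txt URL that governs `page_url`."""
    parts = urlsplit(page_url)
    # Keep scheme and host (including any port); replace the path and
    # drop the query string and fragment.
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


print(robots_txt_url("https://yourdomain.com/blog/post?id=3"))
print(robots_txt_url("https://api.yourdomain.com/v1/users"))
```

The first call yields https://yourdomain.com/robots.txt and the second https://api.yourdomain.com/robots.txt, which is why each subdomain needs its own file.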

Can robots.txt block all crawlers?

Yes. Setting User-agent: * with Disallow: / instructs all compliant crawlers to avoid your entire site. However, robots.txt is advisory, not enforceable — malicious bots and scrapers will ignore it. For sensitive content, use server-side authentication, IP blocking, or firewall rules instead of relying on robots.txt alone.

Should I include a sitemap in robots.txt?

Yes, including a Sitemap directive is a best practice. It helps search engines discover your pages faster, especially on large sites where internal linking alone may not reach every URL. The directive is independent of the user-agent groups, so it can go anywhere in the file; by convention it is placed at the bottom: Sitemap: https://yourdomain.com/sitemap.xml. The URL must be absolute, and you can list multiple sitemaps, one per line.
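
A quick way to confirm which sitemaps a file declares is the site_maps() method of Python's urllib.robotparser (available from Python 3.8). The second sitemap name below is a made-up example.

```python
from urllib.robotparser import RobotFileParser

ROBOTS_TXT = """\
Sitemap: https://yourdomain.com/sitemap.xml

User-agent: *
Disallow: /admin/

Sitemap: https://yourdomain.com/sitemap-news.xml
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# site_maps() returns the declared URLs in file order, or None if there are none.
print(parser.site_maps())
```

Both URLs are returned even though one appears before the rule group and one after it.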

How do I block AI crawlers like GPTBot and ClaudeBot?

Add separate User-agent blocks for each AI crawler you want to block. For example: User-agent: GPTBot, Disallow: / blocks OpenAI's crawler. User-agent: ClaudeBot, Disallow: / blocks Anthropic's crawler. Other AI bot user-agents include Google-Extended (Gemini training), CCBot (Common Crawl), and Bytespider (ByteDance). This generator includes these user-agents in its dropdown for easy selection.

What is the difference between Crawl-delay and rate limiting?

Crawl-delay is a robots.txt directive that asks bots to wait a specified number of seconds between requests. Bing and Yandex respect it, but Google ignores it: Googlebot sets its crawl rate automatically based on how your server responds, and the manual crawl-rate setting that Google Search Console once offered has been retired. For true rate limiting, use server-side controls like nginx rate limiting or a CDN's bot management features. Crawl-delay is a polite request, not an enforcement mechanism.
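
From the crawler's side, honoring Crawl-delay is a voluntary lookup. The sketch below reads the value with urllib.robotparser's crawl_delay() method; the delay_for helper, the one-second fallback, and the ExampleBot name are assumptions for illustration.

```python
from urllib.robotparser import RobotFileParser

ROBOTS_TXT = """\
User-agent: bingbot
Crawl-delay: 10
Disallow: /search/

User-agent: *
Disallow: /admin/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())


def delay_for(agent: str, default: float = 1.0) -> float:
    """Seconds a polite crawler should wait between requests."""
    delay = parser.crawl_delay(agent)  # None when the group sets no delay
    return default if delay is None else float(delay)


print(delay_for("bingbot"))     # uses the Crawl-delay from its own group
print(delay_for("ExampleBot"))  # falls back to the default
```

A well-behaved crawler would sleep for delay_for(agent) seconds between requests. Nothing forces it to, which is why server-side rate limiting is the only real enforcement.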

Is my data safe when using this tool?

Yes. The robots.txt file is generated entirely in your browser using JavaScript. Nothing is sent to any server — you can verify this by opening your browser's Network tab during generation. Your rules, paths, and sitemap URLs never leave your machine.