Question 1

What is a robots.txt file?

Accepted Answer

A robots.txt file is a plain text file at the root of a website that tells search engine and AI crawlers which URLs they may and may not crawl. It follows the Robots Exclusion Protocol (REP), which dates from 1994 and was formalized as RFC 9309 in 2022. Major crawlers such as Googlebot, Bingbot, GPTBot, and ClaudeBot check robots.txt before crawling a site. The file is advisory: well-behaved bots respect it, but malicious scrapers may ignore it.
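
To make the "check before crawling" step concrete, here is a minimal sketch of what a compliant crawler might do, using Python's standard `urllib.robotparser`. The robots.txt content, the bot name `ExampleBot`, and the `example.com` URLs are assumptions for illustration only.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, for illustration only.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# A compliant crawler asks the parser before fetching each URL.
allowed_home = parser.can_fetch("ExampleBot", "https://example.com/")
allowed_private = parser.can_fetch("ExampleBot", "https://example.com/private/report.html")

print(allowed_home)     # True
print(allowed_private)  # False
```

The check runs entirely inside the crawler's own code, which is why the file can only ever be advisory.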

Question 2

Where should I place my robots.txt file?

Accepted Answer

The robots.txt file must sit at the root of your host, accessible at https://yourdomain.com/robots.txt. Placing it in a subdirectory such as /blog/robots.txt has no effect, because crawlers only request the root path. Each subdomain needs its own robots.txt at its own root (e.g., https://api.yourdomain.com/robots.txt).
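
As a rough illustration of the location rule, the sketch below derives the robots.txt URL that applies to any page URL. The helper name `robots_url` is hypothetical, and the example paths are made up.

```python
from urllib.parse import urlsplit, urlunsplit

def robots_url(page_url: str) -> str:
    """Return the robots.txt URL for the host that serves page_url."""
    parts = urlsplit(page_url)
    # Keep the scheme and host (including any subdomain or port);
    # replace the path and drop the query string and fragment.
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))

main_site = robots_url("https://yourdomain.com/blog/post?id=1")
api_site = robots_url("https://api.yourdomain.com/v1/users")

print(main_site)  # https://yourdomain.com/robots.txt
print(api_site)   # https://api.yourdomain.com/robots.txt
```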

Question 3

Can robots.txt block all crawlers?

Accepted Answer

Yes. Setting User-agent: * with Disallow: / instructs all compliant crawlers to avoid your entire site. However, robots.txt is advisory, not enforceable, so malicious bots and scrapers can simply ignore it. For sensitive content, use server-side authentication, IP blocking, or firewall rules instead of relying on robots.txt alone.
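
A small sketch of how the block-all rule looks from a compliant crawler's side, again using the standard `urllib.robotparser`. The URL is a placeholder.

```python
from urllib.robotparser import RobotFileParser

# The two-line "block everything" file described above.
BLOCK_ALL = ["User-agent: *", "Disallow: /"]

parser = RobotFileParser()
parser.parse(BLOCK_ALL)

# Every user agent falls under the wildcard group.
results = {
    bot: parser.can_fetch(bot, "https://example.com/any/page")
    for bot in ("Googlebot", "Bingbot", "GPTBot")
}
print(results)  # {'Googlebot': False, 'Bingbot': False, 'GPTBot': False}
```

Nothing here stops a client that never runs this check, which is why access control for sensitive content belongs on the server.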

Question 4

Should I include a sitemap in robots.txt?

Accepted Answer

Yes, including a Sitemap directive is a best practice. It helps search engines discover your pages, especially on large sites where internal linking alone may not reach every URL. The directive is conventionally placed at the bottom of the file, but it is independent of any User-agent group and may appear anywhere: Sitemap: https://yourdomain.com/sitemap.xml. The URL must be absolute, and you can list multiple sitemaps by repeating the directive.
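
For illustration, this sketch reads Sitemap directives back out of a robots.txt file with `RobotFileParser.site_maps()` (available in Python 3.8 and later). The second sitemap URL, `sitemap-news.xml`, is a made-up example.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical file with two Sitemap lines after the rule group.
ROBOTS_TXT = """\
User-agent: *
Disallow: /admin/

Sitemap: https://yourdomain.com/sitemap.xml
Sitemap: https://yourdomain.com/sitemap-news.xml
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# Returns a list of sitemap URLs, or None if the file declares none.
sitemaps = parser.site_maps()
print(sitemaps)
```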

Question 5

How do I block AI crawlers like GPTBot and ClaudeBot?

Accepted Answer

Add a separate User-agent group for each AI crawler you want to block. For example, the line User-agent: GPTBot followed by the line Disallow: / blocks OpenAI's crawler, and User-agent: ClaudeBot followed by Disallow: / blocks Anthropic's crawler. Other AI bot user-agents include Google-Extended (Gemini training), CCBot (Common Crawl), and Bytespider (ByteDance). This generator includes these user-agents in its dropdown for easy selection.
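
The sketch below shows one way to build those groups programmatically and confirm their effect with `urllib.robotparser`. It is not this generator's implementation; the helper `block_rules` and the test URL are assumptions.

```python
from urllib.robotparser import RobotFileParser

# User agents named in the answer above.
AI_BOTS = ["GPTBot", "ClaudeBot", "Google-Extended", "CCBot", "Bytespider"]

def block_rules(user_agents):
    """Build one User-agent/Disallow group per crawler, separated by blank lines."""
    groups = [f"User-agent: {ua}\nDisallow: /" for ua in user_agents]
    return "\n\n".join(groups) + "\n"

robots_txt = block_rules(AI_BOTS)
print(robots_txt)

parser = RobotFileParser()
parser.parse(robots_txt.splitlines())

# Listed AI crawlers are blocked; crawlers without a matching group are not.
gptbot_allowed = parser.can_fetch("GPTBot", "https://example.com/article")
googlebot_allowed = parser.can_fetch("Googlebot", "https://example.com/article")
print(gptbot_allowed, googlebot_allowed)  # False True
```

Because there is no wildcard group in this file, search crawlers such as Googlebot remain unaffected.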

Question 6

What is the difference between Crawl-delay and rate limiting?

Accepted Answer

Crawl-delay is a robots.txt directive that asks bots to wait a specified number of seconds between requests. Bing and Yandex respect it, but Google ignores it and adjusts its crawl rate automatically based on how your server responds. For true rate limiting, use server-side controls like nginx rate limiting or a CDN's bot management features. Crawl-delay is a polite request, not an enforcement mechanism.
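
As a sketch of the "polite request" side, a crawler that chooses to honor Crawl-delay could read the value like this with `urllib.robotparser`. The file content and the bot name `ExampleBot` are illustrative assumptions.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical file: a delay for Bingbot only, plus a general rule group.
ROBOTS_TXT = """\
User-agent: Bingbot
Crawl-delay: 10

User-agent: *
Disallow: /tmp/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# crawl_delay() returns the delay in seconds, or None if none is set
# for the group that applies to this user agent.
bing_delay = parser.crawl_delay("Bingbot")
other_delay = parser.crawl_delay("ExampleBot")
print(bing_delay, other_delay)  # 10 None
```

A cooperative crawler would then sleep for that many seconds between requests. A client that skips this lookup is unaffected, which is the gap server-side rate limiting closes.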

Question 7

Is my data safe when using this tool?

Accepted Answer

Yes. The robots.txt file is generated entirely in your browser using JavaScript. Nothing is sent to any server — you can verify this by opening your browser's Network tab during generation. Your rules, paths, and sitemap URLs never leave your machine.

| Directive | Purpose | Example |
| --- | --- | --- |
| `User-agent` | Which crawler the following rules apply to | `User-agent: Googlebot` |
| `Disallow` | Block access to a path | `Disallow: /admin/` |
| `Allow` | Permit access to a path inside a broader Disallow | `Allow: /admin/public/` |
| `Sitemap` | Point crawlers to your XML sitemap | `Sitemap: https://example.com/sitemap.xml` |
| `Crawl-delay` | Seconds between requests (Bing and Yandex, not Google) | `Crawl-delay: 10` |
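
To show how an `Allow` can carve an exception out of a broader `Disallow`, here is a simplified sketch of the longest-match rule described in RFC 9309: the matching rule with the longest path wins, and `Allow` wins a tie. The function `is_allowed` is hypothetical and handles plain path prefixes only, not the `*` and `$` wildcards.

```python
def is_allowed(rules, path):
    """Decide access for path given (directive, prefix) pairs from one group.

    Simplified: plain prefix matching only, no * or $ wildcards.
    """
    best_len, allowed = -1, True  # no matching rule means the path is allowed
    for directive, prefix in rules:
        if not prefix or not path.startswith(prefix):
            continue
        is_allow = directive.lower() == "allow"
        # The longer prefix is more specific; on a tie, Allow wins.
        if len(prefix) > best_len or (len(prefix) == best_len and is_allow):
            best_len, allowed = len(prefix), is_allow
    return allowed

# The Disallow and Allow examples from the table above.
RULES = [("Disallow", "/admin/"), ("Allow", "/admin/public/")]

print(is_allowed(RULES, "/admin/settings"))     # False
print(is_allowed(RULES, "/admin/public/help"))  # True
print(is_allowed(RULES, "/blog/"))              # True
```

Not every parser resolves conflicts this way, so it is worth testing important rules against the crawlers you care about.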

| Method | Controls | Scope | Enforcement |
| --- | --- | --- | --- |
| `robots.txt` | Crawling (whether a bot visits a URL) | Entire directories or paths | Advisory: bots choose to comply |
| `<meta name="robots">` | Indexing (`noindex`), following links (`nofollow`) | Individual pages | Respected by major search engines |
| `X-Robots-Tag` HTTP header | Same as meta robots | Individual URLs or file types (PDFs, images) | Respected by major search engines |
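
As an illustration of the `X-Robots-Tag` row, the sketch below picks response headers by file type. The helper `robots_headers` and the suffix policy are hypothetical; how the headers get attached depends on your server or framework.

```python
from pathlib import PurePosixPath

# Hypothetical policy: keep PDFs and images out of search indexes.
NOINDEX_SUFFIXES = {".pdf", ".png", ".jpg"}

def robots_headers(url_path: str) -> dict:
    """Return extra response headers for url_path (empty dict if none apply)."""
    if PurePosixPath(url_path).suffix.lower() in NOINDEX_SUFFIXES:
        return {"X-Robots-Tag": "noindex"}
    return {}

pdf_headers = robots_headers("/docs/report.PDF")
page_headers = robots_headers("/blog/post")
print(pdf_headers)   # {'X-Robots-Tag': 'noindex'}
print(page_headers)  # {}
```

A crawler only sees this header if it is allowed to fetch the URL, so a path carrying `noindex` should not also be disallowed in robots.txt.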

robots.txt Generator
