How it works
- The tester picks the most specific User-Agent block that matches your input (e.g. Googlebot matches more specifically than *)
- Within that block, both Allow and Disallow rules are checked; the longest matching path wins (per RFC 9309)
- If no User-Agent block matches, the URL is treated as allowed by default
- Wildcards: * matches any sequence of characters, $ matches end of URL
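The matching rules above can be sketched in a few lines of Python. This is only an illustration of longest-match evaluation with * and $ wildcards, not the tester's actual code. The names pattern_to_regex and is_allowed are hypothetical, and the tie-break (Allow wins when an Allow and a Disallow pattern have the same length) is an assumption based on the RFC 9309 recommendation.

```python
import re


def pattern_to_regex(pattern):
    """Translate a robots.txt path pattern into a compiled regex (hypothetical helper)."""
    # A trailing '$' anchors the pattern to the end of the URL path.
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Escape every literal piece, then join the pieces with '.*' for each '*'.
    body = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.compile(body + (r"\Z" if anchored else ""))


def is_allowed(rules, path):
    """rules: list of (directive, pattern), directive being 'allow' or 'disallow'."""
    best_len = -1
    allowed = True  # no matching rule means the URL is allowed
    for directive, pattern in rules:
        if not pattern:
            continue  # an empty pattern matches nothing
        # re.match anchors at the start of the path, like a robots.txt prefix match.
        if pattern_to_regex(pattern).match(path):
            is_allow = directive == "allow"
            # Longest pattern wins; on equal length prefer allow (assumed tie-break).
            if len(pattern) > best_len or (len(pattern) == best_len and is_allow):
                best_len = len(pattern)
                allowed = is_allow
    return allowed


rules = [
    ("disallow", "/admin/"),
    ("allow", "/admin/help"),
    ("disallow", "*.pdf$"),
]
for path in ("/admin/settings", "/admin/help", "/docs/report.pdf", "/index.html"):
    print(path, "allowed" if is_allowed(rules, path) else "blocked")
```

Pattern length is measured on the raw pattern text, which is a simplification; a stricter implementation might compare lengths after percent-encoding normalization.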
Example robots.txt rules
User-agent: *
Disallow: /admin/      # block /admin/ and everything under it
Allow: /admin/help     # but allow /admin/help
Disallow: *.pdf$       # block any URL ending in .pdf
Sitemap: https://example.com/sitemap.xml
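As a rough sketch of how a file like the example could be split into per-agent blocks before any rule is checked, the Python below strips comments, groups Allow/Disallow lines under their User-agent lines, and collects Sitemap entries. The names parse_groups and select_rules are hypothetical, and choosing the block by the longest agent token contained in the input string is a simplifying assumption, not necessarily how this tester decides specificity.

```python
EXAMPLE = """\
User-agent: *
Disallow: /admin/      # block /admin/ and everything under it
Allow: /admin/help     # but allow /admin/help
Disallow: *.pdf$       # block any URL ending in .pdf
Sitemap: https://example.com/sitemap.xml
"""


def parse_groups(text):
    """Return (groups, sitemaps); groups maps a lowercased agent token to its rules."""
    groups, sitemaps = {}, []
    current_agents, seen_rule = [], False
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop trailing comments
        if ":" not in line:
            continue  # blank or malformed line
        key, value = (part.strip() for part in line.split(":", 1))
        key = key.lower()
        if key == "user-agent":
            if seen_rule:
                # A User-agent line that follows rules starts a new block.
                current_agents, seen_rule = [], False
            current_agents.append(value.lower())
            groups.setdefault(value.lower(), [])
        elif key in ("allow", "disallow"):
            seen_rule = True
            for agent in current_agents:
                groups[agent].append((key, value))
        elif key == "sitemap":
            sitemaps.append(value)  # Sitemap lines are independent of any block
    return groups, sitemaps


def select_rules(groups, user_agent):
    """Pick the most specific block for user_agent, falling back to '*'."""
    ua = user_agent.lower()
    # Assumption: the longest agent token contained in the input is the most specific.
    matches = [agent for agent in groups if agent != "*" and agent in ua]
    if matches:
        return groups[max(matches, key=len)]
    return groups.get("*", [])  # an empty list means nothing is blocked


groups, sitemaps = parse_groups(EXAMPLE)
print(select_rules(groups, "Googlebot"))
print(sitemaps)
```

With only a * block in the file, every bot name falls back to the same three rules; adding a Googlebot block would make select_rules return that block for Googlebot instead.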
Common bots reference
- Googlebot: Google web crawler (also Googlebot-Image, Googlebot-News)
- Bingbot: Microsoft Bing crawler
- Baiduspider: Baidu search
- GPTBot: OpenAI crawler (block to opt out of GPT training)
- AhrefsBot / SemrushBot: SEO tools (often blocked to save bandwidth)