Frequently Asked Questions
What is robots.txt?
robots.txt is a plain-text file that tells search engine crawlers which URLs they may or may not crawl on your site. It lives in your website's root directory (example.com/robots.txt) and is mainly used to manage crawler traffic.
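A minimal example, assuming a hypothetical /admin/ section you want crawlers to skip (the paths and sitemap URL are placeholders):

```
# Served at https://example.com/robots.txt
User-agent: *
Disallow: /admin/

Sitemap: https://example.com/sitemap.xml
```

The Sitemap line is optional but commonly included so crawlers can discover your URLs.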
Does robots.txt block pages from search results?
No. robots.txt only prevents crawling, not indexing. A blocked page can still appear in search results (usually without a description) if other sites link to it. To truly keep a page out of search results, use a noindex meta tag or an X-Robots-Tag response header, and make sure the page is not blocked in robots.txt, since crawlers must be able to fetch the page to see the directive.
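For illustration, the two ways to send a noindex directive; the header form is useful for non-HTML files such as PDFs:

```
<!-- Meta tag placed in the page's HTML <head> -->
<meta name="robots" content="noindex">

HTTP response header equivalent:
X-Robots-Tag: noindex
```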
What is the wildcard user-agent (*)?
The wildcard (*) user-agent applies to any crawler that doesn't have a more specific User-agent group of its own. It's recommended to always include a * section as a fallback so unknown or newly launched crawlers are still covered.
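A sketch of the fallback pattern, with hypothetical paths. Googlebot matches its own group and ignores the * group; every other crawler falls back to *:

```
# Specific group: applies only to Googlebot
User-agent: Googlebot
Disallow: /search-results/

# Fallback group: applies to all other crawlers
User-agent: *
Disallow: /search-results/
Disallow: /private/
```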
How do Allow and Disallow interact?
When both an Allow and a Disallow rule match a URL, the rule with the longest (most specific) matching path takes precedence. If the matches are equal length, Google applies the least restrictive rule, so Allow wins. Exact tie-breaking behavior varies slightly between search engines.
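The longest-match logic can be sketched in Python. This is a simplified model, assuming plain prefix matching only (no * or $ wildcard support); the rule patterns are hypothetical:

```python
def is_allowed(path, rules):
    """Decide whether a URL path may be crawled.

    rules: list of (directive, pattern) tuples, e.g. ("Allow", "/blog/").
    The rule with the longest matching pattern wins; on an exact tie,
    the least restrictive directive (Allow) wins, mirroring Google's
    documented behavior. Unmatched paths are allowed by default.
    """
    best_len, best_directive = -1, "Allow"
    for directive, pattern in rules:
        if path.startswith(pattern):
            if len(pattern) > best_len or (
                len(pattern) == best_len and directive == "Allow"
            ):
                best_len, best_directive = len(pattern), directive
    return best_directive == "Allow"

# Hypothetical rules: block /blog/ but carve out /blog/public/
rules = [("Disallow", "/blog/"), ("Allow", "/blog/public/")]

print(is_allowed("/blog/public/post.html", rules))  # True: longer Allow wins
print(is_allowed("/blog/draft.html", rules))        # False: only Disallow matches
```

Real parsers also normalize patterns and handle wildcards, but the longest-match-then-least-restrictive core is the same.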
Should I block AI crawlers like GPTBot?
It depends on your preference. If you don't want your content used to train AI models, you can block bots like GPTBot (OpenAI), CCBot (Common Crawl), and others. However, blocking them won't remove your content from models that have already been trained on it, and robots.txt is voluntary, so not every crawler respects it.
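A sketch of an opt-out section. GPTBot and CCBot are the tokens mentioned above; Google-Extended is another commonly used token that controls AI-training use of crawled content:

```
User-agent: GPTBot
Disallow: /

User-agent: CCBot
Disallow: /

User-agent: Google-Extended
Disallow: /
```

These groups only affect the named bots; your regular User-agent: * rules continue to apply to everything else.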