robots.txt: What to Put in It (and When You Don't Need One)
Quick Answer
robots.txt is a plain text file at the root of your website that tells search engine crawlers which paths they can access. The standard minimum content is three lines: User-agent: *, Allow: /, and Sitemap: https://yoursite.com/sitemap.xml. Most small static websites don't strictly need a custom robots.txt, but pointing at your sitemap is useful. Never use robots.txt to hide sensitive content; bots can ignore it.
A robots.txt file tells search engines which parts of your website they can and can't crawl. It lives at the root of your website, at yoursite.com/robots.txt.
Most static websites don't need a custom one. If you don't upload a robots.txt, search engines will crawl everything, which is usually what you want for a small website. So before you create one, ask whether you actually need it.
You need a robots.txt if:
- You have pages you don't want in search results (an admin area, a thank-you page after a form submission, draft pages)
- You want to point search engines directly at your sitemap
- A search engine bot is hammering your website and you want to slow it down (see the Crawl-delay sketch after these lists)
You don't need one if:
- You want everything indexed (just don't upload the file)
- You think it's "good practice." It isn't, by itself. An empty
robots.txtadds no value.
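On that third point above: the usual robots.txt way to slow a bot down is the Crawl-delay directive. It's non-standard, so support varies: Bing and Yandex honor it, Googlebot ignores it entirely. A minimal sketch, assuming Bing's crawler is the one hammering you:

# Ask Bing's crawler to leave at least 10 seconds between requests.
# Crawl-delay is non-standard: Bing and Yandex honor it, Googlebot ignores it.
User-agent: bingbot
Crawl-delay: 10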
The standard robots.txt for a small static website
If you want to add one anyway (to point at your sitemap, which is genuinely useful), here's the minimum:
User-agent: *
Allow: /
Sitemap: https://yoursite.com/sitemap.xml
User-agent: * means "all bots." Allow: / means "you can crawl everything." Sitemap: tells bots where to find your sitemap.
Blocking specific pages
If you have a thank-you page you don't want indexed:
User-agent: *
Allow: /
Disallow: /thank-you.html
Sitemap: https://yoursite.com/sitemap.xml
Disallow: tells bots not to crawl that path. You can list multiple Disallow: lines for multiple pages or folders.
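For example, here's a sketch that blocks a whole folder of drafts plus two individual pages (the paths are placeholders; swap in your own):

User-agent: *
Allow: /
# Blocks every URL whose path starts with /drafts/
Disallow: /drafts/
# Blocks individual pages
Disallow: /thank-you.html
Disallow: /old-pricing.html
Sitemap: https://yoursite.com/sitemap.xml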
robots.txt is a polite request, not a security mechanism.
Well-behaved bots (Google, Bing) follow it. Malicious bots ignore it entirely. Don't use it to hide sensitive content. For that, you need actual authentication.
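Worse, a Disallow rule can actively backfire, because robots.txt itself is public: anyone can read it and see exactly which paths you were hoping to keep quiet. Something like this (with a hypothetical path) is a signpost, not a lock:

User-agent: *
# This doesn't protect /admin/; it announces it to anyone who opens the file.
Disallow: /admin/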
Disallow doesn't always remove pages from search
A Disallow: rule tells Google not to crawl a page. It doesn't tell Google to remove it from search results. If other websites link to the disallowed page, Google may still index the URL with a generic snippet that says "no information available."
To keep a page out of search results entirely, use a <meta name="robots" content="noindex"> tag on the page itself, and do not block the page in robots.txt. Google needs to crawl the page to see the noindex tag. If it can't crawl it, it can't honor the noindex.
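Concretely, the tag goes in the page's own HTML. A minimal sketch of the <head> of a thank-you page you want kept out of results:

<head>
  <title>Thanks for signing up</title>
  <!-- Crawlers may fetch this page, but shouldn't show it in search results -->
  <meta name="robots" content="noindex">
</head>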
The mistake people make most often
In older tutorials you'll see advice to block CSS and JavaScript folders. Don't. Google needs to load your CSS and JS to render the page properly. Blocking them makes Google think your website is broken on mobile, which hurts your ranking.
Bad (do not do this):
User-agent: *
Disallow: /css/
Disallow: /js/
The catastrophic mistake
The other one to avoid:
User-agent: *
Disallow: /
This means "no bot can crawl anything." If you see this on your live website and didn't mean it, that's why nothing is getting indexed. This usually happens when someone copies a robots.txt from a development website without changing it.
Uploading it
You can either upload an existing robots.txt through the Files section, or create one directly in the platform: choose Create a File, name it robots, pick .txt from the extension dropdown, and paste your rules into the built-in code editor. Either way, the file lives at the root of your project. Verify by visiting yoursite.com/robots.txt in your browser: you should see your rules as plain text. (Checking it in Search Console is covered below.)
Don't reach for robots.txt to hide pages
Static.app has a better mechanism for genuinely private pages: Page Settings lets you mark any page as Private (only you can see it when logged in) or Password Protected. Both are stronger than a robots.txt Disallow, which is just a polite request bots are free to ignore.
Testing it
In Google Search Console, the URL inspection tool will tell you whether a specific page is blocked by your robots.txt. Paste any URL from your website, and the tool reports its crawl status. Use it whenever you change the file.
What about AI crawlers?
AI companies use bots with names like GPTBot, ClaudeBot, PerplexityBot, and Google-Extended. If you want to control whether they can crawl your content (for training or for live search results), you can add specific rules. For example, to block GPTBot:
User-agent: GPTBot
Disallow: /
Most websites that want search traffic should let these bots in, because AI assistants are increasingly how people discover websites. But if you have a reason to opt out, this is where you do it.
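If you do decide to opt out, each bot gets its own rule group. A sketch covering the four bots named above (bot names change over time, so check each company's documentation for the current ones):

User-agent: GPTBot
Disallow: /

User-agent: ClaudeBot
Disallow: /

User-agent: PerplexityBot
Disallow: /

# Google-Extended controls AI training use; it doesn't affect normal Google Search crawling.
User-agent: Google-Extended
Disallow: /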
Frequently Asked Questions
What happens if I don't have a robots.txt file?
Search engines crawl everything they can find. For most small static websites, that's exactly what you want. The absence of a robots.txt is not a problem; it's the default behavior.
Can robots.txt hurt my SEO?
Yes, if you misconfigure it. The most damaging mistake is Disallow: /, which blocks all crawlers from your entire website. Another common one is blocking your CSS or JS folders, which prevents Google from rendering your pages correctly. Always test changes in Search Console.
How do I tell Google not to index a page?
Use a <meta name="robots" content="noindex"> tag in the page's <head>, not robots.txt. Disallow in robots.txt prevents crawling but doesn't reliably remove pages from search results. For genuinely private pages, use Static.app's Page Settings → Private or Password Protection.
Should I block AI crawlers like GPTBot?
Most static websites that want search traffic should let them in. AI assistants are increasingly how people discover websites, and being citable in ChatGPT or Perplexity is valuable. Block specific bots only if you have a clear reason (e.g., you don't want your content used for AI training).
Where do I check if Google can access my robots.txt?
In Google Search Console, use the URL inspection tool. Paste any URL from your website and the tool reports whether it's blocked. You can also visit yoursite.com/robots.txt directly in a browser to see what's served.