
Why Robots.txt Matters for SEO

Crawl Control

Proper robots.txt files help you:

  • Prevent crawling of duplicate content
  • Block private/admin sections
  • Guide crawlers to important pages
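
For example, a minimal robots.txt covering these three goals might look like the sketch below (the /search and /admin/ paths are placeholders for your own URLs):

# Keep crawlers out of internal search results (duplicate/thin content) and the admin area
User-agent: *
Disallow: /search
Disallow: /admin/

# Guide crawlers to the pages that matter
Sitemap: https://example.com/sitemap.xml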

Server Efficiency

Optimized crawling reduces:

  • Server load during crawls
  • Bandwidth consumption
  • Unnecessary crawl budget usage

Security

Adds a layer of protection by:

  • Hiding development environments
  • Blocking sensitive file types
  • Preventing login page indexing

Robots.txt FAQ

Everything you need to know about controlling search engine crawlers

A robots.txt file tells search engine crawlers:

  • What to crawl: By specifying allowed paths
  • What to avoid: Through disallowed directories/files
  • How to crawl: Using crawl-delay directives
  • Where to find sitemaps: Via sitemap declarations

It's the first file crawlers look for when visiting your site (requesting yourdomain.com/robots.txt should return an HTTP 200 status code).
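
As a sketch, a single file can handle all four jobs (the paths are illustrative, and as noted later in this FAQ, Google ignores Crawl-delay):

User-agent: *
Allow: /blog/
Disallow: /checkout/
Crawl-delay: 5

Sitemap: https://example.com/sitemap.xml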

Blocking a page in robots.txt does not prevent it from being indexed: robots.txt only controls crawling, not indexing. Pages blocked via robots.txt may still appear in search results with a "No information is available for this page" snippet if:

  • Other sites link to them (Google may infer their existence)
  • They're included in XML sitemaps
  • They have canonical tags pointing to them

To prevent indexing, use:

  1. <meta name="robots" content="noindex"> tags
  2. Password protection
  3. X-Robots-Tag HTTP headers
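
For non-HTML resources such as PDFs, the X-Robots-Tag header is the usual choice. A hypothetical response from a server configured to send it:

HTTP/1.1 200 OK
Content-Type: application/pdf
X-Robots-Tag: noindex

Keep in mind that crawlers can only see a noindex tag or header on pages they are allowed to crawl, so don't combine it with a robots.txt Disallow rule for the same URL.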

The core directives are:

Rule          Example                                    Purpose
User-agent:   User-agent: Googlebot                      Specifies which crawler the rules apply to
Disallow:     Disallow: /private/                        Blocks crawling of specific paths
Allow:        Allow: /public/                            Overrides Disallow for specific paths
Crawl-delay:  Crawl-delay: 10                            Sets a delay between crawl requests (seconds)
Sitemap:      Sitemap: https://example.com/sitemap.xml   Indicates the location of an XML sitemap
Clean-param:  Clean-param: ref /products/                Tells Yandex to ignore the specified URL parameter

Note: Each directive must be on its own line, and paths are case-sensitive.
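
Case sensitivity trips people up, so a quick illustrative example (the path is a placeholder):

User-agent: *
Disallow: /Private/

# Blocks /Private/report.pdf, but /private/report.pdf remains crawlable
# because paths are matched case-sensitively.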

Target specialized crawlers with their specific user-agent names:

User-agent: Googlebot-Image
Disallow: /

User-agent: Googlebot-News
Disallow: /draft-articles/

User-agent: Bingbot
Disallow: /temp/

Major crawlers and their user-agents:

  • Google (desktop): Googlebot
  • Google (smartphone): Googlebot (the current smartphone crawler uses the same Googlebot token; the older Googlebot-Mobile agent is retired)
  • Google Images: Googlebot-Image
  • Bing: Bingbot
  • Yandex: YandexBot
  • Baidu: Baiduspider
  • Facebook: facebookexternalhit
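
Note that crawlers obey only the most specific user-agent group that matches them and skip the generic * group when a named group exists, so repeat any shared rules inside each named group. A short sketch with placeholder paths:

User-agent: *
Disallow: /temp/

User-agent: Googlebot-Image
Disallow: /temp/
Disallow: /press-photos/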

While not a direct ranking factor, robots.txt affects SEO in the following ways:

Positive Effects

  • Preserves crawl budget for important pages
  • Prevents duplicate content issues
  • Reduces server load during crawls

Potential Risks

  • Accidental blocking of important pages
  • Over-restrictive crawl delays
  • Blocking CSS/JS needed for rendering

Pro Tip: Always test changes in Google Search Console's robots.txt report (the replacement for the old robots.txt Tester) before deployment.
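
To illustrate the CSS/JS risk above, a broad Disallow can be narrowed with Allow exceptions instead of unblocking the whole directory (the /assets/ path is a placeholder, and wildcard support varies by crawler, as covered later in this FAQ):

# Too broad: also blocks stylesheets and scripts needed for rendering
User-agent: *
Disallow: /assets/

# Safer: keep the block but carve out CSS and JS
User-agent: *
Disallow: /assets/
Allow: /assets/*.css$
Allow: /assets/*.js$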

The robots.txt file must be located at your site's root:

  • Correct: https://example.com/robots.txt
  • Incorrect: https://example.com/subfolder/robots.txt

Implementation checklist:

  1. Upload to root directory (typically public_html or www)
  2. Ensure UTF-8 encoding
  3. Verify HTTP 200 status code
  4. Keep file size under 500KB
  5. Use plain text format (.txt extension)
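
To spot-check the location, status code, and content type in one go, a simple HEAD request works (substitute your own domain):

curl -I https://example.com/robots.txt

The first line of the output should report a 200 status, and the Content-Type header should be text/plain.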

Disallow is the primary directive, while Allow creates exceptions:

Scenario: Block /private/ except one subfolder
User-agent: *
Disallow: /private/
Allow: /private/public-files/

Key rules:

  • Longer, more specific paths take precedence over shorter ones
  • For Google, the order of directives does not matter: the most specific (longest) matching rule wins, and Allow wins when rules are equally specific
  • Allow is supported by the major crawlers (Google, Bing, Yandex), but some older bots ignore it

Watch out: Disallow: with an empty value allows crawling of everything rather than blocking it (see the example below).
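
For example, this group blocks nothing at all, because the empty Disallow value means "allow everything":

User-agent: *
Disallow: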

Support varies by search engine:

Pattern          Google   Bing     Yandex   Example
* (wildcard)     ✅ Yes   ✅ Yes   ✅ Yes   Disallow: /*.jpg$
$ (end anchor)   ✅ Yes   ✅ Yes   ✅ Yes   Disallow: /print$
Full regex       ❌ No    ❌ No    ❌ No    Not supported; only * and $ are recognized

Best practice: Stick to basic patterns for maximum compatibility.
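
Two common uses of these patterns, assuming Google-style wildcard handling (the parameter name and extension are placeholders):

User-agent: *
Disallow: /*?sessionid=
Disallow: /*.pdf$

The first rule blocks any URL containing the sessionid parameter; the second blocks URLs that end in .pdf.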

Crawl delay recommendations based on server capacity:

Shared Hosting

Crawl-delay: 10

(1 request every 10 seconds)

VPS

Crawl-delay: 5

(1 request every 5 seconds)

Dedicated Server

Crawl-delay: 1

(1 request per second; raise the value to 2 or 3 seconds for a lighter load)

Note: Google ignores crawl-delay and instead uses dynamic crawling based on server response times. This directive primarily affects Bing/Yandex.
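
Crawl-delay belongs inside a user-agent group, so it can be set per crawler. A sketch for a shared host that only wants to slow Bingbot down:

User-agent: Bingbot
Crawl-delay: 10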

You can include multiple sitemap declarations in a single robots.txt file:

Sitemap: https://example.com/sitemap.xml
Sitemap: https://example.com/news-sitemap.xml
Sitemap: https://example.com/image-sitemap.xml

Best practices:

  • Include all relevant sitemaps (pages, images, videos, news)
  • Use absolute URLs
  • Sitemap lines can appear anywhere in the file; grouping them at the top keeps it readable
  • Keep sitemaps updated (crawlers check them frequently)

Pro Tip: Also submit sitemaps directly to Google Search Console for faster discovery.

Sensitive Information

Avoid listing private paths like /admin/ or /wp-login/ in robots.txt - the file is publicly readable, so disallowing them simply advertises their location to malicious bots. Protect them with authentication instead.

CSS/JS Files

Never block .css or .js - Google needs these to properly render and index pages.

Important Content

Don't accidentally block pages you want indexed - double-check with the URL Inspection tool in Google Search Console.

Test your file with free tools such as the robots.txt report in Google Search Console and the robots.txt tester in Bing Webmaster Tools.

Testing checklist:

  1. Verify syntax is error-free
  2. Check if important pages are blocked
  3. Test with different user-agents
  4. Confirm sitemap is accessible
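
Putting it together, a small file that would pass these checks for a hypothetical example.com:

# Default rules for all crawlers
User-agent: *
Disallow: /search
Disallow: /checkout/

# Bing only: same blocks, plus a gentler crawl rate
User-agent: Bingbot
Disallow: /search
Disallow: /checkout/
Crawl-delay: 5

Sitemap: https://example.com/sitemap.xml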