Using robots.txt to block spiders crawling your web site

robots.txt is a plain text file whose name has special meaning to most well-behaved robots on the web. By defining a few rules in this file, you instruct robots not to crawl and index certain files or directories within your site.

If, for example, you do not want Google to crawl your site's /pictures folder, you can block that folder for Google's crawler.

The following gives a few examples of how to write a robots.txt file. The file must be placed in the web root directory of your server; on Linux boxes, this is typically /var/www/html.
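Once the file is in place, it must be reachable directly under your domain, e.g. http://www.example.com/robots.txt. A quick way to confirm this is to fetch it yourself; the sketch below uses Python's standard urllib, and www.example.com is just a placeholder for your own domain:

from urllib.request import urlopen

# fetch the robots.txt from the site root (replace www.example.com with your own host)
with urlopen("http://www.example.com/robots.txt") as resp:
    print(resp.status)            # 200 means the file is being served correctly
    print(resp.read().decode())   # the exact rules that crawlers will see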

The following shows three example robots.txt files; each one starts with a comment line (comments begin with the # character) describing what it does.


# block Google's image crawler completely
User-agent: Googlebot-Image
Disallow: /


# block all spiders and bots from these two directories
User-agent: *
Disallow: /cgi-bin/
Disallow: /pictures/


# allow Googlebot to access everything except /cgi-bin/,
# give all other bots access to nothing,
# and finally allow ia_archiver (alexa.com) to access everything
User-agent: *
Disallow: /

User-agent: Googlebot
Disallow: /cgi-bin/

User-agent: ia_archiver
Allow: /
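To double-check that a robots.txt file really expresses what you intend, you can feed the rules to a parser and ask which URLs a given crawler may fetch. The sketch below uses Python's standard urllib.robotparser against the third example above; the user-agent names and paths are only illustrative:

from urllib.robotparser import RobotFileParser

# the third example above, as the contents of one robots.txt file
rules = """
User-agent: *
Disallow: /

User-agent: Googlebot
Disallow: /cgi-bin/

User-agent: ia_archiver
Allow: /
"""

rp = RobotFileParser()
rp.parse(rules.splitlines())

print(rp.can_fetch("Googlebot", "/index.html"))       # True  - only /cgi-bin/ is blocked for Googlebot
print(rp.can_fetch("Googlebot", "/cgi-bin/x.cgi"))    # False
print(rp.can_fetch("SomeOtherBot", "/index.html"))    # False - all other bots are blocked completely
print(rp.can_fetch("ia_archiver", "/pictures/a.jpg")) # True  - ia_archiver may access everything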
