Block or Allow robots using the robots.txt file

How to Block or Allow search engines and other robots, including a list of robot names.

Edited: 2011-07-24 18:49

To block or allow access for only certain crawlers or robots, you need to know how each robot identifies itself. The best way to find out is to look at the robot operator's website, or to search for this information on Google.
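To see why the exact name matters, here is a minimal sketch using Python's standard urllib.robotparser module, which parses robots.txt rules and answers whether a robot with a given name may fetch a URL. The rules, the example.com URL and the helper is_allowed are illustrative assumptions. Each search engine uses its own parser, which may differ from this one in edge cases.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt that targets a single crawler by name.
ROBOTS_TXT = """\
User-agent: Googlebot
Disallow: /
"""

def is_allowed(user_agent: str, url: str, robots_txt: str = ROBOTS_TXT) -> bool:
    """Return True if the parsed rules let `user_agent` fetch `url`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())  # parse() takes a list of lines
    return parser.can_fetch(user_agent, url)

if __name__ == "__main__":
    url = "https://example.com/page.html"
    # The rule applies only to the robot whose name matches the User-agent line.
    print(is_allowed("Googlebot", url))  # False
    print(is_allowed("msnbot", url))     # True
```

A rule written for Googlebot leaves msnbot untouched, because msnbot never matches that User-agent line.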

This article shows how to exclude a few well-known search engines. Note: it may be a better option to exclude all search engines other than the ones you specifically allow.
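One way to write such an allow-list, sketched here under the assumption that Googlebot is the only crawler you want to admit: an empty Disallow line in the Googlebot group permits everything, while the catch-all User-agent: * group blocks every other robot. The check uses Python's urllib.robotparser; the robot names tested are examples.

```python
from urllib.robotparser import RobotFileParser

# Allow-list: only Googlebot may crawl; every other robot is blocked.
ROBOTS_TXT = """\
User-agent: Googlebot
Disallow:

User-agent: *
Disallow: /
"""

def is_allowed(user_agent: str, url: str) -> bool:
    """Return True if the allow-list lets `user_agent` fetch `url`."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(user_agent, url)

if __name__ == "__main__":
    for bot in ("Googlebot", "msnbot", "Slurp", "ia_archiver"):
        print(bot, is_allowed(bot, "https://example.com/"))
```

Robots without a group of their own fall through to the User-agent: * group, so new crawlers are blocked by default.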

Google

The following rules prevent Google's crawler, Googlebot, from indexing images placed in a directory called images, as well as any other files located in that directory:

 User-agent: Googlebot
 Disallow: /images/
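A minimal sketch, using Python's urllib.robotparser and a hypothetical example.com site, that checks the rule Disallow: /images/ for Googlebot. urllib.robotparser treats a group as matching any robot whose name contains the group's name, ignoring case, so Googlebot-Image is caught by the Googlebot group as well. Google's own crawlers follow Google's matching rules, which this parser only approximates.

```python
from urllib.robotparser import RobotFileParser

# Keep Googlebot out of /images/ and everything below it.
ROBOTS_TXT = """\
User-agent: Googlebot
Disallow: /images/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

def allowed(user_agent: str, path: str) -> bool:
    """Check a path on a hypothetical host; only the path is matched."""
    return parser.can_fetch(user_agent, "https://example.com" + path)

if __name__ == "__main__":
    print(allowed("Googlebot", "/images/logo.png"))        # False
    print(allowed("Googlebot", "/index.html"))             # True
    print(allowed("Googlebot-Image", "/images/logo.png"))  # False
    print(allowed("msnbot", "/images/logo.png"))           # True
```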

Bing

Microsoft's Bing identifies itself as msnbot and can be blocked with robots.txt using the code below.

 User-agent: msnbot
 Disallow: /
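A User-agent line only names the robot; the Disallow line under it is what does the blocking. This sketch, using Python's urllib.robotparser, compares a record that names msnbot without any rule to one that adds Disallow: /. The helper msnbot_allowed and the URL are illustrative.

```python
from urllib.robotparser import RobotFileParser

def msnbot_allowed(robots_txt: str) -> bool:
    """Parse `robots_txt` and report whether msnbot may fetch the home page."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch("msnbot", "https://example.com/")

# Names the robot but sets no rule, so nothing is blocked.
NAME_ONLY = "User-agent: msnbot\n"
# Names the robot and disallows the whole site.
BLOCK_ALL = "User-agent: msnbot\nDisallow: /\n"

if __name__ == "__main__":
    print(msnbot_allowed(NAME_ONLY))  # True
    print(msnbot_allowed(BLOCK_ALL))  # False
```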

Yahoo

Yahoo uses the name "Slurp" for its crawler, perhaps reflecting the sound you make when you eat. Either way, Yahoo can be blocked as shown below:

 User-agent: Slurp
 Disallow: /
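Robot names are matched without regard to case (the robots.txt standard, RFC 9309, requires case-insensitive matching of the product token), so Slurp, slurp and SLURP all select the same group. A small check with Python's urllib.robotparser; the helper slurp_blocked and the URL are illustrative.

```python
from urllib.robotparser import RobotFileParser

ROBOTS_TXT = """\
User-agent: Slurp
Disallow: /
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

def slurp_blocked(name: str) -> bool:
    """True if the robot called `name` is shut out of the whole site."""
    return not parser.can_fetch(name, "https://example.com/")

if __name__ == "__main__":
    # Case does not matter; unrelated robots are not affected.
    for name in ("Slurp", "slurp", "SLURP", "Googlebot"):
        print(name, slurp_blocked(name))
```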

Alexa

Alexa and the Internet Archive identify themselves as ia_archiver. If you don't want past versions of your website to be archived, you may want to block the Internet Archive.

 User-agent: ia_archiver
 Disallow: /
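Several robots can share one record by stacking User-agent lines above a single rule. A minimal sketch, checked with Python's urllib.robotparser, that blocks msnbot, Slurp and ia_archiver together while leaving other robots (here Googlebot) unaffected; the helper blocked and the URL are illustrative.

```python
from urllib.robotparser import RobotFileParser

# One record, several User-agent lines: the Disallow rule applies to each.
ROBOTS_TXT = """\
User-agent: msnbot
User-agent: Slurp
User-agent: ia_archiver
Disallow: /
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

def blocked(name: str) -> bool:
    """True if the robot called `name` may not fetch the home page."""
    return not parser.can_fetch(name, "https://example.com/")

if __name__ == "__main__":
    for name in ("msnbot", "Slurp", "ia_archiver", "Googlebot"):
        print(name, blocked(name))
```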