Making a robots.txt File

The robots.txt file is a plain text file that tells search engine robots (crawlers) which parts of a website they are not allowed to access. Search engines use these instructions when deciding which pages of a site to crawl and index. The file must be placed at the root of the site, for example at www.yoursitename.com/robots.txt, and not in a subdirectory.
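Because robots look for the file only at the root of the host, its address can be worked out from any page URL on the site. Below is a minimal Python sketch using the standard library's urllib.parse. The helper name robots_url and the example URLs are hypothetical and used only for illustration:

```python
from urllib.parse import urljoin


def robots_url(page_url: str) -> str:
    """Return the robots.txt address for the site hosting page_url.

    robots_url is a hypothetical helper. Joining with the absolute path
    "/robots.txt" keeps the scheme, host and port of page_url but replaces
    its path and drops any query string or fragment.
    """
    return urljoin(page_url, "/robots.txt")


if __name__ == "__main__":
    print(robots_url("https://www.yoursitename.com/category/php/intro.html"))
    # prints https://www.yoursitename.com/robots.txt
```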

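By default, a well-behaved robot requests the robots.txt file before anything else on a site and then follows its rules while crawling the site's content. Obeying the file is voluntary, so it does not stop a badly behaved robot from accessing anything.
Every record in a robots.txt file needs at least two fields, User-agent and Disallow: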
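A polite crawler reads the file once per site and checks it before every fetch. The sketch below assumes Python's standard urllib.robotparser module. So that it runs offline, it passes robots.txt text to parse() directly instead of downloading the file with read(). The agent name "ExampleBot", the /private/ path and the URLs are hypothetical:

```python
from typing import List
from urllib.robotparser import RobotFileParser


def crawlable(robots_txt: str, agent: str, urls: List[str]) -> List[str]:
    """Keep only the URLs that robots_txt allows `agent` to fetch."""
    parser = RobotFileParser()
    # A real crawler would call parser.set_url(...) and parser.read() to
    # download the site's robots.txt; parse() takes the lines directly.
    parser.parse(robots_txt.splitlines())
    return [url for url in urls if parser.can_fetch(agent, url)]


if __name__ == "__main__":
    # Hypothetical rules and pages for illustration.
    rules = "User-agent: *\nDisallow: /private/\n"
    pages = [
        "https://www.yoursitename.com/index.html",
        "https://www.yoursitename.com/private/notes.html",
    ]
    print(crawlable(rules, "ExampleBot", pages))
    # prints ['https://www.yoursitename.com/index.html']
```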

User-agent: *
Disallow:

The User-agent field names the robot to which the following Disallow rules apply. The Disallow field specifies the URLs that this robot may not access. An empty Disallow value, as in the example above, means nothing is blocked.
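An empty Disallow value grants access to everything. Python's standard urllib.robotparser module treats it the same way. Here is a minimal check of the first example file. The agent names and the URL are hypothetical placeholders:

```python
from urllib.robotparser import RobotFileParser

# The minimal file from the first example: every robot, nothing disallowed.
ALLOW_ALL = """\
User-agent: *
Disallow:
"""

parser = RobotFileParser()
parser.parse(ALLOW_ALL.splitlines())

# An empty Disallow value blocks nothing, so any robot may fetch any URL.
for agent in ("ExampleBot", "OtherBot"):  # hypothetical agent names
    allowed = parser.can_fetch(agent, "https://www.yoursitename.com/category/php/")
    print(agent, allowed)  # prints True for both agents
```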

User-agent: *
Disallow: /

Here “*” means all robots and “/” means all URLs, so this is read as “No access for any search engine to any URL”. A Disallow value blocks every URL path that begins with it. Every path begins with “/”, so a lone “/” bans access to the whole site. To give partial access, list only the paths to be blocked in the Disallow field.
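The rule described above amounts to a prefix test on the URL path. Below is a hand-written sketch of only that rule. The helper name is_disallowed is hypothetical, and real robots.txt parsers handle further details that this sketch ignores:

```python
from typing import Sequence
from urllib.parse import urlsplit


def is_disallowed(url: str, disallow_values: Sequence[str]) -> bool:
    """Return True if the path of url starts with any non-empty Disallow value.

    is_disallowed is a hypothetical helper showing only the prefix rule.
    """
    path = urlsplit(url).path or "/"
    # An empty Disallow value blocks nothing, so it is skipped.
    return any(value and path.startswith(value) for value in disallow_values)


if __name__ == "__main__":
    # "Disallow: /" blocks every URL, because every path begins with "/".
    print(is_disallowed("https://www.yoursitename.com/about.html", ["/"]))  # True
    # Partial access: only paths under /category/php/ are blocked.
    print(is_disallowed("https://www.yoursitename.com/category/php/intro.html", ["/category/php/"]))  # True
    print(is_disallowed("https://www.yoursitename.com/category/java/", ["/category/php/"]))  # False
```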

Let's consider this example:
# access for MSNbot.
User-agent: MSNbot
Disallow:
User-agent: *
Disallow: /category/php/

Here both fields are repeated. Each User-agent line starts a new record with its own Disallow rules, so different robots can be given different policies. A robot follows the record that names it and falls back to the “*” record only if no other record matches. The commands above therefore mean that all robots are banned from /category/php/ except MSNbot, which has full access. Everything from a # to the end of the line is a comment and is ignored.
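Python's standard urllib.robotparser module can confirm this reading. Below is a minimal sketch that feeds it the example file, comment line included. The agent name "ExampleBot" and the page URLs are hypothetical:

```python
from urllib.robotparser import RobotFileParser

# The example file from above, including its comment line.
ROBOTS_TXT = """\
# access for MSNbot.
User-agent: MSNbot
Disallow:
User-agent: *
Disallow: /category/php/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# Hypothetical pages on the example site.
page = "https://www.yoursitename.com/category/php/intro.html"
other = "https://www.yoursitename.com/category/java/intro.html"

# MSNbot matches its own record, whose empty Disallow allows everything.
print(parser.can_fetch("MSNbot", page))       # True
# Any other robot falls back to the "*" record and is kept out of /category/php/.
print(parser.can_fetch("ExampleBot", page))   # False
print(parser.can_fetch("ExampleBot", other))  # True
```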

Make sure the file name is always entirely lowercase: robots.txt. Names such as Robots.txt or robots.Txt are incorrect.
