Managing Robots.txt for WordPress

What is robots.txt for?

Robots.txt is a standard used by websites to communicate with web crawlers and other web robots. The standard specifies how to tell a robot which areas of the website should not be processed or scanned.

Understanding and Editing Robots.txt Configurations

1. Before adding or editing anything in your robots.txt file, note that you should not change any of the rules included in the stock robots.txt file. The best way to keep your additions separate is to add a commented line and list your new rules under it:

# sitename robots.txt rules
List your new rules here, under your comment and separate from the stock rules.
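
For example, a custom section appended after the stock rules might look like the following sketch (the user agent and paths are placeholders for illustration, not recommended values):

# sitename robots.txt rules
User-agent: *
Disallow: /staging/
Disallow: /internal-reports/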

 

2. The syntax of a robots.txt file is as follows:

User-agent: [the name of the robot the following rule applies to]

Disallow: [the URL path you want to block]

Allow: [the URL path of a subdirectory, within a blocked parent directory, that you want to unblock]
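
For instance, a minimal sketch using placeholder values (Googlebot and the /private/ paths are illustrative only):

User-agent: Googlebot
Disallow: /private/
Allow: /private/public-docs/

This tells Googlebot not to crawl anything under /private/ except the /private/public-docs/ subdirectory.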

 

3. After adding your new syntax, ES recommends running your robots.txt file through this robots.txt test to ensure it will work properly.

How is robots.txt implemented on a WordPress site?

For standalone WordPress sites: the robots.txt file is a standalone file (managed through code) in the docroot of the web server.

For multisite instances with a universal robots.txt configuration: if all the WordPress sites running on the multisite environment share one robots.txt configuration, use a standalone robots.txt file (managed through code) in the docroot of the web server.

For multisite instances that need individualized robots.txt configurations: to serve an independent robots.txt configuration for each site, the multisite environment will need custom server-side Apache rules/directives to accommodate the various robots.txt files. Consult your server administrator.
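
As a rough sketch of one possible approach (assuming Apache with mod_rewrite enabled and rules placed in an .htaccess file in the docroot; the hostnames and file names below are placeholders, and your server administrator may prefer a different mechanism):

# Hypothetical example: serve a different robots file per site, based on hostname
<IfModule mod_rewrite.c>
RewriteEngine On

# Requests for robots.txt on site1.example.com get robots-site1.txt
RewriteCond %{HTTP_HOST} ^site1\.example\.com$ [NC]
RewriteRule ^robots\.txt$ /robots-site1.txt [L]

# Requests for robots.txt on site2.example.com get robots-site2.txt
RewriteCond %{HTTP_HOST} ^site2\.example\.com$ [NC]
RewriteRule ^robots\.txt$ /robots-site2.txt [L]
</IfModule>

Each per-site robots file would then live in the docroot and be managed through code like any other standalone robots.txt.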
