How to Block or 404 out your Index

This article explains how to block your index.html or index.php with robots.txt to avoid duplicate content. It also shows how to return a 404 instead, which is likely the best option.

Edited: 2013-03-21 06:53

Avoiding duplicate content is important because you can't control which copy of a page will show up in the search results. For instance, if you own a website on the domain http://brugbart.com/, the directory index is usually named index.html by default, so the same page is also reachable at http://brugbart.com/index.html. Most people know this, and they will often type index.html into their browser's address bar, for whatever reason.

The threat arises when someone starts linking to index.html, or when search engines otherwise get to know about the URL. There are a number of ways to prevent this. The first one I'm going to mention uses robots.txt to disallow access to the page.

Using Robots.txt

Blocking search engines from crawling index.html from within robots.txt is fairly easy. Most major search engines recognise robots.txt and respect the rules you set inside it. Example below:

User-agent: *
Disallow: /index.html
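
The summary of this article mentions index.php as well as index.html. As a minimal sketch, assuming both file names can reach your front page, the same rule can be listed once per file:

```text
User-agent: *
Disallow: /index.html
Disallow: /index.php
```

Disallow rules match by path prefix, so these lines also cover URLs such as /index.php?page=2. Only add the second line if your real pages are not served through index.php URLs.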

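Of course, you can also disallow access for specific search engines only, by naming their crawlers instead of using the * wildcard. The example below blocks Googlebot (Google) and Slurp (Yahoo) from the whole site:

User-agent: Googlebot
User-agent: Slurp
Disallow: /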

Using PHP

If you are using PHP on your site, then I think this is one of the best methods. We simply check whether the requested path equals /index.php, which can be done with a simple PHP if statement. The $_SERVER['REQUEST_URI'] variable contains the requested URI as a root-relative text string (including any query string), so we can easily compare it against /index.php. Simply include something like the below near the top of your source, before any output is sent.

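// Answer with a 404 when index.php is requested directly
if ($_SERVER['REQUEST_URI'] === '/index.php') {
  header('HTTP/1.1 404 Not Found'); // must be sent before any output
  include_once '404.php'; // optional custom error page
  /* mysql_close($Connection); // include if you use MySQL */
  exit(); // stop here so the normal page is not rendered
}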
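
Because REQUEST_URI also carries the query string, a request such as /index.php?page=2 would slip past a plain string comparison. The following is only a sketch of one way to handle that, assuming PHP 7 or later. The helper name is_direct_index_request is hypothetical and not part of the original article, and 404.php is assumed to exist next to the script.

```php
<?php
// Hypothetical helper (an assumption for illustration): returns true when
// the request targets the index file directly, ignoring any query string.
function is_direct_index_request(string $requestUri, string $indexPath = '/index.php'): bool
{
    // With PHP_URL_PATH, parse_url() returns only the path component,
    // or null/false when there is no path or the URI cannot be parsed.
    $path = parse_url($requestUri, PHP_URL_PATH);
    return is_string($path) && $path === $indexPath;
}

// REQUEST_URI is only set for web requests, so guard against CLI runs.
if (isset($_SERVER['REQUEST_URI']) && is_direct_index_request($_SERVER['REQUEST_URI'])) {
    http_response_code(404); // sets the 404 status for the response
    include_once '404.php';  // optional custom error page, assumed to exist
    exit();                  // stop here so the normal page is not rendered
}
```

With this in place, both /index.php and /index.php?page=2 would be answered with a 404, while / and any other path are served as usual.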

The 404 error page is optional, but I do suggest that you make one. You can also read the article titled Creating a Custom 404 Error Page, which includes a sample file that you can use.