Control how search engines access and index your site

by Sam Glover on January 29, 2007

The Official Google Blog has a short introduction to robots.txt, the file on your server that tells search engines which parts of your site they may crawl and which to stay out of. From the OGB:

robots.txt

However, you may have a few pages on your site you don’t want in Google’s index. For example, you might have a directory that contains internal logs, or you may have news articles that require payment to access. You can exclude pages from Google’s crawler by creating a text file called robots.txt and placing it in the root directory. The robots.txt file contains a list of the pages that search engines shouldn’t access. Creating a robots.txt is straightforward and it allows you a sophisticated level of control over how search engines can access your web site.
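To make the excerpt concrete, here is a minimal sketch in Python of what such a file might look like and how a well-behaved crawler would read it. The directory names (/logs/ and /premium-articles/) and the example.com URLs are made up for illustration; they are not from the Google post. The sketch uses the standard library's urllib.robotparser to check the rules.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt contents. The paths are invented for this example:
# one directory of internal logs and one of pay-to-read articles.
ROBOTS_TXT = """\
User-agent: *
Disallow: /logs/
Disallow: /premium-articles/
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text the way a polite crawler would."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


parser = build_parser(ROBOTS_TXT)

# A crawler asks before fetching each URL.
print(parser.can_fetch("Googlebot", "https://www.example.com/logs/access.log"))
print(parser.can_fetch("Googlebot", "https://www.example.com/about.html"))
```

The first check comes back False because the URL falls under a disallowed directory, and the second comes back True. Keep in mind that robots.txt is a request to crawlers, not access control: anything truly private still needs a password or similar protection.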

The OGB article is the first part of a detailed instructional guide to robots.txt, with more to come. [via Lifehacker]
