I noticed a useful session at the upcoming Search Engine Strategies conference in San Jose. In exactly a month there will be a Bot Obedience class. People sometimes ask me how to “sculpt” where Googlebot visits, and my only other post about this was pretty technical, so I’ll take a stab at a shorter, clearer post.
At a site or directory level, I recommend using an .htaccess file to password-protect part of a domain. I wrote a quick example of setting up an .htaccess file about this time last year. I’m not aware of any bot (including Googlebot) that guesses passwords, so this is quite effective at keeping content out of search engines.
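Here’s a minimal sketch of that kind of setup, written in Python so it can generate both the .htaccess directives and a matching password-file line. It is an illustration, not the example from last year. The directives (AuthType, AuthName, AuthUserFile, Require) are standard Apache ones. The realm, user name, password, and file path are made-up placeholders. I picked Apache’s {SHA} password format only because Python’s standard library can compute it; for a real site, Apache’s own htpasswd tool with bcrypt (htpasswd -B) is the better choice.

```python
import base64
import hashlib


def htaccess_basic_auth(realm: str, passwd_file: str) -> str:
    """Return Apache .htaccess directives that require HTTP Basic auth for a directory."""
    return "\n".join([
        "AuthType Basic",
        f'AuthName "{realm}"',
        f"AuthUserFile {passwd_file}",  # hypothetical path; keep it outside the web root
        "Require valid-user",           # any user listed in the password file may enter
    ])


def htpasswd_sha_entry(user: str, password: str) -> str:
    """Return one htpasswd line in Apache's {SHA} format: user:{SHA}<base64 of SHA-1 digest>.

    Illustration only: SHA-1 is weak, so prefer bcrypt (`htpasswd -B`) for real use.
    """
    digest = hashlib.sha1(password.encode("utf-8")).digest()
    return f"{user}:{{SHA}}{base64.b64encode(digest).decode('ascii')}"


if __name__ == "__main__":
    print(htaccess_basic_auth("Private area", "/home/example/.htpasswd"))
    print(htpasswd_sha_entry("alice", "password"))
```

A bot that requests anything in that directory without credentials simply gets a 401 response, which is why this works so well for keeping content out of an index.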
At a site or directory level, I also recommend a robots.txt file. Google provides a simple robots.txt checking tool so you can test a file before putting it live.
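If you also want a quick local sanity check, Python’s standard urllib.robotparser module can answer “may this user-agent fetch this URL?” for a given robots.txt. This is a sketch under assumptions: the rules and example.com URLs are made up, and robotparser is a simple approximation rather than Google’s own parser (it uses plain prefix matching and doesn’t understand Googlebot’s wildcard extensions), so Google’s checking tool remains the authority.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: keep Googlebot out of /private/, keep every other bot out of /tmp/.
ROBOTS_TXT = """\
User-agent: Googlebot
Disallow: /private/

User-agent: *
Disallow: /tmp/
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text directly, without fetching anything over the network."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


if __name__ == "__main__":
    rp = build_parser(ROBOTS_TXT)
    for agent, url in [
        ("Googlebot", "https://www.example.com/private/report.html"),
        ("Googlebot", "https://www.example.com/tmp/scratch.html"),
        ("SomeOtherBot", "https://www.example.com/tmp/scratch.html"),
    ]:
        print(agent, url, rp.can_fetch(agent, url))
```

Note the second case: because Googlebot has its own group, it ignores the rules in the “*” group, so /tmp/ is open to it. That’s an easy mistake to make, and exactly the kind of thing worth testing before a file goes live.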
At a page level, use meta tags in the head of your HTML page. The noindex meta tag keeps a page out of Google’s index entirely, which makes it a good fit for any page that’s confidential. The nofollow meta tag prevents Googlebot from following any outgoing links on a page. This page shows the proper syntax.
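For reference, a robots meta tag looks like <meta name="robots" content="noindex, nofollow"> inside the page’s <head>. Here’s a rough Python sketch of how a crawler might read those directives, using the standard library’s html.parser. It’s my own illustration, not Googlebot’s code, and the class and function names are made up.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collect directives from <meta name="robots"> and <meta name="googlebot"> tags."""

    def __init__(self) -> None:
        super().__init__()
        self.directives: set[str] = set()

    def handle_starttag(self, tag, attrs):
        # HTMLParser already lowercases tag and attribute names; attribute values keep their case.
        if tag != "meta":
            return
        attr = {name: (value or "") for name, value in attrs}
        if attr.get("name", "").lower() not in ("robots", "googlebot"):
            return
        for token in attr.get("content", "").split(","):
            token = token.strip().lower()
            if token:
                self.directives.add(token)


def robots_directives(html: str) -> set[str]:
    """Return the lowercased robots directives (e.g. 'noindex', 'nofollow') declared in a page."""
    parser = RobotsMetaParser()
    parser.feed(html)
    parser.close()
    return parser.directives


if __name__ == "__main__":
    page = """<html><head>
<meta name="robots" content="noindex, nofollow">
<title>Confidential</title>
</head><body>...</body></html>"""
    print(sorted(robots_directives(page)))  # ['nofollow', 'noindex']
```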
At a link level, you can add a nofollow attribute to individual links to prevent Googlebot from crawling them (you could also make a link redirect through a page that robots.txt forbids). Bear in mind that if other pages link to a URL, Googlebot may find the URL through those other paths. If you can, I’d recommend using .htaccess or robots.txt (at a directory level) or meta tags (at a page level) to be safe. I’ve seen people try to sculpt Googlebot visits at the link level, and they always seem to miss a few links.
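Here’s a toy Python model of why link-level sculpting leaks: a simplified crawler that skips rel="nofollow" links but follows everything else. It’s a hypothetical illustration (the page paths and helper names are made up), not how Googlebot actually discovers or schedules URLs, but it shows how one forgotten link on another page is enough to expose a URL.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Split a page's <a href> links into followed and rel="nofollow" buckets."""

    def __init__(self) -> None:
        super().__init__()
        self.followed: list[str] = []
        self.nofollowed: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attr = dict(attrs)
        href = attr.get("href")
        if not href:
            return
        # rel can hold several space-separated tokens, e.g. rel="external nofollow".
        rel_tokens = (attr.get("rel") or "").lower().split()
        if "nofollow" in rel_tokens:
            self.nofollowed.append(href)
        else:
            self.followed.append(href)


def discovered_urls(pages: dict[str, str]) -> set[str]:
    """Toy crawler model: collect every URL reachable through at least one followed link."""
    found: set[str] = set()
    for html in pages.values():
        collector = LinkCollector()
        collector.feed(html)
        collector.close()
        found.update(collector.followed)
    return found


if __name__ == "__main__":
    pages = {
        "/a.html": '<a href="/secret.html" rel="nofollow">careful link</a>',
        "/b.html": '<a href="/secret.html">forgotten link</a>',
    }
    print("/secret.html" in discovered_urls(pages))  # True: /b.html still exposes it
```

A directory-level robots.txt rule or a page-level noindex doesn’t depend on every linking page getting it right, which is why I prefer those.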
If the content has already been crawled, you can use our URL removal tool. This should be your last resort: it’s much easier to keep us from crawling content than to remove it afterwards (plus the content will be removed for six months). This help page discusses how to remove other types of content from Google.
Update: Vanessa Fox pointed out this Googlebot help page, which covers a ton of other Googlebot questions.