Sometimes people think that the Google Toolbar led to Google indexing a page. Here’s one recent story, for example, which speculates about how urls with the substring “mms2legacy” got indexed. Here’s where I started to disagree:
The reason for this [supposedly unlisted urls getting crawled --Matt], explained Ken Simpson, CEO of anti-spam company MailChannels, is that one’s Google Toolbar may be configured to pass URLs that one visits to Google for indexing. “If you run Google Toolbar, it knows pages you visit,” he said.
Sorry, but if Ken Simpson is implying that the Google Toolbar led to these urls being crawled, then he’s mistaken. Let’s take the first result from the [inurl:mms2legacy] query given in the article. The first url in that result set that I saw was http://mediamessaging.o2.co.uk/mms2legacy/showMessage2.do?encMmsId=F1ABCF6D326A3F65 . If you take the string F1ABCF6D326A3F65 from that url and search for it, you’ll find multiple references to that url. In the cases I looked into, we found these pages because someone had published a link on http://my.opera.com or elsewhere on the web. I can definitively say that all the urls I looked into were discovered by crawling regular old links.
Folks with great memories may remember that I’ve talked about this before. Back in 2006, both Philipp Lenssen and Google OS did controlled experiments by visiting unlinked deep pages with the toolbar, and both concluded that the toolbar did not lead to those urls being indexed.
It’s good to reiterate this every couple of years, though, especially as Google has gotten better at finding new pages as it crawls. We get questions like this often enough that we have an FAQ answer about it:
Why is Googlebot downloading information from our “secret” web server?
It’s almost impossible to keep a web server secret by not publishing any links to it. As soon as someone follows a link from your “secret” server to another web server, your “secret” URL may appear in the referrer tag and can be stored and published by the other web server in its referrer log. So, if there’s a link to your “secret” web server or page on the web anywhere, it’s likely that Googlebot and other web crawlers will find it.
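To make the referrer leak in that FAQ answer concrete, here’s a minimal sketch in Python. It assumes the other web server writes its access log in the common Apache/nginx “combined” format; the log lines, IP addresses, and the secret.example.com url are all made up for illustration. The point is only that the “secret” url ends up sitting in someone else’s log, where it can be published and then crawled like any other link.

```python
import re

# Matches the tail of a "combined" format access log line:
#   "request" status bytes "referer" "user-agent"
COMBINED = re.compile(
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)

def referrers(log_lines):
    """Return the set of referring urls recorded in combined-format log lines."""
    found = set()
    for line in log_lines:
        m = COMBINED.search(line)
        # "-" (or an empty field) means the browser sent no Referer header.
        if m and m.group("referer") not in ("", "-"):
            found.add(m.group("referer"))
    return found

# Hypothetical log from a site that the "secret" page happens to link to.
sample_log = [
    '203.0.113.7 - - [10/Oct/2008:13:55:36 -0700] "GET /index.html HTTP/1.1" '
    '200 2326 "http://secret.example.com/private/report.html" "Mozilla/5.0"',
    '203.0.113.8 - - [10/Oct/2008:13:56:01 -0700] "GET /about.html HTTP/1.1" '
    '200 512 "-" "Mozilla/5.0"',
]

leaked = referrers(sample_log)
print(leaked)
```

If that other site publishes its referrer statistics on a public page, the url in `leaked` becomes a regular old link.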
Security through obscurity is not a great way to keep a url from being crawled. If you don’t want your content in Google’s web index then we provide a ton of advice on how to prevent that content from getting into Google.
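As one illustration of that advice, a robots.txt file is a widely used way to ask crawlers to stay out of part of a site. The sketch below uses Python’s standard `urllib.robotparser` to check how a hypothetical robots.txt would apply to two urls; the rules, the www.example.com host, and the paths are assumptions for the example, not taken from any real site.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt asking all crawlers to stay out of /private/.
robots_txt = """\
User-agent: *
Disallow: /private/
"""

parser = RobotFileParser()
parser.parse(robots_txt.splitlines())

# can_fetch() reports whether the named user agent may crawl the url
# under the rules parsed above.
blocked = parser.can_fetch("Googlebot", "http://www.example.com/private/report.html")
allowed = parser.can_fetch("Googlebot", "http://www.example.com/index.html")
print(blocked, allowed)
```

Keep in mind that robots.txt is itself a public file, so it is a request to well-behaved crawlers rather than a lock. Content that must stay private is better placed behind a login.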