Debunking: Toolbar doesn’t lead to page being indexed

Okay, looks like I’ve got one more debunking (and fun!) blog post in me this weekend.

So many people talk all the time about SEO. Is it better to use hyphens or underscores? Is it better to separate meta tags with commas or spaces? Is it worth doing the table trick? Can the Google Toolbar cause pages to be indexed? Many of these questions work out well if you just experiment with them. Here’s an example.

You sometimes hear people say “I installed the Google Toolbar, and a day later, Google crawled my secret/unlinked page. Clearly installing the Google Toolbar caused that!” Then you’ll often see me post and say “No, it didn’t.” You’ll often see me point to this page that discusses how a page that you think is secret and unlinked can be crawled (hint: our addurl form is one way, referrer leaks are another). Philipp Lenssen decided to try an experiment. He created an unlinked web page in August, then visited it with the Google toolbar to see if it would be crawled. Read his description of the experiment, then come on back.

When I heard about his experiment, I wrote to give him some advice on how to do the experiment well:

Just to be safe, you should make sure that the name isn’t guessable (e.g. use a different long random number for the path/filename). If it’s guessable, someone could submit the url to Google. I’d also keep an eye out for any accesses to that page at all, b/c if someone finds it and surfs to a new site, it could leave a referrer in the server log of the dest site, which then might turn into a hyperlink that Google could crawl.
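
As a rough illustration of the “long random number” advice, here is a minimal Python sketch, assuming the standard-library `secrets` module is an acceptable source of randomness. The function name `unguessable_path`, the `test-` prefix, and the `.html` extension are made up for this example; they are not what Philipp actually used.

```python
import secrets


def unguessable_path(prefix: str = "test-", n_bytes: int = 16) -> str:
    """Return a filename whose middle part is a long random hex string.

    The prefix and extension are illustrative assumptions only.
    """
    # token_hex(n_bytes) yields 2 * n_bytes hex characters drawn from a
    # cryptographically strong random source, so the name is hard to guess.
    return f"{prefix}{secrets.token_hex(n_bytes)}.html"


if __name__ == "__main__":
    print(unguessable_path())
```

With the default of 16 random bytes, the random part is 32 hex characters long, which is far too many possibilities for anyone to stumble on by guessing.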

Philipp replied and offered a bet. Eventually we settled on the terms. If the hidden page showed up in our search results, I would autograph this card and send it back to him:

Matt Cutts trading card

(This card is part of a fun series that Philipp did in May 2006.)

If the hidden page (and he didn’t tell me where it was, so it could have been anywhere) never showed up in Google after a couple months, he would send a copy of his book, 55 Ways to Have Fun With Google, to anyone that I chose.

Google didn’t index the hidden page that Philipp visited with the toolbar, so I won the bet. 🙂 Now the question is: who should get the free copy of Philipp’s book? I already bought myself a personal copy of Philipp’s book months ago. Should I donate this new copy to Google’s engineering library or send it to some SEO who needs to have more fun with Google? Let me know your thoughts. The main thing is that I’m glad an experiment by a smart third party supports what I’ve been saying for a while. 🙂

More details for the terminally interested.
Q: What toolbar did Philipp use?
A: I didn’t know until he wrote it up. It turns out Philipp used the Firefox toolbar. However, in the comments on the experiment, Ionut reveals that unbeknownst to both of us, he ran a similar experiment starting in August with the IE toolbar, with the same results.

43 Responses to Debunking: Toolbar doesn’t lead to page being indexed

  1. What if both of you autographed the book and you put it up for auction (e.g. on eBay) with the proceeds going to charity?

  2. Ahhh…we all know what referrers are, but what do you mean by referrer leaks?

  3. Well if you’re looking to unload it I’ll take it. 🙂

    I’ve been doing a lot of research on how to best optimize a website and not sacrifice on design. So this book could be helpful…or as Multi-Worded Adam said you could auction it and do the charity route…I know of a great charity…the Seth Aldridge Charity for Extended Learning, Future Growth and More Learning Foundation.

    It’s new…I just finished the business plan…in this post… 🙂

  4. How’s about you send that book to me? It’ll help me out since I just launched an SEO site (seonewsblog.com) and I don’t think I’ve read that one yet.

    Thank You

  5. Hehe on the google FAQ page – want to contact them? Go here: http://books.google.com/support/bin/request.py?user_type=user&contact_type=suggest_new&submit=Continue

    Evidently some form is supposed to go there 🙂

  6. [cross-posting my Blogoscoped comment here]

    Nice experiment – always fun to debunk some of the tin-foil hat conspiracies with a simple test.

    Ditto Ahmedf’s suggestion that you should check your web server logs, Philipp – while Google could index the page without looking at the contents (and correctly does, IMHO, for blocked pages that are linked, as Matt has pointed out in the past), I assume it would at least make an attempt to spider it if it got into the indexing process.

    I.e. a suggested update/clarification to your post would be that no attempts to block Googlebot were employed (i.e. robots.txt, meta tag, etc.) and that the web server logs showed that nobody except you (and Matt) accessed that non-password protected page.

    BTW, I didn’t see any toolbar pagerank on the “hidden page” either – just “dotting another ‘i’” for ya … 😉

    P.S. Great idea on the charity idea – if I can toss mine in for consideration, how about the University of Maryland Center for Celiac Research – over $14,000 raised so far with offbeat awareness/fundraisers like my Controllable Christmas Lights for Celiac Disease. My kids have this, so it’s personal for me.

  7. send it to me for writing up all your videos :p

  8. I’d say: write out a small competition asking people to write a script you need and give it to the one who writes it fastest 🙂 (and don’t forget to give me a head start because I suggested this 😉 )

  9. Tom Churm, if you have referrers turned on and surf from page A to page B, the webserver for page B will see that referer. And sometimes the webserver for page B shows those referers as hyperlinks.
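
    To make the referrer-leak idea concrete, here is a small Python sketch that scans web server log lines. It assumes the logs use the common “combined” log format, and the path `/hidden-3f9a.html`, the hosts, and the sample log lines are all invented for illustration. `hits_for_path` lists who requested the hidden page; `referers_mentioning` shows the kind of entry a destination site’s log would record if someone surfed away from that page.

```python
import re

# Combined Log Format (an assumption about the server's configuration):
# host ident user [time] "request" status bytes "referer" "user-agent"
LOG_LINE = re.compile(
    r'^(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+)[^"]*" (?P<status>\d{3}) \S+ '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)


def hits_for_path(lines, hidden_path):
    """Return (host, referer, user-agent) for every request to hidden_path."""
    hits = []
    for line in lines:
        m = LOG_LINE.match(line)
        if m and m.group("path") == hidden_path:
            hits.append((m.group("host"), m.group("referer"), m.group("agent")))
    return hits


def referers_mentioning(lines, hidden_path):
    """Return (host, requested path, referer) where the Referer contains hidden_path."""
    leaks = []
    for line in lines:
        m = LOG_LINE.match(line)
        if m and hidden_path in m.group("referer"):
            leaks.append((m.group("host"), m.group("path"), m.group("referer")))
    return leaks


# Hypothetical log lines, made up for this example.
sample = [
    '203.0.113.5 - - [01/Dec/2006:10:00:00 +0000] '
    '"GET /hidden-3f9a.html HTTP/1.1" 200 512 "-" "Mozilla/5.0"',
    '198.51.100.7 - - [01/Dec/2006:10:05:00 +0000] '
    '"GET /index.html HTTP/1.1" 200 2048 '
    '"http://example.com/hidden-3f9a.html" "Mozilla/5.0"',
]

if __name__ == "__main__":
    print(hits_for_path(sample, "/hidden-3f9a.html"))
    print(referers_mentioning(sample, "/hidden-3f9a.html"))
```

    If the second kind of entry ends up on a publicly visible stats page as a hyperlink, a crawler can find the “secret” URL without any toolbar being involved.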

  10. That was comical – of course the toolbar helps get pages crawled – but if you admit it, well, we all can imagine what would happen… All the frenzy that would follow would make your toolbar send back wrong data to you, thus messing up whatever plans you might have for that data…

    Anyhow, I will blow the whistle – if you want to get your page(s) visited by GBot – all you have to do is:

    With the toolbar installed, visit the page in question, and then on the toolbar click on “Cached Snapshot of Page”. This will send a request back to G for the cache, and if not found – the bot will visit within 2-3 days.

    That’s a fact, and you can deny it all you want 😉

  11. Mikkel deMib Svendsen

    Good post, Matt! There are so many myths in SEO that I think there should be plenty for at least a couple of seasons of “SEO Mythbusters” 🙂

    I know, as you know, because I test a lot of it myself, and often what may originally have been seen as a reasonable theory turns out to be wrong. The problem, I think, is that in many forums where “creative” ideas are being discussed people don’t realize that 98% of those discussions are based on assumptions and limited testing – it’s very often not proven facts. That’s just the way SEO is – especially in the grey areas of SEO 🙂 If you just know that, such discussions can be both entertaining and educational.

    Having said that, I think many of us have seen “strange” indexing where it was hard, or impossible, to find the source of the indexing. I am not saying that the answer to that is that the toolbar did it. I am just saying that apparently Google picks up URLs from many places – some of which I may not even know about. And actually, I think that’s fine. If a URL is public and not restricted by robots.txt I don’t mind it getting crawled. If it’s secret it should not be publicly accessible.

  12. please dear god debunk the fact that this new google logo on ads isnt going to stay!

  13. Matt. Earlier this year (or maybe late last year) Google’s regular spider started to take URLs from the AdSense spider’s fetches, to save having to fetch the pages again. Is it possible for a new unlinked-to page, with AdSense ads on it, to end up in the regular index that way? I.e. does the regular system learn of new URLs from the AdSense system?

  14. Saying “Google Funds Terrorists” is like blaming Crest toothpaste if you got cavities from drinking 18 cans of regular Coke a day.

    If you want to stop terrorism, simple – don’t drive a car that runs on gasoline.

  15. Wouldn’t this experiment have been a lot easier and less time consuming simply by sniffing outgoing traffic at the time of visiting the page?

  16. The charity idea is best I must say (drat, Adam already pulled the oft used “charity = instant worthy idea” card), but you might also consider giving it to whoever comes up with an experiment that could disprove the “I added a Google sitemap and all my pages disappeared” theory.

    The title of that book should be changed to…

    “55 Ways to Have Fun with Google and 1 Whopping Good way to Promote your Book”

  17. Man are you UGLY! ;o)

  18. Riddle me this, SpamMan?

    We were diagnosing an issue with unexplained errors on our site a few months back subsequent to the creation of a new page (a POST). As I reviewed the logs, I found a hit from an IP that is Google’s and whose referrer was MediaBot … within 4 seconds of the page being created. Frankly, I was impressed. How could MediaBot, or any crawler have known, within a few seconds, that a new page was present? Yes, there was a link to the new page on the site, but 3 second turnaround?

    We discovered the cause of our error; the request was to the URL I had created in my test, but was a GET (not a POST), thus contained none of the required parameters. We handled this case in our code more gracefully, but we still get quite a lot of them. I would be easily prepared to admit that we had a bug in our code, or I hit enter twice or something, except for the IP and referrer in the logs.

    I have Firefox and no other search toolbars installed, and am pretty paranoid about spyware. The requests we get are certainly not limited to pages I enter or others in my company, so it seems likely that the source is widely distributed. By no means are all follow-up hits that quick and there is little regularity to them, but they do seem to happen surprisingly quickly, not infrequently within a matter of a few minutes of a new post.

    Several of us studied the logs pretty carefully and could only deduce that the Googlebar watched for new URLs. I have a Google account and things like personalized search, etc. The privacy statement I agreed to was certainly broad enough to include this kind of thing, and tools like Alexa and others do this. So, I figured, why not GoogleBar? (I was a little miffed that it did a GET when the request was supposed to be a POST).

    So is my conclusion incorrect?

    Tom

  19. Sorry, in my previous post I should have said User-Agent where I said referrer.

  20. Mikkel deMib Svendsen, anyone can run this experiment, which is nice. 🙂

  21. Of course anyone can run it – but will they get a book from Philipp?

    I’ve always wanted to read this book. hint hint

  22. Hi Matt!

    I just came across one of the powers of Google that I didn’t want to extend that far…

    I searched for a name that I was mostly known at school and at the university. Yet I am appearing under that name in Google under an Internet Portal where I stopped using the former name at least 2 years before registering to the portal.

    I don’t much like Google showing me under my old name on a recent page.
    Is there a way to ask Google who on earth had linked to this portal using my old name (since all people I knew through this portal only knew the new one) and how to ask Google to remove this link?

    I am not keen on all my old class mates (good or bad) reading my comments on this portal just by using the old name.

    Sorry it is off topic but I didn’t know where to ask.

  23. Send it to Jerry Yang and David Filo. They need all the help they can get.

  24. I’d love to have a signed card to use as a book mark for this book

  25. Simple solution: send me the book 🙂

  26. The charity idea is best I must say (drat, Adam already pulled the oft used “charity = instant worthy idea” card)

    Sorry about that. I’m not one to pull it either, but anything else leads to a “two seven-year-olds, one ice cream cone” scenario and no matter how Matt chooses, he can’t win.

    This was the only out I could see.

  27. Matt, I think that the first 55 comments to this topic should receive a copy of Philipp’s book. Getting the card too would also be great 🙂

  28. Matt,

    Thanks for debunking some myths. It is difficult to sort through pages and pages of webmaster posts to get a good sense for solid webmastering.

    I have a question about Google penalties that may need to be debunked.

    I previously mentioned that I had a problem with Google after inserting an .htaccess file. I got penalized for duplicate content and decided to fix the problem by switching out the platform on my site so that the software produced no duplicate URLs for my site. After about 6 months my site started receiving better placement in Google.

    That only lasted for about 4 months and then it seems I received another penalty (or sandbox) from Google that is again approaching the 6 month mark.

    I’m of the opinion that my original approach to solving the duplicate content problem was the correct one. Approximately 10 months after I switched out the platform from my site, the page count for my site using the site command, site:domain.com, showed that Google’s list of my pages decreased from 90,000 to the more or less correct 9,000 mark. The duplicate pages that now went to a custom 404 error page were all removed from the index.

    The problem is that it’s been three months since those changes to my site have appeared in Google, still, my site rankings are well below my five year historic Google average (I use the 5 year average as a comparative tool for defining what I believe to be a penalty caused reduction in Google rankings.)

    Also, I have individual meta titles and description tags for each page on my site, no affiliates, and it’s basically a content site filled with original articles and photography.

    After all the technical work I’ve invested in my site during the past year, seeing the positive results in how Google now lists my site and still seeing no substantive results, as far as a return of SERPs go, I’m wondering about the duration of Google penalties.

    If sites are not under a “penalty” and their rankings do not return, is it better to just start over with a new domain? I’ve got ten years invested in my site and have run out of ideas of how to fix it with Google.

  29. After having looked at the card again, I don’t know which is more disturbing:

    1) The idea that maybe Philipp could draw some bad guys and spammers and other characters and create the big G equivalent of Magic: The Gathering.

    2) The number of us that would actually go out and buy the game (admit it, you would too!)

    3) My overwhelming urge to correct Philipp by pointing out that Matt should have an Inigo Montoya sword rather than an axe.

    4) I can clearly identify the font used on the card as Avant Garde.

    5) My compunction to blame Matt for indirectly turning us all into geeks when I already was one beforehand. Oh to hell with it, I’m blaming him. Matt, it’s all your fault. EVIL MATT! EVIL EVIL EVIL!

  30. Send it to Jerry Yang and David Filo. They need all the help they can get.

    I was thinking to send it to Bill Gates 🙂

  31. Since my site is built with MS FrontPage you should give the book to me, it will help relieve the pain.

  32. Adam, I felt a little bit guilty about using the “card” comment, it really is a good idea. Sorry about that.

    The only problem is the method: what if no one wants to pay for it? Nothing worse than an eBay charity auction that only raises four dollars and fifty eight cents.

    He he (I hope Matt doesn’t use his card on me now!)

  33. Hey matt, since you’re on debunking duty can you do this one for me?

    http://jeremy.zawodny.com/blog/archives/008122.html

    i smell a rat.

  34. My problem is I’ve set up a site using Google Apps, very nice product, easy to use etc. but it has a robots.txt file automatically applied by Google that stops all crawlers from visiting and indexing the site. And yes I’ve tried over-writing the file, no luck. Am I missing something or is not the point of a web page to be found and read? Maybe I should try some of that Google Toolbar magic and recite some SEO-voodoo incantations, looks like magic is all that I’ve left to get my site on Google.

  35. No harm, no foul, Pat…now with last name and 20% more cleaning power!

    I look at it like this: if someone would pay $16-$20 retail for a “regular” book, if it’s signed that’s generally worth quite a bit more. At the very least, start the bidding at $10 so at least you get half of face value and cover the auction off.

    I wouldn’t be surprised to see bidding go into 4 digits for something like that.

  36. This is great, but in the meantime, the folk at Webmasterworld are absolutely fuming. Some examples:

    “It’s been a while since the last time we’ve seen this much criticism over Google. And MC popping up here and saying nothing has happened sure doesn’t help.”

    “[Google:] Sorry, our index is currently experiencing technical difficulties… please allow 1 to 4 weeks for us to realize that there’s a problem and then an additional 1 to 6 weeks to fix it. Thank you for your endless patience. In the meantime, to relieve the stress of lost holiday profits, please a) spend more money on Adwords, and b) laugh at the funny fake collector card on Matt Cutt’s blog.”

  37. Good job Matt, you’re truly a MythBuster. Or Seobuster…

    It’s nice to know what’s true and what’s a myth.
    With SEO, most info out there is just that – a myth

    Matt Cutts the Seobuster……

  38. Many people in China are talking about this issue. Many still believe links are a vital part of getting indexed by Google.

  39. It’s a good thing to test it. It may seem somehow logical, but doesn’t work IRL 😉

  40. Matt,
    I would love to have that book, but I’m sure it has been given away by now.
    Scott

  41. That would be great if I could see the book. By the way, how can we know whether the toolbar gets our page crawled or not? I use the Google Toolbar only for short searches and sometimes for PageRank. Thanks for the article.

  42. Earlier this week we put up a nearly-blank page on a small site, with no links in or out whatsoever, and a filename that probably wouldn’t be guessed. I immediately browsed to the page using IE that has the Google toolbar. I looked at the logs a couple hours later — and saw that Googlebot had made 4 hits to the page 20 minutes after it was created and first browsed-to.

    The information from Matt is now 5 months old. I’m wondering if things have changed. We can come up with no other explanation than the Toolbar transmitting data used by Googlebot.

  43. Evidence suggests that Google IS indexing the page we visit when we have Toolbar’s “web-history” and “pagerank” on.

    Why? Look at this for example:
    http://www.google.nl/search?q=Controleer+of+alle+woorden+correct+zijn+gespeld.+site:goudengids.nl&hl=nl&start=10&sa=N&filter=0

    This returns TONS of pages that say “check if the spelling is correct” (=no results of the query).

    ??? It seems that Google is indexing URLs generated while we make search queries. Can you please explain?
