I got up this morning and decided to use a bottle of vitamin C pills as a Magic 8-ball to tell me what to write about today. I asked “What should I talk about, magic vitamin bottle? Good SEO, Bad SEO, or Yahoo’s 19 billion webpages claim and things to know when comparing index sizes?” I gave the bottle three choices because there are three colors of vitamin C pills: purple, garish red, and even more garish red.
Each time I shook the bottle and dropped out a pill, it was purple. Yahoo. Purple. Yahoo. And the third time: purple. Yahoo. Friggin’ weird, man.
Well that’s just tough, vitamin bottle. I believe in free will, it’s Friday, and I don’t feel like discussing the finer points of index size assessment today. It’s not as if Y! is going to change their position on having 19B distinct webpages, so it’ll keep for a while. I want to talk about Bad SEO, sending unsolicited link exchange requests, and JavaScript redirects.
Fun little aside within this debate:
Yahoo is indexing close to HALF A BILLION pages from their own site. If G wants to “catch up,” maybe point Googlebot at yahoo.com for a while 😉
When Googlebot learns to index sites fully, then I will believe your calls of bullshit about Yahoo’s index size.
Three of my sites have fewer than half as many pages indexed in G as in Y (and even then, many of those are of the known-about-but-not-indexed variety).
> and things to know when comparing index sizes?
I’ve always found it rather silly when little boys compare sizes of … hmm, whatever “objects” they love. But it takes more than one to turn it into a game, and I don’t think anyone can claim Yahoo is the only one that has been playing this game of size.
All major search engines have a great deal of questionable content indexed: their own SERPs, dead pages, duplicate content and general crap. Who’s got the most is not really important to me, or to most other searchers. What is important to me is not whether I get 1 million or 10 million results for my query, but whether I get just that ONE page I was hoping to find. Quality matters; size rarely does.
I wish that all the major engines would find a common way to measure and compete on quality instead. Danny Sullivan has been preaching this for years, but so far with very little response. It would make so much more sense than this, sorry, rather childish “Look, mine is bigger than yours” game 🙂
As a (relatively) uninformed person who can only judge from my own search engine logs, I’d say the most obvious difference in completeness of indexing is how query strings are handled. I had an old Drupal site hosted with a lousy provider (whom I’ve since ditched in favor of the wonderful TextDrive people) that wouldn’t let me use mod_rewrite. So I had “ugly” URLs. The MSNBot happily indexed the whole site instantly, ugly query strings and all. Yahoo indexed a large chunk of the site, sticking to the places where the query strings were only mildly ugly. Google indexed my front page. Maybe 6 months or so later, they indexed the rest of my site. It went from one page indexed by Google to about 1100. (Just checked, since the results never got cleared from Google’s index, even though the site doesn’t exist and hasn’t for months now. Wow, didn’t realize I had that many!)
But consider Amazon. For any given item on Amazon, there are about 1000 different URLs that will take you to that same item. Affiliate links, lookups by ASIN, ISBN, combinations of all of the above, etc. They all effectively represent the same page as long as no redirects are involved, right? Amazon’s certainly not alone in this regard; there are bazillions of other sites that do similar things. A search engine index could count each version as a separate entry, or it could be sophisticated and count them all as one page somehow, or it could ignore pages with query strings in them entirely and pretend that mod_rewrite doesn’t exist. The search engines all seem to do different things, so it’s really hard to judge what that index size count even means. If it means anything at all.
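To make that concrete, here is a rough Python sketch of those three counting policies applied to the same handful of URLs. Everything in it is made up for illustration: the `shop.example` host, the `asin` and `tag` parameter names, and the assumption that `tag` never changes what the page shows. It is not a description of how any real search engine counts pages.

```python
from urllib.parse import urlsplit, parse_qsl

# Hypothetical URLs: three routes to the same made-up item, plus one plain page.
URLS = [
    "http://shop.example/item?asin=B000TEST01",
    "http://shop.example/item?asin=B000TEST01&tag=affiliate-1",
    "http://shop.example/item?tag=affiliate-2&asin=B000TEST01",
    "http://shop.example/about",
]

# Assumption: these query parameters only track referrals and never change the content.
IGNORED_PARAMS = {"tag"}


def count_every_url(urls):
    """Policy 1: every distinct URL string counts as its own page."""
    return len(set(urls))


def count_canonical(urls):
    """Policy 2: drop ignorable parameters and sort the rest, so equivalent URLs collapse."""
    seen = set()
    for url in urls:
        parts = urlsplit(url)
        params = sorted(
            (key, value)
            for key, value in parse_qsl(parts.query)
            if key not in IGNORED_PARAMS
        )
        seen.add((parts.netloc.lower(), parts.path, tuple(params)))
    return len(seen)


def count_without_query_strings(urls):
    """Policy 3: skip any URL that carries a query string at all."""
    return len({url for url in urls if not urlsplit(url).query})


print(count_every_url(URLS))              # 4
print(count_canonical(URLS))              # 2
print(count_without_query_strings(URLS))  # 1
```

Same four URLs, three different “index sizes,” which is why a raw page count says little unless you know which policy produced it.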
P.S. Vitamin C as a magic 8-ball? Hot!
Haha fun story, I guess Yahoo is getting on your nerves 🙂