Archives for March 2006

April Fools Watch

Adotas mentions new AdSense unit sizes of 1×1 pixel and 2560×1920 pixels. What about people with bigger screens than that though?

How the heck did the Sydney Morning Herald find out about Google’s quantum-based spam filtering?

Philipp notices Google’s new room search prototype.

Ask introduces RhymeRank, which lets you find related queries that rhyme with your original query. It may only work in English, but that’s okay: it’s in gamma release.

Here’s my favorite though. MSN allows anyone to make up their own fake search results. I oblige. But the best part? I tried to customize the jokes in Firefox and got an error message: “xml could not be sent: Permission denied.” Meta-humor, or just incompatibility? Either way, it’s funny. Wait, I tried it in IE and it didn’t work either:

MSN April Fools

Oh well, it’s still funny. What’s the best April Fools Joke you’ve seen or done?

Update: Slashdot decided to go with a fresh new pink color scheme and ponies:

Slashdot: Pink with Ponies

Yahoo! bought all of Web 2.0! The whole shebang! I’ve done a panel at the Web 2.0 conference; does this mean that Yahoo! has also acquired . . . me?!

And Google rolls out a new product on the first day of Q2: Google Romance. I’m glad we’re applying the power of our search technology to finding that special someone. Looks like the upload has already crashed under the strain, but I was able to download the FAQ page. Here’s a couple questions:

10. What do you mean when you say Google Romance is a beta product?
What do you mean when you ask us what we mean when we say Google Romance is a beta product? It is what is it, okay? It’s new, it’s probably still buggy, which is to say that yes, by using this product now you conceivably are setting yourself up for a disastrous outcome – but on the other hand, you might also be on the verge of thrilling to an experience that will transform your very existence and only could have come about because you took this step, right here, right now. You’re online; take a chance. We may never pass this way again. Carpe diem. The world could, like, end tomorrow, you know? Gather ye rosebuds while —

11. Okay, okay, I’ll try it.
Great, babe, great.

It’s a shame I’m happily married; this looks like an interesting product. Go ahead and take the tour.

Update: Looks like there’s aliens in Google Earth and a dog on the White House lawn.

Via Philipp, there’s a ton of news. There’s the new Google Browser. Only 1.68M! Also, Google apparently acquired for “somewhere in the neighborhood of A Dozen Fat Sacks of Mad Cash.” Nice. And finally, you can track zombies with Google Maps; read how on deathhacker.

If you enjoyed Guitar Hero, it looks like a sequel is coming out: Cowbell Hero! And ibiblio has proposed a <BLING> tag.

Looks like the Wikipedia entry for April Fools 2006 is doing a much better job of tracking breaking April 1st news; I’d just go there.

DOJ sent subpoenas to 34 companies

InformationWeek did a Freedom of Information Act (FOIA) request and discovered that the Department of Justice sent subpoenas to 34 different companies:

The full list of companies subpoenaed by the Department of Justice includes: 711Net (Mayberry USA), American Family Online, AOL, ATT, Authentium, Bell South, Cable Vision, Charter Communications, Comcast Cable Company, Computer Associates, ContentWatch, Cox Communications, EarthLink, Google, Internet4Families, LookSmart, McAfee, MSN, Qwest, RuleSpace, S4F (Advance Internet Management), SafeBrowse, SBC Communications, Secure Computing Corp., Security Software Systems, SoftForYou, Solid Oak Software, Surf Control, Symantec, Time Warner, Tucows (Mayberry USA), United Online, Verizon, and Yahoo.

I did a declaration in this case, so I won’t comment. Read the article for more details.

Dropping Valleywag

I’m dropping Valleywag from my daily RSS reading. It’s not you, Nick, it’s me: I have too much snark in my life already. I think Russell Beattie nailed the trend a couple months ago. But keep fighting the gossipy fight, and maybe I’ll find my way back eventually.

SEO Advice: Use text

This tip is simple: don’t bury words in an image, especially if those words don’t appear in normal, indexable text.

Q & A thread: March 27, 2006

Okay, let’s try tackling a few questions from the Grab bag thread. Just a hint for next time: if your question takes three paragraphs to ask, your odds of getting an answer go down. 🙂

Q: “Is Bigdaddy fully deployed?”
A: Yes, I believe every data center now has the Bigdaddy upgrade in software infrastructure, as of this weekend.

Q: “What’s the story on the Mozilla Googlebot? Is that what Bigdaddy sends out?”
A: Yes, I believe so. You will probably see less crawling by the older Googlebot, which has a User-Agent of “Googlebot/2.1 (+”. I believe crawling from the Bigdaddy infrastructure has a new User-Agent, which is “Mozilla/5.0 (compatible; Googlebot/2.1; +”
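
If you want to see which of the two crawlers is hitting your logs, a prefix check on the User-Agent header is one way to do it. This is only a rough sketch: the function name and the labels it returns are my own invention, and it matches only the leading part of each string quoted above, since the rest of the string is not reproduced here. Keep in mind that anyone can send a fake User-Agent, so this tells you what a client claims to be, not what it is.

```python
def classify_googlebot(user_agent: str) -> str:
    """Return 'bigdaddy', 'older', or 'other' for a User-Agent header value.

    The labels are illustrative names, not anything Google defines.
    """
    ua = user_agent.strip()
    # Newer crawler: the Mozilla-compatible form quoted in the answer above.
    if ua.startswith("Mozilla/5.0 (compatible; Googlebot/2.1;"):
        return "bigdaddy"
    # Older crawler: the bare Googlebot/2.1 form.
    if ua.startswith("Googlebot/2.1"):
        return "older"
    return "other"


if __name__ == "__main__":
    # The "+..." below is a placeholder for the part of the string not shown above.
    for sample in (
        "Mozilla/5.0 (compatible; Googlebot/2.1; +...)",
        "Googlebot/2.1 (+...)",
        "Mozilla/5.0 (Windows)",
    ):
        print(sample, "->", classify_googlebot(sample))
```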

Q: “Do you take Emmy with you to San Francisco?”
A: Nope, Emmy is a true indoors cat; she doesn’t like to travel.

Q: “Any new word on sites that were showing more supplemental results?”
A: An additional crawling change to show more pages from those sites was checked in late last week, but it may still take a little bit of time (another few days) for that to show up in the index. I’ll keep an eye on sites that people have given as examples to see how those sites are showing up.

Q: “Is the RK parameter turned off, or should we expect to see it again?”
A: I wouldn’t expect to see the RK parameter have a non-zero value again.

Q: “What’s an RK parameter?”
A: It’s a parameter that you could see in a Google toolbar query. Some people outside of Google had speculated that it was live PageRank, that PageRank differed between Bigdaddy and the older infrastructure, etc.

Q: “Now that Bigdaddy is out, will there be a new export of PageRank anytime soon?” and “Will the deployment of BigDaddy stabilise the rolling PR issues we are experiencing at present?”
A: I’ll ask around about that. If there aren’t any logistical obstacles, I’ll ask if we could make a new set of PageRanks visible within the next couple weeks. I’d expect that as Bigdaddy stabilizes everywhere, the variation in toolbar PR for individual urls is more likely to settle down too.

Q: “This datacentre works differently to all of the others. Noticed just a few hours ago. . . . . Where does that DC fit into the scheme of things? Is it mainly made from newly spidered data?”
A: Sharp eyes, g1smd. That wouldn’t surprise me. As Bigdaddy cools down, that frees us up to do new/other things.

Q: “Not so much a question… GET A PSP!”
A: I got one today, TallTroll. I picked up Me and My Katamari (MAMK) and a PSP that turned out to have firmware v1.52 on it. So I could upgrade to 2.0, then downgrade to 1.5 so I could run homebrew programs. But I think MAMK requires firmware 2.5 or 2.6 to play, which means a one-way upgrade or maybe using RunUMD or a similar program. Suffice it to say I’m having fun just geeking around. 🙂

Q: “Can you give us a general way of getting a good idea in front of Google?”
A: If it’s bizdev, there’s a bizdev dept. at Google you could contact. If it’s not a business/patent/proprietary idea, I’d mention it here or blog about it somewhere. Writing a snail mail letter could work well too.

Q: “Did you check out the guys all painted in silver doing the robot on milk crates in San Fran?”
A: Nope, that’s down by Fisherman’s Wharf. We’re hanging near Union Square.

Q: “Why do you focus your attention so much on SEOs and not at webmasters who make actual quality websites?”
A: I think that’s an issue I have personally, because I spend so much of my time looking at spam. Lots of other people focus on helping general webmasters, like the Sitemaps team, for example. I have started to do “SEO Advice” posts instead of just “SEO Mistakes” posts, but you’re right: I personally could use a reminder to keep focusing on the sites that make quality content and how to pull those sites up, not just how to counter sites that cheat. Thanks for bringing that up.

Q: “My sitemap has about 1350 urls in it. . . . . it’s been around for 2+ years, but I cannot seem to get all the pages indexed. Am I missing something here?”
A: One of the classic crawling strategies that Google has used is to crawl based on the amount of PageRank on your pages. So just because your site has been around for a couple years (or because you submit a sitemap), that doesn’t mean that we’ll automatically crawl every page on your site. In general, getting good quality links would probably help us know to crawl your site more deeply. You might also want to look at the remaining unindexed urls; do they have a ton of parameters (we typically prefer urls with 1-2 parameters)? Is there a robots.txt? Is it possible to reach the unindexed urls easily by following static text links (no Flash, JavaScript, AJAX, cookies, frames, etc. in the way)? That’s what I would recommend looking at.
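
For the parameter check, something like the following could flag the urls worth a second look. It is a minimal sketch under a few assumptions: the function names are made up for illustration, the example.com urls are placeholders, and the cutoff of 2 is simply the upper end of the "1-2 parameters" preference mentioned above, not a hard rule.

```python
from urllib.parse import urlsplit, parse_qsl

MAX_PARAMS = 2  # assumption: upper end of the "1-2 parameters" preference


def count_params(url: str) -> int:
    """Count the query-string parameters in a url."""
    query = urlsplit(url).query
    # keep_blank_values=True so that "a=&b=" still counts as two parameters.
    return len(parse_qsl(query, keep_blank_values=True))


def urls_with_many_params(urls, max_params=MAX_PARAMS):
    """Return the urls whose parameter count exceeds max_params."""
    return [u for u in urls if count_params(u) > max_params]


if __name__ == "__main__":
    unindexed = [
        "http://example.com/widgets",
        "http://example.com/item?id=7",
        "http://example.com/item?id=7&cat=2&sort=asc&session=abc",
    ]
    for url in urls_with_many_params(unindexed):
        print("lots of parameters:", url)
```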

Q: “When I change a robots.txt to exclude more existing files from being crawled, how long does it take for them to be removed from the index? Perhaps the answer is a function of how often the site is crawled and its PR?”
A: It is a function of how often the site is crawled. I believe in the past that every several hundred page fetches or several days, the bot would re-check the robots.txt. Note that for supplemental results, you need recrawling to happen by the supplemental Googlebot in order for the robots.txt file to take effect on those pages. If you’re really sure you never want those pages to be seen, you can use our url removal tool to remove urls for six months at a time. But I’d be very careful with the url removal tool unless you’re an expert. If you make a mistake and (for example) remove your entire site, that’s your responsibility. Google can sometimes clear out self-removals, but we don’t guarantee it.
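
Before waiting on a recrawl, it can help to double-check that a changed robots.txt excludes what you think it does. Here is a small sketch using the robots.txt parser in Python's standard library. The rules, paths, and example.com urls are made up, and this parser is not Googlebot's own matcher, so treat the result as a sanity check rather than a guarantee of how any crawler will behave.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt contents, for illustration only.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
Disallow: /drafts/
"""


def blocked_urls(robots_txt, urls, user_agent="Googlebot"):
    """Return the urls that the given robots.txt text disallows for user_agent."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [u for u in urls if not parser.can_fetch(user_agent, u)]


if __name__ == "__main__":
    candidates = [
        "http://example.com/index.html",
        "http://example.com/private/report.html",
        "http://example.com/drafts/post.html",
    ]
    for url in blocked_urls(ROBOTS_TXT, candidates):
        print("disallowed:", url)
```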

Q: “I would love to be able to search for html code and see how that ranks.”
A: I would like that too. Indexing non-visible things like punctuation, JavaScript, and HTML would be great, but it would also bulk up the size of the index. Any time you’re considering a new feature (e.g. our numrange search), you have to trade off how much the index would get bigger versus the utility of the feature. My guess is that we wouldn’t offer this any time soon.

Q: “Seriously, How do you plan on picking which of these questions to answer?”
A: I’m tackling the ones that looked interesting, short, and general enough that more than one person would be interested.

Q: “I am seeing a lot of sites with “%09” (tab) and “%20” (space) in front of the URL in Google’s index.”
A: I’ll ask someone about that.

Q: (paraphrasing) The sitemaps validation fetch seems to happen with a User-Agent of “-”? My auto-reject rules reject that user agent.
A: I’ll ask someone about that. You could whitelist the IP range that Googlebot comes from in the meantime.

Q: “If one were to offer to sell space on their site (or consider purchasing it on another), would it be a good idea to offer to add a NOFOLLOW tag so to generate the traffic from the advertisement, but not have the appearance of artificial PR manipulation through purchasing of links?”
A: Yes, if you sell links, you should mark them with the nofollow tag. Not doing so can affect your reputation in Google.
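
In practice that means adding rel="nofollow" to the anchor for any link that was paid for. If your pages are generated by code, one way to keep this consistent is to route every link through a helper. The sketch below assumes a hypothetical render_link function and a paid flag of my own naming; the url and link text are placeholders.

```python
from html import escape


def render_link(url: str, text: str, paid: bool = False) -> str:
    """Build an anchor tag, adding rel="nofollow" when the link is paid."""
    rel = ' rel="nofollow"' if paid else ""
    # Escape both values so stray quotes or angle brackets cannot break the markup.
    return f'<a href="{escape(url, quote=True)}"{rel}>{escape(text)}</a>'


if __name__ == "__main__":
    print(render_link("http://example.com/", "Example sponsor", paid=True))
    print(render_link("http://example.com/", "Example sponsor"))
```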

Q: “On sites directed to international audiences with the same (high quality) content in several languages is it better to do several TLDs like,,, and so on or do subdomains like,, or something else like,,”
A: Good question. If you’ve only got a small number of pages, I might start out with subdomains, e.g. or Once you develop a substantial presence or number of pages in each language, that’s where it often makes sense to start developing separate domains.

Q: “Any results on why IDN Domains don’t show pagerank?”
A: I’ve seen a couple that do, but I’ll check into why most don’t. My guess is that there’s a normalization issue somewhere in the toolbar PageRank pathway.

Q: “Would it be possible to add a date range to queries? I might get 91,000,000 results, but the first 200 are 2-3 years old. I would like to limit results to items no more than 6-12 months old.”
A: Check out our advanced search page for this option. Tara Calishain also did some really interesting digging into this, e.g. this info she uncovered. Google Hacks is a pretty solid book if you’d like to read more fun Google hacks.

Q: “What about the problem of directories and shopping comparison spam overriding real pages?”
A: Fair feedback. I heard that recently from a Googler, too. Sometimes we think of spam as strictly things like hidden text, cloaking, etc. But users think of spam as noise: things that they don’t want. If they’re trying to get information, fix a problem, read reviews, etc., then sites like that aren’t as helpful.

Q: “Are you planning to visit/speak in the UK at all in the near future?”
A: Sadly not. I’m hitting the Boston Pubcon and SES San Jose, but I can only do 4-5 conferences a year.

Q: “The one thing that seems to be getting to people generally, is what are the post Big Daddy intentions? Fixes, spam issues, regeneration of ‘pure’ indices, supp. issues, PR and BL update, etc.”
A: I can’t give a timeline (e.g. “scaling up communication in April, more work on canonicalization in May”) because priorities can change, esp. depending on machine issues, deployments of new binaries, webspam developments, etc. Short-term, I wouldn’t be surprised to see some refreshing in supplemental results relatively soon, and potentially different PageRanks visible in the next couple weeks.

Q: “Even Matt is afraid to use a redirect from to because Google might penalize his website and put it into supplemental hell.”
A: Heh. No, that’s not it. I’m deliberately leaving them separate as a test case to see how we do now and down the road.

Q: “Just like you told me a couple of months ago, the Supplemental Googlebot (SG) got around to my site and things got sorted out. Thanks. . . . . If you are in San Fran and want to check out the Monterey Aquarium, could you please write a short review? I’ve been thinking of visiting and wondering if it is worth the trip.”
A: I would definitely recommend the Monterey Bay Aquarium, especially if you can find a coupon or other good deal. I highly recommend the otters, the kelp forest, and the jellyfish area.