Yahoo! gives a nice weather report to announce an index update, and it seemed like a good time to give people an update on search quality/infrastructure on Google going into the fall. The last weather forecast I did was about a month ago, and it was on video. It’s still a good video to go watch as background. Just to be crystal clear, each of the following paragraphs is talking about a different piece of infrastructure.
Bigdaddy was a software upgrade to how we crawl and partially how we index the web. It was deployed and done pretty early in the year. It brought smarter Googlebot crawling, including tricks like full gzip support and a crawl caching proxy that means less bandwidth usage for site owners.
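To give a rough feel for why gzip support means less bandwidth for site owners, here is a minimal sketch in Python. It is not how Googlebot or the crawl caching proxy actually works. The `gzip_savings` helper and the sample page are hypothetical, made up purely for illustration, and the code just compares the size of a response body before and after gzip compression.

```python
import gzip


def gzip_savings(body: bytes) -> tuple[int, int]:
    """Return (raw size, gzip-compressed size) in bytes for a response body."""
    compressed = gzip.compress(body)
    # Round-trip check: decompressing must give back the original bytes.
    assert gzip.decompress(compressed) == body
    return len(body), len(compressed)


# Hypothetical page body (an assumption for illustration): repetitive HTML,
# which is the kind of content that compresses well.
page = b"<html><body>" + b"<p>Hello, webmasters!</p>" * 200 + b"</body></html>"

raw, packed = gzip_savings(page)
print(f"raw: {raw} bytes, gzipped: {packed} bytes")
```

A server only sends the compressed form when the client advertises support for it (the `Accept-Encoding: gzip` request header), so the savings depend on both sides cooperating.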
We used the summer to swap in a completely new architecture for Supplemental Results. The core of that infrastructure is complete and fully deployed, but I’m sure we’ll see additional smaller changes (mostly making sure that queries off the beaten path such as site: do what people expect).
I believe site: results estimates should be more accurate at any IP address you try now. In mid-summer (while I was on vacation, in fact), people noticed that sometimes site: results estimates were too high. One change went in during mid-summer to make general results estimates more accurate, especially for shorter queries, but the change didn’t really apply to site: results estimates.
Happily, there was another piece of infrastructure going out that improved general quality and also made site: results estimates more accurate. I think I mentioned in the video that those folks were shooting to be live everywhere by end-of-summer/end-of-quarter, but it was a hope, not a promise. I believe that infrastructure was turned on at all data centers by last Friday (Oct. 6, 2006), which is pretty close. Most of the other quality improvements due to this infrastructure will be pretty subtle/stable, but it’s nice that site: results estimates are more accurate now.
Let’s see, what else? We just did a PageRank export, so I wouldn’t expect to see another export until the new year. The infrastructure that serves up PageRank in the Google Toolbar, link: data, info: queries, and “Similar results” is also new (surprise!). I believe that’s the only piece of infrastructure I’ve mentioned so far that isn’t deployed at every data center, and relative to the other things I’ve mentioned, that infrastructure is smaller. The new infrastructure is live at about two-thirds of data centers, and I’d expect it to roll out to all data centers within a month or two (again, that’s a hope, not a promise). In the meantime, you may see some differences in PageRanks in the Google Toolbar depending on which data center you happen to hit.
I know that webmasters are especially sensitive to quality/webspam/ranking changes in Q4 because of the holiday season. If we’ve got something that evaluates well and that we think will improve quality, we can’t just pause for a quarter of the year, but if anything big launches I’ll try to be available to answer questions and help people get a handle on any changes. (Right now I’m not expecting radical changes in webspam ranking, but I know better than to make a promise.) Of course we’ll also be around at webmaster conferences. Several Googlers (including me) will be at PubCon in Vegas in November to talk to webmasters. Several Googlers (including Adam Lasnik and Vanessa Fox, but probably not me) will also be at SES Chicago in December to get feedback and answer questions.
Okay, that’s everything that I can think of.