Okay, it’s been a while since my last infrastructure status report, so I’ll briefly cover the things that I know are going on. The executive summary is that things are relatively quiet.
The quarterly-ish PageRank export is underway. As always, don’t expect traffic or rankings to dramatically change, because these PageRank values are already incorporated into our scoring. The same quarterly-ish data push that updates PageRank in the toolbar also updates the data for related:, link: and info: (remember that operator?). You can read more about PageRank from this previous post if you swing that way. Also remember that the link: operator only shows a subsample of the links to a page that we know of. I’ve mentioned before that some data centers (I believe 64.233.183.xx and 72.14.203.xx) continue to show PageRank values from a slightly older infrastructure. Not a big deal, but I wanted to mention it for the hard-core data center watchers so that they don’t get confused.
There were some situations where site: would show supplemental results ahead of regular results. I believe we’ve changed that so that regular results will usually show ahead of supplemental results for site: queries.
As a reminder, supplemental results aren’t something to be afraid of; I’ve got pages from my site in the supplemental results, for example. A complete software rewrite of the infrastructure for supplemental results launched in Summer o’ 2005, and the supplemental results continue to get fresher. Having urls in the supplemental results doesn’t mean that you have any sort of penalty; the main determinant of whether a url is in our main web index or in the supplemental index is PageRank. If you used to have pages in our main web index and now they’re in the supplemental results, a good hypothesis is that we might not be counting links to your pages with the same weight as we did in the past. The approach I’d recommend in that case is to use solid white-hat SEO to get high-quality links (e.g. links given editorially by other sites on the basis of merit).
I think going forward, you’ll continue to see the supplemental results get even fresher, and website owners may see more traffic from their supplemental results pages. To check the current freshness of the supplemental results, I grabbed 20 supplemental pages from my site and checked their crawl dates by using the “cache:” command and looking in the cached page header. The oldest supplemental results page that I saw was from September 7th, 2006 (and I only saw 2-3 pages from September; most were from November or December). The most recent of the 20 pages was from January 7th, 2007, which shows that supplemental results can be quite fresh at this point.
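If you want to run the same kind of spot check on your own pages, here is a minimal sketch of how the crawl dates could be summarized once you have copied them out of the cached page headers by hand. The function name `freshness_summary` is made up for illustration, and the only dates used are the two endpoints mentioned above; the other 18 dates from my sample aren't listed here, so they aren't in the example.

```python
from datetime import date


def freshness_summary(crawl_dates):
    """Return the oldest crawl date, the newest one, and the days between them.

    `crawl_dates` is a list of datetime.date values that you collected
    yourself (for example, by reading the cached page headers).
    """
    if not crawl_dates:
        raise ValueError("need at least one crawl date")
    oldest = min(crawl_dates)
    newest = max(crawl_dates)
    return {
        "oldest": oldest,
        "newest": newest,
        "span_days": (newest - oldest).days,
    }


# Only the two endpoint dates are given in the post.
endpoints = [date(2006, 9, 7), date(2007, 1, 7)]
summary = freshness_summary(endpoints)
print(summary["oldest"], summary["newest"], summary["span_days"])
```

With just those two endpoints, the span between the oldest and newest crawl dates works out to 122 days.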
Let’s see, what else? I think we’re going to change the “filetype:” operator so that it doesn’t require an additional query word, so that you could do filetype:doc or filetype:pdf or whatever. That isn’t live at this point, but I believe it will be down the road.
I’ve mentioned this before, but one of our data pushes that used to happen every 3-4 weeks is now happening more like every 1-2 days. Regular searchers won’t really notice this, but if you see more variance in your rankings, I believe it’s probably due to that data push happening more frequently.
An SEO or two has been holding my feet to the fire about root pages of .com’s that are hosted outside the US. Barry has talked about the issue a little bit here. In some (pretty rare) circumstances, you’ll see the root page when you search site:domain.com on google.co.uk for the “search the web” option but you won’t see the root page when you switch to “search pages from the UK”. I thought we’d nailed this issue in December, but we found another way that this can happen. I believe a fix has been submitted and is percolating its way through the system. Of the ~7 examples that I know of, I believe all but one are working now (and the remaining site is doing a chain of like five 302 redirects to weird/long/deep urls). However, this paragraph applies to you if 1) you have a .com that is hosted outside the US, 2) searching on (say) google.co.uk for [site:yourdomain.com] returns your root page and all your pages for “Search the web”, and 3) when you switch to (say) “pages from the UK”, the root page does not appear but the rest of your pages do. In that case, I’d wait 4-5 days to let this second change percolate completely into our index, and if you still see the behavior after 4-5 days, please leave a comment with the name of your site.
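To make the three conditions above a little easier to apply, here is a rough sketch of them as a yes/no check. Everything in it is hypothetical: the function name and the parameters are invented for illustration, and you would fill in the answers yourself after running the two searches by hand. It does not query Google or anything else.

```python
def root_page_issue_applies(is_dot_com, hosted_in_us, root_in_web_search,
                            root_in_country_search,
                            other_pages_in_country_search):
    """Return True only if a site matches all three conditions in the post.

    All arguments are booleans that you observe manually:
      is_dot_com                     - the domain is a .com
      hosted_in_us                   - the site is hosted in the US
      root_in_web_search             - root page shows for "Search the web"
      root_in_country_search         - root page shows for "pages from the UK"
      other_pages_in_country_search  - the rest of the pages show there
    """
    condition_1 = is_dot_com and not hosted_in_us
    condition_2 = root_in_web_search
    condition_3 = (not root_in_country_search) and other_pages_in_country_search
    return condition_1 and condition_2 and condition_3


# A .com hosted outside the US whose root page only vanishes in the
# country-restricted search matches the description.
print(root_page_issue_applies(True, False, True, False, True))
```

If the check comes back False (say, the root page is missing from both searches), then something other than this particular issue is probably going on.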
Right now I’m not expecting any major infrastructure-related upheavals to our rankings. Should that change in the future, I’ll be here to talk about it then.
Update: A few people were seeing PageRank 0 for their site. There was a small auxiliary push that needed to happen to complement the PageRank push, and that push happened a few hours ago (i.e. Jan 11, 2007). If you were getting stressed, you might want to re-check now. If you never even noticed, well, good for you.