(quick post. I’m doing the write-fast thing to get myself back into the blogging habit.)
I had some wonderful teachers in high school. One of them, my English teacher, used the acronym MEGO to stand for My Eyes Glaze Over. MEGO applies to something so technical that most people don’t care. But it’s my blog, so if you don’t want some MEGO, go elsewhere.
The hardest part of getting technical feedback from folks outside Google is deciding which stuff to dig into; there are only 24 hours in a day. Plus you don’t want to burn cred by bugging someone only to find out that it’s a non-issue. An SEO that I would trust (well, not trust exactly; listen to, maybe) complained about something, so I took it to the crawl team. They dug into it enough to produce the raw documents as they were fetched by Googlebot, and it looked like Google was doing the right thing. I mentioned that to the SEO, who dug into it more on their side. It turns out that the SEO was looking for Googlebot’s old user-agent string instead of the newer one, so the problem was on the SEO’s side, not Google’s. And the SEO knows who they are.
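There’s a general lesson for log-watchers here: if you’re checking your access logs for Googlebot, match loosely on the “Googlebot” token rather than on one exact user-agent string, since the full string can change over time. Here’s a minimal Python sketch of the difference; the specific version string in it is a made-up placeholder, not Google’s actual string.

```python
import re

# Placeholder pattern -- this "old" user-agent string is an illustrative
# assumption, not Google's real one.
EXACT_OLD_UA = "Googlebot/2.0 (example old string)"
LOOSE_PATTERN = re.compile(r"googlebot", re.IGNORECASE)

def count_googlebot_hits(log_lines):
    """Compare a brittle exact-string match against a loose token match.

    Matching only an outdated exact string silently undercounts current
    Googlebot fetches, which is the mistake described above.
    """
    exact = sum(1 for line in log_lines if EXACT_OLD_UA in line)
    loose = sum(1 for line in log_lines if LOOSE_PATTERN.search(line))
    return exact, loose
```

If the loose count comes back much higher than the exact count, you’re probably grepping for a stale string.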
Another example that happened this week was when I read Graywolf’s post about Google showing content from domain A on domain B. Graywolf gave three examples of sites getting confounded, with screenshots (always a helpful idea), but every example was from the same IP address. Now if you’re an experienced search person, two sites on the same IP getting confounded makes you think of one explanation: the webhost configured virtual hosting wrong. Right? Can I get a “w00t” from the back there? Cool, thanks.
So I was all set to dismiss this. But then I noticed that the confounding had persisted over a long period of time (virtual hosting errors usually don’t last long, because people notice and complain to the webhost), and none of Yahoo!/MSN/Ask were showing confounded domains for the examples Graywolf gave. That wasn’t good, so I reported it to the crawl/index team.
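If you want to sanity-check two sites on one IP yourself, the baseline test is to fetch each hostname on its own fresh connection, which (as far as I can tell) is roughly how the other engines’ bots fetch pages. A quick Python sketch; the IP address and hostnames are placeholders:

```python
import http.client

SHARED_IP = "192.0.2.10"                   # placeholder shared IP
HOSTS = ["mattcutts.com", "shadyseo.com"]  # the two virtual hosts

# One fresh TCP connection per request. Each connection only ever sees
# one Host header, so a plain virtual-hosting misconfiguration would
# show up here as the wrong content coming back.
for host in HOSTS:
    conn = http.client.HTTPConnection(SHARED_IP, 80, timeout=10)
    conn.request("GET", "/", headers={"Host": host})
    resp = conn.getresponse()
    body = resp.read()
    print(f"{host}: {resp.status}, {len(body)} bytes")
    conn.close()
```

If each hostname comes back with the right content this way, a simple virtual-hosting misconfiguration looks a lot less likely.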
Have I mentioned how much I respect our crawl/index team? They do a lot of heavy lifting every day, and they do it so well that few people notice how much is getting done. So that team checked it out, and they were able to reproduce a bug on the webserver using only telnet. That means it wasn’t Google’s fault, but last I saw in the discussion, people were talking about ways to make Googlebot even smarter, so it can work around that bug when it can tell the webserver might be affected.
Ah, heavy MEGO. Why didn’t Y/M/A see confounded domains? Well, Googlebot is pretty smart. It can use something called persistent connections to a webserver, via a Keep-Alive header. If mattcutts.com and shadyseo.com are both on the same IP address, Googlebot can open up a connection and request a page from mattcutts.com, then on the same connection ask for a page from shadyseo.com. That’s easier on the webserver because it doesn’t have to tear down and set up a whole new connection for every page. As the Apache docs mention, “These long-lived HTTP sessions allow multiple requests to be sent over the same TCP connection, and in some cases have been shown to result in an almost 50% speedup in latency times … .” As far as I can tell, bots from other engines probably open and close a connection for every page; that’s why they didn’t see this particular behavior in the webserver.
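To see the failure mode on the wire, here’s a rough Python equivalent of the crawl team’s telnet reproduction: two requests with different Host headers sent down a single TCP connection. The IP and hostnames are placeholders again, and for brevity the sketch pipelines both requests instead of reading the first response before sending the second, as a real crawler would. On a server with the bug, the second response comes back from the wrong virtual host.

```python
import socket

SHARED_IP = "192.0.2.10"  # placeholder: the IP hosting both sites

# Two HTTP/1.1 requests on one connection. The first keeps the
# connection alive; the second asks the server to close it, so we can
# simply read until EOF.
requests = (
    "GET / HTTP/1.1\r\n"
    "Host: mattcutts.com\r\n"
    "Connection: keep-alive\r\n"
    "\r\n"
    "GET / HTTP/1.1\r\n"
    "Host: shadyseo.com\r\n"
    "Connection: close\r\n"
    "\r\n"
)

sock = socket.create_connection((SHARED_IP, 80), timeout=10)
sock.sendall(requests.encode("ascii"))

# Read both responses, then eyeball whether the second one really
# came from shadyseo.com's virtual host.
chunks = []
while True:
    data = sock.recv(4096)
    if not data:
        break
    chunks.append(data)
sock.close()
print(b"".join(chunks).decode("latin-1", errors="replace"))
```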
Interestingly enough, this bug was first mentioned in 1997: “When using keepalives and name-based virtual hosts, requests on a keptalive connection can get a response from the wrong virtual host.” I guess there are some pretty old webservers out there; like I said, the crawl team was last seen talking about workarounds for ancient servers that have this bug.
So that’s two examples in the last week where I asked someone to dig deeper and the issue turned out to be outside of Google. That said, it’s essential to keep reading feedback from forums and the blogosphere. For example, we’ve been refreshing some of our supplemental results, and the feedback we’ve gotten has helped us find a couple of different ways to make the site: operator more accurate with the newer supplemental results. GoogleGuy put out a call for feedback about that particular issue on WebmasterWorld, and someone on my team has been reviewing the feedback to find any other issues to pass on.