Archive for May, 2006

Danny in town next week

Danny Sullivan is going to be out in California next week, so he’s agonna stop by the Googleplex and say hello. I believe we’re going to do a long edition of the Daily SearchCast. Although I think Detlev may do the official SearchCast that morning, so a better title might be “Matt and Danny sit around and talk search for 45 minutes to an hour.” :) Listen in, um, I think Tuesday, May 16th at 4 p.m.-ish Pacific time on webmasterradio.fm, or I’m sure it will be archived. Remind me to ask Danny to do longer SearchCasts (25 minutes would be great; otherwise I run out of SearchCast before my commute is over). Plus I’d like more sighing. Hearing Danny sigh at the quotidian adrenaline of search always brightens my day. Maybe I’ll do a DaveN impression; who knows? :)

The nice thing about being in the blogosphere is that you can see things through other people’s eyes. I wasn’t able to go to the Chicago SES last year, so I was enjoying Vinny Lingham’s coverage of Danny’s keynote. I get to read what Danny said, and I get to read Vinny’s take on it too. :)

Update: Speaking of the SearchCast: Daron, where’d you put the 5/6, 5/7, 5/8, and 5/9 editions? I gotta drive into the city tonight, man. You’re not going to make me listen to the radio, are you? :)

MEGO

(Quick post. I’m doing the write-fast thing to get myself back into the blogging habit.)

I had some wonderful teachers in high school. One of them, my English teacher, used the acronym MEGO to stand for My Eyes Glaze Over. MEGO applies to something so technical that most people don’t care. But it’s my blog, so if you don’t want some MEGO, go elsewhere. :)

The hardest part of getting technical feedback from folks outside Google is deciding which stuff to dig into; there are only 24 hours in a day. Plus you don’t want to burn cred by bugging someone only to find out that it’s a non-issue. An SEO whom I would trust (well, not trust exactly :) but listen to, maybe :) ) complained about something, so I took it to the crawl team. They dug into it far enough to produce the raw docs exactly as Googlebot fetched them, and it looked like Google was doing the right thing. I mentioned that to the SEO, who dug into it more on their side. It turns out that the SEO was looking for Googlebot’s old user-agent string instead of the newer one, so the problem was on the SEO’s side. Not an issue on Google’s side, and the SEO knows who they are. ;)
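That kind of user-agent mix-up is easy to fall into with a log filter. Here’s a minimal Python sketch, assuming Apache’s “combined” log format; the function names are made up for illustration, and the two strings are sample Googlebot user-agents in the older “Googlebot/2.1 (+http://www.google.com/bot.html)” style and the newer “Mozilla/5.0 (compatible; Googlebot/2.1; …)” style. An exact match on the old string silently misses every fetch made with the new one, while looking for the “Googlebot” token catches both. Keep in mind that anyone can send any user-agent, so this only tells you what a request claimed to be.

```python
import re

# Sample Googlebot user-agent strings (older form and newer "Mozilla/5.0 (compatible; ...)" form).
OLD_UA = "Googlebot/2.1 (+http://www.google.com/bot.html)"
NEW_UA = "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"

# In Apache's "combined" log format the user-agent is the last quoted field on the line.
UA_FIELD = re.compile(r'"([^"]*)"\s*$')
GOOGLEBOT_TOKEN = re.compile(r"\bGooglebot\b", re.IGNORECASE)


def user_agent(log_line):
    """Return the user-agent field of a combined-format log line, or ''."""
    match = UA_FIELD.search(log_line)
    return match.group(1) if match else ""


def matches_old_string_only(ua):
    # Brittle: exact comparison against one specific user-agent string.
    return ua == OLD_UA


def mentions_googlebot(ua):
    # More robust: look for the "Googlebot" token anywhere in the string.
    return bool(GOOGLEBOT_TOKEN.search(ua))


def count_hits(log_lines, matcher):
    """Count log lines whose user-agent satisfies `matcher`."""
    return sum(1 for line in log_lines if matcher(user_agent(line)))


if __name__ == "__main__":
    sample_log = [
        '192.0.2.1 - - [10/May/2006:10:00:00 -0700] "GET / HTTP/1.1" 200 512 "-" "' + OLD_UA + '"',
        '192.0.2.1 - - [10/May/2006:10:05:00 -0700] "GET /a HTTP/1.1" 200 512 "-" "' + NEW_UA + '"',
        '192.0.2.7 - - [10/May/2006:10:06:00 -0700] "GET / HTTP/1.1" 200 512 "-" "SomeBrowser/1.0"',
    ]
    print(count_hits(sample_log, matches_old_string_only))  # 1
    print(count_hits(sample_log, mentions_googlebot))       # 2
```

The exact-match filter reports half as many Googlebot visits as actually happened, which is the kind of gap that makes it look like the crawler isn’t fetching what it should.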

Another example that happened this week was when I read Graywolf’s post about Google showing content from domain A on domain B. Graywolf gave three examples of sites getting confounded, with screenshots (always a helpful idea), but every example was from the same IP address. Now if you’re an experienced search person, two sites on the same IP getting confounded makes you think of one explanation: the webhost configured virtual hosting wrong. Right? Can I get a “w00t” from the back there? Cool, thanks.

So I was all set to dismiss this. But then I noticed that the confounding had persisted over a long period of time (virtual hosting errors usually don’t last long, because people notice and complain to the webhost), and none of Yahoo!/MSN/Ask were showing confounded domains for Graywolf’s examples. That wasn’t good, so I reported it to the crawl/index team.

Have I mentioned how much I respect our crawl/index team? They do a lot of heavy lifting every day, and they do it so well that few people notice how much is getting done. So that team checked it out, and they were able to reproduce a bug on the webserver using only telnet. That means it wasn’t Google’s fault, but last I saw in the discussion, people were talking about ways to make Googlebot even smarter so it can work around that bug when they can tell the webserver might be affected.

Ah, heavy MEGO. Why didn’t Y/M/A see confounded domains? Well, Googlebot is pretty smart. It can use something called persistent connections to a webserver via a Keep-Alive header. If mattcutts.com and shadyseo.com are both on the same IP address, Googlebot can open up a connection and request a page from mattcutts.com, then on the same connection ask for a page from shadyseo.com. That’s more polite to the webserver because it doesn’t have to tear down and set up a whole new connection for every page. As the Apache docs mention, “These long-lived HTTP sessions allow multiple requests to be sent over the same TCP connection, and in some cases have been shown to result in an almost 50% speedup in latency times … .” As far as I can tell, bots from other engines probably open and close a connection for every page; that’s why they didn’t see this particular behavior in the webserver.

Interestingly enough, this bug was first mentioned in 1997: “When using keepalives and name-based virtual hosts, requests on a keptalive connection can get a response from the wrong virtual host.” I guess there are some pretty old webservers out there; like I said, the crawl team was last seen talking about workarounds for ancient servers that have this bug.
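The crawl team reproduced this with nothing but telnet. A rough Python equivalent might look like the sketch below; it is not Google’s tool, and the function names, the 192.0.2.10 address, and port 80 are placeholders. It asks a shared-IP server for the same path under each name-based virtual host twice: once with every request on a single keep-alive connection (the way Googlebot can fetch), and once with a fresh connection per request (the way other engines’ bots apparently fetch). A virtual host whose body differs between the two runs is a candidate for the keep-alive bug. Pages that legitimately change on every request (timestamps, rotating ads) will also show up, so treat a hit as a reason to look closer, not as proof.

```python
import http.client


def fetch_over_one_connection(ip, port, vhosts, path="/", timeout=10):
    """GET `path` from each name-based virtual host in `vhosts`, in order,
    over a single persistent (keep-alive) TCP connection to ip:port.
    Returns a list of (vhost, status, body) tuples."""
    conn = http.client.HTTPConnection(ip, port, timeout=timeout)
    results = []
    try:
        for vhost in vhosts:
            # Supplying our own Host header stops http.client from adding one,
            # so the server picks the virtual host purely from this value.
            conn.request("GET", path, headers={"Host": vhost})
            resp = conn.getresponse()
            # The body must be read completely before the connection is reused.
            # If the server answers "Connection: close", http.client quietly opens
            # a new connection for the next request, so keep-alive isn't exercised.
            results.append((vhost, resp.status, resp.read()))
    finally:
        conn.close()
    return results


def fetch_fresh_connections(ip, port, vhosts, path="/", timeout=10):
    """Same requests, but a brand-new TCP connection for each virtual host."""
    results = []
    for vhost in vhosts:
        conn = http.client.HTTPConnection(ip, port, timeout=timeout)
        try:
            conn.request("GET", path, headers={"Host": vhost})
            resp = conn.getresponse()
            results.append((vhost, resp.status, resp.read()))
        finally:
            conn.close()
    return results


def confounded_vhosts(ip, port, vhosts, path="/"):
    """Return the virtual hosts whose body differs depending on whether the
    request arrived on a reused connection or on a fresh one."""
    kept = fetch_over_one_connection(ip, port, vhosts, path)
    fresh = fetch_fresh_connections(ip, port, vhosts, path)
    return [
        vhost
        for (vhost, _, kept_body), (_, _, fresh_body) in zip(kept, fresh)
        if kept_body != fresh_body
    ]
```

For example, if confounded_vhosts("192.0.2.10", 80, ["mattcutts.com", "shadyseo.com"]) came back with ["shadyseo.com"], that would suggest the second request on the reused connection was answered by the wrong virtual host, which is the 1997 bug in action. A correctly configured server should return an empty list for static pages.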

So that’s two examples in the last week where I asked someone to dig deeper and it turned out to be an issue outside of Google. That said, it’s essential to keep reading feedback from forums and the blogosphere. For example, we’ve been refreshing some of our supplemental results, and the feedback that we’ve gotten has helped us find a couple of different ways that we could make the site: operator more accurate with the newer supplemental results. GoogleGuy put out a call for feedback about that particular issue on WebmasterWorld, and someone on my team has been reviewing the feedback to find any other issues to pass on.

snuffle snuffle

If you’re wondering why no posts for a few days, it’s because 1) I’ve been prepping for an internal meeting tomorrow, and 2) I caught a case of snarkitis this past week; everything I started to write came out slightly snarky. I composed 4-5 blog posts in my head, and actually wrote 2-3 of them. But after sleeping on it and re-reading them, I didn’t like how they came across, so I’m not posting them. I look forward to a full recovery from the snarkitis (and the meeting) in a day or two. :)
