Help Marcela

On a more serious note, if you’ve been an SEO for a while and have been to conferences or if you visited WMW in the last few years, you probably got to meet Marcela DeVivo (aka 2_Much). Marcela helped a lot of people and always held my feet to the fire about doing more in the Hispanic market.

Now Marcela could use some help of her own. Her unborn son, Nathan Dorje Andrew, was recently diagnosed with serious brain damage. This has got to be a nightmare for an expecting mother, and she asks for your prayers. If you want to help in other ways, you can also donate money (thanks for making that page, grnidone).

I’m wishing the best for you and Nathan, Marcela.

(Via Threadwatch and Oilman.)

Update: Jeremy is matching funds if you’re thinking about donating.

SEO Answers on Google Video

Okay, I took a stab at answering several questions on Google Video. Here’s what I did in chronological order:

As you can probably tell, I shot these videos at home this weekend. I wanted to make a video so that I could give a little more detail and nuance than you’d get from reading a sentence or two. My video camera has an internal hard drive so when you connect it to a computer with USB, it looks like a hard drive with mpeg2 files. All I did was upload the files directly to Google Video from a web browser. The videos are rough because I didn’t do any editing; I just shot single takes and copied the files up.

Let me know if these are helpful, and if they are, I’ll try to make more.

Debugging Dean

Battelle points to this post by Steven Johnson. Mr. Johnson just had a boy, named him Dean, and asked his friends to point to that post. Battelle asks because the post was in the top 10 for the search [dean] for a while, and then it wasn’t.

By the time I checked today, it was back in the top 10. Bear in mind that rankings do change all the time, especially for bloggy items. One common reason: when multiple people blog about something, their posts usually start out on the front pages of their blogs, and front pages typically have the highest PageRank. As stories scroll off the front page of people’s blogs, there’s often a dip in ranking because the links then come from deeper pages with lower PageRank.

There can be other reasons why a page temporarily drops, too. I saw at least one day when we weren’t able to include the page in the index (could have been that the server was unreachable for a short time). When a server is down for a short while, we normally recrawl the pages the next time around, and then the page often returns to roughly where it was before. Our webmaster console in Sitemaps is a great place to see errors like that.

On the other hand, is that really the post you’d like Dean-watchers to find? Can you imagine when Dean is 13, and mortified that one of the top results is baby pictures of him? “Dad, you might as well have posted a picture of me in a bathtub! Why do you always have to embarrass me?” :)

Grabbag Friday

My wife left me–temporarily. She’s going to China for two weeks, along with my mother and my mother-in-law. It sounds like the beginning of a bad joke, doesn’t it? But it’s true. And my forlorn, lonely, bereftness-osity means: it’s time for webmaster questions again!

Same guidelines apply as last time:

Ask whatever you want. I’ll tackle a few of the questions that are general. Please make sure you read the most recent comment guidelines so you know to avoid “what’s up with my specific site?” or other questions that won’t apply to most people.

Comments that ignore the comment guidelines will be pruned. I’ll add a couple more requests. First, please don’t ask me about legal stuff (“Dammit Jim, I’m an engineer, not a lawyer!”). And once someone has asked, “Dude, what’s up with topic X?” please don’t repeat the question–once is enough. I’ll let questions stream in today and tackle some of them this weekend.

A word about metrics, part II

Okay, in a previous post I told a story about Google’s market share in the early days, and mentioned that you have to think about the limitations of any measuring methodology. I briefly touched on sampling bias too. Let’s consider sampling bias in a different arena: Alexa.

One possible source of skewing in Alexa data is a bias toward webmaster-y sites. Alexa shows how popular web sites are, so it’s natural that webmasters install the Alexa toolbar. Some do it just so that their normal day-to-day visits around the web (including their own site) are added to Alexa’s stats. The net effect is that webmaster-related sites are going to look more important to Alexa. Let’s take a look at a graph comparing mattcutts.com and ask.com:

Matt vs. Ask!

For now, let’s concentrate on the green ellipse. This is a graph of reach, which is defined as “out of one million internet users, how many of them went to mattcutts.com vs. Ask each day.” If you look at the green ellipse, it shows that I had a spike in May and Ask had a dip in June. I believe Alexa was reporting that, on at least one good day for me and one bad day for Ask, I was reaching a larger percentage of internet users than Ask. (Alexa folks, please correct me if I’m misspeaking or drawing the wrong conclusion.) And I believe that I can safely say that’s not remotely close to true. I have nowhere near the reach that Ask has. :)
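To see how a skewed toolbar panel could produce a graph like that, here is a minimal Python sketch. Every number in it is made up for illustration: the population sizes, the toolbar install rates, and the visitor counts are all assumptions, not Alexa data. The estimator is a plain "share of the panel" ratio, which is only a guess at the general idea and not Alexa's actual method.

```python
# Hypothetical population, split into two segments (assumed numbers).
WEBMASTERS = 1_000_000          # people who run or optimize sites
GENERAL_USERS = 999_000_000     # everyone else

# Assumed toolbar install rates: webmasters install it far more often.
RATE_WEBMASTERS = 0.20
RATE_GENERAL = 0.002


def true_reach(webmaster_visitors, general_visitors):
    """Daily visitors per million internet users, using the whole population."""
    total = WEBMASTERS + GENERAL_USERS
    return (webmaster_visitors + general_visitors) / total * 1_000_000


def panel_reach(webmaster_visitors, general_visitors):
    """Daily visitors per million, estimated only from toolbar users."""
    panel_size = WEBMASTERS * RATE_WEBMASTERS + GENERAL_USERS * RATE_GENERAL
    panel_visitors = (webmaster_visitors * RATE_WEBMASTERS
                      + general_visitors * RATE_GENERAL)
    return panel_visitors / panel_size * 1_000_000


# A small webmaster-heavy blog vs. a big general-audience site (made up).
blog_true = true_reach(30_000, 1_000)
blog_est = panel_reach(30_000, 1_000)
big_true = true_reach(5_000, 3_000_000)
big_est = panel_reach(5_000, 3_000_000)

print(f"blog:     true {blog_true:8.1f}  panel estimate {blog_est:8.1f}")
print(f"big site: true {big_true:8.1f}  panel estimate {big_est:8.1f}")
```

Under these assumptions the big site's estimate lands close to its true reach, while the blog's estimate is inflated many times over, enough to put the two lines within shouting distance of each other on a graph.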

I’m clearly getting some boost from webmaster bias because so many SEOs read my blog. Am I getting a boost from anything else? Well, look at the purple ellipse in the graph above. I got a really huge spike in reach around April 20th. Why? It’s not like I said anything especially insightful that week. I think the answer is that I’m getting a bit of geek boost too.

Others have noticed this impressive jump in late April, and that some non-geek sites remained unaffected. What on earth could account for this huge (but welcome) spike in my reach graph?

Jason Striegel proposed a possible explanation: maybe Digg did it. He suggests that a Digg story about Digg overtaking Slashdot in traffic caused a bunch of Diggers to install the Alexa toolbar–enough to skew Alexa’s stats. Now the Digg story was popular about a month before the Alexa spike–maybe there’s a near-one-month wait on accepting data from new Alexa toolbar installs? It’s hard to say, but that late-April spike is definitely interesting. I haven’t seen too many other theories on that boost for geeky sites. Anyone got other ideas?

Just to be clear: Alexa is wonderful in many ways, and I love Alexa. They provide easy access to nice usage data. You just have to keep in mind possible limitations, e.g. skewing due to sampling bias. And to be fair, I grabbed this Alexa graph a couple weeks ago: I went back today and the two “Matt vs. Ask” spikes don’t cross now. Maybe Alexa did some renormalization. That does raise the issue that any metric is a bit of a black box: you need to know the raw data used to compute a metric, and exactly how that metric is computed. If you don’t know that, then there are bounds to how confident you can be in a metric.

So how do you decide how much to trust a metric? One way is to find another similar metric and compare the two. For example, here’s a graph comparing reach for mattcutts.com to zawodny.com:

Matt vs. Jeremy

Ha ha! Looks like I’m trouncing him, eh? Time to do a little Google Dance? Not so fast. Let’s look at a completely different metric which should be comparable: Bloglines subscribers. My RSS feed lists 1,136 subscribers, while Jeremy lists 5,096 subscribers. So by that metric, Jeremy is destroying me. And I suspect that Bloglines subscriptions are more accurate in this case.

Now, are Bloglines subscriptions perfectly accurate? Of course not. People who talk a lot about RSS and APIs probably are more likely to have RSS subscribers, for example. Also, different feed readers will have different audiences and demographics. And I noticed that over my six-week vacation my Bloglines subscriber numbers didn’t budge. It’s probably true that even when web surfers visit a site less often, RSS subscriber numbers would remain nearly constant, because it’s more trouble to unsubscribe in most feed readers. So drops in popularity are probably more visible from web surfers than from RSS subscribers.

What are the takeaway points so far? You should think about the limitations in any methodology: bear in mind that sampling bias can under- (or over-!) represent a site, for example. To be completely sure of a metric, you need to know the raw incoming data and how the metric operates on that data to produce a number. And if you want to be more confident, look for similar metrics that should roughly agree. If different metrics agree, that’s a good sign. If they disagree, you should probably be cautious.
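The "look for similar metrics that should roughly agree" check can be sketched in a few lines of Python. The Bloglines subscriber counts are the ones quoted above. The Alexa reach numbers are placeholders I made up (the graph only shows that my line sits above Jeremy's), and the `disagreements` helper is a hypothetical name, not part of any library.

```python
from itertools import combinations


def disagreements(metric_a, metric_b):
    """Return the pairs of sites that the two metrics rank in opposite order."""
    flipped = []
    for x, y in combinations(sorted(metric_a), 2):
        diff_a = metric_a[x] - metric_a[y]
        diff_b = metric_b[x] - metric_b[y]
        # Opposite signs mean one metric says x > y and the other says x < y.
        if diff_a * diff_b < 0:
            flipped.append((x, y))
    return flipped


# Placeholder reach values (assumed); only their ordering matches the graph.
alexa_reach = {"mattcutts.com": 60, "zawodny.com": 25}

# Subscriber counts as quoted in the post.
bloglines_subscribers = {"mattcutts.com": 1136, "zawodny.com": 5096}

flipped_pairs = disagreements(alexa_reach, bloglines_subscribers)
if flipped_pairs:
    print("Metrics disagree on:", flipped_pairs, "so be cautious.")
else:
    print("Metrics agree on the ordering; that's a good sign.")
```

This only compares orderings, which is about the weakest form of agreement you can ask for. If two metrics can't even agree on who is ahead, that's a strong hint that at least one of them is skewed for the sites in question.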
