Okay, in a previous post I told a story about Google’s market share in the early days, and mentioned that you have to think about the limitations of any measuring methodology. I briefly touched on sampling bias too. Let’s consider sampling bias in a different arena: Alexa.
One possible source of skewing in Alexa data is a bias toward webmaster-y sites. Alexa shows how popular web sites are, so it’s natural that webmasters install the Alexa toolbar. Some do it just so that their normal day-to-day visits around the web (including their own site) are added to Alexa’s stats. The net effect is that webmaster-related sites are going to look more important to Alexa. Let’s take a look at a graph comparing mattcutts.com and ask.com:

For now, let’s concentrate on the green ellipse. This is a graph of reach, which is defined as “out of one million internet users, how many of them went to mattcutts.com vs. Ask each day.” If you look at the green ellipse, it shows that I had a spike in May and Ask had a dip in June. I believe Alexa was reporting that, comparing a good day for me with a bad day for Ask, I was reaching a larger share of internet users than Ask. (Alexa folks, please correct me if I’m mis-speaking or drawing the wrong conclusion.) And I believe that I can safely say that’s not remotely close to true. I have nowhere near the reach that Ask has.
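To make the skew concrete, here's a minimal sketch of how a toolbar panel that over-represents webmasters could inflate reach for a webmaster-y blog. Every number in it is made up for illustration, and the names (`reach_per_million`, `blog_rate`, `portal_rate`) are hypothetical. I don't know how Alexa actually computes reach, so treat this as a toy model, not their method.

```python
def reach_per_million(visitors, users):
    """Reach: daily visitors scaled to a population of one million users."""
    return 1_000_000 * visitors / users


def daily_visitors(rate, n_webmasters, n_others):
    """Expected daily visitors given per-group visit rates and group sizes."""
    return rate["webmasters"] * n_webmasters + rate["others"] * n_others


# Hypothetical visit rates: the blog is popular with webmasters only,
# while the portal is equally popular with everyone.
blog_rate = {"webmasters": 0.05, "others": 0.0001}
portal_rate = {"webmasters": 0.01, "others": 0.01}

# Hypothetical real population: 1% of users are webmasters.
pop_webmasters, pop_others = 10_000, 990_000
pop_total = pop_webmasters + pop_others

# Hypothetical toolbar panel: half of the people who installed it are webmasters.
panel_webmasters, panel_others = 5_000, 5_000
panel_total = panel_webmasters + panel_others

true_blog = reach_per_million(
    daily_visitors(blog_rate, pop_webmasters, pop_others), pop_total)
true_portal = reach_per_million(
    daily_visitors(portal_rate, pop_webmasters, pop_others), pop_total)

panel_blog = reach_per_million(
    daily_visitors(blog_rate, panel_webmasters, panel_others), panel_total)
panel_portal = reach_per_million(
    daily_visitors(portal_rate, panel_webmasters, panel_others), panel_total)

print(f"true reach:  blog {true_blog:.0f}, portal {true_portal:.0f}")
print(f"panel reach: blog {panel_blog:.0f}, portal {panel_portal:.0f}")
```

With these invented numbers, the blog's true reach is about 599 per million against 10,000 for the portal, but the panel reports roughly 25,050 against 10,000. The portal's number doesn't move because everyone visits it at the same rate; only the site whose audience is over-sampled gets the boost.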
I’m clearly getting some boost from webmaster bias because so many SEOs read my blog. Am I getting a boost from anything else? Well, look at the purple ellipse in the graph above. I got a really huge spike in reach around April 20th. Why? It’s not like I said anything especially insightful that week. I think the answer is that I’m getting a bit of geek boost too.
Others have noticed this impressive jump in late April, and that some non-geek sites remained unaffected. What on earth could account for this huge (but welcome) spike in my reach graph?
Jason Striegel proposed a possible explanation: maybe Digg did it. He suggests that a Digg story about Digg overtaking Slashdot in traffic caused a bunch of Diggers to install the Alexa toolbar–enough to skew Alexa’s stats. Now the Digg story was popular about a month before the Alexa spike–maybe there’s a near-one-month wait on accepting data from new Alexa toolbar installs? It’s hard to say, but that late-April spike is definitely interesting. I haven’t seen too many other theories on that boost for geeky sites. Anyone got other ideas?
Just to be clear: Alexa is wonderful in many ways, and I love Alexa. They provide easy access to nice usage data. You just have to keep in mind possible limitations, e.g. skewing due to sampling bias. And to be fair, I grabbed this Alexa graph a couple weeks ago: I went back today and the two “Matt vs. Ask” spikes don’t cross now. Maybe Alexa did some renormalization. That does raise the issue that any metric is a bit of a black box: you need to know the raw data used to compute a metric, and exactly how that metric is computed. If you don’t know that, then there are bounds to how confident you can be in a metric.
So how do you decide how much to trust a metric? One way is to find another similar metric and compare the two. For example, here’s a graph comparing reach for mattcutts.com to zawodny.com:

Ha ha! Looks like I’m trouncing him, eh? Time to do a little Google Dance? Not so fast. Let’s look at a completely different metric which should be comparable: Bloglines subscribers. My RSS feed lists 1,136 subscribers, while Jeremy lists 5,096 subscribers. So by that metric, Jeremy is destroying me. And I suspect that Bloglines subscriptions are more accurate in this case.
Now, are Bloglines subscriptions perfectly accurate? Of course not. People who talk a lot about RSS and APIs probably are more likely to have RSS subscribers, for example. Also, different feed readers will have different audiences and demographics. And I noticed that over my six-week vacation, my Bloglines subscriber numbers didn’t budge. It’s probably true that even when web surfers visit a site less often, RSS subscriber numbers would remain nearly constant, because it’s more trouble to unsubscribe in most feed readers. So drops in popularity are probably more visible from web surfers than from RSS subscribers.
What are the takeaway points so far? You should think about the limitations in any methodology: bear in mind that sampling bias can under- (or over-!) represent a site, for example. To be completely sure of a metric, you need to know the raw incoming data and how the metric operates on that data to produce a number. And if you want to be more confident, look for similar metrics that should roughly agree. If different metrics agree, that’s a good sign. If they disagree, you should probably be cautious.
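If you want to do the "compare two metrics" check mechanically, here's a rough sketch. The Bloglines counts are the ones quoted above. The Alexa reach values are hypothetical placeholders that only preserve the ordering I saw in the graph (me ahead of Jeremy), since I didn't record the actual numbers. The helper names `ranking` and `metrics_agree` are made up for this example.

```python
def ranking(metric):
    """Sites ordered from highest to lowest value of the metric."""
    return sorted(metric, key=metric.get, reverse=True)


def metrics_agree(*metrics):
    """True if every metric ranks the sites in the same order."""
    rankings = [ranking(m) for m in metrics]
    return all(r == rankings[0] for r in rankings)


# Hypothetical placeholder values; only the ordering reflects the Alexa graph.
alexa_reach = {"mattcutts.com": 300, "zawodny.com": 100}

# Subscriber counts as quoted in this post.
bloglines_subscribers = {"mattcutts.com": 1136, "zawodny.com": 5096}

agree = metrics_agree(alexa_reach, bloglines_subscribers)
if agree:
    print("Metrics agree on the ranking: a good sign.")
else:
    print("Metrics disagree on the ranking: be cautious.")
    print("Alexa says:    ", ranking(alexa_reach))
    print("Bloglines says:", ranking(bloglines_subscribers))
```

Comparing rank order rather than raw values is a deliberate choice here: reach and subscriber counts are in different units, so the most you can ask is whether they tell the same story about who is ahead.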