A word about metrics, part II

Okay, in a previous post I told a story about Google’s market share in the early days, and mentioned that you have to think about the limitations of any measuring methodology. I briefly touched on sampling bias too. Let’s consider sampling bias in a different arena: Alexa.

One possible source of skewing in Alexa data is a bias toward webmaster-y sites. Alexa shows how popular web sites are, so it’s natural that webmasters install the Alexa toolbar. Some do it just so that their normal day-to-day visits around the web (including their own site) are added to Alexa’s stats. The net effect is that webmaster-related sites are going to look more important to Alexa. Let’s take a look at a graph comparing mattcutts.com and ask.com:

Matt vs. Ask!

For now, let’s concentrate on the green ellipse. This is a graph of reach, which is defined as “out of one million internet users, how many of them went to mattcutts.com vs. Ask each day.” If you look at the green ellipse, it shows that I had a spike in May and Ask had a dip in June. I believe Alexa was reporting that, for at least one good day for me and one bad day for Ask, I was reaching a larger percentage of internet users than Ask. (Alexa folks, please correct me if I’m misspeaking or drawing the wrong conclusion.) And I believe that I can safely say that’s not remotely close to true. I have nowhere near the reach that Ask has. 🙂
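To make the skew concrete, here is a minimal Python sketch of how a toolbar panel can inflate reach for a webmaster-heavy site. Every number in it is invented for illustration (the audience split, the toolbar install rates, the visit rates), the function name is hypothetical, and none of it reflects how Alexa actually computes reach. It simply compares true reach per million users with the reach you would estimate from toolbar users alone.

```python
def reach_per_million(audiences, visit_rates):
    """Return (true_reach, panel_reach) per million users for one site.

    audiences:   {group: (share_of_all_users, toolbar_install_rate)}
    visit_rates: {group: fraction of that group visiting the site each day}
    All inputs are hypothetical; this is not how Alexa computes reach.
    """
    true_visitors = 0.0   # daily visitors as a fraction of all users
    panel_size = 0.0      # toolbar users as a fraction of all users
    panel_visitors = 0.0  # toolbar users who visit the site
    for group, (share, install_rate) in audiences.items():
        rate = visit_rates[group]
        true_visitors += share * rate
        panel_size += share * install_rate
        panel_visitors += share * install_rate * rate
    return true_visitors * 1_000_000, panel_visitors / panel_size * 1_000_000


# Made-up population: 1% webmasters, far more likely to run the toolbar.
audiences = {"webmasters": (0.01, 0.20), "everyone_else": (0.99, 0.01)}

# Made-up sites: an SEO blog read mostly by webmasters, and a large portal
# visited at the same rate by both groups.
seo_blog = {"webmasters": 0.02, "everyone_else": 0.00001}
big_portal = {"webmasters": 0.005, "everyone_else": 0.005}

blog_true, blog_panel = reach_per_million(audiences, seo_blog)
portal_true, portal_panel = reach_per_million(audiences, big_portal)

print(f"SEO blog:   true {blog_true:8.0f}   panel estimate {blog_panel:8.0f}")
print(f"Big portal: true {portal_true:8.0f}   panel estimate {portal_panel:8.0f}")
```

With these made-up inputs the portal's panel estimate matches its true reach, because both groups visit it at the same rate. The blog's panel estimate comes out many times higher than its true reach, because its readers are over-represented among toolbar users.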

I’m clearly getting some boost from webmaster bias because so many SEOs read my blog. Am I getting a boost from anything else? Well, look at the purple ellipse in the graph above. I got a really huge spike in reach around April 20th. Why? It’s not like I said anything especially insightful that week. I think the answer is that I’m getting a bit of geek boost too.

Others have noticed this impressive jump in late April, and that some non-geek sites remained unaffected. What on earth could account for this huge (but welcome) spike in my reach graph?

Jason Striegel proposed a possible explanation: maybe Digg did it. He suggests that a Digg story about Digg overtaking Slashdot in traffic caused a bunch of Diggers to install the Alexa toolbar–enough to skew Alexa’s stats. Now the Digg story was popular about a month before the Alexa spike–maybe there’s a near-one-month wait on accepting data from new Alexa toolbar installs? It’s hard to say, but that late-April spike is definitely interesting. I haven’t seen too many other theories on that boost for geeky sites. Anyone got other ideas?

Just to be clear: Alexa is wonderful in many ways, and I love Alexa. They provide easy access to nice usage data. You just have to keep in mind possible limitations, e.g. skewing due to sampling bias. And to be fair, I grabbed this Alexa graph a couple weeks ago: I went back today and the two “Matt vs. Ask” spikes don’t cross now. Maybe Alexa did some renormalization. That does raise the issue that any metric is a bit of a black box: you need to know the raw data used to compute a metric, and exactly how that metric is computed. If you don’t know that, then there are bounds to how confident you can be in a metric.

So how do you decide how much to trust a metric? One way is to find another similar metric and compare the two. For example, here’s a graph comparing reach for mattcutts.com to zawodny.com:

Matt vs. Jeremy

Ha ha! Looks like I’m trouncing him, eh? Time to do a little Google Dance? Not so fast. Let’s look at a completely different metric which should be comparable: Bloglines subscribers. My RSS feed lists 1,136 subscribers, while Jeremy lists 5,096 subscribers. So by that metric, Jeremy is destroying me. And I suspect that Bloglines subscriptions are more accurate in this case.

Now, are Bloglines subscriptions perfectly accurate? Of course not. People who talk a lot about RSS and APIs probably are more likely to have RSS subscribers, for example. Also, different feed readers will have different audiences and demographics. And I noticed that over my six-week vacation, my Bloglines subscriber numbers didn’t budge. It’s probably true that even when web surfers visit a site less often, RSS subscriber numbers would remain nearly constant, because it’s more trouble to unsubscribe in most feed readers. So drops in popularity are probably more visible from web surfers than from RSS subscribers.

What are the takeaway points so far? You should think about the limitations in any methodology: bear in mind that sampling bias can under- (or over-!) represent a site, for example. To be completely sure of a metric, you need to know the raw incoming data and how a metric operates on that data to produce a number. And if you want to be more confident, look for similar metrics that should roughly agree. If different metrics agree, that’s a good sign. If they disagree, you should probably be cautious.
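The cross-check described above can be written down as a tiny Python sketch. The Bloglines counts are the ones quoted in this post; the Alexa reach values are made-up placeholders that only preserve who is ahead in the graph, and the function names are hypothetical.

```python
def leader_by_metric(metrics, site_x, site_y):
    """For each metric, name the site with the larger value (ties go to site_y)."""
    return {
        name: site_x if values[site_x] > values[site_y] else site_y
        for name, values in metrics.items()
    }


def metrics_agree(metrics, site_x, site_y):
    """True when every metric names the same leader for the two sites."""
    return len(set(leader_by_metric(metrics, site_x, site_y).values())) == 1


metrics = {
    # Placeholder values: the post shows only a graph, not numbers.
    "alexa_reach": {"mattcutts.com": 2.0, "zawodny.com": 1.0},
    # Subscriber counts quoted in the post.
    "bloglines_subscribers": {"mattcutts.com": 1136, "zawodny.com": 5096},
}

leaders = leader_by_metric(metrics, "mattcutts.com", "zawodny.com")
print(leaders)
if not metrics_agree(metrics, "mattcutts.com", "zawodny.com"):
    print("Metrics disagree: be cautious about trusting either one alone.")
```

A disagreement like this does not say which metric is wrong, only that at least one of them is measuring something other than what you hoped.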

74 Responses to A word about metrics, part II

  1. Matt, you're such a show off =P

    Cool post. See you in San Jose!

  2. Whoa! u r doing amazingly! alexa refuses to draw a graph for my site. lol.

  3. how long until we see google traffic rank? Combine the pagerank, search clicks, alexa data, and optional: bloglines subscriptions, analytics data into a nice “robust and scaleable” 1-10 red bar for my toolbar?

  4. Hi, Matt!

    I had the identical spike you had, Google traffic peaked on May 17, had been building since May 15, then after May 17 it fell like a lead balloon.

    Today another spike seems to be occurring. Our sites are very different, but how Google views them is the same. I would like to suggest that the spikes are a result of something happening at Google rather than exposure of our sites ( my Google referrals went from 12% to 76% of my traffic, also with the Jagger2 update in late October ’05). Every time I see a spike, I think my online business will be saved from the Google nukes and will be fine again (dead since July 22, 2005 except for a few spikes).

    Any insight will be very welcome!
    Thanks again,
    Lu

    So, the question is, what did Google do in May and what did they do today or in the past few days?

  5. I believe this is extremely relevant when trying to balance paid and organic search.

    If you have the #1 spot organically & the top PPC position… How much of the free traffic is being diverted to the paid result? If most of the PPC traffic would have been free anyway, then how can you believe the ROI metrics from that PPC placement or evaluate that type of performance on the same level that you would for keywords that you do not rank well for?

  6. It would be nice to see a traffic graph from your web logs and compare them to Alexa. Granted, we don’t need to see the actual number. It would just be nice to see if the two graphs match in terms of changes in traffic. Does Alexa spike when your logs show consistent traffic? Maybe your logs spike and it doesn’t get picked up by Alexa?

  7. I agree with Ryan. Google could create data services with metrics far more accurate than Alexa.

    Just off the top of my head – potential sources for market share, traffic, clickstream aggregated numbers include:
    – Google.com
    – Google toolbar
    – AdSense
    – Google Analytics
    etc.

    That’d be a great 20% project

  8. Hey Matt–

    Interesting topic. I have been wondering about this sort of thing myself. I have had a large leap in Alexa rankings since the beginning of the month. I believe my content is a little improved, but not overly. I’m posting less, not more. I’ve had a couple great things happen but not something that would justify a continually increasing Alexa rank, I don’t think. So what is it?

    Personally, I feel that the ranking increase is as a result of the Firefox Alexa toolbar that became quite the rage early in the month. Before, with only the IE toolbar reporting back metrics, Firefox users were completely disenfranchised. I think at this point, with FF users back in the loop, the Alexa ranking suddenly becomes relevant again.

  9. Matt, interesting analysis.. I would love to see a graph and more metrics between mattcutts.com, zawodny.com and scobleizer.

    Based on this, I think it would be prudent to see spike rates /visitage (is there such a word?) on the actual events happening in the technological arena.. not just blog content and what you blogged !!

    Another facet is that we must also use trends to see how each of you actually “trend out” with that component.. after all they must agree to some degree to alexa correct ??

  10. Hi Matt,

    I think Big Daddy and all of the subsequent problems may have something to do with the spikes.

    It’s nice to see that you guys have fixed one of the bugs that was introduced during the 27th June “refresh”. Many people are now reporting that their sites – that were suddenly banished to the bottom of the rankings on 27th June – have now recovered their former rankings equally suddenly, precisely one month later. So that’s one of the bugs fixed.

    But there is at least one more bug outstanding. This bug has exactly the same effect, but evidently an entirely different cause. Are you aware yet of the remaining bug or did this simply get lost in the furor of the other one? If you are aware of it, do we now have to wait another month for a fix? Or, if you are not aware of it, how can we bring this to your attention?

    Many Thanks.

  11. Excellent Post! I've just started blogging much more heavily, and watching my numbers via Google Analytics has been interesting. It's hard to make too much sense out of the numbers, though, and your post helps explain why. I'll keep posting, while I try to figure out how to best tell where and how I can have an impact. 🙂

  12. Interesting… makes sense Matt. I’ve talked about this for a while actually, how Alexa’s data isn’t relevant!

    We’re all waiting for Google to implement a better system!

  13. Speaking of Google analytics… does anybody have an invite.. I’ve been waiting since day one to try it out.. (email me if you have one)

    Or a trusted tester invite, that'd be even better as I have tons of ideas for how to improve just about every google service there is (and a few new ones I've dreamt up)… but i'll settle for trying out analytics.

  14. I would attribute the burst of web developer sites to the introduction of SearchStatus: http://www.quirk.biz/searchstatus/

    Take a look at the Alexa traffic details for MattCutts and DigitalPoint.com:
    http://www.alexa.com/data/details/traffic_details?compare_sites=mattcutts.com&range=6m&size=medium&y=r&url=DigitalPoint.com

  15. I wouldn't say the Alexa data isn't relevant… it's skewed like Matt said and you can't trust the numbers but the trends are worth watching and generally accurate. For example, a spike in traffic at alexa is usually accompanied by a spike in my site logs as well and usually within 5% accuracy as far as growth by percent.

    But the base numbers are complete crap.

    And I am talking about an SEO site where many visitors are likely to have alexa's toolbar installed. But even on niche categories that are non-techie I see the trends are pretty close… maybe within 10% accuracy (again the trend, not the base numbers).

    Anyway, good point Matt – statistics is the third lie after all.

  16. Some actual number comparisons would sure help here – I bet Jeremy would share his with you. I’ve noted major weirdness with Alexa’s blog reports vs regular site stats and wonder about how much of this is the webmaster bias you talkin’ bout here and how much other glaring Alexa deficiencies.
    Perhaps a neat Google project would be to examine traffic at several nodes and generalize about rankings from that information. Seems that would be far more accurate than either Hitwise’s or Alexa’s approach.

  17. I would like to see you do a series of Metrics articles, part III, part IV, part V, etc, as there’s plenty of stuff to look at.

    You want to see the raw data? the Raw Alexa data? that might be nice but it’s kinda hard to look at I suspect.

    A couple of comments here are right on the money – maybe with all the stuff Google does – it can also find the time to create a better set of metrics than what we get from Alexa and company.

  18. Hi Matt

    My guess is that the post in May about “indexing-timeline” caused a huge spike in May and happens to be a PR7 page unlike most which are PR6. The small spike in the beginning of Feb was the BMW fiasco

  19. Nice one

    But do you know why my home page has disappeared from SERPS but everything else is AOK?

  20. I saw that bump on Digg you mentioned and a similar bump for DigitalPoint Forums and was wondering what was up with that. Your explanation of Digg users running out to install Alexa makes a lot of sense. Thanks for posing a plausible explanation to that odd bump. 🙂

  21. I wish I had a witty response…

  22. So by the same effect it could be said that webmasters prefer Yahoo and MSN.
    That’s a scary thought; six years ago webmasters preferred Google while the general population flocked to Yahoo. That soon changed. So if the same rule of history applies to this principle than the general population will quit using Google in a couple years.

  23. So by the same effect it could be said that webmasters prefer Yahoo and MSN.

    I’m not sure if anyone else is confused by this, but that’s a bit of a quantum leap from where I sit.

    There is one thing that somewhat confuses me in all of this, Matt: while the statistical measures posed by Alexa and Bloglines provide some nice general insights, how come you don’t seem to measure the overall success of your site by a combined metric based on the factors that specifically relate to your site? Regular posters, traffic logs, people who offer you free stuff for a #1 ranking, things like that. (Or do you and it’s not a formula you want to reveal to the general public?)

  24. Are there any examples of accurate metrics?

  25. Those green on black Texas-Instrument-number site counters that go up by one every time someone refreshes the page. 🙂

  26. Ryan, you might try requesting a Google Analytics account. I was under the impression that the backlog was being worked off.

    Aaron Pratt, I think that there are more inaccurate metrics in our space than accurate. 🙂 It's just a consequence that everyone sees a different audience and a different chunk of the web.

    I do like that we provide Google Trends; I doubt that we’d provide Alexa-like data though.

    Eli, I missed your thinking on that one?

    Jeremy, I said that you were kicking my ass, blogwise. So I pretty much trashed myself for you. 🙂

  27. I have to agree with what Aaron said in his comments… “We’re all waiting for Google to implement a better system!”. 🙂

    I use Google Analytics too, so Ryan, if you still want an invite, let me know. Although, like Matt said, it is quite readily available now. It didn’t take me long to get an invite from Google once it launched.

  28. The quality of the traffic is just as important as having MASS Traffic of Casual Users

    The Webmaster/SEO/Marketing users who visit certain TYPES of sites have a very valuable potential to advertisers who could push a certain type of product or service.

    Using AdSense/ AdWords on this blog and donating the money to a very reputable charity would have a least two benefits….

    —> The first one is so obvious ……….

    —> The second would allow personalized evaluations of click fraud and would make a great topic

    _________________________________________

    In reference to Digg – one EXCELLENT technique is to study the Alexa rankings for the non tech vrs tech Digg stories that reached the front page – and for the top users…

    Also there are problems with how ALEXA evaluates their Metrics

    According to this SearchEngines (WEB) is doing better that ANYONE

    http://www.alexa.com/data/details/traffic_details?&range=2y&size=large&compare_sites=www.mattcutts.com/blog&y=r&url=http%253A%252F%252Fdigg.com%252Fusers%252FSearchEngines%252Fhomepage

    Oh, Search Engines WEB! 😀

  29. I wasn’t really making a point, just a little unfoundable observation.
    Just to be stupid I’ll slightly elaborate.
    “I’m clearly getting some boost from webmaster bias because so many SEOs read my blog.” If Alexa has a bias towards webmasters, and Google controls a larger percentage of users than Yahoo. Without going into too much nitpicking (such as yahoo has more entry pages) it could be said that webmasters prefer Yahoo. My observation was simply that the history of Google has already proven that webmasters and techies eventually set the trends the rest of the masses follow in the search industry. Therefore the current Alexa data could be a bad sign for Google's future.

    There is really only one hope for a decently solid metric. Large stat programs such as Awstats deciding to invade privacy and starts storing site traffic.

  30. There is really only one hope for a decently solid metric. Large stat programs such as Awstats deciding to invade privacy and starts storing site traffic.

    Now that MS bought DeepMetrix, that may happen sooner than you think.

  31. Dave (Original)

    ALEXA! Oh my goodness. Matt, why the slip into fiction land?

  32. http://www.kevinjb.com/2006/07/20/why-alexa-rankings-dont-matter/

    You totally stole this (much more comprehensive) post from my blog. I’m on to you…

  33. I have a little homily about metrics, too.

    Back in the old days, I was at a search engine, and we watched our Media Metrix numbers; the de facto measurement of who’s who, then. Companies lived and died by their ranking. We competed with Inktomi, the big gun of the day; both companies were the source of many of the search results you actually saw when you searched on popular search engines.

    In those days, we were trying to make it easy for search engines to incorporate our search results in their websites. Sure, we had XML, which was pretty cool for then (1999), but XML was too hard to implement. So we made a little dynamically generated GIF that drew a clickable image of the words “See the 10 Most Popular Results for Widgets”. Since webmasters only had to add a single snippet of HTML that referenced this GIF on our site, numerous search engine first page results were showing our little image. And our search results were pretty good (better than Google's for a while), so a lot of people were clicking the image where they viewed our results on our domain on a page we designed to look just like the search engine's page.

    At one point, we were, according to MediaMetrix, in the top 3 websites. We were getting credit for clicks on our GIF for many of the most popular search engines of the day (MSN, Lycos, Hotbot, AV, and many others). Our marketing department was thrilled. Then, one of our bigger customers moved to use our technically superior XML interface; instead of us showing a co-branded results page on our domain, they rendered the same results, but on their domain. Next month’s MediaMetrix numbers completely tumbled! Marketing was distraught. My logs showed that people were seeing even more of our data every day, and we were getting paid for pages served so revenue was great. But our “numbers” were tanking.

    The numbers were bogus, much like everything else those days. Yet they were God’s own truth to most people buying and selling anything worth anything on the web. The world has gotten a little more sophisticated since then, but not really that much.

    Every metric we see is probably an accurate measure.

    The question is: of what?

    Tom

  34. I often find blogs with few backlinks done by extremely smart people who just do not care to know about marketing, links and metrics. These people if found and presented to others blow many of those who falsely inflate their value out of the water.

    The same goes for popularity contests that can be easily gamed (see Digg any day of the week), they just do not present the real picture. I would be happy if Google would nofollow that entire site or at least decrease its value in their algorithm.

  35. There’s a simple way you could go from guessing to knowing (or at least knowing a bit more): was there any change in the percentage of Alexabar users in that time-frame? Come on Matt, 3rd party numbers based on unknown populations surely can’t be trusted, what else is new? Open your server log files, check your Google Analytics account: plot the percentage of Alexa users over time, plot the absolute unique traffic you had, compare those to the graphs from Alexa. Do you or do you not have a rise in the percentage of Alexa users?

  36. Well Matts Do u think these are the valid outputs, I don’t think so. Alexa results are its tool bar dependent and If I say that the usage of Alexa toolbar in asia specific is quite low and the there are millions of user from asia u knows better. U can find out this from your analytics.
    Alexa have good positioning in market and I got sick 🙁 explaining the facts to my clients. They prefers Alexa a lot.

  37. Spikes in Alexa can be caused by newsletters or similar sources. If one of your posts contained something of interest to many more than just the average geek then you can have a spike. If you use email marketing, your alexa data can spike too.

  38. Hi Matt,

    This skew is why I stopped trusting Alexa. As for the spike, could it co-incide with the Big Daddy update? When I was scratching around for answers I was in your site a lot. Presumably a lot of other SEO’s did the same.

  39. Great post Matt.

    One skew that you didn't mention and is worth commenting on is that sites which have a disproportionately high number of visitors using Macs are under represented in Alexa data, simply because there's no Mac version of the Alexa toolbar. Likewise, there's no Firefox version, although there are at least options for Firefox users.

    Given that both the Mac OS and Firefox are growing in share, albeit very slowly, this skew is likely to become greater as time goes on.

  40. Dave (Original)

    I have managed to jump over 10,000 places simply by having my Mum, Dad and Sister install their silly toolbar and go to my site.

    Alexa data is purely fictitious and cannot be used for anything useful.

  41. I have two sites with very different visitors. One has a lot of housewives and regular people. The other has a lot of spamfighters and SEO folks. The one with SEO folks is killing the others in Alexa stats. But the one with housewives has double the bandwidth usage…

  42. Let's put things in perspective. Your site attracts SEO people — those who artificially boost Alexa ranks. Your Alexa rank is 1340th whilst Netcraft shows 15,590th.

    But nonetheless: your site rocks!

  43. I am curious to see if this article (and your corresponding link to Alexa) bring another spike for you in about a month or so… Perhaps you can keep us updated.

  44. When I first learnt about Alexa, I used to use the site every day, however the results that it spits out are no good to me or the website owners that I work for.

    How can you make any use of something that is not equally used or that does not even hand out results that you can actually calculate proper results. If Alexa could come up with a new way of gaining the results without the tool bar, maybe it may gain some credibility back with in the web community.

  45. Um, I don’t think we need a part 3, 4 or 5. This is just basic statistics friends! Take a class at your local JC.

  46. Dave (Original)

    RE: “This is just basic statistics friends!”

    No, it's Alexa's fictitious take on the Web. Nothing to do with “statistics” IMO 🙂

  47. On the subject of Alexa Graphs, you might like to play with my Online Dynamic Alexa graph, it’s a little faster and easier than going to Alexa’s traffic details page:

    http://www.iconico.com/workshop/dynamicAlexa

    Nico Westerdale
    http://www.iconico.com

  48. I don't know how long i need to study to have traffic like that 🙁

  49. is it true that domain registration status affects seo? for example if i have a domain that will expire in 9 months compared to a domain that will expire in 4 years. Will the domain expiring in 4 years get better rankings?

  50. Yes Matt, congratulations! And the best thing is that you don’t need to use SEO, because your blog is really good for us!

  51. Well, it’s 2006, and the large site I work for lives and dies by its Nielsen NetRatings and Comscore numbers because that’s all the major ad agencies seem to care about…

    We use Omniture as our web analytics vendor, but the ad agencies aren’t interested in a unique visitor number that’s reported by an analytics service.

    How do you guys sell ads on your sites?

    Thanks.

  52. It’s an interesting reading. May be Alexa should apply some like they do in handicap horse races – so that the ratings are fair and even.

  53. Alexa does imo give you a very good rating as to what it is webmasters are looking at. The general public on the other hand won't have ever heard of it much less use the tool bar. For this reason I personally find it a very good tool. If a webmaster is looking at your site there is a good reason for it.

  54. Hi Matt,

    I think Alexa is very much biased. I am saying so coz, Alexa results are based on people entering site through their toolbar. It cannot probably get even 90% correct.

  55. But Alexa is still useful to get a rough estimate of the site’s traffic comparing to other sites.

  56. I guess if you are popular, YOU ARE POPULAR!

  57. Alexa info is so skewed and it depends on how many people install the toolbar as well.

  58. I wish my website could have half of your traffic

  59. MAtt you are the best! Thank you for all your posts.

  60. Thanks for the lengthy and informative post. Waiting more!

  61. Give me a quarter of your traffic and I’d be happy. And that page rank… stellar.

    Goes to show only CONTENT can drive visitors. Not all that black hat shmackhat

  62. I’ve been using Google analytics. Is the Alexa ranking system worth learning?

  63. I think we’ve all learned by now that the various “ranking” and “reporting” systems are far from perfect. But it certainly doesn’t hurt to get an ego boost once in a while!

  64. ipod touch: No, many people don’t really care about Alexa rankings b/c it’s based on those that use it’s toolbar — I surely don’t.

  65. I agree with Mark, Alexa is pretty much useless.

  66. Alexa is dead. Google tells you all you need to know.

  67. Does anyone still use Alexa? Is it helpful? Does it offers anything that I can’t get through other ranking sites, just wondering for my SEO stuff on my sites.

  68. Alexa does imo give you a very good rating as to what it is webmasters are looking at.

  69. Can someone give me a link to a page that explains how Alexa works?

  70. Alexa and Google pageranking both sucks. What matters is the # of backlinks you have to your website.

  71. Alexa definitely has some useful stuff, but both google and alexa have their strengths and weaknesses.

  72. Blog ranking sites are starting to abandon Alexa ranking and many webpages that I’ve seen give presentations at conferences always try to back up evidence of their popularity with more than just Alexa (often it’s google pagerank as well). I think soon Alexa might kick the bucket in terms of respect.

  73. It’s quite interesting that you get to catch the small examples like the small spikes that catches Alexa’s Metric system. Also, you got a copy of your graph lowered at another website.

    Among billions of sites, i mean how do you find such examples!!!
