Estimating webmaster skew in Alexa metrics

You may have heard of Peter Norvig, who is the Director of Research at Google. He’s the guy that did the Gettysburg Address as a PowerPoint presentation. Or you might have used his artificial intelligence textbook in college.

Recently Peter used several folks’ logs (including mine) as a baseline to estimate the skew in Alexa due to the self-selection bias of webmasters or SEOs installing the Alexa toolbar. The results are really interesting:

The difference is quite profound. For example, I get about twice the pageviews of mattcutts.com, but his Alexa pageview ranking is about 25 times more than mine … What that means is that people with the Alexa toolbar installed are 25 times more likely to view a page on Matt’s site versus mine, but overall, all users view twice as many pages on my site. That’s a 50 to 1 difference introduced by the selection bias of Alexa.

Read the entire post for more details. Something to think about when you use any metrics that allow for self-selection.
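
The arithmetic behind that "50 to 1" figure can be sketched in a few lines. This is only an illustration using the rough ratios quoted above; the variable and function names are made up for the example, and the real numbers in Peter's post are estimates, not exact measurements.

```python
# Rough ratios quoted from Peter Norvig's post (approximate, as stated there).
true_pageview_ratio = 2.0    # norvig.com gets about 2x the pageviews of mattcutts.com
alexa_pageview_ratio = 25.0  # Alexa shows mattcutts.com at about 25x norvig.com


def selection_skew(true_ratio, alexa_ratio):
    """Combine two opposite-direction ratios into a single skew factor.

    true_ratio:  how much bigger site A is than site B among all users.
    alexa_ratio: how much bigger site B looks than site A among toolbar users.
    """
    return true_ratio * alexa_ratio


skew = selection_skew(true_pageview_ratio, alexa_pageview_ratio)
print(f"Skew introduced by toolbar self-selection: about {skew:.0f} to 1")
```

The two ratios multiply because they point in opposite directions: one site is ahead by 2x in real traffic while the other is ahead by 25x in the toolbar sample.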

54 Responses to Estimating webmaster skew in Alexa metrics

  1. Certainly there is a skew towards webmaster related sites with Alexa and tech related sites in general. Case in point – does anyone really believe that digg is the 80th most popular site in the world as Alexa claims? There are simply not enough Apple and Kevin Rose fans in the world to make that possible. 🙂

    Back to the original point though … Compete shows a slightly more balanced view of traffic between Peter’s site and your site Matt:
    http://snapshot.compete.com/mattcutts.com+norvig.com

    Perhaps that is slightly closer to the truth?

  2. Yeah, Alexa is amazingly off in a lot of cases – more than anyone would normally imagine, looking at all those exact lines updating every day… Although, I find Alexa’s data is useful in other ways, such as this mashup with del.icio.us: http://www.delexa.org/ – kind of like Wikipedia, it works best for simple questions (is this site HUGE or TINY) and as a launching point for deeper research.

  3. It would be interesting to compare the number of known users with the Alexa-bar (from the user-agents). Is the number of known Alexabar-users really proportional to the numbers in Alexa? Could you extrapolate a certain Alexa-ranking from a known number of Alexa-Users? Or does Alexa use other factors as well?

  4. Reminds me of an old series of Dr. Pepper commercials I really loved.

    “The Toolbar Wars are over.”

    You’ve won the war, Google. No need to keep fighting the battles.

  5. Michael, I don’t think Matt is fighting a toolbar battle here, he’s fighting the false widespread impression that Alexa stats are reliable.

  6. Alexa is heavily skewed towards tech sites due to the type of person that would actually install the toolbar. Most would agree that it’s nothing more than a gimmick, and I’ve never seen any value in the use of the Alexa data.

    The analysis Peter did is just concrete evidence to reinforce what everyone already knows.

  7. This confirms what many SEO experts published before, but it is good to have this information from Matt Cutts and Google.

  8. No surprise to me; I’ve long been saying variations of

    “Alexa scores are utterly dependent on visits by Alexa toolbar users; very largely webmasters and geeks, with a scattering of well-meaning teenagers.

    Alexa’s ability to tell you ANYTHING about YOUR, GENUINE visitors is precisely zero.”

  9. Besides Alexa, it would have been nice to get his assessments of Compete and Quantcast in terms of accuracy.

    Here are more verifiable stats to compare all three against

    seoptimization.blog.com/1221628/

  10. I say that I have to agree that Alexa is probably wholly inaccurate due to the bias of people who install the toolbar.

    However, the one good thing about having it installed is that it’s pretty easy to spot myself in the weblog files from the various IP addresses my ISP gives me every time I use the good old-fashioned 56k dialup account.

  11. So is this suggesting that Google is going to start publicly sharing their massive toolbar data set so the public can see?

  12. alexa is not that accurate I gather – but google analytics is not much better – it never matches with my raw log files, why is that?

  13. I agree with Brett, it would be very interesting if google did do this.

    Most spyware programs ding the alexa bar as spyware so I would venture to say most average users do not have the alexa toolbar even installed.

  14. I don’t see how norvig is doing 2.1m uniques. I have non-webmaster sites doing over 60k uniques a day that are ranked better. By all traffic standards, the site does closer to 25-35k real uniques a day.

    We have been watching Alexa and our sites rankings/traffic for years. We see a huge increase in firefox users with webmasters – but don’t see anywhere close to that match in Alexa rankings.

    I know it’s hard to get exact traffic estimates from Alexa, but I would bet Matt really does more “true unique” visitors in an “entire month” than the other listed sites do. Which is why Matt crushes most sites in the rankings.

  15. Philipp Lenssen, just to clarify, I do like Alexa’s data and I’m glad that they share it; you just have to remember that the data can be biased.

    Brett, that wouldn’t be for me to say. I’m sure that people would lob other criticisms at anyone that provided data.

    TheDoc, I made my log stats for 2006 public here:
    http://www.mattcutts.com/blog/my-search-stats-for-2006/

  16. Great work by you and Peter. I knew it was “not accurate”, but 50x off for two *technology* sites is a very ugly reflection on Alexa. Combining this with SEOMOZ’s excellent study of how bad the stats programs are right now, I agree with Battelle that Google should consider providing a comparison service.

  17. I think alexa works as long as you are comparing in related fields… just like any other tool it has its place.

  18. Hey Matt.. Cool that you posted your stats for everyone..
    I find that my Analytics are 20-30% off the real number and as high as 50% off server logs. Still a great service that I use though. You can move the counter from the bottom of your site to the top to increase hits, or make it load first.

    If webmaster traffic was the difference, affiliate programs and all webmaster resources would rank through the roof – which isn’t the case.

    Without going through everyone’s logs to be sure.. I would bet your site has more ‘real uniques’ – fresh new people finding your site than the others. You also have a 50% return rate.. Even my affiliate programs don’t have a 50% return rate.. Most sites are like 15-20%. I think you have the rankings because your site deserves it – not because of webmasters.

  19. Oh no, don’t link to it! You’re ruining it! Now Peter can chart the demise of the skew as all of the Cuttlettes become norvig.com fans.

  20. I’ve known this for years. Why doesn’t anyone pay attention to me when I say stuff like this? Maybe I need to get a job at Google…

  21. The Alexa rating/traffic conversation – one that I always dread. Thank you to you and Peter Norvig for giving more legitimate support to the “why Alexa data is not always reliable” argument.

  22. Matt, your characterization of Peter’s data as demonstrating a general skew in Alexa’s data is guilty of the same kind of innumeracy that Peter is attempting to point out.

    Taking Peter’s analysis to say that Alexa’s data is *generally* inaccurate draws a huge inference from a rather small amount of data and does so using the exact selection bias Peter points out.

    It is no surprise that a tool known to skew towards webmasters shows a skew in favor of your site. This says nothing about what skew, if any, might be in the Alexa data for sites that are substantially less strongly correlated with their profession.

  23. I have even more striking statistics.

    One of my sites receives 100x more daily traffic than my webmaster site. My webmaster site has a higher Alexa rank (it’s close, but still higher).

    The site with 100x the traffic is completely non-techy in its audience.

    The skew can happen within subsets of webmasters too. Internet marketing sites rank higher than development sites because marketers are more likely to have it installed than developers.

  24. Actually I didn’t have the honor of reading his book, even though they might have used it at the CS dept at UMD.

    Mr. Norvig mentions something about selection bias. Alexa only counts a visit if the user has the Alexa toolbar installed. But they mention this in their terms of service.

    This selection bias could also be compared to Google’s PageRank calculation method. It tries to select those so-called ‘editorial votes’ which it thinks should be selected and credits the PageRank. The rest are discarded (source: Google webmaster blog, post saying some pages may lose their ability to pass reputation).

    Now, about his site not having a higher Alexa ranking than this site: that basically says that this site gets more visits from people who have the toolbar installed.

    And this site having a higher PageRank than his says that this site is getting more editorial votes. I think mainly webmasters are the ones casting those votes, and it’s possible that this site has topics more interesting to webmasters.

    I don’t use the Alexa toolbar, because there is none for Firefox, but there is no perfect metric.

    All have their weakspots and their strong points.

  25. Matt,
    The Verio Dulles19 server has been down since Sunday morning, I believe it’s the Virginia Data Center Server Location, and as you can guess my main site is hosted on it. It seems that first the RAID died, then there was an undisclosed but complete failure during a data restore. Argh! We have certainly worked hard to offer a quality site but this worries me for obvious reasons.

    So my question is about how Google will treat this matter? Are we going to be reduced to road kill?

    Unable to smile today,
    The rashly unpredictable Charles S.

  26. >This selection bias could also be compared to Google’s Pagerank calculation method.

    I don’t think anyone has ever said PageRank is a popularity metric that can branch across niches.

    You could have the largest and most popular knitting site on the Internet and have a PR of 4. You could also have the 172nd most popular funny videos site on the Internet and have a PageRank of 6.

    All it’s good for really then is comparisons within a niche. Or, to say it another way, it is a good thing to have more PR than all of your competitors, but you shouldn’t worry about how you stack up against sites not competing against you.

  27. Nick – I think the original Nick here, you gotta write a blog post with some numbers. 🙂

  28. > I do like Alexa’s data and I’m glad that they share it

    Two people lost on the ocean in a little boat:
    “I’m thirsty!”
    “We’ll wait for the rain.”
    “It hasn’t rained for days, I’ll drink from the ocean!”
    “That’s salt water, it will only make you more thirsty.”
    “It looks delicious, and surely some water is better than no water at all right?”

  29. Peter should just have read Aaron Wall’s essential reading, The SEO book, that explains,

    “Alexa is widely touted as a must use tool by many marketing gurus. The problems with Alexa are:

    * Alexa does not get much direct traffic and has a limited reach with its toolbar
    * a small change in site visitors can represent a huge change in Alexa rating
    * Alexa is biased toward webmaster traffic
    * many times new webmasters are only tracking themselves visiting their own site

    Why do many marketing hucksters heavily promote Alexa? Usually one of the following reasons:

    * ignorance
    * if you install the Alexa toolbar and then watch your own Alexa rating quickly rise as you surf your own site it is easy for me to tell you that you are learning quickly and seeing great results, thus it is easy to sell my customers results as being some of the best on the market
    * if many people who visit my site about marketing install the Alexa toolbar then my Alexa rating would go exceptionally high
    * the marketers may associate their own rise in success with their increasing Alexa ranking although it happens to be more of a coincidence than a direct correlation”

    Funny how often I have to share that with people – Thanks Aaron

  30. Matt,

    I would not consider Alexa to be the end all, be all of online marketing, however, if analyzed as compared to rivals it can be useful.

  31. Guys,

    On another note, what interest does Google have in the Alexa toolbar?

    This is my analysis.

    See, the people which sell advertising on their site use Alexa as a metric of measure. You go to Text Ads or whatever and they have all these sites and then they have the sites’ Alexa ranking to tell the potential buyer, that your ad is going to be placed on a site with xxxxx traffic ‘according to alexa’.

    The text ads company is allowed to do that. Alexa’s terms of service don’t prohibit it.

    Now imagine no Alexa. Then what does the advertising company go by? They can show their own stats, log files or whatever which is going to be cumbersome & even harder to get the potential buyer to believe them.

    Now I know Google launched its new checkout service to get a market share from established online processors like PayPal, 2checkout, etc.

    But for now, google itself is a big advertising company. They sell those adwords ads on searches and websites. And that makes a sizeable portion of their revenue (i think).

    Without alexa, in the long run it would more benefit google. The Text ads sellers would be without a metric to go by. Potential text ads clients could then revert to buying the adwords ads where they pay per click.

    I don’t know what the text ads industry is like or what kind of future it holds. But I see Google getting on its guard, so I’m assuming that it foresees it as a future threat.

    I’m going to keep my big mouth shut for a few months now before I start getting into trouble.

  32. So matt, are you saying the inaccuracy of Alexa is not worth sweating over?

    I notice that I’m getting some 30,000+ hits per day on one of my sites, but my alexa ranking is still 100,000+ which is yes, very inaccurate.

    Alexa rankings can easily be faked too…

  33. The metrics from Alexa always seem to be very flaky. Having worked on numerous websites and analysed the Alexa results against real factual traffic figures, including those of competitors, I’m always surprised at how inaccurate or accurate they can be. It all depends on how you look at them and in which sector. As pointed out, most SEO and techie folks use the Alexa toolbar, so it is missing out on a real chunk of demographic data.

  34. Dave (Original)

    Only one thing worse than consistently inaccurate stats and that’s inconsistent inaccurate stats.

    Nobody has a large enough amount of sample data to provide meaningful statistics on Web site comparisons.

  35. Alexa provides traffic estimates from visitors’ clicks, and that gives no particular idea about unique visits.

  36. There are rumors out there, that Alexa will come out with a javascript, that publishers can add to their site. Otherwise use Quantcast, Spyfu, Compete.

  37. Charles re: Verio Server problems – thanks for the Verio update and I hope Matt answers your question. I was pulling out my hair assuming I’d screwed something up so I didn’t call them.

    You gotta come to Matt’s place to find out wazzup with the dang interwebs!

  38. Google Analytics would be more accurate if everyone turned on the IP filter, but the problem is that Yahoo and Microsoft will never show up in the ranking.

  39. Quantcast really has something going and is taking hold with many sites. They offer a very easy way to tag your own site with a Quantcast tag, allowing you to publish accurate, clear, unbiased metrics for your website.

    Many sites are starting to tag with Quantcast – including powerhouses such as facebook.com.

    The only concern with Quantcast is that I’ve heard the ‘estimated’ traffic (if you don’t tag your site) is underestimated. For my site, skireport.com, the number of daily visitors nearly doubled according to them and jumped from 35k to 15k in terms of most trafficked sites.

  40. My traffic rank has moved up 600k places in the last 2 months on Alexa, but my traffic has increased only slightly. I give NO reliability to Alexa numbers.

  41. > but google analytics is not much better – it never matches with my raw log files, why is that?

    This is because Google Analytics is JavaScript based, which means it can only count page views/visits by real web browsers that have JavaScript enabled (e.g. not crawlers, bots, site scrapers). Your server logs count all page views/visits. Since many automated processes spoof real browsers, and some users have JavaScript disabled there should always be a significant difference between GA and your server log reports. There is nothing that can be done about this except to realize that GA may be a more realistic view of your true traffic, particularly your traffic that is probably able to see your ads (e.g. AdSense).

  42. Hi Matt:

    http://www.sonicko.com/

    check this url ………. amazing fan of google : )

  43. I can’t wait for Google to buy Alexa, A9 and Amazon all together so that the quality of service gets perfected. Any chance of seeing this happening any time soon? A9 and Alexa may not carry value, but Amazon is a great advertising machine, especially with the new Clickriver service they launched.

  44. I did use Alexa rank for the recent Inferno blog hunt, Indian Business School Blog hunt. It was OK; the better rank certainly reflected the better site. In the long run, it gets stable.

  45. Would you risk your life on Alexa? No
    Is it one of the tools available? Yes

    While it is not accurate for a tech website, it is probably more accurate for a gardening site.

    After all it is just a sample. Samples are not accurate, but are better than nothing.

  46. I read before that the Alexa Toolbar was used more predominantly in Asia and not so much in the US or Europe.

    I’m not sure the rank in Alexa holds water beyond the Top 100 mark….

    After all, Google isn’t #1 there (at the moment! : )

    Cheers!

  47. Alexa definitely gives different results, and truly you can’t trust any online tool, because the accuracy of web traffic results can’t be measured through the online tools available.

    But at least you can get approximate results for your competitors from tools that provide statistics on web site traffic, as well as lists of related links, to some extent.

    Archna Sajwan

    http://www.ecommind.com

  48. Hi,

    I think that Alexa is an excellent internet marketing tool. The problem is that some people use tricks to increase their ranking; the consequence is that not too many people will trust Alexa in the future if these people continue using those tricks. There might be a way to stop that!

  49. Yes, I agree..

    But, according to the whispers and for the other alexa toolbar users, we use alexa toolbar..

    We think Alexa ranks are part of Google’s algorithms..

  50. I also notice that sites such as digg and some I have never even heard of (france, north africa) have very high scores, so it makes me wonder how accurate the measure is. Like everything else, I suppose it is at least a vague rule of thumb when comparing relativity of sites.(should not be way, way off)

  51. I would like to point out that ANYTHING you do on the internet while you have the Alexa toolbar installed will be recorded and made available to archive.org. This includes your search queries and many other things you type in (sometimes it even includes password tokens). Alexa is no good: it provides no meaningful information and, worse, it takes your information and puts it in the public domain.

  52. I would not consider Alexa to be the end all, be all of online marketing. Yes, being an online tool it gives results, and it can be used for comparative analysis, but with so many tools that move Alexa rankings, it is not a safe bet to rely completely on it.

  53. Hmmm is t alexa important to us????

  54. I am rather confused by estimates from all these services like
    Alexa,
    Quantcast etc
    Is there a valid methodology behind these estimates ? or are they just estimates ?
    Even more confused about Alexa being more popular in Asia than in the US; I would have thought the other way around.
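
Comment 3 above asks whether toolbar users could be counted from the user-agents in a site's own logs. Here is a minimal sketch of that idea, not a tested tool. It assumes the toolbar adds the marker text "Alexa Toolbar" to the browser's user-agent string and that the logs use the Apache "combined" format; the sample log lines and the names `SAMPLE_LOG` and `toolbar_share` are hypothetical, made up for the example.

```python
import re

# Hypothetical sample in Apache "combined" log format (user-agent is the last quoted field).
SAMPLE_LOG = '''\
192.0.2.1 - - [01/Feb/2007:10:00:00 -0800] "GET / HTTP/1.1" 200 5120 "-" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; Alexa Toolbar)"
192.0.2.2 - - [01/Feb/2007:10:00:05 -0800] "GET / HTTP/1.1" 200 5120 "-" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)"
192.0.2.3 - - [01/Feb/2007:10:00:09 -0800] "GET / HTTP/1.1" 200 5120 "-" "Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1.1) Gecko/20061204 Firefox/2.0.0.1"
192.0.2.4 - - [01/Feb/2007:10:00:12 -0800] "GET / HTTP/1.1" 200 5120 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
'''

# Matches the final double-quoted field on a line, which is the user-agent in this format.
USER_AGENT_FIELD = re.compile(r'"([^"]*)"\s*$')


def toolbar_share(log_text, marker="Alexa Toolbar"):
    """Return (requests_with_marker, total_requests) for the given log text.

    The marker string is an assumption; check it against your own logs first.
    """
    with_marker = 0
    total = 0
    for line in log_text.splitlines():
        match = USER_AGENT_FIELD.search(line)
        if match is None:
            continue  # skip lines that do not look like combined-format entries
        total += 1
        if marker in match.group(1):
            with_marker += 1
    return with_marker, total


hits, total = toolbar_share(SAMPLE_LOG)
if total:
    print(f"{hits} of {total} requests ({hits / total:.0%}) carried the toolbar marker")
```

Comparing that share across two sites would give a rough, log-based view of how differently their audiences adopt the toolbar. It counts requests rather than people, and it includes crawlers unless you filter them out.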
