
Google Trends for Websites

If you’re a site owner, webmaster, SEO, or otherwise have an interest in website metrics, I think you’re going to like Google Trends for Websites. It’s almost as addictive to me as Google Maps is for, you know, normal people. 🙂 You’re probably familiar with regular Google Trends, which lets you see trends in how people search for different phrases such as full moon or skiing vs. swimming. Here’s the graph of how often people search for “full moon,” for example:

Trends for full moon

Those spikes correspond nicely to when full moons actually happen. Now you can do similar fun things with sites. Here’s a simple example with my site, mattcutts.com:

Trends for Websites for mattcutts.com

As you can see, Trends will try to estimate the number of visitors to my site over time. (The product is free, but you have to sign in if you want to get estimated numbers — otherwise you see the graph but not the number estimates.) There’s other good info too, though. See the graph below, for example:

Trends for Websites for mattcutts.com

This is saying that people who visited mattcutts.com also visited searchmarketingexpo.com and sphinn.com. You can “surf” related sites just by clicking around. You can also see what else people searched for. And you can even enter in two sites, separated by commas, to compare the estimated number of visitors between the sites:

Trends for Websites for mattcutts.com

It’s a lot of fun, especially if website metrics are your cup of tea. You can read the official blog post, and Barry Schwartz did a write-up as well. The comments point out that some sites might not have much (or any) data. I think that’s mainly because there’s a minimum threshold of traffic before Trends is willing to show statistics for a site; bear in mind that this is a launch on Google Labs. But you can still do some fun analysis with Trends for Websites, and it’s free, so give it a try.

Bay Area Blawgers

Tonight I went to a meet-up of Bay Area Blawgers (a blawger is a law blogger). Why did I go to this, when I normally don’t do blogger meet-up kinda stuff and don’t know much about law? Well, the get-together was just a little down the road at Santa Clara University. And the shindig was coordinated by Eric Goldman. I’ve mentioned before that I enjoy reading Eric’s blog for coverage of web legal issues.

I came in just before things started and happened to luck into sitting by several neat people. On my right was Mike Masnick of Techdirt fame. If you don’t browse Techdirt from time to time — dude, you need to read fewer SEO blogs and broaden your horizons. 🙂 Mike and the writers at Techdirt provide an independent take on news items. Mike’s got a long memory (like Danny Sullivan, but with general news), so he does a good job of putting news items into perspective. In my experience, Techdirt does a deeper level of analysis than most sites, so when Techdirt rakes Google over the coals for something, I tend to give that critique more weight.

To Mike’s right was Kurt Opsahl of the Electronic Frontier Foundation. My advance planning for the meet-up consisted of wearing my EFF T-shirt, so all that hard planning paid off. Kurt polled the group on interesting questions about the DMCA (“How many of you have gotten a DMCA takedown notice?”). Afterwards, he talked about the info on this page where you can register as an online service provider with the U.S. Copyright Office. It’s a one-page form and an $80 fee. We also talked briefly about Google’s decision to anonymize our logs data after 18-24 months. I still hope to circle back around to that topic at some point (I’m a fan of the decision).

On my left was Colin Samuels. Colin is the general counsel for Accela, which makes government software. Colin told a good story about how he learned the ropes of white-hat SEO and built his reputation up enough to be the #1 Colin Samuels in the world, handily beating a Colin Samuels who skis. 🙂

Other tidbits:
– I didn’t realize that Sun’s general counsel is a blogger.
– We discussed whether it was better for a law blogger to mention legal cases that could be negative for a firm (it definitely bolsters your credibility as a blogger). We also talked about the pros and cons of anonymous blogging, and a little bit about online bullying.
– Chris Hoofnagle was there. I hadn’t seen Chris since the Computers, Freedom & Privacy Conference in Berkeley in 2004. Which reminds me: I want to hit some non-SEO conferences this year. Maybe Defcon or SIGGRAPH.
– One of the more entertaining people there, Kevin Underhill, runs a legal humor blog. That’s right, the law can be funny:

In a long-awaited and dramatic decision, the Supreme Court held today, unanimously, that in the context of the Guam Organic Act’s debt-limitation provision, 48 U.S.C. section 1423a, Guam’s debt limitation must be calculated according to the assessed valuation of property in Guam.

Like we didn’t all see that coming. In your face, Supreme Court of Guam!

I think a good time was had by all. Thanks for pulling so many blawgers together, Eric.

My search stats for 2006

[Note: I wrote this about five days ago, and I’m just now getting around to posting it.]

Okay, all the other search bloggers are sharing stats, so here goes. 🙂 All this comes courtesy of Google Analytics. If you want to sign up and analyze your website visitors with around zero work, I highly recommend it. And it’s free. 🙂

With a few days left in 2006, looks like about 1.7M visits and about 2.9M pageviews:

stats for 2006

That number of visits (~1.7M) helps explain why I try to avoid site-specific comments and try to stick to general topics. The spikes in my traffic graph came (I think) from talking about international webspam and from two or three posts that got dugg. The diggage happened during the week that my wife was out of town and I had a lot of free time to blog. I have a fair number of repeat visitors from all over the world. About a quarter of people come directly to my site and about a third reach it through Google. Those digg spikes look impressive, but digg only accounted for about 2% of my traffic throughout the year.

Top Requested Pages:

Let’s see which posts were the most popular this year.

Top posts

Looking down the list, lots of people check my blog page or the root page of my domain. Over 18% of my traffic is from non-English browsers (!), so it’s not a surprise that international spam is a popular topic. If you want only Google-related posts and no cat posts, http://www.mattcutts.com/blog/type/googleseo is the best url to use. That url shows up in the list above, so it seems a fair number of people use the ability to view only the Google/SEO category. For feedreaders, http://www.mattcutts.com/blog/type/googleseo/feed/ is the best url to get a Google/SEO-only RSS feed of my blog.

Top keywords:

The vast majority of people find me through some variation of my name. Other than that, the top phrases were
seo
bigdaddy
seo blog
google update
reinclusion request
noodp
google analytics
blog
google blog
proxy

I could probably do some more in-depth keyword analysis to determine what keywords to use to reach new webmasters/SEOs/site owners. I do have a long tail of referrals. My post about setting a default printer for Linux and Firefox, for example, got referrals such as
firefox default printer
default printer firefox
changing the default printer for firefox on linux
linux default printer
changing the default printer firefox
default printer linux
cups set default printer
linux set default printer
mozilla default printer
cups default printer
firefox “default printer”

That serves as a good reminder that people usually don’t type the same phrase to find information. If you’re an SEO or site owner, don’t just chase after a “trophy phrase”; think about the long tail of queries, too. You should think of the words that people will type and make sure you include the right ones in your article in a natural way. Including the right/relevant words on the page in the first place is something that a lot of people forget. Read my post about writing useful articles with good SEO practices if you want to hear more.
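To make the long-tail point concrete, here is a minimal Python sketch that groups the printer-related referral phrases listed above by the words they contain. The stopword list and the `word_set` helper are illustrative assumptions, not part of any analytics tool; the queries are copied from the list above.

```python
from collections import Counter

# Referral phrases copied from the list above.
queries = [
    "firefox default printer",
    "default printer firefox",
    "changing the default printer for firefox on linux",
    "linux default printer",
    "changing the default printer firefox",
    "default printer linux",
    "cups set default printer",
    "linux set default printer",
    "mozilla default printer",
    "cups default printer",
    'firefox "default printer"',
]

# Hypothetical, deliberately tiny stopword list for this example only.
STOPWORDS = {"the", "for", "on"}


def word_set(query):
    """Lowercase the query, drop quote marks and stopwords, ignore word order."""
    words = query.lower().replace('"', "").split()
    return frozenset(w for w in words if w not in STOPWORDS)


# How many distinct word combinations do the phrases collapse into?
groups = Counter(word_set(q) for q in queries)

# How many phrases mention each individual word?
word_counts = Counter(w for q in queries for w in word_set(q))

print(len(queries), "phrases,", len(groups), "distinct word combinations")
for word, count in word_counts.most_common():
    print(f"{word}: {count}")
```

Even after ignoring word order and filler words, the phrases still fall into several different combinations, which is the reason to cover the natural variations ("linux", "cups", "mozilla", "set") in the article text rather than a single trophy phrase.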

Traffic Sources:

I write a techie-heavy blog that talks about Google issues, so I wouldn’t expect to see much (any?) traffic from other search engines. But the sheer number of different sites that sent traffic is pretty wild. I got more referral traffic from Bloglines (22636 visits) than from Yahoo (17591), which makes sense given that a ton of people skip the site and read my full-text feed. But other sources were surprising drivers of traffic. The Search Engine Watch blog drove more traffic (15916 visits) than MSN (13554). Ask sent 820 visits, which was a tie with Steve Bryant over at eWeek’s Google Watch site. And that in turn was a little more than Oilman (789 visits) and a little less than cre8asiteforums.com at 1253 visits.

What this says to me is that there’s a lot of traffic beyond search engines, and I’m not just talking about social media optimization such as submitting stories to Slashdot/TailRank/Reddit/Digg/SearchMob. Just getting out there, talking on the web, and getting your name known in an industry can make a big difference.

The Future:

Over the last 18 months or so, being a webmaster myself and writing a blog has taught me a lot. I understand more of the issues that site owners run into, and I sympathize with the frustrations of running a site. I think that using AdWords would also be an eye-opening and useful experience. I’m torn though, because I only have a limited amount of time in my day. If anything, I need to be spending less time blogging and more time with my family. I’ve also avoided AdSense, other types of advertising, and even “subscribe to my feed, tag this post, digg this page, share me on facebook” stuff because I wanted my site to be purely informational. But the net effect is that the blog is pretty austere (spartan? plain? ugly?).

Vanessa Fox and Adam Lasnik have done an amazing job this year on both the Google webmaster blog and the Google Discussion Group for webmasters. This year I posted on a wide range of SEO topics, and advanced topics are more fun to talk about, but going forward I think it would be a good idea to cover more intro-level material.

I think within Google there’s solid awareness that blogging can be hugely helpful to discuss issues informally, answer questions, and dispel misconceptions. I’d like to encourage even more Googlers to blog. The issue is how to build trust that a Googler can talk about issues with finesse. I’m lucky because as an old-timer Googler, I made my public mistakes back when hardly anyone was watching (remind me to tell my Paul Boutin/Wired story sometime). I think we need more Googlers blogging and checking the blogosphere for mentions of their respective products. I’m not 100% sure how to get there, but I think it needs to happen.

I also wish I had more stats on my feeds. I should probably sign up for something like FeedBurner, but I hate the idea of losing control of my feed urls.

[Added before posting: I signed up for FeedBurner last night; we’ll see how it works. I’m already curious about one thing. I’m using the officially-recommended Feedburner Feed Replacement plug-in for WordPress, which is supposed to send all feed requests over to FeedBurner. It seems to work fine for my main RSS and Atom feeds, but I notice that category feeds like the http://www.mattcutts.com/blog/type/googleseo/feed/ url I mentioned above don’t seem to be getting redirected to FeedBurner. Anyone know how to fix this? Also, let me know if you see any other feed-related issues/bugs.]

Update: In the comments, Sergey S. Kostyliov asked for geo-location stats. Here ya go, Sergey. When I said 18% of my traffic was non-English, I believe that was referring to the browser language. Where people are coming from shows even more diversity:

Geolocations for 2006

To make the graph more readable, I left out the percentage labels for the Netherlands (2.24%) and Spain (1.91%). Sergey, the Russian Federation is listed at 9955 visits. With 1,695,129 total visits for the year, that’s about 0.6% from the Russian Federation. Also, I got exactly one visit from Samoa. They must have decided they didn’t like me, and decided not to come back. 🙂
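For anyone who wants to check the Russian Federation share, the arithmetic is a one-liner. This is just a sketch using the two visit counts quoted above; the variable names are made up for the example.

```python
# Visit counts quoted in the paragraph above.
total_visits = 1_695_129
russia_visits = 9_955

# Share of all visits, as a percentage.
share = russia_visits / total_visits * 100
print(f"{share:.2f}%")  # 0.59%
```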

Update 2: Doh! I completely forgot the videos I made in 2006! I got 189,923 views of my videos during 2006. Even my 3 second video of my cat Ozzie jumping that I uploaded as a test video got 2,106 views. 🙂

Also, I got some initial data from FeedBurner: 11,950 subscribers and 5,149 reach (that’s people who clicked or viewed the content of my feed). Bloglines’ web interface has pegged me at 1,216 subscribers but Feedburner claims that I have 3,409 Bloglines readers. I’m not sure how to reconcile that. I was getting discouraged that Bloglines kept saying 1,100 to 1,200 subscribers for most of 2006 in its web interface, and I really felt like I had more than that. Anyone have guesses about what’s causing the Bloglines/Feedburner disparity? Gary Price, want to ask around?

How Google handles hacked sites

If you’ve never read my blog before, welcome. I’m the head of the webspam team at Google. And I have a blog for days just like this.

Okay, first off you should go read this post. It’s entitled “Me Against Google” and the author is unhappy that talkorigins.org was nowhere to be found in Google for the last 5-6 days. After that post, go read this Slashdot post, entitled “Google De-indexes Talk.Origins, Won’t Say Why.” By the time you’re done, your pulse should be pounding. Hell, you should be angry. Damn that evil Google for not communicating with webmasters!! Or as Wesley put it in his blog:

You might think that a company that prides itself upon advanced textual analysis and automated decision-making algorithms might provide helpful warning messages to webmasters concerning problems found in their sites. You would be wrong.

Okay, ready for my side of the story? Here’s the timeline of how things happened:
– talkorigins.org was hacked on November 18th. I know this because Wesley says so in his blog post.
– By November 27th, Google had detected spammy links and text on talkorigins.org. In case you’re wondering, here’s what the cracker added:


<script>document.write(String.fromCharCode(60,100,105,118,32,115,116,121,108,101,61,39,100,
105,115,112,108,97,121,58,110,111,110,101,39,62))</script><br><a href="http://vvu.edu.gh/images/?i=animal-porn">animal porn</a>, <a href="http://vvu.edu.gh/images/?i=animal-sex">animal sex</a>, <a href="http://vvu.edu.gh/images/?i=beastiality">beastiality</a>, <a href="http://vvu.edu.gh/images/?i=rape-sex">rape sex</a>, <a href="http://vvu.edu.gh/images/?i=sleeping-sex">sleeping sex</a>, <a href="http://deepx.com/images/?i=animal-porn">animal porn</a>, <a href="http://deepx.com/images/?i=beastiality">beastiality</a>, <a href="http://deepx.com/images/?i=dog-porn">dog porn</a>, <a href="http://deepx.com/images/?i=horse-porn">horse porn</a>, <a href="http://deepx.com/images/?i=rape-sex">rape sex</a>, <a href="http://deepx.com/images/?i=sleeping-sex">sleeping sex</a>, <a href="http://theoi.com/image/?i=animal-porn">animal porn</a>, <a href="http://theoi.com/image/?i=animal-sex">animal sex</a>, <a href="http://theoi.com/image/?i=beastiality">beastiality</a>, <a href="http://ugobe.com/media/?i=dvd-covers">dvd covers</a>, <a href="http://ugobe.com/media/?i=dvd-ripper">dvd ripper</a>, <a href="http://ugobe.com/media/?i=psp-downloads">psp downloads</a>, <a href="http://ugobe.com/media/?i=psp-games">psp games</a>, <a href="http://ugobe.com/media/?i=psp-movies">psp movies</a>

Not pretty stuff: lots of text about rape and animal porn. In case you’re wondering, that JavaScript at the beginning produces the string “<div style='display:none'>”, which makes the entire section of spammy junk hidden. So talkorigins.org has these porn words and spammy links, and it’s all hidden via sneaky JavaScript.
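If you want to verify what that script writes without running the injected JavaScript, you can decode the character codes yourself. Here is a small Python sketch; the numbers are copied from the script above, and `chr()` stands in for JavaScript’s `String.fromCharCode`.

```python
# Character codes copied from the injected <script> quoted above.
codes = [60, 100, 105, 118, 32, 115, 116, 121, 108, 101, 61, 39, 100,
         105, 115, 112, 108, 97, 121, 58, 110, 111, 110, 101, 39, 62]

# chr() maps each code point to its character, like String.fromCharCode.
decoded = "".join(chr(c) for c in codes)
print(decoded)  # <div style='display:none'>
```

The same trick works for checking any suspicious `String.fromCharCode(...)` call you find in your own pages: decode it offline rather than loading the page in a browser.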

We have pretty good reason to believe that this site was hacked, but it’s still causing problems for regular users, so Google has to take action. Here’s what we do:
– By November 27th, the site was classified as hacked and spammy. We stopped showing it for user queries.
– By November 27th, we started flagging this site as penalized in Google’s webmaster console. I believe that Google is the only search engine that will confirm to webmasters that their site does have penalties. No, we don’t confirm penalties if we think it might clue in web spammers that they’ve been caught. But yes, we do try to confirm penalties if we think a site is legitimate or has been hacked. You can read more about how we confirm penalties in this previous post.

I hear a few people ask, “It’s nice that I can sign up for Google’s webmaster console and learn that Google penalized my site. But couldn’t Google have done more?” Well, it turns out that we did do more:
– By November 28th, we emailed multiple addresses at talkorigins.org to let them know exactly what happened. According to the records I’m looking at, we tried to email contact at talkorigins.org, info at talkorigins.org, support at talkorigins.org, and webmaster at talkorigins.org with a timestamp of 2006-11-28 14:24:15. Here’s an excerpt from the email that we sent:

Dear site owner or webmaster of talkorigins.org,

While we were indexing your webpages, we detected that some of your
pages were using techniques that were outside our quality guidelines,
which can be found here: http://www.google.com/webmasters/guidelines.html
In order to preserve the quality of our search engine, we have
temporarily removed some webpages from our search results. Currently
pages from talkorigins.org are scheduled to be removed for at least 60 days.

Specifically, we detected the following practices on your webpages:

* The following hidden text on talkorigins.org:

e.g.
animal porn, animal sex, beastiality, rape sex, sleeping sex, animal porn, beastiality, dog porn, horse porn, rape sex, sleeping sex, animal porn, animal sex, beastiality, dvd covers, dvd ripper, psp downloads, psp games, psp movies

We would prefer to have your pages in Google’s index. If you wish to be
reincluded, please correct or remove all pages that are outside our
quality guidelines. When you are ready, please visit:

https://www.google.com/webmasters/sitemaps/reinclusion?hl=en

to learn more and request a reinclusion request.

You can read more about how we try to email webmasters about issues on their site in this previous post. According to his post, Wesley did a reinclusion request recently, and I’ve confirmed that the reinclusion request was approved, so I expect talkorigins.org to be back in Google within 24-48 hours.

But let’s take a step back. This site was hacked and stuffed with a bunch of hidden spammy porn words and links. Google detected the spam in less than 10 days; that’s faster than the site owner noticed it. We temporarily removed the site from our index so that users wouldn’t get the spammy porn back in response to queries. We made it possible for the webmaster to verify that their site was penalized. Then we emailed the site, with the exact page and the exact text that was causing problems. We provided a link to the correct place for the site owner to request reinclusion. We also set the penalty for a relatively short time (60 days), so that if the webmaster fixed the issue but didn’t contact Google, they would still be fine after a few weeks.
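The dates in the timeline above can be checked with a few lines of Python. This sketch assumes the 60-day clock starts on November 27th, the latest detection date given above; the post doesn’t say exactly when the penalty period begins, so treat the expiry date as illustrative only.

```python
from datetime import date, timedelta

# Dates taken from the timeline in this post.
hacked = date(2006, 11, 18)
detected = date(2006, 11, 27)  # "By November 27th", so this is an upper bound

# Days between the hack and the latest possible detection date.
days_to_detect = (detected - hacked).days
print(days_to_detect)  # 9

# Assumption: the 60-day penalty is counted from the detection date.
penalty = timedelta(days=60)
earliest_expiry = detected + penalty
print(earliest_expiry.isoformat())  # 2007-01-26
```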

Ultimately, each site owner is responsible for making sure that their site isn’t spammy. If you pick a bad search engine optimizer (SEO) and they make a ton of spammy doorway pages on your domain, Google still needs to take action. Hacked sites are no different: lots of spammy/hacked sites will try to install malware on users’ computers. If your site is hacked and turns spammy, Google may need to remove your site, but we will also try to alert you via our webmaster console and even by emailing you to let you know what happened. To the best of my knowledge, no other search engine confirms any penalties to sites, nor do they email site owners.

Wesley and anyone else who works on talkorigins.org, I’m sorry that this was a stressful experience for you. Could Google do a better job? Absolutely, and we’ll keep working on it. For example, maybe we can show a more specific message for hacked sites in the webmaster console. Google could also try to identify better email addresses when writing to site owners. For example, for talkorigins.org, there are email addresses such as “archive@” and “submissions@” that we could have used instead that might have reached the right person. I’m open to other suggestions too. But please give Google a little bit of credit, because I do think we’re doing more to alert webmasters to issues than any other search engine.

Note to new readers of my blog: I pre-moderate my comments, and it’s after 2 a.m. and I’m going to bed now. If your comment doesn’t show up immediately, it’s waiting for me to approve it after I wake up. 😉

Review: Compete

Earlier this week, I sat down and wrote for about 40 minutes about hacked sites, then promptly lost that post forever because my webhost’s database machine was pokey right then. My fault for running Firefox 1.5 on my laptop instead of 2.0. WordPress 2.1 will also have autosave built in. My breath is definitely bated for the WordPress autosave feature. I tried not to let my inner snark out, but it’s been a couple days, so I figure I’ll let out just a little bit of grump.

Battelle mentioned a new metrics company / search engine called Compete. Yahoo! provides most of the backend power. What added value does Compete provide? Well, the search results have little icons beside them for things like spyware/phishing, coupon codes, and how popular the site is. How accurate is the spyware/phishing data? Well, for my site, I get a warning exclamation point and this message:

Use caution in providing any personal information or downloading software on mattcutts.com.

I’m not a spyware/phishing site, and my site has been around for over a year. So with a sample size of one, how accurate is that data? Not very.

Okay, how about offers or other deals? Compete says that they’re the “first service to identify available deal codes as you enter a retail website.” I can honestly say that of all the people I’ve talked to, I don’t remember anyone asking “Why don’t you tell me in the search results whether a site offers coupon codes?” I tried a search like [coupon codes] and none of the results triggered as offering promotion codes. So I did the search [fat wallet], and fatwallet.org isn’t listed as having any deals or coupon codes. That doesn’t seem right.

Let’s talk about Compete Picks. The idea behind the feature is that for a given search, Compete looks at the results, and if there are a lot of post-click page views or other activity on a site, then that site might be a better match. So for some queries, you’ll see “Compete Picks” where Compete thinks the results are especially helpful.

Except: go back to the search [coupon codes]. The Compete Picks are video game cheat codes, which are completely off-topic. And for [fat wallet], the three Compete Picks that I got for that query were “Bbw plumpers, Bbw mature thumb galleries,” “Fat Chicks in Party Hats,” and “Fat Face”:

Compete's Picks for fat wallet

That’s just obviously wrong.

Am I being too harsh on Compete? I’m guessing so–after all, I’m getting some snarkiness out. But on their launch announcement they claim to be the “first social search solution leveraging community click-stream information to enhance search results.” Really? Because Danny Sullivan wrote about DirectHit in 1998, and I distinctly remember when DirectHit leveraged community click-stream information for HotBot back in the day.

So what’s left? Metrics. That whooshing sound you hear is all the SEOs going to install the Compete Toolbar. Compete claims 2 million users help it compute metrics, and I’d be very surprised if all those participants were toolbar users. Yep, digging through the FAQ, they mention “ISP relationships,” which presumably means that they’re buying user data from ISPs. ISP relationships can be a huge source of metrics bias. For example, some ISPs partner with Yahoo, and users on those ISPs are probably more likely to visit Yahoo. Other ISPs partner with Google. And savvy users that use smaller providers such as Covad or Speakeasy are likely not counted at all.

Because you don’t know which ISPs are selling user data to companies such as Compete or Hitwise, you don’t know what biases are baked into those companies’ metrics–and the metrics companies won’t tell you. Maybe I’m cynical about metrics lately, but Rand Fishkin looked into this recently. He got data from 25 blogs (including mine) and then compared that data with a bunch of metrics services (including Compete). His conclusion? “Based on the evidence we’ve gathered here, it’s safe to say that no external metric, traffic prediction service or ranking system available on the web today provides any accuracy when compared with real numbers. … The sad conclusion is that right now, no publicly available competitive analysis tool we’re aware of provides solid value.” Go read his post–it’s a good one.

I don’t think Compete is even the first Bill Gross search company to look at using click data. This review of Snap.com quotes a previous version of Snap’s about page, and it sounds pretty familiar:

“Instead of just relying on computer algorithms to rank search results, Snap also uses click-stream information from a network of one million Internet users. By recording and processing which Web sites users spend time on, and which sites they quickly leave, Snap improves the likelihood that the search results you get will be the results you’re really looking for.”

You have to ask what Compete brings to the table if Snap.com already tried using clickstream data. Okay, I’ll stop. It’s cool that Compete is trying new ways to inform their users with different icons. I like that they link to their blog from their main page. And it’s nice that they offer free metrics for sites. I just wish I had more confidence in Compete’s metrics, and I wish I knew which ISPs they get data from. To the folks at Compete, I’ll try to be in a more positive mood the next time we meet.
