Archive for May, 2006

Quick comment on nofollow

The rel="nofollow" attribute is an easy way for a website to tell search engines that the website can’t or doesn’t want to vouch for a link. The best-known use for nofollow is fighting blog comment spam, but the mechanism is completely general. Nofollow is recommended anywhere that links can’t be vouched for. If your log analysis program shows referrers as hyperlinks, I’d recommend using nofollow on those links. If you have a wiki that anyone on the web can edit, I’d recommend nofollow on its links until you can find a way to trust them. In general, if you have an application that allows others to add links, web spammers will eventually find your pages and start annoying you.
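To make the referrer case concrete, here is a minimal sketch in Python of how a log analysis page might print a referrer as a link with nofollow on it. The function name referrer_link is made up for illustration; it isn't taken from any real stats package.

```python
from html import escape


def referrer_link(referrer_url: str) -> str:
    """Render a logged referrer URL as a hyperlink marked rel="nofollow".

    Hypothetical helper: the referrer string comes from whoever visited
    the site, so it is HTML-escaped and the link is not vouched for.
    """
    safe = escape(referrer_url, quote=True)
    return f'<a href="{safe}" rel="nofollow">{safe}</a>'


print(referrer_link("http://example.com/?a=1&b=2"))
```

The same idea applies to any page that echoes back URLs it didn't choose itself.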

Let me give an example to illustrate. There’s a domain that runs an Oompa Loompa dating service. Oompa Loompas are the small folks from the original Charlie and the Chocolate Factory. I think that the dating service is just a gag; it’s a fun way that people can play around and pretend to be Oompa Loompas. It used to have real people leaving messages for each other. But it also lets you add a link to a webpage, so this fun service has been inundated with people trying to get links. In the picture below, notice that every comment is pretty meaningless: “Good content and very informativity! Thanks!” and “Your website has been very helpfull to me!!”. And if you mouse over the little home page icon, you see why; I’ve highlighted one below:

(Image: Oompa Loompa Dating!)

The fact that webspammers will find and attack a one-off application is very telling. It shows that if you run a site that lets anyone add a regular link, webspammers will eventually find your site and spam it as well.

I’d be the first to say that nofollow isn’t perfect. For example, plenty of people will set their bots loose, and those bots will spam for links without checking if a particular page has nofollow. But the people that write the bots also aren’t dumb. If it doesn’t add any benefit to spam a particular software package, a smart spammer will avoid wasting the time/effort on that software.

If you run a well-known website or software package, webspam is more of an issue for you. Someone recently pointed me to this wikipedia thread, where someone asked if Google was in favor of enabling nofollow on wikis, so I wanted to give a quick reply: I do think it’s a good idea. For example, I’ve talked to a couple SEOs recently who said that they have a full-time person on their staff dedicated to scamming links from Wikipedia and wikis.

In an ideal world, nofollow would only be for untrusted links. Let’s take the example of a forum that wants to avoid linking to spam, but the same advice applies to wikis or any other web software. If an off-domain link is made by an anonymous or unauthenticated user, I’d use nofollow on that link. Once a user has done a certain number of posts/edits, or has been around for long enough to build up trust, then those nofollows could be removed and the links could be trusted. Anytime you have a user that you’d trust, there’s no need to use nofollow links.
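Here is a rough Python sketch of that policy for a forum. The thresholds, the User fields, and the function names are all assumptions made up for illustration; the right numbers depend on your own site.

```python
from dataclasses import dataclass
from html import escape
from urllib.parse import urlparse

# Hypothetical thresholds: the post only says "a certain number of
# posts/edits" or "around for long enough", so these values are guesses.
MIN_POSTS = 50
MIN_ACCOUNT_AGE_DAYS = 90


@dataclass
class User:
    authenticated: bool
    post_count: int = 0
    account_age_days: int = 0


def is_trusted(user: User) -> bool:
    """A user is trusted once logged in and past either threshold."""
    if not user.authenticated:
        return False
    return (user.post_count >= MIN_POSTS
            or user.account_age_days >= MIN_ACCOUNT_AGE_DAYS)


def is_off_domain(url: str, site_host: str) -> bool:
    """True when the URL names a host other than our own site."""
    host = urlparse(url).hostname  # None for relative URLs like "/faq"
    return host is not None and host.lower() != site_host.lower()


def render_link(url: str, text: str, user: User, site_host: str) -> str:
    """Add rel="nofollow" only to off-domain links from untrusted users."""
    rel = ""
    if is_off_domain(url, site_host) and not is_trusted(user):
        rel = ' rel="nofollow"'
    return f'<a href="{escape(url, quote=True)}"{rel}>{escape(text)}</a>'


anonymous = User(authenticated=False)
veteran = User(authenticated=True, post_count=200, account_age_days=400)
print(render_link("http://spam.example/", "great site", anonymous, "forum.example"))
print(render_link("http://spam.example/", "great site", veteran, "forum.example"))
```

The first link comes out with nofollow and the second without it. A wiki could use the same check with edit counts in place of post counts.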


Heads up: Taking a vacation soon

I’m glad to see that Jeremy is taking a couple weeks off because I’m about to take a vacation too. Starting on May 22nd, I’m taking off until June 30th.

Back in grad school, it was easy to decide on a whim: “Okay 10 of us are going to fly into Utah, then do a road trip down to L.A. for SIGGRAPH! Bryce Canyon, here we come!” And I love working at Google, but getting the right balance between work and life is tricky. Last year, I resolved to start working eight-hour days, and I’ve been really glad for the extra time with my wife. So this year, I made a couple resolutions: first, to make Friday evenings a “date night.” We don’t have to go out on a date, but we do put away the computers, and that’s been great. The other resolution was to take a long vacation. :)

When I get back, I’m going to take my Gmail inbox and start fresh by completely archiving it. So if you need to email something important, you’ll have to contact me again when I get back. I don’t know how much I’ll write on the blog, so if postings are scarce, you’ll know why. :)


SiteAdvisor Study

Kevin Delaney over at the Wall Street Journal has a short article about a recent SiteAdvisor study of potentially malicious web sites. According to the article, SiteAdvisor claims that about 2% of regular web sites may expose surfers to “risks or nuisances,” while the number for search results is about 3% and the number for search engine ads is higher.

I could pick nits about this study (for example, SiteAdvisor’s definition of a bad site includes asking for email addresses), and it’s quite rare for me to hear complaints about badware on Google. But I do hate scuzzy behavior, especially in our search results.

In fact, several weeks before I found out about this study, we added a new provision to Google’s webmaster quality guidelines because of the WMF vulnerability:

Don’t create pages that install viruses, trojans, or other badware.

I’ve said that before, but it’s nice to make it official. Just as an aside, I’m surprised no one noticed this addition. I thought SEOs watched our quality guidelines with eagle eyes? Gary Price, I miss your uncanny ability to notice changes on a website. :)

Google’s statement to the WSJ made it clear that we don’t want junk in our ads, either:

Google Inc. said in a prepared statement that it prohibits ads that promote spyware, viruses and other malicious software and removes them when it becomes aware of them.

That’s a fine response. But given how much I hate web pages that install malicious software or abuse browser security holes, I’d like it if we did even more to protect our users.


Better conversations

For months, a post by Tom Hespos and a related post have rolled around in my head. In my mind, having someone doing engineering communication (reading blogs, participating in forums, answering questions at conferences or online) has helped Google to get good feedback, and hopefully I’ve been able to help people in return.

But I can’t always keep up with everything going on outside Google, especially when I’ve got a traditional set of duties in the quality group and webspam. In normal Google tradition, when you have too much work for one CPU to do, you shard it across multiple computers. So I asked my manager if I could shard myself, and my manager said yes. After that, I kept my eyes open for people that I’d seen around the web and respected. I wanted someone with knowledge of search and who was intelligent, nice, patient, and well-spoken. When I saw that Adam was available, I invited him down to lunch, and I liked him a lot.

So a couple of months ago, Adam went through the interview process and joined Google. We look a little bit similar, but we have different backgrounds. I’m a computer science guy; he’s got an MBA and a law degree. I know a ton about search, but not as much about AdWords these days. Adam had more experience with AdWords when he joined Google, but in the last couple months he’s been ramping up fast on websearch.

My hope is that Adam will help Google listen more, and will also grow into answering webmaster questions. We sent him to WMW Boston to soak up what Pubcons are like, and he’ll be at SES London as well. Sure, it will take time for him to ramp up, but he’s already been helping me a lot. For example, he’s been going through the WMW Boston emails and replying to them. He’s been handling outside email to me that I wouldn’t get a chance to respond to. Over the next couple months, we’re also going to throw Adam into our intensive training to become a webspam warrior. Plus he’s got my RSS feed list, and we’ve been practicing the fine art of spotting good bug reports, and deciding who at Google to harass about them.

Please welcome Adam, and be gentle. :) He’s ramping up quickly, and I’m excited that he’s at Google.


Fun with Trends

Steve Rubel is having a blast, and Philipp shows some fun examples and his readers find more.

Personally, I enjoy doing things like [lake tahoe]:

(Image: Lake Tahoe)

Sure, you see a spike in the wintertime (in purple), but check out the increase in searches in the summer (in green). I guess people want to go up and bike and relax during the summer too. And it’s fun to see the unexpected periodicity of some queries like [ncaa] (March Madness rules), [prom], [jewelry] (Christmas), or [burning man, sundance, sxsw].

What are your favorite graphs or insights so far?

