Archives for January 2009

Four Things You Need To Know About Knol

Recently Google mentioned that 100,000 different articles have been posted to Google Knol. I’ve been meaning to talk about Google Knol for a while, because there are a few things you need to know. It seemed especially relevant after I saw the Silicon Alley Insider article about Knol on Techmeme, so I figured that I would weigh in.

Google Knol does not receive any sort of boost or advantage in Google’s rankings. When Knol launched, some people asked questions about this. I dutifully trundled around the web and said that Knol would not receive any special benefits in our scoring/ranking for search. With the benefit of six months’ worth of hindsight, I hope everyone can agree that Knol doesn’t get some special boost or advantage in Google’s rankings.

In my opinion, Knol is doing just fine. It’s weird that in just a few months, the conventional wisdom can change from “Google will give Knol unfair boosts in ranking; it will dominate the space!” to “Oh, Knol gets so little traffic that it’s not a success.” The rapid change in perception gives me a little bit of philosophical whiplash. 🙂 The fact is that neither of these perceptions is true. Mashable made the point that “it took Wikipedia almost two years to reach a similar number of pages.”

The Knol team is not standing still. One of the ways I’ve learned to estimate whether a team will be successful is to look at how high-impact their project is, but also at 1) how quickly they can iterate and 2) how they react to feedback. I consider the Google Chrome team very successful, for example. They roll out a new version of Chrome about once a week, and I see them pay attention and prioritize based on feedback. In the same way, you probably haven’t noticed it, but the Knol team has been steadily delivering new releases over the last six months. Knol has more polish and more features, and the team has listened to the outside world when planning what to work on next.

My personal conception of Knol is that when you want to write a quick article or put some information on the web, Knol is a great place to do it. If you already have a blog, you could always stuff the info on your blog. But a ton of people occasionally want to post some info but don’t have or want a blog. Imagine if you’ve searched the web for some piece of info and didn’t find exactly what you wanted (maybe there isn’t any good content about using red widget A with blue operating system B). By the time you’ve finished searching, you might be an expert about that micro-niche. That’s a perfect time to document what you’ve learned, and if you want an easy place to store that info, Knol can serve that need.

Detecting Googlebombs

I recently did a Googlebomb post over on the Google Public Policy Blog. I’ve talked about the Googlebomb phenomenon before (also see more Googlebomb background here). Just as a reminder, a Googlebomb is a prank where a group of people on the web try to push someone else’s site to rank for a query that it didn’t intend to (and normally wouldn’t want to) rank for. Typically these queries tend to be unusual phrases such as “talentless hack” that don’t really have any existing strong results.

Danny Sullivan asked a good question in this most recent round of coverage about Googlebombs:

Obama no longer ranks for “failure” on Google. The White House hasn’t changed anything. The link data that Google has been using to rank the Bush page — data inherited by Obama’s page — hasn’t changed. So the Googlebomb fix for this that hasn’t worked since earlier this month just happens to kick in a few hours after I post this article? That’s going to kick off another round of questioning over how “automated” that fix really is…

I wanted to address that question. The short answer is that we do two different things, both of them algorithmic, to handle Googlebombs: detect Googlebombs and then mitigate their impact. The second algorithm (mitigating the impact of Googlebombs) is always running in our production systems. The first algorithm (detecting Googlebombs) has to process our entire web index, so we typically don’t run it every time we crawl new web data. I think that during 2008 we re-ran the Googlebomb detection algorithm 5-6 times, for example. You can think of it like this:

[Figure: Googlebomb or linkbomb pipeline]

The defusing algorithm is running all the time, but the algorithm to detect Googlebombs is only run occasionally. We re-ran our algorithm last week and it detected both the [failure] and the [cheerful achievement] Googlebombs, so our system now minimizes the impact of those Googlebombs. Instead of a whitehouse.gov URL, you now see discussion and commentary about those queries.
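
As a rough illustration only, the two-stage split described above (an occasional batch detection pass plus an always-on mitigation step at query time) might be sketched like this. Everything here is hypothetical: the names `Link`, `detect_bombs`, and `rank`, the `min_links` threshold, and especially the detection heuristic (many links using a phrase that never appears on the target page) are assumptions made for the example, not a description of Google's actual algorithms.

```python
from dataclasses import dataclass


@dataclass
class Link:
    anchor_text: str  # the visible text of the link
    target_url: str   # the page the link points to


def detect_bombs(links, page_text, min_links=3):
    """Occasional batch pass over the whole link graph.

    Hypothetical heuristic (an assumption for this sketch): flag a
    (phrase, url) pair when at least `min_links` links use the phrase
    as anchor text but the phrase never appears on the target page.
    """
    counts = {}
    for link in links:
        key = (link.anchor_text.lower(), link.target_url)
        counts[key] = counts.get(key, 0) + 1

    flagged = set()
    for (phrase, url), n in counts.items():
        if n >= min_links and phrase not in page_text.get(url, "").lower():
            flagged.add((phrase, url))
    return flagged


def rank(query, scored_results, flagged):
    """Always-on step: zero out the score of any flagged (query, url) pair."""
    q = query.lower()
    adjusted = [
        (url, 0.0 if (q, url) in flagged else score)
        for url, score in scored_results
    ]
    return sorted(adjusted, key=lambda pair: pair[1], reverse=True)


# Made-up data: three prank links to one page, one organic link to another.
links = [Link("failure", "example.gov/bio")] * 3 + [
    Link("failure", "example.org/essay")
]
page_text = {
    "example.gov/bio": "Biography of the president",
    "example.org/essay": "An essay on failure",
}

flagged = detect_bombs(links, page_text)  # run rarely
results = rank(                           # run on every query
    "failure",
    [("example.gov/bio", 0.9), ("example.org/essay", 0.4)],
    flagged,
)
```

The point of the split is that `rank` only consults the most recent `flagged` set. A newly formed Googlebomb therefore keeps ranking until the next detection run refreshes that set, which would match the delay described above.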
