Archives for March 2009

Why Google won’t remove that page you don’t like

(This is just a post from my personal perspective, but I hope it’s helpful.)

Every few weeks or so, someone contacts me and says “Hey Matt, there’s a page out on the web about me that I really don’t like. Is there any way to remove it from Google’s index?” People don’t usually say it like that. More likely, they say “There’s this person making crazy claims about me on the web, and the stuff they say is just off-the-wall. Can Google remove this crazy person’s page?” Or “Everybody knows that this crazy person is posting lies and twisting people’s words. Is there anything you can do about it?”

I want to kill a webpage!

I’ve responded to this so many times that I thought I’d write up a complete response. Now when people ask me some form of this question, I can just point them to this blog post. So here’s the sort of reply that I would normally send back:

Unfortunately there’s not much I can do. The page you pointed out is not spam, and pretty much the only removals we do for legal reasons (at least in the U.S., which is what I know about) are the ones a court orders us to make. We typically say that if person A doesn’t like webpage B, removing page B from Google’s search results alone doesn’t do any good, because webpage B is still there (e.g. it can still be found by going to it directly or through other search engines). In that sense, the presence of that page in Google’s index just reflects the fact that the page exists on the wider web.

From our perspective, you have a couple of good options. Either contact whoever put up webpage B and convince them to modify the page or take it down, or, if the page is doing something against the law, get a court to agree with you and force webpage B to be removed or changed. We really don’t want to take sides in a he-said/she-said dispute, which is why we typically say “Get the page fixed, changed, or removed on the web, and then Google will update our index with those changes the next time we crawl that page.” Our policies outside the U.S. might be different; I’m not as familiar with how legal stuff works outside the U.S.

There you have it. People usually aren’t happy to hear that reply, but I hope they can understand the reasoning behind it. If you were creating your own search engine, I also hope that you’d come to pretty much the same conclusion. The official documentation page on how to remove a page from Google’s search results says essentially the same thing, but I wanted to give a little more context.

Book Review: Anathem

I have a sneaking suspicion that hanging out on Twitter is causing my attention span to grow shorter and shorter and … wait, what was I talking about? Oh, short attention span, right.

So as penance for all that microblogging, I decided to set myself a thick book to read. I chose Neal Stephenson’s Anathem. I’ve read all of Stephenson’s early work, but the Baroque Cycle didn’t grab me. Then I saw that fellow Googler Riona MacNamara had downloaded it to her Kindle, so I decided to take a whack at it.

I ended up liking Anathem a lot, but it’s not for everyone. Here’s a simple test to help:

If you like to read: add 1 point.

If you like to read science-fiction or have read Einstein’s Dreams or you’ve heard of Penrose tiles: add 2 points.

If you like other Neal Stephenson books or Isaac Asimov’s “Foundation” series: add 3 points.

If you liked A Canticle for Leibowitz: add 4 points.

If you were a math, computer science, or philosophy major: add 5 points.

If you have ever considered becoming a monk: add 6 points.

If you have read any Socratic dialogues or any Thucydides or you made it through Light: add 7 points.

Add up all the point values and if you tally over 10 points or so, you’d probably enjoy this book.

At 937 pages, Anathem is a hefty read. For the first six pages, I was kind of annoyed because Stephenson seemed to be making up new words like “Saunt” for “Saint” or “upsight” for “insight.” But after a few hundred pages you realize the reason for that, and it’s a good one. Plus anyone who can slip words like “sere” and “tarn” into the story smoothly clearly knows what they’re doing with language.

Overall, I really enjoyed it. There were heart-pounding action scenes interspersed with some very approachable philosophical discussions, a sprinkling of actual physics, and some extrapolation of technology into the future. I also love that Stephenson has invented a whole world, even a whole cosmology. The scope of the book is pretty breathtaking, and Stephenson takes the hero of the story on a much bigger journey than you would expect.

I do hope Stephenson keeps building in this world. It will take you several days of serious reading, but assuming you meet the criteria above, I think you’ll enjoy the book. Especially if you were able to make it to the end of this review without checking back on Twitter or Facebook.

Paid posts should not affect search engines

Normally I wouldn’t weigh in on “sponsored conversations,” because I’ve talked about similar subjects before, but it’s worth reiterating Google’s position on paid posts that pass PageRank and why we feel that way. Here’s the short version as a comment that I left on Jeremiah Owyang’s blog:

Clear disclosure of sponsorship is critical, and that includes disclosure for search engines. If a link in a paid post would affect search engines, that link should not pass PageRank (e.g. by using the nofollow attribute). Google and other search engines do take action, which can include, for example, demoting sites that sell links that pass PageRank.
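As an illustration of the nofollow approach, here is a minimal sketch, in Python, of how a blog platform might mark every link in a sponsored post so it doesn't pass PageRank. The names `add_nofollow` and `NofollowRewriter` are hypothetical helpers, not a Google tool or API. The sketch assumes a simple post body without script or style blocks and uses only the standard library's `html.parser`.

```python
from html import escape
from html.parser import HTMLParser


class NofollowRewriter(HTMLParser):
    """Hypothetical helper: re-emits HTML, adding rel="nofollow" to every <a href>.

    Assumes a simple post body (no <script>/<style> blocks), since text
    content is re-escaped on output.
    """

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []

    @staticmethod
    def _format_attrs(attrs):
        # Rebuild the attribute string, re-escaping values that the parser unescaped.
        parts = []
        for name, value in attrs:
            if value is None:
                parts.append(f" {name}")
            else:
                parts.append(f' {name}="{escape(value, quote=True)}"')
        return "".join(parts)

    @staticmethod
    def _patch(tag, attrs):
        # Only real links (anchors with an href) need the nofollow token.
        if tag != "a" or not any(name == "href" for name, _ in attrs):
            return attrs
        rel_values = [value for name, value in attrs if name == "rel"]
        tokens = (rel_values[0] or "").split() if rel_values else []
        if "nofollow" not in tokens:
            tokens.append("nofollow")  # keep any existing rel tokens
        others = [(name, value) for name, value in attrs if name != "rel"]
        return others + [("rel", " ".join(tokens))]

    def handle_starttag(self, tag, attrs):
        self.out.append(f"<{tag}{self._format_attrs(self._patch(tag, attrs))}>")

    def handle_startendtag(self, tag, attrs):
        self.out.append(f"<{tag}{self._format_attrs(self._patch(tag, attrs))} />")

    def handle_endtag(self, tag):
        self.out.append(f"</{tag}>")

    def handle_data(self, data):
        self.out.append(escape(data, quote=False))

    def handle_comment(self, data):
        self.out.append(f"<!--{data}-->")


def add_nofollow(html_text: str) -> str:
    """Return html_text with rel="nofollow" added to every link."""
    parser = NofollowRewriter()
    parser.feed(html_text)
    parser.close()
    return "".join(parser.out)


if __name__ == "__main__":
    post = '<p>Try <a href="https://example.com/product">this product</a>!</p>'
    print(add_nofollow(post))
```

Running it prints `<p>Try <a href="https://example.com/product" rel="nofollow">this product</a>!</p>`. Existing `rel` values are kept and the nofollow token is appended, so running the rewriter twice is harmless. A real blog platform would more likely apply this when the post is published than on every page view.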

My bottom-line recommendation is simple: paid posts should not pass PageRank. I’m not going to pay $750 to check whether the Forrester report mentions this important point. But I will mention something that the Forrester report probably missed, and I’ll do it for free. 🙂 The Forrester report discusses a recent “sponsored conversation” from Kmart, but I doubt it mentions that even in that small test, Google found multiple bloggers that violated our quality guidelines, and we took corresponding action. Those blogs are no longer trusted in Google’s algorithms.

We do take the subject of paid posts seriously and take action on them. In fact, we recently finished going through hundreds of “empty review” reports — thank you for that feedback! That means that now is a great time to send us reports of link buyers or sellers that violate our guidelines. We use that information to improve our algorithms, but we also look through that feedback manually to find and follow leads.

I wanted to talk for just a minute about *why* we dislike paid posts that pass PageRank. Let me go back to an example I’ve given before about how they can be bad. I believe these were paid posts:

Brain Cancer paid post

The paid post at the top happens to be about brain tumors, which is a really serious subject. If you are searching for information about brain cancer or radiosurgery, you probably don’t want a company buying links in an attempt to show up higher in search engines. Other paid posts might not be as starkly life-or-death, but they can still pollute the ecology of the web.

Marshall Kirkpatrick makes a similar point over at ReadWriteWeb. His argument is as simple as it is short: “Blogging is a beautiful thing. The prospect of this young media being overrun with ‘pay for play’ pseudo-shilling is not an attractive one to us.” I really can’t think of a better way to say it, so I’ll stop there.

Top 5 signs you are anal-retentive

  1. You keep large, redundant supplies of all your sundries, such as laundry detergent, so that you never risk running out.
  2. You don’t just sort the money in your wallet by $1, $5, $10, or $20, but also sort the bills by wear-and-tear so that you get rid of the bills in the worst shape first.
  3. You look up anal-retentive to see whether it needs a hyphen.
  4. You don’t just keep a grocery list; you micro-optimize the order of the items on it so that you only make one pass through the grocery store.
  5. After a power outage or when Daylight Saving Time starts or ends, you feel the need to set all your clocks to the same minute and second.
  6. It really irritates you when someone says a list has 5 items and you count six.

How about you? What do you do that might be a tad anal-retentive?
