Helping hacked sites

(I’m taking my wife somewhere really soon, so I’m just going to dash out a quick post.)

There was a Techmeme discussion this weekend about whether Microsoft should chase Google in search or find their own “Big Hairy Audacious Goal.” Into that discussion came a post by Ryan Stewart about being removed from Google’s index. It turns out that Ryan’s blog had been hacked, and Google does remove hacked sites from our index to protect our users. I left a comment at Ryan’s blog, but while I wait for it to be approved I thought that I’d post it here as well:

Hi Ryan, my name is Matt Cutts and I’m a software engineer at Google. Sorry to hear that your blog got hacked. I know that it’s disappointing if you don’t show up in Google, but there’s another way to look at it. It looks like your blog was hacked to show “buy pharmacy”-type links, but what if the hackers had hosted malware on your site? Then every user to your site might have gotten infected just by visiting your site. That danger to Google users is one of the reasons that we temporarily remove hacked sites from Google.

I’m glad that things look clean now and I’ve revoked the “hacked site” flag for your domain. I’d expect your domain to return to Google within 48 hours, if not sooner.

By the way, we did try to contact you. We sent an email to contact [at] digitalbackcountry.com, info [at] digitalbackcountry.com, support [at] digitalbackcountry.com, webmaster [at] digitalbackcountry.com, and a gmail.com address on May 19th at 21:25:23 with a subject line of “Removal from Google’s index.” I believe if you had logged into our webmaster console at google.com/webmasters and proved that you owned digitalbackcountry.com, we would have left a message waiting for you there as well. That webmaster console is the primary way to request reconsideration in case your blog has been hacked.

We do try to communicate with hacked blogs where we can, and we also write blog posts to help site owners prevent hacks and recover from them. Some example posts that we’ve done in the past:

http://googlewebmastercentral.blogspot.com/2007/09/quick-security-checklist-for-webmasters.html
http://googlewebmastercentral.blogspot.com/2008/04/my-sites-been-hacked-now-what.html
http://www.mattcutts.com/blog/how-google-handles-malware-a-historical-overview/

The last point I’d make is that users tell us loud and clear that they don’t want to be sent to hacked sites, because of the potential danger that they represent. Even though it’s stressful to be removed from Google, I hope you understand why Google might not want to send users to a hacked blog.

Again, thanks for cleaning up your site and you should return to Google’s index soon.

How Google should handle hacked sites is a tough question, but personally I think Google does a better job than other search engines of protecting our users and communicating with site owners about hacked sites. For example, here is an excerpt of the email that we sent to Ryan on May 19th:

Dear site owner or webmaster of blog.digitalbackcountry.com,

While we were indexing your webpages, we detected that some of your pages were using techniques that are outside our quality guidelines, which can be found here: http://www.google.com/webmasters/guidelines.html. This appears to be because your site has been modified by a third party. Typically, the offending party gains access to an insecure directory that has open permissions. Many times, they will upload files or modify existing ones, which then show up as spam in our index.

The following is some example hidden text we found at blog.digitalbackcountry.com:

Acyclovir Adderall Adipex Alprazolam Ambien Ativan Biaxin Bontril Bupropion Butalbital Carisoprodol Celexa Cheap Phentermine Cialis Online Cialis Cipro Clonazepam Codeine Darvocet Diazepam Didrex Diflucan Effexor Ephedrine Fioricet Flexeril Generic Viagra Glucophage Hydrocodone Online Hydrocodone Levitra Lexapro Line Xanax Lipitor Lorazepam Lortab Meridia Nexium Norco Viagra Tramadol Soma Phentermine Valium Norvasc Buy Acyclovir Buy Adderall Buy Adipex Buy Alprazolam Buy Ambien Buy Ativan Buy Biaxin Buy Bontril Buy Bupropion Buy Butalbital Buy Carisoprodol Buy Celexa Buy Cheap Phentermine Buy Cialis Online Buy Cialis Buy Cipro Buy Clonazepam Buy Codeine Buy Com Lvivhost Online Viagra Buy Darvocet Buy Diazepam Buy Didrex Buy Diflucan Buy Effexor Buy Ephedrine Buy Fioricet Buy Flexeril Buy Generic Viagra Buy Glucophage Buy Hydrocodone Online Buy Hydrocodone Buy Levitra Buy Lexapro Buy Line Xanax Buy Lipitor Buy Lorazepam Buy Lortab Buy Meridia Buy Nexium Buy Norco Buy Norvasc Buy Online Xanax Buy Oxycontin Buy Paxil Buy Percocet Buy Phentermine Online Buy Phentermine Buy Propecia Buy Provigil Buy Prozac Buy Renova Buy Seroquel Buy Soma Buy Tadalafil Buy Tamiflu

[...]

In order to preserve the quality of our search engine, we have temporarily removed some of your webpages from our search results.

(The rest of the email goes on to describe how long the blog will be out of Google, and where to go in order to get back into Google’s index faster.)

Getting hacked is not fun. It’s just not. But I think Google does the right thing for our users by removing hacked sites from our index temporarily. I also think we do a pretty good job of trying to alert site owners that they’ve been hacked — more than any other search engine does. We alert many webmasters about hacked sites not only via email but also with our webmaster console.

Do I want more competition in search? Absolutely, because it keeps everyone on their toes and working hard for our users. But I think Ryan’s specific situation actually shows that Google is trying to do the right thing for site owners and users. Ryan, I hope there are no hard feelings that your site was removed from our index after being hacked, and now that it’s clean you should be back soon.

Stupid Google Tricks: Get a calendar from the search box

I spend a lot of time in my browser. So much time, in fact, that I notice when I drop down to a command line to type things. I wanted to look up a day later this year, so I typed “cal 2008” into a Unix terminal window. I caught myself thinking, “Hey, why doesn’t Google add a onebox shortcut for searches like ‘cal’ or ‘cal 2008’?”

I could bug someone at Google with my request, but to be honest, not many people would benefit from a feature like this. Then I realized that I could still solve the issue for myself with Google Subscribed Links. It takes 2-3 minutes to define a shortcut that says “When the user types query X, show a link to page Y in the search results.”

So I ran the command “cal 2008” and copy/pasted the output into a file on my domain. Then I made a simple subscribed link in 2-3 minutes. The interface looks like this:

Just a pointer to my calendar

If you’d like to add this subscribed link to Google too, you can subscribe to my calendar subscribed link with one click.

Anyone who is subscribed can search for [cal], [calendar], or [cal 2008] and will see a link like this:

Calendar link

And clicking it will take you to my calendar page.
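The “query X shows a link to page Y” rule is easy to picture as a lookup table. Here is a minimal sketch of that idea in Python. It is not the Subscribed Links format or API; the `SHORTCUTS` table, the `lookup` helper, and the example.com URL are all made up for illustration.

```python
# Hypothetical shortcut table: normalized query -> (link title, URL).
# The URL is a placeholder, not the real calendar page from this post.
CALENDAR_LINK = ("Calendar", "http://example.com/cal.txt")

SHORTCUTS = {
    "cal": CALENDAR_LINK,
    "calendar": CALENDAR_LINK,
    "cal 2008": CALENDAR_LINK,
}


def normalize(query):
    """Lowercase the query and collapse runs of whitespace."""
    return " ".join(query.lower().split())


def lookup(query):
    """Return the (title, URL) pair for a query, or None if no shortcut matches."""
    return SHORTCUTS.get(normalize(query))
```

Calling `lookup("cal 2008")` returns the calendar link, while a query that isn't in the table, such as `lookup("cal 2009")`, returns `None`.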

You could have more fun with this, but I’ve already spent more time writing about it than the original hack took. Other thoughts:
- Google Subscribed Links can do more powerful things (e.g. use a feed file), but I didn’t need that power for this simple hack.
- I could have made a script to dynamically show the current year instead of 2008. But compared to the time to copy/paste a text file, I’d almost rather just change the text file once a year.
- If you wanted some practice with Google App Engine, an app to show a calendar for the current year would be a pretty good starter project.
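For the last two ideas, here is a rough sketch of what a “current year” script could look like, using only Python’s standard `calendar` module. It isn’t the text file described above, and it isn’t an App Engine app; the function name `year_calendar` is just an illustrative choice. Weeks start on Sunday to roughly match the default layout of the Unix `cal` command.

```python
import calendar
import datetime


def year_calendar(year=None):
    """Return a plain-text calendar for the given year (default: the current year)."""
    if year is None:
        year = datetime.date.today().year
    # Start weeks on Sunday, similar to the default output of Unix `cal`.
    text_cal = calendar.TextCalendar(firstweekday=calendar.SUNDAY)
    return text_cal.formatyear(year)


if __name__ == "__main__":
    # Print the current year's calendar; redirect this to a file to publish it.
    print(year_calendar())
```

Running the script once and saving the output gives the same kind of static page as copy/pasting from `cal`, and serving `year_calendar()` from a small web handler would keep the page current without the yearly edit.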

xkcd @ Google!

[Adding an xkcd cartoon to my last post made me remember that I had this leftover post that I never published.]

I’m ruthless in pruning my work email down to the essentials. In particular, I auto-archive emails about different speakers at Google. So many neat/fun speakers are always visiting Google that if I started going to all those cool lectures, I’d never get my regular work done.

I’m at peace with that choice, but it does mean that sometimes I find out about awesome speakers at Google by reading about them on an outside blog.

I missed Randall Munroe, the guy that draws xkcd, which is a bummer. It’s one of my favorite net comics. Here’s my favorite xkcd:

Funny xkcd comic

My second-favorite is this map of the internet, because some real internet cartographers used the idea and made a real map of the internet with the same basic design.

If you like xkcd, Ellen has a great post about Randall Munroe’s visit to Google.

Something is wrong on the internet!

xkcd recently posted a webcomic that is quickly becoming a classic cartoon:

Comic: Something is wrong on the internet

That comic sums up the internet in one sentence: the scrum of jostling opinions on the web and the optimism that truth can still win out. I was reminded of that comic when someone asked me about a particular way that someone recently tried to get links. Jonathan Crossfield wrote up a good background summary of the situation.

Believe me, I have no particular desire or plans to charge out onto the internet looking for fake stories; Snopes and other people on the web do a fine job of that. But this was an interesting case, because the proof landed in everyone’s lap. Someone spoke up afterwards and essentially admitted “I made up a story and actively promoted it. The story is utterly fake. By the way, I think any tactic to get links is fair game. I only care about whether a tactic to get links works.” A little while later someone else asked me point-blank for my reaction. I pointed out that Google’s quality guidelines already cover deceptive or misleading ways of getting links. The first two sentences in our quality guidelines say:

These quality guidelines cover the most common forms of deceptive or manipulative behavior, but Google may respond negatively to other misleading practices not listed here (e.g. tricking users by registering misspellings of well-known websites). It’s not safe to assume that just because a specific deceptive technique isn’t included on this page, Google approves of it.

Google tries to return the most relevant, useful results to our users and protect them from deceptive or misleading tactics. For example, when someone spams a blog or a guestbook with a fake comment, we try to prevent that fake link from carrying weight in Google. If a spammer blitzes dozens of websites with fake referrers, we try to ignore those fake links. If a website claims to have high-quality information and then deceives the user and serves up malware or off-topic porn, Google considers that spam and takes action on it. Likewise, if a site says that they completely made up a story to get links, Google doesn’t have to trust the links to that site as much.

I really don’t view Google’s role as judging the truthiness of the web. That is, after all, what Stephen Colbert is for. :) But if someone is sloppy enough to get caught (or to admit!) making up a fake story, I don’t think Google has to blindly trust those links, either.

My takeaway from this brouhaha: There are plenty of ways to market a site creatively without deceiving anyone. Don’t burn your credibility by using fake stories. It’s a short-term tactic and makes people trust you less in the future.

A peek behind the curtain at Google

Udi Manber talks about search at Google in a recent post on the Google blog. If you’re interested in search or search engine optimization (SEO), the post is definitely worth a read. Udi discusses items from big (Google revamped how it computes PageRank in January) to small (in Hebrew, an acronym like IBM would be written as IB”M).

But you know what my favorite tidbit is? Udi talks a little about how the Search Quality group is organized. He mentions topics such as core ranking, evaluation, and webspam. This post makes it crystal clear that I have a limited role in overall search quality at Google. I can’t help but laugh when someone refers to me as Google’s head of search quality, because that’s not remotely close to true. I’m the head of the webspam team, which is just one part of the search quality group. Here’s how to think of search quality at Google:

Web spam org chart

As this hand-done bit of an org chart shows, webspam is just one group under the overall umbrella of search quality. The webspam group gets a lot of attention from the SEO community, but there are so many other people and teams that tackle search quality at Google — everything from synonyms and snippets to personalization and international search quality. I’m grateful to work with talented colleagues directly in my team, but I also really appreciate the chance to work with great people in the search quality team as a whole.

Anyway, check out Udi’s post and you’ll probably learn a thing or two about how we think about search quality at Google.
