“Fetch as Googlebot” tool helps to debug hacked sites

One of the most tenacious blackhat webspam techniques we continue to see is hacked sites. I wanted to remind site owners that our free “Fetch as Googlebot” tool can be a really helpful way to see whether you’ve successfully cleaned up a hacked site.

For example, recently a well-known musician’s website was hacked. The management firm for the musician wrote in to say that the site was clean now. Here’s the reply I sent back:

Unfortunately, when our engineers checked this morning, the site was still hacked. I know the page looks clean to you, but when we sent Googlebot to fetch www.[domain].com this morning, we saw

<title>Generic synthroid bad you :: Canadian Pharmacy</title>

on the page. What the hackers are doing is sneaky but unfortunately pretty common. When you surf directly to the website, you see normal content. But when a search engine (or a visitor from a search engine) visits the website, they see hacked drug-related content. The reason that the hackers do it this way is so that the hacked content is harder to find/remove and so that hacked content stays up longer.

The fix in this case is to go deeper to clean the hack out of your system. See http://support.google.com/webmasters/bin/answer.py?hl=en&answer=163634 for some tips on how to do this, but every website is different.

One important tool Google provides to help in assessing whether a site is cleaned up is our “Fetch as Googlebot” feature in our free webmaster console at http://google.com/webmasters/ . That tool lets you actually send Googlebot to your website and see exactly what we see when we fetch the page. That tool would have let you know that the website was still hacked.

I hope that helps give an idea of where to go next.

Something I love about “Fetch as Googlebot” is that it’s self-service–you don’t even need to talk to anyone at Google to diagnose whether your hacked site looks clean.
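For a rough first pass before running “Fetch as Googlebot,” the kind of cloaking described in the email above can sometimes be spotted by requesting the same page three ways and comparing the titles. The sketch below is only an illustration, not how Google checks sites: the function names, the browser user-agent string, and the Google referrer URL are assumptions chosen for this example. It also has a real limit. Hackers can cloak by IP address rather than by user agent or referrer, so matching titles here do not prove a site is clean; only a fetch that actually comes from Googlebot shows what Google sees.

```python
import re
import urllib.request
from html import unescape

# User-agent string that Googlebot announces itself with. Sending it from your
# own machine only imitates Googlebot; the request still comes from your IP.
GOOGLEBOT_UA = "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
# Any ordinary browser string works here; this particular one is an assumption.
BROWSER_UA = ("Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 "
              "(KHTML, like Gecko) Chrome/120.0 Safari/537.36")

TITLE_RE = re.compile(r"<title[^>]*>(.*?)</title>", re.IGNORECASE | re.DOTALL)


def extract_title(html):
    """Return the page title with whitespace collapsed, or None if there is none."""
    match = TITLE_RE.search(html)
    if match is None:
        return None
    return " ".join(unescape(match.group(1)).split())


def fetch(url, user_agent, referer=None, timeout=10):
    """Download a page while sending the given User-Agent and optional Referer."""
    headers = {"User-Agent": user_agent}
    if referer:
        headers["Referer"] = referer
    request = urllib.request.Request(url, headers=headers)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8", errors="replace")


def titles_differ(pages):
    """Given {label: html}, return ({label: title}, True if the titles disagree)."""
    titles = {label: extract_title(html) for label, html in pages.items()}
    return titles, len(set(titles.values())) > 1


def check_site(url):
    """Fetch one URL as a direct visitor, as 'Googlebot', and as a search visitor."""
    pages = {
        "direct browser visit": fetch(url, BROWSER_UA),
        "Googlebot user agent": fetch(url, GOOGLEBOT_UA),
        "visitor from search": fetch(url, BROWSER_UA, referer="https://www.google.com/"),
    }
    return titles_differ(pages)
```

Calling check_site("http://www.example.com/") returns the title seen by each kind of visitor plus a flag that is True when they disagree. A title like the pharmacy one quoted above showing up for only some of the visitors is a strong hint that the hack is still in place.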

Example email to a hacked site

Beyond clear-cut blackhat webspam, the second-biggest category of spam that Google deals with is hacked sites. The most common reaction we hear from webmasters is “The problem is with the Google search. There is nothing wrong with our website.” That’s a real quote from an email one site owner recently sent us. Sadly, it turns out that the site is almost always really hacked.

The single best piece of advice I can give to prevent website hacking is “keep your web server software up-to-date and fully patched.” That prevention is much better than the hassle of cleaning up a hack. Here’s an example email I just sent to a site owner with the identifying details removed:

Hi xxxxxxx, I’m the head of Google’s webspam team. Unfortunately, example.com really has been hacked by people trying to sell pills. I’m attaching an image to show the page that we’re seeing.

We don’t have the resources to give full 1:1 help to every hacked website (thousands of websites get hacked every day–we’d spend all day trying to help websites clean up instead of doing our regular work), so you’ll have to consult with the tech person for your website. However, we do provide advice and resources to help clean up hacked websites, for example
http://support.google.com/webmasters/bin/answer.py?hl=en&answer=163634
https://sites.google.com/site/webmasterhelpforum/en/faq-malware-and-hacked-sites
http://googlewebmastercentral.blogspot.com/2008/04/my-sites-been-hacked-now-what.html
http://googlewebmastercentral.blogspot.com/2007/09/quick-security-checklist-for-webmasters.html
http://googlewebmastercentral.blogspot.com/2009/02/best-practices-against-hacking.html

We also provide additional assistance for hacked sites in our webmaster support forum at https://groups.google.com/a/googleproductforums.com/forum/#!forum/webmasters . I hope that helps.

Regards,
Matt Cutts

P.S. If you visit a page like http://www.example.com/deep-url-path/ and don’t see the pill links, that means the hackers are being extra-sneaky and only showing the spammy pill links to Google. We provide a free tool for that situation as well. It’s called “Fetch as Googlebot” and it lets you send Google to your website and will show you exactly what we see. I would recommend this blog post http://googlewebmastercentral.blogspot.com/2009/11/generic-cialis-on-my-website-i-think-my.html describing how to use that tool, because your situation looks quite similar.

Anyway, just a reminder for site owners to keep their web server software up-to-date, because hacked sites are a real pain. Most Google searchers and even website owners don’t think about hacked sites much, but on our side we have to spend a fair amount of effort writing classifiers to catch this illegal activity, helping the victims of hacked sites, adapting when the hackers change their techniques, etc.

Sharing a search story

I’ve been reading a lot of the coverage of the Search plus Your World launch and I wanted to share my story and then clarify something.

I love to stay up until early in the morning playing Werewolf. In early December I went to a journalism conference called “News Foo Camp” in Phoenix and played a lot of Werewolf. When I got back, for some reason I searched for [werewolf] — maybe I was thinking about making a custom deck of werewolf cards. Because I was dogfood-testing Search plus Your World, this is what I saw:

Search for werewolf

In the top row of pictures, you’ll see a bunch of people playing werewolf, including a picture of me as the werewolf in the top-left image. Doing a generic search like [werewolf] or [photos] and getting back a picture of you or your friends is a pure, magic moment.

Let me tell you how it happened. I have Brian “Fitz” Fitzpatrick in a circle on Google+, because he’s in charge of Google’s Data Liberation Front and he’s an all-round awesome guy to boot. Fitz published an album of 25 Werewolf photos shortly after the conference. Okay, but I’m only in one of the 25 pictures; how did Google return the picture of me first? It turns out that Brian had tagged me in that single photo.

Once you know the trick, it might not seem like magic anymore. In fact, this is the “things just work” experience that everyone in the tech industry strives for. But when I searched for [werewolf] and got back a recent picture of me playing werewolf, it did seem like magic right then. I suspect as more people take Search plus Your World out for a test drive, they’ll quickly experience similar magical “Aha!” moments like I did.

I was reading some of the comments on tech blogs, and I wanted to clarify something: Search plus Your World does surface public content from the open web, not just content from Google+. For example, look back up to the top-right image from my screenshot above. That’s actually a werewolf photo that Gina Trapani took and it’s hosted on Flickr, not Google.

Here’s another example. If you follow the excellent and erudite Jennifer 8 Lee and search for [general tso’s chicken], Google can surface this high-quality thread from Quora:

Quora page

By the way, that’s a fantastic thread for Google to highlight, since Lee literally wrote the book about General Tso’s Chicken. It’s exactly the sort of “just works” user experience you’d want.

It’s not hard to find content shared on other sites. For a search [grand unified theory of snack food], Paul Buchheit shared a link on FriendFeed, and Google can highlight that:

Shared on FriendFeed

Or if I search for [connectbot], here’s a link that Brad Fitzpatrick shared on LiveJournal:

LiveJournal example

(Yes, we do have both a Brian Fitzpatrick and a Brad Fitzpatrick at Google. People sometimes mix them up, but they’re different.)

I hope that helps to make my point. Search plus Your World builds on the social search that we launched in 2009, and can surface public content from sites across the web, such as Quora, FriendFeed, LiveJournal, Twitter, and WordPress.

The team should be finishing the rollout of Search plus Your World in the next day or so, and I hope you enjoy it. Remember, to see the new results, you’ll need to be signed in with a Google account and search on google.com. Give this new feature a whirl: once you see how much better personal search can be, I don’t think you’ll want to give it up.

Beware of fake Matts leaving comments

A lot of the time, I dispel misconceptions by leaving comments on blogs. That works great, except for the rare occasion when someone pretends to be me and leaves a rude, fake, or otherwise untrue blog comment. Over the previous decade, I’ve only seen 4-5 times where someone impersonated me. But in the last month, I’ve seen at least three nasty comments written by “fake Matt Cutts” impersonators.

The first fake-Matt comment I remember was over at Marketing Pilgrim around November 14th, 2011. When Frank Reed checked out the fake comment, it came from 74.120.13.132, which is an exit router for Tor. That means someone went to some trouble to hide their tracks.

The second not-Matt comment was on November 18th, 2011. The impersonator wrote:

Normally we do not comment on ranking methods but I’ll explain a misconception: input from manual raters is used only in the rarest of cases when a non-brand cracks the top ten for high value money terms.

The tone (and content) of the comment was so far off that Matt McGee questioned whether it was really me, and I was quickly able to clarify that I never wrote that comment.

The third one I’ve seen was just a few days ago on Search Engine Journal, and included gems like

[Google is] very transparent. Some sites do not even have an address listed, yet we have everything, including the credit card numbers for adword advertisers. That is a strong signal for us to list them ahead in organic search as well.

The claim that “Google ranks AdWords advertisers higher in our search results” is fake and untrue; it was one of the first myths I debunked when I got online.

The web isn’t built to prevent impersonation. In many places around the web, anyone can leave a comment with someone else’s name. So if you see a comment that claims to be from me, but makes crazy claims (e.g. that we preference AdWords advertisers in our search results), let me know. I’m happy to verify whether I wrote a comment, e.g. with a tweet. Thanks.

What cool new websearch ideas should Google launch in 2012?

Even though this year is nowhere near finished, a lot of people at Google are already thinking about things to launch next year. So I wanted to put the question out: what cool things would you like to see Google launch in 2012?

For example, in 2011, we launched hundreds of search quality changes that might not be noticeable, along with a few high-impact changes. But we also added new ways to search, like the ability to search by image and search by voice. We’ve beefed up our social search, and continued to make search faster.

So take a minute to think about potential search features, products, or changes that we could launch next year. As a user (not as an SEO/webmaster/publisher), what cool piece of technology would you like to see Google launch in 2012?
