How Google handles hacked sites
If you’ve never read my blog before, welcome. I’m the head of the webspam team at Google. And I have a blog for days just like this.
Okay, first off you should go read this post. It’s entitled “Me Against Google” and the author is unhappy that talkorigins.org was nowhere to be found in Google for the last 5-6 days. After that post, go read this Slashdot post, entitled “Google De-indexes Talk.Origins, Won’t Say Why.” By the time you’re done, your pulse should be pounding. Hell, you should be angry. Damn that evil Google for not communicating with webmasters!! Or as Wesley put it in his blog:
You might think that a company that prides itself upon advanced textual analysis and automated decision-making algorithms might provide helpful warning messages to webmasters concerning problems found in their sites. You would be wrong.
Okay, ready for my side of the story? Here’s the timeline of how things happened:
- talkorigins.org was hacked on November 18th. I know this because Wesley says so in his blog post.
- By November 27th, Google had detected spammy links and text on talkorigins.org. In case you’re wondering, here’s what the cracker added:
<script>document.write(String.fromCharCode(60,100,105,118,32,115,116,121,108,101,61,39,100,
105,115,112,108,97,121,58,110,111,110,101,39,62))</script><br><a href="http://vvu.edu.gh/images/?i=animal-porn">animal porn</a>, <a href="http://vvu.edu.gh/images/?i=animal-sex">animal sex</a>, <a href="http://vvu.edu.gh/images/?i=beastiality">beastiality</a>, <a href="http://vvu.edu.gh/images/?i=rape-sex">rape sex</a>, <a href="http://vvu.edu.gh/images/?i=sleeping-sex">sleeping sex</a>, <a href="http://deepx.com/images/?i=animal-porn">animal porn</a>, <a href="http://deepx.com/images/?i=beastiality">beastiality</a>, <a href="http://deepx.com/images/?i=dog-porn">dog porn</a>, <a href="http://deepx.com/images/?i=horse-porn">horse porn</a>, <a href="http://deepx.com/images/?i=rape-sex">rape sex</a>, <a href="http://deepx.com/images/?i=sleeping-sex">sleeping sex</a>, <a href="http://theoi.com/image/?i=animal-porn">animal porn</a>, <a href="http://theoi.com/image/?i=animal-sex">animal sex</a>, <a href="http://theoi.com/image/?i=beastiality">beastiality</a>, <a href="http://ugobe.com/media/?i=dvd-covers">dvd covers</a>, <a href="http://ugobe.com/media/?i=dvd-ripper">dvd ripper</a>, <a href="http://ugobe.com/media/?i=psp-downloads">psp downloads</a>, <a href="http://ugobe.com/media/?i=psp-games">psp games</a>, <a href="http://ugobe.com/media/?i=psp-movies">psp movies</a>
Not pretty stuff–lots of text about rape and animal porn. In case you’re wondering, that JavaScript at the beginning produces the string “<div style=’display:none’>”, which makes the entire section of spammy junk hidden. So talkorigins.org has these porn words and spammy links, and it’s all hidden via sneaky JavaScript.
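If you want to check the decoding yourself, here is a minimal sketch in Python. The character codes are copied from the injected script above; the variable names are only illustrative. Python's built-in `chr()` plays the same role here as JavaScript's `String.fromCharCode`.

```python
# Character codes copied from the injected <script> tag quoted above.
codes = [60, 100, 105, 118, 32, 115, 116, 121, 108, 101, 61, 39, 100,
         105, 115, 112, 108, 97, 121, 58, 110, 111, 110, 101, 39, 62]

# chr() maps each code point to its character, mirroring String.fromCharCode.
decoded = "".join(chr(c) for c in codes)
print(decoded)
```

Running this prints the opening `div` tag with `display:none`, which is why a browser hides everything the cracker appended after it.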
We have pretty good reason to believe that this site was hacked, but it’s still causing problems for regular users, so Google has to take action. Here’s what we do:
- By November 27th, the site was classified as hacked and spammy. We stopped showing it for user queries.
- By November 27th, we started flagging this site as penalized in Google’s webmaster console. I believe that Google is the only search engine that will confirm to webmasters that their site does have penalties. No, we don’t confirm penalties if we think it might clue in web spammers that they’ve been caught. But yes, we do try to confirm penalties if we think a site is legitimate or has been hacked. You can read more about how we confirm penalties in this previous post.
I hear a few people ask, “It’s nice that I can sign up for Google’s webmaster console and learn that Google penalized my site. But couldn’t Google have done more?” Well, it turns out that we did do more:
- By November 28th, we emailed multiple addresses at talkorigins.org to let them know exactly what happened. According to the records I’m looking at, we tried to email contact at talkorigins.org, info at talkorigins.org, support at talkorigins.org, and webmaster at talkorigins.org with a timestamp of 2006-11-28 14:24:15. Here’s an excerpt from the email that we sent:
Dear site owner or webmaster of talkorigins.org,
While we were indexing your webpages, we detected that some of your
pages were using techniques that were outside our quality guidelines,
which can be found here: http://www.google.com/webmasters/guidelines.html
In order to preserve the quality of our search engine, we have
temporarily removed some webpages from our search results. Currently
pages from talkorigins.org are scheduled to be removed for at least 60 days.
Specifically, we detected the following practices on your webpages:
* The following hidden text on talkorigins.org:
e.g.
animal porn, animal sex, beastiality, rape sex, sleeping sex, animal porn, beastiality, dog porn, horse porn, rape sex, sleeping sex, animal porn, animal sex, beastiality, dvd covers, dvd ripper, psp downloads, psp games, psp movies
…We would prefer to have your pages in Google’s index. If you wish to be
reincluded, please correct or remove all pages that are outside our
quality guidelines. When you are ready, please visit:
https://www.google.com/webmasters/sitemaps/reinclusion?hl=en
to learn more and request a reinclusion request.
…
You can read more about how we try to email webmasters about issues on their site in this previous post. According to his post, Wesley did a reinclusion request recently, and I’ve confirmed that the reinclusion request was approved, so I expect talkorigins.org to be back in Google within 24-48 hours.
But let’s take a step back. This site was hacked and stuffed with a bunch of hidden spammy porn words and links. Google detected the spam in less than 10 days; that’s faster than the site owner noticed it. We temporarily removed the site from our index so that users wouldn’t get the spammy porn back in response to queries. We made it possible for the webmaster to verify that their site was penalized. Then we emailed the site, with the exact page and the exact text that was causing problems. We provided a link to the correct place for the site owner to request reinclusion. We also kept the penalty relatively short (60 days), so that if the webmaster fixed the issue but didn’t contact Google, they would still be fine after a few weeks.
Ultimately, each site owner is responsible for making sure that their site isn’t spammy. If you pick a bad search engine optimizer (SEO) and they make a ton of spammy doorway pages on your domain, Google still needs to take action. Hacked sites are no different: lots of spammy/hacked sites will try to install malware on users’ computers. If your site is hacked and turns spammy, Google may need to remove your site, but we will also try to alert you via our webmaster console and even by emailing you to let you know what happened. To the best of my knowledge, no other search engine confirms any penalties to sites, nor do they email site owners.
Wesley and anyone else who works on talkorigins.org, I’m sorry that this was a stressful experience for you. Could Google do a better job? Absolutely, and we’ll keep working on it. For example, maybe we can show a more specific message for hacked sites in the webmaster console. Google could also try to identify better email addresses when writing to site owners. For example, for talkorigins.org, there are email addresses such as “archive@” and “submissions@” that we could have used instead that might have reached the right person. I’m open to other suggestions too. But please give Google a little bit of credit, because I do think we’re doing more to alert webmasters to issues than any other search engine.
Note to new readers of my blog: I pre-moderate my comments, and it’s after 2 a.m. and I’m going to bed now. If your comment doesn’t show up immediately, it’s waiting for me to approve it after I wake up.
Pete Said,
December 4, 2006 @ 2:58 am
Matt,
great posts, thank you for this nice example. However, I would avoid emailing webmasters. I would show the message in the webmaster console instead.
In my opinion, most webmasters don’t read such generic accounts as webmaster@ or info@, just because there is too much spam coming to those addresses. Or they use a spam protection that your bot can’t answer, and your emails will not be received.
Just my opinion.
Matt Cutts Said,
December 4, 2006 @ 3:08 am
Pete, I agree. In the past, we’ve tried to find email addresses mentioned on the sites themselves, which can work a little better because they aren’t as generic. But it’s true that the webmaster console is a good place to communicate with site owners; it’s just a shame that not everybody knows about it yet.
Richie Hindle Said,
December 4, 2006 @ 3:14 am
Why not email the address(es) that have that site registered with Webmaster Tools? Even a basic “Dear Webmaster Tools account holder, please visit Webmaster Tools ASAP” would be worthwhile.
Matt Cutts Said,
December 4, 2006 @ 3:19 am
Richie Hindle, I agree that’s a good idea, except right now it doesn’t require an email address to verify a site in the webmaster tools. So we don’t have the contact info in order to be able to email site owners. That would be nice to offer though, I totally agree.
Ugh, it’s past Pi (3:14), so I’m going to bed now.
Richie Hindle Said,
December 4, 2006 @ 3:25 am
Matt: “..doesn’t require an email address…” - I thought Webmaster Tools required a Google Account, which in turn required an email address?
Article in 27 Blogs Said,
December 4, 2006 @ 3:34 am
Matt,
That’s very scary stuff. However, it’s great to see your team has fingers on the pulse.
Steve Said,
December 4, 2006 @ 3:36 am
I have to say that google’s response to the problem is well more than one could or should expect from a search engine. I’m surprised that the posting was slashdotted, given that the scenario presented is relatively common within the webmaster community. It’s great to see that google (Matt) is finding ways to improve the situation for webmasters, but on the same token everything that google did besides filtering out or penalizing the site is above and beyond what they need to do.
Everybody is responsible for their own quality control. If you can’t manage that on your own, hire somebody who can.
Frank Said,
December 4, 2006 @ 3:43 am
Matt, thanks for the details.
But: If they have their site registered on a webmaster console account why does google not contact the email from this account? Wouldn’t that be the most reliable way to contact a webmaster in such cases?
Apart from that a more detailed message in the webmaster console would be great of course.
Kenny Said,
December 4, 2006 @ 3:46 am
Matt,
I guess most site owners get mad at Google from time to time, for one reason or another, I know I have. However, from your explanation it strikes me that Google has gone way beyond its duty to the site owner and has done everything it could while meeting its obligation to Google visitors.
Great to know that while fighting the war against spam, Google is also making huge steps towards improving communication with site owners.
Dave Davis Said,
December 4, 2006 @ 3:48 am
Nice overview of the situation Matt. An email notification would be great or even something tied into Google Alerts.
In fairness, you guys did everything you realistically could and were pretty fair about it. I don’t see how he could get so upset about it.
Matthew Said,
December 4, 2006 @ 3:53 am
Matt, this is by far the most detailed response I have seen in regard to what an SE does when a site has been hacked. Thanks so much for your candor on the subject. I agree that the webmaster console is the preferred tool to communicate this info as well as an email to the acct on file for WT at Google. great post!
Matthew
Dave Sherratt Said,
December 4, 2006 @ 3:56 am
Love the post. Great insight into how google operates with spammed/hacked webpages..
I’ve got to say i feel a lot happier at how you are approaching these situations and yes maybe you could be better.. but heh you’re a hell of a lot better than i had imagined.. good job all round!
bout de papier Said,
December 4, 2006 @ 4:06 am
The first website I made was blacklisted because I used bad practices on it.
After that I made corrections, cleaned up the website and sent an email to Google. The website reappeared in Google some time later.
Now, after that, I think it’s normal and good that Google makes an effort to block websites using bad techniques, to give everyone roughly equal chances!
The only thing I regret is that no one notified me about my blacklisting or my reinclusion (that was one year ago).
Justin Mason Said,
December 4, 2006 @ 4:21 am
heh, good old String.fromCharCode() — that’s a 100% block pattern in email, it’s only used in hostile HTML
BillyS Said,
December 4, 2006 @ 4:30 am
That slashdot blurb contradicts itself…
Rather mysteriously, Google pulled the plug on its search engine
This was apparently triggered by a recent cracking of the site that added ‘hidden links to non-topical sites…
I think this is one of those slippery slopes for Google. There is this “entitlement” attitude among some webmasters - and Google does a lot to promote that attitude in the name of “do no evil.”
Maybe because we’re small we’re a bit more humble. We were hacked on November 3rd and again on November 11th - after two passes at security we’re hopefully good now. But on inspection, all three SE had “bad pages” and we caught each hack within minutes of it happening.
Maybe it’s me, but I thought I was ultimately responsible for what my site was feeding robots.
Mark Said,
December 4, 2006 @ 4:41 am
Matt, you mean that you didn’t contact the man, request his ftp details and fix the changes for him!
RedCardinal Said,
December 4, 2006 @ 4:43 am
Well done Google. This is very nice to see. The only shame is that Google couldn’t reply to the same queries made on the Google Webmaster Group, and it was left for you to respond here after the issue became more open?
Of course I’m still wearing my tin-hat after my post about Thinkhouse PR mysteriously disappeared completely from Google’s index after being crawled, indexed and ranked.
I hope that Adam might respond to my thread on the group when he returns from SES (as he was the one who put out the fire the last time a post about Thinkhouse PR mysteriously vanished from the index).
Notwithstanding, it’s great to see you guys improving your communications. Hopefully you’ll figure out a better way to become proactive rather than having to put out the fires.
Well done Google
geniosity Said,
December 4, 2006 @ 4:53 am
“It’s past pi”… I LOVE IT!!! Gonna have to use that one, and then stand back and endure the abuse I’m sure to get.
varun Said,
December 4, 2006 @ 4:54 am
Hi Matt !
What if the site is hacked again ? What is the guarantee that the webmaster has fixed the site and prevented future hack attempts.
Jules Said,
December 4, 2006 @ 5:12 am
Matt,
Thanks for letting us know what you do in cases like this. I hope it’s a process I never need to go through. I think rather than work hard on finding better e-mail addresses, a more useful approach for improving this kind of communication would be to attempt to ensure the e-mail gets through spam filters. Looking at that text, there’s no way it would get through mine. It includes the following words which are all nearly-perfect spam indicators according to my bayesian filter:
webmaster webpages porn sex beastiality rape
The following words are additional strong spam indicators:
results dvd ripper psp downloads movies Google
OK, so there’s probably not a lot you can do about the last one (in there because of loads of SEO spam I keep getting), but the problem is largely the stuff you’re quoting from the spam links. If I were setting this up, I wouldn’t include that info, but put it behind a link for the recipient to click to find out what was wrong.
Dirson Said,
December 4, 2006 @ 5:37 am
Matt: Did you also ban ‘theoi.com’ due to this hacking issue? Theoi.com was linked from talkorigins.org during the deface, but the webmaster claims he’s nothing to do with this issue.
Oliver Yeates Said,
December 4, 2006 @ 5:38 am
Above and beyond! take no notice!
John Wilkins Said,
December 4, 2006 @ 5:55 am
As one of the Archive foundation members, I have amended my own blog entry to note this, and retract my own complaint of Google failing to contact us. Wesley was also travelling during the relevant period, and may have simply missed the message, although I think it is more likely it got spam trapped. In short, it may have been an unfortunate confluence of circumstances.
My apologies.
steve Said,
December 4, 2006 @ 6:19 am
Can you provide a more specific link to this “Google webmaster console” you describe? I visited http://www.google.com/webmasters/ and can see nothing that goes by that name. I tried the “site status wizard” but it doesn’t appear to give any information on whether or not a site is penalized. Otherwise, there are just some links to blogs, discussion groups, and a “tool” that lets you submit sitemaps. Thanks…
M. Just M. Said,
December 4, 2006 @ 6:31 am
Great work and recount of events.
Terrence Said,
December 4, 2006 @ 6:44 am
Matt, will your blog now be de-listed for impolite words like “animal porn?” Seriously, how do other web sites cite these examples without getting flagged? Also, does it mean Google indexes javascript injected text?
Thanks for the eye-opening post. Heading for the console now!
Tgr Said,
December 4, 2006 @ 6:52 am
“Then we emailed the site, with the exact page and the exact text that was causing problems.”
Did you? I can’t find the exact page in the excerpt you posted. Or is it ‘talkorigins.org’? If so, it could use a more careful wording, because right now it could be understood to refer to the site, not the opening page, and “dear webmaster, there is this spammy text in one of your gazillion pages, but we won’t tell which, have fun hunting it down” is not a very friendly message.
Also, in his post, the webmaster says he checked Webmaster Tools, and didn’t find any reason for deindexing, so that’s another point where you could improve communication.
Tgr Said,
December 4, 2006 @ 6:59 am
The best solution would of course be to automatically differentiate between spammers and honest webmasters. It can’t be that hard to identify at least some of the legitimate users: if a site has run continuously, with high PageRank and no bad behavior for years, then it’s probably not a spammer. Of course spammers sometimes buy domains of formerly legitimate sites, but that can’t happen in great numbers, and to find weaknesses in Google’s algorithms, one would have to do lots of experiments and receive details from Google a lot of times.
Francois Faubert Said,
December 4, 2006 @ 7:09 am
Man, you’re so politically correct it’s hypnotizing, even when someone questions the quality of the work you do (though indirectly through an attack on the webspam team).
You’re definitely the kind of guy it’s impossible to have a long term problem with. The kind I’d invite over for a beer to watch a hockey game of the Canadiens if I could
Mika Said,
December 4, 2006 @ 7:23 am
Awesome! Thanks a lot!
alek Said,
December 4, 2006 @ 7:36 am
Impressive writeup - love to see more of these type of war stories Matt. I just submitted an update to Slashdot to get the real story out.
Multi-Worded Adam Said,
December 4, 2006 @ 7:37 am
Tick…tick…tick…tick…tick…tick…*KABOOM*!
That’s the sound of webmasters whose sites don’t rank in Google exploding after reading this post, writing nasty post after nasty post in response to this as it couldn’t possibly be true nor could it explain why their spammy…errr…perfectly clean sites don’t rank the way they should.
Matt, you’ve got way more cojones than most people just for posting this. Respek.
The question that remains in my mind in all of this is: how do you know that it was actually hacked? There’s nothing saying that he couldn’t have just put those links in himself. He may own the porn stuff under a different alterego and using the clean site to promote the dirty ones. It’s not that difficult to pull off.
Is this a hack big G knows about? Is it on other sites? Has anyone else seen it? I can’t speak for others, but this is a first for me. I’ve seen hacks before, but usually they take over the whole site…there’s more in it for them that way.
I’m probably wrong, but people have done much more ridiculous things than that and blamed big G. So I figured someone should ask.
JLH Said,
December 4, 2006 @ 7:48 am
Matt,
Very encouraging post. It displays an extra level of understanding that you and the webspam team have regarding the fight against spam. Yes, all spam is bad, but not all spammy-like sites or their owners are bad. Sometimes there is a legitimate reason for such things. More often than not when someone in the Crawling, Indexing, and Ranking google-group posts a question about a site having problems it’s due to something that happened on the site that they didn’t realize was spammy and not as evil as it looks.
It should be noted as well that in the webmasters tools under the statistics tab, on the page analysis link Google shows “Common words in your site’s content”. This is often a good place to spot potential problems. If “animal porn” shows up in your site about the migration of European swallows, and you are pretty sure you never wrote about that subject, well now you know that Google for some reason believes you did. The example given here may just explain such an occurrence.
Steups Said,
December 4, 2006 @ 8:07 am
That is as comprehensive a reply as I have ever read from a major corporation. I don’t think anyone can have a reasonable complaint after reading this
Stuart Said,
December 4, 2006 @ 8:08 am
Fantastic post, although I have sympathy for the guy it is ultimately his responsibility to make sure his site isn’t spammy (as you rightly pointed out)
Google sitemaps has improved my own relationship with Google and how I optimize my websites, I recommend that everyone take up using it; the improvements they have made in the past few months have been excellent and it shows no sign of slowing (thankfully).
I think Google’s policies are spot on.
Stuart
Wesley R. Elsberry Said,
December 4, 2006 @ 8:09 am
Matt,
I think that the message you show as a warning is excellent. It clearly states what is wrong, with enough information to permit a webmaster to locate the problem.
I only wish that I had actually received it.
Before I made my complaint, I checked my incoming email. There was no sign there of an attempt to contact me from Google. Lunarpages.com, where the TOA is hosted, forwards email to my account.
This morning, I learned of this post, so I re-checked my steps. No, still nothing in my incoming mail. I looked for strings from within the warning, to see if the text came through without an obvious “google” connection. No luck on that, either.
I rely upon the Lunarpages email forwarding, but given this post, maybe I was wrong to do so. I logged into the domain’s Lunarpages webmail interface for the first time. I searched for anything with “google.com” in the from field. I searched for strings from within the warning message quoted above. Still nothing.
Bummer. That just leaves examining the SMTP records on my local email account. Google’s message should have been relayed by Lunarpages, so I looked for that in the SMTP logs. Still nothing.
My SMTP logs, BTW, do show rejects for hosts like wr-out-0708.google.com, which is apparently blacklisted at spamcop.net. The following shows rejects on the 28th with a “google.com” domain. I haven’t checked these for spoofs, but I assume that’s what’s up with these:
It would be ironic, though, if the warning message that could have short-circuited this whole affair was blocked because of spam filtering.
As for entitlement, I don’t think that I was out of bounds given the information I had to work with. Google certainly isn’t responsible for fixing the bad stuff that is on my site. I never said it was. Having a third party mess with the site caused the problem in the first place. Having tried to work with Google once the problem became known to me resulted in… nothing. Not until I complained about what happened.
I do feel a bit better to know that Google made an attempt at contact before de-indexing our site. And it is good to know that the site is scheduled for re-indexing within a couple of days, rather than the couple of weeks mentioned on the Webmaster Help Group. I wish Matt and the rest of the folks at Google success in making the process better in the future. If, when I did claim the TOA site via Google Webmaster Tools on Dec. 1, the text of that warning that you quote above had been waiting for me, I would have had no complaint to make. It seems to me that if Google is willing to send that level of information via email, then making it available to the verified owner of a site via the Webmaster Tools interface should not be a problem, either.
And, Adam, neither Google nor you can tell whether the problem is a deliberate cheat or an honest person victimized from the problem itself. Google is entirely correct to protect their index by pulling sites that are not in compliance with their guidelines. I never said otherwise. My complaint was what Google’s policy of obscuring the de-indexing decision in the aftermath created, which is a situation in which cheaters have an advantage over honest webmasters, since the cheaters have knowledge of where in their pages the bad stuff lies, and the honest webmaster does not have that knowledge. Whether or not you may accept that I qualify as an honest webmaster, the policy as it currently stands obviously puts honest webmasters at a clear disadvantage.
Chris Hunt Said,
December 4, 2006 @ 8:24 am
If you can’t get at the Google Account holder’s email for some reason, couldn’t you just have a field on the webmaster console where people enter an email address to which Google should send reports?
If Google needs to send a report and this field is filled in, that’s where it sends it. If it’s not filled in, or not on sitemaps, Google guesses like it does currently.
Either way, it would make sense to duplicate the message on the webmaster console as well.
Jason Duke Said,
December 4, 2006 @ 8:31 am
Finding the balance between relevancy and spam ain’t easy for a search engine yet there is definitely an argument (maybe over another burger some time Matt) that search algos drive hacking ?
Aaron Pratt Said,
December 4, 2006 @ 8:46 am
I made a joke about Matt and porn star Bandi Belle in my blog and for some reason all three search engines ranked it for all kinds of spammy stuff. To an untrained eye which is 99% of admins. (when dealing with google) it looks like Google is punishing me. My stats can confirm this if I believe it to be true, but I am not so sure yet.
Matt - Is the Google algorithm so primitive as to confuse some text with an entire blog that has nothing to do with porn or is “SEO” equal to or lesser than spam algorithmically?
Sorry if I sound a little bitter, looked at my stats today for SEO Buzz Box and went “What the..?”
New blogs have a tough time recovering from incorrect first impressions in your algorithm Matt. Sorry if I am losing anyone, it’s hard to be clear on something that has sooo many variables….
Shelley Said,
December 4, 2006 @ 8:56 am
So what you’re saying is that if a site has links to subjects that Google doesn’t approve of, and these links are hidden, you remove the site from Google’s results?
When is Google going to realize if you can’t legislate morality, you can’t algorithmically control morality, either?
Kristen Owen Said,
December 4, 2006 @ 9:06 am
I think it’s far too easy to be mad and place blame elsewhere rather than figure out the problem, this situation isn’t abnormal by any stretch of the imagination.
I also think the site owner(s) realized there was a problem but didn’t know how to fix it, so rather than figure it out and correct the issue attacking a major company is easier — and more far reaching. Plus it makes the individual site seem like a “woe-is-me-the big-guys-are-picking-on-us-again” sob story rather than taking responsibility and correcting the issue on the dl.
HOWEVER, this could also be a way to gain press. Any press is good press, right? How much do you think the site’s traffic has/will increase over the next few days/weeks?
I’m sure I could get loads of extra traffic by picking a fight with Google too and blasting it across the internet. So someone’s either really lazy or really thinking.
DaveScot Said,
December 4, 2006 @ 9:11 am
Hi Matt,
I’m an admin at http://www.uncommondescent.com and we were deindexed by google in September. We never did find out what we did wrong. We used webmaster tools but got nothing but a cryptic “you did something wrong” message. We submitted a site map, changed our wordpress theme to a cleaner one, tried cleaning up our RSS support, told our authors to stop pasting articles with RTF format, and had our lawyer send a letter to an unauthorized mirror site which was duplicating our content without permission. After all that we still weren’t reindexed until November and that I suspect was only because users who were shareholders in google phoned or wrote to investor relations asking why a blog with a 6/10 rank at google run by a famous professor/author (William Dembski) and linked to by hundreds of .edu sites had been delisted.
At any rate, in case anything like this happens again, I bookmarked this site so we can contact a human being. I understand why google handles all this by automated electronic means but you really, really, really need to give out more information about the reason for delisting in the webmaster tools.
Matt Cutts Said,
December 4, 2006 @ 9:14 am
Hey everybody, thanks for the supportive comments. When you’re posting late at night, you’re never 100% sure that you’re making sense.
Richie Hindle, my impression was that we couldn’t use the email address registered with a Google Account without asking for permission first, but you’re right that we would have at least something to work with. I’ll check more on my side.
RedCardinal, most of the webmaster console team is en route to Chicago for the Search Engine Strategies conference this week, so I wouldn’t be surprised if the webmaster help group at http://groups.google.com/group/Google_Webmaster_Help was quiet right now.
varun, if a site gets hacked again, we basically repeat the same process. I’ve seen that happen before, e.g. if the site’s webhost has a larger security hole that hasn’t been fixed yet.
Jules, good point about the explicit language. I agree that the right direction is something more like a) having a reliable contact address and using it to b) send a letter that’s less likely to get blocked, but that has a link to more specifics.
Dirson, you can confirm for yourself that theoi.com was hit in the same wave of site hacks. MSN is pretty slow in this instance, so they have a copy of theoi.com from 11/24/2006. Here’s what I see at the bottom of the cached MSN copy of http://www.theoi.com :
<script>document.write(String.fromCharCode(60,100,105,118,32,115,116,121,108,101,61,39,100,105,115,112,108,97,121,58,110,111,110,101,39,62))</script><a href=’image/?i=animal-porn’>animal porn</a>, <a href=’image/?i=animal-sex’>animal sex</a>, <a href=’image/?i=beastiality’>beastiality</a>, <a href=’image/?i=zoophilia’>zoophilia</a>, <a href=’image/?i=horse-cum’>horse cum</a>, <a href=’image/?i=horse-fuck’>horse fuck</a>
We found the hacked spammy content on theoi.com at the same time, and we sent an email to the same contact addresses at theoi.com with the timestamp of 2006-11-28 14:25:06. The only real difference in the email was “* The following hidden text on theoi.com:
e.g.
animal porn, animal sex, beastiality, zoophilia, horse cum, horse fuck”
I’ll check to see if theoi.com is clean now and file a reinclusion request for them if they are.
steve, to find the webmaster console (as I call it), go to http://www.google.com/webmasters/ and follow the link to http://www.google.com/webmasters/sitemaps/ . It’s the middle link on the left-side. The official name is “Webmaster tools” but we also sometimes call it the “Webmaster console.”
Tgr, fair point that we could probably make the email message more clear. We do mention the exact url with the problem, but instead of “The following hidden text on talkorigins.org:” we might be able to say something like “The following hidden text on the specific page ‘talkorigins.org’:” or something a little more explanatory.
Multi-Worded Adam, is “respek” an Ali G reference? Respek back for that.
The short answer is that we had pretty high confidence that this was a real hack because we’ve seen things like this before (e.g. I mentioned that the same cracker hit theoi.com).
JLH, your suggestion to use the “Common words on your site” feature of the webmaster console to self-diagnose hacks is a fantastic one. In fact, I was planning to do a post along these lines. If it makes sense that hacked sites would show off-topic junk words in this section, now go read http://www.seroundtable.com/archives/006782.html . Notice the second complaint? Barry notes a site owner on WebmasterWorld who says “When I look at the keywords for the new site in the Google webmaster tools, the list is packed full of words… that have nothing to do with my site! They seem to be about cruises, sports, casinos, and various commercial and financial matters.” I’d be willing to bet a shiny quarter that the site in question was hacked and has a ton of pages with junk content. Another way to self-diagnose is with the site: query. If you see pages with weird extensions such as .dhtml that you don’t normally use, and the pages look like pay-per-click (PPC) pages on your site, you’ve probably been hacked. Check your root page and check your .htaccess file as well.
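The "weird extensions" check can be roughed out in a few lines. This is a sketch only: it assumes you have already collected a list of your site's indexed URLs (for example, by copying them from a site: query), and the set of expected extensions and the example.com URLs are hypothetical placeholders to replace with your own.

```python
import posixpath
from urllib.parse import urlparse

# Hypothetical: the extensions this particular site normally uses.
EXPECTED_EXTENSIONS = {"", ".html", ".htm", ".php"}


def suspicious_urls(urls, expected=EXPECTED_EXTENSIONS):
    """Return the URLs whose path extension is not one the site normally uses."""
    flagged = []
    for url in urls:
        path = urlparse(url).path
        extension = posixpath.splitext(path)[1].lower()
        if extension not in expected:
            flagged.append(url)
    return flagged


# Hypothetical sample of indexed URLs.
sample = [
    "http://example.com/",
    "http://example.com/about.html",
    "http://example.com/cheap-cruises.dhtml",
]
print(suspicious_urls(sample))
```

A flagged URL is not proof of a hack, just a page worth opening and inspecting by hand.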
John Wilkins and Wesley R. Elsberry, thanks for stopping by to comment. Sorry if my post came across as brittle. It was just late at night. Wesley, I promise that I’ll try to get a “your site has been hacked” message into the webmaster console so that when someone claims their site, it’s much more clear what happened.
Ruman Said,
December 4, 2006 @ 9:28 am
Why not display the text of the email in the webmaster console too?
a. this takes care of bouncing/bad email addresses.
b. No one can claim that google doesn’t explain what the actual problem is.
all said the original post appears rich! Wesley is unhappy with google’s communication while relying on a service which happens to prevent communication w/google
Aaron Pratt Said,
December 4, 2006 @ 9:29 am
“Richie Hindle, my impression was that we couldn’t use the email address registered with a Google Account without asking for permission first, but you’re right that we would have at least something to work with. I’ll check more on my side.”
Have a checkbox in Google accounts saying: “Check here to be contacted if we find illegal activity on your website or blog”.
Now what about my darn question? I feel like a child pulling on dad’s pant leg. If you do not answer the questions Matt others will and as you are seeing they often are incorrect. You don’t want that homie! ;o)
me Said,
December 4, 2006 @ 9:31 am
As somebody who has a Wordpress installation and knows that spammers try to ‘hide’ their real domains among links to legitimate websites, I hope that the Google team is aware of this strategy and that no innocent website gets punished.
Matt Cutts Said,
December 4, 2006 @ 9:38 am
Hi Shelley, thanks for stopping by! You said “So what you’re saying is that if a site has links to subjects that Google doesn’t approve of, and these links are hidden, you remove the site from Google’s results? When is Google going to realize if you can’t legislate morality, you can’t algorithmically control morality, either?”
The fact is that when someone uses Google to find a site and then they discover that the site has hidden text, they get angry. They feel as though they were deceived, even though the vast majority of the time the hidden text wasn’t a factor. Then those angry users send us email.
Hacked sites are even worse, because a large number of hacked sites try to install malware when an innocent person reaches the hacked site.
So yes, Google provides quality guidelines that say things like “Don’t show hidden text. Don’t cloak. Don’t do sneaky JavaScript redirects. Don’t put viruses or malware on your pages.” And we can take action when we see things that violate our webmaster guidelines, just like every other major search engine does.
Webmasters are welcome to do whatever they want on their own sites, but surely you have to allow Google to do what we think is needed to provide a good experience for searchers on Google, too? You wouldn’t require us to list a site that we thought was bad for users, would you?
Michael Schaap Said,
December 4, 2006 @ 9:41 am
Good stuff!
One comment I haven’t seen here yet: you may also want to include the domain’s whois contact(s) in the email you send out. That, at least, is well-defined and supposed to work, while things like contact, info, support and even webmaster may not exist.
- Michael
Matt Cutts Said,
December 4, 2006 @ 9:46 am
Michael Schaap, we did that for a little while, then someone on (I think) WebmasterWorld got angry because the email reached the technical contact or web host instead of only the site owner. So for the most part we don’t do that anymore.
Wesley R. Elsberry Said,
December 4, 2006 @ 9:49 am
Uh, no. Finding the problem in my case turned out to be simple. From the time that I found out about the de-indexing of the TOA to submitting the reinclusion request was maybe three hours, tops. That included fixing the problem and a wait to make sure that the same problem that was in our default main page was not in the other 5,000+ pages in our archive. That’s not what I am concerned about.
There was no provision for me as a webmaster with a de-indexed site to communicate with Google about it. I looked for that. Maybe I missed something. I had no great desire to have the abuse and sneering heaped upon me that I knew would result, but I felt morally bound to speak truth to power. I doubt that I will convince you of that, but that is the case.
In the situation I found myself, Google’s policy on the obscured de-indexing decision privileged cheaters over honest but victimized webmasters. I see that as a problem. Matt’s account of Google’s attempt to contact me pre-de-indexing does show that they are trying to provide good information to webmasters, which allays my concerns somewhat. As I mentioned above, though, that information was not available through the Webmaster Tools interface, which was my only conduit to information from Google about my case at the time.
I guess it all depends critically upon whether one sees a problem in the way that Google obscures a de-indexing decision and limits how webmasters obtain information about it. If one thinks that the current system is perfect, then any complaint must necessarily be due to laziness or avarice. If we can agree that there may be improvements to be made (as even Matt Cutts says in his post above), then there is a third option.
Matt Cutts Said,
December 4, 2006 @ 9:58 am
And for the record, I agree with Wesley. Our alerting process is better than other search engines, but it’s still not where (I personally believe) it should be. It’s from hearing complaints and feedback like in Wesley’s post that Google can prioritize what things need to be done next.
If we ever reach a point where users and site owners don’t complain about things that Google should be doing, or urge Google to improve its processes, that will be a very sad day in my book.
Pro-SEO Said,
December 4, 2006 @ 10:02 am
Wow, I bet they feel a bit silly now for making out they are being victimized by google.
graywolf Said,
December 4, 2006 @ 10:08 am
You know, there really should be some sort of notification in the webmaster console. I’ve gone through the ordeal recently myself and documented the whole process.
http://www.wolf-howl.com/seo/is-my-website-banned-in-google/
http://www.wolf-howl.com/seo/my-website-isnt-banned-in-google/
The oddest part of the entire ordeal was that pagerank didn’t go graybar. Is the condition where site:example.com returns 0 results but pagerank stays indicative of this type of banning?
Wesley R. Elsberry Said,
December 4, 2006 @ 10:25 am
Matt,
Thank you for the gracious comment on my feedback. It is all too rare to have a pointed complaint like the one I made turn into a dialogue. My best wishes to you in making Google even better.
JohnMu Said,
December 4, 2006 @ 10:37 am
Nice summary, Matt. Good ideas everyone. Looking back just 1-2 years, all of this would have been top-secret. It’s good to see that Google is communicating more - keep it up.
How would you handle more subtle hidden links like those on http://www.unesco.org/webworld/portal_bib/pages/Cool/ ? The larger the site, the less likely you’ll EVER reach a webmaster who knows what it means and can clean it up. I’ve tried for the last year or so, never an answer.
How can you be certain that you reach someone in charge who can handle it correctly?
Michael Lefevre Said,
December 4, 2006 @ 10:39 am
Wesley R. Elsberry: “My SMTP logs, BTW, do show rejects for hosts like wr-out-0708.google.com, which is apparently blacklisted at spamcop.net
…
It would be ironic, though, if the warning message that could have short-circuited this whole affair was blocked because of spam filtering.”
I’m afraid it looks like that is exactly what has happened. Spamcop has the Google server’s IP blocked (actually there are several IPs, only some of which are blocked), because email has been received from it at a spamtrap.
Google’s system is guessing email addresses and sending automated, and unsolicited, messages to them. That pretty much matches the definition of spam, and is certainly enough for a Spamcop listing. It’s quite possible that some of Google’s messages will reach real addresses who don’t want to hear from Google, which is not good.
On the other hand, I can understand why this “spam” may be desirable - as this example shows, the webmasters may well actually want this information, but they don’t know they want it until they’ve got it, so they aren’t really in a position to solicit it.
I don’t imagine the Spamcop folks would do anything to change this listing. It would be good though if those emails weren’t coming from the same IP addresses as any other email, to avoid other kinds of email getting caught in people’s filters. On the other side, people using Spamcop’s list for filtering should also be aware of how the list works, and that the fact that it is very aggressive means it isn’t appropriate in all circumstances - in particular, it may be better to use it as part of a spam scoring system, rather than blocking connections outright.
Kenny Heimbuch Said,
December 4, 2006 @ 10:45 am
My site, xtort.net was also dropped from Google’s index early last month. I am starting to think that our host may have been hacked in some way as well, because I have never and would never do anything that would deliberately contravene any of Google’s webmaster policies, and haven’t made any site changes in a few years now.
I did a reinclusion request a month ago just in case someone maliciously filed an exclusion request or ran some sort of script like the one mentioned above. I still have not heard anything from that though. I wish that there was some way to ascertain what went wrong, so I can safeguard against something like this happening again.
Christer Edwards Said,
December 4, 2006 @ 11:37 am
One of the biggest things that bothers me is lack of contact information on many websites. If the webmaster of the site in question had his contact information prominently displayed, he would have been notified and the problem would have been avoided or quickly resolved.
I always include an about or contact page on my sites. Not only does it allow your users (remember, we are all nothing without our users!) to contact you but I think it builds confidence.
Wesley R. Elsberry Said,
December 4, 2006 @ 12:24 pm
Christer Edwards,
The TOA does have a contact page:
http://talkorigins.org/origins/contact.html
It is linked from our main page, link text of “Contact Administrator”. I just checked the email addresses given, and they are working.
Michael Lefevre,
Thanks for the info on the SpamCop thing. Given that Matt didn’t mention any attempt to send email directly to my local email address, the SpamCop blacklisting should not be associated with the warning email attempt. The timestamps don’t appear to match, and the SpamCop rejections don’t correlate with forwarded email from Lunarpages.com.
CarlenLea Said,
December 4, 2006 @ 12:24 pm
Matt –
Thanks for this view into how and what Google does with spammy sites. Now I know if my site gets hacked and I get dumped to check the webmaster panel.
Overall, just really appreciate the chance to see some of the inner workings of Google anti-spam activity.
Twan Said,
December 4, 2006 @ 12:39 pm
Great post Matt. Very interesting indeed. Though I must say that for some reason, I find it very disturbing that Google requires a penalized webmaster to basically admit guilt before the possibility of being re-indexed. If someone got hacked, it doesn’t seem very fair or equitable that they MUST admit guilt. Seems to me to be a bit extreme, even a bit fascist.
NTulip Said,
December 4, 2006 @ 12:40 pm
Considering the process that google uses, does google see hackers/phishers taking advantage of it by sending emails to potential users of gmail.com and tricking them into providing their google account and password through a look-alike site?
I would recommend google take precautions against that if not already done.
JohnMu Said,
December 4, 2006 @ 12:49 pm
Would digitally signing the mails make any sense? Imagine the amount of email spam that is going to use that exact text to sell you the SEO services you always needed…. “To get reincluded, please contact us at 000-000-0000 after you transfer $$$$ to our off-shore bank account”.
I imagine a signed mail would have a better chance at making it through the mail filters (perhaps). Does Google do SPF?
Chris Santerre Said,
December 4, 2006 @ 12:49 pm
Coming from the RBL side, I know EXACTLY what you are going thru. We get angry delist requests from domains screaming “Why didn’t you tell us?!”
Frankly, it’s not our job. There are far too many spam domains for us to babysit them all. Google went above and beyond what they had to do.
Keep up the great work.
–Chris
Anax Said,
December 4, 2006 @ 1:16 pm
Posting site-status messages on the webmaster console is a great idea, and is probably all that is needed. If Google were to contact webmasters by email I’d feel better having Google use the email address on file with the webmaster console than the one in whois data. I keep my whois data private for a specific reason: my site is devoted to exposing scams, and when it was part of my personal site (with whois data available) I often got threatening messages from scammers. This convinced me of two things: (1) this was a worthwhile site, since some really bad people were trying to shut it down, and (2) keeping whois data private does *not* necessarily mean a site is spammy: it may be necessary and legitimate protection for an honest webmaster.
Chris Santerre Said,
December 4, 2006 @ 1:22 pm
I actually would like to point out that the spamcop listing of google has been a big topic on one of the antispam lists. Antispam ppl are split about the listing. Some think it’s correct, others think it’s bad.
I find it ironic that google doesn’t contact spamcop to get the listing removed. And change their practices to keep them from getting listed again.
–Chris
Curtis Cameron Said,
December 4, 2006 @ 1:40 pm
Matt,
You mention that the hidden links to porn spam were causing problems for users - what are those problems? If those links were all hidden, causing users not to notice (I assume users would not have noticed else Wesley would have heard immediately from them), and the site content creators didn’t even notice, then how is that such a big problem that it warrants deletion from your indexing?
J-Dog Said,
December 4, 2006 @ 1:43 pm
Matt - re: DaveScott and Uncommon Descent
Google probably deindexed them for excessive stupidity, as Dembski and DaveScott Springer actually believe in Intelligent Design!
Tell them “The Designer Did It” and they are “out of here”.
Bwa ha ha ha!
marie Said,
December 4, 2006 @ 1:44 pm
“But couldn’t Google have done more?” Well, it turns out that we did do more”
I think Google’s new webmaster console is great. Thanks Google.
However, I would never expect an email or Google’s help to identify the violation - that is ludicrous. If Google provided personalized service in this area, every slimy SEO would consume huge amounts of Google’s technical support just so they could probe the skinny edge of being spammy.
Zandr Said,
December 4, 2006 @ 1:53 pm
“Tgr, fair point that we could probably make the email message more clear. We do mention the exact url with the problem, but instead of “The following hidden text on talkorigins.org:” we might be able to say something like “The following hidden text on the specific page ‘talkorigins.org’:” or something a little more explanatory.”
Like, oh, maybe a well-formed URL?
Specs are written for a reason, after all. http://talkorigins.org/ is unambiguous. even less so.
Keith Cash Said,
December 4, 2006 @ 2:08 pm
Google tech support is great. Sounds like they are underappreciated.
As an IT manager, I’d say the google team went past what anyone else would do to help someone solve this problem.
Good Job Google Team
bob o'bob Said,
December 4, 2006 @ 2:20 pm
It may also help some folks reading this thread, to point out that even Spamcop does not recommend using a listing in bl.spamcop.net as the sole criterion for making a “throw the message away” decision. Use it as part of a ranking decision, or use it to add decorations to the message to be used by later filtering steps, but just tossing a message out based ONLY on a listing is simply NOT recommended.
Lots of people|hosts|providers do so (I even do, for one of my domains), but part of the problem under discussion is due to the recipient not being fully aware that their provider made configuration decisions which Spamcop actually recommends against. More “disclosure” is better. If you know your spam fighting configuration can throw away some “false positives” without notifying you, then you might be less likely to make an incorrect assumption when something important fails to arrive.
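To make the "part of a ranking decision" idea concrete, here is a minimal sketch of a scoring filter in which a blocklist listing is just one signal among several. The signal names, weights, and threshold are all hypothetical and chosen only for illustration; they are not Spamcop's recommendations or any real filter's defaults.

```python
# Hypothetical weights and threshold, for illustration only.
WEIGHTS = {
    "listed_on_dnsbl": 2.0,
    "suspicious_subject": 1.5,
    "obfuscated_script": 2.5,
}
THRESHOLD = 5.0


def spam_score(signals, weights=WEIGHTS):
    """Sum the weights of every signal that is present for a message."""
    return sum(weights[name] for name, present in signals.items() if present)


def classify(signals, threshold=THRESHOLD):
    """Treat a message as spam only when the combined score reaches the threshold."""
    return "spam" if spam_score(signals) >= threshold else "deliver"


only_listed = {"listed_on_dnsbl": True, "suspicious_subject": False, "obfuscated_script": False}
several = {"listed_on_dnsbl": True, "suspicious_subject": True, "obfuscated_script": True}

print(classify(only_listed))  # deliver
print(classify(several))      # spam
```

With weights like these, a blocklist hit alone never discards a message; it only tips the balance when other signals agree.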
BlueBobbo Said,
December 4, 2006 @ 2:34 pm
If you’re in the heat of the moment, you’ll say things irrationally without logical evidence. I think this is the case here. Google is doing awesome, I love webmaster console.
eric Said,
December 4, 2006 @ 2:43 pm
What about a feature in google’s webmaster console to allow a specific email to be set for websites so that google can contact directly instead of having to guess at email addresses? Given the massive scale of email spam I don’t have email services for the websites I administer, so common email addresses like info at, contact at, support at, etc simply won’t work (or any for that matter.) The webmaster services are already tied to a google account, so sending the gmail account a notification email would also work.
marc Said,
December 4, 2006 @ 2:53 pm
couldn’t google have hacked his site and left that message on his homepage?
Gregg Said,
December 4, 2006 @ 2:55 pm
Matt, can you clarify by what process Google chooses email addresses to attempt to send a ‘hacked site email?’ Is it automated and always to those same four addresses (support, contact, info and webmaster) or is there an attempt to scour the site itself for published addresses?
I bring this up because of Wesley’s comment regarding the talkorigins contact page:
The actual “mailto:” email addresses on that page are all obfuscated, presumably to prevent robots from scraping those email addresses. (For example, “mailto:archive@talkorigins.org”)
I don’t know to what degree the process of identifying contact emails for a hacked site owner is automated, but I wonder if this was a case of the email obfuscation also preventing Google from getting a good address.
Reed A. Cartwright Said,
December 4, 2006 @ 3:17 pm
Given that Google already generates warning emails about spam, wouldn’t it be rather simple to link the webmaster tools to the email database? Therefore when someone claims their sites, they can see any recent alerts that google has issued.
Ross Hill Said,
December 4, 2006 @ 3:39 pm
Thanks for such a comprehensive post, it is really good that google does so much to help webmasters out when they are just trying to do the right thing.
dfre Said,
December 4, 2006 @ 3:50 pm
I think it is great what Google is doing to improve communications with webmasters. Maybe one day Yahoo will take a step in this direction.
Chuck Swiger Said,
December 4, 2006 @ 3:51 pm
+1 to Matt for providing a reasonable and detailed description of the issue from his or Google’s perspective. (At least in part– sure, Matt probably doesn’t speak for all of Google, but at least he is responding in his particular bailiwick.)
While it is expected that should work (see RFC-2142, which defines the common mailbox names for various services), it is also true that Google could send a contact email or warning when there is a problem to the WHOIS contact(s).
Having that message be signed is less important; if you receive a suggestion that your webserver has been hacked, it shouldn’t matter much if it comes from a well-known domain, or from someone anonymous at a forged domain. You ought to check things out regardless. Sure, it wouldn’t be a bad idea to use PGP/GnuPG to sign the message, but it’s not critical.
And SPF wouldn’t help in this case, JohnMu– while Google does have a published SPF record, the record is “v=spf1 ptr ?all”, which basically scores as positive if the IP address reverses into the Google.com domain if you do a PTR lookup, and neutral otherwise (ie, the “?all” means all hosts are neutral).
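As a rough illustration of how that record reads, the sketch below splits an SPF record into its mechanisms and the result each one yields when it matches. It is a simplified parser written for this example (the name `parse_spf` is made up), it ignores modifiers such as redirect=, and it does no DNS lookups.

```python
# SPF qualifier characters and the result each one produces on a match.
QUALIFIERS = {"+": "pass", "-": "fail", "~": "softfail", "?": "neutral"}


def parse_spf(record):
    """Split an SPF record into (mechanism, result-on-match) pairs.

    Simplification: every term after the version tag is treated as a
    mechanism; modifiers such as "redirect=" are not handled specially.
    """
    terms = record.split()
    if not terms or terms[0].lower() != "v=spf1":
        raise ValueError("not an SPF version 1 record")
    parsed = []
    for term in terms[1:]:
        if term[0] in QUALIFIERS:
            qualifier, mechanism = term[0], term[1:]
        else:
            # A mechanism with no explicit qualifier defaults to "+" (pass).
            qualifier, mechanism = "+", term
        parsed.append((mechanism, QUALIFIERS[qualifier]))
    return parsed


print(parse_spf("v=spf1 ptr ?all"))  # [('ptr', 'pass'), ('all', 'neutral')]
```

So a sender that matches the ptr mechanism passes, and everything else falls through to the neutral "?all", which is why the record does little to vouch for or against any particular message.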
Yves Courbet Said,
December 4, 2006 @ 3:56 pm
Thank you Matt for taking the time to answer and explain Google policies while pondering improvements. The “Google world” isn’t as closed as some people make it out to be…
Re getting a warning: in my case - I am banned - I don’t believe there was a way for Google to advise me of the problem. I hadn’t registered anywhere in Google.
I have been indexed and ranked high for so many years (over 10) that I never bothered to look at the webmaster’s rules and stuff in Google. I just went along, and took it for granted. Then, recently I was dumb enough to make a duplicate site.
Bam! Zap!
I fixed the problem but, I am still nowhere in spite of a request (Matt, can you help? …please?)
Moral of the story:
- never take anything for granted, and Google least of all
- this isn’t the nineties anymore when search engines were a free for all. Pay attention.
- don’t bitch about Google - they’re not perfect, but they’re trying their best fighting our common enemy-the a.s. spammers…
One suggestion though… Maybe it’s time to evolve the Google algorithm with less - or no - emphasis on links from other sites, in favor of more from links from people’s browsers. The concept of ranking from site links was a good idea in the beginning, but today it seems to reflect little of the content value.
Thank you,
Yves
Amit Patel Said,
December 4, 2006 @ 4:20 pm
Here is another story about how Google works with sites:
http://www.idcide.com/affair/
Matt won’t review this story though, and the owners will never get an explanation….
Troy Roberts Said,
December 4, 2006 @ 4:23 pm
We have been fighting automated spambots on our forums that post links like the ones in your post. We delete the posts as quickly as we find them.
If the Google spider happened onto one of those posts in the short time before our moderators caught and deleted it would we receive a penalty?
bong hitz Said,
December 4, 2006 @ 6:29 pm
So there’s an RFC for webmaster, info, and support addresses?
This is news to me, heh.
I think a catch-all address is a good thing for stuff that might get missed.
Seb Said,
December 4, 2006 @ 6:36 pm
Hi Matt,
I think Google is doing a good job. What also could be improved is telling the webmaster how long the penalty is for (approx is enough). I bought a domain that was burned and after my request Google reinstated the index.html (within only 10 days). Unfortunately all the other pages (actually an entire blog) are not indexed yet and it would be great to see if there is a penalty or anything still pending.
Otherwise keep up the good work! I give Google a lot of credit for being so honest!
Seb
C Snover Said,
December 4, 2006 @ 6:45 pm
Matt, it’s nice to know that at least parts of Google communicate well with others. Maybe you could try to rouse some of this open and honest communication from within the AdSense department, whose idea of open communication is “We have detected invalid clicks on your site and have permanently closed your account with no recourse, and we won’t give you any more information because it’s ‘proprietary’”? Not that I’m bitter or anything, but it would be nice if they had policies for being a fraction as open as you are.
Yves Courbet Said,
December 4, 2006 @ 6:51 pm
Well, interesting turn of events…
Even though I was guilty of breaking a G. rule, I was banned for a reason I was totally unaware of, and would have never known if it wasn’t for some helpful fellow on the Google board. Lucky break for me.
So this brings us back to the original topic and the complaint that Google doesn’t notify of a ban.
It does make the honest webmasters double victims - not only do you get hacked, but you get zapped by G without a warning.
Yes, we all have to be aware, and take responsibility, but who checks their site status daily? And if you’re just a regular joe, how do you know where to look for the spam and hack lurking in your site? I couldn’t find the files in the server even after someone said I had been hacked.
Google is now the king-maker and critical for our businesses. That does imply some responsibility on their part, and not just to the people looking for relevant info, but also to the people whose livelihood now depends on Google’s indexing.
I believe I deleted all the bad files. Hopefully I will be reindexed before too long.
Holding my breath…
Thank you,
Yves
Rob Said,
December 4, 2006 @ 7:07 pm
Wow, this info is helpful to me. My site was also hacked before by unidentified hackers. I wonder why I haven’t received a notification email from google. But the problem was fixed just a few days after the hack, so maybe that is why I didn’t receive a notification email: I fixed it right away.
Timothy Clemans Said,
December 4, 2006 @ 8:43 pm
I wish that Google would give a clear explanation why talkorigins.org is not in Google’s index when I request “site:talkorigins.org”. Could Google just mark spammy and cracked sites in the index? Why is there not a backup of the site map for Google so that it does not need to be re-crawled to reappear in the index?
Chuck v Said,
December 4, 2006 @ 9:06 pm
While including the ‘offending’ text in the mail you attempt to send to webmasters is laudable, I think doing so also has perhaps the highest possible chance of getting your mail filtered as spam.
As someone else here pointed out, things like Javascript, and especially functions used most commonly for obfuscation (such as String.fromCharCode() ) along with the various ‘naughty words and phrases’ found in the links would be highly likely to get the mail filtered as spam (for the very same reasons that google’s own filters twigged to that bit of code as offensive).
A much better route might be to send a fairly generic message, and include a link, or directions (perhaps just going to google.com and pasting a given guid into the search field?) that would let the webmaster see what was wrong. The key would be to do this in such a way as to not appear to be phishing mail either… since mail telling people there is something wrong with their site and immediate action needs to be taken also sounds a hell of a lot like what some phisher might do.
Larry Hosken Said,
December 4, 2006 @ 9:22 pm
As a professional technical writer, I am shocked. I am shocked and dismayed. Those hackers misspelled “bestiality.” Three times! It makes my blood boil.
Wesley R. Elsberry Said,
December 4, 2006 @ 9:25 pm
That would have worked for me. The WHOIS email contact for the TOA is a real live email address.
But it sounds like Google did make an attempt to communicate pre-de-indexing. That’s a hugely good thing.
My problem was that once that initial attempt at contact failed, my site was de-indexed, and there was no option available for me to learn about what Google had already decided to make known pre-de-indexing. Indeed, I couldn’t find out even that Google had considered making more information available pre-de-indexing. If that level of specific information goes into Google Webmaster Tools, as it sounds like it might, that would prevent the situation I found myself in from happening in the future. Like I’ve said, for my site, the problem was easily found and fixed. That would not necessarily be true for other webmasters whose sites get cracked. Will Google run into more webmasters who would legitimately benefit from having that sort of warning message with its specific information on what and where a problem lies waiting for them in the Google Webmaster Tools? I think that the answer is clearly, “Yes.”
Multi-Worded Adam Said,
December 4, 2006 @ 10:13 pm
You know, you’re the first person besides a guy I play fantasy baseball with that has actually gotten that reference. Mountain View Massiv reprazentin’!
One day, he’s gotta interview you. That’d be some great stuff.
Thanks for the explanation…and once again, respek.
Wesley: what you’re saying does make some sense, and in your case I can see why you’d be upset. I feel bad for you…straight up. I had a server I was working on hacked once (5 years ago, when I was stupid enough to host with Interland), and it sucks.
The problem is that your logic, as reasonable as it is, is exactly backwards. And I’ll explain why, since you’re obviously new to this.
Let’s outline a scenario of Google vs. a spammer. And let’s say Google does the full disclosure routine and lets spammers know when they’re messing around:
Spammer: “Let’s try repeating this word 10 times in succession.”
Google: “You’re spamming.”
Spammer: “Okay, I fixed it.”
Google: “Yep, it’s okay now.”
Spammer: “10 times didn’t work, so let’s try 8 times and let’s try a different word so no one clues in.”
Google: “You’re spamming.”
Spammer: “Sorry, someone else edited my content, I was unaware.”
Google: “Yep, it’s okay now.”
Spammer: “Okay, let’s try it 6 times.”
…
And so the cycle repeats itself. That’s just one scenario. There are infinite ways in which this could play out…spammers are just that twisted a lot (fun to mess with sometimes though, if you’re creative about it).
Also, for every time big G sends a spammer (or you) an email, the time and money it took to send that email could be put to much better use nailing the spammy stuff in the first place. That’s why the webmaster console, as relatively obscure as it is, is the best way to have handled that situation: they can still “talk to webmasters” about the legit stuff without sending individual emails out, and they can still tell spammers the nothing they so richly deserve to hear.
Communication really is a double-edged sword in this case. Unfortunately in your case, so is a lack of communication (from your side of it), but as Matt explained and you verified, there were attempts and it really wasn’t big G’s fault you got hacked. They’ve got users to protect and all that.
I’ll give you a little free piece of advice which I don’t think anyone else has yet: you may want to look at finding another host. If you want to find a good one, go to http://www.webhostingtalk.com and try to find one that hasn’t been torn to shreds yet. If they like the host, it’s a damn good host (because there are some HARSH members on that board.)
Apologies if you took what I was saying as a criticism or a knock at you. I really didn’t know the situation, and there are a lot of … individuals … out there who would choose to do exactly what I said, and that’s to take a site that has good intentions and send it on the highway to Hell.
Multi-Worded Adam Said,
December 4, 2006 @ 10:16 pm
Most people would be offended at the mistreatment and abuse of animals, and this guy’s P.O.ed that Hooked on Phonics didn’t work for the hacker.
Priorities, dude. Priorities.
ToddW Said,
December 4, 2006 @ 10:47 pm
WOW. I can not believe Google did so much!
Wemblei Said,
December 4, 2006 @ 10:50 pm
What annoys me is that Google only contacted him because he made a big fuss and because he had the popularity to get media or blog attention. Google would happily ignore the complaints of a smaller site, and would not give them the time of day, let alone contact them.
I don’t blame you Matt because you’re mostly SEO PR for Google, but don’t spit in my ear and tell me it’s rain. What is Google doing for the little guy, or rather smaller sites? The sites that aren’t PR 7+, they still make money for Google. They still provide you with content, and they still help you in your battle against net neutrality so that you can continue reaping the large profits that telecoms want to take from you. What do they get in return? Will Google help them in the future? Will it give them the same treatment that it offers to larger sites?
Should we not have a system where the small sites are not assumed to be spam sites by default?
Simon Said,
December 4, 2006 @ 11:11 pm
support@ info@ and webmaster@ ??
RFC 2142 addresses that might apply: webmaster@, abuse@, security@
I think anything beyond mailing webmaster@ was overkill as regards the RFC based addresses. Google could have used the whois data, as we know Google has its hands on it.
I’m always amazed Google doesn’t punish our site more for the rubbish some of our users publish. Something is still working well, even if I can’t persuade the index to drop the spammy bits we deleted months ago.
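For readers who haven’t looked at RFC 2142: it reserves a fixed set of role mailbox names, and info@, support@, webmaster@, abuse@ and security@ are all on that list. Here is a minimal Python sketch of building candidate role addresses for a domain. The function name `role_addresses` and its default selection are made up for illustration; this is not a description of how Google’s emailer actually picks addresses.

```python
# Mailbox names reserved by RFC 2142 for common services and roles.
RFC2142_MAILBOXES = (
    "info", "marketing", "sales", "support",       # business-related
    "abuse", "noc", "security",                     # network operations
    "postmaster", "hostmaster", "usenet", "news",   # per-service support
    "webmaster", "www", "uucp", "ftp",
)


def role_addresses(domain, mailboxes=("webmaster", "abuse", "security")):
    """Return role addresses for `domain` (hypothetical helper).

    Names that RFC 2142 does not define (e.g. "contact") are skipped.
    """
    domain = domain.strip().lower()
    return [f"{name}@{domain}" for name in mailboxes if name in RFC2142_MAILBOXES]


print(role_addresses("talkorigins.org"))
```

Whether mail to any of these addresses is actually read is a separate question; the RFC only standardizes the names.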
Abhilash Said,
December 4, 2006 @ 11:32 pm
That arithmetic gets me every time. :’(
Matt, Adam, the comments & responses suggest we are at an impasse with respek to communications to webmasters: On one hand, webmasters with honest intentions and *without* knowing misbehavior require the notification that could very easily help prevent legit businesses from taking serious losses. On the other hand, you can’t commit to these notifications for fear of the A/B testing by the spammers (which even Vanessa mentioned today after the “Lunch with Google Sitemaps Team”).
You decide that webmasters who got hacked deserve to know. But webmasters who could be spammy don’t get the info. There’s a third group though! — Webmasters who inherit sites & clients trying to overcome problems created by previous inexperienced & shady SEOs.
In the case where honest businesses were swindled or mistreated by some spamhappy SEO and are now in the hands of a legit firm trying to straighten things out–there would be NO solutions for this poor shop. Aren’t these the folks that Google can’t afford to overlook? More folks are coming to me for advice on how to solve their problems: “Sorry,” I say, “I’ve done all I can & now there’s nothing left you can do. Now Google thinks you’re a spammer. Good luck ditching that domain you spent 15 years building.” Is that where we’re left?
Indeed, Google does a better job of notifying some select group of webmasters–but as an engine with more traffic & more impact than any other, Google needs to lead the pack here lest it be terribly embarrassed by the MSN squad (yeah right you say?…). Claiming that Google does more than any other is just like claiming that a Giant carries a heavier load than a dwarf (I hope that’s not offensive…). Of course he does & of course he should! But he could have still been inefficient and yet carried a heavier load.
We need some sort of middle ground of communication that allows for these previously mishandled sites some ground to rebuild their businesses within search–that can’t be done without Google’s traffic. OR should those shops call themselves doomed casualties of search?
hadaso Said,
December 5, 2006 @ 12:09 am
The fact that mail to the address published on the WHOIS database reaches “the technical contact or web host instead of only the site owner” should not in itself be a reason not to use the address for contact. The WHOIS record has three contact addresses (Registrant Contact, Administrative Contact, Technical Contact) and one of them should be the proper address to use. If a webhost doesn’t agree that their address would be in that field they should demand that the registrant provide a working address. And if they do agree to their address being listed, it means they agree to handle the mail sent to the address. In other words: the fact that webhosts/registrars don’t use the whois record properly, or offer “privacy” services by replacing the registrant’s data with their own data without dealing with the consequences of this offer, should not be a reason for Google not using the proper contact address that is published for the domain.
Generic addresses like webmaster@, hostmaster@, contact@, sales@, info@ etc. are blocked by many domain owners because they receive a huge amount of spam. Addresses on the whois database also receive spam, but they are easily replaceable by editing the whois record. Static generic addresses are not a good solution for providing contact info these days.
I use a forwarding address that is greylisted on whois and receive almost no spam. I have replaced whois address about once a year and it is working for anyone using acceptable email standards. (A tool that dynamically updates the whois database with semi-random temporary forwarding addresses would be better but I don’t know of any such tool).
Itai Levitan Said,
December 5, 2006 @ 12:41 am
It seems that Matt is doing a very good job. This is an important post that I will pass on to my relevant colleagues.
I can also understand the frustration of the talkorigins.org’s owner, after the hack. May I suggest that Google launches this feature:
IF there is no indication that the site owner/admin received the alert from Google Webspam Team, then allocate X days (e.g. 7) where the SERP (spammy) target page redirects to a page that says something like:
“Innovative Google technology on the inside keeps spam on the outside. Spam was detected on this page. Please click here to go back to the results.”
Plus - have adwords ads by that message. I think it is fair to monetize this Google page since you are putting an effort to give the (unaware) webmaster a last X days (better than 60 days, as Matt wrote). It also communicates to the world that Google is doing something about spam, while being extra fair and giving the (hacked in this case) owner a last chance.
As people here said, the owner may not be getting emails from Google. It is a bit of a problem to rely on email only. Also, Pete (first talkback) suggested a good idea - to give an alert on the Webmaster console. Well, giving an alert on the actual (redirected) target page is also another communication channel.
Cheers,
Itai
Co-CEO
easynet search marketing
Aaron Said,
December 5, 2006 @ 1:06 am
As the owner of Theoi.com I was just as shocked as talk.origins when my site suddenly disappeared from Google.
Although I’m pleased that I was contacted yesterday by Google reps to help sort out the problem.
Have the other sites also been contacted? The spam script at the top of this page clearly lists the sites affected by the hacker. Like theoi.com, they also appear to be innocent victims of the hack:
vvu.edu.gh
deepx.com
ugobe.com
Matt Cutts Said,
December 5, 2006 @ 1:40 am
Twan, I think that it’s a misperception that we require the webmaster to admit guilt on their part. I believe we already softened that language once. Let’s see. Right now the language is “I believe this site has violated Google’s quality guidelines in the past.” which could apply to e.g. a hacked site without implying that the webmaster did anything bad. Maybe we could still soften that language more though.
NTulip, the case you mentioned hasn’t happened in my experience. In any case, normal phishing for account info would probably be more likely, since the fraction of people enrolled in the webmaster console is lower than the total number of people with Google accounts.
JohnMu, I don’t know if we do SPF; personally I like the idea that people that sign up for the webmaster console have an incentive to give a solid email address so that we can contact them if we see issues. Well put as well, Anax.
Curtis Cameron, once a site is hacked, it can often also host malware. See e.g. http://news.netcraft.com/archives/2006/09/22/hacked_hostgator_sites_distribute_ie_exploit.html for a recent example of that. We don’t have the cycles to dig through a hacked site to make sure that it’s 100% benign. In addition, you get these hacked sites showing up for off-topic searches. With the hacked text above, talkorigins.org might show up for “psp downloads” or “psp games,” which is clearly not a great result for users since the site doesn’t have either of those.
Zandr, I’ll check on making it a well-formed URL in our emails to improve the clarity of the message.
Gregg, good question. We do have a list of common email addresses that the emailer can select from. In addition, as we crawl the web we can detect some email addresses that are up on web pages. So potentially we might be able to email more specific aliases, assuming you left the email alias on the web somewhere.
Amit Patel, the page you mention already quotes an opinion I gave that the site may have run into issues because of the thousands of pages of duplicate hotel information on idcide.com. The site is showing up in Google search results now, so it’s unclear to me what other resolution you’d like regarding this site?
Troy Roberts, I would recommend not letting spammy posts linger on a forum if you can help it.
Seb, in this case the email from Google did list the penalty length: 60 days (unless the site owner fixed the issue and did a reinclusion request).
Chuck v, I agree that it would be better to make the email more generic and include a link to more details.
Larry Hosken, it’s not a huge surprise. Spammers deliberately misspell “bestiality” because lots of people type in “beastiality” instead.
Multi-Worded Adam, you give a good example of how 100% transparency would immediately help spammers.
Abhilash, I think the general trend is toward more webmaster communication at Google. Threads like this show that communication has been successful and that we need to do more of it, and do a better job on it.
Yuri Said,
December 5, 2006 @ 2:48 am
Sorry for not reading the whole 100 comments, Matt. But.
Whatever bashing occurred on Google is some misunderstanding, to say the least. The original post seems to take quite an unfair stance on Google. Why not just check the website against all webmaster guidelines, instead of whining? A quick glance at a page’s code should reveal everything one needs to know.
Though you don’t need my protection, be aware that some SEOs aren’t that evil and try to view the whole thing from the both sides of the fence.
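To make the “quick glance at a page’s code” point concrete: the obfuscated script quoted at the top of the post only hides one small piece of HTML. Below is a minimal Python sketch that decodes `String.fromCharCode(...)` calls found in a page’s source. The helper name `decode_from_char_code` is made up for illustration, and mapping each code with `chr` is only an approximation of JavaScript’s behavior that holds for plain ASCII codes like the ones in this hack.

```python
import re


def decode_from_char_code(script_text):
    """Decode every String.fromCharCode(...) call in script_text (hypothetical helper)."""
    decoded = []
    # Capture the comma-separated numbers, which may be wrapped across lines.
    for args in re.findall(r"String\.fromCharCode\(([\d,\s]+)\)", script_text):
        codes = [int(n) for n in args.split(",") if n.strip()]
        decoded.append("".join(chr(c) for c in codes))
    return decoded


# The script as quoted in the post, including its line break.
injected = (
    "<script>document.write(String.fromCharCode(60,100,105,118,32,115,116,121,108,101,61,39,100,\n"
    "105,115,112,108,97,121,58,110,111,110,101,39,62))</script>"
)

print(decode_from_char_code(injected))  # ["<div style='display:none'>"]
```

In other words, the script writes out an opening div with `display:none`, so the spam links that follow it are in the page source for crawlers but are not shown to visitors.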
Heh Said,
December 5, 2006 @ 3:21 am
I have no comments, but I’m proud to be the 100th comment.
eric shannon Said,
December 5, 2006 @ 6:21 am
Hi Matt,
I am developing a community health website at http://www.medicine.org and recently discovered that the site was removed from the google index.
From the best we can tell, this happened when one of Valueweb’s (our host) customers was caught sending spam from, or spamvertising their ValueWeb-hosted site. Then SORBS blacklisted the entire netblock, falsely implicating many innocent websites, medicine.org included.
Next, Google delists all the websites blacklisted by SORBS. In our case, http://www.medicine.org is a non-commercial healthcare website without any commercial activity. It doesn’t send email, doesn’t sell products or display any advertising.
Even though our server never relayed any spam, we still discovered some server settings needed adjusting and have made those adjustments. We’re in the process of writing a letter to Google to explain and will start using the webmaster console as a result of reading your blog.
This experience has been quite a setback but we’re learning from it and with your help, hope to overcome it soon. I blogged about it here - http://www.internetinc.com/banned-by-google and will post a follow-up on the resolution as it happens.
Thanks for giving us some more direction.
-eric
David Castle Said,
December 5, 2006 @ 6:45 am
Matt, as always more than just useful - many thanks
Elias Kai Said,
December 5, 2006 @ 6:51 am
Well Done Matt. Bravo and I think it is the best way to run and protect users, webmasters. I mean If it wasn’t Google doing this and warning webmasters, who else can do it for you ? Thanks a lot.
As I said lately, all the spammers on Google’s SERPs come from attacks on .edu sites and old free CMS systems, where holes are still an easy way in for hackers.
We stand by Google for this fight !
And as I mentioned before, was this your reason for not going to SES in Chicago? http://www.mattcutts.com/blog/ses-chicago-this-week/#comment-90494
Bepenfriends Said,
December 5, 2006 @ 7:18 am
Hi Matt,
After reading your post I have a question. I have a blog about entertainment. I was unable to stop the spammers (mostly viagra guys) from posting in the blog comments area. Even though I neutralized the links by removing the protocol and the a tags, those pages still display their comments (it is a big list).
Will this penalise the webmaster because there is viagra and other blocked content?
Albert
Séan Said,
December 5, 2006 @ 7:23 am
Well done, great work and great post.
It is really very good that such things get posted up. This is a clear indication of how things are now, and how they might improve yet further in the future.
Wesley R. Elsberry Said,
December 5, 2006 @ 7:26 am
Adam,
I’ve had long experience in having abuse thrown at me. Peruse the TOA and you’ll see why. This latest round simply adds another group that wants to hurl abuse, as the comments on my weblog post show.
The TOA is hosted at Lunarpages.com. A brief look at your link seems to show primarily good things said about them as a hosting company. I did check out a variety of sources for reports on hosting companies before moving the TOA to Lunarpages a few years ago.
The discussion seems to have brought out that Google actually can distinguish cracking to some degree from deliberate SE exploitation. I think finding the right balance between giving adequate information to those who show signs of having been cracked and denying a fine level of information to exploiters may not be the impossible task that some have characterized it to be. Maybe I am simply naive about this. I guess we’ll see what Google brings about in the coming months concerning de-indexing policies.
Multi-Worded Adam Said,
December 5, 2006 @ 8:32 am
As long as you’ve got a skin for it, Wesley. You’re gonna need it for at least the next 3-4 days. It sucks to be you, although I kinda wish I were loved enough to be that hated.
*** Oversimplified explanation to follow ***
You’re right in that you’re (slightly) naive, but it’s not your fault. The problem, and the reason I asked the question in the first place, is that there are a lot of people out there who would simply put