If you’ve never read my blog before, welcome. I’m the head of the webspam team at Google. And I have a blog for days just like this.
Okay, first off you should go read this post. It’s entitled “Me Against Google” and the author is unhappy that talkorigins.org was nowhere to be found in Google for the last 5-6 days. After that post, go read this Slashdot post, entitled “Google De-indexes Talk.Origins, Won’t Say Why.” By the time you’re done, your pulse should be pounding. Hell, you should be angry. Damn that evil Google for not communicating with webmasters!! Or as Wesley put it in his blog:
You might think that a company that prides itself upon advanced textual analysis and automated decision-making algorithms might provide helpful warning messages to webmasters concerning problems found in their sites. You would be wrong.
Okay, ready for my side of the story? Here’s the timeline of how things happened:
– talkorigins.org was hacked on November 18th. I know this because Wesley says so in his blog post.
– By November 27th, Google had detected spammy links and text on talkorigins.org. In case you’re wondering, here’s what the cracker added:
<script>document.write(String.fromCharCode(60,100,105,118,32,115,116,121,108,101,61,39,100,
105,115,112,108,97,121,58,110,111,110,101,39,62))</script><br><a href="http://vvu.edu.gh/images/?i=animal-porn">animal porn</a>, <a href="http://vvu.edu.gh/images/?i=animal-sex">animal sex</a>, <a href="http://vvu.edu.gh/images/?i=beastiality">beastiality</a>, <a href="http://vvu.edu.gh/images/?i=rape-sex">rape sex</a>, <a href="http://vvu.edu.gh/images/?i=sleeping-sex">sleeping sex</a>, <a href="http://deepx.com/images/?i=animal-porn">animal porn</a>, <a href="http://deepx.com/images/?i=beastiality">beastiality</a>, <a href="http://deepx.com/images/?i=dog-porn">dog porn</a>, <a href="http://deepx.com/images/?i=horse-porn">horse porn</a>, <a href="http://deepx.com/images/?i=rape-sex">rape sex</a>, <a href="http://deepx.com/images/?i=sleeping-sex">sleeping sex</a>, <a href="http://theoi.com/image/?i=animal-porn">animal porn</a>, <a href="http://theoi.com/image/?i=animal-sex">animal sex</a>, <a href="http://theoi.com/image/?i=beastiality">beastiality</a>, <a href="http://ugobe.com/media/?i=dvd-covers">dvd covers</a>, <a href="http://ugobe.com/media/?i=dvd-ripper">dvd ripper</a>, <a href="http://ugobe.com/media/?i=psp-downloads">psp downloads</a>, <a href="http://ugobe.com/media/?i=psp-games">psp games</a>, <a href="http://ugobe.com/media/?i=psp-movies">psp movies</a>
Not pretty stuff–lots of text about rape and animal porn. In case you’re wondering, that JavaScript at the beginning produces the string “<div style=’display:none’>”, which makes the entire section of spammy junk hidden. So talkorigins.org has these porn words and spammy links, and it’s all hidden via sneaky JavaScript.
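If you want to verify the decoding yourself, a few lines of JavaScript (the cracker’s own language of choice) will reproduce it. This is just an illustration of what the browser computes, not anything Google runs:

// The numeric arguments from the injected script:
var codes = [60,100,105,118,32,115,116,121,108,101,61,39,100,
             105,115,112,108,97,121,58,110,111,110,101,39,62];
// String.fromCharCode maps each number to a character:
// 60 -> '<', 100 -> 'd', 105 -> 'i', 118 -> 'v', and so on.
var decoded = String.fromCharCode.apply(null, codes);
// decoded is "<div style='display:none'>", so every link that follows is
// invisible to human visitors but still sits in the HTML for crawlers.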
We have pretty good reason to believe that this site was hacked, but it’s still causing problems for regular users, so Google has to take action. Here’s what we do:
– By November 27th, the site was classified as hacked and spammy. We stopped showing it for user queries.
– By November 27th, we started flagging this site as penalized in Google’s webmaster console. I believe that Google is the only search engine that will confirm to webmasters that their site does have penalties. No, we don’t confirm penalties if we think it might clue in web spammers that they’ve been caught. But yes, we do try to confirm penalties if we think a site is legitimate or has been hacked. You can read more about how we confirm penalties in this previous post.
I hear a few people ask, “It’s nice that I can sign up for Google’s webmaster console and learn that Google penalized my site. But couldn’t Google have done more?” Well, it turns out that we did do more:
– By November 28th, we emailed multiple addresses at talkorigins.org to let them know exactly what happened. According to the records I’m looking at, we tried to email contact at talkorigins.org, info at talkorigins.org, support at talkorigins.org, and webmaster at talkorigins.org with a timestamp of 2006-11-28 14:24:15. Here’s an excerpt from the email that we sent:
Dear site owner or webmaster of talkorigins.org,
While we were indexing your webpages, we detected that some of your
pages were using techniques that were outside our quality guidelines,
which can be found here: http://www.google.com/webmasters/guidelines.html
In order to preserve the quality of our search engine, we have
temporarily removed some webpages from our search results. Currently
pages from talkorigins.org are scheduled to be removed for at least 60 days. Specifically, we detected the following practices on your webpages:
* The following hidden text on talkorigins.org:
e.g.
animal porn, animal sex, beastiality, rape sex, sleeping sex, animal porn, beastiality, dog porn, horse porn, rape sex, sleeping sex, animal porn, animal sex, beastiality, dvd covers, dvd ripper, psp downloads, psp games, psp movies
…We would prefer to have your pages in Google’s index. If you wish to be
reincluded, please correct or remove all pages that are outside our
quality guidelines. When you are ready, please visit: https://www.google.com/webmasters/sitemaps/reinclusion?hl=en
to learn more and request a reinclusion request.
…
You can read more about how we try to email webmasters about issues on their site in this previous post. According to his post, Wesley did a reinclusion request recently, and I’ve confirmed that the reinclusion request was approved, so I expect talkorigins.org to be back in Google within 24-48 hours.
But let’s take a step back. This site was hacked and stuffed with a bunch of hidden spammy porn words and links. Google detected the spam in less than 10 days; that’s faster than the site owner noticed it. We temporarily removed the site from our index so that users wouldn’t get the spammy porn back in response to queries. We made it possible for the webmaster to verify that their site was penalized. Then we emailed the site, with the exact page and the exact text that was causing problems. We provided a link to the correct place for the site owner to request reinclusion. We also made the penalty for a relatively short time (60 days), so that if the webmaster fixed the issue but didn’t contact Google, they would still be fine after a few weeks.
Ultimately, each site owner is responsible for making sure that their site isn’t spammy. If you pick a bad search engine optimizer (SEO) and they make a ton of spammy doorway pages on your domain, Google still needs to take action. Hacked sites are no different: lots of spammy/hacked sites will try to install malware on users’ computers. If your site is hacked and turns spammy, Google may need to remove your site, but we will also try to alert you via our webmaster console and even by emailing you to let you know what happened. To the best of my knowledge, no other search engine confirms any penalties to sites, nor do they email site owners.
Wesley and anyone else who works on talkorigins.org, I’m sorry that this was a stressful experience for you. Could Google do a better job? Absolutely, and we’ll keep working on it. For example, maybe we can show a more specific message for hacked sites in the webmaster console. Google could also try to identify better email addresses when writing to site owners. For example, for talkorigins.org, there are email addresses such as “archive@” and “submissions@” that we could have used instead that might have reached the right person. I’m open to other suggestions too. But please give Google a little bit of credit, because I do think we’re doing more to alert webmasters to issues than any other search engine.
Note to new readers of my blog: I pre-moderate my comments, and it’s after 2 a.m. and I’m going to bed now. If your comment doesn’t show up immediately, it’s waiting for me to approve it after I wake up. 😉
Matt,
great posts, thank you for this nice example. However, I would avoid emailing webmasters. I would show the message in the webmaster console instead.
In my opinion, most webmasters don’t read such generic accounts as webmaster@ or info@, just because there is too much spam coming to those addresses. Or they use a spam protection that your bot can’t answer, and your emails will not be received.
Just my opinion.
Pete, I agree. In the past, we’ve tried to find email addresses mentioned on the sites themselves, which can work a little better because they aren’t as generic. But it’s true that the webmaster console is a good place to communicate with site owners; it’s just a shame that not everybody knows about it yet.
Why not email the address(es) that have that site registered with Webmaster Tools? Even a basic “Dear Webmaster Tools account holder, please visit Webmaster Tools ASAP” would be worthwhile.
Richie Hindle, I agree that’s a good idea, except right now it doesn’t require an email address to verify a site in the webmaster tools. So we don’t have the contact info in order to be able to email site owners. That would be nice to offer though, I totally agree.
Ugh, it’s past Pi (3:14), so I’m going to bed now. 🙂
Matt: “..doesn’t require an email address…” – I thought Webmaster Tools required a Google Account, which in turn required an email address?
Matt,
That’s very scary stuff. However, it’s great to see your team has its finger on the pulse.
I have to say that google’s response to the problem is well more than one could or should expect from a search engine. I’m surprised that the posting was slashdotted, given that the scenario presented is relatively common within the webmaster community. It’s great to see that google (Matt) is finding ways to improve the situation for webmasters, but by the same token, everything that google did besides filtering out or penalizing the site is above and beyond what they need to do.
Everybody is responsible for their own quality control. If you can’t manage that on your own, hire somebody who can.
Matt, thanks for the details.
But: if they have their site registered on a webmaster console account, why does google not contact the email address from that account? Wouldn’t that be the most reliable way to reach a webmaster in such cases?
Apart from that, a more detailed message in the webmaster console would of course be great.
Matt,
I guess most site owners get mad at Google from time to time, for one reason or another; I know I have. However, from your explanation it strikes me that Google has gone way beyond its duty to the site owner and has done everything it could while meeting its obligation to Google visitors.
Great to know that while fighting the war against spam, Google is also making huge steps towards improving communication with site owners.
Nice overview of the situation, Matt. An email notification would be great, or even something tied into Google Alerts.
In fairness, you guys did everything you realistically could and were pretty fair about it. I don’t see how he could get so upset about it.
Matt, this is by far the most detailed response I have seen in regard to what an SE does when a site has been hacked. Thanks so much for your candor on the subject. I agree that the webmaster console is the preferred tool to communicate this info, as well as an email to the acct on file for WT at Google. Great post!
Matthew
Love the post. Great insight into how google operates with spammed/hacked webpages.
I’ve got to say I feel a lot happier at how you are approaching these situations, and yes, maybe you could be better… but heh, you’re a hell of a lot better than I had imagined. Good job all round!
The first website I made was blacklisted because I used bad practices on it.
After that I made corrections, cleaned the website, and sent an email to Google. The website reappeared in Google some time after.
Now, after that, I think it’s normal and good that Google makes an effort to block websites using bad techniques, to give everyone fairly equal chances!
The only thing I regret is that no one notified me of my blacklisting or my reinclusion (that was one year ago).
heh, good old String.fromCharCode() — that’s a 100% block pattern in email, it’s only used in hostile HTML 😉
That slashdot blurb contradicts itself…
Rather mysteriously, Google pulled the plug on its search engine
This was apparently triggered by a recent cracking of the site that added ‘hidden links to non-topical sites…
I think this is one of those slippery slopes for Google. There is this “entitlement” attitude among some webmasters – and Google does a lot to promote that attitude in the name of “do no evil.”
Maybe because we’re small we’re a bit more humble. We were hacked on November 3rd and again on November 11th – after two passes at security we’re hopefully good now. But on inspection, all three SEs had “bad pages”, and we caught each hack within minutes of it happening.
Maybe it’s me, but I thought I was ultimately responsible for what my site was feeding robots.
Matt, you mean that you didn’t contact the man, request his ftp details and fix the changes for him!
Well done Google. This is very nice to see. The only shame is that Google couldn’t reply to the same queries made on the Google Webmaster Group, and it was left for you to respond here after the issue became more open?
Of course I’m still wearing my tin-hat after my post about Thinkhouse PR mysteriously disappeared completely from Google’s index after being crawled, indexed and ranked.
I hope that Adam might respond to my thread on the group when he returns from SES (as he was the one who put out the fire the last time a post about Thinkhouse PR mysteriously vanished from the index).
Notwithstanding, it’s great to see you guys improving your communications. Hopefully you’ll figure out a better way to become proactive rather than having to put out the fires.
Well done Google 🙂
“It’s past pi”… I LOVE IT!!! Gonna have to use that one, and then stand back and endure the abuse I’m sure to get.
Hi Matt !
What if the site is hacked again? What guarantee is there that the webmaster has fixed the site and prevented future hack attempts?
Matt,
Thanks for letting us know what you do in cases like this. I hope it’s a process I never need to go through. I think rather than work hard on finding better e-mail addresses, a more useful approach for improving this kind of communication would be to attempt to ensure the e-mail gets through spam filters. Looking at that text, there’s no way it would get through mine. It includes the following words, which are all nearly-perfect spam indicators according to my bayesian filter:
webmaster webpages porn sex beastiality rape
The following words are additional strong spam indicators:
results dvd ripper psp downloads movies Google
OK, so there’s probably not a lot you can do about the last one (in there because of loads of SEO spam I keep getting), but the problem is largely the stuff you’re quoting from the spam links. If I were setting this up, I wouldn’t include that info, but put it behind a link for the recipient to click to find out what was wrong.
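(If you’re curious how a filter like mine reaches that verdict, here’s a toy sketch of per-word Bayesian scoring. Every probability below is invented for illustration; it’s not my actual filter:)

// Toy per-word spam scoring; all probabilities here are made up.
var spamProb = { porn: 0.99, rape: 0.98, webmaster: 0.95, dvd: 0.85, results: 0.80 };
function spamminess(words) {
  var s = 1, h = 1;
  for (var i = 0; i < words.length; i++) {
    var p = spamProb[words[i]] || 0.4; // unknown words lean slightly "ham"
    s *= p;        // likelihood under "spam"
    h *= (1 - p);  // likelihood under "ham"
  }
  return s / (s + h); // near 1.0 means almost certainly spam
}
// spamminess(['webmaster', 'porn', 'rape', 'dvd']) comes out above 0.999,
// which is why Google's warning email would never survive this filter.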
Matt: Did you also ban ‘theoi.com’ due to this hacking issue? Theoi.com was linked from talkorigins.org during the defacement, but the webmaster claims he has nothing to do with this issue.
Above and beyond! Take no notice!
As one of the Archive foundation members, I have amended my own blog entry to note this, and retract my own complaint of Google failing to contact us. Wesley was also travelling during the relevant period, and may have simply missed the message, although I think it is more likely it got spam trapped. In short, it may have been an unfortunate confluence of circumstances.
My apologies.
Can you provide a more specific link to this “Google webmaster console” you describe? I visited http://www.google.com/webmasters/ and can see nothing that goes by that name. I tried the “site status wizard” but it doesn’t appear to give any information on whether or not a site is penalized. Otherwise, there are just some links to blogs, discussion groups, and a “tool” that lets you submit sitemaps. Thanks…
Great work and recount of events.
Matt, will your blog now be de-listed for impolite words like “animal porn”? Seriously, how do other websites cite these examples without getting flagged? Also, does it mean Google indexes JavaScript-injected text?
Thanks for the eye-opening post. Heading for the console now!
“Then we emailed the site, with the exact page and the exact text that was causing problems.”
Did you? I can’t find the exact page in the excerpt you posted. Or is it ‘talkorigins.org’? If so, it could use a more careful wording, because right now it could be understood to refer to the site, not the opening page, and “dear webmaster, there is this spammy text in one of your gazillion pages, but we won’t tell you which, have fun hunting it down” is not a very friendly message.
Also, in his post, the webmaster says he checked Webmaster Tools and didn’t find any reason for the deindexing, so that’s another point where you could improve communication.
The best solution would of course be to automatically differentiate between spammers and honest webmasters. It can’t be that hard to identify at least some of the legitimate users: if a site has run continuously, with high PageRank and no bad behavior for years, then it’s probably not a spammer. Of course spammers sometimes buy the domains of formerly legitimate sites, but that can’t happen in great numbers, and to find weaknesses in Google’s algorithms, one would have to run lots of experiments and get details out of Google many times.
Man, you’re so politically correct it’s hypnotizing, even when someone questions the quality of the work you do (though indirectly through an attack on the webspam team).
You’re definitely the kind of guy it’s impossible to have a long term problem with. The kind I’d invite over for a beer to watch a hockey game of the Canadiens if I could 😉
Awesome! Thanks a lot!
Impressive writeup – love to see more of these type of war stories Matt. I just submitted an update to Slashdot to get the real story out.
Tick…tick…tick…tick…tick…tick…*KABOOM*!
That’s the sound of webmasters whose sites don’t rank in Google exploding after reading this post, writing nasty post after nasty post in response to this as it couldn’t possibly be true nor could it explain why their spammy…errr…perfectly clean sites don’t rank the way they should.
Matt, you’ve got way more cojones than most people just for posting this. Respek.
The question that remains in my mind in all of this is: how do you know that it was actually hacked? There’s nothing saying that he couldn’t have just put those links in himself. He may own the porn stuff under a different alter ego and be using the clean site to promote the dirty ones. It’s not that difficult to pull off.
Is this a hack big G knows about? Is it on other sites? Has anyone else seen it? I can’t speak for others, but this is a first for me. I’ve seen hacks before, but usually they take over the whole site…there’s more in it for them that way.
I’m probably wrong, but people have done much more ridiculous things than that and blamed big G. So I figured someone should ask.
Matt,
Very encouraging post. It displays an extra level of understanding that you and the webspam team have regarding the fight against spam. Yes, all spam is bad, but not all spammy-looking sites or their owners are bad. Sometimes there is a legitimate reason for such things. More often than not, when someone in the Crawling, Indexing, and Ranking google-group posts a question about a site having problems, it’s due to something that happened on the site that they didn’t realize was spammy, and not as evil as it looks.
It should be noted as well that in the webmaster tools, under the statistics tab on the page analysis link, Google shows “Common words in your site’s content”. This is often a good place to spot potential problems. If “animal porn” shows up in your site about the migration of European swallows, and you are pretty sure you never wrote about that subject, well, now you know that Google for some reason believes you did. The example given here may just explain such an occurrence.
That is as comprehensive a reply as I have ever read from a major corporation. I don’t think anyone can have a reasonable complaint after reading this.
Fantastic post. Although I have sympathy for the guy, it is ultimately his responsibility to make sure his site isn’t spammy (as you rightly pointed out).
Google sitemaps has improved my own relationship with Google and how I optimize my websites; I recommend that everyone take it up. The improvements they have made in the past few months have been excellent, and it shows no sign of slowing (thankfully).
I think Google’s policies are spot on.
Stuart
Matt,
I think that the message you show as a warning is excellent. It clearly states what is wrong, with enough information to permit a webmaster to locate the problem.
I only wish that I had actually received it.
Before I made my complaint, I checked my incoming email. There was no sign there of an attempt to contact me from Google. Lunarpages.com, where the TOA is hosted, forwards email to my account.
This morning, I learned of this post, so I re-checked my steps. No, still nothing in my incoming mail. I looked for strings from within the warning, to see if the text came through without an obvious “google” connection. No luck on that, either.
I rely upon the Lunarpages email forwarding, but given this post, maybe I was wrong to do so. I logged into the domain’s Lunarpages webmail interface for the first time. I searched for anything with “google.com” in the from field. I searched for strings from within the warning message quoted above. Still nothing.
Bummer. That just leaves examining the SMTP records on my local email account. Google’s message should have been relayed by Lunarpages, so I looked for that in the SMTP logs. Still nothing.
My SMTP logs, BTW, do show rejects for hosts like wr-out-0708.google.com, which is apparently blacklisted at spamcop.net. The following shows rejects on the 28th with a “google.com” domain. I haven’t checked these for spoofs, but I assume that’s what’s up with these:
It would be ironic, though, if the warning message that could have short-circuited this whole affair was blocked because of spam filtering.
As for entitlement, I don’t think that I was out of bounds given the information I had to work with. Google certainly isn’t responsible for fixing the bad stuff that is on my site. I never said it was. Having a third party mess with the site caused the problem in the first place. Having tried to work with Google once the problem became known to me resulted in… nothing. Not until I complained about what happened.
I do feel a bit better to know that Google made an attempt at contact before de-indexing our site. And it is good to know that the site is scheduled for re-indexing within a couple of days, rather than the couple of weeks mentioned on the Webmaster Help Group. I wish Matt and the rest of the folks at Google success in making the process better in the future. If, when I did claim the TOA site via Google Webmaster Tools on Dec. 1, the text of that warning that you quote above had been waiting for me, I would have had no complaint to make. It seems to me that if Google is willing to send that level of information via email, then making it available to the verified owner of a site via the Webmaster Tools interface should not be a problem, either.
And, Adam, neither Google nor you can tell, from the problem itself, whether it is a deliberate cheat or an honest person being victimized. Google is entirely correct to protect their index by pulling sites that are not in compliance with their guidelines. I never said otherwise. My complaint was about the situation that Google’s policy of obscuring the de-indexing decision creates in the aftermath: one in which cheaters have an advantage over honest webmasters, since the cheaters know where in their pages the bad stuff lies, and the honest webmaster does not. Whether or not you accept that I qualify as an honest webmaster, the policy as it currently stands obviously puts honest webmasters at a clear disadvantage.
If you can’t get at the Google Account holder’s email for some reason, couldn’t you just have a field on the webmaster console where people enter an email address to which Google should send reports?
If Google needs to send a report and this field is filled in, that’s where it sends it. If it’s not filled in, or not on sitemaps, Google guesses like it does currently.
Either way, it would make sense to duplicate the message on the webmaster console as well.
Finding the balance between relevancy and spam ain’t easy for a search engine, yet there is definitely an argument (maybe over another burger some time, Matt) that search algos drive hacking.
I made a joke about Matt and porn star Bandi Belle in my blog, and for some reason all three search engines ranked it for all kinds of spammy stuff. To an untrained eye (which is 99% of admins, when dealing with google), it looks like Google is punishing me. My stats can confirm this if I believe it to be true, but I am not so sure yet.
Matt – is the Google algorithm so primitive as to confuse some text with an entire blog that has nothing to do with porn, or is “SEO” equal to or lesser than spam algorithmically?
Sorry if I sound a little bitter, looked at my stats today for SEO Buzz Box and went “What the..?”
New blogs have a tough time recovering from incorrect first impressions in your algorithm, Matt. Sorry if I am losing anyone; it’s hard to be clear on something that has sooo many variables….
So what you’re saying is that if a site has links to subjects that Google doesn’t approve of, and these links are hidden, you remove the site from Google’s results?
When is Google going to realize if you can’t legislate morality, you can’t algorithmically control morality, either?
I think it’s far too easy to be mad and place blame elsewhere rather than figure out the problem; this situation isn’t abnormal by any stretch of the imagination.
I also think the site owner(s) realized there was a problem but didn’t know how to fix it, so rather than figure it out and correct the issue, attacking a major company is easier – and more far-reaching. Plus it makes the individual site seem like a “woe-is-me-the-big-guys-are-picking-on-us-again” sob story rather than taking responsibility and correcting the issue on the dl.
HOWEVER, this could also be a way to gain press. Any press is good press, right? How much do you think the site’s traffic has increased, or will increase, over the next few days/weeks?
I’m sure I could get loads of extra traffic by picking a fight with Google too and blasting it across the internet. So someone’s either really lazy or really thinking.
Hi Matt,
I’m an admin at http://www.uncommondescent.com and we were deindexed by google in September. We never did find out what we did wrong. We used webmaster tools but got nothing but a cryptic “you did something wrong” message. We submitted a site map, changed our wordpress theme to a cleaner one, tried cleaning up our RSS support, told our authors to stop pasting articles with RTF format, and had our lawyer send a letter to an unauthorized mirror site which was duplicating our content without permission. After all that we still weren’t reindexed until November, and that, I suspect, was only because users who were shareholders in google phoned or wrote to investor relations asking why a blog with a 6/10 rank at google, run by a famous professor/author (William Dembski) and linked to by hundreds of .edu sites, had been delisted.
At any rate, in case anything like this happens again, I bookmarked this site so we can contact a human being. I understand why google handles all this by automated electronic means, but you really, really, really need to give out more information about the reason for delisting in the webmaster tools.
Hey everybody, thanks for the supportive comments. When you’re posting late at night, you’re never 100% sure that you’re making sense. 🙂
Richie Hindle, my impression was that we couldn’t use the email address registered with a Google Account without asking for permission first, but you’re right that we would have at least something to work with. I’ll check more on my side.
RedCardinal, most of the webmaster console team is en route to Chicago for the Search Engine Strategies conference this week, so I wouldn’t be surprised if the webmaster help group at http://groups.google.com/group/Google_Webmaster_Help was quiet right now.
varun, if a site gets hacked again, we basically repeat the same process. I’ve seen that happen before, e.g. if the site’s webhost has a larger security hole that hasn’t been fixed yet.
Jules, good point about the explicit language. I agree that the right direction is something more like a) having a reliable contact address and using it to b) send a letter that’s less likely to get blocked, but that has a link to more specifics.
Dirson, you can confirm for yourself that theoi.com was hit in the same wave of site hacks. MSN is pretty slow in this instance, so they have a copy of theoi.com from 11/24/2006. Here’s what I see at the bottom of the cached MSN copy of http://www.theoi.com :
<script>document.write(String.fromCharCode(60,100,105,118,32,115,116,121,108,101,61,39,100,105,115,112,108,97,121,58,110,111,110,101,39,62))</script><a href=’image/?i=animal-porn’>animal porn</a>, <a href=’image/?i=animal-sex’>animal sex</a>, <a href=’image/?i=beastiality’>beastiality</a>, <a href=’image/?i=zoophilia’>zoophilia</a>, <a href=’image/?i=horse-cum’>horse cum</a>, <a href=’image/?i=horse-fuck’>horse fuck</a>
We found the hacked spammy content on theoi.com at the same time, and we sent an email to the same contact addresses at theoi.com with the timestamp of 2006-11-28 14:25:06. The only real difference in the email was “* The following hidden text on theoi.com:
e.g.
animal porn, animal sex, beastiality, zoophilia, horse cum, horse fuck”
I’ll check to see if theoi.com is clean now and file a reinclusion request for them if they are.
steve, to find the webmaster console (as I call it), go to http://www.google.com/webmasters/ and follow the link to http://www.google.com/webmasters/sitemaps/ . It’s the middle link on the left side. The official name is “Webmaster tools” but we also sometimes call it the “Webmaster console.”
Tgr, fair point that we could probably make the email message more clear. We do mention the exact url with the problem, but instead of “The following hidden text on talkorigins.org:” we might be able to say something like “The following hidden text on the specific page ‘talkorigins.org’:” or something a little more explanatory.
Multi-Worded Adam, is “respek” an Ali G reference? Respek back for that. 🙂 The short answer is that we had pretty high confidence that this was a real hack because we’ve seen things like this before (e.g. I mentioned that the same cracker hit theoi.com).
JLH, your suggestion to use the “Common words on your site” feature of the webmaster console to self-diagnose hacks is a fantastic one. In fact, I was planning to do a post along these lines. If it makes sense that hacked sites would show off-topic junk words in this section, now go read http://www.seroundtable.com/archives/006782.html . Notice the second complaint? Barry notes a site owner on WebmasterWorld who says “When I look at the keywords for the new site in the Google webmaster tools, the list is packed full of words… that have nothing to do with my site! They seem to be about cruises, sports, casinos, and various commercial and financial matters.” I’d be willing to bet a shiny quarter that the site in question was hacked and has a ton of pages with junk content.

Another way to self-diagnose is with the site: query. If you see pages with weird extensions such as .dhtml that you don’t normally use, and the pages look like pay-per-click (PPC) pages on your site, you’ve probably been hacked. Check your root page and check your .htaccess file as well.
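One more self-check if you’re comfortable with a little scripting: load a page you’re suspicious about in your browser and look for hidden blocks stuffed with links. Here’s a rough JavaScript sketch; it’s only an illustration, not an official Google tool:

// List any hidden <div> that contains a suspicious number of links.
var divs = document.getElementsByTagName('div');
for (var i = 0; i < divs.length; i++) {
  var hidden = (divs[i].style.display == 'none');
  var links = divs[i].getElementsByTagName('a').length;
  // An inline display:none block full of links is the signature of this hack.
  if (hidden && links > 3) {
    alert('Hidden links found: ' + divs[i].innerHTML.substring(0, 200));
  }
}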
John Wilkins and Wesley R. Elsberry, thanks for stopping by to comment. Sorry if my post came across as brittle. It was just late at night. Wesley, I promise that I’ll try to get a “your site has been hacked” message into the webmaster console so that when someone claims their site, it’s much more clear what happened.
Why not display the text of the email in the webmaster console too?
a. This takes care of bouncing/bad email addresses.
b. No one can claim that google doesn’t explain what the actual problem is.
All said, the original post appears rich! Wesley is unhappy with google’s communication while relying on a service which happens to prevent communication w/google 🙂
“Richie Hindle, my impression was that we couldn’t use the email address registered with a Google Account without asking for permission first, but you’re right that we would have at least something to work with. I’ll check more on my side.”
Have a checkbox in Google accounts saying: “Check here to be contacted if we find illegal activity on your website or blog”.
Now what about my darn question? I feel like a child pulling on dad’s pant leg. If you do not answer the questions, Matt, others will, and as you are seeing, they are often incorrect. You don’t want that, homie! ;o)
As somebody who has a WordPress installation and knows that spammers try to ‘hide’ their real domains among links to legitimate websites, I hope that the Google team is aware of this strategy and that no innocent website gets punished.
Hi Shelley, thanks for stopping by! You said “So what you’re saying is that if a site has links to subjects that Google doesn’t approve of, and these links are hidden, you remove the site from Google’s results? When is Google going to realize if you can’t legislate morality, you can’t algorithmically control morality, either?”
The fact is that when someone uses Google to find a site and then they discover that the site has hidden text, they get angry. They feel as though they were deceived, even though the vast majority of the time the hidden text wasn’t a factor. Then those angry users send us email. 🙂 Hacked sites are even worse, because a large number of hacked sites try to install malware when an innocent person reaches the hacked site.
So yes, Google provides quality guidelines that say things like “Don’t show hidden text. Don’t cloak. Don’t do sneaky JavaScript redirects. Don’t put viruses or malware on your pages.” And we can take action when we see things that violate our webmaster guidelines, just like every other major search engine does.
Webmasters are welcome to do whatever they want on their own sites, but surely you have to allow Google to do what we think is needed to provide a good experience for searchers on Google, too? You wouldn’t require us to list a site that we thought was bad for users, would you?
Good stuff!
One comment I haven’t seen here yet: you may also want to include the domain’s whois contact(s) in the email you send out. That, at least, is well-defined and supposed to work, while things like contact, info, support and even webmaster may not exist.
– Michael
Michael Schaap, we did that for a little while, then someone on (I think) WebmasterWorld got angry because the email reached the technical contact or web host instead of only the site owner. So for the most part we don’t do that anymore.
Uh, no. Finding the problem in my case turned out to be simple. From the time that I found out about the de-indexing of the TOA to submitting the reinclusion request was maybe three hours, tops. That included fixing the problem and a wait to make sure that the same problem that was in our default main page was not in the other 5,000+ pages in our archive. That’s not what I am concerned about.
There was no provision for me as a webmaster with a de-indexed site to communicate with Google about it. I looked for that. Maybe I missed something. I had no great desire to have the abuse and sneering heaped upon me that I knew would result, but I felt morally bound to speak truth to power. I doubt that I will convince you of that, but that is the case.
In the situation I found myself, Google’s policy on the obscured de-indexing decision privileged cheaters over honest but victimized webmasters. I see that as a problem. Matt’s account of Google’s attempt to contact me pre-de-indexing does show that they are trying to provide good information to webmasters, which allays my concerns somewhat. As I mentioned above, though, that information was not available through the Webmaster Tools interface, which was my only conduit to information from Google about my case at the time.
I guess it all depends critically upon whether one sees a problem in the way that Google obscures a de-indexing decision and limits how webmasters obtain information about it. If one thinks that the current system is perfect, then any complaint must necessarily be due to laziness or avarice. If we can agree that there may be improvements to be made (as even Matt Cutts says in his post above), then there is a third option.
And for the record, I agree with Wesley. Our alerting process is better than other search engines, but it’s still not where (I personally believe) it should be. It’s from hearing complaints and feedback like in Wesley’s post that Google can prioritize what things need to be done next.
If we ever reach a point where users and site owners don’t complain about things that Google should be doing, or urge Google to improve its processes, that will be a very sad day in my book.
Wow, I bet they feel a bit silly now for making out they are being victimized by google.
You know, there really should be some sort of notification in the webmaster console. I’ve gone through the ordeal recently myself and documented the whole process.
http://www.wolf-howl.com/seo/is-my-website-banned-in-google/
http://www.wolf-howl.com/seo/my-website-isnt-banned-in-google/
The oddest part of the entire ordeal was that pagerank didn’t go graybar. Is the condition where site:example.com returns 0 results but pagerank stays, indicative of this type of banning?
Matt,
Thank you for the gracious comment on my feedback. It is all too rare to have a pointed complaint like the one I made turn into a dialogue. My best wishes to you in making Google even better.
Nice summary, Matt. Good ideas everyone. Looking back just 1-2 years, all of this would have been top-secret. It’s good to see that Google is communicating more – keep it up.
How would you more subtly handle hidden links like those on http://www.unesco.org/webworld/portal_bib/pages/Cool/ ? The larger the site, the less likely you’ll EVER reach a webmaster who knows what it means and can clean it up. I’ve tried for the last year or so, never an answer.
How can you be certain that you reach someone in charge who can handle it correctly?
Wesley R. Elsberry: “My SMTP logs, BTW, do show rejects for hosts like wr-out-0708.google.com, which is apparently blacklisted at spamcop.net
…
It would be ironic, though, if the warning message that could have short-circuited this whole affair was blocked because of spam filtering.”
I’m afraid it looks like that is exactly what has happened. Spamcop has the Google server’s IP blocked (actually there are several IPs, only some of which are blocked), because email has been received from it at a spamtrap.
Google’s system is guessing email addresses and sending automated, and unsolicited, messages to them. That pretty much matches the definition of spam, and is certainly enough for a Spamcop listing. It’s quite possible that some of Google’s messages will reach real addresses who don’t want to hear from Google, which is not good.
On the other hand, I can understand why this “spam” may be desirable – as this example shows, the webmasters may well actually want this information, but they don’t know they want it until they’ve got it, so they aren’t really in a position to solicit it.
I don’t imagine the Spamcop folks would do anything to change this listing. It would be good though if those emails weren’t coming from the same IP addresses as any other email, to avoid other kinds of email getting caught in people’s filters. On the other side, people using Spamcop’s list for filtering should also be aware of how the list works, and that the fact that it is very aggressive means it isn’t appropriate in all circumstances – in particular, it may be better to use it as part of a spam scoring system, rather than blocking connections outright.
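(For anyone unfamiliar with the mechanics, a blocklist like Spamcop’s is just a DNS zone: a receiving mail server reverses the octets of the connecting IP and looks the result up as a hostname. A tiny sketch, using a documentation-range IP rather than any real Google address:)

// Build the DNSBL query name for an IP address (illustration only).
function dnsblName(ip, zone) {
  return ip.split('.').reverse().join('.') + '.' + zone;
}
// dnsblName('192.0.2.1', 'bl.spamcop.net') -> '1.2.0.192.bl.spamcop.net'
// If a DNS A lookup of that name returns an answer (typically 127.0.0.x),
// the IP is listed; if the lookup gets NXDOMAIN, it is not.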
My site, xtort.net, was also dropped from Google’s index early last month. I am starting to think that our host may have been hacked in some way as well, because I have never and would never do anything that would deliberately contravene any of Google’s webmaster policies, and I haven’t made any site changes in a few years now.
I did a reinclusion request a month ago just in case someone maliciously filed an exclusion request or ran some sort of script like the one mentioned above. I still have not heard anything from that though. I wish that there was some way to ascertain what went wrong, so I can safeguard against something like this happening again.
One of the biggest things that bothers me is the lack of contact information on many websites. If the webmaster of the site in question had his contact information prominently displayed, he would have been notified and the problem would have been avoided or quickly resolved.
I always include an about or contact page on my sites. Not only does it allow your users (remember, we are all nothing without our users!) to contact you but I think it builds confidence.
Christer Edwards,
The TOA does have a contact page:
http://talkorigins.org/origins/contact.html
It is linked from our main page, link text of “Contact Administrator”. I just checked the email addresses given, and they are working.
Michael Lefevre,
Thanks for the info on the SpamCop thing. Given that Matt didn’t mention any attempt to send email directly to my local email address, the SpamCop blacklisting should not be associated with the warning email attempt. The timestamps don’t appear to match, and the SpamCop rejections don’t correlate with forwarded email from Lunarpages.com.
Matt —
Thanks for this view into how and what Google does with spammy sites. Now I know, if my site gets hacked and I get dumped, to check the webmaster panel.
Overall, just really appreciate the chance to see some of the inner workings of Google anti-spam activity.
Great post Matt. Very interesting indeed. Though I must say that, for some reason, I find it very disturbing that Google requires a penalized webmaster to basically admit guilt before there is any possibility of being re-indexed. If someone got hacked, it doesn’t seem very fair or equitable that they MUST admit guilt. Seems to me to be a bit extreme, even a bit fascist.
Consider the process that google uses: does google see hackers/phishers taking advantage of it, sending emails to potential gmail.com users and tricking them into providing their google account and password through a look-alike site?
I would recommend google take precautions against that if not already done.
Would digitally signing the mails make any sense? Imagine the amount of email spam that is going to use that exact text to sell you the SEO services you always needed…. “To get reincluded, please contact us at 000-000-0000 after you transfer $$$$ to our off-shore bank account”. 🙁 I imagine a signed mail would have a better chance at making it through the mail filters (perhaps). Does Google do SPF?
Coming from the RBL side, I know EXACTLY what you are going through. We get angry delist requests from domains screaming “Why didn’t you tell us?!”
Frankly, it’s not our job. There are far too many spam domains for us to babysit them all. Google went above and beyond what they had to do.
Keep up the great work.
–Chris
Posting site-status messages on the webmaster console is a great idea, and is probably all that is needed. If Google were to contact webmasters by email I’d feel better having Google use the email address on file with the webmaster console than the one in whois data. I keep my whois data private for a specific reason: my site is devoted to exposing scams, and when it was part of my personal site (with whois data available) I often got threatening messages from scammers. This convinced me of two things: (1) this was a worthwhile site, since some really bad people were trying to shut it down, and (2) keeping whois data private does *not* necessarily mean a site is spammy: it may be necessary and legitimate protection for an honest webmaster.
I actually would like to point out that the spamcop listing of google has been a big topic on one of the antispam lists. Antispam ppl are split about the listing. Some think it’s correct, others think it’s bad.
I find it ironic that google doesn’t contact spamcop to get the listing removed. And change their practices to keep them from getting listed again. 😉
–Chris
Matt,
You mention that the hidden links to porn spam were causing problems for users – what are those problems? If those links were all hidden, causing users not to notice (I assume users would not have noticed, or else Wesley would have heard immediately from them), and the site content creators didn’t even notice, then how is that such a big problem that it warrants deletion from your index?
Matt – re: DaveScott and Uncommon Descent
Google probably deindexed them for excessive stupidity, as Dembski and DaveScott Springer actually believe in Intelligent Design!
Tell them “The Designer Did It” and they are “out of here”.
Bwa ha ha ha!
“But couldn’t Google have done more?” Well, it turns out that we did do more.
I think Google’s new webmaster console is great. Thanks Google.
However, I would never expect an email or Google’s help to identify the violation – that is ludicrous. If Google provided personalized service in this area, every slimy SEO would consume huge amounts of Google’s technical support just so they could probe the skinny edge of being spammy.
“Tgr, fair point that we could probably make the email message more clear. We do mention the exact url with the problem, but instead of “The following hidden text on talkorigins.org:” we might be able to say something like “The following hidden text on the specific page ‘talkorigins.org’:” or something a little more explanatory.”
Like, oh, maybe a well-formed URL?
Specs are written for a reason, after all. http://talkorigins.org/ is unambiguous; a full URL to the specific page would be even less so.
Google tech support is great, Sounds like they are under appreciated.
As an IT manager, I can say the google team went past what anyone else would do to help someone solve this problem.
Good Job Google Team
It may also help some folks reading this thread to point out that even Spamcop does not recommend using a listing in bl.spamcop.net as the sole criterion for making a “throw the message away” decision. Use it as part of a ranking decision, or use it to add decorations to the message to be used by later filtering steps, but just tossing a message out based ONLY on a listing is simply NOT recommended.
Lots of people|hosts|providers do so (I even do, for one of my domains), but part of the problem under discussion is due to the recipient not being fully aware that their provider made configuration decisions which Spamcop actually recommends against. More “disclosure” is better. If you know your spam-fighting configuration can throw away some “false positives” without notifying you, then you might be less likely to make an incorrect assumption when something important fails to arrive.
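(Concretely, “use it as part of a ranking decision” means something like the following. The field names and weights are invented purely to illustrate the idea:)

// Score-based filtering: no single test can discard a message on its own.
function shouldReject(msg) {
  var score = 0;
  if (msg.senderListedInSpamcop) score += 3;  // a strong hint, not proof
  if (msg.failsSenderChecks)     score += 2;
  if (msg.hasObfuscatedJs)       score += 2;
  return score >= 5;  // only a combination of signals crosses the line
}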
If you’re in the heat of the moment, you’ll say things irrationally without logical evidence. I think this is the case here. Google is doing awesome, I love webmaster console.
What about a feature in google’s webmaster console to allow a specific email address to be set for websites, so that google can contact it directly instead of having to guess at email addresses? Given the massive scale of email spam, I don’t have email services for the websites I administer, so common email addresses like info at, contact at, support at, etc. simply won’t work (or any others, for that matter). The webmaster services are already tied to a google account, so sending the gmail account a notification email would also work.
couldn’t google have hacked his site and left that message on his homepage? 😛
Matt, can you clarify by what process Google chooses email addresses when attempting to send a ‘hacked site’ email? Is it automated and always to those same four addresses (support, contact, info and webmaster), or is there an attempt to scour the site itself for published addresses?
I bring this up because of Wesley’s comment regarding the talkorigins contact page:
The actual “mailto:” email addresses on that page are all obfuscated, presumably to prevent robots from scraping them. (For example, “mailto:archive@talkorigins.org”)
I don’t know to what degree the process of identifying contact emails for a hacked site owner is automated, but I wonder if this was a case of the email obfuscation also preventing Google from getting a good address.
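(For reference, that kind of obfuscation often looks something like the snippet below. This is an illustrative pattern built from the “archive” address and “Contact Administrator” link text mentioned in this thread, not necessarily the site’s actual code:)

// Assemble the address in pieces so scrapers that don't execute JavaScript
// never see "archive@talkorigins.org" as plain text in the HTML source.
document.write('<a href="mailto:' + 'archive' + '@' + 'talkorigins.org'
             + '">Contact Administrator</a>');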
Given that Google already generates warning emails about spam, wouldn’t it be rather simple to link the webmaster tools to the email database? That way, when someone claims their site, they could see any recent alerts that google has issued.
Thanks for such a comprehensive post, it is really good that google does so much to help webmasters out when they are just trying to do the right thing.
I think it is great what Google is doing to improve communications with webmasters. Maybe one day Yahoo will take a step in this direction.
+1 to Matt for providing a reasonable and detailed description of the issue from his or Google’s perspective. (At least in part – sure, Matt probably doesn’t speak for all of Google, but at least he is responding in his particular bailiwick. 🙂)
While it is expected that addresses like webmaster@ should work (see RFC 2142, which defines the common mailbox names for various services), it is also true that Google could send a contact email or warning to the WHOIS contact(s) when there is a problem.
Having that message be signed is less important; if you receive a suggestion that your webserver has been hacked, it shouldn’t matter much if it comes from a well-known domain, or from someone anonymous at a forged domain. You ought to check things out regardless. Sure, it wouldn’t be a bad idea to use PGP/GnuPG to sign the message, but it’s not critical.
And SPF wouldn’t help in this case, JohnMu – while Google does have a published SPF record, the record is “v=spf1 ptr ?all”, which basically scores as positive if a PTR lookup on the IP address resolves into the google.com domain, and neutral otherwise (i.e., the “?all” means all hosts are neutral).
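For contrast, a hypothetical record with teeth would be published as a DNS TXT record along these lines:

example.com. IN TXT "v=spf1 ip4:192.0.2.0/24 -all"

meaning “only hosts in 192.0.2.0/24 send our mail; hard-fail everything else.” With “?all”, a receiver learns essentially nothing about a forged Google sender.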
Thank you Matt for taking the time to answer and explain Google policies while pondering improvements. The “Google world” isn’t as closed as some people make it out to be…
Re getting a warning: in my case – I am banned – I don’t believe there was a way for Google to advise me of the problem. I hadn’t registered anywhere in Google.
I have been indexed and ranked high for so many years (over 10) that I never bothered to look at the webmaster rules and stuff in Google. I just went along and took it for granted. Then, recently, I was dumb enough to make a duplicate site.
Bam! Zap!
I fixed the problem but, I am still nowhere in spite of a request (Matt, can you help? …please?)
Moral of the story:
– never take anything for granted, and Google least of all
– this isn’t the nineties anymore, when search engines were a free-for-all. Pay attention.
– don’t bitch about Google – they’re not perfect, but they’re trying their best fighting our common enemy, the a.s. spammers…
One suggestion though… Maybe it’s time to evolve the Google algorithm with less – or no – emphasis on links from other sites, in favor of more weight on signals from people’s browsers. The concept of ranking from site links was a good idea in the beginning, but today it seems to reflect little of the content value.
Thank you,
Yves
Here is another story about how Google works with sites:
http://www.idcide.com/affair/
Matt won’t review this story though, and the owners will never get an explanation…
We have been fighting automated spambots on our forums that post links like the ones in your post. We delete the posts as quickly as we find them.
If the Google spider happened onto one of those posts in the short time before our moderators caught and deleted it, would we receive a penalty?
So there’s an RFC for webmaster, info, and support addresses?
This is news to me, heh.
I think a catch-all address is a good thing for stuff that might get missed.
Hi Matt,
I think Google is doing a good job. What also could be improved is telling the webmaster how long the penalty lasts (an approximation is enough). I bought a domain that was burned, and after my request Google reinstated the index.html (within only 10 days). Unfortunately all the other pages (actually an entire blog) are not indexed yet, and it would be great to see if there is a penalty or anything else still pending.
Otherwise keep up the good work! I give Google a lot of credit for being so honest!
Seb
Matt, it’s nice to know that least parts of Google communicate well with others. Maybe you could try to rouse some of this open and honest communication from within the AdSense department, whose idea of open communication is “We have detected invalid clicks on your site and have permanently closed your account with no recourse, and we won’t give you any more information because it’s ‘proprietary'”? Not that I’m bitter or anything, but it would be nice if they had policies for being a fraction as open as you are. 😉
Well, interesting turn of events…
Even though I was guilty of breaking a G. rule, I was banned for a reason I was totally unaware of, and would never have known if it wasn’t for some helpful fellow on the Google board. Lucky break for me.
So this brings us back to the original topic and the complaint that Google doesn’t notify of a ban.
It does make the honest webmasters double victims – not only do you get hacked, but you get zapped by G without a warning.
Yes, we all have to be aware and take responsibility, but who checks their site status daily? And if you’re just a regular joe, how do you know where to look for the spam and hacks lurking in your site? I couldn’t find the files on the server even after someone said I had been hacked.
Google is now the king-maker and critical for our businesses. That does imply some responsibility on their part, and not just to the people looking for relevant info, but also to the people whose livelihood now depends on Google’s indexing.
I believe I deleted all the bad files. Hopefully I will be reindexed before too long.
Holding my breath…
Thank you,
Yves
Wow, this info is helpful to me. My site was also hacked before, by unidentified hackers. I wonder why I never received a notification email from google. But the problem was fixed just a few days after it was hacked, so maybe that is why I didn’t receive a notification email: because I fixed it right away.
I wish that Google would give a clear explanation of why talkorigins.org is not in Google’s index when I request “site:talkorigins.org”. Could Google just mark spammy and cracked sites in the index? And why does Google not keep a backup of the site map, so that the site does not need to be re-crawled to reappear in the index?
While including the ‘offending’ text in the mail you attempt to send to webmasters is laudable, I think doing so also has perhaps the highest possible chance of getting your mail filtered as spam.
As someone else here pointed out, things like JavaScript, and especially functions used most commonly for obfuscation (such as String.fromCharCode()), along with the various ‘naughty words and phrases’ found in the links, would be highly likely to get the mail filtered as spam (for the very same reasons that google’s own filters twigged to that bit of code as offensive).
A much better route might be to send a fairly generic message and include a link, or directions (perhaps just going to google.com and pasting a given GUID into the search field?), that would let the webmaster see what was wrong. The key would be to do this in such a way as to not appear to be phishing mail either… since mail telling people there is something wrong with their site and that immediate action needs to be taken also sounds a hell of a lot like what some phisher might do.
As a professional technical writer, I am shocked. I am shocked and dismayed. Those hackers misspelled “bestiality.” Three times! It makes my blood boil.
That would have worked for me. The WHOIS email contact for the TOA is a real live email address.
But it sounds like Google did make an attempt to communicate pre-de-indexing. That’s a hugely good thing.
My problem was that once that initial attempt at contact failed, my site was de-indexed, and there was no option available for me to learn about what Google had already decided to make known pre-de-indexing. Indeed, I couldn’t find out even that Google had considered making more information available pre-de-indexing. If that level of specific information goes into Google Webmaster Tools, as it sounds like it might, that would prevent the situation I found myself in from happening in the future. Like I’ve said, for my site, the problem was easily found and fixed. That would not necessarily be true for other webmasters whose sites get cracked. Will Google run into more webmasters who would legitimately benefit from having that sort of warning message with its specific information on what and where a problem lies waiting for them in the Google Webmaster Tools? I think that the answer is clearly, “Yes.”
You know, you’re the first person besides a guy I play fantasy baseball with that has actually gotten that reference. Mountain View Massiv reprazentin’!
One day, he’s gotta interview you. That’d be some great stuff.
Thanks for the explanation…and once again, respek.
Wesley: what you’re saying does make some sense, and in your case I can see why you’d be upset. I feel bad for you…straight up. I had a server I was working on hacked once (5 years ago, when I was stupid enough to host with Interland), and it sucks.
The problem is that your logic, as reasonable as it is, is exactly backwards. And I’ll explain why, since you’re obviously new to this.
Let’s outline a scenario of Google vs. a spammer. And let’s say Google does the full disclosure routine and lets spammers know when they’re messing around:
Spammer: “Let’s try repeating this word 10 times in succession.”
Google: “You’re spamming.”
Spammer: “Okay, I fixed it.”
Google: “Yep, it’s okay now.”
Spammer: “10 times didn’t work, so let’s try 8 times and let’s try a different word so no one clues in.”
Google: “You’re spamming.”
Spammer: “Sorry, someone else edited my content, I was unaware.”
Google: “Yep, it’s okay now.”
Spammer: “Okay, let’s try it 6 times.”
…
And so the cycle repeats itself. That’s just one scenario. There are infinite ways in which this could play out…spammers are just that twisted, the lot of them (fun to mess with sometimes though, if you’re creative about it. 😉 )
Also, for every time big G sends a spammer (or you) an email, the time and money it took to send that email could be put to much better use nailing the spammy stuff in the first place. That’s why the webmaster console, as relatively obscure as it is, is the best way to have handled that situation: they can still “talk to webmasters” about the legit stuff without sending individual emails out, and they can still tell spammers the nothing they so richly deserve to hear.
Communication really is a double-edged sword in this case. Unfortunately in your case, so is a lack of communication (from your side of it), but as Matt explained and you verified, there were attempts and it really wasn’t big G’s fault you got hacked. They’ve got users to protect and all that.
I’ll give you a little free piece of advice which I don’t think anyone else has yet: you may want to look at finding another host. If you want to find a good one, go to http://www.webhostingtalk.com and try to find one that hasn’t been torn to shreds yet. If they like the host, it’s a damn good host (because there are some HARSH members on that board.)
Apologies if you took what I was saying as a criticism or a knock at you. I really didn’t know the situation, and there are a lot of … individuals … out there who would choose to do exactly what I said, and that’s to take a site that has good intentions and send it on the highway to Hell.
Most people would be offended at the mistreatment and abuse of animals, and this guy’s P.O.ed that Hooked on Phonics didn’t work for the hacker.
Priorities, dude. Priorities. 😉
WOW. I can not believe Google did so much!
What annoys me is that google only contacted him because he made a big fuss and because he had the popularity to get media or blog attention. Google would happily ignore the complaints of a smaller site, and would not give them the time of day – let alone contact them.
I don’t blame you Matt, because you’re mostly SEO PR for Google, but don’t spit in my ear and tell me it’s rain. What is Google doing for the little guy, or rather smaller sites? The sites that aren’t PR 7+ still make money for Google – they still provide you with content, they still help you in your fight over net neutrality so that you can keep the large profits that telecoms want to take from you. What do they get in return? Will Google help them in the future? Will it give them the same treatment that it offers to larger sites?
Should we not have a system where the small sites are not assumed to be spam sites by default?
support@, info@, and webmaster@??
RFC 2142 addresses that might apply: webmaster@, abuse@, security@.
I think anything beyond mailing webmaster@ was overkill as regards the RFC based addresses. Google could have used the whois data, as we know Google has its hands on it.
I’m always amazed Google doesn’t punish our site more for the rubbish some of our users publish; something is still working well, even if I can’t persuade the index to drop the spammy bits we deleted a month ago.
That arithmetic gets me every time. :'(
Matt, Adam, the comments & responses suggest we are at an impasse with respek to communications to webmasters: On one hand, webmasters with honest intentions and *without* knowing misbehavior require the notification that could very easily help prevent legit businesses from taking serious losses. On the other hand, you can’t commit to these notifications for fear of the A|B testing by the spammers (which even Vanessa mentioned today after the “Lunch with the Google Sitemaps Team”).
You decide that webmasters who got hacked deserve to know. But webmasters who could be spammy don’t get the info. There’s a third group though! — Webmasters who inherit sites & clients trying to overcome problems created by previous inexperienced & shady SEOs.
In the case where honest businesses were swindled or mistreated by some spam-happy SEO and are now in the hands of a legit firm trying to straighten things out, there would be NO solution for this poor shop. Aren’t these the folks that Google can’t afford to overlook? More folks are coming to me for advice on how to solve their problems: “Sorry,” I say, “I’ve done all I can & now there’s nothing left you can do. Now Google thinks you’re a spammer. Good luck ditching that domain you spent 15 years building.” Is that where we’re left?
Indeed, Google does a better job of notifying some select group of webmasters–but as the engine with more traffic & more impact than any other, Google needs to lead the pack here lest it be terribly embarrassed by the MSN squad (yeah right, you say?…). Claiming that Google does more than any other is just like claiming that a giant carries a heavier load than a dwarf (I hope that’s not offensive…). Of course he does & of course he should! But he could still have been inefficient and yet carried a heavier load.
We need some sort of middle ground of communication that allows these previously mishandled sites some ground to rebuild their businesses within search – which can’t be done without Google’s traffic. OR should those shops consider themselves doomed casualties of search?
The fact that mail to the address published in the WHOIS database reaches “the technical contact or web host instead of only the site owner” should not in itself be a reason not to use the address for contact. The WHOIS record has three contact addresses (Registrant Contact, Administrative Contact, Technical Contact), and one of them should be the proper address to use. If a webhost doesn’t agree that their address should be in that field, they should demand that the registrant provide a working address. And if they do agree to their address being listed, it means they agree to handle the mail sent to that address. In other words: the fact that webhosts/registrars don’t use the WHOIS record properly, or offer “privacy” services by replacing the registrant’s data with their own data without dealing with the consequences of that offer, should not be a reason for Google not to use the proper contact address that is published for the domain.
Generic addresses like webmaster@, hostmaster@, contact@, sales@, info@, etc. are blocked by many domain owners because they receive a huge amount of spam. Addresses in the WHOIS database also receive spam, but they are easily replaceable by editing the WHOIS record. Static generic addresses are not a good solution for providing contact info these days.
I use a forwarding address that is greylisted on WHOIS and receive almost no spam. I have replaced the WHOIS address about once a year, and it has worked for anyone using acceptable email standards. (A tool that dynamically updates the WHOIS record with semi-random temporary forwarding addresses would be better, but I don’t know of any such tool.)
It seems that Matt is doing a very good job. This is an important post that I will pass on to my relevant colleagues.
I can also understand the frustration of talkorigins.org’s owner after the hack. May I suggest that Google launch this feature:
IF there is no indication that the site owner/admin received the alert from Google’s webspam team, then allocate X days (e.g. 7) during which the (spammy) target page in the SERPs redirects to a page that says something like:
“Innovative Google technology on the inside keeps spam on the outside. Spam was detected on this page. Please click here to go back to the results.”
Plus – have AdWords ads by that message. I think it is fair to monetize this Google page, since you are putting in the effort to give the (unaware) webmaster a last X days (better than 60 days, as Matt wrote). It also communicates to the world that Google is doing something about spam, while being extra fair and giving the (hacked, in this case) owner a last chance.
As people here have said, the owner may not be getting emails from Google; it is a bit of a problem to rely on email only. Also, Pete (first talkback) suggested a good idea – to give an alert in the webmaster console. Well, giving an alert on the actual (redirected) target page is another communication channel.
Cheers,
Itai
Co-CEO
easynet search marketing
As the owner of Theoi.com I was just as shocked as talk.origins when my site suddenly disappeared from Google.
I’m pleased, though, that I was contacted yesterday by Google reps to help sort out the problem.
Have the other sites also been contacted? The spam script at the top of this page clearly lists the sites affected by the hacker; like theoi.com, they also appear to be innocent victims of the hack:
vvu.edu.gh
deepx.com
ugobe.com
Twan, I think that it’s a misperception that we require the webmaster to admit guilt on their part. I believe we already softened that language once. Let’s see. Right now the language is “I believe this site has violated Google’s quality guidelines in the past.” which could apply to e.g. a hacked site without implying that the webmaster did anything bad. Maybe we could still soften that language more though.
NTulip, the case you mentioned hasn’t happened in my experience. In any case, normal phishing for account info would probably be more likely, since the fraction of people enrolled in the webmaster console is lower than the total number of people with Google accounts.
JohnMu, I don’t know if we do SPF; personally I like the idea that people that sign up for the webmaster console have an incentive to give a solid email address so that we can contact them if we see issues. Well put as well, Anax.
Curtis Cameron, once a site is hacked, it can often also host malware. See e.g. http://news.netcraft.com/archives/2006/09/22/hacked_hostgator_sites_distribute_ie_exploit.html
for a recent example of that. We don’t have the cycles to dig through a hacked site to make sure that it’s 100% benign. In addition, you get these hacked sites showing up for off-topic searches. With the hacked text above, talkorigins.org might show up for “psp downloads” or “psp games,” which is clearly not a great result for users since the site doesn’t have either of those.
Zandr, I’ll check on making it a well-formed URL in our emails to improve the clarity of the message. 🙂
Gregg, good question. We do have a list of common email addresses that the emailer can select from. In addition, as we crawl the web we can detect some email addresses that are up on web pages. So potentially we might be able to email more specific aliases, assuming you left the email alias on the web somewhere.
Amit Patel, the page you mention already quotes an opinion I gave that the site may have run into issues because of the thousands of pages of duplicate hotel information on idcide.com. The site is showing up in Google search results now, so it’s unclear to me what other resolution you’d like regarding this site?
Troy Roberts, I would recommend not letting spammy posts linger on a forum if you can help it.
Seb, in this case the email from Google did list the penalty length: 60 days (unless the site owner fixed the issue and did a reinclusion request).
Chuck v, I agree that it would be better to make the email more generic and include a link to more details.
Larry Hosken, it’s not a huge surprise. Spammers deliberately misspell “bestiality” because lots of people type in “beastiality” instead.
Multi-Worded Adam, you give a good example of how 100% transparency would immediately help spammers. 🙂
Abhilash, I think the general trend is toward more webmaster communication at Google. Threads like this show that communication has been successful and that we need to do more of it, and do a better job on it. 🙂
Sorry for not reading the whole 100 comments, Matt. But.
Whatever bashing occurred on Google is some misunderstanding, to say the least. The original post seems to take quite an unfair stance on Google. Why not just check the website against all the webmaster guidelines, instead of whining? A quick glance at a page’s code should reveal everything one needs to know.
Though you don’t need my protection, be aware that some SEOs aren’t that evil and try to view the whole thing from both sides of the fence.
I have no comments, but I’m proud to be the 100th comment.
Hi Matt,
I am developing a community health website at http://www.medicine.org and recently discovered that the site was removed from the google index.
From the best we can tell, this happened when one of ValueWeb’s (our host) customers was caught sending spam from, or spamvertising, their ValueWeb-hosted site. Then SORBS blacklisted the entire netblock, falsely implicating many innocent websites, medicine.org included.
Next, Google delisted all the websites blacklisted by SORBS. In our case, http://www.medicine.org is a non-commercial healthcare website without any commercial activity: it doesn’t send email, doesn’t sell products, and doesn’t display any advertising.
Even though our server never relayed any spam, we still discovered some server settings needed adjusting and have made those adjustments. We’re in the process of writing a letter to Google to explain and will start using the webmaster console as a result of reading your blog.
This experience has been quite a setback, but we’re learning from it and, with your help, hope to overcome it soon. I blogged about it here – http://www.internetinc.com/banned-by-google – and will post a follow-up on the resolution as it happens.
Thanks for giving us some more direction.
-eric
Matt, as always more than just useful – many thanks
Well done, Matt. Bravo – I think this is the best way to run things and protect users and webmasters. I mean, if it wasn’t Google doing this and warning webmasters, who else would do it for you? Thanks a lot.
As I said recently, a lot of the spam in Google’s SERPs comes from attacks on .edu sites and on old free CMS systems, where unpatched holes are still an easy way in for hackers.
We stand by Google for this fight !
And as I mentioned before – was this your reason for not going to SES in Chicago? http://www.mattcutts.com/blog/ses-chicago-this-week/#comment-90494
Hi Matt,
After reading your post I have a question. I have a blog about entertainment, and I was unable to stop the spammers (mostly Viagra guys) in the blog comments area. Even though I stopped the links by removing the protocol and the a tags, those pages still display the spammers’ comments (it is a big list).
Will this penalize the webmaster, since there is Viagra and other blocked content on those pages?
Albert
Well done, great work and great post.
It is really very good that such things get posted. This is a clear indication of how things stand now, and of how they might improve yet further in the future.
Adam,
I’ve had long experience in having abuse thrown at me. Peruse the TOA and you’ll see why. This latest round simply adds another group that wants to hurl abuse, as the comments on my weblog post show.
The TOA is hosted at Lunarpages.com. A brief look at your link seems to show primarily good things said about them as a hosting company. I did check out a variety of sources for reports on hosting companies before moving the TOA to Lunarpages a few years ago.
The discussion seems to have brought out that Google actually can distinguish cracking to some degree from deliberate SE exploitation. I think finding the right balance between giving adequate information to those who show signs of having been cracked and denying a fine level of information to exploiters may not be the impossible task that some have characterized it to be. Maybe I am simply naive about this. I guess we’ll see what Google brings about in the coming months concerning de-indexing policies.
As long as you’ve got the skin for it, Wesley. You’re gonna need it for at least the next 3-4 days. It sucks to be you, although I kinda wish I were loved enough to be that hated.
*** Oversimplified explanation to follow ***
You’re right in that you’re (slightly) naive, but it’s not your fault. The problem, and the reason I asked the question in the first place, is that there are a lot of people out there who would simply put links to porn sites (or other sites) at the footer of topically unrelated sites for search engine and traffic-type reasons. There are even entire schemes devoted to this sort of thing. The reasons are generally search-engine related, and quite often target Google.
That’s why I asked the question I did…how do we know it’s a hack? There are a number of webmasters out there who would link to Debbie Does Dallas from a philosophical website for those types of reasons. It’s completely idiotic behaviour, but the quest for Page 1 in a search engine has a tendency to turn off the common sense switch (if one ever existed) in some people. I’m just glad to see that you’re not one of those people, and that you seem to have a bit of common sense.
As far as revealing of information goes, I’m personally of the belief that Google should err on the side of caution as to what to reveal. There are too many people out there that would take information big G put out there and twist, turn, manipulate and generally use it for their own self-interests, without considering the end user of the engine. (See the example I posted above.) I don’t have a problem with what I’m told, and if I do, there’s this rumour floating around that other search engines besides Google exist out there in cyberspace somewhere. I might have to use one of them. Mind you, I figure other search engines are fictional, just like leprechauns, centaurs, and Eskimos. 😉
This is a fantastic response that could have lead to a nightmare among webmasters.
While I do agree that Google could have done more, they did do what their guidelines and procedures allowed them to. After all, Google isn’t going to hack a website just to remove spam because a webmaster can’t conduct an investigation of their own hacked site.
Remember, boys and girls, it takes two to tango.
Er, how about following the RFCs? The email convention for cases like this is not “info” or “contact” or “support”… but “abuse” or “trouble.” Matt did say they tried “webmaster,” which is another RFC convention – albeit the most spam-laden. So 3 out of 4 of goog’s attempts to communicate were completely arbitrary and apparently oblivious of RFC convention. From a company of goog’s stature, that strikes me as fairly inexcusable.
Arguably, the RFC is due for an update, seeing as “webmaster” has pretty much been spammed beyond all practical use. Nevertheless, until goog takes over completely, RFCs are still the rules of the road. Sorry, but if you or I cross the double yellow lines and then suffer a head-on collision, we’ve got no one to blame but ourselves. In this case, it appears that both parties were swerving all over the road and then complaining about all the oncoming traffic. A histrionic waste of time for all involved; but then, blood does sell newspapers.
http://www.ietf.org/rfc/rfc2142.txt
Matt:
Could Google have done more? Like what – go knock on his door and tell him he has been hacked and now his site has spam?
I can understand the site owner’s frustrations however I am impressed that Google went as far as it did.
Hacked or not (because we don’t know for sure whether it was hacked, or whether for some strange reason the site owner did it himself… not accusing, just stating the fact that we do not know for sure who put it there), Google had to do what it did to protect its interests.
Thankfully, Google is willing to give feedback.
Also noticed that you mentioned “If you pick a bad search engine optimizer (SEO)…” – since you mention the possibility of a bad SEO, it stands to reason that you would validate that there are good/quality SEOs too, which I am glad to hear. =0)
Deindexing or penalizing sites is nothing but a good thing for webmasters and the internet in general, as it will push most of us to make our sites stronger and more spam-resistant. Having a banned site increased my efforts to improve it, making it more valuable and, as a side note, more resistant to competitors’ complaints in the remaining search engines. Even bans on sites that weren’t hacked but became spammy could be beneficial, as those sites could also be given the chance to improve to levels of being accepted again. However, admitting guilt in creating cheesy, low-quality pages can also help Google *profile* the account owner as non-desirable. Then any effort placed in improvements will count for nothing and there won’t be redemption.
Matt, as for the dilemma above… has your team considered publishing an up-to-the-second list of sites that get de-indexed somewhere on Google’s site? Since the current obstacles are:
1) email is not a trustworthy means of communication.
2) not everyone knows about Webmaster Tools.
a public page could solve both problems as:
1) it will be linked to and made known by every blog out there, therefore becoming a faster reference to WT.
2) the page can provide instructions on how to create an account where *confidential* information can be shared.
I learned about Webmaster Tools 2 weeks after my site disappeared, and spent that time trying to figure out where or how I was losing traffic. Not everybody checks their logs daily, but such a page would have been a quick *official* admission of deindexing.
I should note that I’ve never said that Google was wrong to de-index the TOA. There was a problem with the site, and the Google index needs to be protected. That’s not my issue.
I have been with google webmaster tools for sometime and I absolutely love it. I am glad to be with a search engine who looks out for the ordinary users 🙂
First off, I don’t see any responsibility on the part of Google to let webmasters know they’ve been punished…for any reason. That being said, if it’s suspected that a hacker is involved, a simple email would be nice, but not NECESSARY.
Matt, I have a question about the code:
I don’t use this exact code, but if I were to use JavaScript that by default “hides” a sitemap at the bottom of each page, am I in jeopardy of being punished? There is a link that reads “sitemap” that shows the links on a click.
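To be concrete, here is a minimal sketch of the pattern I mean (the markup and function name are just made-up examples, not my actual code):

// Hypothetical markup:
//   <a href="#" onclick="toggleSitemap(); return false;">sitemap</a>
//   <div id="sitemap" style="display:none"> ...sitemap links... </div>
// Unlike the hack above, the links sit in the page source as plain HTML;
// the script only collapses them until the visitor clicks "sitemap".
function toggleSitemap() {
  var el = document.getElementById('sitemap');
  el.style.display = (el.style.display === 'none') ? '' : 'none';
}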
Justin wrote:
Quote:
“heh, good old String.fromCharCode() — that’s a 100% block pattern in email, it’s only used in hostile HTML ;)”
Why would you assume that String.fromCharCode() is only used in hostile HTML? I’m hoping Matt can confirm that fromCharCode() by itself does not flag a site as hostile or spammy. It can be a great way to hide email addresses from email-scraping bots and still provide a nice mailto link on the page.
Something like:
window.location = String.fromCharCode(109,97,105,108,116,111,58) +
  'chris' + String.fromCharCode(64) + domainname;
would give you “mailto:chris@domainname” and make it hard for a bot to scrape the email address from the page, especially if you broke it down even further. Isn’t that a good thing?
Matt,
I’m blown away by the extra effort Google – and apparently your team – makes. I really had no idea, and as a webmaster I’m a little embarrassed that I didn’t.
I’m wondering if your team could do a simple spam analysis of Google’s own AdWords sponsored links. If you don’t find this appropriate for this blog, please feel free to remove it.
I’ve been learning the hard way about AdWords and have become quite frustrated, and although it’s not your department, it does seem like you are in a perfect position to conduct an internal audit and evaluate the effectiveness of the AdWords ad quality rating and ad selection system. I’ll try to explain in a little more detail by giving you my case study.
After reading and watching the online tutorials and documentation for the AdWords program, I submitted my first two ads, tied to the keywords “Real Player” and “Windows Media Player”. I won’t say what my company’s product is, so that this doesn’t seem like yet another ad; I’ll just say that the product is a plug-in for both of these players and leave it at that. The next morning I went online to see how our ads were doing, hoping to see lots of impressions and clicks, and was disappointed to find that our ads had been deactivated and the minimum CPC had been raised to $5 and $10 respectively. On submitting a support request to Google, I was told that our landing page had a poor quality rating for the keywords. The email went on to explain that sites with a poor quality rating provide a bad experience for the user and thus have to pay a higher CPC or optimize the landing page to improve the quality rating. Here is where the internal audit comes in. I decided to test this user experience, to see whether the ads at the top of the list provide a good one. My test was simple: using only the top sponsored ads and searching with the keywords “real player”, try to download the free Real Player. I encourage anyone to test this for themselves with some other product that you know of, to see if Google’s quality rating system is in fact providing a quality user experience.
What I found were landing pages that promised “Click here” to download Real Player but instead took me back to Google in a roundabout way. I assume they must get referral payments or credits for pointing at Google. Or sites posing as search engines that are simply full of Google ads, participating in the AdWords content network, again to get referral payments. And one site that continues to be at the top of the list was a complete scam to get my credit card and email before I could download the free version of RealPlayer. In my case none of these sponsored ads provided me with anything useful related to RealPlayer, and it wasn’t until the 5th ad – which was the actual Real Networks site – that I found an easy way to download Real Player. I’d say that’s a pretty lousy user experience, and maybe it’s just me, but I consider all of these ads spam, intended to trick the user into generating revenue without providing any useful product or service; these types of ads should be detected and pulled from AdWords or given a min CPC of $10.
Hello Matt,
I’m glad to hear that Google is doing something against search engine spam.
I think what you do is perfectly right.
What I wonder is: could you make your results available to the public?
Very often I have experienced a relink odyssey – clicking a link, popups, a redirect to the next page, more popups, and so on. To prevent this, I think it would help to know that a page is clean before entering it. What I want is a box at Google where I can enter a link and get an answer as to whether the page follows your guidelines or not, plus the date of the last scan.
To make money, you could offer to scan webmasters’ sites more often if they pay for it.
Best regards
Jens
I always assumed that Google automatically banned sites spotted as spam websites, without any possibility of appeal.
Good to know that you guys actually look out for us webmasters 🙂
Well done thou good and faithful servants. 🙂
Chris
“Why would you assume that String.fromCharCode() is only used in hostile HTML? ”
I’d be willing to bet Google can tell what text will be generated by a call to String.fromCharCode(). In the talkorigins.org case the text generated was “<div style=’display:none’>”, so it was attempting to hide text.
I think a bot could easily scrape this:
“window.location = String.fromCharCode(109,97,105,108,116,111,58) +
'chris' + String.fromCharCode(64) + domainname
would give you ‘mailto:chris@domainname’ and make it hard for a bot to scrape the email address”
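Here is a minimal sketch of what I mean – no JavaScript engine needed, just a regular expression over the page source (the hard-coded page string and 'example.com' are purely illustrative):

// Hypothetical scraper logic: find each String.fromCharCode(...) call
// and substitute the string it would produce.
var page = "String.fromCharCode(109,97,105,108,116,111,58)" +
           "+'chris'+String.fromCharCode(64)+'example.com'";
var decoded = page.replace(/String\.fromCharCode\(([\d,\s]+)\)/g,
  function (match, nums) {
    return "'" + String.fromCharCode.apply(null, nums.split(',').map(Number)) + "'";
  });
console.log(decoded); // 'mailto:'+'chris'+'@'+'example.com'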
Btw..have you all seen this great new tool for web dev?
getfirebug.com
It could help webmasters find spam on their pages quicker!
Hi Matt,
I recently became aware of your blog. I like it, and it’s especially of interest to me to learn more about how Google works internally – especially because I have some concerns that Google could become more evil than before, with the constant increase of its market value and the constant need to show such increases to shareholders.
For this specific case, maybe I misunderstood, but I find it strange that you’ve stated you couldn’t use the e-mail registered as the username to log in to Google Webmaster Tools.
When a user registers for the Google webmaster tools, he’s required to enter either a valid Google Account in the form xxx@gmail.com or his own e-mail address in the form xxx@userdomainname.com.
Then the user will receive the e-mail confirmation request:
“In order to verify that the email address associated with your account is correct, we have sent an email message to xxx@hackxion.com. To activate your Google account, please access your email and click on the link provided.”
After the user has confirmed the e-mail by entering a URL (or just clicking on the link) in the form of:
http://www.google.com/accounts/VE?c=123412341234123412&hl=en
he gets access to the webmaster console to manage the following items, among others:
robots.txt analysis
Manage site verification
Crawl rate
Preferred domain
Enhanced Image Search
Among the above options, “Manage site verification” permits the user to prove ownership of the web site to Google.
Anyway, before going further, let’s come back to my main question.
Why could you not use the e-mail address used as the username for Google Webmaster Tools?
It’s just a question to make sure I’ve understood your thoughts correctly.
Thanks.
VK
Matt,
I have another question I preferred to post separately.
You seem to say that Google made an impressive effort to handle the cracked web site by trying to contact the web site’s owner.
I must say he must be in a privileged position.
I used to get some of my web pages excluded from Google’s index. I was never informed of the reason; I just discovered it myself when some web pages that had been indexed suddenly disappeared.
When I tried to understand what was going on with the Google webmaster tools, I just got a message stating that my web site is not indexed by Google, which is a little bit limited and didn’t give me any clue as to the reason.
Of course, I had to analyze what happened during the previous couple of days and conclude myself that the web page was removed because I did some 3-4 promotions through some high-PR web sites and/or because I submitted to several directories within a short time.
It was all supposition, and I spent hours alone in front of the nice Google webmaster tools, which just don’t give me enough real, valuable, helpful info.
I don’t even expect the exact reason to be given in the alert for the penalty – just a simple, short email with the basic alert that the web page was penalized. Period.
Unfortunately, the real thing is that the Google webmaster tools don’t send a message about penalties, at least not to me, and the same goes for several of my friends.
It’s very frustrating, and after such things happened more than once, some friends finally asked for my help to develop a kind of monitoring to check for the existence of a specific web site in Google’s index: a cron task scheduled every 15 minutes (a rough sketch follows below).
Surely we’re not alone in being obliged to do so. It’s the natural way of adaptation; it’s just about creating some comfortable ways to ease the life of webmasters. If Google doesn’t give us that, we’ll build it ourselves.
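For what it’s worth, here is a rough sketch of the kind of watchdog I mean. isIndexed() is only a stub standing in for whatever query mechanism you are permitted to use (e.g. an official search API); it is not a real Google endpoint:

// Minimal index watchdog, meant to be run from cron every 15 minutes,
// e.g.:  */15 * * * *  node check-index.js
function isIndexed(site) {
  // Stub: a real check might run a site: query through a permitted API
  // and test whether any results come back.
  return true;
}

function checkIndex(site) {
  if (!isIndexed(site)) {
    // Alert the owner however you like: email, SMS, a log entry...
    console.error(site + ' seems to have dropped out of the index!');
  }
}

checkIndex('example.com');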
So, to come back to the main question: why don’t you systematically alert the site owner (e-mail registered and ownership verified) when a penalty happens and the web site or some web pages are removed?
Thanks.
VK
Matt:
I must be a relic from the past, but all I see here are defenses of “we tried emailing multiple addresses, etc.” Doesn’t anyone use a phone anymore? Any legitimate business has a phone system and answering machine. If you tried calling and they never got back to you, I would say Google went above and beyond. If all you did is send emails, I am sure their spam filter prevented any of them from getting through, because they would have been too generic.
Doing SEO as a business, I must say this was very informative. It’s good to know that Google is doing such a good job controlling spam sites!
Keep it up!
Bob
Ok,… this post may not be p.c. in the blog environment, but this is a cry for help to Matt ( I apologize to the members for taking space here for a personal issue).
My site http://www.tahiti-explorer.com was hacked in early Sept and a bunch of hidden links were stuffed in there. Without warning, I disappeared from the Google index.
It took me a while to figure out there was a problem at all, and then I never figured out what the problem was until a few days ago when – lucky for me – someone looked at the cache and told me I had been hacked.
I am a small travel agent, not a webmaster, and a lot of this stuff is obscure to me. All of my business comes from the web, 100% from web searches. But I have no SEO, and when I need the site updated I use a freelance dude working out of his bedroom. I just put up good content and don’t pay too much attention to the rest (big mistake..).
So, I have nothing of value to add to this discussion about what Google should do about notifying site owners when this kind of stuff happens. Obviously it’s a complex issue which many of you have already eloquently debated.
I am not here to whine either. I am not here to rant against Google or make waves. I am just a victim of hackers who want his life back. I am here to beg.
I am here to beg Matt to please intervene so my site gets un-banned (you did help the people at talkorigins.org who had a similar problem).
I know, Matt, that in the universe of Google I am insignificant, and that this request may be out of line. But I have to try as I am desperate, and a few minutes of your time can make a huge difference to my survival.
I am not asking for any favor other than having the ban lifted. I have built my business on the web through my good ranking in Google over 10 years of doing the right thing – consistently providing in-depth and honest content in an attractive site. Not by luck, and not with tricks or gimmicks.
I’ve now cleaned up the site and submitted a request for reindexing, which seems to be just a long shot in the dark. While I wait, I have had to lay off 1 employee (I only have 2) and my family is getting worried. I can’t afford the rates of AdWords because it takes an average of 6 months to collect on travel sales.
I thank you in advance, Matt, for your consideration, and again, apologies to everyone for the sob story. It is rather embarrassing, but I feel totally helpless and this blog is the only little light I can see in my tunnel.
Best wishes to all.
Yves
oh my God… I AM UP again in the index!!
thank you thank thank you!!!
I don’t know if it was you Matt, but thank you anyway for being here, and for giving us the opportunity to communicate with someone at the Big G.
finally ….I’ll sleep tonight…
🙂
Yves
p.s. free trip to Tahiti for everyone!
(… just kidding 😉
I think you often don’t get enough credit for what you do. Keep up the good work.
An update on the TalkOrigins Archive (TOA) pages: The cracker was still active through today despite various efforts to get him locked out of our LunarPages.com site. Our latest action: a complete wipe of the public directory followed by starting to restore content from an apparent pre-cracker backup, where all *.sh, *.pl, *.asp, *.shtml, and *.php files have been removed, as well as all files in cgi-bin and scgi-bin.
By the things that the cracker did yesterday afternoon, I’m inferring that his very specific aim was to get Google to de-index the TOA. I’ve sent Matt the details on that by email.
Here is my question: where do you submit your questions if you think your site has been penalized and you want to know what you need to change before you submit a reinclusion request?
Also, what is the email address for reinclusion requests?
thanks
David Smith, the fact is that email is more scalable than phone. An email like this could be sent in under a minute. But I’ve seen Adam Lasnik try to talk someone through a webmaster problem on the phone for 20-30 minutes. Given the sheer number of sites out there, I don’t expect that we’d be able to help people by phone.
Imo you don’t even need to defend yourself on a matter like this. I’m sure that if the site owner had spent his energy on a more positive way to contact Google, it could have been solved quickly. But, on the other hand, you seem to care, and if it keeps you sharp and prevents good sites from being penalized, I guess it’s a good thing this happens once in a while. At least you get to prove them wrong.
I guess it comes with being the nr. 1 search engine, but it sometimes surprises me how people claim being indexed in Google as if it’s a right. Google is responsible for Google, and although I feel for the victim of the cracker, the webmaster is responsible for the site’s content. Just as it is the webmaster’s own responsibility to know of the webmaster console. I mean, if your site is removed from the index, you probably want to contact Google, and if you go to google.com to find a way to contact Google, you can hardly miss the webmaster tools. I’d rather see Google fight spam harder than spend so much energy on an individual case.
Johan,
I tried Webmaster Tools. I tried the telephone. I didn’t see “a more positive way” to initiate contact with Google; maybe I missed something, in which case I’m sure that someone will point out the link I missed, or the phone number on a page I overlooked that led to a real person. So far, though, no one has done so.
And the situation was resolved quickly. I had the fix done within a few hours of finding out about the de-indexing, and Google processed the reinclusion request quickly, so the TOA was only off the Google index from 11/30 or so until 12/05.
I agree with Johan that being indexed at Google is a privilege, as I said in my weblog post on 12/03: “Certainly, they [Google] have the responsibility to keep their index from giving unwarranted weight to cheaters. That is not at issue here.”
There is a symbiosis here. Google does not produce the preponderance of content, they index it and allow users to find relevant content by their service. The webmasters actually generate the content without which the users wouldn’t care whether an index existed or not.
It turns out in this individual case, the attack was aimed at removing my website from the Google index. The symbiosis was working all too well in the estimation of the cracker, apparently. Should Google ignore the fact that crackers now are specifically aiming to harm websites by disrupting Google’s relationship with those sites? Concentrating on spam to the exclusion of cracking that itself is exploiting knowledge of what will trigger a Google de-indexing doesn’t look like the answer to me. Because Google did open up a line of communication, things did get resolved more quickly and more amicably, and Google now knows a bit more about a non-spam situation that also targets their algorithms for evaluating a website.
Hello again.
I noticed that my comment was removed – talk about having censorhip at your discretion – so I’ll re-post it.
Someone on a website suggested that google could integrate google mobile into google webmaster central.
You send an sms to google webmaster central and you get an update on your site’s status.
Does that sound like a good idea to you?
Thanks.
Hello Wesley,
First let me say I’m sorry to hear about the unfortunate events; you have an interesting site. After reading your reply above, I can only conclude the same: you would have been, and you will be, better off if you use that energy more positively (like securing your website). I think your writing ability has to do with the extra attention this seems to get. That’s a compliment, btw.
“Should Google ignore the fact that crackers now are specifically aiming to harm websites by disrupting Google’s relationship with those sites?”
Not ignore that fact, but imo they did the right thing, and their policy shouldn’t be influenced by the intention of crackers ‘towards your site’. As I mentioned in my previous post, the webmaster is responsible for the website. You can put it however you want, but that fact remains. I think the first person who replied to your blog, comparing the situation to what an ISP would do, is spot on. The intentions of a cracker targeting your site are irrelevant; it’s the content that matters. Who cares who put it there?
Almost every device hooked up to the Internet is being attacked as I write this. You seem to be very sure it was a targeted attack. I have my doubts, but I don’t have enough information. Regardless, I praise Google for not letting an attack on your site influence the index negatively for ‘me’. And I bet it encourages you to make sure your site is clean. You seem to think that because you believe the attacker was actually trying to deindex your site, Google should somehow treat the situation differently. Obviously it doesn’t, and it cannot even work that way. It also seems that because you cannot determine who attacked you, you are taking it out on Google, which imo is wrong, very wrong. It’s not “Me against Google”, it’s “The cracker against me”. You could have installed Tripwire in less time than it takes you to write up these blog entries.
I run a handful of sites, and they are all directly responsible for my income. It would be bad for me personally, and for a lot of users, if they were deindexed, but I’d rather have Google temporarily deindex any of my sites if any of those links/words were on them, and leave it up to me to correct the issue. In your case, the head of webspam at Google is even helping to solve the issue.
“I tried Webmaster Tools. I tried the telephone. I didn’t see “a more positive way” to initiate contact with Google; maybe I missed something, in which case I’m sure that somone will point out the link I missed or the phone number that led to a real person on a page that I overlooked”
I said more positive, not a ‘different way’. I don’t believe there was actually someone at Google ignoring your attempts to contact them: “oh, there’s that guy with the xxxx and xxx stuff again”.
I wish you good luck running your site!
How can the algo decide whether a site is penalized or not? When is a definitive penalty bit set?
Since Google is constantly updating its index, I don’t see when it could send an email telling the webmaster that the site has been penalized. It would be nice to get an indication other than “No pages are in our index…”, since we already know that! There’s no reason to even display that message at all.
I see in the posts here that several travel sites have problems, and my theory is that travel sites are targeted by scraper sites and spammers just because of AdSense, since a click is worth a lot of money, and that Google constantly removes travel sites because of this. Since Google uses an algo, a lot of clean sites also get hit by this filter and have to work their way back into the index by trying to contact Google in one of the few ways that exist. Since it is a filter in the algo, Google needs to change the filter not to include clean sites, and this takes time and doesn’t always work.
I have a site that Google has removed, and even after reinclusion requests it never made it back into the index, even though Google crawls the site every day. I can see thousands of pages with scraped content from my site, but the site itself is nowhere to be found.
Every page of my site was stripped from the index on November 4, and the console was showing 403 HTTP header errors for every directory on my site. Then it came back and was listed again December 5. Everything was alright according to the diagnostics in the webmaster tools – no errors.
Dec 10: again I see there are HTTP header errors for every directory, and there is no apparent cause in any of my directories. All of these errors are 403/4xx related.
Is there any way to find out what is blocking Googlebot from my site? When I try any of the directories that Google has been having trouble reaching for the past few days, I have no problem. And according to my logs, nobody else does either. I am certain that the site is not causing these errors as outlined in the “What are HTTP errors?” FAQ page. The diagnostic panel doesn’t really help either, as the information it gives is limited.
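Here is the rough check I ran against my own logs, as a minimal sketch – the log path is a hypothetical stand-in for yours, and it assumes a combined-format Apache log. It tallies the status codes your own server returned to Googlebot, so you can see whether the 403s appear on your side at all:

// Tally HTTP status codes served to Googlebot from an access log in
// combined format. The log path is a hypothetical example.
var fs = require('fs');
var lines = fs.readFileSync('/var/log/apache2/access.log', 'utf8').split('\n');
var counts = {};
lines.forEach(function (line) {
  if (line.indexOf('Googlebot') === -1) return;
  var m = line.match(/" (\d{3}) /); // the status code follows the quoted request
  if (m) counts[m[1]] = (counts[m[1]] || 0) + 1;
});
console.log(counts); // e.g. { '200': 1412, '403': 37 }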
Matt Cutts, I think you should respond to this post demanding an explanation about these hacked pages that Google hasn’t removed or done anything about.
http://www.seonewsblog.com/google-and-hacked-edu-pages
Thank You
My website got hacked back in April, in the night from Saturday to Sunday (I know it because I spent the whole Sunday checking things). As I woke up and went to check the counter log file, I noticed that the latest changes happened in a subdomain folder. I hadn’t changed it for months, so it was suspicious that it showed up first in the list (server files listed by date). I checked what was in the folder and found things that didn’t belong there. Somebody, somehow, gained access to my account and installed an entire site. I did some research about this script, the site it came from, and what it was… It was an exact copy of the files this attack came from, so the entire content was copied/installed on my site as well, and others could use these scripts to hack other sites. I was lucky that I discovered this about five hours after the hack and could prevent more damage to my site and to other sites. I deleted the entire folder out of panic, but I should have made a backup copy so I would know what exactly this was and could have sent it to the server administrator – who told me there was no sign of an actual site hack… which was and still is confusing.
I talked to some web security experts, and I was told that these things got onto my site due to some insecure PHP scripts, a brute-force attack on my account, or maybe because my computer got hacked and people gained access to the FTP account that way…
The day after this shock I got another punch in the stomach related to my website, and since then I try to keep all those bad things away from my site, even more than I did before those things happened.
It’s a very time-consuming process to keep everything clean, to block all the bad things, check new code, test new and better security options, etc. Right now it takes me a bit more than one hour to go through all these things – well, if I do it daily. I was away for nine days and I was shocked when I saw the log file. If the bad bots, IPs, and user agents are not blocked in the first two days, then they come more often and in much bigger numbers than when they’re blocked in those first two days. From time to time I’m about to give up the fight on this, but I’ve worked too hard keeping everything running just to let them win, so I get back into the fight.
About the e-mail addresses… I had those standard/usual addresses and I got nuked with spam, but then I deleted all those mailboxes and put only two on my website: the one I have from Hotmail and the one from Gmail. This way the spam is filtered out and people can still contact me.
Just my 2cts, but I think it is bloody good service that Google notifies a webmaster of anything related to their indexation. People quickly forget that in the end Google is offering a service to their users, not to webmasters. Google is a tool for web surfers to find information / the sites they are interested in. Nobody is interested in a page with spam on it and it is therefore in the interest of all but the webmaster to remove such a page. The fact you get notified is excellent service, personally I think the webmaster should be aware of what goes on on his site.
In most cases if the webmaster isn’t aware of issues with his site, the site is old/outdated and/or of no relevance to surfers. Yes, occasionally somebody will get hit by this even though there was no bad intention, but does that really remove Google’s need to provide surfers with the most relevant information? No.
It is up to the webmaster to ensure that the content remains useful, and follows the terms of service. This is not, and will never be the responsibility of Google.
I think a lot of people expect way too much of Google; in the end it is a business run by humans, and you can’t expect them to obey your every whim just because they happen to be successful and make ‘some’ money in the process. You wouldn’t expect Microsoft to send you an e-mail when your graphics card breaks down and your computer no longer works, now would you?
It’s good to know Google is taking care of websites like that. I just had a nice spammy mess all over my laptop reading this post while looking for SEO information.
I’m glad Google is still the ‘not quite evil empire’. Keep it up!
I have to say the sense of entitlement and responsibility-passing amazes me sometimes. From what I’ve read, I believe Google went above and beyond what should’ve been expected by any reasonable adult. Yes, it’s a sad state to get hacked, but is it Google’s fault that you were hacked? Where does the accountability of the site owner/webmaster come in? It would stand to reason that the majority of sites are hacked because of poorly maintained scripts…if you can’t properly take care of your site, you shouldn’t have one. Expecting someone else to take care of business isn’t realistic or reasonable. As a very frequent user of Google, I would prefer not to see your spammed/hacked site in my search results. To me, that’s one of Google’s primary jobs…and I’m glad they’re doing it.
It was a pretty good action on Google’s part to inform the Talk.Origins webmaster about the de-indexing, taking into consideration that the site is legitimate and had been hacked. I am sure no other search engine would follow this practice. But the most shocking aspect is that in spite of Google emailing various addresses, including contact, info, support, and webmaster at talkorigins.org, none of the emails were noticed, and so no action was taken.
Savoni says the new law violates his privacy, comparing it to America’s antiterrorism law that allows authorities to monitor Internet use without notifying the person in question.
First thing that comes to my mind is this: if I am a so-called SEO who put spam on a website (or a hacker who did the same), I can also add the website to my list in the Google webmaster console, because lots of site owners do not know about that tool and most of them have no interest in technical issues. So if you give information in the Google webmaster console, I can see it on the console as the hacker or so-called SEO, then remove the spam and try to find other ways to do the same thing after Google has recrawled the website.
In this case, the email address with which the domain was registered may be more trustworthy than the email registered through the Google webmaster console.
Moreover, most small companies work with IT companies or web designers who do everything for them, so a trusted guy for that company could put spam on pages easily.
So how can Google tell whether spam is because of a hacker or because of a so-called SEO?
And I appreciate the information about hacked websites, but if it is just bad SEO practice, what will be different? Do you treat those websites the same way?
Thanks for helpful information.
Matt,
Don’t let this guy get you down. I feel that Google did everything it could to notify this person that his system was compromised, and last I checked that wasn’t Google’s business to begin with.
I have been in the security field for a long, long, long time, and one thing I have realized is that when someone gets hacked they are always looking for someone to blame. He obviously refuses to take any responsibility for his part, and if I were an employee at Google I’d ban his site altogether just for being a crybaby…
Its a good thing nice people like you work for Google and not people like me 🙂
Hope you have a great new year!
Clever how the hackers used JavaScript to emit the div tag, and printed the rest in straight-out text. Unfortunately, today’s web technologies sometimes help the bad guys.
Yes, but what if your site is removed without you knowing whether it is a hacker victim? I can tell you, I have worked my fingers to the bone on my site for three years… carefully following Google’s guidelines. Today, all the links to my site are gone. My home page traffic is floundering. It’s the worst day of my life – all those years, days, nights, and hours wasted.
I can’t stop crying… It feels like a kick in the stomach when you’re constantly building up a company and telling people how to do what that company likes, and BAM – suddenly you’re kicked off.
Mean.
Matt: I managed a site in mid-2005 that was spammed and I cleaned it up within 48 hours, but it was banned by Google without notice for two months, despite my emails begging for help. When did Google start notifying webmasters of such problems, or reinstating quickly?
Matt –
I don’t mean to be funny here, but wouldn’t putting all those obscene terms on THIS blog get your site deindexed?
I’ve been reading that many individuals’ “perfectly good” sites with original content and no violations have suddenly disappeared from the index. I myself received no email or warning in my webmaster console, but one of my sites is completely gone. The console says:
“Googlebot last successfully accessed your home page on Jan 15, 2007. No pages from your site are currently included in Google’s index. Indexing can take time. You may find it helpful to review our information for webmasters and webmaster guidelines.”
It appears that I have made no violation, and I’ve seen others who received the exact same message, with their sites suddenly deindexed. Might I suggest that if Google “burps” in this manner, they make a public statement that their indexing methodology is not perfect? Would it not help to address the panicked mob? I’m serious that this sort of thing could cause someone to have a heart attack – I myself had a panic attack when I saw it, as if I’d made some egregious error with the principal… Moreover, the financial hardship something like this could cause could be tremendous, although, of course, “we shouldn’t rely on Google for traffic, etc.” The point is, however, that this sort of bugginess COULD cause serious injury. What if a site that caters to victims of crime were suddenly unreachable because it had been deindexed via a bug?
I appreciate you being available and hope that Google can do more in the future in regard to situations like this. I am also very hopeful that, as the message I received indicates, my site will be up and running in a short while.
I also wonder if it is true that filing a reinclusion request constitutes an admission of guilt and that it absolves Google of responsibility.
Matt
Is there any way to contact Google if your site does get hacked? About 6 weeks ago we found over 8,000 spam links in Google that appeared to be URLs from our website. After digging about, we found someone had hacked into a directory on our server and dropped in a little 404 URL rewrite and a redirect pointing to their website. Then, by creating a ton of links to non-existent pages in that directory from other sites, all those links redirected to their own site. We quickly found the malicious script (so all the links now report a 404), secured the directory against future hacks, and resubmitted an up-to-date Google Sitemap with all the correct URLs. That was a month ago, and rather than the links disappearing as we hoped, a further 10,000 have just appeared in Google, bringing the total to around 18,000 non-existent spammy links.
We are obviously concerned that this will hurt us, as a lot of the links have porn or hacking stuff in the URLs. Is there anything else we can do to get the links removed?
Joe (UK)
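For a situation like the one above, one quick sanity check is to confirm that the spammy URLs now return 404/410 and no longer redirect anywhere. A minimal Python sketch, assuming a hypothetical spam URL of the kind created by the hack:
import urllib.error
import urllib.request

def check(url):
    # Report whether a URL now 404s/410s, still resolves, or still redirects.
    try:
        resp = urllib.request.urlopen(url, timeout=10)
        if resp.geturl() != url:
            return 'still REDIRECTS to ' + resp.geturl()   # rewrite rule may still be live
        return str(resp.getcode()) + ' (page still resolves)'
    except urllib.error.HTTPError as e:
        return str(e.code) + ' (good: gone)' if e.code in (404, 410) else str(e.code)
    except urllib.error.URLError as e:
        return 'request failed: ' + str(e.reason)

print(check('http://www.example.com/some-dir/?i=spam-term'))   # hypothetical URL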
Matt:
Our site was hacked about a year ago, but we only discovered it recently. The hackers installed hundreds of malicious pages on our server, unknown to us (and not visible to regular site visitors). These pages dealt with porn, online pharmacy/drugs, gambling, etc.
When I checked using Google Webmaster Tools (Page Analysis), it showed that the common words “In your site’s content” were primarily these bogus terms, and the words “In external links to your site” were almost 100% malicious terms. Needless to say, our former top ranking and Google traffic have been demolished, and we rank only #50 for our *own name*!
To correct this, we flushed out our server, reset passwords, and freshly installed our site. Using Google’s URL removal tool, we had all the malicious pages we could find successfully removed from the index. Also, we set up 410 (“page gone”) responses in our .htaccess file for these bad pages.
We made a Google re-inclusion request, but this will probably be ignored because our site was actually still being indexed (just not for the correct, legitimate terms!).
I just checked Webmaster Tools again and, eureka, the common words “In your site’s content” seem to be OK now, BUT the words “In external links to your site” are still showing almost 100% malicious terms. Since Google places heavy emphasis on incoming links, I fear that our site has been permanently destroyed, because there is no way we can stop those bad third-party links (the hackers must have set up some sort of interlinked network of fake pages on many other sites). There are thousands of these bad and unrelated incoming links, and they probably far outweigh our on-topic legitimate links.
What can we do? Is our site permanently “finished” in the eyes of Google?
Any advice or assistance would be greatly appreciated.
Thanks,
SS
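For the 410 approach mentioned above, Apache’s mod_alias can mark a removed page as gone from .htaccess; a minimal sketch, with a hypothetical path:
# Return HTTP 410 (Gone) for a page the hackers created:
Redirect gone /images/spam-page.html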
My site was just hacked. One of my newsletter folders and one of my blog folders were hit (so far), and I did not realize it until I looked at the keywords in the Webmaster Tools account I had just signed up for. Boy, those were not my keywords!!
Two things, at least, here:
1. You say it is the webmaster’s responsibility, but how on earth can anyone keep on top of things that are going on in the background when it is coming through hosting companies and blogs (for example)? My example is very similar to the one you cite above, and when I searched I saw “Brown University”, “Duke University” and many, many others that may not be known by name but are going through the same hack.
2. Why is Google indexing this sick porn in the first place? Webmasters would not have to worry about it if you had a filter, or at the very least put those pages in their own search index and left legitimate businesses alone.
Jan
Your spam protection for commenting is too hard; how am I supposed to know what 3 + 9 is?
But anyway, good insight into how Google deals with hacked websites. However, if you search for text containing “rape” and “animal porn” you still come up with a lot of perverted sites, often containing harmful scripts waiting to be run on unprotected users. Why aren’t these sites automatically blacklisted? Or maybe I didn’t grasp the concept.
Great article about what to do when a site is hacked and Google deindexes it. My site http://www.morocco-moroccan.com was hacked and deindexed by Google; though I fixed my site and informed Google about the matter, Google has not reindexed it to this day. So I created another site, http://www.moroccodir.com, and it was indexed by Google, but its PageRank indicator is grey. Will my new site also get banned by Google?
Thanks.
Hi Matt:
Your article was good reading. On Monday of this week, Sep 17, 2007, I received a message from Google telling me that some pages (my index page) from my site had been removed for 30 days due to hidden data on the page.
They sent us the hidden data.
We did find that there was hidden data on the page. The hidden data, which we had no idea was there, was a duplication of our site’s description. The only thing we can surmise is that during the original construction of our site 7 years ago, the web page designer copied or started a description on the page which was later revised.
I am not a web designer, but I have assumed the position of webmaster on my site. After our site was designed, I took control of it, and I edit the site with FrontPage. I do not use code. Period.
Because I do not use code, I cannot see the code on the site. Using FrontPage, I do not need code. I have no idea if something else is on my site because I never look at the code. It is a foreign language to me.
I have taken the necessary steps to have the pages reincluded.
However, I do have some serious issues with the manner in which Google handled this, and the consequences it has had on my small business.
It seems strange that Google does not send out some sort of warning letting someone know that their site is not in compliance with Google’s policies, and allow them time to correct the problems.
To just remove pages from your search engine has great consequences for a small business. Our internet sales dropped over 50% over the weekend, after our pages were removed from the search engine.
In the seven years we have been on the web, we have gained good positions for our products; now our products are not listed at all, and we have no idea what kind of placement we will get when we are reinstated.
Our site does not even show up in Google search. It’s like we have disappeared off the face of the earth.
Based on the email I received, we will be removed for 30 days. This will have a terrible effect on our business, in the busiest part of our season.
When looking at the information that was hidden, the employee at Google should have been able to determine that this was not an intentional act. He could readily see that it was a duplication of the page’s description.
We had no idea it was there, but we are paying a huge price for a small error.
Can you give me any suggestions? We have found that we cannot contact Google.
Ken Fallaw
I will probably be out of business by the time you read this, but I might as well write this because it may help someone else. I love Google, at least until this week. My site was hacked. We removed the offending script that added 100 pages of spam to our site for helping teachers work with traumatized children. Even AFTER the intrusion was repaired, the hacker keeps posting 100 pages PER DAY of porn, drug and cell phone links that appear to be part of my site for helping children who struggle. I have requested reinclusion, updated my robots.txt file, filed a complaint with ic3.gov, and done everything people say to do. My site still cannot be found for any keyword that it normally could be found for. Traffic is off by 99% and I will be gone within weeks.
There is no way to reach a live person at Google. Spiders don’t understand hacks; they just add more of the junk pages. If you research this issue, hacked sites are a huge problem, yet there is no ombudsman at Google to turn to. I know Google means well, but I was victimized first by the hacker and then by Google. I think Google needs a human ombudsman available only to webmasters who have been hacked AND who file an FBI ic3.gov report. The penalties for filing a false report are substantial; I think few people would do that. Google needs some human mechanism to stop hackers from being able to destroy legitimate sites.
Now, all one can do is sit back and wait and wait, and hope to be crawled soon and that the spider will eventually make all the right interpretations and corrections. Until then, the hacker wins. Google, please consider offering a human resource when a crime has occurred. As it stands, the hacker has 900 pages on my site and I have only 100. That is why I may soon be gone.
My site just got hit by a new WordPress hack. The hacker left hidden text that got my personal blog deindexed. I’ve already cleaned the hidden text out, and applied for reinclusion.
I’d advise Matt and all other WordPress site owners to check for infection. This hack looks nasty.
Ruth Wells
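One way to run the infection check suggested above is to scan the install for telltale patterns. A rough Python sketch; the patterns and the install path are assumptions, not an exhaustive list, and legitimate themes also use display:none, so expect false positives:
import os
import re

# Patterns commonly associated with hidden-text injections (assumed, not exhaustive):
SUSPICIOUS = [
    re.compile(r'display\s*:\s*none', re.I),
    re.compile(r'String\.fromCharCode', re.I),
    re.compile(r'base64_decode', re.I),
]

wp_root = '/var/www/wordpress'   # hypothetical install path

for dirpath, _, filenames in os.walk(wp_root):
    for name in filenames:
        if not name.endswith(('.php', '.html', '.js')):
            continue
        path = os.path.join(dirpath, name)
        try:
            with open(path, encoding='utf-8', errors='ignore') as f:
                text = f.read()
        except OSError:
            continue
        for pat in SUSPICIOUS:
            if pat.search(text):
                print(path, 'matches', pat.pattern)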
I think you could overcome your problem with a properly constructed robots.txt: first Allow lines for all pages that are part of your site, then a Disallow for the entire site.
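A small Python sketch of that idea, using the standard library’s robots.txt parser to sanity-check candidate rules before deploying them. The paths are hypothetical; note that urllib.robotparser applies rules in order (first match wins), while Google evaluates robots.txt by most-specific path match, which gives the same outcome for rules like these:
import urllib.robotparser

# Candidate robots.txt: allow the legitimate pages, then disallow everything else.
rules = """\
User-agent: *
Allow: /index.html
Allow: /articles/
Disallow: /
"""

rp = urllib.robotparser.RobotFileParser()
rp.parse(rules.splitlines())

for path in ('/index.html', '/articles/help.html', '/hacked-junk/spam.html'):
    url = 'http://www.example.com' + path
    print(path, '->', 'allowed' if rp.can_fetch('Googlebot', url) else 'blocked')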
Google manipulates public opinion in Brazil
In 2010 there will be an election in Brazil, and one of the candidates, the governor of Sao Paulo, Jose Serra of the PSDB (a conservative party, and a rival of President Lula), has always been protected by the media (TV stations, newspapers and magazines in Brazil are in the hands of conservatives).
On 13/11/2009 there was a serious accident at one of the governor’s public works, called the Rodoanel (like a “ring road”). There are many allegations of corruption around this project.
I saw on a site a claim that Serra had ordered Google not to publish anything about the Rodoanel accident. At first I thought it was a joke. I like Google and did not believe there was this kind of manipulation. I always knew about the manipulation in newspapers and magazines, but in Google? I could not believe it…
Well, try to find a picture of the Rodoanel accident there. I tried, but I could not find anything. Not a single image. The complaint is on this site (in Portuguese): http://cloacanews.blogspot.com/2009/11/serra-mandou-google-obedeceu-sistema-de.html
Isn’t this scary?
(Sorry for my poor English.)
I know this is years later, but I was just reading this post and was wondering how these issues have changed since then. Is it easier for you now to contact owners? And what is the current format and language of the spam warning emails?
Also, I was curious whether you can tell us a little about how you handled the Gawker hack from Google’s end. Given that it is a very major site, do you delist it? Even though it was hacked, users still wanted to find it.
That is to be expected in a long-term, high-risk project like ours. So, we turned to the blogging community for help – and got it! We have published our problems, and the community responded with results!
Thanks Matt!
My personal website has been hacked, and when you type in my web address a big red sign pops up that says “Visiting this site may harm your computer; this site contains malware…” Basically no one can access my site, as it clearly has been hacked. This is a site I generate work from, and it’s been out of commission for a few months now, and I can’t seem to figure out how to get it fixed! I have called my hosting company, my web designer, etc. Please help???!!! 🙁
Miranda
I’ve just tried to access “10 Steps to Marketing Your Business on Facebook” in Google and got a page proudly declaring that it has been hacked by someone with the highly imaginative name of Hacker. Does that mean I can’t access that page again?