Step into my shoes…

When you work at a search engine, you hear complaints from site owners when they don’t show up in Google. Even if they say their site is 100% clean, it’s often worth a second look. Recently I saw an email about a family-oriented site (potato salad recipes, snack recipes, Christmas crafts, things like that). Their site wasn’t showing up in Google at all and they wanted to know why. I’ll share my reply with you.

I’m not going to be shy about mentioning concrete details, but for this time I’ll anonymize my reply. Here’s what I said:

Hi, I saw that you wrote to user support asking about We recently launched some technology
that looks for duplicate or scraped content. In this case, our
algorithms calculated that there were some strange pages on this domain.

Without knowing which pages the algorithm detected, I looked around a bit.
I’m guessing
probably kicked things off. This appears to be a scraped copy of
In fact, the page on appears to have not one,
but two copies (one right after the other) of the original page from the
FTC? Also,
appears to be a copy of
as well.

It also looked like there were some pages such as
that were selling legal forms? It’s good that these pages have been
removed; it didn’t seem like legal agreements such as
“Contract Employing Real Estate Broker for Lease of Property” were
really relevant to

Other problems that I noticed included pages such as
These pages appeared to be identical pages copied from XXXXXXXXXXXXX?

Finally, what can you tell me about
I noticed that the contact info on WHOIS for that domain is
XXXXXXXXX XXXXXX (; is this the same
person that owns webmastersXXXXXXXXX seems
to have some things we discourage, such as automated submission of urls
to Google, an affiliate link to get “15,000 links with one click” and
selling software on
that includes things like an “Instant Site Maker”, which sounds like it
may be making doorway sites quickly instead of creating useful content.

Anyway, I didn’t have a long time to check out the site you mentioned,
but these were a few things that I noticed. I’d recommend pulling the
pages copied from the FTC and XXXXXXXXXXXXX at a minimum, and taking a
hard look at some of these other issues as well.

Matt Cutts
Google Software Engineer

A little while later I sent a follow-up email:

Also, someone appeared to be scraping Google to create pages such as
I can see where someone was putting up pages that just scraped Google
search results for “ringtones”. I can tell Google was being scraped
because the text didn’t copy some characters correctly. I’d
recommend looking into how scraped results from Google ended up at
the root level of this domain. If a particular scraping program was
making these pages, I would recommend not using that software.

Matt Cutts
Google Software Engineer

You really don’t want to get the sincere email from me. Now is this case beyond repair? No, it’s not. If this site cleans up the scraped/copied/off-topic content, it can still be reincluded.

51 Responses to Step into my shoes…

  1. That is classic – this is exactly the kind of stuff I envision when I skim through all the “I got booted for no reason” threads in a certain well known forum.

  2. Woohoo!

    Matt has a blog, let the games begin!


  3. Gee Whiz, I wish I got letters like that from Google. That’s a blueprint for reinclusion, which would have saved my job one time 🙂

    I find it strange that linking to a site that sells something that’s probably a scraper could get you banned. I’ve never scraped, but that doesn’t mean I might not buy a scraper, and I can certainly see myself downloading something like that via a link from, for example Mr Ploppy. Is this an attempt to stop the promulgation of tools that you or Google as policy don’t like? That seems like a potentially arbitrary point to demarcate the “useful content” line.

    I think it’s great that you made a blog.

  4. Harith Al-Jibury

    Why don’t you just turn pages with duplicate content to Supplemental Results in order to make webmasters aware of such duplicates and allow them to remove such duplicates by themselves?

  5. I’d be interested in hearing how much of your working week is taken up with what could loosely be described as “spam fighting” and how much that affects you. Do you find it satisfying or does it become a little soul destroying after a while, you know a bit same old same old?

    On another matter will we be seeing you in Edinburgh in September?

    Finally a friend asked me to ask [well it’s a friend of a friend actually] what would you be asking per month for ROS footer links? 🙂

  6. Congratulations for your blog… I’m reading you. (sorry for my english :P)

  7. Thanks for sharing your experience with us!

    It must be tiring reading all these similar emails every day. But most people don’t really know the “rules”. I, for one, do not understand why my site does not show up on Google either. And the description does not show up like you described in your Metallica post, even though it is in the ODP.

  8. Hey Matt… Great post but maybe you help clear up some of the confusion that I am feeling after reading it. I have always thought that there was NOT a duplicate content filter that would penalize a site for having the same data as another site.

    If there is a filter then how would the engine know who had the data on their site first? Also, if this was true and I wanted to hurt my competitors rankings then it would seem possible that I could create thousands of pages that displayed my competitor’s content and this would negatively affect their rankings.

    I have always thought that the whole idea behind RSS/XML, etc. data feeds is to take data from one site and display it on another site. If Google and other engines are penalizing for this then why are so many sites adding feeds to their sites? Your blog has a link to your feed located at and if I subscribe to your blog data feed then your blog’s content will instantly show up on my site. Would Google and other engines penalize me for doing this? Just trying to understand….

  9. Do you respond to every e-mail of that kind??

  10. Thanks for the insight Matt – Nice to know that when someone has done so many things wrong in the past that they can still get their domain cleaned up and reincluded. So many Mom and Pop eCommerce shops fall prey to shady SEO companies. Nice to know that life isn’t over for them.

    I really look forward to the insights you’ll provide on this Blog.

    I see your site isn’t yet indexed by Google… Maybe you should bring some lattes down to the engineers. =)

  11. Matt,

    Thanks for taking the time to put this info out there. It’s not like you guys are busy or anything. Enjoyed the Google Dance, especially “Meet the Engineers.” Mom says hi.


  12. Hi Matt,

    I met you today at SES Conference in San Jose, and I have to tell you… you were the best among the panelists. Good sense of humor, good advice about linking… Real useful stuff…

    Thanks a lot for clarifying my questions, and I wish you very Good Luck with your blog site,

    – Praveen

  13. quit whining, hhh. coming to edinburgh, matt?

  14. What bothers me, and probably others, is that the rules only ever seem to apply to small operators. Big corporations and sites that advertise heavily on google seem to be able to do anything they want and get away with it.

  15. Hi Matt,

    This is a great example of what not to do and how to fix it 🙂

    Question: Is there a way to ask you (privately) about a particular popular website that seems to be having problems appearing in Google?

    First, we thought it was a robots.txt issue… But upon looking at it a bit further, we think there is some other problem… The “site:” command lists way too many (thousands) pages for that site… Even though the robots.txt explicitly “asks” not to go to that directory.

    We want to fix things, but are not sure what’s wrong.


  16. Amazing! How many mails like that do you have to write in a day?

  17. How many sites do you take the time to evaluate at such great length? I had a very well established portal site banned 2 weeks ago because of a screwup by an SEO firm we hired 3 years ago (apparently we were linking into several link farms).

    Google has been fairly helpful in letting us know what the steps are to be reincluded, but it would have been GREAT to actually find out what was wrong, as we fixed a few things (dumped the links) and emailed them back but still haven’t heard and are wondering if we found the problem issue, or if there is something else that is stopping our inclusion.

  18. There is a community of multi-bloggers growing rapidly on the web and some are genuinely interested in providing quality original content targeted to multiple niche audiences. At the same time others are attempting to create a lot of ‘nothing’ in the hopes of capitalizing off contextual ads and/or other quick buck schemes; a practice I personally deplore. (and other blog platforms) already seem to foster multiple blogs (according to their FAQs) but beyond providing the ability to create them, the tools for multi-blogging are a bit scant at the moment and I have attempted to communicate suggestions to staff (but that’s in their court, so let’s get to the topic I think you might address).

    I am an unabashed happy multi-blogger and internet consultant. I’d like your professional feedback on the challenges and need for this type of personalized publishing as it relates to Google and content issues. If you are not familiar with multi-blogging, my story might prove interesting and educational.

    I currently run 100 blogs on Blogger, each with a core topical theme, but yet, each is targeted and personalized to a target audience. I used to run one blog that attempted to be a “one-size-fits-all” internet marketing tips blog (my profession is that of an internet marketing consultant), but ONE blog just would not work for me. As I wrote content each week, I felt like I was putting “stretch-pants” on every reader, hoping I could make my content fit.

    My challenge was that my blog audience came from many different companies and were not attracted to such a broad scoped ‘generic’ content stream that one blog allowed. If you’re a ‘blog owner’ you want to read marketing tips for ‘bloggers’, while a web site owner wants to explore internet marketing tips for ‘web site owners’ – not tips for bloggers.

    I have clients ranging from Mary Kay Cosmetics or Avon all the way to your ma & pa at home business owners and entrepreneurs. They all want help in online marketing (which my blog might offer) but here’s the challenge…. An Avon rep won’t read a blog called “Marketing Tips for Tomboy Tools” and a male franchise owner won’t read a blog called “Internet Strategies for Woman in Business.” Even if the strategies and tips are much the same for all.

    This challenge led me to create specialized software (I call it Blog-zilla) specifically for true niche market multi-blogging. Blog-zilla is a blog publishing system that takes a core original article (my blog content) and personalizes and customizes that content for each blog audience. It can post across many blog platforms (though I use and can augment content with matching RSS feeds (if desired) but does not allow RSS-only posts.

    I now easily maintain 100 targeted blogs in about the same time it would take me to run 2 blogs the uni-blog way.

    I now offer the web-based software I created for my personal needs to others, but with a big caveat… knowing that software can be abused or misused for spamming and content thievery (and I personally abhor that), I’ve established strict policies on what is permissible and what is not when multi-blogging with my web-based software. Potential subscribers must complete a qualifying application and go through a 1 week training and monitoring period before they become officially licensed for continued use. Then ongoing training supports the needs of this multi-blogging community with guidance from me in responsible publishing and marketing concepts.

    In the end, I think both blog publishers and the blog readers benefit. I expect multi-blogging to become popular, perhaps not as popular as podcasting, but it is a quality solution for certain publishers; especially for online businesses and entrepreneurs.

    Here’s a Yahoo news story on the software I created:

    Here is our Anti-spam, Anti-Thievery policy:

    Here is a TagCloud of my 100 Blogger blogs:
    (TagCloud actually liked my concept so well, they’ve partnered with me)

    And here (of course) is my Blogger profile and 100 blogs:

    I’ve been multi-blogging for over 6 weeks now and my feedback is positive. Readers enjoy reading a blog that is personalized to them rather than the ‘generic’ no-name one-size-fit-all-stretch-pants variety. It’s really no different than personalizing a newsletter as you send it out to your opt-in list… each recipient reader much prefers a personal touch.

    My question is… will my original content (which is personalized to my various readers) be acceptable to Google? I attempt to vary and personalize things so that a minimum of duplication occurs, but because each lesson or tip I post has common elements, there is some overlap. My tests indicate posts are between 25% and 75% different from blog to blog to blog and all content is author originated.

    My current lessons have been focused on Pay-per-click marketing strategies as I attempt to help people in my industry learn how to run PPC ads.

    In closing, make note I’m not specifically blogging for search engine rankings (that is secondary for me), I’m blogging to communicate with my various audiences. But, I’m really curious if you have an opinion on this.

    Happy blogging!

  19. Hi there Matt,

    I met you at SES ( The Web Hosting Guy ). Firstly I want to say that meeting you and some of the other engineers was an honor and a treat. The way you all handled 30 – 40 people huddled around you and barraging you with questions was exemplary. Thank you again for your time and thoughtful answers.

    Excellent post about a real situation where someone is doing something to be banned and then wonders why they are not included. How about a success story where you have found someone to be removed from the index without good cause (glitch) and was reincluded?

  20. That’s hilarious!

    It reminds me of the time when I was handling customer support for a local cable company (ah, the joys of temp’ing!), and I got a really vulgar earful from some guy who was furious that a movie on HBO had cut out “just at the &*@$*!’ing best part!” He insisted that we “$&*!#’in fix it!” right away.

    I looked up his account, noted that he didn’t subscribe to any premium services (including HBO), and assured him that I’d make sure he was taken care of right away.

    I sent two of our burliest and scariest installer guys out to his house, who then basically told the idiot that he could either give up his black box or he’d be facing quite a bit of (unspecified) unpleasantness. He indeed gave up his black box, and I wish I could have seen his face.

    To this day, it never ceases to amaze me how people can be so ethically challenged AND stupid at the same time. Then again, I suppose that’s much more desirable than evil + brilliant, right? 😀

    Anyway, thanks for starting this blog. I look forward to more info and entertainment. Just sorry I was a cheap bastard and only got the exhibit pass at SES and didn’t have a chance to hear your presentation.

  21. What about reciprocal linking? All of the top sites in my industry do it but we don’t as I think Google will eventually crack down on this. The sad thing is it seems to work very well.

  22. Hah… I wondered what Matt Cutts would blog about and this is *perfect*. Thanks for starting.

    Now where’s the real-world post about all the whining you have to listen to at the webmaster conferences? lol. Or maybe how often “Mom” and “Pop” blame some unnamed third party (“an unscrupulous seo company” or “a screwup by an seo firm we hired”) for their own devious endeavors. It must get very tiring!

  23. This post reminds me of your outstanding email support years ago when I was struggling in an ocean of index spam. I’m glad to see that you’ve kept this attitude. I’m looking forward to reading more stuff like that:)

  24. This blog is a great idea Matt. I’m hoping in a future post you’ll discuss duplicate content pitfalls and what sites should do if they have a lot of snippets and URLs scraped by other sites. Both in New Orleans and at SES there was a lot of buzz about this issue.

  25. Thanks for your insights Matt, best of luck with the site!

  26. Hilarious post, all that site lacked was a “hexagonal water” section. Glad to see you bloggin’, Matt.

  27. This brings to mind a site I have that I assume was banned before I bought the domain name. Since I know the site is clean, but it doesn’t get indexed at all (google sitemap and index page crawled daily or every two days), I have to assume one of two things…it was a previously-banned site, or the hosting company has some sort of blanket penalty applied to it. I’ve requested a reinclusion twice, and only received the standard “Here is how to get indexed” auto-reply. If this was a ban from a previous owner, what does it take to get Google to recognize a new owner and allow the site to start fresh?

  28. Brett said:
    “Hey Matt… Great post but maybe you help clear up some of the confusion that I am feeling after reading it. I have always thought that there was NOT a duplicate content filter that would penalize a site for having the same data as another site.”

    What I have seen hit the hardest is those that “have/own” multiple sites with a very high percentage of content duplicated on multiple sites they control (often sharing the same IP or at least sharing the same NS).

    Anyways, that is a classic post as after looking at many sites that have been hit – it’s easy to see that many innocent site owners are not so innocent. They usually respond with – “oh, that, well ‘that’ never bothered the site(s) before.”

    Matt, you are going to have your hands full with this blog..hehe
    Good Luck!

  29. Hi Matt –
    I have a client who has another SEO pitching to them and the guy showed them a “demo” of how he could take a ranked page and redirect it to their site. I looked at it and it’s obvious he’s cloaking: when you click on the cached version in Google, you are taken to a page that redirects to a popular corporate website (not his). When you click the main Google listing you’re being directed to OUR site. Um, this pisses me off because in an attempt to “demo” his services he is risking my client being banned for cloaking/spamming. Especially because the terms he is using are not at all related – the term is automobile related and we sell travel. How can I report this guy and explain that we’re not doing it? I’d be happy to provide the search term he’s demonstrating to someone so they can see for themselves what he is doing.

    Sorry if this is more than you want to tackle here. We met at WMW in New Orleans but I couldn’t find your contact info.

  30. ok…you answered the easiest site question that is available…but we don’t link to anyone suspect…we are trying to be white hat…but we can’t get into the top 1000…and in our niche…the top website has hidden links and hidden text…and I have reported it via your complaint link…7 weeks ago…nothing done. I, personally am disgusted with google. In our niche…the only thing that matters is being listed in dmoz and age. It stinks!…sorry… we aren’t topix…and we aren’t m.a.s.h. and we aren’t star wars…so listing in the dmoz…we are only a mom and pop site…and a better deal than the top 100 in google but for some reason…since we don’t know every single guideline google has… we can’t show up in the returns…googler stinks!

  31. HaHa.Got to laugh.
    So you guys have found 1 spammer. LOL
    Take a look at your serps, There is a million more you haven’t.
    Oh By the way,
    Get a life!

  32. Matt, since you are now using a blog yourself, my question would be: how much is RSS an issue for “scraped” content? If I read your post right, you’ve got some filters in place to hunt down the duplicate content. But if you look around, many sites syndicate other blogs or news feeds (like Yahoo) into their site. That normally (if you use out-of-the-box software) creates a single sub-page with the RSS snippet…

    It would be great, if Google could say something official about RSS and syndication…

    By the way: good to have someone from G to calm down us hyperactive webmasters – thanks!

    PS: why don’t you have any adsense on your blog?

  33. Hi Matt,

    I would say “good Luck” but… I don’t think you ‘need’ good luck with this blog, its perdy well set to be a success.

    As far as your blog design goes… The lime green sucks, you color blind?

    Can’t wait for all the ‘advice’ on how to rank my sites in google… 😉

    I think NFFC asked first, but I just have to know; How much for ROS footer links? Put me down for a couple, if the price is right…

  34. Matt,

    Great Blog. The very idea that a real, live, Google Guy will share his insights is great! I’m going to become a regular reader.

    Since 1999, I had built a site by word of mouth, and for the last couple of years Adwords. Everything I do is strictly by the book, and completely visible to the audience. i.e. no link buying / selling, hidden text, etc.

    4 weeks ago, I was knocked completely out of the index, but have no idea why?

    It would be fantastic if Google actually told us a list of our infractions that we are unknowingly violating. I have emailed support a few times, and each time they have told me their engineers are looking into it, or to re-read the webmaster guidelines.

    I would happily comply with all requests Google makes of me, because:

    1) I know that the ultimate objective is to make the serps as relevant as possible.
    2) I can only receive Google traffic if I help them with item #1.

    I just don’t want to guess at what I’m violating and start dismantling sections of the site that may be completely innocent, just because a couple of pages are (unwittingly) violating the TOS.

    My question is, does Google help us by providing specific recommendations on fixes, or are we left to guess what to fix?

  35. I guess I just found out why most of the posts on your blog are “closed for comments.” 🙂 Good luck.

  36. Awesome informative blog… I bet you’re going to be TOTALLY swamped with people asking you the same questions… You are very brave!!

  37. GoogleIsAGreatStockToShort

    What does it feel like to destroy the internet affiliate business of single mother with 3 kids whose husband died in Iraq because she signed a guestbook to get a backlink and oh no, made a keyword the same color as the background. The horror! You so low you play handball under a pregnant ant. What software did you engineer? You scrape sites and put ads on it. Got Google stock? AHAHAHAHAHAHAHAHA. Google “shorting a stock” HAHAHAAHAHAHHAAH

  38. Matt, it appears you are very polite to spammers!

    Interesting insights, it will definitely help us understand what’s going on out there.

    Welcome to the blogosphere!

  39. Another blog to brush up my brain 🙂

    Good Luck.

  40. I have a question about “duplicate content”:

    The company I work for runs an ecommerce site in the .com domain for Americans, but also has a site in for the U.K. market. Although the site is localized for the U.K. market in that it has Sterling prices and different shipping and payment methods, much of the “content” on the two sites (.com and will necessarily be the same, as we only have one core product database, and use the same site template for both sites.

    Should we put up a robots.txt on the U.K. site to block Googlebot so that it doesn’t get confused and potentially ban our U.S. site from the index? Seems a shame not to be able to direct U.K. customers to the right product page with Sterling prices, but we can’t really risk getting banned altogether.

    P.S. we met for about 2 seconds at WebmasterWorld in New Orleans. That was the first bar I’d been in that had a Pizza Hut Express window right in the bar!

  41. Matt,
    It would be great if in another post you could clear up when using the “site:” command that five times as many results as there are pages show up for a given domain? Are the extras considered duplicates? If so, how can you ever figure out how to get rid of them? Or which ones to get rid of? Do you just have to wait for Google to figure it out? Is this even an issue?

    Our site was sunk after Feb. 2nd and we have never been able to figure out why. This is the only obvious thing that jumps out.

  42. Spammers are cool, whitehats are clean and white but the blackhats rock and aren’t afraid of a bit of dirt.

  43. Stephanie Krebs

    In regards to the scraping/duplicate content – take for example the legal documents of a family of websites, the Privacy Policy for 10 different domains in 10 different industries, but owned by a parent company. Obviously the legal verbiage will be the same on all 10, save a search/replace for the domain name.

    Could this type of scenario be seen as duplicate content even though there’s a legitimate reason for its presence? Intriguing.

  44. I agree with Kurian: Does Google help us by providing specific recommendations on fixes, or are we left to guess what to fix? And I want to ask too…. Why don’t you have any adsense on your blog?

  45. Someone said that you discussed the 302 hijacking at the San Jose SES…. What about a top ten site that someone scrapes, they add redirect scripts to all your original content pages and put copies of them on their site with the same directory structure, etc.

    It has all the markings of a 302 hijack but it is not. Google index uses all those pages and removes the top site as a duplicate. Any suggestions?

  46. Thanks very much for posting this information and maintaining this blog, but (and you knew there was a but) I’ve got a not-so-hypothetical situation with a site that I can’t seem to get indexed properly. The domain name was previously owned by a link-farming spammer type, but since we’ve taken it over we’ve done everything we can think of and yet we can’t seem to get included and indexed correctly. I’ve submitted support requests for “inclusion” or “reinclusion” and haven’t received a response. Is it possible some sort of block is on this domain name because of the activities of the previous site owner? The site performs very well on other search engines and is getting traffic from some high-profile blogs, yet the PageRank remains at “0”.

    The most frustrating part is that this site is for a prominent political candidate who is doing everything the right way — they’ve got an active blog, they frequently link to relevant content, they’ve got over 147 inbound links and we’ve even implemented the new Google Sitemaps — but it seems the site is being penalized for things outside our control. I know it’s possible to eventually get reincluded because I’ve read about it, but I don’t know how to do anything more than I’ve done.

    Since so many sites are reusing old domain names which might have been owned by bad internet citizens in the past, this seems like a significant issue that needs to be dealt with. I just don’t know what more I can do. Maybe a blog post on this topic or some sort of FAQ would help those of us in this tough situation.

  47. Hi:

    Like many others, my site was completely dropped last week in Google. It has been up for just over a year now with over 600 pages indexed. The ranking has been great as well. I am still new at this, but while researching I did learn about “hidden text”. I have a small amount of text at the top of each page and did it for cosmetic reasons. It’s being fixed as we speak. That is the only thing I can think of as to why they would drop me, and why after all this time? I have not made any major changes in about 8 months. Any thoughts or suggestions would be appreciated.

  48. As we know, google don’t like doorway sites, why I found there are so many results indexed by google for “” or “” ?

  49. Matt:

    I noticed in your emails that you mentioned the webmaster scraped the contents from the .gov website. However, all .gov websites are government sites that contain public domain information. It is perfectly legal to copy, distribute and reproduce that information. So why do you flag the site???
    Aren’t you going overboard???

  50. Matt:

    You said
    “If this site cleans up the scraped/copied/off-topic content, it can still be reincluded.”

    I know many personal sites which obviously do not have a so called “topic” and theme. Say I have a site that is about my life. Obviously there’s no central theme. I can write anything I want. If I came across a great report on money saving tips by the US Gov(Public Domain info) and decided to put on my site to share with my visitors, can you actually say this is off-topic content?

    I am confused!

  51. damn life in the day of a google employee looks like a lot of stress with all the angry webmasters wanting to know why they aint number 1 🙂