Duplicate content question

Someone recently asked me

I read this overview of what you said at an SES conference:

Matt Cutts – Google Not prepared, but informal remarks. High order nits: what do people worry about? He often finds that honest webmasters worry about dupe content when they don’t need to. G tries to always return the “best” version of a page. Some people are less conscious. The person claimed he was having problems with dupe content and not appearing in both G and Y. Turns out he had 2500 domains. A lot of people ask about articles split into parts and then printable versions. Do not worry about G penalizing for this. Different top level domains: if you own a .com and a .fr, for example, don’t worry about dupe content in this case. General rule of thumb: think of SE’s as a sort of a hyperactive 4 year old kid that is smart in some ways and not so in others: use KISS rule and keep it simple. Pick a preferred host and stick with it…such as domain.com or www.domain.com.

From http://www.seroundtable.com/archives/003398.html

If this is an accurate summary, and I’m reading what you’re saying, then there’s no need to worry about duplicate content issues when submitting articles. Is that correct?

My response:

What I was saying was: I often get questions from whitehat sites who are worried that they might receive duplicate content penalties because they have the same article in different formats (e.g. a paginated version and a printer-ready version). While it’s helpful to try to pick one of those articles and exclude the other version from indexing, typically a whitehat site doesn’t need to worry about 1-3 versions of an article on their own site. However, I would be mindful that taking all your articles and submitting them for syndication all over the place can make it more difficult to determine how much the site wrote its own content vs. just used syndicated content. My advice would be 1) to avoid over-syndicating the articles that you write, and 2) if you do syndicate content, make sure that you include a link to the original content. That will help ensure that the original content has more PageRank, which will aid in picking the best documents in our index.

We use additional heuristics of course, but I figured other people might want to hear that take.
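To make that concrete, here is a minimal sketch (the file names and URLs below are made up for illustration). The printer-ready copy can carry a robots noindex tag so that only the main article gets indexed, and a syndicated copy can carry a plain link back to the original:

    <!-- on the printer-ready version, e.g. /articles/widgets-print.html -->
    <meta name="robots" content="noindex">

    <!-- at the end of a copy syndicated to another site -->
    <p>This article originally appeared at
    <a href="http://www.example.com/articles/widgets.html">example.com</a>.</p>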

117 Responses to Duplicate content question

  1. I have a poetry blog. I uploaded a couple of new poems lately, and I included links to the original content in my RSS feed, like you had suggested, because another service was scraping my feed. It did no good, though. The other service now ranks for my poems, and the original content cannot be pulled up in Google.

  2. In this new world of distributed platforms and no “central” web, syndication is an integral part of any content website’s growth. I know that this idea has been floated before, but having a “no-index” tag around content would go a very long way.

    In the current situation you have two choices:
    1) Use a javascript delivery mechanism
    2) Link back to the original article

    Option 1 has issues because you might want to do some data processing on the delivered data. You can get around this by using a staged process mechanism but that’s just a major pain. Plus some people don’t have javascript enabled, etc etc.

    Option 2 has major issues for the site using the content since they’ll be penalized for duplicate information.

    Therefore having an easy “no-index” tag that we can wrap around content to indicate to the search engines to not read specific text would be great. I’m sure this introduces other issues for YOU guys since it can be open to abuse, but that’s why you guys hire the smart PhDs 🙂

  3. “make sure that you include a link to the original content” Are you sure that Google takes that into consideration? I have some examples that seem to prove otherwise…

  4. Hi Matt,

    Thanks for the clarification.

    With regards to the link back, for example :

    Would you recommend including the link-back on a print friendly page also?

    i.e. View this page online at : xyz.com/some-article/

    Thanks

    Shahid

  5. A lot of sites publish articles on blogs, then large aggregator sites grab them and publish them verbatim or just the content of the feed. These secondary publishers often outrank the original source in the SERPs. I think that a way to identify the source website is needed so that original content is rewarded. Maybe a service such as feedburner could be used to identify the true author?

    Also, if you have the good fortune to have an article make it to the homepage of Slashdot or to Digg that will result in a large number of copies of your content grabbed and published all over the web. Even if that article appears first on a rather powerful site it will be quickly outranked by large scrapers and unauthorized republishers on queries for the exact article title.

    I’d like to see the search engines find more ways to give the first publisher the best rankings. It would also be best for search engine visitors because they click straight to the author’s website. This will encourage publication and discourage dupe content entering the index.

  6. Matt,

    What effect does having the same listing data on thousands of real estate sites have? Does it devalue just those pages where the data appears, or could there be a penalty to the home page because of the duplicate content?

  7. Thanks for the tip Matt.

    On a somewhat related note…

    What about entire site copies? By “entire” I mean it’s just a one page site of mine that is copied on about 50 other sites b/c people love what I do. lol

  8. Thanks Matt.

  9. What about tagging? Organizing the same articles in 38923893 different categories…and in the process, targeting 3829239q122zomg293 more keywords than you “normally” would.

  10. Yahoo provides a robots-nocontent class tag that can be used to exclude content from being indexed (or used in determining a page’s weight). Does Google support this tag? If not, are there any plans to support such a tag?

    The reason is that headers, footers and even some page content may be repeated throughout a website, and it would be nice to force an exclusion from any duplicate content penalties.
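    For reference, Yahoo’s version is just a class value on the wrapping element – a minimal sketch, and note this is Yahoo-specific markup that other engines may simply ignore:

    <div class="robots-nocontent">
      Footer and navigation text repeated on every page of the site.
    </div>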

  11. Thanks for clearing that up Matt. I figured as much but wasn’t 100% sure. Now onto bigger and better things.

  12. Thanks for the clarification Matt, it makes sense to put the original URL inside the syndicated article.

    The second point (duplicate content for the owner of a .com and a .fr) is very interesting and is worth an explanation, I feel. If someone has website.com AND website.fr and puts some of the same genuine content on both, will he not be penalized?

    The other thing I wonder is how Google decides which article is the original.
    Is the website that is older in the G index set as the “original” source, with Google tracking the duplicates?
    Or is it only by report of the user (DMCA complaint)?
    Or both…

    Thanks

  13. Hi, Matt,

    In your experience, does Google appreciate it more if a web site has fewer but only valuable pages, or does it prefer a website with thousands of pages that have no real content value but lead to valuable pages?

    And second question is regarding
    Does this tag tell Google:
    Follow links to important pages and count this page’s links, but do not use the page itself in search results.
    Also, do not count the page as possible duplicate content.

    Only consider web pages tagged with content=”index,follow” for results from my web site.

    Thanks.

  14. (some text was missing from previous post)

    ….valuable pages?

    And second question is regarding: meta name=”robots” content=”noindex,follow”
    Does this …
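    For context, the tag in question sits in the page head; roughly speaking, noindex keeps that page itself out of the results while follow lets the crawler still follow its links. A minimal sketch:

    <head>
      <title>Tag archive – example</title>
      <!-- keep this page out of the index, but still crawl its links -->
      <meta name="robots" content="noindex,follow">
    </head>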

  15. Off topic, but please tell us you’re gonna address Microsoft’s bid for a $45 billion buyout of Yahoo. Personally, what do you think? I know as a company man you can’t go near it, but personally?

  16. There is a different, but somewhat important, side of duplicate content on your site — this will apply to most blogs, but maybe other sites as well. It affected me, anyway.

    I run a WordPress blog, with all the archive pages that come with it. By default, WP will use the full content (or at least up to a more-tag) for those pages. That means that an article might appear in several places, especially when it’s recent:

    – the front page
    – the yearly archive (if you use the semi-standard /archive/year/month/day/postname permalink structure)
    – the monthly archive
    – the day archive (same thing as with year)
    – one or more category archives
    – one or more tag archives
    – the single post-page

    Since archive pages tend to receive way more links than single post pages, that might result in the archive pages outranking the single post pages. That’s what happened to me, anyway. It means that new visitors (arriving from some search) might land on a long (archive) page, with the relevant (for them) content somewhere at the bottom. They may scroll, they may not. Most will not, I fear. This means you’re losing potential readers because of duplicate content on your own site.

    Not exactly the same as the duplicate content problems you mentioned, but it’s still a problem. Of course it would be kinda impossible for Google to fix this without adding loads of heuristics. This is a case of rather poor navigation design. And it’s not WordPress-only, lots of blog apps do this. The only way to change this is by making sure single-post pages receive more link love than archive pages, cutting the amount of archive pages, and not holding back on the more-tags and excerpts.

    More on-topic: it should be possible to “backtrace” the origin of a digg by following the backlinks until you can’t go any further. This is most likely the origin, especially if several backtrace routes end up at the same point. But for all I know, you guys may already be doing this. =]
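    One common workaround – sketched here for a standard WordPress theme, where header.php is the usual place and is_archive()/is_search() are built-in conditional tags – is to noindex the archive views while leaving single posts indexable:

    <?php if ( is_archive() || is_search() ) : ?>
    <meta name="robots" content="noindex,follow" />
    <?php endif; ?>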

  17. Patsy Sermersheim

    Matt – PLEASE HELP! I have not received email for two days, nor has the guy who set me up or any of his other clients. We cannot find a live person anywhere at Google to help us with this problem. I have spent two days answering the phone from people telling me my address has “permanent errors”. I was calm yesterday KNOWING it would be fixed today – but not only is it not fixed, I am afraid that Google isn’t even aware of this problem!

    You can’t email me- my number is 615 509 7413

  18. Why not stop worrying about duplicate content when you know you are not having any real duplicate content? If you are worried about content duplicated in your own blog/website, why not add a nofollow to these print pages, tag pages etc.? Having a [] in your index.php, archive.php, tag pages, search result pages etc. in order to display just a small summary of your article, with all of these linking back to the full article page. This way you can guide the search engine bots clearly to the full article, because in my opinion sitemaps are something which was used 2-3 years ago; now the SE bots are going to index your pages within minutes of posting them.

    Another important point is the domain name / website authority, which depends on many factors like the age of the domain/website along with relevant backlinks and content which is useful both for visitors and search-engine-led visitors; these rank better in SERPs than new websites. In this case, even if an authority website copies from a new website they still rank at the top. An example of this being an article submitted to ezinearticles.com and the same posted in your blog – obviously ezine’s article page would rank #1.

    The point Matt has given above makes sense, but I feel there are a lot more factors related to duplicate content and its ranking factors.

    Regards
    Amit Bhawani
    http://www.amitbhawani.com/blog/

  19. “If you do syndicate content, make sure that you include a link to the original content. That will help ensure that the original content has more PageRank, which will aid in picking the best documents in our index.”

    Based off of that statement Matt, are you saying that even though the original content was added to my site and indexed by Google first, a site that syndicates content could outrank my page for even the simplest term such as the article title?

    What about scrapers that have “more PageRank”; are you telling us that even though I’m the original author and my site had it first, some site can come along and take my content and push my site down?

    Amit Bhawani…

    Duplicate content is an issue for many people simply because we cannot always control who uses our content and who doesn’t. No Follow is helpful but that is only on our end.

    One of my co-workers suggested copyrighting and that is always an option, but when you’re dealing with small business owners who don’t have the time, money or resources to combat any number of scrapers, then copyrighting becomes illogical.

  20. If only your checking of PR and “additional heuristics” would work properly! There seems to be a lot of Google collateral damage in this area.

    Why is it that I am seeing many sites disappear from top Google rankings?
    1 – less powerful sites go to 100+
    2 – more powerful sites can just drop from 1st to 10th

    I then check for other sites duplicating the meta description and words, including one before and after the main search phrase. After finding many scraper websites, I then change the text on my clients’ pages, and find their rankings:
    1 – come back from the dead the moment that Google recaches the pages
    2 – slowly come back over weeks after having made the changes
    3 – some seem to still have the “penalty” attached to them, despite the change in wording.

    This aspect of the job is taking a good deal of my time.

    My article on this – Change your pages wording frequently

    I did an experiment where I copied an article word for word on another of my sites (Duplicate content experiment). From 18 November 2007 till now there has been a tortuous journey of the pages being decached, the copy showing, and my page being ditched, to only now, several months later, the correct original page being shown.

  21. Sometimes – due to a bug in Google – an older page will in fact be deindexed due to duplicate indexing issues if a copy – for whatever reason – gets prioritized on the SERPs.

    This has happened in the past and continues to happen. It could be a backup domain with the same pages duplicated, or even someone replicating a page.

    The duplicate pages sometimes take the place of the original web page on the SERPs and even replace it in some of the keyword rankings.

    Sometimes this will take months to self-heal – sometimes, years. Also there is a NEW algo threshold of what is considered a duplicate page – it no longer has to be 100% or close – it appears the new algos have lowered the threshold to the majority of a page being duplicated or having duplicate information or SHARDS.

    blog.searchenginewatch.com/blog/060313-090116

    Sometimes this will appear on some datacenters but not others.

    BTW:

    What is your take on the Microsoft Yahoo merger?
    It is frustrating to want to talk about it – but be stymied by corporate policies.
    This would be a perfect forum to debate it.
    Perhaps Google should place its bid for Yahoo 😀

  22. How can I solve this problem with duplicate content if it’s used more than 3 times? All links point to the homepage, but the parameter A_ID is a value that identifies our affiliate partners, and it’s used often.

    http://www.site.top/index.php?A_ID=1111
    http://www.site.top/index.php?A_ID=2222
    http://www.site.top/index.php?A_ID=3333
    http://www.site.top/index.php?A_ID=4444
    http://www.site.top/index.php?A_ID=XXXX

    greetings
    Franky
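    A common workaround for this kind of tracking parameter – sketched below in PHP, with the parameter name taken from the comment above and the cookie handling made up for illustration – is to record the affiliate ID once and then send a 301 to the clean, parameter-free URL, so only one version gets indexed:

    <?php
    // index.php: remember the affiliate, then redirect to the canonical URL
    if (isset($_GET['A_ID'])) {
        setcookie('A_ID', $_GET['A_ID'], time() + 30 * 24 * 3600, '/');
        header('HTTP/1.1 301 Moved Permanently');
        header('Location: http://www.site.top/index.php');
        exit;
    }
    ?>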

  23. I am currently mid-way through releasing a 10 part series of articles through various article syndication sites.

    A lot of time and effort has gone into producing 10×500 words of interesting reading that may entice other websites to republish my articles (and yes this is purely for linkage purposes from relevant and themed sites…but it’s their choice based on the quality of my content).

    I have not placed the articles on my own site for fear of a penalty, as the syndication sites often get indexed long before mine – shortly followed by scraper sites. A waste of content, but hey ho stuff ‘appens

    Is duplicate content really a penalty or is duplicate content simply ignored?

  24. On the dupe content issue may I ask this general question –

    A 12-year-old, high-PR site has 100 pages and images of content (out of 2,000 total). It wants to put the same images and captions on a new domain which adds more functionality to the images (send, buy, save, tag, etc.) and also offers other display options (slide shows). The text, image names and file names (but not the domain, of course) are the same.

    Does the second new site need to robots.txt out the search engines for fear of hurting the first one?

  25. So would this mean it is not wise to write an original article and post it to 50 article sites along with having that article on your site?

  26. Matt,
    So, am I correct in assuming that creating a mobile version of my website – which would, of course, be an entirely duplicated site – will not create problems? Or should I use “do not index” tags on the mobile version of the site?
    Thanks.

  27. Dave (original)

    I have never understood why anyone would give away their own site content to another site. Likely started when some “SEO/forum guru” suggested article syndication as SEO.

    Seems to me that scraping site content is problem enough.

  28. @ John Jones: If you have a fear of others copying your content, read this guide on how to copyright your content. This way you first email the copycat website owner’s webhost and send them complete information proving you own the original content, and get them to remove those articles from their website or else require them to link back.

    Edit: No idea why every other commenter asks an unrelated question about the Yahoo-MSN merger – don’t you guys think Matt would have made a post if it were something he could discuss? 🙂

    Regards
    Amit Bhawani

  29. Thanks for clarifying the syndication thing. I still talk to lots of people who think syndication can do no wrong. But I’ve seen sites that get outranked fairly consistently for their own syndicated content, even with links in place.

    Shahid – You shouldn’t need to include a link on the printer-friendly page, but it might help if people scrape your pages. It’s best to stop having it indexed altogether, using robots.txt or nofollow. Then you don’t have to worry about dup issues internally.

    If people are linking to your printer-friendly pages instead of regular ones, you could lose some links from not having them crawled. But that should be rare.
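    A minimal robots.txt sketch, assuming the print versions live under a /print/ directory – adjust the path to however your CMS actually builds those URLs:

    User-agent: *
    Disallow: /print/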

  30. Suppose two sites, A and B.

    Now B is using some content that was first published on A. But B is also offering a link back to A indicating it as the original… here they played a trick and put a nofollow tag on the link to site A.

    Now the question is, will Google penalize B for this or not????

  31. I used to syndicate myself all the time, double posting articles on dotcult.com and shoutwire.com. Now I simply include an RSS feed of my posts from one site on the other, and if it’s a bigger post I just summarize it and link to it from one of the other blogs.

    I’ve found it works much better to do a new blog post on one that summarizes, references, and links to the other. No penalty, and users start actively trying to read both the full post and the summary and commenting on both.

    SEW, Google sent out a notice telling all employees not to comment on the MS / Yahoo thing either officially or unofficially.. so I don’t think Matt is allowed to.

  32. Hi Matt

    This all confuses me.

    Content is the heart of any website, but that is not enough to rate well with Google; we need to find quality contextual links to point to us.

    As we choose not to purchase links, the next best alternative is to share our content with other credible industry websites. But due to duplicate content, in many instances we will lose our content value.

    What are your recommendations?

    Have an exSEOllent 2008

  33. So, Matt, is PR what actually defines which one is the original article? Don’t bigger PR sites gain an unfair advantage because of this?

    Thanks,
    — Mike

  34. “I have never understood why anyone would give away their own site content to another site”

    Well, some of us from the open source world thought it was the “right thing to do” – if code should be freely shared, so should content. So, originally, I licensed my stuff under a Creative Commons license and thought that the little guy who needed a couple of articles would be helped, or if someone needed something for a newsletter..

    Yeah, right: the reality was very different and I quickly changed my mind about that. And it actually makes sense: code executes, web pages are read. They aren’t the same, and there is no reason you need to physically copy my content: link to it.

    If someone does want something for a newsletter or some other special case, I am happy to give permission, but they do have to ask now. No more free copying.

  35. Perhaps a solution for dup content is a registry – URL, article title, date & time stamp. First to register is the original author; all else is dup content.

  36. Matt –

    On the topic of content duplication, I was wondering if you could pass along the following suggestion to the Webmaster Central team –

    Add a “Last Updated On” date to the Diagnostics -> Content Analysis page.

    After going through the posts on Google Groups, it seems that a few webmasters were curious as to the update frequency of that page since they corrected identified issues yet they still display as problems.

  37. Yeah, I know about the same-content penalty and how it could get you de-indexed… But is it REALLY such a big deal that we need to try to avoid making duplicate content at all costs?

  38. SEW, Google sent out a notice telling all employees not to comment on the MS / Yahoo thing either officially or unofficially.. so I don’t think Matt is allowed to.

    At the risk of sounding like a smartass, wouldn’t that include talking about the notice itself?

  39. Dave (original)

    Well, some of us from the open source world thought it was the “right thing to do” – if code should be freely shared, so should content. So, originally, I licensed my stuff under a Creative Commons license and thought that the little guy who needed a couple of articles would be helped, or if someone needed something for a newsletter.

    IF your aim is to expose the content to as wide an audience as possible, why worry which one Google picks?

    I still say the vast majority erroneously give away their content because they THINK it will help with “SEO”.

  40. Hi Matt, although I subscribe to this feed this is my first time posting a comment. Thanks for all of your good advice!

    I have an issue regarding duplicate content. I am not an “expert” per se, but I am learning from every resource I can find. Here’s the situation. I have been using Blogware (Blogharbor) for a few years now. I have been indexed quite high for some keywords and am getting some good daily traffic from the blog itself. I have now taken all of my blog entries and uploaded them to my own domain (…./blog). The concern I have is that now I have identical articles posted on Blogharbor and on my blog (/blog). I am not sure how I should work this as far as duplicate content and my previous pages that are indexed with Blogharbor. I don’t want to lose the traffic, but don’t want to be penalized for duplicate content. Should I cut my losses and cancel Blogharbor, or deal with the penalty from Google (and the other SE’s) for duplicate content? Any bit of help on this subject would be appreciated.

    Thanks!

  42. Just feedback! I wanted to drop a note of feedback for Google on a policy with which I don’t agree. I can remember that eBay and Amazon both disappeared from your search results for something Googlebot didn’t like and disqualified them from the results. Google fixed the problem within a couple of hours. Yet when a regular webmaster like myself has a problem with Googlebot, it has taken almost 3 years to gain back trust from Google, and still no results. I have done and followed all the steps in your guidelines, and read webmaster groups and blogs. I have also talked with pros on webmaster sites and none can find a problem with my business. It has been 3 years. I can understand that Amazon and eBay get a lot more traffic and may affect more employees. However, my business’s disappearance affects me as much if not a whole lot more than it does them, in comparison. How come Google acts like this? If the Internet were equal, or policed and fair, your company wouldn’t get away with this. All I am saying is that if you can fix them in hours, how come I can’t get fixed in years, with the help of pros from well respected SEOs?

  43. Thank you for this Matt, I still have a question that I hope you can help with.

    Not long ago, I moved a wordpress.com blog to GoDaddy using WordPress software.

    The problem I came across was that WordPress.com doesn’t offer 302 redirects, only 301.

    As I’ve been building this blog for over a year now, and although there are not many links in, I still did not want to lose the links that we did have. As a result we used the 301, but it bothers me a bit.

    Does this now pose a problem with duplicate content? If it does, I would love any suggestions/advice if you have the time.

    As the move itself was quite tricky, we compiled a ‘how to’ document for other people wanting to move from wordpress.com to self-hosted.

    The only area we didn’t cover was duplicate content as we were not sure how it fits in, and did not want to offer advice that was incorrect. If you have time, I would love your take so we can include it.

    Thanks again for this post!

  44. Dave (original) – the vast majority of ‘duplicate’ content is syndicated news content from Reuters etc. Every news organisation on the planet, thousands and thousands of articles / day.
    Nothing to do with “SEO”

  45. Dave (original)

    Chris_D, yes I know, your point?

  46. Matt, I remember a conversation we had a while back at the Google Dance @ SES about duplicate content issues and it sounds like there are still issues if you need a “link back” to establish ownership.

    Does this mean that scrapers can still get top billing for your content, because they certainly don’t give links?

  47. Dave (original)

    If *your original pages* are being outranked by a scraper site, you have much bigger issues to worry about, IMO.

    My site pages are frequently scraped, yet not 1 of the scraped pages outranks my originals. Google is pretty good at ensuring only the original ranks, unless one constantly and frequently syndicates their content.

  48. With regards to the link back, for example :

    Would you recommend including the link-back on a print friendly page also?

    i.e. View this page online at : xyz.com/some-article/

    Shahid, I would pick one version of your article that you want to be preferred (search engineers at Google call it “canonical”) and point everyone to your preferred url for your content.

    Yahoo provides a robots-nocontent class tag that can be used to exclude content from being indexed (or used in determining a page’s weight). Does Google support this tag? If not, are there any plans to support such a tag?

    The reason is that headers, footers and even some page content may be repeated throughout a website, and it would be nice to force an exclusion from any duplicate content penalties.

    Veign, we don’t currently support that tag, for a couple of reasons. We think we do pretty well on detecting boilerplate (e.g. you’re not likely to run into any issues of duplicate content for header/footer type stuff). The other reason is that we haven’t seen a lot of sites using the tag after Yahoo mentioned it. Given the choice of where to put engineering resources, not a ton of people have asked for this feature.

  49. Emmanuel, normally a .com vs. a .fr would have French for the .fr and English for the .com, and in that case there’s almost no way that duplicate content would be an issue between the two sites.

    Max Roeleveld, good points. The recent update in WordPress (2.3) does much better about uniting url aliases under one url. But any software package that allows monthly/daily/yearly archives, tags, etc. will always run the risk of having content appear under different urls. My guess is that over time, both Google and WordPress will get better about such issues.

    Patsy Sermersheim, I don’t have the cycles to contact everyone who is having issues with their site. But Googlebot is designed to handle temporary issues (such as a web server being down) pretty well. If you can reach your web site with your browser, Google can usually crawl it. And if you can’t reach your web site with a browser, that’s the issue I’d concentrate on.

    John Jones, every search engine is going to employ heuristics to try to find the best copy of content. For the most part, those heuristics work well and pick the best copy, but there are definitely steps you can take that make the decisions easier for search engines.

    Omar Khan, I would try to make sure that the newer site has enough new information/copy/details that it’s clearly different from the older site.

    Lid, it sounds like the blog has well and truly moved to a new location. Since it’s not a temporary/transient move, using the 301 sounds perfectly appropriate to me.
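    For anyone making a similar permanent move where you control the old server, a minimal Apache .htaccess sketch (the domain below is a placeholder; adjust to your own setup):

    # send everything on the old host to the new domain, preserving the path
    Redirect 301 / http://www.newblog.example/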

  50. Hey all, my parents came in on Friday and are visiting for a few more days. My Mom wants to kick me off my laptop to catch up on her email? Can you believe that?! I’m all like “Why didn’t you bring a laptop with you?” and she’s all like “I gave birth to you and took care of you for years and raised you well” and in truth, that probably trumps anything I could say. Maybe I should get her some flowers, too. 🙂

    So I’ll be a little scarce this week on the blog..

  51. Hi Matt

    Thank you for your response, it is very kind.

    However, I’m a huge dope and I hope you will not hold that against me.

    I got the numbers back to front – it turns out WordPress only offers a 302 redirect (temporary) to a new domain, not a 301 (permanent), even though it is a permanent redirect. (Had to write that so I don’t feel I’m going sillier than I am.)

    I am sorry for the error, I’m taking huge amounts of antibiotics and pain killers for a tooth ache – and I really hope this is the reason I am dopey today.

    I hope you will take some pity on this person that needs your advice.

    Finally, having little people myself, I have to tell you – give your mom your laptop! And forget the flowers – think LOTS of hugs!

  52. I have recently been involved in article writing for my customers, and this piece of information is excellent as I have been worried about duplicate content issues.

    I have been using some of the popular article submission sites such as ezine, article hut etc. (well, here in the UK), and have sometimes been submitting the same or a similar article to a blog on my customer’s site.

    As this creates two separate RSS feeds, one from the article site (which supplies the author with a feed) and one from the blogging software installed on my customer’s web site, will this affect the way that G interprets the text?

    Keep up the good work!

    Nick

  53. Hi Matt,

    I have a large website with thousands of different pages. We are currently targeting a global audience, but want to specifically target a certain segment. In terms of duplicate content, can we have exactly the same page (in the same language and format), but just under a different domain, i.e. .co.uk, .com, .co.nz, .co.au, etc.? Then can we geo-target each different domain to its country?

    If you reply I can give you some further information about this point.

    Thanks

  54. Hello Matt, good point for .com and .fr, even though sometimes some English content is pushed onto the .fr because the translation is not done yet.

    The question of duplicates remains for website.com and website.co.uk domains (both English content)

    No! do not tell me that we have to write the first content in American-English and translate it for the second site in UK-English 🙂

  55. Hi,

    Can someone guide me here to read more about duplicate content issues related to a web page and the stripped mobile version of the page?

    – example.com/article-1.html (web page; with lots of stuff)
    – example.com/mobile/1.xhtml (mobile page; only main content)

    I wanted to know –

    – how Googlebot accesses these pages
    – how these pages are indexed
    – submitting sitemaps for both in Google Webmaster Tools

    In my earlier experience, I have seen mobile content mixed with normal content in the Google search index, and that caused landing on the wrong page, i.e. the stripped mobile version of the page (and that caused fear of being penalized because of dup content!!)

    I am sure I will get the right answer here…

    Thanks & regards

  56. Matt…. here should be your motto: “your mum is ALWAYS right” 🙂

  57. Matt,

    “Maybe I should get her some flowers, too”.

    You go out and buy your Mom a new laptop and keep it at your home for her usage when she is on visit. Because:

    “I gave birth to you and took care of you for years and raised you well” 😉

  58. Duplicate content can put both parties at risk.

  59. Thanks Matt for clearing up this issue; I used back-links to my articles, which was the right thing to do.

    best regards

    Frank

  60. Thank-you Matt, I will robots.txt the stuff out. I do think though that Google should find a way not to penalize duplicate content in another presentation form – especially images and captions – and where the two sites link to each other and acknowledge a relationship. But I understand the complexities of doing this in an automated fashion. I appreciate your response.

  61. Google is good but not all that good at picking up dup content. I have seen examples of only slightly changed work indexed for very similar searches.

  62. Thanks Matt for your valuable information, and long live Matt-MaMa

    Deb

  63. Matt,
    One of our clients has multiple domains. Most of them are 301 redirected to the main .com domain. For the Dutch language we use a .nl domain, which is not 301 redirected in any way. Since we use another language, I don’t think we will have duplicate content issues.

    But we also use a .be domain (Belgium, with also Dutch content). We don’t redirect the .be or .nl domain to one another, because we would like to rank high in google.be with the local .be domain, and rank high in google.nl with the local .nl domain.

    Do you think this is the best setup?
    I appreciate your response.

  64. I’ve currently got a big issue with my site and the consensus after being reviewed by dozens of top SEO’s and Webmasters is it’s due to either duplicate content or a glitch by Google.

    My site was getting 15,000 visitors a day, from some 100,000 unique search terms per month and 90% of traffic derived from Google.

    On the 26th Jan in a matter of minutes traffic dried up to basically zero, all Google rankings vanished… Many thousands of page 1 positions there one minute gone the next.

    I do syndicate content, and all syndicated content contains a link back to my site as instructed. My site has PageRank 5/4/3 pages; now when searching for anything from my site I get WordPress “splogs” coming up on page 1 and I’m not in the top 500 results.

    These “splogs” generally have PR0 Homepages and PR N/A pages my content is on and more advertising jammed in to their sites than you can poke a stick at.

    Even searching for “My Content Title – My Domain Name” brings them up on the first page and I’m back on page 4 of 4.

    How is this a good user experience?

    My site didn’t buy and sell links, complied with the Google Webmaster Guidelines, was monetized by Google AdSense and was working to one day become a Premium Publisher. The site was my only source of income, as I recently quit my job to concentrate on developing my site so I could work at home and look after my special needs daughter.

    What do I do? Pollute the web and join these automated “splogs” instead of doing things the right way? If Google’s “heuristics” don’t start working correctly by the end of February, my net connection and hosting will have to go in order to eat.

    I thought Google’s motto was “Don’t be Evil”?

  65. I have used duplicate content and have not been harmed whatsoever.

    Quite frequently when posting a blog entry I find it useful to quote another person, so I will quote them and attribute them via a link to the original commentary.

    Plus I add my own unique thought and views on the post.

    Matt has been saying for what seems like a century now…

    If you are worried that much about duplicate content, you are very likely doing the wrong thing.

    Here is a really interesting thing I have personally found: the moment I stopped worrying about Google and focused only on producing content for my visitors, my traffic began to soar.

    Matt, that’s the best advice I EVER received for publishing content!

    Dupe content folks, algo analyzers et al… Don’t bother looking to exploit issues with the Google algo, they fix broken stuff and they fix it fast.

    Rather than poke holes in an algorithm why not roll your sleeves up and get busy with some hard work, find your voice in your niche and go forth and be interesting.

    Might sound like rubbish to you hardcore SEO’ers but it’s some simple advice that has worked very well for me.

    Be real, be useful and you will do well

  66. Matt, that’s interesting, but what if an original site, a .com for instance, publishes an article, and then you set up a UK operation with a .co.uk site carrying the same dupe content that is totally relevant to the UK audience too, but you want to keep the US/UK sites well apart? Posting a link to the original articles in that case might not be appropriate. How would that be dealt with?

  67. I agree with Colin McDougall..

    Personally I would prefer that G could be more efficient in detecting the original source of a text, but for now it seems that this is not happening, even with the link to the source.

  68. Matt,
    Question: the company I work for has a blog at blog.skylighter.com. We write articles now and then for an email newsletter; these go into an article section on our website, as well as appearing in our newsletter archive. Since we started the blog we also post them on our blog. Would this create a duplicate content problem since the blog is a subdomain?
    Thanks!
    -Jess-

  69. I had a major problem with duplicate content on two of my web-sites. I had two web design sites that had similar but not exact copy. You could pick out the same sentences in both of the sites, and I was heavily penalised for this.

    The result was that each site seemed to alternate in the rankings, a bit like Clark Kent and Superman – you never saw them together. Then eventually they both disappeared off the rankings altogether. It took me a long time to work out what was going on.

    After a long while of researching and changing the sites, I came to realise it was the duplicate content issue, and I saw just how sensitive Google is at picking it up.

    Once this problem was fixed, it took a good few months for the sites to get back to where they were.

    Anybody else seen anything like this?

  70. Dave (original)

    Here is a really interesting thing I have personally found: the moment I stopped worrying about Google and focused only on producing content for my visitors, my traffic began to soar.

    Prudent advice. The sooner Webmasters figure out “SEO/SEM” is a myth the better off the search World will be.

    BTW, doesn’t the “S” in “SEO” stand for Snake and the “O” for Oil 🙂

  71. Duplicate content is also a huge search spam issue. Here is a website whose only indexed pages in Google are all duplicate content; it uses numerous techniques to spam Google, including illegally copying others’ content, and yet Google ranks it very highly for its respective keywords (number one in many searches):
    headlightso lution. net (just take out the spaces – I did that so I wouldn’t add to their already false, deceptive and inflated rankings.)

  72. Dave (original)

    Bluegill, the site you mention has a TBPR of 2 and doesn’t rank for any of the main targeted terms. I have no idea if that is due to the reasons you suspect or to others. Regardless, Google seems to have it right.

    Would I be correct in assuming the site is in competition with you?

  73. I find this talk of duplicate content very funny. I have reported a case of duplicate content to Google three times, and after 6 months the 2 sites owned by the same people with exactly the same content are number one and number four on a popular search term.

    This is the search URL http://www.google.com/search?hl=en&c2coff=1&safe=off&rls=en&q=brazil+property&btnG=Search and it doesn’t take long to figure out that google is being spammed in a big way. This probably explains why the said company ranks no. 1 for nearly every term related to this search.

    Rob.

  74. I wish I hadn’t written anything here at all now. As soon as I said something my site got thrown onto the second page 🙁 .

    Maybe blackhat seo does pay off after all.

  75. Hi

    We run a scripts directory where developers submit their web scripts developed in various languages. Many times, we find that they submit the same content as they have submitted to other directories. How should we deal with this type of content?

  76. Yes, I have commented on this before, but it’s happening again…

    I am getting increasingly annoyed at how Google is handling duplicate content – i.e. scraper websites taking either the meta description, or the snippet around the search phrase, of a top ranked page. Google is treating the original as worthless content, and wiping it from its top SERPs.

    When I change the text of my pages, rankings come back. But for a client’s site, it had only come back for less than a week before it was again dropped from the SERPs because the scrapers had found it and copied the new content.

    Another client has had its home page dropped from the SERP’s for now a week. And although Google has a newly reworded cached copy dated 15 Feb and now 17 Feb, Google has not yet integrated that content into its indexes properly, so the SERP’s have not yet returned. Matt, why is it taking so long for your indexes to be updated? You have commented before that Google has been superfresh, but its now been 5 days since a cache, and that cached information is still not in your indexes properly. Pathetic!!!! A few weeks ago it was max a day from cache to integration into indexes. And now I see that a scraper site has yet again taken a copy of the new meta description…

    I have other things to do and should not have to continually update the content of my clients’ and my sites. The text was “perfect” the first time – usability, good snippets for many phrases… Having to continually rewrite is not appreciated.

    When is Google getting rid of this collateral damage/bug in its algorithm? Can you at least acknowledge its existence, as you have with other bugs in your algos?

  77. I’m not sure I follow this 100%. I’m currently working on a review site. One of the features I would like to offer is different versions of the reviews in different languages. Since my rating criteria are always the same, I do the write-up only once in English and then have the review translated into French by a professional translator. I would hope that having a structure like http://www.example.com/en/service_name.htm and http://www.example.com/fr/service_name.htm would not incur a duplicate content penalty, while at the same time allowing for links in separate languages to the two language versions (and URLs) of the review. It would be easier and more cost effective for me to run one site with all the languages rather than separate domains and hosting accounts for each. Should I rethink my approach?

  78. I have a PR6 blog that has been around for 5 years, I get around 50,000 visits a day as a result of Google searches. My content is all original.

    What I am finding is that some of my posts are being copied by others and posted to community-content sections of large established sites such as Yahoo or Epicurious. When the posts are copied, a link to the original is often not included.

    When this happens, even if my post has been around and indexed for years, it will be removed from the Google index and priority will be given to the copy on the other site. I am assuming this is because Google trusts Yahoo.com and Epicurious.com more than it trusts my site.

    When I find that one of my posts has been dropped from the index, I do a check for a snippet of text and I almost always find that the post has been copied to one of these big sites. My only recourse at this point is to re-write the post, change the sentence structure of each sentence. Within a day or two after doing this, my post is back in the index.

    This is extraordinarily time-consuming. And it is a problem that will only get worse as these user-generated community content sites get bigger.

    So, I would have to agree with other commenters that the Google approach could be greatly improved.

    One thing that would make it easier for me to manage is to be able to easily tell which posts from my blog are no longer in the index.

    The Google webmaster tools let me see the gross number of pages Google sees from my Sitemap compared to the gross number of pages from the Sitemap that Google is indexing. For example, Google sees 754 pages from the sitemap and 751 of those pages are in the Google index. What about the missing 3 pages?

    At the moment, the only way I can figure out which pages are having problems with being indexed by Google is to search for them manually, one by one.

    If Google can already see the number of pages in my Sitemap, and can see the number of pages that are indexed, can’t Google also provide the pages from the sitemap that are not showing up in the index?

    And if Google can’t do that, do you know of another service that I could use that could do this?

    Of course, if the pages wouldn’t go missing in the first place, that would be ideal. But assuming that it will take some time for Google to straighten out its method, a way to help us manage through the pages missing from the index would be very helpful.

  79. What is Google’s stand on using parameters to create pages that basically regurgitate content on a site?

    I ask because I have seen quite a few sites lately that are tacking parameters onto a category page in a blog, for example, and then returning different content if those parameters are present. In the cases that I have seen, they are taking each category link and creating 20 or more links that each show the same posts, but in each the posts are shown in a slightly different order. The result seems to be the creation of several hundred pages of fake/duplicated content on a site.

    Obviously I consider this to be a bad thing, am I wrong?

  80. Hi guys, any help on this one from anyone would be greatly appreciated. This is the situation. I was working with one creative writer [I’m biting my lip, not trying to say his name] that I don’t normally work with; his job was to write articles for a new site I’m throwing up, aboutvegas [dot] tv. The problem I ran into (my fault) is that I’m so used to working with the guy I usually work with – I give as much money as needed when called for – that I got to the point with this other guy where I ended up sending him close to $800 over what he had already written for articles he was supposed to write. He had already written about 80 plus articles when he asked for another $200 (to help pay for rent) on top of the $800… that’s when I told him that I would need to bring the balance down to $200 before I could send him more money, because the balance was getting bigger than I wanted it to… to make a long story short, I told him that I would not need him to write any future articles after the balance was gone, and he then told me that he would only be able to write one article a day (compared to the 10 to 30 he would write a week), because he said he would have to factor in trying to make up for the money he wouldn’t be making with me with new work from other people… so basically he was putting all my work on the back burner. I bit my tongue again and, in short, asked if we could just finish. He wrote one article in 11 days (I feel nauseous). On the 11th day I emailed him and asked for a refund. In all, we both threatened legal action (it’s not the point of the $800, I just really can’t believe someone would try doing this to someone who had, in my eyes, helped them out…).

    So now, I’m assuming he will republish some of the articles in article directories, or use them, or sell them to someone else… “I had no written contract saying he could not”, only a verbal agreement and an email in the beginning that told him I would need all rights to the content that he wrote. But all said and done, if he publishes it and I find out, or get hit in the search engines for duplicate content, by the time I go through the court process it will have had a huge effect on rankings, I’m assuming. All in all, I really don’t care about the money… I hate to be the one that loses out on it, but the ranking is what I really care about. SO ANY ADVICE IS APPRECIATED HERE, TEAM!

    I was thinking I would contact him and tell him to keep the money, just to leave it on somewhat good terms so he doesn’t use my content – but he’s going to keep the money anyway… I’ve never come across this situation before. Help guys, Matt, anyone, please…

    Thank You!

  81. P.S. The site’s new so I haven’t added the content yet. I’ve thought of just posting it as is, with no real hierarchy, and submitting it with the RSS and copyright.

  82. Hey Matt, how are we supposed to deal with Zimbio? They are reading stories from Google News and then framing the full story from our site in a way so that they “own” the content. It gets indexed by Google with THEIR URL causing duplication problems over which we have no control. Yahoo doesn’t have a problem figuring out and ranking the original source. Google does.

  83. As with the above poster, I would like to hear a reply with regards to Zimbio; this is how I found this post, looking for answers that I cannot find about the duplicate content issue they could cause. Please, I would love to hear views on this.

    Now, normally my pages can get picked up within 30 minutes. I’m not sure how long it takes Zimbio to index my articles, but if they get my article ranked on their site first, would that make Google think they ‘own’ the content when really it belongs to me and my blog?

  84. What is to become of SEO providers who, say for instance, own 100+ blogs on various IPs and just dupe a post on all the blogs at once with given keywords?

    If the keywords that are linked are varied but the dupe content is the same, will the effect be the same?

    So for instance, the same article duped on 100 blogs, but on each blog a different word is hyperlinked to the main page?

  85. In regards to the duplicate content part of this blog post, I personally use the http://www.copygator.com website to find and stop duplicate content:

    1. It’s automated and brings me results instead of me searching for duplicated content. All I had to do was submit my feed and it started monitoring it, showing me who’s republished my articles on the web.

    2. I get notified by email, so it contacts me when it finds copies of my articles online.

    3. I use their image badge feature to alert me directly on my website when my content is being lifted.

    4. It’s a free service, as opposed to the “per page” cost of Copyscape/Copysentry.

  86. We have two domains, for example a.nl and a.be – two separate domains that point to the same website – just because people in Belgium would probably feel more confident buying something from a local website than from one based in another country. But the content under both domains is identical. The reason behind this is that in Belgium (.be) and the Netherlands (.nl) people speak Dutch, so there’s not really an option of providing content in a different language.

    Does Google or any other search engine see this as duplicate content? And if yes, what can be done about it? My thought was to add 301 responses to the .be site and do everything over the .nl site, but this would render .be useless in my opinion.

    http://lenss.nl/2009/02/search-indexing-and-duplicate-content/

  87. Always best to get it from the horse’s mouth, so here’s my question. I previously worked for a company and unfortunately that company went bust. I wrote about 100 articles for them and now the site is no more. If I republish them on my site, will I get a duplicate content penalty? The content is mine and the site is gone, but Google would have a record of that site as the first publisher of the content. Will I get penalized for using this content? I can see how this could be exploited. Thanks in advance for any replies.

  88. Hi Matt.

    I found your blog trying to find out why Google would show me the same site twice when searching for wedding bands!

    It showed me the same site twice, for the same term!

    I found http://gilletts.com.au/custom_titanium_rings.htm
    and http://myring.com.au/custom_titanium_rings.htm

  89. I have read in various forums that Google (potentially) penalizes a business that utilizes more than one website to promote itself online. There seem to be many businesses who create separate websites to promote different services they offer or different locations they serve. I have not come across any examples of businesses who have been penalized for this type of optimization practice.

    Will search engines penalize me for creating 10 websites with 100 pages each instead of compiling the same content into one large 1,000 page site?

  90. Hi..

    I am designing a site where the owner asked for a WordPress-based “news” section (as he wants all the nice features of WordPress).
    He also wants the RSS feed from WordPress to be displayed on the homepage, so visitors are aware of new news stories in the news section.

    Is this setup a good idea or not?
    Will there be duplicate content problems with google?
    I’d probably just display the summary of the news on the homepage, but it will be rendered as HTML

    thanks

  91. Hi Matt,
    I was wondering whether checking Broken Links in AddMe.com has a negative effect on our site by duplicating the content.

    AddMe checks broken links and makes a report by fetching the Title and Meta Description of any website. I used it once and now I want to know: is it bad for our website, or is it OK to use?

    Hoping for your answer.
    Thank you.

  92. Matt, we found a case where it might make sense to think about the class robots-nocontent.

    At least we didn’t find a way to work around the issues with the other techniques Google offers.

    The case: to help our users who download a business software version from our site, we offer links to business news that are hosted in another section of our site. Now we got alerts from Google Webmaster Tools that those download description pages are a News Hub, and some of those downloads are even included in the Google News index, but of course they are filtered away from appearing on the front pages.

    But not only are the downloads filtered away, so are the regular news stories. Maybe there is a filter along the lines of “site has too many low quality news items, so don’t trust them”. We couldn’t work around this by submitting News Sitemaps, writing mails to the Google News team, etc.

    In this special case a robots-nocontent class tag would be great: we could just tell Google and other search engines that a certain part of the page is for human navigational purposes only and not for Bots.

    For Adsense purposes Google offers such a tag – to “ignore” a special part of the page and to focus more on other parts of the page.

    From the earlier post:

    Yahoo provides a robots-nocontent class tag that can be used to remove content from the page flow from being indexed (or used in determining a pages weight). Does Google support this tag? If not, are there are plans to support such a tag?.

    Veign, we don’t currently support that tag, for a couple of reasons. We think we do pretty well at detecting boilerplate (e.g. you’re not likely to run into any duplicate content issues for header/footer-type stuff). The other reason is that we haven’t seen a lot of sites using the tag since Yahoo mentioned it. Given the choice of where to put engineering resources, not a ton of people have asked for this feature.

    It would be great to get an update on this. It is a spam issue too: we don’t want downloads to show up in Google News, because that is not the kind of editorial content we stand for.

  93. Hi Matt,

    Just a quick question about duplicate content and the “punishments” Google supposedly hands out to cheating sites. I was wondering whether this sort of website would count as duplicating content (big style)?

    http://www.evoluted.net/towns/

    Each and every one of the town links on this page leads to duplicate content, differing only in the town name.

    I have seen many websites like this, and I have reported this one to Google through Webmaster Tools (twice), but nothing has ever happened.

    It’s websites like this that spam their way to the top of Google’s search results, crowding out the smaller, more relevant web design agencies and freelancers in those local towns.

    I’d be interested to hear any thoughts from you or the other commenters!

    Chris Sparshott

  94. I just ran a test by syndicating an article and, even though every copy of the syndicated article links back to the original, Google is _still_ unable to determine which article is the original. And the original is on a website with a PageRank of 5!

    My original article now ranks #6, where it used to rank #2. It is now preceded in the results by two syndicated copies (both of which link back to the original), two pages which are completely irrelevant to the search, and one other page on the same topic.

    Google tells us “Make pages primarily for users, not for search engines.” However, their inability to tell original content from syndicated content forces us once again to ignore what is good for the users and focus on what is required by the search engines. From now on, when I syndicate content I will be sure to alter the page titles and perhaps even some of the content.

    I regularly read your interesting blog.

    I need your help or your advice.
    A new site has copied my site, http://www.jm-contacts.net.
    I am very worried about how Google will react to the duplicate content.

    Duplicate content is a subject you have written about a lot.
    Could you tell me whether my site risks being blacklisted by Google?
    Could you also give me a name and email address at Google to whom I could report my problem? Perhaps you could forward my message to the right person.

    Thank you very much for your help
    Best regards.

    JM-CONTACTS managing director
    Jean Morvan

    PS: Please excuse my poor English.

  96. Matt,

    One of my sites has recently become quite popular and is gaining momentum every day, and I have started receiving lots of requests from other site owners for access to fetch my full articles via RSS and show them on their sites. My main concern is: what if those sites rank higher for the same keywords? Would my original article be penalized?
    Basically, I am looking for content syndication best practices that promote the original content without incurring any penalty.

    For example, builder.com.au fetches the same articles from techrepublic.com, but I don’t see them being penalized by Google. What is the best way to distribute original content via RSS to multiple site owners? (A small feed-building sketch follows below.)
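
    As a rough illustration of the “excerpt plus a link back” advice, here is a minimal Python sketch that builds an RSS item carrying only a teaser and a pointer to the original article (element names follow plain RSS 2.0; the helper itself is hypothetical):

        # Illustrative sketch: syndicate a teaser rather than the full article,
        # and make every item link back to the original URL so the source page
        # keeps the link equity.
        import xml.etree.ElementTree as ET

        def syndication_item(title, original_url, body, excerpt_chars=300):
            item = ET.Element("item")
            ET.SubElement(item, "title").text = title
            ET.SubElement(item, "link").text = original_url  # points at the original
            teaser = body[:excerpt_chars].rsplit(" ", 1)[0]
            ET.SubElement(item, "description").text = (
                f"{teaser}… Read the full article at {original_url}"
            )
            return ET.tostring(item, encoding="unicode")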

  97. I have a website that every so often drops out.

    When it does, I find out that it has been scraped.

    When I file a DMCA complaint and/or change the website, my position comes back.

    Yahoo and MSN can figure out duplicate issues. Why can’t Google?

  98. What about duplicate content across languages that are similar, like US and UK English? I have location-based pages, but in many locations (Australia, the UK, etc.) the language is the same, and some pages are so similar that the content is identical, even though they are marked as different locales (en-GB, en-US and so on). Will these get indexed separately as separate languages, or do they all fall under the English umbrella, with only one getting indexed?
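
    One mechanism that speaks to exactly this situation is rel="alternate" hreflang markup, which lets you flag a set of URLs as locale variants of the same page. A minimal sketch, with the URL pattern and helper purely illustrative:

        # Illustrative sketch: emit hreflang links so the en-US, en-GB and
        # en-AU variants of a page declare each other as locale alternatives.
        LOCALE_URLS = {
            "en-US": "http://example.com/us/rings/",
            "en-GB": "http://example.com/uk/rings/",
            "en-AU": "http://example.com/au/rings/",
        }

        def hreflang_links(locale_urls):
            return "\n".join(
                f'<link rel="alternate" hreflang="{lang}" href="{url}" />'
                for lang, url in locale_urls.items()
            )

        print(hreflang_links(LOCALE_URLS))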

  99. In these times when social sites, Twitter, Facebook and the like are taking over the net, it’s hard to tell what counts as duplicate content. If someone bookmarks my post on Digg, Mixx, Delicious and elsewhere, how can I prevent my ‘duplicate content’ from being spread across the internet and harming my true content and web page? In my opinion search engines don’t really pay that much attention to duplicate content because they can’t reliably tell which copy is the original and which is the duplicate.

  100. In many cases Google will not index a page at all if it finds content that is substantially a duplicate of original content. I have run this experiment, and you can find the experiments and the results on my blog at

    http://referencedesigner.com/blog

    I would, however, like to point out that Google can in no way catch a duplicated idea; it can only catch duplicate copy-and-paste content. This has led to many article-rewriting scams: people take the idea behind a piece of content and simply rewrite it in their own words. The million-dollar question is how Google is going to prevent this, if at all.
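
    To make the copy-paste versus rewrite distinction concrete, here is a toy near-duplicate check (word shingling plus Jaccard similarity). This is a textbook technique, not a claim about Google’s actual algorithm: a verbatim copy scores close to 1.0, while a paraphrase of the same idea scores near 0, which is exactly why rewrites slip past this kind of check.

        # Toy near-duplicate detector: compare the sets of overlapping k-word
        # shingles of two documents. High Jaccard overlap => likely copy-paste.
        def shingles(text, k=5):
            words = text.lower().split()
            return {tuple(words[i:i + k]) for i in range(max(len(words) - k + 1, 0))}

        def jaccard_similarity(a, b, k=5):
            sa, sb = shingles(a, k), shingles(b, k)
            if not sa or not sb:
                return 0.0
            return len(sa & sb) / len(sa | sb)

        # A verbatim copy of a document scores 1.0 against the original;
        # a genuine rewrite of the same idea scores near 0.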

  101. What are the best practices for syndicating product content from the primary domain to a network of 4,000 distributor subdomains that will be used as destination websites?

    Primary – http://www.mysite.com/products/hockey_puck.html
    Sub domain – http://www.agents.mysite.com

    Can anyone provide insight on how best to accomplish this without hurting our SEO efforts? (A small canonical-tag sketch follows below.)
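
    One common approach, assuming the distributor pages really are copies of the primary product pages, is to have every subdomain page declare the primary-domain page as its canonical URL, so only one copy competes in the index. A minimal sketch, with the helper and URL pattern purely illustrative:

        # Illustrative sketch: each distributor sub-domain product page emits a
        # canonical link pointing back at the primary domain's copy of the page.
        PRIMARY = "http://www.mysite.com/products/{slug}.html"

        def canonical_tag(product_slug):
            return f'<link rel="canonical" href="{PRIMARY.format(slug=product_slug)}" />'

        print(canonical_tag("hockey_puck"))
        # <link rel="canonical" href="http://www.mysite.com/products/hockey_puck.html" />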

  102. I ran across some interesting Google results today, and I wonder whether they would be considered duplicate content. When searching for “handyman services”, the top two organic spots are occupied by exactly the same web page operating under two different URLs, namely mrhandyman.com and handymanpros.com. It doesn’t look like they tried to hide it either, because the title and description are identical. Would that be considered duplicate content?

  104. If you shut down a site, such as a free blog, and take that content and post it on another site (with paid hosting), will that count as duplicate content? Will it hurt your rankings in any way? This assumes the closed blog’s content is still indexed. Should you wait for it to be deindexed first?

  105. I tend to question this myself. I found my site dropping in the rankings and, on researching it, found that it had become a popular scraping target. Only after regular changes to the content did it creep back into the top ten for its keywords. It was the only variable I could attribute the drop to.

  106. Yesiree, Jason. Doesn’t that just suck? All someone has to do to f*@! up your site in Google is to swipe a lot of your content. Even the pages that aren’t lifted from your site can suffer. Not your fault, not much you can do about it, and the only way to recover from it is to spend a ton of time either pursuing legal action to get the content removed from the offending sites or rewriting your content. Neither of these options is particularly fair.

  107. Hi Matt,

    I have a question about the way Google deals with duplicate content, coming at it from a completely different angle to the above. My question concerns the Copyscape service. As you are no doubt aware, Copyscape’s free search detects duplicate content in Google’s index, which in turn allows webmasters to remove or otherwise correct duplicate content issues. Presumably this also has a very positive effect on Google’s index itself, because as the duplicate pages are removed the index is in effect cleaned and the results improved.

    However, when a user needs to check a larger site with Copyscape, Google API restrictions come into play, which means the user has to check against Yahoo’s index instead (sorry for the foul language, hehe 🙂). I can’t help but think that Google is missing a trick here, and that in this instance the restrictions that are there to preserve Google’s server resources are doing more harm than good, with Yahoo’s index being improved instead of your own.

    I would like to point out that I have nothing whatsoever to do with Copyscape, but I am sure that if Google were to adopt a special API policy for plagiarism-detection services such as this, it would in fact benefit Google and its users as much as it would the webmasters using Copyscape. I would be interested to hear your take on this.

    Kind Regards,
    Nick D

  109. Hi Matt,

    Not sure you’ll notice this comment on a post so old, but here we go 😉

    Regarding the syndication of content, there is a syndication website that is publishing my content in full, which obviously I don’t want. There are no contact details or any way to request removal. I’ve sent emails to generic email addresses like admin, administrator, postmaster, etc. to try to have it removed, with no luck.

    The content is clearly mine, links back to my site etc. Is there anything I can do with Google to have these pages de-indexed or something along those lines?

    Thanks for any advice in advance.

  110. @ Andrew Keir

    Hi Andrew, you could try submitting a DMCA infringement notice.
    http://www.google.co.uk/dmca.html

    But I would first try contacting the hosting company that hosts the offending site and requesting that they remove the content (use WHOIS to find the host).
    They will either remove it themselves or ask their customer to remove it.

    There is a good template for a removal request letter on the link below:
    http://labnol.blogspot.com/2007/09/dmca-notice-of-copyright-infringement.html

    Hope this helps.

  111. Can someone who is sure of the correct answers please respond to these questions? Getting answers to these would save me a lot of time:

    If I am part of one of those membership clubs where every member (from 50 up to 150 members) gets the same sites and pages, then these sites are made up of syndicated content rather than duplicate content, right? And Google treats them differently in the search results? Would Google filter out these pages, and possibly the sites on which the pages reside?

    If I change the keywords on the pages so that my pages come up for different keywords than the pages of the other members (who leave the pages the way they got them), should that give me the uniqueness I need to get different traffic from different keywords? Most of the other members do not change anything on their pages. Would changing all the keywords work even though most of the content is the same as the other members’?

    What if I only want to rank for the home page, and I completely rewrite the home page and the first sentence of each paragraph on the other 19 pages of the site? Will I be able to rank for the home page, since Google ranks pages and not sites? Or will my whole site be sandboxed, since there is so much syndicated PLR content on the other pages of my sites that is not rewritten (only the first sentence of each paragraph is)?

    Does the new Google “farmer” penalty leave my membership site’s home page unaffected, so that I can theoretically reach #1 in Google because the home page is completely rewritten and the domain keyword is unique compared to the other 50 to 150 membership sites? Can that ranking happen even though most of the content on the other pages is not rewritten?

  113. Hello Matt and everyone,

    My question: what if a publisher has 100+ sites and plans to use a standard “about us”, “terms of use” and “privacy policy” page across all of them? Is this a case of duplicate content, or will the Google algorithm be kind enough to turn a blind eye to it? Please answer; it’s very important when trying to run 100+ sites.

  114. It’s really problematic, especially the UTM tags, which cause my articles to be indexed two and three times in Google. I have currently blocked UTM-tagged URLs with the help of .htaccess, but I would like to know whether that causes any problems. (A small sketch of stripping the tracking parameters instead follows below.)
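
    Rather than blocking the UTM-tagged URLs outright, a gentler option is to make sure every tagged URL resolves or canonicalizes to the clean URL with the tracking parameters removed. A minimal sketch of the stripping step (standard library only; the function name is hypothetical):

        # Illustrative sketch: derive the clean, canonical form of a URL by
        # dropping utm_* tracking parameters.
        from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

        def strip_utm(url):
            parts = urlsplit(url)
            kept = [(k, v) for k, v in parse_qsl(parts.query, keep_blank_values=True)
                    if not k.lower().startswith("utm_")]
            return urlunsplit(parts._replace(query=urlencode(kept)))

        print(strip_utm("http://example.com/post?utm_source=feed&utm_medium=rss"))
        # -> http://example.com/post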

  115. Does anyone know how many pages can have duplicate content? For instance, if there is a server issue where pages that do not exist show up as duplicate content, does the site get penalized? I would really appreciate it if someone could answer. Thank you. David

  116. Hi,

    Matt, is a seven-line disclaimer, or any disclaimer that appears on every page of my website, considered duplicate content?
