Sitemaps with www vs. non-www

(Just a quickie post.)

The more I see from the Sitemaps team, the more I like them. 🙂 Here’s a good post from the Sitemaps blog about viewing stats for both www and non-www versions of a site, including a pointer to this support answer about consolidating such sites.

The only thing I’d add to that answer is one recommendation: check your internal links to make sure that they’re consistent (either all to www or all to non-www). That will also help search engines pick the root page that you prefer.
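For Apache users, the usual fix looks something like this .htaccess sketch (just a sketch — it assumes mod_rewrite is available and that you prefer the www version; example.com is a placeholder for your own domain):

    # Sketch: 301-redirect every request on example.com to the
    # same path on www.example.com (requires Apache mod_rewrite).
    RewriteEngine On
    RewriteCond %{HTTP_HOST} ^example\.com$ [NC]
    RewriteRule ^(.*)$ http://www.example.com/$1 [R=301,L]

If you prefer the non-www version instead, swap the hostnames around.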

42 Responses to Sitemaps with www vs. non-www

  1. Yes I would agree that this is one of the better products by Google this year.

I have seen a lot of posts around the net of people looking to create a sitemap for Blogger. I wouldn’t have thought there’s any real need for this, Matt, or am I wrong?

For those who need one, I have created a dynamic (PHP) sitemap for Blogger if anybody wants it, as most of the examples on the net that I have seen are a little buggy and require you to modify your template. This way you avoid that (a sketch of the general idea follows below).

I’ve tested it in Google Sitemaps and it works fine.
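    For anyone curious about the general idea, here is a minimal sketch (not Jon’s actual script; the URLs are made-up placeholders, and a real version would pull the post URLs from the blog’s feed or archives):

        <?php
        // Sketch of a dynamic sitemap: emit Sitemap-protocol XML for a
        // list of page URLs. The list is hard-coded here; a real script
        // would build it from the blog's own data.
        header('Content-Type: application/xml; charset=utf-8');

        $urls = array(
            'http://example.blogspot.com/',
            'http://example.blogspot.com/2005/12/sample-post.html',
        );

        echo '<?xml version="1.0" encoding="UTF-8"?>' . "\n";
        // Namespace used by the Google Sitemaps protocol at the time.
        echo '<urlset xmlns="http://www.google.com/schemas/sitemap/0.84">' . "\n";
        foreach ($urls as $url) {
            echo "  <url>\n";
            echo '    <loc>' . htmlspecialchars($url) . '</loc>' . "\n";
            echo "  </url>\n";
        }
        echo '</urlset>';
        ?>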

  2. Hi Matt

I really like those quickie posts of yours today. Now we can talk canonicals and 301 redirects 🙂

    Good night.

  3. I think the sitemaps are good, but I have had the hardest time trying to get mine verified. First I added the Google verification page, got verified, and figured the verification was over, so I removed the page. Well, that was a definite mistake. So I tried again, and this time around I’ve been waiting for days for the stupid verification to finish. It does look like a great tool once I can get it working again!
    BTW… I’ve noticed that the www/non-www issue with PageRank seems to be going away on some of my sites. All links into the site use www, and all internal links are relative (so they should be using www based on the incoming link). But I get the same PR for both the www and non-www versions. Is this happening Google-wide, or have I just missed something and Google has now indexed both?

  4. I’ve had the same problem as AHFX…days go by with the verification process not working. AND, this happened for a bunch of sitemaps I had up there that were already verified!

    Having whined, I’ll admit that I really think it’s a great direction to go, both to allow webmasters to ensure that all of their pages are catalogued, and also to “self-weight” the importance of pages. All too often I see the SEs choose to link to my “print this page” page instead of the main page for one of my products!

As well, it helps get around the SEs’ problem of determining what’s a meaningful parameter vs. a tracking parameter, etc., in the URL (oh no, I smell the duplicate content detector!!!) and lets the webmaster put forward one version of a page. For instance, I’ll list the same product in a number of categories, and one parameter in my URLs describes the currently selected category, enabling me to produce a “more like this” link for the user.

    Michael.

  5. Here’s a quick “how to” for Apache to 301 everything (not just the root):

    http://forums.digitalpoint.com/showthread.php?t=13

  6. At the risk of sounding stupid, why would you use explicit links to your own web site?

    I’ve been building them for years and always used relative links, just the page names, so the sites will work on my local PC, with an initial temp domain, IP address, final domain, etc.

    Are you suggesting all links should be explicit?

  7. I also use relative URLs. It makes total sense and keeps page bloat down. Provided you have a 301 in place, you don’t need to worry about URLs being misinterpreted (www vs. non-www). There are also other benefits: SSL compatibility is simplified, moving domains requires no HTML rewrites, and CGI code remains portable, to name a few. Long live relative URLs 😉
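    To make that concrete (example.com is a placeholder): a root-relative link resolves against whichever hostname the visitor arrived on, while an explicit link always names one host.

        <!-- Relative link: becomes www or non-www depending on how
             the page itself was reached -->
        <a href="/about.html">About</a>

        <!-- Explicit (absolute) link: always the www host -->
        <a href="http://www.example.com/about.html">About</a>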

  8. I find it hard to believe Google would have issues with www vs non-www! If it does, Google is having issues with at least 50% of sites out there.

    Just as Google doesn’t NEED W3C validated code on all pages, surely it can take care of issues with www vs non-www???

  9. Sitemaps seem to have really helped with the indexing of our very large site after what were very serious 301 and 302 problems, but I’m hoping for a Christmas present of ranking like we did back in January. You are the real SantaCom, right Matt?

  10. Dave,

I guess you haven’t visited WebmasterWorld recently? Probably for the best, as this issue has reduced many of us to quivering piles of Jell-O! One of the most discussed topics throughout the recent Jagger update has been resolving www vs. non-www issues.

  11. Google seems to have issues with everything related to indexing a site. I wish they would hire some mature leadership (like Bill Gates) and get their act together. Instead they have children in blogs dressing up as pirates espousing their REM sleep. Oh Lord, and they’re dispensing business advice. If that isn’t the height of lunacy, I don’t know what is. That’s why your sites suffer. You’re dealing with damn odd people with queer ways, and you pay the price for it. Geez, I wouldn’t let management see me running an online diary. Call me old-fashioned, but it’s something men don’t do unless management has already written them off.

  12. Matt,

I don’t understand why you removed my posts. Was it inappropriate to comment on relative URLs, or www vs. non-www issues? I think I got unfairly anti-spammed by the head of the Google spam team ;(

  13. Now they’re back! There’s something very strange going on here. Sorry, Matt. Please unread my previous post 🙂

  14. Hey! Just wanted to drop a line. Nice blog; I can tell it’s powered by WordPress. So is my website. Thanks for the heads-up about www and non-www. I think I’ll need to fix mine.

    I arrived at your blog from your recent article on how Google does its indexing. I work in a datacenter, btw. I’d like to know more about that later.

    Hope to say hi someday.

  15. I used to use relative links, but I was having issues with the www and non-www versions of sites, so I did a bit of reading, made them all explicit, and made sure anyone linking to me had the www in the link. I also created a Google sitemap around this time. All of the above remedied the situation, so I can’t say for sure which helped the most, but it got resolved without any 301s being needed!

  16. Sitemaps are great fun!

One reason I really like the Google Sitemaps project is that the whole thing forces webmasters to look at their sites like a crawler would (some people even use website crawler software). The more you look at your site like that, the more likely you are to spot the problems that any search engine will hit when crawling your site. Most people find lots of broken links, 404 handlers that return 200, crawler traps, session IDs, multiple URLs pointing to the same content, etc. These things are all easy to clean up and make life a ton easier for all search engines, not only Google.

    That being said, of course a website should be made for visitors, not search-engine-bots — but a website that is clean for crawling, is usually also clean for visitors: both sides win.

    And one thing is for sure — the people who run the sitemaps project at Google are amazingly fast. If there is a bug, a problem, whatever: they’ll fix it in no time, and blog about it to boot. They’re even present in the “Google Group” for the project and if a real issue pops up, they work to get it solved. And keep in mind how many languages they have to support and how hard it must be to stick up for webmasters (“get my site indexed now”), spam-masters (“get my trash-site indexed now”) and Google (“only index what is good”)… They get my Google-People-Of-The-Year-Award (not that anyone cares :-))

  17. If Google’s spiders index both the www and non-www URLs, how do webmasters remove the unwanted domain from the index? In an ideal world I would like to have only one domain indexed and everything else pulled out. How do we prevent Google’s spiders from pulling in both www and non-www? In my case I have thousands of pages indexed with the non-www and only 24 with the www, which is causing problems. Looks like it’s fixed in the test DC. 🙂

  18. IncrediBill,

    Yes, you should make all links explicit. It helps with search engine ranking, because then you have other pages pointing to your pages (even though it is the same domain, it still helps).

    I used to use relative links too, but that was before I found out about SEO.

    Pat

  19. I have been reading up on the www/non-www issue and would like to ask a similar question.
    How about these types of links: *ww.website.com/directory/ versus *ww.website.com/directory/index.html?

    I have noticed that the same page is sometimes cached as directory/ and also as directory/index.html.

    Should all internal links on a site point to one or the other, or are there any negative effects of having both? Or is this a complete non-issue that just does not matter?

    Thank You!

  20. I’ve actually included a couple of sitemaps; each sitemap uses either all www or all non-www pages. Should I convert them all to strictly one or the other?

  21. >How about these type of links: *ww.website.com/directory/
    >versus *ww.website.com/directory/index.html
They’re different URLs -> they get treated differently by ALL search engines. For all we know, your default file could be “default.asp” and “index.html” is just something else… Fix your links to all point to one or the other (I prefer just “…blabla/”) and have the other one 301-redirect. There’s a simple .htaccess addition you can do to redirect them automatically (a sketch is below); see http://www.cre8asiteforums.com/forums/index.php?showtopic=31624&view=findpost&p=158216
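    A sketch of that kind of rule (this assumes Apache with mod_rewrite; adjust the filename if your default document is something other than index.html):

        # Sketch: 301-redirect direct requests for .../index.html to the
        # bare directory URL. Checking THE_REQUEST matches only what the
        # client actually asked for, which avoids a redirect loop when
        # Apache internally serves index.html for directory requests.
        RewriteEngine On
        RewriteCond %{THE_REQUEST} /index\.html [NC]
        RewriteRule ^(.*)index\.html$ /$1 [R=301,L]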

  22. Side note: I’d like to see the Google Help Center articles carry a last-updated timestamp, as some of them (not the one linked above) are pretty old.

Regarding Google Sitemaps, I like the feature of seeing the keywords from organic search (both impressions and clicks). I understand that Google is not giving away the numbers (how many impressions/clicks), but they could publish which timeframe this top list covers (e.g. last month, last 30 days, overall performance).
    Without that, the list is much less informative 🙂

  23. Matt,

Your own site is not redirected 😉
    It has a PR7 with and a PR4 without ‘www’. That makes me wonder. You might be trying to lure SEOs into something… ho ho ho… suspicious holidays 🙂

  24. Matt,
I am really glad to see a post specific to this canonical issue. First, thanks to some of the SEOs on the Google Sitemaps blog who have published some pretty good insight into this problem (and fixes). I hope this post is not too lengthy, but here are some of my thoughts and questions.

We implemented a 301 redirect solution on our site back in October because virtually every one of our website’s pages in the Google index had a duplicate entry (one with and one without a www prefix). Assuming a site has only one unique page that may be referenced with and without the www prefix, and that the www-prefixed pages are “preferred”, here are some observations and questions:

1. The link command on http://www.alltheweb.com seems to return the best results for showing external links to your site that have no www prefix, which probably triggered our problem. I.e., link:mysite.com (specifically omitting the www prefix) only returns links that do not have the prefix. We did this and found that about 15 websites had created links to various pages on our site without the www prefix. We contacted the site owners to correct this.

2. Prior to the 301, we were seeing Jagger results with non-www-prefixed pages outranking our www-prefixed pages in SERPs. In some cases www-prefixed pages only appeared if the user selected “Omitted duplicate results”.

3. Since implementing the 301, some improvement has been noticed, but I have found results all over the board with the Google site: command, depending on which test or live data center we used. I know that 64.233.179.104 is not ready yet, but its results seem consistent with the other live data centers. We still see about 80% of our website with duplicate non-www entries in the Google index. Some of these pages have cache dates going back to April 2004.

4. I actually see the most progress on 66.102.9.104, the last test data center for Jagger3. It has the fewest duplicate non-www pages in the index, and that is (obviously) reflected in the search results as well.

Matt, it would be nice to know what to expect as a natural progression to solving this problem. Since Google has now posted an “official” acknowledgement of this problem and a suggested 301 solution, how long would you expect a 301 to take to take effect, and is 3 months too short a time period? Has Google considered relaxing its unofficial “duplicate content” filter or penalty in light of this widespread issue?

Oh, and one more question: should website owners be more aware of the counts returned by the site: command? In March we were seeing 3 times our actual page count, and now Google returns around 10 times the number of pages we actually have. I suspect this was related to these canonical issues.

  25. Then there is the HTTPS issue, too.

  26. All my sites redirect the www prefix to the domain without www. I think you should do the same with your site, Matt Cutts! :o) There is a campaign about this issue at http://no-www.org/

  27. re: Jon Wright’s comments

    I use both Blogger and Gallery (version 1.5) and have created a PHP script to create a sitemap for both:
    http://eric.nagel.name/2005/09/dynamic-google-sitemap-for-blogger-and.html

It doesn’t have much documentation to it, but most PHP hackers can figure it out. And there’s no need to modify your Blogger template, as this script reads the directory structure.

    The .htaccess bit is just a little trick and not necessary (thankfully, Google Sitemaps will accept extensions other than .xml)

  28. RE: “Matt,

Your own site is not redirected
    It has a PR7 with and a PR4 without ‘www’. That makes me wonder. You might be trying to lure SEOs into something… ho ho ho… suspicious holidays”

Exactly how important is this, really? Methinks not very.

  29. Google Sitemaps is great, but why don’t you poke the team handling Google Analytics a bit?

    Are they opening for new signups any time soon?

  30. OK, I made a www-to-no-www redirect (http://www.foo.xx TO foo.xx), OK?

    But now, if I link to duplicate.foo.xx from here, what happens? I think Google will find duplicated content for foo.xx.

    Or not?

  31. Thought you should know that I tried the Google suggestions with the automatic URL removal tool, as we had duplicate content for non-www and www.

    We have our site http://www.domain.com, which had a good PR and also loads of backlinks. We ranked no. 1 for cricket equipment for the past 5 years.

    However, after the recent updates we were nowhere, and it was suggested it was a duplicate site issue.
    We looked closely, and Google had indexed http://domain.com as well as http://www.domain.com,
    and also gave it a different PR.

    Two months ago I split the sites, so the www version was on a separate server from http://domain.com.

    Two days before Xmas I asked Google to remove the http://domain.com content by placing a robots.txt on the http://domain.com server.

    You know when you wish you hadn’t done something!!! Google removed all the http://domain.com content (great!!) and also the http://www.domain.com content (disaster). I filed a reinclusion request explaining the error, but to no avail as of yet.

    Will I really have to wait 180 days and then be sandboxed? I believe in my own mind that Google has made a whoops, but I really think I’m a silly ass for worrying about duplicate content to the nth degree. (I think Google has got me this way.)

    Hope that all makes sense

  32. Our company runs a web hosting business, and I’m not sure about this, so I’m asking: our hosted home pages are accessible both with and without www, but neither responds with a redirect; both simply serve the same page. And if our customers use relative links, it stays that way. E.g., http://www.webtown.hu and http://webtown.hu are the same page, with relative links. Is this good for us (for our customers), or not?

  33. I’m quite confused on the www vs. non-www issue. The basic question for me is: which one is best to use? After lots of reading I could not figure that out, for some odd reason.

    Please don’t laugh (I am not a geek), but I always somehow assumed it was best to get both listed in G. Therefore, on some sites about half the pages are www and half are without (done by design). Should I not do that anymore?

    Also, I have a site which is a PR5 with www but a PR3 without. Does that mean that if we 301 the non-www to the www, my PR5 would gain even more PR, perhaps becoming a PR8, since the PR3 could be added to the PR5? Or is some other PR achievable (or does it perhaps have no effect at all)?

    A reply on these issues would be most appreciated. Thanks Matt.

  34. Matt,

    The whole issue with canonical URLs and multiple URLs is pretty much understood now. Indeed, Google has guidelines for this, but there has also been much talk about duplicate content within the site.

I vaguely remember GoogleGuy saying a while ago that 85% page-to-page uniqueness was required.

    Can you be any more specific? Or, at least agree?

  35. I’m presently researching duplicate content issues. I have seen it said that having different URLs for the same content counts as duplicating content.

    At the same time, I notice that you have that on your own site, in individual posts and category URLs.
    For example:
    http://www.mattcutts.com/blog/fun-with-trends/
    http://www.mattcutts.com/blog/type/googleseo/
Both have the article “Fun With Trends”.
    I was starting to think blogs were bad for duplicate content; some have the same article in three or four places:
    On the main page as a new post.
    In a category archive.
    In a date archive.
    On its own page, i.e. the permalink.
    Am I wrong?

  36. Quick question: how can I submit a sitemap if my host does not allow FTP? Thanks.

  37. One feature in Google Sitemaps I found useful was telling Google to display your search results all with “www”, in addition to making sure all your URLs use “www”.

  38. Does having a post in multiple blog categories create duplicate content?

  39. I had this same problem on my site so I wrote an article on it.

  40. How about subdomain names?

  41. Hey Matt – the link to the support answer is broken! Any chance of getting it fixed?

  42. Would it be possible to fix that link?

    Thanks!
