Canonicalization update

It’s almost not worth mentioning, but I know one website noticed this, so I’ll talk about it. Last week there was an update to how we canonicalize a small number of urls. What is “canonicalization” again? Read this previous post, or see this post by John Andrews to see all the ways that you can have the same content on urls that are technically different. Some people ask “Why don’t you just assume www.example.com and example.com are the same?” The answer is that they don’t have to be, and for some websites they are different. For example, http://phpicalendar.net/ is a different page than http://www.phpicalendar.net/. This happens more often than you might think; FindWhat has different www vs. non-www pages, for example.

Okay, back on topic. 🙂 The data for externally visible PageRank didn’t change. The only way someone would notice their PageRank changing last week is if, for example, they were checking a different canonical url (e.g. externally visible PageRank is shown for www.example.com, but Google changes the canonicalization from www.example.com to example.com).

That’s a really rare situation, as evidenced by the fact that not many (any?) people in the blogosphere noticed any PageRanks changing. In general my advice is not to worry that much about changes in canonicalization (if you can see your PageRank on either one of www.example.com or example.com, Google generally has your PageRank stored and uses it correctly in scoring).

For the people who want to make sure that all their webmaster ducks are in a row on this topic, here’s my two-minute advice:
– Pick one way of writing all your urls and use that consistently in your pages and your links.
– If you pick (say) www.example.com as your preferred root page, make sure that you have a permanent (301) redirect from pages such as example.com to www.example.com. Michael Nguyen has a nice short post about how to do this in Apache, or Beyond Ink shows how to do a 301 redirect on several platforms.
– To be extra safe, feel free to use Google’s webmaster console to specify the preferred root page of your domain (www.example.com vs. example.com). Read this post by Vanessa for more details.

Whatever you decide, I recommend that you make sure that your choice is consistent. These short steps will help search engines refer to your site the way that you want people to refer to it.
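
As a rough illustration of the 301 redirect step above, here is a minimal .htaccess sketch for Apache. It assumes that mod_rewrite is enabled, that www.example.com is the preferred host, and that the site is served over plain http. It is not taken from either of the linked posts, so treat the host name and scheme as placeholders for your own.

```apache
# Sketch only: assumes Apache with mod_rewrite and www.example.com as the preferred host.
RewriteEngine On
# Match requests whose Host header starts with the bare domain (case-insensitive).
RewriteCond %{HTTP_HOST} ^example\.com [NC]
# Send a permanent (301) redirect to the same path on the www host.
RewriteRule ^(.*)$ http://www.example.com/$1 [R=301,L]
```

After adding something like this, it is worth checking that a request for example.com/some-page really returns a 301 status and lands on www.example.com/some-page.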

77 Responses to Canonicalization update

  1. Thanks for the update, Matt. I’ve had some interesting arguments err.. conversations this week about that backslash minutia post, and it’s always nice to be able to point to Matt Cutts’s comments.

  2. your “blog bar” has an adult link in it. (don’t approve this comment)

  3. Matt

    “That’s a really rare situation, as evidenced by the fact that not many (any?) people in the blogosphere noticed any PageRanks changing.”

    What? of course I have noticed PageRank changing 🙂

    Ok. Not real changing, just my two “Gadgets” keep showing old PageRank

    72.14.207.99
    72.14.207.104

    Or was it a new PageRank?

  4. Shawn, the posts are pre-moderated, so if you want to post something without it showing up I think you can use an email address you haven’t used before and then only Matt will see it. But then again I could be wrong.

  5. Could this be proof that subdomains are considered to be different pages, and that their links count for more than internal links?

  6. Matt

    I have been testing out the links from your Google Bar and I have to say that I am a bit disappointed.

    Most of the links appear to end in the site’s 404 page.

    Sean

  7. I have an interesting thing to share. Using Google’s webmaster console, I set http://www.example.com as the preferred domain; then all the pages, if accessed via example.com (without www.), started showing no PageRank even though the page served was the same.
    After removing the preference from Google’s webmaster console, the pages at http://www.example.com and example.com both started showing the same rank again.

    “That’s a really rare situation, as evidenced by the fact that not many (any?) people in the blogosphere noticed any PageRanks changing.”

    That’s what I noticed, and I posted a comment about it on March 18 at http://www.mattcutts.com/blog/non-google-search-news/

  8. Usually I open links in a new tab, so I didn’t notice earlier that the link destination loads in the iFrame. Should target _parent.

    On my blog I got the adult links too:
    http*//information-google.blogspot.com/2007/03/thede-re-kim-kardashian-sex-tape-full_5364.html
    The URL and strings like “You received this message because you are subscribed to the Google Groups…” in the splog post trigger “Google” in our blog bars.

  9. I’ve always set up my site with a 301, and thought this had been the norm for years…am I wrong? Has this just started?

  10. Doesn’t a little bit of mod_rewrite in the HTML solve this? I noticed on my personal site that the number of backlinks and the PageRank of the www. version were a lot higher than without the www., so I used mod_rewrite to make it redirect. I have never actually been told that this method is allowed; I just assumed so, as it’s only my personal site after all. Although I would like to know if it is cool, as I could really use it on a few of my other sites.

  11. Sorry to double post, I meant to say .htaccess, not HTML. Small mistake, didn’t want to sound like a fool.

  12. Hi Matt,

    I like the 301 approach in .htaccess for Apache. I also tried Google’s webmaster console, but there is still one issue: someone has linked to my domain as if it were a subdomain, and now the subdomain results appear in Google even though no such subdomain exists. This is bad for duplicate content, as I do not have to explain to you. Maybe Webmaster Tools will offer workarounds for these cases in the future; they are bad for me and for Google as well.

    Thanks, Michael

  13. Thanks Matt, I have solved the canonical issue through the .htaccess file and also through my Google webmaster account. But I am a little bit confused about my site’s PageRank: it has not increased in 3-4 months. Could you please explain in your blog (if you already have, please tell me the url)?

    thanks
    Deb

  14. Thanks for the update!

    Does:
    https://www.mysite.com
    http://www.mysite.com
    Make a difference??

    What should we do if an internal “https” page moves to a new “http” URL or vice versa?
    Should we also use a 301 redirect or is there no need to worry about what type of redirect we use as long as the user can navigate without a problem?

    I noticed that http://www.coca-cola.com/ does a 200 redirect to http://www.coca-cola.com/index-d.html
    If I were in a similar situation, would the 200 be the best redirect to use or should we use a 302 on the home page?

    I also noticed that Coca Cola uses the robots.txt to prevent the second URL from being indexed.

    Should I use Coca Cola as a role model website?
    This is all very confusing.

    Thanks for your help,
    Dave.

  15. Jason, the ModRewrite code that you mention in your .htaccess file is exactly the same 301 redirect code that Matt Cutts is talking about in his original post here.

    You MUST make sure that the returned HTTP status code really is 301. Use an online HTTP header checker tool to verify that. It should NOT return a 302 code.

  16. Despite the authoritative advice (yours!), I wonder if picking either www or non-www and then doing the permanent redirect will help in terms of PageRank. Why? Because half of the world will continue to be lazy enough to link to the “wrong” domain of yours (e.g. they link to http://www.example.com even if you always redirect to example.com — and why shouldn’t they, as your redirect makes sure their link works). In other words, if Google in their internal PR calculations doesn’t take permanent redirects precisely “literally”, then you’ll continue to have some kind of PageRank split. As a test-case, I wonder if I’d keep a PR9 (if I had one!) if I just permanently-redirected the whole domain (example.com) to a new one (example2.com).

  17. Matt, if we use the Google webmaster console to specify we want to be seen as http://www.domain.com rather than domain.com do/should we also 301 redirect the non-www to the www as well?

    Reason I ask is because our IT guy says to 301 a non-www to a www will create an infinite loop and crash the Microsoft IIS server. weird??

    There seems to be lots of confusion around this for Microsoft IIS server.

  18. This may sound like a really dumb question: Let’s say a site has never had a 301 redirect between the non-www and the www version and BOTH versions have PageRank; if you suddenly decide to toss the 301 up and point everything to the www version, is there any chance that the PageRank of the www version will increase because you’re now taking the PageRank love the non-www version was getting and throwing it towards the www version?

    Did that question even make sense? It sounded so much more thought out in my head. LMAO!

  19. That’s the whole point. If you say that all votes for non-www belong to www, by using the 301 redirect, then Google will hold the correct “score” for your site as a whole, and transfer stray PR over to the “correct” URL. This might take many months to be visible in the Toolbar of course.

  20. The Beyond Ink ASP 301 code has a small, but important error in it: it’s missing the Response.End line.

    Googlebot, MSNBot and Slurp will try to process the page after the 301 redirect point if the end line isn’t included.

    So it’s actually something like this:

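    <%
    ' Redirect non-www hosts to the www host with a permanent (301) redirect.
    If InStr(Request.ServerVariables("SERVER_NAME"), "www") = 0 Then
        Response.Status = "301 Moved Permanently"
        Response.AddHeader "Location", "http://www." _
            & Request.ServerVariables("HTTP_HOST") _
            & Request.ServerVariables("SCRIPT_NAME")
        ' Stop here so nothing after the redirect is processed.
        Response.End
    End If
    %>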

    Not sure how this will look in a blog post, but I’m about to find out.

  21. Hello Matt,

    Thanks for your update on Canonical URL issue.

    🙂

  22. “Multi” Adam 🙂

    That was a nice table. Care to post the html codes.

    Thanks.

  23. It’d be nice (for some sites) to take this one step further and be able to specify the filename too, so you can run the canonicalization process on domain.com/ and domain.com/index.php for example.

  24. Give evaluating REQUEST_URI a try.

  25. > That’s the whole point. If you say that all votes for
    > non-www belong to www, by using the 301 redirect,
    > then Google will hold the correct “score” for your
    > site as a whole, and transfer stray PR over to
    > the “correct” URL.

    It certainly makes sense, I’d just love to be 100% sure about it.

  26. Any tips for when you can’t do a 301 redirect? I’m using smugmug for my photo website, and I’m using a custom domain, however they don’t redirect the subdomain on their site to my custom domain, and I’ve tried to explain it to them but they seem to think it has no effect on rankings.

    To make things worse, all the links on their site link to the subdomain, not the domain it should actually be showing up for. And since their main site is linking to the subdomain more than I can link to the real domain, I think it’s causing issues.

    subdomain: rifoto.smugmug.com
    real domain: http://www.rifoto.com

  27. “That’s a really rare situation, as evidenced by the fact that not many (any?) people in the blogosphere noticed any PageRanks changing.”

    Oh it seems Google is starting to reach its goal,.. nobody cares about PR anymore,…. 🙂

    Wouldn’t that be painful if nobody actually cared anymore about PR,.. 🙂

  28. *** It’d be nice (for some sites) to take this one step further and be able to specify the filename too ***

    I simply set a 301 redirect for index.(html|htm|php|cfm) and for default.(html|htm|asp|cfm) to redirect to “/” for the root and for “/foldername/” for all folders.

    It is just 4 lines of code in the .htaccess file.

  29. *** It’d be nice (for some sites) to take this one step further and be able to specify the filename too ***

    I simply set a 301 redirect for index.(html|htm|php|cfm) and for default.(html|htm|asp|cfm) to redirect to “/” for the root and to “/foldername/” for all folders.

    It is just 4 lines of code in the .htaccess file.
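
    For readers wondering what such rules might look like, here is one possible sketch. It is not the commenter’s actual four lines; it assumes Apache with mod_rewrite, www.example.com as the canonical host, and the index/default file names listed above.

    ```apache
    # Sketch only: redirect explicit index/default file requests to the bare folder URL.
    RewriteEngine On
    # Test the original request line, so that Apache's own internal DirectoryIndex
    # lookup of index.html does not trigger the rule and cause a redirect loop.
    RewriteCond %{THE_REQUEST} ^[A-Z]+\ /([^/]+/)*(index|default)\.(html?|php|asp|cfm)[?\ ]
    # $1 is the folder path (empty for the root), so the target is "/" or "/foldername/".
    RewriteRule ^(([^/]+/)*)(index|default)\.(html?|php|asp|cfm)$ http://www.example.com/$1 [R=301,L]
    ```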

  30. Sebastian: That would cover savvy webmasters, but it’d be nice to have something to cover other platforms.

    Another issue I see is where 2 domains point to the same webspace. Sometimes the webhost is not able to 301 the entire domain, so being able to select which is the primary domain within the console would also be helpful.

  31. Pittbug, many domains and subdomains pointing to one webspace can be handled easily in a .htaccess file:

    # Permanently redirect domain.com, domain1.com, dupdupdup.domain2.net… URLs to http://www.domain.com
    RewriteEngine On
    # Any host other than the canonical one (dots escaped so they match literally)
    RewriteCond %{HTTP_HOST} !^www\.domain\.com [NC]
    RewriteRule (.*) http://www.domain.com/$1 [R=301,L]

    Just insert the canonical server name.

    In weird environments that’s a PITA 🙁

    Perhaps Google can HEAD the server within the Webmaster console and provide .htaccess files or IIS redirect tutorials depending on values of “Server”, “X-Powered-By”… and if all that doesn’t help PHP, ASP … code 😉

  32. Harith, use blockquotes surrounding what you are saying. Google it!

    I have a new question about blockquotes, does the Google algorithm know they are used to express duplicate content quotations often? It would be great if it did. 😉

    *Waits to see if Matt throws Phil a bone*

  33. Aaron

    Thanks a bunch. The problem with Matt’s blog is that there is no option to edit once you post. However, let’s give it a try.

    [quote]Matt hasn’t posted any weather report about SEO-Emmy and Bouncing-Oz for a looooong time 🙂 [/quote]

  34. Last week there was an update to how we canonicalize a small number of urls.

    Was there a problem? 🙂

  35. Heather Paquinas

    hi matt, what did you mean here when you talked about fonts used on a web page?

    Can different font families or sizes improve our pages quality as seen by google, or can they only decrease the quality, as in the case of webspam in a very small font?

  36. Hi Matt,

    One month ago I changed my preferred domain in Google webmaster for carsandtuning.org to http://www.carsandtuning.org and did a 301 redirect from the non-www to the www. Now I have PR 0, and I had PR 3.

    What should I do?

    Thank you.

  37. Is this going to severely affect subdomains?

  38. Harith that is the right idea but it is the word “blockquote” with

  39. So it wasn’t a PR update right?

  40. Political Forum, that’s right. It wasn’t a PR update.

    keniki, I’m glad you got that out. It is out of your system now, right? 🙂

  41. Hey Matt, when a free hosting provider takes over a page to take advantage of its ranking and to get ad impressions from it, is there any penalty from Google for the host?
    Or is there some way the search engine can recognize that page on another server? There are many hosts that disguise their services as free,
    only to remove the page later and place advertising such as AdSense.
    Take a look at ifastnet dot com.

    another question is better to make redirection 301 from or the file htaccess ?

  42. Besta not playa-hate on keniki. He be bustin’ mad dope rhymez from SEO Compton, y0.

    Harith: not sure what table you’re referring to, unless it’s the ASP code snippet. I’d post the HTML codes if I knew them (or if they existed, and I’m pretty sure they don’t…mind you, I only rock the ASP shizzle fo sizzle, so I don’t use da static HTML for my funk.)

    Dammit, keniki, now look whatcha did!

  43. Heather Paquinas

    Someone on syndk8.net had this as their signature:

    “What if sites that are using gateway pages are just trying to escape the web ghetto – Ali G”

    I don’t think that’s an actual Ali G quote, but close to one.

  44. Good morning Aaron & Adam

    [blockquote]Thanks for your attention and help. We live, we share, we learn :)[/blockquote]

    Back to the canonical issue. What are the consequences of a site suffering from being indexed under both www and non-www versions? Do GOOG and Adam Lasnik see that as duplicates? Does the PR of such a site get divided between the two versions (www and non-www)?

  45. If it wasn’t a PR update, why do I have PR0 on both www and non-www?

  46. Nobody cares about PR anymore right guys?

    *crickets*

  47. One more thing…

    Thank you.

  48. “Some people ask “Why don’t you just assume http://www.example.com and example.com are the same?” The answer is that they don’t have to be, and for some websites they are different.”

    This is understandable but why can’t Google assume that if they see say domain.com and http://www.domain.com with the exact same content that obviously they are the same site and therefore not penalize for duplicate content?

  49. Hey keniki, that was not spam; we are talking about PR here, right?
    I am talking about PR robbery.
    Matt is the one in charge of eliminating spam in search, right?
    If you search in Google and the result you visit is not what you were looking for but is pure advertising, is that spam?
    Yes, it is spam. I could not recover my PR or do a 301 redirect for that reason.
    If the post seems bad to you, it is because you are the owner of a host and you do not want bad publicity.
    Hey Matt, I’m sorry, but it was not spam. My English is bad and perhaps I did not explain myself well. Thanks at least for reading.

  50. Interesting experience I’m having with this issue. I recently changed my weblog from a subdomain to a fully-fledged domain and sat back to let the search engines index it.

    Yahoo has 65 pages indexed but, strangely, three pages are indexed without the www, despite all my internal links being to the www version and there being a redirection in place from non-www to www. There’s unlikely to be external links to those pages using the non-www version either, since most internal links are all to the old (now redirecting) subdomain.

    Google’s indexing it fine, of course, and including the www on all the URLs. 🙂

  51. Love the word “bring your webmaster ducks in row” – it is a real good description of what we try to do to get consistent links / navigations / paths etc. It isn’t that easy if you have multiple editors / producers / webmasters and even developers working on several parts of your site(s).

    I understand that we need to do our job (being consistent) to avoid additional analytical workload on your side, so Google shouldn’t assume “that obviously they are the same site and therefore not penalize for duplicate content”, as we see a lot of tricks in this market.

    Better for me that we get “penalized” now for being lazy (or inconsistent in our navigation) than being overrun by spammy sites that use such “assumptions” for a while to get around filtering.

    For me it looks like a trend at Google: you work to educate us, but also give us the Tools (Webmaster Console etc) to scale this educational process.

  52. Hmm, I may have jumped the gun there – further examination shows Yahoo has the www. version indexed, but the green url text under the snippet in site explorer is missing the http://www... Weird. 🙂

  53. Hello, my site: example.com had a PR0 and http://www.example.com had a PR3. Are you saying that this was just visual? In reality the PR was the same for both pages?

  54. Thanks Matt, I have used Google’s webmaster console to specify the preferred root page of domains in the past to avoid this problem. Also great link to the info about how to do 301’s for different platforms.

  55. *** Why can’t Google assume that if they see say domain.com and http://www.domain.com with the exact same content? ***

    For any dynamic site (shopping, forum, news, etc.), even looking at both pages (www and non-www) within seconds of each other, you will find that the pages are not exactly identical, because the visible content will already have changed in some way.

    It may be minor, just a few changed words, an extra headline, a new post, an extra or different internal link, but that will still be enough of a change that now makes it non-identical.

  56. I was afraid of this. The PR of my two websites dropped, and I didn’t know what was wrong.

    Now I know that everything is ok.

  57. I just want to say Thank you Matt! As a small business owner/self-seo/marketer, etc., I try to stay up with what’s right and what’s wrong in the search industry. And all I can say is that having your blog, as well as the blogs of some of your colleagues to read, has been nothing short of a miracle for me.

    This post is a great example of that, my site had old pages that still showed in the index and bothered me to no end. But as you said Google has removed those since then so I too would recommend not “freaking out” about it (as I did) and let Google do their thing. It has always worked for me so far. Please keep it up, as your valuable insight has helped a lot of us figure out what Google really wants, and ultimately help provide a better search experience for the end user.

  58. Has anyone noticed that with ie7 the google search facility (generally top right) only has searches through Google.com if Google are chosen as the search method? So if you’re in Canada, or the UK, the search results are sub-optimal? If this becomes the default search mechanism (as it is doing) users aren’t getting good results and will stray from Google – as will be welcomed from Microsoft, developers of ie7 (of course!).
    You can change to a ‘national’ Google as the choice – but man! You have to jump through some hoops – and even then I can’t get it to persist! 99% of users aren’t going to bother – they’ll just go to Yahoo or MSN instead.
    Regards
    Baron

  59. Hello,

    I work for a company that wants to change the URL of one of our subsections from city.example.com/denver to city.example.com/centraldenver. I suggested using a 301 redirect.

    Our CMS vendor stated that a “google expert” advised them against using redirects because they had problems getting ranked at launch (sandbox). This seems a bit overcautious, as this is a legitimate use of a redirect. Am I correct in thinking there would be no penalty in this scenario?

    Thanks,

    Bryan

  60. The new URLs would be picked up fairly quickly, and the old URLs would hang around in the SERPs mostly as Supplemental Results for many months. However, anyone clicking on those old results would be delivered to the correct content by the 301 redirect that you have installed. Make sure that all of your internal links point at the new location for the content, and then try to get most of your external incoming links to also point at those new URLs too. Using a program like Xenu LinkSleuth would be very useful to verify that all of your internal linking is 100% accurate.
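
    If the site happens to run on Apache (an assumption; the CMS in question may use something else), a redirect of this kind could be sketched with mod_alias as follows, using the paths from the question above:

    ```apache
    # Sketch only: permanently redirect /denver and everything beneath it to /centraldenver.
    # /centraldenver does not begin with /denver, so it is not redirected again.
    Redirect 301 /denver http://city.example.com/centraldenver
    ```

    A request for /denver/events would then be sent to /centraldenver/events with a 301 status.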

  61. Matt,

    I want to develop new areas of my site. Can you comment on what structure is better for SEO?

    newarea.mysite.com
    or
    http://www.mysite.com/newarea

    Thanks

  62. Hey Matt, thanks for the link.

    Multi-Worded Adam and Sebastian — thanks for your input. I have corrected the ASP example code on our 301 Redirects howto at http://www.beyondink.com/howtos/301-redirect.php

    Happy redirecting,
    – Ryan

  63. I’ve heard about canonical issues for the first time today. Before I specify a preferred domain in Google and set 301 redirects for the many non-preferred domains that I seem to have, I have a question. Are there any possible side effects on the currently indexed pages and their rankings? What happens to the ranking of the non-preferred (non-www.) pages?

  64. Matt the two URLs that you mentioned http://phpicalendar.net/ and http://www.phpicalendar.net/ are not different from each other.

  65. Well, this is really great information… So thank you.
    But how do I apply the 301 url redirection on Windows hosting? Does the .htaccess file exist on the server as it does on Apache hosting?

  66. No, on Windows you need to use isapi_rewrite or IIS_rewrite, which are similar but nowhere near as good.

    Take a look at this page it has examples: http://www.askapache.com/htaccess/list-of-methods-to-redirect-users-to-different-page.html

  67. Thanks Matt for this explanation. I think this helps some of us understand why others have seen some minor and rare updates this time around, while everyone is waiting for the current PR update to occur. Many are freaking out because the current PR update seems so late, and then they see the minor changes and think they’ve missed the boat, so hopefully this will put things in perspective for everyone.

  68. Thanks for explaining it. I didn’t know that you could have the www in the url point to a different site. I’m checking both ways on my site now for pr!

    ttyl

  69. Can you write more about this issue? Namely, something covering the difference between pages being viewed by Google as both http and https, even though there is a redirect to https in htaccess.

  70. In the past I also had problems with lost PR on a 301 redirect. Is this common, or did I miss something?

  71. It would be really nice to know the effects of a 301 redirect on PR. All the explanations everybody is giving are about how to successfully inform Google to index your new page, but nobody tells you what the effects are on your PR.

    We are in the process of creating a new site with the same content as our current site, but management is so scared of losing PR that we are starting to lose all the gain we are supposed to be getting from the new design in regards to usability and presentation.

    I am going to go bald before my time. Matt please help.

  72. On prominent sites, url’s such as (for example)
    http://www.somedomain.com/folder/page.asp
    …when removing the www [and hitting Enter]
    http://somedomain.com/folder/page.asp

    the http://somedomain.com/folder/page.asp quickly becomes
    http://www.somedomain.com/folder/page.asp

    …maintaining the entire url

    However, if

    http://www.somedomain.com/folder/page.asp has the www removed and becomes

    http://www.somedomain.com

    …and doesn’t maintain the entire url, what is set wrong? Or, what settings are wrong?

    IIS6 Win 2k

    Thanks in advance!

  73. Hey Matt,

    THanks for the great info.

    My domain is hosted. Is this 301 redirect something I need to have my host configure, or is it a file I include like my sitemap?

  74. Hi all…

    Relative newbie here learning about all this after just creating a web site for my new business venture. I just published my site and Google searches just began displaying some of the pages last night. As I was exploring Webmaster Tools yesterday I discovered in the Settings tab the option to choose a preferred URL or not.

    Have to admit this is a bit overwhelming for a beginner but I was wondering if someone could elaborate a bit more on how to check (and what to look for) with regard to “internal links”. I assume you’re referring to the links on each web page that link to all the other pages on your site? What is it that we should be looking for?

  75. Curious George

    Hi guys,

    Did anyone ever answer Hawaii’s question about “200 redirects” ?
    I’ve just come across them myself.
    I would usually recommend 301 to my clients, but I’d like to know if
    a 200 redirect is “search engine friendly” or not, with SEO in mind.

    cheers,
    G

  76. Regarding the setting in Google Webmaster Tools, we have indicated our preferred domain and it seems to work. However it did not resolve the problem of our site listing as domain.com/index.html and domain.com/

    Our big challenge is getting our /index.html to consolidate under domain.com on Apache. Does anyone have advice on this?
