Learn more about robots.txt

We made a video about how Google handles the robots.txt file. You can watch it if you want:

This answers a couple of questions, such as:
- Why is my URL showing up in Google when I blocked it in robots.txt? Did you fetch that URL?
- How do I make that URL disappear from Google?

I hope the video helps if you have questions.

94 Responses to Learn more about robots.txt

  1. Hey Matt, have you planned to switch back to “textual content” for your blog someday? I don’t really understand the benefits of video for all those articles. Here are 3 reasons why I don’t like videos:
    - Harder to understand for non-English / non-American people
    - Harder to load with slow connections
    - Harder to watch when, for example, you are in China or any country / organization where most video sites are blocked.

  2. Juan

    Great info!

    But how can Google crawl a page to read the meta tag if the crawling itself is blocked by the robots.txt?

    Should we in that case not include the blocked page in the robots.txt?

  3. Tejaswi

    Why oh why video? Why not just write about it so that Google can index not only the title tag of this post, but also its body :-(

    Seriously, why video – why not just good ol’ plain text?

  4. That’s strange, I never knew it was “ok” for google.com to index websites even when robots.txt said User-agent: * Disallow: /

  5. Thanks Matt. It really clarified my doubts about type of URLs on Robots.txt file.

  6. @Tejaswi
    He used RDF tags which you can read more about here:
    Official Google Webmaster Central Blog: Introducing Rich Snippets
    Official Google Webmaster Central Blog: Supporting Facebook Share and RDFa for videos

    Also, video is great because you can listen in the background. Heck, I wouldn’t mind a podcast.

  7. Thanks for that Matt.

    After all I’ve one little question:
    What about content that’s not on an HTML page, like a picture at http://……/folder1/x.jpg that someone is linking to? If my robots.txt doesn’t allow crawling anything in ‘folder1′, would you index that from an anchor, too?

  8. webjet

    The robots.txt is a much misunderstood file. First, it only works with those bots that obey it (such as Google), so it’s not really a “privacy” option. In my view, the video was a pretty good short and simple explanation of how sites/pages can still be indexed (via external links and anchor text, or other directories). The “Noindex” meta tag or “Remove URL” are options after blocking in robots.txt, but maybe not for all search engines. There is also the “noodp” tag, which, if I recall, stops the ODP listing being used, or at least used to?

  9. Mark… a bigger marker next time :-)

    Thanks for the video!

  10. Martin

    Why video??? OMG?

    I am unable to run any video WITH SOUND here. If you really need to present in a video, show some slides there too. But good old text would be better.

  11. Dudibob

    Matt,

    Cheers for the video, it was informative. However, what can people do who use one of those weird CMS ecommerce systems that create multiple URLs for the same page (where not all of these URLs can be redirected to the ideal one)?

    How can someone block the ‘real’ pages that have a rubbish URL structure? They can’t be redirected (or else the site breaks), the meta tag would apply to both versions of the page, robots.txt doesn’t block Google completely (internal links still seem to count towards the plain SERP listing even when the page is robots.txt’d out), and the URL remover tool only works for 90 days.

    If you want more information about this I can email you, if that works better?

  12. “Disallow” is a crawler directive, it simply tells bots “do not fetch”.

    “Noindex” is an indexer directive, it forbids indexing. When you use “noindex” on the page or with x-robots-tags in HTTP headers (PDF, images, video …) you must allow crawling, because without crawling no search engine can read and obey those indexer directives.

    Google is the only search engine supporting “noindex” in robots.txt, hence the safe way to keep stuff out of search engine indexes is:

    robots.txt: Allow: /path or Disallow:

    Per URI in /path: “noindex” meta elements / x-robots-tags
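
A minimal sketch of the crawl-versus-index distinction described in this comment, using Python's standard `urllib.robotparser`. The robots.txt contents, the `ExampleBot` user agent, and the example.com URL are hypothetical, and real crawlers may match rules differently from this simplified parser. It only answers "may the bot fetch this URL?", which is the precondition for a bot ever seeing a "noindex" on the page.

```python
from urllib.robotparser import RobotFileParser


def may_crawl(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if robots_txt lets user_agent fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# Hypothetical robots.txt files for example.com.
BLOCKED = "User-agent: *\nDisallow: /path/\n"
OPEN = "User-agent: *\nDisallow:\n"

URL = "http://example.com/path/page.html"

# With the Disallow rule the bot may not fetch the page, so any
# "noindex" inside it stays invisible to the bot.
print(may_crawl(BLOCKED, "ExampleBot", URL))  # False

# With an empty Disallow the bot may fetch the page and can then
# read and obey a "noindex" meta element or X-Robots-Tag header.
print(may_crawl(OPEN, "ExampleBot", URL))  # True
```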

  13. Hi Matt

    Very nice video, thanks.

    Not wishing to be critical, but I think you could have made an even nicer video. :)

    This is a topic I have often had to cover with clients over the years. The key thing I elaborate is the difference between crawling and indexing – this seems to unlock the barriers most people have to understanding this issue. robots.txt controls crawling, not indexing (and applies to all robots, even non-indexing robots); URLs can still be indexed even if they are listed in robots.txt – they just can’t be crawled. The “noindex” bit of the robots meta tag applies to indexing, not crawling; it can’t stop a URL being crawled, because the URL has to be crawled in order to read the tag.

    I wrote an article on this long ago, when Google was in its infancy and was the first engine to exhibit this kind of behaviour. It may be a little out of date now (e.g. it pre-dates the URL removal tool), but it’s still relevant and describes in much more detail the difference between crawling and indexing, and robots.txt and the robots meta tag. It’s here:

    http://bit.ly/ukKqG

    Matt, feel free to re-use any of the concepts it contains when you’re talking on this.

    [PS @ Tejaswi, the above link is to a good old fashioned text article!]
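
To illustrate why a page has to be crawled before its robots meta tag can take effect, here is a rough sketch of how an indexer might read that tag once it has the HTML. The class and function names are made up for this example, and it assumes the tag has the usual `<meta name="robots" content="...">` form; it is not how any particular search engine does it.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects the directives found in <meta name="robots"> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        # HTMLParser lowercases attribute names; values may be None.
        attr_map = {name: (value or "") for name, value in attrs}
        if attr_map.get("name", "").lower() != "robots":
            return
        for token in attr_map.get("content", "").split(","):
            token = token.strip().lower()
            if token:
                self.directives.add(token)


def robots_directives(html: str) -> set:
    """Return the set of robots meta directives in an HTML document."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return parser.directives


PAGE = '<html><head><meta name="robots" content="NOINDEX, noarchive"></head></html>'
print("noindex" in robots_directives(PAGE))  # True
```

The function needs the page body as input, which is exactly what a robots.txt Disallow withholds from a compliant bot.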

  14. Yeah nice vid, even though I prefer good old text instead of vids ;)
    Cleared up some open questions in my head, thanks!

  15. I found my client clientdomain.com had a duplicate site indexed on clientdomain.info which pointed 170K links to the .com version.

    I told the client to 301 the duplicate.info site to the .com site (when we found the problem a few months back) and then remove the 301 and add a no index to the robots.

    An “expert” has got my client in a panic saying:

    a) you are at risk of being black listed
    b) when you remove the 170K links your rankings will suffer significant damage.

    My response to both is “nonsense” because

    a) you have taken the necessary steps to clear up the duplicate sites and have therefore removed any perceived attempt to spam the index
    b) 170K links from a single domain that has been 301 redirected to your main domain will not carry any significant “link juice” if any at all.

    Am I right or am I missing something?

  16. seo

    Why can’t I see the video?

  17. Dan

    I think I’m right in saying that ‘allow: /directory-name/’ isn’t actually supported in the standard protocol. I know Google supports it but I don’t know about other search engines.

    That right Matt?

    @seo: Because you have your eyes closed?!

  18. Excellent video…
    Some told me that if you screw up the robots.txt file, you are out of the search engines… which is clearly not the case. It’s the meta NO INDEX tag which will completely get you out of the search results.

    So here is my question:
    If you go through the ROBOTS.TXT file and see that you can’t crawl example.com/folder/123.htm

    If you do not crawl that page, how will you know if that page has a NOINDEX NOARCHIVE on that page?

    Will the page continue to show up in the SERP since you can not crawl to determine the intention of that page?

    Thanks

  19. @Evan Islam … NOINDEX does not necessarily mean don’t index a URL either. Matt asked “What should NOINDEX do?” last year:

    http://www.mattcutts.com/blog/google-noindex-behavior/

    The simple answer is that if you don’t want other sites (including search engines) to link to your site, then you shouldn’t publish your site on the Web. I gave a more complex answer here:

    http://bit.ly/JNylb

  20. Maybe you guys can save yourselves and many other people some grief by including a notice along the lines of “This URL has not been crawled by Google.” with URL-only listings.

  21. LaptopHeaven

    Are pages which are marked with the NOINDEX directive used when calculating inbound and outbound links?

  22. Hey Matt,

    I’m glad Google takes robots.txt compliance so seriously. Google is definitely above deck when it comes to privacy concerns.

    Perhaps unintentionally, Google goes a little overboard. When I switched from blogger to wordpress, I blocked / in robots.txt for a few hours while I worked on moving all my posts, because I didn’t want search engines to come along and see the site under construction. To my horror, Google cached the robots.txt file! For about a week, webmaster tools was showing my whole site blocked, even though the blocking robots.txt only existed for a matter of hours. My main sitemap is still showing only about half the URLs indexed, when it was close to 100% before the move and the robots.txt catastrophe.

  23. I’ve seen something a few times which is really confusing me. If I perform a search for ‘robots.txt’, one of the top results is the Google robots.txt file:

    google robots.txt
    User-agent: * Allow: /searchhistory/ Disallow: /search Disallow: /groups Disallow: /images Disallow: /catalogs Disallow: /catalogues Disallow: /news …
    http://www.google.com/robots.txt

    I understand that robots will read this file, but it seems a little weird that a robots.txt file of all things would be included in the search index?

    My question is: is there a way of stopping the robots.txt file itself from being indexed?

  24. @Aaron Newton, in my experience a site’s robots.txt file may be indexed if a crawlable link to that file exists. Most often that is not the case.

    There is a way to stop the robots.txt file being indexed. You would need to use the X-Robots-Tag in the HTTP response for that file:

    X-Robots-Tag: noindex

    Please note I have tested this tag on some files, but I have not tested it on robots.txt itself – there may be side effects and I don’t consider that file being indexed as a big deal. For more info on X-Robots-Tag, see:

    http://googleblog.blogspot.com/2007/07/robots-exclusion-protocol-now-with-even.html
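
As a rough illustration of the header mentioned above, the following sketch is a tiny WSGI application that serves a robots.txt response with `X-Robots-Tag: noindex` attached. The app, the file body, and the choice of WSGI are assumptions made for the example; as the comment notes, the effect of this header on robots.txt itself is untested.

```python
# Hypothetical robots.txt body that allows all crawling.
ROBOTS_BODY = b"User-agent: *\nDisallow:\n"


def app(environ, start_response):
    """Tiny WSGI app that serves /robots.txt with X-Robots-Tag: noindex."""
    path = environ.get("PATH_INFO", "/")
    if path != "/robots.txt":
        start_response("404 Not Found", [("Content-Type", "text/plain")])
        return [b"not found\n"]
    headers = [
        ("Content-Type", "text/plain; charset=utf-8"),
        ("Content-Length", str(len(ROBOTS_BODY))),
        # The indexer directive travels in the HTTP response header,
        # so it also works for files that cannot hold a meta tag.
        ("X-Robots-Tag", "noindex"),
    ]
    start_response("200 OK", headers)
    return [ROBOTS_BODY]
```

For a local trial the app could be passed to `wsgiref.simple_server.make_server`. On a production site the same header would more typically be set in the web server configuration.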

  25. Thanks for clearing this up for me. I was confused and had just struggled with removing a dead url.

  26. V

    Thanks Matt for clearing that up,

    We have a lot of problems on our forums with this.

    V

  27. Simon Garlick

    Cunning information management strategy here from Google. Useful information displayed in a form – a personally-narrated video – that can not be copied and pasted anywhere, that can’t be attributed to anyone else.

    Of course it’s also not searchable or indexable. It’s like rain on your wedding day.

  28. Hi Matt,

    I am thankful to you for sharing such a wonderful solution for the Robots.txt file.
    On your behalf I am typing it out in text so that others who don’t have access to the video can understand your explanation of robots.txt.
    Especially for Tejaswi, October 6, 2009 at 11:39 pm.
    1. Matt talked about robots.txt in the video. Sometimes people say a page still shows up even though they have set robots.txt to disallow a few pages of the website that they don’t want Google to crawl or users to find.
    2. Matt gave a solution: use the noindex meta tag. When we use noindex, the page will not get indexed by Google.
    3. He gave the example of http://www.dmv.ca.gov: the page was disallowed in robots.txt, but a description was still displayed in the SERP. Matt said this happens because of the listing in the ODP (DMOZ); sometimes the description is taken from the ODP.

    If I am missing some points then please correct me. I hope it is easier now for those who don’t have video access.

    Thanks & Regards,
    Vaibhav Pandey
    India

  29. I noticed that too. When I created my WordPress blog with Fantastico, the no-crawling option of WP was on by default, and for the longest time my website would show up in Google for some keywords but the title and description would never appear. I thought it was some kind of bug until I realized I had blocked all search engines from crawling my site. Thanks to the SEO for Firefox addon that highlights nofollow links, otherwise I might never have noticed that option was on.

  30. Fact is – if this video didn’t exist, no one would’ve ever known that Google follows this complicated way of indexing (or not indexing) websites. The internet is filled with text saying “if you don’t want a page indexed, use disallow in robots.txt”. Well anyway, thank you for making this clear.

    I understand Google’s point of view on this and in a way, logically you’re right, only it has to be documented more clearly in Google Webmaster Guidelines :
    http://www.google.com/support/webmasters/bin/answer.py?hl=en&answer=156449

  31. Great video. Shame that most people will not find this “official-unofficial” stuff. I think the big G needs to make things like this far more accessible to “joe public webmaster” who has no clue about even the basics.

  32. Hey Matt,

    That’s really a great video. A few days back I had the same query from one of my clients. I used robots.txt to ban a couple of directories so they would not show up in search results. But soon after the next crawl I saw many pages from those directories in search results. I was shocked, and wondered whether it was a new Google Labs science or something else :). So I deleted those directories using the URL Removal tool in Google Webmasters.
    I still have a question: is there any way that we can block our site from being indexed for a specific time period, say some days or even some hours? Is there any solution of adding some code in robots.txt which automates the process?
    Regards,

    Aqeel Bilal Malik

  33. Matt, what happens to the PageRank under these two directives in that case? If I block a file using robots.txt, I am sure Google will still index it if it has backlinks, and it will gain PR from those links and pass it on to the other pages.

    On the other hand, if I use the meta noindex tag, the URL is altogether dropped from the index. Now will it accumulate PR? Will it distribute PR? I think it won’t, because this page is out of the web net per se and the links that point to it should not be passing any PR. Am I right?

  34. Thanks for the clear explanation Matt :)

  35. Great video! It’s funny that people think they can tell Google what to do, AKA “don’t crawl this website”, and they think Google will just do what they say. My theory is that Google will do whatever they want to if they think they are going to produce the most relevant search results for their users. Besides duplicate content…and there are other ways to deal with that, why would you not want Google to list your site?? Thanks for the info Matt….btw how is your Apple-only products 30 day challenge holding up??!

  36. I owe you one for this, and the timing is perfect. I had a bit of an argument with a client just the other day who was up to some nogoodness trying to hide it with a disallow.

    Busted! I said, you’ll get busted.

    If you want to hide stuff from you cat’s you gotta have a pretty sick cloak these days, keep up the good work ;)

    Missed you at SEOktoberfest this year, heard you were invited. You missed out, awesome times, did you see the video?

    Take care, look forward to seeing you again soon, PUBCON? Seeing how I can’t buy you a drink, I’ll let you buy me one :0

  37. Matt:

    Very nice video and thorough explanation of the issues and facts surrounding this issue.

    I’m scared to disallow anything in my robots.txt for fear of screwing up and getting pummeled. LOL!

    When your whole purpose in life is to move sites up the rankings, some things you just are not comfy with monkeying around with….and I know how it works.

    RM

  38. Well, that’s the most self-serving Semantics I’ve heard in a long while. Come on Matt, surely Google doesn’t need any more pages to list in the SERPs? I thought Google was in favor of quality over quantity?

    You guys know FULL well that when someone uses a robots.txt to block googlebot they don’t want the link showing up in the SERPs. Why else would you “hear all the time….”?

  39. Dave (Original):

    No disrespect, but I think I have to disagree with you. Robots.txt came about as a set of traffic signals for spiders – its purpose was to say, “okay, you can check out this page, but not that page.” As another commenter said, robots.txt is about crawling. Search engines, however, are a function of indexing, which is dramatically different from crawling.

    In this video, I believe that Matt is saying ‘use the right tool for the right job.’ What is so wrong about that that it deserves to be classified as ‘self-serving semantics’?

    Thanks for the video – it is much appreciated…

    Greg H.

  40. Wander

    hi matt,
    this was a very helpful video, but there is one thing im wondering
    the noindex thing can be used to block an html page from being shown in the google results
    but what about non-html files, like pdf’s, images, etc?

  41. Thanks for clearing this up, I had some pages that I didn’t want indexed and this should help do just that

  42. The video is clear no doubt and thanks for clearing up any questions i have.

  43. don

    Matt,

    I have read conflicting reports on the value of the robots file to help filter out duplicate content, especially when it comes to blogs…can you shed some light on this..thx

  44. What happens if we do not use robots.txt? Is that a problem?

    Moreover,
    YouTube is banned in Turkey, so I cannot watch the video.

  45. Thanks for the video Matt, it was great. It is always pleasant to watch you. You already had an explanation on this subject on your blog before, but so many people have complained about the way Google deals with the robots.txt file. Videos are always more pleasant to watch compared to reading a post. But Matt, next time before you make a new video you should ask the person who made the first comment; he seems not to like the idea of you explaining things on video. Just don’t pay attention to stupid people, I vote for the videos…

  46. Very clear and concise Matt, great job!

    Karl

  47. Matt, how can we handle different bots like Google, AdSense, Yahoo and Bing from a single robots.txt?

  48. Thank you for the video.

    It got me thinking. If you put URLs in the index that are not crawled, how can you be sure the page really exists? And even more important, how do you know it does not host malicious software?

    Curious to hear your thoughts on this.

  49. hmm… looks like Google sometimes simply ignores robots.txt, I can say the same is observed with noindex tag as well….

    Google URL removal tool is basically pretty much the same as blocking urls in robots.txt….

  50. This answered my question and it was explained well. If the link popularity is there with the right anchor text, then Google can pretty much tell what the website is without having to scan the content on the page. It makes sense.

  51. This is actually really useful because I’d rather hear about how it actually works from someone at Google than all the hearsay online. I don’t know how many articles about robots.txt I’ve read that read like they are the de facto explanation of things like this and end up being.. kind of right… but not really 100% right.. or worse still, have the correct information but the ideology behind it is off. It’s better to understand what search engineers are really looking for and the ‘whys’ behind it so that you can better understand and predict similar or related situations in the future, I think.

  52. thanks great info
    but you gained some weight :)

  53. Greg H

    robots.txt is for SE spiders ONLY. So it makes sense when you block any bot from certain pages, the site owner does NOT want a link to that page in SERPs.

  54. Hi Matt,

    Outside of the scope of the robots.txt file, and as I ask this I think that many readers in the online universe would like to know the answer to this question: what do you recommend as a solution for removing posts from Google search that are violent or slanderous and that also break copyright and federal trademark laws by using assets in a posting?

    I am having no luck through Google’s spam reporting or abuse report submissions.

    Best,

    Johnny

  55. We just wish there was a faster way to remove URLs from the serps on a low PR site.. You’d think by now it could almost be instant or overnight since so much is built into the webmaster tools section.

  56. robots.txt is for SE spiders ONLY

    Dave (Original), that is simply incorrect. If you’re going to shout at people, I suggest you check your facts first. :)

    robots.txt is for all robots, not for SE spiders only. It applies to non-indexing bots just as much as indexing bots. It controls crawling, not indexing. It’s hard for an indexing bot to index content it’s disallowed by robots.txt from crawling, but it can freely index the following without breaching the robots.txt standard:

    - the URL of that content
    - meta data associated with that URL/content, e.g. the ODP listing

    Indexing bots may choose to interpret the exclusion of a URL in robots.txt how they like – the standard imposes no limitations. In fact indexing is not mentioned at all in the robots.txt standard!

  57. hello,

    thanks for the heads up matt but i think you are slightly mistaken in what you said (unless i have done something wrong)

    if you go to google and type my name “anthony von ducci” you will see my site “www.anthonyvonducci.com” listed in your results. however, if you go to my site you will see clearly that there is the: noindex in the coding of my home page (and only page of my site)

    therefore, if what you are saying is correct in your video…why is my url still showing up in googles search results?!

    have i done something wrong?

    on a side note…i am huge fan of yours and i wish you and google the best of luck in the future!

    AVD

  58. Yes, my bad on SE spiders ONLY.

    Indexing bots may choose to interpret the exclusion of a URL in robots.txt how they like – the standard imposes no limitations

    Sounds like no “standard” to me.

    My beef is with Google linking to the site in their SERPs when they know full well that the user doesn’t want as much. Hence why “we hear all the time”. Surely Google can cut the little guy some slack here? I mean, it’s not as if Google NEED the link in their SERPS. I guess greed knows no limits.

  59. Perfect time for the video. Just had a meeting about Robot text with client who didn’t understand the who and what. I sent them the video before our meeting and everyone was on the same page throughout our conversation.

    Thanks Matt

  60. Hi Matt
    Rather than video, could you give textual content? Videos are blocked at many places. At my work I can’t see the video. It shows me a blank page.
    I hope a textual form can help many others too.

    Or, if you can, at least place a summary of the video at the bottom.

    Thanks In Advance

  61. Matt,

    Thanks for that informative video. There were some great gems of knowledge for those who were listening. I always appreciate a good explanation of how Google looks at robots.txt vs. how it looks at global link popularity. ;)

    Phillip II

  62. sunny

    Video is blocked at my place of work, an edict from executive level with which IT is obliged to comply.
    Ergo, accessibility = 0
    :(

  63. Matt,

    Like some others mentioned here, this is a great time for me to read this, as I am going through this issue and really learning about how Google displays search results.

    Please do not stop doing videos – I greatly appreciate them.

  64. Great, but I do prefer text rather than video. I don’t have to relisten to it when I’ve missed something, easier to make notes from and don’t have to worry about barely seen markers being used.
    Otherwise clears up a few things.

  65. Thank you so much. This video tutorial on robots.txt helps me a lot.

  66. Hi Matt,

    I’ve been trying to see the video you posted but I can’t view it for some unknown reason. I have read some of the comments, but unless I see the video myself I can’t state my point of view about robots.txt files. Can you please convert that video into text so it will be really easy for all of us to understand what you are trying to say, as WORDS SAY MORE THAN VOICES ;)

  67. This is an excellent video, thanks for taking the time to review this…

    I always wondered why this happened and it is interesting to learn about the open directory description showing up as well. I have seen this a few times that I can remember for pages that were in the robots txt files…

  68. M1t0s1s

    Here’s something I haven’t seen answered. Why does Googlebot seem to always favor the low-bandwidth versions of forums with this feature? What are the seo implications of this? Does googlebot ‘know’ which types of forums have this and favor it? Also it’s not just the archive pages but individual threads as well. The main type I see this on is vbulletin.

    It’s kind of annoying sometimes, but then again I can understand that there’s still a lot of people out there on dialup connections.

  69. David

    The problem is not blocked pages showing up in SERPS but in webmastertools complaint messages.

  70. Okay, I do not poke my head out here much, but I just had to. With Google I do not find issues with the whole robots.txt or meta tags for indexing and following that cannot be solved. Kudos to Google.

    It is not always perfect, but darn . . . Google makes it easy to clean up.

    I was talking (robots.txt stuff) with Yahoo! about page level and URL level. Here is the live thread:

    http://suggestions.yahoo.com/detail/?prop=SiteExplorer&fid=169192

    Funny thing is they say listen to Matt and give me Google examples. I listen to Matt and Google gets it. I will let you read the above and decide yourself.

    Sometimes all this indexing gets too technical and we miss SE 101. Plus, there is a distinction on page level and URL level. Sometimes you want the page level and just the duplicate URL level to be removed; hence, sometimes there can be misunderstandings with duplicate content issues and the whole robots.txt handling with spiders vs. the “SE index” (SE databases).

    At the very least it is humorous.

  71. Great post Matt :) This is actually something I think should be posted on Google’s Official Webmaster Central Blog

  72. That makes sense. I tried all sorts with robots.txt with an issue I had with a Joomla site where the description was not showing up. I thought it was due to Google crawling, but it turned out to be a function of Joomla meta that stopped Google taking snippets. Watch out for it if you use Joomla at all.

  73. I find the bigger issue is getting all of my pages indexed. I understand page rank to be key but haven’t quite worked out a strategy to earn page rank. I also don’t understand the ‘sand box’ stage and the criteria for being taken seriously by google, thus leaving the ‘sand box’.

    Michael

  74. SEO

    Nice video! Never saw this one before! It gives some good advice! I like it. Thanks for this useful post that you have shared with us readers. I am looking forward to your next post! Good luck with your next post!

  75. Hi Matt

    Would it be possible to have textual information instead of videos sometimes? I agree with the first comment about videos, less accessibility and harder to understand for some people

  76. Good video. Not bothered about textual content as I do get sick of reading blogs/articles. Nice to see some informative video.

  77. angela g

    Boy, I sure agree with JS Paris. And I’m a native speaker of English, born & raised in the US. Non-video is faster if you’re just talking.

  78. Great video and explanation, as to be expected from ‘the horse’s mouth’…

    I love this part of the text you placed at the top of the video:

    “You can watch it if you want:”

    That’s like saying “Your banker has some information for you on how to increase your investments…watch it if you want.” I just loved the phrase-ology that you used there. :)

    S

  79. Hi matt
    I have seen most of your videos on YouTube, and congratulations on one million video views. I have finally come to ask you for a small clarification. I recently created a thread here, http://www.google.com/support/forum/p/Webmasters/thread?tid=30390cbd6a1754c6&hl=en, about Google indexing my site. The problem seems peculiar. Would you please answer my question in the Google Webmasters thread? I hope you are able to resolve my issue. :)

  80. Should hosts make it simpler to offer robots.txt and non-crawl functions for newbies?

  81. Hi Matt,

    One of my clients owns an art gallery website with thousands of web pages, on which artists get registered and also sometimes delete their profiles. When they delete their profile, a broken link is generated, so right now the site has hundreds of broken links.
    The website is dynamic and keeps getting more and more broken links, so removing broken links from the website is a tough task. The major problem is that the client is not providing me with FTP details; he says “you give me suggestions and I will rectify it myself”.
    I told him to remove all the broken links from the web pages. Then he told me that it is not possible because the site has hundreds of broken links.
    Now please let me know, could we fix those broken links with robots.txt as well?

    Awaiting for a reply

    Thanks

  82. Thanks for the video Matt; consistent with everything you, robotstxt.org and the Google webmaster blog have said. Useful re-confirmation and explanation.

    @Anthony Von Ducci – The listing that I can see for you is rationally consistent with what Matt said. The page has not been crawled, in accordance with your robots.txt, so it consists of a link and “similar”. If you put “Disallow: /”, that’s exactly what should happen. The robots can’t see the NOINDEX you put in the home page, because they can’t crawl it. If you *had* an ODP (Open Directory Project aka DMOZ) listing, then that would be shown by Google, even if you used the NOODP robots directive on the home page, because the directive would be on a page that the robots aren’t allowed to see. Shows that the robots.txt file is working exactly as it should!

    @Aery – read robotstxt.org and use the “User-Agent” lines to identify the bots by name.

    @Dudibob – you need “Canonical link refs”, I think, to solve your duplicate pages problem; just search for it, it works for all the major search engines, eventually – you need to have the dups crawled, of course, to see the on-page meta tag!
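
As a sketch of the “User-Agent lines” advice above, a single robots.txt can hold one group of rules per bot plus a catch-all group. The rules below are invented for illustration and are checked with Python's standard `urllib.robotparser`, whose matching is simpler than what real crawlers implement, so treat the results as indicative only.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt with one group per named bot and a default group.
ROBOTS_TXT = """\
User-agent: Googlebot
Disallow: /drafts/

User-agent: Mediapartners-Google
Disallow:

User-agent: *
Disallow: /
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

DRAFT = "http://example.com/drafts/a.html"
HOME = "http://example.com/index.html"

# Googlebot is kept out of /drafts/ only.
print(parser.can_fetch("Googlebot", DRAFT))             # False
print(parser.can_fetch("Googlebot", HOME))              # True
# The AdSense crawler gets its own group with no restrictions.
print(parser.can_fetch("Mediapartners-Google", DRAFT))  # True
# Any bot without a named group falls back to the "*" group.
print(parser.can_fetch("SomeOtherBot", HOME))           # False
```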

  83. Great simple & to the point explanation Matt. I always thought blocking in robots.txt = no crawl = noindex. Now I know noindex and no crawl are two completely different things. Besides, you can always remove specific urls you do not want showing on Google in Webmaster Tools…

  84. zoe

    Hi, Matt. I’m so disappointed that I cannot watch the video in my country. However, thank you for sharing.

  85. Tom

    Thanks Matt! Great information as always. I rely heavily on my robots.txt. It is time I double check mine…I did the old “set and forget.” Thanks!

  86. Amy

    I usually ignore the robots.txt file. Thanks for the clip, I will keep it higher on my priority list now.

  87. Well I always write ‘index, follow’ in the robots.txt I’ve never really cared about changing it into anything else.

  88. Thanks for posting this. i’ve been wondering about those robots violating this for a while.

  89. Thank you very much for this video, it’s helpful for a newbie like me

  90. Andy H

    The only way to figure out Google is to look backwards. People should forget all they think they’ve learnt and read The Anatomy of a Search Engine before they do anything else.

    In there, you will find the answer to the universe and everything. Matt’s video explanation above was first explained back in ’97. It seems to me that very little has really changed, which is a good thing.

  91. Thanks Matt – Was a little confused between robots.txt and the .htaccess but this clarifies the two.

  92. Murat

    I find the bigger issue is getting all of my pages indexed. I understand page rank to be key but haven’t quite worked out a strategy to earn page rank. I also don’t understand the ‘sand box’ stage and the criteria for being taken seriously by google, thus leaving the ‘sand box’. hocaniz.com/

  93. Thank you for sharing this. But, I always do write ‘index, follow’ in the robots.txt and I’ve never really cared about changing it into anything else. However, I may be able to commit to some changes as stated here in your video. Thanks!
