<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
		>
<channel>
	<title>Comments on: Googlebot: Keep out!</title>
	<atom:link href="http://www.mattcutts.com/blog/googlebot-keep-out/feed/" rel="self" type="application/rss+xml" />
	<link>http://www.mattcutts.com/blog/googlebot-keep-out/</link>
	<description>neat fun stuff</description>
	<lastBuildDate>Fri, 19 Mar 2010 01:55:22 +0000</lastBuildDate>
	<generator>http://wordpress.org/?v=2.9.2</generator>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
		<item>
		<title>By: John S. Britsios</title>
		<link>http://www.mattcutts.com/blog/googlebot-keep-out/#comment-208068</link>
		<dc:creator>John S. Britsios</dc:creator>
		<pubDate>Mon, 29 Dec 2008 14:52:32 +0000</pubDate>
		<guid isPermaLink="false">http://www.mattcutts.com/blog/?p=247#comment-208068</guid>
		<description>Matt,

can &quot;?googlecrawl=no&quot; be implemented as an alternative to the &quot;nofollow&quot; attribute? If not, why is that?

Thanks,

John</description>
		<content:encoded><![CDATA[<p>Matt,</p>
<p>can &#8220;?googlecrawl=no&#8221; be implemented as an alternative to the &#8220;nofollow&#8221; attribute? If not, why is that?</p>
<p>Thanks,</p>
<p>John</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Sunil</title>
		<link>http://www.mattcutts.com/blog/googlebot-keep-out/#comment-131482</link>
		<dc:creator>Sunil</dc:creator>
		<pubDate>Fri, 08 Aug 2008 15:22:50 +0000</pubDate>
		<guid isPermaLink="false">http://www.mattcutts.com/blog/?p=247#comment-131482</guid>
		<description>Hi Matt, what is your stance on this issue?

http://www.youtube.com/watch?v=nLB1-Kc4CWE

All Blogger blogs using custom domains have not been indexed or cached since 30 July. No Google employee is responding to this issue in the webmaster help group. Where else can we hope for an answer?

Please take a look at this issue.

Thanks,
Sunil</description>
		<content:encoded><![CDATA[<p>Hi Matt, what is your stance on this issue?</p>
<p><a href="http://www.youtube.com/watch?v=nLB1-Kc4CWE" rel="nofollow">http://www.youtube.com/watch?v=nLB1-Kc4CWE</a></p>
<p>All Blogger blogs using custom domains have not been indexed or cached since 30 July. No Google employee is responding to this issue in the webmaster help group. Where else can we hope for an answer?</p>
<p>Please take a look at this issue.</p>
<p>Thanks,<br />
Sunil</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Jenny W</title>
		<link>http://www.mattcutts.com/blog/googlebot-keep-out/#comment-98560</link>
		<dc:creator>Jenny W</dc:creator>
		<pubDate>Sun, 04 Mar 2007 20:16:42 +0000</pubDate>
		<guid isPermaLink="false">http://www.mattcutts.com/blog/?p=247#comment-98560</guid>
		<description>Awesome!  I love the ?googlecrawl=no idea!

&lt;a href=&quot;http://www.askapache.com/2007/seo/seo-with-robotstxt.html&quot; rel=&quot;nofollow&quot;&gt;robots.txt examples for phpBB and WordPress&lt;/a&gt;</description>
		<content:encoded><![CDATA[<p>Awesome!  I love the ?googlecrawl=no idea!</p>
<p><a href="http://www.askapache.com/2007/seo/seo-with-robotstxt.html" rel="nofollow">robots.txt examples for phpBB and WordPress</a></p>
]]></content:encoded>
	</item>
	<item>
		<title>By: SEOboy</title>
		<link>http://www.mattcutts.com/blog/googlebot-keep-out/#comment-95011</link>
		<dc:creator>SEOboy</dc:creator>
		<pubDate>Sun, 28 Jan 2007 03:17:41 +0000</pubDate>
		<guid isPermaLink="false">http://www.mattcutts.com/blog/?p=247#comment-95011</guid>
		<description>Hi Googlebot search logic.
I still don&#039;t know the logic Google uses to find a brand-new website, but I know a tactic for getting Googlebot to find you.
You can post your URL on some well-known forums that Googlebot visits frequently. Googlebot will then see the link to your site there and start to crawl it.
Hai  :)</description>
		<content:encoded><![CDATA[<p>Hi Googlebot search logic.<br />
I still don&#8217;t know the logic Google uses to find a brand-new website, but I know a tactic for getting Googlebot to find you.<br />
You can post your URL on some well-known forums that Googlebot visits frequently. Googlebot will then see the link to your site there and start to crawl it.<br />
Hai  <img src='http://www.mattcutts.com/blog/wp-includes/images/smilies/icon_smile.gif' alt=':)' class='wp-smiley' /> </p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Googlebot search logic</title>
		<link>http://www.mattcutts.com/blog/googlebot-keep-out/#comment-90387</link>
		<dc:creator>Googlebot search logic</dc:creator>
		<pubDate>Thu, 30 Nov 2006 19:30:55 +0000</pubDate>
		<guid isPermaLink="false">http://www.mattcutts.com/blog/?p=247#comment-90387</guid>
		<description>Can anyone explain the basic logic by which Google starts crawling? This is not the logic for crawling a particular website, but the logic by which Google discovers a newly introduced website (with a new IP and domain name).

Urgently needed
Regards -</description>
		<content:encoded><![CDATA[<p>Can anyone explain the basic logic by which Google starts crawling? This is not the logic for crawling a particular website, but the logic by which Google discovers a newly introduced website (with a new IP and domain name).</p>
<p>Urgently needed<br />
Regards -</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: trevor</title>
		<link>http://www.mattcutts.com/blog/googlebot-keep-out/#comment-62830</link>
		<dc:creator>trevor</dc:creator>
		<pubDate>Sat, 05 Aug 2006 17:38:35 +0000</pubDate>
		<guid isPermaLink="false">http://www.mattcutts.com/blog/?p=247#comment-62830</guid>
		<description>Could collecting info by crawling URLs/content that are forbidden by the webmaster be construed as theft or invasion of privacy? If not, why not?</description>
		<content:encoded><![CDATA[<p>Could collecting info by crawling URLs/content that are forbidden by the webmaster be construed as theft or invasion of privacy? If not, why not?</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Keith Ort</title>
		<link>http://www.mattcutts.com/blog/googlebot-keep-out/#comment-19437</link>
		<dc:creator>Keith Ort</dc:creator>
		<pubDate>Mon, 27 Mar 2006 21:58:17 +0000</pubDate>
		<guid isPermaLink="false">http://www.mattcutts.com/blog/?p=247#comment-19437</guid>
		<description>Matt,
The removal finally went through.  Thanks for looking into it.</description>
		<content:encoded><![CDATA[<p>Matt,<br />
The removal finally went through.  Thanks for looking into it.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Jon Payne</title>
		<link>http://www.mattcutts.com/blog/googlebot-keep-out/#comment-19321</link>
		<dc:creator>Jon Payne</dc:creator>
		<pubDate>Mon, 27 Mar 2006 15:56:39 +0000</pubDate>
		<guid isPermaLink="false">http://www.mattcutts.com/blog/?p=247#comment-19321</guid>
		<description>Matt - regarding the first couple sentences of your post - it looks as though Google is in some cases indexing pages with &quot;id&quot; as a parameter - see the first result (teleflora) on this search:

http://www.google.com/search?q=dozen+premium+red+roses

The resulting page is http://www.teleflora.com/product.asp?id=34529  Does Google recognize that with Teleflora &quot;id&quot; is not a session ID but rather their product SKU?  Does Google Sitemaps help if this is a concern one might have?</description>
		<content:encoded><![CDATA[<p>Matt &#8211; regarding the first couple sentences of your post &#8211; it looks as though Google is in some cases indexing pages with &#8220;id&#8221; as a parameter &#8211; see the first result (teleflora) on this search:</p>
<p><a href="http://www.google.com/search?q=dozen+premium+red+roses" rel="nofollow">http://www.google.com/search?q=dozen+premium+red+roses</a></p>
<p>The resulting page is <a href="http://www.teleflora.com/product.asp?id=34529" rel="nofollow">http://www.teleflora.com/product.asp?id=34529</a>  Does Google recognize that with Teleflora &#8220;id&#8221; is not a session ID but rather their product SKU?  Does Google Sitemaps help if this is a concern one might have?</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Sebastian</title>
		<link>http://www.mattcutts.com/blog/googlebot-keep-out/#comment-18532</link>
		<dc:creator>Sebastian</dc:creator>
		<pubDate>Tue, 21 Mar 2006 22:18:22 +0000</pubDate>
		<guid isPermaLink="false">http://www.mattcutts.com/blog/?p=247#comment-18532</guid>
		<description>Servus Matt,

I read your article twice, but I guess my English is just not good enough to understand it. Anyway, maybe someone has a solution for my problem:

I&#039;m the admin of a German karting website, just for fun and private, but I think it is quite good (around 120 unique visitors per day). Now that the site has been up for 1.5 years, it is time to look at what could be improved. There is certainly a bunch of stuff to do, and that is how I found your blog.

Q1:
In my robot.txt I have:
User-agent: *
Disallow: /administrator/
So why do I see Googlebot checking a page that you can only reach after logging into the admin backend? O_o
http://www.domain.de/administrator/index2.php?option=com_content&amp;.......

Q2: 
The site has a feature to print articles (usually about races and stuff like that) as a PDF. It seems Googlebot really likes that feature - it indexes the PDF files rather than the actual page, which doesn&#039;t help the user because the PDF engine doesn&#039;t support pictures.
Here is what such a link could look like:
http://www.domain.de/index2.php?option=com_content&amp;do_pdf=1&amp;id=80

So how do I tell Googlebot not to crawl these links? As the site is based on the Joomla/Mambo CMS, it is not easy to make changes to the system itself, so I would prefer a workaround. By the way, what would you guys guess: will Google then index any page at all?


Maybe I should leave it at that, but now that I have started, here is another question - which URL is the better one ;)

Q3:
http://www.domain.de/index.php?option=com_content&amp;task=view&amp;id=76&amp;Itemid=1
or
http://www.domain.de/content/view/76/1/
and what about this one?
http://www.domain.de/component/option,com_docman/task,cat_view/gid,80/


So thanks to anyone,
Sebastian

just in case someone wants to mail directly: #azreael#ät#web#pünkt#de#</description>
		<content:encoded><![CDATA[<p>Servus Matt,</p>
<p>I read your article twice, but I guess my English is just not good enough to understand it. Anyway, maybe someone has a solution for my problem:</p>
<p>I&#8217;m the admin of a German karting website, just for fun and private, but I think it is quite good (around 120 unique visitors per day). Now that the site has been up for 1.5 years, it is time to look at what could be improved. There is certainly a bunch of stuff to do, and that is how I found your blog.</p>
<p>Q1:<br />
In my robot.txt I have:<br />
User-agent: *<br />
Disallow: /administrator/<br />
So why do I see Googlebot checking a page that you can only reach after logging into the admin backend? O_o<br />
<a href="http://www.domain.de/administrator/index2.php?option=com_content&amp;......" rel="nofollow">http://www.domain.de/administrator/index2.php?option=com_content&amp;&#8230;&#8230;</a>.</p>
<p>Q2:<br />
The site has a feature to print articles (usually about races and stuff like that) as a PDF. It seems Googlebot really likes that feature &#8211; it indexes the PDF files rather than the actual page, which doesn&#8217;t help the user because the PDF engine doesn&#8217;t support pictures.<br />
Here is what such a link could look like:<br />
<a href="http://www.domain.de/index2.php?option=com_content&amp;do_pdf=1&amp;id=80" rel="nofollow">http://www.domain.de/index2.php?option=com_content&amp;do_pdf=1&amp;id=80</a></p>
<p>So how do I tell Googlebot not to crawl these links? As the site is based on the Joomla/Mambo CMS, it is not easy to make changes to the system itself, so I would prefer a workaround. By the way, what would you guys guess: will Google then index any page at all?</p>
<p>Maybe I should leave it at that, but now that I have started, here is another question &#8211; which URL is the better one <img src='http://www.mattcutts.com/blog/wp-includes/images/smilies/icon_wink.gif' alt=';)' class='wp-smiley' /> </p>
<p>Q3:<br />
<a href="http://www.domain.de/index.php?option=com_content&amp;task=view&amp;id=76&amp;Itemid=1" rel="nofollow">http://www.domain.de/index.php?option=com_content&amp;task=view&amp;id=76&amp;Itemid=1</a><br />
or<br />
<a href="http://www.domain.de/content/view/76/1/" rel="nofollow">http://www.domain.de/content/view/76/1/</a><br />
and what about this one?<br />
<a href="http://www.domain.de/component/option,com_docman/task,cat_view/gid,80/" rel="nofollow">http://www.domain.de/component/option,com_docman/task,cat_view/gid,80/</a></p>
<p>So thanks to anyone,<br />
Sebastian</p>
<p>just in case someone wants to mail directly: #azreael#ät#web#pünkt#de#</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: CrankyDave</title>
		<link>http://www.mattcutts.com/blog/googlebot-keep-out/#comment-18525</link>
		<dc:creator>CrankyDave</dc:creator>
		<pubDate>Tue, 21 Mar 2006 21:37:49 +0000</pubDate>
		<guid isPermaLink="false">http://www.mattcutts.com/blog/?p=247#comment-18525</guid>
		<description>**Hmm. A more correct way to put it would be that there is a regular Googlebot and a supplemental Googlebot (though their user agents will be the same), and uncrawled urls from the regular Googlebot will go in the regular index while uncrawled urls from the supplemental Googlebot will go in the supplemental index. Hope that makes sense; I believe that’s correct.**

Matt,

Can you, or anyone else, show me an example of an uncrawled URL from the supplemental index that shows up in the search results?

Thanx :)</description>
		<content:encoded><![CDATA[<p>**Hmm. A more correct way to put it would be that there is a regular Googlebot and a supplemental Googlebot (though their user agents will be the same), and uncrawled urls from the regular Googlebot will go in the regular index while uncrawled urls from the supplemental Googlebot will go in the supplemental index. Hope that makes sense; I believe that’s correct.**</p>
<p>Matt,</p>
<p>Can you, or anyone else, show me an example of an uncrawled URL from the supplemental index that shows up in the search results?</p>
<p>Thanx <img src='http://www.mattcutts.com/blog/wp-includes/images/smilies/icon_smile.gif' alt=':)' class='wp-smiley' /> </p>
]]></content:encoded>
	</item>
</channel>
</rss>
