<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
		>
<channel>
	<title>Comments on: How to verify Googlebot</title>
	<atom:link href="http://www.mattcutts.com/blog/how-to-verify-googlebot/feed/" rel="self" type="application/rss+xml" />
	<link>http://www.mattcutts.com/blog/how-to-verify-googlebot/</link>
	<description>neat fun stuff</description>
	<lastBuildDate>Tue, 21 May 2013 20:23:07 +0000</lastBuildDate>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.5.1</generator>
	<item>
		<title>By: ganool</title>
		<link>http://www.mattcutts.com/blog/how-to-verify-googlebot/#comment-462684</link>
		<dc:creator>ganool</dc:creator>
		<pubDate>Thu, 04 Feb 2010 02:24:52 +0000</pubDate>
		<guid isPermaLink="false">http://www.mattcutts.com/blog/how-to-verify-googlebot/#comment-462684</guid>
		<description><![CDATA[Is there any way to detect Googlebot using PHP?]]></description>
		<content:encoded><![CDATA[<p>Is there any way to detect Googlebot using PHP?</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Proxy Hi.Jack</title>
		<link>http://www.mattcutts.com/blog/how-to-verify-googlebot/#comment-112417</link>
		<dc:creator>Proxy Hi.Jack</dc:creator>
		<pubDate>Sun, 09 Sep 2007 17:04:54 +0000</pubDate>
		<guid isPermaLink="false">http://www.mattcutts.com/blog/how-to-verify-googlebot/#comment-112417</guid>
		<description><![CDATA[I thought a lot before placing this on a dead and buried blog entry, but here it goes. I guess someone might benefit from it. If it&#039;s too late, just drop the comment :)

With all the buzz about http://www.tellinya.com/read/2007/09/09/defend-your-website-against-google-proxy-hijacking/ I put my previously coded PHP class to good use, and it now filters my website traffic. It works very well.

PS: I am curious: once an IP verifies as Googlebot, would it be safe to consider the entire class C as Google&#039;s?

Thanks.]]></description>
		<content:encoded><![CDATA[<p>I thought a lot before placing this on a dead and buried blog entry, but here it goes. I guess someone might benefit from it. If it&#8217;s too late, just drop the comment <img src='http://www.mattcutts.com/blog/wp-includes/images/smilies/icon_smile.gif' alt=':)' class='wp-smiley' /> </p>
<p>With all the buzz about <a href="http://www.tellinya.com/read/2007/09/09/defend-your-website-against-google-proxy-hijacking/" rel="nofollow">http://www.tellinya.com/read/2007/09/09/defend-your-website-against-google-proxy-hijacking/</a> I put my previously coded PHP class to good use, and it now filters my website traffic. It works very well.</p>
<p>PS: I am curious: once an IP verifies as Googlebot, would it be safe to consider the entire class C as Google&#8217;s?</p>
<p>Thanks.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Houman</title>
		<link>http://www.mattcutts.com/blog/how-to-verify-googlebot/#comment-103798</link>
		<dc:creator>Houman</dc:creator>
		<pubDate>Thu, 03 May 2007 23:07:09 +0000</pubDate>
		<guid isPermaLink="false">http://www.mattcutts.com/blog/how-to-verify-googlebot/#comment-103798</guid>
		<description><![CDATA[Dear colleagues,

From now on, you can check and verify Googlebot IPs very easily.
Here you can verify IP addresses:
http://www.2dwebdesign.nl/googlebotchecker.php

Note that this tool uses the reverse DNS technique that Google has recommended.
It also distinguishes Googlebot IPs from other Google IPs.

Regards,
Houman]]></description>
		<content:encoded><![CDATA[<p>Dear colleagues,</p>
<p>From now on, you can check and verify Googlebot IPs very easily.<br />
Here you can verify IP addresses:<br />
<a href="http://www.2dwebdesign.nl/googlebotchecker.php" rel="nofollow">http://www.2dwebdesign.nl/googlebotchecker.php</a></p>
<p>Note that this tool uses the reverse DNS technique that Google has recommended.<br />
It also distinguishes Googlebot IPs from other Google IPs.</p>
<p>Regards,<br />
Houman</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Dan Thies</title>
		<link>http://www.mattcutts.com/blog/how-to-verify-googlebot/#comment-87520</link>
		<dc:creator>Dan Thies</dc:creator>
		<pubDate>Fri, 06 Oct 2006 00:25:25 +0000</pubDate>
		<guid isPermaLink="false">http://www.mattcutts.com/blog/how-to-verify-googlebot/#comment-87520</guid>
		<description><![CDATA[Matt, thanks for providing clarity on these questions.

We do use WHOIS as the first pass; for one thing, it&#039;s more reliable than DNS lookups. For anyone who thinks this is all meaningless: on one site I&#039;m working with, more than 95% of the &quot;Googlebot&quot; visits on a daily basis fail the WHOIS test.

So we really only need to fool with DNS lookups on the 5% that pass WHOIS.

The process (generified since Googlebot isn&#039;t the only spider) is:
1) We have a user-agent identifying itself as a spider
2) Check the IP vs our database of good and bad IPs for that spider
3) Use WHOIS for anything that we can&#039;t look up
4) Use DNS to verify those that pass the WHOIS test, unless it&#039;s MSN
4a) (if MSNBot), cross your fingers and hope you didn&#039;t block the real bot

Robin, don&#039;t assume that those proxies would appear in any published database, respond on a predictable port, etc. If it is a public proxy, don&#039;t assume that the IP making the request is the same as the public proxy address. Some of the biggest public proxies run multiple servers, and the IP fetching pages is not the public WWW server.]]></description>
		<content:encoded><![CDATA[<p>Matt, thanks for providing clarity on these questions.</p>
<p>We do use WHOIS as the first pass; for one thing, it&#8217;s more reliable than DNS lookups. For anyone who thinks this is all meaningless: on one site I&#8217;m working with, more than 95% of the &#8220;Googlebot&#8221; visits on a daily basis fail the WHOIS test.</p>
<p>So we really only need to fool with DNS lookups on the 5% that pass WHOIS.</p>
<p>The process (generified since Googlebot isn&#8217;t the only spider) is:<br />
1) We have a user-agent identifying itself as a spider<br />
2) Check the IP vs our database of good and bad IPs for that spider<br />
3) Use WHOIS for anything that we can&#8217;t look up<br />
4) Use DNS to verify those that pass the WHOIS test, unless it&#8217;s MSN<br />
4a) (if MSNBot), cross your fingers and hope you didn&#8217;t block the real bot</p>
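<p>A minimal Python sketch of the numbered process above, for illustration only. The helper name <code>verify_spider</code>, the <code>known_ips</code> cache, the optional <code>whois_ok</code> callback, and the hostname suffixes are all assumptions made for this sketch and are not taken from any real system. It covers only Googlebot and IPv4, and it does not handle the MSNBot exception in step 4a.</p>

```python
import socket

# Hostname suffixes assumed to identify Google's crawlers (an assumption of this sketch).
GOOGLE_SUFFIXES = (".googlebot.com", ".google.com")


def reverse_lookup(ip):
    """Return the PTR hostname for ip, or None if the lookup fails."""
    try:
        return socket.gethostbyaddr(ip)[0]
    except OSError:
        return None


def forward_lookup(hostname):
    """Return the set of IPv4 addresses that hostname resolves to."""
    try:
        return set(socket.gethostbyname_ex(hostname)[2])
    except OSError:
        return set()


def verify_spider(user_agent, ip, known_ips, whois_ok=None,
                  reverse=reverse_lookup, forward=forward_lookup):
    """Hypothetical helper: True only when the request is a verified Googlebot."""
    # Step 1: only requests that claim to be the spider need checking.
    if "googlebot" not in user_agent.lower():
        return False
    # Step 2: consult the local database of already-classified IPs.
    if ip in known_ips:
        return known_ips[ip]
    # Step 3: optional WHOIS first pass, supplied by the caller.
    if whois_ok is not None and not whois_ok(ip):
        known_ips[ip] = False
        return False
    # Step 4: reverse DNS, then a forward lookup that must return the same IP.
    hostname = reverse(ip)
    genuine = (hostname is not None
               and hostname.lower().rstrip(".").endswith(GOOGLE_SUFFIXES)
               and ip in forward(hostname))
    known_ips[ip] = genuine
    return genuine
```

<p>The resolvers are passed in as parameters so the logic can be exercised with canned answers instead of live DNS. Results are cached in <code>known_ips</code>, so a repeat visitor only costs a dictionary lookup.</p>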
<p>Robin, don&#8217;t assume that those proxies would appear in any published database, respond on a predictable port, etc. If it is a public proxy, don&#8217;t assume that the IP making the request is the same as the public proxy address. Some of the biggest public proxies run multiple servers, and the IP fetching pages is not the public WWW server.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: IncrediBILL</title>
		<link>http://www.mattcutts.com/blog/how-to-verify-googlebot/#comment-87172</link>
		<dc:creator>IncrediBILL</dc:creator>
		<pubDate>Sun, 01 Oct 2006 21:07:10 +0000</pubDate>
		<guid isPermaLink="false">http://www.mattcutts.com/blog/how-to-verify-googlebot/#comment-87172</guid>
		<description><![CDATA[&lt;blockquote&gt;And even if there was an open proxy running the useragent string would contain “Googlebot”, and the IP address would be listed in the open proxy databases.&lt;/blockquote&gt;

Robin, not all proxies are just open ports like the ones you&#039;re attempting to find. Some are CGI or PHP proxy sites, and they don&#039;t all pass the user agent through, or they let you configure it. Some CGI proxy sites, such as Proxify, will by default tell the world you&#039;re using Safari on a Mac, and so on.

It&#039;s probably someone just running a stealth crawler, not Google (I&#039;m still wondering how you concluded it was Google). Armed with just the IP address you reference, ARIN.NET says it&#039;s in a block of IPs assigned to CMP Media LLC, and it&#039;s also a banned IP on the Twiki blacklist:
http://twiki.pula.org/cgi-bin/twiki/view/TWiki/BlackListPlugin

Pretty sure it&#039;s not Google ;)]]></description>
		<content:encoded><![CDATA[<blockquote><p>And even if there was an open proxy running the useragent string would contain “Googlebot”, and the IP address would be listed in the open proxy databases.</p></blockquote>
<p>Robin, not all proxies are just open ports like the ones you&#8217;re attempting to find. Some are CGI or PHP proxy sites, and they don&#8217;t all pass the user agent through, or they let you configure it. Some CGI proxy sites, such as Proxify, will by default tell the world you&#8217;re using Safari on a Mac, and so on.</p>
<p>It&#8217;s probably someone just running a stealth crawler, not Google (I&#8217;m still wondering how you concluded it was Google). Armed with just the IP address you reference, ARIN.NET says it&#8217;s in a block of IPs assigned to CMP Media LLC, and it&#8217;s also a banned IP on the Twiki blacklist:<br />
<a href="http://twiki.pula.org/cgi-bin/twiki/view/TWiki/BlackListPlugin" rel="nofollow">http://twiki.pula.org/cgi-bin/twiki/view/TWiki/BlackListPlugin</a></p>
<p>Pretty sure it&#8217;s not Google <img src='http://www.mattcutts.com/blog/wp-includes/images/smilies/icon_wink.gif' alt=';)' class='wp-smiley' /> </p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Joe Bonner</title>
		<link>http://www.mattcutts.com/blog/how-to-verify-googlebot/#comment-87145</link>
		<dc:creator>Joe Bonner</dc:creator>
		<pubDate>Sat, 30 Sep 2006 15:43:15 +0000</pubDate>
		<guid isPermaLink="false">http://www.mattcutts.com/blog/how-to-verify-googlebot/#comment-87145</guid>
		<description><![CDATA[Hi Matt,

I instituted the double lookup and am now banning all bots that don’t pass the test. I run an ecommerce site with thousands of products and have been the victim of scrapers who copy and repost my products on their pages. The question I have is that within 2 days of implementing bot traps, all but a handful of our pages have gone supplemental or disappeared and I’m wondering if the bot traps may be the cause, or purely coincidence. The majority of SERPs remaining are old links to pages that 301 to new “friendly” URLs. Googlebot is still visiting each day and scans thousands of pages each visit, but recently started hitting old URLs that were 301’d months ago. Any advice would be greatly appreciated.]]></description>
		<content:encoded><![CDATA[<p>Hi Matt,</p>
<p>I instituted the double lookup and am now banning all bots that don’t pass the test. I run an ecommerce site with thousands of products and have been the victim of scrapers who copy and repost my products on their pages. The question I have is that within 2 days of implementing bot traps, all but a handful of our pages have gone supplemental or disappeared and I’m wondering if the bot traps may be the cause, or purely coincidence. The majority of SERPs remaining are old links to pages that 301 to new “friendly” URLs. Googlebot is still visiting each day and scans thousands of pages each visit, but recently started hitting old URLs that were 301’d months ago. Any advice would be greatly appreciated.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Robin</title>
		<link>http://www.mattcutts.com/blog/how-to-verify-googlebot/#comment-87093</link>
		<dc:creator>Robin</dc:creator>
		<pubDate>Fri, 29 Sep 2006 10:25:08 +0000</pubDate>
		<guid isPermaLink="false">http://www.mattcutts.com/blog/how-to-verify-googlebot/#comment-87093</guid>
		<description><![CDATA[Yesterday we spotted a second crawler from this IP range on one of our sites. I&#039;m not cloaking on my sites, so I don&#039;t think that is the bot&#039;s intention.
I&#039;ve read your posts above, but still can&#039;t see how pages appear in the Google cache with that IP address (&quot;Your IP: xxx.xxx.xxx.xxx&quot;). I did a full port scan on that host, but no proxies are running. And even if there was an open proxy running the useragent string would contain &quot;Googlebot&quot;, and the IP address would be listed in the open proxy databases. (Or else it would be a really odd proxy to replace the useragent string with IE6/WinXP.)]]></description>
		<content:encoded><![CDATA[<p>Yesterday we spotted a second crawler from this IP range on one of our sites. I&#8217;m not cloaking on my sites, so I don&#8217;t think that is the bot&#8217;s intention.<br />
I&#8217;ve read your posts above, but still can&#8217;t see how pages appear in the Google cache with that IP address (&#8220;Your IP: xxx.xxx.xxx.xxx&#8221;). I did a full port scan on that host, but no proxies are running. And even if there was an open proxy running the useragent string would contain &#8220;Googlebot&#8221;, and the IP address would be listed in the open proxy databases. (Or else it would be a really odd proxy to replace the useragent string with IE6/WinXP.)</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: IncrediBILL</title>
		<link>http://www.mattcutts.com/blog/how-to-verify-googlebot/#comment-87078</link>
		<dc:creator>IncrediBILL</dc:creator>
		<pubDate>Fri, 29 Sep 2006 01:57:25 +0000</pubDate>
		<guid isPermaLink="false">http://www.mattcutts.com/blog/how-to-verify-googlebot/#comment-87078</guid>
		<description><![CDATA[@Robin - Why would you accuse Google of being a stealth crawler @ McColo? Most people block McColo because of all the bad bot activity, and it&#039;s NOT Google. Sheesh.

Lots of ways it could happen, even a proxy server feeding Google a cloaked link to your site (read my posts above), but they aren&#039;t crawling from those IPs or we would all know about it.

Besides, if Google wanted to check whether a web site was cloaking data, they would only have to spot-check a few pages. It wouldn&#039;t be efficient to do a complete crawl to establish whether a site was cloaking.]]></description>
		<content:encoded><![CDATA[<p>@Robin &#8211; Why would you accuse Google of being a stealth crawler @ McColo? Most people block McColo because of all the bad bot activity, and it&#8217;s NOT Google. Sheesh.</p>
<p>Lots of ways it could happen, even a proxy server feeding Google a cloaked link to your site (read my posts above), but they aren&#8217;t crawling from those IPs or we would all know about it.</p>
<p>Besides, if Google wanted to check whether a web site was cloaking data, they would only have to spot-check a few pages. It wouldn&#8217;t be efficient to do a complete crawl to establish whether a site was cloaking.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: walkman</title>
		<link>http://www.mattcutts.com/blog/how-to-verify-googlebot/#comment-87071</link>
		<dc:creator>walkman</dc:creator>
		<pubDate>Thu, 28 Sep 2006 19:13:05 +0000</pubDate>
		<guid isPermaLink="false">http://www.mattcutts.com/blog/how-to-verify-googlebot/#comment-87071</guid>
		<description><![CDATA[Thanks guys. I am still not sure about iframes; I guess I&#039;ll keep waiting. My bread-and-butter site is out of G&#039;s favor, and I am afraid that constantly rotating content is the issue. For each page I had a &quot;related&quot; products section that picked X random ones from the same category, but after listening to Matt&#039;s interview, I think a page that changes each time it is loaded does not send a good signal to Google. Oh well... wait and see.]]></description>
		<content:encoded><![CDATA[<p>Thanks guys. I am still not sure about iframes; I guess I&#8217;ll keep waiting. My bread-and-butter site is out of G&#8217;s favor, and I am afraid that constantly rotating content is the issue. For each page I had a &#8220;related&#8221; products section that picked X random ones from the same category, but after listening to Matt&#8217;s interview, I think a page that changes each time it is loaded does not send a good signal to Google. Oh well&#8230; wait and see.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: J-Man</title>
		<link>http://www.mattcutts.com/blog/how-to-verify-googlebot/#comment-87048</link>
		<dc:creator>J-Man</dc:creator>
		<pubDate>Thu, 28 Sep 2006 12:15:11 +0000</pubDate>
		<guid isPermaLink="false">http://www.mattcutts.com/blog/how-to-verify-googlebot/#comment-87048</guid>
		<description><![CDATA[Walkman,
I have a site that used iframes, and no, it did not do well at all: the cache of the home page had no content in it. I have since dumped the iframes; I had no luck making them SE friendly for any of the engines.

Best Regards,
J-Man]]></description>
		<content:encoded><![CDATA[<p>Walkman,<br />
I have a site that used iframes, and no, it did not do well at all: the cache of the home page had no content in it. I have since dumped the iframes; I had no luck making them SE friendly for any of the engines.</p>
<p>Best Regards,<br />
J-Man</p>
]]></content:encoded>
	</item>
</channel>
</rss>
