A couple of months ago Google described a way to verify Googlebot. Basically, it involves two steps: a reverse DNS lookup followed by a forward DNS lookup. Now Microsoft has implemented the same DNS policies, so you can use the same method to verify MSNBot.
Good for them. Who will be next to implement this open method to authenticate search engine bots: Yahoo!, Ask, or Exalead? Place your bets. :)
Update: w00t! I’m digging out from comments and just saw this comment by Peter Linsley of Ask: “The issue of whether Ask could support bot authentication was raised in Las Vegas at PubCon and we’ve updated the webmaster FAQ. No surprises here, it’s the old DNS roundtrip lookup trick; all our crawlers are under the ask.com domain:
http://about.ask.com/en/docs/about/webmasters.shtml”
So it looks like Ask wins the prize! Here’s the specific entry in Ask’s documentation:
Q. How do I authenticate the Ask Crawler?
A: A User-Agent is no guarantee of authenticity as it is trivial for a malicious user to mimic the properties of the Ask Crawler. In order to properly authenticate the Ask Crawler, a round trip DNS lookup is required. This involves first taking the IP address of the Ask Crawler and performing a reverse DNS lookup ensuring that the IP address belongs to the ask.com domain. Then perform a forward DNS lookup with the host name ensuring that the resulting IP address matches the original.
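For anyone who wants to try the round trip from code, here is a minimal Python sketch of the steps in that FAQ entry. The function name `verify_crawler` and its parameters are my own for illustration, not anything the search engines publish. It assumes IPv4 addresses and the standard library's blocking resolver calls. The two optional resolver arguments exist only so the logic can be exercised without touching the network.

```python
import socket


def verify_crawler(ip, allowed_domains, reverse=None, forward=None):
    """Round-trip DNS check for a visitor that claims to be a crawler.

    ip              -- the visitor's IPv4 address, as a string
    allowed_domains -- domains the crawler's host name may belong to,
                       e.g. ["ask.com"] (taken from the engine's own docs)
    reverse/forward -- hypothetical hooks for swapping in other resolvers;
                       by default the system resolver is used
    """
    if reverse is None:
        # gethostbyaddr returns (hostname, aliaslist, ipaddrlist)
        reverse = lambda addr: socket.gethostbyaddr(addr)[0]
    if forward is None:
        # gethostbyname_ex returns (hostname, aliaslist, ipaddrlist), IPv4 only
        forward = lambda host: socket.gethostbyname_ex(host)[2]

    # Step 1: reverse lookup, IP address -> host name.
    try:
        host = reverse(ip).rstrip(".").lower()
    except OSError:  # socket.herror and socket.gaierror are OSError subclasses
        return False

    # Step 2: the host name must be the allowed domain or a subdomain of it.
    # Matching on "." + domain keeps "evilask.com" from passing as "ask.com".
    if not any(host == d or host.endswith("." + d) for d in allowed_domains):
        return False

    # Step 3: forward lookup, host name -> IP addresses; must include the original.
    try:
        addresses = forward(host)
    except OSError:
        return False
    return ip in addresses
```

Calling `verify_crawler(remote_ip, ["ask.com"])` returns `True` only when both lookups agree. DNS lookups are slow, so in practice you would probably cache the result per IP address rather than run the check on every request.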
That’s great news, but doesn’t that also make it a lot easier to do IP cloaking?
BTW – My bet is on Yahoo!
It would only make it easier to do cloaking if they didn’t have “other” methods of detecting that, like through datacenters that don’t send anyone named “Googlebot”.
As for bets… I’d also agree that Yahoo could do this pretty easily. They seem to have a few employees, and someone there has got to be able to understand this concept. :)
10 to 1 on Yahoo, I’m surprised that MSN beat ‘Y’ to it…
I am sure Exalead will not help you to create the Google monster. But maybe Yahoo?
if it’s Yahoo! , Francois Bourdoncle will rant again, that “the initiative aims to close the door to new entrants to the market place.”
:)
Stupid question: but why wasn’t this done from the start?
Brians, Cloaking is fine, as long as you have their blessing. For example WMW or NyTimes.
Lots of webmasters seem to be getting all vexed and upset about this minus 31 penalty Matt. Would you identify with their frustrations? Would you have anything to add to the limited responses from Adam Lasnik et al? Maybe confirm or deny that it’s an eval team thing even? Would it be unreasonable to request a post outside of the standard, ‘sorry it’s an algo thing can’t comment’ type response?
Here’s hoping for a non deafening silence :)
Dude did you change something with your feed? There’s no text showing up in my browser
http://static.flickr.com/116/309856197_ab509d7836.jpg
Yahoo has such a mess they should just hire a maid to mop it up.
Keep the pressure up Matt, as Yahoo and ASK need REAL search engines to show them how it’s done :)
BTW, IP cloaking was trivial before as everyone had lists of where the SE’s crawled from so I really don’t see how this change makes it any easier.
I’m gonna do what I usually do in a situation like this and bet the longshot:
Gigablast. They’ll have to do something sooner or later since most people can’t identify the bot correctly when it makes its mystical journey through.
Well they do have to increase the bot protection, but those bots aren’t very good, most of the software right now can bypass them easily, but it takes a brain to put that software together, which is why it’s not so cheap.
Jojo, points for the first post I saw to get it, with joergvader coming up in second place.
M.W.A., if Gigablast does it next, I will do a post just to commemorate that. :)
That’s funny because actually, Exalead has got this reverse DNS stuff from the beginning of its existence (that is to say 2000)
Ah we should have patented it and ask Google and MSN for royalties now :)
In fact we did it because we were so sure that the other bots already did it for a long time.
If it is that fashion, we gonna add it to our FAQ too :)
> host 193.47.80.51
51.80.47.193.in-addr.arpa domain name pointer crawl15.exabot.com.
> host crawl15.exabot.com.
crawl15.exabot.com has address 193.47.80.51
Your timing with this post is uncanny. This morning I checked my stats with recent visitors and noticed two totally different IP addresses that were identified as Googlebot. With a little research I found that IP address 207.176.38.75, one of the two identified as Googlebot, is actually owned by Beyond The Network America, Inc. in Herndon, VA. My question would be, is this something for me to be concerned with?
Matt, I did a little checking and it looks like the ExaleadGuy might’ve beaten you to the punch on this one.
Now if they could just make their user agent string a little more useful or post a page with information about their crawler I can find…
So what you’re saying is that my Gigablast prediction is an off-the-board pick, correct? :)
Hi Matt,
The issue of whether Ask could support bot authentication was raised in Las Vegas at PubCon and we’ve updated the webmaster FAQ. No surprises here, it’s the old DNS roundtrip lookup trick; all our crawlers are under the ask.com domain:
http://about.ask.com/en/docs/about/webmasters.shtml
Cheers.
Peter
The solution discussed in this post is more complicated than necessary. I’ve explained a better solution here:
http://botsosphere.blogspot.com/2007/05/automatic-verification-of-machine.html
Hi Matt
I’ve got a question. Both visitsnw.com.au and visitnsw.com got a Pagerank of 6 and same equal amount of links. Why is that so?
All of Tourism NSW’s ranking is on their .com.au domain. Can you please explain? How does Google know that it is the .com.au that it has to give rankings and not the .com domain even though it has the same amount of links and pagerank?
Thanks
I wrote a script (maybe you know about it ?) to ban bad bots.
There are often questions about how Googlebot handles the robots.txt.
So I also implemented the RDNS check for Google.