Archive for July, 2006

ASP.NET 2 + url rewriting considered harmful in some cases

Sometimes people ask me “Does Google make any distinction in scoring between Apache, IIS, or other web servers?” And I’m happy to say “Nope. You can use any web server and Google will rank the pages independently of the web server platform.”

But someone in AdSense mentioned an interesting case that they’d heard of. Apparently, doing url rewrites in ASP.NET 2 can sometimes generate an HTTP status code of 302 instead of a 200. This issue isn’t specific to Googlebot (it would impact any search engine bot). The best write-up I’ve seen is at http://communityserver.org/forums/536640/ShowThread.aspx. Looks like one of the first places noticing this was here (note: that post is in French; an English translation is here).

If this issue (ASP.NET 2 + url rewriting generates a 302 instead of a 200) affects your site, it sounds as though your pages may drop out of most search engines. So how would you debug this? Fiddler is one handy tool for Windows. For Firefox, you might use the Live HTTP Headers extension to see the actual request your browser sent and the raw reply from the web server.
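If you would rather script the check than click through Fiddler, here is a minimal sketch in Python (not part of any Google or Microsoft tooling) that requests a page once and reports the raw status code without following redirects, so a rewritten URL that answers 302 instead of 200 stands out. The function name `fetch_status`, its `user_agent` default, and the example URL below are hypothetical; it assumes the page can be fetched with a plain GET.

```python
import http.client
from urllib.parse import urlsplit


def fetch_status(url, user_agent="Mozilla/5.0 (compatible; status-check)"):
    """Return (status_code, Location header or None) for one GET request.

    http.client never follows redirects, so a 302 produced by a URL
    rewrite is reported as 302 rather than silently resolved to a 200.
    The default user_agent string is a hypothetical placeholder.
    """
    parts = urlsplit(url)
    conn_cls = (http.client.HTTPSConnection if parts.scheme == "https"
                else http.client.HTTPConnection)
    # parts.port is None when the URL has no explicit port; the
    # connection class then uses the scheme's default port.
    conn = conn_cls(parts.hostname, parts.port, timeout=10)

    # Rebuild the request target: path plus any query string.
    path = parts.path or "/"
    if parts.query:
        path += "?" + parts.query

    try:
        conn.request("GET", path, headers={"User-Agent": user_agent})
        resp = conn.getresponse()
        return resp.status, resp.getheader("Location")
    finally:
        conn.close()
```

For example, `fetch_status("http://www.example.com/products/widget.aspx")` returning `(302, ...)` for a page you expect to serve content directly is the symptom described above. Using `http.client` instead of `urllib.request` is deliberate: `urllib.request.urlopen` follows redirects by default and would hide the 302.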

I would also recommend Google’s Sitemaps tool. That team recently upgraded Sitemaps to show more details on errors that Googlebot saw when we tried to fetch pages from a site. The upgraded Sitemaps console also lets you download errors as a CSV file for debugging. I found out that I had a few urls with errors:

Sitemaps errors

For example, clicking on the red oval above lets me download a file listing the problems Googlebot had crawling my site.
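Once you have the CSV, a few lines of Python can tally the errors by type, which makes a pattern such as a batch of unexpected 302s easy to spot. This is only a sketch: the column names `URL` and `Problem` are assumptions, not the documented layout of the Sitemaps export, so check the header row of your own file and pass the real names in.

```python
import csv
from collections import Counter


def summarize_crawl_errors(path, url_col="URL", problem_col="Problem"):
    """Count rows per problem type in a downloaded crawl-error CSV.

    ASSUMPTION: url_col and problem_col are hypothetical column names;
    adjust them to match the header row of the file you download.
    Returns (Counter of problem -> count, dict of problem -> first URL).
    """
    counts = Counter()
    examples = {}
    with open(path, newline="", encoding="utf-8") as f:
        for row in csv.DictReader(f):
            # Treat a missing or blank problem cell as "(unknown)".
            problem = (row.get(problem_col) or "").strip() or "(unknown)"
            counts[problem] += 1
            # Remember the first URL seen for each problem type.
            examples.setdefault(problem, row.get(url_col))
    return counts, examples
```

Calling `counts.most_common()` on the result lists the most frequent error types first, and `examples` gives one URL per type to try by hand.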

(Thanks for mentioning this, Antoine!)

Reminder: check your sites

I’m catching up on my RSS and my email. There are the typical run-of-the-mill emails, like this one sent to Googlebot:

Good day Googlebot!
We are an international escrow company.
Now we are looking for a new partners.

You can earn some money - do not lose this opportunity!

It is easy and completely free for you.

Please contact us for more details: XXXXXXXXXXX@XXX.com

I also saw a different email that a group sent to Google. The email is a little prickly. They said things like:

our-domain.com is still listed on other search engines and we find it
very difficult to understand why it is no longer listed on Google. We
implore you to reconsider your censorship of our domain name, so that the
spirit of freedom and the hopes of many are allowed to flourish.

Just as a reminder, if your website isn’t showing on Google, you should check your own domain. After reading the email from this group, I checked out their domain on Yahoo! for a couple of minutes. I tried doing a site: query on Yahoo! and saw doorway pages on the first results page. With almost no refinement, I found pages like this:

Bad Site Search

Hmm. Looks like multiple doorway pages on the site (not even in just one subdirectory). So I checked out one of these pages, and it looks like this:

Madonna Star Sex!

Hee hee. It says “Copyright 2004 madonna star sex porno Company. All Rights Reserved.” Heh. All the pages have this doorway template.

Google’s webmaster quality guidelines are pretty clear that autogenerated doorway pages are not welcome in our index. So this is a case where checking out your own domain would be just as productive as contacting Google. The best advice I could give would be to remove all the doorway pages and then submit a reinclusion request.