Okay, go read this post on the Google webmaster blog. In fact, if you read my site, you really should add the Official Google webmaster blog feed to your list of subscriptions, because that blog is almost 100% SEO/webmaster-related posts, and it is official. Done reading? Okay, I’ll give you my personal take on why I like this idea.
I’ve done a lot of site reviews in my time. Many of them go like this:
Webmaster: Matt, can I get a site review for Example?
Me: Hey, I’ve heard of Example. I really like your red widgets.
Webmaster: Thanks! We’re rolling out a new line of blue widgets this fall. The site is example.com.
Me: Okay, let’s take a quick look.
(small chat about blue widgets until the site loads.)
Me: Huh.
Webmaster: What? What does “Huh” mean?
Me: Well, when I visit www.example.com I get a map of the world, and then at the bottom of the page there’s a dropdown to select which country version of Example to go to next.
Webmaster: Right. Example is a big business with lots of different country-level domains, so we have to ask the user where they want to go. Why, is that a problem?
Me: It sort of is. Dropdown boxes and forms are kind of like a dead end for search engine spiders. Historically we haven’t crawled through them.
Webmaster: But it’s just a dropdown box with ten countries listed. You can’t just crawl that?
Me: Not really. Think of search engine spiders much like small children. They go around the web clicking on links. Unless there’s a link to a page, it can be hard for a search engine to find out about that page.
Webmaster: But it’s just ten countries. Couldn’t the search engine just pick one of those values and keep going?
Me: In theory you could do that, but in practice the major search engines don’t usually do that.
Webmaster: That sucks. I like how clean the page looks. Is there a way around that?
Me: Sure. You could put the list of countries at the bottom of the page and make them hyperlinks so that Googlebot can crawl through to the other urls. A good rule of thumb is to take a look at your site in a text browser like Links or an ancient browser with JavaScript/CSS/Flash turned off. If you can reach all your pages just by clicking regular links, your site should be pretty crawlable.
I’ve had this conversation a lot over the years. Savvy webmasters and SEOs know how to make a site crawlable, e.g. by making sure that someone can reach every page on a site via normal HTML links. But the web is filled with sites that have a dropdown box or some other form that search engines typically haven’t known how to handle.
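If you want a rough, scriptable version of the text-browser rule of thumb, here is a minimal sketch in Python. It is not how Googlebot works; it just separates the URLs a link-following spider could pick up from plain `<a href>` links from the ones that only appear as `<option>` values inside a dropdown. The `uk.example.com` and `de.example.com` addresses, the `/go` form action, and the names `LinkAudit` and `audit` are all made up for illustration.

```python
from html.parser import HTMLParser


class LinkAudit(HTMLParser):
    """Separate URLs found in <a href> links from values hidden in <option> tags."""

    def __init__(self):
        super().__init__()
        self.anchor_urls = []    # reachable by simply following links
        self.option_values = []  # only reachable by submitting a form

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "a" and attrs.get("href"):
            self.anchor_urls.append(attrs["href"])
        elif tag == "option" and attrs.get("value"):
            self.option_values.append(attrs["value"])


def audit(html):
    """Return (anchor_urls, option_values) for a chunk of HTML."""
    parser = LinkAudit()
    parser.feed(html)
    parser.close()
    return parser.anchor_urls, parser.option_values


# Hypothetical country picker: the only way out is the dropdown.
DROPDOWN_ONLY = """
<form action="/go">
  <select name="country">
    <option value="http://uk.example.com/">United Kingdom</option>
    <option value="http://de.example.com/">Germany</option>
  </select>
</form>
"""

# Same page with the countries also listed as regular hyperlinks.
WITH_LINKS = DROPDOWN_ONLY + """
<p>
  <a href="http://uk.example.com/">United Kingdom</a>
  <a href="http://de.example.com/">Germany</a>
</p>
"""

if __name__ == "__main__":
    for name, page in (("dropdown only", DROPDOWN_ONLY), ("with links", WITH_LINKS)):
        anchors, options = audit(page)
        hidden = [url for url in options if url not in anchors]
        print(name, "- URLs with no plain link:", hidden)
```

For the dropdown-only page, both country URLs show up as having no plain link; once the hyperlinks are added, that list is empty.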
Now Google is finding ways to crawl through forms and drop-down boxes. We only do this for a small number of high-quality sites right now, and we’re very cautious and careful to do the crawling politely and abide by robots.txt. If you’d prefer that Google not crawl urls like this, you can use robots.txt to block the urls that would be discovered by crawling through a form. But I hope that the dialog above is a pretty good example of why this new discovery method can be helpful to webmasters.
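As a sketch of what that robots.txt block might look like, suppose the form submits to a hypothetical `/country-select` path (that path, and the `is_allowed` helper, are my assumptions for the example). The snippet below uses Python’s standard `urllib.robotparser` to check which URLs the rule would block. Python’s parser is only a stand-in here, and its matching may differ in details from what Googlebot actually does.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: keep Googlebot away from URLs produced by the form.
ROBOTS_TXT = """\
User-agent: Googlebot
Disallow: /country-select
"""


def is_allowed(robots_txt, user_agent, url):
    """Return True if robots_txt lets user_agent fetch url (per Python's parser)."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for url in (
        "http://www.example.com/country-select?country=uk",  # form-generated URL
        "http://www.example.com/widgets/red.html",            # ordinary page
    ):
        print(url, "->", is_allowed(ROBOTS_TXT, "Googlebot", url))
```

Under these assumptions the form-generated URL is disallowed while the ordinary page stays crawlable.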
Danny asks a good question: if Google doesn’t like search results in our search results, why would Google fill in forms like this? Again, the dialog above gives the best clue: it’s less about crawling search results and more about discovering new links. A form can provide a way to discover different parts of a site to crawl. The team that worked on this did a really good job of finding new urls while still being polite to webservers.
By the way, I wanted to send out props to a couple of people outside Google who noticed this. Michael VanDeMar emailed me a little while ago to ask about this, and Gabriel “Gab” Goldenberg recently noticed this behavior as well. I appreciate them discussing this because it encouraged Google to talk about this a little more.