This is just a reminder that if you see a problem with your site, one of the first places to look is our webmaster console. In some cases, Google can alert site owners in the webmaster console when we see an issue such as hidden text. In a case I saw just yesterday, the robots.txt analysis tool in the webmaster console was a huge help in solving a problem. Here’s an example of debugging a robots.txt issue.
Someone was asking about a particular result in our search results. The result didn’t show a description, and the “Cached” link was missing too. When that happens, it’s often because the page wasn’t crawled, so the first thing I check is the robots.txt file. Loading that in the browser showed a file that was structured something like this (example.com and the second user-agent below are stand-ins):
# robots.txt for http://www.example.com

User-agent: *

User-agent: SomeOtherBot
Disallow: /
At first glance, the robots.txt file looked okay, but I did notice one strange thing. Normally robots.txt files have pairs of “User-agent:” and “Disallow:” lines, e.g. something like

User-agent: *
Disallow: /cgi-bin/

where each user-agent group is immediately followed by the rules that apply to it.
In this case, there was a “User-agent: *” by itself (which matches every search engine agent that abides by robots.txt), and the next directive was a “Disallow: /” (which blocks the entire site). I wasn’t positive how Google would treat that file, so I hopped over to the webmaster console and clicked on the “robots.txt analysis” link. I copied/pasted the robots.txt file into the text box as if I were going to use it on my own site. When I clicked “Check,” here’s what Google told me:
Sure enough, that “User-agent: *” followed by the “Disallow: /” (even with a different user-agent in between) was enough to keep Googlebot from crawling the site.
In a way, it makes sense. If you removed some whitespace from the robots.txt file, it could also look like

User-agent: *
User-agent: SomeOtherBot
Disallow: /
and it’s pretty understandable that our crawler would interpret that conservatively.
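To see how natural that reading is, here’s a quick sketch that runs the collapsed version through Python’s standard-library robots.txt parser. The second user-agent name is a placeholder, and other parsers (including Googlebot) can differ in the details:

from urllib.robotparser import RobotFileParser

# The whitespace-collapsed file: the wildcard group and the "Disallow: /"
# rule run together into a single record.
collapsed = [
    "User-agent: *",
    "User-agent: SomeOtherBot",
    "Disallow: /",
]

parser = RobotFileParser()
parser.parse(collapsed)

# With the groups merged, every user-agent falls under "Disallow: /",
# so no page on the site may be fetched.
print(parser.can_fetch("Googlebot", "http://www.example.com/"))              # False
print(parser.can_fetch("AnyOtherBot", "http://www.example.com/page.html"))   # False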
The takeaway is that if you see a page show up as url-only, with no snippet or cached-page link, check for problems with your robots.txt file first. The Google webmaster console also reports crawl errors, which is another way to self-diagnose crawl issues.
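If you’d rather do a quick check from a script than from the console, here’s a minimal sketch with the same standard-library parser pointed at a live robots.txt. It’s only a rough approximation of how Googlebot reads the file (as the example above shows, Google can be more conservative), and www.example.com stands in for your own site:

from urllib.robotparser import RobotFileParser

# Fetch the live robots.txt and ask whether a specific page may be crawled.
parser = RobotFileParser()
parser.set_url("http://www.example.com/robots.txt")
parser.read()

page = "http://www.example.com/some/page.html"
if parser.can_fetch("Googlebot", page):
    print("robots.txt allows crawling of", page)
else:
    print("robots.txt blocks crawling of", page)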
P.S. I promised Vanessa that I’d mention that the robots.txt tool doesn’t support the autodiscovery aspect of sitemaps yet, but it will soon. I’ll talk about autodiscovery and sitemaps at some point, but personally I think it’s a great development for site owners, because it makes it easier to tell many search engines about your site’s urls.
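For reference, the autodiscovery part boils down to listing your sitemap inside robots.txt itself, so any crawler that reads the file can find it. It’s a single extra line, something like this (with a placeholder location):

Sitemap: http://www.example.com/sitemap.xml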