More webmaster console goodness

It seems as if every 5-6 weeks, the Google webmaster console team rolls out more requested features into Google’s webmaster tools. You can read Vanessa’s post about the latest good stuff. Danny covers it pretty well over here, but I thought I’d show snapshots from my site.

Here’s what my site looks like:

[Image: Matt's crawl stats]

As you can see, my site is pretty small (I’ve only written a few hundred posts). On average, Google pulls down 438 of my pages each day. And my domain, while not super-speedy, normally returns pages in under 3/4ths of a second.

The other thing to notice is the crawl rate. Google limits how hard it hits web servers with something called “hostload,” which is a measure of how many bots can be simultaneously fetching pages from a web server. Notice how “Faster” is grayed out for my site? That means that hostload and crawl rate aren’t anywhere near limiting factors for my site. Heck, even a single Googlebot could fetch pages at a leisurely pace and still crawl most of my site every day. 🙂

But suppose I ran a large domain such as wordpress.com or geocities.com, or a site with thousands of pages. Then hostload could potentially be a factor. Even if N bots are allowed to fetch pages simultaneously, those bots might not be able to fully crawl a site in the window of time that we crawl before beginning indexing. If hostload is a factor for your site, the “Faster” option will be available to you. In that case, if you’re willing to let Google crawl your site harder, we should be able to fetch and index more pages from your site.
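
To put some rough numbers on that, here’s a back-of-the-envelope sketch in Python. It uses the figures from my graphs above, but it’s only illustrative arithmetic — the four-bot hostload for the big site is made up, and this isn’t how Googlebot actually schedules crawls:

```python
# Rough, illustrative arithmetic only -- not Googlebot's real scheduling logic.

def days_to_crawl(pages, seconds_per_fetch, simultaneous_bots):
    """Days needed to fetch every page once, assuming each bot fetches serially."""
    total_seconds = pages * seconds_per_fetch / simultaneous_bots
    return total_seconds / 86400  # 86400 seconds in a day

# My site: ~438 pages a day at ~0.75 s per fetch, one bot.
print(days_to_crawl(438, 0.75, 1))        # ~0.004 days, i.e. about 5.5 minutes

# A hypothetical million-page site, same fetch time, 4 simultaneous bots (made-up hostload).
print(days_to_crawl(1_000_000, 0.75, 4))  # ~2.2 days -- now hostload starts to matter
```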

Of course, you can always opt for slower crawling as well. If you’d like less load on your webserver but don’t want to block Google completely (e.g. with robots.txt), requesting a slower crawl is a good idea.
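
If you do decide to keep Googlebot out of some sections with robots.txt, it’s worth sanity-checking your rules before relying on them. Here’s a minimal sketch using Python’s standard-library robots.txt parser; the domain and path are placeholders, so swap in your own:

```python
# Check whether a given user-agent may fetch a URL under your robots.txt rules.
from urllib.robotparser import RobotFileParser

rp = RobotFileParser()
rp.set_url("http://www.example.com/robots.txt")  # placeholder domain
rp.read()

# Prints False if your rules block Googlebot from this path, True otherwise.
print(rp.can_fetch("Googlebot", "http://www.example.com/private/page.html"))
```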

Let’s see, what else is in this release? Oh, you can opt in to having your images labeled in the Google Image Labeler game. If you want people to provide free labels for your images, that’s a great reason to try out the webmaster console right there.

The final feature is just a count of the URLs that we found in a Sitemaps file. As Vanessa said in the official post:

Recently at SES San Jose, a webmaster asked me if we could show the number of URLs we find in a Sitemap. He said that he generates his Sitemaps automatically and he’d like confirmation that the number he thinks he generated is the same number we received. We thought this was a great idea.

To me, this is a perfect example of how things should work. The webmaster console folks have their own ideas about what webmasters will find useful, but talking to webmasters is the best way to hear what people really want (this particular idea came from Tim Jackson at Plumber Surplus, for example).
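
If you generate Sitemaps automatically and want to double-check the number on your end, a quick script along these lines will count the URL entries in a Sitemap file so you can compare it with what the console reports (the filename is just an example):

```python
# Count the <loc> entries in a Sitemap file to compare with the console's number.
import xml.etree.ElementTree as ET

SITEMAP_NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"  # standard Sitemap namespace

tree = ET.parse("sitemap.xml")  # example filename
locs = tree.getroot().findall(SITEMAP_NS + "url/" + SITEMAP_NS + "loc")
print(len(locs), "URLs found in this Sitemap")
```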

So what do you want to see from future versions of the webmaster console? Philipp put out a call a month ago and got over 60 comments. I know the webmaster console team will read the feedback here as well. So what should we do next? And if you haven’t tried the Google webmaster console yet, please give it a test drive. It may help you find problems with your site, and it will offer more and more information over time.

61 Responses to More webmaster console goodness

  1. Matt,

    Great post, we here at the Surplus are excited about the new features, especially since one of our own was able to have one of his ideas added and help with feedback.

    Thanks!
    -Zac

  2. Yeah, I just logged in today and tried the faster option. Let’s see what results I get and how much load it puts on my server.

    Akash

  3. I don’t understand why webmasters must use the Google sitemap utility, or why Google doesn’t understand robots.txt delay commands 🙁

  4. When looking at the crawl errors, it would be nice to know the source of the bad link. Did it come from the sitemap file, a crawl of our site, or an external site? If the external site could be named, even better, as we could get it changed.

  5. Thanks for this great Googlebot stats page from Google. By the way, it is not really a Googlebot stats page as it also includes hits from Googlebot-Image and from Mediapartners.

    Three more notes:
    – as far as I can see, the “Number of pages crawled per day” includes pages, images and all other files read by Google bots;
    – the minimum column says 1 even when it should be 0 … 😉 ;
    – as the “faster” option is potentially risky, it is a kind of dangerous game to activate it for 90 days. Shouldn’t we have a “return to normal” button in case our server cannot stand the faster crawls?

    Jean-Luc

  6. This is more than just some neat graphics and numbers — it’s the only real two-way communications channel between a search engine and webmasters. It’s more than neat – it’s amazing. Just a few years ago Google and the other engines were faceless, secret black boxes. Now the employees have public blogs, listen to Joe Webmaster’s problems and heck – even let them interact with Google’s data. This opens up totally new possibilities.

    The way to index the web is not to search for all of the data yourself – a good part of the work is done if you can educate the public by giving people a way to view their sites through Google’s eyes. If uncrawlable sites are made crawlable, there’s no need to crawl the uncrawlable. 🙂

  7. Matt, as long as you are on the subject of Webmaster Tools: is there anything we can do to actually locate URLs that Googlebot had trouble crawling? They do not appear to be in our site, and they do not come up in Google’s index, so where do they reside?

    Is this a new spin on whack a mole?

  8. Jean-Luc, good points. I was under the impression that you could go back to the console and change the setting, but I’ll ask.

    Charles, I know that’s something that people have been asking for; I’ll pass on the request.

  9. Hi Jean-Luc,

    Once you change the setting (to faster or slower), you can go back at any time and choose the normal setting. This resets the crawl rate.

  10. I just think nearly every webmaster is going to opt for the faster crawl rate. You say Google will index more pages from your site with this option turned on. Although it probably doesn’t affect search results whatsoever, I can’t imagine any webmaster refusing this option!

    Otherwise, the graph representation is nice and a great idea. It makes sense for Google to help webmasters make “better” websites, so hats off to them for encouraging it with the webmaster console.

  11. Is there any reason why the referrer could not be published next to 404 errors?

    It seems a bit daft that you tell us we have a broken link but don’t tell us where to find it.

    And the larger your site the dafter it gets.

    Other than that extremely happy 🙂

  12. I don’t even have the Faster crawl option available (it is grayed out).

    So the bot covers only x number of pages every day? What I would really like to see is the Sitemap matching mentioned in the comment by Matt. Good tool, however! As everything besides AdSense and AdWords under Google’s arm is beta, I am sure this sitemap tool can be further enhanced.

  13. When sitemaps first came out, I was overall pretty disappointed – but the features that are on there now have been great.

    With the latest site I’ve been working on, I’ve used the console almost non-stop with some really good results. I really like this crawl rate feature – it’s a good indication of what Google is thinking about my site.

  14. Terrible feature, caused me to check what the Google bots were doing to one of my sites.

    Now I need to go reread the robots.txt documentation, to stop the critters working their way through the PHP calendar stuff in the Wiki.

    7000 pages a day for a couple of months — then again maybe I can sell Shoemoney a text link in the footers 😉

  15. First off, a shout-out has to go to Vanessa, Adam and the others who are excellent contributors on http://groups.google.com/group/Google_Webmaster_Help

    As far as future enhancements go:

    1) Report where 404s came from: external, internal, an old page in the index, the supplemental crawler, etc.

    2) Move the site: and link: commands inside the verified owner console. There is too much information there for competitive sites to take advantage of. Perhaps still have site: on the outside, except not have it work for blank queries that return all pages. I understand some people may search for site:mattcutts.com SEO and that’s a helpful tool.

    3) Expand the last crawl date to other pages, perhaps integrated with the site command, or indexed URLs statistics.

    4) A negative sitemap for excluding URLs. It’s a real pain to constantly get questions from visitors wanting a discontinued product that is clearly not on the site anymore, only to have them say, “but I found it on Google, so it’s got to be there”, which they trust more than the site itself. They don’t understand that a supplemental page may be 6 months old and have information on it that is no longer germane.

    5) Along with the preferred domain, I’d like a preferred root. For sites on IIS servers it’s darn near impossible to forward “default.asp” to “/”, so I’ve got sites that have “/”, “/default.asp”, and “/Default.asp” all indexed though they are the same page.

    Thanks for this opportunity to be heard.

    JLH

  16. Got to agree with JLH — just today I had someone contact me with over 30’000 pages indexed, of which over 10’000 are out of date (products no longer sold, etc), have been returning 404 for many months and where new pages are slow to get indexed. It would be great to be able to tell Google that 404’s are 404’s and “throw them out asap” instead of letting Google believe that they just went on a long vacation and could be returning any month now….

    I know, I know … IIS is not really standards-conformant (ok, not just “not really”, just “not”), but would it also be possible to add a case-sensitive yes/no switch to the crawler / indexer? I bet URL-rewritten sites also sometimes run into that problem. It could be resolved by letting the webmaster specify a relative strength factor for the whole sitemap or just for individual URLs: “I am absolutely certain that I want the contents of this page to be indexed with this specific URL.” That would also help solve the more general duplicate content (or multiple URLs / one content) problems.

  17. Matt, I love this! Send my regards to the Webmaster Tools team!

  18. I’d love to see some way to help *sites* that fall dramatically in ranks but stay indexed figure out specifically wazzup, especially with the non-penalty filters.
    e.g. “These URLs have duplicate content within your site. Clean these up!”

  19. The most annoying thing I find right now is that the 404s don’t give me the referring site where the link came from. Many of my 404s are from internal links that I could easily correct if you gave me that info.

  20. We’ve had trouble finding a good sitemap building application that doesn’t kill your server to create the sitemap (we have tens of thousands of pages).

    Why not have Google generate a sitemap of how they see our site and all the pages they have indexed? That way we could double-check it against our own sitemap (if we have one) or simply alter Google’s site map and resubmit it.

    You have the data after all so why not simply generate the file as a report. I’m sure with a little imagination you could take this a step further and provide other reports such as which files are in the regular index and which are supplemental, pagerank, last crawl date, etc. Again, you already have the data so why not provide the reports?

    KJ

  21. Hi Matt,

    I always get excited when Webmaster Tools comes out with more stuff, so I started going through my profiles to see what it had to tell me.

    I have a tiny little Blogger blog which I do not advertise. It has maybe one or two inbound links out there in the wild. Webmaster Tools tells me that Googlebot last visited in May of 2006, and “No pages from your site are included in the Google index.”

    I set it up for some personal note-taking and experimenting with stuff, so I don’t mind that. But wait, when I do a site:myblog.blogspot.com, I get back a handful of Supplementals, including things I posted long after May. So I always assumed that the supplemental crawler was separate from Webmaster Tools, which just told me about the main crawler.

    But wait, with this new update I’m being shown lots of crawling! I get those pretty graphs telling me I was accessed dozens of times in September.

    So is this new update rolling up supplemental crawler results with the main crawler? Should I not believe the “Last crawled on…” date anymore?

    Oh, and could you do me a solid and move those pages from supplemental over to the main index? Thanks.

    😉

    Darryl.

  22. Wow, it will be interesting to compare these stats to my own in-house stats. I get the feeling mine will show a much larger GBot presence, mainly from rogue bots imitating Google.

  23. Matt,

    I have been doing a lot of reading up on sitemaps and whether or not it is a good thing to implement on a site that is doing fairly well without sitemaps installed. It seems opinion is split about 50/50.

    I have many sites and have never installed sitemaps on any of them due to the scary factor of possibly having the site de-indexed.

    Your thoughts on this would be appreciated. Unless you can point me to your prior comments on this.

  24. Being from Australia, the Top search query clicks and Top search queries all link back to google.com, even though I know I am not position one in google.com but I am in google.com.au. It would be great if there were a way to see which version of Google the position is ranked in. I am sure people from other countries would like this as well.

  25. KJ – let me know if you need help with the sitemap file. I’ve generated my share since they started and have made a few generators. In general, if your server has trouble with the sitemap generators, that often means that your server has problems with crawlers in general — including Google’s. Making a sitemap file is almost a crawlability-test :-).

  26. After setting our crawl speed to faster, GoogleBot went from 500-1,000 hits per day to 10,000 hits per day and counting. Perfect timing on this release as our site was only launched 2 weeks ago.

  27. 1) Along the lines of JLH’s request for a negative sitemap… how about GoogleBot simply differentiating between headers 410 and 404 instead of just reporting 410 as a crawl error? Then, maybe ask: “Is this page really gone? Are you really… really sure?” 🙂

    2) To help resolve canonical problems and duplicate content… please provide a way to set a preference between https and http pages for the site root or folders, or to remove all secure pages, period. Since Google recognizes that people do not write perfect code, why expect them to know how to configure servers and write 301 redirects?

    Many thanks to the Webmaster Tools team for the new tools! Great job.

  28. Emil Stenström – use something like Xenu Link Sleuth to find your internal linking issues…

    I agree that it may be nice to know where the referring offending sites come from… but from a webmaster’s position you really just need to decide if you want to 301 them to a real URI or ignore them. I find that a majority of them are spam sites…

  29. Matt,

    What are the qualifications for getting the faster crawl option? For our site we only have the options for normal and slower. We add pages to the site on a daily basis and also just did a new site design. Some of the internal pages still have the old site design cached, even though it was changed months ago. We also added our own MLS technology to show local real estate listings about a month ago. None of these pages have been crawled yet.

    Thanks for your time and info.

    Jared

  30. Ditto what others said about including the referring URL for 404 pages – a great opportunity for Google’s spider to do some link checking for the webmaster – and since you have this information already, it should be super-duper easy to provide.

    Similar request for URLs that 301/302 redirect, because then we can fix those – I sometimes forget trailing /’s.

    This is all especially helpful for dinosaurs like me who have hand-coded their sites (vi is my HTML editor!) and fat-fingered stuff.

  31. Matt, I tried the Image Labeller Game, but found that around 3/4 of all images were too small to see clearly. Perhaps we could be of more help if we could set a preference to NOT SEE any pix smaller than X x Y size?

  32. Also, you probably want to have a direct feedback link from the Image Labeller to your development staff….

  33. Thanks for the info great post Matt.

    More fabby webmaster tools from Google! We are still tweaking our sitemap and RSS feeds to aid deeper site crawling, and this will help us locate the problem areas. Ours is a dynamic site and we do struggle to get more pages in the index. Thanks to all involved – this will provide valuable insight.

    M

  34. Congrats to the team on putting in a great feature. Here’s one happy siteowner – our large site was taking ages to index until this. Now we appear to be sailing.

    For the future, my wishlist would be that feedback relating to major filtering elements applied to a site would be shown to assist webmasters with better quality control.

  35. Thanks Matt (and to the google webmasters team). This is a great tool.

    Andrew

  36. I agree with Kelly’s quote below:

    “Why not have Google generate a sitemap of how they see our site and all the pages they have indexed? That way we could double-check it against our own sitemap (if we have one) or simply alter Google’s site map and resubmit it.”

    This would be so much easier as I have been struggling to come up with an xml sitemap.

    PS – Loved your google videos too!

  37. Ah, why not have it tell you what you need to do to rank #1 for your terms? 😉

  38. Jared, if you don’t get the “Faster” option because it’s grayed out, that means that crawl rate isn’t anywhere near a factor for your site. So you don’t need to do anything at all. 🙂

    Darryl, we sometimes crawl extra pages so that we can select the best pages for the index. So I wouldn’t worry about that.

  39. Noticed this a few days back. Super stuff 🙂 I suppose you could use the time-spent-downloading graph as a rough server load log too. Another one of my favorites is Page analysis, but I’m still not getting anything in “external links to your site” for my sites.

  40. Hi Matt,

    Thank you and the Google webmasters team, this tool is absolutely fantastic!

    I would love to have an export (CSV) of all the pages indexed on my sites… perhaps with a column for the number of inbound/outbound links, the page title and anything else you think is relevant (maybe the number of words found on a page, as a page with 0-10 words might be having problems).

    Would also like to see the referrer field on 404’s.

    And possibly the ability to submit URLs that no longer exist on the website… although I now do 301 redirects, it would be nice to be able to say that a file no longer exists.

    Although, JLH… I don’t think there needs to be an option to include “default.asp” as the same file as “/”… mainly because you might have two separate files with different content… perhaps you should not link to the “asp” file? – this gives you the ability to change the server side language without changing the links or using ASP files for PHP code.

    Anyway, for now I’m very happy with the service 😀

  41. This is a genuine step forward for the webmaster console – Google providing information on a specific issue regarding its interaction with a site.

    As I’ve said previously here, I’d like to see the console provide information on factors that influence a site’s ranking. One useful example would be a score on the quality of neighbourhood that the site is hosted on.

  42. Hi Matt,

    My suggestion would be to have a graph view of Google Sitemaps crawled pages, so that we could fully understand which pages were crawled.

    Also, on the Query Stats page it would be interesting to have the date last updated (perhaps this could apply to other stats pages too), so we know when this data was refreshed.

    Regards

    Manuel

  43. Hey Matt,

    I want to have a field within Google Sitemaps where I can teach Google the URL structure of a website, i.e. for session IDs.

    Example:
    Alert! Google has found session IDs within your website. Please remove them, otherwise it is possible that Google won’t crawl your whole site.
    If you want to help Google index your website, you can give us additional information about your URL structure:
    ____________________________________________________________________________________________
    http://www.domain.com/cgi-bin/store/***SessionID-Wildcard-Here***/folderX/subfolderY
    ____________________________________________________________________________________________

    Wouldn’t that help the bots to crawl and index a website in a better way?

    🙂

    jan

  44. Hello Matt,

    My suggestion would end up being more useful to the website visitor than webmaster. It would be sweet to have a feature allowing you to select the top 5 internal pages of “high importance”. Some may use pages such as about us, store locations, contact us, etc. Other webmasters would be able to select top 5 categories that they would like to highlight in your search result. The “office max” search result below illustrates Furniture, Technology, etc as important urls. Also, this option should only be accessible to one keyword and that keyword being the company name.

    BTW, I do not have the option to select the “Faster” crawl either. 😉

    Damir

  45. Hello Matt,

    Didn’t know that I can’t post images within the blog, here you go again.

    http://img281.imageshack.us/img281/5470/untitled1vp7.jpg

    OR, the actual search result….

    http://www.google.com/search?sourceid=navclient-ff&ie=UTF-8&rls=GGGL,GGGL:2006-22,GGGL:en&q=office+max

    Thanx,

    Damir

  46. What would be really great, particularly for sites like mine that don’t rank well, is to have Google tell me what it actually doesn’t like about my site.

    Yahoo, MSN, and even Google Images love me. As it stands, everything in Sitemaps looks great with no problems, but I don’t rank for anything anywhere (other than my company name) in the main search on Google, so there’s obviously something you’re not liking.

    It’s been this way for 2 years, so it would be nice to know why.

    Overall though, it’s fantastic to see a company such as Google taking the responsibility it has over the impact of its search engine on people’s sites (and livelihoods) seriously. Keep it up.

  47. I would like to see a tool explaining why a site has been penalized so that it won’t rank for its main keywords, when several SEO folks have checked the site and can’t find anything wrong.

    for instance
    simply-cedar.com

  48. Hi Matt

    Posted on an old page a week ago – I’m quite new to this blog stuff, so you probably never saw the comment!

    Love the webmaster tools – they have been a great help. What we really need is feedback on what we do wrong! We have had our site for 4 years, and have built it up to support our agents throughout the UK. We get great results on our brochure printing searches, but almost all searches done for web design (and variations) come in at #31! I’ve read lots about a -30 penalty but don’t know what we’ve done wrong or how to put it right. We can support our agents with enquiries for brochures but not for web design. We have built our pages for brochure printing and web design in the same vein, so we are at a loss to know what to do. The general consensus on the web seems to be if you’ve got a -30 penalty dump the site – you’ll never recover! Surely it would be better for Google to give feedback so we can put things right, rather than dump a four year old site used to keep a lot of people in work!

    Any help or comments would be greatly appreciated.

    Regards

    John

  49. I’ve noticed a relationship between the time spent downloading a page and the number of pages Google indexed per day. On my chart, on the days when the site performed faster and Google spent less time downloading each page, Google indexed 4 times as many pages as it did on days when the site performed slower.

  50. When I look in the tools for my http://www.domain.com domain, I see that a page on forum.domain.com is listed as having the highest PR. It should stick to my www domain and not include any subdomains.

    Similarly, on Analytics, I have AdWords analysis mentioned on my forum account when I only have my main domain being promoted by AdWords – the AdWords analysis and all CPC analysis both mention AdWords.

    Domains and subdomains need to be kept separate, especially when there are separate webmaster tool and Analytics accounts for them.

  51. A little late for a post but something hit me the other day that would be a great addition to the tools:

    Why not allow us to enter x number of keywords and our IP (so that we get the same datacenter as we normally do) and have it tell us our positions and track them over time? Yes, there’s an API for this, but why not build it into the Webmaster Tools? Besides, the API results aren’t always as accurate as the real thing, so that needs to be cleaned up a bit.

    Throw in the current PageRank, backward links and whatever and you’ve got a VERY cool tool. Heck, you might even cut down on a little bot traffic 😛

    KJ

  52. Matt,

    This is very helpful. It has already helped me make the decision to upgrade my server. I had my suspicions that my server connection was taking a beating, but this has now proved it to me. The only drawback now is cost, so I’m gonna need to do some pretty pinpoint cost analysis to make the move.

    But, without the info given to me from Google Webmaster Tools I wouldn’t have been certain that the move was required.

    Thanks again.
    Christian

  53. I like the new webmaster tools, and one of the sites I run is for stock photography, so it will be interesting to see how the Google Image game will influence traffic. Right now we get half our traffic (but no revenue) from Google Images. I do have one complaint… when playing, there are a lot of lazy people who can’t describe a sunset: there is a sun setting over the horizon… type “sunset”; why hit pass? I think the pass button should have a 3-hit max per game: once if the image is broken, once if it can’t be described, and once if the two players can’t find the same words.

  54. This is really neat. I like these webmaster tools (I’m still used to calling it Google Sitemaps though). One major thing I noticed is the keyword updates. I see keywords from a couple of weeks ago; it is not reflecting the real keywords that people are clicking on this week. I’m not sure if they also made changes on that one, but I can’t rely on ‘webmaster tools’ to follow keywords right now.

  55. The feature I want is in the “Not found” list. What I want to know is WHERE the Bot was when it found the link it couldn’t follow.

    The problem is that I have a very large site and almost all of my pages are at least partially generated by code. So when I see something in your “Not found” list, I have no instant or simple way to figure out how you got that bad link – and I really would like to fix it! Right now I have 15 “Not found” links showing – none of those are “In Sitemap”, so they must be in one of the 13,264 pages that ARE in the sitemap… but I can’t find ’em.

  56. I have seen the features and parameters of the Crawl rate link. It’s really great. The number of pages crawled per day, the number of kilobytes downloaded per day and the time spent downloading a page are all really useful. It is really helpful to see how much time Googlebot spends on my website.

  57. Hi All,

    I received a phone call from someone in India. The gentleman was very interested to know the daily hits of the Google, Yahoo and MSN search engines. I still don’t have any idea how to check the per-day hits of the top 3 search engines, so I thought that I would ask this question to the SEO gurus. After hearing this question I am really very interested to know the answer myself, because next time if someone asks me the same question I will be in a position to answer it.

    Thanks in advance.

  58. Hello,

    We never had a problem until recently with typing in our company name. A couple of months ago, we discovered that when we type in our company name, we don’t find the www URL of our domain until the 4th page. On the first three pages, there are links to our company’s site from other pages… but not our own URL with title and description.

    What would cause this problem?

    Thanks!

  59. This is indeed a nice feature, but the usefulness of the Googlebot activity tool is still quite questionable in my opinion. How will an occasional webmaster like me use this tool? I will probably keep the Googlebot speed at “Normal”, since I don’t want to be crawled less, nor do I have the resources to go faster.

    I think the Image Labeler game, on the other hand, is incredibly creative and fun. However, with the number of websites online, the share of images it can cover is very small. I am expecting an even better solution to this dilemma!

  60. Thank you for updating the Webmaster tools. I have been using them for a long time and the evolution of functionality is always appreciated.

  61. Hey, here’s a bit of a problem I think I’ve noted with the Google crawler.

    Google was crawling and downloading around 250 MB per day.
    One fine day, I decided to increase the crawl rate to let more pages get indexed.
    I changed the crawl rate from “Recommended” to a custom crawl rate. I left it like that for around 2-3 minutes, and then I spotted the word “recommended” and set it back to normal.

    It seems my meddling around with this stuff has driven Googlebot away from my site, and now it hardly crawls my site at all. The Crawl Stats show the graph going steeply down!!!

    Why is that happening when I’ve set things back to normal again??
