MEGO

(quick post. I’m doing the write fast thing to get me back in the blogging habit.)

I had some wonderful teachers in high school. One of them, my English teacher, used the acronym MEGO to stand for My Eyes Glaze Over. MEGO applies to something so technical that most people don’t care. But it’s my blog, so if you don’t want some MEGO, go elsewhere. :)

The hardest part of getting technical feedback from folks outside Google is deciding which stuff to dig into; there are only 24 hours in a day. Plus you also don’t want to burn cred by bugging someone only to find out that it’s a non-issue. An SEO that I would trust (well, not trust exactly. :) listen to, maybe :) ) complained about something, so I took it to the crawl team, and they dug into it enough to produce the raw docs as they were fetched by Googlebot, and it looked like Google was doing the right thing. I mentioned that to the SEO, who dug into it more on their side. It turns out that the SEO was looking for Googlebot’s old user-agent string instead of the newer user-agent string, so the problem was on the SEO’s side. Not an issue on Google’s side, and the SEO knows who they are. ;)
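As a rough illustration of how that kind of mix-up happens, here is a minimal sketch of a log check. The two user-agent strings are assumed examples of an older and a newer Googlebot format (the post does not quote either one), and `is_googlebot` and `brittle_match` are hypothetical helper names.

```python
# Assumed example strings; the post itself does not quote either one.
OLD_STYLE_UA = "Googlebot/2.1 (+http://www.google.com/bot.html)"
NEW_STYLE_UA = (
    "Mozilla/5.0 (compatible; Googlebot/2.1; "
    "+http://www.google.com/bot.html)"
)


def is_googlebot(user_agent: str) -> bool:
    """Return True if the user-agent mentions Googlebot anywhere.

    Matching on the token instead of the full string keeps the check
    working when the surrounding format changes.
    """
    return "googlebot" in user_agent.lower()


def brittle_match(user_agent: str) -> bool:
    """The mistake from the story: comparing against one exact string."""
    return user_agent == OLD_STYLE_UA
```

With these definitions, `brittle_match` misses the newer string while `is_googlebot` catches both. A user-agent can be spoofed, so a check like this only tells you what the client claims to be.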

Another example that happened this week was when I read Graywolf’s post about Google showing content from domain A on domain B. Graywolf gave three examples of sites getting confounded, with screenshots (always a helpful idea), but every example was from the same IP address. Now if you’re an experienced search person, two sites on the same IP getting confounded makes you think of one explanation: the webhost configured virtual hosting wrong. Right? Can I get a “w00t” from the back there? Cool, thanks.

So I was all set to dismiss this. But then I noticed that the confounding happened over a long period of time (usually virtual hosting errors don’t last long, because people notice and complain to the webhost), and none of Yahoo!/MSN/Ask were showing confounded domains for the examples Graywolf gave. That wasn’t good, so I reported it to the crawl/index team.

Have I mentioned how much I respect our crawl/index team? They do a lot of heavy lifting every day and they do it so well that few people notice how much is getting done. So that team checked it out and they were able to reproduce a bug on the webserver using only telnet. That means it wasn’t Google’s fault, but last I saw in the discussion, people were talking about ways to make Googlebot even smarter to work around that bug when they can tell the webserver might be affected.

Ah, heavy MEGO. Why didn’t Y/M/A see confounded domains? Well, Googlebot is pretty smart. It can use something called persistent connections to a webserver via a Keep-Alive header. If mattcutts.com and shadyseo.com are both on the same IP address, Googlebot can open up a connection and request a page from mattcutts.com, then on the same connection ask for a page from shadyseo.com. That’s more polite to the webserver because you don’t have to tear down and set up a whole new connection for every page. As the Apache docs mention, “These long-lived HTTP sessions allow multiple requests to be sent over the same TCP connection, and in some cases have been shown to result in an almost 50% speedup in latency times … .” As far as I can tell, bots from other engines probably open and close a connection for every page; that’s why they didn’t see this particular behavior in the webserver.
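To make the persistent-connection idea concrete, here is a minimal sketch, not Googlebot’s actual code. It starts a toy local server that echoes back the Host header, then asks for two different hosts over a single TCP connection. The handler, the function names and the `X-Client-Port` header are all made up for the illustration.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer


class EchoHostHandler(BaseHTTPRequestHandler):
    """Toy server: replies with whatever Host header the client sent."""

    protocol_version = "HTTP/1.1"  # lets http.server keep connections open

    def do_GET(self):
        body = self.headers.get("Host", "").encode()
        self.send_response(200)
        self.send_header("Content-Length", str(len(body)))
        # Hypothetical header: exposes the client's source port so the
        # caller can confirm both requests shared one TCP connection.
        self.send_header("X-Client-Port", str(self.client_address[1]))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # keep the demo quiet


def fetch_hosts_on_one_connection(ip, port, hosts, path="/"):
    """Request `path` for each host name over a single kept-alive connection."""
    conn = http.client.HTTPConnection(ip, port, timeout=5)
    results = []
    try:
        for host in hosts:
            # skip_host=True lets us set the Host header ourselves.
            conn.putrequest("GET", path, skip_host=True)
            conn.putheader("Host", host)
            conn.putheader("Connection", "keep-alive")
            conn.endheaders()
            resp = conn.getresponse()
            body = resp.read().decode()  # drain before reusing the socket
            results.append((host, resp.status, body,
                            resp.getheader("X-Client-Port")))
    finally:
        conn.close()
    return results


def demo():
    server = ThreadingHTTPServer(("127.0.0.1", 0), EchoHostHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    try:
        return fetch_hosts_on_one_connection(
            "127.0.0.1", server.server_address[1],
            ["mattcutts.com", "shadyseo.com"])
    finally:
        server.shutdown()
        server.server_close()
```

Calling `demo()` returns one tuple per host. The echoed Host differs between the two requests while the client port stays the same, which is the sign that both pages came over one connection.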

Interestingly enough, this bug was first mentioned in 1997: “When using keepalives and name-based virtual hosts, requests on a keptalive connection can get a response from the wrong virtual host.” I guess that there are some pretty old webservers out there; like I said, the crawl team was last seen talking about workarounds for ancient servers that have this bug.
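The post doesn’t record the exact telnet commands the crawl team used, so the following is only a sketch of the general idea: fetch each host on a fresh connection, fetch the same hosts again over one kept-alive connection, and flag any host whose content changes. The toy server below simulates a wrong-virtual-host bug by sticking with the first Host it sees on a connection; that behavior, the `SITES` content and all function names are assumptions for illustration, not a description of the real Apache bug.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer

# Hypothetical content for two name-based virtual hosts on one IP.
SITES = {
    "mattcutts.com": b"content of mattcutts.com",
    "shadyseo.com": b"content of shadyseo.com",
}


class StickyVhostHandler(BaseHTTPRequestHandler):
    """Simulated bug: the vhost picked for the first request on a
    connection is reused for every later request on that connection."""

    protocol_version = "HTTP/1.1"

    def do_GET(self):
        # One handler instance serves a whole connection, so this
        # attribute survives across kept-alive requests.
        if not hasattr(self, "sticky_host"):
            self.sticky_host = self.headers.get("Host", "")
        body = SITES.get(self.sticky_host, b"unknown host")
        self.send_response(200)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass


def get(conn, host, path="/"):
    """Send one GET with an explicit Host header and return the body."""
    conn.putrequest("GET", path, skip_host=True)
    conn.putheader("Host", host)
    conn.endheaders()
    return conn.getresponse().read()


def find_confounded_hosts(ip, port, hosts):
    """Return hosts whose kept-alive response differs from a fresh one."""
    fresh = {}
    for host in hosts:
        conn = http.client.HTTPConnection(ip, port, timeout=5)
        try:
            fresh[host] = get(conn, host)
        finally:
            conn.close()
    shared = http.client.HTTPConnection(ip, port, timeout=5)
    try:
        return [h for h in hosts if get(shared, h) != fresh[h]]
    finally:
        shared.close()


def demo():
    server = ThreadingHTTPServer(("127.0.0.1", 0), StickyVhostHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    try:
        return find_confounded_hosts(
            "127.0.0.1", server.server_address[1], list(SITES))
    finally:
        server.shutdown()
        server.server_close()
```

Against this simulated server, `demo()` flags shadyseo.com, because the second request on the shared connection gets mattcutts.com’s content. A crawler that opens a new connection for every page would never trip over it, which matches why the other engines didn’t show the problem.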

So that’s two examples in the last week where I asked someone to dig deeper and it was an issue outside of Google. That said, it’s essential to keep reading feedback from forums and the blogosphere. For example, we’ve been refreshing some of our supplemental results, and the feedback that we’ve gotten has helped us find a couple different ways that we could make the site: operator more accurate with the newer supplemental results. GoogleGuy put out a call for feedback about that particular issue on WebmasterWorld, and someone on my team has been reviewing the feedback to find any other issues to pass on.

59 Responses to MEGO

  1. Chris_D

    Hi Matt,

    The ‘confounded domains’ issue that you resolved on Graywolf’s blog also solved an identical issue I had recently emailed to Brian. The issue I emailed to Brian was actually the same diagnosis.

    I’ve emailed the hosting company with your advice.

    >>Can I get a “w00t” from the back there?

    Here’s a “w00t” from down under, Sydney, Australia
    :) Thanks

  2. Nice post. Props to Graywolf

  3. Matt,
    Thanks for the feedback. It is encouraging to know that you guys are digging into feedback from the community. I have read a lot of accusations and guesses recently on message boards about Google, and it is reassuring to know that you guys take the problems us little guys are having seriously. Keep fighting the good fight, and I hope you get to the bottom of any actual bugs in the recent rollout.

  4. Dazzlindonna

    But those two examples each came from one observation from one person. The page dropping issue is being noticed by tons of people. I would think this is when you should really start bugging someone – when something is noticeably widespread. Just a thought.

  5. I had someone complain about this too.

    His website was shown in these two searches as well.

    I directed him to ask his host about this (probably some vhost screw-up, or some weird 301 redirects), and I am 99% sure that G is out of the question.

  6. >>Have I mentioned how much I respect our crawl/index team? They do a lot of heavy lifting everyday and they do it so well that few people notice how much is getting done.

    Everybody’s pissed about the indexed pages thing too, Matt.

    Can the same team give us some answers on this issue yet?

  7. John

    Dazzlindonna,

    My take on the non-mention of all of the chatter from people saying that 99% of their site has been de-indexed is that it was intended that way, and the new algo is working as they had hoped. I doubt we will get confirmation on that, as it’s really not in Google’s interest to comment on particular sites or pages and their position or even existence in the index.

    GoogleGuy’s comments on WMW also mirror this response. They completely ignore the comments on page losses and only address the crawling issues.

    I believe we are hearing the company line, and like I said earlier we can now take from it that if your site lost pages, those pages need to be fixed somehow: more unique content, better content, less of something, more of another, yada yada…

  8. Google dropped millions of pages last year and took a couple of months to recrawl everything and rebuild the index. While we may be seeing an extensive lack of URLs this year, at least the SERPs are not flooded with bare URLs. People were screaming about lack of titles and descriptive snippets more than they seem to be screaming about the dropped pages.

    The effect was virtually the same thing. A bare URL in the SERPs means Google doesn’t have any data on the URL, except that it exists. It sounds to me like Big Daddy just doesn’t list all the known-but-uncrawled-URLs.

    Matt, is that the case?

  9. Me going to spend more time obsessing over all that Vanessa Fox and the sitemaps team are sayin’. All this talk of spam is just not my thing, I do not spam, I bitch and whine but I do not spam. :)

    Do a search in the sitemaps group for “no crawl stats” and see all the people complaining that it is broken, if you folks there spent more time doing searches in “groups” you could find and fix all kind of things before you get webmasters like me bitchin’ at you!

    If people are searching or posting using a common phrase often it either means. 1.) There could be a flaw 2.) The people are flawed and you need to educate them.

    And yes, observe that 99.9 percent of the time it is in fact not Google’s fault but I would start searching your groups today, I warned you about canonical issues in 2003 but I just couldn’t describe what it was back then because I do not speak MEGO-OGLE. :(

  10. Harith

    Good morning Matt

    And welcome back to Matt Cutts’ blog :D

    Very nice, informative post answering a few important questions. Thanks.

    I recall GoogleGuy mentioning last year on WMW that summer months are good months for the folks at the plex to look at and resolve issues. Can you tell us what the friends at the plex have in mind to do this summer?

    Have a great day.

  11. Dazzlindonna, believe me, lots of different people at Google have been reading that feedback. The Sitemaps team was reading it to make sure that Sitemaps has nothing to do with the issue. The crawl/index team checked into several reports and each time came up with other reasons why the site wouldn’t be crawled as much (e.g. the ‘next page’ url on one site wasn’t short; it was a total hairball with like 200 chars of params), and some supplemental results folks have been through the raw emails, which is how one of the site: changes was noticed. So far, about half of the feedback to the email isn’t about pages dropped. Of the other half, one factor is that several sites have spam penalties. Of the remaining feedback, the two site: changes were the only two that we noticed. We’re going to keep digging in, but people need to bear in mind that Bigdaddy does have different crawl priorities, so a site that had more pages indexed by the earlier Googlebot won’t necessarily have as many pages indexed in the future. But don’t get me wrong; we’re still going through the feedback to see if there’s anything else to be identified and improved.

    Michael Martinez, this latest round of feedback has (I’m pretty sure) nothing to do with Bigdaddy. Bigdaddy was live 100% weeks (months?) ago. In BD, we still return url references where we saw the url but haven’t crawled it.

    Aaron Pratt, very few people speak MEGO, but we keep our eyes peeled for trends in both MEGO-speak and regular English. :)

  12. Harith, it’s true that summer is a great time to work on new infrastructure.

  13. Hi Matt,

    Please explain more on this

    >>so a site that had more pages indexed by the earlier Googlebot won’t necessarily have as many pages indexed in the future
    >>

    Ecommerce companies that employ both organic and campaign methods might not like to hear that.

    Just a thought…

    Thanks

  14. I’ve missed your blogs – nice to have you back.

    One question re supplemental pages – why doesn’t GG simply delete ‘obsolete’ pages when requested by the webmaster using the ‘removal tool’ – instead of only removing them for 6 months or so?

    Just had to remove a whole batch of resurrected supplemental pages from old domains that caused a massive duplicate pages penalty back in Autumn 2004 – only thought to look for them – using site:wwwmydomain – when I noticed I was slowly losing position again.

    cheers

  15. Matt,

    Great MEGO post – keep those coming – some of us love the technical nitty-gritty. How ’bout a Matt Cutts/Google MEGO perspective on the (in)famous V7ndotcom Elursrebmem SEO contest that wraps up this Monday? Holy outa-control back-links Batman going on over there! ;-)

    alek

    P.S. Repeating an earlier minor nit for the Vanessa/Sitemaps folks (who probably missed it since it was buried 50 comments down), but they still have copyright 2005 at the bottom of their blog. Also recommend they consider turning on commenting to complement the Google Groups stuff.

  16. Ian

    Um, considering the bug was only in Apache 1.2b10 and was fixed in 1.2b11, and that there have been a number of security fixes and many new versions in the meantime, why is anyone running 1.2b10 any more?

    As for the fix, I assume you can just simply disable persistent connections on Apache 1.2b10, or am I missing something?

  17. Ian

    Ok so on Graywolf’s blog you say they have it with Apache 1.3.7 – could you please post to let us know which versions have this bug and which ones don’t? Otherwise, even if you fix it, if another search engine implements persistent connections they may also end up with this bug.

  18. Dave IV

    >> So far, about half of the feedback to the email isn’t about pages dropped. Of the other half, one factor is that several sites have spam penalties. Of the remaining feedback, the two site: changes were the only two that we noticed. We’re going to keep digging in, but people need to bear in mind that Bigdaddy does have different crawl priorities, so a site that had more pages indexed by the earlier Googlebot won’t necessarily have as many pages indexed in the future.

    I don’t know how many times it can be mentioned without somebody hearing it: This is not a crawling issue. Our 7000 missing pages, for example, are crawled regularly, they are just not making it into the index (unless linked to from our Home page).

    Of the various theories I have read about the cause of the problem, my current favorite is that it is a Backlink/PR issue. It looks like our backlinks, as reported by the flaky “site:” search, reflect the state of our Backlinks around August 2005 (i.e. when the BD index was seeded). Maybe the internal/real Backlinks just need to be refreshed. Missing backlinks = lower PR = less deep indexing = 95% of a site’s pages dropped.

    Also, if some of these sites have “spam penalties”:

    1. Can a spam penalty cause a site to lose 95% of its pages? Previously we’ve been told that any kind of ban will result in 100% page loss. So has this changed?

    2. Shouldn’t these sites with a spam penalty be contacted? Isn’t it very possible, likely even, that these Spam penalties are part of the missing pages bug? Why would a “spammer” be dumb enough to send an email to Google complaining about missing spammy pages?

  19. Matt. what was the reason for:

    massive amounts of pages being dropped from indexes
    pages that do show in the index are years old and non-existent
    there seem to have been massive fluctuations in the last couple of weeks

    I have 2 pages left from over 400

    traffic before:
    google 35%
    msn 25%
    yahoo 25%

    traffic now
    msn 30%
    yahoo 30%
    google 20% and dropping

    It’s frustrating because I advertise Google all over my pages: search google on every page, adwords, adsense, analytics, sitemap, gmail etc.

    Why should I continue pushing Google when most of traffic is starting to come from MSN and Yahoo?

    sorry to whine… I will stop now

  20. Hi Matt,

    many thanks first of all for your trouble.

    You wrote “several sites have spam penalties”.

    Is it possible to get exact feedback about my domain (posted in the form)?

    We suspect that our domain has a spam penalty. We have implemented numerous measures to conform to the Google guidelines. We have already submitted a reinclusion request with lots of information. We think that our pages do not violate the Google guidelines, and that the site is one of the highest-quality consumer and shopping portals (unique content!) in Germany!

    None of our direct competitors, who use almost identical page techniques, has had anything approaching our problems.

    If our pages nonetheless do not conform to the Google guidelines for any reason, I ask for an exact notification. We would immediately remove this unintentional fault.

    Many thanks!

  21. A buddy and I have an old site that once held 100s of duplicate articles (people submit articles to many places, doh!) that is only showing 6 indexed pages. It is the spammiest thing I have ever done, and it is very obvious why the site has only 6 pages currently indexed. What’s in your wallet?

  22. Matt. Do you still want feedback about sites that are having their pages dropped? It’s still happening, and for no apparent reason – and it’s a very big problem.

    It sounds like you didn’t get many example sites, and it also sounds like there were good reasons to account for most of them, but it’s not typical of what’s happening. I don’t believe it’s about crawling. Sites are having the number of pages in the index drop on a daily basis, and we all know that Google takes a very long time to drop pages that no longer exist. People report that the crawling is fine, but their pages are being dropped daily.

    I have such a site myself if you are still interested (not the one from last year) – down from an unrealistic 50-60 thousand pages in the index to 155 pages in the index yesterday (today it may have turned the corner, but time will tell). I didn’t use the published email address because, like most people, I didn’t see it, and I only noticed it happening a week ago.

    Recently, Eric Schmidt said that Google’s machines are full, and that you have a machine crisis. From that, we understand that Google is out of storage space, and what’s happening to a great many sites is assumed to be Google dumping some stuff to make space for other stuff. Is there any truth in that?

  23. Wally

    Insightful as ever, Aaron. I guess your one example proves something really significant…I just can’t think what that might be. How did your spammy site fare before Big Daddy?

    If it’s the spammiest thing you’ve ever done. How are your less spammy, but spammy none-the-less, sites doing?

  24. g1smd

    One more site:domain.com search thing that has changed very recently.

    A site that has shown “1 to 100 of about 150” and then “101 to 120 of about 150” pages on a site:domain.com search for a very long time, with the snippet showing different text for every page (taken from the meta description), now shows just “1 to 2 of about 150” in that search (and the second result is a Word doc).

    When you click the “repeat this search with the omitted results included” link (can we abbreviate that to “RTSWTORI” here in the blog? :-) ) you can see that the Google snippet is identical for every page, as it is now ignoring the meta description and just taking the first 20 words of the on-page content (from the nav-bar in this case) instead. These are all normal results. Nothing is Supplemental here.

    The site:domain search is (errr, *was*) a great way to sanity check a site for duplicated title tags and/or meta description information, but it is no longer any help in doing that… and that is a shame.

  25. frank

    hi matt, everyone,

    since traffic on my sites dropped massively, i have to make the decision to discontinue the site or not. maybe you (or someone else) could give me some infos/thoughts of what to expect for my site in the near future.

    here is what i get from the different dcs for a ‘site:’ search:

    64.233.183.147 -> #found: 217, one page (index) is up-to-date w/ recent cache, the rest are supplemental results (new and really old pages)

    64.233.185.104 -> #found: 24, one page (page) has recent cache but shows title, description from dmoz, the rest are supplemental results (just really old pages)

    72.14.207.99 -> #found: 1, the page has recent cache and shows the sites title and description

    can anyone make any sense from these infos? will my site recover?

    thx in advance.

    cu
    frank

  26. Nice explanation. In Germany we have an expression along the lines of, “do good things, and talk about them.”

  27. Pete

    Hi Matt,

    My site has the pages dropping problem and I was told I was penalised, indicating buying/selling text links.

    I don’t do that, but I submitted a reinclusion request mentioning a few things that might have looked that way.

    My question is: IF the request fails and I truly have not been buying or selling links, what is the next step for me?

    Is there some way I can get someone to manually help me?

    Thanks

    Pete

  28. Matt, I am getting plenty of crawling….but loads of the crawled pages never make it into the index…please help!

  29. Ronald R

    Google knows there’s a problem, but they’re putting a spin on things.

    Spam-free sites losing pages, and some unique content pages turning supplemental, is happening to many sites. And if you haven’t lost pages yet, don’t get bigheaded and think it won’t happen. One of my sites, which was fine until three days ago, has lost 40 pages and then another 100 yesterday.

  30. David

    Matt,

    I don’t want to reopen the “confounded domain” issue, however I do have a question.

    When keepalive is used, the location that resolved the request is returned by the server.

    This URI has two valid forms, absolute and relative; the relative form requires that the client (in this case the bot) keep request state to determine the correct absolute URI.

    The problem may or may not be on the part of the server.

    I have seen “confounded domains” in the past and tried to get the party with the problem to contact you folks. They opted instead to obtain dedicated IP addresses for the affected sites that they had control of.

    Others have mentioned that their traffic and SERP positions improved after going to dedicated IP addresses.

  31. David

    Forgot the question.

    Do you have any particular versions of servers that cause problems in this regard?

  32. I’m still part of the group that gets crawled, but 90% of the pages that are crawled never make the index. I setup sitemaps as suggested, and there is no penalty listed there.

    Can you ask the crawl/index team why this would happen? I guess it could be an issue with 301s, but I see it on regular pages too.

    I used to use URLs like “/foo/1/” and now I use “/foo/bar.html”. It seems like the old URLs are supplemental and maybe they are getting a duplicate content penalty, even though the old URLs all redirect to the new ones. I even see Googlebot visit the old one and immediately hit the new one.

  33. g1smd

    64.233.183.147 is what I have called BigDaddy “B” over at WMW. These seem to be older results, have lots of old Supplemental Results going back to 2004 January, show lots of pages for each site, but the index is very slow to update with changes. I suspect this one will be phased out.

    64.233.185.104 and 72.14.207.99 seem to be the cleaned up version of the “experiment” that I have been talking about for weeks. These DCs seem to have thrown away most (all?) Supplemental pages before 2005 June. These DCs seem to be quick to update the changes that occur on existing sites, but most sites have fewer pages indexed in these DCs than in other DCs. I don’t see any major differences between these two DCs at the moment.

    There are two other “versions” of the Google Index out there if you look around a bit more.

  34. I noticed this same issue this week with two sites, which are hosted on the same dedicated server. Google’s cache of the first site is the content of the second site. Very strange. I have contacted my hosting company, but so far no results. My first thought was DNS issue, but now I am not so sure. Thanks for bringing this up. :)

  35. Jarid

    Hi Matt,
    Great post, and glad to see you back to blogging. I just wanted to comment on the crawl/index issue since I’m pretty confident we’re the ones with the ‘hairball’ URLs… ;)

    First, let me say I’m glad to see you guys dig into the issue even though 99.9999% of the issues reported to you probably are not issues on your end. And I’m sure lots of the problems can be explained away with stupid things like ‘hairball’ URLs.

    However, I still think there is a real issue somewhere. My site went from 400,000+ pages indexed, to just a few hundred, back up to 30,000, back down to 900, and then this weekend, back up to 35,000. All the while, 10,000s of pages were being crawled daily. (btw, all stats are from across multiple datacenters)

    What else could explain the extreme variability in results?

    And this really is a scary statement:
    >>a site that had more pages indexed by the earlier Googlebot won’t necessarily have as many pages indexed in the future

    Why would you index fewer pages? I understand the crawl priorities, and it makes sense from a business perspective, but in the long run, shouldn’t the new index include all of the (legitimate) pages in the old index?
    –Jarid

    P.S. I can confirm the behavior that g1smd is seeing. When clicking the RTSWTORI link, all of the results show the same snippet (which is the text from my navigation). Granted, the search term is only in the page title, but I would expect to see the text from the meta description tag then.

  36. Note that when you really do have two identical sites on the same IP address under different domains, Google will often canonicalize those to only one domain. Site and cache searches on the second domain will return references to the first domain in the SERPs.

  37. >>Recently, Eric Schmidt said that Google’s machines are full, and that you have a machine crisis. From that, we understand that Google is out of storage space, and what’s happening to a great many sites is assumed to be Google dumping some stuff to make space for other stuff. Is there any truth in that?
    I think that Schmidt was just defending the amount of spending they did on new computers. I also think that if you put together everything that we have heard from Matt that you get a pretty good idea of why pages are disappearing from Google. My take is at http://www.ahfx.net/weblog/80

  38. Walt

    How about some cold hard facts (for a change):

    1. It is impossible for Google to know for certain that they have not introduced some serious bugs that are causing the problems that lots of sites are seeing (e.g. 95% of their site suddenly being thrown out of the Google index – for “suddenly” read “post Big Daddy”).

    2. Google’s intense secrecy means that only a very small handful of Google employees have any kind of reasonable understanding of exactly what changes were introduced with Big Daddy. It is more than likely that no one individual has the entire picture.

    3. Some of us out here in Web-land are in a position to know for certain that one of the following two statements is true:

    a: Google have introduced some serious bugs that, unbeknownst to them, are causing the gradual disintegration of their index. High-quality, non-spam sites are being de-indexed by the thousands due to one or more of these bugs.

    or

    b: Google have lost the plot, and are now deliberately de-indexing high-quality, non-spam websites by the thousands.

    At the moment, Matt, you seem to be suggesting that Google are gunning for option (c). Namely, that all of the websites having problems are having problems of their own making. Presumably it is mere coincidence that these problems only appeared with the introduction of Big Daddy, and that they do not reveal themselves on any other search engine.

  39. walkman

    Matt,
    you mention spam penalties. What does that mean to Google? Is the site banned, as in no results from site:www.–.com, or just demoted severely in the rankings? I know that one can ask for a reinclusion, but is there such a thing for demotions?

    Can you please clarify.

    thanks

  40. walkman

    To add: for some reason Googlebot is visiting me a lot more lately, and pages are being added every day, so whatever changes you guys made worked, at least for my site.

  41. Dave IV

    Hello Matt

    Why did you remove my earlier comment that politely posited the theory that the current missing pages problems were due to an out-of-date or buggy Backlink index?

    Missing/Faulty Backlinks => Artificially Low PRs => Shallower Indexing => Loads of Missing/Removed Pages

    I’ll be very interested to see if you now remove this one as well. I don’t mind if you remove it, but could you please at least pass on the suggestion to someone? From what I can see, the current backlink index dates back to mid 2005. This alone could explain why some sites have been devastated while others have not.

  42. Dave IV

    Oops. Sorry…just spotted the original comment above…sorry.

    I guess watching my life’s work go down the toilet for reasons beyond my control is finally getting to me.

  43. Andy

    Hi Matt

    A few days ago I also sent Google an issue about Googlebot’s use of persistent connections. If you use mod_diffprivs for Apache, a security module that can run each VirtualHost under a different user, Google often gets 503s. I contacted the current developer and he made some suggestions which I forwarded to Google. All I got after a few days was that standard email reply… :(

    Andy

  44. I think people need to cut Matt some slack. This is his personal blog and he has stated that unless otherwise noted in a post, anything he says here is a personal communication and not necessarily a formal response from Google.

    Furthermore, given the size of Google’s organization and operation, it’s not always possible for subtle technical flaws to show themselves immediately. While I agree there appears to be something wrong, it could be that what is wrong is more our frame of reference — based on Google’s past behavior — than anything specifically going on under the hood. It may just mean we have to wait a little longer than we have gotten used to.

    And keep in mind that Matt isn’t always at liberty to disclose everything he knows. Most technical companies do have pretty strict guidelines about disclosing details of their latest technology.

    I’d love to know more, but I’m only seeing sporadic problems with my own sites. Some directories on my principal domain have been partially indexed (after being fully indexed prior to Big Daddy) and other directories are completely indexed. I’m getting strong referral traffic from Google, almost as good as I’ve ever gotten from them.

    I cannot speak for anyone else, but I find it hard to complain about a few dozen missing URLs when I know I’m still getting almost 20,000 referrals a month. But it just looks weird. This is very different from pre-Big Daddy. And my gut instinct tells me something needs fixing, but it may not be easy to find.

  45. Matt thanks for the explanation and the info.

    Now not to sound like an ungrateful putz, but I’m of the opinion that a more efficient crawl that gets tripped up by improperly configured hosts and causes “problems” and “issues” like this might not be the right solution. I bet there are lots of other hosts out there who have similar mistakes, and even more web publishers who are not able to figure out the problem or who don’t have access to helpful people like yourself to set them on the right course.

    Just my 2 drachmas.

  46. Hello Armi, we have the same problem; maybe you could get in touch …. Greetings from Germany

    The URL is posted

  47. thanks a lot Matt. your very, very long post has all the information I need for the day.

  48. weary, the url removal tool was always intended to give a few months so that a webmaster could clear things up on their own site. If we get around to redoing that tool in the next few months, I’ll mention that though.

    Ian, certainly upgrading would have fixed this, but turning off persistent connections would have worked too.

    I intend to circle back around to this thread soon, but it’s after 1 a.m. and I gotta get some sleep.

  49. Ian

    Surely persistent connections, if they are done right, are a good thing, no? It would be better not to turn them off if upgrading would work. The Apache changelog is pretty hard to read to find out what versions this affects – if the guys there have an idea, would you mind poking them to let us know?

  50. Absolutely right, Ian. I’d recommend upgrading Apache 10 times before turning off persistent connections. I was under the impression that the Apache in question was 1.3.7, but I’m not sure.

  51. Wow… Pretty amazing stuff. Sounds like GoogleBot has got crawling ethics figured out. Would be nice if the other engines would ‘step it up’

  52. Yes Googlebot is the smartest of them all. I don’t think any of the other search engines will ever win the market ;-)
    Thanks for posting, Matt :)

  53. The Fonz

    There is _still_ no formal, official, and supported method of reporting bugs to Google.

    I have found four replicable bugs in Google software and there is no reporting mechanism for me to share them and have them fixed.

    This would be expected if Google was two college kids in a garage, but this behavior from a major multinational public company which is supposed to know something about IT is a complete embarrassment.

  54. Jerome

    Hi Matt and happy new year :)

    I have the same problem as the one reported in Graywolf’s post for web sites I manage – won’t say the names to respect comment policy :)

    The differences are:
    1 – they don’t have the same ip address (but they share the same ip net range and are in the same physical servers even with same root).
    2 – they run on IIS

    I don’t ask you to look at my particular problem but if you could answer some of my questions – that may be interesting for other people – would be really appreciated:
    1 – you said in your post that your crawl/index team was able to reproduce the bug using only telnet: could you post the command used so we can repro?
    2 – could this kind of problem have a negative impact on our sites’ SERPs, or even worse, lead to a spam penalty?

    Thanks for your blog

    Jerome

  55. The Fonz, I was affected by the same inability to report those bugs! They are not a bunch of kids anymore, but it looks like habit is just habit!

  56. Thanks for the post.

    I’ll need to keep that in mind when I switch to new host (which shouldn’t be too much of a problem).

  57. Thanks for a great post. It would be very interesting to know which command they used to reproduce the bug using telnet.

  58. Telnet was one of the first methods devised to allow system administrators to remotely monitor their networks, and my favorite. I read this blog article about the bug, and I wonder: do you have any recommendations about webservers, and which kinds of webservers should we avoid?

  59. I’m pleased to see that the term MEGO is being recycled for the computer age. In the distant, non-technical past, MEGO was used by print editors as a shorthand way to tell reporters “This writing is sleep-inducing”.
