
April Fool’s Day 2007

No, my blog was not hacked — it was April Fool’s Day!

I wanted something that would top last year’s prank, when I switched jobs with Jeremy Zawodny at Yahoo. A fake hacking had several advantages:
– It was quick and easy to do.
– I didn’t have to coordinate with anybody in advance.
– It was believable. The Museum of Hoaxes has a list of the top 100 April Fool’s Day hoaxes of all time, and reading through that list, I realized that a prank needs to be believable.

I honestly thought no one would be fooled for long. The French phrase “nous sommes le proprietaire de toi” was deliberately wrong. And I made the SEO shout-outs outlandish enough to tip people off that it was a joke. I even put a hint in my comments:
<!-- PR9 since 99. MaDD LoVe to e&o. Peace out. -->
“e&o” would of course be my two lovely cats, Emmy and Oz.

Also, it was, you know, April Fool’s. But a surprising number of people believed it. By the way, I really appreciate all the people who wrote and said “This might be a joke, but just in case, I wanted to let you know…” It’s good to know that so many people will drop me a line if they see issues with the site. 🙂

How did I pull off the prank? Well, the credit for the hacked page (and the inspiration) goes to my wife. She rocks. We looked at some Google image search results for [hacked site] and then she just whipped up a great page. Then I did a post or two to lay the groundwork to convince people that the site was acting weird. Having my site get “hacked” was believable, but in practice the latest version of WordPress has been very secure in my experience.

I did take a little care with my .htaccess file:

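# Send the root URL to the "hacked" index page with a temporary (302) redirect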
RedirectMatch 302 ^/$ /blog/index.html
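# Send any /blog/ URL that doesn't start with "i" to the same hacked page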
RedirectMatch 302 ^/blog/[^i].+$ /blog/index.html

That 302-redirects the root page, plus any /blog/ page that doesn’t start with an “i”, to the “hacked” index.html page. The 302 tells search engines that things are temporary and to try back later, so I shouldn’t see any long-term drops in my indexed pages. Redirecting everything except stuff starting with “i” is a little sloppy, but it let me send all my urls to the index.html page. That way, I could swap the /blog/ and /hackedblog/ directories really easily.

As always, if you run a website, it’s a good idea to make backups. That’s a good April Fool’s reminder for us all. 🙂

Misc bits

I’m mostly caught up on my feeds. It was relatively quiet over the last couple of weeks, but a few things have come up in the past few days that I wanted to talk about.

First, WebProNews ran this post that claims that Google is selling PageRank 7 links.

My quick take: when you dig into it, it turns out that it’s a Google directory of enterprise companies that can do things like write plug-ins for proprietary data types for the Google Search Appliance, merge geospatial GIS data, and integrate telephony products with Google Apps. This is a program for enterprise companies, and I don’t think anyone had even suggested before now that this directory could be construed as selling links. Still, just to avoid even the appearance of anything improper, I’ve already submitted a change to ensure that there’s no PageRank benefit from these links. I left a comment on the original post, and I wish that WebProNews would show comments in their blog partner program; right now, someone reading that article wouldn’t know that there are any comments (including mine) on the original post.

Next, Elinor Mills wrote about an interesting allegation. I’ll include the whole content of the allegation: [Just to clarify, this is an allegation that Elinor is passing on from a newsletter, not a claim that Elinor is making herself directly.]

In the past, when you launched a website, or Google wasn’t picking up your stuff, you could call the friendly people over there and they’d look at your website to see if you were legit, look at their search results, and adjust their code appropriately. It used to be this all occurred in the same day. Then it was 24 hours. So, imagine our dismay when www.wesrch.com wasn’t even being picked up two weeks after we launched. We had called Google two days into the launch and they apologized, saying their search engines were backlogged with so many sites to monitor. We called after a week and then called again and again, with no better answer. We even tried posting ads with Google and they couldn’t find us. Clearly, we had tried their patience, as in the end they threatened to BLACKLIST our websites so no one would ever find us again. Now is that power or what? Funny thing is, Yahoo found us faster and more reliably. So, Google is no longer my home page. More importantly, they are showing all the signs of a monopolist trying to forcibly extract revenues for nothing. Whenever this happens, it’s a sign that revenue growth has peaked and they are trying to force it in order to maintain high stock valuations. So watch out if you are an investor

When Elinor asked for a comment about this, several of us read the original complaint, and I have to admit that we were perplexed. Google doesn’t provide phone support for webmasters; as Vanessa Fox recently noted, over 1 million webmasters have signed up for our webmaster console alone, so offering phone support for every site owner in the world wouldn’t really scale that well. They talk about buying ads later in the paragraph; we wondered “maybe they were talking to phone support for AdWords?” But I can’t imagine anyone at Google on the ads side or anywhere else saying our search engines were backlogged with too many sites to monitor. The Google index is designed to scale to billions of webpages, and it does that job pretty well. It’s even harder for me to imagine anyone at Google saying on the phone that they would “BLACKLIST our websites so no one would ever find us again,” because again, we don’t provide webmaster support over the phone, and I believe AdWords phone support would know better than to claim our index was backlogged or to threaten to remove anyone’s site from our index. Maybe a call to AdWords support reached such a fever pitch that a representative declined to run an ad?

At any rate, I’m sorry for any negative interactions that wesrch.com had with Google. The current description of the issue doesn’t give enough concrete details to check out, but if anyone from that domain wanted to clarify or to provide emails or dates/times/names of phone calls (did they call AdWords? Randomly try to hop into the Google phone tree? Talk to a receptionist?), I’d be happy to try to look into it more.

In the absence of more details about their interaction, I tried to dig more into the crawling of wesrch.com. I didn’t see any negative issues (no spam penalties or anything like that) for the domain. I saw attempts to crawl the site as far back as October 2006, but that earliest attempt got an authentication crawl error (that would have been a 401 or a 407 HTTP status code). I believe that this allegation went out Feb. 2nd, and I believe we had at least one page from that site at that point. I did notice that visiting the root page of the domain gives a 302 (temporary) redirect to the HTTPS version of the domain. That’s kinda unusual, but we should still be able to crawl that.
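For the curious, here is a minimal sketch of how you could check that kind of root-page redirect yourself, using Python’s standard library. This is purely my own illustration (not a Google tool), and the hostname is simply the domain discussed above; its behavior today may well differ.

# A minimal sketch: ask a root page for its status code without following
# redirects, so a 302 to the HTTPS version shows up directly.
# The hostname is just the example from this post; it may behave differently now.
import http.client

conn = http.client.HTTPConnection("www.wesrch.com", timeout=10)
conn.request("HEAD", "/")
resp = conn.getresponse()
print(resp.status, resp.reason)        # e.g. 302 Found
print(resp.getheader("Location"))      # where the redirect points
conn.close()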

The other thing to look at is current coverage. Here’s what I saw:

Search Engine    Number of pages
Google           over 450 pages
Yahoo            1 page
Live             about 176 pages
Ask              0 pages

(Note that if you just do [site:wesrch.com] on MSN/Live, you might get results estimates as high as 500+ results, but the way to verify results estimates is to go to the final page of results, and MSN/Live stops after 176 results.)

It looks like Google crawls wesrch.com at least as deeply as any other major search engine. I’m still puzzled about who the folks at wesrch.com could have talked to at Google, but I’ll leave open the offer to dig into it more if they want to provide more details. And I wish them well with their new domain.

Moving on, I got a kick out of this one. In the “can’t win for losing” department, there’s this post. Someone going by the handle “earlpearl” pointed out a thread to Barry Schwartz, in which someone reported that Google Maps had incorrect info for Duke Medical Center. The good news is earlpearl mentions a few hours later that the info has been corrected. Everybody’s happy, right? Nope, someone with the handle INFO (which I think is the same person as earlpearl) posts to the thread and says:

I see that Google Maps corrected this information in one day. I’m still trying to learn how the bad information I submitted can be corrected.

Looks to me like Google only responds to large institutions!

So Google got criticized for having bad info for a medical center. It sounds like someone at Google took action quickly, but then we got criticized for only responding to large institutions. Personally, if you’re going to correct bad information, medical centers seem like one of the first places to tackle. 🙂 There’s an ironic twist here. I think earlpearl/INFO is partially frustrated because they’ve reported outdated info regarding some bartending schools, and that data hasn’t been changed yet. But the twist is that earlpearl’s thread about bartending schools has gotten two personal responses from a Google employee (“Maps Guide Jen”). Jen’s most recent reply struck me as pretty responsive:

Hey XXXX,

Thank you so much for all this detailed information. We’ll look into your reports further to try and track down where our data might be outdated. I definitely appreciate your taking the time on this!

Cheers,
Jen

My hope is that we’ll check into earlpearl’s report as well and then everyone will be happy. 🙂

Those were 2-3 semi-negative posts that I wanted to give a quick take on. Just so that people don’t get down thinking that every post is negative about Google, here’s a really interesting post by Bill Slawski of SEO by the SEA. Bill pulls together mentions of twelve different Googlers who have made nice contributions to Open Source or open standards. I know of several other Googlers who help open-source projects and who aren’t on that list; it’s good to be reminded that Google contributes to the open source movement in a lot of ways.

Update: Clarified the post to note that Elinor didn’t write the allegation I quote up above; she found it from a newsletter and is passing it on to her readers. Thanks for pointing out that my language wasn’t clear, Philipp. πŸ™‚

Infrastructure status, January 2007

Okay, it’s been a while since my last infrastructure status report, so I’ll briefly cover the things that I know are going on. The executive summary is that things are relatively quiet.

The quarterly-ish PageRank export is underway. As always, don’t expect traffic or rankings to dramatically change, because these PageRank values are already incorporated into our scoring. The same quarterly-ish data push that updates PageRank in the toolbar also updates the data for related:, link: and info: (remember that operator?). You can read more about PageRank from this previous post if you swing that way. Also remember that the link: operator only shows a subsample of the links to a page that we know of. I’ve mentioned before that some data centers (I believe 64.233.183.xx and 72.14.203.xx) continue to show PageRank values from a slightly older infrastructure. Not a big deal, but I wanted to mention it for the hard-core data center watchers so that they don’t get confused.

There were some situations where site: would show supplemental results ahead of regular results. I believe we’ve changed that so that regular results will usually show ahead of supplemental results for site: queries.

As a reminder, supplemental results aren’t something to be afraid of; I’ve got pages from my site in the supplemental results, for example. A complete software rewrite of the infrastructure for supplemental results launched in Summer o’ 2005, and the supplemental results continue to get fresher. Having urls in the supplemental results doesn’t mean that you have some sort of penalty at all; the main determinant of whether a url is in our main web index or in the supplemental index is PageRank. If you used to have pages in our main web index and now they’re in the supplemental results, a good hypothesis is that we might not be counting links to your pages with the same weight as we have in the past. The approach I’d recommend in that case is to use solid white-hat SEO to get high-quality links (e.g. editorially given by other sites on the basis of merit).

I think going forward, you’ll continue to see the supplemental results get even fresher, and website owners may see more traffic from their supplemental results pages. To check the current freshness of the supplemental results, I grabbed 20 supplemental pages from my site and checked their crawl dates using the “cache:” command and looking at the cached page header. The oldest supplemental results page that I saw was from September 7th, 2006 (and I only saw 2-3 pages from September; most were from December or November). The most recent of the 20 pages was from January 7, 2007, which shows that supplemental results can be quite fresh at this point.

Let’s see, what else? I think we’re going to change the “filetype:” operator so that it doesn’t require an additional query word; you’d be able to do filetype:doc or filetype:pdf or whatever on its own. That isn’t live at this point, but I believe it will be down the road.

I’ve mentioned this before, but one of our data pushes that used to happen every 3-4 weeks is now happening more like every 1-2 days. Regular searchers won’t really notice this, but if you see more variance in your rankings, I believe it’s probably due to that data push happening more frequently.

An SEO or two has been holding my feet to the fire about root pages of .com’s that are hosted outside the US. Barry has talked about the issue a little bit here. In some (pretty rare) circumstances, you’ll see the root page when you search site:domain.com on google.co.uk with the “search the web” option, but you won’t see the root page when you switch to “search pages from the UK”. I thought we’d nailed this issue in December, but we found another way that this can happen. I believe a fix has been submitted and is percolating its way through the system. Of the ~7 examples that I know of, I believe all but one are working now (and the remaining site is doing a chain of about five 302 redirects to weird/long/deep urls). However, this paragraph applies to you if 1) you have a .com that is hosted outside the US, 2) searching on (say) google.co.uk for [site:yourdomain.com] with “Search the web” returns your root page and all your other pages, and 3) when you switch to (say) “pages from the UK”, the root page disappears but the rest of your pages still show. In that case, I’d wait 4-5 days to let this second change percolate completely into our index, and if you still see the behavior after that, please leave a comment with the name of your site.

Right now I’m not expecting any major infrastructure-related upheavals to our rankings. Should that change in the future, I’ll be here to talk about it then. 🙂

Update: A few people were seeing PageRank 0 for their site. There was a small auxiliary push that needed to happen to complement the PageRank push, and that push happened a few hours ago (i.e. Jan 11, 2007). If you were getting stressed, you might want to re-check now. If you never even noticed, well, good for you. 🙂

My thoughts on recent Google tips

I wanted to talk about Blake Ross’ post entitled “Trust is hard to gain, easy to lose”. I agree with much of what he says. There’s a continuum to showing tips. Toward the “hawk” side of the spectrum is the notion that a company can show whatever reasonable content it wants on its own web site. Toward the “dove” side is the desire to show the best services, whether they are competitors or not. Historically, Google has been much further toward the “dove” side than most other companies.

I personally fall somewhere in the middle. If a Google searcher types in [picture] or [hard drive images], offering a tip to use Google Image Search makes sense to me, because it lets the user know that an image search might serve them better than a web search. Image search tips have been running for quite a while, and users generally haven’t objected.

But everyone will have different opinions about what is fine or problematic. Here’s why these recent Google tips went over the line for me personally: they’re often poorly targeted or irrelevant. I’ll mention a few searches I’ve done in the last few days where I got annoyed:

In each of those searches, I was not in the market for a blog or calendar or photo sharing service. Furthermore, the triggers appear to match on substrings: if I type in “blogoscoped”, I’m looking for Philipp, not to create a blog. The poor targeting alone is enough reason to turn off these tips (if I had my way).
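To illustrate why matching on substrings casts such a wide net, here is a tiny sketch in Python. It is purely my own illustration of the general problem, not Google’s actual trigger logic, and the example queries are made up.

import re

# Hypothetical queries; only the last one is really about starting a blog.
queries = ["blogoscoped", "photoblog hosting review", "how to start a blog"]

for q in queries:
    substring_hit = "blog" in q                        # naive substring trigger
    word_hit = re.search(r"\bblog\b", q) is not None   # whole-word trigger
    print(f"{q!r}: substring={substring_hit}, whole word={word_hit}")

A substring trigger fires on all three queries, while a whole-word trigger fires only on the last one; that gap is essentially the targeting problem described above.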

Here’s some Q&A:

Q: But if Google thinks its (say) Calendar is the best, isn’t it okay to give that as a tip?
A: In my personal opinion, not if the tip triggers for too many irrelevant queries.

Q: Is it fair that people hold Google to a higher bar than anyone else in the search industry?
A: Whether it’s fair or not, it’s a fact that people expect more from Google than from other companies. People compare other search engines to Google, but people compare Google to perfection. We have such passionate users that they’ll complain loudly if they think Google is ever straying from the right path. If you’re a Googler, that may feel frustrating. Instead, I’d choose to be grateful, because that passionate feedback keeps our heads on straight. When our users yell at Google, it’s because they care and want us to do the right thing (for their idea of what the right thing is). What other company gets that kind of feedback? Besides, if Yahoo or Microsoft jumped off a building, would you jump off too? 🙂 So yes, if the decision were up to me, I’d remove these tips or scale them way back by making sure that they are very relevant and targeted.

Update: Blake noticed that recent searches don’t return tips. So a search that has the substring “calendar” in it doesn’t return a tip for Google Calendar:

No more tip for a phpcalendar query

Over on Blake’s blog, I added this comment: “There’s a binary push going on and the tips are removed in that binary push. It will take a few days before the binary makes it out to every data center. Blake, thanks for your feedback on this issue.”

ASP.NET 2 + url rewriting considered harmful in some cases

Sometimes people ask me “Does Google make any distinction in scoring between Apache, IIS, or other web servers?” And I’m happy to say “Nope. You can use any web server, and Google will rank your pages independently of the web server platform.”

But someone in AdSense mentioned an interesting case that they’d heard of. Apparently, doing url rewrites in ASP.NET 2 can sometimes generate an HTTP status code of 302 instead of a 200. This issue isn’t specific to Googlebot (it would affect any search engine bot). The best write-up I’ve seen is at http://communityserver.org/forums/536640/ShowThread.aspx. It looks like one of the first places to notice this was here (note: that post is in French; an English translation is here).

It sounds like if this issue (ASP.NET 2 + url rewriting generating a 302 instead of a 200) affects you, your site may drop out of most search engines. So how would you debug this? Fiddler is one handy tool for Windows. For Firefox, you might use the Live HTTP Headers extension to see the actual request your browser sent and the raw reply from the web server.
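If you would rather script the check, here is a minimal sketch using Python’s standard library that flags any page answering with something other than a 200. The URLs are placeholders, not real pages; substitute the rewritten URLs from your own site.

from urllib.parse import urlsplit
import http.client

# Placeholder URLs; replace these with the pages your rewrite rules produce.
urls = [
    "http://www.example.com/products/1",
    "http://www.example.com/products/2",
]

for url in urls:
    parts = urlsplit(url)
    conn = http.client.HTTPConnection(parts.netloc, timeout=10)
    conn.request("GET", parts.path or "/")
    resp = conn.getresponse()
    if resp.status != 200:
        # A 302 here is the symptom described above: the server answered
        # with a redirect instead of serving the page with a 200.
        print(url, "returned", resp.status, "Location:", resp.getheader("Location"))
    else:
        print(url, "OK (200)")
    conn.close()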

I would also recommend Google’s Sitemaps tool. That team recently upgraded Sitemaps to show more details on errors that Googlebot saw when we tried to fetch pages from a site. The upgraded Sitemaps console also lets you download the errors as a CSV file for debugging. I found out that I had a few urls with errors:

Sitemaps errors

For example, clicking on the red oval above lets me download a file listing the problems that Googlebot had crawling my site.

(Thanks for mentioning this, Antoine!)
