Archive for January, 2007

Info about malware warnings and how to appeal them

A recent post on the CIO blog got my attention:

Some website operators are complaining that Google is flagging their sites as containing malicious software when they believe their sites are harmless. ….

“We have no bad software or installs or anything that would indicate a need to ban people from viewing our site,” wrote Matt Blatchley, who works for the Greenbush Southeast Kansas Education Service Center, in a posting on Friday to Google Groups.

MattB, please double-check urls such as
http://sss.green bush.org/gbiss/Pricing.html
http://sss.green bush.org/gbiss/Time.html

(I split the urls to prevent accidental clicking. I wouldn’t go there unless you’re running Linux.) View the source and look at the bottom of the page. See the code that looks like

<script language="javascript" type="text/javascript">var k='(encoded gibberish)',t=0,h='';while(t<=k.length-1){
h=h+String.fromCharCode(k.charCodeAt(t++)-3);}document.write(h);</script>

I think that’s what is causing your problem. It looked like your site might be hosting a WMF exploit that could infect any visitor to your site.
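
If you want to see what a script like that writes into the page without running it in a browser, you can undo the shift offline. The snippet subtracts 3 from each character code and hands the result to document.write, so a rough Python sketch of the same decoding might look like the one below. The function names and the sample payload are made up for illustration, and the sketch assumes the encoded string contains only ordinary (Basic Multilingual Plane) characters.

```python
def decode_shifted(encoded: str, shift: int = 3) -> str:
    """Undo a fixed character-code shift, mirroring charCodeAt(t) - 3 in the script."""
    return "".join(chr(ord(ch) - shift) for ch in encoded)


def encode_shifted(plain: str, shift: int = 3) -> str:
    """Hypothetical helper: apply the shift, only to build a harmless test string."""
    return "".join(chr(ord(ch) + shift) for ch in plain)


if __name__ == "__main__":
    # Made-up payload standing in for the real "(encoded gibberish)".
    sample = encode_shifted("<iframe src='http://example.invalid/'></iframe>")
    # Print the decoded markup as text instead of writing it into a page.
    print(decode_shifted(sample))
```

Paste the real value of k in place of the sample and the output is the hidden markup as plain text, which is usually enough to spot an injected iframe or script tag.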

I’ve checked out quite a few “we don’t have any malware” reports at this point, and I’ve yet to see a false positive: the sites in question have each had some malware on them. But this change is also relatively new, and we’ll keep working on ways to help site owners diagnose if their site has been hacked and is distributing malware. Maybe we can show some of the urls that appear to have malware in our webmaster console, for example. In general, I’d check file-modification times for the pages on your site to see if someone has changed your pages recently.
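
As a rough sketch of the file-modification check, assuming you have a local copy of (or shell access to) your web root, something like the following lists files changed in the last few days. The function name, the 7-day window and the example path are all assumptions for illustration. Modification times can be reset by an attacker, so an empty result is not proof that a site is clean.

```python
import os
import time


def recently_modified(root, days=7.0, now=None):
    """Return (path, mtime) pairs for files under root changed within `days`, newest first."""
    now = time.time() if now is None else now
    cutoff = now - days * 86400  # seconds in a day
    hits = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                mtime = os.path.getmtime(path)
            except OSError:
                continue  # file vanished or is unreadable; skip it
            if mtime >= cutoff:
                hits.append((path, mtime))
    return sorted(hits, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    # "." is a placeholder; point this at your own web root.
    for path, mtime in recently_modified(".", days=7):
        print(time.strftime("%Y-%m-%d %H:%M", time.localtime(mtime)), path)
```

Any page in the output that you do not remember editing is worth opening and reading all the way to the bottom.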

In the meantime, here’s how to appeal if your site is flagged as hosting malware:
- Click on the “StopBadware.org” link on the interstitial that Google shows.
- On the resulting page is the phrase “If you are the administrator of the website that was reported to us and would like to speak with us, please see our contact page.” Click on that contact link to get to http://www.stopbadware.org/home/contact_general to read about how to email and lodge an appeal with StopBadware.

I think this process can still be improved, but at the same time, we’ve heard very positive reactions from users that don’t want to click on potential malware pages. Ultimately I think it helps to alert webmasters that they may be serving up malware to their visitors, because if a site has been hacked it’s good to know about that quickly.

Update: Looks like the webmaster console team has now added example urls for sites that we think are hosting malware. This is a great step to give webmasters more tools to self-diagnose any malware-related issues with their site. As always, thanks to the folks who added this feature.

Neil Gaiman’s son coming to Google

File this in the “kick-ass” department: Neil Gaiman’s son is coming to work at Google. Apple, you may have the sexy iPhone, but we’ve got Neil Gaiman’s son. :)

And if you don’t know who Neil Gaiman is, then I feel sorry for you. But also happy, because you get to read Neil Gaiman for the first time:
- Start with the Sandman series of graphic novels.
- Then get dark and eerie with Neverwhere if you want to know the real secrets of London. Mr. Croup and Mr. Vandemar are two of the most delightful villains I’ve had the pleasure of meeting.
- Lighten up with the hilarious Good Omens, which Gaiman wrote with Terry Pratchett.
- Round it out with Stardust, which wins my award for “Best use of the word ‘fuck‘ in small print, exactly once, in a book.” You’ll have to read it to understand.

If you see him in person, you quickly notice that Gaiman gives off this “I’m mellow enough to be game for anything” vibe. At a book signing, I asked him to sign a book “Ach Crivens” (which anyone will tell you is a Terry Pratchett-ism). He tilted his head at me for a second, maybe trying to figure out if I was right in the head, then smiled and let ‘er rip:

Ach Crivens!

So Gaiman is a good egg in my book. :) By the way, if you’re looking for a good non-search-engine blog, Neil’s is delightful. His posts are insightfully funny and self-effacing, and the way he responds to readers could be a case study in creating passionate fans.

Travel/vacation plans for first half of 2007

In case people are interested, this is how my travel and vacations appear to be shaping up for the first half of 2007:

- I’m taking next week off. I didn’t really get to take much time off between Christmas and New Year’s, and my seven-year anniversary is this month (!), so I’m planning to get out of town with my wife, read a few books, and not check email. At all. :)
- mid-February: I think I’ll make it to SES London and then stop by Google’s Dublin office. It’s been a while since I’ve been to a search conference outside the U.S., and I’d love the chance to visit Google folks in Dublin. For that matter, I’d love to visit other Google offices from Hyderabad to Zurich, but it’s important to start somewhere. :)
- March: nothing but work planned. There’s a couple search conferences in Australia, but I think other Googlers will do a fine job speaking down under.
- April: I may hit SES New York, depending on how busy things are, but it should be a good month to get work done.
- May: I’m hoping to take a few weeks off in May.
- June: mostly work. But I may duck up to the Search Marketing Expo (SMX) for a couple days because I’ve never gotten to visit the Kirkland team on their home turf. :)

Right now the second half of 2007 is looking like a solid expanse of work, perhaps with a trip to PubCon in November.

Infrastructure status, January 2007

Okay, it’s been a while since my last infrastructure status report, so I’ll briefly cover the things that I know are going on. The executive summary is that things are relatively quiet.

The quarterly-ish PageRank export is underway. As always, don’t expect traffic or rankings to dramatically change, because these PageRank values are already incorporated into our scoring. The same quarterly-ish data push that updates PageRank in the toolbar also updates the data for related:, link: and info: (remember that operator?). You can read more about PageRank from this previous post if you swing that way. Also remember that the link: operator only shows a subsample of the links to a page that we know of. I’ve mentioned before that some data centers (I believe 64.233.183.xx and 72.14.203.xx) continue to show PageRank values from a slightly older infrastructure. Not a big deal, but I wanted to mention it for the hard-core data center watchers so that they don’t get confused.

There were some situations where site: would show supplemental results ahead of regular results. I believe we’ve changed that so that regular results will usually show ahead of supplemental results for site: queries.

As a reminder, supplemental results aren’t something to be afraid of; I’ve got pages from my site in the supplemental results, for example. A complete software rewrite of the infrastructure for supplemental results launched in Summer o’ 2005, and the supplemental results continue to get fresher. Having urls in the supplemental results doesn’t mean that you have some sort of penalty at all; the main determinant of whether a url is in our main web index or in the supplemental index is PageRank. If you used to have pages in our main web index and now they’re in the supplemental results, a good hypothesis is that we might not be counting links to your pages with the same weight as we have in the past. The approach I’d recommend in that case is to use solid white-hat SEO to get high-quality links (e.g. editorially given by other sites on the basis of merit).

I think going forward, you’ll continue to see the supplemental results get even fresher, and website owners may see more traffic from their supplemental results pages. To check out the current freshness of the supplemental results, I grabbed 20 supplemental pages from my site and checked out their crawl date using the “cache:” command and looking in the cached page header. The oldest supplemental results page that I saw was from September 7th, 2006 (and I only saw 2-3 pages from September; most were from December or November). The most recent of the 20 pages was from January 7, 2007, which shows that supplemental results can be quite fresh at this point.

Let’s see, what else? I think we’re going to change the “filetype:” operator so that it doesn’t require an additional query word, so that you could do filetype:doc or filetype:pdf or whatever. That isn’t live at this point, but I believe it will be down the road.

I’ve mentioned this before, but one of our data pushes that used to happen every 3-4 weeks is now happening more like every 1-2 days. Regular searchers won’t really notice this, but if you see more variance in your rankings, I believe it’s probably due to that data push happening more frequently.

An SEO or two has been holding my feet to the fire about root pages of .com’s that are hosted outside the US. Barry has talked about the issue a little bit here. In some (pretty rare) circumstances, you’ll see the root page when you search site:domain.com on google.co.uk for the “search the web” option but you won’t see the root page when you switch to “search pages from the UK”. I thought we’d nailed this issue in December, but we found another way that this can happen. I believe a fix has been submitted and is percolating its way through the system. Of the ~7 examples that I know of, I believe all but one are working now (and the remaining site is doing a chain of like five 302 redirects to weird/long/deep urls). However, this paragraph applies to you if 1) you have a .com that is hosted outside the US, 2) searching on (say) google.co.uk for [site:yourdomain.com] returns your root page and all your pages for “Search the web”, and 3) when you switch to (say) “pages from the UK”, the root page does not appear but the rest of your pages do. I’d wait 4-5 days to let this second change percolate completely into our index, and if you still see the behavior after 4-5 days, please leave a comment with the name of your site.

Right now I’m not expecting any major infrastructure-related upheavals to our rankings. Should that change in the future, I’ll be here to talk about it then. :)

Update: A few people were seeing PageRank 0 for their site. There was a small auxiliary push that needed to happen to complement the PageRank push, and that push happened a few hours ago (i.e. Jan 11, 2007). If you were getting stressed, you might want to re-check now. If you never even noticed, well, good for you. :)

Where’s my authenticated email?

Why isn’t email authenticated? I’ll tell you up front that I know very little about this subject. I can’t tell Sender Policy Framework from DomainKeys (link, link) from SenderID.

But what the heck — it’s 2007! How can this not be solved? If you had told me back in 1997 that email wouldn’t be authenticated yet, I would have slapped you in the face. Slapped you. In the face. Go ahead, take a time machine back to 1997 and try me. While you’re at it, fast forward to 2017, collect the email authentication that will no doubt exist then, and bring it on back to 2007.

Sure, there are ancient mainframes running, I dunno, ADA that only know how to use SMTP the way it was used back in 1982. Point taken. They can stick to the old ways. If I ever need to have a deep and trusted email relationship with one of those machines, I’ll whitelist it. And I know that there are corner cases like forwarded email. But somebody tell me in an educational, non-libelous way: what’s keeping the world from enjoying authenticated email? What am I missing? Right now I’m in a position of near-perfect ignorance, so almost any nuggets of constructive knowledge will educate me.

Also, I’m still waiting for my air car and jet pack. :)
