Archives for November 2007

Which hotel are people staying in for PubCon?

I’m booking my trip to Vegas for PubCon, which starts in a couple weeks. Does anyone know what hotels will have lots of webmasters hanging out late? Which hotel are you staying in?

Unboxing the Everex $200 Linux Computer

Let’s lighten things up with a gadget post. You may have seen that Everex launched a $200 computer that runs Linux. It looks like Wal-Mart sold out of them, but not to worry: more are on the way.

Why should you be interested? Well, instead of Windows, it comes installed with gOS, which is a version of Ubuntu that is customized to work well with web-based tools from Google, Flickr, Facebook, and Skype. When I heard that, I had to order one of these PCs to check it out for myself. 🙂

In this post, I’m just covering the unboxing. I’ll use the PC for a while and let you know what I think in a later post.

First, the box. The “gPC” stands for Green PC, not Google PC as some people have thought:

green PC box

When you open the box, you’ll find a pretty flyer on top of the PC:

green PC box opened

Here’s what the flyer looks like in more detail. For people who have never seen Linux, the flyer is a great little introduction:

green PC box flyer

For $199.00, I was expecting a barebones machine, but it comes with a PS/2 keyboard, mouse, and even USB-powered speakers:

green PC box peripherals

The front of the machine looks like a standard computer. You can see the DVD-ROM drive, speaker and microphone jacks, and two USB ports. There are also two silver buttons for power and reset:

green PC front

The back of the PC is pretty standard:

green PC back

Directly connected to the motherboard, you see four USB connectors and an ethernet port. The expansion card is a modem, but I believe the machine only supports broadband connections, not dial-up.

The machine itself is light but sturdy. I jostled it quite a bit and didn’t hear anything loose or rattling around in the machine. Okay, that’s it for tonight. Tune in later to see what I think of it. 🙂

Anti-Google claims: to reply or not?

Last week, Aaron Wall had a guest post on Search Engine Land that originally had the headline “How To Buy A #1 Organic Search Ranking On Google.” Then today I was reading Aaron Wall’s guest post on Google Blogoscoped where he makes a couple unusual claims, such as that “SEO = spam” in Google’s opinion (simply not true). Aaron has been doing so many anti-Google posts since around July that other people started to notice several weeks ago.

Suffice it to say that in my opinion there is another side to Aaron’s story. I’m on the fence about whether to talk about the specifics of what’s going on with Aaron. What do people think?


Update November 19, 2007: Thanks to everyone who gave feedback about this. I added a comment that talks a bit about the situation from a search engine viewpoint, without mentioning specific sites by name.

The web is a fuzz test: patch your browser and your web server

One of my favorite computer science papers is a 1990 paper titled “An Empirical Study of the Reliability of UNIX Utilities”. The authors discovered that if they piped random junk into UNIX command-line programs, a remarkable number of them crashed. Why? The random input triggered bugs, some of which had probably lain hidden for years. Up to a third of the programs that they tried crashed.

That paper helped popularize fuzz testing, which tests programs by giving random gibberish as input. Some people call this a monkey test, as in “Pound on the keyboard like a caffeine-crazed monkey for a few minutes and see if the program crashes.” 🙂
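The idea is simple enough to sketch in a few lines. Here’s a minimal fuzz harness in Python that pipes random bytes into a command-line program and checks whether the program died from a signal (a segfault or abort shows up as a negative return code on POSIX systems). The choice of `sort` as the target is just an example; any program that reads stdin would do.

```python
import random
import subprocess

def fuzz_once(command, length=1024, seed=None):
    """Feed `length` random bytes to `command` on stdin.

    Returns True if the process was killed by a signal (e.g. a
    segfault), which is how a crash shows up on POSIX systems.
    """
    rng = random.Random(seed)
    junk = bytes(rng.randrange(256) for _ in range(length))
    proc = subprocess.run(command, input=junk,
                          stdout=subprocess.DEVNULL,
                          stderr=subprocess.DEVNULL)
    return proc.returncode < 0

# Hammer the UNIX `sort` utility with 20 rounds of random input.
crashes = sum(fuzz_once(["sort"], seed=i) for i in range(20))
print(f"{crashes} crashes out of 20 runs")
```

Modern fuzzers are far smarter about generating inputs, but this is essentially what the 1990 study did: random gibberish in, watch for crashes.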

I can tell you that the web is a fuzz test. If you write a program to process web pages, there are few better workouts for your program than to pipe a huge number of web pages into your program. 🙂 I’ve seen computer programs that ran with no problem across our entire web index except for *one* document. You would not believe the sort of weird, random, ill-formed stuff that some people put up on the web: everything from tables nested to infinity and beyond, to web documents with a filetype of exe, to executables returned as text documents. In a 1996 paper titled “An Investigation of Documents from the World Wide Web,” Eric Brewer of Inktomi and colleagues discovered that over 40% of web pages had at least one syntax error:

weblint was used to assess the syntactic correctness of a subset of the HTML documents in our data set (approximately 92,000). … Observe that over 40% of the documents in our study contain at least one error.

At a search engine, you have to write your code to process all that randomness and return the best documents. By the way, that’s why we don’t penalize sites if they have syntax errors or don’t validate — sometimes the best document isn’t well-formed or has a syntax error.
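To see what “process all that randomness” means in practice, here’s a small sketch using Python’s standard `html.parser` module, which (like a search engine’s HTML processing) is deliberately forgiving: it never raises on malformed markup, so even a document with unclosed and mismatched tags still yields its text. The sample markup here is just an invented illustration of the kind of broken HTML you see in the wild.

```python
from html.parser import HTMLParser

class TextExtractor(HTMLParser):
    """Collect the text content of a page, ignoring markup errors."""
    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        text = data.strip()
        if text:
            self.chunks.append(text)

# Unclosed <td>, <tr>, <html>; a </b> that closes outside its <table>.
broken = "<html><body><table><tr><td>Hello <b>world</table> stray </b> text"

extractor = TextExtractor()
extractor.feed(broken)   # no exception despite the mismatched tags
print(" ".join(extractor.chunks))
```

A strict XML parser would reject this document outright; a lenient parser recovers the text anyway, which is exactly the trade-off you want when the best answer to a query might live on a badly-formed page.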

But: the web is a fuzz test for you too, gentle reader. As you surf the web, your browser is subjected to an amazing amount of random stuff. Here’s a scary example: a couple months ago, someone was surfing a website and noticed that an ad was serving up malware. I know of a completely different web site that apparently got hit by the same incident.

So the take-aways from this post are:

  1. Fuzz testing is a great way to uncover bugs.
  2. Lots of great web pages have syntax errors or don’t validate, which is why we still return those pages in Google.
  3. If you’re an internet user, make sure you surf with a fully-patched operating system and browser. You can decrease your risk of infection by using products off the beaten path, such as MacOS, Linux, or Firefox.
  4. If you’re a website owner and Google has flagged your site as suspected of serving malware, sometimes it’s because your site served ads with embedded malware. Check if you’ve changed anything recently in how you serve ads. When you think your site is clean, read this post about malware reviews and this malware help topic for more info about getting your site reviewed quickly. Even if your site is in good shape, you might want to review this security checklist post by Nathan Johns.

Update Nov. 15 2007: Fellow Googler Ian Hickson contacted me with more recent numbers from a September 2006 survey that he did of several billion pages. Ian found that 78% of pages had at least one syntax error if you ignore the two least critical error types, and 93% if you include those two errors. There isn’t a published report right now, but Ian has given those numbers out in public e-mail, so he said it was fine to mention the percentages.

These numbers pretty much put the nail in the coffin for the “Only return pages that are strictly correct” argument, because there wouldn’t be that many pages to work with. 🙂 That said, if you can design and write your HTML code so that it’s well-formed and validates, it’s always a good habit to do so.

By the way, if this is the sort of thing that floats your boat, you might want to check out Google’s Code blog, where Ian has posted before.