Archives for January 2007

I’ll be at SES London next month

I’m going to SES London 2007. I’ll be hanging at the conference (which runs Tuesday February 13th to Thursday February 15th), and I’ll do a Keynote Conversation with Chris Sherman that Wednesday.

If you’re at the conference, please come up and say hello!

“Undetectable” spam

I was reading Loren’s write-up on a new link selling service from V7N. He points out an interesting claim from the company, which says:

Contextual Links @ V7N are undetectable to search engines. Whether it be by human or algorithmic filtering, our links are impossible to detect. Additionally, an enforced non-disclosure agreement prevents both publishers and advertisers from revealing participating publishers and advertisers.

(emphasis preserved from the original.) Suffice it to say, if “undetectable to search engines” is listed as one of the major selling points of a particular link scheme, it probably violates our quality guidelines and the guidelines of other major search engines.

The “undetectable” claim brought up fond memories of another time someone claimed to me that their spam was undetectable. It was November 2002, so cue up the wavy time-warp special effect and let’s go back in time. 🙂

I had just removed a very large data recovery website from Google. They asked me why their website appeared to be penalized. I replied with this email:

Pages like
http://www.xxxxxxxxxx.com/data-recovery-software-cw.html
http://www.xxxxxxxxxx.com/data-recovery-software-dr.html
http://www.xxxxxxxxxx.com/data-recovery-software-mn.html
http://www.xxxxxxxxxx.com/data-recovery-software-aa.html
http://www.xxxxxxxxxx.com/data-recovery-software-it.html
http://www.xxxxxxxxxx.com/data-recovery-software-gl.html

appear to have garbage doorways with text about random SCSI things.
Visiting those pages in Internet Explorer just redirects to your
homepage. Using doorways + sneaky redirects is a serious violation
of Google’s spam guidelines. In order to relist you (and it will take
about 7-8 weeks), we need to have clear evidence that all these pages
are gone, and that we won’t see these sort of tricks on your domain
again.

Matt

(domain name removed to protect the guilty back in 2002.)

By the way, you can see the main criteria for a successful reinclusion request to Google haven’t changed in the last four years: remove the spam and find a way to assure us it won’t happen again.

The data recovery company evidently forwarded their email to their SEO to get an explanation. I like to imagine that they said something like “Um, dude. Google removed us completely because they found a bunch of crappy doorway pages that you made. What do you have to say for yourself?”

All well and good, but what happened next is where it gets funny. The SEO replies, but he doesn’t write back to the data recovery company that he spammed out of Google’s index. No, the SEO accidentally wrote back to me instead of his client. And here is what the SEO tried to say to his client but said to me instead:

SHIT!!!!!! It’s my fault!!!! Oh my Gawd. This is the first time – why you? No doubt in my mind it was a search engine savvy competitor who turned you in, because it’s undetectable to spiders. First time the search engines have found my doorways. This is scary! Weird that this happened right now, i have been worried that this would happen someday, so i have been working all month on a new system to make the pages look undetectably “real” so that someone with javascript turned off will just see a nicely formatted page, with images & stuff. – now we will be undetectable to spiders, and humans, hence 99% bulletproof.

I know what to do. I’m going to call you…

(name trimmed so as not to reveal the identity of the SEO)

I laughed so hard, I nearly bust a gut. His old system was undetectable, but he was worried he might be caught, so he was working on a spiffy new scheme which was really *really* undetectable. But only 99% bulletproof. 🙂 As you might be able to guess, I was easily able to find all of the fellow’s “undetectable” doorway pages and all of his clients with a single Google query — I didn’t even have to use any of my internal tools. I still chuckle when I hear the word “undetectable.”

One thing I do like about working on webspam at Google is that you collect really good stories. I don’t always tell the funny ones, but I share this one to make a point. The moral of this story is that “undetectable” spam sometimes stands out a lot more than you’d think. 🙂

What did I miss last week?

Okay, I’ve caught up on all but five feeds now. 90 posts on Search Engine Land in a week? Danny and friends, you’re killing me here. 🙂 Two of my favorite posts that I’ve seen so far show that Google is listening to feedback:
– Jeremy Zawodny complained that his Gmail spam filter wasn’t working well. Somebody from Gmail’s anti-spam team touched base with Jeremy to ask for some examples, and the performance is better now. It’s very cool that a Gmail person is on the lookout for reports of problems.
– A post from the Google Reader team made my whole day. It’s by Nick Baum, someone that I’ve enjoyed working with, and it starts out:

One of the most useful aspects of feed readers is how easy they make it to keep track of industry news. Which in my case means using Google Reader to read about… Google Reader. For example, I subscribe to the Google Blogsearch for “Google Reader” (which has a feed) so I know whenever someone writes about our product.

That’s something that all teams at Google should be doing for their products. I can attest that the Google Reader team watches for feedback in the blogosphere and uses that to help decide what to do next. I believe that listening to outside feedback is part of the reason why that team is rocking out so hard lately.

Here are some of the other things that caught my eye:

– Greg Linden decided to put Findory on auto-pilot and spend more time on his health and family. I understand the decision, but I’m still sorry that Greg is pulling back. I hope he continues to blog. Not only is he a healthy voice for personalization (which I consider to be one of the biggest trends in the future of search), but his blog points out cool things like the $1M Netflix challenge to improve their personalization. Ironically, the week before I went on vacation, someone was showing me a cool feature and I told them it would be really neat to contact Findory and see if Greg was interested in trying it out.
– Wikipedia is adding nofollow to its external links. Brion Vibber announced this on a mailing list, and there’s some discussion at the bottom of this section. The nice thing is that Brion’s email mentions “Better heuristic and manual flagging tools for URLs would of course be super,” which means that Wikipedia is open to ways that allow more trustworthy links to be “follow”-able. But for the present, I think it’s the right call: the incentive to create spammy links on Wikipedia has been massively reduced. As one SEO person commented on a forum, “Yeah, that sucks. All those hours spent spamming wikipedia, gone to waste…” 🙂 Over time, I believe Wikipedia will probably find ways to remove nofollow from links that are more trusted. If you’re interested in helping with that, see Brion’s email for how to get involved. I don’t expect this change to affect Google’s rankings very much, but it’s good to see the Wikipedia folks paying close attention to link spam (and open to refining their trust for external links).
– John Battelle pointed out that Peter Horan joined IAC as CEO of Media and Advertising. Jim Lanzone, the CEO of Ask, will report to Horan.
– People noticed that Google is showing related searches more often at the bottom of some search result pages.
– A Wired article second-guesses some Yahoo decisions and execution. Among other things, it asserts that Yahoo! could have bought Google for $3 billion in 2002, and critiques the development of Panama. I personally thought that the article came off as too negative. “Why didn’t Company X buy Google back when they had the chance?” is a charge that you could level at several large companies besides Yahoo. And as someone who was in Google’s ads engineering group for a year when it was all of five people, I can tell you that writing a state-of-the-art ads serving system is hard. That’s especially true when Yahoo’s page views are measured in the billions. Ah, Valleywag finds a nice juxtaposition. Also read Yahoo’s full response to the Wired story here.
– Yet another “pay-for-blogging” (PFB) business launched, this time by Text Link Brokers. It should be clear from Google’s stance on paid text links, but if you are blogging and being paid by services like Pay Per Post, ReviewMe, or SponsoredReviews, links in those paid-for posts should be made in a way that doesn’t affect search engines. The rel=”nofollow” attribute is one way, but there are numerous other ways to do paid links that won’t affect search engines, e.g. doing an internal redirect through a url that is forbidden from crawling by robots.txt.
– Hitwise offered a market share comparison between Bloglines, Google Reader, Rojo, and other feed readers that claimed Bloglines was about 10x more popular than Google Reader. My hunch is that both AJAX and frames may be muddying the water here; I’ve mentioned before that AJAX can heavily skew pageview metrics. If the Google Reader team gets a chance to add subscriber numbers to the Feedfetcher user-agent (which may not be a trivial undertaking, since they probably share code with other groups at Google that fetch using the same bot mechanism), that would allow an apples-to-apples comparison.
– Google closed a small security hole that Tony Ruscoe found. After reading Tony’s post-mortem post, it sounds like it was closer to a proof-of-concept than a serious threat, and the security team responded and fixed the problem quickly, which is good.
– Someone defaced 3-4 SEO blogs using a security hole in WordPress. My blog was on the “want to crack” list, and my log data shows four attempts to crack my site using the “POST /blog/wp-trackback.php?tb_id=1” technique of this script. Just to be clear, in the same way that trying to infect users with viruses/trojans is considered webspam, cracking sites is a violation of our webmaster quality guidelines. This incident provides a good reminder for everyone to upgrade their WordPress, especially since:
– In bigger WordPress news, version 2.1 just came out. Here are 10 things you might want to know about the new version. The high-order bit for me is that WordPress 2.1 introduces autosave functionality. You can read the official 2.1 release post by Matt Mullenweg, who recently turned twenty-three. 🙂
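
The robots.txt approach from the paid-links item above can be sketched in a few lines of Python. This is only an illustration under assumptions: the /out/ redirect prefix, the example robots.txt, and the helper names is_blocked_from_crawling and paid_link are hypothetical and do not come from any of the services mentioned. The sketch uses the standard library's urllib.robotparser to confirm that a redirect path is off-limits to crawlers, and renders a link that also carries rel="nofollow".

```python
from html import escape
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for a blog that routes paid links through /out/.
ROBOTS_TXT = """\
User-agent: *
Disallow: /out/
"""


def is_blocked_from_crawling(robots_txt: str, url: str, agent: str = "*") -> bool:
    """Return True if the given robots.txt forbids `agent` from fetching `url`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return not parser.can_fetch(agent, url)


def paid_link(redirect_path: str, text: str) -> str:
    """Render a paid link that points at the internal redirect path
    and also carries rel="nofollow"."""
    return f'<a href="{escape(redirect_path, quote=True)}" rel="nofollow">{escape(text)}</a>'


if __name__ == "__main__":
    # The redirect path is blocked; an ordinary post URL is not.
    print(is_blocked_from_crawling(ROBOTS_TXT, "/out/sponsor-1"))
    print(is_blocked_from_crawling(ROBOTS_TXT, "/2007/01/some-post/"))
    print(paid_link("/out/sponsor-1", "Sponsor"))
```

Either measure is meant to work on its own; the sketch combines them only to show both in one place. The redirect handler itself (whatever serves /out/... and forwards the visitor) is left out because it depends on the blogging platform.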

I’m sure there will be other things I missed, but those were the most interesting to me as I was catching up.
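
For anyone who wants to check their own logs for the wp-trackback.php probes mentioned in the WordPress item above, here is a minimal Python sketch. It assumes access logs in the common Apache-style format with the client IP as the first field; the sample lines and IP addresses are made up for illustration, and the function name count_trackback_probes is hypothetical.

```python
import re
from collections import Counter

# Matches the quoted request field of a log line when it is a POST to
# wp-trackback.php with a numeric tb_id parameter.
PROBE = re.compile(r'"POST\s+\S*wp-trackback\.php\?tb_id=\d+')

# Made-up sample lines (documentation-range IPs), not real log data.
SAMPLE_LOG = [
    '192.0.2.10 - - [25/Jan/2007:10:00:01 -0800] "POST /blog/wp-trackback.php?tb_id=1 HTTP/1.1" 200 512',
    '192.0.2.10 - - [25/Jan/2007:10:00:05 -0800] "POST /blog/wp-trackback.php?tb_id=1 HTTP/1.1" 200 512',
    '198.51.100.7 - - [25/Jan/2007:10:01:00 -0800] "GET /blog/ HTTP/1.1" 200 20480',
    '203.0.113.5 - - [25/Jan/2007:10:02:30 -0800] "POST /blog/wp-trackback.php?tb_id=1 HTTP/1.0" 200 512',
]


def count_trackback_probes(log_lines):
    """Count matching POST requests per client IP (first field of each line)."""
    hits = Counter()
    for line in log_lines:
        if PROBE.search(line):
            client_ip = line.split(" ", 1)[0]
            hits[client_ip] += 1
    return hits


if __name__ == "__main__":
    for ip, n in count_trackback_probes(SAMPLE_LOG).most_common():
        print(ip, n)
```

Legitimate trackbacks also POST to wp-trackback.php, so a match is only a lead worth a closer look, not proof of an attack.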

And I’m back

Ah, that was a nice vacation. For the first time I can remember, I went off-the-web-grid for a week. I didn’t check email, read feeds, or do any blog posts. In fact, I turned my computer off and didn’t do anything online for a week. Normally on a vacation, I spend an hour or two each day triaging email and handling short-fuse situations. The benefit of that is that you’re not quite as behind when you return to work, but the flip side is that work-related stress can seep into your vacation.

What did I do on vacation? I decided to stretch from my normal habits and keep busy so I wouldn’t miss being online quite as much. I did an intro session of horseback riding, a couple hours of mountain biking, and climbed a rock wall. I took a 35mm photography lesson and I jumped off a telephone pole (not to worry, I was attached to a safety harness). I tried pilates (not as bad as I thought), yoga (the mondo-beginner class that I took was fun), an Abhyanga massage (oily, but relaxing), and acupuncture, which was not really my cup of tea. At all. I tried a couple meditation classes; I enjoyed one and fell asleep in the other, which was still nice. 🙂 I went on 2-3 hikes and watched 4-5 movies with my wife. I took a drumming class (it turns out that I suck) and generally did non-computer things.

This vacation tested my personal theory that I’m primarily just a security blanket for webmasters at this point, and I’m happy that the SEO world seemed to go on just fine without me. 🙂 Of course all that off-line time meant that I’m now *way* behind on what I missed. Looks like I’ve got about 400 new email threads, which is probably 750-800 emails. Google Reader just shows “100+” for the number of items in my bucket o’ feeds, but I’m guesstimating that I came back to 500-600 unread blog posts. I’ve cleared out all the feeds except the 5-10 “high-volume” search feeds. In the next day or so, I’ll post the web tidbits that caught my eye while catching up.

Search microcosm in a forced carpool

Yesterday my car was at the dealership. The dealer has a shuttle service that drops you off at home or work, and so there were four random people in the car besides the driver. Out of that semi-random sample of Silicon Valley folks, three out of the four were working on search! The other person could have been in search too, but he was cranky and/or hard-of-hearing, so he didn’t really talk much. 🙂

Of the remaining three, there was me and a person working on Windows Live Search (just down the road from Google at Microsoft Research Silicon Valley). The third fellow was trying to get a search start-up off the ground. The start-up fellow didn’t want to mention his company’s name, but he was very proud of the fact that he was building a team of programmers in Romania. 🙂 He said a friend of his had recently interviewed with Google and hadn’t gotten an offer, despite having an 800 on the math section of the SAT. He was convinced that getting a job at Google was all about who you knew. I didn’t have the heart to tell him I’d gladly trade the 800 score (if that was all I knew about the candidate) for someone that got a 720 on the math, but worked well in a team, communicated well, took the initiative, could work independently, cared about the company’s mission, had good industry knowledge, listened, executed well, etc.

The Windows Live guy was cool, and he asked about the recent “Best Place to Work” award from Fortune. I said that they do take pretty good care of us at the ‘plex. I thought about inviting him up to lunch sometime, but wasn’t sure whether it would be good etiquette or not, so I played it safe.

Stuff like that doesn’t happen in many places other than here, and it was a fun ride in the shuttle. 🙂
