Archives for August 2005

Just call me Rick

An article about the Google Dance during SES San Jose is out. Mick Jagger, Rick Moranis. Mick, Rick. It’s only different by one letter, right? 🙂

UI fun: Better queries

In the last post, we all agreed that search engines have to keep trying different ways to improve search. For example, if you sit down and look at queries that users type, it’s clear that spelling mistakes can be a problem for lots of users. That’s what led us to work on a kick-butt spelling corrector. I love that you can get “Did you mean:” suggestions so that [sturling engine] suggests [stirling engine], but [sturling silver] suggests [sterling silver].

So what else can be improved besides spelling? Well, how about queries? Check out this UI test:

To be or not to be

The user entered the query [to be or not to be] without any quotes. Google can suggest that adding quotes to search for the exact phrase [“to be or not to be”] would be useful. It also shows some sample results (yup, Shakespeare is a much better result). Finally, it even gives you other good searches to try, such as [hamlet] or [to be or not to be that is the question]. That’s a pretty helpful UI test in my book. I’ve also seen suggestions like [mono] -> [mononucleosis] and [24 amendment] -> [24th amendment]. Nice.

Why am I even bothering to write about this? After all, Google is always trying new things to improve search, including different UI tests. Well, an SEO firm based in New York claimed this was “breaking news” and implied that Google was interspersing ads into natural search results. Nope, not at all–this is yet another algorithmic UI test. And the firm would have known it wasn’t breaking news if they read Gary Price on SEW regularly.

So just for completeness: people who know Google well will go “Cool” and move on. Other folks will ask things like “Are queries selected by hand–can my query get in on this? Is money involved?” And the answer is: it’s all algorithmic. The algorithms pick the queries where this could be helpful. Of course money isn’t involved at all. We’re always running experiments to improve Google–sometimes it’s noticeable, and sometimes it’s not. Don’t even get me started on all the ways we’ve tried using ellipses in our snippets to make them more useful. 🙂

UI fun: Better snippets

Alright, enough doom and gloom for a bit. Let me do a couple posts about user interfaces (UI). Everyone who thinks that search engines are completely perfect and can never be improved, please raise your hand. Anyone? Anyone? Bueller? Okay, good. We all agree that it’s important for search engines to keep trying fresh ideas. Here’s one UI test that’s pretty cool:

View links within a site

Can you see what’s going on? For a small number of sites, we’re not just showing our regular snippets: we try to expose useful links from within a site. In this Berkeley example, Google shows links for Berkeley departments, academics at Berkeley, etc. Pretty neat (and more importantly, useful) stuff.

People who know Google well will go “Cool” and move on. Other folks will ask things like “Are sites or their links selected by hand–can my site get in on this? Is money involved?” And the answer is: it’s all algorithmic. The algorithms pick the sites where this could be helpful. Of course money isn’t involved at all.

SEO Mistakes: sneaky JavaScript

In one of my earlier posts, I said

I’ll also mention some specific “high risk” techniques and give the reasons why I’d avoid them.

When everyone talks about red or blue widgets, it can be hard to get the point across clearly, so this time I’m going to give a concrete example. Today I’ll use techgroups.com. For this example, you’ll want to make sure that JavaScript is off (e.g. using prefbar, as we talked about earlier). If you do a search like [site:techgroups.com optimization] (or use words like sells, nj, or even danny sullivan), you’ll find URLs such as http://www.techgroups.com/search-engine-optimization/search-ngines.htm. If you check out that page, you’ll find text like

seop. I need webseek either baiduspider intelliseek is focused on scrubtheweb etc.
spidering is focused on alphasearch cannot be northernlight.
Buy greg notess and planetsearch by serchengine and find details of webtop is required by metasearcher.
This website has information on euroseek and mirago products. teome depends on ssp.
supersearch with advantage, what are people searching for and teona.
This website has information on excite’s and meta searches. searchday features. metaspy depends on 703.
infind needs metacrawler com resources. inktomisearch meta searches, cyber411 – gigablast, searchtheweb, argus clearinghouse, espotting of danny sullivan.

Is that something a normal person would write? No, it’s pretty much complete gibberish. Plus, I see multiple typos (teome and teona for Teoma). What’s an “espotting of Danny Sullivan”? Elsewhere on the site, it says that “Our website sells danny sullivan is infomak either quigo to smartsearch.” I happen to know that Danny Sullivan is not available for purchase–see how all this industry research pays off? 🙂 Of all the industries to scrape in, SEO is a poor choice: people in that industry are much more likely to notice someone using their content.

So this text appears to be autogenerated, and autogenerated from scraped pages (not just snippets or visible text of web pages–I found text that only appears in the comments of other HTML pages). Now why did I ask you to turn off your JavaScript? The answer is at the bottom of the page:
“eval(unescape(“var1%3D299%3B%0D%0Avar2%3Dvar1%3B%0D%0Aif%28var1%3D%3Dvar2%29%20document%2Elocation%3D%22http%3A%2F%2Fwww%2Etechgroups%2Ecom%22%3B%0D%0A”));”
Hmm. JavaScript that just does a redirect to the root page. And it doesn’t just set the location; it hides the redirect by running a cryptic escaped string through unescape and eval. But you can see the result well enough by reading the string, or by turning JavaScript back on and reloading the page.
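
For the record, URL-decoding that string by hand gives you the script it actually runs:

    var1=299;
    var2=var1;
    if(var1==var2) document.location="http://www.techgroups.com";

The var1/var2 comparison is always true, so it’s pure window dressing: the only thing this script does is bounce every visitor to the root page, and the destination URL is sitting right there in the escaped string for anyone to read.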

So let’s recap the high-risk techniques that I would recommend avoiding:

  • Don’t use programs that automatically generate doorway pages.
  • It especially looks bad if the doorway pages are gibberish.
  • It really especially looks bad if the content you use is scraped content.
  • If you’re considering scraping content, the SEO industry is one of the worst places to do it.
  • If you scrape SEO content and end up scraping a couple of spam pages, you’re even more likely to get noticed, because someone may already be investigating those spam pages.

and then:

  • If you make lots of pages, don’t put JavaScript redirects on all of them.
  • If you’re doing JavaScript redirects, don’t obfuscate the code–it just makes it look like you put a lot of deliberate thought into hiding what you were doing.
  • If you do obfuscate code, ask yourself: can a regular person still look at this code and tell what it’s doing without even knowing JavaScript?

That’s what I can think of right now. For a web design company or an SEO to employ techniques like this is especially bad–SEOs should absolutely know better. Every SEO company should be well aware of our webmaster guidelines, especially guidelines such as “Don’t employ cloaking or sneaky redirects” and “Don’t load pages with irrelevant words.”

In the comments on an earlier post, someone asked “It’s fine to take out one instance of spam, but do you work on the more general case of this type of spam?” (I’m paraphrasing a bit). That’s an interesting point, and the fact is that we work hard at improving our search quality with better algorithms. From this example, you can certainly make a case for checking for sneaky JavaScript redirects, plus things like 100% frames that don’t help the user. So it’s good to take action on individual instances of spam when we find them, but of course we’re working on better algorithmic solutions as well. In fact, I’ll issue a small weather report: I would not recommend using sneaky JavaScript redirects. Your domains might get rained on in the near future.
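
To make that concrete, here’s a toy sketch of my own (to be clear: this is just an illustration, not how our actual spam detection works). A first pass at catching this particular trick could be as simple as looking for scripts that decode an escaped string and then rewrite document.location:

    // Toy illustration only: flags HTML whose scripts decode an escaped
    // string and then rewrite document.location.
    function looksLikeSneakyRedirect(html) {
      // Obfuscated form: eval(unescape("...%2Elocation%3D...")), like the page above.
      var obfuscated = /eval\s*\(\s*unescape\s*\(/i.test(html) &&
                       /%2Elocation%3D/i.test(html);
      // Plain form: a script that immediately rewrites document.location.
      var plain = /document\.location\s*=/.test(html);
      return obfuscated || plain;
    }

    // The (abbreviated) snippet from the doorway page above trips the check:
    var page = '<script>eval(unescape("var1%3D299%3B%0D%0A...document%2Elocation%3D%22http%3A%2F%2Fwww%2Etechgroups%2Ecom%22%3B"))</script>';
    console.log(looksLikeSneakyRedirect(page)); // true

In practice you’d want something much smarter than a couple of regexes (plenty of legitimate scripts set document.location), but the point stands: this kind of redirect is not hard to spot.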

SEO Mistakes: link exchange emails

Here’s a mistake that people still sometimes make: buying a random software package that they think will get them a gold mine of links by bulk-emailing link exchange requests to reputable sites. At this point, most site owners are savvy enough to realize that emails with link exchange requests are rarely hand-crafted with love. Instead of exchanging links, lots of site owners forward the unsolicited emails to Google, so I see plenty of emails such as:

Hello,
I have found your website XXXXXXXXXX.XXX by searching Google for
“business free from health home home make money risk work”. I think
our websites has a similar theme, so I have already added your link
to my website.

You can find your link here:
http://www.suspiciousdomain.com/news/insert-random-keyword-phrase-here.html

The best links are not paid, or exchanged after out-of-the-blue emails–the best links are earned and given by choice. When I recap SES from my viewpoint, I’ll give some examples of great ways to earn links.
