Cuil launched this week. For a search engineer, a new search engine is like a Christmas present: you can’t wait to play with it. Most search engineers can get a good feel for the strengths/weaknesses of a new engine within 10-15 queries. And I’d like to think that with another 5-10 queries, I can usually figure out how I’d spam a search engine. It’s my job to protect Google’s index from spam, so naturally I’m intimately familiar with different webspam techniques.
What’s also fun is to figure out how a search engine provides various features. For example, for a Cuil search like [matt cutts] you’ll see the following categories:
Where do those categories come from? Most people didn’t drill down that far, but it’s quite possible to work out. If you want, take a few minutes to see if you can puzzle out how the categories are generated before reading on.
Google OS figured it out, for example: “Another interesting idea is the explorative category section that shows related Wikipedia categories and topics.” With a little work, it’s easy to verify that the right-hand box comes from Wikipedia category pages. For example, the string “matt cutts” occurs on the Wikipedia page for search engine optimization, and that page also includes a link to a search engine optimization consultants page. Sure enough, one of the categories listed for [matt cutts] is “Search Engine Optimization Consultants” and the entries under that category are from Wikipedia. Likewise, I think the Wikipedia page for Traffic Power and its link to a category page for black hat SEO probably accounts for why the category “Black_hat_seo” appears for my name.
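To make that guess concrete, here is a minimal sketch of the mechanism as I infer it from the outside: find pages whose text contains the query string, then surface the category pages those pages link to. This is not Cuil's actual code. The `PAGES` dictionary, the placeholder page text, and the `related_categories` function are all hypothetical, made up for illustration and modeled only on the two examples above.

```python
from collections import Counter

# Hypothetical toy corpus, NOT real Wikipedia data:
# page title -> (placeholder page text, categories linked from that page).
PAGES = {
    "Search engine optimization": (
        "placeholder text that mentions Matt Cutts somewhere",
        ["Search engine optimization consultants"],
    ),
    "Traffic Power": (
        "more placeholder text that also mentions matt cutts",
        ["Black hat SEO"],
    ),
    "Unrelated page": (
        "placeholder text with no mention of the query",
        ["Some other category"],
    ),
}


def related_categories(query, pages):
    """Return categories linked from pages whose text contains the query.

    Assumed heuristic: a plain case-insensitive substring match, with
    categories ordered by how many matching pages link to them.
    """
    needle = query.lower()
    counts = Counter()
    for text, categories in pages.values():
        if needle in text.lower():
            counts.update(categories)
    return [name for name, _ in counts.most_common()]


if __name__ == "__main__":
    print(related_categories("matt cutts", PAGES))
```

Notice that nothing in this sketch checks whether a page is actually about the query; a single passing mention is enough to pull in that page's categories.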
There’s nothing wrong with surfacing Wikipedia category pages, of course, but sometimes that can lead to some drift in topicality. For example, p2pnet wrote about a search for their name: “[The search query] p2pnet.net, however, gave Canadian copyright law, Project Gotham Racing Series, file sharing networks, Wired magazine people, and filesharing programs.” You can see the categories for the search [p2pnet.net] below:
Sure enough, one Wikipedia page contains the string “p2pnet.net” and also links to the category page for “Project Gotham Racing series”. Surfacing Wikipedia category pages has advantages and disadvantages, depending on the user and the query.