Competition in search

Businessweek has a good article that covers some of the competition in the search industry. It’s interesting to see the angle that each engine takes:
– Ask contends that its topic communities are better than PageRank; this is the “would you want to trust a room full of people, or a room full of experts on your topic” angle. Personally, this wouldn’t be the angle that I’d press on. For one thing, it’s hard to explain. Also, this patent shows that Google has thought about different types of link analysis besides published papers on PageRank:

The new patent deals with the process for finding matching documents. Under the methodology, Google turns up an initial set of documents related to the keyword and then ranks each page with a “relevance score.” Next, it calculates a “local score value” that quantifies “an amount that the documents are referenced by other documents in the generated set of documents,” according to the filing. Finally, the local score values influence the relevance ranking of a page.

According to the patent, “a search engine modifies the relevance rankings for a set of documents based on the interconnectivity of the documents in the set. A document with a high interconnectivity with other documents in the initial set of relevant documents indicates that the document has ‘support’ in the set, and the document’s new ranking will increase. In this manner, the search engine re-ranks the initial set of ranked documents to thereby refine the initial rankings.”

Wow, Krishna filed that patent well over five years ago. 🙂
– Clusty mentions topic communities too, but the timing is good because they also get to mention their Clusty Cloud feature. Here’s a Clusty Cloud for Matt Cutts, for example:

[Embedded Clusty Cloud widget]

I’ll leave it as an exercise for the reader to think about where that text is drawn from.
– MSN’s Justin Osmer has a nice quote saying that they’re in search for the long haul and are shooting for first place.
– Yahoo! goes with the social search angle.
– Eurekster talks about social search and their Swickis, which let users create search engines focused on certain topics and URLs.
– Snap gets a mention for how they show thumbnail previews of search results.
– A9 is mentioned, but in the context of their recent scale-back of features.
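
Going back to the re-ranking patent quoted above, the steps it describes (rank an initial set by relevance, count how much each document is referenced by other documents in that same set, then let that local score adjust the ranking) can be illustrated with a rough Python sketch. This is only an illustration of the quoted description, not how Google implements anything: the function name `rerank_by_local_score`, the `weight` parameter, the normalization, and the multiplicative way the two scores are combined are all assumptions made up for the example.

```python
from typing import Dict, List, Set, Tuple


def rerank_by_local_score(
    relevance: Dict[str, float],
    links: Dict[str, Set[str]],
    weight: float = 0.5,  # hypothetical knob: how much local support matters
) -> List[Tuple[str, float]]:
    """Re-rank an initial result set using links between its own members.

    relevance: initial relevance score for each document in the result set.
    links: for each document, the set of documents it links to.
    """
    docs = set(relevance)

    # Local score: how many *other documents in the initial set* reference
    # each document. Links from pages outside the set are ignored.
    local = {doc: 0 for doc in docs}
    for source, targets in links.items():
        if source not in docs:
            continue
        for target in targets:
            if target in docs and target != source:
                local[target] += 1

    # Normalize to [0, 1]; fall back to 1 to avoid dividing by zero.
    max_local = max(local.values(), default=0) or 1

    # Assumed combination: boost relevance in proportion to local support.
    combined = {
        doc: relevance[doc] * (1.0 + weight * local[doc] / max_local)
        for doc in docs
    }

    # Highest combined score first; ties broken by document name.
    return sorted(combined.items(), key=lambda item: (-item[1], item[0]))


if __name__ == "__main__":
    initial = {"a": 1.0, "b": 0.9, "c": 0.8}
    outlinks = {"a": {"c"}, "b": {"c"}, "x": {"a"}}  # "x" is outside the set
    for doc, score in rerank_by_local_score(initial, outlinks):
        print(doc, round(score, 3))
```

In this toy data, document "c" starts with the lowest relevance score but is referenced by both other documents in the set, so it moves ahead of "a" and "b" after re-ranking. The link from "x" does not count because "x" is not in the initial set.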

There’s also a quote from me that I’d like to clarify:

“We have more engineers working on core search technology than ever before,” says Cutts, adding that most employees spend about 75 percent of their time tweaking Google’s main search algorithm to make results faster, more relevant, and more comprehensive for users.

The part I’d like to clarify is “most employees spend about 75 percent of their time tweaking Google’s main search algorithm.” I was talking about the breakdown where Google wants to spend 70% of its efforts on core competencies like search and ads, and I probably said it poorly. The way I’d put it is that many engineers spend their time on Google’s core quality and those engineers work very hard to improve our search.

It’s absolutely the case that we have more engineers working on core search than ever before (and these are talented and smart folks). I think a new product or feature is much easier to point to than “Look, our language identification and segmentation accuracy is X% better, and our recall improved by Y%!”, so new products and features tend to get talked about more than core search.

27 Responses to Competition in search

  1. Hi Matt:

    Nice article. The graphic led me to the article about KinderStart suing Google because their rankings dropped.

    Without commenting on this case, can you tell us if there are a lot of such silly (and expensive) suits?

    Ted Z

  2. That Google patent sounds like it would encourage link farms…

  3. > Here’s a Clusty Cloud for Matt Cutts, for example: …
    > I’ll leave it as an exercise for the reader to think
    > about where that text is drawn from.

    Yahoo has a Related Queries API tho it returns different results.
    * evil matt cutts
    * matt cutts seo
    * matt cutts sites
    * matt cutts posts

    You can also feed the top 10 Google titles & snippets for this keyphrase into the Yahoo Term Extraction API (an approach I’ve been playing around with lately) and you’ll get…
    * google search engine
    * google news
    * bloglines
    * googleplex
    * wikipedia
    * free encyclopedia
    * seo
    * conference speaker biographies
    * software engineer
    * gadgets
    * blog
    * search matt
    * hip software
    * daddy long legs
    * optimization issues
    * quality group
    * matt cutts

    http://developer.yahoo.com/search/

    Why doesn’t Google have such cool APIs? 🙂

  4. Dave (Original)

    There is more than 3 SEs? 😉

  5. Hi Matt
    Sorry for the off-topic but I’ve spent days trying to find out how to suggest an idea for Google Labs.

    Is there an email address to send stuff to / form to fill out etc?

    I was going to blog the idea and post a link – would that be better?

    cheers

    webecho

  6. By the way, love the new anti-spam thingy.
    Just one thing, can you make the questions a bit easier please? 🙂

  7. > That Google patent sounds like it would encourage link farms…

    That patent became known as “LocalRank”, so if you come across that expression, you’ll know what it refers to.

    > Also, this patent shows that Google has thought about different types of link analysis

    Thought about, yes, but implemented? 😉

    As a Web user, I’d much rather trust an algorithm to provide impartial results than groups of people (“topic communities”). People have various biases and preferences, whether intentional or not – and sometimes they must be very intentional. PageRank was an excellent concept as a measure of ‘importance’, as long as it was never used. But the moment it was used, the ‘importance’ of pages started to be organised.

  8. Hi Matt. I have a question about internal searches. I use this function a lot when I do new pages, to check related information that is already on my site and, as there is nearly always a lot, include related links to old pages on the new one. It is a useful tool, because my site has over 50,000 pages and has been running for 7 years now, so it is literally impossible for me to remember everything that is in there.

    Today in the office we have observed that when we do the internal search, only 3 results appear on the search result page, with a message at the top saying “results 1-3 of about 51 xxxx results on the site.” And it seems impossible to access the others.

    Do you have any idea if this is a new system Google is going to use for internal searches? And if so, why?

    Thanks as always for your help and best wishes!

  9. > It’s absolutely the case that we have more engineers working on core search than ever before (and these are talented and smart folks). I think a new product or feature is much easier to point to than “Look, our language identification and segmentation accuracy is X% better, and our recall improved by Y%!”, so new products and features tend to get talked about more than core search.

    Just curious…how does Google establish the latter idea (language identification/segmentation accuracy)? Algorithm? Focus group? Third party analysis (like some consulting company walking in and doing whatever it is consulting companies pretend to do these days?) Something I haven’t even thought of? Thanks.

  10. Hey Matt,

    Since you’re like a regular, or at least play one at conferences 😉, is it you engineers who make those patents so incredibly incomprehensible, or is it the lawyers?

  11. That’s the point I wanted to get across, Dave (Original), that there are more than 3. 🙂

    PhilC, it’s a good idea for some situations. graywolf and Ted Z, I’m going to refrain from making any comments about matters legal. 🙂 Multi-Worded Adam, it’s an interesting problem, but a lot of the traditional ways of measuring recall/precision can apply when you get a test set of data and then see how well you do.

  12. My impression is that Google has a big head start in its core technical competency, and none of the competitors are in a position to catch up anytime soon. However, over the next several years there is a very real danger of losing this big lead. The risk is that too much time and effort will be devoted to beta products and “gee whiz” stuff that captures news headlines, seems more exciting than the relatively dull work of detecting spam, improving relevancy, etc. and/or offers the tantalizing vision of huge new revenue streams.

  13. I really like the way you were able to clarify your quote. It’s amazing to think that in such a short period of time an individual has been given the tools needed to make these types of corrections. It used to be that the media had the final say. Now we can get it straight from the horse’s mouth.

  14. BTW, responding to http://blog.searchenginewatch.com/blog/061005-102033

    I think it is hard to explain topic communities. I don’t know how many times I’ve explained PageRank as “not just raw numbers of links, but the quality of those links,” only to see that reduced down to “raw number of links” in an article. So I sympathize with the folks that try to explain topic communities.

  15. > I think a new product or feature is much easier to point to than “Look, our language identification and segmentation accuracy is X% better, and our recall improved by Y%!”, so new products and features tend to get talked about more than core search.

    That said, would you even be *allowed* to discuss such numbers? If so, I wouldn’t mind hearing about them every so often …

  16. Hi Matt,

    Stephanie’s link to the patent isn’t good, and points to an old Google design patent. The problem is because her link uses search parameters that change when more Google Patents are filed.

    This link should work better:

    Ranking search results by reranking the results based on local inter-connectivity

    A nice (Overture) variation on it was granted on Tuesday (or I should say regranted since it’s the second version of the granted patent):

    Method for ranking hyperlinked pages using content and connectivity analysis.

    The filing date of the original version is listed as January 15, 1998.

    Krishna filed that patent well over eight years ago. 🙂

  17. I can not understand why search engines with all their glorious algorithms and patents still have trouble figuring out that a site with www and the same site without www are the same site.

    To consolidate separate listings for the same sites would improve results – why has it not happened?

    Perhaps all the trumped up differences in methods are just marketing smoke and mirrors.

    You said “Personally, this wouldn’t be the angle that I’d press on. For one thing, it’s hard to explain.”

    Most people don’t want to understand search engines – they just want better results. Ask says they have better results – some folks will go check, I did. Some of those folks that check will stay with Ask.

    For now I will keep google/firefox as my browser’s home page.

  18. What about *Powerset*??

    It’s going to be good – probably very good – but nobody outside knows just how good. Maybe you have an idea Matt?

  19. Search is only 1% outside of YOU and 99% inside of YOU. – What I mean by that is that as long as the focus is on finding categories and not on the user there will be no significant progress in search. If the ‘search engine’ knows YOU it can give an answer/result relevant to YOU. It’s more important to EXCLUDE all the stuff that is unimportant to YOU than providing you with 16,000,000 results. It starts with simple things like the languages you know but also what else do you already know. – Anyhow, just my five cents 😉

  20. Bill, thanks for mentioning the more accurate link. All hail the search patent master. 🙂

    Joseph, it’s bad luck that Powerset missed this article, but I’m sure they’ll be in a future round-up.

  21. So why is the limit by date function broken, and has been broken since I don’t know when? A search for “matt cutts” gives us 1,760,000 references. Updated in the last three months? 1,760,000. Updated in the last 6 months? 1,760,000. How is this right, or indeed helpful? It can be replicated across the board as well.

  22. Interesting. I recently read an article called “Chaos at Google”, claiming that Google employees spend a lot of time on new ideas too.
    So: it looks like you are revealing that there are in fact a nice number of SEOs working for Google. Got it!
    Anyhow, thanks for the great post.

  23. Matt said “There’s also a quote from me that I’d like to clarify:”

    What he means is RE-STATE AND CORRECT, not clarify.

  24. Different searches have their own vital parameter, so the results will be various. I like Google, the SR is most correct!

  25. For a search engine to take notice of social aspects, it’s like Digg and the friends section. It’s SO trickable.

    I want to stick with the Trustrank and Pagerank. Not go social. Social is all the way, friends helping friends. Not relevancy.

  26. > So why is the limit by date function broken, and has been broken since I don’t know when? A search for “matt cutts” gives us 1,760,000 references. Updated in the last three months? 1,760,000. Updated in the last 6 months? 1,760,000. How is this right, or indeed helpful? It can be replicated across the board as well.

    I’d like to know the answer to this as well. One place I’ve noticed it in particular is Google Alerts. I have the alerts come to my inbox daily, but they’re usually things that are 3-4 months old (e.g. forum discussions, blog posts.)

    It would be really nice if the results could be from the previous day.

  27. I don’t know what business school some of you people went to. Personally, if I put my own time and money into optimizing a site about blue widgets and it begins to receive traffic from (you know who) and I decide to redirect that traffic to another blue widget site that is willing to pay for it, well isn’t this still a capitalist country? And if that person decides not to pay for it anymore why should I not sell it to a competitor? This is Business 101
    Would you do the same with a popular phone number?
