More info about synonyms at Google

Steve Baker, an engineer in the search quality group at Google, just did a nice post about synonyms on the Google blog. A lot of people seem to think that Google only does simple-minded matching of the users’ keywords with words that we indexed. The truth is that Google does a lot more sophisticated stuff than most people realize. I’d say that Google does more with “semantics” and both document and query understanding than almost any other search engine.

Read the blog post for more info, but I liked a couple examples that Steve mentioned. “Pictures” and “picture” often mean the same thing, but the query [arm reduction] is very different than [arms reduction]. Also, in the query [dura ace track bb axle njs] the “bb” is probably referring to a bottom bracket while in the query [software update on bb color id] the “bb” probably means blackberry.
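To make the “bb” example concrete, here is a toy sketch of context-dependent expansion. It is only an illustration: the hint-word table and the function name are made up for this example, and nothing here describes how Google actually does it.

```python
# Toy illustration only: the hint words below are hand-written assumptions,
# not data or logic taken from Google.
CONTEXT_HINTS = {
    "bb": {
        "bottom bracket": {"axle", "track", "dura", "ace", "njs", "crank"},
        "blackberry": {"software", "update", "color", "id", "phone"},
    }
}


def expand_abbreviation(query, term):
    """Return the expansion of `term` whose hint words overlap the query most.

    Returns None when no hint word appears in the query at all.
    """
    words = set(query.lower().split())
    best, best_score = None, 0
    for expansion, hints in CONTEXT_HINTS.get(term, {}).items():
        score = len(words & hints)  # number of hint words present in the query
        if score > best_score:
            best, best_score = expansion, score
    return best


if __name__ == "__main__":
    print(expand_abbreviation("dura ace track bb axle njs", "bb"))
    print(expand_abbreviation("software update on bb color id", "bb"))
```

The point of the sketch is just that the other words in the query decide which meaning is plausible; with no helpful context, the function declines to guess.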

Still not convinced? Here are some new stats from Steve that we haven’t made public before:

However, our measurements show that synonyms affect 70 percent of user searches [note from Matt: of course, it could be a subtle change] across the more than 100 languages Google supports. We took a set of these queries and analyzed how precise the synonyms were, and were happy with the results: For every 50 queries where synonyms significantly improved the search results, we had only one truly bad synonym.
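A quick bit of arithmetic on that ratio, as a sketch. It assumes each synonym judged in the sample falls into exactly one of two buckets (“significantly improved” or “truly bad”); the quote doesn’t say how many queries saw neither, so this is not a rate across all searches.

```python
def bad_synonym_share(good, bad):
    """Share of judged synonyms that were bad, given counts in each bucket."""
    return bad / (good + bad)


if __name__ == "__main__":
    # 50 significantly improved queries for every 1 truly bad synonym
    share = bad_synonym_share(50, 1)
    print(f"{share:.2%}")  # prints 1.96%
```

So “50 to 1” works out to just under 2 percent of the judged cases. It is separate from the 70 percent figure, which is about how many searches synonyms touch at all.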

I hope Google continues to open up more about search quality and talk more about our search rankings. Steve is a smart engineer. I love that Google has a lot of smart engineers like Steve and with any luck we’ll continue to highlight the sort of work that those engineers do.

As far as concrete advice for webmasters, the advice we’ve always given still holds: think about the different words that searchers might use when looking for your content. Don’t just use technical terms–think about real-world terms and slang that users will type. For example, if you’re talking about a “usb drive,” some people might call it a flash drive or a thumb drive. Bear in mind the terms that people will type and think about synonyms that can fit naturally into your content. Don’t stuff an article with keywords or make it awkward, but if you can incorporate different ways of talking about a subject in a natural way, that can help users.
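If you want a rough way to see which everyday terms a page already mentions, a small check like the one below can help. It is a minimal sketch: the function name and the sample text are invented for illustration, it does plain substring matching only, and it is meant for spotting gaps, not for hitting any keyword count.

```python
import re


def variant_coverage(text, variants):
    """Report, for each variant term, whether it appears in the text.

    Matching is case-insensitive and ignores differences in whitespace.
    It is a simple substring test, so "usb drive" also matches "usb drives".
    """
    normalized = re.sub(r"\s+", " ", text.lower())
    return {variant: variant.lower() in normalized for variant in variants}


if __name__ == "__main__":
    page = "A thumb drive (also called a USB drive) stores your files."
    for term, present in variant_coverage(
        page, ["usb drive", "flash drive", "thumb drive"]
    ).items():
        print(term, "yes" if present else "no")
```

A “no” is only a prompt to ask whether that term would fit naturally somewhere; if it wouldn’t, leave it out.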

Added, Jan 22, 2010: Another nice post on the Google blog, this time about highlighting users’ answers directly in search result snippets.

66 Responses to More info about synonyms at Google

  1. That is really interesting Matt – but one thing I find challenging is that English and US English words are still treated differently and the same at the same time. For example, [search engine optimisation] (the spelling used in Ireland, Canada, South Africa, Australia, the UK, New Zealand) shows some 36 million results, but with a ‘z’ it shows 1.6 billion results. In Ireland, the UK and other “UK” English countries, the suggestion given is with a ‘z’ – which is technically incorrect in those countries (think of the poor English teachers 🙂 ). Despite the difference in resulting pages, Irish SEO sites with the correct spelling still show even when searching with a ‘z’ but slightly lower than without – surely spelling should be localised (localized) by now?

  2. Hi Matt.

    Nice juicy stat on the 70% of queries containing synonyms – which of course will be savoured with a pinch of salt, when considered against the language variations.

    On a similar note, we seemed to be having an interesting test over here in the UK last week; where many searches resulted in listings that had spelling variants that could be considered ‘Americanised’. I’m sure that this speaks to the same point about user demand, cultural shifts, slang etc.

    Can you see a world where Google can cope with colloquial/regional variants within a country?

  3. “Google does a lot more sophisticated stuff than most people realize”

    As usual! 😛

  4. 1 in 50 turning out bad, that’s pretty good for getting accurate results.

  5. Do you think this technology (and related technology) potentially competes with microformats? If computers can organize and make sense of data without the additional markup then why add it?

    Seems microformats just turn into the keyword meta tag at some point – useless markup that doesn’t add value any more.

    Granted, stuff like RDF(a) does more than just organize, but it appears the technology mentioned in your post could be extended in a similar way.

  6. Pretty darn amazing stats. The evolution of search and search technologies never stops, but keeping your website’s visitors in mind is always most important.

  7. What about homonyms and heteronyms? 🙂

  8. Matt, It is amazing how many articles are worded for seo and have absolutely no benefit to the end user. And somehow they get awesome rankings in the serps. I have noticed lately that some fall off a lot quicker than they did before. The internet is so full of junk and you guys are the best at sorting through all of it.

  9. Good advice. Especially in regard to technical things, acronyms are used extensively and often not understood by even technical people (for instance in the case of new releases). It’s a good idea to keep your audience in mind.

  10. Thanks for the stats, I find it amazing how much Google are improving. Thanks for the stats :-).

  11. Within a post, I generally prefer to use a consistent term–flash drive, perhaps. So this always seemed like a perfect use for keyword metadata: providing alternative keywords so that people could find the post (using whatever term was natural for them) while keeping the text itself consistent.

    What I hear about search engines, though, is that they don’t really support that use of keyword metadata.

  12. @Jonathan Hochman – heheh… A fellow vocabulary aficionado perhaps?

  13. From the article: “because while we know it’s a bad synonym, we don’t typically fix bad synonyms by hand. Instead, we try to discover general improvements to our algorithms to fix the problems.” I personally appreciate the transparency!

  14. If the algo is good at matching synonyms why would you need to “incorporate different ways of talking about a subject” in your document? If the algo works as described, shouldn’t a good doc about “thumb drive” be able to rank for “flash drive” or “usb drive” without actually using these terms?

  15. Synonyms generally work pretty well, but they aren’t perfect.

    For example, there is an actors’ agent/professional deathmatch wrestler (yeah, I like wrestling, I don’t wanna hear it) named Billy Gram who often gets confused with the preacher Billy Graham. Now, Billy Gram isn’t exactly in the mainstream, but he does have a certain cult following and queries for him shouldn’t be considered synonymous with those of Billy Graham the pastor.

    Just a minor pet peeve, though, and this was as good a time as any to give it up.

  16. Hi Matt,
    I was a bit confused about synonyms at Google,
    thanks for this very informative post.

  17. Philip Brewer: I completely agree with you.

    Matt’s advice in this case is awful. Think about a person writing an article – using the same term to describe something is considered good style. What he advertises here is the same good old keyword stuffing that has been known to work for ages and this, my friends, is really, really sad.
    Good articles don’t use synonyms, unless the writer needs to use them to explain a point, but because of Google rankings and getting into Google we have to write articles for a monkey using a lot of keywords and synonyms, which is pathetic.
    Google should put an emphasis on detecting synonyms by themselves, not suggest to write keyword stuffed articles (to a bigger or lesser degree) – shame on you for advertising artificial keyword stuffing, because your Google bot is not smart enough. We already have SOO many web articles that you can see are written for search engines, they read like a retard child had written them.

  18. I know that “Google does a lot more sophisticated stuff than most people realize” but spammers do too, in order to trick Google. Good to hear that, Matt; it’s nice to know that Google spends so much time developing the results it gives to searchers.

  19. Is it all about using synonyms only? I believe the main thing is mentioning relevant terms in your content, e.g. if you write anything about “Tiger Woods” you must include the term “Golf” in your content and vice versa. And “Tiger Woods” and “Golf” are not synonyms of each other in any way. Correct me if I am wrong 🙂

  20. Great post, thanks for sharing the stats and giving us a sneak peek in the ‘google kitchen’.

  21. I think the rate of 2 percent of bad synonyms is quite good. I’ve seen that in the German serps. It’s a very useful feature and makes searching easier. And more difficult for SEOs to fool Google.

  22. As you said, a usb drive could also be called thumb drive, usb key, pocket drive, usb pocket drive, etc. We used to specify these kinds of variations in the meta keywords section, but the meta keywords are completely ignored by Google! You can optimize only for a particular keyword or a set of keywords for the page content on a page without sounding spammy… This is why I take this opportunity to ask Google to start indexing meta keywords also rather than ignoring them.

  23. kinda renders the tilde useless, doesn’t it.

  24. I love seeing how Google is trying to understand normal web usage and improve the algorithm rather than try to sculpt the web to fit the algo.

  25. Hi Matt,

    thanks from Germany for the information! Very interesting stuff on your blog. Semantics will be a key factor of the “web3.0” and this one is a good step in this direction.

    Thanks
    Sebastian

  26. To be honest, I might be one of the few people that hate Google synonyms. I often do searches for very specific topics – like, a command in a programming language, an error message, or a specific excerpt from a book or article.

    It happens sometimes in such cases, the synonyms are exactly what I do NOT want, because I really need a search for the EXACT thing I’ve typed in. But the results are so stuffed with synonym results that I can’t find websites that actually have what I’m looking for. It gets even worse when I have to use signs like “.” or “#” in my query. It actually happens that Google turns practically useless in some such cases.

    Would be great if there was a chance to actually deactivate synonyms in specific search queries, and get only the results that really match EXACTLY what you were looking for, letter by letter.

  27. Interesting post. So are you saying that webmasters still have to utilize the synonyms in their websites to be ranked for these terms? Or does the algorithm identify synonyms and rank them even if they don’t carry the word in the content? E.G. Can my “pictures of kittens” site rank for “photos of cats” if there are no inbound links or content that say photographs or cats?

  28. I too would like to be reassured that Google is taking into account English synonyms, not just American ones. The one that always makes me laugh is cracker – in the UK it is a thin water biscuit but a biscuit in the US is what we call a scone despite the word biscuit meaning “cooked twice” which is the traditional method of cooking biscuits but not scones! Does Google understand all of this? Plus of course as I do SEO, the s/z in optimisation is a concern too if there is any skewing of UK results towards US results.

    Plus how do I get my avatar on here instead of a little chunk of wallpaper?

  29. Matt,

    Is google removing synonyms that they find to be wrongfully associated with common searches?

  30. Thanks for the stats! Good advice for SEO too.

  31. Matt, thanks for the info
    very helpful

  32. Gdrive is on (import a file / Importer un document en France) on Google Docs !

    Cooooooool new feature, but slow bandwidth (normal) !

  33. Which one is more powerful …. using synonyms or focusing on semantics purely? Nice info though!

  34. Thanks for sharing nice advice about SEO, it is interesting that Google does a lot more sophisticated stuff than most people realize.

  35. very helpful~
    I like it here, Matt

  36. I know that this is some time off, and getting computers to understand English is hard, but I really think that there is a time that is not too far off when there will be a more intelligent interpretation of what a webpage is about – its meaning. Here is another thought that I wonder might be in the mix. The web is about information, right? And in academia we have no problem in citing our sources. When people are creating backlinks to build their PR they want just one way links. However, wouldn’t it be better to have it so that we can freely cite our sources, without losing PR along the way? Better still, to actually benefit by this practice and have a PR boost. That way we increase the information gathering. Google can then start to form this information into clusters and index clusters of meaning such that results that are displayed start with the area of each cluster which most relates to the meaning, and then lower down the cited pages as meaning decreases. Lastly, I think we need to train people better at searching, so if they want to buy something, they need to put ‘buy’. Def is brilliant as are other little phrases, but we need to get better at it. Perhaps we need ‘meaning: XXXX’ if we were to follow my suggestion? Great reading your stuff Matt, thanks again

  37. I realised this a long time ago…

    that’s why if I’m looking for something a little different in a competitive area I always go to “a less sophisticated” search engine because I find in my experience that no amount of re-phrasing will help, google just keeps on churning out the same results that I don’t want!

    I think that somehow google should work out that if a searcher has to rephrase then if the same sites and same pages are going to be returned (say on the 1st page) there should be a “eureka” moment and those results dumped (i.e. they’ve had their chance for this searcher) for an alternative set!

  38. Hi Matt, sorry I comment way off topic, but I have a question that may sound interesting for you to answer.

    I have site A, badly designed (using tables and infinite w3c errors). I modified the site structure (few content modifications) and now it performs way worse on the serps than before.

    1) What steps would you take when making this kind of change so we don’t drastically lose positions in the effort to keep updated to the latest standards?

    2) Google is encouraging users to go HTML5 as soon as possible. Is google bot able to understand those tags already? Will I have any benefits of having a tag(s) or will it be worse/same than , etc ?

    Intro:
    I was top #1 for a good quality keyword (international flower delivery) for 4 years in a row, last year I dropped to #3 position, (not bad) but because I like quality I changed my whole site to XHTML, CSS, well designed (coded like a girl) with correct tags (p, small, strong, img alt, proper 301, canonical, etc) I even switched images to content. Basically, I followed every google guideline and I think I did a great job.

    Unfortunately, 3 months later (now) it has dropped from position #3 to #9, where it is right now, which suggests to me that something is not right and we need to keep working on the SEO. Ok, that’s straightforward but I was just wondering…

    Greetings and thank you for your time.
    Bart.

  39. Hi Matt,

    I enjoy your blog and thanks for giving all of us a small peek inside google regarding synonyms and the 70% benchmark. However, it seems google is trying to have us write in blogs or have full sales descriptions loaded with keyword variations. A lot of times it’s neither feasible nor sensible. For example if I was to sell usb drives and try to rank for its variations, I’d have to write something like this (below) just to work those keywords on one sales page:

    Spammy:

    “While I was on a drive, a sudden lightning flash sizzled my thumb because I was fiddling with the USB on my key chain. Luckily I could still drive but the flash really hurt my eyes, my thumb still hurts to drive and the USB drive is melted so I stuck it in my pocket… At least I could drive without my thumb and a hot USB in my pocket”

    Or somewhat sensible for an infomercial:

    “Our big data USB memory drives rock. Pow! In a flash you can slip these 100 gb beauties in, grab the data and leave with it in your pocket. So tiny (smaller than your thumb)… you won’t feel it in your pocket while you drive. Who needs memory? Never forget where your big data thumb drive is, ours fits right on a key chain for easy storage and access when you drive around town.” Order now!

    There has to be a better way than to write the drivel (above) or include a laundry list of bulleted variations to let people know that # gb drive, thumb drive, key drive, usb drive, pocket drive, data drive are interrelated and available on some website.

    Maybe I’m missing something?

  40. Surely 1 in 50 bad synonyms actually adds up to a hell of a lot of bad results considering the volume of searches. Will effort be made to reduce the amount of bad synonyms or is 1 in 50 considered an acceptable stat in the light of improved results for the other 49?

    Joe

  41. @Jow Caws – 1/50 doesn’t mean that 1/50 times you’ll get a bad match; it means 1/50 actual synonym guesses are not good. I would guess that the ones used most frequently are more correct due to the statistical approaches taken by Google.

  42. Great information, Matt. This will be interesting to toy, fiddle, mess with as I write, compose, develop new keyword rich articles, content, pages.

  43. Thanks for shedding more light on Steve’s post. I oversee onsite search for my company’s multiple brands across several languages, and fully understand the complexity involved with managing synonyms (and hypernyms and compound words for that matter). I think it’s especially important to point out what Steve said about not fixing individual “bad synonyms” by hand. You could spend forever trying to fix individual synonyms, but the real value is in testing and modifying the rules that govern results. A lot of people get caught up on this and make their sites less scalable.

  44. Does this have anything to do with the “longtail” keyword phenomenon that I’m hearing so much about lately? If a site were to rate highly for “homes for sale” would Google rate it equally highly for “houses for sale” just substituting the word “houses” for “homes”, or would they be two completely different things?

  45. Thanks for the clarification on synonyms. But i do like the way google displays synonyms. I know that many of my friends like to see those suggestions and in fact, they are helpful to the person who makes a query.

  46. Hey Matt,

    Great post and I’m glad this has come out to the public domain, Google is by far really starting to lock down search for a long time to come and synonyms is a great move along with live search, breadcrumbs trails etc etc.

    Dave

  47. To Heiner:

    use groups with quotes around them, like “x+y”

    “content true” “time and place”

    and I hope that there will be buttons to make them at Google (invalid) Startpage before I die

    but than I fear 2 much User than would be catch how empty it is here

    beside this endless funny useless stuff that the kids are like so much today

    and the ever present official hot air on all channels

  48. Thanks for the great post. As an SEO content writer, I have long thought that people gave too much value to keyword density and using exact keywords, and that writing in a natural way using synonyms was much more effective. While I do use keywords, I never use them in a way that sounds unnatural.

  49. Thanks Matt. Useful post.

    It makes perfect sense that all searchers will add variance to their search terms – hence the value of including synonyms in copy. I don’t buy the argument that Matt’s post will encourage keyword stuffing / spamming as there are always going to be black hat SEO’ers attempting to ignorantly muscle their way up the SERPS.

    The true art of SEO copywriting lies in understanding the diverse needs of your target market and writing copy that not only reads well and triggers the right emotional responses, but is also indexed by search engines for a diverse set of keyphrases.

    Google (and MC) have always talked about building sites for humans first; SE’s will follow thereafter. The synonym case is part and parcel of this concept.

  50. Great post, it’s not hard to tell which words are synonyms with each other, this post explains it thoroughly.

    For instance search for: ~bb
    http://www.google.com/search?hl=en&safe=off&rlz=1C1GGLS_nlBE327BE327&q=~bb&aq=f&oq=&aqi=g10

    It will show a mix of bolded words like: blackberry, big brother, bed and breakfast etc. 🙂

    /cheers

  51. The lesson here is that companies need to focus on their content. They need to think about how their visitors search and not how they refer to their own products and services. One way to really get an accurate understanding of the terms your clients use is to install Google Onsite Search. You can use all the keyword suggestion tools you want, but Google Onsite Search will give you the real data.

  52. @Karl Heinz: I know that function, but there are cases where even that doesn’t help. Actually, it’s what I call the “Microsoft problem”: As soon as a machine acts as if it knows better what I want than I know it myself, things are bound to go badly wrong sometimes. And if I can’t influence that behaviour then, the machine might actually become useless to me.

  53. This is truly important data. While every SEO focuses on dry numbers and keyword density,
    synonyms and the true understanding of language enable people to stop thinking about “will google like this page ?” and start focusing on “will my readers read this stuff ?”

    Because I honestly believe (and did long before this post was posted) that when you focus on writing truly helpful, high-quality material, google will be able to understand the context of the pages.

    so this is another exciting post… 10x matt

  54. Well, good post on synonym-oriented results and Google’s view on them. Maybe misspellings and latent semantic indexing are covered under this too. Search is going to be more versatile in these areas.

  55. Matt, does Adwords also use synonyms, or can’t you comment because that is not your area?

  56. “It makes perfect sense that all searchers will add variance to their search terms – hence the value of including synonyms in copy.” Nice view!

  57. In Norway we have three special characters æ, ø and å.

    How should this be handled when dealing with folder names?
    Previously I’ve used ae for æ, oe for ø and aa for å.

    Øl = Beer

    Suited URL name would then be “oel”

    Example:
    http://www.thebeerpage.no/oel/

    I’ve seen some sites using %C3%B8 to replace ø in folder names.
    And in Google, this shows as an ø in the result page.

    Would an optimal URL be:
    http://www.thebeerpage.no/%C3%B8l
    ?

    Any advice?

  58. No doubt Google has done a good job in handling such queries but written English varies from region to region. I am not very clear how Google is able to analyze such queries.

  59. I am glad my language (spoken by only ~2M people) is among the 100+ supported. It was not always this way. Search used to be quite hard & many similar queries had to be entered manually since there are not only synonyms, even same words can have different suffixes and this was not supported in the past…

  60. “I’d say that Google does more with “semantics” and both document and query understanding than almost any other search engine.”

    Now I hate to be too presumptious here, but most of the articles written in my industry, pest control, are crap. Why? Because the guys that really understand professional pest control are not writting most of the content.

    For example, scorpions are totally misunderstood by most pest control techs and the public in general. I just read something online about “scorpion eggs being left in the walls.” Scorpions don’t lay eggs. Further, the content written for pest control companies by SEO firms continues to regurgitate the falisies that exsist. Better yet, the professors of local universities that only study lab rats have a gross misunderstanding of pests in the real world. A university professor quoted by the local newspaper said, “Scorpions do not infest homes.” …Fortunately the general public went to bat for us and the comments on the article vendicate the truth that “Scorpions do and will infest homes.” I know… I am only harping on scorpions here, but the same holds true for other insects, … carpenter ants included. =)

    I’m sorry but despite your best efforts to make your home unattractive to carpenter ants, your structure is still made of attractive wood and you are still a good looking guy.

    I still like the idea that google is using semantics, it just puts a little damper on my parade as Bulwark is a forward thinking and innovative company that doesn’t agree with the masses.

  61. Sorry about the typos… just got my computer back and have not re-installed my window’s browser spell check…. Does come come with a spell check included? Maybe I should switch to chrome. =)

    Revised without errors-

    For example, scorpions are totally misunderstood by most pest control techs and the public in general. I just read something online about “scorpion eggs being left in the walls.” Scorpions don’t lay eggs. Further, the content written for pest control companies by SEO firms continues to regurgitate the falsies that exist. Better yet, the professors of local universities that only study lab rats have a gross misunderstanding of pests in the real world. A university professor quoted by the local newspaper said, “Scorpions do not infest homes.” …Fortunately the general public went to bat for us and the comments on the article vindicate the truth that “Scorpions do and will infest homes.” I know… I am only harping on scorpions here, but the same holds true for other insects, … carpenter ants included. =)

    I’m sorry but despite your best efforts to make your home unattractive to carpenter ants, your structure is still made of attractive wood and you are still a good looking guy.

    I still like the idea that google is using semantics, it just puts a little damper on my parade as Bulwark is a forward thinking and innovative company that doesn’t agree with the masses.

  62. …chrome… Installing spell check now.

    Just living up to the exterminator image. =)

  63. A good read thanks Matt. More posts like this would go a long way in helping people understand the things they should or should not be doing.

  64. I am very interested in this discussion. I feel like I can relate to exactly the way Google thinks (or the way I think it thinks), and then I find so many different opinions as to how search engines really work. It shouldn’t be that hard to figure out. I would like to continue to think that synonyms work well with searches and it’s all common sense. I’ve got my fingers crossed.

  65. What about the different English versions, how are they handled? For example English UK, US, Australia, South Africa etc. Thanks

  66. How does Google differentiate between the word “country” as in a bordered state vs. “country” as in country or western style?
