Improving Arabic searches and talking more about ranking

Moustafa Hammad and Mohamed Elhawary, a couple of engineers in our search quality group, just did a nice post about improving Arabic language searches:

Our algorithm employs rules of Arabic spelling and grammar along with signals from historical search data to decide when to leave out spaces between words or when to remove unnecessarily repeated letters. Now, when you type a query leaving out spaces or repeating a letter, we’ll return better results based not only on what you typed, but also on what our algorithm understands is the “correct” query.
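
To make the idea in the quote concrete, here is a minimal Python sketch of the two corrections it describes: dropping unnecessarily repeated letters and restoring missing spaces. This is an illustration only, not Google's algorithm. The `KNOWN_WORDS` table is a hypothetical stand-in for historical search data, and the function names are made up for this example.

```python
import re
from functools import lru_cache

# Hypothetical word counts standing in for historical search data.
KNOWN_WORDS = {
    "مرحبا": 120,  # "hello"
    "في": 300,     # "in"
    "مصر": 150,    # "Egypt"
    "الله": 500,   # legitimately contains a doubled letter
}


def collapse_repeats(token):
    """Collapse every run of one repeated character to a single character."""
    return re.sub(r"(.)\1+", r"\1", token)


def split_into_known_words(token):
    """Split a token into known words, preferring the fewest pieces.

    Returns a list of words, or None if no full split exists.
    """
    @lru_cache(maxsize=None)
    def best(start):
        if start == len(token):
            return ()
        options = []
        for end in range(start + 1, len(token) + 1):
            piece = token[start:end]
            if piece in KNOWN_WORDS:
                rest = best(end)
                if rest is not None:
                    options.append((piece,) + rest)
        return min(options, key=len) if options else None

    result = best(0)
    return list(result) if result is not None else None


def suggest_query(query):
    """Return a guess at the intended query, token by token."""
    fixed = []
    for token in query.split():
        # Known words are left alone, even if they contain doubled letters.
        if token in KNOWN_WORDS:
            fixed.append(token)
            continue
        # Try removing repeated letters first.
        collapsed = collapse_repeats(token)
        if collapsed in KNOWN_WORDS:
            fixed.append(collapsed)
            continue
        # Otherwise try to restore missing spaces; fall back to the token.
        pieces = split_into_known_words(token)
        fixed.extend(pieces if pieces else [token])
    return " ".join(fixed)


if __name__ == "__main__":
    print(suggest_query("مرحبا" + "اا"))  # repeated final letter
    print(suggest_query("في" + "مصر"))    # missing space
```

The sketch checks the word table before collapsing letters, so a word that really is spelled with a doubled letter is not damaged. A real system would presumably weigh candidates by how often they appear in past queries rather than use a simple lookup.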

There are a few nice things about this post besides the direct improvement on Arabic language searches. For one, this post joins other recent posts that pull back a little bit of the curtain on the 400+ ranking changes that we make every year. I hope that we keep doing these posts.

Another nice thing is that the post talks about the impact of the improvement (10% of Arabic language queries are affected by this change). For the recent blog post about how Google uses synonyms in ranking, Steven Baker mentioned that “synonyms affect 70 percent of user searches across the more than 100 languages Google supports.” I like giving a rough idea of a change’s impact. The vast majority of Google’s 400+ annual ranking changes affect a much smaller percentage of queries, so don’t get the wrong idea that every improvement to our ranking algorithms affects a large percentage of searches. One last nice thing is that this change again shows the value of historical search data to improve search quality. I know that few users care about that, but it’s good to point out.

Anyway, I like to point out when Google blogs about these internal changes to our scoring algorithms, because people always want to know more about how Google works.

35 Responses to Improving Arabic searches and talking more about ranking

  1. Hey Matt,

    Just a question I also posted on Twitter. Why do you use utm_source=feedburner in your campaign tracking for twitter and not something like utm_source=socialmedia then utm_medium=twitter?

    Just interested in how you arrange your sources in GA.

  2. As always, good to learn how Google works. Thanks for the open info.
    Matt, do you guys really have a timeout room with aquariums? Seen pictures, couldn’t believe it. I want to work for Google, lol

  3. Albert Tafoya, I think that’s in Zurich with the aquarium room. Jesper Åström, right now I just used the default Socialize feature in FeedBurner and took the default settings.

  4. Thnx. And a question related to your post, which I should probably ask the people behind these changes but:

    The Arabic language is quite interesting when you don’t use the apostrophes to indicate the vowels. The meaning of one sentence can be radically different if you just flip the position of two words. Thus the searched query cannot be interpreted by its components alone, but by its context as well as its components’ sequential order.

    Even though we don’t need it for the US searches the same way, I still think there is a lot to learn from the precision needed to return relevant Arabic search results. Now to my question 🙂 Do you work equally on analyzing/understanding search queries as you work on trying to index and relate content? Especially with regards to sequential searches made within a certain time frame? (one stupid and one leading question there.. haha… hoping for a yes and an “interesting, tell me more”)

    Thnx for the update!

  5. Good to see that Google works to develop SEO for other languages! I am an Egyptian, Arab guy! But I don’t blog in Arabic; I went for the most connective language for blogging, which is English! I was actually annoyed by how search results return crap from Arabic discussion forums at the top of the first pages!!

  7. For one, this post joins other recent posts that pull back a little bit of the curtain on the 400+ ranking changes that we make every year.

    Except the Google “curtain” only covers a brick wall. “Tear down the Wall” 🙂

  8. “One last nice thing is that this change again shows the value of historical search data to improve search quality. I know that few users care about that, but it’s good to point out.”

    I think it’s a valuable point, looking at historical data is far more valuable than simply applying a set of rules and hoping it works correctly. To look at what you have done and keep improving upon that rather than starting fresh is a good base for anything (in my humble opinion)! 🙂

  9. Harith


    BTW. Do you want a big laugh?

    Next time you meet Moustafa Hammad and/or Mohamed Elhawary you may wish to ask them about the arabic translation of “Matt” 🙂

  10. Matt,
    Great info, I work on Multi-lingual SEO, and I feel that similar changes in the Spanish algorithm will provide better and more relevant results. Do you have any idea if there’s documentation only for Spanish? Right now Google’s Spanish blogs are just translations of the English blogs, certainly the language is very different.

    Final question, how’s your social media cleanse experiment going?

  11. Matt, you mention that Google supports more than 100 languages. There are currently about 6000 languages still in use in the world. I don’t know how many of them are written or how many have content on the Web, but has Google looked into the upper limit of the number of languages it would want to index?

    What if a language used by fewer than 100,000 people still has content on the Web? What would Google’s position on that be?

  12. Matt, how does Google attempt to avoid the inevitable “Lost in translation” problem? With some cultures within cultures, it must be impossible………..or a challenge 🙂

  13. Oops, should have said “With so many cultures within cultures”

  14. That’s great news! Thanks Matt

  15. Very interesting. Bit by bit the world gets smaller.

  16. While we would like to take on additional languages, we simply can’t take the chance that this is something else that Google will penalize us for with the potential of duplicate content. Every time you mention ranking changes, it really gives me a heart attack. Our site has been struggling to even get in the top 200 for even generic terms like wedding, weddings, wedding planning and we have people’s personal blogs showing up before us. A site with thousands of pages of original content, a national TV show, national magazine, frequently updated content, active community, hundreds of thousands of visitors every month, and yet Google still totally ignores our site on general terms. Our site is ALL about weddings and we’re not even in the top 200? 200? Really? We are chasing synonyms, long tail and everything else. It’s just silly. We have no idea what’s wrong. And while I’m glad that Google is showing better support for other languages, there’s no way we can afford to do anything else with our site that may possibly be deemed as unacceptable by Google’s algorithm. Especially in the form of duplicate content. Thanks, but no thanks.

  17. Kevin


    What is going on with Caffeine? Is Google and/or you ever going to comment publicly?

  18. The question about “lost in translation” has me thinking. If we apply the same tack to other languages, understanding meaning through a better understanding of the synonyms in that language, and then relate those inferences in the source language to the final translation, we may get better computer translations. However, what happens when you decline a noun? With three variations in German, and five in Russian, it seems hard to capture the full meaning in English, where we do not use genders or other factors with a noun. Also, I think that we would lose some beauty. The German “Nachtmusik” would no longer be “night music”; it would just be a “serenade”. Thank you for the post. The other posts were quite informative.

  19. Thanks for the info. Using Google search in other languages always produces interesting results.

  20. I am thankful that Google took on search in languages other than English early on. I never used search in Arabic, but I use search in Hebrew every day. Using Google is just as natural in Hebrew.

  21. Actually, it was easily detectable if you are running any Arabic sites, where many low value forums lost most of their spammy threads in favor of well edited human posts, articles and review sites. Well done, since search INTENTION in most languages varies greatly with the psychometric needs of the searcher 🙂 Anything to be rolled out for Chinese soon, for queries that mix English and Chinese phrases using only English or only Chinese characters?

  22. And a question related to your post: are there any scripts to auto-detect the Arabic language?

  23. marekk


  24. Thanks for sharing this interesting info. Arabic language sounds challenging enough – I guess a bit harder challenge would be to search for information in Indian languages.

  25. marekk

    Got Ya ! You always trash my comments and I’m not surprised. But this time use youtube to check what “child please” means ! This is my message to you and Google corporation, for all that you did to me – child please !

  26. I’ve studied the page rank algorithm at university and consider it to be one of Google search’s best features.

    It is Google’s extraordinary searching algorithms that surely make it the best search engine. Now these fantastic algorithms working in other languages like Arabic is pushing the bar way up.

  27. Karim Mertily

    Matt, what are the nationalities of those engineers “Moustafa Hammad” and “Mohamed Elhawary”?

  28. Google has got the best algorithms and I am amazed at how Sergey and Larry could come up with this system. I love Google and I am buying more and more Google stock, since I believe that Google is the King of the Net.

  29. Is there a link where I can have a look at the list of page ranking factors? That would help us in SEO to improve our page ranks. For sure, we know and follow only a few 🙂

  30. Hi Matt! I am a Hungarian online marketing specialist. I was wondering how you can adapt changes in the algorithm to a unique language such as Hungarian. Should we really follow the English search path? Can search evaluate Hungarian grammar differences?

  31. This is an interesting post. I have been doing SEO for over 7 years, and I am still trying to figure out how to successfully take on International SEO jobs.

  32. Yeah, I was wondering too whether there are any scripts to auto-detect the Arabic language?

  33. Dad

    Pls check this site
    it can handle the inflection nature of Arabic

  34. @anthony flores………

    I’m currently working on a project in Italy, optimizing. From what I’ve seen and been told, optimizing is easier now than it was a few years ago.

    Arabic seems tougher, I guess because of the font symbol set, but maybe I’m wrong there; maybe it’s easy for Google.