Moustafa Hammad and Mohamed Elhawary, a couple of engineers in our search quality group, just wrote a nice post about improving Arabic language searches:
Our algorithm employs rules of Arabic spelling and grammar along with signals from historical search data to decide when to leave out spaces between words or when to remove unnecessarily repeated letters. Now, when you type a query leaving out spaces or repeating a letter, we’ll return better results based not only on what you typed, but also on what our algorithm understands is the “correct” query.
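To make the repeated-letter part of that description concrete, here is a minimal sketch in Python. It is not Google's actual algorithm, which the post doesn't spell out. The `QUERY_COUNTS` table is a made-up stand-in for historical search data, and `candidates` and `normalize` are hypothetical names chosen for illustration. The idea is simply to try shortening each run of a repeated letter and keep whichever variant was searched for most often in the past.

```python
import re
from itertools import product

# Hypothetical stand-in for historical search data: query -> times seen.
# The numbers are invented for illustration.
QUERY_COUNTS = {
    "google": 900,
    "gogle": 3,
    "book": 500,
    "bok": 1,
    "مرحبا": 100,
}


def candidates(query):
    """Yield every variant of `query` in which each run of a repeated
    character is shortened to any length from 1 up to its original length."""
    # Split the query into runs of identical characters, e.g. "boook" -> b, ooo, k.
    runs = [m.group(0) for m in re.finditer(r"(.)\1*", query)]
    # For each run, list every allowed shortened form (including the original).
    options = [[run[0] * k for k in range(1, len(run) + 1)] for run in runs]
    for combo in product(*options):
        yield "".join(combo)


def normalize(query, counts=QUERY_COUNTS):
    """Return the variant seen most often in `counts`, or the original
    query unchanged if no variant has ever been seen."""
    best = max(candidates(query), key=lambda c: counts.get(c, 0))
    return best if counts.get(best, 0) > 0 else query


if __name__ == "__main__":
    for q in ["goooogle", "boook", "مرحبااا", "xyz"]:
        print(q, "->", normalize(q))
```

The number of variants grows with the product of the run lengths, so this brute-force approach only suits short queries. Deciding where spaces belong, and applying the Arabic spelling and grammar rules the post mentions, would need considerably more than this.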
There are a few nice things about this post besides the direct improvement to Arabic language searches. For one, this post joins other recent posts that pull back a little bit of the curtain on the 400+ ranking changes that we make every year. I hope that we keep doing these posts.
Another nice thing is that the post talks about the impact of the improvement (10% of Arabic language queries are affected by this change). For the recent blog post about how Google uses synonyms in ranking, Steven Baker mentioned that “synonyms affect 70 percent of user searches across the more than 100 languages Google supports.” I like giving a rough idea of a change’s impact. The vast majority of Google’s 400+ annual ranking changes affect a much smaller percentage of queries, so don’t get the wrong idea that every improvement to our ranking algorithms affects a large percentage of searches.
One last nice thing is that this change again shows the value of historical search data to improve search quality. I know that few users care about that, but it’s good to point out.
Anyway, I like to point out when Google blogs about these internal changes to our scoring algorithms, because people always want to know more about how Google works.