Archive for Google/SEO

Ben Gomes on Google’s user interface

This summer several people in Google’s quality group have pulled back the curtain on how people think about search quality at Google. We’ve had Udi Manber give an overview of search quality and the groups that work on it. Then my office-mate Amit Singhal discussed some of our principles of core ranking. Amit followed that with a post about how we understand pages, queries, and users that revealed that Google does much more sophisticated semantic processing than just keyword matching.

Today, Ben Gomes steps out from behind the curtain to discuss Google’s user interface for our search results. Ben is another office-mate; he’s been at Google longer than I have, and I think he’s got quite a knack for blogging:

A common reaction from friends when I say that I now work on Google’s search user interface is “What do you do? It never changes.” Then they look at me suspiciously and tell me not to mess with a good thing.

Ben goes on to reveal a bit of the philosophy behind Google’s search interface, which might seem counter-intuitive at first glance. For example, a big goal of our search results is to get you off of them and to your destination quickly. That’s one reason why we usually put query refinements (which are a somewhat distracting feature) toward the bottom of the search page. If you get to the bottom of the search results and still haven’t found what you’re looking for, then you’re more likely to actually want those refinements to modify your search.

Every feature on Google’s search page has to defend its pixels in terms of usability, and Google tests a ton of changes that most people never notice. For example, Ben points out that we know “Arod” and “Alex Rodriguez” can be the same thing. Instead of hitting people over the head with that, we just subtly highlight the words “Alex Rodriguez” if you search for [arod]:

A-Rod
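Ben’s post doesn’t say how Google implements this, so here is only a toy sketch of the idea: given a query and a synonym table, bold the query term or any known synonym inside a result snippet. The `SYNONYMS` table and the `highlight` function are hypothetical; a real system would learn such equivalences rather than hard-code them.

```python
import re

# Hypothetical, hand-written synonym table (illustration only).
SYNONYMS = {"arod": ["alex rodriguez", "a-rod"]}

def highlight(snippet: str, query: str) -> str:
    """Wrap the query term, or any known synonym of it, in <b> tags."""
    terms = [query] + SYNONYMS.get(query.lower(), [])
    # Try longer terms first so a multi-word synonym wins over a shorter overlap.
    pattern = "|".join(re.escape(t) for t in sorted(terms, key=len, reverse=True))
    return re.sub(pattern, lambda m: f"<b>{m.group(0)}</b>", snippet, flags=re.IGNORECASE)

print(highlight("Alex Rodriguez stats and news", "arod"))
# <b>Alex Rodriguez</b> stats and news
```

Matching is case-insensitive, but the original capitalization from the snippet is preserved inside the tags.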

Get the skinny over on Ben’s post to read more about how Google thinks about our search interface. Ben, thanks for writing this. I’m glad that several search quality folks are working to be more open about how Google works and how Google thinks about search quality.


New Toolbar PageRanks coming

Hey folks, I wanted to let you know that new toolbar PageRank values should become visible over the next few days. I also expect that in the next few days we’ll be expiring some older penalties on websites.


Get your search fix with two videos

I was going to wait until part 2 was posted, but I’ll point people to part 1 now. The video from the SMX Advanced keynote is now live, so you can watch the first 25 minutes of questions and answers. Read the intro here, or just watch the video:

And Juliane Stiller from Google’s German Webmaster blog stopped by the Googleplex for a more fun interview. Read the intro in English or German or just watch the video below:

Thanks for setting this up, Juliane! Note to self: wear a different shirt for my next SEO video interview. I happened to wear the same polo shirt for both interviews. :)


Generic Toolbar Indexing Debunk Post

Sometimes people think that the Google Toolbar led to Google indexing a page. Here’s a recent such story, for example, which speculates about how urls with the substring “mms2legacy” got indexed. Here’s where I started to disagree:

The reason for this [supposedly unlisted urls getting crawled --Matt], explained Ken Simpson, CEO of anti-spam company MailChannels, is that one’s Google Toolbar may be configured to pass URLs that one visits to Google for indexing. “If you run Google Toolbar, it knows pages you visit,” he said.

Sorry, but if Ken Simpson is implying that the Google Toolbar led to these urls being crawled, then he’s mistaken. Let’s take the first result from the [inurl:mms2legacy] query given in the article. The first url in that result set that I saw was http://mediamessaging.o2.co.uk/mms2legacy/showMessage2.do?encMmsId=F1ABCF6D326A3F65. Well, if you take the string F1ABCF6D326A3F65 from that url and search for it, you’ll find multiple references to that url. In the cases I looked into, we found these pages via someone publishing a link on http://my.opera.com or other places around the web. I can definitively say that all the urls I looked into were discovered via crawling regular old links.
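The tracing trick described above (pull the unique ID out of the url, then search for that ID to find pages that link to the url) can be sketched in a few lines of Python. The helper name `unique_token` is hypothetical; it simply reads the `encMmsId` query parameter from the example url in the post.

```python
from typing import Optional
from urllib.parse import urlparse, parse_qs

def unique_token(url: str, param: str = "encMmsId") -> Optional[str]:
    """Return the first value of `param` in the url's query string, or None."""
    values = parse_qs(urlparse(url).query).get(param)
    return values[0] if values else None

url = "http://mediamessaging.o2.co.uk/mms2legacy/showMessage2.do?encMmsId=F1ABCF6D326A3F65"
token = unique_token(url)
print(token)  # F1ABCF6D326A3F65
# Searching the web for this token surfaces the pages that published the link.
```

Because the ID is essentially unique, any page that mentions it is very likely a page that linked to the url, which is how a crawler would have found it.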

Folks with great memories may remember that I’ve talked about this before. Back in 2006, both Philipp Lenssen and Google OS did controlled experiments by visiting unlinked deep pages with the toolbar, and both concluded that the toolbar did not lead to those urls being indexed.

It’s good to reiterate this every couple of years, though, especially as Google has gotten better at finding new pages as it crawls. We get questions like this often enough that we have an FAQ answer about it:

Why is Googlebot downloading information from our “secret” web server?

It’s almost impossible to keep a web server secret by not publishing any links to it. As soon as someone follows a link from your “secret” server to another web server, your “secret” URL may appear in the referrer tag and can be stored and published by the other web server in its referrer log. So, if there’s a link to your “secret” web server or page on the web anywhere, it’s likely that Googlebot and other web crawlers will find it.
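To make the referrer leak concrete: when a visitor clicks from a “secret” page to another site, the browser sends the secret url in the Referer header, and the other site records it in its access log. If that site publishes its log or stats, the secret url becomes an ordinary public link. The sketch below assumes the common Apache/NCSA “combined” log format; the hostnames and log lines are made up for illustration.

```python
import re

# Combined log format: host ident user [time] "request" status bytes "referer" "user-agent"
COMBINED = re.compile(
    r'^(?P<host>\S+) \S+ \S+ \[[^\]]+\] "[^"]*" \d{3} \S+ "(?P<referer>[^"]*)" "[^"]*"'
)

def referrers(log_lines):
    """Yield every non-empty Referer value recorded in the log lines."""
    for line in log_lines:
        m = COMBINED.match(line)
        if m and m.group("referer") not in ("", "-"):
            yield m.group("referer")

# Hypothetical log on some *other* web server that a visitor clicked through to.
log = [
    '203.0.113.7 - - [10/Jul/2008:13:55:36 -0700] "GET /article.html HTTP/1.1" 200 2326 '
    '"http://secret.example.com/private/plan.html" "Mozilla/5.0"',
    '198.51.100.4 - - [10/Jul/2008:13:56:02 -0700] "GET / HTTP/1.1" 200 512 "-" "Mozilla/5.0"',
]

for ref in referrers(log):
    print(ref)  # http://secret.example.com/private/plan.html
```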

Security through obscurity is not a great way to keep a url from being crawled. If you don’t want your content in Google’s web index, we provide a ton of advice on how to prevent that content from getting into Google.
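One of the standard options is a robots.txt file, which asks well-behaved crawlers not to fetch certain paths (a noindex robots meta tag and password protection are others). The sketch below uses Python’s standard `urllib.robotparser` to check a hypothetical robots.txt for example.com. Note that robots.txt is itself publicly readable, so it lists paths rather than hiding them.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt asking all crawlers to stay out of /secret/.
ROBOTS_TXT = """\
User-agent: *
Disallow: /secret/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

for u in ("https://example.com/secret/report.html",
          "https://example.com/public/index.html"):
    print(u, parser.can_fetch("Googlebot", u))
# https://example.com/secret/report.html False
# https://example.com/public/index.html True
```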


Generic Malware Debunking Post

Yup, I’m about to do another blog post where someone says that a website is clean but it doesn’t look that way to us. I did a very similar post in January 2007, and in that post I said:

I’ve checked out quite a few “we don’t have any malware” reports at this point, and I’ve yet to see a false positive: the sites in question have each had some malware on them.

Would you believe that a year and a half later, that’s still true for me? It’s possible that our malware flagging system has false positives, but I can’t recall a single case I’ve seen where there wasn’t some security hole or malware that was a true issue for the website owner. If you want to know why, read Google’s white paper about how we detect such stuff. It’s called “The Ghost in the Browser: Analysis of Web-based Malware,” and it was written by Niels Provos and several other Googlers.

In fact, just last week I handled a very similar case where Google proactively reached out to a website that had a scripting security flaw. The deja vu from my January 2007 post plus the situation last week made me want to write a generic malware debunking post. :) Are you ready? Here we go:

$ACCUSER = Brett Glass
$FORUM = Dave Farber’s Interesting People mailing list, specifically this email.
$LONG_ACCUSATION = (I’m going to quote Brett’s whole email here, just for context)

Everyone:

Google has been a strong supporter of the agenda of Free Press, an
inside-the-Beltway lobbying group which has spent hundreds of
thousands of dollars lobbying for regulation of the Internet under
regime known as “network neutrality.” While some of the tenets
included in this agenda are not reasonable, one of those that IS
reasonable is the notion that large corporations such as Comcast
should not block content with which they disagree.

However, Google -- itself a large corporation -- appears to be
blocking a site which expresses opinions with which it does not
agree on this very issue. When one does a search for the terms
“neutrality” and “site:pff.org” (the link

http://www.google.com/search?hl=en&q=neutrality+site%3Apff.org&btnG=Google+Search

will perform this search for you), many of the pages and documents
on the site -- in particular, white papers expressing views with
which Google disagrees -- are tagged with a warning that “This site
may harm your computer.” One cannot click through to the documents
and pages in Google’s search results without cutting the URL from
the page and manually pasting it into one’s browser.

The Web site, operated by a group known as the “Progress and
Freedom Foundation,” does not appear to contain any malware. When
one queries Google as to why the site was blacklisted, it claims
that “Part of this site was listed for suspicious activity 1
time(s) over the past 90 days.” Yet, we could find no malware or
other exploits in the blacklisted PDF files, some of which contain
very well presented and cogent arguments against the agenda which
Google has been actively supporting.

Could it be that Google (whose motto is, reportedly, “Don’t be
evil,”) saying, “Do as I say, not as I do?”

--Brett Glass

P.S. -- What’s especially interesting is that if one queries Google
using just the term, “site:pff.org” (you can use the link

http://www.google.com/search?hl=en&q=site%3Apff.org&btnG=Search

to do this query), one can see that the majority of the supposedly
dangerous site is not blocked. But most or all of the documents
expressing viewpoints on “network neutrality” are.

$SHORT_ACCUSATION = “Google blocked a site with opinions that it disagrees with. Worse, the query [site:pff.org] seems to show that only urls under pff.org/issues-pubs/ are labeled as potentially harmful, and that is the directory where many of the documents that disagree with Google are.”

Given what we have so far, my generic debunking would begin something like this: “Dear $ACCUSER, I saw on $FORUM where you mentioned that Google is flagging a website as malware. You said that $SHORT_ACCUSATION. I wanted to give you a little more background and context to let you know that Google did see an actual malware attack via a real security hole. The other thing you need to know is that Google flagged the site because of the security hole, not because Google agrees or disagrees with any particular content on the site.”
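The $VARIABLE notation in the generic reply happens to match Python’s standard `string.Template` syntax, so the “generic debunking post” can literally be filled in. This is a playful sketch; the template is shortened from the reply in the post, and the field values are abbreviated from the $ACCUSER, $FORUM, and $SHORT_ACCUSATION definitions.

```python
from string import Template

# Shortened version of the generic reply, using the post's $PLACEHOLDERS.
REPLY = Template(
    "Dear $ACCUSER, I saw on $FORUM where you mentioned that Google is "
    "flagging a website as malware. You said that $SHORT_ACCUSATION."
)

fields = {
    "ACCUSER": "Brett Glass",
    "FORUM": "Dave Farber's Interesting People mailing list",
    "SHORT_ACCUSATION": "Google blocked a site with opinions that it disagrees with",
}

# substitute() raises KeyError if any placeholder is left unfilled.
print(REPLY.substitute(fields))
```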

Then I’d give a little background history on all the different ways that Google helps users and webmasters avoid malware. Most of the background would come from this overview post. Since that post was published in mid-2007, Google has done even more to protect users:

- Niels Provos and his colleagues published another technical report with more details about the malware detection framework and what it discovered (more info here).

- Google launched a Safe Browsing API so that third party applications can benefit from Google’s list of malware and phishing urls. If you appreciate that Firefox 3 has better security, one of the reasons is that Firefox 3 utilizes the Safe Browsing API.

- More recently, the anti-malware folks at Google launched a Safe Browsing Diagnostic page where you can enter a url and get a ton of really useful information.
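The Safe Browsing API has gone through several versions since this post was written, so the following is only a sketch against the later v4 Lookup API, not the interface Firefox 3 used. It builds the JSON body for a `threatMatches:find` request without sending it; `YOUR_API_KEY`, the client ID, and the client version are placeholders, and the example url is the flagged directory discussed in this post.

```python
import json

# v4 Lookup API endpoint; YOUR_API_KEY is a placeholder for a real key.
ENDPOINT = "https://safebrowsing.googleapis.com/v4/threatMatches:find?key=YOUR_API_KEY"

def lookup_body(urls, client_id="example-client", client_version="0.1"):
    """Build the JSON body for a threatMatches:find request (no network call)."""
    return {
        "client": {"clientId": client_id, "clientVersion": client_version},
        "threatInfo": {
            "threatTypes": ["MALWARE", "SOCIAL_ENGINEERING"],
            "platformTypes": ["ANY_PLATFORM"],
            "threatEntryTypes": ["URL"],
            "threatEntries": [{"url": u} for u in urls],
        },
    }

body = lookup_body(["http://www.pff.org/issues-pubs/"])
print(json.dumps(body, indent=2))
# A client would POST this JSON to ENDPOINT; a response without a "matches"
# field means none of the urls matched a Safe Browsing list.
```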

The last one is especially impressive. For example, check out the Safe Browsing Diagnostic page for pff.org:

Safe browsing page for pff.org

That page gives a ton of helpful info to site owners and anyone else who is interested in why a particular site or url was flagged as potentially harmful.

All that would go quite far toward answering people who had questions about their site being flagged for malware. But this post is getting quite long, so let’s get back to this specific report. The original person who reported this situation had already noticed that not all of pff.org was flagged. If you do a site: query on Google, you only see warnings for pff.org/issues-pubs/.

If you visit pff.org/issues-pubs/, you’ll see that it’s a web form. It looks like pff.org stored their data in a SQL database but didn’t correctly sanitize/escape input from users, which led to a SQL injection attack where regular users got exposed to malicious code. As a result, normal users appear to have loaded urls like hxxp://www.ausbnr .com/ngg.js and hxxp://www.westpacsecuresite .com/b.js <--- Don’t go to urls like this unless you are 1) a security researcher or 2) want to infect your machine. Notice that even in this case, Google didn’t flag the entire pff.org site, just the one directory on the site that appeared to be dangerous for users.
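The underlying flaw, building SQL by pasting user input into the query string, can be illustrated with Python’s built-in `sqlite3`. This is a minimal sketch with a hypothetical `pubs` table, not pff.org’s actual code; in their case the attack apparently planted references to malicious scripts in data served to visitors, while this example only shows how unescaped input changes a query’s meaning and how a parameterized query prevents it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE pubs (title TEXT)")
conn.executemany("INSERT INTO pubs VALUES (?)",
                 [("Net neutrality paper",), ("Spectrum policy paper",)])

def search_unsafe(term):
    # VULNERABLE: user input is pasted directly into the SQL text.
    sql = f"SELECT title FROM pubs WHERE title = '{term}'"
    return [row[0] for row in conn.execute(sql)]

def search_safe(term):
    # Parameterized query: the driver passes `term` as data, never as SQL.
    return [row[0] for row in conn.execute("SELECT title FROM pubs WHERE title = ?", (term,))]

attack = "' OR '1'='1"
print(search_unsafe(attack))  # every row: the injected OR clause is always true
print(search_safe(attack))    # []: the input is treated as a literal title
```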

I never like it when people accuse Google of flagging a site as malware just because we don’t like it for some reason. The bright side of this incident is that pff.org will find out about a security hole on their site that was hurting their users (it looks like pff.org has disabled the search on the vulnerable page in the last few hours, so it appears that they’re responding quickly to this issue). Flagging malware on the web doesn’t earn any money for Google, but it’s clearly a Good Thing for users and for the web. I’m glad we do it, even if it means that sometimes we have to write a generic malware post to debunk misconceptions.

