An easy way to add new features to Google

Have you ever wanted to add a new feature to Google’s search results? There’s a really nice way to do it right now. If you’re not familiar with this functionality, it’s called a Subscribed Link, and it lets you “create custom search results that users can add to their Google search pages. You can display links to your services for your customers, provide news and status information updated in near-real-time, answer questions, calculate useful quantities, and more.” That page has a whole list of different ways to add new features to Google’s search results:

* Create search results specific to your product, service, or expertise.
* Design a basic version in minutes to see how it works.
* Build a dynamic version using XML, TSV, or RSS files or feeds.
* Include images in your Subscribed Links.
* Include Google Gadgets in your Subscribed Links.
* Test your Subscribed Links interactively and get debugging messages.
* Define query patterns using lists of keywords or regular expressions.
* Invoke the calculator to help construct your results.

I like that Google provides an open system to add functionality to our search results. If this sounds interesting to you, check out this blog post by Google OS (an unofficial blog), read through the subscribed links developer guide, or check out the Subscribed link FAQ.

Let’s walk through an example. I often need to know what my IP address is. Usually I go to Google, search for [ip address], and click on one of the top results. That works okay, but I discovered that there’s an even easier way. Go to this page and click on the “Subscribe” button.

Now when you go to Google and type a query like [my ip], you’ll see the answer right in your search results, like this:

Find my ip address

I painted out my actual IP address, but you get the idea. Now if only aruljohn.com would add the query [ip address] to the list of queries that trigger the subscribed link, I could be lazy and keep doing the query that I’m used to. :)
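To make the idea of query patterns a bit more concrete, here’s a rough Python sketch of matching a search against a list of keywords plus a regular expression. This is not the actual Subscribed Links syntax; the pattern lists and the `triggers` function are made up purely for illustration.

```python
import re

# Hypothetical trigger patterns, for illustration only.
# This is NOT the real Subscribed Links file format.
KEYWORD_PATTERNS = ["my ip", "what is my ip"]
REGEX_PATTERNS = [re.compile(r"(what is |find )?my ip( address)?")]


def normalize(query):
    """Lowercase the query and collapse runs of whitespace."""
    return " ".join(query.lower().split())


def triggers(query):
    """Return True if the query matches a keyword or a regex pattern."""
    q = normalize(query)
    if q in KEYWORD_PATTERNS:
        return True
    # fullmatch requires the whole query to match, not just a prefix.
    return any(p.fullmatch(q) for p in REGEX_PATTERNS)


if __name__ == "__main__":
    for query in ["my ip", "Find my IP address", "ip address"]:
        print(query, "->", triggers(query))
```

With these made-up patterns, [my ip] triggers but [ip address] doesn’t, which is exactly the gap I was grumbling about.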

If you’d like to add some new functionality to Google, why not try it for yourself today? I made a simple subscribed link that looks like this:

Example subscribed link

in about a minute. It looks like you can make a subscribed link out of feeds very quickly, and you can even add your own flexible gadget to Google’s search results. Here’s one:

Example gadget in search results

By the way, I originally wrote this post a little while ago focusing on how to find out your IP address with a specific subscribed link. After Yahoo announced their “SearchMonkey” project tonight (congrats to the Yahoo folks!), I figured I’d add in some details about Google’s Subscribed Links and how to make a rich snippet result using Subscribed Links.

What should NOINDEX do?

Okay, this post will be colossally boring to some people. But I wanted to give you a peek at debates behind the curtain in Google’s search quality group. Here’s a policy discussion about NOINDEX and how Google should treat the NOINDEX meta tag. First, you’ll want to read this post about how Google handles the NOINDEX meta tag. You may also want to watch this video about how to remove your content from Google or prevent it from being indexed in the first place. Here’s the conclusion from my earlier blog post:

So based on a sample size of one page, it looks like search engines handle the “NOINDEX” meta tag:
- Google doesn’t show the page in any way
- Ask doesn’t show the page in any way
- MSN shows a url reference and Cached link, but no snippet. Clicking the cached link doesn’t return anything.
- Yahoo! shows a url reference and Cached link, but no snippet. Clicking on the cached link returns the cached page.

The question is whether Google should completely drop a NOINDEX’ed page from our search results, show a reference to the page, or do something in between. Let me lay out the arguments for each:

Completely drop a NOINDEX’ed page

This is how Google has behaved for the last several years, and webmasters are used to it. The NOINDEX meta tag gives a good way, in fact one of the only ways, to completely remove all traces of a site from Google (another way is our url removal tool). That’s incredibly useful for webmasters. The only corner case is that if Google sees a link to a page A but doesn’t actually crawl the page, we won’t know that page A has a NOINDEX tag and we might show the page as an uncrawled url. There’s an interesting remedy for that: currently, Google allows a NOINDEX directive in robots.txt and it will completely remove all matching site urls from Google. (That behavior could change based on this policy discussion, of course, which is why we haven’t talked about it much.)

Webmasters sometimes shoot themselves in the foot by using NOINDEX, but if a site’s traffic from Google is very low, the webmaster will be motivated to diagnose the issue themselves. Plus we could add a NOINDEX check into the webmaster console to help webmasters self-diagnose if they’ve removed their own site with NOINDEX. The NOINDEX meta tag serves a useful role that’s different than robots.txt, and the tag is far enough off the beaten path that few people use the NOINDEX tag by mistake.
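If you want to self-diagnose in the meantime, a check like that is easy to approximate. Here’s a minimal Python sketch, using only the standard library, that looks for a robots meta tag containing “noindex” in a page’s HTML. The class and function names are my own, and this is only a rough stand-in for what a real crawler or webmaster console would do.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collect the directives from <meta name="robots" content="..."> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        # Attribute values can be None (e.g. <meta name>), so default to "".
        attr = {name: (value or "") for name, value in attrs}
        if attr.get("name", "").lower() == "robots":
            for token in attr.get("content", "").split(","):
                self.directives.add(token.strip().lower())


def has_noindex(html):
    """Return True if the HTML has a robots meta tag with 'noindex'."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return "noindex" in parser.directives


if __name__ == "__main__":
    page = '<html><head><meta name="robots" content="NOINDEX, nofollow"></head></html>'
    print(has_noindex(page))
```

The check is case-insensitive, because webmasters write the tag as NOINDEX, noindex, and everything in between.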

Show a link/reference to NOINDEX’ed pages

Our highest duty has to be to our users, not to an individual webmaster. When a user does a navigational query and we don’t return the right link because of a NOINDEX tag, it hurts the user experience (plus it looks like a Google issue). If a webmaster really wants to be out of Google without even a single trace, they can use Google’s url removal tool. The numbers are small, but we definitely see some sites accidentally remove themselves from Google. For example, if a webmaster adds a NOINDEX meta tag while finishing a site and then forgets to remove the tag, the site will stay out of Google until the webmaster realizes what the problem is. In addition, we recently saw a spate of high-profile Korean sites not returned in Google because they all have a NOINDEX meta tag. If high-profile sites like

- http://www.police.go.kr/main/index.do (the National Police Agency of Korea)
- http://www.nmc.go.kr/ (the National Medical Center of Korea)
- http://www.yonsei.ac.kr/ (Yonsei University)

aren’t showing up in Google because of the NOINDEX meta tag, that’s bad for users (and thus for Google).

Some middle ground in between

The vast majority of webmasters who use NOINDEX do so deliberately and use the meta tag correctly (e.g. for parked domains that they don’t want to show up in Google). Users are most discouraged when they search for a well-known site and can’t find it. What if Google treated NOINDEX differently if the site was well-known? For example, if the site was in the Open Directory, then show a reference to the page even if the site used the NOINDEX meta tag. Otherwise, don’t show the site at all. The majority of webmasters could remove their site from Google, but Google would still return higher-profile sites when users searched for them.
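Spelled out as a decision rule, the middle-ground proposal might look like the Python sketch below. The `Treatment` names and the `is_well_known` flag are hypothetical; in the example above, “well-known” would mean something like “listed in the Open Directory.” This is just the policy idea written as code, not anything Google actually runs.

```python
from enum import Enum


class Treatment(Enum):
    SHOW_NORMALLY = "show the normal result"
    SHOW_REFERENCE = "show a url reference only"
    DROP = "drop the page completely"


def middle_ground_treatment(has_noindex, is_well_known):
    """Hypothetical policy: only well-known NOINDEX'ed sites keep a reference."""
    if not has_noindex:
        return Treatment.SHOW_NORMALLY
    if is_well_known:
        # e.g. the site is listed in the Open Directory
        return Treatment.SHOW_REFERENCE
    return Treatment.DROP


if __name__ == "__main__":
    print(middle_ground_treatment(has_noindex=True, is_well_known=True))
    print(middle_ground_treatment(has_noindex=True, is_well_known=False))
```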

What do you think?

That’s the internal discussion that we’ve been having about NOINDEX meta tags. Now I’m curious what you think.

I’d also be interested in (constructive) suggestions in the comments about how Google should treat the NOINDEX meta tag. Try to step into both a regular user’s shoes as well as the position of a site owner before leaving a comment.

Blogger Play

http://play.blogger.com/ is really addictive. It’s a slideshow of pictures that are currently being uploaded to Blogger. I remember the first time I realized that the nightly TV news would never play re-runs; if you missed the show that night, you wouldn’t see it again. This new feature has a similar feel: there’s a river of pictures flowing up to Blogger, and if you aren’t watching, you might miss gems.

Googlified points out that if you use blogspot.com and don’t want to participate, it’s easy to opt out, and also that Flickr has something similar. The Flickr slideshow has a lot to recommend it, including a row of thumbnail previews at the bottom and the ability to choose tags to view. On the fastest setting of each, Play shows photos 3-4x faster than Flickr’s slideshow, but they’re both cool.

The only downside I’ve noticed is that (at least on my XP machine using Firefox) Play seems to eat up memory and never free it, so don’t be surprised if Firefox crashes (it could be one of my browser extensions, of course). Internet Explorer seems fine.

Fun stuff. When lots of people are uploading pictures or willing to label items in a photo, you can do some pretty amazing things. Check out two SIGGRAPH papers from 2007: one uses Flickr to remove or change parts of a picture (see below), while the other lets you insert new photo objects into an image.

Original picture Doctored picture

If you’re not familiar with SIGGRAPH, check out some of the video highlights from the 2007 program. I especially enjoyed the image resizing demo at around 1 minute, 43 seconds into the video. I wish YouTube let you create bookmarks at a specific point in a video like you can with Google Video, but the whole video is fun to watch.

My speaking schedule for early 2008

SES London was really fun last year, but in 2008 I’m mostly staying closer to home. If you’re attending SES London this week, please say hello to Adam Lasnik and Luisella Mazza. It also looks like Maile Ohye, Shuman Ghosemajumder, and several other Googlers will be speaking as well.

Here are the places I plan on speaking in the first half of 2008:

February 26-28, 2008: Next week I will be doing a couple sessions for SMX West, including a Linking Q&A panel. What sounds even more fun is the trivia-quiz “Search Bowl” that I’m doing with fellow Googler Paul Haahr. It sounds like a fun, casual event to answer search trivia questions. Other search engines will field teams, plus an SEO team will play as well.

April 18-21, 2008: I’m speaking at the Domain Roundtable conference in San Francisco. I couldn’t pass up a chance to attend a conference about domain names when it’s so close to home. :)

April 22-25, 2008: I’ll speak or be on a panel at the Web 2.0 Expo, also in San Francisco. Maybe I’ll just stay in San Francisco that week and work from Google’s San Francisco office.

June 3-4, 2008: This is more tentative, but I’m hoping to make it to SMX Advanced in Seattle. Considering that until last year I’d never visited Seattle, I’ve enjoyed every time that I’ve gotten a chance to visit the city.

You’ll note that most of these conferences are nearby. The wonderful thing about ramping up new conference speakers in webspam is that if I want to ramp down on travel, newer speakers can more than handle the task. I’m really proud of the speakers that we send from Google.

How 404 pages work in Google Toolbar Beta 5

I thought I’d play hooky from a meeting and talk about how the newest version of the Toolbar handles 404 pages for users, because I see some people writing about it this morning.

We tried to give a heads-up in a couple places. The Toolbar beta 5 announcement on the Google blog mentioned “You’ll get suggestions instead of error pages: If you mistype a URL or a page is down, now the Toolbar will give you that familiar ‘Did you mean’ with alternatives, like when you do a Google search.” And John Mueller did an excellent run-down for webmasters when he talked about the Google toolbar beta on Google’s official webmaster blog. Here’s the part of John’s post that probably interests you:

404 errors with default error pages
When a visitor tries to reach your content with an invalid URL and your server returns a short, default error message (less than 512 bytes), the Toolbar will suggest an alternate URL to the visitor. If this is a general problem in your website, you will see these URLs also listed in the crawl errors section of your Webmaster Tools account.

If you choose to set up a custom error page, make sure it returns result code 404. The content of the 404 page can help your visitors to understand that they tried to reach a missing page and provides suggestions regarding how to find the content they were looking for. When a site displays a custom error page the Toolbar will no longer provide suggestions for that site. You can check the behavior of the Toolbar by visiting an invalid URL on your site with the Google Toolbar installed.

So if you’re a webmaster and want users to see your custom 404 page, just make sure the page is at least 512 bytes long. I do think that this feature is really handy for most users. Let me give some screenshots to demonstrate what it looks like.
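If you’d like to sanity-check your own error page against that rule, here’s a small Python sketch. The 512-byte threshold comes from John’s post; the function names are my own, and the assumption that only the status code and body size matter is a simplification of whatever the Toolbar really does.

```python
# Threshold taken from the quoted webmaster blog post ("less than 512 bytes").
TOOLBAR_THRESHOLD_BYTES = 512


def toolbar_may_replace(status_code, body):
    """Guess whether the Toolbar would show suggestions instead of this page.

    Assumes (simplification) that only the 404 status and body size matter.
    `body` is the raw response body as bytes.
    """
    return status_code == 404 and len(body) < TOOLBAR_THRESHOLD_BYTES


def check_error_page(status_code, html):
    """Return a list of problems with a custom error page given as a string."""
    problems = []
    if status_code != 404:
        problems.append("error page should return result code 404")
    size = len(html.encode("utf-8"))  # measure bytes, not characters
    if size < TOOLBAR_THRESHOLD_BYTES:
        problems.append(f"page is only {size} bytes; the Toolbar may replace it")
    return problems


if __name__ == "__main__":
    print(toolbar_may_replace(404, b"Not Found"))
    print(check_error_page(200, "<h1>Oops</h1>"))
```

Note that the size check counts bytes after encoding, so a page with lots of non-ASCII text can clear the threshold with fewer characters.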

I installed the Toolbar Beta 5 for Internet Explorer and surfed to a 404 page on mattcutts.com, and I see this:

My 404 page is more than 512 bytes

My 404 page, while not that useful, is more than 512 bytes long, so the toolbar doesn’t change the page.

I had to look around a little bit to find a default 404 page. My former grad school has one, so surfing to a 404 page like http://www.cs.unc.edu/~sadasdf normally looks like this (in Firefox):

A default 404 page

With the toolbar installed, I get this page:

The toolbar version of the 404 page

There are a few things I would point out:

- The first several links all provide ways to navigate or search unc.edu. I’m offered the option to go to www.unc.edu, or www.cs.unc.edu, or to search on www.cs.unc.edu for some words.
- Note that the toolbar took my nonsense phrase “sadasdf” and segmented that phrase into a more useful phrase “sad asdf” to search for. For “mattcutts” it suggested “matt cutts” and for “mygoodpage” it suggested “my good page”. That’s really helpful for a non-savvy user because it offers a search which may uncover the information that the user is looking for.
- There is a “Why am I seeing this page?” link.
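I don’t know exactly how the toolbar does its segmenting, but the general technique is easy to sketch. Here’s a minimal Python example that uses dynamic programming to split a run-together string into the fewest words from a vocabulary. The tiny vocabulary (including treating “asdf” as a word) is made up so the examples above work; a real system would use a large dictionary with word frequencies.

```python
def segment(text, vocabulary):
    """Split text into the fewest vocabulary words, or return None.

    best[i] holds the best segmentation found for text[:i].
    """
    n = len(text)
    best = [None] * (n + 1)
    best[0] = []
    for end in range(1, n + 1):
        for start in range(end):
            if best[start] is None:
                continue  # text[:start] cannot be segmented
            word = text[start:end]
            if word in vocabulary:
                candidate = best[start] + [word]
                if best[end] is None or len(candidate) < len(best[end]):
                    best[end] = candidate
    return best[n]


# Hypothetical toy vocabulary, just large enough for the examples.
VOCABULARY = {"matt", "cutts", "my", "good", "page", "sad", "asdf"}

if __name__ == "__main__":
    for phrase in ["sadasdf", "mattcutts", "mygoodpage"]:
        words = segment(phrase, VOCABULARY)
        print(phrase, "->", " ".join(words) if words else "(no segmentation)")
```

Preferring the fewest words is a crude tie-breaker; weighting candidates by how common each word is would give better results on real queries.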

If you click on the “Why am I seeing this page?” link, you get a page with more info, including how to turn the feature off:

Instructions to disable the 404 page

I counted and it was three mouse clicks (click on a picture of a wrench, click to uncheck a box, click to save) to turn off the feature. Try to load a non-existent page, and I’m back to the standard 404 page that IE gives:

The standard IE 404 page

So my short summary is:
- If you’re a user and you don’t want help with 404 pages, it’s very easy to turn off just this feature (or don’t install the Google toolbar).
- If you’re a webmaster, customized 404 pages should continue to work fine. If you want to be sure that users see your 404 page, make it 512 bytes or longer.

Bonus tip: Most of the people that read my blog use Firefox instead of Internet Explorer. If you want some similar functionality on Firefox, I like to use the ErrorZilla extension. It’s a handy little plug-in that gives you error pages like this:

Example ErrorZilla page

I find the ErrorZilla plug-in really useful, even as a power user.
