Hacking Google: Retro Links Revives Old Google Feature

Google sometimes turns off features. One such feature that I remember fondly is that at the bottom of Google’s search results, we offered nine other search engine suggestions. The idea was if you didn’t find what you were searching for on Google, you could click on the other links and easily run the same search somewhere else. Luckily, due to an April Fool’s joke about the Mentalplex, you can still see what these links looked like:

Other search engines in 2000

Many of these search engines consolidated or changed focus over time. Plus I’m guessing that every search engine in the world wanted to be on the list, which must have been really annoying for whichever Google person had to maintain that list of links. I think the list of other search engines dwindled down and eventually Google just turned the feature off.

Recently I was describing this feature to Tiffany Lane, another engineer at Google, and she had a great idea. Why not recreate this search feature on Google with modern search engines and websites? Because of the pain of maintaining an “official” list, we probably couldn’t turn this on for every user (plus not every user wants a lot of extra links added to their search results). But why not provide a completely unofficial option that people could install?

Thus was born Retro Links, which is a Greasemonkey script to add new search options to Google’s search results page. When Retro Links is installed, it looks like this:

Retro Links

Unlike the original feature, Retro Links lets you select which search engines to show from 42 different websites and search engines, then saves those preferences. It’s also very easy to add a new search engine in the JavaScript file.

Installation

Get Retro Links here: http://www.mattcutts.com/retrolinks/retrolinks.user.js

To install Retro Links you will need to be using Firefox and have Greasemonkey installed. Once Greasemonkey is running then you can click on the link above and you will be prompted to install the script. To see that it is working you can do a Google search – the links will be inserted near the bottom of the Google search results page.

Configuration

Configuring Retro Links is really simple. Suppose that you want to change the default Amazon link to search on Yelp. Just click on the [+] link to the right of the search engines and select Yelp from the drop-down box:

Changing search engines in Retro Links

When you’re happy with your search engines, click the “Save” button to save your preferences.

Questions

What if the site I want is not one of the 42 options?
You can add more sites to the options by making a simple code change. To edit the code go to Tools -> Greasemonkey -> Manage User Scripts, select Retro Links and click the Edit button. Simply add the name and url of the new site to the RL_LINK_OPTIONS array, following the examples that are already there.

How do I turn on/off the update notification?
The script checks to see if a newer version is available once per day. If an update is available a red box will appear in the bottom right corner of the page with a link to download the latest version. If you want to stop checking for updates go to Tools -> Greasemonkey -> User Script Commands and select Retro Links -> Never check for updates. To start checking again select Retro Links -> Check for updates daily.

Disclaimer

Finally, Tiffany wanted to make sure that I included this quick disclaimer: “Retro Links is not an official Google project. I chose which links to include based on my personal preferences and web surfing habits. These decisions do not represent the opinions of my employer.” Tiffany, thanks for writing this great script!

Wordle for Nancy Grace TV Show

I recently completed a huge “Media and Journalism” project: I transcribed over 500 hours of the Nancy Grace TV show on CNN. It took a long time, but here is an exhaustive Wordle tag cloud of all the words used in those 500+ hours of television:

It's all about Caylee

It appears that all 500+ hours consist of those six words repeated over and over in different combinations.

I’m not serious and didn’t really watch or transcribe 500 hours of Nancy Grace. In reality, Nancy Grace uses more and different words. It only feels like every sentence is “little Caylee” or something similar.

How to Write a Chrome Extension in Three Easy Steps

I just installed a “hello world” Chrome extension from this Chrome Extension tutorial page. When you surf to www.google.com, the Google logo is replaced with a Lolcat:

Chrome Extension

Here’s how to write your own Google Chrome extension in three steps:

1. Install the developer-channel version of Google Chrome. I don’t know if this is 100% necessary, but new support for plugins will probably show up in the developer version first. You can read instructions on how to switch to the developer version. It takes maybe 3-4 minutes; you basically run a small program to indicate your preference. In case you’re worried that the developer version will crash a lot: I’ve been running the developer version for months and haven’t seen any major issues. The developer version also gets new features (such as pressing “F11” to get full-screen mode) way before the beta/stable releases of Chrome. I’m using version 2.0.170.0 of Chrome and the “hello world” extension worked fine for me.

2. Read the initial documentation. This is a brand-new feature, but you can already start hacking. Extensions currently have very Greasemonkey-like functionality: you identify which web pages should be modified, plus JavaScript to be added to those pages. By default, the extension’s JavaScript runs after the page loads, but you can specify that the extension’s JavaScript should run before the page loads. Right now, you can only load one JS file, but that could change in the future. You also can’t currently load Cascading Style Sheets (CSS), but that might also change.

I like several things about the extension framework:
- Your plugin has to have a unique identifier (a 40-digit hexadecimal number). Given an identifier such as “00123456789ABCDEF0123456789ABCDEF0123456”, an extension can include an image such as foo.gif and then easily access that image by using a full path such as “chrome-extension://00123456789ABCDEF0123456789ABCDEF0123456/foo.gif”
- The “content script” (the JavaScript of an extension) gets its own global scope separate from the web page, so you don’t need to worry about global variables conflicting. But you can still get access to the web page’s global variables using the “contentWindow” variable.
- Bundling your extension directory into a “.crx” Chrome Extension file is as simple as running a short Python script.
- Chrome also supports binary NPAPI (Netscape Plugin Application Programming Interface) plugins.

The Chrome extension manifest, which has metadata about your extension such as name, version, etc., looks much simpler to me than how Firefox wants extensions to be packaged. That’s a big plus in my book, because you spend most of your time writing code and not worrying about packaging up your plug-in. On the down side, I didn’t see any support for internationalization, which is one of the benefits of Firefox’s more comprehensive way of packaging up plugins. Another limitation of the current Chrome extension spec is that you can’t do much other than modify pages via JavaScript. And I didn’t see a way to introduce new widgets into the actual “chrome” of the Chrome browser.

3. Try it out! If you’re running the developer version of Chrome, you can install the “hello world” plugin from the extension howto page just by clicking to download the .crx file. Then type “chrome-ui://extensions/” and you’ll see something like this:

Chrome UI extensions

Once you see how it works, just start hacking around and see what happens. Remember, this howto document is only a few days old. I’m sure the Chrome team is thinking about ways to add more functionality to extensions, but the current developer version of Chrome already lets you do a lot of neat things.

One more nice thing: it looks like installing extensions doesn’t require you to restart the browser. :) And a hat-tip to Google OS for pointing out this document.

Gone to PubCon and SXSW + Lots of Videos!

Expect light blogging for a week or so because I’m traveling. I posted my 2009 travel schedule, but I’m doing a keynote at PubCon in Austin and then I’ll stick around for South by Southwest. It’s my first time at SXSW, so if you see me, say howdy!

For the PubCon keynote, we’re going to try something different. I’ll talk for 20-30 minutes, but we’ll also do a question and answer session where we take questions from the audience, from Twitter, and from this Google Moderator page.

If you can’t attend PubCon, we’ll still feed your search-info addiction with some videos. Peter Linsley just posted his recreated Google Image Search presentation from SMX West. I took questions recently and so you can watch three different videos that I did. For example, here’s a video about nofollow:

We’ll be releasing one new video each weekday for a while, so keep your eyes on the new Google webmaster videos channel on YouTube.

Clickable transcript of my Canonical Link Element talk

Recently I’ve been playing with linking to specific parts of a video and incorporating YouTube subtitles. Then I realized that you could do a neat trick. YouTube allows you to create closed captioning with a simple text file that looks like this:

00:00:07.000
Hi everybody. Welcome back to another video. We’re doing this thing where when we speak at a conference

00:00:12.180
and we talk about something substantial, not just questions and answers, we talk through our presentation later

and it only takes a little bit of Unix command-line magic to turn that into a file like this:

<a href="http://www.youtube.com/watch?v=Cm9onOGTgeM#t=00m07s">Hi everybody. Welcome back to another video. We’re doing this thing where when we speak at a conference</a>
<a href="http://www.youtube.com/watch?v=Cm9onOGTgeM#t=00m12s">and we talk about something substantial, not just questions and answers, we talk through our presentation later</a>
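The post only says "Unix command-line magic", so the exact commands aren't known. As a rough stand-in, here is a minimal Python sketch of the same conversion. It assumes the caption file really is just a timestamp line followed by one or more text lines, and that link offsets use the minutes-and-seconds form shown above; the function and constant names are made up for illustration.

```python
import html
import re

# Hypothetical names; only the caption layout and the "#t=00m07s" link
# format come from the examples in the post.
VIDEO_URL = "http://www.youtube.com/watch?v=Cm9onOGTgeM"
TIMESTAMP = re.compile(r"^(\d{2}):(\d{2}):(\d{2})\.\d{3}$")


def captions_to_links(caption_text, video_url=VIDEO_URL):
    """Turn 'timestamp line + caption line(s)' blocks into <a> tags."""
    links = []
    start = None   # "#t=" offset of the caption currently being collected
    words = []     # text lines seen since the last timestamp

    def flush():
        # Emit one link for the caption collected so far, if any.
        if start is not None and words:
            text = html.escape(" ".join(words), quote=False)
            links.append('<a href="%s#t=%s">%s</a>' % (video_url, start, text))

    for line in caption_text.splitlines():
        line = line.strip()
        match = TIMESTAMP.match(line)
        if match:
            flush()
            hours, minutes, seconds = (int(g) for g in match.groups())
            # Fractions of a second are dropped; hours fold into minutes.
            start = "%02dm%02ds" % (hours * 60 + minutes, seconds)
            words = []
        elif line:
            words.append(line)
    flush()
    return links
```

Calling `captions_to_links(open("captions.txt").read())` (the filename is a placeholder) gives back one link per caption, which you can join with newlines and paste into a page.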

If you run that over your entire caption file, boom, you have a clickable transcript of your video. For the text below, click on any phrase you’re interested in and you’ll be whisked away to YouTube in approximately the right place to hear me say that phrase.

Hi everybody. Welcome back to another video. We’re doing this thing where when we speak at a conference and we talk about something substantial, not just questions and answers, we talk through our presentation later and put it up so people can follow along, watch the slides, and hopefully learn a little bit. So today I wanted to talk about the canonical link element. And that’s something that Google, Yahoo!, and Microsoft all announced that they will support in the future at SMX West. So, the date that we had this announcement was February 12, 2009, and the funny thing about it is that Charles Darwin was born exactly 200 years ago that day.

So I started out with a slide where I made a corny joke and I said, whether you think the web was intelligently designed by Tim Berners-Lee, or whether you think the web needs to evolve, either way this is an open standard which helps people improve the web. And so we sort of said, what is a big problem that faces people today, webmasters, SEOs, site owners on the web? And it’s pretty clear that duplicate content is one of the things that people care about the most. So what is duplicate content? Well, I’ve got a slide here where I show I think eight different URLs, you know every single one of these URLs could return completely different content. In practice, we as humans whenever we look at www.example.com or just regular example.com or /index or home.asp, we think of it as the same page. And in practice, it usually is the same page. So technically it doesn’t have to be, but almost always web servers will return the same content for like these eight different versions of the URL. So, that can cause a lot of problems in search engines if rather than having your backlinks all go to one page, instead it’s split between a www and a non-www version. And it’s a really big headache. How do people solve this?

How do people fix this? Well, it turns out, and I’ll dwell on this slide for just a few minutes, there are a lot of ways to fix it. So, some people have joked that this canonical link element is kind of like, you know, Spackle that fixes over the appearance of all the cracks in the wall. And the fact is there are a lot of ways that you can fix things first and foremost, from the beginning, upstream where you don’t need to fix it downstream later on. There was a really funny quote by Jill Whalen at the conference where she said, “Developers keep SEOs in business.” Right? And so whether you’re a developer or an SEO there are some best practices that can make things a little bit easier for your system so that you don’t have to worry about this issue of duplicate content at all. So, one is to try to make sure that your URLs are standardized, Microsoft sometimes calls them normalized, in essence there’s only one way to get to the content. If your content management system always generates consistent URLs, and they’re completely uniform, and you don’t have to worry about having eight different versions in the first place, that just saves you a lot of trouble. You don’t have to worry about the issue coming up at all. So one way to do that is to fix your content management system or your software so that you only generate these URLs in a very consistent way. Another thing to do is to think about your site. Suppose you have www.example.com and non-www, just plain old example.com. Well if you link to www sometimes and non-www sometimes, it’s natural that search engines might get a little bit confused. So linking consistently, saying okay, my homepage is going to be www.example.com/. Nothing else, that’s it. And then making sure that all of your internal linking is consistent, that alone can make a really big difference, so that you don’t end up with two, three, four copies of each page.

If you do have, you know, home.asp or index.html, you can rewrite such that all those other URLs are 301 redirects to a single URL. So, it’s great if you can fix it at the beginning, it’s great if you can link consistently so the issue never comes up, but if duplicate URLs do occur, then you can use a 301, a permanent redirect as we refer to it, to sort of standardize and glom together all of those URLs. And search engines will follow that 301 redirect, and typically group them all together. Google also does a couple of extra things that some search engines don’t do. So, in our Webmaster Tools, our webmaster console, which is totally free, doesn’t cost anything at all, you can specify, for example my site is mattcutts.com, you can specify if you prefer www.mattcutts.com or non-www, so just mattcutts.com. That’s a very easy setting, and that solves a lot of duplicate content issues right there. And a little-known fact, not everybody realizes this, is that whenever you submit your URLs in what we call a Sitemap, which is another standard that’s supported by many major search engines, and it’s a very simple file, it can be as simple as a list of URLs, we take that list of URLs that you submit, and we say to ourselves, oh, if we see a URL in that list, and then we see another version of it that’s not in the list, we will prefer URLs in the list that you gave us. So we sort of use it to break ties whenever you submit URLs from a Sitemap. So there’s at least a couple ways that you can give Google hints that try to help out with duplicate content.
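To make the "standardize, then 301" idea above concrete, here is a small Python sketch. It is not how any particular web server or search engine does it. The preferred host and the list of default documents are assumptions chosen to match the example.com URLs in the talk.

```python
from urllib.parse import urlsplit, urlunsplit

# Assumptions for illustration: the site prefers the www host, and these
# filenames all serve the same content as the bare directory URL.
PREFERRED_HOST = "www.example.com"
DEFAULT_DOCUMENTS = ("index.html", "index.htm", "home.asp")


def preferred_url(url):
    """Return the single URL that duplicate variants should redirect to."""
    parts = urlsplit(url)
    host = parts.netloc.lower()
    if host == PREFERRED_HOST[len("www."):]:
        host = PREFERRED_HOST              # example.com -> www.example.com
    path = parts.path or "/"
    directory, _, filename = path.rpartition("/")
    if filename.lower() in DEFAULT_DOCUMENTS:
        path = directory + "/"             # /index.html -> /
    return urlunsplit((parts.scheme, host, path, parts.query, parts.fragment))


def redirect_for(url):
    """Return (301, target) when a permanent redirect is needed, else None."""
    target = preferred_url(url)
    return (301, target) if target != url else None
```

With these assumptions, `http://example.com/home.asp` and `http://www.example.com/index.html` both map to `http://www.example.com/`, and a request for that final URL needs no redirect.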

But, that said, there will probably always be duplicate content issues that you can’t fix. So, just to run through a few example ones. Sometimes, you can’t generate a permanent or 301 redirect. For example, at my old school account, cs.unc.edu, I don’t run the web server there. So I’d have to open a ticket or drop an email to the people that administer that system and say hey, can you add a 301 redirect from this page to that page. A lot of free hosts, you might not be able to generate a 301 redirect. And you can’t help how people link to you. So for example, you know, even if you link consistently to just the www version of your website, some other people might link to the non-www version. And you can’t really control that at all. Uppercase versus lowercase paths. Microsoft IIS will support showing pages whether you link to home.asp capitalized or lowercase, and sometimes even mixed case. And so if people link to different versions that are uppercase and lowercase mixed, that can cause some issues. Session IDs are another really big factor. So I have seen, at least in some search engines, a site with a one-page privacy policy. And that privacy policy was indexed three thousand times, each time with a different session ID, because the privacy policy was slightly different each time.

So, you know, session IDs in general if you can avoid them are great. But sometimes you as the search engine optimizer or the person who is responsible for the site can’t get rid of them entirely. Tracking codes, you know, if you’re buying ads. Analytics, you know the UTM parameter, landing pages where they have to be different landing pages for different ads, those are the sort of things that you sometimes can’t get rid of. And if you run an e-commerce site, suppose you have different products. You might have sort by descending price or sort by ascending price, and sometimes you need to have different facets, different views of your data, and conceptually it’s really the same thing, it’s just a different way to slice and dice it.

Finally, there’s breadcrumbs. So breadcrumbs are how did I get to this page? Am I coming to this red tent example via tents, or am I coming to it via colors, or did I come to it because I was interested in accessories? How did I land on this page? Even Google’s own webmaster help documentation sometimes has a CTX parameter that says here’s how we got to this page. And that day, it was kind of funny, the Queen had just launched a new website: royal.gov.uk. And so I wish the Queen the best, I want her to live long, and I wish the British monarchy the best, however, someone at the Telegraph, telegraph.co.uk, had done an SEO audit of this site, and they had found duplicate content issues. So you can see right here, just slash, royal.gov.uk/Home.aspx, and then at the very bottom I almost made a ransom note style where I mixed uppercase and lowercase. And the royal website returned the same page for all three of those URLs. So that was just a very simple example to illustrate that anybody can have these sorts of issues.
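The duplicate sources listed above (session IDs, tracking codes, sort orders, breadcrumb parameters, mixed-case paths) can be spotted in a list of your own URLs with something like the following Python sketch. The parameter names are hypothetical, since every site uses its own. Lowercasing paths is only safe on servers that ignore case, so it is off by default.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical parameter names; a real site would list its own.
IGNORED_PARAMS = {"sessionid", "sort", "ctx"}
IGNORED_PREFIXES = ("utm_",)


def clean_url(url, case_insensitive_paths=False):
    """Strip parameters that do not change which page is being shown."""
    parts = urlsplit(url)
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key.lower() not in IGNORED_PARAMS
        and not key.lower().startswith(IGNORED_PREFIXES)
    ]
    path = parts.path or "/"
    if case_insensitive_paths:
        path = path.lower()
    return urlunsplit(
        (parts.scheme, parts.netloc.lower(), path, urlencode(kept), "")
    )


def group_duplicates(urls, **options):
    """Map each cleaned URL to the list of original URLs that collapse to it."""
    groups = {}
    for url in urls:
        groups.setdefault(clean_url(url, **options), []).append(url)
    return groups
```

Any group with more than one entry is a set of URLs that probably show the same page, which makes it a candidate for fixing upstream or with a redirect.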

So what’s the answer? Let’s, you know, I’ve buried the lead enough, how do people solve this particular problem? Well, assuming you can’t solve it any other way, and absolutely I encourage you to try to fix it upstream, to try to link consistently. This is not something that you should just say, oh, now all my problems are solved, I don’t have to worry about anything else. But, if you can’t solve your problems in other ways, there’s a very simple element, link element, where you can say my canonical, and that’s a long word that means you know, my preferred, or the primary, or the clean, the pretty version of the URL that I want to use, is not this ugly URL with a tracking code or a session ID, it’s this pretty URL right over here. And all you have to do is in the head element of this document say you know what, even though this has a weird session ID, the pretty version, the canonical version of this URL, is over here. And that’s literally all it is. It’s a very simple open standard. It’s one simple element that you add to the head of your document. Some interesting little tidbits. This is the director’s cut so you get a little bit of extra info. Is this a tag?

Well, it’s kind of, the technical name I believe is “element.” But we’re all friends here, nobody’s going to abuse you or you know make fun of you if you call it a canonical link tag versus a canonical link element. People often speak about meta tags, right? And so meta tags are things that go in the head of the document as well. And so, if a meta tag has a value that is a hyperlink, I think the most correct thing is not for it to be meta, but for it to be called “link.” And so that’s why you see link rel=”canonical” href= and the value. So now you know the official name, but nobody’s going to care if you just call it the canonical link tag. One thing that’s kind of interesting about this tag, let’s just talk about a few high-order bits.

We don’t promise we’re going to abide by this 100%. Right? You know, if we see a webmaster and they’ve accidentally shot themselves in the foot, you know maybe they’ve created an infinite loop, and it’s very easy to create an infinite loop, we reserve the right to do what we think is best. At least at Google, we are going to treat this as a very strong hint. So unless we see some weird corner case or something where you’re probably hurting your own site, we probably would expect to respect this tag. So I think that in most cases, it will work quite well. But we do have to reserve the final, sort of bottom-line ability to say no, we don’t think this is what’s best for the users.

Again, if you can fix it yourself upstream, that’s much better. So look at all the other alternatives, the other choices before you use this tag. Don’t just say, oh, I can just slap everything with a canonical link tag and boom, I’m done. If you’re a regular user, just like a mom-and-pop and you use WordPress or you use some shopping cart software, it’s probably best not to just roll up your sleeves and go digging into it and trying to fix it all yourself, at least not quite yet. Wait a little while, because I think plugins will come out, people are talking about hey, is WordPress able to add this to the core software, so maybe you don’t even need a plugin? So if you’re just a regular user and you wait a few months, things should be fine. You know it’s a brand-new element, so there’s time for you to sit down and cautiously deliberate and say okay, what kinds of duplicate content do I have, how can I fix it? Take a little bit of time. Don’t just jump right in and start, oh I’m going to point everywhere, I’m going to do everything. There’s enough time where this will be supported so you can plan ahead a little bit. And as always, if we see people abusing it, we do reserve the right to change how we treat the tag, or to not respect the tag. There is a nice way that we try to prevent abuse. We allow things within the same domain, but we don’t allow things to cross domains. So with 301s, there’s always been this notion of can I hijack a site by doing weird 301s, and can I steal the reputation of some other site? And at least right now, this element is not really subject to that because you can only use it within the same domain. Now a natural question right after that, is well, what about subdomains? Can I, you know, do things across different hostnames?

And the answer is yes, you can. So, I was talking to Tony Hsieh from Zappos, and they were talking about duplicate content. And they have a server called zeta.zappos.com, which is sort of their staging software and might be the next version. And they were saying, well, can I send my canonicalness, can I splat it from zeta.zappos.com to www.zappos.com? And the answer is yes, you absolutely can. Can you use it from https and send that to http? Totally, works great for that. It’s on the same domain, so it’s no problem at all, at least within Google to use it for that purpose. And then what’s the difference between this and a 301 or a permanent redirect? There’s really not that much, other than this is restricted to one domain. So 301s can cross domains; this is all within the same domain.
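As a rough illustration of the same-domain rule as described in the talk, here is a Python sketch that builds the link element and refuses a cross-domain target. The "domain" test is a deliberately naive assumption (last two labels of the hostname) and the function names are invented. It is not a description of how Google decides.

```python
import html
from urllib.parse import urlsplit


def site_of(url):
    # Naive assumption: the "domain" is the last two labels of the hostname.
    # That is wrong for suffixes such as co.uk; a real check would need a
    # public-suffix list.
    host = (urlsplit(url).hostname or "").lower()
    return ".".join(host.split(".")[-2:])


def same_domain(page_url, canonical_url):
    """True when both URLs share a domain; scheme and subdomain may differ."""
    site = site_of(page_url)
    return site != "" and site == site_of(canonical_url)


def canonical_link(page_url, canonical_url):
    """Return the element to place in the <head> of page_url."""
    if not same_domain(page_url, canonical_url):
        raise ValueError("canonical target is on a different domain")
    return '<link rel="canonical" href="%s" />' % html.escape(
        canonical_url, quote=True
    )
```

Under this sketch, zeta.zappos.com pointing at www.zappos.com passes, as does https pointing at http on the same host, while a target on an unrelated domain raises an error.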

In fact, whenever I think about it, the mental model that I have is that this is essentially like a little mini 301 redirect that you can generate with this link element. So, you know, if you think about how Google handles 301s, that’s probably a pretty good guess of how we’ll handle this particular element. So, a few more questions, since you’ve got the time, you’re watching the video. Do the pages have to be identical? Bit for bit identical? No, they do not. Think again about this case where you have a catalog page and you can sort by increasing price or decreasing price, those are conceptually pretty close to the same page. So if you want to say map this to the same URL, and don’t worry about the sort by parameter, you’re more than welcome to do that. They should be similar. You know, if we see, this is the only thing I can think of where there could be abuse, is if you’ve got a cartoon page over here, and you’ve got something that’s completely irrelevant to cartoons over here and you try to combine them together. And you’re not really gaining any advantage because you had PageRank on this page and on that page. So it really doesn’t make sense to combine them, but we do recommend that you use them for similar pages. They don’t have to be identical, but they should be similar.

A few sort of niggly bits. How about relative URLs versus absolute URLs? The answer to that is you can use either one. We recommend absolute URLs. And there’s a very simple reason. When you have relative URLs, you can move a URL and everything stays the same relative to that URL. So essentially, you know the homepage can say /images or images. And that will move it relative to that particular page. But it’s better to have an absolute URL because this is a powerful tool, and you really want to say this URL goes to exactly this URL. So you want to specify that. Whereas if it’s relative, if you mess it up here, then you might mess it up somewhere else as well. Can you follow a chain of canonical tags, or canonical elements, just like you can follow a chain of 301 redirects? Yes, but again I don’t recommend that, because if you have a big site and you have a big chain of 301 redirects, it’s easy for something to break. So, it’s similar, something can break and you don’t intend to have the consequences that you wanted to, so what I would recommend is absolute URLs, and going from the old URL to the new URL, one hop and that’s all you do. It’s just simpler that way, and you know you want to play it safe. You don’t want to accidentally shoot yourself in the foot. So what are some ways you can shoot yourself in the foot? Well, what if you say my canonical is over here, and that’s a 404 page? Right, the page might not exist. What if you had an infinite loop? This is canonical. No, this is canonical. And we’ve all seen those happen, you know, what is the Civil War? Look up the War Between the States. What is the War Between the States? Look up the Civil War. You know, and now you have to put the dictionary down and your head hurts. So try to avoid infinite loops.
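The foot-guns just mentioned (chains, missing targets, loops) are easy to check for on your own site before a crawler finds them. Here is a hedged Python sketch: it assumes you already have a dictionary of which URL each page declares as canonical and a set of URLs that actually exist. The names and the hop limit are arbitrary choices for illustration.

```python
def resolve_canonical(start, canonical_of, known_pages, max_hops=5):
    """Follow canonical declarations starting from `start`.

    canonical_of maps a URL to the URL it declares as canonical.
    Returns (final_url, hops). Raises ValueError on a loop, on a target
    that is not in known_pages (think 404), or on an overly long chain.
    """
    seen = [start]
    current = start
    while True:
        target = canonical_of.get(current)
        if target is None or target == current:
            # No declaration, or the page points at itself: we are done.
            return current, len(seen) - 1
        if target in seen:
            raise ValueError("canonical loop: " + " -> ".join(seen + [target]))
        if target not in known_pages:
            raise ValueError("canonical target missing: " + target)
        if len(seen) > max_hops:
            raise ValueError("canonical chain too long")
        seen.append(target)
        current = target
```

A result with more than one hop is a hint to repoint the first page straight at the final URL, which matches the one-hop advice above.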

What if I point to a URL that hasn’t been crawled? You know, we’ll try to crawl that URL, but that corner case, what if I told in the webmaster console, oh yeah, everything should be www.example.com, but then you specify your canonicals as non-www, or without the www. So you can do all these sorts of things to almost shoot yourself in the foot, and the answer is we will try to handle all of these corner cases in a reasonable way. The slide has some Ghostbusters because there’s the old saying, “Don’t cross the streams,” right? So think about this, take some time, don’t just throw canonical tags on willy-nilly on your site, you know, try to plan it out a little bit so that you don’t run into these corner cases. So we’re getting towards the end of the presentation. I just really wanted to send a shout out to Joachim, who is the Google engineer who really did all the implementation, all the heavy lifting on this. Made sure that it worked very nicely within a 301, and thought about all the corner cases. So, for example, someone said, well what if I have a canonical, and I point to myself? Does that work? Yep, that works fine. What if I have a canonical and my href is empty? Well, it turns out that parses as an error, which turns out to point to itself. So all this stuff still works because Joachim did a really good design, but again, try to make sure that it’s all absolute URLs and everything’s specified well. Also, I’d love to send a shout out to Greg Grothaus. It turns out when you dig into this, a lot of people have proposed similar ideas. I saw at least one post out on the general web after we’d started exploring this that said, hey, why don’t you do this kind of a proposal? But Greg was really one of the people who sparked the discussion at Google, who really pushed for it and had a great idea, and so I sort of think of him as at least within Google, he really got the ball rolling and really sparked the wave of work on this, so I really appreciate that. 
And of course all the people, you know, from Maile and Wysz and Adam and Riona who have worked on the messaging and reached out to different people. At Yahoo!, Priyank, and a ton of people at Microsoft, Nathan Buggia and a bunch of other people as well. My hope is that lots of search engines will support this. So, Yahoo! and Microsoft have announced that they will support it, let’s keep our fingers crossed for Ask, I’d love for them to join in as well. Wikia, so Artur at Wikia had emailed us and sort of asked about doing canonical tags anyway. And so it was really great that they could test it out while we were trying it out ourselves. And then a ton of webmasters who always give us this sort of feedback on what they’d like to see.

On this last slide, I just list a bunch of resources, so Google, Yahoo!, and Microsoft all did blog posts about it. There’s an official Help Center documentation page. And, what we saw was, as people would come and have duplicate content questions, Joost had come and sort of asked about an interesting corner case, we just said, hey, you know what? We’ve got this thing coming out that might help with this. And so it was a very nice way to just do a sort of very quiet beta test and see how well it worked. So, Joost happened to email just a few days before we were ready to announce support, and so we gave him a heads-up about the possibility of this, and he turned around plugins not just for WordPress, but also for Magento, which is an e-commerce shopping software, and Drupal, which is another open-source content management system, which I think the White House just rolled out using Drupal. So really appreciate the work that he’s done as well. And in general, you know, be careful, be cautious, plan out how you want to use this tag. But we don’t intend to make any money off of it, we think it’s just good for the web, it’ll lead to less duplicate content. It’s an open standard, so any search engine that crawls the web can use this information to help, you know, make the web more relevant and increase the relevancy of their search results. And now you know as much as the audience knows when they attended SMX West. Thanks very much for listening, and talk to you soon.

Pretty fun, right? With a little more effort, you might be able to get the links to update an in-page embedded video instead of using static hyperlinks. Specifically, there’s a “seekTo” function in the YouTube JavaScript Player API. But right now I’m too lazy to dig into it.
