Archive for October, 2006

Stand up for your demographic!

For the last week, I’ve been filling out a radio diary for Arbitron ratings. That’s right, they gave me a cool $10-12 just to tell them what I listened to on the radio for the last week. If you’ve read some of my past posts, you know that I’m skeptical about metrics, especially ones based on sampling. So I agreed mostly so that I could whinge online that Arbitron doesn’t have a checkbox for “I listen more to web radio than regular radio lately.”

When the radio diary arrived, I found out that they did allow you to write in internet radio! They also have a comments section at the end. After my week of listening, I wrote in the comments field:

A year ago I moved from radio to XM. Six months ago I moved from XM to podcasts. I put the MP3s in my car and listen there too. My favorite podcast is Danny Sullivan’s “The Daily SearchCast” and I listen to it about 30 minutes a day. Juice + SD card + MP3 player in car = podcasts rock!! *

With a metric computed by random sampling, the odds of hitting a power podcast listener are pretty small. So I felt like I had to stand up for the podcast-listening demographic and represent. :)

* Disclaimers:
- In my comments in the paper diary, I did not write the hyperlinks.
- My commute to work is 15 minutes each way. That’s 30 minutes a day, or 150 minutes a week. The Daily SearchCast is four days a week and rarely goes over 30 minutes/day, which is 120 minutes a week, so I’m running a SearchCast deficit. Danny, I need an additional 30/5 = 6 minutes/day of SearchCast! Talk longer, sigh more, or throw in extra rants. :)
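
For anyone who wants to check the SearchCast deficit math, here is a small sketch of the arithmetic in the disclaimer above. The variable names are made up for illustration, and the five commuting days per week are an assumption implied by the 150-minute figure. It also shows the per-episode number, which comes out a little higher than the per-commute-day number because the show only runs four days a week.

```python
# Figures taken from the disclaimer above; names are illustrative only.
commute_minutes_each_way = 15
commute_days_per_week = 5          # assumption implied by 150 minutes/week
episodes_per_week = 4
max_minutes_per_episode = 30

listening_time = commute_minutes_each_way * 2 * commute_days_per_week
searchcast_time = episodes_per_week * max_minutes_per_episode
deficit = listening_time - searchcast_time

# Spread the deficit over commuting days (the post's 30/5) or over episodes.
extra_per_commute_day = deficit / commute_days_per_week
extra_per_episode = deficit / episodes_per_week

print(f"Weekly commute listening time: {listening_time} minutes")
print(f"Weekly SearchCast supply:      {searchcast_time} minutes")
print(f"Deficit:                       {deficit} minutes")
print(f"Extra needed per commute day:  {extra_per_commute_day} minutes")
print(f"Extra needed per episode:      {extra_per_episode} minutes")
```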


Smaller issues

Okay, you’ve got a fall “search weather” forecast, and it’s mostly quiet. Most of the recent infrastructure changes are out at most/all datacenters. What now?

Well, I thought it might be a good time to collect potential bug reports (not general comments, not questions, and not spam reports). If you know of a query that appears buggy, post it in a comment here. I’ll start the ball rolling. :)

- high site: results estimates. I believe that more accurate site: results estimates are live everywhere now.
- [site:mattcutts.com -inurl:blog -inurl:files] returns a result or two with “blog” in the url from Supplemental results. This is a known issue that’s pretty far off the beaten path (NOT terms in the url in Supplemental Results is pretty much the definition of off the beaten path query-wise :) ), but the supplemental folks have promised to look at this.
- I think two people noticed a difference for queries at 1-2 data centers. One person was someone that watches Oxford very closely. The other person was Danny Sullivan, so I heard about it :). I believe this is fixed/working now. The queries differed because we were sending out a newer binary/executable and different data centers had different versions of the binary. Every data center has the newer binary now.
- For a brief while last week, site: only returned three results from a host. Someone mentioned it to me by email, but the first web report I saw was by DaveN on Friday (there’s your link, Dave). Fixed/working by the end of that day, I think. It was related to a binary/executable that was going out, but a different binary than the one mentioned above.
- results estimates go up if you add a minus/NOT term. This isn’t a huge deal, but there’s a fix ready to go into a future binary/executable push. That particular push might be a month or two down the road.
- link: queries sometimes showed nofollow links on Google. This is a byproduct of the switchover to the newer infrastructure for PageRank in the Google Toolbar, info: queries, and link: queries. I expect in another month or two that we’ll be back to normal, i.e. nofollow links won’t be shown for link: queries.
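
If you want to report a query like the ones in the list above, it helps to send the exact query string and a link that reproduces it. Here is a minimal sketch of building both from a site: restriction plus some minus/NOT inurl: terms. The function name is hypothetical, and it assumes the standard `https://www.google.com/search?q=` URL form; nothing here is an official tool.

```python
from urllib.parse import urlencode


def build_site_query(site, excluded_url_terms):
    """Return (query, url) for a site: query with -inurl: exclusions.

    Hypothetical helper for illustration; it only assembles the operators
    mentioned in the list above and URL-encodes the result.
    """
    parts = [f"site:{site}"]
    parts += [f"-inurl:{term}" for term in excluded_url_terms]
    query = " ".join(parts)
    url = "https://www.google.com/search?" + urlencode({"q": query})
    return query, url


query, url = build_site_query("mattcutts.com", ["blog", "files"])
print(query)
print(url)
```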

Just to be clear, pruning will be ruthless for this post: I only want to see specific queries that seem to show bugs, and the more concisely you can explain something, the better. I’ll probably keep just the first example of what looks like a bug. I’ve got a meeting at noon tomorrow to talk about search bugs, so I’ll probably lock the comments after that.


Fall weather forecast

Yahoo! gives a nice weather report to announce an index update, and it seemed like a good time to give people an update on search quality/infrastructure on Google going into the fall. The last weather forecast I did was about a month ago, and it was on video. It’s still a good video to go watch as background. Just to be crystal clear, each of the following paragraphs is talking about a different piece of infrastructure. :)

Bigdaddy was a software upgrade to how we crawl and partially how we index the web. It was deployed and done pretty early in the year. It brought smarter Googlebot crawling, including tricks like full gzip support and a crawl caching proxy that means less bandwidth usage for site owners.

We used the summer to swap in a completely new architecture for Supplemental Results. The core of that infrastructure is complete and fully deployed, but I’m sure we’ll see additional smaller changes (mostly making sure that queries off the beaten path such as site: do what people expect).

I believe site: results estimates should be more accurate at any IP address you try now. In mid-summer (while I was on vacation, in fact), people noticed that sometimes site: results estimates were too high. One change went in during mid-summer to make general results estimates more accurate, especially for shorter queries, but the change didn’t really apply to site: results estimates.

Happily, there was another piece of infrastructure going out that improved general quality and also made site: results estimates more accurate. I think I mentioned in the video that those folks were shooting to be live everywhere by end-of-summer/end-of-quarter, but it was a hope, not a promise. I believe that infrastructure was turned on at all data centers by last Friday (Oct. 6, 2006), which is pretty close. Most of the other quality improvements due to this infrastructure will be pretty subtle/stable, but it’s nice that site: results estimates are more accurate now.

Let’s see, what else? We just did a PageRank export, so I wouldn’t expect to see another export until the new year. The infrastructure that serves up PageRank in the Google Toolbar, link: data, info: queries, and “Similar results” is also new (surprise! :) ). I believe that’s the only piece of infrastructure I’ve mentioned so far that isn’t deployed at every data center, and relative to the other things I’ve mentioned, that infrastructure is smaller. The new infrastructure is live at about 2/3rds of data centers, and I’d expect it to roll out to all data centers within a month or two (again that’s a hope, not a promise). In the meantime, you may see some differences in PageRanks in the Google Toolbar depending on which data center you happen to hit.

I know that webmasters are especially sensitive to quality/webspam/ranking changes in Q4 because of the holiday season. If we’ve got something that evaluates well and that we think will improve quality, we can’t just pause for 1/4th of the year, but if anything big launches I’ll try to be available to answer questions and help get a handle on any changes. (Right now I’m not expecting radical changes in webspam ranking, but I know better than to make a promise.) Of course we’ll also be around at webmaster conferences. Several Googlers (including me) will be at PubCon in Vegas in November to talk to webmasters. Several Googlers (including Adam Lasnik and Vanessa Fox, but probably not me) will also be at SES Chicago in December to get feedback and answer questions too.

Okay, that’s everything that I can think of. :)


A different view on Google Reader

Power user Gina Trapani, editor of Lifehacker, decided to switch from Bloglines to Google Reader:

Every day I trawl through almost 250 web site feeds in order to write Lifehacker, and for the past 2 years I’ve used Bloglines to do so. No other feed reader (not even the one I helped build) had all the features I needed to track what I’d read and what I hadn’t across computers and operating systems.

That is, until I gave Google Reader another whirl earlier this week. The just-rolled-out Reader upgrades turned the app into an even better product than the much older and more-established Bloglines, and so I’ve made the switch.

The most interesting thing to me is that Gina’s reasons for switching were almost entirely different from my reasons.

The comments on the Lifehacker thread (and the spillover digg thread) are good too. People mention wanting:
- An API like Bloglines has. Fair point.
- Better favicon support and the ability to rearrange the order of feeds (one digg user wanted to be able to drag and drop feeds). Also fair points.
- Disposable email addresses. This one wouldn’t have occurred to me. Readers mentioned dodgeit as an alternative for disposable email addresses, and that dodgeit can turn that email into an RSS feed. For example, if someone sends an email to funkymcfunk@dodgeit.com then anyone can read that email by subscribing to the RSS feed http://dodgeit.com/run/rss?mailbox=funkymcfunk . I never thought of this, but it’s a neat concept to avoid giving out your actual email address.
- OPML import. This is supported (click Settings, then look for the Import/Export tab). One digg reader mentioned a problem with importing Netvibes OPML though.
- Show only updated feeds. This is supported (in the left pane, there’s a link that toggles between “only list updated” and “list all”).
- A mobile reader. A digg commenter mentions http://www.google.com/reader/m/ . Okay, I tried it and the mobile reader works well–it even shows tiny pictures on my phone if they’re in the post. Reader also somehow sent back the “Matt has read this post” info, so once I finish an item on my phone, it’s marked as read when I reload Reader in my desktop browser. That’s pretty cool, and now I yearn to upgrade my phone.
- Automatic refresh. I don’t think Reader does this right now, although it doesn’t bug me to hit the Reload button.
- Ability to sort items reverse-chronologically. Again, not the way that I read but it makes sense.
- Better preservation of formatting when emailing posts from Reader.
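
To make the dodgeit idea from the list above concrete, here is a minimal sketch that turns a disposable address into the matching RSS feed URL. It relies only on the URL pattern quoted in that bullet; the function name is hypothetical, and I’m assuming the mailbox name is simply the part before the @ sign.

```python
from urllib.parse import urlencode


def dodgeit_rss_url(address):
    """Map a dodgeit.com address to its RSS feed URL.

    Hypothetical helper: assumes the feed pattern quoted in the post,
    http://dodgeit.com/run/rss?mailbox=<name>.
    """
    mailbox, _, domain = address.partition("@")
    if not mailbox or domain.lower() != "dodgeit.com":
        raise ValueError(f"not a dodgeit.com address: {address!r}")
    return "http://dodgeit.com/run/rss?" + urlencode({"mailbox": mailbox})


feed_url = dodgeit_rss_url("funkymcfunk@dodgeit.com")
print(feed_url)
```

Keep in mind that anyone who knows the mailbox name can subscribe to the same feed, so this only suits mail you don’t mind being public.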

Other folks on the Lifehacker/digg threads mentioned Rojo, NewsAlloy, Netvibes, reBlog, and Mintr for their feed reading.

After taking Google Reader for a 1.5-week test drive, I’ve now switched over to Reader completely. I found that Reader let me slice through the same number of feeds in less time, and that was the clincher for me.

One hidden Reader gem I noticed today is that the search box for “Add subscription” is very smart. You can type in an exact RSS/Atom url, but you can also just type “www.lifehacker.com” and Reader will go and find the feed for you. That’s cool, but Bloglines can do that much. Today I realized that if you type a query, Reader will suggest feeds.

Let me give you an example. After I read about Beck visiting Yahoo! I kept meaning to add the “Yahoo Yodel” blog to my feeds, but it’s a mild amount of hassle to go find it and subscribe. Now with Reader you can click “Add subscription” and type “yahoo yodel” in the search box and it will suggest feeds. Here’s a screenshot:

(Screenshot: Searching for feeds with a query)

Boom, right at the top of the right pane is the feed I want: Yodel Anecdotal. One click and it’s added. I love that Reader does something pretty smart with “add subscription” queries. Kinda like in Google Finance how you don’t need to memorize ticker symbols: you can just throw something at the search box and Finance will do the best it can:

(Screenshot: Google Finance does auto-complete, too.)

If you’re using Reader, try searching for a feed by name. You can also use the “Subscribe as you surf” bookmarklet (click Settings, then look for the Goodies tab), but I like adding feeds by name.

Update: The Google Reader blog mentions several nice new features. I’d recommend the post just to see their “Web 2.0 meter.” :)


Wired-Tired-Expired

Wired: The first Google ad for “Matt Cutts” as a search phrase.

Yelp Engineering Position
www.yelp.com/jobs Join one of the hottest web 2.0 startups out there

Very savvy, Yelp. Notice that they even geotargeted the ad to the Bay Area to get laser-targeted leads. Somebody’s putting that new funding to good use.

Tired: The other Google ad for my name:

Matt Cutts PLR Alert
“It’s Official Scgm v3 is Smoking”
Great PLR Offer: Almost Sold Out

This ad was more than a little creepy (“Grab $50,000 Worth Of Source Code And Master Rights To 10 HOT Software, eBooks & Multi-media Products For Mere Pennies On The Dollar,” says the site).

Expired: Finding a full copy of one of my blog posts in Google News. Why? Somebody just copied the post wholesale:

(Screenshot: Newsfactor’s copy of my blog post)

I haven’t noticed Newsfactor doing that before; is it new?

