Archives for August 2006

Scoble visiting the Plex

Robert Scoble is visiting the plex tomorrow (okay, today at this point. I lost this post and need to get the WordPress autosave functionality going). What should I ask Robert?

Aha, looks like he’s got a question for me about the Windows Live Writer blog. Without doing any actual debugging (I’m not on the VPN), I noticed:

– a query for [windowslivewriter] returns pages from the site, so we do have some pages from the site in our index.
– but the site is brand-new (only one post on the blog so far, dated Aug. 11, and it didn’t get traction on Techmeme until Aug. 14).
– the title of the blog is “Writer Zone,” which is a little generic. If you want to show up for a query like [windows live writer], having that in the page title certainly could help.
– doing the query [site:windowslivewriter.spaces.live.com] returns some urls like windowslivewriter.spaces.live.com/Blog/cns!D85741BB5E0BE8AA!174.entry . In general, urls like that sometimes look like session IDs to search engines. Most bloggy sites tend to have words from the title of a post in the url; having keywords from the post title in the url also can help search engines judge the quality of a page.
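
To make the session-ID point concrete, here is a toy sketch in Python. It is purely an illustration and not how Google or any other search engine actually evaluates urls: the function name looks_like_session_id, the 12-character threshold, and the example.com url are all assumptions made up for this example. It just flags urls that contain a long run of hex characters with at least one digit, which is the kind of opaque token that shows up in the Spaces urls above.

```python
import re

# Hypothetical heuristic: a run of 12 or more hex characters is treated as an
# "opaque token". The threshold is an arbitrary choice for illustration.
OPAQUE_TOKEN = re.compile(r"[0-9A-Fa-f]{12,}")


def looks_like_session_id(url):
    """Return True if the url contains a long hex-like token with a digit in it."""
    for token in OPAQUE_TOKEN.findall(url):
        # Require at least one digit so ordinary words made of a-f letters
        # are not flagged.
        if any(ch.isdigit() for ch in token):
            return True
    return False


if __name__ == "__main__":
    urls = [
        "windowslivewriter.spaces.live.com/Blog/cns!D85741BB5E0BE8AA!174.entry",
        "example.com/blog/windows-live-writer-released",  # made-up keyword url
    ]
    for url in urls:
        print(url, looks_like_session_id(url))
```

The first url is flagged because of the D85741BB5E0BE8AA token; the made-up keyword-style url is not.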

Ah, but I can hear people saying that spaces.msn.com also had urls like “ericlam.spaces.msn.com/blog/cns!5D73BE0B4076E647!330.entry” and spaces.msn.com has lots of indexed urls. That brings up the whole subject of spaces.msn.com migrating to spaces.live.com. I know that was a pretty big change for the Spaces folks. Anytime we know about big changes like that across the web, we want to help make sure that the changes go smoothly. I got over 12 emails from GregP over at Microsoft about the progress of the migration from spaces.msn.com to spaces.live.com, and I checked with the indexing folks at Google to ask how long it might take to crawl sites over at spaces.msn.com and see that they’d moved over to spaces.live.com. So there’s a chance that the issue is related to the hostload or the number of new pages we can crawl over at spaces.live.com. Anyway, I’ll dig into it more tomorrow.

Let me know if there’s anything that you want me to ask Scoble. 🙂

Update: It was nice talking to Robert Scoble and Scott Mace of Calendar Swamp. We talked about blogging, the differences in cultures at several companies, and what Google should be doing better. Scoble had a couple follow-on posts here and here.

By the way, it looks like the primary issue with the Windows Live Writer blog was the large-scale migration from spaces.msn.com to spaces.live.com about a month ago. We saw so many urls suddenly showing up on spaces.live.com that it triggered a flag in our system which requires more trust in individual urls in order for them to rank (this is despite the crawl guys trying to increase our hostload thresholds and taking similar measures to make the migration go smoothly for Spaces). We cleared that flag, and things look much better now.

For a search like [windows live writer], I see the Windows Live Writer blog at number 1, and the Windows Live Writer Beta product download page at number 2. Going forward, I’ll keep an eye on the spaces.msn.com to spaces.live.com migration with the crawl folks to make sure that it continues to be smooth. It also looks like Mike Torres is #1 for searches like [torres talking], so overall things look pretty good now.

Handling noindex meta tags

Okay, here’s a question. I did the search [congoo] recently and didn’t get the home page of Congoo–why not? If you view the source of http://www.congoo.com/, it turns out that they have a noindex meta tag:

<meta name="robots" content="noindex, nofollow" />

Okay, so Congoo apparently doesn’t want their root page to show up in search results pages. Fair enough. But just for fun, I did the search on Ask, Yahoo!, and MSN. Ask doesn’t show the root page from Congoo, but Yahoo! and MSN do. MSN shows just a url reference:

(Screenshot: MSN has Congoo in the index)

But if I click on the Cached link, I get the message “Could not find the requested document in the cache.” So it looks like MSN may handle noindex meta tags by showing a url reference but not any snippet.

Now let’s look at Yahoo’s result:

(Screenshot: Yahoo has Congoo in the index)

Huh. They show a 15K page with a Cached link. I clicked on that link and the cached copy had the noindex meta tag on it.

So based on a sample size of one page, it looks like search engines handle the “noindex” meta tag as follows:
– Google doesn’t show the page in any way
– Ask doesn’t show the page in any way
– MSN shows a url reference and Cached link, but no snippet. Clicking the cached link doesn’t return anything.
– Yahoo! shows a url reference and Cached link, but no snippet. Clicking on the cached link returns the cached page.

Something to be aware of. Personally, I’d prefer it if every search engine handled the noindex meta tag by not showing the page in the search results at all.
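
If you want to check whether one of your own pages carries a robots meta tag like Congoo’s, here is a minimal sketch in Python using only the standard library. The class and function names (RobotsMetaParser, robots_directives) are made up for this example, and it only looks at meta tags in the HTML you hand it; it doesn’t fetch anything or look at robots.txt, and it says nothing about how any particular search engine will react to the tag.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects directives from <meta name="robots" content="..."> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        # HTMLParser also routes self-closing tags like <meta ... /> here.
        if tag != "meta":
            return
        attr_map = {name: (value or "") for name, value in attrs}
        if attr_map.get("name", "").strip().lower() != "robots":
            return
        # The content attribute is a comma-separated list of directives.
        for directive in attr_map.get("content", "").split(","):
            directive = directive.strip().lower()
            if directive:
                self.directives.add(directive)


def robots_directives(html):
    """Return the set of robots meta directives found in an HTML string."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return parser.directives


if __name__ == "__main__":
    page = '<html><head><meta name="robots" content="noindex, nofollow" /></head></html>'
    found = robots_directives(page)
    print("noindex" in found, "nofollow" in found)
```

Run against the tag quoted above, the sketch reports both noindex and nofollow.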
