Scoble visiting the Plex

Robert Scoble is visiting the plex tomorrow (okay, today at this point. I lost this post and need to get the WordPress autosave functionality going). What should I ask Robert?

Aha, looks like he’s got a question for me about the Windows Live Writer blog. Without doing any actual debugging (I’m not on the VPN), I noticed:

- a query for [windowslivewriter] returns pages from the site, so we do have some pages from the site in our index.
- but the site is pretty brand-new (only one post on the blog so far, dated Aug. 11, but it didn’t get traction on Techmeme until Aug. 14th).
- the title of the blog is “Writer Zone,” which is a little generic. If you want to show up for a query like [windows live writer], having that in the page title certainly could help.
- doing the query [site:windowslivewriter.spaces.live.com] returns some urls like windowslivewriter.spaces.live.com/Blog/cns!D85741BB5E0BE8AA!174.entry . In general, urls like that sometimes look like session IDs to search engines. Most bloggy sites tend to have words from the title of a post in the url; having keywords from the post title in the url also can help search engines judge the quality of a page.
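
To make the last point concrete, here is a minimal sketch of how a blog platform might turn a post title into a keyword-bearing url. The `slugify` function and its `max_words` parameter are hypothetical names chosen for illustration; this is not how Spaces or any particular blog engine actually builds its urls.

```python
import re
import unicodedata


def slugify(title, max_words=8):
    """Turn a post title into a lowercase, hyphen-separated url slug.

    Hypothetical helper for illustration only; real blog platforms
    have their own rules for building permalinks.
    """
    # Decompose accented characters and drop anything that is not ASCII.
    ascii_title = (
        unicodedata.normalize("NFKD", title)
        .encode("ascii", "ignore")
        .decode("ascii")
    )
    # Keep only runs of letters and digits, lowercased.
    words = re.findall(r"[a-z0-9]+", ascii_title.lower())
    # Cap the number of words so the url stays short.
    return "-".join(words[:max_words])


print(slugify("Introducing Windows Live Writer"))
# prints: introducing-windows-live-writer
```

A url ending in something like /introducing-windows-live-writer tells both people and search engines more about the page than an opaque ID does.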

Ah, but I can hear people saying that spaces.msn.com also had urls like “ericlam.spaces.msn.com/blog/cns!5D73BE0B4076E647!330.entry” and spaces.msn.com has lots of indexed urls. That brings up the whole subject that spaces.msn.com has been migrating to spaces.live.com. I know that was a pretty big change for the Spaces folks. Anytime we know about big changes like that across the web, we want to help make sure that the changes go smoothly. I know I got over 12 emails from GregP over at Microsoft about the progress of the migration from spaces.msn.com to spaces.live.com, and I checked with the indexing folks at Google to ask about how long it might take to crawl sites over at spaces.msn.com and see that they’d moved over to spaces.live.com. So there’s a chance that it might be something related to the hostload or the amount of new pages we can crawl over at spaces.live.com. Anyway, I’ll dig into it more tomorrow.

Let me know if there’s anything that you want me to ask Scoble. :)

Update: It was nice talking to Robert Scoble and Scott Mace of Calendar Swamp. We talked about blogging, the differences in cultures at several companies, and what Google should be doing better on. Scoble had a couple follow-on posts here and here.

By the way, it looks like the primary issue with the Windows Live Writer blog was the large-scale migration from spaces.msn.com to spaces.live.com about a month ago. We saw so many urls suddenly showing up on spaces.live.com that it triggered a flag in our system which requires more trust in individual urls in order for them to rank (this is despite the crawl guys trying to increase our hostload thresholds and taking similar measures to make the migration go smoothly for Spaces). We cleared that flag, and things look much better now.

For a search like [windows live writer], I see the Windows Live Writer blog at #1, and the Windows Live Writer Beta product download page at #2. Going forward, I’ll keep an eye on the spaces.msn.com to spaces.live.com migration with the crawl folks to make sure that it continues to be smooth. It also looks like Mike Torres is #1 for searches like [torres talking], so overall things look pretty good now.

Handling noindex meta tags

Okay, here’s a question. I did the search [congoo] recently and didn’t get the home page of Congoo–why not? If you view the source of http://www.congoo.com/, it turns out that they have a noindex meta tag:

<meta name="robots" content="noindex, nofollow" />

Okay, so Congoo apparently doesn’t want their root page to show up in search results pages. Fair enough. But just for fun, I did the search on Ask, Yahoo!, and MSN. Ask doesn’t show the root page from Congoo, but Yahoo! and MSN do. MSN shows just a url reference:

[Screenshot: MSN has Congoo in the index]

But if I click on the Cached link, I get the message “Could not find the requested document in the cache.” So it looks like MSN may handle noindex meta tags by showing a url reference but not any snippet.

Now let’s look at Yahoo’s result:

[Screenshot: Yahoo has Congoo in the index]

Huh. They show a 15K page with a Cached link. I clicked on that link and the cached copy had the noindex meta tag on it.

So based on a sample size of one page, it looks like search engines handle the “noindex” meta tag as follows:
- Google doesn’t show the page in any way
- Ask doesn’t show the page in any way
- MSN shows a url reference and Cached link, but no snippet. Clicking the cached link doesn’t return anything.
- Yahoo! shows a url reference and Cached link, but no snippet. Clicking on the cached link returns the cached page.

Something to be aware of. Personally, I’d prefer it if every search engine treated the noindex meta tag by not showing a page in the search results at all.
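
If you want to check whether a page of your own carries this tag, here is a minimal sketch using only Python's standard library. The `RobotsMetaParser` class and `robots_directives` function are hypothetical names for illustration, and the sketch only reads a robots meta tag from an HTML string; it says nothing about how any search engine actually parses or honors the tag.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collect the directives from any <meta name="robots"> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names, and routes
        # self-closing tags like <meta ... /> through this method too.
        if tag != "meta":
            return
        attr = dict(attrs)
        if (attr.get("name") or "").lower() != "robots":
            return
        content = attr.get("content") or ""
        for directive in content.split(","):
            directive = directive.strip().lower()
            if directive:
                self.directives.add(directive)


def robots_directives(html):
    """Return the set of robots meta directives found in an HTML string."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return parser.directives


page = '<html><head><meta name="robots" content="noindex, nofollow" /></head></html>'
directives = robots_directives(page)
print("noindex" in directives)
# prints: True
```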

Five Tips for Burning Man

Okay, Burning Man is going on this week. A few years ago I went to Burning Man, and I thought I’d share some advice to keep you out of trouble and make it more fun if you decide to go.

1. Don’t take your own car. Burning Man is a temporary camp on a desert playa. That means dust. When I took my 1994 Ford Escort a few years ago, I never did get the dust completely cleaned from the car, right up until the day the car died. Save yourself the stress: rent a car or a van.

2. Don’t agree to drive someone who buys gallons of yellow body paint. I’m looking at you, N. I never did get those traces of yellow body paint completely cleaned out of my car either. This advice also applies to red body paint, blue body paint, and green body paint. Really, any type of body paint.

3. Take a digital camera and a normal camera. Taking pictures of naked people? Use the digital camera. Taking pictures of art installations or the scenery? The normal camera is fine.

4. If it’s your first time, don’t go on a Monday. I know that you want to get the “full Burning Man experience.” But the fact is that you’ll be out in the desert for days, and that can get old if you don’t know many people. Go on Wednesday or Thursday and you’ll still have plenty of time to see the art cars and watch the Man burn.

5. Keep your eyes open and try new things. For example, when I was there, I kept seeing trucks driving around every few hours, but didn’t pay much attention to them. On the last day, I realized that the trucks were spraying water as they went. Smart people keep their eyes open and spot opportunities, whether it be free water showers or a chance to talk to someone new.

Video: Datacenter comments

Okay, people always ask me about weather updates and what’s going on with data centers at any given moment. I recorded this summary last week as an overview of the various software infrastructure changes that have happened this year, and some meta-insights about data center watching.

Session 15: Data center comments

A combination of review and weather update for datacenters as of August 23rd, 2006. Some comments on:
- data refreshes on June 27th, July 27th, and August 17th 2006
- Bigdaddy
- Supplemental Results
- Infrastructure that improves quality and gives more accurate site: results estimates

Plus a short reminder that
- results estimates for site: are just estimates, with some reminders of their limitations (I mention too-high site: estimates in the middle of the video, in the context of the “5 billion page” spammer whose best domain actually had <50K pages on it).
- watching individual data centers may not be the most productive use of your time. :)

As a bonus, at the beginning of the video I review some schwag from the recent SES San Jose conference.

SEO earthquake: Danny Sullivan leaving SEW and SES

Today, Danny Sullivan announced that he’s leaving the popular searchenginewatch.com site that he built and the Search Engine Strategies series of conferences that he ramped up. For the search industry, this is a nine on the Richter scale and has the potential to shake the whole industry for a few months. Danny has been covering the search industry for over a decade now, and the brands that he built in Search Engine Watch and Search Engine Strategies are incredibly strong–although not as strong as the reputation that Danny built on a personal level. Anyone that’s had a chance to spend any time with Danny has a huge amount of respect for him. And no matter how you slice it, Danny counts as one of the “founding fathers” of the search industry.

Only time will tell how this will all shake out. My sense is that Danny will benefit from this–any company would be lucky to have Danny. I guess the one thing you can count on in the search industry is change. My spidey sense tells me that backchannels in search are probably lit up like Christmas trees talking about this. :)

Danny, I wish you the best in whatever you decide to do next.
