SES NYC 2006, Day 2

Bleh. Two nights with five hours of sleep a night are catching up to me. I did a couple sessions today. I was scheduled to talk for Pundits on Search, and then I hopped in on Duplicate Content Issues to tackle Q&A and discuss a few high-order bits. I was added to that session at the last minute, so I didn’t bother to make PowerPointage; I just talked about some suggestions that I’d bear in mind. Chris Boggs did a great job of covering the duplicate content session.

For the “Pundits” panel, the questions were softer than I expected. When you’ve got three poster bloggers from GYM in one place, I expected more “Let’s try to lead the experts into a mine field and then take off the blindfold”-type questions. Barry did a solid job of covering the session, but it was a bit of a “you had to be there” session. I gently teased Scoble because some versions of IE7 didn’t have a Google search option built-in, and the instruction manual showed you how to add (AOL? Ask? Yahoo?) but not Google. On the other hand, Scoble gave props to Google on mobile phones and for several other things, so everybody got along and nobody had hurt feelings.

Other tweaks I’d add to Barry’s write-up:

– Vertical search start-ups. I think it’s great that different folks are tackling vertical search areas. Power to them for exploring these opportunities. I mentioned that it was a plus that it’s easier to start a company these days. On the minus side, existing search engines have a lot of infrastructure, so trying a vertical search experiment is a lot easier for us (we already have a large chunk of the web, indexing code, a serving infrastructure, etc.). I had a nice chat with someone from Oodle and gave my recommendations for site architecture and subdirectories vs. subdomains.

– del.icio.us – I mentioned the bad first experiences I’d had with del.icio.us (importing bookmarks didn’t work for me) and Flickr (I hit a size quota immediately because I tried to upload full-size images), and how social search might not gain traction with regular non-techie folks for a while. The analogy I used was wireless stuff back in 2000-2001; it was clear that wireless would be important at some point in the future, but pinning one’s hopes to a WAP/WML search engine back in 2000 could be premature. In the same way that it was early for wireless back then, part of me wonders if Yahoo’s emphasis on social search is too soon. My Web 2.0 was launched around June 28th, 2005, so that’s 246 days or about 8 months (no, I’m not a savant; I searched for [date calculator] and used the first result). The My Web 2.0 page says that Y! is serving 170,160 tags, so dividing that out, is that under 700 new/distinct tags/day? It’s late, so I’ll leave the detailed discussion of tag growth for another day; such is the stuff of late-night debates. 🙂 Also via TW I notice that Greg Linden of Findory fame is thinking about this too, especially the issues of spam (remember meta-tags?) and non-participation.

– What to expect on webspam in the coming months. We’re open to whatever techniques are scalable and robust. We’re working on decreasing webspam in other languages. No surprises for current blog readers.

– Video search. I used to be a skeptic, but the availability of tools/cameras plus distribution like Google Video and YouTube is winning me over, especially after I lost a lazy Sunday afternoon just surfing the top videos at those two sites. The clincher for me was when I got hooked on Lost for a while. I Netflix’ed the first season, but after that I was stuck in the middle of season two without having seen its first few episodes. So I bought six(?) episodes of Lost from the Apple video store, watched them, and now I’m caught up and can watch Lost with the rest of the world. If you’d asked me a year ago if I’d ever pay for TV show downloads, I would have laughed. Turns out I was wrong. Sometimes convenience is worth a buck or two.
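As a sanity check on the tag-growth arithmetic in the social-search item above, a couple of lines of Python reproduce the numbers (the 170,160 figure is the one quoted from the My Web 2.0 page; taking March 1, 2006 as “today” is an assumption for the back-of-the-envelope math):

```python
from datetime import date

# Days from the My Web 2.0 launch (June 28, 2005) to roughly
# the time of this conference (March 1, 2006 assumed).
days = (date(2006, 3, 1) - date(2005, 6, 28)).days
print(days)  # 246 days, i.e. about 8 months

# Tags served so far, divided by days since launch.
tags_served = 170160
print(tags_served / days)  # roughly 691.7, so just under 700 tags/day
```

No date-calculator site required, and the “under 700 new/distinct tags per day” figure checks out.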

What else? I got a couple smart suggestions for Sitemaps features, a couple bug reports, had some really interesting conversations about spam in Germany and spam with a large catalog company, collected the latest gossip on competitors, heard a couple ideas for future spam attacks against us, and got to meet some new folks at dinner. So it was a good day.

Update: David Utter summarizes the pundits panel and the duplicate content issues panel well, if you’d like to read a different report.

Update: And it looks like there’s a podcast of the pundits panel, courtesy of Webmaster Radio, so now you can hear the whole thing for yourself if you want. Danny has a post here about the pundit panel, but the mp3 that he points to is Barry Diller’s keynote instead. That keynote is notable for Diller’s suggestion that Google should research a “Be Evil” philosophy, so you should listen to that too. 🙂 (Thanks, Brian!)

26 Responses to SES NYC 2006, Day 2

  1. GREAT post Matt, thanks for the remote view of SES NY. Next time I see you I’m asking which is the best location for SES outside of San Jose, which is an easy drive.

  2. Hey Matt,

    If you’re starting to get down with the video search, I’ve got a crazy-assed idea (probably because it’s 1:30 in the morning and I can’t sleep).

    What about “streamed searching” or “what’s on the radio now?” searching? Stations such as (Jimmy Buffett’s station, and Jimmy is GOD of all that is musical) could send XML requests to Big G that would be updated in real-time.

    Just a thought.

  3. Oh, oh!!! I know!!! How about real-life searching, like a flying googlebot going through people’s windows and stuff?

    if it goes through I want my check from Google!


  4. Mike (Germany)

    [quote]large catalog company[/quote]

    Hi Matt,

    What is a catalog company? Is it a shopping site, or something similar to DMOZ?

    by mike

  5. Right on about video search. NBC has let Apple load the pilot episode of Conviction (new law show). I would have never considered watching it or sacrificing precious DVR space. But a free download? Sure! Brilliant concept! Especially when you are traveling in a foreign land and in desperate need of a little English-language entertainment.

  6. I was wondering if you could detail some of your site architecture recommendations, subdirectories vs. subdomains, or point us somewhere for a good read. Great post, thanks for taking a few minutes to keep us informed.

  7. Gheesh, can’t someone get with the freakin’ program and tape and digitize these sessions for download? Not that seoroundtable and “dutter” don’t have the skills to sum up what was said, but it just doesn’t do the trick for me, sorry.

    I am not getting the entire picture on duplicate content here. The message I am getting is: yeah, RSS content duplicators are still having at it because of the speed of the scrape. But did I also hear that duplicate content can help a new site as a form of weak backlink? And if your content goes down into supplemental results, does this mean that it is no longer seen as yours?

    Ever play that game while sitting around the campfire where you tell a story into someone’s ear and they whisper it to the next person? That’s right, the original story loses its focus.

    Don’t make me come out to these sessions and record everything said like some kind of crazed marketing groupie fan! Oh, and by the way, wouldn’t that be the buzz? 😉

  8. Aaron makes a good point, but I don’t think conferences are going to make these presentations available online in any form. Digital downloads would probably undermine attendance, which is the real moneymaker for a conference.

    Plus there could be copyright issues to address with all of those presenters; what if one person out of a five-person panel said “no”?

  9. Well, you can’t just roll into these sessions and record or video things. I’m actually the guy who took the notes that ‘dutter’ turned into the article. We try to get notes on the sessions out really fast and cover a lot of sessions, so I generally try to take some good notes and quotes and send them home so they can be packaged nicely and distributed.

    You can try to take notes and quotes, and obviously you want to try to convey the most important points, but you can’t just ‘bottle’ the session and present it for download. There are copyright issues, etc., involved with that sort of thing. That, and it wouldn’t exactly encourage attendance (which is insane this time around, I might add).

  10. Adam, Gary Price had a good post a while ago about searching what XM and Sirius had. It was probably on ResourceShelf; good stuff.

    Michael, the main point I made is that subdirectories are lighter-weight in some sense to surfers. If you immediately start out with subdomains like metro areas or types of searches, you’re stuck with that. With subdirectories, you could have a /sanjose/ directory and a /bayarea/ directory, and somehow the conflict isn’t quite as great to visitors as it is with separate subdomains. It’s all just bits, of course, but when I’ve talked to surfers about pages that are similar or a subset/superset of each other, they often prefer that to happen via subdirectories instead of subdomains.

    Aaron, I added a link to the MP3 of the pundits panel. I can understand the tradeoff between the conference not wanting an MP3 (perhaps speakers would be less forthcoming, or there are intellectual property questions about who owns the rights, or the conference doesn’t want to lose people who would just stay at home and listen to the audio), but personally I’m in favor of making audio widely available. The approach of picking one session per day and making that available isn’t a bad compromise.

  11. Matt,

    Hope you are having fun at the conference.

    What is Google’s position on using articles to generate links, i.e., submitting objective articles to news and PR sites?

    Should we use the “nofollow” tag for these as well?


    PS Will you be at the Miami Conference?

  12. Matt,

    Thanks for your insightful and entertaining blog. I am struggling with how to move content WITHIN a large, long-established domain, which we are moving from a Java server to an open-source platform. There are thousands of articles now in a sub-directory. I was thinking of putting them all in a new sub-domain on the same domain, and using 301s on each page of the old sub-directory to redirect to the new sub-domain. I would block SEs from the old directory using robots.txt. Still, I am terrified of losing high rankings. Is there something else to be worried about, or does this approach sound right to you? I am also wondering if 301s on each page are safer than trying a server-side redirect. Can I rename the articles, or is that risky too? Thanks a lot, Omar
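For what it’s worth, the per-page 301 approach Omar describes is usually done with one rewrite rule rather than thousands of individual redirects. A minimal Apache mod_rewrite sketch, with hypothetical hostnames and paths (example.com and /articles/ stand in for the real site):

```apache
# .htaccess on www.example.com (hypothetical names throughout).
# Permanently redirect every page under the old /articles/ subdirectory
# to the same path on the new articles sub-domain.
RewriteEngine On
RewriteCond %{HTTP_HOST} ^www\.example\.com$ [NC]
RewriteRule ^articles/(.*)$ http://articles.example.com/$1 [R=301,L]
```

One pattern covers every article, and each old URL answers with a 301 pointing at its new home. Note that crawlers can only see those 301s if they’re allowed to fetch the old URLs, so a robots.txt block on the old directory would hide the redirects as well.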

  13. Another vote here for MP3s of conference sessions. Free is best but charge if you want. I think this would NOT affect attendance negatively. (I’d still go to several per year).

  14. *** EDITED POST *** (since the < sign didn’t get in the first time.)

    Cool tip, Matt. I found the article in question. <– for those who may want to read it.

    But my idea was a step further. There are a lot of non-Sirius stations that have web broadcasts (e.g. The FAN 590, BBC Online) that could be searched for.

    In other words, expose all of it. All the different cultures and aspects of what’s on the radio.

  15. Wish I could have been there to see the Google/MS dynamic.

    I agree with you on regular users not getting into the group search dynamic and things like del.icio.us. At the end of the day, the majority of the content is dork-centric. If you could get a large sample of business folks’ing (yes, this is a made-up word), I think that the value would start to emerge to the non-tech crowd. All you have to do is look at the tag cloud to see the limited appeal.

    Get some sleep!

  16. Regarding duplicate content: are penalties now exercised at the index level rather than the result level?

    Prior to BigDaddy, a site with ~3500 pages was fully indexed (“site:” operator actually showed 23,000) and had Google searchers regularly arriving to 2500 different pages.

    After BigDaddy, only 700 pages remain indexed, even though both versions of the Google crawler are crawling the site more than ever.

    What are the odds this is a technical problem on Google’s side vs. some kind of penalty? Is there a clear way to distinguish?

  17. Glad you’re having a good time Matt,

    I was really excited when I saw the link to the duplicate content session since I just asked you about it the other day.

    I have to admit that I finished the article a bit more confused than when I started, though. I guess some of it just went over my head 🙁

    Thanks for all the info though

  18. Matt,

    This page shows how to add Google to IE 7:

  19. Ben, I get along really well with Scoble. I’ve been at the table when he was having lunch at the Googleplex, and we talked until late the night before the conference.

  20. Hi Matt.

    I wanted to touch base on your subdomains vs. subfolders comment. From what I understand, your preference for subfolders was based on site usability? By having your site set up in subfolders, will there be any effect on Google search rankings (positive or negative)?

    I have completely redesigned our company website and addressed some of the major problems we were having. We offer many services, and to avoid confusing the user with overload, I have separated the services into 5 distinct sections. Each section is set to be its own subdomain, and all the pages will contain the same architecture except for the text and some images, of course.

    We went with subdomains to accomplish two things. To separate and distinguish each section of our services and to be able to run campaigns which could target a desired service we wanted to increase business in.

    Do you see any problem associated with the interlinking of subdomain pages, and will there be a negative effect with Google from what I said above?

    Looking forward to your thoughts


  21. Matt,

    I suggested one of the sitemap features (nodes to specify parameters that do not affect content). I have a problem trying to accurately distinguish the % of legit indexed content on our site versus dup content due to tracking parameters. The site is millions of pages, and the Google index varies between 1/10 and 200x the reasonable number. I would like to send you the results for the past 4 days to get your insight into what is happening. Would you be willing to share your email address?


  22. Hi, Matt,
    I can’t find answers to these two questions about duplicate content:
    1. Should I worry that the same page can be accessed both with and without the www prefix?
    2. Would adding someone’s RSS feeds on my PHP page (such that its content would be search engine searchable) be duplicate content? (I noted that all RSS publishers I asked so far are actually encouraging posting their feeds).
    I would appreciate your input.
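On question 1 above, the usual fix is to pick one hostname and 301 the other to it, so only one version ever gets indexed. A minimal Apache mod_rewrite sketch (example.com is a hypothetical domain):

```apache
# .htaccess (hypothetical domain): force the www form so the same page
# is not reachable at two different hostnames.
RewriteEngine On
RewriteCond %{HTTP_HOST} ^example\.com$ [NC]
RewriteRule ^(.*)$ http://www.example.com/$1 [R=301,L]
```

With this in place, http://example.com/page and http://www.example.com/page collapse to a single canonical URL.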

  23. Hi Matt:

    Sounds like a great conference. We have a small online magazine and have been using news releases from PR Newswire as the basis for content when a release contains news that is relevant to our magazine’s overall theme.

    Some of the releases contain content which is appropriate to use verbatim, e.g. a doctor outlines the 10 most relevant healthcare issues for one-year-olds. Will using this get us in trouble for duplicate content? Also, we often use the exact description provided by those who issue the release describing their organization, company….

    Thanks in advance for your input.

  24. Splogging competitors a good Google strategy?

    Regarding the link in this post to Duplicate Content Issues, and “looking at inbound and outbound links to determine if it is dupe content spam” in particular…

    I run a site critical of a multi-billion dollar corporation that is known for its corruption. Several months ago, around the time it was being publicized all over that the new Google update would put a higher penalty on mass link building, I started noticing visitors were accessing via a lot of spammy search terms. A little investigation revealed that someone was posting blog comment spam links back to my site by the hundreds. Correspondence with some of the bloggers helped me determine that the comments were coming from multiple IP addresses, so most likely from some kind of untraceable bot.

    The spam terms weren’t even in the text of the links back to my site, but in the other spam comments that had been left on the blogs for lovely things like breast enhancement and phentermine. Although I was getting extra visits from these new terms (most of which can’t be found on my site AT ALL), at first it didn’t adversely affect my listings for valid searches. That is until the latest Google update in February, when I lost half of my traffic and nearly all of my Google listings. I’m now ranking for a very limited number of legitimate search terms only, and what is missing is so obvious there absolutely must be some kind of penalty.

    Check the backlinks and all of the legit links are deeply buried in tons of splog crap, and most of the blogs have been abandoned for years, with no contact information to have the spam comments removed. Google Sitemaps lists the top words linking back to me, which are all spam, with ZERO related to my actual content. In short, this underhanded attempt by this corporation to squash a critic and place a Google penalty on my site was 100% successful and there’s not a darned thing I can do about it. Creating a new domain and starting over will mean an age-related penalty and/or a new opportunity to penalize me with splog.

    Now that it is so easy to do, I expect we’ll see a lot more of this kind of dirty trickery. I hope this issue will be addressed in a future update, because no one should be penalized for things that are out of their control, like who is linking back to them.