Video: Datacenter comments

Okay, people always ask me about weather updates and what’s going on with data centers at any given moment. I recorded this summary last week as an overview of the various software infrastructure changes that have happened this year, and some meta-insights about data center watching.

Session 15: Data center comments

A combination of review and weather update for datacenters as of August 23rd, 2006. Some comments on:
– data refreshes on June 27th, July 27th, and August 17th 2006
– Bigdaddy
– Supplemental Results
– Infrastructure that improves quality and gives more accurate site: results estimates

Plus a short reminder that
– results estimates for site: are just estimates, with some reminders of their limitations (I mention too-high site: estimates in the middle of the video, in the context of the “5 billion page” spammer whose best domain actually had <50K pages on it).
– watching individual data centers may not be the most productive use of your time. 🙂

As a bonus, at the beginning of the video I review some schwag from the recent SES San Jose conference.

56 Responses to Video: Datacenter comments

  1. hi Matt,
    I really need your help. I am a confused SEO and quite sad about it. I cannot get an answer from anyone else, and I think you are kind and might be willing to help. Thanks in advance!
    My question: most people think backlinks = inbound links. I thought so too, and I thought backlinks did not include internal links. But when I look at backlinks through the Google Toolbar, I find that they do include internal links. I am confused, because the glossary defines inbound links as “Links pointing to a website. When a user arrives at a website from another site, that link is called an Inbound Link.”
    Please notice that it says “from another site”, not “from another page”, which is different. So I think inbound links do not include internal links, and inbound links are not the same as backlinks. The definition of backlinks is “Backlinks are links pointing at a specific page. Most major search engines allow you to see who is linking to a page or site. That process is called “checking backlinks”. Some toolbars have shortcuts for this process. The syntax is often “link:http://example-url-here/”.”
    What do you think? I am going crazy over this and really need your help!

  2. So what science fiction series was that you got given at the conference? The video was just a bit too pixelated to read the spine on the video.

  3. Hi Matt,

    Well, the weather is not pleasant for me. I have kept an eagle eye on my website for the last two months and found that all of my indexed pages are in the supplemental results. In mid-July I checked my website and found 19 supplemental results. At the start of August I found only one indexed page that was not supplemental. Today, after this post, I checked and found 33 supplemental result pages.

    I think Google now gives a lot of preference to referential links: links pointing to your webpage with keyword-oriented anchor text. This is similar to a Google bomb or Google wash.

    I have one query here. When you search for vikas amrohi on Google, you will find my site http://www.theseoguru.com comes up on top. I am not using vikas amrohi anywhere in the content of that page, so this is due to link references (I think).
    My query starts from here: when I make a sharp search for “vikas amrohi” on Google, I again find my website on top.

    Why is this page coming up on top?
    As per Google’s advanced search guidelines, this is a sharp search and the search results should contain both of the words.

    I am also scared that all of my indexed pages are supplemental. Why they are supplemental is the big issue. Can I relate Google wash to supplemental results?

    Help me DuDe

  4. So that’s why I see the normal number of pages for a lot of websites …

    Thanks for the update Matt 🙂

  5. Matt,

    Do you feel that the widespread complaining currently going on among some webmasters regarding Google SERPs is justified?

    This is occurring on many webmaster forums, and while my site doesn’t seem to be affected, it is also not advancing in the SERPs.

    I don’t know how, but my site has been getting the same amount of traffic from Google for over 6 months with no change. I was wondering if traffic numbers are allotted to sites, and if positions in the SERPs change according to Google’s allotted visitor numbers.

    OR is the official viewpoint one of Big Daddy being perfect?

  6. Great info. Thanks Matt.

  7. “…TheSEOGuru Said…”

    You sell yourself as “TheSEOGuru” and this is a puzzle? Considering there are a whopping 680 or so results for your name, and the fact that your name is adjacent to your URL on every post/blog/forum/article site, is it that difficult of a mental leap? Contextual relevance… ?

  8. Hi Matt,

    I have one question. The first sentence of your blog says “People always ask me…”. What is the best way to ask you questions? Is there an email contact, or is posting the question here the best way to go?

    If the second option is fine, then I would like to ask whether there are any serious concerns about Google indexing the content of AJAX web pages. If so, what is the status, and are there any recommendations (e.g. using a specific AJAX framework)? I would also appreciate it if you could point me to a discussion forum related to this topic.

    Many thanks,
    Lukas

  9. The use of the term DATA PUSH seems like a marketing buzzword –

    it would have been helpful to know from a programming/Engineering/Development perspective of WHAT a DataPush is technically.

  10. Once again, the content is the king. Thanks Matt.

  11. Seoguru, if you search for the word ‘Failure’ on Google, George Bush’s website comes up at the top, though it does not have the word failure anywhere on the site. So the only explanation may be that there are some enemies of George Bush and friends of vikas amrohi and seoguru on the Google team. Matt would give a better explanation. 🙂

  12. Nice video Matt, thanks.

    When you say “correct to 3 significant digits”, is that site: estimates or all estimates?

  13. These books are a must-read for every SF fan. The German translation consists of 6 books and approximately 5500 pages.

  14. Did I miss the memo somewhere, or is there a podcast feed that I can subscribe to for Matt’s show?

  15. Hi Matt,

    I’ve really enjoyed watching your videos on Google Video. My team and I find them really useful and informative.

    I have a question for you.

    To what extent does a server’s location impact natural search for a site? I look after a site with a presence in 9 different countries (Australia, Poland, South Africa, UK etc.) and all of our servers/data centers are located in London. Would Google view our Australian domain in a different way (perhaps view it as more important) if our servers for gumtree.com.au were actually in Australia?

  16. SEW, a “data push” is basically what it says. the moving of either a database, or files, or whatever… from a development server to a live production server.

    It’s an update of data.

    maybe you weren’t asking about “what it means” moreso as “what does it mean in Google’s context”… I wish I could help with that part, but I can’t.

  17. Good idea to wear the dark T-shirt again, Matt. When you wear a light one, your shoulder kinda blends into the back wall (although it looks pretty cool in that Criss Angel, Mindfreak sort of way).

    By the way, is anyone else noticing that Matt’s voice is out of sync with his mouth, or is that just me? (About the 7:00 mark is where I noticed it personally, but it may be earlier.)

    Other than that, my only comment on the video is the generic “good stuff as always” and I have nothing to add. 🙂

  18. Hey Matt,

    while i searched in google.DE for “heiko”, google displays me some serps for “Julia Wilke” as can be seen here: http://www.manuelbieh.com/upload/heiko.png

    I don’t get the interrelationship of “heiko” and “julia”, maybe you can?

    best wishes from germany,

    Heiko

  19. Awesome info, as always. Thanks much, Matt!

  20. I can recommend the Night’s Dawn trilogy. His latest one’s good too; I stayed up all night to finish it.

    Wonder what faction in the ND universe the guy giving it thinks Google is 😉

    A lot of big books get split for the States, though I think Robert Jordan’s don’t.

    Mary Gentle’s Ash, for example, got split.

  21. LOL

  22. Sorry, I messed up on my comment. Thanks for the latest vid. But also, for those of you who admire Matt’s wall poster, run an image search for language families of the world and you will find it 🙂

  23. Matt, wouldn’t it be easier to just do a podcast instead of a video? It’s smaller, you can do it while your wife is home, and tons of seos will have it on their ipods (although i could put your video on my ipod.. i’m not sure i’d want to carry you around with me)

    anyway… it just seems like there isn’t much point to doing a video. sure you can get through more topics, but your lips aren’t in sync, and you don’t show any visuals or anything.

    some screencaps, or something would help. Except for the props, I’ve just minimized my browser and kept working while listening to your videos.

    maybe i’m retarded, but it just seems like an mp3 would be more useful. you could then use phillips google mp3 player find to include them all right here on the site…

  24. Marketing 101…

    Splitting a book into 3 is like late night tv 3 easy payments…
    Or 60 easy monthly car payments…
    Or a 30 year mortgage…

    Better yet…if nobody buys the first third…no trees have been wasted on the last 2 thirds…

    Imagine marketing a house in 3…
    First you get the kitchen and bathroom…cause ya got to eat and well uhhh…
    Next you get the living room and bedroom…cause ya got to have a place to watch football, drink beer and sleep…
    Finally you get the garage and dog house…a place to park your car and a place to go to when the wife is sick of you using part 2…

    Do you think each part has equal value?
    And is each part worth $99,999.99 in the California market place?
    Has it been done on eBay already?

    And finally…would one part show up in Google…one part in Yahoo and another in MSN?

    Imagine how I would think if I didn’t have coffee in the morning…scary

  25. Next you get the living room and bedroom…cause ya got to have a place to watch football, drink beer and sleep…

    So what’s the purpose of the living room?

  26. Rotating from week one to week two allows time for the wife to clean, prep and air out said area…

  27. echolu, “backlinks” refers to all links pointing to your page. A lot of people say “off-domain backlinks” to exclude self-links from your own domain.

    TheSEOGuru, having supplemental results these days is not such a bad thing. In your case, I think it just reflects a lack of PageRank/links. We’ve got your home page in the main index, but if you look at your site at
    http://siteexplorer.search.yahoo.com/advsearch?p=http%3A%2F%2Fwww.theseoguru.com&bwm=i&bwmo=&bwmf=s
    you’ll see not a ton of links (I know my links are probably nofollow links, and several more of the SEO links are probably that way too). So I think your site is fine (e.g. I wouldn’t worry about spam penalties); it’s just a matter of we have to select a smaller number of documents for the web index. If more people were linking to your site, for example, I’d expect more of your pages to be in the main web index.

    “So that’s why I see the normal number of pages for a lot of websites …” Exactly right, Cristian. 🙂

    martialarm, any update that affects a significant number of sites can’t be 100% perfect, but we do a lot of testing to try to ensure that updates improve quality and relevance from what we had before. Part of what we do online is listening to get reports of anything suboptimal so that we can try to fix it in future changes.

    S.E.W., we’ve used the term data push since I’ve been at Google. You compute some data (PageRank, backlinks, new index data, info: data, related: data, new spam data, whatever). Then you push it. Data doesn’t go from your desktop machine to thousands of production machines magically. It’s actually a hard problem (see Opsware running data centers as a full-time business, for example). When the amount of data to be moved is massive, schemes such as “compare-by-hash” can fail, for example. Compare-by-hash means that you hash some data to be moved and if the checksum agrees, then you don’t need to push that data. The problem is that if you’re moving around, oh I don’t know, a copy of the web 😉 , you inevitably find some degenerate cases where the hashes compare equal but the data is different. This makes a good engineering interview question, by the way. Google has spent a lot of time developing techniques to do data pushes well by decoupling the push from computing the data in the first place. But no, “data push” is not a marketing term. It has a precise engineering meaning here at the Plex, which I hope you’ve gotten an idea of now. 🙂
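
As a rough illustration of the compare-by-hash idea described above (not Google's actual push system, which the comment does not detail), here is a minimal Python sketch. The chunk names, the `chunks_to_push` helper, and the deliberately weak byte-sum checksum are all hypothetical; the weak checksum is used only to make a collision easy to reproduce. In a real push the remote digest would be computed on the remote machine rather than locally.

```python
import hashlib


def weak_checksum(data: bytes) -> int:
    """Deliberately weak checksum (byte sum mod 256) so a collision is easy to show."""
    return sum(data) % 256


def strong_digest(data: bytes) -> str:
    """SHA-256 digest; collisions are not expected in practice."""
    return hashlib.sha256(data).hexdigest()


def chunks_to_push(source: dict, remote: dict, digest) -> list:
    """Return the names of chunks that are missing remotely or whose digests differ."""
    return sorted(
        name
        for name, data in source.items()
        if name not in remote or digest(data) != digest(remote[name])
    )


# Hypothetical data: "page-1" differs between source and remote,
# but b"ab" and b"ba" have the same byte sum.
source = {"page-1": b"ab", "page-2": b"hello", "page-3": b"new"}
remote = {"page-1": b"ba", "page-2": b"hello"}

print(chunks_to_push(source, remote, weak_checksum))  # page-1 is wrongly skipped
print(chunks_to_push(source, remote, strong_digest))  # page-1 is pushed
```

With the weak checksum, the changed chunk is never pushed because the hashes compare equal while the data differs, which is the degenerate case the comment mentions. A stronger hash makes that case far less likely but does not change the shape of the scheme.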

    Ian, I believe that’s all results estimates. If I search for [matt cutts] I get “Results 1 – 10 of about 1,970,000 for matt cutts”. In that case, 1.97 are the three significant digits. We’ve done it that way for years and years, but fewer people notice than you’d expect.
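
To make "three significant digits" concrete, here is a small Python sketch that rounds a count the same way the displayed estimate looks. The helper name `round_to_sig_digits` and the raw count 1,968,432 are assumptions for illustration; how the estimate itself is computed is not described in the comment.

```python
def round_to_sig_digits(count: int, digits: int = 3) -> int:
    """Round a non-negative count to the given number of significant digits."""
    if count <= 0:
        return 0
    extra = len(str(count)) - digits  # how many trailing digits to zero out
    if extra <= 0:
        return count  # already short enough to show exactly
    scale = 10 ** extra
    return (count + scale // 2) // scale * scale  # round half up, integer math only


# Hypothetical raw count that would display as "about 1,970,000".
print(f"{round_to_sig_digits(1_968_432):,}")
```

Integer arithmetic is used instead of `math.log10` so that very large counts are not affected by floating-point rounding.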

    Christopher Penn, not right now. You could subscribe to the feed that only mentions Movies/Videos. See http://mattcutts.com/blog/ignoring-me/ for how to subscribe to specific feeds, if you don’t want to read my cat posts, for example. 🙂

    Philipp, great transcript. I added a couple comments to clarify a few words on your post.

    Ryan, it turns out that video is actually easier for me. I do a one-shot take and upload the video from a web browser. And Google Video pays the bandwidth, so that’s easy. With a podcast, I’d have to pay the bandwidth, and enough people tune in (over 100K views so far) that it would add up a little bit. Down the road I may mix things up more, but if you want to minimize the window and just listen for now, that’s great. 🙂

  28. Matt Said, “having supplemental results these days is not such a bad thing”

    That is probably the most refreshing thing I’ve read in a long while. Thank you. John.

  29. “this video is unavailable at this time, please try again later”

    any idea what’s up with that Matt? I was going to watch the video and got that error =(

  30. Hi Matt,

    Enjoying the videos, thanks!

    Rgds

    Damon

  31. Huh. I can see it just fine, Jonathan, so I’m assuming that it was just a transient thing..

  32. Hi Matt,

    could you be more precise concerning the people getting caught by the Black Tuesday? With your little hints you were saying: “do not always use the same keyword in title, desc, h1, h2 and incoming anchor text and don’t stuff the keywords in the s on your site because this will make it look like one of those millions of spam sites, which we only can kill by targeting all sites which fit this scheme”, right? That’s what you mean by “deoptimizing” and “not listening to SEOs too much”, or did I miss your point?

    rgds, Philipp

  33. Thanks for the video Matt. The funny part is that I was watching your video at 3am (exactly when you said that!) trying to figure out why our SERP’s are buried at 900+ instead of reading some SEO forum 😉

    Our site hasn’t changed very much over the last four years so the only time I hit the SEO forums is when we get penalized by Google. When Google kicks you between the legs and is silent when you ask why, who else are you going to ask?

    Honestly, it would be much better if Google simply said (via sitemaps or something), “hey, your keyword density is a little high, tone it down a bit.” We’re not hiding text on a page, paying for links, using whacky redirects, etc. We’ve been running this site for 11 years now and have always enjoyed well indexed SERP’s. However, twice now we’ve been kicked between the legs, and the last time we asked why, the Google team repeatedly told me nothing was wrong and that we weren’t penalized. This time your video is the only thing which has stated that there was indeed a penalty pushed out on August 17th.

    I’ve lowered our keyword density but frankly, I have no clue what else could be the problem so I’m sitting here bent over what’s left of my groin and scratching my head at the same time. Not a pretty picture to be sure!

    KJ

  34. Good stuff Matt, thanks for the update.

    And hey, some advice on you video…

    I had a good friend that was a professional photographer and he taught me a lot about taking pictures – so maybe I’m overly critical. But did you ever see a picture of a person with a background object sticking out of his/her head? It’s distracting (especially to me).

    You’ve got to make sure that background map (with the dark frame) is not lined up with your head (and you dark hair). Look at the videos and you’ll see what I mean.

    And let’s see some color beside earthtones! You must have something in Google red, blue or green.

  35. And why can’t we edit our posts? I don’t know why but “your” always seems to come out “you.”

  36. thanks for the reply to my question Matt.

    Can you also say what you can about my question of whether search engines monitor traffic and alter a site’s rankings within the SERPs accordingly?

    I find it amazing I get approx the same number of google visitors per day.

    So regular it is spooky…

    So I ask: does Google determine that my site should get 500 uniques out of the daily average searches in my field, and alter SERP positions to achieve something near this?

    my site is http://www.martialarm.com if you want to check this out. My analytics graph is a flat line over an extended period 🙂

  37. i love Matt! thanks for your answer!

  38. About my domain with the linkfarm, that you talked about in the video, why do the datacenters, like

    http://72.14.207.104/search?q=site:vgchat.com

    show 36,600 to 51,400 results instead of just a few hundred that google.com shows? The big difference is what the Threadwatch article was about.

  39. Hi Nintendo, I answered your question on the Threadwatch thread before it got deleted. The short answer is that the lower numbers are likely to be the correct ones.

  40. Matt,

    I appreciate your comment about Supplementals and I can understand the implications if a website fell into the following category (total pages = 100, main index 50, supplemental 50).

    But I’m still seeing the following (total pages = 100, main index 100, supplemental 50) – meaning there are 50 entries out there in supplemental that in reality are duplicates of entries in the main index. These are malformed URL’s (multiple slashes, tracking code, etc). We have used redirects to point these to the correct representation.

    I’ve written to Google about this a number of times and was hoping Google would provide some means to remove these duplicate entries, especially since these entries are at least a year old. Redirects have been put in place nearly a year ago with no effect (on Google index) since the cached entries are so old.
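
The kind of malformed URLs described above (multiple slashes, tracking codes) can be mapped to a single canonical form before issuing a redirect. The following Python sketch shows one way to compute such a redirect target; the `canonical_url` helper and the parameter names in `TRACKING_PARAMS` are hypothetical and not taken from the commenter's site.

```python
import re
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical tracking parameter names; a real site would list its own.
TRACKING_PARAMS = {"ref", "sessionid", "utm_source"}


def canonical_url(url: str) -> str:
    """Collapse repeated slashes in the path and drop tracking parameters."""
    parts = urlsplit(url)
    path = re.sub(r"/{2,}", "/", parts.path) or "/"
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key.lower() not in TRACKING_PARAMS
    ]
    return urlunsplit((parts.scheme, parts.netloc.lower(), path, urlencode(kept), ""))


# A request for the malformed form would be redirected to the canonical form.
print(canonical_url("http://www.example.com//products///widget.html?ref=newsletter&id=7"))
```

A server would compare the requested URL with this canonical form and redirect whenever they differ; whether and when a search index then drops the old entries is a separate matter, as the comment notes.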

    I believe there is a real problem with this. We recently were notified by Google that one of our sites was linking to a “bad neighborhood” site and had thus been deemed low in the Trust category. We were even provided the exact external link which was very helpful. Upon investigation however, we found that this link had been removed from our site over a year ago during a major cleanup. The only page that Google could have found this link was on one of these old Supplemental cached pages. Our page where this link had been removed is currently in the Google main index with a recent cache date.

    I understand the intent of the Supplemental index, but perhaps there are unintended consequences such as this that might have negative effects on websites.

  41. Rotating from week one to week two allows time for the wife to clean, prep and air out said area…

    So…by using two rooms…each room ends up fresh on alternating weeks.

    TxRex, your ideas are fascinating to me, and I wish to subscribe to your newsletter.

  42. Matt, before you referred to June 27, July 27, Aug 17, etc as “data refreshes” and now you are calling them “data pushes”. Are these two terms interchangeable, or are they different things?

  43. Here’s a question for you, Matt, if you are still reading new posts in this thread.

    Big Daddy introduced a system of making an overall evaluation of a site’s linkages (in and out) to determine how many of its pages to have in the regular index. In your reply to TheSEOGuru, you referred to it again when you wrote:

    … it’s just a matter of we have to select a smaller number of documents for the web index. If more people were linking to your site, for example, I’d expect more of your pages to be in the main web index.

    I initially berated the Big Daddy changes that dropped pages from the index, but then I arrived at a theory and wrote that, if my theory is true, then I can’t find fault with the BD changes. So the question is, is the theory anywhere near the mark? Here’s the theory…

    Because the Web is growing so fast, largely because of the number of useless sites being put up for various reasons, Google has decided not to try and keep pace with it any more, by adding new capacity all the time. So you’ve made a fundamental change in your indexing, and instead of indexing as much as you can, you are now indexing as much of each site as each site merits (that linkage evaluation).

    Is there any reality in that theory, or is it way off the mark?

  44. Hey Matt,

    Totally off-topic (but Google-related) question and then a totally off-topic (but also Google-related) comment.

    Question:

    A friend of mine found this Google temp job posting and she’s trying to figure out how to go about applying for it (i.e. tailoring her cover letter, all that stuff), but it’s somewhat vague:

    http://www.google.com/support/jobs/bin/answer.py?answer=23509&query=quality%20rater&topic=&type=quality%20rater

    Is this related to the Adwords (advertisers) and the quality of the sites they are advertising or to the Adsense (publishers) and the quality of the sites they are publishing? (I suspect the former).

    Also, any other information you or any other Google-type reading this could provide would be greatly appreciated.

    Comment:

    Google did a great job with their cameo in the movie Crank (badass over-the-top action movie where all sorts of stuff blows up and people die, if you’re into that sort of thing). I saw it last night. You guys really lost yourselves in the moment. 🙂

  45. Nice post Matt

    Thanks

  46. Hi Matt,

    I just noticed you deleted my comment…
    Was it because I was critical to the way Google managed the 27th June issue?
    Or was it because I compared Google to the IBM of the 70’s?

    Gregorio

  47. Matt,
    To estimate how many total pages Google has indexed, I simply search for a very common keyword like the letter ‘a’, assuming almost 99% of pages contain it.
    That way I got about 20 billion total pages from Google, 11B from Yahoo, and 2.5B from MSN.
    Interestingly, I tried this over the last couple of months, on different dates, to sample the changes in the total number of indexed pages. It did show quite some difference, although there is no question that Google has been No. 1 in the total number of indexed pages, way ahead of No. 2 and 3.
    However, the actual value for Google changed from 18B to 25B over the last couple of months.
    So here are the questions:
    1) Why is 25B pages the limit? Is it due to storage or computation (crawling/indexing) time constraints? The Internet’s total page count is growing rapidly, so why don’t I see similar proportional growth in the total number of indexed pages?
    2) Thanks for the cached time stamp showing when the page was crawled.
    How long does it take for the index to be updated after the completion of a crawling cycle (I presume indexing runs simultaneously with, or right behind, crawling)? How would any search engine keep up with the growing number of web pages? Is the indexing too slow, which sets the above 25B limit? Or is it the crawler?
    3) But I do see news and blogs get indexed in minutes or within 1 day.
    Why don’t the rest of the web pages get incrementally crawled and indexed within 1 day or less? What’s the constraint preventing this?
    I can’t wait to see Google provide search over the real-time web! How long are we going to have to wait for this to happen?

    many thanks,
    Clement

  48. Hi Matt,

    I have exhausted all means of help, so I’m hoping you can look into this. I manage a website that I redesigned last April. It had duplicate description tags and product descriptions, which I fixed. It has a hyphen in the domain and was one of those that lost most of its pages in May but slowly regained them all. It was a year old then and had never ranked for its main keywords, and still doesn’t except for the words in its domain. This site is squeaky clean, but it appears to be penalized for some reason that we can’t fathom. The host said this was a new IP address when we changed hosts in early July, so we don’t believe it’s a poisoned domain. I sent in a reinclusion request 2 months ago and still no word from Google, so I’m wondering if you can check on it. Not sure I should post the URL, so if you can email me I’ll send you the info.

    Take care
    Lori

  49. Just saw this video but I liked video on crawl dates.
    Nice use of “multi-media”. 🙂

    On a side question: Now that Google has purchased YouTube, are there any plans to integrate Google Video and YouTube?

  50. Hey matt

    I know this post is old but the video is offline – something along the lines of ‘this video is no longer available’.

    i was hoping to watch it again.

    Mick

  51. I think Google now gives a lot of preference to referential links: links pointing to your webpage with keyword-oriented anchor text. This is similar to a Google bomb or Google wash.

  52. Google did a great job with their cameo in the movie Crank (badass over-the-top action movie where all sorts of stuff blows up and people die, if you’re into that sort of thing). I saw it last night. You guys really lost yourselves in the moment.

  53. Hi Matt,

    I just wanted to pass on my compliments for the useful array of SEO videos you have uploaded to Google Video. The Data Center Video was well worth the watch….I totally agree the focus should be on specific and unique content. I had also never thought of testing sites with a text reader in terms of usability. I guess my list of priorities has been restructured in line with your video comments….

    Thanks again

    Shahid.SEO

  54. As do I! One of the better posts I’ve been digging up over the past days.

    I used to research linking strategies at seobook.net . Looks like I’ll be hanging here and getting drunk over the Cutts site more frequently

  55. Very useful SEO stuff in just one video. Matt you have great SEO cap. I would like one too. 🙂
