Indexing timeline

Heh. I wrote this hugely long post, so I pulled a Googler aside and asked “Dan, what do you think of this post?” And after a few helpful comments he said something like, “And, um, you may want to include a paragraph of understandable English at the top.” 🙂

Fair enough. Some people don’t want to read the whole mind-numbingly long post while their eyes glaze over. For those people, my short summary would be twofold. First, I believe the crawl/index team certainly has enough machines to do its job, and we definitely aren’t dropping documents because we’re “out of space.” Second, we continue to listen to webmaster feedback to improve our search. We’ve addressed the issues that we’ve seen, but we continue to read through the feedback to look for other ways that we could improve.

People have been asking for more details on “pages dropping from the index” so I thought I’d write down a brain dump of everything I knew about, to have it all in one place. Bear in mind that this is my best recollection, so I’m not claiming that it’s perfect.

Bigdaddy: Done by March

– In December, the crawl/index team was ready to debut Bigdaddy, a software upgrade of our crawling and parts of our indexing.
– In early January, I hunkered down and wrote tutorials about url canonicalization, interpreting the inurl: operator, and 302 redirects. Then I told people about a data center where Bigdaddy was live and asked for feedback.
– February was pretty quiet as Bigdaddy rolled out to more data centers.
– In March, some people on WebmasterWorld started complaining that they saw none of their pages indexed in Bigdaddy data centers, and were more likely to see supplemental results.
– On March 13th, GoogleGuy gave a way for WMW folks to give example sites.
– After looking at the example sites, I could tell what the issue was within a few minutes. The sites that fit “no pages in Bigdaddy” criteria were sites where our algorithms had very low trust in the inlinks or the outlinks of that site. Examples that might cause that include excessive reciprocal links, linking to spammy neighborhoods on the web, or link buying/selling. The Bigdaddy update is independent of our supplemental results, so when Bigdaddy didn’t select pages from a site, that would expose more supplemental results for that site.
– I worked with the crawl/index team to tune thresholds so that we would crawl more pages from those sorts of sites.
– By March 22nd, I posted an update to let people know that we were crawling more pages from those sorts of sites. Over time, we continued to boost the indexing even more for those sites.
– By March 29th, Bigdaddy was fully deployed and the old system was turned off. Bigdaddy has powered our crawling ever since.

Considering the amount of code that changed, I consider Bigdaddy pretty successful in that I only saw two complaints. The first was the one that I mentioned, where we didn’t index pages from sites with less trusted links, and we responded and started indexing more pages from those sites pretty quickly. The other complaint I heard was that pages crawled by AdSense started showing up in our web index. The fact that Bigdaddy provided a crawl caching proxy was a deliberate improvement in crawling, and I was happy to describe it in PowerPoint-y detail on the blog and at WMW Boston.

Okay, that’s Bigdaddy. It’s more comprehensive, and it’s been visible since December and 100% live since March. So why the recent hubbub? Well, now that Bigdaddy is done, we’ve turned our focus to refreshing our supplemental results. I’ll give my best recollection of that timeline too. Around the same time, there was speculation that our machines are full. From my personal perspective in the quality group, we certainly have enough machines to crawl/index/serve web results; in fact, Bigdaddy is more comprehensive than our previous system. Seems like a good time to throw in a link to my disclaimer right here to remind people that this is my personal take.

Refreshing supplemental results

Okay, moving right along. As I mentioned before, once Bigdaddy was fully deployed, we started working on refreshing our supplemental results. Here’s my timeline:
– In early April, we started showing some refreshed supplemental results to users.
– On April 13th, someone started a thread on WMW to ask about having fewer pages indexed.
– On April 24th, GoogleGuy gave a way for people to provide specifics (WebmasterWorld, like many webmaster forums, doesn’t allow people to post specific site names.)
– I looked through the feedback and didn’t see any major trends. Over the next week, I gave examples to the crawl/index team. They didn’t see any major trend either. The sitemaps team investigated until they were satisfied that it had nothing to do with sitemaps either.
– The team refreshing our supplemental results checked out feedback, and on May 5th they discovered that a “site:” query didn’t return supplemental results. I think that they had a fix out for that the same day. Later, they noticed that a difference in the parser meant that site: queries didn’t work with hyphenated domains. I believe they got a quick fix out soon afterwards, with a full fix for site: queries on hyphenated domains in supplemental results expected this week. (For the curious, there’s a toy sketch of this class of parser bug right after this timeline.)
– GoogleGuy stopped back by WMW on May 8th to give more info about site: and get any more info that people wanted to provide.
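
To make the hyphenated-domain issue concrete, here’s a toy sketch in Python of the general class of parser bug involved. To be clear, this is purely illustrative and uses a made-up tokenizer, not Google’s actual parser: if a query tokenizer treats a hyphen as a word separator, the hostname in a site: operator gets split apart and the filter silently stops matching.

    # Toy illustration only -- not Google's parser. If '-' is treated as a
    # token boundary, "site:real-estate-example.com" falls apart and the
    # site: filter never matches the indexed hostname.

    def naive_tokenize(query):
        # Buggy: hyphens split tokens, just like spaces.
        return query.replace("-", " ").split()

    def fixed_tokenize(query):
        # Fixed: only whitespace separates tokens, so hostnames survive.
        return query.split()

    def extract_site_filter(tokens):
        for token in tokens:
            if token.startswith("site:"):
                return token[len("site:"):]
        return None

    query = "site:real-estate-example.com listings"
    print(extract_site_filter(naive_tokenize(query)))  # "real" -- broken
    print(extract_site_filter(fixed_tokenize(query)))  # "real-estate-example.com"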

Reading current feedback

Those are the issues that I’ve heard of with supplemental results, and those have been resolved. Now, what about folks who are still asking about fewer pages being reported from their site? As if this post isn’t long enough already, I’ll run through some of the emails and give potential reasons that I’ve seen:

– The first site is a .tv domain about real estate in a foreign country. On May 3rd, the site owner said that they have about 20K properties listed, but that they had dropped to 300 pages. When I checked, a site: query shows 31,200 pages indexed now, and the example url they mentioned is in the index. I’m going to assume this domain is doing fine now.

– Okay, let’s check one from May 11th. The owner sent only a url, with no text or explanation at all, but let’s tackle it. This is also a real estate site, this time about an Eastern European country. I see 387 pages indexed currently. Aha, checking out the bottom of the page, I see this:
Poor quality links
Linking to a free ringtones site, an SEO contest, and an Omega 3 fish oil site? I think I’ve found your problem. I’d think about the quality of your links if you’d prefer to have more pages crawled. As these indexing changes have rolled out, we’ve been improving how we handle reciprocal link exchanges and link buying/selling.

– Moving right along, here’s one from May 4th. It’s another real estate site. The owner says that they used to have 10K pages indexed and now they have 80. I checked out the site. Aha:
Poor quality links
This time, I’m seeing links to mortgage sites, credit card sites, and exercise equipment. I think this is covered by the same guidance as above; if you were getting crawled more before and you’re trading a bunch of reciprocal links, don’t be surprised if the new crawler has different crawl priorities and doesn’t crawl as much.

– Someone sent in a health care directory domain. It seems like a fine site, and it’s not linking to anything junky. But it only has six links to the entire domain. With that few links, I can believe that out toward the edge of the crawl, we would index fewer pages. Hold on, digging deeper. Aha, the owner said that they wanted to kill the www version of their pages, so they used the url removal tool on their own site. I’m seeing that you removed 16 of your most important directories from Oct. 10, 2005 to April 8, 2006. I covered this topic in January 2006:

Q: If I want to get rid of domain.com but keep www.domain.com, should I use the url removal tool to remove domain.com?
A: No, definitely don’t do this. If you remove one of the www vs. non-www hostnames, it can end up removing your whole domain for six months. Definitely don’t do this. If you did use the url removal tool to remove your entire domain when you actually only wanted to remove the www or non-www version of your domain, do a reinclusion request and mention that you removed your entire domain by accident using the url removal tool and that you’d like it reincluded.

You didn’t remove your entire domain, but you removed all the important subdirectories. That self-removal just lapsed a few weeks ago. That said, your site also has very few links pointing to you. A few more relevant links would help us know to crawl more pages from your site.
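
By the way, the right fix for the www vs. non-www situation is a permanent redirect, not the removal tool. Here’s a minimal sketch of the idea in Python (the hostname is a placeholder, and in practice you’d normally configure this in your web server rather than run a standalone script):

    # Minimal sketch: answer requests for the non-canonical hostname with
    # a 301 so crawlers learn the move is permanent. Placeholder hostname.
    from http.server import BaseHTTPRequestHandler, HTTPServer

    CANONICAL_HOST = "www.example.com"  # the hostname you want to keep

    class CanonicalHostRedirect(BaseHTTPRequestHandler):
        def do_GET(self):
            if self.headers.get("Host", "") != CANONICAL_HOST:
                # 301 (permanent), so the canonical hostname inherits the
                # page instead of the page getting dropped.
                self.send_response(301)
                self.send_header("Location", f"http://{CANONICAL_HOST}{self.path}")
                self.end_headers()
            else:
                self.send_response(200)
                self.send_header("Content-Type", "text/plain")
                self.end_headers()
                self.wfile.write(b"canonical host\n")

    if __name__ == "__main__":
        HTTPServer(("", 8080), CanonicalHostRedirect).serve_forever()

Okay, let’s read another.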

– Somebody wrote about a “favorites” site that sells T-shirts. The site had about 100 pages, and now Google is showing about five pages. Looking at the site, the first problem that I see is that only 1-2 domains have any links at all to you. The person said that every page has original content, but every link that I clicked was an affiliate link that went to the site that actually sold the T-shirts. And the snippet of text that I happened to grab was also taken from the site that actually sold the T-shirts. The site has a blog, which I’d normally recommend as a good way to get links, but every link on the blog is just an affiliate link. The first several posts didn’t even have any text, and when I found an entry that did, it was copied from somewhere else. So I don’t think that the drop in indexed pages for this domain necessarily points to an issue on Google’s side. The question I’d be asking is why anyone would choose your “favourites” site instead of going directly to the site that sells T-shirts?

Closing thoughts

Okay, I’ve got to wrap up (longest. post. evar). But I wanted to give people a feel for the sort of feedback that we’ve been getting in the last few days. In general, several domains I’ve checked have more pages reported these days (and overall, Bigdaddy is more comprehensive than our previous index). Some folks who were doing a lot of reciprocal links might see less crawling. If your site has very few links, putting you out on the fringe of the crawl, then it’s relatively normal that changes in the crawl may change how much of your site we crawl. And if you’ve got an affiliate site, it makes sense to think about the amount of value-add that your site provides; you want to provide a reason why users would prefer your site.

In March, I was able to read feedback and identify an issue to fix in 4-5 minutes. With the most recent feedback, we did find a couple of ways that we could make site: more accurate, but despite having several teams (quality, crawl/index, sitemaps) read the remaining feedback, we’re seeing more of a grab-bag of feedback than any burning issues. Just to be clear, I’m not saying that we won’t find other ways to improve. Adam has been reading and replying to the emails and collecting domains to dig into, for example. But I wanted to give folks an update on what we were seeing with the most recent feedback.

928 Responses to Indexing timeline

  1. Damn Ringtone People!!!

  2. Hi Matt

Thanks for the much needed detailed update.

I do hope that you, GG and later Adam (when he feels ready) will post more of the same, and more often than you are doing now.

IMO, it’s not enough for Google to tell us that they are listening. We need them to talk to us too. I.e. communicate 😀

Once again, thanks Matt. I know you must also be busy preparing for the vacation.

  3. Wow, looks like someone is going to have a short interview today 😛 Thanks for the update Matt.

  4. Yawn !!!

    After the past 12 months of Google messing about and still no better results … I’ve completely learned how to live without you.

    Best wishes, You’re gonna need it

  5. Every time someone asks a novice question in google groups while at the same time saying that google s-u-c-k-s I will refer them to this post.

    Is adam bot or human? 🙂

    Thanks Matt.

  6. Thank you Matt for the update. I really appreciate you finally using some real estate sites as examples. Since this is an indexing issue I thought I would bring it up.

    After checking the logs today I noticed this coming from Google pertaining to our site.

    http://www.google.it/search?hl=it&q=fistingglessons&btnG=Cerca+con+Google&meta=

LOL now as you can see the #2 site is a real estate site listed for this search term. The page showing for this search is a property description page. As you can tell from the site’s description, it has nothing to do with this subject matter. Would you mind checking with the index team to see why this would be indexed for such a phrase?

    On a side note it would be nice to see more examples of real estate sites used in the future. Thanks again for the update.

  7. Great post Matt. That really clears up a few things about how Bigdaddy works. Still seems like it is responding very slowly and I find that large companies are getting ahead of smaller sites for local terms even though they are not located in the same country. But that’s mostly because of my own business gripes 😉

    Keep up the great posting.

Great post Matt, thanks for putting in the effort to explain what’s been going on.

    I have a quick question – how long is it taking these days for Google to index new pages? I added a forum to my site a couple of months ago, and while it doesn’t have many deep links from external domains, it is linked to pretty well from within my site and is in my submitted sitemap. Google seems to be crawling it quite enthusiastically. However, none of it’s showing up in the index with a site: search despite the intensive crawling and waiting about a month. Does this mean that Google doesn’t think my forum is worth indexing? 🙁

  9. Yeah, blame this disaster on webmasters, Google can’t index the web properly and it is the fault of webmasters working bad links?

Funny that those that are running the biggest link scams on the net are ranking great, Matt?

    Explain that one, will ya ???

    Where are the indexed pages Matt, do they just disappear, do you have an answer for all of us or are we all using linking scams?

  10. Thanks everybody. I’m glad that I sat down and got all this down. Yup Mike, I figured if I could get this post out before I talked to Danny, then we could just sit around and shoot the breeze. 🙂

    Danny: So, how’s life?
    Matt: Not bad. How are you doing?
    Danny: Pretty good, pretty good. 🙂 So how ’bout those Reds?
    Matt: The communists??
    Danny: No, the Cincinnati Reds!
    Matt: There’s communists in Cincinnati!?!?!

Sina, it’s by design that in Bigdaddy we crawl somewhat more than we index. If you index everything that you crawl, you never know what you might be missing by crawling a little more, for example. I see at least one indexed post from your forum, so the fact that we’ve been visiting those pages is a good indicator that we’re aware of those pages, and they may be incorporated in the index in the future.

  12. Great post Matt! Good job. Nice to hear some more detailed feedback.

Hey, can you answer this for me? Finally we have been seeing some improvement to the indexing of our site. I have seen other webmasters mention the same occurrence of indexing down to about level 3 pages and that is it. Although deeper pages are being crawled (level 4+), they just don’t want to stick very long in the index. Linking a bit higher can get them to stick (turning them into level 3 and 2 pages), but that’s just impossible to do with a lot of content. Is this something that will correct itself in time? We have PLENTY of links at all levels so I don’t see this as a huge problem. Pretty much looking for reassurance to sit tight.

I read the two real estate site examples hoping one was mine, but neither applied to me. My real estate site only has outbound links to home builders, so I doubt this should qualify as spam.

It still seems to me that you are blaming this on penalties, which I’m fine with, but why would you crawl my site thoroughly on a weekly basis, then never put the results in the index? This has been happening for 2 months now.

  14. Hello Matt

    Thanks for the information.
“Bigdaddy: Done by March” – is it really true? If so, I do not understand why there are still different search results between
    http://66.249.93.104/ and http://64.233.179.104/
Please could you give us more details? It’s confusing.
Where is Bigdaddy, really?

    Thanks for your reply.

Thanks for a very informative post. Just one quick question though: is there ever a time when link exchanges are considered legitimate? Maybe even an example of such a case? It’s easy to tell the irrelevant link exchanges, but there have to be some instances where maybe a … real estate agent exchanges links w/ a … local moving company.

    Can you comment on this?

  16. HA!!!

To celebrate this new information I deleted an old directory that was hanging off my most valued website. It made an awful shriek as I removed the database. In the coming weeks there will be a few autoemails asking “where is my link???” and I will reply, “you will not drain my power anymore, die die!!!”

    (ok enough of this Matt Cutts fellah for today, I got work to do, how about you?)

    🙂

  17. Hi Matt,

Thanks for the post. Problem is… none of your explanations seem to fit my site. I’m trying to maintain a straight ship in a dirty segment. My links have been accumulated by forming relationships with related sites (thus I’m building links a bit slower than straight link exchange would allow). My content is most certainly provided to educate the visitor. My affiliate linkage is quite low. But yet my pages seem to continue dropping and supplementals are increasing.

    Thanks for reading this,
    jim

Matt, thank you for the explanation about Bigdaddy. I have checked my websites for the points you just wrote down, and I can’t find any of them for my site.

I have quite a lot of backlinks. I don’t link to crappy sites, and still my indexed page count moves like a wave.

On Monday I can have 800,000 pages indexed, on Tuesday 350,000, then back to 600,000, down to 400,000. The difference is way too big. And we had over a million records.

I also submitted a reinclusion request, but we never heard back or saw any changes. My domain name is techzine.nl. I have www, forum, babes, msn and pricecheck.techzine.nl in use.

We did have some problems in the past. I e-mailed google about it a couple of times but never got an answer.

We changed the domain name of the website from tweakzone.nl to techzine.nl (October 2005). We forwarded it with a 302 (stupid – I found that out later) and changed it to a 301 (permanent) redirect. Now I am still trying to get the whole tweakzone.nl domain out of google and get techzine.nl indexed correctly. We asked many many webmasters to update their links and that worked. Our HTML code is by the book. But still we are not being indexed as we were. I’m running out of ideas and options to fix this. Can you explain to me what I am doing wrong? I have been reading SEO sites, webmasterworld.com, and Google guidelines for months now and I can’t figure out what I’m doing wrong…

    Kind Regards,

    Coen

  19. Strange how you ignored comments before, and now you have decided to respond.

    Unfortunately, the serps have become absolute trash, so the changes have failed, and I see more spam sites doing well than before.

  20. Thank you for the timeline.

I find it rather frustrating to follow how your timeline basically outlines how everything is working just as it should, and watch pages display as regular one day, supplemental the next, a week later regular and then back to supplemental. Searchable as a regular listing, completely unsearchable as a supplemental.

    Good to hear you guys have plenty of machines with plenty of room. Perhaps someone should inform the CEO.

    I look forward to you finding other ways to improve.

    Dave

  21. Please, please, please delete all of the old supplemental results! I think if you took a poll, you would find very few webmasters (or end users) who actually value any of those old junk pages (many of which do not even exist anymore).

    I have even used the URL removal tool in the past – but those old pages just keep coming back!

I don’t think Mr. Cutts meant that the mortgage sites, credit card sites, and exercise equipment sites were junk; most likely they were unrelated.

Now, I don’t think it’s fair to penalize a site for linking to an “unrelated” site, since many webmasters link to their other websites etc. Links being devalued because they’re coming from an unrelated page would be more fair.

    And what’s the deal with reciprocals? Although I rarely do them (time related), I don’t think it’s unfair. A vote is a vote right? Even if two people vote for each other. As long as it’s not automotive I don’t see why it would be a problem…

What about the impact of getting a bunch of unrelated inbound links to your site? Imagine if someone used a linking scheme to point hundreds, or thousands, of links at your domain? All those links from “unrelated” or “junk” sites would surely put a hurting on you. Not fair.

  23. I agree that reciprocal link directories should be removed as they are link farms, so Google is doing the right thing there!

Some reciprocal linking is natural though, and sites should only have their pages removed if they have a high percentage of reciprocals in their link totals.

  24. [quote]Google should NEVER * NEVER * even entertain the idea of deciding what Products or Services are “JUNK”

    This is a recipe for disaster, and extremely arrogant.

    What gives any search engines the right to decide that someone’s business category is “JUNK”. This would be analogous to Yahoo Directory or DMOZ devaluing certain TYPES of products or services.
    [/quote]

It ain’t that often you’ll see me stick up for Google but MR SEW you are VERY wrong.

    Google can do what the hell they like with their search engine, cos it is THEIRS.
If they want to devalue links in their algorithm, that’s their prerogative, cos the algo is THEIRS
If they want to say certain business models are junk in their search engine then that is their right, cos the search engine is THEIRS

    You have exactly the same right. On YOUR web properties you can say and do what you want. If you want to link out via affiliate URLs you can as the web site is YOURS.
    If you want to buy or sell links, you can as the web site is YOURS.

    When all is said and done when you own something it is up to you what you do with it. Google is no different with whatever it decides to stick anywhere on its domains than you or I am with mine.

    Personally I think Google makes lots of mistakes. I also believe so do many webmasters, myself included but they are our mistakes to make the way we see fit at the time.

    I’m happy with what I do and I am sure Google are happy with what they do. Personally I am going to carry on trying to beat Matt and his team at Google and I am pretty sure he and his team will carry on trying to beat me.

    He wins some, I win some but therein lies the nature of the web. On his site he can do what he wants. On my site I can do what I want. I suggest you, Mr SEW do the same 🙂

  25. Damn … great summary Matt … the “other Matt” must be saying “gulp” to try to follow that act while you are gone. And yea, what are you going to talk about in a couple of hours on the radio show?

    BTW, here’s an oddball corner case that I would classify as a bug – one of your favorite subjects – redirects! 😉

So URL1 ranked well for keyphrase1. The SERP’s show a title, some text, and a URL. A (legit) 302 (temporary) redirect was set up to URL2. After a few days, the SERP’s for keyphrase1 show URL2, but were still using the title tag for URL1. The “other text” is pulled from URL2. Looking at the cache, it is all URL2. This persisted for several days – looked pretty darn funny actually in the SERP’s, since the URL2 title tag had nothing to do with keyphrase1.

    I think (?) correct behavior would be that if you are going to show a URL in the SERP’s, you should show title/text associated with that page … but in this case, some part of the indexing machine got confused by the redirects and the title1 piece got left in even though URL2 was displayed.

    Email me if you want more info, but you should easily be able to setup a test case based on that description. BTW, Yahoo has a similar bug in the SERP’s (I forgot how MSN handled it), so it’s not just the big “G” struggling with redirects.

  26. I had some clerical errors in my post above (automotive should be automated :), wish I could edit it… sorry.

  27. Hi Matt, great information as always. I have a question about this:

if you were getting crawled more before and you’re trading a bunch of reciprocal links, don’t be surprised if the new crawler has different crawl priorities and doesn’t crawl as much.

How might this impact the typical blog with a lengthy blogroll? Many people have blogs with lengthy blogrolls… and many of those sites in my blogroll end up linking back without it really being arranged as a reciprocal exchange.

    From what you are saying I get the idea that having a blogroll/recommended reading list doesn’t sound like a good idea.

  28. Doesn’t matter…..they don’t care about results. Bad results means more money for Adwords:)

    Microsoft will squash Google like it did Netscape. When Vista comes out….Google will fall.

  29. Matt. For me, that was the best post that you’ve ever posted here – by a very long way.

I’m one of the people who has sites that are suffering right now. One of them is the site that we spoke about last year. It had a clean bill of health from you, and nothing has changed since then, and yet its pages are being dropped daily. Right now it’s down from a realistic 18k-20k pages to 9,350, but only around 500 of them are fully indexed – the rest are URL-only partials. Yesterday it had 11,700 but only ~600 of them were actually listed, and some of those were partials.

    From your post, I would say that the site fits the description of not having many trusted IBLs. Would that be correct? Reminder – http://www.holidays.org.uk

To be honest, if it is correct, then I dislike it a lot. It would mean that it isn’t sufficient any more to have a decent and useful site to be fully indexed by Google, if the site has quite a lot of pages. It would mean that we have to run around getting unnatural IBLs just to be fully represented in the index, and unnatural IBLs are one thing that Google doesn’t want.

  30. Chris, I talked about this a couple comments above:
    http://www.mattcutts.com/blog/indexing-timeline/#comment-27002
    With Bigdaddy, it’s expected behavior that we’ll crawl some more pages than we index. That’s done so that we can improve our crawling and indexing over time, and it doesn’t mean that we don’t like your site.

    arubicus, typically the depth of the directory doesn’t make any difference for us; PageRank is a much larger factor. So without knowing your site, I’d look at trying to make sure that your site is using your PageRank well. A tree structure with a certain fanout at each level is usually a good way of doing it.
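
    To put rough numbers on that fanout point (my own back-of-the-envelope illustration, not official guidance): a tree structure keeps a huge number of pages within a few clicks of the root page, which keeps them close to the home page’s PageRank.

        # Back-of-the-envelope sketch (illustrative numbers): how many pages
        # sit within `depth` clicks of the root in a tree where every page
        # links to `fanout` pages one level down.
        def pages_within_depth(fanout, depth):
            # 1 + f + f^2 + ... + f^depth pages reachable in <= depth clicks
            return sum(fanout ** level for level in range(depth + 1))

        for fanout in (10, 25, 100):
            print(fanout, pages_within_depth(fanout, 3))
        # fanout 10 -> 1,111 pages; 25 -> 16,276; 100 -> 1,010,101 in 3 clicks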

    Ronald R, I’ve got a finite amount of time. 🙂 I spent a large chunk of Saturday writing this up, but I don’t have time to respond to every comment. I wish I did. But improving quality is an ongoing process; if you see spam, I’d encourage you to do a spam report so we can check it out.

    CrankyDave, the supplemental results are typically refreshed less often than the main results. If your page is showing up as supplemental one day and then as a regular result the next, the most likely explanation is that your page is near the crawl fringe. When it’s in the main results, we’ll show that url. If we didn’t crawl the url to show in the main results, then you’ll often see an earlier version that we crawled in the supplemental results. Hope that helps explain things. BTW, CrankyDave, your site seems like an example of one of those sites that might have been crawled more before because of link exchanges. I picked five at random and they were all just traded links. Google is less likely to give those links as much weight now. That’s the simple explanation for why we don’t crawl you as deeply, in my opinion.

    Brian M, I’ve passed that sentiment on. I believe that folks here intend to refresh all of the supplemental results over the summer months, although I’m not 100% sure.

  31. How about a tool so that we know who we should be linking to or not?

    I see spammers in the google index. Maybe they should get penalized down to a PR of 3 for linking to a bad neighborhood! LOL. Just kidding.

    I guess you just may as well nofollow every external link just in case.

Yes, a good example of this is our link backs here: I linked to this blog entry from my forums, and my link here goes back to the forum!

    Is this what Google is going to take out or are you looking for a high concentration of reciprocal links Matt?

Problem with this post is that most of us would have identified the spam examples that you listed, and yet most of us still don’t understand what has been happening to our sites – in our case, going from 20,000 pages indexed to fewer than 100.

You had indicated that there were only a “double-digit number” of emails sent to the bostonpub address and that someone was going through them over a week ago already. Today, you also stated that someone was still going through them. We did send an email and we still have not received a reply. Based on the most recent thread on wmw, it looks like we are not the only ones.

    Real answers would help.

Many small businesses are suffering from these massive de-listings. It is not a light subject for us. From our point of view, bigdaddy has not been “pretty successful” and general replies are now a bit short on comfort at this point.

  34. Nice post Matt. Very informative and not at all too long.

    Shoemoney – Was that one of your ringtones sites?

  35. “arubicus, typically the depth of the directory doesn’t make any difference for us; PageRank is a much larger factor. So without knowing your site, I’d look at trying to make sure that your site is using your PageRank well. A tree structure with a certain fanout at each level is usually a good way of doing it.”

    Thanks MATT!

I think PR is the factor, but nothing is trickling down from the home page – (backlinks for the homepage reported from google are completely ?????)

We keep the most logical structure you could possibly have: a pyramid structure drilling down to the articles, articles linking to related articles. Googlebot just does not like level 4+. If PR is a factor (I thought it now updates continuously), I am not sure why it does not filter down (besides, I have no clue if it actually does, since what is shown on the toolbar may not be accurate).

  36. Jason Duke, I did another pass to mark all SEW links as spam. Gotta muck around and delete SEW from my user database. 🙂

    Anthony Cea, I gave a quick example above. Someone was complaining about their pages being supplemental, but that’s the effect, not the cause. The right question is “Why aren’t as many of my pages showing in Google’s main results?” I picked five links to the domain at random and they were all reciprocal links. My guess is that’s the cause. I mentioned that example because CrankyDave still has an open road ahead of him; he just needs to concentrate more on quality links instead of things like reciprocal links if he wants to get more pages indexed. (Again, in my opinion. I was just doing a quick/dirty check.)

    Valentine, I made the links I showed an image so no one would feel the need to go digging into actual sites. 🙂

What if our problem isn’t crawling so much as seeing those pages indexed at all? I have checked the supp index and haven’t seen them there either, but I have seen the Googlebot crawling the pages.

    P.S. Is there an email I should send to asking about this and if so where?

  38. OK Matt, so what you are saying is that we should produce great content and hope we get linked to because of the value of the page!

    But when is Google going to get real about schemes to game the engine so that natural links that are earned are rewarded?

  39. Matt… I have previously reported spam, and not in my sector. But nothing happens, so in the end I just gave up.

    I’m wondering how you gain relevant links, in some sectors, without reciprocating, or paying? Do you believe that rivals would give you a free one way link, lol?

  40. @Matt:

    Some days I really wonder why you even post to your blog at all lol It seems that for every 1 legitimate query there are 10 others holding you personally accountable/responsible for their serp/penalty/crappy result.

    I mean really… if the amount of Q&A here was T&A a team of plastic surgeons couldnt wipe the grin of your face 🙂

    anyway …. “my site is getting crappy results and no traffic …” its your fault and Google sucks… LOL not really… but I want to get in on the fun too !

  41. Dear Matt, thank you for explaining us google’s view of link exchanges.

We dropped low-quality link exchanges months ago and are now going on only with high quality links; we’ve added tons of new and unique stuff to our site, but the crawler does not crawl much, and the site is rated low. One year ago it was on top of many competitive searches.

Is it possible to overcome this bad backlink reputation? It’s almost impossible to get rid of low-quality links once they are there. Do you have any advice for sites like ours?

I have a small site that offers a free downloadable tool. So I registered a sitemap and waited… some months. Still not indexed. Every day the bot visits, picks up the site map, then the index page, then the download exe (which is about 3.5M). Any idea why the bot should try to spider exe files?

I needed a slightly different version of the tool for a specific audience, so I registered a new domain and copied the site with minor changes. Did not register a sitemap because I wasn’t particularly bothered if it was indexed or not. The new site was indexed in a week or so, and now has a PR of 4. The original, near identical site, still not indexed.

    The original site has been in Yahoo and MSN for months….

  43. I don’t blame Google for dumping on webmasters that try to game the engine with manufactured links, purchased links, traded links, links from reciprocal link farm directories and so on, this is good long term if they can index the web properly taking these things into consideration!

  44. Better late than never 🙂 Thanks Matt, you put my mind to rest on a lot of issues

I cannot wait to forward this to my mortgage lender, who asked me just the other day,
    “You work in SEO any idea why I’ve lost so many of my pages in Google?”
    Your explanation sounds so much nicer and more official than… “It could be because your website has a bunch of crap in it, and on it, and connected to it”

    BTW- “It could be because your website has a bunch of crap in it, and on it, and connected to it” is an accurate analysis for many of the mortgage and realtor sites who do not rank well on Google right now.

Personally I don’t care about where my site ranks. I believe that ranks would happen naturally if you serve your visitors well.

What many of us DO care about is having equal treatment as any other website owner, large and small, as well as equal opportunity. Spammers should not be there while legit sites that should be there are not being indexed for some reason. I believe that it is healthy for us to get a bit of feedback and give feedback to google so that such equal opportunities can exist.

Matt, thanks for the information… but it doesn’t help me at the moment! My most important pages just aren’t getting indexed but are getting crawled. We have a really useful website with thousands of members but it seems that only Google thinks it’s not good enough! Any advice would be greatly appreciated.

  48. Anthony Cea, you’ve got some people who were relying on reciprocal linking or link buying complaining specifically that they’re not crawled as much. So as far as “when is Google going to get real about schemes to game the engine so that natural links that are earned are rewarded,” I think that we’re continually making progress on judging which links are higher-quality.

    Ronald R, we’ve been checking spam reports more closely lately. You ask “I’m wondering how you gain relevant links, in some sectors, without reciprocating, or paying? Do you believe that rivals would give you a free one way link, lol?” My answer is that trying to force your way up to the top of search engines is in many ways not working in the most efficient way. To the degree that search engines reflect reputation on the web, the best way to gather links is to offer services or information that attract visitors and links on your own. Things like blogs are a great way to attract links because you’re offering a look behind the curtain of whatever your subject is, for example.

    Mike B, I’ve talked to the sitemaps folks a lot. Having a sitemap for your site should *never* hurt your domain. On the other hand, don’t expect that just listing a sitemap is enough to get a domain crawled. If no one ever links to your site, that makes Googlebot less likely to crawl your pages.

That’s a very concise way to say it, Bob Rains, although a lot of the variation that I see is also when someone’s domain is hardly linked at all. At the fringe of the crawl is where you’re likely to see the most variation, while a site like cnn.com with tons of links/PageRank is much less likely to go uncrawled.

    It’s funny, because most people understand that on a SERP there are 10 results, and if one webmaster is unhappy because they dropped out of the top 10, then some other webmaster is happy that they have joined the top 10. In the same way, we have a finite amount of crawling that we can do as well. Bigdaddy is more deep, but we still have to make choices about whether to crawl more from site A or site B.

    Well said, arubicus. Adam recently sent me 5-6 sites that he thinks we could do a better job of crawling, for example. So I wanted to give people an update of how things looked right now, but we’ll keep looking for ways to improve crawling and indexing and ranking.

  49. Hi All

    Anybody wish to say hello to our new friend Adam_Lasnik of Google Search Quality team 😀

  50. >Linking to a free ringtones site, an SEO contest, and an Omega 3 fish oil site? I think I’ve found your problem. I’d think about the quality of your links if you’d prefer to have more pages crawled.

    So is the conclusion that sites that are deemed “low quality” will also have “light crawling” correct?

Thanks for the feedback Matt, I really appreciate it. Made me feel thoroughly warm and fuzzy inside :). Seriously, it’s really great to have people at Google who directly talk to webmasters and demystify things that can seem a bit unusual to outsiders. Keep up the great work!

  52. graywolf, it’s true that if you had N backlinks and some fraction of those are considered lower quality, we’d crawl your site less than if all N were fantastic. Hope that makes sense. Light crawling can also mean “we just didn’t see many links to your domain” as well though.

    Glad I could answer questions, Sina. It’s nice that I didn’t have any meetings this afternoon, so I could just hang and answer questions. Then I’ve got Danny in a half-hour or so. But that’s okay too. Maybe for some of the questions, I can just be like “Ah yes, Sina and I talked about this in paragraph 542. It helps us to crawl some more pages than we index so that we can see which pages might help us improve our crawl coverage in the future.” 🙂

Boy Matt, you have to take a vacation after all of these posts you are doing.

    “improve crawling and indexing and ranking.”

I personally expect things to move from an SEO standpoint to more of a QUALITY standpoint, with businesses and sites competing more on the QUALITY level rather than on the SEO level. I believe now (after what you mentioned) this is where you want us webmasters to compete (probably always have). This push for quality will make this a WIN WIN WIN game for all of us.

Yup, exactly, arubicus. There’s SEO and there’s QUALITY and there’s also finding the hook or angle that captivates a visitor and gets word-of-mouth or return visits. First I’d work on QUALITY. Then there’s factual SEO. Things like: are all of my pages reachable with a text browser from a root page without going through exotic stuff? Or having a site map on your site. After your site is crawlable, then I’d work on the HOOK that makes your site interesting/useful.
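
    If you want a rough way to sanity-check that “reachable with a text browser” point on your own site, here’s a small sketch in Python (my own illustration, not a Google tool; the root URL is a placeholder): breadth-first crawl from your root page and count what a plain link-following crawler can actually reach.

        # Sketch: breadth-first crawl of one site, following plain <a href>
        # links only, to see which pages are reachable from the root.
        from collections import deque
        from html.parser import HTMLParser
        from urllib.parse import urljoin, urlparse
        from urllib.request import urlopen

        class LinkCollector(HTMLParser):
            def __init__(self):
                super().__init__()
                self.links = []

            def handle_starttag(self, tag, attrs):
                if tag == "a":
                    for name, value in attrs:
                        if name == "href" and value:
                            self.links.append(value)

        def reachable_pages(root, limit=200):
            site = urlparse(root).netloc
            seen, queue = {root}, deque([root])
            while queue and len(seen) < limit:
                url = queue.popleft()
                try:
                    html = urlopen(url).read().decode("utf-8", errors="replace")
                except OSError:
                    continue  # unreachable or not fetchable; skip it
                collector = LinkCollector()
                collector.feed(html)
                for href in collector.links:
                    # Stay on the same hostname and drop #fragments.
                    target = urljoin(url, href).split("#")[0]
                    if urlparse(target).netloc == site and target not in seen:
                        seen.add(target)
                        queue.append(target)
            return seen

        print(len(reachable_pages("http://www.example.com/")))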

  55. Yep, clear enough, and what I suspected, thanks.

  56. Matt, I have to agree with Joe and Anthony in that spanking webmasters for reciprocal links is often unfair. And I don’t have an intelligent suggestion on how to spot reciprocal link breeding facilities vs. honest, natural reciprocal links…at least not anything that can’t be instantly and easily “gamed”.

    My industry might be a good example to use to look at reciprocal linking, actually (it’s weddings & honeymoons). In this market, there are certainly a large number of blind link-exchangers out there, adding no value to the end user with their hydroponically engineered reciprocal link spaghetti. But on the other hand, a site like mine (honeymoon travel) might have pages that list a small number of recommended related businesses (e.g. half a dozen wedding coordinator companies in Hawaii…an online jeweler for rings…an association of wedding officiants…etc.). We list other wedding-related companies on our site with whom we’ve done business (and been happy with)…and naturally, many of them also recommend us on their sites. We each are happy to recommend other companies in our general industry whom we believe do a great job for our customers and yet don’t compete with us.

Now, without thinking about algorithms, should this kind of link be very important in determining good sites to return to users?

    And what should one think about two companies where one thinks the other is great and links to them….but the feeling ISN’T mutual?

    So there’s my argument for SEs being VERY careful when it comes to designing algorithms to discredit or punish for reciprocal links. Yes, I realize that massive reciprocal linking campaigns are evil and manipulative, but there may be some baby parts being thrown out with this bathwater.

  57. Matt:

I haven’t experienced the pages dropping problem webmasters are attributing to Bigdaddy, but I have seen some behavior I would like to understand.

Through the middle of April, our SERPs showed our homepage with the product page indented as the next item. It looked really great. Over the last month, the deep linked pages no longer show up for some high volume keywords, only the homepage.

I won’t list the keywords in a blog, but if you want to look into it, I would be glad to provide a list. Alternatively, look at our sitemap page and you can see it for the 3rd, 4th, 6th and 7th term listed (terms 1 and 2 are our brand name).

    Am I alone in seeing this or does it represent a trend?

    Thanks

  58. Matt,

Something’s been eating at me…

If link exchanges are frowned upon and buying links is a no no, how is a new site supposed to ever be able to successfully enter a competitive space? It seems the only people who would be able to compete are very old sites (not necessarily the best) and people who maintain a zillion domains for interlinking purposes. Google seems to be placing an unfair barrier to entry UNLESS spammy tactics are employed.

-Jim

  59. Circling back to folks who just had comments approved. Joe Hayes, it’s not that reciprocal links are automatically bad. It’s more that many reciprocal links exist for the wrong reasons. Here’s an email that I just got:

    Dear Site Owner,
    I am looking for quality link exchange partners for several of my
    sites. I have browsed http://www.mattcutts.com and it seems like a link exchange
    between our sites will benefit us both.
    If you are interested in doing a link exchange between our sites, I
    would be glad to hear any offer you might have.

    In general I will give back a link from a page with the same PR rating
    of the page I will be given.

    If you own any other sites for which you are willing to trade links,
    please let me know.

    I’ll be glad to hear anything you have to offer.

    Kind Regards
    Loki

    I’d recommend people spend less time on trying to gather links that way or via some automated network, and more on making a great site with a creative angle or two that makes the site stand out from the crowd.

  60. Matt, everyone knows that Google has a Supplemental Index, but no one outside of Google knows exactly what it is and what its purpose is.

    Even if you cannot give us the details, will you please share a working definition that SEOs can point to as the most reliable description?

  61. Okay, I gotta go do a pass at email before meeting up with Danny. Talk to everyone later.. 🙂

  62. Michael Martinez, personally I’d think of it as a fallback way that we can return results for specific queries where we might not have as many results in the main index. Okay, now I really am going to go. 🙂

  63. >>>>The sites that fit “no pages in Bigdaddy” criteria were sites where our algorithms had very low trust in the ***inlinks*** or the outlinks of that site.

    Nice. We can destroy our competition by making spammy sites and then linking to the competition!!! SWEET!!!!

    Maybe Google should update ‘There’s almost nothing a competitor can do to harm your ranking or have your site removed from our index.’

    at

    http://www.google.com/support/webmasters/bin/answer.py?answer=34449&topic=8524

Now it’s easy to harm the competition’s ranking!!!!

  64. Thanks again for the feedback!

  65. Great Update Matt!!!

    It looks like I had put it together pretty well in my explanation of why people were disappearing from Google that can be found at http://www.ahfx.net/weblog/80 . I just needed to build on the devaluation of reciprocal links.

The only remaining question is whether it is the reciprocal link that is bad (we had already discussed that reciprocal links were losing value back in November), or the “unrelated” outgoing/incoming link that is bad. My bet is on the lack of quality of the inbound/outbound links. It seems the “tighter” the content, links, and tags are, the better the page does. Although, I agree also that reciprocal links should be devalued.

  66. Matt,

I’ve seen mentioned that duplicate content can potentially hurt a site. On one of my sites I’ve had people write FAQs, etc, and am now wondering how much of what was written might not be original content. Can you, or anyone else, point me in the direction of being able to check for duplicate content, other than just plugging sentences into Google? How divergent does content need to be to be considered original?

  67. I’m sitting here watching Danny. 🙂

  68. Matt,

Example: a website contains a “link exchange” button within their navigation. When you look closer, the websites forming the link exchange are real companies but the majority of links are unrelated, e.g. car-hire, wood art gifts, labels. Would I be correct in assuming that the non-related links carry no weight and that the domain is scoring only from the related “link exchanges”? Note: I say link exchanges and cringe as I’ve usually been against this; however, having just read your latest note I feel encouraged to build a link exchange page and provide reciprocal links to associated quality websites. Have I got the wrong end of the stick here? Thanks in advance for your time.

  69. Is adam bot or human?

    Clearly, I’m a bot.

    Aaron Pratt, what is your a/s/l?

    Matt Cutts, c/t/c?

    I am a magic 8-ball. Type !future to read your future.

    Okay, goofy stuff aside, this sort of a statement was long overdue. I can’t speak for anyone else, but I was ripping my hair out for the longest time watching people bitch, moan, and complain because their spamtastic sites weren’t getting indexed or that they were dropping. Tough **** for those people. Let ’em build something worth visiting.

    The only problem is that now the idiots will come up with some random and illogical explanation that “linking to other websites and forming alliances isn’t a bad thing, and Matt should be listening to me because I’ve created some 3-page keyword stuffed piece of crap and think I’m an expert.”

    Anyone else wanna bet that SEW says something stupid in response? 🙂

    I just have one very stupid question:

    Things like blogs are a great way to attract links because you’re offering a look behind the curtain of whatever your subject is, for example.

Doesn’t this also lead to the possibility of increased blogspam, with people reading this comment going off and creating BSLogs (TM) full of meaningless drivel about something loosely related to the topic at hand and/or cross-posting to other blogs related to topics (moreso the former concern)?

    Personally, I’d rather not see blogs like yours and Aaron Pratt’s and Jaan Kanellis’ blog get dragged down into the mud because a few dumbasses ruin the concept.

  70. Matt: ‘I’d recommend people spend less time on trying to gather links that way or via some automated network, and more on making a great site with a creative angle or two that makes the site stand out from the crowd.’

The thing is, just writing great content isn’t enough. I’m not saying my content is the greatest ever in the whole world, but it’s pretty good. If people can’t find your site, along with all its great content, they will never link to it. I don’t know what the answer is; I can see how some reciprocal links are bad, and how buying links is a problem for SEs, etc. But it is extremely difficult to get links to a site with just good content. Unless maybe you know lots of people who can give you links, etc. For shy people like myself it’s tough; I just don’t know enough people, and because of the shyness I haven’t participated in any online communities like I should have – I’m working on that though. It seems that getting traffic from SEs is kind of like a popularity contest – it’s like high school all over again – I could be real nice and real smart, but too shy to be popular, so my site is just ignored by SEs.

    Oh well, sorry to whine. I’m trying to write high quality blogs to attract links. (Doesn’t seem to be working too well yet though. )

  71. Nice. We can destroy our competition by making spammy sites and then linking to the competition!!! SWEET!!!!

    That’s not what he said. He said the spammy IBLs would not help. He didn’t say they’d hurt. They basically have no effect at all.

    The worst thing you’ll do is give that person no increase in traffic. The best thing you’ll do is give them a bunch of direct traffic from your spamlinks.

  72. “They basically have no effect at all.”

The only thing I see happening is when your site used to rely on the effects of such links in the SERPs; now that the effects are gone, you may see decreased rankings and spidering (even fewer indexed pages) and lower PR.

  73. Matt,

    That was your best post so far on this site!

    The reason I liked it so much was that you gave many examples.

    Please keep the examples coming. That’s where we learn the most!

    Dave

  74. Matt. What you’ve described really sucks, and not only from a webmaster’s point of view, but also from a Google user’s point of view. I know that you are the spam man, so it’s not your fault, but the whole thing is just plain crazy.

What you described means that a website with quite a lot of good, useful pages won’t be fully indexed unless the site has enough IBLs, and not just any IBLs – certain types mustn’t dominate. What kind of search engine is that? FWIW, I don’t mind the death of reciprocals (I’ve never got involved in it anyway), but it’s crazy for a search engine to require a certain number of IBLs for a site with a lot of pages to be fully indexed.

    For one thing, as a user I want a search engine to show me all the relevant pages that it knows about, and I don’t want good pages left out just because the sites they belong to didn’t have enough IBLs. I want good service from a search engine, and depriving me of good relevant pages is a very bad service.

    For another thing, as a webmaster, if my pages are good, index them, dammit. What on earth do IBLs have to do with it? Doesn’t Google want to show good pages to its users? If you don’t want to rank them very highly, don’t rank them very highly, but there is no reason in the world to leave them out of the index, and deprive Google’s users of the possibility of seeing them. It’s just crazy, and makes no sense at all.

    No, I’m not talking about the site I mentioned earlier in the thread. Forget that site – there’s nothing wrong with it, but let it go out of the index. I’m talking about Google users who are being *intentionally* deprived by Google, and the owners of perfectly good websites who are being shafted because their sites just don’t happen to have enough IBLs to satisfy Google.

The other nonsense is the outbound links that you mentioned. What the hell has it got to do with a search engine what links a website owner puts on his/her pages? If people want to put affiliate links on a page it’s entirely their own business. And if they want to link to off-topic sites it’s entirely their own business. And if they want to sell real estate on their sites, it’s entirely their own business. It has nothing whatsoever to do with search engines, so why are they penalised by not indexing all of their pages? Why are Google’s users *intentionally* deprived of good and useful information, just because a site’s pages contain things that are nothing to do with search engines?

    From what you described in your post, Google has consigned many perfectly good sites to the scrap heap, just because they didn’t have enough IBLs, or because the sites had some perfectly valid links in them. And they’ve intentionally deprived their users of a lot of perfectly good results for the same stupid reasons.

    I’d recommend people spend less time on trying to gather links that way or via some automated network, and more on making a great site with a creative angle or two that makes the site stand out from the crowd.

Yeah right. Just what Google has always said – concentrate on making a great site for visitors. And if the site doesn’t have enough IBLs to satisfy Google??? What a load of ….

    Frankly, the whole thing stinks, and it stinks big time! I’m just not going to run around getting unnatural links to satisfy a bloody search engine, as you suggested to a couple of your examples. Why should anyone need to do that? My attitude to it is “stuff it”, and stuff Google!

Great post from PhilC, I agree with his statement that IBLs should not determine if a site’s pages are indexed; Google should not be guilty of selective indexing of the web, as Microsoft calls it.

    To be a world class search engine you have to index pages to serve relevant results, Microsoft is indexing pages on the web much better than Google and so is Yahoo at this point in time, thus their results are much better and more relevant than Google SERPs.

  76. PhilC said it perfectly.

    And what really sucks is this is KILLING small businesses that just want clients to be able to find information on them. What do they know about inbound linking or reciprocal linking? They just want to be found for [product anytown, usa]

I have a one-off Italian pizza place that just wants people searching for catering to be able to possibly find him in the area. He’s in Google Local, but some people don’t even look at that, or depending on the query it doesn’t come up. He links with all his other local buddies: a clown, a hotel for catering, an iron worker who did his little cafe fence. Now this seems to be discouraged. They just want to share business, not join this big link scheme.

    If I type my small town’s name into Google now, the top 20 hits are all gigantic spam sites that contain the equivalent of a Wikipedia article.

  77. And what is wrong with affiliate links? How else do some sites make money?

  78. Thank you for addressing my concerns directly Matt. I do appreciate it.

    I must say that I’m really disappointed.

    I’m really disappointed that related sites with good and logical reasons to exchange links can no longer do so without harming themselves.

    I’m really disappointed that if an authority site links to me, I cannot link back to the authoritative information they provide without damaging the crawling of my site and theirs.

    This is not a matter of “not counting” something. This is a matter of blindly punishing sites, and most importantly, searchers.

    No, Google has not moved forward. They’ve taken several steps back.

    Dave

  79. So, how does this relate to the indented index page event that people have been seeing? It’s not host crowding.

    Example: a search for “My Company Name” would normally bring up the site’s index page as the top listing in Google. Now, it brings up another page from the site with the index page indented under it.

    Penalty, fluke, ??

  80. “They just want to share business, not join this big link scheme.”

    The way I see it is that there is NOTHING wrong with trading links. Just don’t expect higher rankings and faster indexing because of them. If you rely on reciprocal links and junk scraper/directory links, and don’t have much in the way of other quality links, you may see some adverse effects, because those links are not counting for much anymore. Go out and promote, sure, but be smart about who you cross-promote with – just don’t expect your ranking to go up because of it.

  82. Thanks for clearing everything up, Matt.
    Enjoy your new man-boobs on your plastic surgery vacation.

    love,
    tmoney 😉

  83. Matt,

    First, I appreciate you maintaining this blog and responding to some of the comments.

    I realize you can’t analyze every site, but from what I’ve seen at WebmasterWorld, the sites you have picked are not very representative of the sites which are having problems with the supplemental index and not being crawled. The sites you have picked are obvious offenders, but sites such as my own and many others have none of these issues. To us, it seems that building a site to the best of one’s ability isn’t good enough; unless you can play the Google game, you’re out of luck. For instance, the inbound link issue. There are only a couple of active fansites related to mine (most are no longer updated, and my site is only a few months old). Therefore, I am stuck with a couple of inbound links unless I try to contrive inbound links, which I have no desire to do. Of course, the related sites also naturally link back to me – I’m related to them too, after all! Now that’s bad? It’s quite a Catch-22.

    I think one should hesitate to imply that all the websites with supplemental problems “deserve it” because they’re all doing something so terribly wrong that they no longer are recognized by the index. There are many sites which do not fit into this penalty schema that have lost pages – too many to blow off as aberrations in an otherwise successful change.

    I care because my site, the last time I checked, had seven pages out of over 600 that are non-supplemental, and it is jumping wildly in the Google rankings daily for main keywords, varying from 35 to 75 on any given day. Meanwhile, it varies between #6 and #8 on other search engines.

    But frankly I am more concerned with the fact that so many pages with good content are being ignored. If I were #105 for my keywords but could look at site:[my site] and see that my pages are indexed, I would be OK with that. At least they’re there, and people who are looking for content unique to my site can find it. However, now, according to Google, only 7 pages on my site are searchable for the average Google user – only seven pages of my site exist in Googleland. I can put exact phrases from supplementally indexed pages in the search engine and get no results returned. With almost nothing indexed, I feel like all my honest efforts are worthless to Google for some mysterious reason.

    Yes, it’s your search engine and you may do what you like. However, I’m sure you understand that a search engine that throws out good content is not doing its job. Hopefully, you will not shrug off the numerous legitimate concerns just because you were able to find some egregious offenders in the vast array of e-mails you received.

  84. Matt,

    Thanks for confirming my theory. I – and a few others – have been saying all along that the Dropped Pages bug is being caused by a faulty or out-of-date backlink index.

    You just confirmed it. Do you honestly think that all of the people making a noise at the moment are naughty people with some irrelevant outbound links, or “not enough inbound links”? Isn’t it far more likely that Google just isn’t finding or indexing the backlinks properly since Big Daddy?

    Are you looking on Yahoo or MSN for backlinks before you go generalising about sites not having enough? Because that’s Big Daddy’s problem: many, many high-quality backlinks are just not registering as backlinks anymore. It’s a bug. You must have a very low opinion of an awful lot of people to just dismiss us all as whining idiots who didn’t know you need a few backlinks. Take a look at Yahoo’s backlinks for the affected sites before you condemn them all to the garbage.

    How long is it going to take you guys to notice your backlink bug? It probably doesn’t help that you keep deleting any comments that mention it.

  85. Matt: ‘I’d recommend people spend less time on trying to gather links that way or via some automated network, and more on making a great site with a creative angle or two that makes the site stand out from the crowd.’

    I would recommend, as one other poster did, that if Google wants to get a handle on reciprocal link farms, it should look at real estate sites. I have pointed this out before, and I have been guilty of it myself, but there are huge link networks operating with high Google rankings that are nothing but link farms.

    Multiple site creations on the same subject, directory creations, scraper sites – all created to manipulate Google further and to benefit the existing link farm group even more.

    A good example of this was some research that I performed last week on our #1 competitor in Google. Out of 1000 links, this site had 40% of them coming from 5 IPs. Yet Google has rewarded this type of linking scheme with top rankings.

    Based on my own personal experience, Google has rewarded reciprocal link farms and continues to do so. Judging by the sites in question, if a link farm is created and is themed, Bigdaddy rewards these unnatural link schemes.

    You have groups and some SEO companies that are able to point 1000s of links at their clients’ sites, or create a closed-off network of themed reciprocal link exchanges that are not natural according to Google’s definition. As I am sure you understand, Matt, these systems are only meant to manipulate Google’s SERPs.

    On the flip side of this coin is the fact that new sites trying to compete with these sites must follow the example set by Google’s rewarding of these practices with high rankings. As long as Google rewards even a few sites with these types of practices, new sites that may offer more to the online user will forever face an uphill battle for business in Google.

  86. so, no affiliate links? or how many is OK? cause you know, why not just kill the affiliate business model altogether.

    let’s have a look at some examples: amazon.com – currently nothing but a site promoting other sites’ merchandise, though it has its own transaction processing and sells some books and whatnot on the side (177 million pages indexed by google). any site providing syndicated news? nothing but a “duplicate content” aggregator. every coupon site on the web (type “coupons” into google, all those sites are there) is nothing original, just a bunch of affiliate links (mostly cloaked). are you gonna not index any of those? i say let the users decide which ones they like most. bookmarking rate maybe? i don’t know. things like that. backlinks? well, if you delisted all the sites that originally linked to some site, there would be no backlinks left, i guess. you know, all the small sites that decided to give each other a boost.

  87. Great post PhilC. It’s nice to see somebody who is pro-business. Google wants to corner the market on search but has stifled small businesses’ ability to make money. BD seems to favor only their “fat cat friends”.

    Google: Our goal is to index the entire world’s information, but alas we’ve found it more lucrative to censor.

  88. I have a question about sites missing from the index, and I wasn’t sure where else to get a reply, so I hope you don’t mind me asking here.

    Last fall I had five sites completely banned from Google for having “outgoing links to pharmacy sites”. I removed all outgoing links from all the sites, and filed reinclusion requests. One site, a PR 7, was immediately back in the index and continues to show up on page one of the search results. The other four sites have never reappeared at all, despite the fact I made the same modifications to them.

    The Google reinclusion people wrote to me in March about my missing four websites, saying, “Please be assured that your site is not currently banned or penalized by Google.” When I wrote back and asked why my sites were missing completely (grey bar, and the domain not in the index at all), I was told the matter would be investigated by the engineers. That was three months ago, and my sites are still invisible. They’ve been gone from Google for 8+ months now, after being in the index previously for over two years.

    Have my sites been “sandboxed” or something, prior to reinclusion? They were only a PR 5 or 6, so did the PR 7 site get some sort of priority? I really would like my sites back in your index, and I’m at a loss as to how to achieve that when your own engineering team claims my sites aren’t banned at all.

  89. Matt, it seems that Google picking on reciprocal links just makes it more attractive to buy expired domains.
    You always avoid talking about this type of webspam, yet it’s doing more to upset the balance of good SERPs than any other type of spam.
    You also mention that blogs are a great way to develop one-way links.
    That also plays into the spammers’ hands. Expired blogs still work a treat, and that profile I gave you many weeks ago is still live and active. http://www.blogger.com/profile/17839170
    So much for your inside man at Blogger taking care of it.

  90. Matt, thank you for the update. While I appreciate the information, it does little to change my philosophy that it is almost impossible for a small site (25 – 100 pages) playing by the rules in a competitive market to rank in Google.

    It is sad to come to the realization that the only sites Google feels provide any value to the web are the large multinationals, or sites with 10k+ pages and thousands of incoming links. How relevant will Google’s results be if webmasters abandon efforts to rank in your index and focus their efforts on the other engines?

  91. So Matt,

    Are you partly responsible for this debacle then? Even if you didn’t have a backlink bug (which clearly you do), your logic is fatally flawed. The inevitable end result of requiring more and more inbound links before you will even deign to index a site is spam. Spammers do this stuff full-time. They spend no time on content, and no time on value-added functionality.

    The more ludicrous hoops you make sites jump through to qualify for the index, the more you pave the way for huge companies or spammers. The in-betweens get sidelined.

    Incidentally, why does a site need a gazillion artificially bartered inbound links before it is worthy? No one at Google seriously believes that inbound links are still a measure of relevance, do they? Have you read your own posts? They talk nonstop about how to go about acquiring the right kind of links.

    You’ve all lost the plot. You’ll delete this message without even bothering to pause and consider whether or not I’m right.

  92. Ah, finally. Maybe now we can finally kill off the link exchange program cottage industry. A few particular countries are not going to be happy about this! 😉

    Hey Matt, when is Google going to implement the long awaited SERPs Randomizer? I mean, we’ve talked about it in the past and it would be great to see those first 30 SERPs rotating randomly. Do that and watch the life expectancy of a search engine marketer drop by a few years. 😉

  93. Matt,

    I know Google is not giving us webmasters a full picture with the link command. I did the link command on Yahoo and MSN and I noticed some scraper sites copied my content and added some links to a few of my websites. I have a feeling Google is looking at these links as questionable. I am in the process of emailing these scraper sites’ webmasters and getting the links removed, because I did not request to put them there and they violated copyright by taking our content.

    Since Google crawls better than MSN and Yahoo, will there be a way in the future for us webmasters to see these links? Honestly, right now, if a competitor wants to silently tank a website’s rankings in Google, all they need to do is drop a bunch of bad links. Without Google giving us webmasters the ability to see the links, we may never even know this has happened.

  94. Hi Matt. I appreciate what you have explained here. I suffered through supplemental pages earlier than many others, and at this time I am happy to report that nearly all of my pages have returned when doing a “site:” type search.

    Unfortunately my Google traffic has not recovered yet. At one point it dropped down to about 2%, and it has recently risen to around 5%. This is not good, as it used to run closer to 75-80%. Have surfers changed search engines? I don’t think so, as the total numbers from other engines haven’t varied a whole lot.

    Earlier I did a search for a page on my site and it was found on the 4th page. That’s fine for that page, but the sites that came up ahead of it were not even related to the subject and only mentioned in passing the words that I had searched for. I expected to see well-known sites in the very same niche appear in that search, however none did. It looked like crap was floating to the surface instead. It looked like relevancy had gone out the window, and that cannot be good for Google’s business.

  95. Great post Matt, thanks for sharing all the insight. Congrats on getting more help recently, I hope that this frees you up to make more posts like this.

  96. Hello, I’m really new to dealing with Google and I really appreciate finding some feedback from you guys – great!

    I have a new site that has about 3750 pages. The total indexed pages are constantly hopping from 30 to 340. It would be great if I could get them all indexed. lol

    But I’m completely lost as to what I am supposed to do to get all my pages indexed. I really don’t want to be going around the net trying to get links to my site when we are being told it’s better to create good content instead. But hang on – how will my great content get indexed if I have no links? You’re also saying we need links to get indexed, but not just any links – they must be “good” links. I’m lost again! lol What I mean is that for someone with little experience, reading that they need links, it’s really hard to judge what good links are and to find places to get them. This again seems to mean that established sites with big SEO budgets are always going to be ahead, regardless of their content.

    I think PhilC made a really good point above too. I have some unusual specialist information on my site that isn’t indexed. There are currently no results for search terms related to this information. Now where is the benefit for people in these pages not being indexed, just because there are not enough links pointing to them?

    What if you have one large site with 60k inbound links that has a page of information about a subject, and it’s the only page returned for a search term? Then you have a small site with no links that hasn’t been indexed, but has a similar page that’s 100 times better content-wise. Why not index it and show it second in the results? Surely that’s better for everyone?

    Lastly, if the dropping of people’s indexed pages is because of site trust issues, then why is my own new site’s index count going up and down like a yo-yo? Newly indexed pages, then hardly any pages, and then newly indexed again. Is it having trouble making up its mind whether my site is trusted or not?

  97. Hi, Matt!
    I was wondering if you guys changed something in the algo in the last few days…
    A few hours back, my site dropped from position 3 to nothing, although it’s a good site. The Sitemaps account doesn’t show any spam warning, but Google has started to delist the pages…
    Can you have a look? I’m a total mess now…

    Thank you,
    Chris

  98. Are you kidding Chris?

    Did you read Matt’s post? Your site is a piece of junk not worthy of Google’s index. It’s true. Matt has personally checked. And every site that has been de-indexed that he has looked at has not had enough inbound links, or else has had outbound links that are just completely off the wall. Imagine a real estate site having the gall to link to some other kind of site. What a joke. You’d better get busy and go after links. It’s links, links, links from now on. It’s official – Matt says so. You are junk if you don’t have links. Google loves blogs, you know. You shouldn’t really be allowed to have a website nowadays unless you are willing to link yourself silly on your own blog. It’s the future, you know. And it’s great. Matt says so.

  99. Yeah, for a porn site that is some great spam work indeed man!

    Google is taking porn sites out of the index – if you have been reading the news, there are lawsuits flying around about them being in the index!

  100. Oh, my, goodness! It just so happens that at about the same time my remaining indexed pages disappeared, I had just added a reciprocal link to my site!!! Ugh!

    Soo… now that I’ve removed all the links from my minute template-based website and added a nofollow attribute to the three remaining links, should I expect to see a change in indexed pages on the next crawl? Or am I banned for a year or something?
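    (For anyone unfamiliar, the “nofollow” I mean is just a rel attribute on an ordinary link tag – the URL in this example is only a placeholder:

        <a href="http://www.example.com/" rel="nofollow">my link</a>

    That’s the whole change; the link still works for visitors, it just tells crawlers not to pass any weight through it.)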

    By the way, thanks for the update – I’ve been stalking your blog for over a month waiting for something like this post.

    Heh… and I’ve only ever had two internet customers… (but they were recent, which is why I was inspired to get my site indexed 😉 )

  101. Anthony Cea Said,
    May 16, 2006 @ 7:24 pm

    Yeah, for a porn site that is some great spam work indeed man!

    Google is taking porn sites out of the index – if you have been reading the news, there are lawsuits flying around about them being in the index!

    Yeah, the funny thing about that ranking is that my site is real estate, not porn. It only shows a flaw in Google’s algo and ranking system. I kind of liked Midwestnet’s comment on DP:

    Fisting lessons with your new house, anyone? 🙂 The page ranked for that term is a property detail page for a listing in Las Vegas. First I thought OK, maybe this page was hijacked – but it hasn’t been. Then I thought OK, did someone get access to the site to change title tags and meta descriptions? It wasn’t that either.

    Checking that page, I found no backlinks to it with that anchor text, so this only leads me to believe that somehow someone at Google turned over a cup of coffee on their computer 😉 and caused all this mess.. LOL

  102. Zoe C,

    Shame on you. You added a reciprocal link! Why? It’s a simple fact that natural links just materialise out of thin air if you are any good. How? Because people find you, think you’re great and link to you. How do they find you? Why, on a search engine of course… ummm… wait a minute… Oh my god. The system is flawed! Hey Google, you’re a bunch of idiots.

    I guarantee history will not look kindly on this particular period of Google’s existence.

  103. Hi Matt,

    I wanted to ask a couple of questions. In the next few days I am going to launch a new site that will offer a certain service to bloggers and webmasters – basically a free script. I am going to ask the people using it on their blogs and websites to link back to my site, which could attract all kinds of backlinks, because the script can be used on any kind of site. If some sites from what Google considers bad neighborhoods use this script and link back to me, will this penalize my site?

    The other thing that I would like to ask is: on my blog I have a niche affiliate store related to my blog’s theme, as a way of monetizing it. Will this lower the overall TrustRank of my domain? For example, can it cause a decrease in the rate at which my blog is crawled, or cause my site to lose its current rankings for certain keywords?
    If that’s the case, I think it would be very unfair – it would be like MSN penalising sites that have AdSense code on them.

    Thank you,
    Dimitris

  104. Matt, thanks for your great post.

    One question relating to sites that send traffic in exchange for linkbacks. Say 20,000 sites link to a page, and in turn that page sends traffic to each of those sites. Here’s the twist: that page rotates links in and out periodically, so that on any given day it only displays 200 links. I consider the 20,000 incoming links to be manufactured links, but technically 16,000 of those links are not reciprocal. Will Google be dealing with this type of linking scheme anytime in the future?

    “What do you think of that? Hmm? I said ‘What do you think of that?’ Don’t answer. You don’t have to answer everything.” 🙂

  105. Hi, Matt!

    This is a very valuable post indeed! It has given good insight into the quality parameters which Google considers when indexing pages.

    A better web can be made by openly sharing problems and comments. I feel there is a need for some forum where volunteers and enthusiasts can anonymously share their real-world experience of black hat and other unethical SEO practices followed by many sites. This would help to improve Google’s filters continuously, and a better web could be made.

    Thanks & Regards,
    Ajay

  106. Hi Matt,
    Thanks for the post! I pretty much expected everything you have said. After all, Google is going to keep trying to improve itself, so in the long run only the quality sites are going to last. Anything that tries to game the SE with backlinks or whatever will eventually get kicked out!

    Anyways, on my site I have a link to my “web stat counter” at the bottom. Will that be considered a bad link to have?

    I have other bad links too… but I want to know specifically about the web stat counter link. Is it a bad link to have?

    Thanks

  107. I have a small, noncommercial, ad-free site (with good-quality content). You could say I’m not so much a webmaster as just some guy with a website. There are a lot of people like me, who seem to be being left behind by the new Google with its infatuation with giant business enterprise.

    From my perspective, both Yahoo and MSN do a far better job than G at returning results from my site when they are pertinent to specific search queries. At some point – early February as I recall – I noticed that traffic to my website had virtually stopped. I then found I had dropped out of the Google index. After a little research I decided I was being penalized for duplicate content (which probably occurred when I moved the site to a new domain). I filed a reinclusion request and at least got my site indexed, although at its previous host (defunct for almost a year) it was still showing better results than the same site at its current location, last time I checked.

    Right now I feel I’m doing about all I can, which is to improve and expand my content and hope someone notices. Maybe Google will some day start to return better results from my site so that traffic will pick up again, but it’s kind of out of my hands.

    All of which is a long preamble to a comment about how organizations fail. I’ve looked at this issue a little, and typically there is some fatal flaw that seems insignificant at first but gradually becomes magnified and turns out to be their undoing. (I suppose that insight was the genius of Greek tragedy.) Anyway, it’s looking to me like Google’s fatal flaw is paranoia. By obsessing about people scamming its SERPs, it has started dropping valuable content. It expends too much of its energy in a kind of perpetual chess game with black hats, who are simply playing whatever system G devises, and so it has turned into a mirror of their tactics. The escalating back and forth is like an endless succession of reflections in funhouse mirrors. Meanwhile, its competitors, perhaps just by doing nothing, are now returning more useful results.

    Or maybe I’m wrong, and the ship will correct its course. I hope so.

  108. I am with PhilC. This whole thing is ridiculous now.

  109. After 4 years online, my large content-based adult website dropped like a rock in Google today. I lost maybe 80% of my traffic in a single day, and after talking with my competition it seems they’re all doing fine. Not sure what to make of this so far – we haven’t made any major changes lately, and we don’t have any duplicate content.

    We have set up XML link trading in the past few months to help our customers find similar articles about the sites we list, adding many top-quality IBLs – hundreds of exactly-relevant links from a site ranked 575 in Alexa, for example.

    We have never used any spammy techniques (to our knowledge!) or anything black hat. I’d say we’ve followed all the rules to a T since 2002, and we have never wanted to risk our good relationship with Google. What can I do? It hurts to see cloaked sites and sites with no content out-ranking our high-PR, old and established pages with relevant, useful content.

    It seems to me that Google is having big problems with .biz, .info and .us sites lately, too. My 2 cents.

  110. * clap clap clap clap clap *

    That was brilliant, Phil. You’ve managed to come up with the most emotionally compelling arguments on this blog any of us will ever see. Like many of the things you have written in the past, it is truly a work of art. It’s passionate, it’s inspired, it’s emotionally charged, I laughed, I cried, I felt stirrings from the very cockles of my heart…no wait, that was a gas bubble. Sorry about that. My bad. Really.

    If they were even remotely sensible, then they would have been great arguments. The problem is that you’re making the same fundamental mistake that most others make when they try to convince others (especially guys who have stroke, such as Matt): they don’t argue from any point of view other than their own. We’re all guilty of that, though. You do it, I do it, Aaron Pratt does it, Wayne does it, we all do. We can’t help that. It’s human nature.

    (Side note: for those I mentioned here, it wasn’t an attempt to single anyone out. I was merely mentioning names as examples. So please don’t take it personally; I’m not trying to attack or insult anyone).

    But the whole point of what Matt was trying to say here is something I think most dedicated SEO-types tend to miss, and that’s “worry about the site first as far as a resource for people goes, and THEN start SEO after.”

    When webmasters start linking to ringtones or MP3s or Viagra or pet urine control from unrelated sites, that doesn’t do a thing to help the end user. It either sends the user on a wild goose chase or turns the user off.

    When webmasters start receiving those links, they’re getting trash traffic at best. I’d rather have 10 visitors from a relevant search query than 10,000 from some trash-traffic link farm scheme (assuming it was even that good).

    For another thing, as a webmaster, if my pages are good, index them, dammit. What on earth do IBLs have to do with it? Doesn’t Google want to show good pages to its users? If you don’t want to rank them very highly, don’t rank them very highly, but there is no reason in the world to leave them out of the index, and deprive Google’s users of the possibility of seeing them. It’s just crazy, and makes no sense at all.

    First off, who are you, I, or any of the rest of us to judge whether our own sites are good enough to be listed and indexed? All Google is doing by using quality IBLs as a sign of quality is extending the concept of human referral and word-of-mouth. If it’s good enough for a human to link to it organically, it’s good enough for Google to list it. How else are they supposed to figure out what to rank and what not to rank? People would complain if the Toolbar were used; on-the-page content can be manipulated very easily; and any other form of monitoring would be met with some heavy-duty scrutiny at best.

    Where are these pages that are so perfect that Google is doing a disservice to the web and that don’t have hyperlinks to them from any other web destinations, anyway?

    Second, if Google is going to list pages so that users can find them, they’re going to need to list pages in such a way as to provide users with easy access to them. In 99.999% of cases, the SEO-types call this “ranking highly in search engines”. So you want to be listed somewhere in Google SERPs for your content so that users can easily find it, yet you’re okay with it not ranking highly. Does anyone else see where a guy like Matt might have a bit of a problem with that?

    As far as OBLs go, this is an area where webmasters should take some responsibility and show some moral judgement (and, to be fair, most webmasters are pretty good that way.) We have a certain moral obligation to those who may visit our sites to guide them via the hyperlink structure in a manner that will give our users the best possible experience. How does irrelevant OBL linking do that? How does providing a link to otherwise useless content help the user?

    For those of you still not convinced that building a good website, putting up content and drawing visitors the natural way works, there is at least one website that has done a terrific job of doing exactly that.

    The owner doesn’t obsess constantly about where his site’s positioned in any engine.

    The owner has never linked to a spammy site without using the nofollow attribute, and has ensured that the spammy link was relevant to the site’s theme on the rare occasions that he has done so.

    The owner has never bothered to participate in link schemes, exchange reciprocal links, or do any of that stuff.

    The owner has quietly built up his content, and in the process has attracted a large, loyal and active userbase, which if I’m not mistaken is what we’re all supposed to be doing when we build websites.

    It’s not a perfect site…none are (including my own). It could be improved, and I’m sure the owner would say the same thing. But at least the owner is focusing his/her efforts on his/her site.

    And each and every one of you reading this has visited the owner’s website. In fact, you’re on it…right now.

    Just something to think about the next time someone offers you a wonderful reciprocal Viagra link, or maybe buying some text links from a broker.

  111. Matt et al,

    Thank you for this insight. You’ve put the pieces of the puzzle together, and I appreciate it.

    What I am getting from this is that links to a site that were previously considered a positive vote are no longer considered that, so some of your pages that were in the index because of that vote may now disappear. Of course, if those pages had links on them, the sites that received those links may now disappear too. Thus the gradual deindexing of sites.

    This effort has been put in place to stop unnatural linking schemes such as link farms, directories, and paid listings.

    Now, since people like me cannot afford a Super Bowl ad to get the name out, and I’m not a seasoned SEO with 2000 sites under my control to “naturally” gain links, my pages will go unknown to Google’s users. Unless of course those users get fed up with the same old sites at the top of the SERPs and go to the other search engines that cache fresh sites.

    All of this effort appears to be made to discourage unnatural links; however, I believe it will only increase them. Why, you ask? Because I know my site was killed by the new filters – perhaps I didn’t have enough “quality links”. However, if I search for some very specific terms, 3 of the top 10 results are simply made-for-AdSense mini-directory sites that have 10 links on them, some scraped content, etc. If I check their links, it is truly a bunch of junk. So the only conclusion I can draw is that the junk links still work – it just takes a whole lot more of them than before. Until the day all of those sites are gone, that can be the only conclusion.

    Now to address the paradox. To get indexed you need natural links; to have natural links you need webmasters to view your site; to get them to see your site you need to be in the index… yada yada. BUT webmasters have just been told not to link to lower-ranking sites, and if they do, to use the nofollow tag. Why not simply show these low-linked pages on page 800 of the SERPs and track whether they are found? In my line of work (engineering) I frequently search very deep into the SERPs to get to sites written by real people in the field, not the corporate presences that rule the industry and the first 100 pages. In other words, as a page is found, include it – the more action it receives, the more it moves up. This could be tracked by watching activities such as use of the back button (a vote for not finding what you wanted), etc.

    Just my 2 cents. And I’m off to spend the night finding a few thousand sites that want to link to me, to get my pages listed again.

    ~John

    PS If you don’t delete this: I added my URI this time, as Yahoo doesn’t seem to care about the nofollow thingy.

  112. I realise that “adult” results aren’t exactly your “forte”, but how can you explain Google’s seemingly deliberate action of making adult search terms give irrelevant results?

    This practice has a string of problems with it.

    Here’s the main problem I see. Do a search for an adult term like “porn clips”:

    the top 10 results are somewhat relevant, but the other 90 are filled with domains like these

    http://www.thechurchillscholarships.com/analporn.htm
    http://www.lewisandclarkeducationcenter.com/farmsex.htm
    http://www.nyotter.org/porn.html
    http://www.argenbiosoft.com/amateurporn.htm
    http://www.universityplazahotel.com/porn.html
    http://www.plannedparenthoodcouncil.org/amature.htm

    Notice a trend? All are recently expired, non-adult listings. The main pages are usually direct copies of the previous site pulled from the web archive; then each domain is filled with easily identifiable doorway/cloaked pages.

    Not only does this give a bad impression of the adult industry in general, but the other problem is that most of these sites contain trojans/viruses/child porn/bestiality that then alter surfers’ browsers.

    The only reason I could see for Google allowing this practice is that they make more from Google AdWords this way, and they realise adult webmasters don’t have a voice as loud as the mainstream (even though adult is a huge part of Google’s revenue).

    I also notice a trend of Google AdSense sites jumping up in popularity when the sites don’t even have relevant content, just ads for Google AdWords/AdSense.

    Now I notice my “adult” website – which is very relevant, several years old, and established with several hundred relevant backlinks – is close to position #300, while the vast majority of sites above me are either freshly created/expired domains with no content, or guestbook/forum spam on mainstream sites that the owners can’t fix without losing their entire website.

    Why the foolishness?

    p.s. It’s really irritating when you write out a big long post and the “security code is invalid”, so it makes you hit the back button and your whole post is erased… grr

  113. Matt,
    Your article reminded me of my startup days.
    Having written a very detailed business plan (about 100 pages long), I was told that although it was very heavy to hold, what most venture capital analysts I met with would read was the executive summary on the first page, and that I should focus on it. Funny how an indexing article can get me to remember those days 🙂

  114. Matt, I think you guys still have a lot of work to do. I know one of the real estate sites you mentioned in your main post, and it’s got all its pages back, but they bought ALL their links, and their content is dire! Then a site like mine, with mainly natural links and good relevant content, gets stuffed. Seems there’s still a long way to go…

  115. Hi Matt

    Thanx for the post. In the last 48 hours I’ve seen a lot of change in the SERPs. The question still stands though: how the hell does one promote a new site if we’re not allowed to trade links with similar sites? OK, so it’s not ‘not allowed’, but it won’t help ranking. I assume it will however help with indexing, so all in all not a bad thing?

    What I don’t understand is being penalised for linking to unrelated sites. For instance, I’m really proud of the city I live in, so I run a blog about it. I also link to many city-related sites, and although they are all in various different niches, they are still in the city, so it’s kinda tourism-related. Is that a bad thing? After all, I am giving the user useful info about where to find what in the city.

    Actually it doesn’t really matter where that site ranks, although I’m trying to get a better understanding of how things work…

  116. I agree with Justin – Google has to rethink its strategy on link valuation.

  117. Dave (Original)

    Matt, my site is the best out there on my chosen topic. Despite this, there are many sites above mine in the SERPs for my chosen targeted phrase. Please fix this so the whole world can see my site at #1 when searching. Until you do, the Google SERPs are crap!

    Oh, and can I also have some more PageRank? My paid links just don’t seem to work like they used to.

  118. I really think a lot of you need to understand that the days of gaming the SEs with links are coming to an end. Links have nothing to do with Google’s problems with indexing the web; they could index pages if they had the storage space and dump the pages with bad links to the bottom of the SERPs. The problem is the lack of indexed pages at the moment!

    http://blogs.zdnet.com/web2explorer/?p=173

    The above link was left on one of our forums and is common knowledge!

    “““““““““““““““““““““““““`

  119. Hi Matt!

    Great post, and great answers.

    My question is whether Bigdaddy, and its effects, are equally significant in all languages?

    I am seeing a lot of link exchanges, linkspam and other deceptive techniques earning top positions in the index for certain non-English languages.

  120. PhilC, we try very hard to find ways to rank mom/pop sites well. As I mentioned, Bigdaddy is more comprehensive (by far, in my opinion) than the previous crawl/index. A site that is crawled less because its reciprocal links are counted for less is a different type of situation than many mom/pop sites, for example.

    Halfdeck, I’m happy if it helped clear things up.

    John, that’s your choice if you decide to chase thousands of links in one night. I just don’t think that’s the best way. BTW, just because Yahoo reports nofollow links in the Site Explorer, I wouldn’t assume that those links are counting for Yahoo!Rank (or whatever you want to call it 🙂 ).

    Justin, of the three real estate sites that I mentioned, two are unhappy because they’re not crawled as much as before.

    Dave, nice one. I made it several sentences in before I got the (dry) humor. 🙂

  121. And I gotta get some sleep now..

  122. Hi Matt,

    I am the owner of the health care directory domain you used as an example above. Thanks for having a look at the site, your comments are helpful and much appreciated.

    I would like to clarify something on how and why I used the removal tool as I don’t think that was described properly.

    I had pages that were indexed under both www and non-www in a directory such as:

    http://www.domain.com/directory/

    Those pages were mired in the supplemental index and indexed under both www and non-www. (At first, my server did not have a redirect from non-www to www, but I have since put one in place. That is likely why they were indexed under both www and non-www.)
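    For anyone curious, here is roughly what that non-www to www redirect does – a minimal Python sketch for illustration only (my real server does this in its config, and the host name below is just a placeholder):

    # Toy host-canonicalizing 301. A real site would normally do this in
    # Apache/nginx config; CANONICAL_HOST is a placeholder, not my domain.
    from http.server import BaseHTTPRequestHandler, HTTPServer

    CANONICAL_HOST = "www.example.com"

    class CanonicalRedirect(BaseHTTPRequestHandler):
        def do_GET(self):
            if self.headers.get("Host", "") != CANONICAL_HOST:
                # A permanent (301) redirect tells crawlers which version
                # to index, so www and non-www don't count as duplicates.
                self.send_response(301)
                self.send_header("Location",
                                 "http://%s%s" % (CANONICAL_HOST, self.path))
                self.end_headers()
            else:
                self.send_response(200)
                self.end_headers()
                self.wfile.write(b"canonical host\n")

    if __name__ == "__main__":
        # Port 8080 for the sketch; a real server would listen on port 80.
        HTTPServer(("", 8080), CanonicalRedirect).serve_forever()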

    I removed those pages ( /directory/ ) from my server and used the removal tool to let Google know they were gone. I re-built those pages at:

    http://www.domain.com/new-directory/

    I used the removal tool because I wanted to start fresh and didn’t want to get penalized for having the same pages under two directories. I did not use the removal tool in the hope that just the non-www versions of the pages would be removed from the index. I used the removal tool to let Google know those pages were gone forever (six months in Google’s eyes).

    Since the above may be a little confusing, I am going to summarize one more time for clarity. I removed pages from my server (pages that were indexed under both www and non-www) and then used the removal tool to let Google know they were gone. I rebuilt those pages under a new directory to start over and hopefully get them indexed correctly.

    I very much agree that the site could use some links. Thanks for your time and help.

  123. Nice day, Matt,

    hope you got some sleep.

    Did you realise that thousands of small businesses are out of business now? Google was a search engine where small business could compete against big business. Those days are over, because the balance has now tipped toward big business. That is a pity.

    In a comment you said you/the team are going to observe the spam reports more closely?! IMO spam reports don’t work. I find well-ranking sites with more than 12,000 pages using JavaScript redirects(!), and well-ranking duplicate content across 3 or more domains. Nothing happened to them. When does your fight against that begin?

    greets, Martin

  124. Google has lost its edge: or more accurately, the crawl fringe. And as a result, it is officially broken in my view.

    I have been using Google for about six years now, and AltaVista before that. The key advantage Google had over AltaVista in the early days was that its ordering was better. In the very early days, AltaVista had a lot more results than Google, but that changed fairly quickly. At any rate, AltaVista always had the results, but you had to dig deep. In Google, the results were just ordered “right.” Indeed, if your search phrase was particularly detailed or otherwise unique, you could often click on “I’m Feeling Lucky,” a button unique to Google, and jump straight to the page you needed.

    I have been following a change in Google’s results over the last four or five months which has gotten steadily worse: Google is no longer returning results which are “dead on.” That is, it seems to be using PageRank not just as an ordering tool, but as a pruning tool as well. Now, it is true that PageRank has always been used in this way, but the aggressiveness of it is now too extreme for my purposes.

    Matt, you have effectively said in your entry and in the comments that PageRank is now being used to eliminate pages from the search results completely. I think this is what has broken Google, because PageRank is the foundation of the algorithm that used to make Google work.

    A page which does not appear in the index has an effective PageRank of 0. Any pages linked only from this page have PageRank 0 also. In this way we find that this feeds a recursive loop: as a page disappears, it takes pages with it, those pages take more pages, and so on. Yet these pages have keywords on them – they often have unique variations of them. Google used to be able to find these, even to “whack” them. Now it simply cannot. It has lost its power.
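    To make that cascade concrete, here is a toy sketch (my own illustration, not Google’s actual code) of the published PageRank iteration – PR(p) = (1-d)/N + d * sum of PR(q)/outdegree(q) over the pages q linking to p – with dropped pages simply removed from the graph:

    # Toy PageRank. A -> B -> C is a chain, so B and C get rank only via A.
    # (Dangling-node handling is omitted for brevity, so values won't sum to 1.)
    def pagerank(links, dropped=frozenset(), d=0.85, iters=50):
        pages = [p for p in links if p not in dropped]
        n = len(pages)
        pr = dict.fromkeys(pages, 1.0 / n)
        for _ in range(iters):
            new = dict.fromkeys(pages, (1 - d) / n)  # the "teleport" floor
            for p in pages:
                out = [q for q in links[p] if q not in dropped]
                for q in out:
                    new[q] += d * pr[p] / len(out)
            pr = new
        return pr

    links = {"A": ["B"], "B": ["C"], "C": []}
    print(pagerank(links))                 # B and C accumulate rank through A
    print(pagerank(links, dropped={"A"}))  # B falls to the floor; C follows

    With damping, nothing hits exactly zero – every page still in the graph keeps the small (1-d)/N floor – but a page whose only inlinks came from dropped pages loses essentially all of its rank, which is the snowball I am describing.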

    Now this wouldn’t be so bad (what point is a page without incoming links, after all?) except that this isn’t the only change Google has made. Google now has a manual switch which zeroes the PageRank of sites it deems to be “unfairly gaming the system.” It also has a scheme which lowers the PageRank of pages in “bad neighbourhoods” or using known “black hat” SEO techniques – this is often dubbed TrustRank, but we have no indication from Google that it is separated from PageRank in the Google architecture. Additionally, it now appears that Google can detect duplication in results, which also seems to feed into PageRank in an unspecified way.

    Matt, you have said before, in an entry on canonicalization, that everyone should 301 from site.domain to www.site.domain (one or the other), but there are likely to be millions of websites which cannot do this, or just won’t out of ignorance or laziness. Are those pages actually worth less than the others? Do they deserve to fall into PageRank 0 hell?

    Surely you can see that what was already a nasty problem now has the potential to snowball. And this is what appears to be happening. The low-ranking pages of the web, made by small people who don’t go out and get lots of links, have been caught in the SEO/Google crossfire. These small people had relevant pages for detailed search queries, not the so-called “competitive phrases” Google staff actively monitor. Now those phrases generate generic “authority” crud, really nasty black hat spam, or worse. The Googlewhack has become a “no results”, and “I’m Feeling Lucky” has been set to an instant trip to the Wikipedia world. Google is horribly broken as a result.

    I fear, Matt, that if what you say is true, all my fellow techies can forget typing some bizarre error text into Google and hitting a three-year-old web discussion on some portal where someone else had the same problem. You’re just gonna hit the boring table of manufacturer error codes… or maybe nothing at all. It’ll be back to AltaVista for me, I expect.

  125. You didn’t sense the tone of sarcasm in my voice when I typed that!?! I didn’t spend the night chasing links; I actually just wrote some articles on a subject I know something about – this interwebby stuff is too volatile for me right now. Someone will find them interesting, and natural links have to come… well… naturally. I think generating natural web traffic is like pushing a boulder over a mountain: it takes a long time on the way up, but on the way down you can’t keep up.

  126. Matt, let me see if I can summarize this correctly:

    Big Daddy attacked crap backlinks, and therefore if you have fewer backlinks you don’t get deep-indexed until your site earns it with quality links or site age.

    Everyone who has either bought some links, traded for some links, or sold unrelated links on their site will suffer. If not, then the quality sites that do link to you lost some of their PR power because they lost backlinks, and therefore you lost reputation points from them. It is hitting so hard now because of the chain reaction from the death of crap backlinks, affecting either you or a site (or sites) somewhat connected to you.

    My view on the affiliate links is that if your site has nothing more than external affiliate links and duplicate product content, then you are no more valuable than any other site with the same, and therefore it goes back to backlinks and indexing. The only way for a site like that to rank is to out-PR the other crap, and then you are still in an uphill battle, since your site does not provide anything more.

    Simply put, G knows what portion of your site is affiliate crap and what is quality original content – i.e. quality vs. crap link/duplicate-content ratios.

    How am I doing?

  127. Great post Matt – you really do deserve your holiday now –
    but

    can we just clear up the reciprocal link question

    is it OK to have RELEVANT reciprocal links – and could it even be beneficial?

    My directory-type site has many outgoing links to relevant sites and articles for which I’ve never requested reciprocal linking – but I was just about to run a campaign asking most of them to link back to my RELEVANT pages. Would this be OK and not harm my position or ranking?

    cheers

  128. YES! That’s my exact question, summed up, Weary.

    I’m fairly sure that my massive drop yesterday was due to the inclusion of XML link trading with my competitors. My review on SiteXYZ links to their review on XYZ. I thought this was valuable, relevant information for my users, and valuable IBLs for my site.

    They don’t look so hot now! Oh gosh.

    Anyways, thanks Weary – looong day. I really hope this gets fixed, and I’d hold off on the relevant reciprocal linking!

  129. The web will be transformed to take the shape of our current world.
    Those who sell something have to be the Walmarts and the Amazons.
    Of course it’s their merit that they are so big.
    It’s just that the net used to be something that was equal for everybody, and it’s now transforming so that the little ones don’t have a chance.
    As a little one, I don’t have a chance, not even with CPC now. I have to do tricks like Shoe to get something.
    I think it’s all over, folks.

  130. Matt,

    Thanks for your great post. However, something really concerns me – in the above example of outbound linking, you state that the “Real Estate Site” has dubious linking because it links to a “mortgage site”??

    Are you serious about this? Or is it a mistake? This has very serious implications.

    Surely if I am looking to buy a house, then I am also extremely likely to be looking for a mortgage, and that link is actually very relevant to the browser – I am actually hard pushed to think of anything that could be more relevant.

    Could you please expand on this? If it is not a mistake, then along the same lines I would expect:

    – Holiday sites will get penalised for linking to car hire sites
    – Wedding sites will get penalised for linking to honeymoon sites
    – Finance sites will get penalised for linking to credit card sites

    In essence, if the above holds true, a site will get penalised for linking to anything that is not exactly the same theme as the site it links from.

    If you looked at numerous property sites, I would guess you would find hundreds of adverts that have been paid for and hand-picked by mortgage companies, because they know that they are very likely to get the perfect customer from those sites. It would seem that Google is therefore going against the best human knowledge.

    All I can see is that, if the above is not a mistake, it is asking for the destruction of the web, as everyone will be so paranoid that they may be linking to a site that is not exactly the same as their own that they pull all their links.

  131. Hey Matt. Since you ignore my mails and posts, I thought you might like a visit outside the blog before you head for the hills for your hols.

    http://gooogle-search.blogspot.com/

  132. How is it possible that some important sites can put dirty links in their footer and not be dropped from the index? e.g. http://www.pixmania.com/fr/fr/home.html

    And how can a directory stay indexed, given that it links out to many different sites?

  133. These kinds of posts are really good. They give lots of webmasters the feeling that Google isn’t evil at all – just that they need to polish their websites.
    Keep up the good work…

  134. My opinion, and my hope for the future, is that Bigdaddy is a step in the right direction. The long-term target is very easy for everyone to understand: give good sites the first places in the SERPs. Only one thing can make that happen, and all the Bigdaddy measures against link exchanges and the like are the beginning – sites with good content earn free links, that’s all. But it’s an enormous project to build a search engine that will give you really good SERPs, and many of the problems Google is fighting are self-inflicted. So of course I can understand people who are angry, because they spent a lot of time getting ranked highly and now Google’s algo is changing.
    I build my site with really good content, and I hope the future will be good.
    So keep on trying, Matt (and give new sites with good content a chance, and not so many filters 😉 )

  135. Hello Matt!

    I read your whole post and I have very strange feelings about Big Daddy and your reciprocal links penalization.

    Why? It’s simple. Say I build a dog breeder site. I want to get some surfers, so I’ll ask my friend to add a link to my site on his. He will also ask for a link on my site, so he will probably get some fresh surfers from me.

    Then I’ll want to have my site in some dog directory, so I will ask them to put in a link to my site, and they will ask for a link to their site for sure.

    But Big Daddy is telling me that if I want my site to be indexed and have good results, I should not add links to other sites, but plead for links to mine…

    This is a really stupid algorithm, and a kid could create a better one.

  136. This emphasis on IBLs is nuts.

    Just as a favour (no money changes hands), I run a Chinese takeaway site for a friend. He makes a good product and serves a quite specific geographical area.

    Why in hell’s name should I have to run around getting “high quality” links to his site when that isn’t the way anyone would seek to access it?

    And what is a “high quality” link for a Chinese restaurant? The local Chamber of Trade? As it happens, its site has all the appearance of a spam site, with hundreds of links to unrelated businesses that happen to be members. Is a link from such a site “untrusted”?

  137. Hi Matt

    Damn – I am late to this post – I hope you revisit it.

    Matt, you say that crawl depth etc. is largely based on PR.

    PR at the moment, though, has been acting very strangely – some sites that lost PR regained it in the last PR update; however, depth of crawl still looks like it may be based on an older level. (E.g. a PR5 site not getting crawled – it was previously PR0, due to a ban, a canonical issue, an error, I don’t know – but it is still getting crawled like a PR0, i.e. hardly at all. 🙁 )

    Now – as you know, some pages/sites didn’t have their PR updated at the last change (about 4-5 weeks ago?).

    Soooooo – what’s the score with PR at the moment? I would assume that an update will be coming soon that updates the PR of the sites which did not change at the last changeover?

    These sites which regained PR after a long absence but saw no ranking changes – does this point to perhaps increased crawling in the future, when PR is updated across all sites/pages?

    PS. I did not get a reply from Boston email address thang – sniff 🙁

  138. PhilC, we try very hard to find ways to rank mom/pop sites well.

    Maybe you do try very hard to do that, Matt, but it’s just not working. The new criteria for crawling and indexing that you explained in this thread are so bad that it’s hard to actually believe. To base whether or not a perfectly good site gets all of its pages in the index on how many links it has pointing to it (and the type of links), and what types of links it has on its pages, is sheer lunacy. I asked before – doesn’t Google want to index good pages any more? Doesn’t Google want to give full choices to its users any more? Or is Google happy in the knowledge that there are so many pages in the index that there will always be some relevant pages for the user’s results, even if it deprives them of plenty of good ones?

    Most people wouldn’t mind at all if Google identifies and drops certain types of links (reciprocals, etc.) that they don’t want to count for anything. If you don’t like certain links, cut off their juice – treat them as nofollow – drop the links from the index – but there is no sense or justification whatsoever in dropping a decent site’s pages from the index, and virtually killing it off because of them. It’s clear that Google can now programmatically recognise some of the links it doesn’t like, because you say that’s why some sites are being treated badly, so drop the links – remove the links from index – but don’t refuse to index the site’s pages because of them. It’s a sh..ty thing to do to sites, and it’s a sh..ty thing to do to your users – the very users that Google claims to think so highly of, but are now being short-changed.

    Most people would support getting rid of spam links, but to treat sites that just don’t happen to have attracted enough natural links to them as second class and on the fringes is plain stupid. Nobody would support that – especially Google’s users, if they knew.

    Google now wants us to go out and acquire unnatural links for our sites if we want them to be treated fairly. Whatever happened to “don’t do anything just because search engines exist”? What an embarrassing about-face! As I said in the previous post, I am not going to run around getting unnatural links just for Google. I’ve never gone in for it before, apart from submitting to a very few directories, and I’m not going to start now. You can stuff that stupid idea!

    The site I mentioned earlier had a clean bill of health from you personally, and nothing has changed since then. 4 days ago it had 17,200 pages in the index, and on subsequent days it had 14,200, 11,700 and 9,350 yesterday. It started at an unrealistic ~60,000. I’m past caring about the site now. It’s a decent and useful resource, but who cares if your valued users ever see it or not? Google knows best about what its users want to see, so it has stopped showing them most of that site’s pages – right? They’ll love you for it! The site has only one reciprocal link, which is down in a very specific and relevant page in the site – and it’s staying there. The site has never had any link building done on it, and because of that, Google is dumping it and depriving its users of a useful resource. Nice one Google! If only your users knew how well you look after them.

    That’s just an example of what a great many sites are *unfairly* suffering because of the sheer stupidity of the new crawling and indexing regime. Nobody gains by it – including Google’s users, who are being intentionally short-changed. Actually, that’s not true. Those who gain are those who link-build. The filthy linking rich get richer, and ordinary sites are consigned to poverty. Is that what Google wants? You want the poor to turn to crime? That’s what you will drive them to. The whole bloody thing stinks!

    Matt. My posts are not aimed at you – they are aimed at Google. I’m sorry if you take any of it personally – it’s not intended.

  139. One last point…

    All that this will achieve is that the link-poor will start unnatural link-building, and in ways that will deceive the current programming. Google will have caused it – not the site owners. This sledgehammer treatment of innocent sites, just because they haven’t naturally attracted enough IBLs for you, is madness.

  140. At least now we know that the indexed pages filter is based on external linkage.

    Thanks.

  141. I agree it’s important to filter out low-value sites (although it’s debatable what low-value means). Unfortunately, the techniques used to promote such sites are the same ones legitimate sites use.

    As a webmaster with a limited budget trying to get a new site going, or drive more traffic to an existing site, what are the options? No reciprocals, can’t buy links, can’t sell links, can’t compete with the big boys (Walmart, Target, Amazon, Overstock, Ebay…) in PPC… what’s a webmaster to do?

    Soon top Google results will be primarily big companies with big name recognition. Of course such sites get thousands of backlinks. How could they not? But what about poor JoesSunglassStand? Sorry Joe, McDonalds is hiring. Or there’s PPC, if you have the cash to go against the aforementioned companies (not likely).

    I don’t think you can determine a web site’s subjective value with an objective algorithm. And now the small webmaster’s site doesn’t even show in the results because he doesn’t have a few hundred natural backlinks, or he sold a link for $10 to a Ringtone or Credit Card site.

    Despite all Google’s efforts, I can still easily find sites using black hat techniques (such as cloaking) that appear high in Google results. Here’s one I’ve reported a half dozen times:

    term: comforter sets
    linensource.com – Offers Down Comforter Sets
    linensource.com – Your one-stop source for all your down comforter set needs. The Linen Source offers a wide variety of down comforter sets.
    http://www.linensource.com/down_comforter_set.asp

    The asp page is a cloaked page that redirects the user to the main site.

    I applaud Google’s efforts to bring order to chaos, but I can’t help but think that they are doing it in a manner that is more and more exclusionary to the small website owner.

    It seems to be a fact of societal evolution that democracies eventually ‘evolve’ into republics, where the power and wealth ultimately end up residing in the hands of an elite few, rather than being equitably spread through the population.

    The irrefutable guiding principle of our undeniable monetized society is inescapable. When it comes to search engine position of ecommerce sites, it’s not about who is most deserving, it’s about who has the most money. Mega sites with name recognition and multi million dollar traditional media marketing budgets are taking over the serps, and it’s only going to get worse.

    If you want to make a site about the mating habits of New England barn owls, or any other esoteric research topic, you can do great in Google. But if you want to run an online business that relies on the sales of products or services, you’re in for a tough time.

  142. I don’t doubt you are trying hard; I, like PhilC, believe you’ve simply got it wrong. Very badly wrong.

    Deindexing part of a site or refusing to index deeper parts of it for any reason defies logic. You either index it or you don’t. How you rank it among the other pages is another matter.

    Big Daddy may be far more comprehensive, but the results are not if you choose to deindex pages, or refuse to index them based on the types of links and not the links themselves.

    Dave

  143. Matt, are you sure BD is over, and does Yahoo link to a bad neighborhood? 😉

    site:www.yahoo.com – those were supposed to be 400k.

  144. Thanks for the post Matt!
    I translated some of the most significant extracts into French: -http://www.malaiac.net/seoenologie/91-bigdaddy-liens-sortants.html (hope the – is enough to not make a link)

  145. Once again PhilC has put my concerns in a more coherent way than I could. As I stated above I’m new to this and struggling to work out what I’m supposed to do.

    This is an example as I see it of not being indexed in action: Try the following term in the UK (google.co.uk and select UK)

    Beta Tools 920A 1/2 Socket Set (a perfectly plausible search)

    As you can see, the search returns 2 things: firstly my XML sitemap, which is pretty useless to anyone searching for the above item. The second is something completely irrelevant.

    Now wouldn’t it be better if this page was indexed and then returned?
    http://www.shacktools.com/beta-tools-920a-22-piece-12-socket-set-p-5415.html

    This is just one example, and probably not the best, of how this is affecting my site. I have 1000s more like the above. As you can probably guess, my XML sitemap page is very busy, but people can’t find what they’re looking for from it and then exit the site.

    The thing that worries me is that this page has no competition, so it doesn’t matter where it ranks just so long as it’s indexed. So to get this page indexed I need to go around adding links to other sites? This seems such a completely unnatural thing to do.

  146. PhilC has a very good point.

    I know of a site that can’t rank above page 3 for anything. I naturally thought it had some kind of penalty, as prior to its plummet it did fine on all manner of queries, then kaboom, overnight, I find myself in search engine purgatory.

    To cut a long story short – I sent a letter to the good people at Google asking if it had a penalty, only to be told that I should look at getting a few more quality links.

    In general, webmasters can improve the rank of their sites by increasing the number of high-quality sites that link to their pages. You can learn more about how Google ranks pages at…

    So yes, effectively I have a penalty. I like to call it the lack-of-IBL-I-didn’t-go-out-and-aggressively-pursue-lots-of-links-penalty-cos-I-always-thought-it-would-bite-me-on-the-ass-oh-but-how-wrong-I-was-I-wish-I-had-penalty! 🙂

  147. I would just like to pick up on the affiliate issue – what gives Google the right to determine that affiliate sites are bad? The internet is about choice, and these affiliate schemes work, giving people a living!

    Does Google not feel any responsibility for the thousands of people who will lose their income?

    With so much unemployment these days, the internet and affiliate schemes offer – or did offer – a way for people to earn money, set up businesses and provide surfers a choice, even if they do end up buying from the same place in the end.

  148. I completely agree with John. Google is about to destroy the original linking spirit of the WWW. Matt reflects the whole paradox in his posting:

    a) Since Bigdaddy, a high quality site xy.com is considered less relevant due to the fact that its inbound links might have been paid for.

    b) On the other hand: Google’s webmaster guidelines and also Matt himself keep recommending that webmasters get “quality relevant inbound links” for their sites to gain more relevance with Google.

    c) Since Bigdaddy, another site ab.com is also considered less relevant, because it has got outbound links to sites that might cover different topics than the site itself (see the real estate example in Matt’s posting). But why does this happen? Because Google and Matt recommended that other webmasters get quality links.

    Matt: Do you really think that quality sites will ever link to other quality sites for free again, as they used to in the old days of the WWW? As it used to be one of the major ideas of the WWW? If you link to another website, you have to be afraid of getting punished for it. So why link to other sites except for money? And on the other hand: how will you ever get “natural” links to your site again?

  149. [Quote from Matt] it’s true that if you had N backlinks and some fraction of those are considered lower quality, we’d crawl your site less than if all N were fantastic.[/Quote]
    I have seen sites linked to by scraper sites whose only content is Adsense ads and scraped Google search results. Does that mean that my site would be penalised by the actions of a third party spam site, over which I have no control?

  150. Matt:

    Would you please clarify your comments regarding reciprocal linking and discuss RELEVANCY and the RATE at which a site obtains reciprocal links?

    The 2003 Google patent says ‘obtain links with editorial discretion’… reciprocal linking is tough to avoid when site A won’t link to site B unless site B links back to site A. And as you have noted, paid links are not always the best course of action, so where is the line drawn? Free advertising is not very prevalent in this world. Paid or bartered (reciprocal) are the current options.

    Most sites (especially hobby, niche, small business) will not provide a link without a link back. That’s the nature of the web: you scratch my back, I’ll scratch yours. If sites didn’t link to each other, the web wouldn’t be a web.

    Responsible reciprocal linking should be done for the end user and to generate qualified traffic from like-minded sites. When done correctly, relevant and useful links offer content to the end user and provide additional resources and “trains of thought” to continue the learning process on a subject.

    Relevant exit links add value to a site, again through providing the user with another “knowledge gateway” to pass through, leading to more information or related information on a subject. This is the essence of the web. And many site operators won’t provide a link unless they can get one back.

    If Google is giving less value to sites that engage in HIGH VOLUME IRRELEVANT linking, I applaud this move, as it sends the right message to website operators to keep linking relevant and for the end user.

    If Google is penalizing sites for engaging in ANY TYPE of reciprocal linking, that smacks all of the small businesses who have engaged in this practice correctly and ethically since the beginning of the Internet, pre-Google.

    Can we get some clarification on reciprocal linking please?

  151. My goodness. It appears “many” in this thread either did not read Matt’s first post in its entirety, or are only reading the parts they want to read.

    First; In NO way is Google only looking at “inbound” links. In no way is Google only looking at reciprocal links. In no way is Google only looking at links in general in regards to how often a crawl takes place, or how many pages of a site are indexed, or not.

    Many of you are not thinking about the much bigger picture. I also know that “some” firms out there, including firms who do seo/design for clients, have not experienced ‘any’ of the problems found in this thread. Matter of fact; all positions have actually gone up.

    One thing in Matt’s post went something like this: for every page that dropped out of first-page serps, another page took its place with a happy camper. He also stated the following, which is why I say many of you did not read it in its entirety.

    [quote]After looking at the example sites, I could tell the issue in a few minutes. The sites that fit “no pages in Bigdaddy” criteria were sites where our algorithms had very low trust in the inlinks or the outlinks of that site. Examples that might cause that include excessive reciprocal links, linking to spammy neighborhoods on the web, or link buying/selling.[/quote]

    Sorry for the ‘speech’, but it’s sometimes tough reading stuff and not responding. 🙂

  152. Excellent post, Matt.

    As an observer it is fascinating to see how Google seems to be trying to balance or reconcile what appears to be a long-standing corporate culture of secrecy with the increasing need to share more knowledge, information and insight with the world at large. I think that your role in that process is not to be underestimated.

    My technical prowess in this field is virtually nil, and I am thankful that my site has not suffered a loss of pages in the index and is still maintaining a good ranking on my keywords. I am, however, very puzzled by the major differences in inbound links as listed by Google vs Yahoo and MSN. What especially struck me today when I checked was that, as well as listing about 6 times as many, Yahoo seemed to be giving greater prominence to what I would view to be better quality links. In particular, the links from my CafePress store are prominent in Google, whereas links from relatively obscure (but probably much more credible) sources get more prominence in Yahoo results.

    The other observation I would make is that I have always thought that Google was not as adept at searching images as at searching whole pages. For whatever reason, although able to consistently maintain a top 3 ranking for the keywords ‘stained glass’, I have totally failed to get images even into the top 50 for the same keywords. I’m now reading (today) that perhaps a dash instead of a space might help, but I do also believe that the image search mechanism is something of an Achilles heel for Google. Just MO.

    Keep up the excellent work.

  153. Thanks for the great post Matt.

    Matt: “graywolf, it’s true that if you had N backlinks and some fraction of those are considered lower quality, we’d crawl your site less than if all N were fantastic.”

    One of my sites is a mom n pop jewelry store which has some really unique content. The link building process is going slowly because there’s only so much I can do; however, I find that a lot of backlinks are from scraper/junk sites which are totally beyond my control. Does that mean that my site will get crawled less because of these junk sites linking to me?

    Google Sitemaps makes the webmaster proactive in helping G with crawling, so why not do this with backlinks too? It would be really cool if G also provided a link removal tool, where you could specify domain pattern matches to discount certain links from being counted – something like the sketch below. I’m sure you’d also love to see the aggregate info. You could also tie it in to the spam report function… ok, enough rambling… need coffee…
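
    A purely hypothetical sketch (PHP, since that’s what I script in) of the kind of tool I’m imagining – nothing like this exists in Google Sitemaps today, and the pattern list, function name and matching rules are all made up:

      <?php
      // Hypothetical: webmaster-supplied hostname patterns whose backlinks
      // should simply not be counted when judging the site.
      $ignorePatterns = array('*.scraper-example.com', 'junk-directory.example');

      function isIgnoredHost($host, $patterns) {
          foreach ($patterns as $pattern) {
              if (fnmatch($pattern, $host)) { // shell-style wildcard match
                  return true;
              }
          }
          return false;
      }

      $backlink = 'http://pages.scraper-example.com/copy-of-my-site.html';
      $parts = parse_url($backlink);
      if (isIgnoredHost($parts['host'], $ignorePatterns)) {
          echo "discount this backlink\n"; // discounted, not penalized
      }
      ?>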

  154. Doug. You are wrong about people not having read Matt’s whole post. It’s true that smaller parts are being focussed on, but that doesn’t mean that we haven’t read it all. A smaller part:-

    Some one sent in a health care directory domain. It seems like a fine site, and it’s not linking to anything junky. But it only has six links to the entire domain. With that few links, I can believe that out toward the edge of the crawl, we would index fewer pages

    and more about the same site…

    A few more relevant links would help us know to crawl more pages from your site.

    Google knows that the site exists, and they know that there are more “fine” pages that they haven’t indexed. They don’t need to be told to index more so that the engine is more comprehensive. They should try to make the index as comprehensive as possible.

    Matt’s best guess is that it is a low priority crawl/index site, and that they are intentionally leaving some of the site’s pages out of the index, just because it hasn’t attracted enough natural IBLs. That’s no way to run a decent search engine. It is grossly unfair to link-poor sites, and it short-changes its users.

    Now if you can think of a good reason why some of that site’s pages should be left out of the index, just because it has only attracted 6 natural IBLs, then tell us. We’re all ears.

    You are whitehat incarnate. Do you think that webmasters should have to do unnatural link-building, just so that a search engine will treat their sites the same as other sites? Do you think it’s a good idea for Google to tell webmasters that their sites can’t be fully indexed unless they make the effort to do things that they (and you) have always talked against – doing things solely because search engines exist?

    A general purpose search engine should try to index all of the Web’s decent content as far as they are able. It should never come down to leaving stuff out just because it hasn’t had enough votes. If it’s there, and if it’s useful, index the bloody stuff.

  155. I guess webmasters still don’t understand what a link farm is because they keep asking if reciprocal linking is great!

    Some websites run reciprocal linking pages – you have seen them: “cut and paste this code into your pages and we’ll give you a listing” – and many directories do the same thing, letting you gain a listing in their link farm database!

    These networks are simple for the SEs to bust. Webmasters must figure out that all this exchanging of links, and the idea that getting a million links from anywhere will make you rank high, is old and worn out!

    I can see this is hard to accept because many have conducted “SEO” this way for years and don’t know any other way!

  156. My first post was long enough, so I didn’t address it, but will now since Doug brought it up:

    “The sites that fit “no pages in Bigdaddy” criteria were sites where our algorithms had very low trust in the inlinks or the outlinks of that site”

    Inlinks?

    So now my position in organic results, or the number of pages Google chooses to index from my site, can be affected by the sites that link to me?

    I hope I’m interpreting that wrong….

  157. Hi Phil, For the record, my post was not aimed at you. I actually did not fully read your last post until now. I’ve just seen many in here that really don’t get the overall picture about what Google is trying to say.

    First off: the overall structure/architecture of a site has lots to do with ‘crawling’ in general… and btw, it has lots to do with how Google views your site as a whole, quality wise. Quality for PageRank – “INTERNAL” Google PR – and quality of other sites in your network of links in and out. AND: quality of the programming involved with the site. It’s the overall picture. Robots are not getting dumber… they are getting smarter.

    I think many simply believe that someone can go into an existing site and change some code here and there… and presto, the site is doing good. I also think many still think that this stuff is mainly about links coming in and going out.

    Neither belief could be further from the actual real world.

    [quote]You are whitehat incarnate. Do you think that webmasters should have to do unnatural link-building, just so that a search engine will treat their sites the same as other sites? Do you think it’s a good idea for Google to tell webmasters that their sites can’t be fully indexed unless they make the effort to do things that they (and you) have always talked against – doing things solely because search engines exist?[/quote]
    I can’t speak for anyone else, but my firm “stopped” looking for reciprocal links about 2 1/2 years ago. Matter of fact, we deleted all links-only pages that clients had. We don’t ever plan on “pursuing” links in any way, shape, or form. It seems to work just fine. And no – some are in competitive markets as well.

    IMO, the best built websites that will do well into the future are those sites built “strictly” for their visitors. Period. That’s the philosophy we have had for a long time now. If built that way, the robots will like the sites as well. At least they have up to this point in time.

  158. Matt, it’s an old topic but I have a question related to it.

    Is it possible that if I put AdSense on, say, page1.html, and it links to page2.html… can that cause Google to fetch and cache page2.html? Even though there are no links coming to page1.html or page2.html from anywhere else on the web, and it hasn’t been submitted to Google?

    If so, this is a potential problem.

    Example: I put up a domain a while ago that just had 4 words on it… while I developed the site under a subdirectory.

    Anyway, one of these subdirectory pages had AdSense on it while under development (testing placement etc.) and happened to have its links pointing to the main URL.

    What I noticed shortly after was that the main URL was now cached and in Google’s index (while the page with the AdSense wasn’t).

    I’m trying to think of another reason, but the main URL had no in-links at the time from anywhere (at least none showing in MSN, Google or Yahoo, and no visitors from anywhere).

    Now the site is live, and I can’t get Googlebot to revisit it. It’s been a month since its first uninvited visit, and since it saw just a “coming soon” type message, I can’t blame it for not coming back.

    Could this be an unintended consequence of the AdSense caching thing? A toy sketch of what I’m guessing at is below.
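
    Purely my speculation about the mechanism, sketched in PHP – the proxy, the cache and the function are all invented for illustration:

      <?php
      // Toy model of a shared fetch cache: whichever bot asks for a URL
      // first triggers the fetch, and any later system reuses the copy
      // without a fresh crawl of its own.
      $cache = array();

      function fetchViaProxy($url, &$cache) {
          if (!isset($cache[$url])) {
              $cache[$url] = file_get_contents($url); // first requester fetches
          }
          return $cache[$url]; // later requesters reuse the cached copy
      }

      fetchViaProxy('http://domain.com/index.html', $cache); // Mediabot's request
      fetchViaProxy('http://domain.com/index.html', $cache); // indexer reuses it
      ?>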

  159. From google sitemaps

    “Some of your pages are partially indexed”

    Explanation from google sitemaps

    “We are always working to increase the number of fully indexed pages in our index. While we cannot guarantee that pages in our search results will always be fully indexed, crawler-friendly pages have a greater chance of being fully indexed. Our crawlers are best able to find a site when many other high-quality sites link to it.”

    So what I need to do is create links or I won’t be indexed, and create my pages for crawlers… Zzzz

  160. Hi Matt,

    I read your comments in a different way to most of the negative posters.

    It seems to me that Google have realised that they cannot index every page on the web every day and simply have to prioritize.

    Therefore sites with lots of good themed inbound and outbound links are “prioritized” in the crawl cycle in a similar way as they are “prioritized” in the SERPs.

    Sites with a high percentage of non-related reciprocals or spammy links are not given the priority treatment and therefore get crawled “less often” (rather than not at all).

    The result being that internal pages of these sites, especially the deeper pages of very large sites of this type, may drop out of the index from time to time resulting in poor SERPs for some pages of these sites.

    May seem unfair to some, but if you think about it, a themed natural link (or reciprocal in some themed cases) can be taken as a vote for the site by the WWW, and as such makes it more worth crawling by search engines than a site without many or any votes.

    The question is: should Google concentrate on delivering high quality “popular” sites in the top ten, where most of the public click, or concentrate on indexing every page on every site, even if the page is unlikely to be returned in the top 50 results?

    My take is the first option, with the top ten results including the top ten most popular sites calculated by:

    Natural inbound and outbound themed Links (votes)
    Click Through Rate & time spent at site (popularity)
    Clear Relevant Title & Meta Tags (keyword friendly)
    Original useful updated content (quality)
    Some themed Reciprocal Links (community)

    If your site passes the above tests you really should be OK.

    Good luck all.

  161. Matt,

    Snow Crash by Neal Stephenson is one of my favourite cyber-punk books, you could also try Mr Nice by Howard Marks.

    Enjoy your vacation!

  162. >>Linking to a free ringtones site, an SEO contest, and an Omega 3 fish oil site? I think I’ve found your problem. I’d think about the quality of your links if you’d prefer to have more pages crawled.

  163. Don’t know what happened to my message – here it is again.

    >>Linking to a free ringtones site, an SEO contest, and an Omega 3 fish oil site? I think I’ve found your problem. I’d think about the quality of your links if you’d prefer to have more pages crawled.

  164. In regards to this whole revelation about crawling priorities and all that jazz, I have one question that should clear up a lot of things for all of us webmasters and SEO folks.

    Let’s say we have a site that is under a year old. It has well written and informative original content, clean coding, no shady stuff going on whatsoever. To promote a healthy link exchange, the webmaster/SEO installs a link exchange directory which is accessible from all pages of the site. Now is having a link exchange directory that contains many different business categories (no porn, pills, warez, casino and hopefully no “made for adsense” sites) a bad thing?

    This is important because many MANY sites have this kind of a setup. And with the amount of free dynamic scripts out there that enable and automate the link exchange process, are they now considered to be tools of damnation in Google’s eyes? Please give a good example of what is good in this scenario and what is bad about this scenario.

    I think a lot of us are at the breaking point with Google, and that can only spell trouble in the long run for everybody.

  165. Hi Matt, I have a decent sized forum on my site with about 221,000 posts in 8,000 threads.

    I recently moved my forum from domain.com/forum to forums.domain.com, and I now have only 6 pages listed in Google. I am guessing this will change, but I have lost some of my domain.com listings that were unrelated to the /forum directory.

    My concern is, I want to use some kind of redirect to send visitors to the appropriate link – what should I use? At the moment I am using a PHP redirect in /forum/index.php and /forum/showthread.php to redirect to the appropriate link, roughly as sketched below.
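
    Roughly what I have in /forum/showthread.php at the moment (simplified – the real thread IDs come from the forum software, and domain.com is a placeholder). I set the 301 status explicitly, because a bare header('Location: …') call sends a temporary 302:

      <?php
      // Permanent (301) redirect to the same thread on the new subdomain,
      // so engines transfer the old listing instead of treating the move
      // as temporary.
      $threadId = isset($_GET['t']) ? (int) $_GET['t'] : 0;
      $target = 'http://forums.domain.com/showthread.php'
              . ($threadId ? '?t=' . $threadId : '');
      header('HTTP/1.1 301 Moved Permanently');
      header('Location: ' . $target);
      exit;
      ?>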

    And also, could this move and redirect be affecting my other results on the top level pages?

    Thanks for your help!

  166. um, so that is a great post… but like many others I am seeing strange things happening with one of my sites, where a 301 seems to be producing all sorts of strange results. Our site travelscotland.co.uk now seems to be registering on Google as http://www.scotland.net and strange variants of that domain such as http://www.facts.scotland.net, even though these have all been correctly 301ed to travelscotland as they are supposed to be (see the simplified rule below). I thought Big Daddy had cured this. I now notice that this problem seems to have resulted in the site not being indexed much anymore – caches are all from April, where we used to be always up to date in the old days. Is this another artifact of the Big Daddy changes?
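
    For clarity, the redirects are set up roughly like this (Apache mod_rewrite in .htaccess; a simplified sketch that assumes every scotland.net host should land on the same path at the new domain):

      RewriteEngine On
      # Any host ending in scotland.net gets a permanent (301) redirect
      # to the matching path on www.travelscotland.co.uk.
      RewriteCond %{HTTP_HOST} scotland\.net$ [NC]
      RewriteRule ^(.*)$ http://www.travelscotland.co.uk/$1 [R=301,L]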

  167. Hello Matt, this morning I sent a mail to bostonpubcon2006 at gmail; maybe resolving this issue helps cut down the noise in your comments.

  168. Hi Matt, Hi JohnScott. Hi everybody.
    I’m the webmaster, the person who did this… http://www.mattcutts.com/images/poor-quality-links.png
    Matt, first, thanks for not mentioning the url, my name or email – it was a really, really stupid step I took to try to get my client’s site back in Google’s index.

    In short:
    the website is about 6 months old, with around 30 uniques a day. I was building “natural” links for 2-3 months – a few PR5-6 sites, but mostly PR1-3.

    Right now it has 520 pages indexed. Matt, you can check them ;)
    And again, 30 uniques a day 😉

    2 months back, the website had (almost) each page indexed – around 1000.
    There were NO LINKS AT THE BOTTOM. Not a single link! Only internal links.

    And the pages went down to one. Three on the best days. For a month.
    My clients were not satisfied – they asked me to fix this. So what did I do?
    Pyramid linking – 3 urls in the index. 5 DP co-ops. Reciprocals from directories – the clients were not going to pay me for this, so I just spent exactly one hour setting up pyramid linking, co-ops, stupid directory listings.

    JohnScott, I AM REALLY REALLY SORRY for this link (v7 contest)! It IS A RECIPROCAL LINK FROM A DIRECTORY! I absolutely did not look at the anchor text. SORRY AGAIN 🙁

    When I brought back the “new website”, with these links at the bottom, I got nothing. Then Google started bringing back my webpages… around 10 to 30 per day. One day I saw they were back to 100 or something; then I sent the mail to Matt Cutts. The next day the index was good – 400 or something. From that day (Matt pointed out the date) I was getting +10 pages a day. With these links from the image. An hour ago I removed every single outgoing link; I left only the internal ones.

    My hands are shaking.
    I won’t be able to sleep…

    Matt, I did not write everything above in the right order – I don’t myself think this is the reason for NOT being indexed in Google.
    The situation is not the way you described it. The reason is something else – I know my website, I know what I did with it. I can prove that these links are not affecting my dropping and rising pages…
    I will not post anything else here if Matt doesn’t want me to.

    JohnScott, you are a great person; sorry again for mentioning your contest in this bad topic 🙁

    /sorry for my broken English – I’m from an Eastern European country 😉 /

  169. Matt,
    Topic Specific links
    I’m a believer in topic specific links. Is this post saying that off-topic links – whether inbound or outbound – will incur a penalty?

  170. Isn’t relevance more important than whether there is a “wrong” link on your site?

    Sorry, this is bullshit. I only want to find what I’m looking for. A “wrong” link on the page (which could be perfectly relevant to the human reader) doesn’t interest me.

    IMO, the best built websites that will do well into the future are those sites built “strictly” for their visitors. Period. That’s the philosophy we have had for a long time now. If built that way, the robots will like the sites as well. At least they have up to this point in time.

    Can we get an Amen and a Hallelujah for this?

    TES-ti-fyyyyyyyyyyy, mah brutha!

    Come on, everyone, throw your hands up for the GOOD Word.

  172. If I have to look at one more Amazon listing at #1 and #2 after being bumped to #3, I’m going to run screaming into the night.
    Just because it’s dAmnazon doesn’t mean EVERY page of their site is more relevant than everything else on the planet…

  173. Hi Doug. I didn’t take your post as being aimed at me personally 🙂

    Unlike you, I’ve never done reciprocals, so none of my sites, or any site that I’ve ever worked on, will ever suffer because of that method. I haven’t even been talking about any sites that I’ve had anything to do with, although I used one as an example, because it specifically has a clean bill of health, and it’s frustrating watching it die – presumably because I didn’t do any link-building, so it doesn’t have enough IBLs.

    This isn’t about a site “doing good”, Doug, or about rankings (you mentioned that in your previous post). This is about a site being treated fairly – just because it’s there. If a site contains useful pages and resources, then it should be fully indexed, regardless of how many votes it’s managed to get. That’s what search engines are supposed to do. That’s what a search engine’s users expect it to do for them – they (we) expect a good search engine to give us the opportunity to find as many useful pages and resources as it can. They (we) do not expect the engine to intentionally leave out useful pages and resources.

    Doug. Can you come up with a good reason why that example site (the one I mentioned in my previous post) should not have all of its pages indexed?

  174. Thank you for the post; it was the most information I’ve seen on the situation thus far.

    Unfortunately, it was also rather disheartening as regards inbound links. Some sites just do not naturally generate all that many valid inbound links – and I’m not talking about small “mom & pop” sites, either. My two largest sites are B2B catalog shopping sites, not really small, as each has in excess of 2000 products online. Both have been in operation for years, and both were hit very hard in the past few months by pages dropping out of the index. I would stake my year’s salary that there aren’t any spam issues with either site (and both did QUITE well up until BD started rolling out) – but (at least by Google’s light) there just aren’t many links out there to either site – the only places that WOULD link to them would be spammy B2B directory sites, for the most part. They are undoubtedly in a lot of people’s Favorites folders, because they have very high rates of return customers, but they will never ever collect any quantity of relevant natural links. I mean, think about it. Would walmart.com need a bunch of backlinks in order to rank highly, and why would anyone put a link to walmart.com on their site in the first place?

    So now I have to explain all this to my clients, who aren’t going to understand anything except that they’re pretty much out of Google for the foreseeable future, and there’s nothing legitimate we can do to get back in.

    Like I said, disheartening.

  175. Matt,

    Sounds like Google is now actually penalizing for poor quality inbound links. Does that mean that a malicious competitor could link our site to FFAs, link farms and other bad neighborhoods and actually hurt our rankings or get our crawl cycles reduced?

    Also sounds like Google doesn’t like affiliate links. But if AdSense is okay then why are affiliate links bad? Seems that coupon sites, shopping comparison sites and reviews are in jeopardy of being dinged now.

    I have a PR7 homepage (www.shopping-bargains.com) but no rankings, and my homepage isn’t even in the index (I just checked and we have only 10 pages there – was 8 yesterday). We used to have thousands of pages indexed. Something seems odd – we have no reciprocal links, don’t sell links, don’t buy links in bulk (occasionally buy a banner or link for marketing reasons in a newsletter or blog, etc.). We have original content and have been online for 7 years. The index is fluctuating wildly for us though.

  176. I too agree with PhilC!

    And, all I know is that as one who has tried to actually use Google to buy some rare plants, the results are way bad!

    Trust me that when you do a search for an item to buy and all you can find are the paid for listings and stupid content sites, the value sure ain’t what you were looking for!

    I have even put “buy” and “for sale” in the searches and all I get are stupid content sites. Some great value that is! I want more choices than the stupid paid listings. Thank you just the same!

    All I have found are stupid sites describing and telling the history of the plants. What I wanted was the small plant nursery in New York that sells the plant. I was ready to buy! That is why I typed “buy” in the queries. Too bad and so sad that they didn’t attract enough natural links! No results for me!

    Google results suck for shoppers of rare plants, at least!

  177. Matt,

    About indexing. Why is it that the site: command for some sites is still showing more pages than the site actually has? I know of sites that have maybe 12,000 pages, and Google is happily claiming they have 35,000 pages indexed.

  178. [quote]Sounds like Google is now actually penalizing for poor quality inbound links. Does that mean that a malicious competitor could link our site to FFAs, link farms and other bad neighborhoods and actually hurt our rankings or get our crawl cycles reduced?[/quote]

    Mike, IMHO, Google is not actually penalizing a site for poor quality IBLs. What they do is just discount those links in a much more drastic way than before. What is more, they now regard reciprocal links as very poor quality links.

    As you have fewer IBLs, your PR drops and your site is crawled/indexed less than before.

  179. I think you guys might be missing one thing here. I don’t think what happened is a permanent thing. Your pages have dropped out because of crap links not counting, or your quality links’ own crap links not counting. It is all about reputation. If your pages are gone now, it is because you lost your reputation. This doesn’t mean your pages will not get indexed. It just means your pages are pushed back to a waiting, almost sandbox-like state, where it will take time for them to get indexed again. Quality natural links just help, like they always did. Now it is just harder to fake.

    Maybe Matt can tell us if that is true… Will the pages eventually get indexed even if they don’t have the reputation from other quality sites? Is it a matter of time and age if there are no links?

  180. Adam Senour… what do you think about webmasters who submit their sites to directories?

    These types of links aren’t exactly natural, and often aren’t relevant. Are these people trying to manipulate the search engines?

  181. Alex Duffield

    OK, Matt, I just don’t get it, how on earth does a site do this sort of linking

    http://hyak.com/links/links_computers_internet.htm

    And not get penalized.

    They have thousands of links like this, totally irrelevant to the content of the site.

  182. It is not a penalty… it is a matter of what counts and what doesn’t, PERIOD.

    What feels like a penalty is just the fact that you lost reputation because your links, or your links’ own links, no longer count. Your competition cannot hurt you by building bad links in your site’s name. That might actually help, with whatever small amount of traffic you get from them. Other than that, it will not hurt – it will just not help.

  183. Matt,

    Thank you for all the information that you furnished to us and the examples that you showed. It was very welcome feedback.

    The information that you furnished surely helped everyone understand what is happening and why it is happening.

    Personally I hope that Google continues to work on presenting, high in the SERPs, quality sites that assist a visitor with helpful information. People have to remember that Google sets guidelines, and those who don’t follow the guidelines will suffer. We have to follow the “rules” in order to win the high SERPs.

    Again thanks for your post.

    Now enjoy your vacation and the time with your family.

  184. People have to remember that Google sets guidelines, and those who don’t follow the guidelines will suffer. We have to follow the “rules” in order to win the high SERPs.

    Exactly what “rules” did the health care directory site break? (That’s one of the examples that Matt gave.)

  185. Hi Matt, thanks for the post, I agree it was good to hear some real estate examples used.

    I have a question regarding overuse of reciprocal links possibly causing lack of crawling. I run a network of real estate sites, one site for each different country/region we cover (9 in total). Each site links to each of the other sites for obvious reasons.

    My question is: would this interlinking of 9 different sites to one another, from every page on each site, be regarded as spammy by the Google bots? Would this have a negative effect on my sites?

    If anyone else can offer any advice I’d be very grateful. Thanks.

  186. PhilC, again: do you feel that the pages will never get indexed, even in time, without quality links? If a site has something indexed, don’t you think in time the rest will get indexed? I think so, but how long is the real question.

  187. Matt, could link selling affect a site’s ranking in the SERPs, directly or indirectly, despite the site’s not “losing” any pages in Google’s index?
    I’m seeing a client that sells some (on topic) text links on some of his pages suffer in the SERPs since a couple of days ago, yet that client hasn’t lost any pages whatsoever in the index – the only thing I’m seeing is that all pages that previously ranked well are now not ranking well (yet still can be found)… Could there nevertheless be any connection?

  188. Mike, it might have something to do with the many, many links you have from spam pages like this one: http://www.creativehomemaking.com/articles/112603g.shtml

    and the fact that all of these spam pages participate in the same “web rings” with the links on the bottom.

  189. Hi Matt,
    For the sites you described above as having poor links: if the outward links carried rel=nofollow, would that improve the number of pages indexed? I mean marking them like this:
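
    (standard nofollow markup; example.com is just a placeholder)

      <a href="http://www.example.com/" rel="nofollow">some outbound link</a>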

    thanks

  190. Phil, my site is a mom and pop site, I don’t have a lot of quality incoming links (I have a ton of links from scraper sites), and all my pages are indexed. Like you, I have never done a link exchange and don’t plan to. So the idea that you have to go out and build links in an unnatural way is not the case for every site.

    Most of my pages can be reached within 3 clicks, a few in 4 clicks, from the home page and/or the common navigation that is on every page. I don’t use Google’s sitemap. I do have a pretty good lite HTML site map, so I’m wondering if navigation may be part of the problem for some sites losing pages or not getting them included in the index.

    I see a big difference in the number of pages when checking with the API and actually checking at Google.

    I do have a problem that has surfaced in the last couple of weeks: Google is having problems with 301 redirects again – at least for me.

    site:domain.com 500 plus pages
    site:www.domain.com plus pages
    site:www.domain.com -www.domain.com 300 plus pages, which are supplemental, so maybe they are on their way out again.

    Up until a few weeks ago, site:www.domain.com -www.domain.com showed 0 pages. According to the API, domain.com is showing PR again.

  191. hmm, so things are becoming more and more difficult each day, and I am now of the view that people will need to understand the real importance of good content, updated frequently, and having good links only.
    No shortcuts anymore 😐

  192. Connie, is your site ranking for smaller terms only? I bet your niche is not a very competitive niche and you are ranking for obscure terms rather than large general terms.

    I too am associated with a niche site not doing any link building that has not been affected by Big Daddy, but this site is ranking for small stuff.

  193. [quote]Can you come up with a good reason why that example site (the one I mentioned in my previous post) should not have all of its pages indexed?[/quote]

    The one in frames? The one using javascript links and no hrefs? The one that looks like an MFA site? Is that the one?

  194. It’s worth remembering that not every site can ever appear at the top – seems obvious, but Google has to place results in some kind of order, and I have no quarrel with newer sites not being featured as fully as mature sites; if they develop, their turn in the sun will come.

    If the result is a cleaner, more spam-free search result, then I doubt many users will be complaining. And I suspect Google has not forgotten the needs and preferences of searchers, as we consider our sites, our client sites and the spam sites that get in the way.

    This has been a very useful thread – but has it really contained any surprises? I don’t think so – Google has long warned of the bad practices mentioned above; some people just never believed it would put its money where its mouth is. Kudos to Matt and the teams for significant progress on reciprocal and paid-for links.

  195. I’ve spent a lot of time today simply looking through the serps and reading posts here, there and everywhere.

    Yesterday we saw a massive change in the serps and now my target market results are full of nothing but spam, cloaked pages and general rubbish.

    Now, I run a number of affiliate based sites, but I build my own pages with my own content, and as of yesterday I basically don’t get found. Has Google scanned my site, realised that my visitors are joining an affiliate program, and therefore penalised me for that?

    I can appreciate penalising sites that buy a domain name, basically copy their affiliate program’s text etc., and throw the site up with nothing original to offer at all – but just because I promote affiliate dating, does that mean that the 2 years of work that has gone into the site is simply ignored?

    Matt, I’d be interested in any feedback, and the url is available if you get the time to respond.

  196. Connie. It may be that a very large number of sites haven’t suffered – yet. But I can’t see that what’s happening is to do with the navigation. For one thing, Matt used examples where he said that, if they get more IBLs, the new system will know to index more pages (as if they don’t already know). For another, the many sites that are having their pages dropped had their pages indexed – so why drop them if they were already in the index?

    We have always known that PR affects the frequency and depth of crawling, so crawling was never equal amongst sites. But now they have added links to the criteria, and if a site doesn’t do well on the links score (e.g. not enough IBLs that can be trusted), it just doesn’t get crawled as it deserves, and its owner is forced to either accept it, or get spammy and do some link building.

    I was tempted to suggest that it still may be down to just PR, because IBLs bestow PR, but my site that I’ve used as an example currently shows PR5 on the homepage, which has always been enough for decent crawling. Even so, the toolbar PR is always out of date, or they may have simply moved the scale up a bit. But Matt said that the new crawling/indexing system is new, so I’m sure that it isn’t still just PR, and that IBLs, and maybe OBLs are significant factors.

    The example site that Matt gave – the one that I used in a post – is a directory, and, as a directory, it probably needs to be drilled down into – good pages that are plenty of steps away from the homepage. If the number of steps could have been the cause, then I’m sure that Matt would have said so, instead of simply saying get more IBLs and we’ll crawl more of the site’s pages.

  197. Hi Matt,

    some URLs lose pages in the index, but this site is still growing:
    site:69.41.173.145 –> Do you think your duplicate content algo is OK?

    Greetings from Germany,
    Tobias

  198. Hi Phil, I don’t know. If you post the site in question, maybe “ihelpyou” with it. 😀

    You know there is no way a general answer to an unseen website with problems is a good thing. I’ll put it this way; I really doubt your problem with your site has anything to do with “links” in or out. The entire backend code and html code output might need to be redone.

  199. Adam Senour… what do you think about webmasters who submit their sites to directories?

    These types of links aren’t exactly natural, and often aren’t relevant. Are these people trying to manipulate the search engines?

    In and of itself, I don’t have a problem with the concept. The problem lies in the quality of the directory, whether it offers free one-way inclusion to sites that deserve it, and how many directories a webmaster submits to.

    Submitting to a directory, particularly to one with a captcha tool or similar device, is not automatic, nor is the approval (depending on the quality of the directory again). I don’t really see it as “unnatural” either, since the basic premise of these sites is to act as informational portals. Submitting to a directory provides them with the content that they need to build their own site, and gives the webmaster a link for traffic generation purposes (notice how I didn’t use the phrase SEO purposes).

    A good example would be Human Edited Directory (yeah, I’m a mod there, so I’m slightly biased…although I’d say the same thing if I wasn’t). You don’t get onto that directory unless you damn well earned it, although you can submit for free.

    What’s “unnatural” to some about submitting to directories is that you have to go to some stranger’s site, find a relevant category, and ask for a link in that category.

    So no, I don’t have a problem with it, as long as the directory has some quality standards in place and the link provided is a relevant category backlink.

    To borrow from something Phil has stated in the past, it’s quite often not the concept, but how people choose to abuse it.

  200. Hi Matt,

    I have a quick question for you regarding the supplemental index. When I search for a manufacturer part, QUA41965, in Google, it starts returning supplemental results on the first page (4 out of ten). Each additional page is primarily from the supplemental index. When you drill down to page four, you can still find results that are not in the supplemental index, though. Shouldn’t pages not in the supplemental index be returned before those that are? It seems to go against what is considered supplemental, IMO.

    Thanks.

  201. No more free traffic

    hehehe

    Buy some AdWords folks. Free traffic is now only free for the multinationals, spammers and those with special deals with Google.

    All the others, buy some AdWords please… contribute to the great cause…

  202. Is SPAM any attempt to deceive the SEs to artificially increase rankings?
    What if I have a nice W3C-validated site with some 16,000 clothing products from various vendors, nicely categorized, with an updated datafeed, some coupons, with no spamming techniques, no hidden text, no cloaking – nada, zip, zero.
    That, by that definition of SPAM… is not SPAM.
    So why am I reduced to one indexed homepage in Google?
    Isn’t the ability to search the same class of products from various vendors at the same time, and compare prices, enough of a service for the mighty Google?

  203. Hi Matt,

    Great post, and thanks for following up the comments, makes for a happy community 🙂

    Like most people here, I have a number of otherwise good sites with 5 or so rubbish footer links on each page. I was never happy about putting them there, but only did it because it does work, and there is no point writing original content if nobody will read it.

    Are you saying that these links are now completely worthless? I’m all for “best practice” and following the guidelines, but I’m reluctant to stop this kind of linking if it still works for other people.

    Don’t get me wrong, I’m very keen to see the death of unrelated footer links, “resource” directories and begging emails – but if Google still rewards these practices with good rankings then people will continue to use them.

    Thanks for the responses so far.

    Harvey.

  204. Eternal Optimist

    In reference to supplementals, I am sorry but the issue has not been dealt with here; doing a site: check on a number of both small and large sites, it is difficult to find a site that does not have any supplemental issue.

    This must mean that Google has an imbalance in the settings of the algos, OR that they deem virtually no sites worthy of the merit of a clean bill of health.

    I can check the same sites, which aren’t even mine, on a daily basis, and they delve deeper and deeper into supplementary hell.

    Why would webmasters have coined the phrase ‘supplementary hell’ if there was no issue?

    Thanks for trying to appease webmasters Matt, and we don’t blame you. It’s just that we feel it’s about time things improved.

  205. One thing I want to be clear about is that Bigdaddy isn’t especially intended to do anything different on spam; it’s just an infrastructure upgrade to our crawling, and as we get better at judging link quality, our crawl changes as a natural consequence of that.

    The other thing is that I certainly don’t want to imply that everyone who is still seeing fewer pages crawled was somehow getting spam or lower-quality links. I just wrote up the five cases that I analyzed in more depth. With a large change in our crawling infrastructure, it is to be expected that some sites will see more or less crawling.

    In fact, I just got out of an hour-long joint meeting with crawl/index. Jim, we talked about your site, the one where you said “I’m trying to maintain a straight ship in a dirty segment.” There are absolutely no penalties at all on your site; it’s just a matter of PageRank in your case. You’ve got good links right now, and several hundred pages crawled, but a few more good links like the ones you’ve got now would help some more.

  206. what does this command do? site:www.domain.com -www.domain.com

  207. I have to admit, from reading the post and the ClickZ column, that what is happening in practice is that sites with less money and marketing spin behind them are regarded as less important, and are therefore pretty much to be ignored. It’s no longer about document relevancy so much as site popularity.

    Perhaps one day Google will go a step further, and simply take the top 1000 sites according to Alexa, and return only results from them? 😉

  208. Let me also describe a little bit of the interaction between the main results and the supplemental results. Think of the supplemental results as a large set of results that are there in case we don’t find enough main results. That means that if you get fewer documents crawled/indexed in our main results, you’ll often see more supplemental results.

    So I wouldn’t think of “having supplemental pages” as a cause of anything. It’s much more of an effect. For example, if you don’t have as much PageRank relative to other sites, you may see fewer pages crawled/indexed in our main results; that would often be visible by having more supplemental results listed.

  209. That’s a post, Cutts! hehe. As you’ve stated that poor choices in outbound links can cause crawling/indexing to be negatively affected, I’m wondering if the opposite can be said of linking to high quality (trusted) relevant sites? What say you, Inigo?

    BTW, great show yesterday. Hope you can do similar more often!

  210. shorty, much appreciated. I wanted to get the timeline out of my brain and talk about what I was seeing before I headed out for some time off.

    Alex Duffield, in my experience those links aren’t making much/any difference with Google.

    Peter, without knowing the site I couldn’t be sure. It’s possible that we’ve indexed the site with www and without www, or there might be some session IDs or other parameters that are redundant.

    “Sounds like Google is now actually penalizing for poor quality inbound links.” Mike, that isn’t what’s happening in the examples that I mentioned. It’s just that those links aren’t helping the site.

    David Burdon, no, off-topic links wouldn’t cause a penalty by themselves. Now if the off-topic links are spammy, that could cause a problem. But if a hardware company links to a software package, that’s often a good link even though some people might think of the link as off-topic.

    nsusa, WordPress seems to have problems with the greater-than sign.

    Peter Harrison, thanks for the book recommendation! I love early Neal Stephenson (less so his historical fiction).

  211. “The other thing is that I certainly don’t want to imply that everyone who is still seeing fewer pages crawled was somehow getting spam or lower-quality links. I just wrote up the five cases that I analyzed in more depth. With a large change in our crawling infrastructure, it is to be expected that some sites will see more or less crawling.”

    Kind of enlightening that this MAY not be the case for us and others. It still hurts having the whole site (well, the better part of our sites) “avoided” in the index and not knowing why this is happening after 5 years of business.

    The sad thing is that the only thing we really have to go on is sharing experiences, and that isn’t getting many of us very far; we just know there is some sort of problem and we can’t find a correction.

  212. Matt, what about sites that have some pages indexed, with no links to the site or very few: will the entire site ever get indexed? Is it a matter of time, or do you have to get more links to get pages indexed deeper?

  213. nuevojefe, thanks! It felt pretty business-like and on-topic. After the mic turned off, I took Danny up to an office and we just chatted for a couple more hours. It’s amazing to me just how much fun some of the top people in search are. 🙂

    To go to your other question: I wouldn’t be thinking in terms of “if I link to Yahoo/Google/ODP/whatever, I’ll get some cred because those sites are good.” If it’s natural and good to link to a particular site because it would help your users, I’d do it. But I wouldn’t expect to get a lot of benefit from linking to a bunch of high-PageRank sites.

    Peter Harrison, I’m going to go buy some books right now; you’ve inspired me. 🙂

  214. But I still don’t see an explanation for pages that have been crawled and are over a month old not showing up in the regular or supplemental index.

  215. What a Maroon

    Matt, why is it bad for a real estate site to link to a mortgage site? They seem to go hand in hand. Obviously I couldn’t follow the link to check whether it was just a scum-sucking scraper site, but your statement seemed to overgeneralize. If your bot does the same, then Google has a problem.

    I am also perplexed by the reciprocal linking issue. Is it now always a bad thing? Is the relevancy of the topic a compensating factor? While it may be gamed, it has also become a powerful networking tool for many. In my personal services sector, where referred business is an integral part of the business model, I have received referred business from those I met via reciprocal links amounting to close to $50k in income in the last 10 days alone. How is this bad?

    I am also confused by the apparent contradiction regarding links, PR, crawling and indexing. It sounds like a chicken-and-egg scenario. It’s implied not to buy links or reciprocate, but if that advice is followed, then Google won’t crawl or index the site, so how is anyone to find it and be so overwhelmed as to be compelled to graciously link to it?

    OMG, did I just agree with Jill and PhilC on the same issue in the same sentence?

  216. Let me also describe a little bit of the interaction between the main results and the supplemental results. Think of the supplemental results as a large set of results that are there in case we don’t find enough main results.

    This has also been part of the problem, Matt. The supplemental results have been unsearchable. They have not been returned when there aren’t enough main results.

    Dave

  217. What’s Google’s definition of ‘Find web pages from the site domain.com’? If you click those links, you sometimes only get the index page. Even supplemental results should show up when you make that search.

  218. Wow Matt

    You’ve got some cojones to come out and say the CEO was wrong about the machines being full??

    Everything else is old-hat SEO that amounts to following the “Webmaster Quality Guidelines.”

    Obviously you could have saved some carpal tunnel by just telling people what I and others have been saying: reciprocal links have zero value other than to hurt you. Follow Google’s webmaster guidelines and you’ll be fine.

    Clint

  219. Wayne

    Thank you Matt for the update. I really appreciate you finally using some real estate sites as examples. Since this is an indexing issue I thought I would bring it up.

    After checking the logs today I noticed this coming from Google pertaining to our site.

    http://www.google.it/search?hl=it&q=fistingglessons&btnG=Cerca+con+Google&meta=

    LOL, now as you can see, the #2 site is a real estate site listed for this search term. The page showing for this search is a property description page. As you can tell from the site’s description, it has nothing to do with this subject matter. Would you mind checking with the index team to see why this would be indexed for such a phrase?

    Matt, could you please check on this for me with the index team? I am sorry, but according to my logs I am getting a lot of traffic from this, and we shouldn’t be ranking for fistinglessons with a home listing details page. The house listing has been removed, but it’s the kind of traffic I do not wish to have.

    If you decide to dig around in our site, your thoughts on whether we are abiding by what Google likes to see would be nice. (I know I asked for it, so whatever I get I won’t hold against you 🙂 ) We want to stay 100% Google compliant, but as I have said before, we are small fish in a big pond, so we make mistakes like everyone else.

  220. Saying you can’t do reciprocal linking is just sheer idiocy. How does Google expect you to get back links?

  221. Guys can you please stop asking silly questions…

    The message is crystal clear…use AdWords…

    😉

  222. Hi Phil, don’t know. If you post your site in question, maybe “ihelpyou” with it.

    You know there is no way a general answer to an unseen website with problems is a good thing. I’ll put it this way; I really doubt your problem with your site has anything to do with “links” in or out. The entire backend code and html code output might need to be redone.

    I didn’t ask about my site, Doug. I asked you if you could come up with a good reason why the health care directory site that Matt used as an example shouldn’t have all of its pages indexed. Perhaps you should have read *all* of Matt’s post 😉 There isn’t a good reason. Matt’s best judgement is that it’s a shortage of IBLs. Simple as that. The site had had its pages indexed, but with the new BD crawling/indexing its pages have been dropped simply because it doesn’t have enough IBLs. It makes sort of sense at all.

  223. That last sentence should have read…

    It makes NO sort of sense at all.

  224. Matt: is there any way to tell Google “index this page, but serve the permalink in the SERP?” This is a problem for my blog … when entries are being served off the main page (http://dossy.org/), searches for keywords in those entries return the main page URL in the Google SERP. However, it seems the index is updated less frequently than entries dropping off my main page, so while Google’s SERP brief text shows the relevant content — causing a user to click through on the result — the page they end up on no longer has the content. Eventually, it seems Google’s crawler figures things out and the SERP eventually links to the permalink for the entry … but I’m sure this behavior is frustrating some users.

    I tried adding the meta header “noindex” to my main page to prevent it from showing up in SERPs, but then for searches for “dossy” where the main page SHOULD be #1, no longer has the #1 spot — very annoying. So, I’ve removed the meta “noindex” from the page and am waiting for Google to crawl my blog again.

    Any advice? Thanks!

  225. Matt,

    You have no idea what your two sentence comment has done to lift the spirits of 2 down and out guys in boston… thanks!

    jim

  226. It appears that everyone is getting on board the link train as the problem.

    So I did some checks on one of my sites. When I do a link:www.mydomaininquestion.com on Google, I get “Results 1 – 1 of about 45 linking to http://www.mydomaininquestion.com. (1.25 seconds)” in the bar, with only a page from my actual site shown.

    However, if I do the same thing on Yahoo, I get “Results 1 – 10 of about 671” for link:http://www.mydomaininquestion.com.

    So for some reason, 44 of the links that Google knows I have are hidden from view but show up in the count, and perhaps are hidden from the indexing algorithm. This seems like something very specific that could be checked out on your end.

    Thanks.

  227. Is your site losing pages from the index, John? I just did a link: check on my site (the one I mentioned earlier), and it’s the same. It will only list 16 of about 629 links. For my site, I’d put it down to the constant changes in the index, as pages are being dropped wholesale on a daily basis. I’m thinking that the index might be a bit confused concerning my site right now.

  228. PhilC,

    I know Matt doesn’t want this to turn into a discussion board, so feel free to delete this post, but to answer your question: yes, we were at a high of 17,000 pages in March, two weeks ago 500, Saturday 140, today 39. I’m not going to check anymore after today, because I know what’s next: NO INFORMATION FOR THAT SITE

  229. Some thoughts:

    1) The explanation that Google isn’t fully indexing sites based on the lack of quality/quantity of incoming links and lack of quality of outgoing links sounds like a policy change. Has Google always lacked a commitment to building the most comprehensive index it can and this is just the first time those sentiments have been voiced, or is this something new, perhaps in response to a storage crisis and Google’s inability to keep up with the growth of the web?

    2) How finely tuned is this improved link quality filtering? Does it simply look at the percentage of IBLs that are reciprocal, and apply a filter after a certain threshold, or does it attempt to determine the relevancy of those reciprocals before placing a value on them? When evaluating the relevancy of both inbound and outbound links, is this just a quick semantic analysis that would miss the fact that the ironworkers’ endorsement of a pizza joint is certainly a good link? How much good content is Google willing to hurt while trying to prevent having its results manipulated?

    3) What I see here is good times ahead for link-building SEOs. Panicky phone call from the owner of a marketing site seeing its thousands of pages dropping from the index… Calm explanation of Google’s new approach – show them your blog here, Matt… Tell the client what will be involved in building a link network on multiple domains across Class C’s with relevant content that Google will perceive as quality IBLs… You could always just spend that money on AdWords and trust that Google won’t bill you for click fraud – show them the lawsuit pages…

    I hope Google has something better than this coming along quickly, because the future doesn’t look pretty.

  230. Hi Matt C – reworded, jeesh

    Many of us have an odd issue; what is the Bigdaddy cause here?

    Our sites have done well for a very long time; they have some good natural incoming links from major sources in science magazines, and even links from large established online portals.

    Since April 26 or so, almost all our pages went from page 1 to page 4 across the board. Is this a penalty? Why such a fast drop on all positions on all terms? This is happening to many sites.

    Not sure what to make of this; we did submit a re-inclusion request, but I have seen no change. Why such a huge dump on a site?

    Thank you Matt

  231. Matt, I get the feeling that you are being very harsh with the new filters… I just dropped from 3M results to below 300k with my site. That does not seem right, since I am one of the few places where you can download MP3 files which you can buy on CD elsewhere. I think G found the duplicate descriptions and is trying to filter them now. That is not good. Mine are downloads, with the same artist descriptions as the tangible goods… And I am gaining incoming links like a maniac by signing up over 10 digital merchants a day… I get incoming links from every place on the net, but I do hope that incoming links cannot hurt one? Maybe the 500-plus scrapers who live off my RSS feeds are causing that?

  232. Dave (Original)

    PhilC and others

    Have you ever considered that with soooooo many web pages out there, Google (at this time at least) HAS to limit its crawling and indexing?

    Let’s face it, they are (and have been for years) doing a better job than the other BIG 2.

    When I search the SERPs to BUY, I often see mom & pop pages ABOVE those of the BIG merchants.

  233. Well, it just went critical for me after 10 years on the Internet. Google has eliminated so much of our site, rankings, and traffic that we likely won’t survive. What’s going on is just way too harsh. We play as much by the rules as we can. We only have about 75 links, but apparently Google is annihilating our small site. I wish I could take a vacation, but I’ll have to worry about putting food on the table.

  234. Phil, I was responding to you thinking you meant the site you were watching.

    Okay, …. health care directory?

    Without looking; I’d say it’s more about the sites that directory is listing than anything else. Does it require a link back? Is it all paid? Does it exist strictly for adsense? Does it have a real purpose for users on the internet? Being in the market it is in, I’d have to have those questions answered and see the site. I’ll bet big bucks it’s because the quality of sites listed isn’t good. A major search engine has to start drawing the line somewhere. No one could continue to simply index page after page of low quality websites, especially directories.

  235. You say that like it’s a bad thing:)

  236. Dave (Original)

    How many people ACTUALLY search for a directory anyway? Answer: not many. They are so low in demand that Google shifted its DMOZ clone off its main page years ago. Check your log stats; even DMOZ sends next-to-nothing.

    Besides, why on earth would/should a SE list a directory page?? It would be a link to more links! There is no longer any need for this 2-step approach, as SEs are so much more advanced than when directories WERE popular.

    I didn’t ask about my site, Doug. I asked you if you could come up with a good reason why the health care directory site that Matt used as an example shouldn’t have all of its pages indexed. Perhaps you should have read *all* of Matt’s post. There isn’t a good reason. Matt’s best judgement is that it’s a shortage of IBLs. Simple as that. The site had had its pages indexed, but with the new BD crawling/indexing its pages have been dropped simply because it doesn’t have enough IBLs. It makes sort of sense at all.

    Actually…that’s not what was said at all. That’s what you chose to read. The key sentence is actually here.

    Hold on, digging deeper. Aha, the owner said that they wanted to kill the www version of their pages, so they used the url removal tool on their own site.

    It was dropped because the site owner made a mistake. Not a spammy mistake, and certainly an honest one, but still a mistake.

    That’s not BigDaddy.
    That’s not Google crawling/not crawling/indexing/not indexing.
    That’s not Matt pretending to be God and striking down upon some site that apparently doesn’t deserve it.

    That’s a webmaster relaying a message, unintentional as it was, to Google asking for a removal.

    So there’s a perfectly good reason for Google to remove it…they were asked to do so.

  238. Matt, thanks for this update. I have to say, this confirms what I’ve been increasingly suspecting about a vast majority of those WebmasterWorld posters who have been complaining about these specific issues, and it fits exactly with what I saw over a year on another search forum I worked on for a while.

    Especially amusing was the guy who had 10k indexed then dropped to 80. I’ve read him; he comes off as if he’s lily white, and there’s that typical spam garbage.

    This isn’t your problem, but I think WMW’s policy of not allowing any reference to the site in question is starting to seriously damage the viability of their search forums, especially their Google forums. As you found, and as I’ve suspected, quick checks showed the weaknesses easily. That’s exactly what I found over a year of doing site checks too; that’s why I stopped: it got boring and predictable.

    But I’m still very glad to read the updates on this. I’ve been following that supplemental nonsense for a while; I pretty much ignored the Bigdaddy indexing stuff because it was pretty clear what was happening, even without being able to look at the sites in question.

    I don’t envy you your job at all, having to dig through this stuff all the time.

    Too many comments, didn’t read them, no need, your post was pretty concise.

  239. Here is a question about quality “earned” links and recip or poor quality links. I am a web developer, and I create amazing websites that are linked to from all across the web because people are talking about the design, or functionality or other legitimate means.

    Now I have a credit-for-work link as an image (my company’s “designed by” logo) on each site. Thus, in essence, each of these sites is a backlink for me. By itself, is this a good link?

    Now here is the other thing – I also like to place my designs in my online portfolio for prospects to view (I’m showing off my work) – and this usually includes a “visit this website” type link so they can see what the site looks like in real time and what the client is doing with it. Have I not, in essence, created a reciprocal relationship here? Will these links be discounted in some way, or were they poor quality links to begin with?

    This relationship seems very natural whether link popularity or PageRank existed or not – companies would put their credit-for-work logo on a site in hopes that others who appreciate the design would see the designer’s insignia and hopefully hire that company. And of course we artists always want to show off our work.

    So what’s the deal? I know not all recips are bad – but where is the line?

  240. Addition to above, and I failed to mention it – I am not complaining.

    I rank #4 out of 153,000,000 in Google for my most important targeted term, which is quite competitive; like I said, I’m not complaining. However, after reading this, it almost made me want to remove the links in my portfolio to my designs, or put a nofollow on them or something. And what about my discussion forum? I run a vBulletin forum on my site that has thousands of members, all of whom are also web designers or clients. They are there to post and chat and learn about web design, show off their latest projects, etc. Now, they have signatures so people can view their work, and they are HTML links. Thousands of them also link back to my site with varied anchor text, usually along the lines of “proud member of,” that sort of thing. How does this sort of thing play out with reciprocal links in this situation?

    I am not going to change my portfolio, of course, because the way I have my portfolio set up makes sense to me, and it is not about link pop for my clients; it’s about showing off my work. But my forum is another matter – we try to keep it as clean as possible and have great mods that kill spam right away, so I am still confident in the quality of my members’ signatures. But should I let posts like this scare me? Is there a benefit to my users, from a link perspective, to having the links in their sigs, or is it a detriment to my site?

    I mean, no matter what, the members would still post if I took sigs away; they are there for the education and a sense of community. But what bugs me is that I feel like I have to do something special, in fear of losing my Google rankings, that “if search engines didn’t exist” I wouldn’t do. Signatures come stock in vBulletin, and members like to get creative with them and use them; they have fun with it. Do I need to alter this natural element to appease the great G gods?

  241. Before you try to dig through all of the comments, I’ve just written an executive summary of Matt’s comments on reciprocal links at http://www.ahfx.net/weblog/83

  242. Doug Heil >>> No one could continue to simply index page after page of low quality websites, especially directories.

    I don’t think that’s a good idea – shouldn’t it be more like, index but don’t rank if something is low quality? After all, a SE can go wrong in what it thinks is low quality – but I as a user would prefer to occasionally go through the first 100 pages in search of that one thing I am looking for. I would prefer that it’s there somewhere, and that someone doesn’t ignore it altogether. Even the directories – why punish when you can’t be 100% sure?

    Search engines were initially meant to index everything they could, but rank as they judge. Unless of course you have other issues, like overload and unmanageable data…

  243. h2, I completely understand the policy on WMW. You can’t go into specifics without it quickly becoming unmanageable. The other thing is that those were the five cases that I dug into; Adam found several domains that we’re digging into more, for example. I asked someone to dig into your domain, John. But bear in mind that we only show a subsample of the links that we know about.

    Dossy Shiobara, I see that on my blog sometimes too. It’s natural, because if we see the same article/text in two places, we pick the more reputable page (which is the root page of your blog, in this case). I wouldn’t use a noindex tag; you might consider putting fewer articles on your root page though. That would more quickly put the right text onto the individual pages.
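
    (If it helps to picture it, the idea is roughly this sketch; it’s shorthand for the concept, not the real pipeline, and “reputation” here just stands in for signals like PageRank:)

    from hashlib import md5

    def pick_canonical(pages):
        # Group pages by hashed content; keep the most reputable copy.
        best = {}
        for page in pages:
            key = md5(page["text"].encode("utf-8")).hexdigest()
            if key not in best or page["reputation"] > best[key]["reputation"]:
                best[key] = page
        return list(best.values())

    pages = [
        {"url": "http://dossy.org/", "text": "same article", "reputation": 9},
        {"url": "http://dossy.org/archives/123", "text": "same article", "reputation": 4},
    ]
    print(pick_canonical(pages))  # keeps the root page, the more reputable copy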

    Jack Mitchell, you said “Saying you can’t do reciprocal linking is just sheer idiocy. How does Google expect you to get back links?” I’m not saying not to do reciprocal links. I only said that in the cases that I checked out, some of the sites were probably being crawled less because those reciprocal links weren’t counting as much. As far as how to get back links, things like offering tools (robots.txt checkers), information (newsletters, blogs), services, or interesting hooks (e.g. seobuzzbox doing interviews) can really jumpstart links. Building up a reputation with a community helps (doing forums on your own site or participating in other forums can help). As far as hooks, I’d study things like digg, slashdot, reddit, techmeme, tailrank to get an idea of what captures people’s attention. For example, contests and controversy attract links, but can be overused. That would be my quick take.
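
    (As an aside, a tool along those lines can be tiny. Here’s a bare-bones robots.txt checker in Python, a sketch using only the standard library, with a made-up example URL:)

    from urllib.robotparser import RobotFileParser

    def can_fetch(robots_url, user_agent, page_url):
        # Report whether the robots.txt at robots_url lets user_agent
        # fetch page_url.
        parser = RobotFileParser(robots_url)
        parser.read()  # fetches and parses the robots.txt file
        return parser.can_fetch(user_agent, page_url)

    # Hypothetical example site and URL:
    print(can_fetch("http://www.example.com/robots.txt",
                    "Googlebot", "http://www.example.com/private/page.html"))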

    And now my cat insists that I spend some quality time with her before going to bed.

  244. Dave (Original)

    Matt, you have the patience of a Saint.

    It matters not what you write; many here will only put on their selective reading glasses anyway.

    The funny thing is, most of what you write is just plain old common sense.

  245. Matt, it sure looks like you’ve got your hands full here with all these posts. I tried to read all of them, but there were just too many. It’s funny how people (or should I call them concerned searchers) view Google’s efforts in providing quality results in the SERPs.

    Even though I have my own battles as an SEO and SEM marketer, I have to abide by the rules and make sure my client sites are ready, not only for Google, but other SE spiders as well.

    In my mind, Google is doing its utmost to keep on providing accurate results based on the search terms. Why would you destroy the very “kingdom” you yourselves have built? Surely you’ll want to maintain your position as the #1 search engine worldwide?

    Anyway, great article Matt. It does indeed explain a lot. Thanks.

  246. Matt,

    Say that I did recip links in the past and now I decide to remove all the links. How long does it take for Google to know the change and adjust my ranking (crawling priority) accordingly?

    Steve

  247. Hi,

    a question regarding cache: most of the cached pages from my site date from last February. Since then I have changed URLs. The old ones are redirected to the new ones, but as the Google bots don’t like my site anymore, I have lost 6,000 indexed pages, and no new pages have been indexed since the Bigdaddy update. So I have 2 questions:
    – why are my cached pages so old (most of them were up to date the day before the Bigdaddy update)?
    – how do I make Google love my site again? 😉

  248. Phew, glad you cleared that up about reciprocals otherwise I’d be deader in the water.

    I deleted an old http sitemap and added my www sitemap, as nothing was getting indexed. Still nothing getting indexed. Did I drown myself by deleting the old map?
    Do I need to do a resubmittal form?

    My site only has two links in Google, and they’re both from the same place! I know I need more, but is it the links or the sitemap keeping it from getting indexed?

  249. Dave (Original). Coincidentally, I posted this in my forum just a few minutes before I read your question.

    The more I think about how the dumping of pages doesn’t make any sense, I’m wondering if they really are short of space in spite of what Matt said. He said that they have enough machines to do it all, but to do what – run a pruned index or expand the index?

    I can imagine a meeting where they discussed whether or not to keep on adding machines and capacity, or to be more selective about what they have in the index.

    Doug Heil. The point is that Matt’s assessment of the health care directory site (and he examined it) is that it needs some more IBLs for Google to crawl and index more of its pages. It isn’t just any directory site that we can generalise and guess about – it’s one that Matt examined, and that was his assessment.

    You said that, “No one could continue to simply index page after page of low quality websites, especially directories.” I don’t disagree with that, but it would depend on the definition of “low quality”. Matt said that the health care directory looks like “a fine site”. It doesn’t sound low quality to me.

    Dave (Original). I’m sure you are mistaken about the usefulness of directories. Niche directories can be very useful, and some people really do use directories, so for some people, they are useful. Either way, they are not low quality sites by definition.

  250. I don’t know what’s going on now, but if I do site:www_mydomain_com

    It says results 1-2 of 68; it was about 320. And even though it says 68, it’s only showing 2 pages: the homepage index and an attached forum index.

    Now that isn’t usual, is it?

  251. One of those is Supplemental too

  252. Caios

    This:
    link:www.shacktools.com 🙁

    Maybe because of this:

    link:www.shacktools.com 🙁

  253. Is Matt Cuttsa Matt Cutts – or someone pretending to be Matt?

  254. That’s true, but why index over 300 pages and then remove them? Market links link checker shows about 300 links through MSN. Also, site: says 68 pages but only shows 2?

    I know I need to build links to get my site indexed, but I personally would rather not spend all my time doing that. With fewer sites out there indexed, that means that even if I did have more links out there, they are surely less likely to be seen by Google.

  255. Spam Reporter

    Matt:

    Over the past few months, I’ve submitted many spam reports (basically whenever you ask for them here) on a competitor’s site that is using hidden text, yet that site is still in the index. This site has been doing this for at least the past three years (that’s how long I’ve been in competition).

    When Google finds SPAM on an internal page, is just that page removed from the index or is the entire site penalized?

    It’s just frustrating. I keep submitting the reports, and yet the site still remains. Feel free to e-mail me at the address provided and I will supply more info if you’re interested in details.

  256. When I go to a library to do research, I do NOT care how many people have read or checked out the book that I am looking for. I only care that the book is relevant to my research.

    When I am doing research on the Internet, I do NOT care how many people have linked to that site. I only care that the site contains the material that I am researching.

    TOO MUCH emphasis has been placed on links coming to a site. TOO MANY sites with excellent and exclusive content are being left out in the cold because they have no incoming links.

    The Internet is about INFORMATION. By putting all their trust in the incoming links, Google has made the Internet all about POPULARITY.

    That is NOT what a search engine should be concerned about.

  257. This is my first post on your blog, Matt, and thanks for giving us an opportunity to spell out our views.
    I’m talking about affiliate sites.
    Individuals who do not have the capacity to go for something big have no other option but to continue affiliate marketing, for the simple reason of earning a few bucks.
    Mostly optimised for less common keywords, these sites provide products/services for a small group of people.
    Google policy suggests webmasters think about “whether you would do that if there were no search engines.” Needless to say, it makes no sense for a T-shirt affiliate site to give the history, origin, and other such unnecessary details about T-shirts just to satisfy search engine bots.
    It is really difficult to change the descriptions of products like T-shirts and sundry other items, though that can be done using scripts to change words like “this is” to “we have” to try to fool the crawlers.
    Probably it is possible to add valuable content for sites that offer domain name registration or web hosting through affiliate links.
    And who wants to drive hard-earned traffic to a different site, with the fear that the traffic may never come back, while being well aware of the fact that had it been a direct sell, the publisher could have earned much more? It is nothing but compulsion.
    As for the value that such sites add to the internet, an analogy may explain it: why do we visit a small street-side shop when everything is available in a big shopping mart, undoubtedly providing the best comfort? This is very basic human nature and really difficult to explain.
    And finally, it is rather easy to fool a crawler(!) but not a bona fide customer. Just because there are some links on a website, a customer is never going to buy anything from that site unless and until he gets something meaningful.
    So this is my small request: let the visitors decide what they want to do, whether to buy from the principal site or go via an affiliate site.
    Regarding this issue, your earlier stand seems to have made more sense, where affiliate sites would come up in SERPs for rather uncommon keyphrases and principal sites would enjoy the traffic for more commonly used keyphrases.
    Thanks again.

  258. Is Matt Cuttsa Matt Cutts – or someone pretending to be Matt?

    I saw that too. I think he went Italian.

    Heyyyyyyyyyyyyy CUTTSAMATTAYEW, huh?

  259. Hi Matt,

    Great post and great information!

    Ok, some say there is no such thing as a “sandbox,” but there is a holding cell. I have been working on a site for nearly a year and a half now, and I still can’t get the site to rank, even for the most unused/stupid keyword. I do see tons of supplemental pages in the results, and your explanation seems to fit the bill. But I still don’t understand why it won’t come out of the holding cell. Was there something new in the update for new domains that keeps ’em in the cell longer, other than getting quality links and everything else I know?

    You can email me for more details if you like.

    Thanks,

    Beth

  260. Hi Stephen

    “Is Matt Cuttsa Matt Cutts – or someone pretending to be Matt?”

    I’m sure it was Matt. It’s his style, 100%.

  261. Yes, it looks like his style – but also not his style, in a way.

    E.g. it looks like it has been made to fit his style, but some things don’t add up (he has two cats for a start 😉 )

  262. Caios

    Let me put it like this:

    Either you play the “Backlinks Game”… or you don’t play at all 😀

  263. Stephen

    “E.g. it looks like it has been made to fit his style, but some things don’t add up (he has two cats for a start 😉 )”

    I know. But it seems that it was Emmy who insisted that Matt spend some quality time with her.

    While the J.D. guy might have had more important things to do than to waste his time on Matt 😀

    You know women always need more attention than men 😉

  264. Harith

    You might be right; JD is probably still just running around the house chasing laser pointers, its tail, etc.

    Matt

    Any chance of an update on the PR situation?

    As has been noticed at WMW and other places, the last PR update (early April) only seemed to affect some sites, and no ranking changes were noticed as a result of it. (OK, this might be hard/impossible to notice anyway – but from the outside it just looked like the last PR update was purely cosmetic.)

    Other pages kept the old PR, which was probably last updated around February…

  265. Unfortunately, it still sounds like sites that wouldn’t naturally collect large amounts of links aren’t ever going to be spidered/indexed completely.

    I don’t care about ranking at this point; if I can get the pages into the index in the first place, I can *make* them rank. I just can’t get them back in.

  266. A software site has a “these people use The Widget” page linking to their customers’ sites.

    We’re thinking of doing that, and probably will. Our site will probably drop in PR, maybe even disappear from the Google listing altogether, because most of those links will be to sites not relevant to ours: those who are interested in our product would not also naturally be interested in the products of the sites we link to. We get sales by other advertising and word of mouth, so SERPs really don’t matter as much to us as they might for others. Our concern is for our customers.

    Question: If we are penalized, will that also penalize our customers?

  267. Alex Duffield

    [quote]
    Matt Cutts Said,
    May 17, 2006 @ 1:49 pm

    Alex Duffield, in my experience those links aren’t making much/any difference with Google….
    [/quote]

    Matt, I am sure you know better than me, but the fact remains that the site I pointed out comes up number 1 for many searches (rafting BC) and in the top 5 for just (river rafting).

    I manage the site for one of their competitors, and have kept an eye on these guys for many years. Before they started participating in this sort of link scheme, they did not receive this sort of ranking.

    Their site does not include nearly as much valuable user content as any of the others in the top 5.

    My main concern here is that my clients think they should (need to) also participate in this sort of linking scheme in order to compete. I insist that good content and a well designed site, combined with regular updates and good (honest) linking, is the better approach. I have pointed out that the Google guidelines clearly state that “linking schemes designed to improve PR” are against the rules, and I tell them that in the long run they will get burned, but I fear I am slowly losing the battle against the fact that it does work.

    All I am looking for is some ammunition to convince my clients against this course of action.

  268. Netmeg, as a major change in crawling/indexing, I expected to see some people say “I’m not crawled as much.” Somehow the people that are crawled more never write in to mention it. 😉 But we take the feedback and read through it. I’ve been talking to someone in crawl/index about the future of the crawl, for example. We keep looking for ways to make the crawl better.

    Stephen, I haven’t asked around about PR lately. Yes, the one cat is much younger. He can keep himself busy with a bit of string for an hour. It’s the other cat that often demands attention. 🙂

    Spam Reporter, a lot of the time we’ll give a relatively short penalty (e.g. 30 days) for the first instance of hidden text. You might submit again because sometimes we’ll decide to take stronger action.

  269. Hi Matt

    Thanks – sometimes I wish I was a cat – less stress.

    I have sent another email to the Boston address, and a follow-up, as it looks like someone had a look last night but there was no reply – OK, perhaps I should be more patient.

    I don’t know if you are looking into the site deeper, just ignoring my site, or what? It seems to have regained PR at the last change but is still suffering from a penalty – I have given more details of perhaps why in the email.

    Cheers

    Stephen

  270. Hi Matt,

    A gold mine of stuff, great.

    However, one question. I understand the principle of relevant OBLs and improving the visitors’ experience, but here is a quote:

    “Moving right along, here’s one from May 4th. It’s another real estate site. The owner says that they used to have 10K pages indexed and now they have 80. I checked out the site. Aha:

    This time, I’m seeing links to mortgages sites,”

    I can’t see how linking to a mortgage site from a property site would not be deemed a relevant link and would not improve the visitor’s experience.
    I would not be able to buy a property without a mortgage, and my guess is that this would apply to most people.

    Is there no slack for cross-subject linking?

    I have a “Breakdown Recovery” site with a lot of information regarding cars, motoring, and driving holidays. It is not directly related to your car breaking down, but it is related?

    Thanks

    Mark

  271. As someone who consults for many companies: there is a need for Google to keep everyone from spinning their wheels and wasting time. I speak for ALL website owners, and even the folks at Google dealing with all the questions.

    I had to turn down a Google paper-publications advertisement, as I was not sure whether doing more advertising was good or bad; I feel like anything I do could cause a penalty. So in the end I turned off my Google AdWords account 100% and will not use the Google publications anymore for advertising. It is all going to the other large players. (Why would anyone at Google want this?) I just cannot figure things out with Google. (Investors must love this part.) Thus Google does not make an extra few K now, at least. My other clients: all off! Now we are at about -15k a month for Google. (And this was my professional answer: do not take chances.)

    Please (if possible) find a way to let folks know when there is a penalty. It makes such sense; everyone would save time, and all would win.

    The way it stands, it appears search engines do not want to tread in these waters. But be professional and let people know: get a great lawyer, write a disclaimer, save everyone hours of time.

    Is it a dream?

  272. I built that t-shirt site Matt said wasn’t interesting to my visitors. Well, my bookmarking rate is 15-20% monthly, so the users(!) find it interesting. I just put together stuff I liked, and users didn’t have to go around looking for this stuff for days. Just a fashion magazine. I was hoping that Google would be in the business of “indexing,” not editorialising… The affiliate links are nobody’s business but mine; they’re legal. Some of the content has been provided by business partners and syndicated on the site – this is also legal.

    Matt, I sent the specifics of my website traffic to the original email address if you need proof of what IS and ISN’T interesting to people who search for this info.

  273. Matt,

    There appear to be two datacenters that show a completely different set of results from the others. I believe these datacenters are the original BD DCs, but I’m not sure. Would you expect these to spread? My real question is at what point in the timeline would you expect some stability and consistency across all datacenters?

    Thanks Matt.

    Chris

  274. MATT, can a site get fully indexed in time without having to get links? I know you can get indexed faster with them, but I want to know: if a site never gets links (though it does have a page indexed), will it ever get indexed, or will it never see its pages (the entire site) in Google until it does get links?

  275. Hey Matt, I didn’t hear back from you regarding the adult listings mentioned above…

    I think this topic needs some investigation. I see a continuing trend of freshly expired NON-adult domains getting insane rank for adult SERPs, while older established adult sites are pushed further and further down the list.

    The talk amongst the “adult” SEO community is that the only way to get good Google rank for adult these days is by buying or getting your links on NON-adult mainstream pages.

    This practice makes everyone look bad, and if Google continues operating this way, it will really harm the non-adult community.

    The top 100 listings for most adult terms are filled with SCHOOL and EDUCATIONAL domains that recently expired, banking on the fact that many other schools will still have links up to the expired domain.

    So now all we have done is make the SERPs irrelevant, and shown a lot of porn to kids and unsuspecting people, all for some Google rank.

    If that wasn’t bad enough, the rest of the results are guestbook spam of adult links on mainstream results. If Google didn’t reward these spammers, they wouldn’t attack innocent sites with automated software just to add their adult links.

    So by continuing to allow these sorts of methods, Google is creating a problem where one didn’t exist.

  276. Netmeg, as a major change in crawling/indexing, I expected to see some people say “I’m not crawled as much.” Somehow the people that are crawled more never write in to mention it.

    Ironically enough, two or three years ago we had to contact Google to throttle back the crawling on one of the sites that so concerns me, because it was being hit way too hard at the time. Oh, for days gone by…

  277. Matt,

    You seem to have referenced the fact that sites might be penalized without being banned, and I know the question has come up a couple of times, but I’ve never seen a clear-cut answer on this. Is there such a thing as “penalizing” (a drop in position but still listed), and if so, is doing some of the stuff you discuss here, such as reciprocal linking, a possible cause? I’m not talking about linking to spammy sites, as you have been clear on that, but what about recips in and of themselves?

    For instance, I was told recently that I should submit one of my sites for a particular award. I’m pretty sure that receiving that award means being listed on the list of award winners with a link to my site. Are you saying that if they link to me, great, but if the award logo hotlinks back to them (and thus becomes reciprocal), not only would the link from them then become worthless (well, aside from the ego boost I know I’m going to get if I win 🙂 ), but that it might actually hurt me?

    I somehow doubt that’s what you’re saying, but it certainly isn’t a clear cut issue.

    Thanks.

    -Michael

  278. Hey Matt, how about mentioning something to the Sitemaps folks about adding a feature to get rid of dead/404 URLs from a site? Google seems to take forever to rid itself of 404 pages; it could be a great asset to Google and webmasters if there were a functional way to remove URLs via the Sitemaps system. Kind of a dumptheseurls.xml anti-sitemap deal; see the sketch below.
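
    To illustrate, the webmaster’s half could be as simple as a script that collects the dead URLs for such a hypothetical removal file:

    import urllib.error
    import urllib.request

    def find_dead_urls(urls):
        # Return the subset of urls that currently answer 404.
        dead = []
        for url in urls:
            try:
                urllib.request.urlopen(url, timeout=10).close()
            except urllib.error.HTTPError as err:
                if err.code == 404:
                    dead.append(url)
            except urllib.error.URLError:
                pass  # unreachable host is not necessarily a 404; skip it
        return dead

    # Hypothetical URLs; the output would feed whatever removal format
    # Google might accept (the "dumptheseurls.xml" idea above).
    print(find_dead_urls(["http://www.example.com/old-page.html"]))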

    Cheers,

    John

  279. It seems Google has forgotten a basic fact: webmasters are the net, and Google is a bridge between users and webmasters. Users opted for Google because it gave them the most in-depth results and the most choice when searching for a certain term; after finding a site through Google, users decided for themselves which site to use or bookmark. Now it seems Google is trying to choose for them what they should see and what they shouldn’t see. As somebody mentioned, this is editing content, not indexing content.

    I have a question. Let’s assume someone had a website with a lot of original and unique information, yet at the same time they were involved in heavy and excessive link exchange to generate traffic from other sites (like the old days of the net: exchange links for traffic). Will you curtail that site’s valuable content from millions of users because a dumb crawler saw lots of links?

    The sad fact is that thousands of webmasters have lost thousands of pages, and millions of people have lost tons of information, because a bunch of spammers decided to manipulate Google. While manipulating search results could be, and is, a serious problem for Google, the way Bigdaddy has been designed to solve it is not proper.

    Google’s mission was supposed to be to organize the world’s information; however, I believe the mission has evolved into “editing the world’s information according to a blind algorithm, because we got blinded by spam!”

  280. Hi Matt,

    With over 200 comments it is time-consuming to read through each one, so I apologize in advance if this question has been asked.

    With regards to Web design companies, it is standard practice to insert “Designed by Company Name” etc. along the footer of our clients’ pages. No surprise there.

    Now, usually these links appear site-wide. What impact do you find these will have with the recent updates? Is there a better process that Google prefers for having our clients credit our work?

  281. Michael VanDeMar, yes, a site can be penalized without being outright banned. Typically the reason for that would be algorithmic. I wouldn’t worry about being listed on a page of sites winning prizes though, unless it’s the Golden Viagra Mesothelioma Web Awards. 🙂

    Netmeg, I’d like to see us provide some ways to throttle crawling up or down, or at least give preference hints.

    Adultwatcher, don’t take a non-reply as not reading it. I did pass all of those on to ask how some new things do on stuff like that. There’s a part of our pipeline that I’d like to shorten, for example.

    Relevancy, I wouldn’t count on getting a large site fully indexed without any links at all. We do look at things like our /addurl.html form and use that, so it’s possible that a smaller site could do it without links.

    dude, I didn’t mean to cast stones at that site. Someone who gets to the site can certainly buy a T-shirt from different brands. But at least some of your links are from stuff like an “RSS Link Exchange” and those links just aren’t helping you as much.

    Bruce, that’s your call of course. Advertising with Google wouldn’t affect your site either way though (help or hurt). I think we’ve talked about your site; 1-2 of the pages on your site, plus the “sponsored by” link, plus the “Search engine optimization” message on that page would be where I’d start.

    Stephen, Adam is going through all the emails. He’s writing back to the ones that he can, but he can’t write back to every single one; I need him to do other stuff too (e.g. keep an eye out for other feedback across the web, learning more spam detective skills, etc.).

  282. I don’t think I have ever seen so many comments on one Cutts blog post… this will make comment #263 (sorry, Matt, if this comment violated your Guidelines on Comments), but just wondering, did this thread break a record? What’s the highest # of comments a Cutts blog post has seen?

  283. Matt, can you explain why sections of our website are now showing supplemental results for almost every single one of our home listing detail pages? Each of these listings is unique and required to be on the site if we want to make our visitors happy.

    The system houses about 25k listings, all with unique information. I guess I don’t understand why these pages should be placed into the supplemental results. Site: (mygorealty.net)

    If there is a problem on our side we want to correct it. If it is a problem with Google it might be nice to know what that problem is so your crawl/index team can correct it.

  284. Matt:

    I’ve read your post with interest.

    About four or five days ago, I noticed that Google had dropped all but four of my pages on my new site. Now, four or five days later, it has dropped them all besides the index page.

    Shocking and unexpected and, for me, inexplicable.

    My site is almost 100 percent original content, and even though I do feature an occasional affiliate link, it is certainly more content-oriented than affiliate-oriented.

    So… after reading your comment policy, I’m not sure how to phrase my question so that it can be of general interest, but here goes, and I hope it flies…

    If a site has almost 100 percent original content, will a few affiliate links cause Google to stop indexing it?

    Thanks, neva

  285. Matt, link exchanges with relevant sites are the only way for small guys to get traffic. Please just discount the exchanges; don’t factor them into your results. If a site can’t get to the top of the results, exchanging links is one of the few means to get some visitors. Automating the procedure kinda makes sense. Cheers.

    PS: I have approved every link on the exchange with the goal of getting not a random visitor, but a targeted visitor. Granted, I didn’t get too many as a result of that one, so I am not doing it anymore.

  286. Eternal Optimist

    Matt, firstly thanks for your further comments on supplementals 🙂

    There has been a considerable amount of ‘statement of fact’ in forums, although it is probably rather more speculative, but does Google take into account things such as age of site, time spent on pages by visitors, percentage added to favourites, number of years a domain is registered with the domain provider, etc, or are these all academic to ranking and indexing?

    By the way, you mentioned that supplementals may be caused/affected by higher-ranking pages being indexed by priority, but I notice that there are some very high ranking websites with many supplementals 🙂

    Thanks 🙂

  287. Matt,

    Thanks, no, it’s not a Grande Cialis Hair Loss Award, but it isn’t strictly related either. Loosely they tie in, but it might take a second to see the relationship.

    As for the penalties that you mentioned… would an email from Google indicating that you were not penalized or banned cover those? Or might you be anyways? And would that class of penalty be something that might get mentioned in the Sitemaps Penalty Awareness program…?

    Also, I’d like to know the answer to Joel’s question too… is this a record comment count for the blog?

    Thanks. 🙂

    -Michael

  288. Matt,

    Sorry for double posting, forgot this one. This is a long post to read with the comments, and I think that if you start from the top without reloading, and then go to comment, the security code might time out. Didn’t happen this time, but it has in the past. You should really make it a forward-only process on missed codes, retaining what has been typed to the next page, to keep people from having to retype comments. 🙂

    -Michael

  289. So is there a way to know if we’re linking to what Google “thinks of as a bad neighborhood”? Also I am interested in what someone said about sites such as coupon sites. Obviously these sites don’t have original content. Will linking to other coupon sites still help you? Also, on a mall site how can any link not be related?

  290. Matt,
    I quote:
    Yup, exactly, arubicus. There’s SEO and there’s QUALITY and there’s also finding the hook or angle that captivates a visitor and gets word-of-mouth or return visits. First I’d work on QUALITY. Then there’s factual SEO. Things like: are all of my pages reachable with a text browser from a root page without going through exotic stuff. Or having a site map on your site. After your site is crawlable, then I’d work on the HOOK that makes your site interesting/useful.
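
    That “reachable from a root page” check is easy to approximate mechanically. Here is a rough sketch of my own (it assumes a small, same-host site with well-formed HTML):

    from html.parser import HTMLParser
    from urllib.parse import urljoin, urlparse
    from urllib.request import urlopen

    class LinkCollector(HTMLParser):
        # Collect href values from anchor tags.
        def __init__(self):
            super().__init__()
            self.links = []
        def handle_starttag(self, tag, attrs):
            if tag == "a":
                for name, value in attrs:
                    if name == "href" and value:
                        self.links.append(value)

    def reachable_pages(root, limit=200):
        # Breadth-first crawl of same-host pages reachable from root.
        host = urlparse(root).netloc
        seen, queue = {root}, [root]
        while queue and len(seen) < limit:
            url = queue.pop(0)
            try:
                html = urlopen(url, timeout=10).read().decode("utf-8", "replace")
            except OSError:
                continue  # unreadable page: skip it
            collector = LinkCollector()
            collector.feed(html)
            for link in collector.links:
                absolute = urljoin(url, link).split("#")[0]
                if urlparse(absolute).netloc == host and absolute not in seen:
                    seen.add(absolute)
                    queue.append(absolute)
        return seen

    # Any page NOT returned by reachable_pages("http://www.example.com/")
    # needs a better internal link (or a site map page) before a
    # text-browser-style crawler will ever find it.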

    I find some real advice: Word-of-mouth = Be popular and get links from related sites. Factual SEO = get the tech right (and clean). Work on the HOOK = Try to be interesting in your own way. (We don’t care about this)

    One thing bothers me between the lines: “Return visits”. Please tell me that you are not tracking and using return visits as part of your algorithms.

    /chris

    Now, usually these links appear site-wide. What impact do you find these will have with the recent updates? Is there a better process that Google prefers for having our clients credit our work?

    I may be about the only person in the world who feels the way I feel about this issue (and those who know me have heard me say this before), but I’m still gonna say it.

    Unless the work was non-commissioned (which is highly unlikely), putting a hyperlink on a client’s website is tacky and unprofessional, and deserves no real credit. It’s like watching a Ford commercial and seeing the logo of the ad agency who designed it in the lower right-hand corner.

    Personally, I’d like to see no credit whatsoever given to these links. It does no benefit to the customer and goes against the whole organic link concept. If there were ever an “unnatural link”, that would be it.

  292. Google has been my preferred search engine for many years; however, in the last year or so its results seem to be getting worse and more irrelevant, while the other big search engines’ results are improving. I fear that because Google has become the number one search engine, it has made itself the number one target for financial gain by web authors through PPC, very much like Microsoft Windows became the number one target for hackers.

    It’s quite frustrating that unique, specific content isn’t enough to get ranked on Google; it seems that you have to get links regardless of their quality or relevance.

    Personally I don’t do reciprocal links, if a site wants to link to me then great, if I think a site will be of interest to my visitors then I will provide a (nofollow) link.

    I hope that Google will sort this Internet search mess out, the sooner the better. My suggestion is to penalize directories, duplicated content, automated sites that have thousands of pages, etc.

  293. Very nice to see you take the time out to communicate stuff like this, Matt – definitely worth the time to stop by and read. Thanks!

  294. Matt, isn’t the point of your addurl and Sitemaps programs to help get sites indexed? If that is the case, what is the point of them if it takes links to get indexed?

  295. — You will see the site but not index it with addurl and Sitemaps? I know Sitemaps does more, but its original point was to help pages get indexed. Now it doesn’t help with that.

  296. Google Sitemaps is a great idea and the perfect mechanism for webmasters to communicate with Google.

    It should have no effect on rankings, but merely act as a mechanism to inform Google of the structure of your website and any new pages that may need crawling.
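
    For instance, a minimal sitemap is just a list of URLs in a small XML wrapper; this sketch (placeholder URLs, the sitemaps.org schema) renders one:

    from xml.sax.saxutils import escape

    def make_sitemap(urls):
        # Render a minimal XML sitemap (sitemaps.org 0.9 schema).
        lines = ['<?xml version="1.0" encoding="UTF-8"?>',
                 '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">']
        for url in urls:
            # Real URLs must be XML-escaped (e.g. & becomes &amp;).
            lines.append("  <url><loc>%s</loc></url>" % escape(url))
        lines.append("</urlset>")
        return "\n".join(lines)

    print(make_sitemap(["http://www.example.com/",
                        "http://www.example.com/new-page.html"]))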

  297. I understand that if there are 2 pages that talk about something and one page has tons of links and the other doesn’t, the linked page should rank higher. But if there is a site that has tons of dedicated pages about that term with no links (maybe it’s new, or a mom-and-pop), it should at least be indexed and judged on its relevance and merit.

  298. Supplemental Challenged

    Matt, you would do yourself and everyone else a good service by not allowing a lot of the above confusion about “reciprocal links” to go unanswered. Just say “there is nothing wrong with Wikipedia linking to Dmoz and Dmoz linking to Wikipedia.”

    You can end the FUD once and for all, and put a lot of link brokers and “three way” spammers out of business, just by saying it’s not reciprocation that is the problem, but spam and deception and trying to pretend a site is more important than it is.

  299. Matt,
    I have another question for you. I will repost my first one here along with my second question, so they are consolidated.

    #1 There appear to be two datacenters that show a completely different set of results from the others. I believe these datacenters are the original BD DCs, but I’m not sure. Would you expect these to spread? My real question is at what point in the timeline would you expect some stability and consistency across all datacenters?

    #2 I am really frustrated with the quality of serps in a few instance, where I am trying to do research. Today, I was doing a little medical research and was trying to find information about a schedule 2 narcotic. Specifically, I was searching – difference between oxycodone and hydrocodone – (without quotes). I got page after page after page of junk scraper/directory style sites with links to other sites like this one: getcreatis.com/oxycodone.html

    Many of these URLs in the Google index immediately redirect to the affiliate page. Total junk. I am not trying to sound overly critical, but I wanted to point this out to you. It is very difficult to conduct any type of scientific research, especially medical research, when these spammy, worthless affiliate sites with page after page of nothing but spammy links or AdSense are ranking so well. Many of the pages have zero PageRank, so I find it amazing they are ranking so well. Actually, I see a lot of pages with PR0 ranking well these days.

    Thanks again for your help.
    Chris

  300. Google PageRank on average seems to be a good indicator of quality pages, but it seems to have little relevance to ranking on Google at present – though I can understand why Google is holding back.

    I’d suggest including it as a filter when searching Google, or at least as a filter on the Google Toolbar search.

    My only criticism of Google PageRank is that it’s very slow to update for new pages. (Why don’t they link it to Google Sitemaps?)

  301. Hello Matt,

    Seems like the recurring theme is that recip links are now bad. This is hard to fathom. Isn’t this the nature of the web?
    Especially in my sector, which is fishing charters, guides, trips, etc., where you have so many less-than-professional websites that will never rank that high.
    Now when they come to me asking for a link trade, I have to deny them for fear of suppressing my ranks?
    These link trades are the lifeblood of the Internet for the usually poor charter Captains who barely eke out a living, and now I am going to tell them “Sorry Charlie”, no links, because Google doesn’t like it.
    OK, I have good links and don’t actually need a link back, so since you are freely spouting out great info and insight, can you take it a step further and give us the heads-up on linking one-way to sites like I mention?
    I want to continue being “friendly” to the charter Captains and guides who are struggling to survive, so will sites that only link out to relevant resources have a “drain” or negative effect on their respective websites?
    I certainly understand that a fishing site linking to a credit card site is bad – that would be an obvious sign of link laziness or just someone trying to manipulate the system – but a fishing information site trading a link with a fishing charter site should be considered what makes the web go round, no matter how many times this is done.
    Anyway, have a nice evening, and I wish I could have caught this post when you first wrote it. Thank You – Joe

  302. Hi Matt,
    For the sites you described above as having poor links: if the outbound links had a rel=nofollow, would it improve the number of pages indexed?

    1) Also, I had a directory with 15,000 pages listed; I have been hit hard and now only have 600 pages listed. (Do you not like directories? The site contains very few outbound links.)

    2) Also, directories would principally have links coming in from all the categories they list, so which category could be taken as relevant for a directory?
    thanks

  303. Dave (Original)

    PhilC, I was actually thinking more along the lines of there not being enough hours in a day/week/month/year to index ALL pages out there. It would appear that Google NEEDS to make a choice in many cases, and what Matt describes would fit.

    I’m not trying to say ALL directories are of no use, just that the number of people (in the scheme of things) that they are useful to is low.

  304. Hi Matt,
    Here is an idea for Google to solve its reciprocal linking problem. Why not only give value to a certain number of defined reciprocal links? If Google promised us that only 100 reciprocal links from our site would count, it would seem to solve a large part of the problem. To make things simple, these could all be put on one page with a specified name. Of course a site could have as many outgoing one-way links as it wanted, and as many reciprocal ones, but only 100 would count in terms of the search engine. This would only apply to reciprocal links – all other links would be as they are now. A number of similar schemes could be thought of – what about 25 per year? We would be a lot more careful about reciprocal linking if we knew we only had a certain number, and that those links in a sense defined our site, as well as the one-way links the site was able to attract.

  305. YES!!
    [quote]I may be about the only person in the world who feels the way I feel about this issue (and those who know me have heard me say this before), but I’m still gonna say it.

    Unless the work was non-commissioned (which is highly unlikely), putting a hyperlink on a client’s website is tacky and unprofessional, and deserves no real credit. It’s like watching a Ford commercial and seeing the logo of the ad agency that designed it in the lower right-hand corner.

    Personally, I’d like to see no credit whatsoever given to these links. It does no benefit to the customer and goes against the whole organic link concept. If there were ever an “unnatural link”, that would be it. [/quote]
    No Adam; you are “not” the only one who feels “exactly” like that. I find it “extremely” unprofessional for a design firm OR an SEO firm OR both to stick their links in the footer of client websites. It’s so bad. It’s very amateurish, and it not only looks bad for the site the links are on, but looks bad for that designer/SEO as well.

    Not only all the above, but that particular client is “unknowingly” linking to an SEO or designer without full and clear knowledge of what linking can actually mean in the long run. The firm they link to could get caught spamming, or be deemed a ‘bad neighborhood’ firm, which would indirectly affect the poor client who is linking to them.

    We hear all the time in this industry about SEO firms/designers practicing “full disclosure” with clients. What does that mean exactly? Does it mean that as long as the SEO asks the client if they can stick a link in the footer, it’s perfectly fine? This goes for any technique the SEO claims they do for clients and then tries to explain in “full disclosure”.

    What this industry does not get is that there is NO way the average Joe client understands all the ramifications of “anything” their site is doing, whether done by the SEO or by the client. It should be up to OUR industry to educate that client, and to blame ourselves for the bad SEOs/designers in this industry. We shouldn’t be giving free passes to firms who show unprofessionalism day in and day out. But you know what? We sure do hand those free passes out very freely.

    Getting back to the links in footers… can you imagine seeing a link on Google.com in the footer that says:

    “Designed by Church of Heil” LOL

    or

    A link on Sony that says:

    “Consulting by Doug”

    with a link to Doug’s website?

    Why do firms feel the need to jeopardize client websites in this way, and feel the need to advertise in such a cheeky and unprofessional way? I’ll never know.

  306. Spam Reporter

    Matt:

    Just submitted another SPAM report with your name and my name (above) in the message box. Please have a look-see and take action!

    No Adam; you are “not” the only one who feels “exactly” like that. I find it “extremely” unprofessional for a design firm OR an SEO firm OR both to stick their links in the footer of client websites. It’s so bad. It’s very amateurish, and it not only looks bad for the site the links are on, but looks bad for that designer/SEO as well. […]

    DUUUUUUUUUDE!

    Do you have any idea how long I have been waiting to see someone actually get this? Just to find one person who truly understands these links and the potential negative ramifications of them?

    This post is truly a thing of beauty. Other designers/developers/SEOs, read and take heed. SEO reasons aside, all the marketing stuff Doug mentioned here is reason enough not to do this.

    Hey Matt, would there be any possibility of a future blog post, or at least a comment, on this? It’s one where a large percentage of your readers would be interested (including the three who posted about it). I wouldn’t go postal on you or claim you’re an asshole or anything like that if you didn’t, but we (and I say we because there are at least two of us who asked) would love to hear your take. Thanks in advance… and if not, thanks for posting stuff like this that leads to the tangential thoughts that others have.

  308. It seems that quite a few people who are complaining about having lost indexed pages are stating it is due to reciprocal links. Well, I can say that after some research we finally found out why our site was not Googlebot-friendly. We fixed what we thought was our issue and went from 3,500 indexed pages to 80,000 indexed pages in short order. Well, now I am down to about 600 and have watched this go down over the last few days. I was enjoying the traffic while it lasted. The kicker is that we do not have any reciprocal links at all. I have some one-way inbound links I have been working on obtaining, but no reciprocal links. So I wonder: if you do not have enough inbound links, does that hurt as well and cause you to lose indexed pages?

  309. In my eyes Google IS BROKEN – it’s been ruined. Forget about SEO.

    It’s unreliable; the results are a far cry from the Google of 2 years ago, where it was easy to sort through and find literally any page indexed.

    I can no longer find anything that I seek in Google. If I do, it’s 20 pages deep. Your algo and your focus on combating spammers have made quality results, and the basis of what Google started as, take a back seat.

    PLEASE, for the sake of a decent search engine, enough is enough with excluding results and deciphering whose backlinks are valid or not. There are many quality backlinks, in my eyes, that don’t even get counted by you guys, or get very little credit – like a user recommending a solid site they found useful in a forum by providing a link to it. That, to me, is one of the best and most valuable ways to determine if a site is worthy.

    Regardless, I could go on for an hour.

    Google is broken. I cannot find anything I search for, and the hard work put into a few quality sites that have been around for years is being negated and de-indexed, page by page, day by day.

  310. Matt–

    As much as I often disagree with his analysis, Phil C. is essentially correct.

    Google is dividing the web into “haves” and “have nots.”

    It is no longer enough to build a decent, spam-free, original content site. Now you need to attract links from major players or your content is not worthy of the index.

    Shame.

    It seems to me that you guys are trying your best to stunt or at least ignore the natural growth of the web via your new selective indexing policy.

    Might it have something to do with a capacity problem? Let’s ask the boss: “We have a huge machine crisis – those machines are full”.

    Matt we all know how hard you work, but you’re beginning to sound a little bit like “Baghdad Bob.”

    Be well.

    I find it “extremely” unprofessional for a design firm OR an SEO firm OR both to stick their links in the footer of client websites. It’s so bad. It’s very amateurish, and it not only looks bad for the site the links are on, but looks bad for that designer/SEO as well.

    Why stop at web design/SEO? Let’s remove logos from all products – from cars, tins, clothing, computers, etc. After all, logos are unnecessary – why the need to advertise the company who made the product (just like web designers advertise they made the page the person is reading)? What’s the difference between developing a car, and developing a website, in that respect? Why is it OK (in your mind) to have your logo on a car you created, but not a link on a website you created?

    I would respect your opinion if you actually stated WHY it is unprofessional. I think a discreet link at the bottom of a page is fine – it’s actually doing a service to the reader: they may LIKE the way the website is laid out/designed and want to know who made it. Sure, you could just put the raw text of the web design company’s website address at the bottom of the page, but it’s hardly friendly to force users to copy and paste links into their browser rather than simply click on them.

  312. Dave (Original)

    Why do soooo many base the success/failure of Google on their site(s) position in the SERPs? (that’s rhetorical guys)

    I wouldn’t mind betting that Google has indexed more pages than ever since Big Daddy.

    “Google is broken”, “the SERPS are crap” yada yada yada all boils down to “My site isn’t ranking like I want”.

  313. Matt Cutts: re wmw and looking at sites: imagine talking about paintings without being able to look at them, by policy. Then you’ll get art critics getting into big arguments about some piece of garbage, without even realizing that the painting is garbage. I think once you take an absolute position like this, year in and year out, it begins to erode the overall quality. At least that’s what I’m seeing.

    The one positive of that other search forum I did was that I finally got to see the garbage sites that people had been complaining about not ranking. At least 95, probably 99, out of 100 were total junk, spam, tricks, keyword stuffing, link spamming.

    It’s a question of creativity, thinking outside the box. Lots of ways to do it. Brett has often said that he wants his stuff to be reference quality, thus no specific examples that will be different and changed in the future. But that pretends that anyone is ever going to go in and read Google threads from a year or two years ago, which, let’s get real, is ridiculous; only the most hardcore of SEOs are going to sit reading old search threads. Only a tiny fragment of the world’s population would ever think of doing that, and an even smaller fragment would actually do it.

    Anyway, doesn’t matter, the quality drop is what’s readily apparent, things move on, blogs are getting more interesting than webmaster forums, at least blogs like this one. Brett’s decision to not allow blog linking [with the exception of yours I guess] is going to continue the quality drop, since more and more authoritative sources are writing in blogs. More and more I’m getting my primary information from developer type blogs.

    Doesn’t matter in the larger picture, but this particular thread/posting was really revealing to me in terms of how low the quality on wmw google forums is getting. Much of what you said was fairly obvious to anyone who’d followed jagger update, no real surprises, except to the spammers who continue to complain about getting caught by the algo.

    Re the footer links, I’ve been guilty of that, more out of ignorance and laziness than anything else, I started pulling them all off sites I’ve done a year or two ago, and I’m happy I’ve done that, I agree with the poster who said how amateur that is, it is. And it’s a cheap trick.

    Personally, I’m tired of cheap tricks, I’m happy to just let the cards fall where they will, if search engines like my sites, fine, if they don’t fine, if people like them, fine, if they don’t, that’s fine too. Life is too short to worry about how stuff ranks every week or month.

  314. Ranking doesn’t bother me so much. It’s the fact that pages aren’t getting indexed that bothers me.

  315. Matt, nice to hear you guys got to cut loose a bit afterwards.

    As far as the linking goes, yeah, that’s understandable. I guess what I meant was that if the crawl depth is being reduced based on low-quality inbound links and spammy/off-topic outbounds, someone less informed could infer that they should just not have outbound links altogether in order to avoid some of the reduction. That obviously wouldn’t be a good thing for users, so I guess I was just prying to see if, in relation to crawl depth, G might now also be taking into consideration on-topic analysis and quality analysis, not just for the purpose of reducing it.

  316. There are sites out there with millions of automatically generated pages designed to manipulate Google’s index.

    There is a well known site which is showing 162 million pages; sites like these should have a penalty applied to all but their root-level pages.

    It’s no wonder the results are in a mess.

  318. Replying to Joe’s comments,

    Reciprocal links are bad because they are open to so much abuse.

    I’m in the same sector as you, charter boat fishing.

    The point is, there is no harm in your fishing info site having a reciprocal link with the fishing charter boat sites – just include nofollow in the link.

    The link is there for your visitors to follow, not to get either site a higher ranking.
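
    For anyone wondering what that looks like in practice, rel="nofollow" is just an attribute on the anchor tag: <a href="http://example.com/" rel="nofollow">. A rough sketch of retrofitting it onto every off-site link on a page (BeautifulSoup and the single-domain check are assumptions of this sketch, not anything Google prescribes):

        # Sketch: add rel="nofollow" to every off-site link in a page.
        # "charters.example" stands in for your own domain (hypothetical).
        from urllib.parse import urlparse
        from bs4 import BeautifulSoup

        def nofollow_external_links(html, own_domain="charters.example"):
            soup = BeautifulSoup(html, "html.parser")
            for a in soup.find_all("a", href=True):
                host = urlparse(a["href"]).netloc
                if host and host != own_domain:  # link points off-site
                    a["rel"] = "nofollow"        # visitors can still click it
            return str(soup)

    The visitors still get the link; the search engines are simply told not to count it as a vote.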

  319. Dave (Original)

    IMO, most directories are totally useless, and exist for purposes other than providing a useful resource for people. I have a very negative attitude about them because of what they are: they exist because search engines exist. It’s just that the directory Matt used as an example isn’t like most directories – not according to Matt, anyway. It sounds like a useful resource that is being unfairly treated by Google, AND Google is intentionally depriving its users of much of that resource. I see no sense in it at all.

    Doug (Heil)

    Use the HTML blockquote tag to quote. Forum-type codes don’t work in this blog 🙂

    Robert G. Medford

    As much as I often disagree with his analysis, Phil C. …

    Perhaps you have read my other analyses closely enough, Robert 😉

    Jack Mitchel said:

    Ranking doesn’t bother me so much. It’s the fact that pages aren’t getting indexed that bothers me.

    For me, that’s the crux of this. I said it earlier, and I’ll say it again – let the rankings fall where they may, but index decent pages – just because they are there! That’s what a search engine is supposed to do. That’s what its users expect it to do. Allow your users the opportunity to find decent pages – just because they are there.

    I’m seriously wondering if Google really is short of space, as Eric Schmidt (the CEO) said. Matt said that they have enough machines to run it all, including the index – but to run what exactly? A pruned index? I can imagine a decision being made as to whether to keep adding new machines and new capacity, or to start being a bit selective about what they index. Perhaps Google really is short of indexing space after all.

    Whatever the reason for the new crawl/index function, it is grossly unfair to websites, and it intentionally deprives Google’s users of the opportunity to find decent pages and resources. It’s not what people expect from a good search engine. By all means dump the spam, but don’t do it at such a cost to your users and to good websites.

  320. That should have read…

    Perhaps you haven’t read my other analyses closely enough, Robert.

  321. Hi Matt, thanks for the update, but it does raise a number of concerns. As mentioned by PhilC and Robert Medford above, I do wonder if you are not in danger of dividing the web into the ‘haves’ and ‘have nots’ regarding links.

    The web is a very big place and there are very many, very diverse users and publishers – some are very well skilled in the web and code etc., and many others are not (myself included). This is what makes the web the interesting place that it is – you can find real gems of information that you really rate but which may be of little interest to the majority of surfers. There is a site out there somewhere about growing pineapples and other exotic fruit in your living room – great, just a few pages with a real rough-and-ready look to it – but who is going to link to a site like that? With your new BD policies, sites like that will disappear and we’ll be left with thousands of bland, corporate clone sites that are SEO’ed to the hilt and are as dull as ditchwater!

    Try looking at some Asian sites, especially Japanese ones, to see rampant creativity – little robots and clowns and racing cars etc., and not an SEO to be seen anywhere!

    Back to the main point, which is that within the web community there are those who are well connected and savvy about links etc., and there are very many more who are not. So some publishers start off with a huge advantage regarding linking strategies and others are always at a disadvantage. If the rate of indexing is to be determined by the number of quality IBLs, the well connected will always be at an advantage. The unconnected will suffer a double disadvantage: they won’t have the benefits of the extra traffic that links provide, and they also won’t get indexed – therefore they will just fade away.

    The sort of quality links G is looking for are presumably .gov and .edu links and large corporate sites, all of which give a natural advantage to a website if you are well connected and can get a link. Likewise, folks in the SEO and SEM community know their way around and can easily get links. But what about the small enthusiast webmaster, the small business or hotel, and small community sites? How are they going to get quality links to their sites? They have to rely solely on reciprocal links with similar sites. From what I can gather from this blog, these changes will wipe out all of these sites. But why – they are the lifeblood of the web – they are what keeps it going. G will stifle the web’s diversity if you are not careful.

    We’ll end up with a web full of optimised, cloned, corporate brochure sites and thousands of blogs talking about the web in the good old days!

    Anyway, that’s enough of that. Have a great break – we expect to see some nice pics when you get back. Oh, and don’t take any electronic devices with you – camera excepted!

    regards dave

  322. I think this is why I like MSN. They seem to rank pages based on what the page is about, rather than the spammy linking techniques that seem to work in some other engines.

    My site (yes, talking about my site) for example – there are only 2 websites on the subject in the whole of the WWW. MSN recognises that my site is relevant to the topic, whereas Google doesn’t see it as relevant at all; in fact, Google decided to drop the pages it once indexed. Now I read that this could possibly be because Google doesn’t think I have enough high-profile IBLs?

    I’m not trying to knock Google, because I like Google in general, but I know that when people search for things relating to my site, and would be glad to find it, they won’t – because Google dropped the page and doesn’t rank it.

    Just another thought to throw in – how can people naturally link to a site they can’t find?

  323. All right Matt, this is a preventive intervention! I’m not asking to go into Google purgatory, just having some fun, because sometimes you have to laugh to keep from crying.

    I was reading an SEO forum where a discussion came up regarding link-bait and you; well, my gears started turning (the engineer in me), and I threw up a quick blog post with my attempt at graphic arts.

    I won’t spam your blog with the link, but if you are interested, it’s one click away from my URL.

    John

  324. I totally agree with DavidW and the rest of the folks who wrote about dividing the net into haves and have-nots. At the moment it all boils down to whether you play the backlink game or not. But even if you want to play that game, for some non-commercial sites with good content that’s just not feasible. If you’re in a niche like us, with an enthusiasts’ website about Audi S and RS models, it’s quite hard to get decent links, and it gets even harder if you’re in a niche and your website language is not English. Where should we get that many high-PR links from to get a deep Googlebot crawl into our discussion board topics? English websites usually don’t link to us or blog about us. Of course we’re using Sitemaps, but that obviously doesn’t help as long as good IBLs are missing. It’s a lot easier to play the backlink game in the English market because it’s so huge.

    Cheers,
    Jan

  325. I totally agree with Nicky.

    I’ve been checking whois data on websites that have lost their indexed pages, against those that are mostly intact or at least show a load of old pages.

    So far, anything less than 6 months old has been dropped, and anything older is still there or has a load of old supplementals showing.

    What does this mean? You don’t get indexed until 6 months after registering your domain?

  326. Matt, just a comment for your consideration…

    If as you say there’s no server crisis or problem storing data at Google, then how do you actually see the new crawling method as benefiting users in terms of relevancy?

    Certainly there’s a lot of “spam” content that wants to be indexed, but there’s also lots of new “good” content that wants to be indexed, too.

    It used to be the case that you could get listed at directories, get link exchanges, or buy a couple of ads to help the spiders know you were there.

    Seems to be the case that Google is intent on killing these methods.

    In which case, how on earth is a new and useful site supposed to get useful links?

    The suggestion seems to be that a site must be exemplary to get the links and indexing, but surely you are aware how difficult it is for newer sites with good content to be exemplary?

    Not ranking sites for some types of links was one issue – it’s understandable – but not even indexing sites to any degree on those grounds isn’t going to be helpful for anyone.

    It used to be the case that Google wanted to index the entire web – to access the content that was normally difficult for search engines to find – and crowed about the huge size of its index.

    But now that index is backfilled with supplementary junk that very commonly comprises nothing more than long-dead URLs and 404s. And this type of content is preferable to new content?

    I have to say, the situation does sound more like a server problem and the indexing issue is simply Google’s immediate response to addressing the problem. In which case, I can only hope this is true, and that normality will return, because otherwise you will simply continue to provide less and less relevancy in your results.

    2c.

  327. The core of my disappointment is that the Internet is no longer a level playing field. ‘Back in the day’, the Internet provided an unprecedented business opportunity for anyone with a little gumption who was willing to put in the time and effort. By way of perseverance and elbow grease, and minimal capital (depending on how much I did myself), I could build a site that could compete with the ‘big boys’.

    That’s no longer the case. Developing an online business now is like trying to open a hardware store next door to Home Depot. Site age, backlinks, link age, link churn, degrading purchased and reciprocal links and other filtering factors have more and more of an influence on position, while actual content seems to matter less.

    Google isn’t Walmart. Google is the Department of Transportation, and all the roads it’s building lead more and more to Walmart and less and less to Mom and Pop’s Tool Emporium.

    The Internet started as a democracy, with everyone equal. In almost any eCommerce or service segment, however, it’s evolved into a monarchy, with stores like Amazon, eBay, Walmart, Target etc. ruling as kings while the serfs fight over the scraps and try to eke out a living.

  328. Hi Matt,
    I read with interest the bit about URLs with hyphens and the issue there had been.

    I checked my sites and, sure enough, those with hyphens seemed to be hit hard, with pages removed – one site down to one page only.

    You suggested there was a quick fix, but up to now there has been no difference in my sites.
    Do you think the main fix will make a difference?
    Or are you suggesting that where my sites are now is their normal state, and this issue has now been completed?

    Also, is there a time difference between the UK and the USA?

  329. lol, did someone say “Mom and Pop’s Tool Emporium”

  330. Some thoughts and comments in random order.

    1. It seems most of the spam sites we talk about contain AdSense adverts; if Google manually approved URLs first, then I’m sure that this would help cut down on spam.

    2. People have mentioned 3-way links; what are worse are 2.5-way links, which a lot of small site owners get tricked into.

    Site A links to Site B
    Site C links back to A
    and Site C is just some spam directory or DMOZ clone that is owned or founded by Site B

    3. But I wondered if all of this is an 80/20 or 95/30 chop.
    At a guess, 80% of people look at 20% of the sites,
    or 95% of people look at 30% of sites.

    4. I thought expired domains were automatically dropped from the index and not reindexed anyway?

    5. Blogs seem to be the new forums. Years ago everyone had forums, but dropped them because if no one ever visits the site then no one ever posts. How many forums have you seen with 2 posts and 5 registered members? Now blogs are similar: all these sites have blogs with just 3 or so entries which aren’t of any real interest,
    e.g. This is my new exciting blog
    or Updated the widgets page yesterday
    or This site had 25 hits yesterday
    So what! How is that valuable content for visitors?

    6. I also agree with ‘Not pwned Adam’ about footer links

    7. Extra content, every widget site now has extra pages like
    The history of widgets
    Taking care of your widgets
    Widget traditions
    News articles about widgets

    Not because they provide a valuable service to the visitor but just because they want to rank better

    8. A few people have mentioned ecommerce and using AdWords to provide traffic; as people have already mentioned, isn’t AdWords just paid-for links?
    Also, AdWords isn’t cheap enough if you are selling low-ticket items.

  331. This thread is unique in two ways:

    (1) I believe that it’s the biggest thread ever in this blog.

    (2) I don’t think that anybody has agreed with Google about this issue. Probably most of this blog’s regular contributors back Google to the hilt, but I don’t recall anyone doing it about this issue. Even Doug Heil backed off and didn’t come up with a good reason why the health care directory shouldn’t have all of its pages indexed, and he’s Mr. Whitehat.

    Doesn’t this tell Google something, Matt? The overwhelming opinion from all sides is that Google is doing this very, very wrongly. Nobody is talking about rankings, and nobody is talking about spam. Everyone, including the hard-line Googlites and spam haters, is talking about Google being very unfair to ordinary websites, and to Google’s own users.

    There’s a *very* big message here for Google.

    If Jill and I are in agreement (somebody mentioned that – I don’t know personally), then Google really should take notice 😉

    Nicky’s site is one of only two in the world. It’s an information site – but it’s a goner.

    I repeat – there is a *very* big message here for Google.

  332. It’s kind of like playing darts after having been spun around a few times while blindfolded. I might hit the board at some point, but I’ll have no idea how or why I did it, and the odds are agin it. And I’ll never be able to replicate it.

  333. What webdango said.

    The problem nowadays is webmasters have become OBSESSED with SEO. And why? Because of the way the major search engines work. They’re obsessed with rankings. They will forfeit good content and come up with gibberish that just happens to have the right keyword density for certain keywords. I am one of those unfortunate souls plying a trade in e-commerce. The top SERPs in my sector are an embarrassment to it – extremely low on content, but tweaked to the max in terms of SEO. I did a search for “web design UK” on a particular search engine and the top result was a web design outfit that hadn’t updated their site for 2 years. Even more, they claimed to have won an award – when I clicked on the ‘award link’, it was a scam site giving out ‘awards’ to anyone who joined their affiliate program. But their site was keyword-stuffed to the brim, so they get to be on top of the SERPs. What a total joke. Where’s the quality?

    We need to forget about the major search engines. Seriously. I’m concentrating all my efforts on local trade – that means getting out in my car and meeting people. I’m placing adverts in certain magazines. I’m doing some telesales. I tell you what – it’s working. It’s slow, hard work, but it’s the way to get things done in 2006. If you haven’t got a keyword-stuffed website with 500 bought backlinks from high-PR sites, it’s effective.

    What the internet needs is for Google to get smaller, and for many other search engines to enjoy their share of search traffic. Web standards should replace SEO. SEO isn’t adding any value to the web – it’s making people write text for robots, not humans. Reward clean HTML (robots can check that). Reward rich content (employ a human). Ban spam quickly.

    p.s. I changed my default home page from Google to MSN.

  334. (And btw, to whoever it was up there who mentioned putting a “designed by” credit on a client site – I actually agree with you, and stopped doing this years ago, even though I’d gotten at least two projects directly from such a link. Somehow it just doesn’t look right, or professional, anymore. So there’s three of us.)

  335. [blockquote]Google is broken. I cannot find anything I search for, and the hard work put into a few quality sites that have been around for years is being negated and de-indexed, page by page, day by day.[/blockquote]
    That is your opinion, based on the opinion of a website owner/webmaster. The thing is, your opinion should not be the major issue of a major search engine. The “users” of that engine who actually do real-world searches looking for products or services are the major players that engines actually put their priorities on. If those “real” people doing the searching actually find they don’t like Google anymore, they seek out a search engine that serves their needs.

    This is all basic stuff… survival of the fittest. If people seek out other engines for their research on products and services, then that other engine will rise to the top, right? So far, to date, I’m not seeing that at all. The great majority of searches are done at Google. Website owners can claim “bad Google” all they wish, but it’s the real searchers who have the most control… common sense.

    This thread is still pretty much all about the “linking” thing in many minds. Even after Matt stated in another post that this is “not” all about linking, members still insist on thinking that it’s all about linking.

    Remember that “each” website is different from the others. Each site has problems that the owner really does not understand or know about. It’s very true that links could be a problem for “this” site, but it’s also true that something other than links could be the culprit. Unless or until you have someone from Google specifically “look” at your individual site, or someone more knowledgeable take a look, there is absolutely no way you can read anything into your individual site’s problems from a very blanket and general statement about “links”.

    Further, it’s certainly not in the best interest of an engine like Google to state specifically in a blog how a website should be built or how to avoid penalties, etc. That’s like giving our “secrets” to the people in the world who want to kill us. It simply makes zero sense for an engine to state exactly how its algo works at a given point in time.

    Google is doing a great job of reaching out to you all. The vast amount of info they are giving out these days should really be appreciated. But don’t think for one second that a general-type statement ‘must’ pertain to your individual site as well, as that couldn’t be further from the truth.

  336. Hi guys,

    I’m a bit worried about the simplistic concept of “relevant” or “related” content used by MC when he talks about linking and reciprocal linking.

    I’ll explain what I mean with an example: we are a hotel reservation website and we deal with hotels in various destinations of the world.

    Our “related resources” are the ones that would be _USEFUL_ for a traveller.

    As the traveller will book the hotel with us, the rest of the resources are “complementary” resources and not competitive resources.

    Examples of what we link to, and what our travellers want us to link to (as these are useful things to know if you have already booked, or are about to book, a hotel):

    – Car rentals
    – Airport transfer services
    – Bicycle Rentals
    – Art Galleries
    – Cinemas
    – Museums
    – Theaters
    – Bars
    – Food Festivals
    – Restaurants
    – Casinos (Yes, if you book a hotel in Las Vegas, you want to know the best casinos if you don’t have one inside your hotel)
    – Clubs and Discos
    – Festivals & events
    – Nightclubs

    I also have another 195 categories of resources that we regularly link to in order to build a good service for our hotel-bookers.

    As you see, these are all hotel- and travel-related resources, which makes our websites highly visited and one-way-linked, just because this is useful info for a traveller who wants to book a hotel and know more about the area.

    NOW: I’m worried about what MC says in his blog, and about the use and definition the whole SEO world has given to “relevant/related” content.

    It should be natural that a website will link to COMPLEMENTARY resources, not COMPETITORS. Therefore, the keywords to be inspected on our outgoing links are 100% different from what we sell.

    Therefore, I’m deeply worried about the concept of “related” that Google will apply, or is applying, in evaluating what type of links you have on your pages.

    MC says:

    “another real estate site……I checked out the site. Aha, Poor quality links…mortgages sites….”

    Now: is MC aware that mortgage sites are natural, relevant, and pertinent to link to if you are a real estate agent, as you might want to give related services to your visitors, telling them how to find the money to buy your services?

    Or does MC search for related content in the simplistic terms of “real estate words are good, anything else is bad”? I mean: is Google even considering the fact that a real estate site cannot link to a competitor, but will be more likely to link to complementary services?

    In short: do Google and MC want us (a hotel reservation service) to link to Hotels.com because it would be “relevant” (a complete nonsense, as they are our competitors), or is Google “mapping” the related (complementary) services for every industry?

    I doubt that Google has a map of every complementary service for any given industry; therefore, I’m afraid that “related” for MC means “same topic, same industry… competitors, essentially”.

    Will MC want Expedia to link to Orbitz, in order to evaluate Expedia’s link as relevant?

    Or will MC and Google evaluate Hotels.com linking to Avis or Budget better (or at least not worse)?

    Thanks

  337. I don’t think that anybody has agreed with Google about this issue. Probably most of this blog’s regular contributors back Google to the hilt, but I don’t recall anyone doing it about this issue. Even Doug Heil backed off and didn’t come up with a good reason why the health care directory shouldn’t have all of its pages indexed, and he’s Mr. Whitehat.

    Just because an opinion wasn’t openly voiced doesn’t mean that people don’t agree, Phil. However, what tends to happen is that the voices of discontent drown out the silent majority.

    Apparently you seem to think no one “agrees with the issue”; while I don’t see an issue, I can see exactly what Matt is saying. And it’s quite simple:

    Look inside before you look outside.

    No site is perfect. That includes your site, that includes my site, that includes Matt’s site, that even includes Google. No matter what, there is always room for improvement.

    In the case of everything Matt has discussed here, the webmaster made a mistake each time. It isn’t always an on-the-page factor, and in the case of the health care site it’s a factor that only Google would be privy to, but there has been a factor each time.

    BigDaddy has served to expose a larger number of errors than in the past – things that were previously tolerated or “forgiven” but aren’t any longer. It’s not perfect. It’s got a ways to go (particularly with scraper sites, but that’s a big mother of an issue). But it’s certainly on the right track.

    Doug Heil gets this, h2 definitely gets this, Dave (Original) gets this, I get this, and I’m sure there are others but there are just too damn many comments at this point. If I ignored yours and you do get it, it’s an unintentional and honest mistake and I do apologize in advance.

    As far as the health care site goes, I’m not sure if you read the comment I made on it, or for that matter the answer Matt gave because it was fairly buried. So I guess benefit of the doubt applies here.

    And now, the answer Matt gave (side note/suggestion for Matt…the next time you want to show why something’s wrong, either bold it or put it in its own paragraph. That way, it can’t get missed.)

    Aha, the owner said that they wanted to kill the www version of their pages, so they used the url removal tool on their own site.

    The health care site used the URL removal tool on their own site, in effect asking Google to delist it.

    What else is Google supposed to do at that point? “Oh, they didn’t put in both the www and non-www versions so they probably only want the non-www version. They don’t have a 301 redirect to accomplish this, but let’s just go ahead and guess at what they mean anyway. Then we’ll have a nice fat canonical issue to deal with from webmasters who wonder why their non-www is listed and not the www version.”

    I don’t think there’s an issue at all, other than what people need to do to fix their own stuff.
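
    On the 301 point: the canonical fix is to answer every request for the unwanted hostname with a permanent redirect to the preferred one, so Google only ever sees one version of each URL. Most sites would do this in their web server configuration; here is the idea sketched in Python with Flask (the framework and the hostname are illustrative choices, not anything from Matt’s answer):

        # Sketch: force one canonical hostname via 301 redirects.
        # CANONICAL_HOST is a made-up name for illustration.
        from flask import Flask, redirect, request

        app = Flask(__name__)
        CANONICAL_HOST = "www.example-directory.com"

        @app.before_request
        def enforce_canonical_host():
            # Any request for a non-canonical host (e.g. the bare domain)
            # gets a 301 "moved permanently" to the www version.
            if request.host != CANONICAL_HOST:
                path = request.full_path.rstrip("?")  # drop empty query marker
                return redirect("http://" + CANONICAL_HOST + path, code=301)

    With something like that in place, the www/non-www guessing game described above never arises: the server itself states which version is canonical.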

  338. There are still many sites with hidden text coming up high in the rankings on Google. Also, there are still a TON of dead links out there. As always, thanks for the update!

  339. So is it OK to use reciprocal links if they go to on-topic, real, non-spam sites, like lawyers linking to lawyers and dentists linking to dentists? I know of a dentist site where all of his links come from paid links in directories, plus he has a reciprocal link program. His site has a directory that he uses to exchange links with other dentists. And this dentist ranks for the terms he wants to.

  340. Doug Heil – “That is your opinion, based on the opinion of a website owner/webmaster. The thing is, your opinion should not be the major issue of a major search engine. The ‘users’ of that engine who actually do real-world searches looking for products or services are the major players that engines actually put their priorities on. If those ‘real’ people doing the searching actually find they don’t like Google anymore, they seek out a search engine that serves their needs.”

    Ummm, many of us ARE regular searchers, also looking for services and products. Many of us are experts in our industry, in that we know which sites are complete junk and which are not, and which sites “should” be listed for the average JOE. Also, if every site one day were to exclude Google from indexing, you would see just how important us webmasters are – even more important than the average JOE surfer, actually. If this happened, Google would have no product. No matter how much they tweaked the algo for the average surfer, it would do no good, because the base product (us) would no longer exist. In business the PRODUCT always comes first, with the “user” in mind. Service of the product comes second.

    Remember that Google was built by the PR of us webmasters. We helped start it, even before the media PR took off. Only after did the average Joe surfer follow along. The same CAN and WILL happen in reverse. I could walk up to 20 people who know I do business online and say “Google is no longer the thing – MSN is now the popular engine”, and people would actually listen: “He does business online and knows what he is talking about.”

  341. Thanks for the update.

    Referring to the comments about irrelevant links and link networks, it appears G is trying harder to give priority to relevance. I always did see G as the SE with the most weight on relevance. Naive maybe, but I’m convinced relevance will win the search market.

  342. Like someone above said, at this point I don’t care as much about where I rank as about just being indexed. Looking at what Google has of my site: if I do a site: search I find about 100 pages in the “main” index (a bit under 10% of the site, I guess) and the rest are supplemental. Yet when I try a search quoting a sentence from a supplemental page – no results found. I thought that when the “main” index was exhausted, results should be pulled from the supplemental index? Have I missed that somewhere along the way? The supplementals don’t seem to be used in general searches at all from what I’ve tested, EVEN when the main results come up empty. It looks like most of the site is indexed, but the supplemental pages are just not being offered up.

    Sure, I probably don’t have tons of quality links coming in. I can do a link: search and only see a couple (one from another of my sites), but if I search for “mydomain.com” then I see tons (and yes, most of what I’ve looked at there really are links, not just the text). Of course, they may not be “worthy enough” links to pull my site out of supplemental banishment, but amazingly some of those sites seem a LOT easier to find in the results, which makes me think they have higher PageRank.

    My biggest concern is the uselessness of the “site search” from google. Since text from the supplementals are not searched I’ve had to start putting in another site search so that things can actually be found.

    As for the lower crawl priority for sites that don’t have quality incoming links – I’m getting crawled every day. Just out of curiosity I checked a couple of pages against the site: search, and indeed many of the supplemental pages are getting crawled. It just doesn’t quite add up. From what you’re saying, here’s what I would expect. From the crawl end: lower-quality inbound links mean pages are more likely to go supplemental and get crawled not every day, but once in a blue moon, while pages that are “higher quality” due to their backlinks get checked more often. From the search end: a search term that pulls up results in the main index gets supplementals offered as an additional resource, and a search term that pulls up NO results in the main index gets an answer from the supplemental index. This doesn’t seem to be what’s happening. I’m getting almost continuous crawling, even of supplementals; they’re not updated in the index (old caches in the site: search for supplementals), and they aren’t showing up as even being there when a quoted-text search is done.

    I’ve already documented some of my other frustrations with placement in the results in other venues – how my MAIN computer service site always seems to be the dead last result, even for the quoted text of the title of the page (which is quite long and includes my name). It’s almost as though I’m being penalized for something else, although I don’t know what it could be other than the lack of “quality inbound links”. What’s ironic is that most of the sites that come in above me in that search are actually linking to my page… go figure…

    But to bring it full circle – what is most concerning is not “where I rank”, but the brokenness of “site search” and the fact that Google is now useless to me for finding things that I’ve written on my own site. What’s truly ironic is that there is VERY current and full coverage in the blogsearch.google.com area, but I don’t think there’s a “site search” box that I could put on my site for the blogsearch.

    Are supplemental results supposed to be offered up if no results are found in the main index?
    Is there a penalty against my main site?

    Thanks for any feedback.

  343. Google is just one website. The internet consists of billions of pages. Google has reached its zenith (surely?) in market share. Businesses that are turned off by the way Google works will find alternative methods. This is the way business works. Google can continue to ignore the complaints all it likes. It will see its search traffic go down, and advertisers use other PPC networks. You can be the main player and choose not to listen to your customers. Eventually you lose business. Eventually competitors find new strength from your complacency.

  344. Adam.

    I don’t recall anyone agreeing with Google on this issue – that’s what I said. If people chose to be silent, I can’t help that, but it’s rare for such silence in this blog.

    I asked Doug to come up with a good reason why the health care site shouldn’t have all of its pages indexed, and he didn’t come up with anything. I’ll ask you to do the same – please come up with any good reason why the health care site shouldn’t have all of its pages indexed. We’re all ears.

    I didn’t forget that the owner had made a mistake with the delisting thing, but that wasn’t Matt’s answer to the problem (the delisting had lapsed some weeks ago). Matt’s answer was:

    That said, your site also has very few links pointing to you. A few more relevant links would help us know to crawl more pages from your site.

    Now why would Google need help to know to crawl more of the site’s pages? Google already knows that the pages are there – they have URLs. And why would more IBLs help them to know?

    What difference does it make if a site has only one IBL or a thousand IBLs? Does having only one IBL make it a bad site that people would rather not see? If it does, why have ANY of its pages in the index?

    These aren’t rhetorical questions, Adam. I’d like answers to them please.

  345. Remember that Google was built by the PR of us webmasters. We helped start it, even before the media PR took off. Only after did the average Joe surfer follow along. The same CAN and WILL happen in reverse. I could walk up to 20 people who know I do business online and say “Google is no longer the thing – MSN is now the popular engine”, and people would actually listen: “He does business online and knows what he is talking about.”

    While I do believe in the awesome power of GWOM (Geek Word of Mouth), it was far from the only factor.

    What about ISPs that use Google for its default search?
    What about the Netscape tie that existed for years?
    What about AOL (I don’t count AOL as an ISP because anything that destroys a TCP/IP stack is not a real ISP)?

    And once something like that is entrenched into users’ behaviour, it’s very difficult to remove it.

    I could probably do the exact same thing you just said…tell 20 people to use MSN and get them to do it. Hell, I could walk into one office alone and do that and have it done in about 30 seconds. But it wouldn’t accomplish a damn thing, because those 20 people would tell no one else. So I told 20 people to switch out of the billion that presently use Google. If all the webmasters that hated Google right now did the same thing, you might get a million people to switch. A drop in the bucket.

    “Us webmasters” don’t all share the same point of view anyway. I don’t have a problem with Google SERPs for the most part and finding what I want, assuming I’m doing a completely objective search (about the only thing it seems to give me problems with are used car parts). Doug obviously doesn’t. And even if we did, we’re a small portion of the community. You’re a drop in the bucket, I’m a drop in the bucket, Doug’s a drop in the bucket. As he quite rightly pointed out, it’s the overall user perception that matters, not what a few egocentric people who didn’t get their way and want to bitch in Matt’s blog think.

  346. Google faces an insurmountable problem. Making EVERYONE happy.

    What I’m starting to realize is that we aren’t going to rank #1 for every single keyword we desire. It simply isn’t fair, because there are sites that are more relevant for certain words than ours might be. It’s frustrating, yes, but it’s life. I’m actually starting to respect sites that are better, or more relevant, than mine and that rank higher for certain keywords. I say: okay, this site is solid and it deserves to be here. I still want to beat it, though, so I may optimize and enrich my site to compete. Complaining won’t solve anything…

    However, the annoying aspect to all this is the garbage sites interspersed among the results. That is what pisses everyone off, including myself. You know these sites don’t belong above yours, or among the results period! And you feel that it is so obvious, yet no one seems to be doing anything about it. Problem is there are billions of sites and keywords, so what seems like a simple fix is multiplied by a billion or so making it a severe task.

    So in closing instead of worrying about the garbage sites, worry about the sites that are better than yours, and that rank better for your desired keywords. That should be the focus. The junk will always be there. The web is simply a microcosm of life, and life contains junk.

    Word.

  347. The heart of the issue:

    An ‘objective’ algorithm making a ‘subjective’ decision. The only person who can make a true assessment of the ‘value’ of my site, or any other, is the user. But the user will never get to my site, because a computer formula has decided it isn’t ‘worthy’.

    Doug said “This is all basic stuff… survival of the fittest.”

    No, it’s not. Like most things in our society, it has devolved to what it always devolves to: he who has the money makes the rules, although in this case it’s he who has enough money to brand through traditional media gets the ranking.

  348. Phil…normally I would tell someone who asked me a question in the manner that you did to stick it straight up his ass for being so damned arrogant. (Seriously, dude, you need to let up on that. You come across as being very elitist sometimes.)

    But I’m not going to do that in this case.

    Here’s how the scenario, as I see it, would have played out:

    1) Health care site gets listed in Google, has a series of 6 IBLs.
    2) Site inadvertently asks to get delisted. (Webmaster error.)
    3) Time lapses on the delisting request and again, webmaster is unaware.
    4) Webmaster presumably doesn’t file a reinclusion or resubmission request (I don’t know for sure…Matt would have to fill in this blank)

    At this point, the site for all practical intents and purposes is a “new site” again. How does Google know whether or not it’s worthy? There would have been no fresh IBLs to suggest that other webmasters find the content worthy enough of a backlink…the webmaster would not have indicated to Google that he/she wants to be back in, other than submitting what apparently was a manual request via email.

    Matt may have offered an opinion, but it’s just that…an opinion. It’s subjective. So if Matt likes the site, it should be in there? What happens if he hates one? Can you imagine the ramifications and bitching that would go on if Google started filtering results on personal whim?

    Now…at this point, two things are known:

    1) There are very few IBLs, and presumably none since the reinclusion request was made.
    2) The webmaster had submitted a delisting request and had done nothing to indicate that he/she was still interested in being part of the index.

    How would anyone or anything evaluating the scenario objectively be expected to know whether or not a site should be included in the index given those parameters? It was a site with very little external credibility and even less recent credibility that had already said it wanted out (accidental or otherwise).

    Google doesn’t have anything else to evaluate on, other than the inbound links in this case.

    It can’t go based on the spidering of the page alone, because that may not reveal all of the information related to that site.

    It can’t go by Matt’s personal opinion of the site, because it’s one person’s opinion, very subjective, and prone to human error.

    So the only thing it can go by is a series of links provided by other webmasters. In this particular case, the recent backlinks would be more beneficial since it would establish that the health care site is being viewed in a positive light by those who link to it.

  349. “What about AOL (I don’t count AOL as an ISP because anything that destroys a TCP/IP stack is not a real ISP)?”

    AOL came around AFTER they became popular. Even then, most people (even when Google was powering Yahoo) thought that AOL search was just that, and Yahoo search was Yahoo’s. How many “average surfers” knew that LookSmart was feeding MSN for years before MSN developed their own?

    “because those 20 people would tell no one else.”

    How do you know for sure? You state it as fact, but it is just an opinion.

    “‘Us webmasters’ don’t all share the same point of view anyway”

    Nobody shares the same point of view due to unique life experience.

    “I don’t have a problem with Google SERPs for the most part and finding what I want”

    The key phrase is ‘for the most part’. The problem is, if sites in the industry are not fully indexed, what information, points of view, and products are you missing that COULD be there? That is why we are complaining – NOT because a site dropped from #4 to #20 or whatever, so long as there is fair and equal opportunity.

    “As he quite rightly pointed out, it’s the overall user perception that matters, not what a few egocentric people who didn’t get their way and want to bitch in Matt’s blog think.”

    And where does that overall perception come from? Sometimes a few egocentric people have changed the world, incited revolutions, etc.

    “And once something like that is entrenched into users’ behaviour, it’s very difficult to remove it.”

    The thing is…it really is not all that difficult to change it. A behaviour can be changed in an instant. A moment of decision. All there needs to be is something to interrupt the pattern and a viable alternative, with that alternative being reinforced. That is all! The difficult part is just being able to do it.

  350. I am new to SEO, and trying to learn,
    and as I read in the post, a lot of the time there are “fake”, software-made websites with bad content, built only for AdSense.

    Anyway, I have a question: what about new websites? Are they indexed, given that they have no PR and no links?

    So people will not find my 3-month-old site when they type a search into Google.

    And what about the future of article submissions?

  351. Adam.

    I’m sorry if the way I write isn’t very good, but I’m just trying to write logically and put points across as clearly as I can. I don’t intend to talk down to anyone, but asking questions in such a way that answers are almost a requirement is intentional – to make points 😉

    The things that we know about the health care site are:

    (1) Some of it was delisted by request, but not all of it, so it doesn’t have to start from scratch.

    (2) It has six IBLs that it’s reasonable to assume are not new. That’s reasonable because it was fully indexed before the delisting mistake (at least one IBL was necessary), and since then some of its pages have stayed in the index.

    (3) Matt, who is the expert at finding wrong things with sites, didn’t see anything wrong with the site – and he looked – so the problem isn’t internal.

    (4) Matt said clearly that, if the site got some more IBLs, then it would get more pages indexed (my paraphrase).

    I haven’t suggested that all of the site’s pages should be in the index at this moment in time – we know it takes time for that to happen. What I’ve asked for is any good reason why all of that site’s pages should not be indexed.

    The reason I’ve asked is because Matt’s comment about the site made it clear that, with more IBLs, more of its pages would be indexed, and by inference, that if the site doesn’t get more IBLs, then it’s not likely to have all of its pages indexed. Specifically, he said “With that few links, I can believe that out toward the edge of the crawl, we would index fewer pages.” I want to know why.

    My big objection is that IBLs have nothing whatsoever to do with the quality of a site, and should have nothing to do with how many of a site’s pages are indexed. That’s the reason I’ve been using that particular site as an example. If you can come up with a reason why all of its pages shouldn’t be indexed, then please tell me. Or if you can come up with a good reason why the number of IBLs *should* be used to determine how many of a site’s pages should be indexed, and how many to leave out, please tell me, because I can see no sense in it.

  352. I totally agree with DavidW and the rest of the folks who wrote about dividing the net into haves and have-nots. At the moment it all boils down to whether you play the backlink game or not. But even if you want to play that game – for some non-commercial sites with good content that’s just not feasible. If you’re in a niche like our enthusiasts’ Audi S and RS models website, it’s quite hard to get decent links, and it gets even harder if you’re in a niche and your website language is NOT English. Where should we get that many high-PR links from to get a deep Googlebot crawl into our discussion board topics? English websites usually don’t link to us or blog about us. Of course we’re using Sitemaps, but that obviously doesn’t help as long as good IBLs are missing. It’s a lot easier if you play the backlink game in the English market because it’s so huge.

    Looks like non-English niche websites with good content are going to be the losers?

    Cheers,
    Jan

  353. So isn’t there a tremendous amount of ambiguity in who exactly determines the “quality of links” you speak about? Will the site that I am describing be penalized somehow (fewer pages crawled because of the digital camera links and Incorporation links)?

    Also – the site is big and ranks in the #1-5 spots for high traffic keywords in the webmaster space in Google. How is this consistent with your post?

    Thank you for giving us an outlet to read about Google and respond!

  354. Incidentally, Adam, Google did become popular through the buzz that web-type people caused. The tie-ups that you mentioned came afterwards.

  355. “I asked Doug to come up with a good reason why the health care site shouldn’t have all of its pages indexed, and he didn’t come up with anything. I’ll ask you to do the same – please come up with any good reason why the health care site shouldn’t have all of its pages indexed. We’re all ears.”

    Show me the exact site in question, and I might be able to give an “exact” answer of some kind. Phil; you know better than most that it’s impossible to give a general answer that will fit ‘most’ other sites as well. I thought I gave you some possibilities with a prior post? How many more answers do you want me to give? I know it’s not the answer you were looking for… like “bad Google”, but it’s the only answer I can give without actually viewing the site, right? Please read my prior post where I gave some answers on that health care directory.

    The Adam that doesn’t belong to Matt; … who the heck are you? It seems you read my/our stuff as you are “spot on” with my way of thinking. LOL Great post!

    arubicus;… read “the other Matt’s” post again and again. “We” are a very small minority compared to the internet users as a whole. I sometimes think that this “little” community of webmasters/owners/SEOs actually thinks we are some huge majority of the internet, and that Google and other major engines should bow down to us because of it. Believe me; we are very small.

    I agree with you that “we” indeed are regular users/searchers as well, but we are small. You can’t only look at certain groups who may be unhappy with google serps right now, but you have to look at the big picture.

    It IS only survival of the fittest. The major search engine with the most users who do searches daily is the engine that gets the majority of searches. It ‘is’ common sense stuff, right? Just because a few owners are unhappy does not mean the “majority” are unhappy as well. IF AND WHEN the majority are unhappy with Google, then that majority will move to find another search engine, right?

    That’s called…. survival of the fittest, and has zero to do with how much money anyone has. It does have to do with common sense.

    Keep this in mind as well; do you all realize “how tough” it would be to give an answer to a question if you were an employee of a large company, and that answer had to appease a whole bunch of sites other than the one it was given to? My goodness; what a tough job it would be. What’s the result of Matt giving answers to questions in here?

    The result is many more questions pop up because of that particular answer he gave. Why is this? Because “each” site has its own little set of many hidden problems that are impossible to know about unless the site is manually reviewed and diagnosed. Speculation about problems is just what it is… speculation.

  356. “Incidentally, Adam, Google did become popular through the buzz that web-type people caused. The tie-ups that you mentioned came afterwards.”

    Yep. Us webmasters helped get them started. Usually this is the way it works on the internet. The WOM buzz is much stronger in the online world. Look at MSN and ASK running commercials for their searches. Not much impact was made. Now if MSN, Yahoo, or ASK creates a better search result, then the buzz will be on them, starting through the webmaster community. This buzz sets a foundation of confirmation that they do indeed have better search results, which in turn grounds a basis for perception change for the “average Joe” (interrupts patterns, challenges current habits and associations, sets the basis for decision/change). Since many webmasters and online business owners know their industries, our word tends to hold more weight and resonate longer.

    Major search players, AOL and the like, may switch to whoever provides their users with the best experience. This would be a major pattern interrupt for the “ignorant average Joe” and sets up patterns of change for them. Of course, the more they use the new search, the more new habits form and the new brand gets associated instead.

    This is the SAME route that Google took!

  357. Matt,

    Kudos to the team for their job on Big Daddy.

    Dropped a set of my sites from over 100,000 site: listed pages to around 50. Average PR = 6. But you still manage to suck up about 400K hits & 1Gig of bandwidth each month.

    WTG. Use my resources and my money to collect and analyze my sites without even allowing an accurate site index.
    NP though, as the sites are very popular and successful with the other SEs.

    I truly hope G attains remarkable growth. Just as the anti-Rockefellers did some 100 years ago.

    “(1) Some of it was delisted by request, but not all of it, so it doesn’t have to start from scratch.”

    It likely isn’t starting from scratch. It just isn’t going to have the other stuff indexed.

    “(2) It has six IBLs that it’s reasonable to assume are not new. That’s reasonable because it was fully indexed before the delisting mistake (at least one IBL was necessary), and since then some of its pages have stayed in the index.”

    Agreed.

    “(3) Matt, who is the expert at finding wrong things with sites, didn’t see anything wrong with the site – and he looked – so the problem isn’t internal.”

    No offense to Matt, but again, he’s human…he can overlook something. The initial comment appeared to be a surface-glance kind of thing. It’s probably accurate…but possibly inaccurate.

    “(4) Matt said clearly that, if the site got some more IBLs, then it would get more pages indexed (my paraphrase).”

    That would make sense.

    “My big objection is that IBLs have nothing whatsoever to do with the quality of a site, and should have nothing to do with how many of a site’s pages are indexed.”

    If you look at IBLs individually, that would be an accurate statement. 1 or 2 backlinks, for the most part, shouldn’t make a difference either way.

    But would it not be reasonable to assume that a site with 10,000 backlinks, from various sources and varying degrees of credibility, is more useful to the population as a whole than a site with 1? I think it would. I might not like the site with the 10,000 backlinks, but the majority of other people would. And the variety of sources indicates a prevailing opinion as opposed to that of just one person.

    If the site continues to gain backlinks and credibility, then why shouldn’t it be indexed more often and more deeply? It shows at least one of two things:

    1) The webmaster is sufficiently proud of his/her work to be able to promote it.
    2) Other sites find it to be a valuable resource.

    Now, I don’t know how many backlinks a site should get in order to be indexed fully. That’s an arbitrary number that could be debated until we all turn blue in the face. But a webmaster that’s so concerned about getting indexed, never mind ranked, should have a lot more than 6 links to his/her domain. Links bring in traffic, directly or otherwise…why wouldn’t any webmaster try to get as many as possible?

    Here’s a scenario involving a small number of IBLs:

    http://www.google.com/search?q=%22216.89.218.233%22&hl=en&lr=&rls=GGLG,GGLG:2006-19,GGLG:en&pwst=1&filter=0

    For those who don’t know what the IP is, that’s my server’s testing IP address (or redirector depending on what I want to do with it). There are backlinks there…a small number.

    But is any of that content worthy? By my own admission, no. It’s all testing stuff.

    How would a search engine be able to determine something like that? It’s got a link. Google knows it’s there. There’s nothing blocking robots from indexing it.

    Think of the crap and the potential for manipulation my scenario allows for webmasters who decide they want to be sneaky.

    Let’s take the scenario above and put a slight twist on it (just for hypothetical sake).

    Domain A is bought and used by Company A.
    Company A puts up a site, gets 6 IBLs.
    Company A submits a delisting request.
    Company A lets Domain A expire.
    Company B snatches Domain A for purposes of building a new site.

    Now…would Company B be entitled to the credit for the backlinks that Company A went out and got beforehand? No. They didn’t do any of the work…all they did was snatch a domain name. It’s still a new site and hasn’t established anything since the delisting request.

    How many times does that scenario play out? Quite a bit. And that’s not all that different than what we’re seeing here.

    And finally, the biggest problem that not one person who has whined about the IBL issue has yet answered:

    What is a more effective measure to determine whether or not a site provides a valuable resource to users and deserves to be indexed fully, without potentially diluting the existing results?

  359. “What is a more effective measure to determine whether or not a site provides a valuable resource to users and deserves to be indexed fully, without potentially diluting the existing results?”

    This is the question of all questions! Even Google hasn’t figured out the answer. Something nice to ponder, really. It seems to me it is becoming time to stop playing the manipulation game and the lazy whining “I created it and I think it is good, so it should rank” game, and start playing the “quality site/business that has a unique whatever to gain interest in the internet community” game.

  360. The comments on the wasted efforts and strategies are indeed an example of the strength of the human spirit. Like wanting to commit suicide, but just can’t get it done. The truth is that the art of Google is a myth. They do anything they want because they have the ball and bat in this game. Webmasters are playing T-ball thinking they are playing for the Yankees. Am I the only one that has noticed that they are a monopoly? That is supposed to be illegal in the US. No wonder they are going to China; that form of government is more along the lines of what Google would want.

  361. Matt,

    I understand you believe no one out there is experiencing any more problems related to Big Daddy. However, is anyone at Google looking at the results being returned now? It seems like Google is partying like it’s 2001. I can find hardly any sign that modern websites are being returned. Is the new ranking priority that a site must be old and not redesigned in 5 years? Is Def Leppard at the top of the charts again? Is Google Search going to be renamed Google Retro? In case you haven’t noticed, things are getting ugly.

    -Jim

  362. Hi Jim, I honestly don’t see what you are seeing as far as serps go.

    http://www.google.com/search?sourceid=navclient-ff&ie=UTF-8&rls=GGIC,GGIC:2005-09,GGIC:en&q=children's+gift+baskets

    has not changed at all…

    http://www.google.com/search?hl=en&lr=&rls=GGLC%2CGGLC%3A1969-53%2CGGLC%3Aen&q=ethical+seo&btnG=Search

    Has not changed at all.

    I could do searches on many, many phrases, and I don’t see any serp changes that look bad. For everyone who watches over or owns a site that dropped out of the results, I’m sure there is more than one other whose site has stayed the same or has gotten better.

    This thread is a tiny minority of people.

    And just because a website is not being shown in its entirety when doing a site: search on the domain does not mean a whole bunch. Has anyone thought about the idea that Google is doing lots of changes right now, and really doesn’t want to “spill the beans” until things are all done?

    My forums are not showing but a couple hundred pages now, but that hasn’t stopped all the referrals we get from Google daily.

    I simply refuse to get all up in arms about something unless I know things are set the way they will be for awhile. All of this whining, etc. does no good. To make a drastic change “right now” does no good either. Trying to decipher whatever Matt says about something, and then “making” that answer pertain exactly to your individual site’s situation, is certainly doing no one any good either.

  363. It seems to me it is becoming time to stop playing the manipulation game and the lazy whining “I created it and I think it is good, so it should rank” game, and start playing the “quality site/business that has a unique whatever to gain interest in the internet community” game.

    Now THAT I’ll drink to. 🙂

  364. Is Def Leppard at the top of the charts again?

    Some of us LIKE Def Leppard. Damn new music sucks ass now. 80s glam hair rock and acid-washed jeans forever!

    And a little Mecca Lecca Hi Mecca Hiney Ho (for those truly on that higher plane of consciousness and understand that very obscure 80s reference.)

  365. Good post, but there are still too many problems with Google. In the Las Vegas travel market, the top 10 results have been the same for over 2 years for the keyword “Grand Canyon Tours”; for many other search terms, the results have been stagnant.

    We are led to believe these older sites have been grandfathered in.

  366. Why are people still talking about the influence links should have on getting a site indexed? This has nothing, or should have nothing, to do with getting a site indexed. Links should affect rankings, not whether and how much of a site gets indexed. As far as links showing the popularity of a site, we all know this is not even remotely true. It is very easy for a site to get tons of links, and this should show Google that it is a horrible thing to base rankings and/or indexing on.

  367. Matt, you mention a problem with hyphenated domains that you say you think is solved.

    Could this be affecting pages with URLs like this?

    http://www.domain.com/nice-web-site-in-my-head.html

    If it could be, that could explain quite a lot of the pages I’ve lost from the index.

  368. Dave (Original)

    RE: “Now why would Google need help to know to crawl more of the site’s pages?”

    Perhaps they want quality over never-ending quantity. In other words, if they DID index and list all those ‘other’ pages, they would never rank anyway.

    PhilC, I think you are HUGELY mistaken in assuming silence means agreement/disagreement. The VAST majority of the people who come here ARE biased and generally only look at an EXTREMELY minute part of the whole picture. Of these, only those with PROBLEMS generally post.

  369. Caios,

    Vanessa Fox just addressed this very issue on their blog.

    http://sitemaps.blogspot.com/2006/05/issues-with-site-operator-query.html

    And further information can be found in the newsgroup in the response from Google Employee here: http://groups.google.com/group/google-sitemaps/browse_thread/thread/0fc2ae32ef28da7e/961cf2c0421208fc#961cf2c0421208fc

    Where they say, “One issue is with sites with punctuation, which definitely affects you. We’ll keep you posted as we get this resolved.”

  370. Doug. I already explained which site I was asking about – more than once. It’s the health care site example that Matt gave. There is enough there to answer the question based on Matt’s information alone. But I don’t think you want to answer, do you, so forget it.

    Adam. You are talking from a web-type person’s point of view, but I am talking in general terms. The vast majority of people who have sites on the Web wouldn’t have a clue about getting some “buzz” going to gain links, as Matt suggested – and not only in this thread. They just want to put their sites online so that people can find them. It’s not a search engine’s job to decide whether or not a site has value for anyone, or for how many people it might have value. It’s their job to index what they can, and dump spam as and when they find it in their index.

    It’s not a search engine’s job to determine which sites provide the most valuable resource. And even if it were, there is absolutely nothing to suggest that a site with 10,000 IBLs is any more valuable than a site that has 0 (zero) IBLs. For example, which site is more valuable to me right now? The one where I can order a pizza or Google? I’m starving and I really want a pizza, so the local pizza site is far more valuable than Google is right now.

    You see, a site can have great value to one person, and a site can have great value to a very large number of people. They are both equally valuable because the value of the pizza site to me is as great as the value of the other site is to you, and to him, and to her, etc. The degree of value is equal.

    Can you say that the pizza site should not be in a search engine’s index, just because there aren’t many of us who are likely to look for it? Of course not. Can you say that all of its pages shouldn’t be indexed because there aren’t many of us who value it? No.

    You haven’t noticed the answers that you say nobody has given yet? I’ll try to make it clearer…

    If Google is short of space, and they need to limit the number of pages in the index, then fair enough – let the most popular sites have the bigger shares, because they are wanted by more people, and IBLs could give some indication of that. But if there is no shortage of space, then limiting the number of pages that a site can have in the index is simply wrong. It’s editorial, and it’s not what Google’s users want or expect from them. It’s not a search engine’s job to be editorial.

    It would be acceptable to take IBLs into account if there was a reason to do it, such as a shortage of space. But there isn’t a reason to do it that we know of, so IBLs aren’t needed as an effective measure to determine whether or not a site provides a valuable resource.

    You can argue as much as you like, but you still can’t come up with a valid reason why any decent, perfectly clean, website should not be fully indexed, given that there is plenty of space in the index.

  371. Dave. I never referred to people who were silent, and I didn’t assume anything about them. I referred only to people who posted in this thread.

    Perhaps they want quality over never-ending quantity. In other words, if they DID index and list all those ‘other’ pages, they would never rank anyway.

    Perhaps they do want quality over quantity, but if they do:

    (a) IBLs are entirely the wrong metric to use for measuring quality.

    (b) If pages are allowed in the index based on IBLs (and PageRank), and some sites only get some of their pages in, then NONE of the site’s pages should be in, because the low IBLs and PageRank score would mean low quality. Of course, they don’t mean that, so quality isn’t the issue here.

    (c) It’s not a search engine’s job to decide what their users want to see and what they don’t want to see.

    I don’t think for a moment that Google is doing this in an attempt to index quality. There aren’t any programmes yet that are remotely capable of doing that. I’ve no doubt that it’s to do with spam.

    As for your last point, there is an enormously long search-term tail, and pages will rank highly in it.

  372. That should have read…

    I’ve no doubt that it’s to do with spam, or they really are short of space.

  373. “It’s not a search engine’s job to determine which sites provide the most valuable resource.”

    So it’s the actual website’s job to tell Google that it is the most relevant resource, right?

    I see.

  374. Dave (Original)

    RE: “You can argue as much as you like, but you still can’t come up with a valid reason why any decent, perfectly clean, website should not be fully indexed, given that there is plenty of space in the index.”

    Oh, there is always a reason, we just don’t know for a *fact* what it is. However, Google do know for a fact why. In fact, Matt has stated part of the likely reason (see his disclaimer). You just don’t agree, that’s all. However, I feel VERY safe in saying that Google are in a better position than yourself in determining what and how much they index.

    RE: “I referred only to people who posted in this thread”

    Exactly! These people all have their own barrows to push and are extremely biased. Most post with PROBLEMS, not PRAISE. However, Google (I’m sure) is more wily than simply giving the squeakiest wheel the most oil.

    RE: “It’s not a search engine’s job to determine which sites provide the most valuable resource.”

    But the SE (in Google’s case) probably isn’t determining that as a whole. Other sites are, and Google searchers are, by the search terms they use and the sites they visit.

  375. I would like Matt to look at rent.com. Since he reviewed several real estate sites, why not include a big one owned by eBay?

    Look at the bottom of the page and look to where it says “Other eBay companies: eBay | Kijiji | Shopping.com | Epinions”

    How are some of these related to real estate? These are just as bad as ring tones.

  376. The Adam that doesn’t belong to Matt; … who the heck are you? It seems you read my/our stuff as you are “spot on” with my way of thinking. LOL Great post!

    You know that guy who can generally fit into a conversation with most, if not all groups of people, comes in like a bat out of hell, raises some issues, makes people think, and then disappears as if he never existed in the first place?

    That’s me. 🙂

    Okay, seriously…I’m a webmaster of about 7 years (holy Christ, that’s a long time), and to be totally honest there was a time when I would have said a lot of the things that others were saying…why isn’t my site ranking, what’s wrong with Google, etc.

    I then came to the realization that no matter what I think about my own work, it’s going to be biased. So it’s not up to me to decide…it’s up to others to decide. And if they decide against what I’m doing, then I should listen and try to improve where and when I can rather than sit there and throw a hissy fit about it.

    Unfortunately, as in many ideas and concepts that I have, it’s probably a few years too early. 🙂

    I’m what you’d also call a gun for hire…I work for a select few clients now (used to have a lot more, but it didn’t work for me as a business model), and come and go pretty much as I please.

    There are people here (Aaron Pratt for one) who might be able to tell you a bit more about me, since I’m not all that comfortable talking about myself.

    Anyway, that’s my story. What’s yours?

    Phil: I’ll get to your unique brand of inciteful ranting in the morning. I actually did type out a post in response, but it took me too long and the captcha tool kicked in and it got erased so to hell with it, I’m not doing it again until then.

    I do, however, have one question for you to chew on in the meantime. You say that IBLs are the wrong “metric” (side note: does anyone else hate this word and find it to be a corporate buzzword, or is that just me? Just wondering.) What would be a better way to do it, and why?

  377. Matt,

    How hard of a mod would it be to have the number of unique commenters along with the number of comments? I think this entry may indeed beat both. 🙂

    Also, you should maybe add a compulsory spell checking step to people posting. Just a suggestion. Able to post anyways, but at least force them to view the mistakes first…

    -Michael

    PS Got the invalid security code thing again, not everyone is going to think to select all and copy just before posting. 🙂

  378. Dave (Original),

    I’m guessing that your background is non-technical? It’s just that you seem to have enormous and unfounded faith in Google’s algorithms. Anyone who knows the first thing about the limitations of any algorithmic approach just wouldn’t say things like “Google know better than you do!”. You know you are comparing an algorithm’s fraction-of-a-second, rule-based perception to that of a person who has spent years developing their site, right? You know that these algorithms don’t actually “understand” anything, right? The truth is, you are just assuming that all of those complaining are lying. A judgement you are clearly not in a position to make.

  379. What is a more effective measure to determine whether or not a site provides a valuable resource to users and deserves to be indexed fully, without potentially diluting the existing results?

    How can a page that ranks 1000+ in the SERPs possibly dilute anything?

    No one can link to a page that they cannot find. They are certainly not going to find a page if Google refuses to index it. A site should never need more than a single link just to have its pages indexed. Actually, no links should be necessary to have an entire site indexed. Telling Google directly with a sitemap submission should be enough.
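
    (To make concrete what I mean by a sitemap submission – this is roughly what a bare-bones Sitemaps file looks like; the URL here is made up:)

        <?xml version="1.0" encoding="UTF-8"?>
        <urlset xmlns="http://www.google.com/schemas/sitemap/0.84">
          <url>
            <!-- one entry per page you want crawled -->
            <loc>http://www.example.com/tomato-plants.html</loc>
            <lastmod>2006-05-16</lastmod>
          </url>
        </urlset>

    Every page is handed to Google right there; no IBLs are needed to discover any of them.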

    This is not about SPAM nor is it about rankings. It’s about being deemed worthy enough to have your pages indexed based upon nothing more than links.

    This leaves it up to the webmaster or site owner to manufacture enough links deemed worthy enough simply to get their content indexed. How is this a good thing?

    In the end, it will be the searcher who decides. A searcher looking to buy tomato plants in New Jersey, or looking to find a carpet cleaner in Wyoming, is likely to be just as disappointed in Google as the merchants whose pages it refuses to index.

    Dave

  380. I wrote:

    It’s not a search engine’s job to determine which sites provide the most valuable resource.

    Doug Heil replied:

    So it’s the actual website’s job to tell Google that it is the most relevant resource, right?

    I see.

    You forgot your glasses, Doug. Value and relevancy are not the same things – not by any stretch of the imagination. But since you asked, it’s a search engine’s job to determine relevancy to a search query – it isn’t a search engine’s job to determine the value of a website.

    Dave (Original).

    You are correct that Google is in a much better position than me to determine what and how much they index. They are the *only* people who can make that determination. BUT, they are not in a better position than anyone else to decide what should and should not be indexed. Everyone can have opinions about that. The only difference is that Google are able to go with their opinions, but it doesn’t necessarily mean that their decisions are the best ones, or even the right ones.

    Look. I made an accurate assessment about the posts in this thread. It made no attempt to include opinion that hadn’t been expressed here. You know as well as I do that there are a great many hard-line Google supporters who post in this blog, and they had not expressed support for Google about the new crawl/index criteria, up to the point when I made that assessment. Alright? Please stick to the topic, and forget that sideline. The assessment was correct at the time of writing it, which was way down the thread. And even now, there are only a couple of people who appear to be supporting Google’s new crawl/index criteria, and I don’t think I’ve seen any outright statements of support from them – yet.

    Phil: I’ll get to your unique brand of inciteful ranting in the morning. I actually did type out a post in response, but it took me too long and the captcha tool kicked in and it got erased so to hell with it, I’m not doing it again until then.

    (I always type my posts in a text editor, so I never have a problem with the captcha.)

    Inciteful ranting? Inciteful debating, perhaps, but I left the ranting near the top of the thread 😉

    The only reason that we’re going on and on is because you haven’t yet given me a valid reason why a perfectly good, clean, website should not have all of its pages indexed, regardless of how many good, clean, on-topic, IBLs it has pointing to it. I say there is no valid reason, you disagree with me, but you haven’t stated a valid reason. Actually, I’m the only one who offered a reason – shortage of space – but Matt said they are ok on space.

    I’d like anybody to give me a reason, not just you, but you are the one who continues to debate with me.

    I do, however, have one question for you to chew on in the meantime. You say that IBLs are the wrong “metric” (side note: does anyone else hate this word and find it to be a corporate buzzword, or is that just me? Just wondering.) What would be a better way to do it, and why?

    I’m not overkeen on the word “metric” myself, but it’s what people use these days.

    To answer your question: I said that IBLs are the wrong things to consider for determining a site’s value. My answer is what I said before – it isn’t a search engine’s job to determine the value of a site. It’s an engine’s job to index sites (except spam stuff) and determine relevancy to a search query. It is users who determine value for themselves. So no way of measuring value is needed.

  381. Damn! I wish there was a way to edit these posts, or even to preview them.

    Everything in the last post, after the first 2 paragraphs in response to Dave, is a response to The Adam That Doesn’t Belong To Matt.

  382. I want to try and clarify the value of a site and its pages, to help avoid us going off on the wrong thing.

    It’s easy to think that a site that gets thousands of visitors a day is a much more valuable resource than a site that gets 4 or 5 visitors a week. And in one sense it is – the popular site is more valuable to the world than the less popular one.

    But search engines don’t deal with the world – they deal with individuals – single people sitting in front of their computers. They present results to individuals, and not to the masses. For an individual, a site that gets few visitors is just as valuable as a site that gets millions of visitors. As an individual, the pizza site that I mentioned is just as valuable as Amazon, for instance. In fact the pizza site is a much more valuable resource than Amazon, because I never use Amazon.

    The value of a site and its pages is down to each individual user, and search engines cannot measure that. So can we get away from the idea of a site’s value, because it’s fair to say that all sites have value to someone. Also, Google haven’t said that they attempt to determine a site’s value, and it’s just a red herring in this thread.

  383. “This is not about SPAM nor is it about rankings. It’s about being deemed worthy enough to have your pages indexed based upon nothing more than links.

    This leaves it up to the webmaster or site owner to manufacture enough links deemed worthy enough simply to get their content indexed. How is this a good thing?”

    Based on nothing more than links? Wow; I am very surprised by many comments in this thread. This is not ‘based’ on anything at all but …. common sense.

    I will tell you this; … if you “manufacture” incoming links, you do run a big risk, as it should be.

    PhilC wrote:
    “But since you asked, it’s a search engine’s job to determine relevancy to a search query – it isn’t a search engine’s job to determine the value of a website.”

    Agreed. The search engine searchers determine “value” by their individual preference of leaving that site, or staying on it and maybe even buying something. You are stating the obvious.

    PhilC wrote:
    “BUT, they are not in a better position than anyone else to decide what should and should not be indexed. Everyone can have opinions about that. The only difference is that Google are able to go with their opinions, but it doesn’t necessarily mean that their decisions are the best ones, or even the right ones.”

    Who would be in a better position to determine what sites/pages should be on “your” website, Phil?

    Well sure, it certainly is Google’s opinion about which pages or sites show up in SERPs or even in their index. After all, it is “their” index and they can do whatever they wish with “their” index. This is not a right or wrong thing at all. It’s simply stating the obvious. I choose to ban, edit, or delete members in my own forums. So do you. Would you rather have an outside party determine who or what or when or how your own website should be run? I don’t think so.

    The Google users…. real people who are trying to find info or buy a product or service are the people who determine which search engine is most popular. If and when Google loses market share on “search” is only because all the people of the internet found a better place to search.

    None of this has anything to do with all our individual sites or our client’s websites. We all want to get indexed by Google with good positions on our phrases we are targeting. But guess what? We want this to happen in a “free” environment.

    This thread is nothing but webmasters/owners/SEOs, etc. who think that their sites deserve to be listed and ranked. That’s human nature to think and act that way, but is it “real life”? And should it be the highest of priorities that Google takes the health care directory and ranks it according to how “you” want it to?

    People in here are focusing on a few comments Matt made about a few individual websites. Do you really believe that Matt would do the required research “in detail” to determine “exactly” why that health care site is not doing well? That would be suicide for Google to do that.

    If you ran a real large search engine that handed out free referrals to others, would you want to do things manually or automatically? If auto, why would you give every search engine spammer on the planet access to “your” exact algos and reasons for doing something at any given time?

    I keep going back to this:…. Common Sense Stuff.

    I know darn well that if my firm builds a website for its visitors, it automatically does well in all the SEs. That is, if your people actually know how to build websites in a good way… that good way just happens to be the good way for the SEs as well. The key is in knowing what that “good” way is.

    Again; … common sense stuff.

  384. Damn! I wish there was a way to edit these posts, or even to preview them.

    Agreed on the preview, with a recap of my earlier suggestion of forced spell checking on that preview, and I’d throw in that the security code should only be on the initial data entry if you do add that. 🙂

    -Michael

  385. Doug Heil.

    I will tell you this; … if you “manufacture” incoming links, you do run a big risk, as it should be.

    Doug, I think you should read Matt’s original post again. This discussion isn’t about things like that. You keep trying to take it off on general stuff, but we’re specifically discussing the new BD crawl/index criteria.

    Agreed. The search engine searchers determine “value” by their individual preference of leaving that site, or staying on it and maybe even buying something. You are stating the obvious.

    Yes I know. It was a response to your error.

    “BUT, they are not in a better position than anyone else to decide what should and should not be indexed. Everyone can have opinions about that. The only difference is that Google are able to go with their opinions, but it doesn’t necessarily mean that their decisions are the best ones, or even the right ones.”

    Who would be in a better position to determine what sites/pages should be on “your” website, Phil?

    I would, Doug, but my sites are not search engines. The function of a general purpose search engine is to show users all the resources that it can for a given query. If a search engine intentionally doesn’t show useful resources, then it is being editorial, and is not a proper search engine. Google’s users don’t expect Google to be editorial, except when it comes to spam.

    After all, it is their index and they can do whatever they wish with their index

    Yes they can, but not if they want to continue as a top-class general purpose search engine. Their users don’t expect to be intentionally deprived of some resources, just because Google feels like it. They expect Google to do the best they can for them, and being editorial is not what Google’s users expect.

    None of this has anything to do with all our individual sites or our client’s websites. We all want to get indexed by Google with good positions on our phrases we are targeting.

    This thread is nothing but webmasters/owners/seo’s, etc who think that their sites deserve to be listed and ranked

    Doug, leave rankings out of it. You’re the only one who keeps bringing them in, but this discussion has nothing to do with rankings, and it’s best not to get sidetracked.

    Yes, Matt used only a few examples, but what he said about those examples is extremely significant – he didn’t need to repeat it for 20 or 30 examples – just a few were sufficient. What he said is that, with the new BD crawl/index function, a perfectly good site cannot have all of its pages indexed until it has enough decent IBLs. He said other things as well, but that’s the one that caused the most outrage.

    You use the phrase “common sense” a lot in your posts, Doug, but you don’t even try to discuss the issue. You talk only in general terms, which isn’t helpful at all.

    Now if you, or anyone else, can come up with a valid reason (other than a shortage of space) why a perfectly good, clean website, should *not* have all of its pages indexed, just because it’s there, then please do. Please try to address that question. I’ve asked it several times in this thread, and nobody has yet given a valid reason, other than things like, Google can do what they want to do, which is no answer at all.

    As long as that situation exists, Google is being grossly unfair to a great many good clean sites, and also to their own users, who don’t expect them to intentionally deprive them of good clean resources in the results.

  386. PhilC wrote:
    “You use the phrase “common sense” a lot in your posts, Doug, but you don’t even try to discuss the issue. You talk only in general terms, which isn’t helpful at all.”

    If you honestly think that Google also doesn’t talk in “general” terms, then I can’t help you. Do you really believe Google is going to research every problem each webmaster has with their website? And do you really believe it would be in the best interest of “any” major search engine to talk and discuss things “other than” in a general way?

    Why would Google tell you or I exactly how things work?

    Come on Phil, you are a smart man.

    I talk in general terms with most everything. Unless someone has hired me to do a specific thing with their specific website, how can one talk in any other terms, when we all know darn well that “each” website has its very own special problems that cannot be solved by general answers?

    I will guarantee you that Google’s crawl patterns, etc. have “much” more to do with many other things than simply the number of quality IBLs the site has.

    And yet again;… Common sense is what I used to make that statement.

    It’s you all that want to focus on IBLs and whatnot, not me. 🙂

  387. If you honestly think that Google also doesn’t talk in general terms, then I can’t help you

    I know you can’t help, Doug, but that’s another matter.

    Doug. Nobody asked you for help, and I’m not aware of anybody wanting your help. You were, however, asked to answer a specific question several times, and each time you declined to answer it – because you are unable to answer it honestly, without appearing to disagree with Google, which is something that you can’t bring yourself to do.

    On the other hand, I’ve tried to help you because you obviously needed some help. I told you how to quote here, but you couldn’t figure it out. I’ll tell you again – use the HTML blockquote tag. It’s very easy.
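
    For example, to quote a passage you just wrap it in the tag, like this: <blockquote>the text you are quoting</blockquote> – and the quoted part will appear indented.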

    It’s you all that want to focus on IBLs and whatnot, not me.

    We know that. The rest of us are discussing what Matt said about their new BD crawl/index – y’know – the topic of this thread, but you don’t enter into the discussion. You keep trying to take it off into generalisations, but good discussions should remain focussed. It’s common sense.

    The discussion in this thread is about something very specific. If you don’t want to discuss it, why bother posting at all?

    I will guarantee you that Google’s crawl patterns, etc. have much more to do with many other things than simply the number of quality IBLs the site has.

    Nobody has suggested anything different. For instance, PageRank has always determined the frequency of crawl, and still plays a big part. The TYPES of IBLs and OBLs also play a part. You see, you don’t have to guarantee anything, Doug. It’s all right here in this thread – in Matt’s original post, and in some of his later posts – if you care to read it. Your guarantees aren’t needed.

    The Adam That Doesn’t Belong To Matt:
    If you think that I’m talking down to Doug, it’s because I am. We have a small history, and this is nothing compared to what’s gone on before 😉

  388. Hi Matt,
    Was wondering if it would be possible to have a section for SEOs on Google to dig into our sites according to the Google index? I already use Sitemaps BTW.

    Just would like some additional tools.

    Can we have a tool to find out where our sites lie in the index for a given keyword/keyphrase? And perhaps a way to plot how we are doing over time? E.g. my site ranks #110 for the term “blue widget underwear”. So I add a link or two from a respectable, relevant link partner, or write a few articles for reprint and have those links indexed, wait a while for the reindexing to occur, and see if that improves things… OR rewrite/add/shuffle around content on my home page to see if that makes a difference… wash, rinse, repeat.
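
    Even a rough scripted version of what I do by hand would do – something like this sketch (get_ranking is completely made up here; it just stands in for however the tool would report position):

        from datetime import date

        def get_ranking(keyword: str, domain: str) -> int:
            """Made-up stand-in for however the tool reports the position of domain for keyword."""
            return 110  # placeholder value so the sketch runs

        # Keep one (date, keyword, rank) reading per check so progress can be plotted over time.
        history: list[tuple[str, str, int]] = []

        def record(keyword: str, domain: str) -> None:
            rank = get_ranking(keyword, domain)
            history.append((date.today().isoformat(), keyword, rank))
            print(f"{date.today().isoformat()}  {keyword!r}  #{rank}")

        record("blue widget underwear", "example.com")
        # ...add a link, wait for reindexing, call record() again,
        # and compare the entries in history.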

  389. Why are you making this personal Phil?

    I’ve been “extremely” nice to you.

    Again; you are focusing on IBLs like they are the main thing that leads to more crawling. You could not be further from the truth. I know exactly what Matt said and didn’t say. It’s what he did not say that I’m trying to tell you that you had better look into.

    I’ve not gotten off the topic of this thread at all. You have though by sticking in snide comments that have no place. You don’t want to get into a personal debate with me, believe me. Let’s keep this NON-personal please. I’ve even agreed with you against that JW character. Don’t bite the people who actually stick up for you from time to time.

  390. In Matt’s original post, all I see mentioned is IBLs and affiliate links. So how can you say the issue isn’t about IBLs??!!

    And don’t get me started about affiliate links. A link is a link. It shouldn’t matter whether it’s an affiliate link or not. All coupon sites are nothing but affiliate links. Google would be suicidal to try and eliminate these kinds of sites.

  391. PhilC Said,
    I want to try and clarify the value of a site and its pages, to help avoid us going off on the wrong thing.

    It’s easy to think that a site that gets thousands of visitors a day is a much more valuable resource than a site that gets 4 or 5 visitors a week. And in one sense it is – the popular site is more valuable to the world than the less popular one.

    But search engines don’t deal with the world – they deal with individuals – single people sitting in front of their computers. They present results to individuals, and not to the masses. For an individual, a site that gets few visitors is just as valuable as a site that gets millions of visitors. As an individual, the pizza site that I mentioned is just as valuable as Amazon, for instance. In fact the pizza site is a much more valuable resource than Amazon, because I never use Amazon.

    The value of a site and its pages is down to each individual user, and search engines cannot measure that. So can we get away from the idea of a site’s value, because it’s fair to say that all sites have value to someone. Also, Google haven’t said that they attempt to determine a site’s value, and it’s just a red herring in this thread.

    Completely agree with PhilC.

    Relevant and quality information is most often found only on the small sites that concentrate on writing good content as a resource for all to read, rather than simply as an attempt to generate income from someone else.

    It’s a real pity that Amazon and similar sites simply frustrate and waste the time of the average person who is searching for something and anything.

  392. The problem for a small, specific-topic site is that it’s near impossible to get relevant inbound links. The only option is to submit to directories, but we all hate them, as they will inevitably rank higher for the link than the small specific-topic site itself.

  393. Doug.

    Sorry, but every time you avoid answering that very simple question, and every time you overuse the phrase “common sense”, and when you say things like “I can’t help you”, as though anyone asked you to help them, I assume that you are just trying to interfere rather than trying to debate sensibly. If I’m wrong, I apologise.

    Again; you are focusing on IBLS like they are the main thing that leads to more crawling.

    I am focussing on IBLs because they are now evaluated as part of the crawl/index function, and I’m focussing on the total unfairness of that, because (a) they can give no indication as to whether or not a site should be fully indexed, and (b) pages are being dropped from the index wholesale – partly because of them. It seems very reasonable to focus on that part of the new crawl/index function.

    When Matt says that a perfectly good site needs more IBLs so that Google will index more of its pages, then I consider the new evaluation of IBLs to be well worth focussing on, because I consider it to be very wrong in two important ways.

    I am not trying to discuss any wider than that. I haven’t even started on the “types” of OBLs, like Jack Mitchell just mentioned. The way that they are evaluated for the crawl/index function is also very bad – they have nothing to do with whether or not a site should be fully indexed (spam excepted all round).

    You sided with me against Jill??? In what way? 🙂

  394. I missed this bit:

    You don’t want to get into a personal debate with me, believe me

    Really? The only debating that I’ve ever seen you do was in your forum, where quickly resorting to flames (your side) was the order of the day, as expected. I can’t debate against flames, because it’s just stupid, so you would win every time. But we don’t do that in my forum, so if you’d like to debate there, you are more than welcome to come along and voice your opinions on any topic 🙂

  395. Jack Mitchell Said,

    And don’t get me started about affiliate links. A link is a link. It shouldn’t matter whether it’s an affiliate link or not. All coupon sites are nothing but affiliate links. Google would be suicidal to try and eliminate these kinds of sites.

    Google, please please please wipe out the affiliate links; they are just parasitic sites that simply waste my time!

    Links should be categorized: those that are relevant to the topic should have priority, whilst those that are just trying to freeload should be penalized.

  396. PhilC and Doug, you both obviously know what you’re talking about, but please stop throwing stones at each other and discuss the topic in question, specifically IBLs.

  397. Another question: what is the penalty in terms of time for a new site?

    PS I actually agree there should be a penalty but think it should be proportional to the number of pages on the site.

  398. No thanks.

    Phil Wrote:
    “When Matt says that a perfectly good site needs more IBLs so that Google will index more of its pages, then I consider the new evaluation of IBLs to be well worth focussing on, because I consider it to be very wrong in two important ways.”

    You are assuming Matt did the long and hard process of reviewing and researching that site with a fine-tooth comb, right? You really can’t assume that, and you cannot assume Google is going to specifically tell you all about a website. To me, that doesn’t make any sense.

    You say it’s a perfectly “good site”. That’s great and I hope it is, but that doesn’t mean that the “only” thing keeping the site down is a lack of quality incomings.

    Phil wrote:
    “I am focussing on IBLs because they are now evaluated as part of the crawl/index function, and I’m focussing on the total unfairness of that, because (a) they can give no indication as to whether or not a site should be fully indexed, and (b) pages are being dropped from the index wholesale – partly because of them. It seems very reasonable to focus on that part of the new crawl/index function.”

    I’ve thought for a long time that quality incoming links that were “natural” in nature were evaluated as “one” part of the index/crawl function. It is no surprise to me that Google has stated as much. It’s just like the many, many other things/parts of the index/crawl function. There are boatloads of parts to this in the algo. The weights given to each part change all the time. I’m trying to get you to look at the bigger picture of things and not strictly focus on a statement Matt made. He makes lots of them all the time, but he certainly cannot make statements of fact in regards to Google, Inc. He makes “general” type statements so as to help the most sites. But those “most” sites should not believe that “one size fits all” in regards to their individual problems, and that includes crawling patterns.

    That’s all I am saying.

    Phil: I dislike those who portray themselves as whitehat, but are “not” whitehat, more than I dislike those who are blackhat, but state as such. When I say “dislike”, I mean that “business”-wise. I actually personally like many blackhats; I just dislike their business ways tremendously. I have much more respect for a blackhat who knows it and states it than for a whitehat who really is NO whitehat at all. In other words, fence-sitters get my goat in a big way. You have more respect from me if you firmly take a stand on things.

  399. Hey Matt,

    Is this an endorsement of directories? “Submit your site to relevant directories such as the Open Directory Project and Yahoo!, as well as to other industry-specific expert sites. ” as seen on http://www.google.com/support/webmasters/bin/answer.py?answer=35769

    I would imagine ‘relevant’ and ‘industry-specific’ are the key words here. Seems like I have heard a lot of discussion on getting links without paying or trading for them, so this appears to be an official stance on what types of links may help a new site get crawled/indexed.

    Thanks,

    John

  400. Since we’re not being personal, at your request ….. oh! I see that we are still being personal:

    You have more respect from me if you firmly take a stand on things.

    Doug, I’ve seen you in action more than once, and believe me, I have no desire for your respect. I don’t respect your views, and I don’t respect your actions, and I don’t respect you, so why would I want any of your respect? Don’t flatter yourself, Doug.

    I don’t know what brought that on, but I wear my views on my sleeve and in my site for all to see. I have certain views about spam. I’ve never changed them and, when it’s useful to a topic, I state them. Perhaps you think that I’m pretending to hold pure whitehat views because I say something along the lines of, it’s a search engine’s job to rid itself of spam, but I’m not. It really *is* a search engine’s job to do that.

    If you’re confused, I’ll state it clearly. Use whitehat seo as much as you can, and only turn to blackhat if and when whitehat won’t work, but if you use blackhat, you must accept the risks that are associated with it. Never ever use blackhat on behalf of a client without the client’s full knowledge of the risks involved, and his/her agreement to take those risks. Happy now?

    It’s good not being personal, innit? 🙂 Now where were we?

    You are assuming Matt did the long and hard process of reviewing and researching that site with a fine tooth comb, right?

    With the tools that Matt has, it’s not a long and hard process – he does it live at conferences. I’m assuming that he took a pretty good look at the sites, and I’m assuming that he knows what he’s talking about. When he said that the health care site needs more IBLs for Google to know to index more of its pages, and when he said that he is not surprised that a site with only six IBLs wouldn’t get all of its pages indexed, I assume that he knows what he is talking about. And, if it’s all the same to you, Doug, I’d much rather assume that Matt knows what he’s talking about than that you do, and I’d rather take notice of him, and not you.

  401. Okay, now that I have a few minutes before I have to put steaks on a grill (mmmmm…steaks 😛 ~~~~~~~~), I’ve decided to try and encapsulate my thoughts.

    Phil, I gave you a perfectly valid reason, and a series of possible scenarios that outline whether or not a site should be included solely on the basis of its existence.

    There are many sites out there that do not want to be indexed for a variety of reasons:

    1) Intranets.
    2) Confidential information.
    3) Integrity of information.
    4) Under construction/incomplete/testing areas.

    It would take very little effort for a competitor to submit a site that falls under any of the four categories. Under your scenario, these sites could still be indexed with few or no IBLs, and that could prove to be very damaging to Google, to the end user who may stumble upon these sites, and especially to the site owners themselves.

    If a site is indexed, the possibility exists that it could rank for something. Even if it’s an 8-9 word phrase, it could rank for something.

    Yes, the owners of these sites could put in a robots.txt file (or use the meta tag, for that matter), but let’s face it, most webmasters do not possess knowledge of the robots protocol. A good many do, but most don’t, and a lot of those that do don’t care.
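
    (As an aside, the two opt-out mechanisms just mentioned are easy to sketch. A minimal illustration, assuming hypothetical URLs and using Python’s standard-library parser – obviously not Google’s own crawler code:)

      # Site-wide opt-out: a robots.txt at the server root with these rules
      # tells compliant crawlers to stay out entirely. The per-page
      # alternative is a meta tag in the page's <head>:
      #     <meta name="robots" content="noindex, nofollow">
      from urllib.robotparser import RobotFileParser

      rules = [
          "User-agent: *",
          "Disallow: /",
      ]
      rp = RobotFileParser()
      rp.parse(rules)  # normally fetched from https://example.com/robots.txt
      # A compliant crawler checks before fetching any URL on the host:
      print(rp.can_fetch("Googlebot", "https://intranet.example.com/private.html"))
      # -> False: the site is never crawled, let alone indexed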

    Before anyone goes all moral and claims it’s a webmaster’s job to care: you’re probably right. But that doesn’t help Google in this case to determine whether a site wants to be there or not.

    What would happen if all of those sites and pages in the 4 examples were indexed? Is the content “of value”? No, because the average end user is not going to gain any benefit from visiting those sites in the present incarnation.

    A greater number of IBLs and a greater collective quality measure of those IBLs indicate that a website is “live” and ready to be seen by its targeted end users.

    The problem with the health care site is that there was a small series of IBLs generated, a partial delisting, and then nothing after that. How would Google know whether or not this site was “live” again? It could have had those parts taken out, and then added back in with new features…the parts could have been removed for spamming reasons…there are other possibilities.

    And how does Google even know that the reinclusion request even came from people involved with that site? That’s a big assumption in and of itself. There would be nothing stopping a competitor from submitting that site, getting it indexed, having it found under some obscure search term, and thus pissing off an end user. Matt doesn’t even know that…that request could have come from a competitor just to research a potential threat to his/her business (much stupider stuff than this has happened).

    The point is that IBLs provide, at the present time, the best measure of how interested a webmaster is in promoting his/her site and the benefit to the end user. The greater the number and the collective quality of the IBLs, the more likely it is that the site provides some user benefit and has an ownership that is even interested in having it there in the first place, as opposed to the other possible scenarios which I outlined above.
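
    (To make the mechanism under debate concrete, here is a toy model – invented weights and numbers, emphatically not Google’s actual algorithm – of how a crawler might scale the number of pages it indexes from a site with the count and collective trust of that site’s IBLs:)

      # Toy model only (invented numbers, not Google's algorithm): the more
      # IBLs a site has, and the more trusted they are, the more of the
      # site gets crawled and indexed.
      def index_budget(inlink_trust, base=10, pages_per_trust=50):
          """inlink_trust: one trust score in [0, 1] per inbound link."""
          return base + int(pages_per_trust * sum(inlink_trust))

      print(index_budget([0.1] * 6))   # six weak IBLs    -> budget of 40 pages
      print(index_budget([0.8] * 40))  # many strong IBLs -> budget of 1610 pages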

  402. If you think that I’m talking down to Doug, it’s because I am. We have a small history, and this is nothing compared to what’s gone on before.

    I’ll tell you what…don’t tell me about it, and I’ll never ask because I really don’t want to know.

    Deal? Deal.

  403. The Adam That Doesn’t Belong To Matt

    I wasn’t going to explain about Doug and me to you. I just didn’t want you to think that it’s normal for me to talk down to people.

    The example possibilities of sites that don’t want to be indexed are fine, Adam, but oddities like that are not what we are talking about.

    About the health care site…

    And how does Google even know that the reinclusion request even came from people involved with that site? That’s a big assumption in and of itself.

    There wasn’t a reinclusion request, was there? What Matt said is that the exclusion period had expired some weeks earlier. It’s automatic: when a page is requested to be taken out, it is taken out for 6 months, and then it comes back again. It doesn’t need a reinclusion request. I don’t know if it’s put back in immediately, or if the URL is placed in the list to crawl in its turn, but it comes back automatically. Matt didn’t suggest that the pages would actually come back with the new crawl/index function. He suggested the opposite.

    The problem with the health care site is that there was a small series of IBLs generated, a partial delisting, and then nothing after that. How would Google know whether or not this site was “live” again? It could have had those parts taken out, and then added back in with new features…the parts could have been removed for spamming reasons…there are other possibilities.

    Yes, of course, but that doesn’t make any difference. About the site, Matt said, “With that few links, I can believe that out toward the edge of the crawl, we would index fewer pages.” and “your site also has very few links pointing to you. A few more relevant links would help us know to crawl more pages from your site”. It’s clear: in Matt’s view, six IBLs is not likely to be enough for all of the site’s pages to be indexed. Those other possibilities don’t come into it.

    The point is that IBLs provide, at the present time, the best measure of how interested a webmaster is in promoting his/her site and the benefit to the end user.

    Yes, it could (not does) provide some sort of indication as to how much a webmaster is interested in promoting the site, but that should never even be a consideration. Surely you are not suggesting that sites that are promoted should be treated better than those that are not? I don’t believe you mean that. If you do, we might as well stop right now.

    Assuming that you don’t, I want to ask the same question again, but with modifications…

    Assuming that the site owner wants the site to be fully indexed, and assuming that nothing odd has happened, and assuming that the site is perfectly clean all round, and offers some value to some people, but has only one IBL, do you think that there is any valid reason why the site should not be fully indexed (other than Google being short of space)? If you think there is valid reason, what is it?

    The site I am trying to describe is just a normal average site that offers something to some people, and has had no odd stuff happen to it, and hasn’t engaged in any promotion, so there’s no spam around it – just yer average site.

    The answer I am looking for, Adam, is “No, I can’t think of a valid reason for such a site not to be fully indexed” or “Yes, I can think of a reason, and here it is….”.

    Reasons such as, maybe the site doesn’t want to be indexed, and maybe some spam has gone on in the past, etc. are avoiding the question. I’m sure it’s obvious that I’m only asking about yer normal, regular, unspoiled, unspammed, etc. website – just yer average website.

  404. An average website is like leprechauns, fairies, and France. It doesn’t exist.

    And I’m not avoiding the question at all. I’m pointing out that possibilities exist whereby a site isn’t meant to be indexed fully (or at all). Those possibilities, as ridiculous as some of them may be, do exist and have to be considered.

    It’s similar in a sense to the warning on a curling iron: “do not insert into an orifice.” The policy behind that warning isn’t in place for the vast majority of people, who would be smart enough never to attempt such a stupid thing…it’s in place for the few stupid people who would.

    Assuming that the site owner wants the site to be fully indexed, and assuming that nothing odd has happened, and assuming that the site is perfectly clean all round, and offers some value to some people, but has only one IBL, do you think that there is any valid reason why the site should not be fully indexed (other than Google being short of space)? If you think there is valid reason, what is it?

    Assuming all of that is true, nothing. But how does one prove the validity of the assumption? Accepting assumption is very dangerous at the best of times.

    The reason I brought up other possibilities isn’t to avoid the question…it’s to point out that the scenario you describe isn’t as black and white as you are making it out to be. There are too many other possibilities that exist, and far too much room for blackhat manipulation and/or other error, for your scenario to play itself out, and one cannot make the assumption that you’re making based on that.

    Those same examples and oddities affect the health care site as well.

    Who knows who asked Matt to review that site? Do you know it was the owner of the site? Do I? Does Matt?
    Does the site in question have those delisted areas in place any more?
    Is the site in question actively promoting itself in non-SE ways?

    And perhaps the biggest question (one that one of us, including me, should have asked a long time ago):

    If the site had made a delisting request that had lapsed, and the time period is six months, why did it take so long for this to even become an issue for the site in the first place? Seems to me that a webmaster who would have taken a real active interest in his/her site would be aware of that situation and have reacted one hell of a lot more quickly than he/she apparently did (and even here, I’m assuming it’s the webmaster that pointed it out.)

    In other words, my questions, and the possibilities that I raise from them, comprise the reason why the site shouldn’t be indexed fully…yet. There are too many outside scenarios and too many variables that need to be dealt with first.

  405. Assuming all of that is true, nothing.

    Thank you. I finally got an answer, and one with which I agree. Although there may be oddities with some sites, generally speaking, there is no valid reason, other than a shortage of space, to not index the full content of a perfectly ordinary website. Imo, most sites fall in the category of ‘perfectly ordinary’, but I accept that you don’t think so, Adam.

    If the Google programming determines that the site has spam links about it, or spam content in it, then I’m fine with the site not being indexed. What I am not fine with is the quantity and/or quality of IBLs playing any part in whether or not a site is fully indexed. And I am not fine with the types of OBLs playing any part in it. It is not a search engine’s business whether or not a site contains affiliate OBLs, paid OBLs (advertisements), link exchanges, off-topic OBLs, or anything like that, with the exception of when it is blatantly obvious that they are specifically there for ranking purposes, such as in a known linking scheme. Affiliate OBLs are never for rankings, so why Matt would criticise a link to a mortgage site is beyond my comprehension – especially since it is in a real estate site! It just isn’t a search engine’s business.

    I don’t mind search engines devaluing the links that they don’t want to count, including the mortgage one on that site, but I have very strong objections to penalising sites because of them, and limiting the number of pages a site can have in the index is nothing other than a penalty for the site, and it short-changes the engine’s users.

    About the health care site:
    There may be other things involved with why it isn’t fully indexed yet. We don’t know. I only used it because, according to Matt, it seems like “a fine site”, which to me means that it’s a perfectly ordinary site without anything negative about it, and yet, in Matt’s judgement, it is unlikely to be fully indexed until it acquires some more IBLs. That’s what I find wrong.

  406. Just wanted to say thanks for a good post. I should read your blog more often.

    No need to approve this – ’twas just a quick note, no meat innit 🙂

  407. Sheesh, Phil. You didn’t figure out who I was responding to? You even asked me what I meant by the JW comment! My goodness; now I’m not so sure about you, as you thought my comment was for you? LOL

    phil wrote:
    “I don’t know what brought that on, but I wear my views on my sleeve and in my site for all to see. I have certain views about spam. I’ve never changed them and, when it’s useful to a topic, I state them. Perhaps you think that I’m pretending to hold pure whitehat views because I say something along the lines of, it’s a search engine’s job to rid itself of spam, but I’m not. It really *is* a search engine’s job to do that.”

    I know YOU WEAR THEM on your sleeve, …. my gawd, take a chill pill Phil. I was talking about HER… sheesh.

    The rest of your comments are just plain silly stuff. I really don’t give a hoot what you think of me Phil. You have proved your worth in this thread and by what you write.

  408. Dave (Original)

    RE: “I’m guessing that your background is non-technical?”

    Wrong guess.

    RE: “It’s just that you seem to have enormous and unfounded faith in Google’s algorithms.”

    Of course I do and so do most of the other people on the Planet. This is why G has been MILES ahead of the rest for many years.

    RE: “Anyone who knows the first thing about the limitations of any algorithmic approach just wouldn’t say things like “Google know better than you do!”.”

    Of course Google know no better than “you” or me. You must be living in a fool’s paradise to say otherwise. Do you have a non-algo approach?

    RE: “You know you are comparing an algorithm’s fraction of a second, rule-based perception to that of a person who has spent years developing their site, right? You know that these algorithms don’t actually “understand” anything, right?”

    LOL! You think Google should perhaps employ the whole of India to do manual checks! I would think most here (I did think it was all until now) know full well it must be an algo and not manual human intervention. You do know that algos ONLY do what humans tell them, don’t you?

    RE: “The truth is, you are just assuming that all of those complaining are lying. A judgement you are clearly not in a position to make.”

    You guess a lot, don’t you? I have never said anyone was lying. However, I will say you are lying by saying “The truth is…”

    Now, you typed a lot but I cannot see a point in any of what you have written. Why not back up PhilC and others and stop focusing on the person(s) disagreeing?

  409. PhilC, I’m going to cast my vote: if I never read another word you write, it will be a good day. I don’t want to see your stupid comments and bickering. Grow up, please, and stop wasting Matt’s blog space on your pointless bickering and childish whining.

    To be clear, I have no idea of who you are, and I don’t care; your words are all I have to see and judge you by, and if I were you, I’d give some serious thought to what you type before hitting the submit button. It’s not interesting, it’s not worth reading, and it’s not worth the electrical energy it requires to transmit those bytes.

    Again, I have no idea of who you are, and I don’t care.

  410. CrankyDave – I agree 🙂

    If pages aren’t indexed, how can they provide links, and how can the pages that should have IBLs find themselves in the results if they rely on those dropped pages?

    Tomato growers in Wisconsin will have to rely on AdWords.
    Users will also go elsewhere.

    Matt – Are you saying that indexing is working well – guaranteed – no disclaimer links – no effect on the IBLs? It doesn’t look like it to a lot of us out here.

  411. Hello Matt,

    You mentioned that a fix had been identified for Google for hyphenated domains. Can you confirm whether you have taken into consideration more than one hyphen being used? A legitimate example of this might be shoe-store.com as opposed to shoe--store.com (two hyphens), or even shoe---store.com (three hyphens). I believe that this is a valid and important question because in the past it appears that Google has recognised shoe-store.com in the listings but has ignored shoe--store.com. In view of the fact that there are so many retailers of shoes online, it is quite reasonable for someone to want to use double or triple hyphens simply because that text ideally suits their business.

    Has this been taken into consideration in the fix you described above?

    Thanks,

    Nick

  412. Matt, I thought that I would add this comment to the question I asked about in my last posting just a few minutes ago.

    In reviewing my posting after it appeared on your site, interestingly enough I see that all the hyphens I used in between the words “shoe” and “store” have been reduced to single hyphens. This is very relevant to my question, because will Google and Googlebot then make exactly the same mistake? Will they see shoe(1 hyphen)store.com, shoe(2 hyphens)store.com and shoe(3 hyphens)store.com as exactly the same site? Or will they take into consideration that they are 3 distinct sites and index and spider them as 3 distinct sites – assuming of course that each of the 3 sites has its own unique content?

    After seeing what happened on my last post I am now even more keen to see your response! Thanks!
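
    (For what it’s worth: the three spellings are lexically distinct hostnames, so an index keyed on the exact host would treat them as three separate sites. The collapsing seen in the comment above is typical of a blog’s typographic filter, not of a crawler. A small illustration, with made-up domains and a rough stand-in for the filter:)

      # The three domains are distinct strings, so a crawler keying its index
      # on the exact hostname sees three separate sites.
      from urllib.parse import urlparse

      urls = [
          "http://shoe-store.com/",    # one hyphen
          "http://shoe--store.com/",   # two hyphens
          "http://shoe---store.com/",  # three hyphens
      ]
      print(len({urlparse(u).hostname for u in urls}))  # -> 3

      # What mangled the comment text is a WordPress-style "texturize" filter
      # (rough sketch, not the real code): runs of hyphens become dashes.
      def texturize(text):
          return text.replace("---", "\u2014").replace("--", "\u2013")

      print(texturize("shoe--store.com"))  # displays with a dash, not two hyphens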

  413. Dave (Original)

    RE: “BUT, they are not in a better position that anyone else to decide what should and should not be indexed”

    I think they most definitely are.

    RE: “but it doesn’t necessarily mean that their decisions are the best ones, or even the right ones.”

    No, of course not. However, the chances of “them” being correct over a bunch of webmasters/SEOs are so much greater.

    RE: “You know as well as I do that there a great many hard-line Google supporters who post in this blog, and they had not expressed support for Google about the new crawl/index criteria, up to the point when I made that assessment. Alright?”

    So silence makes you right in your mind? You denied that earlier. I also stated earlier that Matt’s blog (and most SEO forums) are ONLY full of problems, rarely praise. Why should this one be any different?

    RE: “Please stick to the topic, and forget that sideline. The assessment was correct at the time of writing it, which was way down the thread. And even now, there are only a couple of people who appear to be supporting Google’s new crawl/index criteria, and I don’t think I’ve seen any outright statements of support from them – yet”

    I thought I was; at worst I was responding to your off-topic writings. Phil, you SURELY must understand that these forums, blogs and whatever are extremely biased and negative on top of all else. I fully support a LOT of things I never comment on. Have you ever heard the terms “silent majority” or “vocal minority”??

    RE: “I say there is no valid reason, you disagree with me, but you haven’t stated a valid reason”

    The “reason” (or at least likely reason) has been posted by Matt. Why ignore it? More links he said and if it were MY site it would be more links I would get. That would be a better use of my time than complaining about what I cannot change. I’ll run my Website and let Google run their SE.

    RE: “They present results to individuals, and not to the masses”

    I disagree there. They present the same results to masses who search via the same term. Personalized search is different, but that’s not the issue, is it?

    Phil. Do you think BD was used to index more or less pages? I say more and that is a good thing IMO. If it’s less, then there are “reasons” as there are for everything. We just don’t know what they are. But Google do!

    RE: ” I’ll state it clearly. Use whitehat seo as much as you can, and only turn to blackhat if and when whitehat won’t work”

    Then you ARE a blackhat.

    RE: “when he said that he is not surprised that a site with only six IBLs wouldn’t get all of its pages indexed, I assume that he knows what he is talking about”

    Then why not also assume Google know better than you on what, how, why, when etc they index? Or is Matt the only one at Google that “knows what he is talking about”?

    Phil, I might be wrong here, or confusing you with another.., but haven’t you argued in the past that Google SHOULD NOT have carte blanche to index all pages out there and make money from them?

    RE: “do you think that there is any valid reason why the site should not be fully indexed (other than Google being short of space)?

    I don’t recall Matt stating the health care directory site would NOT be fully indexed. He said it would “help”. Keep in mind that Matt also said “self-removal just lapsed a few weeks ago”. Perhaps he means it would happen sooner with more links?

    BTW, do a quick count on the number of times you have used the word “assume”. You know what they say about “ass-u-me”, don’t you 🙂

  414. BTW, do a quick count on the number of times you have used the word “assume”. You know what they say about “ass-u-me”, don’t you

    That really depends, Dave…when I say that to my girlfriend, it takes on different meaning. 😉

    HOP HOP SMACKY SMACKY HOP HOP SMACKY SMACKY! 🙂

    (Someone’s gotta put some comic relief into this before everyone wants to hang themselves.)

  415. I don’t normally pick apart one specific section of a post, since it generally removes that section from the greater context of the post, but I think this particular section has a context in and of itself…so here goes:

    And I am not fine with the types of OBLs playing any part in it. It is not a search engine’s business whether or not a site contains affiliate OBLs, paid OBLs (advertisements), link exchanges, off-topic OBLs, or anything like that, with the exception of when it is blatantly obvious that they are specifically there for ranking purposes, such as in a known linking scheme. Affiliate OBLs are never for rankings, so why Matt would criticise a link to a mortgage site is beyond my comprehension – especially since it is in a real estate site! It just isn’t a search engine’s business.

    With the possible exception of off-topic OBLs, the answer is pretty obvious and simple. I suspect you know what it is, so I’m not actually directing my answer to you as such…I’m directing it to anyone who might not have considered the other side of this.

    (For those wondering why I’m debating things like this with Phil, that’s your reason…it’s not really for Phil, who I know is a smart guy that way. But it’s for the other people who may follow a message without considering some of the other angles behind it.)

    The problem with the links mentioned above, with the possible exception of off-topic OBLs (depending on circumstance), is that the links aren’t there for purely organic reasons. Whether the interests are fiduciary (affiliate links/ads), SERP/traffic increase (link exchange) or whatever the reason is, these links are biased links and aren’t purely organic. It’s not really a “vote for” a site in its purest form.

    So it is a search engine’s business, since this does have impact.

    I’d come up with more, but it’s 3:30 in the morning and sooner or later I should go to bed.

  416. Adam said:

    The problem with the links mentioned above, with the possible exception of off-topic OBLs (depending on circumstance), is that the links aren’t there for purely organic reasons. Whether the interests are fiduciary (affiliate links/ads), SERP/traffic increase (link exchange) or whatever the reason is, these links are biased links and aren’t purely organic. It’s not really a “vote for” a site in its purest form.

    I wouldn’t agree with that. If I am showing affiliate links then I am endorsing that link. I won’t put up a link for, say, a merchant that runs a shady business. When I do link exchanges, I arrange them in related categories. Since my main site is a mall site, everything is going to be useful to somebody, depending on the category they are interested in. That is why I set my links up by category instead of just one generic link page like many sites seem to have.

  417. Firstly, thanks for being there.

    Secondly:

    I have submitted the above URL to Sitemaps on Google.

    It finds two pages with inappropriate names – lloydsbank……very long name…htm
    and weight,loss.htm, a typing error originally, as I am an intermittent moron – increasing with age.

    How do I change these to meaningful names without people getting 404s off the old ones if I leave them extant? And presumably, if I do leave them, they will still mess up Google/Google Sitemaps?

    I’ve not phrased this very well, but I’m sure you understand: I have two pages that are wrong. I can create two that are right, but what about the original two? If I remove them using the removal tool, people will lose their weight loss link – somewhat important in this day and age. I should stress the whole site is free. It lost a lot of impetus due to Russian porn guys putting links in my guestbook (as it then was) until I spotted it – 15 months in the Google doldrums so far, from first pages originally.
    There are a lot of self-help pages, as I was an alcoholic/chain smoker/painkiller addict – now all fortunately history for me.
    I need to get the site back in favour to help people, and I think these two pages may be one of the last stumbling blocks – so I would value your help. (One possible fix is sketched below.)

    malc pugh – rubery – england
    http://www.stiffsteiffs.pwp.blueyonder.co.uk
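
    (The usual fix for renaming pages without stranding visitors is a permanent 301 redirect from each old URL to its replacement; engines that honour it transfer the old page’s standing to the new one. A minimal sketch, with hypothetical filenames:)

      # Minimal sketch (hypothetical filenames): answer requests for the old,
      # badly named pages with a 301 redirect to the renamed pages, so that
      # neither visitors nor crawlers hit a 404.
      from http.server import BaseHTTPRequestHandler, HTTPServer

      REDIRECTS = {
          "/weight,loss.htm": "/weight-loss.htm",  # assumed new name
      }

      class RedirectHandler(BaseHTTPRequestHandler):
          def do_GET(self):
              if self.path in REDIRECTS:
                  self.send_response(301)  # permanent move
                  self.send_header("Location", REDIRECTS[self.path])
              else:
                  self.send_response(404)
              self.end_headers()

      if __name__ == "__main__":
          HTTPServer(("localhost", 8000), RedirectHandler).serve_forever()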

  418. Doug

    No I didn’t figure that out at all. I’ve just re-read what you wrote, and it does read as though it referred to me – there were no clues that I could see. You say it was about JW, and that amazes the hell out of me, as you can imagine. Oh well. I’m curious, but I won’t ask – it’s not my business – and you wouldn’t tell me, anyway 😉

    h2

    Then I suggest that you skip my posts. It’s easy enough to do. Just look at the name and, if it’s mine, skip to the next post. Then I won’t waste any of your time. Easy huh 😉

    Dave (original)

    Please get off the silence thing, and read what I wrote. I only referred to the opinions that were written in this thread at the time, and the assessment was correct. It’s all up there in glorious black and white, plus a smattering of colour. Keeping on about it is so unnecessary, and so inaccurate. Here’s a reminder of what I actually wrote about this thread at that time:

    I don’t think that anybody has agreed with Google about this issue. Probably most of this blog’s regular contributors back Google to the hilt, but I don’t recall anyone doing it about this issue.

    You see? It was all about what had been written in the thread at that time. Up to that point in the thread, the following comments had been made by different people:-

    Great post from PhilC
    PhilC said it perfectly.
    I think PhilC made a really good point above too
    I am with PhilC
    Like PhilC, I believe you’ve simply got it wrong
    Once again PhilC has put my concerns in a more coherent way than I could.
    PhilC has a very good point.
    I too agree with PhilC!
    OMG, did I just agree with Jill and PhilC on the same issue in the same sentence?

    There was a lot of agreement, and many more people had expressed dissatisfaction with Google’s new crawl/index function, whilst hardly a word had been written in agreement with it. It’s true that 3 people (you, Doug and Adam) have debated in favour of it since then, but that doesn’t make the assessment at that time wrong. Alright?

    As for the rest of your post, I’ll agree to disagree. I don’t have the inclination to go through it sentence by sentence, as you have done. I’ll just reply to one of your questions, though…

    Then you ARE a blackhat.

    My views and practices are what I described. I do whitehat as far as it is possible, which is almost all of the time, but if it can’t work, then I am happy to do blackhat. I never do blackhat for a client without the client’s full knowledge, understanding and agreement – never. You should read more – then you wouldn’t need to ask.

  419. Dave (Original)

    No offense, but you need to work on your basic comprehension skills. It’s just not possible to debate anything with you because you don’t seem capable of understanding the basics of what is being argued about. For that, you need to be able to read someone’s comment, comprehend what it is saying, and respond accordingly.

    The tangents you go off on defy any logic. Lord knows what you’ll make of this one…

  420. Adam

    Your points about OBLs are valid, but it’s not black and white.

    Links on the Web worked fine before Google came along. Websites linked to other websites because it was good for their visitors. People bought ad space (text and banners) on websites for the traffic, etc. etc. Links are what the Web is about – links *are* the Web.

    Then Google came along and largely based their rankings on link text (alt text for images), and as Google became more popular, people started to manipulate the links for ranking purposes. It couldn’t be any other way. The effect was that Google largely destroyed the natural linking of the Web. Because of Google, people are now wary of linking to other sites in case they are bad neighborhoods, or may become bad neighborhoods in the future. People exchange links for rankings, and not as much for their users or traffic. People don’t want to link to a site unless the site links back, AND from a page of equal value (PageRank). The natural linking of the Web has largely been destroyed by Google and the other engines that copied Google’s links-based rankings. In that respect, Google has been very bad for the Web.

    It’s true that many, many links are there just for ranking purposes, and it is a links-based search engine’s task to identify and nullify them within its system. I have no objections to that, even though Google brought it upon themselves.

    What I won’t accept is Google telling webmasters that there is a “right” way and a wrong way to put paid links (ads) on their pages, and giving people the impression that doing it the wrong way (the natural way) could attract some sort of penalty, as happened last year. It is not Google’s business to tell webmasters things like that. It is sheer arrogance to assume that paid links are there solely to boost the rankings. The same applies to other types of links.

    But Google does have a problem. They caused the link manipulations, and it has affected their results, so they’d like to identify and nullify the effect of ranking-type links. I don’t object to that. What I do object to is penalising sites on the blanket assumption that certain types of links are there just for ranking purposes. I don’t mind it if Google simply discounts certain types of links for rankings and PageRank, but I do mind if a site is penalised because of natural links.

    Intentionally leaving some or all of a site’s pages out of the index because of assumed link manipulation is morally wrong, imo. Matt didn’t say that sites are penalised for it, but he did imply that sites won’t have all their pages indexed unless they score well enough in OBLs and IBLs, among other things. He also said that such links are not hindering, but they aren’t helping. Imo, intentionally leaving pages out of the index, for whatever reason, is a penalty.

    That’s why I say that it isn’t a search engine’s business what types of links a site has in its pages. I don’t mean that an engine should count all links for everything to do with rankings, but I totally disagree with actively penalising sites because of them – unless it is blatantly obvious that they are spam. An off-topic link cannot be assumed to be there for rankings. An affiliate link is never there for rankings. Link exchanges cannot be assumed to be for rankings, whether on or off topic. By all means discount them if you don’t trust them, but don’t tell people what types of links they should and should not have in their pages, and don’t actively penalise sites unless it is certain that the links are there to boost rankings.

    As far as we can tell, what the new crawl/index function does is actively penalise sites, partly on the basis of links.

  421. Hi Matt,

    our site lost the Google Directory PR bar in July 2005. On the Google Toolbar we have PR7, but the GDPR seems to be PR0. The PR of all of our subsites is also PR0; only our homepage has a TBPR of 7.
    Could this circumstance correlate with inbound links?

    Thanks in advance,

    Greetings from Germany,
    Markus

  422. Eternal Optimist

    Matt,

    What a great shame a few people are hijacking your blog. The content of your post has been driven into the background by personal and sometimes egoistical comments from a few self-opinionated people, who should know better than to use your extremely informative blog for their own benefit.

    Over 400 posts so far, and I am sure that many of us pass by most of them, so as to keep within the framework of the topics you chose to discuss. I think you are owed huge apologies from certain posters on this thread. 🙂

  423. Over 400 posts, eh? It just shows how interesting Matt’s original post was, doesn’t it?

    I think if you read the whole 400, you’ll find that there are precious few posts that are off-topic. But if discussion and debate about the original posts aren’t allowed in these comments, some people really do owe Matt an apology. Let me see now – the post above this one is off-topic, isn’t it? Would you care to be the first? 😉

  424. I guess I should have clarified something before, and I can understand the confusion. My bad on this one…but I’m not backtracking on my stance. Just clarifying it a bit.

    What I actually intended to convey was that the OBLs mentioned in examples above were in general terms. Jack Mitchell, you probably wouldn’t link to something shady (I don’t know you, but I’ll at least give you the benefit of the doubt). But unfortunately, whether you wouldn’t do so or whether I wouldn’t do so or whether Dave wouldn’t do so really doesn’t matter worth a damn because we’re all individuals. There are a significant number of people that would do so, and that’s part of the reason I made the statement.

    Mind you, that’s the secondary reason. The primary reason is that, in general terms, the links mentioned earlier aren’t purely organic links. The question that needs to be asked when it comes to these types of links is “would I still link to this site if there wasn’t an income/traffic opportunity associated with the link?”

    The answer is, in most cases, no. I’m sure there are some cases where someone would link to a site regardless and more power to them for getting money for it. But the much greater majority of those links are there because they’re affiliate links…they may be topically relevant, but there is a bias associated with them.

    That’s important because it affects the end user. For example, let’s take Bob. Bob wants to go buy some Lawrence Sanders books.

    Bob visits Ted’s site.
    Ted has a “bookstore” of sorts listing Lawrence Sanders books from Amazon and Barnes and Noble (all affiliate links).

    Are the links of relevance to Bob?
    On the surface, yes.

    But, when we look at it much deeper, does Bob derive the maximum benefit from Ted’s site, and is he being presented information in a fair and unbiased manner?
    No. Most people would have said “how much does Chapters offer the Lawrence Sanders books for? What about Indigo? What about XXX bookstore?” And so on and so on.

    In other words, Bob got information that he could possibly use, but it’s biased information and may not be the best source of it.

    And before anyone goes off on this tangent, I can see what some of you are saying. “If the content is relevant to Bob, so what? SEs shouldn’t penalize based on this.”

    But…consider two groups of people in there before you make that statement.

    1) Bob, the average web user. He may not even know what an affiliate hyperlink is. Most people don’t. Is Bob going to be necessarily aware that the information he is provided contains a very real potential for bias? No, he probably isn’t.

    2) The small business. The one who may not be able to afford an affiliate program or have the human resources required to maintain one. Since that represents the vast majority of businesses, that’s a big problem.

    So yeah, discounting those links makes a buttload of sense. The information is potentially biased, the links in general terms aren’t purely organic, and the vast majority of users and companies get affected in a negative way.

    As far as paid hyperlinks not being there to influence rankings, I agree…there are legitimate advertisers that are after traffic from their advertisements. All they care about is that people are visiting their site from the ads they put out.

    And that in itself is an even more logical reason to use the nofollow attribute on a paid hyperlink. Knowing that certain stats programs (e.g. Live Stats) have a tendency to misreport bot traffic as user traffic, would it not make more sense to head that behaviour off at the pass and ensure that the traffic generated isn’t from bots (at least the ones that adhere to the nofollow directive) and that it is from actual people?
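
    (For anyone wanting the mechanics: rel="nofollow" is just an attribute on the anchor tag. A small sketch – with an invented advertiser URL, nothing from this thread – of tagging known paid links in a page’s HTML so that engines honouring the attribute leave them out of ranking calculations:)

      # Sketch: rewrite anchors whose href is in a known set of paid URLs to
      # carry rel="nofollow". The advertiser URL is invented.
      from html.parser import HTMLParser

      PAID = {"https://advertiser.example.com/"}

      class NofollowTagger(HTMLParser):
          def __init__(self):
              super().__init__()
              self.out = []

          def handle_starttag(self, tag, attrs):
              if tag == "a" and dict(attrs).get("href") in PAID:
                  attrs = [(k, v) for k, v in attrs if k != "rel"]
                  attrs.append(("rel", "nofollow"))
              self.out.append("<%s%s>" % (tag, "".join(' %s="%s"' % a for a in attrs)))

          def handle_endtag(self, tag):
              self.out.append("</%s>" % tag)

          def handle_data(self, data):
              self.out.append(data)

      p = NofollowTagger()
      p.feed('<p><a href="https://advertiser.example.com/">Sponsor</a></p>')
      print("".join(p.out))
      # -> <p><a href="https://advertiser.example.com/" rel="nofollow">Sponsor</a></p>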

  425. Matt,

    What a great shame a few people are hijacking your blog. The content of your post has been driven into the background by personal and sometimes egoistical comments from a few self-opinionated people, who should know better than to use your extremely informative blog for their own benefit.

    Over 400 posts so far, and I am sure that many of us pass by most of them, so as to keep within the framework of the topics you chose to discuss. I think you are owed huge apologies from certain posters on this thread.

    I’m just wondering if you can be a little more specific as to who you are referring to and why. I don’t really see this so far (although at this point, recordset pagination would be a REAL good thing.)

  426. What I’m getting at about paid links (ads), Adam, is that it isn’t a search engine’s place to tell webmasters that there’s a right and wrong way of putting paid links on their pages, as happened last year. The problem that engines have is internal, and should be sorted out internally.

    But that’s a bit of a digression on my part, because I have a ‘thing’ about search engines trying to *change* what webmasters do with their sites, instead of working with the Web as it is – as it always was – and resolving their problems internally.

    I don’t disagree that engines can treat links in any way they want. I’ve said a number of times in this thread that I’m not against them discounting or ignoring any links that they prefer not to count, and affiliate links are certainly not within the scope of what links-based engines want to count for rankings. So I don’t think that we have any disagreement on that score.

    What I am dead against is penalising sites just because they contain links that search engines don’t want to count, and intentionally omitting some or all of their pages on account of them is well out of order, imo.

    It may be that nothing has changed on that score, and that we simply hadn’t noticed that pages were intentionally omitted. We knew about less frequent crawls for sites with less PageRank, but I don’t think anyone noticed if pages were being intentionally left out, and they may have been. If they were, then the new crawl/index function merely improves the identifying of non-countable links, and as Matt said, they just aren’t helping any more.

    If Google was doing it before, then my view is still the same – they should not penalise sites on the strength of links. Discount them if you like, treat them as nofollows if you like, but don’t intentionally omit pages unless you are short of space, or unless the links are definitely spam.

    I’m sure there are some cases where someone would link to a site regardless

    Coincidentally, I have a site that is less than a year old. I’ve no idea how many pages it has, but it’s probably in 5 figures. It’s a decent and useful resource, with no affiliate stuff, and no paid anything in it, and I intentionally built it to ignore all linking ‘strategies’, including link exchanges (it specifically says that it doesn’t do link exchanges). Linkswise, it’s just a plain ‘organic’ site. It links out to hundreds of totally relevant sites, and the only IBLs that I gave it were several from 2 of my own sites – one on-topic site, and one off-topic site. Those IBLs were just to get it noticed by the engines. It was doing fine until this fiasco. Plenty of pages were indexed, and plenty of people were using it, because it’s useful. But no more. It’s dead now. It has 13 pages indexed normally, and 407 pages in Supplemental, and all of the pages have useful content.

    Where’s the sense in that? Apart from the 2 off-topic starter IBLs, the site is so organic that you could eat it for lunch. I haven’t complained about it, although I’m obviously disappointed, but where’s the sense in it?

    Judging by Matt’s post, I would guess that the 2 IBLs from my off-topic site are no longer counting, but the few from the on-topic site should still count, although they have very low PRs. So there’s a useful site that people were using, that is now dead because of what? Presumably it doesn’t have enough trustable IBLs. Is that a good reason?

    Btw. Am I the only one who sometimes has to refresh the thread because the captcha code is unclear? I’ve had to do it several times in the last few days.

  427. So how would you suggest affiliates make money then? If Google discounts all affiliate links then people with affiliate sites will be starving. Take a look at the price comparison sites that just use an affiliate link but compare prices from the different merchants. Does this offer value? In my eyes it does. As do coupon sites. The big problem here is to try and cover all affiliate sites and say they don’t offer any value, which of course is not true. Yes, there are some that don’t, but you can’t punish all the good ones that deliver value just because of the bad apples out there. In my mind, it’s a lot like link directories. On a side note, it needs to be said that an affiliate is doing the same thing basically as AdSense, i.e. advertising a good or service. Somehow I cannot see Google penalizing sites for AdSense, though, so how are affiliate links any different?

  428. It seems to me that they have just implemented a second “sandbox” (I know, but you may call it any way you like :-)). Not only do they not list, or should I say “mention”, sites in competitive industries for up to 18 months (or even more), they now also prevent every “sandboxed site owner” from getting traffic by creating more pages (read: content).

    I never liked the idea of being dependent on a single search engine, or any third party for that matter, in terms of revenue, but I am strongly considering whether webmastering makes any sense at all anymore.

    I feel an article coming… 😉

    Star

  429. Well, I’m getting increasingly despondent with Google. I have a website that is full of useful content and not remotely spammy; I only link to relevant and useful sites, and my content is updated and added to constantly, but the more my site grows the more Google seems to hate me 🙁

    Now Google has dropped all but 48 pages from my site and I don’t appear anywhere in the search results for my main keywords. There are sites on the second page of results that bear no relevance to the keyword search and are spammy.

    It seems Google’s big updates are penalizing many honest sites and rewarding the spammers.

    I feel an article coming

    LOL!!! 🙂

    Jack. Imo, it’s ok for Google to discount affiliate links, so that they don’t help anything, as long as they don’t dump a page because of them (except pages that are nothing but affiliate links, of course).

    Matt said a couple of times that certain links weren’t harming, and that they just aren’t helping any more. That’s fair enough, but what I don’t understand is why point out OBLs, because they never helped the page or site, anyway – at least we didn’t think they helped. If they helped, the dumped site I just described should be stinking rich with Google muscle 🙂

    I can only think that the reason for pointing out the OBLs is because they are now scoring negatively for crawl/index purposes, and that, imo, is just plain wrong.

  431. Sorry Phil; as I just can’t resist this comment: 🙂

    Phil wrote:
    “I never do blackhat for a client without the client’s full knowledge, understanding and agreement – never.”

    The thing is, you are assuming you are giving that client “full knowledge”. How do you know that? I don’t know about you or anyone else, but the people who phone me after their sites got penalized by some firm out there “all” said the firm gave them full disclosure. They simply had NO IDEA of the consequences of being caught for search engine spam, and were not happy about it at all.

    Isn’t it the job of the firms in our industry to “educate” that client that they “never” have to do blackhat stuff? Shouldn’t we be showing them how it’s done? I guess if a firm is an SEO “only” and does not or cannot redesign sites so as to do things that are within the SE guidelines, then that firm has to resort to blackhat stuff. I think that’s bogus. I also think it’s just not right to assume “every” person on the internet fully understands everything involved with the risks, and exactly what it means to get a ban or penalty.

    This is the biggest beef I have with blackhats. They always fall back on this thing called “full disclosure”, when it’s easy enough to educate the client and then proceed to fix the existing site without spam.

    YES, this post might be off-topic. That’s tough beans. I could not let his comment go without a rebuttal.

    I still think the majority in this thread just don’t see the big picture, and are simply taking what Matt says word for word and applying it to their sites. I would never be that short-sighted.

  432. Wotchit Doug, or you’ll have Eternal Optimist on yer back 😉

    Yes, I missed a couple of phrases out. Here is my first description from higher up the thread. You’ll notice the differences.

    Never ever use blackhat on behalf of a client without the client’s full knowledge of the risks involved, and his/her agreement to take those risks.

    If a client wants me to do something that I know is non-compliant, I tell them that it can attract a penalty, and we discuss what the penalties could be. I.e. they know that it could result in their site being dropped from the engines. It isn’t possible to disclose more fully than that.

    It will probably surprise you that, when I come across a new client who already has spam on their site, such as hidden text, I tell them to remove it, and I tell them why. It does happen.

    Isn’t it the job of the firms in our industry to educate that client that they never have to do blackhat stuff?

    No it isn’t. It is our job to tell them the truth.

    I’m not going to debate it here because, as you rightly pointed out, it is way off-topic. I said before that you are welcome to discuss/debate at my place. Flaming isn’t allowed there, so you may feel a bit restricted, but there are plenty of whitehat views, so you wouldn’t need to feel alone.

  433. Doug

    I still think the majority in this thread just don’t see the big picture, and are simply taking what Matt says word for word and applying it to their sites. I would never be that short-sighted.

    I’m sure you’ll understand when I say that I’d much rather take Matt’s word for what Google does systemwise, than yours, or any other outsider’s.

  434. Hey Matt, thanks for the response. I didn’t mean to insinuate you hadn’t read my post; sorry about that.

    In case you need a refresher, I’m the guy who was asking questions regarding the influx of recently expired non-adult domains being filled with spam doorway pages showing up high in the SERPs across the board for the most searched “adult” terms.

    I have a follow-up question… and a comment.

    #1: I notice the example (offending) domains I posted vanished from the SERPs shortly after I reported them. I have reported this chain to Google before, and in Google discussions, and no action ever took place, so I thank you for whatever you did that seems to have improved the SERPs (somewhat). I also noticed that all the ones I didn’t specifically mention were NOT removed. I really hate to do it, but since I never seem to have any luck with Google support or Google Groups taking action, I’ll drop some more links and hope whatever voodoo took place happens again and they are removed. (If not, I don’t mean to be pushy; I just hope you can help.)

    #2: OK, this may sound funny/silly/simplistic, but here goes: why doesn’t Google have a team of people that manually reviews the top 100 search terms from the day before for spam pages? I don’t mean every page, but the top 100, say, and I don’t mean removing any site that’s questionable – just the obvious ones. I timed myself: I searched for 50 of the most searched terms on Google and reviewed the top 100 results for each, and it took me approximately 3 hours to sort them into “obvious spam pages” and “safe”. So it doesn’t seem that hard that you couldn’t have a team do that. If you extrapolated my experiment and had a team of 50 people, you could manually review 250,000 sites every 3 hours, and you would have cleaned up the most likely searched terms. (Now of course this would make the results way too relevant and nobody would click on the ads *cough* *cough*.) 🙂

  435. Here are those examples:

    charterschoolleadershipcouncil.org/main.html
    gtpmanagement.com/hot/hardcore.html
    1800hithome.com/pornmovies.htm
    helsingborgsguiden.com/mpegs.htm
    sitel.org/hot/anime-sex.html
    nfgslim.com/amateurporn.htm
    bradseglarforbundet.com/indianporn.htm
    dogori.com/amateurporn.htm

  436. Hey Phil, can’t you move this babble off Matt’s blog and put it somewhere else? I don’t want to read it, I don’t want to scroll through your self-aggrandizing garbage, and I’m sure I’m not the only person who feels this way.

    Since you have so much to say, and since you are clearly, in your own mind, right in everything you say, why don’t you blog your thoughts somewhere else? I don’t want to wade through this junk to follow this stuff; it’s a waste of my time.

    Or can’t you get any readers on your own blog, so you have to come here? That’s probably it, is my guess.

  437. Dave (Original)

    RE: “Lord knows what you’ll make of this one…”

    Same as your other one: not much. You only seem capable of focusing on the person and not the topic.

  438. h2. I’m sorry that you don’t like debates, but I can’t help that. I can only repeat what I suggested to you earlier:

    “Then I suggest that you skip my posts. It’s easy enough to do. Just look at the name and, if it’s mine, skip to the next post. Then I won’t waste any of your time. Easy huh? ;)”

  439. Matt:

    While I agree with your comments regarding spammy links and links that are not relevant, I can only assume that your comment that a mortgage link on a real estate website was not relevant was an error.

    While the way the link was implemented, amongst other irrelevant links and done purely for PR or rank, was poor quality, realtors work with mortgage brokers every day, and the referrals that go back and forth between them in real life confirm the relevance of the two professions, and therefore of the related websites.

    What do you say?

    John

  440. Dave (Original)

    RE: “Please get off the silence thing…”

    For someone who doesn’t want to go there, I find it odd that you mention it every single time. Perhaps what you really mean is “Let me have the last say on it” 🙂

    RE: “You see? It was all about what had been written in the thread at that time”

    Then I guess we should NOW “assume” the tide has turned against you 🙂

    I know what you wrote about blackhat work (I read it) and it makes you a blackhat. Period. Or is a cheat only a cheat if they cheat ALL the time?

    RE: “What I’m getting at about paid links (ads), Adam, is that it isn’t a search engine’s place to tell webmasters that there’s a right and wrong way of putting paid links on their pages, as happened last year. The problem that engines have is internal, and should be sorted out internally.”

    It is sorted “internally”. Those that do not toe the line, in return for a FREE Google placement, are dealt with “internally”. You ALWAYS have a CHOICE and nobody can force you. Why not extend your belief to hidden text, cloaking etc.?

    Do you have rules for your forum, blog or whatever? I thought so.

    RE: “I don’t disagree that engines can treat links in any way they want”

    Yes you do! You disagree that Google treats sites with poor/few/no links differently to those that have good/many links.

    RE: “I don’t have the inclination to go through it sentence by sentence, as you have done”

    That’s because I’m not cherry picking.

    Anyway, as you are admittedly cherry picking, it is pointless debating further. I will end with my thoughts on the issue.

    Google is the most popular SE in the history of SEs. They have guidelines that are written in a VERY clear manner, IMO. The one statement below says it all: “Following these guidelines will help Google find, index, and rank your site.”

    In ADDITION, Matt also helps many by posting his beliefs and thoughts etc. Rather than moan, gripe and complain from an UNINFORMED position, I do as they ask and apply “common sense”. In return I have full listing in Google.

    There have been MANY times in the past when 1000’s of my pages have ‘disappeared’ from Google. But rather than moan, gripe and complain from an UNINFORMED position, I think “What can I do to help Google find all my pages?”. Many times they simply come back without me doing anything.

    So, in summary, I look after my site and try to make it useful, popular and of quality and let Google get on with running their site. Guess what? It has worked for nearly 10 years 🙂

  441. Matt,

    I heard a rumor that Google might penalize the sites of those who hijack your blog comments and turn them into a personal rant space, using it primarily to attack each other in public without actually adding much content to the conversation. Is this true?

    -Michael

  442. So how would you suggest affiliates make money then? If Google discounts all affiliate links then people with affiliate sites will be starving.

    First of all, when did Google become this huge monopolistic entity that so greatly controlled traffic that not being indexed by it would make or break a segment of the website “population” (for lack of a better term)? There are millions of other places to draw traffic from…and webmasters that don’t know what those places are will have much deeper issues than indexing/ranking. No site or search engine should be so prevalent in a site’s stats that even the slightest fluctuation can deeply impact the bottom line of a business.

    Second, I never said once that affiliates couldn’t make money. If someone is paying them a percentage of sales from a hyperlink, more power to them. That’s their choice to do so.

    The problem with that logic is twofold:

    1) “Affiliates” can take away business from operations that have arranged for proper supplier relationships, taken care of shipping issues, RMAs, customer problems, and have generally put in a lot more effort. How difficult is it to maintain a hyperlink on a site in comparison to someone who has to deal with all of that stuff?

    2) The question of whether the hyperlink would still be there if the affiliate arrangement isn’t there has not yet been answered in such a way as to suggest it would. (In most cases, it likely wouldn’t).

    Jack, you need to step back and look at this from another set of eyes. The beauty of the comments that I’m making is I have no interest, personally or professionally, in your site whatsoever. I don’t know anyone that would compete with it on any level, directly or indirectly, and I sure wouldn’t.

    In your case, you’re negatively affected, and you’re upset about that. I understand that, and can empathize to some extent. But you need to realize that, while you have put in more effort than a “typical” affiliate site, it’s still an affiliate marketing site and does present some bias, whether you choose to accept that logic or not (there doesn’t seem to be one free link on your site). They’re not “your” products…when you send people to the affiliated sites, they’re not your problem after that point…so you really don’t have to do much other than tell people about your site to maintain your “business model”.

    When it comes down to it, all you’ve done is used what appears to be a stock osCommerce skin, thrown some hyperlinks into it, and said “here’s my so-called storefront.”

    And from the standpoint of the end user, that’s not as valuable a resource because of the bias presented. If you’ve already got Adsense on your site, why not offer a truly useful shopping resource with free links to things? You know, compare those sites that don’t offer you any financial compensation for doing so and be as thorough as you can about it.

    More links and info = more content = more traffic = more money in the long run than the bits and pieces you’ll pick up from affiliate links in the short.

    No matter how you explain it, affiliate marketing (in its online incarnation) represents a penny-ante game…minimal effort, minimal return. Why should Google reward that?

    On a side note, it needs to be said that an affiliate is basically doing the same thing as Adsense, i.e. advertising a good or service. Somehow I cannot see Google penalizing them for Adsense, so how are affiliate links any different?

    There are major differences between Adsense and what you’re doing in the body of your site.

    Adsense is a contextual advertising solution designed to provide complementary ads for content. The sites themselves don’t necessarily have to “sell” anything as their major premise…in fact, most of the good ones don’t. It’s advertising, and it’s clearly delineated as such. And those advertisers who choose to participate don’t gain anything from search engines by doing it…it’s a straight exchange of money for user traffic. That’s what non-SE advertising is supposed to be about…money for traffic.

    With Adsense, the webmaster cannot fully control the ads that appear…they’re served by Google in an attempt to be contextual. Webmasters can’t turn around and say “I want eBay because their ads pay more” or “I don’t want mom-and-pop because they only pay $0.10 per clickthrough.” Webmasters don’t know, and that means they have minimal influence (other than via content) over the links that show up.

    Just some stuff to ponder for you…maybe you’re looking at this from the wrong angle.

  443. Dave (Original)

    Michael, I heard Google was going to target those who try and stir the pot.

  444. Hey Adam:

    Your post was very good reading. The main thing I wanted to emphasize in my analysis was that my affiliate sites and Adsense are both advertising IMO. To me that is what I do: advertise specific products or specials. I know I could add some informational content, but if I wanted that kind of site I’d do a sales site. To me a sales site and a content site are two very different things. I do have a movie blog that links to my movie sales site, and that site just has movie reviews and movie news. Lastly, bear in mind I don’t mind if pages rank low, just that pages aren’t being indexed, even in the supplemental index.

  445. Can anyone explain to me what “linking to spammy neighborhoods on the web” means?

  446. Matt,

    I’m glad that you and the guys at Google are working hard to improve things and that you’re still listening. However, having read the post by Lunov above, I have to say I’m getting something very similar. Our site seems to be listed on the DCs (I have done site: on most DCs), but it is only ranking for keywords on a limited number of them. I have no idea why this would be, nor does there seem to be any explanation from Google or any lead SEOs on why this would happen.

  447. Dave (Original)

    Then I guess we should NOW assume the tide has turned against you

    Three people are a tide?

    I haven’t read the rest of your post. I got to that bit and decided that you haven’t anything to contribute except argumentativeness just for the sake of it. Sorry.

    Adam, Jack

    *If* affiliate links on a page have a negative effect for the page with Google, then it would be grossly hypocritical of Google, because AdSense is nothing more than affiliate links, whether they are on the actual page or not (AdSense isn’t on the page – it’s in an iframe – a different page). I don’t see that contextuality has anything to do with it. For instance, if you write an information page about various kinds of mortgage, and you add a few affiliate links to some mortgage companies, they would be on-topic, but for Google they would just be affiliate links.

  448. Hi Matt, I have been reading your blog for a long time. Now you have said that:

    “The sites that fit “no pages in Bigdaddy” criteria were sites where our algorithms had very low trust in the inlinks or the outlinks of that site”

    Our company has a network of more than 20 sites, and we link to our sites under “Our Network”. We also advertise heavily on AdWords.

    How are those links treated in this update? They are not related, but the sites belong to the network of that company.

  449. Matt,

    I’ve lost eleven pages since Friday, all of which were indexed in the past 10 days. Today only 59 of our more than 400 pages are indexed, after hitting a bottom of 35 on May 3rd and a small peak of 70 on May 9th. Furthermore, from May 3rd to yesterday, while the overall number of pages indexed remained flat, what was indexed did not. Every few days one or two “old” deindexed pages appeared while pages that survived BD (no matter what their PR or content level in our DB) disappeared. And while our keyword ranking for pages listed on any given day has returned to pre-BD levels, the SERPs vary significantly from day to day. Pages returning a top 10 position yesterday have disappeared from the top 1000 today.

    I would hardly compare this roller coaster with the recovery you alluded to in your original post. You’ve suggested that Google is crawling sites with few quality links less frequently than before, I’m actually being crawled more than I was before. Furthermore, you’ve suggested web pages that are considered spammy are being put in the box for roughly 30 days. If Google considered my pages spammy why were they indexed or reindexed since May 3rd only to be deindexed once again? This is not a gripe. I would greatly appreciate a response so that you or I can find a fix to whatever it is that my site is suffering from.

  450. Hi Matt,

    Great post – not too much of the voodoo (as my non technical colleagues call the more hard-core lingo).

    One point I wanted to make, though, was about the real estate site with the links to the mortgage lender. What does someone usually do when they have found a house to buy? Find someone to lend them the money to buy it with.

    In this instance I think that it was a perfectly valid link (and no, I have nothing to do with either sector). It’s a small point, but I think a valid one in relation to the relevance of links.

    I’m not sure what the answer to this is, but maybe that’s why you work for Google & I don’t!

    😉

    Cheers,

    Ciarán

  451. Your post was very good reading. The main thing I wanted to emphasize in my analysis was that both my affiliate sites and Adsense were both advertising IMO. To me that is what I do, advertise specific products or specials.

    I agree totally with the first part. They’re both forms of advertising. But the manner in which they are presented, as Phil pointed out, are very different.

    Google Adsense is usually quite distinguishable, even when blended into the rest of the content, from the actual page itself. An affiliate link can be buried in content without the average user knowing it.

    Webmasters also have greater control over affiliate links than they do over Adsense.

    And therein lies the problem.

    Jack, you may advertise different specials as your site’s theme, and that’s cool…but again, I go back to my original point about bias. If you’re truly out to assist your user base in the best manner possible, then it really shouldn’t matter whether or not you’re getting a cut of the sale/special. If you only promote affiliate links, to a certain extent you’re cheating the end user and presenting partial content.

    It’s still a fair trade…you get content for your site and the opportunity to increase your userbase (since you’re still running Adsense, you’re fine that way), and you get the Adsense income from the ads themselves.

    As far as being hypocritical on Google’s part, it would only be hypocritical if both Adsense publishers and advertisers were rewarded for any contextually provided hyperlinks, and there is nothing to suggest that. With Adsense, since it’s behind a Javascript, it can be assumed relatively safely that it’s a straight traffic-for-money exchange, with no SERP benefit. (Yes, it’s possible that Googlebot could read its own Javascript and thereby extract the links that way, but there’s nothing that would establish that behaviour and I believe it’s not the case.)

  452. Jack, you may advertise different specials as your site’s theme, and that’s cool…but again, I go back to my original point about bias. If you’re truly out to assist your user base in the best manner possible, then it really shouldn’t matter whether or not you’re getting a cut of the sale/special. If you only promote affiliate links, to a certain extent you’re cheating the end user and presenting partial content.

    Isn’t this exactly what Google is doing? Promoting only sites deemed “worthy” by the types and number of their links, cheating the end user, and presenting partial content?

    A case of “Do as I say, not as I do.”

    Dave

  453. “Promoting” should be read as “indexing”…

    Dave

  454. I don’t know where the “types” portion of it came from or what you’re referring to. So I’ll leave that part alone.

    As far as cheating the end user goes, yes, there are sites that wouldn’t show up that probably deserve to be there. But there are also sites that don’t need to be there, and that don’t have the backlinks, which could end up indexed just because someone asked. The problem is that there is no conclusive way to tell just from looking at a site. How do you tell?

  455. As with the latest update… I’ve noticed sites like ehow and about are now leading the pack on most search terms…

    and the little sites are MIA, and dropping out fast…

    I’m starting to wonder if Google is lining their pockets more…

    Also, last night I went to buy a hard drive and noticed that Western Digital included “The Google Pack”.

    Coincidence or not, Western Digital is ranked “1” for the term harddrive…

    And as for sites selling links, Google shouldn’t get involved if the links fit the site… Who is Google to say that sites can’t sell advertising (aka links) and collect money? We couldn’t care less if the people paying for the links are after PR, link pop… or just prime real estate on the sites…

    Google used to have good results. Now I notice Google punishes sites for using DMOZ, yet Google is showing up more and more for its own version of the directory…

  456. Missing the point, and repeating errors. Yep, that pretty much sums up my problems with site indexing. Let me start out by saying I paid Yahoo to list my site. I thought at the time it was blackmail, and I admired Google for letting me go through the process for free. But honestly, I have spent significantly more money on Google if you factor in time. I work with a guy, Isaac Bowman (http://www.isaacbowman.com), who is a Google fanatic. I mention his site because he is trying to be a blogger like you, and a lot of his posts are about Google. Every day it is Google is doing this or doing that, but I think even he is baffled by our situation.

    We are a new company, and we want to be active in managing our Google listing. In fact, we spend a couple of hours a day trying to improve our rankings. We pay for AdWords, we use Google Analytics, we track goals, we use Sitemaps, and we are listed a dozen times in different Wikipedia pages on our business topic. A short point on Wikipedia: our pages on electronic and digital signatures bring us 50 times the traffic our PPC brings. I mention this because I was SHOCKED. I had no idea people used Wikipedia to that extent. The irony is that they drive that much traffic to us because they are listed in the top 3 search results for electronic signatures and digital signatures on Google. The content that exists on Wikipedia is a direct copy from our site. We wrote it and published it there, but they get all of the love.

    We know that active Google management can and will drive business. Unfortunately we cannot speak with anyone at Google to get help, and it is frustrating me. We all own Google stock, and we believe in the company, but the average small business person must be lost, because we are working our butts off (I wish it had a literal effect), and we are getting nowhere. In early April of 2006 we had 130 pages indexed by Google. Then one of our developers posted a robots.txt on our site blocking everything (a quick way to check for that kind of lockout is sketched just after this comment). This was brilliant, NOT, but nonetheless we tried over the next 50 days to fix it. We even submitted a re-inclusion request last week, because we thought maybe we got blackballed or something. I know people will make mistakes, and I know there are ways to fix mistakes, but I do not know how to expedite the process, and I need help.

    This is what brings me to you. I know you don’t answer specific site questions, and this post will not make it on your site, but Matt, I need help. Is there someone I can call or email who could suggest tools or approaches we are not using? I just want a helping hand to come down from the Google heavens and give me the answers. Even though it goes against the open world we all hope for, I would even pay, but SEO always seems like a scam, and honestly there is nothing they can do that we shouldn’t be able to do with hard work and effort.

    Matt, this is very forward and you don’t know me from Adam, but I sure hope you made it through this. I know I am not alone in wanting a help line, or a support service. We respect Google, we pay homage to this blog and the work you have done. I just see this as our last shot in the dark before we curl up in a ball and forget the whole thing ever happened. If you have time, and I know you don’t please look into https://privasign.com, and help me figure out what sacrifice I can pay the gods to get back to where we once were, or heaven forbid even better.

    Thanks Matt. You are the Google voice of reason, and we truly do appreciate the help you try and offer to others. It is obvious you care, and I hope there is an answer out there somewhere.

    Jason McKay
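
    To make the robots.txt mishap above concrete, here is a minimal sketch, using only Python’s standard library, of checking whether a robots.txt locks crawlers out. The domain and paths are placeholders, not the commenter’s actual site.

        # A "User-agent: *" section containing "Disallow: /" blocks every
        # page for every bot - the accidental lockout described above.
        from urllib.robotparser import RobotFileParser

        parser = RobotFileParser("http://www.example.com/robots.txt")
        parser.read()  # fetch and parse the live robots.txt

        for url in ["http://www.example.com/", "http://www.example.com/products/"]:
            allowed = parser.can_fetch("Googlebot", url)
            print(url, "->", "crawlable" if allowed else "BLOCKED")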

  457. But there are also sites that don’t need to be there and that wouldn’t have the backlinks that could end up indexed just because someone asked. The problem is that there is no conclusive way to tell just from looking at a site. How do you tell?

    I don’t think you need to tell, Adam. Imo, a general purpose search engine should index as much as it can – just because it’s there. If they can then find, with a reasonable degree of certainty, that certain pages and links shouldn’t be indexed because they are spam, or because they are links that shouldn’t be counted, then drop them, or remove the links from the index so that they don’t count for rankings. What Google is doing is simply leaving pages out on the strength of a site not scoring enough in the trustable links department – not enough juice.

    Incidentally, the site that I mentioned near the top of this thread suddenly came back yesterday. It had got down to having only 25 fully indexed pages and now it’s up to ~14,000 of them. I’m not now certain whether it suffered from the dropped pages syndrome (DPS), or whether it had been penalised again because of its functionality and the timing was coincidental. So that’s one bit of good news.

  458. Dave (Original)

    RE: “If you’re truly out to assist your user base in the best manner possible, then it really shouldn’t matter whether or not you’re getting a cut of the sale/special. If you only promote affiliate links, to a certain extent you’re cheating the end user and presenting partial content”

    Isn’t that true of ANY selling site? That is, they don’t promote the competition.

    RE: “a general purpose search engine should index as much as it can – just because it’s there”

    Why do you assume they don’t? Looks to me like Google are far ahead of all other SE’s in that area. Like I have said (but I guess you didn’t read), perhaps they HAVE to make choices (no, I don’t know the reason) as to which pages they index.

    We can rest assured though that a reason DOES exist.

  459. Phew, it took me some time to get through this blog entry – great post Matt.

    I suppose what you are saying is: forget reciprocal link exchanges and concentrate on building content that people will want to link to, as these are classed as better quality links.

    For blogs and such sites I think this is much easier – I only have to post about the effects of the South East Asia Tsunami on my blog for a large number of people to link to it.

    Getting quality links to a website which is providing a service or product, though, is going to be more difficult – if I have a page promoting blue widgets, who is going to want to link to that?

    Time to get my thinking cap on 🙂

  460. Dave (Original)

    RE: “Getting quality links to a website which is providing a service or product though is going to be more difficult – if I have a page promoting blue widgets who is going to want to link to that”

    Agree, it is harder. I would have links on that page to “how blue widgets work”, “why are blue widgets blue” etc. On these pages I would link back to the blue widgets page.

  461. I don’t think you need to tell, Adam. Imo, a general purpose search engine should index as much as it can – just because it’s there.

    Really?

    So something like this should be indexed:

    http://www.bme.gatech.edu/groups/fontan/welcome.htm

    Or this…

    http://216.89.218.233/

    (By the way, I already know about the broken images and CSS…but since it’s not the live site, I couldn’t care less).

    The former is actually indexed in Google (if anything, showing a weakness in the engine as far as excessive indexing goes, although I suspect that’s part of a framed site and the menu frame isn’t showing.)

    There are millions of pages under construction, just like these two…and that’s just one of the reasons why your logic is extremely flawed (and it’s not even the best one). How are these pages of any use to anyone in any capacity?

    Just because a page is there doesn’t mean it should be indexed.

  462. Dave (Original)

    Thinking on the indexing issues, wouldn’t sitemaps also help Google find all pages on a site?

    If yes, then all any site needs to get all pages indexed is sitemaps.

  463. Thanks for your time.

    I worked for Geac on big library indexing in the seventies, writing bespoke add-ons and installing core systems/troubleshooting, so i think a bit like you guys i guess, one of the few still alive and almost sane.

    Given that background: on my teddy bears site, which helps ex-alcoholics/addicts like me for free, i got high level listings for most pages, purely designed as fun bear adventure pages to make people laugh, and self-help for various things like weight loss, addiction, debt etc. on original pages until some unscrupulous russian porn guys took over my guestbook with their links.

    that was in november 2004. my mate spotted it in october 2005 (razor sharp on the uptake). it was so obvious im amazed i missed it for nearly a year. though i rectified it in october 2005, i still seem to be languishing as a site.

    i have an issue with a couple of pages highlighted before in your blog you so kindly published for a creaking old pro like me; which your excellent sitemap service brought to light – however even with that fixed i cannot see it would have stopped all the other 200 odd pages from gaining credence – so i suspect i still have a ball and chain attached whilst i am trying to swim horizontally man.

    is it likely that as it stands my site will gradually resurface properly or am i still, to quote doug and dinsdale, transgressing an unwritten law. is this also why my head is nailed to the floor or is that another google algorithm at work?

    my friend and i have other sites for business, which have taken a hammering lately, but as they are genuine attempts and not spam i reckon they will resurface as you tune it up to reinclude those good guys you screwed up in the crossfire; in any event thats business; this teddy bear site is purely there to help people help themselves/make people laugh at the bear adventures and forget worries for a while, i.e. to spread joy man.

    its not spreading too much joy at present as it only gets 400 odd visitors a day, whereas before declassification/emasculation it had 1000 a day and rising. i make no money off of it except for the odd donation, which everyone on the blog will know doesnt pay the piper on a website;
    i used to get 200 quid an hour, i reckon i lose about 4 an hour now on a good day!

    strangely, as the bears link to some pretty odd places/frogs site/ferrets site, and undoubtedly by their nature some VERY odd people (like me) link to them, nevertheless all their pages are there. (bound to disappear now).

    problem is, few get above position 141 in the serps – not prime territory for normal mortals.

    my point in all this rambling is this:

    as with a lot of other people on this blog, i have slogged my way through submitting emails to google. very erudite emails. and i always received an answer, unlike some, but it was pretty much machine generated, mass produced help waffle totally disregarding any salient points at all – usually with a four day lapse time. this is not fun: to spend hours writing up your problem, which you only got to doing because all else had failed, only to be regaled with MR HELP from zog says read this help text……

    i think it would help more, even if it took a week or so, to know you were going to get some kind of answer within the bounds of reasonable disclosure, which pertained directly to the questions specifically asked.

    presumably Mr Cutts, you will now email me with “i note your blog entry and refer you to paragraph 23 sub paragraph 67 of the google help manual”. i would see the funny side.

    i doubt this will be published man, but it is a genuine sadness i cannot reach others to help them and put a little back with my expertise that once i took out.

    sometimes these sweeping changes, somewhat cavalier in their instigation, take a toll on completely innocent sites merely trying to forge ahead quite legally and properly, whose owners often go bankrupt or mad before a solution filters through, if at all. a lot of people identified with google as the boy made good, the underdog that bit back, and put their faith into gaining credence and exposure on Google, often abandoning others and putting all their eggs in one basket BECAUSE THEY LIKED GOOGLE AND WHAT IT STOOD FOR, so it’s not so good if they suffer total annihilation because of that, if their sites are indeed pukka and not spam. i think Google has a slight moral obligation to try to weed out these good sites and help reassert them to proper status, and having a proper email/helpline/support line would surely allow them to at least have a fair crack of the whip. there is a danger Google is forgetting its origins and primary ethics in abandoning its little people’s sites to ruin by draconian algorithm changes – is this what the original altruism envisaged?

    from my own viewpoint it would be nice to see the teddy bears unshackled, at the very least the addiction and weight loss pages.

    Thanks for listening, malcolm pugh
    mr-bong@blueyonder.co.uk
    http://www.stiffsteiffs.pwp.blueyonder.co.uk teddy bears site for all man.

  464. Dave (Original)

    malcolm, are you saying your site’s pages aren’t ranking as well as you would like and/or that not all pages are indexed?

  465. PhilC said:

    Intentionally leaving some or all of a site’s pages out of the index because of assumed link manipulation is morally wrong, imo.

    I wouldn’t say it’s morality is something you can argue about – but I would suggest it’s dangerously close to anti-competitive behaviour, *if* Google were to be saying that certain non-Google advertising models could result in penalties for the advertisers and publishers.

  466. Should clarify – first sentence should read:

    “I wouldn’t say morality is something you can argue about in business.”

  467. to dave original

    nice of you to read the ramblings of an insane english systems programmer with a blissfully stressful thirty five years at the grindstone – it hasnt affected me………….

    astonishingly, in this paranoid era, ALL my pages index, even discussion ones i couldnt give a damn whether they index or not – im sure there are pages in there from people just being emails, or discussion pages from aunt dolly to her errant little jimmy on mull faring organically………

    however i digress, yes – these pages used to figure on page one of google before the russian porn guys – period. ALL of them. i guess i should count my blessings they are all there at all, i hear you all wail, but notwithstanding that, apart from two which are mentioned in dmoz (which i think lifts them to divine absolution by the god brian) all of them languish between about 89 – 199 in position. these are pristine pages. they validate on every validator known to man, and also on one i wrote in c specifically for google. they pass four other programs i wrote in c for google affinity tests i couldnt face doing over and over by eye – they also xenu fine, they are keyword density we love you status – they are even allowed to sing in the choir in the local church so pure is their sound – however they potter along in no man’s land waiting for a google shell to put them out of their misery – they of course run riot over msn/yahoo/jeeves/altavista et al quite happily, being the teddy bears with attitude at the top of the table and well smug………but google still denigrates them to being has beens and pariahs – which kind of grates as all they are there for is helping others and having a laugh in a grim world – though my english and warped sense of humour, allowing for nearly forty years systems programming on top of an originally probably deranged mind, may figure highly in googles reluctance to grant exposure – they may well be right – perhaps the world is not yet ready for teddy bears with attitude – one of whom is a cannibal – also they may be suppressing the fact that there is incontrovertible evidence the original rock and roll heroes were in fact teddybears, see teddyboy.htm on stiffsteiffs website.

    you must forgive me – i dont do blogs or logs usually – in fact my computing partner keeps me in the basement in a rocking chair most of the time – but its raining so heavy here i got let out for the day.

    so, yes – all pages feature, and no they arent exactly pre-eminent – if they were gladiators they would be dead.

    a last line to all birmingham city supporters, about emile heskey, from steve bruce during a recent match.

    get warmed up son, youre coming off.

    this is fun, but id better go back to the basement. cheers malc pugh rubery england

    and yes, i know there used to be an asylum in rubery……………

  468. Thinking on the indexing issues, wouldn’t sitemaps also help Google find all pages on a site?

    If yes, then all any site needs to get all pages indexed is sitemaps.

    According to the sitemaps page, this is what is “supposed” to be happening.

    Just like, according to Matt, when there aren’t enough results in the main index, the SI is “supposed” to be used to fill the query.

    What is “supposed” to be happening and what is actually happening are 2 different things.

    Dave

  469. Thinking on the indexing issues, wouldn’t sitemaps also help Google find all pages on a site?

    If yes, then all any site needs to get all pages indexed is sitemaps.

    No. Submitting a Sitemap only tells Google that the URLs (files) exist. According to Google, it doesn’t mean that they will be crawled and indexed.
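
    Since Sitemaps keep coming up in this thread, here is a minimal sketch of generating one per the sitemaps.org protocol; the domain and paths are invented placeholders. As the comment above says, a Sitemap only declares that the URLs exist – it does not guarantee crawling or indexing.

        # Write a minimal sitemap.xml per the sitemaps.org 0.9 protocol.
        from xml.sax.saxutils import escape

        urls = [
            "http://www.example.com/",
            "http://www.example.com/blue-widgets.html",
        ]

        entries = "\n".join(f"  <url><loc>{escape(u)}</loc></url>" for u in urls)
        sitemap = (
            '<?xml version="1.0" encoding="UTF-8"?>\n'
            '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">\n'
            f"{entries}\n"
            "</urlset>\n"
        )

        # Upload the file to the site root, then submit it through the
        # (then) Google Sitemaps service.
        with open("sitemap.xml", "w", encoding="utf-8") as f:
            f.write(sitemap)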

  470. We’re getting more crawls now with BD than we did last year, although what is indexed is still fluctuating wildly for us — and is actually less than before.

    I’m especially puzzled about 1 thing though: how can my 7 year old site that has a PR7 homepage not even be ranked on the first 4 pages for our own business name? We get daily crawls and used to be ranked #1 for our name (with or without the hyphen, as a single word or 2 words, it didn’t matter).

  471. Mike.

    Matt said that Google is intentionally crawling more pages from a site than they will index. That could account for the lower number of indexed pages, and the higher amount of crawling.

    He also indicated that affiliate OBLs aren’t helping, but he was talking about crawling and indexing, and not about rankings. Perhaps an abundance of affiliate OBLs is now having a negative effect on rankings.

  472. Hey everyone,

    PLEASE PLEASE PLEASE (re)read Matt’s comment guidelines before posting here!

    Matt’s on a (much deserved!) vacation right now and I’m assisting with his blog. In particular, I’m unabashedly doing the best I can to uphold his comment guidelines by moderating the comment queue and sending inappropriate comments to that great big bit bucket in the sky.

    Thanks for your understanding.

    – Adam, on behalf of the Search Quality Team

  473. Dave (Original)

    RE: “According to Google, it doesn’t mean that they will be crawled and indexed.”

    I didn’t mean to insinuate that SiteMaps would *guarantee* a site would be fully indexed, only that they would *help*. If a site’s pages are not being fully indexed then, for me at least, submitting a SiteMap would be common sense and my first port of call.

    At the end of the day, Google is doing a far better job than the other big 2 IMO. I guess, as Matt has stated before though, webmasters tend to compare Google with perfection and not with their competition. With this in mind, some will NEVER be happy.

  474. Dave (Original)

    Mike, your business name is VERY generic and is probably a common SEO phrase. Also, I don’t think we should confuse real PR with toolbar PR. Keep in mind also that Matt has stated before that PR is only one of over 100 factors used in ranking.

    If a site’s pages are not being fully indexed then, for me at least, submitting a SiteMap would be common sense and my first port of call.

    Absolutely – even though it’s unlikely to make a difference when a site is being restricted.

  476. Matt’s on a (much deserved!) vacation right now and I’m assisting with his blog

    Hi Adam. You arrived just in time to jump right in at the deep end, huh? 😉

  477. PhilC, I don’t think there’s *ever* been a dull moment at Google 🙂

  478. Dave (Original)

    RE: “Absolutely – even though it’s unlikely to make a difference when a site is being restricted.”

    I wouldn’t say that for sure. If however it didn’t help, I would then seek out links from relevant sites.

  479. hello Adam – good luck (and thick armour).

    you will be pleased to know this is my very last post, i hope i haven’t broken too many laws.

    will you be addressing my queries, or will Mr Cutts be doing it on his return?

    i am a bit with blogs like groucho marx was with being a member of a club.

    thanks in anticipation, appreciate having somewhere at long last to even ask something and get a meaningful reply.

    yours sincerely

    malcolm pugh – rubery – england

  480. PhilC, I don’t think there’s *ever* been a dull moment at Google

    Probably not, but Matt dropped a huge bombshell with his initial post in this thread. Anyway – all the best with it 🙂

  481. Like lots of others, my site has suffered badly in the recent Google shake-up.

    I’m trying to understand why my pages should suddenly disappear overnight from the first 10 – 20 results.

    I don’t understand why over 75% of my site is now “supplemental”.
    Why is Google regularly crawling my site, but not displaying up-to-date pages?
    You are, in some instances, showing way out-of-date pages, which causes confusion with my customers.
    Surely, it would be just as easy to display the current cache?
    I don’t understand it.

    I submitted a sitemap, thinking that it would give links to all my pages for Googlebot.
    In fact, although Google is telling me that the sitemap is being regularly downloaded, the results get worse by the day.

    So, please can someone at Google be more precise about things?

    I am told that in-bound links would help.

    How on Earth am I supposed to solicit quality in-bound links from competitors’ sites?
    I certainly will not link to theirs.
    And if I don’t show in the rankings, how will other types of site find me?
    I could arrange reciprocal links with hundreds of websites, but in the main it would not improve the quality of my website to customers.
    Surely, this policy will only serve to boost the rankings of bigger, more wealthy businesses, who can interlink with several controlled sites, and run lengthy weblogs. I see lots of evidence of this in my trade. The same high-ranking companies using lots of different websites, all inter-linked.

    I am also told that Google is comparing the content of different webpages, and dropping those that appear to be similar or “cloned”.

    I have lots of SIMILAR products, each showing on separate webpages.
    There is a wealth of data showing in detailed drawings and photos. Each of the pages is, in fact, entirely different, but they could, I suppose, be considered similar by a robot.
    Must I now add superfluous cluttered text to these pages, just to make them “different” to Google?
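
    Google has never published how it judges pages “similar”, but a common textbook technique for near-duplicate detection is comparing word shingles with Jaccard similarity. The sketch below is purely illustrative (the sample pages are invented) of why a catalogue of near-identical product pages can look “cloned” to a robot.

        # Compare two pages by their sets of overlapping word 4-grams
        # ("shingles"); near-duplicate pages share most of their shingles.
        def shingles(text, n=4):
            words = text.lower().split()
            return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}

        def jaccard(a, b):
            return len(a & b) / len(a | b) if (a or b) else 1.0

        page1 = ("Acme blue widget, model A. Solid brass body, 3 inch flange, "
                 "stainless fittings, ships worldwide. Order today and save on "
                 "our full range of quality widgets for home and industry.")
        page2 = page1.replace("model A.", "model B.")

        # Two pages differing only in a model letter score ~0.74 here;
        # unrelated pages score near 0.
        print(round(jaccard(shingles(page1), shingles(page2)), 2))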

  482. Hi, Matt. I’m not an SEO pro, just the editor of an architectural site (Russian). I had some supplemental results but used to think that was OK. Unfortunately, today I found that I have only 4 pages in the main index – 2 from my site & 2 from another site (I have no idea about that site). The question is: is it a temporary situation, or do I really have some trouble? I spent a lot of time writing articles, collecting original photos etc… So I beg you to explain the situation to me.

  483. Hi Adam,
    I have some questions; hopefully they have not been asked before. After the majority of my pages were deindexed (due to the recent Google/Big Daddy/Google dance stuff), several of the pages still kept their PageRank (Google toolbar ranking) but they are not currently indexed. My question to you is: what exactly does this mean? If pages still carry a PageRank but are not indexed, then what part does the current PageRank play when a user does a search on Google? Are my pages still going to show in web results? After all, they do have a rank, right?

    Is Google going to return and index these pages that still kept their PageRank?

    Any help/clarification you can provide would be a great help.

    Jamie

  484. I can see that since Big Daddy the indexing of osCommerce sites has taken an almighty twist!

    The pages are still indexed but have not been refreshed for months; the Google cache still shows old meta data and content. Even those sites that have had their URLs rewritten to .html format and employed a 301 redirect from the previous dynamic URLs have yet to fall under the Google radar!

    Not even a google sitemap helps!

  485. I agree with PhilC. The expectation of the user of Google is that all relevant sites will be listed, large or small, new or old.

    Here’s the thing about refusing to index sites without “enough” trusted IBLs. If you’re a small niche site webmaster, as I am, you don’t have a lot of natural IBLs. But most of us are honest and do our best to create good content. After all, the average webmaster thinks, that should be enough. I’ll make good, unique content and put it on the web and eventually, people will find it.

    But how can people find it if it’s never fully indexed because it’s too small or new to have a lot of IBLs? How are users benefitting when they search for unique content that is on the site, but Google shows them NO RESULTS because the site is largely supplemental? It’s not as if we’ve done something WRONG or DISHONEST just because we don’t have more than a handful of IBLs.

    I built my site organically without “tricks” but I currently have three pages indexed out of 600. The message I’m getting: “Google won’t index anything more than a couple pages unless you play their game. Aggressively try to get IBLs and hope to death the people you approach don’t expect you to link back to them. Good content isn’t enough anymore.” I thought we were supposed to design sites as if search engines didn’t exist. I did so, and as a result got largely erased from Google. While I’m not going to go blackhat, I’m not surprised others are. What have they got to lose? I won’t be contriving links – therefore I probably won’t get more pages indexed unless something changes. I may even lose all but the homepage at this rate. That doesn’t seem right, does it?

    Google built its rep on being the most comprehensive. I don’t understand why suddenly that’s not important anymore. They can do what they want, of course, fair or unfair. But I’m not coming here to whine or rant. I’m coming here to express my concerns with the hope (perhaps naive) that someone will read these and realize yes, it is unfair and yes, we need to make some changes. I’m coming here because I don’t yet believe Google is a lost cause even though currently it treats my site as practically worthless.

    If worse comes to worst and Google continues to de-index, it is only hurting itself. No company is so big that it can afford to disregard user needs.

  486. Great post! I know it took a long time to do.

    I am seeing some of the same issues with my customers.

    I wish Google would stick with a set of rules and make no other changes.
    I will not hold my breath on that one.

    Thanks

  487. Way to go Dan for dragging Matt into reality for a single post! 😉

  488. Adam,

    There is a thread going on at WebmasterWorld about pages still dropping out of the index. A lot of webmasters noticed that on the 17th and 18th Google began to de-index pages again.

    http://www.webmasterworld.com/forum30/34398.htm

  489. “It’s more comprehensive” has been said regarding Bigdaddy.

    I then read that sites will no longer be indexed fully if they are not linked to according to Google’s liking. Full indexing is in the past, and we will now only list 5% of your pages; the rest will be 6-month-old supplemental pages.

    I then went on over to Google and used the operator define:comprehensive

    All of those definitions seem to say that comprehensive means covering everything, or most things – stuff in its entirety.

    So now we have a conflict of statements. How can the index be more comprehensive while at the same time not including as many pages? I travel the same blogs and forums as most here. They are not full of people saying that their page count went up like crazy since Bigdaddy rolled out.

    Does that just mean that for every page deleted from a small site operator who doesn’t have 100 friends with websites linking to him, another eBay auction page will make the index? Because I’m not so sure that having pages for Light Blue Widgets for sale on eBay that expired two weeks ago constantly returned as a search result is better than having a page about the use of Light Blue Widgets written by a retired gentleman in Fargo. Unfortunately for the world, he cannot get a link to his work on the Washington Post website, so Google just won’t even show it on page 999 of the SERPs.

    What does more comprehensive mean? More pages but from fewer sites? More of the same sites shown for the same search terms? I don’t understand the double talk.

    Thanks,

    John

  490. Jeff said, “There is a thread going on at webmaster world about pages still dropping out of the index. A lot of webmasters noticed that on the 17th and 18th that google began to de index pages again.”

    There is also discussion going on about someone who made their home page have 10,000 links on it to product pages, and they were reindexed. The flattest of flat directory structures. There are going to be some interesting sites out there due to this.

  491. John,

    Can you post the link to that thread?

    Jeff

  492. Dave (Original)

    What might help those who do not have all their pages indexed is seeking out directory listings. Not those that *require* a link back, as they are link farms.

  493. Jeff and all,

    The WMW post discussed above is http://www.webmasterworld.com/forum30/34442.htm – it starts with someone noticing a pattern and trying a crazy fix, which works, then devolves from there.

    John

  494. Google’s original reason for being was to index the whole of the net, comprehensively, without fear or favour, allowing the small guy in the street to achieve his dream of his website being visible to all, on a level playing field with any other user, big or small, rich or poor, without money having any influence or being able to do back door deals.

    I think this is also reflected in the usage of Google by ordinary Joe Public; they wanted to champion a little guy made good, who took on the big guys but retained the integrity to look after them and nurture their websites.

    All that is deemed critical in the “Guidelines” is to create relevant content and avoid spurious actions. Nothing in there says “yea verily, if you are small, go out and multiply links like the sands”.

    In view of the driving statement on this thread, that is in fact what has been deemed to happen. Inbound links, and preferably themed ones that come one way inwards, are purported to be necessary to have more than your index page visible, though you may have forty pristine pages.

    There are two ways of looking at that statement: one is that it is actual fact, and two is that it is a convenient cloak to disguise a problem. Google is notorious for not giving anything away, to the extent that you have no idea what they are doing; then all of a sudden a missive on this thread sets out in meticulous detail how Bigdaddy works, and that small users must basically cultivate inbound links from total strangers without being able to offer even a link in exchange.

    this to me smells a bit off.

    however, given it were true, is it not totally contrary to the whole ethos of Google when originally set up and marked out in clear guidelines like tablets in stone?

    i have spent four years sweating over my own private help website and another one on twelve other commercial websites, rigidly adhering to these self same guidelines.

    Is it really good enough to say “sorry guys, big daddy went and shifted the goalposts, go ye forth and seek inbound links”?

    how the hell are new websites just setting up ever going to get any links in that case? Who will ever see them in the first place? Is it going to be viable to say to another webmaster “ive got a cool website of forty pages man, you just have to see it and link to it.”, “ok dude, let me see it, which page should i link to?” “well, you can only index my main index page………………” well good street cred for that site.

    This being contrary to original guidelines seems to be a huge red herring covering over a disaster.

    The CEO said there was a huge space and machine problem.

    Bigdaddy may have been rolled out not “pretty good all in all” but pretty flawed and unretractable.

    It just seems odd that a company so intractable and reticent about putting data out in the public domain suddenly becomes effusive about why things are happening.

    Even if all this is true, what this thread is saying is “hey, all you little people sites that supported us from the off; you dont count any more, we have abandoned you because you are not big enough and rigged it so as you will never be able to be so – sorry guys, our guidelines got trashed, heres a new set”.

    Is this acceptable for the millions upon millions of people who have put blood, sweat and tears into getting their websites up on Google? Is it fair to trash their dreams in March, then wait until halfway through May to say the goalposts have moved and, hey, you are all now history?

    Who searches? Who types into Google to ask for answers? Does big business sit there typing away at mundane searches, or is it Joe Public who uses Google and made it pre-eminent in search mythology?
    the same joe public whose websites are getting trashed by the day.

    perhaps something should be done to put the guidelines back where they were, and these genuine and heartfelt personal websites, into which people have put real thought, effort, time, patience and swear words aplenty, back into full view, full pages, full stop.

    this current situation is unfair, intolerable and shabby, and a sad way to treat loyal people who raised Google to where it now sits treating them like plebeians to its proletariat.

    We want a level playing field again, with the old guidelines and we want it now.

    Malcolm Pugh – webmaster – http://www.stiffsteiffs.pwp.blueyonder.co.uk
    mr-bong@blueyonder.co.uk

  495. Googles original reason for being was …

    I’d guess that Google’s original reason for being was simply to launch the new engine and try to make a success of it. I don’t know of any Robin Hood attitude.

    The only thing that makes any sense to me is space. Eric Schmidt (the CEO) said that the machines are full and that they have a crisis, and yet Matt said they have all the space they need to run everything including the index. In the first few months of this year, Google spent several hundred thousand dollars on servers, so they shouldn’t be short of space, but Google needs servers for more things than the index, so all the servers probably weren’t for the index.

    With the new servers, I can imagine that Google really does have lots of space for the index, and to expand the index a lot. But I can also imagine that a decision was made to be more selective in what is included in the index, so that the current capacity isn’t filled too quickly, and to avoid keeping on adding more and more capacity all the time. That would make sense of this fiasco to me.

    I can’t see this fiasco being wholly about spam and/or certain types of links, because of the health care site example. It’s “a fine site”, there’s nothing wrong with its OBLs or IBLs. It’s just that it doesn’t have enough good IBLs, so spam and/or link types aren’t the problem there, and it looks like a simple limitation based on good votes for the site.

    When you think about it, is it possible for a search engine to continually add everything it finds to the index, bearing in mind the rate of expansion of pages on the Web? It probably isn’t possible without continually adding more and more capacity, and I’m not sure that that’s possible either. Both MSN and Google are in the process of building huge facilities close to large hydro-electric plants (for the electrical power), so large expansion continues, but is it really possible to index everything, or is it actually better to be a bit selective? If I were an engine, I’d certainly give selectivity a very close look.

    If that’s what’s happening now, then it’s working in the wrong way, imo. There are billions of pages in the index that nobody wants there except their owners – scrapers, etc. It may be very difficult to programmatically identify them with a reasonable degree of certainty, but that’s where the focus should be, and not on limiting the presence of perfectly clean sites in the index.

  496. Oops! That should have said that Google spent several hundred million dollars on servers this year.

  497. China, Google Video, Google Earth – these may have used up a bit of the server space; lord knows how they expect to store every video known to man.

    what remains is that googles core users are being stitched up like kippers whilst what sold on ebay three weeks back is readily available.

    People want quality in searches, so the bottom line on this is that as the standards drop, and all you get is eBay, other search results, ufindus directory pages and out-of-date data, people will switch to other searches – this is why Google thrived originally: natural progression to better results. likewise, if google abandons its little user websites, another “new” google will evolve to take up those sites just as google did originally.

    it would seem to me that google were originally altruistic. i worked for a few firms like that that went big and forgot their dreams; they are nowhere now, as their customer base went out of the window in direct proportion to their losing their grip on their original purpose, ethics and focus.

    I am surprised so many on this blog just accept this huge swing in policy and being sidelined big time, and merely fawn to the google gods and ask “wherefore shall we seek these new links then great ones”

    i would suppose there is a core of google employees who are secretly fuming at this dereliction of all the standards they held dear and worked for and towards – it is not a lot of fun to find your employers, who sold you on a dream, are now watching and promoting a different movie to the one you are trying to uphold.

    i worked with great devotion to some firms in my youth, in real belief in what i was doing, only to find that the bigger they got the more the original emphasis, camaraderie, visions and hopes for the future got blanked out by the grey reality of corporate success, to the detriment of the firm, the employees and those it originally strove to help.

    IBM were invincible in my day until they believed they were – thus goes google as we speak. the same could prove true if this imbalance, and this failure to support and sustain the little man who in the first place MADE google the institution it now is, is allowed to continue.

    Natural selection will take out what was until now the only hope for small, real, human single users and their own cherished websites, which bring them the fun of seeing their name up there on google.

    i think it is sad that such a light in a grey universe is slowly dimming as it sinks under the waves of its own success, neglecting to remember how it came to be successful in the first place, as the only outlet of the ordinary man – whom it has now patently seen fit to betray and banish from its pages under the guise of “what a super new algorithm we have, better get those link finding skates on, normal people”

    and everyone seems to be scuttling to do their bidding instead of stopping to question why the hell they should be having to do so.

    content was purported to be king and good practices his courtiers – it would appear the queen has taken over and we must all bow to her wishes.

    its not good enough for me, it should not be good enough for you, and id be surprised if it is good enough for googles original employees.

    i am on msn/yahoo/jeeves/altavista. i have articles all over the shop, real ones written by me. i have directory entries for real things. my teddy bear site, though no longer highly listed on google, still pulls in 600 visitors a day sans google – and thats in an era of google supremacy. so if google slips a little should we all cry – after all, we have been left out to dry.

    what they mention in their guidelines (of old – as in original guidelines) is still salient – good content, proper links – couple this to good articles and directory entries, and if the current google doesnt get its act back together then someone else will evolve that will, or other engines will realise that it really does come down to those criteria anyway – the only thing you cant spam or tweak is real content, and the only other thing that shows you are good guys is real visitors – these will surely surface in the end anyway, with or without google, so i for one will not be rushing off and fabricating artificial links to satisfy googles whims – i will be hoping they reassert some sanity, but regardless of that i will go on about my business in the right manner. i thought it salient to try to talk to them via here, to give them a chance to see it as i and many others perceive it, in case they had lost the plot and needed a reminder. this is that reminder.
    good luck to you all – i think the original guidelines are not a bad idea to follow in essence, whether google sinks or swims.

    malc pugh

  498. I am surprised so many on this blog just accept this huge swing in policy and being sidelined big time, and merely fawn to the google gods and ask “wherefore shall we seek these new links then great ones”

    and everyone seems to be scuttling to do their bidding instead of stopping to question why the hell they should be having to do so.

    I think you’ll find the very few people in this thread have agreed with what Google is doing. In fact, I don’t remember any posts of support, except from Doug Heil, and I’m not even sure about that.

    Acceptance is different though. The reality is that, unless Google changes it, it’s here to stay, and we have no choice but to live with it (accept it). Like you, I have no intention of doing any unnatural link-building just to satisfy Google. Why the hell should I? If the site is good, it should be indexed. If Google doesn’t want to index my site properly, and do as good a job as they can for their users, then “stuff Google!” is my attitude – I said that before.

    But not everyone is in a position to adopt that attitude, and, as long as Google is the biggest provider of search engine traffic, nobody can be criticised for taking steps to fit in with the new way. It doesn’t mean that people approve of, or agree with, what Google is now doing. Judging by the posts in this thread, people strongly disagree in general, but people can’t be criticised for recognising that things have changed, and seeking to survive in the new reality.

  499. Damn! It amazes me how often a typo causes the meaning to be reversed. The first line of the last post should have been:-

    I think you’ll find that very few people in this thread have agreed with what Google is doing.

  500. There is a choice anyone can make – choosing another search engine to use for day-to-day searches – google does not have to be the default search per se forever – maybe it has become too blasé on that score.

    people tend to vote with their feet eventually.

    Not a lot of traffic would flow if no one was there.

    malc pugh.

  501. Traffic is customer-driven, not google-driven – you can have the best shop in the world, but it will shut without customers coming in the door.

    Assume the computer mags keep running their six best searches in the world for the public – week in, week out – as soon as google dips down to fourth, which is on the cards given the amount of salient data it is dumping, what price traffic then?

    you are as good as your last game in sport, as good as your last delivery in business – as soon as you take your customers for granted, as read, in your pocket, you are on a slippery slope.

    If everyone bought any petrol but brand x for two months, how would brand x fare?

    likewise, if google suddenly has no punters using its search for months on end due to poorer results, what price its market share, share price and traffic then?

    google rides on people USING it as a search engine; we don’t exist to pander to google. it should be a symbiotic relationship, but it is currently one where the shark is eating the pilot fish and expecting the scant surviving pilot fish to clean it.

    If everyone disaffected with google started using alternative search engines to SEARCH, i’m sure google would soon rectify these little “anomalies”, and i’m sure they would pay a bit more attention than they are at present.

    We govern who cracks the whip by our own choices and actions; we are not governed by the retailer when it’s us who are buying and choosing.

    I will quite happily wander off and write my own search in the end if i have to and all else fails; google are not the only people who can program indices successfully.

    We all have a choice, we all have a will, we all have freedom of action.

    if google fails us all in this, then why frequent its search and perpetuate its existence?

    i’ve said all i can on this. good luck to everyone suffering at present.

    yours sincerely

    malcolm pugh

    england

  502. Dave (Original)

    Perhaps there are not enough hours in a day, weeks in a month or months in a year for Google to include everything it can reach. Criteria for deciding which pages to include would then need to be used.

  503. Can someone at Google please give a simple explanation for this:

    As I’ve said in a previous post, my site enjoyed good rankings in the first 10-20 results until the “Big Daddy” shake-up.
    On 30th March I disappeared from view.

    I submitted a sitemap to Google, cleaned up the site’s internal links etc, and nervously waited.
    During April, nearly all my pages were re-visited and current versions were gradually being displayed in the cache.
    Then, in a cruel twist, all my major keywords returned to the first 10-20 placings on 16th May.
    I whooped with joy.
    This joy lasted for just two days.

    Suddenly, the site went “supplemental”, and again, I disappeared from view.

    Google has now decided to list only 20% of my site pages.
    The rest are “supplemental”.

    OK, reading the many posts on this subject, maybe this is the way of the future at Google.
    BUT, the “supplemental” cached pages being shown for my site are up to 12 months old!

    So – given that Google had an April 2006 cache of my site, WHY is Google using these OLDER versions?
    WHERE are my April 2006-cached pages?
    Have they been lost or discarded?
    IF the April 2006 cache was to be found and used by Google, would those pages still be classed as just “supplemental”?

    In other words, are the OLDER versions of my webpages affecting my rankings ?

  504. While I appreciate the fact that Matt Cutts is talking to us, I hope that he is also taking the various suggestions you can read throughout the comments back to Google.

    So, here go my humble comments and, between the lines, suggestions for Google or anyone out there who cares to develop a better SE:

    I believe that Google, as well as the other search engines, should focus on finding a way to better SELECT indexed sites – quality rather than quantity. We’ve seen Google’s index grow a great deal (particularly in the last 18 months or so), making it more difficult for webmasters and SEOs to get a decent ranking when following its original guidelines. Clearly, this benefits its advertising business.

    The link development requirements are crazy and unrealistic because they only make everyone spend more time on getting links when they could be developing real partnerships (and real, bona fide links) and content (!). Even if an SEO knows the best way, the clients are pushing for results, and faster and faster results…

    Our approach is to develop quality content, create site tools that are important for the visitors, AND select link partners (yes, shame on us!) like we select friends. Very carefully. We also work on increasing overall online visibility (submitting articles, for example).

    However, whatever the product/business segment, we are competing with many, many spammers who DO get good results, despite all the talk… and if we were to report spammers, we would do nothing else all day long. Hmm, I do have a business to run, clients to take care of…

    Here, we are also not neglecting the other SEs, because our goal is to get quality traffic across the web, especially because we have growing concerns about Google’s standards – and results quality. For example, why do directories show up when I do NOT type “my keywords” + “directory” in my search? Most directories have no content, just links. If I wanted a directory, I would search for one. Also, why does Google keep old, outdated and obsolete sites in its index? It must be fairly easy to purge pages from the index that have not been updated in, let’s say, the last 180 days… Who needs a site that hasn’t had any new CONTENT (not links) since 2002??

    Google should be looking into developing more ACCURATE search results – e.g. based on business/organization location, main purpose, etc. – instead of indexing more and more and more and more pages… Perhaps revisiting its original idea/mission is not a bad idea.

    I think Google might be shooting itself in the foot: by making it so difficult for good sites to get decent rankings, it will start losing quality and thus losing users. Just look at the posts above and you see “joe public” starting a revolution.

    Thanks all for your contribution.

  505. Sounds like Google is cracking down on ad networks, but might end up hurting general sites (like my little baby blog) that link to whatever they want just based on what they think is interesting that day.

    We’ll see how it all shakes out – thanks for the post, Matt

  506. I think you’ll find the very few people in this thread have agreed with what Google is doing. In fact, I don’t remember any posts of support, except from Doug Heil, and I’m not even sure about that.

    Since apparently it has to be explicitly stated now or for some bizarre reason it doesn’t count, I support what Google is doing and am okay with it.

    Hey Adam (as in the one who needs to change his name because I got here first 😛 ), are you gonna be doing any guest posting for Matt or just cleaning up threads?

  507. I recently spun off a subdomain of my website to its own domain using a 301 redirect in an .htaccess file. That went alright, and it seems to have been successfully reindexed by the major SEs. However, the new domain has lost its PageRank in Google.
    Is it typical that a subdomain doesn’t carry PageRank over when redirected to a new domain? Also, is the redirected subdomain going to be treated as a brand new domain, i.e. in the sandbox for a year?
    I am planning to do the same thing with another subdomain on the same site. Can you see anything here that would penalize the websites with Google?
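
    For what it’s worth, the redirect itself is only a few lines in the old subdomain’s .htaccess – something like this (a minimal sketch; sub.example.com and www.newdomain.com are placeholders for my real hostnames, and it assumes Apache with mod_rewrite enabled):

        # .htaccess in the old subdomain's document root
        RewriteEngine On
        # only touch requests that arrive on the old subdomain
        RewriteCond %{HTTP_HOST} ^sub\.example\.com$ [NC]
        # send each old URL to the same path on the new domain, permanently (301)
        RewriteRule ^(.*)$ http://www.newdomain.com/$1 [R=301,L]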

    Thanks – love your blog
    Mike

  508. Dave (Original)

    RE: “I think you’ll find the very few people in this thread have agreed with what Google is doing”

    That would be the silent majority I mentioned, or the vocal minority.

    Reading Matt’s Blog I see 95% problems all-round.

    RE: “are you gonna be doing any guest posting for Matt or just cleaning up threads?”

    I think he has just quit 🙂

  509. Because I am not tied to WMW and always seem to miss out on the opportunity to submit my specific case details, my website is still in the crapper. Site: shows 300+ out of 1,800+ pages. Seems to have only the home page and 1-2 levels crawled/indexed. Even level 2 is patchy.
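
    (Those counts come from the site: operator, by the way – e.g., with example.com standing in for my real domain:

        site:www.example.com      pages Google has indexed from the www host
        site:example.com          pages indexed from the domain as a whole

    – and the totals are only Google’s own estimates, so treat them as rough.)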

    “Linking to a free ringtones site, an SEO contest, and an Omega 3 fish oil site? I think I’ve found your problem. I’d think about the quality of your links if you’d prefer to have more pages crawled. As these indexing changes have rolled out, we’ve been improving how we handle reciprocal link exchanges and link buying/selling.”

    I am disturbed by this comment, not because I do the same thing, but because there are so many reasons why this would happen in a perfectly natural way. I am beginning to wonder if I am the victim of this type of “spam prevention” mechanism. I run a resource website that links to seemingly unrelated websites A LOT. By unrelated, I mean to a machine, not to my target audience.

    Can I be sure that Google understands that cakes, travel, dresses, photographers, fireworks, babysitting services and magazines are all, in fact, related under the overall theme of my website?

    Am I being penalized because websites that deal in antique cars, a day spa, a jazz band and a chocolatier are all linking to me? Unrelated? Seems so. Poor quality? Sure – none of them are SEO savvy, and therefore they know nothing about links and PR.

    My visitors see the connections, though.

    I do NOT engage in heavy link exchange. A few (1-10) will happen every month.

    I DO engage in link selling. IT’S CALLED PAID ADVERTISING.

    I DO get a lot of incoming links as a result of press. Are these links also unrelated and therefore of low quality (suspect)?

    I’m beginning to wonder.

    “If the site is good, it should be indexed.”

    Who says what’s good? If the website is crawlable and indexable, it should be crawled and indexed.

    “You can argue as much as you like, but you still can’t come up with a valid reason why any decent, perfectly clean, website should not be fully indexed, given that there is plenty of space in the index.”

    Like I said…

  510. Since apparently it has to be explicitly stated now or for some bizarre reason it doesn’t count, I support what Google is doing and am okay with it.

    Having just realised who you are, that doesn’t surprise me in the slightest. But, since you mention it, yes, you *do* have to express an opinion for it to be counted. Silence simply doesn’t count.

  511. My site, which follows Google’s guidelines to the letter, has gradually improved its rankings over the last year into the 30-40 results for our actual business name and into the 1-10 results for some of its keywords, with some more common keywords in the 100-500 range. Our keywords are actually very specific to angling and our location.

    Up until now we’ve solely concentrated on providing original quality content and interesting reading to the angling community, but it’s becoming apparent that to gain higher rankings we need to obtain hundreds of IBLs – it doesn’t matter from where!!!!

    The most frustrating part is that 2/3 of the sites we are competing with on Google have no content at all, just thousands of inbound/outbound links – just directories in effect!!!! These same sites then have completely off-theme link adverts!!!! Aaarrggh!

    Matt, why doesn’t google penalize sites that have more outbound links than actual text?

  512. Dave (Original)

    Phil, talk is cheap. Actions are what count. Most of those happy with Google are too busy to complain 🙂

  513. Matt, regarding PageRank, my observation is that if you have a squeaky clean site, perfect code to W3C standards, lots of original content, a few good inbound links, and absolutely minimal outbound links, then after, say, 6 months the site will be PR4. This seems to be the baseline for Google PageRank: if a site is less than, say, PR3, then it’s doing something wrong; if it’s PR5 or above, then it conforms to Google’s guidelines and is benefiting from quality inbound links.

    Google’s PageRank seems to be a good indicator of quality for most sites, but I’ve come across quite a few sites that have PR3 but no content at all except for ‘under construction’?

    ps there are some good content sites that only show PR1, but I assume that is due to poor coding.

  514. As you wish. I’m sure you’ve polled the silent and busy ones, so I can’t argue with that, can I? Perhaps you’ll publish it sometime 😉

  515. Dave (Original)

    Nah, no need to poll. I’ll just “assume” like you 😉

  516. > are you gonna be doing any guest posting for Matt or just cleaning up threads?

    Not at this time, and no 🙂

  517. Coding has nothing to do with PageRank. And I think most everyone agrees by now that the green toolbar that shows PR is pretty meaningless.

  518. I’ve just recently noticed that the Google traffic to my blog has vanished. I don’t usually look at my stats, but now all my visitors are from Yahoo or MSN, whereas I used to have 100+ from Google every day. The somewhat amusing thing to me is that while I am a webmaster and do have commercial sites, my blog is completely and utterly pristine, no advertising, no affiliate links, and 100% original content all the time. My only outbound links are my blogroll.

    I write pet health articles and used to have huge amounts of traffic to a few of my entries, including my most popular one, about medicating dogs who are afraid of loud noises. I discussed the pros and cons of various available treatments, and had a few visitors to that entry every day. Now that entry isn’t even in the first 100 search results for phrases I think would find it. Ironically though, on the first few pages of search results there are links to comments made by me on the blogs of other people, where I’m talking about some of the problems I’ve had with my dog! So, my one-sentence version of my article is loved by Google, but the entire article isn’t. Btw, those comments by me which show up in the search results aren’t spammy hyperlinks, just conversation on friends’ blogs.

    I guess the problem is that most of my inbound links are reciprocal because of the blogroll? This was never intentional, it’s just that it’s normal for a community to form, and for people to link to each other as they form “blog friendships”. I read all the blogs on my blogroll each day — that’s why the links are there. They help remind me to go read, and they help people who read my blog to find blogs about related subjects.

    I think this is an example of where this algorithm is faulty. Pretend I know nothing about SEO, and I’m just a person writing original articles about a subject on which I’m well-informed, in a blog format. This is supposed to be what Google now loves, but my traffic now totally sucks. I started my blog because I wanted to be helpful about what I’ve learned about pet health over the years, but it’s not very helpful to anyone if my articles don’t show up as they should.

  519. Dave (Original)

    RE: ” And i think most everyone agrees by now that the green toolbar that shows PR is pretty meaningless.”

    I wish that were true. Unfortunately many out there live and die by TBPR and are PR junkies.

    regarding PageRank, my observation is that if you have a squeaky clean site, perfect code to W3C standards, lots of original content, a few good inbound links, and absolutely minimal outbound links, then after, say, 6 months the site will be PR4. This seems to be the baseline for Google PageRank: if a site is less than, say, PR3, then it’s doing something wrong; if it’s PR5 or above, then it conforms to Google’s guidelines and is benefiting from quality inbound links.

    Google’s PageRank seems to be a good indicator of quality for most sites, but I’ve come across quite a few sites that have PR3 but no content at all except for ‘under construction’?

    ps there are some good content sites that only show PR1, but I assume that is due to poor coding.

    Terry, you are mistaken about PageRank representing the sort of quality value that you described. Doing something right or wrong doesn’t affect PageRank. In fact, some of the spam methods increase PageRank a lot, and doing everything totally by the book doesn’t improve PageRank one iota. There are plenty of squeaky clean sites that show PR1 in the toolbar, and plenty of not so clean sites that show PR6 and up.

    If you do a search on ‘pagerank’, you’ll find some good articles about it.
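
    For reference, the formula from the original PageRank paper is usually quoted as follows, where T1…Tn are the pages linking to page A, C(T) is the number of outbound links on page T, and d is a damping factor (commonly given as 0.85):

        PR(A) = (1 - d) + d * ( PR(T1)/C(T1) + … + PR(Tn)/C(Tn) )

    Notice that only inbound links appear in it – coding standards and content quality are nowhere in the calculation. (Whatever Google runs today has no doubt moved on from this published version, so take it as illustration, not gospel.)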

  521. On an inspiring rather than disillusioning note, it is quite likely that predominantly disaffected people write to blogs like these, as they have, or perceive they have, a problem; whereas, as many have stated, i guess if you are having no problems you might not even seek this blog out in the first place. the only reason i did was that i had a problem and could not get any joy anywhere else.

    also, to their credit, google have printed all my comments, both pro and adverse.

    If there has been more than a link issue, which i deem likely, then it is likely to have been a space issue or a rebuild-the-index-from-scratch-and-throw-out-the-dross issue; either of which might well involve just retaining the core index page as a base reference to rebuild the link pointers onto.

    in which case all of our websites that are down to an index page should soon “flesh out” again as the salient data recently gleaned from better and deeper crawling is reapplied to a stable, cleaned-out base with better foundations.

    this would explain the crawling but apparent non-reindexing, a lapse between the two, possibly until new space is brought online to service it.

    if websites were deemed to be only so good as to warrant an index page, what is the point of even retaining an index page? it would be better to completely zap the site, to effectively much the same effect, as a site with 600 pages hardly functions with just its index page anyway.

    however, if your intention is to completely spring clean all the dross at the same time as adding more space, then you would need to retain your index pages as salient points of reference for a rebuild; so i predict those with just an index page will gradually see all of the pages that are valid and pristine reappearing – or, in most people’s cases, all of their pages, full stop.

    i don’t buy the inbound links explanation – it seems to have too many holes in it. i think we have witnessed / are witnessing a rebuild, probably initially prompted by space considerations, and then seen as a good time to revamp properly, to put the index on a sound and restructured basis, removing a lot of historical and “known” spam dross.

    presumably also a lot of valid sites, like the lady with the pets blog, got caught in the crossfire and will be addressed and rectified as they learn to fine-tune away such innocent victims.

    if this is the case, i think i’d like to have been told “hey guys – we are revamping the index and you might lose your pages for a month or so…”, which would at least make you realise that at some stage they were due back online, rather than leave you with the thought of eternal damnation and of being forever excluded.

    i hope this raises a ray of hope for some, as this must have been catastrophic for some businesses who still must wonder what the hell is going on. and let’s not forget that we here can debate this issue knowing a little of what is behind it, where to look for answers, and why things work as they do. for everyone that does, there must be thousands of ordinary people and businesses who simply cannot understand “where google has gone” – probably the most innocent users afloat, as they haven’t a clue about how any of it works and simply wrote it as it is, “much as google says you should”: just popped in their content and tried to lob a few salient keywords here and there, no devious intent, no machinations, no machiavellian manoeuvres. yet they are suffering the fate apparently reserved for those who spam unmitigatedly. this, i think, is where this is most scary: this update is cleaning out websites it is supposed to be sponsoring and nurturing, the completely innocent webmasters to whom seo might as well be ufo.

    But I am sure this is coming good in the wash now, or soon; it’s just a pity this, if it is the case, had not been made a little more transparent.

    Fear not, those single index pages will soon flesh out again.

    Malc Pugh
    England.

  522. Dave (Original)

    RE: “..and doing everything totally by the book doesn’t improve PageRank one iota”

    You really are the eternal pessimist (no, not realist). There are 2 serious flaws with the statement.

    1) Nobody outside Google knows what the *true* PR of any given page at any given time really is.

    2) Doing “everything totally by the book” is what will make a Web business prosper long term.

    Your condoning of SE spamming and your admission to being a black hat, Phil, do nothing for your ‘professionalism’.

  523. Can’t you do anything but troll? You’re not even a good troll.

    Your #1 statement: nobody suggested any different.

    Your #2 statement: nothing to do with anything that was previously written.

    Your third statement: “professionalism” is nothing to do with black and white hats – success is 😉 But I suggest you learn to understand English better before you comment on people’s posts, because you’re not very good at making educated guesses.

  524. Btw, Dave. Matt wrote some guidelines for comments, which you might like to read, because your posts don’t comply with them. If you really do want to discuss your views, I’d be very happy to discuss them with you somewhere else, but not in Matt’s blog. But if you are just trolling for the sake of it, then I’ll leave you to it.

  525. One is led to presume from this thread that inbound links should be cultivated now to help boost results. no doubt over the course of this blog, past, present and future, there will be many other imperatives to follow which may also give short-term gain.

    but i believe the original Google concept of good content, to the point of being completely ignorant of seo, is probably the most viable long-term idea.

    if you develop completely natural sites with real text and real pictures and real videos and real sound, and hook it all together with like-minded people with like-minded websites, and then tell the world via articles and directories and blogs and ezines and podcasts all about your site and ideas and aims, then you will have new, exciting content people will want to view, hear and read, and link to naturally, out of interest alone, to pop back to now and then.

    this may fly in the face of the seo of the day or month or even season, but long term it is REAL, it is ORIGINAL and it is INTERESTING, and most of all it is yours. so it will prosper long after fancy ideas squeak through holes in the search engines’ logic.

    plus, in that idiom, you are not tied to one search engine, or one system of coding, or one basket with one set of eggs – you have men for all seasons and baskets on each arm and in the house and in the car – we are talking versatile, fascinating and eye-catching through the rarity of being honest, true and personal: an oasis of contemporary original creation in an era of duplication and recycled material.

    we are having a hard time at present, but honest-to-goodness reality will bring long-term visitors and long-term viability, and will also mean not being tied to any one avenue of traffic but being served by diverse traffic from all kinds of springs.

    Malcolm Pugh
    England
    http://www.datacoms.co.uk free help site for all.

  526. Hi Matt,

    Great info on Big Daddy. I am not an SEO expert, but this was pretty straightforward even for beginners such as myself. One quick question for you though – we have a video content site where we allow many of our users to “embed” videos out across the web, typically on their blog or MySpace pages. For a while that was really cranking our PR, but of late it seems to have almost totally halted – wondering if our “portable” embed links are now working against us.
    Thanks,
    _MD

  527. if you develop completely natural sites with real text and real pictures and real videos and real sound, and hook it all together with like-minded people with like-minded websites, and then tell the world via articles and directories and blogs and ezines and podcasts all about your site and ideas and aims, then you will have new, exciting content people will want to view, hear and read, and link to naturally, out of interest alone, to pop back to now and then.

    When I read that, I thought you were being sarcastic, and I smiled. But you weren’t, were you? Oh dear.

  528. Dave (Original)

    Call me a “Troll” if you like, but I will still make comments on erroneous statements you make. I would think Matt would be happy to have someone pull you up on condoning SE spamming. The problem is big enough for Google without you making it worse.

    RE: “When I read that, I thought you were being sarcastic, and I smiled. But you weren’t, were you? Oh dear.”

    Now, what were you saying about “Trolls”? Oh dear 🙂

  529. Dave (Original)

    RE: “if you develop completely natural sites with real text and real pictures and real videos and real sound, and hook it all together with like-minded people with like-minded websites, and then tell the world via articles and directories and blogs and ezines and podcasts all about your site and ideas and aims, then you will have new, exciting content people will want to view, hear and read, and link to naturally, out of interest alone, to pop back to now and then.”

    That is correct IMO. Content always has been and always will be King. It is this train of thought that will likely result in Googlebot grabbing ALL content on a site, just like Matt has hinted at.

    The best links are the ones you probably do not even know about. That is, a Webmaster on another ‘like’ site sees your content as *worth* linking to and gives it a vote.

  530. Matt, I think it might be useful if you did a post in which you defined some of the terms you routinely use. For example, in this post you mentioned “refreshing supplementals” and back in late December or early January, you mentioned that Google had just done a “data refresh”. I’m sure there are lots of other terms you use frequently that are also ambiguous, but I can’t think of them offhand right now. These are terms that you probably take for granted, since you and your co-workers use them frequently in-house, but those of us on the outside tend to get confused. Perhaps this would be a great post for Adam to take on as well. Just a suggestion. 🙂

  531. Dave (Original)

    That is a great idea, Dazzlindonna! I often see an acronym or buzzword used that has different meanings in different circles.

  532. Call me a Troll if you like, but I will still make comments on erroneous statements you make.

    That’s ok if that’s what you do, but your comments had nothing to do with the statement that you picked me up on. I.e. we know that only Google knows a page’s actual PageRank, but that’s nothing to do with “doing everything totally by the book doesn’t improve PageRank one iota”, and “Doing everything totally by the book is what will make a Web business prosper long term”, as you put it, is also nothing to do with “doing everything totally by the book doesn’t improve PageRank one iota”. It looks an awful lot like trolling to me – or it could be that you really don’t understand what PageRank is.

    I would think Matt would be happy to have someone pull you up on condoning SE spamming. The problem is big enough for Google without you making it worse.

    Er… which part of my post condoned search engine spamming? Or is it that you are stalking me here because of my views, whether you can find anything to pick me up on or not?

  533. Now, what were you saying about “Trolls”? Oh dear

    No trolling there – sorry. I guess my comment went over your head.

  534. Nice to see that you guys are cracking down on European websites as well as your own over in the States. In recent months the number of sites – especially here in Europe, where I am – spamming and using duplicate content has grown. I sent a couple of requests to your department with no joy, but it is nice to see that you are actually doing something about it.

    I know that Spanish site that you mentioned, but I never looked in great detail into why they were so well ranked.

    Keep up the good work, especially in Spain, please…

  535. “Perhaps there are not enough hours in a day, weeks in a month or months in a year for Google to include everything it can reach. Criteria for deciding which pages to include would then need to be used.”

    Most hurl-inducing, sycophantic post of this thread.

    Is Google’s mission statement to “index the world’s information,” or the Fortune 500’s?

    If MSN can do it, why can’t Google?

  536. Hey Matt,

    Some fodder for your next article: http://www.adsensepages.com

    You may find it worthy of a post or two 🙂 Normally it’s not even worth pointing out bad sites, but this one seemed worth mentioning.

    al

  537. Dave (Original)

    I don’t believe MSN have as many pages as Google, by a loooong shot. If you have proof to the contrary, please share.

    RE: “Is Google’s mission statement to “index the world’s information,” or the Fortune 500’s?”

    That’s right, it’s their *mission*. That doesn’t automatically mean they have done it. Your point?

    RE: “Most hurl-inducing, sycophantic post of this thread”

    Grow up…..please!

  538. Dave (Original)

    RE: “That’s ok if that’s what you do, but your comments had nothing to do with the statement that you picked me up on. I.e. we know that only Google knows a page’s actual PageRank, but that’s nothing to do with “doing everything totally by the book doesn’t improve PageRank one iota”, and “Doing everything totally by the book is what will make a Web business prosper long term”, as you put it, is also nothing to do with “doing everything totally by the book doesn’t improve PageRank one iota”. It looks an awful lot like trolling to me – or it could be that you really don’t understand what PageRank is.”

    Yes it does. As, by your own admission, nobody outside Google knows the PR of any given page at any given time, nobody can say what raises PR and what doesn’t. Or, more specifically, which *links* get counted for PR and which links don’t. However, it is common sense that black hat methods carry a VERY HIGH risk (funny how you never mention that when condoning black hat methods) that would outweigh any *perceived* PR gain. It is also common sense that placing good content on one’s site increases the chances of another similar site voting by linking to it. Thus, likely raising PR.

    Now, doing everything by the book means following Google’s guidelines. If you do that, your chances of gaining PR (without cheating) are increased with NO risk whatsoever.

    RE: “Er….which part of my post condoned search engine spamming?”

    PHIL SAID
    “Doing something right or wrong doesn’t affect PageRank. In fact, some of the spam methods increase PageRank a lot, and doing everything totally by the book doesn’t improve PageRank one iota.”

    On your site you condone cloaking (although you are very confused about what it really is) and even link to a WELL-known black hat who has cloaking software.

    On your site you also link to a well-known PR monger and take commission from them. This site you link to attempts to sell PR, as shown below by the heading statement on their site:

    “Buy Text Links from our partners for high PR quality ads”

    Phil, there is only one thing worse than a self-admitted black hat IMO. That is one who does not have the backbone to admit they are one, will not condemn black hats, and talks out of both sides of their mouth.

  539. Dave:

    My point – obvious to most – is that their current approach to indexing precludes them from fulfilling their own mission statement.

  540. Dave (Original)

    It may *appear* that way from our extremely limited view, with bias blinkers on. But some of us are willing to admit there is a bigger picture. Likely a means to an end, not the end itself.

    Do you really think Big Daddy was implemented to improve indexing etc, or make it worse? That one IS “obvious to most”.

  541. Dave, it is an incontrovertible fact that unique, honest, non-spammy sites are being erased from Google’s index. Therefore, if Big Daddy was implemented to improve indexing, it is “obvious to most” that it has taken a step in the wrong direction. I only hope Malcolm is right and that this is a temporary situation.

  542. Dave, it is an incontrovertible fact that unique, honest, non-spammy sites are being erased from Google’s index.

    I’m not saying this isn’t a fact. It may well be true. But if you’re going to make a statement like this one, be very prepared to back it up.

    For example, I’m going to ask you to list the sites to which you’re referring.

  543. Dave (Original)

    RE: “Dave, it is an incontrovertible fact that unique, honest, non-spammy sites are being erased from Google’s index.”

    I would say that is your opinion, based on your minute part of the Web. Only Google themselves would be in a position to make such a bold, sweeping statement. Besides, I bet just as many (likely more) “unique, honest, non-spammy sites” are being added to Google’s index that were NOT being included before.

    Having said this, Google indexes pages, not sites.

    Like Adam, I too would like to see you back up your “incontrovertible fact” with proof.

  544. It should be pretty obvious with all the people complaining about pages and sites disappearing and not being updated. Obviously there is something wrong on Google’s end when a page is there one day and gone the next.

  545. Dave (Original)

    Like I said, that is based on one minute part of the WWW. Matt’s blog, forums etc are always full of nothing but *problems*. That is why they exist. To base the whole of Google and the WWW on complaints, moans, gripes, and outright bias is very naive at best IMO.

    RE: “Obviously there is something wrong on Google’s end when a page is there one day and gone the next.”

    Disagree completely! In fact, it often means Google is outing spam. There are literally thousands of reasons why a “page is there one day and gone the next”. Matt himself has said many times that when he checks pages/sites that have vanished, it is mostly due to a breach of the Google guidelines.

  546. Dave (Original)

    Jack, also keep in mind that those who have ALL pages indexed would not be heard from. You know, the silent majority, or the vocal minority.

  547. All
    I agree that there are reasons why google needs to change its outlook on how pages are indexed.

    I personally know of many people who have decided to open “keyword rich”, aka “spam”, sites with lots of outbound links that they receive 10p a click on… many of these sites are just lists, and are in my opinion pointless and annoying… I mean, how many sensible people actually use these sites anyway?

    The thing that is getting me is the fact that my company website, which used to have over 100k pages indexed, is now hovering around the 10k-27k mark. The thing is that we didn’t change a thing! We’ve got no inbound links from directories as far as I know, and we don’t link out to any… in fact we have minimal outbound links, as there are not many people who we feel need the free adverts…

    Since this has happened, I’ve completely redesigned the site, and we’re not actually going up or down at the moment, but more than 70% of our stock lists are no longer indexed… and to make matters worse, we used to be top for 90% of our products… in a competitive market!!!

    I’m sitting here with my boss breathing down my neck asking why this is happening… the funny thing is, does anyone actually know? How long does it take for items to be indexed?

    And why, if a page is relevant, and it’s been spidered, does it not appear in the index? And it’s being spidered not once a week, but more like twice or three times a day!

    How about google does something drastic… DELETE EVERYTHING, and start again… because half the stuff that’s in there is old (including a few of our pages), or completely pointless.

    Dave, I don’t think that anyone has ALL their pages indexed, unless their site contains 1 or 2 pages….

    If google is changing for the better, I wish it had been fully tested before it started to go live…

  548. Dave. Your posts just aren’t worth responding to (or even reading). You invent things that people have said when they haven’t said them, and you show little to no understanding of anything to do with Google or search engines in general. You bring in topics that were never mentioned just to argue about them, and you continually appear to be spoiling for a fight. You ignore the offer to come and discuss whatever you like somewhere else, and you ignore Matt’s guidelines for posting here.

    Quite frankly, you haven’t shown enough knowledge or sense to merit debating with, but the offer is still open if you have the guts – though I don’t think you have. You know the forum on my site – come and discuss it there. I’ll make it one-to-one if you like, and I’ll not allow anyone else to post, so you won’t be overwhelmed by people. If you’ve got the guts to have an open debate with me, without the constraints of Matt’s guidelines, then come on over. Btw, flames and bad language are not allowed, so all you’ll get is debate. Do you have the guts for it?