Heh. I wrote this hugely long post, so I pulled a Googler aside and asked “Dan, what do you think of this post?” And after a few helpful comments he said something like, “And, um, you may want to include a paragraph of understandable English at the top.”
Fair enough. Some people don’t want to read the whole mind-numbingly long post while their eyes glaze over. For those people, my short summary would be two-fold. First, I believe the crawl/index team certainly has enough machines to do its job, and we definitely aren’t dropping documents because we’re “out of space.” The second point is that we continue to listen to webmaster feedback to improve our search. We’ve addressed the issues that we’ve seen, but we continue to read through the feedback to look for other ways that we could improve.
People have been asking for more details on “pages dropping from the index” so I thought I’d write down a brain dump of everything I knew about, to have it all in one place. Bear in mind that this is my best recollection, so I’m not claiming that it’s perfect.
Bigdaddy: Done by March
- In December, the crawl/index team were ready to debut Bigdaddy, which was a software upgrade of our crawling and parts of our indexing.
- In early January, I hunkered down and wrote tutorials about url canonicalization, interpreting the inurl: operator, and 302 redirects. Then I told people about a data center where Bigdaddy was live and asked for feedback.
- February was pretty quiet as Bigdaddy rolled out to more data centers.
- In March, some people on WebmasterWorld started complaining that they saw none of their pages indexed in Bigdaddy data centers, and were more likely to see supplemental results.
- On March 13th, GoogleGuy gave a way for WMW folks to give example sites.
- After looking at the example sites, I could tell the issue in a few minutes. The sites that fit “no pages in Bigdaddy” criteria were sites where our algorithms had very low trust in the inlinks or the outlinks of that site. Examples that might cause that include excessive reciprocal links, linking to spammy neighborhoods on the web, or link buying/selling. The Bigdaddy update is independent of our supplemental results, so when Bigdaddy didn’t select pages from a site, that would expose more supplemental results for a site.
- I worked with the crawl/index team to tune thresholds so that we would crawl more pages from those sorts of sites.
- By March 22nd, I posted an update to let people know that we were crawling more pages from those sorts of sites. Over time, we continued to boost the indexing even more for those sites.
- By March 29th, Bigdaddy was fully deployed and the old system was turned off. Bigdaddy has powered our crawling ever since.
Considering the amount of code that changed, I consider Bigdaddy pretty successful in that I only saw two complaints. The first was one that I mentioned, where we didn’t index pages from sites with less trusted links, and we responded and started indexing more pages from those sites pretty quickly. The other complaint I heard was that pages crawled by AdSense started showing up in our web index. The fact that Bigdaddy provided a crawl caching proxy was a deliberate improvement in crawling and I was happy to describe it in PowerPoint-y detail on the blog and at WMW Boston.
Okay, that’s Bigdaddy. It’s more comprehensive, and it’s been visible since December and 100% live since March. So why the recent hubbub? Well, now that Bigdaddy is done, we’ve turned our focus to refreshing our supplemental results. I’ll give my best recollection of that timeline too. Around the same time, there was speculation that our machines are full. From my personal perspective in the quality group, we certainly have enough machines to crawl/index/serve web results; in fact, Bigdaddy is more comprehensive than our previous system. Seems like a good time to throw in a link to my disclaimer right here to remind people that this is my personal take.
Refreshing supplemental results
Okay, moving right along. As I mentioned before, once Bigdaddy was fully deployed, we started working on refreshing our supplemental results. Here’s my timeline:
- In early April, we started showing some refreshed supplemental results to users.
- On April 13th, someone started a thread on WMW to ask about having fewer pages indexed.
- On April 24th, GoogleGuy gave a way for people to provide specifics (WebmasterWorld, like many webmaster forums, doesn’t allow people to post specific site names.)
- I looked through the feedback and didn’t see any major trends. Over the next week, I gave examples to the crawl/index team. They didn’t see any major trend either. The sitemaps team investigated until they were satisfied that it had nothing to do with sitemaps either.
- The team refreshing our supplemental results checked out feedback, and on May 5th they discovered that a “site:” query didn’t return supplemental results. I think that they had a fix out for that the same day. Later, they noticed that a difference in the parser meant that site: queries didn’t work with hyphenated domains. I believe they got a quick fix out soon afterwards, with a full fix for site: queries on hyphenated domains in supplemental results expected this week.
- GoogleGuy stopped back by WMW on May 8th to give more info about site: and get any more info that people wanted to provide.
Reading current feedback
Those are the issues that I’ve heard of with supplemental results, and those have been resolved. Now, what about folks that are still asking about fewer pages being reported from their site? As if this post isn’t long enough already, I’ll run through some of the emails and give potential reasons that I’ve seen:
- First site is a .tv about real estate in a foreign country. On May 3rd, the site owner says that they have about 20K properties listed, but says that they dropped to 300 pages. When I checked, a site: query shows 31,200 pages indexed now, and the example url they mentioned is in the index. I’m going to assume this domain is doing fine now.
- Okay, let’s check one from May 11th. The owner sent only a url, with no text or explanation at all, but let’s tackle it. This is also a real estate site, this time about an Eastern European country. I see 387 pages indexed currently. Aha, checking out the bottom of the page, I see this:

Linking to a free ringtones site, an SEO contest, and an Omega 3 fish oil site? I think I’ve found your problem. I’d think about the quality of your links if you’d prefer to have more pages crawled. As these indexing changes have rolled out, we’ve been improving how we handle reciprocal link exchanges and link buying/selling.
- Moving right along, here’s one from May 4th. It’s another real estate site. The owner says that they used to have 10K pages indexed and now they have 80. I checked out the site. Aha:

This time, I’m seeing links to mortgage sites, credit card sites, and exercise equipment. I think this is covered by the same guidance as above; if you were getting crawled more before and you’re trading a bunch of reciprocal links, don’t be surprised if the new crawler has different crawl priorities and doesn’t crawl as much.
- Someone sent in a health care directory domain. It seems like a fine site, and it’s not linking to anything junky. But it only has six links to the entire domain. With that few links, I can believe that out toward the edge of the crawl, we would index fewer pages. Hold on, digging deeper. Aha, the owner said that they wanted to kill the www version of their pages, so they used the url removal tool on their own site. I’m seeing that you removed 16 of your most important directories from Oct. 10, 2005 to April 8, 2006. I covered this topic in January 2006:
Q: If I want to get rid of domain.com but keep www.domain.com, should I use the url removal tool to remove domain.com?
A: No, definitely don’t do this. If you remove one of the www vs. non-www hostnames, it can end up removing your whole domain for six months. Definitely don’t do this. If you did use the url removal tool to remove your entire domain when you actually only wanted to remove the www or non-www version of your domain, do a reinclusion request and mention that you removed your entire domain by accident using the url removal tool and that you’d like it reincluded.
You didn’t remove your entire domain, but you removed all the important subdirectories. That self-removal just lapsed a few weeks ago. That said, your site also has very few links pointing to you. A few more relevant links would help us know to crawl more pages from your site. Okay, let’s read another.
- Somebody wrote about a “favorites” site that sells T-shirts. The site had about 100 pages, and now Google is showing about five pages. Looking at the site, the first problem that I see is that only 1-2 domains have any links at all to you. The person said that every page has original content, but every link that I clicked was an affiliate link that went to the site that actually sold the T-shirts. And the snippet of text that I happened to grab was also taken from the site that actually sold the T-shirts. The site has a blog, which I’d normally recommend as a good way to get links, but every link on the blog is just an affiliate link. The first several posts didn’t even have any text, and when I found an entry that did, it was copied from somewhere else. So I don’t think that the drop in indexed pages for this domain necessarily points to an issue on Google’s side. The question I’d be asking is why anyone would choose your “favorites” site instead of going directly to the site that sells T-shirts?
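For the www vs. non-www question in the health care directory example above, one common alternative to the url removal tool is a permanent (301) redirect from the hostname you don’t want to the one you do. This post doesn’t spell out that recipe, so treat the following as a rough sketch only. The function name canonical_redirect and the CANONICAL_HOST value are hypothetical, and www.domain.com is just the placeholder hostname from the Q&A above.

```python
from urllib.parse import urlsplit, urlunsplit

# Assumption: the hostname you want to keep. "www.domain.com" is the
# placeholder from the Q&A above, not a real recommendation.
CANONICAL_HOST = "www.domain.com"


def canonical_redirect(url, canonical_host=CANONICAL_HOST):
    """Return (301, new_url) if url is on the www/non-www twin of
    canonical_host, or None if no redirect is needed.

    Hypothetical helper for illustration; a real site would usually
    configure this in the web server rather than in application code.
    """
    parts = urlsplit(url)
    host = (parts.hostname or "").lower()

    # Already on the preferred hostname: nothing to do.
    if host == canonical_host:
        return None

    # Work out the bare domain so we only touch the www/non-www twin.
    if canonical_host.startswith("www."):
        bare = canonical_host[4:]
    else:
        bare = canonical_host
    if host not in (bare, "www." + bare):
        return None  # some unrelated hostname; leave it alone

    # Keep scheme, path and query; swap only the hostname.
    # (Any explicit port in the original url is dropped in this sketch.)
    target = urlunsplit(
        (parts.scheme, canonical_host, parts.path or "/", parts.query, "")
    )
    return 301, target


if __name__ == "__main__":
    print(canonical_redirect("http://domain.com/dir/page.html?x=1"))
    print(canonical_redirect("http://www.domain.com/dir/page.html"))
```

The point of the sketch is that every url on the unwanted hostname maps to the same path on the preferred hostname with a permanent redirect, so nothing has to be removed from the index by hand.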
Closing thoughts
Okay, I’ve got to wrap up (longest. post. evar). But I wanted to give people a feel for the sort of feedback that we’re getting in the last few days. In general, several domains I’ve checked have more pages reported these days (and overall, Bigdaddy is more comprehensive than our previous index). Some folks that were doing a lot of reciprocal links might see less crawling. If your site has very few links where you’d be on the fringe of the crawl, then it’s relatively normal that changes in the crawl may change how much of your site we crawl. And if you’ve got an affiliate site, it makes sense to think about the amount of value-add that your site provides; you want to provide a reason why users would prefer your site.
In March, I was able to read feedback and identify an issue to fix in 4-5 minutes. With the most recent feedback, we did find a couple ways that we could make site: more accurate, but despite having several teams (quality, crawl/index, sitemaps) read the remaining feedback, we’re seeing more of a grab-bag of feedback than any burning issues. Just to be clear, I’m not saying that we won’t find other ways to improve. Adam has been reading and replying to the emails and collecting domains to dig into, for example. But I wanted to give folks an update on what we were seeing with the most recent feedback.
{ 917 comments… read them below or add one }
Damn Ringtone People!!!
Hi Matt
Thanks for the much needed detailed update.
I do hope that you, GG and later Adam (when he feels ready) will post more of the same and more often than you are doing now.
IMO, it’s not enough of Google to tell us that they are listening. We need them to talk to us too. I.e., communicate.
Once again, thanks Matt. I know you must be also busy preparing for the vacation.
Wow, looks like someone is going to have a short interview today
Thanks for the update Matt.
Yawn !!!
After the past 12 months of Google messing about and still no better results … I’ve completely learned how to live without you.
Best wishes, You’re gonna need it
Every time someone asks a novice question in google groups while at the same time saying that google s-u-c-k-s I will refer them to this post.
Is adam bot or human?
Thanks Matt.
Thank you Matt for the update. I really appreciate you finally using some real estate sites as examples. Since this is an indexing issue I thought I would bring it up.
After checking the logs today I noticed this coming from Google pertaining to our site.
http://www.google.it/search?hl=it&q=fistingglessons&btnG=Cerca+con+Google&meta=
LOL now as you can see the #2 site is a real estate site listed for this search term.The page showing for this search is a property description page. As you can tell from the sites description it has nothing to do with this subject matter. Would you mind checking with the index team and see why maybe this would be indexed for such a phrase.
On a side note it would be nice to see more examples of real estate sites used in the future. Thanks again for the update.
Great post Matt. That really clears up a few things about how Bigdaddy works. Still seems like it is responding very slowly and I find that large companies are getting ahead of smaller sites for local terms even though they are not located in the same country. But that’s mostly because of my own business gripes
Keep up the great posting.
Great post Matt, thanks for putting in the effort to explain what’s been going on.
I have a quick question – how long is it taking these days for Google to index new pages? I added a forum to my site a couple of months ago, and while it doesn’t have many deep links from external domains, it is linked to pretty well from within my site and is in my submitted sitemap. Google seems to be crawling it quite enthusiastically. However, none of it’s showing up in the index with a site: search despite the intensive crawling and waiting about a month. Does this mean that Google doesn’t think my forum is worth indexing?
Yeah, blame this disaster on webmasters, Google can’t index the web properly and it is the fault of webmasters working bad links?
Funny that those that are running the biggest links scams on the net are ranking great Matt?
Explain that one, will ya ???
Where are the indexed pages Matt, do they just disappear, do you have an answer for all of us or are we all using linking scams?
Thanks everybody. I’m glad that I sat down and got all this down. Yup Mike, I figured if I could get this post out before I talked to Danny, then we could just sit around and shoot the breeze.
Danny: So, how’s life?
Matt: Not bad. How are you doing?
Danny: Pretty good, pretty good.
So how ’bout those Reds?
Matt: The communists??
Danny: No, the Cincinnati Reds!
Matt: There’s communists in Cincinnati!?!?!
Sina, it’s by design in Bigdaddy that we crawl somewhat more than we index. If you index everything that you crawl, you never know what you might be missing by crawling a little more, for example. I see at least one indexed post from your forum, so the fact that we’ve been visiting those pages is a good indicator that we’re aware of those pages, and they may be incorporated in the index in the future.
Great post Matt! Good job. Nice to hear some more detailed feedback.
Hey can you answer this for me? Finally we have been seeing some improvement to the indexing of our site. I have seen other webmasters mention the same occurrence of indexing down to about level 3 pages and that is it. Although deeper pages are being crawled (level 4+) they just don’t want to stick very long in the index. Linking a bit higher can get them to stick (turning them to level 3 and 2) but that’s just impossible to do with a lot of content. Is this something that will correct in time? We have PLENTY of links at all levels so I don’t see this as a huge problem. Pretty much looking for reassurance to sit tight.
I read two real estate sites and hoping one was mine, but neither applied to me. My real estate site only has outbound links to Home Builders, so I doubt this should qualify as spam.
It still seems to me that you are blaming this on penalties, which I’m fine with, but why would you crawl my site thoroughly on a weekly basis, then never put the results in the index? This has been happening for 2 months now.
Hello Matt
Thanks for the information.
“Bigdaddy: Done by March” Is it really true? If so, I do not understand why there are still different search results between
http://66.249.93.104/ and http://64.233.179.104/
Please could you give us more details. It’s confusing.
Where is really Bigdaddy!
Thanks for your reply.
Thanks for a very informative post. Just one quick question though, is there ever a time when link exchanges are considered legitimate? Maybe even an example of the case? It’s easy to tell the irrelevant link exchanges, but there has to be some instances that maybe a … real estate agent exchanges links w/ a … local moving company.
Can you comment on this?
HA!!!
To celebrate this new information I deleted an old directory that was hanging off my most valued website. It made an awful shriek as I removed the database. In the coming weeks there will be a few autoemails asking “where is my link”??? and I will reply, “you will not drain my power anymore, die die!!!
(ok enough of this Matt Cutts fellah for today, I got work to do, how about you?)
Hi Matt,
Thanks for the post. Problem is… none of your explanations seem to fit my site. I’m trying to maintain a straight ship in a dirty segment. My links have been accumulated by forming relationships with related sites (thus I’m building links a bit slower than straight link exchange would allow). My content is most certainly provided to educate the visitor. My affiliate linkage is quite low. But yet my pages seem to continue dropping and supplementals are increasing.
Thanks for reading this,
jim
Matt thank you for the explanation about big daddy. But I have checked my websites for points you just wrote down. And I can’t find any of them for my site.
I have pretty much backlinks. I don’t link to crappy sites and still my indexpages is like a wave.
On monday I can have 800.000 pages indexed on tuesday 350.000, then back to 600.000 down to 400.000. The difference is way too big. And we had over a million records.
I also requested a reinclusion request but we never heard from it or saw any changes. My domain name is techzine.nl I have www, forum, babes, msn and pricecheck.techzine.nl in use.
We did have some problems in the past I e-mailed it a couple of times to google but never got an answer about it.
We changed the domain name of the website from tweakzone.nl to techzine.nl (October 2005). We forwarded it with 302 (stupid) I found that out later and changed it to 301 (permanent) redirect. Now I am still trying to get the whole tweakzone.nl domain out of google and get techzine.nl indexed correctly. We asked many many webmasters to update their links and that worked. Our HTML code is by the book. But still we are not being indexed as we were. I’m running out of ideas and options to fix this. Can you explain to me what I am doing wrong. I have been reading SEO sites, webmasterworld.com, Google guidelines for months now and I can’t figure out what I’m doing wrong…..
Kind Regards,
Coen
Strange how you ignored comments before, and now you have decided to respond.
Unfortunately, the serps have become absolute trash, so the changes have failed, and I see more spam sites doing well than before.
Thank you for the timeline.
I find it rather frustrating to follow how your timeline basically outlines how everything is working just as it should, and watch pages display as regular one day, supplemental the next, a week later regular and then back to supplemental. Searchable as regular listing, completely unsearchable as a supplemental.
Good to hear you guys have plenty of machines with plenty of room. Perhaps someone should inform the CEO.
I look forward to you finding other ways to improve.
Dave
Please, please, please delete all of the old supplemental results! I think if you took a poll, you would find very few webmasters (or end users) who actually value any of those old junk pages (many of which do not even exist anymore).
I have even used the URL removal tool in the past – but those old pages just keep coming back!
I don’t think Mr. Cutts meant that the mortgage sites, credit card sites, and exercise equipment sites were junk; most likely he meant that they were unrelated.
Now, I don’t think it’s fair to penalize a site for linking to an “unrelated” site, since many webmasters link to their other websites etc. Links being devalued because they’re coming from an unrelated page would be more fair.
And what’s the deal with reciprocals? Although I rarely do them (time related), I don’t think it’s unfair. A vote is a vote right? Even if two people vote for each other. As long as it’s not automotive I don’t see why it would be a problem…
What about the impact of getting a bunch of unrelated inbound links to your site? Imagine if someone used a linking scheme to point hundreds, or thousands, of links at your domain? All those links from “unrelated” or “junk” sites would surely put a hurting on you. Not fair.
I agree that reciprocal link directories should be removed as they are link farms, so Google is doing the right thing there!
Some reciprocal linking is natural though and sites should only have their sites removed if they have a high percentage of reciprocals in their totals.
[quote]Google should NEVER * NEVER * even entertain the idea of deciding what Products or Services are “JUNK”
This is a recipe for disaster, and extremely arrogant.
What gives any search engines the right to decide that someone’s business category is “JUNK”. This would be analogous to Yahoo Directory or DMOZ devaluing certain TYPES of products or services.
[/quote]
It aint that often you’ll see me stick up for Google but MR SEW you are VERY wrong.
Google can do what the hell they like with their search engine, cos it is THEIRS.
If they want to devalue links in their algorithm, that’s their prerogative, cos the algo is THEIRS
If they want to say certain business models are junk in their search engine then that is their right, cos the search engine is THEIRS
You have exactly the same right. On YOUR web properties you can say and do what you want. If you want to link out via affiliate URLs you can as the web site is YOURS.
If you want to buy or sell links, you can as the web site is YOURS.
When all is said and done when you own something it is up to you what you do with it. Google is no different with whatever it decides to stick anywhere on its domains than you or I am with mine.
Personally I think Google makes lots of mistakes. I also believe so do many webmasters, myself included but they are our mistakes to make the way we see fit at the time.
I’m happy with what I do and I am sure Google are happy with what they do. Personally I am going to carry on trying to beat Matt and his team at Google and I am pretty sure he and his team will carry on trying to beat me.
He wins some, I win some but therein lies the nature of the web. On his site he can do what he wants. On my site I can do what I want. I suggest you, Mr SEW do the same
Damn … great summary Matt … the “other Matt” must be saying “gulp” to try to follow that act while you are gone. And yea, what are you going to talk about in a couple of hours on the radio show?
BTW, here’s an oddball corner case that I would classify as a bug – one of your favorite subjects – redirects!
So URL1 ranked well for keyphrase1. The SERP’s show a title, some text, and a URL. A (legit) 302 (temporary) redirect was setup to URL2. After a few days, the SERP’s for keyphrase1 show URL2, but was still using the title tag for URL1. The “other text” is pulled from URL2. Looking at the cache, it is all URL2. This persisted for several days – looked pretty darn funny actually in the SERP’s, since the URL2 title tag had nothing to do with keyphrase1.
I think (?) correct behavior would be that if you are going to show a URL in the SERP’s, you should show title/text associated with that page … but in this case, some part of the indexing machine got confused by the redirects and the title1 piece got left in even though URL2 was displayed.
Email me if you want more info, but you should easily be able to setup a test case based on that description. BTW, Yahoo has a similar bug in the SERP’s (I forgot how MSN handled it), so it’s not just the big “G” struggling with redirects.
I had some clerical errors in my post above (automotive should be automated), wish I could edit it… sorry.
Hi Matt, great information as always. I have a question about this:
How might this impact the typical blog with a lengthy blogroll? Many people have blogs with lengthy blogrolls… and many of those sites in my blogroll end up linking back without it really being arranged as a reciprocal exchange.
From what you are saying I get the idea that having a blogroll/recommended reading list doesn’t sound like a good idea.
Doesn’t matter…..they don’t care about results. Bad results means more money for Adwords:)
Microsoft will squash Google like it did Netscape. When Vista comes out….Google will fall.
Matt. For me, that was the best post that you’ve ever posted here – by a very long way.
I’m one of the people who has sites that are suffering right now. One of them is the site that we spoke about last year. It had a clean bill of health from you, and nothing has changed since then, and yet its pages are being dropped daily. Right now it’s down from a realistic 18k-20k pages to 9,350, but only around 500 of them are fully indexed – the rest are URL-only partials. Yesterday it had 11,700 but only ~600 of them were actually listed, and some of those were partials.
From your post, I would say that the site fits the description of not having many trusted IBLs. Would that be correct? Reminder – http://www.holidays.org.uk
To be honest, if it is correct, then I dislike it a lot. It would mean that it isn’t sufficient to have a decent and useful site any more to be fully indexed by Google, if the site has quite a lot of pages. It would mean that we have to run around getting unnatural IBLs just to be fully represented in the index, and unnatural IBLs are one thing that Google doesn’t want.
Chris, I talked about this a couple comments above:
http://www.mattcutts.com/blog/indexing-timeline/#comment-27002
With Bigdaddy, it’s expected behavior that we’ll crawl some more pages than we index. That’s done so that we can improve our crawling and indexing over time, and it doesn’t mean that we don’t like your site.
arubicus, typically the depth of the directory doesn’t make any difference for us; PageRank is a much larger factor. So without knowing your site, I’d look at trying to make sure that your site is using your PageRank well. A tree structure with a certain fanout at each level is usually a good way of doing it.
Ronald R, I’ve got a finite amount of time.
I spent a large chunk of Saturday writing this up, but I don’t have time to respond to every comment. I wish I did. But improving quality is an ongoing process; if you see spam, I’d encourage you to do a spam report so we can check it out.
CrankyDave, the supplemental results are typically refreshed less often than the main results. If your page is showing up as supplemental one day and then as a regular result the next, the most likely explanation is that your page is near the crawl fringe. When it’s in the main results, we’ll show that url. If we didn’t crawl the url to show in the main results, then you’ll often see an earlier version that we crawled in the supplemental results. Hope that helps explain things. BTW, CrankyDave, your site seems like an example of one of those sites that might have been crawled more before because of link exchanges. I picked five at random and they were all just traded links. Google is less likely to give those links as much weight now. That’s the simple explanation for why we don’t crawl you as deeply, in my opinion.
Brian M, I’ve passed that sentiment on. I believe that folks here intend to refresh all of the supplemental results over the summer months, although I’m not 100% sure.
How about a tool so that we know who we should be linking to or not?
I see spammers in the google index. Maybe they should get penalized down to a PR of 3 for linking to a bad neighborhood! LOL. Just kidding.
I guess you just may as well nofollow every external link just in case.
Yes a good example of this is our link backs here, I linked to this blog entry from my forums and my link here goes back to the forum!
Is this what Google is going to take out or are you looking for a high concentration of reciprocal links Matt?
Problem with this post is that most of us would have identified the spam examples that you listed and yet most of us still don’t understand what has been happening to our sites, in our case going from 20000 pages indexed to less than 100 instead.
You had indicated that there were only a “double-digit number” of emails sent to the bostonpub address and that someone was going through them over a week ago already. Today, you also stated that someone was still going through them. We did send an email and we still have not received a reply. Based on the most recent thread on wmw, it looks like we are not the only ones.
Real answers would help.
Many small businesses are suffering from these massive de-listings. It is not a light subject for us. From our point of view, bigdaddy has not been “pretty successful” and general replies are now a bit short on comfort at this point.
Nice post Matt. Very informative and not at all too long.
Shoemoney – Was that one of your ringtones sites?
“arubicus, typically the depth of the directory doesn’t make any difference for us; PageRank is a much larger factor. So without knowing your site, I’d look at trying to make sure that your site is using your PageRank well. A tree structure with a certain fanout at each level is usually a good way of doing it.”
Thanks MATT!
I think PR is the factor, but nothing is trickling down from the home page – (Backlinks for the homepage reported from google are completely ?????)
We keep the most logical structure you could possibly have. A pyramid structure drilling down to the articles. Articles linking to related articles. Googlebot just does not like level 4+. If pr is a factor (I thought it now updates continuous) I am not sure why it does not filter down (besides I have no clue if it actually does since what is shown on toolbar may not be accurate).
Jason Duke, I did another pass to mark all SEW links as spam. Gotta muck around and delete SEW from my user database.
Anthony Cea, I gave a quick example above. Someone was complaining about their pages being supplemental, but that’s the effect, not the cause. The right question is “Why aren’t as many of my pages showing in Google’s main results?” I picked five links to the domain at random and they were all reciprocal links. My guess is that’s the cause. I mentioned that example because CrankyDave still has an open road ahead of him; he just needs to concentrate more on quality links instead of things like reciprocal links if he wants to get more pages indexed. (Again, in my opinion. I was just doing a quick/dirty check.)
Valentine, I made the links I showed an image so no one would feel the need to go digging into actual sites.
What if our problem isn’t crawling so much as seeing those pages indexed at all. I have checked the supp index and haven’t seen them there either but I have seen the Googlebot crawling the pages.
P.S. Is there an email I should send to asking about this and if so where?
OK Matt, so what you are saying is that we should produce great content and hope we get linked to because of the value of the page!
But when is Google going to get real about schemes to game the engine so that natural links that are earned are rewarded?
Matt… I have previously reported spam, and not in my sector. But nothing happens, so in the end I just gave up.
I’m wondering how you gain relevant links, in some sectors, without reciprocating, or paying? Do you believe that rivals would give you a free one way link, lol?
@Matt:
Some days I really wonder why you even post to your blog at all, lol. It seems that for every 1 legitimate query there are 10 others holding you personally accountable/responsible for their serp/penalty/crappy result.
I mean really… if the amount of Q&A here was T&A, a team of plastic surgeons couldn’t wipe the grin off your face.
anyway …. “my site is getting crappy results and no traffic …” it’s your fault and Google sucks… LOL not really… but I want to get in on the fun too !
Dear Matt, thank you for explaining Google’s view of link exchanges to us.
We dropped low-quality link exchanges months ago and now continue only with high-quality links. We have added tons of new and unique stuff to our site, but the crawler does not crawl much, and the site is rated low. One year ago it was on top of many competitive searches.
Is it possible to overcome this bad backlink reputation? It’s almost impossible to get rid of low-quality links once they are there. Do you have any advice for sites like ours?
I have a small site that offers a free downloadable tool. So I registered a sitemap and waited some months. Still not indexed. Every day the bot visits, picks up the sitemap, then the index page, then the download exe (which is about 3.5M). Any idea why the bot should try to spider exe files?
I needed a slightly different version of the tool for a specific audience, so I registered a new domain and copied the site with minor changes. I did not register a sitemap because I wasn’t particularly bothered whether it was indexed or not. The new site was indexed in a week or so, and now has a PR of 4. The original, near-identical site is still not indexed.
The original site has been in Yahoo and MSN for months….
I don’t blame Google for dumping on webmasters that try to game the engine with manufactured links, purchased links, traded links, links from reciprocal link farm directories and so on, this is good long term if they can index the web properly taking these things into consideration!
Better late than never
Thanks Matt, you put my mind to rest on a lot of issues
I cannot wait to forward this to my mortgage lender, who asked me just the other day,
“You work in SEO. Any idea why I’ve lost so many of my pages in Google?”
Your explanation sounds so much nicer and more official than… “It could be because your website has a bunch of crap in it, and on it, and connected to it”
BTW- “It could be because your website has a bunch of crap in it, and on it, and connected to it” is an accurate analysis for many of the mortgage and realtor sites who do not rank well on Google right now.
Personally I don’t care about where my site ranks. I believe that ranks would happen naturally if you serve your visitors well.
What many of us DO care about is having the same treatment as any other website owner, large or small, as well as equal opportunity. Spammers should not be there when legit sites that should be there are not being indexed for some reason. I believe that it is healthy for us to get a bit of feedback from Google, and to give feedback to Google, so that such equal opportunities can exist.
Matt, thanks for the information…but it doesn’t help me at the moment! My most important pages just aren’t getting indexed but are getting crawled. We have a really useful website with thousands of members but it seems that only Google thinks it’s not good enough! Any advice would be greatly appreciated.
Anthony Cea, you’ve got some people who were relying on reciprocal linking or link buying complaining specifically that they’re not crawled as much. So as far as “when is Google going to get real about schemes to game the engine so that natural links that are earned are rewarded,” I think that we’re continually making progress on judging which links are higher-quality.
Ronald R, we’ve been checking spam reports more closely lately. You ask “I’m wondering how you gain relevant links, in some sectors, without reciprocating, or paying? Do you believe that rivals would give you a free one way link, lol?” My answer is that trying to force your way up to the top of search engines is in many ways not working in the most efficient way. To the degree that search engines reflect reputation on the web, the best way to gather links is to offer services or information that attract visitors and links on your own. Things like blogs are a great way to attract links because you’re offering a look behind the curtain of whatever your subject is, for example.
Mike B, I’ve talked to the sitemaps folks a lot. Having a sitemap for your site should *never* hurt your domain. On the other hand, don’t expect that just listing a sitemap is enough to get a domain crawled. If no one ever links to your site, that makes Googlebot less likely to crawl your pages.
That’s a very concise way to say it, Bob Rains, although a lot of variation that I see is also if someone’s domain is hardly linked at all. At the fringe of the crawl is where you’re likely to see the most variation, while a site like cnn.com with tons of links/PageRank is going to be less likely to not be crawled.
It’s funny, because most people understand that on a SERP there are 10 results, and if one webmaster is unhappy because they dropped out of the top 10, then some other webmaster is happy that they have joined the top 10. In the same way, we have a finite amount of crawling that we can do as well. Bigdaddy is more deep, but we still have to make choices about whether to crawl more from site A or site B.
Well said, arubicus. Adam recently sent me 5-6 sites that he thinks we could do a better job of crawling, for example. So I wanted to give people an update of how things looked right now, but we’ll keep looking for ways to improve crawling and indexing and ranking.
Hi All
Anybody wish to say hello to our new friend Adam_Lasnik of Google Search Quality team
>Linking to a free ringtones site, an SEO contest, and an Omega 3 fish oil site? I think I’ve found your problem. I’d think about the quality of your links if you’d prefer to have more pages crawled.
So is the conclusion that sites that are deemed “low quality” will also have “light crawling” correct?
Thanks for the feedback Matt, I really appreciate it. Made me feel thoroughly warm and fuzzy inside. Seriously, it’s really great to have people at Google who directly talk to webmasters and demystify things that can seem unusual to outsiders. Keep up the great work!
graywolf, it’s true that if you had N backlinks and some fraction of those are considered lower quality, we’d crawl your site less than if all N were fantastic. Hope that makes sense. Light crawling can also mean “we just didn’t see many links to your domain” as well though.
Glad I could answer questions, Sina. It’s nice that I didn’t have any meetings this afternoon, so I could just hang and answer questions. Then I’ve got Danny in a half-hour or so. But that’s okay too. Maybe for some of the questions, I can just be like “Ah yes, Sina and I talked about this in paragraph 542. It helps us to crawl some more pages than we index so that we can see which pages might help us improve our crawl coverage in the future.”
Boy, Matt, you need a vacation after all of these posts you are doing.
“improve crawling and indexing and ranking.”
I personally expect things to move from an SEO standpoint to more of a QUALITY standpoint, in that businesses and sites will compete more on the QUALITY level rather than on the SEO level. I believe now (after what you mentioned) that this is where you want us webmasters to compete (and probably always have). This push for quality will make this a WIN WIN WIN game for all of us.
Yup, exactly, arubicus. There’s SEO and there’s QUALITY and there’s also finding the hook or angle that captivates a visitor and gets word-of-mouth or return visits. First I’d work on QUALITY. Then there’s factual SEO. Things like: are all of my pages reachable with a text browser from a root page without going through exotic stuff. Or having a site map on your site. After your site is crawlable, then I’d work on the HOOK that makes your site interesting/useful.
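The “reachable with a text browser from a root page” check can be approximated with a breadth-first walk that follows only plain `<a href>` links. What follows is a minimal sketch, not a real crawler and not a description of how Googlebot works. It assumes the site is already available as a dict of URL to HTML string; the `SITE` data, the `example.com` URLs and the function names are all hypothetical.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin, urldefrag


class LinkExtractor(HTMLParser):
    """Collects href values from plain <a> tags only (no scripts, no forms)."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def reachable_pages(pages, root):
    """Breadth-first walk from root; returns {url: click depth from root}."""
    depth = {root: 0}
    queue = deque([root])
    while queue:
        url = queue.popleft()
        parser = LinkExtractor()
        parser.feed(pages.get(url, ""))
        for href in parser.hrefs:
            # Resolve relative links and drop any #fragment.
            target, _fragment = urldefrag(urljoin(url, href))
            if target in pages and target not in depth:
                depth[target] = depth[url] + 1
                queue.append(target)
    return depth


# Hypothetical four-page site; no plain <a> link points to orphan.html.
SITE = {
    "http://example.com/": '<a href="/articles.html">Articles</a>',
    "http://example.com/articles.html": '<a href="a1.html">One</a>',
    "http://example.com/a1.html": '<a href="/">Home</a>',
    "http://example.com/orphan.html": "<p>No plain link points here.</p>",
}

found = reachable_pages(SITE, "http://example.com/")
unreachable = sorted(set(SITE) - set(found))
```

Pages that end up in `unreachable`, or that sit at a large depth in `found`, are the ones worth linking more directly from the root page or from an on-site site map.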
Yep, clear enough, and what I suspected, thanks.
Matt, I have to agree with Joe and Anthony in that spanking webmasters for reciprocal links is often unfair. And I don’t have an intelligent suggestion on how to spot reciprocal link breeding facilities vs. honest, natural reciprocal links…at least not anything that can’t be instantly and easily “gamed”.
My industry might be a good example to use to look at reciprocal linking, actually (it’s weddings & honeymoons). In this market, there are certainly a large number of blind link-exchangers out there, adding no value to the end user with their hydroponically engineered reciprocal link spaghetti. But on the other hand, a site like mine (honeymoon travel) might have pages that list a small number of recommended related businesses (e.g. half a dozen wedding coordinator companies in Hawaii…an online jeweler for rings…an association of wedding officiants…etc.). We list other wedding-related companies on our site with whom we’ve done business (and been happy with)…and naturally, many of them also recommend us on their sites. We each are happy to recommend other companies in our general industry whom we believe do a great job for our customers and yet don’t compete with us.
Now, without thinking algorithms, should this kind of link be very important in determining good sites to return to users?
And what should one think about two companies where one thinks the other is great and links to them….but the feeling ISN’T mutual?
So there’s my argument for SEs being VERY careful when it comes to designing algorithms to discredit or punish for reciprocal links. Yes, I realize that massive reciprocal linking campaigns are evil and manipulative, but there may be some baby parts being thrown out with this bathwater.
Matt:
I haven’t experienced the pages-dropping problem webmasters are attributing to Bigdaddy, but I have seen some behavior I would like to understand.
Through the middle of April, our SERPs showed with our homepage and then the product page indented on the next item. It looked really great. Over the last month, the deep linked pages no longer show up for some high volume keywords, only the homepage.
I won’t list the keywords in a blog, but if you want to look into it, I would be glad to provide a list. Alternatively, look at our sitemap page and you can see it for the 3rd, 4th, 6th and 7th term listed (terms 1 and 2 are our brand name).
Am I alone in seeing this or does it represent a trend?
Thanks
Matt,
Something’s been eating at me…
If link exchanges are frowned upon and buying links is a no-no, how is a new site supposed to ever be able to successfully enter a competitive space? It seems the only people who would be able to compete are very old sites (not necessarily the best) and people who maintain a zillion domains for interlinking purposes. Google seems to be placing an unfair barrier to entry UNLESS spammy tactics are employed.
-Jim
Circling back to folks who just had comments approved. Joe Hayes, it’s not that reciprocal links are automatically bad. It’s more that many reciprocal links exist for the wrong reasons. Here’s an email that I just got:
I’d recommend people spend less time on trying to gather links that way or via some automated network, and more on making a great site with a creative angle or two that makes the site stand out from the crowd.
Matt, everyone knows that Google has a Supplemental Index, but no one outside of Google knows exactly what it is and what its purpose is.
Even if you cannot give us the details, will you please share a working definition that SEOs can point to as the most reliable description?
Okay, I gotta go do a pass at email before meeting up with Danny. Talk to everyone later..
Michael Martinez, personally I’d think of it as a fallback way that we can return results for specific queries where we might not have as many results in the main index. Okay, now I really am going to go.
>>>>The sites that fit “no pages in Bigdaddy” criteria were sites where our algorithms had very low trust in the ***inlinks*** or the outlinks of that site.
Nice. We can destroy our competition by making spammy sites and then linking to the competition!!! SWEET!!!!
Maybe Google should update ‘There’s almost nothing a competitor can do to harm your ranking or have your site removed from our index.’
at
http://www.google.com/support/webmasters/bin/answer.py?answer=34449&topic=8524
Now it’s easy to harm the competition’s ranking!!!!
Thanks again for the feedback!
Great Update Matt!!!
It looks like I had put it together pretty well in my explanation of why people were disappearing from Google that can be found at http://www.ahfx.net/weblog/80 . I just needed to build on the devaluation of reciprocal links.
The only remaining question is whether it is the reciprocal link that is bad (we had already discussed that reciprocal links were losing value back in November), or the “unrelated” outgoing/incoming link that is bad. My bet is on the lack of quality of the inbound/outbound links. It seems the “tighter” the content, links, and tags are, the better the page does. Although, I agree also that reciprocal links should be devalued.
Matt,
I’ve seen it mentioned that duplicate content can potentially hurt a site. On one of my sites I’ve had people write FAQs, etc., and am now wondering how much of what was written might not be original content. Can you, or anyone else, point me in the direction of a way to check for duplicate content, other than just plugging sentences into Google? How divergent does content need to be to be considered original?
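One rough way to compare two passages without pasting sentences into a search engine is word-shingle overlap (Jaccard similarity). This is a generic text-similarity sketch, not Google’s duplicate detection, and nothing in this thread says what score would count as “original”. The shingle size of 3, the function names and the sample sentences are all assumptions made for illustration.

```python
import re


def shingles(text, size=3):
    """Set of overlapping word n-grams (shingles) from lowercased text."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    if len(words) < size:
        # Too short to shingle: treat the whole text as one shingle.
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + size]) for i in range(len(words) - size + 1)}


def jaccard(a, b, size=3):
    """Jaccard similarity of the two shingle sets, from 0.0 to 1.0."""
    sa, sb = shingles(a, size), shingles(b, size)
    if not sa and not sb:
        return 0.0
    return len(sa & sb) / len(sa | sb)


# Made-up FAQ text: one near-verbatim copy and one rewrite.
faq_answer = "Reset your password from the account settings page."
copied = "Reset your password from the account settings page!"
rewritten = "You can change a forgotten password under account settings."

same_score = jaccard(faq_answer, copied)
diff_score = jaccard(faq_answer, rewritten)
```

A score near 1.0 suggests the text was copied almost verbatim, while a low score only says the wording differs; it says nothing about how a search engine would treat the page.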
I’m sitting here watching Danny.
Matt,
Example: a website contains a “link exchange” button within their navigation. When you look closer, the websites forming the link exchange are real companies but the majority of links are unrelated, e.g. car-hire, wood art gifts, labels. Would I be correct in assuming that the non-related links carry no weight and that the domain is scoring only from the related “link exchanges”? Note: I say link exchanges and cringe as I’ve usually been against this. However, having just read your latest note, I feel encouraged to build a link exchange page and provide reciprocal links to associated quality websites. Have I got the wrong end of the stick here? Thanks in advance for your time.
Clearly, I’m a bot.
Aaron Pratt, what is your a/s/l?
Matt Cutts, c/t/c?
I am a magic 8-ball. Type !future to read your future.
Okay, goofy stuff aside, this sort of a statement was long overdue. I can’t speak for anyone else, but I was ripping my hair out for the longest time watching people bitch, moan, and complain because their spamtastic sites weren’t getting indexed or that they were dropping. Tough **** for those people. Let ‘em build something worth visiting.
The only problem is that now the idiots will come up with some random and illogical explanation that “linking to other websites and forming alliances isn’t a bad thing, and Matt should be listening to me because I’ve created some 3-page keyword stuffed piece of crap and think I’m an expert.”
Anyone else wanna bet that SEW says something stupid in response?
I just have one very stupid question:
Doesn’t this also lead to the possibility of increased blogspam as far as people reading this comment going and creating BSLogs (TM) full of meaningless drivel about something loosely related to the topic at hand and/or cross-posting to other blogs related to topics (moreso the former concern)?
Personally, I’d rather not see blogs like yours and Aaron Pratt’s and Jaan Kanellis’ blog get dragged down into the mud because a few dumbasses ruin the concept.
Matt: ‘I’d recommend people spend less time on trying to gather links that way or via some automated network, and more on making a great site with a creative angle or two that makes the site stand out from the crowd.’
The thing is, just writing great content isn’t enough. I’m not saying my content is the greatest ever in the whole world, but it’s pretty good. If people can’t find your site, along with all its great content, they will never link to it. I don’t know what the answer is, I can see how some reciprocal links are bad, and how buying links is a problem for SEs, etc. But it is extremely difficult to get links to a site with just good content. Unless maybe you know lots of people who can give you links, etc. For shy people like myself it’s tough, I just don’t know enough people and because of the shyness I haven’t participated in any online communities like I should have – I’m working on that though. It seems that getting traffic from SEs is kind of like a popularity contest – it’s like high school all over again – I could be real nice and real smart, but too shy to be popular so my site is just ignored by SEs.
Oh well, sorry to whine. I’m trying to write high quality blogs to attract links. (Doesn’t seem to be working too well yet though.)
That’s not what he said. He said the spammy IBLs would not help. He didn’t say they’d hurt. They basically have no effect at all.
The worst thing you’ll do is give that person no increase in traffic. The best thing you’ll do is give them a bunch of direct traffic from your spamlinks.
“They basically have no effect at all.”
The only thing I see happening is when your site used to rely on the effects of such links in the SERPS and now since the effects are gone you may see decreased rankings and spiderings (even fewer indexed pages) and lower PR.
Matt,
That was your best post so far on this site!
The reason I liked it so much was that you gave many examples.
Please keep the examples coming. That’s where we learn the most!
Dave
Matt. What you’ve described really sucks, and not only from a webmaster’s point of view, but also from a Google user’s point of view. I know that you are the spam man, so it’s not your fault, but the whole thing is just plain crazy.
What you described means that a website with quite a lot of good, useful pages, won’t be fully indexed unless the site has enough IBLs, and not just any IBLs – certain types mustn’t dominate. What kind of search engine is that? FWIW, I don’t mind the death of reciprocals (I’ve never got involved in it anyway), but it’s crazy for a search engine to require a certain number of IBLs for a site with a lot of pages to be fully indexed.
For one thing, as a user I want a search engine to show me all the relevant pages that it knows about, and I don’t want good pages left out just because the sites they belong to didn’t have enough IBLs. I want good service from a search engine, and depriving me of good relevant pages is a very bad service.
For another thing, as a webmaster, if my pages are good, index them, dammit. What on earth do IBLs have to do with it? Doesn’t Google want to show good pages to its users? If you don’t want to rank them very highly, don’t rank them very highly, but there is no reason in the world to leave them out of the index, and deprive Google’s users of the possibility of seeing them. It’s just crazy, and makes no sense at all.
No, I’m not talking about the site I mentioned earlier in the thread. Forget that site – there’s nothing wrong with it, but let it go out of the index. I’m talking about Google users who are being *intentionally* deprived by Google, and the owners of perfectly good websites who are being shafted because their sites just don’t happen to have enough IBLs to satisfy Google.
The other nonsense is the outbound links that you mentioned. What the hell has it got to do with a search engine what links a website owner puts on his/her pages? If people want to put affiliate links on a page it’s entirely their own business. And if they want to link to off-topic sites it’s entirely their own business. And if they want to sell real estate on their sites, it’s entirely their own business. It has nothing whatsoever to do with search engines, so why are they penalised by not indexing all of their pages? Why are Google’s users *intentionally* deprived of good and useful information, just because a site’s pages contain things that have nothing to do with search engines?
From what you described in your post, Google has consigned many perfectly good sites to the scrap heap, just because they didn’t have enough IBLs, or because the sites had some perfectly valid links in them. And they’ve intentionally deprived their users of a lot of perfectly good results for the same stupid reasons.
Yeah right. Just what Google has always said – concentrate on making a great site for visitors. And if the site doesn’t have enough IBLs to satisfy Google??? What a load of ….
Frankly, the whole thing stinks, and it stinks big time! I’m just not going to run around getting unnatural links to satisfy a bloody search engine, as you suggested to a couple of your examples. Why should anyone need to do that? My attitude to it is “stuff it”, and stuff Google!
Great post from PhilC. I agree with his statement that IBLs should not determine if a site’s pages are indexed. Google should not be guilty of selective indexing of the web, as Microsoft calls it.
To be a world class search engine you have to index pages to serve relevant results, Microsoft is indexing pages on the web much better than Google and so is Yahoo at this point in time, thus their results are much better and more relevant than Google SERPs.
PhilC said it perfectly.
And what really sucks is this is KILLING small businesses that just want clients to be able to find information on them. What do they know about inbound linking or reciprocal linking? They just want to be found for [product anytown, usa]
I have a one-off Italian pizza place that just wants people searching for catering to be able to find him in the area. He’s in Google Local, but some people don’t even look at that, or depending on the query it doesn’t come up. He links with all his other local buddies: a clown, a hotel for catering, an iron worker who did his little cafe fence. Now this seems to be discouraged. They just want to share business, not join this big link scheme.
If I type in my small town name on Google now, the top 20 hits are all gigantic spam sites that contain the equivalent of a Wikipedia article.
and what is wrong with affiliate links? how else do some sites make money?
Thank you for addressing my concerns directly Matt. I do appreciate it.
I must say that I’m really disappointed.
I’m really disappointed that related sites with good and logical reasons to exchange can no longer exchange links without harming themselves.
I’m really disappointed that if an authority site links to me, I cannot link back to the authoritative information they provide without damaging the crawling of my site and theirs.
This is not a matter of “not counting” something. This is a matter of blindly punishing sites, and most importantly, searchers.
No, Google has not moved forward. They’ve taken several steps back.
Dave
So, how does this relate to the indented index page event that people have been seeing? It’s not host crowding.
Example: Search for “MY company Name” would normally bring up the index page listing from Google directory. Now, it brings up another page from the site with the index page indented under it.
Penalty, fluke, ??
“They just want to share business, not join this big link scheme.”
The way I see it is that there is NOTHING wrong with trading links. Just don’t expect higher rankings and faster indexing because of them. If you rely on recip. links and junk scraper/directory links and do not have many other quality links, you may see some adverse effects because those links are not counting for much anymore. Go out and promote, sure, but be smart about who you cross-promote with; just don’t expect your ranking to go up because of it.
thanks for clearing everything up matt.
enjoy your new man-boobs on your plastic surgery vacation.
love,
tmoney
Matt,
First, I appreciate you maintaining this blog and responding to some of the comments.
I realize you can’t analyze every site, but from what I’ve seen at Webmaster World, the sites you have picked are not very representative of the sites which are having problems with the supplemental index and not being crawled. The sites you have picked are obvious offenders, but sites such as my own and many others have none of these issues. To us, it seems that building a site to the best of one’s ability isn’t good enough; unless you can play the Google game, you’re out of luck. For instance, the inbound link issue. There are only a couple active fansites related to mine (most are no longer updated, and my site is only a few months old). Therefore, I am stuck with a couple inbound links unless I try to contrive inbound links, which I have no desire to do. Of course, the related sites also naturally link back to me – I’m related to them too, after all! Now that’s bad? It’s quite a Catch 22.
I think one should hesitate to imply that all the websites with supplemental problems “deserve it” because they’re all doing something so terribly wrong that they no longer are recognized by the index. There are many sites which do not fit into this penalty schema that have lost pages – too many to blow off as aberrations in an otherwise successful change.
I care because my site, the last time I checked, had seven pages out of over 600 that are non-supplemental, and it is jumping wildly in the Google rankings daily for main keywords, varying from 35-75 any given day. Meanwhile, it varies between #6 and #8 on other search engines.
But frankly I am more concerned with the fact that so many pages with good content are being ignored. If I were #105 for my keywords but could look at site:[my site] and see that my pages are indexed, I would be OK with that. At least they’re there, and people who are looking for content unique to my site can find it. However, now, according to Google, only 7 pages on my site are searchable for the average Google user – only seven pages of my site exist in Googleland. I can put exact phrases from supplementally indexed pages in the search engine and get no results returned. With almost nothing indexed, I feel like all my honest efforts are worthless to Google for some mysterious reason.
Yes, it’s your search engine and you may do what you like. However, I’m sure you understand that a search engine that throws out good content is not doing its job. Hopefully, you will not shrug off the numerous legitimate concerns because you were able to find in the vast array of e-mails you received some egregious offenders.
Matt,
Thanks for confirming my theory. I – and a few others – have been saying all along that the Dropped Pages bug is being caused by a faulty or out-of-date backlink index.
You just confirmed it. Do you honestly think that all of the people making a noise at the moment are naughty people with some irrelevant outbound links, or “not enough inbound links”? Isn’t it far more likely that Google just isn’t finding or indexing the backlinks properly since Big Daddy?
Are you looking on Yahoo or MSN for backlinks before you go generalising about sites not having enough? Because that’s Big Daddy’s problem: many, many, high quality backlinks are just not registering as backlinks anymore. It’s a bug. You must have a very low opinion of an awful lot of people to just dismiss us all as whining idiots who didn’t know you need a few backlinks. Take a look at Yahoo’s backlinks for the affected sites before you condemn them all to the garbage.
How long is it going to take you guys to notice your backlink bug? It probably doesn’t help that you keep deleting any comments that mention it.
I would recommend, as one other poster did, that if Google wants to get a handle on reciprocal link farms, it should look at real estate sites. I have pointed this out before, and I have been guilty of this myself, but there are huge link farms operating with high Google rankings.
Multiple site creations on the same subject, directory creations, scraper sites: all of these are created to increase the manipulation of Google and to benefit the present link farm group even further in Google.
A good example of this was some research that I performed last week on our # 1 competitor in Google. Out of 1000 links, this site had 40% of them coming from 5 IPs. Yet Google has rewarded this type of linking scheme with top rankings.
Based on my own personal experience Google has rewarded reciprocal link farms and continues to do so. Based on these subject sites, if a link farm is created and is themed, Bigdaddy is rewarding these unnatural link schemes.
You have groups and some SEO companies that are able to point 1000s of links at their clients’ sites or create a closed-off network of themed reciprocal link exchanges that are not natural according to Google’s definition. I and others understand, as I am sure you do, Matt, that these systems are only meant to manipulate Google’s SERPs.
On the flip side of this coin is the fact that new sites that are trying to compete with these sites must follow the example set by Google’s reward of high rankings for these practices. As long as Google rewards even a few sites with these types of practices, new sites that may offer more to the online user will forever face an uphill battle for business in Google.
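A figure like “40% of 1000 links coming from 5 IPs” is easy to reproduce once you have a list of linking pages and the IP address each one is hosted on. The sketch below assumes that list already exists; gathering the backlinks and resolving their IPs is out of scope here. The sample data and the `top_ip_share` name are made up for illustration.

```python
from collections import Counter


def top_ip_share(backlinks, top_n=5):
    """Fraction of backlinks hosted on the top_n most common IPs."""
    if not backlinks:
        return 0.0
    counts = Counter(ip for _url, ip in backlinks)
    top_total = sum(count for _ip, count in counts.most_common(top_n))
    return top_total / len(backlinks)


# Made-up sample: 10 links, 4 of them hosted on a single IP.
BACKLINKS = (
    [(f"http://farm{i}.example/links.html", "192.0.2.10") for i in range(4)]
    + [(f"http://site{i}.example/", f"198.51.100.{i}") for i in range(6)]
)

share = top_ip_share(BACKLINKS, top_n=1)
```

A high share only shows that the links are concentrated on a few hosts. Shared hosting can produce the same pattern for unrelated sites, so it is a prompt for a closer look rather than proof of a link farm.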
so, no affiliate links? or how many is ok? cause you know, why not just kill the affiliate business model all together.
let’s have a look at some examples: amazon.com – currently nothing but a site promoting other site’s merchandise but have own transaction processing capability and sell some books whathaveyou on the side (177 million pages indexed by google). any site providing syndicated news? nothing but a “duplicate content” aggregator. every coupon site on the web (type in “coupons” in google, all those sites are there) is nothing original but a bunch of affiliate links (mostly cloaked). are you gonna not index any of those? i say let the users decide which ones they like most. bookmarking rate maybe? i don’t know. things like that. backlinks? well if you delisted all the sites that originally linked to some site, there will be no backlinks left i guess. you know all the small sites that decided to give each other a boost.
Great post Phil C. It’s nice to see somebody who is pro business. Google wants to corner the market on search but has stifled small business’s ability to make money. BD seems to favor only their “fat cat friends”.
Google: Our goal is to index the entire world’s information but
alas we’ve found it more lucrative to censor.
I have a question about sites missing from the index, and I wasn’t sure where else to get a reply, so I hope you don’t mind me asking here.
Last fall I had five sites completely banned from Google for having “outgoing links to pharmacy sites”. I removed all outgoing links from all the sites, and filed reinclusion requests. One site, a PR 7, was immediately back in the index and continues to show up on page one of the search results. The other four sites have never reappeared at all, despite the fact I made the same modifications to them.
The Google reinclusion people wrote to me in March about my missing four websites, saying, “Please be assured that your site is not currently banned or penalized by Google.” When I wrote back and asked why my sites were missing completely (grey bar, and the domain not in the index at all), I was told the matter would be investigated by the engineers. That was three months ago, and my sites are still invisible. They’ve been gone from Google for 8+ months now, after being in the index previously for over two years.
Have my sites been “sandboxed” or something, prior to reinclusion? They were only a PR 5 or 6, so did the PR 7 site get some sort of priority? I really would like my sites back in your index, and I’m at a loss as to how to achieve that when your own engineering team claims my sites aren’t banned at all.
Matt, it seems that google picking on reciprocal links just makes it more attractive to buy expired domains.
You always avoid talking about this type of webspam, yet it’s doing more to upset the balance of good serps than any other type of spam.
You also mention that blogs are a great way to develop one way links.
That also plays into the spammers’ hands. Expired blogs still work a treat and that profile I gave you many weeks ago is still live and active. http://www.blogger.com/profile/17839170
So much for your inside man at blogger taking care of it.
Matt, thank you for the update. While I appreciate the information, it does little to change my philosophy that it is almost impossible for a small site (25 – 100 pages) playing by the rules in a competitive market to rank in Google.
It is sad to come to the realization that the only sites that Google feels provide any value to the web are the large multi-nationals or sites with 10k+ pages and thousands of incoming links. How relevant will Google’s results be if webmasters abandon efforts to rank in your index and focus their efforts on the other engines?
So Matt,
Are you partly responsible for this debacle then? Even if you didn’t have a backlink bug (which clearly you do), your logic is fatally flawed. The inevitable end result of requiring more and more inbound links before you will even deign to index a site is Spam. Spammers do this stuff full-time. They spend no time on content, and no time on value-added functionality.
The more ludicrous hoops you make sites jump through to qualify for the index, the more you pave the way for Huge Companies or Spammers. The in-betweens get sidelined.
Incidentally, why does a site need a gazillion artificially bartered inbound links before it is worthy? No one at Google seriously believes that inbound links are still a measure of relevance, do they? Have you read your own posts? They talk non-stop about how to go about acquiring the right kind of links.
You’ve all lost the plot. You’ll delete this message without even bothering to pause and consider whether or not I’m right.
Ah, finally. Maybe now we can finally kill off the link exchange program cottage industry. A few particular countries are not going to be happy about this!
Hey Matt, when is Google going to implement the long awaited SERPs Randomizer? I mean, we’ve talked about it in the past and it would be great to see those first 30 SERPs rotating randomly. Do that and watch the life expectancy of a search engine marketer drop by a few years.
Matt,
I know google is not giving us webmasters a full picture with the link command. I did the link command on yahoo and msn and I noticed some scraper sites copied my content and added some links to a few of my websites. I have a feeling google is looking at these links as questionable. I am in the process of emailing these scraper sites’ webmasters and getting the links removed because I did not request to put them there and they violated copyright by taking our content.
Since google crawls better than msn and yahoo, will there be a way in the future for us webmasters to see these links? Honestly, right now if a competitor wants to silently tank a website’s rankings in google all they need to do is drop a bunch of bad links. Without google giving us webmasters the ability to see the links we may never even know this could happen.
Hi Matt. I appreciate what you have explained here. I suffered through supplemental pages earlier than many others, and at this time I am happy to report that nearly all of my pages have returned when doing a “site:” type search.
Unfortunately my Google traffic has not recovered yet. At one point it dropped down to about 2% and has recently risen to around 5%. This is not good as it used to run closer to 75-80%. Have surfers changed search engines? I don’t think so as the total numbers from other engines hasn’t varied a whole lot.
Earlier I did a search for a page on my site and it was found on the 4th page. That’s fine for that page, but the sites that came up ahead of it were not even related to the subject and only mentioned in passing the words that I had searched for. I expected to see well known sites in the very same niche appear in that search, however none did. It looked like crap was floating to the surface instead. It looked like relevancy had disappeared out the window and that cannot be good for Google’s business.
Great post Matt, thanks for sharing all the insight. Congrats on getting more help recently, I hope that this frees you up to make more posts like this.
Hello, I’m really new to dealing with google and I really appreciate finding some feedback from you guys, great!
I have new site that has about 3750 pages. The total indexed pages are constantly hopping from 30 to 340. It would be great if I could get them all indexed. lol
But I’m completely lost as to what I am supposed to do to get all my pages indexed? I really don’t want to be going around the net trying to get links to my site and we are being told it’s better we create good content instead. But hang on, how will my great content get indexed if I have no links? As you’re also saying we need links to get indexed, but not any links, they must be “good” links. I’m lost again! lol What I mean is that for someone with little experience reading that they need links, it’s really hard to judge what are good links and be able to find places to get good links. This again seems to mean that established sites with big SEO budgets are always going to be ahead regardless of their content.
I think PhilC made a really good point above too. I have some unusual specialist information on my site that isn’t indexed. There are currently no results for related search terms for this information. Now where is the benefit for people that these pages aren’t indexed as there are not enough links pointing to them?
What if you have one large site with 60k inbound links that has a page of information about a subject and it’s the only page returned for a search term. Then you have a small site with no links that hasn’t been indexed but has a similar page that’s a 100 times better content wise. Why not index it and show it second in the results? Surely that’s better for everyone?
Lastly, if the dropping of people’s indexed pages is because of site trust issues then why is my own new site’s index going up and down like a yo-yo? Newly indexed pages then hardly any pages and then newly indexed again. Is it having trouble making up its mind if my site is trusted or not?
Hi, Matt!
I was wondering if you guys changed something to the algo in the last days…
A few hours back, my site dropped from 3 pos to nothing, although it’s a good site. The sitemap acct doesn’t show any spam warning, but google started to delist the pages…
Can you have a look? I’m a total mess now…
Thank you,
Chris
Are you kidding Chris?
Did you read Matt’s post? Your site is a piece of junk not worthy of Google’s index. It’s true. Matt has personally checked. And every site that has been de-indexed that he has looked at has not had enough inbound links or else has had outbound links that are just completely off the wall. Imagine a real estate site having the gall to link to some other kind of site. What a joke. You’d better get busy and go after links. It’s links links links from now on. It’s official, Matt says so. You are junk if you don’t have links. Google loves blogs you know. You shouldn’t really be allowed to have a website nowadays unless you are willing to link yourself silly on your own blog. It’s the future you know. And it’s great. Matt says so.
Yeah, for a porn site that is some great spam work indeed man!
Google is taking porn sites out of the index if you have been reading the news there are lawsuits flying around about them being in the index!
Oh, my, goodness! It just so happens that at about the same time my remaining indexed pages disappeared I had just added a reciprocal link to my site!!! Ugg!
Soo… now that I’ve removed all links from my minute template based website and added a no follow command to the three remaining links, should I expect to see a change in indexed pages on the next crawl? Or am I banned for a year or something?
By the way thanks for the update I’ve been stalking your blog for over a month waiting for something like this post.
Heh… and I’ve only ever had two internet customers… (but they were recently which is why I was inspired to get my site indexed)
Yeah the funny thing about that ranking is that my site is real estate, not porn. It only shows a flaw in Google’s algo and ranking system. I kind of liked Midwestnet’s comment on DP
Fisting lessons with your new house, anyone?
The page ranked for that term is a property detail page of a listing in Las Vegas. First I thought ok maybe this page was hijacked but it hasn’t been, then I thought ok did someone get access to the site to change title tags and meta descriptions, wasn’t that.
Checking that page I found no backlinks to it with that anchor text, so this only leads me to believe that somehow someone at Google turned over a cup of coffee on their computer and caused all this mess..LOL
Zoe C,
Shame on you. You added a reciprocal link! Why? It’s a simple fact that natural links just materialise out of thin air if you are any good. How? Because people find you, think you’re great and link to you. How do they find you? Why, on a search engine of course…ummm…wait a minute…Oh my god. The system is flawed! Heh Google. You’re a bunch of idiots.
I guarantee, history will not look kindly on this particular period in Google’s history.
Hi Matt,
I wanted to ask a couple of questions. In the next days I am going to launch a new site that will be offering a certain service to bloggers and webmasters. Basically it will offer a script for free. I am going to ask the people using it at their blogs and websites to link back to my site, that can attract all kinds of backlinks because the script can be used at any kind of site. If some sites from bad neighborhoods according to google use this script and link back to me will this penalize my site?
The other thing that I would like to ask is: on my blog I have a niche affiliate store related to my blog’s theme as a way of monetizing it. Will this lower the overall trustrank of my domain? For example, can this cause a decrease of the rate my blog is being crawled or cause my site to lose its current rankings for certain keywords?
If that’s the case I think it would be very unfair, it would be like msn penalising sites that have adsense code on them.
Thank you,
Dimitris
Matt, thanks for your great post.
One question relating to sites that send traffic in exchange for linkbacks. Say 20,000 sites link to a page, and in turn that page sends traffic to each of those sites. Here’s the twist: that page rotates links in and out periodically, so that on any given day, it only displays 200 links. I consider the 20,000 incoming links as manufactured links, but technically, 16,000 of those links are not reciprocal. Will Google be dealing with this type of linking scheme anytime in the future?
“What do you think of that? Hmm? I said ‘What do you think of that?’ Don’t answer. You don’t have to answer everything.”
Hi, Matt !
This is a very valuable post indeed ! It has given good insight over the quality parameters which Google considers when indexing the pages.
A better web can be made by openly sharing problems and comments. I feel that there is a need for some forum where enthusiastic volunteers can anonymously share their real-time experience of the black hat / unethical SEO practices followed by many sites. This will help to improve the Google filters continuously and a better web can be made.
Thanks & Regards,
Ajay
Hi Matt,
Thanks for the post! I pretty much expected everything you have said. After all, Google is going to keep trying to improve itself, so in the long run only the quality sites are going to last. Anything that tries to game the SE with backlinks or whatever will eventually get kicked out!
Anyways, on my site I have a link to my “web stat counter” at the bottom. Will that be considered a bad link to have at the bottom?
I have other bad links too…but i want to know specifically about the web stat counter link? Is it a bad link to have?
Thanks
I have a small, noncommercial, ad-free site (with good-quality content). You could say I’m not so much a webmaster as just some guy with a website. There are a lot of people like me, who seem to be being left behind by the new Google with its infatuation with giant business enterprise.
From my perspective, both yahoo and msn do a far better job than G at returning results from my site when they are pertinent to specific search queries. At some point -- early February as I recall -- I noticed that traffic to my website had virtually stopped. I then found I had dropped out of the Google index. After a little research I decided I was being penalized for duplicate content (which probably occurred when I moved the site to a new domain). I filed a reinclusion request and at least got my site indexed, although at its previous host -- defunct for almost a year -- it was still showing better results than the same site at its current location last time I checked.
Right now I feel I’m doing about all I can, which is to improve and expand my content and hope someone notices. Maybe Google will some day start to return better results from my site so that traffic will pick up again, but it’s kind of out of my hands.
All of which is a long preamble to a comment about how organizations fail. I’ve looked at this issue a little, and typically there is some fatal flaw that seems insignificant at first but gradually becomes magnified and turns out to be their undoing. (I suppose that insight was the genius of Greek tragedy.) Anyway, it’s looking to me like Google’s fatal flaw is paranoia. By obsessing about people scamming its SERPs, it has started dropping valuable content. It expends too much of its energy in a kind of perpetual chess game with black hats, who are simply playing whatever system G devises, and so it has turned into a mirror of their tactics. The escalating back and forth is like an endless succession of reflections in funhouse mirrors. Meanwhile, its competitors, perhaps just by doing nothing, are now returning more useful results.
Or maybe I’m wrong, and the ship will correct its course. I hope so.
I am with PhilC. This whole thing is ridiculous now.
After 4 years online, my large content based adult website dropped like a rock today in Google. I lost maybe 80% of traffic in a single day, and after talking with my competition it seems they’re all doing fine. Not sure what to make of this so far, we haven’t made any major changes lately, or have any duplicate content.
We have setup XML link trading in the past few months to help our customers find similar articles about the sites we list, adding many top quality IBL’s. Hundreds of exactly-relevant links from a site ranked 575 in Alexa for example.
We have never used any spammy techniques (to our knowledge!) or anything black hat. I’d say we’ve followed all rules to a T since 2002. We have never wanted to risk our good relationship with Google. What can I do? It hurts to see cloaked sites and sites with no content out-ranking our high PR, old and established pages, with relevant, useful content.
It seems to me that google is having big problems with .biz, .info and .us sites lately, too. My 2 cents.
* clap clap clap clap clap *
That was brilliant, Phil. You’ve managed to come up with the most emotionally compelling arguments on this blog any of us will ever see. Like many of the things you have written in the past, it is truly a work of art. It’s passionate, it’s inspired, it’s emotionally charged, I laughed, I cried, I felt stirrings from the very cockles of my heart…no wait, that was a gas bubble. Sorry about that. My bad. Really.
If they were even remotely sensible, then they would have been great arguments. The problem is that you’re making the same fundamental mistake that most others make when they try to convince others (especially guys who have stroke, such as Matt): they don’t argue from any point of view other than their own. We’re all guilty of that, though. You do it, I do it, Aaron Pratt does it, Wayne does it, we all do. We can’t help that. It’s human nature.
(Side note: for those I mentioned here, it wasn’t an attempt to single anyone out. I was merely mentioning names as examples. So please don’t take it personally; I’m not trying to attack or insult anyone).
But the whole point of what Matt was trying to say here is something I think most dedicated SEO-types tend to miss, and that’s “worry about the site first as far as a resource for people goes, and THEN start SEO after.”
When webmasters start linking to ringtones or MP3s or Viagra or pet urine control from unrelated sites, that doesn’t do a thing to help the end user. It either sends the user on a wild goose chase or turns the user off.
When webmasters start receiving those links, they’re getting trash traffic at best. I’d rather have 10 visitors from a relevant search query than 10,000 from some trash-traffic link farm scheme (assuming it was even that good).
First off, who are you, I, or any of the rest of us to judge whether our own sites are good enough to be listed and indexed? All Google is doing by using quality IBLs as a sign of quality is extending the concept of human referral and word-of-mouth. If it’s good enough for a human to link to it organically, it’s good enough for Google to list it. How else are they supposed to figure out what to rank and what not to rank? People would complain if the Toolbar were used; on-the-page content can be manipulated very easily; and any other form of monitoring would be met with some heavy-duty scrutiny at best.
Where are these pages that are so perfect that Google is doing a disservice to the web and that don’t have hyperlinks to them from any other web destinations, anyway?
Second, if Google is going to list pages so that users can find them, they’re going to need to list pages in such a way as to provide users with easy access to them. In 99.999% of cases, the SEO-types call this “ranking highly in search engines”. So you want to be listed somewhere in Google SERPs for your content so that users can easily find it, yet you’re okay with it not ranking highly. Does anyone else see where a guy like Matt might have a bit of a problem with that?
As far as OBLs go, this is an area where webmasters should take some responsibility and show some moral judgement (and, to be fair, most webmasters are pretty good that way.) We have a certain moral obligation to those who may visit our sites to guide them via the hyperlink structure in a manner that will give our users the best possible experience. How does irrelevant OBL linking do that? How does providing a link to otherwise useless content help the user?
For those of you still not convinced that building a good website, putting up content and drawing visitors the natural way works, there is at least one website that has done a terrific job of doing exactly that.
The owner doesn’t obsess constantly about where his site’s positioned in any engine.
The owner has never linked to a spammy site without using the nofollow attribute, and has ensured that the spammy link was relevant to the site’s theme on the rare occasions that he has done so.
The owner has never bothered to participate in link schemes, exchange reciprocal links, or do any of that stuff.
The owner has quietly built up his content, and in the process has attracted a large, loyal and active userbase, which if I’m not mistaken is what we’re all supposed to be doing when we build websites.
It’s not a perfect site…none are (including my own). It could be improved, and I’m sure the owner would say the same thing. But at least the owner is focusing his/her efforts on his/her site.
And each and every one of you reading this has visited the owner’s website. In fact, you’re on it…right now.
Just something to think about the next time someone offers you a wonderful reciprocal Viagra link, or maybe buying some text links from a broker.
Matt et al,
Thank you for this insight. You’ve put the pieces of the puzzle together, and I appreciate it.
What I am getting from this is that links to a site that were previously considered a positive vote are no longer considered that, so some of your pages that were in the index because of that vote may now disappear. Now of course if those pages had links on them, the sites that received those links may now disappear. Thus the gradual deindexing of sites.
This effort has been put in place so that un-natural linking schemes such as link farms, directories, and paid listings no longer count as votes.
Now since people like me cannot afford a superbowl ad to get the name out, and I’m not a seasoned SEO with 2000 sites under my control to “naturally” gain links, my pages will go unknown to Google’s users. Unless of course they get fed up with the same old sites at the top of the SERPS and go to the other search engines that cache fresh sites.
All of this effort appears to be made to discourage un-natural links, however I believe it will only increase them. Why you ask? Because I know my site was killed due to the new filters, perhaps I didn’t have enough “quality links”. However if I search for some very specific terms, 3 of the top 10 results are simply made-for-adsense mini-directory sites that have 10 links on them, some scraped content, etc. If I check their links it is truly a bunch of junk. So the only conclusion I can make from this is that the junk links still work, it just takes a whole lot more than before. Until the day when all of those sites are gone that can be the only conclusion.
Now to address the paradox. To get indexed you need natural links, to have natural links you need webmasters to view your site, to get them to see your site you need to be in the index…yada yada. BUT the webmasters have just been told not to link to lower ranking sites and if you do, use the NO FOLLOW tag. Why not simply show these low linked pages on page 800 of the SERPS and track if they are found? In my line of work (engineering) I frequently search very deep into the serps to get to sites written by real people in the field and not the corporate presence that rules the industry and the first 100 pages. In other words, as a page is found, include it; the more action it receives, the more it moves up. This of course could be tracked with activities such as watching the back button (a vote for not finding what you wanted) etc.
Just my 2 cents. And I’m off to spend the night finding a few thousand sites that want to link to me to get my pages listed again.
~John
PS If you don’t delete this I added my URI this time as yahoo doesn’t seem to care about the NO FOLLOW thingy.
I realise that “adult” results arent exactly your “forte” but how can you explain google’s seemingly deliberate action of making adult search terms give irrelevant results..
This practise has a string of problems with it..
Heres the main problem i see. do a search for an adult term like “porn clips”
the top 10 results are somewhat relevant, but the other 90 are filled with domains like this
http://www.thechurchillscholarships.com/analporn.htm
http://www.lewisandclarkeducationcenter.com/farmsex.htm
http://www.nyotter.org/porn.html
http://www.argenbiosoft.com/amateurporn.htm
http://www.universityplazahotel.com/porn.html
http://www.plannedparenthoodcouncil.org/amature.htm
notice a trend ? all are recently expired , non-adult listings.. the main pages are usually direct copies of the previous site pulled from the web archive , then each domain is filled with easily identifiable doorway.cloaked pages.
Not only does this give a bad impression of the adult industry in general , but the other problem is most of these sites contain trojans / virii /childporn/beastiality that then alter surfers browsers.
The only reason i could see google allowing this practise is they make more from google adwords this way..and they realise adult webmasters dont have a voice as loud as mainstream ( even though it is a huge part of google’s revenue )
I also notice a trend of google adsense sites jumping up in popularity when the sites dont even have relevant content, just ads for google adwords/adsense
Now i notice my “adult” website that is very relevant and is several years old and established with several hundred relevant backlinks is close to #300 position , while the vast majority of sites above me are either freshly created/expired domains with no content or guestbook/forum spam on mainstream sites that the owners cant fix without losing their entire website.
Why the foolishness ?
p.s. its really irritating when you write out a big long post and the “security code is invalid ” so it makes you hit back button and all your post is erased .. grr
Matt,
Your article reminded me of my startup days.
Having written a very detailed business plan (about 100 pages long), I was told that although it was very heavy to hold, what most venture capital analysts I met with would read is the executive summary on the first page and that I should focus on it. Funny how an indexing article can get me to remember those days
Matt I think you guys still have a lot of work to do. I know one of the real estate sites you mentioned in your main post and it’s got all its pages back but they bought ALL their links, and their content is dire! Then a site like mine, mainly natural links with good relevant content gets stuffed. Seems there’s still a long way to go…
Hi Matt
Thanx for the post. In the last 48 hours I’ve seen a lot of change in the SERPS. The question still stands though, how the hell does one promote a new site if we’re not allowed to trade links with similar sites? Ok, so it’s not ‘not allowed’ but won’t help ranking. I assume it will however help with indexing, so all in all not a bad thing?
What I don’t understand is being penalised for linking to unrelated sites. For instance I’m really proud of the city I live in, so I run a blog about it. I also link to many city related sites, but they are all in various different niches, yet still in the city, so it’s kinda tourism related. Is that a bad thing? After all I am giving the user useful info about where to find what in the city.
Actually it doesn’t really matter where that site ranks, although I’m trying to get a better understanding of how things work…
I agree with Justin, google has to re-think its strategy about link-valuation.
Matt, my site is the best out there on my chosen topic. Despite this, there are many sites above mine in the SERPs for my chosen targeted phrase. Please fix this so the whole World can see my site at #1 when searching. Until you do, the Google SERPs are crap!
Oh, can I also have some more PageRank. My paid links just don’t seem to work like they used to.
I really think a lot of you need to understand that the days of gaming the SE’s with links are coming to an end. Links have nothing to do with Google’s problems with indexing the web; they could index pages if they had the storage space and dump the pages with bad links to the bottom of the SERP’s. The problem is the lack of indexed pages at the moment!
http://blogs.zdnet.com/web2explorer/?p=173
The above link was left on one of our forums and is common knowledge!
Hi Matt!
Great post, and great answers.
My question is whether BigDaddy, and its effects, are equally significant in all languages?
I am seeing a lot of link exchange, linkspam and other deceptive techniques earning top positions in the index for certain non-English languages.
PhilC, we try very hard to find ways to rank mom/pop sites well. As I mentioned Bigdaddy is more comprehensive (by far, in my opinion) than the previous crawl/index. A site that is crawled less because their reciprocal links are counted for less is a different type of situation than many mom/pop sites, for example.
Halfdeck, I’m happy if it helped clear things up.
John, that’s your choice if you decide to chase thousands of links in one night. I just don’t think that’s the best way. BTW, just because Yahoo reports nofollow links in the Site Explorer, I wouldn’t assume that those links are counting for Yahoo!Rank (or whatever you want to call it).
Justin, of the three real estate sites that I mentioned, two are unhappy because they’re not crawled as much as before.
Dave, nice one. I made it several sentences in before I got the (dry) humor.
And I gotta get some sleep now..
Hi Matt,
I am the owner of the health care directory domain you used as an example above. Thanks for having a look at the site, your comments are helpful and much appreciated.
I would like to clarify something on how and why I used the removal tool as I don’t think that was described properly.
I had pages that were indexed under both www and non-www in a directory such as:
http://www.domain.com/directory/
Those pages were mired in the supplemental index and indexed under both www and non-www. (At first, my server did not have a redirect from non-www to www but I have since put one in place. That is likely why they were indexed under both www and non-www.)
I removed those pages( /directory/ ) from my server and used the removal tool to let Google know they were gone. I re-built those pages at:
http://www.domain.com/new-directory/
I used the removal tool because I wanted to start fresh and didn’t want to get penalized for having the same pages under two directories. I did not use the removal tool hoping that just the non-www version pages were removed from the site. I used the removal tool to let Google know those pages were gone forever (six months in Google’s eyes).
Since the above may be a little confusing, I am going to summarize one more time for clarity. I removed pages from my server (that were indexed under both www and non-www) and then used the removal tool to let Google know they were gone. I rebuilt those pages under a new directory to start over and hopefully get those pages indexed correctly.
I very much agree that the site could use some links. Thanks for your time and help.
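For anyone following along, the non-www to www redirect mentioned above can be sketched roughly like this. It is only an illustration, assuming `www.domain.com` is the host you want indexed; the function name `canonical_redirect` is made up for this example and real servers normally do this in their own config rather than in Python.

```python
from urllib.parse import urlsplit, urlunsplit


def canonical_redirect(url, canonical_host="www.domain.com"):
    """Return (status, location) for a requested URL.

    Hypothetical helper: a request for the bare host gets a 301 to the
    same path on the canonical www host; anything else is served as-is.
    """
    parts = urlsplit(url)
    host = parts.netloc.lower()
    # The bare (non-www) form of the canonical host, e.g. "domain.com".
    bare = canonical_host[4:] if canonical_host.startswith("www.") else canonical_host
    if host == bare and host != canonical_host:
        location = urlunsplit(
            (parts.scheme, canonical_host, parts.path, parts.query, parts.fragment)
        )
        return 301, location
    return 200, None


print(canonical_redirect("http://domain.com/new-directory/"))
print(canonical_redirect("http://www.domain.com/new-directory/"))
```

The point of the permanent (301) status is that both spellings of the address end up pointing at one URL, so the same page is not indexed twice.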
Nice Day Matt,
hope you got some sleep.
Did you recognise that thousands of small businesses are out of business now? Google was a search engine where small business could compete against big business. Those days are over because now the balance has shifted toward big business. That is a pity.
In a comment you said you/the team are going to observe the spam reports more closely?! IMO spam reports don’t work. I find well-ranking sites with more than 12,000 pages with javascript redirects!, well-ranking duplicate content with 3 or more domains. Nothing happened to them. When does your fight against that begin?
greets, Martin
Google has lost its edge: or more accurately, the crawl fringe. And as a result, it is officially broken in my view.
I have been using Google for about six years now, and Altavista before that. The key advantage Google had over Altavista in the early days was that its ordering was improved. In the very early days, Altavista had a lot more results than Google, but that changed fairly quickly. At any rate, Altavista always had the results, but you had to dig deep. In Google, the results were just ordered “right.” Indeed, if your search phrase was particularly detailed or otherwise unique, you could often click on “I’m feeling lucky,” a button unique to Google. And jump straight to the page you needed.
I have been following a change in Google’s results for the last four or five months which has gotten steadily worse: that is that Google is returning results which are not “dead on” anymore. That is, it seems to be using PageRank not as just an ordering tool, but a pruning tool as well. Now, it is true that PageRank has always been used in this way, but the aggression of this is now too extreme for my purposes.
Matt, you have effectively said in your entry and in the comments that PageRank is now being used to eliminate pages from the search results completely. I think this is what has broken Google, because PageRank is the foundation of the algorithm that used to make Google work.
A page which does not appear in the index has an effective PageRank of 0. Any pages linked only from this page have PageRank 0 also. In this way we find that this feeds a recursive loop- as a page disappears, it takes pages with it, these pages take pages, and so on. Yet these pages have keywords on them- they often have unique variations of them. Google used to be able to find these, even to “whack” them. Now it simply cannot. It has lost its power.
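The cascade argued above can be illustrated with a toy reachability check. This is only a sketch of the commenter’s reasoning, not a description of how Google actually crawls: the link graph, the page names and the `reachable` helper are all made up for illustration, and it assumes a crawler that discovers pages only by following links from pages it has kept.

```python
from collections import deque

def reachable(links, seeds, dropped=frozenset()):
    """Return the pages discoverable from `seeds` by following links,
    never visiting or passing through any page in `dropped`."""
    seen = set(s for s in seeds if s not in dropped)
    queue = deque(seen)
    while queue:
        page = queue.popleft()
        for target in links.get(page, ()):
            if target not in dropped and target not in seen:
                seen.add(target)
                queue.append(target)
    return seen

# Hypothetical site: "hub" is the only route to the deep pages.
links = {
    "home": ["hub", "about"],
    "hub": ["deep1", "deep2"],
    "deep1": ["deep3"],
    "about": [],
}

before = reachable(links, ["home"])
after = reachable(links, ["home"], dropped={"hub"})
lost = before - after  # the dropped page plus everything linked only through it
print(sorted(lost))
```

In this toy graph, dropping the single page "hub" also loses the three pages that were only linked through it, which is the snowball effect the comment describes.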
Now this wouldn’t be so bad- what point is a page without incoming links after all- except that this isn’t the only change Google has made. Google now has a manual switch which zeroes PageRank of sites it deems to be “unfairly gaming the system.” It also has a scheme which lowers the PageRank of pages in “bad neighbourhoods” or using known “black hat” SEO techniques- this is often dubbed TrustRank, but we have no indication from Google that it is separated from PageRank in the Google architecture. Additionally, it now appears that Google can detect duplication in results, which also seems to feed into PageRank in an unspecified way.
Matt, you have said before in an entry on canonicalization that everyone should 301 from site.domain to http://www.site.domain (one or the other,) but there are likely to be millions of websites which cannot or just won’t out of ignorance or laziness. Are these pages actually worth less than the others? Do they deserve to fall into PageRank 0 hell?
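For readers unsure what that 301 amounts to, here is a minimal sketch of the decision a server would make. It assumes the www form is the one chosen as canonical; `CANONICAL_HOST` and `canonical_redirect` are hypothetical names for illustration, and a real site would normally do this in its web server configuration rather than in application code.

```python
from urllib.parse import urlsplit, urlunsplit

CANONICAL_HOST = "www.site.domain"  # assumption: the www form was chosen as canonical

def canonical_redirect(url, canonical_host=CANONICAL_HOST):
    """Return (301, target_url) if `url` is on a non-canonical host,
    or None if it is already canonical. Ports are ignored in this sketch."""
    parts = urlsplit(url)
    if parts.netloc.lower() == canonical_host:
        return None
    target = urlunsplit(
        (parts.scheme, canonical_host, parts.path or "/", parts.query, parts.fragment)
    )
    return 301, target

print(canonical_redirect("http://site.domain/page.html?x=1"))
print(canonical_redirect("http://www.site.domain/page.html?x=1"))
```

The point of the permanent redirect is that every variant of a URL answers with one agreed address, so links to either form are credited to the same page.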
Surely you can see that what was already a nasty problem now has the potential to snowball. And this is what appears to be happening. The low ranking pages of the web, made by small people who don’t go out and get lots of links, have been caught in the SEO/Google crossfire. These small people had relevant pages for detailed search queries, not the so-called “competitive phrases” Google staff actively monitor. Now these phrases generate generic “authority” crud, really nasty black hat spam, or worse. The Googlewhack has become a “no results” and the “I’m feeling lucky” has been set to an instant trip to the Wikipedia world. Google is horribly broken as a result.
I fear, Matt, that if what you say is true, all my fellow techies can forget typing some bizarre error text into Google and hitting a three year old web discussion on some portal where someone else had the same problem. You’re just gonna hit the boring table of manufacturer error codes… or maybe nothing at all. It’ll be back to Altavista for me, I expect.
You didn’t sense the tone of sarcasm in my voice when I typed that!?! I didn’t spend the night chasing links, actually just wrote some articles on a subject I know something about; this interwebby stuff is too volatile for me right now. Someone will find it interesting, and natural links have to come..well…naturally. I think generating natural web traffic is like pushing a boulder over a mountain: it takes a long time on the way up, but on the way down you can’t keep up.
Matt, let me see if I can summarize this correctly:
Big Daddy attacked crap backlinks and therefore if you have fewer backlinks you don’t get deep indexed till your site earns it with quality links or site age.
Everyone who has either bought some links or traded for some links or sold unrelated links on their site will suffer. If not, then the quality sites that do link to you lost some of their PR power because they lost backlinks, and therefore you lost reputation points from them. It is hitting so hard now because of the chain reaction of the death of crap backlinks either affecting you or a site(s) somewhat connected to you.
My view on the affiliate links is that if your site has nothing more than affiliate external links and product dup content then you are no more valuable than any other site with the same, and therefore it goes back to backlinks and indexing. The only way for a site like that to rank is to out-PR the other crap, and then you are still in an uphill battle since your site does not provide anything more.
Simply put, G knows what portion of your site is affiliate crap and what is quality original content, i.e. the ratio of quality to crap links/dup content.
How am I doing?
Great Post Matt – you really do deserve your holiday now –
but
can we just clear up the reciprocal link question?
Is it OK to have RELEVANT reciprocal links – and could it even be beneficial?
My directory type site has many outgoing links to relevant sites and articles for which I’ve never requested reciprocal linking – but I was just about to run a campaign asking most of them to link back to my RELEVANT pages – would this be OK and not harm my position or ranking?
cheers
YES! That’s my exact question, summed up Weary.
I’m fairly sure that my massive drop yesterday was due to the inclusion of XML link trading with my competitors. My review on SiteXYZ links to their review on XYZ. I thought this was valuable, relevant information to my users, and valuable IBL’s for my site.
They don’t look so hot now! Oh gosh.
Anyways thanks Weary, looong day. I really hope this gets fixed, and I’d hold off on the relevant reciprocal linking!
The web will be transformed to take the shape of our current world.
Those who sell something have to be the Walmarts and the Amazons.
Of course it’s their merit that they are so big.
Just that the net was something that was equal for everybody and it’s now transforming so that the little ones don’t have a chance.
As a little one, I don’t have a chance not even with cpc now. I have to do tricks like Shoe to get something.
I think its all over folks.
Matt,
Thanks for your great post. However, something really concerns me – in the above example of outbound linking you state that the “Real Estate Site” has dubious linking, by linking to a “mortgage site”??
Are you serious about this? Or is it a mistake? This has very serious implications.
Surely if I am looking to buy a house, then I am also extremely likely to be looking for a mortgage, and that link is actually very relevant to the browser – I am actually hard pushed to think of anything that could be more relevant.
Could you please expand on this? If this is not a mistake, then along the following lines I would expect:
– Holiday sites will get penalised for linking to car hire sites
– Wedding sites will get penalised for linking to honeymoon sites
– Finance sites will get penalised for linking to credit card sites
In essence, if the above holds true, a site will get penalised for linking to anything that is not exactly the same theme as the site it links from.
If you looked at numerous property sites I would guess you will find hundreds of adverts that have been paid for and hand picked by mortgage companies as they know that they are very likely to get the perfect customer from that site. It would seem that google is therefore going against the best human knowledge.
All I can see is, that if the above is not a mistake, then it is asking for the destruction of the web as everyone is so paranoid that they may be linking to a site that is not exactly the same as their own, that they pull all their links.
Hey Matt. Since you ignore my mails and posts, I thought you might like a visit outside the blog before you head for the hills for your hols.
http://gooogle-search.blogspot.com/
How could it be possible that some important sites can put some dirty links in their footer and not be dropped from the index? E.g.: http://www.pixmania.com/fr/fr/home.html
And now: how could a directory be indexed, given that it links to many different sites?
These kinds of posts are really good. They give lots of webmasters a feeling that Google isn’t evil at all, just that they need to polish their websites.
Keep up the good work …
I hope, and it is my opinion, that Bigdaddy is a step in the right direction. The future target is very easy for everyone to understand: give good sites the first places in the SERPs. There is only one thing that can make this happen, and all the Bigdaddy stuff against link exchange … is the beginning. Sites that have good content get free links, that’s all. But it’s an enormous project to build a search engine that will give you really good SERPs. And all these problems that Google is fighting against are self-inflicted by Google. Because of this I can understand people who are angry, since they spent much time to be highly ranked and now Google’s algo is changing.
I build my site with really good content and I hope the future will be good.
So keep on trying, Matt (and give new sites with good content a chance, and not so many filters).
Hello Matt!
I read your whole post and I have very strange feelings about Big Daddy and your reciprocal links penalization.
Why? It’s simple. Let me build a dog breeder site. I want to get some surfers, so I’ll ask my friend to add a link to my site on his. He will also ask me to add a link on my site, so he will probably get some fresh surfers from me.
Then I’ll want to have my site in some dog directory, so I will ask them to put up a link to my site and they will ask for a link to their site for sure.
But Big Daddy is telling me that if I want my site to be indexed and have good results, I must not add links to other sites, but plead for links to mine…
This is a really stupid algorithm and a kid would create a better one.
This emphasis on IBLs is nuts.
Just for a favour (no money changes hands) I run a Chinese takeaway site for a friend. He makes a good product and serves a quite specific geographical area.
Why in hell’s name should I have to run around getting “high quality” links to his site when that isn’t the way anyone would seek to access it?
And what is a “high quality” link to a Chinese restaurant? The local Chamber of Trade – as it happens, it has all the appearance of a spam site, with hundreds of links to unrelated businesses that happen to be members. Is a link from such a site “untrusted”?
Hi Matt
Dammm – I am late to this post – I hope you revisit it.
Matt, you say that crawl depth etc is largely based on PR.
PR at the moment though has been acting very strange – some sites that lost PR regained it in the last PR update – however, depth of crawl still looks like it may be based on perhaps an older level. (E.g. a PR5 site not getting crawled – was prev. PR0 – due to ban, canonical, error – I don’t know) – but it is still getting crawled like a PR0 – i.e. hardly at all.
Now – as you know some pages/sites didn’t have PR updated at the last change (about 4-5 weeks ago?)
Soooooo – what’s the score with PR at the moment – I would assume that an update will be coming soon that updates the PR of the sites which did not change at the last changeover?
These sites which regained PR after a long absence but no ranking changes – does this point to perhaps increased crawling in the future when PR is updated across all sites/pages?
PS. I did not get a reply from Boston email address thang – sniff
Maybe you do try very hard to do that, Matt, but it’s just not working. The new criteria for crawling and indexing that you explained in this thread are so bad that it’s hard to actually believe. To base whether or not a perfectly good site gets all of its pages in the index on how many links it has pointing to it (and the type of links), and what types of link it has on its pages is sheer lunacy. I asked before – doesn’t Google want to index good pages any more? Doesn’t Google want to give full choices to its users any more? Or is Google happy in the knowledge that there are so many pages in the index that there will always be some relevant pages for the user’s results, even if they deprive them of plenty of good ones?
Most people wouldn’t mind at all if Google identifies and drops certain types of links (reciprocals, etc.) that they don’t want to count for anything. If you don’t like certain links, cut off their juice – treat them as nofollow – drop the links from the index – but there is no sense or justification whatsoever in dropping a decent site’s pages from the index, and virtually killing it off because of them. It’s clear that Google can now programmatically recognise some of the links it doesn’t like, because you say that’s why some sites are being treated badly, so drop the links – remove the links from index – but don’t refuse to index the site’s pages because of them. It’s a sh..ty thing to do to sites, and it’s a sh..ty thing to do to your users – the very users that Google claims to think so highly of, but are now being short-changed.
Most people would support getting rid of spam links, but to treat sites that just don’t happen to have attracted enough natural links to them as second class and on the fringes is plain stupid. Nobody would support that – especially Google’s users if they knew.
Google now wants us to go out and acquire unnatural links for our sites if we want them to be treated fairly. Whatever happened to, don’t do anything just because search engines exist? What an embarrassing about face! As I said in the previous post, I am not going to run around getting unnatural links just for Google. I’ve never gone in for it before, apart from submitting to a very few directories, and I’m not going to start now. You can stuff that stupid idea!
The site I mentioned earlier had a clean bill of health from you personally, and nothing has changed since then. 4 days ago it had 17,200 pages in the index, and on subsequent days it had 14,200, 11,700 and 9,350 yesterday. It started at an unrealistic ~60,000. I’m past caring about the site now. It’s a decent and useful resource, but who cares if your valued users ever see it or not? Google knows best about what their users want to see, so they are stopping showing them most of that site’s pages – right? They’ll love you for it! The site has only one reciprocal that is down in a very specific and relevant page in the site – and it’s staying there. The site has never had any link building done on it, and because of that, Google is dumping it and depriving their users of a useful resource. Nice one Google! If only your users knew how well you look after them.
That’s just an example of what a great many sites are *unfairly* suffering because of the sheer stupidity of the new crawling and indexing regime. Nobody gains by it – including Google’s users, who are being intentionally short changed. Actually, that’s not true. Those who gain are those who link-build. The filthy linking rich get richer, and ordinary sites are consigned to poverty. Is that what Google wants? You want the poor to turn to crime? That’s what you will drive them to. The whole bloody thing stinks!
Matt. My posts are not aimed at you – they are aimed at Google. I’m sorry if you take any of it personally – it’s not intended.
One last point…
All that this will achieve is that the link-poor will start unnatural link-building, and in ways that will deceive the current programming. Google will have caused it – not the site owners. This sledgehammer treatment of innocent sites, just because they haven’t naturally attracted enough IBLs for you, is madness.
At least now we know that the indexed pages filter is based on external linkage.
Thanks.
I agree it’s important to filter out low value sites (although it’s debatable what low-value means). Unfortunately, the techniques used to promote such sites are the same ones legitimate sites use.
As a webmaster with a limited budget trying to get a new site going, or drive more traffic to an existing site, what are the options? No reciprocal, can’t buy links, can’t sell links, can’t compete with the big boys (Walmart, Target, Amazon, Overstock, Ebay…) in PPC …. what’s a webmaster to do?
Soon top Google results will be primarily big companies with big name recognition. Of course such sites get thousands of back links. How could they not? But what about poor JoesSunglassStand? Sorry Joe, McDonalds is hiring. Or there’s PPC if you have the cash to go against the aforementioned companies (not likely).
I don’t think you can determine a web site’s subjective value with an objective algorithm. And now the small webmaster’s site doesn’t even show in the results because he doesn’t have a few hundred natural backlinks, or he sold a link for $10 to a Ringtone or Credit Card site.
Despite all Google’s efforts, I can still easily find sites using black hat techniques (such as cloaking) that appear high in Google results. Here’s one I’ve reported a half dozen times:
term: comforter sets
linensource.com – Offers Down Comforter Sets
linensource.com – Your one-stop source for all your down comforter set needs. The Linen Source offers a wide variety of down comforter sets.
http://www.linensource.com/down_comforter_set.asp
The asp page is a cloaked page that redirects the user to the main site.
I applaud Google’s efforts to bring order to chaos, but I can’t help but think that they are doing it in a manner that is more and more exclusionary to the small website owner.
It seems to be a fact of societal evolution that democracies eventually ‘evolve’ into republics, where the power and wealth ultimately end up residing in the hands of an elite few, rather than being equitably spread through the population.
The irrefutable guiding principle of our undeniable monetized society is inescapable. When it comes to search engine position of ecommerce sites, it’s not about who is most deserving, it’s about who has the most money. Mega sites with name recognition and multi million dollar traditional media marketing budgets are taking over the serps, and it’s only going to get worse.
If you want to make a site about the mating habits of New England barn owls, or any other esoteric research topic, you can do great in Google. But if you want to run an online business that relies on the sales of products or services, you’re in for a tough time.
I don’t doubt you are trying hard. I, like PhilC, believe you’ve simply got it wrong. Very badly wrong.
Deindexing part of a site or refusing to index deeper parts of it for any reason defies logic. You either index it or you don’t. How you rank it among the other pages is another matter.
Big Daddy may be far more comprehensive, but the results are not if you choose to deindex pages, or refuse to index them based on the types of links and not the links themselves.
Dave
Matt, are you sure BD is over, and does Yahoo link to bad neighborhood ?
site:www.yahoo.com – those were supposed to be 400k .
Thanks for the post Matt !
I translated some of the most significant extracts into French: -http://www.malaiac.net/seoenologie/91-bigdaddy-liens-sortants.html (hope the – is enough to not make a link)
Once again PhilC has put my concerns in a more coherent way than I could. As I stated above I’m new to this and struggling to work out what I’m supposed to do.
This is an example as I see it of not being indexed in action: Try the following term in the UK (google.co.uk and select UK)
Beta Tools 920A 1/2 Socket Set (a completely possible search)
As you can see, the search returns 2 things: firstly my XML sitemap, which is pretty useless to anyone searching for the above item. The second is something completely irrelevant.
Now wouldn’t it be better if this page was indexed and then returned?
http://www.shacktools.com/beta-tools-920a-22-piece-12-socket-set-p-5415.html
This is just one example, and probably not the best, of how this is affecting my site. I have 1000’s more like the above. As you can probably guess, my XML sitemap page is very busy, but people can’t find what they’re looking for from that and then exit the site.
The thing that worries me is that this page has no competition, so it doesn’t matter where it ranks just so long as it’s indexed. So to get this page indexed I need to go around adding links to other sites? This seems such a completely unnatural thing to do.
PhilC has a very good point.
I know of a site that can’t rank above page 3 for anything. I naturally thought it had some kind of penalty, as prior to its plummet it did fine on all manner of queries, then kaboom, overnight, I find myself in search engine purgatory.
To cut a long story short- I sent a letter to the good people at google asking if it had a penalty, only to be told that I should look at getting a few more quality links.
So yes effectively I have a penalty. I like to call it the lack-of-IBL-I-didn’t- go-out-and-aggressively-pursue-lots-of-links-penalty-cos-I-always-thought-it-would-bite-me-on-the-ass-oh-but-how-wrong-I -was-I-wish-I-had-penalty!
I would just like to pick up on the affiliate issue – what gives Google the right to determine that affiliate sites are bad? The internet is about choice and these affiliate schemes work, giving people a living!
Does Google not feel any responsibility for the thousands of people who will lose their income?
With so much unemployment these days, the internet and affiliate schemes offer – or did offer – a way of people earning money, setting up businesses and providing surfers a choice, even if it does mean they end up buying from the same place in the end.
I completely agree with John. Google is about to destroy the original linking spirit of the WWW. Matt reflects the whole paradox in his posting:
a) Since Bigdaddy, a high quality site xy.com is considered less relevant due to the fact, that inbound links might have been paid for.
b) On the other hand: Google’s webmaster guidelines and also Matt himself keep on recommending webmasters to get “quality relevant inbound links” for their sites to gain more relevance for Google.
c) Since Bigdaddy, another site ab.com is also considered less relevant, because it has got outbound links to sites that might cover different topics than the site itself (see the real estate example in Matt’s posting). But why does this happen? Because Google and Matt recommended other webmasters to get quality links.
Matt: Do you really think, that quality sites will ever link to other quality sites for free again, as it used to be in the old days of the WWW? As it used to be one of the major ideas of the WWW? If you link to another website, you’ll have to be afraid to get punished for this action. So, why link to other sites but for money? And on the other hand: How will you ever get “natural” links to your site again?
[Quote from Matt] it’s true that if you had N backlinks and some fraction of those are considered lower quality, we’d crawl your site less than if all N were fantastic.[/Quote]
I have seen sites linked to by scraper sites whose only content is Adsense ads and scraped Google search results. Does that mean that my site would be penalised by the actions of a third party spam site, over which I have no control?
Matt:
Would you please clarify your comments regarding reciprocal linking and discuss RELEVANCY and the RATE at which a site obtains reciprocal links? The 2003 Google patent says ‘obtain links with editorial discretion’.. reciprocal linking is tough to avoid when site A won’t link to site B unless site B links back to site A. And as you have noted, paid links are not always the best course of action so where is the line drawn? Free advertising is not very prevalent in this world. Paid or bartered (reciprocal) are the current options.
Most sites (especially hobby, niche, small business) will not provide a link without a link back. That’s the nature of the web, you scratch my back, I’ll scratch yours. If sites didn’t link to each other, the web wouldn’t be a web.
Responsible reciprocal linking should be done for the end user and to generate qualified traffic from like minded sites. When done correctly, relevant and useful links offer content to the end user and provide additional resources and “trains of thought” to continue the learning process on a subject.
Relevant exit links add value to a site again through providing the user with another “knowledge gateway” to pass through leading to more information or related information on a subject. This is the essence of the web. And many site operators won’t provide a link unless they can get one back.
If Google is giving less value to sites that engage in HIGH VOLUME IRRELEVANT linking, I applaud this move as it sends the right message to website operators to keep linking relevant and for the end user.
If Google is penalizing sites for engaging in ANY TYPE of reciprocal linking, that smacks all of the small businesses who have engaged in this practice correctly, and ethically since the beginning of the Internet, pre-Google.
Can we get some clarification on reciprocal linking please?
My goodness. It appears “many” in this thread either did not read Matt’s first post in its entirety, or are only reading the parts they want to read.
First; In NO way is Google only looking at “inbound” links. In no way is Google only looking at reciprocal links. In no way is Google only looking at links in general in regards to how often a crawl takes place, or how many pages of a site are indexed, or not.
Many of you are not thinking about the much bigger picture. I also know that “some” firms out there, including firms who do seo/design for clients, have not experienced ‘any’ of the problems found in this thread. Matter of fact; all positions have actually gone up.
One thing in Matt’s post went something like this….. for every one page that dropped out of first page serps, another page took its place with a happy camper. He also stated this, which is why I say many of you did not read it in its entirety.
[quote]After looking at the example sites, I could tell the issue in a few minutes. The sites that fit “no pages in Bigdaddy” criteria were sites where our algorithms had very low trust in the inlinks or the outlinks of that site. Examples that might cause that include excessive reciprocal links, linking to spammy neighborhoods on the web, or link buying/selling.[/quote]
Sorry for the ’speech’, but it’s sometimes tough reading stuff and not responding.
Excellent post, Matt.
As an observer it is fascinating to see how Google seems to be trying to balance or reconcile what appears to be a long-standing corporate culture of secrecy with the increasing need to share more knowledge, information and insight with the world at large. I think that your role in that process is not to be underestimated.
My technical prowess in this field is virtually nil and I am thankful that my site has not suffered a loss of pages in the index and is still maintaining a good ranking on my keywords. I am, however, very puzzled by the major differences in inbound links as listed by Google vs Yahoo and MSN. What especially struck me today when I checked was that, as well as listing about 6 times as many, Yahoo seemed to be giving greater prominence to what I would view to be better quality links. In particular the links from my CafePress store are prominent in Google whereas links from relatively obscure (but probably much more credible) sources get more prominence in Yahoo results.
The other observation I would make is that I have always thought that Google was not as adept at searching images as at searching whole pages. For whatever reason, although able to consistently maintain a top 3 ranking for keywords ’stained glass’ I have totally failed to get images even into the top 50 for the same keywords. I’m now reading (today) that perhaps a dash instead of a space might help but I do also believe that the image search mechanism is something of an achilles heel for Google. Just MO.
Keep up the excellent work.
Thanks for the great post Matt.
Matt: “graywolf, it’s true that if you had N backlinks and some fraction of those are considered lower quality, we’d crawl your site less than if all N were fantastic.”
One of my sites is a mom n pop jewelry store which has some really unique content. The link building process is going slowly because there’s only so much I can do, however, I find that a lot of backlinks are from scraper/junk sites which are totally beyond my control. Does that mean that my site will get crawled less because of these junk sites linking to me?
Sitemaps are making the webmaster be proactive in helping G with crawling, so why not do this with backlinks too? It would be really cool if G also provided a link removal tool, where you could specify domain pattern matches to discount certain links from being counted. I’m sure you’d also love to see the aggregate info. You could also tie it in to the spam report function… ok enough rambling.. need coffee…
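To make the suggestion concrete, here is a rough sketch of what “domain pattern matches” might look like. No such tool is described anywhere in this thread, so everything here is hypothetical: the `split_backlinks` function, the shell-style wildcard syntax and the example domains are invented purely to illustrate the idea.

```python
from fnmatch import fnmatchcase
from urllib.parse import urlsplit

def split_backlinks(backlinks, discount_patterns):
    """Split backlink URLs into (kept, discounted) by matching each
    link's host name against shell-style wildcard patterns."""
    kept, discounted = [], []
    for url in backlinks:
        host = urlsplit(url).hostname or ""  # hostname is already lowercased
        if any(fnmatchcase(host, pattern) for pattern in discount_patterns):
            discounted.append(url)
        else:
            kept.append(url)
    return kept, discounted

# Made-up example: a webmaster disowns links from scraper domains.
backlinks = [
    "http://scraper-123.example/page",
    "http://jewelry-blog.example/review",
    "http://ads.junk.example/x",
]
patterns = ["scraper-*.example", "*.junk.example"]

kept, discounted = split_backlinks(backlinks, patterns)
print(kept)
print(discounted)
```

The aggregate info mentioned above would then simply be a tally of which patterns webmasters submit most often.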
Doug. You are wrong about people not having read Matt’s whole post. It’s true that smaller parts are being focussed on, but that doesn’t mean that we haven’t read it all. A smaller part:-
and more about the same site…
Google knows that the site exists, and they know that there are more “fine” pages that they haven’t indexed. They don’t need to be told to index more so that the engine is more comprehensive. They should try to make the index as comprehensive as possible.
Matt’s best guess is that it is a low priority crawl/index site, and that they are intentionally leaving some of the site’s pages out of the index, just because it hasn’t attracted enough natural IBLs. That’s no way to run a decent search engine. It is grossly unfair to link-poor sites, and it short-changes its users.
Now if you can think of a good reason why some of that site’s pages should be left out of the index, just because it has only attracted 6 natural IBLs, then tell us. We’re all ears.
You are whitehat incarnate. Do you think that webmasters should have to do unnatural link-building, just so that a search engine will treat it the same as other sites? Do you think it’s a good idea for Google to tell webmasters that their sites can’t be fully indexed unless they make the effort to do things that they (and you) have always talked against – doing things solely because search engines exist?
A general purpose search engine should try to index all of the Web’s decent content as far as they are able. It should never come down to leaving stuff out just because it hasn’t had enough votes. If it’s there, and if it’s useful, index the bloody stuff.
I guess webmasters still don’t understand what a link farm is because they keep asking if reciprocal linking is great!
Some websites run reciprocal linking pages, you have seen them, “cut and paste this code into your pages” and we give you a listing and many directories do the same thing for you to gain a listing in their link farm database!
These networks are simple for the SE’s to bust, webmasters must figure out that all this exchanging of links and getting a million links from anywhere and rank high stuff is old and worn out!
I can see this is hard to accept because many have conducted “SEO” this way for years and don’t know any other way!
My first post was long enough, so I didn’t address it, but will now since Doug brought it up:
“The sites that fit “no pages in Bigdaddy” criteria were sites where our algorithms had very low trust in the inlinks or the outlinks of that site”
Inlinks?
So now my position in organic results, or the number of pages Google chooses to index from my site, can be affected by the sites that link to me?
I hope I’m interpreting that wrong….
Hi Phil, For the record, my post was not aimed at you. I actually did not fully read your last post until now. I’ve just seen many in here that really don’t get the overall picture about what Google is trying to say.
First off: the overall structure/architecture of a site has lots to do with ‘crawling’ in general, and by the way, has lots to do with how Google views your site as a whole… quality wise. Quality for pagerank… “INTERNAL” Google PR, and quality of other sites in your network of links in and out. AND: quality of the programming involved with the site. It’s the overall picture. Robots are not getting dumber, they are getting smarter.
I think many simply believe that someone can go into an existing site and change some code here and there… and presto, the site is doing good. I also think many still think that this stuff is mainly about links coming in and going out.
None of that can be further from the actual real world.
[quote]You are whitehat incarnate. Do you think that webmasters should have to do unnatural link-building, just so that a search engine will treat it the same as other sites? Do you think it’s a good idea for Google to tell webmasters that their sites can’t be fully indexed unless they make the effort to do things that they (and you) have always talked against – doing things solely because search engines exist?[/quote]
I can’t speak for anyone else, but my firm “stopped” looking for reciprocal links about 2 1/2 years ago. Matter of fact, we deleted all the links-only ‘link pages’ that clients had. We don’t ever plan on “pursuing” links in any way, shape, or form. It seems to work just fine. And no; some are in competitive markets as well.
IMO, the best-built websites that will do well into the future are those sites built “strictly” for their visitors. Period. That’s the philosophy we have had for a long time now. If built that way, the robots will like the sites as well. At least they have up to this point in time.
Matt, it’s an old topic but I have a question related to it.
Is it possible that if I put adsense on, say, page1.html, and it links to page2.html, that could cause Google to fetch and cache page2.html? Even though there are no links coming to page1.html or page2.html from anywhere else on the web, and it hasn’t been submitted to google?
If so, this is a potential problem.
Example: I put up a domain a while ago that just had 4 words on it… while I developed the site under a subdirectory.
Anyway, one of these subdirectory pages had adsense on it while under development (testing placement etc.) and happened to have its links pointing to the main URL.
What I noticed shortly after was that the main URL was now cached and in google’s index (while the page with the adsense wasn’t).
I’m trying to think of another reason, but the main URL had no in-links at the time from anywhere (at least none showing in msn, google or yahoo, and no visitors from anywhere)
Now the site is live, and I can’t get googlebot to revisit it. It’s been a month since its first uninvited visit, and since it saw just a “coming soon” type message, I can’t blame it for not coming back.
Could this be an unintended consequence of the adsense caching thing?
From google sitemaps
“Some of your pages are partially indexed”
Explanation from google sitemaps
“We are always working to increase the number of fully indexed pages in our index. While we cannot guarantee that pages in our search results will always be fully indexed, crawler-friendly pages have a greater chance of being fully indexed. Our crawlers are best able to find a site when many other high-quality sites link to it.”
So what I need to do is create links or I won’t be indexed, and create my pages for crawlers……..Zzzz
Hi Matt,
I read your comments in a different way to most of the negative posting.
It seems to me that Google have realised that they cannot index every page on the web every day and simply have to prioritize.
Therefore sites with lots of good themed inbound and outbound links are “prioritized” in the crawl cycle in a similar way as they are “prioritized” in the SERPs.
Sites with a high percentage of non-related reciprocals or spammy links are not given the priority treatment and therefore get crawled “Less Often” (rather than not at all).
The result being that internal pages of these sites, especially the deeper pages of very large sites of this type, may drop out of the index from time to time resulting in poor SERPs for some pages of these sites.
May seem unfair to some but if you think about it a themed natural link (or reciprocal in some themed cases) can be taken as a vote for the site by the WWW and as such makes it worth crawling by Search Engines more than a site without many or any votes.
The question is whether Google should concentrate on delivering high quality “popular” sites in the top ten, where most of the public click, or concentrate on indexing every page on every site even if the page is unlikely to be returned in the top 50 results.
My take is the first option with the top ten results including the top ten most popular sites calculated by:
Natural inbound and outbound themed Links (votes)
Click Through Rate & time spent at site (popularity)
Clear Relevant Title & Meta Tags (keyword friendly)
Original useful updated content (quality)
Some themed Reciprocal Links (community)
If your site passes the above tests you really should be OK.
Good luck all.
Matt,
Snow Crash by Neal Stephenson is one of my favourite cyber-punk books, you could also try Mr Nice by Howard Marks.
Enjoy your vacation!
>>Linking to a free ringtones site, an SEO contest, and an Omega 3 fish oil site? I think I’ve found your problem. I’d think about the quality of your links if you’d prefer to have more pages crawled.
In regards to this whole revelation about crawling priorities and all that jazz, I have one question that should clear up a lot of things for all of us webmasters and SEO folks.
Let’s say we have a site that is under a year old. It has well written and informative original content, clean coding, no shady stuff going on whatsoever. To promote a healthy link exchange, the webmaster/SEO installs a link exchange directory which is accessible from all pages of the site. Now, is having a link exchange directory that contains many different business categories (no porn, pills, warez, casino and hopefully no “made for adsense” sites) a bad thing?
This is important because many MANY sites have this kind of a setup. And with the amount of free dynamic scripts out there that enable and automate the link exchange process, are they now considered to be tools of damnation in Google’s eyes? Please give a good example of what is good in this scenario and what is bad about this scenario.
I think a lot of us are at the breaking point with Google, and that can only spell trouble in the long run for everybody.
Hi Matt, I have a decent sized forum on my site with about 221,000 posts in 8,000 threads.
I recently moved my forum from domain.com/forum to forums.domain.com, and I now have only 6 pages listed in google. I am guessing this will change, but I have lost some of my domain.com listings that were unrelated to the /forum directory.
My concern is, I want to use some kind of redirect to send visitors to the appropriate link – what should I use? At the moment I am using a php redirect in /forum/index.php and /forum/showthread.php to redirect to the appropriate link.
And also, could this move and redirect be affecting my other results on the top level pages?
Thanks for your help!
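For a move like this one, the usual approach is a permanent (301) redirect from each old URL to its new counterpart, so that both visitors and crawlers are told the page has moved for good. As far as I know, a bare PHP `header('Location: ...')` call answers with a 302 unless a status code is set explicitly, so that is worth checking. The sketch below shows only the URL mapping, written in Python for illustration; `OLD_HOST`, `NEW_HOST`, `OLD_PREFIX` and `permanent_redirect` are hypothetical names built from the placeholder domains in the comment above, not anyone's actual setup.

```python
from urllib.parse import urlsplit, urlunsplit

# Placeholder values taken from the comment; substitute the real hosts.
OLD_HOST = "domain.com"
NEW_HOST = "forums.domain.com"
OLD_PREFIX = "/forum"


def permanent_redirect(url):
    """Return (301, new_url) for an old /forum URL, or None if no redirect applies."""
    parts = urlsplit(url)
    host = parts.netloc.lower()
    if host not in (OLD_HOST, "www." + OLD_HOST):
        return None  # not the old site at all

    path = parts.path
    # Match "/forum" and "/forum/...", but not e.g. "/forums-old".
    if path != OLD_PREFIX and not path.startswith(OLD_PREFIX + "/"):
        return None

    # Drop the "/forum" prefix; keep the rest of the path and the query string
    # so each old thread URL lands on the matching new thread URL.
    new_path = path[len(OLD_PREFIX):] or "/"
    return 301, urlunsplit((parts.scheme, NEW_HOST, new_path, parts.query, ""))
```

Mapping every old thread to its own new URL, rather than sending everything to the new forum home page, keeps old links and bookmarks pointing at the right thread.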
Um, so that is a great post… but like many others I am seeing strange things happening with one of my sites, where a 301 seems to be producing all sorts of strange results. Our site travelscotland.co.uk now seems to be registering on Google as http://www.scotland.net and strange variants of that domain such as http://www.facts.scotland.net, even though these have all been correctly 301ed to travelscotland as they are supposed to be. I thought Big Daddy had cured this. I now notice that this problem seems to have resulted in the site not being indexed much anymore – caches are all from April, where we used to be always up to date in the old days. Is this another artifact of the Big Daddy changes?
Hello Matt, this morning I sent a mail to bostonpubcon2006 at gmail; maybe resolving this issue will help cut down the noise in your comments.
Hi Matt, Hi JohnScott. Hi everybody.
I’m the webmaster, the person who did this.. http://www.mattcutts.com/images/poor-quality-links.png
Matt, first, thanks for not mentioning the url, my name or email – it’s a really, really stupid step I took to get my client’s site indexed in Google again.
In short:
the website is about 6 months old, with around 30 uniques a day. I was building “natural” links for 2-3 months, few pr5,6 sites, but mostly 1-2-3.
Right now it has 520 pages indexed, Matt, you can check them;)
And again 30 uniques a day
2 months back – the website had (almost) each page indexed – around 1000.
There were NO LINKS IN THE BOTTOM. Not a single link ! Only internal links.
And the pages went down to one. Three in the best days. For a month.
My clients were not satisfied – they asked me to fix this. So what did I do?
Pyramid Linkings – 3 urls in the index. 5 DP co-ops. Reciprocals from directories – the clients were not going to pay me for this, so I just spent exactly one hour to set up pyramidlinkings, co-ops, stupid directory listings.
JohnScott, I AM REALLY REALLY SORRY for this link (v7 contest)! It IS A RECIPROCAL LINK FROM A DIRECTORY! I absolutely did not look at the anchor text. SORRY AGAIN
When I brought back the “new website”, with these links in the bottom, I got nothing. Google started bringing back my webpages.. at around 10 to 30 per day. One day I saw they were back to 100 or something, then I sent the mail to Matt Cutts. The other day the index was good – 400 or something; from this day, the date Matt pointed out, I was getting +10 pages a day. With these links from the image. An hour back I removed every single outgoing link; I left only the internals.
My hands are shaking.
I wont be able to sleep…
Matt, I did not write everything above in the right direction – I myself don’t think this is the reason for NOT being indexed in Google.
The situation is not the way you described. The reason is something else; I know my website, I know what I did with it. I can prove that these links are not affecting my dropping and rising page counts…
I will not post anything else here if Matt doesn’t want me to.
JohnScott, you are a great person, sorry again for mentioning your contest in this bad topic
/sorry for my broken English – I’m from an Eastern European country/
Matt,
Topic Specific links
I’m a believer in topic specific links. Is this post saying that off-topic links – whether inbound or outbound – will incur a penalty?
Isn’t relevance more important than whether there is a wrong link on your site?
Sorry, this is bullshit. I only want to find what I’m looking for. And whether there’s a wrong link on the page (which could be relevant to the human reader) doesn’t interest me.
Can we get an Amen and a Hallelujah for this?
TES-ti-fyyyyyyyyyyy, mah brutha!
Come on, everyone, throw your hands up for the GOOD Word.
If I have to look at one more Amazon listing at #1 and #2 after being bumped to #3, I’m going to run screaming into the night.
Just because it’s dAmnazon doesn’t mean EVERY page of their site is more relevant than everything else on the planet…
Hi Doug. I didn’t take your post as being aimed at me personally
Unlike you, I’ve never done reciprocals, so none of my sites, or any site that I’ve ever worked on, will ever suffer because of that method. I haven’t even been talking about any sites that I’ve had anything to do with, although I used one as an example, because it specifically has a clean bill of health, and it’s frustrating watching it die – presumably because I didn’t do any link-building, so it doesn’t have enough IBLs.
This isn’t about a site “doing good”, Doug, or about rankings (you mentioned that in your previous post). This is about a site being treated fairly – just because it’s there. If a site contains useful pages and resources, then it should be fully indexed, regardless of how many votes it’s managed to get. That’s what search engines are supposed to do. That’s what a search engine’s users expect it to do for them – they (we) expect a good search engine to give us the opportunity to find as many useful pages and resources as it can. They (we) do not expect the engine to intentionally leave out useful pages and resources.
Doug. Can you come up with a good reason why that example site (the one I mentioned in my previous post) should not have all of its pages indexed?
Thank you for the post; it was the most information I’ve seen on the situation thus far.
Unfortunately, it was also rather disheartening as regards inbound links. Some sites just do not naturally generate all that many valid inbound links – and I’m not talking about small “mom & pop” sites, either. My two largest sites are B2B catalog shopping sites, not really small as each has in excess of 2000 products online. Both have been in operation for years, and both were hit very hard in the past few months by pages dropping out of the index. I would stake my year’s salary that there aren’t any spam issues with either site (both of which did QUITE well up until BD started rolling out) – but (at least according to Google’s light) there just aren’t many links out there to either site – the only places that WOULD link to them would be spammy b2b directory sites, for the most part. They are undoubtedly in a lot of people’s Favorites folders, because they have very high rates of return customers, but they will never ever collect any quantity of relevant natural links. I mean, think about it. Would walmart.com need a bunch of backlinks in order to rank highly, and why would anyone put a link to walmart.com on their site in the first place?
So now I have to explain all this to my clients, who aren’t going to understand anything except that they’re pretty much out of Google for the foreseeable future, and there’s nothing legitimate we can do to get back in.
Like I said, disheartening.
Matt,
Sounds like Google is now actually penalizing for poor quality inbound links. Does that mean that a malicious competitor could link our site to FFAs, link farms and other bad neighborhoods and actually hurt our rankings or get our crawl cycles reduced?
Also sounds like Google doesn’t like affiliate links. But if AdSense is okay then why are affiliate links bad? Seems that coupon sites, shopping comparison sites and reviews are in jeopardy of being dinged now.
I have a PR7 homepage (www.shopping-bargains.com) but no rankings and my homepage isn’t even in the index (I just checked and we have only 10 pages there -- was 8 yesterday). We used to have thousands of pages indexed. Something seems odd -- we have no reciprocal links, don’t sell links, don’t buy links in mass (occasionally buy a banner or link for marketing reasons in newsletter or blog, etc.). We have original content and have been online for 7 years. The index is fluctuating wildly though for us.
I too agree with PhilC!
And, all I know is that as one who has tried to actually use Google to buy some rare plants, the results are way bad!
Trust me that when you do a search for an item to buy and all you can find are the paid for listings and stupid content sites, the value sure ain’t what you were looking for!
I have even put “buy” and “for sale” in the searches and all I get are stupid content sites. Some great value that is! I want more choices than the stupid paid listings. Thank you just the same!
All I have found are stupid sites describing and telling the history of the plants. What I wanted was the small plant nursery in New York that sells the plant. I was ready to buy! That is why I typed “buy” in the queries. Too bad and so sad that they didn’t attract enough natural links! No results for me!
Google results suck for shoppers of rare plants, at least!
Matt,
About indexing. Why is the site: command for some sites still showing more pages than the site actually has? I know of sites that have maybe 12.000 pages and Google is happily claiming they have 35.000 pages indexed.
[quote]Sounds like Google is now actually penalizing for poor quality inbound links. Does that mean that a malicious competitor could link our site to FFAs, link farms and other bad neighborhoods and actually hurt our rankings or get our crawl cycles reduced?[/quote]
Mike, IMHO, Google is not actually penalizing a site for poor quality IBLs. What they do is just discount those links in a much more drastic way than before. What is more, they now regard reciprocal links as very poor quality links.
As you have fewer IBLs, your PR drops and your site is crawled / indexed less than before.
I think you guys might be missing one thing here. I don’t think what happened is a permanent thing. Your pages have dropped out because of crap links not counting, or your quality links’ own crap links not counting. It is all about reputation. If your pages are gone now it is because you lost your reputation. This doesn’t mean your pages will not get indexed. This just means your pages are pushed back to a waiting, almost sandbox-like state where it will take time for them to be indexed again. Quality natural links just help like they always did. Now it is just harder to fake.
Maybe Matt can tell us if that is true… Will the pages eventually get indexed even if they don’t have the reputation from other quality sites? Is it a matter of time and age if there are no links?
Adam Senour… what do you think about webmasters who submit their sites to directories?
These types of links aren’t exactly natural, and often aren’t relevant. Are these people trying to manipulate the search engines?
OK, Matt, I just don’t get it: how on earth does a site do this sort of linking
http://hyak.com/links/links_computers_internet.htm
and not get penalized?
They have thousands of links like this, totally irrelevant to the content of the site.
It is not a penalty.. it is a matter of what counts and what doesn’t, PERIOD.
What feels like a penalty is just the fact that you lost reputation because your links, or your links’ own links, are no longer valid. Your competition cannot hurt you by placing bad links in your site’s name. That might actually help, with whatever small amount of traffic you get from it. Other than that, it will not hurt; it will just not help.
Matt,
Thank you for all the information that you furnished to us and the examples that you showed. It was a very welcomed feedback.
The information that you furnished surely helped everyone understand what is happening and why it is happening.
Personally I hope that Google continues to work on presenting quality sites that assist a visitor with helpful information in the high SERPs. People have to remember that Google sets guidelines and those who don’t follow the guidelines will suffer. We have to follow the “rules” in order to win the high SERPs.
Again thanks for your post.
Now enjoy your vacation and the time with your family.
Exactly what “rules” did the Health Care directory site break? (that’s one of the examples that Matt gave)
Hi Matt, thanks for the post, I agree it was good to hear some real estate examples used.
I have a question regarding overuse of reciprocal links possibly causing a lack of crawling. I run a network of real estate sites, one site for each different country/region we offer (9 in total). Each site links to each of the other sites for obvious reasons.
My question is: would this interlinking of 9 different sites to one another, from every page on each site, be regarded as spammy by Google’s bots? Would this have a negative effect on my sites?
If anyone else can offer any advice I’d be very grateful. Thanks.
PhilC, again do you feel that the pages even in time will not get indexed without quality links? If a site has something indexed, don’t you think in time the rest will get indexed? I think so, but how long is the real question.
Matt, could link selling affect a site’s ranking in the SERPs, directly or indirectly, despite the site’s not “losing” any pages in Google’s index?
I’m seeing a client that sells some (on topic) text links on some of his pages suffer in the SERPs since a couple of days ago, yet that client hasn’t lost any pages whatsoever in the index – the only thing I’m seeing is that all pages that previously ranked well are now not ranking well (yet can still be found)… Could there nevertheless be any connection?
Mike, it might have something to do with the many many links you have from spam pages like this one: http://www.creativehomemaking.com/articles/112603g.shtml
and the fact that all of these spam pages participate in the same “web rings” with the links on the bottom.
Hi,
Matt,
If the sites you described above as having poor links had rel=nofollow on their outward links, would it improve the number of pages indexed?
thanks
Phil, my site is a mom and pop site, I don’t have a lot of quality incoming links (I have a ton of links from scraper sites), and all my pages are indexed. Like you I have never done a link exchange and don’t plan to. So the idea that you have to go out and build links in an unnatural way is not the case for every site.
Most of my pages can be reached in 3 clicks, a few in 4 clicks, from the home page and/or the common navigation that is on every page. I don’t use Google’s site map. I do have a pretty good html site map, so I’m wondering if navigation may be part of the problem for some sites losing pages or not getting them included in the index.
I see a big difference in the number of pages when checking with the API and actually checking at Google.
I do have a problem that has surfaced in the last couple of weeks. For me Google is having problems with 301 redirects again. At least for me.
site:domain.com 500 plus pages
site:www.domain.com plus pages
site:www.domain.com -www.domain.com 300 plus pages which are supplemental so maybe they are on their way out again.
Up until a few weeks ago site:www.domain.com -www.domain.com showed 0 pages. According to the API domain.com is showing PR again for domain.com.
hmm, so things are becoming more and more difficult each day and i am now of the view that people will need to understand the real importance of Good Content updated frequently and having good links only.
No short cuts anymore
Connie, is your site ranking for smaller terms only? I bet your niche is not a very competitive niche and you are ranking for obscure terms rather than large general terms.
I too am associated with a niche site not doing any link building that has not been affected by Big Daddy, but this site is ranking for small stuff.
[quote]Can you come up with a good reason why that example site (the one I mentioned in my previous post) should not have all of its pages indexed?[/quote]
The one in frames? The one using javascript links and no hrefs? The one that looks like an MFA site? Is that the one?
It’s worth remembering that not every site can ever appear at the top – seems obvious, but Google has to place results in some kind of order, and I have no quarrel with newer sites not being featured as fully as mature sites; if they develop, their turn in the sun will come.
If the result is a cleaner, more spam-free search result, then I doubt many users will be complaining. And I suspect Google has not forgotten the needs and preferences of searchers, as we consider our sites, our client sites and the spam sites that get in the way.
This has been a very useful thread – but has it really contained any surprises? I don’t think so – Google has long warned of the bad practices mentioned above; some people just never believed they could put their money where their mouth is. Kudos to Matt and the teams for significant progress on reciprocal and paid-for links.
I’ve spent a lot of time today simply looking through the serps and reading posts here, there and everywhere.
Yesterday we saw a massive change in the serps and now my target market results are full of nothing but spam, cloaked pages and general rubbish.
Now, I run a number of affiliate based sites, but I build my own pages with my own content, and as of yesterday I basically don’t get found. Has Google scanned my site, realised that my visitors are joining an affiliate program and therefore penalised me for that?
I can appreciate penalising sites that buy a domain name and then basically copy their affiliate program’s text etc. and throw the site up for themselves with nothing original to offer at all, but just because I promote affiliate dating, does that mean that the 2 years of work that has gone into the site is simply ignored?
Matt, I’d be interested in any feedback, and the url is available if you get the time to respond.
Connie. It may be that a very large number of sites haven’t suffered – yet. But I can’t see that what’s happening is to do with the navigation. For one thing, Matt used examples where he said that, if they get more IBLs, the new system will know to index more pages (as if they don’t already know). For another, the many sites that are having their pages dropped had their pages indexed – so why drop them if they are already in the index?
We have always known that PR affects the frequency and depth of crawling, so crawling was never equal amongst sites. But now they have added links to the criteria, and if a site doesn’t do well on links score (e.g. not enough IBLs that can be trusted), it just doesn’t get crawled as it deserves, and its owner is forced to either accept it, or to get spammy and do some link building.
I was tempted to suggest that it still may be down to just PR, because IBLs bestow PR, but my site that I’ve used as an example currently shows PR5 on the homepage, which has always been enough for decent crawling. Even so, the toolbar PR is always out of date, or they may have simply moved the scale up a bit. But Matt said that the crawling/indexing system is new, so I’m sure that it isn’t still just PR, and that IBLs, and maybe OBLs, are significant factors.
The example site that Matt gave – the one that I used in a post – is a directory, and, as a directory, it probably needs to be drilled down – good pages that are plenty of steps away from the homepage. If the number of steps could have been the cause, then I’m sure that Matt would have said so, instead of simply saying get more IBLs and we’ll crawl more of the site’s pages.
Hi Matt,
some URLs lose pages in the index but this site is still growing :
site:69.41.173.145 --> Do you think your duplicate content algo is ok ?
Greetings from Germany,
Tobias
Hi Phil, don’t know. If you post your site in question, maybe “ihelpyou” with it.
You know there is no way a general answer to an unseen website with problems is a good thing. I’ll put it this way; I really doubt your problem with your site has anything to do with “links” in or out. The entire backend code and html code output might need to be redone.
In and of itself, I don’t have a problem with the concept. The problem lies in the quality of the directory, whether it offers free one-way inclusion to sites that deserve it, and how many of said sites they submit.
Submitting to a directory, particularly to one with a captcha tool or similar device, is not automatic, nor is the approval (depending on the quality of the directory again). I don’t really see it as “unnatural” either, since the basic premise of these sites is to act as informational portals. Submitting to a directory provides them with the content that they need to build their own site, and gives the webmaster a link for traffic generation purposes (notice how I didn’t use the phrase SEO purposes).
A good example would be Human Edited Directory (yeah, I’m a mod there, so I’m slightly biased…although I’d say the same thing if I wasn’t). You don’t get onto that directory unless you damn well earned it, although you can submit for free.
What’s “unnatural” to some about submitting to directories is that you have to go to some stranger’s site, find a relevant category, and ask for a link in that category.
So no, I don’t have a problem with it, as long as the directory has some quality standards in place and the link provided is a relevant category backlink.
To borrow from something Phil has stated in the past, it’s quite often not the concept, but how people choose to abuse it.
Hi Matt,
I have a quick question for you regarding the supplemental index. When I search for a manufacturer part QUA41965 in Google it starts returning supplemental results on the first page (4 out of ten). Each additional page is primarily in the supplemental index. When you drill down to page four you can still find results that are not in the supplemental index though. Should pages that are not in the supplemental index not be returned before those that are? It seems to go against what is considered supplemental, IMO.
Thanks.
hehehe
Buy some Adwords folks. Free traffic is now only free for the multinationals, spammers and those with special deals with Google.
All the others buy some AdWords please…contribute to the great cause…
Is SPAM any attempt to deceive the SEs to artificially increase rankings?
What if I have a nice W3C validated site with some 16.000 clothing products from various vendors, nicely categorized, with an updated datafeed, some coupons, and no spamming techniques, no hidden text, no cloaking, nada, zip, zero.
That, in the definition of SPAM… is not SPAM.
So why am I reduced to one indexed homepage in Google?
Isn’t the ability to search for the same class of products from various vendors at the same time, and compare prices, service enough for the mighty Google?
Hi Matt,
Great post, and thanks for following up the comments, makes for a happy community
Like most people here, I have a number of otherwise good sites with 5 or so rubbish footer links on each page. I was never happy about putting them there, but only did it because it does work, and there is no point writing original content if nobody will read it.
Are you saying that these links are now completely worthless? I’m all for “best practice” and following the guidelines, but I’m reluctant to stop this kind of linking if it still works for other people.
Don’t get me wrong, I’m very keen to see the death of unrelated footer links, “resource” directories and begging emails – but if Google still rewards these practices with good rankings then people will continue to use them.
Thanks for the responses so far.
Harvey.
In reference to supplementals, I am sorry but the issue has not been dealt with here, as doing a site: check on a number of both small and large sites, it is difficult to find a site that does not have any supplemental issue.
This must mean that Google has an imbalance in the settings of the algos, OR that they deem virtually no sites are worthy of the merit of having a clean bill of health.
I can check the same sites, which aren’t even mine, on a daily basis, and they delve deeper and deeper into supplementary hell.
Why would webmasters have coined the phrase ‘supplementary hell’ if there was no issue?
Thanks for trying to appease webmasters Matt, and we don’t blame you. It’s just that we feel it’s about time things improved.
One thing I want to be clear about is that Bigdaddy isn’t especially intended to do differently on spam; it’s just an infrastructure upgrade to our crawling. As we get better at judging link quality, our crawl changes as a natural consequence of that.
The other thing is that I certainly don’t want to imply that everyone who is still seeing fewer pages crawled was somehow getting spam or lower-quality links. I just wrote up the five cases that I analyzed in more depth. With a large change in our crawling infrastructure, it is to be expected that some sites will see more or less crawling.
In fact, I just got out of an hour-long joint meeting with crawl/index. Jim, we talked about your site, the one where you said “I’m trying to maintain a straight ship in a dirty segment.” There are absolutely no penalties at all on your site; it’s just a matter of PageRank in your case. You’ve got good links right now, and several hundred pages crawled, but a few more good links like you’ve got now would help some more.
what does this command do? site:www.domain.com -www.domain.com
I have to admit, from reading the post and the ClickZ column, that what is happening in practice is that sites with less money and marketing spin behind them are regarded as less important, and are therefore pretty much to be ignored. It’s no longer about document relevancy, as much as site popularity.
Perhaps one day Google will go a step further, and simply take the top 1000 sites according to Alexa, and return only results from them?
Let me also describe a little bit of the interaction between the main results and the supplemental results. Think of the supplemental results as a large set of results that are there in case we don’t find enough main results. That means that if you get fewer documents crawled/indexed in our main results, you’ll often see more supplemental results.
So I wouldn’t think of “having supplemental pages” as a cause of anything. It’s much more of an effect. For example, if you don’t have as much PageRank relative to other sites, you may see fewer pages crawled/indexed in our main results; that would often be visible by having more supplemental results listed.
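One rough way to picture that interaction is the toy sketch below. It is only an illustration of the "fallback" idea as described in the two paragraphs above: the function name, the `wanted` threshold, and the sample URLs are all made up, and nothing here claims to reflect how Google actually serves results.

```python
def serve_results(main_hits, supplemental_hits, wanted=10):
    """Return up to `wanted` results, preferring main results.

    Hypothetical illustration: supplemental results only fill the
    slots that the main results leave empty.
    """
    results = [(url, "main") for url in main_hits[:wanted]]
    remaining = wanted - len(results)
    if remaining > 0:
        # Not enough main results: fall back to the supplemental set.
        results += [(url, "supplemental") for url in supplemental_hits[:remaining]]
    return results


# A site with few pages in the main results exposes more supplemental ones.
few_main = serve_results(["/home"], ["/old-a", "/old-b", "/old-c"], wanted=3)
many_main = serve_results(["/home", "/a", "/b"], ["/old-a"], wanted=3)
```

In this sketch the supplemental entries show up purely as an effect of having fewer main results, which is the point being made above: they are a symptom, not a cause.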
That’s a post Cutts! hehe. As you’ve stated that poor choices in outbound links can cause crawls/indexing to be negatively affected, I’m wondering if the opposite can be said of linking to high quality (trusted) relevant links? What say you Inigo?
BTW, great show yesterday. Hope you can do similar more often!
shorty, much appreciated. I wanted to get the timeline out of my brain and talk about what I was seeing before I headed out for some time off.
Alex Duffield, in my experience those links aren’t making much/any difference with Google.
Peter, without knowing the site I couldn’t be sure. It’s possible that we’ve indexed the site with www and without www, or there might be some session IDs or other parameters that are redundant.
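To make the www/non-www and session-ID point concrete, here is a minimal sketch of how several URLs can collapse to one page once redundant parts are ignored. The parameter names in `SESSION_PARAMS` and the function name `canonical_key` are assumptions chosen for illustration; this is not Google's canonicalization logic, just one simple way a webmaster might check their own URLs for duplicates.

```python
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode

# Hypothetical list of query parameters treated as session IDs.
SESSION_PARAMS = {"sid", "sessionid", "phpsessid"}


def canonical_key(url):
    """Reduce a URL to a key that ignores 'www.' and session parameters."""
    parts = urlsplit(url)
    host = parts.netloc.lower()
    if host.startswith("www."):
        host = host[4:]
    # Keep only query parameters that are not session identifiers.
    query = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key.lower() not in SESSION_PARAMS
    ]
    path = parts.path or "/"
    return urlunsplit((parts.scheme.lower(), host, path, urlencode(query), ""))


urls = [
    "http://www.domain.com/page?x=1",
    "http://domain.com/page?x=1&sid=abc123",
    "http://domain.com/page?x=1",
]
distinct_pages = {canonical_key(u) for u in urls}
```

Here the three URLs reduce to a single key, which is the kind of redundancy that could show up as extra entries in a site: query.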
“Sounds like Google is now actually penalizing for poor quality inbound links.” Mike, that isn’t what’s happening in the examples that I mentioned. It’s just that those links aren’t helping the site.
David Burdon, no, off-topic links wouldn’t cause a penalty by themselves. Now if the off-topic links are spammy, that could cause a problem. But if a hardware company links to a software package, that’s often a good link even though some people might think of the link as off-topic.
nsusa, Wordpress seems to have problems with the greater-than sign.
Peter Harrison, thanks for the book recommendation! I love early Neal Stephenson (less so his historical fiction).
“The other thing is that I certainly don’t want to imply that everyone who is still seeing less pages crawled was somehow getting spam or lower-quality links. I just wrote up the five cases that I analyzed in more depth. As a large change in our crawling infrastructure, it is to be expected that some sites will see more or less crawling.”
Kind of enlightening that this MAY not be the case for us and others. Still, it hurts having the whole site (well, the better part of our sites) being “avoided” in the index and not knowing why this is happening after 5 years of business.
The sad thing is that the only thing we really have to go on is sharing experiences, and this isn’t getting many of us very far: just that there is some sort of problem and we can’t find a correction.
Matt, what about sites that have some pages indexed.. with no links to the site or very few, will the entire site ever get indexed? Is it a matter of time or do you have to get more links to get pages indexed deeper?
nuevojefe, thanks! It felt pretty business-like and on-topic. After the mike turned off, I took Danny up to an office and we just chatted for a couple more hours. It’s amazing to me just how much fun some of the top people in search are.
To go to your other question: I wouldn’t be thinking in terms of “if I link to Yahoo/Google/ODP/whatever, I’ll get some cred because those sites are good.” If it’s natural and good to link to a particular site because it would help your users, I’d do it. But I wouldn’t expect to get a lot of benefit from linking to a bunch of high-PageRank sites.
Peter Harrison, I’m going to go buy some books right now; you’ve inspired me.
But I still don’t see an explanation for pages not showing up on the regular or supplemental index that have been crawled and that are over a month old.
Matt, why is it bad for a real estate site to link to a mortgage site? They seem to go hand in hand. Obviously I couldn’t follow the link to check the site to see if it was just a scum sucking scraper site, but your statement seemed to overgeneralize. If your bot does the same, then Google has a problem.
I am also perplexed by the reciprocal linking issue. Is it now always a bad thing? Is the relevancy of the topic a compensating factor? While it may be gamed, it has also become a powerful networking tool for many. In my personal services sector where referred business is an integral part of the business model, I have received referred business from those I met via reciprocal links that amounts to close to $50k in income in the last 10 days alone. How is this bad?
I am also confused by the apparent contradiction regarding links, PR, crawling and indexing. Sounds like a chicken and the egg scenario. It’s implied not to buy links or reciprocate, but if that advice is followed, then Google won’t crawl it or index it, so how is anyone to find it to be so overwhelmed as to be compelled to graciously link to it?
OMG, did I just agree with Jill and PhilC on the same issue in the same sentence?
This has also been part of the problem Matt. The supplemental results have been unsearchable. They have not been being returned when you don’t find enough main results.
Dave
What’s Google’s definition of ‘Find web pages from the site domain.com’? If you click those links, you sometimes only get the index page. Even supplemental results should show up when you make that search.
Wow Matt
You got some cojones and came out and said the CEO was wrong about the machines being full??
Everything else is old hat SEO that amounts to “Webmaster Quality Guidelines” being followed.
Obviously you could have saved some carpal tunnel just telling people what I and others have been saying: reciprocal links have zero value other than to hurt you. Follow Google’s webmaster guidelines and you’ll be fine.
Clint
Matt, could you please check on this for me with the index team? I am sorry, but according to my logs I am getting a lot of traffic from this; we shouldn’t be ranking for fistinglessons for a home listing details page. The house listing has been removed from that, but it’s the kind of traffic I do not wish to have.
If you decide to dig around in our site, your thoughts on whether we are abiding by what Google likes to see would be nice. (I know I asked for it, so what I get I won’t hold against you.)
We want to stay 100% Google compliant but as I have said before we are small fish in a big pond so we make mistakes like everyone else.
Saying you can’t do reciprocal linking is just sheer idiocy. How does Google expect you to get back links?
Guys can you please stop asking silly questions…
The message is crystal clear…use AdWords…
I didn’t ask about my site, Doug. I asked you if you could come up with a good reason why the health care directory site that Matt used as an example shouldn’t have all of its pages indexed. Perhaps you should have read *all* of Matt’s post.
There isn’t a good reason. Matt’s best judgement is that it’s a shortage of IBLs. Simple as that. The site had had its pages indexed, but with the new BD crawling/indexing its pages have been dropped simply because it doesn’t have enough IBLs. It makes sort of sense at all.
That last sentence should have read…
It makes NO sort of sense at all.
Matt: is there any way to tell Google “index this page, but serve the permalink in the SERP?” This is a problem for my blog … when entries are being served off the main page (http://dossy.org/), searches for keywords in those entries return the main page URL in the Google SERP. However, it seems the index is updated less frequently than entries dropping off my main page, so while Google’s SERP brief text shows the relevant content -- causing a user to click through on the result -- the page they end up on no longer has the content. Eventually, it seems Google’s crawler figures things out and the SERP eventually links to the permalink for the entry … but I’m sure this behavior is frustrating some users.
I tried adding the meta header “noindex” to my main page to prevent it from showing up in SERPs, but then for searches for “dossy”, where the main page SHOULD be #1, it no longer has the #1 spot -- very annoying. So, I’ve removed the meta “noindex” from the page and am waiting for Google to crawl my blog again.
Any advice? Thanks!
Matt,
You have no idea what your two sentence comment has done to lift the spirits of 2 down and out guys in Boston… thanks!
jim
It appears that everyone is getting on board the link train as the problem.
So I did some checks on one of my sites. When I do a link:www.mydomaininquestion.com on Google, I get “Results 1 – 1 of about 45 linking to http://www.mydomaininquestion.com. (1.25 seconds)” in the bar, with only a page from my actual site shown.
However, if I do the same thing on Yahoo I get “Results 1 – 10 of about 671 for link:http://www.mydomaininquestion.com.”
So for some reason 44 of my links that Google knows I have are hidden from view but show up in the count, and perhaps hidden from the indexing algorithm. This seems like something very specific that could be checked out on your end.
Thanks.
Is your site losing pages from the index, John? I just did a link: check on my site (the one I mentioned earlier), and it’s the same. It will only list 16 of about 629 links. For my site, I’d put it down to the constant changes in the index as pages are being dropped wholesale on a daily basis. I’m thinking that the index might be a bit confused concerning my site right now.
PhilC,
I know Matt doesn’t want this to turn into a discussion board, so feel free to delete this post, but to answer your question: yes, we were at a high of 17,000 pages in March, two weeks ago 500, Saturday 140, today 39. I’m not going to check anymore after today, because I know what’s next: NO INFORMATION FOR THAT SITE
Some thoughts:
1) The explanation that Google isn’t fully indexing sites based on the lack of quality/quantity of incoming links and lack of quality of outgoing links sounds like a policy change. Has Google always lacked a commitment to building the most comprehensive index it can and this is just the first time those sentiments have been voiced, or is this something new, perhaps in response to a storage crisis and Google’s inability to keep up with the growth of the web?
2) How finely tuned is this improved link quality filtering? Does it simply look at the percentage of IBLs that are reciprocal, and apply a filter after a certain threshold, or does it attempt to determine the relevancy of those reciprocals before placing a value on them? When evaluating the relevancy of both inbound and outbound links, is this just a quick semantic analysis that would miss the fact that the ironworkers’ endorsement of a pizza joint is certainly a good link? How much good content is Google willing to hurt while trying to prevent having its results manipulated?
3) What I see here is good times ahead for link-building SEOs. Panicky phone call from owner of a marketing site seeing its thousands of pages dropping from the index… Calm explanation of Google’s new approach – show them your blog here Matt… Tell client what will be involved in building a link network on multiple domains across Class C’s with relevant content that Google will perceive as quality IBLs… You could always just spend that money on Adwords and trust that Google won’t bill you for click fraud – show them lawsuit pages…
I hope Google has something better than this coming along quickly, because the future doesn’t look pretty.
Hi Matt C – reworded, jeesh
Many have an odd issue, what is the Big Daddy cause here?
Sites have done well for a very long time, sites have some good natural incoming links from major sources in science magazines and even links from large established online portals.
Since April 26 or so almost all our pages went from page 1 to page 4 across the board. Is this a penalty, why such a drop so fast on all positions on all terms? This is happening to many sites.
Not sure what to make of this, we did submit a re-inclusion request but I have seen no change, why such a huge dump on a site?
Thank you Matt
Matt, I got the feeling, that you are very harsh with the new filters… I just dropped from 3M results to below 300k with my site. That does not seem right, since I am one of the few places, where you can download MP3 files, which you can buy on CD elsewhere. I think G found the duplicate description and tries to filter now. That is not good. Mine are downloads, having the same description of the artists like the tangible goods… And I am gaining incoming links like a maniac, by signing up over 10 digital merchants a day… I get incomings from every place of the net, but I do hope that incomings can not hurt one? Maybe the over 500 scrapers, who live from my RSS feeds are causing that?
PhilC and others
Have you ever considered that with soooooo many web pages out there Google (at this time at least) HAS to limit its crawling and indexing.
Let’s face it, they are (and have been for years) doing a better job than the other BIG 2.
When I search the SERPs to BUY, I often see mom & pop pages ABOVE those of the BIG merchants.
Well it just went critical for me after 10 years on the Internet.
Google has eliminated so much of our site, rankings, and traffic we likely won’t survive. What’s going on is just way too harsh. We play as much by the rules as we can. We only have about 75 links but apparently Google is annihilating our small site. I wish I could take a vacation but I’ll have to worry about putting food on the table.
Phil, I was responding to you thinking you meant the site you were watching.
Okay, …. health care directory?
Without looking; I’d say it’s more about the sites that directory is listing than anything else. Does it require a link back? Is it all paid? Does it exist strictly for adsense? Does it have a real purpose for users on the internet? Being in the market it is in, I’d have to have those questions answered and see the site. I’ll bet big bucks it’s because the quality of sites listed isn’t good. A major search engine has to start drawing the line somewhere. No one could continue to simply index page after page of low quality websites, especially directories.
You say that like it’s a bad thing:)
How many people ACTUALLY search for a directory anyway? Answer, not many. They are so low in demand that Google shifted its DMOZ clone off their main page years ago. Check your log stats, even DMOZ sends next-to-nothing.
Besides, why on earth would/should a SE list a directory page?? It would be a link to more links! There is no longer any need for this 2-step approach as SE’s are so much more advanced than when directories WERE popular.
Actually…that’s not what was said at all. That’s what you chose to read. The key sentence is actually here.
It was dropped because the site owner made a mistake. Not a spammy mistake, and certainly an honest one, but still a mistake.
That’s not BigDaddy.
That’s not Google crawling/not crawling/indexing/not indexing.
That’s not Matt pretending to be God and striking down upon some site that apparently doesn’t deserve it.
That’s a webmaster relaying a message, unintentional as it was, to Google asking for a removal.
So there’s a perfectly good reason for Google to remove it…they were asked to do so.
Matt, thanks for this update, I have to say, this confirms what I’ve been increasingly suspecting about a vast majority of those webmasterworld posters who have been complaining about these specific issues, and it fits exactly with what I saw over a year on another search forum I did for a while.
Especially amusing was the guy who had 10k indexed then dropped to 80. I’ve read him, he comes off as if he’s lily white, and there’s that typical spam garbage.
This isn’t your problem, but I think wmw’s policy of not allowing any reference to the site in question is starting to seriously damage the viability of their search forums, especially their google forums. As you found, and as I’ve suspected, quick checks showed the weaknesses easily. That’s exactly what I found over a year of doing site checks too, that’s why I stopped, it got boring and predictable.
But still very glad to read the updates on this, I’ve been following that supplemental nonsense for a while, I pretty much ignored the big daddy indexing stuff because it was pretty clear what was happening even without being able to look at the sites in question.
I don’t envy you your job at all, having to dig through this stuff all the time.
Too many comments, didn’t read them, no need, your post was pretty concise.
Here is a question about quality “earned” links and recip or poor quality links. I am a web developer, and I create amazing websites that are linked to from all across the web because people are talking about the design, or functionality or other legitimate means.
Now I have a credit-for-work link as an image (my company’s “designed by” logo) on each site. Thus in essence each of these sites is a backlink for me. By itself, is this a good link?
Now here is the other thing – I also like to place my designs in my online portfolio for prospects to view (I’m showing off my work) – and this usually includes a “visit this website” type link so they can see what the site looks like in real time and what the client is doing with it. Have I not in essence created a reciprocal relationship here? Will these links be discounted in some way? Were they poor quality links to begin with?
This relationship seems very natural whether link popularity or pagerank existed or not – companies would put their credit for work logo on a site, in hopes that others who appreciate the design would see the designers insignia and hopefully hire that company. And of course we artists always want to show off our work.
So what’s the deal? I know all recips aren’t bad – but where is the line?
Addition to above, and I failed to mention it – I am not complaining.
I rank #4 out of 153,000,000 in Google for my most important targeted term which is quite competitive, like I said, I’m not complaining. However after reading this, it almost made me want to remove the links in my portfolio to my designs, or put a no follow on or something. And what about my discussion forum? I run a vBulletin forum on my site that has thousands of members, all of which are also web designers or clients. They are there to post and chat and learn about web design, show off their latest projects etc. Now they have signatures so people can view their work, and they are html links. Thousands of them also link back to my site with varied anchor text, usually along the lines of “proud member of” that sort of thing. How does this sort of thing play out with reciprocal links in this situation?
I am not going to change my portfolio of course, because the way I have my portfolio set up makes sense to me, and it is not about link pop for my clients, it’s about showing off my work. But my forum is another matter – we try to keep it as clean as possible and have great mods that kill spam right away, so I am still confident in the quality of my members’ signatures, but should I let posts like this scare me? Is there a benefit to my users, from a link perspective, to having the links in their sig? Is it a detriment to my site?
I mean, no matter what, the members would still post if I took sigs away; they are there for the education and a sense of community. But what bugs me is that I feel like I have to do something special, in fear of losing my Google rankings, that I wouldn’t do “if search engines didn’t exist”. Signatures come stock in vBulletin and members like to get creative with them and use them, they have fun with it. Do I need to alter this natural element to appease the great G gods?
Before you try to dig through all of the comments, I’ve just written an executive Summary of Matt’s comments on reciprocal links at http://www.ahfx.net/weblog/83
Doug Heil >>> No one could continue to simply index page after page of low quality websites, especially directories.
I don’t think that’s a good idea – shouldn’t it be more like, index but don’t rank if something is low quality? After all, a SE can go wrong in what it thinks is low quality – but I as a user would prefer to occasionally go through the first 100 pages in search of that one thing I am looking for. I would prefer that it’s there somewhere, and that someone else doesn’t ignore it altogether. Even the directories – why punish when you can’t be 100% sure?
Search engines were meant initially to index everything they could, but rank as they judge. Unless of course you have other issues like overload and unmanageable data…
h2, I completely understand the policy on WMW. You can’t go into specifics without it quickly becoming unmanageable. The other thing is that those were the five cases that I dug into. But Adam found several domains that we’re digging into more, for example. I asked someone to dig into your domain for example, John. But John, bear in mind that we only show a subsample of links that we know about.
Dossy Shiobara, I see that on my blog sometimes too. It’s natural, because if we see the same article/text in two places, we pick the more reputable page (which is the root page of your blog, in this case). I wouldn’t use a noindex tag; you might consider putting fewer articles on your root page though. That would more quickly put the right text onto the individual pages.
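As a loose illustration of "same text in two places, pick the more reputable page", here is a toy sketch. The reputation numbers, the tuple layout, and the function name are all hypothetical, and matching on exact text is a simplification chosen for brevity; it is not a description of Google's actual duplicate handling.

```python
def pick_representative(pages):
    """Given (url, text, reputation) tuples, map each text to one URL.

    Hypothetical illustration: when the same text appears on several
    pages, the page with the highest reputation score wins.
    """
    best = {}
    for url, text, reputation in pages:
        if text not in best or reputation > best[text][1]:
            best[text] = (url, reputation)
    return {text: url for text, (url, _) in best.items()}


pages = [
    ("http://example.org/", "entry about widgets", 5.0),            # blog root page
    ("http://example.org/2006/05/widgets", "entry about widgets", 2.0),  # permalink
]
chosen = pick_representative(pages)
```

With the entry still on the root page, the root wins in this sketch; once the entry only exists at its permalink, the permalink becomes the only candidate, which is why keeping fewer articles on the root page would move the text to the individual pages sooner.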
Jack Mitchell, you said “Saying you can’t do reciprocal linking is just sheer idiocy. How does Google expect you to get back links?” I’m not saying not to do reciprocal links. I only said that in the cases that I checked out, some of the sites were probably being crawled less because those reciprocal links weren’t counting as much. As far as how to get back links, things like offering tools (robots.txt checkers), information (newsletters, blogs), services, or interesting hooks (e.g. seobuzzbox doing interviews) can really jumpstart links. Building up a reputation with a community helps (doing forums on your own site or participating in other forums can help). As far as hooks, I’d study things like digg, slashdot, reddit, techmeme, tailrank to get an idea of what captures people’s attention. For example, contests and controversy attract links, but can be overused. That would be my quick take.
And now my cat insists that I spend some quality time with her before going to bed.
Matt, you have the patience of a Saint.
It matters not what you write, many here will only put on their selective reading glasses anyway.
The funny thing is, most of what you write is just plain old common sense.
Matt, It sure looks like you’ve got your hands full here with all these posts. I tried to read all of them, but they were just too many. It’s funny how people (or should I call them, concerned searchers) view Google’s efforts in providing quality results in the SERP’s.
Even though I have my own battles as an SEO and SEM marketer, I have to abide by the rules and make sure my client sites are ready, not only for Google, but other SE spiders as well.
In my mind Google is doing their utmost to keep on providing accurate results based on the search terms. Why would you destroy the very “kingdom” you yourselves have built. Surely you’ll want to maintain your position as the #1 Search Engine worldwide ??
Anyway, great article Matt. It does indeed explain a lot. Thanks.
Matt,
Say that I did recip links in the past and now I decide to remove all the links. How long does it take for Google to know the change and adjust my ranking (crawling priority) accordingly?
Steve
Hi,
a question regarding the cache. Most of the cached pages from my site date from last February. Since then I have changed URLs. The old ones are redirected to the new ones, but as Googlebot doesn’t seem to like my site anymore, I have lost 6000 indexed pages, and no new pages have been indexed since the Bigdaddy update. So I have 2 questions:
- Why are my cached pages so old (most of them were up to date the day before the Bigdaddy update)?
- How do I make Google love my site again?
Phew, glad you cleared that up about reciprocals otherwise I’d be deader in the water.
I deleted an old http site map and added my www. sitemap as nothing was getting indexed. Still nothing getting indexed. Did I drown myself by deleting the old map?
Do I need to do a resubmittal form?
My site only has two links in Google and they’re both from the same place! I know I need more, but is it the links or the sitemap keeping it from getting indexed?
Dave (Original). Coincidentally, I posted this in my forum just a few minutes before I read your question.
Doug Heil. The point is that Matt’s assessment of the health care directory site (and he examined it) is that it needs some more IBLs for Google to crawl and index more of its pages. It isn’t just any directory site that we can generalise and guess about – it’s one that Matt examined, and that was his assessment.
You said that, “No one could continue to simply index page after page of low quality websites, especially directories.” I don’t disagree with that, but it would depend on the definition of “low quality”. Matt said that the health care directory looks like “a fine site”. It doesn’t sound low quality to me.
Dave (Original). I’m sure you are mistaken about the usefulness of directories. Niche directories can be very useful, and some people really do use directories, so for some people, they are useful. Either way, they are not low quality sites by definition.
I don’t know what’s going on now, but if I do site:www_mydomain_com
It says results 1-2 of 68, was about 320. Even though it says 68 it’s only showing 2 pages, the homepage index and an attached forum index.
Now that isn’t usual, is it?
One of those is Supplemental too
Caios
This:
link:www.shacktools.com
Maybe because of this:
link:www.shacktools.com
Is Matt Cuttsa Matt Cutts – or someone pretending to be Matt ?
That’s true, but why index over 300 pages and then remove them? market links link checker shows about 300 links through MSN. Also site: says 68 pages but only shows 2?
I know I need to build links to get my site indexed but I personally would rather not spend all my time doing that. With there being fewer sites out there indexed, that means that even if I did have more links out there, they are surely less likely to be seen by Google.
Matt:
Over the past few months, I’ve submitted many Spam Reports (basically whenever you ask for them here) on a competitor’s site that is using Hidden Text, yet that site is still in the index. This site has been doing this for at least the past three years (that’s how long I’ve been in competition).
When Google finds SPAM on an internal page, is just that page removed from the index or is the entire site penalized?
It’s just frustrating. I keep submitting the reports, and yet the site still remains. Feel free to e-mail me at the address provided and I will supply more info if you’re interested in details.
When I go to a library to do research, I do NOT care how many people have read or checked out the book that I am looking for. I only care that the book is relevant to my research.
When I am doing research on the Internet, I do NOT care how many people have linked to that site. I only care that the site contains the material that I am researching.
TOO MUCH emphasis has been placed on links coming to a site. TOO MANY sites with excellent and exclusive content are being left out in the cold because they have no incoming links.
The Internet is about INFORMATION. By putting all their trust in the incoming links, Google has made the Internet all about POPULARITY.
That is NOT what a search engine should be concerned about.
This is my first post in your blog Matt and thanks for giving us an opportunity to spell out our views.
I’m talking about affiliate sites.
Individuals who do not have the capacity to go for something big have no other option but to continue affiliate marketing, for the simple reason of earning a few bucks.
Mostly optimised for less common keywords, these sites would provide products/services for a small group of people.
Google policy suggests W/Ms think about “whether you would do that if there were no search engine”. Needless to say, it makes no sense for a T-Shirt affiliate site to give the history, origin and such unnecessary things about T-Shirts just to satisfy Search Engine Bots.
It is really difficult to change the description of the products like T-Shirt and such sundry other items, though that can be done using scripts to change words like “this is” to “we have” and to try and befool the crawlers.
Probably, it is possible to add valuable content for sites that offer a domain name registration/ web-hosting service through affiliate links.
And who wants to drive hard earned traffic to a different site and with the fear that the traffic may never come back and that too being well aware of the fact that had it been a direct sell, the publisher could have earned much more! It is nothing but compulsion.
So far as the value that such sites add to the internet – an analogy may explain it. Why do we visit a street-side small shop, when everything is available in a big shopping-mart, undoubtedly providing the best comfort.This is the very basic human nature and really difficult to explain.
And finally, it is rather easy to befool a crawler(!) but not a bona fide customer. Just because there are some links on a website, a customer is never going to buy anything from that site unless and until he gets something meaningful.
So this is my small request to let the visitors decide what they want to do, whether to buy it from the Principal site or go via an Affiliate site.
Regarding this issue, your earlier stand seems to make more sense, where affiliate sites would come up in SERPs for rather uncommon keyphrases and Principal sites would enjoy the traffic for more commonly used keyphrases.
Thanks again.
I saw that too. I think he went Italian.
Heyyyyyyyyyyyyy CUTTSAMATTAYEW, huh?
Hi Matt,
Great post and great information!
Ok, some say there is no such thing as “sandbox” but there is a holding cell. I have been working on a site for nearly a year and a half now and I still can’t get the site to rank, even for the most unused/stupid keyword. I do see tons of supplemental pages in the results and your explanation seems to fit the bill. But I still don’t understand why it won’t come out of the holding cell. Was there something new in the update for new domains that keeps ’em in the cell longer, other than getting quality links and everything else I know?
You can email me for more details if you like.
Thanks,
Beth
Hi Stephen
“Is Matt Cuttsa Matt Cutts – or someone pretending to be Matt ?”
I’m sure it was Matt. It’s his style 100%.
Yes, looks like his style – but also not his style in a way.
EG. It looks like it has been made to fit his style but some things don’t add up (He has two cats for a start)
Caios
Let me put it like this:
Either you play the “Backlinks Game”..or you don’t play at all
Stephen
“EG. It looks like it has been made to fit his style but some things don’t add up (He has two cats for a start)”
I know. But it seems that it was Emmy that insisted that Matt spend some quality time with her.
While J.D guy might have had more important things to do than to waste his time on Matt
You know women always need more attention than men
Harith
You might be right, JD is probably still just running around the house chasing laser pointers, its tail etc.
Matt
Any chance of an update on the PR situation ?
As has been noticed at WMW and other places, the last PR update (early April) only seemed to affect some sites, and no ranking changes were noticed as a result of this update. (OK, this might be hard/impossible to notice anyway, but from the outside it just looked like the last PR update was purely cosmetic.)
Other pages kept the old PR which was probably updated around February time…..
Unfortunately, it still sounds like sites that wouldn’t naturally collect large amounts of links aren’t ever going to be spidered/indexed completely.
I don’t care about ranking at this point; if I can get the pages into the index in the first place, I can *make* them rank. I just can’t get them back in.
A software site has a “these people use The Widget” page linking to their customers’ sites.
We’re thinking of doing that, and probably will. Our site will probably drop in PR, maybe even disappear from the Google listing altogether, because most of those links will be to sites not relevant to our site because those who are interested in our product would not also naturally/automatically be interested in the products of the sites we link to. We get sales by other advertising and word of mouth, so SERPs really don’t matter as much to us as they might for others. Our concern is for our customers.
Question: If we are penalized, will that also penalize our customers?
[quote]
Matt Cutts Said,
May 17, 2006 @ 1:49 pm
…
Alex Duffield, in my experience those links aren’t making much/any difference with Google….
[/quote]
Matt, I am sure you know better than me, but the fact remains that the site I pointed out comes up number 1 for Many searches (rafting BC) and in the top 5 for just (river rafting).
I manage the site for one of their competitors, and have kept an eye on these guys for many years. Before they started participating in this sort of link scheme, they did not receive this sort of ranking.
Their site does not include nearly as much valuable user content as any of the others in the top 5.
My main concern here is that my clients think they should (need to) also participate in this sort of linking scheme in order to compete. I insist that good content and a well designed site, combined with regular updates and good (honest) linking, is the better approach. I have pointed out that Google guidelines clearly state that “Linking schemes designed to improve PR” are against the rules and tell them that in the long run they will get burned, but I fear I am slowly losing the battle against the fact that it does work.
All I am looking for is some ammunition to convince my clients against this course of action.
Netmeg, as a major change in crawling/indexing, I expected to see some people say “I’m not crawled as much.” Somehow the people that are crawled more never write in to mention it.
But we take the feedback and read through it. I’ve been talking to someone in crawl/index about the future of the crawl, for example. We keep looking for ways to make the crawl better.
Stephen, I haven’t asked around about PR lately. Yes, the one cat is much younger. He can keep himself busy with a bit of string for an hour. It’s the other cat that often demands attention.
Spam Reporter, a lot of the time we’ll give a relatively short penalty (e.g. 30 days) for the first instance of hidden text. You might submit again because sometimes we’ll decide to take stronger action.
Hi Matt
Thanks – sometimes I wish I was a cat – less stress.
I have sent another email to the Boston address and a follow up as it looks like someone had a look last night but no reply – OK – perhaps I should be more patient.
I don’t know if you are looking into the site deeper, just ignoring my site, or what? It seems to have regained PR at the last change, but it is still suffering from a penalty. I have given more details of perhaps why in the email.
Cheers
Stephen
Hi Matt,
A gold mine of stuff, great.
However one question. I understand the principle of relevant OBLs and improving the visitors experience, but here is a quote
“Moving right along, here’s one from May 4th. It’s another real estate site. The owner says that they used to have 10K pages indexed and now they have 80. I checked out the site. Aha:
This time, I’m seeing links to mortgages sites,”
I can’t see how linking to a mortgage site from a property site would not be deemed as a relevant link and not improve the visitors experience.
I would not be able to buy a property without a mortgage, and my guess is that this would apply to most people.
Is there no slack for cross subject linking?
I have a “Breakdown Recovery Site” I have a lot of information, regarding cars and motoring and driving holidays. It is not directly related to your car breaking down, but it is related?
Thanks
Mark
Being someone who consults for many companies, I see a need for Google to keep everyone from spinning their wheels and wasting time. I speak for ALL website owners, and even the folks at Google dealing with all the questions.
I had to deny a Google Paper Publications advertisement as I was not sure doing more advertising was good or bad, I feel like anything I do could cause a penalty. So in the end I turned off my Google Adwords account 100% and will not use the Google publications anymore for advertising. All going to the other large players. (Why would anyone at Google want this). I just can not figure things out with Google. (Investors must love this part) Thus Google does not make an extra few K now at least. My other clients, all off! Now we are at about -15k a month for Google. (And this was my professional answer, do not take chances)
Please (if possible) find a way to let folks know that there is a penalty, makes such sense, everyone would save time, all would win.
The way it stands it appears search engines do not want to tread in these waters, but be professional and let people know, get a great lawyer, write a disclaimer, save everyone hours of time.
Is it a dream?
i built that t-shirt site matt said wasn’t interesting to my visitors. well, my bookmarking rate is 15-20% monthly. so, the users! find it interesting. i just put together stuff i liked, and users didn’t have to go around looking for this stuff for days. just a fashion magazine. was hoping that google would be in the business of “indexing” not editorialising… the affiliate links are nobody’s business but mine, it’s legal. some of the content has been provided by business partners and syndicated on the site – this is also legal.
matt, i sent the specifics of my website traffic to the original email address if you need the proof of what IS and ISN’T interesting for people that search for this info.
Matt,
There appear to be two datacenters that show a completely different set of results from the others. I believe these datacenters are the original BD DCs, not sure. Would you expect these to spread? My real question is at what point in the timeline would you expect some stability and consistency across all datacenters.
Thanks Matt.
Chris
MATT, can a site get fully indexed in time without having to get links? I know you can get indexed faster with them, but I want to know: if a site never gets links (they do have a page indexed though), will they ever get indexed, or will they never see their pages (entire site) in Google until they do get links?
hey matt, i didn’t hear back from you regarding the adult listings as mentioned above..
I think this topic needs some investigation. I see a continuing trend of freshly expired NON-adult domains getting insane rank for adult serps while older established adult sites are pushed further and further down the list.
The talk amongst the “adult” seo community is the only way to get good google rank for adult these days is by buying or getting your links on NON-adult mainstream pages..
This practice makes everyone look bad, and by continuing the way google is operating it is really harming the non-adult community..
The top 100 listing for most adult terms are filled with SCHOOLS and EDUCATIONAL domains that recently expired. Banking on the fact many other schools will still have links up to the expired domain..
So now all we have done is made the serps irrelevant, and shown a lot of porn to kids and unsuspecting people, all for some google rank..
If that wasn’t bad enough, the rest of the results are guestbook spam of adult links on mainstream results. If google didn’t reward these spammers, they wouldn’t attack innocent sites with automated software just to add their adult links..
So by continuing to allow these sorts of methods google is creating a problem where one didn’t exist..
Ironically enough, two or three years ago we had to contact Google to throttle back the crawling on one of the sites that so concerns me, because it was being hit way too hard at the time. Oh, for days gone by…
Matt,
You seem to have referenced the fact that sites might be penalized without being banned, and I know the question has come up a couple of times, but I’ve never seen a clear cut answer on this. Is there such a thing as “penalizing” (drop in position but still listed), and if so, is doing some of the stuff you discuss here, such as reciprocal linking, a possible cause? I’m not talking about linking to spammy sites, as you have been clear on that, but what about recips in and of themselves?
For instance, I was told recently that I should submit one of my sites for a particular award. I’m pretty sure that receiving that award means being listed on the list of award winners with a link to my site. Are you saying that if they link to me, great, but if the award logo hotlinks back to them (and thus becomes reciprocal), not only would the link from them then become worthless (well, aside from the ego boost I know I’m going to get if I win), but that it might actually hurt me?
I somehow doubt that’s what you’re saying, but it certainly isn’t a clear cut issue.
Thanks.
-Michael
Hey Matt, how about mentioning something to the sitemap folks about adding a feature to get rid of dead/404 urls from a site. Google seems to take forever to rid itself of 404 pages, could be a great asset to Google and webmasters if there was a functional way to remove urls via the sitemap system? Kind of a dumptheseurls.xml anti-sitemap deal.
Cheers,
John
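To make the “anti-sitemap” idea above concrete, here is a rough sketch of what generating such a file might look like. This is purely hypothetical: the file name dumptheseurls.xml comes from the comment, the function name `build_removal_list` is invented here, and nothing in this thread says Google Sitemaps accepts a removal file. The sketch only borrows the urlset/url/loc layout of an ordinary sitemap.

```python
import xml.etree.ElementTree as ET

# Namespace of the public sitemap protocol; reusing it for a removal
# list is an assumption made only for illustration.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_removal_list(dead_urls):
    """Return sitemap-style XML listing URLs that now return 404.

    Hypothetical format: no search engine is known to read this file.
    """
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    # De-duplicate and sort so the output is stable between runs.
    for url in sorted(set(dead_urls)):
        entry = ET.SubElement(urlset, "url")
        ET.SubElement(entry, "loc").text = url
    return ET.tostring(urlset, encoding="unicode")


if __name__ == "__main__":
    dead = ["http://example.com/old-page", "http://example.com/gone"]
    with open("dumptheseurls.xml", "w", encoding="utf-8") as handle:
        handle.write(build_removal_list(dead))
```

Whether a crawler would ever honor a list like this is exactly the open question; the sketch only shows that producing one from a list of dead URLs would be cheap for a webmaster.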
it seems google has forgot a basic fact, webmasters are the net, and google is a bridge between users and webmasters, users opted for google because it gave them the most in-depth and the most choice when it came to searching a certain term, after finding a site through google, users decided later on which site to use or bookmark, now it seems google is trying to choose for them what they should see and what they shouldn’t see, as somebody mentioned this is editing content and not indexing content.
I have a question. Let’s assume someone had a website with a lot of original and unique information, yet at the same time they were involved in heavy and excessive link exchange to generate traffic from other sites (like the net’s old days, exchange links for traffic). Will you cut off that site’s valuable content from millions of users because a dumb crawler saw lots of links?
The sad fact is thousands of webmasters have lost thousands of pages, and millions of people have lost tons of information, because a bunch of spammers have decided to manipulate google. While manipulating search results could be and is a serious problem for google, the way Bigdaddy has been designed to solve it is not proper.
Google mission was supposed to be organize the world information,
however I believe the mission has evolved into “editing the world information according to a blind algorithm, because we got blinded by spam!”.
Hi Matt,
With over 200 comments it is time consuming to read through each one so I apologize in advance if this question has been asked.
With regard to Web design companies, it is standard practice to insert “Designed by Company Name” etc. along the footer of our clients’ pages. No surprise there.
Now, usually these links appear site wide. What impact do you find these will now have with the recent updates? Is there a better process that Google prefers for having our clients credit our work?
Michael VanDeMar, yes, a site can be penalized without being outright banned. Typically the reason for that would be algorithmic. I wouldn’t worry about being listed on a page of sites winning prizes though, unless it’s the Golden Viagra Mesothelioma Web Awards.
Netmeg, I’d like to see us provide some ways to throttle crawling up or down, or at least give preference hints.
Adultwatcher, don’t take a non-reply as not reading it. I did pass all of those on to ask how some new things do on stuff like that. There’s a part of our pipeline that I’d like to shorten, for example.
Relevancy, I wouldn’t count on getting a large site fully indexed without any links at all. We do look at things like our /addurl.html form and use that, so it’s possible that a smaller site could do it without links.
dude, I didn’t mean to cast stones at that site. Someone who gets to the site can certainly buy a T-shirt from different brands. But at least some of your links are from stuff like an “RSS Link Exchange” and those links just aren’t helping you as much.
Bruce, that’s your call of course. Advertising with Google wouldn’t affect you either way though (help or hurt). I think we’ve talked about your site; 1-2 of the pages on your site, plus the “sponsored by” link plus the “Search engine optimization” message on that page would be where I’d start.
Stephen, Adam is going through all the emails. He’s writing back to the ones that he can, but he can’t write back to every single one; I need him to do other stuff too (e.g. keep an eye out for other feedback across the web, learning more spam detective skills, etc.).
I don’t think I have ever seen so many comments on one Cutts blog post.. this will make comment #263 (sorry Matt if this comment violates your Guidelines on Comments) but just wondering, did this thread break a record? What’s the highest # of comments a Cutts blog post has seen?
Matt can you explain why sections of our website are now showing supplemental results for almost every single one of our home listing detail pages. Each of these listings are unique and required to be on the site if we want to make our visitors happy.
The system houses about 25k listings, all with unique information. I guess I don’t understand why these pages should be placed into the supplemental results. Site: (mygorealty.net)
If there is a problem on our side we want to correct it. If it is a problem with Google it might be nice to know what that problem is so your crawl/index team can correct it.
Matt:
I’ve read your post with interest.
About four or five days ago, I noticed that google had dropped all but four of my pages on my new site. Now, four or five days later, it has dropped them all, besides the index page.
Shocking and unexpected and, for me, inexplicable.
My site is almost 100 percent original content and even though I do feature an occasional affiliate link, it is certainly more content-oriented than affiliate oriented.
so….after reading your comment policy, not sure how to phrase my question so that it can be general interest but here goes and i hope it flies……
if a site has almost 100 percent original content, will a few affiliate links cause google to stop indexing it?
Thanks, neva
matt, link exchanges with relevant sites for small guys is the only way to get traffic. please discount the exchanges, just not factor them into your results. if a site can’t get to the top of the results, exchanging links is one of the few means to get some visitors. automating the procedure kinda makes sense. cheers.
ps: i have approved every link on the exchange with the goal of not getting random visitor, but a targeted visitor. granted, i didn’t get too many as the result from that one, so i am not doing it anymore.
Matt, firstly thanks for your further comments on supplementals
There has been a considerable amount of ’statement of fact’ in forums, although it is probably rather more speculative, but does Google take into account things such as age of site, time spent on pages by visitors, percentage added to favourites, number of years a domain is registered with the domain provider, etc, or are these all academic to ranking and indexing?
By the way you mentioned that supps. may be caused/affected by, higher ranking pages being indexed by priority,but I notice that there are some very high ranking websites with many supplementals
Thanks
Matt,
Thanks, no, it’s not a Grande Cialis Hair Loss Award, but it isn’t strictly related either. Loosely they tie in, but it might take a second to see the relationship.
As for the penalties that you mentioned… would an email from Google indicating that you were not penalized or banned cover those? Or might you be anyways? And would that class of penalty be something that might get mentioned in the Sitemaps Penalty Awareness program…?
Also, I’d like to know the answer to Joel’s question too… is this a record comment count for the blog?
Thanks.
-Michael
Matt,
Sorry for double posting, forgot this one. This is a long post to read with the comments, and I think that if you start from the top without reloading, and then go to comment, the security code might time out. Didn’t happen this time, but it has in the past. You should really make it a forward-only process on missed codes, retaining what has been typed to the next page, to keep people from having to retype comments.
-Michael
So is there a way to know if we’re linking to what Google “thinks of as a bad neighborhood”? Also I am interested in what someone said about sites such as coupon sites. Obviously these sites don’t have original content. Will linking to other coupon sites still help you? Also, on a mall site how can any link not be related?
Matt,
I quote:
Yup, exactly, arubicus. There’s SEO and there’s QUALITY and there’s also finding the hook or angle that captivates a visitor and gets word-of-mouth or return visits. First I’d work on QUALITY. Then there’s factual SEO. Things like: are all of my pages reachable with a text browser from a root page without going through exotic stuff. Or having a site map on your site. After your site is crawlable, then I’d work on the HOOK that makes your site interesting/useful.
I find some real advice: Word-of-mouth = Be popular and get links from related sites. Factual SEO = get the tech right (and clean). Work on the HOOK = Try to be interesting in your own way. (We don’t care about this)
One thing bothers me between the lines: “Return visits”. Please tell me that you are not tracking and using return visits as part of your algorithms.
/chris
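The “are all of my pages reachable from a root page” check quoted above can be sketched as a breadth-first walk over a site’s internal links. This is only an illustration, not how any crawler actually works: the `links` mapping (page to the pages it links to) and the function name `unreachable_pages` are assumptions made up for this example, and a real check would first have to extract the links from the HTML.

```python
from collections import deque


def unreachable_pages(links, root):
    """Return the pages that cannot be reached by following links from root.

    links: dict mapping each page to an iterable of pages it links to
           (hypothetical input; build it however suits your site).
    """
    # Every page we know about, whether it links out or is only linked to.
    all_pages = set(links)
    for targets in links.values():
        all_pages.update(targets)
    all_pages.add(root)

    # Standard breadth-first search starting at the root page.
    seen = {root}
    queue = deque([root])
    while queue:
        page = queue.popleft()
        for target in links.get(page, ()):
            if target not in seen:
                seen.add(target)
                queue.append(target)

    return sorted(all_pages - seen)


if __name__ == "__main__":
    site = {
        "/": ["/about", "/products"],
        "/products": ["/products/widget"],
        "/orphan": ["/"],  # links out, but nothing links to it
    }
    print(unreachable_pages(site, "/"))
```

Any page this reports is one a visitor with a text browser could not click through to from the home page, which is the kind of page a site map on the site is meant to rescue.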
I may be about the only person in the world who feels the way I feel about this issue (and those who know me have heard me say this before), but I’m still gonna say it.
Unless the work was non-commissioned (which is highly unlikely), putting a hyperlink on a client’s website is tacky and unprofessional, and deserves no real credit. It’s like watching a Ford commercial and seeing the logo from the ad agency who designed it in the lower right-hand corner.
Personally, I’d like to see no credit whatsoever given to these links. It does no benefit to the customer and goes against the whole organic link concept. If there were ever an “unnatural link”, that would be it.
Google has been my preferred search engine for many years; however, in the last year or so the results seem to be getting worse and more irrelevant, while other big search engines’ results are improving. I fear that because Google has become the number one search engine it has made itself the number one target for financial gain for web authors through ppc, very much like Microsoft Windows became the number one target for hackers.
It’s quite frustrating that unique specific content isn’t enough to get ranked on Google; it seems that you have to get links regardless of their quality or relevance.
Personally I don’t do reciprocal links, if a site wants to link to me then great, if I think a site will be of interest to my visitors then I will provide a (nofollow) link.
I hope that Google will sort this Internet search mess out, the sooner the better. My suggestion is to penalize directories, duplicated content, automated sites that have thousands of pages, etc.
Very nice to see you take the time out to communicate stuff like this, Matt. Definitely worth the time to stop by and read… Thanks!
Matt, isn’t the point of your addurl and sitemaps programs supposed to be helping sites get indexed? If that is the case, what is the point of them if it takes links to get indexed?
You will see the site but not index it with addurl and sitemaps? I know sitemaps does more, but its original point was to help pages get indexed. Now it doesn’t help with that.
Google sitemaps is a great idea and the perfect mechanism for Google to communicate to webmasters.
It should have no effect on rankings, but merely act as a mechanism to inform Google of the structure of your website and any new pages that may need crawling.
I understand if there are 2 pages that talk about something and one page has tons of links and the other doesn’t.. that site should rank higher, but if there was a site that had tons of dedicated pages about that term with no links(maybe its new or a mom and pop) it should at least be indexed and judged on its relevance and merit.
Matt, you would do yourself and everyone else a good service by not allowing a lot of the above confusion about “reciprocal links” to go unanswered. Just say “there is nothing wrong with Wikipedia linking to Dmoz and Dmoz linking to Wikipedia.”
You can end the FUD once and for all, and put a lot of link brokers and “three way” spammers out of business, just by saying it’s not reciprocation that is the problem, but spam and deception and trying to pretend a site is more important than it is.
Matt,
I have another question for you. I will repost my first one here along with my second question, so they are consolidated.
#1 There appear to be two datacenters that show a completely different set of results from the others. I believe these datacenters are the original BD DCs, not sure. Would you expect these to spread? My real question is at what point in the timeline would you expect some stability and consistency across all datacenters?
#2 I am really frustrated with the quality of serps in a few instances where I am trying to do research. Today, I was doing a little medical research and was trying to find information about a schedule 2 narcotic. Specifically, I was searching – difference between oxycodone and hydrocodone – (without quotes). I got page after page after page of junk scraper/directory style sites with links to other sites like this one: getcreatis.com/oxycodone.html
Many of these urls in the Google index immediately redirect to the affiliate page. Total junk. I am not trying to sound overly critical, but I wanted to point this out to you. It is very difficult to conduct any type of scientific research, especially medical research, when these spammy, worthless affiliate sites with page after page of just spammy links or adsense are ranking so well. Many of the pages have zero PageRank, so I find it amazing they are ranking so well. Actually, I see a lot of pages with PR0 ranking well these days.
Thanks again for your help.
Chris
Thanks Matt.
Chris
Google Pagerank on average seems to be a good indicator of quality pages, but seems to have little relevance to ranking on Google at present, but I can understand why Google is holding back.
I’d suggest it being included as a filter when searching Google or maybe include it as a filter on Google Toolbar search at least.
My only criticism of Google Pagerank is that it’s very slow to update new pages. (Why don’t they link it to Google Sitemaps?)
Hello Matt,
Seems like the recurring theme is that recip links are now bad. This is hard to fathom. Isn’t this the nature of the web?
Especially in my sector which is fishing charters, guides, trips, etc where you have so many many less than professional websites that will never rank that high.
Now when they come to me asking for a link trade I have to deny them for fear of suppressing my ranks?
These link trades are the Internet lifeblood of these usually poor charter Captains who barely eke out a living, and now I am going to tell them “Sorry Charlie”, no links because Google doesn’t like it.
OK I have good links and don’t actually need a link back so since you are freely spouting out great info and insight can you take it a step further and let us know the heads up on linking to sites like I mention, one way.
I want to continue being “friendly” to the charter Captains and guides that are struggling to survive, so will sites that only link out to relevant resources have a “drain” or negative effect on their respective websites?
I certainly understand that a fishing site linking to a credit card site is bad and that would be an obvious sign of link laziness or just someone trying to manipulate the system but a fishing information site trading a link with a fishing charter site should be considered what makes the web go round, no matter how many times this is done.
Anyway have a nice evening and I wish I could have caught this post when you first wrote it. Thank You – Joe
Hi,
Matt,
For the sites you described above as having poor links: if the outward links had rel=nofollow, would it improve the number of pages indexed?
1) Also, I had a directory with 15000 pages listed. I have been hit hard and now only have 600 pages listed (do you not like directories?). The site contains very few outbound links.
2) Also, directories would principally have links coming in from all the categories they list, so which category could be taken as relevant for a directory?
thanks
PhilC, I was actually thinking more along the lines of not enough hrs in a day/week/month/year to index ALL pages out there. It would appear that Google NEEDS to make a choice in many cases, and what Matt describes would fit.
I’m not trying to say ALL directories are of no use, just that the number of people (in the scheme of things) that they are useful to are low.
Hi Matt,
Here is an idea for Google to solve its reciprocal linking problem. Why not only give value to a certain number of defined reciprocal links? If Google promised us that only 100 reciprocal links from our site would count, then it would seem to solve a large part of the problem. To make things simple these could all be put on one page, with a specified name. Of course a site could have as many outgoing one way links as it wanted and as many reciprocal, but only 100 would count in terms of the search engine. This would only apply to reciprocal links; all other links would be as they are now. A number of different similar schemes could be thought of. What about 25 per year? We would be a lot more careful about reciprocal linking if we knew we only had a certain number and that those links in a sense defined our site, as well as the one way links the site was able to attract.
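The capped scheme proposed above can be sketched in a few lines. To be clear, this is a toy model of the commenter’s suggestion, not anything Google is known to do: the `links` mapping, the `declared` list (standing in for the one specially named page of chosen partners) and the function name `counted_reciprocals` are all invented for illustration.

```python
def counted_reciprocals(links, site, declared, cap=100):
    """Return the declared partners whose reciprocal links would count.

    links:    dict mapping each site to the set of sites it links to
    declared: partners in the order the site owner lists them on the
              (hypothetical) designated page
    cap:      how many reciprocal links are allowed to count
    """
    outgoing = links.get(site, set())
    reciprocal = [
        partner
        for partner in declared
        # A link is reciprocal only if it goes both ways.
        if partner in outgoing and site in links.get(partner, set())
    ]
    # Only the first `cap` declared reciprocal links carry any value.
    return reciprocal[:cap]


if __name__ == "__main__":
    web = {
        "fishing-info.example": {"charter-a.example", "charter-b.example"},
        "charter-a.example": {"fishing-info.example"},
        "charter-b.example": set(),  # one-way link, not reciprocal
    }
    print(counted_reciprocals(
        web,
        "fishing-info.example",
        ["charter-b.example", "charter-a.example"],
        cap=25,
    ))
```

One-way links are untouched by the cap in this model, which matches the proposal: only links that go both ways are rationed.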
YES!!
[quote]I may be about the only person in the world who feels the way I feel about this issue (and those who know me have heard me say this before), but I’m still gonna say it.
Unless the work was non-commissioned (which is highly unlikely), putting a hyperlink on a client’s website is tacky and unprofessional, and deserves no real credit. It’s like watching a Ford commercial and seeing the logo from the ad agency who designed it in the lower right-hand corner.
Personally, I’d like to see no credit whatsoever given to these links. It does no benefit to the customer and goes against the whole organic link concept. If there were ever an “unnatural link”, that would be it. [/quote]
No Adam; you are “not” the only one who feels “exactly” like that. I find it “extremely” Unprofessional for a design firm OR SEO firm OR both who stick their links in the footer of client websites. It’s so bad. It’s very amateurish and not only looks bad for that site the links are on, but looks bad for that designer/SEO as well.
Not only all the above, but that particular client is “unknowingly” linking to a SEO or designer without the full and clear knowledge of what linking can actually mean in the long run. That firm they link to could get caught for spamming, or be deemed a ‘bad neighborhood’ firm, which would indirectly affect the poor client who is linking.
We all hear all the time in this industry about SEO firms/designers practicing “full disclosure” to clients. What does that mean exactly? Does it mean that as long as the SEO asks the client if they can stick a link in the footer, then it’s perfectly fine? This goes for any technique the SEO claims they do for clients and then trying to explain it in “full disclosure”.
What this industry does not get is the fact that in NO way does the average joe client understand all the ramifications involved with “anything” their site is doing, whether done by the SEO or done by the client. It should be up to OUR industry to educate that client and then blame ourselves for the bad SEOs/designers in this industry. We shouldn’t be giving free passes out to firms who show unprofessionalism day in and day out. But you know what?… we sure do hand those free passes out very freely.
Getting back to the links in footers….. can you imagine seeing a link on Google.com in the footer that says:
“Designed by Church of Heil” LOL
or
A link on Sony that says:
“Consulting by Doug”
with a link to Doug’s website?
Why do firms feel the need to jeopardize client websites in this way, and feel the need to advertise in such a cheeky and unprofessional way? I’ll never know.
Matt:
Just submitted another SPAM report with your name and my name (above) in the message box. Please have a looksee and take action!
DUUUUUUUUUDE!
Do you have any idea how long I have been waiting to see someone actually get this? Just to find one person who truly understands the ramifications of these links and the potential negative ramifications of such?
This post is truly a thing of beauty. Other designers/developers/SEOs, read and take heed. SEO reasons aside, all the marketing stuff Doug mentioned here is reason enough not to do this.
Hey Matt, would there be any possibility of a future blog post or at least a comment on this, since it’s one where a large percentage of your readers would be interested in it (including the three who posted about such)? I wouldn’t go postal on you or claim you’re an asshole or anything like that if you didn’t, but we (and I say we because there are at least two of us who asked) would love to hear your take. Thanks in advance…and if not, thanks for posting stuff like this that leads to the tangential thoughts that others have.
It seems that quite a few people who are complaining about having lost indexed pages are stating it is due to reciprocal links. Well, I can say that after some research we finally found out why our site was not Googlebot friendly. We fixed what we thought was our issue and went from 3500 indexed pages to 80,000 indexed pages in short order. Well, now I am down to about 600 and seeing this go down the last few days. I was enjoying the traffic while it lasted. The kicker is that we do not have any reciprocal links at all. I have some one way inbound links I have been working on obtaining but no reciprocal links. So I wonder if you do not have enough inbound links if that hurts as well and will cause you to lose indexed pages?
In my eyes Google IS BROKEN. It’s been ruined. Forget about SEO.
It’s unreliable; the results are far from the easy-to-sort-through Google of 2 years ago, where you could find literally any page indexed.
I can no longer find anything that I seek in google. If I do, it’s 20 pages deep. Your algo and focus on combating spammers have made quality results, and the basis of what google started as, take a back seat.
PLEASE, for the sake of a decent search engine, enough is enough with excluding results and deciphering whose back links are valid or not.. There are many quality BLs in my eyes that don’t even get counted by you guys or get very little credit given.. like a user recommending a solid site they found useful in a forum by providing a link to it; that to me is one of the best, most valuable ways to determine if a site is worthy..
Regardless I could go on for an hour..
Google is broken. I cannot find anything I search for, and any of the hard work put into a few quality sites that have been around for years is being negated and de-indexed, page by page, day by day.
Matt--
As much as I often disagree with his analysis, Phil C. is essentially correct.
Google is dividing the web into “haves” and “have nots.”
It is no longer enough to build a decent, spam-free, original content site. Now you need to attract links from major players or your content is not worthy of the index.
Shame.
It seems to me that you guys are trying your best to stunt or at least ignore the natural growth of the web via your new selective indexing policy.
Might it have something to do with a capacity problem? Let’s ask the boss: “We have a huge machine crisis – those machines are full”.
Matt we all know how hard you work, but you’re beginning to sound a little bit like “Baghdad Bob.”
Be well.
I find it “extremely” Unprofessional for a design firm OR SEO firm OR both who stick their links in the footer of client websites. It’s so bad. It’s very amateurish and not only looks bad for that site the links are on, but looks bad for that designer/SEO as well.
Why stop at web design/SEO? Let’s remove logos from all products – from cars, tins, clothing, computers, etc. After all, logos are unnecessary – why the need to advertise the company who made the product (just like web designers advertise they made the page the person is reading)? What’s the difference between developing a car, and developing a website, in that respect? Why is it OK (in your mind) to have your logo on a car you created, but not a link on a website you created?
I would respect your opinion if you actually stated WHY it is unprofessional. I think a discreet link at the bottom of a page is fine – it’s actually doing a service to the reader – they may LIKE the way the website is laid out/designed and want to know who made it. Sure, you can just put the raw text of the web design company’s website address at the bottom of the page, but it’s hardly friendly to force users to copy and paste links into their browser rather than simply click on them.
Why do soooo many base the success/failure of Google on their site(s) position in the SERPs? (that’s rhetorical guys)
I wouldn’t mind betting that Google has indexed more pages than ever since Big Daddy.
“Google is broken”, “the SERPS are crap” yada yada yada all boils down to “My site isn’t ranking like I want”.
Matt Cutts: re wmw and looking at sites: imagine talking about paintings without being able to look at them, by policy. Then you’ll get art critics getting into big arguments about some piece of garbage, without even realizing that the painting is garbage. I think once you take an absolute position like this, year in and year out, it begins to erode the overall quality. At least that’s what I’m seeing.
The one positive of that other search forum I did was that I finally got to see the garbage sites that people had been complaining about not ranking. At least 95, probably 99, out of 100 were total junk, spam, tricks, keyword stuffing, link spamming.
It’s a question of creativity, thinking outside the box. Lots of ways to do it. Brett has often said that he wants his stuff to be reference quality, thus no specific examples that will be different and changed in the future. But that pretends that anyone is ever going to go in and read google threads from a year, two years ago, which, let’s get real, is ridiculous, only the most hardcore of seos are going to sit reading old search threads. Only a tiny fragment of the world’s population would ever think of doing that, and an even smaller fragment would actually do it.
Anyway, doesn’t matter, the quality drop is what’s readily apparent, things move on, blogs are getting more interesting than webmaster forums, at least blogs like this one. Brett’s decision to not allow blog linking [with the exception of yours I guess] is going to continue the quality drop, since more and more authoritative sources are writing in blogs. More and more I’m getting my primary information from developer type blogs.
Doesn’t matter in the larger picture, but this particular thread/posting was really revealing to me in terms of how low the quality on wmw google forums is getting. Much of what you said was fairly obvious to anyone who’d followed jagger update, no real surprises, except to the spammers who continue to complain about getting caught by the algo.
Re the footer links, I’ve been guilty of that, more out of ignorance and laziness than anything else, I started pulling them all off sites I’ve done a year or two ago, and I’m happy I’ve done that, I agree with the poster who said how amateur that is, it is. And it’s a cheap trick.
Personally, I’m tired of cheap tricks, I’m happy to just let the cards fall where they will, if search engines like my sites, fine, if they don’t fine, if people like them, fine, if they don’t, that’s fine too. Life is too short to worry about how stuff ranks every week or month.
Ranking doesn’t bother me so much. It’s the fact that pages aren’t getting indexed that bothers me.
Matt, nice to hear you guys got to cut loose a bit afterwards.
As far as the linking goes, yeah, that’s understandable. I guess what I meant was that if the crawl depth is being reduced based on low quality inbound links and spammy/off-topic outbounds, someone less informed could infer that they could just not have outbound links altogether in order to avoid some of the reduction. That obviously wouldn’t be a good thing for users, so I guess I was just prying to see if, in relation to crawl depth, G might now also be taking into consideration on-topic analysis and quality analysis, not just for the purpose of reducing it.
There are sites out there with millions of automatically generated pages designed to manipulate Google’s index.
There is a well-known site which is showing 162 million pages. Sites like these should have a penalty applied to all but their root-level pages.
It’s no wonder the results are in a mess.
There are sites out there with millions of automatically generated pages, designed to dominate the search results for practically every subject matter you can think of.
One well-known site has 160 million pages; it’s no wonder search results are getting worse.
Sites like these should have a penalty applied to all but their root-level pages.
Replying to Joe’s comments,
Reciprocal links are bad because they are open to so much abuse.
I’m in the same sector as you, charter boat fishing.
The point is, there is no harm in your fishing info site having a reciprocal link with the fishing charter boat sites, just include nofollow in the link.
The link is there for your visitors to follow, not to get either site a higher ranking.
Dave (Original)
IMO, most directories are totally useless, and are there for other purposes than providing a useful resource for people. I have a very negative attitude about them because of what they are. They are there because search engines exist. It’s just that the directory that Matt used as an example isn’t like most directories – not according to Matt, anyway. It sounds like a useful resource that is being unfairly treated by Google, AND Google is intentionally depriving their users of much of that resource. I see no sense in it at all.
Doug (Heil)
Use the HTML blockquote tag to quote. Forum-type codes don’t work in this blog.
Robert G. Medford
Perhaps you have read my other analyses closely enough, Robert
Jack Mitchel said:
For me, that’s the crux of this. I said it earlier, and I’ll say it again – let the rankings fall where they may, but index decent pages – just because they are there! That’s what a search engine is supposed to do. That’s what its users expect it to do. Allow your users the opportunity to find decent pages – just because they are there.
I’m seriously wondering if Google really is short of space, as Eric Schmidt (the CEO) said. Matt said that they have enough machines to run it all, including the index, but to run what exactly? A pruned index? I can imagine a decision being made as to whether or not they keep on adding new machines and new capacity, or start being a bit selective about what they index. Perhaps Google really is short of indexing space after all.
Whatever the reason for the new crawl/index function, it is grossly unfair to websites, and it intentionally deprives Google’s users of the opportunity to find decent pages and resources. It’s not what people expect from a good search engine. By all means dump the spam, but don’t do it at such a cost to your users and to good websites.
That should have read…
Perhaps you haven’t read my other analyses closely enough, Robert.
Hi Matt, thanks for the update, but it does raise a number of concerns. As mentioned by PhilC and Robert Medford above – I do wonder if you are not in danger of dividing the web into the ‘haves’ and ‘have nots’ regarding links.
The web is a very big place and there are very many, very diverse users and publishers – some are very well skilled in the web and code etc. and many others are not (myself included). This is what makes the web the interesting place that it is – you can find real gems of information that you really rate but which may be of little interest to the majority of surfers. There is a site out there somewhere about growing pineapples and other exotic fruit in your living room – great, just a few pages with a real rough and ready look to it – but who is going to link to a site like that? With your new BD policies, sites like that will disappear and we’ll be left with thousands of bland, corporate clone sites that are SEO’ed to the hilt and are as dull as ditchwater!
Try looking at some Asian sites, especially Japanese ones, to see rampant creativity – little robots and clowns and racing cars etc. and not an SEO to be seen anywhere!
Back to the main point, which is that within the web community there are those who are well connected and savvy about links etc. and there are very many more who are not. So some publishers start off with a huge advantage regarding linking strategies and others are always at a disadvantage. If the rate of indexing is to be determined by the number of quality IBLs, the well connected will always be at an advantage. The unconnected will suffer a double disadvantage: they won’t have the benefits of extra traffic that links provide and also they won’t get indexed – therefore they will just fade away.
The sort of quality links G is looking for are presumably .gov and .edu links and large corporate sites; all of these give a natural advantage to a website if you are well connected and can get a link. Likewise, folks in the SEO and SEM community know their way around and can easily get links. But what about the small, enthusiast webmaster, the small business or hotel and small community sites? How are they going to get quality links to their sites? They have to rely solely on reciprocal links with similar sites. From what I can gather from this blog, these changes will wipe out all of these sites. But why? They are the life blood of the web – they are what keeps it going. G will stifle the web’s diversity if you are not careful.
We’ll end up with a web of fully optimised, cloned, corporate brochure sites and thousands of blogs talking about the web in the good old days!
Anyway, that’s enough of that. Have a great break – we expect to see some nice pics when you get back. Oh, and don’t take any electronic devices with you – camera excepted!
regards dave
I think this is why I like MSN. They seem to rank their pages based on what the page is about rather than spammy linking techniques that seem to work in some other engines.
My site (yes, talking about my site) for example – there are only 2 websites on the subject in the whole of the WWW. MSN recognise that my site is relevant to the topic, whereas Google doesn’t see it as relevant at all. In fact, Google decided to drop the pages it once indexed – now I read that this could possibly be because Google doesn’t think I have enough high profile IBLs?
I’m not trying to knock Google, because I like Google in general, but I know when people search for things relating to my site, and knowing they would be glad to find it, they won’t, because Google dropped the page and doesn’t rank it.
Just another thought to throw in – how can people naturally link to a site they can’t find?
All right Matt, this is a preventive intervention! I’m not asking to go into Google-purgatory, just having some fun, because sometimes you have to laugh to keep from crying.
I was reading an SEO forum where a discussion came up regarding link-bait and you. Well, my gears started turning (the engineer in me), and I threw up a quick blog post with my attempt at graphic arts.
I won’t spam your blog with the link, but if you are interested, it’s one click away from my URL.
John
I totally agree with DavidW and the rest of the folks who wrote about dividing the net into haves and have-nots. At the moment it all boils down to whether you play the backlink game or not. But even if you want to play that game – for some non-commercial sites with good content that’s just not feasible. If you’re in a niche like us, with an enthusiasts’ Audi S and RS models website, it’s quite hard to get decent links, and it gets even harder if you’re in a niche and your website language is not English. Where should we get that many high PR links from to get a deep Googlebot crawl into our discussion board topics? English websites usually don’t link to us or blog about us. Of course we’re using sitemaps, but that obviously doesn’t help as long as good IBLs are missing. It’s a lot easier if you play the backlink game in the English market because it’s so huge.
Cheers,
Jan
I totally agree with Nicky.
I’ve been checking whois data on websites that have lost their indexed pages and those that are mostly intact or at least show a load of old pages.
So far anything less than 6 months old has been dropped, and anything older is still there or has a load of old supplementals showing.
What does this mean? You don’t get indexed until 6 months from registering your domain?
Matt, just a comment for your consideration…
If as you say there’s no server crisis or problem storing data at Google, then how do you actually see the new crawling method as benefiting users in terms of relevancy?
Certainly there’s a lot of “spam” content that wants to be indexed, but there’s also lots of new “good” content that wants to be indexed, too.
It used to be the case that you could get listed at directories, get link exchanges, or buy a couple of ads to help the spiders know you were there.
Google seems to be intent on killing these methods.
In which case, how on earth is a new and useful site supposed to get useful links?
The suggestion seems to be that a site must be exemplary to get the links and indexing, but surely you are aware how difficult it is for newer sites with good content to be exemplary?
Not ranking sites for some types of links was one issue – it’s understandable – but not even indexing sites to any degree on those grounds isn’t going to be helpful for anyone.
It used to be the case that Google wanted to index the entire web – access the content that was normally difficult for search engines to find – and crowed about the huge size of its index.
But now that index is backfilled with supplementary junk that very commonly consists of nothing more than long-dead URLs and 404s. And this type of content is preferable to new content?
I have to say, the situation does sound more like a server problem and the indexing issue is simply Google’s immediate response to addressing the problem. In which case, I can only hope this is true, and that normality will return, because otherwise you will simply continue to provide less and less relevancy in your results.
2c.
The core of my disappointment is that the Internet is no longer a level playing field. ‘Back in the day’, the Internet provided an unprecedented business opportunity for anyone with a little gumption and willing to put in the time and effort. By way of perseverance and elbow grease, and minimal capital (depending on how much I did myself), I could build a site that could compete with the ‘big boys’.
That’s no longer the case. Developing an online business now is like trying to open a hardware store next door to Home Depot. Site age, backlinks, link age, link churn, degrading purchased and reciprocal links and other filtering factors have more and more of an influence on position, while actual content seems to matter less.
Google isn’t Walmart. Google is the Department of Transportation, and all the roads it’s building lead more and more to Walmart and less and less to Mom and Pop’s Tool Emporium.
The Internet started as a democracy, with everyone equal. In almost any eCommerce or service segment, however, it’s evolved into a monarchy, with stores like Amazon, eBay, Walmart, Target etc. ruling as kings while the serfs fight over the scraps and try to eke out a living.
Hi Matt,
I read with interest the bit about URLs with hyphens and the issue there had been.
I checked my sites and sure enough, those with hyphens seemed to be hit hard with pages removed, one site down to one page only.
You suggested there was a quick fix, but up to now there has been no difference in my sites.
Do you think the main fix will make a difference?
Or are you suggesting that where my sites are now is their normal state and this issue has now been resolved?
Also, is there a time difference between the UK and the USA?