Notifying webmasters of penalties
If you don’t want to read the full post, the executive summary is that Google’s Webspam team is working with our Sitemaps team to alert some (but not all) site owners of penalties for their site. In my world (webmasters), this is both a Big Deal and a Good Thing, even though it’s still an experiment. Sign up for Sitemaps to try it out. Oh, and the Sitemaps team introduced a bunch more new and helpful features too. Check out the Sitemaps Blog for more info.
The responsibility of picking “Don’t be evil” as an informal motto is that everybody compares Google against perfection, not to our competitors. That’s mostly a good thing, because it keeps us working hard and thinking how we would tackle each issue in the best possible way. Lately, I’ve been thinking a lot about how the ideal search engine would communicate with webmasters.
There’s a Laurie Anderson song called “The Dream Before” based on a quote by Walter Benjamin. Part of it goes
History is an angel being blown backwards into the future.
History is a pile of debris,
and the angel wants to go back and fix things, to repair things that have been broken.
But there is a storm blowing from Paradise, and this storm keeps blowing the angel backwards into the future.
And this storm is called Progress.
In the early days when Google had 200-300 people there was no way we could do everything we wanted to do. But as Google grows, we get more of a chance to “go back and fix things,” to build the ideal search engine. And part of doing that is having more and better communication with webmasters.
I believe the ideal search engine would help site owners debug and diagnose crawl problems, and the Sitemaps team has made great strides with that in Google’s webmaster console. But I think the ideal search engine would also tell legitimate site owners when they risk not doing well in Google.
For example, I recently saw a small pub in England that had hidden text on its page. That could result in the site being removed from Google, because our users get angry when they click on a search result and discover hidden text–even if the hidden text wasn’t what caused the site to be returned in Google’s results. In this case it was a particular shame, because the hidden text was the menu that the pub offered. That’s exactly the sort of text that a user would like to see on the web site; making the text visible would have made the site more useful.
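For site owners who want to check their own pages before anyone else does, here is a minimal Python sketch that lists text sitting inside elements hidden by an inline style. It is only an illustration: the class name `HiddenTextFinder` and the list of style fragments are assumptions made for this example, it ignores external stylesheets and color tricks, and it says nothing about how Google actually detects hidden text.

```python
from html.parser import HTMLParser

# Inline-style fragments that commonly hide content. This short list is an
# illustrative assumption, not Google's criteria.
HIDING_STYLES = ("display:none", "visibility:hidden")

# Elements without a closing tag; they must not change the nesting depth.
VOID_TAGS = {"br", "img", "hr", "input", "meta", "link"}


class HiddenTextFinder(HTMLParser):
    """Collects text found inside an element that has a hiding inline style."""

    def __init__(self):
        super().__init__()
        self.depth = 0          # how deep we are inside a hidden element
        self.hidden_text = []   # text fragments a visitor would not see

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        style = (dict(attrs).get("style") or "").replace(" ", "").lower()
        # Once inside a hidden element, every nested element is hidden too.
        if self.depth or any(fragment in style for fragment in HIDING_STYLES):
            self.depth += 1

    def handle_endtag(self, tag):
        if tag in VOID_TAGS:
            return
        if self.depth:
            self.depth -= 1

    def handle_data(self, data):
        if self.depth and data.strip():
            self.hidden_text.append(data.strip())


page = (
    "<div><p>Welcome to the pub</p>"
    '<div style="display: none"><p>Menu: fish and chips</p></div></div>'
)
finder = HiddenTextFinder()
finder.feed(page)
print(finder.hidden_text)
```

If a check like this turns up something such as a menu, the fix suggested above applies: make the text visible, since it is content visitors would want anyway.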
That’s an example of a legitimate site. On the other hand, if the webspam team detects a spammer that is creating dozens or hundreds of sites with doorway pages followed by a sneaky redirect, there’s no reason that we’d want the spammer to realize that we’d caught those pages. So Google clearly shouldn’t contact every site that is penalized–it would tip off spammers that they’d been caught, and then the spammers would start over and try to be sneakier next time.
The way that we’ve been tackling better communication over the last few months is by testing a program where we try to email some penalized sites that we believe are legitimate. The issue is that it can be hard to contact a site by email: some sites don’t give any way to contact them, and some sites don’t receive/read/respond to the emails that we send. Overall, the experiment has been very successful, but email has definite limitations.
The Webspam team and the Sitemaps team have been working together for several months on a new approach: we are now alerting some sites that they have penalties via the webmaster console in Sitemaps. For example, if you verify your site in Sitemaps and then are penalized by the webspam team for hidden text on your pages, we may explicitly confirm a penalty and offer you a reinclusion request specifically for that site.
I’m really happy about this new way to communicate with webmasters, even though it is a test for now. If the initial results are positive, I wouldn’t be surprised to see us gradually broaden this program.
Here are some questions from a webmaster perspective:
Q: Are you going to show every penalty for a site in the webmaster console?
A: No. Our program to alert webmasters by email has been successful, and this new program is a natural extension of that, but we’re still testing it. We are not confirming every site that is penalized for now, and I don’t expect us to in the future.
Q: I don’t understand why you wouldn’t show every single penalty to every single site owner that asks?
A: Let me give you a couple examples to illustrate why. First, let’s take an example of a site that we would like to confirm a penalty for. Check out this site:

This is a small hotel. They offer 18 bedrooms in Bath, England, for you to rest and relax. It’s a real site for a legitimate business. But notice the hidden text at the bottom of the page where I’ve highlighted in red. This is a perfect example of a site that should be able to find out that their page conflicts with our quality guidelines. Google wants this hotel to know about potential violations of Google’s webmaster quality guidelines on its site.
Now let’s look at an example site that we wouldn’t want to notify if they were penalized:

From this picture alone, you can see that the site is doing
- keyword-stuffing
- deliberately including misspellings
- nonsense or gibberish text, probably auto-generated by a program
- you might be able to guess from the left-hand side and all the variants of “tax deferred” that there are many other pages like this. You’d be right: the site has thousands of doorway pages.
What you can’t tell from the snapshot is that
- the site owner attempted to gather links by programmatically spamming other sites. Specifically, the site owner found a vulnerable software package on the web that doesn’t yet support the nofollow attribute for untrusted links, and then spammed several good sites trying to get links.
- this site is also cloaking. Search engines get the static page loaded with keywords that you see. Users get a completely different page.
- the pages returned to users employ sneaky redirects. Users get a small page with a JavaScript redirect and also a meta refresh; each page just does a redirect to the root page of this domain.
- Given all this, would it surprise you to find out that when a user finally arrives at the root page, every single link that they are offered is a link that the spammer makes money from?
Needless to say, I’d rather not tip off spammers like this when we find their pages.
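To make the "small page with a JavaScript redirect and also a meta refresh" pattern concrete, here is a rough Python sketch that flags both signals in a page's HTML. The class name `RedirectFinder` and the regular expression are assumptions for illustration; real spam pages often obfuscate their scripts, and this is not a description of how the webspam team finds them.

```python
import re
from html.parser import HTMLParser

# Very rough pattern for script-driven redirects such as
#   window.location = "/";   or   location.replace("/");
# Obfuscated scripts will slip past it.
JS_REDIRECT = re.compile(r"location(?:\.href)?\s*=(?!=)|location\.replace\(")


class RedirectFinder(HTMLParser):
    """Records meta refresh tags and simple JavaScript redirects."""

    def __init__(self):
        super().__init__()
        self.in_script = False
        self.meta_refresh = []    # content values of <meta http-equiv="refresh">
        self.js_redirect = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "meta" and (attrs.get("http-equiv") or "").lower() == "refresh":
            self.meta_refresh.append(attrs.get("content") or "")
        elif tag == "script":
            self.in_script = True

    def handle_endtag(self, tag):
        if tag == "script":
            self.in_script = False

    def handle_data(self, data):
        if self.in_script and JS_REDIRECT.search(data):
            self.js_redirect = True


doorway = (
    '<html><head><meta http-equiv="refresh" content="0; url=/">'
    '<script>window.location = "/";</script></head>'
    "<body>tiny page</body></html>"
)
finder = RedirectFinder()
finder.feed(doorway)
print(finder.meta_refresh, finder.js_redirect)
```

Neither signal is spam on its own, since plenty of legitimate pages redirect. It is the combination with cloaking and thousands of doorway pages that makes the example above so clear-cut.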
I hope these two examples give you some idea of the sites that we’d like to alert (and not alert) to issues with their site. Just to repeat: not every site with a penalty will receive confirmation and the offer of a reinclusion request. But if this program works well, we’ll certainly look for ways to keep improving communication with legitimate site owners while not tipping off spammer sites.
Q: Okay, okay, I understand that not everyone will be notified of penalties, and that it’s a test. What will it look like if I do have a spam penalty?
A: In the webmaster console, once you verify a site, click on the tab labeled “Diagnostic” and one of the page sections is called “Indexing summary.” The specific text will say
No pages from your site are currently included in Google’s index due to violations of the webmaster guidelines. Please review our webmaster guidelines and modify your site so that it meets those guidelines. Once your site meets our guidelines, you can request reinclusion and we’ll evaluate your site. [?]
Submit a reinclusion request
If you find the issue and clean it up, then just click on the “Submit a reinclusion request” link and fill out the form.
(Someone asked me this at a recent conference, so I’m throwing it in.)
Q: I’m the SEO for a client’s site; can I enroll my client’s site in Sitemaps on their behalf?
A: If you have the ability to upload files to the root directory for the client, then yes. Just log into Sitemaps, add the site, and you’ll get a file to upload to the root level of the domain. Multiple people can verify the same site in Sitemaps, so both client@gmail.com and seo@gmail.com could sign up and get Sitemaps stats for a domain, for example.
Craig Wilcox Said,
April 26, 2006 @ 11:46 am
Great news. I was just talking last night about how google might have to disclose a little more about how it ranks sites due to the recent lawsuit from the link farmer. (I’ll trust your explanation of why you’re doing this, though, rather than my CYA theory.) Now, if you’d just tell me and others how to get out of the non-existent sandbox that’s been holding my sites hostage for 10 months, I’d be most appreciative!
Chris Harris Said,
April 26, 2006 @ 11:48 am
Great post matt, Very helpful - never had any experience getting my client sites penalized. Will pass on this information to my friends though.
Your post mostly deals with onpage spam. How do you deal with offpage spam like comment spam, crosslinking lots of domains, etc.? Does the Sitemaps team recognize and notify about this?
Sina Said,
April 26, 2006 @ 11:53 am
Great stuff…the sitemaps team are really working wonders. I’m looking forward to seeing what they come up with next.
Scott Springer Said,
April 26, 2006 @ 11:53 am
Very informative post Matt. Thank you! Sitemaps has been a very useful tool for us and I use it every day. It’s definitely a breath of fresh air to know that our site has no errors!
Keep up the good work,
Scott
Pex Cornel Said,
April 26, 2006 @ 12:05 pm
Hi Matt.
Thanks for the info.
I recently bought a domain, and it’s unusual that after 3 months none of the pages is indexed, not even if I search for the domain name.
So I mailed Google with THE QUESTION:
Is my site banned?
From what I understand, if there is no info about that in Sitemaps, does that mean it’s not banned?
Thanks.
Pex
Ken Said,
April 26, 2006 @ 12:20 pm
C’mon Matt, it’s not hidden text, it’s tinytext!
Splasho Said,
April 26, 2006 @ 12:21 pm
It’s great to see Google doing so much to combat this and also that it is being discriminating and enforcing rules based on the level of intent.
But I have to say the new SiteMaps design is UGLY. It’d be nice if they made the sludge colour more like GMail’s blue.
Tony Ruscoe Said,
April 26, 2006 @ 12:30 pm
Good work. However, since this has been introduced, I’m now seeing this error for some of my sites under Potential indexing problems:
“We can’t currently access your home page because of a robots.txt restriction.”
When using the “robots.txt analysis” tool, I click the “Check” button and the result says:
“Allowed by line 2: Disallow: Detected as a directory; specific files may have different restrictions”
(I’m pretty sure that it only used to say which line disallowed URLs to be checked…)
And here’s my robots.txt file which hasn’t been a problem in the past:
# BOF #
User-agent: *
Disallow:
# EOF #
Is this a known issue? I assume it’s just the tool that’s broken rather than Google being unable to read robots.txt files correctly…
(BTW, my other sites’ robots.txt files that actually contain Disallow: entries don’t seem to have a problem.)
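For anyone who wants to sanity-check a robots.txt file like the one above independently of the Sitemaps tool, here is a small sketch using Python's standard `urllib.robotparser`. The helper name `allows_homepage` and the example.com URL are placeholders chosen for this example, and the standard-library parser is not Googlebot, so treat the result as a second opinion rather than proof.

```python
from urllib.robotparser import RobotFileParser

# The file from the comment above: an empty Disallow line blocks nothing.
ALLOW_ALL = "# BOF #\nUser-agent: *\nDisallow:\n# EOF #\n"

# For contrast, a file that blocks the whole site.
BLOCK_ALL = "User-agent: *\nDisallow: /\n"


def allows_homepage(robots_txt, user_agent="Googlebot"):
    """Return True if the given robots.txt text lets user_agent fetch '/'."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    # example.com is a placeholder; only the path matters for the rule check.
    return parser.can_fetch(user_agent, "http://www.example.com/")


print(allows_homepage(ALLOW_ALL))  # True
print(allows_homepage(BLOCK_ALL))  # False
```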
Adam Said,
April 26, 2006 @ 12:32 pm
Wow, that’s great to know Matt. I use sitemaps and thankfully (cross my fingers) have never had a site banned. Now at least I know where to go and what to do if it happens.
Mr. Football Said,
April 26, 2006 @ 12:47 pm
Hey Matt,
Questions regarding hidden text and keyword stuffing.
I came across a fantasy football site when I accidentally misspelled a term in a search engine. The landing page was this:
http://www.ffspiral.com/typos.php
Obviously, there’s plenty of keywords there that lures in the visitors, but nothing is hidden and the site admits that these terms are all typos. Is something like this okay? If so, I’m adding it to my site!
I’d be interested in hearing your thoughts - thanks.
-Mr. Football
Ron Said,
April 26, 2006 @ 12:52 pm
So often it seems (or feels) like optimizing for Google is a “webmaster vs. Google” battle. I’m excited to see Google reaching out to webmasters with tools that help us achieve a common goal. You guys are doing a great job. I’ve really gotten a lot of useful information out of certain Google tools (Analytics, for example) and I’m looking forward to seeing how Sitemaps will help.
Matt Cutts Said,
April 26, 2006 @ 1:15 pm
Chris Harris, most of the stuff that we’ll be notifying people about is on-page stuff for now–things like hidden text and hidden links. Those are two of the most common mistakes that legit sites make.
Pex Cornel, if you don’t see a message in Sitemaps, it doesn’t mean that there’s not a penalty. Right now, we’re not telling every site that has a penalty about it. If you bought the site 3 months ago and still aren’t seeing any pages in Google, it’s possible that it does have a penalty. I’d read up more here about how to do a normal reinclusion request.
Ken, it’s some pretty tiny text.
Splasho, I’ll pass on the feedback about the colors.
Tony Ruscoe, I’ll ask someone about that. I know the Sitemaps folks read over here, too.
Nick Said,
April 26, 2006 @ 1:18 pm
I have the same problem with the robots.txt file. It allows all bots but I get an error.
Joseph Hunkins Said,
April 26, 2006 @ 1:21 pm
>> the ideal search engine would also tell legitimate site owners when they risk not doing well in Google.
YES it would. Another very helpful post Matt, and I think the sitemaps process is a good idea, though I fear most people who have problems (I’d guess 95% of your ranking complaints) fall into the “subtle algorithm issues” that are still addressed vaguely or not at all.
Some transparency is a virtue, even with the algorithmics. I’d guess that there is MORE spam from secretiveness because it creates an elite group of successful algo chasers rather than putting everybody on the same page, competing only on content quality issues.
nsusa Said,
April 26, 2006 @ 1:22 pm
>> What you can’t tell from the snapshot is that
>> - Specifically, the site owner found a vulnerable software package on the web that doesn’t yet support the nofollow attribute for untrusted links,
Hm, does that mean the directory software I installed a long time ago (long before nofollow existed) is causing my website to raise a flag and therefore is not showing up in any serps?
My site in question offers tons of information plus a local business directory (yes, the software is a little old, but it cannot easily be upgraded). If such a directory met the criteria you listed above, would that explain why the whole domain, with all the other content, is punished? If that is the case, let me know and I will remove the directory immediately.
Can you elaborate? Thanks
Remi Said,
April 26, 2006 @ 1:29 pm
Hi Matt,
Good to hear you guys are giving webmasters more and more information!
I got a question though; I hope it’s not off-topic. Since March 14th (BigDaddy ?) one of my sites has no descriptions left in the SERPs, you only see the title and uri. But it’s still ranking the same. We never use Black-hat stuff, so it can’t be a penalty.
What is happening here and what should I do?
Greg Said,
April 26, 2006 @ 1:39 pm
Matt,
Is there some quirk that causes a site to completely disappear from a Google search and then return to its place a few moments/hours later? I just freaked when searching for my site and finding it gone from Google. I panicked, sent a reinclusion request, and 3 minutes later my site is back in all the right places.
PS, thanks for this informative blog. I’m not a seo expert, just a Realtor with a web dominance dream.
Ryan Said,
April 26, 2006 @ 1:53 pm
nsusa, what matt was referring to there is a forum software or blog that posts user comments without the nofollow tag.
Thus, they created a bot to automatically comment on posts in the forum or blog, and put links to their site in all the comments.
Because of such bots, I’ve actually shut down some of my older sites. I had one bot alone fill up 10meg of my mysql database / day. It would post 1 comment on every post… every day.
Your outdated software isn’t what’s throwing up the flag.. it’s people who take advantage of old software and use it to post a ton of links to their site.
Michel Leblanc Said,
April 26, 2006 @ 1:55 pm
Thanks for the advice and for the graphic explanations that come with it. I will be doing a presentation on responsible SEO soon and your examples will speak loudly.
Regards,
Thomas Said,
April 26, 2006 @ 1:58 pm
Matt, can you please clarify your statement about Doorway Pages?
You said: dozens or hundreds of sites with doorway pages followed by a sneaky redirect
Google Guidelines say: Avoid “doorway” pages created just for search engines, or other “cookie cutter” approaches such as affiliate programs with little or no original content.
I am wondering if Google recognizes any legitimate use for doorway pages?
My own example is that we sell widgets across the United States. We know that state and major city names followed by the word widgets and a couple widget related terms are effective key phrases and that the people who search using those key phrases will be interested in our products. We would like to create simple landing pages on our website to help these potential customers more easily find us (no cloaking or redirects or stealth), but we have not done so because we are afraid of incurring a penalty.
T2DMan Said,
April 26, 2006 @ 2:05 pm
- Client’s site (in url) not even showing up for business name.
- Site several years old.
- No penalty showing on sitemap.
- Ranking well on other search engines.
- Properly cached pages.
- 1 1/2 years ago it was hit by a 302 link, and none of its pages have shown on searches at Google since.
- at the time of the 302 link, index page disappeared, and other pages progressively disappeared/decached. Got 302 page stopped, and after a year, pages started to be recached
- many site reinclusion requests and nothing changed
There must be a number of sites like this. It’s very frustrating, especially when the SEO is good enough to have pages ranking well on the other search engines.
What can be done about such situations???
phil Said,
April 26, 2006 @ 2:14 pm
Wow,
this is the coolest - doesn’t seem to be working yet (for me at least)….
thanks matt, thanks google
Craig Wilcox Said,
April 26, 2006 @ 2:18 pm
Speaking of disclosure, we should all go to the New Yorker caption contest at: http://www.cartoonbank.com/captioncontest/. Vote on #46.
This shows Matt that we want more disclosure from Google, but not TOO much. It’s good for a laugh at least.
Craig Wilcox Said,
April 26, 2006 @ 2:20 pm
Sorry, it’s http://www.newyorker.com/captioncontest/
Matt Cutts Said,
April 26, 2006 @ 2:34 pm
BTW, Barry Schwartz was live-blogging the Meet the Crawlers session at SES Toronto where Shiva first introduced this to an audience:
You can see the hidden text at the bottom of the constitution.org home page, including the words “constitutional compliance.” Who would SEO for a phrase like that?
Matt Cutts Said,
April 26, 2006 @ 2:43 pm
Michel Leblanc, happy I could give examples. Lots of time, people don’t realize just how spammy some sites can get.
T2DMan, was your site in anything like a digital automated link exchange network? Maybe in 2004?
nsusa, what Ryan said. BTW, Ryan, how did the interview go?
Greg, I’m guessing you were just hitting different data centers. Different centers can rank things differently.
Huvet Said,
April 26, 2006 @ 2:44 pm
Cooperation between different parts of Google is really a good thing, and features like this are the result. I’m hoping you’ll work together with the Analytics team next; you are overlapping a lot.
A question for the sitemaps team: I have been using Sitemaps for a long time (over 2 months I think) and I’m fully indexed and get crawled quite often. But the “Crawl stats” and “Page analysis” are almost empty of information. Is this some known bug or should I just continue waiting?
Oxford Said,
April 26, 2006 @ 2:45 pm
I agree with Splasho, the new look is ugly, but progress isn’t always pretty.
RE: The Tax deff3rd,
“..every single link that they are offered is a link that the spammer makes money from..”
Seems more and more of these crappy scraped sites are the results that turn up; glad to see it’s being combatted!
Stuey Said,
April 26, 2006 @ 2:54 pm
Matt,
Does this also include the CSS display:none tag as well?
Stuey
Ralf Said,
April 26, 2006 @ 3:22 pm
Not really an issue but a little annoying.
Searching in the help pages I couldn’t find really good information about this statement:
I wonder is this because of exclusion of a directory by robots.txt, while the excluded directory shows up as URI only on a site:domain search? All other pages have description and titles.
Kevin Said,
April 26, 2006 @ 3:38 pm
Ralf - One of my sites shows the same, but I was unable to find what pages it refers to when I did a site:domain search (all pages show full descriptions and cached copies).
I’m also getting an error saying that my sitemap could not be downloaded because it is “restricted by robots.txt”, but I can pull it up with no problem, and the Sitemaps tool shows no errors.
Matt, perhaps this is a bug?
Vanessa Fox Said,
April 26, 2006 @ 3:48 pm
Just wanted to quickly post here to mention that we’ve fixed the robots.txt issue and are in the process of refreshing the display of information. So if Sitemaps is incorrectly reporting that robots.txt is blocking your site, you should see an updated status shortly. The Sitemaps blog has more information:
http://sitemaps.blogspot.com/2006/04/updated-robotstxt-status.html
Thanks for your patience as we update the display!
Ryan Said,
April 26, 2006 @ 4:13 pm
Matt, I don’t think the phone call went too well..
he told me that my work experience was too light for what he was looking for (and i have to agree.. i graduated in Dec 2004 with my BS), but he sent me a worksheet anyway. He was concerned with my lack of python (1 semester in school), and my lax skills at user interface design. (I can critique the heck out of a UI and tell you what stinks and why it should be better… but graphically I can’t create…)
He recommended I apply for a different opening. Sadly, I wasn’t even sure what position it was before the phone call, as I had sent in a resume and it got forwarded to him.
I filled it out and emailed it back, and haven’t heard anything from him since.
I know I screwed up the last question on it though…. it only worked in IE not firefox, and I was running out of time before it was due…
I guess it’s time to start monitoring the Google jobs website and keep sending in those resumes.
I have my BS in CS, but i’m going back next fall for my MBA (and possibly a MS in Industrial systems engineering too.. cuz it’s only 6 more classes)
So maybe that will help. (but if your team has any openings, I can fight spam with the best of em!)
Thanks..
Ralf Said,
April 26, 2006 @ 4:15 pm
hi kevin
Not a bug, but a matter of time. I’m sure this will resolve itself, but it shows what may happen if too much time is involved in executing a complex program.
Kevin Said,
April 26, 2006 @ 4:28 pm
Ralf, I agree… I suppose we just have to be patient (arrrgh!).
Vanessa - thank you for the update!
Ryan Said,
April 26, 2006 @ 4:37 pm
Yeah Ralf, I’ve been wondering about that myself.. Trying to think what could cause that..
I’ve noticed a trend with my partially indexed pages on noslang. doing site:www.noslang.com (and including the excluded results) shows no description for the “partially included” results.
I’ve also noticed that most of these pages don’t have many (if any) incoming links from other sites. That’s really the only constant I’ve seen on my site.
It makes sense (sorta), as it’s not entirely natural for people to only link to your front page. However, I can’t see Google rationally penalizing them for that, as in many cases the site owners discourage deep linking to these pages. (it bypasses ads)
Even more confusing.. these pages show up with no cache or description using the site: operator, but searching for a string of text unique to a page, google shows the page with cache and description.
I’m guessing that’s what partially indexed means? I’m also guessing that it’s keeping these pages out of results for broad queries..
But again.. this is only speculation.
Glenn Ford Said,
April 26, 2006 @ 4:56 pm
Hi Matt,
I think a better way is to not require people to enter sitemaps initially. There should be a standard query in google or maybe even *part* of the existing “site:” command that gives the user notice as to whether they should take a next step because there are some penalties. The next step could then be to join sitemaps. Not all of us want to do sitemaps for all sites and lots of our clients who we develop sites for don’t want to a) pay us or b) care about sitemaps THEN a 2nd gen owner/company comes in to clean up the mess and is now screwed and probably has no clue.
A simple query at google for a given site should provide enough information as to appropriate next actions. Such as contacting google or joining sitemaps for that particular site.
Regardless, I think this is a good step but lacks the simplicity that is required for the millions of webmasters that never see this forum or know any better.
Cheers!
Ralf Said,
April 26, 2006 @ 4:58 pm
Ryan, confirmed
First, I thought partially indexed may be 50% or what ever.
Sometimes Google’s explanations are very hard to understand. Maybe folks at Google have too much IQ; they cannot imagine people like us with less brain.
Chris Bartow Said,
April 26, 2006 @ 5:14 pm
I’m confused by how it can say that my sites are partially indexed. How does it know?
It must know that pages exist on the site that haven’t been indexed, so why doesn’t it just crawl them?
Michael Said,
April 26, 2006 @ 5:18 pm
Matt, kudos to Google for the notification work!
I’d echo Thomas’ comment regarding near-duplicate pages. In my industry, I need a way to deal with synonyms without being an evil naughty spammer.
Specifically, I have pages about “wedding registries”, which to a human are exactly = “bridal registries”. To make it tougher, if I have a page that talks all about our WEDDING REGISTRY (singular would be the natural form to use on such a page), I have a hard time getting it to rank well for WEDDING REGISTRIES (the plural is the more common search!).
Now, I can gen 4 pages for the crawlers, subbing in a variable throughout the text, title, and meta tags:
{wedding, bridal} X {registry, registries}
But that’s gonna make Matt very angry [if we get caught!], and we don’t like making Matt angry
I realize that solving the singular/plural problem is tough, and the synonym problem is an order of magnitude tougher for you folks, and will necessarily take some time to solve. But realistically, do you have advice for us in the meantime?
Matt_Not_Cutts Said,
April 26, 2006 @ 5:21 pm
Hey Matt, I was just fiddling around in the sitemaps section of Google and noticed that one of the “tools” was a ‘report spam in our index’ link, which goes to a different form and posts to a different location than the standard ‘report a spam result’ page. Is this because you think people who are comfortable enough with showing Google every nook and cranny of their website aren’t likely to be spamming, and are therefore a more reliable source of anti-spam information? Are spam reports received through the sitemaps given more weight/credit/urgency?
Ian Said,
April 26, 2006 @ 5:21 pm
Partially indexed means they have seen a link to the page from other sites, and include the URL in the SERPs but have not yet indexed the content of the page(s). There are times when they will decline to index that content, because of something that has happened in the past, or other “bad” indicators about the site.
**A simple query at google**
Have you tried:
site:domain.com
site:domain.com -inurl:www
site:www.domain.com
yet?
Those three searches can tell you a massive amount about how a site is indexed, show you problems with duplicate content, and show several other things.
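If you check several domains regularly, the three queries can be generated rather than typed. The sketch below is only a convenience: the helper names `site_queries` and `search_url` are made up for this example, and it simply builds the query strings and ordinary search URLs to open in a browser.

```python
from urllib.parse import urlencode


def site_queries(domain):
    """Build the three diagnostic queries for a domain, with or without www."""
    bare = domain[4:] if domain.startswith("www.") else domain
    return [
        f"site:{bare}",              # everything indexed for the domain
        f"site:{bare} -inurl:www",   # indexed URLs that lack "www"
        f"site:www.{bare}",          # only the www host
    ]


def search_url(query):
    """Turn a query string into a Google search URL."""
    return "https://www.google.com/search?" + urlencode({"q": query})


for query in site_queries("www.domain.com"):
    print(query, "->", search_url(query))
```

Comparing the result counts of the first and third queries is a quick way to spot www and non-www versions of the same pages being indexed separately.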
Thomas Said,
April 26, 2006 @ 5:28 pm
Hey Ian, would that be the famous SEO/SEM Ian from Portent?
Ryan Said,
April 26, 2006 @ 5:31 pm
Michael, if you want to rank for “wedding registry” or “bridal registry” or whatever… I might suggest a complete redesign of your site.
I know this isn’t what you want to hear, and i’m not trying to be negative, but to a human visitor it appears as if your site is more about wedding travel than wedding registries. In fact, I couldn’t even find any combination of those words right next to each other anywhere on the site, and I didn’t find them at all without the toolbar’s highlight function.
If you approach it with “i want to build the most useful site for term X” instead of “i want to rank for term X” you’ll do a lot better.
Side Note: Some of us regulars should start an seo critique site: whydoesntmysiterank.com or something… and offer a free critique every week or so.. I bet it would be pretty successful… and the extra PR (not pagerank), would be pretty good.
Adam? Harith? Aaron? What say you?
Chris Smith Said,
April 26, 2006 @ 5:39 pm
Matt, you accidentally left the h t t p : / / off of the Google Sitemaps link at the top of your blog — you might want to fix that.
Great news — I’m really impressed by the features you guys are adding onto the Sitemaps service! I hope this trial works out so that you’ll continue to increase transparency (where possible) for webmasters and SEOs.
Muchas gracias!
Chuck Ayoub Said,
April 26, 2006 @ 6:11 pm
Hi Matt,
Has Google ever thought of publishing a list of domains that have been banned in their index? I made the mistake of picking a domain that was previously owned. After having my code there not get picked up into the index I sent a note to Google as to why it was not indexed - and then I got the form letter back saying it was banned. I had to move to a different domain to get indexed. I thought of doing a new site which would be a repository of domains that were banned which might make it very helpful for people not to make mistakes of picking expired domains which were bad - but it wouldn’t be as current as Google doing this themselves.
Or maybe Google could re-evaluate old domains to see if the issues on them have been cleared up by new owners? Just a thought.
Chuck
Jonathan Said,
April 26, 2006 @ 6:21 pm
That’s awesome Matt. I’ve been waiting for something like this for quite sometime now. I have a site specifically that had over 15,000 indexed and then suddenly it dropped down to 850 indexed, no idea why. I hope this penalty notification will be out soon. I think it’s a great idea
Pythod Said,
April 26, 2006 @ 6:40 pm
The site that Mr Cutts took as an example here is http://www.villamagdala.co.uk/
Matt Cutts Said,
April 26, 2006 @ 6:50 pm
Stuey, if display:none is used to hide text, that can cause issues.
Ralf, I see that message too. It just means that there can be some uncrawled urls for a site, which in general is not a big deal.
Vanessa, thanks for the update! Now that’s working together.
Ryan, are you still in Michigan? Or are you expecting to be in the Bay Area?
Glenn Ford, fair feedback. We wanted a way to communicate penalties to a webmaster without showing it to the world, but I see where you’re coming from.
Michael, we actually do a pretty good job on plurals/singular and synonyms. My advice would be not to make a page for the cross product of all those words, but to make one essay which (naturally) incorporated all of those terms. People fixate on a single page for a single keyword phrase, when one nice essay page could do a good job on bridal+wedding+registry+registries. The title might be a bit awkward: “Bridal Registries: Is each wedding registry created equal?” but doable, and then there’s plenty of room on the page to include each of those words in natural text.
Matt_Not_Cutts, remember that you sign in to Sitemaps, so we have a little more information than just a web form. I wouldn’t be surprised if spam reports via Sitemaps could be given more weight. Give us a few weeks to hook those reports into our system though.
Chris Smith, that’s the idea. I’m pretty psyched too. I’ve been wanting to get this out for a while now, and thanks to the Sitemaps folks (with a little bit of assistance from webspam), I love that it’s happening. Thanks for mentioning the h t t p, by the way.
Rick, I’m going to go ahead and prune that comment. I’d save it for a grab bag thread. No sigs, either please.
Tom Davis Said,
April 26, 2006 @ 6:52 pm
Indexing summary:
No pages from your site are currently included in Google’s index. Indexing can take time. You may find it helpful to review our information for webmasters and webmaster guidelines. [?]
Googlebot last successfully accessed your home page on Dec 1 .
am…… so ……… very………….. tired
Ryan Said,
April 26, 2006 @ 7:14 pm
Matt, I’m still in Michigan, but have enough vacation time, and would love to see the Bay Area.. I think I could get free airfare too.
I’ll be in Vegas in July.. that’s closer (lol)
If there’s a good reason for me to be somewhere, I can get there!
Why? What’s up?
Adam Senour Said,
April 26, 2006 @ 7:31 pm
On the one hand, it would be a cool feature for Matt’s site.
On the other, it’s already being done as part of a more comprehensive review over at HEDir to a certain extent.
So I’m not really sure how I feel about it. I’ll defer to the opinion of the others and Matt. If they’re down, I’m down.
Dan Said,
April 26, 2006 @ 7:43 pm
Matt,
I’m interested to know why the following happens.
If I type http://www.google.com/webmasters into my browser’s address bar, I get the “Google Information for Webmasters” page. However, if I search google for “sitemaps” and click on the sponsored link that is returned (which has a url claiming to be http://www.google.com/webmasters), my cookie is used to take me to my Google Sitemaps homepage.
Isn’t this in itself contrary to google’s guidelines?
Dan.
Dave Said,
April 26, 2006 @ 7:52 pm
RE: “Needless to say, I’d rather not tip off spammers like this when we find their pages.”
I agree Matt. However, the site might also be victim of the spam (just like Google are) due to a shonky SEO “pro”. In these cases, I would love to see Google somehow get the name of the ’so called’ SEO from them and address the problem at the root? This way, you could kill the spam at the source rather than simply trimming the branches!
I know that Webmasters are held responsible in these cases but I really believe that is the wrong approach. Besides, a shonky car mechanic WOULD be accountable by law if they did unprofessional work which caused a car accident.
arcade boards Said,
April 26, 2006 @ 8:50 pm
oooo I can’t wait for the day an engine tells a webmaster where they are badly going wrong… with so much BS online about this and that you kinda end up hating search engines and forget about using them altogether. I’m surprised there isn’t a better DANGER list for the Google engine. At least it would help folks set up their site better. Maybe a Google bot helper tells you off for getting certain things wrong…. might be worthwhile hehe
Dynamic urls
keyword stuffing
Over anchor text pages
Excessive keyword link and predominance.
Threshold limits per content format
Nav placement
Table structures
Priority of tags
Anchor variables
Landing page inbound and foundation keyword matching
The list goes on
etc etc
dam maths
blah blah If only there was a more detailed rule that all webmasters should know about instead of Google’s plain jane approach.. it would mean many webmasters would get their crap together. One of my sites was dropped yet is still doing well in Yahoo and MSN…. funny thing is the pages are generated yet very accurate to the keywords and still provide info and the right products to the end user ….yet it was dropped :0(
help me obi wan you’re my only hope
Win a prize and realise the mistake I’ve made with this site (ok maybe not a prize) but at least some respect for noticing
http://www.products-directory.co.uk/
alek Said,
April 26, 2006 @ 9:07 pm
Matt/Vanessa,
1. Does it ever make sense to do a proactive re-inclusion request to signal to Google that we play by the rules and we are a real person?
2. I have a site that Sitemaps says “Googlebot last successfully accessed your home page on Apr 18″ but according to the Apache log data, he has hit that page dozens of times since. And yes, I did a reverse-IP lookup to grep out the legit GoogleBot versus the User-Agent spoofers.
3. Minor nit - bottom of Sitemaps Blog has copyright 2005.
4. Consider allowing comments on the Sitemaps blog rather than pushing comments into Google Groups. The latter is good for general discussion, but it would be nice to have relevant comments attached to the posts.
Took a little getting used to the new Sitemaps interface, but I like it. Been fun watching Sitemaps develop - nice work Vanessa (saw you chime in above) and other Googlers.
alek
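The reverse-lookup check alek describes in point 2 can be sketched roughly as below. This is a minimal illustration, not his actual script: the function name `is_real_googlebot` and the injectable resolver parameters are hypothetical, and the accepted hostname suffixes are an assumption based on the commonly described reverse-then-forward DNS check.

```python
import socket

# Assumption: genuine Googlebot hosts reverse-resolve to one of these domains.
GOOGLE_SUFFIXES = (".googlebot.com", ".google.com")


def is_real_googlebot(ip,
                      reverse_lookup=socket.gethostbyaddr,
                      forward_lookup=socket.gethostbyname_ex):
    """Return True if `ip` passes a reverse DNS check followed by a forward check.

    The resolver functions are parameters so the logic can be exercised
    without network access; by default the standard socket calls are used.
    """
    try:
        # gethostbyaddr returns (hostname, aliases, ip_addresses)
        hostname = reverse_lookup(ip)[0]
    except OSError:
        return False  # no reverse record at all

    if not hostname.lower().rstrip(".").endswith(GOOGLE_SUFFIXES):
        return False  # User-Agent claimed Googlebot, but the host is elsewhere

    try:
        # gethostbyname_ex returns (hostname, aliases, ip_addresses)
        forward_ips = forward_lookup(hostname)[2]
    except OSError:
        return False

    # The forward lookup must map back to the address seen in the log.
    return ip in forward_ips
```

Running each "Googlebot" address from an Apache log through a check like this separates real crawler hits from User-Agent spoofers; it says nothing about why the date shown in Sitemaps lags behind the log.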
arcade boards Said,
April 26, 2006 @ 9:32 pm
Google requests are a waste of space…..
Tried so many times and never got a reply, not that I blame them…. I mean there must be what 25 million webmasters…. say they get 25000 emails a day…. geez I hate to think of how much of a headache that must be lol
Anyways that’s my 2 cents and email requests can hurt
Thomas Said,
April 26, 2006 @ 9:57 pm
I normally would not do this here and Matt, please feel free to delete this post if I have crossed the line.
Since Ryan said: Side Note: Some of us regulars should start an seo critique site: whydoesntmysiterank.com or something… and offer a free critique every week or so.. I bet it would be pretty successful… and the extra PR (not pagerank), would be pretty good.
I invite everyone to visit
http://www.SEOcritique.com/forums
It’s not fancy or anything cause I just bought the domain and set it up.
Pex Cornel Said,
April 26, 2006 @ 11:15 pm
Matt, I have a question about “display:none”
I have a bug in my site: I use some background images to make it pretty.
The thing is that those images are downloaded last, so my page looks awful until the browser puts those images in place.
So I have to preload those images first, and I do it with this line:
So it’s no text, just images with display:none. Images that would show up anyway, not really hidden.
So by my logic, no ground for a penalty here.
Am I right?
Thanks.
Pex Cornel Said,
April 26, 2006 @ 11:17 pm
Somehow my line of code was filtered, sorry, here is the line again:
<img src="header_mainpage_1.jpg" style="display:none;" alt=""/>
General Public Said,
April 26, 2006 @ 11:52 pm
Hi Matt,
You are simply admitting that google search algorithms are broke! The google’s situation has deteriorated to the situation AltaVista was in when google made entry, it is as simple as that. Google stock is in the honeymoon phase else GOOG price would have fallen by few points overnight!
At AltaVista’s peak in 1998, there were only spammy commercial sites that were trying to gain an edge over competitors, using keyword stuffing in the Meta tags, keyword stuffing in the content was pretty low at that time. Non commercial content sites had pretty good standard, people were publishing content to express themselves. Successful Banner advertisement sites were popular sites generating traffic from “bookmark” repeat visitors this has not changed much even today, regular net user visits about 5-6 sites daily that’s it.
The introduction of Adsense suddenly changed the scenario, today most nonsense crap information is found on sites running Adsense, only purpose of those crap sites is to run Adsense and nothing else and google is squarely responsible for the debacle.
If Google wants to improve the quality of content on the net, Google should review sites running Adsense and ban crap sites. If scraper / crap content sites are banned by the Adsense program, most of those crap sites will disappear, and the quality of content available on the internet will improve!
Proves my point once again “no evil = no profits” “big evil = big profits”!
Charles Tran Said,
April 27, 2006 @ 12:31 am
Hi Matt,
I’m a human bot and I’m notifying you of a potential problem.
Potential indexing problems: One of the pages I tried to crawl returned a HTTP error. In particular, while attempting to follow your link to “Sign up for Sitemaps”, a http error code 404 was returned (it’s just missing the http://).
You may want to take a closer look at: the hyperlink.
Feel free to file a reinclusion request with this human bot with the subject line “convoluted requirements april 2006 update secret password abc exactly as is or the mail rule will send you to a blackhole”.
Charles
Jeff Preston Said,
April 27, 2006 @ 1:02 am
Thank you for the article explaining Sitemaps further. At first, I thought it was not necessary to submit because I thought my site was built in a spider-friendly manner. But after I submitted to Sitemaps, I found out a lot more about my site and was able to track down some bottlenecks and broken links through the HTTP errors page. The Diagnostic area has some great tools.
IMO, Google would generate A LOT of goodwill by notifying webmasters of potential penalties.
Thank you.
Aimee Said,
April 27, 2006 @ 1:07 am
The IP address for Blogspot can’t be visited from China,
so I can’t see the complete post about these penalties.
Eugenius Said,
April 27, 2006 @ 1:31 am
I will definitely try this.
I am working now with a website that has 2 versions of its homepage in Google (with and without http://www.). The problem is that the non-www version was last indexed on 5 July 2005. I redirected non-www to www, but I’m not sure if Googlebot will ever see the redirect, as Google hasn’t reindexed this page in almost a year.
Will the system notify us about this kind of problem?
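The non-www to www redirect described in this comment could look something like the sketch below. It is only an illustration of the decision a server makes: the function name `canonical_redirect` and the host `www.example.com` are hypothetical, and a real site would normally configure this in the web server rather than in application code.

```python
def canonical_redirect(host, path, canonical_host="www.example.com"):
    """Return (301, location) when the request host is not the canonical one.

    Returns None when the request already uses the canonical host, meaning
    the page should be served normally.
    """
    if host.lower() == canonical_host:
        return None
    # A permanent (301) redirect signals that the www URL is the one to keep.
    return 301, f"http://{canonical_host}{path}"
```

For example, `canonical_redirect("example.com", "/page?x=1")` yields a 301 pointing at the www URL, while a request that already arrives on `www.example.com` is left alone. Whether and when a crawler revisits the old URL to see that redirect is a separate question that this sketch does not answer.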
Jan Said,
April 27, 2006 @ 1:33 am
>2. I have a site that Sitemaps says “Googlebot last successfully accessed your
>home page on Apr 18″ but according to the Apache log data, he has hit that
>page dozens of times since.
Same here. The bot visited us many times since Apr 18. We also do a reverse lookup on every GoogleBot visit so we know it’s the real one. I believe the timestamp is the date when your title was last updated in the index and not when a particular site was last visited. We recently experimented with the title-tag on our main page (http://www.mysite.tld/default.asp) and it was around or at Apr 18 when the new site main title was taken into the index. Not the site description though.
Oh and a side note: as there are more and more problems with updating DMOZ site-descriptions wouldn’t it be more reasonable to drop the syncing from their site-descriptions? DMOZ has become a bad joke in recent years… we just can’t get rid of that very ugly spelling mistake in our site description. Nobody cares to update it
Dave Roberts Said,
April 27, 2006 @ 1:48 am
Thanks for the Update on Sitemaps Matt.
I gave a presentation on some Google products to my bosses yesterday, one of which was Sitemaps before it was updated. It’s great to see Sitemaps and other G products being updated after release.
Lata Said,
April 27, 2006 @ 4:38 am
It is a nice thing if Google has plans to do this. I am saying this ‘in general’ and, of course, for my own gain as well. I see sites being penalized and then taken back into the index fairly soon. My site has been out of Google’s index for more than one year now. It is a lifestyle-related content site and I cannot figure out what went wrong. It could have been the wrong type of redirection, but that was fixed a few months ago and there have been two updates since then.
I have written to Google several times, but what seems weird is that I did not get even a single answer telling me what the problem actually is. And this for over one year! Let’s hope this new drive on Google’s part brings something nice for me.
Andreas Said,
April 27, 2006 @ 4:40 am
> Stuey, if display:none is used to hide text, that can cause issues.
Sometimes it makes sense not to show the complete text for some parts of a page, but to allow users to toggle/expand the parts they’re interested in. Obviously the extended version of the text has to be hidden at first, usually via display:none. How does Google handle this?
Michel Leblanc Said,
April 27, 2006 @ 6:29 am
Question about duplicate content
1- Let’s say I write a lot of excellent stuff on a blog. My stuff is so interesting that a prominent news portal (with a high PR) reprints, with my permission, the full content of a post one day later. They do not link to the post itself, but they give me credit for the content in the form of my name and business information without links. This is very good for my credibility, but does it hurt my blog and get considered duplicate content, even though I am the rightful owner of the rights?
2- Let’s say I wrote several articles at my previous job, which posted my articles in the form of Web pages. I do not work there anymore but I keep all my rights to my content. Now I build a new web site and reprint in it my rightful content that already appeared on the previous web site that I have nothing to do with anymore. In a way we both share some rights to that content but I am the rightful author. Will that hurt the new Web site? How to legitimately keep the rights and post that content without being penalized?
Thomas Schulz Said,
April 27, 2006 @ 7:21 am
Great news, I am actually quite happy about this.
I have multiple shareware products, a site mapper among
them, which share parts of the same help file
(e.g. localization — i.e. how users can translate the program).
As I make my help files available on the website,
this has caused me some concern about duplicate content.
If one is “minor” penalized, it is only the page itself
(of which there is a “duplicate”) that gets penalized, right?
Wes Linda Said,
April 27, 2006 @ 7:32 am
Excellent concept from Google, I’m sure they’ll continue to improve these features as we move along in time. While some people feel Search is an old industry, it’s still quite new, and it’s nice to see innovation is still coming through.
I think this will be a great resource for legitimate sites that may not be aware of their actions, or, aren’t aware their actions are in poor taste.
As a web guy, I think my clients will love to see this, although, from what I’m aware, none of my clients are using techniques that should get them banned. I think I’m better than that. I hope.
Another great bit of info Matt. Thanks.
Martin Said,
April 27, 2006 @ 1:21 pm
“Ralf, I see that message too. It just means that there can be some uncrawled urls for a site, which in general is not a big deal.”
Hey Matt,
when you say it’s not a big deal, does it mean that 192 out of 5800 pages are indexed and the others are uncrawled? I would say it’s a very big deal. Or can we look forward to having these pages crawled and included back into the main index?
Maybe you can pass this to Vanessa (Sitemaps team). There is an error on that page:
http://www.google.com/support/webmasters/bin/answer.py?answer=34480&hl=de
The link does not show up (looks like a hidden link).
Thanx, Martin
Julie Said,
April 27, 2006 @ 1:47 pm
Hey Matt - that is really cool, and thanks for the information! KUDOS to the Sitemap team for helping us out
Trogdor Said,
April 27, 2006 @ 1:51 pm
I just couldn’t let this line go un-responded-to:
“even though it is a test for now”
um … as opposed to other items at Google that appear to be in permanent beta?
Sorry, couldn’t resist. I’m excited about the communication, though. I hope G keeps this program, and never stops making helpful use of it.
Ryan Said,
April 27, 2006 @ 2:12 pm
hey, permanent beta is great for a developer.. that means anytime somebody finds a bug you can say “Duh, it’s beta!” or anytime they recommend a cool feature you didn’t think of you can say “yeah that’s coming in the next version”
also.. you can quickly take it away too.. and say “it was beta…”
although, that would be lazy and irresponsible.
Michael Said,
April 27, 2006 @ 5:58 pm
Ryan, thanks for taking the time to look at my site! Actually, we rank pretty nicely for the registry-related phrases (we’re #1 for [honeymoon registries])…but it made for an easily understood example. Travel terms are what we’re really focusing on at present, which is why you see those all over.
Matt, we do have pages within the site specifically mixing those words, as you’ve suggested. In fact, we used to (gasp) cloak our home page and add a huge section of text and headings with all those combinations in it (but we’re better behaved now).
But those pages on our site which are “registry”-dense don’t end up being shown in the search results…interesting! My guess is that anchor text in IBLs is vastly outweighing the page content itself (we do have a lot of IBLs from partner companies with “registry” in the anchor text). Especially since our home page is what shows up in these SERPs and our home page is what our partners are generally linking to.
MC
Doug Heil Said,
April 27, 2006 @ 7:09 pm
hmm. I saw the member’s post about the “site critique” new forum. Is this something new to search engine forums? My understanding is that there are already many se forums out there who already offer “free site critiques”.
Am I mistaken about this?
This is real good stuff Matt. Just toe the line about “which” sites you actually notify. Some of them might be being helped by a SEO anyway.
A good suggestion by someone above; Website owners have to be responsible for who they hire to help them, but identifying the actual firm/SEO who may be spamming on their behalf would go a very long way to cleaning up our industry.
Sebastian Said,
April 27, 2006 @ 7:36 pm
Nice one!
Google listened to my idea and turned Sitemaps into “the” communication channel with webmasters.
I posted that idea here in a comment. If only I could find it….
Maybe I should try to get a job @ Google

Michael VanDeMar Said,
April 27, 2006 @ 9:53 pm
Matt,
Thanks for updating us on the sitemaps/spam collaboration, I think it’s great! Quick question though… I have a site that’s #1 in allintext, allinanchor, allintitle for the business name, but doesn’t show up at all when searching for it. The site is about 15 months old, and sites that scrape content from other search engines for queries that my site does show up on, and therefore have the title of my homepage in the text of their site, show up frequently. When I wrote Google help asking about it, they replied that since the site showed up when they did a search on the domain, that I could “be assured that your site isn’t penalized or banned from our search results”.
Both that reply and what you said earlier seemed to imply that penalized = banned.
1) Is banning the only penalty?
2) If not, would penalties be something that you would notify a webmaster about?
Thanks!
-Michael
hara_kiri_diy_seo Said,
April 28, 2006 @ 1:10 am
well done google
i remember the panic stricken words of one webmaster who wants, like me, to be pro-google anti spamdex
“I’d happily fix the problem if only i know what it was”
on a moral level, leaving people in the dark breeds fear of unjust punishment ….& risks engendering a user migration from google.
on a practical level, google has a very difficult job balancing it all up
on real terms, there’s also the reality that rank drops occur because of bad diy seo & not anything to do with penalties….. getting to read sitemaps well + google analytics can help here
so pleased about this
any more thoughts about my previous proposal of an up-to-date keyword rank checker on sitemaps?
best
10080:BTG174 Said,
April 28, 2006 @ 1:16 am
best post this year Matt
The partially indexed state is strange. I have an SEO client (an entertainment booking agency) that, post Big Daddy, has lost some of its key pages in the index. Google is indexing the contact us form, but keyword.php (where keyword is on the domain)
has just disappeared, and the client is losing revenue and has had to lay staff off.
I’ve tried renaming it with a more specific keyword and redirecting the old page.
Given x pages, how do I hint to Google that one page is more important than another? I’ll try setting all pages except the key ones to 0.1 in the sitemap and see if that works, but some guidance would be great.
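The sitemap priority experiment described above can be sketched as follows. This is a minimal illustration that assumes the sitemaps.org 0.9 schema and hypothetical URLs; the helper name `build_sitemap` is made up for the example. In that protocol, priority is a hint about pages relative to each other on the same site, so there is no guarantee it will bring a dropped page back.

```python
import xml.etree.ElementTree as ET

# Assumption: the sitemaps.org 0.9 namespace is the schema in use.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(pages):
    """Build sitemap XML from (url, priority) pairs; priority runs 0.0 to 1.0."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for loc, priority in pages:
        url = ET.SubElement(urlset, "url")
        ET.SubElement(url, "loc").text = loc
        # One decimal place matches the usual 0.0-1.0 notation.
        ET.SubElement(url, "priority").text = f"{priority:.1f}"
    return ET.tostring(urlset, encoding="unicode")


# Hypothetical pages: the key page gets 1.0, everything else 0.1.
print(build_sitemap([
    ("http://www.example.com/keyword.php", 1.0),
    ("http://www.example.com/contact.php", 0.1),
]))
```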
Pex Cornel Said,
April 28, 2006 @ 1:30 am
See what happens Matt, with all these changes…
I think it’s really sad.
Businesses rely on Google. We know “it’s free”. But if there were no Google, there would be another search engine.
All these changes put everyone in “changing mode”.
Without sufficient information, most of the changes made by SEOs are often wrong.
I don’t want to criticise, Google does wonderful things, but my opinion is that lately Google has taken the position of a bully.
Let’s wait and see…
Chatmaster Said,
April 28, 2006 @ 1:42 am
Matt this is an excellent idea that will hopefully assist some seo’s in determining when they are barking up the wrong tree.
I want to repeat the previous question, is there a difference between banned and penalties?
Then I was wondering. Google must be swamped with reinclusion requests for websites that have no penalties, but simply aren’t good enough.
Paul Salber Said,
April 28, 2006 @ 4:09 am
Good move Matt. This will save hundreds of man-hours and thousands of dollars of lost revenue for legitimate web sites.
The first compelling reason to use google sitemaps.
Paul
Jonny C Said,
April 28, 2006 @ 4:36 am
Hi Matt and all,
We hear about hidden text often referred to as CSS hidden text, but what about the use of the noscript tag? We use this on the home page and sitemap page in case anyone wants to use the site, or contact us for an order that cannot be placed using our JavaScript shopping cart. It contains our postal address and phone number and a description of the general theme of the shop. Is this noscript tag something that causes penalties? Is this valid or not…anyone?
cheers
Jonny C
UK
Danny Sullivan Said,
April 28, 2006 @ 5:07 am
Great new features to see, Matt — love what’s coming out of the sitemaps team.
What a Maroon Said,
April 28, 2006 @ 6:58 am
“Website owners have to be responsible for who they hire to help them, but identifying the actual firm/SEO who may be spamming on their behalf would go a very long way to cleaning up our industry. ”
My site was banned. Doug Heil did the SEO.
Still think this is a good idea Doug?
A Cutlett Said,
April 28, 2006 @ 7:52 am
T2DMAN
*Exact* same scenario here, for a company I used to work for. Their site was riding very high - then one day it went bang.
Never seen since regardless of reinclusion requests - all the same symptoms as you say.
I no longer work for the company, but it’s been bugging me forever.
Isaac Z. Schlueter Said,
April 28, 2006 @ 2:13 pm
The link to “Sitemaps” in the first paragraph of this article needs a http:// in the href.
Dave Said,
April 28, 2006 @ 5:12 pm
RE: “My site was banned. Doug Heil did the SEO”
That’s a pretty wild allegation. Care to back it up with some proof or at least a shred of evidence?
Doug Heil Said,
April 28, 2006 @ 6:23 pm
LOL Good one maroon. Care to share who you are, and care to share some proof?
I thought so.
It is good to know I’m so famous. Thanks for the chuckle.
Kathy Said,
April 28, 2006 @ 8:11 pm
Hi Matt,
I’m wondering how Google contacts webmasters about penalties…because my sitemap area doesn’t show anything wrong and I’m still listed in the natural searches. However, I have been penalized somehow because for some of the keywords where I was naturally #1, I have been removed completely. I have emailed Google to ask for their help and been sent what seems to be a canned answer, not telling me I’ve been penalized but telling me to read the quality guidelines, comply and notify them. I’m confused as I don’t see anything that is in non-compliance. The email was vague so I’ve assumed I’ve been penalized although I was not told so directly. I discovered that a few domains I had parked on my domain had been spidered as separate domains. Our sysadmin fixed that with 301 redirects to make sure we didn’t appear to have “duplicate content on different domains”. Then one webmaster thought the digitalpoint coop links on my site could be the problem. I’ve removed those as well. I still don’t know if I figured out what it is I’ve done wrong….or if this is a glitch of Big Daddy so that I lost over 200,000 indexed pages. My frustration is with Google not telling me that I was penalized (if I am!) …and what it is I did wrong……when I really want to do this right.
Was I penalized for digitalpoint’s coop? Or was I penalized for the spider indexing parked domains?
What a Maroon Said,
April 28, 2006 @ 8:20 pm
Dave and Doug, your responses had me roflmao. Too bad that neither of you two got the point (and the joke), but that happens when you take yourselves so seriously fighting for truth, justice and what you deem to be spam. At least you both helped make my point.
The problem with outing an SEO (doug’s idea that I quoted) is that anyone can make an allegation without a shred of proof, or at the very least make it a “he said, she said” scenario. Framing an SEO would be quite easy.
for the record, no animals were hurt in the typing of this post, no sites of mine are banned, nor is Doug responsible for the SEO of anything that I am associated with.
Dave Said,
April 28, 2006 @ 9:53 pm
RE: “The problem with outing an SEO (doug’s idea that I quoted) is that anyone can make an allegation without a shred of proof, or at the very least make it a “he said, she said” scenario. Framing an SEO would be quite easy.”
While anyone can make allegations, it would require proof (payment confirmation etc) that the spam was indeed done by them. I would have thought those sort of details were obvious………well, to some at least
fathom Said,
April 29, 2006 @ 5:21 am
What like this:
Hide a link in your tracking code in where few know that the is rubbish and you create a PR10 in no time - while the offense is on everyone elses website.
And they blatantly advertise the “HIDDEN PART”.
What a Maroon Said,
April 29, 2006 @ 6:42 am
Dave, you actually think Google wants to play “People’s Court”?
As it is they drop pages for dupe content even though they can’t determine which page is the original.
A Cutlett Said,
April 29, 2006 @ 7:57 am
[quote]
T2DMan, was your site in anything like an digital automated link exchange network? Maybe in 2004?
[/quote]
Matt, are you saying we should sign our competitors up for automated links networks? No, of course you’re not. But what I mean is this is starting to scare me a little. If google penalises people for certain types of links, or too many links too quickly then any webmaster can damage another site.
SEO Junkie Said,
April 29, 2006 @ 1:38 pm
Hi Matt,
That’s going to be VERY helpful for those who don’t know why their sites get penalized / delisted from the index.
But, instead of email notifications, you might as well create an API, so we can programmatically find out whether a particular site is penalized / banned or not. Just an idea!
Anyway, this change is awesome as it is, too.
SEOJunkie aka Sufyaaan
JamesNotBond Said,
April 29, 2006 @ 4:34 pm
Hello,
As A Cutlett noticed - if there is some website I don’t like - I should sign that site up to an automated links network. And I should buy some domains and create some content. When this content is in the Google index (about 20k sites similar to the site I don’t like), I just create a redirect (JavaScript or headers) to the domain I don’t like.
Then I only have to send a spam report and… this site will be banned?
Hmm, that’s a great idea, but some of my clients’ sites were kicked out of the Google index that way ;/
Somebody did something like the above, and it’s a clear way to get somebody’s site banned ;/
Dave Said,
April 29, 2006 @ 6:58 pm
TOP DOWN NOT BOTTOM UP.
RE: “Dave, you actually think Google wants to play “People’s Court”? ”
Yes, if it means a big drop in SE spam. Let’s say 5 site owners all quote SEO “A” as being the one who built all their doorway pages and cloaking. In my mind that WOULD warrant time from a Google employee to do some detective work. If all evidence states they are spamming, ban them as they did with TP.
IMO, until Google tackle this problem at the root the weeds (SE spam) will keep coming back.
What a Maroon Said,
April 29, 2006 @ 11:44 pm
Google won’t be playing Judge Judy just because you want them to. It is too easy to set someone (even you) up for a fall. Google gets little ROI out of that and it can only backfire.
Darren Cronian Said,
April 30, 2006 @ 3:02 am
Great entry Matt.
I think the problem is that when webmasters first get involved in promoting their website online, they don’t understand how to optimise it, so they assume that putting hidden text into the site will get them ranked higher in the search engines.
It’s only when you start to take the time to read up on how to optimise your site correctly that you realise hidden text is a definite no-no. I have a friend who runs a UK chat room, and I told him to include the keywords “chat room” more in the content on his page.
Two days later I found that he had done that, but had hidden the text rather than writing useful content on the page that included the keywords. He has now removed it and rewritten the text, but it was interesting that his mind told him to hide the text rather than display it!
More needs to be done to educate business owners when they start to delve into the internet - and hosting companies can do something about this by writing articles or information on its site - very few hosting companies provide SEO information, which is the first point of contact for many webmasters.
Brian Turner Said,
April 30, 2006 @ 11:12 am
Psst! Matt, the link to Sitemaps in the original post appears to be mislinking to somewhere nonexistent deep within your blog. Seems you forgot to use http: in front of the URL to Google Sitemaps page, and Wordpress treated it as relative, instead of absolute.
Did no one else notice that??
Also signing up for Sitemaps.
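The relative-versus-absolute behaviour Brian describes can be reproduced with a short sketch. The blog URL below is a hypothetical stand-in, not the real post address, and Python's `urljoin` is used here only to mimic how a browser resolves an href against the page it appears on.

```python
from urllib.parse import urljoin

# Hypothetical address of the page that contains the link.
post_url = "http://blog.example.com/notifying-webmasters/"

# Without a scheme, the href is treated as a path relative to the post.
broken = urljoin(post_url, "www.google.com/webmasters/sitemaps/")

# With http:// in front, the href is absolute and replaces the base entirely.
fixed = urljoin(post_url, "http://www.google.com/webmasters/sitemaps/")

print(broken)
print(fixed)
```

The first result lands somewhere deep inside the blog's own path, which matches the 404 several commenters reported; the second points at the intended site.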
fathom Said,
April 30, 2006 @ 2:22 pm
RE: No pages from your site are currently included in Google’s index due to violations of the webmaster guidelines. Please review our webmaster guidelines and modify your site so that it meets those guidelines. Once your site meets our guidelines, you can request reinclusion and we’ll evaluate your site. [?]
Submit a reinclusion request
If you find the issue and clean it up, then just click on the “Submit a reinclusion request” and fill out the form.
Example: A client recently loaded JavaScript tracking code to the website. On review, the no-script area, which contains a transparent gif to action a browser load if JavaScript is disabled, was wrapped with a link element to the tracking script owner’s website.
By general definition this would be a hidden link on every web page.
With GSiteMaps loaded that could be considered spam, potential delisting, and the fact that the general public would intentionally leave a hidden link on every page [because it is suggested a functional part of the tracking script] is a little problematic…
Google’s webmaster quality guidelines only suggest “Avoid hidden text or hidden links” but this particular ‘hidden link” is “tracking” they wouldn’t normally consider - “oh I need to remove tracking because that is the violation”.
Is this an issue?
Would it be worth adding a reference to Google’s webmaster quality guidelines - if it is?
Thomas Schulz Said,
April 30, 2006 @ 3:11 pm
Well, as I am kinda curious,
I will rephrase my original question
If I have multiple similar pages
(see my original comment as to why
– in my case it has nothing to do with singular/plural)
– and Google decides they are similar, what happens:
a) all “similar” pages penalized
b) all “non-first” “similar” pages penalized
c) depends on how “similar” and “site trust”
d) x = random(…), case x of …
?
Personally I am guessing c or d
Jonathan Said,
April 30, 2006 @ 5:49 pm
I don’t see any penalties in my sitemap overview for any of my sites, but it seems that my site keeps getting delisted every day, moving from 950, to 850, now to 722 results for my domain, when I clearly have thousands of pages on my domain. Any idea why this is happening, or what’s next?
Noah Said,
April 30, 2006 @ 5:55 pm
Matt - I am very interested in the Google Sitemaps Spam feature. Last week we dropped off the map for what seems like every keyword that was of importance to us. We no longer even show up for our company name which we previously had #1 & #2 positions for. Our site has been around for years and we don’t get involved with unethical practices although we recently discovered one of our competitors was hitting our Adwords account pretty hard with click fraud. I also noticed that he took our meta description and keywords from our homepage and is now using it on his homepage. I am curious to know if it is possible that our competitor somehow got us penalized (submitting to a link farm etc). We used to have over 700 pages in Google’s index and in a week’s time that has dropped to under 200.
Any help or advice would be greatly appreciated!!
Dave Said,
April 30, 2006 @ 6:18 pm
RE “Google gets little ROI out of that and it can only backfire”
“little ROI” on outing SE spam sources??? I would say the ROI is HUGE. No problem is ever resolved by treating the symptoms (spammy SEO customers) rather than the disease (spammy SEO companies). I would think Google already has such a tactic in the pipeline.
No need to reply, What a Maroon; I know your retort already.
Dave Said,
April 30, 2006 @ 6:24 pm
To those asking questions about violating Google guidelines with no ill intent, I don’t think they can measure “intent”. Something to keep in mind. IMO, it’s never worth the risk.
VikasAmrohi Said,
April 30, 2006 @ 9:33 pm
hi matt nice listening but sounds gibberish
one silly question :P, if all this is about on-page content, then what does Googlebot read: the code of the page or the rendered text in the browser? If code, then why can’t your algo check the repetition and stuffing of words? If the algo can filter this then we can get more accurate results.
Cheers
Hamdi Said,
May 1, 2006 @ 2:16 am
Dear Matt,
just wanna say a big “thank you”.
I have been a googler since the very beginning and am very much satisfied with all the developments Google has made.
I am satisfied as well with the communication between Google and users / webmasters / site owners; it definitely makes Google the ultimate search engine.
Al Said,
May 1, 2006 @ 3:57 am
Hi Matt
Sorry - I’ve been out of town in Southern New Zealand contemplating the Antarctic - frozen away from the internet briefly, and missed out on this wonderful development while away.
It looks like a great quality control step to communicate with webmasters committed to compliance with Google, but as I can see from the above posts there are likely to be many legit questions or interpretations based on G’s guidelines left unanswered [I'm not criticising - just observing]. I guess this is why it’s a trial.
It kinda worries me [but please keep it going], because some of the guidelines provide us with borderline interpretations. An emphatic yes/no on certain requests would be appreciated, particularly where other sites are adopting similar principles and are not apparently penalized, which we can refer to.
The other aspect of this quality control step is that legit webmasters can focus on reporting facts rather than BS. That’s gotta be good for all.
On WMW and somewhere on your blogs several webmasters, including myself, indicated a willingness to pay for quality control feedback. I figured this breathed some further sincerity into the process for more in depth communications with Google. Any further thoughts on this one?
Kathy Said,
May 1, 2006 @ 5:37 am
Matt, I could have written Noah’s entry above. Ditto for my site and no response from Google with the exception of canned emails treating me as though I broke quality guidelines.
My site: command has taken me from over 200,000 indexed pages to 700-800 (fluctuates by the moment) with another odd behavior: The title and description of my site are being pulled from DMOZ instead of the meta tags and title area from my pages. What’s up with that?
Am I being penalized for participating in digitalpoint’s link co-op? If not, what’s going on and why can’t Google tell me (us) what is going on so we can fix this?
Dropping in indexed pages is hard enough but being completely left out of search results…affected our traffic 8%. (Thankfully 78% of our traffic does not come from SE, and only 15% came from Google. Now it’s down to 7% and dropping…)
Adam Senour Said,
May 1, 2006 @ 7:36 am
Hey Noah,
You may be having a technical issue somewhere. It took me three tries to get your site to load from here, and the third try took approx. 45 seconds before I saw anything. Keep in mind that I’m on a cable modem, so it’s not an “I’m sitting and waiting for dialup to stop being dialup” issue.
It could be the design of your site, or it could be your host. I’m not really sure which it is. I’m sensing a hosting issue.
TheInsider Said,
May 1, 2006 @ 8:53 am
To all of you complaining about thousands (or hundreds of thousands) of pages being dropped from the Google index, here’s my take:
A number of bugs were introduced with the roll-out of BD. The “missing pages” problem has been there since the beginning, but got lost in the melee that was the “supplemental issue” and the inadequate crawl rate issue. The 1st two problems have been largely addressed, but the most important problem/bug is still very much there.
From my analysis it is clear that Google have introduced some kind of index “pruning” mechanism. The intention of this pruning process is to remove dead links - (URLs) with no links pointing to them - from the index. Such a feature is, of course, long overdue. However, there is clearly a serious bug in this new pruning process that is making it behave far too aggressively. In many cases the pruner is removing 95% of a Website’s pages, when it shouldn’t be removing any.
In my case, for example, any page that I link from the Home page (PR5) goes straight into the index within 24 hours or so. If I then remove the link to that page from the Home page, that page is deleted from the index within a few days.
Any pages linked from deeper pages (PR4, for example) are crawled relentlessly but never appear in the index - because the pruner deletes or blocks them.
You would have thought that someone at Google would have put 2 and 2 together by now and taken a close look at this new pruning mechanism. Seems like an obvious candidate for the cause of the millions of missing pages.
Jim Clouse Said,
May 1, 2006 @ 11:08 am
Matt:
Great to hear that Google is becoming more proactive in notifying legitimate sites about problems. However, this still remains a one-way street. I would like to suggest a way to complete the trip, making it two-way communication.
How about allowing legitimate site owners to petition Google about why their site has plummeted in the rankings for an extended period of time? I am not talking about the whiners who complain about everything, nor am I talking about short term deterioration in the SERPS.
I am speaking about site owners who formerly were on the first page of the SERPS for at least six months who have fallen off that pedestal for at least three months. Perhaps something was done outside of their knowledge that resulted in a penalty.
That way the legitimate site owners have a solid forum to get the problem fixed. Anyone that has gone through this agony for more than 3 months deserves feedback from Google. As you can guess, my site is still going through this agony.
Thanks again for enlightening us. Keep the window shade open.
JohnMu Said,
May 1, 2006 @ 12:50 pm
TheInsider, interesting take! Add to that the fact that Google is still caching and indexing old pages which have been 404′ing for a year or longer, and you see Google remove pages which people want in the index, and keeping pages which they want removed. LOL. (or Ooops?)
Somebody should take the caffeine supply away from the sitemaps team, they’re clearly working much too fast and doing too many great new things! I can’t imagine the bribes they must use to get past the old politics of “never show the webmaster your cards” :D. Keep it up!! (How about a feature where you can specify which rank you want to have for which of your keywords? ha ha, just kidding / dreaming)
Adam Senour Said,
May 1, 2006 @ 1:51 pm
While the intention is quite good here, there are two problems with that theory:
1) For every person who may have a legitimate beef with Google due to improper indexing, penalization for something that may have been okay before, etc. and so on (and I’m sure there are legit cases), there are going to be 1000s of people who complain because they lost SERPs for some putrid pile of monkey crap that never deserved to be there in the first place, or people who can’t get ranked and think they should. If Google devotes resources and time to dealing with those idiots, that’s time taken away from improving the engine.
2) Most people tend to be biased when they look at SERPs, and the complaints in general reflect that. An informal glance at the complaints in this very blog, for example, is a good indication of that.
“My site was ranked #1 and now it’s not.”
“Why is so-and-so spammer site showing up for a SERP that I just happen to be going for?”
“Matt, you’re too busy letting the spammers in.”
“Search Engines Web has a problem with…” Hmmm…no…I probably shouldn’t go there. Should I go there, everyone? Nah, probably not.
I think you see what I’m driving at. It isn’t your idea that’s bad, it’s how people would use it. And that’s something Google would have a horrible time trying to address.
Dave Said,
May 1, 2006 @ 6:44 pm
RE: “Am I being penalized for participating in digitalpoint’s link co-op?”
Hmmm, a link scheme designed solely to try and trick Google into passing PR, link pop and increase Google rankings. As the whole link co-op scheme is outside Google’s guidelines I would say you have been ’sprung’.
Kathy, the “digitalpoint’s link co-op” is one of the biggest scams to hit the WWW in a long time. Dump it and any other trickery you have now before you lose ALL pages from Google.
I bet if you read the Google guidelines you will identify more things that could cause problems with Google.
Dave Said,
May 1, 2006 @ 6:47 pm
Kathy, I think you should read this
http://www.ihelpyou.com/forums/showthread.php?s=be92c1ecca5fc107848d07a2aba420b8&threadid=21950
Dave Said,
May 1, 2006 @ 10:40 pm
Not all “misspellings” are spam. A variety of transliterations may be necessary to cover possible variations.
Is Google able to determine the difference between transliteration misspelling requirements and same-language misspellings?
Dave Said,
May 1, 2006 @ 11:09 pm
Dave, IMO Google is not interested in transliterations as they are the same word in another language. This is why one CAN use “content delivery” to direct a user (not SE spider) to a relevant language page. In other words, no 1 page should/would have the same words in x different languages.
The spammers love to blur the lines between “content delivery” (ok with Google) and cloaking (not ok with Google). Matt recently spelt out the difference right here on his blog.
Kathy Said,
May 2, 2006 @ 5:56 am
Dave, you said this: Kathy, the “digitalpoint’s link co-op” is one of the biggest scams to hit the WWW in a long time. Dump it and any other trickery you have now before you lose ALL pages from Google. I bet if you read the Google guidelines you will identify more things that could cause problems with Google.
There is no trickery on my webpages. I’ve read and re-read the guidelines and see NOTHING that I’ve done wrong….except that DP link co-op which I was told was approved by Google. I’m a 49 year old (gulp!) woman who has worked daily online since 1998 on this website. It’s a large website with LOADS of great content that helps women with medical problems. I have no need for trickery. It’s the place the medical community sends their patients. I appreciate the links you’ve provided but don’t appreciate the tone assuming I’m a criminal who should have known better. I didn’t know better. I trusted some very big guys in the industry when I signed up for DP. Plus…I was on the first page of search results, often in #1 position prior to the co-op link network.
Ryan Said,
May 2, 2006 @ 6:00 am
DigitalPoint has always caused me loads of problems..
I found one thread on there where a guy was selling a copy of one of my sites.. same layout, content and pictures.. the “product” he was selling even had my email address left in the mailto: on the faq page
I wouldn’t trust them for anything…
Adam Senour Said,
May 2, 2006 @ 7:21 am
Hey Dave, Dave, Dave, Dave, Dave, Dave, Dave and Dave,
Can you guys like use initials in your posts or something? That section where at least two of you are talking is really confusing.
Dave Said,
May 2, 2006 @ 10:16 am
Hear, hear, Daves. I second that.
Regards,
Dave (a different Dave from those above).
PS: Matt, are you going to say anything at all about Google’s Uber Bug that is steadily deleting most of the Web from its Index? This must be the worst bug in Google’s history and yet you just pretend you don’t even notice the comments. Does doing “no evil” not include putting 1000s of small businesses out of business simply because Google don’t want to acknowledge a problem?
Hanford Said,
May 2, 2006 @ 1:28 pm
Hi Matt,
I have a quick question: Suppose I have a website with advertising on the homepage, but not on the sub-pages. I’d like to detect deep-linked referrals (from Google, or anywhere else) to the sub-pages and show them the ads they missed by skipping the homepage. It wouldn’t be bait-and-switch because the sub-pages would contain the same core data regardless of how you got there; there’d just be an extra column with some products and suggested reading (which would be associates links to Amazon), and perhaps some extra info for signing up on the site, and whatnot.
Is this kind of thing permitted by Google? Could implementing it get my site removed from Google?
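To make the idea concrete, here is a rough sketch of the detection step only, assuming the page handler has access to the HTTP `Referer` header as a string. The function name and hosts are hypothetical, and the sketch does not answer whether Google permits the practice:

```python
from urllib.parse import urlparse


def is_external_referral(referrer, own_host):
    """Return True when the visitor arrived from a different host.

    referrer: value of the HTTP Referer header, or None/"" if absent.
    own_host: this site's hostname (hypothetical value in the usage below).
    A missing Referer is treated as "unknown", so no extra column is shown.
    """
    if not referrer:
        return False
    host = urlparse(referrer).hostname
    return host is not None and host.lower() != own_host.lower()


# Usage: decide whether a sub-page should render the extra column.
from_search = is_external_referral("http://www.google.com/search?q=widgets", "www.example.com")
from_homepage = is_external_referral("http://www.example.com/", "www.example.com")
no_header = is_external_referral(None, "www.example.com")

print(from_search, from_homepage, no_header)
```

In this sketch the core content is the same for every visitor and only the extra column varies, which is the distinction the question draws.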
Adam Senour Said,
May 2, 2006 @