Okay, a good night’s sleep helps cure the crankiness, so it’s back to work. No Etech, no Cebit, no SXSW, no GDC for me. No SES in China (though that would be fun), no AD:TECH. Just a nice month or two of solid work, I hope. 🙂 Time to get back into the swing of things, catch up with my colleagues, and tackle some of the hundreds of emails and blog comments that piled up during the funeral and the conference. Maybe I’ll try to figure out why our Honda Civic hasn’t been starting or maybe I’ll look at taxes this week, but mainly I’m looking forward to catching up.
By the way, if you have spam reports for non-English languages, now is the perfect time to send those in, even if you use the English version of the spam report form.
Hi Matt,
Nice to know that you are back at work after so much travelling and clubbing 😉
Just before checking your blog I was looking for Spanish SEO forums, and, of course, I used Google for the search… and I’ve been laughing for five minutes. For some reason, google.es thinks that I’m looking for ‘foro sexo’ instead of ‘foro seo’… in Spanish, sexo means sex… so, of course, I haven’t found what I was looking for.
Any special keyword to attach to the spam reports? (like “Jagger1” before)
Hi Matt,
Am I being cheeky to ask for a BigDaddy update so soon after you go back to work?
Matt,
Read your posting about duplicate content issues and the blogs… we have a few sites (fewer than 5) where people post their own content – we currently allow them to post on more than one site.
To avoid being penalized for duplicate content should we:
1. No longer allow any postings on more than one site
2. use nofollow to keep the postings out of the index
3. use 301’s to have the posting belong to only one site
4. make the poster write a new description and not allow them to
essentially cut and paste?
If 10% content duplication can get you into trouble – is this 10% of your site or does this also mean if one page on one site is almost the same as one page on another site (although 90% different is significant)? If pages are similar or duplicate, will that page only rank well for one site or can the entire site be penalized?
I realize it’s probably a matter of degrees – we have about 15% duplication of postings across our sites and want to take steps to remediate them.
Thanks for any clarification you can provide,
Sincerely,
Andrew
A question, MC, on the supplemental goof, if you could please: did you guys normalise things? This guy caught my eye, and I noticed some of myself in it:
Schadenfreude
” I’d read about people losing their positioning and I’d feel smug and superior. Oops! Am I sorry now! [Note to self:] be more humble and compassionate. (And develop alternate traffic streams!) ”
Cheers
Max
Welcome back, Matt!
I second Frank’s request… looking at http://66.249.93.104 I’m seeing two wildly different sets of results, just by clicking the Search button over and over. For the term I’m looking at (“hawaii specials”), I’m seeing one set of results at that IP with 6.4 million results, and another with 15.7 million results. Makes me think that at 66.249.93.104 there are actually servers with two different sets of indices/algorithms hiding behind a load balancer. What’s up with that? VERY curious…
Michael.
Michael
It looks to me like 66.249.93.104 is not Big Daddy 100% of the time, that’s all.
Of course, Matt, if there is any more to say on Big Daddy it would be appreciated – you were quoted as saying another 6 weeks or so at SES, so a little way off still, if that quote was correct.
Cheers
Stephen
Matt,
There is a thread going on at WMW that GoogleGuy commented on. Basically it is in reference to major websites being almost completely put into the supplemental index. As someone who is also suffering from this occurrence, I am wondering if there is anything that you know about it. So far it seems to be affecting quality websites, and doesn’t seem to have any specific cause.
To me it looks like it could be a canonicalization problem between the https and http versions of a website, but it is definitely devastating some sites out there.
WMW topics:
http://www.webmasterworld.com/forum30/33351.htm
and
http://www.webmasterworld.com/forum30/33386.htm
Thanks
Jamie
Thx Stephen…I’m DYING to know which of the results is BigDaddy and which is The Algorithm Of Yore….hoping it’s the 15.7 million results version 🙂
MC
Looks like the 15.7 million one is Big Daddy 🙂
However, the other BD DCs show 35.8 million results.
The 6.4 is definitely not BD.
So good for you. 😉
I’m still liking the new BD update…Matt, is it still a good time to send in English spam?
Brandon Hopkins
Boston isn’t that far off:)
Well Matt, I’ve reported about 30 Romanian websites, starting two months ago.
Only 2 got taken out. That’s good, but what about the rest? 🙂
I’ve made a whole SEO spam category at Seopedia, with each spamming website in its own thread.
I’m writing a “how to correctly report spam” FAQ right now for the rest of the members, to get this Romanian spam control rolling.
Thanks.
The new thing among Swedish blackhat SEO firms seems to be cloaking for bots via a mixture of IP numbers and user-agent identification. The redirects aren’t handled with JavaScript anymore but server-side. This makes them harder to spot (you can’t just use User Agent Switcher and turn off JavaScript anymore) but it’s still as bad.
More serious, though, is the fact that one Swedish SEO company has taken up the habit of cloaking their customers’ title tags using JavaScript, all in order to get more sites to rank higher than sites explaining why this particular SEO company isn’t visible in the SERPs anymore. The fact that this will also affect their customers’ rankings negatively doesn’t stop them. And most of their customers, who have all bought positions in their link farms (or “link network”, as they call it), won’t even notice.
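For readers wondering why this kind of server-side cloaking can’t be spotted from the browser, here is a minimal sketch of the decision logic (Python, purely for illustration; the user-agent tokens and crawler IP below are hypothetical placeholders, not a real blocklist):

```python
# Illustrative sketch of server-side cloaking: the server decides which
# page to send before any HTML leaves it, based on who seems to be asking.
CRAWLER_UA_TOKENS = ("Googlebot", "Slurp", "msnbot")
CRAWLER_IPS = {"66.249.66.1"}  # placeholder address, not a real range

def serve(user_agent: str, remote_ip: str) -> str:
    """Decide which page a visitor gets, the way a cloaking server would."""
    is_crawler = (
        any(token in user_agent for token in CRAWLER_UA_TOKENS)
        or remote_ip in CRAWLER_IPS
    )
    if is_crawler:
        # Crawlers get the keyword-stuffed page meant only for indexing.
        return "optimized-page-for-bots"
    # Everyone else is redirected server-side, so the trick never appears
    # in the HTML a normal browser receives.
    return "redirect-to-client-site"
```

Because the decision happens on the server, switching your browser’s user agent only catches the user-agent half of the check; the IP half can only be triggered by fetching from a crawler’s actual address range.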
I bought a site a few months ago, and it has a ton of supplemental results for some pages I guess the previous owner generated. How can I get these removed from the index? I am talking a few thousand pages of BS. I did Sitemaps, but it had no effect.
Hi Matt, I really appreciate the way you give us important news and tips on SEO matters.
On August 25, 2005, you said “…Google doesn’t algorithmically penalize for dashes in the url.”
Is this still true?
(My school teachers told me a hyphen was short and a dash was long, so it would be impossible to put a dash in an URL. I see Dictionary.com reckons “dash” is a synonym for “hyphen”. Not Down Under!)
It’s all well and good saying report spam sites, but I’ve reported a few for hidden text and spam (they’re not rivals), and yet nothing is ever done.
I suppose it just looks good to talk about it.
Hey Matt,
Welcome back to work. I was trying to write to you last week while you were at SES, and I knew it was a long shot… I’m hoping you can squeeze me in for a quick email. I’d like to send you a URL to look at… You may have seen it already, as it is the one Tom sent you via email; I just never got a confirmation that you were able to look at it… It’s a very large old site that has been virtually wiped out by this new supplemental results index that we are seeing moving around…
I got confirmation back from Google help that the site isn’t penalized in any way, but we are missing over 100,000 pages in the index, leaving us with just about 120 of our most useless pages. Our entire business areas, local data and author columns are wiped out, even though they are completely unique… Shoot me an email so I can show you the site… Would really appreciate it. I’m sure this is an error, but I’m trying to speed up recovery if possible. Take care.
We have had the same confirmation of no penalty, but have been hit by the “supplemental bug”. I would appreciate any input on this from Matt, as to Google’s approach and whether they have any timeline for getting it straight. We are an e-commerce site that has been hit very hard by this massive loss of pages.
Did you go to Seth Godin’s presentation? Did he hand out free purple cows to everyone like Oprah did cars?
One suggestion I have is: please change that autoresponder from Google about “the site not being penalised in the supplemental index”. I know you guys do not like to comment on sites with a ban etc., but it would be cool to be truthful. More accurate info would be:
The site is in the supplemental index, and this only shows when we have no other results in the main index. G’s autoresponder makes it sound like an everyday event to be in there, when clearly it is not. Also, a pointer as to what one can do to get back into the main index by removing dupes etc.
Matt, I have to second what Christian Mezei complained about in an earlier comment. I never got any feedback (as in “the reported site got removed / assigned a lower PR”) from reporting quite obvious spam sites. Could you perhaps write a short post about how spam reporting is supposed to work, what information you need in the form, and so on? Oh, and in what language should one report non-English sites?
Perfect time for spam report? Matt we also would like to read from you when will be the perfect time for Google to fix the “gonesupplemental” issue on Big Daddy 😉
Hi Matt,
I can’t work out if this breaks the google guidelines or not?
I found an interesting article which provided some code that examined the incoming URL, took the q parameter from the querystring, and then highlighted it in the body of the page wherever it appears.
So, on a similar theme, would it be a problem to use the search term from within the referring URL, and then provide a series of links to more related content within the website?
To me this is an added service to the user, providing more links to other content related to their search. However, I get the feeling Google would not like this and would see it as a form of cloaking.
It’s a fact that the majority of users arrive at your site via Google, so is there any harm in using the information provided to enhance the user’s experience?
Is this any different from checking the IP address of the incoming request and serving different content based on geographic location?
All the best
Steve
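For what it’s worth, the mechanics Steve describes are simple to sketch. Assuming a Google-style referrer with a q parameter (the common convention; this is not the code from the article he mentions), extracting and highlighting the term looks roughly like this in Python:

```python
import re
from urllib.parse import urlparse, parse_qs

def highlight_search_term(referrer: str, body_html: str) -> str:
    """Pull the q parameter out of the referring URL and wrap each
    occurrence of the term in the page body with <strong> tags.
    Sketch only: real pages would need HTML-aware matching."""
    terms = parse_qs(urlparse(referrer).query).get("q")
    if not terms:
        return body_html  # no search term in the referrer; leave the page alone
    pattern = re.compile(re.escape(terms[0]), re.IGNORECASE)
    return pattern.sub(lambda m: "<strong>" + m.group(0) + "</strong>", body_html)

# A "related links" variant would use the same extracted term to choose
# which links to display instead of marking up the text.
```

The guideline question Steve raises is separate from the mechanics: a crawler sends no search referrer, so a page built this way necessarily looks different to the bot than to a visitor arriving from a search.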
Matt, as Stunti already pointed out, Boston is only 6 weeks down the road. Or will you miss it?
Hi Matt,
I’m just writing here because it’s your latest post. Have you got any comment about GDrive and Lighthouse? We’ve heard about them in the blogosphere ( http://glinden.blogspot.com/2006/03/in-world-with-infinite-storage.html or http://www.techcrunch.com/tag/Google-Drive/ and so on), and now we would like to know something more.
Best regards,
Jean-Marie
Mr Cutts,
Even though Google is the most popular search engine in Greece, far ahead of the local search engines, you are not doing such a good job of handling the Greek language.
We came to this conclusion after extensive testing: we found that Google treats, for example, the usual stop words and the plural and singular forms (for Greek) as completely different words. No need to mention that stemming, semantics and other advanced language-handling methods are out of the question.
I hope that you will concentrate your efforts on all the other non-English languages, as you have said earlier.
Finally, I would like to let you know that even though we have reported several websites using black hat techniques in the past, your team hasn’t done anything (yet).
Matt,
This was taken from a blog http://ez-search-engine-optimization.com/blog/
and was wondering if there was any truth to this statement.
“A few weeks back I recommended a book by Colin McDougall called the VEO Report.
What I liked most about the book is the way Colin’s techniques for building a site mirror my own. In the book, and also in Colin’s newsletter, he talked about domains with hyphens in them. Colin has stated that domains that have hyphens will be penalised by Google.
Colin’s source seems once again to be Google engineer Matt Cutts, and Colin stated that the penalty would come into effect at the end of February, beginning of March.”
Thanks,
Jared
Hi Matt.
Hopefully you can verify. We have spent a lot of time and money on a CMS system for our site. We have moved all of our content into a DB for ease of management, and the pages will be created dynamically. We will be using mod_rewrite from the existing .htm URLs to the new PHP pages to preserve the PR of our existing pages. I was about to switch the CMS on this weekend, until I read a post on WebmasterWorld about sites being penalized for mod_rewriting PHP pages to .htm.
Can you confirm if there is any truth in this?
Thanks
Graham
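For context, the kind of rule Graham describes usually looks like the fragment below (the script name and parameter are made-up placeholders). The rewrite is internal, so visitors and bots keep seeing the old .htm addresses:

```apache
# Hypothetical .htaccess fragment: serve the old static URLs from the new CMS.
# "article.php" and its "page" parameter are placeholders.
RewriteEngine On
# /products/widget.htm  ->  /article.php?page=widget  (internal, not a redirect)
RewriteRule ^products/([a-zA-Z0-9-]+)\.htm$ /article.php?page=$1 [L]
```

Since the URL in the address bar (and in the index) never changes, a site set up this way is presumably indistinguishable from a plain static site from the outside.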
I know there is an issue with the Big Daddy update, and feel it is a wait-and-see situation right now, as the searches Google is returning are really sub-par, to say the least. I am really glad this is a known issue and that it isn’t what we are looking at as your default search.
I have a site that went supplemental, as did 1000’s of others, and I look forward to the issue getting worked out.
I have seen the bot revisiting often, and assume this is a sign of reindexing the net for a fix.
I have 2 questions, Matt:
1) Do you have any statistics on the number of sites reported to Google for using black hat techniques, compared with the number of sites which are subsequently banned from the index?
2) I found this website (http://www.mediaheaven.co.uk/) which uses JavaScript to show/hide text as part of its design. Would this be considered spam, or borderline?
Thanks
Jared,
I always thought that it might happen, and have warned clients that have more than one hyphen… It’s just unnatural to have more than one hyphen in your domain name… Sometimes a hyphen is necessary, but two or three hyphens, that’s pushing it…
I have never registered a domain name with more than one hyphen, but I sure am going to be feeling for my client if this really does happen… I warned him that I was never comfortable working on the hyphenated name, but there was no solid indication that anything like this was definitely going to happen; I always thought that it either could, or would…
Some people are going to get hit hard with this if it does.
hmmm – this is a real issue…..
By the way, Matt or anyone:
Does anyone know what the story is for this datacenter:
http://66.102.7.99/
Are we going to see the results that are on this DC again?
Any information on this datacenter?
Hey Matt – I have that review of Spamalot done – how can I get it through to you?
Duane
[quote]It’s just unnatural to have more than one hyphen in your domain name… Sometimes a hyphen is necessary, but two, three hyphens, that’s pushing it…..[/quote]
As more domains are used, the obvious choice for many is to use hyphens. Hyphens have nothing to do with content. I doubt Google has to resort to hyphens as a way to determine spam.
Our legal name is hyphenated with a single hyphen and so is our website. Are we penalized for that? We 301 redirect the non-hyphen domain name to the hyphen one (probably backwards to what many do). I suppose we could reverse the 301 from hyphen to non-hyphen. Do you recommend that?
Could that explain our drop from a PR5 to a PR1 over the past year? We’ve done nothing black hat over this time. Even our domain name won’t appear in the SERPs unless it is entered in full with .com in it. Just wondering since our Google traffic has tanked to almost 0.
The thought of having a keyword in a domain name helping the seo process will always tempt people to use hyphens.
While it’s not good to see those very long domains having lots of terms concatenated with hyphens, the penalty (if ever it is deemed to hit sites with hyphen) must be done on a case to case basis.
I am not sure whether having multiple hyphens is similar to having multiple occurrences of one single keyword, which is called spam.
We don’t use hyphens as a spam signal.
Matt, got another of this guy’s profiles.
http://www.blogger.com/profile/17839170
This time these blogs are all PR5.
Smells of an inside job. If they are really interested, they could look at caughtspamming.blogspot for all the spam blogs used by this guy that I have found.
They seem to ignore most of my complaints.
[quote]As more domains are used, the obvious choice for many is to use hyphens. Hyphens have nothing to do with content. I doubt Google has to resort to hyphens as a way to determine spam.[/quote]
Unbelievable…….
I have seen hyphens used properly, or in ways that reflect reason, like pcf-law.com or ford-forums.com, where the wanted name is simply not available but speaking the domain name aloud is still conceivable.
All too often I see names registered and in use for no other reason than to game the search engines… These names are similar to:
“play-online-casino.com” or worse, “play-online-casino-poker.com”
And they get much, much worse. I’m sure I do not need to go on and on with more examples for you, although your intelligence level is already in question at this point, given your comment above.
These names are without question registered and promoted for no other reason than attempting to game the engines, and in most cases they do… All I am saying is that it is not practical to rely on such practices as a method of good ranking.
Glad to hear about the hyphens issue; it would be pretty silly imho – domains with 2 or more hyphens are perfectly ok and will become more commonplace as the www grows and the domain squatters hoard all the basic ones.
Matt,
filing spam reports is a boring and time-consuming task – I am getting better at it, though. Wouldn’t it be more efficient if we could flag a search result, like with Blogger? I see the dilemma here: people submitting wrong spam reports just because the search did not return what they were looking for; on the other hand, making people sign in before flagging would constitute a privacy concern… what about an API that mostly tech-wise folks would be aware of?
just my 0,02 eurocents 😉
Matt, I beg of you, please take a look at this thread on Google Blogoscoped:
http://blog.outer-court.com/forum/22209.html
It’s *extremely* urgent. Please.
ohh, this blog software has a problem with <<<
My last post was:
matt writes >>> By the way, if you have spam reports for non-English languages, now is the perfect time to send those in, even if you use the English version of the spam report form. <<<
Matt, you have a lot of new work now, I think.
Matt,
PLEASE, PLEASE ADVISE – it’s regarding this supplemental issue. One of our sites took a major hit last night with it. Almost 85% of the site’s content pages went supplemental and vanished from the SERPs. It’s a major authority site in its sector. We are left now with just a few meaningless pages.
Is the site likely to recover from this?
What can you tell us about GDrive? I’m very excited/curious about this potential tool. (http://hardware.slashdot.org/hardware/06/03/07/161234.shtml)
Thanks!
eOne Studio Web Design
in germany, domains including hyphens are preferred. it’s the opposite in the states, but since urls are case-insensitive there is no camel casing, and a domain name is harder to read when made up of three or four words.
domains with two or three hyphens are a common sight in germany, so it would be pretty stupid to declare the hyphen a signal for spamming.
Matt, a Honda Civic??
I woulda guessed with all those Google stock options you would be driving a Ferrari to work.
Don’t tell me you’re not one of the 1,000 millionaires at Google?
On the same note, I heard that you get fined one share if you are caught looking at the GOOG stock price at work – is that true?
How can you ask for spam reports? Google blacklisted bmw.de and unblacklisted them only a few days later.
Besides, some sites are blacklisted but don’t get the chance to be told why on your weblog, and wait for weeks and years to know what happened.
Google seems to prefer big companies which buy a lot of AdWords.
Most of my local webmaster colleagues in Portugal are reporting Google originated accesses dropping 40-60% in the last two weeks.
Results seem randomly ordered in many searches (sometimes you appear #1, other times #150 or more in the same search, varying by day, sometimes even within the same day at different hours). Is this because of the update? How long will this take?
We wished to sign a joint complaint, is there any proper place for it?
It’s really dark that Matt has to serve as the only real link between Google and its clients. Google is a publicly traded company, for Christ’s sake… not a dark conundrum of hackers who need to hide their evil deeds.
Hey Matt,
Is it a hybrid Honda civic? My dad has one and there was an intermittent starting problem that was fixed by tightening the battery cable. It was slightly loose. 🙂
“We wished to sign a joint complaint, is there any proper place for it?”
Heh, not if you are getting free traffic, my friend. I think people who pay should get some sort of service from Google. My paying customers come first in the queue. It is pretty frustrating that you can do nothing apart from post a message here in vain, hoping MC will check it out.
It’s not like Google couldn’t charge for this service, but the problem is that admitting a penalty could bring on unwanted lawsuits etc., plus they could be accused of profiteering from deliberately banning sites. But when you have 80% of searches going through your servers, it would have to be a pretty large department, as a lot of people are affected by updates.
So anyway MC what is the latest message from the engine room?
Not So Fast, She canna take much more Captain
I hate to reiterate what everyone else is saying, but this supplemental issue is tearing through every site regardless of whether any active SEO has taken place. Completely unique sites across the board have gone the way of supplementals and it is killing search results quality, not just incomes.
Please comment on this growing concern across the web.
Hey Matt
Will you or any of your crew attend SES in Tokyo?
If so you can tell them to look me up…
@ MaxD
I had that discussion so many times I can recall them all. I used to work for a Portuguese search engine, one of the largest around.
The thing is, what if suddenly all good content sites blocked the Google bot? What would become of Google? N-o-t-h-i-n-g. A useless thing. So it’s pretty much a quid-pro-quo situation. Google wants webmasters to be nice to it and produce good, well-organized content, and in return it provides them with tools and visitors. It’s the law for all search engines.
I’m talking about good-content, well-built sites – spam-free, no hidden stuff or low SEO tricks – being thrashed. We (the guys I speak on behalf of) all have major news sites, and we’ve been seeing for a while (though the last couple of weeks it’s been getting worse) our sites bumped down into low places, behind spam directories and sites with no content whatsoever that make a profit using AdSense + SEO spam tricks. And even worse, behind blogs which quote our content…
Where is the coherence of the past that made Google great? We have all been through major updates, and never have the consequences been so grim.
It’s disappointing, that’s all, especially when the results appear fine on Yahoo and MSN. Though, as we all know, they suck.
Mac, looks like you have a different problem than us – we have supplemental issues. Would be interested to see the scraper stuff ranking above you, though. Paste a Google URL here?
I think G is going through some pain updating their servers. I for one know what this is like. We are 9 months overdue on a database swap from ASP to ASP.NET, so I can appreciate G having some troubles, and am giving them time to sort it out.
I have put up a page about the so called “supp glitch” where people can list their site if it is being affected. Interesting to see how many people are affected and whether or not google can use this list to help us.
Link is:
http://www.bleepingcomputer.com/supp_glitch.php
Here MaxD, try this: search in Portuguese for “noticias sobre jogos” (news about videogames).
Check the results. You don’t even need to know the language to see that most of them are scrap, and that the relevant ones only come late in the search. That’s just an example, but it’s good enough, I think.
Search in Portuguese has been invaded by thousands of crap directories which use massive keyword spam to appear in every search the directory is related to. What should appear first – the sites with the real content, or the thousands of directories and quoters?
It’s a rhetorical question.
Speaking of work look what I just got in my gmail inbox…
We recently received your resume and would like to thank you for your
interest in working at Google. Your application has been submitted for the
following positions:
Industry Marketing Manager, Advertising Agencies – New York or Mountain
View
After reviewing your resume, a member of our staffing team will be in touch
if we find you may be a fit for the roles for which you’ve applied. Thanks
again!
Sincerely,
Google Jobs
After SES NYC I’ve been thinking about switching sides 🙂
Matt, do you know if Bigdaddy will now recognise underscores as word separators, like hyphens?
RE: “what if suddenly all good content sites would block google bot?”
What if my Grandma had balls? I guess she’d be my Grandpa 😉
BTW. Have you blocked googlebot? I didn’t think so 🙂
I guess Bush thought the same when some lunatic said there were terrorist plans to turn the WTC to dust.
Thanks for adding nothing to the discussion. I’m not paying for AdWords to have googlebot blocked. Duh.
Let’s be realistic here. Nobody in their right mind would block googlebot. Your analogy is waaaaayyyyyy over the top and irrelevant, IMO.
One should not assume spammy tricks ARE the reason for a well-placed spam page.
Knogle, I emailed Philipp to ask what the specific email address was.
HaHa, I know the Blogger folks sometimes read my comments, or I pass this info on.
Mac, there will be some differences in search results as people hit current data centers vs. Bigdaddy data centers.
@ Matt
As long as things improve for the best, we’re willing to take some beating while the improvements are being made. Our fear is that Google may be neglecting less important languages in this update, like ours (although it is the 4th most spoken in the world), and things may not work out for the best. So far, both on BD DCs and other DCs, things look messy as hell in our language (Portuguese).
Anyway, we decided to sit tight for 6 more weeks (I’ve heard?) and see if it was worth the suffering and confusion.
Thank you *very* much Matt.
If you would like to leave your email address here, I could also get in contact with you directly.
Update: I contacted you with my username at cutts at cs dot.unc.edu.
Concerning the spam in foreign languages:
You might be interested in the thread in the Usenet newsgroup,
alt.internet.search-engines titled
“Any idea why Google would have removed this site from its index?”
started 6th March
A guy has posted about his translation site, which looks as though it has just been hit by a penalty of some kind.
Whether this is something else, like duplicate content, or the fact that he has multiple pages in different languages, I don’t know.
But it makes you wonder what can happen to sites that have pages translated into numerous languages.
If a page is translated into multiple languages, is there a possibility, now that Google is looking into spam in foreign languages, that the site could be penalised for duplicate content?
I happened across your blog and am fascinated by the amount of time and effort that you put into this project. It’s people like you who are the positive contributors to our society. Thanks a million for all your information about Google.
Personally, I don’t think anyone should block the googlebot, nor do I think anyone should have any unfair advantage in attracting it. Why not let Google and the other bots spider and index to their hearts’ content? It’s a free world out there on the net!
All the best,
Mark
Hi Matt
I just got in from a 3 week trip to South Africa, so it is back to work for me too.
We are launching a .com.cn site very soon (actually online already) and have a .com.sg site. We have had some problems with duplicate content between our .sg site and our .com site in the Google.com index. The .sg site replaced the .com site, or both the .com and .sg sites dominated the SERPs for some main keywords. We implemented: a few months ago. Now there is very little corporate exposure in the local .sg index. A tag like content=”archive.sg.only” would solve this!
I am really trying to find out the best way to structure all these sites from a Google point of view. Are there any Google guidelines on this subject? We are expanding into Australia also, and will eventually launch a .au site. Is it better to consolidate all the sites onto one server (.com/cn) or keep them separate (.com.cn)?
Your help would be appreciated.
Hey Jared,
I never said a hyphenated domain would get penalized – simply not rewarded. How natural is a keyword-focused hyphenated domain? Not very natural, is it?
Take a look around at the SERPs. You certainly don’t see too many hyphenated domains.
Also, read Matt’s post on this very, very carefully:
“We don’t use hyphens as a spam signal.”
I never said that a site would get flagged as spam – simply that the site likely won’t rank.
I did ask Matt a few questions about my site at WebmasterWorld in November. He did not say that my keyword-focused hyphenated domain would be penalized, but that it would be wise for me to move to a new domain.
The problem, in my opinion, with this type of domain is that it likely gets caught in a “keyword stuffing” filter.
Think about this: to acquire the kind of inbound links Google is rewarding, do you think that an authoritative site will link to http://www.keyword1-keyword2-keyword3.com? Not likely.
Now http://www.SomeReallyCoolDomain.com will stand a chance of attracting attention.
I have sites with a single hyphen and they do fine in Google; however, every site that has two or more hyphens does not do well. Again, nobody ever said penalty or spam flag – it is more like a lack of reward.
I think if Matt said too much on his blog about this sort of thing, he might receive his “first talking-to from his employer” 🙂
Hey Matt,
I filed reports on several sites which are using all kinds of spamming techniques. None of them have been penalized or removed in any way. Does it matter in any way that the sites are from Sweden? Don’t you have the resources to take action against Swedish sites? It’s a jungle out there, with all the big SEO companies saying they have changed techniques since Google changed the rules of the game last autumn! I’ve been having a big discussion with one of the oldest SEO companies in Sweden, who now say they are going to take the shadow domains away since Google no longer allows this technique. This is the answer their customers get when they confront them.