The inurl: operator is one of the most misunderstood features of Google, especially when discussing “hijacking,” so before I ask for Bigdaddy feedback, let’s take a concrete example using the inurl: operator and a real site. Thanks to AlexS for permission to use his (fun and addictive) site, neatorama.com, as an example.
If you do the search [inurl:neatorama.com] you’ll see a number of results that are not from neatorama.com. Specifically, the results for [inurl:neatorama.com -site:neatorama.com] show non-neatorama.com results. Does that mean that these other urls are “hijacking” content from neatorama.com? That’s not what’s happening here.
Take the url http://hot.blogrolling.com/search_linked.phtml?q=http://www.neatorama.com, for example. If you check out that url, it’s not a redirect at all. Instead, the “neatorama.com” in the url is a parameter for blogrolling.com, which shows the blogs that have neatorama.com listed in their blogrolls. Showing the blogrolling.com url is a perfectly fine result for a search [inurl:neatorama.com], because the blogrolling.com result does include “neatorama.com” in the url.
Q: So what’s a clear-cut case when I can be sure that my content is being hijacked or that there’s an issue with Google?
A: If you owned neatorama.com and did the search [site:neatorama.com] and saw results that were not from neatorama.com, that’s something that we’d be very interested in hearing about.
Q: If I do a Bigdaddy report and mention [inurl:mydomain.com], is that a valid report of hijacking?
A: As we showed in the example above, inurl:mydomain.com just searches for “mydomain” and “com” showing up adjacent to each other in a url; there’s no requirement that the url has to be on mydomain.com. So if all you do is say “inurl:mydomain.com returns results that aren’t from my domain!!! I’ve been hijacked!!!!!” then you will not get the best response from our engineers. We’ve seen this misconception enough that I’m telling you about it in advance so that you can avoid giving feedback like that.
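To make the difference between inurl: and site: concrete, here is a toy sketch in Python. It is only a mental model and not how Google actually implements either operator: the tokenizer (`url_tokens`), the matcher (`inurl_matches`), and the host check (`on_site`) are hypothetical names and simplifying assumptions made up for illustration.

```python
import re
from typing import List
from urllib.parse import urlsplit


def url_tokens(text: str) -> List[str]:
    """Split text into lowercase alphanumeric tokens (a simplifying assumption)."""
    return [tok for tok in re.split(r"[^a-z0-9]+", text.lower()) if tok]


def inurl_matches(query: str, url: str) -> bool:
    """Toy inurl: check: do the query tokens appear adjacent, in order, in the url?"""
    wanted = url_tokens(query)
    have = url_tokens(url)
    if not wanted:
        return False
    return any(
        have[i:i + len(wanted)] == wanted
        for i in range(len(have) - len(wanted) + 1)
    )


def on_site(domain: str, url: str) -> bool:
    """Toy site: check: is the url's host the domain itself or one of its subdomains?"""
    if not re.match(r"^[a-z][a-z0-9+.-]*://", url, re.IGNORECASE):
        url = "http://" + url  # assume http when no scheme is given
    host = (urlsplit(url).hostname or "").lower()
    domain = domain.lower()
    return host == domain or host.endswith("." + domain)


blogrolling = "http://hot.blogrolling.com/search_linked.phtml?q=http://www.neatorama.com"
print(inurl_matches("neatorama.com", blogrolling))  # "neatorama" sits next to "com" in the url
print(on_site("neatorama.com", blogrolling))        # but the host is blogrolling.com
```

In this model the blogrolling.com url satisfies the inurl: check while failing the site: check, which is exactly why it is a legitimate result for [inurl:neatorama.com -site:neatorama.com].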
Q: What if there is another url and it does a 302 to my domain?
A: First off, I wouldn’t worry if the url is uncrawled (just a url showing with no snippet or description). That means that we saw a reference to a url, but didn’t crawl it. If/when we crawl that url reference, we would see the 302 and would handle it just fine.
Q: So put all this together for me. How would you report a potential 302 problem that I spotted using inurl: so that it gets the proper attention?
A: Here’s how I’d say it.
Hi, I did a search [inurl:mydomain.com -site:mydomain.com] and I noticed a bad result. The #3 result is www.weirdresult.com/redirect2.php?url=www.mydomain.com. When I visit the weirdresult.com result, it does a 302 redirect to mydomain.com. I know the weirdresult.com url has been crawled, because I see a snippet for the result. And when I view the cached page on weirdresult.com, I see the content of my home page. My home page does appear when I do the search [site:mydomain.com], but I don’t think it helps quality to have this weirdresult.com result in your index. Can you check this out? Thanks.
A subtle point to note in the report above: the more specific you can be (e.g. avoid pronouns or using “url” without specifying which url you mean), the more it will help. Just saying “this url” or “that url” can be ambiguous. You will be quite familiar with your situation, but the person reading your feedback won’t. The note above is also polite. It may make you feel better to use words like “hijacking” or to sprinkle exclamation points liberally over your feedback, but I’d recommend doing reports in a normal tone, not an “I’m three seconds away from sending you death threats” tone. Just a tip on getting maximum bang-for-your-buck with your feedback report.
Q: Are there other ways to report a potential canonicalization/dupe issue?
A: Sure. Here’s another useful report that I’ve seen recently (sanitized to protect the reporter):
Hi, I did a search for [firstname lastname] and noticed a weird result at #4. The result is www.weirdresult.com, and if you view the cached page it’s just a “this domain is parked” page. I think you might have confused weirdresult.com and mypersonalsite.com, because I checked and weirdresult.com and mypersonalsite.com are both on the same webhost and the same IP address. I don’t know if it’s an issue with Google or with my webhost, but I’m pretty sure that somehow these two domains got mixed up. weirdresult.com has nothing to do with me and doesn’t have any matches for my name as far as I can tell.
That’s a perfectly fine report too.
Just to repeat, if you send in Bigdaddy feedback and it just says “I used inurl: and I saw results that weren’t from mydomain.com” you probably won’t get much attention, because that’s how inurl: is supposed to work. At a minimum, you’ll want to verify (and then mention in your feedback) that www.weirdresult.com/redirect2.php?url=www.mydomain.com 1) was crawled because you see a snippet, 2) does do a redirect to your site, and (maybe) 3) shows content from your site. An engineer can investigate your feedback more effectively if you give clear, specific, unambiguous info in your report.
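If you want to confirm the redirect itself before writing your report, a small script can fetch the suspicious url without following redirects and show you the status code and Location header. The following is a minimal sketch using only Python’s standard library; the function names `fetch_status` and `is_redirect_to` are made up for illustration, and weirdresult.com and mydomain.com are the placeholder domains from the sample report, not real sites to test against.

```python
import http.client
from typing import Optional, Tuple
from urllib.parse import urlsplit

REDIRECT_CODES = {301, 302, 303, 307, 308}


def fetch_status(url: str, timeout: float = 10.0) -> Tuple[int, Optional[str]]:
    """Request url once, without following redirects; return (status, Location header)."""
    parts = urlsplit(url)
    if parts.scheme == "https":
        conn = http.client.HTTPSConnection(parts.netloc, timeout=timeout)
    else:
        conn = http.client.HTTPConnection(parts.netloc, timeout=timeout)
    try:
        path = parts.path or "/"
        if parts.query:
            path += "?" + parts.query
        conn.request("GET", path)
        response = conn.getresponse()
        return response.status, response.getheader("Location")
    finally:
        conn.close()


def is_redirect_to(status: int, location: Optional[str], my_domain: str) -> bool:
    """True if the response is a redirect whose target host is my_domain or a subdomain."""
    if status not in REDIRECT_CODES or not location:
        return False
    host = (urlsplit(location).hostname or "").lower()
    my_domain = my_domain.lower()
    return host == my_domain or host.endswith("." + my_domain)


# Example usage (placeholder url from the sample report, so left commented out):
# status, location = fetch_status("http://www.weirdresult.com/redirect2.php?url=www.mydomain.com")
# print(status, location, is_redirect_to(status, location, "mydomain.com"))
```

The script only covers point 2 (the redirect). Whether the url was crawled and what the cached page shows still have to be checked in the search results themselves.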
Matt:
I would like to quote you:
“A: If you owned neatorama.com and did the search [site:neatorama.com] and saw results that were not from neatorama.com, that’s something that we’d be very interested in hearing about.”
I have a website that has this issue.
I believe I know why and I think it is because it is double hyphenated in the domain.
Example:
weird–site.com
Does Google frown on double hyphenated domains, or is this a situation that you have described?
Thanks
Great post.
Sorry, I meant weird--site.com
The hyphenation shouldn’t cause any problems either way. When I talk about how to give Bigdaddy feedback, I hope you’ll submit your example.
Will do, Matt. It has been like that for at least 6 months, possibly longer.
Hi Matt,
Really appreciate your communication and attempts to pass along info to the marketing community.
However, we’re stuck. We’re actually among the more experienced marketers out there, and we have no idea why our site, http://www.preferredconsumer.com, was abruptly removed from the Google index at the end of July ’05.
We thought it may have to do with stylesheets (based on recent blog posts of yours), but we’re doing nothing wrong there and double-checked everything. We’ve also troubleshot and removed Flash from our homepage, any 301 redirects from old domains we were migrating, and anything else we could think of. In addition, I’ve filed two reinclusion requests, one about 3 months ago and one again this morning. I never heard back about the first one.
No other engine is having trouble accessing our pages, and I do see Googlebot accessing us, but only a few pages vs. thousands before, and nothing shows in the index.
We don’t know where to turn at this point, and really need someone to look at our site and point out what the issue is, as there is definitely no spam and never has been any. As you can imagine, we’re quite perplexed, and the fact that we see plenty of spammers in the index kind of sticks it to us in the gut, as we play by the rules 100%.
Any direction would be appreciated. Can you take a quick look?
Thanks,
tom chambers
Matt,
A lot of sites use a 302 redirect for a links directory to track the number of times a link is clicked.
ex. domain.com/directory/default.asp?linkId=123
I realize this url would not show up in the inurl: operator but I think from what you just said, it would with the site: operator.
I am sure you don’t want people emailing Google for examples like this, so what is your take on this type of use for a 302 redirect?
Matt,
what’s wrong with our site http://www.preferredconsumer.com?
we’ve never spammed in the last 12 years, and no other engine is having problems with our pages… and abruptly at the end of July ’05, our site was completely gone from the google index….
we’ve troubleshot everything we know between now and then… and nothing’s changed…
what’s interesting is that the googlebot is still visiting our site, but only accesses a few pages and then leaves… and it doesn’t show up as being in the index online at all…. prior to July, it was accessing hundreds, sometimes thousands of pages and they were all in the index…
i’ve also submitted 2 reinclusion requests, the latest today, as i never heard back on the first one…
needless to say, this is a bummer, because we see spammers all the time, and always assume you guys will catch them.. and for the first time in our experience, we were suddenly gone from google for something that we have no idea of.
can you see anything that would cause a problem?
if we can’t find out what’s wrong and no one will tell us, how do we fix it? we’ve checked everything, all the basics and such, and everything’s fine, as far as we know. we do all of our publishing ourselves, so there’s no option of another company doing something that we’re not aware of.
please help…
sincerely,
tom
You said that if you try site:mysite.com and see results not from you that it would be interesting… I started noticing that a few days ago on my personal site. There are a bunch of ?q=SPAM-KEYWORD pages listed for my domain with their snippets and cache being from some scammer with AdSense ads. Clicking on the actual link shows my index.html page as expected, but for whatever reason the GoogleBot sees someone else’s site.
The homepage is actually index.htm, so it’s not even possible on my server setup to do cloaking or what have you. And I definitely have not done anything like that. Not that it would even help, only affects the cache and the money goes into someone else’s bank account!
The homepage link for this comment is my personal site (jongales.com). Searching site:jongales.com on Google starts giving bad results at #11.
http://www.google.com/ie?&q=site%3Ajongales.com
I asked my readers at GoogleRumors if they knew what this was and so far no one has the answer.
Matt,
Would this be an example of hijacking that Google wants to know about?
http://www.acura-reference-guide.com/dynamic-frameset.html?http://www.hubcap-tire-wheel.com/
Hmmm, interesting. Perhaps this is the reason why Google doesn’t love one of my sites like it used to. When I search inurl:cointalk.org -site.cointalk.org I get among other results:
1867.onlineinfosource.com/www.cointalk.org/archive/index.php/t-2178.html
addurllist.com/www.cointalk.org/archive/index.php/t-994.html
Is that the kind of thing you’re looking for Matt?
thanks
I’ve seen a lot of sites show up that way that use a cgi script to count hits. Like: domain.com/directory/default.asp?linkId=123
While not a 302, the same problem sometimes occurs. Quite a few sites that use a top sites script generate this problem inadvertently.
Ben/Charles, I don’t think that these links do any harm to the destination site in Bigdaddy.
For a while I had a problem with the 302 redirect thing, but the guys at Google fixed it without me saying anything. It was driving me nuts that someone’s site was coming up 2nd for my keywords, and mine was nowhere to be seen.. and theirs was a redirect to my site. I figured as long as they got to my site I didn’t mind too much =P
Jon Gales, it looks like that’s happening because someone is linking to urls like
http://www.jongales.com/?q=Affiliates
and your webserver appears to serve up pages in response to that. I would make it so that you don’t serve up a 200 (status ok) and return a page when someone passes “?q=” to your webserver. After you make those pages return a 404, the pages should drop right out over a week or two.
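As a rough illustration of that advice, here is a minimal sketch of a Python WSGI app that returns a 404 whenever a “q” parameter is passed and serves the normal page otherwise. This is an assumption-laden example, not a description of how jongales.com is actually set up: the app name `app` and the placeholder page body are made up, and on a real server the same rule would be expressed in whatever that server uses for configuration.

```python
from urllib.parse import parse_qs


def app(environ, start_response):
    """Serve the home page, but answer 404 when an unexpected ?q= parameter is present."""
    query = parse_qs(environ.get("QUERY_STRING", ""), keep_blank_values=True)

    if "q" in query:
        # The site has no search feature, so urls like /?q=Affiliates should not exist.
        body = b"Not Found"
        start_response("404 Not Found", [
            ("Content-Type", "text/plain"),
            ("Content-Length", str(len(body))),
        ])
        return [body]

    body = b"home page"  # placeholder for the real page content
    start_response("200 OK", [
        ("Content-Type", "text/plain"),
        ("Content-Length", str(len(body))),
    ])
    return [body]


if __name__ == "__main__":
    # Local try-out only: http://localhost:8000/?q=Affiliates should now give a 404.
    from wsgiref.simple_server import make_server
    make_server("localhost", 8000, app).serve_forever()
```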
Peter Davis, just to be clear, you had site.cointalk.org instead of site:cointalk.org in your comment. When I do the search [inurl:cointalk.org -site:cointalk.org] I mostly see Supplemental Results, which I wouldn’t worry about at all.
The . was just a typo in my post here, I typed it in correctly when I did the search. Thanks, I didn’t notice they were supplemental.
Matt, kind of off topic, but related to the [site:domain] search. I was using a .ORG domain, and decided to switch over to a .COM domain many, many months ago. Everything went smoothly until my redirect was accidentally broken when my programmer was doing some tweaks to my .htaccess file. The site was then penalized for dupe content (which is how I found out my redirect was broken). I got my redirect working again, communicated with Google, and the penalty was eventually removed. However, a few months ago, my old .ORG domain reappeared in the index again, and my site was all but removed from the index. I contacted Google, and after a few months it’s started to reappear again.
My questions are:
– Why would these pages suddenly reappear in the index? The cache for the pages is really old, and you can see them here: site:familyresource.org
– Why can’t I remove them using the remove tool? Every time I try to add the .ORG urls to the remove tool, it says they still exist.
Jon Gales,
Your last significant whois change is from Nov 13th or so.
http://www.whois.sc/whois-history/?domain=jongales.com
Based on the cache, did your domain expire, and did GoDaddy change the nameserver to point your domain to its own webservers, even for a short time? The cached versions of your pages suggest this.
If so, omitting this information from your appeal to Matt is curious, considering that he just asked for detailed information…
It’s not the same (I don’t think) as a 302 hijack, but since it relies on the same search concept, I was wondering if maybe I could report it here:
http://64.233.179.104/search?hl=en&lr=&q=inurl%3Aadamwebdesign.ca+-site%3Aadamwebdesign.ca&btnG=Search
Last time I checked, Top Free Sex isn’t something my site’s all about (but I will give it away to hot chicks if they ask nicely. 😉 )
I know they’re supplemental, but still…that kinda bugs me.
Matt,
Excellent work on BigDaddy. It loads a lot faster and the results are excellent. All my site’s 302s are now taken care of.
I have a question: now that Google deals with 302s perfectly, does that mean 302s will be considered inbound links? That is, will they start showing up with the link: operator?
Thanks!
I suspect that the jongales.com urls are left over from GoDaddy, but jongales.com actually responds to those (now stale) urls. If Jon gives a 404, the pages should go away.
Ben Pate, that looks more like a site that takes a url as a parameter and frames it. By itself, that wouldn’t worry me as much. Only if that url appeared instead of the real home page would I be concerned. For example, if [site:hubcap-tire-wheel.com] showed results from that framing site instead of hubcap-tire-wheel.com, then I’d be interested.
Oooh, Jon Henshaw, can I make a teaching example out of your site? It’s a fine site (no spam penalties), but you used the url removal tool in an attempt to kill http://www.familyresource.org, thereby knocking out the entire domain starting around 2005-10-29. I’ll submit a reinclusion request for you..
Adam Senour, I can ask someone to clip those; looks like subdomain spam. But to be clear, I don’t think those urls were harming your site.
Well…probably not in terms of SERP results in general.
But one of my clients got the idea to query my domain name (why they can’t just type it in the address bar escapes me, but people do this for some reason) and they found that.
That’s a tough thing to try and explain to someone. Fortunately, this client has known me for 10 years and knows I wouldn’t pull a stunt like that. 🙂
Thanks, man. Much appreciated.
Matt, thanks for the reply. I actually want http://www.familyresource.ORG to be OUT of the index. I ONLY use http://www.familyresource.COM now, and I have a 301 redirect for the .ORG domain to point to the .COM domain. So my problem is that I can’t get rid of it, the opposite of what I think you’re suggesting to do.
Just so I don’t screw myself here, I want to be perfectly clear:
– Please REMOVE the .ORG domain – I can’t get familyresource.ORG out of the index to save my life
– Please KEEP the .COM domain – http://www.familyresource.COM is my livelihood and my baby. Please don’t do anything that would hurt it 😉
The reason no 404 is given is because the request is for the index and passes the GET variable which is the spammy part. For example, your site does the same thing. It’s default behavior.
I am not aware of GoDaddy taking over, I did recently renew but the site was never down or forwarded to GoDaddy. The domain definitely wasn’t in a registrar lock because I just paid the $7.95 or whatever.
Hmm. Jon Gales, you could try a mod_rewrite. I’ll dig in a little more.
Jon Henshaw, I think I understand; that additional wrinkle makes it less of a good teaching example. I think your problem was that you submitted http://www.familyresource.org for removal using the url removal tool. After that, getting the 301 from the .org to .com to count is a lot harder. Here’s what I recommend. I’m going to ask for the removal of http://www.familyresource.org to be revoked. You’ve got the 301 set up correctly, so with a little bit of time, all the .org should transition to the .com. The exception is the supplemental .org results. When supplemental Googlebot roams forth again (it will be a while) those urls will be seen as 301s and should be dropped.
Net effect: I think all you have to do is keep chugging with the .com, and all the .org stuff should catch up and consolidate into the .com over time.
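For readers wondering what a 301 from an old domain to a new one amounts to, here is a minimal sketch as a Python WSGI app. It is purely illustrative: Jon’s actual redirect lives in his .htaccess file, and the names `OLD_HOSTS`, `NEW_ORIGIN`, and `redirect_app`, along with the choice to cover both the bare and www .org hostnames, are assumptions made up for this example.

```python
# Hostnames that should permanently redirect (assumed list for illustration).
OLD_HOSTS = {"familyresource.org", "www.familyresource.org"}
# Where every old url should end up, keeping its path and query string.
NEW_ORIGIN = "http://www.familyresource.com"


def redirect_app(environ, start_response):
    """Answer requests for the old domain with a 301 to the same path on the new one."""
    host = environ.get("HTTP_HOST", "").split(":")[0].lower()

    if host in OLD_HOSTS:
        location = NEW_ORIGIN + (environ.get("PATH_INFO") or "/")
        query = environ.get("QUERY_STRING", "")
        if query:
            location += "?" + query
        start_response("301 Moved Permanently", [
            ("Location", location),
            ("Content-Length", "0"),
        ])
        return [b""]

    body = b"content served on the new domain"  # placeholder for the real site
    start_response("200 OK", [
        ("Content-Type", "text/plain"),
        ("Content-Length", str(len(body))),
    ])
    return [body]
```

The point of mapping each old url to its matching new url, rather than sending everything to the home page, is that every .org page then has a clear .com counterpart to consolidate into.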
Thanks Matt. Your help means a great deal to me. If I were there, I would make you a Butter Fried Krispy Kreme Donut ( http://www.familyresource.com/lifestyles/74/1183/ ) 😛
Hi Matt,
So if we see weird results but they are supplemental, then we don’t report them? We don’t worry about them at all?
Thanks!
404 – give 404, pages should go away
excellent!
thank you
Liza, that would be my advice. The supplemental results are independent of Bigdaddy. You can think of them as being overlaid on top of data centers. So I wouldn’t report weird supplemental results in this iteration.
Jon Henshaw, that fried donut is so wrong, yet so right. 🙂
Wow… interesting discussion. I’m going to keep an eye on this topic.
@Adam Senour: looking at my logfiles I see A LOT of people typing the domain name into Google instead of the browser. Beats me why they do it, but on some domains it’s 10-20% of the Google hits.
apparently people don’t know the difference between google and the browser (toolbar issue?) and do not know how to bookmark a page (although I provide a link on the site).
Hi Matt,
“The hyphenation shouldn’t cause any problems either way. When I talk about how to give Bigdaddy feedback, I hope you’ll submit your example.”
I have submitted the example via spam report. I included your Q&A for that instance in the submission. I would be curious to know the results. Meaning if it really was a “hijack” or actually a google issue.
Thanks.
Mike
You and me both, buddy. But I’ve seen it range anywhere from 1-15% of traffic from SEs in general.
Hi Matt,
Sorry for my late comment, I just found this interesting article.
My question is: when I search for mydomain.com and a redirect link appears, is this something Google would like to hear about?
The redirect link is well placed now, in fact No. 1 for the two most important keywords. So there are still visitors reaching my site through this redirect link. But the problem is that other pages from my domain, which were well placed for other keywords before that happened, are now far away from the top results. So my thought is that this redirect doesn’t hurt the ranking of the index page (only the redirect link appears instead of the real domain), but it hurts the ranking of all the other pages of this domain. Is that possible?
Thanks,
Markus
(hope my English isn’t too poor; I’ve been out of practice for a while now)
Matt,
I think I found a good one, check out http://www.alabamaarchives.org/0/1/2/hubcap-tire-wheel/3
@Adam Senour: I usually do that if I want to see other discussions about some site (linked to from the page I am reading). I know there’s some operator for that, but I don’t remember it, so I just cut-and-paste the url into Google. There, the site itself mostly appears on top, and I occasionally proceed to that page from there…
Hi Matt, this is the best advice on this topic that I have found in all my searching. I am currently writing an article describing Google’s advanced search. Your information will be very useful to me, and it will help me make the article useful to my readers. Thanks again 🙂
I hate hijacking; it sounds dirty and bad 🙂 Thanks for the article.
Does the inurl parameter work for file types? For example, if I only want PDF or DOC files for results?
This is very helpful. I’ve been playing around with inurl for a while and am still finding new practices for it. Can’t wait to get my hands on some of the other Google operators.