The inurl: operator is one of the most misunderstood features of Google, especially when discussing “hijacking,” so before I ask for Bigdaddy feedback, let’s take a concrete example using the inurl: operator and a real site. Thanks to AlexS for permission to use his (fun and addictive) site, neatorama.com, as an example. If you do the search [inurl:neatorama.com] you’ll see a number of results that are not from neatorama.com. Specifically, the search [inurl:neatorama.com -site:neatorama.com] returns only non-neatorama.com results. Does that mean that these other urls are “hijacking” content from neatorama.com? That’s not what’s happening here. Take the url http://hot.blogrolling.com/search_linked.phtml?q=http://www.neatorama.com, for example. If you check out that url, it’s not a redirect at all. Instead, the “neatorama.com” in the url is a parameter for blogrolling.com, which shows the blogs that have neatorama.com listed in their blogrolls. Showing the blogrolling.com url is a perfectly fine result for a search [inurl:neatorama.com], because the blogrolling.com result does include “neatorama.com” in the url.
Q: So what’s a clear-cut case when I can be sure that my content is being hijacked or that there’s an issue with Google?
A: If you owned neatorama.com and did the search [site:neatorama.com] and saw results that were not from neatorama.com, that’s something that we’d be very interested in hearing about.
Q: If I do a Bigdaddy report and mention [inurl:mydomain.com], is that a valid report of hijacking?
A: As we showed in the example above, inurl:mydomain.com just searches for “mydomain” and “com” showing up adjacent to each other in a url–there’s no requirement that the url has to be on mydomain.com. So if all you do is say “inurl:mydomain.com returns results that aren’t from my domain!!! I’ve been hijacked!!!!!” then you will not get the best response from our engineers. We’ve seen this misconception enough that I’m telling you about it in advance so that you can avoid giving feedback like that.
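To make the difference between inurl: and site: concrete, here is a rough Python sketch. It is only an illustration of the behavior described above, not how Google actually implements either operator. The function names (url_tokens, inurl_matches, site_matches) and the tokenization rule (split on anything that isn't a letter or digit) are assumptions made for this example.

```python
import re
from urllib.parse import urlsplit

def url_tokens(text):
    """Lowercase the text and split it on any run of non-alphanumeric characters."""
    return [t for t in re.split(r"[^a-z0-9]+", text.lower()) if t]

def inurl_matches(url, query):
    """Simplified inurl: check (an assumption, not Google's real logic).

    True if the query's tokens show up adjacent and in order anywhere
    in the url, including inside the path or a query parameter.
    """
    u, q = url_tokens(url), url_tokens(query)
    if not q:
        return False
    return any(u[i:i + len(q)] == q for i in range(len(u) - len(q) + 1))

def site_matches(url, domain):
    """Simplified site: check: the url's host must be the domain or a subdomain of it."""
    host = (urlsplit(url).hostname or "").lower()
    domain = domain.lower()
    return host == domain or host.endswith("." + domain)
```

With the blogrolling.com url from the example above, inurl_matches(url, "neatorama.com") is true because “neatorama” and “com” sit next to each other in the q= parameter, while site_matches(url, "neatorama.com") is false because the host is hot.blogrolling.com. That combination is exactly what [inurl:neatorama.com -site:neatorama.com] returns, and it doesn’t indicate hijacking.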
Q: What if there is another url and it does a 302 to my domain?
A: First off, I wouldn’t worry if the url is uncrawled (just a url showing with no snippet or description). That means that we saw a reference to a url, but didn’t crawl it. If/when we crawl that url reference, we would see the 302 and would handle it just fine.
Q: So put all this together for me. How would you report a potential 302 problem that I spotted using inurl: so that it gets the proper attention?
A: Here’s how I’d say it.
Hi, I did a search [inurl:mydomain.com -site:mydomain.com] and I noticed a bad result. The #3 result is www.weirdresult.com/redirect2.php?url=www.mydomain.com. When I visit the weirdresult.com result, it does a 302 redirect to mydomain.com. I know the weirdresult.com url has been crawled, because I see a snippet for the result. And when I view the cached page on weirdresult.com, I see the content of my home page. My home page does appear when I do the search [site:mydomain.com], but I don’t think it helps quality to have this weirdresult.com result in your index. Can you check this out? Thanks.
A subtle point to note in the report above: the more specific you can be (e.g. avoid pronouns or using “url” without specifying which url you mean), the more it will help. Just saying “this url” or “that url” can be ambiguous. You will be quite familiar with your situation, but the person reading your feedback won’t. The note above is also polite. It may make you feel better to use words like “hijacking” or to sprinkle exclamation points liberally over your feedback, but I’d recommend doing reports in a normal tone, not an “I’m three seconds away from sending you death threats” tone. Just a tip on getting maximum bang-for-your-buck with your feedback report.
Q: Are there other ways to report a potential canonicalization/dupe issue?
A: Sure. Here’s another useful report that I’ve seen recently (sanitized to protect the reporter):
Hi, I did a search for [firstname lastname] and noticed a weird result at #4. The result is www.weirdresult.com, and if you view the cached page it’s just a “this domain is parked” page. I think you might have confused weirdresult.com and mypersonalsite.com, because I checked and weirdresult.com and mypersonalsite.com are both on the same webhost and the same IP address. I don’t know if it’s an issue with Google or with my webhost, but I’m pretty sure that somehow these two domains got mixed up. weirdresult.com has nothing to do with me and doesn’t have any matches for my name as far as I can tell.
That’s a perfectly fine report too.
Just to repeat, if you send in Bigdaddy feedback and it just says “I used inurl: and I saw results that weren’t from mydomain.com” you probably won’t get much attention, because that’s how inurl: is supposed to work. At a minimum, you’ll want to verify (and then mention in your feedback) that www.weirdresult.com/redirect2.php?url=www.mydomain.com 1) was crawled because you see a snippet, 2) does do a redirect to your site, and (maybe) 3) shows content from your site. An engineer can investigate your feedback much more effectively if you give clear, specific, unambiguous info in your report.
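If you want to double-check item 2 before sending feedback, something like the Python sketch below could help. It is a minimal example under stated assumptions, not an official tool: the helper names (fetch_status_and_location, redirect_target_matches) are made up for illustration, and it assumes the server sends an absolute url in its Location header. Python’s http.client does not follow redirects on its own, so the status code you get back is the one the weirdresult.com url actually returned.

```python
import http.client
from urllib.parse import urlsplit

def fetch_status_and_location(url, timeout=10):
    """Request the url once, without following redirects.

    Returns (status_code, Location header or None).
    """
    parts = urlsplit(url)
    if parts.scheme == "https":
        conn = http.client.HTTPSConnection(parts.netloc, timeout=timeout)
    else:
        conn = http.client.HTTPConnection(parts.netloc, timeout=timeout)
    try:
        path = parts.path or "/"
        if parts.query:
            path += "?" + parts.query
        conn.request("GET", path)
        resp = conn.getresponse()
        return resp.status, resp.getheader("Location")
    finally:
        conn.close()

def redirect_target_matches(status, location, domain):
    """True if the response is a 302 whose Location host is `domain` or a subdomain.

    Assumes `location` is an absolute url (e.g. "http://www.mydomain.com/");
    relative Location values are treated as not matching.
    """
    if status != 302 or not location:
        return False
    host = (urlsplit(location).hostname or "").lower()
    domain = domain.lower()
    return host == domain or host.endswith("." + domain)
```

You would call fetch_status_and_location on the full weirdresult.com url (with http:// in front) and pass the result to redirect_target_matches along with "mydomain.com". This only covers the redirect; you still need to check for a snippet and look at the cached page by hand, and it’s worth quoting the exact status code and Location value in your report.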