Better click tracking with Auto-tagging

Okay, I’m curious about something. When Google wrote a 17-page white paper about flaws in click fraud studies, how many people here read it from start to finish? If you didn’t get a chance to read it back then, you’re in luck. Shuman Ghosemajumder, a product manager at Google, summarizes the high-order bits in two posts, here and here. The two paragraphs that stood out to me were:

Here’s the problem: web logs, whether generated by an advertiser, or by third-party code on an advertiser’s site, cannot directly track ad clicks. Instead, they track visits to a special landing page URL on the advertiser’s site (e.g. http://example.com/?adwords ) as a proxy for how many ad clicks occurred. The assumption they’re relying upon is that each visit to that URL corresponds to a unique click, and vice versa. But in practice this is not the case. Once a user visits that page, they often browse through the site, navigating through sub pages, and then return to the original landing page by hitting the back button. When the landing page is reloaded in the browser, it appears in the web log as though additional ad “clicks” are occurring. Google can count ad clicks reliably as a click on a Google ad will cause the web browser to contact Google and then we redirect it to the advertiser’s landing page. A reload of the advertiser’s landing page does not contact Google again. In addition, the referrer URL which is passed by the browser when users hit the back button is actually the original referrer URL (which says the page came from an ad click) which gets cached, so there is no analysis which can be done based on logs alone which can resolve this. This is where the fictitious clicks come from. ….

So is there a solution to this? Yes. Third-party analytics (not click fraud) firms have been aware of the page reload issue for many years, and generally use redirects (rather than web log based tracking) to avoid it. If one is tied to using web site logs (or landing page code generating logs) however, the only solution is to use the AdWords auto-tagging feature. Auto-tagging has been available since 2005, and is a feature which appends a unique ID to the landing page URL for every click, so that the cases of (a) multiple clicks and (b) multiple reloads of the landing page can be easily distinguished.

I think Shuman did a really good job summarizing that logs alone can’t be accurate. To help me visualize it, I tried to draw a picture:

Path of clicks with autotagging turned on

In my diagram, a user does the following:
A) clicks on a Google ad and arrives at an advertiser’s landing page
B) hits the reload button
C) navigates to a different page
D) hits the back button

Please pardon my utter lack of artistic skills. 🙂 If I’m reading Shuman’s post correctly, events A (the click on an ad), B (reloading the page), and D (hitting the back button) can show up in logs as accesses to the landing page. Because in the logs those accesses look like ad clicks, it might look like one IP address is clicking an ad three times.

So how can you tell real ad clicks from reloads/back-button events? Use Auto-tagging, which is a feature that Google has offered since 2005 and that I don’t think any other major search engine offers. What does auto-tagging do? Every ad click from Google gets tagged with a unique id. So if your landing page was “example.com/widgets.html” and you turned on Auto-tagging, an ad click to that page would look like “example.com/widgets.html?gclid=COasyKJXyYECFRlvMAodRFXJ”

Want to know how many unique ad-clicks were delivered to your site by Google? Just count the unique gclid parameters. And if I see the unique id “COasyKJXyYECFRlvMAodRFXJ” show up three times in my log, I know that Google charges me at most once for that unique id (they mention that in the 17-page white paper). I hope Shuman’s post or the diagram above makes it clear that just counting accesses to your ad landing pages in your logs will never give an accurate ad-click count. For example, studies in the 1990s found that the back button accounted for 30-40% of all navigation events. If you turn on Auto-tagging (which is enabled by default when you link your AdWords account with Google Analytics, or you can turn it on without signing up for Analytics), then you don’t need to worry about reloads or the back button (or opening new windows in IE).
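To make the "count the unique gclid parameters" idea concrete, here is a minimal Python sketch. The log line layout, the sample entries, and the second id (EXAMPLEsecondID) are assumptions made up for illustration; only the gclid parameter name and the first id come from the post. Adapt the pattern to whatever your server actually writes.

```python
import re
from collections import Counter

# Assumption: the gclid value appears in the request URL of each log line
# and is made of letters, digits, "-" and "_".
GCLID_RE = re.compile(r"[?&]gclid=([A-Za-z0-9_-]+)")

def count_gclids(log_lines):
    """Return a Counter mapping each gclid to the number of log hits that carry it."""
    counts = Counter()
    for line in log_lines:
        match = GCLID_RE.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts

# Hypothetical log excerpt: one ad click, a reload and a back-button return
# (same id three times), an untagged page view, and a second ad click.
sample_log = [
    '10.0.0.1 "GET /widgets.html?gclid=COasyKJXyYECFRlvMAodRFXJ HTTP/1.1" 200',
    '10.0.0.1 "GET /widgets.html?gclid=COasyKJXyYECFRlvMAodRFXJ HTTP/1.1" 200',
    '10.0.0.1 "GET /about.html HTTP/1.1" 200',
    '10.0.0.1 "GET /widgets.html?gclid=COasyKJXyYECFRlvMAodRFXJ HTTP/1.1" 200',
    '10.0.0.2 "GET /widgets.html?gclid=EXAMPLEsecondID HTTP/1.1" 200',
]

hits = count_gclids(sample_log)
landing_page_hits = sum(hits.values())  # what naive log counting would report
unique_ad_clicks = len(hits)            # upper bound on billable clicks
print(landing_page_hits, unique_ad_clicks)
```

In this made-up excerpt the tagged landing page is hit four times but only two distinct ids appear, so at most two clicks could have been charged.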

I’m happy to add the disclaimer that I work on webspam in the search quality group, so I’m not an expert on pay-per-click advertising or invalid clicks. If I’ve said anything incorrect in this post, let me know and I’ll happily correct it. But if you’re using AdWords, I would definitely recommend turning on Auto-tagging.

By the way, if this post was at all interesting, I’d recommend checking out that white paper (pdf link). This time start on page 12 instead of page 1. 🙂

Update: A good post over at the AdWords blog provides actionable information about exactly how to report suspicious traffic, as well as some answers to common questions/concerns.

53 Responses to Better click tracking with Auto-tagging

  1. As I have pointed out before, there are serious flaws with Google’s defense and methodology on this issue. Until those flaws are addressed, simply reiterating the same points every time a new challenge to Google’s credibility on click fraud detection is posted somewhere on the Web (and I am aware of the new report that was just published) doesn’t do anything to help Google’s cause.

    Your organization may know more about what people do to manipulate clicks than is worthwhile to reveal, but what it has revealed so far leaves those of us who have been around for many years with the very firm, clear, and undispellable impression that Google is either hiding a lot of information or is unfortunately outgunned and way behind the front-runners in this race.

    Either way, rehashing highly disputed points isn’t helping the situation. That is why people continue to doubt Google’s claims on this matter.

  2. You’ll have to excuse my ignorance on this topic, but I’d like to come at this question from the opposite point of view of the person who puts AdSense ads on their site.

    Using your scenarios, are my Page Views being bumped up with page reloads, back buttons, etc.? Obviously, this affects my CTR, which affects earnings.

    Personally, when people click on links I open up new windows to “avoid” those problems.

  3. Matt – thanks for your post. You interpreted my blog post correctly.

    Michael – thanks for your feedback. I’d be happy to answer whatever questions you have about these points. You mention that they are “highly disputed”, but I don’t think that’s accurate. As far as I know none of the data in our report has to-date been directly challenged. You might get the impression that it’s disputed from the public denials that some click fraud firms have made that this is just our opinion, but as I said they haven’t specifically challenged any of the data points. The click counting issues that Matt notes above are simply the way that Internet Explorer, Firefox, and other browsers work, and some firms are choosing to ignore that. The fact that they are not taking into account the clicks we protect advertisers against in their estimates is also something they admit.

    Please do let me know what you specifically think is a disputed point (here, on my blog, or via email), and I’d be happy to answer. Thanks again for your feedback.

  4. Vincent, the effect of reloads on CTR doesn’t really affect your earnings, just the stats that you see about them. All that will happen (assuming AdSense suffers the same issues) is that your page views are artificially inflated and your CTR reduced – the earnings stay the same.

    Say you’ve got a one-page site, 100 people come to it and 10 of them click on an ad – 100 views, 10 clicks, 10% CTR. Now suppose all 100 reloaded your page during their visit for some strange reason; now it’s 200 views, 10 clicks, 5% CTR. Your CTR has halved – but your earnings are the same because clicks are the same.
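The arithmetic in the one-page example can be checked with a few lines of Python. The function name is made up for illustration; the numbers are the ones from the comment.

```python
def ctr(clicks, page_views):
    """Click-through rate as a fraction of page views."""
    return clicks / page_views

clicks = 10
ctr_before = ctr(clicks, 100)  # 100 visitors, one view each
ctr_after = ctr(clicks, 200)   # every visitor reloads once, so views double

print(f"{ctr_before:.0%} -> {ctr_after:.0%}, clicks unchanged at {clicks}")
```

Only the denominator moves, which is why the reported rate drops while the click count, and therefore the earnings, stay put.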

    Excessive openings-in-new-windows is going to irritate the heck out of people, drive them away, and reduce the important number – clicks – in pursuit of relatively unimportant ones – pageviews & CTR.

  5. Thanks Matt & Shuman, that certainly explains the reasoning.

    However, the issue I’ve always had with this type of tracking is that the pages often get indexed by spiders.

    Have a look: http://www.google.com/search?hl=en&q=allinurl%3Agclid

    Even your own pages …. e.g
    services.google.com/toolbar/firefox?gclid=CKXwr5OWwIYCFUpyLAodmB3-NA

    So two questions:

    1. What happens when someone clicks on a link in the Google algorithmic SERPs where it has a ?gclid= code appended to the URL? Doesn’t this cause confusion in stats (algorithmic vs. paid)?

    2. Does this cause duplicate content issues (tagged page vs. untagged page)?

    Thanks for taking the time to explain.

  6. Guys – Shuman makes many valid points in his paper and posts, and anyone involved with search and ecommerce knows that no tracking method or technology, be it automated tagging like Google’s, click stream analysis technologies (ISAPI filters), script or no-script page tagging, or log file analysis, is ever fully accurate.

    The point I’d make is that it is fine for Google to defend itself from criticism by examining the maths of 3rd-party analytics and drawing attention to some flaws.

    But is this really distorting the click fraud figures (invalid clicks, as Shuman would undoubtedly say) to such an extent that it is a non-issue?

    The answer to this comes not from Google or any other search engine, but from advertisers, who can intensely monitor conversion and customer behaviour and put that into the context of what is normal for their business and sector, and the answer is a resounding no.

    Click fraud still goes on, and it can be significant in highly competitive and expensive markets, e.g. European financial services. Rather than divert attention onto a side issue like “did Company X get its maths right”, I’d personally like to hear bold, clear and open statements from the search engines on their view of the scale of the problem in various markets and their ideas / actions to combat it (without giving away IP – but erring on the side of too much information rather than too little).

    I’d also like to see fast and positive collaboration between search engines, analytics firms, major advertisers, marketing agencies and industry bodies like SEMPO, IAB.

    I don’t want to see a horse designed by a committee come up with a camel, but rather the major player Google take the lead, take fraud even more seriously and rapidly raise confidence amongst advertisers.

  7. from a more technical view – yes, reload is a new visit to your page, but hitting back on MOST of the browsers will load the same page from the browser’s cache. if a user sits behind a caching web proxy, even reloading the page will not count as an additional visit, since the caching proxy will serve the same page from its own cache.

    in some situations it may be different, of course, but pseudo-standard is defined above.

    that’s my $0.02

  8. My company is in the UK and spent around GBP3500 per year on adwords advertising.

    We stopped around 1 year ago because we thought something fishy was going on. We had not heard about click fraud, but something just didn’t seem right. I’m afraid we did not report it to Google, as we didn’t have any evidence or confidence they would listen. I realise this was not fair on Google.

    We never used analysis programs, and always looked at the raw logs from the web server.

    What we thought was happening was that a competitor was ‘using’ our daily budget in the morning, because patterns started to emerge. We’d get a click from a user on a big ISP that varies its IP address, usually sometime between 9:00am and 11:00am. The click would come in to the home page, access one page at random and then disappear.

    We’d get another one later on in the morning from another network that varies its IP address. Because we had a reasonable daily limit (remember this cost £3000 pa) we’d effectively disappear from adwords.

    About 3 ISPs were involved, always ones that vary their IP addresses. The browser versions were always one of two versions of IE.

    Now I’ve written it, it seems like shallow evidence but I remember having the logs in front of me and thinking ‘what is going on?’

  9. If google really wanted to eliminate click fraud, they would change the entire adwords model. I have a local friend who spends quite a bit of time placing adwords, and there are always shady competitors which ‘hitbot’ his ads in an attempt to cost him so much money that he pulls his ads. I quit using adwords long ago because of all the shady competition, and the dirty tricks they use.

    I’d rather see google charge per day for a particular keyword and spot. I.e., for the keyword ‘acme’ the #1 spot is worth $120 a day. Unlimited clicks. It would make hitbotting someone else’s ads to make them quit the campaign a thing of the past.

  10. I was trying to understand the concept of “Click Fraud”. I read some articles and blogs and understood it fairly well, but not clearly. After reading your article I got the concept clear. Thanks for posting this detailed blog with picture help.

  11. Respectfully, I believe that this comment is not completely accurate:

    “Want to know how many unique ad-clicks were delivered to your site by Google? Just count the unique gclid parameters.”

    Reason: Google includes some links with the gclid querystring parameter in its index of non-paid search results.

    So when you simply count some of the instances of the ‘gclid’ parameter hits in your log files, some of these hits could be from Google or other search engines.

    To see what I mean, perform a Google query like this:
    allinurl:gclid=

  12. Everything helps, but one cannot help but wonder if the very simple solutions are still some of the best…..

    Opening up a new window in IE can be done by using just one very simple piece of code on the landing page

    The other option is to use JavaScript links

    A sophisticated tracker should allow the user to customize the Time period INTERVAL between visits to have it considered a separate visit – tracking the IP, or cookie or session ID

    BTW:
    Also if some are saving the landing page to their favorites – additional safeguards should be taken
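The configurable visit interval suggested in this comment could look roughly like the following Python sketch. The function name, the 30-minute default, and the (visitor key, timestamp) input shape are assumptions for illustration; the visitor key stands in for whichever identifier a tracker uses (IP, cookie, or session ID).

```python
from collections import defaultdict

def count_visits(hits, interval_seconds=1800):
    """Count visits per visitor.

    hits: iterable of (visitor_key, unix_timestamp) pairs.
    A hit starts a new visit when the visitor has not been seen before,
    or when more than interval_seconds have passed since their last hit.
    """
    last_seen = {}
    visits = defaultdict(int)
    for visitor, timestamp in sorted(hits, key=lambda hit: hit[1]):
        previous = last_seen.get(visitor)
        if previous is None or timestamp - previous > interval_seconds:
            visits[visitor] += 1
        last_seen[visitor] = timestamp
    return dict(visits)

# Hypothetical hits: visitor "a" reloads after a minute, then returns much later.
example_hits = [("a", 0), ("a", 60), ("a", 4000), ("b", 10)]
print(count_visits(example_hits))
print(count_visits(example_hits, interval_seconds=30))
```

A longer interval folds reloads and back-button returns into one visit; a shorter one splits them apart, so the setting directly changes the reported visit count.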

  13. We used to have similar problems with duplicate clicks caused by “back” button in Nokaut.pl (it is a price comparison website quite popular in Poland).

    Of course we haven’t charged our customers for them (we have some additional filters which are using IP addresses, cookies, etc), but because of those clicks, our click stats were not exact enough and had unnecessary records we had to process and verify.

    Now we are using JavaScript document.location.replace() method to redirect our visitors from our click count page to final customer page and it works without problems.

    As an example, visit http://www.nokaut.pl/Click/Offer/N3IhKruYzV07leDgP7diJyJlqVmEwCRP8FbCHYn$Pvwa*25bLNJpkwYPXruncgLs_title and then try to click Back button.

  14. From an AdWords advertiser’s point of view, click fraud is not the real problem. Distribution fraud is more of a threat. Google’s practice of distributing search engine ads on parked domains is worse than click fraud. I think Shuman’s got click fraud under control. Now, who at Google is going to solve the distribution fraud crisis?

  15. What about when your AdWords account reports more clicks than your server log files report? 15 clicks charged for in January yet only 9 unique occurrences of the gclid in the raw logs.

    Am I miscounting somewhere, or can the gclid be hidden by the user (for example if they turn their referrer reporting off in their browser)?

  16. Hi Matt,

    The problem I’ve seen with Autotagging in the past is when it is used in conjunction with Google Analytics.

    Ever since the introduction of Adwords Analysis in Google Analytics, I’ve noticed that if you have the autotagging feature enabled it displays the keyword you are bidding on in Adwords and not always what people have actually typed in to get to your site.

    This can have disastrous effects on the relevancy of your advertising (and ROI) when dealing with broad and phrase match keywords depending on your industry. You also can miss out on finding some of the highly effective long tail keywords that are out there.

    The only way I have found to get around this is:
    1. Turn off autotagging
    2. Always use exact match (not always a good option)
    3. Use another analytics program

    Has anyone else noticed this problem?

  17. While the autotagging code will seemingly solve the problems in tracking clickfraud, here’s a tangential problem that we’re wrestling with:

    We tag all of our AdWords urls with a keyword/source identifier. When the click arrives at our site, we use a 301 redirect to canonicalize the URL and store the keywordID/source in a cookie for matching revenue and cost analysis.

    Adsbot crawls these URLs for quality purposes. We *think* that a feature of the BigDaddy infrastructure is that Adsbot will share its crawl data with the index, thus it is advantageous that all forms of google bots see the canonical form of the URL.

    So here’s a new wrinkle- Firefox will cache URLs in which it gets a 301 redirect and use that cache instead of getting the URL from the server the next time the non-canonical URL is requested. This doesn’t happen in large numbers, yet.

    So any thoughts on what we should do? 301 redirects to canonical URLs seem like the right thing to do. We could use a 302 on Adwords redirects but worry that Adsbot will be confused at why some forms of the canonical URL get a 301 and some get 302s. Or we could enable autotagging simply to create a unique URL every time (even though we wouldn’t really use the autotagging data).

  18. I would hope by now that most of us would know that logs are not accurate, but I’m kind of dubious that anything really is. We’re attempting to track browser behaviour, not actual people, after all.
    I have one client who checks his adwords campaign most days by clicking on a particular term ‘to make sure it’s all working’ (yes, he knows it costs him!).

    I strongly suspect that there are regular customers, competitors, employees and contractors at many companies that use Google Adwords as a navigation tool. They found the website the first time by googling for something, and next time they want it, they google the same thing, and click the ad again. They may even do it at the same time most days.

    Looking at some of my logs, there do seem to be people using the natural listings that way. In fact, I use Google that way myself sometimes. Don’t see why ad clicks would be any different? Plus there is the whole thing that any concept of ‘session’ is really a vague estimate. Did my session end and did another person make another session later? Or did the cat barf on my rug, and it took me a while to clean it up?

    Rather than fraudulent clicks, I’ve got mysterious sales. (Woe is me. Not!) The sales that are coming from one of my Adwords campaign are notably more than the adwords conversion tracking records. If I turn adwords off, the sales go away and the client screams at me: they have to be coming in from Adwords somehow. But the conversion tracker only reports a handful of them. I’m assuming this is down to the ‘cat on rug’ session interruption factor.

  19. I have a tangentially related question that’s more curiosity than anything: what happens if an ad is clicked on and the site is either down or unreachable (i.e. “Cannot find server or DNS error”)? Does that still count as a click to the advertiser’s site or not?

    I’ve noticed this from time to time when I click on an ad (ON SITES BELONGING TO OTHERS before anyone gets excited), so I was just wondering from both the standpoint of an advertiser and a publisher…does the advertiser get charged, and does the publisher get paid?

    I also would like to second Richard Ball’s opinion. The “Adsense arbitrage” game is starting to get a little bit out of hand, with the number of pages and sites popping up with low-content, high-ad setups.

    If you want a prime example, I keep seeing Google ads for AllTheServices.com on a number of sites. I don’t know what they’re paying per click, but they’re probably getting it back and then some. (I only used one example, but there are millions.)

    I don’t know who they are…I don’t know anything about them. I just keep seeing their ads, that’s all.

    Then again…I can also respect that Big G’s hands are somewhat tied in this regard. It would take a cooperative effort from all PPC ad networks to eliminate sites like AllTheServices.com, and that’s probably not going to happen any time soon. If Big G steps up and doesn’t allow this site to advertise, Yahoo! might. If they don’t, IntelliTxt might. And if they don’t…etc. and so on.

    So I can see how it would be a tricky situation…you don’t want to turn down green in the blue (the blue being Sergey and Larry’s pants pockets.) But sooner or later (hopefully sooner rather than later), you guys will have to.

    Anyway, just wanted to speak my piece. Thanks.

  20. totally unrelated .. where did news.google.com go this morning? I was getting the 502 error page .. and intermittent slowness / lack of images in many cases.

  21. Shuman Ghosemajumder Said: “You mention that they are “highly disputed”, but I don’t think that’s accurate.”

    It’s completely accurate, since I am one of the people who directly challenged Google’s points last year.

    “You might get the impression that it’s disputed from the public denials that some click fraud firms have made that this is just our opinion…”

    You are not addressing pertinent facts that have been repeatedly raised regarding Google’s naive approach (at least in your public documents) concerning the analysis of click-manipulation.

    The technology was well developed before Google even existed, as I have repeatedly pointed out before. People were using it to manipulate click-through rates for banner ads, Web polls, hit counters, and other click-counting services as far back as 1996. As a directory operator myself, I had to disable click-counting in 1998 because it was being manipulated by people.

    The more sophisticated operations use networks of servers scattered across multiple NOCs, employing software that spoofs user agents, identifies itself with multiple IP addresses across a wide variety of C-Blocks, and uses randomizing routines that are intended to simulate users clicking through links and spending anywhere from 3 seconds to several minutes on the pages.

    The technology was employed on the commercial side for the intentional manipulation of DirectHit results, Goto.com paid ads, affiliate programs (such as those operated by Amazon, Commission Junction, ClickBank, etc.) and large banner networks.

    Anything where someone felt they could gain an advantage, make some money, or deprive a competitor of an advantage or the ability to earn money has been targeted by click manipulators.

    To date, Google has demonstrated absolutely no knowledge of these systems, no ability to detect them, and no response plan for managing the potential for exorbitant fraud. Click manipulation is not defined by how browsers handle specific links. It’s defined by how determined people are (and historically have been) to gain an advantage over anyone and everyone.

    Either you’re aware of these technologies, how they work, and what they are capable of and are not discussing them publicly in order not to tip your hand, or you are not aware of them. Either way, you’re not discussing well-established, well-known technologies that even script-kiddies have had the ability to use and exploit for years.

    That is a serious reason for major concern. People need to know that Google is taking steps to neutralize or at least minimize the potential harm that can be inflicted upon a lot of different sectors in several ways. The first step is to recognize the magnitude of the technology.

    You really need to look beyond what is happening with the browsers. It’s not the browsers that are the problem.

    I cannot get more explicit than I have been here and elsewhere.

  22. I’m a search engineer at an SEO/PPC online advertising company. We put through millions in advertising at google and yahoo every year.

    The case where competitors are clicking on your ads to drive costs up puts no kind of dent in our budgets… it’s almost stupid to bother. The belief that competitors of such small businesses are having it done to them.. I’m sorry I just don’t see it, the why I mean.

    We run three separate tracking codes on every campaign, by keyword (including adwords) and they NEVER agree. There are always going to be factors that we’ll miss out and changing the adwords model isn’t going to help unless everyone else changes their models to match.

  23. Thank you for the information. I like the visual representation for us spatial learners. Not to appear as an armchair quarterback, but I wonder why Google did not foresee the problem with the misrepresentation of information from the beginning?

  24. Matt,

    Are you going to post my very detailed post that I made earlier? It’s a good example of what adwords customers see.

  25. ===
    Google’s practice of distributing search engine ads on parked domains is worse than click fraud.
    ===

    I completely agree. When is Google going to do the right thing and separate out Parked Domains from the rest of the Search Partners network? I get more fraudulent and garbage clicks from there than anywhere else. I had a recent campaign where over 15% of my clicks came from ONE parked domain. I finally blocked them, but now I have to fight it out to get them removed.

  26. @Eimantas,
    you’re correct if you’re thinking in terms of log analysis, but page tagging will record uses of the back button, regardless of cache. Tagging has other issues, but this isn’t the place.

  27. Michael Martinez – Thanks for your comments. I’ve responded to your questions here, on Danny’s post, and on the comment you left on my blog here:

    http://shumans.com/articles/000049.php

    Thanks again for your feedback. Again, as I said in my reply, the click fraud attempt methods you describe above are well known to us, and our click quality team deals with even more sophisticated types of attacks on a regular basis.

  28. wow, I am just glad I don’t have to worry about this, being the designer and all.

  29. Shuman, thanks for stopping by over here. You’re the expert on this; I just really wanted to draw a picture to communicate my understanding of the back-button/reload/new-window issue.

  30. In response to your statement:

    “Here’s the problem: web logs, whether generated by an advertiser, or by third-party code on an advertiser’s site, cannot directly track ad clicks.”

    I can easily counter with:

    Web logs, when generated by a search engine or ad network, cannot reliably capture the intent of clicks.

    I tend to agree with Michael Martinez’ points, and think this is where the discussion needs to be focused. However, whenever anyone starts such discussion, they are met by silence (except for assurances that Google is working on the problem and confident that they’re doing a good job).

    I would like to see a blue-chip panel of Internet technical experts formed who are willing to discuss the click fraud problem in terms of what can actually be done to detect and/or eliminate it from a fundamental architectural perspective. For example, people such as Bruce Schneier, Lauren Weinstein, and Ben Edelman, who’ve studied the problem. Even Vint Cerf (of Google) has pointed out that perhaps 25% of all PCs can be used in botnets today. Botnets are known to generate many types of fraudulent traffic, including fraudulent clicks.

  31. CPCcurmudgeon (or Greg 🙂 ),

    I would actually love it if the discussion on click fraud was focused on the technical issues. I speak about the technical issues as often as I can (on panels, with advertisers, etc.), and am looking forward to sharing more from our team and on my blog.

    The problem is that the public discussion has been focused on “click fraud estimates” which have very little basis in those technical details, so we have to spend too much of our time debunking such claims for mass audiences.

    For technical folks like yourself, that’s obviously not very interesting, and we want to see serious public inquiry and research into this too. This is why we’ve reached out to various academics and research groups who have expressed an interest in studying this. Surprisingly, there aren’t that many folks that are actively pursuing this – we’d like to encourage there to be more.

  32. Man, I bet Matt’s regretting opening up THIS can of worms.

    Stupid idea/question/possible partial solution: is there enough of an advertiser base to rotate ads once clicked or on a timer throughout the pages of a site?

    For example, this scenario:

    1) A 336 x 280 ad block on a.com contains 4 ads.

    2) Ad #3 is clicked on for b.com .

    3) User hits back button to return to a.com .

    4) Ad block reloads itself every 60 seconds or whatever time interval is set, and Ad #3 is replaced by some other site.

    5) Ad #3 can’t appear on a.com for X amount of time or until all of the contextually appropriate advertisers are cycled through, whichever comes first.

    It wouldn’t kill click fraud outright, but it would at least minimize some of the damage caused by competitor clickthroughs (since Ad 3 can only be clicked once per time period). If they can only click once, then it defeats a lot of the purpose for them.

    Just a random thought. Nothing more than that. It’s not intended to be a universal solution to all of the clickthrough ills in society or anything.

  33. Tom Churm, you said,

    Respectfully, I believe that this comment is not completely accurate:

    “Want to know how many unique ad-clicks were delivered to your site by Google? Just count the unique gclid parameters.”

    Reason: Google includes some links with the gclid querystring parameter in its index of non-paid search results.

    So when you simply count some of the instances of the ‘gclid’ parameter hits in your log files, some of these hits could be from Google or other search engines.

    To see what I mean, perform a Google query like this:
    allinurl:gclid=

    But I believe my original statement (“Want to know how many unique ad-clicks were delivered to your site by Google? Just count the unique gclid parameters.”) is still true. For example, suppose Google delivers 10 ad-clicks. You would see 10 unique gclid’s show up, so you know that Google charged you for at most 10 ad clicks.

    The fact that other search engines might then index those pages doesn’t change the fact that Google will only charge you at most 10 times. So you might see 20-30 instances of accesses to those gclid= urls, but you would still only see 10 unique ids, and you’d know that you’d only be charged at most once for each unique id.

    So other search engines crawling/indexing these pages don’t result in you being charged any more, and by collecting unique IDs, these accesses wouldn’t do any harm. However, if you’d like your logs to be as tidy as possible, here’s a couple things you could do:
    1. Make a directory like /landingpage/ and exclude it in robots.txt. This works for all major search engines. You could still let the individual urls in /landingpage/ redirect to your final page if you wanted.

    2. Use this robots.txt:
    User-agent: *
    Disallow: *gclid=*
    I tested this with Google’s free robots.txt checker, read more at
    http://sitemaps.blogspot.com/2006/02/more-stats-and-analysis-of-robotstxt.html
    and I know that Yahoo supports wildcards as well.
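
    To see which URLs a wildcard rule like the one above would cover, here is a minimal sketch of wildcard matching. It is a simplified model ("*" matches any run of characters, a trailing "$" anchors the end), not any search engine's actual implementation, and the function names are made up for illustration.

```python
import re

def wildcard_to_regex(pattern):
    """Translate a robots.txt-style pattern into a compiled regex.
    Assumed semantics: '*' = any run of characters, trailing '$' = end of URL."""
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Escape the literal pieces and join them with '.*' where '*' appeared.
    body = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.compile(body + ("$" if anchored else ""))

def is_blocked(path, disallow_patterns):
    """True if any Disallow pattern matches the path from its first character."""
    return any(wildcard_to_regex(p).match(path) is not None
               for p in disallow_patterns)

rules = ["*gclid=*"]
print(is_blocked("/?gclid=CK_C7bvOi4oCFSWzgAodpT1xtw", rules))
print(is_blocked("/products?id=7", rules))
```

    The first call reports the gclid URL as blocked and the second leaves the ordinary product URL alone. A real crawler may treat edge cases differently, so a robots.txt checker is still the thing to trust.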

    So to sum up, I don’t think engines crawling gclid= urls cause any harm, because a) Google doesn’t charge you again and b) these urls would have duplicate ids, so counting unique gclid= parameters should still work. *However*, if you want to keep things tidy, there are still easy ways to prevent crawling of gclid= urls.

    Tom Churm, does that make sense? Another way to say it is to do your allinurl:gclid query and get a url like “www.examplesuperstore.com/?gclid=CK_C7bvOi4oCFSWzgAodpT1xtw” from Google’s search results. Now, I could click on that url or reload it all day long, but each time the unique ID would be “CK_C7bvOi4oCFSWzgAodpT1xtw”. So examplesuperstore.com would know that they got charged at most once for that ID, and they could happily ignore every other instance of that ID in their logs.
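
    The "count the unique gclid parameters" idea can be sketched in a few lines. This assumes you have already pulled the requested URLs out of your log files. The sample entries and the second ID are made up for illustration.

```python
from urllib.parse import urlsplit, parse_qs

def unique_gclids(request_urls):
    """Return the set of distinct gclid values seen in the requested URLs."""
    seen = set()
    for url in request_urls:
        query = urlsplit(url).query
        # parse_qs maps each parameter name to a list of its values.
        for value in parse_qs(query).get("gclid", []):
            seen.add(value)
    return seen

# Hypothetical log entries: one ad click reloaded twice, an untagged
# page view, and a second ad click with a different (invented) ID.
sample = [
    "/?gclid=CK_C7bvOi4oCFSWzgAodpT1xtw",
    "/?gclid=CK_C7bvOi4oCFSWzgAodpT1xtw",
    "/products?id=7",
    "/?gclid=CK_C7bvOi4oCFSWzgAodpT1xtw",
    "/?gclid=EXAMPLE_SECOND_ID",
]
ids = unique_gclids(sample)
print(len(ids))  # 2
```

    Four tagged hits collapse to two unique IDs, so reloads and crawler visits to the same gclid= URL do not inflate the count.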

  34. Click fraud, although possible to minimize, will never be eliminated altogether. Advertisers must treat click fraud as part of their budget. With ROI measurements vastly more detailed than in traditional media, click fraud has far less impact than the major inefficiencies of untrackable offline media.

    Click fraud is here to stay; plan your budget and ROI accordingly.

  35. Most of the research I’ve seen on click fraud detection proposes new advertising models, such as the paper Microsoft Research published on displaying ads with a certain probability. The general feeling is that there is an inherent vulnerability in PPC, given the current Internet architecture. (Which is what I’ve been saying all along.)

    Ironically, to get discussion of these topics out into the open, there have to be flawed click fraud studies released, or lawsuits. There isn’t any general discourse on the inherent vulnerability of PPC originating from its providers.

  36. I don’t want to seem like I’m copping out but, after reading the summaries, would you still recommend reading the full white paper?

    Thanks,
    Rezbi

  37. I think stricter checks before anybody can put AdSense ads on their site, plus checks on the sites where ads are placed after acceptance into the AdSense program, would help a lot against click fraud.
    I don’t think that real sites run by serious people have a lot of click fraud, but I can understand if spam sites do get a lot of fraudulent clicks.
    AdSense ads should be linked to the web site they were set up with, and if the code and URL (or some smarter check) don’t match, don’t show any ads!
    Another thing about AdSense, and maybe OT, but could we get paid in stock instead? 🙂

  38. Hi Matt, I pretty much agree, except for this:
    “b) these urls would have duplicate ids, so counting unique gclid= parameters should still work.”.

    That’s assuming the gclid’s are real, valid gclids generated by Google.

    Let’s say lots of links to your website including dummy ‘gclid=’ querystring parameters were added to an organic search index, or to some webpage somewhere that got clicked on a lot. (Not too likely, hopefully, but possible.)

    Hits to your site including the gclid parameter in no way guarantee that these are ‘real’ gclid’s…

    If these are not valid gclids, no problem regarding your Google costs, because you’re not getting charged for them.

    But if you’re scanning your logs to doublecheck how many clicks you’re getting from Google ads, it could make it difficult to get accurate results.

  39. Vincent, on the subject of CTR: Google definitely knows how many ad clicks happened. I believe using the back button doesn’t cause an explicit page reload to Google (on our search pages or on our AdSense ads), so the impressions don’t get inflated this way either. So I believe Google’s CTR is accurate and not affected by page views or reloads.

  40. Hey Matt,

    You *might* want to look into why Googlebot indexes pages with the auto-tagging variables, and dumps the original page as duplicate content. That’s why one very large client is turning auto-tagging off, even though they would *love* to use it.

  41. Great post. The gclid definitely adds value. I also didn’t know that you can add wildcard characters to the robots.txt file to exclude the gclid. I’m definitely going to try this out.

  42. Matt writes: “Here’s the problem: web logs, whether generated by an advertisers, or by third-party code on an advertiser’s site, cannot directly track ad clicks.”

    My question: Is there a (good) reason that Google doesn’t let users count clicks directly, in their weblogs? This would be easy to do. Every time a user clicks on an ad, in addition to Google redirecting that user to the landing page, Google itself could send an http “ping” to a specified target page on the advertiser’s site, used only for that purpose. Since these pings would come only from Google, and could contain some authentication code (if the IP source isn’t good enough), they are definitive evidence for the advertiser of exactly how many clicks occurred.
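
    As a rough illustration of the "authentication code" part of this idea, here is a sketch using an HMAC over a click ID. Everything in it is hypothetical: no such ping service is described in the post, and the shared secret, the click ID format, and the function names are invented for the example.

```python
import hashlib
import hmac

# Hypothetical secret that the ad network and the advertiser would share.
SHARED_SECRET = b"hypothetical-secret-agreed-out-of-band"

def sign_ping(click_id, secret=SHARED_SECRET):
    """What the sender would compute: an HMAC-SHA256 tag over the click ID."""
    return hmac.new(secret, click_id.encode("utf-8"), hashlib.sha256).hexdigest()

def verify_ping(click_id, tag, secret=SHARED_SECRET):
    """What the advertiser would check before counting the ping as a click."""
    expected = sign_ping(click_id, secret)
    # compare_digest avoids leaking information through comparison timing.
    return hmac.compare_digest(expected, tag)

tag = sign_ping("click-0001")
print(verify_ping("click-0001", tag))  # True
print(verify_ping("click-0002", tag))  # False
```

    A tag like this would only show that a ping came from someone holding the secret. It says nothing about whether the underlying click was legitimate.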

    It seems to me that it’s in Google’s interest to give its advertisers accurate information about clicks. Is there a (good) reason to retain control of this information only within Google?

  43. I can think of several reasons why this is a bad idea:

    * Google will not have the same connectivity to the target site as the
    person clicking on the ad, generally speaking. There is no
    guarantee that there will be a one-to-one correspondence between
    user clicks and Google pings. It will just be one more data point
    that the advertisers struggle to reconcile with everything else
    (their own reporting, third-party reporting, Google Analytics, etc.).
    * If there are implementation errors on the Google end, or it’s gamed
    in some way, it won’t be of any use to the advertiser.
    * Even if the clicks somehow did correspond, it doesn’t tell the
    advertiser what s/he really needed to know. Was the click
    fraudulent? Would s/he have been willing to pay for it if it were?

    I can make the argument that Google could provide advertisers what
    they’ve been asking for far more easily and with much higher accuracy: a
    detailed report of clicks by IP (or possibly IP block by name),
    referrer, user agent, and amount charged, such as what is provided to
    people who subscribe to phone services with metered billing. While it
    wouldn’t address the third of my points, it would at least put Google
    on the level of other service providers in terms of transparency.

    Yet, I must reiterate. Why do we insist on these bandaids, when the
    core problem is that PPC is particularly vulnerable to click fraud?
    Why not address the core problem? In just about every other
    (computer) industry, when faced with a core vulnerability, the
    decision was made to move to a less vulnerable model. But the top
    search engines insist that the problem is under control, when clearly
    it is not.

  44. hi,

    Well, as far as Google AdWords ads are concerned, the auto-tagging option is just great. But how do you track clicks from shopping engines and price-comparison engines? That industry is also causing fraudulent-click problems for merchants.

  45. It’s good to know about auto-tagging since I am going to start again with AdWords. However, I find the parameter appended to the URL very ugly…

  46. Thanks for the tip. I never really thought of tagging AdWords campaigns like this, though I was trying to figure out a way to track incoming traffic better myself. This seems to solve that issue.

    thanks again for the tip

  47. I would like to add my voice to those requesting access to the IP addresses of the clicks. Although my experience with Google’s handling of click fraud has been positive, I would feel better if I had access to the same data that Google uses for its fraud research.

    Watching for spikes in click thru rates and collapsing cost per conversion have both been the best indicators for click fraud. In all cases, Google has acted quickly and usually in agreement with our assessment.

  48. Page reloads can occur for various reasons, including:

    1) user browses more deeply into the advertiser’s site, then hits back button.
    2) user presses browser reload button on the landing page.
    3) user opens a new window in Internet Explorer, causing a reload of the landing page.

    My question: will the fictitious clicks produced by the causes above have an effect on CTR?

  49. Great Post Matt,

    Basically, I found it disappointing that Google did not continue the ability to use auto-tagging for traffic sources other than AdWords, but after Google Analytics became free I am now mollified.

    Most true 3rd party apps provide this type of quick redirection, but I have also had good luck with some (cough) other web analytics packages that let you basically make your own auto-tagging on the fly. It’s really nice to create a new marketing campaign by simply making a link, and have the software parse it so that it shows up as a separate campaign in the reports automatically.

    Call me stupid, but I consider this efficient, but then again I think everyone should do things from the command line…….

    Steve Blom
    Yada Yada Marketing
    a marketing firm based in Tampa Florida

  50. Excellent explanation. This has been sending me in circles trying to figure out why redirects weren’t working. If I read the full .pdf I think my head would explode.

  51. Hi Matt,

    Very good post indeed. However some of the links in the post don’t work so you may want to update them in case someone like me comes 3 years later 🙂

    Cheers
    Diyan

  52. I don’t know if you ever see the Google GACP Program forum, but I posted this question for you re this subject:

    [Matt Cutts Respond (Please)] – why does Googlebot index ?gclid URLs?

    A client is seeing this and I thought it was a really important question and was surprised that Googlebot didn’t just ignore these URLs? There’s about 2m in the index at the moment: allinurl:{?gclid}. (OK they are not all true gclids).

    What are the consequences of someone following these? Following an indexed gclid URL, the utmz is set to:

    1.1301435919.1.1.utmgclid=CIKMwr_IkaMCFRUaewodFlAdqg|utmccn=(not%20set)|utmcmd=(not%20set)|utmctr=allinurl:{?gclid=}

    That’s a mess I think.
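
    To make the cookie value above easier to read, here is a small sketch that splits it into its parts. It assumes the first four dot-separated fields are bookkeeping values and that the rest is a "|"-separated list of key=value pairs, which is what the quoted string looks like. The function name is made up.

```python
from urllib.parse import unquote

def parse_utmz(value):
    """Split a utmz-style value into (leading fields, campaign key/value dict)."""
    # Assumption: four leading dot-separated fields, then the campaign data.
    *leading, campaign = value.split(".", 4)
    fields = {}
    for pair in campaign.split("|"):
        # Split on the first '=' only, since values may contain '=' themselves.
        key, _, raw = pair.partition("=")
        fields[key] = unquote(raw)
    return leading, fields

utmz = ("1.1301435919.1.1.utmgclid=CIKMwr_IkaMCFRUaewodFlAdqg"
        "|utmccn=(not%20set)|utmcmd=(not%20set)|utmctr=allinurl:{?gclid=}")
leading, fields = parse_utmz(utmz)
for key, val in fields.items():
    print(key, "=", val)
```

    Parsed this way, the campaign name and medium come out as "(not set)" while the keyword is the allinurl: query, which is the mismatch being described.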

  53. These referrer URLs and gclid params are very interesting. Is there a way I could actually fake a conversion? This would be pretty useful if most of your sales convert offline, e.g. a customer searches Google, visits the web site, enquires, emails go back and forth, and eventually they call or email in and purchase. Sadly, at present that wouldn’t show as a conversion, and I would think my AdWords campaign was underperforming. However, if I trapped some of the referrer URLs or gclids when the customer first visited (or submitted an enquiry form), could I embed them in, say, a receipt page that I email a link to when they finally convert?
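
    The mechanical part of this question (capture the gclid on the first visit, then attach it to a later link) might look like the sketch below. The receipt URL and function names are hypothetical, and whether any analytics package would credit the visit as a conversion is a separate question the code does not answer.

```python
from urllib.parse import urlsplit, urlunsplit, parse_qs, urlencode

def extract_gclid(landing_url):
    """Return the gclid from a landing-page URL, or None if it has none."""
    values = parse_qs(urlsplit(landing_url).query).get("gclid", [])
    return values[0] if values else None

def tag_receipt_url(receipt_url, gclid):
    """Append the stored gclid to a (hypothetical) receipt-page URL."""
    if gclid is None:
        return receipt_url
    parts = urlsplit(receipt_url)
    extra = urlencode({"gclid": gclid})
    query = parts.query + "&" + extra if parts.query else extra
    return urlunsplit((parts.scheme, parts.netloc, parts.path, query, parts.fragment))

# Store this value alongside the enquiry when the customer first arrives.
stored = extract_gclid("http://example.com/?gclid=CK_C7bvOi4oCFSWzgAodpT1xtw")
print(tag_receipt_url("http://example.com/receipt?order=42", stored))
```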
