Okay, I’m curious about something. When Google wrote a 17-page white paper about flaws in click fraud studies, how many people here read it from start to finish? If you didn’t get a chance to read it back then, you’re in luck. Shuman Ghosemajumder, a product manager at Google, summarizes the high-order bits in two posts, here and here. The two paragraphs that stood out to me were:
Here’s the problem: web logs, whether generated by an advertiser, or by third-party code on an advertiser’s site, cannot directly track ad clicks. Instead, they track visits to a special landing page URL on the advertiser’s site (e.g. http://example.com/?adwords ) as a proxy for how many ad clicks occurred. The assumption they’re relying upon is that each visit to that URL corresponds to a unique click, and vice versa. But in practice this is not the case. Once a user visits that page, they often browse through the site, navigating through sub pages, and then return to the original landing page by hitting the back button. When the landing page is reloaded in the browser, it appears in the web log as though additional ad “clicks” are occurring. Google can count ad clicks reliably as a click on a Google ad will cause the web browser to contact Google and then we redirect it to the advertiser’s landing page. A reload of the advertiser’s landing page does not contact Google again. In addition, the referrer URL which is passed by the browser when users hit the back button is actually the original referrer URL (which says the page came from an ad click) which gets cached, so there is no analysis which can be done based on logs alone which can resolve this. This is where the fictitious clicks come from. …
So is there a solution to this? Yes. Third-party analytics (not click fraud) firms have been aware of the page reload issue for many years, and generally use redirects (rather than web log based tracking) to avoid it. If one is tied to using web site logs (or landing page code generating logs) however, the only solution is to use the AdWords auto-tagging feature. Auto-tagging has been available since 2005, and is a feature which appends a unique ID to the landing page URL for every click, so that the cases of (a) multiple clicks and (b) multiple reloads of the landing page can be easily distinguished.
I think Shuman did a really good job summarizing that logs alone can’t be accurate. To help me visualize it, I tried to draw a picture:
In my diagram, a user does the following:
A) clicks on a Google ad and arrives at an advertiser’s landing page
B) hits the reload button
C) navigates to a different page
D) hits the back button
Please pardon my utter lack of artistic skills. If I’m reading Shuman’s post correctly, events A (the click on an ad), B (reloading the page), and D (hitting the back button) can show up in logs as accesses to the landing page. Because in the logs those accesses look like ad clicks, it might look like one IP address is clicking an ad three times.
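If a picture doesn't do it for you, here is a rough sketch of the same four events in Python. Everything in it is made up for illustration: the IP address, the page names, and the simplified (ip, url) log format are my assumptions, and the landing URL just borrows the example.com/?adwords example from Shuman's post. It also assumes the back-button visit actually reaches the server and gets logged, which is the case Shuman describes.

```python
from collections import Counter

# Hypothetical landing URL, following the example.com/?adwords example above.
LANDING_URL = "/?adwords"

# Hypothetical, simplified log records: (ip, requested_url).
log = [
    ("203.0.113.7", "/?adwords"),       # A) ad click arrives at the landing page
    ("203.0.113.7", "/?adwords"),       # B) reload of the landing page
    ("203.0.113.7", "/products.html"),  # C) navigates to a different page
    ("203.0.113.7", "/?adwords"),       # D) back button returns to the landing page
]


def naive_click_counts(records):
    """Count landing-page hits per IP, the way a log-only analysis would."""
    return Counter(ip for ip, url in records if url == LANDING_URL)


counts = naive_click_counts(log)
print(counts["203.0.113.7"])  # one real ad click, but the log suggests more
```

The naive count treats every hit on the landing URL as a click, so the single real click in event A is indistinguishable from events B and D.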
So how can you tell real ad clicks from reloads/back-button events? Use Auto-tagging, which is a feature that Google has offered since 2005 and that I don’t think any other major search engine offers. What does auto-tagging do? Every ad click from Google gets tagged with a unique id. So if your landing page was “example.com/widgets.html” and you turned on Auto-tagging, an ad click to that page would look like “example.com/widgets.html?gclid=COasyKJXyYECFRlvMAodRFXJ”
Want to know how many unique ad clicks were delivered to your site by Google? Just count the unique gclid parameters. And if I see the unique id “COasyKJXyYECFRlvMAodRFXJ” show up three times in my log, I know that Google charges me at most once for that unique id (they mention that in the 17-page white paper). I hope Shuman’s post or the diagram above makes it clear that just counting accesses to your ad landing pages in your logs will never give an accurate ad-click count. For example, studies in the 1990s found that the back button accounted for 30-40% of all navigation events. If you turn on Auto-tagging (which is enabled by default when you link your AdWords account with Google Analytics, or you can turn it on without signing up for Analytics), then you don’t need to worry about reloads or the back button (or opening new windows in IE).
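To make "just count the unique gclid parameters" concrete, here is a minimal sketch in Python. It assumes you have already pulled the requested URLs out of your logs into a list; the helper name unique_gclids and the second id in the sample data are hypothetical, and only the first id comes from the example above.

```python
from urllib.parse import urlsplit, parse_qs


def unique_gclids(urls):
    """Return the set of distinct gclid values found in the requested URLs."""
    ids = set()
    for url in urls:
        query = urlsplit(url).query
        # parse_qs maps each parameter name to a list of its values.
        for value in parse_qs(query).get("gclid", []):
            ids.add(value)
    return ids


# Hypothetical requests taken from a log. The first id is the example from
# this post; "HYPOTHETICAL_SECOND_ID" is invented for illustration.
requests = [
    "/widgets.html?gclid=COasyKJXyYECFRlvMAodRFXJ",  # ad click
    "/widgets.html?gclid=COasyKJXyYECFRlvMAodRFXJ",  # reload
    "/about.html",                                   # other page, no gclid
    "/widgets.html?gclid=COasyKJXyYECFRlvMAodRFXJ",  # back button
    "/widgets.html?gclid=HYPOTHETICAL_SECOND_ID",    # a different ad click
]

clicks = unique_gclids(requests)
print(len(clicks))
```

Four of the five requests hit the landing page, but only two distinct ids appear, so this log reflects two ad clicks rather than four.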
I’m happy to add the disclaimer that I work on webspam in the search quality group, so I’m not an expert on pay-per-click advertising or invalid clicks. If I’ve said anything incorrect in this post, let me know and I’ll happily correct it. But if you’re using AdWords, I would definitely recommend turning on Auto-tagging.
By the way, if this post was at all interesting, I’d recommend checking out that white paper (pdf link). This time start on page 12 instead of page 1.
Update: A good post over at the AdWords blog provides actionable information about exactly how to report suspicious traffic, as well as some answers to common questions/concerns.