Privacy Preserving Ad Click Attribution For the Web
A typical website is made of numerous components coming from a wide variety of sources. Many of the sources that make up a website are opaque to the user, and some third-party resources are designed to identify and track users as they browse the web, often in order to retarget ads and measure ad campaign effectiveness.
The combination of third-party web tracking and ad campaign measurement has led many to conflate web privacy with a web free of advertisements. We think that’s a misunderstanding. Online ads and measurement of their effectiveness do not require Site A, where you clicked an ad, to learn that you purchased something on Site B. The only data needed for measurement is that someone who clicked an ad on Site A made a purchase on Site B.
Today we are presenting a new technology to allow attribution of ad clicks on the web while preserving user privacy. We used the following principles as we designed this technology:
- Users should not be uniquely identified across websites for the purposes of ad click attribution. This means the combined data of an ad click and a conversion should not be attributable to a single user at web scale. To achieve this, our design has the following properties:
  - Up to 64 ad campaigns can be measured in parallel per website where ads are placed by an advertiser. This low number means ad campaign IDs cannot be turned into user identifiers.
  - Up to 64 conversion events can be distinguished on the advertiser’s own website. This means conversion IDs are also restricted from being turned into user identifiers.
- Only websites that users visit should be involved in measuring ad clicks and conversions. This means that opaque third-parties should not receive ad click attribution reports and we enforce it by requiring that the ad link is part of a first-party webpage and by only reporting on which first-party website a conversion happened.
- The browser should act on behalf of the user and do its best to preserve privacy while reporting on ad click attribution. We achieve this by:
  - Sending attribution reports in a dedicated Private Browsing Mode even though the user is in regular browsing mode.
  - Disallowing data like cookies for reporting purposes.
  - Delaying reports randomly between 24 and 48 hours.
  - Not supporting Privacy Preserving Ad Click Attribution at all when the user is in Private Browsing Mode.
- The browser vendor should not learn about the user’s ad clicks or conversions. For this reason, we designed the feature to do all of its work on-device. The browser vendor does not see any of the ad click attribution data.
Critically, our solution avoids placing trust in any of the parties involved — the ad network, the merchant, or any other intermediaries — and dramatically limits the entropy of data passed between them to prevent communication of a tracking identifier.
Ad Click Attribution in a Nutshell
Here’s a simple example of ad click attribution:
An online store runs an ad on a search engine website. If a user clicks the ad and eventually buys something, both the online store and the search engine website where the ad was placed want to know; they want the purchase to be attributed to the ad click so that the store knows where to focus their advertising budget. Such attribution is used for measurement of which ads are effective.
Traditional, Privacy-Invasive Ad Click Attribution
Traditionally, ad click attribution has been done through the use of cookies and so-called “tracking pixels.” Here’s an illustration of how this works:
The illustration above shows the user John:
- Searching for “grill” on search.example,
- Clicking an ad which takes him to shop.example, and
- Finally adding a $90 grill to a shopping cart.
Following each action on shop.example, shop.example fires a tracking pixel (a request for an invisible image) to search.example to report progress toward a purchase.
In browsers without appropriate privacy protections, search.example will identify John through his cookies every time shop.example fires such a tracking pixel to search.example. This pervasive technology allows search.example to learn everything John does on shop.example and all other websites that fire similar tracking pixels. Even worse, all these pixels fire regardless of whether John has clicked an ad or not.
Needless to say, tracking pixels that carry cookies enable sites such as search.example to build up a huge profile of people’s interests, purchasing power, habits, age, et cetera. We refer to this as cross-site tracking and Safari prevents it from happening through the WebKit feature Intelligent Tracking Prevention (ITP).
As more and more browsers acknowledge the problems of cross-site tracking, we should expect privacy-invasive ad click attribution to become a thing of the past.
Privacy Preserving Ad Click Attribution
We propose a modern way of doing ad click attribution that doesn’t allow for cross-site tracking of users but does provide a means of measuring the effectiveness of online ads. It is built into the browser itself and runs on-device, which means that the browser vendor does not get to see which ads are clicked or which purchases are made.
Privacy Preserving Ad Click Attribution has three steps:
- Store ad clicks. This is done by the page hosting the ad at the time of an ad click.
- Match conversions against stored ad clicks. This is done on the website the ad navigated to as a result of the click. Conversions do not have to happen right after a click and do not have to happen on the specific landing page, just the same website.
- Send out ad click attribution data. This is done by the browser after a conversion matches an ad click.
Let’s go through these steps in detail, along with the measures we’ve taken in each to preserve the user’s privacy.
Step 1: Store Ad Clicks
Anchor elements, often referred to as links, now support two new, optional attributes called adDestination and adCampaignID.
As shown in the illustration below, adDestination is the domain the ad click is navigating the user to, and adCampaignID is the identifier of the ad campaign.
If the user clicks the ad link on search.example, the browser will follow the navigation, through potential redirects, to make sure that the user actually lands on shop.example. If so, the browser stores the ad click, comprising the following data (presented here in plain English): The user clicked shop.example’s ad campaign 55 on search.example.
Here are the important privacy aspects of this step:
- The link needs to be an element on the first-party website (the main frame), not a link in an iframe. This is to meet user expectations and to be able to provide control to the user. Users can only be expected to understand which first-party website they clicked an ad on and which first-party website they made a purchase on. We also think it’s important that the first-party website that serves the ad is the one attributed for the performance of the ad campaign.
- Neither search.example nor shop.example can read the stored ad click data or detect that it exists.
- The browser only stores ad clicks for a limited time. In WebKit’s implementation that is seven days.
- The entropy of the ad campaign ID needs to be properly restricted to not become a cross-site tracking vector. WebKit’s implementation allows a value between 0 and 63, i.e. a maximum of 64 shop.example ad campaigns running in parallel on search.example.
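The storage rules in this step can be sketched in a few lines of Python. This is only an illustration of the checks described above, not WebKit’s implementation: the names `StoredAdClick` and `store_ad_click` and the parameter layout are hypothetical, while the 0 to 63 range and the seven-day lifetime come from the text.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional

MAX_CAMPAIGN_ID = 63                # six bits: values 0..63
CLICK_LIFETIME = timedelta(days=7)  # WebKit's storage window per the text


@dataclass(frozen=True)
class StoredAdClick:
    """Hypothetical record of one ad click kept by the browser."""
    source_site: str       # where the ad was clicked, e.g. "search.example"
    destination_site: str  # where the click landed, e.g. "shop.example"
    campaign_id: int       # 0..63
    clicked_at: datetime

    def is_expired(self, now: datetime) -> bool:
        return now - self.clicked_at > CLICK_LIFETIME


def store_ad_click(source_site: str, destination_site: str, landed_site: str,
                   campaign_id: int, in_main_frame: bool,
                   now: datetime) -> Optional[StoredAdClick]:
    """Return a stored click, or None if any rule from this step is violated."""
    if not in_main_frame:                    # links in iframes do not qualify
        return None
    if source_site == destination_site:      # attribution is cross-site only
        return None
    if landed_site != destination_site:      # user must actually land there
        return None
    if not 0 <= campaign_id <= MAX_CAMPAIGN_ID:
        return None
    return StoredAdClick(source_site, destination_site, campaign_id, now)
```

In the running example, a main-frame click on search.example that lands on shop.example with campaign ID 55 is stored, while the same click with campaign ID 64 is dropped.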
Step 2: Match Conversions Against Stored Ad Clicks
To achieve ad click attribution, the browser needs to be able to match conversions with stored ad clicks. What are conversions?
- Adding an item to the shopping cart is a conversion.
- Signing up for a new service is a conversion.
- Entering shipping or payment information is a conversion.
- Pulling the trigger and actually buying something is a conversion.
Matching conversions to ad clicks allows shop.example to understand that a specific ad campaign may be effective in getting customers to add items to their shopping carts but something in the checkout flow throws them off.
How does Privacy Preserving Ad Click Attribution detect a conversion and match it with a stored ad click? It makes use of the legacy tracking pixels!
In the illustration above, an existing request to the existing tracking pixel is redirected by search.example on its own server infrastructure to a well-known location in order to signal to the browser that this is in fact a conversion happening. Note that privacy protections such as ITP will typically make sure that no cookies are sent in this request.
The path parameter “20” at the end of the well-known location is the conversion data. This gives shop.example an opportunity to say something about the conversion such as where in the sales funnel the customer is, what the value of the conversion is, what time of day it is, or whatever they decide is relevant for them.
The redirect to the well-known location may also include an optional priority parameter which indicates the importance of this particular conversion in the case of multiple conversions matching the same stored ad click.
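As a rough sketch of the server side, here is how search.example might build the redirect target for an incoming tracking pixel request. The function name is hypothetical. Placing the priority as a trailing path segment follows the example URL used in the debugging section of this post, and the 0 to 63 limits are the ones WebKit’s implementation uses.

```python
from typing import Optional

WELL_KNOWN_PREFIX = "/.well-known/ad-click-attribution/"
MAX_VALUE = 63  # both conversion data and priority are limited to 0..63


def conversion_redirect_url(reporting_site: str, conversion_data: int,
                            priority: Optional[int] = None) -> str:
    """Build the same-site redirect target that signals a conversion."""
    if not 0 <= conversion_data <= MAX_VALUE:
        raise ValueError("conversion data must be between 0 and 63")
    url = f"https://{reporting_site}{WELL_KNOWN_PREFIX}{conversion_data}"
    if priority is not None:
        if not 0 <= priority <= MAX_VALUE:
            raise ValueError("priority must be between 0 and 63")
        url += f"/{priority}"  # optional trailing segment
    return url
```

The server would send this URL in the `Location` header of an HTTP redirect response to the pixel request.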
Here are the important privacy aspects of this step:
- Neither search.example nor shop.example knows whether there is any stored ad click data to be matched against.
- Neither search.example nor shop.example is told by the browser whether there was a match or not.
- The entropy of the ad conversion data needs to be properly restricted to not become a cross-site tracking vector. WebKit’s implementation allows a value between 0 and 63, i.e. six bits to distinguish conversion events. As mentioned earlier, shop.example decides what goes into these bits. For instance, they may spend two bits on monetary value in four buckets: {less than $10, between $10 and $50, between $51 and $200, above $200}.
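To make the six-bit budget concrete, here is one way shop.example could pack its conversion data. The layout is purely an assumption for illustration: the two low bits hold the monetary bucket from the example above, and the remaining four bits hold a hypothetical funnel-stage number. Amounts that fall between the stated bucket edges (for instance $50.50) are assigned to the higher bucket here.

```python
from typing import Tuple


def value_bucket(amount_usd: float) -> int:
    """Map a purchase amount to one of the four buckets (two bits)."""
    if amount_usd < 10:
        return 0   # less than $10
    if amount_usd <= 50:
        return 1   # between $10 and $50
    if amount_usd <= 200:
        return 2   # between $51 and $200
    return 3       # above $200


def pack_conversion_data(funnel_stage: int, amount_usd: float) -> int:
    """Combine a 4-bit funnel stage and a 2-bit value bucket into 0..63."""
    if not 0 <= funnel_stage <= 15:
        raise ValueError("funnel stage must fit in four bits (0..15)")
    return (funnel_stage << 2) | value_bucket(amount_usd)


def unpack_conversion_data(data: int) -> Tuple[int, int]:
    """Recover (funnel_stage, value_bucket) from a packed value."""
    return data >> 2, data & 0b11
```

With this layout, a $90 purchase at funnel stage 5 packs to 22, and the largest possible value (stage 15, above $200) is 63, so every result stays within the allowed range.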
We expect to also implement a JavaScript API to send this information to the .well-known location to remove the requirement for tracking pixels but we’d like to openly discuss what should go into that API since it is much more forward looking than retrofitting existing tracking pixels.
Step 3: Send Out Ad Click Attribution Data
Now we come to the third and final step — the browser reports that a conversion happened for a user that had previously clicked an ad.
Once the browser has matched a conversion against a stored ad click, it sets a timer, randomized between 24 and 48 hours. When that timer fires, the browser makes an ephemeral, stateless POST request to the same well-known location. In our example, the request would go to https://search.example/.well-known/ad-click-attribution/20/55, with the referrer request header set to https://shop.example.
In plain English this report would say: 24 to 48 hours ago, some user who previously clicked shop.example’s ad campaign 55 on search.example, converted with data 20 on shop.example.
Once the ephemeral, stateless POST request goes out, the stored ad click is consumed and cannot be converted further. This is in part why we have the minimum delay of 24 hours. During that delay, shop.example has the opportunity to signal further conversions, for instance down a sales funnel, and only the most important conversion will be sent in the POST request. The importance is controlled through the optional priority parameter in the conversion redirect, as mentioned above.
Here are the important privacy aspects of this step:
- Neither search.example nor shop.example knows that an attribution request has been scheduled.
- The 24–48 hour delay makes sure a conversion that happens shortly after an ad click will not allow for speculative profiling of the user by search.example. The randomness in the delay makes sure that the request does not in itself reveal when during the day the conversion happened. If shop.example wants time of day data, they will have to spend some of their six bits of conversion data on it.
- The ephemeral, stateless request makes sure the request is not associated with state built up through other browsing. Safari refers to this kind of ephemeral session as Private Browsing.
- The well-known location allows for a simple rule if content blockers want to block such conversion reporting.
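The scheduling in this step can be sketched as follows. The `AttributionReport` type and `schedule_report` function are hypothetical names, and drawing the delay from a uniform distribution is an assumption; the text only says the delay is randomized between 24 and 48 hours.

```python
import random
from dataclasses import dataclass
from typing import Optional

MIN_DELAY_SECONDS = 24 * 60 * 60
MAX_DELAY_SECONDS = 48 * 60 * 60


@dataclass(frozen=True)
class AttributionReport:
    """Hypothetical description of the request the browser will send later."""
    method: str
    url: str
    referrer: str
    delay_seconds: float


def schedule_report(source_site: str, destination_site: str,
                    conversion_data: int, campaign_id: int,
                    rng: Optional[random.Random] = None) -> AttributionReport:
    """Describe the delayed, stateless POST for a matched conversion."""
    if rng is None:
        rng = random.Random()
    # Assumption: a uniform draw over the 24 to 48 hour window.
    delay = rng.uniform(MIN_DELAY_SECONDS, MAX_DELAY_SECONDS)
    url = (f"https://{source_site}/.well-known/ad-click-attribution/"
           f"{conversion_data}/{campaign_id}")
    return AttributionReport("POST", url, f"https://{destination_site}", delay)
```

Calling `schedule_report("search.example", "shop.example", 20, 55)` yields the request from the running example. Note that the report carries no cookies, user identifier, or timestamp of the conversion itself.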
Privacy Considerations
For ad click attribution to happen, some bits of data about what happened across two websites need to be sent. Today’s practice of ad click attribution has no practical limit on the bits of data, which allows for full cross-site tracking of users using cookies. This is privacy invasive and thus we are obliged to prevent such ad click attribution from happening in Safari and WebKit.
But by keeping the entropy of attribution data low enough, we believe the reporting can be done in a privacy preserving way.
Here is a summary of our privacy considerations for Privacy Preserving Ad Click Attribution:
- Only links served on first-party pages should be able to store ad click attribution data. This ensures that users have a chance of understanding how Privacy Preserving Ad Click Attribution works.
- Neither the website where the ad click happens nor the website where the conversion happens should be able to see whether ad click data has been stored, has been matched, or is scheduled for reporting.
- Ad clicks should only be stored for a limited time, such as a week. Users cannot be expected to understand that a purchase they make today is attributed to an ad click they made months ago.
- The entropy of both ad campaign ID and conversion data needs to be restricted to a point where this data cannot be repurposed for cross-site tracking of users. We propose six bits each for these two pieces of data, or values between 0 and 63.
- Ad click attribution requests should be delayed randomly between 24 and 48 hours. This makes sure that a conversion that happens shortly after an ad click will not allow for speculative cross-site profiling of the user. The randomness in the delay makes sure the request does not in itself reveal when during the day the conversion happened.
- The browser should not guarantee any specific order in which multiple ad click attribution requests are sent, since the order itself could be abused to increase the entropy and allow for cross-site tracking of users.
- The browser should use an ephemeral session (a.k.a. private or incognito mode) to make ad click attribution requests.
- The browser should not use or accept any credentials such as cookies, client certificates, or Basic Authentication in ad click attribution requests or responses.
- The browser should offer a way to turn ad click attribution on and off. We intend for the default setting to be on to encourage websites to move to this technology and abandon general cross-site tracking.
- The browser should not enable ad click attribution in private/incognito mode.
Try It Out In Safari Technology Preview!
We’re happy to offer Privacy Preserving Ad Click Attribution as an experimental feature in Safari Technology Preview 82+.
First, enable the Develop menu, then go to the Experimental Features submenu.
There you’ll find “Ad Click Attribution” which enables the feature itself, and “Ad Click Attribution Debug Mode” which enables debug logging for developers and shortens the 24–48 hour delay to a static one minute delay, also for use by developers.
Debugging the Link Attributes
A cross-site anchor element that wants to push ad click attribution data into the browser looks like this:
<a href="https://some.site.example" addestination="https://shop.example" adcampaignid="55">
To debug such elements, you use the Web Inspector’s console with the “Preserve Log” setting enabled. Here are a few examples of console warnings you may see if there’s something wrong with your attribution attributes:
Both adcampaignid and addestination need to be set for Ad Click Attribution to work.
Ad Click Attribution is only supported in the main frame.
This tells you the anchor element is not part of the main frame.
addestination can not be the same site as the current website.
This technology is meant for cross-site attribution of ad clicks. There is no need for it within the same website.
Debugging Storage of Ad Clicks
For debugging anything beyond the anchor element, you need to use the system log (syslog). Here’s how you achieve that:
- Enable Ad Click Attribution Debug Mode in the Develop → Experimental Features submenu.
- In your macOS Terminal, run:
log stream -info | grep AdClickAttribution
Now if you click a cross-site element with adDestination and adCampaignID attributes, you should expect to see the following in your syslog:
Storing an ad click.
Debugging Conversions
A conversion is signaled through a same-site HTTP redirect to /.well-known/ad-click-attribution/[a decimal value between 0 and 63 representing the conversion data]. Same-site here means search.example needs to be the server redirecting to https://search.example/.well-known/ad-click-attribution/. The reason for this is that search.example should be in control of when stored ad clicks on its site are consumed. Note that the conversion redirect is done as a subresource on shop.example so we don’t mean same-site as the main frame.
Once you do such a redirect, the syslog might feature one of the following error messages:
Conversion was not accepted because the HTTP redirect was not same-site.
This is the requirement mentioned above, i.e. it has to be search.example redirecting to search.example/.well-known/ad-click-attribution/.
Conversion was not accepted because it was requested in an HTTP redirect that is same-site as the first-party.
Again, this technology is meant for cross-site attribution of ad clicks. There is no need for it within the same website.
Conversion was not accepted because the URL's protocol is not HTTPS or the URL contains one or more of username, password, query string, and fragment.
The request to the well-known location has to be HTTPS and cannot contain a username, password, query string, or fragment.
Conversion was not accepted because the URL path did not start with /.well-known/ad-click-attribution/.
Conversion was not accepted because the conversion data could not be parsed or was higher than the allowed maximum of 63.
Conversion was not accepted because the URL path contained unrecognized parts.
This is a catch-all error message for when the URL has unrecognized path elements or is not of the correct length.
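The URL checks behind these messages can be approximated with the sketch below. It is not WebKit’s code: the function and type names are hypothetical, the same-site redirect checks are left out because they need request context, and treating a missing priority as 0 and an invalid priority as “unrecognized parts” are assumptions. The error strings are the ones quoted above.

```python
from typing import NamedTuple, Optional
from urllib.parse import urlsplit

PREFIX = "/.well-known/ad-click-attribution/"
MAX_VALUE = 63  # six bits

ERR_URL = ("Conversion was not accepted because the URL's protocol is not "
           "HTTPS or the URL contains one or more of username, password, "
           "query string, and fragment.")
ERR_PATH = ("Conversion was not accepted because the URL path did not start "
            "with /.well-known/ad-click-attribution/.")
ERR_DATA = ("Conversion was not accepted because the conversion data could "
            "not be parsed or was higher than the allowed maximum of 63.")
ERR_PARTS = ("Conversion was not accepted because the URL path contained "
             "unrecognized parts.")


class ConversionCheck(NamedTuple):
    conversion_data: Optional[int]
    priority: Optional[int]
    error: Optional[str]


def _parse_six_bit(text: str) -> Optional[int]:
    """Return the decimal value of text if it is within 0..63, else None."""
    if text.isascii() and text.isdigit() and int(text) <= MAX_VALUE:
        return int(text)
    return None


def check_conversion_url(url: str) -> ConversionCheck:
    """Validate a conversion redirect URL and extract its two values."""
    parts = urlsplit(url)
    if (parts.scheme != "https" or parts.username or parts.password
            or parts.query or parts.fragment):
        return ConversionCheck(None, None, ERR_URL)
    if not parts.path.startswith(PREFIX):
        return ConversionCheck(None, None, ERR_PATH)
    segments = parts.path[len(PREFIX):].split("/")
    data = _parse_six_bit(segments[0])
    if data is None:
        return ConversionCheck(None, None, ERR_DATA)
    if len(segments) == 1:
        return ConversionCheck(data, 0, None)  # assumption: priority defaults to 0
    priority = _parse_six_bit(segments[1]) if len(segments) == 2 else None
    if priority is None:
        return ConversionCheck(None, None, ERR_PARTS)
    return ConversionCheck(data, priority, None)
```

A helper like this can be handy in server-side tests to catch malformed redirect targets before trying them in the browser.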
Detecting Successful Conversions
If you got everything right in the redirect to the well-known location, you should see the following message in the syslog:
Got a conversion with conversion data: 20 and priority: 0.
Here you see the priority parameter. It is a way for the server to signal how important a particular conversion is so that the browser can report the most important one. Take the sales funnel example. There, multiple conversions will happen in succession: add to shopping cart, enter shipping info, enter payment info, and finalize purchase. Most likely, the finalized purchase is the conversion that should be reported together with the ad campaign ID. Priority can be 0 to 63, higher means higher priority, and the priority value is only used for internal bookkeeping, i.e. not sent in any request.
Henceforth, let’s assume the redirect is done with conversion data 20 and priority 12, like so:
https://search.example/.well-known/ad-click-attribution/20/12
Now, if there’s a stored ad click that matches this conversion, you’ll see detailed conversion information in the syslog:
Converted a stored ad click with conversion data: 20 and priority: 12.
This is when a previously unconverted ad click is converted.
Re-converted an ad click with a new one with conversion data: 20 and priority: 12 because it had higher priority.
This is for when there’s a conversion of higher priority that matches an already scheduled conversion request. The ad click is kept, re-converted with the high priority conversion, and scheduled for reporting.
Replaced a previously converted ad click with a new one with conversion data: 20 and priority: 12 because it had higher priority.
This is for when there is a different ad click (the user may have clicked more than one ad) with a scheduled conversion request but with lower priority. The newly converted ad click with higher priority replaces the old one.
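The bookkeeping for a single stored ad click can be sketched like this. It covers only the first two cases above (converting and re-converting the same click); replacing a different, previously converted click is left out. The names are hypothetical, and keeping the earlier conversion when priorities are equal is an assumption, since the log messages only mention higher priority.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PendingClick:
    """Hypothetical state for one stored ad click awaiting its report."""
    campaign_id: int
    conversion_data: Optional[int] = None
    priority: Optional[int] = None


def apply_conversion(click: PendingClick, conversion_data: int,
                     priority: int) -> str:
    """Record a conversion if it is the first one or outranks the current one."""
    if click.conversion_data is None:
        click.conversion_data, click.priority = conversion_data, priority
        return "converted"
    if priority > click.priority:
        click.conversion_data, click.priority = conversion_data, priority
        return "re-converted"
    return "ignored"  # assumption: equal or lower priority changes nothing
```

In the sales funnel example, an add-to-cart conversion with a low priority is recorded first, and a later purchase conversion with data 20 and priority 12 overwrites it, so the eventual report carries 20.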
Finally, you’ll see the scheduling of the report request in the syslog:
Setting timer for firing conversion requests to the debug mode timeout of 60 seconds where the regular timeout would have been 111003 seconds.
This is special-cased for Ad Click Attribution Debug Mode. Instead of the 24 to 48 hour delay, there’s only a 60 second delay. The log message tells what the real delay would have been, in this case ≈31 hours.
Receiving Conversion Reports
When the scheduled timer fires, an HTTP POST request is made to /.well-known/ad-click-attribution/[conversion data]/[ad campaign ID], effectively reporting that a conversion happened for a user that previously clicked an associated advertisement. In our example, this request would go to:
https://search.example/.well-known/ad-click-attribution/20/55
… with the referrer request header set to:
https://shop.example/
When this request is about to go out, you’ll see the following syslog entry:
About to fire an attribution request for a conversion.
If something went wrong with the request, you’ll see it in the syslog:
Received error: [error message] for ad click attribution request.
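On the receiving end, search.example needs to pull the two values out of the request path and read the conversion site from the referrer header. The sketch below shows one way to do that; `parse_report` and `AttributionReport` are hypothetical names, and how the method, path, and header reach the function depends on the server framework in use.

```python
from typing import NamedTuple, Optional
from urllib.parse import urlsplit

PREFIX = "/.well-known/ad-click-attribution/"
MAX_VALUE = 63


class AttributionReport(NamedTuple):
    conversion_site: str  # taken from the referrer, e.g. "shop.example"
    conversion_data: int  # 0..63
    campaign_id: int      # 0..63


def parse_report(method: str, path: str,
                 referrer: str) -> Optional[AttributionReport]:
    """Parse an incoming attribution report, or return None if malformed."""
    if method != "POST" or not path.startswith(PREFIX):
        return None
    segments = path[len(PREFIX):].split("/")
    if len(segments) != 2:
        return None
    if not all(s.isascii() and s.isdigit() for s in segments):
        return None
    conversion_data, campaign_id = (int(s) for s in segments)
    if conversion_data > MAX_VALUE or campaign_id > MAX_VALUE:
        return None
    site = urlsplit(referrer).hostname
    if site is None:
        return None
    return AttributionReport(site, conversion_data, campaign_id)
```

For the example request, `parse_report("POST", "/.well-known/ad-click-attribution/20/55", "https://shop.example/")` returns a report for shop.example with conversion data 20 and campaign ID 55.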
Where To Send Feedback and Bug Reports
Privacy Preserving Ad Click Attribution is in the early stage of being proposed as a standard through the W3C Web Platform Incubator Community Group (WICG). Please join the discussion and file issues to discuss how this technology fits with your use cases.
If you find that the experimental feature in Safari Technology Preview doesn’t work as explained, please file a WebKit bug at https://bugs.webkit.org and CC John Wilander.
For technical inquiries on Privacy Preserving Ad Click Attribution, you’ll find me on Twitter: @johnwilander