Attribution Capture
Consent dependency: This section describes the full attribution capture capability. In practice, which parameters are captured and where they are stored depends on the active consent tier (see 07-consent-integration.md). Click ID capture requires advertising consent. Client-side storage requires analytics consent. Server-side UTM/referrer capture operates independently of consent.
🏗️ Why Attribution Must Be Native
Attribution data degrades the moment it leaves the first HTTP request. A user clicks a Google Ad, lands on your site with `?gclid=Cj0KCQjw...&utm_source=google&utm_medium=cpc`, and the clock starts. If you rely on a client-side tag to capture those parameters, three things work against you:
- Safari ITP caps JavaScript-set cookies containing `gclid` or `fbclid` at 24 hours. A server-side `Set-Cookie` header in the first HTTP response survives. A JavaScript cookie set by `gtag.js` does not.
- Ad blockers block the scripts that would read the URL. If `gtag.js` never loads, the parameters are never captured. Your own application code runs unconditionally.
- Page navigations lose the query string. If the user navigates to a second page before the tag fires, the UTM parameters are gone from the URL. Server-side capture on the landing page request captures them before anything else happens.
Native attribution capture means: the server reads the URL and referrer on the first request, extracts the parameters, and stores them. No dependency on external scripts, no race conditions, no blocked requests.
📥 Parameters to Capture
Standard UTM Parameters

| Parameter | Purpose | Example |
|---|---|---|
| `utm_source` | Traffic source | google, facebook, newsletter |
| `utm_medium` | Marketing medium | cpc, email, social, organic |
| `utm_campaign` | Campaign name | spring_sale_2026, brand_awareness |
| `utm_term` | Paid search keyword | running+shoes, best+crm |
| `utm_content` | Ad variation identifier | ad_v2, banner_top, cta_red |

UTM parameters are not personal data. They describe the campaign, not the user. No DPA has classified UTM values as personal data under GDPR. They are safe to capture and store at all consent tiers where any analytics processing is permitted.
Click Identifiers

| Parameter | Platform | Typical Format | Typical Length | Validity Window | Personal Data (GDPR) |
|---|---|---|---|---|---|
| `gclid` | Google Ads | Protobuf + Base64 | 30-100+ chars | 90 days | Yes |
| `gbraid` | Google Ads (iOS, web-to-app) | Opaque token | Variable | 90 days | No (aggregate) |
| `wbraid` | Google Ads (iOS, app-to-web) | Opaque token | Variable | 90 days | No (aggregate) |
| `dclid` | Google Marketing Platform (CM360/DV360) | 64-bit integer | Up to 20 chars | 90 days | Yes |
| `fbclid` | Meta / Facebook | Alphanumeric | ~61 chars | 7 days click / 1 day view | Yes |
| `msclkid` | Microsoft Ads | UUID-like hex | ~30+ chars | 90 days | Yes |
| `ttclid` | TikTok | Opaque string | Variable | 1, 7, or 14 days (configurable) | Yes |
| `li_fat_id` | LinkedIn | UUID | 36 chars | 30 days (7 days on Safari) | Yes |
| `epik` | Pinterest | Opaque string | Variable | 45 days (configurable) | Yes |
| `twclid` | X / Twitter | Alphanumeric | ~26 chars | Per campaign settings | Yes |
| `ScCid` | Snapchat | UUID | 36 chars | 28 days click / 1 day view | Yes |

With two exceptions, click identifiers are personal data under GDPR. Article 4(1) names “an online identifier” among the identifiers that can make a person identifiable, and Recital 30 notes that online identifiers can be combined with other information to create profiles of individuals and identify them. Click IDs are unique to a user interaction, can be linked to other data to identify an individual, and are used by advertising platforms to build user profiles. The exceptions are Google’s `gbraid` and `wbraid`, which are aggregate identifiers designed to be non-user-specific, though their status is arguable when combined with other signals.
Custom Parameters
Implementations should support a configurable allowlist for client-specific parameters. Examples:
- `ref` (affiliate referral codes)
- `promo` (promotion identifiers)
- `partner` (partner attribution)
- Internal campaign tracking parameters specific to the business
Parameters not on the allowlist are ignored. This prevents capturing arbitrary query parameters (search queries, PII in URLs, session tokens) that have no attribution value.
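As a rough illustration of allowlist filtering, the sketch below keeps only UTM parameters and a custom allowlist and drops everything else. The function name `extract_allowed_params` and the default allowlist are assumptions made for this example, not part of the specification. Click IDs are left out here because they are consent-gated.

```python
from urllib.parse import urlsplit, parse_qs

UTM_PARAMS = ("utm_source", "utm_medium", "utm_campaign", "utm_term", "utm_content")


def extract_allowed_params(url, custom_allowlist=("ref", "promo", "partner")):
    """Return only allowlisted query parameters; all others are ignored."""
    query = parse_qs(urlsplit(url).query)
    allowed = set(UTM_PARAMS) | set(custom_allowlist)
    # parse_qs maps each name to a list of values; keep the first one.
    return {name: values[0] for name, values in query.items() if name in allowed}


# The email address and the search query never reach attribution storage.
captured = extract_allowed_params(
    "https://example.com/?utm_source=newsletter&ref=partner42&email=a%40b.com&q=shoes"
)
```

Because the filter is an allowlist rather than a blocklist, a new unknown parameter is ignored by default.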
📊 Attribution Model
First Touch + Last Touch
UIAF captures two attribution snapshots per user, plus a touch counter:
- `first_touch`: Captured on the FIRST visit that carries attribution parameters. Never overwritten. Records how the user originally discovered the site.
- `last_touch`: Updated on EVERY visit that carries new attribution parameters. Records the most recent marketing touchpoint before conversion.
- `count`: Integer counter incremented on every attribution-carrying visit. Indicates how many marketing touchpoints preceded conversion.
Why First + Last (Not All Touches)
Multi-touch attribution models that record every touchpoint are analytically powerful but operationally complex. They require ordered arrays of unbounded length, increase storage requirements, and create payload size issues for client-side storage.
First touch + last touch covers the vast majority of analytical needs:
- First touch answers: “What channel acquired this user?”
- Last touch answers: “What channel drove this conversion?”
- Count answers: “How many touchpoints did this user have?”
These three data points cover approximately 90% of attribution questions asked by marketing teams. A developer who needs full path analysis can extend the model by storing an array of all touches — UIAF does not prevent this, but the base specification keeps it simple.
🔄 When Attribution Updates
Attribution data updates only when the current visit carries attribution signals. A direct visit (no UTM parameters, no referrer, no click IDs) does not modify the attribution record.
The key rule: first_touch is write-once. Once set, it never changes for that user. last_touch is write-always when new attribution data is present.
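The write-once / write-always rule can be sketched in a few lines. This is a minimal illustration, assuming a direct visit is represented by `None`. The helper name `apply_touch` is hypothetical.

```python
def apply_touch(record, touchpoint):
    """Apply one visit to the attribution record and return the updated record."""
    if touchpoint is None:
        # Direct visit: no attribution signals, so the record is left untouched.
        return record
    updated = dict(record)
    if updated.get("first_touch") is None:
        updated["first_touch"] = touchpoint  # write-once
    updated["last_touch"] = touchpoint       # write-always
    updated["count"] = updated.get("count", 0) + 1
    return updated


record = {"first_touch": None, "last_touch": None, "count": 0}
record = apply_touch(record, {"source": "google", "medium": "cpc"})
record = apply_touch(record, None)  # direct visit, no change
record = apply_touch(record, {"source": "facebook", "medium": "paid_social"})
```

After these three visits, `first_touch` still points at the Google visit, `last_touch` at the Facebook visit, and `count` is 2.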
🔍 Referrer Classification
The HTTP Referer header (or client-side document.referrer as fallback) provides context about where the user came from. UIAF classifies referrers into categories for the source and medium fields when UTM parameters are absent.
Classification Hierarchy
When UTM parameters are present, they take precedence over referrer-based classification. When UTMs are absent, classify based on referrer domain:
- Search engines: match by domain `google.com`, `google.co.*` (all TLDs), `bing.com`, `yahoo.com`, `duckduckgo.com`, `baidu.com`, `yandex.com`, `yandex.ru`, `ecosia.org`, `search.brave.com`. Medium: `organic`
- Social: match by domain `facebook.com`, `instagram.com`, `twitter.com`, `x.com`, `linkedin.com`, `tiktok.com`, `reddit.com`, `pinterest.com`, `youtube.com`, `threads.net`. Medium: `social`
- AI assistants: match by domain `chatgpt.com`, `perplexity.ai`, `claude.ai`, `gemini.google.com`, `copilot.microsoft.com`. Medium: `ai`
- Email: determined by `utm_medium=email` or known webmail domains `mail.google.com`, `outlook.live.com`, `mail.yahoo.com`. Medium: `email`
- Referral: any other non-empty referrer from a different domain. Medium: `referral`
- Direct: empty referrer OR same-domain referrer. Medium: `(none)`, source: `(direct)`

Match the most specific host first. `gemini.google.com` and `mail.google.com` are subdomains of `google.com`, and `mail.yahoo.com` is a subdomain of `yahoo.com`, so a plain suffix match against the search-engine list would classify them as `organic`.
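One way to implement this classification is an ordered rule list with suffix matching, sketched below. The names `RULES` and `classify_referrer` are assumptions, as is returning the matched domain as the source; the specification does not say what source value referrer classification should produce. Country-specific Google domains (`google.co.*`) are left out for brevity.

```python
from urllib.parse import urlsplit

# Checked in order: specific hosts (AI, webmail) come before the broader
# search-engine domains they are subdomains of.
RULES = [
    ("ai", ("chatgpt.com", "perplexity.ai", "claude.ai",
            "gemini.google.com", "copilot.microsoft.com")),
    ("email", ("mail.google.com", "outlook.live.com", "mail.yahoo.com")),
    ("organic", ("google.com", "bing.com", "yahoo.com", "duckduckgo.com",
                 "baidu.com", "yandex.com", "yandex.ru", "ecosia.org",
                 "search.brave.com")),
    ("social", ("facebook.com", "instagram.com", "twitter.com", "x.com",
                "linkedin.com", "tiktok.com", "reddit.com", "pinterest.com",
                "youtube.com", "threads.net")),
]


def _matches(host, domain):
    """True if host is the domain itself or one of its subdomains."""
    return host == domain or host.endswith("." + domain)


def classify_referrer(referrer, current_host):
    """Map a referrer URL to a source/medium pair (used when UTMs are absent)."""
    host = (urlsplit(referrer).hostname or "") if referrer else ""
    if not host or host == current_host:
        return {"source": "(direct)", "medium": "(none)"}
    for medium, domains in RULES:
        for domain in domains:
            if _matches(host, domain):
                return {"source": domain, "medium": medium}
    return {"source": host, "medium": "referral"}
```

Only the hostname is inspected, so the function works equally well with a full referrer URL or with the origin-only value that `strict-origin-when-cross-origin` produces.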
Why Server-Side Referrer Is Better
Modern browsers default to `Referrer-Policy: strict-origin-when-cross-origin`. This means cross-origin requests send only the origin (e.g., `https://www.google.com/`); the path is stripped. Client-side `document.referrer` follows the same policy.
Server-side access to the Referer header gets at least the origin on the first HTTP request, before any JavaScript executes. This matters because:
- `document.referrer` may be empty if the page was opened via JavaScript (`window.open`), bookmarks, or certain app deep links
- Single-page applications that do client-side routing lose the referrer after the initial page load
- Server-side capture happens before ad blockers can interfere
For referrer classification, the origin is sufficient — you only need the domain to classify the source.
Edge Case: Empty Referrer with UTM Parameters
When the referrer is empty but the URL contains UTM parameters, use the UTM data for classification. `utm_source` takes precedence over referrer for determining the source. This handles cases where:
- The referrer was stripped by the referring site’s `Referrer-Policy: no-referrer`
- The user navigated via an app that does not send referrers (email clients, messaging apps)
- The link was opened in a new browser context
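The precedence rule reduces to a per-field fallback. In the sketch below, `resolve_source_medium` is a hypothetical helper, and the referrer classification is assumed to arrive as a dict with `source` and `medium` keys.

```python
def resolve_source_medium(utms, referrer_class):
    """UTM values win; referrer classification fills whatever the UTMs leave empty."""
    return {
        "source": utms.get("utm_source") or referrer_class["source"],
        "medium": utms.get("utm_medium") or referrer_class["medium"],
    }


# Link opened from a native email client: no referrer, but UTMs are present.
direct = {"source": "(direct)", "medium": "(none)"}
resolved = resolve_source_medium(
    {"utm_source": "newsletter", "utm_medium": "email"}, direct
)
```

Without the UTM values, the same visit would be recorded as direct.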
🔗 Click ID Handling
Critical Rules
1. Capture on the first HTTP request, server-side.
Do not rely on client-side JavaScript to read click IDs from the URL. Safari ITP caps JavaScript-set cookies containing gclid or fbclid to 24 hours. A server-side Set-Cookie header in the initial HTTP response is not subject to this restriction (provided IP alignment — see 05-browser-landscape.md).
2. Store alongside attribution data, not in separate cookies.
Click IDs are part of the attribution record. Storing them in separate cookies (as platform tags do with _gcl_aw, _fbc, etc.) fragments the data and makes each cookie independently vulnerable to expiration or deletion. Store them as fields within the unified attribution data structure.
3. NEVER strip click IDs from the URL.
Removing gclid, fbclid, or other click IDs from the URL after capture breaks downstream systems:
- Google’s own Consent Mode detection reads `gclid` from the URL
- Platform pixels (Meta Pixel, TikTok Pixel) need the click ID in the URL to function
- Analytics platforms may independently capture the click ID from the landing page URL
- URL-based deduplication between client-side and server-side event streams relies on the click ID being present
Capture the value. Leave the URL intact.
4. Include expiration metadata.
Each click ID has a platform-defined validity window. Store the capture timestamp and compute the expiration so downstream systems can assess whether the click ID is still valid for conversion attribution.

| Click ID | Recommended Storage Duration | Rationale |
|---|---|---|
| `gclid` | 90 days | Matches Google’s maximum conversion window |
| `dclid` | 90 days | CM360 offline upload window is 60 days; 90 days covers edge cases |
| `fbclid` | 90 days | Meta flags values older than 90 days |
| `msclkid` | 90 days | Microsoft’s official recommendation |
| `ttclid` | 90 days | Conservative; actual CTA window is 1-14 days |
| `li_fat_id` | 30 days | LinkedIn’s cookie lifetime; no benefit to storing longer |
| `epik` | 90 days | Pinterest tag cookies last 1 year; 90 covers all attribution windows |
| `twclid` | 90 days | X rejects events older than 90 days |
| `ScCid` | 37 days | Snapchat CAPI accepts up to 37 days post-click |

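Expiration metadata can be derived from the capture timestamp and the storage durations in the table above. The sketch below is one possible shape. The record layout (`value`, `captured_at`, `expires_at`) and the helper names are assumptions for illustration; the base data structure stores click IDs as plain values.

```python
SECONDS_PER_DAY = 86_400

# Recommended storage durations, in days, taken from the table above.
STORAGE_DAYS = {
    "gclid": 90, "dclid": 90, "fbclid": 90, "msclkid": 90, "ttclid": 90,
    "li_fat_id": 30, "epik": 90, "twclid": 90, "ScCid": 37,
}


def with_expiry(param, value, captured_at):
    """Attach capture and expiry timestamps (Unix seconds) to a click ID."""
    expires_at = captured_at + STORAGE_DAYS[param] * SECONDS_PER_DAY
    return {"value": value, "captured_at": captured_at, "expires_at": expires_at}


def is_expired(entry, now):
    """True once the storage window has elapsed."""
    return now >= entry["expires_at"]


entry = with_expiry("ScCid", "example-click-id", captured_at=1_700_000_000)
```

Downstream systems can then call `is_expired(entry, now)` before sending a conversion, rather than recomputing platform windows themselves.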
📐 Attribution Data Structure

```json
{
  "attribution": {
    "first_touch": {
      "source": "google",
      "medium": "cpc",
      "campaign": "spring_sale",
      "term": "running shoes",
      "content": "ad_v2",
      "click_ids": {
        "gclid": "Cj0KCQjw84anBhCtARIsAISI-xfSUJmQ8Z..."
      },
      "referrer": "google.com",
      "landing_page": "/products/shoes",
      "timestamp": 1647291600
    },
    "last_touch": {
      "source": "facebook",
      "medium": "paid_social",
      "campaign": "retargeting_q2",
      "term": null,
      "content": "carousel_v3",
      "click_ids": {
        "fbclid": "IwAR2F4-dbP0l7Mn1IawQQGCINEz..."
      },
      "referrer": "facebook.com",
      "landing_page": "/products/shoes",
      "timestamp": 1647982200
    },
    "count": 3
  }
}
```

Field definitions:
- `source`: Traffic source, from `utm_source` or referrer classification
- `medium`: Marketing medium, from `utm_medium` or referrer classification
- `campaign`: Campaign name, from `utm_campaign` (null if absent)
- `term`: Search keyword, from `utm_term` (null if absent)
- `content`: Ad content variant, from `utm_content` (null if absent)
- `click_ids`: Object mapping parameter names to values. Only populated click IDs are included.
- `referrer`: Referrer domain extracted from the `Referer` header
- `landing_page`: URL path only (no domain, no query string). Query strings are excluded because they may contain PII (email, name) or click IDs that should not leak into attribution storage at restricted consent tiers
- `timestamp`: Unix timestamp (seconds) of when this touchpoint was recorded
- `count`: Total number of attribution-carrying visits
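Two of these fields are easy to get subtly wrong: `landing_page` must drop the query string, and `term` is stored URL-decoded (`running+shoes` in the URL becomes `running shoes`). A minimal sketch using the standard library follows. The helper names `landing_page` and `utm_value` are assumptions for this example.

```python
from urllib.parse import urlsplit, parse_qs


def landing_page(url):
    """Path only: scheme, host, query string, and fragment are all dropped."""
    return urlsplit(url).path or "/"


def utm_value(url, name):
    """First value of a query parameter, URL-decoded, or None if absent."""
    values = parse_qs(urlsplit(url).query).get(name)
    return values[0] if values else None


url = "https://example.com/products/shoes?utm_term=running+shoes&email=a%40b.com#reviews"
page = landing_page(url)                   # path without the email-bearing query
term = utm_value(url, "utm_term")          # "+" is decoded to a space
campaign = utm_value(url, "utm_campaign")  # absent parameter
```

Returning `None` for absent parameters maps directly onto the `null` values in the JSON structure.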
💾 Storage
Attribution data is stored client-side for persistence across sessions. The storage strategy mirrors the identity persistence approach (see 02-identity-management.md).
- Primary: `localStorage`. JSON stringified under key `uiaf_attribution`. Survives browser close. Subject to Safari’s 7-day purge if the user does not visit within 7 days.
- Backup: first-party cookie. If the serialized attribution data fits within 4KB, store a compressed version in a cookie. Server-set cookies (`Set-Cookie` header) survive Safari ITP restrictions. Use this as the recovery source when localStorage is purged.
- Session cache: `sessionStorage`. Copy of current attribution data for fast read access during the session. Avoids repeated localStorage reads. Cleared on tab close.
When reading attribution data, check in order: sessionStorage (fastest) -> localStorage -> cookie (recovery). When writing, update all three.
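The read order and the 4KB cookie check can be modeled without a browser. In the sketch below, plain dictionaries stand in for `sessionStorage`, `localStorage`, and the cookie jar; the function names are hypothetical. As a simplification, the cookie copy is stored uncompressed rather than compressed.

```python
import json

STORAGE_KEY = "uiaf_attribution"
COOKIE_LIMIT_BYTES = 4096


def read_attribution(session_store, local_store, cookie_store):
    """Return the first copy found, checking the fastest store first."""
    for store in (session_store, local_store, cookie_store):
        raw = store.get(STORAGE_KEY)
        if raw:
            return json.loads(raw)
    return None


def write_attribution(record, session_store, local_store, cookie_store):
    """Write to every store; the cookie copy is skipped if it exceeds 4KB."""
    raw = json.dumps(record, separators=(",", ":"))
    session_store[STORAGE_KEY] = raw
    local_store[STORAGE_KEY] = raw
    if len(raw.encode("utf-8")) <= COOKIE_LIMIT_BYTES:
        cookie_store[STORAGE_KEY] = raw


session, local, cookie = {}, {}, {}
saved = {"first_touch": {"source": "google"}, "last_touch": {"source": "google"}, "count": 1}
write_attribution(saved, session, local, cookie)

# Simulate a new tab after a localStorage purge: only the cookie survives.
session.clear()
local.clear()
recovered = read_attribution(session, local, cookie)
```

The recovery path only works when the cookie copy was written, which is why the size check matters for long click IDs or many custom parameters.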
💻 Mock Code

```
// captureAttribution(url, consent, referrer)
//   url:      a URL string (window.location.href, or a route URL)
//   consent:  ConsentState object (section 07)
//   referrer: referrer URL string. On initial page load: document.referrer.
//             On SPA navigation: the previous route URL (passed by router hook).
//             Optional; defaults to document.referrer if omitted.
//   returns:  the full attribution object {first_touch, last_touch, count, is_new_touch}
//             ALWAYS returns, even on direct visits with no new attribution.

function captureAttribution(url, consent, referrer) {
  referrer = referrer OR document.referrer  // explicit param or browser default

  // 1. Parse URL parameters
  params = parseQueryString(url)

  // 2. Extract UTM parameters
  utms = {
    source:   params.get("utm_source"),
    medium:   params.get("utm_medium"),
    campaign: params.get("utm_campaign"),
    term:     params.get("utm_term"),
    content:  params.get("utm_content")
  }

  // 3. Extract click IDs -- ONLY if ads consent is granted
  click_ids = {}
  if consent.ads_allowed {
    CLICK_ID_PARAMS = ["gclid", "gbraid", "wbraid", "dclid", "fbclid",
                       "msclkid", "ttclid", "li_fat_id", "epik", "twclid", "ScCid"]
    for param in CLICK_ID_PARAMS {
      value = params.get(param)
      if value != null { click_ids[param] = value }
    }
  }
  // If ads consent is denied, click IDs are never captured or stored.
  // They remain in the URL (not stripped) for platform pixels to read.

  // 4. Extract custom parameters from allowlist
  custom_params = {}
  for param in config.custom_param_allowlist {
    value = params.get(param)
    if value != null { custom_params[param] = value }
  }

  // 5. Classify referrer (from explicit parameter, not hidden global)
  referrer_domain = extractDomain(referrer)
  referrer_classification = classifyReferrer(referrer_domain)

  // 6. Determine if this visit has NEW attribution data
  current_hostname = extractHostname(url)
  has_new_attribution = (
    utms.source != null OR
    utms.medium != null OR
    utms.campaign != null OR
    Object.keys(click_ids).length > 0 OR
    (referrer_domain != null AND referrer_domain != current_hostname)
  )

  // 7. Load existing attribution from storage
  existing = loadAttribution()  // from localStorage/sessionStorage
  if existing == null {
    // First visit: start from an empty record so the function can always return one
    existing = { first_touch: null, last_touch: null, count: 0 }
  }

  // 8. If no new attribution, return existing state unchanged
  if not has_new_attribution {
    existing.is_new_touch = false  // No new attribution on this page load
    return existing
  }

  // 9. Build new touchpoint record
  touchpoint = {
    source:       utms.source OR referrer_classification.source,
    medium:       utms.medium OR referrer_classification.medium,
    campaign:     utms.campaign,
    term:         utms.term,
    content:      utms.content,
    click_ids:    click_ids,
    referrer:     referrer_domain,
    landing_page: extractPath(url),
    timestamp:    now()
  }

  // 10. Update model
  if existing.first_touch == null {
    existing.first_touch = touchpoint
  }
  existing.last_touch = touchpoint
  existing.count = (existing.count OR 0) + 1

  // 11. Store -- ONLY in permitted storage mechanisms
  if consent.analytics_allowed {
    storeAttribution(existing)  // localStorage + sessionStorage
  }

  existing.is_new_touch = true  // New attribution captured on this page load

  // 12. Return the full attribution state (always populated)
  return existing
}
```

⚠️ Edge Cases
Same user, two campaigns in one session. User clicks a Google Ad, lands on the site, then opens a Meta ad link in the same browser session. first_touch remains the Google Ad visit. last_touch updates to the Meta ad. count increments to 2.
Direct visit after campaign visit. User clicks a campaign link on Monday, returns directly on Wednesday. The Wednesday visit has no attribution parameters and no meaningful referrer. No update occurs. last_touch remains the Monday campaign visit.
Click ID expired but still in URL. A user bookmarked a URL containing ?gclid=... and revisits weeks later. The gclid is still in the URL. Capture it, include the current timestamp. Let the receiving endpoint or ad platform decide whether the click ID is still within its attribution window. UIAF does not enforce expiration at capture time.
Very long URLs with many parameters. Capture only parameters on the allowlist (UTMs, known click IDs, custom allowlist). All other query parameters are ignored. This bounds the data volume and prevents accidentally capturing PII or sensitive data from arbitrary query strings.
Referrer is empty but UTMs are present. Common when links are opened from native apps (email clients, messaging apps, some social apps). UTM parameters take precedence. Classify based on utm_source and utm_medium. Set referrer to null.
Multiple click IDs in one URL. Rare but possible (e.g., ?gclid=...&msclkid=... from a misconfigured redirect chain). Capture all of them. Store all in the click_ids object. The receiving endpoint determines which one is authoritative for each platform.
User clears browser storage. Attribution data is lost. On the next visit with attribution parameters, first_touch is set fresh. This is expected behavior — UIAF cannot persist data that the user has explicitly deleted. If server-side attribution storage is implemented, the server copy can be used to restore context when the user is re-identified (see 02-identity-management.md for identity recovery).