Attribution Capture

Consent dependency: This section describes the full attribution capture capability. In practice, which parameters are captured and where they are stored depends on the active consent tier (see 07-consent-integration.md). Click ID capture requires advertising consent. Client-side storage requires analytics consent. Server-side UTM/referrer capture operates independently of consent.


Attribution data degrades the moment it leaves the first HTTP request. A user clicks a Google Ad, lands on your site with ?gclid=Cj0KCQjw...&utm_source=google&utm_medium=cpc, and the clock starts. If you rely on a client-side tag to capture those parameters, three things work against you:

  1. Safari ITP caps cookies written by JavaScript to 24 hours when the user arrives through a link decorated with tracking parameters such as gclid or fbclid from a domain ITP classifies as a tracker. A cookie set by a server-side Set-Cookie header in the first HTTP response survives. A cookie written by gtag.js does not.
  2. Ad blockers block the scripts that would read the URL. If gtag.js never loads, the parameters are never captured. Your own server-side code runs on every request regardless.
  3. Page navigations lose the query string. If the user navigates to a second page before the tag fires, the UTM parameters are gone from the URL. Server-side capture on the landing page request captures them before anything else happens.

Native attribution capture means: the server reads the URL and referrer on the first request, extracts the parameters, and stores them. No dependency on external scripts, no race conditions, no blocked requests.
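A minimal sketch of that server-side step, in TypeScript. It assumes a handler that has the absolute request URL and the raw Referer header. The function name `attributionSetCookie`, the cookie name `uiaf_attribution`, the 90-day `Max-Age`, and the JSON encoding are illustrative assumptions, not part of the specification. Reading the URL and Referer happens on every request; because writing the cookie is client-side storage, the sketch only returns a header when analytics consent is granted.

```typescript
// Hypothetical server-side capture for the landing-page request.
// Cookie name, lifetime and encoding are illustrative assumptions.
const UTM_KEYS = ["utm_source", "utm_medium", "utm_campaign", "utm_term", "utm_content"];
const COOKIE_NAME = "uiaf_attribution";
const COOKIE_MAX_AGE_SECONDS = 90 * 24 * 60 * 60;

function attributionSetCookie(
  requestUrl: string,                // absolute URL of the incoming request
  refererHeader: string | undefined, // raw Referer header, if the browser sent one
  analyticsAllowed: boolean,         // storing the cookie requires analytics consent
): string | null {
  if (!analyticsAllowed) return null;
  const url = new URL(requestUrl);
  const captured: Record<string, string> = {};
  for (const key of UTM_KEYS) {
    const value = url.searchParams.get(key);
    if (value !== null) captured[key] = value;
  }
  if (refererHeader) {
    try {
      const host = new URL(refererHeader).hostname;
      if (host !== url.hostname) captured.referrer = host; // ignore same-site navigation
    } catch {
      // Malformed Referer header: ignore it.
    }
  }
  if (Object.keys(captured).length === 0) return null; // direct visit: nothing to store
  // Deliberately not HttpOnly, so client-side code can read it back for recovery.
  const value = encodeURIComponent(JSON.stringify(captured));
  return `${COOKIE_NAME}=${value}; Path=/; Max-Age=${COOKIE_MAX_AGE_SECONDS}; Secure; SameSite=Lax`;
}
```

In a Node `http` handler, the inputs would come from `req.url` combined with the Host header and from `req.headers.referer`, and a non-null result would be sent with `res.setHeader("Set-Cookie", header)`.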


| Parameter | Purpose | Example |
|---|---|---|
| utm_source | Traffic source | google, facebook, newsletter |
| utm_medium | Marketing medium | cpc, email, social, organic |
| utm_campaign | Campaign name | spring_sale_2026, brand_awareness |
| utm_term | Paid search keyword | running+shoes, best+crm |
| utm_content | Ad variation identifier | ad_v2, banner_top, cta_red |

UTM parameters are not personal data. They describe the campaign, not the user. No DPA has classified UTM values as personal data under GDPR. They are safe to capture and store at all consent tiers where any analytics processing is permitted.
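Extracting the five UTM values is a direct lookup. A sketch using the standard `URL` and `URLSearchParams` APIs; the name `extractUtms` and the rule that an empty value counts as absent are assumptions. `URLSearchParams` decodes `+` as a space, which is why the `utm_term` example `running+shoes` ends up stored as `running shoes`.

```typescript
interface UtmParams {
  source: string | null;
  medium: string | null;
  campaign: string | null;
  term: string | null;
  content: string | null;
}

function extractUtms(pageUrl: string): UtmParams {
  // URLSearchParams decodes percent-escapes and turns "+" into a space.
  const params = new URL(pageUrl).searchParams;
  // Assumption: an empty value (e.g. "utm_source=") is treated as absent.
  const get = (name: string): string | null => {
    const value = params.get(name);
    return value === null || value.trim() === "" ? null : value;
  };
  return {
    source: get("utm_source"),
    medium: get("utm_medium"),
    campaign: get("utm_campaign"),
    term: get("utm_term"),
    content: get("utm_content"),
  };
}
```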

| Parameter | Platform | Typical Format | Typical Length | Validity Window | Personal Data (GDPR) |
|---|---|---|---|---|---|
| gclid | Google Ads | Protobuf + Base64 | 30-100+ chars | 90 days | Yes |
| gbraid | Google Ads (iOS, web-to-app) | Opaque token | Variable | 90 days | No (aggregate) |
| wbraid | Google Ads (iOS, app-to-web) | Opaque token | Variable | 90 days | No (aggregate) |
| dclid | Google Marketing Platform (CM360/DV360) | 64-bit integer | Up to 20 chars | 90 days | Yes |
| fbclid | Meta / Facebook | Alphanumeric | ~61 chars | 7 days click / 1 day view | Yes |
| msclkid | Microsoft Ads | UUID-like hex | ~30+ chars | 90 days | Yes |
| ttclid | TikTok | Opaque string | Variable | 1, 7, or 14 days (configurable) | Yes |
| li_fat_id | LinkedIn | UUID | 36 chars | 30 days (7 days on Safari) | Yes |
| epik | Pinterest | Opaque string | Variable | 45 days (configurable) | Yes |
| twclid | X / Twitter | Alphanumeric | ~26 chars | Per campaign settings | Yes |
| ScCid | Snapchat | UUID | 36 chars | 28 days click / 1 day view | Yes |

With two exceptions, click identifiers are personal data under GDPR. Article 4(1) explicitly lists “an online identifier” among the factors that can identify a natural person, and Recital 30 elaborates on online identifiers such as cookie identifiers. Click IDs are unique to a user interaction, can be linked to other data to identify an individual, and are used by advertising platforms to build user profiles. The exceptions are Google’s gbraid and wbraid, which are designed as aggregate, non-user-specific identifiers, though even their status is arguable when they are combined with other signals.

Implementations should support a configurable allowlist for client-specific parameters. Examples:

  • ref (affiliate referral codes)
  • promo (promotion identifiers)
  • partner (partner attribution)
  • Internal campaign tracking parameters specific to the business

Parameters not on the allowlist are ignored. This prevents capturing arbitrary query parameters (search queries, PII in URLs, session tokens) that have no attribution value.
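A sketch of the allowlist filter; `extractCustomParams` is a hypothetical name. Iterating over the allowlist rather than over the query string means parameters outside it are never read at all.

```typescript
// Returns only the allowlisted parameters that are present in the URL.
function extractCustomParams(pageUrl: string, allowlist: readonly string[]): Record<string, string> {
  const params = new URL(pageUrl).searchParams;
  const captured: Record<string, string> = {};
  // Walk the allowlist, not the query string: search terms, emails and
  // session tokens outside the allowlist are never touched.
  for (const name of allowlist) {
    const value = params.get(name);
    if (value !== null) captured[name] = value;
  }
  return captured;
}
```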


UIAF captures two attribution snapshots per user, plus a touch counter:

  • first_touch: Captured on the FIRST visit that carries attribution parameters. Never overwritten. Records how the user originally discovered the site.
  • last_touch: Updated on EVERY visit that carries new attribution parameters. Records the most recent marketing touchpoint before conversion.
  • count: Integer counter incremented on every attribution-carrying visit. Indicates how many marketing touchpoints preceded conversion.

Multi-touch attribution models that record every touchpoint are analytically powerful but operationally complex. They require ordered arrays of unbounded length, increase storage requirements, and create payload size issues for client-side storage.

First touch + last touch covers the vast majority of analytical needs:

  • First touch answers: “What channel acquired this user?”
  • Last touch answers: “What channel drove this conversion?”
  • Count answers: “How many touchpoints did this user have?”

In practice, these three data points answer most of the attribution questions marketing teams ask. A developer who needs full path analysis can extend the model by storing an array of all touches; UIAF does not prevent this, but the base specification keeps it simple.


Attribution data updates only when the current visit carries attribution signals. A direct visit (no UTM parameters, no referrer, no click IDs) does not modify the attribution record.

The key rule: first_touch is write-once. Once set, it never changes for that user. last_touch is write-always when new attribution data is present.
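The update rule fits in a pure function. In this sketch, `applyTouch` and the trimmed-down `Touchpoint` type are hypothetical, and a `null` touch stands for a visit with no attribution signals.

```typescript
// Trimmed-down touchpoint: the full record also carries campaign, term,
// content, click_ids, referrer and landing_page.
interface Touchpoint {
  source: string | null;
  medium: string | null;
  timestamp: number; // Unix seconds
}

interface AttributionState {
  first_touch: Touchpoint | null;
  last_touch: Touchpoint | null;
  count: number;
}

// `touch` is null when the visit carried no UTMs, click IDs or external referrer.
function applyTouch(state: AttributionState, touch: Touchpoint | null): AttributionState {
  if (touch === null) return state; // direct visit: record unchanged
  return {
    first_touch: state.first_touch ?? touch, // write-once
    last_touch: touch,                       // write-always
    count: state.count + 1,
  };
}
```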


The HTTP Referer header (or client-side document.referrer as fallback) provides context about where the user came from. UIAF classifies referrers into categories for the source and medium fields when UTM parameters are absent.

When UTM parameters are present, they take precedence over referrer-based classification. When UTMs are absent, classify based on referrer domain:

  1. Search engines. Match by domain: google.com and Google’s country-code domains (google.co.*, google.de, and so on), bing.com, yahoo.com, duckduckgo.com, baidu.com, yandex.com, yandex.ru, ecosia.org, search.brave.com. Medium: organic

  2. Social. Match by domain: facebook.com, instagram.com, twitter.com, x.com, linkedin.com, tiktok.com, reddit.com, pinterest.com, youtube.com, threads.net. Medium: social

  3. AI assistants. Match by domain: chatgpt.com, perplexity.ai, claude.ai, gemini.google.com, copilot.microsoft.com. Medium: ai

  4. Email. Determined by utm_medium=email or by a known webmail referrer domain: mail.google.com, outlook.live.com, mail.yahoo.com. Medium: email

  5. Referral. Any other non-empty referrer from a different domain. Medium: referral

  6. Direct. Empty referrer OR same-domain referrer. Medium: (none), source: (direct)

Modern browsers default to Referrer-Policy: strict-origin-when-cross-origin. This means cross-origin requests send only the origin (e.g., https://www.google.com/) — the path is stripped. Client-side document.referrer follows the same policy.

Reading the Referer header server-side captures whatever the browser sent, usually the origin, on the first HTTP request, before any JavaScript executes. This matters because:

  • document.referrer may be empty if the page was opened via JavaScript (window.open), bookmarks, or certain app deep links
  • In single-page applications, document.referrer keeps the value from the initial page load and does not update on client-side route changes
  • Server-side capture happens before ad blockers can interfere

For referrer classification, the origin is sufficient — you only need the domain to classify the source.
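A sketch of the classification rules, working from the referrer origin alone. `classifyReferrer`, the source naming (the matched list domain, or the normalized referrer host), and stripping a leading `www.` are assumptions. One detail the list leaves implicit: gemini.google.com, mail.google.com and mail.yahoo.com overlap with the search-engine domains, so the sketch checks the AI and webmail lists first and only treats bare Google hosts as Google Search.

```typescript
interface Classification {
  source: string;
  medium: string;
}

// Domain lists from the rules above. A host matches a listed domain if it is
// equal to it or a subdomain of it (l.facebook.com matches facebook.com).
const SEARCH_DOMAINS = ["bing.com", "yahoo.com", "duckduckgo.com", "baidu.com",
  "yandex.com", "yandex.ru", "ecosia.org", "search.brave.com"];
const SOCIAL_DOMAINS = ["facebook.com", "instagram.com", "twitter.com", "x.com", "linkedin.com",
  "tiktok.com", "reddit.com", "pinterest.com", "youtube.com", "threads.net"];
const AI_DOMAINS = ["chatgpt.com", "perplexity.ai", "claude.ai", "gemini.google.com", "copilot.microsoft.com"];
const WEBMAIL_DOMAINS = ["mail.google.com", "outlook.live.com", "mail.yahoo.com"];

// google.com, google.de, google.co.uk, google.com.au (after "www." is stripped).
const GOOGLE_SEARCH = /^google\.(com|[a-z]{2}|co\.[a-z]{2}|com\.[a-z]{2})$/;
const DIRECT: Classification = { source: "(direct)", medium: "(none)" };

function normalizeHost(host: string): string {
  return host.toLowerCase().replace(/^www\./, "");
}

function findDomain(host: string, domains: string[]): string | undefined {
  return domains.find((d) => host === d || host.endsWith("." + d));
}

function classifyReferrer(referrer: string | null, siteHost: string): Classification {
  if (!referrer) return DIRECT;
  let host: string;
  try {
    host = normalizeHost(new URL(referrer).hostname); // the origin is enough
  } catch {
    return DIRECT; // unparseable referrer: treat as absent
  }
  if (host === normalizeHost(siteHost)) return DIRECT; // same-site navigation
  // Specific subdomains first, so they are not swallowed by search rules.
  let match = findDomain(host, AI_DOMAINS);
  if (match) return { source: match, medium: "ai" };
  match = findDomain(host, WEBMAIL_DOMAINS);
  if (match) return { source: match, medium: "email" };
  if (GOOGLE_SEARCH.test(host)) return { source: host, medium: "organic" };
  match = findDomain(host, SEARCH_DOMAINS);
  if (match) return { source: match, medium: "organic" };
  match = findDomain(host, SOCIAL_DOMAINS);
  if (match) return { source: match, medium: "social" };
  return { source: host, medium: "referral" };
}
```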

Edge Case: Empty Referrer with UTM Parameters

When the referrer is empty but the URL contains UTM parameters, use the UTM data for classification. utm_source takes precedence over referrer for determining the source. This handles cases where:

  • The referrer was stripped by the referring site’s Referrer-Policy: no-referrer
  • The user navigated via an app that does not send referrers (email clients, messaging apps)
  • The link was opened in a new browser context

1. Capture on the first HTTP request, server-side.

Do not rely on client-side JavaScript to read click IDs from the URL. Safari ITP caps JavaScript-set cookies to 24 hours when the user lands through a link carrying tracking parameters such as gclid or fbclid. A cookie set by a server-side Set-Cookie header in the initial HTTP response is not subject to this restriction, provided the IP addresses are aligned (see 05-browser-landscape.md).

2. Store alongside attribution data, not in separate cookies.

Click IDs are part of the attribution record. Storing them in separate cookies (as platform tags do with _gcl_aw, _fbc, etc.) fragments the data and makes each cookie independently vulnerable to expiration or deletion. Store them as fields within the unified attribution data structure.

3. NEVER strip click IDs from the URL.

Removing gclid, fbclid, or other click IDs from the URL after capture breaks downstream systems:

  • Google’s own Consent Mode detection reads gclid from the URL
  • Platform pixels (Meta Pixel, TikTok Pixel) read the click ID from the URL to attribute the visit to an ad click
  • Analytics platforms may independently capture the click ID from the landing page URL
  • URL-based deduplication between client-side and server-side event streams relies on the click ID being present

Capture the value. Leave the URL intact.

4. Include expiration metadata.

Each click ID has a platform-defined validity window. Store the capture timestamp and compute the expiration so downstream systems can assess whether the click ID is still valid for conversion attribution.

| Click ID | Recommended Storage Duration | Rationale |
|---|---|---|
| gclid | 90 days | Matches Google’s maximum conversion window |
| dclid | 90 days | CM360 offline upload window is 60 days; 90 days covers edge cases |
| fbclid | 90 days | Meta flags values older than 90 days |
| msclkid | 90 days | Microsoft’s official recommendation |
| ttclid | 90 days | Conservative; the actual click-through attribution window is 1-14 days |
| li_fat_id | 30 days | LinkedIn’s cookie lifetime; no benefit to storing longer |
| epik | 90 days | Pinterest tag cookies last 1 year; 90 days covers all attribution windows |
| twclid | 90 days | X rejects events older than 90 days |
| ScCid | 37 days | Snapchat CAPI accepts up to 37 days post-click |
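Expiration can be derived from the touchpoint timestamp instead of being stored separately. A sketch using the durations from the table above; `clickIdExpiresAt`, `isWithinStorageWindow`, and the 90-day fallback for click IDs the table does not list (such as gbraid and wbraid) are assumptions. Since UIAF does not enforce expiry at capture time, a check like this belongs in downstream consumers.

```typescript
// Recommended storage durations from the table above, in days.
const STORAGE_DAYS = new Map<string, number>([
  ["gclid", 90], ["dclid", 90], ["fbclid", 90], ["msclkid", 90], ["ttclid", 90],
  ["li_fat_id", 30], ["epik", 90], ["twclid", 90], ["ScCid", 37],
]);
// Assumption: click IDs missing from the table (e.g. gbraid, wbraid) get 90 days.
const DEFAULT_STORAGE_DAYS = 90;
const SECONDS_PER_DAY = 24 * 60 * 60;

// Expiration as a Unix timestamp (seconds), from the touchpoint's capture time.
function clickIdExpiresAt(param: string, capturedAt: number): number {
  const days = STORAGE_DAYS.get(param) ?? DEFAULT_STORAGE_DAYS;
  return capturedAt + days * SECONDS_PER_DAY;
}

// True while the click ID is still inside its recommended storage window.
function isWithinStorageWindow(param: string, capturedAt: number, now: number): boolean {
  return now < clickIdExpiresAt(param, capturedAt);
}
```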

```json
{
  "attribution": {
    "first_touch": {
      "source": "google",
      "medium": "cpc",
      "campaign": "spring_sale",
      "term": "running shoes",
      "content": "ad_v2",
      "click_ids": {
        "gclid": "Cj0KCQjw84anBhCtARIsAISI-xfSUJmQ8Z..."
      },
      "referrer": "google.com",
      "landing_page": "/products/shoes",
      "timestamp": 1647291600
    },
    "last_touch": {
      "source": "facebook",
      "medium": "paid_social",
      "campaign": "retargeting_q2",
      "term": null,
      "content": "carousel_v3",
      "click_ids": {
        "fbclid": "IwAR2F4-dbP0l7Mn1IawQQGCINEz..."
      },
      "referrer": "facebook.com",
      "landing_page": "/products/shoes",
      "timestamp": 1647982200
    },
    "count": 3
  }
}
```

Field definitions:

  • source: Traffic source, from utm_source or referrer classification
  • medium: Marketing medium, from utm_medium or referrer classification
  • campaign: Campaign name, from utm_campaign (null if absent)
  • term: Search keyword, from utm_term (null if absent)
  • content: Ad content variant, from utm_content (null if absent)
  • click_ids: Object mapping parameter names to values. Only populated click IDs are included.
  • referrer: Referrer domain extracted from the Referer header
  • landing_page: URL path only (no domain, no query string). Query strings are excluded because they may contain PII (email, name) or click IDs that should not leak into attribution storage at restricted consent tiers
  • timestamp: Unix timestamp (seconds) of when this touchpoint was recorded
  • count: Total number of attribution-carrying visits
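The same schema expressed as TypeScript types, as a sketch. The interface names and the helpers `extractPath` and `nowSeconds` are hypothetical implementations of the helpers the capture pseudocode calls; they guard against the two easiest mistakes in this record: keeping the query string in landing_page, and writing milliseconds instead of seconds to timestamp.

```typescript
// One touchpoint, matching the JSON example above.
interface Touchpoint {
  source: string | null;
  medium: string | null;
  campaign: string | null;
  term: string | null;
  content: string | null;
  click_ids: Record<string, string>; // only populated click IDs
  referrer: string | null;           // referrer domain, null if none
  landing_page: string;              // URL path only
  timestamp: number;                 // Unix seconds
}

interface AttributionRecord {
  attribution: {
    first_touch: Touchpoint | null;
    last_touch: Touchpoint | null;
    count: number;
  };
}

// URL.pathname excludes both the query string and the fragment, so PII or
// click IDs in the query never reach landing_page.
function extractPath(pageUrl: string): string {
  return new URL(pageUrl).pathname;
}

// Date.now() returns milliseconds; the schema stores whole seconds.
function nowSeconds(): number {
  return Math.floor(Date.now() / 1000);
}
```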

Attribution data is stored client-side for persistence across sessions. The storage strategy mirrors the identity persistence approach (see 02-identity-management.md).

  • Primary: localStorage. JSON-stringified under key uiaf_attribution. Survives browser close. Subject to Safari ITP’s 7-day rule: script-writable storage is deleted after seven days of Safari use without the user interacting with the site.
  • Backup: first-party cookie. If the serialized (optionally compressed) attribution data fits within the roughly 4KB per-cookie limit, also store it in a cookie. Server-set cookies (Set-Cookie header) are not subject to ITP’s cap on script-written cookies, given the IP-alignment caveat above. Use this cookie as the recovery source when localStorage is purged.
  • Session cache: sessionStorage. Copy of current attribution data for fast read access during the session. Avoids repeated localStorage reads. Cleared on tab close.

When reading attribution data, check in order: sessionStorage (fastest) -> localStorage -> cookie (recovery). When writing, update all three.
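A sketch of the read order and the write-to-all rule. `KeyValueStore` is a minimal subset of the Web Storage API (window.localStorage and window.sessionStorage both satisfy it), `MemoryStore` is an in-memory stand-in, and the cookie is reached through injected callbacks so the code runs outside a browser; in a browser they would parse and assign `document.cookie`. Copying a recovered value back into the faster tiers and the 4,096-byte cookie budget are assumptions.

```typescript
interface KeyValueStore {
  getItem(key: string): string | null;
  setItem(key: string, value: string): void;
}

// In-memory stand-in for localStorage/sessionStorage outside a browser.
class MemoryStore implements KeyValueStore {
  private readonly items = new Map<string, string>();
  getItem(key: string): string | null {
    return this.items.get(key) ?? null;
  }
  setItem(key: string, value: string): void {
    this.items.set(key, value);
  }
}

const STORAGE_KEY = "uiaf_attribution";
const MAX_COOKIE_BYTES = 4096; // common per-cookie limit, name included

function parseJson(raw: string | null): unknown {
  if (raw === null) return null;
  try {
    return JSON.parse(raw);
  } catch {
    return null; // corrupt entry: fall through to the next tier
  }
}

// Write all three tiers; the cookie is skipped if it would exceed the limit.
function writeAttribution(record: unknown, session: KeyValueStore, local: KeyValueStore,
                          writeCookie: (encoded: string) => void): void {
  const raw = JSON.stringify(record);
  session.setItem(STORAGE_KEY, raw);
  local.setItem(STORAGE_KEY, raw);
  const encoded = encodeURIComponent(raw); // ASCII only, so length equals bytes
  if (STORAGE_KEY.length + 1 + encoded.length <= MAX_COOKIE_BYTES) writeCookie(encoded);
}

// Read sessionStorage -> localStorage -> cookie, repopulating faster tiers.
function readAttribution(session: KeyValueStore, local: KeyValueStore,
                         readCookie: () => string | null): unknown {
  const fromSession = parseJson(session.getItem(STORAGE_KEY));
  if (fromSession !== null) return fromSession;
  const localRaw = local.getItem(STORAGE_KEY);
  const fromLocal = parseJson(localRaw);
  if (fromLocal !== null && localRaw !== null) {
    session.setItem(STORAGE_KEY, localRaw);
    return fromLocal;
  }
  let cookieRaw: string | null = null;
  try {
    const encoded = readCookie();
    cookieRaw = encoded === null ? null : decodeURIComponent(encoded);
  } catch {
    cookieRaw = null; // malformed cookie value
  }
  const fromCookie = parseJson(cookieRaw);
  if (fromCookie !== null && cookieRaw !== null) {
    session.setItem(STORAGE_KEY, cookieRaw);
    local.setItem(STORAGE_KEY, cookieRaw);
  }
  return fromCookie;
}
```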


```
// captureAttribution(url, consent, referrer)
//   url:      a URL string (window.location.href, or a route URL)
//   consent:  ConsentState object (section 07)
//   referrer: referrer URL string. On initial page load: document.referrer.
//             On SPA navigation: the previous route URL (passed by router hook).
//             Optional; defaults to document.referrer if omitted.
//   returns:  the full attribution object {first_touch, last_touch, count, is_new_touch}.
//             ALWAYS returns, even on direct visits with no new attribution.
function captureAttribution(url, consent, referrer) {
    referrer = referrer OR document.referrer   // explicit param or browser default

    // 1. Parse URL parameters
    params = parseQueryString(url)

    // 2. Extract UTM parameters
    utms = {
        source:   params.get("utm_source"),
        medium:   params.get("utm_medium"),
        campaign: params.get("utm_campaign"),
        term:     params.get("utm_term"),
        content:  params.get("utm_content")
    }

    // 3. Extract click IDs -- ONLY if ads consent is granted
    click_ids = {}
    if consent.ads_allowed {
        CLICK_ID_PARAMS = ["gclid", "gbraid", "wbraid", "dclid",
                           "fbclid", "msclkid", "ttclid",
                           "li_fat_id", "epik", "twclid", "ScCid"]
        for param in CLICK_ID_PARAMS {
            value = params.get(param)
            if value != null {
                click_ids[param] = value
            }
        }
    }
    // If ads consent is denied, click IDs are never captured or stored.
    // They remain in the URL (not stripped) for platform pixels to read.

    // 4. Extract custom parameters from allowlist
    custom_params = {}
    for param in config.custom_param_allowlist {
        value = params.get(param)
        if value != null {
            custom_params[param] = value
        }
    }

    // 5. Classify referrer (from explicit parameter, not hidden global)
    referrer_domain = extractDomain(referrer)
    referrer_classification = classifyReferrer(referrer_domain)

    // 6. Determine if this visit has NEW attribution data.
    //    extractDomain and extractHostname must normalize hosts the same way
    //    (e.g. both strip "www."), or same-site navigations look like referrals.
    current_hostname = extractHostname(url)
    has_new_attribution = (
        utms.source != null OR
        utms.medium != null OR
        utms.campaign != null OR
        Object.keys(click_ids).length > 0 OR
        (referrer_domain != null AND referrer_domain != current_hostname)
    )

    // 7. Load existing attribution from storage
    //    (sessionStorage -> localStorage -> cookie; returns
    //    {first_touch: null, last_touch: null, count: 0} when nothing is stored)
    existing = loadAttribution()

    // 8. If no new attribution, return existing state unchanged
    if not has_new_attribution {
        existing.is_new_touch = false   // No new attribution on this page load
        return existing
    }

    // 9. Build new touchpoint record
    touchpoint = {
        source:        utms.source OR referrer_classification.source,
        medium:        utms.medium OR referrer_classification.medium,
        campaign:      utms.campaign,
        term:          utms.term,
        content:       utms.content,
        click_ids:     click_ids,
        custom_params: custom_params,     // allowlisted parameters from step 4
        referrer:      referrer_domain,
        landing_page:  extractPath(url),  // path only, no query string
        timestamp:     now()              // Unix seconds
    }

    // 10. Update model: first_touch is write-once, last_touch is write-always
    if existing.first_touch == null {
        existing.first_touch = touchpoint
    }
    existing.last_touch = touchpoint
    existing.count = (existing.count OR 0) + 1

    // 11. Store -- ONLY in permitted storage mechanisms
    if consent.analytics_allowed {
        storeAttribution(existing)   // sessionStorage + localStorage + cookie backup
    }
    existing.is_new_touch = true     // New attribution captured on this page load

    // 12. Return the full attribution state (always populated)
    return existing
}
```
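Step 6 is where same-site detection can quietly go wrong: if the referrer host and the page host are normalized differently, a navigation from www.example.com to example.com counts as a referral and overwrites last_touch. A runnable sketch of that step, where `hasNewAttribution`, `VisitSignals`, and the rule that a leading `www.` is insignificant are assumptions:

```typescript
interface VisitSignals {
  utmSource: string | null;
  utmMedium: string | null;
  utmCampaign: string | null;
  clickIds: Record<string, string>;
  referrer: string | null; // full referrer URL; null or "" when absent
}

// Lowercase and drop a leading "www." so that www.example.com and
// example.com count as the same site.
function normalizeHost(host: string): string {
  return host.toLowerCase().replace(/^www\./, "");
}

// Mirrors step 6 of captureAttribution.
function hasNewAttribution(pageUrl: string, v: VisitSignals): boolean {
  if (v.utmSource !== null || v.utmMedium !== null || v.utmCampaign !== null) return true;
  if (Object.keys(v.clickIds).length > 0) return true;
  if (!v.referrer) return false; // direct visit
  let referrerHost: string;
  try {
    referrerHost = normalizeHost(new URL(v.referrer).hostname);
  } catch {
    return false; // unparseable referrer: treat as absent
  }
  return referrerHost !== normalizeHost(new URL(pageUrl).hostname);
}
```

The edge cases below map onto it directly: a bookmarked URL with a stale gclid still returns true, and a UTM-tagged link opened from a native app with no referrer also returns true.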

Same user, two campaigns in one session. User clicks a Google Ad, lands on the site, then opens a Meta ad link in the same browser session. first_touch remains the Google Ad visit. last_touch updates to the Meta ad. count increments to 2.

Direct visit after campaign visit. User clicks a campaign link on Monday, returns directly on Wednesday. The Wednesday visit has no attribution parameters and no meaningful referrer. No update occurs. last_touch remains the Monday campaign visit.

Click ID expired but still in URL. A user bookmarked a URL containing ?gclid=... and revisits weeks later. The gclid is still in the URL. Capture it with the current timestamp. Let the receiving endpoint or ad platform decide whether the click ID is still within its attribution window. UIAF does not enforce expiration at capture time.

Very long URLs with many parameters. Capture only parameters on the allowlist (UTMs, known click IDs, custom allowlist). All other query parameters are ignored. This bounds the data volume and prevents accidentally capturing PII or sensitive data from arbitrary query strings.

Referrer is empty but UTMs are present. Common when links are opened from native apps (email clients, messaging apps, some social apps). UTM parameters take precedence. Classify based on utm_source and utm_medium. Set referrer to null.

Multiple click IDs in one URL. Rare but possible (e.g., ?gclid=...&msclkid=... from a misconfigured redirect chain). Capture all of them. Store all in the click_ids object. The receiving endpoint determines which one is authoritative for each platform.

User clears browser storage. Attribution data is lost. On the next visit with attribution parameters, first_touch is set fresh. This is expected behavior — UIAF cannot persist data that the user has explicitly deleted. If server-side attribution storage is implemented, the server copy can be used to restore context when the user is re-identified (see 02-identity-management.md for identity recovery).