Identity Resolution
🧩 Why Identity Resolution Matters
Every visitor starts as a random UUID. That UUID tells you nothing about who the person is; it tells you only that a browser visited your site. Identity resolution is the process of connecting that anonymous UUID to a real person: an email address, a customer ID, a phone number.
Without identity resolution, a single customer who visits on their phone, their laptop, and their work desktop looks like three different people. Their attribution history is split across three UIDs. Their conversion data is fragmented. The marketing team sees three journeys instead of one.
Identity resolution bridges this gap. When the user authenticates (login, signup, purchase, form submission), the website captures the link between the current anonymous UID and a known identifier.
🔗 Anonymous to Authenticated Linking
When to Link
An identity link event fires whenever the website obtains a verified identifier for the current user:
- Login — user authenticates with email/password, SSO, or social login
- Account creation — user signs up with email, phone, or both
- Purchase — checkout collects email, phone, billing details
- Form submission — lead form, newsletter signup, contact form with identifiable fields
- Profile update — user changes email or phone number (new identifier to link)
The trigger is always: “the website now knows something identifiable about this anonymous UID.”
What to Send
The website sends an identity_link event to the endpoint. The payload associates the current UID with one or more hashed identifiers:

    {
      "event": "identity_link",
      "identity": {
        "uid": "f81d4fae-7dec-11d0-a765-00a0c91e6bf6.1647291600",
        "session_id": "a3b8c9d0-1234-5678-9abc-def012345678"
      },
      "identifiers": {
        "email_hash": "e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855",
        "phone_hash": "d7a8fbb307d7809469ca9abcb0082e4f8d5651e46d3cdb762d02d0bf37c9e592",
        "customer_id": "CUS12345"
      },
      "_meta": {
        "ts": 1647291600000,
        "tier": "T1",
        "page_url": "https://example.com/account/login"
      }
    }

Division of Responsibility
This is the critical architectural boundary:
- The website captures the authentication moment, hashes the PII, and sends the identity_link event with the current UID and hashed identifiers. The website does not maintain an identity graph, does not query previous associations, and does not resolve cross-device relationships.
- The endpoint receives identity_link events, builds and maintains the identity graph, resolves conflicts, merges duplicate identities, and answers queries about user identity from downstream systems.
The website is a sensor. The endpoint is the brain.
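The website's side of this boundary can be sketched in a few lines. The following is a minimal illustration, not the specification's implementation: the function names `sha256_hex` and `build_identity_link` are hypothetical, only email and customer ID are handled, and the email is given only basic normalization.

```python
import hashlib
import time


def sha256_hex(value: str) -> str:
    """Return the lowercase hex SHA-256 digest of a UTF-8 string."""
    return hashlib.sha256(value.encode("utf-8")).hexdigest()


def build_identity_link(uid, session_id, email, customer_id, tier, page_url):
    """Assemble an identity_link payload (hypothetical helper).

    The raw email is hashed here and never placed in the payload.
    """
    identifiers = {}
    if email:
        # Basic normalization only: trim and lowercase before hashing.
        identifiers["email_hash"] = sha256_hex(email.strip().lower())
    if customer_id:
        # The customer ID is sent as-is, matching the example payload.
        identifiers["customer_id"] = customer_id
    return {
        "event": "identity_link",
        "identity": {"uid": uid, "session_id": session_id},
        "identifiers": identifiers,
        "_meta": {
            "ts": int(time.time() * 1000),  # milliseconds since epoch
            "tier": tier,
            "page_url": page_url,
        },
    }
```

Note that the function only builds and returns the event. It performs no lookups and keeps no state, which mirrors the "sensor" role described above.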
🔐 Hashing Specification
All PII must be hashed before transmission. The website sends hashes, never raw values. This is both a security requirement (see 08-data-handling.md) and a practical one: ad platforms (Meta CAPI, Google Enhanced Conversions) accept and expect SHA-256 hashed identifiers.
Algorithm
SHA-256. This is the minimum acceptable algorithm. It is the industry standard for ad platform matching (Meta, Google, TikTok, LinkedIn all accept SHA-256 hashed identifiers). Do not use MD5 (broken, collisions trivial) or SHA-1 (deprecated, collision attacks demonstrated).
Normalization Before Hashing
Normalization is critical. The same email address formatted differently produces a completely different hash. If the website hashes User@Gmail.com and Meta has user@gmail.com on file, the hashes will not match. The conversion is lost.
Normalization rules by identifier type:
Email:
- Convert to lowercase
- Trim leading and trailing whitespace
- For Gmail addresses only: remove dots from the local part (j.doe@gmail.com becomes jdoe@gmail.com). Gmail ignores dots; other providers do not
- Do not remove plus-addressing (user+tag@example.com stays as-is; the plus suffix may be the user’s actual address at non-Gmail providers)
Phone:
- Convert to E.164 format: + followed by country code and number, no spaces, dashes, or parentheses
- (555) 123-4567 with US country code becomes +15551234567
- 06-12345678 with NL country code becomes +31612345678
Name (first, last):
- Convert to lowercase
- Trim leading and trailing whitespace
- Remove extra internal spaces ("  John   Doe  " becomes "john doe")
Pseudocode

    // PSEUDOCODE -- Adapt to your platform
    function hashIdentifier(type, value):
        if value is null or empty:
            return null

        normalized = value

        if type == "email":
            normalized = lowercase(trim(value))
            if normalized ends with "@gmail.com":
                local_part = normalized before "@"
                local_part = remove_all(local_part, ".")
                normalized = local_part + "@gmail.com"

        else if type == "phone":
            normalized = toE164(value)  // library-dependent

        else if type == "name":
            normalized = lowercase(trim(value))
            normalized = collapse_whitespace(normalized)

        return sha256(normalized)

Salt
Optional. If used, the salt MUST be consistent across all systems that need to match hashes. A salted hash from the website will not match an unsalted hash at the ad platform. In practice, most ad platform integrations (Meta CAPI, Google Enhanced Conversions) expect unsalted SHA-256 hashes. Use salt only for internal identity resolution where all parties share the same salt.
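As one possible adaptation, the pseudocode in this section might translate to Python roughly as follows. This is a sketch under stated assumptions: hashes are unsalted, and `to_e164_naive` is a hypothetical stand-in that assumes the input is a national number with at most one leading trunk zero. Production code should use a dedicated phone-number library instead.

```python
import hashlib
import re


def to_e164_naive(raw: str, country_code: str) -> str:
    """Naive E.164 conversion (illustration only, not a general solution).

    Keeps digits, drops one leading trunk zero, and prefixes the country code.
    """
    digits = re.sub(r"\D", "", raw)
    if digits.startswith("0"):
        digits = digits[1:]
    return f"+{country_code}{digits}"


def normalize(kind: str, value: str, country_code: str = "1") -> str:
    """Apply the normalization rules for email, phone, and name."""
    if kind == "email":
        normalized = value.strip().lower()
        local, sep, domain = normalized.rpartition("@")
        if sep and domain == "gmail.com":
            # Gmail ignores dots in the local part; other providers do not.
            normalized = local.replace(".", "") + "@gmail.com"
        return normalized
    if kind == "phone":
        return to_e164_naive(value, country_code)
    if kind == "name":
        # split() with no arguments trims and collapses runs of whitespace.
        return " ".join(value.lower().split())
    return value


def hash_identifier(kind, value, country_code="1"):
    """Return the unsalted SHA-256 hex digest of the normalized value, or None."""
    if not value:
        return None
    normalized = normalize(kind, value, country_code)
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()
```

Keeping `normalize` separate from `hash_identifier` makes the normalization rules testable on their own, which is useful because a normalization mismatch is invisible once the value has been hashed.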
What NOT to Hash
Never hash click IDs or platform cookies. These values must be sent raw:
- gclid, gbraid, wbraid: Google needs the raw value to match conversions
- _fbc, _fbp: Meta needs raw values; hashing breaks CAPI matching entirely (see LEARNINGS.md)
- msclkid, ttclid, li_fat_id, epik, twclid, ScCid: all platforms expect raw click identifiers
Click IDs are personal data (GDPR Recital 30), but they are sent to the platforms that generated them. The privacy consideration is consent, not hashing.
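The raw-versus-hashed split can be enforced with an explicit allowlist. The sketch below is illustrative only: `prepare_fields` and the `PII_FIELDS` names are hypothetical, and the PII values are assumed to have been normalized already.

```python
import hashlib

# Click IDs and platform cookies named in this section; always forwarded raw.
RAW_FIELDS = {
    "gclid", "gbraid", "wbraid", "_fbc", "_fbp", "msclkid",
    "ttclid", "li_fat_id", "epik", "twclid", "ScCid",
}
# Hypothetical field names for PII that must be hashed before sending.
PII_FIELDS = {"email", "phone"}


def prepare_fields(fields: dict) -> dict:
    """Pass click IDs through untouched, hash PII, and drop everything else."""
    out = {}
    for name, value in fields.items():
        if name in RAW_FIELDS:
            out[name] = value
        elif name in PII_FIELDS:
            # Assumes the value is already normalized.
            digest = hashlib.sha256(value.encode("utf-8")).hexdigest()
            out[f"{name}_hash"] = digest
        # Unknown fields are dropped rather than guessed at.
    return out
```

Dropping unknown fields is a conservative choice for this sketch: a new field has to be classified deliberately before it can be sent in either form.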
📱 Cross-Device Linking
The Only Reliable Method
Cross-device identity resolution is reliable only through authenticated sessions (deterministic matching). The user must log in on each device. There is no other trustworthy approach.
Probabilistic matching (IP address + user agent + screen resolution + timezone) is unreliable (accuracy degrades quickly as device populations grow), legally risky (processing additional personal data without clear basis), and ethically questionable (users do not expect or understand it). UIAF does not use probabilistic matching.
Fingerprinting (canvas, WebGL, audio context) is actively blocked by Safari AFP (since Safari 26), restricted by Firefox, and prohibited by GDPR Article 5(1)(c) (data minimization) when deterministic alternatives exist. UIAF does not use fingerprinting.
Cross-Device Flow
The website does the same thing on both devices: capture the login, hash the email, send identity_link. The endpoint does the resolution. The website on Device B does not know about Device A, does not query the identity graph, and does not need to.
What the Endpoint Does (Out of Scope, but Relevant)
The endpoint’s identity graph is outside this specification, but the identity_link event is designed to give it everything it needs:
- The current UID (which device/browser this is)
- One or more hashed identifiers (which person this is)
- Timestamp and session context (when this link was established)
How the endpoint resolves conflicts, merges identity clusters, and serves downstream queries is its own problem. UIAF provides the raw signals.
🌐 Cross-Domain Linking
For organizations operating multiple domains (e.g., shop.example.com and blog.example.com, or brand-a.com and brand-b.com), the UID cookie is scoped to a single domain. A user visiting both domains gets two different UIDs unless the domains coordinate.
Subdomain sharing is handled by the cookie Domain attribute (see 02-identity-management.md). The approaches below address entirely separate domains.
Approach 1: URL Token Handoff
The source domain generates a short-lived, single-use token and appends it to outbound links pointing to the target domain.
Requirements:
- Token expires in minutes (120 seconds recommended). Long-lived tokens are a security risk.
- Token is single-use. After validation, the token service invalidates it, so a token cannot be replayed once it has been redeemed.
- Token validation is server-side. The token is never trusted client-side.
- The _uiaf_token parameter is stripped from the URL after processing (both from the visible URL via history.replaceState and from any stored page URL).
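The requirements above can be illustrated with a small server-side sketch. Everything here is an assumption made for illustration: `TokenService` is a hypothetical in-memory stand-in for a real token store, and `strip_token` shows the server-side counterpart of removing the parameter from any stored page URL.

```python
import secrets
import time
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

TOKEN_TTL_SECONDS = 120  # expiry recommended in the requirements


class TokenService:
    """In-memory stand-in for a server-side token store (hypothetical)."""

    def __init__(self, now=time.time):
        self._now = now
        self._tokens = {}  # token -> (uid, expires_at)

    def issue(self, uid: str) -> str:
        """Create an unguessable token bound to the source domain's UID."""
        token = secrets.token_urlsafe(32)
        self._tokens[token] = (uid, self._now() + TOKEN_TTL_SECONDS)
        return token

    def redeem(self, token: str):
        """Return the UID once; None if unknown, already used, or expired."""
        entry = self._tokens.pop(token, None)  # pop makes the token single-use
        if entry is None:
            return None
        uid, expires_at = entry
        return uid if self._now() <= expires_at else None


def strip_token(url: str, param: str = "_uiaf_token") -> str:
    """Remove the handoff parameter from a URL before it is stored."""
    parts = urlsplit(url)
    query = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key != param
    ]
    return urlunsplit(parts._replace(query=urlencode(query)))
```

Removing the token from the store on the first lookup, before the expiry check, means an expired token is also invalidated on its first use instead of lingering in memory.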
Approach 2: Shared Backend Service
Both domains query a central identity service via server-side API calls.
- Both domains send identity_link events to the same endpoint
- The endpoint maintains UID-to-domain mappings
- When the user authenticates on Domain B with the same email, the endpoint resolves both UIDs to the same person — identical to the cross-device flow
This approach requires shared backend infrastructure and works only after authentication. It does not enable anonymous cross-domain linking, but it avoids the complexity of token handoff.
Why the iframe Approach Is Dead
Historically, cross-domain identity sharing used iframes: Domain A embeds an invisible iframe from Domain B, the iframe reads Domain B’s cookie, and postMessage passes the UID back to Domain A.
This no longer works:
- Safari: blocks all third-party cookies since Safari 13.1 (March 2020)
- Firefox: partitions third-party cookies by top-level domain via Total Cookie Protection (TCP) since Firefox 86 (February 2021) — the iframe sees a different cookie store than direct visits
- Brave: blocks third-party cookies by default
- Chrome: implements Storage Access API requirements and CHIPS partitioning
- Privacy regulations: CNIL and other DPAs have specifically flagged cross-domain iframe cookie sharing as requiring explicit consent, even when first-party cookies are involved
Storage partitioning (CHIPS in Chrome, TCP in Firefox) means that even if the iframe can set a cookie, it is partitioned — the cookie is scoped to the embedding domain, not the iframe’s domain. The identity stored in the iframe is inaccessible when the user visits Domain B directly.
Do not invest in iframe-based cross-domain identity. It is a dead path.
⚠️ Edge Cases
User Logs In With Different Email on Same Device
Two identity_link events fire with the same UID but different email_hash values. The endpoint must decide:
- Same person, multiple emails: merge into one identity cluster. Common for users with personal and work email.
- Different people, shared device: do not merge. This requires heuristics (e.g., two distinct login sessions minutes apart with different customer IDs likely indicate different people).
The website sends both events. Conflict resolution is the endpoint’s responsibility.
Shared Device (Family Computer)
Multiple users log in on the same browser with the same UID. The endpoint receives multiple identity_link events mapping one UID to many distinct identities. This is an anomaly signal: a single UID linked to five different email hashes and three different customer IDs is almost certainly a shared device, not one person with five emails.
The endpoint should detect this pattern and either flag the UID as shared (excluding it from person-level analysis) or create separate identity nodes for each authenticated user while noting the shared device context.
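Although endpoint logic is outside this specification, the detection pattern is simple enough to sketch. The function name `find_shared_uids` and the threshold of three distinct email hashes are hypothetical choices made for illustration; the specification does not define a cut-off.

```python
from collections import defaultdict

SHARED_DEVICE_THRESHOLD = 3  # hypothetical cut-off, not defined by the specification


def find_shared_uids(events, threshold=SHARED_DEVICE_THRESHOLD):
    """Return the UIDs linked to at least `threshold` distinct email hashes."""
    hashes_by_uid = defaultdict(set)
    for event in events:
        uid = event["identity"]["uid"]
        email_hash = event["identifiers"].get("email_hash")
        if email_hash:
            hashes_by_uid[uid].add(email_hash)
    return {
        uid for uid, hashes in hashes_by_uid.items() if len(hashes) >= threshold
    }
```

Using a set per UID means repeated logins by the same person do not count toward the threshold; only distinct identifiers do.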
User Deletes Cookies but Logs In Again
The user clears all browser storage, returns to the site, and gets a new UID (UID-B). They then log in with the same email as before. The identity_link event sends UID-B with the same email_hash previously associated with UID-A.
The endpoint resolves this: same email_hash, different UIDs. The identity graph merges UID-B into the existing identity cluster. The user’s attribution history from UID-A is preserved, and UID-B is added as a new device/session node.
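One way an endpoint could implement this merge is a union-find structure over UIDs and hashed identifiers. The sketch below is a toy illustration, not the endpoint's actual design: the `IdentityGraph` class is hypothetical, it handles only email hashes, and it merges unconditionally, so it ignores the shared-device case described earlier.

```python
class IdentityGraph:
    """Toy union-find over UIDs and email hashes (hypothetical endpoint logic)."""

    def __init__(self):
        self._parent = {}

    def _find(self, node):
        """Return the cluster root for a node, registering it if unseen."""
        self._parent.setdefault(node, node)
        while self._parent[node] != node:
            # Path halving: point the node at its grandparent as we walk up.
            self._parent[node] = self._parent[self._parent[node]]
            node = self._parent[node]
        return node

    def link(self, uid, email_hash):
        """Record an identity_link: put the UID and the hash in one cluster."""
        root_uid = self._find(("uid", uid))
        root_hash = self._find(("email_hash", email_hash))
        if root_uid != root_hash:
            self._parent[root_uid] = root_hash

    def same_person(self, uid_a, uid_b):
        """True if both UIDs resolve to the same identity cluster."""
        return self._find(("uid", uid_a)) == self._find(("uid", uid_b))
```

In the cookie-deletion scenario, linking UID-A and later UID-B to the same email_hash places both UIDs in one cluster, so history recorded under UID-A remains reachable from UID-B.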
Consent Constraints on Identity Linking
Identity linking involves hashed PII (email, phone). This is personal data processing that requires appropriate consent.
| Tier | identity_link allowed? | Constraint |
|---|---|---|
| T0 | Yes | No consent mechanism present |
| T1 | Yes | Full consent granted |
| T2 | No | Hashed PII sent to endpoint enables cross-device tracking and ad platform matching — this requires advertising consent (ad_user_data), which is denied at T2 |
| T3 | No | No consent for personal data processing |
| T4 | No | System dormant |
At T2, the user consented to analytics but NOT to sending user-provided data for advertising. Since identity_link transmits hashed PII that enables cross-device resolution and ad platform Enhanced Conversions, it requires ad_user_data consent. The endpoint cannot guarantee the hashed data won’t reach ad platforms.
If a site needs identity linking purely for first-party analytics (no ad platform forwarding), this requires explicit architectural guarantees at the endpoint level and should be documented as a site-specific policy decision, not a default behavior.
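The default policy from the consent table in this section can be expressed as a small gate that runs before any identity_link event is sent. The names below are hypothetical, and treating unknown tiers as denied is an assumption of this sketch rather than something the specification states.

```python
# Tiers for which identity_link may be sent, per the consent table.
IDENTITY_LINK_TIERS = frozenset({"T0", "T1"})


def may_send_identity_link(tier: str) -> bool:
    """Return True only for tiers where identity_link is allowed.

    Unknown or malformed tiers are denied (an assumption of this sketch).
    """
    return tier in IDENTITY_LINK_TIERS
```

A site that adopts the first-party-only exception described above would need its own, separately documented policy rather than a change to this default.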