Identity Resolution

🧩 Why Identity Resolution Matters

Every visitor starts as a random UUID. That UUID tells you nothing about who the person is — it tells you that a browser visited your site. Identity resolution is the process of connecting that anonymous UUID to a real person: an email address, a customer ID, a phone number.

Without identity resolution, a single customer who visits on their phone, their laptop, and their work desktop looks like three different people. Their attribution history is split across three UIDs. Their conversion data is fragmented. The marketing team sees three journeys instead of one.

Identity resolution bridges this gap. When the user authenticates (login, signup, purchase, form submission), the website captures the link between the current anonymous UID and a known identifier.

🔗 Anonymous to Authenticated Linking

When to Link

An identity link event fires whenever the website obtains a verified identifier for the current user:

Login — user authenticates with email/password, SSO, or social login
Account creation — user signs up with email, phone, or both
Purchase — checkout collects email, phone, billing details
Form submission — lead form, newsletter signup, contact form with identifiable fields
Profile update — user changes email or phone number (new identifier to link)

The trigger is always: “the website now knows something identifiable about this anonymous UID.”

What to Send

The website sends an identity_link event to the endpoint. The payload associates the current UID with one or more hashed identifiers:

{
  "event": "identity_link",
  "identity": {
    "uid": "f81d4fae-7dec-11d0-a765-00a0c91e6bf6.1647291600",
    "session_id": "a3b8c9d0-1234-5678-9abc-def012345678"
  },
  "identifiers": {
    "email_hash": "e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855",
    "phone_hash": "d7a8fbb307d7809469ca9abcb0082e4f8d5651e46d3cdb762d02d0bf37c9e592",
    "customer_id": "CUS12345"
  },
  "_meta": {
    "ts": 1647291600000,
    "tier": "T1",
    "page_url": "https://example.com/account/login"
  }
}
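
A minimal sketch of assembling this payload, assuming a Python environment on the website's server side. The function name build_identity_link_event and the choice to omit identifiers that are unavailable (rather than send them as null) are illustrative assumptions, not part of the specification.

```python
import json
import time
from typing import Optional


def build_identity_link_event(
    uid: str,
    session_id: str,
    tier: str,
    page_url: str,
    email_hash: Optional[str] = None,
    phone_hash: Optional[str] = None,
    customer_id: Optional[str] = None,
) -> dict:
    """Assemble an identity_link payload shaped like the example above."""
    identifiers = {
        "email_hash": email_hash,
        "phone_hash": phone_hash,
        "customer_id": customer_id,
    }
    # Assumption: identifiers that are unavailable are omitted, not sent as null.
    identifiers = {k: v for k, v in identifiers.items() if v is not None}
    if not identifiers:
        raise ValueError("identity_link needs at least one identifier")
    return {
        "event": "identity_link",
        "identity": {"uid": uid, "session_id": session_id},
        "identifiers": identifiers,
        "_meta": {
            "ts": int(time.time() * 1000),  # epoch milliseconds, as in the example
            "tier": tier,
            "page_url": page_url,
        },
    }


event = build_identity_link_event(
    uid="f81d4fae-7dec-11d0-a765-00a0c91e6bf6.1647291600",
    session_id="a3b8c9d0-1234-5678-9abc-def012345678",
    tier="T1",
    page_url="https://example.com/account/login",
    customer_id="CUS12345",
)
print(json.dumps(event, indent=2))
```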

Division of Responsibility

This is the critical architectural boundary:

The website captures the authentication moment, hashes the PII, and sends the identity_link event with the current UID and hashed identifiers. The website does not maintain an identity graph, does not query previous associations, and does not resolve cross-device relationships.
The endpoint receives identity_link events, builds and maintains the identity graph, resolves conflicts, merges duplicate identities, and answers queries about user identity from downstream systems.

The website is a sensor. The endpoint is the brain.

🔐 Hashing Specification

All PII must be hashed before transmission. The website sends hashes, never raw values. This is both a security requirement (see 08-data-handling.md) and a practical one — ad platforms (Meta CAPI, Google Enhanced Conversions) accept and expect SHA-256 hashed identifiers.

Algorithm

SHA-256. This is the minimum acceptable algorithm. It is the industry standard for ad platform matching (Meta, Google, TikTok, LinkedIn all accept SHA-256 hashed identifiers). Do not use MD5 (broken, collisions trivial) or SHA-1 (deprecated, collision attacks demonstrated).

Normalization Before Hashing

Normalization is critical. The same email address formatted differently produces a completely different hash. If the website hashes User@Gmail.com and Meta has user@gmail.com on file, the hashes will not match. The conversion is lost.

Normalization rules by identifier type:

Email:

Convert to lowercase
Trim leading and trailing whitespace
For Gmail addresses only: remove dots from the local part (j.doe@gmail.com becomes jdoe@gmail.com). Gmail ignores dots; other providers do not
Do not remove plus-addressing (user+tag@example.com stays as-is — the plus suffix may be the user’s actual address at non-Gmail providers)

Phone:

Convert to E.164 format: + followed by country code and number, no spaces, dashes, or parentheses
(555) 123-4567 with US country code becomes +15551234567
06-12345678 with NL country code becomes +31612345678
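
The two conversions above can be reproduced with a deliberately simplified sketch. The helper name to_e164_sketch and its trunk_prefix parameter are hypothetical, and the logic covers only these two examples; real numbering plans have many exceptions, so a production system should rely on a vetted phone-number library.

```python
import re


def to_e164_sketch(raw: str, country_code: str, trunk_prefix: str = "0") -> str:
    """Very simplified E.164 conversion; covers only the two examples above."""
    stripped = raw.strip()
    digits = re.sub(r"\D", "", stripped)  # drop spaces, dashes, parentheses
    if not digits:
        raise ValueError("no digits in phone number")
    if stripped.startswith("+"):
        return "+" + digits  # already international; keep as given
    if trunk_prefix and digits.startswith(trunk_prefix):
        # The national trunk prefix is not part of the E.164 number.
        digits = digits[len(trunk_prefix):]
    return "+" + country_code + digits


us_number = to_e164_sketch("(555) 123-4567", country_code="1", trunk_prefix="")
nl_number = to_e164_sketch("06-12345678", country_code="31")
print(us_number, nl_number)
```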

Name (first, last):

Convert to lowercase
Trim leading and trailing whitespace
Collapse runs of internal whitespace to a single space ("  John    Doe  " becomes "john doe")

Pseudocode

// PSEUDOCODE -- Adapt to your platform

function hashIdentifier(type, value):
    if value is null or empty:
        return null

    normalized = value

    if type == "email":
        normalized = lowercase(trim(value))
        if normalized ends with "@gmail.com":
            local_part = normalized before "@"
            local_part = remove_all(local_part, ".")
            normalized = local_part + "@gmail.com"

    else if type == "phone":
        normalized = toE164(value)  // library-dependent

    else if type == "name":
        normalized = lowercase(trim(value))
        normalized = collapse_whitespace(normalized)

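    // Whitespace-only input normalizes to an empty string; do not hash it
    if normalized is empty:
        return null

    return sha256(normalized)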
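
One possible runnable rendering of the pseudocode, sketched in Python with only the standard library. Two things here are assumptions rather than part of the specification: the normalized string is encoded as UTF-8 before hashing, and phone numbers are expected to arrive already converted to E.164.

```python
import hashlib
from typing import Optional


def normalize_email(value: str) -> str:
    email = value.strip().lower()
    local, sep, domain = email.rpartition("@")
    if sep and domain == "gmail.com":
        local = local.replace(".", "")  # Gmail ignores dots in the local part
    return f"{local}{sep}{domain}"


def normalize_name(value: str) -> str:
    # split() with no arguments trims and collapses all whitespace runs
    return " ".join(value.lower().split())


def hash_identifier(kind: str, value: Optional[str]) -> Optional[str]:
    if value is None:
        return None
    if kind == "email":
        normalized = normalize_email(value)
    elif kind == "name":
        normalized = normalize_name(value)
    elif kind == "phone":
        # Assumption: the caller has already converted the number to E.164.
        normalized = value.strip()
    else:
        raise ValueError(f"unknown identifier type: {kind}")
    if not normalized:
        return None  # empty or whitespace-only input is never hashed
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


# Differently formatted Gmail addresses collapse to the same hash.
print(hash_identifier("email", " J.Doe@Gmail.com ") == hash_identifier("email", "jdoe@gmail.com"))
```

Returning null for input that is empty after normalization matters in practice: hashing an empty string still yields a valid-looking 64-character digest, which would link unrelated users to the same identifier.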

Salt

Optional. If used, the salt MUST be consistent across all systems that need to match hashes. A salted hash from the website will not match an unsalted hash at the ad platform. In practice, most ad platform integrations (Meta CAPI, Google Enhanced Conversions) expect unsalted SHA-256 hashes. Use salt only for internal identity resolution where all parties share the same salt.
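
The mismatch is easy to demonstrate. The sketch below assumes the salt is simply prepended to the normalized value; the specification does not define a salting scheme, so treat that construction and the helper name sha256_hex as illustrative only.

```python
import hashlib


def sha256_hex(value: str, salt: str = "") -> str:
    # Assumption: the salt is prepended to the value before hashing.
    return hashlib.sha256((salt + value).encode("utf-8")).hexdigest()


email = "user@example.com"
unsalted = sha256_hex(email)  # what an ad platform would compute
salted = sha256_hex(email, salt="internal-only-salt")
print(unsalted == salted)  # False: the two hashes can never be matched
```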

What NOT to Hash

Never hash click IDs or platform cookies. These values must be sent raw:

gclid, gbraid, wbraid — Google needs the raw value to match conversions
_fbc, _fbp — Meta needs raw values; hashing breaks CAPI matching entirely (see LEARNINGS.md)
msclkid, ttclid, li_fat_id, epik, twclid, ScCid — all platforms expect raw click identifiers

Click IDs are personal data (GDPR Recital 30), but they are sent to the platforms that generated them. The privacy consideration is consent, not hashing.
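
A short sketch of how an outbound payload might keep the two categories apart. The field names in PII_FIELDS and the function prepare_fields are hypothetical, and the PII values are assumed to be normalized already; the raw-field list is taken from the identifiers named above.

```python
import hashlib

# Click IDs and platform cookies named in the list above; always sent raw.
RAW_FIELDS = {
    "gclid", "gbraid", "wbraid", "_fbc", "_fbp", "msclkid",
    "ttclid", "li_fat_id", "epik", "twclid", "ScCid",
}
# Hypothetical names for already-normalized PII fields.
PII_FIELDS = {"email", "phone"}


def prepare_fields(fields: dict) -> dict:
    out = {}
    for key, value in fields.items():
        if key in RAW_FIELDS:
            out[key] = value  # never hash: the platform matches on the raw value
        elif key in PII_FIELDS:
            digest = hashlib.sha256(value.encode("utf-8")).hexdigest()
            out[f"{key}_hash"] = digest  # the raw PII value is not copied over
        else:
            # Fail loudly so an unclassified field is never sent by accident.
            raise KeyError(f"unclassified field: {key}")
    return out


prepared = prepare_fields({
    "email": "user@example.com",
    "gclid": "EXAMPLE_GCLID_VALUE",
    "_fbp": "fb.1.1647291600000.1234567890",
})
print(sorted(prepared))
```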

📱 Cross-Device Linking

The Only Reliable Method

Cross-device identity resolution is reliable only through authenticated sessions (deterministic matching). The user must log in on each device. There is no other trustworthy approach.

Probabilistic matching (IP address + user agent + screen resolution + timezone) is unreliable (accuracy degrades quickly as device populations grow), legally risky (processing additional personal data without clear basis), and ethically questionable (users do not expect or understand it). UIAF does not use probabilistic matching.

Fingerprinting (canvas, WebGL, audio context) is actively countered by Safari’s Advanced Fingerprinting Protection (AFP, on by default for all browsing since Safari 26), restricted by Firefox, and in conflict with GDPR Article 5(1)(c) (data minimization) when deterministic alternatives exist. UIAF does not use fingerprinting.

Cross-Device Flow

The website does the same thing on both devices: capture the login, hash the email, send identity_link. The endpoint does the resolution. The website on Device B does not know about Device A, does not query the identity graph, and does not need to.
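
The symmetry can be sketched as follows. The payload is trimmed to the fields relevant here (session_id and _meta are omitted), and the UIDs and function name are placeholders chosen for illustration.

```python
import hashlib


def identity_link(uid: str, email: str) -> dict:
    """Simplified identity_link event; the same code runs on every device."""
    normalized = email.strip().lower()
    email_hash = hashlib.sha256(normalized.encode("utf-8")).hexdigest()
    return {
        "event": "identity_link",
        "identity": {"uid": uid},
        "identifiers": {"email_hash": email_hash},
    }


# Device A (phone) and Device B (laptop) each hold their own anonymous UID.
device_a = identity_link("uid-device-a", "user@example.com")
device_b = identity_link("uid-device-b", " User@Example.com ")

# Neither device knows about the other; the shared email_hash is the only
# thing the endpoint needs to connect the two UIDs.
print(device_a["identifiers"] == device_b["identifiers"])  # True
```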

What the Endpoint Does (Out of Scope, but Relevant)

The endpoint’s identity graph is outside this specification, but the identity_link event is designed to give it everything it needs:

The current UID (which device/browser this is)
One or more hashed identifiers (which person this is)
Timestamp and session context (when this link was established)

How the endpoint resolves conflicts, merges identity clusters, and serves downstream queries is its own problem. UIAF provides the raw signals.

🌐 Cross-Domain Linking

For organizations operating multiple sites (subdomains such as shop.example.com and blog.example.com, or separate registrable domains such as brand-a.com and brand-b.com), the UID cookie is scoped to a single site by default. A user visiting both gets two different UIDs unless the sites coordinate.

Subdomain sharing is handled by the cookie Domain attribute (see 02-identity-management.md). The approaches below address entirely separate domains.

Approach 1: URL Token Handoff

The source domain generates a short-lived, single-use token and appends it to outbound links pointing to the target domain.

Requirements:

Token expires within minutes of being issued (120 seconds recommended). Long-lived tokens are a security risk.
Token is single-use. After validation, the token service invalidates it, so a captured token cannot be replayed.
Token validation is server-side. The token is never trusted client-side.
The _uiaf_token parameter is stripped from the URL after processing (both from the visible URL via history.replaceState and from any stored page URL).
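
The requirements above can be sketched with the standard library. HandoffTokenStore is a hypothetical in-memory stand-in for the token service (a real deployment would need storage shared by both domains), and what the target domain does with the redeemed UID is outside this sketch. Only the server-side pieces are shown; removing the parameter from the visible URL still happens in the browser via history.replaceState.

```python
import secrets
import time
from typing import Dict, Optional, Tuple
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

TOKEN_TTL_SECONDS = 120  # recommended lifetime from the requirements above
TOKEN_PARAM = "_uiaf_token"


class HandoffTokenStore:
    """In-memory stand-in for the token service (hypothetical)."""

    def __init__(self, now=time.monotonic) -> None:
        self._now = now
        self._tokens: Dict[str, Tuple[str, float]] = {}

    def issue(self, uid: str) -> str:
        token = secrets.token_urlsafe(32)  # unguessable and URL-safe
        self._tokens[token] = (uid, self._now() + TOKEN_TTL_SECONDS)
        return token

    def redeem(self, token: str) -> Optional[str]:
        # pop() makes the token single-use: a second redeem finds nothing.
        entry = self._tokens.pop(token, None)
        if entry is None:
            return None
        uid, expires_at = entry
        return uid if self._now() <= expires_at else None


def with_token(url: str, token: str) -> str:
    """Append the handoff token to an outbound link."""
    parts = urlsplit(url)
    query = parse_qsl(parts.query, keep_blank_values=True) + [(TOKEN_PARAM, token)]
    return urlunsplit(parts._replace(query=urlencode(query)))


def without_token(url: str) -> str:
    """Strip the handoff token before the page URL is stored."""
    parts = urlsplit(url)
    query = [(k, v) for k, v in parse_qsl(parts.query, keep_blank_values=True)
             if k != TOKEN_PARAM]
    return urlunsplit(parts._replace(query=urlencode(query)))


store = HandoffTokenStore()
link = with_token("https://brand-b.com/landing?ref=a", store.issue("uid-from-brand-a"))
print(without_token(link))
```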

Approach 2: Shared Backend Service

Both domains query a central identity service via server-side API calls.

Both domains send identity_link events to the same endpoint
The endpoint maintains UID-to-domain mappings
When the user authenticates on Domain B with the same email, the endpoint resolves both UIDs to the same person — identical to the cross-device flow

This approach requires shared backend infrastructure and works only after authentication. It does not enable anonymous cross-domain linking, but it avoids the complexity of token handoff.

Why the iframe Approach Is Dead

Historically, cross-domain identity sharing used iframes: Domain A embeds an invisible iframe from Domain B, the iframe reads Domain B’s cookie, and postMessage passes the UID back to Domain A.

This no longer works:

Safari: blocks all third-party cookies since Safari 13.1 (March 2020)
Firefox: partitions third-party cookies by top-level site via Total Cookie Protection (TCP), introduced in Firefox 86 (February 2021) for Strict mode and enabled by default for all users since June 2022. The iframe sees a different cookie store than direct visits
Brave: blocks third-party cookies by default
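Chrome: gates unpartitioned cookie access from embedded frames behind the Storage Access API wherever third-party cookies are blocked, and otherwise offers only partitioned cookies (CHIPS)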
Privacy regulations: CNIL and other DPAs have specifically flagged cross-domain iframe cookie sharing as requiring explicit consent, even when first-party cookies are involved

Storage partitioning (CHIPS in Chrome, TCP in Firefox) means that even if the iframe can set a cookie, that cookie is partitioned: it is keyed to the combination of the embedding top-level site and the iframe’s domain, so it lives in a different cookie jar from the one used when the user visits Domain B directly. The identity stored in the iframe is therefore inaccessible on a direct visit to Domain B.

Do not invest in iframe-based cross-domain identity. It is a dead path.

⚠️ Edge Cases

User Logs In With Different Email on Same Device

Two identity_link events fire with the same UID but different email_hash values. The endpoint must decide:

Same person, multiple emails: merge into one identity cluster. Common for users with personal and work email.
Different people, shared device: do not merge. This requires heuristics (e.g., two distinct login sessions minutes apart with different customer IDs likely indicate different people).

The website sends both events. Conflict resolution is the endpoint’s responsibility.

Shared Device (Family Computer)

Multiple users log in on the same browser with the same UID. The endpoint receives multiple identity_link events mapping one UID to many distinct identities. This is an anomaly signal: a single UID linked to five different email hashes and three different customer IDs is almost certainly a shared device, not one person with five emails.

The endpoint should detect this pattern and either flag the UID as shared (excluding it from person-level analysis) or create separate identity nodes for each authenticated user while noting the shared device context.
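
Endpoint behavior is outside this specification, so the following is only an illustration of the first option (flagging). The threshold of three distinct email hashes and the function name find_shared_uids are hypothetical choices, not recommended values.

```python
from collections import defaultdict

SHARED_DEVICE_THRESHOLD = 3  # hypothetical cutoff; tune against real data


def find_shared_uids(events, threshold=SHARED_DEVICE_THRESHOLD):
    """Return UIDs linked to at least `threshold` distinct email hashes."""
    hashes_by_uid = defaultdict(set)
    for event in events:
        email_hash = event["identifiers"].get("email_hash")
        if email_hash:
            hashes_by_uid[event["identity"]["uid"]].add(email_hash)
    return {uid for uid, hashes in hashes_by_uid.items() if len(hashes) >= threshold}


def _event(uid, email_hash):
    return {"identity": {"uid": uid}, "identifiers": {"email_hash": email_hash}}


events = [
    _event("uid-family-pc", "hash-parent-1"),
    _event("uid-family-pc", "hash-parent-2"),
    _event("uid-family-pc", "hash-child"),
    _event("uid-laptop", "hash-parent-1"),
    _event("uid-laptop", "hash-parent-1"),  # repeat logins do not count twice
]
print(find_shared_uids(events))  # {'uid-family-pc'}
```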

User Deletes Cookies but Logs In Again

The user clears all browser storage, returns to the site, and gets a new UID (UID-B). They then log in with the same email as before. The identity_link event sends UID-B with the same email_hash previously associated with UID-A.

The endpoint resolves this: same email_hash, different UIDs. The identity graph merges UID-B into the existing identity cluster. The user’s attribution history from UID-A is preserved, and UID-B is added as a new device/session node.
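
As an illustration only (the endpoint’s identity graph is out of scope), a toy union-find structure shows why the merge falls out naturally once both UIDs are linked to the same email_hash. The class IdentityGraph and its methods are hypothetical, and the sketch deliberately ignores the shared-device conflicts described above.

```python
class IdentityGraph:
    """Toy identity graph: UIDs and email hashes are nodes in a union-find."""

    def __init__(self):
        self._parent = {}

    def _find(self, node):
        self._parent.setdefault(node, node)
        while self._parent[node] != node:
            # Path halving keeps lookups cheap as clusters grow.
            self._parent[node] = self._parent[self._parent[node]]
            node = self._parent[node]
        return node

    def link(self, uid, email_hash):
        """Apply one identity_link event."""
        root_uid = self._find(("uid", uid))
        root_email = self._find(("email", email_hash))
        if root_uid != root_email:
            self._parent[root_uid] = root_email

    def same_person(self, uid_a, uid_b):
        return self._find(("uid", uid_a)) == self._find(("uid", uid_b))


graph = IdentityGraph()
graph.link("UID-A", "hash-of-user-email")  # original login
graph.link("UID-B", "hash-of-user-email")  # login after cookies were cleared
graph.link("UID-C", "hash-of-other-email")

print(graph.same_person("UID-A", "UID-B"))  # True
print(graph.same_person("UID-A", "UID-C"))  # False
```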

Consent Requirements

Identity linking processes hashed PII (email, phone). Hashing is pseudonymization, not anonymization, so this remains personal data processing and requires appropriate consent.

Tier	identity_link allowed?	Constraint
T0	Yes	No consent mechanism present
T1	Yes	Full consent granted
T2	No	Hashed PII sent to the endpoint enables cross-device tracking and ad platform matching, which requires advertising consent (ad_user_data); that consent is denied at T2
T3	No	No consent for personal data processing
T4	No	System dormant

At T2, the user consented to analytics but NOT to sending user-provided data for advertising. Since identity_link transmits hashed PII that enables cross-device resolution and ad platform Enhanced Conversions, it requires ad_user_data consent. The endpoint cannot guarantee the hashed data won’t reach ad platforms.
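
The table reduces to a small gate in front of the sender. The names below are illustrative; treating an unrecognized tier as "not allowed" is an assumption made here so that the check fails closed.

```python
# Tiers at which the table above allows identity_link.
IDENTITY_LINK_TIERS = frozenset({"T0", "T1"})


def may_send_identity_link(tier: str) -> bool:
    """True only for tiers where identity_link is allowed."""
    # Assumption: unknown or malformed tiers are rejected (fail closed).
    return tier in IDENTITY_LINK_TIERS


for tier in ("T0", "T1", "T2", "T3", "T4"):
    print(tier, may_send_identity_link(tier))
```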

If a site needs identity linking purely for first-party analytics (no ad platform forwarding), this requires explicit architectural guarantees at the endpoint level and should be documented as a site-specific policy decision, not a default behavior.