
Identity Resolution


Every visitor starts as a random UUID. That UUID tells you nothing about who the person is — it tells you that a browser visited your site. Identity resolution is the process of connecting that anonymous UUID to a real person: an email address, a customer ID, a phone number.

Without identity resolution, a single customer who visits on their phone, their laptop, and their work desktop looks like three different people. Their attribution history is split across three UIDs. Their conversion data is fragmented. The marketing team sees three journeys instead of one.

Identity resolution bridges this gap. When the user authenticates (login, signup, purchase, form submission), the website captures the link between the current anonymous UID and a known identifier.


An identity link event fires whenever the website obtains a verified identifier for the current user:

  • Login — user authenticates with email/password, SSO, or social login
  • Account creation — user signs up with email, phone, or both
  • Purchase — checkout collects email, phone, billing details
  • Form submission — lead form, newsletter signup, contact form with identifiable fields
  • Profile update — user changes email or phone number (new identifier to link)

The trigger is always: “the website now knows something identifiable about this anonymous UID.”

The website sends an identity_link event to the endpoint. The payload associates the current UID with one or more hashed identifiers:

{
"event": "identity_link",
"identity": {
"uid": "f81d4fae-7dec-11d0-a765-00a0c91e6bf6.1647291600",
"session_id": "a3b8c9d0-1234-5678-9abc-def012345678"
},
"identifiers": {
"email_hash": "e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855",
"phone_hash": "d7a8fbb307d7809469ca9abcb0082e4f8d5651e46d3cdb762d02d0bf37c9e592",
"customer_id": "CUS12345"
},
"_meta": {
"ts": 1647291600000,
"tier": "T1",
"page_url": "https://example.com/account/login"
}
}
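As a rough illustration, the payload above could be assembled like this once the identifiers have been hashed. The function name `build_identity_link` and the choice to drop identifiers that were not captured are assumptions made for this sketch, not part of the specification.

```python
import json
import time


def build_identity_link(uid, session_id, identifiers, tier, page_url, ts_ms=None):
    """Assemble an identity_link payload from already-hashed identifiers."""
    # Assumption: identifiers that were not captured are omitted rather than sent as null.
    present = {key: value for key, value in identifiers.items() if value}
    if not present:
        raise ValueError("identity_link needs at least one identifier")
    return {
        "event": "identity_link",
        "identity": {"uid": uid, "session_id": session_id},
        "identifiers": present,
        "_meta": {
            # Millisecond timestamp, matching the "ts" field in the example payload.
            "ts": ts_ms if ts_ms is not None else int(time.time() * 1000),
            "tier": tier,
            "page_url": page_url,
        },
    }


payload = build_identity_link(
    uid="f81d4fae-7dec-11d0-a765-00a0c91e6bf6.1647291600",
    session_id="a3b8c9d0-1234-5678-9abc-def012345678",
    identifiers={
        "email_hash": "e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855",
        "phone_hash": None,  # no phone number captured at this login
        "customer_id": "CUS12345",
    },
    tier="T1",
    page_url="https://example.com/account/login",
    ts_ms=1647291600000,
)
print(json.dumps(payload, indent=2))
```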

This is the critical architectural boundary:

  • The website captures the authentication moment, hashes the PII, and sends the identity_link event with the current UID and hashed identifiers. The website does not maintain an identity graph, does not query previous associations, and does not resolve cross-device relationships.
  • The endpoint receives identity_link events, builds and maintains the identity graph, resolves conflicts, merges duplicate identities, and answers queries about user identity from downstream systems.

The website is a sensor. The endpoint is the brain.


All PII must be hashed before transmission. The website sends hashes, never raw values. This is both a security requirement (see 08-data-handling.md) and a practical one — ad platforms (Meta CAPI, Google Enhanced Conversions) accept and expect SHA-256 hashed identifiers.

Use SHA-256. It is the minimum acceptable algorithm and the industry standard for ad platform matching (Meta, Google, TikTok, and LinkedIn all accept SHA-256 hashed identifiers). Do not use MD5 (broken; collisions are trivial to produce) or SHA-1 (deprecated; collision attacks have been demonstrated).

Normalization is critical. The same email address formatted differently produces a completely different hash. If the website hashes User@Gmail.com and Meta has user@gmail.com on file, the hashes will not match. The conversion is lost.

Normalization rules by identifier type:

Email:

  1. Convert to lowercase
  2. Trim leading and trailing whitespace
  3. For Gmail addresses only: remove dots from the local part (j.doe@gmail.com becomes jdoe@gmail.com). Gmail ignores dots; other providers do not
  4. Do not remove plus-addressing (user+tag@example.com stays as-is — the plus suffix may be the user’s actual address at non-Gmail providers)

Phone:

  1. Convert to E.164 format: + followed by country code and number, no spaces, dashes, or parentheses
  2. (555) 123-4567 with US country code becomes +15551234567
  3. 06-12345678 with NL country code becomes +31612345678

Name (first, last):

  1. Convert to lowercase
  2. Trim leading and trailing whitespace
  3. Collapse runs of internal spaces into a single space ("  John   Doe  " becomes "john doe")

// PSEUDOCODE -- Adapt to your platform
function hashIdentifier(type, value):
    if value is null or empty:
        return null
    normalized = value
    if type == "email":
        normalized = lowercase(trim(value))
        if normalized ends with "@gmail.com":
            local_part = normalized before "@"
            local_part = remove_all(local_part, ".")
            normalized = local_part + "@gmail.com"
    else if type == "phone":
        normalized = toE164(value) // library-dependent
    else if type == "name":
        normalized = lowercase(trim(value))
        normalized = collapse_whitespace(normalized)
    return sha256(normalized)
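
For readers who want something executable, here is one possible Python rendering of the pseudocode above. It is a sketch under stated assumptions: `to_e164` is a deliberately naive stand-in for a real phone-number library, the `default_country_code` parameter is hypothetical, whitespace-only values are treated as empty, and unknown identifier types are rejected rather than hashed as-is.

```python
import hashlib
import re


def to_e164(raw, default_country_code):
    """Naive E.164 conversion; a production system would use a phone-number library."""
    stripped = raw.strip()
    digits = re.sub(r"\D", "", stripped)
    if stripped.startswith("+"):
        return "+" + digits  # the input already carries a country code
    # Assumption: national numbers drop one leading trunk "0" (true for NL, not everywhere).
    if digits.startswith("0"):
        digits = digits[1:]
    return "+" + default_country_code + digits


def normalize(kind, value, default_country_code="1"):
    """Apply the normalization rules for one identifier type."""
    if kind == "email":
        email = value.strip().lower()
        local, sep, domain = email.rpartition("@")
        if sep and domain == "gmail.com":
            local = local.replace(".", "")  # Gmail ignores dots in the local part
        return f"{local}@{domain}" if sep else email
    if kind == "phone":
        return to_e164(value, default_country_code)
    if kind == "name":
        return " ".join(value.lower().split())  # trims and collapses whitespace
    raise ValueError(f"unknown identifier type: {kind}")


def hash_identifier(kind, value, default_country_code="1"):
    """Return the SHA-256 hex digest of the normalized value, or None if empty."""
    if value is None or not value.strip():
        return None
    normalized = normalize(kind, value, default_country_code)
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


print(normalize("email", "  J.Doe@Gmail.com "))
print(normalize("phone", "06-12345678", "31"))
print(hash_identifier("email", "User@Gmail.com") == hash_identifier("email", "user@gmail.com"))
```

Keeping `normalize` separate from `hash_identifier` makes the normalization rules testable without comparing digests.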

Salting is optional. If a salt is used, it MUST be consistent across all systems that need to match hashes. A salted hash from the website will not match an unsalted hash at the ad platform. In practice, most ad platform integrations (Meta CAPI, Google Enhanced Conversions) expect unsalted SHA-256 hashes. Use a salt only for internal identity resolution where all parties share the same salt.
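
The mismatch is easy to demonstrate. The sketch below assumes the salt is simply prefixed to the normalized value; the helper name `sha256_hex` and the salt string are hypothetical.

```python
import hashlib


def sha256_hex(value, salt=""):
    # Assumption: the salt is prefixed to the normalized value before hashing.
    return hashlib.sha256((salt + value).encode("utf-8")).hexdigest()


email = "user@gmail.com"
unsalted = sha256_hex(email)
salted = sha256_hex(email, salt="internal-shared-salt")  # hypothetical salt value

print(unsalted == salted)  # False: an ad platform holding the unsalted hash cannot match
print(salted == sha256_hex(email, salt="internal-shared-salt"))  # True: same salt, same hash
```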

Never hash click IDs or platform cookies. These values must be sent raw:

  • gclid, gbraid, wbraid — Google needs the raw value to match conversions
  • _fbc, _fbp — Meta needs raw values; hashing breaks CAPI matching entirely (see LEARNINGS.md)
  • msclkid, ttclid, li_fat_id, epik, twclid, ScCid — all platforms expect raw click identifiers

Click IDs are personal data (GDPR Recital 30), but they are sent to the platforms that generated them. The privacy consideration is consent, not hashing.
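
One way to make this rule hard to violate is to classify every outbound field explicitly. The sketch below is illustrative only: `prepare_fields` and the two field sets are hypothetical names, and the values to be hashed are assumed to be normalized already.

```python
import hashlib

# Click IDs and platform cookies listed above: always forwarded untouched.
RAW_FIELDS = {
    "gclid", "gbraid", "wbraid", "_fbc", "_fbp",
    "msclkid", "ttclid", "li_fat_id", "epik", "twclid", "ScCid",
}
# PII fields: always hashed. Assumption: values arrive already normalized.
HASHED_FIELDS = {"email", "phone"}


def prepare_fields(fields):
    """Return a dict where click IDs stay raw and PII is replaced by its SHA-256 hash."""
    out = {}
    for name, value in fields.items():
        if name in RAW_FIELDS:
            out[name] = value
        elif name in HASHED_FIELDS:
            out[name + "_hash"] = hashlib.sha256(value.encode("utf-8")).hexdigest()
        else:
            # Refuse unclassified fields so nothing is sent raw or hashed by accident.
            raise ValueError(f"unclassified field: {name}")
    return out


prepared = prepare_fields({"gclid": "abc123", "email": "user@gmail.com"})
print(sorted(prepared))  # ['email_hash', 'gclid']
```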


Cross-device identity resolution is reliable only through authenticated sessions (deterministic matching). The user must log in on each device. There is no other trustworthy approach.

Probabilistic matching (IP address + user agent + screen resolution + timezone) is unreliable (accuracy degrades quickly as device populations grow), legally risky (processing additional personal data without clear basis), and ethically questionable (users do not expect or understand it). UIAF does not use probabilistic matching.

Fingerprinting (canvas, WebGL, audio context) is actively blocked by Safari AFP (since Safari 26), restricted by Firefox, and prohibited by GDPR Article 5(1)(c) (data minimization) when deterministic alternatives exist. UIAF does not use fingerprinting.

The website does the same thing on both devices: capture the login, hash the email, send identity_link. The endpoint does the resolution. The website on Device B does not know about Device A, does not query the identity graph, and does not need to.

What the Endpoint Does (Out of Scope, but Relevant)

The endpoint’s identity graph is outside this specification, but the identity_link event is designed to give it everything it needs:

  • The current UID (which device/browser this is)
  • One or more hashed identifiers (which person this is)
  • Timestamp and session context (when this link was established)

How the endpoint resolves conflicts, merges identity clusters, and serves downstream queries is its own problem. UIAF provides the raw signals.


For organizations operating multiple domains (e.g., shop.example.com and blog.example.com, or brand-a.com and brand-b.com), the UID cookie is scoped to a single domain. A user visiting both domains gets two different UIDs unless the domains coordinate.

Subdomain sharing is handled by the cookie Domain attribute (see 02-identity-management.md). The approaches below address entirely separate domains.

The source domain generates a short-lived, single-use token and appends it to outbound links pointing to the target domain.

Requirements:

  • Token expires within minutes (120 seconds recommended). Long-lived tokens are a security risk.
  • Token is single-use. After validation, the token service invalidates it, so a captured token cannot be replayed.
  • Token validation is server-side. The token is never trusted client-side.
  • The _uiaf_token parameter is stripped from the URL after processing (both from the visible URL via history.replaceState and from any stored page URL).
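
The requirements above can be sketched server-side as follows. This is a minimal illustration, not a reference implementation: the `TokenService` class, the in-memory dictionary (a real deployment would need a store shared by both domains), and the helper names are all assumptions. Removing the parameter from the visible URL via history.replaceState happens in the browser and is not shown.

```python
import secrets
import time
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

TOKEN_TTL_SECONDS = 120  # recommended lifetime from the requirements above
TOKEN_PARAM = "_uiaf_token"


class TokenService:
    """Issues and redeems short-lived, single-use handoff tokens."""

    def __init__(self, clock=time.time):
        self._clock = clock  # injectable so expiry can be tested
        self._tokens = {}    # token -> (uid, expires_at); in-memory for illustration

    def issue(self, uid):
        token = secrets.token_urlsafe(32)
        self._tokens[token] = (uid, self._clock() + TOKEN_TTL_SECONDS)
        return token

    def redeem(self, token):
        # pop() removes the token on first use, so a second redeem returns None.
        entry = self._tokens.pop(token, None)
        if entry is None:
            return None
        uid, expires_at = entry
        return uid if self._clock() <= expires_at else None


def append_token(url, token):
    """Source domain: add the token to an outbound link."""
    parts = urlsplit(url)
    query = parse_qsl(parts.query, keep_blank_values=True) + [(TOKEN_PARAM, token)]
    return urlunsplit(parts._replace(query=urlencode(query)))


def strip_token(url):
    """Target domain: remove the token before the page URL is stored."""
    parts = urlsplit(url)
    query = [(k, v) for k, v in parse_qsl(parts.query, keep_blank_values=True)
             if k != TOKEN_PARAM]
    return urlunsplit(parts._replace(query=urlencode(query)))


service = TokenService()
token = service.issue("UID-A")
link = append_token("https://brand-b.com/landing?ref=a", token)
print(strip_token(link))            # https://brand-b.com/landing?ref=a
print(service.redeem(token))        # UID-A
print(service.redeem(token))        # None (already used)
```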

Both domains query a central identity service via server-side API calls.

  • Both domains send identity_link events to the same endpoint
  • The endpoint maintains UID-to-domain mappings
  • When the user authenticates on Domain B with the same email, the endpoint resolves both UIDs to the same person — identical to the cross-device flow

This approach requires shared backend infrastructure and works only after authentication. It does not enable anonymous cross-domain linking, but it avoids the complexity of token handoff.

Historically, cross-domain identity sharing used iframes: Domain A embeds an invisible iframe from Domain B, the iframe reads Domain B’s cookie, and postMessage passes the UID back to Domain A.

This no longer works:

  • Safari: blocks all third-party cookies since Safari 13.1 (March 2020)
  • Firefox: partitions third-party cookies by top-level site via Total Cookie Protection (TCP), introduced in Firefox 86 (February 2021), so the iframe sees a different cookie store than direct visits
  • Brave: blocks third-party cookies by default
  • Chrome: implements Storage Access API requirements and CHIPS partitioning
  • Privacy regulations: CNIL and other DPAs have specifically flagged cross-domain iframe cookie sharing as requiring explicit consent, even when first-party cookies are involved

Storage partitioning (CHIPS in Chrome, TCP in Firefox) means that even if the iframe can set a cookie, that cookie is partitioned: it is keyed to the embedding top-level site in addition to the iframe’s own domain. A cookie that Domain B’s iframe sets while embedded on Domain A is therefore inaccessible when the user visits Domain B directly.

Do not invest in iframe-based cross-domain identity. It is a dead path.


User Logs In With Different Email on Same Device

Two identity_link events fire with the same UID but different email_hash values. The endpoint must decide:

  • Same person, multiple emails: merge into one identity cluster. Common for users with personal and work email.
  • Different people, shared device: do not merge. This requires heuristics (e.g., two distinct login sessions minutes apart with different customer IDs likely indicate different people).

The website sends both events. Conflict resolution is the endpoint’s responsibility.

Multiple users log in on the same browser with the same UID. The endpoint receives multiple identity_link events mapping one UID to many distinct identities. This is an anomaly signal: a single UID linked to five different email hashes and three different customer IDs is almost certainly a shared device, not one person with five emails.

The endpoint should detect this pattern and either flag the UID as shared (excluding it from person-level analysis) or create separate identity nodes for each authenticated user while noting the shared device context.
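
A simple form of this detection counts distinct email hashes per UID. The sketch below is one hypothetical approach for the endpoint; the threshold value and the function name `find_shared_uids` are assumptions, and real heuristics would likely also weigh customer IDs and timing.

```python
from collections import defaultdict

SHARED_DEVICE_THRESHOLD = 3  # hypothetical cut-off; tune per site


def find_shared_uids(link_events, threshold=SHARED_DEVICE_THRESHOLD):
    """Return UIDs linked to at least `threshold` distinct email hashes."""
    emails_by_uid = defaultdict(set)
    for event in link_events:
        email_hash = event["identifiers"].get("email_hash")
        if email_hash:
            emails_by_uid[event["identity"]["uid"]].add(email_hash)
    return {uid for uid, hashes in emails_by_uid.items() if len(hashes) >= threshold}


def make_event(uid, email_hash):
    # Minimal identity_link event with only the fields this sketch reads.
    return {"event": "identity_link",
            "identity": {"uid": uid},
            "identifiers": {"email_hash": email_hash}}


events = [
    make_event("UID-KIOSK", "hash-1"),
    make_event("UID-KIOSK", "hash-2"),
    make_event("UID-KIOSK", "hash-3"),
    make_event("UID-HOME", "hash-1"),
    make_event("UID-HOME", "hash-1"),  # repeat login by the same person
]
print(find_shared_uids(events))  # {'UID-KIOSK'}
```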

The user clears all browser storage, returns to the site, and gets a new UID (UID-B). They then log in with the same email as before. The identity_link event sends UID-B with the same email_hash previously associated with UID-A.

The endpoint resolves this: same email_hash, different UIDs. The identity graph merges UID-B into the existing identity cluster. The user’s attribution history from UID-A is preserved, and UID-B is added as a new device/session node.
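
Although the endpoint's graph is out of scope, a toy model may help show why a shared email_hash is enough to merge UID-A and UID-B. The sketch below uses a union-find structure, which is a technique chosen for this illustration and not something the specification prescribes; the `IdentityGraph` class is hypothetical. It merges unconditionally and so ignores the shared-device case described above.

```python
class IdentityGraph:
    """Toy in-memory graph: UIDs and identifier hashes are nodes in a union-find."""

    def __init__(self):
        self._parent = {}

    def _find(self, node):
        self._parent.setdefault(node, node)
        while self._parent[node] != node:
            # Path halving: point the node at its grandparent while walking up.
            self._parent[node] = self._parent[self._parent[node]]
            node = self._parent[node]
        return node

    def link(self, uid, identifier_hash):
        """Record one identity_link: put the UID and the hash in the same cluster."""
        root_uid = self._find(("uid", uid))
        root_id = self._find(("id", identifier_hash))
        if root_uid != root_id:
            self._parent[root_uid] = root_id

    def same_person(self, uid_a, uid_b):
        return self._find(("uid", uid_a)) == self._find(("uid", uid_b))


graph = IdentityGraph()
graph.link("UID-A", "email-hash-1")         # original login
print(graph.same_person("UID-A", "UID-B"))  # False: UID-B is not linked yet
graph.link("UID-B", "email-hash-1")         # login after clearing storage
print(graph.same_person("UID-A", "UID-B"))  # True: both share email-hash-1
```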

Identity linking involves hashed PII (email, phone). This is personal data processing that requires appropriate consent.

| Tier | identity_link allowed? | Constraint |
|------|------------------------|------------|
| T0 | Yes | No consent mechanism present |
| T1 | Yes | Full consent granted |
| T2 | No | Hashed PII sent to endpoint enables cross-device tracking and ad platform matching; this requires advertising consent (ad_user_data), which is denied at T2 |
| T3 | No | No consent for personal data processing |
| T4 | No | System dormant |

At T2, the user consented to analytics but NOT to sending user-provided data for advertising. Since identity_link transmits hashed PII that enables cross-device resolution and ad platform Enhanced Conversions, it requires ad_user_data consent. The endpoint cannot guarantee the hashed data won’t reach ad platforms.
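
On the website side, the tier rules reduce to a single gate in front of the send call. A minimal sketch follows; the function name is hypothetical, and failing closed for unrecognized tiers is an assumption the specification does not state.

```python
# Mirrors the tier rules: only T0 and T1 may send identity_link.
IDENTITY_LINK_ALLOWED = {"T0": True, "T1": True, "T2": False, "T3": False, "T4": False}


def may_send_identity_link(tier):
    # Assumption: unrecognized tiers fail closed.
    return IDENTITY_LINK_ALLOWED.get(tier, False)


for tier in ("T0", "T1", "T2", "T3", "T4"):
    print(tier, may_send_identity_link(tier))
```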

If a site needs identity linking purely for first-party analytics (no ad platform forwarding), this requires explicit architectural guarantees at the endpoint level and should be documented as a site-specific policy decision, not a default behavior.