
Data Handling


📋 What Is Personal Data in This Context

Not everything UIAF touches is personal data, and not everything that is personal data requires the same treatment. The classification determines which legal obligations apply.

| Data | Personal data? | Why | Source |
|---|---|---|---|
| Random UUID (uiaf_uid) | Yes | Designed to re-identify returning users. A unique identifier assigned to distinguish one visitor from another meets the GDPR definition even without a name attached (Breyer ruling, CJEU C-582/14: dynamic IP addresses are personal data when the controller has legal means to identify the person) | GDPR Art. 4(1) |
| Hashed email | Yes | Pseudonymization is not anonymization. The hash can be reversed by brute force or matched against known hashes. The Bavarian DPA issued a EUR 3M fine for treating hashed emails as anonymous data | GDPR Recital 26 |
| Click IDs (gclid, fbclid, msclkid, etc.) | Yes | Online identifiers that link to user profiles maintained by advertising platforms. Explicitly covered by GDPR’s definition of personal data | GDPR Recital 30 |
| UTM parameters (utm_source, utm_medium, etc.) | No | Describe campaigns, not users. No DPA has classified UTM values as personal data. They carry no user-specific information | |
| IP address (even truncated) | Yes | CNIL (France), Austrian DSB, and Italian Garante have all ruled that truncated IP addresses remain personal data. Truncation is minimization, not anonymization: the address can still be correlated with other data | Multiple DPA rulings |
| User agent string | Debatable | Can contribute to fingerprinting when combined with other signals, but alone is insufficient for identification. Not consistently classified by DPAs. Treat with caution but not as definitive PII | |

SHA-256 is the minimum acceptable algorithm for hashing PII before transmission to the endpoint. This is both a security measure and a compatibility requirement: Meta CAPI, Google Enhanced Conversions, TikTok Events API, and LinkedIn CAPI all accept SHA-256 hashed identifiers.

Hashing without normalization is useless. The same email formatted differently produces a completely different hash. All normalization rules are specified in 06-identity-resolution.md and must be applied before hashing.

Hash server-side whenever the architecture permits. Client-side hashing means the raw PII passes through JavaScript, where it is:

  • Visible in browser developer tools
  • Accessible to any script running on the page (including injected scripts from XSS)
  • Potentially logged by browser extensions

Server-side hashing means the raw PII travels from the form submission to the server, is hashed in server memory, and only the hash is included in the endpoint payload. The raw value never reaches the client-side UIAF code.

When server-side hashing is not possible (e.g., pure client-side SPA with no server rendering), hash immediately upon capture — before storing in any client-side variable beyond the immediate function scope.

// PSEUDOCODE -- Adapt to your platform
function normalizeAndHash(type, value):
    if value is null or empty:
        return null

    // Normalize (rules from 06-identity-resolution.md)
    if type == "email":
        normalized = lowercase(trim(value))
        if normalized ends with "@gmail.com":
            local_part = split(normalized, "@")[0]
            local_part = remove_all(local_part, ".")
            normalized = local_part + "@gmail.com"
    else if type == "phone":
        normalized = toE164(value)
    else if type == "name":
        normalized = lowercase(trim(collapse_whitespace(value)))
    else:
        normalized = trim(value)

    return sha256(normalized)
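
For teams hashing on a Python server, the pseudocode above might translate roughly as follows. This is a minimal sketch, not a reference implementation: the function name `normalize_and_hash` is illustrative, and the phone branch is a simplified stand-in for real E.164 conversion that assumes the input already contains the country code. The authoritative normalization rules remain those in 06-identity-resolution.md.

```python
import hashlib
import re
from typing import Optional


def normalize_and_hash(kind: str, value: Optional[str]) -> Optional[str]:
    """Normalize a PII value, then return its SHA-256 hex digest."""
    # Treat None, empty, and whitespace-only input as "nothing to hash".
    if value is None or value.strip() == "":
        return None

    if kind == "email":
        normalized = value.strip().lower()
        if normalized.endswith("@gmail.com"):
            # Gmail ignores dots in the local part.
            local_part = normalized.split("@")[0].replace(".", "")
            normalized = local_part + "@gmail.com"
    elif kind == "phone":
        # ASSUMPTION: the input already includes the country code, so
        # stripping non-digits and prefixing "+" yields an E.164-style value.
        normalized = "+" + re.sub(r"\D", "", value)
    elif kind == "name":
        # Collapse runs of whitespace, trim, and lowercase.
        normalized = " ".join(value.split()).lower()
    else:
        normalized = value.strip()

    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()
```

With this sketch, `" John.Doe@Gmail.com "` and `"johndoe@gmail.com"` produce the same digest, which is the property the normalization step exists to guarantee.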

Click IDs and platform cookies must be sent raw. Platforms need the original value to match conversions:

  • _fbc, _fbp — hashing breaks Meta CAPI matching entirely
  • gclid, gbraid, wbraid — Google requires raw values
  • msclkid, ttclid, li_fat_id, epik, twclid, ScCid — all platforms expect raw click identifiers

These are personal data (see table above), but they are sent to the platforms that generated them. The privacy control is consent, not obfuscation.
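
The split between hashed PII and raw platform identifiers can be sketched as a single payload-building step. The field names `email` and `phone`, the `_hash` suffix, and the `consent_granted` flag are assumptions made for illustration; the raw identifier list is the one given above, and PII values are assumed to be normalized before they reach this function.

```python
import hashlib

# Identifiers the platforms expect unmodified (taken from the list above).
RAW_IDENTIFIERS = frozenset({
    "_fbc", "_fbp", "gclid", "gbraid", "wbraid", "msclkid",
    "ttclid", "li_fat_id", "epik", "twclid", "ScCid",
})

# Hypothetical names for the PII fields a site might collect.
PII_FIELDS = frozenset({"email", "phone"})


def build_identity(fields: dict[str, str], consent_granted: bool) -> dict[str, str]:
    """Return the identity block: raw click IDs, hashed PII, nothing else."""
    # Consent is the privacy control, so nothing is sent without it.
    if not consent_granted:
        return {}

    identity: dict[str, str] = {}
    for key, value in fields.items():
        if key in RAW_IDENTIFIERS:
            identity[key] = value  # sent raw so the platform can match it
        elif key in PII_FIELDS:
            digest = hashlib.sha256(value.encode("utf-8")).hexdigest()
            identity[key + "_hash"] = digest
        # Any other field is dropped rather than forwarded.
    return identity
```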


Capture only what you need. GDPR Article 5(1)(c) requires that personal data be “adequate, relevant and limited to what is necessary.” In practice:

URL parameters: Capture only allowlisted parameters (UTMs, click IDs, configured custom parameters). Do not capture all query parameters indiscriminately. Query strings can contain search queries, session tokens, PII (?email=john@example.com), and other values with no attribution purpose.
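
An allowlist filter is short enough to sketch directly. The set below is an example only; a real deployment would derive it from its own configured UTM, click ID, and custom parameters. `capture_allowed_params` is an illustrative name, not part of UIAF.

```python
from urllib.parse import parse_qsl, urlsplit

# Example allowlist; extend it with the parameters your configuration defines.
ALLOWED_PARAMS = frozenset({
    "utm_source", "utm_medium", "utm_campaign", "utm_term", "utm_content",
    "gclid", "fbclid", "msclkid",
})


def capture_allowed_params(url: str, allowed: frozenset = ALLOWED_PARAMS) -> dict[str, str]:
    """Return only allowlisted query parameters; everything else is ignored."""
    captured: dict[str, str] = {}
    for key, value in parse_qsl(urlsplit(url).query):
        # Keep the first occurrence of each allowlisted key.
        if key in allowed and key not in captured:
            captured[key] = value
    return captured
```

Given `https://example.com/?utm_source=news&email=john%40example.com&gclid=abc123`, only `utm_source` and `gclid` are returned; the `email` parameter is never read into attribution data.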

IP addresses: Do not forward raw IP addresses in the endpoint payload. If the endpoint needs IP for geolocation or ad platform matching, it should extract it from the HTTP request headers itself (the transport layer already carries it). Including IP in the JSON payload creates an additional copy with no benefit.

URL paths: Do not capture or store URL path segments that might contain PII. Paths like /user/john-smith/profile or /order/confirmation/john@example.com leak personal data into attribution records. Capture the page path for context, but sanitize or exclude paths matching known PII patterns.
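
One way to sanitize paths is to redact segments that look like email addresses, plus the segment that follows a known user-specific prefix. The prefix list and the `[redacted]` placeholder below are hypothetical; the patterns that matter depend on each site's URL scheme, so treat this as a starting point rather than a complete filter.

```python
import re

# A segment that looks like an email address.
EMAIL_RE = re.compile(r"[^/@\s]+@[^/@\s]+\.[^/@\s]+")

# Hypothetical prefixes whose following segment identifies a person.
SENSITIVE_PREFIXES = ("user", "users", "customer")


def sanitize_path(path: str) -> str:
    """Replace path segments likely to contain PII with a placeholder."""
    cleaned = []
    redact_next = False
    for segment in path.split("/"):
        if redact_next and segment:
            cleaned.append("[redacted]")
            redact_next = False
            continue
        if EMAIL_RE.fullmatch(segment):
            cleaned.append("[redacted]")
        else:
            cleaned.append(segment)
        # Flag the segment after a sensitive prefix for redaction.
        redact_next = segment.lower() in SENSITIVE_PREFIXES
    return "/".join(cleaned)
```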

Cookie size: Keep all UIAF cookies under 4KB total. The UID cookie is approximately 50 bytes with attributes. Attribution data stored in cookies (if any) must be compact. Do not store full UTM strings in cookies when a hash or truncated reference suffices for session continuity.

Unused parameters: Do not capture attribution data you will never analyze. If your organization never segments by utm_content, do not capture it. Once a parameter is stored against a UID it becomes part of that user's record, and every such field is processing that must be justified.


UIAF processes personal data. Data subjects have rights under GDPR (and equivalent legislation) that the implementation must support.

The user may request all data associated with their identity. The website’s obligation is limited: it stores only the UID, session ID, and attribution data in browser storage. The bulk of the data is at the endpoint.

The implementation must provide a mechanism (typically an API call or admin interface) that, given a UID, retrieves all associated data from the endpoint. This is the endpoint’s responsibility to fulfill, but the website must be able to identify the requesting user’s UID to initiate the request.

When a user requests deletion, two things must happen:

  1. Client-side: Clear all UIAF data from the browser — cookies, localStorage, sessionStorage
  2. Server-side: Notify the endpoint to purge all data associated with the UID

// PSEUDOCODE -- Adapt to your platform
function handleDeletionRequest(uid):
    // 1. Clear all client-side storage
    deleteClientCookie("uiaf_uid")
    deleteClientCookie("uiaf_attribution")
    // Delete any other UIAF cookies

    try:
        localStorage.removeItem("uiaf_uid")
        localStorage.removeItem("uiaf_attribution")
        localStorage.removeItem("uiaf_first_touch")
        localStorage.removeItem("uiaf_last_touch")
        localStorage.removeItem("uiaf_retry_queue")
    catch (error):
        pass // Storage may be unavailable

    try:
        sessionStorage.removeItem("uiaf_uid")
        sessionStorage.removeItem("uiaf_session_id")
    catch (error):
        pass

    // 2. Notify endpoint to purge server-side data
    sendToEndpoint({
        event: "data_deletion_request",
        identity: { uid: uid },
        _meta: {
            ts: currentTimestamp(),
            request_type: "erasure"
        }
    })

    // 3. After deletion, the user is fully anonymous again
    // Do NOT generate a new UID -- the user has exercised a right

The endpoint must cascade the deletion to all downstream systems that received data for this UID. This cascade is outside UIAF’s scope but the data_deletion_request event initiates it.

Data associated with a UID must be exportable in a machine-readable format (JSON, CSV). This is an endpoint capability — the website triggers the request, the endpoint compiles and delivers the export.
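
On the endpoint side, the export step could look something like the sketch below. The record shape (a flat dictionary per event) is an assumption made for illustration; a real endpoint would first gather records from its own storage, which is not shown here.

```python
import csv
import io
import json


def export_json(uid: str, records: list[dict]) -> str:
    """Serialize all records for one UID as a JSON document."""
    return json.dumps({"uid": uid, "records": records}, indent=2, sort_keys=True)


def export_csv(records: list[dict]) -> str:
    """Serialize flat records as CSV, using the union of keys as columns."""
    fieldnames = sorted({key for record in records for key in record})
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    writer.writerows(records)  # missing keys are written as empty cells
    return buffer.getvalue()
```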


Every UIAF cookie must be set with security attributes that prevent common attack vectors.

| Attribute | Value | Security purpose |
|---|---|---|
| Secure | true | Prevents transmission over unencrypted HTTP. Without this, cookies are sent in plaintext on HTTP connections, vulnerable to network interception |
| SameSite | Lax | Mitigates CSRF attacks. Cookie is not sent on cross-site POST requests (form submissions from other domains). Allows normal top-level GET navigation (clicking a link to your site) |
| Path | / | Scopes the cookie to the entire site so every page sees the same identifier. Path is not a security boundary: applications on other paths of the same domain can still read the cookie |
| Max-Age | 34560000 | 400 days. Limits the exposure window. Without an explicit expiry, the cookie becomes a session cookie and lasts only until the browser closes (too short). Browsers that follow RFC 6265bis cap longer lifetimes at 400 days |

The HttpOnly decision is documented in 02-identity-management.md. The tradeoff between XSS protection and client-side recovery capability is architecture-dependent.
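
When the cookie is set from a Python server, the attributes above can be applied with the standard library's `http.cookies` module. This is a sketch under the assumption that the server sets the cookie; `build_uid_cookie` is an illustrative name, and HttpOnly is deliberately left out because that decision is architecture-dependent.

```python
from http.cookies import SimpleCookie

MAX_AGE_SECONDS = 400 * 24 * 60 * 60  # 400 days = 34,560,000 seconds


def build_uid_cookie(value: str) -> str:
    """Return the Set-Cookie header value for the UID cookie."""
    cookie = SimpleCookie()
    cookie["uiaf_uid"] = value
    morsel = cookie["uiaf_uid"]
    morsel["secure"] = True
    morsel["samesite"] = "Lax"
    morsel["path"] = "/"
    morsel["max-age"] = MAX_AGE_SECONDS
    # HttpOnly is intentionally not set here; see 02-identity-management.md.
    return morsel.OutputString()
```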


All communication between the website and the endpoint must use HTTPS (TLS 1.2 or higher). This is non-negotiable. HTTP transmissions expose the entire payload — UIDs, hashed PII, click IDs, attribution data — to network-level interception.

Sensitive data belongs in the POST request body, never in URL query parameters. URLs are logged in server access logs, cached by proxies and CDNs, stored in browser history, and visible in the Referer header of subsequent requests. A GET request with ?email_hash=abc123&uid=xyz leaks personal data into every system that touches the URL.
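
For a server that proxies payloads to the endpoint, both transport rules can be enforced where the request is built. The sketch below only constructs the request and the TLS context; it does not send anything. The function names are illustrative and the endpoint URL is supplied by the caller.

```python
import json
import ssl
import urllib.request
from urllib.parse import urlsplit


def build_tls_context() -> ssl.SSLContext:
    """Create a client TLS context that refuses anything below TLS 1.2."""
    context = ssl.create_default_context()
    context.minimum_version = ssl.TLSVersion.TLSv1_2
    return context


def build_endpoint_request(endpoint_url: str, payload: dict) -> urllib.request.Request:
    """Build a JSON POST request, rejecting non-HTTPS URLs and query strings."""
    parts = urlsplit(endpoint_url)
    if parts.scheme != "https":
        raise ValueError("endpoint must use HTTPS")
    if parts.query:
        raise ValueError("send data in the request body, not the query string")
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        endpoint_url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

A caller would then pass both objects to `urllib.request.urlopen(request, context=build_tls_context())`.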

If the browser sends payloads directly to the endpoint (as opposed to the website’s own server proxying the request), the endpoint must return appropriate CORS headers:

  • Access-Control-Allow-Origin: the specific origin(s) of the website(s), never *
  • Access-Control-Allow-Methods: POST
  • Access-Control-Allow-Headers: Content-Type

Wildcard CORS (*) permits any website to send data to your endpoint. Use explicit origin allowlisting.
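
Explicit origin allowlisting amounts to echoing the request's Origin header only when it matches a known site. In this sketch the origins are placeholders and `cors_headers` is an illustrative helper, not a framework API. The `Vary: Origin` header is added so that caches do not reuse a response generated for a different origin.

```python
from typing import Optional

# Placeholder origins; replace with the site(s) that actually send payloads.
ALLOWED_ORIGINS = frozenset({
    "https://www.example.com",
    "https://shop.example.com",
})


def cors_headers(request_origin: Optional[str]) -> dict[str, str]:
    """Return CORS response headers for an allowlisted origin, else none."""
    if request_origin not in ALLOWED_ORIGINS:
        return {}  # no CORS headers: the browser blocks the cross-origin call
    return {
        "Access-Control-Allow-Origin": request_origin,  # never "*"
        "Access-Control-Allow-Methods": "POST",
        "Access-Control-Allow-Headers": "Content-Type",
        "Vary": "Origin",
    }
```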


Server logs are a common source of data leaks. They persist for weeks or months, are often stored without encryption, and are accessible to operations teams who may not be authorized to view personal data.

Never log raw PII in server logs. Email addresses, phone numbers, and names must not appear in application logs, even during debugging. Log the hash if you need to trace a specific identifier.

Never log full cookie values in debug output. A log line containing Cookie: uiaf_uid=f81d4fae-7dec-11d0-a765-00a0c91e6bf6.1647291600 exposes the user’s persistent identifier to anyone with log access. Log the cookie name and a truncated hash of the value if you need to trace cookie behavior.
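
A truncated hash gives operators something stable to correlate without exposing the identifier itself. The helper names and the 12-character length below are arbitrary choices for illustration. Note that a truncated hash of a persistent identifier is still pseudonymous data, so the resulting log lines remain subject to retention and access controls.

```python
import hashlib


def log_safe_ref(value: str, length: int = 12) -> str:
    """Return a short, stable reference derived from a sensitive value."""
    digest = hashlib.sha256(value.encode("utf-8")).hexdigest()
    return digest[:length]


def describe_cookie(name: str, value: str) -> str:
    """Format a cookie for logging without revealing its value."""
    return f"cookie={name} value_ref={log_safe_ref(value)}"
```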

Never include raw IP addresses in endpoint payloads. The HTTP request already carries the IP in headers. Duplicating it in the JSON body creates an additional copy that travels through the endpoint’s processing pipeline, gets stored in the endpoint’s database, and must be covered by deletion requests. Let the transport layer handle IP — do not promote it to an explicit data field.

Structured logging. If your platform supports structured logging (JSON log entries with fields), use it. Mark fields containing personal data with a pii: true flag or equivalent so log aggregation systems can filter or redact them automatically.
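
The pii flag convention might look like this in practice. The entry layout (each field wrapped as a value plus a pii marker) is one possible convention, not a standard, and the function names are hypothetical. Redaction is done here before serialization, but an aggregation pipeline could equally apply it downstream using the same flag.

```python
import json
from typing import Any


def log_field(value: Any, pii: bool = False) -> dict:
    """Wrap a log value with an explicit PII marker."""
    return {"value": value, "pii": pii}


def redact(entry: dict) -> dict:
    """Replace the value of every field flagged as PII."""
    redacted = {}
    for key, field in entry.items():
        if isinstance(field, dict) and field.get("pii"):
            redacted[key] = {"value": "[redacted]", "pii": True}
        else:
            redacted[key] = field
    return redacted


def to_log_line(entry: dict) -> str:
    """Serialize a redacted entry as a single JSON log line."""
    return json.dumps(redact(entry), sort_keys=True)
```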