Data Handling
📋 What Is Personal Data in This Context
Not everything UIAF touches is personal data, and not everything that is personal data requires the same treatment. The classification determines which legal obligations apply.
| Data | Personal data? | Why | Source |
|---|---|---|---|
| Random UUID (`uiaf_uid`) | Yes | Designed to re-identify returning users. A unique identifier assigned to distinguish one visitor from another meets the GDPR definition even without a name attached (Breyer ruling, CJEU C-582/14: dynamic IP addresses are personal data when the controller has legal means to identify the person) | GDPR Art. 4(1) |
| Hashed email | Yes | Pseudonymization is not anonymization. The hash can be reversed by brute force or matched against known hashes. The Bavarian DPA fined EUR 3M for treating hashed emails as anonymous data | GDPR Recital 26 |
| Click IDs (`gclid`, `fbclid`, `msclkid`, etc.) | Yes | Online identifiers that link to user profiles maintained by advertising platforms. Explicitly covered by GDPR’s definition of personal data | GDPR Recital 30 |
| UTM parameters (`utm_source`, `utm_medium`, etc.) | No | Describe campaigns, not users. No DPA has classified UTM values as personal data. They carry no user-specific information | n/a |
| IP address (even truncated) | Yes | CNIL (France), Austrian DSB, and Italian Garante have all ruled that truncated IP addresses remain personal data. Truncation is minimization, not anonymization: the address can still be correlated with other data | Multiple DPA rulings |
| User agent string | Debatable | Can contribute to fingerprinting when combined with other signals, but alone is insufficient for identification. Not consistently classified by DPAs. Treat with caution but not as definitive PII | n/a |
🔐 Hashing Requirements
Algorithm and Application
SHA-256 is the minimum acceptable algorithm for all PII before transmission to the endpoint. This is both a security measure and a compatibility requirement: Meta CAPI, Google Enhanced Conversions, TikTok Events API, and LinkedIn CAPI all accept SHA-256 hashed identifiers.
Normalization
Hashing without normalization is useless. The same email formatted differently produces a completely different hash. All normalization rules are specified in 06-identity-resolution.md and must be applied before hashing.
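The effect is easy to demonstrate. The following Python sketch hashes two spellings of the same address with and without normalization. The function name `hash_email` is hypothetical, and the normalization shown (trim plus lowercase) is only a stand-in for the full rules in 06-identity-resolution.md.

```python
import hashlib


def hash_email(value: str, normalize: bool = True) -> str:
    """Return the SHA-256 hex digest of an email address.

    Normalization here is only trim + lowercase (illustrative);
    the complete rules are defined in 06-identity-resolution.md.
    """
    if normalize:
        value = value.strip().lower()
    return hashlib.sha256(value.encode("utf-8")).hexdigest()


raw_a = "Jane.Doe@Example.com "
raw_b = "jane.doe@example.com"

# Without normalization the two spellings produce unrelated hashes.
print(hash_email(raw_a, normalize=False) == hash_email(raw_b, normalize=False))  # False
# With normalization they collapse to the same hash.
print(hash_email(raw_a) == hash_email(raw_b))  # True
```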
Server-Side Hashing
Hash server-side whenever the architecture permits. Client-side hashing means the raw PII passes through JavaScript, where it is:
- Visible in browser developer tools
- Accessible to any script running on the page (including injected scripts from XSS)
- Potentially logged by browser extensions
Server-side hashing means the raw PII travels from the form submission to the server, is hashed in server memory, and only the hash is included in the endpoint payload. The raw value never reaches the client-side UIAF code.
When server-side hashing is not possible (e.g., pure client-side SPA with no server rendering), hash immediately upon capture — before storing in any client-side variable beyond the immediate function scope.
Pseudocode
    // PSEUDOCODE -- Adapt to your platform
    function normalizeAndHash(type, value):
        if value is null or empty:
            return null

        // Normalize (rules from 06-identity-resolution.md)
        if type == "email":
            normalized = lowercase(trim(value))
            if normalized ends with "@gmail.com":
                local_part = split(normalized, "@")[0]
                local_part = remove_all(local_part, ".")
                normalized = local_part + "@gmail.com"
        else if type == "phone":
            normalized = toE164(value)
        else if type == "name":
            normalized = lowercase(trim(collapse_whitespace(value)))
        else:
            normalized = trim(value)

        return sha256(normalized)

What NOT to Hash
Click IDs and platform cookies must be sent raw. Platforms need the original value to match conversions:
- `_fbc`, `_fbp`: hashing breaks Meta CAPI matching entirely
- `gclid`, `gbraid`, `wbraid`: Google requires raw values
- `msclkid`, `ttclid`, `li_fat_id`, `epik`, `twclid`, `ScCid`: all platforms expect raw click identifiers
These are personal data (see table above), but they are sent to the platforms that generated them. The privacy control is consent, not obfuscation.
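One way to keep the two treatments apart is to route every captured field through a single function that knows which fields to hash and which to pass through. The Python sketch below is an assumption-laden illustration: the field names, the `_hash` suffix, and the function `build_identifiers` are hypothetical and not part of any UIAF schema, and values are assumed to be normalized already.

```python
import hashlib
from typing import Dict

# Assumption: these field names are illustrative, not a UIAF schema.
HASHED_FIELDS = {"email", "phone"}  # PII: hash before sending
RAW_FIELDS = {"gclid", "fbclid", "_fbc", "_fbp", "msclkid", "ttclid"}  # send as-is


def sha256_hex(value: str) -> str:
    return hashlib.sha256(value.encode("utf-8")).hexdigest()


def build_identifiers(captured: Dict[str, str]) -> Dict[str, str]:
    """Hash PII fields and pass click IDs and platform cookies through raw.

    Values are assumed to be normalized already.
    Fields that appear in neither set are dropped.
    """
    out: Dict[str, str] = {}
    for key, value in captured.items():
        if not value:
            continue
        if key in HASHED_FIELDS:
            out[f"{key}_hash"] = sha256_hex(value)
        elif key in RAW_FIELDS:
            out[key] = value
    return out


ids = build_identifiers(
    {"email": "jane@example.com", "gclid": "abc123", "note": "ignored"}
)
print(sorted(ids))  # ['email_hash', 'gclid']
```

Dropping unknown fields by default means a new form field cannot leak into the payload unhashed just because nobody classified it.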
✂️ Data Minimization
Capture only what you need. GDPR Article 5(1)(c) requires that personal data be “adequate, relevant and limited to what is necessary.” In practice:
URL parameters: Capture only allowlisted parameters (UTMs, click IDs, configured custom parameters). Do not capture all query parameters indiscriminately. Query strings can contain search queries, session tokens, PII (?email=john@example.com), and other values with no attribution purpose.
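A minimal Python sketch of allowlist-based capture follows. The contents of `ALLOWED_PARAMS` and the function name `capture_params` are assumptions for illustration; the real list should come from your own configuration.

```python
from typing import Dict
from urllib.parse import parse_qsl, urlsplit

# Assumption: this allowlist is an example; use your configured parameters.
ALLOWED_PARAMS = {
    "utm_source", "utm_medium", "utm_campaign", "utm_term",
    "gclid", "fbclid", "msclkid",
}


def capture_params(url: str) -> Dict[str, str]:
    """Return only allowlisted query parameters from a landing URL."""
    query = urlsplit(url).query
    return {
        key: value
        for key, value in parse_qsl(query)
        if key in ALLOWED_PARAMS
    }


captured = capture_params(
    "https://example.com/?utm_source=news&email=john@example.com&gclid=abc&token=s3cr3t"
)
print(captured)  # {'utm_source': 'news', 'gclid': 'abc'}
```

The `email` and `token` parameters never enter the attribution record because they are not on the list.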
IP addresses: Do not forward raw IP addresses in the endpoint payload. If the endpoint needs IP for geolocation or ad platform matching, it should extract it from the HTTP request headers itself (the transport layer already carries it). Including IP in the JSON payload creates an additional copy with no benefit.
URL paths: Do not capture or store URL path segments that might contain PII. Paths like /user/john-smith/profile or /order/confirmation/john@example.com leak personal data into attribution records. Capture the page path for context, but sanitize or exclude paths matching known PII patterns.
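Path sanitization can be sketched in Python as a small set of pattern replacements. Both patterns below are assumptions chosen to match the two example paths above; a real deployment would need patterns tuned to its own URL scheme.

```python
import re

# Assumption: example patterns only; tune them to your own URL scheme.
EMAIL_SEGMENT = re.compile(r"[^/@\s]+@[^/@\s]+\.[^/@\s]+")
USER_SEGMENT = re.compile(r"^(/user/)[^/]+", re.IGNORECASE)


def sanitize_path(path: str) -> str:
    """Replace path segments that look like PII with a placeholder."""
    path = EMAIL_SEGMENT.sub("[redacted]", path)
    path = USER_SEGMENT.sub(r"\1[redacted]", path)
    return path


print(sanitize_path("/order/confirmation/john@example.com"))  # /order/confirmation/[redacted]
print(sanitize_path("/user/john-smith/profile"))              # /user/[redacted]/profile
print(sanitize_path("/pricing"))                              # /pricing
```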
Cookie size: Keep all UIAF cookies under 4KB total. The UID cookie is approximately 50 bytes with attributes. Attribution data stored in cookies (if any) must be compact. Do not store full UTM strings in cookies when a hash or truncated reference suffices for session continuity.
Unused parameters: Do not capture attribution data you will never analyze. If your organization never segments by utm_content, do not capture it. Every stored parameter is personal data processing that must be justified.
⚖️ Data Subject Rights
UIAF processes personal data. Data subjects have rights under GDPR (and equivalent legislation) that the implementation must support.
Right to Access (GDPR Art. 15)
The user may request all data associated with their identity. The website’s obligation is limited: it stores only the UID, session ID, and attribution data in browser storage. The bulk of the data is at the endpoint.
The implementation must provide a mechanism (typically an API call or admin interface) that, given a UID, retrieves all associated data from the endpoint. This is the endpoint’s responsibility to fulfill, but the website must be able to identify the requesting user’s UID to initiate the request.
Right to Deletion (GDPR Art. 17)
When a user requests deletion, two things must happen:
- Client-side: Clear all UIAF data from the browser — cookies, localStorage, sessionStorage
- Server-side: Notify the endpoint to purge all data associated with the UID
    // PSEUDOCODE -- Adapt to your platform
    function handleDeletionRequest(uid):
        // 1. Clear all client-side storage
        deleteClientCookie("uiaf_uid")
        deleteClientCookie("uiaf_attribution")
        // Delete any other UIAF cookies

        try:
            localStorage.removeItem("uiaf_uid")
            localStorage.removeItem("uiaf_attribution")
            localStorage.removeItem("uiaf_first_touch")
            localStorage.removeItem("uiaf_last_touch")
            localStorage.removeItem("uiaf_retry_queue")
        catch (error):
            pass // Storage may be unavailable

        try:
            sessionStorage.removeItem("uiaf_uid")
            sessionStorage.removeItem("uiaf_session_id")
        catch (error):
            pass

        // 2. Notify endpoint to purge server-side data
        sendToEndpoint({
            event: "data_deletion_request",
            identity: { uid: uid },
            _meta: { ts: currentTimestamp(), request_type: "erasure" }
        })

        // 3. After deletion, the user is fully anonymous again
        // Do NOT generate a new UID -- the user has exercised a right

The endpoint must cascade the deletion to all downstream systems that received data for this UID. This cascade is outside UIAF’s scope, but the `data_deletion_request` event initiates it.
Right to Portability (GDPR Art. 20)
Data associated with a UID must be exportable in a machine-readable format (JSON, CSV). This is an endpoint capability: the website triggers the request, and the endpoint compiles and delivers the export.
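On the endpoint side, the export step could be as small as the following Python sketch. The record shape (a list of flat dictionaries for one UID) and the function name `export_records` are assumptions; how the endpoint actually stores and retrieves data is outside this document.

```python
import csv
import io
import json
from typing import Dict, List


def export_records(records: List[Dict[str, str]], fmt: str = "json") -> str:
    """Serialize the records held for one UID as JSON or CSV text."""
    if fmt == "json":
        return json.dumps(records, indent=2, sort_keys=True)
    if fmt == "csv":
        # Union of all keys, sorted for a stable column order.
        columns = sorted({key for record in records for key in record})
        buffer = io.StringIO()
        writer = csv.DictWriter(buffer, fieldnames=columns, restval="")
        writer.writeheader()
        writer.writerows(records)
        return buffer.getvalue()
    raise ValueError(f"unsupported format: {fmt}")


records = [
    {"uid": "u1", "utm_source": "news"},
    {"uid": "u1", "gclid": "abc"},
]
print(export_records(records, "csv"))
```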
🍪 Cookie Security
Every UIAF cookie must be set with security attributes that prevent common attack vectors.
| Attribute | Value | Security purpose |
|---|---|---|
| `Secure` | `true` | Prevents transmission over unencrypted HTTP. Without this, cookies are sent in plaintext on HTTP connections, vulnerable to network interception |
| `SameSite` | `Lax` | Mitigates CSRF attacks. The cookie is not sent on cross-site POST requests (form submissions from other domains) but is still sent on normal top-level GET navigation (clicking a link to your site) |
| `Path` | `/` | Makes the cookie available on every path of the site so the identifier is consistent across pages. Path is not a security boundary: it does not isolate the cookie from other applications on the same domain |
| `Max-Age` | `34560000` | 400 days. Limits the exposure window. Without an explicit expiry, the cookie becomes a session cookie that lasts only until browser close (too short). Values exceeding 400 days are silently capped by browsers (RFC 6265bis) |
The HttpOnly decision is documented in 02-identity-management.md. The tradeoff between XSS protection and client-side recovery capability is architecture-dependent.
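For a server that sets the cookie itself, the attributes in the table translate into a `Set-Cookie` header value along these lines. This Python sketch is illustrative: `build_set_cookie` is a hypothetical helper, and `HttpOnly` is left out only because that decision is deferred to 02-identity-management.md.

```python
MAX_AGE_SECONDS = 400 * 24 * 60 * 60  # 400 days = 34560000 seconds


def build_set_cookie(name: str, value: str) -> str:
    """Build a Set-Cookie header value with the attributes from the table.

    HttpOnly is deliberately omitted here; see 02-identity-management.md.
    """
    attributes = [
        f"{name}={value}",
        f"Max-Age={MAX_AGE_SECONDS}",
        "Path=/",
        "Secure",
        "SameSite=Lax",
    ]
    return "; ".join(attributes)


print(build_set_cookie("uiaf_uid", "f81d4fae-7dec-11d0-a765-00a0c91e6bf6"))
# uiaf_uid=f81d4fae-7dec-11d0-a765-00a0c91e6bf6; Max-Age=34560000; Path=/; Secure; SameSite=Lax
```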
🔐 Transport Security
HTTPS Required
All communication between the website and the endpoint must use HTTPS (TLS 1.2 or higher). This is non-negotiable. HTTP transmissions expose the entire payload (UIDs, hashed PII, click IDs, attribution data) to network-level interception.
POST Body, Not URL Parameters
Sensitive data belongs in the POST request body, never in URL query parameters. URLs are logged in server access logs, cached by proxies and CDNs, stored in browser history, and visible in the Referer header of subsequent requests. A GET request with ?email_hash=abc123&uid=xyz leaks personal data into every system that touches the URL.
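When the website's own server forwards events, both rules can be enforced at the point where the request is built. The Python sketch below constructs the request without sending it. The endpoint URL and the function name `build_request` are hypothetical.

```python
import json
import urllib.request
from urllib.parse import urlsplit


def build_request(endpoint_url: str, payload: dict) -> urllib.request.Request:
    """Build (but do not send) a JSON POST request to the endpoint."""
    parts = urlsplit(endpoint_url)
    if parts.scheme != "https":
        raise ValueError("endpoint must use HTTPS")
    if parts.query:
        raise ValueError("do not put data in the endpoint URL query string")
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        endpoint_url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_request(
    "https://collect.example.com/events",  # hypothetical endpoint URL
    {"identity": {"uid": "xyz", "email_hash": "abc123"}},
)
print(request.get_method(), request.full_url)  # POST https://collect.example.com/events
```

The identifiers travel only in `request.data`; nothing user-specific appears in the URL that intermediaries will log.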
CORS Headers
If the browser sends payloads directly to the endpoint (as opposed to the website’s own server proxying the request), the endpoint must return appropriate CORS headers:
- `Access-Control-Allow-Origin`: the specific origin(s) of the website(s), never `*`
- `Access-Control-Allow-Methods`: `POST`
- `Access-Control-Allow-Headers`: `Content-Type`
Wildcard CORS (*) permits any website to send data to your endpoint. Use explicit origin allowlisting.
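Explicit allowlisting usually means comparing the request's `Origin` header against a known set and echoing it back only on a match. A minimal Python sketch, assuming hypothetical origins and a hypothetical helper `cors_headers`:

```python
from typing import Dict, Optional

# Assumption: replace with the origins of your own website(s).
ALLOWED_ORIGINS = {"https://www.example.com", "https://shop.example.com"}


def cors_headers(request_origin: Optional[str]) -> Dict[str, str]:
    """Return CORS response headers for an allowlisted origin, else none."""
    if request_origin not in ALLOWED_ORIGINS:
        return {}  # no CORS headers, so the browser's CORS check fails
    return {
        "Access-Control-Allow-Origin": request_origin,
        "Access-Control-Allow-Methods": "POST",
        "Access-Control-Allow-Headers": "Content-Type",
        "Vary": "Origin",  # response differs per origin, so caches must key on it
    }


print(cors_headers("https://www.example.com")["Access-Control-Allow-Origin"])
print(cors_headers("https://evil.example.net"))  # {}
```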
🚫 What NOT to Log
Server logs are a common source of data leaks. They persist for weeks or months, are often stored without encryption, and are accessible to operations teams who may not be authorized to view personal data.
Never log raw PII in server logs. Email addresses, phone numbers, and names must not appear in application logs, even during debugging. Log the hash if you need to trace a specific identifier.
Never log full cookie values in debug output. A log line containing Cookie: uiaf_uid=f81d4fae-7dec-11d0-a765-00a0c91e6bf6.1647291600 exposes the user’s persistent identifier to anyone with log access. Log the cookie name and a truncated hash of the value if you need to trace cookie behavior.
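A truncated hash gives a stable reference for correlating log lines without writing the value itself. In this Python sketch the helper name `log_ref` and the 8-character length are assumptions.

```python
import hashlib


def log_ref(value: str, length: int = 8) -> str:
    """Return a short hash prefix that stands in for a value in log lines."""
    return hashlib.sha256(value.encode("utf-8")).hexdigest()[:length]


cookie_value = "f81d4fae-7dec-11d0-a765-00a0c91e6bf6.1647291600"
log_line = f"cookie=uiaf_uid value_ref={log_ref(cookie_value)}"
print(log_line)
```

The same cookie value always yields the same reference, so a trace across several log lines still works.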
Never include raw IP addresses in endpoint payloads. The HTTP request already carries the IP in headers. Duplicating it in the JSON body creates an additional copy that travels through the endpoint’s processing pipeline, gets stored in the endpoint’s database, and must be covered by deletion requests. Let the transport layer handle IP — do not promote it to an explicit data field.
Structured logging. If your platform supports structured logging (JSON log entries with fields), use it. Mark fields containing personal data with a pii: true flag or equivalent so log aggregation systems can filter or redact them automatically.
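As a sketch of how such a flag can drive redaction, the Python example below wraps personal fields as `{"value": ..., "pii": True}` and blanks them before the entry is written. This field shape and the `redact` function are one possible convention, not something UIAF prescribes.

```python
import json


# Assumption: the {"value": ..., "pii": True} field shape is one possible convention.
def redact(entry: dict) -> dict:
    """Replace the value of every field flagged as pii with a placeholder."""
    cleaned = {}
    for key, field in entry.items():
        if isinstance(field, dict) and field.get("pii"):
            cleaned[key] = {"value": "[redacted]", "pii": True}
        else:
            cleaned[key] = field
    return cleaned


entry = {
    "event": "form_submit",
    "uid": {"value": "f81d4fae-7dec-11d0-a765-00a0c91e6bf6", "pii": True},
    "status": 200,
}
print(json.dumps(redact(entry)))
# {"event": "form_submit", "uid": {"value": "[redacted]", "pii": true}, "status": 200}
```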