GDPR Compliance: Technical Implementation for Database Systems

Understand GDPR requirements: deletion, portability, consent, agreements, breach notification. Database implementation strategies.

Reading time: 29 min · Author: GeekWorkBench

GDPR has been enforceable for years, yet many teams still struggle with the technical implementation. Part of the problem is that GDPR is principles-based rather than prescriptive—it tells you what to achieve, not exactly how to do it.

This guide cuts through the ambiguity: what GDPR actually requires in practice, how to implement those requirements in your database systems, and where the difficult trade-offs sit.

flowchart LR
    Request[("Deletion<br/>Request")]

    subgraph Verify["Verification Step"]
        V1[("Check legal<br/>basis for retention")]
        V2[("Identify all<br/>data copies")]
        V3[("Notify<br/>processors")]
    end

    subgraph Execute["Deletion Execution"]
        D1[("Soft delete<br/>user record")]
        D2[("Delete from<br/>operational tables")]
        D3[("Delete from<br/>backup (if keyed)")]
        D4[("Destroy<br/>encryption key")]
    end

    subgraph Cascade["Cascade to Related Data"]
        C1[("User sessions<br/>and tokens")]
        C2[("User activity<br/>and logs")]
        C3[("Third-party<br/>processor shares")]
        C4[("Analytics<br/>aggregates")]
    end

    subgraph Audit["Audit Trail"]
        Log[("Deletion<br/>request log")]
        Confirm[("Confirmation<br/>to user")]
        Report[("Compliance<br/>report")]
    end

    Request --> Verify
    Verify --> V1 --> V2 --> V3 --> Execute
    V2 -.->|identify| Cascade
    Execute --> D1 --> D2
    D2 --> D3
    D2 --> D4
    D3 -.->|if cryptographically<br/>deletable| Cascade
    D1 -.->|cascade| Cascade
    D1 --> Audit
    Audit --> Log --> Confirm --> Report

Introduction

GDPR is principles-based, not prescriptive. It tells you what outcomes to achieve — deletion rights, data portability, consent mechanisms, breach notification — but gives you flexibility on how to implement them. That flexibility is both an advantage and a hazard: you can design systems that actually work for users, but you can also build systems that look compliant but fail the first real audit.

This guide covers the technical implementation of each GDPR requirement in database systems: right to deletion (soft delete, hard delete, cryptographic erasure, backup handling), right to data portability (export formats, machine-readable structures), consent tracking (granularity, revocation), and breach notification (detection, escalation, timelines). It also covers the tricky parts: data that lives in backups, third-party processors, and analytics aggregates.

The Core Principles That Drive Technical Decisions

GDPR’s principles that most affect database design:

Lawfulness, fairness, and transparency — Process data only with valid legal basis. Document what you’re doing.

Purpose limitation — Collect data only for specified, explicit purposes. Don’t repurpose data without new consent.

Data minimization — Collect only what you need. Not “we might need this later.”

Accuracy — Keep data accurate and up to date. Allow users to correct errors.

Storage limitation — Don’t keep data longer than necessary. Implement retention limits.

Integrity and confidentiality — Protect data appropriately. Encryption, access controls, audit logging.

Accountability — Demonstrate compliance. Document decisions.

These principles translate directly to database design decisions.

Right to Deletion: The Hard Problem

“Delete their data” sounds straightforward until you implement it.

What Deletion Means

GDPR Article 17 states users have the right to erasure—“the right to be forgotten.” But the right is not absolute. It applies when:

  • Data is no longer necessary for the purpose collected
  • User withdraws consent
  • Data was processed unlawfully
  • User objects and no overriding legitimate interest exists

Data can be retained when:

  • Legal obligation requires retention (tax records, employment law)
  • Establishment, exercise, or defense of legal claims requires retention (litigation holds)
  • Public interest requires retention (regulatory reporting)
  • Archiving for research/historical purposes
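The exemptions above imply a check that runs before any deletion is executed. A minimal sketch follows; the `RetentionHold` and `ErasureDecision` types are hypothetical names introduced for illustration, and a real system would usually attach holds to specific data categories rather than to the whole user.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import List, Optional


@dataclass
class RetentionHold:
    """A reason some of a user's data must be kept despite an erasure request."""
    reason: str               # e.g. "tax_records", "litigation_hold"
    expires: Optional[date]   # None means the hold has no fixed end date


@dataclass
class ErasureDecision:
    erase_now: bool
    blocking_holds: List[str] = field(default_factory=list)


def evaluate_erasure(holds: List[RetentionHold], today: date) -> ErasureDecision:
    """Allow erasure unless at least one hold is still active on `today`."""
    active = [h.reason for h in holds if h.expires is None or h.expires >= today]
    return ErasureDecision(erase_now=not active, blocking_holds=active)
```

Returning the list of blocking holds, rather than a bare boolean, gives the audit trail a concrete reason to record when a request is deferred.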

Database Deletion Implementation

True deletion requires understanding your data architecture:

-- Create soft-delete infrastructure
ALTER TABLE users ADD COLUMN deleted_at TIMESTAMP WITH TIME ZONE;
ALTER TABLE users ADD COLUMN deletion_requested_at TIMESTAMP WITH TIME ZONE;

CREATE INDEX idx_users_deleted ON users(deleted_at) WHERE deleted_at IS NOT NULL;

-- Create deletion request table for audit trail
CREATE TABLE deletion_requests (
    id BIGSERIAL PRIMARY KEY,
    user_id INTEGER NOT NULL REFERENCES users(id),
    requested_at TIMESTAMP WITH TIME ZONE NOT NULL DEFAULT CURRENT_TIMESTAMP,
    confirmed_at TIMESTAMP WITH TIME ZONE,
    cascade_to_subsidiaries BOOLEAN DEFAULT true,
    status VARCHAR(20) DEFAULT 'pending'
);

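-- Function to handle user deletion request
-- request_id is BIGINT to match deletion_requests.id (BIGSERIAL)
CREATE OR REPLACE FUNCTION process_deletion_request(request_id BIGINT)
RETURNS void AS $$
DECLARE
    req RECORD;
BEGIN
    -- Lock the request row so two workers cannot process it concurrently
    SELECT * INTO req FROM deletion_requests WHERE id = request_id FOR UPDATE;

    IF NOT FOUND THEN
        RAISE EXCEPTION 'Deletion request % not found', request_id;
    END IF;

    IF req.status <> 'pending' THEN
        RAISE EXCEPTION 'Request already processed';
    END IF;

    -- Soft-delete the user record and note when deletion was requested
    UPDATE users
    SET deletion_requested_at = req.requested_at,
        deleted_at = CURRENT_TIMESTAMP
    WHERE id = req.user_id;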

    -- Cascade to related tables (handle carefully)
    UPDATE user_sessions SET deleted_at = CURRENT_TIMESTAMP
    WHERE user_id = req.user_id AND deleted_at IS NULL;

    UPDATE user_preferences SET deleted_at = CURRENT_TIMESTAMP
    WHERE user_id = req.user_id AND deleted_at IS NULL;

    UPDATE user_activity SET deleted_at = CURRENT_TIMESTAMP
    WHERE user_id = req.user_id AND deleted_at IS NULL;

    -- Hard delete after retention period (for truly sensitive data)
    -- This would run via scheduled job after retention period expires

    -- Confirm deletion request
    UPDATE deletion_requests
    SET status = 'completed', confirmed_at = CURRENT_TIMESTAMP
    WHERE id = request_id;

    RAISE NOTICE 'Deletion request % processed for user %', request_id, req.user_id;
END;
$$ LANGUAGE plpgsql;
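The function above only soft-deletes; the hard delete it mentions would run later as a scheduled job. A minimal sketch of such a job follows, using SQLite as a stand-in for the production database. The `PURGE_ORDER` list, the grace period, and the assumption that `deleted_at` is stored as an ISO-8601 UTC string are all illustrative choices, not part of the schema above.

```python
import sqlite3
from datetime import datetime, timedelta, timezone

# Child tables to purge. Table names are fixed constants here, never user
# input, which is what makes the f-string in the query below safe.
PURGE_ORDER = ["user_sessions", "user_preferences", "user_activity"]


def purge_soft_deleted(conn: sqlite3.Connection, grace: timedelta, now: datetime) -> int:
    """Hard-delete rows that were soft-deleted more than `grace` ago."""
    cutoff = (now - grace).isoformat()
    removed = 0
    with conn:  # one transaction: commit on success, roll back on error
        for table in PURGE_ORDER:
            cur = conn.execute(
                f"DELETE FROM {table} WHERE deleted_at IS NOT NULL AND deleted_at < ?",
                (cutoff,),
            )
            removed += cur.rowcount
    return removed
```

Passing `now` in as a parameter, instead of reading the clock inside the function, keeps the job deterministic and easy to test.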

The Backup Problem

Here’s where it gets complicated: backups.

GDPR doesn’t explicitly address backups, but regulators expect reasonable practices. If a user exercises their right to deletion, what about backups taken before the deletion request?

  • Option 1: Don’t back up personal data (impractical)
  • Option 2: Encrypt backups with user-specific keys (complex key management)
  • Option 3: Accept that backups contain deleted data with documented retention policies
  • Option 4: Delete from backups when technically feasible (expensive)

Industry consensus has shifted toward accepting that backups may contain deleted data, with mitigations:

from cryptography.fernet import Fernet


class BackupEncryptionManager:
    """Encrypt backup data with per-user keys so destroying a key erases that user's data"""

    def __init__(self, key_store: dict):
        # key_store maps user_id -> Fernet key. In production this would be a
        # KMS or HSM kept outside the backup set; a dict stands in for it here.
        self.key_store = key_store

    def get_or_create_user_key(self, user_id: int) -> bytes:
        """Return the user's key, generating a random one on first use"""
        # Keys must be random, not derived from a master secret: a derived key
        # could be recomputed after "deletion", which would defeat the erasure.
        if user_id not in self.key_store:
            self.key_store[user_id] = Fernet.generate_key()
        return self.key_store[user_id]

    def encrypt_for_backup(self, user_id: int, data: bytes) -> bytes:
        """Encrypt data with user-specific key"""
        return Fernet(self.get_or_create_user_key(user_id)).encrypt(data)

    def delete_user_data(self, user_id: int) -> bool:
        """Destroy the user's key; their ciphertext in backups becomes unreadable"""
        # Returns False if there was no key to destroy
        return self.key_store.pop(user_id, None) is not None

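With this approach, backups are encrypted and each user’s key is held in a key store outside the backup set. When a user requests deletion, their encryption key is destroyed; the backup data still exists but is cryptographically inaccessible. Many auditors accept this as a legitimate form of erasure, provided the key store itself is never captured in the same long-retention backups.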

Data Portability: Exporting User Data

GDPR Article 20 gives users the right to receive their data in “structured, commonly used, machine-readable format” and to transmit that data to another controller.

Implementing Data Export

import json
from datetime import datetime, timezone

# The fetch_* helpers and the convert_to_* functions used below are assumed
# to be application code; each fetch must return only rows owned by user_id.

def export_user_data(user_id: int, db_connection) -> dict:
    """Generate comprehensive user data export"""

    # Gather all user data
    user_data = {
        'export_date': datetime.now(timezone.utc).isoformat(),
        'user_id': str(user_id),
        'profile': fetch_user_profile(user_id, db_connection),
        'activity': fetch_user_activity(user_id, db_connection),
        'preferences': fetch_user_preferences(user_id, db_connection),
        'purchases': fetch_user_purchases(user_id, db_connection),
        'messages': fetch_user_messages(user_id, db_connection),
    }

    # Format as JSON-LD for machine readability
    export = {
        '@context': 'https://schemas.gdpr-export.example.com/v1',
        '@type': 'UserDataExport',
        **user_data
    }

    return export

def format_export_file(user_data: dict, format: str = 'json') -> bytes:
    """Format export in requested format"""
    if format == 'json':
        return json.dumps(user_data, indent=2).encode('utf-8')
    elif format == 'csv':
        return convert_to_csv(user_data)
    elif format == 'xml':
        return convert_to_xml(user_data)
    else:
        raise ValueError(f"Unsupported format: {format}")
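The export code calls `convert_to_csv` without defining it. One possible shape for it is sketched below, under the assumption that a long-format CSV (one row per field) is acceptable; the column names and the `flatten_export` helper are hypothetical.

```python
import csv
import io
import json
from typing import Any, Dict, List, Tuple


def flatten_export(user_data: Dict[str, Any]) -> List[Tuple[str, int, str, str]]:
    """Turn a nested export into (section, row_index, field, value) rows."""
    rows = []
    for section, content in user_data.items():
        # Normalise every section to a list of records.
        records = content if isinstance(content, list) else [content]
        for index, record in enumerate(records):
            if isinstance(record, dict):
                for field_name, value in record.items():
                    # Non-string values are kept as JSON text so nothing is lost.
                    text = value if isinstance(value, str) else json.dumps(value)
                    rows.append((section, index, field_name, text))
            else:
                rows.append((section, index, "value", str(record)))
    return rows


def convert_to_csv(user_data: Dict[str, Any]) -> bytes:
    """Serialise the flattened export as UTF-8 CSV with a header row."""
    buffer = io.StringIO(newline="")
    writer = csv.writer(buffer)
    writer.writerow(["section", "row", "field", "value"])
    writer.writerows(flatten_export(user_data))
    return buffer.getvalue().encode("utf-8")
```

A long format avoids having to invent one wide column layout that fits profiles, orders, and messages at once, at the cost of being less convenient to open in a spreadsheet.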

What to Include in Export

The definition of “all personal data” is debated. Common interpretations include:

  • Account profile information
  • Transaction history
  • Activity logs associated with the account
  • Communications
  • Preferences and settings
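  • Derived data about the user (often included as a courtesy; regulator guidance generally treats inferred and derived data as outside Article 20, although it remains reachable through an Article 15 access request)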

What’s typically excluded:

  • Internal identifiers not meaningful to the user
  • Aggregated/anonymized data (no longer personal)
  • Data from third parties (the user should go to those parties)
  • Legal notes or internal comments
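The exclusions above can be applied as a final filtering pass over the export. A minimal sketch, assuming the fields to drop can be identified by name; the entries in `INTERNAL_FIELDS` are made-up examples, not names from any real schema.

```python
from typing import Any

# Hypothetical field names that should never leave the system in an export.
INTERNAL_FIELDS = {"internal_id", "shard_key", "staff_notes"}


def strip_internal_fields(value: Any) -> Any:
    """Return a copy of a nested export with internal-only fields removed."""
    if isinstance(value, dict):
        return {
            key: strip_internal_fields(inner)
            for key, inner in value.items()
            if key not in INTERNAL_FIELDS
        }
    if isinstance(value, list):
        return [strip_internal_fields(item) for item in value]
    return value  # scalars pass through unchanged
```

Because the function builds new containers instead of mutating its input, the unfiltered data stays available for internal use.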

Consent Tracking

When consent is your legal basis, you must be able to prove when and how consent was given, and exactly what the user agreed to.

-- Consent tracking table
CREATE TABLE consent_records (
    id BIGSERIAL PRIMARY KEY,
    user_id INTEGER NOT NULL,
    consent_type VARCHAR(50) NOT NULL,  -- 'marketing_email', 'data_processing', etc.
    consent_version VARCHAR(20) NOT NULL,  -- Track which version of terms
    granted BOOLEAN NOT NULL,
    granted_at TIMESTAMP WITH TIME ZONE NOT NULL DEFAULT CURRENT_TIMESTAMP,
    granted_ip INET,
    granted_user_agent TEXT,
    withdrawn_at TIMESTAMP WITH TIME ZONE,
    withdrawn_ip INET,
    source VARCHAR(50),  -- 'web_form', 'mobile_app', 'api', 'paper'
    legal_text_hash VARCHAR(64)  -- Hash of terms at time of consent
);

CREATE INDEX idx_consent_user_type ON consent_records(user_id, consent_type);
CREATE INDEX idx_consent_active ON consent_records(user_id) WHERE withdrawn_at IS NULL;

-- Consent change history
CREATE TABLE consent_history (
    id BIGSERIAL PRIMARY KEY,
    consent_record_id BIGINT REFERENCES consent_records(id),  -- BIGINT matches the BIGSERIAL key
    change_type VARCHAR(20) NOT NULL,  -- 'grant', 'withdraw', 'update'
    changed_at TIMESTAMP WITH TIME ZONE DEFAULT CURRENT_TIMESTAMP,
    ip_address INET,
    user_agent TEXT
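);

from functools import wraps

# db, log_consented_access, get_user, and email_service are application-level
# helpers assumed to exist elsewhere in the codebase.

class ConsentRequiredError(Exception):
    """Raised when an operation needs consent the user has not granted."""
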
def requires_consent(consent_type: str):
    """Decorator to enforce consent requirement"""
    def decorator(func):
        @wraps(func)
        def wrapper(user_id: int, *args, **kwargs):
            # Check active consent
            consent = db.query("""
                SELECT granted, consent_version, granted_at
                FROM consent_records
                WHERE user_id = %s
                  AND consent_type = %s
                  AND withdrawn_at IS NULL
                ORDER BY granted_at DESC
                LIMIT 1
            """, (user_id, consent_type))

            if not consent or not consent[0]['granted']:
                raise ConsentRequiredError(
                    f"User {user_id} has not granted {consent_type} consent"
                )

            # Log the access for audit
            log_consented_access(user_id, consent_type, func.__name__)

            return func(user_id, *args, **kwargs)
        return wrapper
    return decorator

@requires_consent('marketing_email')
def send_marketing_email(user_id: int, content: str):
    """Send marketing email only if user consented"""
    user = get_user(user_id)
    return email_service.send(user.email, content)
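To show the grant, check, and withdraw cycle end to end, here is a self-contained sketch using SQLite and a simplified version of the `consent_records` table. The function names are hypothetical, and timestamps are stored as ISO-8601 strings because SQLite has no native timestamp type.

```python
import sqlite3
from datetime import datetime, timezone

# Simplified, SQLite-flavoured version of the consent_records table.
SCHEMA = """
CREATE TABLE consent_records (
    id INTEGER PRIMARY KEY,
    user_id INTEGER NOT NULL,
    consent_type TEXT NOT NULL,
    granted INTEGER NOT NULL,
    granted_at TEXT NOT NULL,
    withdrawn_at TEXT
)
"""


def grant_consent(conn: sqlite3.Connection, user_id: int, consent_type: str, at: datetime) -> None:
    """Insert a new grant; earlier rows are kept as history."""
    conn.execute(
        "INSERT INTO consent_records (user_id, consent_type, granted, granted_at) "
        "VALUES (?, ?, 1, ?)",
        (user_id, consent_type, at.isoformat()),
    )


def withdraw_consent(conn: sqlite3.Connection, user_id: int, consent_type: str, at: datetime) -> int:
    """Stamp every open grant of this type as withdrawn; returns rows changed."""
    cur = conn.execute(
        "UPDATE consent_records SET withdrawn_at = ? "
        "WHERE user_id = ? AND consent_type = ? AND withdrawn_at IS NULL",
        (at.isoformat(), user_id, consent_type),
    )
    return cur.rowcount


def has_active_consent(conn: sqlite3.Connection, user_id: int, consent_type: str) -> bool:
    """True only if the newest non-withdrawn record for this type is a grant."""
    row = conn.execute(
        "SELECT granted FROM consent_records "
        "WHERE user_id = ? AND consent_type = ? AND withdrawn_at IS NULL "
        "ORDER BY granted_at DESC, id DESC LIMIT 1",
        (user_id, consent_type),
    ).fetchone()
    return bool(row and row[0])
```

Withdrawal updates the existing rows rather than deleting them, so the proof that consent once existed survives the withdrawal.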

Data Processing Agreements

When you share data with third parties (processors), GDPR requires written agreements that specify:

  • What data is processed
  • Purpose of processing
  • How long processing continues
  • Security requirements
  • Sub-processor restrictions
  • Audit rights

Tracking Processors

CREATE TABLE data_processors (
    id SERIAL PRIMARY KEY,
    name VARCHAR(255) NOT NULL,
    legal_entity VARCHAR(255),
    dpa_signed_date DATE,
    dpa_expiry_date DATE,
    data_categories TEXT[],  -- ['personal', 'financial', 'health']
    processing_purposes TEXT[],
    retention_period INTERVAL,
    security_certifications TEXT[],  -- ['SOC2', 'ISO27001']
    last_audit_date DATE,
    contact_email VARCHAR(255)
);

CREATE TABLE processor_data_shares (
    id SERIAL PRIMARY KEY,
    processor_id INTEGER REFERENCES data_processors(id),
    table_name VARCHAR(100),
    data_volume_monthly BIGINT,
    share_started DATE,
    share_ended DATE,
    active BOOLEAN DEFAULT true
);

-- Sub-processors require notification or consent
CREATE TABLE sub_processors (
    id SERIAL PRIMARY KEY,
    parent_processor_id INTEGER REFERENCES data_processors(id),
    name VARCHAR(255) NOT NULL,
    purpose TEXT,
    notified_to_users BOOLEAN DEFAULT false,
    consent_required BOOLEAN DEFAULT false
);
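A registry like this is only useful if something reads it. The sketch below flags processors whose agreement is missing, expired, or close to expiry; the `Processor` dataclass mirrors three columns of `data_processors`, and the 60-day warning window is an arbitrary assumption.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import List, Optional


@dataclass
class Processor:
    name: str
    dpa_signed_date: Optional[date]
    dpa_expiry_date: Optional[date]   # None means the DPA has no end date


def processors_needing_attention(
    processors: List[Processor],
    today: date,
    warn_within: timedelta = timedelta(days=60),
) -> List[str]:
    """Return one human-readable finding per processor with a DPA problem."""
    findings = []
    for p in processors:
        if p.dpa_signed_date is None:
            findings.append(f"{p.name}: no signed DPA")
        elif p.dpa_expiry_date is None:
            continue  # signed, open-ended agreement
        elif p.dpa_expiry_date < today:
            findings.append(f"{p.name}: DPA expired")
        elif p.dpa_expiry_date - today <= warn_within:
            findings.append(f"{p.name}: DPA expires soon")
    return findings
```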

Data Breach Notification

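GDPR Article 33 requires notifying the supervisory authority without undue delay and, where feasible, within 72 hours of becoming aware of a breach, unless the breach is unlikely to result in a risk to individuals. Because the clock keeps running through nights and weekends, this is a significant operational challenge.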
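The deadline arithmetic is simple but easy to get wrong with naive local times. A minimal sketch, assuming the moment of awareness is recorded as a timezone-aware datetime; the function names are illustrative.

```python
from datetime import datetime, timedelta, timezone

NOTIFICATION_WINDOW = timedelta(hours=72)


def notification_deadline(aware_at: datetime) -> datetime:
    """Return the UTC deadline for notifying the supervisory authority."""
    if aware_at.tzinfo is None:
        raise ValueError("aware_at must be timezone-aware")
    return aware_at.astimezone(timezone.utc) + NOTIFICATION_WINDOW


def hours_remaining(aware_at: datetime, now: datetime) -> float:
    """Hours left before the deadline; negative once it has passed."""
    return (notification_deadline(aware_at) - now) / timedelta(hours=1)
```

Rejecting naive datetimes up front avoids a silent error of several hours when the incident is logged in one timezone and handled in another.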

Breach Detection and Response

from datetime import datetime, timedelta
from dataclasses import dataclass
from typing import Optional, List

@dataclass
class DataBreach:
    breach_id: str
    detected_at: datetime
    description: str
    data_types: List[str]
    user_count: int
    processor_count: int
    status: str  # 'detected', 'investigating', 'contained', 'resolved'
    notification_required: bool
    supervisory_authority: str
    affected_users_notified: bool

class BreachResponseManager:
    SUPERVISORY_AUTHORITY = 'ico.org.uk'  # or relevant authority

    def __init__(self, db, notification_service):
        self.db = db
        self.notification_service = notification_service

    def handle_breach_detected(self, breach: DataBreach):
        """Initial breach response workflow"""

        # Log breach
        breach_id = self.record_breach(breach)

        # Start 72-hour clock
        notification_deadline = breach.detected_at + timedelta(hours=72)

        # Assess scope
        self.assess_breach_scope(breach_id)

        # Determine notification requirements
        if self.requires_authority_notification(breach):
            self.schedule_authority_notification(breach_id, notification_deadline)

        if self.requires_user_notification(breach):
            self.schedule_user_notification(breach_id)

        # Begin containment
        self.initiate_containment(breach_id)

    def requires_authority_notification(self, breach: DataBreach) -> bool:
        """Decide whether to notify the supervisory authority.

        Article 33 requires notification unless the breach is unlikely to
        result in a risk to individuals; it sets no numeric threshold. The
        checks below are an internal triage heuristic, so a False result
        should still be reviewed by the DPO and the reasoning documented.
        """
        high_risk_categories = ['health', 'financial', 'biometric', 'genetic']

        if any(cat in breach.data_types for cat in high_risk_categories):
            return True

        if breach.user_count > 1000:
            return True

        return False

    def draft_authority_notification(self, breach: DataBreach) -> dict:
        """Draft notification to supervisory authority"""
        return {
            'authority': self.SUPERVISORY_AUTHORITY,
            'breach_id': breach.breach_id,
            'detected_at': breach.detected_at.isoformat(),
            'description': breach.description,
            'data_types': breach.data_types,
            'approximate_users_affected': breach.user_count,
            'likely_consequences': self.assess_consequences(breach),
            'measures_taken': self.get_containment_measures(breach.breach_id),
            'contact_dpo': 'dpo@company.com'
        }

Breach Notification Template

## Data Breach Notification to [Supervisory Authority]

**Breach Reference:** [Internal reference number]

**Date of Detection:** [YYYY-MM-DD HH:MM UTC]

**Date of This Notification:** [YYYY-MM-DD HH:MM UTC]

**Nature of Breach:**
[Description including categories of data and approximate number of
individuals concerned]

**Categories Concerned:**

- [ ] Names
- [ ] Contact details
- [ ] Financial details
- [ ] [Other categories]

**Approximate Number Affected:** [X] individuals

**Likely Consequences:**
[Description of consequences for individuals]

**Measures Taken:**
[Measures already taken or proposed to address the breach]

**Contact:** [Data Protection Officer contact details]

Right to Erasure vs Backup Retention Trade-offs

| Dimension | Immediate Hard Delete | Soft Delete + Scheduled Purge | Cryptographic Deletion (Key Destroy) |
|---|---|---|---|
| GDPR compliance speed | Fastest: data gone immediately | Slower: depends on purge schedule | Fast: key destroyed, data unreadable |
| Backup implications | Backups may still contain data | Backups contain soft-deleted data | Backups encrypted, unreadable without key |
| Recoverability | None after deletion | Recoverable until purge | None: key destruction is irreversible |
| Operational complexity | High: must track all copies | Medium | Medium |
| Audit trail requirement | Deletion must be logged | Deletion request logged, purge logged | Key destruction logged and auditable |
| Best for | Low data volume, low retention | Most applications | Encrypted storage systems |

Production Failure Scenarios

| Failure | Impact | Mitigation |
|---|---|---|
| Deletion request processed before backup retention expires | User data in old backups technically not deleted | Accept with documented policy, or re-encrypt backups with user-specific keys |
| Third-party processor data share not tracked | Unauthorized retention of user data | Maintain processor registry, audit shares quarterly |
| Consent withdrawal not propagated to all systems | Processing continues without legal basis | Event-driven consent propagation, test with end-to-end audit |
| Data portability export containing other users’ data | Cross-user data leak | Strict isolation in export query, review before delivery |
| Breach notification deadline missed (72h) | Regulatory fine of up to €10 million or 2% of global annual turnover, whichever is higher | Automated breach detection, pre-drafted notification templates |

Capacity Estimation: Deletion Job Sizing and Backup Rehydration Time

GDPR right-to-erasure operations have capacity implications that must be planned.

Deletion job sizing formula:

total_records_to_delete = users_to_delete × (1 + related_records_per_user)
total_deletion_time_hours = total_records_to_delete / (records_deleted_per_second × 3600)

For an e-commerce platform deleting 1M user records where each record has 10 related tables with cascading deletes:

  • Deleting 1 user record cascades to ~500 related records across orders, addresses, preferences, sessions
  • At 100 user deletions/second: 100 × 500 = 50,000 related records/second
  • For 1M users: 1M user records × ~500 related records each = ~500M total records
  • Practical deletion rate: 10-50K records/second depending on indexes and cascade depth
  • Total time for 500M records at 30K records/second: ~4.6 hours

GDPR gives controllers one month to respond to an erasure request (extendable for complex cases), so a job of this size is manageable. But if the database has poorly indexed foreign keys, cascading deletes can lock tables for minutes, making batch deletion impractical. Always test deletion performance on a production-size dataset before deployment.
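The sizing arithmetic above can be captured in a small helper for trying different assumptions. This is a sketch only: the parameter names are illustrative, and the throughput figure is an input you would measure, not something the function can predict.

```python
def deletion_time_hours(
    users: int,
    related_records_per_user: int,
    records_per_second: float,
) -> float:
    """Estimate wall-clock hours to delete users plus their cascaded records."""
    # Each user contributes their own row plus the related rows.
    total_records = users * (1 + related_records_per_user)
    return total_records / records_per_second / 3600


# The e-commerce example: 1M users, ~500 related records each, 30K records/s.
estimate = deletion_time_hours(1_000_000, 500, 30_000)
```

With the example's inputs the estimate comes out a little over four and a half hours, in line with the figure quoted above.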

Backup rehydration time formula:

rehydration_time_hours = archived_data_gb / restore_throughput_gb_per_hour
restore_throughput_mb_per_s = (restore_bandwidth_mbps / 8) × compression_factor
restore_throughput_gb_per_hour = restore_throughput_mb_per_s × 3600 / 1000

Some interpretations of GDPR require deletion from backups. If in-place deletion from compressed backups is impossible, the only approach is to restore the backup, delete the records, re-archive to a new backup, and re-upload it. For 10TB of compressed backups restored at 100MB/s effective throughput: 10TB / (100MB/s) = 100,000 seconds ≈ 28 hours per backup. With 30-day retention requiring multiple backups to be processed, full backup rehydration for erasure can take weeks.
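The rehydration estimate can be reproduced with a few lines. This sketch assumes decimal units (1 TB = 1,000,000 MB) and that backups are processed one after another; `backups_to_process` is an illustrative parameter.

```python
def rehydration_hours(
    archive_tb: float,
    throughput_mb_per_s: float,
    backups_to_process: int = 1,
) -> float:
    """Hours needed to restore `backups_to_process` backups sequentially."""
    archive_mb = archive_tb * 1_000_000          # decimal TB -> MB
    seconds_per_backup = archive_mb / throughput_mb_per_s
    return seconds_per_backup * backups_to_process / 3600


single = rehydration_hours(10, 100)       # one 10 TB backup at 100 MB/s
monthly = rehydration_hours(10, 100, 30)  # thirty daily backups
```

This counts restore time only; the delete, re-archive, and re-upload steps add to it, which is why the end-to-end figure stretches into weeks.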

Practical approach: Do not store GDPR-covered data in plaintext in long-retention compressed backups. Instead, use key-per-user encryption, where each user’s PII is encrypted with their own data encryption key (DEK). GDPR deletion then means destroying the user’s DEK. The encrypted backup remains but is unreadable without the destroyed key. This reduces GDPR deletion from a weeks-long backup rehydration to a key destruction operation.

Quick Recap Checklist

Use this checklist when reviewing GDPR compliance implementation:

  • Data inventory mapping all personal data fields to purpose and legal basis
  • Deletion request workflow implemented with cascade to all data copies
  • Cryptographic erasure implemented for backup retention compliance
  • Consent records schema with version tracking and immutable history
  • Data processing agreements in place with all third-party processors
  • Breach detection and 72-hour notification workflow documented
  • Data portability export pipeline tested and functional
  • Retention policies enforced via automated jobs
  • Data classification framework applied to all database tables
  • Annual GDPR compliance audit scheduled and documented

Technical Architecture for GDPR

GDPR compliance isn’t a feature—it’s an architectural concern.

Data Classification

from enum import Enum

class DataCategory(Enum):
    PUBLIC = "public"           # No restrictions
    INTERNAL = "internal"       # Internal use only
    CONFIDENTIAL = "confidential"  # Access controlled
    RESTRICTED = "restricted"   # Maximum protection

# Map database columns to categories
COLUMN_CLASSIFICATION = {
    'users.email': DataCategory.CONFIDENTIAL,
    'users.ssn': DataCategory.RESTRICTED,
    'users.phone': DataCategory.CONFIDENTIAL,
    'users.address': DataCategory.CONFIDENTIAL,
    'transactions.amount': DataCategory.CONFIDENTIAL,
    'transactions.card_last_four': DataCategory.RESTRICTED,
    'activity_logs.ip_address': DataCategory.CONFIDENTIAL,
}

Retention Enforcement

-- Create retention policies
CREATE TABLE retention_policies (
    id SERIAL PRIMARY KEY,
    table_name VARCHAR(100) NOT NULL,
    column_name VARCHAR(100),  -- NULL means use table-level retention
    retention_period INTERVAL NOT NULL,
    legal_basis VARCHAR(255),  -- Why this retention period
    review_date DATE,
    active BOOLEAN DEFAULT true
);

INSERT INTO retention_policies (table_name, column_name, retention_period, legal_basis)
VALUES
    ('user_sessions', NULL, INTERVAL '30 days', 'Session management'),
    ('user_activity', NULL, INTERVAL '1 year', 'Analytics (anonymized after 90 days)'),
    ('support_tickets', NULL, INTERVAL '3 years', 'Customer service records'),
    ('financial_transactions', NULL, INTERVAL '7 years', 'Tax and legal compliance'),
    ('consent_records', NULL, INTERVAL '10 years', 'Proof of consent'),
    ('audit_logs', NULL, INTERVAL '5 years', 'Security and compliance');
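The policy table records intent; a scheduled job has to enforce it. A minimal sketch follows, using SQLite as a stand-in. It assumes each governed table has a `created_at` column holding an ISO-8601 UTC string and that retention is stored as whole days (`retention_days`), since SQLite has no INTERVAL type; `ALLOWED_TABLES` is a hypothetical allowlist.

```python
import sqlite3
from datetime import datetime, timedelta, timezone
from typing import Dict

# Only tables listed here may be purged. The allowlist is what makes it safe
# to interpolate a table name read from the policy table into SQL.
ALLOWED_TABLES = {"user_sessions", "user_activity", "support_tickets"}


def enforce_retention(conn: sqlite3.Connection, now: datetime) -> Dict[str, int]:
    """Delete rows older than each active policy allows; returns rows per table."""
    policies = conn.execute(
        "SELECT table_name, retention_days FROM retention_policies WHERE active = 1"
    ).fetchall()
    purged: Dict[str, int] = {}
    with conn:  # all purges commit together or not at all
        for table_name, retention_days in policies:
            if table_name not in ALLOWED_TABLES:
                raise ValueError(f"Refusing to purge unlisted table: {table_name}")
            cutoff = (now - timedelta(days=retention_days)).isoformat()
            cur = conn.execute(
                f"DELETE FROM {table_name} WHERE created_at < ?", (cutoff,)
            )
            purged[table_name] = cur.rowcount
    return purged
```

Returning per-table counts gives the job something concrete to write to the audit log on every run.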

Interview Questions

1. A user exercises their right to erasure. Their data spans 50 tables with complex foreign key relationships. How do you implement this?

Two approaches: hard delete with cascading erasure, or cryptographic erasure. Hard delete: identify all tables containing the user's data, order deletions by dependency (child tables first), and execute in batches to avoid long locks. PostgreSQL's DELETE has no LIMIT clause, so batch through a subquery, for example DELETE FROM orders WHERE id IN (SELECT id FROM orders WHERE customer_id = $1 LIMIT 1000), repeating until no rows are affected and committing between batches. This takes seconds to minutes for typical data volumes. Cryptographic erasure: encrypt each user's PII with a per-user key and store the key in a separate key store. GDPR deletion then means deleting the user's encryption key. The data remains in the database but is unreadable. This approach avoids complex cascading deletes and is widely treated as acceptable, provided the key destruction is verifiable and logged.
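Batched deletion can be sketched as follows, with SQLite standing in for the production database. The `orders` table and its `id` and `customer_id` columns are assumed for illustration; the subquery with LIMIT is the portable way to cap each batch.

```python
import sqlite3


def delete_in_batches(conn: sqlite3.Connection, customer_id: int, batch_size: int = 1000) -> int:
    """Delete a customer's orders in bounded batches; returns total rows deleted."""
    total = 0
    while True:
        cur = conn.execute(
            "DELETE FROM orders WHERE id IN ("
            "  SELECT id FROM orders WHERE customer_id = ? LIMIT ?"
            ")",
            (customer_id, batch_size),
        )
        conn.commit()  # release locks between batches
        if cur.rowcount == 0:
            break
        total += cur.rowcount
    return total
```

Committing after every batch keeps each transaction short, so other writers are blocked only briefly; the trade-off is that the deletion is no longer atomic and must be safe to resume after a failure.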

2. Your company has 5-year backup retention but GDPR requires deletion of a user's data. Are you required to delete from backups?

Legal interpretation varies. Some GDPR authorities hold that backups are still "data in storage" and must be addressed. The practical resolution: use cryptographic erasure (encrypt with per-user keys) so old backups are unreadable without the destroyed key. If you cannot use cryptographic erasure, document the technical limitation and implement a scheduled re-encryption of backups without the deleted user's key — acceptable to most auditors if the timeline is reasonable and documented.

3. A user requests data portability. Their data exists in 12 different tables spanning orders, preferences, and activity logs. How do you generate a comprehensive export?

Data portability under GDPR Article 20 requires a "commonly used, machine-readable format" — typically JSON or CSV. Build a data export pipeline that: queries each relevant table filtering by user_id, joins related data into a structured JSON format with clear schema, includes metadata (export date, source tables, schema version), and generates a downloadable archive. Complexity arises with historical data that has been anonymized or aggregated — you must disclose what data is available versus what has been deleted or anonymized per retention policy. Export time scales with data volume: a typical user with 5 years of history might generate a 50-200MB export. Generate asynchronously (not in real-time) and email a download link.

4. Your company is acquired mid-operation. How do you handle the acquired company's GDPR obligations for data that came from EU users?

GDPR obligations transfer with the data. In most acquisitions, the acquiring company inherits the data processing obligations of the acquired company. Technical steps: treat the acquired company as a separate data controller until its systems are integrated, map all EU user data in the combined systems, obtain fresh consent or establish another valid legal basis (such as legitimate interest) for any new processing purposes, and ensure the acquisition contract includes data processing agreements that assign GDPR liability correctly. If both companies had EU user bases, the combined entity may need to collect consent again where its processing scope expands beyond what users originally agreed to.

5. A user claims they never gave consent for marketing emails, but your consent_records table shows they did. How do you investigate and resolve?

Check the consent record: granted timestamp, consent_version (which terms were active), legal_text_hash (hash of the exact terms presented), source (web form, mobile app, paper), granted IP and user agent. The legal_text_hash ties the record to the exact text shown at the time of consent: recompute the hash of the archived text for that consent_version and compare. If the hashes match, you can show the user precisely what was presented. If the archived text differs materially from what the user says they agreed to, or the record shows consent was obtained improperly, correct the record and honor the user's objection. Resolution: provide the user with a copy of the terms they consented to and the recorded time and source. Whatever the outcome of the investigation, treat the complaint as a withdrawal of consent and stop marketing emails, since consent can be withdrawn at any time.
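Verifying a stored `legal_text_hash` against archived terms might look like the sketch below. It assumes the hash is SHA-256 over the UTF-8 text after a simple whitespace normalisation; that normalisation is an assumption made here, and the check only works if the same rule was applied when the hash was first recorded.

```python
import hashlib
import hmac


def legal_text_hash(text: str) -> str:
    """SHA-256 hex digest of the terms after trimming trailing whitespace."""
    normalised = "\n".join(line.rstrip() for line in text.strip().splitlines())
    return hashlib.sha256(normalised.encode("utf-8")).hexdigest()


def matches_archived_terms(stored_hash: str, archived_text: str) -> bool:
    """True if the archived terms hash to the value stored with the consent."""
    return hmac.compare_digest(stored_hash, legal_text_hash(archived_text))
```

A SHA-256 hex digest is 64 characters, which matches the `VARCHAR(64)` width of the `legal_text_hash` column.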

6. Your organization discovers a data breach affecting 50,000 EU user accounts. Walk through the technical steps of GDPR breach response.

Immediate: isolate the breach source to stop ongoing data loss. Document: what data was accessed, for how long, and what the attack vector was. Classification: categorize the data types (names, emails, financial, health). Assessment: notification to the supervisory authority is required unless the breach is unlikely to result in a risk to individuals. GDPR sets no numeric threshold, but sensitive categories (health, financial, biometric) or a large number of affected users make risk very likely, and with 50,000 accounts you should assume notification is required. Notification deadline: 72 hours from awareness. If you cannot provide full details within 72 hours, provide an initial notification and follow up in phases. User notification: required if the breach is likely to result in a high risk to rights and freedoms. Technical response: revoke compromised credentials, rotate affected encryption keys, patch the vulnerability, and verify no ongoing access. Post-incident: document lessons learned, update security controls, and record the breach and your reasoning in the internal breach register even where supervisory authority notification was not required.

7. How do you implement data minimization when your analytics team needs aggregate user behavior data?

Data minimization means collecting only what is necessary for the specified purpose. For analytics: use aggregation and anonymization so individual users cannot be identified. Instead of storing "user X visited page Y at time Z," store "page Y received 100 visits from users in segment A." Aggregation thresholds: ensure groups are large enough that individuals cannot be identified (typically minimum 10-50 users per group). For session analytics: store session duration, pages visited, and conversion events, not the exact sequence of actions tied to user_id. Implement data retention: after 90 days, aggregate to monthly statistics and delete individual session records. The test: if you cannot identify an individual from the data (even with additional information), the data is properly minimized.
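The aggregation threshold can be enforced at the point where events are rolled up. A minimal sketch, assuming events arrive as `(user_id, segment, page)` tuples; the tuple layout and the default threshold of 10 are illustrative.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

Event = Tuple[int, str, str]  # (user_id, segment, page)


def aggregate_page_visits(
    events: Iterable[Event], min_group_size: int = 10
) -> Dict[Tuple[str, str], int]:
    """Count visits per (segment, page), dropping groups with too few users."""
    users_per_group = defaultdict(set)
    visits_per_group = defaultdict(int)
    for user_id, segment, page in events:
        key = (segment, page)
        users_per_group[key].add(user_id)
        visits_per_group[key] += 1
    # Suppress any group small enough that a count could point to a person.
    return {
        key: count
        for key, count in visits_per_group.items()
        if len(users_per_group[key]) >= min_group_size
    }
```

The threshold is applied to distinct users, not visits, so a single very active user cannot push a small group over the limit. User IDs are used only inside the function and never appear in the output.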

8. A user exercises their right to erasure, but their data has been pseudonymized and is used for ML model training. Are you required to act?

Pseudonymized data is still personal data under GDPR because it can be re-identified using the token mapping stored separately. True anonymization (irreversible) would not require action, but pseudonymization does. For ML training: the model was trained on personal data, so when a user requests erasure, the impact on the trained model must be addressed. Options: retrain the model without the deleted user's data, apply machine unlearning techniques, or at minimum exclude the user from future training batches. GDPR Recital 26 and Article 29 Working Party guidance both treat pseudonymized data as personal data requiring protection, not as a way to bypass erasure rights. Document your approach in the DPIA and privacy notice.

9. A user exercises their right to erasure, but their data has been anonymized and aggregated into statistical reports. Are you required to act?

Anonymized data is no longer personal data under GDPR: anonymous data cannot identify a natural person, so erasure rights do not apply. However, the bar for true anonymization is high: the data must not be re-identifiable even with additional information. If the process is reversible (pseudonymization rather than true anonymization), erasure rights still apply. Assessment: evaluate the anonymization technique against possible re-identification attacks (uniqueness, linkage attacks, background knowledge). If there is any reasonable means of re-identification, treat the data as personal data and honor erasure requests. Document your anonymization methodology and its irreversibility.

10. Your company processes personal data of EU residents from multiple EU member states. Which supervisory authority has jurisdiction?

For cross-border processing, the lead supervisory authority is the one in the EU member state where your main establishment (or single establishment) is located. This is the "one-stop-shop" mechanism. If you have no establishment in the EU, the one-stop-shop does not apply: the supervisory authority of each member state where affected data subjects are located has jurisdiction over processing affecting its residents, and you will generally need to appoint an EU representative under Article 27. Practical implication: you may need to comply with multiple national implementations of GDPR (which vary by country), not just the EU-wide framework. Identify which supervisory authorities have jurisdiction over your specific processing activities and check the national requirements of each.

11. How do you implement data portability for a complex data model where user data spans dozens of tables with many-to-many relationships?

Build a recursive data export that walks the relationship graph starting from the user entity. For each table linked to the user (directly or via many-to-many join tables), export the related records. Structure the export as a JSON document with clear relationships: {"user": {...}, "addresses": [...], "orders": [...], "order_items": [...], "preferences": {...}}. For many-to-many relationships, include the join table records as well. Use a deterministic entity ID in the export (not the production database ID) to avoid leaking internal structure. The complexity scales with relationship depth, so cap the traversal: export records directly linked to the user by default, follow one further level where the child records clearly belong to the user (for example, order_items under orders), and document what each level contains.
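The traversal described above might look roughly like the following, using `sqlite3` and a hand-maintained relationship map. The table names, the `CHILDREN` map, the `id` / `*_id` column convention, and the salted-hash export ID are all assumptions for illustration. A join table for a many-to-many relationship could be listed in the map like any other child table.

```python
import hashlib
import json
import sqlite3

# Hypothetical relationship map: table -> list of (child_table, child_fk_column).
CHILDREN = {
    "users": [("orders", "user_id")],
    "orders": [("order_items", "order_id")],
}
EXPORT_SALT = "rotate-me"  # assumed secret so export IDs do not reveal database IDs


def export_id(table, row_id):
    """Deterministic, non-reversible ID for a row in the export."""
    digest = hashlib.sha256(f"{EXPORT_SALT}:{table}:{row_id}".encode()).hexdigest()
    return digest[:12]


def export_rows(conn, table, fk_column, fk_value, depth):
    """Return rows of `table` matching the key, nesting child rows up to `depth` levels."""
    # Table and column names come from the fixed map above, never from user input.
    cur = conn.execute(f"SELECT * FROM {table} WHERE {fk_column} = ?", (fk_value,))
    columns = [c[0] for c in cur.description]
    exported = []
    for raw in cur.fetchall():
        row = dict(zip(columns, raw))
        internal_id = row.pop("id")
        # Drop internal foreign keys; nesting already expresses the relationship.
        record = {k: v for k, v in row.items() if not k.endswith("_id")}
        record["export_id"] = export_id(table, internal_id)
        if depth > 0:
            for child, child_fk in CHILDREN.get(table, []):
                record[child] = export_rows(conn, child, child_fk, internal_id, depth - 1)
        exported.append(record)
    return exported


conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT);
CREATE TABLE orders (id INTEGER PRIMARY KEY, user_id INTEGER, total REAL);
CREATE TABLE order_items (id INTEGER PRIMARY KEY, order_id INTEGER, sku TEXT);
INSERT INTO users VALUES (1, 'a@example.com');
INSERT INTO orders VALUES (10, 1, 42.5);
INSERT INTO order_items VALUES (100, 10, 'SKU-1');
""")
export = export_rows(conn, "users", "id", 1, depth=2)[0]
print(json.dumps(export, indent=2))
```

The `depth` argument is what keeps the export bounded: with `depth=0` only the user row is returned, and each extra level follows one more hop in the map.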

12. Your company uses a cloud-based CRM that processes EU personal data. What are your GDPR obligations as a data controller versus the CRM vendor?

You (the controller) remain responsible for GDPR compliance regardless of where data is processed. Your obligations: conduct due diligence to verify the CRM vendor provides adequate security, sign a data processing agreement (DPA) under Article 28 that covers all required terms (processing purpose, security measures, sub-processor restrictions), maintain your own records of processing activities, and respond to data subject requests even when the processor holds the data. The CRM vendor (processor) must only process data per your instructions, maintain appropriate security, assist with data subject access requests (DSARs), and notify you of breaches. Both parties can face liability: controllers for inadequate oversight, processors for security failures.

13. A DPO (Data Protection Officer) is required for your organization. What are the key responsibilities and how do you ensure the role is effective?

The DPO must be independent and must not be instructed by the organization on how to carry out their duties. Key responsibilities: monitor GDPR compliance, advise on Data Protection Impact Assessments (DPIAs), cooperate with supervisory authorities, and serve as the contact point for data subjects. The DPO does not make compliance decisions (that remains with the controller) but must be consulted on privacy matters. Ensure the DPO has sufficient resources (budget, staff, access to systems) and that their advice is documented and considered. Under Article 37, a DPO is required for public authorities and for organizations whose core activities involve large-scale regular and systematic monitoring or large-scale processing of special category data.

14. How do you handle GDPR's requirements when personal data is used for machine learning model training?

ML training uses personal data for a purpose (model improvement) that may differ from the original collection purpose. Requirements: purpose compatibility assessment (if ML training is not compatible with the original purpose, obtain fresh consent or establish a different legal basis); data minimization (train with the minimum data necessary, not full records); de-identification before training (remove direct identifiers and pseudonymous identifiers, bearing in mind that this alone may not amount to anonymization); documentation (record the training data used, the purpose, and the legal basis); erasure rights (when a user requests deletion, the impact on trained models must be addressed, for example by retraining without the deleted user's data). Several regulators, including the European Data Protection Board, the UK ICO, and France's CNIL, have published guidance on AI and data protection; check what applies in your jurisdiction.

15. Your company is launching a new product feature that collects user location data. Walk through the GDPR compliance steps before launch.

Pre-launch GDPR steps: conduct a Data Protection Impact Assessment (DPIA), since systematic or large-scale location tracking is likely to result in high risk (location data is not an Article 9 special category, but it can reveal sensitive facts such as a home address or medical visits); determine the legal basis (consent is typical for optional features); design the data collection to be minimal, collecting only the location data necessary for the feature; implement consent capture that is specific, informed, and freely given; create the data export mechanism (data portability) before launch; implement the erasure mechanism before launch; update your privacy notice to reflect the new collection; update records of processing activities; ensure processors handling location data have appropriate DPAs. Do not launch until the DPIA mitigation measures are implemented.

16. A user requests deletion of their data, but a regulatory requirement (tax law) mandates retention for 7 years. How do you reconcile these?

A legal obligation to retain data overrides a deletion request for the data that obligation covers (Article 17(3)(b)). Document the conflict: the deletion request is recorded, the legal obligation is identified and cited, and the user is informed that their data cannot be deleted yet because of that obligation. The data remains under restricted access: only the minimum necessary for the legal obligation is retained, and it is not used for any other purpose. Everything outside the scope of the obligation is deleted as usual. When the retention period expires, deletion proceeds. If the regulatory period is indefinite or very long (e.g., anti-money laundering), consider anonymization as an alternative, because anonymous data is no longer personal and deletion rights do not apply.
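The decision above reduces to a date comparison once the retention period is known. The sketch below assumes a flat 7-year period counted from the record date, as in the scenario; real statutes often count from the end of a fiscal year, so treat the start date as a parameter to confirm with counsel. Function and field names are hypothetical.

```python
from datetime import date

RETENTION_YEARS = 7  # assumed statutory period from the scenario


def handle_erasure_request(record_date, today, retention_years=RETENTION_YEARS):
    """Decide whether a record can be deleted now or must be restricted until release."""
    target_year = record_date.year + retention_years
    try:
        release = record_date.replace(year=target_year)
    except ValueError:
        # 29 February mapped onto a non-leap year: fall back to 28 February.
        release = record_date.replace(year=target_year, day=28)

    action = "delete" if today >= release else "restrict"
    return {"action": action, "release_date": release}


decision = handle_erasure_request(date(2020, 3, 15), today=date(2024, 1, 1))
print(decision["action"], decision["release_date"])  # restrict 2027-03-15
```

Storing the computed `release_date` next to the logged request gives a purge job a concrete date to act on when the hold expires.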

17. How do you implement privacy by design when building a new database schema for a user-facing application?

Privacy by design principles applied to schema design: collect only necessary fields, questioning whether each field is essential for the stated purpose; separate identifying information from non-identifying data by storing PII in separate tables with foreign key relationships so PII can be deleted independently; design for erasure by avoiding deep coupling of PII across many tables, using soft-delete to mark records for erasure and hard-delete in batch; minimize retention with deleted_at columns and automated purge jobs; encrypt sensitive fields at the application layer; enforce access controls at the schema level, with role-based access so only the services that need PII can read the PII tables.
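To make the separation of PII from non-identifying data concrete, here is a small sketch using `sqlite3`. The table and column names are hypothetical, and a production schema would add field encryption and role grants that SQLite does not model.

```python
import sqlite3

SCHEMA = """
CREATE TABLE accounts (
    id INTEGER PRIMARY KEY,
    created_at TEXT NOT NULL,
    deleted_at TEXT              -- soft-delete marker; NULL means active
);
CREATE TABLE account_pii (       -- all identifying fields live here only
    account_id INTEGER PRIMARY KEY REFERENCES accounts(id),
    email TEXT NOT NULL,
    full_name TEXT
);
CREATE TABLE orders (            -- non-identifying business data
    id INTEGER PRIMARY KEY,
    account_id INTEGER NOT NULL REFERENCES accounts(id),
    total_cents INTEGER NOT NULL
);
"""


def purge_soft_deleted(conn, cutoff_iso):
    """Hard-delete PII for accounts soft-deleted on or before the cutoff timestamp."""
    cur = conn.execute(
        "DELETE FROM account_pii WHERE account_id IN "
        "(SELECT id FROM accounts WHERE deleted_at IS NOT NULL AND deleted_at <= ?)",
        (cutoff_iso,),
    )
    conn.commit()
    return cur.rowcount


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
conn.execute("INSERT INTO accounts VALUES (1, '2024-01-01T00:00:00', NULL)")
conn.execute("INSERT INTO account_pii VALUES (1, 'a@example.com', 'Ada Example')")
conn.execute("INSERT INTO orders VALUES (10, 1, 4250)")

# Erasure request arrives: mark the account, then let the batch job purge PII.
conn.execute("UPDATE accounts SET deleted_at = '2024-06-01T00:00:00' WHERE id = 1")
purged = purge_soft_deleted(conn, "2024-06-30T00:00:00")
print(purged)  # 1
```

Because the order row holds no identifying fields, it can stay for accounting while the PII row is removed. Timestamps are compared as ISO 8601 strings here, which only works if every value uses the same format and timezone.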

18. Your company experiences a ransomware attack. Ransomware operators demand payment in exchange for not publishing exfiltrated personal data. What are your GDPR breach notification obligations?

GDPR breach obligations apply whether ransomware exfiltrated personal data or only encrypted it, because loss of availability is also a personal data breach. Assessment: did the breach involve personal data? If yes, notify the supervisory authority unless the breach is unlikely to result in a risk to individuals' rights and freedoms (Article 33), and document the breach either way. If the ransomware encrypted data without exfiltration, assess whether you can actually confirm that no data was accessed. In most ransomware cases exfiltration cannot be ruled out, so treat it as a confirmed breach. Notify the supervisory authority within 72 hours of becoming aware of the breach: describe the nature of the breach, the categories and approximate number of data subjects affected, the likely consequences, and the measures taken. If the breach is likely to result in a high risk to the affected individuals, which a threat of publication usually implies, also notify them without undue delay (Article 34).
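A small sketch of the two mechanical parts of the notification step: the 72-hour deadline and a completeness check on the report contents listed above. The class and field names are hypothetical; the risk assessment itself is a judgment call that code cannot make.

```python
from dataclasses import dataclass, fields
from datetime import datetime, timedelta, timezone

NOTIFICATION_WINDOW = timedelta(hours=72)


@dataclass
class BreachReport:
    """Contents of a supervisory authority notification (hypothetical field names)."""
    nature: str = ""
    data_subject_categories: str = ""
    approx_subjects_affected: int = 0
    likely_consequences: str = ""
    measures_taken: str = ""

    def missing_fields(self):
        # Empty strings and zero counts are treated as "not yet filled in".
        return [f.name for f in fields(self) if not getattr(self, f.name)]


def authority_deadline(aware_at):
    """Deadline counted from the moment the controller became aware of the breach."""
    if aware_at.tzinfo is None:
        raise ValueError("use a timezone-aware timestamp")
    return aware_at + NOTIFICATION_WINDOW


aware_at = datetime(2024, 5, 3, 14, 30, tzinfo=timezone.utc)
print(authority_deadline(aware_at).isoformat())  # 2024-05-06T14:30:00+00:00

draft = BreachReport(nature="ransomware with suspected exfiltration",
                     approx_subjects_affected=12000)
print(draft.missing_fields())
# ['data_subject_categories', 'likely_consequences', 'measures_taken']
```

Recording the "aware" timestamp in UTC at the moment the incident is confirmed avoids arguments later about when the clock started.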

19. How do you handle GDPR compliance for a company acquisition where the acquired company's EU user database is merged into the acquirer's systems?

The acquisition triggers several GDPR obligations: the acquirer inherits the acquired company's role as data controller for the merged EU user data; conduct a DPIA covering the merged data processing; obtain fresh consent or establish another legal basis, such as legitimate interest, for the new combined processing purposes (the acquirer's purposes may differ from the original purposes); update privacy notices to reflect the new controller and combined processing; ensure all processor agreements are assigned or renewed with the acquirer as the new controller; audit the acquirer's security posture for the acquired data; and apply data minimization, since the acquired dataset may contain more than the acquirer needs and should be reviewed and reduced where possible.

20. Your company uses third-party analytics that receives personal data in the clear via JavaScript tracking. What are the compliance risks and mitigations?

Third-party JavaScript analytics creates significant compliance risk: personal data is transmitted to a third party (processor) without adequate safeguards. Risks: the third party may not have a compliant DPA, data may be stored in jurisdictions with inadequate protection, and the third party's security posture is outside your control. Mitigations: implement a compliant DPA with the analytics vendor (verify they are certified — ISO 27001, SOC 2); use server-side analytics or tag management systems that proxy data to the third party; anonymize data before transmission (IP anonymization, data minimization); configure the analytics to not capture PII by design; and consider switching to privacy-preserving alternatives (Plausible, Matomo self-hosted).
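If events are proxied through your own server before reaching the analytics vendor, the scrubbing step mentioned above could look like this sketch. The `PII_FIELDS` set and the event shape are assumptions; the prefix lengths (/24 for IPv4, /48 for IPv6) are a common convention rather than a legal standard, and a truncated address can still be personal data when combined with other fields.

```python
import ipaddress

PII_FIELDS = {"email", "user_id", "name"}  # hypothetical fields never sent onward


def anonymize_ip(ip, v4_prefix=24, v6_prefix=48):
    """Zero the host portion of an IP address before it leaves your infrastructure."""
    addr = ipaddress.ip_address(ip)
    prefix = v4_prefix if addr.version == 4 else v6_prefix
    network = ipaddress.ip_network(f"{addr}/{prefix}", strict=False)
    return str(network.network_address)


def scrub_event(event):
    """Return a copy of the event with PII fields dropped and the IP truncated."""
    clean = {k: v for k, v in event.items() if k not in PII_FIELDS}
    if "ip" in clean:
        clean["ip"] = anonymize_ip(clean["ip"])
    return clean


event = {"page": "/pricing", "ip": "203.0.113.77", "email": "a@example.com"}
print(scrub_event(event))  # {'page': '/pricing', 'ip': '203.0.113.0'}
print(anonymize_ip("2001:db8:abcd:12::1"))  # 2001:db8:abcd::
```

Doing this server-side means the vendor's script never sees the raw values, which is the main advantage over relying on the vendor's own anonymization setting.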


Conclusion

GDPR compliance requires serious architectural thinking. The right to deletion means designing for deletion from day one—understanding your data flows, retention requirements, and the trade-offs around backups. Data portability means keeping exportable records. Consent tracking means proving consent with immutable audit trails.

The technical implementation isn’t optional—it’s how you demonstrate accountability under Article 5(2).

For related topics on protecting personal data, see our data masking guide for non-production environments, audit logging for tracking data access, and encryption at rest for protecting stored personal data.
