GDPR Compliance: Technical Implementation for Database Systems

Understand GDPR requirements: deletion, portability, consent, agreements, breach notification. Database implementation strategies.


GDPR has been enforceable for years, yet many teams still struggle with the technical implementation. Part of the problem is that GDPR is principles-based rather than prescriptive—it tells you what to achieve, not exactly how to do it.

This guide cuts through the ambiguity: what GDPR actually requires in practice, how to implement those requirements in your database systems, and where the difficult trade-offs sit.

flowchart LR
    Request[("Deletion<br/>Request")]

    subgraph Verify["Verification Step"]
        V1[("Check legal<br/>basis for retention")]
        V2[("Identify all<br/>data copies")]
        V3[("Notify<br/>processors")]
    end

    subgraph Execute["Deletion Execution"]
        D1[("Soft delete<br/>user record")]
        D2[("Delete from<br/>operational tables")]
        D3[("Delete from<br/>backup (if keyed)")]
        D4[("Destroy<br/>encryption key")]
    end

    subgraph Cascade["Cascade to Related Data"]
        C1[("User sessions<br/>and tokens")]
        C2[("User activity<br/>and logs")]
        C3[("Third-party<br/>processor shares")]
        C4[("Analytics<br/>aggregates")]
    end

    subgraph Audit["Audit Trail"]
        Log[("Deletion<br/>request log")]
        Confirm[("Confirmation<br/>to user")]
        Report[("Compliance<br/>report")]
    end

    Request --> Verify
    Verify --> V1 --> V2 --> V3 --> Execute
    V2 -.->|identify| Cascade
    Execute --> D1 --> D2
    D2 --> D3
    D2 --> D4
    D3 -.->|if cryptographically<br/>deletable| Cascade
    D1 -.->|cascade| Cascade
    D1 --> Audit
    Audit --> Log --> Confirm --> Report

The Core Principles That Drive Technical Decisions

These are the GDPR principles that most affect database design:

Lawfulness, fairness, and transparency — Process data only with valid legal basis. Document what you’re doing.

Purpose limitation — Collect data only for specified, explicit purposes. Don’t repurpose data without new consent.

Data minimization — Collect only what you need. Not “we might need this later.”

Accuracy — Keep data accurate and up to date. Allow users to correct errors.

Storage limitation — Don’t keep data longer than necessary. Implement retention limits.

Integrity and confidentiality — Protect data appropriately. Encryption, access controls, audit logging.

Accountability — Demonstrate compliance. Document decisions.

These principles translate directly to database design decisions.

Right to Deletion: The Hard Problem

“Delete their data” sounds straightforward until you implement it.

What Deletion Means

GDPR Article 17 states users have the right to erasure—“the right to be forgotten.” But the right is not absolute. It applies when:

  • Data is no longer necessary for the purpose collected
  • User withdraws consent
  • Data was processed unlawfully
  • User objects and no overriding legitimate interest exists

Data can be retained when:

  • Legal obligation requires retention (tax records, employment law)
  • Legal claims require retention (litigation holds)
  • Public interest requires retention (regulatory reporting)
  • Archiving for research/historical purposes
  • Establishment, exercise, or defense of legal claims
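
The two lists above can be encoded as a first-pass eligibility check. A minimal sketch, with ground names invented for illustration; a real implementation would route ambiguous cases to a human reviewer:

```python
from dataclasses import dataclass, field
from typing import List

# Hypothetical ground names mirroring the lists above (Article 17(1) and 17(3))
ERASURE_GROUNDS = {"no_longer_necessary", "consent_withdrawn",
                   "unlawful_processing", "objection_upheld"}
RETENTION_GROUNDS = {"legal_obligation", "litigation_hold", "public_interest",
                     "research_archive", "legal_claims"}

@dataclass
class ErasureRequest:
    user_id: int
    erasure_grounds: List[str]
    retention_grounds: List[str] = field(default_factory=list)

def erasure_applies(req: ErasureRequest) -> bool:
    """Erasure applies when at least one erasure ground holds
    and no retention ground overrides it."""
    has_ground = any(g in ERASURE_GROUNDS for g in req.erasure_grounds)
    has_override = any(g in RETENTION_GROUNDS for g in req.retention_grounds)
    return has_ground and not has_override

# Consent withdrawn, no legal hold: erasure applies
print(erasure_applies(ErasureRequest(1, ["consent_withdrawn"])))  # True
# Tax records under a statutory retention duty: erasure is refused
print(erasure_applies(ErasureRequest(2, ["consent_withdrawn"],
                                     ["legal_obligation"])))      # False
```

This is only a triage step; the outcome and its justification should still be recorded in the deletion request audit trail.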

Database Deletion Implementation

True deletion requires understanding your data architecture:

-- Create soft-delete infrastructure
ALTER TABLE users ADD COLUMN deleted_at TIMESTAMP WITH TIME ZONE;
ALTER TABLE users ADD COLUMN deletion_requested_at TIMESTAMP WITH TIME ZONE;

CREATE INDEX idx_users_deleted ON users(deleted_at) WHERE deleted_at IS NOT NULL;

-- Create deletion request table for audit trail
CREATE TABLE deletion_requests (
    id BIGSERIAL PRIMARY KEY,
    user_id INTEGER NOT NULL REFERENCES users(id),
    requested_at TIMESTAMP WITH TIME ZONE NOT NULL DEFAULT CURRENT_TIMESTAMP,
    confirmed_at TIMESTAMP WITH TIME ZONE,
    cascade_to_subsidiaries BOOLEAN DEFAULT true,
    status VARCHAR(20) DEFAULT 'pending'
);

-- Function to handle user deletion request
CREATE OR REPLACE FUNCTION process_deletion_request(request_id BIGINT)
RETURNS void AS $$
DECLARE
    req RECORD;
BEGIN
    SELECT * INTO req FROM deletion_requests WHERE id = request_id;

    IF NOT FOUND THEN
        RAISE EXCEPTION 'Deletion request % not found', request_id;
    END IF;

    IF req.status <> 'pending' THEN
        RAISE EXCEPTION 'Request already processed';
    END IF;

    -- Mark deletion requested
    UPDATE users SET deletion_requested_at = CURRENT_TIMESTAMP
    WHERE id = req.user_id;

    -- Cascade to related tables (handle carefully)
    UPDATE user_sessions SET deleted_at = CURRENT_TIMESTAMP
    WHERE user_id = req.user_id AND deleted_at IS NULL;

    UPDATE user_preferences SET deleted_at = CURRENT_TIMESTAMP
    WHERE user_id = req.user_id AND deleted_at IS NULL;

    UPDATE user_activity SET deleted_at = CURRENT_TIMESTAMP
    WHERE user_id = req.user_id AND deleted_at IS NULL;

    -- Hard delete after retention period (for truly sensitive data)
    -- This would run via scheduled job after retention period expires

    -- Confirm deletion request
    UPDATE deletion_requests
    SET status = 'completed', confirmed_at = CURRENT_TIMESTAMP
    WHERE id = request_id;

    RAISE NOTICE 'Deletion request % processed for user %', request_id, req.user_id;
END;
$$ LANGUAGE plpgsql;

The Backup Problem

Here’s where it gets complicated: backups.

GDPR doesn’t explicitly address backups, but regulators expect reasonable practices. If a user exercises their right to deletion, what about backups taken before the deletion request?

  • Option 1: Don’t back up personal data (impractical)
  • Option 2: Encrypt backups with user-specific keys (complex key management)
  • Option 3: Accept that backups contain deleted data, with documented retention policies
  • Option 4: Delete from backups when technically feasible (expensive)

Industry consensus has shifted toward accepting that backups may contain deleted data, with mitigations:

from cryptography.fernet import Fernet
import hashlib
import base64

class BackupEncryptionManager:
    """Encrypt backups with user-derived keys for selective deletion capability"""

    def __init__(self, master_key: bytes):
        self.master_key = master_key

    def derive_user_key(self, user_id: int) -> bytes:
        """Derive a unique key per user for their data"""
        # Deterministic salt so the key can be re-derived; PBKDF2 via hashlib
        salt = f"user_{user_id}_backup_salt".encode()
        return hashlib.pbkdf2_hmac(
            'sha256',
            self.master_key,
            salt,
            iterations=100000
        )

    def encrypt_for_backup(self, user_id: int, data: bytes) -> bytes:
        """Encrypt data with user-specific key"""
        user_key = self.derive_user_key(user_id)
        fernet = Fernet(base64.urlsafe_b64encode(user_key[:32]))
        return fernet.encrypt(data)

    def delete_user_data(self, user_id: int) -> bool:
        """Deletion means we lose the key - data is irrecoverable"""
        # Note: a key derived purely from a master key can always be
        # re-derived, so crypto-shredding requires storing per-user key
        # material (a random key or salt) and destroying that record.
        # delete_user_key(...) stands in for that key-store operation.
        delete_user_key(user_id)  # hypothetical key-store hook
        return True

With this approach, backups are encrypted and each user’s data is protected by a user-specific key. When a user is deleted, that key is destroyed: the backup data still exists but is cryptographically inaccessible. Many auditors accept this as a legitimate form of erasure.

Data Portability: Exporting User Data

GDPR Article 20 gives users the right to receive their data in “structured, commonly used, machine-readable format” and to transmit that data to another controller.

Implementing Data Export

import json
from datetime import datetime
from dataclasses import dataclass, asdict
from typing import List

@dataclass
class UserDataExport:
    export_date: str
    user_id: str
    profile: dict
    activity: List[dict]
    preferences: dict
    purchases: List[dict]
    messages: List[dict]

def export_user_data(user_id: int, db_connection) -> dict:
    """Generate comprehensive user data export"""

    # Gather all user data
    user_data = {
        'export_date': datetime.utcnow().isoformat() + 'Z',
        'user_id': str(user_id),
        'profile': fetch_user_profile(user_id, db_connection),
        'activity': fetch_user_activity(user_id, db_connection),
        'preferences': fetch_user_preferences(user_id, db_connection),
        'purchases': fetch_user_purchases(user_id, db_connection),
        'messages': fetch_user_messages(user_id, db_connection),
    }

    # Format as JSON-LD for machine readability
    export = {
        '@context': 'https://schemas.gdpr-export.example.com/v1',
        '@type': 'UserDataExport',
        **user_data
    }

    return export

def format_export_file(user_data: dict, format: str = 'json') -> bytes:
    """Format export in requested format"""
    if format == 'json':
        return json.dumps(user_data, indent=2).encode('utf-8')
    elif format == 'csv':
        return convert_to_csv(user_data)
    elif format == 'xml':
        return convert_to_xml(user_data)
    else:
        raise ValueError(f"Unsupported format: {format}")
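
The helpers convert_to_csv and convert_to_xml are left undefined above; here is one possible sketch of the CSV path, flattening each top-level export section into its own CSV block:

```python
import csv
import io

def convert_to_csv(user_data: dict) -> bytes:
    """Flatten the export dict into one CSV section per top-level key.
    Lists of dicts become a header row plus one row per item; plain
    dicts become key/value rows; scalars become a single-value row."""
    buf = io.StringIO()
    writer = csv.writer(buf)
    for section, value in user_data.items():
        writer.writerow([f"# {section}"])
        if isinstance(value, list) and value and isinstance(value[0], dict):
            headers = sorted({k for row in value for k in row})
            writer.writerow(headers)
            for row in value:
                writer.writerow([row.get(h, "") for h in headers])
        elif isinstance(value, dict):
            for k, v in value.items():
                writer.writerow([k, v])
        else:
            writer.writerow([value])
        writer.writerow([])  # blank line between sections
    return buf.getvalue().encode("utf-8")

export = {"user_id": "42",
          "profile": {"email": "a@example.com"},
          "activity": [{"event": "login", "ts": "2024-01-01"}]}
print(convert_to_csv(export).decode("utf-8"))
```

CSV is a poor fit for deeply nested data, which is why JSON is usually the primary portability format and CSV an optional convenience.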

What to Include in Export

The definition of “all personal data” is debated. Common interpretations include:

  • Account profile information
  • Transaction history
  • Activity logs associated with the account
  • Communications
  • Preferences and settings
  • Any derived data about the user

What’s typically excluded:

  • Internal identifiers not meaningful to the user
  • Aggregated/anonymized data (no longer personal)
  • Data from third parties (the user should go to those parties)
  • Legal notes or internal comments

Consent Tracking

When consent is your legal basis, you must prove when, how, and what consent was given.

-- Consent tracking table
CREATE TABLE consent_records (
    id BIGSERIAL PRIMARY KEY,
    user_id INTEGER NOT NULL,
    consent_type VARCHAR(50) NOT NULL,  -- 'marketing_email', 'data_processing', etc.
    consent_version VARCHAR(20) NOT NULL,  -- Track which version of terms
    granted BOOLEAN NOT NULL,
    granted_at TIMESTAMP WITH TIME ZONE NOT NULL DEFAULT CURRENT_TIMESTAMP,
    granted_ip INET,
    granted_user_agent TEXT,
    withdrawn_at TIMESTAMP WITH TIME ZONE,
    withdrawn_ip INET,
    source VARCHAR(50),  -- 'web_form', 'mobile_app', 'api', 'paper'
    legal_text_hash VARCHAR(64)  -- Hash of terms at time of consent
);

CREATE INDEX idx_consent_user_type ON consent_records(user_id, consent_type);
CREATE INDEX idx_consent_active ON consent_records(user_id) WHERE withdrawn_at IS NULL;

-- Consent change history
CREATE TABLE consent_history (
    id BIGSERIAL PRIMARY KEY,
    consent_record_id BIGINT REFERENCES consent_records(id),
    change_type VARCHAR(20) NOT NULL,  -- 'grant', 'withdraw', 'update'
    changed_at TIMESTAMP WITH TIME ZONE DEFAULT CURRENT_TIMESTAMP,
    ip_address INET,
    user_agent TEXT
);

Enforcing consent at the application layer:

from functools import wraps
from datetime import datetime

def requires_consent(consent_type: str):
    """Decorator to enforce consent requirement"""
    def decorator(func):
        @wraps(func)
        def wrapper(user_id: int, *args, **kwargs):
            # Check active consent
            consent = db.query("""
                SELECT granted, consent_version, granted_at
                FROM consent_records
                WHERE user_id = %s
                  AND consent_type = %s
                  AND withdrawn_at IS NULL
                ORDER BY granted_at DESC
                LIMIT 1
            """, (user_id, consent_type))

            if not consent or not consent[0]['granted']:
                raise ConsentRequiredError(
                    f"User {user_id} has not granted {consent_type} consent"
                )

            # Log the access for audit
            log_consented_access(user_id, consent_type, func.__name__)

            return func(user_id, *args, **kwargs)
        return wrapper
    return decorator

@requires_consent('marketing_email')
def send_marketing_email(user_id: int, content: str):
    """Send marketing email only if user consented"""
    user = get_user(user_id)
    return email_service.send(user.email, content)

Data Processing Agreements

When you share data with third parties (processors), GDPR requires written agreements that specify:

  • What data is processed
  • Purpose of processing
  • How long processing continues
  • Security requirements
  • Sub-processor restrictions
  • Audit rights

Tracking Processors

CREATE TABLE data_processors (
    id SERIAL PRIMARY KEY,
    name VARCHAR(255) NOT NULL,
    legal_entity VARCHAR(255),
    dpa_signed_date DATE,
    dpa_expiry_date DATE,
    data_categories TEXT[],  -- ['personal', 'financial', 'health']
    processing_purposes TEXT[],
    retention_period INTERVAL,
    security_certifications TEXT[],  -- ['SOC2', 'ISO27001']
    last_audit_date DATE,
    contact_email VARCHAR(255)
);

CREATE TABLE processor_data_shares (
    id SERIAL PRIMARY KEY,
    processor_id INTEGER REFERENCES data_processors(id),
    table_name VARCHAR(100),
    data_volume_monthly BIGINT,
    share_started DATE,
    share_ended DATE,
    active BOOLEAN DEFAULT true
);

-- Sub-processors require notification or consent
CREATE TABLE sub_processors (
    id SERIAL PRIMARY KEY,
    parent_processor_id INTEGER REFERENCES data_processors(id),
    name VARCHAR(255) NOT NULL,
    purpose TEXT,
    notified_to_users BOOLEAN DEFAULT false,
    consent_required BOOLEAN DEFAULT false
);

Data Breach Notification

GDPR requires notification within 72 hours of becoming aware of a breach. This is a significant operational challenge.

Breach Detection and Response

from datetime import datetime, timedelta
from dataclasses import dataclass
from typing import Optional, List

@dataclass
class DataBreach:
    breach_id: str
    detected_at: datetime
    description: str
    data_types: List[str]
    user_count: int
    processor_count: int
    status: str  # 'detected', 'investigating', 'contained', 'resolved'
    notification_required: bool
    supervisory_authority: str
    affected_users_notified: bool

class BreachResponseManager:
    SUPERVISORY_AUTHORITY = 'ico.org.uk'  # or relevant authority

    def __init__(self, db, notification_service):
        self.db = db
        self.notification_service = notification_service

    def handle_breach_detected(self, breach: DataBreach):
        """Initial breach response workflow"""

        # Log breach
        breach_id = self.record_breach(breach)

        # Start 72-hour clock
        notification_deadline = breach.detected_at + timedelta(hours=72)

        # Assess scope
        self.assess_breach_scope(breach_id)

        # Determine notification requirements
        if self.requires_authority_notification(breach):
            self.schedule_authority_notification(breach_id, notification_deadline)

        if self.requires_user_notification(breach):
            self.schedule_user_notification(breach_id)

        # Begin containment
        self.initiate_containment(breach_id)

    def requires_authority_notification(self, breach: DataBreach) -> bool:
        """Determine if supervisory authority must be notified"""
        # High risk to rights and freedoms of individuals
        # Unencrypted personal data
        # Sensitive categories (health, financial)
        high_risk_categories = ['health', 'financial', 'biometric', 'genetic']

        if any(cat in breach.data_types for cat in high_risk_categories):
            return True

        if breach.user_count > 1000:
            return True

        return False

    def draft_authority_notification(self, breach: DataBreach) -> dict:
        """Draft notification to supervisory authority"""
        return {
            'authority': self.SUPERVISORY_AUTHORITY,
            'breach_id': breach.breach_id,
            'detected_at': breach.detected_at.isoformat(),
            'description': breach.description,
            'data_types': breach.data_types,
            'approximate_users_affected': breach.user_count,
            'likely_consequences': self.assess_consequences(breach),
            'measures_taken': self.get_containment_measures(breach.breach_id),
            'contact_dpo': 'dpo@company.com'
        }

Breach Notification Template

## Data Breach Notification to [Supervisory Authority]

**Breach Reference:** [Internal reference number]

**Date of Detection:** [YYYY-MM-DD HH:MM UTC]

**Date of This Notification:** [YYYY-MM-DD HH:MM UTC]

**Nature of Breach:**
[Description including categories of data and approximate number of
individuals concerned]

**Categories Concerned:**

- [ ] Names
- [ ] Contact details
- [ ] Financial details
- [ ] [Other categories]

**Approximate Number Affected:** [X] individuals

**Likely Consequences:**
[Description of consequences for individuals]

**Measures Taken:**
[Measures already taken or proposed to address the breach]

**Contact:** [Data Protection Officer contact details]

Right to Erasure vs Backup Retention Trade-offs

| Dimension | Immediate Hard Delete | Soft Delete + Scheduled Purge | Cryptographic Deletion (Key Destroy) |
|---|---|---|---|
| GDPR compliance speed | Fastest: data gone immediately | Slower: depends on purge schedule | Fast: key destroyed, data unreadable |
| Backup implications | Backups may still contain data | Backups contain soft-deleted data | Backups encrypted, unreadable without key |
| Recoverability | None after deletion | Recoverable until purge | None: key destruction irreversible |
| Operational complexity | High: must track all copies | Medium | Medium |
| Audit trail requirement | Deletion must be logged | Deletion request logged, purge logged | Key destruction logged and auditable |
| Best for | Low data volume, low retention | Most applications | Encrypted storage systems |

Production Failure Scenarios

| Failure | Impact | Mitigation |
|---|---|---|
| Deletion request processed before backup retention expires | User data in old backups technically not deleted | Accept with documented policy, or re-encrypt backups with user-specific keys |
| Third-party processor data share not tracked | Unauthorized retention of user data | Maintain processor registry, audit shares quarterly |
| Consent withdrawal not propagated to all systems | Processing continues without legal basis | Event-driven consent propagation, test with end-to-end audit |
| Data portability export containing other users’ data | Cross-user data leak | Strict isolation in export query, review before delivery |
| Breach notification deadline missed (72h) | Regulatory fine: up to 2% of global revenue | Automated breach detection, pre-drafted notification templates |

Capacity Estimation: Deletion Job Sizing and Backup Rehydration Time

GDPR right-to-erasure operations have capacity implications that must be planned for.

Deletion job sizing formula:

records_deleted_per_hour = deletion_rate_per_second × 3600
total_deletion_time_hours = total_records_to_delete / records_deleted_per_hour

For an e-commerce platform deleting 1M user records where each record has 10 related tables with cascading deletes:

  • Deleting 1 user record cascades to ~500 related records across orders, addresses, preferences, sessions
  • At 100 user deletions/second, each cascading to ~500 rows, the database deletes ~50,000 rows/second
  • For 1M users: 1M user rows plus cascading deletes means roughly 500M total rows to process
  • Practical deletion rate: 10-50K records/second depending on indexes and cascade depth
  • Total time for 500M records at 30K records/second: ~4.6 hours

For GDPR’s 30-day deletion window, this is manageable. But if the database has poorly indexed foreign keys, cascading deletes can lock tables for minutes, making batch deletion impractical. Always test deletion performance on a production-size dataset before deployment.
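
The arithmetic above can be captured in a small sizing helper:

```python
def deletion_time_hours(total_records: int, records_per_second: int) -> float:
    """Wall-clock time for a batch deletion job at a sustained rate."""
    return total_records / records_per_second / 3600

# The worked example above: 1M users x ~500 cascaded rows each,
# deleted at 30K records/second.
total = 1_000_000 * 500
print(round(deletion_time_hours(total, 30_000), 1))  # 4.6
```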

Backup rehydration time formula:

rehydration_time_hours = archived_data_gb / restore_throughput_gb_per_hour
restore_throughput_gb_per_hour = (restore_bandwidth_mbps / 8) × 3600 / 1000 × compression_factor

When GDPR requires deletion from backups (some interpretations require this): restoring from backup, deleting records, re-archiving to a new backup, and re-uploading is the only approach if in-place deletion from compressed backups is impossible. For 10TB of compressed backups restored at 100MB/s effective throughput: 10TB / (100MB/s) = 100,000 seconds ≈ 28 hours per backup. With 30-day retention requiring scanning multiple backups, full backup rehydration for erasure can take weeks.
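
The rehydration formula works out as follows in a quick sketch:

```python
def rehydration_hours(archived_gb: float, restore_mb_per_s: float) -> float:
    """Time to restore an archive at a given effective throughput."""
    gb_per_hour = restore_mb_per_s * 3600 / 1000  # MB/s -> GB/hour
    return archived_gb / gb_per_hour

# 10TB (10,000 GB) at 100MB/s effective restore throughput
print(round(rehydration_hours(10_000, 100), 1))  # 27.8
```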

Practical approach: Do not store GDPR-covered data in long-retention compressed backups. Instead, use key-per-user encryption (each user’s PII encrypted with their own DEK). GDPR deletion = destroy the user’s DEK. The encrypted backup remains but is unreadable without the destroyed key. This reduces GDPR deletion from a weeks-long backup rehydration to a key destruction operation.

Interview Questions

Q: A user exercises their right to erasure. Their data spans 50 tables with complex foreign key relationships. How do you implement this?

Two approaches: hard delete with cascading erasure, or cryptographic erasure. Hard delete: identify all tables containing the user’s data, order deletions by dependency (child tables first), and execute in batches to avoid long-held locks. Use DELETE FROM orders WHERE id IN (SELECT id FROM orders WHERE customer_id = $1 LIMIT 1000), committing between batches. This takes seconds to minutes for typical data volumes. Cryptographic erasure: encrypt each user’s PII with a per-user key, store the key in a separate key store. GDPR deletion = delete the user’s encryption key. The data remains in the database but is unreadable. This approach avoids complex cascading deletes and is compliant for most use cases.
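
The batched-delete pattern from this answer can be sketched against SQLite for a self-contained demo (PostgreSQL uses the same subselect-with-LIMIT shape; table and column names here are illustrative):

```python
import sqlite3

def batched_delete(conn, table: str, key_col: str, key_val,
                   batch_size: int = 1000) -> int:
    """Delete matching rows in bounded batches, committing between
    batches so locks are held only briefly. Returns rows deleted."""
    total = 0
    while True:
        cur = conn.execute(
            f"DELETE FROM {table} WHERE rowid IN "
            f"(SELECT rowid FROM {table} WHERE {key_col} = ? LIMIT ?)",
            (key_val, batch_size),
        )
        conn.commit()
        if cur.rowcount == 0:
            break
        total += cur.rowcount
    return total

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER)")
conn.executemany("INSERT INTO orders (customer_id) VALUES (?)",
                 [(7,)] * 2500 + [(8,)] * 10)
print(batched_delete(conn, "orders", "customer_id", 7))  # 2500
```

Batch size trades total runtime against lock duration per batch; it should be tuned against production-size data.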

Q: Your company has 5-year backup retention but GDPR requires deletion of a user’s data. Are you required to delete from backups?

Legal interpretation varies. Some GDPR authorities hold that backups are still “data in storage” and must be addressed. The practical resolution: use cryptographic erasure (encrypt with per-user keys) so old backups are unreadable without the destroyed key. If you cannot use cryptographic erasure, document the technical limitation and implement a scheduled re-encryption of backups without the deleted user’s key — acceptable to most auditors if the timeline is reasonable and documented.

Q: A user requests data portability. Their data exists in 12 different tables spanning orders, preferences, and activity logs. How do you generate a comprehensive export?

Data portability under GDPR Article 20 requires a “commonly used, machine-readable format” — typically JSON or CSV. Build a data export pipeline that: queries each relevant table filtering by user_id, joins related data into a structured JSON format with clear schema, includes metadata (export date, source tables, schema version), and generates a downloadable archive. Complexity arises with historical data that has been anonymized or aggregated — you must disclose what data is available versus what has been deleted or anonymized per retention policy. Export time scales with data volume: a typical user with 5 years of history might generate a 50-200MB export. Generate asynchronously (not in real-time) and email a download link.

Q: Your company is acquired mid-operation. How do you handle the acquired company’s GDPR obligations for data that came from EU users?

GDPR obligations transfer with the data. In most acquisitions, the acquiring company inherits the data processing obligations of the acquired company. Technical steps: treat the acquired company’s user database as a separate data controller, map all EU user data in the combined systems, obtain fresh consent or establish a legitimate interest legal basis for any new processing purposes, and ensure the acquisition contract includes data processing agreements that assign GDPR liability correctly. If both companies had EU userbases, their combined GDPR obligations may require re-registration of consent for the combined entity’s expanded processing scope.


Technical Architecture for GDPR

GDPR compliance isn’t a feature—it’s an architectural concern.

Data Classification

from enum import Enum

class DataCategory(Enum):
    PUBLIC = "public"           # No restrictions
    INTERNAL = "internal"       # Internal use only
    CONFIDENTIAL = "confidential"  # Access controlled
    RESTRICTED = "restricted"   # Maximum protection

# Map database columns to categories
COLUMN_CLASSIFICATION = {
    'users.email': DataCategory.CONFIDENTIAL,
    'users.ssn': DataCategory.RESTRICTED,
    'users.phone': DataCategory.CONFIDENTIAL,
    'users.address': DataCategory.CONFIDENTIAL,
    'transactions.amount': DataCategory.CONFIDENTIAL,
    'transactions.card_last_four': DataCategory.RESTRICTED,
    'activity_logs.ip_address': DataCategory.CONFIDENTIAL,
}
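
One way to act on this mapping is to redact RESTRICTED columns before rows leave a controlled context (exports, logs, lower environments). A sketch with a hypothetical helper and a trimmed copy of the mapping above:

```python
from enum import Enum

class DataCategory(Enum):
    PUBLIC = "public"
    INTERNAL = "internal"
    CONFIDENTIAL = "confidential"
    RESTRICTED = "restricted"

# Trimmed copy of the mapping above (assumed to live in a shared module)
COLUMN_CLASSIFICATION = {
    "users.email": DataCategory.CONFIDENTIAL,
    "users.ssn": DataCategory.RESTRICTED,
}

def mask_restricted(table: str, row: dict) -> dict:
    """Replace RESTRICTED column values with a redaction marker;
    unclassified columns pass through unchanged."""
    return {
        col: "[REDACTED]"
        if COLUMN_CLASSIFICATION.get(f"{table}.{col}") is DataCategory.RESTRICTED
        else val
        for col, val in row.items()
    }

print(mask_restricted("users", {"email": "a@example.com", "ssn": "123-45-6789"}))
# {'email': 'a@example.com', 'ssn': '[REDACTED]'}
```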

Retention Enforcement

-- Create retention policies
CREATE TABLE retention_policies (
    id SERIAL PRIMARY KEY,
    table_name VARCHAR(100) NOT NULL,
    column_name VARCHAR(100),  -- NULL means use table-level retention
    retention_period INTERVAL NOT NULL,
    legal_basis VARCHAR(255),  -- Why this retention period
    review_date DATE,
    active BOOLEAN DEFAULT true
);

INSERT INTO retention_policies (table_name, column_name, retention_period, legal_basis)
VALUES
    ('user_sessions', NULL, INTERVAL '30 days', 'Session management'),
    ('user_activity', NULL, INTERVAL '1 year', 'Analytics (anonymized after 90 days)'),
    ('support_tickets', NULL, INTERVAL '3 years', 'Customer service records'),
    ('financial_transactions', NULL, INTERVAL '7 years', 'Tax and legal compliance'),
    ('consent_records', NULL, INTERVAL '10 years', 'Proof of consent'),
    ('audit_logs', NULL, INTERVAL '5 years', 'Security and compliance');
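
A scheduled purge job can then drive deletions from these policy rows. A sketch that only generates the SQL, assuming a hypothetical created_at column on each governed table; a real job would batch the deletes and log every purge:

```python
# Hypothetical policy rows mirroring the retention_policies table above:
# (table_name, retention_days, legal_basis)
POLICIES = [
    ("user_sessions", 30, "Session management"),
    ("financial_transactions", 7 * 365, "Tax and legal compliance"),
]

def purge_statement(table: str, retention_days: int) -> str:
    """Build the purge DELETE for one policy row."""
    return (
        f"DELETE FROM {table} "
        f"WHERE created_at < CURRENT_TIMESTAMP - INTERVAL '{retention_days} days'"
    )

for table, days, basis in POLICIES:
    print(f"-- {basis}")
    print(purge_statement(table, days))
```

Interpolating table names into SQL is safe here only because they come from a controlled policy table, never from user input.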

Conclusion

GDPR compliance requires serious architectural thinking. The right to deletion means designing for deletion from day one—understanding your data flows, retention requirements, and the trade-offs around backups. Data portability means keeping exportable records. Consent tracking means proving consent with immutable audit trails.

The technical implementation isn’t optional—it’s how you demonstrate accountability under Article 5(2).

For related topics on protecting personal data, see our data masking guide for non-production environments, audit logging for tracking data access, and encryption at rest for protecting stored personal data.

Related Posts

Data Masking Strategies for Non-Production Environments

Learn static and dynamic data masking: nulling, shuffling, hashing, and range techniques. Understand GDPR and PII considerations for PostgreSQL and Oracle.

#database #data-masking #security

PII Handling: Protecting Personal Data in Data Systems

Learn techniques for identifying, protecting, and managing personally identifiable information across your data platform.

#data-engineering #pii #data-protection

Audit Logging: Tracking Data Changes for Compliance

Implement audit logging for compliance. Learn row-level change capture with triggers and CDC, log aggregation strategies, and retention policies.

#database #audit #compliance