Docker Volumes: Persisting Data Across Container Lifecycles

Understand how to use Docker volumes and bind mounts to persist data, share files between containers, and manage stateful applications.


Containers are ephemeral. Start a container, write some data, delete the container, and your data is gone. Unless you explicitly persist it.

Docker provides two mechanisms for persistent storage: volumes and bind mounts. Understanding when to use which, and how to manage storage for stateful applications, is essential for anyone running databases, file servers, or any application that cares about its data.

Why Volumes Matter

The container philosophy favors stateless applications. It should be possible to destroy a container and recreate it from its image, with all configuration and state supplied from outside.

But reality intrudes. Databases need to keep data. File servers store files. Applications have local caches that matter.

Volumes solve this by decoupling storage from container lifecycle. The volume exists independent of any container. Delete the container, the volume persists. Create a new container, mount the same volume, and your data is there.

+----------------+      +----------------+
|  Container A   |      |  Container B   |
|     /data -----+--+---+----- /data     |
+----------------+  |   +----------------+
                    |
          +---------+----------+
          |       Volume       |
          |  (postgres_data)   |
          +--------------------+

This separation also enables interesting patterns: multiple containers sharing the same data, backup strategies that do not interrupt running applications, and storage that travels with the application.

When to Use Each Mount Type

Choose the right mount type based on your requirements:

Use named volumes when:

  • You need persistent data that survives container deletion
  • You want Docker to manage the storage location for you
  • Portability across hosts matters (volumes work identically on any Docker host)
  • You are running databases, caches with persistence, or any stateful application

Use bind mounts when:

  • You need development code hot-reload (edit files on host, see changes in container)
  • You need to expose host configuration files or certificates
  • Performance is critical and you accept the host dependency

Use tmpfs when:

  • You are handling sensitive data that should never touch disk
  • You need temporary processing space for high-speed operations
  • You want data to disappear automatically when the container stops
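The three cases above can sit side by side in one service. A minimal Compose sketch (service name, image, and paths are placeholders):

```yaml
services:
  app:
    image: myapp:latest            # hypothetical image
    volumes:
      - app_data:/var/lib/app      # named volume: persistent data
      - ./config:/etc/app:ro       # bind mount: host config, read-only
    tmpfs:
      - /tmp/scratch               # tmpfs: in-memory scratch space

volumes:
  app_data:
```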

Types of Mounts

Docker provides three types of mounts, each with different use cases.

Named Volumes

Named volumes are the recommended approach for most persistent data. Docker manages the storage location, and volumes are straightforward to back up and migrate.

# Create a volume
docker volume create my_data

# Run a container with the volume
docker run -d \
    -v my_data:/var/lib/postgresql/data \
    postgres:15-alpine

In Docker Compose:

services:
  db:
    image: postgres:15-alpine
    volumes:
      - postgres_data:/var/lib/postgresql/data

volumes:
  postgres_data:

Docker creates the volume automatically if it does not exist. You can also explicitly define it to set drivers, labels, or other options.
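An explicit definition might pin the driver and attach a label for your own backup tooling. A sketch (the label key is illustrative, consumed only by your scripts):

```yaml
volumes:
  postgres_data:
    driver: local
    labels:
      backup.policy: "daily"   # illustrative label read by your backup scripts
```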

Bind Mounts

Bind mounts map a specific host directory into the container. The host path must exist before you start the container.

# Map host directory into container
docker run -d \
    -v /host/path:/container/path \
    nginx:latest

In Docker Compose:

services:
  app:
    image: node:20-alpine
    volumes:
      - ./src:/app/src:ro # Read-only bind mount
      - /host/logs:/app/logs

Bind mounts are useful for development: you edit code on your host and see changes immediately in the container. They are also how you expose configuration files or certificates from the host.

The tradeoff: bind mounts couple your container to the host filesystem. Your container is no longer portable across hosts because it depends on a specific host path existing.

tmpfs Mounts

tmpfs mounts store data in memory only. The data never touches the disk. This is useful for sensitive data that should not persist: session tokens, temporary processing data, secrets you want to ensure disappear when the container stops.

# Store in memory only
docker run -d \
    --tmpfs /app/secrets \
    myapp:latest

In Docker Compose:

services:
  cache:
    image: redis:7-alpine
    tmpfs:
      - /data # Entire data directory in memory

tmpfs mounts are limited by memory. If you write too much data, the container gets killed by the OOM killer.
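To bound that risk, the Compose long mount syntax lets you cap a tmpfs size (64 MiB here; values are in bytes):

```yaml
services:
  cache:
    image: redis:7-alpine
    volumes:
      - type: tmpfs
        target: /data
        tmpfs:
          size: 67108864   # 64 MiB cap; writes beyond this fail instead of eating all RAM
```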

Comparing Mount Types

Type          Use Case                   Persistence                  Performance  Portability
Named volume  Databases, app data        Survives container deletion  Good         Portable across hosts
Bind mount    Development, config files  Depends on host path         Best         Host-dependent
tmpfs         Secrets, sensitive data    In-memory only               Fastest      N/A (ephemeral)

Creating and Managing Volumes

# Create a volume
docker volume create my_volume

# List all volumes
docker volume ls

# Inspect a volume
docker volume inspect my_volume

# Remove unused volumes
docker volume prune

# Remove specific volume (container must not be using it)
docker volume rm my_volume

Volume Lifecycle

# Run container with volume
docker run -d --name db -v postgres_data:/var/lib/postgresql/data postgres:15-alpine

# Stop and remove container
docker stop db
docker rm db

# Data persists in volume
docker volume ls  # postgres_data still there

# Run new container with same volume
docker run -d --name db2 -v postgres_data:/var/lib/postgresql/data postgres:15-alpine

# New container has access to previous data

Volume Lifecycle Flow

flowchart LR
    A[Create Volume] --> B[Mount to Container]
    B --> C[Write Data]
    C --> D[Stop Container]
    D --> E[Delete Container]
    E --> F{Data Persists?}
    F -->|Yes| G[Mount to New Container]
    F -->|No| H[Backup Available?]
    H -->|Yes| G
    H -->|No| I[Data Lost]
    G --> C

Volume Initialization

When you mount an empty volume into a container directory that already has data (from the image), Docker copies that data into the volume:

# Image has /data with default files
# Volume is empty

docker run -v my_volume:/data myimage

# Docker copies /data/* to volume
# Container sees the default files
# Volume now has the initialized data

This matters for database images: the database initializes its data directory on first start if the directory is empty. Mounting an existing volume with an already-initialized database skips the initialization.

Volume Drivers and Remote Storage

Named volumes can use different drivers for different storage backends. The default local driver stores data on the host filesystem.

Available Drivers

  • local: Default driver; stores data on the host filesystem
  • rexray/ebs: Amazon EBS for persistent block storage
  • vieux/sshfs: Mount a remote filesystem over SSH
  • Cloud plugins: drivers also exist for Azure Files, Google Cloud persistent disks, Alibaba Cloud OSS, and other providers

Using the SSHFS Driver

For development across multiple machines:

# Install the driver
docker plugin install vieux/sshfs

# Create volume with SSHFS
docker volume create \
    --driver vieux/sshfs \
    -o sshcmd=user@remotehost:/data \
    -o password=password \
    remote_data

The container can now access files on the remote host as if they were local. This is useful for shared development data or accessing remote storage during builds.

EBS Driver Example

For production databases requiring durable block storage:

services:
  db:
    image: postgres:15-alpine
    volumes:
      - db_data:/var/lib/postgresql/data
    deploy:
      resources:
        limits:
          memory: 4G

volumes:
  db_data:
    driver: rexray/ebs
    driver_opts:
      size: 100
      volumetype: gp3
      iops: 3000

The volume gets created on Amazon EBS, attaches to the EC2 instance, and provides durable persistent storage that survives container restarts on different hosts.

Backup and Migration Strategies

Backing up container data requires understanding the volume mount points.

Backup a Volume

# Create backup container with volume mount
docker run --rm \
    -v postgres_data:/data \
    -v $(pwd):/backup \
    alpine \
    tar czf /backup/postgres_backup.tar.gz -C /data .

This creates an Alpine container, mounts the volume and the current directory, and archives the volume contents to the host.
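The same tar round-trip is easy to wrap in a host-side helper. This sketch (the function name and demo paths are ours, not a Docker convention) archives any directory with a timestamped filename, using the same tar flags as the container example above:

```shell
#!/bin/sh
# backup_dir: archive SRC_DIR into DEST_DIR with a timestamped filename
backup_dir() {
    src="$1"
    dest="$2"
    stamp=$(date +%Y%m%d_%H%M%S)
    name="$(basename "$src")_${stamp}.tar.gz"
    tar czf "${dest}/${name}" -C "$src" .   # same flags as the docker example
    echo "${dest}/${name}"                  # print the archive path for callers
}

# Example: back up a scratch directory
mkdir -p /tmp/demo_src /tmp/demo_backups
echo "hello" > /tmp/demo_src/file.txt
archive=$(backup_dir /tmp/demo_src /tmp/demo_backups)
tar tzf "$archive"   # list archive contents to verify
```

Point it at a volume's mountpoint, or run it inside a helper container as shown above.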

Restore from Backup

# Extract backup into volume
docker run --rm \
    -v postgres_data:/data \
    -v $(pwd):/backup \
    alpine \
    sh -c "rm -rf /data/* && tar xzf /backup/postgres_backup.tar.gz -C /data"

This removes existing data and extracts the backup. Use with caution.

Using Named Volumes with Docker Compose

Docker Compose makes backup straightforward:

# Start services
docker-compose up -d

# Create a backup of every volume in the project
# (Compose labels each volume with its project name; replace "myproject")
for volume in $(docker volume ls -q --filter label=com.docker.compose.project=myproject); do
    docker run --rm \
        -v ${volume}:/data \
        -v $(pwd):/backup \
        alpine \
        tar czf /backup/${volume}.tar.gz -C /data .
done

Volume Migration Between Hosts

Moving a volume to a different Docker host:

  1. Stop the application using the volume
  2. Create a tar archive of the volume
  3. Copy the archive to the new host
  4. Extract into a new volume
  5. Start the application with the new volume
# On source host
docker stop myapp
docker run --rm \
    -v my_volume:/data \
    -v $(pwd):/backup \
    alpine \
    tar czf /backup/my_volume.tar.gz -C /data .

# Copy to new host
scp my_volume.tar.gz newhost:/tmp/

# On new host
docker volume create my_volume
docker run --rm \
    -v my_volume:/data \
    -v /tmp:/backup \
    alpine \
    tar xzf /backup/my_volume.tar.gz -C /data

Sharing Data Between Containers

Multiple containers can mount the same volume. This enables patterns like:

  • Shared configuration: A volume with config files mounted into multiple services
  • Shared data: A data volume mounted by a producer and consumer
  • Web server logs: Centralized logging from multiple web server containers

Example: Shared Configuration

services:
  web:
    image: nginx:latest
    volumes:
      - config:/etc/nginx/conf.d

  api:
    image: myapi:latest
    volumes:
      - config:/app/config

volumes:
  config:

A config management sidecar could populate the config volume, and both services would see the updated configuration without restart.

Example: Producer-Consumer Pattern

services:
  producer:
    image: myproducer:latest
    volumes:
      - shared_data:/app/output
    depends_on:
      - consumer

  consumer:
    image: myconsumer:latest
    volumes:
      - shared_data:/app/input

volumes:
  shared_data:

The producer writes to /app/output, and the consumer reads from /app/input. Both hit the same underlying storage.

Warning: Concurrent Write Access

If multiple containers write to the same volume without coordination, data corruption is possible. Docker does not provide locking or coordination.

For databases, use database-native replication or clustering. For file-based data, implement application-level locking or use a distributed filesystem designed for concurrent access.

Volume Troubleshooting

Check Volume Mounts

# See volume mounts for a container
docker inspect -f '{{json .Mounts}}' mycontainer | jq

# List volumes not attached to any container
docker volume ls -f dangling=true

Capacity Estimation

Estimate volume size before deployment to avoid running out of space:

PostgreSQL:

Tier              Data Size  Recommended Storage
Development       1-10GB     Local volume
Small production  10-50GB    Cloud block storage (EBS, PD)
Medium            50-500GB   Network storage with HA
Large             500GB+     Distributed storage with replication

Rule of thumb: allocate 3x your expected data size to account for WAL logs, indexes, and temporary tables.
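That rule of thumb is easy to encode. A hypothetical helper (the function name is ours) that turns expected data size into a provisioning target:

```shell
# pg_volume_gb: apply the 3x headroom rule (WAL, indexes, temp tables)
pg_volume_gb() {
    expected_gb="$1"
    echo $((expected_gb * 3))
}

pg_volume_gb 50   # 50GB of data -> provision 150GB
```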

MySQL:

Tier              Data Size   Recommended Storage
Development       1-10GB      Local volume
Small production  10-100GB    Cloud block storage
Medium            100GB-1TB   SAN storage with snapshots
Large             1TB+        Distributed database storage

Redis:

Redis is memory-first. Allocate maxmemory + 20-30% overhead for persistence (RDB snapshots and AOF).

# Redis memory estimation
maxmemory 2gb  # Expected data
# Volume needs: ~2.6GB (data + background save buffers)
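The overhead math as a one-liner, assuming the 30% figure above (the helper name is ours):

```shell
# redis_volume_gb: maxmemory plus 30% persistence overhead (RDB/AOF buffers)
redis_volume_gb() {
    awk -v mem="$1" 'BEGIN { printf "%.1f\n", mem * 1.3 }'
}

redis_volume_gb 2   # 2GB maxmemory -> ~2.6GB volume
```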

Repair Volume Permissions

Containers often run as non-root. If you copy data into a volume from an image, the data is owned by whatever user the image uses.

# Fix permissions by running as root temporarily
docker run --rm -v my_volume:/data alpine chown -R 1001:1001 /data

When Volumes Fill Up

# Check Docker data directory size
docker system df -v

# List each volume's mountpoint, then measure it with du
docker volume ls -q | xargs -I{} docker volume inspect {} --format '{{.Name}}: {{.Mountpoint}}'

# Clean up
docker volume prune  # Removes unused volumes
docker system prune -a  # Removes unused images and containers too

Production Failure Scenarios

Docker volumes fail in ways that are not always obvious. Here are the most common issues.

Volume Fill-Up Causing Container Crash

When a volume fills up, the application running inside the container typically crashes or enters a read-only state.

Symptoms: Database refuses writes, “No space left on device” errors in container logs.

Diagnosis:

# Check volume size on host
df -h /var/lib/docker/volumes/postgres_data/_data

# Check inside container
docker exec -it postgres df -h

Mitigation: Set up monitoring on volume capacity. Use volume size limits where supported. Implement cleanup policies for old data.
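A minimal cron-able check, sketched here with an assumed 80% threshold and the default host path for the local driver:

```shell
#!/bin/sh
# volume_usage_pct: report filesystem usage percentage for a path
volume_usage_pct() {
    df -P "$1" | awk 'NR==2 { gsub("%", "", $5); print $5 }'
}

# Warn when the Docker volumes filesystem crosses 80%
check_volumes() {
    pct=$(volume_usage_pct "${1:-/var/lib/docker/volumes}")
    if [ "$pct" -ge 80 ]; then
        echo "WARNING: volume filesystem at ${pct}%"
    else
        echo "OK: volume filesystem at ${pct}%"
    fi
}

check_volumes /   # example run against the root filesystem
```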

Permission Denied After Volume Migration

When migrating volumes between hosts or after restoring from backup, permissions may not match what the container expects.

Symptoms: “Permission denied” errors when container tries to read/write volume, even though data exists.

Diagnosis:

# Check volume ownership on host
ls -la /var/lib/docker/volumes/postgres_data/_data

# Check container user and group
docker exec -it postgres id

Mitigation: Use docker run -u to match expected UID/GID, or ensure backup/restore preserves ownership. Consider using volume drivers that handle permission mapping.
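In Compose, pinning the container user to the volume data's owner might look like this (UID/GID 1001 is illustrative; use whatever owns the restored files):

```yaml
services:
  db:
    image: postgres:15-alpine
    user: "1001:1001"    # must match the ownership of the restored volume data
    volumes:
      - postgres_data:/var/lib/postgresql/data

volumes:
  postgres_data:
```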

Volume Driver Plugin Failure

Remote volume drivers (SSHFS, EBS, etc.) can fail due to network issues or authentication problems.

Symptoms: docker: Error response from daemon: VolumeDriver.Mount: rpc error: timeout.

Diagnosis:

# Check plugin status
docker plugin ls

# Check volume status
docker volume inspect my_volume

# Test SSH connectivity manually
ssh user@remotehost ls /data

Mitigation: Use volume driver plugins that support HA and reconnection. Test connectivity regularly. Consider using cloud-native storage (EBS, GCE Persistent Disk) with proper retry logic.

Volume Driver Issues

If a volume driver fails:

# Check driver status
docker plugin ls

# Enable a plugin if needed
docker plugin enable vieux/sshfs

# Inspect volume driver details
docker volume inspect my_volume

Observability Hooks

Track these metrics to prevent volume-related incidents:

Volume Capacity:

# Check volume size
docker volume inspect my_volume --format '{{.Mountpoint}}'
du -sh /var/lib/docker/volumes/my_volume/_data

# Set up alerting on volume usage
# 80% usage = warning threshold
df -h /var/lib/docker/volumes/

Backup Success/Failure:

# Check recent backup files
ls -la /backups/postgres_*.tar.gz

# Alert if no backup in 24h
find /backups -name "*.tar.gz" -mtime -1
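The 24-hour check is worth failing loudly on. A sketch that exits nonzero when no recent archive exists (the directory and filename pattern are assumptions):

```shell
#!/bin/sh
# check_recent_backup: succeed only if DIR contains a .tar.gz newer than one day
check_recent_backup() {
    dir="$1"
    if [ -n "$(find "$dir" -name '*.tar.gz' -mtime -1 2>/dev/null)" ]; then
        echo "backup OK"
        return 0
    fi
    echo "ALERT: no backup in the last 24h" >&2
    return 1
}

# Example: a freshly written archive passes the check
mkdir -p /tmp/demo_backups
touch /tmp/demo_backups/postgres_20240101.tar.gz
check_recent_backup /tmp/demo_backups
```

Wire the nonzero exit status into your alerting so a silent backup failure becomes a page.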

Volume I/O Metrics:

# Monitor container disk I/O
docker stats --no-stream

# Check for I/O errors
dmesg | grep -i docker

Conclusion

Volumes are how Docker handles persistent data. Named volumes are the default choice for application data: managed by Docker, portable across hosts, and easy to back up. Bind mounts suit development workflows and configuration. tmpfs handles sensitive ephemeral data.

For production databases, consider volume drivers that provide network storage or cloud block storage. This decouples your data from any specific host and enables proper backup strategies.

Now that you understand how Docker stores data, explore Container Images to understand how to build smaller, more secure images, or Docker Compose to see how volumes work in multi-container applications.

Trade-off Summary

Mount Type    Persistence        Performance  Portability            Best For
Named volume  Survives deletion  Good         Portable across hosts  Databases, app data
Bind mount    Host-dependent     Best         Host-dependent         Development, config files
tmpfs         In-memory only     Fastest      N/A (ephemeral)        Secrets, sensitive temp data

Quick Recap Checklist

Use this checklist when working with Docker volumes:

  • Use named volumes for persistent application data (databases, caches)
  • Use bind mounts for development code hot-reload
  • Use tmpfs for sensitive ephemeral data that should never touch disk
  • Always back up volumes before container deletion or upgrades
  • Monitor volume capacity and set up alerts at 80% usage
  • Test backup and restore procedures before production
  • Use volume drivers (EBS, GCE PD) for production stateful workloads
  • Check permissions after migrating volumes between hosts
  • Never fill up volumes — implement cleanup and rotation policies
  • Test volume driver reconnection and failover in staging

