Docker Volumes: Persisting Data Across Container Lifecycles
Understand how to use Docker volumes and bind mounts to persist data, share files between containers, and manage stateful applications.
Containers are ephemeral. Start a container, write some data, delete the container, and your data is gone. Unless you explicitly persist it.
Docker provides two mechanisms for persistent storage: volumes and bind mounts. Understanding when to use which, and how to manage storage for stateful applications, is essential for anyone running databases, file servers, or any application that cares about its data.
Why Volumes Matter
The container philosophy favors stateless applications. You should be able to destroy a container and recreate it from its image, with all configuration and state supplied from outside.
But reality intrudes. Databases need to keep data. File servers store files. Applications have local caches that matter.
Volumes solve this by decoupling storage from container lifecycle. The volume exists independent of any container. Delete the container, the volume persists. Create a new container, mount the same volume, and your data is there.
+------------------+        +------------------+
|   Container A    |        |   Container B    |
|      /data ------+---+----+------ /data      |
+------------------+   |    +------------------+
                       |
         +-------------+-------------+
         |          Volume           |
         |      (postgres_data)      |
         +---------------------------+
This separation also enables interesting patterns: multiple containers sharing the same data, backup strategies that do not interrupt running applications, and storage that travels with the application.
When to Use Each Mount Type
Choose the right mount type based on your requirements:
Use named volumes when:
- You need persistent data that survives container deletion
- You want Docker to manage the storage location
- Portability across hosts matters (volumes work identically on any Docker host)
- You are running databases, caches with persistence, or any stateful application
Use bind mounts when:
- You need development code hot-reload (edit files on host, see changes in container)
- You need to expose host configuration files or certificates
- Performance is critical and you accept the host dependency
Use tmpfs when:
- You are handling sensitive data that should never touch disk
- You need temporary processing space for high-speed operations
- You want data to disappear automatically when the container stops
Types of Mounts
Docker provides three types of mounts, each with different use cases.
Named Volumes
Named volumes are the recommended approach for most persistent data. Docker manages the storage location, which makes backup and migration straightforward.
# Create a volume
docker volume create my_data
# Run a container with the volume
docker run -d \
-v my_data:/var/lib/postgresql/data \
postgres:15-alpine
In Docker Compose:
services:
  db:
    image: postgres:15-alpine
    volumes:
      - postgres_data:/var/lib/postgresql/data
volumes:
  postgres_data:
Docker creates the volume automatically if it does not exist. You can also explicitly define it to set drivers, labels, or other options.
Bind Mounts
Bind mounts map a specific host directory into the container. The host path must exist before you start the container.
# Map host directory into container
docker run -d \
-v /host/path:/container/path \
nginx:latest
In Docker Compose:
services:
  app:
    image: node:20-alpine
    volumes:
      - ./src:/app/src:ro  # Read-only bind mount
      - /host/logs:/app/logs
Bind mounts are useful for development: you edit code on your host and see changes immediately in the container. They are also how you expose configuration files or certificates from the host.
The tradeoff: bind mounts couple your container to the host filesystem. Your container is no longer portable across hosts because it depends on a specific host path existing.
tmpfs Mounts
tmpfs mounts store data in memory only. The data never touches the disk. This is useful for sensitive data that should not persist: session tokens, temporary processing data, secrets you want to ensure disappear when the container stops.
# Store in memory only
docker run -d \
--tmpfs /app/secrets \
myapp:latest
In Docker Compose:
services:
  cache:
    image: redis:7-alpine
    tmpfs:
      - /data  # Entire data directory in memory
tmpfs mounts are limited by memory. If you write too much data, the container gets killed by the OOM killer.
Comparing Mount Types
| Type | Use Case | Persistence | Performance | Portability |
|---|---|---|---|---|
| Named volume | Databases, app data | Survives container deletion | Good | Portable across hosts |
| Bind mount | Development, config files | Depends on host | Best | Host-dependent |
| tmpfs | Secrets, sensitive temp data | In-memory only | Fastest | N/A (ephemeral) |
Creating and Managing Volumes
# Create a volume
docker volume create my_volume
# List all volumes
docker volume ls
# Inspect a volume
docker volume inspect my_volume
# Remove unused volumes
docker volume prune
# Remove specific volume (container must not be using it)
docker volume rm my_volume
Volume Lifecycle
# Run container with volume
docker run -d --name db -v postgres_data:/var/lib/postgresql/data postgres:15-alpine
# Stop and remove container
docker stop db
docker rm db
# Data persists in volume
docker volume ls # postgres_data still there
# Run new container with same volume
docker run -d --name db2 -v postgres_data:/var/lib/postgresql/data postgres:15-alpine
# New container has access to previous data
Volume Lifecycle Flow
flowchart LR
A[Create Volume] --> B[Mount to Container]
B --> C[Write Data]
C --> D[Stop Container]
D --> E[Delete Container]
E --> F{Data Persists?}
F -->|Yes| G[Mount to New Container]
F -->|No| H[Backup Available?]
H -->|Yes| G
H -->|No| I[Data Lost]
G --> C
Volume Initialization
When you mount an empty named volume into a container directory that already has data from the image, Docker copies that data into the volume (this copy happens only for named volumes, not bind mounts):
# Image has /data with default files
# Volume is empty
docker run -v my_volume:/data myimage
# Docker copies /data/* to volume
# Container sees the default files
# Volume now has the initialized data
This matters for database images: the database initializes its data directory on first start if the directory is empty. Mounting an existing volume with an already-initialized database skips the initialization.
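This copy-on-first-mount behavior can be simulated with plain directories; a minimal sketch, where throwaway temp directories stand in for the image's baked-in data and the fresh volume:

```shell
#!/bin/sh
# Simulate Docker's copy-on-first-mount behavior with plain directories.
# defaults_dir stands in for the image's baked-in /data; volume_dir for the volume.
defaults_dir=$(mktemp -d)
volume_dir=$(mktemp -d)

echo "listen 8080" > "$defaults_dir/app.conf"

# Docker seeds the volume from the image only when the volume is empty;
# an already-initialized volume is left untouched.
if [ -z "$(ls -A "$volume_dir")" ]; then
  cp -a "$defaults_dir/." "$volume_dir/"
fi

ls "$volume_dir"
```

Running the sketch a second time against the same `volume_dir` would skip the copy, which is exactly why a pre-initialized database volume bypasses first-start initialization.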
Volume Drivers and Remote Storage
Named volumes can use different drivers for different storage backends. The default local driver stores data on the host filesystem.
Available Drivers
- local: The default driver; stores data on the host filesystem
- rexray/ebs: Amazon EBS persistent block storage
- rexray/gcepd: Google Compute Engine persistent disks
- vieux/sshfs: Mounts a remote filesystem over SSH
- Third-party plugins cover other backends, such as Alibaba Cloud OSS and Azure File Storage
Using the SSHFS Driver
For development across multiple machines:
# Install the driver
docker plugin install vieux/sshfs
# Create volume with SSHFS
docker volume create \
--driver vieux/sshfs \
-o sshcmd=user@remotehost:/data \
-o password=password \
remote_data
The container can now access files on the remote host as if they were local. This is useful for shared development data or accessing remote storage during builds.
EBS Driver Example
For production databases requiring durable block storage:
services:
  db:
    image: postgres:15-alpine
    volumes:
      - db_data:/var/lib/postgresql/data
    deploy:
      resources:
        limits:
          memory: 4G
volumes:
  db_data:
    driver: rexray/ebs
    driver_opts:
      size: 100
      volumetype: gp3
      iops: 3000
The volume gets created on Amazon EBS, attaches to the EC2 instance, and provides durable persistent storage that survives container restarts on different hosts.
Backup and Migration Strategies
Backing up container data requires understanding the volume mount points.
Backup a Volume
# Create backup container with volume mount
docker run --rm \
-v postgres_data:/data \
-v $(pwd):/backup \
alpine \
tar czf /backup/postgres_backup.tar.gz -C /data .
This creates an Alpine container, mounts the volume and the current directory, and archives the volume contents to the host.
Restore from Backup
# Extract backup into volume
docker run --rm \
-v postgres_data:/data \
-v $(pwd):/backup \
alpine \
sh -c "rm -rf /data/* && tar xzf /backup/postgres_backup.tar.gz -C /data"
This removes existing data and extracts the backup. Use with caution.
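A slightly safer variant verifies the archive is readable before wiping anything. This sketch uses throwaway temp directories in place of a real volume:

```shell
#!/bin/sh
# Safer restore sketch: verify the archive before destroying existing data.
# All paths here are illustrative temp directories, not a real volume.
workdir=$(mktemp -d)
restore_dir=$(mktemp -d)
echo "hello" > "$workdir/file.txt"
tar czf "$workdir/backup.tar.gz" -C "$workdir" file.txt

# `tar tzf` lists the archive without extracting; a corrupt file fails here,
# before any existing data has been deleted.
if tar tzf "$workdir/backup.tar.gz" > /dev/null 2>&1; then
  rm -rf "${restore_dir:?}"/*   # ${var:?} aborts if the variable is unset/empty
  tar xzf "$workdir/backup.tar.gz" -C "$restore_dir"
else
  echo "archive unreadable, aborting restore" >&2
  exit 1
fi

cat "$restore_dir/file.txt"   # hello
```

The `${restore_dir:?}` guard is worth copying verbatim: it turns a forgotten variable into an immediate error instead of an `rm -rf /*`.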
Using Named Volumes with Docker Compose
Docker Compose makes backup straightforward:
# Start services
docker-compose up -d
# Create backup of every volume belonging to this Compose project
# (Compose labels volumes with the project name, which defaults to the directory name)
for volume in $(docker volume ls -q --filter "label=com.docker.compose.project=$(basename "$PWD")"); do
docker run --rm \
-v ${volume}:/data \
-v $(pwd):/backup \
alpine \
tar czf /backup/${volume}.tar.gz -C /data .
done
Volume Migration Between Hosts
Moving a volume to a different Docker host:
- Stop the application using the volume
- Create a tar archive of the volume
- Copy the archive to the new host
- Extract into a new volume
- Start the application with the new volume
# On source host
docker stop myapp
docker run --rm \
-v my_volume:/data \
-v $(pwd):/backup \
alpine \
tar czf /backup/my_volume.tar.gz -C /data .
# Copy to new host
scp my_volume.tar.gz newhost:/tmp/
# On new host
docker volume create my_volume
docker run --rm \
-v my_volume:/data \
-v /tmp:/backup \
alpine \
tar xzf /backup/my_volume.tar.gz -C /data
Sharing Data Between Containers
Multiple containers can mount the same volume. This enables patterns like:
- Shared configuration: A volume with config files mounted into multiple services
- Shared data: A data volume mounted by a producer and consumer
- Web server logs: Centralized logging from multiple web server containers
Example: Shared Configuration
services:
  web:
    image: nginx:latest
    volumes:
      - config:/etc/nginx/conf.d
  api:
    image: myapi:latest
    volumes:
      - config:/app/config
volumes:
  config:
A config management sidecar could populate the config volume, and both services would see the updated configuration without restart.
Example: Producer-Consumer Pattern
services:
  producer:
    image: myproducer:latest
    volumes:
      - shared_data:/app/output
  consumer:
    image: myconsumer:latest
    volumes:
      - shared_data:/app/input
    depends_on:
      - producer
volumes:
  shared_data:
The producer writes to /app/output, and the consumer reads from /app/input. Both hit the same underlying storage.
Warning: Concurrent Write Access
If multiple containers write to the same volume without coordination, data corruption is possible. Docker does not provide locking or coordination.
For databases, use database-native replication or clustering. For file-based data, implement application-level locking or use a distributed filesystem designed for concurrent access.
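For file-based data on a single host, one option is `flock(1)` from util-linux. A minimal Linux-only sketch, with a temp directory standing in for the shared volume's mount point:

```shell
#!/bin/sh
# Minimal file-locking sketch using flock(1) from util-linux (Linux-specific).
# "shared" stands in for the shared volume's mount point.
shared=$(mktemp -d)

append_record() {
  (
    flock -x 9                       # block until the exclusive lock is held
    echo "$1" >> "$shared/data.log"
  ) 9> "$shared/.lock"               # FD 9 is the lock file inside the volume
}

append_record "writer-1"
append_record "writer-2"
wc -l < "$shared/data.log"   # 2 records, appended without interleaving
```

Every writer must agree on the same lock file for this to work; flock coordinates cooperating processes only, it does not stop an unaware container from writing directly.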
Volume Troubleshooting
Check Volume Mounts
# See volume mounts for a container
docker inspect -f '{{json .Mounts}}' mycontainer | jq
# List unused (dangling) volumes not referenced by any container
docker volume ls -f dangling=true
Capacity Estimation
Estimate volume size before deployment to avoid running out of space:
PostgreSQL:
| Tier | Data Size | Recommended Storage |
|---|---|---|
| Development | 1-10GB | Local volume |
| Small production | 10-50GB | Cloud block storage (EBS, PD) |
| Medium | 50-500GB | Network storage with HA |
| Large | 500GB+ | Distributed storage with replication |
Rule of thumb: allocate 3x your expected data size to account for WAL logs, indexes, and temporary tables.
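That rule of thumb is trivial to script; a tiny helper (the function name is illustrative):

```shell
#!/bin/sh
# The 3x rule of thumb: volume allocation = expected data size * 3,
# leaving headroom for WAL, indexes, and temporary tables.
estimate_volume_gb() {
  echo $(( $1 * 3 ))
}

estimate_volume_gb 50   # a 50GB dataset suggests a 150GB volume
```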
MySQL:
| Tier | Data Size | Recommended Storage |
|---|---|---|
| Development | 1-10GB | Local volume |
| Small production | 10-100GB | Cloud block storage |
| Medium | 100GB-1TB | SAN storage with snapshots |
| Large | 1TB+ | Distributed database storage |
Redis:
Redis is memory-first. Allocate maxmemory + 20-30% overhead for persistence (RDB snapshots and AOF).
# Redis memory estimation
maxmemory 2gb # Expected data
# Volume needs: ~2.6GB (data + background save buffers)
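The same estimate as a small helper, assuming the 30% upper bound of the overhead range above:

```shell
#!/bin/sh
# Redis volume estimate: maxmemory plus ~30% persistence overhead
# (RDB snapshots and AOF). awk handles the fractional multiplication.
redis_volume_gb() {
  awk -v mem="$1" 'BEGIN { printf "%.1f\n", mem * 1.3 }'
}

redis_volume_gb 2   # 2.6 -> matches the ~2.6GB figure above
```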
Repair Volume Permissions
Containers often run as non-root. If you copy data into a volume from an image, the data is owned by whatever user the image uses.
# Fix permissions by running as root temporarily
docker run --rm -v my_volume:/data alpine chown -R 1001:1001 /data
When Volumes Fill Up
# Check Docker data directory size
docker system df -v
# Find large volumes
docker volume ls -q | xargs -I{} docker volume inspect {} --format '{{.Name}}: {{.Mountpoint}}'
# Clean up
docker volume prune # Removes unused volumes
docker system prune -a # Removes unused images and containers too
Production Failure Scenarios
Docker volumes fail in ways that are not always obvious. Here are the most common issues.
Volume Fill-Up Causing Container Crash
When a volume fills up, the application running inside the container typically crashes or enters a read-only state.
Symptoms: Database refuses writes, “No space left on device” errors in container logs.
Diagnosis:
# Check volume size on host
df -h /var/lib/docker/volumes/postgres_data/_data
# Check inside container
docker exec -it postgres df -h
Mitigation: Set up monitoring on volume capacity. Use volume size limits where supported. Implement cleanup policies for old data.
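A cleanup policy can be as simple as a cron-driven find. A sketch (the volume path in the comment is a hypothetical example):

```shell
#!/bin/sh
# Cleanup-policy sketch: delete files older than N days under a data directory.
cleanup_old_files() {
  data_dir=$1
  days=${2:-7}
  find "$data_dir" -type f -mtime +"$days" -print -delete
}

# e.g. from cron, against the volume's host mount point (illustrative path):
# cleanup_old_files /var/lib/docker/volumes/app_logs/_data 7
```

`-print -delete` logs each file as it is removed, which makes the cron output auditable.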
Permission Denied After Volume Migration
When migrating volumes between hosts or after restoring from backup, permissions may not match what the container expects.
Symptoms: “Permission denied” errors when container tries to read/write volume, even though data exists.
Diagnosis:
# Check volume ownership on host
ls -la /var/lib/docker/volumes/postgres_data/_data
# Check container user and group
docker exec -it postgres id
Mitigation: Use docker run -u to match expected UID/GID, or ensure backup/restore preserves ownership. Consider using volume drivers that handle permission mapping.
Volume Driver Plugin Failure
Remote volume drivers (SSHFS, EBS, etc.) can fail due to network issues or authentication problems.
Symptoms: docker: Error response from daemon: VolumeDriver.Mount: rpc error: timeout.
Diagnosis:
# Check plugin status
docker plugin ls
# Check volume status
docker volume inspect my_volume
# Test SSH connectivity manually
ssh user@remotehost ls /data
Mitigation: Use volume driver plugins that support HA and reconnection. Test connectivity regularly. Consider using cloud-native storage (EBS, GCE Persistent Disk) with proper retry logic.
Volume Driver Issues
If a volume driver fails:
# Check driver status
docker plugin ls
# Enable a plugin if needed
docker plugin enable vieux/sshfs
# Inspect volume driver details
docker volume inspect my_volume
Observability Hooks
Track these metrics to prevent volume-related incidents:
Volume Capacity:
# Check volume size
docker volume inspect my_volume --format '{{.Mountpoint}}'
du -sh /var/lib/docker/volumes/my_volume/_data
# Set up alerting on volume usage
# 80% usage = warning threshold
df -h /var/lib/docker/volumes/
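Such an alert can be sketched as a small threshold check; the `check_usage` helper and the commented df pipeline are illustrative, not part of Docker:

```shell
#!/bin/sh
# Threshold check: print WARN at or above 80% usage, OK below.
check_usage() {
  pct=$1
  threshold=${2:-80}
  if [ "$pct" -ge "$threshold" ]; then
    echo "WARN: ${pct}% used"
  else
    echo "OK: ${pct}% used"
  fi
}

# Feed it the live number for the volumes directory (GNU df assumed):
# check_usage "$(df --output=pcent /var/lib/docker/volumes | tail -1 | tr -dc '0-9')"
check_usage 85   # WARN: 85% used
```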
Backup Success/Failure:
# Check recent backup files
ls -la /backups/postgres_*.tar.gz
# Alert if no backup in 24h
find /backups -name "*.tar.gz" -mtime -1
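The 24h rule can be wrapped in an exit-code check suitable for cron; the `/backups` directory, `*.tar.gz` pattern, and helper name are illustrative:

```shell
#!/bin/sh
# Freshness check: succeed only if a backup newer than 24h exists.
backup_is_fresh() {
  dir=$1
  recent=$(find "$dir" -name '*.tar.gz' -mtime -1 2>/dev/null | head -1)
  [ -n "$recent" ]
}

if backup_is_fresh /backups; then
  echo "backup OK"
else
  echo "ALERT: no backup in the last 24h" >&2
fi
```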
Volume I/O Metrics:
# Monitor container disk I/O
docker stats --no-stream
# Check the kernel log for disk I/O errors
dmesg | grep -i 'i/o error'
Conclusion
Volumes are how Docker handles persistent data. Named volumes are the default choice for application data: managed by Docker, portable across hosts, and easy to back up. Bind mounts suit development workflows and configuration. tmpfs handles sensitive ephemeral data.
For production databases, consider volume drivers that provide network storage or cloud block storage. This decouples your data from any specific host and enables proper backup strategies.
Now that you understand how Docker stores data, explore Container Images to understand how to build smaller, more secure images, or Docker Compose to see how volumes work in multi-container applications.
Trade-off Summary
| Mount Type | Persistence | Performance | Portability | Best For |
|---|---|---|---|---|
| Named volume | Survives deletion | Good | Portable across hosts | Databases, app data |
| Bind mount | Host-dependent | Best | Host-dependent | Development, config files |
| tmpfs | In-memory only | Fastest | N/A (ephemeral) | Secrets, sensitive temp data |
Quick Recap Checklist
Use this checklist when working with Docker volumes:
- Use named volumes for persistent application data (databases, caches)
- Use bind mounts for development code hot-reload
- Use tmpfs for sensitive ephemeral data that should never touch disk
- Always back up volumes before container deletion or upgrades
- Monitor volume capacity and set up alerts at 80% usage
- Test backup and restore procedures before production
- Use volume drivers (EBS, GCE PD) for production stateful workloads
- Check permissions after migrating volumes between hosts
- Never fill up volumes — implement cleanup and rotation policies
- Test volume driver reconnection and failover in staging