Docker Volumes: Persisting Data Across Container Lifecycles
Understand how to use Docker volumes and bind mounts to persist data, share files between containers, and manage stateful applications.
Containers are ephemeral. Start a container, write some data, delete the container, and your data is gone. Unless you explicitly persist it.
Docker provides two mechanisms for persistent storage: volumes and bind mounts. Knowing when to use which, and how to manage storage for stateful applications, is essential for anyone running databases, file servers, or any application that cares about its data.
Introduction
The container philosophy favors stateless applications. Your application should be able to be destroyed and recreated from the image, with all configuration and state coming from the outside.
But reality intrudes. Databases need to keep data. File servers store files. Applications have local caches that matter.
Volumes solve this by decoupling storage from container lifecycle. The volume exists independent of any container. Delete the container, the volume persists. Create a new container, mount the same volume, and your data is there.
+------------------+      +------------------+
|   Container A    |      |   Container B    |
|      /data       |      |      /data       |
+--------+---------+      +--------+---------+
         |                         |
+--------+-------------------------+---------+
|                   Volume                   |
|              (postgres_data)               |
+--------------------------------------------+
This separation also enables interesting patterns: multiple containers sharing the same data, backup strategies that do not interrupt running applications, and storage that travels with the application.
When to Use Each Mount Type
Choose the right mount type based on your requirements:
Use named volumes when:
- You need persistent data that survives container deletion
- You want Docker to manage the storage location, so the container does not depend on a host path
- Portability across hosts matters (volumes work identically on any Docker host)
- You are running databases, caches with persistence, or any stateful application
Use bind mounts when:
- You need development code hot-reload (edit files on host, see changes in container)
- You need to expose host configuration files or certificates
- Performance is critical and you accept the host dependency
Use tmpfs when:
- You are handling sensitive data that should never touch disk
- You need temporary processing space for high-speed operations
- You want data to disappear automatically when the container stops
Types of Mounts
Docker provides three types of mounts, each with different use cases.
Named Volumes
Named volumes are the recommended approach for most persistent data. Docker manages the storage location, and because the data lives in one Docker-managed place, it is straightforward to back up or migrate with a helper container.
# Create a volume
docker volume create my_data
# Run a container with the volume
docker run -d \
-v my_data:/var/lib/postgresql/data \
postgres:15-alpine
In Docker Compose:
services:
  db:
    image: postgres:15-alpine
    volumes:
      - postgres_data:/var/lib/postgresql/data
volumes:
  postgres_data:
Docker creates the volume automatically if it does not exist. You can also explicitly define it to set drivers, labels, or other options.
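Explicit creation is also available from the CLI. The sketch below wraps docker volume create in a small helper; the helper name, the project=demo label, and the volume name are illustrative assumptions, while --driver and --label are standard options of that command.

```shell
# Hypothetical helper: create a named volume with an explicit driver and one label.
create_labeled_volume() {
    local name=$1 label=$2
    docker volume create --driver local --label "$label" "$name"
}

# Example (names are placeholders):
#   create_labeled_volume my_data project=demo
#   docker volume ls --filter label=project=demo
```

Labels make it easier to find related volumes later, for example when selecting which ones to back up.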
Bind Mounts
Bind mounts map a specific host directory into the container. The host path should exist before you start the container: with -v, Docker creates a missing path as an empty directory, while --mount fails with an error.
# Map host directory into container
docker run -d \
-v /host/path:/container/path \
nginx:latest
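Because a mistyped host path can silently become an empty directory, it can help to check the source before starting the container. This is a minimal sketch; the helper name is hypothetical and the paths are the placeholders used above.

```shell
# Hypothetical helper: fail early if a bind-mount source is missing,
# instead of letting Docker create an empty directory in its place.
require_bind_source() {
    local src=$1
    if [ ! -e "$src" ]; then
        echo "bind mount source not found: $src" >&2
        return 1
    fi
}

# Example (placeholder paths and image):
#   require_bind_source /host/path && \
#     docker run -d --mount type=bind,source=/host/path,target=/container/path nginx:latest
```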
In Docker Compose:
services:
  app:
    image: node:20-alpine
    volumes:
      - ./src:/app/src:ro # Read-only bind mount
      - /host/logs:/app/logs
Bind mounts are useful for development: you edit code on your host and see changes immediately in the container. They are also how you expose configuration files or certificates from the host.
The tradeoff: bind mounts couple your container to the host filesystem. Your container is no longer portable across hosts because it depends on a specific host path existing.
tmpfs Mounts
tmpfs mounts store data in memory only, so it is never written to the container's writable layer or to a volume (if the host has swap enabled, pages may still be swapped out). This is useful for sensitive data that should not persist: session tokens, temporary processing data, secrets you want to ensure disappear when the container stops.
# Store in memory only
docker run -d \
--tmpfs /app/secrets \
myapp:latest
In Docker Compose:
services:
  cache:
    image: redis:7-alpine
    tmpfs:
      - /data # Entire data directory in memory
tmpfs mounts are limited by memory. If you write too much data, the container gets killed by the OOM killer.
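One way to reduce that risk is to cap the tmpfs size, so a runaway writer gets a "no space" error instead of exhausting memory. The sketch below assumes the size= mount option accepted by --tmpfs; the wrapper name, image, and 64m figure are illustrative.

```shell
# Hypothetical wrapper: start a container with a size-capped, non-executable tmpfs.
run_with_tmpfs() {
    local image=$1 target=$2 size=$3
    docker run -d --tmpfs "${target}:rw,noexec,nosuid,size=${size}" "$image"
}

# Example (placeholder image and path):
#   run_with_tmpfs myapp:latest /app/secrets 64m
```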
Comparing Mount Types
| Type | Use Case | Persistence | Performance | Portability |
|---|---|---|---|---|
| Named volume | Databases, app data | Survives container deletion | Good | Portable across hosts |
| Bind mount | Development, config files | Depends on host | Best | Host-dependent |
| tmpfs | Secrets, sensitive temp data | In-memory only | Fastest | N/A (ephemeral) |
Creating and Managing Volumes
# Create a volume
docker volume create my_volume
# List all volumes
docker volume ls
# Inspect a volume
docker volume inspect my_volume
# Remove unused volumes
docker volume prune
# Remove specific volume (container must not be using it)
docker volume rm my_volume
Volume Lifecycle
# Run container with volume
docker run -d --name db -v postgres_data:/var/lib/postgresql/data postgres:15-alpine
# Stop and remove container
docker stop db
docker rm db
# Data persists in volume
docker volume ls # postgres_data still there
# Run new container with same volume
docker run -d --name db2 -v postgres_data:/var/lib/postgresql/data postgres:15-alpine
# New container has access to previous data
Volume Lifecycle Flow
flowchart LR
A[Create Volume] --> B[Mount to Container]
B --> C[Write Data]
C --> D[Stop Container]
D --> E[Delete Container]
E --> F{Data Persists?}
F -->|Yes| G[Mount to New Container]
F -->|No| H[Backup Available?]
H -->|Yes| G
H -->|No| I[Data Lost]
G --> C
Volume Initialization
When you mount an empty volume into a container directory that already has data (from the image), Docker copies that data into the volume:
# Image has /data with default files
# Volume is empty
docker run -v my_volume:/data myimage
# Docker copies /data/* to volume
# Container sees the default files
# Volume now has the initialized data
This matters for database images: the database initializes its data directory on first start if the directory is empty. Mounting an existing volume with an already-initialized database skips the initialization.
Volume Drivers and Remote Storage
Named volumes can use different drivers for different storage backends. The default local driver stores data on the host filesystem.
Available Drivers
- local: Default driver, stores data on host filesystem
- rexray/ebs: Amazon EBS for persistent block storage
- rexray/gfs: Google Filestore
- vieux/sshfs: Mount remote filesystem over SSH
- cloudn/ossfs: Alibaba Cloud OSS
- Azure Files: Azure File Storage
Using the SSHFS Driver
For development across multiple machines:
# Install the driver
docker plugin install vieux/sshfs
# Create volume with SSHFS
docker volume create \
--driver vieux/sshfs \
-o sshcmd=user@remotehost:/data \
-o password=password \
remote_data
The container can now access files on the remote host as if they were local. This is useful for shared development data or accessing remote storage during builds.
EBS Driver Example
For production databases requiring durable block storage:
services:
  db:
    image: postgres:15-alpine
    volumes:
      - db_data:/var/lib/postgresql/data
    deploy:
      resources:
        limits:
          memory: 4G
volumes:
  db_data:
    driver: rexray/ebs
    driver_opts:
      size: 100
      volumetype: gp3
      iops: 3000
The volume gets created on Amazon EBS, attaches to the EC2 instance, and provides durable persistent storage that survives container restarts on different hosts.
Backup and Migration Strategies
Backing up container data requires understanding the volume mount points.
Backup a Volume
# Create backup container with volume mount
docker run --rm \
-v postgres_data:/data \
-v $(pwd):/backup \
alpine \
tar czf /backup/postgres_backup.tar.gz -C /data .
This creates an Alpine container, mounts the volume and the current directory, and archives the volume contents to the host.
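A backup is only useful if it can be read back. The following sketch checks an archive on the host after it is written; the function name is an assumption, and it relies only on standard gzip and tar options.

```shell
# Hypothetical helper: confirm a backup archive is readable before relying on it.
verify_backup() {
    local archive=$1
    # gzip -t checks the compressed stream; tar -t walks every archive entry.
    gzip -t "$archive" && tar -tzf "$archive" > /dev/null
}

# Example:
#   verify_backup ./postgres_backup.tar.gz && echo "backup looks readable"
```

This only proves the archive is structurally intact. For a database, a test restore into a scratch volume is the stronger check.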
Restore from Backup
# Extract backup into volume
docker run --rm \
-v postgres_data:/data \
-v $(pwd):/backup \
alpine \
sh -c "find /data -mindepth 1 -delete && tar xzf /backup/postgres_backup.tar.gz -C /data"
This removes existing data and extracts the backup. Use with caution.
Using Named Volumes with Docker Compose
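Compose labels every volume it creates with the project name, so you can back them all up in one loop:
# Start services
docker-compose up -d
# Back up every volume that belongs to the Compose project
# (replace myproject with your project name; it defaults to the directory name)
for volume in $(docker volume ls -q --filter label=com.docker.compose.project=myproject); do
docker run --rm \
-v "${volume}:/data" \
-v "$(pwd):/backup" \
alpine \
tar czf "/backup/${volume}.tar.gz" -C /data .
done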
Volume Migration Between Hosts
Moving a volume to a different Docker host:
- Stop the application using the volume
- Create a tar archive of the volume
- Copy the archive to the new host
- Extract into a new volume
- Start the application with the new volume
# On source host
docker stop myapp
docker run --rm \
-v my_volume:/data \
-v $(pwd):/backup \
alpine \
tar czf /backup/my_volume.tar.gz -C /data .
# Copy to new host
scp my_volume.tar.gz newhost:/tmp/
# On new host
docker volume create my_volume
docker run --rm \
-v my_volume:/data \
-v /tmp:/backup \
alpine \
tar xzf /backup/my_volume.tar.gz -C /data
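Before extracting on the new host, it is worth confirming that the archive arrived intact. A minimal sketch, assuming GNU coreutils sha256sum is available on both hosts; the function names are hypothetical.

```shell
# Print only the SHA-256 digest of a file (assumes sha256sum is installed).
archive_digest() {
    sha256sum "$1" | awk '{ print $1 }'
}

# Succeed only when both files have the same digest.
same_archive() {
    [ "$(archive_digest "$1")" = "$(archive_digest "$2")" ]
}

# Example: run archive_digest my_volume.tar.gz on the source host and
# archive_digest /tmp/my_volume.tar.gz on the new host, then compare the output.
```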
Sharing Data Between Containers
Multiple containers can mount the same volume. This enables patterns like:
- Shared configuration: A volume with config files mounted into multiple services
- Shared data: A data volume mounted by a producer and consumer
- Web server logs: Centralized logging from multiple web server containers
Example: Shared Configuration
services:
  web:
    image: nginx:latest
    volumes:
      - config:/etc/nginx/conf.d
  api:
    image: myapi:latest
    volumes:
      - config:/app/config
volumes:
  config:
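A config management sidecar could populate the config volume, and both services would see the updated files immediately. Whether they apply the change without a restart depends on the application: nginx, for example, reads its configuration at startup and needs a reload.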
Example: Producer-Consumer Pattern
services:
  producer:
    image: myproducer:latest
    volumes:
      - shared_data:/app/output
    depends_on:
      - consumer
  consumer:
    image: myconsumer:latest
    volumes:
      - shared_data:/app/input
volumes:
  shared_data:
The producer writes to /app/output, and the consumer reads from /app/input. Both hit the same underlying storage.
Warning: Concurrent Write Access
If multiple containers write to the same volume without coordination, data corruption is possible. Docker does not provide locking or coordination.
For databases, use database-native replication or clustering. For file-based data, implement application-level locking or use a distributed filesystem designed for concurrent access.
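As an illustration of application-level locking, the sketch below serializes appends to a file on a shared volume using a lock directory. It relies on mkdir being atomic on a local filesystem; the function name is hypothetical, and this is one simple approach rather than a complete locking scheme (for instance, it does not recover from a writer that dies while holding the lock).

```shell
# Hypothetical helper: append one line to a shared file while holding a lock.
# mkdir either creates the directory or fails, so only one writer proceeds at a time.
append_locked() {
    local file=$1 line=$2
    local lock="${file}.lock"
    until mkdir "$lock" 2>/dev/null; do
        sleep 0.1   # another writer holds the lock; retry shortly
    done
    printf '%s\n' "$line" >> "$file"
    rmdir "$lock"   # release the lock
}

# Example (placeholder path on the shared volume):
#   append_locked /app/output/events.log "job 42 finished"
```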
Volume Troubleshooting
Diagnosing Volume Problems
Check Volume Mounts
# See volume mounts for a container
docker inspect -f '{{json .Mounts}}' mycontainer | jq
# List volumes not referenced by any container
docker volume ls -f dangling=true
Capacity Estimation
Estimate volume size before deployment to avoid running out of space:
PostgreSQL:
| Deployment | Data Size | Recommended Storage |
|---|---|---|
| Development | 1-10GB | Local volume |
| Small production | 10-50GB | Cloud block storage (EBS, PD) |
| Medium | 50-500GB | Network storage with HA |
| Large | 500GB+ | Distributed storage with replication |
Rule of thumb: allocate 3x your expected data size to account for WAL logs, indexes, and temporary tables.
MySQL:
| Deployment | Data Size | Recommended Storage |
|---|---|---|
| Development | 1-10GB | Local volume |
| Small production | 10-100GB | Cloud block storage |
| Medium | 100GB-1TB | SAN storage with snapshots |
| Large | 1TB+ | Distributed database storage |
Redis:
Redis is memory-first. Allocate maxmemory + 20-30% overhead for persistence (RDB snapshots and AOF).
# Redis memory estimation
maxmemory 2gb # Expected data
# Volume needs: ~2.6GB (data + background save buffers)
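The PostgreSQL and Redis rules of thumb above reduce to simple arithmetic. The helpers below are a sketch with hypothetical names; they use integer shell arithmetic, so results are rounded down.

```shell
# PostgreSQL rule of thumb from the text: 3x the expected data size (GB in, GB out).
pg_volume_gb() {
    echo $(( $1 * 3 ))
}

# Redis rule of thumb: maxmemory plus an overhead percentage (MB in, MB out).
# The overhead defaults to 30, the upper end of the 20-30% range.
redis_volume_mb() {
    local maxmemory_mb=$1 overhead_pct=${2:-30}
    echo $(( maxmemory_mb * (100 + overhead_pct) / 100 ))
}

# Examples:
#   pg_volume_gb 50        # 150
#   redis_volume_mb 2048   # 2662, roughly the 2.6GB quoted above
```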
Repair Volume Permissions
Containers often run as non-root. If you copy data into a volume from an image, the data is owned by whatever user the image uses.
# Fix permissions by running as root temporarily
docker run --rm -v my_volume:/data alpine chown -R 1001:1001 /data
When Volumes Fill Up
# Check Docker data directory size
docker system df -v
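# Print each volume's host path (per-volume sizes appear in the docker system df -v output)
docker volume ls -q | xargs -I{} docker volume inspect {} --format '{{.Name}}: {{.Mountpoint}}'
# Clean up
docker volume prune # Removes unused volumes (recent Docker releases need -a to include named volumes)
docker system prune -a # Removes stopped containers, unused networks, and unused images; volumes only with --volumes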
Volume Fill-Up Causing Container Crash
When a volume fills up, the application running inside the container typically crashes or enters a read-only state.
Symptoms: Database refuses writes, “No space left on device” errors in container logs.
Diagnosis:
# Check volume size on host
df -h /var/lib/docker/volumes/postgres_data/_data
# Check inside container
docker exec -it postgres df -h
Mitigation: Set up monitoring on volume capacity. Use volume size limits where supported. Implement cleanup policies for old data.
Permission Denied After Volume Migration
When migrating volumes between hosts or after restoring from backup, permissions may not match what the container expects.
Symptoms: “Permission denied” errors when container tries to read/write volume, even though data exists.
Diagnosis:
# Check volume ownership on host
ls -la /var/lib/docker/volumes/postgres_data/_data
# Check container user and group
docker exec -it postgres id
Mitigation: Use docker run -u to match expected UID/GID, or ensure backup/restore preserves ownership. Consider using volume drivers that handle permission mapping.
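To find the UID/GID to pass to -u, you can read the numeric owner of the restored data. This sketch assumes GNU coreutils stat (the -c option differs on BSD/macOS); the function name is hypothetical.

```shell
# Print the numeric owner of a path as UID:GID (assumes GNU stat).
owner_of() {
    stat -c '%u:%g' "$1"
}

# Example (reading the Docker data directory usually requires root):
#   docker run -u "$(owner_of /var/lib/docker/volumes/postgres_data/_data)" ...
```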
Common Pitfalls / Anti-Patterns
Remote volume drivers (SSHFS, EBS, etc.) can fail due to network issues or authentication problems.
Symptoms: docker: Error response from daemon: VolumeDriver.Mount: rpc error: timeout.
Diagnosis:
# Check plugin status
docker plugin ls
# Check volume status
docker volume inspect my_volume
# Test SSH connectivity manually
ssh user@remotehost ls /data
Mitigation: Use volume driver plugins that support HA and reconnection. Test connectivity regularly. Consider using cloud-native storage (EBS, GCE Persistent Disk) with proper retry logic.
If a volume driver fails:
# Check driver status
docker plugin ls
# Enable a plugin if needed
docker plugin enable vieux/sshfs
# Inspect volume driver details
docker volume inspect my_volume
Observability Hooks
Track these metrics to prevent volume-related incidents:
Volume Capacity:
# Check volume size (reading the mountpoint usually requires root)
du -sh "$(docker volume inspect my_volume --format '{{.Mountpoint}}')"
# Set up alerting on volume usage
# 80% usage = warning threshold
df -h /var/lib/docker/volumes/
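The 80% warning threshold can be turned into a check that a cron job or monitoring agent runs. A minimal sketch, assuming POSIX df -P output; the function names are hypothetical.

```shell
# Print the used-space percentage (integer, no % sign) of the filesystem holding a path.
usage_percent() {
    df -P "$1" | awk 'NR == 2 { sub(/%/, "", $5); print $5 }'
}

# Return 0 when usage is below the threshold (default 80); otherwise warn and return 1.
check_usage() {
    local path=$1 threshold=${2:-80} used
    used=$(usage_percent "$path")
    if [ "$used" -ge "$threshold" ]; then
        echo "WARNING: $path is ${used}% full (threshold ${threshold}%)" >&2
        return 1
    fi
}

# Example:
#   check_usage /var/lib/docker/volumes/ 80
```

Note that df reports on the whole filesystem that holds the path, not on an individual volume, unless the volume has its own filesystem.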
Backup Success/Failure:
# Check recent backup files
ls -la /backups/postgres_*.tar.gz
# Alert if no backup in 24h
find /backups -name "*.tar.gz" -mtime -1
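The find command prints recent backups but does not fail when there are none. To alert on a missing backup, wrap it in a check that returns a non-zero status; the function name below is hypothetical.

```shell
# Succeed only if at least one *.tar.gz in the directory was modified in the last 24 hours.
has_recent_backup() {
    local dir=$1
    [ -n "$(find "$dir" -name '*.tar.gz' -mtime -1)" ]
}

# Example:
#   has_recent_backup /backups || echo "ALERT: no backup in the last 24h" >&2
```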
Volume I/O Metrics:
# Monitor container disk I/O
docker stats --no-stream
# Check for I/O errors
dmesg | grep -i docker
Interview Questions
Expected answer points:
- Named volumes: Docker-managed persistent storage; survives container deletion; portable across hosts
- Bind mounts: Map host directory into container; useful for development with live code reloading
- tmpfs mounts: Store data in memory only; fastest option; data lost when container stops
- Choose based on persistence needs, portability requirements, and performance constraints
Expected answer points:
- Docker copies the existing data from the image into the volume on first mount
- The volume appears pre-populated when accessed from inside the container
- This allows database images to initialize their data directory on first start
- Mounting an already-initialized volume skips the initialization
- Important for reproducible container setups
Expected answer points:
- Create a backup container with the volume mounted and current directory as backup location
- Use tar to archive the volume contents: tar czf /backup/volume.tar.gz -C /data .
- Restore by extracting into an existing or new volume
- For production, consider volume drivers with built-in snapshot/backup features (EBS, GCE PD)
- Test backup/restore procedures before production deployment
Expected answer points:
- Bind mounts couple containers to host filesystem; container can potentially modify host files
- If container is compromised, attacker may gain access to host directories
- Containers should not run as root when using bind mounts to prevent privilege escalation
- Use read-only bind mounts (:ro) when container only needs to read data
- Avoid bind mounts for application data that needs Docker management features
Expected answer points:
- Docker does not provide locking or coordination for shared volume access
- Concurrent writes without coordination can cause data corruption
- For databases, use database-native replication or clustering features
- For file-based data, implement application-level locking or use distributed filesystems
- Consider using a sidecar pattern where one container manages the data
- Read-only bind mounts can prevent accidental writes
Expected answer points:
- Volume drivers like rexray/ebs provide persistent block storage from cloud providers
- Data survives container migration to different hosts
- Provides HA and replication features of cloud storage
- Enables proper backup strategies independent of container lifecycle
- Decouples data from any specific host; containers become truly ephemeral
- Tradeoff: adds complexity and may have different performance characteristics
Expected answer points:
- tmpfs stores data in memory only; data never touches disk
- When the container stops, the tmpfs mount and its data are gone (note that tmpfs pages can still be swapped to disk if the host has swap enabled)
- Ideal for secrets, session tokens, temporary processing data
- tmpfs is fastest storage option; limited only by memory
- Tradeoff: limited by memory size; cannot persist data; not suitable for all workloads
- Remember container can still be inspected if host is compromised
Expected answer points:
- Containers often run as non-root; data copied into volume from image is owned by image user
- When migrating volumes between hosts, UID/GID may not match the new container
- Fix by running container with matching -u flag or fixing ownership with chown
- Use docker run --rm -v volume:/data alpine chown -R UID:GID /data to fix permissions
- Backup/restore operations may change ownership; verify after restore
Expected answer points:
- PostgreSQL: allocate 3x expected data size for WAL logs, indexes, and temporary tables
- MySQL: similar overhead for InnoDB logs and indexes
- Redis: maxmemory + 20-30% overhead for persistence (RDB snapshots and AOF)
- Consider growth rate and retention policies when sizing
- Set up monitoring at 80% usage threshold
- For production, use cloud block storage with easy expansion options
Expected answer points:
- When volume fills up, application typically crashes or enters read-only state
- Database refuses writes; "No space left on device" errors appear in logs
- Prevention: set up monitoring on volume capacity with alerts at 80%
- Implement cleanup and rotation policies for old data
- Use volume size limits where supported by the driver
- Regularly check docker system df -v for volume usage
Expected answer points:
- Bind mounts: host-dependent, fastest performance, but not portable across hosts
- Named volumes: Docker-managed, portable, and easy to back up or migrate with a helper container
- Bind mounts can have permission issues when container runs as non-root
- Named volumes handle permission management more gracefully
- For production databases, named volumes are generally preferred
- Bind mounts may be needed for legacy systems with specific path requirements
Expected answer points:
- Volumes persist independent of container lifecycle; deletion is separate
- When container is removed, volume remains with its data intact
- New containers can mount the same volume and access previous data
- Named volumes must be explicitly deleted with docker volume rm
- Dangling volumes (not used by any container) can be pruned with docker volume prune
Expected answer points:
- Network connectivity issues can cause volume mount to timeout
- Authentication problems with remote credentials block volume access
- EBS volumes may become detached if instance is stopped
- SSHFS performance degrades with network latency
- Mitigation: use HA configurations, test failover in staging, use cloud-native storage
- Monitor volume driver connectivity and set up alerts for disconnection
Expected answer points:
- Stop the application using the volume
- Create a tar archive of the volume using a backup container
- Copy the archive to the new host via scp or similar
- Create a new volume on the new host
- Extract the archive into the new volume
- Start the application with the new volume mount
- For cloud volumes, use provider migration tools instead
Expected answer points:
- Volume initialization happens when an empty volume is first mounted to a container
- Docker copies the image's existing content at the mount path into the named volume
- Volume attachment is the act of making a volume available to a container
- Initialization only happens once; subsequent mounts just attach the volume
- If volume already contains data, initialization is skipped
Expected answer points:
- Monitor volume capacity: df -h on host, docker volume ls with inspect
- Track I/O metrics: docker stats --no-stream shows block I/O
- Set up alerting at 80% capacity threshold
- Monitor backup success/failure with checks for recent backup files
- Track container restart count as indirect volume health indicator
- Use volume driver metrics when available (cloud provider dashboards)
Expected answer points:
- Docker has no native volume snapshot mechanism
- Workaround: use volume driver features (EBS snapshots, GCE snapshots)
- Or use backup/restore pattern with tar archives
- Third-party tools like Restic can provide incremental backups
- Consider external databases with built-in backup features
- Plan for data migration rather than live snapshots
Expected answer points:
- Use tmpfs for data that should never persist to disk: secrets, tokens, session data
- Use tmpfs when performance is critical and data loss is acceptable
- Use tmpfs when you need to prevent sensitive data from being written to disk
- Do not use tmpfs for data that needs to survive container restart
- tmpfs is appropriate for caches, temporary processing, sensitive in-memory data
- tmpfs size is limited by available memory; consider container memory limits
Expected answer points:
- SELinux/AppArmor may block container access to host directories via bind mounts
- Docker automatically sets appropriate labels for named volumes
- Bind mounts may require :z or :Z flag to relabel for container access
- :ro makes mount read-only but does not change security labels
- If experiencing permission denials with bind mounts, check security module logs
- --security-opt flag can pass custom label configurations
Expected answer points:
- NFS volumes provide shared storage across Docker hosts for multi-container access
- Use when containers on different hosts need access to same data
- Performance depends on network latency; may not suit high-IOPS workloads
- Requires NFS server configuration and proper network connectivity
- Consider concurrent access patterns; NFS has different locking than local storage
- May need volume driver plugin for proper NFS integration
Further Reading
- Docker Volumes Documentation - Official guide to Docker volumes
- Bind Mounts Documentation - Using host directories in containers
- Docker SSHFS Plugin - Remote volume mounting over SSH
- Volume Plugins - Docker volume driver plugins
- Container Images - Building optimized container images
- Docker Networking - Container communication patterns
- Docker Compose - Multi-container applications