Understanding Docker Volumes: A Comprehensive Tutorial
Containers are designed to be short-lived and stateless, which creates interesting challenges when applications need to store and access data that should survive container restarts or replacements.
Docker volumes provide the solution to this fundamental problem. They allow data to persist beyond the lifecycle of individual containers and provide a mechanism for containers to share data with one another.
This tutorial will guide you through the concepts, usage patterns, and best practices for working with Docker volumes effectively.
The challenge of data persistence in containers
To understand why volumes are necessary, let's first explore the nature of containers. When Docker runs a container, it creates a writable layer on top of the immutable image layers. Any files written by the application inside the container get stored in this writable layer. However, when the container is removed, all data in this layer is permanently deleted.
This behavior aligns with the container philosophy of being ephemeral and replaceable, but it presents a clear problem for applications that need to maintain state, such as databases, content management systems, or any application that generates user data. Without a persistence mechanism, restarting a container would mean losing all accumulated data.
Consider a simple example: if you run a MySQL database in a container and store the database files within the container's filesystem, stopping and removing that container would destroy all your data. Clearly, this is not acceptable for production environments.
Docker storage options: An overview
Docker provides three main options for managing data persistence:
Bind mounts create a direct link between a directory on the host machine and a directory in the container. The container can read and write to this directory as if it were part of its own filesystem. Any changes made by the container are immediately visible on the host and vice versa.
tmpfs mounts store data in the host system's memory rather than on disk. This approach is useful for sensitive information that should not be persisted to disk but needs to be available to the container.
Volumes are the most flexible and recommended way to persist data. Unlike bind mounts, volumes are completely managed by Docker. They can be named, backed up, and transferred between containers more easily. They also work seamlessly across different operating systems, unlike bind mounts which can have permission issues.
The key difference between these options lies in how and where the data is stored, and how that storage is managed. Volumes provide the most abstraction and are generally considered the best practice for most persistence needs in Docker.
Docker volumes in depth
Docker volumes are managed by Docker itself and stored in a part of the host
filesystem that's managed by Docker (typically /var/lib/docker/volumes/
on
Linux systems). This location is important because it means:
- The volumes are isolated from the core functionality of the host system.
- Docker can provide guarantees about the behavior of the volumes.
- Non-Docker processes should not modify this data.
Volumes have their own lifecycle independent of containers. You can create a volume, attach it to one or more containers, detach it, and reattach it to different containers - all without losing any data. This separation of concerns allows for much greater flexibility in how you manage persistent data.
Volume drivers further extend this capability by allowing volumes to be stored on remote hosts or cloud providers, encrypted, or given other special capabilities.
Creating and managing volumes
Let's start with the basics of creating and managing Docker volumes. The
simplest way to create a volume is with the docker volume create
command:
docker volume create my_data
my_data
This creates a named volume called "my_data" that you can reference in other Docker commands. By default, Docker uses the local driver, which stores the volume on the local host filesystem.
You can also create a volume with specific options using the --opt
flag:
docker volume create --driver local \
--opt type=none \
--opt device=/home/user/data \
--opt o=bind \
my_custom_volume
This creates a volume that binds to a specific directory on the host system.
Inspecting volumes
To see detailed information about a volume, use the inspect command:
docker volume inspect my_data
[
{
"CreatedAt": "2023-03-17T14:05:22Z",
"Driver": "local",
"Labels": {},
"Mountpoint": "/var/lib/docker/volumes/my_data/_data",
"Name": "my_data",
"Options": {},
"Scope": "local"
}
]
The output shows important information like where the volume is stored on the host system (the Mountpoint) and which driver it uses.
Listing and removing volumes
To see all volumes on your system:
docker volume ls
DRIVER VOLUME NAME
local my_data
local my_custom_volume
To remove a volume when you no longer need it:
docker volume rm my_data
If you have unused volumes (not attached to any container), you can clean them all up at once:
docker volume prune -f
Deleted Volumes:
my_custom_volume
Total reclaimed space: 10B
The -f
flag force-removes volumes without asking for confirmation, so use it
cautiously.
Attaching volumes to containers
Now that we understand how to create and manage volumes, let's see how to use them with containers.
The traditional way to attach a volume to a container is with the -v
flag:
docker run -d \
--name postgres_db \
-v my_postgres_data:/var/lib/postgresql/data \
-e POSTGRES_PASSWORD=mysecretpassword \
postgres:14
In this command:
-d
runs the container in detached mode--name postgres_db
gives our container a name-v my_postgres_data:/var/lib/postgresql/data
mounts the volume "mypostgresdata" to the directory where PostgreSQL stores its data-e POSTGRES_PASSWORD=mysecretpassword
sets an environment variablepostgres:14
is the image we're using
The syntax for the -v
flag is volume_name:container_path[:options]
. If the
volume doesn't exist, Docker creates it automatically.
A more explicit and verbose alternative is the --mount
flag:
docker run -d \
--name postgres_db2 \
--mount source=my_postgres_data2,target=/var/lib/postgresql/data \
-e POSTGRES_PASSWORD=mysecretpassword \
postgres:14
The --mount
flag is more explicit about its parameters, making it clearer what
each part does. It's especially useful for more complex mounting scenarios.
If you want to ensure a container can't modify the data in a volume, you can mount it as read-only:
docker run -d \
--name web_server \
--mount source=web_content,target=/usr/share/nginx/html,readonly \
nginx:latest
With the -v
syntax, you would use:
docker run -d \
--name web_server \
-v web_content:/usr/share/nginx/html:ro \
nginx:latest
Volume use cases
Let's explore some common use cases for Docker volumes with practical examples.
The most common use case for volumes is database persistence. Here's a complete example setting up a MySQL database with volume persistence:
docker volume create mysql_data
docker run -d \
--name mysql_db \
-v mysql_data:/var/lib/mysql \
-e MYSQL_ROOT_PASSWORD=my-secret-pw \
-e MYSQL_DATABASE=myapp \
-e MYSQL_USER=myapp_user \
-e MYSQL_PASSWORD=myapp_pass \
-p 3306:3306 \
mysql:8.0
Now, even if you stop and remove the container, your data will persist:
docker rm -f mysql_db
docker run -d \
--name mysql_db_new \
-v mysql_data:/var/lib/mysql \
-e MYSQL_ROOT_PASSWORD=my-secret-pw \
-p 3306:3306 \
mysql:8.0
The new container will have access to all the data created by the original container.
Configuration sharing
Volumes can be used to share configuration files between containers:
docker volume create app_config
docker run --rm \
-v app_config:/config \
alpine:latest \
sh -c 'echo "server_url=https://example.com" > /config/app.conf'
docker run -d \
--name app1 \
-v app_config:/etc/myapp \
myapp:latest
docker run -d \
--name app2 \
-v app_config:/etc/myapp \
myapp:latest
This pattern allows you to centralize configuration and ensure consistency across multiple containers.
Sharing build artifacts
In a continuous integration setup, you might want to share build artifacts between build and runtime containers:
# Create a volume for build artifacts
docker volume create build_artifacts
# Run a build container
docker run --rm \
-v build_artifacts:/build \
-v $(pwd)/src:/source \
node:14 \
sh -c 'cd /source && npm install && npm run build && cp -r dist/* /build/'
# Run a web server to serve the built artifacts
docker run -d \
--name web \
-v build_artifacts:/usr/share/nginx/html \
-p 8080:80 \
nginx:latest
This separates the build environment from the runtime environment while allowing the runtime to access the build output.
Advanced volume concepts
So far, we've focused on named volumes, which are explicitly created and given a
name. Docker also supports anonymous volumes, which are created automatically
when you use the -v
flag with just a container path:
docker run -d \
--name postgres_anon \
-v /var/lib/postgresql/data \
-e POSTGRES_PASSWORD=mysecretpassword \
postgres:14
Docker assigns a random ID to this volume. You can see it with:
docker inspect postgres_anon -f '{{ .Mounts }}'
[{volume 8a7f02923b9a13a3c991ebf119bf1a2c4d968a91ef9730e0c4762a57d17a1234 /var/lib/docker/volumes/8a7f02923b9a13a3c991ebf119bf1a2c4d968a91ef9730e0c4762a57d17a1234/_data /var/lib/postgresql/data local true }]
Anonymous volumes are harder to manage and reuse, so named volumes are generally preferred for persistent data.
Using volumes with Docker Compose
Docker Compose makes it easy to define and use volumes for multi-container applications:
services:
db:
image: postgres:16
volumes:
- postgres_data:/var/lib/postgresql/data
environment:
POSTGRES_PASSWORD: example
POSTGRES_USER: myapp
POSTGRES_DB: myapp_db
web:
build: ./app
volumes:
- ./app:/code
- ./static:/app/static
ports:
- "8000:8000"
depends_on:
- db
volumes:
postgres_data:
In this example, we define a named volume postgres_data
for the database
service and use both a bind mount (./app:/code
) and a host directory mount
(./static:/app/static
) for the web service.
To start the services with their volumes:
docker-compose up -d
[+] Running 4/4
✔ Network app_default Created 0.5s
✔ Volume "postgres_data" Created 0.0s
. . .
Security considerations
When working with volumes, consider these security best practices:
- Manage permissions carefully: Docker volumes inherit the permissions of the container. Ensure that the container runs with appropriate user permissions.
docker run -d \
--name secure_nginx \
--user $(id -u):$(id -g) \
-v web_content:/usr/share/nginx/html \
nginx:latest
Use read-only volumes when containers only need to read data.
Don't expose sensitive host directories through bind mounts.
Consider volume encryption for sensitive data, using third-party volume drivers that support encryption.
Final thoughts
Docker volumes provide the essential bridge between the ephemeral nature of containers and the need for persistent, shared data. By understanding how to create, manage, and utilize volumes effectively, you can build robust containerized applications that maintain state across container lifecycles.
Thanks for reading!
Make your mark
Join the writer's program
Are you a developer and love writing and sharing your knowledge with the world? Join our guest writing program and get paid for writing amazing technical guides. We'll get them to the right readers that will appreciate them.
Write for us
Build on top of Better Stack
Write a script, app or project on top of Better Stack and share it with the world. Make a public repository and share it with us at our email.
community@betterstack.comor submit a pull request and help us build better products for everyone.
See the full list of amazing projects on github