Data volumes for database services

I have a general question about data volumes and database services.
I noticed in a docker-compose file that none of the databases had volumes for their data files.
We are using mysql, kafka, aerospike, redis, and rabbitmq.
I saw a performance win after I added the volumes. I am running QE automation tests that bring up dockerized mysql, kafka, aerospike, redis, and rabbitmq.

e.g.
volumes:
  - percona-shared:/shared
  - percona-mysql:/var/lib/mysql

The tests finished faster with the data volumes.

Question: Is it more performant to use volumes for database services? If you don’t use volumes, is the data basically kept in memory?

Steve

If no volume is mapped against a container path, then the data is written into the ephemeral container filesystem. Depending on the storage driver, this can be a full copy of the previous layer + the new data, or just the new data in the write layer of the container filesystem. If the container is deleted, its filesystem is deleted as well - it is not suitable for persistent data that should survive re-creation of the container (e.g. when you deploy a new container based on a new image of the same repo).

A volume without any declaration, on the other hand, is nothing else than a Docker-managed path in the filesystem that gets bind mounted into the container filesystem. Of course this is going to be faster.

You could also declare a volume with the tmpfs type, which then would indeed be in memory.
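
As a rough sketch of the tmpfs option, assuming the default local volume driver: the volume name redis-tmp and the size limit below are made up for illustration and are not taken from any compose file in this thread.

```yaml
services:
  my-redis:
    image: redis:7.0.4
    volumes:
      # Mount the tmpfs-backed volume at the data directory of the image.
      - redis-tmp:/data

volumes:
  # Hypothetical volume name; the local driver mounts a tmpfs
  # (RAM-backed) filesystem for it instead of a directory on disk.
  redis-tmp:
    driver: local
    driver_opts:
      type: tmpfs
      device: tmpfs
      o: size=256m   # assumed size limit, adjust or drop as needed
```

Data in such a volume lives in memory only, so it should be treated as throwaway data, e.g. for test runs.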

A volume w/o declaration is the default when you don’t use named volumes? Does this use the ephemeral container filesystem?

faster (volume w/o declaration, uses the ephemeral container filesystem)
my-redis:
  image: redis:7.0.4

slower (named volume)
my-redis:
  image: redis:7.0.4
  volumes:
    - redis-data:/data

Ok, a volume w/o declaration is an anonymous volume.
New question:
Performance between an anonymous volume and the ephemeral filesystem: which is faster? I take it the anonymous volume. I will run a test.

Steve

I forgot to mention that if a path inside a container is declared as VOLUME and no volume is mapped against that folder, an anonymous volume will be used - it behaves exactly like a default named volume, with the only difference that its name consists of random alphanumeric characters.

A volume (=storage from outside the container filesystem) is mounted into a path of the container filesystem. So whatever is written into that path is written outside the container.
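
To make the difference concrete, here is a minimal sketch based on the redis:7.0.4 image, whose Dockerfile declares /data as VOLUME. The service names and the redis-data volume name are only illustrative.

```yaml
services:
  # No volumes entry: because the image declares VOLUME /data,
  # Docker still creates an anonymous volume for /data.
  redis-implicit:
    image: redis:7.0.4

  # Explicit anonymous volume: only the container path is given.
  redis-anonymous:
    image: redis:7.0.4
    volumes:
      - /data

  # Named volume: same storage mechanism, but with a stable name.
  redis-named:
    image: redis:7.0.4
    volumes:
      - redis-data:/data

volumes:
  redis-data:
```

In all three cases /data ends up outside the container filesystem; only paths that are not covered by a volume go through the storage driver.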

I don’t understand what you are trying to tell me with faster and slower, as both use whatever filesystem is mounted in /var/lib/docker, but the container filesystem has a storage driver on top of it. It doesn’t make sense that a volume that writes directly to the host filesystem should be slower than a storage driver that writes to the host filesystem.

Before you ask further questions, please share the output of docker info, so we can see what filesystem and storage driver you use, the exact image (or its Dockerfile) + your compose file. I want to see and understand what you are doing.

Got it. I added the anonymous volumes.

$ docker info
Client:
 Context:    default
 Debug Mode: false
 Plugins:
  buildx: Docker Buildx (Docker Inc., v0.10.3)
  compose: Docker Compose (Docker Inc., v2.15.1)
  dev: Docker Dev Environments (Docker Inc., v0.1.0)
  extension: Manages Docker extensions (Docker Inc., v0.2.18)
  sbom: View the packaged-based Software Bill Of Materials (SBOM) for an image (Anchore Inc., 0.6.0)
  scan: Docker Scan (Docker Inc., v0.25.0)
  scout: Command line tool for Docker Scout (Docker Inc., v0.6.0)

Server:
 Containers: 0
  Running: 0
  Paused: 0
  Stopped: 0
 Images: 22
 Server Version: 20.10.23
 Storage Driver: overlay2
  Backing Filesystem: extfs
  Supports d_type: true
  Native Overlay Diff: true
  userxattr: false
 Logging Driver: json-file
 Cgroup Driver: cgroupfs
 Cgroup Version: 1
 Plugins:
  Volume: local
  Network: bridge host ipvlan macvlan null overlay
  Log: awslogs fluentd gcplogs gelf journald json-file local logentries splunk syslog
 Swarm: inactive
 Runtimes: io.containerd.runtime.v1.linux runc io.containerd.runc.v2
 Default Runtime: runc
 Init Binary: docker-init
 containerd version: 2456e983eb9e37e47538f59ea18f2043c9a73640
 runc version: v1.1.4-0-g5fd4c4d
 init version: de40ad0
 Security Options:
  seccomp
   Profile: default
 Kernel Version: 5.15.90.1-microsoft-standard-WSL2
 Operating System: Docker Desktop
 OSType: linux
 Architecture: x86_64
 CPUs: 16
 Total Memory: 15.46GiB
 Name: docker-desktop
 ID: 5OMP:Q5EZ:KMX5:5RV3:LFJF:IRJ5:U3L2:QSZS:WMY4:YM5G:DB3C:74RV
 Docker Root Dir: /var/lib/docker
 Debug Mode: false
 HTTP Proxy: http.docker.internal:3128
 HTTPS Proxy: http.docker.internal:3128
 No Proxy: hubproxy.docker.internal
 Registry: https://index.docker.io/v1/
 Labels:
 Experimental: false
 Insecure Registries:
  hubproxy.docker.internal:5000
  127.0.0.0/8
 Live Restore Enabled: false

WARNING: No blkio throttle.read_bps_device support
WARNING: No blkio throttle.write_bps_device support
WARNING: No blkio throttle.read_iops_device support
WARNING: No blkio throttle.write_iops_device support

version: '2.3'

services:
  dsp-engine:
    image: xxxx/dsp-engine:1.0.233-release-1.0
    container_name: dsp-engine
    hostname: dspengine
    environment:
      - JMX_HOSTNAME=0.0.0.0
      - APP_REDIS_CLUSTER_NODES=
      - APP_REDIS_STANDALONE_HOST=dsp-engine-redis
      - GRPC_SERVER_PORT=9011
      - SERVER_PORT=8090
    volumes:
      - ./conf/dsp:/app/active/conf
    ports:
      - "8090:8090"
      - "9011:9011"
      - "5010:5010"
    depends_on:
      - dsp-engine-redis
    networks:
      - adengine-bridge

  dsp-engine-redis:
    image: redis:7.0.4
    container_name: dsp-engine-redis
    hostname: redis
    restart: always
    volumes:
      - /data
    ports:
      - "6379:6379"
    networks:
      - adengine-bridge

  rabbitmq:
    image: xxx/amp-adengine-rabbitmq:3.2
    container_name: adengine-rabbitmq
    ports:
      - "5672:5672"
      - "15672:15672"
    networks:
      - adengine-bridge
    restart: always
    volumes:
      - /var/lib/rabbitmq

  aerospike:
    image: admarketplace/amp-adengine-aerospike:4.4
    container_name: adengine-aerospike
    hostname: aerospike
    ports:
      - "3000:3000"
    networks:
      - adengine-bridge
    restart: always
    volumes:
      - /opt/aerospike/data

  percona:
    image: xxx/amp-adengine-percona:5.7-git
    container_name: adengine-percona
    volumes:
      - percona-shared:/shared
      - /var/lib/mysql
    ports:
      - "3306:3306"
    networks:
      - adengine-bridge
    restart: always

  ampx-indexer:
    image: admarketplace/ampx-indexer:12.0
    container_name: adengine-ampx-indexer
    depends_on:
      percona:
        condition: service_healthy
    volumes:
      - percona-shared:/shared:ro
      - shared-index-location:/ampx-indexer/data
    networks:
      - adengine-bridge
    restart: always

  zookeeper:
    image: xxx/amp-zookeeper:3.4
    container_name: adengine-zookeeper
    networks:
      - adengine-bridge
    restart: always

  kafka:
    image: xxx/amp-adengine-kafka:1.1
    container_name: adengine-kafka
    depends_on:
      zookeeper:
        condition: service_healthy
    ports:
      - "9092:9092"
    environment:
      KAFKA_ADVERTISED_HOST_NAME: kafka
      KAFKA_ZOOKEEPER_CONNECT: zookeeper:2181
    volumes:
      - /var/run/docker.sock:/var/run/docker.sock
      - /var/lib/kafka/data
    networks:
      - adengine-bridge
    restart: always

  adengine:
    image: xxx/amp-adengine:1.3.3155-dsp-engine-mvp
    container_name: adengine-adengine
    hostname: adengine
    ports:
      - "2222:2222"
      - "9000:9000"
      - "9191:9191"
      - "9001:9001"
    depends_on:
      ampx-indexer:
        condition: service_healthy
      rabbitmq:
        condition: service_healthy
      kafka:
        condition: service_healthy
    environment:
      - DSP_ENGINE_USE_MOCK=false
      - DSP_ENGINE_GRPC_HOST=dspengine
      - JAVA_RMI_HOST=adengine
    volumes:
      - shared-index-location:/ampx-indexer/data:ro
    networks:
      - adengine-bridge
    restart: always

  sspengine:
    image: xxx/ssp-engine:1.0.578-dsp-engine-mvp
    container_name: ssp-engine
    hostname: sspengine
    depends_on:
      adengine:
        condition: service_healthy
    healthcheck:
      test: ps aux | grep ssp-engine.jar
      interval: 10s
      timeout: 60s
      start_period: 10s
    environment:
      - INT_ENV=IT
      - DATACENTER=RIC1
    volumes:
      - ../../../maxmind-geoip-data:/mnt/bind/data/GeoIP
    ports:
      - "8080:8080"
    networks:
      - adengine-bridge
    restart: always

  dsp-engine-it:
    container_name: dsp-engine-it
    build:
      context: ./
      args:
        - UID=${UID}
        - GID=${GID}
    environment:
      - MAVEN_OPTS=-Xms512M -Xmx4096M
    volumes:
      - ../../:/home/dsp/qe-dsp-engine-api-tests
      - $HOME/.m2/repository:/home/dsp/.m2/repository
      - ./settings.xml:/home/dsp/.m2/settings.xml
      - ./toolchains.xml:/home/dsp/.m2/toolchains.xml
    depends_on:
      adengine:
        condition: service_healthy
      sspengine:
        condition: service_healthy
    networks:
      - adengine-bridge

networks:
  adengine-bridge:
    driver: bridge

volumes:
  percona-shared:
  shared-index-location:

I hadn’t noticed this was in the wrong category: “Tips & HowTos” is meant to be used when you publish them, not to ask for them. I moved it to Docker Desktop for Windows and also wrapped your output and compose file in preformatted text blocks, so they become easier to read.

Since you use Docker Desktop, try to avoid mapping Windows host folders into container folders, as this is always going to be slow.
Depending on whether you run docker compose up in a Windows terminal, or in a terminal of a WSL distribution that has Docker Desktop integration enabled (where the compose file and everything else is also on the filesystem of the WSL distribution and not on the Windows host, e.g. under /mnt/c or /mnt/d), you might see a huge difference in performance for the following services:

  • dsp-engine
  • sspengine
  • dsp-engine-it

With Docker Desktop for Windows, anonymous volumes and named volumes store their data inside the WSL distribution docker-desktop-data, which uses an ext4 filesystem, and is way faster than mapping Windows host folders into the container.
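
As a sketch of what that could look like for one of the bind mounts in the compose file above, assuming the host does not need direct access to the Maven repository: the volume name m2-repository is hypothetical, and the other settings of the service are omitted for brevity.

```yaml
services:
  dsp-engine-it:
    build:
      context: ./
    volumes:
      # Before: bind mount of a host folder (slow if $HOME is on the Windows side)
      # - $HOME/.m2/repository:/home/dsp/.m2/repository
      # After: named volume stored inside Docker Desktop's WSL distribution
      - m2-repository:/home/dsp/.m2/repository

volumes:
  # Hypothetical volume name, not part of the original compose file.
  m2-repository:
```

The trade-off is that the cached artifacts are no longer shared with the host's own ~/.m2 folder, so whether this fits depends on how the tests are used.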

Zookeeper has no volume declaration in your compose file and therefore relies on whether the Dockerfile of the image has VOLUME declarations for container paths, which implicitly create an anonymous volume when the container is created. Generally, you can skip declaring anonymous volumes for containers in the compose file when the Dockerfile of the used image already declares the path as VOLUME. Declaring them anyway makes it obvious that those folders are stored outside the container filesystem in an anonymous volume.

Note: neither anonymous, nor named volumes are deleted when docker compose down is used, unless the argument -v is appended. Make sure that you remove orphaned anonymous volumes every now and then, to not occupy unnecessary space.

update: fixed some typos, changed some wording.

Hi Metin,
Thank you so much for the info. I have used docker before but not as a dev. This has been really insightful.
Steve