Docker compose stop doesn't stop included containers

Hey there,
I’ve been running into a weird issue with databases corrupting, and I seem to have tracked it down to one main cause: apparently the containers from the docker-compose.yml files that I include aren’t stopped when I run docker compose stop on the main docker-compose.yml.
Any idea why that is? Some containers come with preconfigured docker-compose files, and I also want to de-clutter my main docker-compose.yml where possible.
Any ideas how I can make this work?

Please share the output of docker compose config so we can see the full rendered config. Make sure to anonymize public IPs, domain names and secrets, but leave the rest untouched.

Without more context, I would assume it’s related to the grace period when containers are stopped. After a container receives the SIGTERM signal sent by docker compose stop, it is killed hard with a SIGKILL signal once the grace period expires. The default grace period is 10 seconds, which suits most cases, but not all.

Your grace period is probably too short for the database to finish a graceful shutdown, hence the data corruption.
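
If that turns out to be the cause, you can raise the limit per service. A minimal sketch, assuming it’s the database service that needs more time (service name and value are just examples):

services:
  invidious-db:
    # Allow up to 2 minutes between SIGTERM and SIGKILL
    # so postgres can flush and shut down cleanly.
    stop_grace_period: 2m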

It’s definitely not the grace period. If I enter the directory with the second docker-compose.yml file and run docker compose stop there, the containers in question stop immediately, just as expected.

The corruption seems to stem from the fact that something the database server depends on is stopped while the database server itself isn’t. I’m not sure what exactly causes that, but docker doesn’t even try to stop the containers in question. If I watch the output of docker compose stop, they don’t even show up.
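
(For anyone trying to reproduce this: docker compose ps shows which containers the project actually tracks, so you can compare that against what docker compose stop acts on.)

docker compose ps --all   # lists all containers compose associates with this project, including stopped ones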

The containers in question are invidious and invidious-db.

name: podconf
services:
  caddy:
    build:
      context: /podconf
      dockerfile: /podconf/build/caddy.Dockerfile
      args:
        VER: 2.7.6
    container_name: caddy
    extra_hosts:
      - host.docker.internal=host-gateway
    networks:
      php-fpm: null
      webaccess: null
    ports:
      - mode: ingress
        target: 80
        published: "80"
        protocol: tcp
      - mode: ingress
        target: 443
        published: "443"
        protocol: tcp
    volumes:
      - type: bind
        source: /podconf/caddy/
        target: /etc/caddy
        bind:
          create_host_path: true
      - type: bind
        source: /podconf/www/html
        target: /var/www/html
        bind:
          create_host_path: true
      - type: bind
        source: /podconf/www/files-public
        target: /var/www/files
        bind:
          create_host_path: true
  cgit:
    build:
      context: /podconf/build/clearlinux-dockerfiles/cgit
      dockerfile: Dockerfile
    container_name: cgit
    networks:
      webaccess: null
    restart: always
    volumes:
      - type: bind
        source: /podconf/cgit-conf/cgitrc/
        target: /etc/cgitrc
        bind:
          create_host_path: true
      - type: bind
        source: /podconf/cgit-conf/cgit/
        target: /usr/share/cgit
        bind:
          create_host_path: true
      - type: bind
        source: /podconf/www/git/
        target: /var/www/git
        bind:
          create_host_path: true
  invidious:
    depends_on:
      invidious-db:
        condition: service_started
        required: true
    environment:
      INVIDIOUS_CONFIG: |
    healthcheck:
      test:
        - CMD-SHELL
        - wget -nv --tries=1 --spider http://127.0.0.1:3000/api/v1/trending || exit 1
      timeout: 5s
      interval: 30s
      retries: 2
    image: quay.io/invidious/invidious:latest
    networks:
      default: null
    ports:
      - mode: ingress
        target: 3000
        published: "9100"
        protocol: tcp
    restart: unless-stopped
  invidious-db:
    healthcheck:
      test:
        - CMD-SHELL
        - pg_isready -U $$POSTGRES_USER -d $$POSTGRES_DB
    image: docker.io/library/postgres:14
    networks:
      default: null
    restart: unless-stopped
    volumes:
      - type: bind
        source: /podconf/invidious/postgres_data
        target: /var/lib/postgresql/data
        bind:
          create_host_path: true
      - type: bind
        source: /podconf/invidious/config/sql
        target: /config/sql
        bind:
          create_host_path: true
      - type: bind
        source: /podconf/invidious/docker/init-invidious-db.sh
        target: /docker-entrypoint-initdb.d/init-invidious-db.sh
        bind:
          create_host_path: true
  navidrome:
    container_name: navidrome
    environment:
      ND_BASEURL: /audio
      ND_LOGLEVEL: debug
      ND_PORT: "4040"
      ND_SCANSCHEDULE: 1h
      ND_SESSIONTIMEOUT: 24h
    image: deluan/navidrome:latest
    networks:
      webaccess: null
    restart: always
    user: 1000:1000
    volumes:
      - type: bind
        source: /podconf/navidrome/data
        target: /data
        bind:
          create_host_path: true
      - type: bind
        source: /podconf/navidrome/music
        target: /music
        read_only: true
        bind:
          create_host_path: true
  netdata:
    cap_add:
      - SYS_PTRACE
      - SYS_ADMIN
    container_name: netdata
    image: docker.io/netdata/netdata:latest
    network_mode: host
    pid: host
    restart: always
    security_opt:
      - apparmor=unconfined
    volumes:
      - type: bind
        source: /podconf/netdata/config
        target: /etc/netdata
        bind:
          create_host_path: true
      - type: bind
        source: /podconf/netdata/lib
        target: /var/lib/netdata
        bind:
          create_host_path: true
      - type: bind
        source: /podconf/netdata/cache
        target: /var/cache/netdata
        bind:
          create_host_path: true
      - type: bind
        source: /etc/passwd
        target: /host/etc/passwd
        read_only: true
        bind:
          create_host_path: true
      - type: bind
        source: /etc/group
        target: /host/etc/group
        read_only: true
        bind:
          create_host_path: true
      - type: bind
        source: /proc
        target: /host/proc
        read_only: true
        bind:
          create_host_path: true
      - type: bind
        source: /sys
        target: /host/sys
        read_only: true
        bind:
          create_host_path: true
      - type: bind
        source: /etc/os-release
        target: /host/etc/os-release
        read_only: true
        bind:
          create_host_path: true
      - type: bind
        source: /var/run/docker.sock
        target: /var/run/docker.sock
        read_only: true
        bind:
          create_host_path: true
  php-fpm:
    container_name: php-fpm
    image: docker.io/bitnami/php-fpm:latest
    networks:
      php-fpm: null
    volumes:
      - type: bind
        source: /podconf/www/html
        target: /var/www/html
        bind:
          create_host_path: true
  pihole:
    cap_add:
      - NET_ADMIN
    container_name: pihole
    environment:
      WEBPASSWORD: admin
    hostname: pihole
    image: pihole/pihole:latest
    networks:
      wg-easy:
        ipv4_address: 10.8.1.3
    ports:
      - mode: ingress
        target: 80
        published: "8081"
        protocol: tcp
    restart: unless-stopped
    volumes:
      - type: bind
        source: /podconf/pihole/etc-pihole/
        target: /etc/pihole
        bind:
          create_host_path: true
      - type: bind
        source: /podconf/pihole/etc-dnsmasq.d/
        target: /etc/dnsmasq.d
        bind:
          create_host_path: true
  redlib:
    container_name: redlib
    image: quay.io/redlib/redlib:latest
    networks:
      default: null
    ports:
      - mode: ingress
        target: 8080
        published: "9120"
        protocol: tcp
  stirling-pdf:
    container_name: stirling-pdf
    environment:
      DOCKER_ENABLE_SECURITY: "false"
      INSTALL_BOOK_AND_ADVANCED_HTML_OPS: "true"
      LANGS: en_GB
    image: frooodle/s-pdf:latest
    networks:
      default: null
    ports:
      - mode: ingress
        target: 8080
        published: "9020"
        protocol: tcp
    volumes:
      - type: bind
        source: /podconf/stirling-pdf/tesseract
        target: /usr/share/tessdata
        bind:
          create_host_path: true
      - type: bind
        source: /podconf/stirling-pdf/config
        target: /configs
        bind:
          create_host_path: true

networks:
  default:
    name: podconf_default
    driver: bridge
  php-fpm:
    name: podconf_php-fpm
    driver: bridge
  webaccess:
    name: podconf_webaccess

I just wanna add that it’s entirely possible that I’ve missed something here. I’m not saying that I’m a docker expert, but when I include a file and the docs say that it helps with de-cluttering the main docker-compose.yml, then I expect included files to be stopped as well when I stop my main file.
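
For context, the include section at the top of my main file looks roughly like this (the path is a placeholder, not my real layout):

include:
  - ./invidious/docker-compose.yml   # placeholder path; this file defines invidious and invidious-db
services:
  # ... everything shown in the rendered config above ...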

It is highly unlikely that your problem is unrelated to the process inside the container being killed by SIGKILL; otherwise it would terminate gracefully, without data corruption.

So either the process inside the container requires a different signal than SIGTERM to initiate graceful termination, and is therefore always stopped by SIGKILL after the grace period, or the graceful termination takes longer than the grace period.
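
If it’s the former, you can tell compose which signal to send, and give it more time on top. A sketch with example values (the service name is a placeholder):

services:
  some-db:
    stop_signal: SIGINT      # postgres treats SIGINT as a "fast shutdown" request
    stop_grace_period: 1m    # wait up to a minute before escalating to SIGKILL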

I’m pretty sure it is.

I checked the logs (by attaching to the containers!), and immediately after I start the stop process I get some weird corruption errors in the postgres logs, without any of the containers actually kicking me out (which would show that they’re being stopped or have been killed).
Not even the main invidious container is showing any signs of being stopped.

Any updates on this? I don’t think that’s intended behaviour…

The first step is to get a clear problem description.

You write about a corrupt database; it seems your compose is using a stock postgres image. Does the postgres log tell you it was shut down hard or needs to recover upon start?

Are the bind mounts local folders or on a remote share?

First off: all the mounts are bind mounts, nothing remote.

Previously I wasn’t really willing to kill my containers again, since I wanted to avoid triggering this, but figuring it would help with the resolution, I just did exactly that.

Having changed absolutely nothing, my included containers did now shut down as expected. I’m very confused now, especially because I somewhat trust my problem-solving skills.

No visible errors, but it still corrupted:

invidious-db-1  | 2024-07-16 08:48:25.375 UTC [1] LOG:  could not open file "postmaster.pid": No such file or directory
invidious-db-1  | 2024-07-16 08:48:25.375 UTC [1] LOG:  performing immediate shutdown because data directory lock file is invalid
invidious-db-1  | 2024-07-16 08:48:25.375 UTC [1] LOG:  received immediate shutdown request
invidious-db-1  | 2024-07-16 08:48:25.375 UTC [1] LOG:  could not open file "postmaster.pid": No such file or directory
invidious-db-1  | 2024-07-16 08:48:25.401 UTC [1] LOG:  database system is shut down
invidious-db-1  |
invidious-db-1  | PostgreSQL Database directory appears to contain a database; Skipping initialization
invidious-db-1  |
invidious-db-1  | 2024-07-16 08:48:26.039 UTC [1] LOG:  starting PostgreSQL 14.12 (Debian 14.12-1.pgdg120+1) on x86_64-pc-linux-gnu, compiled by gcc (Debian 12.2.0-14) 12.2.0, 64-bit
invidious-db-1  | 2024-07-16 08:48:26.040 UTC [1] LOG:  listening on IPv4 address "0.0.0.0", port 5432
invidious-db-1  | 2024-07-16 08:48:26.040 UTC [1] LOG:  listening on IPv6 address "::", port 5432
invidious-db-1  | 2024-07-16 08:48:26.046 UTC [1] LOG:  listening on Unix socket "/var/run/postgresql/.s.PGSQL.5432"
invidious-db-1  | 2024-07-16 08:48:26.057 UTC [27] LOG:  database system was shut down at 2024-07-16 08:48:23 UTC
invidious-db-1  | 2024-07-16 08:48:26.058 UTC [27] LOG:  invalid primary checkpoint record
invidious-db-1  | 2024-07-16 08:48:26.058 UTC [27] PANIC:  could not locate a valid checkpoint record
invidious-db-1  | 2024-07-16 08:48:26.483 UTC [1] LOG:  startup process (PID 27) was terminated by signal 6: Aborted
invidious-db-1  | 2024-07-16 08:48:26.483 UTC [1] LOG:  aborting startup due to startup process failure
invidious-db-1  | 2024-07-16 08:48:26.498 UTC [1] LOG:  database system is shut down

I just noticed that data directory lock thing for the first time… What is that?

Edit: I just checked; that pid file definitely exists. It really shouldn’t go missing randomly…

This is the log of both shutdown and startup of postgres?

Yes. That’s the script running docker compose stop and then docker compose start.
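
The script is roughly this shape (the backup command and path are placeholders, not my actual script):

#!/bin/sh
docker compose stop                        # stop everything first
tar czf /backups/podconf.tar.gz /podconf   # placeholder for the actual backup step
docker compose start                       # bring everything back up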

It’s really strange; I haven’t seen that before. Did you check your disk for corruption?

Usually Postgres is very stable. You could try to isolate and run fewer services.

Sorry, you probably won’t get around doing some more testing.

Yes. I’ve run a five-hour memtest, I’ve checked for corruption, I’ve updated all packages and I’ve checked the drive health. Everything is fine, or so it seems.

It also can’t really be another container interfering because I’ve got them all separated.

What a peculiar issue…

It’s still happening. Next time I run the backup I’m gonna have auditctl watch the file so I can find out which process deleted it. This is so annoying.
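
Something like this should catch it (the path is taken from the bind mount in my config above):

auditctl -w /podconf/invidious/postgres_data/postmaster.pid -p wa -k pgpid   # log writes/attribute changes, including deletion
ausearch -k pgpid                                                            # afterwards: show which process touched it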

I think I figured it out. Docker had somehow lost track of the database container, and that somehow caused my postmaster.pid to get lost. (Maybe invidious was using the container that was lost while docker had created another invidious database container, which then shut down as it should, deleting the postmaster.pid file before the actual relevant container had a chance to do so. I’m no expert regarding what might be possible here, though.)

I noticed that the invidious-db container was still running after I had run docker compose stop. I then decided to just clean out all the old garbage and whatever else there was (among them another database container that had gotten lost) using docker system prune.

After that I ran docker compose up, and since then my invidious instance no longer corrupts out of the blue when stopping all containers. (At least for now; I hope this is going to last.)
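
For anyone who ends up here: this is how you can spot containers that are still hanging around under a project (the project name comes from my config above):

docker ps -a --filter label=com.docker.compose.project=podconf   # everything docker has labeled for this project
docker container prune                                           # then remove all stopped containers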

This included file stuff just keeps on giving.

I just discovered that docker starts the containers within one included docker-compose.yml but not within the other. The paths are definitely correct, and I’ve also tried moving the problematic one to the top of the include list, but no dice. I’ve also checked the file perms, and they’re identical. The problematic docker-compose.yml works perfectly fine on its own.

I wonder if it has something to do with the directory name. One contains a hyphen, the other one doesn’t.

What on earth is going on here. I do not understand…

I think I found the issue, and it seems like a bit of an oversight to me.

Apparently one docker-compose.yml cannot find the stopped or started containers of other (even included) compose files.
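
If you want to verify it yourself, the compose labels show which project a container actually belongs to; as far as I can tell, stop and start only act on containers whose project label matches (the container name is from my setup):

docker inspect -f '{{ index .Config.Labels "com.docker.compose.project" }}' invidious-db-1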