Slow network startup in docker-compose

Hi there,

My problem is that it takes up to one minute until a container can communicate with the outside world, although its state is “running”. The side effect is that containers which need external resources in entrypoint.sh crash, because they can’t download from or communicate with those resources.

Is there a way to troubleshoot a container’s network initialization?

My Infrastructure:
Containerhost: QNAP with Intel i5 8400T and 64GB RAM with latest QTS and Container Station using Docker-Compose
Storage: NVMe SSD RAID 1
Management Software: Portainer Business Edition
Container-Sample where I have this issue: traefik latest, reverse Proxy

docker-compose.yml

version: "3.3"

services:
  traefik:
#    dns:
#      - "1.1.1.1"
#      - "8.8.8.8"
    image: traefik:latest
    restart: always
    container_name: traefik
    environment: 
        CF_DNS_API_TOKEN: 'mytoken'

#        TRAEFIK_CERTIFICATESRESOLVERS_MYRESOLVER_ACME_DNSCHALLENGE_DELAYBEFORECHECK: 120
    command:

      - --api.insecure=true # <== Enabling insecure api, NOT RECOMMENDED FOR PRODUCTION
      - --api.dashboard=true # <== Enabling the dashboard to view services, middlewares, routers, etc.
      - --api.debug=true # <== Enabling additional endpoints for debugging and profiling
      - --log.level=TRACE # <== Setting the level of the logs from traefik
      - --providers.docker=true # <== Enabling docker as the provider for traefik
      - --providers.docker.exposedbydefault=false # <== Don't expose every container to traefik
      - --providers.docker.network=web # <== Operate on the docker network named web
      - --entrypoints.web.address=192.168.178.3:80
      - --entrypoints.websecure.address=192.168.178.3:443
      #DNS Challenge
      - --certificatesresolvers.myresolver.acme.dnschallenge=true
      - --certificatesresolvers.myresolver.acme.dnschallenge.provider=cloudflare
      
      # ACME Base
      - --certificatesresolvers.myresolver.acme.email=postmaster@mydomain.com
      - --certificatesresolvers.myresolver.acme.storage=/letsencrypt/acme.json
      - --entrypoints.websecure.http.tls=true
      - --entrypoints.websecure.http.tls.certresolver=myresolver
      - --entrypoints.websecure.http.tls.domains[0].main=mydomain.com
      - --entrypoints.websecure.http.tls.domains[0].sans=*.mydomain.com
      - --serverstransport.insecureskipverify=true


    volumes:
      - /var/run/docker.sock:/var/run/docker.sock # <== Volume for docker admin
      - /share/ContainerStation/persistent/traefik/dynamic.yaml:/dynamic.yaml # <== Volume for dynamic conf file, **ref: line 27
      - /share/ContainerStation/persistent/traefik/config.yml:/config.yml
      - /share/ContainerStation/persistent/traefik/letsencrypt:/letsencrypt
      - /share/ContainerStation/persistent/traefik/certs:/certs:ro
      - /share/ContainerStation/persistent/traefik/certs.yml:/certs.yml
      - /share/ContainerStation/persistent/traefik/entrypoint.sh:/entrypoint.sh
    networks:
       web: # <== Placing traefik on the network named web, to access containers on this network
       qnet-static-eth1-b03c93: # <== Static IP in server dmz
          ipv4_address: 192.168.178.3
    labels:
      - "traefik.enable=true" # <== Enable traefik on itself to view dashboard and assign subdomain to$
      - "traefik.http.routers.api.rule=Host(`monitor.mydomain.com`)" # <== Setting the domain for the d$
      - "traefik.http.routers.api.service=api@internal" # <== Enabling the api to be a service to acce$
networks:
  web:
    external: true
  qnet-static-eth1-b03c93:
    external: true

Many thanks in advance.

Not sure about the troubleshooting, or the reason the container takes a while to launch, but you can define a healthcheck and have the dependent services launch only once it is healthy.

Good hint, but I don’t think I really understand how to implement it.
The examples I found always use a second container, which checks whether the first one is up. But I must make sure that entrypoint.sh waits to execute until the network of the started container is fully functional.

Additionally, there is the strange fact that I must ping out of my problem container to get a proper network connection within around 30 seconds. If the ping isn’t executed, it can take several minutes until the container is reachable. I have no idea why, it’s very strange.

I’m not entirely sure I understand what you mean about needing a secondary container.

Basically, the way the healthcheck works is that you set a script that runs at every interval. If that script exits with an error, the container is considered unhealthy.
You can define that healthcheck on the container which takes a while to launch. You can have it ping itself (localhost), or do whatever is necessary to actually check that the service has started.

Then you can have your other services depend on that one being marked as healthy, so that they only start once it has finished launching.

As for the specifics, that’d depend on your application and project structure, but that’s the gist of it
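To make the "script that exits with an error" part concrete, here is a minimal sketch of what such a healthcheck script could look like. As far as I know, Docker treats exit status 0 as healthy and 1 as unhealthy, so the sketch normalizes any probe failure to 1. The helper name `run_probe` and the URL in the comment are assumptions for illustration, not something from this thread.

```shell
#!/bin/sh
# Sketch of a healthcheck script (run_probe is a hypothetical helper name).
# Exit status 0 means healthy, 1 means unhealthy, so every probe failure
# is mapped to 1 regardless of the probe's own exit code.
# Usage: run_probe COMMAND [ARGS...]
run_probe() {
    if "$@" >/dev/null 2>&1; then
        return 0
    fi
    return 1
}

# Example call at the end of the script (placeholder URL, an assumption):
#   run_probe wget -q --spider http://localhost:8080/ ; exit $?
```

Which probe command makes sense depends on the service. The point is only that the script's exit status decides whether the container is marked healthy.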

OK, I think I have trouble picturing this :smiley:
I know that in a stack of several services it is possible to set dependencies between them. But in my case it’s a stack of only ONE service/container. So I must make sure that entrypoint.sh doesn’t start before the network inside this container is working properly.

I found examples like this here:

version: "2.1"
services:
    api:
        build: .
        container_name: api
        ports:
            - "8080:8080"
        depends_on:
            db:
                condition: service_healthy
    db:
        container_name: db
        image: mysql
        ports:
            - "3306"
        environment:
            MYSQL_ALLOW_EMPTY_PASSWORD: "yes"
            MYSQL_USER: "user"
            MYSQL_PASSWORD: "password"
            MYSQL_DATABASE: "database"
        healthcheck:
            test: ["CMD", "mysqladmin" ,"ping", "-h", "localhost"]
            timeout: 20s
            retries: 10

Here the “api” container waits until “db” is up. That’s clear to me, but it doesn’t cover my issue.

Something like the following doesn’t work, because the dependency would be the container itself, which results in a chicken-and-egg problem.

version: "3.3"

services:
  traefik:
#    dns:
#      - "1.1.1.1"
#      - "8.8.8.8"
    image: traefik:latest
    restart: always
    container_name: traefik
    environment: 
        CF_DNS_API_TOKEN: 'mytoken'

#        TRAEFIK_CERTIFICATESRESOLVERS_MYRESOLVER_ACME_DNSCHALLENGE_DELAYBEFORECHECK: 120
    healthcheck:
        test: ["CMD", "wget", "--no-verbose", "--tries=1", "--spider", "http://192.168.178.10:60080/cgi-bin"]
        timeout: 10s
        retries: 100
    depends_on:
         traefik:
             condition: service_healthy
    command:
...

What I need is a mechanism that starts the container but postpones entrypoint.sh until the network is up.

I hope you are using Docker Compose v2, but based on the code snippets you shared, I don’t think so. Make sure you are using Docker Compose v2, the only supported Compose. Sorry for not linking, as I’m trying to respond quickly, but a Google search should give you the answer.

How do you know that? Are you sure that the container is the problem and not “the outer world”?
Once the container is started it should immediately have everything in place, including the network.

Then change the entrypoint. You cannot delay the entrypoint itself, as it is part of the command that creates the process to run in the container. There is no running container without an already running process, because the container is basically the isolation of that process. What you can do is change the entrypoint script and add a loop that waits one second, for example (or more if needed), and retries the command until it succeeds.
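A minimal sketch of such a retry loop, assuming a POSIX shell is available in the image. The function name `retry_until_success`, the attempt limit, and the URL in the comment are placeholders I made up for illustration. Replace the probe with whatever command your entrypoint.sh actually needs to succeed.

```shell
#!/bin/sh
# Sketch of a retry loop for entrypoint.sh (hypothetical helper name).
# Runs COMMAND until it succeeds, sleeping DELAY_SECONDS between attempts,
# and gives up after MAX_ATTEMPTS failures.
# Usage: retry_until_success MAX_ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
retry_until_success() {
    max_attempts=$1
    delay=$2
    shift 2
    attempt=1
    while ! "$@"; do
        if [ "$attempt" -ge "$max_attempts" ]; then
            echo "giving up after $attempt attempts: $*" >&2
            return 1
        fi
        attempt=$((attempt + 1))
        sleep "$delay"
    done
    return 0
}

# Example use near the top of entrypoint.sh (placeholder URL, an assumption):
#   retry_until_success 60 1 wget -q --spider https://example.com || exit 1
#   exec "$@"   # hand over to the container's real command
```

The attempt limit is there so that a container with a permanently broken network fails visibly instead of looping forever. With `restart: always` it would then simply be restarted.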