My problem is that it can take up to one minute until a container can communicate with the outside world, although its state is already “running”. The side effect is that containers that need external resources in entrypoint.sh crash, because they can’t download from or communicate with those resources.
Is there a way to troubleshoot a container’s network initialization?
My Infrastructure:
Containerhost: QNAP with Intel i5 8400T and 64GB RAM with latest QTS and Container Station using Docker-Compose
Storage: nvme SSD Raid 1
Management Software: Portainer Business Edition
Sample container where I have this issue: traefik latest, used as a reverse proxy
docker-compose.yml
version: "3.3"
services:
  traefik:
    # dns:
    #   - "1.1.1.1"
    #   - "8.8.8.8"
    image: traefik:latest
    restart: always
    container_name: traefik
    environment:
      CF_DNS_API_TOKEN: 'mytoken'
      # TRAEFIK_CERTIFICATESRESOLVERS_MYRESOLVER_ACME_DNSCHALLENGE_DELAYBEFORECHECK: 120
    command:
      - --api.insecure=true # <== Enabling insecure api, NOT RECOMMENDED FOR PRODUCTION
      - --api.dashboard=true # <== Enabling the dashboard to view services, middlewares, routers, etc.
      - --api.debug=true # <== Enabling additional endpoints for debugging and profiling
      - --log.level=TRACE # <== Setting the level of the logs from traefik
      - --providers.docker=true # <== Enabling docker as the provider for traefik
      - --providers.docker.exposedbydefault=false # <== Don't expose every container to traefik
      - --providers.docker.network=web # <== Operate on the docker network named web
      - --entrypoints.web.address=192.168.178.3:80
      - --entrypoints.websecure.address=192.168.178.3:443
      #DNS Challenge
      - --certificatesresolvers.myresolver.acme.dnschallenge=true
      - --certificatesresolvers.myresolver.acme.dnschallenge.provider=cloudflare
      # ACME Base
      - --certificatesresolvers.myresolver.acme.email=postmaster@mydomain.com
      - --certificatesresolvers.myresolver.acme.storage=/letsencrypt/acme.json
      - --entrypoints.websecure.http.tls=true
      - --entrypoints.websecure.http.tls.certresolver=myresolver
      - --entrypoints.websecure.http.tls.domains[0].main=mydomain.com
      - --entrypoints.websecure.http.tls.domains[0].sans=*.mydomain.com
      - --serverstransport.insecureskipverify=true
    volumes:
      - /var/run/docker.sock:/var/run/docker.sock # <== Volume for docker admin
      - /share/ContainerStation/persistent/traefik/dynamic.yaml:/dynamic.yaml # <== Volume for dynamic conf file, **ref: line 27
      - /share/ContainerStation/persistent/traefik/config.yml:/config.yml
      - /share/ContainerStation/persistent/traefik/letsencrypt:/letsencrypt
      - /share/ContainerStation/persistent/traefik/certs:/certs:ro
      - /share/ContainerStation/persistent/traefik/certs.yml:/certs.yml
      - /share/ContainerStation/persistent/traefik/entrypoint.sh:/entrypoint.sh
    networks:
      web: # <== Placing traefik on the network named web, to access containers on this network
      qnet-static-eth1-b03c93: # <== Static IP in server dmz
        ipv4_address: 192.168.178.3
    labels:
      - "traefik.enable=true" # <== Enable traefik on itself to view dashboard and assign subdomain to$
      - "traefik.http.routers.api.rule=Host(`monitor.mydomain.com`)" # <== Setting the domain for the d$
      - "traefik.http.routers.api.service=api@internal" # <== Enabling the api to be a service to acce$
networks:
  web:
    external: true
  qnet-static-eth1-b03c93:
    external: true
I’m not sure how to troubleshoot this, or why the container takes a while to come up, but you can define a healthcheck and have the dependent services launch only once it is healthy.
Good hint, but I don’t think I really understand how to implement it.
The examples I found always use a second container that checks whether the first one is up. But I need to make sure that entrypoint.sh waits to execute until the network of the started container is fully functional.
Additionally, there is the strange fact that I have to ping out from the problem container to get a working network connection within about 30 seconds. If the ping isn’t executed, it can take several minutes until the container is reachable. I have no idea why; it’s very strange.
I’m not entirely sure I understand what you mean about needing a secondary container
Basically, the way the healthcheck works is that you set a script that runs at every interval; if that script exits with an error, the container is considered unhealthy.
You can define that healthcheck on the container that takes a while to launch and have it ping itself (localhost), or do whatever is necessary to check that the service has actually started.
Then you can have your other services depend on that one being marked as healthy, so that they only start once it has finished launching.
As for the specifics, that’d depend on your application and project structure, but that’s the gist of it
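To make the "script that exits with an error" part concrete, here is a minimal sketch of such a healthcheck script. The names `probe` and `healthcheck` and the default target 1.1.1.1 are my own placeholders, not anything Docker or traefik prescribes, and it assumes the image ships `sh` and `ping`. Docker treats exit code 0 as healthy and 1 as unhealthy.

```shell
#!/bin/sh
# Hypothetical healthcheck script: exit 0 = healthy, exit 1 = unhealthy.

# probe HOST: send a single ICMP echo and wait at most 2 seconds for a reply.
probe() {
    ping -c 1 -W 2 "$1" >/dev/null 2>&1
}

# healthcheck [HOST]: report whether HOST (default 1.1.1.1, an assumption)
# is reachable and return the exit code Docker expects.
healthcheck() {
    host=${1:-1.1.1.1}
    if probe "$host"; then
        echo "healthy: $host reachable"
        return 0
    fi
    echo "unhealthy: $host not reachable" >&2
    return 1
}

# In a real script the last line would run the check, for example:
#   healthcheck "$@"
```

You would mount the script into the container and point the `test` entry of the service's `healthcheck` section in the compose file at it, next to `interval`, `timeout` and `retries`. Note that this only marks the container as unhealthy; it does not delay the entrypoint of that same container.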
Ok, I think I have an imagination problem
I know that if I have a stack of several services, it is possible to set dependencies between them. But in my case it’s a stack of only ONE service/container. So I need to make sure that entrypoint.sh doesn’t start before the network inside this container is working properly.
I would hope you are using Docker Compose v2, but based on the snippets you shared, I don’t think so. So make sure you are using Docker Compose v2, the only supported Compose. Sorry for not linking, as I’m trying to respond quickly, but a Google search should give you the answer quickly.
How do you know that? Are you sure that the container is the problem and not “the outer world”?
Once the container is started it should immediately have everything in place, including the network.
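One way to find out which side is slow is to measure, from inside the container, how long each network layer takes to respond after start. The following is only a rough sketch: `time_until` is a helper name I made up, and it assumes the image has `sh`, `date`, `ping` and `nslookup` available. The gateway address in the commented examples is a guess based on your subnet, so replace it with your real one.

```shell
#!/bin/sh
# time_until LABEL LIMIT_SECONDS COMMAND [ARGS...]
# Polls COMMAND once per second and reports how long it took to succeed,
# or gives up once LIMIT_SECONDS have passed.
time_until() {
    label=$1
    limit=$2
    shift 2
    start=$(date +%s)
    while ! "$@" >/dev/null 2>&1; do
        now=$(date +%s)
        if [ $((now - start)) -ge "$limit" ]; then
            echo "$label: still failing after ${limit}s"
            return 1
        fi
        sleep 1
    done
    echo "$label: ok after $(( $(date +%s) - start ))s"
}

# Example calls (addresses are assumptions, adjust to your network):
#   time_until gateway  300 ping -c 1 -W 1 192.168.178.1
#   time_until internet 300 ping -c 1 -W 1 1.1.1.1
#   time_until dns      300 nslookup mydomain.com
```

If the gateway already takes a long time to answer, the delay is in the local network or the host's virtual switch. If only DNS is slow, the resolver configuration is the more likely suspect.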
Then change the entrypoint. You cannot delay the entrypoint itself, as it is part of the command that creates the process to run in the container. There is no running container without an already running process, because the container is basically the isolation of that process. What you can do is change the entrypoint script and add a loop that tries the command until it succeeds, with a delay of one second (or more, if needed) between attempts.
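A minimal sketch of such a loop, written as a small helper you could put at the top of entrypoint.sh. The function name `retry` and the numbers in the example are my own choices; I don't know what your entrypoint.sh actually runs, so treat the commented usage as a placeholder.

```shell
#!/bin/sh
# retry MAX_ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
# Runs COMMAND until it succeeds, sleeping DELAY_SECONDS between attempts.
# Returns 0 on success and 1 once MAX_ATTEMPTS attempts have failed.
retry() {
    max_attempts=$1
    delay=$2
    shift 2
    attempt=1
    while ! "$@"; do
        if [ "$attempt" -ge "$max_attempts" ]; then
            echo "giving up after $attempt attempts: $*" >&2
            return 1
        fi
        attempt=$((attempt + 1))
        sleep "$delay"
    done
    return 0
}

# Example (assumed values): wait up to about 2 minutes for outbound
# connectivity before continuing with the rest of the entrypoint.
#   retry 120 1 ping -c 1 -W 1 1.1.1.1 || exit 1
```

Since you mentioned that pinging out of the container seems to speed things up, using `ping` as the retried command would cover both: it waits for the network and sends the traffic that appears to help.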