I’m trying to build a custom Dockerfile, and I’m finding that when I run it, commands such as docker ps -a and docker-compose up take forever to run, occasionally timing out. I didn’t have this issue before I started running this package, and I’m curious about the best ways to debug what’s going wrong. If someone knows the solution, so much the better, but I’m also interested in learning where to start diagnosing problems like these.
docker-compose --verbose up yields a tonne of:
compose.parallel.feed_queue: Pending: set([<Service: slurmctld>, <Service: slurmdbd>])
and ultimately terminates with:
compose.parallel.feed_queue: Pending: set([<Service: slurmctld>, <Service: slurmdbd>])
compose.parallel.parallel_execute_iter: Failed: <Service: slurm_mysql>
compose.parallel.feed_queue: Pending: set([<Service: slurmctld>, <Service: slurmdbd>])
compose.parallel.feed_queue: <Service: slurmdbd> has upstream errors - not processing
compose.parallel.parallel_execute_iter: Failed: <Service: slurmdbd>
compose.parallel.feed_queue: Pending: set([<Service: slurmctld>])
compose.parallel.feed_queue: <Service: slurmctld> has upstream errors - not processing
compose.parallel.parallel_execute_iter: Failed: <Service: slurmctld>
compose.parallel.feed_queue: Pending: set([])
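In output like the above, only one service fails on its own; the others are skipped because of "upstream errors". A minimal sketch for pulling the root failure out of a long verbose log follows. The function name root_failures is made up for illustration, and the patterns assume the log lines look exactly like the ones quoted here.

```shell
# Print services that failed on their own, i.e. were reported as "Failed"
# without also being reported as having upstream errors.
# Reads log text from stdin or from the files given as arguments.
root_failures() {
  awk '
    /has upstream errors/ {
      if (match($0, /<Service: [^>]+>/)) {
        # Strip the leading "<Service: " (10 chars) and the trailing ">"
        upstream[substr($0, RSTART + 10, RLENGTH - 11)] = 1
      }
      next
    }
    /parallel_execute_iter: Failed:/ {
      if (match($0, /<Service: [^>]+>/)) {
        name = substr($0, RSTART + 10, RLENGTH - 11)
        if (!(name in failed)) { failed[name] = 1; order[++n] = name }
      }
    }
    END {
      # Report in the order the failures appeared in the log
      for (i = 1; i <= n; i++)
        if (!(order[i] in upstream)) print order[i]
    }
  ' "$@"
}
```

Something like docker-compose --verbose up 2>&1 | root_failures should then name the service worth inspecting first (for the log above, slurm_mysql), for example with docker-compose logs slurm_mysql.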
If I run docker ps -a while docker-compose is hanging, docker ps -a hangs as well and ultimately times out.
I find that running docker system prune often brings it back to life immediately, but it’s annoying to have to do that each time.
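One possible starting point for diagnosing hangs like this is to run each Docker command under a time limit, so it becomes clear which calls stall and which simply fail. This is only a sketch: probe is a hypothetical helper name, and it assumes the GNU coreutils timeout command, which exits with status 124 when the limit is reached.

```shell
# Run a command with a time limit (in seconds) and report whether it
# finished, failed, or had to be killed for taking too long.
probe() {
  limit="$1"
  shift
  if timeout "$limit" "$@" >/dev/null 2>&1; then
    echo "ok: $*"
  else
    status=$?
    if [ "$status" -eq 124 ]; then
      # GNU timeout uses exit status 124 to signal that the limit was hit
      echo "timeout after ${limit}s: $*"
    else
      echo "failed ($status): $*"
    fi
  fi
}
```

For example, probe 10 docker info, probe 10 docker ps -a and probe 10 docker system df can be run while the hang is happening. If even docker info times out, the daemon itself is probably stuck rather than any one container, and on a systemd host (an assumption about the setup) the daemon log from journalctl -u docker.service is a reasonable next place to look.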
My docker-compose.yml looks like this:
version: "3"
services:
  # Slurm MYSQL
  slurm_mysql:
    image: mysql:5.7
    hostname: slurm_mysql
    container_name: slurm_mysql
    environment:
      MYSQL_RANDOM_ROOT_PASSWORD: "yes"
      MYSQL_DATABASE: slurm_acct_db
      MYSQL_USER: slurm
      MYSQL_PASSWORD: password
    volumes:
      - var_lib_mysql:/var/lib/mysql
    restart: unless-stopped
    networks:
      - nginx_proxy
  # Slurm database interface daemon
  slurmdbd:
    cap_add:
      - SYS_ADMIN
      - DAC_READ_SEARCH
    restart: unless-stopped
    networks:
      - nginx_proxy
    image: slurm-master:17.02.10
    command: ["slurmdbd"]
    container_name: slurmdbd
    hostname: slurmdbd
    volumes:
      - etc_munge:/etc/munge
      - etc_slurm:/etc/slurm
      - var_log_slurm:/var/log/slurm
    expose:
      - "6819"
    depends_on:
      - slurm_mysql
    privileged: true
  # Slurm controller daemon
  slurmctld:
    cap_add:
      - SYS_ADMIN
      - DAC_READ_SEARCH
    privileged: true
    restart: unless-stopped
    networks:
      - nginx_proxy
    image: slurm-master:17.02.10
    command: ["slurm-web"]
    container_name: slurmctld
    hostname: slurmctld
    volumes:
      - etc_munge:/etc/munge
      - etc_slurm:/etc/slurm
      - slurm_jobdir:/data
      - var_log_slurm:/var/log/slurm
      - ./slurm-web:/home/slurm-web
    expose:
      - 6817
      - 80
    depends_on:
      - "slurmdbd"
    build:
      context: ..
      dockerfile: slurm-master/Dockerfile
    environment:
      - VIRTUAL_HOST=slurm-web.mywebsite.com
      - VIRTUAL_NETWORK=nginx-proxy
      - VIRTUAL_PORT=80
      - LETSENCRYPT_HOST=slurm-web.mywebsite.com
      - LETSENCRYPT_EMAIL=slurm-web@slurm-web.mywebsite.com
volumes:
  etc_munge:
  etc_slurm:
  slurm_jobdir:
  var_lib_mysql:
  var_log_slurm:
networks:
  nginx_proxy:
    external: true