I’ve had this problem for a while, and I always end up at the same point without knowing how to fix it.
I’m using docker-compose and have attempted to create an overlay network to connect two containers (running on separate VMs) within a docker swarm. I referenced this section of the official documentation to set this up:
I went ahead and created a swarm and joined the two nodes:
sudo docker node ls
ID                            HOSTNAME                  STATUS    AVAILABILITY   MANAGER STATUS   ENGINE VERSION
01bpw9tjjlzeyu3ta530piq2e     arch160.domain.com        Ready     Active                          20.10.8
5be93cjhrc5pxvmk36jt0h563 *   archZFSProxy.domain.com   Ready     Active         Leader           20.10.8
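For reference, creating a swarm and joining a worker usually comes down to the commands sketched below. This is a minimal sketch rather than the exact commands used here: the function names `init_manager` and `join_worker` are hypothetical, and the manager IP address is a placeholder that has to be supplied by hand.

```shell
# Hypothetical helper names; only the docker subcommands are real.

# Run on the manager: initialise the swarm on the given address and
# print only the join token for workers.
init_manager() {
    manager_ip="$1"
    docker swarm init --advertise-addr "$manager_ip" >/dev/null &&
    docker swarm join-token -q worker
}

# Run on the worker: join the swarm with the token printed by the manager.
# 2377 is the default swarm management port.
join_worker() {
    token="$1"
    manager_ip="$2"
    docker swarm join --token "$token" "$manager_ip:2377"
}
```

After both steps, `docker node ls` on the manager should list both nodes, as in the output above.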
Within the docker-compose file for the manager I have the following:
version: '3.9'
networks:
  net:
    name: net
    driver: bridge
    ipam:
      config:
        - subnet: 10.190.0.0/24
  watchtower-ubuntumc:
    name: watchtower_ubuntumc
    driver: bridge
  openldap-net:
    name: openldap-net
    driver: overlay
    attachable: true
    ipam:
      config:
        - subnet: 10.90.0.0/24
I have the following container (openldap) utilizing this network:
services:
  openldap:
    build:
      context: .
      dockerfile: Dockerfile-openldap
    # image: osixia/openldap-backup:latest
    container_name: openldap
    labels:
      - "com.centurylinklabs.watchtower.enable=false"
      - "com.centurylinklabs.watchtower.scope=archzfsproxy"
    restart: always
    hostname: openldap
    domainname: domain.com
    networks:
      net:
      openldap-net:
        aliases:
          - openldap1
        ipv4_address: 10.90.0.2
If I inspect the network list from the manager I have the following:
❯ sudo docker network ls
NETWORK ID     NAME                  DRIVER    SCOPE
0afa863d1a38   bridge                bridge    local
c094888160f3   docker_gwbridge       bridge    local
6ea931cc3eda   host                  host      local
tsji27aqyqku   ingress               overlay   swarm
d8197a60ed27   net                   bridge    local
1037c20ae31f   none                  null      local
kqw7j9kxnkk6   openldap-net          overlay   swarm
b9e36dfe816d   watchtower_ubuntumc   bridge    local
Although the worker node has worked in the past, the container can’t start because the overlay network isn’t reachable. Here are the relevant logs on the worker node:
sudo docker-compose up -d
WARNING: The Docker Engine you're using is running in swarm mode.
Compose does not use swarm mode to deploy services to multiple nodes in a swarm. All containers will be scheduled on the current node.
To deploy your application across the swarm, use `docker stack deploy`.
Starting openldap2 ... error
ERROR: for openldap2 Cannot start service openldap2: Could not attach to network zc17lbud1gsrr7amkrj72pjvc: rpc error: code = NotFound desc = network zc17lbud1gsrr7amkrj72pjvc not found
ERROR: for openldap2 Cannot start service openldap2: Could not attach to network zc17lbud1gsrr7amkrj72pjvc: rpc error: code = NotFound desc = network zc17lbud1gsrr7amkrj72pjvc not found
ERROR: Encountered errors while bringing up the project.
So it’s trying to find the docker network, which I’m guessing is the overlay network designated by zc17lbud1gsrr7amkrj72pjvc. That ID doesn’t match the kqw7j9kxnkk6 that the manager lists for openldap-net. Where is it getting this network ID?
Here are the networks as seen by the worker node:
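One way to see where that ID comes from is to compare what the existing openldap2 container has recorded with what the swarm currently reports. The following is a minimal sketch, assuming the container was created earlier and still references a network ID from that time (an assumption, not something the logs confirm). The helper names `container_networks` and `network_id` are hypothetical.

```shell
# Hypothetical helper names; only the docker subcommands are real.

# Show the networks (including their NetworkID values) recorded in an
# existing container's configuration, e.g. openldap2 on the worker.
container_networks() {
    docker inspect --format '{{json .NetworkSettings.Networks}}' "$1"
}

# Show the full, untruncated ID of a network by name, e.g. openldap-net.
# Intended to be run on the manager, where the overlay network is listed.
network_id() {
    docker network inspect --format '{{.Id}}' "$1"
}
```

Running `container_networks openldap2` on the worker and `network_id openldap-net` on the manager, then comparing the two IDs, would show whether the container is pointing at a network that no longer exists.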
❯ sudo docker network ls
NETWORK ID     NAME              DRIVER    SCOPE
15ae93d56fa3   bridge            bridge    local
315bfa9f2ade   docker-net        bridge    local
03274edc9e94   docker_gwbridge   bridge    local
5969c9f024f2   host              host      local
tsji27aqyqku   ingress           overlay   swarm
bde961b8ece2   none              null      local
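As far as I understand, a worker normally lists an overlay network only once something on that node is attached to it, so openldap-net being absent from this list is not necessarily the fault by itself. The rough sketch below checks whether the worker currently lists a given overlay network, and tries to attach a throwaway container to it. The helper names are hypothetical, and the `alpine` image is only an assumed stand-in for any small image available on the host.

```shell
# Hypothetical helper names; only the docker subcommands are real.

# Succeeds (exit status 0) if an overlay network with exactly this name
# is currently listed on the node where the command runs.
overlay_listed() {
    docker network ls --filter driver=overlay --format '{{.Name}}' |
        grep -qx "$1"
}

# Try to attach a short-lived container to the network by name.
# "alpine" is an assumed image; the network must be attachable.
probe_overlay() {
    docker run --rm --network "$1" alpine true
}
```

If `probe_overlay openldap-net` succeeds on the worker, the swarm can still hand out the network there, which would point at the existing openldap2 container rather than at the swarm itself.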
Here are sections of my docker-compose file for the worker node:
---
version: '3.9'
networks:
  docker-net:
    name: docker-net
    driver: bridge
    ipam:
      config:
        - subnet: 10.160.0.0/24
  openldap-net:
    external: true
    name: openldap-net
    driver: overlay
services:
  openldap2:
    # image: osixia/openldap-backup:1.4.0
    build:
      context: .
      dockerfile: Dockerfile
    container_name: openldap2
    hostname: openldap2
    domainname: domain.com
    restart: unless-stopped
    networks:
      docker-net:
      openldap-net:
        aliases:
          - openldap2
        ipv4_address: 10.90.0.4
So I’m stumped. The worker container won’t start because it’s looking for a specific network. This type of error usually happens when the VMs are up and running and I then manually restart the hosts or turn off the hypervisors. When I bring the containers back up from a cold start, the worker container fails to start with this error.
How do I debug this issue further??