Hmm, my apologies, I missed a small thing in my testing: it appears that DNS resolution between Swarm containers is what isn’t working correctly.
The containers are meant to talk to each other by name.
No, they don’t have overlapping IP ranges (as docker service inspect shows).
Looking at the logs of paperless-mvp_paperless, I get this error: “Error: Error -2 connecting to redis:6379. Name or service not known…”, whereas I would expect the name to resolve.
The public network was created with docker network create -d overlay public.
You can always check the effective compose configuration with docker compose config.
I created the compose file in a folder called ab-network-test, that’s why the (project) name is ab-network-test:
Note: the null setting for the networks just means there is no further customization (like setting an alias name). It is the default value.
So both services share a common network: the default network. Therefore, the DNS-based service discovery should be able to resolve the containers by service name, and the containers should be able to communicate over the default network.
Make sure there is no other redis service attached to the public network, otherwise the paperless service might pick randomly between the service from the public and default network.
Is it possible that the redis container is crashing? You have no restart policies configured for your services, thus if they die, they stay dead.
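A quick way to see whether a service's tasks keep dying is to look at its task history and recent logs. This is only a sketch: the function name inspect_service is made up for illustration, and the service name paperless-mvp_redis is an assumption based on the stack name in the log line quoted above. Setting DOCKER=echo gives a dry run that only prints the commands.

```shell
# Sketch: show task history and recent logs for one Swarm service.
# Set DOCKER=echo for a dry run that only prints the commands.
inspect_service() {
  svc="$1"   # full service name, e.g. <stack>_<service>
  # Task history including failed or shut down tasks and their error text
  ${DOCKER:-docker} service ps --no-trunc "$svc"
  # Last 50 log lines across the service's tasks
  ${DOCKER:-docker} service logs --tail 50 "$svc"
}

# Usage (assumed service name):
#   inspect_service paperless-mvp_redis
```

If the task list shows a growing number of Failed or Shutdown entries for redis, the name resolution error is probably just a symptom.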
And I personally wouldn’t use depends_on when going into a larger Swarm cluster. This is great for a single host, but I wouldn’t use it in a distributed environment. But that’s just my personal opinion.
Is it? The default network is added to a service by default, unless the service’s networks are specifically configured → then it must be explicitly added, as seen with the paperless service.
I just deployed the stack and tested it: docker run -it --net container:$(docker ps -q --filter=name=paperless) nicolaka/netshoot ping redis
Of course, it’s working.
If this doesn’t work for you, then there must be something wrong with the overlay communication.
This could easily be tested by pinning the services to the same node using deployment constraints.
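For a temporary test, the constraint can be added to the running services without touching the compose file. A minimal sketch, assuming the node is identified by its hostname as shown in docker node ls; the function name pin_to_node and the node and service names in the usage comment are placeholders, not taken from the actual setup. Setting DOCKER=echo gives a dry run.

```shell
# Sketch: pin one or more services to a single node via a placement constraint.
# Set DOCKER=echo for a dry run that only prints the commands.
pin_to_node() {
  node="$1"   # node hostname as listed by `docker node ls`
  shift
  for svc in "$@"; do
    ${DOCKER:-docker} service update --constraint-add "node.hostname==$node" "$svc"
  done
}

# Usage (placeholder names):
#   pin_to_node worker1 paperless-mvp_paperless paperless-mvp_redis
# Undo later with:
#   docker service update --constraint-rm "node.hostname==worker1" <service>
```

If name resolution works once both services run on the same node, the problem is very likely in the node-to-node overlay communication rather than in the stack definition.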
As far as I remember, depends_on was not implemented for Swarm services. Did they implement it recently?
Thank you both for your help. Upon further investigation, I noticed that there is another, underlying problem with my docker swarm setup that I believe is causing this issue (as you correctly surmised haha). On every worker node, dockerd is failing to communicate with other nodes over port 7946, which would explain why services cannot reach each other when they are deployed on different nodes.
This is happening independently of Linux distribution.
I only see in netstat -tulpn:
tcp6 0 0 :::7946 :::* LISTEN 22249/dockerd
And no ipv4 equivalent. Changing --listen-addr and --advertise-addr on each node does not help.
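As a side note, on Linux a tcp6 listener on :::7946 usually also accepts IPv4 connections (when net.ipv6.bindv6only is 0), so the netstat output alone may not prove much. An actual connect test from another node could tell more. The following is only a sketch: check_tcp is a made-up helper, it relies on bash and the coreutils timeout command being installed, and it only covers TCP. Swarm also needs 7946/udp and 4789/udp between nodes, which this does not test.

```shell
# Sketch: test whether a TCP port on another node accepts connections.
# Requires bash (for /dev/tcp) and coreutils `timeout`.
check_tcp() {
  host="$1"
  port="$2"
  # bash opens a TCP connection when redirecting to /dev/tcp/<host>/<port>;
  # timeout bounds the wait in case packets are silently dropped.
  if timeout 3 bash -c "exec 3<>/dev/tcp/$host/$port" 2>/dev/null; then
    echo "reachable"
  else
    echo "unreachable"
  fi
}

# Usage, run on one node against the IP of another node:
#   check_tcp <other-node-ip> 7946   # node-to-node communication
#   check_tcp <manager-ip> 2377      # cluster management, managers only
```

An "unreachable" result here can mean either a refused connection or a timeout, so a firewall between the nodes is not ruled out by it.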
Not sure where to go from here.
EDIT: so, using --listen-addr 192.168.x.y instead of --listen-addr 0.0.0.0, where 192.168.x.y is the IP address of the node, does seem to work?!
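For anyone trying the same thing, this is roughly what joining a worker with an explicit address could look like. It is a sketch under assumptions: it assumes the flags are passed to docker swarm join on a node that has left the swarm, the function name join_swarm is made up, and the token and manager address are placeholders. Setting DOCKER=echo gives a dry run.

```shell
# Sketch: join a swarm with the node's own IP as listen and advertise address.
# Set DOCKER=echo for a dry run that only prints the command.
join_swarm() {
  node_ip="$1"   # this node's own IP, e.g. 192.168.x.y
  token="$2"     # worker join token from `docker swarm join-token worker`
  manager="$3"   # IP or hostname of a manager node
  ${DOCKER:-docker} swarm join \
    --listen-addr "$node_ip" \
    --advertise-addr "$node_ip" \
    --token "$token" \
    "$manager:2377"
}

# Usage (placeholders):
#   join_swarm 192.168.x.y <token> <manager-ip>
```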