I have a swarm with two nodes on it. I’m able to deploy my stack to the swarm, but I’m having trouble with one of my services that requires network access.
The compose section for the service looks like this:
# Icecream daemon
# Allows the host to be used as a build node for the scheduler
icecc-daemon:
  image: git.example.com/devops/docker-services/icecc-daemon
  build:
    context: ./
    dockerfile: services/icecc-daemon.dockerfile
  restart: unless-stopped
  ports:
    - "8766:8766"
    - "10245:10245"
  depends_on:
    - "icecc-scheduler"
  deploy:
    mode: global
and its Dockerfile looks like this:
# Use dev container because it already has cross compilers
FROM ubuntu:focal
RUN apt-get update \
    && apt-get install -y \
        icecc \
        build-essential \
        libncurses-dev \
        libssl-dev \
        libelf-dev \
        libudev-dev \
        libpci-dev \
        libiberty-dev \
    && apt-get autoclean \
    && rm -rf \
        /var/lib/apt/lists/* \
        /var/tmp/* \
        /tmp/*
EXPOSE 10245
EXPOSE 8766
ENTRYPOINT [ "iceccd", "-vvv", "-n", "focal" ]
After deploying the stack, 2 replicas of this service are created, as expected, yet only the instance on the manager machine is able to connect to the scheduler (another service in the stack).
The instance running on the worker node gives this error in its logs:
It’s acting like it can’t get access to the host machine’s network, which doesn’t make sense. How can the one on the manager access the host but the one on the worker can’t?
Seems like some functionality of the application inside the container needs one or more capabilities that are not available to the container. Usually they are found in the documentation of the application (or can be identified with tools) and need to be added to the compose file using “cap_add”.
Since docker-ce 20.10.0, capabilities are supported with swarm deployments. Please ignore that the compose file version 3 reference claims cap_add is not available for swarm deployments; it is just an inconsistency where the documentation did not catch up with the implementation.
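As a quick sanity check, make sure every node meets that version requirement; the engine version on each node can be read with `docker version --format '{{.Server.Version}}'`. A small sketch of comparing the reported version against 20.10.0 (the `have` value here is a placeholder — substitute what your node actually reports):

```shell
# cap_add in swarm stack deployments needs docker-ce >= 20.10.0 on the nodes.
have="20.10.7"   # placeholder: substitute the output of `docker version --format '{{.Server.Version}}'`
need="20.10.0"

# sort -V sorts version strings; if the required version sorts first (or
# equal), the node's engine is new enough.
if [ "$(printf '%s\n' "$need" "$have" | sort -V | head -n 1)" = "$need" ]; then
    echo "cap_add supported"
else
    echo "engine too old for cap_add in swarm"
fi
```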
I guess it might work with these capabilities added:
# Icecream daemon
# Allows the host to be used as a build node for the scheduler
icecc-daemon:
  image: git.example.com/devops/docker-services/icecc-daemon
  build:
    context: ./
    dockerfile: services/icecc-daemon.dockerfile
  restart: unless-stopped
  ports:
    - "8766:8766"
    - "10245:10245"
  depends_on:
    - "icecc-scheduler"
  deploy:
    mode: global
  cap_add:
    - NET_ADMIN
    - NET_BROADCAST
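You can also verify which capabilities the daemon process actually received after redeploying. Inside the container, CapEff in /proc/1/status is a hex bitmask; per linux/capability.h, CAP_NET_BROADCAST is bit 11 and CAP_NET_ADMIN is bit 12. A sketch of the decoding (the `docker exec` target in the comment is a placeholder; here the script reads its own status file just to demonstrate the bit arithmetic):

```shell
# Inside the container this would be something like:
#   docker exec <container-id> sh -c 'grep CapEff /proc/1/status'
# CapEff is a hex bitmask of the effective capabilities of the process.
caps=$(awk '/^CapEff/ {print $2}' /proc/self/status)
echo "CapEff mask: $caps"

# Test individual bits: 1 means the capability is present, 0 means absent.
echo "CAP_NET_BROADCAST (bit 11): $(( (0x$caps >> 11) & 1 ))"
echo "CAP_NET_ADMIN     (bit 12): $(( (0x$caps >> 12) & 1 ))"
```

If the mask inside the container doesn’t change after adding cap_add, the engine silently dropped the setting (usually a version problem).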
If it works, try to remove one of the two capabilities and see if it still works with just one of them, or whether it requires both. It might, of course, also require additional capabilities.
Make sure you use version “3.8” or “3.9” for your compose file, as the docker compose version 3 reference does not indicate for which versions of the compose file schema this configuration element is valid for swarm stack deployments.
I applied those capabilities, and now when I deploy, it’ll only deploy the daemon as 1/1 rather than 2/2 (still 2 nodes on the swarm). I have to make the worker node leave and rejoin for it to be detected and deployed to, but after a few seconds it’ll go back to 1/1.
Since you didn’t specify a network, swarm should create a swarm scoped default network (by default they are called {stack name}_default)
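If you want to rule the default network out, you could also declare the network explicitly in the stack file and attach both services to it. A sketch of what that could look like (the network name is just illustrative):

```yaml
# Explicit overlay network shared by the scheduler and the daemons
networks:
  icecc-net:
    driver: overlay

services:
  icecc-scheduler:
    networks:
      - icecc-net
  icecc-daemon:
    networks:
      - icecc-net
```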
I have no idea which capabilities you use and how each of them affects the host kernel, but you will want to apply least privilege here and only use the capabilities that are necessary.
I could understand it if you used the host network for the services and the capabilities resulted in the host interface being modified from the container… but it doesn’t seem like this is the case.
That said: I have no idea why you experience what you experience; it doesn’t make sense to me that you experience it at all.
The first time I deployed, I used the --with-registry-auth flag; I had forgotten that this time. After adding that flag back I’m still getting the same error of it not being able to connect to the scheduler.
Though, swarm containers do not support privileged mode, and they never will. Even
...
cap_add:
  - ALL
...
will not result in the same capabilities a privileged container has.
I can’t really help you with your problem, other than pointing you in the direction that it is caused by missing capabilities and that you need to add them to the container.