I’m trying to create two services in Docker swarm that can TFTP files between them.
The docker-compose.yml file I provide to swarm is as follows:
version: "3"
services:
  tftp-server:
    image: tftp-server
    ports:
      - "69/udp"
      - "40000:40000/udp"
      - "40001:40001/udp"
  tftp-client:
    image: tftp-client
And I deploy it with:
docker stack deploy -c docker-compose.yml tftp-example
My tftp-server image is built from the following Dockerfile:
FROM alpine:latest
RUN apk update; \
    apk add tftp-hpa tcpdump
RUN mkdir /tftpd; \
    touch /tftpd/test.txt; \
    chmod -R 777 /tftpd
EXPOSE 69/udp
EXPOSE 40000/udp
EXPOSE 40001/udp
CMD in.tftpd -4 -Lvvv -R 40000:40001 --address 0.0.0.0:69 /tftpd
My tftp-client image is built from the following Dockerfile:
FROM alpine:latest
RUN apk update; \
    apk add tftp-hpa
RUN echo "hello world" >> test.txt
CMD tail -f /dev/null
Once deployed, I exec into the client with the following command:
docker container exec -it $(docker container ls -aqf name=client) sh
And attempt a TFTP transfer with:
tftp -vvv tftp-example_tftp-server 69 -R 40000:40001 -c put /test.txt /tftpd/test.txt
With Docker version 17.03.1-ce, build c6d412e running on CentOS 7, this works as expected. The tftp command runs successfully with the following output:
Connected to tftp-example_tftp-server (10.0.0.4), port 69
putting /test.txt to tftp-example_tftp-server:/tftpd/test.txt [netascii]
Sent 13 bytes in 0.0 seconds [7217 bit/s]
Running tcpdump -i any within the tftp-server container shows the following traffic, which confirms the transfer is working. From the output, it appears the two containers are talking directly.
10:35:48.453304 eth2 In IP tftp-example_tftp-client.1.nz949j5gydq45moyekqz0g2fj.tftp-example_default.40000 > c1a95db058b1.69: TFTP, length 27, WRQ "/tftpd/test.txt" netascii
10:35:48.453834 eth2 Out IP c1a95db058b1.40000 > tftp-example_tftp-client.1.nz949j5gydq45moyekqz0g2fj.tftp-example_default.40000: UDP, length 4
10:35:48.453907 eth2 In IP tftp-example_tftp-client.1.nz949j5gydq45moyekqz0g2fj.tftp-example_default.40000 > c1a95db058b1.40000: UDP, length 17
10:35:48.454006 eth2 Out IP c1a95db058b1.40000 > tftp-example_tftp-client.1.nz949j5gydq45moyekqz0g2fj.tftp-example_default.40000: UDP, length 4
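As a sanity check on my reading of this capture, the packet lengths line up with the TFTP packet layouts in RFC 1350. Below is a minimal Python sketch that rebuilds the write request, the ACK and the single DATA packet for this transfer and prints their sizes. It is not part of my setup: the helper names (wrq, ack, data, to_netascii) are my own, and the netascii conversion only handles the bare line feed in my test file.

```python
import struct

# Opcodes as defined in RFC 1350.
OP_WRQ, OP_DATA, OP_ACK = 2, 3, 4


def wrq(filename: str, mode: str = "netascii") -> bytes:
    """Write request: opcode | filename | 0 | mode | 0."""
    return (
        struct.pack("!H", OP_WRQ)
        + filename.encode("ascii") + b"\x00"
        + mode.encode("ascii") + b"\x00"
    )


def ack(block: int) -> bytes:
    """Acknowledgement: opcode | block number."""
    return struct.pack("!HH", OP_ACK, block)


def data(block: int, payload: bytes) -> bytes:
    """Data packet: opcode | block number | up to 512 bytes of payload."""
    return struct.pack("!HH", OP_DATA, block) + payload


def to_netascii(raw: bytes) -> bytes:
    """Simplified netascii: send each bare LF as CR LF (bare CR is not handled)."""
    return raw.replace(b"\n", b"\r\n")


# The client image creates test.txt with: echo "hello world" >> test.txt
payload = to_netascii(b"hello world\n")

print(len(wrq("/tftpd/test.txt")))  # 27, matches "TFTP, length 27, WRQ"
print(len(ack(0)))                  # 4, matches "UDP, length 4"
print(len(data(1, payload)))        # 17, matches "UDP, length 17"
```

The 13-byte payload also agrees with the "Sent 13 bytes" line printed by the tftp client, so the four packets above look like a complete WRQ, ACK, DATA, ACK exchange.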
However, on Rocky 9, running Docker version 24.0.6, build ed223bc, with the exact same docker-compose.yml and Dockerfiles, the tftp command hangs, times out, and then segfaults with the following output:
Connected to tftp-example_tftp-server (10.0.3.7), port 69
putting /test.txt to tftp-example_tftp-server:/tftpd/test.txt [netascii]
Transfer timed out.
Segmentation fault (core dumped)
The output of tcpdump -i any, run within the tftp-server container, is as follows:
10:42:35.951161 eth1 B ARP, Request who-has 10.0.3.7 tell tftp-example_tftp-client.1.jh6hjhn9qwu3oiz0ac8ni0e6v.tftp-example_default, length 28
10:42:35.951291 eth1 B ARP, Request who-has 1cfd400d96d7 tell 10.0.3.4, length 28
10:42:35.951303 eth1 Out ARP, Reply 1cfd400d96d7 is-at 02:42:0a:00:03:08 (oui Unknown), length 28
10:42:35.951313 eth1 In IP 10.0.3.4.40000 > 1cfd400d96d7.69: TFTP, length 27, WRQ "/tftpd/test.txt" netascii
10:42:35.951894 eth1 Out IP 1cfd400d96d7.40000 > 10.0.3.4.40000: UDP, length 4
10:42:35.951922 eth1 In IP 10.0.3.4 > 1cfd400d96d7: ICMP 10.0.3.4 udp port 40000 unreachable, length 40
In this case, 1cfd400d96d7 is the container ID of the tftp-server and, interestingly, 10.0.3.4 is the load balancer that Docker Swarm creates for the network that both services belong to (in this example, it is named lb-tftp-example_default).
According to the TFTP RFC:
The initial request happens over (conventionally) port 69. Then high ephemeral ports are used for the actual file transfer.
So from the tcpdump, it seems the tftp-client service talks to the tftp-server service on port 69, as expected. The server then tries to begin the file transfer from a high port, sending its reply to the Swarm service load balancer. But rather than forwarding the packet on to the tftp-client container, the LB just responds with “udp port 40000 unreachable”.
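To make the port switch concrete, here is a minimal sketch using plain UDP sockets on localhost. It is only an illustration of the TFTP behaviour described above, not a reproduction of the Swarm problem: the function names are hypothetical, and an OS-assigned port stands in for port 69 so that it runs without root.

```python
import socket
import threading


def serve_once(listen_sock: socket.socket) -> None:
    """Receive one request, then answer from a different port, as TFTP does."""
    _, client_addr = listen_sock.recvfrom(516)
    # The reply comes from a NEW socket, so its source port is not the
    # port the request was sent to.
    reply_sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    reply_sock.bind(("127.0.0.1", 0))
    reply_sock.sendto(b"\x00\x04\x00\x00", client_addr)  # ACK for block 0
    reply_sock.close()


def demo() -> tuple:
    # Stand-in for the well-known port 69 (assumption: any free port will do).
    listen_sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    listen_sock.bind(("127.0.0.1", 0))
    request_port = listen_sock.getsockname()[1]

    server = threading.Thread(target=serve_once, args=(listen_sock,))
    server.start()

    client = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    client.settimeout(5)
    # WRQ for /tftpd/test.txt in netascii mode.
    client.sendto(
        b"\x00\x02/tftpd/test.txt\x00netascii\x00",
        ("127.0.0.1", request_port),
    )
    reply, (_, reply_port) = client.recvfrom(516)

    server.join()
    client.close()
    listen_sock.close()
    return request_port, reply_port, reply


request_port, reply_port, reply = demo()
print("request sent to port:", request_port)
print("reply came from port:", reply_port)
```

The client gets its ACK from a port it never sent anything to. My guess is that anything in the path that tracks UDP flows by address and port would see that reply as unrelated to the original request, but I have not confirmed that this is what the Swarm load balancer is doing.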
Does anyone know why this is happening? Is this a bug in the newer version of Docker Engine (specifically Swarm), or is there some configuration that I’m missing to make this work?
Any help would be very greatly appreciated!