Docker UDP NAT bug

Expected behavior

Separate containers using UDP get separate NAT mappings, i.e. distinct external source address/port pairs.

Actual behavior

Both containers are allocated the same source port in the NAT. The NAT has no way to tell which container a reply packet is destined for.

Information

  • the output of:
    Version 1.12.0-rc2-beta17 (build: 9779)
    ff18c0c63c5ff3c4a4a925d191d5592d655779d7

  • host distribution and version ( OSX 10.10.x, OSX 10.11.x, Windows, etc )
    OSX El Capitan

Steps to reproduce the behavior

  1. On the host network interface of the Mac, run tcpdump
  2. In container A, run
    netcat -p 8888 -u 1.1.1.1 8888
  3. Type a few lines to send packets
  4. Repeat netcat operation for container B
  5. In the tcpdump output, note that all packets originate from the same IP and port. There is no distinction between the two containers. For bidirectional protocols, the NAT is unable to return replies to the correct container.

14:34:54.276409 IP hostname.61990 > 1.1.1.1.ddi-udp-1: UDP, length 5
14:35:52.245710 IP hostname.61990 > 1.1.1.1.ddi-udp-1: UDP, length 5
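
For anyone without netcat in the container image, the send step can be approximated with a few lines of Python. This is only a sketch of the same idea (one UDP datagram from a fixed source port); the function name and the 5-byte default payload are my own choices for illustration, not anything Docker provides.

```python
import socket


def send_from_fixed_port(dest_host, dest_port, source_port, payload=b"test\n"):
    """Send one UDP datagram from a fixed source port.

    Roughly what `netcat -p PORT -u HOST PORT` does for each typed line.
    Returns the number of bytes handed to the network stack.
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        # Allow quick re-runs without waiting for the port to be released.
        sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
        # Bind to the fixed source port on all interfaces.
        sock.bind(("", source_port))
        return sock.sendto(payload, (dest_host, dest_port))
    finally:
        sock.close()


# Inside each container, the equivalent of the netcat step would be:
#   send_from_fixed_port("1.1.1.1", 8888, 8888)
```

Running it in container A and then in container B while tcpdump is capturing on the host should show whether the two senders end up with distinct external source ports.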

This NAT implementation is broken for UDP. Every combination of private source address and private source port should be mapped to a unique external source address and port. (It is not necessary, and in fact not desirable, to map to a separate source port per destination address - that is so-called “symmetric NAT”, as opposed to “full cone”. “Full cone” is greatly preferable for NAT traversal techniques.)
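
To make the mapping rule concrete, here is a toy model of the behaviour I would expect. It is a sketch only: the class, its method names, the starting port and the example addresses are all hypothetical, and it models just the mapping side of “full cone” (it does no filtering and never expires entries).

```python
class EndpointIndependentNat:
    """Toy NAT table: one external port per (private address, private port)."""

    def __init__(self, external_ip, first_port=50000):
        self.external_ip = external_ip
        self.next_port = first_port
        self.outbound = {}  # (private_ip, private_port) -> external_port
        self.inbound = {}   # external_port -> (private_ip, private_port)

    def translate_out(self, private_ip, private_port, dest_ip, dest_port):
        """Return the external (ip, port) used for an outgoing packet.

        The destination is deliberately ignored, so a private endpoint keeps
        the same external port no matter where it sends.
        """
        key = (private_ip, private_port)
        if key not in self.outbound:
            self.outbound[key] = self.next_port
            self.inbound[self.next_port] = key
            self.next_port += 1
        return (self.external_ip, self.outbound[key])

    def translate_in(self, external_port):
        """Return the private endpoint a reply belongs to, or None."""
        return self.inbound.get(external_port)


nat = EndpointIndependentNat("203.0.113.1")
a = nat.translate_out("172.17.0.2", 8888, "1.1.1.1", 8888)
b = nat.translate_out("172.17.0.3", 8888, "1.1.1.1", 8888)
print(a != b)  # two containers, two distinct external ports
```

Because each external port maps back to exactly one private endpoint, a reply arriving on that port can always be delivered to the right container. The behaviour reported above corresponds to both containers sharing one entry in the inbound table.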

Looking on the xhyve host, it seems that Docker sets up MASQUERADE in iptables in order to provide NAT functionality.

Adding the ‘--random’ flag to the iptables rule should resolve this NAT issue. For example:

iptables -t nat -A POSTROUTING -s 172.17.0.0/16 \! -o docker0 -j MASQUERADE --random

Actually, I have to withdraw that assertion. The problem is not within the Linux VM, it’s in how the Linux VM is bridged to the host networking.

Looking at tcpdump inside the xhyve host, you can see that two containers have been NAT’ted to separate source ports:

root@moby:~# tcpdump -i eth0 host 1.1.1.1
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on eth0, link-type EN10MB (Ethernet), capture size 65535 bytes
16:22:12.838729 IP 192.168.65.2.15547 > 1.1.1.1.8888: UDP, length 5
16:22:15.672820 IP 192.168.65.2.15548 > 1.1.1.1.8888: UDP, length 5

Listening at the same time on the Mac’s native ethernet, you can see that these two packets have had their source port altered again, and both to the same port:

$ sudo tcpdump -i en4 host 1.1.1.1
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on en4, link-type EN10MB (Ethernet), capture size 262144 bytes
11:22:41.324018 IP hostname.56249 > 1.1.1.1.ddi-udp-1: UDP, length 5
11:22:43.382804 IP hostname.56249 > 1.1.1.1.ddi-udp-1: UDP, length 5

So the first layer of address translation that happens inside Linux is fine. It’s the xhyve NAT that’s broken.
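
The comparison of the two captures can also be done mechanically. The snippet below is a rough sketch that pulls the source ports out of tcpdump's default one-line output; the regular expression and the function name are my own, and it assumes lines of the form `IP <src>.<port> > <dst>.<port>:` as in the captures above.

```python
import re

# Matches the "IP <source>.<port> > " part of a default tcpdump output line.
_SRC = re.compile(r"\bIP (\S+)\.(\d+) > ")


def source_ports(capture_text):
    """Return the set of source ports seen in a block of tcpdump output."""
    ports = set()
    for line in capture_text.splitlines():
        match = _SRC.search(line)
        if match:
            ports.add(int(match.group(2)))
    return ports


inside_vm = """\
16:22:12.838729 IP 192.168.65.2.15547 > 1.1.1.1.8888: UDP, length 5
16:22:15.672820 IP 192.168.65.2.15548 > 1.1.1.1.8888: UDP, length 5
"""
on_mac = """\
11:22:41.324018 IP hostname.56249 > 1.1.1.1.ddi-udp-1: UDP, length 5
11:22:43.382804 IP hostname.56249 > 1.1.1.1.ddi-udp-1: UDP, length 5
"""

print(sorted(source_ports(inside_vm)))  # [15547, 15548]
print(sorted(source_ports(on_mac)))     # [56249]
```

Two distinct ports inside the VM collapsing to a single port on the Mac's interface is the signature of the problem: the second translation layer has merged the two flows.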

Final followup - I updated to the Docker stable release for Mac, and found that this issue is already resolved. Thanks!