Any tools for debugging Swarm Overlay Network?

I’ve been trying to debug a network connection issue between two containers in a Swarm overlay network. Does anyone know any tools I’m not using to check connectivity or overlay network settings between the two containers?

The two containers (let’s call them A and B) are on separate EC2 instances in AWS. There’s also a third container (C) in the same overlay network on a third EC2 instance.

Here’s the behavior:

  • B has a service running on an open port.
  • A cannot reach B at all: telnet to B’s overlay network IP times out
  • C can telnet to B’s service on the exposed port
  • B also cannot reach A

What I’ve checked:

Version info (docker version):

Client:
 Version:      17.05.0-ce
 API version:  1.29
 Go version:   go1.7.5
 Git commit:   89658be
 Built:        Thu May  4 22:06:06 2017
 OS/Arch:      linux/amd64

Server:
 Version:      17.05.0-ce
 API version:  1.29 (minimum version 1.12)
 Go version:   go1.7.5
 Git commit:   89658be
 Built:        Thu May  4 22:06:06 2017
 OS/Arch:      linux/amd64
 Experimental: false

Some more things I found – this is a great post on docker container networking: http://securitynik.blogspot.com/2016/12/docker-networking-internals-how-docker_16.html

I think I’ve also narrowed it down to the VXLAN for the Docker overlay network I’m using. I believe it’s not forwarding traffic to the destination container: I can see the echo requests from container A to container B on the VXLAN interface, yet A still can’t reach B.

$:/home/ubuntu# docker network ls | grep some_network_name
rwr3wp7zmfr9        some_network_name   overlay             swarm

$: iptables -v -L DOCKER-OVERLAY | grep rwr3wp7zmfr9
262M  220G rwr3wp7zmfr9  all  --  any    ov-001002-rwr3w  anywhere             anywhere

$: brctl show ov-001002-rwr3w
ov-001002-rwr3w		8000.561fc7cf8fe0	no		veth53ca711
						                        vethd811b76
						                        vx-001002-rwr3w

<not shown -- ping 10.200.0.10 from 10.200.0.5 container where *.5 is container A and *.10 is container B>

$: tcpdump -ni vx-001002-rwr3w host 10.200.0.5 and host 10.200.0.10
23:29:57.814807 IP 10.200.0.5 > 10.200.0.10: ICMP echo request, id 1034, seq 4, length 64
...

That’s as far as I’ve gotten so far. FYI for anyone else digging in the same spot.

And do the security groups of the ECS instances have those ports open for traffic on all containers?

And are they in the same VPC?

Yes, the ports are open (2377, 7946, 4789) and the containers are in the same VPC.
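For anyone checking the same thing, here's a rough sketch of how the TCP side of those ports could be probed from one host to another. The protocol split (2377/tcp, 7946/tcp and udp, 4789/udp) is what the Docker Swarm docs list; `swarm_ports` and `probe_tcp_ports` are names made up for this example, and it assumes `nc` is installed. UDP 4789 (the VXLAN data path) can't be probed reliably this way, so a security group that only allows TCP would still pass this check.

```shell
# Protocol/port pairs that Docker's Swarm documentation lists as required.
swarm_ports() {
  printf '%s\n' 'tcp 2377' 'tcp 7946' 'udp 7946' 'udp 4789'
}

# Hypothetical helper: try each TCP port on a peer host and report the result.
# UDP entries are skipped; note that 2377 only answers on manager nodes.
probe_tcp_ports() {
  peer="$1"
  swarm_ports | while read -r proto port; do
    [ "$proto" = tcp ] || continue
    if nc -z -w 3 "$peer" "$port" 2>/dev/null; then
      echo "tcp $port open on $peer"
    else
      echo "tcp $port NOT reachable on $peer"
    fi
  done
}

# Example (run on one host, passing the other host's private IP):
#   probe_tcp_ports <peer-private-ip>
```

For the UDP ports, watching the underlay interface with tcpdump on both hosts while pinging is probably a more trustworthy check than any probe.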

Also more info…given this setup:

host1: container JM and container TM1 on an overlay network
host2: container TM2 on same overlay network

IPs: 
JM: 10.200.0.10
TM1: 10.200.0.4
TM2: 10.200.0.5

host1@ip-192-168-10-69:/home/ubuntu# brctl show ov-001002-rwr3w
bridge name	bridge id		STP enabled	interfaces
ov-001002-rwr3w		8000.121799ebd445	no		veth3583c97
							                    veth563db27
							                    vethed78dd1
							                    vx-001002-rwr3w

host2@ip-192-168-10-72:/home/ubuntu# brctl show ov-001002-rwr3w
bridge name	bridge id		STP enabled	interfaces
ov-001002-rwr3w		8000.561fc7cf8fe0	no		veth53ca711
							                    vethd811b76
							                    vx-001002-rwr3w

ov-001002-rwr3w is the Linux bridge for the overlay network the containers are on, and it is present on both host1 and host2. vx-001002-rwr3w is the VXLAN interface attached to that bridge; it is the tunnel that connects the two hosts. (There are also 2 additional containers sitting on the network, which is the reason for the extra veth interfaces.)
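One thing that can be read straight off those names: the `001002` in `ov-001002-rwr3w` and `vx-001002-rwr3w` looks like the VXLAN ID in hex, followed by the first five characters of the network ID. That naming scheme is an assumption based on how the names look here, so it's worth comparing against what `ip -d link show` reports for the vx device. A minimal sketch (`vni_from_ifname` is a made-up helper name):

```shell
# Decode the (assumed) hex VXLAN ID embedded in a Docker overlay interface name.
vni_from_ifname() {
  hex="${1#*-}"      # strip the "vx-" / "ov-" prefix
  hex="${hex%%-*}"   # strip the "-<network id prefix>" suffix
  printf '%d\n' "0x$hex"
}

vni_from_ifname vx-001002-rwr3w   # prints 4098

# Compare with the kernel's view on each host (look for "vxlan id ... dstport ..."):
#   ip -d link show vx-001002-rwr3w
```

If both hosts report the same VXLAN ID and UDP destination port, the two tunnel endpoints at least agree on the encapsulation.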

I have the following behavior:

A ping from TM2 container (on host2) to TM1 (on host1) results in traffic on host1’s vx interface and host2’s vx interface:

TM2container:/# ping 10.200.0.4
PING 10.200.0.4 (10.200.0.4) 56(84) bytes of data.
64 bytes from 10.200.0.4: icmp_seq=1 ttl=64 time=0.449 ms
64 bytes from 10.200.0.4: icmp_seq=2 ttl=64 time=0.266 ms

TM2host@ip-192-168-10-72:/home/ubuntu# tcpdump -ni vx-001002-rwr3w host 10.200.0.5 and host 10.200.0.4
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on vx-001002-rwr3w, link-type EN10MB (Ethernet), capture size 262144 bytes
00:41:12.297030 ARP, Request who-has 10.200.0.4 tell 10.200.0.5, length 28
00:41:12.297039 ARP, Reply 10.200.0.4 is-at 02:42:0a:c8:00:04, length 28
00:41:12.297101 IP 10.200.0.5 > 10.200.0.4: ICMP echo request, id 1180, seq 1, length 64
00:41:12.297409 IP 10.200.0.4 > 10.200.0.5: ICMP echo reply, id 1180, seq 1, length 64

TM1host@ip-192-168-10-69:/home/ubuntu# tcpdump -ni vx-001002-rwr3w host 10.200.0.5 and host 10.200.0.4
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on vx-001002-rwr3w, link-type EN10MB (Ethernet), capture size 262144 bytes
00:38:34.594879 IP 10.200.0.5 > 10.200.0.4: ICMP echo request, id 1180, seq 1, length 64
00:38:34.595024 ARP, Request who-has 10.200.0.5 tell 10.200.0.4, length 28
00:38:34.595026 ARP, Reply 10.200.0.5 is-at 02:42:0a:c8:00:05, length 28
00:38:34.595042 IP 10.200.0.4 > 10.200.0.5: ICMP echo reply, id 1180, seq 1, length 64
00:38:35.594390 IP 10.200.0.5 > 10.200.0.4: ICMP echo request, id 1180, seq 2, length 64

A ping from TM2 container to JM results in traffic only on host2’s vx interface:

TM2container:/# ping 10.200.0.10
PING 10.200.0.10 (10.200.0.10) 56(84) bytes of data.
^C
--- 10.200.0.10 ping statistics ---
5 packets transmitted, 0 received, 100% packet loss, time 4032ms

TM2host@ip-192-168-10-72:/home/ubuntu# tcpdump -ni vx-001002-rwr3w host 10.200.0.5 and host 10.200.0.10
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on vx-001002-rwr3w, link-type EN10MB (Ethernet), capture size 262144 bytes
00:43:12.008634 IP 10.200.0.5.42241 > 10.200.0.10.6123: Flags [S], seq 1438398978, win 27680, options [mss 1384,sackOK,TS val 1188301660 ecr 0,nop,wscale 2], length 0
00:43:15.033797 IP 10.200.0.5 > 10.200.0.10: ICMP echo request, id 1185, seq 1, length 64
00:43:16.042764 IP 10.200.0.5 > 10.200.0.10: ICMP echo request, id 1185, seq 2, length 64

TM1host@ip-192-168-10-69:/home/ubuntu# tcpdump -ni vx-001002-rwr3w host 10.200.0.5 and host 10.200.0.10
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on vx-001002-rwr3w, link-type EN10MB (Ethernet), capture size 262144 bytes
^C
0 packets captured
0 packets received by filter
0 packets dropped by kernel

It looks like the VXLAN is mapping JM incorrectly for some reason: traffic from TM2 to JM leaves on host2’s vx interface but never shows up on host1’s. I can’t work out how or why.
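One way to look at that mapping directly is the forwarding database on the vx interface, which records which remote host each container MAC gets tunnelled to. Below is a small sketch that pulls the `dst` for one MAC out of `bridge fdb show` output. `fdb_dst_for_mac` is a made-up helper, and JM's MAC (`02:42:0a:c8:00:0a`) is a guess based on the pattern in the ARP replies above, where the last four bytes are the container IP in hex; check it inside the container first.

```shell
# Read `bridge fdb show` output on stdin and print the remote host ("dst")
# recorded for the given MAC address. Prints nothing if there is no entry.
fdb_dst_for_mac() {
  awk -v mac="$1" '
    tolower($1) == tolower(mac) {
      for (i = 2; i < NF; i++) if ($i == "dst") print $(i + 1)
    }'
}

# On host2 (as root): where would frames for JM be sent?
#   bridge fdb show dev vx-001002-rwr3w | fdb_dst_for_mac 02:42:0a:c8:00:0a
# And what does host2's neighbour table hold for JM's IP?
#   ip neigh show | grep 10.200.0.10
```

If the `dst` printed for JM's MAC isn't host1's address (192.168.10.69, going by the shell prompt), or nothing is printed at all, that would fit with the packets never showing up on host1.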

Any progress here, @johnwang412? I could use the help.