I’ve run into a curious issue on a series of CoreOS machines which I administer.
It boils down to this: roughly 1 in every 20 containers I create has no inbound or outbound network access, despite having the same configuration as all the others.
I can trivially reproduce this (and get a shell inside a broken container) by running the following repeatedly:
docker run -it --rm --privileged nicolaka/netshoot sh -c 'getent hosts github.com || sh'
until it drops me into a shell like this:
core@enfigitrun21 ~ $ docker run -it --rm --privileged nicolaka/netshoot sh -c 'getent hosts github.com || sh'
192.30.253.113 github.com github.com
core@enfigitrun21 ~ $ docker run -it --rm --privileged nicolaka/netshoot sh -c 'getent hosts github.com || sh'
/ #
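To put a number on the failure rate instead of retrying by hand, a loop along these lines might help. It is only a rough sketch: `count_failures` is a helper name made up for this post, and it treats any non-zero exit status as a failure.

```shell
#!/bin/sh
# count_failures N CMD [ARGS...]  (hypothetical helper, not part of Docker)
# Runs CMD N times and prints how many of those runs exited non-zero.
count_failures() {
    runs=$1
    shift
    failed=0
    i=0
    while [ "$i" -lt "$runs" ]; do
        # Discard the command's own output; only the exit status matters here.
        if ! "$@" >/dev/null 2>&1; then
            failed=$((failed + 1))
        fi
        i=$((i + 1))
    done
    echo "$failed/$runs runs failed"
}

# Example (non-interactive variant of the repro command: no -it, no fallback shell):
# count_failures 100 docker run --rm --privileged nicolaka/netshoot getent hosts github.com
```

This relies on `getent hosts` exiting non-zero when the lookup fails, which is the same behaviour the `|| sh` fallback in the repro command depends on.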
From the shell inside the broken container I can then inspect the interfaces with ifconfig:
eth0 Link encap:Ethernet HWaddr 02:42:0A:08:00:07
inet addr:10.8.0.7 Bcast:10.8.0.255 Mask:255.255.255.0
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
RX packets:23 errors:0 dropped:0 overruns:0 frame:0
TX packets:9 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:0
RX bytes:3785 (3.6 KiB) TX bytes:763 (763.0 B)
lo Link encap:Local Loopback
inet addr:127.0.0.1 Mask:255.0.0.0
UP LOOPBACK RUNNING MTU:65536 Metric:1
RX packets:8 errors:0 dropped:0 overruns:0 frame:0
TX packets:8 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:1000
RX bytes:672 (672.0 B) TX bytes:672 (672.0 B)
Pinging the gateway at 10.8.0.1 from the same shell fails:
/ # ping -c 1 10.8.0.1
PING 10.8.0.1 (10.8.0.1) 56(84) bytes of data.
From 10.8.0.7 icmp_seq=1 Destination Host Unreachable
--- 10.8.0.1 ping statistics ---
1 packets transmitted, 0 received, +1 errors, 100% packet loss, time 0ms
Disconnecting and reconnecting the bridge network fixes the container, although that’s no use in my scenario.
Running strace on a ping command shows:
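For reference, the disconnect/reconnect workaround looks roughly like the sketch below. The function name `reattach_network` and the `DOCKER` override are made up for illustration, and the network name `bridge` is an assumption: substitute whatever network the container is actually attached to.

```shell
#!/bin/sh
# reattach_network CONTAINER [NETWORK]  (hypothetical helper)
# Detaches CONTAINER from NETWORK (assumed default: bridge) and attaches it again.
# Set DOCKER=echo to print the commands instead of running them.
reattach_network() {
    container=$1
    network=${2:-bridge}
    "${DOCKER:-docker}" network disconnect "$network" "$container" &&
        "${DOCKER:-docker}" network connect "$network" "$container"
}

# Example:
# reattach_network my_container
```

It has to be run from the host against an already running container, which is why it does not help when the containers are short-lived.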
recvmsg(3, {msg_namelen=128}, 0) = -1 EAGAIN (Resource temporarily unavailable)
which presumably just means that no reply has arrived on the socket, but at that point I’m beyond my expertise.
I’ve got a repro scenario and can get a shell in a broken container, so I feel like I should be able to debug this. I’m looking for tips on how to attack the problem!