[Resolved] Linked container regularly becomes unreachable

zommerfelds · April 6, 2016, 7:19am

Hi. I’m having an issue with a linked container becoming unreachable from another container. I don’t know exactly at what layer the problem lies, but here is a description of my system:

I’m using Docker Cloud with one instance from AWS. I have a stack with a Postgres container (postgres:latest) and my own container for a simple Play web app that is linked to the Postgres container (links: - database). Everything works fine for a while.

For the last 5 days, I’m getting a notification every day around 07:00 UTC that the Play container stopped with exit code 255. The Play app crashes saying “Cannot connect to database [default]”, “Caused by: java.net.ConnectException: Connection timed out”. If I connect to the Play container and try to ping “database” I get

PING database.4d087578-b7f7-440e-85dc-315e6882ff2a.local.dockerapp.io (10.7.0.1): 56 data bytes
^C
--- database.4d087578-b7f7-440e-85dc-315e6882ff2a.local.dockerapp.io ping statistics ---
6 packets transmitted, 0 packets received, 100% packet loss

I redeploy the database container every time this happens, and after that it becomes reachable again. Next time this happens I will take a look at the logs of the Postgres container before I redeploy, but to me it seems to be an issue with the whole container, not with Postgres itself.

Any pointers would be useful, as I’m new to Docker.

Regards,
Christian Z.

zommerfelds · April 7, 2016, 6:50am

Update:

There is nothing in the Postgres log (except for the previous restart). Also, I discovered that the database container is reachable under its IP address, but not under its hostname (from within the web app container). Why is the hostname not usable?

EDIT: This information might be wrong as sometimes the system suddenly starts working again and I might have thought that it’s a DNS issue. From the further posts it looks more like an IP issue.
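For anyone trying to separate the two cases (name does not resolve vs. name resolves but the container does not answer), here is a minimal Python sketch. It is an assumption-laden illustration, not something from my setup: the function name `diagnose` is made up, and it tests a TCP connection rather than ICMP ping, so it needs a port that the target actually listens on (5432 is the Postgres default).

```python
import socket


def diagnose(host, port, timeout=3.0):
    """Classify reachability of host:port.

    Returns 'dns_failure' if the name cannot be resolved,
    'connect_failure' if it resolves but no TCP connection succeeds,
    and 'ok' otherwise. (Hypothetical helper for illustration.)
    """
    try:
        # Step 1: name resolution only.
        infos = socket.getaddrinfo(host, port, type=socket.SOCK_STREAM)
    except socket.gaierror:
        return "dns_failure"

    # Step 2: try a TCP connection to each resolved address.
    for family, socktype, proto, _canonname, sockaddr in infos:
        try:
            with socket.socket(family, socktype, proto) as s:
                s.settimeout(timeout)
                s.connect(sockaddr)
                return "ok"
        except OSError:
            # Covers timeouts, refused connections, and unreachable hosts.
            continue
    return "connect_failure"
```

Calling `diagnose("database", 5432)` from inside the web app container, and then again with the container's IP in place of the name, should show which layer is failing at that moment.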

zommerfelds · April 11, 2016, 8:13am

Update 2: I terminated my cluster (with one node) and started my stack in a new one. This solved the issue for at least one day now.

zommerfelds · April 19, 2016, 8:35am

Still having the same issue with linked containers, even after it was OK for a few days.

kimfiedler · April 19, 2016, 8:45am

I have the exact same problem. Similar setup with only one node and my database is mongodb instead.
If you run docker logs weave on your node, do you also see log entries like this around the time of the problem?

[allocator XX:XX:XX:XX:XX:XX] Ignored address 10.7.0.8 claimed by XXX - not in our universe

I have no idea if it’s really related, but 10.7.0.8 is the IP of the container that has trouble connecting to my linked mongodb container…
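To check whether these entries line up with the outages, the weave log can be scanned for them. Below is a minimal Python sketch; the function name `ignored_addresses` is hypothetical, and the pattern is based only on the single log line quoted above, so it may need adjusting if other weave versions word the message differently.

```python
import re
from collections import Counter

# Pattern derived from the one log line quoted in this thread (an assumption).
PATTERN = re.compile(
    r"Ignored address (\d{1,3}(?:\.\d{1,3}){3}) claimed by (\S+) - not in our universe"
)


def ignored_addresses(log_text):
    """Count how often each IP appears in 'Ignored address ...' log lines."""
    counts = Counter()
    for line in log_text.splitlines():
        match = PATTERN.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts
```

The input would be the saved output of docker logs weave. Since container logs can arrive on either stdout or stderr, it is safer to capture both streams before passing the text in.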

zommerfelds · April 19, 2016, 9:01am

Hi kimfiedler,

Yeah, I do have exactly the same entry in the log! The IP belongs to the web app container that can’t access the DB. The times where this line is repeated seem to match up exactly with the times where the DB container is unreachable.

Contrary to one of my earlier messages, the DB container was not reachable even by its IP address, so it might not be a DNS issue.

I hope that we can get this fixed, as this is a showstopper for me using Docker Cloud.

P.S. Found this, which is an old issue that might be related: https://github.com/tutumcloud/weave-daemon/issues/34

ziontech · April 20, 2016, 2:12pm

I seem to be experiencing the same issue here too. Some containers show it more than others. I’ve also got the same log entries near the times the instance connectivity drops.

zommerfelds · April 22, 2016, 7:26am

Thanks for sharing!

When the problem comes back, if I run docker restart weave or docker restart weave-xxxxx.xxxxxxxxx, I can temporarily ping the linked machine until it auto-restarts. Once it restarts, the problem comes back. A full stack redeploy usually helps.
Can a Docker dev or an experienced user please give us some pointers or let us know where we should file a bug? Thanks

zommerfelds · May 19, 2016, 7:01am

Did anyone find a solution? I am gonna have to stop using Docker Cloud because of this issue

ziontech · May 19, 2016, 7:26pm

Nope - I’d love to find out what’s happening, whether it’s an issue with our setup or with weave. I think it’s related to memory usage: I’ve noticed that some of our intensive cron jobs that run over the weekend can (seemingly) make the container unreachable from the parent linked container, and it only comes back when we do a redeploy.

zommerfelds · May 20, 2016, 8:30am

Yeah, it could be memory related. Thanks for the hint. I just checked my machine and it is pretty low on memory. So maybe the real issue is that there is no error message when weave (or whatever it is) runs out of memory.
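For anyone who wants to check this on their own node, here is a minimal Python sketch that reads the standard Linux /proc/meminfo format. The function names are hypothetical, and it assumes a kernel recent enough to report MemAvailable. It only reports numbers; it does not prove that memory pressure causes the link failures.

```python
def parse_meminfo(text):
    """Parse /proc/meminfo-style text into a dict of {field: value in kB}."""
    values = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        if not sep:
            continue  # skip lines that are not 'Key: value' pairs
        fields = rest.split()
        if fields and fields[0].isdigit():
            values[key.strip()] = int(fields[0])
    return values


def available_fraction(values):
    """Return MemAvailable / MemTotal as a float between 0 and 1."""
    return values["MemAvailable"] / values["MemTotal"]
```

On the node itself, the text would come from reading /proc/meminfo, for example with `open("/proc/meminfo").read()`.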

vidsyhq · July 15, 2016, 11:03am

Experiencing this issue w/ our Go services at the moment.

– @revett

zommerfelds · July 15, 2016, 12:07pm

Have you checked your memory usage? I ended up moving away from Docker Cloud and just running Docker manually on one node (and it works fine now). I’m guessing that with Docker Cloud one should provision a bit more memory than for a bare setup.

vidsyhq · July 15, 2016, 6:45pm

@zommerfelds memory is fine on all nodes

jcrombez · July 18, 2016, 6:51pm

Same problem here. Since today, all my container links break at random, but all at the same time.

I have 4 web containers linked to 4 mysql containers (one for each) and a redis container. When the problem happens, all my apps log errors because they can’t reach their database or their cache store, and I can’t ping any linked container using its name (but I can with its IP).

The first time the issue went away by itself after about an hour of down time.
As i write this post, it’s down again…

I can’t find what triggers the situation… i don’t see anything weird in the server monitoring when the links stop working.

jcrombez · July 18, 2016, 7:08pm

I think this topic is related: [RESOLVED] Dockerapp.io DNS Down?

borja · September 28, 2016, 1:00am

The latest Docker Cloud release is now available with support for Docker Engine 1.11.2-cs5, which introduces service discovery and DNS improvements, along with more reliable networking between containers.

For more information on this release and how to upgrade nodes to Docker Engine 1.11.2-cs5, check out: Docker Cloud Release Notes (09/27/2016)

mercstudio · October 11, 2017, 12:31pm

still happening on

docker version
Client:
 Version:      17.09.0-ce
 API version:  1.32
 Go version:   go1.8.3
 Git commit:   afdb6d4
 Built:        Tue Sep 26 22:40:46 2017
 OS/Arch:      linux/amd64

Server:
 Version:      17.09.0-ce
 API version:  1.32 (minimum version 1.12)
 Go version:   go1.8.3
 Git commit:   afdb6d4
 Built:        Tue Sep 26 22:39:27 2017
 OS/Arch:      linux/amd64
 Experimental: false

neisantos · December 21, 2017, 2:31am

I’ve got exactly the same issue. It seems to happen when I use the network too heavily: it temporarily loses the link via the service name.

I managed to work around the issue by pinging db, copying the IP, and adding it to my /etc/hosts… it seems to be a bug.
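A minimal Python sketch of that workaround, for illustration only: the helper name `hosts_line` is hypothetical, and it only builds the line to add, it does not edit /etc/hosts. The pinned entry goes stale as soon as the container gets a new IP, so this is a stopgap rather than a fix.

```python
import ipaddress


def hosts_line(ip, name):
    """Return an /etc/hosts-style line mapping name to ip.

    Raises ValueError if the IP is malformed or the name is unusable.
    """
    addr = ipaddress.ip_address(ip)  # validates the address
    if not name or any(ch.isspace() for ch in name):
        raise ValueError("hostname must be non-empty and contain no whitespace")
    return f"{addr}\t{name}"
```

For example, `hosts_line("10.7.0.1", "db")` produces the address and the name separated by a tab, which can then be appended to /etc/hosts inside the affected container.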

mercstudio · February 25, 2018, 3:47am

Okay, but this does not sound good, as the internal IP is auto-generated by swarm.
