
Containers randomly stopping

dockercloud

(Clevertech) #1

On both our account and our client’s account, we are seeing containers inside stacks stopping at random. There is nothing in the logs and no reason for them to be stopped, yet they keep stopping throughout the day, especially over the last two days.

This is a major issue: it is causing downtime for both us and our client, and it reflects poorly on the Docker Cloud service when there doesn’t seem to be any stability.


(Clevertech) #2

Also, the Docker Cloud API fails randomly (both directly and via the CLI), and it happens at the same time the containers stop. The containers are NOT restarted automatically after the failure.


(Clevertech) #3

When a container dies, we receive a Slack notification, something along the lines of:

XXX: Container stopped with exit code 128
XXX: Container stopped with exit code 143

etc. But this is not our apps failing: it happens on uncorrelated containers, and a bunch of them stop at the same time.
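
For context, exit code 143 is the conventional 128 + 15, i.e. the process received a SIGTERM from outside, which fits our impression that the apps are not crashing on their own. A rough way to see what the daemon recorded for a stopped container is to inspect it on the node (“apptest-1” below is just a placeholder name):

# on the node where the container ran; "apptest-1" is a placeholder name
docker inspect -f 'exit={{.State.ExitCode}} oom={{.State.OOMKilled}} finished={{.State.FinishedAt}}' apptest-1
docker logs --tail 50 apptest-1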


(Allan Sun) #4

It would help to state what tech stack you are using.

From your description, I think it’s your application that exited (the one in your CMD or ENTRYPOINT).

Normally a container exiting should not have anything to do with the Docker Cloud API; the API only controls how your containers are orchestrated. Have you defined ‘restart: always’ for your services through Docker Cloud?
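
If not, something like the following should do it, assuming the docker-cloud CLI still exposes the Tutum-era autorestart setting (“web” is just a placeholder service name; in a stack file the equivalent would be an autorestart: ALWAYS key on the service):

# sketch: turn automatic restarts on for a service, then redeploy it
docker-cloud service set --autorestart ALWAYS web
docker-cloud service redeploy web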

I haven’t experienced any API connectivity issues recently; if you are having problems using the API, check your network connection first.

The other thing worth checking is the Docker Cloud Agent running on your nodes: is it up to date?
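
If you have SSH access to a node, something like this should show whether the agent is healthy and which version is installed (the service/package name and log path are from memory of the Ubuntu install, so treat them as assumptions):

sudo service dockercloud-agent status            # assumed service name
dpkg -s dockercloud-agent | grep -i version      # assumed Debian/Ubuntu package name
sudo tail -n 50 /var/log/dockercloud/agent.log   # assumed log location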


(Alexander Harding) #5

I have also been seeing random stops, maybe once or twice a week, due to one container not being able to connect to another. Had two failures last night.

I have auto restart on and it restarts, almost always successfully, which is good.

(I get emails when my containers crash.)

Node.js randomly fails when trying to connect to CouchDB or Redis.

I get something like:

Error: getaddrinfo ENOTFOUND db db:5984
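
ENOTFOUND means the name “db” simply didn’t resolve at that moment, which points at service discovery / the overlay network rather than CouchDB itself being down. When it happens, a quick check from the node is something like this (“app-1” is a placeholder container name, and the commands assume getent and ping exist in the image):

docker exec app-1 getent hosts db    # does the service name resolve right now?
docker exec app-1 ping -c 1 db       # is the db container reachable at all?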

(Clevertech) #6

This seems to have resolved itself over the weekend, but it’s happening on more than one type of stack.

We’ve seen it on our haproxy stack, our logspout stack, and various Node.js stacks.

So it’s not just a matter of a Node app stopping; besides, our logs would show if the app had stopped.


(Nauraushaun) #7

We have also had problems similar to this. Unfortunately, once a container stops, there isn’t much analysis you can do on why it stopped.
As suggested, it’s probably your app quitting, but it might be memory issues. We found that sometimes an application would completely run out of memory, causing it (and its container) to stop. This may have been because the Java app didn’t know about the container’s memory limit and treated the host’s total memory as the memory available to it.
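
If it helps, the daemon records whether a container was OOM-killed and what limit it was given, so that’s worth checking before looking elsewhere (“app-1” is a placeholder container name); for a JVM app you would also cap the heap explicitly, e.g. with -Xmx, somewhat below the container limit, since JVMs of that era size their default heap from the host’s memory:

# on the node; "app-1" is a placeholder container name
docker inspect -f 'limit={{.HostConfig.Memory}} oom_killed={{.State.OOMKilled}}' app-1
docker stats --no-stream app-1    # current memory usage vs. the configured limit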

Hopefully this gives you something to monitor before the container stops and you’re out of options.


(Clevertech) #8

It’s not our apps crashing, as we are also seeing this happen in the load balancer stack, which is not running any of our apps.

We also have memory limits configured for all our apps, and monitoring in place for the apps themselves. It’s not on the app side.


(Clevertech) #9

We’ve seen this multiple times over the past 72 hours, on both our account and our client’s prod account.

This has happened three times so far today, including the haproxy load balancer and logspout going down separately from the apps.


(Imjosh2) #10

I’ve been seeing little network glitches sporadically; not sure if they’re related.

This happens all the time, but I’m not sure what it means exactly:

haproxy-1 haproxy-1.haproxy.xxx: INFO:haproxy:HTTPSConnectionPool(host='cloud.docker.com', port=443): Read timed out. (read timeout=None)

This one worries me and I’ve seen very similar more than a few times over the last two or three weeks:

Oct 30 21:59:14 haproxy-1 haproxy: xx.xx.xx.xx:59648 [31/Oct/2016:01:59:13.923] port_443~ SERVICE_APPTEST/APPTEST_1 142/0/0/6/155 200 14522 - - ---- 1/1/0/1/0 0/0 "GET /xyz HTTP/1.1"
Oct 30 21:59:32 haproxy-1 haproxy: Server SERVICE_APPTEST/APPTEST_1 is DOWN, reason: Layer4 timeout, check duration: 2008ms. 0 active and 0 backup servers left. 6 sessions active, 0 requeued, 0 remaining in queue.
Oct 30 21:59:32 haproxy-1 haproxy: backend SERVICE_APPTEST has no server available!
Oct 30 21:59:44 apptest-1 appTest-1.test.xxxx: uncaught server error: {"name":"MongoError","message":"connection 338 to mongoTest:27017 timed out"}
Oct 30 21:59:48 haproxy-1 haproxy: Server SERVICE_APPTEST/APPTEST_1 is UP, reason: Layer4 check passed, check duration: 0ms. 1 active and 0 backup servers online. 0 sessions requeued, 0 total in queue.

So the app container was up at 21:59:14. Eighteen seconds later haproxy says it’s down. Twelve seconds later we can tell the app container is up, because it logs that it’s unable to connect to the mongo server. Nothing in the logs indicates that the app container was actually down at any point - there’s no corresponding start-up logging. It’s as if the internal network went down. Sixteen seconds later haproxy says it’s back up.
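
One way to rule the container itself in or out is the daemon’s event stream for that window: if the app container had actually died or restarted, a die/stop/start event would show up there (the container name and timestamps below are placeholders taken from the log excerpt):

# on the node running the app container; adjust names and times as needed
docker events --since '2016-10-31T01:58:00' --until '2016-10-31T02:01:00' --filter 'container=apptest-1'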


(Clevertech) #11

This hasn’t gone away for us on either our account or our client’s account.

Containers inside stacks continue to stop randomly and never come back up, and on the Docker Cloud side they’re actually still reported as up…

Nothing in the logs shows an issue, and no, it’s not an app crashing, as this is also happening to logspout, haproxy, and other containers that are not running Node apps.
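
One way to catch the mismatch is to compare Docker Cloud’s view with the node’s own view (this assumes the docker-cloud CLI’s container ps subcommand and its --status filter, plus SSH access to the node):

docker-cloud container ps --status Stopped    # what Docker Cloud thinks is stopped
# then, on the node itself:
docker ps -a --filter 'status=exited'         # what the local daemon actually has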


(Stephen Pope) #12

I see this problem as well, even when they are on auto restart.


(Clevertech) #13

This is still happening for us: containers stop for no reason and never restart.


(Jmiraglia) #14

I have not run into this issue, also running haproxy and node.js containers. I’m currently only using AWS west-2 for node clusters. Which cloud provider are you using for your nodes? Since containers are only being orchestrated by Docker Cloud (which would be at fault for not restarting the containers, but not at fault for them stopping unexpectedly), I’d think it’d be an issue with the cloud provider.


(Vlad) #15

Have you found the reason they stop? I’m experiencing the same issue.


(Blockchainsa) #16

We’re getting the same thing. We’re using DigitalOcean as our provider.

Anyone here with the same problem?

It’s extremely frustrating.


(Sufyanelahi) #17

Experiencing the same on CentOS 7 on Alibaba Cloud.

Are these relevant?
https://bugs.centos.org/view.php?id=13718
https://github.com/moby/moby/issues/5618