Expected behavior
After launching a stack and starting services, the manager node should remain up and stable while essentially idle.
Actual behavior
The manager node is recycled by the auto scaling group (ASG) after failing a health check, even though no activity has occurred on the manager node.
Additional Information
Docker for AWS 1.12.2-rc1 (beta6)
Launched a very simple stack consisting of 1 manager and 1 worker (both t2.small) and deployed a small set of services (2 nginx containers, 2 Java containers, 1 Couchbase container). Containers and swarm worked well on Monday and throughout the week. There was no activity over the weekend, but the ASG reported that the manager failed a health check, so it terminated the instance and relaunched it. Only the manager's own swarm duties were running on the node, so there is no obvious reason for it to fail a health check and be recycled by the ASG.
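For reference, how the ASG judges instance health depends on its health check type and grace period. A minimal sketch of how to inspect that configuration with boto3; the group name below is a placeholder, since the real name is generated by the Docker for AWS CloudFormation template:

```python
# Sketch: inspect the manager ASG's health check configuration.
# "docker-manager-asg" is a placeholder; substitute the actual group
# name created by the Docker for AWS CloudFormation stack.
import boto3

asg = boto3.client("autoscaling")
resp = asg.describe_auto_scaling_groups(
    AutoScalingGroupNames=["docker-manager-asg"]
)
for group in resp["AutoScalingGroups"]:
    # HealthCheckType is "EC2" or "ELB"; an ELB check layers
    # load-balancer health on top of the EC2 instance-status checks.
    print(group["HealthCheckType"], group["HealthCheckGracePeriod"])
```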
Steps to reproduce the behavior
- Launch stack with 1 manager and 1 worker of size t2.small
- Deploy some basic services (a minimal example is sketched after this list)
- Leave idle until the ASG for the manager recycles it (in my case it took 5 days)
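A minimal sketch of step 2 using the Docker SDK for Python, run against the swarm manager. The service names and image tags are illustrative placeholders, not the exact ones from my stack (which I deployed via the docker CLI):

```python
# Sketch: deploy a few basic replicated services on the swarm.
# Service names and images are illustrative placeholders.
import docker
from docker.types import ServiceMode

client = docker.from_env()  # assumes the local daemon is the swarm manager

client.services.create(
    "nginx:latest",
    name="web",
    mode=ServiceMode("replicated", replicas=2),
)
client.services.create(
    "couchbase:latest",
    name="db",
    mode=ServiceMode("replicated", replicas=1),
)
```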
I understand that 1 manager is not a production-grade setup, but I would at least expect it to stay up and stable when left essentially alone.
Is there a way to determine why the node failed the ASG health check?
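One place I have thought to look is the ASG's scaling activity history, which records the cause logged when an instance is terminated. A sketch with boto3; as above, the group name is a placeholder:

```python
# Sketch: list recent scaling activities for the manager's ASG.
# The "Cause" field typically states why an instance was terminated
# (e.g. taken out of service after failing status checks).
import boto3

asg = boto3.client("autoscaling")
resp = asg.describe_scaling_activities(
    AutoScalingGroupName="docker-manager-asg",  # placeholder name
    MaxRecords=20,
)
for activity in resp["Activities"]:
    print(activity["StartTime"], activity["StatusCode"])
    print(activity["Cause"])
```

The same history is visible in the EC2 console on the ASG's Activity History tab, but it only reports the ASG-level cause, not what actually went wrong on the instance.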