I am running the latest version of Docker for AWS.
I am running a very simple, repetitive test using the hello-world image: docker stack deploy followed by docker stack rm. Every loop uses an additional 5-10 MB of RAM that is never released. Eventually one of the nodes runs out of RAM, and at that point the swarm exhibits multiple issues. The out-of-memory node is essentially unusable (CPU is at 100%, there is no free memory, and running commands on the CLI takes 1-5 minutes). The other nodes get errors when running swarm commands. The only recourse is to terminate the out-of-memory node. Note that the other master nodes are almost out of memory too.
I have also stopped the test before the node ran out of memory, pruned all resources, and even waited 12+ hours, but the memory was never released.
My system info follows. I have tested this on both a 3-master swarm and a 1-master swarm; both run into the same out-of-memory issue.
~ $ docker info
Containers: 4
Running: 4
Paused: 0
Stopped: 0
Images: 6
Server Version: 17.09.0-ce
Storage Driver: overlay2
Backing Filesystem: extfs
Supports d_type: true
Native Overlay Diff: true
Logging Driver: awslogs
Cgroup Driver: cgroupfs
Plugins:
Volume: local
Network: bridge host ipvlan macvlan null overlay
Log: awslogs fluentd gcplogs gelf journald json-file logentries splunk syslog
Swarm: active
NodeID: z6fetv7d34xn5y5m5fek79ml5
Is Manager: true
ClusterID: alhqzwsnds4d68ryos9fyb1rk
Managers: 1
Nodes: 1
Orchestration:
Task History Retention Limit: 5
Raft:
Snapshot Interval: 10000
Number of Old Snapshots to Retain: 0
Heartbeat Tick: 1
Election Tick: 3
Dispatcher:
Heartbeat Period: 5 seconds
CA Configuration:
Expiry Duration: 3 months
Force Rotate: 0
Autolock Managers: false
Root Rotation In Progress: false
Node Address: 172.31.14.156
Manager Addresses:
172.31.14.156:2377
Runtimes: runc
Default Runtime: runc
Init Binary: docker-init
containerd version: 06b9cb35161009dcb7123345749fef02f7cea8e0
runc version: 3f2f8b84a77f73d38244dd690525642a72156c64
init version: 949e6fa
Security Options:
seccomp
Profile: default
Kernel Version: 4.9.49-moby
Operating System: Alpine Linux v3.5
OSType: linux
Architecture: x86_64
CPUs: 2
Total Memory: 7.785GiB
Name: ip-172-31-14-156.us-west-2.compute.internal
ID: OAZY:JMZV:4Q3I:M5VO:63WE:PNIZ:G6RU:ZC2G:WOOI:YTSD:HLWD:U5LG
Docker Root Dir: /var/lib/docker
Debug Mode (client): false
Debug Mode (server): true
File Descriptors: 73
Goroutines: 1170
System Time: 2017-11-22T22:44:19.05826862Z
EventsListeners: 0
Registry: https://index.docker.io/v1/
Labels:
os=linux
region=us-west-2
availability_zone=us-west-2a
instance_type=t2.large
node_type=manager
Experimental: true
Insecure Registries:
127.0.0.0/8
Live Restore Enabled: false
My docker compose file (test.yml):
version: '3.0'
services:
  hellotest:
    image: hello-world
    deploy:
      mode: global
My test script (test.sh):
To run script: nohup ./test.sh &
#!/bin/sh
##### globals #####
# how many times to loop through the test
LOOPS=1000
starttime=$(date)
echo "start: $starttime"
counter=1
while [ "$counter" -le "$LOOPS" ]
do
    echo "loop number $counter"
    # deploy the stack
    cd /home/docker
    echo "starting docker: docker stack deploy --compose-file test.yml hello-test"
    docker stack deploy --compose-file test.yml hello-test
    sleep 30
    # remove the stack
    echo "stopping docker: docker stack rm hello-test"
    docker stack rm hello-test
    sleep 30
    counter=$((counter+1))
done
endtime=$(date)
echo "end: $endtime"
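
For anyone trying to reproduce the 5-10 MB per-loop growth, here is a minimal sketch of how it could be logged on the node. It is not part of the original test. The helper names mem_available_kb and mem_delta_kb are hypothetical, and it assumes a kernel that reports MemAvailable in /proc/meminfo.

```shell
#!/bin/sh
# Hypothetical helper: print the MemAvailable value (in kB) from a
# meminfo-style file. Defaults to /proc/meminfo; the optional argument
# only exists so the function can be exercised against a sample file.
mem_available_kb() {
    awk '/^MemAvailable:/ { print $2 }' "${1:-/proc/meminfo}"
}

# Hypothetical helper: print how many kB were consumed between two
# readings (first argument = earlier reading, second = later reading).
mem_delta_kb() {
    echo $(( $1 - $2 ))
}
```

Inside the loop, one reading could be taken before the deploy (before=$(mem_available_kb)) and another after the final sleep (after=$(mem_available_kb)), then logged with echo "loop $counter: $(mem_delta_kb "$before" "$after") kB consumed". Individual readings will be noisy, so the trend over many loops is more meaningful than any single delta.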