Docker Community Forums


Docker Swarm Resource/Memory leak

aws

(Jkhongusc) #1

I am running the latest Docker for AWS release (Docker 17.09.0-ce, per the docker info output below).

I am running a very simple, repetitive test: docker stack deploy followed by docker stack rm, using the hello-world image. Every loop uses an additional 5-10 MB of RAM that is never released. Eventually one of the nodes runs out of RAM, and at that point the swarm exhibits multiple issues. The out-of-memory node is essentially unusable (CPU is at 100% with no free memory, and CLI commands take 1-5 minutes to run). The other nodes get errors when running swarm commands. The only recourse is to terminate the out-of-memory node. Note that the other manager nodes are almost out of memory too.
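To put a number on the per-loop growth, the node's available memory can be sampled between iterations. This is a minimal sketch, assuming the node exposes the standard Linux /proc/meminfo (the Moby 4.9 kernel in the docker info output does); the function name mem_available_mib is hypothetical, and the optional path argument exists only to make it testable.

```shell
#!/bin/sh
# Print the kernel's MemAvailable estimate in MiB.
# Assumption: /proc/meminfo is readable on the node; pass another path to test.
mem_available_mib() {
    meminfo=${1:-/proc/meminfo}
    # MemAvailable is reported in kB; integer-divide by 1024 to get MiB.
    awk '/^MemAvailable:/ { print int($2 / 1024) }' "$meminfo"
}
```

Calling it once after each docker stack rm and its sleep, for example echo "loop $counter: $(mem_available_mib) MiB available", would show whether available memory drops by a steady amount per loop. MemAvailable covers the whole node, so on its own it cannot tell which process (dockerd, containerd, or something else) is holding the memory.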

I have also stopped the test before a node ran out of memory, pruned all resources, and waited 12+ hours, but the memory was never released.
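For reference, "pruned all resources" on a node would typically mean something like the commands below. This is an assumption about what was run, since the exact commands are not listed, and prune_node is a hypothetical wrapper name. Pruning acts only on the daemon the CLI talks to, so it has to be repeated on every node.

```shell
#!/bin/sh
# Sketch of pruning one node's local Docker objects (hypothetical wrapper).
prune_node() {
    # Stopped containers, unused networks, and all unused images (not just dangling).
    docker system prune --all --force
    # Unused local volumes, pruned separately.
    docker volume prune --force
}
```

As far as I know, these commands only remove local containers, networks, images, and volumes; they do not touch swarm services or the task history kept by the managers.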

My system info is below. I have tested this on both a 3-manager swarm and a 1-manager swarm, and both run into the same out-of-memory issue.

~ $ docker info
Containers: 4
 Running: 4
 Paused: 0
 Stopped: 0
Images: 6
Server Version: 17.09.0-ce
Storage Driver: overlay2
 Backing Filesystem: extfs
 Supports d_type: true
 Native Overlay Diff: true
Logging Driver: awslogs
Cgroup Driver: cgroupfs
Plugins:
 Volume: local
 Network: bridge host ipvlan macvlan null overlay
 Log: awslogs fluentd gcplogs gelf journald json-file logentries splunk syslog
Swarm: active
 NodeID: z6fetv7d34xn5y5m5fek79ml5
 Is Manager: true
 ClusterID: alhqzwsnds4d68ryos9fyb1rk
 Managers: 1
 Nodes: 1
 Orchestration:
  Task History Retention Limit: 5
 Raft:
  Snapshot Interval: 10000
  Number of Old Snapshots to Retain: 0
  Heartbeat Tick: 1
  Election Tick: 3
 Dispatcher:
  Heartbeat Period: 5 seconds
 CA Configuration:
  Expiry Duration: 3 months
  Force Rotate: 0
 Autolock Managers: false
 Root Rotation In Progress: false
 Node Address: 172.31.14.156
 Manager Addresses:
  172.31.14.156:2377
Runtimes: runc
Default Runtime: runc
Init Binary: docker-init
containerd version: 06b9cb35161009dcb7123345749fef02f7cea8e0
runc version: 3f2f8b84a77f73d38244dd690525642a72156c64
init version: 949e6fa
Security Options:
 seccomp
  Profile: default
Kernel Version: 4.9.49-moby
Operating System: Alpine Linux v3.5
OSType: linux
Architecture: x86_64
CPUs: 2
Total Memory: 7.785GiB
Name: ip-172-31-14-156.us-west-2.compute.internal
ID: OAZY:JMZV:4Q3I:M5VO:63WE:PNIZ:G6RU:ZC2G:WOOI:YTSD:HLWD:U5LG
Docker Root Dir: /var/lib/docker
Debug Mode (client): false
Debug Mode (server): true
 File Descriptors: 73
 Goroutines: 1170
 System Time: 2017-11-22T22:44:19.05826862Z
 EventsListeners: 0
Registry: https://index.docker.io/v1/
Labels:
 os=linux
 region=us-west-2
 availability_zone=us-west-2a
 instance_type=t2.large
 node_type=manager
Experimental: true
Insecure Registries:
 127.0.0.0/8
Live Restore Enabled: false
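As a rough check that the reported leak rate explains the failure, here is the arithmetic for the node above (Total Memory: 7.785GiB, about 7971 MiB since 7.785 × 1024 = 7971.84). It is a sketch under simplifying assumptions: the leak is constant per loop, MB and MiB are treated as equal, and memory already in use at the start is ignored. loops_until_exhausted is a hypothetical helper.

```shell
#!/bin/sh
# Estimate how many deploy/rm loops fit in the node's memory at a fixed leak rate.
# Assumptions: constant leak per loop, node starts with all memory free.
loops_until_exhausted() {
    total_mib=$1   # total memory in MiB
    leak_mib=$2    # memory leaked per loop, in MiB
    echo $(( total_mib / leak_mib ))
}

# 7.785 GiB from docker info, rounded down to 7971 MiB.
for leak in 5 10; do
    echo "leak ${leak} MiB/loop -> about $(loops_until_exhausted 7971 "$leak") loops"
done
```

Under those assumptions, 10 MiB per loop exhausts the node after about 797 loops, before the 1000-loop test finishes, while 5 MiB per loop would use about 5000 MiB over 1000 loops, most of the node's memory.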

My docker compose file (test.yml):

version: '3.0'

services:
  hellotest:
    image: hello-world
    deploy:
      mode: global
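One variable worth isolating: hello-world prints its message and exits immediately, and swarm's default restart policy (condition any) replaces exited tasks, so each 30-second deploy window probably creates several tasks per node rather than one. A hedged way to check whether that matters is to rerun the loop with restarts disabled. The sketch below writes such a variant; the file name test-norestart.yml and the function name write_norestart_compose are hypothetical.

```shell
#!/bin/sh
# Write a variant of test.yml with swarm restarts disabled, so each deploy
# runs hello-world once per node instead of restarting it after every exit.
write_norestart_compose() {
    out=${1:-test-norestart.yml}
    cat > "$out" <<'EOF'
version: '3.0'

services:
  hellotest:
    image: hello-world
    deploy:
      mode: global
      restart_policy:
        condition: none
EOF
}
```

The variant would be deployed with docker stack deploy --compose-file test-norestart.yml hello-test in place of test.yml. If memory still grows at a similar rate per loop, that would point at the deploy/rm cycle itself rather than at task restarts.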

My test script (test.sh), which I run with: nohup ./test.sh &

#!/bin/sh

##### globals #####
# how many times to loop through the test
LOOPS=1000

starttime=$(date)
echo "start: $starttime"

# test.yml lives here; stop if the directory is missing
cd /home/docker || exit 1

counter=1
while [ "$counter" -le "$LOOPS" ]
do
        echo "loop number $counter"

        # deploy the stack
        echo "deploying stack: docker stack deploy --compose-file test.yml hello-test"
        docker stack deploy --compose-file test.yml hello-test
        sleep 30

        # remove the stack
        echo "removing stack: docker stack rm hello-test"
        docker stack rm hello-test
        sleep 30

        counter=$((counter + 1))
done

endtime=$(date)
echo "end: $endtime"
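To measure the growth directly instead of estimating it, the same loop can log available memory after every cycle. This is a sketch, not a replacement for the script above: it assumes it is run from the directory containing test.yml, and run_instrumented_test, average_drop_kb, and the STACK, COMPOSE_FILE, MEMINFO, and LOG settings are hypothetical names introduced here.

```shell
#!/bin/sh
# Hypothetical settings (override via shell variables):
#   STACK        stack name               (default: hello-test)
#   COMPOSE_FILE compose file to deploy   (default: test.yml)
#   MEMINFO      meminfo file to sample   (default: /proc/meminfo)
#   LOG          CSV file for the samples (default: memlog.csv)

# Run the deploy/rm loop $1 times, logging MemAvailable (kB) after each cycle.
run_instrumented_test() {
    loops=${1:-1000}
    stack=${STACK:-hello-test}
    compose=${COMPOSE_FILE:-test.yml}
    meminfo=${MEMINFO:-/proc/meminfo}
    log=${LOG:-memlog.csv}

    echo "loop,mem_available_kb" > "$log"
    i=1
    while [ "$i" -le "$loops" ]; do
        docker stack deploy --compose-file "$compose" "$stack"
        sleep 30
        docker stack rm "$stack"
        sleep 30
        # Sample after removal, when memory would be expected to return to baseline.
        avail=$(awk '/^MemAvailable:/ { print $2 }' "$meminfo")
        echo "$i,$avail" >> "$log"
        i=$((i + 1))
    done
}

# Average drop in MemAvailable per loop (kB), from the first to the last sample.
average_drop_kb() {
    awk -F, 'NR == 2 { first = $2; n0 = $1 }
             NR > 2  { last = $2; n1 = $1 }
             END     { if (n1 > n0) print int((first - last) / (n1 - n0)); else print 0 }' \
        "${1:-memlog.csv}"
}
```

For example, run_instrumented_test 50 followed by average_drop_kb memlog.csv would report the average drop in kB per loop, which can be compared with the 5-10 MB per loop observed above.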