Hi,
We are seeing a very large number of file descriptors held open by dockerd (around 120K at the moment) on a production AWS x1.32xlarge host.
There are 294 containers (the count below includes the header line printed by docker ps):
$ docker ps -a | wc -l
295
The number of open descriptors keeps growing by approximately 10K per week:
$ ls -l /proc/$(cat /var/run/docker.pid)/fd | wc -l
119560
The majority of them refer to a Linux network namespace:
$ lsof -p $(cat /var/run/docker.pid) | grep net | wc -l
116535
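Since grep net also matches the netlink sockets, a breakdown by descriptor kind may be more precise. Below is a rough sketch that reads the symlinks in an fd directory and counts them by target type. fd_kinds is a hypothetical helper name, not something Docker ships, and it assumes namespace descriptors show up as links like net:[inode].

```shell
# Hypothetical helper: summarize what the symlinks in an fd directory point to.
# Link targets such as net:[4026538136] or socket:[2275117769] are reduced
# to their kind (net, socket) before counting; file paths are kept as they are.
fd_kinds() {
  fd_dir="$1"
  for fd in "$fd_dir"/*; do
    readlink "$fd"
  done | sed -E 's/:\[[0-9]+\]$//' | sort | uniq -c | sort -rn
}

# Usage (reading dockerd's fd directory needs root):
# fd_kinds "/proc/$(cat /var/run/docker.pid)/fd"
```

The most frequent kind is printed first, so a leak should stand out at the top of the output.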
An excerpt of the lsof output:
dockerd 675230 root 11r REG 0,3 0 4026538136 net
dockerd 675230 root 12u netlink 0t0 2275117769 ROUTE
dockerd 675230 root 13u netlink 0t0 2275117770 XFRM
dockerd 675230 root 14u netlink 0t0 2275117771 NETFILTER
dockerd 675230 root 17u unix 0xffff88b6ffce1400 0t0 2275117774 /run/docker/libnetwork/aaff03a7b15114fb9b7922533316e7d9e8edeef655314b5081315e1550760702.sock type=STREAM
dockerd 675230 root 22u netlink 0t0 2275390572 ROUTE
dockerd 675230 root 76r REG 0,3 0 4026538136 net
dockerd 675230 root 94r REG 0,3 0 4026538136 net
dockerd 675230 root 99r REG 0,3 0 4026538136 net
dockerd 675230 root 107r REG 0,3 0 4026538136 net
dockerd 675230 root 111r REG 0,3 0 4026538136 net
dockerd 675230 root 116r REG 0,3 0 4026538136 net
dockerd 675230 root 144r REG 0,3 0 4026538136 net
dockerd 675230 root 150r REG 0,3 0 4026538136 net
dockerd 675230 root 151r REG 0,3 0 4026538136 net
dockerd 675230 root 158r REG 0,3 0 4026538136 net
dockerd 675230 root 163r REG 0,3 0 4026538136 net
dockerd 675230 root 173r REG 0,3 0 4026538136 net
dockerd 675230 root 175r REG 0,3 0 4026538136 net
dockerd 675230 root 184r REG 0,3 0 4026538136 net
dockerd 675230 root 186r REG 0,3 0 4026538136 net
dockerd 675230 root 190r REG 0,3 0 4026538136 net
dockerd 675230 root 191r REG 0,3 0 4026538136 net
dockerd 675230 root 193r REG 0,3 0 4026538136 net
dockerd 675230 root 196r REG 0,3 0 4026538136 net
dockerd 675230 root 198r REG 0,3 0 4026538136 net
dockerd 675230 root 201r REG 0,3 0 4026538136 net
dockerd 675230 root 210r REG 0,3 0 4026538136 net
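Every net entry in the excerpt has the same inode (4026538136), which suggests the same namespace is being opened over and over. To check this across the full output, the entries can be grouped by inode. This is a minimal sketch that assumes the lsof column layout shown above; count_netns_fds is a hypothetical helper name.

```shell
# Hypothetical helper: count network-namespace descriptors per inode.
# Reads lsof output on stdin; assumed columns for these rows are
# COMMAND PID USER FD TYPE DEVICE SIZE/OFF NODE NAME.
count_netns_fds() {
  awk '$5 == "REG" && $NF == "net" { count[$(NF-1)]++ }
       END { for (inode in count) print count[inode], inode }' | sort -rn
}

# Usage (assumes lsof is installed; run as root):
# lsof -p "$(cat /var/run/docker.pid)" | count_netns_fds
```

Each output line is a descriptor count followed by the namespace inode. The netlink rows are skipped because their TYPE column is not REG.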
The Docker and OS versions, along with the full docker info output, are below:
$ docker info
Containers: 294
Running: 294
Paused: 0
Stopped: 0
Images: 3360
Server Version: 18.09.3
Storage Driver: overlay2
Backing Filesystem: extfs
Supports d_type: true
Native Overlay Diff: true
Logging Driver: json-file
Cgroup Driver: cgroupfs
Plugins:
Volume: local
Network: bridge host ipvlan macvlan null overlay
Log: awslogs fluentd gcplogs gelf journald json-file local logentries splunk syslog
Swarm: inactive
Runtimes: runc
Default Runtime: runc
Init Binary: docker-init
containerd version: 9754871865f7fe2f4e74d43e2fc7ccd237edcbce
runc version: 09c8266bf2fcf9519a651b04ae54c967b9ab86ec
init version: fec3683
Security Options:
apparmor
seccomp
Profile: default
Kernel Version: 4.18.0-1007-aws
Operating System: Ubuntu 16.04.3 LTS
OSType: linux
Architecture: x86_64
CPUs: 128
Total Memory: 1.876TiB
Name: docker-linux-1-dh
ID: UGZS:UFD3:GB4C:W5MX:JU2L:K7PH:6ZWS:4GPM:27Q5:UNNN:X3DC:YDT7
Docker Root Dir: /srv/docker_root
Debug Mode (client): false
Debug Mode (server): true
File Descriptors: 119564
Goroutines: 1947
System Time: 2019-06-15T13:54:21.755737722Z
EventsListeners: 7
Registry: https://index.docker.io/v1/
Labels:
Experimental: true
Insecure Registries:
127.0.0.0/8
Live Restore Enabled: true
Product License: Community Engine
Why does this happen? Is it some kind of leak or a misconfiguration?
Thank you in advance!