I have a Docker Swarm cluster with more than 60 nodes (1 manager) running in a production environment. We have run into a very difficult problem: from time to time, all the containers are restarted automatically. Could someone give some guidance? Thanks for your help.
Output of journalctl -u docker.service on the manager node:
Dec 09 11:32:13 host-172-17-28-136 dockerd[806]: time="2019-12-09T11:32:13.881052082+08:00" level=warning msg="grpc: Server.Serve failed to complete security handshake from \"172.17.28.168:39336\": EOF" module=grpc
Dec 09 11:32:15 host-172-17-28-136 dockerd[806]: time="2019-12-09T11:32:15.581259315+08:00" level=error msg="[resolver] more than 100 concurrent queries from 172.19.0.2:39798"
Dec 09 11:32:18 host-172-17-28-136 dockerd[806]: time="2019-12-09T11:32:18.881458249+08:00" level=warning msg="grpc: Server.Serve failed to complete security handshake from \"172.17.28.168:39394\": EOF" module=grpc
Dec 09 11:32:20 host-172-17-28-136 dockerd[806]: time="2019-12-09T11:32:20.582011748+08:00" level=error msg="[resolver] more than 100 concurrent queries from 172.19.0.2:47059"
Dec 09 11:32:23 host-172-17-28-136 dockerd[806]: time="2019-12-09T11:32:23.580730960+08:00" level=warning msg="Health check for container 6a42a166a86914b92e724fce2fc1fd2e8a7174e1965047e711e054d3fcb2c8a9 error: context deadline exceeded"
Dec 09 11:32:23 host-172-17-28-136 dockerd[806]: time="2019-12-09T11:32:23.878335604+08:00" level=warning msg="grpc: Server.Serve failed to complete security handshake from \"172.17.28.168:39438\": EOF" module=grpc
Dec 09 11:32:25 host-172-17-28-136 dockerd[806]: time="2019-12-09T11:32:25.587366233+08:00" level=error msg="[resolver] more than 100 concurrent queries from 172.19.0.2:51345"
Dec 09 11:32:26 host-172-17-28-136 dockerd[806]: time="2019-12-09T11:32:26.618406623+08:00" level=info msg="NetworkDB stats host-172-17-28-136(36fd4f7dcb85) - netID:eg6ios0n42rolpm8xffdvxj39 leaving:false netPeers:43 entries:84 Queue qLen:0 netMsg/s:1"
Dec 09 11:32:28 host-172-17-28-136 dockerd[806]: time="2019-12-09T11:32:28.884212941+08:00" level=warning msg="grpc: Server.Serve failed to complete security handshake from \"172.17.28.168:39462\": EOF" module=grpc
Dec 09 11:32:30 host-172-17-28-136 dockerd[806]: time="2019-12-09T11:32:30.588788615+08:00" level=error msg="[resolver] more than 100 concurrent queries from 172.19.0.2:53479"
Dec 09 11:32:33 host-172-17-28-136 dockerd[806]: time="2019-12-09T11:32:33.880829064+08:00" level=warning msg="grpc: Server.Serve failed to complete security handshake from \"172.17.28.168:39482\": EOF" module=grpc
Dec 09 11:32:35 host-172-17-28-136 dockerd[806]: time="2019-12-09T11:32:35.594238965+08:00" level=error msg="[resolver] more than 100 concurrent queries from 172.19.0.2:56144"
Dec 09 11:32:38 host-172-17-28-136 dockerd[806]: time="2019-12-09T11:32:38.880164807+08:00" level=warning msg="grpc: Server.Serve failed to complete security handshake from \"172.17.28.168:39500\": EOF" module=grpc
Dec 09 11:32:40 host-172-17-28-136 dockerd[806]: time="2019-12-09T11:32:40.594545896+08:00" level=error msg="[resolver] more than 100 concurrent queries from 172.19.0.2:33954"
Dec 09 11:32:43 host-172-17-28-136 dockerd[806]: time="2019-12-09T11:32:43.878232310+08:00" level=warning msg="grpc: Server.Serve failed to complete security handshake from \"172.17.28.168:39518\": EOF" module=grpc
Dec 09 11:32:45 host-172-17-28-136 dockerd[806]: time="2019-12-09T11:32:45.594935631+08:00" level=error msg="[resolver] more than 100 concurrent queries from 172.19.0.2:38073"
Dec 09 11:32:48 host-172-17-28-136 dockerd[806]: time="2019-12-09T11:32:48.878437023+08:00" level=warning msg="grpc: Server.Serve failed to complete security handshake from \"172.17.28.168:39534\": EOF" module=grpc
Dec 09 11:32:50 host-172-17-28-136 dockerd[806]: time="2019-12-09T11:32:50.600089293+08:00" level=error msg="[resolver] more than 100 concurrent queries from 172.19.0.2:42235"
Dec 09 11:32:53 host-172-17-28-136 dockerd[806]: time="2019-12-09T11:32:53.878770607+08:00" level=warning msg="grpc: Server.Serve failed to complete security handshake from \"172.17.28.168:39550\": EOF" module=grpc
Dec 09 11:32:55 host-172-17-28-136 dockerd[806]: time="2019-12-09T11:32:55.542443102+08:00" level=error msg="[resolver] more than 100 concurrent queries from 172.19.0.2:42143"
Dec 09 11:32:58 host-172-17-28-136 dockerd[806]: time="2019-12-09T11:32:58.880799252+08:00" level=warning msg="grpc: Server.Serve failed to complete security handshake from \"172.17.28.168:39562\": EOF" module=grpc
Dec 09 11:33:00 host-172-17-28-136 dockerd[806]: time="2019-12-09T11:33:00.546895778+08:00" level=error msg="[resolver] more than 100 concurrent queries from 172.19.0.2:36005"
Dec 09 11:33:03 host-172-17-28-136 dockerd[806]: time="2019-12-09T11:33:03.878459214+08:00" level=warning msg="grpc: Server.Serve failed to complete security handshake from \"172.17.28.168:39576\": EOF" module=grpc
Dec 09 11:33:05 host-172-17-28-136 dockerd[806]: time="2019-12-09T11:33:05.551876945+08:00" level=error msg="[resolver] more than 100 concurrent queries from 172.19.0.2:33857"
Dec 09 11:33:08 host-172-17-28-136 dockerd[806]: time="2019-12-09T11:33:08.878753916+08:00" level=warning msg="grpc: Server.Serve failed to complete security handshake from \"172.17.28.168:39590\": EOF" module=grpc
Dec 09 11:33:10 host-172-17-28-136 dockerd[806]: time="2019-12-09T11:33:10.557064636+08:00" level=error msg="[resolver] more than 100 concurrent queries from 172.19.0.2:35743"
Dec 09 11:33:13 host-172-17-28-136 dockerd[806]: time="2019-12-09T11:33:13.878226424+08:00" level=warning msg="grpc: Server.Serve failed to complete security handshake from \"172.17.28.168:39604\": EOF" module=grpc
Dec 09 11:33:15 host-172-17-28-136 dockerd[806]: time="2019-12-09T11:33:15.561920134+08:00" level=error msg="[resolver] more than 100 concurrent queries from 172.19.0.2:49116"
Dec 09 11:33:18 host-172-17-28-136 dockerd[806]: time="2019-12-09T11:33:18.881451168+08:00" level=warning msg="grpc: Server.Serve failed to complete security handshake from \"172.17.28.168:39620\": EOF" module=grpc
Dec 09 11:33:20 host-172-17-28-136 dockerd[806]: time="2019-12-09T11:33:20.566874367+08:00" level=error msg="[resolver] more than 100 concurrent queries from 172.19.0.2:37224"
Dec 09 11:33:23 host-172-17-28-136 dockerd[806]: time="2019-12-09T11:33:23.626283600+08:00" level=warning msg="Health check for container 6a42a166a86914b92e724fce2fc1fd2e8a7174e1965047e711e054d3fcb2c8a9 error: context deadline exceeded"
Dec 09 11:33:23 host-172-17-28-136 dockerd[806]: time="2019-12-09T11:33:23.885073517+08:00" level=warning msg="grpc: Server.Serve failed to complete security handshake from \"172.17.28.168:39632\": EOF" module=grpc
Dec 09 11:33:25 host-172-17-28-136 dockerd[806]: time="2019-12-09T11:33:25.569279528+08:00" level=error msg="[resolver] more than 100 concurrent queries from 172.19.0.2:58914"
Dec 09 11:33:28 host-172-17-28-136 dockerd[806]: time="2019-12-09T11:33:28.925749108+08:00" level=warning msg="grpc: Server.Serve failed to complete security handshake from \"172.17.28.168:39646\": EOF" module=grpc
Dec 09 11:33:30 host-172-17-28-136 dockerd[806]: time="2019-12-09T11:33:30.570819619+08:00" level=error msg="[resolver] more than 100 concurrent queries from 172.19.0.2:48191"
Dec 09 11:33:33 host-172-17-28-136 dockerd[806]: time="2019-12-09T11:33:33.924628462+08:00" level=warning msg="grpc: Server.Serve failed to complete security handshake from \"172.17.28.168:39664\": EOF" module=grpc
Dec 09 11:33:35 host-172-17-28-136 dockerd[806]: time="2019-12-09T11:33:35.573992066+08:00" level=error msg="[resolver] more than 100 concurrent queries from 172.19.0.2:50646"
Dec 09 11:33:38 host-172-17-28-136 dockerd[806]: time="2019-12-09T11:33:38.880401328+08:00" level=warning msg="grpc: Server.Serve failed to complete security handshake from \"172.17.28.168:39678\": EOF" module=grpc
Dec 09 11:33:40 host-172-17-28-136 dockerd[806]: time="2019-12-09T11:33:40.579199371+08:00" level=error msg="[resolver] more than 100 concurrent queries from 172.19.0.2:55638"
Dec 09 11:33:43 host-172-17-28-136 dockerd[806]: time="2019-12-09T11:33:43.924627631+08:00" level=warning msg="grpc: Server.Serve failed to complete security handshake from \"172.17.28.168:39692\": EOF" module=grpc
Dec 09 11:33:45 host-172-17-28-136 dockerd[806]: time="2019-12-09T11:33:45.584391307+08:00" level=error msg="[resolver] more than 100 concurrent queries from 172.19.0.2:53806"
Dec 09 11:33:48 host-172-17-28-136 dockerd[806]: time="2019-12-09T11:33:48.922577843+08:00" level=warning msg="grpc: Server.Serve failed to complete security handshake from \"172.17.28.168:39710\": EOF" module=grpc
Dec 09 11:33:50 host-172-17-28-136 dockerd[806]: time="2019-12-09T11:33:50.586868904+08:00" level=error msg="[resolver] more than 100 concurrent queries from 172.19.0.2:48380"
Output of docker info on the manager node:
[root@host-172-17-28-136 ~]# docker info
Containers: 13
 Running: 13
 Paused: 0
 Stopped: 0
Images: 13
Server Version: 18.09.8
Storage Driver: overlay2
 Backing Filesystem: xfs
 Supports d_type: true
 Native Overlay Diff: true
Logging Driver: json-file
Cgroup Driver: cgroupfs
Plugins:
 Volume: local
 Network: bridge host macvlan null overlay
 Log: awslogs fluentd gcplogs gelf journald json-file local logentries splunk syslog
Swarm: active
 NodeID: q2tagzsa3p0vghuohyru14nwe
 Is Manager: true
 ClusterID: vxkafmvt1n25qsyupx4e46fa6
 Managers: 1
 Nodes: 44
 Default Address Pool: 10.0.0.0/8
 SubnetSize: 24
 Orchestration:
  Task History Retention Limit: 5
 Raft:
  Snapshot Interval: 10000
  Number of Old Snapshots to Retain: 0
  Heartbeat Tick: 1
  Election Tick: 10
 Dispatcher:
  Heartbeat Period: 5 seconds
 CA Configuration:
  Expiry Duration: 3 months
  Force Rotate: 9
 Autolock Managers: false
 Root Rotation In Progress: false
 Node Address: 172.17.28.136
 Manager Addresses:
  172.17.28.136:2377
Runtimes: runc
Default Runtime: runc
Init Binary: docker-init
containerd version: 894b81a4b802e4eb2a91d1ce216b8817763c29fb
runc version: 425e105d5a03fabd737a126ad93d62a9eeede87f
init version: fec3683
Security Options:
 seccomp
  Profile: default
Kernel Version: 3.10.0-862.el7.x86_64
Operating System: CentOS Linux 7 (Core)
OSType: linux
Architecture: x86_64
CPUs: 8
Total Memory: 15.51GiB
Name: host-172-17-28-136
ID: FZKU:BOVV:HRTB:DNLU:N5C5:PSOI:7KHC:GTOA:RD7G:ZYTN:ANSB:N4B3
Docker Root Dir: /var/lib/docker
Debug Mode (client): false
Debug Mode (server): false
Registry: https://index.docker.io/v1/
Labels:
Experimental: false
Insecure Registries:
 172.17.28.136
 127.0.0.0/8
Live Restore Enabled: false
Product License: Community Engine
WARNING: API is accessible on http://0.0.0.0:2375 without encryption.
Access to the remote API is equivalent to root access on the host. Refer
to the 'Docker daemon attack surface' section in the documentation for
more information: https://docs.docker.com/engine/security/security/#docker-daemon-attack-surface