I am deploying a service to a Docker swarm mode environment with two nodes.
The manager node is an Ubuntu 14.04 system with 4 cores; the worker node is an Ubuntu 14.04 system with 24 cores. The required TCP and UDP ports are definitely open in ufw.
The service starts a main container (call it container-A) on the manager node and several other containers (container-B, C, D, E, F) on the worker node. They communicate with each other over an overlay network. Sometimes, however, some of the containers on the worker node (container-B, C) fail to communicate with the main container on the manager node, while the others (container-D, E, F) keep working.
Containers B, C, D, E, and F all run the same image.
If I start all the containers on the same node, these errors never appear.
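Since the log warning further down mentions UDP specifically, one way to double-check the firewall between the two nodes is a UDP round trip. Below is a minimal Python sketch under some assumptions. The port list is the one the Docker documentation gives for swarm mode, not something stated in this post. The helper names `udp_echo_once` and `udp_round_trip` are hypothetical. Because the Docker daemon already holds port 7946 while it is running, a real check would need a spare test port opened in ufw the same way.

```python
import socket
import threading

# Ports documented by Docker for swarm mode (assumption: not taken from this post).
SWARM_PORTS = [
    (2377, "tcp"),  # cluster management
    (7946, "tcp"),  # node-to-node gossip
    (7946, "udp"),  # node-to-node gossip
    (4789, "udp"),  # overlay network (VXLAN) traffic
]


def udp_echo_once(sock):
    """Echo a single datagram back to its sender on an already-bound UDP socket."""
    data, addr = sock.recvfrom(1024)
    sock.sendto(data, addr)


def udp_round_trip(host, port, timeout=2.0):
    """Return True if a datagram sent to host:port is echoed back before the timeout."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as s:
        s.settimeout(timeout)
        try:
            s.sendto(b"ping", (host, port))
            data, _ = s.recvfrom(1024)
        except OSError:  # covers timeouts and ICMP "unreachable" errors
            return False
        return data == b"ping"


if __name__ == "__main__":
    # Loopback self-test: bind an echo socket on a free port, then probe it.
    server = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    server.bind(("127.0.0.1", 0))
    test_port = server.getsockname()[1]
    threading.Thread(target=udp_echo_once, args=(server,), daemon=True).start()
    print("UDP round trip ok:", udp_round_trip("127.0.0.1", test_port))
```

To use it across nodes, the echo side would run on the worker (bound to its real address and the test port) and `udp_round_trip` would be called from the manager, then the roles swapped, since the warning is about bidirectional UDP.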
From the Docker logs:
time="2017-04-25T18:48:04.323211240+02:00" level=error msg="Bulk sync to node **wokernode** timed out"
time="2017-04-25T18:50:27.322243909+02:00" level=warning msg="memberlist: Was able to reach **wokernode** via TCP but not UDP, network may be misconfigured and not allowing bidirectional UDP"
time="2017-04-25T20:11:04.327249446+02:00" level=error msg="Bulk sync to node **wokernode** timed out"
time="2017-04-26T01:09:34.325412120+02:00" level=error msg="Bulk sync to node **wokernode** timed out"
time="2017-04-26T03:15:59.322312091+02:00" level=warning msg="memberlist: Was able to reach **wokernode** via TCP but not UDP, network may be misconfigured and not allowing bidirectional UDP"
time="2017-04-26T07:12:04.409572284+02:00" level=error msg="Bulk sync to node **wokernode** timed out"
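To see how often the timeouts recur, the log lines above can be tallied by level and the gaps between "Bulk sync" errors computed. This is only a rough sketch: the `parse_line` and `summarize` helpers are hypothetical names, the regular expression assumes the `time="…" level=… msg="…"` layout seen in the lines above, and the sample holds just the first three lines copied from them.

```python
import re
from collections import Counter
from datetime import datetime

LINE_RE = re.compile(r'time="(?P<time>[^"]+)" level=(?P<level>\w+) msg="(?P<msg>.*)"$')

SAMPLE = [
    'time="2017-04-25T18:48:04.323211240+02:00" level=error msg="Bulk sync to node **wokernode** timed out"',
    'time="2017-04-25T18:50:27.322243909+02:00" level=warning msg="memberlist: Was able to reach **wokernode** via TCP but not UDP, network may be misconfigured and not allowing bidirectional UDP"',
    'time="2017-04-25T20:11:04.327249446+02:00" level=error msg="Bulk sync to node **wokernode** timed out"',
]


def parse_line(line):
    """Split one daemon log line into (timestamp, level, message), or None if it does not match."""
    m = LINE_RE.match(line.strip())
    if not m:
        return None
    # Drop the nanosecond fraction: strptime's %f accepts at most 6 digits.
    stamp = re.sub(r"\.\d+", "", m.group("time"), count=1)
    when = datetime.strptime(stamp, "%Y-%m-%dT%H:%M:%S%z")
    return when, m.group("level"), m.group("msg")


def summarize(lines):
    """Count lines per log level and list the gaps between consecutive bulk-sync timeouts."""
    parsed = [p for p in map(parse_line, lines) if p]
    levels = Counter(level for _, level, _ in parsed)
    sync_times = [t for t, _, msg in parsed if msg.startswith("Bulk sync")]
    gaps = [later - earlier for earlier, later in zip(sync_times, sync_times[1:])]
    return levels, gaps


if __name__ == "__main__":
    levels, gaps = summarize(SAMPLE)
    print(dict(levels))
    print([str(g) for g in gaps])
```

Feeding it the full daemon log instead of `SAMPLE` would show whether the timeouts follow a regular interval or cluster around particular times.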
Docker version:
Client:
Version: 17.03.1-ce
API version: 1.27
Go version: go1.7.5
Git commit: c6d412e
Built: Mon Mar 27 17:10:36 2017
OS/Arch: linux/amd64
Server:
Version: 17.03.1-ce
API version: 1.27 (minimum version 1.12)
Go version: go1.7.5
Git commit: c6d412e
Built: Mon Mar 27 17:10:36 2017
OS/Arch: linux/amd64
Experimental: false
Docker info:
Containers: 126
Running: 3
Paused: 0
Stopped: 123
Images: 51
Server Version: 17.03.1-ce
Storage Driver: aufs
Root Dir: /var/lib/docker/aufs
Backing Filesystem: extfs
Dirs: 463
Dirperm1 Supported: true
Logging Driver: json-file
Cgroup Driver: cgroupfs
Plugins:
Volume: local nvidia-docker
Network: bridge host macvlan null overlay
Swarm: active
NodeID: o7pt8pumr9712x2lys656f2md
Is Manager: true
ClusterID: 3phxvngu42fl38wg3sajltmoc
Managers: 1
Nodes: 4
Orchestration:
Task History Retention Limit: 5
Raft:
Snapshot Interval: 10000
Number of Old Snapshots to Retain: 0
Heartbeat Tick: 1
Election Tick: 3
Dispatcher:
Heartbeat Period: 5 seconds
CA Configuration:
Expiry Duration: 3 months
Node Address: XXX XXX XXX XXX
Manager Addresses:
XXX XXX XXX XXX
Runtimes: runc
Default Runtime: runc
Init Binary: docker-init
containerd version: 4ab9917febca54791c5f071a9d1f404867857fcc
runc version: 54296cf40ad8143b62dbcaa1d90e520a2136ddfe
init version: 949e6fa
Security Options:
apparmor
Kernel Version: 4.4.0-66-generic
Operating System: Ubuntu 14.04.5 LTS
OSType: linux
Architecture: x86_64
CPUs: 8
Total Memory: 15.65 GiB
Name: waifa
ID: OJT2:XBBJ:V4HS:JEQI:DFE4:5PKX:Z3NH:DVV3:7T3F:BQKD:E6IR:KQWM
Docker Root Dir: /var/lib/docker
Debug Mode (client): false
Debug Mode (server): false
Username: onlytailei
Registry: https://index.docker.io/v1/
WARNING: No swap limit support
Experimental: false
Insecure Registries:
127.0.0.0/8
Live Restore Enabled: false