Docker Community Forums

Share and learn in the Docker community.

"Bulk sync to node XX timed out" when deploying a service in swarm mode over an overlay network


(Onlytailei) #1

I am deploying a service to a swarm-mode Docker environment with two nodes.
The manager node runs Ubuntu 14.04 with 4 cores; the worker node runs Ubuntu 14.04 with 24 cores. The related TCP and UDP ports are open in ufw.

The service starts a main container (call it container-A) on the manager node and several other containers (container-B,C,D,E,F) on the worker node. They communicate with each other over an overlay network. However, sometimes some of the containers on the worker node (container-B,C) fail to communicate with container-A on the manager node, while the others (container-D,E,F) keep working.
B,C,D,E,F are started from the same image.

If I start all the containers on the same node, these errors never appear.
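
For reference, swarm overlay networking needs a handful of ports open between the nodes. A ufw sketch (the port numbers are Docker's documented defaults; adjust if your cluster was initialized with non-default ports):

```shell
# Run on every node: open Docker swarm's control-plane and overlay ports.
sudo ufw allow 2377/tcp   # cluster management traffic (to managers)
sudo ufw allow 7946/tcp   # node-to-node gossip (memberlist)
sudo ufw allow 7946/udp   # node-to-node gossip (memberlist)
sudo ufw allow 4789/udp   # VXLAN data plane for overlay networks
sudo ufw reload
```

Note that 7946 must be open for both TCP and UDP, and 4789 is UDP only, which is exactly the kind of traffic a TCP-only firewall rule would silently drop.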


In docker logs:

time="2017-04-25T18:48:04.323211240+02:00" level=error msg="Bulk sync to node **wokernode** timed out" 
time="2017-04-25T18:50:27.322243909+02:00" level=warning msg="memberlist: Was able to reach **wokernode** via TCP but not UDP, network may be misconfigured and not allowing bidirectional UDP" 
time="2017-04-25T20:11:04.327249446+02:00" level=error msg="Bulk sync to node **wokernode** timed out" 
time="2017-04-26T01:09:34.325412120+02:00" level=error msg="Bulk sync to node **wokernode** timed out" 
time="2017-04-26T03:15:59.322312091+02:00" level=warning msg="memberlist: Was able to reach **wokernode** via TCP but not UDP, network may be misconfigured and not allowing bidirectional UDP" 
time="2017-04-26T07:12:04.409572284+02:00" level=error msg="Bulk sync to node **wokernode** timed out" 


Docker version:

Client:
 Version:      17.03.1-ce
 API version:  1.27
 Go version:   go1.7.5
 Git commit:   c6d412e
 Built:        Mon Mar 27 17:10:36 2017
 OS/Arch:      linux/amd64

Server:
 Version:      17.03.1-ce
 API version:  1.27 (minimum version 1.12)
 Go version:   go1.7.5
 Git commit:   c6d412e
 Built:        Mon Mar 27 17:10:36 2017
 OS/Arch:      linux/amd64
 Experimental: false

Docker info:

Containers: 126
 Running: 3
 Paused: 0
 Stopped: 123
Images: 51
Server Version: 17.03.1-ce
Storage Driver: aufs
 Root Dir: /var/lib/docker/aufs
 Backing Filesystem: extfs
 Dirs: 463
 Dirperm1 Supported: true
Logging Driver: json-file
Cgroup Driver: cgroupfs
Plugins: 
 Volume: local nvidia-docker
 Network: bridge host macvlan null overlay
Swarm: active
 NodeID: o7pt8pumr9712x2lys656f2md
 Is Manager: true
 ClusterID: 3phxvngu42fl38wg3sajltmoc
 Managers: 1
 Nodes: 4
 Orchestration:
  Task History Retention Limit: 5
 Raft:
  Snapshot Interval: 10000
  Number of Old Snapshots to Retain: 0
  Heartbeat Tick: 1
  Election Tick: 3
 Dispatcher:
  Heartbeat Period: 5 seconds
 CA Configuration:
  Expiry Duration: 3 months
 Node Address: XXX XXX XXX XXX
 Manager Addresses:
  XXX XXX XXX XXX
Runtimes: runc
Default Runtime: runc
Init Binary: docker-init
containerd version: 4ab9917febca54791c5f071a9d1f404867857fcc
runc version: 54296cf40ad8143b62dbcaa1d90e520a2136ddfe
init version: 949e6fa
Security Options:
 apparmor
Kernel Version: 4.4.0-66-generic
Operating System: Ubuntu 14.04.5 LTS
OSType: linux
Architecture: x86_64
CPUs: 8
Total Memory: 15.65 GiB
Name: waifa
ID: OJT2:XBBJ:V4HS:JEQI:DFE4:5PKX:Z3NH:DVV3:7T3F:BQKD:E6IR:KQWM
Docker Root Dir: /var/lib/docker
Debug Mode (client): false
Debug Mode (server): false
Username: onlytailei
Registry: https://index.docker.io/v1/
WARNING: No swap limit support
Experimental: false
Insecure Registries:
 127.0.0.0/8
Live Restore Enabled: false


(Onlytailei) #2

Docker network inspect on the manager node:

[
    {
        "Name": "XXX_net",
        "Id": "ufsfwtpo0fh4wsypu02ba70qj",
        "Created": "2017-04-25T18:08:32.338241058+02:00",
        "Scope": "swarm",
        "Driver": "overlay",
        "EnableIPv6": false,
        "IPAM": {
            "Driver": "default",
            "Options": null,
            "Config": [
                {
                    "Subnet": "10.0.9.0/24",
                    "Gateway": "10.0.9.1"
                }
            ]
        },
        "Internal": false,
        "Attachable": false,
        "Containers": {
            "all of the containers"
        },
        "Options": {
            "com.docker.network.driver.overlay.vxlanid_list": "4097"
        },
        "Labels": {},
        "Peers": [
            {
                "Name": "**wokernode**",
                "IP": "XXXXXXXXXX"
            },
            {
                "Name": "managernode",
                "IP": "XXXXXXXXXX"
            }
        ]
    }
]


(Onlytailei) #3

Docker network inspect on the worker node:

[
    {
        "Name": "XXX_net",
        "Id": "ufsfwtpo0fh4wsypu02ba70qj",
        "Created": "2017-04-25T18:08:24.850657556+02:00",
        "Scope": "swarm",
        "Driver": "overlay",
        "EnableIPv6": false,
        "IPAM": {
            "Driver": "default",
            "Options": null,
            "Config": [
                {
                    "Subnet": "10.0.9.0/24",
                    "Gateway": "10.0.9.1"
                }
            ]
        },
        "Internal": false,
        "Attachable": false,
        "Containers": {
          "all of the containers"
        },
        "Options": {
            "com.docker.network.driver.overlay.vxlanid_list": "4097"
        },
        "Labels": {},
        "Peers": [
            {
                "Name": "manager node",
                "IP": "XXXXXXXXXX"
            },
            {
                "Name": "worker node",
                "IP": "XXXXXXXXX"
            }
        ]
    }
]


(Karanp) #4

I have Docker 17.03.1-ce running on RHEL 7.3, using swarm mode with one worker node and an overlay network, with devicemapper storage configured on a thin pool.

Sep 6 17:05:56 172.30.0.10 dockerd: time="2017-09-06T17:05:56.220841190Z" level=error msg="Bulk sync to node apg-001-rhel-application-02-57a5929a31e7 timed out"

After that the host kernel panicked, and all Docker containers were showing status "Up" while the same containers were also listed as "Created". Is this a bug, or the result of some network glitch?


(Drawnkid) #5

An iptables rule may be conflicting with the container port.

Try listing the NAT rules that mention the swarm gossip port (7946), then delete the conflicting rule from the DOCKER chain by the rule number the first command reports:

sudo iptables -t nat -L -n --line-numbers | grep 7946
sudo iptables -t nat -D DOCKER 6926
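
Since the grep above mixes line numbers from every chain in the nat table, a narrower listing makes the right number easier to read. A sketch, assuming the standard DOCKER chain in the nat table:

```shell
# List only the DOCKER chain, with per-chain rule numbers; the number in
# the first column is what `iptables -t nat -D DOCKER <n>` expects.
sudo iptables -t nat -L DOCKER -n --line-numbers
```

Deleting by rule number only removes the rule until the next daemon restart or rule reload, so treat this as a diagnostic step rather than a permanent fix.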