Swarm and overlay network initialisation fails in production environment: "Failed creating ingress network: container ingress-sbox is already present in sandbox ingress_sbox"

EDIT: Fixed now. The problem was that the “docker_gwbridge” interface was not running. The most likely cause is that it could not communicate with the default gateway because the host sits in a restricted networking environment. Re-creating this interface with a manually specified subnet and gateway fixed the problem.
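For anyone hitting the same problem, the fix can be sketched roughly as follows. It loosely follows Docker's documented procedure for customising docker_gwbridge, but it is not a copy of the exact commands used here. Several parts are assumptions. The subnet 10.11.0.0/16 and gateway 10.11.0.1 are placeholder defaults, so choose a range that doesn't clash with anything on your network. The run and recreate_gwbridge helper names are made up for illustration. The script must be run as root, and because it leaves the swarm, it will affect anything running on the node. Setting DRY_RUN=1 prints the commands instead of running them.

```shell
#!/bin/sh
# Sketch only: recreate docker_gwbridge with an explicit subnet and gateway.
# ASSUMPTIONS: must run as root; 10.11.0.0/16 and 10.11.0.1 are placeholder
# defaults, so pass a range that does not clash with your network.
# run() and recreate_gwbridge() are hypothetical helper names.

# Print the command when DRY_RUN=1, otherwise execute it.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

recreate_gwbridge() {
  subnet="${1:-10.11.0.0/16}"  # placeholder subnet (assumption)
  gateway="${2:-10.11.0.1}"    # placeholder gateway (assumption)

  # Leave the swarm so nothing on this node keeps using the old bridge.
  run docker swarm leave --force
  # Drop Docker's record of the network; this may fail if containers still
  # use it or it is already gone, and the function carries on either way.
  run docker network rm docker_gwbridge

  # Stop the daemon and remove any leftover bridge link from the host.
  run systemctl stop docker
  run ip link set docker_gwbridge down
  run ip link del dev docker_gwbridge

  # Start Docker again, but do not init or join a swarm yet.
  run systemctl start docker

  # Recreate docker_gwbridge with the chosen subnet and gateway.
  run docker network create \
    --subnet "$subnet" \
    --gateway "$gateway" \
    --opt com.docker.network.bridge.name=docker_gwbridge \
    --opt com.docker.network.bridge.enable_icc=false \
    --opt com.docker.network.bridge.enable_ip_masquerade=true \
    docker_gwbridge
}
```

For example, after sourcing the script, DRY_RUN=1 recreate_gwbridge 10.11.0.0/16 10.11.0.1 lists the steps without changing anything. After running it for real, repeat docker swarm init --advertise-addr <public static IP address>.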

Issue type: Networking / overlay network
OS Version / Build: Ubuntu 16.04.4 LTS / kernel 4.13.0-38-generic
App version: Client 18.03.0-ce / Server 18.03.0-ce
Steps to reproduce:

  1. docker swarm init --advertise-addr <public static IP address>

Apr 16 08:05:34 server-name dockerd[7505]: time="2018-04-16T08:05:34.974074188+01:00" level=error msg="Failed creating ingress network: container ingress-sbox is already present in sandbox ingress_sbox"
Apr 16 08:05:34 server-name dockerd[7505]: time="2018-04-16T08:05:34.974100979+01:00" level=warning msg="Peer operation failed:Unable to find the peerDB for nid:70275pllnv7fq6wiww5g1hbd0 op:&{3 70275pllnv7fq6wiww5g1hbd0 false false false DeleteNetwork}"

  2. docker network create --driver=overlay --attachable cross-docker-network

Apr 16 08:06:47 server-name dockerd[7505]: time="2018-04-16T08:06:47.698574079+01:00" level=error msg="Failed creating ingress network: container ingress-sbox is already present in sandbox ingress_sbox"
Apr 16 08:06:47 server-name dockerd[7505]: time="2018-04-16T08:06:47.698608905+01:00" level=warning msg="Peer operation failed:Unable to find the peerDB for nid:70275pllnv7fq6wiww5g1hbd0 op:&{3 70275pllnv7fq6wiww5g1hbd0

  3. docker run --name node-name --net cross-docker-network -d --ip 10.0.0.10 source-node

Apr 16 08:09:02 server-name kernel: [244742.675938] br0: renamed from ov-001001-9j6ey
Apr 16 08:09:02 server-name systemd-udevd[7201]: Could not generate persistent MAC address for vx-001001-9j6ey: No such file or directory
Apr 16 08:09:02 server-name kernel: [244742.700808] vxlan0: renamed from vx-001001-9j6ey
Apr 16 08:09:02 server-name kernel: [244742.716815] device vxlan0 entered promiscuous mode
Apr 16 08:09:02 server-name kernel: [244742.716927] br0: port 1(vxlan0) entered forwarding state
Apr 16 08:09:02 server-name kernel: [244742.716931] br0: port 1(vxlan0) entered forwarding state
Apr 16 08:09:02 server-name systemd-udevd[7241]: Could not generate persistent MAC address for veth6e0a17a: No such file or directory
Apr 16 08:09:02 server-name systemd-udevd[7240]: Could not generate persistent MAC address for veth6ade23a: No such file or directory
Apr 16 08:09:02 server-name kernel: [244742.768833] veth0: renamed from veth6e0a17a
Apr 16 08:09:02 server-name kernel: [244742.784841] device veth0 entered promiscuous mode
Apr 16 08:09:02 server-name kernel: [244742.784883] br0: port 2(veth0) entered forwarding state
Apr 16 08:09:02 server-name kernel: [244742.784887] br0: port 2(veth0) entered forwarding state
Apr 16 08:09:02 server-name kernel: [244742.788068] br0: port 2(veth0) entered disabled state
Apr 16 08:09:02 server-name kernel: [244742.788074] br0: port 1(vxlan0) entered disabled state
Apr 16 08:09:02 server-name kernel: [244742.789736] ov-001001-9j6ey: renamed from br0
Apr 16 08:09:02 server-name kernel: [244742.804700] device veth0 left promiscuous mode
Apr 16 08:09:02 server-name kernel: [244742.804709] ov-001001-9j6ey: port 2(veth0) entered disabled state
Apr 16 08:09:02 server-name kernel: [244742.820654] device vxlan0 left promiscuous mode
Apr 16 08:09:02 server-name kernel: [244742.820664] ov-001001-9j6ey: port 1(vxlan0) entered disabled state
Apr 16 08:09:02 server-name kernel: [244742.866177] vx-001001-9j6ey: renamed from vxlan0
Apr 16 08:09:02 server-name systemd-udevd[7298]: Could not generate persistent MAC address for vx-001001-9j6ey: No such file or directory
Apr 16 08:09:02 server-name kernel: [244742.894255] veth6e0a17a: renamed from veth0
Apr 16 08:09:02 server-name systemd-udevd[7325]: Could not generate persistent MAC address for veth6e0a17a: No such file or directory
Apr 16 08:09:02 server-name dockerd[7505]: time="2018-04-16T08:09:02.760461183+01:00" level=error msg="6fcde449ed9d85519bf25ad9be871d49ea0ed9a0d6ab285fa096d3c8dc8a2216 cleanup: failed to delete container from containerd: no such container
Apr 16 08:09:02 server-name dockerd[7505]: time="2018-04-16T08:09:02.760503309+01:00" level=error msg="Handler for POST /v1.37/containers/6fcde449ed9d85519bf25ad9be871d49ea0ed9a0d6ab285fa096d3c8dc8a2216/start returned error: error creating external connectivity network: could not find an available, non-overlapping IPv4 address pool among the defaults to assign to the network"
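The last error ("error creating external connectivity network") appears to be raised when Docker cannot set up docker_gwbridge, the local bridge that gives overlay-attached containers outside connectivity. That matches the fix described in the edit at the top. The function below gives a quick way to check whether that bridge exists on a host. It is a sketch that assumes iproute2's ip tool is installed, and gwbridge_status is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch: report whether the docker_gwbridge link exists on this host.
# ASSUMPTION: iproute2's `ip` command is installed; gwbridge_status is a
# hypothetical helper name.
gwbridge_status() {
  if ip link show docker_gwbridge >/dev/null 2>&1; then
    # Bridge exists: show its IPv4 address(es), if any are assigned.
    addrs=$(ip -4 addr show dev docker_gwbridge)
    if [ -n "$addrs" ]; then
      echo "$addrs"
    else
      echo "docker_gwbridge: present, no IPv4 address"
    fi
  else
    # Either the link is missing or `ip` could not query it.
    echo "docker_gwbridge: not present"
  fi
}
```

If this reports the bridge as missing, or present without an IPv4 address, on the failing host but not on a working one, the docker_gwbridge problem is the likely culprit.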

Description of issue:
I have a full swarm system spread across three Docker hosts (all VMs) that works well in our own environment. However, when it is deployed remotely to a live environment, it fails very early in initialisation. Swarm initialisation logs an error (see the log above), although the swarm is still created and other machines can join it. Presumably as a consequence, creating the overlay network logs the same error, and finally a container cannot be started on that network.

I have updated the Linux kernel from 4.4.0-62-generic to 4.13.0-38-generic after several people suggested the problem might be related to Linode issues. This hasn't helped (and, to be honest, I'm not normally a kernel person, so this is a little beyond me).