Docker Community Forums

Share and learn in the Docker community.

Docker swarm overlay network not working on AWS

Hey guys. I’m trying to set up swarm on two AWS instances. Everything works except the overlay network; Docker doesn’t seem to be connecting over it at all. I’ve already opened all the necessary ports in the security group for both instances in the AWS console. Running tcpdump on port 4789 shows no traffic at all, but it does show traffic on ports 2377 and 7946.
I’ve tried the same configuration on Docker 19.03.4-ce and on 18.06.3-ce, but I get the same result on both.
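For reference, the ports a swarm needs open between nodes are 2377/tcp (cluster management), 7946/tcp and 7946/udp (node-to-node gossip), and 4789/udp (VXLAN overlay data traffic); an encrypted overlay additionally needs IP protocol 50 (ESP). A sketch of the equivalent rules with the AWS CLI (the security group ID and CIDR below are placeholders, not the real ones from this setup):

```shell
# Placeholders: substitute your own security group ID and VPC CIDR.
SG=sg-0123456789abcdef0
CIDR=172.31.0.0/16

aws ec2 authorize-security-group-ingress --group-id "$SG" \
    --protocol tcp --port 2377 --cidr "$CIDR"   # cluster management
aws ec2 authorize-security-group-ingress --group-id "$SG" \
    --protocol tcp --port 7946 --cidr "$CIDR"   # node gossip (TCP)
aws ec2 authorize-security-group-ingress --group-id "$SG" \
    --protocol udp --port 7946 --cidr "$CIDR"   # node gossip (UDP)
aws ec2 authorize-security-group-ingress --group-id "$SG" \
    --protocol udp --port 4789 --cidr "$CIDR"   # VXLAN overlay data
```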

Here’s the output for the manager node for docker info:

    Containers: 1
     Running: 0
     Paused: 0
     Stopped: 1
    Images: 6
    Server Version: 18.06.3-ce
    Storage Driver: overlay2
     Backing Filesystem: extfs
     Supports d_type: true
     Native Overlay Diff: true
    Logging Driver: json-file
    Cgroup Driver: cgroupfs
    Plugins:
     Volume: local
     Network: bridge host macvlan null overlay
     Log: awslogs fluentd gcplogs gelf journald json-file logentries splunk syslog
    Swarm: active
     NodeID: yiyvvk7rcz033eorn9l0rydxd
     Is Manager: true
     ClusterID: y01nnnejg938bz1mtpjj6n1uo
     Managers: 1
     Nodes: 3
     Orchestration:
      Task History Retention Limit: 5
     Raft:
      Snapshot Interval: 10000
      Number of Old Snapshots to Retain: 0
      Heartbeat Tick: 1
      Election Tick: 10
     Dispatcher:
      Heartbeat Period: 5 seconds
     CA Configuration:
      Expiry Duration: 3 months
      Force Rotate: 0
     Autolock Managers: false
     Root Rotation In Progress: false
     Node Address: 18.184.183.97
     Manager Addresses:
      18.184.183.97:2377
    Runtimes: runc
    Default Runtime: runc
    Init Binary: docker-init
    containerd version: 468a545b9edcd5932818eb9de8e72413e616e86e
    runc version: a592beb5bc4c4092b1b1bac971afed27687340c5
    init version: fec3683
    Security Options:
     apparmor
     seccomp
      Profile: default
    Kernel Version: 4.15.0-1052-aws
    Operating System: Ubuntu 18.04.3 LTS
    OSType: linux
    Architecture: x86_64
    CPUs: 1
    Total Memory: 983.9MiB
    Name: ip-172-31-12-68
    ID: 5OQ4:XQZ5:M3ML:WEC5:2WTA:JTXX:ABK6:DYUK:3Z27:R4IH:ECGH:KEYU
    Docker Root Dir: /var/lib/docker
    Debug Mode (client): false
    Debug Mode (server): false
    Registry: https://index.docker.io/v1/
    Labels:
    Experimental: false
    Insecure Registries:
     127.0.0.0/8
    Live Restore Enabled: false

Here’s the output for the worker node:

    Containers: 2
     Running: 1
     Paused: 0
     Stopped: 1
    Images: 3
    Server Version: 18.06.3-ce
    Storage Driver: overlay2
     Backing Filesystem: extfs
     Supports d_type: true
     Native Overlay Diff: true
    Logging Driver: json-file
    Cgroup Driver: cgroupfs
    Plugins:
     Volume: local
     Network: bridge host macvlan null overlay
     Log: awslogs fluentd gcplogs gelf journald json-file logentries splunk syslog
    Swarm: active
     NodeID: t0a1ttjoctmlhqobl1ur5qcwy
     Is Manager: false
     Node Address: 3.120.139.109
     Manager Addresses:
      18.184.183.97:2377
    Runtimes: runc
    Default Runtime: runc
    Init Binary: docker-init
    containerd version: 468a545b9edcd5932818eb9de8e72413e616e86e
    runc version: a592beb5bc4c4092b1b1bac971afed27687340c5
    init version: fec3683
    Security Options:
     apparmor
     seccomp
      Profile: default
    Kernel Version: 4.15.0-1052-aws
    Operating System: Ubuntu 18.04.3 LTS
    OSType: linux
    Architecture: x86_64
    CPUs: 1
    Total Memory: 983.9MiB
    Name: ip-172-31-4-225
    ID: DPB6:5FSV:DTSC:XVI3:Z762:4KJA:QZGE:FPGJ:KFAR:UHBC:L6AH:D2I6
    Docker Root Dir: /var/lib/docker
    Debug Mode (client): false
    Debug Mode (server): false
    Registry: https://index.docker.io/v1/
    Labels:
    Experimental: false
    Insecure Registries:
     127.0.0.0/8
    Live Restore Enabled: false

The interesting thing is that the same configuration works on DigitalOcean without any issues.

I suspect there’s some AWS-specific setup required, but I’m not entirely sure.


As I ran swarm on AWS without issues for roughly two years, I am quite sure this cannot be a general problem.

Usually those ports are all you need. Still, why not loosen the rules to accept traffic on all ports, test, and if it works, lock the rules down again?

I’ve actually tried that already. Port 4789 still keeps misbehaving.

@meyay I’ve just run telnet against the manager node. Port 7946 gives the following result:

telnet ip 7946
Trying ip...
Connected to ip.

While port 4789 gives

telnet ip 4789
Trying ip...
telnet: Unable to connect to remote host: Connection timed out

Could this be an issue on aws and not docker swarm itself?

Your result is not really surprising, as telnet cannot connect to UDP ports.
You might want to follow this SO response for examples of how to test UDP connections.

Are you sure you created rules for udp traffic as well?
Are you using encryption on the swarm network?

I ran socat - UDP:ip:4789 and socat - UDP:ip:7946 and there was no output. It seems UDP traffic is not being transmitted :thinking:.
These are current rules on AWS (temporary):

The firewall isn’t enabled on either server.
I haven’t enabled encryption either.

For anybody who faces this issue: I got it working by using the private IPs instead of the public ones.


For anybody who faces this issue: I got it working by using the private IPs instead of the public ones.

Thanks @michaelbukachi !

How did you do that? Did you need to init the swarm manager again? I do not want to stop my swarm to apply this change :frowning:

No, I didn’t have to. When joining a swarm, just use the private IP address of the server instead of the public one.
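Concretely, assuming the private addresses implied by the hostnames above (ip-172-31-12-68 for the manager, ip-172-31-4-225 for the worker), bootstrapping with private IPs would look roughly like this:

```shell
# On the manager: advertise the private VPC address, not the public one.
docker swarm init --advertise-addr 172.31.12.68

# Print the worker join command (includes the join token).
docker swarm join-token worker

# On the worker: join via the manager's *private* address.
# <worker-token> is the token printed by the previous command.
docker swarm join --token <worker-token> 172.31.12.68:2377
```

With this, swarm control traffic and the VXLAN data plane stay inside the VPC, which is also why the security group rules only need to allow the VPC CIDR rather than the public internet.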
