Docker swarm overlay network not working

I created a Docker swarm cluster with three nodes, all in the same subnet. To test the Docker swarm overlay network, I created an Nginx service using the command below:

$ docker service create --replicas 1 -p 4200:80 --name web nginx

But I am not able to access Nginx using any of the worker node IPs in the cluster; I can only reach it via the master node IP. No iptables or firewall services are running, and all swarm-related ports are open. When I run tcpdump on the master node on port 4789, there is no UDP traffic at all. What could the issue be here? Any help on how to debug further?
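
For reference, this is roughly how I checked for the VXLAN traffic (eth0 is just an example interface name; substitute whatever interface the VMs actually use):

    # on the master node: watch for overlay (VXLAN) traffic arriving from the workers
    sudo tcpdump -n -i eth0 udp port 4789

    # optionally watch the swarm gossip/control-plane port as well
    sudo tcpdump -n -i eth0 port 7946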

Hello subbareddy425,

Did you try to reach your nginx service from any worker node?
Try connecting to a worker node and running curl localhost:4200 from the command line.

Best Regards,
Fouscou B.

Yes, I tried from the local worker node. It just hung: no response, not even a failure.

Hello subbareddy425,

From the manager node, run the following command: docker network ls
The result should show a network named docker_gwbridge.
Then inspect that network and retrieve the gateway IP address.
Lastly, try curl IP_ADDRESS_GATEWAY:4200
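
Something like this (the --format template assumes the default single-subnet IPAM config of docker_gwbridge):

    docker network ls
    # pull the gateway address out of the docker_gwbridge network
    docker network inspect docker_gwbridge --format '{{ (index .IPAM.Config 0).Gateway }}'
    # then test the published port against that address
    curl http://IP_ADDRESS_GATEWAY:4200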

Best Regards,
Fouscou

Hello fouscou,

I am able to access nginx using the gateway IP on the master node. But when I do the same on a worker node, it doesn't work; it fails with a connection timeout error.

To provide more information about this issue: we have a couple of swarm clusters built on VMs. Swarm works fine on one cluster, where the VMs reside on a separate physical cluster. The one having issues runs on VMs on a different physical server cluster. Are there any internal settings that could cause this? We compared the server installation configurations but don't see any difference.

Thanks,
Subba.

I have the same issue.

I've noticed that I'm on Debian 10 and etcd doesn't work correctly.
I don't know if the problem comes from that, but I think it's related.
Does anyone know how to solve it?
Or do I just have to reformat and install Ubuntu 19.10 server?

Here is the error I saw when I tried to reset the k8s cluster. That's what showed me that etcd wasn't working correctly.

    sudo kubeadm reset
    [reset] Reading configuration from the cluster...
    [reset] FYI: You can look at this config file with 'kubectl -n kube-system get cm kubeadm-config -oyaml'
    [reset] WARNING: Changes made to this host by 'kubeadm init' or 'kubeadm join' will be reverted.
    [reset] Are you sure you want to proceed? [y/N]: y
    [preflight] Running pre-flight checks
    [reset] Removing info for node "nocturlab-ks" from the ConfigMap "kubeadm-config" in the "kube-system" Namespace
    {"level":"warn","ts":"2020-04-24T09:07:06.536Z","caller":"clientv3/retry_interceptor.go:61","msg":"retrying of unary invoker failed","target":"endpoint://client-5a5395aa-0330-4790-8b07-6aedc0befecc/public_ip:2379","attempt":0,"error":"rpc error: code = Unknown desc = etcdserver: re-configuration failed due to not enough started members"}
    {"level":"warn","ts":"2020-04-24T09:07:06.590Z","caller":"clientv3/retry_interceptor.go:61","msg":"retrying of unary invoker failed","target":"endpoint://client-5a5395aa-0330-4790-8b07-6aedc0befecc/public_ip:2379","attempt":0,"error":"rpc error: code = Unknown desc = etcdserver: re-configuration failed due to not enough started members"}
    {"level":"warn","ts":"2020-04-24T09:07:06.700Z","caller":"clientv3/retry_interceptor.go:61","msg":"retrying of unary invoker failed","target":"endpoint://client-5a5395aa-0330-4790-8b07-6aedc0befecc/public_ip:2379","attempt":0,"error":"rpc error: code = Unknown desc = etcdserver: re-configuration failed due to not enough started members"}
    {"level":"warn","ts":"2020-04-24T09:07:06.915Z","caller":"clientv3/retry_interceptor.go:61","msg":"retrying of unary invoker failed","target":"endpoint://client-5a5395aa-0330-4790-8b07-6aedc0befecc/public_ip:2379","attempt":0,"error":"rpc error: code = Unknown desc = etcdserver: re-configuration failed due to not enough started members"}
    {"level":"warn","ts":"2020-04-24T09:07:07.333Z","caller":"clientv3/retry_interceptor.go:61","msg":"retrying of unary invoker failed","target":"endpoint://client-5a5395aa-0330-4790-8b07-6aedc0befecc/public_ip:2379","attempt":0,"error":"rpc error: code = Unknown desc = etcdserver: re-configuration failed due to not enough started members"}
    W0424 09:07:58.900932    5094 removeetcdmember.go:61] [reset] failed to remove etcd member: etcdserver: re-configuration failed due to not enough started members
    .Please manually remove this etcd member using etcdctl
    [reset] Stopping the kubelet service
    [reset] Unmounting mounted directories in "/var/lib/kubelet"

    [reset] Deleting contents of config directories: [/etc/kubernetes/manifests /etc/kubernetes/pki]
    [reset] Deleting files: [/etc/kubernetes/admin.conf /etc/kubernetes/kubelet.conf /etc/kubernetes/bootstrap-kubelet.conf /etc/kubernetes/controller-manager.conf /etc/kubernetes/scheduler.conf]
    [reset] Deleting contents of stateful directories: [/var/lib/etcd /var/lib/kubelet /var/lib/dockershim /var/run/kubernetes /var/lib/cni]

    The reset process does not clean CNI configuration. To do so, you must remove /etc/cni/net.d

    The reset process does not reset or clean up iptables rules or IPVS tables.
    If you wish to reset iptables, you must do so manually by using the "iptables" command.

    If your cluster was setup to utilize IPVS, run ipvsadm --clear (or similar)
    to reset your system's IPVS tables.

    The reset process does not clean your kubeconfig files and you must remove them manually.
    Please, check the contents of the $HOME/.kube/config file.

@shiishii:
This thread is about issues with the Docker network, which Kubernetes does not use at all…
You might want to raise your issue in a kubeadm or Kubernetes forum.

My problem is with the swarm network.
The swarm network uses etcd; I just noticed through Kubernetes that my etcd was broken. But I am not using Kubernetes now, I use Docker swarm mode.
And I suggest that the author of this topic check whether his problem comes from etcd.

I still don't think that you have the same problem… The built-in swarm mode does not use etcd. It uses a built-in Raft implementation to synchronize state amongst the master and worker nodes. The legacy standalone swarm uses etcd to manage cluster state.
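
If you want to check the health of that built-in Raft cluster, a quick look at the manager status is usually enough (plain docker CLI, nothing extra assumed):

    # run on a manager node; the MANAGER STATUS column shows Leader / Reachable
    # for healthy raft members, and Unreachable when quorum is in trouble
    docker node ls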

Anyway, I do not recommend running etcd as a swarm service. If you intend to use an external etcd for Kubernetes, I would strongly suggest using the OS package. It will be more stable and reduce startup time after a reboot.

I used to run a containerized etcd for the state of the StorageOS volume plugin: first as a swarm service, which was super unreliable, then as standalone containers on each of the master nodes, and finally transitioned to an OS package installation of etcd. If your infrastructure (as in Kubernetes, a volume or network plugin) relies on etcd to bootstrap, this is the way to go. If your application requires etcd to work, it is usually fine to run it as a swarm service or Kubernetes StatefulSet.
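
For the OS package route on Debian/Ubuntu, something along these lines usually does it (the exact package name varies by release, so treat this as a sketch):

    # install etcd from the distribution repositories
    # (the package may be called "etcd" or "etcd-server"/"etcd-client" depending on the release)
    sudo apt-get update
    sudo apt-get install etcd
    sudo systemctl enable --now etcd

    # quick health check against the default client endpoint
    ETCDCTL_API=3 etcdctl --endpoints=http://127.0.0.1:2379 endpoint health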
