I’ve been working with Docker Swarm and have run into an issue regarding service discovery. Specifically, one of my services is utilizing the “host network”, and I’ve learned from a discussion on this GitHub issue (Update service with --network=host failed · Issue #27 · docker/for-linux · GitHub) that I’m unable to simultaneously include my service in the overlay network.
This situation has created a significant roadblock for me because it has prevented me from using Docker Swarm’s DNSRR feature. Previously, I leveraged DNSRR for service discovery, particularly for identifying the IPs of active tasks. I am seeking a solution or feature that allows me to query all the tasks currently running under the service, including their private and/or public IPs.
Furthermore, when my service was attached to the overlay network, I was able to directly access other services using their DNS names. However, now that my service isn’t a part of the overlay network, I am compelled to use private IPs, which isn’t optimal.
Could anyone point me in the direction of a solution or workaround for these challenges? Any advice or insights would be greatly appreciated.
--network=host declares the absence of network isolation. You cannot mix it with networks that require network isolation.
However, the long syntax for port publishing allows publishing ports with mode: host (instead of the default mode: ingress) for services attached to overlay networks. It binds the host port on a node where at least one replica is running, and that port behaves as it would with --network=host (e.g. it retains source IPs).
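In a stack file that looks roughly like this (service name, image and port numbers are just placeholders):

```yaml
version: "3.8"
services:
  web:
    image: nginx:alpine     # placeholder image
    ports:
      - target: 80          # container port
        published: 8080     # host port, bound on nodes that run a task
        protocol: tcp
        mode: host          # bypasses the ingress routing mesh, keeps source IPs
```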
Hello @meyay! Thanks for the response. Yes, this is apparently the best way to do it, BUT the service I’m trying to deploy (Janus WebRTC Gateway) uses a large range of ports, and unfortunately Docker doesn’t let me map port ranges in host mode; it only supports mapping single ports.
Moreover, I discovered that exposing a large range of ports is problematic altogether; for such scenarios (Asterisk, Janus, etc.) there are two viable options: using the host network, or using a macvlan network.
I feel like the best bet for me is to use the host network, keep track of my host-networked services in a database, and use the private IPs for service communication. What do you think?
Each mapped port will delay the container start. I can’t tell you exactly by how much, but a range of a couple of hundred ports will delay the start noticeably.
The macvlan network is also not really a viable solution, unless you create a macvlan with an IP range that contains a single IP. Swarm services do not support static IPv4 configuration using ipv4_address, so the containers would get random IPs within the macvlan IP range. The next obstacle is that there cannot be more than one macvlan gateway using the same gateway IP.
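If you wanted to experiment with it anyway, a single-ip macvlan declared in a compose file could look roughly like this (subnet, gateway and parent interface are example values; swarm-scoped macvlan additionally requires per-node config-only networks):

```yaml
networks:
  macvlan_single:
    driver: macvlan
    driver_opts:
      parent: eth0                    # host interface, example value
    ipam:
      config:
        - subnet: 192.168.10.0/24     # example subnet
          ip_range: 192.168.10.42/32  # range containing exactly one ip
          gateway: 192.168.10.1       # cannot be shared with another macvlan gateway
```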
For a service that requires larger port ranges, it appears to be a valid approach.
There is one more thing you can try: declare a top-level network element that uses the predefined host network (I doubt it will work, but it’s still worth trying).
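A sketch of what that could look like (service and image names are placeholders):

```yaml
version: "3.8"
services:
  gateway:
    image: example/janus:latest   # placeholder image
    networks:
      - hostnet
networks:
  hostnet:
    external: true
    name: host                    # reference the predefined host network
```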
Thanks for the information. There is a special case where I might need to use macvlan (when I need multiple containers on a single large host). Having random IPs doesn’t sound too bad, as I will be using Consul for service discovery, but as you have suggested, assigning a unique gateway IP for each host sounds like a delicate subject.
This is indeed my current solution: I now have multiple services running on dedicated hosts (one service per EC2 instance), and I use Route 53 for “service discovery” (soon to be replaced by Consul).
Thank you so much for your valuable advice @meyay, really appreciate it.
You can actually create a swarm-spanning macvlan: first you create the macvlan configuration on each node (if I am not mistaken, the physical device name, e.g. eth0, must be the same on all nodes). Then you can create the network using those configurations.
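Roughly like this (subnet, gateway, interface and names are example values; I haven’t tested this exact setup):

```yaml
# Step 1, on every node (plain docker CLI, not part of the stack file):
#   docker network create --config-only --subnet 192.168.10.0/24 \
#     --gateway 192.168.10.1 -o parent=eth0 macvlan_local
#
# Step 2, once on a manager node:
#   docker network create -d macvlan --scope swarm \
#     --config-from macvlan_local macvlan_swarm
#
# Step 3, reference the swarm-scoped network in the stack file:
version: "3.8"
services:
  gateway:
    image: example/janus:latest   # placeholder image
    networks:
      - macvlan_swarm
networks:
  macvlan_swarm:
    external: true
```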
Yes, macvlan is a more preferable option, and that’s how we initially deployed our WebRTC servers for testing; however, AWS doesn’t support macvlan on their EC2 instances. If I ever have to deploy this on-premises I will revisit macvlan.
Initially I started with Fargate, thinking it would be easier to implement, but it proved to be difficult for the same reason I mentioned in this post: our WebRTC server uses 10k+ ports, and AWS’s load balancers do not support publishing port ranges, so we had to publish each port individually. Some people said I could resort to AWS’s Network Load Balancers, but it wasn’t clear whether that would solve the problem. I think AWS Fargate was designed with common use cases and basic microservice architectures in mind, and WebRTC servers are not in that category.
Another reason I moved away from Fargate was that I did not want to rely on a specific cloud service provider’s solutions. I want to make it as infrastructure-independent as possible. Recently I deployed our swarm on Huawei Cloud, and it only took a few hours. If possible I want to keep it this way.
I am planning to use Kubernetes instead; when I have some time I will make the switch. I might come here more often for help. Thanks again for the valuable advice.
Fargate ECS supports awsvpc networking, which provides an IP from the VPC subnet the container is running in.
I don’t recall whether NLBs allow port ranges, but I doubt they do. You would indeed need to create a target group per port and then a listener per port. Nothing you would want to manage via ClickOps; this screams for automation.
But then again, I can understand that you want to stick to vanilla Docker, as ECS is indeed slightly different to use than Docker. Back in the day it took me some time to understand that I needed Cloud Map for service discovery.
Kubernetes is definitely the way to go. The last time I saw Swarm used in an enterprise context was roughly 5 years ago. Most of our clients seem to have agreed back then that Kubernetes is the way to go; all container strategies I have seen are based on Kubernetes.
Good to know. We’ve been playing around with Docker Compose for local testing, and when it was time to test our system in the cloud, Docker Swarm just felt like the easiest route. But our original intention was to leverage Kubernetes; we just didn’t have time to learn it.