How do you guys deal with mesh performance?

Hey y’all.

How do you route stuff through your swarm? Using the mesh comes with quite the performance penalty. It just sprays requests all over the swarm and does masses of unnecessary round trips.

My current approach is to publish my services’ ports in mode: host and have a deploy.mode: global instance of traefik running on every node, favoring the service on the same host and falling back to all other nodes only when the local one is down. I can easily “discover” the services by using healthchecks. My services are quite small and static. It’s just like 5 published ports in a stack. So I don’t mind configuring them statically.
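To make the "local first, everyone else as fallback" idea concrete, here is a minimal Python sketch of the selection logic. It is not how traefik implements it. The function names, the plain TCP connect used as a health check, and the example addresses are all assumptions for illustration.

```python
import socket


def is_healthy(host, port, timeout=0.5):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def pick_backends(local, remotes, check=is_healthy):
    """Prefer the instance on this node; otherwise use healthy remote nodes.

    local   -- (host, port) of the host-published port on this node
    remotes -- list of (host, port) for the same port on the other nodes
    check   -- health check callable, injectable for testing
    """
    if check(*local):
        return [local]
    return [r for r in remotes if check(*r)]
```

With something like `pick_backends(("127.0.0.1", 8080), [("10.0.0.2", 8080), ("10.0.0.3", 8080)])`, traffic stays on the node whenever the local instance answers and only crosses the network when it doesn’t.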

I would basically just use swarm for orchestration and build the overlay network myself.

Maybe it would be possible to get the info about which node is running which service from the docker.sock and update the traefik config accordingly. But other than one HTTP request every ten seconds, I don’t see a big downside to just using healthchecks to do the same thing.
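For reference, a rough sketch of the docker.sock variant, using only the Python standard library against the Docker Engine API endpoints `/tasks`, `/services` and `/nodes`. Assumptions: the socket is at the default `/var/run/docker.sock`, and the script runs on a manager node (the swarm endpoints are only served by managers). The helper names are made up for this example, and turning the result into traefik config is left out.

```python
import http.client
import json
import socket
from collections import defaultdict
from urllib.parse import quote

DOCKER_SOCK = "/var/run/docker.sock"  # assumed default path; adjust if needed


class UnixHTTPConnection(http.client.HTTPConnection):
    """HTTPConnection that connects to a Unix domain socket instead of TCP."""

    def __init__(self, path):
        super().__init__("localhost")
        self._path = path

    def connect(self):
        sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
        sock.connect(self._path)
        self.sock = sock


def docker_get(endpoint, path=DOCKER_SOCK):
    """GET an Engine API endpoint over the Docker socket and decode the JSON."""
    conn = UnixHTTPConnection(path)
    try:
        conn.request("GET", endpoint)
        resp = conn.getresponse()
        body = resp.read()
        if resp.status != 200:
            raise RuntimeError(f"{endpoint}: HTTP {resp.status} {body[:200]!r}")
        return json.loads(body)
    finally:
        conn.close()


def placement(tasks, services, nodes):
    """Map service name -> sorted hostnames that have a running task."""
    svc_names = {s["ID"]: s["Spec"]["Name"] for s in services}
    hostnames = {n["ID"]: n["Description"]["Hostname"] for n in nodes}
    result = defaultdict(set)
    for task in tasks:
        if task.get("Status", {}).get("State") != "running":
            continue  # skip pending, shutdown and failed tasks
        name = svc_names.get(task.get("ServiceID"))
        host = hostnames.get(task.get("NodeID"))
        if name and host:
            result[name].add(host)
    return {name: sorted(hosts) for name, hosts in result.items()}


def current_placement():
    """Query the local manager and return the current service placement."""
    filters = quote(json.dumps({"desired-state": ["running"]}))
    return placement(
        docker_get(f"/tasks?filters={filters}"),
        docker_get("/services"),
        docker_get("/nodes"),
    )
```

Calling `current_placement()` on a manager returns a dict of service names to hostnames, which could be polled on the same ten-second interval as the healthchecks.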

Interested to hear how you deal with things.

For detecting which service is running on which node, you could use Docker’s DNS server, which is hosted at 127.0.0.11. But in order to do so you will need to run your services in DNSRR mode (endpoint_mode: dnsrr).

Have a look at this blog post:

I am currently running the last example “One HAProxy container per node” but with Caddy instead of HAProxy and only on one node at the moment.

It’s a bit of work to set up, but the performance increase is quite dramatic.

That won’t work. Docker’s DNS will only return the virtual IP(s) of each service. You can’t directly infer from that which node a service is running on. Your example still relies on the docker service mesh.