I am seeking guidance or suggestions on how best to implement the following scenario:
- a large number of remote “workers” that sit behind NAT/firewalls over which I have no control (so no port forwarding, no opening of ports, etc.);
- a number of central “workers” to which the remote “workers” connect;
- no communication is needed between the remote “workers”;
- communication from remote to central goes over the Internet, so the traffic needs to be encrypted.
Each remote “worker” is built (with Compose) from a number of containers, some of which need to be addressable (and discoverable) from the central “workers”. Of course, the central “workers” are also multi-container.
By large I mean 10k to 100k, so very large. It is so large that a single network per remote “worker” is not possible: the central “workers” are unlikely to support the correspondingly large number of resulting interfaces, even when the load is split across a cluster of services.
I put “workers” in quotes because Docker Swarm is one possibility (to manage the whole thing, including discovery, etc.), but it is not an absolute must. However, without it, an alternative will be needed to manage the inventory.
I do not think the overlay network works here. First, it is unlikely to support the NAT-traversal option for IPsec. More importantly, since it is a mesh network (unless I am missing some tree-like topology, if one exists), it simply does not scale.
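To put rough numbers on the scaling concern, assume (as the question does) a full mesh in which every pair of nodes keeps its own tunnel, versus a hub-and-spoke layout in which each remote node keeps a single tunnel to the central side. For n nodes:

```latex
\begin{aligned}
\text{full mesh: } & \binom{n}{2} = \frac{n(n-1)}{2} \\
\text{hub-and-spoke: } & n \\
n = 10^{4}: \; & \tfrac{10^{4}(10^{4}-1)}{2} = 49\,995\,000 \approx 5 \times 10^{7} \text{ vs. } 10^{4} \\
n = 10^{5}: \; & \tfrac{10^{5}(10^{5}-1)}{2} = 4\,999\,950\,000 \approx 5 \times 10^{9} \text{ vs. } 10^{5}
\end{aligned}
```

The pairwise count grows quadratically while the hub-and-spoke count grows linearly, which is why a per-pair mesh becomes impractical at this size (whether a given overlay implementation really builds every pair is a separate question).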
Opening Docker Engines to the Internet is a bad idea; you should consider a site-to-site VPN.
Not being able to open ports prevents the participating Docker Engines from exchanging backplane data. The manager nodes can’t exchange control data with the workers by magic, can they?
Swarm uses the Raft consensus algorithm, which requires low-latency connections between the nodes. Does your Internet connection provide this end-to-end?
Have you considered a cluster per DC, where manager and worker nodes are in a single network (with a broader netmask if you want)? It doesn’t sound as if you would be short of servers.
You could still exchange payloads between Docker containers in multiple DCs using encrypted traffic (e.g. HTTPS).
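A minimal sketch of that idea from the remote side, using only the Python standard library: the container opens an outbound HTTPS connection (which NAT/firewalls normally allow) and verifies the central server's certificate and hostname. The URL `https://central.example.com/api/payload` and the JSON payload shape are hypothetical placeholders, not an existing API.

```python
import json
import ssl
import urllib.request

# Hypothetical central endpoint (placeholder); replace with your own load-balanced URL.
CENTRAL_URL = "https://central.example.com/api/payload"


def make_tls_context() -> ssl.SSLContext:
    """Client-side TLS context that verifies the server certificate and hostname."""
    ctx = ssl.create_default_context()  # CERT_REQUIRED and check_hostname by default
    ctx.minimum_version = ssl.TLSVersion.TLSv1_2  # refuse older protocol versions
    return ctx


def build_request(url: str, payload: dict) -> urllib.request.Request:
    """Build a JSON POST request; the remote side always initiates the connection."""
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={"Content-Type": "application/json"},
    )


def send(payload: dict, url: str = CENTRAL_URL, timeout: float = 10.0) -> int:
    """Send the payload over TLS and return the HTTP status code."""
    req = build_request(url, payload)
    with urllib.request.urlopen(req, context=make_tls_context(), timeout=timeout) as resp:
        return resp.status
```

Usage: `send({"node_id": "remote-0001", "status": "ok"})` returns the HTTP status code. If the central side must also authenticate each remote node, loading a per-node client certificate with `ctx.load_cert_chain(...)` (mutual TLS) would be a natural extension.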
A VPN is certainly a possibility, but before mentioning it I wanted to see whether other alternatives existed.
If Swarm were used (as I said, Swarm is just one option, not an absolute need), I was expecting the workers to initiate the connections. I forgot to mention that the central workers are not blocked by NAT or a firewall; they sit in a normal, publicly hosted cluster (with a load balancer if needed, etc.), so remote-to-central connectivity is not an issue. But would the manager(s) need to initiate a connection? (I am new to Docker.) In any case, if Swarm is not possible, then something else will need to orchestrate the relationships and addressing of the various services, maintain an inventory, etc., outside the Docker domain.
So, I guess the solution is to have the remote containers run standalone (or composed when multiple services are needed on one “node”, including, for example, the VPN) and come up with some registration procedure, which could be tied to the VPN setup and teardown (but need not be).
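A minimal sketch of such a registration procedure, assuming each remote node, once its VPN tunnel is up, announces the VPN-side addresses of its containers to a central inventory and then sends periodic heartbeats; entries that stop heartbeating expire after a TTL, which also covers an unclean teardown. `Registry`, its methods, the TTL value and the `10.8.0.x` addresses are all hypothetical, not part of Docker or any VPN product.

```python
import time
from dataclasses import dataclass, field
from typing import Callable, Dict, Optional


@dataclass
class NodeEntry:
    """One remote node: its services mapped to VPN-side addresses (hypothetical)."""
    services: Dict[str, str]  # service name -> "ip:port" reachable over the VPN
    last_seen: float = field(default=0.0)


class Registry:
    """Hypothetical in-memory inventory of remote nodes with TTL-based expiry."""

    def __init__(self, ttl: float = 90.0, clock: Callable[[], float] = time.monotonic):
        self.ttl = ttl
        self.clock = clock  # injectable so tests can control time
        self._nodes: Dict[str, NodeEntry] = {}

    def register(self, node_id: str, services: Dict[str, str]) -> None:
        """Called when a node's VPN comes up; replaces any previous entry."""
        self._nodes[node_id] = NodeEntry(dict(services), self.clock())

    def heartbeat(self, node_id: str) -> bool:
        """Refresh a node's lease; False means the node must re-register."""
        entry = self._nodes.get(node_id)
        if entry is None:
            return False
        entry.last_seen = self.clock()
        return True

    def deregister(self, node_id: str) -> None:
        """Explicit teardown, e.g. when the VPN is shut down cleanly."""
        self._nodes.pop(node_id, None)

    def prune(self) -> int:
        """Drop nodes whose lease has expired; return how many were removed."""
        now = self.clock()
        expired = [n for n, e in self._nodes.items() if now - e.last_seen > self.ttl]
        for n in expired:
            del self._nodes[n]
        return len(expired)

    def lookup(self, node_id: str, service: str) -> Optional[str]:
        """Address of a service on a remote node, or None if unknown or expired."""
        entry = self._nodes.get(node_id)
        if entry is None or self.clock() - entry.last_seen > self.ttl:
            return None
        return entry.services.get(service)
```

In practice the central side could expose `register`/`heartbeat`/`deregister` over the same outbound HTTPS channel the remote nodes already use, and keep the inventory in a shared store so that several central nodes can answer lookups; both are open design choices rather than part of this sketch.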