@rusher81572 Sorry to hear about your issues. You seem to be running an RPi cluster. Is that correct? I'm not sure whether you are hitting an issue specific to running on a Raspberry Pi cluster. Would you mind opening an issue with detailed information about your setup and a sequence of steps to reproduce the problem?
Some of us identified that the Raspbian kernel was missing the vxlan module. Running rpi-update adds the module (plus a reboot).
I put these test scenarios together to try and document what was going wrong:
After the update I was able to get through them all. I've now got an 8-node cluster which can run my redis hit-counter in Swarm mode.
LMK if this helps @rusher81572 @mrjana
@mrjana @alexellis2 Thanks for the suggestions. Please note that my basic Swarm functionality testing done per the creation of this forum topic was with docker-machine on x86 hardware with VirtualBox using 1.12.0.
I ran rpi-update and updated the kernel. However, I did not see a vxlan module with lsmod so I used modprobe and now it is visible.
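The lsmod / modprobe check described above can be sketched roughly as follows. The function names are made up for illustration, and the /etc/modules line assumes Raspbian loads modules listed there at boot, so treat it as a starting point rather than a verified fix.

```shell
#!/bin/sh
# Returns 0 if the named module appears in lsmod-style output read from stdin.
# The first line is skipped because lsmod prints a header row.
module_listed() {
    awk -v m="$1" 'NR > 1 && $1 == m { found = 1 } END { exit !found }'
}

# Load vxlan only if lsmod does not already list it (modprobe needs root).
ensure_vxlan() {
    if lsmod | module_listed vxlan; then
        echo "vxlan already loaded"
    else
        sudo modprobe vxlan && echo "vxlan loaded"
    fi
}

# Assumption: to keep the module loaded across reboots on Raspbian,
# it can be listed in /etc/modules:
#   echo vxlan | sudo tee -a /etc/modules
```

Running ensure_vxlan on each Pi after rpi-update and a reboot should leave the module visible in lsmod.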
Testing Procedure:
Used three Pis with Docker 1.12.1-rc2, build 236317f, experimental, vxlan module loaded. Kernel 4.4.19-v7+.
docker swarm init
docker swarm join.....(for each Pi as needed)
1mmb8fgd9s8m9peodyr3x6qoc * rpi-2 Ready Active Leader
c26mil621fs4f0iwa1e8884ar rpi-3 Ready Active
em26jwtm7u3f800pkrl4fw2py rpi-4 Ready Active
docker network create chat -d overlay
docker service create --name mysql --network chat registry:5000/mysql
docker service create --name test --network chat -p 444:444 registry:5000/test
docker service ls
ID NAME REPLICAS IMAGE COMMAND
4n6r4fbiu0cl mysql 1/1 registry:5000/mysql
8h36vyo2luyy test 1/1 registry:5000/test
We can see here that the database and the Node.js application are running on separate machines.
# docker node ps rpi-3
ID NAME IMAGE NODE DESIRED STATE CURRENT STATE ERROR
1zfxjmkbqgz7c0q9ji26h7qkg mysql.1 registry:5000/mysql rpi-3 Running Running about a minute ago
# docker node ps rpi-2
ID NAME IMAGE NODE DESIRED STATE CURRENT STATE ERROR
5dcwkxxdy4aunbvg6sk9q3wtw test.1 registry:5000/test rpi-2 Running Running about a minute ago
The service named test is running my Node.js app fine with database connectivity and is accessible from each host. More testing is needed with other apps, but it is looking good so far.
The only remaining problem is with labels and constraints, which do not seem to be working right.
This command works and runs MySQL on rpi-3 as expected:
docker service create --name sql2 -l "node=rpi-3" --network chat registry:5000/mysql
Then I tried to run MySQL with a node label that does not exist, and it just went to a random node:
docker service create --name sql2 -l "node=rpi-1" --network chat registry:5000/mysql
For my final test, I created three MySQL services using the same label, and they all went to random nodes, which is not the behavior I would expect.
docker service create --name sql3 -l "node=rpi-4" --network chat registry:5000/mysql
docker service create --name sql2 -l "node=rpi-4" --network chat registry:5000/mysql
docker service create --name sql1 -l "node=rpi-4" --network chat registry:5000/mysql
@rusher81572 thanks for confirming the network connectivity behaviour with the rpi-update and 1.12.1-rc (though I would recommend using 1.12.1 released version).
Regarding the scheduling constraints, I don't think you are using the correct options in the docker service create command. "-l" just adds a label to an object. If you are looking to constrain the scheduling, you should be using the --constraint option with the appropriate supported constraints, as mentioned here: https://docs.docker.com/engine/reference/commandline/service_create/#/specify-service-constraints
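To make the difference concrete, here is a small sketch built on the sql2 command from earlier in the thread. The wrapper functions and the DOCKER variable are hypothetical helpers added for illustration; only the --label and --constraint flags themselves come from docker service create.

```shell
#!/bin/sh
# Set DOCKER="echo docker" to print the commands instead of running them.
DOCKER=${DOCKER:-docker}

# -l / --label only attaches metadata to the service; it has no effect on placement.
# usage: label_service NAME KEY=VALUE IMAGE
label_service() {
    $DOCKER service create --name "$1" --label "$2" --network chat "$3"
}

# --constraint restricts which nodes the scheduler may place tasks on.
# usage: pin_service NAME HOSTNAME IMAGE
pin_service() {
    $DOCKER service create --name "$1" --constraint "node.hostname==$2" --network chat "$3"
}
```

For example, pin_service sql2 rpi-3 registry:5000/mysql expresses what the -l "node=rpi-3" attempt was presumably meant to do.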
I used constraints in classical Swarm (where you need an external kv store). As I remember, it involved editing the daemon script to add a label. I haven't tried scheduling by hostname, but this looked relevant: https://github.com/docker/docker/pull/24397#issuecomment-231227571
Btw, did the RC come via experimental.docker.com? If you are finding that re-running the get.docker.com command is not refreshing the Docker version to the general release, then you might want to run apt-get remove docker-engine prior to the script.
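A rough sketch of that remove-then-reinstall step, assuming the usual curl invocation of the get.docker.com script and that the daemon is reachable for docker version. The function names are hypothetical.

```shell
#!/bin/sh
# Returns 0 if the version string looks like a release candidate, e.g. 1.12.1-rc2.
is_rc() {
    case "$1" in
        *-rc*) return 0 ;;
        *)     return 1 ;;
    esac
}

# Remove the installed package and re-run the install script, but only
# when the running engine reports a release-candidate version.
reinstall_release() {
    v=$(docker version --format '{{.Server.Version}}') || return 1
    if is_rc "$v"; then
        sudo apt-get remove -y docker-engine
        curl -sSL https://get.docker.com | sh
    else
        echo "already on release $v"
    fi
}
```

Calling reinstall_release on each Pi leaves nodes that already run a general release untouched.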
Let us know how that goes - I will also try scheduling via label on my swarm and report back.
I've had a quick go on my 7 node cluster.
Set up via node hostname
$ docker service create --constraint node.hostname==pi2swarm7 --name hello1 --publish 3000:3000 --replicas=1 alexellis2/arm-alpinehello
6u0lvc8cm1d8unfasek2n6x2r
$ docker service ps hello1
ID NAME IMAGE NODE DESIRED STATE CURRENT STATE ERROR
97p4vkcgs42309srjvn6gb5d5 hello1.1 alexellis2/arm-alpinehello pi2swarm7 Running Running 3 minutes ago
This is an example with a custom node label:
$ docker node update pi2swarm2 --label-add db=1
$ docker service create --constraint 'node.labels.db == 1' --name hello2 --publish 3001:3000 --replicas=1 alexellis2/arm-alpinehello
$ docker service ps hello2
ID NAME IMAGE NODE DESIRED STATE CURRENT STATE ERROR
e5zb4g4myzsf35ipydt21fsvi hello2.1 alexellis2/arm-alpinehello pi2swarm2 Running Running 36 seconds ago
Hope these examples help with your set-up.
Constraints are working, thanks @alexellis2. I moved Swarm 1.12.1 into production on my Pis. I did notice a few issues.
First, I am running phpfpm in a container, and TinyRSS constantly reloads after clicking on "All Articles". I scaled phpfpm to 1 instance and that appeared to resolve the problem, but I would like to scale it out like I did before using nginx. Then I started getting 504 Gateway Time-outs when accessing TinyRSS. The WordPress blog and TinyRSS both connect to the same phpfpm container, and WordPress works fine. There must be something wrong with the internal Swarm plumbing. I removed the service and created it again, and it seems to work now. The only problem is that two of the three phpfpm instances are now running on the same host.
There are also performance issues. I connect to the first Pi, and that should load balance all my requests to the other Pis. This is just like my setup before. But with Swarm taking over the load balancing with the mesh networking, it is very slow. This behavior is breaking all of my Node.js apps now.
As in previous Docker versions, it is annoying that containers on a virtual network cannot resolve each other if one of them is restarted.
Lastly, I noticed that scaling containers will sometimes schedule multiple instances on the same Pi. Is this expected behavior? I would rather have 1 instance on each Pi.
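One way to spot doubled-up instances is to count running tasks per node from docker service ps output. This is only a sketch: crowded_nodes is a hypothetical helper, and it assumes the column layout shown earlier in this thread (node in the fourth column, desired state in the fifth).

```shell
#!/bin/sh
# Reads `docker service ps SERVICE` output on stdin and prints
# "<node> <count>" for every node running more than one task.
# Rows whose fifth column is not "Running" are ignored.
crowded_nodes() {
    awk 'NR > 1 && $5 == "Running" { n[$4]++ }
         END { for (h in n) if (n[h] > 1) print h, n[h] }'
}

# Usage:
#   docker service ps phpfpm | crowded_nodes
```

If exactly one instance per Pi is the goal, docker service create also accepts --mode global, which runs one task on every node instead of a fixed replica count. Whether that suits this workload has not been confirmed in this thread.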
@rusher81572 Can you please confirm whether the issues that you raised earlier in this thread are addressed?
- Using 1.12.1
- multi-host networking issue that you raised in r-pi is resolved by rpi-update (and modprobe ?)
- Constraints issue that you brought up is a human error (invalid flag usage).
Regarding the other issues that you have raised in the recent comments, I think we should take this to the docker/docker issue tracker so that we can gather more information (and correct any human errors) and also get better visibility and support from maintainers.
Hello @mavenugo
Yes, the earlier issues in this thread are addressed, namely:
Using 1.12.1
multi-host networking issue that you raised in r-pi is resolved by rpi-update (and modprobe ?)
Constraints issue that you brought up is a human error (invalid flag usage).
Thanks for the help.
@rusher81572 Can you please characterize what you mean by "load balancing with mesh networking is very slow"? Slow as in it is taking a long time, or are you seeing throughput issues? Also, is this when you expose a port and access it from outside the cluster, or are you doing intra-cluster?
I ran into the same problem today with the newest Docker engine version (CE). Are the swarm mode routing mesh and internal routing production ready?