I have created a 20-machine swarm on EC2. The problem is that every container I run lands on the same machine, regardless of the --hostname I specify. This seems broken in the first place, since the default scheduling strategy is ‘spread.’
The script I use to create the cluster is below. The cluster itself comes up fine, but work is not being distributed across it.
#!/bin/bash
#
# Set up a Docker swarm with 20 workers
#
# Spawn an instance to generate a Swarm token
docker-machine create \
--driver amazonec2 \
--amazonec2-instance-type m3.medium \
--amazonec2-subnet-id subnet-40502c36 \
--amazonec2-zone=c \
--amazonec2-vpc-id=vpc-66f0e002 \
'aws.swarm-token-machine'
# Set up the environment to run commands against this node
eval "$(docker-machine env 'aws.swarm-token-machine')"
# Create a token for our swarm cluster and capture it in our environment
export SWARM_CLUSTER_TOKEN=$(docker run --rm swarm create)
# Create the swarm master (referenced below by docker-machine env --swarm)
docker-machine create --driver amazonec2 \
  --amazonec2-instance-type m3.medium --amazonec2-subnet-id subnet-40502c36 \
  --amazonec2-zone=c --amazonec2-vpc-id=vpc-66f0e002 \
  --swarm --swarm-master --swarm-discovery token://$SWARM_CLUSTER_TOKEN \
  'aws.swarm-master'
# Create 20 swarm workers in parallel
WORKER_COUNT=20
for i in $(seq 1 $WORKER_COUNT)
do
  docker-machine create \
    --driver amazonec2 \
    --amazonec2-instance-type m3.medium \
    --amazonec2-subnet-id subnet-40502c36 \
    --amazonec2-zone=c \
    --amazonec2-vpc-id=vpc-66f0e002 \
    --swarm \
    --swarm-discovery token://$SWARM_CLUSTER_TOKEN \
    aws.agent$i &
done
wait  # wait for all the background docker-machine creates to finish
# Set our environment to this 20-machine swarm
eval "$(docker-machine env --swarm 'aws.swarm-master')"
#
# Selenium stuff
#
# Set up a Selenium hub
docker run -d \
--name selenium-hub \
-p 4444:4444 \
--hostname aws.agent1 \
selenium/hub:2.53.0
# Set up 20 Chrome nodes linked to the hub
for i in {1..20}
do
docker run -d --name=chrome-node-$i --link selenium-hub:hub selenium/node-chrome:2.53.0
done
Why do all the containers end up on the same machine, and how can I fix it? I have tried using --hostname in my docker run commands, but it has no effect on placement.
This is the output of docker ps:
CONTAINER ID IMAGE ... STATUS PORTS NAMES
5a81c6a3e381 selenium/node-chrome:2.53.0 ... Up 5 seconds aws.agent13/chrome-node-2
74f46a0b5809 selenium/node-chrome:2.53.0 ... Up 2 minutes aws.agent13/chrome-node-1
244e4d3092cd selenium/hub:2.53.0 ... Up 4 minutes 54.86.124.43:4444->4444/tcp aws.agent13/selenium-hub,aws.agent13/chrome-node-1/hub
I will try your suggestion. You mean that linked containers have to run on the same agent? That is very unexpected and I don't quite follow, but I will check out the docs you mention.
When you use the default bridge network, --link only works between containers on the same host that are connected to that same network. Swarm knows this, so when it sees --link it adds an implicit affinity and schedules the linked container onto the same node as its target, which is why everything piles up on one agent.
The overlay network driver was developed specifically for multi-host container networking. If you use an overlay network you can --link across hosts, but you can also skip --link entirely and rely on the built-in service discovery: every container attached to the same user-defined network (created with docker network create) can reach any other container on it by its --name.
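As a rough sketch of what that could look like for your Selenium grid (the network name here is arbitrary, and the HUB_PORT_4444_TCP_* variables are my recollection of what the selenium images expect in place of the link, so double-check their docs):
# Create a multi-host overlay network (the engines must be backed by a key-value store for this)
docker network create --driver overlay selenium-grid
# Run the hub on that network; its --name is resolvable by every container on the same network
docker run -d --name selenium-hub --net selenium-grid -p 4444:4444 selenium/hub:2.53.0
# The nodes join the same network and find the hub by name instead of via --link
for i in {1..20}
do
  docker run -d --name chrome-node-$i --net selenium-grid \
    -e HUB_PORT_4444_TCP_ADDR=selenium-hub \
    -e HUB_PORT_4444_TCP_PORT=4444 \
    selenium/node-chrome:2.53.0
done
With no --link in play, the scheduler is free to spread the node containers across the swarm.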
Thanks! OK, so if I create an overlay network and run the previously linked containers on it via the --net option, they will then be spread across the cluster?
Thanks. It turns out that to use overlay networks I have to run a distributed key-value store. Now I'm having trouble getting the swarm master to finish booting when I use Consul as that store. The keystore machine boots up fine, but the swarm master never finishes initializing.
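For context, the pattern I am following is roughly the one from the multi-host networking docs; the commands below are only illustrative of what I am running, with the keystore machine name and addresses as placeholders rather than my exact values:
# Key-value store host for the overlay driver (progrium/consul is the image the docs use)
docker-machine create \
  --driver amazonec2 \
  --amazonec2-instance-type m3.medium \
  --amazonec2-subnet-id subnet-40502c36 \
  --amazonec2-zone=c \
  --amazonec2-vpc-id=vpc-66f0e002 \
  aws.keystore
eval "$(docker-machine env aws.keystore)"
docker run -d -p 8500:8500 --name consul progrium/consul -server -bootstrap
# Swarm master, with its engine pointed at consul for the cluster store
KEYSTORE_IP=$(docker-machine ip aws.keystore)
docker-machine create \
  --driver amazonec2 \
  --amazonec2-instance-type m3.medium \
  --amazonec2-subnet-id subnet-40502c36 \
  --amazonec2-zone=c \
  --amazonec2-vpc-id=vpc-66f0e002 \
  --swarm --swarm-master \
  --swarm-discovery "consul://$KEYSTORE_IP:8500" \
  --engine-opt="cluster-store=consul://$KEYSTORE_IP:8500" \
  --engine-opt="cluster-advertise=eth0:2376" \
  aws.swarm-master
The master create is the step that hangs: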
Running pre-create checks...
Creating machine...
(aws.swarm-master) Launching instance...
Waiting for machine to be running, this may take a few minutes...
Detecting operating system of created instance...
Waiting for SSH to be available...
Detecting the provisioner...
Provisioning with ubuntu(systemd)...
Installing Docker...
Copying certs to the local machine directory...
Copying certs to the remote machine...
Setting Docker configuration on the remote daemon...
Error creating machine: Error running provisioning: Unable to verify the Docker daemon is listening: Maximum number of retries (10) exceeded
Any idea what I’m doing wrong? I checked the security group, and ports 22 and 2376 are open.