Autoscaling in Docker Swarm

Hi Team,

We have implemented Docker Swarm in our production environment.
Now we want to know whether autoscaling is possible in Docker Swarm.

If yes, how? Please assist.

For example: let's say that on some day, like Black Friday, we are getting more hits than on a usual day, and the single server on which the web server's Docker image is running is unable to handle that many requests. Is there any way, or any method provided by Docker, that can automatically spin up one more web server container on a spare server?

Please assist.

Hi orj123,
Docker Swarm services can be scaled with a command, but there is no automatic way to do that.

I'm using a combination of cAdvisor and node-exporter containers running on all Docker nodes to export metrics to a Prometheus instance. We also have a Grafana portal connected to Prometheus to get nice graphs, and Grafana can send alerts by mail or via a Telegram bot.

With that you will have all container (and node) metrics in Prometheus, and you can easily poll Prometheus with a simple curl (or whatever you want to use) and, depending on the values, launch a command to scale up or down. (We are using the VMware orchestrator for that.)
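In case it helps as a starting point, a stack file for that monitoring side could look roughly like this; the image tags, ports and mounts below are assumptions, so trim them to your setup:

    version: '3.7'
    services:
      cadvisor:
        image: google/cadvisor:latest
        deploy:
          mode: global              # one task per node, so every host reports its containers
        volumes:
          - /:/rootfs:ro
          - /var/run:/var/run:ro
          - /sys:/sys:ro
          - /var/lib/docker/:/var/lib/docker:ro
      node-exporter:
        image: prom/node-exporter:latest
        deploy:
          mode: global              # node-level metrics from every host
      prometheus:
        image: prom/prometheus:latest
        ports:
          - "9090:9090"
        # mount your prometheus.yml with the scrape configs here
      grafana:
        image: grafana/grafana:latest
        ports:
          - "3000:3000"

Deployed with docker stack deploy -c monitoring.yml monitoring, this gives Prometheus one cAdvisor and one node-exporter target per node.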

It is a somewhat complex setup, but it is possible and it works very well for us.

If you need more details please feel free to ask.

Regards


Thanks a lot for your suggestion, I will definitely work on it. Thanks again.

This was so helpful for me! A million thanks for this, you've made my life so much easier!

Hello Eldeberde,

Can you provide more detailed information?

We have Java microservices in Docker Swarm and we want to scale these services up and down based on the number of requests.

Thanks

Hi.

I'm using cAdvisor to collect all container metrics, deployed as a global service, so there is one replica on each host.

Prometheus is polling this cAdvisor service on each node, and after that you can poll Prometheus.
These are the very basic configuration lines to scrape cAdvisor from Prometheus:

    - job_name: 'cadvisor'
      dns_sd_configs:
        - names: ['tasks.cadvisor']
          type: A
          port: 8080

Prometheus is also a service, and we are using the internal Docker DNS resolver to poll the "cadvisor" service on its exposed port.


You can poll Prometheus with a simple curl or whatever you want. This query gives the total CPU consumed by a service (all replicas of the service across the nodes in the cluster):

    sum(rate(container_cpu_user_seconds_total{container_label_com_docker_swarm_service_name=~"SERVICE_NAME",id=~"/docker/.*"}[1m])*100)
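For example, that query can be sent to the Prometheus HTTP API and the numeric value extracted like this (the prometheus hostname and the use of jq are assumptions):

    curl -sG 'http://prometheus:9090/api/v1/query' \
      --data-urlencode 'query=sum(rate(container_cpu_user_seconds_total{container_label_com_docker_swarm_service_name=~"SERVICE_NAME",id=~"/docker/.*"}[1m])*100)' \
      | jq -r '.data.result[0].value[1]'    # the current value as a string, e.g. "137.4"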

Or you can create your own custom query if you need to scale based on memory or anything else.

After that, with this per-service CPU usage metric, you can decide whether you need to scale the service or not.

If you need to scale, you can get the number of running replicas:

    REPLICAS=$(docker service ps SERVICE_NAME | grep Running | wc -l)

And launch a command:

"# docker service update --replicas $REPLICAS + 1 "

You can also deploy Prometheus as a Docker service, and it shouldn't take much time to write a custom script that polls Prometheus and the Docker manager to get the data and take a decision.
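To make it concrete, a minimal sketch of such a script, built from the commands above, could look like this (the threshold, maximum replica count, service name and Prometheus address are assumptions):

    #!/bin/sh
    # Poll Prometheus for the per-service CPU usage and add one replica when it is too high.
    SERVICE=SERVICE_NAME
    PROM=http://prometheus:9090
    CPU_LIMIT=200       # total CPU % across all replicas that triggers a scale-up
    MAX_REPLICAS=10

    QUERY="sum(rate(container_cpu_user_seconds_total{container_label_com_docker_swarm_service_name=~\"$SERVICE\",id=~\"/docker/.*\"}[1m])*100)"
    CPU=$(curl -sG "$PROM/api/v1/query" --data-urlencode "query=$QUERY" \
          | jq -r '.data.result[0].value[1] // "0"' | cut -d. -f1)

    # Count the replicas that are currently running
    REPLICAS=$(docker service ps "$SERVICE" | grep -c Running)

    if [ "$CPU" -gt "$CPU_LIMIT" ] && [ "$REPLICAS" -lt "$MAX_REPLICAS" ]; then
        docker service update --replicas $((REPLICAS + 1)) "$SERVICE"
    fi

Run it from a cron job (or as a small service on a manager node), and add the mirror-image check to scale back down when the load drops.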

Regards


Combining your ideas:

  • cadvisor: current container performance metrics
  • node-exporter: current node performance metrics
  • prometheus: historical data

Using the ServiceUpdate endpoint, which can be accessed via a TLS socket (ideal) or /var/run/docker.sock (insecure but simpler), I think one could create a service that checks Prometheus for cAdvisor/node-exporter data and, based on some logic, increases the number of replicas as needed.
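For example, a rough sketch of the socket route, with the service name as a placeholder and jq assumed to be available (the exact spec fields depend on your Engine API version):

    # Fetch the current service definition and its version index
    SPEC=$(curl -s --unix-socket /var/run/docker.sock http://localhost/services/SERVICE_NAME)
    VERSION=$(echo "$SPEC" | jq '.Version.Index')

    # Bump the replica count and post the modified spec back to the ServiceUpdate endpoint
    echo "$SPEC" | jq '.Spec | .Mode.Replicated.Replicas += 1' \
      | curl -s --unix-socket /var/run/docker.sock \
          -X POST -H "Content-Type: application/json" --data @- \
          "http://localhost/services/SERVICE_NAME/update?version=$VERSION"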

This is already done and supported in k8s, but it would be "cool" to have something like it in Swarm. I'm not really a big proponent of autoscaling unless it is a matter of dynamically provisioning another AWS instance and letting it register into the swarm, because you pay for the CPU/memory by time anyway, according to a comment on Server Fault.


You're right. I'm also thinking of giving K8s a try.
But on the other hand, Swarm has some things I like:

  • The integrated load balancer makes it simple to bring new replicas into rotation. You don't need a load balancer in front of all your services.
  • This solution allows us to scale services based on other parameters, like a RabbitMQ queue length (see the sketch below).
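For the queue-length case, a minimal sketch could poll the RabbitMQ management API like this (host, credentials, queue name and threshold are placeholders):

    # Messages waiting in QUEUE_NAME on the default vhost (%2F)
    MSGS=$(curl -s -u guest:guest "http://rabbitmq:15672/api/queues/%2F/QUEUE_NAME" | jq '.messages')
    REPLICAS=$(docker service ps WORKER_SERVICE | grep -c Running)

    if [ "$MSGS" -gt 1000 ]; then
        docker service update --replicas $((REPLICAS + 1)) WORKER_SERVICE
    fi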

For us, autoscaling means far fewer resources dedicated to machines that only work a few hours per week or even per month; now those resources can be shared and be available all the time.

Also, you can deploy new Docker nodes and add them to the swarm with one simple command, so this solution could also cover that.

I thought that too, but with container technology (unlike VM technology) the resources are not "reserved" unless you explicitly say so in the YML, using the resource reservation keys to set aside CPU and memory for a specific container. Otherwise it will only use what it really needs. Though if you scale to 1000 replicas and each container takes 100MB minimum, that's another story.
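For reference, an explicit reservation looks roughly like this in a v3 compose/stack file (the numbers are arbitrary examples):

    version: '3.7'
    services:
      web:
        image: nginx
        deploy:
          replicas: 3
          resources:
            reservations:        # capacity the scheduler sets aside for each replica
              cpus: '0.25'
              memory: 100M
            limits:              # hard cap per replica
              cpus: '0.50'
              memory: 200M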

There is also this project:

Although I haven't tried it yet.


Some people like it more because it is more mature and flexible. I still prefer Swarm if I have to keep things sustainable after I leave a project once I am done.

That's nice! But it looks like it only exposes an easy way to scale, not the logic behind it.

But I will try it.

Thanks

Also take a look at https://monitor.dockerflow.com/auto-scaling/
I'm using this approach, with the one change of using a piece of Go code similar to gianarb/orbiter to handle the actual docker scale command rather than Jenkins, which is what vfarcic is using.


I agree with @eldeberde.
In addition, it's easy to use AWS cloud services (such as an Auto Scaling Group).