Docker Community Forums


Mesh routing and service lifecycles

I am running docker stacks on a swarm cluster.

What I expect when I update a service is the following sequence for each container that needs updating.

  • Route new traffic to the other functioning containers; no new traffic is routed to the container that is about to stop.
  • Wait X seconds before sending the stop signal to the container.
  • Start a new container with the updated image.
  • Wait another Y seconds to make sure the new container is stable.
  • Start routing new traffic to the updated (new) container.

The documents I found are not quite clear about how mesh routing and the service lifecycle interact.
My guess is that Y goes into:

    monitor: Y

Is this correct?
Or does the value there not affect mesh routing at all, only the container lifecycle?
Also, I can’t seem to find where to specify X.
Could you tell me where?
Or, if I am missing something, could you tell me how mesh routing actually works in this situation? Or where can I find the related info?

Install BookInfo in your cluster.

Download the latest Istio package for your operating system, which includes the configuration files for the BookInfo app.

curl -L https://istio.io/downloadIstio | ISTIO_VERSION=1.7.0 sh -
Navigate to the Istio package directory.

cd istio-1.7.0
Label the default namespace for automatic sidecar injection.

kubectl label namespace default istio-injection=enabled
Deploy the BookInfo application, gateway, and destination rules.

kubectl apply -f samples/bookinfo/platform/kube/bookinfo.yaml
kubectl apply -f samples/bookinfo/networking/bookinfo-gateway.yaml
kubectl apply -f samples/bookinfo/networking/destination-rule-all.yaml

Please ignore the responses of Lewish95. Judging by its responses, it's an AI bot in its early stages: 95% of its responses are unrelated, or close enough to appear related but completely miss the context.

Though, you did read the docker compose reference, didn't you?

I feel deploy.update_config.order: start-first is what you need, though it is not what you want: your processing sequence is different. Instead, swarm will first start the new container and then unregister the old container, with an overlap. What you want also requires that your image or swarm deployment declares a health check that only returns success if the service is actually responding (instead of a simple "the port can be reached" type of check). deploy.update_config.monitor declares the delay within which the first positive health-check result is expected on the new container; if it fails, the action you defined in failure_action is triggered.
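A minimal stack-file sketch of the above (the service name, image, and health endpoint are placeholders; adjust the timings to your app):

```yaml
version: "3.8"
services:
  web:
    image: myapp:latest            # placeholder image
    healthcheck:
      # Only succeeds if the app actually responds,
      # not merely if the port can be reached.
      test: ["CMD", "curl", "-f", "http://localhost:8080/health"]
      interval: 10s
      timeout: 3s
      retries: 3
      start_period: 15s
    deploy:
      update_config:
        order: start-first         # start the new container before stopping the old one
        monitor: 30s               # how long to watch the new task for failure
        failure_action: rollback   # what to do if the new task fails within `monitor`
```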

Thanks a lot @meyay .

The mechanism is different from my expectation, so I couldn't easily find the solution from where I stood at the time.
In docker, there are multiple ways to do things; I was a bit confused at first. (Most tutorials don't use docker stacks.)

I did read the reference, and I tried many things until I found out about health checks. What could be improved, IMO, is to mention health checks on the routing mesh page.

I implemented the health check with netcat (nc -vz) and was quite happy with the result for some time.
However, only half of my issues are solved.
The issue related to starting up seems to be solved, but not shutting down.

In some environments (like a nodejs application), it is hard to handle the shutdown signal.

Specific to nodejs, it provides process.on to trap signals. But in practice, trapping the signal and then waiting for every pending request to finish is quite a challenge. The problem lies with the waiting step, not with intercepting the signal.

From the docker perspective, I totally agree that this is the application's fault, but it is a quirk I don't know a way around (yet).

Does docker provide some mechanism to delay between accepting the last TCP request and sending SIGTERM? Or rather, how do you ensure every pending request is finished before the container stops?

You should make it a habit ^^ Back in the day, when I was working with swarm stacks daily, that reference was usually my entry point when I had to look up details.

Not really. Though, you can determine which signal is sent to terminate the container, and you can define the stop grace period, which is the delay between the terminate signal and a SIGKILL if the container didn't stop before the grace period passed.

Other than that: tracking active requests is not a container problem; it exists regardless of whether your nodejs app runs in a container or natively on the host.