I’d like to be able to use the “docker kill -s SIG” command to send signals to containers running on other nodes in a swarm, so, something like:
docker stack kill -s HUP myswarm
It seems like there’s currently no way to send a signal to remote nodes, except for the TERM sent during a stack rm?
Thanks,
Chris
Edit: after doing some more research, it looks like I can probably do a crazy hack: deploy a small alpine container on all the nodes, with /var/run/docker.sock mapped in and curl and jq installed (sadly, the nc in alpine doesn’t support unix sockets). A script can then use the REST API to find the container ID of the target by the com.docker.swarm.service.name label and send it a signal with the kill endpoint. This is obviously totally nutballs and heavyweight compared to just having the swarm be able to send signals like above, but I think it’ll work.
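A minimal sketch of that hack’s plumbing (my own assumptions: the Engine is listening on /var/run/docker.sock, and curl and jq are installed in the helper container):

```shell
#!/bin/sh
# Sketch of the workaround: talk to the Engine REST API over the unix socket,
# find local containers for a service by their swarm label, and signal them.

SOCK=/var/run/docker.sock

# URL for listing local containers belonging to a swarm service; the %7B...%7D
# blob is the URL-encoded filter {"label":["com.docker.swarm.service.name=<svc>"]}
list_url() {
  echo "http://localhost/containers/json?filters=%7B%22label%22%3A%5B%22com.docker.swarm.service.name%3D${1}%22%5D%7D"
}

# URL for the kill endpoint, which accepts an arbitrary signal name.
kill_url() {
  echo "http://localhost/containers/${1}/kill?signal=${2}"
}

# e.g. signal_service mystack_myservice SIGHUP
signal_service() {
  svc="$1"; sig="$2"
  for id in $(curl -s --unix-socket "$SOCK" "$(list_url "$svc")" | jq -r '.[].Id'); do
    curl -s --unix-socket "$SOCK" -X POST "$(kill_url "$id" "$sig")"
  done
}
```

Deployed in “global” mode so one copy runs on every node, each replica only needs to signal the task containers local to its own node.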
How would that work, exactly? exec can’t execute on remote containers as far as I know (and tested). In other words, if I can get on the remote machine to run exec, I can just run kill anyway.
I need a command that will automatically run kill on all the swarm nodes that have a given service running using a single command on a manager without requiring me to ssh into all the nodes. The only way I can figure out to do that with the current set of features is to make another service that can then launch and do the kill, and the only way for a service to kill another service that I’ve found is for it to talk to the API socket.
Hi, I mentioned stack/service rm in the OP…I don’t want to send a TERM (and then KILL), I want to be able to send arbitrary signals, like docker kill -s can do for containers on the same machine. Basically I want exactly what docker kill can do for local containers, just for services. Seems like a reasonable feature request, no?
This is a strange question. Are you saying they should remove docker kill? The ability to send arbitrary signals to applications is a pretty standard and useful unix thing, there’s a lot of infrastructure around it, and it’s supported for local containers, so it seems like a bit of an omission from the services API. This seems pretty obvious to me?
In my specific case, I want to send signals to change the state of the app, like if it’s going into a certain mode, like maintenance mode, or if it should send a message to logged in users about an impending restart, or whatever. Another example is sending SIGHUP to an app to get it to reread config files without restarting. There are a zillion uses for sending signals to applications besides just TERM and KILL and an entire set of functions and tools to do this for apps, so it is strange to me that it’s not totally obvious why I’d want to send a signal to apps on remote machines in a swarm? Maybe you’ve never written a stateful app or something that doesn’t just handle a web request or whatever and exit immediately?
No idea what you’re talking about with the API thing…are you saying there’s an API entrypoint that will send signals to all the containers in a swarm service? If so, I missed it, and I’d love a pointer because I’d totally use that to solve my problem. If not, you’ll have to explain what you’re talking about and how that actually solves the problem I’m pointing out and requesting the feature for.
just because signals worked for local applications on unix, does NOT make that the right way to do it in a distributed swarm environment.
i suggested that YOU could add some code to your application, creating an API that YOU could call to provide the information needed. vs Docker trying to implement this older mechanism.
"Maybe you’ve never written a stateful app or something that doesn’t just handle a web request or whatever and exit immediately?"
Of course I could modify my app. I could also rewrite all of docker, or go become a farmer, or run for president, but this is a feature request forum for docker, so it seems pretty reasonable to request features here, especially ones that already exist in the project (docker kill -s), would just be relatively small extensions to the swarm API, and are obviously useful, time-proven ways of doing simple interprocess communication. On edit: in fact, docker already uses signals on the swarm with rm, so your point about them not scaling seems suspect.
In this case, I do control my app, but that is not always true; I’m not sure what you’d tell somebody who didn’t have the option or ability to modify an app running in a container. This is why platforms add features, so every app doesn’t have to be modified to implement them. This is presumably why they have a feature request forum…
I’m not sure why you’re so determined to try to talk me out of making this totally reasonable feature request; it feels like the “you don’t need that” mentality that is so common (and poisonous) in OSS communities. Instead of saying, “hmm, let me try to understand this person’s problem, it might be something I haven’t thought of, and it’s a big world out there”, there’s this immediate, “you don’t need that, the current thing is fine, you could just do this other thing”…it was clear from your very first post that you didn’t bother to even understand what I was asking since you suggested something I’d already mentioned in the OP as not what I need, and now you’re clearly psychologically dug in on defense and so it feels like a waste of time to continue.
Anyway, I’m done discussing it with you, but feel free to have the last word if you’d like.
Hopefully somebody who works on docker will stop by and check out the request. If anybody else has questions or comments let me know. If I find some time after this deadline I’ll look into adding the feature myself and do a PR if it’s something they’d be interested in integrating. My first experience with containers has been pretty positive and it’s a cool project and I’d be happy to contribute back.
@chrishecker
Have you managed to find a solution to this problem? I have the very same need, and I am sure there are others like us. I have a legacy Java app, now dockerized, that does some cleanup to leave everything in a consistent state before shutting down on SIGTERM or SIGINT. Unfortunately, "docker stack rm" kills it abruptly, leaving it in an inconsistent state. I'd appreciate any help.
Thanks
No, I haven’t. I figured I’d just do something like use ansible, which is kind of lame. I looked at the code to make the change, but it’s an ocean of RPC calls, so it’d be a pretty huge diff; I think somebody from the core team would have to make it. It’s unfortunate, since they’ve got the non-swarm version of the command right there; it just needs to be extended to the swarm.
Chris
PS. There are a few other puzzling decisions, like not being able to set the worker token manually anymore (there are a couple of threads about this here), that basically require you to use another tool like ansible in addition to docker. I haven’t worked through all the issues yet with running my app in a swarm, but it seems like there is a list of them. Once you’re forced to use a config tool, you might as well have it send the signals too, I guess. Sigh, more packages.
@chrishecker
Thanks for the quick response. After experimenting a little with “docker stack rm” I found something very interesting. Docker does indeed send SIGTERM but, unfortunately, does not give much time before sending SIGKILL, which follows almost immediately. See below.
I deployed a stack with a simple image that runs the following script on startup in a swarm …
#!/bin/sh
term_signal_handler() {
    echo "############ Caught SIGTERM #############"
}
int_signal_handler() {
    echo "############ Caught SIGINT #############"
}
trap 'term_signal_handler' SIGTERM
trap 'int_signal_handler' SIGINT
while true ; do
    echo "Waiting for signal ...."
    sleep 10
done
When I do “docker stack rm…” I do see SIGTERM getting caught by the script as shown below
[root@docker]# docker logs -f 9dc75fa80f21
Waiting for signal ....
Waiting for signal ....
Waiting for signal ....
Waiting for signal ....
Waiting for signal ....
############ Caught SIGTERM #############
Waiting for signal ....
At this point above, the script is killed.
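Incidentally, the extra "Waiting for signal ...." line after the SIGTERM is expected: a POSIX shell only runs a trap handler once the foreground command (here, sleep 10) finishes. Backgrounding the sleep and using wait, which is interruptible, delivers the trap promptly. A small self-contained demo of that pattern (my own sketch, not from the thread):

```shell
#!/bin/sh
# The shell only runs traps between foreground commands, so a foreground
# `sleep 10` delays delivery; `sleep ... & wait` lets the trap run as soon
# as the signal lands.
got_term=0
on_term() { got_term=1; }
trap on_term TERM

( sleep 1; kill -TERM $$ ) &   # simulate the engine sending SIGTERM after ~1s
start=$(date +%s)
sleep 30 &                     # the "long" foreground work, backgrounded
wait $!                        # interruptible: returns when SIGTERM arrives
end=$(date +%s)
echo "got_term=$got_term after $((end - start))s (not 30s)"
```

With this loop shape, cleanup can start the moment SIGTERM arrives instead of up to ten seconds later, which matters given how quickly SIGKILL follows.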
Update:
Finally, I found a way to make it work for me. I just need ~30 secs or so to do cleanup after receiving SIGTERM. I found an option, stop_grace_period, that works perfectly for me: I was able to do cleanup before getting killed.
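For reference, stop_grace_period is set per service in the compose file and controls how long the engine waits between SIGTERM and SIGKILL (the default is 10s). A hypothetical fragment (service and image names are placeholders):

```yaml
version: "3.7"
services:
  app:
    image: myorg/legacy-java-app:latest
    # wait up to 30s after SIGTERM before sending SIGKILL (default: 10s)
    stop_grace_period: 30s
```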
I was trying to configure logrotate for the Traefik API gateway running on Docker Swarm. I needed to send a USR1 signal to the container for postrotate; if this feature were available, I could do it very cleanly.
I would also like to add my +1 to this feature request. My use case is exactly the same as the one described by rooholam: we’re running Traefik in Docker Swarm, and the ability to easily send a USR1 to the container(s) in a swarm would simplify the use of logrotate.
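For completeness, the logrotate side of that setup might look like the fragment below (paths, rotation schedule, and the service label value are assumptions on my part). Note that the postrotate command only reaches containers on the node where logrotate runs, which is exactly why a swarm-wide equivalent of docker kill -s would be cleaner:

```
/var/log/traefik/access.log {
    daily
    rotate 7
    compress
    missingok
    notifempty
    postrotate
        # only signals Traefik containers running on this node
        docker kill --signal=USR1 $(docker ps -q --filter "label=com.docker.swarm.service.name=traefik_traefik")
    endscript
}
```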