Docker swarm periodically restarts all services

Hi all,

I am trying to get some experience with Docker and Docker Swarm on a single machine.

So far experimenting works well, but I am seeing strange behaviour in swarm mode: periodically, though with no fixed regularity, all services restart for no apparent reason. For example, the Gitea container originally created at 02:09 stopped, was restarted at 22:06, stopped again just two minutes later, and has been running since 22:08.

Every single docker-compose file includes the following snippet:

version: "3"
services:
  main:
    image: gitea/gitea:latest
    deploy:
      replicas: 1
      restart_policy:
        condition: on-failure

So, I first thought that the program crashed inside the container (conveniently ignoring that all programs crashed at the same time :-)) and the above policy forced a restart. However, the container logs look like this:

10.255.0.2 - - [22/Feb/2019:09:48:54 +0000] "GET / HTTP/1.1" 500 1672
10.255.0.2 - - [22/Feb/2019:10:14:13 +0000] "GET / HTTP/1.0" 500 1655
10.255.0.2 - - [22/Feb/2019:15:38:35 +0000] "\x03" 400 226
10.255.0.2 - - [22/Feb/2019:15:38:35 +0000] "\x03" 400 226
10.255.0.2 - - [22/Feb/2019:17:03:31 +0000] "GET / HTTP/1.1" 200 5099
10.255.0.2 - - [22/Feb/2019:17:11:38 +0000] "GET / HTTP/1.1" 500 1666
Caugth signal SIGTERM, passing it to child processes...
Caugth signal SIGTERM, passing it to child processes...
[Fri Feb 22 21:07:04.893407 2019] [mpm_prefork:notice] [pid 122] AH00169: caught SIGTERM, shutting down

So, it looks like the Docker daemon sent the SIGTERM.

The question is: why?

The system details (docker info output) are:

Containers: 21
Running: 7
Paused: 0
Stopped: 14
Images: 7
Server Version: 18.09.2
Storage Driver: overlay2
Backing Filesystem: extfs
Supports d_type: true
Native Overlay Diff: true
Logging Driver: json-file
Cgroup Driver: cgroupfs
Plugins:
Volume: local
Network: bridge host macvlan null overlay
Log: awslogs fluentd gcplogs gelf journald json-file local logentries splunk syslog
Swarm: active
NodeID: fb0bk1eyp70chxwtnmbczdzew
Is Manager: true
ClusterID: yvke64ipr0le72r6lfja9m7lo
Managers: 1
Nodes: 1
Default Address Pool: 10.0.0.0/8
SubnetSize: 24
Orchestration:
Task History Retention Limit: 5
Raft:
Snapshot Interval: 10000
Number of Old Snapshots to Retain: 0
Heartbeat Tick: 1
Election Tick: 10
Dispatcher:
Heartbeat Period: 5 seconds
CA Configuration:
Expiry Duration: 3 months
Force Rotate: 0
Autolock Managers: false
Root Rotation In Progress: false
Node Address: 178.33.26.29
Manager Addresses:
178.33.26.29:2377
Runtimes: runc
Default Runtime: runc
Init Binary: docker-init
containerd version: 9754871865f7fe2f4e74d43e2fc7ccd237edcbce
runc version: 09c8266bf2fcf9519a651b04ae54c967b9ab86ec
init version: fec3683
Security Options:
seccomp
Profile: default
Kernel Version: 4.9.0-8-amd64
Operating System: Debian GNU/Linux 9 (stretch)
OSType: linux
Architecture: x86_64
CPUs: 8
Total Memory: 7.8GiB
Name: menkisyscloudsrv29
ID: KKDM:55YP:BSAM:FNOZ:AA55:O2IJ:LTIK:MCEI:X47U:DGJ3:AK5U:EZT3
Docker Root Dir: /var/lib/docker
Debug Mode (client): false
Debug Mode (server): false
Registry: https://index.docker.io/v1/
Labels:
Experimental: false
Insecure Registries:
127.0.0.0/8
Live Restore Enabled: false
Product License: Community Engine

WARNING: No swap limit support

Do you have any idea what’s happening? Where can I get more detailed logs about what dockerd is doing?

Thanks!

Did you check the service logs and your operating system’s syslog for hints?
Also, docker service ps {servicename} might provide some insight into the reason for the error.
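
For example (using gitea_main as the service name, which is just a guess based on the rest of this thread), something like this shows the full task history, the service output, and the daemon log:

docker service ps gitea_main --no-trunc       # task history with untruncated error messages
docker service logs gitea_main                # stdout/stderr of the service's tasks
journalctl -u docker --since "1 hour ago"     # what the daemon itself logged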

If your containers consume too much RAM, the operating system will start killing processes (the OOM killer) to free up memory. Docker also doesn’t like it when the storage has no space left.
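
In swarm mode you can also cap a service’s memory so a single runaway container cannot exhaust the host, and it is worth confirming that the Docker data directory still has free space. A rough sketch (the service name and the limit are only examples):

docker service update --limit-memory 512M gitea_main   # hard memory cap for this service
df -h /var/lib/docker                                   # free space where Docker stores its data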

If you run Java containers, you might want to take a closer look at their resource usage.
Java 10 is the first version that is fully aware of being run in a container. Java 9 has options to at least respect cgroup limits, and those options were backported in Java 8 update 131. Everything before that is not remotely aware of cgroups or of being run in a container at all.
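
For those older versions the cgroup awareness has to be switched on explicitly. A hedged example (the jar name is just a placeholder):

# Java 8u131+ / Java 9: make the JVM respect the container's cgroup memory limit
java -XX:+UnlockExperimentalVMOptions -XX:+UseCGroupMemoryLimitForJVM -jar app.jar
# Java 10 and later: container support is enabled by default (-XX:+UseContainerSupport)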

Hi Metin,

thanks for your quick reply. Sticking with my Gitea example, I ran your command and got the following output:

ID                  NAME                IMAGE                NODE                 DESIRED STATE       CURRENT STATE           ERROR               PORTS
pdeipoqd6bvs        gitea_main.1        gitea/gitea:latest   menkisyscloudsrv29   Running             Running 22 hours ago
wzlciwqaqiqo         \_ gitea_main.1    gitea/gitea:latest   menkisyscloudsrv29   Shutdown            Shutdown 22 hours ago
tcdnlv21mb22         \_ gitea_main.1    gitea/gitea:latest   menkisyscloudsrv29   Shutdown            Shutdown 22 hours ago
pieodcryhv3c         \_ gitea_main.1    gitea/gitea:latest   menkisyscloudsrv29   Shutdown            Shutdown 42 hours ago
wle26krbmw49         \_ gitea_main.1    gitea/gitea:latest   menkisyscloudsrv29   Shutdown            Shutdown 42 hours ago

This is true for all services; they all show the same behavior.

I checked my virtual server for RAM and disk shortages. Everything looks fine:

Filesystem      Size  Used Avail Use% Mounted on
udev            3.9G     0  3.9G   0% /dev
tmpfs           799M   80M  720M  10% /run
/dev/sda1        49G   26G   21G  56% /
tmpfs           3.9G     0  3.9G   0% /dev/shm
tmpfs           5.0M     0  5.0M   0% /run/lock
tmpfs           3.9G     0  3.9G   0% /sys/fs/cgroup
overlay          49G   26G   21G  56% /var/lib/docker/overlay2/c2c6032c02932a1df93920d0fe1aef4c86215e7b87333bec528ab726c996b2f0/merged
overlay          49G   26G   21G  56% /var/lib/docker/overlay2/c21c4312d93de8aeb5553e8f5740da805f8ecf45a8ed7a9f78773a0cc354c3d5/merged
overlay          49G   26G   21G  56% /var/lib/docker/overlay2/77e86f6dffe1f3c5907901ed164c4acaab06f546b760252b8a030e2df7c17015/merged
overlay          49G   26G   21G  56% /var/lib/docker/overlay2/db686f28b6d0e43ae6f92372864ed7d5bf4df3eac97c3136c950ca47b24e4767/merged
overlay          49G   26G   21G  56% /var/lib/docker/overlay2/43d9ca082d15b9e7c4dd652fe4a1b43b077e51088868b82c2684fe1805a81659/merged
overlay          49G   26G   21G  56% /var/lib/docker/overlay2/1b9d85053b4fe71cdaa7dd98a7ed232a8c597227bc007403abc19460f0bbf4d6/merged
shm              64M     0   64M   0% /var/lib/docker/containers/ee19075f23661f1b19bd1dc077df78a2eca6136c3f6b25576ed47d93a0964af9/mounts/shm
overlay          49G   26G   21G  56% /var/lib/docker/overlay2/62e5b358a9ea8a21cc3f7a4e7d234120b8f485ca4c6d15ae0608baac85680d14/merged
shm              64M     0   64M   0% /var/lib/docker/containers/4828835509abdfcc0559588df0fdbc697a237ba63865aa4695b2861a4ed7a914/mounts/shm
shm              64M     0   64M   0% /var/lib/docker/containers/d1868570bb5952a9d9e9bcaf0ea33658ab9f631f57b48961c4f97e4cdca8516b/mounts/shm
shm              64M     0   64M   0% /var/lib/docker/containers/a924bfefba1f71aa1c89bc927a77967796aea546d179eead81afd920ebde159d/mounts/shm
shm              64M     0   64M   0% /var/lib/docker/containers/cd11b392f7f1ed6cdc24ab2a3bfc9bd552762226c2f5624716d46f591ec2edbe/mounts/shm
shm              64M     0   64M   0% /var/lib/docker/containers/a3dc6f0484fa07854bd321b7caa020329bb12deecb87fe84666a55fc99790843/mounts/shm
shm              64M     0   64M   0% /var/lib/docker/containers/1cfcce261b479f67d26d39a31214f77a838d80d600d18154d4e8e93bcf595669/mounts/shm
tmpfs           799M     0  799M   0% /run/user/0

It would be strange if, to free up RAM, all Docker processes were killed at the same time while non-Docker services were left untouched.

None of my Docker images use Java. Furthermore, I only use official vendor builds from Docker Hub.

Where do I find the Docker daemon logs?

journalctl -u docker

and

tail -f /var/log/daemon.log

do not show anything related to Docker (I can post the content if you like).
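
One thing I might still try is turning on debug logging for dockerd itself; a minimal sketch, assuming systemd and that "debug": true can be added to /etc/docker/daemon.json:

cat /etc/docker/daemon.json     # check whether the file already exists
# add "debug": true to the JSON, e.g.  { "debug": true }
systemctl restart docker
journalctl -u docker -f         # follow the now much more verbose daemon log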

Thanks again.

The block device has sufficient space.

In htop, the green bar for memory is the accumulated memory usage of all processes. Sometimes there are also blue and yellow bars, which indicate buffers and caches. Your RAM is close to being maxed out.
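
free -h shows the same breakdown on the command line; the "available" column is the number that matters, because buffers and cache can be reclaimed:

free -h     # "available" already accounts for reclaimable buffers/cache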

Well, on RHEL/CentOS the details are present in dmesg.
Though, I have no idea if this works for Ubuntu as well.
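
The kernel ring buffer works the same way on Debian/Ubuntu, so something along these lines should reveal OOM kills there as well:

dmesg -T | grep -iE "out of memory|oom-killer|killed process"
journalctl -k | grep -i oom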

@theriddler82 Did you ever discover a root cause? I’m having a very similar problem and haven’t been able to figure out the reason yet.

Sorry for the late response; I only just got the notification that somebody had answered.

The reason was indeed the RAM. I have migrated to a new VPS with much better memory management and the issue is gone. Since the migration (two weeks ago), I have had no issues at all. Before it was an ESX host; now it’s Hyper-V with dynamic memory allocation, which did the trick.

I had the same problem, but my RAM usage was not even 1/3 of the total.
I fixed it by changing the MACAddressPolicy option, as described here:

I hope it can be useful to someone else…
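
In case that reference is not available, here is a rough sketch of what this kind of change typically looks like on a systemd-based system; the paths and the exact policy value are assumptions and may differ per distribution:

# copy the shipped default link policy so it can be overridden locally
# (on some distributions it lives under /lib/systemd/network instead)
cp /usr/lib/systemd/network/99-default.link /etc/systemd/network/99-default.link
# then edit the copy and, in the [Link] section, change
# MACAddressPolicy=persistent to MACAddressPolicy=none
# finally reboot (or reload udev) so the new policy takes effect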