Containers start running on SSH

I've come across this quite a few times now:
I discover my Docker containers are not running, I SSH into the server, run docker ps, and immediately the containers start running.

This happened both with my Jenkins container and my Prometheus/Grafana stack (all on one server).

My docker-compose configuration for Jenkins has restart: always, and the Prometheus stack has restart: unless-stopped.

Is this a bug or am I missing something? Thank you!

Docker version
core@ip- $ docker version
Client:
Version: 1.12.6
API version: 1.24
Go version: go1.6.3
Git commit: d5236f0
Built: Thu Feb 23 02:17:18 2017
OS/Arch: linux/amd64

Server:
Version: 1.12.6
API version: 1.24
Go version: go1.6.3
Git commit: d5236f0
Built: Thu Feb 23 02:17:18 2017
OS/Arch: linux/amd64

docker-compose version
core@ip- $ docker-compose version
docker-compose version 1.9.0, build 2585387
docker-py version: 1.10.6
CPython version: 2.7.9
OpenSSL version: OpenSSL 1.0.1t 3 May 2016

OS version
core@ip- $ cat /etc/*-release
DISTRIB_ID="Container Linux by CoreOS"
DISTRIB_RELEASE=1235.12.0
DISTRIB_CODENAME="Ladybug"
DISTRIB_DESCRIPTION="Container Linux by CoreOS 1235.12.0 (Ladybug)"
ID=ami
VERSION_ID=0.0.7
NAME="Amazon EC2"
HOME_URL="https://aws.amazon.com/ec2/"
BUG_REPORT_URL="https://github.com/coreos/bugs/issues"
NAME="Container Linux by CoreOS"
ID=coreos
VERSION=1235.12.0
VERSION_ID=1235.12.0
BUILD_ID=2017-02-23-0222
PRETTY_NAME="Container Linux by CoreOS 1235.12.0 (Ladybug)"
ANSI_COLOR="38;5;75"
HOME_URL="https://coreos.com/"
BUG_REPORT_URL="https://github.com/coreos/bugs/issues"

My docker-compose settings for Jenkins has restart: always

That’s why. Your containers are thrashing for some reason. Look into why by reading the docker logs for them. I’m a bit suspicious of memory usage just at a quick glance. You might need a server with more RAM to run all of that.
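If the kernel's OOM killer is the one taking them down, Docker records it on the container. On the affected host you'd check the state directly and then the kernel log (the grep filter below is demonstrated against a sample OOM line, since the real `dmesg` output varies):

```shell
# On the host you would run (requires the Docker daemon):
#   docker inspect -f '{{.State.OOMKilled}} exit={{.State.ExitCode}}' <container>
#   dmesg | grep -iE 'out of memory|killed process'
#
# The same kernel-log filter, shown against a sample OOM-killer line:
printf '%s\n' \
  'Out of memory: Kill process 1234 (java) score 900 or sacrifice child' \
  'docker0: port 1(veth0) entered forwarding state' |
  grep -iE 'out of memory|killed process'
# → Out of memory: Kill process 1234 (java) score 900 or sacrifice child
```

If `OOMKilled` comes back `true`, you have your answer.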

But why would it start up without a problem after I SSHed into the machine?
And only at that specific moment?

I will keep an eye on the logs as soon as this happens again, thanks!

Odds are it's not happening just after you SSH into the machine; it's happening constantly.

Yet, this is not the case :slight_smile:
The container keeps running without any problem after that startup right after I SSHed into the host machine.

How do you know that?

It’s actually happening at the moment, so I’ll describe what’s happening:

  • I can’t reach Jenkins through my browser, Chrome is telling me:
    This site can’t be reached
    my.domain.org refused to connect.

  • I wait 30 minutes to prove that it is not restarting by itself.

  • I SSH into the host machine and, lo and behold:

Last login: Fri Feb 24 20:33:15 UTC 2017 from 131.247.212.26 on pts/0
Container Linux by CoreOS stable (1298.5.0)
core@ip- ~ $ docker ps
CONTAINER ID        IMAGE                     COMMAND                  CREATED             STATUS                  PORTS                                              NAMES
183cd3b93446        cvast/cvast-jenkins:1.0   "/bin/sh -c ${INSTALL"   2 weeks ago         Up Less than a second   0.0.0.0:8080->8080/tcp, 0.0.0.0:50000->50000/tcp   cvastbuild_cvast-jenkins_1
  • I go back to my browser and voila, Jenkins opens up perfectly.

Some interesting facts:
Nothing interesting shows up in the logs (docker logs <container id>).
The last log lines before Jenkins went down were these (the first two are lines it prints every hour; I'm only including them for the timestamp):

[etc... etc...]
Mar 01, 2017 5:04:08 AM hudson.diagnosis.HudsonHomeDiskUsageChecker doRun
INFO: JENKINS_HOME disk usage information isn't available. aborting to monitor
[WARN  tini (5)] Tini is not running as PID 1 and isn't registered as a child subreaper.
        Zombie processes will not be re-parented to Tini, so zombie reaping won't work.
        To fix the problem, use -s or set the environment variable TINI_SUBREAPER to register Tini as a child subreaper, or run Tini as PID 1.

And here is the log from right after I SSHed into the host machine, as Jenkins starts up:

Running from: /usr/share/jenkins/jenkins.war
webroot: EnvVars.masterEnvVars.get("JENKINS_HOME")
Mar 01, 2017 2:33:10 PM Main deleteWinstoneTempContents
WARNING: Failed to delete the temporary Winstone file /tmp/winstone/jenkins.war
Mar 01, 2017 2:33:10 PM org.eclipse.jetty.util.log.JavaUtilLog info
INFO: Logging initialized @1441ms
Mar 01, 2017 2:33:10 PM winstone.Logger logInternal
INFO: Beginning extraction from war file
[etc... etc...]  

Here are the logs from last Jenkins builds a week ago to now:
docker logs jenkins.txt (26.6 KB)

And now it’s running fine again. Crazy?

I’m still not convinced your container isn’t just thrashing or the server is running out of memory. Jenkins is Java so eats up a lot of RAM. What does docker inspect -f '{{.RestartCount}}' <container> say? How much memory does the server have? And what does free -m say while Jenkins is running?

Seems all fine to me:

core@ip- ~ $ docker inspect -f '{{.RestartCount}}' 183cd3b93446
0
core@ip- ~ $ top
top - 21:23:57 up 10:53,  1 user,  load average: 0.00, 0.01, 0.00
Tasks:  86 total,   1 running,  85 sleeping,   0 stopped,   0 zombie
%Cpu(s):  0.0 us,  0.0 sy,  0.0 ni,100.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
KiB Mem:   1018904 total,   743340 used,   275564 free,    32164 buffers
KiB Swap:        0 total,        0 used,        0 free.   347928 cached Mem
core@ip-~ $ free -m
             total       used       free     shared    buffers     cached
Mem:           995        725        269          0         31        339
-/+ buffers/cache:        354        641
Swap:            0          0          0

You don't have nearly enough memory, and it seems you don't have swap configured either. This is almost assuredly your issue: even after reclaiming buffers/cache you only have about 641 MB available on a 995 MB box, and a Jenkins JVM alone can balloon past that. Once the memory usage of a process starts to balloon, the kernel's OOM killer terminates processes (Jenkins here) to free some up. Then Docker restarts it because "oops, the process was killed somehow".
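For what it's worth, on this older `free -m` layout the number that matters is the free column of the `-/+ buffers/cache` line, not the `Mem:` free column. A quick awk over the output you pasted pulls it out:

```shell
# `free -m` output as pasted above; the "-/+ buffers/cache" free column is
# what is actually available to new allocations once caches are reclaimed.
free_output='             total       used       free     shared    buffers     cached
Mem:           995        725        269          0         31        339
-/+ buffers/cache:        354        641
Swap:            0          0          0'

echo "$free_output" | awk '/buffers\/cache/ {print $NF " MB available"}'
# → 641 MB available
```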

Upgrade your server to one with 2-4x as much RAM (at least; Jenkins and some of the other things you're running are memory-hungry beasts) and everything should work smoothly. You could also configure swap, though that's likely to make your programs slower.
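On CoreOS, configuring swap means a systemd swap unit. A sketch (the `/swapfile` path is an assumption; you'd first create the file, e.g. `sudo fallocate -l 1G /swapfile && sudo chmod 600 /swapfile && sudo mkswap /swapfile`):

```ini
# /etc/systemd/system/swapfile.swap (hypothetical path); activate with:
#   sudo systemctl enable --now swapfile.swap
[Unit]
Description=Swap file to give the 1 GB host some headroom

[Swap]
What=/swapfile

[Install]
WantedBy=multi-user.target
```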

Alright, thank you for bearing with me. I appreciate it.
For budgetary reasons I’m sticking to this server, but I’ll keep upgrading in mind.

Still not sure why the docker restart would happen exactly when I SSH in, though…

Anyway, thanks again.

You could try swap as a band-aid fix.

It seems likely to me that you just happen to see it then, and it might be happening continuously.

I’ll try it, thanks!

@vmeijer I am seeing the exact same problem running the identical version of CoreOS. This is a system I use only as an internal Docker registry. My assumption is that when no one accesses it for several days, some part of the system goes to sleep. It does not make any sense, but doing a docker ps restores the Docker registry just as you describe above.

System details:
Last login: Fri Feb 24 18:27:09 UTC 2017 from 10.100.xx.xx on pts/0
Container Linux by CoreOS stable (1298.5.0)
core@localhost ~ $ docker ps
CONTAINER ID        IMAGE                              COMMAND                  CREATED             STATUS                  PORTS                    NAMES
5779010183e2        hyper/docker-registry-web:latest   "start.sh"               3 months ago        Up Less than a second   0.0.0.0:80->8080/tcp     web
8bfa9bc37d07        registry:2                         "/entrypoint.sh /etc/"   3 months ago        Up Less than a second   0.0.0.0:5000->5000/tcp   docker.smithmicro.net
e5ee71a3029f        registry:2                         "/entrypoint.sh /etc/"   3 months ago        Up 1 seconds            0.0.0.0:443->5000/tcp    registry
core@localhost ~ $ docker version
Client:
Version: 1.12.6
API version: 1.24
Go version: go1.6.3
Git commit: d5236f0
Built: Tue Feb 28 00:07:14 2017
OS/Arch: linux/amd64

Server:
Version: 1.12.6
API version: 1.24
Go version: go1.6.3
Git commit: d5236f0
Built: Tue Feb 28 00:07:14 2017
OS/Arch: linux/amd64

See @nathanleclaire, I’m not crazy

Mm, I’m still not convinced :stuck_out_tongue_winking_eye:

That definitely is very peculiar behavior. Since it seems specific to CoreOS, you might want to run it by the CoreOS folks and see what they think. I’d be curious to see recent logs from the Docker daemon after this behavior occurs.

Think it’s worth defining resource limits to at least stop the containers from stealing resources from each other?
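With compose file format 2 (supported by docker-compose 1.9) that would look something like this per service. The limit values here are just examples, not recommendations:

```yaml
version: '2'
services:
  jenkins:
    image: cvast/cvast-jenkins:1.0
    restart: always
    # Hard cap on RAM for this container (example value):
    mem_limit: 512m
    # Cap on RAM + swap combined (example value):
    memswap_limit: 768m
```

Note that on a 1 GB host with no swap, limits only change *which* process the OOM killer picks; they don't create memory that isn't there.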

I've come across this issue as well. Executing any docker command, e.g. docker ps, seems to wake up the docker engine.

To work around the issue, you can create a systemd timer to execute docker ps every minute.

Add this to /etc/systemd/system/docker-heartbeat.service

[Unit]
Description=Keeps docker daemon alive

[Service]
Type=oneshot
ExecStart=/usr/bin/sh -c '/usr/bin/docker ps >> /tmp/docker-heartbeat'

And this to /etc/systemd/system/docker-heartbeat.timer:

[Unit]
Description=Run docker-heartbeat.service every minute

[Timer]
OnCalendar=*:0/1

[Install]
WantedBy=timers.target

And finally start and enable the heartbeat timer:

systemctl start docker-heartbeat.timer
systemctl enable docker-heartbeat.timer

Great, detailed workaround, thank you!

For those using cloud config, here is the correct syntax for that:

#cloud-config

coreos:
    units:
    - 
        name: docker-heartbeat.service
        content: |
            [Unit]
            Description=Keeps docker daemon alive

            [Service]
            Type=oneshot
            ExecStart=/usr/bin/sh -c '/usr/bin/docker ps >> /tmp/docker-heartbeat'
    - 
        name: docker-heartbeat.timer
        command: start
        content: |
            [Unit]
            Description=Run docker-heartbeat.service every minute

            [Timer]
            OnCalendar=*:0/1

            [Install]
            WantedBy=timers.target

I'm aware this issue is from a long time ago… but since I had the same issue and came across this topic and not many (read: none) others, I thought posting the actual solution might still be helpful to others as well.

Hints for me were that this issue so far only occurs on CoreOS (same as me) and @reederz's remark about waking "up the docker engine".

I did some further googling and digging, and it turned out that the docker daemon is not started by default on CoreOS. That is the actual issue. After a reboot of CoreOS the daemon is not active, but it is socket-activated: systemd starts it on the first connection to the Docker socket, which is why any docker command like docker ps, docker start, or docker run brings everything back.

@nathanleclaire
So the fix is to make the docker daemon start at boot: systemctl enable docker
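For completeness, here is how I'd enable and start it in one go on the CoreOS host, then verify:

```shell
# Start the daemon now AND enable it for every subsequent boot:
sudo systemctl enable --now docker

# Verify:
systemctl is-enabled docker   # should print "enabled"
systemctl is-active docker    # should print "active"
```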