I have come across this quite a few times now:
I discover that my Docker containers are not running, I SSH into the server, run docker ps, and the containers immediately start running.
This happened both with my Jenkins container and my Prometheus/Grafana stack (all on one server).
My docker-compose configuration for Jenkins has restart: always, and the Prometheus stack has restart: unless-stopped.
Is this a bug or am I missing something? Thank you!
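For reference, the relevant part of my Jenkins compose file looks roughly like this (trimmed down; the real file has more settings, and the Prometheus/Grafana services use restart: unless-stopped in the same way):

jenkins:
  image: cvast/cvast-jenkins:1.0
  restart: always
  ports:
    - "8080:8080"
    - "50000:50000"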
Docker version
core@ip- $ docker version
Client:
Version: 1.12.6
API version: 1.24
Go version: go1.6.3
Git commit: d5236f0
Built: Thu Feb 23 02:17:18 2017
OS/Arch: linux/amd64
Server:
Version: 1.12.6
API version: 1.24
Go version: go1.6.3
Git commit: d5236f0
Built: Thu Feb 23 02:17:18 2017
OS/Arch: linux/amd64
docker-compose version
core@ip- $ docker-compose version
docker-compose version 1.9.0, build 2585387
docker-py version: 1.10.6
CPython version: 2.7.9
OpenSSL version: OpenSSL 1.0.1t 3 May 2016
OS version
core@ip- $ cat /etc/*-release
DISTRIB_ID="Container Linux by CoreOS"
DISTRIB_RELEASE=1235.12.0
DISTRIB_CODENAME="Ladybug"
DISTRIB_DESCRIPTION="Container Linux by CoreOS 1235.12.0 (Ladybug)"
ID=ami
VERSION_ID=0.0.7
NAME="Amazon EC2"
HOME_URL="Amazon EC2 - Cloud Compute Capacity - AWS"
BUG_REPORT_URL="Issues · coreos/bugs · GitHub"
NAME="Container Linux by CoreOS"
ID=coreos
VERSION=1235.12.0
VERSION_ID=1235.12.0
BUILD_ID=2017-02-23-0222
PRETTY_NAME="Container Linux by CoreOS 1235.12.0 (Ladybug)"
ANSI_COLOR="38;5;75"
HOME_URL="https://coreos.com/"
BUG_REPORT_URL="Issues · coreos/bugs · GitHub"
My docker-compose configuration for Jenkins has restart: always
That’s why. Your containers are thrashing for some reason. Look into why by reading the docker logs for them. I’m a bit suspicious of memory usage just at a quick glance. You might need a server with more RAM to run all of that.
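For example, something along these lines (substitute your actual container name or ID) would show both the container's own output and the daemon-side lifecycle events:

docker logs --tail 200 <container>     # recent output from the container itself
docker events --since 1h               # daemon-side start/die/oom events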
It’s actually happening right now, so I’ll describe it:
I can’t reach Jenkins through my browser; Chrome tells me:
This site can’t be reached my.domain.org refused to connect.
I wait 30 minutes to prove that it is not restarting by itself.
I SSH into the host machine and alas:
Last login: Fri Feb 24 20:33:15 UTC 2017 from 131.247.212.26 on pts/0
Container Linux by CoreOS stable (1298.5.0)
core@ip- ~ $ docker ps
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
183cd3b93446 cvast/cvast-jenkins:1.0 "/bin/sh -c ${INSTALL" 2 weeks ago Up Less than a second 0.0.0.0:8080->8080/tcp, 0.0.0.0:50000->50000/tcp cvastbuild_cvast-jenkins_1
I go back to my browser and voila, Jenkins opens up perfectly.
Some interesting facts: nothing interesting shows up in the logs (docker logs <container id>).
The last log entry before Jenkins went down was this (it prints the first two lines every hour; I’m only showing them for the timestamp):
[etc... etc...]
Mar 01, 2017 5:04:08 AM hudson.diagnosis.HudsonHomeDiskUsageChecker doRun
INFO: JENKINS_HOME disk usage information isn't available. aborting to monitor
[WARN tini (5)] Tini is not running as PID 1 and isn't registered as a child subreaper.
Zombie processes will not be re-parented to Tini, so zombie reaping won't work.
To fix the problem, use -s or set the environment variable TINI_SUBREAPER to register Tini as a child subreaper, or run Tini as PID 1.
And here is the log right after I SSHed into the host machine; Jenkins is starting up:
Running from: /usr/share/jenkins/jenkins.war
webroot: EnvVars.masterEnvVars.get("JENKINS_HOME")
Mar 01, 2017 2:33:10 PM Main deleteWinstoneTempContents
WARNING: Failed to delete the temporary Winstone file /tmp/winstone/jenkins.war
Mar 01, 2017 2:33:10 PM org.eclipse.jetty.util.log.JavaUtilLog info
INFO: Logging initialized @1441ms
Mar 01, 2017 2:33:10 PM winstone.Logger logInternal
INFO: Beginning extraction from war file
[etc... etc...]
Here are the logs from the last Jenkins builds, from a week ago until now: docker logs jenkins.txt (26.6 KB)
I’m still not convinced your container isn’t just thrashing or the server is running out of memory. Jenkins is Java, so it eats up a lot of RAM. What does docker inspect -f '{{.RestartCount}}' <container> say? How much memory does the server have? And what does free -m say while Jenkins is running?
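For example, something like this (the container name is taken from your docker ps output above; adjust as needed) would show the restart count and whether the kernel has OOM-killed the container, plus the current memory picture:

docker inspect -f 'restarts={{.RestartCount}} oom-killed={{.State.OOMKilled}}' cvastbuild_cvast-jenkins_1
free -m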
You don’t have nearly enough memory available, and it seems you don’t have swap configured either. This is almost assuredly your issue. 269/995 MB is not much headroom at all. Once the memory usage of a given process starts to balloon, the kernel’s OOM killer will kill processes (Jenkins here) to free some up. Then Docker restarts it because “oops, the process was killed somehow”.
Upgrade your server to one with 2-4x as much RAM (at least - Jenkins and some of the other things you seem to be running are memory-hungry beasts) and everything should work smoothly. You could also configure swap, though that’s likely to make your programs slower.
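If you do want to try swap first, a rough sketch of adding a 2 GB swap file on a typical Linux host looks like this (on CoreOS you’d normally also wrap the last activation step in a systemd swap unit so it survives reboots):

sudo fallocate -l 2G /swapfile    # or: sudo dd if=/dev/zero of=/swapfile bs=1M count=2048
sudo chmod 600 /swapfile
sudo mkswap /swapfile
sudo swapon /swapfile
free -m                           # the Swap line should now show roughly 2048 total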
@vmeijer I am seeing the exact same problem running the identical version of CoreOS. This is a system I use only as an internal Docker registry. My assumption is that when no one accesses it for several days, some part of the system goes to sleep. It does not make any sense, but doing a docker ps restores the Docker Registry just as you describe above.
System details:
Last login: Fri Feb 24 18:27:09 UTC 2017 from 10.100.xx.xx on pts/0
Container Linux by CoreOS stable (1298.5.0)
core@localhost ~ $ docker ps
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
5779010183e2 hyper/docker-registry-web:latest "start.sh" 3 months ago Up Less than a second 0.0.0.0:80->8080/tcp web
8bfa9bc37d07 registry:2 "/entrypoint.sh /etc/" 3 months ago Up Less than a second 0.0.0.0:5000->5000/tcp docker.smithmicro.net
e5ee71a3029f registry:2 "/entrypoint.sh /etc/" 3 months ago Up 1 seconds 0.0.0.0:443->5000/tcp registry
core@localhost ~ $ docker version
Client:
Version: 1.12.6
API version: 1.24
Go version: go1.6.3
Git commit: d5236f0
Built: Tue Feb 28 00:07:14 2017
OS/Arch: linux/amd64
Server:
Version: 1.12.6
API version: 1.24
Go version: go1.6.3
Git commit: d5236f0
Built: Tue Feb 28 00:07:14 2017
OS/Arch: linux/amd64
That definitely is very peculiar behavior. Since it seems specific to CoreOS, you might want to run it by the CoreOS folks and see what they think. I’d be curious to see recent logs from the Docker daemon after this behavior occurs.
I’m aware this issue is from a long time ago, but since I had the same issue and came across this topic and not many (read: none) others, I thought posting the actual solution might still be helpful to others.
The hints for me were that this issue so far only occurs on CoreOS (same as me) and the remark from @reederz about needing “to wake up the docker engine”.
I did some further googling and digging, and it turned out that the docker daemon is not started by default on CoreOS. That is the actual issue: after a reboot of CoreOS the daemon is not active, but it will be started (via systemd socket activation) by any docker command such as docker ps, docker start, or docker run.
@nathanleclaire
So, to make the docker daemon start at boot: systemctl enable docker
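For completeness, on the host that looks something like this (the first command just checks how things are set up today; the rest enable the service and start it right away so you don’t need to wait for a reboot):

systemctl is-enabled docker          # likely not "enabled" on a stock CoreOS install
sudo systemctl enable docker
sudo systemctl start docker          # start it now as well
systemctl is-enabled docker          # should now print "enabled"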