Problem
Docker containers started via the Jenkins pipeline command
docker.image(imageToStart).inside('--init')
cannot be stopped because of zombie processes left behind by the container.
Questions
- How is it possible to get zombie processes from a Docker container when it was started with the ‘--init’ option?
- Has someone else encountered the same issue?
Used environment
- Docker 18.03.1-ce
- Jenkins 2.60.2
- Docker Pipeline plugin 1.12
Details
When a container is started from Jenkins pipeline with a command like:
docker.image('alpine').inside('--init') {
    sh ('ps -efa -o pid,ppid,user,comm')
}
There are several processes in this container with parent PID 0:
[Pipeline] withDockerContainer
testhost does not seem to be running inside a container
$ docker run -t -d -u 1001:1002 \
--init \
-w /lhome/testadmin/jenkins/workspace/bli-groovy-test \
-v /lhome/testadmin/jenkins/workspace/bli-groovy-test:/lhome/testadmin/jenkins/workspace/bli-groovy-test:rw,z \
-v /lhome/ciadmin/jenkins/workspace/bli-groovy-test-tmp:/lhome/testadmin/jenkins/workspace/bli-groovy-test-tmp:rw,z \
-e ******** \
--entrypoint cat alpine
[Pipeline] {
[Pipeline] sh
[bli-groovy-test] Running shell script
+ ps -efa -o pid,ppid,user,comm
PID PPID USER COMMAND
1 0 1001 init
7 1 1001 cat
8 0 1001 sh
14 8 1001 script.sh
15 14 1001 ps
[Pipeline] }
- PID 1 / PPID 0 is the ‘init’ command used to start the container
- PID 8 / PPID 0 is the ‘sh’ command from the closure to execute ‘ps’ command
The ‘sh’ process does not reap its child processes. When it exits, its orphaned descendants are reparented to a PID from outside the container, not to PID 1, the ‘init’ process of the container.
The new parent PID is the PID of the ‘docker-containerd-shim’ process of the container.
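For background, here is a minimal illustration outside Docker (assuming a Linux host with /proc): a terminated child whose parent never calls wait() stays in the process table as a zombie until it is reaped — exactly the ‘defunct’ entries shown further below.

```python
import os
import time

# Fork a child that exits immediately; the parent deliberately
# does not wait() for it yet, so the child lingers as a zombie.
pid = os.fork()
if pid == 0:
    os._exit(0)          # child terminates at once

time.sleep(0.2)          # give the child time to exit

# Read the child's state from /proc: 'Z' marks a zombie ("defunct").
with open(f"/proc/{pid}/stat") as f:
    state = f.read().rsplit(")", 1)[1].split()[0]
print(f"child {pid} state before reaping: {state}")   # 'Z'

# Reaping with waitpid() removes the zombie from the process table.
os.waitpid(pid, 0)
print("entry still in /proc after reaping:", os.path.exists(f"/proc/{pid}"))
```

An init process (such as docker-init started by ‘--init’) exists precisely to run this waitpid() loop for orphans that get reparented to it.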
With this small example I could not reproduce the zombie processes, but here is the situation from a more complex Jenkins job:
Docker command from Jenkins job
$ docker run -t -d -u 1001:1002 \
--init \
-w /lhome/testadmin/jenkins-coreloops/workspace/test-job/database \
-v /lhome/testadmin/jenkins-coreloops/workspace/test-job/database:/lhome/testadmin/jenkins-coreloops/workspace/test-job/database:rw,z \
-v /lhome/testadmin/jenkins-coreloops/workspace/test-job/database-tmp:/lhome/testadmin/jenkins-coreloops/workspace/test-job/database-tmp:rw,z \
-e ******** \
--entrypoint cat ait/mpde
[Pipeline] {
[Pipeline] sh
10:03:09 [database] Running shell script
10:03:09 + ./db-upgrade.sh
The ait/mpde image is based on Red Hat 7.2, and inside the container the following scripts are started:
- shell scripts that set the environment and call Perl scripts
- the Perl scripts start SQL*Plus (one instance per database login)
- the Perl scripts read SQL scripts and send SQL commands to the SQL*Plus instances via STDIN
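Schematically, the pattern looks like the following sketch (my own reduction, not the actual shell/Perl/SQL*Plus scripts): a driver feeds a long-running consumer via STDIN. The commented wait() call marks the reap that, when a driver exits without it, leaves the dead consumer behind as ‘defunct’.

```python
import subprocess
import sys

# Hypothetical reduction of the job's structure: a driver starts a
# consumer process and sends it commands on stdin (as the Perl scripts
# do with SQL*Plus).
consumer = subprocess.Popen(
    [sys.executable, "-c", "import sys; [print(line.strip()) for line in sys.stdin]"],
    stdin=subprocess.PIPE, stdout=subprocess.PIPE, text=True,
)
consumer.stdin.write("SELECT 1;\n")   # stands in for the SQL commands
consumer.stdin.close()                # consumer sees EOF and exits
out = consumer.stdout.read()
consumer.wait()   # the reap step; a driver that skips this leaves a zombie
print(out.strip())                    # prints: SELECT 1;
```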
When the closure ends and Jenkins tries to stop the container, the following processes are left:
[testadmin@testhost] ~ # ps -efa | grep -vw grep | grep -w 47077
root 1725 47077 0 10:03 ? 00:00:00 [ps] <defunct>
root 1732 47077 0 10:03 ? 00:00:00 [docker-runc] <defunct>
root 2887 47077 0 10:04 ? 00:00:00 [sqlplus] <defunct>
root 2915 47077 0 10:04 ? 00:00:00 [sqlplus] <defunct>
root 47077 17349 0 10:03 ? 00:00:00 docker-containerd-shim
-namespace moby
-workdir /var/lib/docker/containerd/daemon/io.containerd.runtime.v1.linux/moby/1863503ca54f75168db8ce20c78b821c0e5280f07d59875e8f651db4f0b67d9f
-address /var/run/docker/containerd/docker-containerd.sock
-containerd-binary /usr/bin/docker-containerd
-runtime-root /var/run/docker/runtime-runc
root 47098 47077 0 10:03 pts/0 00:00:00 /dev/init -- cat
root 47506 47077 0 10:03 ? 00:00:00 [sh] <defunct>
[testadmin@testhost] ~ #
and the ‘docker stop’ command is aborted by Jenkins after its timeout of 180 seconds.
To clean up the remaining processes of the container, the ‘docker-containerd-shim’ process has to be killed with SIGKILL.
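As a sketch of that manual cleanup (the helper name and the /proc scan are my own; the shim PID has to be looked up first, e.g. from the ps output above), the leftover zombies can be confirmed programmatically before killing the shim:

```python
import os

def zombie_children(ppid: int) -> list[int]:
    """Return PIDs of defunct (zombie) children of `ppid`, read from /proc."""
    zombies = []
    for entry in os.listdir("/proc"):
        if not entry.isdigit():
            continue
        try:
            with open(f"/proc/{entry}/stat") as f:
                # stat format: "pid (comm) state ppid ..."
                fields = f.read().rsplit(")", 1)[1].split()
        except OSError:
            continue                      # process vanished while scanning
        state, parent = fields[0], int(fields[1])
        if state == "Z" and parent == ppid:
            zombies.append(int(entry))
    return zombies
```

Once zombie_children(shim_pid) confirms the defunct entries, os.kill(shim_pid, signal.SIGKILL) performs the cleanup described above, after which the kernel reaps the orphans.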
Note
We observed this issue on our recently installed CentOS server:
- CentOS release 7.5.1804
- environment-related parts from ‘docker info’:
Server Version: 18.03.1-ce
Storage Driver: overlay2
Backing Filesystem: extfs
Supports d_type: true
Native Overlay Diff: true
Logging Driver: json-file
Cgroup Driver: cgroupfs
Plugins:
Volume: local
Network: bridge host macvlan null overlay
Log: awslogs fluentd gcplogs gelf journald json-file logentries splunk syslog
Swarm: inactive
Runtimes: runc
Default Runtime: runc
Init Binary: docker-init
containerd version: 773c489c9c1b21a6d78b5c538cd395416ec50f88
runc version: 4fc53a81fb7c994640722ac585fa9ca548971871
init version: 949e6fa
Security Options:
seccomp
Profile: default
Kernel Version: 3.10.0-862.6.3.el7.x86_64
Operating System: CentOS Linux 7 (Core)
OSType: linux
Architecture: x86_64
CPUs: 48
Total Memory: 377.6GiB
Docker Root Dir: /var/lib/docker
Debug Mode (client): false
Debug Mode (server): false
Registry: https://index.docker.io/v1/
Experimental: false
Insecure Registries:
127.0.0.0/8
Live Restore Enabled: false
The behavior on other hosts is similar with respect to the multiple processes with parent PID 0.
But on those hosts we did not observe containers hanging on shutdown, nor a comparable number of zombie processes.
For comparison, the corresponding ‘docker info’ extract from one of these other hosts:
Server Version: 17.05.0-ce
Storage Driver: devicemapper
Pool Name: dock-thinpool
Pool Blocksize: 524.3kB
Base Device Size: 16.11GB
Backing Filesystem: xfs
Data file:
Metadata file:
Udev Sync Supported: true
Deferred Removal Enabled: true
Deferred Deletion Enabled: true
Deferred Deleted Device Count: 0
Library Version: 1.02.140-RHEL7 (2017-05-03)
Logging Driver: json-file
Cgroup Driver: cgroupfs
Plugins:
Volume: local
Network: bridge host macvlan null overlay
Swarm: inactive
Runtimes: runc
Default Runtime: runc
Init Binary: docker-init
containerd version: 9048e5e50717ea4497b757314bad98ea3763c145
runc version: 9c2d8d184e5da67c95d601382adf14862e4f2228
init version: 949e6fa
Security Options:
seccomp
Profile: default
Kernel Version: 3.10.0-693.11.6.el7.x86_64
Operating System: CentOS Linux 7 (Core)
OSType: linux
Architecture: x86_64
CPUs: 48
Total Memory: 377.6GiB
Docker Root Dir: /var/lib/docker
Debug Mode (client): false
Debug Mode (server): false
Registry: https://index.docker.io/v1/
Experimental: false
Insecure Registries:
127.0.0.0/8
Live Restore Enabled: false