Docker container started with --init still leaves zombie processes

Problem
Docker containers started via the Jenkins pipeline command

docker.image(imageToStart).inside('--init')

cannot be stopped because of zombie processes left behind by the container.

Questions

  • How is it possible to get zombie processes from a Docker container when it was started with the ‘--init’ option?
  • Has anyone else encountered the same issue?

Used environment

  • Docker 18.03.1-ce
  • Jenkins 2.60.2
  • Docker Pipeline plugin 1.12

Details
When a container is started from a Jenkins pipeline with a command like:

docker.image('alpine').inside('--init') {
  sh ('ps -efa -o pid,ppid,user,comm')
}

there are several processes in the container with parent PID 0:

[Pipeline] withDockerContainer
testhost does not seem to be running inside a container
$ docker run -t -d -u 1001:1002 \
  --init \
  -w /lhome/testadmin/jenkins/workspace/bli-groovy-test \
  -v /lhome/testadmin/jenkins/workspace/bli-groovy-test:/lhome/testadmin/jenkins/workspace/bli-groovy-test:rw,z \
  -v /lhome/ciadmin/jenkins/workspace/bli-groovy-test-tmp:/lhome/testadmin/jenkins/workspace/bli-groovy-test-tmp:rw,z \
  -e ******** \
  --entrypoint cat alpine
[Pipeline] {
[Pipeline] sh

[bli-groovy-test] Running shell script
+ ps -efa -o pid,ppid,user,comm
PID   PPID  USER     COMMAND
    1     0 1001     init
    7     1 1001     cat
    8     0 1001     sh
   14     8 1001     script.sh
   15    14 1001     ps
[Pipeline] }
  • PID 1 / PPID 0 is the ‘init’ command used to start the container
  • PID 8 / PPID 0 is the ‘sh’ command from the closure that executes the ‘ps’ command

The ‘sh’ process does not reap its child processes. When the process itself exits, its descendants are not reparented to PID 1 (the ‘init’ process of the container) but to a PID from outside the container: the new parent is the ‘docker-containerd-shim’ process of the container.
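
To reproduce the observation outside Jenkins, here is a minimal sketch (the container name ‘init-test’ is arbitrary):

# start a container the same way the plugin does (‘cat’ keeps it alive)
docker run -t -d --init --name init-test --entrypoint cat alpine

# processes started via ‘docker exec’ are children of the shim, so inside
# the container’s PID namespace they appear with PPID 0, just like the
# ‘sh’ process in the log above
docker exec init-test ps -o pid,ppid,user,comm

# clean up
docker rm -f init-test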

With this small example I could not reproduce the zombie processes, but here is the situation from a more complex Jenkins job:

Docker command from Jenkins job

$ docker run -t -d -u 1001:1002 \
  --init \
  -w /lhome/testadmin/jenkins-coreloops/workspace/test-job/database \
  -v /lhome/testadmin/jenkins-coreloops/workspace/test-job/database:/lhome/testadmin/jenkins-coreloops/workspace/test-job/database:rw,z \
  -v /lhome/testadmin/jenkins-coreloops/workspace/test-job/database-tmp:/lhome/testadmin/jenkins-coreloops/workspace/test-job/database-tmp:rw,z \
  -e ******** \
  --entrypoint cat ait/mpde
[Pipeline] {
[Pipeline] sh
10:03:09 [database] Running shell script
10:03:09 + ./db-upgrade.sh

The ait/mpde image is based on Red Hat 7.2, and inside the container the following scripts are started (a simplified sketch of the pattern follows after the list):

  • shell scripts that set up the environment and call Perl scripts
  • the Perl scripts start SQL*Plus (one instance per database login)
  • the Perl scripts read SQL scripts and send the SQL commands to the SQL*Plus instances via STDIN
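
In simplified form the pattern looks like this (a shell sketch; the real job uses Perl, and the variable and script names are made up):

# one SQL*Plus instance per database login, fed with SQL commands via STDIN
sqlplus -s "${DB_USER}/${DB_PASSWORD}@${DB_ALIAS}" <<'EOF'
@upgrade_step.sql
exit
EOF

If such SQL*Plus children are not reaped by their parent before it exits, they end up as the ‘<defunct>’ entries shown below.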

When the closure ends and Jenkins tries to stop the container, the following processes are left:

[testadmin@testhost] ~  # ps -efa | grep -vw grep | grep -w 47077
root      1725 47077  0 10:03 ?        00:00:00 [ps] <defunct>
root      1732 47077  0 10:03 ?        00:00:00 [docker-runc] <defunct>
root      2887 47077  0 10:04 ?        00:00:00 [sqlplus] <defunct>
root      2915 47077  0 10:04 ?        00:00:00 [sqlplus] <defunct>
root     47077 17349  0 10:03 ?        00:00:00 docker-containerd-shim 
                                                -namespace moby 
                                                -workdir /var/lib/docker/containerd/daemon/io.containerd.runtime.v1.linux/moby/1863503ca54f75168db8ce20c78b821c0e5280f07d59875e8f651db4f0b67d9f 
                                                -address /var/run/docker/containerd/docker-containerd.sock 
                                                -containerd-binary /usr/bin/docker-containerd 
                                                -runtime-root /var/run/docker/runtime-runc
root     47098 47077  0 10:03 pts/0    00:00:00 /dev/init -- cat
root     47506 47077  0 10:03 ?        00:00:00 [sh] <defunct>
[testadmin@testhost] ~  #

and the ‘docker stop’ command is aborted by Jenkins after a timeout of 180 seconds.

To clean up the remaining processes of the container, this docker-containerd-shim process has to be killed with SIGKILL, for example as sketched below.
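
A sketch of that cleanup (the shim’s command line contains the full container ID, as visible in the ps output above; this needs root, because the shim runs as root):

# find the docker-containerd-shim belonging to the hanging container and kill it
CONTAINER_ID=$(docker inspect --format '{{.Id}}' <container-name>)
SHIM_PID=$(pgrep -f "docker-containerd-shim.*${CONTAINER_ID}")
kill -KILL "${SHIM_PID}"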

Note
We observed this issue on our recently installed CentOS server:

  • CentOS release 7.5.1804

  • environment-related parts from ‘docker info’:

    Server Version: 18.03.1-ce
    Storage Driver: overlay2
     Backing Filesystem: extfs
     Supports d_type: true
     Native Overlay Diff: true
    Logging Driver: json-file
    Cgroup Driver: cgroupfs
    Plugins:
     Volume: local
     Network: bridge host macvlan null overlay
     Log: awslogs fluentd gcplogs gelf journald json-file logentries splunk syslog
    Swarm: inactive
    Runtimes: runc
    Default Runtime: runc
    Init Binary: docker-init
    containerd version: 773c489c9c1b21a6d78b5c538cd395416ec50f88
    runc version: 4fc53a81fb7c994640722ac585fa9ca548971871
    init version: 949e6fa
    Security Options:
     seccomp
      Profile: default
    Kernel Version: 3.10.0-862.6.3.el7.x86_64
    Operating System: CentOS Linux 7 (Core)
    OSType: linux
    Architecture: x86_64
    CPUs: 48
    Total Memory: 377.6GiB
    Docker Root Dir: /var/lib/docker
    Debug Mode (client): false
    Debug Mode (server): false
    Registry: https://index.docker.io/v1/
    Experimental: false
    Insecure Registries:
     127.0.0.0/8
    Live Restore Enabled: false
    

The behavior on other hosts is similar with respect to the multiple processes with parent PID 0, but on those hosts we did not observe containers hanging on shutdown or a comparable number of zombie processes.

For comparison, the corresponding ‘docker info’ extract from one of these other hosts:

  Server Version: 17.05.0-ce
  Storage Driver: devicemapper
   Pool Name: dock-thinpool
   Pool Blocksize: 524.3kB
   Base Device Size: 16.11GB
   Backing Filesystem: xfs
   Data file:
   Metadata file:
   Udev Sync Supported: true
   Deferred Removal Enabled: true
   Deferred Deletion Enabled: true
   Deferred Deleted Device Count: 0
   Library Version: 1.02.140-RHEL7 (2017-05-03)
  Logging Driver: json-file
  Cgroup Driver: cgroupfs
  Plugins:
   Volume: local
   Network: bridge host macvlan null overlay
  Swarm: inactive
  Runtimes: runc
  Default Runtime: runc
  Init Binary: docker-init
  containerd version: 9048e5e50717ea4497b757314bad98ea3763c145
  runc version: 9c2d8d184e5da67c95d601382adf14862e4f2228
  init version: 949e6fa
  Security Options:
   seccomp
    Profile: default
  Kernel Version: 3.10.0-693.11.6.el7.x86_64
  Operating System: CentOS Linux 7 (Core)
  OSType: linux
  Architecture: x86_64
  CPUs: 48
  Total Memory: 377.6GiB
  Docker Root Dir: /var/lib/docker
  Debug Mode (client): false
  Debug Mode (server): false
  Registry: https://index.docker.io/v1/
  Experimental: false
  Insecure Registries:
   127.0.0.0/8
  Live Restore Enabled: false

Any updates on this issue? I seem to have run into an issue running Docker that looks like this. Any workarounds? I am running Docker version 18.06.1-ce.

johnl

I didn’t receive any confirmation or update on that topic.

As a workaround I created images that automatically start an SSH daemon and modified the ‘docker exec’ call to start commands via “ssh 127.0.0.1 ${COMMAND}”.
This way the started processes are children of the sshd process and thus descendants of the container’s init process, which can reap them.
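
A rough sketch of this setup (image contents, key setup, and the wrapper are illustrative, not the exact implementation):

# in the image: install sshd, generate host keys, set up key-based
# authentication for 127.0.0.1, and start sshd under the container's
# init process, e.g. from the entrypoint

# on the Jenkins side, replace a plain ‘docker exec <id> <command>’ with:
docker exec <container-id> ssh -o StrictHostKeyChecking=no 127.0.0.1 "${COMMAND}"

The ssh client started by ‘docker exec’ is still a child of the shim, but it is short-lived; the actual work processes run under sshd and are therefore descendants of the container’s init process.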