I have a Jenkins pipeline that builds and runs a Docker container, which writes some lines to a file.
After three hours and five minutes (exactly 185 minutes every time), the pipeline is killed with exit code 137, but without any OOM indication.
I managed to reproduce the error with the minimal code shown below, but I cannot figure out what is killing the container or why. I also ran a simpler pipeline containing just a few echo commands, and it ran for more than 6 hours without any interruption from Jenkins.
Do you have any idea why it’s being killed like that? Could it be because the file is being written to for that long?
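For context on the exit code: by shell convention, a status above 128 means the process was terminated by a signal, with the signal number being the status minus 128. So 137 corresponds to signal 9 (SIGKILL). Here is a minimal sketch of that arithmetic in plain sh; the `signal_from_status` helper is a name I made up for illustration, and returning 0 for "no signal" is just a convention of this sketch.

```shell
#!/bin/sh
# Decode a shell exit status into the number of the signal that caused it.
# Convention: a status above 128 means "terminated by signal (status - 128)".
signal_from_status() {
    status=$1
    if [ "$status" -gt 128 ]; then
        echo $((status - 128))
    else
        # Hypothetical convention for this sketch: 0 means "not killed by a signal".
        echo 0
    fi
}

sig=$(signal_from_status 137)
echo "exit status 137 -> signal $sig"

# Ask the shell for the name of that signal number.
kill -l "$sig"
```

This only tells me which signal was delivered, not who sent it, which is the part I cannot work out.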
Here’s the folder structure of the minimal example:
main
├── src
│ └── hello.sh
├── Jenkinsfile
├── Dockerfile
└── Makefile
Here’s the content of hello.sh. This code simply writes "hello world!" to a file and to standard output every minute.
#!/bin/sh
touch output.txt
while :
do
    echo "hello world!" | tee output.txt
    sleep 60
done
I got the same behaviour with the simpler echo "hello world!" > output.txt
Here’s the Dockerfile that runs the shell script:
FROM alpine
WORKDIR /src
ADD src/ $WORKDIR
CMD ["./hello.sh"]
Here’s the Makefile that builds and runs the Docker container:
.PHONY: all
all: build

# When run locally docker_tag won't be set so we should create it
# When run in Jenkins the Jenkinsfile defines this appropriately
docker_tag ?= "localtest-$(shell git rev-parse --short HEAD)"

build:
	docker build -t $(docker_tag) .
	docker run --rm $(docker_tag)
Here’s the Jenkinsfile, which invokes the Makefile:
#!/usr/bin/env groovy
env.docker_tag = "test"
node {
    stage('Test') {
        checkout scm
        try {
            sh 'make'
        } catch (err) {
            echo "Got bad result"
            currentBuild.result = "UNSTABLE"
        } finally {
            sh "docker stop $docker_tag"
            sh "docker rm $docker_tag"
        }
    }
}
And here’s the relevant part of the output:
[Pipeline] sh
+ make
docker build -t d95b56c-cdtrain-timeout-test-master-4 .
Sending build context to Docker daemon 90.62kB
Step 1/4 : FROM alpine
latest: Pulling from library/alpine
Digest: sha256:eb3e4e175ba6d212ba1d6e04fc0782916c08e1c9d7b45892e9796141b1d379ae
Status: Downloaded newer image for alpine:latest
---> 021b3423115f
Step 2/4 : WORKDIR /src
---> Using cache
---> 09f0e2434bbd
Step 3/4 : ADD src/ $WORKDIR
---> 15a75f7421c4
Step 4/4 : CMD ["./hello.sh"]
---> Running in 28b77fee50ef
Removing intermediate container 28b77fee50ef
---> 5ff24b5feeda
Successfully built 5ff24b5feeda
Successfully tagged d95b56c-cdtrain-timeout-test-master-4:latest
docker run --rm d95b56c-cdtrain-timeout-test-master-4
hello world!
hello world!
hello world!
hello world!
...
...
...
hello world!
hello world!
hello world!
make: *** [build] Error 137
[Pipeline] echo
Got bad result
[Pipeline] sh
+ docker stop d95b56c-cdtrain-timeout-test-master-4
Error response from daemon: No such container: d95b56c-cdtrain-timeout-test-master-4
[Pipeline] }
[Pipeline] // stage
[Pipeline] }
[Pipeline] // node
[Pipeline] End of Pipeline
ERROR: script returned exit code 1
[Bitbucket] Notifying commit build result
[Bitbucket] Build result notified
Finished: FAILURE
Docker version: 20.10.7, build f0df350
Update: here are the Docker events related to this run and container:
# docker events --since 24h | grep 32f3ea8a4dcef64b44dc2d501e2fa6e61711487d5dda47f504fad33c88edd804
2021-08-10T14:14:40.043159249Z container create 32f3ea8a4dcef64b44dc2d501e2fa6e61711487d5dda47f504fad33c88edd804 (image=d95b56c-cdtrain-timeout-test-master-4, name=gallant_cray)
2021-08-10T14:14:40.044824402Z container attach 32f3ea8a4dcef64b44dc2d501e2fa6e61711487d5dda47f504fad33c88edd804 (image=d95b56c-cdtrain-timeout-test-master-4, name=gallant_cray)
2021-08-10T14:14:40.055127905Z network connect 10054940a891b2d3c2831d09465302ba2d2a32d7c5a4d7215b03db1de29f5808 (container=32f3ea8a4dcef64b44dc2d501e2fa6e61711487d5dda47f504fad33c88edd804, name=bridge, type=bridge)
2021-08-10T14:14:40.359326266Z container start 32f3ea8a4dcef64b44dc2d501e2fa6e61711487d5dda47f504fad33c88edd804 (image=d95b56c-cdtrain-timeout-test-master-4, name=gallant_cray)
2021-08-10T17:20:01.571904701Z container kill 32f3ea8a4dcef64b44dc2d501e2fa6e61711487d5dda47f504fad33c88edd804 (image=d95b56c-cdtrain-timeout-test-master-4, name=gallant_cray, signal=9)
2021-08-10T17:20:01.680986792Z container die 32f3ea8a4dcef64b44dc2d501e2fa6e61711487d5dda47f504fad33c88edd804 (exitCode=137, image=d95b56c-cdtrain-timeout-test-master-4, name=gallant_cray)
2021-08-10T17:20:01.729749822Z network disconnect 10054940a891b2d3c2831d09465302ba2d2a32d7c5a4d7215b03db1de29f5808 (container=32f3ea8a4dcef64b44dc2d501e2fa6e61711487d5dda47f504fad33c88edd804, name=bridge, type=bridge)
2021-08-10T17:20:01.751531491Z container destroy 32f3ea8a4dcef64b44dc2d501e2fa6e61711487d5dda47f504fad33c88edd804 (image=d95b56c-cdtrain-timeout-test-master-4, name=gallant_cray)
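To double-check the 185-minute figure against the events above, the gap between the container start (14:14:40) and container kill (17:20:01) timestamps can be worked out with a small sketch. The `to_seconds` helper is a name I made up for illustration; it assumes both timestamps fall on the same day and drops the fractional seconds.

```shell
#!/bin/sh
# Convert an HH:MM:SS time of day into seconds since midnight.
# One leading zero is stripped from each field so values like "08" or "09"
# are not misread as invalid octal numbers in shell arithmetic.
to_seconds() {
    h=${1%%:*}
    rest=${1#*:}
    m=${rest%%:*}
    s=${rest#*:}
    echo $(( ${h#0} * 3600 + ${m#0} * 60 + ${s#0} ))
}

# Times taken from the "container start" and "container kill" events.
start_s=$(to_seconds "14:14:40")
killed_s=$(to_seconds "17:20:01")
elapsed=$((killed_s - start_s))

echo "elapsed: ${elapsed}s = $((elapsed / 60)) min $((elapsed % 60)) s"
```

This gives 11121 seconds, i.e. 185 minutes and 21 seconds, which is consistent with the 185 minutes I see every time.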
Thank you for your help