GitLab Runner, Kubernetes, concurrency > 1

I develop using GitLab and have GitLab Runners in my on-prem environment.

There is a configuration option, "concurrent", which allows the runner to run multiple jobs at the same time.

As long as I leave it set to 1, everything works, but as soon as I increase the number I start to experience race conditions around Docker.
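
For reference, that setting lives at the top of the runner's config.toml (a minimal sketch; the names and values here are placeholders, not my real config):

  concurrent = 4                  # how many jobs this runner process may run at once

  [[runners]]
    name = "k8s-runner"           # placeholder
    executor = "kubernetes"
    [runners.kubernetes]
      image = "docker:latest"     # placeholder default job image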

Within the pod/container where Docker runs, I can see that two volumes have been mounted: one with my on-prem certs at /etc/ssl/certs (containing a file named ca.crt), and one at /certs.

So everything that works with one pod is there, and the same files are present when I run more than one pod. In Kubernetes the share is accessible from more than one pod as long as the volume is ReadWriteMany (RWX), which I have set. I can even modify the pipelines to write files into those folders, so the folders are definitely there… and yet I get frequent TLS-related errors:

docker build -t harbor.prod.k.home.net/list/server:0.0.1-2390 -t harbor.prod.k.home.net/list/server:latest .
ERROR: error during connect: Head "https://127.0.0.1:2376/_ping": tls: failed to verify certificate: x509: certificate signed by unknown authority (possibly because of "crypto/rsa: verification error" while trying to verify candidate authority certificate "docker:dind CA")
docker login harbor.prod.k.home.net -u robot$list+gitlab --password-stdin
WARNING! Your credentials are stored unencrypted in '/root/.docker/config.json'.
Configure a credential helper to remove this warning. See
https://docs.docker.com/go/credential-store/
Login Succeeded
docker push harbor.prod.k.home.net/list/server --all-tags
error during connect: Post "https://127.0.0.1:2376/v1.49/images/harbor.prod.k.home.net/list/server/push": tls: failed to verify certificate: x509: certificate signed by unknown authority (possibly because of "crypto/rsa: verification error" while trying to verify candidate authority certificate "docker:dind CA")

Remember, it works when I'm only running one instance, so the certs themselves clearly work. This behaves like a threading / lockfile issue.
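
For context, the jobs follow the usual docker:dind pattern, roughly like this (a sketch, not my exact pipeline; the image tags and variable values are assumptions):

  build:
    image: docker:24                      # docker CLI in the build container (tag is a placeholder)
    services:
      - docker:24-dind                    # the Docker daemon, run as a service container in the same pod
    variables:
      DOCKER_TLS_CERTDIR: "/certs"        # dind generates a CA plus server/client certs under here
      DOCKER_HOST: tcp://127.0.0.1:2376   # with the Kubernetes executor, services share the pod network
      DOCKER_TLS_VERIFY: "1"
      DOCKER_CERT_PATH: "/certs/client"   # the CLI verifies the daemon against the generated client certs
    script:
      - docker build -t harbor.prod.k.home.net/list/server:latest .

The docker CLI only trusts the CA it finds under DOCKER_CERT_PATH, which is where the "docker:dind CA" in the errors above comes from.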

I tried adding this to the job script:

  script:
    # split the combined on-prem CA bundle into individual PEM files
    - cd /etc/ssl/certs
    - awk 'BEGIN {c=0;} /BEGIN CERT/{c++} { print > "cert." c ".pem"}' < ca.crt
    - cd $CI_PROJECT_DIR
    # rebuild the system trust store so the split certs get picked up
    - update-ca-certificates

That succeeds, but I still see the same errors.

I just added a pipeline step to connect to the target server from the same pod:

echo "" | openssl s_client harbor.prod.k.home.net:443

And that shows that the needed certs are in place and are working.

SSL handshake has read 1510 bytes and written 410 bytes
Verification: OK

So why wouldn’t docker have what it needs?

docker build -t harbor.prod.k.home.net/list/worker:0.0.1-2433 -t harbor.prod.k.home.net/list/worker:latest .
ERROR: error during connect: Head "https://127.0.0.1:2376/_ping": tls: failed to verify certificate: x509: certificate signed by unknown authority (possibly because of "crypto/rsa: verification error" while trying to verify candidate authority certificate "docker:dind CA")

Someone explains the issue here:
https://stackoverflow.com/questions/75393206/issues-running-multiple-build-jobs-in-parallel-usin-dind

It looks like we need a way to stop the dind image from overwriting the certs when certs have already been provided.

I think the answer is in the image description on Docker Hub:

http://hub.docker.com/_/docker

TLS

Starting in 18.09+, the dind variants of this image will automatically generate TLS certificates in the directory specified by the DOCKER_TLS_CERTDIR environment variable.


To disable this image behavior, simply override the container command or entrypoint to run dockerd directly (... docker:dind dockerd ... or ... --entrypoint dockerd docker:dind ... ).
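
If I wanted to go that route, in .gitlab-ci.yml it would look something like this (an untested sketch; note it disables TLS entirely, so the daemon has to listen on 2375 and the client must stop expecting certs):

  services:
    - name: docker:dind
      entrypoint: ["dockerd"]                   # run dockerd directly; the cert-generating entrypoint never runs
      command: ["--host=tcp://0.0.0.0:2375", "--tls=false"]
  variables:
    DOCKER_HOST: tcp://127.0.0.1:2375           # plain TCP, no TLS
    DOCKER_TLS_CERTDIR: ""                      # nothing should look for generated certs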

The quickest/easiest solution was to use an emptyDir via the gitlab-runner config:

        [[runners.kubernetes.volumes.empty_dir]]
          name = "docker-certs"
          mount_path = "/certs/client"
          medium = "Memory"
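
Because /certs/client is now a per-pod emptyDir instead of a share every job pod can see, each concurrent job gets its own in-memory copy of the client certs, shared between the build container and the dind service inside that pod but not with any other pod. One job's freshly generated certs can no longer overwrite the certs another job is verifying against. The job-side variables stay the usual dind ones (a sketch; these are the standard values, nothing specific to my setup):

  variables:
    DOCKER_HOST: tcp://127.0.0.1:2376
    DOCKER_TLS_CERTDIR: "/certs"
    DOCKER_TLS_VERIFY: "1"
    DOCKER_CERT_PATH: "/certs/client"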