
UCP Kubernetes compose-api pod in state CrashLoopBackOff

On my Kubernetes cluster in Docker Enterprise 3.2.0, one of the pods does not start:

kube-system     compose-api-6c8c7c8fd8-2bq9x               0/1       CrashLoopBackOff   4
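
To get more detail than the status line, the pod's events and the logs of the previous (crashed) container instance can be inspected with kubectl (a rough sketch, assuming kubectl access to the cluster; the pod name is the one from the status line above):

$ kubectl -n kube-system describe pod compose-api-6c8c7c8fd8-2bq9x     # events and last state of the container
$ kubectl -n kube-system logs compose-api-6c8c7c8fd8-2bq9x --previous  # logs of the crashed container instance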

Here are the logs of the docker container:

$ docker ps -a | grep compose-api
9be7a74f1bcf        7f719dba281f                                             "/api-server --kub..."   27 seconds ago      Exited (255) 5 seconds ago                                                                                               fe0vmc2782/k8s_ucp-kube-compose-api_compose-api-6c8c7c8fd8-2bq9x_kube-system_c0c40495-1fec-11ea-be4b-0242ac110008_13
5eccffff0098        docker/ucp-pause:3.2.0                                   "/pause"                 About an hour ago   Up 46 minutes                                                                                                            fe0vmc2782/k8s_POD_compose-api-6c8c7c8fd8-2bq9x_kube-system_c0c40495-1fec-11ea-be4b-0242ac110008_0
$ docker logs 9be7a74f1bcf
I1216 10:59:48.585828       1 plugins.go:158] Loaded 2 mutating admission controller(s) successfully in the following order: NamespaceLifecycle,MutatingAdmissionWebhook.
I1216 10:59:48.585869       1 plugins.go:161] Loaded 1 validating admission controller(s) successfully in the following order: ValidatingAdmissionWebhook.
W1216 10:59:48.586056       1 client_config.go:549] Neither --kubeconfig nor --master was specified.  Using the inClusterConfig.  This might not work.
I1216 10:59:48.588492       1 balancer_v1_wrapper.go:52] ccBalancerWrapper: updating state and picker called by balancer: IDLE, 0xc00072e960
I1216 10:59:48.588550       1 clientconn.go:490] dialing to target with scheme: ""
I1216 10:59:48.588562       1 clientconn.go:490] could not get resolver for scheme: ""
I1216 10:59:48.588641       1 asm_amd64.s:1337] balancerWrapper: is pickfirst: false
I1216 10:59:48.588716       1 asm_amd64.s:1337] balancerWrapper: got update addr from Notify: [{10.139.7.226:12379 <nil>}]
I1216 10:59:48.588765       1 balancer_v1_wrapper.go:202] ccBalancerWrapper: new subconn: [{10.139.7.226:12379 0  <nil>}]
I1216 10:59:48.589038       1 balancer_conn_wrappers.go:120] balancerWrapper: handle subconn state change: 0xc0003edfd0, CONNECTING
I1216 10:59:48.589138       1 balancer_v1_wrapper.go:247] ccBalancerWrapper: updating state and picker called by balancer: CONNECTING, 0xc00072e960
W1216 10:59:48.591739       1 asm_amd64.s:1337] Failed to dial 10.139.7.226:12379: connection error: desc = "transport: authentication handshake failed: remote error: tls: bad certificate"; please retry.
I1216 10:59:48.591759       1 balancer_conn_wrappers.go:120] balancerWrapper: handle subconn state change: 0xc0003edfd0, TRANSIENT_FAILURE
I1216 10:59:48.591767       1 balancer_v1_wrapper.go:247] ccBalancerWrapper: updating state and picker called by balancer: TRANSIENT_FAILURE, 0xc00072e960
I1216 10:59:48.591773       1 balancer_conn_wrappers.go:120] balancerWrapper: handle subconn state change: 0xc0003edfd0, SHUTDOWN
I1216 10:59:48.591778       1 balancer_v1_wrapper.go:247] ccBalancerWrapper: updating state and picker called by balancer: TRANSIENT_FAILURE, 0xc00072e960
I1216 10:59:49.118807       1 balancer_v1_wrapper.go:52] ccBalancerWrapper: updating state and picker called by balancer: IDLE, 0xc0000acc60
I1216 10:59:49.118929       1 clientconn.go:490] dialing to target with scheme: ""
I1216 10:59:49.118935       1 clientconn.go:490] could not get resolver for scheme: ""
I1216 10:59:49.118968       1 asm_amd64.s:1337] balancerWrapper: is pickfirst: false
I1216 10:59:49.118990       1 asm_amd64.s:1337] balancerWrapper: got update addr from Notify: [{10.139.7.226:12379 <nil>}]
I1216 10:59:49.119003       1 balancer_v1_wrapper.go:202] ccBalancerWrapper: new subconn: [{10.139.7.226:12379 0  <nil>}]
I1216 10:59:49.119212       1 balancer_conn_wrappers.go:120] balancerWrapper: handle subconn state change: 0xc000227d00, CONNECTING
I1216 10:59:49.119227       1 balancer_v1_wrapper.go:247] ccBalancerWrapper: updating state and picker called by balancer: CONNECTING, 0xc0000acc60
W1216 10:59:49.121784       1 asm_amd64.s:1337] Failed to dial 10.139.7.226:12379: connection error: desc = "transport: authentication handshake failed: remote error: tls: bad certificate"; please retry.
I1216 10:59:49.121846       1 balancer_conn_wrappers.go:120] balancerWrapper: handle subconn state change: 0xc000227d00, TRANSIENT_FAILURE
I1216 10:59:49.121921       1 balancer_v1_wrapper.go:247] ccBalancerWrapper: updating state and picker called by balancer: TRANSIENT_FAILURE, 0xc0000acc60
I1216 10:59:49.121954       1 balancer_conn_wrappers.go:120] balancerWrapper: handle subconn state change: 0xc000227d00, SHUTDOWN
I1216 10:59:49.121974       1 balancer_v1_wrapper.go:247] ccBalancerWrapper: updating state and picker called by balancer: TRANSIENT_FAILURE, 0xc0000acc60
I1216 11:00:08.588771       1 asm_amd64.s:1337] balancerWrapper: got update addr from Notify: []
I1216 11:00:08.588839       1 balancer_v1_wrapper.go:217] ccBalancerWrapper: removing subconn
I1216 11:00:08.588870       1 asm_amd64.s:1337] balancerWrapper: got update addr from Notify: [{10.139.7.226:12379 <nil>}]
I1216 11:00:08.588882       1 balancer_v1_wrapper.go:202] ccBalancerWrapper: new subconn: [{10.139.7.226:12379 0  <nil>}]
I1216 11:00:08.588916       1 balancer_conn_wrappers.go:120] balancerWrapper: handle subconn state change: 0xc000311e50, CONNECTING
I1216 11:00:08.588923       1 balancer_v1_wrapper.go:247] ccBalancerWrapper: updating state and picker called by balancer: CONNECTING, 0xc00072e960
F1216 11:00:08.588928       1 storage_decorator.go:57] Unable to create storage backend: config (&{ /registry/docker.com/stacks {[10.139.7.226:12379] /etc/docker-compose/etcd/client.key /etc/docker-compose/etcd/client.crt /etc/docker-compose/etcd/ca.crt} false false 0xc0003b0000 <nil> <nil> 5m0s 1m0s}), err (context deadline exceeded)
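
The part that looks relevant is the repeated TLS handshake failure against the key-value store at 10.139.7.226:12379 ("remote error: tls: bad certificate"), which finally ends in the fatal "Unable to create storage backend ... context deadline exceeded". As a rough check (a sketch only; the certificate paths are taken from the fatal log line, the container ID from the docker ps output above, and if the paths are mounted secrets the files may have to be read from the node instead), the client certificate could be copied out of the exited container and verified against the bundled CA:

$ docker cp 9be7a74f1bcf:/etc/docker-compose/etcd ./compose-etcd-certs                 # copy the certs out of the exited container
$ openssl x509 -in ./compose-etcd-certs/client.crt -noout -subject -issuer -dates      # issuer and validity period of the client cert
$ openssl verify -CAfile ./compose-etcd-certs/ca.crt ./compose-etcd-certs/client.crt   # verify the client cert against the CA shipped with it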

How can I get the Pod up and running again?

Thank you,
Andreas

@ak23,

We are seeing the same errors in our cluster.
Did you ever find the root cause of this issue?

Thank you

Unfortunately, we were not able to find the root cause of this problem. Our internal support redeployed the cluster to fix it, and now all pods are OK.
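
For anyone hitting the same thing later: before a full redeploy, one check that might narrow down the root cause is to test the TLS handshake against the kv store endpoint from the log directly, using the same client certificate (a sketch only; the endpoint comes from the log above, and the certificate paths assume they were copied out of the container as sketched earlier in this thread):

$ openssl s_client -connect 10.139.7.226:12379 \
      -cert ./compose-etcd-certs/client.crt \
      -key ./compose-etcd-certs/client.key \
      -CAfile ./compose-etcd-certs/ca.crt </dev/null 2>&1 | grep -E 'Verify return|alert'
# A healthy endpoint/certificate pair should end with "Verify return code: 0 (ok)";
# a "bad certificate" alert here would point at an expired or wrongly issued client
# certificate rather than at the pod itself.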