Docker Community Forums


Nodes don't update stats to UCP

Cluster: 1 manager, 4 workers, 1 DTR
OS: RHEL 7.7
Docker version: 19.03.5

What happened was that, from one day to the next, 3 workers and the DTR node got the status "Node-local UCP component status was last updated XXXXXX seconds ago", and the Dynatrace logs pointed to slowness in the cluster services. I rebooted one of the problem nodes once, but nothing changed.

Finally, we removed one of the problem nodes and added it back to the cluster. After that, the ucp-agent container cannot become HEALTHY and restarts every 10-12 seconds, although deployments made by the teams are still distributed normally across all workers.
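To see why ucp-agent keeps restarting, its exit status and recent logs can be pulled on the affected worker. A minimal sketch, assuming the standard ucp-agent container name from a stock UCP install (adjust the filter if yours differs):

```shell
# Show the agent container with its status (restart loop shows as "Restarting" / recent exit)
docker ps -a --filter name=ucp-agent

# Tail the most recent log lines to catch the error emitted before each restart
docker logs --tail 50 "$(docker ps -aq --filter name=ucp-agent | head -n1)"
```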


– Logs begin at Tue 2020-04-28 17:08:50 -03, end at Thu 2020-04-30 13:57:54 -03. –

Apr 30 13:57:48 MANAGER dockerd[20799]: time="2020-04-30T13:57:48.329302748-03:00" level=warning msg="grpc: Server.Serve failed to complete security handshake from \"\": remote error: tls: bad certificate" module=grpc

Apr 30 13:57:48 MANAGER dockerd[20799]: time="2020-04-30T13:57:48.613578850-03:00" level=error msg="failed to sign CSR" error="unable to perform certificate signing request: Post x509: certificate signed by unknown authority (possibly because of \"x509: ECDSA verification failure\" while trying to verify candidate authority certificate \"swarm-ca\")" method="(*Server).signNodeCert" module=ca

Apr 30 13:57:49 MANAGER dockerd[20799]: time="2020-04-30T13:57:49.266491155-03:00" level=warning msg="grpc: Server.Serve failed to complete security handshake from \"\": remote error: tls: bad certificate" module=grpc


– Logs begin at Thu 2020-04-30 13:19:46 -03, end at Thu 2020-04-30 15:28:37 -03. –

Apr 30 15:26:55 WORKER4 dockerd[2442]: time="2020-04-30T15:26:55.808051780-03:00" level=warning msg="7aeae2455217fe0aee791e5955be43f21cc4820faa9ef91c4e9a156b95d3ec01 cleanup: failed to unmount IPC: umount /var/lib/docker/containers/7aeae2455217fe0aee791e5955be43f21cc4820faa9ef91c4e9a156b95d3ec01/mounts/shm, flags: 0x2: no such file or directory"

Apr 30 15:26:55 WORKER4 dockerd[2442]: time="2020-04-30T15:26:55.824540165-03:00" level=error msg="fatal task error" error="task: non-zero exit (1)" module=node/agent/taskmanager

Apr 30 15:27:11 WORKER4 dockerd[2442]: time="2020-04-30T15:27:11.857760984-03:00" level=info msg="ignoring event" module=libcontainerd namespace=moby topic=/tasks/delete type="*events.TaskDelete"


{"level":"info","msg":"Loading local node Docker Info","time":"2020-04-30T18:31:02Z"}

{"level":"info","msg":"Loading local node TLS configuration","time":"2020-04-30T18:31:02Z"}

{"level":"info","msg":"Loading Node TLS Config","time":"2020-04-30T18:31:02Z"}

{"level":"info","msg":"UCP Node Certs do not exist - falling back to Swarm-mode node certs","time":"2020-04-30T18:31:02Z"}

{"level":"info","msg":"Connecting to etcd cluster at addresses []","time":"2020-04-30T18:31:02Z"}

{"level":"info","msg":"Attempting to connect to the etcd cluster at the following addresses: []","time":"2020-04-30T18:31:02Z"}

Seems like certificate rotation got messed up. Been there, tried to fix it, and failed…

Even though there is a well-described solution in the Success Center, I never managed to correct this error myself. Instead of trying it yourself, I would strongly advise raising a support ticket.

The cause is related to the ucp-metrics pods not being able to rotate their certificates.

Upgrade to the latest UCP version (this has been fixed in 3.1.15, 3.2.8, and 3.3.2).

Otherwise, you can use kubectl to delete all ucp-metrics pods; the DaemonSet will restart them automatically, and you will see all metrics back on your dashboard:

kubectl delete pod/ucp-metrics-xxxxx pod/ucp-metrics-yyyyy pod/ucp-metrics-zzzzz -n kube-system
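If the exact pod names aren't handy, the pods can also be targeted by label so the DaemonSet recreates them all in one go. A sketch, assuming these pods carry the k8s-app=ucp-metrics label (verify with the first command before deleting):

```shell
# Confirm the label selector matches the ucp-metrics pods
kubectl get pods -n kube-system -l k8s-app=ucp-metrics

# Delete them all; the DaemonSet controller schedules fresh replacements
kubectl delete pods -n kube-system -l k8s-app=ucp-metrics

# Watch the replacements come back up
kubectl get pods -n kube-system -l k8s-app=ucp-metrics -w
```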