Docker Community Forums


Dockerd using 100% CPU

Hey folks - I’m trying to debug why dockerd on one of our bare metal servers has been pegging our CPU at 100% for a few days now. None of the containers seem to be utilizing a lot of CPU (from running docker stats). The dockerd logs don’t show anything helpful from what I can see. What else should I look for? Any help is appreciated. Here’s some docker info from our host, in case it’s useful:

# docker info
Containers: 4
 Running: 4
 Paused: 0
 Stopped: 0
Images: 4
Server Version: 18.06.1-ce
Storage Driver: overlay2
 Backing Filesystem: extfs
 Supports d_type: true
 Native Overlay Diff: true
Logging Driver: json-file
Cgroup Driver: cgroupfs
Plugins:
 Volume: local
 Network: bridge host macvlan null overlay
 Log: awslogs fluentd gcplogs gelf journald json-file logentries splunk syslog
Swarm: inactive
Runtimes: runc
Default Runtime: runc
Init Binary: docker-init
containerd version: 468a545b9edcd5932818eb9de8e72413e616e86e
runc version: 69663f0bd4b60df09991c08812a60108003fa340
init version: fec3683
Security Options:
 apparmor
 seccomp
  Profile: default
Kernel Version: 4.4.0-157-generic
Operating System: Ubuntu 16.04.6 LTS
OSType: linux
Architecture: x86_64
CPUs: 20
Total Memory: 31.3GiB
Name: basil
ID: PHHV:SHFX:DT6S:4BW4:UEWF:MXJM:32YT:NSDI:2J3F:EBF6:V3IB:IRIQ
Docker Root Dir: /var/lib/docker
Debug Mode (client): false
Debug Mode (server): false
Username: bowerybot
Registry: https://index.docker.io/v1/
Labels:
Experimental: false
Insecure Registries:
 127.0.0.0/8
Live Restore Enabled: false

WARNING: No swap limit support
# docker version
Client:
 Version:           18.06.1-ce
 API version:       1.38
 Go version:        go1.10.3
 Git commit:        e68fc7a
 Built:             Tue Aug 21 17:24:56 2018
 OS/Arch:           linux/amd64
 Experimental:      false

Server:
 Engine:
  Version:          18.06.1-ce
  API version:      1.38 (minimum version 1.12)
  Go version:       go1.10.3
  Git commit:       e68fc7a
  Built:            Tue Aug 21 17:23:21 2018
  OS/Arch:          linux/amd64
  Experimental:     false

One reason can be corrupt json-log files.

Use

find /var/lib/docker/containers/ -name "*-json.log" -exec bash -c 'jq . {} > /dev/null 2>&1 || echo "file corrupt: {}"' \;

to identify whether all files are still JSON compliant. Delete all corrupt files.
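If jq isn’t installed on the host, the same check can be sketched with python3 -m json.tool as the validator (the --json-lines flag matters because these logs are one JSON object per line; passing the filename as "$1" rather than splicing {} into the script also keeps unusual filenames safe — this is an untested variant, adjust the path to your Docker root dir):

```shell
# Validate every json-file container log; print only the ones that fail to parse.
find /var/lib/docker/containers/ -name "*-json.log" \
  -exec bash -c 'python3 -m json.tool --json-lines "$1" > /dev/null 2>&1 || echo "file corrupt: $1"' _ {} \;
```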

Wow good call that seems to have fixed it! Thank you so much for the help. Can you explain how that command finds corrupted log files? And why they may have been corrupted in the first place? I need to learn more about this. Thanks

I try my best 🙂

The find command searches for all files that end with -json.log in /var/lib/docker/containers and sends them to jq to parse; the parsed output is redirected to /dev/null so it doesn’t pollute the screen. The || is a boolean OR that makes use of jq’s return code: if it’s 0, parsing succeeded; if it’s >0, parsing failed and the output "file corrupt: {}" gets printed. The {} characters are special characters in find -exec that get replaced by the actual filename.

Does that make sense?
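To see the exit-code logic in isolation, here is a small demo with two hypothetical files in /tmp, one valid and one truncated. It uses python3 -m json.tool as a stand-in parser in case jq isn’t installed; the || pattern is identical:

```shell
# One valid JSON log line, one truncated mid-write
echo '{"log":"hello","stream":"stdout"}' > /tmp/good-json.log
echo '{"log":"hel' > /tmp/bad-json.log

# Only the file whose parse fails (non-zero exit code) triggers the echo
for f in /tmp/good-json.log /tmp/bad-json.log; do
  python3 -m json.tool "$f" > /dev/null 2>&1 || echo "file corrupt: $f"
done
# prints: file corrupt: /tmp/bad-json.log
```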

We had it in the past when the filesystem was full and the logs got stuck mid-write of a log entry.
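To keep json-file logs from filling the disk in the first place, log rotation can be enabled for the json-file driver in /etc/docker/daemon.json. The values below are illustrative, not a recommendation; the daemon needs a restart afterwards, and existing containers must be recreated to pick up the new defaults:

```json
{
  "log-driver": "json-file",
  "log-opts": {
    "max-size": "50m",
    "max-file": "3"
  }
}
```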

Yes that makes perfect sense. Thank you for the explanation. Cheers

This post saved my bacon, ty <3

We have been running into this issue as well; in particular, it is only happening on the servers running a single Nginx container that serves as the load balancer for our services. I’ve straced the docker daemon process, and while it’s in the high-CPU state, the process seems to be spinning while waiting for some kind of lock:

futex(0x2570a30, FUTEX_WAIT, 0, NULL) = 0
I am wondering if it might be related to Docker’s logging system, as these load balancers are receiving a lot of traffic and producing a lot of logs.
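One quick way to check whether logging volume is a plausible culprit is to look at the size of the json-file logs themselves (the path assumes the default Docker root dir from the docker info output above):

```shell
# Show the ten largest container log files, biggest first
du -h /var/lib/docker/containers/*/*-json.log 2>/dev/null | sort -rh | head -n 10
```

A multi-gigabyte log file for the Nginx container would point squarely at the logging path.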