Hello, while running Docker on a GPU server, I found that the Docker service kept restarting itself. Checking the kernel logs, dockerd was being killed by the OOM killer. After further testing, we discovered that even with no containers running, the CPU and memory usage of the dockerd process itself grows steadily until memory is exhausted, at which point the kernel kills the daemon and it is restarted. A basic stress test of the CPU itself completed without any errors.
Below are the kernel OOM messages and the CPU usage snapshots we captured.
[200556.431286] nginx invoked oom-killer: gfp_mask=0x100cca(GFP_HIGHUSER_MOVABLE), order=0, oom_score_adj=0
[200556.431305] oom_kill_process.cold+0xb/0x10
[200556.431523] [ pid ] uid tgid total_vm rss pgtables_bytes swapents oom_score_adj name
[200556.432169] oom-kill:constraint=CONSTRAINT_NONE,nodemask=(null),cpuset=4abbd780ff360c72f7e43c9eca5c03d23f5968668e8c167d9c28e77ff7de1e75,mems_allowed=0-1,global_oom,task_memcg=/system.slice/docker.service,task=dockerd,pid=108675,uid=0
[200556.434281] Out of memory: Killed process 108675 (dockerd) total-vm:1045589968kB, anon-rss:513442048kB, file-rss:0kB, shmem-rss:0kB, UID:0 pgtables:1736844kB oom_score_adj:-500
[200575.428363] oom_reaper: reaped process 108675 (dockerd), now anon-rss:0kB, file-rss:0kB, shmem-rss:0kB
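One diagnostic we are planning to try (sketched below; the socket path and endpoints assume a default install, and the profile file names are our own) is enabling dockerd's debug mode via `"debug": true` in /etc/docker/daemon.json, reloading the daemon with SIGHUP, and then pulling Go pprof profiles over the Docker API socket to see what the daemon is accumulating:

```shell
#!/bin/sh
# Sketch: capture dockerd pprof profiles once debug mode is enabled.
# Prerequisite (done separately):
#   /etc/docker/daemon.json contains { "debug": true }
#   sudo kill -HUP "$(pidof dockerd)"   # dockerd reloads its config on SIGHUP
SOCK=/var/run/docker.sock

# Build the URL for a given pprof endpoint on the daemon's API socket.
pprof_url() {
    echo "http://localhost/debug/pprof/$1"
}

# Only contact the daemon when explicitly asked, so this file can be
# inspected or sourced safely on machines without Docker.
if [ "${1:-}" = "run" ]; then
    # Heap profile: what the daemon has allocated and not freed.
    curl --unix-socket "$SOCK" -o heap.pprof "$(pprof_url heap)"
    # Goroutine dump: useful if the leak is runaway goroutines.
    curl --unix-socket "$SOCK" -o goroutines.txt "$(pprof_url 'goroutine?debug=2')"
    # Then inspect offline with: go tool pprof -top heap.pprof
fi
```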
2025-04-28 03:56:32 - CPU Usage: 18.3%
Top 10 Processes by CPU Usage:
422357 root 20 0 40.0g 4.0g 55964 S 2435 0.8 766:26.35 dockerd
430481 root 20 0 10932 5420 3296 R 23.5 0.0 0:00.06 top
406880 root 20 0 0 0 0 I 5.9 0.0 0:02.25 kworker/103:1-events
422986 root 20 0 0 0 0 I 5.9 0.0 0:04.39 kworker/101:2-events
424181 root 20 0 1238188 13164 9660 S 5.9 0.0 0:00.69 containerd-shim
1 root 20 0 172236 11260 6256 S 0.0 0.0 0:40.84 systemd
2 root 20 0 0 0 0 S 0.0 0.0 0:08.22 kthreadd
3 root 0 -20 0 0 0 I 0.0 0.0 0:00.00 rcu_gp
4 root 0 -20 0 0 0 I 0.0 0.0 0:00.00 rcu_par_gp
6 root 0 -20 0 0 0 I 0.0 0.0 0:00.00 kworker/0:0H-kblockd
2025-04-28 03:57:33 - CPU Usage: 20%
Top 10 Processes by CPU Usage:
422357 root 20 0 40.0g 4.1g 55964 S 2539 0.8 788:32.14 dockerd
430543 root 20 0 10932 5288 3164 R 22.2 0.0 0:00.06 top
406880 root 20 0 0 0 0 I 5.6 0.0 0:02.31 kworker/103:1-events
422409 root 20 0 0 0 0 I 5.6 0.0 0:04.27 kworker/86:0-events
422720 root 20 0 0 0 0 I 5.6 0.0 0:04.25 kworker/99:2-events
1 root 20 0 172236 11260 6256 S 0.0 0.0 0:40.84 systemd
2 root 20 0 0 0 0 S 0.0 0.0 0:08.22 kthreadd
3 root 0 -20 0 0 0 I 0.0 0.0 0:00.00 rcu_gp
4 root 0 -20 0 0 0 I 0.0 0.0 0:00.00 rcu_par_gp
6 root 0 -20 0 0 0 I 0.0 0.0 0:00.00 kworker/0:0H-kblockd
2025-04-28 08:30:37 - CPU Usage: 70.2%
Top 10 Processes by CPU Usage:
422357 root 20 0 212.5g 170.3g 56436 S 6789 33.8 13194:56 dockerd
108450 root 20 0 3394404 28804 6992 S 100.0 0.0 242:54.77 glances
456224 root 20 0 11060 5276 3148 R 22.2 0.0 0:00.06 top
432 root rt 0 0 0 0 S 5.6 0.0 0:11.17 migrati+
420642 root 20 0 0 0 0 I 5.6 0.0 0:47.44 kworker+
422409 root 20 0 0 0 0 I 5.6 0.0 0:45.84 kworker+
422872 root 20 0 0 0 0 I 5.6 0.0 0:47.08 kworker+
423345 root 20 0 0 0 0 I 5.6 0.0 0:46.61 kworker+
425705 root 20 0 0 0 0 I 5.6 0.0 0:46.30 kworker+
444266 root 20 0 0 0 0 I 5.6 0.0 0:23.10 kworker+
2025-04-28 08:31:37 - CPU Usage: 67.8%
Top 10 Processes by CPU Usage:
422357 root 20 0 212.5g 170.3g 56436 S 2811 33.8 13248:42 dockerd
108450 root 20 0 3394404 28508 6992 S 100.0 0.0 243:04.75 glances
456435 root 20 0 11060 5368 3236 R 16.7 0.0 0:00.06 top
456438 hzau 20 0 8468 5252 3548 S 16.7 0.0 0:00.03 bash
423642 www-data 20 0 9092 2680 1548 S 5.6 0.0 0:02.42 nginx
451282 root 20 0 0 0 0 I 5.6 0.0 0:10.74 kworker+
1 root 20 0 172236 11760 6256 S 0.0 0.0 0:43.00 systemd
2 root 20 0 0 0 0 S 0.0 0.0 0:09.48 kthreadd
3 root 0 -20 0 0 0 I 0.0 0.0 0:00.00 rcu_gp
4 root 0 -20 0 0 0 I 0.0 0.0 0:00.00 rcu_par+
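For context, the snapshots above were produced by a monitoring loop along these lines (a reconstruction, not our exact script; the log path and 60-second interval are assumptions):

```shell
#!/bin/sh
# Sketch of the periodic CPU monitor that produced the snapshots above.
# /var/log/cpu-monitor.log and the 60 s interval are illustrative assumptions.

# Extract overall CPU usage (100 - idle) from the "%Cpu(s)" line of `top -bn1`.
parse_cpu_usage() {
    awk '/%Cpu\(s\)/ {print 100 - $8}'
}

# One timestamped snapshot: overall usage plus the top 10 processes by CPU.
snapshot() {
    printf '%s - CPU Usage: %s%%\n' \
        "$(date '+%Y-%m-%d %H:%M:%S')" \
        "$(top -bn1 | parse_cpu_usage)"
    echo "Top 10 Processes by CPU Usage:"
    top -bn1 -o %CPU | head -17 | tail -10   # drop top's own summary/header lines
}

# Run the loop only when invoked with "run", so the helpers can be
# sourced or tested without starting an infinite loop.
if [ "${1:-}" = "run" ]; then
    while :; do
        snapshot >> /var/log/cpu-monitor.log
        sleep 60
    done
fi
```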
Finally, we tried reinstalling Docker and even reinstalling the server's operating system, but the issue persisted. We are not sure where to look next. Has anyone encountered a similar problem?