Linux kernel becomes non-responsive on high-end (200+ hw threads) platform for Docker containers with a low CPU (CFS) quota and a large number of threads

ganesanvivek · January 14, 2019, 10:18am

Hello,
Greetings! I observe a prolonged kernel hang (at times lasting more than an hour) when I start an application in a Docker container with a low CPU (CFS) quota, e.g. 400 millicores, where the application starts hundreds of threads on a high-end hardware platform (upwards of 200 hw threads). Please note that I leave cpuset.cpus at its default (200+ cores) to avoid the complexity of CPU partitioning. My real interest is in enforcing CPU limits using cgroups.

I suspect the kernel scheduler is thrashing (switching threads in and out) while enforcing the cgroup CPU limit across the 200+ cores with 200+ ready-to-run application threads. If I increase the CPU limit (doubling it, for example), I don't hit this issue. Likewise, if I restrict the number of application threads to tens instead of hundreds, I don't hit this freeze.

Is there any other kernel parameter tuning that would help mitigate this behaviour? Thanks for any pointers you can share.
Thanks much,
Ganesan
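
For anyone trying to reason about the numbers in this report, here is a rough back-of-the-envelope sketch in Python. It is not a diagnosis. It assumes the default CFS period of 100 ms and the default `kernel.sched_cfs_bandwidth_slice_us` of 5 ms, neither of which is stated in the post, and the function name `cfs_budget` is purely illustrative.

```python
# Rough CFS bandwidth arithmetic for a container with a CPU quota.
# Assumptions (not stated in the post): 100 ms CFS period and a 5 ms
# bandwidth slice, which are the usual kernel defaults.

def cfs_budget(millicores, threads, period_us=100_000, slice_us=5_000):
    """Return the per-period CPU budget implied by a millicore limit."""
    # Total CPU time the whole cgroup may consume in one period.
    quota_us = millicores * period_us // 1000
    # Average share if every thread were runnable for the whole period.
    per_thread_us = quota_us / threads
    # Number of slice-sized chunks the quota divides into (rounded up).
    # Per-CPU run queues draw runtime from the global pool in such chunks.
    slices = -(-quota_us // slice_us)
    return {
        "quota_us": quota_us,
        "per_thread_us": per_thread_us,
        "slices": slices,
    }


if __name__ == "__main__":
    # The case from the post: 400 millicores, about 200 runnable threads.
    print(cfs_budget(400, 200))
    # The workaround from the post: doubling the limit.
    print(cfs_budget(800, 200))
```

Under these assumptions, 400 millicores gives the whole container 40 ms of CPU time per 100 ms period, which is about 0.2 ms per thread with 200 runnable threads and only 8 slice-sized chunks to hand out across 200+ CPUs. Doubling the limit doubles each of those figures, which is at least consistent with the observation that a higher limit avoids the freeze.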
