How to find the root cause of slow container creation?

We run a cluster of physical machines (CPU: 192) that execute jobs as Kubernetes pods. We find that creating a job's pod can take more than 1 minute.

  • Docker version: 20.10.7
  • I/O: await is at most 10; most of the time it is 2~6.
  • SSD
  • overlay2/d_type=true
  • CentOS 7.9 + kernel 3.10
  • Load is about 20.
  • Containers: 46 (42 running)
  • Images: 72 (360 Gi in total, including some large AI-related images); disk capacity is about 900 Gi or more.

We run a test every 5 minutes that starts an Nginx container in Docker.

  • The creation time is not stable; most of the time it is less than 1s.
  • But sometimes it is more than 10s, and occasionally even 30s.
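
For reference, the periodic test could be split into phases so that a slow run shows whether the time goes into `docker create` or `docker start`. This is only a minimal sketch, not our actual test script: the helper names `timed` and `probe_once` are made up for illustration, and it assumes the `docker` CLI is on PATH and the `nginx` image is already pulled.

```python
import subprocess
import time


def timed(cmd):
    """Run cmd and return (elapsed_seconds, stdout_text). Raises if cmd fails."""
    start = time.monotonic()
    result = subprocess.run(cmd, check=True, capture_output=True, text=True)
    return time.monotonic() - start, result.stdout.strip()


def probe_once(image="nginx"):
    """Time `docker create` and `docker start` separately for one container.

    Hypothetical helper: assumes the docker CLI is installed and the image
    is already present locally, so no pull time is included.
    """
    create_s, container_id = timed(["docker", "create", image])
    try:
        start_s, _ = timed(["docker", "start", container_id])
    finally:
        # Force-remove the probe container so probes do not accumulate.
        subprocess.run(["docker", "rm", "-f", container_id], capture_output=True)
    return {"create_s": create_s, "start_s": start_s}
```

Calling `probe_once()` from the 5-minute job and logging the two numbers next to the load and I/O await at that moment would show which phase is slow and whether it correlates with either metric.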

At first, we thought the load was the key factor.

Are there any other factors that we are missing? How can we find the root cause?

  • We plan to profile Docker while it creates a new container to get more information.
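
A rough sketch of how such a profile might be collected is below. It rests on assumptions we have not verified on our hosts: that dockerd serves Go pprof handlers under `/debug/pprof/` on its API socket (possibly only when the daemon runs in debug mode) and that the socket is at `/var/run/docker.sock`. The names `UnixHTTPConnection`, `pprof_path` and `fetch` are illustrative only.

```python
import http.client
import socket


class UnixHTTPConnection(http.client.HTTPConnection):
    """HTTPConnection that talks over a Unix domain socket instead of TCP."""

    def __init__(self, socket_path, timeout=60):
        super().__init__("localhost", timeout=timeout)
        self.socket_path = socket_path

    def connect(self):
        sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
        sock.settimeout(self.timeout)
        sock.connect(self.socket_path)
        self.sock = sock


def pprof_path(profile, **params):
    """Build a pprof URL path, e.g. /debug/pprof/goroutine?debug=2."""
    query = "&".join(f"{k}={v}" for k, v in sorted(params.items()))
    return f"/debug/pprof/{profile}" + (f"?{query}" if query else "")


def fetch(path, socket_path="/var/run/docker.sock", timeout=60):
    """GET path from the daemon socket; returns (status, body_bytes).

    Assumption: the daemon exposes the endpoint at this socket path.
    """
    conn = UnixHTTPConnection(socket_path, timeout=timeout)
    try:
        conn.request("GET", path)
        resp = conn.getresponse()
        return resp.status, resp.read()
    finally:
        conn.close()
```

If the endpoint is available, `fetch(pprof_path("goroutine", debug=2))` taken during a slow creation should return a text dump of goroutine stacks, which would show what the daemon is blocked on at that moment.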
