Host VM with running Docker containers freezes periodically with I/O waits and 100% CPU utilization

Dear community,

We’re running several SLES VMs with Docker installed from the official SLES repositories. Every Sunday around noon, all SLES VMs with running Docker containers freeze. A VM that has Docker installed and the daemon running, but no containers, doesn’t show this behaviour. In another environment, where we run Docker on CentOS 7, we don’t experience the freeze. The containers there run the same content as the containers on SLES.

I’ve attached the system metrics from around the affected time window. We went through the various log files thoroughly but haven’t been able to identify anything yet.

Any help or hint is highly appreciated.

Below is the system information of the VMs:

uname -a
Linux host_docker 3.12.62-60.64.8-default #1 SMP Tue Oct 18 12:21:38 UTC 2016 (42e0a66) x86_64 x86_64 x86_64 GNU/Linux

docker version
Client:
 Version:      1.12.6
 API version:  1.24
 Go version:   go1.6.1
 Git commit:   78d1802
 Built:        Wed Feb 15 15:00:28 2017
 OS/Arch:      linux/amd64

Server:
 Version:      1.12.6
 API version:  1.24
 Go version:   go1.6.1
 Git commit:   78d1802
 Built:        Wed Feb 15 15:00:28 2017
 OS/Arch:      linux/amd64

docker info
Containers: 2
 Running: 2
 Paused: 0
 Stopped: 0
Images: 2
Server Version: 1.12.6
Storage Driver: devicemapper
 Pool Name: docker-254:3-1835009-pool
 Pool Blocksize: 65.54 kB
 Base Device Size: 10.74 GB
 Backing Filesystem: xfs
 Data file: /dev/loop0
 Metadata file: /dev/loop1
 Data Space Used: 5.3 GB
 Data Space Total: 107.4 GB
 Data Space Available: 24.95 GB
 Metadata Space Used: 5.239 MB
 Metadata Space Total: 2.147 GB
 Metadata Space Available: 2.142 GB
 Thin Pool Minimum Free Space: 10.74 GB
 Udev Sync Supported: true
 Deferred Removal Enabled: false
 Deferred Deletion Enabled: false
 Deferred Deleted Device Count: 0
 Data loop file: /var/lib/docker/devicemapper/devicemapper/data
 WARNING: Usage of loopback devices is strongly discouraged for production use. Use `--storage-opt dm.thinpooldev` to specify a custom block storage device.
 Metadata loop file: /var/lib/docker/devicemapper/devicemapper/metadata
 Library Version: 1.03.01 (2015-05-15)
Logging Driver: json-file
Cgroup Driver: cgroupfs
Plugins:
 Volume: local
 Network: null bridge host overlay
Swarm: inactive
Runtimes: oci runc
Default Runtime: runc
Security Options: apparmor
Kernel Version: 3.12.62-60.64.8-default
Operating System: SUSE Linux Enterprise Server 12 SP1
OSType: linux
Architecture: x86_64
CPUs: 1
Total Memory: 15.35 GiB
Name: host_docker
ID: D3D5:GNHN:U3OT:C6SI:Q4QA:TBGC:I2SY:YDGC:VVUZ:LUDX:RMPK:JOU3
Docker Root Dir: /var/lib/docker
Debug Mode (client): false
Debug Mode (server): false
Registry: https://index.docker.io/v1/
WARNING: No swap limit support
WARNING: No kernel memory limit support
Insecure Registries:
 host_registry:5000
 127.0.0.0/8
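
The `docker info` output above reports the devicemapper storage driver backed by loopback files (`/dev/loop0`, `/dev/loop1`), together with Docker's own warning that this is discouraged for production use. Whether that is related to the freeze is not established here. As a minimal sketch for checking which hosts run this configuration, the helper below reads `docker info` text on standard input; the function name `uses_loop_devicemapper` is made up for this example.

```shell
#!/bin/sh
# Hypothetical helper (name invented for this sketch): reads the text of
# `docker info` on stdin and succeeds only if the storage driver is
# devicemapper AND a "Data loop file:" line is present, i.e. the pool is
# backed by loopback files as in the output above.
uses_loop_devicemapper() {
    info=$(cat)
    printf '%s\n' "$info" | grep -q '^Storage Driver: devicemapper' || return 1
    printf '%s\n' "$info" | grep -q '^ *Data loop file:' || return 1
    return 0
}
```

On a host it could be used as `docker info 2>/dev/null | uses_loop_devicemapper && echo "loopback devicemapper in use"`.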

We have the same fault on one of our machines, running SLES 12 SP3.

The host machine freezes after pulling an image or starting a container.
The problem happens randomly; at other times the machine runs fine for days.

We got a crazy log line in messages / syslog:

^@^@^@^@^@^@^@^@ [several hundred more ^@ characters omitted] ^@^@^@^@^@^@^@^@2018-05-30T14:21:23.143760+02:00 helene kernel: [ 0.000000] Initializing cgroup subsys cpuset
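
The `^@` characters in the line above are the usual caret notation for NUL bytes, and the entry that follows them is the first kernel message of a fresh boot. A run of NUL bytes like this is often seen in a log file that was being written when the machine went down uncleanly, though that is an assumption here rather than a confirmed diagnosis. A small sketch for counting the NUL bytes in a log file, so that affected files can be spotted; the function name `count_nul_bytes` is made up for this example, and the log path in the usage line varies by distribution.

```shell
#!/bin/sh
# Hypothetical helper (name invented for this sketch): prints the number of
# NUL bytes contained in the file given as the first argument.
count_nul_bytes() {
    # tr -cd '\000' deletes every byte except NUL; wc -c counts what is left.
    # The final tr strips the padding some wc implementations add.
    tr -cd '\000' < "$1" | wc -c | tr -d ' '
}
```

For example: `count_nul_bytes /var/log/messages`.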

Have you found the culprit? It happens frequently on our AWS EC2 CentOS 7 instances…

Same problem here! Have you found a solution?
Our OS is Ubuntu 20.04.6 LTS.