I have a very weird problem I can find nothing about… It’s hard to reproduce on other servers… 2 just work and 2 have the same problem. I am probably missing something so any pointers are appreciated…
When starting a container it fails with errors similar to
ERRO[2019-09-19T10:33:07.618605334+02:00] Handler for POST /v1.40/containers/create returned error: open /data/docker/overlay2/a96da70f876a18dc93ea677da85d7e02fa30cfcebde8b739cc3ae4e0803eddbb-init/merged/etc/hosts: invalid argument
docker: Error response from daemon: open /data/docker/overlay2/a96da70f876a18dc93ea677da85d7e02fa30cfcebde8b739cc3ae4e0803eddbb-init/merged/etc/hosts: invalid argument.
See 'docker run --help'.
I know similar issues can happen if you use xfs without ftype support, but that isn’t it. Manual checks and docker info
do report this is okay.
Also the OS/packages/kernel versions are all the same for these servers.
Server Version: 19.03.1
Storage Driver: overlay2
Backing Filesystem: xfs
Supports d_type: true
Native Overlay Diff: true
The error triggers when starting up the container. Docker tries to remove the /etc/hosts
if present to make sure it isn’t… Which completes without an error… and after tries to create a new empty one but crashes with invalid argument
.
This seems to happen because the /etc/hosts
-file as present in the overlays is a character special device
instead of a file. As far as I can see it’s created by Docker itself, but overlaying a character special device
and recreating a regular file fails.
On another server where everything works I can see it being a regular file by default, but somehow this isn’t the case on another server.
How it looks on a server where it works:
find /data -wholename *-init/diff/etc/hosts -exec ls -l {} \;
-rwxr-xr-x 1 root root 0 Sep 17 14:25 /data/docker/overlay2/c178f63836c51d6fca868cc442f8ffeb0299826b665eeb9d3182e43d9435f489-init/diff/etc/hosts
-rwxr-xr-x 1 root root 0 Sep 17 14:58 /data/docker/overlay2/16353e66e3d96f40768d0bcb7c94351ca3b2661602e09110a10642eeca98ca03-init/diff/etc/hosts
How it looks on the server where it doesn’t:
find /data -wholename *-init/diff/etc/hosts -exec ls -l {} \;
c--------- 1 root root 0, 0 Sep 19 10:19 /data/docker/overlay2/f593733309887fd011324e35bfa5e813b44b62b0a0937bd728a7d79ab0eb6e7c-init/diff/etc/hosts
c--------- 1 root root 0, 0 Sep 19 10:33 /data/docker/overlay2/a96da70f876a18dc93ea677da85d7e02fa30cfcebde8b739cc3ae4e0803eddbb-init/diff/etc/hosts
I looked through some of the source code and saw the /etc/hosts
being bind mounted and some other magic. But I cannot explain why it becomes a character special device
.
I gathered the information by using strace
on dockerd
and following the system calls it makes.
Manually doing the overlay and trying to remove /etc/hosts
and recreating it triggers the same error. So I can reproduce that manually. When I remove the character special device
and make it a regular file and overlay it again it works without issues.
This workaround doesn’t work for Docker as it will recreate the init layer when starting up a new
container every time.
Any ideas what I can check next? I am at a bit at a loss…
Command that I execute to test (with a clean /data/docker
directory):
docker run --rm -it bash
Full docker info:
Server:
Containers: 0
Running: 0
Paused: 0
Stopped: 0
Images: 1
Server Version: 19.03.1
Storage Driver: overlay2
Backing Filesystem: xfs
Supports d_type: true
Native Overlay Diff: true
Logging Driver: json-file
Cgroup Driver: cgroupfs
Plugins:
Volume: local
Network: bridge host ipvlan macvlan null overlay
Log: awslogs fluentd gcplogs gelf journald json-file local logentries splunk syslog
Swarm: inactive
Runtimes: runc
Default Runtime: runc
Init Binary: docker-init
containerd version: 894b81a4b802e4eb2a91d1ce216b8817763c29fb
runc version: 425e105d5a03fabd737a126ad93d62a9eeede87f
init version: fec3683
Security Options:
seccomp
Profile: default
Kernel Version: 3.10.0-957.21.3.el7.x86_64
Operating System: OpenShift
OSType: linux
Architecture: x86_64
CPUs: 8
Total Memory: 31.41GiB
Name: XXX
ID: XXX
Docker Root Dir: /data/docker
Debug Mode: false
Registry: https://index.docker.io/v1/
Labels:
Experimental: false
Insecure Registries:
127.0.0.0/8
Live Restore Enabled: false
/data/docker
is a mounted xfs partition.
$ xfs_info /data/docker/
meta-data=/dev/mapper/vgdata-lvdocker isize=512 agcount=4, agsize=32767744 blks
= sectsz=512 attr=2, projid32bit=1
= crc=1 finobt=0 spinodes=0
data = bsize=4096 blocks=131070976, imaxpct=25
= sunit=0 swidth=0 blks
naming =version 2 bsize=4096 ascii-ci=0 ftype=1
log =internal bsize=4096 blocks=63999, version=2
= sectsz=512 sunit=0 blks, lazy-count=1
realtime =none extsz=4096 blocks=0, rtextents=0