"docker run --rm [image] ls /path/to/file" leaves container stale

I am a mediocre docker user, and since today I am baffled by a command which makes a mess of the docker service on the host. Context: I am running various docker containers with traefik as reverse proxy, and I am planning to run openvpn as a VPN server in this setup, using the image kylemanna/openvpn.

Using ansible I need to check whether a server config already exists. The problem is that docker run --rm kylemanna/openvpn ls /etc/openvpn/openvpn.conf keeps the container alive: I don’t get my terminal back, and killing the container from a second session only makes things worse. I am not sure whether this is caused by something in the image or by something in docker-ce:

me@server:~$ docker run -v openvpn-data:/etc/openvpn --rm kylemanna/openvpn ls /etc/openvpn/openvpn.conf
ls: /etc/openvpn/openvpn.conf: No such file or directory
[ now I don't get any prompt back.... ]

In another session:

me@server:~$ docker container ls
CONTAINER ID        IMAGE                                   COMMAND                  CREATED             STATUS                  PORTS                                      NAMES
f23db27b6184        kylemanna/openvpn                       "ls /etc/openvpn/pki"    10 hours ago        Up 10 hours             1194/udp                                   confident_chebyshev
8027cd41fe77        kylemanna/openvpn                       "ls /etc/openvpn/ope…"   12 hours ago        Up 12 hours             1194/udp                                   tender_saha
3f8fa70c4912        kylemanna/openvpn                       "ovpn_initpki"           12 hours ago        Up 12 hours             1194/udp                                   flamboyant_torvalds
49f2314af137        traefik:2.1                             "/entrypoint.sh trae…"   10 days ago         Up 22 hours             0.0.0.0:80->80/tcp, 0.0.0.0:443->443/tcp   traefik
me@server:~$ docker container stop confident_chebyshev 
Connection to example.com closed by remote host.
Connection to example.com closed.
me@workstation:~$ ssh server 

Last login: Thu Feb 27 18:01:20 2020 from 77.165.23.64
me@server:~$ docker ps
Cannot connect to the Docker daemon at unix:///var/run/docker.sock. Is the docker daemon running?
me@server:~$ sudo reboot now
[sudo] password for me: 
Connection to example.com closed by remote host.
Connection to example.com closed.
me@workstation:~$ ssh server 

Last login: Thu Feb 27 18:03:16 2020 from 77.165.23.64
me@server:~$ docker ps
CONTAINER ID        IMAGE                                   COMMAND                  CREATED             STATUS                          PORTS                                      NAMES
bed95f29d521        kylemanna/openvpn                       "ovpn_run --proto udp"   11 hours ago        Restarting (1) 10 seconds ago                                              ovpn-udp
49f2314af137        traefik:2.1                             "/entrypoint.sh trae…"   10 days ago         Up 49 seconds                   0.0.0.0:80->80/tcp, 0.0.0.0:443->443/tcp   traefik

A few things which look odd to me:

- I don’t get a prompt back after running ls in the container
- docker can’t stop the container; it force-kills it after 10 (or so) seconds
- when the container is killed, my ssh session is lost; sshd runs on the host, not inside a container
- when I log back in, docker doesn’t seem to be running; a reboot solves the problem
- if more kylemanna/openvpn containers are left stale, all of them are gone after the reboot

What is going on here? Is this because of something in the openvpn image, did I hit a problem with docker or should I execute the command differently? Any help is greatly appreciated!
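For completeness, the plan was to branch on the exit code of that check from a small shell wrapper driven by ansible — just a sketch of the intent, using the very command that currently never returns:

# sketch: decide whether the config needs to be generated, based on the ls exit code
if docker run -v openvpn-data:/etc/openvpn --rm kylemanna/openvpn ls /etc/openvpn/openvpn.conf > /dev/null 2>&1; then
  echo "openvpn.conf already exists"
else
  echo "openvpn.conf missing, need to run ovpn_genconfig first"
fi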

Some more info on the system:

me@server:~$ docker --version
Docker version 19.03.6, build 369ce74a3c
me@server:~$ uname --al
Linux server 4.15.0-88-generic #88-Ubuntu SMP Tue Feb 11 20:11:34 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux

Try adding -ti to your command line, see:

This didn’t make any difference, unfortunately. The ls command already seems to get stuck.
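For reference, with -ti added the command was along these lines:

docker run -v openvpn-data:/etc/openvpn --rm -ti kylemanna/openvpn ls /etc/openvpn/openvpn.conf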

What I found in syslog when docker stop eventually tries to kill the container:

Feb 28 14:14:08 nuala dockerd[897]: time="2020-02-28T14:14:08.348567783Z" level=info msg="Container failed to stop after sending signal 15 to the process, force killing"
Feb 28 14:14:08 nuala dockerd[897]: time="2020-02-28T14:14:08.348666753Z" level=error msg="Stop container error: Failed to stop container 49f2314af13752be4884bf699cc837f0b19e3efa26666249ab206cc110ee211f with error: Cannot kill container 49f2314af13752be4884bf699cc837f0b19e3efa26666249ab206cc110ee211f: all SubConns are in TransientFailure, latest connection error: connection error: desc = \"transport: Error while dialing dial unix /run/containerd/containerd.sock: connect: connection refused\": unavailable"
Feb 28 14:14:08 nuala dockerd[897]: time="2020-02-28T14:14:08.399354238Z" level=info msg="Daemon shutdown complete"
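That containerd.sock error makes me suspect containerd itself is going down. I guess something like this should show it (assuming the usual systemd unit names docker and containerd, which I haven’t double-checked on this droplet):

# check whether dockerd and containerd are still up
sudo systemctl status docker containerd
# and look at containerd's own log around the time of the hang
sudo journalctl -u containerd --since today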

Here is something I missed the first time. You do pass the argument `ls /etc/openvpn/openvpn.conf` to the entrypoint script or command of the container. I am pretty sure the image maintainer does not simply forward commands to bash, but instead has a convenience script that supports a couple of predefined commands. Please look them up on the maintainer’s page on Docker Hub.
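If you want to rule the entrypoint out, you can also override it explicitly so that ls is executed as the container process — a sketch, not the image’s documented usage:

# bypass whatever entrypoint/convenience script the image defines and run ls directly
docker run --rm -v openvpn-data:/etc/openvpn --entrypoint ls kylemanna/openvpn /etc/openvpn/openvpn.conf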

Another thing is that you started the container in the foreground, which is why you don’t get the host’s prompt back. And due to the lack of -i you don’t get a prompt inside the container either.
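To illustrate the difference with a trivial example (plain alpine, nothing to do with the openvpn image):

# foreground: docker run blocks until the container process exits
docker run --rm alpine sleep 5
# detached (-d): returns immediately and only prints the container id
docker run -d --rm alpine sleep 5
# interactive tty (-ti): gives you a prompt inside the container
docker run --rm -ti alpine sh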

Obviously your user is in the docker group, so missing root permissions for the docker commands can’t be the problem. Your last observation of the restarted openvpn container does not match the docker run command line you provided. Since that command line neither adds capabilities nor uses privileged mode, it is highly unlikely that the container messes something up in your engine.

I just tested the ls file-existence check with alpine, using the openvpn image only to generate the config. I removed the volume beforehand, so everything should start fresh. After ovpn_genconfig the ls hangs again. Is this caused by something inside the volume?

me@server:~$ docker volume create ovpn-data
ovpn-data
me@server:~$ docker run -v ovpn-data:/etc/openvpn --rm -ti alpine ls /etc/openvpn/openvpn.conf
ls: /etc/openvpn/openvpn.conf: No such file or directory
failed to resize tty, using default size
                                        me@server:~$
me@server:~$ docker run --name tmp-openvpn-genconfig --rm -v ovpn-data:/etc/openvpn kylemanna/openvpn ovpn_genconfig -u udp://example.com
Unable to find image 'kylemanna/openvpn:latest' locally
latest: Pulling from kylemanna/openvpn
050382585609: Pull complete
944a899b9c42: Pull complete
59afa6e6f5d8: Pull complete
f2941e48588b: Pull complete
18e0142d2a50: Pull complete
Digest: sha256:266c52c3df8d257ad348ea1e1ba8f0f371625b898b0eba6e53c785b82f8d897e
Status: Downloaded newer image for kylemanna/openvpn:latest
Processing PUSH Config: 'block-outside-dns'
Processing Route Config: '192.168.254.0/24'
Processing PUSH Config: 'dhcp-option DNS 8.8.8.8'
Processing PUSH Config: 'dhcp-option DNS 8.8.4.4'
Processing PUSH Config: 'comp-lzo no'
Successfully generated config
Cleaning up before Exit ...
me@server:~$ docker run -v ovpn-data:/etc/openvpn --rm -ti alpine ls /etc/openvpn/openvpn.conf
/etc/openvpn/openvpn.conf
failed to resize tty, using default size

So I guess I can rule out the openvpn image as the culprit, since alpine triggers the same behaviour. As the problem this time only occurs after genconfig, could there be something wrong with the volume?

Depends on whether you installed a docker volume plugin and managed to make it the default volume plugin or not :slight_smile:

If you didn’t, then by default the local volume plugin is used. It creates a folder in /var/lib/docker/volumes/{volume-name} and bind-mounts its _data subfolder into the target folder inside the container. I would be surprised if this really were the problem.
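If you want to double-check, you can look at the volume from the host side — assuming the default local driver and the standard /var/lib/docker data root:

# shows the driver and mountpoint of the volume
docker volume inspect ovpn-data
# the generated config should be visible directly on the host
sudo ls -l /var/lib/docker/volumes/ovpn-data/_data/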

The whole problem does not really make sense if the local driver is used.

And I do use the local driver, which brings me back to the quote from my opening post:

and since today I am baffled by a command which makes a mess of the docker service on the host

:stuck_out_tongue: or actually :sob:

At what level can this kind of behaviour be explained? BTW, originally I was executing the ansible playbooks from a Linux host. I also tested in an SSH session from a Windows command prompt, so different terminals (which shouldn’t really make a difference) all give the same results.

I reproduced your steps from post 5. My output is identical to yours, except that I don’t get the "failed to resize tty, using default size" messages and my terminal returns.

I have no idea what causes it to behave differently on your system. It shouldn’t matter whether ansible drives the commands through the CLI or uses the docker-py module to interact with the docker API directly; in the end the result should be the same. The only oddity is the error message you get from your commands.

I implicitly assumed that your docker engine runs on Linux too… doesn’t it? I have no experience with the non-Linux builds. From my experience Windows and macOS users have the most and weirdest problems with even the most basic functionality, while Linux users are still the only first-class citizens for whom those things work as expected…

This test server is a DigitalOcean droplet. I have a Linux machine at home to test with; I’ll give it a try shortly. If I can’t reproduce the error locally, I think I need to contact DO about something in their VM.

Just to confirm, I tested the same commands on another host (local Linux machine) and things worked flawlessly. I contacted DO support to have a look at this.

Hope they will sort it out quickly.

Did you know a volume plugin exists for DO: https://github.com/omallo/docker-volume-plugin-dostorage
If you run a multi-node docker setup, having block devices attached as volumes for the containers is pretty sweet and more reliable than using a remote share.
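Usage would be roughly like this — note the driver name is my assumption based on the project page, so double-check it in the plugin’s README:

# create a volume backed by a DO block storage device (driver name assumed)
docker volume create --driver dostorage my-do-volume
# mount it like any other named volume
docker run --rm -v my-do-volume:/data alpine ls /data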