Docker Community Forums


DTR Install Hangs Post DB Configuration

Hi,
I’m having issues with the Docker Datacenter and DTR install.
The UCP install works great. I’m going with this configuration:
1 Manager (RHEL 7.3, 2 CPUs / 4 GB RAM + 50 GB of devicemapper)
3 Workers (RHEL 7.3, 2 CPUs / 4 GB RAM + 50 GB of devicemapper (100 GB for the dedicated DTR node))
I’m launching the install this way:
docker run -it --rm docker/dtr install --dtr-external-url https://10.10.10.10 --ucp-node mydtrnode01 --ucp-username root --ucp-password 'XoXoXo' --ucp-url https://ucp-qa.my.company.com:443 --http-proxy http://my.company.proxy --https-proxy http://my.company.proxy --no-proxy '.mycompany.com, localhost' --debug
The log stops at:

INFO[0005] (01/04) Configured Table "action_configs"
INFO[0005] (02/04) Configured Table "crons"
INFO[0005] (03/04) Configured Table "joblogs"
INFO[0005] (04/04) Configured Table "jobs"
INFO[0005] Migrated database from version 0 to 5

The DTR version is 2.1.4.

There are no errors in the Docker logs on either the UCP node or the DTR node.
Here is the docker info output:

Containers: 9
 Running: 5
 Paused: 0
 Stopped: 4
Images: 10
Server Version: 1.12.6-cs7
Storage Driver: devicemapper
 Pool Name: docker-thinpool
 Pool Blocksize: 524.3 kB
 Base Device Size: 10.74 GB
 Backing Filesystem: xfs
 Data file:
 Metadata file:
 Data Space Used: 983 MB
 Data Space Total: 102 GB
 Data Space Available: 101 GB
 Metadata Space Used: 409.6 kB
 Metadata Space Total: 1.07 GB
 Metadata Space Available: 1.069 GB
 Thin Pool Minimum Free Space: 10.2 GB
 Udev Sync Supported: true
 Deferred Removal Enabled: true
 Deferred Deletion Enabled: true
 Deferred Deleted Device Count: 0
 Library Version: 1.02.135-RHEL7 (2016-09-28)
Logging Driver: json-file
Cgroup Driver: cgroupfs
Plugins:
 Volume: local
 Network: bridge host null overlay
Swarm: active
 NodeID: 937y8qj2ynixlm8yya1ts1vks
 Is Manager: false
 Node Address: 10.10.10.10
Runtimes: runc
Default Runtime: runc
Security Options: seccomp
Kernel Version: 3.10.0-327.22.2.el7.x86_64
Operating System: Red Hat Enterprise Linux Server 7.3 (Maipo)
OSType: linux
Architecture: x86_64
CPUs: 2
Total Memory: 3.688 GiB
Name: mydtrnode01
ID: XLFS:PZVQ:SVQQ:X3DJ:HJ3S:PWZQ:J2DD:WIXT:XFI6:UMPA:VF5P:4LDY
Docker Root Dir: /var/lib/docker
Debug Mode (client): false
Debug Mode (server): true
 File Descriptors: 74
 Goroutines: 135
 System Time: 2017-02-01T10:32:25.858132495+01:00
 EventsListeners: 2
Http Proxy: http://my.company.proxy/
Https Proxy: http://my.company.proxy/
No Proxy: localhost, 127.0.0.1/8,.my.company.com
Registry: https://index.docker.io/v1/
WARNING: bridge-nf-call-ip6tables is disabled
Insecure Registries:
 127.0.0.0/8

Thank you for your help on this issue.

It looks like it’s stopping right where it’s about to fetch/run each of the containers, so I would suspect that the proxy isn’t working correctly for some reason.

To get around the proxy, can you try this on the node that you’re going to install on?

docker run --rm docker/dtr:2.1.4 images | xargs -n1 docker pull

Then try running the installation again.
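Before re-running the install, it can help to confirm that every image from the list above really is present on the node. The following is only a sketch: the helper name missing_images is made up for illustration, and it assumes that the images subcommand prints one image name per line, as the xargs pipeline above implies.

```shell
# Hypothetical helper (not part of Docker or DTR): reads image names on
# stdin, one per line, and prints those that are NOT present locally.
missing_images() {
  while read -r image; do
    # Skip blank lines in the input.
    [ -z "$image" ] && continue
    # "docker inspect --type=image" exits non-zero when the image is absent.
    docker inspect --type=image "$image" >/dev/null 2>&1 || echo "$image"
  done
}

# Usage (run on each node that will host DTR):
#   docker run --rm docker/dtr:2.1.4 images | missing_images
```

If this prints nothing, all of the listed images are already on the node, and a hang at this point is probably not caused by image pulls going through the proxy.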

Hi Patrick,
Thanks for your answer.
I’ve already tried this one.
I’ll retry pulling the images on every node, to be sure.

Sadly, even after pulling the images on every node, I get the same behaviour.
I’ve tried bypassing our VIP (which handles the SSL certificate), but the behaviour is the same.
An interesting thing is what happens when I try to remove the DTR like this:
docker run -it --rm docker/dtr:latest remove --ucp-url https://ucp-qa.my.company.com:443 --ucp-username root
Without --force-remove, I get this error:

INFO[0003] Validating UCP cert
INFO[0003] Connecting to UCP
INFO[0003] UCP cert validation successful
INFO[0003] This cluster contains the replicas: 2477523e36ac
Choose a replica to remove [2477523e36ac]:
INFO[0004] This cluster contains the replicas: 2477523e36ac
Choose any healthy replica [2477523e36ac]:
ERRO[0005] Remove has failed. Try running it again.
FATA[0005] Attempting to remove replica without notifying the rest of the cluster. You need to use --force-remove to do this to confirm that you understand that this may break your cluster.

Is there something at the UCP level that is needed to fully remove the DTR? (The --force-remove option seems to work fine.)

Can you check which containers are running on each of the nodes? It’s possible that the installation was actually successful earlier. Just look for containers whose names start with dtr-.
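One way to do that check is sketched below. The function name list_dtr_containers is made up for illustration; the docker ps flags are standard ones. Note that the name filter matches substrings, so it may also list containers that merely contain "dtr-" in their name.

```shell
# Hypothetical helper: list DTR containers on the current node,
# including stopped ones (-a), with their names and status.
list_dtr_containers() {
  docker ps -a --filter "name=dtr-" --format '{{.Names}}\t{{.Status}}'
}

# Usage (run on every node in the cluster):
#   list_dtr_containers
```

Containers shown with an "Up" status would suggest that an earlier install attempt left DTR running on that node.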

The bootstrapper runs in two phases. The first phase starts a copy of itself on the swarm cluster, and that copy then runs the installation. The log snippet that you showed was generated by the second phase, whereas pressing Ctrl-C would only have canceled the first phase, so it’s possible everything was running fine and the second phase completed without you realizing.

Hi,
Thank you for your help.
I upgraded to Docker Engine 1.13, cleaned everything up, and re-launched the install with the right parameters.
It worked on the first try…
Maybe it was a leftover from an old DTR install run…
Thanks!
