Docker Community Forums

DTR Install fails on Couldn't reconfigure


(Nicolas Bihan) #1

FATA[0044] Couldn’t reconfigure: Couldn’t register with enzi: &errors.errorString{s:“unable to cast api errors when creating account: unable to perform request: Post https://dudockv1.mydomain.com/enzi/v0/accounts: EOF”}

Node is healthy, I tried several times. There is no way to get DTR installed.

This is the command line I am using
docker run -it --rm docker/dtr install --dtr-external-url https://dudockv2.mydomain.com --ucp-node dudockv2.mydomain.com --ucp-username admin --ucp-insecure-tls --ucp-url https://dudockv1.mydomain.com

If I try again

FATA[0004] Failed to get bootstrap client: Failed to connect to UCP; make sure that you are using a domain listed in UCP’s TLS certificate’s subject alternate names: Get https://dudockv1.mydomain.com/_ping: x509: certificate is valid for dudockv2.mydomain.com, not dudockv1.mydomain.com

This seems to indicate that the CA is wrong. Do I need to install a CA certificate (not a generated one) on each node?


(Patrick Devine) #2

Hey Nicolas,

I think there are two things that are not set correctly:

  1. For the --ucp-node flag, just use the node name which is specified inside of UCP (it’s probably something like dudockv2). You can actually leave this setting blank and the bootstrapper will give you a list of nodes to choose from.
  2. How was the cert set up for the UCP nodes? It looks like you’re somehow getting the wrong cert passed to you. You can fetch the cert manually with curl -k https://dudockv1.mydomain.com/ca > ucp.crt and then check it with openssl x509 -in ucp.crt -text -noout.
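
A minimal sketch of that certificate check, assuming the goal is to see which names a certificate actually covers. The helper name list_sans is hypothetical (it is not part of Docker or OpenSSL); it only filters the text that openssl x509 -text -noout prints. The /ca path presumably returns UCP's CA certificate rather than the certificate served on port 443, so the commented openssl s_client line shows one way to inspect the served certificate as well.

```shell
# Hypothetical helper (not part of Docker or OpenSSL): reads the output of
# `openssl x509 -text -noout` on stdin and prints one SAN entry per line.
list_sans() {
  grep -A1 'X509v3 Subject Alternative Name' \
    | tail -n 1 \
    | tr ',' '\n' \
    | sed 's/^[[:space:]]*//'
}

# Example usage (commented out because it needs the cert file and network access):
# openssl x509 -in ucp.crt -text -noout | list_sans
# openssl s_client -connect dudockv1.mydomain.com:443 \
#   -servername dudockv1.mydomain.com </dev/null 2>/dev/null \
#   | openssl x509 -text -noout | list_sans
```

If the host name passed to --ucp-url is missing from the printed list, the x509 error quoted earlier in the thread is the expected result.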

(Nicolas Bihan) #3

Thanks Patrick,

The --ucp-node value seems OK; it is the name given by UCP.

Not sure about the cert on my UCP nodes, I just used the swarm join command given by the UCP.
In my cluster, dudockv1 is my UCP, dudockv2 is where I try to install DTR.
So this message telling me that the cert is valid for the DTR host and not the UCP host is confusing…

I am using CentOS 7.2 by the way…

I tried to install on a different node, and now the error message is different:

[nbihan@dudockv3 ~]$ docker run -it --rm docker/dtr install \
--dtr-external-url https://dudockv3.mydomain.com \
--ucp-node dudockv3.mydomain.com \
--ucp-username nbihan \
--ucp-insecure-tls \
--ucp-url https://dudockv1.mydomain.com
INFO[0000] Beginning Docker Trusted Registry installation
ucp-password:
INFO[0005] Validating UCP cert
INFO[0005] Connecting to UCP
INFO[0005] UCP cert validation successful
INFO[0006] The UCP cluster contains the following nodes: dudockv4.mydomain.com, dudockv2.mydomain.com, dudockv6.mydomain.com, dudockv3.mydomain.com, dudockv5.mydomain.com, dudockv1.mydomain.com
INFO[0006] verifying [80 443] ports on dudockv3.mydomain.com
INFO[0000] Validating UCP cert
INFO[0000] Connecting to UCP
INFO[0000] UCP cert validation successful
INFO[0000] Checking if the node is okay to install on
INFO[0000] Connecting to network: dtr-ol
INFO[0000] Waiting for phase2 container to be known to the Docker daemon
INFO[0001] Starting UCP connectivity test
INFO[0001] UCP connectivity test passed
INFO[0001] Setting up replica volumes…
INFO[0001] Creating initial CA certificates
INFO[0001] Bootstrapping rethink…
INFO[0001] Creating dtr-rethinkdb-cba01a335921…
INFO[0006] Waiting for database dtr2 to exist
INFO[0020] Generated TLS certificate. domain=dudockv3.mydomain.com
INFO[0021] License config copied from UCP.
INFO[0021] Migrating db…
INFO[0000] Migrating database schema fromVersion=0 toVersion=6
INFO[0004] Waiting for database notaryserver to exist
INFO[0004] Waiting for database notarysigner to exist
INFO[0004] Waiting for database jobrunner to exist
INFO[0007] Migrated database from version 0 to 6
INFO[0028] Starting all containers…
INFO[0028] Getting container configuration and starting containers…
INFO[0028] Recreating dtr-rethinkdb-cba01a335921…
INFO[0034] Creating dtr-registry-cba01a335921…
INFO[0037] Creating dtr-garant-cba01a335921…
INFO[0039] Creating dtr-api-cba01a335921…
INFO[0042] Creating dtr-notary-server-cba01a335921…
INFO[0044] Recreating dtr-nginx-cba01a335921…
INFO[0046] Creating dtr-jobrunner-cba01a335921…
INFO[0049] Creating dtr-notary-signer-cba01a335921…
INFO[0052] Creating dtr-scanningstore-cba01a335921…
INFO[0054] Trying to get the kv store connection back after reconfigure
FATA[0054] Couldn’t reconfigure: Couldn’t register with enzi: &errors.errorString{s:“unable to cast api errors when creating account: unable to perform request: Post https://dudockv1.mydomain.com/enzi/v0/accounts: read tcp 172.17.0.5:40074->172.17.0.1:443: read: connection reset by peer”}
FATA[0067] Failed to execute phase 2: Phase 2 returned non-zero status: 1


(Patrick Devine) #4

Hey Nicolas,

Do you know if the auth service for UCP is down? Are you able to log in/out of UCP correctly?


(Nicolas Bihan) #5

Yes, the UCP server is up and I can log in and out of the web interface.

What is weird is the IP addresses in the error:
https://dudockv1.mydomain.com/enzi/v0/accounts: read tcp 172.17.0.5:40074->172.17.0.1:443
Is it normal to see Docker network addresses here?


(Patrick Devine) #6

Yeah… it does look a little weird. I asked around and one thought was that maybe the auth containers were restarting. You mentioned you were using CentOS 7.2. Do you know which version of the Linux kernel you’re using? There are some issues with the 3.10 kernel, which ships by default.


(Nicolas Bihan) #7

Well, it might just be that.
This is running on kernel 3.10.0-514.6.2.el7.x86_64.
I will try again after updating the kernel to 4.11.


(Nicolas Bihan) #8

Same error with Kernel 4.11.3-1.el7.elrepo.x86_64
Both on UCP node and DTR node…

I will try to reinstall everything I guess.
Which Linux distro works best for you?

Here is the log when I install UCP

INFO[0000] Verifying your system is compatible with UCP 2.1.4 (10e6c44)
INFO[0000] Your engine version 17.03.1-ee-3, build 3fcee33 (4.11.3-1.el7.elrepo.x86_64) is compatible
Admin Username: adminuserblah
Admin Password:
Confirm Admin Password:
INFO[0015] All required images are present
We detected the following hostnames/IP addresses for this system [dudockv1.mydomain.com 127.0.0.1 172.17.0.1 192.168.141.101]

You may enter additional aliases (SANs) now or press enter to proceed with the above list.
Additional aliases: dudockv1

Where does it get 172.17.0.1 from, and why?


(Nicolas Bihan) #9

Unfortunately, same error after reinstalling everything…

FATA[0050] Couldn’t reconfigure: Couldn’t register with enzi: &errors.errorString{s:“unable to cast api errors when creating account: unable to perform request: Post https://dudockv1.mydomain.com/enzi/v0/accounts: read tcp 172.17.0.5:59026->172.17.0.1:443: read: connection reset by peer”}

ifconfig tells me that 172.17.0.1 is docker0:

[nbihan@dudockv1 ~]$ ifconfig
docker0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 1500
inet 172.17.0.1 netmask 255.255.0.0 broadcast 0.0.0.0
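
Since the UCP host name appears to end up at docker0's address instead of the node's LAN address (192.168.141.101 in the UCP install log above), a quick check of what the name resolves to may help. This is only a sketch: looks_like_docker_bridge is a hypothetical helper, the 172.17. prefix comes from the ifconfig output above, and 172.18. is included on the assumption that it is the next range Docker typically allocates for its networks.

```shell
# Hypothetical helper: succeeds if the IPv4 address starts with a prefix that
# Docker commonly uses for its bridge networks. 172.17. is docker0 in this
# thread; 172.18. is an assumption (typically the next range Docker hands out).
looks_like_docker_bridge() {
  case "$1" in
    172.17.*|172.18.*) return 0 ;;
    *) return 1 ;;
  esac
}

# Example usage on a node (commented out because it depends on local DNS):
# ip=$(getent hosts dudockv1.mydomain.com | awk '{print $1; exit}')
# if looks_like_docker_bridge "$ip"; then
#   echo "dudockv1.mydomain.com resolves to a Docker bridge address: $ip"
# fi
```

A match would suggest that name resolution, not UCP itself, is sending the request to the wrong place.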


(Patrick Devine) #10

Hi Nicolas,

You should only need to upgrade the kernel, not reinstall UCP or the node. Really sorry you’ve been having so many problems. I’ll see if I can get someone from the networking team to take a look.


(Nicolas Bihan) #11

Looks like it’s a network issue. I have a feeling our DNS isn’t helping here…
Would it make sense and is it possible to try with IP addresses?

thanks


(Patrick Devine) #12

Oh! We’ve seen weird things happen before with slow DNS. On the host machine, could you check if there’s a search line in /etc/resolv.conf? You could potentially try commenting it out and re-installing DTR.

You can try with IP addresses, but I haven’t tried that with newer versions of UCP, so I’m not certain off the top of my head whether there are any obvious pitfalls. My guess is that editing /etc/resolv.conf would be simpler.
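
A small sketch of the resolv.conf check described above. The function name show_search_lines is hypothetical; it just greps a resolv.conf-style file for search directives. Docker typically derives a container's resolv.conf from the host's file, which is presumably why a search line on the host can affect lookups made by the installer.

```shell
# Hypothetical helper: print any `search` lines from a resolv.conf-style file.
# Defaults to /etc/resolv.conf; exits non-zero when no search line is present.
show_search_lines() {
  grep -E '^[[:space:]]*search[[:space:]]' "${1:-/etc/resolv.conf}"
}

# Example usage on the host:
# show_search_lines || echo "no search line configured"
```

Commenting the line out means prefixing it with # in /etc/resolv.conf, then re-running the DTR install.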


(Nicolas Bihan) #13

I did indeed have a search line in /etc/resolv.conf, and when I pinged the UCP server I got the 172.18.0.1 address…
Still the same error after I removed the search line.
So I hard-coded the UCP IP address and name in my /etc/hosts; ping is correct now.
But still, same error… This is starting to drive me crazy…

I will try with the IP address, but I have a bad feeling about the TLS certificates.
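
For anyone repeating the /etc/hosts step, here is a sketch for confirming which address a hosts file maps a name to. The helper hosts_ip_for is hypothetical, and the sample entry in the comment uses the 192.168.141.101 address listed for dudockv1 in the UCP install log above. One caveat, offered as a possibility rather than a diagnosis: containers generally get their own /etc/hosts, so an entry added on the host may not be visible to the installer container.

```shell
# Hypothetical helper: print the IP that a hosts-style file maps to a name.
# Usage: hosts_ip_for NAME [FILE]   (FILE defaults to /etc/hosts)
hosts_ip_for() {
  awk -v name="$1" '
    /^[[:space:]]*#/ { next }   # skip comment lines
    { for (i = 2; i <= NF; i++) if ($i == name) { print $1; exit } }
  ' "${2:-/etc/hosts}"
}

# Example entry and check (commented out; the entry is an illustration):
# 192.168.141.101 dudockv1.mydomain.com dudockv1
# hosts_ip_for dudockv1.mydomain.com
```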


(Nicolas Bihan) #14

And IP address made it work!!! I can’t believe I didn’t try before.
Anyway, that workaround seems suspect to me: OK for my dev/test instances, but not for production.
I have a feeling this is a case of not-so-good DNS, but do you know how the installer resolves the server name?

Here is how I got it to work

docker run -it --rm docker/dtr install \
--dtr-external-url https://192.168.141.102 \
--ucp-node dudockv2.mydomain.com \
--ucp-username admin \
--ucp-insecure-tls \
--ucp-url https://192.168.141.101

Thank you so much for your help Patrick!


(Nicolas Bihan) #15

Just a follow-up: I bricked my install by changing the Domain in the DTR settings while trying to set the host name.
After doing so, the certificate became invalid and any request was rejected.

So I reinstalled it… It must be some kind of a curse…


(Patrick Devine) #16

Hey Nicolas,

So glad you got things to work. I’d love to see if we can replicate what went wrong with your install so we can fix things on our end. It sounds like:

  • slow DNS
  • CentOS 7.2
  • Linux 3.10 kernel

Was this on physical hardware or inside of VMs?


(Nicolas Bihan) #17

Inside a VM, I will get more info about it tomorrow.


(Nicolas Bihan) #18

So, our VMs are running on ESXi hosts that run ESXi 5.5. The vSphere cluster is on version 6.0.
They are running CentOS 7.3; each host has 3 vCPUs and 12 GB.
They’re all on the same subnet in our own network.


(Nicolas Bihan) #19

Well, it seems that our DNS was acting up and all that mess was a result of that…
I was not impressed…

Thank you again for your help!


(Jacob Roy) #20

INFO[0004] Waiting for database dtr2 to exist
INFO[0010] Generated TLS certificate. domain=192.168.1.41
INFO[0010] License config copied from UCP.
INFO[0010] Migrating db…
rpc error: code = 2 desc = oci runtime error: exec failed: cannot exec a container that has run and stopped

FATA[0011] Couldn’t migrate database: exit code 126
FATA[0021] Failed to execute phase 2: Phase 2 returned non-zero status: 1
[root@SA-ucp-DTR ~]#
Any help, please?