
DTR dead after running one week

dtr

(A496) #1

I installed DTR a week ago and left it running.
When I came back, the DTR UI page showed a 502 gateway error.

I checked /load_balancer_status and it shows the API is down.
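For reference, I checked it with a plain curl against my DTR host (-k because DTR uses a self-signed certificate by default):

curl -k https://192.168.1.249/load_balancer_status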

So I tried to restart the API container:

docker stop xxxxx
docker start xxxxx
but this doesn't work.
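(In case it helps others: the DTR containers are normally named dtr-api-<replica-id>, dtr-nginx-<replica-id>, and so on, so I located the right one and pulled its logs with something like the commands below; the replica-id placeholder is just illustrative.)

docker ps -a --filter name=dtr- --format "table {{.Names}}\t{{.Status}}"   # list all DTR containers and their state
docker logs --tail 50 dtr-api-<replica-id>                                 # recent output from the API container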

The last log entry in the api container is still from weeks ago:

{"level":"error","msg":"Couldn't get kernel info: fork/exec /bin/uname: cannot allocate memory","time":"2017-06-07T19:45:48Z"}

But my system still has 1GB of free memory.
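From what I understand, fork/exec can fail with "cannot allocate memory" even when free memory is shown, for example under strict kernel overcommit, with no swap, or when the container hits a cgroup memory limit. These are the standard Linux checks I ran:

free -m                                   # how much memory and swap are actually free
cat /proc/sys/vm/overcommit_memory        # 2 means strict overcommit; fork can fail early even with free memory
dmesg | grep -iE "out of memory|oom"      # any sign of the OOM killer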


Then I went to UCP, tried to restart the whole DTR application, and got a 404 error,

and then I found that even the monitoring endpoint is dead.

The logs show:

2017/06/20 02:00:50 [error] 14#0: *529214 no live upstreams while connecting to upstream, client: 192.168.1.7, server: , request: "GET /favicon.ico HTTP/1.1", upstream: "https://api/favicon.ico", host: "192.168.1.249", referrer: "https://192.168.1.249/load_balancer_status"
*** Shutting down /bin/nginxwrapper (PID 6)...
*** Init system aborted.
*** Killing all processes...


Note: 192.168.1.7 is the machine from which I access DTR and UCP.

(Patrick Devine) #2

Can you post some specs for your setup? Which OS are you using, which version of UCP/DTR, how much memory you’ve got, how many nodes?
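If it's easier to grab from the command line, something like the following should cover most of it (standard commands; `docker node ls` assumes a swarm-mode UCP setup):

cat /etc/os-release                                             # host OS and version
free -h                                                         # total/available memory
docker version                                                  # engine version
docker node ls                                                  # node count (run on a UCP manager)
docker ps --filter name=dtr- --format "{{.Image}}" | sort -u    # DTR version from the image tags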


(Hector8044) #3

I'm having almost exactly the same problem. Is there any new information about this? Has anybody been able to fix this issue?

Thanks,
H


(Patrick Devine) #4

Hey Hector,

Do you have any more info to go on? (Host OS, memory, DTR/UCP versions, whether this is HA or not, log files, etc…)


(Hector8044) #5

Hi Patrick:

Actually, I was able to figure out the problem, or at least its source. My environment was using Azure Blob Storage, which got changed/updated by the Azure team. The 502 error basically comes down to the fact that UCP/DTR cannot access the persistent storage it is configured with.
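For anyone else hitting this, a quick sanity check is to probe the blob endpoint directly from the DTR host (placeholders here; substitute your own storage account and container). Even a 403/404 response proves the endpoint is reachable, whereas a DNS or connection error points at the storage account itself:

curl -vI "https://<accountname>.blob.core.windows.net/<container>?restype=container"   # any HTTP response = reachable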


(Patrick Devine) #6

Hey Hector,

Glad you figured it out. I don’t think there’s anything we could do about preventing the storage from going away, but I’m wondering if there was a better error message we could have given which would have made it more obvious what was going on on the back end. I’ll mention it to the rest of the team.