Hello,
I’m just starting my way in the docker world and many (basic) principles of how everything is organized are still unclear. Please help me to understand how should we approach troubleshooting failed docker tasks.
My docker service doesn’t work but that is a secondary problem. The primary issue is that it’s totally unclear how to troubleshoot that.
This is my docker-compose file, it consists of an application service and mongodb service. The application service writes logs to the /opt/myapp/log/app.log. Complete sources can be found here. I also built a corresponding docker image and uploaded it to the dockerhub - deniszhdanov/docker-swarm-troobleshoot-service:1
Let’s start the stack:
docker swarm init
Swarm initialized: current node (xpkngdn0vpr73nioalzbkem1k) is now a manager.
To add a worker to this swarm, run the following command:
docker swarm join --token SWMTKN-1-6109nv6pn7eb9gtam8bq4m198k5sk7ztzf7hy7yfv5c47kcrmq-9fbrmmccd977kx22mivs7segn 192.168.65.3:2377
To add a manager to this swarm, run 'docker swarm join-token manager' and follow the instructions.
docker stack deploy -c docker-compose.yml myapp
Creating network myapp_default
Creating service myapp_web
Creating service myapp_db
After that, let’s wait for a little while (~1 minute) and proceed:
docker ps -a
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
31e9f3a8f5aa deniszhdanov/docker-swarm-troobleshoot-service:1 "java -jar /opt/myap…" 31 seconds ago Up 25 seconds 8090/tcp myapp_web.1.sij3z7cbbynsxos6608ru2f8a
9fc6a5868c12 mongo:latest "docker-entrypoint.s…" About a minute ago Up About a minute 27017/tcp myapp_db.1.gxl5xwj1tg80nr16clbskk2oc
a3ff2ba0c8c5 deniszhdanov/docker-swarm-troobleshoot-service:1 "java -jar /opt/myap…" About a minute ago Exited (137) 32 seconds ago myapp_web.1.3dv8x2dx6kig4qkf1wc2axro8
We see that there is a failed task. Let’s try to understand what went wrong:
docker commit a3ff2ba0c8c5 snapshot
sha256:bec4756cadebbada400b4d1037cac671168396bf73b7d3e875c6f98f63522afd
docker run --rm -it snapshot /bin/sh
/opt/myapp # cat /opt/myapp/log/app.log
2018-10-05 15:34:20 - Starting Start on a3ff2ba0c8c5 with PID 1 (/opt/myapp/lib/myapp.jar started by root in /opt/myapp)
2018-10-05 15:34:21 - No active profile set, falling back to default profiles: default
The task’s log doesn’t contain target information which would be enough to troubleshoot the problem. However, that data is available when we run the application image standalone:
docker run --rm -d deniszhdanov/docker-swarm-troobleshoot-service:1
825a818b425feb7ed1f593c14a411efb68457aee9c6bfcf27f745fd58cfa0001
docker ps
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
825a818b425f deniszhdanov/docker-swarm-troobleshoot-service:1 "java -jar /opt/myap…" 46 seconds ago Up 44 seconds 8090/tcp zen_bardeen
docker exec -it 825a818b425f /bin/sh
/opt/myapp # cat /opt/myapp/log/app.log
2018-10-05 15:30:09 - Starting Start on 825a818b425f with PID 1 (/opt/myapp/lib/myapp.jar started by root in /opt/myapp)
2018-10-05 15:30:09 - No active profile set, falling back to default profiles: default
2018-10-05 15:30:09 - Refreshing org.springframework.context.annotation.AnnotationConfigApplicationContext@18ef96: startup date <Fri Oct 05 15:30:09 GMT 2018>; root of context hierarchy
2018-10-05 15:30:10 - Cluster created with settings {hosts=<db:27017>, mode=MULTIPLE, requiredClusterType=UNKNOWN, serverSelectionTimeout='30000 ms', maxWaitQueueSize=500}
2018-10-05 15:30:10 - Adding discovered server db:27017 to client view of cluster
2018-10-05 15:30:11 - Exception in monitor thread while connecting to server db:27017
com.mongodb.MongoSocketException: db: Name does not resolve
at com.mongodb.ServerAddress.getSocketAddress(ServerAddress.java:188)
at com.mongodb.connection.SocketStreamHelper.initialize(SocketStreamHelper.java:59)
at com.mongodb.connection.SocketStream.open(SocketStream.java:57)
at com.mongodb.connection.InternalStreamConnection.open(InternalStreamConnection.java:126)
at com.mongodb.connection.DefaultServerMonitor$ServerMonitorRunnable.run(DefaultServerMonitor.java:114)
at java.lang.Thread.run(Thread.java:748)
Caused by: java.net.UnknownHostException: db: Name does not resolve
at java.net.Inet4AddressImpl.lookupAllHostAddr(Native Method)
at java.net.InetAddress$2.lookupAllHostAddr(InetAddress.java:928)
at java.net.InetAddress.getAddressesFromNameService(InetAddress.java:1323)
at java.net.InetAddress.getAllByName0(InetAddress.java:1276)
at java.net.InetAddress.getAllByName(InetAddress.java:1192)
at java.net.InetAddress.getAllByName(InetAddress.java:1126)
at java.net.InetAddress.getByName(InetAddress.java:1076)
at com.mongodb.ServerAddress.getSocketAddress(ServerAddress.java:186)
... 5 common frames omitted
Huh, it took a while to describe all of that, thanks to everyone who managed to get to this point
Questions:
- Why do we have different states for standalone containers and tasks?
- What is the recommended way to troubleshoot failing tasks?
Regards, Denis