Docker Compose network crash or something else? Help me debug

Hi all,

We have built a distributed worker system using Docker Compose. The jobs read a lot of data from a PostgreSQL database on our company network, and RabbitMQ serves as the job queue.

The problem is that one worker suddenly reports a connection timeout to the database. After that, all workers stop working and I can no longer reach the RabbitMQ management portal on its mapped port, localhost:8080.

Has the whole Docker network crashed? An error in one worker shouldn't affect the other workers or RabbitMQ.
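For a failing job in one worker to stay contained, the consumer loop itself has to catch the failure rather than let it end the process. The sketch below is a minimal, library-agnostic illustration under assumptions: jobs arrive as hypothetical (delivery_tag, body) pairs, and process, ack and nack are hypothetical callbacks you would wire to your RabbitMQ client (with pika, for example, channel.basic_ack and channel.basic_nack). It does not claim to reflect how your workers are actually written.

```python
import logging
from typing import Callable, Iterable, Tuple

log = logging.getLogger("worker")

# Hypothetical job shape: (delivery_tag, message body).
Job = Tuple[int, bytes]


def run_jobs(
    jobs: Iterable[Job],
    process: Callable[[bytes], None],
    ack: Callable[[int], None],
    nack: Callable[[int], None],
) -> dict:
    """Handle each job on its own so one failure cannot stop the loop."""
    stats = {"ok": 0, "failed": 0}
    for tag, body in jobs:
        try:
            process(body)  # e.g. the PostgreSQL reads that may time out
        except Exception:
            # Log the error and hand the message back instead of crashing.
            log.exception("job %s failed; returning it to the queue", tag)
            nack(tag)
            stats["failed"] += 1
        else:
            ack(tag)
            stats["ok"] += 1
    return stats
```

Requeueing a message that always fails will loop forever, so a retry limit or a dead-letter queue is the usual companion to this pattern. Also note that this only isolates failures inside one worker process; it would not explain why the management port stopped answering.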

Does anyone have an idea what happened?
How can I debug this scenario? I need a little help figuring out what went wrong.

So I cannot access localhost:8080, where the management portal should be, even though this normally works.
The RabbitMQ container itself is still running, since docker exec tools_rabbitmq_1 rabbitmqctl list_queues
works, and pinging the RabbitMQ container from the worker containers gets correct replies.
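A successful ping only shows that the containers can exchange ICMP packets; it says nothing about whether anything accepts TCP connections on a given port. A small TCP probe, run from the host against localhost:8080 and from inside a worker container against the RabbitMQ and PostgreSQL ports, can separate "connection refused" (nothing listening, or the port mapping is gone) from "timeout" (traffic silently dropped). This is a minimal sketch; probe_tcp is a hypothetical helper, and the hosts and ports you pass depend on your compose file.

```python
import socket


def probe_tcp(host: str, port: int, timeout: float = 3.0) -> str:
    """Attempt a plain TCP connect and classify the result.

    Returns "open", "refused", "timeout", or "error: <details>".
    """
    try:
        # create_connection resolves the name and connects with the given timeout.
        with socket.create_connection((host, port), timeout=timeout):
            return "open"
    except ConnectionRefusedError:
        # The host answered, but nothing is listening on that port.
        return "refused"
    except socket.timeout:
        # No answer within the timeout: traffic is dropped or the path is hung.
        return "timeout"
    except OSError as exc:
        # Anything else: name resolution failure, unreachable network, reset, ...
        return f"error: {exc}"
```

For example, probe_tcp("localhost", 8080) on the Windows host. Depending on the PostgreSQL version, probing the PostgreSQL port this way may itself show up as an "incomplete startup packet" line in the log, because the probe connects and disconnects without sending a startup message; a health check or port scanner doing the same is one possible source of that message.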

In the PostgreSQL logs I can see the following errors:

  • incomplete startup packet
  • could not receive data from client: An existing connection was forcibly closed by the remote host.
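Both messages appear to point at TCP connections being cut off rather than at a query-level error; "An existing connection was forcibly closed by the remote host" is the Windows wording of a connection reset. Whatever the root cause turns out to be, a worker can at least treat a database timeout as transient instead of fatal. The sketch below retries a connect call with capped exponential backoff and jitter. connect_with_retry is a hypothetical helper; connect stands for whatever opens your database connection (with psycopg2, for example, lambda: psycopg2.connect(dsn, connect_timeout=10)), and retry_on should list that driver's connection error types.

```python
import random
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def connect_with_retry(
    connect: Callable[[], T],
    retry_on: Tuple[Type[BaseException], ...] = (OSError,),
    attempts: int = 5,
    base_delay: float = 0.5,
    max_delay: float = 10.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call connect() until it succeeds, backing off between failures."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts):
        try:
            return connect()
        except retry_on:
            if attempt == attempts - 1:
                raise  # Out of attempts: surface the original error.
            # Delay doubles each time (0.5 s, 1 s, 2 s, ...) up to max_delay,
            # scaled by random jitter so workers do not retry in lockstep.
            delay = min(max_delay, base_delay * 2 ** attempt)
            sleep(delay * random.uniform(0.5, 1.0))
```

With psycopg2 you would pass retry_on=(psycopg2.OperationalError,). The jitter keeps several workers from hammering a database that is already struggling at the same instant.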

Docker currently runs on my Windows 10 laptop. I have the latest versions of Docker:

  • Docker version 17.03.1-ce, build c6d412e
  • Docker-compose version 1.11.2, build f963d76f

Thank you for any help in advance!

  1. Can you share a minimal test case that reproduces the problem reliably?
  2. Does the problem occur only in Docker for Windows, or also when running Docker on a stand-alone Linux machine? In the latter case, please open an issue at https://github.com/docker/docker . In the former case, please run diagnostics and provide the diagnostic ID together with the test case from 1) in a new issue here: https://github.com/docker/for-win/issues/new

Back at my desk.
Thank you for the reply.

I tried to create a minimal test application that mimics our setup, but I could not reproduce the issue.

Next I’ll try running this on my Ubuntu machine and report back.