We had an issue in our initial deployment where we didn’t allocate enough space to /var. The machine ran out of space about 6 hours after installing, so we were unable to log in to UCP first thing in the morning (kind of amusing, actually). We’re not entirely sure what happened, but the web interface would not start on the ‘master’ server. Everything else seemed to start; however, when querying from other machines, it didn’t appear that the swarm or UCP were fully converged/synced.
In the end, because of a deadline, we increased the space, removed all images (including manually removing some container folders), and restarted, which included rebuilding all nodes. We also immediately installed Zabbix to monitor the servers.
Recommend the following:
- Update the documentation to reflect which directories will be written to or installed into.
- Add some type of notification to users that space is low (assuming it’s not there already, which I assume it isn’t, since it’s reporting more space available than there is in reality).
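
For anyone who wants a stopgap until something like that exists, here is a rough sketch of the kind of low-space check I mean. It is not anything UCP provides; the function names and the 10% threshold are my own assumptions, and it only uses Python's standard `shutil.disk_usage`, so it reports what the filesystem itself says about the given path.

```python
import os
import shutil


def free_fraction(path="/var"):
    """Return the fraction of the filesystem holding `path` that is still free."""
    usage = shutil.disk_usage(path)
    return usage.free / usage.total


def is_low_on_space(path="/var", min_free_fraction=0.10):
    """True when free space drops below the threshold (0.10 is an arbitrary example)."""
    return free_fraction(path) < min_free_fraction


if __name__ == "__main__":
    # /var is the directory that filled up in our case; adjust as needed.
    for path in ("/var",):
        if not os.path.isdir(path):
            continue
        if is_low_on_space(path):
            print(f"WARNING: {path} has only {free_fraction(path):.1%} free")
        else:
            print(f"OK: {path} has {free_fraction(path):.1%} free")
```

Run from cron or a similar scheduler, something like this would at least have warned us before the morning login failed. A proper monitoring tool (we went with Zabbix) is the better long-term answer.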