We’ve been running a mixed-node (Linux/Windows) docker swarm for three years and have been happily upgrading docker engine versions without issues (i.e. running nodes on different docker and OS versions).
However, we recently updated some canary worker nodes in development running Fedora CoreOS to docker 20.10.5, while the rest of the fleet is on 19.03.13. These new canary nodes operate fine for up to 24 hours, but then all crash at the same time.
We note that the newer docker engine runs containers under containerd-shim-runc-v2, whereas the older nodes use containerd-shim-runc-v1. Are there compatibility issues with running these together? Is there any documentation on this?
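For anyone who wants to compare nodes the same way, here is a rough sketch of how the shim versions can be counted on a host. The helper name count_shims is made up for illustration; it only assumes that the shim binary name (containerd-shim-runc-v1 or containerd-shim-runc-v2) appears in each process's command line, which is what we see on our nodes.

```shell
# count_shims: hypothetical helper, not part of docker or containerd.
# Reads process command lines on stdin and prints one "name=count" line
# per containerd runc shim version found.
count_shims() {
  grep -o 'containerd-shim-runc-v[0-9]*' \
    | sort \
    | uniq -c \
    | awk '{ print $2 "=" $1 }'
}

# Typical use on a node (full command lines, no header row):
#   ps -eo args= | count_shims
```

Running this on an old node and a new node should make the v1/v2 difference visible at a glance. It says nothing about whether the difference is the cause of the crashes.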
- Issue type - Worker Node Docker Engine Panics & Core Dumps
- OS Version/build - Fedora CoreOS Next Stream (Fedora 34) 34.20210328.1.0
- App version - Docker Engine 20.10.5
- Steps to reproduce - Unable to isolate completely yet. We have seen:
Option 1: Wait 24 hours with containers running. The nodes fail spontaneously without anyone touching the management plane.
Option 2: Occasionally, running docker service update --force <service> will crash the new nodes.