Docker: how to run multi-node training

I am trying to run multi-node training with NGC’s PyTorch container on DGX nodes. The nodes are connected via InfiniBand, and the container hosts can connect to each other.
The command I used to create the Docker container is:
sudo docker run --gpus all -it --user root --shm-size=1g --network=host -v /workspace/Megatron-LM-NEO:/workspace/megatron -v /workspace/data_processed:/workspace/data_processed -v /workspace/falcon-refinedweb:/workspace/dataset -v /workspace/general_training/out:/workspace/checkpoints -v /root/.ssh/:/root/.ssh nvcr.io/nvidia/pytorch:23.10-py3
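For comparison, here is a sketch of the same `docker run` with extra flags that are commonly used for multi-node NCCL over InfiniBand. The assumptions are that the host exposes its IB adapters under `/dev/infiniband`, and that `--ipc=host` plus the `--ulimit` values (which the NGC PyTorch container's startup banner recommends) are acceptable on your nodes. By default the script only prints the command (`DRY_RUN=1`, a hypothetical toggle); set `DRY_RUN=0` to actually start the container.

```shell
#!/usr/bin/env bash
# Sketch: the original docker run plus flags often used for NCCL over InfiniBand.
# Assumption: IB devices are at /dev/infiniband on each host.
set -euo pipefail

DOCKER_ARGS=(
  --gpus all -it --user root
  --network=host            # container shares the host's network stack and IPs
  --ipc=host                # host shared memory instead of --shm-size=1g
  --ulimit memlock=-1       # allow pinning memory for RDMA
  --ulimit stack=67108864
  --device=/dev/infiniband  # expose IB adapters so NCCL can use IB, not TCP sockets
  -v /workspace/Megatron-LM-NEO:/workspace/megatron
  -v /workspace/data_processed:/workspace/data_processed
  -v /workspace/falcon-refinedweb:/workspace/dataset
  -v /workspace/general_training/out:/workspace/checkpoints
  -v /root/.ssh/:/root/.ssh
)
IMAGE="nvcr.io/nvidia/pytorch:23.10-py3"

# DRY_RUN is a hypothetical toggle: 1 prints the command, 0 runs it.
if [[ "${DRY_RUN:-1}" == "1" ]]; then
  echo sudo docker run "${DOCKER_ARGS[@]}" "${IMAGE}"
else
  sudo docker run "${DOCKER_ARGS[@]}" "${IMAGE}"
fi
```

Keeping the flags in a Bash array preserves each argument exactly, so the printed command and the executed one stay identical.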

I found that from inside the container I can only SSH to the other container hosts, not to the other Docker containers themselves. I suspect this is what makes my multi-node training hang. Does anyone have an idea how I could solve it?
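One possible angle: with `--network=host` the container shares the host's IP, so SSH to that address reaches the host's own sshd rather than the container. Launchers such as `torchrun` with `--master_addr`/`--master_port` do not need SSH between containers, because every node connects over TCP to the master's address and port. The sketch below only prints the per-node `torchrun` command. Every default is hypothetical: the master IP `10.0.0.1`, port `29500`, 8 GPUs per node, and `train.py` as the entry point. `NCCL_DEBUG=INFO` makes NCCL log which network transport it picks, so you can check whether it uses IB or falls back to sockets.

```shell
#!/usr/bin/env bash
# Sketch: print the per-node torchrun command for a static multi-node job.
# All defaults below are hypothetical; override them via environment variables.
set -euo pipefail

NNODES="${NNODES:-2}"
GPUS_PER_NODE="${GPUS_PER_NODE:-8}"
MASTER_ADDR="${MASTER_ADDR:-10.0.0.1}"  # hypothetical host IP of node 0
MASTER_PORT="${MASTER_PORT:-29500}"     # must be reachable from every node
SCRIPT="${SCRIPT:-train.py}"            # hypothetical training entry point

# Build the torchrun command for one node; ranks go from 0 to NNODES-1.
launch_cmd() {
  local node_rank="$1"
  echo "NCCL_DEBUG=INFO torchrun --nnodes=${NNODES} --nproc_per_node=${GPUS_PER_NODE}" \
       "--node_rank=${node_rank} --master_addr=${MASTER_ADDR}" \
       "--master_port=${MASTER_PORT} ${SCRIPT}"
}

# Print what to run inside the container on each node.
for ((rank = 0; rank < NNODES; rank++)); do
  echo "node ${rank}: $(launch_cmd "${rank}")"
done
```

Run the printed line for node 0 inside the container on the master host, and the line for node 1 inside the container on the other host. If training still hangs, the NCCL log is usually the first place to see whether the nodes reached each other at all.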