Following the instructions from the quick start for Ray on Kubernetes, I got errors while deploying the RayCluster:
> ✗ kubectl describe pod raycluster-kuberay-head-fx4c9
      RAY_CLUSTER_NAME: (v1:metadata.labels['ray.io/cluster'])
      RAY_CLOUD_INSTANCE_ID: raycluster-kuberay-head-fx4c9 (v1:metadata.name)
      RAY_NODE_TYPE_NAME: (v1:metadata.labels['ray.io/group'])
      KUBERAY_GEN_RAY_START_CMD: ray start --head --metrics-export-port=8080 --block --dashboard-agent-listen-port=52365 --num-cpus=1 --memory=2000000000 --dashboard-host=0.0.0.0
      RAY_PORT: 6379
      RAY_ADDRESS: 127.0.0.1:6379
      RAY_USAGE_STATS_KUBERAY_IN_USE: 1
      RAY_USAGE_STATS_EXTRA_TAGS: kuberay_version=v1.2.1;kuberay_crd=RayCluster
      REDIS_PASSWORD:
      RAY_DASHBOARD_ENABLE_K8S_DISK_USAGE: 1
    Mounts:
      /dev/shm from shared-mem (rw)
      /tmp/ray from log-volume (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-8gxsm (ro)
Conditions:
  Type                        Status
  PodReadyToStartContainers   True
  Initialized                 True
  Ready                       False
  ContainersReady             False
  PodScheduled                True
Volumes:
  log-volume:
    Type: EmptyDir (a temporary directory that shares a pod's lifetime)
    Medium:
    SizeLimit: <unset>
  shared-mem:
    Type: EmptyDir (a temporary directory that shares a pod's lifetime)
    Medium: Memory
    SizeLimit: 2G
  kube-api-access-8gxsm:
    Type: Projected (a volume that contains injected data from multiple sources)
    TokenExpirationSeconds: 3607
    ConfigMapName: kube-root-ca.crt
    ConfigMapOptional: <nil>
    DownwardAPI: true
QoS Class: Guaranteed
Node-Selectors: <none>
Tolerations: node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
             node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
  Type     Reason     Age                   From               Message
  ----     ------     ----                  ----               -------
  Normal   Scheduled  14m                   default-scheduler  Successfully assigned default/raycluster-kuberay-head-fx4c9 to minikube
  Warning  Failed     14m (x2 over 14m)     kubelet            Failed to pull image "rayproject/ray:2.34.0": Error response from daemon: Get "https://registry-1.docker.io/v2/": context deadline exceeded
  Normal   Pulling    12m (x4 over 14m)     kubelet            Pulling image "rayproject/ray:2.34.0"
  Warning  Failed     12m (x4 over 14m)     kubelet            Error: ErrImagePull
  Warning  Failed     12m (x2 over 13m)     kubelet            Failed to pull image "rayproject/ray:2.34.0": Error response from daemon: Get "https://registry-1.docker.io/v2/": net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)
  Warning  Failed     12m (x6 over 14m)     kubelet            Error: ImagePullBackOff
  Normal   BackOff    4m52s (x36 over 14m)  kubelet            Back-off pulling image "rayproject/ray:2.34.0"
> ✗ kubectl logs raycluster-kuberay-head-fx4c9
Error from server (BadRequest): container "ray-head" in pod "raycluster-kuberay-head-fx4c9" is waiting to start: trying and failing to pull image
I also tried a minikube cluster instead of the kind cluster, but got the same output after installing the RayCluster chart with:
helm install raycluster kuberay/ray-cluster --version 1.2.1
I don't understand why the describe pod output reports a request timeout for https://registry-1.docker.io/v2/, because pulling the image succeeds when I run docker directly:
> ✗ docker pull rayproject/ray:2.34.0
2.34.0: Pulling from rayproject/ray
9b857f539cb1: Pull complete
6385f74c231a: Pull complete
d93807efc02c: Pull complete
4f4fb700ef54: Pull complete
e00095e97bc6: Pull complete
62bc57cef369: Pull complete
0c606dbe74e6: Pull complete
0d71e9581bc0: Pull complete
83d53ff2b179: Pull complete
Digest: sha256:d3f0831b510ce4569499540117a6d1b7c36b9f924616097657364d0c8f069f15
Status: Downloaded newer image for rayproject/ray:2.34.0
docker.io/rayproject/ray:2.34.0
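One workaround I am considering (not verified yet), assuming the problem is that the cluster node, rather than my host, cannot reach registry-1.docker.io: copy the image that the host Docker daemon already pulled into the node, so the kubelet finds it locally instead of pulling it. `minikube image load` and `kind load docker-image` both do this. The wrapper `sideload_image` and its `DRY_RUN` switch below are a hypothetical helper of my own, not part of KubeRay, and for kind you may need `--name <cluster>` if the cluster is not named `kind`.

```shell
#!/bin/sh
# Hypothetical helper: copy an image the host Docker daemon has already pulled
# into the local cluster node, so the kubelet does not have to reach
# registry-1.docker.io itself. Set DRY_RUN=1 to print the command instead of running it.
sideload_image() {
  cluster_type="$1"  # "minikube" or "kind"
  image="$2"         # e.g. rayproject/ray:2.34.0
  case "$cluster_type" in
    minikube) set -- minikube image load "$image" ;;
    # For a kind cluster not named "kind", add: --name <cluster>
    kind)     set -- kind load docker-image "$image" ;;
    *)
      echo "unsupported cluster type: $cluster_type" >&2
      return 1
      ;;
  esac
  if [ "${DRY_RUN:-0}" = 1 ]; then
    echo "$*"    # show the command that would run
  else
    "$@"         # run it
  fi
}

# Usage, after `docker pull rayproject/ray:2.34.0` has succeeded on the host:
#   sideload_image minikube rayproject/ray:2.34.0
#   sideload_image kind rayproject/ray:2.34.0
```

This only helps if the head pod's imagePullPolicy is IfNotPresent rather than Always; with Always the kubelet would still try to contact Docker Hub.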
Thanks for your suggestion.