DLRover can automatically train the Deep Learning model on the distributed cluster.
It helps model developers to focus on model architecture, without taking care of any engineering stuff, say, hardware acceleration, distributed running, etc. Now, it provides automated operation and maintenance for deep learning training jobs on K8s/Ray.