Train models and monitor live metrics. Distribute jobs across one or more GPUs. Archive all experiments, compare results, and safely publish the best models to production.