mcore-run-on-slurm
How to launch distributed Megatron-LM training jobs on a SLURM cluster. Covers a minimal sbatch skeleton, environment-variable setup for torch.distributed.run, CUDA_DEVICE_MAX_CONNECTIONS rules across hardware and parallelism modes, container conventions, monitoring, and per-rank failure diagnosis.
Runtime dependencies
No special dependencies
Install command
Official: npx clawhub@latest install mcore-run-on-slurm
Mirror: npx clawhub@latest install mcore-run-on-slurm --registry https://cn.longxiaskill.com