Set Up and Test MSCCL
This article explains how to get MSCCL up and running on a cluster. It assumes that the cluster's lmod installation provides the necessary dependencies (CUDA, MPI, etc.) as modules. If you’re running on a cluster without lmod, you may need to install these dependencies manually.
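Before going further, it can help to confirm that the lmod assumption holds on your cluster. The snippet below is a minimal sketch, not part of MSCCL: the helper name `have_module_cmd` is made up for illustration, and the search terms `cuda` and `mpi` are assumptions, since module names differ from cluster to cluster.

```shell
# Hypothetical helper: succeeds if a `module` command (as provided by lmod)
# is defined in the current shell.
have_module_cmd() {
    type module >/dev/null 2>&1
}

if have_module_cmd; then
    # Module names vary between clusters, so search instead of assuming a name.
    # lmod prints its listings to stderr, hence the redirect.
    module avail cuda 2>&1
    module avail mpi 2>&1
else
    echo "No 'module' command found; install CUDA and MPI manually."
fi
```

If the search lists a CUDA and an MPI module, the rest of this article should apply as written. Otherwise, treat the `module load` steps as placeholders for your own installation.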
Build Parts
The first part of this article covers building the components that MSCCL needs.
Set Environment Variables
If you’re running on a cluster with lmod, you can load the CUDA module: