Many deep learning applications can be trained on a single GPU, while others require multiple GPUs. In general, SCC users are limited to using a maximum of 4 GPUs on a single node unless they have access to reserved buy-in resources. If your application requires more GPUs than are available on the SCC, we recommend requesting resources from ACCESS. For information about getting credits to access these resources, please see this page. We advise using the NCSA Delta system for distributed multi-node GPU computations. The full list of resource providers can be found here.

Each of these clusters uses the SLURM workload manager to schedule batch jobs. We provide an overview of each system's hardware below. For specific details on using SLURM on these clusters and for complete hardware specifications, please follow the documentation links in each section. At the end of this page we provide a link to a GitHub repository containing two example codes that demonstrate how to run multi-node, multi-GPU distributed computations on the Delta system.
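On SLURM-managed clusters, a training script launched with one task per GPU can discover its place in the job from environment variables that SLURM sets for each task. The sketch below is a generic, hedged illustration only (it is not taken from any of these clusters' documentation, and it assumes the batch script exports MASTER_ADDR and MASTER_PORT for the rendezvous); it shows how a PyTorch script might initialize its process group from those variables.

```python
# Hypothetical sketch: initializing torch.distributed from SLURM environment
# variables when the job is launched with one task per GPU (e.g. via srun).
# MASTER_ADDR and MASTER_PORT are assumed to be exported by the batch script.
import os
import torch
import torch.distributed as dist

def init_distributed():
    # SLURM exposes the global task index, total task count, and the
    # task index local to the node through these environment variables.
    rank = int(os.environ["SLURM_PROCID"])
    world_size = int(os.environ["SLURM_NTASKS"])
    local_rank = int(os.environ["SLURM_LOCALID"])

    # Bind this process to one GPU on its node.
    torch.cuda.set_device(local_rank)

    # NCCL is the usual backend for multi-GPU, multi-node training.
    dist.init_process_group(
        backend="nccl",
        init_method="env://",  # reads MASTER_ADDR and MASTER_PORT
        rank=rank,
        world_size=world_size,
    )
    return rank, world_size, local_rank
```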


Delta

The University of Illinois NCSA Delta system is designed to run applications on GPU nodes or hybrid CPU-GPU nodes. The Delta system consists of 5 node types; the 4 GPU node types relevant here are:

Number of nodes | GPUs per node | GPU type | Memory per GPU
100 | 4 | NVIDIA A40 | 48 GB
100 | 4 | NVIDIA A100 | 40 GB
6 | 8 | NVIDIA A100 | 40 GB
1 | 8 | AMD MI100 | 32 GB

Users can log on to the Delta system by following the instructions at this link. Delta users can SSH to command-line login nodes; alternatively, there is an Open OnDemand interface similar to the one on the SCC.

Example GPU codes for Delta

Follow this GitHub link for documentation on the example codes.
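The linked repository contains the authoritative examples. Purely as a generic sketch (not the contents of that repository, and using placeholder model, data, and hyperparameters), a training script that has initialized its process group as shown earlier typically wraps its model in DistributedDataParallel and shards the data with a DistributedSampler:

```python
# Generic sketch of a distributed training loop; the model and data are toys.
import torch
import torch.nn as nn
from torch.nn.parallel import DistributedDataParallel as DDP
from torch.utils.data import DataLoader, TensorDataset, DistributedSampler

def train(local_rank, epochs=2):
    device = torch.device(f"cuda:{local_rank}")

    # Toy model standing in for a real network; DDP synchronizes gradients
    # across all processes in the job.
    model = nn.Linear(32, 1).to(device)
    model = DDP(model, device_ids=[local_rank])

    # Synthetic dataset; DistributedSampler gives each process a disjoint shard.
    data = TensorDataset(torch.randn(1024, 32), torch.randn(1024, 1))
    sampler = DistributedSampler(data)
    loader = DataLoader(data, batch_size=64, sampler=sampler)

    opt = torch.optim.SGD(model.parameters(), lr=0.01)
    loss_fn = nn.MSELoss()

    for epoch in range(epochs):
        sampler.set_epoch(epoch)  # reshuffle shards each epoch
        for x, y in loader:
            x, y = x.to(device), y.to(device)
            opt.zero_grad()
            loss = loss_fn(model(x), y)
            loss.backward()  # gradients are all-reduced across processes here
            opt.step()
```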

 

Additional ACCESS GPU resources

Rockfish

The JHU Rockfish system is a community-shared cluster housed at the Maryland Advanced Research Computing Center (MARCC) in Baltimore. The available GPU nodes are:

Number of nodes | GPUs per node | GPU type | Memory per GPU
18 | 4 | NVIDIA A100 | 40 GB
6 | 4 | NVIDIA A100 | 80 GB

FASTER

The TAMU FASTER system is a Dell x86 HPC cluster consisting of 180 compute nodes. Researchers with allocations on FASTER can request up to 10 composable GPUs, meaning that GPU resources are attached to a compute node on the fly. The GPU types that can be composed onto the compute nodes are:

Number of GPUs | GPU type | Memory per GPU
200 | NVIDIA T4 | 16 GB
40 | NVIDIA A100 | 40 GB
8 | NVIDIA A10 | 24 GB
4 | NVIDIA A30 | 24 GB
8 | NVIDIA A40 | 48 GB
