Topics
- Multiprocessing with CPUs
- Compile a program parallelized with OpenMP directives
- Compile a program parallelized with MPI
- Compiling 32-bit executables
- Multiprocessing with GPUs
Multiprocessing with CPUs
The use of multiple CPUs to achieve parallel speedup has been practiced for decades. Mature enabling paradigms and associated software are well understood and broadly adopted. The most commonly used paradigms, both supported on the SCC, are the Message Passing Interface (MPI) for distributed-memory systems and OpenMP for shared-memory, thread-based systems. These paradigms support common languages — C, C++, and Fortran — for which you can build executables with the SCC-provided GNU and PGI families of compilers, as well as the Intel compilers available through modules. Please visit the Compilers page for more details, such as how to optimize the performance of your code.
Compile a program parallelized with OpenMP directives
- For GNU compilers, use the -fopenmp compiler flag to activate the OpenMP paradigm:
scc1$ gfortran myprogram.f              <-- OpenMP not turned on
scc1$ gfortran -fopenmp myprogram.f     <-- OpenMP turned on
scc1$ gfortran -fopenmp myprogram.f90   <-- OpenMP turned on
scc1$ gcc -fopenmp myprogram.c          <-- OpenMP turned on
scc1$ g++ -fopenmp myprogram.C          <-- OpenMP turned on
The default executable name is a.out. Use -o my-executable to assign a specific name. Whenever possible, use -O3 for the highest level of code optimization. See Compilers for more options.
- For PGI compilers, use the -mp compiler flag to activate the OpenMP paradigm:
scc1$ pgfortran myprogram.f         <-- OpenMP not turned on
scc1$ pgfortran -mp myprogram.f     <-- OpenMP turned on
scc1$ pgfortran -mp myprogram.f90   <-- OpenMP turned on
scc1$ pgcc -mp myprogram.c          <-- OpenMP turned on
scc1$ pgc++ -mp myprogram.C         <-- OpenMP turned on
The default executable name is a.out. Use -o my-executable to assign a specific name. Whenever possible, use -O3 for the highest level of code optimization. See Compilers for more options.
- For Intel compilers, use the -openmp compiler flag to activate the OpenMP paradigm:
scc1$ module load intel/2016        <-- load the Intel compiler module
scc1$ ifort myprogram.f             <-- OpenMP not turned on
scc1$ ifort -openmp myprogram.f     <-- OpenMP turned on
scc1$ ifort -openmp myprogram.f90   <-- OpenMP turned on
scc1$ icc -openmp myprogram.c       <-- OpenMP turned on
scc1$ icpc -openmp myprogram.C      <-- OpenMP turned on
The default executable name is a.out. Use -o my-executable to assign a specific name. Whenever possible, use -fast for the highest level of code optimization. See Compilers for more options.
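For reference, here is a minimal OpenMP C program that the flags above can be applied to (the file name omp_hello.c is illustrative). Each thread prints its ID; the thread count is taken from OMP_NUM_THREADS at run time:

/* omp_hello.c -- minimal OpenMP example (illustrative file name) */
#include <omp.h>
#include <stdio.h>

int main(void)
{
    #pragma omp parallel      /* fork a team of threads */
    {
        printf("Hello from thread %d of %d\n",
               omp_get_thread_num(),    /* this thread's ID      */
               omp_get_num_threads());  /* threads in the team   */
    }
    return 0;
}

Built with, for example, gcc -fopenmp -O3 -o omp_hello omp_hello.c, the program prints one line per thread.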
Running OpenMP jobs
For program development and debugging purposes, short OpenMP jobs may run on the login nodes. These jobs are limited to 4 processors and 10 minutes of CPU time per processor. Jobs exceeding these limits will be terminated automatically by the system. All other jobs (> 10 minutes and/or > 4 threads) should run in batch. (See the Running Jobs page.)
- Run executable a.out on a login node:
scc1$ setenv OMP_NUM_THREADS 2    <-- set thread count (for tcsh users)
scc1$ export OMP_NUM_THREADS=2    <-- set thread count (for bash users)
scc1$ ./a.out
- Run executable a.out on a compute node in batch:
scc1$ setenv OMP_NUM_THREADS 2    <-- set thread count (for tcsh users)
scc1$ export OMP_NUM_THREADS=2    <-- set thread count (for bash users)
scc1$ qsub -pe omp 2 -V -b y ./a.out
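For anything beyond a quick test, the qsub options shown above are usually collected in a job script instead. A minimal sketch, assuming the same omp parallel environment (the script name, job name, and slot count are illustrative; see the Running Jobs page for the full set of directives):

#!/bin/bash
# omp_job.sh -- illustrative batch script for an OpenMP run
#$ -pe omp 4                      # request a 4-slot OpenMP parallel environment
#$ -N omp_hello                   # job name (illustrative)
export OMP_NUM_THREADS=$NSLOTS    # match thread count to the granted slots
./a.out

scc1$ qsub omp_job.sh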
Compile a program parallelized with MPI
Compiling an MPI-enabled program requires the directory paths from which the compiler can find the necessary header file (e.g., mpi.h) and the MPI library. For ease of compilation, this additional information is built into the wrapper scripts mpif77, mpif90, mpicc, and mpicxx for the respective languages they serve: Fortran 77, Fortran 90/95/03, C, and C++. By default, these wrappers are linked to the GNU compilers. For example, mpicc is, by default, linked to the gcc compiler, while mpif90 points to the gfortran compiler. Switching to the PGI compilers is accomplished by specifying the selection through the environment variable MPI_COMPILER. Note that an undefined (unset) MPI_COMPILER points the wrappers to their respective GNU compilers. To compile an MPI program with the Intel compiler, use module commands to load the Intel compiler and a corresponding MPI implementation.
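For concreteness, the compile steps below can be tried on a minimal MPI program such as the following sketch (the file name mpi_hello.c is illustrative). Each rank reports its rank number and the total number of ranks:

/* mpi_hello.c -- minimal MPI example (illustrative file name) */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    int rank, size;
    MPI_Init(&argc, &argv);                 /* start the MPI runtime  */
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);   /* this process's rank    */
    MPI_Comm_size(MPI_COMM_WORLD, &size);   /* total ranks in the job */
    printf("Hello from rank %d of %d\n", rank, size);
    MPI_Finalize();                         /* shut down MPI cleanly  */
    return 0;
}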
- To make the MPI wrappers compile with GNU compilers:
Step 1. The MPI wrappers will use the GNU compilers if MPI_COMPILER is either unset or set to gnu:
scc1$ setenv MPI_COMPILER gnu    <-- select GNU compilers (for tcsh users)
scc1$ export MPI_COMPILER=gnu    <-- select GNU compilers (for bash users)
Step 2. Compile with the MPI wrappers:
scc1$ mpif77 myprogram.f
scc1$ mpif90 myprogram.f90
scc1$ mpicc myprogram.c
scc1$ mpicxx myprogram.C
- To make the MPI wrappers compile with PGI compilers:
Step 1. Setting MPI_COMPILER to pgi makes the wrappers compile with the PGI compilers:
scc1$ setenv MPI_COMPILER pgi    <-- select PGI compilers (for tcsh users)
scc1$ export MPI_COMPILER=pgi    <-- select PGI compilers (for bash users)
Step 2. Compile with the MPI wrappers:
scc1$ mpif77 myprogram.f
scc1$ mpif90 myprogram.f90
scc1$ mpicc myprogram.c
scc1$ mpicxx myprogram.C
- To compile an MPI program with Intel compilers:
Step 1. Use the module command to load the Intel compiler and a corresponding MPI implementation:
scc1$ module load intel/2016                  <-- load the Intel compiler
scc1$ module load openmpi/1.10.1_intel2016    <-- load OpenMPI configured with the Intel compiler
Step 2. Compile with the MPI wrappers:
scc1$ mpifort myprogram.f
scc1$ mpicc myprogram.c
scc1$ mpicxx myprogram.C
To switch back to the previous compiler (GNU or PGI), use the module command to remove the Intel compiler and the MPI implementation that were loaded:
scc1$ module rm intel/2016                    <-- remove the Intel compiler
scc1$ module rm openmpi/1.10.1_intel2016      <-- remove OpenMPI configured with the Intel compiler
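At any point, the standard module list command shows which compiler and MPI modules are currently loaded:
scc1$ module list    <-- list currently loaded modules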
- To check which compiler is currently in use:
Pass the -show option to the MPI wrappers:
scc1$ mpicc -show     <-- show the real command hidden in the wrapper
scc1$ mpif90 -show    <-- show the real command hidden in the wrapper
or query the path to the wrappers using the which command:
scc1$ which mpicc     <-- show the path to mpicc
scc1$ which mpif90    <-- show the path to mpif90
Running MPI jobs
For program development and debugging purposes, short MPI jobs may run on the login nodes. These jobs are limited to 4 processors and 10 minutes of CPU time per processor. All other MPI jobs (> 10 minutes and/or > 4 processes) should run in batch. (See the Running Jobs page for more.)
- Run MPI executable a.out on a login node:
scc1$ mpirun -np 4 ./a.out
- Run MPI executable a.out on a compute node in batch:
scc1$ qsub -pe mpi_4_tasks_per_node 4 -b y "mpirun -np 4 ./a.out"
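As with OpenMP jobs, the batch options can be placed in a job script instead. A minimal sketch, assuming the same parallel environment as above (the script and job names are illustrative; see the Running Jobs page):

#!/bin/bash
# mpi_job.sh -- illustrative batch script for an MPI run
#$ -pe mpi_4_tasks_per_node 4    # request 4 MPI tasks on one node
#$ -N mpi_hello                  # job name (illustrative)
mpirun -np $NSLOTS ./a.out       # launch one rank per granted slot

scc1$ qsub mpi_job.sh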
Notes
- If you always use the GNU family of compilers, none of the MPI_COMPILER settings described above is needed because the MPI wrappers point to the GNU compilers by default.
- On the other hand, if you always use the PGI compilers for MPI compilation, you can permanently set MPI_COMPILER to pgi in your .cshrc or .bashrc shell script.
- If you use the PGI compilers, it is important to read the page on the PGI compilers' impact on job performance and portability.
- If you want to use the Intel compiler, use module commands to load it and its corresponding MPI implementation.
- For MPI options, please consult the specific wrapper manpage. For example,
scc1$ mpiman mpicc
Compiling 32-bit executables
By default, all compilers on the SCC produce 64-bit executables. To build 32-bit executables, add the compiler flag -m32:
- Examples for building 32-bit GNU executables:
scc1$ gcc -m32 -fopenmp myexample.c    <-- for OpenMP code
scc1$ mpicc -m32 myexample.c           <-- for MPI code
- Examples for building 32-bit PGI executables:
scc1$ pgcc -m32 -mp myexample.c        <-- for OpenMP code
scc1$ mpicc -m32 myexample.c           <-- for MPI code
- Examples for building 32-bit Intel executables:
scc1$ icc -m32 -openmp myexample.c     <-- for OpenMP code
scc1$ mpicc -m32 myexample.c           <-- for MPI code
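To confirm the word size of a resulting executable, the standard Linux file utility can be used; a build with -m32 is reported as a 32-bit ELF executable, while a default build is reported as 64-bit (the output file name below is illustrative):
scc1$ gcc -m32 -fopenmp -o myexample32 myexample.c
scc1$ file myexample32    <-- should report a 32-bit ELF executable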
Multiprocessing with GPUs
Modern GPUs (graphics processing units) provide the ability to perform computations in applications traditionally handled by CPUs. Using GPUs is rapidly becoming a new standard for data-parallel, heterogeneous computing software in science and engineering. Many existing applications have been adapted to make effective use of multi-threaded GPUs. (See GPU Computing for details.)