Topics
- Multiprocessing with CPUs
- Compile a program parallelized with OpenMP directives
- Compile a program parallelized with MPI
- Compiling 32-bit executables
- Multiprocessing with GPUs
Multiprocessing with CPUs
The use of multiple CPUs to achieve parallel speedup has been practiced for decades. Mature enabling paradigms and associated software are well understood and broadly adopted. The most commonly used paradigms, both supported on the SCC, are the Message Passing Interface (MPI) for distributed-memory systems and OpenMP for shared-memory, thread-based systems. These paradigms support common languages — C, C++, and Fortran — for which you can build executables with the SCC-provided GNU and PGI families of compilers, as well as the Intel compilers available through modules. Please visit the Compilers page for more details, such as how to optimize the performance of your code.
Compile a program parallelized with OpenMP directives
- For GNU compilers, use the -fopenmp compiler flag to activate the OpenMP paradigm:
scc1$ gfortran myprogram.f              <-- OpenMP not turned on
scc1$ gfortran -fopenmp myprogram.f     <-- OpenMP turned on
scc1$ gfortran -fopenmp myprogram.f90   <-- OpenMP turned on
scc1$ gcc -fopenmp myprogram.c          <-- OpenMP turned on
scc1$ g++ -fopenmp myprogram.C          <-- OpenMP turned on
The default executable name is a.out. Use -o my-executable to assign a specific name. Whenever possible, use -O3 for the highest level of code optimization. See Compilers for more options.
- For PGI compilers, use the -mp compiler flag to activate the OpenMP paradigm:
scc1$ pgfortran myprogram.f         <-- OpenMP not turned on
scc1$ pgfortran -mp myprogram.f     <-- OpenMP turned on
scc1$ pgfortran -mp myprogram.f90   <-- OpenMP turned on
scc1$ pgcc -mp myprogram.c          <-- OpenMP turned on
scc1$ pgc++ -mp myprogram.C         <-- OpenMP turned on
The default executable name is a.out. Use -o my-executable to assign a specific name. Whenever possible, use -O3 for the highest level of code optimization. See Compilers for more options.
- For Intel compilers, use the -openmp compiler flag to activate the OpenMP paradigm:
scc1$ module load intel/2016        <-- load the Intel compiler module
scc1$ ifort myprogram.f             <-- OpenMP not turned on
scc1$ ifort -openmp myprogram.f     <-- OpenMP turned on
scc1$ ifort -openmp myprogram.f90   <-- OpenMP turned on
scc1$ icc -openmp myprogram.c       <-- OpenMP turned on
scc1$ icpc -openmp myprogram.C      <-- OpenMP turned on
The default executable name is a.out. Use -o my-executable to assign a specific name. Whenever possible, use -fast for the highest level of code optimization. See Compilers for more options.
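For reference, here is a minimal OpenMP C program that the flags above can be applied to (the file name omp_hello.c is illustrative). Each thread prints its ID; the thread count is taken from OMP_NUM_THREADS at run time:

/* omp_hello.c -- minimal OpenMP example (illustrative file name) */
#include <omp.h>
#include <stdio.h>

int main(void)
{
    #pragma omp parallel      /* fork a team of threads */
    {
        printf("Hello from thread %d of %d\n",
               omp_get_thread_num(),    /* this thread's ID      */
               omp_get_num_threads());  /* threads in the team   */
    }
    return 0;
}

Built with, for example, gcc -fopenmp -O3 -o omp_hello omp_hello.c, the program prints one line per thread.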
Running OpenMP jobs
For program development and debugging purposes, short OpenMP jobs may run on the login nodes. These jobs are limited to 4 processors and 10 minutes of CPU time per processor. Jobs exceeding these limits will be terminated automatically by the system. All other jobs (> 10 minutes and/or > 4 threads) should run in batch. (See the Running Jobs page.)
- Run executable a.out on a login node:
scc1$ setenv OMP_NUM_THREADS 2    <-- set thread count (for tcsh users)
scc1$ export OMP_NUM_THREADS=2    <-- set thread count (for bash users)
scc1$ ./a.out
- Run executable a.out on a compute node in batch:
scc1$ setenv OMP_NUM_THREADS 2    <-- set thread count (for tcsh users)
scc1$ export OMP_NUM_THREADS=2    <-- set thread count (for bash users)
scc1$ qsub -pe omp 2 -V -b y ./a.out
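For anything beyond a quick test, the qsub options shown above are usually collected in a job script instead. A minimal sketch, assuming the same omp parallel environment (the script name, job name, and slot count are illustrative; see the Running Jobs page for the full set of directives):

#!/bin/bash
# omp_job.sh -- illustrative batch script for an OpenMP run
#$ -pe omp 4                      # request a 4-slot OpenMP parallel environment
#$ -N omp_hello                   # job name (illustrative)
export OMP_NUM_THREADS=$NSLOTS    # match thread count to the granted slots
./a.out

scc1$ qsub omp_job.sh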
Compile a program parallelized with MPI
Compiling an MPI-enabled program requires the directory paths from which the compiler can find the necessary header file (e.g., mpi.h) and the MPI library. For ease of compilation, this additional information is built into the wrapper scripts mpif77, mpif90, mpicc, and mpicxx for the respective languages they serve: Fortran 77, Fortran 90/95/03, C, and C++. By default, these wrappers are linked to the GNU compilers. For example, mpicc is, by default, linked to the gcc compiler, while mpif90 points to the gfortran compiler. Switching to the PGI compilers is accomplished by specifying the selection through the environment variable MPI_COMPILER. Note that an undefined (unset) MPI_COMPILER points the wrappers to their respective GNU compilers. To compile an MPI program with the Intel compiler, use module commands to load the Intel compiler and a corresponding MPI implementation.
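For concreteness, the compile steps below can be tried on a minimal MPI program such as the following sketch (the file name mpi_hello.c is illustrative). Each rank reports its rank number and the total number of ranks:

/* mpi_hello.c -- minimal MPI example (illustrative file name) */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    int rank, size;
    MPI_Init(&argc, &argv);                 /* start the MPI runtime  */
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);   /* this process's rank    */
    MPI_Comm_size(MPI_COMM_WORLD, &size);   /* total ranks in the job */
    printf("Hello from rank %d of %d\n", rank, size);
    MPI_Finalize();                         /* shut down MPI cleanly  */
    return 0;
}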
- To make the MPI wrappers compile with GNU compilers:
Step 1. The MPI wrappers will use the GNU compilers if MPI_COMPILER is either unset or set to gnu:
scc1$ setenv MPI_COMPILER gnu    <-- select GNU compilers (for tcsh users)
scc1$ export MPI_COMPILER=gnu    <-- select GNU compilers (for bash users)
Step 2. Compile with the MPI wrappers:
scc1$ mpif77 myprogram.f
scc1$ mpif90 myprogram.f90
scc1$ mpicc myprogram.c
scc1$ mpicxx myprogram.C
- To make the MPI wrappers compile with PGI compilers:
Step 1. Setting MPI_COMPILER to pgi makes the wrappers compile with the PGI compilers:
scc1$ setenv MPI_COMPILER pgi    <-- select PGI compilers (for tcsh users)
scc1$ export MPI_COMPILER=pgi    <-- select PGI compilers (for bash users)
Step 2. Compile with the MPI wrappers:
scc1$ mpif77 myprogram.f
scc1$ mpif90 myprogram.f90
scc1$ mpicc myprogram.c
scc1$ mpicxx myprogram.C
- To compile an MPI program with Intel compilers:
Step 1. Use the module command to load the Intel compiler and a corresponding MPI implementation:
scc1$ module load intel/2016                  <-- load the Intel compiler
scc1$ module load openmpi/1.10.1_intel2016    <-- load OpenMPI configured with the Intel compiler
Step 2. Compile with the MPI wrappers:
scc1$ mpifort myprogram.f
scc1$ mpicc myprogram.c
scc1$ mpicxx myprogram.C
To switch back to the previous compiler (GNU or PGI), use the module command to remove the Intel compiler and the MPI implementation that were loaded:
scc1$ module rm intel/2016                    <-- remove the Intel compiler
scc1$ module rm openmpi/1.10.1_intel2016      <-- remove OpenMPI configured with the Intel compiler
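At any point, the standard module list command shows which compiler and MPI modules are currently loaded:
scc1$ module list    <-- list currently loaded modules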
- To check which compiler is currently in use:
Pass the -show option to the MPI wrappers:
scc1$ mpicc -show     <-- show the real command hidden in the wrapper
scc1$ mpif90 -show    <-- show the real command hidden in the wrapper
or query the path to the wrappers using the which command:
scc1$ which mpicc     <-- show the path to mpicc
scc1$ which mpif90    <-- show the path to mpif90
Running MPI jobs
For program development and debugging purposes, short MPI jobs may run on the login nodes. These jobs are limited to 4 processors and 10 minutes of CPU time per processor. All other MPI jobs (> 10 minutes and/or > 4 processes) should run in batch. (See the Running Jobs page for more.)
- Run MPI executable a.out on a login node:
scc1$ mpirun -np 4 ./a.out
- Run MPI executable a.out on a compute node in batch:
scc1$ qsub -pe mpi_4_tasks_per_node 4 -b y "mpirun -np 4 ./a.out"
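As with OpenMP jobs, the batch options can be placed in a job script instead. A minimal sketch, assuming the same parallel environment as above (the script and job names are illustrative; see the Running Jobs page):

#!/bin/bash
# mpi_job.sh -- illustrative batch script for an MPI run
#$ -pe mpi_4_tasks_per_node 4    # request 4 MPI tasks on one node
#$ -N mpi_hello                  # job name (illustrative)
mpirun -np $NSLOTS ./a.out       # launch one rank per granted slot

scc1$ qsub mpi_job.sh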
Notes
- If you always use the GNU family of compilers, none of the MPI_COMPILER settings described above is needed because the MPI wrappers point to the GNU compilers by default.
- On the other hand, if you always use the PGI compilers for MPI compilation, you can permanently set MPI_COMPILER to pgi in your .cshrc or .bashrc shell script.
- If you use the PGI compilers, it is important to read the page on the PGI compilers' impact on job performance and portability.
- If you want to use the Intel compiler, use module commands to load it and its corresponding MPI implementation.
- For MPI options, please consult the specific wrapper manpage. For example,
scc1$ mpiman mpicc
Compiling 32-bit executables
By default, all compilers on the SCC produce 64-bit executables. To build 32-bit executables, add the compiler flag -m32:
- Examples for building 32-bit GNU executables:
scc1$ gcc -m32 -fopenmp myexample.c    <-- for OpenMP code
scc1$ mpicc -m32 myexample.c           <-- for MPI code
- Examples for building 32-bit PGI executables:
scc1$ pgcc -m32 -mp myexample.c        <-- for OpenMP code
scc1$ mpicc -m32 myexample.c           <-- for MPI code
- Examples for building 32-bit Intel executables:
scc1$ icc -m32 -openmp myexample.c     <-- for OpenMP code
scc1$ mpicc -m32 myexample.c           <-- for MPI code
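To confirm the word size of a resulting executable, the standard Linux file utility can be used; a build with -m32 is reported as a 32-bit ELF executable, while a default build is reported as 64-bit (the output file name below is illustrative):
scc1$ gcc -m32 -fopenmp -o myexample32 myexample.c
scc1$ file myexample32    <-- should report a 32-bit ELF executable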
Multiprocessing with GPUs
Modern GPUs (graphics processing units) provide the ability to perform computations in applications traditionally handled by CPUs. Using GPUs is rapidly becoming a new standard for data-parallel, heterogeneous computing software in science and engineering. Many existing applications have been adapted to make effective use of multi-threaded GPUs. (See GPU Computing for details.)