The PGI compiler family is produced by The Portland Group, which is owned by NVIDIA, and is available on the SCC. As of the AlmaLinux 8 operating system upgrade (summer 2023), this family of compilers is installed as part of the nvidia-hpc module. Previous versions of the compilers are part of the pgi modules. The following table summarizes some relevant commands on the SCC:

Command                                        Description
module avail pgi OR module avail nvidia-hpc    List available versions of the PGI compiler.
module load nvidia-hpc/2023-23.5               Load a particular version.
pgcc                                           C compiler.
pgc++                                          C++ compiler.
pgf90                                          Fortran compiler.
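
As a quick check after loading a module, the sketch below reports which of the compiler commands are on the PATH. The function name report_compilers is hypothetical, and the module load line only runs in a shell where the module command is defined.

```shell
#!/bin/bash
# Load the compiler module only when the `module` command is defined
# (it will not be outside an SCC-style environment).
if command -v module >/dev/null 2>&1; then
    module load nvidia-hpc/2023-23.5
fi

# report_compilers (hypothetical helper): print where each compiler
# command lives, or "not found" if it is not on the PATH.
report_compilers() {
    for c in pgcc pgc++ pgf90; do
        if path=$(command -v "$c" 2>/dev/null); then
            echo "$c: $path"
        else
            echo "$c: not found"
        fi
    done
}

report_compilers
```

If any compiler is reported as not found, the module was most likely not loaded in the current shell.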

The C/C++ and Fortran compilers use the same optimization flags, and both compilers have manuals available:

man pgcc
man pgf90

The older pgi modules have an online reference manual that describes their compiler flags in detail. The nvidia-hpc compilers also have online manuals for command line options.

General Compiler Optimization Flags

The basic optimization flags are summarized below.

Flag    Description
-O0     Optimization level 0. Usually for debugging.
-O1     Optimization level 1. Scheduling within extended basic blocks is performed. No global optimizations are performed. It is the default level if no flag is specified.
-O      Optimization level 2. All level 1 optimizations are performed. In addition, traditional scalar optimizations such as induction recognition and loop invariant motion are performed by the global optimizer.
-O2     All -O optimizations are performed. In addition, more advanced optimizations such as SIMD code generation, cache alignment and partial redundancy elimination are enabled.
-O3     All -O1 and -O2 optimizations are performed. In addition, this level enables more aggressive code hoisting and scalar replacement optimizations that may or may not be profitable.
-O4     All -O1, -O2, and -O3 optimizations are performed. In addition, hoisting of guarded invariant floating point expressions is enabled.
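
One way to see what each level gains for a given code is to build it at every level and time the results. The sketch below only prints the compile commands so they can be inspected first; the helper name opt_build_cmds and the file name mycode.c are placeholders.

```shell
# opt_build_cmds (hypothetical helper): print one pgcc command per
# optimization level in the table above, each with its own output name.
opt_build_cmds() {
    src=$1
    base=${src%.*}            # strip the extension, e.g. mycode.c -> mycode
    for level in -O0 -O1 -O -O2 -O3 -O4; do
        # Output names: mycode_O0, mycode_O1, mycode_O, ...
        echo "pgcc $level $src -o ${base}_${level#-}"
    done
}

opt_build_cmds mycode.c
```

Once a pgi or nvidia-hpc module is loaded, piping the output to sh (opt_build_cmds mycode.c | sh) runs the builds.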

pgi Module Flags to Specify SIMD Instructions

These flags will produce executables that contain specific SIMD instructions, which may affect compatibility with compute nodes on the SCC.

Flag                  Description
-tp=nehalem-64        For Intel Nehalem architecture Core processors.
-tp=sandybridge-64    For Intel SandyBridge and Ivybridge architecture Core processors.
-tp=haswell-64        For Intel Haswell and Broadwell architecture Core processors.
-tp=bulldozer-64      For AMD Bulldozer processors.
-tp=x64               For all Intel 64-bit processors and AMD 64-bit processors.
-tp=px                For any x86-compatible processors (including all above).
-fast                 Includes: -O2 -Munroll=c:1 -Mnoframe -Mlre -Mautoinline -Mvect=sse -Mcache_align -Mflushz -Mpre. Chooses generally optimal flags for target platforms and selects SIMD instructions that are available on the compiling computer.

nvidia-hpc Module Flags to Specify SIMD Instructions

These flags will produce executables that contain specific SIMD instructions, which may affect compatibility with compute nodes on the SCC. The manual page for the compilers can be referenced for the specific version you’re using: man pgfortran.

Flag                              Description
-tp=bulldozer                     For AMD Bulldozer architecture processors.
-tp=zen/zen2/zen3 (choose one)    For AMD Epyc and Ryzen architecture processors.
-tp=sandybridge                   For Intel Sandybridge processors.
-tp=ivybridge                     For Intel Ivybridge processors.
-tp=skylake                       For Intel Skylake processors.
-tp=px                            For any x86-compatible processors (including all above).
-fast                             Includes: -O2 -Munroll=c:1 -Mnoframe -Mlre -Mautoinline -Mvect=sse -Mcache_align -Mflushz -Mpre. Chooses generally optimal flags for target platforms and selects SIMD instructions that are available on the compiling computer.

Default Optimization Behavior

By default, the PGI compilers produce executables that are tuned for the architecture of the compiling computer. All of the login nodes on the SCC have Broadwell architecture CPUs. This means that an executable compiled on the SCC login nodes without the -tp=x64 or -tp=px flag will only be compatible with the Broadwell architecture.
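
To see which SIMD instruction sets the compiling computer actually offers, and therefore what a default build would target, the CPU flags can be inspected. The sketch below assumes a Linux node where /proc/cpuinfo is readable; the helper name simd_level is hypothetical and it checks only a few common flag names.

```shell
# simd_level (hypothetical helper): given a space-separated list of CPU
# flags, print the most capable SIMD extension found among a few common ones.
simd_level() {
    for want in avx512f avx2 avx sse4_2; do
        case " $1 " in
            *" $want "*) echo "$want"; return 0 ;;
        esac
    done
    echo "none"
}

# On Linux, take the flags of the first CPU listed in /proc/cpuinfo.
if [ -r /proc/cpuinfo ]; then
    flags=$(grep -m1 '^flags' /proc/cpuinfo | cut -d: -f2)
    echo "Highest SIMD level on $(uname -n): $(simd_level "$flags")"
fi
```

Running this on a login node and again inside a job on a compute node shows whether the two differ.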

Recommendations

Here are recommendations for compiling codes on the SCC. Either the -tp=x64 or the -tp=px flag should be used for compute node compatibility. The -tp=x64 flag generally produces faster code at the cost of longer compile times, but it has been removed in the newer compiler versions. The -tp=px flag usually compiles notably faster. It is recommended that one of these flags be used to build executables on the SCC, together with an extra flag that enables the 128-bit SIMD instructions available on all SCC nodes:

pgc++ -fast -tp=px -Mvect=simd:128 mycode.cpp -o myexe

The generated executable will run on any compute node on the SCC. An alternate set of optimization flags can be used which targets the Sandybridge CPU architecture. This is also compatible with all SCC compute nodes:

pgc++ -fast -tp=sandybridge  mycode.cpp -o myexe

To build an optimized executable for a particular node, the easiest way on the SCC is to compile your code on the compute node that will run your job and have the compiler auto-select the best SIMD instructions for that compute node:

pgc++ -fast -tp=native mycode.cpp -o myexe

However, the resulting compiled code won’t execute on an older architecture, so compiling this way on a Skylake compute node will result in programs that won’t run on SCC compute nodes that lack the AVX-512 instructions. Such an executable should be submitted with a matching architecture request:

qsub -l cpu_arch=skylake -b y ./myexe
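
The three build approaches above can be collected into a small wrapper. This is only a sketch: the function name scc_build_cmd and its mode names (portable, sandybridge, native) are made up for illustration, while the flags are the ones recommended in this section. It prints the command rather than running it.

```shell
# scc_build_cmd (hypothetical helper): print the pgc++ command for one of
# the build strategies described above.
#   portable    - runs on any SCC compute node, 128-bit SIMD
#   sandybridge - targets the Sandybridge architecture
#   native      - tuned for the node doing the compiling
scc_build_cmd() {
    mode=$1; src=$2; exe=$3
    case $mode in
        portable)    flags="-fast -tp=px -Mvect=simd:128" ;;
        sandybridge) flags="-fast -tp=sandybridge" ;;
        native)      flags="-fast -tp=native" ;;
        *) echo "unknown mode: $mode" >&2; return 1 ;;
    esac
    echo "pgc++ $flags $src -o $exe"
}

scc_build_cmd portable mycode.cpp myexe
```

Keeping the flag sets in one place makes it harder to submit a native build to a node it cannot run on by accident.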

Another option is to compile the code as part of a batch job, which completely avoids any architectural issues and allows for the maximum amount of optimization. For example, a job that is submitted to run on a Buy-in node equipped with an Ivybridge architecture CPU could be compiled with options auto-selected by the compiler for that node. As a precaution, the source is copied into $TMPDIR: