Profiling : TechWeb : Boston University

Profiling your code means measuring how long various parts of your application run. You can do this manually by writing your own code, checking the time before and after execution. An alternative option is to instruct the compiler to record profiling information for every function and subroutine in the program. Then with an external tool, such as gprof, you can explore 1) the total time spent calling each method and 2) the average time spent for each call of each method. This provides the evidence needed for deciding which methods need work to improve their performance.

An example using gprof

Consider this example, where two subroutines are called multiple times in a main program. To profile this example, you need to:

Compile this program using the -pg flag: gfortran -pg -o example.exe example.f90
Run the resulting executable; it will produce a file gmon.out, which holds profiling information.
Run the command gprof -b example.exe gmon.out
The -b flag tells gprof to suppress tons of extra help blurbs. Try this again without the flag, and you’ll understand why I advised using it. There is also a -p flag that requests only the “Flat profile” section, which I discuss below, and -P requests only the “Call graph” section.

The resulting output looks like this:

scc4% gprof -b example.exe gmon.out

Flat profile:

Each sample counts as 0.01 seconds.
  %   cumulative   self              self     total
 time   seconds   seconds    calls   s/call   s/call  name
 64.25      5.52     5.52        5     1.10     1.10  do_slow_
 36.47      8.65     3.13       10     0.31     0.31  do_fast_
  0.00      8.65     0.00        1     0.00     8.65  MAIN__


                        Call graph


granularity: each sample hit covers 2 byte(s) for 0.12% of 8.65 seconds

index % time    self  children    called     name
                0.00    8.65       1/1           main [2]
[1]    100.0    0.00    8.65       1         MAIN__ [1]
                5.52    0.00       5/5           do_slow_ [3]
                3.13    0.00      10/10          do_fast_ [4]
-----------------------------------------------
                                                 
[2]    100.0    0.00    8.65                 main [2]
                0.00    8.65       1/1           MAIN__ [1]
-----------------------------------------------
                5.52    0.00       5/5           MAIN__ [1]
[3]     63.8    5.52    0.00       5         do_slow_ [3]
-----------------------------------------------
                3.13    0.00      10/10          MAIN__ [1]
[4]     36.2    3.13    0.00      10         do_fast_ [4]
-----------------------------------------------


Index by function name

   [1] MAIN__                  [4] do_fast_                [3] do_slow_

The above output shows a “Flat profile”, which orders the routines by the percent of the total time spent calling these routines. The “call graph” on the other hand, shows the total time spent calling each routing, including the time spent in routines called with in. For example, very little time is spent in the MAIN__ program, as seen in the Flat Profile, self seconds column. But in the call graph, we see in section [2] that 8.65 seconds is spent as a result of calling the main program. This resulted in part from 5.52 seconds spent calling do_slow_, as seen in section [3].