Profiling your code means measuring how long various parts of your application run. You can do this manually by writing your own code, checking the time before and after execution. An alternative option is to instruct the compiler to record profiling information for every function and subroutine in the program. Then with an external tool, such as gprof, you can explore 1) the total time spent calling each method and 2) the average time spent for each call of each method. This provides the evidence needed for deciding which methods need work to improve their performance.
An example using gprof
Consider this example, where two subroutines are called multiple times in a main program. To profile this example, you need to:
- Compile this program using the
-pg
flag:gfortran -pg -o example.exe example.f90
- Run the resulting executable; it will produce a file
gmon.out
, which holds profiling information. - Run the command
gprof -b example.exe gmon.out
The-b
flag tellsgprof
to suppress tons of extra help blurbs. Try this again without the flag, and you’ll understand why I advised using it. There is also a-p
flag that requests only the “Flat profile” section, which I discuss below, and-P
requests only the “Call graph” section.
The resulting output looks like this:
scc4% gprof -b example.exe gmon.out
Flat profile:
Each sample counts as 0.01 seconds.
% cumulative self self total
time seconds seconds calls s/call s/call name
64.25 5.52 5.52 5 1.10 1.10 do_slow_
36.47 8.65 3.13 10 0.31 0.31 do_fast_
0.00 8.65 0.00 1 0.00 8.65 MAIN__
Call graph
granularity: each sample hit covers 2 byte(s) for 0.12% of 8.65 seconds
index % time self children called name
0.00 8.65 1/1 main [2]
[1] 100.0 0.00 8.65 1 MAIN__ [1]
5.52 0.00 5/5 do_slow_ [3]
3.13 0.00 10/10 do_fast_ [4]
-----------------------------------------------
[2] 100.0 0.00 8.65 main [2]
0.00 8.65 1/1 MAIN__ [1]
-----------------------------------------------
5.52 0.00 5/5 MAIN__ [1]
[3] 63.8 5.52 0.00 5 do_slow_ [3]
-----------------------------------------------
3.13 0.00 10/10 MAIN__ [1]
[4] 36.2 3.13 0.00 10 do_fast_ [4]
-----------------------------------------------
Index by function name
[1] MAIN__ [4] do_fast_ [3] do_slow_
The above output shows a “Flat profile”, which orders the routines by the percent of the total time spent calling these routines. The “call graph” on the other hand, shows the total time spent calling each routing, including the time spent in routines called with in. For example, very little time is spent in the MAIN__
program, as seen in the Flat Profile, self seconds column. But in the call graph, we see in section [2] that 8.65 seconds is spent as a result of calling the main program. This resulted in part from 5.52 seconds spent calling do_slow_
, as seen in section [3].