8 Performance Profiling
The time Command
Running the program with the time command prints a line of timing information when the program terminates.
demo% time myprog
The Answer is: 543.01
6.5u 17.1s 1:16 31% 11+21k 354+210io 135pf+0w
demo%
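The interpretation is:
6.5u: 6.5 seconds of user time
17.1s: 17.1 seconds of system time
1:16: 1 minute 16 seconds of elapsed (wall clock) time
31%: user plus system time as a percentage of elapsed time
11+21k: 11 KB of average shared memory use and 21 KB of average unshared data space
354+210io: 354 block input operations and 210 block output operations
135pf+0w: 135 page faults and 0 swaps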
For a parallelized program, the user time displayed includes the time spent on all the processors, so it can be quite large and is not a good measure of performance. A better measure is the real time, which is the wall clock (elapsed) time. This also means that to time a parallelized program accurately, you must run it on a quiet system dedicated to just your program.
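The arithmetic behind the percentage field can be checked with a short sketch. The following Python code is an illustration only, not part of the Fortran tools; it assumes the C shell's default time format (%Uu %Ss %E %P %X+%Dk %I+%Oio %Fpf+%Ww), and its function names are hypothetical. It recovers user, system, and elapsed time from the sample line and recomputes CPU utilization as (user + system) / elapsed, that is, 23.6 / 76, or about 31%.

```python
# Sketch: decode the summary line printed by the C shell's built-in `time`.
# Assumed format (csh default): "%Uu %Ss %E %P %X+%Dk %I+%Oio %Fpf+%Ww".


def to_seconds(clock: str) -> float:
    """Convert an elapsed-time field such as "1:16" or "1:02:03" to seconds."""
    seconds = 0.0
    for part in clock.split(":"):
        seconds = seconds * 60 + float(part)
    return seconds


def parse_time_line(line: str) -> dict:
    """Extract user, system, elapsed time and the reported CPU percentage."""
    fields = line.split()
    return {
        "user": float(fields[0].rstrip("u")),
        "system": float(fields[1].rstrip("s")),
        "elapsed": to_seconds(fields[2]),
        "percent": int(fields[3].rstrip("%")),
    }


def cpu_percent(user: float, system: float, elapsed: float) -> float:
    """CPU time (user + system) as a percentage of wall clock time."""
    return 100.0 * (user + system) / elapsed


sample = "6.5u 17.1s 1:16 31% 11+21k 354+210io 135pf+0w"
t = parse_time_line(sample)
print(t["elapsed"])                                              # 76.0
print(round(cpu_percent(t["user"], t["system"], t["elapsed"])))  # 31
```

For a parallelized program the same ratio can exceed 100%, because user time is summed over all processors; this is why the elapsed time is the more reliable figure.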
The gprof Profiling Command
To enable gprof profiling, compile and link the program with the -pg option:
demo% f77 -o Myprog -fast -pg Myprog.f
...etc...
demo% gprof Myprog
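Run the program before invoking gprof. When it terminates normally, it writes the profiling data file gmon.out, which gprof reads; the program must complete normally for gprof to obtain meaningful timing information.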
The gprof -b option suppresses the explanatory text; see the gprof(1) man page for other options that limit the amount of output generated.
The call graph profile is followed by a flat profile that gives a routine-by-routine overview. An (edited) example of gprof output follows.
The call graph profile:
Note - User-defined functions appear with their Fortran names followed by an underscore. Library routines appear with leading underscores.
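The percentages in a flat profile come from simple arithmetic on per-routine self times. This Python sketch uses made-up timings and hypothetical routine names that follow the naming convention in the note above. It reproduces the % time and cumulative seconds columns as gprof(1) describes them: each routine's self time divided by the total, and a running sum taken in order of decreasing self time. It is not gprof's implementation, which derives self times from sampled program counter values.

```python
def flat_profile(self_seconds):
    """Return rows of (% time, cumulative seconds, self seconds, name),
    ordered by decreasing self time as in a gprof flat profile."""
    total = sum(self_seconds.values())
    rows = []
    cumulative = 0.0
    for name, secs in sorted(self_seconds.items(), key=lambda item: item[1], reverse=True):
        cumulative += secs
        rows.append((100.0 * secs / total, cumulative, secs, name))
    return rows


# Made-up self times in seconds; user routines end in "_",
# library routines start with "_".
timings = {"solve_": 1.2, "matmul_": 4.5, "_write": 0.3}

print(" %time  cumulative   self  name")
for pct, cum, secs, name in flat_profile(timings):
    print(f"{pct:6.1f} {cum:11.2f} {secs:6.2f}  {name}")
```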
Compiling with profiling enabled (the -pg option) may greatly increase the running time of a program, because of the extra overhead required to clock program performance and subprogram calls. Profiling tools like gprof attempt to subtract an approximate overhead factor when computing relative runtime percentages. All other timings shown may be inaccurate because of UNIX and hardware timekeeping limitations.

Programs with short execution times are the most difficult to profile, because the overhead may be a significant fraction of the total execution time. The best practice is to choose input data for the profiling run that result in a realistic test of the program's performance. If this is not possible, consider enclosing the main computational part of the program within a loop that runs it N times, then estimate actual performance by dividing the profile results by N.
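The repeat-and-divide technique can be sketched as follows. This Python illustration measures CPU time with time.process_time() rather than with a profiler, and kernel() is a hypothetical stand-in for the program's main computational part; the point is only that one measurement over N repetitions, divided by N, estimates the cost of a single run while making fixed overhead a smaller fraction of the total.

```python
import time


def kernel():
    """Hypothetical stand-in for the program's main computational part."""
    total = 0
    for i in range(100_000):
        total += i * i
    return total


def per_run_estimate(total_seconds, n):
    """Estimate the cost of one run from a measurement covering n runs."""
    if n <= 0:
        raise ValueError("n must be positive")
    return total_seconds / n


N = 20
start = time.process_time()
for _ in range(N):  # repeat so the measured time is large relative to overhead
    kernel()
measured = time.process_time() - start
print(f"{N} runs: {measured:.3f} s, about {per_run_estimate(measured, N):.4f} s per run")
```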
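The Fortran library includes two routines that report the time used by the calling process: etime returns the user plus system time used since the program started, and dtime returns the user plus system time used since its previous call (or since the program started, on the first call). See dtime(3F) and etime(3F).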
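The difference between the two routines (cumulative versus interval time) can be illustrated with a Python emulation built on os.times(). This is not the Fortran library; it assumes the behavior described in dtime(3F) and etime(3F): etime reports user and system time since the program started, and dtime reports user and system time since the previous dtime call (or program start). The (total, (user, system)) return shape loosely mirrors the Fortran function value and its two-element array argument.

```python
import os

# User and system CPU seconds recorded at the previous dtime() call.
_previous = (0.0, 0.0)


def etime():
    """Emulate etime(3F): user + system time since the process started."""
    t = os.times()
    return t.user + t.system, (t.user, t.system)


def dtime():
    """Emulate dtime(3F): user + system time since the previous dtime()
    call, or since the process started on the first call."""
    global _previous
    t = os.times()
    delta = (t.user - _previous[0], t.system - _previous[1])
    _previous = (t.user, t.system)
    return delta[0] + delta[1], delta


dtime()                                    # start a measurement interval
busy = sum(i * i for i in range(200_000))  # some CPU work to measure
interval, (user, system) = dtime()
total, _ = etime()
print(f"interval {interval:.3f} s (user {user:.3f}, system {system:.3f}); total {total:.3f} s")
```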
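If the profiling libraries are not installed on the system, compiling with -p fails at link time because the profiling version of the C library cannot be found:
demo% f77 -p real.f
real.f:
 MAIN stuff:
ld: -lc_p: No such file or directory
demo%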
A system utility extracts files from the release CD; you can use it to install the debugging files after the system is installed. See add_services(8). You may want to ask your system administrator for help.
The tcov Profiling Command
The tcov(1) command, when used with programs compiled with the -a, -xa, or -xprofile=tcov options, produces a statement-by-statement profile of the source code showing which statements executed and how often. It also gives a summary of information about the basic block structure of the program.

There are two implementations of tcov coverage analysis. The original tcov is invoked by the -a or -xa compiler options. Enhanced statement level coverage is invoked by the -xprofile=tcov compiler option and the tcov -x option. In either case, the output is a copy of the source files annotated with statement execution counts in the margin. Although the two versions of tcov are essentially the same as far as the Fortran user is concerned (most of the enhancements apply to C++ programs), the newer style gives some performance improvement.

"Old Style" tcov Coverage Analysis
Compile the program with the -a (or -xa) option. This produces the file $TCOVDIR/file.d for each .f source file in the compilation. (If the environment variable $TCOVDIR is not set at compile time, the .d files are stored in the current directory.)

In the annotated listing, tcov shows the number of times each statement was actually executed. Statements that were not executed are marked with ####-> to the left of the statement.
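The idea of an annotated coverage listing can be sketched in a few lines. This Python code imitates the concept only, not tcov's exact output layout: it takes made-up execution counts for a short Fortran fragment, prints each count in the margin, marks executable lines that never ran with ####->, and computes the fraction of executable statements that ran. The helper names and the counts are hypothetical.

```python
def annotate(source_lines, counts):
    """Prefix each line with its execution count, in the spirit of tcov.

    counts maps 1-based line numbers of executable statements to how many
    times they ran; lines absent from counts are treated as non-executable.
    """
    annotated = []
    for number, text in enumerate(source_lines, start=1):
        if number not in counts:
            prefix = ""          # not an executable statement here
        elif counts[number] == 0:
            prefix = "####->"    # executable but never executed
        else:
            prefix = str(counts[number])
        annotated.append(f"{prefix:>8}  {text}")
    return annotated


def coverage(counts):
    """Fraction of executable statements that ran at least once."""
    executed = sum(1 for c in counts.values() if c > 0)
    return executed / len(counts) if counts else 0.0


source = [
    "      DO 10 I = 1, N",
    "         IF (A(I) .LT. 0.0) THEN",
    "            A(I) = 0.0",
    "         END IF",
    "   10 CONTINUE",
]
counts = {1: 1, 2: 100, 3: 0, 5: 100}  # made-up counts for illustration

for line in annotate(source, counts):
    print(line)
print(f"coverage: {coverage(counts):.0%}")
```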
"New Style" Enhanced tcov Analysis
To use new style tcov, compile with -xprofile=tcov. When the program is run, coverage data is stored in program.profile/tcovd, where program is the name of the executable file. (If the executable were a.out, a.out.profile/tcovd would be created.)
Environment variables
$SUN_PROFDATA and $SUN_PROFDATA_DIR can be used to specify where the intermediary data collection files are kept. These are the *.d and tcovd files created by old and new style tcov, respectively.
I/O Profiling
You can obtain a report about how much data was transferred by your program. For each Fortran unit, the report shows the file name, the number of I/O statements, the number of bytes, and some statistics on these items. To obtain the report, insert calls to the library routines start_iostats and end_iostats around the parts of the program you wish to measure.
      EXTERNAL start_iostats
      ...
      CALL start_iostats
C     Reopen the preconnected units 5, 6 and 0 after start_iostats
      OPEN(5)
      OPEN(6)
      OPEN(0)
If you want to measure only part of the program, call end_iostats to stop collecting statistics. A call to end_iostats may also be required if your program terminates with an END or STOP statement rather than CALL EXIT.
Compile the program with the -pg option. When the program terminates, the I/O profile report is written to the file name.io_stats, where name is the name of the executable file.
Each pair of lines in the report displays information about one I/O unit. There is one section for input operations and another for output operations. The first line of a pair gives statistics on the number of data elements transferred before the unit was closed; the second line gives statistics based on the number of I/O statements processed.
The number of mmap() calls is recorded in parentheses in the second line of the pair.
Note - Compiling with the environment variable LD_LIBRARY_PATH set may disable I/O profiling, which relies on its profiling I/O library being in a standard location.