dev:profiling
Differences
This shows you the differences between two versions of the page.
profiling [2013/07/03 06:52] – add section on valgrind 129.132.169.16 | dev:profiling [2020/08/21 10:15] (current) – external edit 127.0.0.1 | ||
---|---|---|---|
Line 70: | Line 70: | ||
- TOTAL TIME: How much time is spent in this subroutine, including time spent in timed subroutines. AVERAGE and MAXIMUM as defined above | - TOTAL TIME: How much time is spent in this subroutine, including time spent in timed subroutines. AVERAGE and MAXIMUM as defined above | ||
- | Note that, for the threaded code, only the master thread is instrumented. | + | By default, only routines contributing at least 2% of the total runtime are included in the timing report. |
+ | Note that, for the threaded code, only the master thread is instrumented. | ||
==== Modifying the timing report ==== | ==== Modifying the timing report ==== | ||
Line 208: | Line 209: | ||
Basic profiling is easy: | Basic profiling is easy: | ||
< | < | ||
- | valgrind --tool=callgraph | + | valgrind --tool=callgrind |
</ | </ | ||
The result, a file named callgrind.out.XXX, | The result, a file named callgrind.out.XXX, | ||
+ | ===== nvprof ===== | ||
+ | |||
+ | Profiling the CUDA code can be done quite nicely using the nvprof tool. To do so, it is useful to enable user events, which requires compiling cp2k with < |
+ | < | ||
+ | nvprof -o log.nvprof ./cp2k.sopt -i test.inp -o test.out | ||
+ | </ | ||
+ | Afterwards, visualize log.nvprof with the nvvp tool; loading the data might take several minutes. | ||
+ | |||
+ | An example profile for a linear scaling benchmark (TiO2) is shown here: | ||
+ | {{ :: | ||
+ | |||
+ | To run in parallel on CRAY architectures, the following additional tricks are needed: | ||
+ | < | ||
+ | export PMI_NO_FORK=1 | ||
+ | # no cuda proxy | ||
+ | # export CRAY_CUDA_MPS=1 | ||
+ | # use all cores with OMP | ||
+ | export OMP_NUM_THREADS=8 | ||
+ | # use aprun in MPMD mode to have only the output from the master rank (here 169 nodes are used) | ||
+ | COMMAND=" | ||
+ | PART1=" | ||
+ | PART2=" | ||
+ | aprun ${PART1} : ${PART2} | ||
+ | </ | ||
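The MPMD split above can be sketched as a dry run. This is only an illustration under stated assumptions, not the values from the original script: the binary name `cp2k.psmp`, the input and output file names, the one-rank-per-node layout, and the exact `aprun` options (`-n`, `-N`, `-d`) are all hypothetical choices. The idea is that one rank is launched under nvprof while the remaining 168 ranks run the plain binary, so a single profile file is written.

```shell
#!/bin/sh
# Hypothetical values throughout; adapt them to the actual job.
NRANKS=169      # total MPI ranks, assuming one rank per node
NTHREADS=8      # matches OMP_NUM_THREADS in the settings above
COMMAND="./cp2k.psmp -i test.inp -o test.out"   # assumed binary and file names

# First MPMD segment: a single rank wrapped in nvprof.
PART1="-n 1 -N 1 -d ${NTHREADS} nvprof -o log.nvprof ${COMMAND}"
# Second MPMD segment: all remaining ranks, not profiled.
PART2="-n $((NRANKS - 1)) -N 1 -d ${NTHREADS} ${COMMAND}"

# Dry run: print the launch line instead of executing it.
LAUNCH="aprun ${PART1} : ${PART2}"
echo "${LAUNCH}"
```

Printing the line first makes it easy to check the rank counts before submitting; replacing the final `echo` with the command itself would start the job.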
dev/profiling.1372834353.txt.gz · Last modified: 2020/08/21 10:14 (external edit)