Debugging

Debugging CP2K can be a bit of a challenge. The suggestions and techniques collected here should make it easier.

reproducer

To debug the code, you need an input that reliably triggers the problem. If the bug is really hard (i.e. requires running the test case repeatedly), finding a sufficiently small test case is very valuable. Check whether the bug reproduces with fewer atoms, a lower cutoff, a smaller basis, an energy calculation instead of MD, and so on. Understanding what is needed to trigger the bug usually helps a lot in fixing it.
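For bugs that only show up intermittently, it helps to know how often a candidate test case actually fails before trying to shrink it further. The following is a minimal sketch, assuming the failure shows up as a non-zero exit status; the helper name `count_failures` is made up for illustration.

```shell
# count_failures N CMD [ARGS...]
# Run CMD N times and print how many of the runs exited with a non-zero status.
# The body runs in a subshell, so its variables do not leak into the caller.
count_failures() (
    runs=$1
    shift
    failures=0
    i=0
    while [ "$i" -lt "$runs" ]; do
        # Discard the program output; only the exit status matters here.
        "$@" >/dev/null 2>&1 || failures=$((failures + 1))
        i=$((i + 1))
    done
    echo "$failures"
)
```

For example, `count_failures 20 ./cp2k.sopt test.inp` would report how many of 20 runs failed (the binary and input names are placeholders).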

environment dependence

Check whether the bug only happens on machine X or with library Y. Buggy libraries are unfortunately very common. As a reference, try binaries compiled with gfortran (-O0) on Linux and linked against netlib ScaLAPACK/BLAS. If you find bugs in libraries or tools, report them to the vendors; only this way do things improve in the long run.
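One way to use such a reference build is to run the same input with both binaries and compare the final energies. The sketch below assumes that the output contains lines starting with `ENERGY| Total FORCE_EVAL` whose last field is the energy; the helper names and the default tolerance of 1e-8 are arbitrary choices for illustration.

```shell
# cp2k_last_energy FILE
# Print the last field of the last "ENERGY| Total FORCE_EVAL" line in FILE.
# (Assumes the energy is the final field on that line.)
cp2k_last_energy() {
    awk 'index($0, "ENERGY| Total FORCE_EVAL") { e = $NF }
         END { if (e != "") print e }' "$1"
}

# same_energy REF_OUT NEW_OUT [TOL]
# Succeed if both outputs report energies that agree within TOL.
# Returns 2 if either file contains no energy line.
same_energy() (
    ref=$(cp2k_last_energy "$1")
    new=$(cp2k_last_energy "$2")
    tol=${3:-1e-8}
    [ -n "$ref" ] && [ -n "$new" ] || exit 2
    awk -v a="$ref" -v b="$new" -v t="$tol" \
        'BEGIN { d = a - b; if (d < 0) d = -d; exit !(d <= t) }'
)
```

A call such as `same_energy reference.out vendor.out || echo "results differ"` then flags a build whose results deviate from the reference (file names are placeholders).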

use the trace

Run CP2K with the 'TRACE' keyword enabled in the '&GLOBAL' section. The additional output gives a good idea of where things might go wrong.
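To switch tracing on without editing the input by hand, the keyword can be inserted with a small filter. This is only a sketch: the helper name `enable_trace` is made up, and it assumes the input has a `&GLOBAL` line and does not already contain `TRACE`.

```shell
# enable_trace
# Copy a CP2K input from stdin to stdout, adding a TRACE line directly
# after the line that opens the &GLOBAL section (matched case-insensitively).
enable_trace() {
    awk '{ print }
         toupper($0) ~ /^[ \t]*&GLOBAL([ \t]|$)/ { print "  TRACE" }'
}
```

Used as `enable_trace < test.inp > test_trace.inp`, this leaves the original input untouched and writes a traced copy (file names are placeholders).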

debug build

A large number of issues can be caught using a debug build with bounds checking. With gfortran, the following flags are useful in the arch file:

FCFLAGS  = -O1 -fstrict-aliasing -g -fno-omit-frame-pointer -fno-realloc-lhs \
           -fcheck=bounds,do,recursion,pointer -ffree-form $(DFLAGS)

Also, link against netlib BLAS/LAPACK/ScaLAPACK (compiled with the same options).

valgrind

undefined variables

valgrind is very useful to find additional bugs, most commonly related to undefined variables. Unfortunately, the slowdown caused by valgrind makes this practical only for test cases that run within seconds/minutes. The full regtester is run from time to time under valgrind.

The following arch file works well with valgrind: in particular, it links no libraries that produce false positives. It needs gfortran >4.8.X to avoid spurious warnings.

CC       = cc
CPP      =

FC       = gfortran
LD       = gfortran

AR       = ar -r

CPPFLAGS =
DFLAGS   = -D__GFORTRAN -D__FFTSG -D__LIBINT -D__FFTW3 -D__LIBINT_MAX_AM=6 -D__LIBDERIV_MAX_AM1=5 -D__LIBXC2
FCFLAGS  = -O0 -g -ffree-form $(DFLAGS)
LDFLAGS  = $(FCFLAGS)  -L/data/vjoost/libint_ham/install/lib/ -L/data/vjoost/scalapack/scalapack_installer_1.0.2/install/lib/ -L/data/vjoost/libxc-2.0.1/install/lib
LIBS     = -lderiv -lint -lstdc++ -lfftw3 -lreflapack -lrefblas -lxc

OBJECTS_ARCHITECTURE = machine_gfortran.o

Run valgrind with

valgrind --max-stackframe=2100192 --leak-check=full --track-origins=yes

to get the origin of undefined variables in addition to a leak check report.
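When many test cases are checked, it can be convenient to reduce each valgrind report to a single number. The sketch below assumes the reports were saved to files (for instance with valgrind's `--log-file` option) and sums the error counts from their `ERROR SUMMARY` lines; the helper name `valgrind_error_count` is made up for illustration.

```shell
# valgrind_error_count LOG [LOG...]
# Sum the error counts reported on the "ERROR SUMMARY:" lines of the logs.
valgrind_error_count() {
    awk '/ERROR SUMMARY:/ {
             for (i = 1; i < NF; i++)
                 if ($i == "SUMMARY:") {
                     n = $(i + 1)
                     gsub(/,/, "", n)   # drop thousands separators, if any
                     total += n
                 }
         }
         END { printf "%d\n", total }' "$@"
}
```

A result of 0 means that none of the given logs reported an error; leak reports are listed separately by valgrind and are not counted here.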

understanding memory usage

valgrind also comes with the 'massif' tool, which can provide detailed information about memory usage. It answers questions such as: what is the peak allocated memory, and where do most of the allocations come from?

This is rather easy with

valgrind --tool=massif ../../../exe/local_valgrind/cp2k.sopt test.inp 
ms_print massif.out.XYZ

The valgrind homepage has a detailed description of massif.
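If only the peak number is needed, for example to compare several runs, it can be read directly from the massif output file instead of the `ms_print` report. This is a rough sketch that relies on the `mem_heap_B`, `mem_heap_extra_B` and `mem_stacks_B` entries written for each snapshot; the helper name `massif_peak_bytes` is made up for illustration.

```shell
# massif_peak_bytes FILE
# Print the largest per-snapshot total (heap + heap overhead + stacks),
# in bytes, found in a massif.out.<pid> file.
massif_peak_bytes() {
    awk -F= '
        $1 == "mem_heap_B"       { heap = $2 }
        $1 == "mem_heap_extra_B" { extra = $2 }
        $1 == "mem_stacks_B"     { total = heap + extra + $2
                                   if (total > peak) peak = total }
        END { printf "%.0f\n", peak }' "$1"
}
```

The value is the maximum over the snapshots massif recorded, so it may differ slightly from the true peak of the run.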

valgrind in parallel

valgrind can also be used for debugging parallel code, e.g.:

mpirun -np 2 -x OMP_NUM_THREADS=2  valgrind --max-stackframe=2100192 --leak-check=full --track-origins=yes cp2k.psmp cp2k.inp

Memory leak checking

Starting from gcc 4.9.0, good memory leak checking is integrated with gfortran. Compile CP2K with

-fno-omit-frame-pointer -O1 -g -fsanitize=leak

to get detailed memory leak reports. CP2K should be fully clean; however, this also finds leaks in libraries such as MPI and ScaLAPACK. These can be silenced with a suppression file, which is activated by an export like

export LSAN_OPTIONS=suppressions=suppr.txt

For the format of the file, see the LeakSanitizer docs.
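As an illustration, a suppression file consists of `leak:<pattern>` lines, where the pattern is matched against the function, source file or library names in the leak's stack trace. The sketch below writes such a file and activates it; the two patterns and the helper name `write_lsan_suppressions` are hypothetical and need to be adapted to the libraries that actually show up in the reports.

```shell
# write_lsan_suppressions FILE
# Write a small LeakSanitizer suppression file and point LSAN_OPTIONS at it.
# The patterns below are placeholders for whatever the leak reports name.
write_lsan_suppressions() {
    printf 'leak:%s\n' libmpi libscalapack > "$1"
    export LSAN_OPTIONS="suppressions=$1"
}
```

After `write_lsan_suppressions suppr.txt`, binaries started from the same shell pick up the suppressions through the environment.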

Compiler warnings

Unfortunately, the GNU Fortran compiler is not on the same level as its C/C++ counterparts when it comes to warnings. In particular, -Wuninitialized, which is part of -Wall, may give spurious warnings of the following kind when building with -O1 (or higher):

warning: ‘arr.offset’ may be used uninitialized in this function [-Wuninitialized]
warning: ‘arr.dim[1].stride’ may be used uninitialized in this function [-Wuninitialized]
warning: ‘arr.dim[0].ubound’ may be used uninitialized in this function [-Wuninitialized]

This is tracked upstream by GNU/gfortran as Bug 66459 and Bug 60500.

dev/debugging.txt · Last modified: 2020/08/21 10:15 by 127.0.0.1