Currently three major operations in CP2K support CUDA-acceleration:

- Anything that uses `dbcsr_multiply`, i.e. sparse matrix multiplication, when compiled with `-D__ACC -D__DBCSR_ACC`. This benefits in particular the linear scaling DFT code. See also the DBCSR project.
- FFTs, when compiled with `-D__PW_CUDA`.
- ScaLAPACK/BLAS calls (in particular pdgemm/pdsyrk/dgemm), when CP2K is linked against an accelerated library that executes these calls on the GPU. The impact of this is most visible for MP2 and RPA calculations. On the hybrid Cray XC30, linking against libsci_acc makes this happen.

To enable all CUDA acceleration options the following lines have to be added to the ARCH-file:

    NVCC    = /path_to_cuda/bin/nvcc
    DFLAGS += -D__ACC -D__DBCSR_ACC -D__PW_CUDA
    LIBS   += -lcudart -lcublas -lcufft -lrt

As a prerequisite the Nvidia CUDA Toolkit has to be installed.

The acceleration of DBCSR is performed by libcusmm. This library provides a number of kernels, each of which multiplies blocks of one specific combination of blocksizes. The blocksizes of a simulation are determined by the employed basis set. By default libcusmm is compiled with about 200 common kernels. However, if an exotic basis set is used, the kernels for its particular blocksizes might be missing. This can be seen from the *DBCSR Statistics*, which are printed at the end of every CP2K run.

In the following example the kernel for 13x13x15 was missing:

    ------------------------------------------------------------
    -                                                          -
    -                     DBCSR STATISTICS                     -
    -                                                          -
    ------------------------------------------------------------
    COUNTER                              CPU         ACC    ACC%
    number of processed stacks           160          64    28.6
    matmuls inhomo. stacks             11880           0     0.0
    matmuls total                     132360       53530    28.8
    flops 13 x 13 x 13                     0    33218640   100.0
    flops 13 x 13 x 15              34810620           0     0.0  <-- kernel missing
    flops 24 x 13 x 13                     0    55177824   100.0
    ...
    flops total                   1452705420   657928368    31.2
    marketing flops               2048000000
    ------------------------------------------------------------
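Spotting missing kernels by eye gets tedious for long outputs. The following is a minimal sketch of how the check could be automated, assuming the per-blocksize lines look like the ones in the example above (`flops m x n x k`, then the CPU count, the ACC count, and ACC%). The names `find_missing_kernels` and `acc_share` are hypothetical helpers written for this illustration; they are not part of CP2K or DBCSR.

```python
import re

# Assumed layout of a per-blocksize line, taken from the example output:
#   flops <m> x <n> x <k>  <CPU flops>  <ACC flops>  <ACC%>
FLOPS_LINE = re.compile(
    r"flops\s+(\d+)\s+x\s+(\d+)\s+x\s+(\d+)\s+(\d+)\s+(\d+)\s+([\d.]+)"
)


def find_missing_kernels(stats_text):
    """Return the (m, n, k) triples whose flops were all executed on the CPU."""
    missing = []
    for line in stats_text.splitlines():
        match = FLOPS_LINE.search(line)
        if match is None:
            continue  # not a per-blocksize line (e.g. "flops total")
        m, n, k, cpu_flops, acc_flops = (int(g) for g in match.groups()[:5])
        if cpu_flops > 0 and acc_flops == 0:
            missing.append((m, n, k))
    return missing


def acc_share(cpu, acc):
    """Percentage of the work done on the accelerator (the ACC% column)."""
    total = cpu + acc
    return 100.0 * acc / total if total else 0.0


# Excerpt of the statistics shown above, used as sample input.
SAMPLE = """\
 flops 13 x 13 x 13                     0    33218640   100.0
 flops 13 x 13 x 15              34810620           0     0.0
 flops 24 x 13 x 13                     0    55177824   100.0
 flops total                   1452705420   657928368    31.2
"""

if __name__ == "__main__":
    for m, n, k in find_missing_kernels(SAMPLE):
        print(f"no accelerated kernel used for {m} x {n} x {k}")
    print(f"overall ACC share: {acc_share(1452705420, 657928368):.1f}%")
```

In practice the text would come from the end of a CP2K output file rather than from the `SAMPLE` string. The `acc_share` helper reproduces the ACC% column as ACC / (CPU + ACC), which matches the totals in the example.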

There are over 2300 readily optimized kernel parameters available in `cp2k/src/dbcsr/libsmm_acc/libcusmm/parameters.txt`. If the desired kernel is already listed in `parameters.txt`, it can be included in libcusmm by editing the file `cp2k/src/dbcsr/libsmm_acc/libcusmm/generate.py`. Otherwise new kernel parameters have to be optimized, which this howto explains in detail.

If you are interested in profiling CP2K with nvprof, have a look at these remarks.

