# CP2K Open Source Molecular Dynamics

# How to Compile CP2K with CUDA Support

Currently, three major operations in CP2K support CUDA acceleration:

• Sparse matrix multiplication, i.e. anything that uses dbcsr_multiply, when compiled with -D__ACC -D__DBCSR_ACC. This benefits the linear-scaling DFT code in particular. See also the DBCSR project.
• FFTs, when compiled with -D__PW_CUDA.
• Dense linear algebra, when linked against an accelerated ScaLAPACK/BLAS library that executes its calls (in particular pdgemm/pdsyrk/dgemm) on the GPU. The impact is most visible for MP2 and RPA calculations. On the hybrid Cray XC50, linking against cray-libsci_acc enables this.

To enable all CUDA acceleration options, add the following lines to the ARCH file:

NVCC    = /path_to_cuda/bin/nvcc
DFLAGS += -D__ACC -D__DBCSR_ACC -D__PW_CUDA
LIBS   += -lcudart -lcublas -lcufft -lnvrtc

See here for details. As a prerequisite, the NVIDIA CUDA Toolkit has to be installed.
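As a convenience, the ARCH-file lines above can be generated from the location of an installed CUDA Toolkit. The following is a minimal sketch, not part of CP2K: the helper names `find_cuda_root` and `cuda_arch_lines` are hypothetical, and it assumes that `nvcc` sits in `<cuda_root>/bin/nvcc`, matching the `NVCC` line shown above.

```python
import shutil
from pathlib import Path, PurePosixPath


def find_cuda_root():
    """Return the CUDA Toolkit root inferred from nvcc on PATH, or None.

    Assumption: nvcc is installed as <cuda_root>/bin/nvcc.
    """
    nvcc = shutil.which("nvcc")
    if nvcc is None:
        return None
    return Path(nvcc).resolve().parent.parent


def cuda_arch_lines(cuda_root):
    """Build the three ARCH-file lines that enable all CUDA options."""
    nvcc = PurePosixPath(str(cuda_root)) / "bin" / "nvcc"
    return [
        f"NVCC    = {nvcc}",
        "DFLAGS += -D__ACC -D__DBCSR_ACC -D__PW_CUDA",
        "LIBS   += -lcudart -lcublas -lcufft -lnvrtc",
    ]


if __name__ == "__main__":
    root = find_cuda_root()
    if root is None:
        print("nvcc not found on PATH; install the CUDA Toolkit first")
    else:
        print("\n".join(cuda_arch_lines(root)))
```

The printed lines still have to be appended to the ARCH file by hand; the script only checks the prerequisite and fills in the toolkit path.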

## Libcusmm

The acceleration of DBCSR is performed by libcusmm. This library provides a number of kernels, each of which multiplies blocks of specific block sizes. The block sizes of a simulation are determined by the basis set employed. As of DBCSR 1.1, libcusmm can by default generate any kernel for {m,n,k} ≤ 80; see here for more details. The DBCSR statistics are printed at the end of every CP2K run, for example:

-------------------------------------------------------------------------------
-                                                                             -
-                                DBCSR STATISTICS                             -
-                                                                             -
-------------------------------------------------------------------------------
COUNTER                                      CPU                  ACC      ACC%
number of processed stacks                   160                   64      28.6
matmuls inhomo. stacks                     11880                    0       0.0
matmuls total                             132360                53530      28.8
flops  13 x   13 x   13                        0             33218640     100.0
flops  24 x   13 x   13                        0             55177824     100.0
...
flops total                           1452705420            657928368      31.2
marketing flops                       2048000000
-------------------------------------------------------------------------------
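
To see at a glance how much work ran on the GPU, the statistics block can be parsed with a short script. This is a minimal sketch, not a CP2K tool: the function names are hypothetical, and the relation ACC% = ACC / (CPU + ACC) is an assumption inferred from the sample numbers above (for example 64 / (160 + 64) ≈ 28.6 %), not taken from the DBCSR source.

```python
SAMPLE = """\
COUNTER                                      CPU                  ACC      ACC%
number of processed stacks                   160                   64      28.6
matmuls inhomo. stacks                     11880                    0       0.0
matmuls total                             132360                53530      28.8
flops  13 x   13 x   13                        0             33218640     100.0
flops  24 x   13 x   13                        0             55177824     100.0
flops total                           1452705420            657928368      31.2
marketing flops                       2048000000
"""


def parse_dbcsr_statistics(text):
    """Map each counter name to (cpu, acc, reported_acc_percent).

    Rows without both a CPU and an ACC integer column (header, rulers,
    'marketing flops') are skipped.
    """
    rows = {}
    for line in text.splitlines():
        parts = line.rsplit(None, 3)  # name, CPU, ACC, ACC%
        if len(parts) != 4:
            continue
        name, cpu, acc, pct = parts
        if not (cpu.isdigit() and acc.isdigit()):
            continue
        # Collapse the column padding inside names like "flops  13 x   13 x   13"
        rows[" ".join(name.split())] = (int(cpu), int(acc), float(pct))
    return rows


def acc_share(cpu, acc):
    """Percentage of work done on the accelerator (assumed definition)."""
    total = cpu + acc
    return 100.0 * acc / total if total else 0.0


if __name__ == "__main__":
    for name, (cpu, acc, reported) in parse_dbcsr_statistics(SAMPLE).items():
        print(f"{name:30s} computed {acc_share(cpu, acc):5.1f}%  reported {reported:5.1f}%")
```

A per-block-size row with a low ACC share would show up here as a candidate for a missing or unused libcusmm kernel.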