This is an old revision of the document!

How to Compile CP2K with CUDA Support

Currently three major operations in CP2K support CUDA acceleration:

  • Anything that uses dbcsr_multiply, i.e. sparse matrix multiplication, when compiled with -D__ACC -D__DBCSR_ACC. This particularly benefits the linear-scaling DFT code. See also the DBCSR project.
  • FFTs, when compiled with -D__PW_CUDA.
  • Dense linear algebra, when CP2K is linked against an accelerated ScaLAPACK/BLAS library that executes calls (in particular pdgemm/pdsyrk/dgemm) on the GPU. The impact is most visible for MP2 and RPA calculations. On the hybrid Cray XC30, linking against libsci_acc enables this.

To enable all CUDA acceleration options, the compile flags listed above (-D__ACC -D__DBCSR_ACC -D__PW_CUDA) have to be set in the ARCH file, and the following lines have to be added to it:

NVCC    = /path_to_cuda/bin/nvcc
LIBS   += -lcudart -lcublas -lcufft -lrt

See here for more details. As a prerequisite, the NVIDIA CUDA Toolkit has to be installed.
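As a quick sanity check before building, the settings named above can be looked for in an ARCH file. The following Python sketch is not part of CP2K; the function name missing_cuda_settings and the example ARCH text are hypothetical, and the use of a DFLAGS variable for the -D flags is an assumption (the check itself only looks for the tokens, not for a particular variable name).

```python
import re

# Flags and libraries taken from the text above.
REQUIRED_FLAGS = ("-D__ACC", "-D__DBCSR_ACC", "-D__PW_CUDA")
REQUIRED_LIBS = ("-lcudart", "-lcublas", "-lcufft", "-lrt")


def missing_cuda_settings(arch_text):
    """Return the CUDA-related settings that do not appear in arch_text."""
    tokens = set()
    has_nvcc = False
    for line in arch_text.splitlines():
        line = line.split("#", 1)[0]  # drop make-style comments
        if re.match(r"^\s*NVCC\s*[:+?]?=", line):
            has_nvcc = True
        tokens.update(line.split())
    missing = [t for t in REQUIRED_FLAGS + REQUIRED_LIBS if t not in tokens]
    if not has_nvcc:
        missing.insert(0, "NVCC")
    return missing


# Hypothetical ARCH-file excerpt; DFLAGS as the flag variable is an assumption.
EXAMPLE_ARCH = """\
NVCC    = /path_to_cuda/bin/nvcc
DFLAGS += -D__ACC -D__DBCSR_ACC
LIBS   += -lcudart -lcublas -lcufft -lrt
"""

if __name__ == "__main__":
    print(missing_cuda_settings(EXAMPLE_ARCH))
```

An empty result only means the tokens are present in the file; it does not verify that the toolkit is installed or that the build will succeed.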


DBCSR is accelerated by libcusmm. This library provides a number of kernels, each of which multiplies blocks of specific block sizes. The block sizes of a simulation are determined by the basis set employed. As of DBCSR 1.1, libcusmm can by default generate any kernel with {m,n,k} ≤ 80; see here for more details. The DBCSR statistics are printed at the end of every CP2K run, for example:

 -                                                                             -
 -                                DBCSR STATISTICS                             -
 -                                                                             -
 COUNTER                                      CPU                  ACC      ACC%
 number of processed stacks                   160                   64      28.6
 matmuls inhomo. stacks                     11880                    0       0.0
 matmuls total                             132360                53530      28.8
 flops  13 x   13 x   13                        0             33218640     100.0
 flops  24 x   13 x   13                        0             55177824     100.0
 flops total                           1452705420            657928368      31.2
 marketing flops                       2048000000
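
In the sample above, the ACC% column is consistent with the accelerator share ACC / (CPU + ACC), e.g. 64 / (160 + 64) ≈ 28.6 %. The following Python sketch parses such a block, recomputes that share, and lists the block sizes from the "flops m x n x k" rows so they can be compared with the default {m,n,k} ≤ 80 range mentioned above. The helper names and regular expressions are hypothetical and rely on the layout of this one sample; other CP2K versions may print the table differently.

```python
import re

# Rows copied from the sample output above.
SAMPLE = """\
 COUNTER                                      CPU                  ACC      ACC%
 number of processed stacks                   160                   64      28.6
 matmuls inhomo. stacks                     11880                    0       0.0
 matmuls total                             132360                53530      28.8
 flops  13 x   13 x   13                        0             33218640     100.0
 flops  24 x   13 x   13                        0             55177824     100.0
 flops total                           1452705420            657928368      31.2
 marketing flops                       2048000000
"""

# A counter name followed by CPU, ACC and ACC% columns.
ROW = re.compile(r"^\s*(?P<name>.+?)\s+(?P<cpu>\d+)\s+(?P<acc>\d+)\s+(?P<pct>\d+\.\d+)\s*$")
BLOCK = re.compile(r"^flops\s+(\d+)\s*x\s*(\d+)\s*x\s*(\d+)$")
DEFAULT_MAX = 80  # default kernel limit quoted in the text for DBCSR 1.1


def parse_dbcsr_stats(text):
    """Map counter name -> (cpu, acc); lines without three numbers are skipped."""
    rows = {}
    for line in text.splitlines():
        m = ROW.match(line)
        if m:
            rows[m["name"]] = (int(m["cpu"]), int(m["acc"]))
    return rows


def acc_share(cpu, acc):
    """Percentage of the work that ran on the accelerator."""
    total = cpu + acc
    return 100.0 * acc / total if total else 0.0


def block_sizes(rows):
    """Extract (m, n, k) triples from the per-block-size flops rows."""
    return [tuple(int(g) for g in m.groups())
            for m in (BLOCK.match(name) for name in rows) if m]


if __name__ == "__main__":
    rows = parse_dbcsr_stats(SAMPLE)
    for name, (cpu, acc) in rows.items():
        print(f"{name:30s} {acc_share(cpu, acc):6.1f} %")
    for size in block_sizes(rows):
        print(size, "within default range:", max(size) <= DEFAULT_MAX)
```

A low overall share combined with 100 % on the individual block-size rows, as in this sample, suggests that the remaining flops come from work that was not dispatched to a GPU kernel at all, such as the inhomogeneous stacks.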

Support for more GPUs can be added; please refer to the description here.

New kernel parameters have to be optimized, which this howto explains in detail.


If you are interested in profiling CP2K with nvprof, have a look at these remarks.

howto/compile_with_cuda.1554803384.txt.gz · Last modified: 2020/08/21 10:15 (external edit)