Should I use MPI or OpenMP or both?
The entire CP2K code is MPI-parallelized. Some additional loops are also OpenMP-parallelized. You should therefore take advantage of the MPI parallelization first. However, running one MPI rank per CPU core will probably lead to a memory shortage, because every rank is a separate process with its own memory footprint.
At this point, OpenMP threads can be used to utilize all CPU cores without the large memory footprint of an MPI process.
The optimal ratio between MPI-ranks and OpenMP-threads depends on the kind of simulation you run. Do your own benchmarks! A ratio of two threads per rank is usually a good point to start.
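To set up such a benchmark, it can help to list every rank/thread combination that fills a node exactly. The following is only a sketch: the helper names are made up for illustration, and the launcher (mpirun), the hybrid binary name (cp2k.psmp) and the input file name are assumptions that you should adapt to your installation and batch system.

```python
def hybrid_layouts(cores):
    """Return every (mpi_ranks, omp_threads) pair with ranks * threads == cores."""
    return [(cores // t, t) for t in range(1, cores + 1) if cores % t == 0]


def launch_command(ranks, threads, binary="cp2k.psmp", input_file="input.inp"):
    """Build a launch line for one layout.

    Assumption: the launcher is mpirun and the hybrid MPI+OpenMP binary is
    called cp2k.psmp; both names may differ on your system.
    """
    return f"OMP_NUM_THREADS={threads} mpirun -np {ranks} {binary} -i {input_file}"


if __name__ == "__main__":
    # Example: one candidate command per layout on a 16-core node.
    for ranks, threads in hybrid_layouts(16):
        print(launch_command(ranks, threads))
```

Run each printed command on the same input, compare the wall time and memory use, and keep the fastest layout that still fits into memory. Depending on your MPI implementation, you may have to export OMP_NUM_THREADS explicitly to the remote ranks.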