StartDate: 2022-08-08 19:06:06+00:00
CpuId: 32x AMD (unknown model) [Zen 3], 7nm (SMT disabled)
CommitSHA: 11eb9ba6796d9b404282b5ce8f1706bbcd807856
CommitTime: 2022-08-08 15:59:23 +0200
CommitAuthor: Matthias Krack
CommitSubject: Adjust tolerances
Populating docker build cache... done.
#################### Building Image cp2k-perf-openmp ####################
Dockerfile: /tools/docker/Dockerfile.test_performance
Build-Path: /
Build-Args: GIT_COMMIT_SHA=11eb9ba6796d9b404282b5ce8f1706bbcd807856
Sending build context to Docker daemon 364.2MB
Step 1/42 : FROM ubuntu:22.04
22.04: Pulling from library/ubuntu
d19f32bd9e41: Already exists
Digest: sha256:34fea4f31bf187bc915536831fd0afc9d214755bf700b5cdb1336c82516d154e
Status: Downloaded newer image for ubuntu:22.04
 ---> df5de72bdb3b
Step 2/42 : WORKDIR /opt/cp2k-toolchain
 ---> Using cache
 ---> 026e35f2a85c
Step 3/42 : COPY ./tools/toolchain/install_requirements*.sh ./
 ---> Using cache
 ---> 61e4cd54df66
Step 4/42 : RUN ./install_requirements.sh ubuntu:22.04
 ---> Using cache
 ---> 9302e3cfee49
Step 5/42 : RUN mkdir scripts
 ---> Using cache
 ---> ba1db08844ca
Step 6/42 : COPY ./tools/toolchain/scripts/VERSION ./tools/toolchain/scripts/parse_if.py ./tools/toolchain/scripts/tool_kit.sh ./tools/toolchain/scripts/common_vars.sh ./tools/toolchain/scripts/signal_trap.sh ./tools/toolchain/scripts/get_openblas_arch.sh ./scripts/
 ---> b5b513ee8b38
Step 7/42 : COPY ./tools/toolchain/install_cp2k_toolchain.sh .
 ---> ab9fa54757b5
Step 8/42 : RUN ./install_cp2k_toolchain.sh --install-all --mpi-mode=mpich --with-gcc=system --dry-run
 ---> Running in 2d017536706c
WARNING: (./install_cp2k_toolchain.sh, line 332) No MPI installation detected (ignore this message in Cray Linux Environment or when MPI installation was requested).
Compiling with 32 processes.
Wrote only configuration files (--dry-run).
Removing intermediate container 2d017536706c
 ---> 0025c11e3967
Step 9/42 : COPY ./tools/toolchain/scripts/stage0/ ./scripts/stage0/
 ---> 847fd5974f4c
Step 10/42 : RUN ./scripts/stage0/install_stage0.sh && rm -rf ./build
 ---> Running in c782d9304f53
==================== Finding GCC from system paths ====================
path to gcc is /usr/bin/gcc
path to g++ is /usr/bin/g++
path to gfortran is /usr/bin/gfortran
Found include directory /usr/include
Step gcc took 0.00 seconds.
Step intel took 0.00 seconds.
==================== Getting proc arch info using OpenBLAS tools ====================
OpenBLAS-0.3.21.tar.gz: OK
Checksum of OpenBLAS-0.3.21.tar.gz Ok
OpenBLAS detected LIBCORE = zen
OpenBLAS detected ARCH = x86_64
==================== Installing CMake ====================
cmake-3.22.1-Linux-x86_64.sh: OK
Checksum of cmake-3.22.1-Linux-x86_64.sh Ok
Installing from scratch into /opt/cp2k-toolchain/install/cmake-3.22.1
Step cmake took 3.00 seconds.
Removing intermediate container c782d9304f53
 ---> 3c643e28d05d
Step 11/42 : COPY ./tools/toolchain/scripts/stage1/ ./scripts/stage1/
 ---> 8fa069ac2928
Step 12/42 : RUN ./scripts/stage1/install_stage1.sh && rm -rf ./build
 ---> Running in 655132d98b1b
==================== Installing MPICH ====================
mpich-3.4.3.tar.gz: OK
Checksum of mpich-3.4.3.tar.gz Ok
Installing from scratch into /opt/cp2k-toolchain/install/mpich-3.4.3
Found directory /opt/cp2k-toolchain/install/mpich-3.4.3/bin
Found directory /opt/cp2k-toolchain/install/mpich-3.4.3/lib
Found directory /opt/cp2k-toolchain/install/mpich-3.4.3/include
mpirun is installed as /opt/cp2k-toolchain/install/mpich-3.4.3/bin/mpirun
mpicc is installed as /opt/cp2k-toolchain/install/mpich-3.4.3/bin/mpicc
mpicxx is installed as /opt/cp2k-toolchain/install/mpich-3.4.3/bin/mpicxx
mpif90 is installed as /opt/cp2k-toolchain/install/mpich-3.4.3/bin/mpif90
Step mpich took 232.00 seconds.
Removing intermediate container 655132d98b1b
 ---> ad831f1eda8c
Step 13/42 : COPY ./tools/toolchain/scripts/stage2/ ./scripts/stage2/
 ---> b177c0decf2c
Step 14/42 : RUN ./scripts/stage2/install_stage2.sh && rm -rf ./build
 ---> Running in a1624e216363
==================== Installing OpenBLAS ====================
OpenBLAS-0.3.21.tar.gz: OK
Checksum of OpenBLAS-0.3.21.tar.gz Ok
Installing from scratch into /opt/cp2k-toolchain/install/openblas-0.3.21
Step openblas took 78.00 seconds.
Removing intermediate container a1624e216363
 ---> 4786b5d49610
Step 15/42 : COPY ./tools/toolchain/scripts/stage3/ ./scripts/stage3/
 ---> fedb9e4ffb72
Step 16/42 : RUN ./scripts/stage3/install_stage3.sh && rm -rf ./build
 ---> Running in a547c19ee3f4
==================== Installing FFTW ====================
fftw-3.3.10.tar.gz: OK
Checksum of fftw-3.3.10.tar.gz Ok
Installing from scratch into /opt/cp2k-toolchain/install/fftw-3.3.10
Step fftw took 47.00 seconds.
==================== Installing LIBINT ====================
libint-v2.6.0-cp2k-lmax-5.tgz: OK
Checksum of libint-v2.6.0-cp2k-lmax-5.tgz Ok
Installing from scratch into /opt/cp2k-toolchain/install/libint-v2.6.0-cp2k-lmax-5
Step libint took 320.00 seconds.
==================== Installing LIBXC ====================
libxc-5.2.3.tar.gz: OK
Checksum of libxc-5.2.3.tar.gz Ok
Installing from scratch into /opt/cp2k-toolchain/install/libxc-5.2.3
Step libxc took 78.00 seconds.
Removing intermediate container a547c19ee3f4
 ---> 01bbebe1c968
Step 17/42 : COPY ./tools/toolchain/scripts/stage4/ ./scripts/stage4/
 ---> aa24e815977e
Step 18/42 : RUN ./scripts/stage4/install_stage4.sh && rm -rf ./build
 ---> Running in 1e0baae9946c
==================== Installing Libxsmm ====================
libxsmm-1.17.tar.gz: OK
Checksum of libxsmm-1.17.tar.gz Ok
Installing from scratch into /opt/cp2k-toolchain/install/libxsmm-1.17
Step libxsmm took 12.00 seconds.
==================== Installing ScaLAPACK ====================
scalapack-2.1.0.tgz: OK
Checksum of scalapack-2.1.0.tgz Ok
Installing from scratch into /opt/cp2k-toolchain/install/scalapack-2.1.0
Step scalapack took 29.00 seconds.
==================== Installing COSMA ====================
COSMA-v2.6.1.tar.gz: OK
Checksum of COSMA-v2.6.1.tar.gz Ok
Installing from scratch into /opt/cp2k-toolchain/install/COSMA-2.6.1
Step cosma took 15.00 seconds.
Removing intermediate container 1e0baae9946c
 ---> 19821abb125b
Step 19/42 : COPY ./tools/toolchain/scripts/stage5/ ./scripts/stage5/
 ---> 699d8a65e041
Step 20/42 : RUN ./scripts/stage5/install_stage5.sh && rm -rf ./build
 ---> Running in e62e5d3d45e6
==================== Installing ELPA ====================
elpa-2021.11.002.tar.gz: OK
Checksum of elpa-2021.11.002.tar.gz Ok
patching file nvcc_wrap
Installing from scratch into /opt/cp2k-toolchain/install/elpa-2021.11.002/cpu
Step elpa took 163.00 seconds.
==================== Installing PT-Scotch ====================
scotch_6.0.0.tar.gz: OK
Checksum of scotch_6.0.0.tar.gz Ok
Installing from scratch into /opt/cp2k-toolchain/install/scotch-6.0.0
Step ptscotch took 7.00 seconds.
==================== Installing SuperLU_DIST ====================
superlu_dist_6.1.0.tar.gz: OK
Checksum of superlu_dist_6.1.0.tar.gz Ok
Installing from scratch into /opt/cp2k-toolchain/install/superlu_dist-6.1.0
Step superlu took 7.00 seconds.
==================== Installing PEXSI ====================
pexsi_v1.2.0.tar.gz: OK
Checksum of pexsi_v1.2.0.tar.gz Ok
Installing from scratch into /opt/cp2k-toolchain/install/pexsi-1.2.0
Step pexsi took 62.00 seconds.
Removing intermediate container e62e5d3d45e6
 ---> 1da027928c0e
Step 21/42 : COPY ./tools/toolchain/scripts/stage6/ ./scripts/stage6/
 ---> 4ce5e1ec73f7
Step 22/42 : RUN ./scripts/stage6/install_stage6.sh && rm -rf ./build
 ---> Running in 25ac96f05668
==================== Installing QUIP ====================
QUIP-b4336484fb65b0e73211a8f920ae4361c7c353fd.tar.gz: OK
Checksum of QUIP-b4336484fb65b0e73211a8f920ae4361c7c353fd.tar.gz Ok
Installing from scratch into /opt/cp2k-toolchain/install/quip-b4336484fb65b0e73211a8f920ae4361c7c353fd
Step quip took 227.00 seconds.
==================== Installing gsl ====================
gsl-2.7.tar.gz: OK
Checksum of gsl-2.7.tar.gz Ok
Installing from scratch into /opt/cp2k-toolchain/install/gsl-2.7
Step gsl took 47.00 seconds.
==================== Installing PLUMED ====================
plumed-src-2.8.0.tgz: OK
Checksum of plumed-src-2.8.0.tgz Ok
Installing from scratch into /opt/cp2k-toolchain/install/plumed-2.8.0
Step plumed took 53.00 seconds.
Removing intermediate container 25ac96f05668
 ---> 76ec7f6cc7d3
Step 23/42 : COPY ./tools/toolchain/scripts/stage7/ ./scripts/stage7/
 ---> 23b2c9158fc5
Step 24/42 : RUN ./scripts/stage7/install_stage7.sh && rm -rf ./build
 ---> Running in 0517e24984ac
==================== Installing hdf5 ====================
hdf5-1.12.0.tar.bz2: OK
Checksum of hdf5-1.12.0.tar.bz2 Ok
Installing from scratch into /opt/cp2k-toolchain/install/hdf5-1.12.0
Step hdf5 took 97.00 seconds.
==================== Installing libvdwxc ====================
libvdwxc-0.4.0.tar.gz: OK
Checksum of libvdwxc-0.4.0.tar.gz Ok
Installing from scratch into /opt/cp2k-toolchain/install/libvdwxc-0.4.0
Step libvdwxc took 22.00 seconds.
==================== Installing spglib ====================
spglib-1.16.2.tar.gz: OK
Checksum of spglib-1.16.2.tar.gz Ok
Installing from scratch into /opt/cp2k-toolchain/install/spglib-1.16.2
Step spglib took 5.00 seconds.
==================== Installing libvori ====================
libvori-220621.tar.gz: OK
Checksum of libvori-220621.tar.gz Ok
Installing from scratch into /opt/cp2k-toolchain/install/libvori-220621
Step libvori took 17.00 seconds.
Removing intermediate container 0517e24984ac
 ---> 62880660c7bf
Step 25/42 : COPY ./tools/toolchain/scripts/stage8/ ./scripts/stage8/
 ---> 3c770e8774ee
Step 26/42 : RUN ./scripts/stage8/install_stage8.sh && rm -rf ./build
 ---> Running in 1144992a113b
==================== Installing spfft ====================
SpFFT-1.0.6.tar.gz: OK
Checksum of SpFFT-1.0.6.tar.gz Ok
Installing from scratch into /opt/cp2k-toolchain/install/SpFFT-1.0.6
Step spfft took 5.00 seconds.
==================== Installing spla ====================
SpLA-1.5.4.tar.gz: OK
Checksum of SpLA-1.5.4.tar.gz Ok
Installing from scratch into /opt/cp2k-toolchain/install/SpLA-1.5.4
Step spla took 6.00 seconds.
==================== Installing SIRIUS ====================
SIRIUS-7.3.1.tar.gz: OK
Checksum of SIRIUS-7.3.1.tar.gz Ok
Installing from scratch into /opt/cp2k-toolchain/install/sirius-7.3.1
Step sirius took 86.00 seconds.
Removing intermediate container 1144992a113b
 ---> a25f7263e3f9
Step 27/42 : COPY ./tools/toolchain/scripts/arch_base.tmpl ./tools/toolchain/scripts/generate_arch_files.sh ./scripts/
 ---> a34917afdf26
Step 28/42 : RUN ./scripts/generate_arch_files.sh && rm -rf ./build
 ---> Running in 1ada6d759a0e
==================== generating arch files ====================
arch files can be found in the /opt/cp2k-toolchain/install/arch subdirectory
Wrote /opt/cp2k-toolchain/install/arch/local.ssmp
Wrote /opt/cp2k-toolchain/install/arch/local_static.ssmp
Wrote /opt/cp2k-toolchain/install/arch/local.sdbg
Wrote /opt/cp2k-toolchain/install/arch/local_coverage.sdbg
Wrote /opt/cp2k-toolchain/install/arch/local.psmp
Wrote /opt/cp2k-toolchain/install/arch/local.pdbg
Wrote /opt/cp2k-toolchain/install/arch/local_static.psmp
Wrote /opt/cp2k-toolchain/install/arch/local_warn.psmp
Wrote /opt/cp2k-toolchain/install/arch/local_coverage.pdbg
========================== usage =========================
Done!
Now copy:
  cp /opt/cp2k-toolchain/install/arch/* to the cp2k/arch/ directory
To use the installed tools and libraries and cp2k version compiled with it you will first need to execute at the prompt:
  source /opt/cp2k-toolchain/install/setup
To build CP2K you should change directory:
  cd cp2k/
  make -j 32 ARCH=local VERSION="ssmp sdbg psmp pdbg"
arch files for GPU enabled CUDA versions are named "local_cuda.*"
arch files for GPU enabled HIP versions are named "local_hip.*"
arch files for OpenCL (GPU) versions are named "local_opencl.*"
arch files for coverage versions are named "local_coverage.*"
Note that these pre-built arch files are for the GNU compiler, users have to adapt them for other compilers.
It is possible to use the provided CP2K arch files as guidance.
Removing intermediate container 1ada6d759a0e
 ---> eea142a7b15e
Step 29/42 : WORKDIR /opt/cp2k
 ---> Running in 1b6b0a1f0fea
Removing intermediate container 1b6b0a1f0fea
 ---> 60a72489b70a
Step 30/42 : COPY ./Makefile .
 ---> 92053277cf6e
Step 31/42 : COPY ./src ./src
 ---> da52bd90909c
Step 32/42 : COPY ./exts ./exts
 ---> 3654f86a7484
Step 33/42 : COPY ./tools/build_utils ./tools/build_utils
 ---> 13be5f3197de
Step 34/42 : RUN /bin/bash -c " mkdir -p arch && ln -vs /opt/cp2k-toolchain/install/arch/local.psmp ./arch/ && echo 'Compiling cp2k...' && source /opt/cp2k-toolchain/install/setup && ( make -j ARCH=local VERSION=psmp &> /dev/null || true ) && ( [ ! -f ./exe/local/cp2k.psmp ] || ldd ./exe/local/cp2k.psmp | grep -q libmpi )"
 ---> Running in 3d5e81e8c122
'./arch/local.psmp' -> '/opt/cp2k-toolchain/install/arch/local.psmp'
Compiling cp2k...
Removing intermediate container 3d5e81e8c122
 ---> c238971678ba
Step 35/42 : COPY ./data ./data
 ---> f0c68ceb6d28
Step 36/42 : COPY ./tests ./tests
 ---> 28c55b796c71
Step 37/42 : COPY ./tools/regtesting ./tools/regtesting
 ---> 849760866053
Step 38/42 : COPY ./benchmarks ./benchmarks
 ---> 5dc1bb63e0d6
Step 39/42 : COPY ./tools/docker/scripts/test_performance.sh ./tools/docker/scripts/plot_performance.py ./
 ---> f1ab56fbbb5e
Step 40/42 : RUN ./test_performance.sh "local" 2>&1 | tee report.log
 ---> Running in 707f76d83c11
========== Compiling CP2K ==========
Compiling cp2k... done.
Checking benchmark inputs... Found 60 input files and 0 errors.
========== Running Performance Test ==========
Running H2O-64.inp with 1 threads and 32 ranks... done.
Running H2O-64.inp with 32 threads and 1 ranks... done.
From /workspace/artifacts/H2O-64_32omp.out:
 -------------------------------------------------------------------------------
 -                                                                             -
 -                                T I M I N G                                  -
 -                                                                             -
 -------------------------------------------------------------------------------
 SUBROUTINE                       CALLS  ASD         SELF TIME        TOTAL TIME
                                MAXIMUM       AVERAGE  MAXIMUM  AVERAGE  MAXIMUM
 CP2K                                 1  1.0    0.042    0.042  132.762  132.762
 qs_mol_dyn_low                       1  2.0    0.003    0.003  132.075  132.075
 qs_forces                           11  3.9    0.002    0.002  132.025  132.025
 qs_energies                         11  4.9    0.001    0.001  123.348  123.348
 scf_env_do_scf                      11  5.9    0.002    0.002  108.524  108.524
 velocity_verlet                     10  3.0    0.002    0.002   84.958   84.958
 scf_env_do_scf_inner_loop          108  6.5    0.016    0.016   82.068   82.068
 qs_scf_new_mos                     108  7.5    0.001    0.001   32.173   32.173
 qs_scf_loop_do_ot                  108  8.5    0.001    0.001   32.172   32.172
 ot_scf_mini                        108  9.5    0.004    0.004   29.929   29.929
 dbcsr_multiply_generic            2286 12.5    0.271    0.271   29.536   29.536
 qs_rho_update_rho_low              119  7.7    0.001    0.001   28.971   28.971
 calculate_rho_elec                 119  8.7    1.040    1.040   28.970   28.970
 rebuild_ks_matrix                  119  8.3    0.001    0.001   28.920   28.920
 qs_ks_build_kohn_sham_matrix       119  9.3    0.021    0.021   28.919   28.919
 qs_ks_update_qs_env                119  7.6    0.001    0.001   26.363   26.363
 init_scf_loop                       11  6.9    0.000    0.000   26.289   26.289
 prepare_preconditioner              11  7.9    0.000    0.000   22.948   22.948
 make_preconditioner                 11  8.9    0.000    0.000   22.948   22.948
 grid_collocate_task_list           119  9.7   21.668   21.668   21.668   21.668
 make_full_inverse_cholesky          11  9.9    0.000    0.000   21.613   21.613
 sum_up_and_integrate               119 10.3    0.327    0.327   18.147   18.147
 integrate_v_rspace                 119 11.3    0.162    0.162   17.820   17.820
 ot_mini                            108 10.5    0.001    0.001   17.676   17.676
 make_m2s                          4572 13.5    0.078    0.078   15.743   15.743
 grid_integrate_task_list           119 12.3   14.351   14.351   14.351   14.351
 pw_transfer                       1439 11.6    0.104    0.104   10.936   10.936
 fft_wrap_pw1pw2                   1201 12.6    0.011    0.011   10.493   10.493
 qs_ot_get_derivative               108 11.5    0.002    0.002   10.432   10.432
 cp_fm_cholesky_decompose            22 10.9    9.252    9.252    9.252    9.252
 fft_wrap_pw1pw2_140                487 13.2    0.619    0.619    9.172    9.172
 qs_energies_init_hamiltonians       11  5.9    0.000    0.000    8.377    8.377
 dbcsr_make_dense_low              5837 15.5    0.139    0.139    8.262    8.262
 make_dense_data                   5837 16.5    6.966    6.966    8.101    8.101
 make_images                       4572 14.5    2.926    2.926    8.032    8.032
 cp_fm_cholesky_invert               11 10.9    7.697    7.697    7.697    7.697
 multiply_cannon                   2286 13.5    0.377    0.377    7.678    7.678
 ot_diis_step                       108 11.5    0.006    0.006    7.239    7.239
 dbcsr_make_images_dense           3978 14.8    0.029    0.029    7.101    7.101
 qs_ot_get_p                        119 10.4    0.002    0.002    6.911    6.911
 multiply_cannon_loop              2286 14.5    0.186    0.186    6.793    6.793
 multiply_cannon_multrec           2286 15.5    6.519    6.519    6.605    6.605
 apply_preconditioner_dbcsr         119 12.6    0.000    0.000    6.351    6.351
 apply_single                       119 13.6    0.001    0.001    6.351    6.351
 dbcsr_copy                        2102 12.0    0.340    0.340    6.265    6.265
 density_rs2pw                      119  9.7    0.006    0.006    6.262    6.262
 dbcsr_copy_into_existing            22  7.9    5.832    5.832    5.833    5.833
 init_scf_run                        11  5.9    0.002    0.002    5.565    5.565
 scf_env_initial_rho_setup           11  6.9    0.001    0.001    5.563    5.563
 fft3d_s                           1202 14.6    5.481    5.481    5.488    5.488
 build_core_hamiltonian_matrix_      11  4.9    0.001    0.001    5.022    5.022
 wfi_extrapolate                     11  7.9    0.001    0.001    4.841    4.841
 qs_ot_p2m_diag                      50 11.0    0.165    0.165    4.736    4.736
 dbcsr_complete_redistribute        329 12.2    2.281    2.281    4.539    4.539
 cp_dbcsr_syevd                      50 12.0    0.004    0.004    4.166    4.166
 build_core_hamiltonian_matrix       11  6.9    0.001    0.001    4.048    4.048
 cp_fm_diag_elpa                     50 13.0    0.000    0.000    4.021    4.021
 cp_fm_diag_elpa_base                50 14.0    3.948    3.948    4.020    4.020
 qs_env_update_s_mstruct             11  6.9    0.000    0.000    3.762    3.762
 copy_dbcsr_to_fm                   153 11.3    0.004    0.004    3.689    3.689
 qs_ks_update_qs_env_forces          11  4.9    0.000    0.000    3.651    3.651
 qs_ot_get_derivative_diag           49 12.0    0.003    0.003    3.462    3.462
 pw_poisson_solve                   119 10.3    1.351    1.351    3.406    3.406
 potential_pw2rs                    119 12.3    0.073    0.073    3.307    3.307
 qs_ot_get_derivative_taylor         59 13.0    0.003    0.003    3.243    3.243
 qs_create_task_list                 11  7.9    0.000    0.000    3.193    3.193
 generate_qs_task_list               11  8.9    2.091    2.091    3.193    3.193
 hybrid_alltoall_any               4725 16.4    2.521    2.521    2.900    2.900
 dbcsr_finalize                    5186 13.8    0.255    0.255    2.804    2.804
 transfer_dbcsr_to_fm                11 10.9    0.000    0.000    2.740    2.740
 make_images_data                  4572 15.5    0.048    0.048    2.715    2.715
 pw_scatter_s                       595 15.2    2.666    2.666    2.666    2.666
 -------------------------------------------------------------------------------
From /workspace/artifacts/H2O-64_32mpi.out:
 -------------------------------------------------------------------------------
 -                                                                             -
 -                                T I M I N G                                  -
 -                                                                             -
 -------------------------------------------------------------------------------
 SUBROUTINE                       CALLS  ASD         SELF TIME        TOTAL TIME
                                MAXIMUM       AVERAGE  MAXIMUM  AVERAGE  MAXIMUM
 CP2K                                 1  1.0    0.013    0.031   74.384   74.395
 qs_mol_dyn_low                       1  2.0    0.004    0.006   74.225   74.232
 qs_forces                           11  3.9    0.002    0.002   74.147   74.148
 qs_energies                         11  4.9    0.001    0.001   69.813   69.817
 scf_env_do_scf                      11  5.9    0.001    0.003   64.310   64.312
 scf_env_do_scf_inner_loop          108  6.5    0.005    0.027   59.380   59.381
 velocity_verlet                     10  3.0    0.002    0.004   43.860   43.861
 rebuild_ks_matrix                  119  8.3    0.001    0.001   26.138   26.361
 qs_ks_build_kohn_sham_matrix       119  9.3    0.024    0.029   26.137   26.360
 dbcsr_multiply_generic            2286 12.5    0.116    0.126   25.521   25.867
 qs_ks_update_qs_env                119  7.6    0.002    0.002   23.445   23.659
 qs_scf_new_mos                     108  7.5    0.001    0.001   20.658   20.888
 qs_scf_loop_do_ot                  108  8.5    0.001    0.002   20.657   20.887
 ot_scf_mini                        108  9.5    0.003    0.004   19.479   19.708
 qs_rho_update_rho_low              119  7.7    0.001    0.001   19.355   19.421
 calculate_rho_elec                 119  8.7    0.033    0.035   19.354   19.421
 sum_up_and_integrate               119 10.3    0.031    0.040   18.965   19.003
 integrate_v_rspace                 119 11.3    0.007    0.007   18.934   18.975
 multiply_cannon                   2286 13.5    0.190    0.209   18.161   18.963
 mp_waitall_1                    169478 16.3   16.617   17.574   16.617   17.574
 multiply_cannon_loop              2286 14.5    0.158    0.179   16.763   17.484
 grid_collocate_task_list           119  9.7   11.405   13.091   11.405   13.091
 grid_integrate_task_list           119 12.3    9.881   12.855    9.881   12.855
 multiply_cannon_metrocomm3       18288 15.5    0.058    0.063   10.854   12.116
 ot_mini                            108 10.5    0.001    0.002   11.523   11.768
 rs_pw_transfer                     974 11.9    0.020    0.024    9.175   10.114
 density_rs2pw                      119  9.7    0.007    0.008    7.364    8.236
 qs_ot_get_derivative               108 11.5    0.001    0.002    6.075    6.301
 potential_pw2rs                    119 12.3    0.008    0.010    5.789    5.808
 pw_transfer                       1439 11.6    0.107    0.116    5.399    5.465
 ot_diis_step                       108 11.5    0.005    0.006    5.339    5.340
 apply_preconditioner_dbcsr         119 12.6    0.000    0.000    5.084    5.269
 apply_single                       119 13.6    0.001    0.001    5.084    5.269
 fft_wrap_pw1pw2                   1201 12.6    0.013    0.014    5.196    5.256
 init_scf_loop                       11  6.9    0.000    0.001    4.908    4.909
 mp_waitany                        9880 13.7    3.820    4.816    3.820    4.816
 multiply_cannon_multrec          18288 15.5    4.169    4.549    4.189    4.571
 make_m2s                          4572 13.5    0.062    0.068    4.385    4.552
 mp_alltoall_d11v                  2130 13.8    3.590    4.179    3.590    4.179
 fft_wrap_pw1pw2_140                487 13.2    0.390    0.412    4.024    4.132
 make_images                       4572 14.5    0.152    0.166    3.943    4.108
 fft3d_ps                          1201 14.6    1.641    1.771    4.018    4.102
 rs_pw_transfer_RS2PW_140           130 11.5    0.329    0.366    3.138    4.062
 init_scf_run                        11  5.9    0.000    0.006    3.972    3.973
 scf_env_initial_rho_setup           11  6.9    0.000    0.004    3.972    3.972
 rs_gather_matrices                 119 12.3    0.120    0.133    3.211    3.759
 wfi_extrapolate                     11  7.9    0.001    0.001    3.568    3.569
 qs_ks_update_qs_env_forces          11  4.9    0.000    0.000    2.913    2.938
 mp_sum_l                         11218 13.2    2.265    2.863    2.265    2.863
 qs_ot_get_p                        119 10.4    0.001    0.002    2.523    2.732
 rs_pw_transfer_PW2RS_140           130 13.9    0.741    0.788    2.188    2.261
 make_images_data                  4572 15.5    0.045    0.052    1.929    2.127
 mp_alltoall_z22v                  1201 16.6    1.986    2.115    1.986    2.115
 mp_sum_d                          4129 12.0    1.775    2.100    1.775    2.100
 qs_ot_get_derivative_diag           49 12.0    0.001    0.002    1.969    2.064
 multiply_cannon_metrocomm1       18288 15.5    0.028    0.033    1.249    2.048
 qs_ot_get_derivative_taylor         59 13.0    0.002    0.002    1.920    2.037
 prepare_preconditioner              11  7.9    0.000    0.000    1.967    2.001
 make_preconditioner                 11  8.9    0.000    0.000    1.967    2.001
 make_full_inverse_cholesky          11  9.9    0.000    0.000    1.799    1.840
 hybrid_alltoall_any               4725 16.4    0.092    0.289    1.614    1.792
 cp_dbcsr_sm_fm_multiply             37  9.5    0.001    0.002    1.706    1.708
 rs_pw_transfer_PW2RS_50            119 14.3    0.385    0.416    1.542    1.630
 cp_dbcsr_sm_fm_multiply_core        37 10.5    0.000    0.000    1.454    1.512
 -------------------------------------------------------------------------------
Plot: name="H2O-64_timings_32omp", title="Timings of H2O-64 with 32 OpenMP Threads", ylabel="time [s]"
PlotPoint: plot="H2O-64_timings_32omp", name="rest", label="rest", y=66.309, yerr=0.0
PlotPoint: plot="H2O-64_timings_32omp", name="grid_collocate_task_list", label="grid_collocate_task_list", y=21.668, yerr=0.0
PlotPoint: plot="H2O-64_timings_32omp", name="grid_integrate_task_list", label="grid_integrate_task_list", y=14.351, yerr=0.0
PlotPoint: plot="H2O-64_timings_32omp", name="cp_fm_cholesky_decompose", label="cp_fm_cholesky_decompose", y=9.252, yerr=0.0
PlotPoint: plot="H2O-64_timings_32omp", name="cp_fm_cholesky_invert", label="cp_fm_cholesky_invert", y=7.697, yerr=0.0
PlotPoint: plot="H2O-64_timings_32omp", name="make_dense_data", label="make_dense_data", y=6.966, yerr=0.0
PlotPoint: plot="H2O-64_timings_32omp", name="multiply_cannon_multrec", label="multiply_cannon_multrec", y=6.519, yerr=0.0
PlotPoint: plot="H2O-64_timings_32omp", name="mp_waitall_1", label="mp_waitall_1", y=0.0, yerr=0.0
PlotPoint: plot="H2O-64_timings_32omp", name="mp_waitany", label="mp_waitany", y=0.0, yerr=0.0
Plot: name="H2O-64_timings_32mpi", title="Timings of H2O-64 with 32 MPI Ranks", ylabel="time [s]"
PlotPoint: plot="H2O-64_timings_32mpi", name="rest", label="rest", y=28.491999999999997, yerr=0.0
PlotPoint: plot="H2O-64_timings_32mpi", name="grid_collocate_task_list", label="grid_collocate_task_list", y=11.405, yerr=0.0
PlotPoint: plot="H2O-64_timings_32mpi", name="grid_integrate_task_list", label="grid_integrate_task_list", y=9.881, yerr=0.0
PlotPoint: plot="H2O-64_timings_32mpi", name="cp_fm_cholesky_decompose", label="cp_fm_cholesky_decompose", y=0.0, yerr=0.0
PlotPoint: plot="H2O-64_timings_32mpi", name="cp_fm_cholesky_invert", label="cp_fm_cholesky_invert", y=0.0, yerr=0.0
PlotPoint: plot="H2O-64_timings_32mpi", name="make_dense_data", label="make_dense_data", y=0.0, yerr=0.0
PlotPoint: plot="H2O-64_timings_32mpi", name="multiply_cannon_multrec", label="multiply_cannon_multrec", y=4.169, yerr=0.0
PlotPoint: plot="H2O-64_timings_32mpi", name="mp_waitall_1", label="mp_waitall_1", y=16.617, yerr=0.0
PlotPoint: plot="H2O-64_timings_32mpi", name="mp_waitany", label="mp_waitany", y=3.82, yerr=0.0
Running H2O-64_nonortho.inp with 1 threads and 32 ranks... done.
Running H2O-64_nonortho.inp with 32 threads and 1 ranks... done.
From /workspace/artifacts/H2O-64_nonortho_32omp.out:
 -------------------------------------------------------------------------------
 -                                                                             -
 -                                T I M I N G                                  -
 -                                                                             -
 -------------------------------------------------------------------------------
 SUBROUTINE                       CALLS  ASD         SELF TIME        TOTAL TIME
                                MAXIMUM       AVERAGE  MAXIMUM  AVERAGE  MAXIMUM
 CP2K                                 1  1.0    0.041    0.041  157.848  157.848
 qs_mol_dyn_low                       1  2.0    0.003    0.003  157.122  157.122
 qs_forces                           11  3.9    0.001    0.001  157.072  157.072
 qs_energies                         11  4.9    0.001    0.001  146.456  146.456
 scf_env_do_scf                      11  5.9    0.002    0.002  129.075  129.075
 scf_env_do_scf_inner_loop           96  6.5    0.014    0.014  101.356  101.356
 velocity_verlet                     10  3.0    0.002    0.002  100.322  100.322
 rebuild_ks_matrix                  107  8.3    0.001    0.001   43.293   43.293
 qs_ks_build_kohn_sham_matrix       107  9.3    0.018    0.018   43.292   43.292
 qs_rho_update_rho_low              107  7.7    0.001    0.001   43.150   43.150
 calculate_rho_elec                 107  8.7    0.929    0.929   43.150   43.150
 qs_ks_update_qs_env                107  7.6    0.001    0.001   38.841   38.841
 grid_collocate_task_list           107  9.7   36.502   36.502   36.502   36.502
 sum_up_and_integrate               107 10.3    0.289    0.289   33.204   33.204
 integrate_v_rspace                 107 11.3    0.137    0.137   32.915   32.915
 grid_integrate_task_list           107 12.3   29.693   29.693   29.693   29.693
 qs_scf_new_mos                      96  7.5    0.001    0.001   28.354   28.354
 qs_scf_loop_do_ot                   96  8.5    0.001    0.001   28.353   28.353
 init_scf_loop                       11  6.9    0.000    0.000   27.516   27.516
 ot_scf_mini                         96  9.5    0.004    0.004   26.322   26.322
 dbcsr_multiply_generic            1966 12.4    0.228    0.228   26.281   26.281
 prepare_preconditioner              11  7.9    0.000    0.000   22.174   22.174
 make_preconditioner                 11  8.9    0.000    0.000   22.174   22.174
 make_full_inverse_cholesky          11  9.9    0.000    0.000   20.847   20.847
 ot_mini                             96 10.5    0.001    0.001   15.417   15.417
 make_m2s                          3932 13.4    0.063    0.063   14.169   14.169
 pw_transfer                       1295 11.6    0.090    0.090   10.179   10.179
 fft_wrap_pw1pw2                   1081 12.6    0.009    0.009    9.781    9.781
 qs_ot_get_derivative                96 11.5    0.002    0.002    9.077    9.077
 qs_energies_init_hamiltonians       11  5.9    0.000    0.000    8.903    8.903
 cp_fm_cholesky_decompose            22 10.9    8.795    8.795    8.795    8.795
 fft_wrap_pw1pw2_140                439 13.2    0.633    0.633    8.564    8.564
 init_scf_run                        11  5.9    0.002    0.002    7.543    7.543
 scf_env_initial_rho_setup           11  6.9    0.001    0.001    7.541    7.541
 cp_fm_cholesky_invert               11 10.9    7.420    7.420    7.420    7.420
 dbcsr_make_dense_low              4961 15.5    0.145    0.145    7.346    7.346
 make_images                       3932 14.4    2.673    2.673    7.247    7.247
 make_dense_data                   4961 16.5    6.069    6.069    7.184    7.184
 multiply_cannon                   1966 13.4    0.331    0.331    6.753    6.753
 wfi_extrapolate                     11  7.9    0.001    0.001    6.654    6.654
 dbcsr_copy                        1855 11.9    0.312    0.312    6.412    6.412
 dbcsr_make_images_dense           3386 14.7    0.023    0.023    6.342    6.342
 ot_diis_step                        96 11.5    0.005    0.005    6.336    6.336
 qs_ot_get_p                        107 10.4    0.001    0.001    6.296    6.296
 dbcsr_copy_into_existing            22  7.9    6.018    6.018    6.018    6.018
 multiply_cannon_loop              1966 14.4    0.198    0.198    6.004    6.004
 apply_preconditioner_dbcsr         107 12.6    0.000    0.000    5.823    5.823
 apply_single                       107 13.6    0.001    0.001    5.823    5.823
 multiply_cannon_multrec           1966 15.4    5.730    5.730    5.805    5.805
 density_rs2pw                      107  9.7    0.006    0.006    5.718    5.718
 qs_ks_update_qs_env_forces          11  4.9    0.000    0.000    5.482    5.482
 build_core_hamiltonian_matrix_      11  4.9    0.001    0.001    5.131    5.131
 fft3d_s                           1082 14.6    5.055    5.055    5.065    5.065
 dbcsr_complete_redistribute        317 12.2    2.253    2.253    4.662    4.662
 qs_ot_p2m_diag                      44 11.0    0.143    0.143    4.489    4.489
 qs_env_update_s_mstruct             11  6.9    0.000    0.000    4.200    4.200
 build_core_hamiltonian_matrix       11  6.9    0.001    0.001    4.106    4.106
 cp_dbcsr_syevd                      44 12.0    0.004    0.004    4.011    4.011
 copy_dbcsr_to_fm                   147 11.2    0.004    0.004    3.828    3.828
 cp_fm_diag_elpa                     44 13.0    0.000    0.000    3.824    3.824
 cp_fm_diag_elpa_base                44 14.0    3.762    3.762    3.823    3.823
 qs_create_task_list                 11  7.9    0.000    0.000    3.637    3.637
 generate_qs_task_list               11  8.9    2.530    2.530    3.637    3.637
 -------------------------------------------------------------------------------
From /workspace/artifacts/H2O-64_nonortho_32mpi.out:
 -------------------------------------------------------------------------------
 -                                                                             -
 -                                T I M I N G                                  -
 -                                                                             -
 -------------------------------------------------------------------------------
 SUBROUTINE                       CALLS  ASD         SELF TIME        TOTAL TIME
                                MAXIMUM       AVERAGE  MAXIMUM  AVERAGE  MAXIMUM
 CP2K                                 1  1.0    0.010    0.029  108.101  108.112
 qs_mol_dyn_low                       1  2.0    0.004    0.005  107.970  107.975
 qs_forces                           11  3.9    0.002    0.002  107.914  107.915
 qs_energies                         11  4.9    0.001    0.001  101.145  101.151
 scf_env_do_scf                      11  5.9    0.001    0.003   93.693   93.695
 scf_env_do_scf_inner_loop           96  6.5    0.004    0.024   86.544   86.544
 velocity_verlet                     10  3.0    0.002    0.003   64.058   64.059
 rebuild_ks_matrix                  107  8.3    0.001    0.001   45.070   45.177
 qs_ks_build_kohn_sham_matrix       107  9.3    0.021    0.025   45.070   45.177
 qs_ks_update_qs_env                107  7.6    0.001    0.002   39.920   40.026
 sum_up_and_integrate               107 10.3    0.027    0.030   38.749   38.789
 integrate_v_rspace                 107 11.3    0.006    0.007   38.721   38.760
 qs_rho_update_rho_low              107  7.7    0.001    0.001   37.468   37.485
 calculate_rho_elec                 107  8.7    0.029    0.038   37.467   37.484
 grid_integrate_task_list           107 12.3   27.348   33.398   27.348   33.398
 grid_collocate_task_list           107  9.7   27.245   32.492   27.245   32.492
 dbcsr_multiply_generic            1966 12.4    0.099    0.109   22.042   22.260
 qs_scf_new_mos                      96  7.5    0.001    0.001   17.476   17.632
 qs_scf_loop_do_ot                   96  8.5    0.001    0.001   17.475   17.631
 ot_scf_mini                         96  9.5    0.003    0.004   16.525   16.691
 multiply_cannon                   1966 13.4    0.162    0.182   15.776   16.407
 multiply_cannon_loop              1966 14.4    0.135    0.146   14.592   15.143
 mp_waitall_1                    146670 16.2   14.323   14.714   14.323   14.714
 rs_pw_transfer                     878 11.9    0.017    0.021   11.394   13.549
 density_rs2pw                      107  9.7    0.006    0.007    9.721   11.898
 multiply_cannon_metrocomm3       15728 15.4    0.050    0.064    9.420   10.337
 ot_mini                             96 10.5    0.001    0.002    9.707    9.899
 mp_waitany                        8968 13.7    6.709    8.852    6.709    8.852
 mp_alltoall_d11v                  1998 13.7    6.576    8.318    6.576    8.318
 rs_pw_transfer_RS2PW_140           118 11.5    0.294    0.343    6.081    8.236
 rs_gather_matrices                 107 12.3    0.105    0.118    6.234    7.915
 init_scf_loop                       11  6.9    0.000    0.001    7.128    7.129
 init_scf_run                        11  5.9    0.000    0.006    5.939    5.939
 scf_env_initial_rho_setup           11  6.9    0.000    0.004    5.939    5.939
 wfi_extrapolate                     11  7.9    0.001    0.001    5.403    5.404
 qs_ks_update_qs_env_forces          11  4.9    0.000    0.000    5.350    5.366
 qs_ot_get_derivative                96 11.5    0.001    0.001    5.138    5.309
 potential_pw2rs                    107 12.3    0.007    0.008    5.083    5.106
 pw_transfer                       1295 11.6    0.094    0.100    4.722    4.765
 fft_wrap_pw1pw2                   1081 12.6    0.012    0.017    4.545    4.580
 ot_diis_step                        96 11.5    0.005    0.005    4.481    4.484
 apply_preconditioner_dbcsr         107 12.6    0.000    0.000    4.377    4.484
 apply_single                       107 13.6    0.001    0.001    4.377    4.484
 multiply_cannon_multrec          15728 15.4    3.708    3.969    3.725    3.988
 make_m2s                          3932 13.4    0.053    0.060    3.740    3.881
 fft_wrap_pw1pw2_140                439 13.2    0.340    0.367    3.567    3.618
 fft3d_ps                          1081 14.6    1.438    1.540    3.491    3.564
 make_images                       3932 14.4    0.132    0.145    3.354    3.460
 mp_sum_l                          9666 13.1    1.926    2.497    1.926    2.497
 qs_ot_get_p                        107 10.4    0.001    0.001    2.136    2.353
 -------------------------------------------------------------------------------
Plot: name="H2O-64_nonortho_timings_32omp", title="Timings of H2O-64_nonortho with 32 OpenMP Threads", ylabel="time [s]"
PlotPoint: plot="H2O-64_nonortho_timings_32omp", name="rest", label="rest", y=69.369, yerr=0.0
PlotPoint: plot="H2O-64_nonortho_timings_32omp", name="grid_collocate_task_list", label="grid_collocate_task_list", y=36.502, yerr=0.0
PlotPoint: plot="H2O-64_nonortho_timings_32omp", name="grid_integrate_task_list", label="grid_integrate_task_list", y=29.693, yerr=0.0
PlotPoint: plot="H2O-64_nonortho_timings_32omp", name="cp_fm_cholesky_decompose", label="cp_fm_cholesky_decompose", y=8.795, yerr=0.0
PlotPoint: plot="H2O-64_nonortho_timings_32omp", name="cp_fm_cholesky_invert", label="cp_fm_cholesky_invert", y=7.42, yerr=0.0
PlotPoint: plot="H2O-64_nonortho_timings_32omp", name="make_dense_data", label="make_dense_data", y=6.069, yerr=0.0
PlotPoint: plot="H2O-64_nonortho_timings_32omp", name="mp_alltoall_d11v", label="mp_alltoall_d11v", y=0.0, yerr=0.0
PlotPoint: plot="H2O-64_nonortho_timings_32omp", name="mp_waitall_1", label="mp_waitall_1", y=0.0, yerr=0.0
PlotPoint: plot="H2O-64_nonortho_timings_32omp", name="mp_waitany", label="mp_waitany", y=0.0, yerr=0.0
Plot: name="H2O-64_nonortho_timings_32mpi", title="Timings of H2O-64_nonortho with 32 MPI Ranks", ylabel="time [s]"
PlotPoint: plot="H2O-64_nonortho_timings_32mpi", name="rest", label="rest", y=25.89999999999999, yerr=0.0
PlotPoint: plot="H2O-64_nonortho_timings_32mpi", name="grid_collocate_task_list", label="grid_collocate_task_list", y=27.245, yerr=0.0
PlotPoint: plot="H2O-64_nonortho_timings_32mpi", name="grid_integrate_task_list", label="grid_integrate_task_list", y=27.348, yerr=0.0
PlotPoint: plot="H2O-64_nonortho_timings_32mpi", name="cp_fm_cholesky_decompose", label="cp_fm_cholesky_decompose", y=0.0, yerr=0.0
PlotPoint: plot="H2O-64_nonortho_timings_32mpi", name="cp_fm_cholesky_invert", label="cp_fm_cholesky_invert", y=0.0, yerr=0.0
PlotPoint: plot="H2O-64_nonortho_timings_32mpi", name="make_dense_data", label="make_dense_data", y=0.0, yerr=0.0
PlotPoint: plot="H2O-64_nonortho_timings_32mpi", name="mp_alltoall_d11v", label="mp_alltoall_d11v", y=6.576, yerr=0.0
PlotPoint: plot="H2O-64_nonortho_timings_32mpi", name="mp_waitall_1", label="mp_waitall_1", y=14.323, yerr=0.0
PlotPoint: plot="H2O-64_nonortho_timings_32mpi", name="mp_waitany", label="mp_waitany", y=6.709, yerr=0.0
Running H2O-hyb.inp with 1 threads and 32 ranks... done.
Running H2O-hyb.inp with 32 threads and 1 ranks... done.
From /workspace/artifacts/H2O-hyb_32omp.out:
 -------------------------------------------------------------------------------
 -                                                                             -
 -                                T I M I N G                                  -
 -                                                                             -
 -------------------------------------------------------------------------------
 SUBROUTINE                       CALLS  ASD         SELF TIME        TOTAL TIME
                                MAXIMUM       AVERAGE  MAXIMUM  AVERAGE  MAXIMUM
 CP2K                                 1  1.0    0.215    0.215  135.234  135.234
 qs_energies                          1  2.0    0.000    0.000  134.310  134.310
 scf_env_do_scf                       1  3.0    0.000    0.000  132.739  132.739
 qs_ks_update_qs_env                  8  5.0    0.000    0.000  123.809  123.809
 rebuild_ks_matrix                    7  6.0    0.000    0.000  123.741  123.741
 qs_ks_build_kohn_sham_matrix         7  7.0    0.002    0.002  123.741  123.741
 hfx_ks_matrix                        7  8.0    0.000    0.000  110.374  110.374
 integrate_four_center                7  9.0    1.623    1.623  110.340  110.340
 integrate_four_center_main           7 10.0    1.068    1.068   97.677   97.677
 integrate_four_center_bin          457 11.0   96.609   96.609   96.609   96.609
 scf_env_do_scf_inner_loop            7  4.0    0.001    0.001   71.370   71.370
 init_scf_loop                        1  4.0    0.000    0.000   61.356   61.356
 integrate_four_center_load           7 10.0    0.001    0.001   10.689   10.689
 hfx_load_balance                     1 11.0    0.002    0.002   10.688   10.688
 hfx_load_balance_count               1 12.0    6.095    6.095    6.095    6.095
 qs_vxc_create                       14  8.0    0.000    0.000    5.063    5.063
 xc_vxc_pw_create                    14  9.0    0.134    0.134    5.062    5.062
 hfx_load_balance_bin                 1 12.0    4.543    4.543    4.543    4.543
 prepare_preconditioner               1  5.0    0.000    0.000    4.402    4.402
 make_preconditioner                  1  6.0    0.000    0.000    4.402    4.402
 calculate_rho_elec                  15  7.4    0.125    0.125    3.809    3.809
 xc_rho_set_and_dset_create          14 10.0    0.125    0.125    3.693    3.693
 admm_mo_calc_rho_aux                 7  8.0    0.000    0.000    3.346    3.346
 pw_transfer                        251 10.6    0.020    0.020    3.221    3.221
 fft_wrap_pw1pw2                    237 11.7    0.003    0.003    3.183    3.183
 fft_wrap_pw1pw2_140                150 13.1    0.199    0.199    3.022    3.022
 grid_collocate_task_list            15  8.4    2.901    2.901    2.901    2.901
 -------------------------------------------------------------------------------

From /workspace/artifacts/H2O-hyb_32mpi.out:
 -------------------------------------------------------------------------------
 -                                                                             -
 -                                T I M I N G                                  -
 -                                                                             -
 -------------------------------------------------------------------------------
 SUBROUTINE                       CALLS  ASD         SELF TIME        TOTAL TIME
                                MAXIMUM       AVERAGE  MAXIMUM  AVERAGE  MAXIMUM
 CP2K                                 1  1.0    0.310    0.343  156.589  156.600
 qs_energies                          1  2.0    0.000    0.000  156.083  156.092
 scf_env_do_scf                       1  3.0    0.000    0.000  155.550  155.550
 qs_ks_update_qs_env                  8  5.0    0.000    0.000  152.272  152.272
 rebuild_ks_matrix                    7  6.0    0.000    0.000  152.262  152.262
 qs_ks_build_kohn_sham_matrix         7  7.0    0.002    0.004  152.262  152.262
 hfx_ks_matrix                        7  8.0    0.000    0.001  144.153  144.156
 integrate_four_center                7  9.0    0.060    0.384  144.141  144.144
 integrate_four_center_main           7 10.0    0.005    0.006   95.081  130.497
 integrate_four_center_bin          448 11.0   95.076  130.492   95.076  130.492
 scf_env_do_scf_inner_loop            7  4.0    0.001    0.002   88.890   88.890
 init_scf_loop                        1  4.0    0.000    0.000   66.659   66.660
 mp_sync                             70 11.3   35.450   40.669   35.450   40.669
 integrate_four_center_load           7 10.0    0.000    0.000   12.772   12.783
 hfx_load_balance                     1 11.0    0.001    0.001   12.772   12.783
 mp_sum_l                          1135  8.3    6.455    6.769    6.455    6.769
 hfx_load_balance_dist                1 12.0    0.000    0.000    6.232    6.545
 hfx_load_balance_count               1 12.0    3.220    6.358    3.220    6.358
 hfx_load_balance_bin                 1 12.0    3.231    6.325    3.231    6.325
 qs_vxc_create                       14  8.0    0.000    0.001    3.604    3.605
 xc_vxc_pw_create                    14  9.0    0.009    0.010    3.604    3.604
 -------------------------------------------------------------------------------

Plot: name="H2O-hyb_timings_32omp", title="Timings of H2O-hyb with 32 OpenMP Threads", ylabel="time [s]"
PlotPoint: plot="H2O-hyb_timings_32omp", name="rest", label="rest", y=23.463000000000008, yerr=0.0
PlotPoint: plot="H2O-hyb_timings_32omp", name="integrate_four_center_bin", label="integrate_four_center_bin", y=96.609, yerr=0.0
PlotPoint: plot="H2O-hyb_timings_32omp", name="hfx_load_balance_count", label="hfx_load_balance_count", y=6.095, yerr=0.0
PlotPoint: plot="H2O-hyb_timings_32omp", name="hfx_load_balance_bin", label="hfx_load_balance_bin", y=4.543, yerr=0.0
PlotPoint: plot="H2O-hyb_timings_32omp", name="grid_collocate_task_list", label="grid_collocate_task_list", y=2.901, yerr=0.0
PlotPoint: plot="H2O-hyb_timings_32omp", name="integrate_four_center", label="integrate_four_center", y=1.623, yerr=0.0
PlotPoint: plot="H2O-hyb_timings_32omp", name="mp_sync", label="mp_sync", y=0.0, yerr=0.0
PlotPoint: plot="H2O-hyb_timings_32omp", name="mp_sum_l", label="mp_sum_l", y=0.0, yerr=0.0

Plot: name="H2O-hyb_timings_32mpi", title="Timings of H2O-hyb with 32 MPI Ranks", ylabel="time [s]"
PlotPoint: plot="H2O-hyb_timings_32mpi", name="rest", label="rest", y=13.097000000000008, yerr=0.0
PlotPoint: plot="H2O-hyb_timings_32mpi", name="integrate_four_center_bin", label="integrate_four_center_bin", y=95.076, yerr=0.0
PlotPoint: plot="H2O-hyb_timings_32mpi", name="hfx_load_balance_count", label="hfx_load_balance_count", y=3.22, yerr=0.0
PlotPoint: plot="H2O-hyb_timings_32mpi", name="hfx_load_balance_bin", label="hfx_load_balance_bin", y=3.231, yerr=0.0
PlotPoint: plot="H2O-hyb_timings_32mpi", name="grid_collocate_task_list", label="grid_collocate_task_list", y=0.0, yerr=0.0
PlotPoint: plot="H2O-hyb_timings_32mpi", name="integrate_four_center", label="integrate_four_center", y=0.06, yerr=0.0
PlotPoint: plot="H2O-hyb_timings_32mpi", name="mp_sync", label="mp_sync", y=35.45, yerr=0.0
PlotPoint: plot="H2O-hyb_timings_32mpi", name="mp_sum_l", label="mp_sum_l", y=6.455, yerr=0.0

Running GW_PBE_4benzene.inp with 1 threads and 32 ranks... done.
Running GW_PBE_4benzene.inp with 32 threads and 1 ranks... done.
From /workspace/artifacts/GW_PBE_4benzene_32omp.out: ------------------------------------------------------------------------------- - - - T I M I N G - - - ------------------------------------------------------------------------------- SUBROUTINE CALLS ASD SELF TIME TOTAL TIME MAXIMUM AVERAGE MAXIMUM AVERAGE MAXIMUM CP2K 1 1.0 0.018 0.018 127.805 127.805 qs_energies 1 2.0 0.000 0.000 127.339 127.339 mp2_main 1 3.0 0.000 0.000 121.891 121.891 mp2_gpw_main 1 4.0 0.000 0.000 121.700 121.700 rpa_ri_compute_en 1 5.0 0.000 0.000 117.462 117.462 rpa_num_int 1 6.0 0.001 0.001 117.456 117.456 compute_mat_P_omega 1 7.0 0.003 0.003 99.461 99.461 compute_mat_P_omega_contract 10 8.0 10.154 10.154 99.151 99.151 dbt_total 2336 9.6 0.019 0.019 87.166 87.166 dbt_contract 787 11.0 0.056 0.056 77.178 77.178 dbt_tas_total 1149 12.2 0.404 0.404 74.539 74.539 dbt_tas_multiply 807 12.1 0.003 0.003 73.148 73.148 dbt_tas_dbm 807 14.1 0.006 0.006 64.611 64.611 dbm_multiply 807 16.1 64.592 64.592 64.592 64.592 dbt_tas_mm_1N 524 15.1 0.003 0.003 44.064 44.064 compute_mat_P_omega_calc_M_vir 250 9.0 0.001 0.001 37.155 37.155 compute_mat_P_omega_calc_M_occ 250 9.0 10.074 10.074 25.407 25.407 compute_mat_P_omega_calc_P_t 250 9.0 0.001 0.001 18.489 18.489 dbt_tas_mm_2 251 15.0 0.003 0.003 17.170 17.170 compute_QP_energies 1 7.0 0.000 0.000 9.708 9.708 compute_self_energy_cubic_gw 1 8.0 0.069 0.069 9.707 9.707 dbt_copy 1103 10.7 0.147 0.147 8.566 8.566 contract_cubic_gw 21 9.0 0.000 0.000 8.163 8.163 scf_env_do_scf 1 3.0 0.000 0.000 5.287 5.287 scf_env_do_scf_inner_loop 17 4.0 0.003 0.003 5.286 5.286 dbt_tas_reserve_blocks_index 3261 14.3 0.202 0.202 4.525 4.525 dbm_reserve_blocks 3628 15.3 4.453 4.453 4.453 4.453 mp2_ri_gpw_compute_in 1 5.0 0.001 0.001 4.230 4.230 dbt_crop 1042 12.0 2.586 2.586 3.864 3.864 rpa_num_int_RPA_matrix_operati 10 7.0 0.000 0.000 3.647 3.647 dbt_tas_copy 574 11.4 2.305 2.305 3.625 3.625 compute_W_cubic_GW 10 7.0 0.009 0.009 3.434 3.434 dbt_reserve_blocks_index 2280 13.1 
0.084 0.084 3.329 3.329 dbt_reserve_blocks_index_array 2222 12.2 0.016 0.016 3.319 3.319 dbt_tas_mm_3N 22 15.1 0.000 0.000 3.089 3.089 dbt_reshape 278 11.9 1.775 1.775 2.957 2.957 cp_fm_cholesky_decompose 14 8.1 2.841 2.841 2.841 2.841 convert_to_new_pgrid 2421 14.1 0.252 0.252 2.773 2.773 qs_scf_new_mos 17 5.0 0.001 0.001 2.678 2.678 ------------------------------------------------------------------------------- From /workspace/artifacts/GW_PBE_4benzene_32mpi.out: ------------------------------------------------------------------------------- - - - T I M I N G - - - ------------------------------------------------------------------------------- SUBROUTINE CALLS ASD SELF TIME TOTAL TIME MAXIMUM AVERAGE MAXIMUM AVERAGE MAXIMUM CP2K 1 1.0 0.009 0.026 50.015 50.025 qs_energies 1 2.0 0.000 0.001 49.903 49.905 mp2_main 1 3.0 0.001 0.009 48.228 48.230 mp2_gpw_main 1 4.0 0.000 0.000 48.157 48.158 rpa_ri_compute_en 1 5.0 0.000 0.000 46.601 46.603 rpa_num_int 1 6.0 0.000 0.002 46.600 46.602 dbt_total 2336 9.6 0.020 0.027 41.059 41.088 compute_mat_P_omega 1 7.0 0.001 0.007 39.494 39.512 compute_mat_P_omega_contract 10 8.0 0.536 0.590 39.242 39.246 dbt_contract 787 11.0 0.042 0.047 29.832 29.850 dbt_tas_total 1149 12.2 0.089 0.105 26.615 26.620 dbt_tas_multiply 807 12.1 0.003 0.004 26.556 26.572 dbt_tas_dbm 807 14.1 0.006 0.007 19.148 19.165 dbm_multiply 807 16.1 14.437 15.337 14.437 15.337 compute_mat_P_omega_calc_M_occ 250 9.0 0.517 0.560 11.548 11.556 compute_mat_P_omega_calc_P_t 250 9.0 0.001 0.001 11.344 11.345 mp_sync 8706 11.6 9.024 10.306 9.024 10.306 dbt_copy 1111 10.7 0.019 0.024 8.992 9.375 dbt_tas_mm_2 251 15.0 0.003 0.004 9.099 9.106 dbt_reshape 1098 11.7 3.154 3.651 8.591 8.969 compute_mat_P_omega_calc_M_vir 250 9.0 0.001 0.002 7.978 7.987 dbt_tas_mm_1N 524 15.1 0.002 0.003 6.433 7.023 mp_waitall_2 3776 15.3 4.444 4.951 4.444 4.951 dbt_communicate_buffer 1098 12.7 0.071 0.087 4.457 4.713 compute_QP_energies 1 7.0 0.000 0.000 4.445 4.447 
compute_self_energy_cubic_gw 1 8.0 0.003 0.004 4.441 4.445 contract_cubic_gw 21 9.0 0.000 0.000 3.468 3.468 dbt_crop 1042 12.0 1.213 1.546 1.923 2.289 dbt_reserve_blocks_index_array 2791 12.2 0.014 0.016 2.100 2.272 dbt_reserve_blocks_index 2849 13.1 0.087 0.101 2.099 2.268 dbt_tas_reserve_blocks_index 3300 14.5 0.136 0.169 2.057 2.214 dbm_reserve_blocks 3696 15.4 2.046 2.191 2.046 2.191 compute_mat_P_omega_copy_M_vir 250 9.0 0.002 0.002 1.683 1.694 dbt_tas_replicate 396 14.1 0.659 0.844 1.500 1.622 scf_env_do_scf 1 3.0 0.000 0.000 1.603 1.604 scf_env_do_scf_inner_loop 17 4.0 0.001 0.004 1.603 1.604 compute_mat_P_omega_copy_M_occ 250 9.0 0.001 0.002 1.590 1.603 mp2_ri_gpw_compute_in 1 5.0 0.001 0.001 1.549 1.553 cp_gemm 105 8.4 0.000 0.000 1.527 1.544 cp_gemm_cosma 105 9.4 1.527 1.544 1.527 1.544 mp_max_i 1992 9.8 1.151 1.380 1.151 1.380 compute_W_cubic_GW 10 7.0 0.001 0.001 1.253 1.261 convert_to_new_pgrid 2421 14.1 0.035 0.039 1.124 1.253 dbm_copy 1608 15.1 1.080 1.207 1.080 1.207 mp_sum_l 6085 13.0 1.063 1.137 1.063 1.137 rpa_num_int_RPA_matrix_operati 10 7.0 0.000 0.000 1.011 1.012 ------------------------------------------------------------------------------- Plot: name="GW_PBE_4benzene_timings_32omp", title="Timings of GW_PBE_4benzene with 32 OpenMP Threads", ylabel="time [s]" PlotPoint: plot="GW_PBE_4benzene_timings_32omp", name="rest", label="rest", y=33.91600000000001, yerr=0.0 PlotPoint: plot="GW_PBE_4benzene_timings_32omp", name="dbm_multiply", label="dbm_multiply", y=64.592, yerr=0.0 PlotPoint: plot="GW_PBE_4benzene_timings_32omp", name="compute_mat_P_omega_contract", label="compute_mat_P_omega_contract", y=10.154, yerr=0.0 PlotPoint: plot="GW_PBE_4benzene_timings_32omp", name="compute_mat_P_omega_calc_M_occ", label="compute_mat_P_omega_calc_M_occ", y=10.074, yerr=0.0 PlotPoint: plot="GW_PBE_4benzene_timings_32omp", name="dbm_reserve_blocks", label="dbm_reserve_blocks", y=4.453, yerr=0.0 PlotPoint: plot="GW_PBE_4benzene_timings_32omp", 
name="cp_fm_cholesky_decompose", label="cp_fm_cholesky_decompose", y=2.841, yerr=0.0 PlotPoint: plot="GW_PBE_4benzene_timings_32omp", name="dbt_reshape", label="dbt_reshape", y=1.775, yerr=0.0 PlotPoint: plot="GW_PBE_4benzene_timings_32omp", name="mp_waitall_2", label="mp_waitall_2", y=0.0, yerr=0.0 PlotPoint: plot="GW_PBE_4benzene_timings_32omp", name="mp_sync", label="mp_sync", y=0.0, yerr=0.0 Plot: name="GW_PBE_4benzene_timings_32mpi", title="Timings of GW_PBE_4benzene with 32 MPI Ranks", ylabel="time [s]" PlotPoint: plot="GW_PBE_4benzene_timings_32mpi", name="rest", label="rest", y=15.857000000000006, yerr=0.0 PlotPoint: plot="GW_PBE_4benzene_timings_32mpi", name="dbm_multiply", label="dbm_multiply", y=14.437, yerr=0.0 PlotPoint: plot="GW_PBE_4benzene_timings_32mpi", name="compute_mat_P_omega_contract", label="compute_mat_P_omega_contract", y=0.536, yerr=0.0 PlotPoint: plot="GW_PBE_4benzene_timings_32mpi", name="compute_mat_P_omega_calc_M_occ", label="compute_mat_P_omega_calc_M_occ", y=0.517, yerr=0.0 PlotPoint: plot="GW_PBE_4benzene_timings_32mpi", name="dbm_reserve_blocks", label="dbm_reserve_blocks", y=2.046, yerr=0.0 PlotPoint: plot="GW_PBE_4benzene_timings_32mpi", name="cp_fm_cholesky_decompose", label="cp_fm_cholesky_decompose", y=0.0, yerr=0.0 PlotPoint: plot="GW_PBE_4benzene_timings_32mpi", name="dbt_reshape", label="dbt_reshape", y=3.154, yerr=0.0 PlotPoint: plot="GW_PBE_4benzene_timings_32mpi", name="mp_waitall_2", label="mp_waitall_2", y=4.444, yerr=0.0 PlotPoint: plot="GW_PBE_4benzene_timings_32mpi", name="mp_sync", label="mp_sync", y=9.024, yerr=0.0 Running RI-HFX_H2O-32.inp with 1 threads and 32 ranks... done. Running RI-HFX_H2O-32.inp with 32 threads and 1 ranks... done. 
From /workspace/artifacts/RI-HFX_H2O-32_32omp.out: ------------------------------------------------------------------------------- - - - T I M I N G - - - ------------------------------------------------------------------------------- SUBROUTINE CALLS ASD SELF TIME TOTAL TIME MAXIMUM AVERAGE MAXIMUM AVERAGE MAXIMUM CP2K 1 1.0 0.021 0.021 350.622 350.622 qs_forces 1 2.0 0.000 0.000 349.972 349.972 rebuild_ks_matrix 7 6.6 0.000 0.000 347.612 347.612 qs_ks_build_kohn_sham_matrix 7 7.6 0.002 0.002 347.612 347.612 hfx_ks_matrix 7 8.6 0.000 0.000 344.728 344.728 dbt_total 4861 11.6 0.050 0.050 287.400 287.400 hfx_ri_update_ks 7 9.6 0.000 0.000 275.401 275.401 hfx_ri_update_ks_Pmat 7 10.6 38.481 38.481 275.397 275.397 dbt_tas_total 2391 14.1 1.720 1.720 246.049 246.049 qs_energies 1 3.0 0.000 0.000 244.955 244.955 scf_env_do_scf 1 4.0 0.000 0.000 244.530 244.530 qs_ks_update_qs_env 8 6.0 0.000 0.000 242.674 242.674 dbt_contract 1473 13.0 0.238 0.238 228.883 228.883 dbt_tas_multiply 1482 14.0 0.007 0.007 214.461 214.461 dbt_tas_dbm 1482 16.0 0.011 0.011 185.882 185.882 dbm_multiply 1482 18.0 185.849 185.849 185.849 185.849 hfx_ri_update_ks_Pmat_KS 567 11.6 0.007 0.007 182.082 182.082 dbt_tas_mm_2 649 17.1 0.008 0.008 150.529 150.529 scf_env_do_scf_inner_loop 6 5.0 0.001 0.001 149.402 149.402 qs_ks_update_qs_env_forces 1 3.0 0.000 0.000 104.942 104.942 init_scf_loop 2 5.0 0.000 0.000 95.123 95.123 hfx_ri_update_forces 1 7.0 0.000 0.000 69.322 69.322 hfx_ri_forces_Pmat_3c 1 8.0 0.003 0.003 48.582 48.582 dbt_copy 2331 12.4 0.299 0.299 26.847 26.847 dbt_tas_mm_3T 659 17.1 0.003 0.003 24.452 24.452 dbt_tas_reshape 906 14.4 0.022 0.022 24.352 24.352 hfx_ri_update_ks_Pmat_Px3C 567 11.6 0.003 0.003 23.042 23.042 hfx_ri_pre_scf_Pmat 1 12.0 0.001 0.001 17.901 17.901 precalc_derivatives 1 8.0 0.009 0.009 16.478 16.478 dbm_reserve_blocks 8303 16.8 15.455 15.455 15.455 15.455 dbt_tas_merge 649 14.1 13.255 13.255 14.900 14.900 dbt_tas_reserve_blocks_index 7397 16.0 0.446 0.446 14.611 
14.611 dbt_reshape 856 13.9 8.543 8.543 14.355 14.355 dbt_crop 2763 14.2 9.719 9.719 14.081 14.081 dbt_tas_reshape_buffer_fill 906 15.4 12.579 12.579 12.579 12.579 dbt_reserve_blocks_index 4998 15.2 0.192 0.192 10.926 10.926 dbt_reserve_blocks_index_array 4963 14.3 0.038 0.038 10.853 10.853 dbt_tas_mm_3N 163 16.5 0.001 0.001 9.767 9.767 build_3c_derivatives 9 9.0 2.392 2.392 8.889 8.889 dbt_tas_reshape_buffer_obtain 906 15.4 7.590 7.590 8.750 8.750 reshape_mm_small 906 15.6 0.163 0.163 8.077 8.077 dbt_tas_replicate 906 15.6 5.944 5.944 8.056 8.056 dbt_tas_copy 1475 13.1 4.738 4.738 8.011 8.011 hfx_ri_pre_scf_Pmat_RIx3C 81 13.0 0.001 0.001 7.460 7.460 ------------------------------------------------------------------------------- From /workspace/artifacts/RI-HFX_H2O-32_32mpi.out: ------------------------------------------------------------------------------- - - - T I M I N G - - - ------------------------------------------------------------------------------- SUBROUTINE CALLS ASD SELF TIME TOTAL TIME MAXIMUM AVERAGE MAXIMUM AVERAGE MAXIMUM CP2K 1 1.0 0.009 0.029 85.233 85.244 qs_forces 1 2.0 0.000 0.000 84.950 84.951 rebuild_ks_matrix 7 6.6 0.000 0.000 83.756 83.759 qs_ks_build_kohn_sham_matrix 7 7.6 0.002 0.002 83.756 83.759 hfx_ks_matrix 7 8.6 0.000 0.002 82.178 82.195 dbt_total 4861 11.6 0.043 0.049 74.692 74.712 dbt_contract 1473 13.0 0.139 0.155 56.988 57.005 hfx_ri_update_ks 7 9.6 0.000 0.000 54.375 54.377 hfx_ri_update_ks_Pmat 7 10.6 1.825 2.365 54.372 54.372 dbt_tas_total 2391 14.1 0.192 0.214 53.927 53.934 qs_energies 1 3.0 0.000 0.000 50.102 50.102 scf_env_do_scf 1 4.0 0.000 0.001 49.885 49.886 qs_ks_update_qs_env 8 6.0 0.000 0.000 48.925 48.929 dbt_tas_multiply 1482 14.0 0.007 0.008 48.574 48.588 dbt_tas_dbm 1482 16.0 0.010 0.012 36.453 36.486 qs_ks_update_qs_env_forces 1 3.0 0.000 0.000 34.833 34.834 dbm_multiply 1482 18.0 24.633 30.072 24.633 30.072 hfx_ri_update_ks_Pmat_KS 567 11.6 0.006 0.008 29.008 29.010 scf_env_do_scf_inner_loop 6 5.0 0.000 0.002 
28.879 28.880 hfx_ri_update_forces 1 7.0 0.000 0.001 27.802 27.817 mp_sync 17513 13.6 19.375 23.021 19.375 23.021 init_scf_loop 2 5.0 0.000 0.000 21.005 21.006 dbt_tas_mm_2 649 17.1 0.007 0.009 20.808 20.839 hfx_ri_forces_Pmat_3c 1 8.0 0.003 0.004 20.001 20.023 hfx_ri_update_ks_Pmat_Px3C 567 11.6 0.003 0.003 10.867 10.874 dbt_copy 2349 12.4 0.053 0.061 9.444 10.015 dbt_tas_mm_3N 163 16.5 0.001 0.001 7.060 7.213 dbt_reshape 1256 13.5 2.898 3.161 6.611 6.937 hfx_ri_pre_scf_Pmat 1 12.0 0.001 0.001 5.996 5.997 precalc_derivatives 1 8.0 0.002 0.003 5.911 5.911 dbt_tas_mm_3T 659 17.1 0.003 0.003 5.187 5.804 dbt_crop 2763 14.2 3.668 4.271 4.641 5.384 mp_waitall_2 5988 16.5 4.526 5.035 4.526 5.035 dbt_tas_merge 649 14.1 1.893 2.337 3.522 3.971 hfx_ri_pre_scf_Pmat_RIx3C 81 13.0 0.000 0.001 3.661 3.670 dbm_reserve_blocks 8337 16.9 2.983 3.391 2.983 3.391 dbt_tas_reserve_blocks_index 7428 16.1 0.303 0.357 2.869 3.312 dbt_tas_communicate_buffer 1825 16.3 0.076 0.081 2.932 3.292 mp_max_i 3372 12.5 2.560 3.089 2.560 3.089 dbt_tas_replicate 909 15.6 0.761 0.868 2.834 2.896 dbt_communicate_buffer 1256 14.5 0.057 0.067 2.586 2.788 dbt_reserve_blocks_index_array 5363 14.2 0.022 0.026 2.394 2.750 dbt_reserve_blocks_index 5398 15.2 0.160 0.197 2.392 2.746 build_3c_derivatives 9 9.0 0.267 0.366 2.653 2.660 mp_sum_l 38201 15.3 2.332 2.554 2.332 2.554 dbt_tas_reshape 916 14.4 0.013 0.016 2.312 2.400 mp_alltoall_i 4339 15.3 2.114 2.333 2.114 2.333 hfx_ri_update_ks_Pmat_copy_2 567 11.6 0.002 0.003 2.231 2.236 convert_to_new_pgrid 4446 16.0 0.061 0.066 1.788 1.915 dbm_copy 3043 16.9 1.727 1.860 1.727 1.860 ------------------------------------------------------------------------------- Plot: name="RI-HFX_H2O-32_timings_32omp", title="Timings of RI-HFX_H2O-32 with 32 OpenMP Threads", ylabel="time [s]" PlotPoint: plot="RI-HFX_H2O-32_timings_32omp", name="rest", label="rest", y=75.28400000000005, yerr=0.0 PlotPoint: plot="RI-HFX_H2O-32_timings_32omp", name="dbm_multiply", label="dbm_multiply", 
y=185.849, yerr=0.0 PlotPoint: plot="RI-HFX_H2O-32_timings_32omp", name="hfx_ri_update_ks_Pmat", label="hfx_ri_update_ks_Pmat", y=38.481, yerr=0.0 PlotPoint: plot="RI-HFX_H2O-32_timings_32omp", name="dbm_reserve_blocks", label="dbm_reserve_blocks", y=15.455, yerr=0.0 PlotPoint: plot="RI-HFX_H2O-32_timings_32omp", name="dbt_tas_merge", label="dbt_tas_merge", y=13.255, yerr=0.0 PlotPoint: plot="RI-HFX_H2O-32_timings_32omp", name="dbt_tas_reshape_buffer_fill", label="dbt_tas_reshape_buffer_fill", y=12.579, yerr=0.0 PlotPoint: plot="RI-HFX_H2O-32_timings_32omp", name="dbt_crop", label="dbt_crop", y=9.719, yerr=0.0 PlotPoint: plot="RI-HFX_H2O-32_timings_32omp", name="mp_waitall_2", label="mp_waitall_2", y=0.0, yerr=0.0 PlotPoint: plot="RI-HFX_H2O-32_timings_32omp", name="mp_sync", label="mp_sync", y=0.0, yerr=0.0 Plot: name="RI-HFX_H2O-32_timings_32mpi", title="Timings of RI-HFX_H2O-32 with 32 MPI Ranks", ylabel="time [s]" PlotPoint: plot="RI-HFX_H2O-32_timings_32mpi", name="rest", label="rest", y=26.33, yerr=0.0 PlotPoint: plot="RI-HFX_H2O-32_timings_32mpi", name="dbm_multiply", label="dbm_multiply", y=24.633, yerr=0.0 PlotPoint: plot="RI-HFX_H2O-32_timings_32mpi", name="hfx_ri_update_ks_Pmat", label="hfx_ri_update_ks_Pmat", y=1.825, yerr=0.0 PlotPoint: plot="RI-HFX_H2O-32_timings_32mpi", name="dbm_reserve_blocks", label="dbm_reserve_blocks", y=2.983, yerr=0.0 PlotPoint: plot="RI-HFX_H2O-32_timings_32mpi", name="dbt_tas_merge", label="dbt_tas_merge", y=1.893, yerr=0.0 PlotPoint: plot="RI-HFX_H2O-32_timings_32mpi", name="dbt_tas_reshape_buffer_fill", label="dbt_tas_reshape_buffer_fill", y=0.0, yerr=0.0 PlotPoint: plot="RI-HFX_H2O-32_timings_32mpi", name="dbt_crop", label="dbt_crop", y=3.668, yerr=0.0 PlotPoint: plot="RI-HFX_H2O-32_timings_32mpi", name="mp_waitall_2", label="mp_waitall_2", y=4.526, yerr=0.0 PlotPoint: plot="RI-HFX_H2O-32_timings_32mpi", name="mp_sync", label="mp_sync", y=19.375, yerr=0.0 Running RI-MP2_ammonia.inp with 1 threads and 32 ranks... done. 
Running RI-MP2_ammonia.inp with 32 threads and 1 ranks... done. From /workspace/artifacts/RI-MP2_ammonia_32omp.out: ------------------------------------------------------------------------------- - - - T I M I N G - - - ------------------------------------------------------------------------------- SUBROUTINE CALLS ASD SELF TIME TOTAL TIME MAXIMUM AVERAGE MAXIMUM AVERAGE MAXIMUM CP2K 1 1.0 0.019 0.019 261.696 261.696 qs_energies 1 2.0 0.000 0.000 261.472 261.472 mp2_main 1 3.0 0.000 0.000 254.091 254.091 mp2_gpw_main 1 4.0 0.001 0.001 253.500 253.500 mp2_ri_gpw_compute_in 1 5.0 0.399 0.399 185.749 185.749 mp2_ri_gpw_compute_in_loop 1 6.0 0.025 0.025 158.586 158.586 mp2_eri_3c_integrate_gpw 2656 7.0 0.023 0.023 114.603 114.603 integrate_v_rspace 2666 8.0 1.094 1.094 97.315 97.315 grid_integrate_task_list 2666 9.0 93.094 93.094 93.094 93.094 mp2_ri_gpw_compute_en 1 5.0 0.090 0.090 67.722 67.722 mp2_ri_gpw_compute_en_RI_loop 1 6.0 11.760 11.760 65.743 65.743 mp2_ri_gpw_compute_en_expansio 2080 7.0 3.756 3.756 43.243 43.243 offload_gemm 2080 8.0 39.486 39.486 39.486 39.486 dbcsr_multiply_generic 5322 8.0 0.288 0.288 33.928 33.928 ao_to_mo_and_store_B_mult_1 2656 7.0 0.020 0.020 33.888 33.888 calculate_wavefunction 5312 9.0 18.351 18.351 30.963 30.963 get_2c_integrals 1 6.0 0.000 0.000 26.762 26.762 compute_2c_integrals 1 7.0 0.007 0.007 25.325 25.325 compute_2c_integrals_loop_lm 1 8.0 0.007 0.007 25.298 25.298 mp2_eri_2c_integrate_gpw 1 9.0 4.178 4.178 25.291 25.291 pw_transfer 63872 10.6 1.191 1.191 18.370 18.370 fft_wrap_pw1pw2 53228 11.4 0.149 0.149 16.652 16.652 multiply_cannon 5322 9.0 0.783 0.783 16.195 16.195 make_m2s 10644 9.0 0.102 0.102 14.061 14.061 multiply_cannon_loop 5322 10.0 0.659 0.659 14.038 14.038 make_images 10644 10.0 5.051 5.051 13.475 13.475 fft_wrap_pw1pw2_20 21271 12.4 1.035 1.035 11.373 11.373 multiply_cannon_multrec 5322 11.0 11.057 11.057 11.124 11.124 fft3d_s 53229 13.4 9.992 9.992 10.071 10.071 ao_to_mo_and_store_B_E_Ex_1 2656 7.0 3.446 
3.446 9.969 9.969 mp2_ri_gpw_compute_en_ener 2080 7.0 7.758 7.758 7.758 7.758 copy_dbcsr_to_fm 2679 8.0 0.047 0.047 7.143 7.143 scf_env_do_scf 1 3.0 0.000 0.000 6.851 6.851 scf_env_do_scf_inner_loop 10 4.0 0.002 0.002 6.851 6.851 potential_pw2rs 5322 10.0 0.201 0.201 6.266 6.266 make_images_data 10644 11.0 0.103 0.103 5.823 5.823 hybrid_alltoall_any 13323 11.6 5.411 5.411 5.746 5.746 mp2_eri_2c_integrate_gpw_pot_l 2656 10.0 0.009 0.009 5.398 5.398 ------------------------------------------------------------------------------- From /workspace/artifacts/RI-MP2_ammonia_32mpi.out: ------------------------------------------------------------------------------- - - - T I M I N G - - - ------------------------------------------------------------------------------- SUBROUTINE CALLS ASD SELF TIME TOTAL TIME MAXIMUM AVERAGE MAXIMUM AVERAGE MAXIMUM CP2K 1 1.0 0.009 0.036 53.719 53.730 qs_energies 1 2.0 0.000 0.000 53.554 53.555 mp2_main 1 3.0 0.000 0.001 50.191 50.192 mp2_gpw_main 1 4.0 0.002 0.003 49.986 49.987 mp2_ri_gpw_compute_en 1 5.0 0.234 0.256 28.011 29.110 mp2_ri_gpw_compute_in 1 5.0 0.063 0.067 21.856 27.505 mp2_ri_gpw_compute_in_loop 1 6.0 0.001 0.002 19.550 25.204 mp2_eri_3c_integrate_gpw 83 7.0 0.001 0.002 16.507 22.499 integrate_v_rspace 93 8.1 0.146 0.159 16.440 22.326 grid_integrate_task_list 93 9.1 15.928 21.892 15.928 21.892 mp2_ri_gpw_compute_en_RI_loop 1 6.0 1.394 1.529 20.814 20.852 mp2_ri_gpw_compute_en_expansio 65 7.0 0.119 0.156 13.891 15.087 offload_gemm 65 8.0 13.772 14.972 13.772 14.972 mp_min_d 2 7.0 5.724 6.899 5.724 6.899 mp2_ri_get_integ_group_size 1 6.0 0.000 0.000 5.655 6.754 mp2_ri_gpw_compute_en_comm 20 7.0 0.200 0.253 5.071 6.645 mp_sendrecv_dm3 600 8.0 4.091 5.803 4.091 5.803 scf_env_do_scf 1 3.0 0.000 0.000 3.169 3.170 scf_env_do_scf_inner_loop 10 4.0 0.000 0.002 3.169 3.170 dbcsr_multiply_generic 176 8.0 0.013 0.017 2.642 2.995 ao_to_mo_and_store_B_mult_1 83 7.0 0.001 0.001 2.614 2.967 get_2c_integrals 1 6.0 0.000 0.000 2.201 2.257 
qs_scf_new_mos 10 5.0 0.000 0.000 1.789 1.831 compute_2c_integrals 1 7.0 0.003 0.005 1.745 1.766 eigensolver 11 5.8 0.001 0.002 1.738 1.739 compute_2c_integrals_loop_lm 1 8.0 0.002 0.004 1.252 1.641 mp2_eri_2c_integrate_gpw 1 9.0 0.245 0.413 1.251 1.640 multiply_cannon 176 9.0 0.023 0.027 1.431 1.601 multiply_cannon_loop 176 10.0 0.003 0.004 1.353 1.516 cp_fm_diag_elpa 11 6.8 0.000 0.000 1.430 1.438 cp_fm_redistribute_end 11 7.8 0.541 1.415 0.557 1.422 cp_fm_diag_elpa_base 11 7.8 0.834 1.346 0.853 1.371 make_m2s 352 9.0 0.005 0.005 1.149 1.361 make_images 352 10.0 0.061 0.070 1.133 1.344 calculate_wavefunction 166 9.0 0.576 0.811 1.037 1.304 multiply_cannon_multrec 246 11.0 1.099 1.206 1.107 1.216 ------------------------------------------------------------------------------- Plot: name="RI-MP2_ammonia_timings_32omp", title="Timings of RI-MP2_ammonia with 32 OpenMP Threads", ylabel="time [s]" PlotPoint: plot="RI-MP2_ammonia_timings_32omp", name="rest", label="rest", y=87.94800000000006, yerr=0.0 PlotPoint: plot="RI-MP2_ammonia_timings_32omp", name="grid_integrate_task_list", label="grid_integrate_task_list", y=93.094, yerr=0.0 PlotPoint: plot="RI-MP2_ammonia_timings_32omp", name="offload_gemm", label="offload_gemm", y=39.486, yerr=0.0 PlotPoint: plot="RI-MP2_ammonia_timings_32omp", name="calculate_wavefunction", label="calculate_wavefunction", y=18.351, yerr=0.0 PlotPoint: plot="RI-MP2_ammonia_timings_32omp", name="mp2_ri_gpw_compute_en_RI_loop", label="mp2_ri_gpw_compute_en_RI_loop", y=11.76, yerr=0.0 PlotPoint: plot="RI-MP2_ammonia_timings_32omp", name="multiply_cannon_multrec", label="multiply_cannon_multrec", y=11.057, yerr=0.0 PlotPoint: plot="RI-MP2_ammonia_timings_32omp", name="mp_sendrecv_dm3", label="mp_sendrecv_dm3", y=0.0, yerr=0.0 PlotPoint: plot="RI-MP2_ammonia_timings_32omp", name="mp_min_d", label="mp_min_d", y=0.0, yerr=0.0 Plot: name="RI-MP2_ammonia_timings_32mpi", title="Timings of RI-MP2_ammonia with 32 MPI Ranks", ylabel="time [s]" PlotPoint: 
plot="RI-MP2_ammonia_timings_32mpi", name="rest", label="rest", y=11.134999999999998, yerr=0.0 PlotPoint: plot="RI-MP2_ammonia_timings_32mpi", name="grid_integrate_task_list", label="grid_integrate_task_list", y=15.928, yerr=0.0 PlotPoint: plot="RI-MP2_ammonia_timings_32mpi", name="offload_gemm", label="offload_gemm", y=13.772, yerr=0.0 PlotPoint: plot="RI-MP2_ammonia_timings_32mpi", name="calculate_wavefunction", label="calculate_wavefunction", y=0.576, yerr=0.0 PlotPoint: plot="RI-MP2_ammonia_timings_32mpi", name="mp2_ri_gpw_compute_en_RI_loop", label="mp2_ri_gpw_compute_en_RI_loop", y=1.394, yerr=0.0 PlotPoint: plot="RI-MP2_ammonia_timings_32mpi", name="multiply_cannon_multrec", label="multiply_cannon_multrec", y=1.099, yerr=0.0 PlotPoint: plot="RI-MP2_ammonia_timings_32mpi", name="mp_sendrecv_dm3", label="mp_sendrecv_dm3", y=4.091, yerr=0.0 PlotPoint: plot="RI-MP2_ammonia_timings_32mpi", name="mp_min_d", label="mp_min_d", y=5.724, yerr=0.0 Running diag_cu144_broy.inp with 1 threads and 32 ranks... done. Running diag_cu144_broy.inp with 32 threads and 1 ranks... done. 
From /workspace/artifacts/diag_cu144_broy_32omp.out: ------------------------------------------------------------------------------- - - - T I M I N G - - - ------------------------------------------------------------------------------- SUBROUTINE CALLS ASD SELF TIME TOTAL TIME MAXIMUM AVERAGE MAXIMUM AVERAGE MAXIMUM CP2K 1 1.0 0.119 0.119 175.083 175.083 qs_energies 1 2.0 0.000 0.000 173.577 173.577 scf_env_do_scf 1 3.0 0.000 0.000 164.683 164.683 scf_env_do_scf_inner_loop 15 4.0 0.002 0.002 164.683 164.683 qs_scf_new_mos 15 5.0 0.000 0.000 69.436 69.436 qs_ks_update_qs_env 15 5.0 0.000 0.000 65.562 65.562 rebuild_ks_matrix 15 6.0 0.000 0.000 65.333 65.333 qs_ks_build_kohn_sham_matrix 15 7.0 0.003 0.003 65.333 65.333 eigensolver 15 6.0 0.002 0.002 60.320 60.320 qs_vxc_create 15 8.0 0.054 0.054 46.697 46.697 calculate_dispersion_nonloc 15 9.0 9.488 9.488 40.669 40.669 cp_fm_diag_elpa 15 7.0 0.000 0.000 37.380 37.380 cp_fm_diag_elpa_base 15 8.0 34.547 34.547 37.379 37.379 pw_transfer 1191 10.0 0.101 0.101 32.583 32.583 fft_wrap_pw1pw2 1086 11.0 0.016 0.016 32.289 32.289 qs_rho_update_rho_low 16 5.0 0.000 0.000 26.555 26.555 calculate_rho_elec 16 6.0 0.230 0.230 26.555 26.555 grid_collocate_task_list 16 7.0 24.740 24.740 24.740 24.740 fft_wrap_pw1pw2_150 765 12.0 4.523 4.523 23.767 23.767 cp_fm_cholesky_restore 45 7.0 20.453 20.453 20.453 20.453 sum_up_and_integrate 15 8.0 0.053 0.053 17.039 17.039 integrate_v_rspace 15 9.0 0.024 0.024 16.985 16.985 fft3d_s 1087 13.0 16.245 16.245 16.257 16.257 grid_integrate_task_list 15 10.0 16.076 16.076 16.076 16.076 fft_wrap_pw1pw2_200 197 12.3 0.870 0.870 8.181 8.181 pw_scatter_s 585 13.1 8.111 8.111 8.111 8.111 copy_dbcsr_to_fm 16 5.9 0.001 0.001 6.320 6.320 dbcsr_complete_redistribute 46 8.3 2.613 2.613 6.212 6.212 xc_vxc_pw_create 15 9.0 0.239 0.239 5.975 5.975 cp_fm_upper_to_full 30 8.0 5.317 5.317 5.317 5.317 vdW_energy 15 10.0 5.221 5.221 5.221 5.221 gspace_mixing 14 5.0 0.175 0.175 4.650 4.650 init_scf_run 1 3.0 0.000 
0.000 4.585 4.585 xc_pw_derive 90 11.0 0.001 0.001 3.990 3.990 broyden_mixing 14 6.0 3.913 3.913 3.914 3.914 qs_energies_init_hamiltonians 1 3.0 0.000 0.000 3.677 3.677 ------------------------------------------------------------------------------- From /workspace/artifacts/diag_cu144_broy_32mpi.out: ------------------------------------------------------------------------------- - - - T I M I N G - - - ------------------------------------------------------------------------------- SUBROUTINE CALLS ASD SELF TIME TOTAL TIME MAXIMUM AVERAGE MAXIMUM AVERAGE MAXIMUM CP2K 1 1.0 0.019 0.037 84.495 84.506 qs_energies 1 2.0 0.000 0.000 84.141 84.148 scf_env_do_scf 1 3.0 0.000 0.001 79.310 79.310 scf_env_do_scf_inner_loop 15 4.0 0.001 0.005 79.310 79.310 qs_ks_update_qs_env 15 5.0 0.000 0.000 31.760 31.781 rebuild_ks_matrix 15 6.0 0.000 0.000 31.712 31.733 qs_ks_build_kohn_sham_matrix 15 7.0 0.003 0.004 31.712 31.733 qs_rho_update_rho_low 16 5.0 0.000 0.000 24.899 24.906 calculate_rho_elec 16 6.0 0.007 0.014 24.899 24.906 qs_scf_new_mos 15 5.0 0.001 0.001 23.378 23.480 grid_collocate_task_list 16 7.0 22.571 22.985 22.571 22.985 eigensolver 15 6.0 0.002 0.003 21.918 21.942 sum_up_and_integrate 15 8.0 0.009 0.012 16.800 16.857 integrate_v_rspace 15 9.0 0.001 0.001 16.791 16.849 cp_fm_diag_elpa 15 7.0 0.000 0.000 15.483 15.503 cp_fm_diag_elpa_base 15 8.0 15.194 15.235 15.468 15.478 grid_integrate_task_list 15 10.0 15.125 15.421 15.125 15.421 qs_vxc_create 15 8.0 0.001 0.001 14.515 14.530 calculate_dispersion_nonloc 15 9.0 1.114 1.711 12.085 12.102 pw_transfer 1191 10.0 0.111 0.122 11.800 11.923 fft_wrap_pw1pw2 1086 11.0 0.017 0.019 11.581 11.703 fft3d_ps 1086 13.0 3.380 3.530 9.358 9.560 fft_wrap_pw1pw2_150 765 12.0 0.408 0.430 8.221 8.298 cp_fm_cholesky_restore 45 7.0 6.207 6.277 6.207 6.277 mp_alltoall_z22v 1086 15.0 5.237 5.542 5.237 5.542 yz_to_x 501 13.9 0.261 0.304 3.565 3.833 fft_wrap_pw1pw2_200 197 12.3 0.246 0.281 3.114 3.184 qs_energies_init_hamiltonians 1 3.0 0.000 
0.001 2.826 2.827 build_core_hamiltonian_matrix 1 4.0 0.000 0.000 2.466 2.678 density_rs2pw 16 7.0 0.001 0.001 2.135 2.619 rs_pw_transfer 158 9.4 0.002 0.003 2.153 2.613 x_to_yz 585 14.1 0.443 0.461 2.377 2.478 xc_vxc_pw_create 15 9.0 0.019 0.024 2.430 2.453 mp_waitany 520 11.3 1.473 1.989 1.473 1.989 xc_pw_derive 90 11.0 0.001 0.002 1.678 1.763 init_scf_run 1 3.0 0.000 0.001 1.731 1.734 vdW_energy 15 10.0 1.648 1.723 1.648 1.723 rs_pw_transfer_RS2PW_200 18 8.8 0.051 0.063 1.085 1.722 ------------------------------------------------------------------------------- Plot: name="diag_cu144_broy_timings_32omp", title="Timings of diag_cu144_broy with 32 OpenMP Threads", ylabel="time [s]" PlotPoint: plot="diag_cu144_broy_timings_32omp", name="rest", label="rest", y=63.02199999999999, yerr=0.0 PlotPoint: plot="diag_cu144_broy_timings_32omp", name="cp_fm_diag_elpa_base", label="cp_fm_diag_elpa_base", y=34.547, yerr=0.0 PlotPoint: plot="diag_cu144_broy_timings_32omp", name="grid_collocate_task_list", label="grid_collocate_task_list", y=24.74, yerr=0.0 PlotPoint: plot="diag_cu144_broy_timings_32omp", name="cp_fm_cholesky_restore", label="cp_fm_cholesky_restore", y=20.453, yerr=0.0 PlotPoint: plot="diag_cu144_broy_timings_32omp", name="fft3d_s", label="fft3d_s", y=16.245, yerr=0.0 PlotPoint: plot="diag_cu144_broy_timings_32omp", name="grid_integrate_task_list", label="grid_integrate_task_list", y=16.076, yerr=0.0 PlotPoint: plot="diag_cu144_broy_timings_32omp", name="mp_alltoall_z22v", label="mp_alltoall_z22v", y=0.0, yerr=0.0 Plot: name="diag_cu144_broy_timings_32mpi", title="Timings of diag_cu144_broy with 32 MPI Ranks", ylabel="time [s]" PlotPoint: plot="diag_cu144_broy_timings_32mpi", name="rest", label="rest", y=20.161, yerr=0.0 PlotPoint: plot="diag_cu144_broy_timings_32mpi", name="cp_fm_diag_elpa_base", label="cp_fm_diag_elpa_base", y=15.194, yerr=0.0 PlotPoint: plot="diag_cu144_broy_timings_32mpi", name="grid_collocate_task_list", label="grid_collocate_task_list", 
y=22.571, yerr=0.0 PlotPoint: plot="diag_cu144_broy_timings_32mpi", name="cp_fm_cholesky_restore", label="cp_fm_cholesky_restore", y=6.207, yerr=0.0 PlotPoint: plot="diag_cu144_broy_timings_32mpi", name="fft3d_s", label="fft3d_s", y=0.0, yerr=0.0 PlotPoint: plot="diag_cu144_broy_timings_32mpi", name="grid_integrate_task_list", label="grid_integrate_task_list", y=15.125, yerr=0.0 PlotPoint: plot="diag_cu144_broy_timings_32mpi", name="mp_alltoall_z22v", label="mp_alltoall_z22v", y=5.237, yerr=0.0 Running bench_dftb.inp with 1 threads and 32 ranks... done. Running bench_dftb.inp with 32 threads and 1 ranks... done. From /workspace/artifacts/bench_dftb_32omp.out: ------------------------------------------------------------------------------- - - - T I M I N G - - - ------------------------------------------------------------------------------- SUBROUTINE CALLS ASD SELF TIME TOTAL TIME MAXIMUM AVERAGE MAXIMUM AVERAGE MAXIMUM CP2K 1 1.0 0.100 0.100 352.010 352.010 qs_energies 1 2.0 0.000 0.000 351.785 351.785 ls_scf 1 3.0 0.000 0.000 350.299 350.299 ls_scf_main 1 4.0 0.003 0.003 335.645 335.645 density_matrix_trs4 11 5.0 0.019 0.019 202.157 202.157 ls_scf_dm_to_ks 11 5.0 0.000 0.000 127.515 127.515 matrix_ls_to_qs 11 6.0 0.000 0.000 123.850 123.850 dbcsr_multiply_generic 185 6.1 0.965 0.965 101.161 101.161 arnoldi_extremal 12 6.1 0.000 0.000 85.672 85.672 arnoldi_normal_ev 12 7.1 0.020 0.020 85.672 85.672 build_subspace 23 8.1 0.100 0.100 84.092 84.092 dbcsr_matrix_vector_mult 652 9.0 0.222 0.222 83.487 83.487 dbcsr_matrix_vector_mult_local 652 10.0 81.556 81.556 81.567 81.567 dbcsr_copy_into_existing 11 7.0 80.569 80.569 80.570 80.570 multiply_cannon 185 7.1 0.369 0.369 64.189 64.189 multiply_cannon_loop 185 8.1 0.374 0.374 50.216 50.216 dbcsr_complete_redistribute 23 7.5 34.070 34.070 47.430 47.430 matrix_decluster 11 7.0 0.000 0.000 43.279 43.279 multiply_cannon_multrec 185 9.1 33.294 33.294 33.540 33.540 make_m2s 370 7.1 0.043 0.043 30.927 30.927 make_images 370 8.1 
11.658 11.658 28.458 28.458 dbcsr_finalize 646 7.5 0.308 0.308 18.675 18.675 dbcsr_merge_all 597 8.5 3.369 3.369 17.017 17.017 calculate_norms 370 9.1 16.302 16.302 16.302 16.302 ls_scf_init_scf 1 4.0 0.000 0.000 13.723 13.723 setup_rec_index_2d 370 8.1 13.483 13.483 13.483 13.483 ls_scf_init_matrix_S 1 5.0 0.000 0.000 13.359 13.359 dbcsr_sort_indices 1103 9.9 12.997 12.997 12.997 12.997 matrix_sqrt_Newton_Schulz 1 6.0 0.001 0.001 12.578 12.578 tree_to_linear_d 110 9.4 12.054 12.054 12.054 12.054 quick_finalize 395 10.0 0.829 0.829 11.583 11.583 dbcsr_special_finalize 370 9.1 0.004 0.004 10.710 10.710 dbcsr_new_transposed 2 7.0 0.082 0.082 8.033 8.033 dbcsr_redistribute 2 8.0 7.855 7.855 7.916 7.916 ------------------------------------------------------------------------------- From /workspace/artifacts/bench_dftb_32mpi.out: ------------------------------------------------------------------------------- - - - T I M I N G - - - ------------------------------------------------------------------------------- SUBROUTINE CALLS ASD SELF TIME TOTAL TIME MAXIMUM AVERAGE MAXIMUM AVERAGE MAXIMUM CP2K 1 1.0 0.013 0.027 99.933 99.943 qs_energies 1 2.0 0.000 0.000 99.818 99.818 ls_scf 1 3.0 0.000 0.000 99.742 99.743 ls_scf_main 1 4.0 0.001 0.008 95.766 95.766 density_matrix_trs4 11 5.0 0.009 0.026 92.155 92.281 dbcsr_multiply_generic 185 6.1 0.087 0.109 85.800 86.177 multiply_cannon 185 7.1 0.050 0.059 71.667 73.187 multiply_cannon_loop 185 8.1 0.200 0.231 68.383 69.772 multiply_cannon_multrec 1480 9.1 40.370 43.291 40.742 43.703 mp_waitall_1 11936 10.3 24.336 28.109 24.336 28.109 multiply_cannon_metrocomm3 1480 9.1 0.023 0.029 14.479 19.954 calculate_norms 2960 9.1 7.361 10.224 7.361 10.224 make_m2s 370 7.1 0.042 0.049 9.803 9.934 make_images 370 8.1 0.720 0.802 9.651 9.785 multiply_cannon_metrocomm1 1480 9.1 0.013 0.016 5.508 8.115 arnoldi_extremal 12 6.1 0.000 0.000 5.391 5.422 arnoldi_normal_ev 12 7.1 0.002 0.005 5.391 5.422 build_subspace 23 8.1 0.032 0.050 5.221 5.225 
make_images_data 370 9.1 0.014 0.017 4.714 5.108 mp_sum_l 1119 5.6 3.208 5.029 3.208 5.029 dbcsr_matrix_vector_mult 652 9.0 0.016 0.067 4.256 4.378 hybrid_alltoall_any 393 9.9 0.279 1.746 4.030 4.237 dbcsr_multiply_generic_mpsum_f 137 7.1 0.001 0.001 1.846 3.437 dbcsr_matrix_vector_mult_local 652 10.0 2.806 3.396 2.811 3.400 ls_scf_dm_to_ks 11 5.0 0.000 0.000 3.123 3.248 ls_scf_init_scf 1 4.0 0.000 0.000 3.063 3.065 ls_scf_init_matrix_S 1 5.0 0.000 0.000 3.014 3.020 dbcsr_complete_redistribute 23 7.5 1.641 1.873 2.727 2.908 matrix_ls_to_qs 11 6.0 0.000 0.000 2.598 2.783 matrix_sqrt_Newton_Schulz 1 6.0 0.001 0.012 2.765 2.770 matrix_decluster 11 7.0 0.000 0.000 2.432 2.616 make_images_pack 370 9.1 2.156 2.367 2.162 2.375 mp_sum_dv 2907 10.4 1.725 2.261 1.725 2.261 ------------------------------------------------------------------------------- Plot: name="bench_dftb_timings_32omp", title="Timings of bench_dftb with 32 OpenMP Threads", ylabel="time [s]" PlotPoint: plot="bench_dftb_timings_32omp", name="rest", label="rest", y=106.21900000000002, yerr=0.0 PlotPoint: plot="bench_dftb_timings_32omp", name="dbcsr_matrix_vector_mult_local", label="dbcsr_matrix_vector_mult_local", y=81.556, yerr=0.0 PlotPoint: plot="bench_dftb_timings_32omp", name="dbcsr_copy_into_existing", label="dbcsr_copy_into_existing", y=80.569, yerr=0.0 PlotPoint: plot="bench_dftb_timings_32omp", name="dbcsr_complete_redistribute", label="dbcsr_complete_redistribute", y=34.07, yerr=0.0 PlotPoint: plot="bench_dftb_timings_32omp", name="multiply_cannon_multrec", label="multiply_cannon_multrec", y=33.294, yerr=0.0 PlotPoint: plot="bench_dftb_timings_32omp", name="calculate_norms", label="calculate_norms", y=16.302, yerr=0.0 PlotPoint: plot="bench_dftb_timings_32omp", name="mp_sum_l", label="mp_sum_l", y=0.0, yerr=0.0 PlotPoint: plot="bench_dftb_timings_32omp", name="mp_waitall_1", label="mp_waitall_1", y=0.0, yerr=0.0 Plot: name="bench_dftb_timings_32mpi", title="Timings of bench_dftb with 32 MPI Ranks", 
ylabel="time [s]" PlotPoint: plot="bench_dftb_timings_32mpi", name="rest", label="rest", y=20.211000000000013, yerr=0.0 PlotPoint: plot="bench_dftb_timings_32mpi", name="dbcsr_matrix_vector_mult_local", label="dbcsr_matrix_vector_mult_local", y=2.806, yerr=0.0 PlotPoint: plot="bench_dftb_timings_32mpi", name="dbcsr_copy_into_existing", label="dbcsr_copy_into_existing", y=0.0, yerr=0.0 PlotPoint: plot="bench_dftb_timings_32mpi", name="dbcsr_complete_redistribute", label="dbcsr_complete_redistribute", y=1.641, yerr=0.0 PlotPoint: plot="bench_dftb_timings_32mpi", name="multiply_cannon_multrec", label="multiply_cannon_multrec", y=40.37, yerr=0.0 PlotPoint: plot="bench_dftb_timings_32mpi", name="calculate_norms", label="calculate_norms", y=7.361, yerr=0.0 PlotPoint: plot="bench_dftb_timings_32mpi", name="mp_sum_l", label="mp_sum_l", y=3.208, yerr=0.0 PlotPoint: plot="bench_dftb_timings_32mpi", name="mp_waitall_1", label="mp_waitall_1", y=24.336, yerr=0.0 Running dbcsr.inp with 1 threads and 32 ranks... done. Running dbcsr.inp with 32 threads and 1 ranks... done. 
From /workspace/artifacts/dbcsr_32omp.out: ------------------------------------------------------------------------------- - - - T I M I N G - - - ------------------------------------------------------------------------------- SUBROUTINE CALLS ASD SELF TIME TOTAL TIME MAXIMUM AVERAGE MAXIMUM AVERAGE MAXIMUM CP2K 1 1.0 0.022 0.022 98.800 98.800 lib_test 1 2.0 0.000 0.000 98.675 98.675 dbcsr_run_tests 3 3.0 0.064 0.064 98.675 98.675 test_multiplies_multiproc 3 4.0 0.002 0.002 82.112 82.112 dbcsr_redistribute 9 5.0 56.426 56.426 58.178 58.178 dbcsr_multiply_generic 9 5.0 0.001 0.001 22.047 22.047 multiply_cannon 9 6.0 0.003 0.003 16.436 16.436 dbcsr_make_random_matrix 9 4.0 13.182 13.182 16.391 16.391 multiply_cannon_loop 9 7.0 0.029 0.029 16.034 16.034 multiply_cannon_multrec 9 8.0 16.004 16.004 16.005 16.005 dbcsr_finalize 27 5.7 0.028 0.028 6.224 6.224 dbcsr_merge_all 18 6.5 2.405 2.405 5.436 5.436 dbcsr_data_release 975 7.6 2.826 2.826 2.826 2.826 tree_to_linear_d 9 7.0 2.091 2.091 2.091 2.091 ------------------------------------------------------------------------------- From /workspace/artifacts/dbcsr_32mpi.out: ------------------------------------------------------------------------------- - - - T I M I N G - - - ------------------------------------------------------------------------------- SUBROUTINE CALLS ASD SELF TIME TOTAL TIME MAXIMUM AVERAGE MAXIMUM AVERAGE MAXIMUM CP2K 1 1.0 0.005 0.017 25.043 25.049 lib_test 1 2.0 0.000 0.000 24.988 25.013 dbcsr_run_tests 3 3.0 0.001 0.001 24.987 25.013 test_multiplies_multiproc 3 4.0 0.001 0.003 24.010 24.115 dbcsr_multiply_generic 9 5.0 0.001 0.002 21.762 21.851 multiply_cannon 9 6.0 0.002 0.003 19.230 19.804 multiply_cannon_loop 9 7.0 0.003 0.004 18.859 19.475 multiply_cannon_multrec 72 8.0 15.553 16.326 15.554 16.327 mp_waitall_1 576 9.2 3.752 5.094 3.752 5.094 multiply_cannon_metrocomm1 72 8.0 0.002 0.002 2.876 4.399 mp_sum_l 390 2.5 0.738 1.426 0.738 1.426 dbcsr_multiply_generic_mpsum_f 9 6.0 0.000 0.000 0.718 
1.404 multiply_cannon_metrocomm3 72 8.0 0.000 0.001 0.420 1.394 dbcsr_make_random_matrix 9 4.0 0.742 1.023 0.929 1.143 dbcsr_finalize 27 5.7 0.001 0.001 0.927 1.038 dbcsr_data_release 444 7.6 0.888 1.023 0.888 1.023 make_m2s 18 6.0 0.001 0.001 0.923 0.967 make_images 18 7.0 0.024 0.030 0.919 0.963 dbcsr_destroy 111 5.9 0.001 0.001 0.708 0.818 dbcsr_merge_all 18 6.5 0.137 0.175 0.745 0.813 dbcsr_checksum 6 5.0 0.185 0.601 0.612 0.614 make_images_data 18 8.0 0.001 0.001 0.505 0.589 dbcsr_redistribute 9 5.0 0.293 0.332 0.521 0.551 hybrid_alltoall_any 18 9.0 0.043 0.197 0.457 0.539 dbcsr_data_copy_aa2 18 7.5 0.457 0.526 0.457 0.526 ------------------------------------------------------------------------------- Plot: name="dbcsr_timings_32omp", title="Timings of dbcsr with 32 OpenMP Threads", ylabel="time [s]" PlotPoint: plot="dbcsr_timings_32omp", name="rest", label="rest", y=7.956999999999994, yerr=0.0 PlotPoint: plot="dbcsr_timings_32omp", name="dbcsr_redistribute", label="dbcsr_redistribute", y=56.426, yerr=0.0 PlotPoint: plot="dbcsr_timings_32omp", name="multiply_cannon_multrec", label="multiply_cannon_multrec", y=16.004, yerr=0.0 PlotPoint: plot="dbcsr_timings_32omp", name="dbcsr_make_random_matrix", label="dbcsr_make_random_matrix", y=13.182, yerr=0.0 PlotPoint: plot="dbcsr_timings_32omp", name="dbcsr_data_release", label="dbcsr_data_release", y=2.826, yerr=0.0 PlotPoint: plot="dbcsr_timings_32omp", name="dbcsr_merge_all", label="dbcsr_merge_all", y=2.405, yerr=0.0 PlotPoint: plot="dbcsr_timings_32omp", name="mp_sum_l", label="mp_sum_l", y=0.0, yerr=0.0 PlotPoint: plot="dbcsr_timings_32omp", name="mp_waitall_1", label="mp_waitall_1", y=0.0, yerr=0.0 Plot: name="dbcsr_timings_32mpi", title="Timings of dbcsr with 32 MPI Ranks", ylabel="time [s]" PlotPoint: plot="dbcsr_timings_32mpi", name="rest", label="rest", y=2.9399999999999977, yerr=0.0 PlotPoint: plot="dbcsr_timings_32mpi", name="dbcsr_redistribute", label="dbcsr_redistribute", y=0.293, yerr=0.0 PlotPoint: 
plot="dbcsr_timings_32mpi", name="multiply_cannon_multrec", label="multiply_cannon_multrec", y=15.553, yerr=0.0 PlotPoint: plot="dbcsr_timings_32mpi", name="dbcsr_make_random_matrix", label="dbcsr_make_random_matrix", y=0.742, yerr=0.0 PlotPoint: plot="dbcsr_timings_32mpi", name="dbcsr_data_release", label="dbcsr_data_release", y=0.888, yerr=0.0 PlotPoint: plot="dbcsr_timings_32mpi", name="dbcsr_merge_all", label="dbcsr_merge_all", y=0.137, yerr=0.0 PlotPoint: plot="dbcsr_timings_32mpi", name="mp_sum_l", label="mp_sum_l", y=0.738, yerr=0.0 PlotPoint: plot="dbcsr_timings_32mpi", name="mp_waitall_1", label="mp_waitall_1", y=3.752, yerr=0.0 Running MQAE_single_node.inp with 1 threads and 32 ranks... done. Running MQAE_single_node.inp with 32 threads and 1 ranks... done. From /workspace/artifacts/MQAE_single_node_32omp.out: ------------------------------------------------------------------------------- - - - T I M I N G - - - ------------------------------------------------------------------------------- SUBROUTINE CALLS ASD SELF TIME TOTAL TIME MAXIMUM AVERAGE MAXIMUM AVERAGE MAXIMUM CP2K 1 1.0 0.057 0.057 152.289 152.289 qs_mol_dyn_low 1 2.0 0.004 0.004 150.730 150.730 velocity_verlet 5 3.0 0.004 0.004 123.351 123.351 qmmm_el_coupling 6 3.8 0.000 0.000 92.653 92.653 qmmm_elec_with_gaussian 6 4.8 0.111 0.111 92.639 92.639 qmmm_elec_with_gaussian_low 6 5.8 0.000 0.000 91.714 91.714 qmmm_elec_gaussian_low_G 6 6.8 90.678 90.678 90.678 90.678 qs_forces 6 3.8 0.001 0.001 47.843 47.843 qs_energies 6 4.8 0.000 0.000 42.579 42.579 scf_env_do_scf 6 5.8 0.001 0.001 39.669 39.669 scf_env_do_scf_inner_loop 39 6.8 0.006 0.006 34.256 34.256 rebuild_ks_matrix 45 8.4 0.000 0.000 32.848 32.848 qs_ks_build_kohn_sham_matrix 45 9.4 0.008 0.008 32.848 32.848 qs_ks_update_qs_env 45 7.8 0.000 0.000 28.101 28.101 pw_transfer 966 12.3 0.070 0.070 24.647 24.647 fft_wrap_pw1pw2 801 13.6 0.009 0.009 24.309 24.309 fft_wrap_pw1pw2_150 507 15.2 2.932 2.932 23.664 23.664 qs_vxc_create 45 10.4 0.001 
0.001 18.648 18.648 xc_vxc_pw_create 45 11.4 0.728 0.728 18.647 18.647 xc_pw_derive 270 13.4 0.003 0.003 13.417 13.417 fft3d_s 802 15.6 12.432 12.432 12.442 12.442 qs_rho_update_rho_low 45 7.9 0.000 0.000 9.639 9.639 calculate_rho_elec 45 8.9 0.570 0.570 9.639 9.639 xc_rho_set_and_dset_create 45 12.4 0.872 0.872 9.111 9.111 xc_pw_divergence 45 12.4 0.002 0.002 8.717 8.717 qmmm_forces 6 3.8 0.002 0.002 6.409 6.409 pw_scatter_s 429 15.8 6.402 6.402 6.402 6.402 qmmm_forces_with_gaussian 6 4.8 0.132 0.132 5.994 5.994 init_scf_loop 6 6.8 0.000 0.000 5.407 5.407 qs_ks_ddapc 45 10.4 0.001 0.001 5.161 5.161 qmmm_force_with_gaussian_low 6 5.8 0.000 0.000 4.890 4.890 pw_integral_ab 2539 7.4 4.877 4.877 4.877 4.877 density_rs2pw 45 9.9 0.002 0.002 4.808 4.808 qs_ks_update_qs_env_forces 6 4.8 0.000 0.000 4.757 4.757 sum_up_and_integrate 45 10.4 0.179 0.179 4.337 4.337 grid_collocate_task_list 45 9.9 4.261 4.261 4.261 4.261 integrate_v_rspace 45 11.4 0.011 0.011 4.158 4.158 qmmm_forces_gaussian_low_G 6 6.8 4.068 4.068 4.068 4.068 pw_poisson_solve 51 9.9 1.431 1.431 3.414 3.414 fist_calc_energy_force 6 3.8 0.001 0.001 3.212 3.212 ------------------------------------------------------------------------------- From /workspace/artifacts/MQAE_single_node_32mpi.out: ------------------------------------------------------------------------------- - - - T I M I N G - - - ------------------------------------------------------------------------------- SUBROUTINE CALLS ASD SELF TIME TOTAL TIME MAXIMUM AVERAGE MAXIMUM AVERAGE MAXIMUM CP2K 1 1.0 0.040 0.065 77.212 77.222 qs_mol_dyn_low 1 2.0 0.004 0.011 75.771 75.831 qs_forces 6 3.8 0.001 0.001 56.011 56.011 qs_energies 6 4.8 0.000 0.001 53.510 53.510 scf_env_do_scf 6 5.8 0.000 0.002 52.134 52.136 scf_env_do_scf_inner_loop 113 6.2 0.004 0.025 50.040 50.041 rebuild_ks_matrix 119 8.1 0.000 0.001 36.022 36.036 qs_ks_build_kohn_sham_matrix 119 9.1 0.022 0.027 36.022 36.036 qs_ks_update_qs_env 119 7.3 0.001 0.001 33.887 33.900 velocity_verlet 5 
3.0 0.002 0.006 31.504 31.508 pw_transfer 2446 12.3 0.224 0.246 23.698 24.180 fft_wrap_pw1pw2 2059 13.4 0.030 0.033 23.173 23.671 fft_wrap_pw1pw2_150 1321 14.9 1.708 1.970 21.712 22.245 fft3d_ps 2059 15.4 8.536 9.427 17.807 18.472 qs_vxc_create 119 10.1 0.004 0.005 18.068 18.085 xc_vxc_pw_create 119 11.1 0.192 0.240 18.064 18.081 qs_rho_update_rho_low 119 7.3 0.001 0.001 14.883 14.887 calculate_rho_elec 119 8.3 0.052 0.059 14.883 14.886 xc_pw_derive 714 13.1 0.010 0.012 13.634 13.985 sum_up_and_integrate 119 10.1 0.071 0.093 13.082 13.268 integrate_v_rspace 119 11.1 0.005 0.006 13.010 13.203 rs_pw_transfer 988 11.5 0.019 0.022 11.051 11.537 qmmm_forces 6 3.8 0.002 0.003 9.989 9.989 qmmm_forces_with_gaussian 6 4.8 0.470 0.531 9.206 9.792 density_rs2pw 119 9.3 0.007 0.008 9.366 9.777 xc_pw_divergence 119 12.1 0.005 0.006 9.217 9.490 xc_rho_set_and_dset_create 119 12.1 0.435 0.716 8.297 9.098 qmmm_el_coupling 6 3.8 0.000 0.000 8.623 8.759 qmmm_elec_with_gaussian 6 4.8 0.421 0.465 8.621 8.757 potential_pw2rs 119 12.1 0.007 0.009 8.512 8.537 mp_alltoall_z22v 2059 17.4 7.494 8.300 7.494 8.300 grid_collocate_task_list 119 9.3 5.246 5.808 5.246 5.808 x_to_yz 1095 16.8 1.038 1.165 4.851 5.228 yz_to_x 964 16.0 0.672 0.904 4.352 4.963 mp_waitany 4028 12.8 4.174 4.732 4.174 4.732 qmmm_force_with_gaussian_low 6 5.8 0.000 0.000 4.380 4.711 grid_integrate_task_list 119 12.1 4.001 4.552 4.001 4.552 rs_pw_transfer_PW2RS_150 125 13.9 1.573 1.724 4.135 4.227 qmmm_elec_with_gaussian_low 6 5.8 0.000 0.000 3.792 4.098 rs_pw_transfer_RS2PW_150 125 11.2 1.169 1.268 3.480 3.952 qmmm_forces_gaussian_low_G 6 6.8 3.589 3.914 3.589 3.914 qs_scf_new_mos 113 7.2 0.001 0.001 3.777 3.789 qs_scf_loop_do_ot 113 8.2 0.001 0.001 3.776 3.789 dbcsr_multiply_generic 2588 12.3 0.076 0.084 3.563 3.682 pw_restrict_s3 18 5.8 1.712 1.952 3.408 3.670 ot_scf_mini 113 9.2 0.002 0.002 3.585 3.601 qmmm_elec_gaussian_low_G 6 6.8 3.130 3.436 3.130 3.436 mp_waitall_1 188862 16.2 3.234 3.351 3.234 3.351 
qmmm_elec_with_gaussian:spline 6 5.8 0.000 0.000 2.810 3.020 pw_prolongate_s3 18 6.8 1.425 1.597 2.810 3.020 qs_ks_ddapc 119 10.1 0.003 0.003 2.578 2.667 mp_sum_dm3 33 5.7 2.435 2.569 2.435 2.569 ot_mini 113 10.2 0.001 0.001 2.217 2.231 pw_integral_ab 2761 7.7 1.515 1.668 2.138 2.229 qs_ks_update_qs_env_forces 6 4.8 0.000 0.000 2.158 2.159 init_scf_loop 6 6.8 0.000 0.001 2.090 2.092 pw_gather_p 964 15.0 1.802 2.045 1.802 2.045 mp_sum_d 5820 12.2 1.674 1.966 1.674 1.966 pw_scatter_p 1095 15.8 1.791 1.839 1.791 1.839 qs_ot_get_derivative 113 11.2 0.001 0.001 1.753 1.767 rs_pw_transfer_PW2RS_40 119 14.1 0.282 0.293 1.580 1.639 ------------------------------------------------------------------------------- Plot: name="MQAE_single_node_timings_32omp", title="Timings of MQAE_single_node with 32 OpenMP Threads", ylabel="time [s]" PlotPoint: plot="MQAE_single_node_timings_32omp", name="rest", label="rest", y=33.638999999999996, yerr=0.0 PlotPoint: plot="MQAE_single_node_timings_32omp", name="qmmm_elec_gaussian_low_G", label="qmmm_elec_gaussian_low_G", y=90.678, yerr=0.0 PlotPoint: plot="MQAE_single_node_timings_32omp", name="fft3d_s", label="fft3d_s", y=12.432, yerr=0.0 PlotPoint: plot="MQAE_single_node_timings_32omp", name="pw_scatter_s", label="pw_scatter_s", y=6.402, yerr=0.0 PlotPoint: plot="MQAE_single_node_timings_32omp", name="pw_integral_ab", label="pw_integral_ab", y=4.877, yerr=0.0 PlotPoint: plot="MQAE_single_node_timings_32omp", name="grid_collocate_task_list", label="grid_collocate_task_list", y=4.261, yerr=0.0 PlotPoint: plot="MQAE_single_node_timings_32omp", name="fft3d_ps", label="fft3d_ps", y=0.0, yerr=0.0 PlotPoint: plot="MQAE_single_node_timings_32omp", name="grid_integrate_task_list", label="grid_integrate_task_list", y=0.0, yerr=0.0 PlotPoint: plot="MQAE_single_node_timings_32omp", name="mp_waitany", label="mp_waitany", y=0.0, yerr=0.0 PlotPoint: plot="MQAE_single_node_timings_32omp", name="mp_alltoall_z22v", label="mp_alltoall_z22v", y=0.0, yerr=0.0 
Plot: name="MQAE_single_node_timings_32mpi", title="Timings of MQAE_single_node with 32 MPI Ranks", ylabel="time [s]"
PlotPoint: plot="MQAE_single_node_timings_32mpi", name="rest", label="rest", y=43.116, yerr=0.0
PlotPoint: plot="MQAE_single_node_timings_32mpi", name="qmmm_elec_gaussian_low_G", label="qmmm_elec_gaussian_low_G", y=3.13, yerr=0.0
PlotPoint: plot="MQAE_single_node_timings_32mpi", name="fft3d_s", label="fft3d_s", y=0.0, yerr=0.0
PlotPoint: plot="MQAE_single_node_timings_32mpi", name="pw_scatter_s", label="pw_scatter_s", y=0.0, yerr=0.0
PlotPoint: plot="MQAE_single_node_timings_32mpi", name="pw_integral_ab", label="pw_integral_ab", y=1.515, yerr=0.0
PlotPoint: plot="MQAE_single_node_timings_32mpi", name="grid_collocate_task_list", label="grid_collocate_task_list", y=5.246, yerr=0.0
PlotPoint: plot="MQAE_single_node_timings_32mpi", name="fft3d_ps", label="fft3d_ps", y=8.536, yerr=0.0
PlotPoint: plot="MQAE_single_node_timings_32mpi", name="grid_integrate_task_list", label="grid_integrate_task_list", y=4.001, yerr=0.0
PlotPoint: plot="MQAE_single_node_timings_32mpi", name="mp_waitany", label="mp_waitany", y=4.174, yerr=0.0
PlotPoint: plot="MQAE_single_node_timings_32mpi", name="mp_alltoall_z22v", label="mp_alltoall_z22v", y=7.494, yerr=0.0

Summary: Performance test took 46 minutes.
Status: OK

Removing intermediate container 707f76d83c11
 ---> d91d2b5bf8d7
Step 41/42 : CMD cat $(find ./report.log -mmin +10) | sed '/^Summary:/ s/$/ (cached)/'
 ---> Running in a4182ac05736
Removing intermediate container a4182ac05736
 ---> d360791baacc
Step 42/42 : ENTRYPOINT []
 ---> Running in 4912ecaef707
Removing intermediate container 4912ecaef707
 ---> 1c20741a62cf
[Warning] One or more build-args [GIT_COMMIT_SHA] were not consumed
Successfully built 1c20741a62cf
Successfully tagged gcr.io/cp2k-org-project/img_cp2k-perf-openmp-arch-14b:master
Pushing new image... done.

#################### Running Image cp2k-perf-openmp ####################
Uploading artifacts... done
EndDate: 2022-08-08 20:32:58+00:00