StartDate: 2022-10-27 11:05:49+00:00
CpuId: 32x AMD EPYC (3rd Gen) (Milan) [Zen 3], 7nm (SMT disabled)
CommitSHA: 007813510527d819d1aa75b16315fe3ed2c20304
CommitTime: 2022-10-27 10:37:50 +0200
CommitAuthor: Frederick Stein
CommitSubject: Fix and improve RPA and SOS-MP2 gradients (#2375)
Populating docker build cache... done.
#################### Building Image cp2k-perf-openmp ####################
Dockerfile: /tools/docker/Dockerfile.test_performance
Build-Path: /
Build-Args: GIT_COMMIT_SHA=007813510527d819d1aa75b16315fe3ed2c20304
Sending build context to Docker daemon 365.4MB
Step 1/42 : FROM ubuntu:22.04
22.04: Pulling from library/ubuntu
301a8b74f71f: Already exists
Digest: sha256:7cfe75438fc77c9d7235ae502bf229b15ca86647ac01c844b272b56326d56184
Status: Downloaded newer image for ubuntu:22.04
---> cdb68b455a14
Step 2/42 : WORKDIR /opt/cp2k-toolchain
---> Using cache
---> e36f08f73260
Step 3/42 : COPY ./tools/toolchain/install_requirements*.sh ./
---> Using cache
---> 682c309d9640
Step 4/42 : RUN ./install_requirements.sh ubuntu:22.04
---> Using cache
---> 40a433e6df84
Step 5/42 : RUN mkdir scripts
---> Using cache
---> 99ad9eee5fce
Step 6/42 : COPY ./tools/toolchain/scripts/VERSION ./tools/toolchain/scripts/parse_if.py ./tools/toolchain/scripts/tool_kit.sh ./tools/toolchain/scripts/common_vars.sh ./tools/toolchain/scripts/signal_trap.sh ./tools/toolchain/scripts/get_openblas_arch.sh ./scripts/
---> Using cache
---> e9da3ca0f434
Step 7/42 : COPY ./tools/toolchain/install_cp2k_toolchain.sh .
---> Using cache
---> a1b4d22eda67
Step 8/42 : RUN ./install_cp2k_toolchain.sh --install-all --mpi-mode=mpich --target-cpu=native --with-gcc=system --dry-run
---> Using cache
---> 99d8c439deed
Step 9/42 : COPY ./tools/toolchain/scripts/stage0/ ./scripts/stage0/
---> Using cache
---> 396b14ea4f0a
Step 10/42 : RUN ./scripts/stage0/install_stage0.sh && rm -rf ./build
---> Using cache
---> 8c8bdd8cd062
Step 11/42 : COPY ./tools/toolchain/scripts/stage1/ ./scripts/stage1/
---> Using cache
---> 318710a932c5
Step 12/42 : RUN ./scripts/stage1/install_stage1.sh && rm -rf ./build
---> Using cache
---> 0a03f027fa49
Step 13/42 : COPY ./tools/toolchain/scripts/stage2/ ./scripts/stage2/
---> Using cache
---> 396a23ac81a6
Step 14/42 : RUN ./scripts/stage2/install_stage2.sh && rm -rf ./build
---> Using cache
---> 9fceba6a66c4
Step 15/42 : COPY ./tools/toolchain/scripts/stage3/ ./scripts/stage3/
---> Using cache
---> 48c084976c13
Step 16/42 : RUN ./scripts/stage3/install_stage3.sh && rm -rf ./build
---> Using cache
---> 0434b1b150d8
Step 17/42 : COPY ./tools/toolchain/scripts/stage4/ ./scripts/stage4/
---> b36ae8885714
Step 18/42 : RUN ./scripts/stage4/install_stage4.sh && rm -rf ./build
---> Running in 14fed55ffd62
==================== Installing Libxsmm ====================
libxsmm-1.17.tar.gz: OK
Checksum of libxsmm-1.17.tar.gz Ok
Installing from scratch into /opt/cp2k-toolchain/install/libxsmm-1.17
Step libxsmm took 10.00 seconds.
==================== Installing ScaLAPACK ====================
scalapack-2.1.0.tgz: OK
Checksum of scalapack-2.1.0.tgz Ok
Installing from scratch into /opt/cp2k-toolchain/install/scalapack-2.1.0
Step scalapack took 24.00 seconds.
==================== Installing COSMA ====================
COSMA-v2.6.2.tar.gz: OK
Checksum of COSMA-v2.6.2.tar.gz Ok
Installing from scratch into /opt/cp2k-toolchain/install/COSMA-2.6.2
Step cosma took 14.00 seconds.
Removing intermediate container 14fed55ffd62
---> 56438aac163c
Step 19/42 : COPY ./tools/toolchain/scripts/stage5/ ./scripts/stage5/
---> 7c77057f5511
Step 20/42 : RUN ./scripts/stage5/install_stage5.sh && rm -rf ./build
---> Running in 661b2837bd11
==================== Installing ELPA ====================
elpa-2022.05.001.tar.gz: OK
Checksum of elpa-2022.05.001.tar.gz Ok
patching file nvcc_wrap
Installing from scratch into /opt/cp2k-toolchain/install/elpa-2022.05.001/cpu
Step elpa took 143.00 seconds.
==================== Installing PT-Scotch ====================
scotch_6.0.0.tar.gz: OK
Checksum of scotch_6.0.0.tar.gz Ok
Installing from scratch into /opt/cp2k-toolchain/install/scotch-6.0.0
Step ptscotch took 5.00 seconds.
==================== Installing SuperLU_DIST ====================
superlu_dist_6.1.0.tar.gz: OK
Checksum of superlu_dist_6.1.0.tar.gz Ok
Installing from scratch into /opt/cp2k-toolchain/install/superlu_dist-6.1.0
Step superlu took 7.00 seconds.
==================== Installing PEXSI ====================
pexsi_v1.2.0.tar.gz: OK
Checksum of pexsi_v1.2.0.tar.gz Ok
Installing from scratch into /opt/cp2k-toolchain/install/pexsi-1.2.0
Step pexsi took 54.00 seconds.
Removing intermediate container 661b2837bd11
---> 2caf5ab180d0
Step 21/42 : COPY ./tools/toolchain/scripts/stage6/ ./scripts/stage6/
---> 72903d1e8a74
Step 22/42 : RUN ./scripts/stage6/install_stage6.sh && rm -rf ./build
---> Running in b68be6d58630
==================== Installing QUIP ====================
QUIP-0.9.10.tar.gz: OK
Checksum of QUIP-0.9.10.tar.gz Ok
fox-b5b69ef9a46837bd944ba5c9bc1cf9d00a6198a7.tar.gz: OK
Checksum of fox-b5b69ef9a46837bd944ba5c9bc1cf9d00a6198a7.tar.gz Ok
Installing from scratch into /opt/cp2k-toolchain/install/quip-0.9.10
Step quip took 199.00 seconds.
==================== Installing gsl ====================
gsl-2.7.tar.gz: OK
Checksum of gsl-2.7.tar.gz Ok
Installing from scratch into /opt/cp2k-toolchain/install/gsl-2.7
Step gsl took 40.00 seconds.
==================== Installing PLUMED ====================
plumed-src-2.8.0.tgz: OK
Checksum of plumed-src-2.8.0.tgz Ok
Installing from scratch into /opt/cp2k-toolchain/install/plumed-2.8.0
Step plumed took 47.00 seconds.
Removing intermediate container b68be6d58630
---> 9af1b3dfa6df
Step 23/42 : COPY ./tools/toolchain/scripts/stage7/ ./scripts/stage7/
---> 984e3bb3f079
Step 24/42 : RUN ./scripts/stage7/install_stage7.sh && rm -rf ./build
---> Running in a837956df2b5
==================== Installing hdf5 ====================
hdf5-1.12.0.tar.bz2: OK
Checksum of hdf5-1.12.0.tar.bz2 Ok
Installing from scratch into /opt/cp2k-toolchain/install/hdf5-1.12.0
Step hdf5 took 84.00 seconds.
==================== Installing libvdwxc ====================
libvdwxc-0.4.0.tar.gz: OK
Checksum of libvdwxc-0.4.0.tar.gz Ok
Installing from scratch into /opt/cp2k-toolchain/install/libvdwxc-0.4.0
Step libvdwxc took 31.00 seconds.
==================== Installing spglib ====================
spglib-1.16.2.tar.gz: OK
Checksum of spglib-1.16.2.tar.gz Ok
Installing from scratch into /opt/cp2k-toolchain/install/spglib-1.16.2
Step spglib took 6.00 seconds.
==================== Installing libvori ====================
libvori-220621.tar.gz: OK
Checksum of libvori-220621.tar.gz Ok
Installing from scratch into /opt/cp2k-toolchain/install/libvori-220621
Step libvori took 15.00 seconds.
Removing intermediate container a837956df2b5
---> 3c8c2a26fa78
Step 25/42 : COPY ./tools/toolchain/scripts/stage8/ ./scripts/stage8/
---> 736b6281714d
Step 26/42 : RUN ./scripts/stage8/install_stage8.sh && rm -rf ./build
---> Running in 86193ee09873
==================== Installing spfft ====================
SpFFT-1.0.6.tar.gz: OK
Checksum of SpFFT-1.0.6.tar.gz Ok
Installing from scratch into /opt/cp2k-toolchain/install/SpFFT-1.0.6
Step spfft took 5.00 seconds.
==================== Installing spla ====================
SpLA-1.5.4.tar.gz: OK
Checksum of SpLA-1.5.4.tar.gz Ok
Installing from scratch into /opt/cp2k-toolchain/install/SpLA-1.5.4
Step spla took 6.00 seconds.
==================== Installing SIRIUS ====================
SIRIUS-7.3.2.tar.gz: OK
Checksum of SIRIUS-7.3.2.tar.gz Ok
Installing from scratch into /opt/cp2k-toolchain/install/sirius-7.3.2
Step sirius took 75.00 seconds.
Removing intermediate container 86193ee09873
---> 190ef110007d
Step 27/42 : COPY ./tools/toolchain/scripts/arch_base.tmpl ./tools/toolchain/scripts/generate_arch_files.sh ./scripts/
---> 06e823f1a2a2
Step 28/42 : RUN ./scripts/generate_arch_files.sh && rm -rf ./build
---> Running in 038e2daca618
==================== generating arch files ====================
arch files can be found in the /opt/cp2k-toolchain/install/arch subdirectory
Wrote /opt/cp2k-toolchain/install/arch/local.ssmp
Wrote /opt/cp2k-toolchain/install/arch/local_static.ssmp
Wrote /opt/cp2k-toolchain/install/arch/local.sdbg
Wrote /opt/cp2k-toolchain/install/arch/local_asan.ssmp
Wrote /opt/cp2k-toolchain/install/arch/local_coverage.sdbg
Wrote /opt/cp2k-toolchain/install/arch/local.psmp
Wrote /opt/cp2k-toolchain/install/arch/local.pdbg
Wrote /opt/cp2k-toolchain/install/arch/local_asan.psmp
Wrote /opt/cp2k-toolchain/install/arch/local_static.psmp
Wrote /opt/cp2k-toolchain/install/arch/local_warn.psmp
Wrote /opt/cp2k-toolchain/install/arch/local_coverage.pdbg
========================== usage =========================
Done!
Now copy:
  cp /opt/cp2k-toolchain/install/arch/* to the cp2k/arch/ directory
To use the installed tools and libraries and cp2k version compiled with it you will first need to execute at the prompt:
  source /opt/cp2k-toolchain/install/setup
To build CP2K you should change directory:
  cd cp2k/
  make -j 32 ARCH=local VERSION="ssmp sdbg psmp pdbg"
arch files for GPU enabled CUDA versions are named "local_cuda.*"
arch files for GPU enabled HIP versions are named "local_hip.*"
arch files for OpenCL (GPU) versions are named "local_opencl.*"
arch files for coverage versions are named "local_coverage.*"
Note that these pre-built arch files are for the GNU compiler, users have to adapt them for other compilers.
It is possible to use the provided CP2K arch files as guidance.
Removing intermediate container 038e2daca618
---> d6b51e99b0cf
Step 29/42 : WORKDIR /opt/cp2k
---> Running in 8db52d11a0f6
Removing intermediate container 8db52d11a0f6
---> 4ba5ead164b4
Step 30/42 : COPY ./Makefile .
---> 4ef934dd255e
Step 31/42 : COPY ./src ./src
---> 8ffe5041da56
Step 32/42 : COPY ./exts ./exts
---> 2c5c2008e5e4
Step 33/42 : COPY ./tools/build_utils ./tools/build_utils
---> 1a0ca20432c1
Step 34/42 : RUN /bin/bash -c " mkdir -p arch && ln -vs /opt/cp2k-toolchain/install/arch/local.psmp ./arch/ && echo 'Compiling cp2k...' && source /opt/cp2k-toolchain/install/setup && ( make -j ARCH=local VERSION=psmp &> /dev/null || true ) && ( [ ! -f ./exe/local/cp2k.psmp ] || ldd ./exe/local/cp2k.psmp | grep -q libmpi )"
---> Running in 0e52e4117b2b
'./arch/local.psmp' -> '/opt/cp2k-toolchain/install/arch/local.psmp'
Compiling cp2k...
Removing intermediate container 0e52e4117b2b ---> e13a231ed6da Step 35/42 : COPY ./data ./data ---> 6f08d820e415 Step 36/42 : COPY ./tests ./tests ---> ddfc34853bbf Step 37/42 : COPY ./tools/regtesting ./tools/regtesting ---> 3778e6439269 Step 38/42 : COPY ./benchmarks ./benchmarks ---> f1c38f848af2 Step 39/42 : COPY ./tools/docker/scripts/test_performance.sh ./tools/docker/scripts/plot_performance.py ./ ---> 51928184406c Step 40/42 : RUN ./test_performance.sh "local" 2>&1 | tee report.log ---> Running in 784d4539344d ========== Compiling CP2K ========== Compiling cp2k... done. Checking benchmark inputs... Found 60 input files and 0 errors. ========== Running Performance Test ========== Running H2O-64.inp with 1 threads and 32 ranks... done. Running H2O-64.inp with 32 threads and 1 ranks... done. From /workspace/artifacts/H2O-64_32omp.out: ------------------------------------------------------------------------------- - - - T I M I N G - - - ------------------------------------------------------------------------------- SUBROUTINE CALLS ASD SELF TIME TOTAL TIME MAXIMUM AVERAGE MAXIMUM AVERAGE MAXIMUM CP2K 1 1.0 0.029 0.029 88.154 88.154 qs_mol_dyn_low 1 2.0 0.003 0.003 87.539 87.539 qs_forces 11 3.9 0.001 0.001 87.499 87.499 qs_energies 11 4.9 0.001 0.001 81.503 81.503 scf_env_do_scf 11 5.9 0.001 0.001 70.893 70.893 velocity_verlet 10 3.0 0.002 0.002 57.269 57.269 scf_env_do_scf_inner_loop 108 6.5 0.012 0.012 53.852 53.852 rebuild_ks_matrix 119 8.3 0.001 0.001 20.182 20.182 qs_ks_build_kohn_sham_matrix 119 9.3 0.013 0.013 20.182 20.182 qs_scf_new_mos 108 7.5 0.001 0.001 19.855 19.855 qs_scf_loop_do_ot 108 8.5 0.001 0.001 19.854 19.854 dbcsr_multiply_generic 2286 12.5 0.158 0.158 19.654 19.654 qs_rho_update_rho_low 119 7.7 0.001 0.001 19.191 19.191 calculate_rho_elec 119 8.7 0.949 0.949 19.190 19.190 qs_ks_update_qs_env 119 7.6 0.001 0.001 18.525 18.525 ot_scf_mini 108 9.5 0.002 0.002 18.482 18.482 init_scf_loop 11 6.9 0.000 0.000 16.920 16.920 
grid_collocate_task_list 119 9.7 15.022 15.022 15.022 15.022 prepare_preconditioner 11 7.9 0.000 0.000 14.502 14.502 make_preconditioner 11 8.9 0.000 0.000 14.502 14.502 make_full_inverse_cholesky 11 9.9 0.000 0.000 13.336 13.336 sum_up_and_integrate 119 10.3 0.536 0.536 13.040 13.040 integrate_v_rspace 119 11.3 0.092 0.092 12.504 12.504 ot_mini 108 10.5 0.001 0.001 11.828 11.828 make_m2s 4572 13.5 0.048 0.048 10.909 10.909 grid_integrate_task_list 119 12.3 10.554 10.554 10.554 10.554 qs_energies_init_hamiltonians 11 5.9 0.000 0.000 6.280 6.280 qs_ot_get_derivative 108 11.5 0.001 0.001 6.260 6.260 dbcsr_make_dense_low 5837 15.5 0.068 0.068 5.976 5.976 make_dense_data 5837 16.5 5.351 5.351 5.893 5.893 pw_transfer 1439 11.6 0.068 0.068 5.807 5.807 fft_wrap_pw1pw2 1201 12.6 0.006 0.006 5.575 5.575 ot_diis_step 108 11.5 0.004 0.004 5.564 5.564 make_images 4572 14.5 2.181 2.181 5.424 5.424 dbcsr_make_images_dense 3978 14.8 0.018 0.018 5.184 5.184 apply_preconditioner_dbcsr 119 12.6 0.000 0.000 5.038 5.038 apply_single 119 13.6 0.000 0.000 5.038 5.038 multiply_cannon 2286 13.5 0.185 0.185 4.900 4.900 fft_wrap_pw1pw2_140 487 13.2 0.443 0.443 4.763 4.763 cp_fm_cholesky_decompose 22 10.9 4.710 4.710 4.710 4.710 multiply_cannon_loop 2286 14.5 0.054 0.054 4.423 4.423 cp_fm_cholesky_invert 11 10.9 4.386 4.386 4.386 4.386 multiply_cannon_multrec 2286 15.5 4.310 4.310 4.368 4.368 init_scf_run 11 5.9 0.002 0.002 3.741 3.741 scf_env_initial_rho_setup 11 6.9 0.001 0.001 3.739 3.739 dbcsr_complete_redistribute 329 12.2 1.926 1.926 3.706 3.706 dbcsr_copy 2102 12.0 0.227 0.227 3.683 3.683 dbcsr_copy_into_existing 22 7.9 3.422 3.422 3.422 3.422 build_core_hamiltonian_matrix_ 11 4.9 0.001 0.001 3.373 3.373 wfi_extrapolate 11 7.9 0.001 0.001 3.272 3.272 qs_env_update_s_mstruct 11 6.9 0.000 0.000 3.265 3.265 qs_ot_get_p 119 10.4 0.001 0.001 3.225 3.225 density_rs2pw 119 9.7 0.004 0.004 3.219 3.219 copy_dbcsr_to_fm 153 11.3 0.002 0.002 3.058 3.058 qs_create_task_list 11 7.9 0.000 0.000 
2.888 2.888 generate_qs_task_list 11 8.9 1.967 1.967 2.888 2.888 fft3d_s 1202 14.6 2.804 2.804 2.809 2.809 qs_ks_update_qs_env_forces 11 4.9 0.000 0.000 2.620 2.620 build_core_hamiltonian_matrix 11 6.9 0.001 0.001 2.602 2.602 transfer_dbcsr_to_fm 11 10.9 0.000 0.000 2.393 2.393 dbcsr_data_release 279534 16.0 2.077 2.077 2.077 2.077 pw_poisson_solve 119 10.3 0.352 0.352 2.037 2.037 qs_ot_get_derivative_diag 49 12.0 0.001 0.001 1.979 1.979 qs_ot_get_derivative_taylor 59 13.0 0.001 0.001 1.910 1.910 qs_ot_p2m_diag 50 11.0 0.155 0.155 1.876 1.876 potential_pw2rs 119 12.3 0.047 0.047 1.858 1.858 copy_fm_to_dbcsr 176 11.2 0.001 0.001 1.829 1.829 cp_fm_upper_to_full 72 14.2 1.777 1.777 1.777 1.777 ------------------------------------------------------------------------------- From /workspace/artifacts/H2O-64_32mpi.out: ------------------------------------------------------------------------------- - - - T I M I N G - - - ------------------------------------------------------------------------------- SUBROUTINE CALLS ASD SELF TIME TOTAL TIME MAXIMUM AVERAGE MAXIMUM AVERAGE MAXIMUM CP2K 1 1.0 0.007 0.022 43.799 43.808 qs_mol_dyn_low 1 2.0 0.003 0.004 43.694 43.698 qs_forces 11 3.9 0.001 0.001 43.654 43.654 qs_energies 11 4.9 0.001 0.001 40.828 40.830 scf_env_do_scf 11 5.9 0.001 0.002 37.445 37.446 scf_env_do_scf_inner_loop 108 6.5 0.003 0.019 34.435 34.435 velocity_verlet 10 3.0 0.001 0.003 26.286 26.287 rebuild_ks_matrix 119 8.3 0.001 0.001 15.746 15.846 qs_ks_build_kohn_sham_matrix 119 9.3 0.015 0.016 15.745 15.845 qs_ks_update_qs_env 119 7.6 0.001 0.001 14.018 14.112 dbcsr_multiply_generic 2286 12.5 0.070 0.073 13.083 13.254 qs_rho_update_rho_low 119 7.7 0.001 0.001 12.763 12.776 calculate_rho_elec 119 8.7 0.030 0.034 12.763 12.775 sum_up_and_integrate 119 10.3 0.029 0.033 11.633 11.657 integrate_v_rspace 119 11.3 0.004 0.006 11.603 11.631 qs_scf_new_mos 108 7.5 0.001 0.001 10.254 10.340 qs_scf_loop_do_ot 108 8.5 0.001 0.001 10.254 10.339 multiply_cannon 2286 13.5 0.149 
0.155 9.662 9.970 ot_scf_mini 108 9.5 0.002 0.002 9.610 9.691 multiply_cannon_loop 2286 14.5 0.089 0.099 9.141 9.510 grid_collocate_task_list 119 9.7 8.895 9.173 8.895 9.173 mp_waitall_1 169478 16.3 8.187 8.912 8.187 8.912 grid_integrate_task_list 119 12.3 7.952 8.135 7.952 8.135 multiply_cannon_metrocomm3 18288 15.5 0.036 0.039 5.343 5.978 ot_mini 108 10.5 0.001 0.001 5.549 5.640 rs_pw_transfer 974 11.9 0.011 0.012 4.267 4.600 density_rs2pw 119 9.7 0.005 0.006 3.495 3.833 pw_transfer 1439 11.6 0.094 0.104 3.300 3.368 potential_pw2rs 119 12.3 0.007 0.007 3.206 3.232 fft_wrap_pw1pw2 1201 12.6 0.009 0.010 3.136 3.194 multiply_cannon_multrec 18288 15.5 2.949 3.132 2.959 3.141 init_scf_loop 11 6.9 0.000 0.000 2.997 2.997 qs_ot_get_derivative 108 11.5 0.001 0.001 2.826 2.908 fft_wrap_pw1pw2_140 487 13.2 0.252 0.303 2.656 2.777 apply_preconditioner_dbcsr 119 12.6 0.000 0.000 2.649 2.741 apply_single 119 13.6 0.000 0.000 2.648 2.741 ot_diis_step 108 11.5 0.003 0.004 2.702 2.702 init_scf_run 11 5.9 0.000 0.004 2.407 2.408 scf_env_initial_rho_setup 11 6.9 0.000 0.003 2.407 2.407 make_m2s 4572 13.5 0.045 0.047 2.305 2.384 fft3d_ps 1201 14.6 1.172 1.360 2.273 2.357 wfi_extrapolate 11 7.9 0.001 0.001 2.193 2.193 make_images 4572 14.5 0.115 0.118 1.989 2.048 qs_ks_update_qs_env_forces 11 4.9 0.000 0.000 1.866 1.877 mp_waitany 9880 13.7 1.389 1.767 1.389 1.767 qs_ot_get_p 119 10.4 0.001 0.001 1.244 1.341 rs_pw_transfer_RS2PW_140 130 11.5 0.202 0.249 1.006 1.340 rs_pw_transfer_PW2RS_140 130 13.9 0.341 0.397 1.246 1.278 make_images_data 4572 15.5 0.034 0.039 1.126 1.256 prepare_preconditioner 11 7.9 0.000 0.000 1.211 1.234 make_preconditioner 11 8.9 0.000 0.000 1.211 1.234 make_full_inverse_cholesky 11 9.9 0.000 0.000 1.100 1.122 hybrid_alltoall_any 4725 16.4 0.062 0.176 0.967 1.081 mp_alltoall_z22v 1201 16.6 0.868 1.070 0.868 1.070 mp_sum_l 11218 13.2 0.684 0.950 0.684 0.950 qs_ot_get_derivative_diag 49 12.0 0.001 0.001 0.902 0.942 multiply_cannon_metrocomm1 18288 15.5 0.018 
0.020 0.557 0.938 build_core_hamiltonian_matrix_ 11 4.9 0.000 0.000 0.857 0.935 cp_dbcsr_sm_fm_multiply 37 9.5 0.001 0.001 0.928 0.930 qs_ot_get_derivative_taylor 59 13.0 0.001 0.001 0.881 0.924 mp_sum_d 4129 12.0 0.598 0.896 0.598 0.896 rs_pw_transfer_PW2RS_50 119 14.3 0.268 0.299 0.811 0.880 ------------------------------------------------------------------------------- Plot: name="H2O-64_timings_32omp", title="Timings of H2O-64 with 32 OpenMP Threads", ylabel="time [s]" PlotPoint: plot="H2O-64_timings_32omp", name="rest", label="rest", y=43.82099999999999, yerr=0.0 PlotPoint: plot="H2O-64_timings_32omp", name="grid_collocate_task_list", label="grid_collocate_task_list", y=15.022, yerr=0.0 PlotPoint: plot="H2O-64_timings_32omp", name="grid_integrate_task_list", label="grid_integrate_task_list", y=10.554, yerr=0.0 PlotPoint: plot="H2O-64_timings_32omp", name="make_dense_data", label="make_dense_data", y=5.351, yerr=0.0 PlotPoint: plot="H2O-64_timings_32omp", name="cp_fm_cholesky_decompose", label="cp_fm_cholesky_decompose", y=4.71, yerr=0.0 PlotPoint: plot="H2O-64_timings_32omp", name="cp_fm_cholesky_invert", label="cp_fm_cholesky_invert", y=4.386, yerr=0.0 PlotPoint: plot="H2O-64_timings_32omp", name="multiply_cannon_multrec", label="multiply_cannon_multrec", y=4.31, yerr=0.0 PlotPoint: plot="H2O-64_timings_32omp", name="mp_waitany", label="mp_waitany", y=0.0, yerr=0.0 PlotPoint: plot="H2O-64_timings_32omp", name="mp_waitall_1", label="mp_waitall_1", y=0.0, yerr=0.0 Plot: name="H2O-64_timings_32mpi", title="Timings of H2O-64 with 32 MPI Ranks", ylabel="time [s]" PlotPoint: plot="H2O-64_timings_32mpi", name="rest", label="rest", y=14.427, yerr=0.0 PlotPoint: plot="H2O-64_timings_32mpi", name="grid_collocate_task_list", label="grid_collocate_task_list", y=8.895, yerr=0.0 PlotPoint: plot="H2O-64_timings_32mpi", name="grid_integrate_task_list", label="grid_integrate_task_list", y=7.952, yerr=0.0 PlotPoint: plot="H2O-64_timings_32mpi", name="make_dense_data", 
label="make_dense_data", y=0.0, yerr=0.0 PlotPoint: plot="H2O-64_timings_32mpi", name="cp_fm_cholesky_decompose", label="cp_fm_cholesky_decompose", y=0.0, yerr=0.0 PlotPoint: plot="H2O-64_timings_32mpi", name="cp_fm_cholesky_invert", label="cp_fm_cholesky_invert", y=0.0, yerr=0.0 PlotPoint: plot="H2O-64_timings_32mpi", name="multiply_cannon_multrec", label="multiply_cannon_multrec", y=2.949, yerr=0.0 PlotPoint: plot="H2O-64_timings_32mpi", name="mp_waitany", label="mp_waitany", y=1.389, yerr=0.0 PlotPoint: plot="H2O-64_timings_32mpi", name="mp_waitall_1", label="mp_waitall_1", y=8.187, yerr=0.0 Running H2O-64_nonortho.inp with 1 threads and 32 ranks... done. Running H2O-64_nonortho.inp with 32 threads and 1 ranks... done. From /workspace/artifacts/H2O-64_nonortho_32omp.out: ------------------------------------------------------------------------------- - - - T I M I N G - - - ------------------------------------------------------------------------------- SUBROUTINE CALLS ASD SELF TIME TOTAL TIME MAXIMUM AVERAGE MAXIMUM AVERAGE MAXIMUM CP2K 1 1.0 0.029 0.029 111.611 111.611 qs_mol_dyn_low 1 2.0 0.003 0.003 110.937 110.937 qs_forces 11 3.9 0.001 0.001 110.896 110.896 qs_energies 11 4.9 0.001 0.001 103.254 103.254 scf_env_do_scf 11 5.9 0.001 0.001 90.588 90.588 scf_env_do_scf_inner_loop 96 6.5 0.010 0.010 72.103 72.103 velocity_verlet 10 3.0 0.002 0.002 71.595 71.595 rebuild_ks_matrix 107 8.3 0.001 0.001 33.457 33.457 qs_ks_build_kohn_sham_matrix 107 9.3 0.012 0.012 33.456 33.456 qs_rho_update_rho_low 107 7.7 0.001 0.001 31.345 31.345 calculate_rho_elec 107 8.7 0.858 0.858 31.344 31.344 qs_ks_update_qs_env 107 7.6 0.001 0.001 30.062 30.062 grid_collocate_task_list 107 9.7 27.485 27.485 27.485 27.485 sum_up_and_integrate 107 10.3 0.474 0.474 26.717 26.717 integrate_v_rspace 107 11.3 0.084 0.084 26.242 26.242 grid_integrate_task_list 107 12.3 24.413 24.413 24.413 24.413 init_scf_loop 11 6.9 0.000 0.000 18.361 18.361 dbcsr_multiply_generic 1966 12.4 0.148 0.148 18.088 
18.088 qs_scf_new_mos 96 7.5 0.001 0.001 17.499 17.499 qs_scf_loop_do_ot 96 8.5 0.001 0.001 17.498 17.498 ot_scf_mini 96 9.5 0.002 0.002 16.280 16.280 prepare_preconditioner 11 7.9 0.000 0.000 14.281 14.281 make_preconditioner 11 8.9 0.000 0.000 14.281 14.281 make_full_inverse_cholesky 11 9.9 0.000 0.000 13.128 13.128 ot_mini 96 10.5 0.001 0.001 10.553 10.553 make_m2s 3932 13.4 0.041 0.041 9.872 9.872 qs_energies_init_hamiltonians 11 5.9 0.000 0.000 6.720 6.720 qs_ot_get_derivative 96 11.5 0.001 0.001 5.576 5.576 pw_transfer 1295 11.6 0.059 0.059 5.413 5.413 dbcsr_make_dense_low 4961 15.5 0.067 0.067 5.298 5.298 init_scf_run 11 5.9 0.002 0.002 5.284 5.284 scf_env_initial_rho_setup 11 6.9 0.001 0.001 5.282 5.282 make_dense_data 4961 16.5 4.722 4.722 5.218 5.218 fft_wrap_pw1pw2 1081 12.6 0.006 0.006 5.211 5.211 make_images 3932 14.4 1.956 1.956 4.979 4.979 ot_diis_step 96 11.5 0.003 0.003 4.975 4.975 cp_fm_cholesky_decompose 22 10.9 4.671 4.671 4.671 4.671 wfi_extrapolate 11 7.9 0.001 0.001 4.640 4.640 dbcsr_make_images_dense 3386 14.7 0.015 0.015 4.635 4.635 apply_preconditioner_dbcsr 107 12.6 0.000 0.000 4.606 4.606 apply_single 107 13.6 0.000 0.000 4.606 4.606 fft_wrap_pw1pw2_140 439 13.2 0.516 0.516 4.467 4.467 multiply_cannon 1966 13.4 0.158 0.158 4.463 4.463 qs_ks_update_qs_env_forces 11 4.9 0.000 0.000 4.257 4.257 cp_fm_cholesky_invert 11 10.9 4.141 4.141 4.141 4.141 multiply_cannon_loop 1966 14.4 0.044 0.044 4.052 4.052 multiply_cannon_multrec 1966 15.4 3.963 3.963 4.007 4.007 qs_env_update_s_mstruct 11 6.9 0.000 0.000 3.756 3.756 dbcsr_complete_redistribute 317 12.2 1.907 1.907 3.729 3.729 dbcsr_copy 1855 11.9 0.224 0.224 3.619 3.619 build_core_hamiltonian_matrix_ 11 4.9 0.001 0.001 3.384 3.384 qs_create_task_list 11 7.9 0.000 0.000 3.374 3.374 generate_qs_task_list 11 8.9 2.418 2.418 3.374 3.374 dbcsr_copy_into_existing 22 7.9 3.343 3.343 3.344 3.344 density_rs2pw 107 9.7 0.003 0.003 3.002 3.002 copy_dbcsr_to_fm 147 11.2 0.002 0.002 2.995 2.995 fft3d_s 1082 
14.6 2.600 2.600 2.605 2.605 qs_ot_get_p 107 10.4 0.001 0.001 2.578 2.578 build_core_hamiltonian_matrix 11 6.9 0.001 0.001 2.537 2.537 transfer_dbcsr_to_fm 11 10.9 0.000 0.000 2.386 2.386 ------------------------------------------------------------------------------- From /workspace/artifacts/H2O-64_nonortho_32mpi.out: ------------------------------------------------------------------------------- - - - T I M I N G - - - ------------------------------------------------------------------------------- SUBROUTINE CALLS ASD SELF TIME TOTAL TIME MAXIMUM AVERAGE MAXIMUM AVERAGE MAXIMUM CP2K 1 1.0 0.007 0.020 69.963 69.974 qs_mol_dyn_low 1 2.0 0.003 0.003 69.740 69.757 qs_forces 11 3.9 0.001 0.001 69.698 69.699 qs_energies 11 4.9 0.001 0.001 65.008 65.009 scf_env_do_scf 11 5.9 0.000 0.002 60.145 60.145 scf_env_do_scf_inner_loop 96 6.5 0.002 0.016 55.613 55.614 velocity_verlet 10 3.0 0.002 0.003 41.581 41.583 rebuild_ks_matrix 107 8.3 0.000 0.001 30.140 30.231 qs_ks_build_kohn_sham_matrix 107 9.3 0.014 0.015 30.140 30.231 qs_ks_update_qs_env 107 7.6 0.001 0.001 26.541 26.623 sum_up_and_integrate 107 10.3 0.030 0.036 26.325 26.347 integrate_v_rspace 107 11.3 0.004 0.005 26.294 26.318 qs_rho_update_rho_low 107 7.7 0.001 0.001 25.274 25.279 calculate_rho_elec 107 8.7 0.027 0.027 25.273 25.278 grid_integrate_task_list 107 12.3 22.621 23.019 22.621 23.019 grid_collocate_task_list 107 9.7 21.754 22.119 21.754 22.119 dbcsr_multiply_generic 1966 12.4 0.063 0.065 12.174 12.349 qs_scf_new_mos 96 7.5 0.001 0.001 9.429 9.523 qs_scf_loop_do_ot 96 8.5 0.001 0.001 9.429 9.522 multiply_cannon 1966 13.4 0.131 0.145 9.065 9.459 multiply_cannon_loop 1966 14.4 0.085 0.104 8.616 8.995 ot_scf_mini 96 9.5 0.002 0.002 8.867 8.954 mp_waitall_1 146670 16.2 7.740 8.307 7.740 8.307 multiply_cannon_metrocomm3 15728 15.4 0.034 0.039 5.105 5.738 ot_mini 96 10.5 0.001 0.001 5.176 5.269 rs_pw_transfer 878 11.9 0.010 0.012 4.030 4.595 init_scf_loop 11 6.9 0.000 0.000 4.518 4.518 init_scf_run 11 5.9 0.000 
0.004 3.852 3.852 scf_env_initial_rho_setup 11 6.9 0.000 0.004 3.852 3.852 density_rs2pw 107 9.7 0.005 0.005 3.199 3.774 qs_ks_update_qs_env_forces 11 4.9 0.000 0.000 3.724 3.735 wfi_extrapolate 11 7.9 0.001 0.001 3.496 3.497 pw_transfer 1295 11.6 0.087 0.102 2.972 3.027 multiply_cannon_multrec 15728 15.4 2.723 3.005 2.732 3.015 potential_pw2rs 107 12.3 0.006 0.007 2.982 3.000 fft_wrap_pw1pw2 1081 12.6 0.008 0.009 2.822 2.878 qs_ot_get_derivative 96 11.5 0.001 0.001 2.634 2.717 apply_preconditioner_dbcsr 107 12.6 0.000 0.000 2.486 2.593 apply_single 107 13.6 0.000 0.000 2.486 2.593 ot_diis_step 96 11.5 0.003 0.003 2.514 2.514 fft_wrap_pw1pw2_140 439 13.2 0.233 0.271 2.404 2.507 make_m2s 3932 13.4 0.040 0.047 2.099 2.142 fft3d_ps 1081 14.6 1.062 1.161 2.019 2.112 mp_waitany 8968 13.7 1.356 1.936 1.356 1.936 make_images 3932 14.4 0.103 0.109 1.816 1.862 mp_alltoall_d11v 1998 13.7 0.875 1.556 0.875 1.556 rs_pw_transfer_RS2PW_140 118 11.5 0.161 0.188 0.947 1.515 ------------------------------------------------------------------------------- Plot: name="H2O-64_nonortho_timings_32omp", title="Timings of H2O-64_nonortho with 32 OpenMP Threads", ylabel="time [s]" PlotPoint: plot="H2O-64_nonortho_timings_32omp", name="rest", label="rest", y=42.21600000000001, yerr=0.0 PlotPoint: plot="H2O-64_nonortho_timings_32omp", name="grid_collocate_task_list", label="grid_collocate_task_list", y=27.485, yerr=0.0 PlotPoint: plot="H2O-64_nonortho_timings_32omp", name="grid_integrate_task_list", label="grid_integrate_task_list", y=24.413, yerr=0.0 PlotPoint: plot="H2O-64_nonortho_timings_32omp", name="make_dense_data", label="make_dense_data", y=4.722, yerr=0.0 PlotPoint: plot="H2O-64_nonortho_timings_32omp", name="cp_fm_cholesky_decompose", label="cp_fm_cholesky_decompose", y=4.671, yerr=0.0 PlotPoint: plot="H2O-64_nonortho_timings_32omp", name="cp_fm_cholesky_invert", label="cp_fm_cholesky_invert", y=4.141, yerr=0.0 PlotPoint: plot="H2O-64_nonortho_timings_32omp", 
name="multiply_cannon_multrec", label="multiply_cannon_multrec", y=3.963, yerr=0.0 PlotPoint: plot="H2O-64_nonortho_timings_32omp", name="mp_waitall_1", label="mp_waitall_1", y=0.0, yerr=0.0 PlotPoint: plot="H2O-64_nonortho_timings_32omp", name="mp_waitany", label="mp_waitany", y=0.0, yerr=0.0 Plot: name="H2O-64_nonortho_timings_32mpi", title="Timings of H2O-64_nonortho with 32 MPI Ranks", ylabel="time [s]" PlotPoint: plot="H2O-64_nonortho_timings_32mpi", name="rest", label="rest", y=13.768999999999991, yerr=0.0 PlotPoint: plot="H2O-64_nonortho_timings_32mpi", name="grid_collocate_task_list", label="grid_collocate_task_list", y=21.754, yerr=0.0 PlotPoint: plot="H2O-64_nonortho_timings_32mpi", name="grid_integrate_task_list", label="grid_integrate_task_list", y=22.621, yerr=0.0 PlotPoint: plot="H2O-64_nonortho_timings_32mpi", name="make_dense_data", label="make_dense_data", y=0.0, yerr=0.0 PlotPoint: plot="H2O-64_nonortho_timings_32mpi", name="cp_fm_cholesky_decompose", label="cp_fm_cholesky_decompose", y=0.0, yerr=0.0 PlotPoint: plot="H2O-64_nonortho_timings_32mpi", name="cp_fm_cholesky_invert", label="cp_fm_cholesky_invert", y=0.0, yerr=0.0 PlotPoint: plot="H2O-64_nonortho_timings_32mpi", name="multiply_cannon_multrec", label="multiply_cannon_multrec", y=2.723, yerr=0.0 PlotPoint: plot="H2O-64_nonortho_timings_32mpi", name="mp_waitall_1", label="mp_waitall_1", y=7.74, yerr=0.0 PlotPoint: plot="H2O-64_nonortho_timings_32mpi", name="mp_waitany", label="mp_waitany", y=1.356, yerr=0.0 Running H2O-hyb.inp with 1 threads and 32 ranks... done. Running H2O-hyb.inp with 32 threads and 1 ranks... done. 
From /workspace/artifacts/H2O-hyb_32omp.out: ------------------------------------------------------------------------------- - - - T I M I N G - - - ------------------------------------------------------------------------------- SUBROUTINE CALLS ASD SELF TIME TOTAL TIME MAXIMUM AVERAGE MAXIMUM AVERAGE MAXIMUM CP2K 1 1.0 0.188 0.188 104.996 104.996 qs_energies 1 2.0 0.000 0.000 104.181 104.181 scf_env_do_scf 1 3.0 0.000 0.000 103.035 103.035 qs_ks_update_qs_env 8 5.0 0.000 0.000 97.984 97.984 rebuild_ks_matrix 7 6.0 0.000 0.000 97.929 97.929 qs_ks_build_kohn_sham_matrix 7 7.0 0.001 0.001 97.929 97.929 hfx_ks_matrix 7 8.0 0.000 0.000 88.921 88.921 integrate_four_center 7 9.0 1.903 1.903 88.890 88.890 integrate_four_center_main 7 10.0 0.608 0.608 81.092 81.092 integrate_four_center_bin 446 11.0 80.485 80.485 80.485 80.485 scf_env_do_scf_inner_loop 7 4.0 0.001 0.001 56.333 56.333 init_scf_loop 1 4.0 0.000 0.000 46.691 46.691 integrate_four_center_load 7 10.0 0.000 0.000 5.651 5.651 hfx_load_balance 1 11.0 0.013 0.013 5.651 5.651 qs_vxc_create 14 8.0 0.000 0.000 3.136 3.136 xc_vxc_pw_create 14 9.0 0.116 0.116 3.136 3.136 hfx_load_balance_bin 1 12.0 2.817 2.817 2.817 2.817 hfx_load_balance_count 1 12.0 2.805 2.805 2.805 2.805 calculate_rho_elec 15 7.4 0.118 0.118 2.420 2.420 xc_rho_set_and_dset_create 14 10.0 0.111 0.111 2.405 2.405 prepare_preconditioner 1 5.0 0.000 0.000 2.357 2.357 make_preconditioner 1 6.0 0.000 0.000 2.357 2.357 admm_mo_calc_rho_aux 7 8.0 0.000 0.000 2.195 2.195 ------------------------------------------------------------------------------- From /workspace/artifacts/H2O-hyb_32mpi.out: ------------------------------------------------------------------------------- - - - T I M I N G - - - ------------------------------------------------------------------------------- SUBROUTINE CALLS ASD SELF TIME TOTAL TIME MAXIMUM AVERAGE MAXIMUM AVERAGE MAXIMUM CP2K 1 1.0 0.215 0.235 95.178 95.189 qs_energies 1 2.0 0.000 0.000 94.840 94.847 scf_env_do_scf 1 3.0 
0.000 0.000 94.490 94.490 qs_ks_update_qs_env 8 5.0 0.000 0.000 92.553 92.553 rebuild_ks_matrix 7 6.0 0.000 0.000 92.545 92.545 qs_ks_build_kohn_sham_matrix 7 7.0 0.001 0.001 92.545 92.545 hfx_ks_matrix 7 8.0 0.000 0.000 87.337 87.338 integrate_four_center 7 9.0 0.052 0.333 87.328 87.329 integrate_four_center_main 7 10.0 0.003 0.003 78.637 80.005 integrate_four_center_bin 448 11.0 78.634 80.002 78.634 80.002 scf_env_do_scf_inner_loop 7 4.0 0.000 0.001 53.067 53.067 init_scf_loop 1 4.0 0.000 0.000 41.422 41.426 integrate_four_center_load 7 10.0 0.000 0.000 5.708 5.708 hfx_load_balance 1 11.0 0.001 0.001 5.708 5.708 mp_sync 70 11.3 2.317 3.410 2.317 3.410 hfx_load_balance_bin 1 12.0 2.773 2.940 2.773 2.940 hfx_load_balance_count 1 12.0 2.761 2.815 2.761 2.815 qs_vxc_create 14 8.0 0.000 0.000 2.348 2.348 xc_vxc_pw_create 14 9.0 0.007 0.008 2.348 2.348 xc_rho_set_and_dset_create 14 10.0 0.010 0.011 1.897 2.010 ------------------------------------------------------------------------------- Plot: name="H2O-hyb_timings_32omp", title="Timings of H2O-hyb with 32 OpenMP Threads", ylabel="time [s]" PlotPoint: plot="H2O-hyb_timings_32omp", name="rest", label="rest", y=16.189999999999984, yerr=0.0 PlotPoint: plot="H2O-hyb_timings_32omp", name="integrate_four_center_bin", label="integrate_four_center_bin", y=80.485, yerr=0.0 PlotPoint: plot="H2O-hyb_timings_32omp", name="hfx_load_balance_bin", label="hfx_load_balance_bin", y=2.817, yerr=0.0 PlotPoint: plot="H2O-hyb_timings_32omp", name="hfx_load_balance_count", label="hfx_load_balance_count", y=2.805, yerr=0.0 PlotPoint: plot="H2O-hyb_timings_32omp", name="integrate_four_center", label="integrate_four_center", y=1.903, yerr=0.0 PlotPoint: plot="H2O-hyb_timings_32omp", name="integrate_four_center_main", label="integrate_four_center_main", y=0.608, yerr=0.0 PlotPoint: plot="H2O-hyb_timings_32omp", name="CP2K", label="CP2K", y=0.188, yerr=0.0 PlotPoint: plot="H2O-hyb_timings_32omp", name="mp_sync", label="mp_sync", y=0.0, yerr=0.0 
Plot: name="H2O-hyb_timings_32mpi", title="Timings of H2O-hyb with 32 MPI Ranks", ylabel="time [s]"
PlotPoint: plot="H2O-hyb_timings_32mpi", name="rest", label="rest", y=8.423000000000002, yerr=0.0
PlotPoint: plot="H2O-hyb_timings_32mpi", name="integrate_four_center_bin", label="integrate_four_center_bin", y=78.634, yerr=0.0
PlotPoint: plot="H2O-hyb_timings_32mpi", name="hfx_load_balance_bin", label="hfx_load_balance_bin", y=2.773, yerr=0.0
PlotPoint: plot="H2O-hyb_timings_32mpi", name="hfx_load_balance_count", label="hfx_load_balance_count", y=2.761, yerr=0.0
PlotPoint: plot="H2O-hyb_timings_32mpi", name="integrate_four_center", label="integrate_four_center", y=0.052, yerr=0.0
PlotPoint: plot="H2O-hyb_timings_32mpi", name="integrate_four_center_main", label="integrate_four_center_main", y=0.003, yerr=0.0
PlotPoint: plot="H2O-hyb_timings_32mpi", name="CP2K", label="CP2K", y=0.215, yerr=0.0
PlotPoint: plot="H2O-hyb_timings_32mpi", name="mp_sync", label="mp_sync", y=2.317, yerr=0.0

Running GW_PBE_4benzene.inp with 1 threads and 32 ranks... done.
Running GW_PBE_4benzene.inp with 32 threads and 1 ranks... done.
From /workspace/artifacts/GW_PBE_4benzene_32omp.out:
-------------------------------------------------------------------------------
-                                                                             -
-                                T I M I N G                                  -
-                                                                             -
-------------------------------------------------------------------------------
SUBROUTINE CALLS ASD SELF TIME TOTAL TIME
MAXIMUM AVERAGE MAXIMUM AVERAGE MAXIMUM
CP2K 1 1.0 0.013 0.013 90.802 90.802
qs_energies 1 2.0 0.000 0.000 90.429 90.429
mp2_main 1 3.0 0.000 0.000 87.605 87.605
mp2_gpw_main 1 4.0 0.000 0.000 87.501 87.501
rpa_ri_compute_en 1 5.0 0.000 0.000 83.966 83.966
rpa_num_int 1 6.0 0.001 0.001 83.960 83.960
compute_mat_P_omega 1 7.0 0.003 0.003 74.382 74.382
compute_mat_P_omega_contract 10 8.0 8.793 8.793 74.180 74.180
dbt_total 2336 9.6 0.011 0.011 60.500 60.500
dbt_contract 787 11.0 0.034 0.034 53.826 53.826
dbt_tas_total 1149 12.2 0.206 0.206 53.001 53.001
dbt_tas_multiply 807 12.1 0.002 0.002 51.575 51.575
dbt_tas_dbm 807 14.1 0.003 0.003 45.078 45.078
dbm_multiply 807 16.1 45.070 45.070 45.070 45.070
dbt_tas_mm_1N 524 15.1 0.001 0.001 37.713 37.713
compute_mat_P_omega_calc_M_vir 250 9.0 0.001 0.001 36.226 36.226
compute_mat_P_omega_calc_M_occ 250 9.0 8.812 8.812 16.847 16.847
compute_mat_P_omega_calc_P_t 250 9.0 0.001 0.001 6.856 6.856
dbt_tas_mm_2 251 15.0 0.001 0.001 5.921 5.921
dbt_copy 1103 10.7 0.054 0.054 5.231 5.231
compute_QP_energies 1 7.0 0.000 0.000 4.955 4.955
compute_self_energy_cubic_gw 1 8.0 0.052 0.052 4.954 4.954
contract_cubic_gw 21 9.0 0.000 0.000 3.996 3.996
mp2_ri_gpw_compute_in 1 5.0 0.001 0.001 3.528 3.528
dbt_tas_reserve_blocks_index 3261 14.3 0.143 0.143 3.260 3.260
dbm_reserve_blocks 3628 15.3 3.178 3.178 3.178 3.178
scf_env_do_scf 1 3.0 0.000 0.000 2.697 2.697
scf_env_do_scf_inner_loop 17 4.0 0.002 0.002 2.696 2.696
dbt_reserve_blocks_index 2280 13.1 0.049 0.049 2.475 2.475
dbt_reserve_blocks_index_array 2222 12.2 0.009 0.009 2.439 2.439
convert_to_new_pgrid 2421 14.1 0.112 0.112 2.204 2.204
dbt_tas_reshape 367 15.0 0.006 0.006 2.159 2.159
dbt_crop 1042 12.0 1.367 1.367 2.148 2.148
dbt_tas_copy 574 11.4 1.293 1.293 2.121 2.121
dbm_copy 1614 15.1 2.092 2.092 2.092 2.092
compute_W_cubic_GW 10 7.0 0.004 0.004 1.906 1.906
rpa_num_int_RPA_matrix_operati 10 7.0 0.000 0.000 1.889 1.889
-------------------------------------------------------------------------------

From /workspace/artifacts/GW_PBE_4benzene_32mpi.out:
-------------------------------------------------------------------------------
-                                                                             -
-                                T I M I N G                                  -
-                                                                             -
-------------------------------------------------------------------------------
SUBROUTINE CALLS ASD SELF TIME TOTAL TIME
MAXIMUM AVERAGE MAXIMUM AVERAGE MAXIMUM
CP2K 1 1.0 0.005 0.026 30.764 30.774
qs_energies 1 2.0 0.000 0.000 30.670 30.672
mp2_main 1 3.0 0.000 0.001 29.759 29.761
mp2_gpw_main 1 4.0 0.000 0.004 29.725 29.727
rpa_ri_compute_en 1 5.0 0.000 0.000 28.467 28.469
rpa_num_int 1 6.0 0.000 0.002 28.464 28.465
dbt_total 2336 9.6 0.011 0.012 25.260 25.267
compute_mat_P_omega 1 7.0 0.001 0.005 24.237 24.239
compute_mat_P_omega_contract 10 8.0 0.427 0.437 24.137 24.140
dbt_contract 787 11.0 0.025 0.026 18.868 18.870
dbt_tas_total 1149 12.2 0.052 0.058 16.906 16.907
dbt_tas_multiply 807 12.1 0.002 0.002 16.853 16.856
dbt_tas_dbm 807 14.1 0.003 0.003 12.598 12.600
dbm_multiply 807 16.1 9.901 10.786 9.901 10.786
compute_mat_P_omega_calc_P_t 250 9.0 0.001 0.001 7.204 7.204
compute_mat_P_omega_calc_M_occ 250 9.0 0.400 0.410 7.093 7.094
dbt_tas_mm_2 251 15.0 0.001 0.001 5.985 5.986
dbt_copy 1111 10.7 0.012 0.012 5.701 5.882
dbt_reshape 1098 11.7 2.239 2.373 5.443 5.619
compute_mat_P_omega_calc_M_vir 250 9.0 0.001 0.001 5.069 5.070
dbt_tas_mm_1N 524 15.1 0.001 0.002 4.577 4.939
mp_sync 8706 11.6 4.159 4.857 4.159 4.857
mp_waitall_2 3776 15.3 2.457 2.666 2.457 2.666
compute_QP_energies 1 7.0 0.000 0.000 2.636 2.637
compute_self_energy_cubic_gw 1 8.0 0.003 0.003 2.635 2.636
dbt_communicate_buffer 1098 12.7 0.054 0.057 2.513 2.631
contract_cubic_gw 21 9.0 0.000 0.000 2.091 2.091
dbt_reserve_blocks_index 2849 13.1 0.064 0.067 1.640 1.814
dbt_reserve_blocks_index_array 2791 12.2 0.011 0.012 1.632 1.804
dbt_tas_reserve_blocks_index 3300 14.5 0.117 0.125 1.611 1.781
dbm_reserve_blocks 3696 15.4 1.587 1.753 1.587 1.753
dbt_crop 1042 12.0 0.909 1.006 1.407 1.545
mp2_ri_gpw_compute_in 1 5.0 0.000 0.001 1.256 1.256
dbt_tas_replicate 396 14.1 0.529 0.682 1.073 1.134
convert_to_new_pgrid 2421 14.1 0.025 0.029 0.845 0.998
compute_mat_P_omega_copy_M_vir 250 9.0 0.001 0.001 0.981 0.989
dbm_copy 1608 15.1 0.814 0.968 0.814 0.968
parallel_gemm_fm 105 8.4 0.000 0.000 0.956 0.964
parallel_gemm_fm_cosma 105 9.4 0.956 0.964 0.956 0.964
compute_mat_P_omega_copy_M_occ 250 9.0 0.001 0.001 0.898 0.899
scf_env_do_scf 1 3.0 0.000 0.000 0.872 0.872
scf_env_do_scf_inner_loop 17 4.0 0.001 0.003 0.872 0.872
compute_W_cubic_GW 10 7.0 0.001 0.001 0.732 0.737
dbm_add 807 14.1 0.642 0.676 0.642 0.676
mp_max_i 1994 9.8 0.487 0.664 0.487 0.664
-------------------------------------------------------------------------------

Plot: name="GW_PBE_4benzene_timings_32omp", title="Timings of GW_PBE_4benzene with 32 OpenMP Threads", ylabel="time [s]"
PlotPoint: plot="GW_PBE_4benzene_timings_32omp", name="rest", label="rest", y=22.857000000000014, yerr=0.0
PlotPoint: plot="GW_PBE_4benzene_timings_32omp", name="dbm_multiply", label="dbm_multiply", y=45.07, yerr=0.0
PlotPoint: plot="GW_PBE_4benzene_timings_32omp", name="compute_mat_P_omega_calc_M_occ", label="compute_mat_P_omega_calc_M_occ", y=8.812, yerr=0.0
PlotPoint: plot="GW_PBE_4benzene_timings_32omp", name="compute_mat_P_omega_contract", label="compute_mat_P_omega_contract", y=8.793, yerr=0.0
PlotPoint: plot="GW_PBE_4benzene_timings_32omp", name="dbm_reserve_blocks", label="dbm_reserve_blocks", y=3.178, yerr=0.0
PlotPoint: plot="GW_PBE_4benzene_timings_32omp", name="dbm_copy", label="dbm_copy", y=2.092, yerr=0.0
PlotPoint: plot="GW_PBE_4benzene_timings_32omp", name="mp_waitall_2", label="mp_waitall_2", y=0.0, yerr=0.0
PlotPoint: plot="GW_PBE_4benzene_timings_32omp", name="mp_sync", label="mp_sync", y=0.0, yerr=0.0
PlotPoint: plot="GW_PBE_4benzene_timings_32omp", name="dbt_reshape", label="dbt_reshape", y=0.0, yerr=0.0

Plot: name="GW_PBE_4benzene_timings_32mpi", title="Timings of GW_PBE_4benzene with 32 MPI Ranks", ylabel="time [s]"
PlotPoint: plot="GW_PBE_4benzene_timings_32mpi", name="rest", label="rest", y=8.780000000000001, yerr=0.0
PlotPoint: plot="GW_PBE_4benzene_timings_32mpi", name="dbm_multiply", label="dbm_multiply", y=9.901, yerr=0.0
PlotPoint: plot="GW_PBE_4benzene_timings_32mpi", name="compute_mat_P_omega_calc_M_occ", label="compute_mat_P_omega_calc_M_occ", y=0.4, yerr=0.0
PlotPoint: plot="GW_PBE_4benzene_timings_32mpi", name="compute_mat_P_omega_contract", label="compute_mat_P_omega_contract", y=0.427, yerr=0.0
PlotPoint: plot="GW_PBE_4benzene_timings_32mpi", name="dbm_reserve_blocks", label="dbm_reserve_blocks", y=1.587, yerr=0.0
PlotPoint: plot="GW_PBE_4benzene_timings_32mpi", name="dbm_copy", label="dbm_copy", y=0.814, yerr=0.0
PlotPoint: plot="GW_PBE_4benzene_timings_32mpi", name="mp_waitall_2", label="mp_waitall_2", y=2.457, yerr=0.0
PlotPoint: plot="GW_PBE_4benzene_timings_32mpi", name="mp_sync", label="mp_sync", y=4.159, yerr=0.0
PlotPoint: plot="GW_PBE_4benzene_timings_32mpi", name="dbt_reshape", label="dbt_reshape", y=2.239, yerr=0.0

Running RI-HFX_H2O-32.inp with 1 threads and 32 ranks... done.
Running RI-HFX_H2O-32.inp with 32 threads and 1 ranks... done.
From /workspace/artifacts/RI-HFX_H2O-32_32omp.out:
-------------------------------------------------------------------------------
-                                                                             -
-                                T I M I N G                                  -
-                                                                             -
-------------------------------------------------------------------------------
SUBROUTINE CALLS ASD SELF TIME TOTAL TIME
MAXIMUM AVERAGE MAXIMUM AVERAGE MAXIMUM
CP2K 1 1.0 0.017 0.017 347.717 347.717
qs_forces 1 2.0 0.000 0.000 347.182 347.182
rebuild_ks_matrix 7 6.6 0.000 0.000 345.719 345.719
qs_ks_build_kohn_sham_matrix 7 7.6 0.001 0.001 345.719 345.719
hfx_ks_matrix 7 8.6 0.000 0.000 343.714 343.714
hfx_ri_update_ks 7 9.6 0.000 0.000 306.625 306.625
hfx_ri_update_ks_Pmat 7 10.6 30.524 30.524 306.621 306.621
dbt_total 841 11.0 0.005 0.005 291.160 291.160
dbt_contract 207 12.4 0.042 0.042 273.140 273.140
dbt_tas_total 375 13.4 1.404 1.404 272.867 272.867
qs_energies 1 3.0 0.000 0.000 270.708 270.708
scf_env_do_scf 1 4.0 0.000 0.000 270.391 270.391
qs_ks_update_qs_env 8 6.0 0.000 0.000 269.293 269.293
dbt_tas_multiply 216 13.5 0.001 0.001 268.305 268.305
dbt_tas_dbm 216 15.5 0.001 0.001 256.983 256.983
dbm_multiply 216 17.5 256.980 256.980 256.980 256.980
hfx_ri_update_ks_Pmat_KS 63 11.6 0.000 0.000 252.824 252.824
dbt_tas_mm_2 91 16.5 0.001 0.001 244.049 244.049
scf_env_do_scf_inner_loop 6 5.0 0.001 0.001 169.574 169.574
init_scf_loop 2 5.0 0.000 0.000 100.815 100.815
qs_ks_update_qs_env_forces 1 3.0 0.000 0.000 76.429 76.429
hfx_ri_update_forces 1 7.0 1.574 1.574 37.087 37.087
hfx_ri_forces_Pmat_3c 1 8.0 4.684 4.684 21.605 21.605
dbt_copy 409 11.7 0.041 0.041 13.451 13.451
precalc_derivatives 1 8.0 2.108 2.108 11.972 11.972
dbt_reshape 132 13.2 5.857 5.857 9.419 9.419
hfx_ri_pre_scf_Pmat 1 12.0 0.000 0.000 9.228 9.228
dbt_tas_mm_3T 77 17.1 0.000 0.000 8.869 8.869
-------------------------------------------------------------------------------

From /workspace/artifacts/RI-HFX_H2O-32_32mpi.out:
-------------------------------------------------------------------------------
-                                                                             -
-                                T I M I N G                                  -
-                                                                             -
-------------------------------------------------------------------------------
SUBROUTINE CALLS ASD SELF TIME TOTAL TIME
MAXIMUM AVERAGE MAXIMUM AVERAGE MAXIMUM
CP2K 1 1.0 0.005 0.022 43.924 43.935
qs_forces 1 2.0 0.000 0.000 43.821 43.821
rebuild_ks_matrix 7 6.6 0.000 0.000 43.179 43.180
qs_ks_build_kohn_sham_matrix 7 7.6 0.001 0.002 43.179 43.180
hfx_ks_matrix 7 8.6 0.000 0.000 42.258 42.266
dbt_total 841 11.0 0.005 0.005 37.631 37.639
dbt_contract 207 12.4 0.021 0.023 29.888 29.899
dbt_tas_total 375 13.4 0.054 0.181 27.779 27.779
dbt_tas_multiply 216 13.5 0.001 0.001 26.836 26.836
hfx_ri_update_ks 7 9.6 0.000 0.000 24.696 24.696
hfx_ri_update_ks_Pmat 7 10.6 1.217 1.282 24.695 24.695
qs_energies 1 3.0 0.000 0.000 23.068 23.068
scf_env_do_scf 1 4.0 0.000 0.001 22.935 22.935
qs_ks_update_qs_env 8 6.0 0.000 0.000 22.437 22.437
dbt_tas_dbm 216 15.5 0.001 0.001 21.403 21.405
qs_ks_update_qs_env_forces 1 3.0 0.000 0.000 20.743 20.743
dbm_multiply 216 17.5 19.060 20.042 19.060 20.042
hfx_ri_update_forces 1 7.0 0.056 0.059 17.561 17.570
hfx_ri_forces_Pmat_3c 1 8.0 0.165 0.178 13.272 13.284
scf_env_do_scf_inner_loop 6 5.0 0.000 0.001 12.861 12.861
hfx_ri_update_ks_Pmat_KS 63 11.6 0.001 0.001 11.588 11.588
init_scf_loop 2 5.0 0.000 0.000 10.073 10.073
dbt_tas_mm_2 91 16.5 0.001 0.001 9.759 9.761
dbt_copy 421 11.8 0.010 0.012 6.512 6.690
dbt_tas_mm_3T 77 17.1 0.000 0.000 5.953 6.392
mp_sync 2909 12.8 4.202 6.268 4.202 6.268
hfx_ri_update_ks_Pmat_Px3C 63 11.6 0.000 0.000 5.070 5.070
dbt_reshape 252 12.8 2.281 2.349 4.415 4.487
dbt_tas_mm_3N 37 15.4 0.000 0.000 4.139 4.323
precalc_derivatives 1 8.0 0.079 0.083 3.250 3.250
hfx_ri_pre_scf_Pmat 1 12.0 0.000 0.000 3.126 3.126
dbm_reserve_blocks 1477 16.3 2.539 2.764 2.539 2.764
dbt_tas_reserve_blocks_index 1302 15.5 0.214 0.224 2.518 2.752
dbt_crop 372 13.7 1.718 1.815 2.348 2.491
mp_waitall_2 1204 16.3 2.130 2.218 2.130 2.218
dbt_reserve_blocks_index 938 14.4 0.089 0.094 1.973 2.128
dbt_reserve_blocks_index_array 915 13.4 0.005 0.006 1.950 2.106
build_3c_derivatives 3 9.0 0.222 0.241 1.796 1.807
dbt_tas_replicate 175 15.2 0.697 0.729 1.674 1.731
hfx_ri_pre_scf_Pmat_RIx3C 9 13.0 0.000 0.000 1.697 1.703
dbt_tas_copy 169 12.8 0.852 0.883 1.466 1.574
convert_to_new_pgrid 648 15.5 0.033 0.062 1.332 1.559
dbt_communicate_buffer 252 13.8 0.012 0.015 1.438 1.496
dbm_copy 452 16.3 1.157 1.390 1.157 1.390
hfx_ri_update_ks_Pmat_copy_2 63 11.6 0.000 0.000 1.379 1.387
dbt_tas_communicate_buffer 352 16.4 0.014 0.014 0.922 0.971
-------------------------------------------------------------------------------

Plot: name="RI-HFX_H2O-32_timings_32omp", title="Timings of RI-HFX_H2O-32 with 32 OpenMP Threads", ylabel="time [s]"
PlotPoint: plot="RI-HFX_H2O-32_timings_32omp", name="rest", label="rest", y=47.56399999999991, yerr=0.0
PlotPoint: plot="RI-HFX_H2O-32_timings_32omp", name="dbm_multiply", label="dbm_multiply", y=256.98, yerr=0.0
PlotPoint: plot="RI-HFX_H2O-32_timings_32omp", name="hfx_ri_update_ks_Pmat", label="hfx_ri_update_ks_Pmat", y=30.524, yerr=0.0
PlotPoint: plot="RI-HFX_H2O-32_timings_32omp", name="dbt_reshape", label="dbt_reshape", y=5.857, yerr=0.0
PlotPoint: plot="RI-HFX_H2O-32_timings_32omp", name="hfx_ri_forces_Pmat_3c", label="hfx_ri_forces_Pmat_3c", y=4.684, yerr=0.0
PlotPoint: plot="RI-HFX_H2O-32_timings_32omp", name="precalc_derivatives", label="precalc_derivatives", y=2.108, yerr=0.0
PlotPoint: plot="RI-HFX_H2O-32_timings_32omp", name="dbm_reserve_blocks", label="dbm_reserve_blocks", y=0.0, yerr=0.0
PlotPoint: plot="RI-HFX_H2O-32_timings_32omp", name="mp_sync", label="mp_sync", y=0.0, yerr=0.0
PlotPoint: plot="RI-HFX_H2O-32_timings_32omp", name="mp_waitall_2", label="mp_waitall_2", y=0.0, yerr=0.0

Plot: name="RI-HFX_H2O-32_timings_32mpi", title="Timings of RI-HFX_H2O-32 with 32 MPI Ranks", ylabel="time [s]"
PlotPoint: plot="RI-HFX_H2O-32_timings_32mpi", name="rest", label="rest", y=12.251000000000001, yerr=0.0
PlotPoint: plot="RI-HFX_H2O-32_timings_32mpi", name="dbm_multiply", label="dbm_multiply", y=19.06, yerr=0.0
PlotPoint: plot="RI-HFX_H2O-32_timings_32mpi", name="hfx_ri_update_ks_Pmat", label="hfx_ri_update_ks_Pmat", y=1.217, yerr=0.0
PlotPoint: plot="RI-HFX_H2O-32_timings_32mpi", name="dbt_reshape", label="dbt_reshape", y=2.281, yerr=0.0
PlotPoint: plot="RI-HFX_H2O-32_timings_32mpi", name="hfx_ri_forces_Pmat_3c", label="hfx_ri_forces_Pmat_3c", y=0.165, yerr=0.0
PlotPoint: plot="RI-HFX_H2O-32_timings_32mpi", name="precalc_derivatives", label="precalc_derivatives", y=0.079, yerr=0.0
PlotPoint: plot="RI-HFX_H2O-32_timings_32mpi", name="dbm_reserve_blocks", label="dbm_reserve_blocks", y=2.539, yerr=0.0
PlotPoint: plot="RI-HFX_H2O-32_timings_32mpi", name="mp_sync", label="mp_sync", y=4.202, yerr=0.0
PlotPoint: plot="RI-HFX_H2O-32_timings_32mpi", name="mp_waitall_2", label="mp_waitall_2", y=2.13, yerr=0.0

Running RI-MP2_ammonia.inp with 1 threads and 32 ranks... done.
Running RI-MP2_ammonia.inp with 32 threads and 1 ranks... done.

From /workspace/artifacts/RI-MP2_ammonia_32omp.out:
-------------------------------------------------------------------------------
-                                                                             -
-                                T I M I N G                                  -
-                                                                             -
-------------------------------------------------------------------------------
SUBROUTINE CALLS ASD SELF TIME TOTAL TIME
MAXIMUM AVERAGE MAXIMUM AVERAGE MAXIMUM
CP2K 1 1.0 0.014 0.014 222.956 222.956
qs_energies 1 2.0 0.000 0.000 222.776 222.776
mp2_main 1 3.0 0.000 0.000 218.096 218.096
mp2_gpw_main 1 4.0 0.001 0.001 217.654 217.654
mp2_ri_gpw_compute_in 1 5.0 0.376 0.376 173.291 173.291
mp2_ri_gpw_compute_in_loop 1 6.0 0.010 0.010 162.056 162.056
mp2_eri_3c_integrate_gpw 2656 7.0 0.013 0.013 131.823 131.823
integrate_v_rspace 2666 8.0 0.590 0.590 119.088 119.088
grid_integrate_task_list 2666 9.0 116.502 116.502 116.502 116.502
mp2_ri_gpw_compute_en 1 5.0 0.090 0.090 44.338 44.338
mp2_ri_gpw_compute_en_RI_loop 1 6.0 9.941 9.941 42.523 42.523
mp2_ri_gpw_compute_en_expansio 2080 7.0 2.175 2.175 25.032 25.032
local_gemm 2080 8.0 22.857 22.857 22.857 22.857
dbcsr_multiply_generic 5322 8.0 0.181 0.181 20.749 20.749
ao_to_mo_and_store_B_mult_1 2656 7.0 0.009 0.009 20.729 20.729
pw_transfer 63872 10.6 1.013 1.013 11.399 11.399
calculate_wavefunction 2656 8.0 8.105 8.105 11.390 11.390
get_2c_integrals 1 6.0 0.000 0.000 10.858 10.858
multiply_cannon 5322 9.0 0.418 0.418 10.223 10.223
fft_wrap_pw1pw2 53228 11.4 0.115 0.115 10.185 10.185
compute_2c_integrals 1 7.0 0.006 0.006 10.046 10.046
compute_2c_integrals_loop_lm 1 8.0 0.012 0.012 10.026 10.026
mp2_eri_2c_integrate_gpw 1 9.0 3.474 3.474 10.014 10.014
ao_to_mo_and_store_B_E_Ex_1 2656 7.0 2.278 2.278 9.400 9.400
multiply_cannon_loop 5322 10.0 0.127 0.127 8.907 8.907
make_m2s 10644 9.0 0.060 0.060 7.993 7.993
make_images 10644 10.0 3.213 3.213 7.704 7.704
copy_dbcsr_to_fm 2679 8.0 0.025 0.025 7.615 7.615
multiply_cannon_multrec 5322 11.0 7.442 7.442 7.483 7.483
fft_wrap_pw1pw2_20 21271 12.4 0.447 0.447 7.264 7.264
fft3d_s 53229 13.4 6.251 6.251 6.284 6.284
dbcsr_complete_redistribute 2689 9.0 1.189 1.189 5.986 5.986
dbcsr_finalize 10708 9.5 0.184 0.184 5.635 5.635
mp2_ri_gpw_compute_en_ener 2080 7.0 5.623 5.623 5.623 5.623
dbcsr_merge_all 8011 10.3 3.806 3.806 4.989 4.989
-------------------------------------------------------------------------------

From /workspace/artifacts/RI-MP2_ammonia_32mpi.out:
-------------------------------------------------------------------------------
-                                                                             -
-                                T I M I N G                                  -
-                                                                             -
-------------------------------------------------------------------------------
SUBROUTINE CALLS ASD SELF TIME TOTAL TIME
MAXIMUM AVERAGE MAXIMUM AVERAGE MAXIMUM
CP2K 1 1.0 0.005 0.021 34.156 34.167
qs_energies 1 2.0 0.001 0.013 34.086 34.087
mp2_main 1 3.0 0.000 0.001 32.024 32.024
mp2_gpw_main 1 4.0 0.001 0.002 31.910 31.910
mp2_ri_gpw_compute_in 1 5.0 0.051 0.051 16.876 17.249
mp2_ri_gpw_compute_in_loop 1 6.0 0.001 0.001 15.697 16.056
mp2_ri_gpw_compute_en 1 5.0 0.146 0.154 14.947 15.175
mp2_eri_3c_integrate_gpw 83 7.0 0.001 0.001 13.671 14.174
integrate_v_rspace 93 8.1 0.100 0.108 13.534 14.033
mp2_ri_gpw_compute_en_RI_loop 1 6.0 0.828 0.905 13.930 13.935
grid_integrate_task_list 93 9.1 13.247 13.753 13.247 13.753
mp2_ri_gpw_compute_en_expansio 65 7.0 0.106 0.120 10.464 10.628
local_gemm 65 8.0 10.358 10.511 10.358 10.511
mp2_ri_gpw_compute_en_comm 17 7.0 0.070 0.086 2.267 2.518
mp_sendrecv_dm3 1054 8.0 1.721 2.012 1.721 2.012
scf_env_do_scf 1 3.0 0.000 0.000 1.947 1.948
scf_env_do_scf_inner_loop 10 4.0 0.000 0.002 1.947 1.948
dbcsr_multiply_generic 176 8.0 0.008 0.009 1.739 1.920
ao_to_mo_and_store_B_mult_1 83 7.0 0.001 0.001 1.722 1.903
get_2c_integrals 1 6.0 0.000 0.000 1.110 1.135
multiply_cannon 176 9.0 0.015 0.016 1.023 1.091
multiply_cannon_loop 176 10.0 0.002 0.002 0.967 1.033
qs_scf_new_mos 10 5.0 0.000 0.000 0.983 0.988
eigensolver 11 5.8 0.001 0.001 0.968 0.969
multiply_cannon_multrec 246 11.0 0.828 0.868 0.833 0.874
compute_2c_integrals 1 7.0 0.002 0.003 0.817 0.833
cp_fm_diag_elpa 11 6.8 0.000 0.000 0.811 0.812
cp_fm_redistribute_end 11 7.8 0.303 0.804 0.316 0.806
make_m2s 352 9.0 0.003 0.003 0.680 0.789
cp_fm_diag_elpa_base 11 7.8 0.474 0.764 0.486 0.780
make_images 352 10.0 0.050 0.051 0.668 0.777
compute_2c_integrals_loop_lm 1 8.0 0.002 0.003 0.731 0.755
mp2_eri_2c_integrate_gpw 1 9.0 0.198 0.206 0.729 0.754
pw_transfer 2120 10.5 0.044 0.047 0.735 0.746
fft_wrap_pw1pw2 1768 11.4 0.004 0.005 0.678 0.687
-------------------------------------------------------------------------------

Plot: name="RI-MP2_ammonia_timings_32omp", title="Timings of RI-MP2_ammonia with 32 OpenMP Threads", ylabel="time [s]"
PlotPoint: plot="RI-MP2_ammonia_timings_32omp", name="rest", label="rest", y=58.10900000000001, yerr=0.0
PlotPoint: plot="RI-MP2_ammonia_timings_32omp", name="grid_integrate_task_list", label="grid_integrate_task_list", y=116.502, yerr=0.0
PlotPoint: plot="RI-MP2_ammonia_timings_32omp", name="local_gemm", label="local_gemm", y=22.857, yerr=0.0
PlotPoint: plot="RI-MP2_ammonia_timings_32omp", name="mp2_ri_gpw_compute_en_RI_loop", label="mp2_ri_gpw_compute_en_RI_loop", y=9.941, yerr=0.0
PlotPoint: plot="RI-MP2_ammonia_timings_32omp", name="calculate_wavefunction", label="calculate_wavefunction", y=8.105, yerr=0.0
PlotPoint: plot="RI-MP2_ammonia_timings_32omp", name="multiply_cannon_multrec", label="multiply_cannon_multrec", y=7.442, yerr=0.0
PlotPoint: plot="RI-MP2_ammonia_timings_32omp", name="mp_sendrecv_dm3", label="mp_sendrecv_dm3", y=0.0, yerr=0.0

Plot: name="RI-MP2_ammonia_timings_32mpi", title="Timings of RI-MP2_ammonia with 32 MPI Ranks", ylabel="time [s]"
PlotPoint: plot="RI-MP2_ammonia_timings_32mpi", name="rest", label="rest", y=7.1739999999999995, yerr=0.0
PlotPoint: plot="RI-MP2_ammonia_timings_32mpi", name="grid_integrate_task_list", label="grid_integrate_task_list", y=13.247, yerr=0.0
PlotPoint: plot="RI-MP2_ammonia_timings_32mpi", name="local_gemm", label="local_gemm", y=10.358, yerr=0.0
PlotPoint: plot="RI-MP2_ammonia_timings_32mpi", name="mp2_ri_gpw_compute_en_RI_loop", label="mp2_ri_gpw_compute_en_RI_loop", y=0.828, yerr=0.0
PlotPoint: plot="RI-MP2_ammonia_timings_32mpi", name="calculate_wavefunction", label="calculate_wavefunction", y=0.0, yerr=0.0
PlotPoint: plot="RI-MP2_ammonia_timings_32mpi", name="multiply_cannon_multrec", label="multiply_cannon_multrec", y=0.828, yerr=0.0
PlotPoint: plot="RI-MP2_ammonia_timings_32mpi", name="mp_sendrecv_dm3", label="mp_sendrecv_dm3", y=1.721, yerr=0.0

Running diag_cu144_broy.inp with 1 threads and 32 ranks... done.
Running diag_cu144_broy.inp with 32 threads and 1 ranks... done.
From /workspace/artifacts/diag_cu144_broy_32omp.out:
-------------------------------------------------------------------------------
-                                                                             -
-                                T I M I N G                                  -
-                                                                             -
-------------------------------------------------------------------------------
SUBROUTINE CALLS ASD SELF TIME TOTAL TIME
MAXIMUM AVERAGE MAXIMUM AVERAGE MAXIMUM
CP2K 1 1.0 0.084 0.084 123.292 123.292
qs_energies 1 2.0 0.000 0.000 121.931 121.931
scf_env_do_scf 1 3.0 0.000 0.000 115.360 115.360
scf_env_do_scf_inner_loop 15 4.0 0.002 0.002 115.360 115.360
qs_ks_update_qs_env 15 5.0 0.000 0.000 47.833 47.833
rebuild_ks_matrix 15 6.0 0.000 0.000 47.625 47.625
qs_ks_build_kohn_sham_matrix 15 7.0 0.002 0.002 47.625 47.625
qs_scf_new_mos 15 5.0 0.000 0.000 42.964 42.964
eigensolver 15 6.0 0.001 0.001 35.350 35.350
qs_vxc_create 15 8.0 0.039 0.039 32.600 32.600
calculate_dispersion_nonloc 15 9.0 6.683 6.683 28.411 28.411
cp_fm_diag_elpa 15 7.0 0.000 0.000 22.990 22.990
cp_fm_diag_elpa_base 15 8.0 20.313 20.313 22.990 22.990
pw_transfer 1191 10.0 0.062 0.062 22.099 22.099
fft_wrap_pw1pw2 1086 11.0 0.008 0.008 21.926 21.926
qs_rho_update_rho_low 16 5.0 0.000 0.000 21.758 21.758
calculate_rho_elec 16 6.0 0.216 0.216 21.758 21.758
grid_collocate_task_list 16 7.0 20.450 20.450 20.450 20.450
fft_wrap_pw1pw2_150 765 12.0 3.519 3.519 15.964 15.964
sum_up_and_integrate 15 8.0 0.111 0.111 13.921 13.921
integrate_v_rspace 15 9.0 0.018 0.018 13.811 13.811
grid_integrate_task_list 15 10.0 13.283 13.283 13.283 13.283
fft3d_s 1087 13.0 10.027 10.027 10.034 10.034
cp_fm_cholesky_restore 45 7.0 10.034 10.034 10.034 10.034
pw_scatter_s 585 13.1 6.516 6.516 6.516 6.516
fft_wrap_pw1pw2_200 197 12.3 0.722 0.722 5.776 5.776
copy_dbcsr_to_fm 16 5.9 0.001 0.001 5.581 5.581
dbcsr_complete_redistribute 46 8.3 2.241 2.241 5.468 5.468
cp_fm_upper_to_full 30 8.0 5.001 5.001 5.001 5.001
vdW_energy 15 10.0 4.350 4.350 4.350 4.350
xc_vxc_pw_create 15 9.0 0.231 0.231 4.150 4.150
gspace_mixing 14 5.0 0.171 0.171 4.006 4.006
broyden_mixing 14 6.0 3.400 3.400 3.401 3.401
init_scf_run 1 3.0 0.000 0.000 3.119 3.119
qs_energies_init_hamiltonians 1 3.0 0.000 0.000 2.995 2.995
xc_pw_derive 90 11.0 0.001 0.001 2.695 2.695
-------------------------------------------------------------------------------

From /workspace/artifacts/diag_cu144_broy_32mpi.out:
-------------------------------------------------------------------------------
-                                                                             -
-                                T I M I N G                                  -
-                                                                             -
-------------------------------------------------------------------------------
SUBROUTINE CALLS ASD SELF TIME TOTAL TIME
MAXIMUM AVERAGE MAXIMUM AVERAGE MAXIMUM
CP2K 1 1.0 0.013 0.028 58.758 58.770
qs_energies 1 2.0 0.000 0.000 58.509 58.514
scf_env_do_scf 1 3.0 0.000 0.000 54.438 54.438
scf_env_do_scf_inner_loop 15 4.0 0.001 0.004 54.438 54.438
qs_ks_update_qs_env 15 5.0 0.000 0.000 22.851 22.867
rebuild_ks_matrix 15 6.0 0.000 0.000 22.818 22.834
qs_ks_build_kohn_sham_matrix 15 7.0 0.003 0.003 22.818 22.834
qs_rho_update_rho_low 16 5.0 0.000 0.000 19.978 19.981
calculate_rho_elec 16 6.0 0.007 0.007 19.978 19.980
grid_collocate_task_list 16 7.0 19.016 19.220 19.016 19.220
sum_up_and_integrate 15 8.0 0.011 0.016 13.508 13.551
integrate_v_rspace 15 9.0 0.001 0.001 13.497 13.545
grid_integrate_task_list 15 10.0 12.784 12.896 12.784 12.896
qs_scf_new_mos 15 5.0 0.000 0.001 12.282 12.347
eigensolver 15 6.0 0.001 0.002 11.322 11.339
qs_vxc_create 15 8.0 0.001 0.001 9.018 9.027
cp_fm_diag_elpa 15 7.0 0.000 0.000 8.104 8.107
cp_fm_diag_elpa_base 15 8.0 7.956 7.976 8.100 8.100
calculate_dispersion_nonloc 15 9.0 0.905 0.918 7.269 7.287
pw_transfer 1191 10.0 0.098 0.106 6.757 6.810
fft_wrap_pw1pw2 1086 11.0 0.014 0.015 6.570 6.617
fft3d_ps 1086 13.0 2.592 2.727 4.840 4.925
fft_wrap_pw1pw2_150 765 12.0 0.319 0.337 4.256 4.283
cp_fm_cholesky_restore 45 7.0 3.072 3.109 3.072 3.109
qs_energies_init_hamiltonians 1 3.0 0.000 0.000 2.484 2.484
build_core_hamiltonian_matrix 1 4.0 0.000 0.000 2.149 2.385
fft_wrap_pw1pw2_200 197 12.3 0.192 0.213 2.204 2.282
mp_alltoall_z22v 1086 15.0 1.616 1.986 1.616 1.986
xc_vxc_pw_create 15 9.0 0.015 0.019 1.748 1.772
build_core_ppnl 1 5.0 1.259 1.387 1.259 1.387
init_scf_run 1 3.0 0.000 0.000 1.367 1.368
vdW_energy 15 10.0 1.290 1.342 1.290 1.342
x_to_yz 585 14.1 0.363 0.385 1.176 1.319
scf_env_initial_rho_setup 1 4.0 0.000 0.001 1.282 1.283
yz_to_x 501 13.9 0.241 0.362 1.044 1.256
xc_pw_derive 90 11.0 0.001 0.002 1.154 1.228
-------------------------------------------------------------------------------

Plot: name="diag_cu144_broy_timings_32omp", title="Timings of diag_cu144_broy with 32 OpenMP Threads", ylabel="time [s]"
PlotPoint: plot="diag_cu144_broy_timings_32omp", name="rest", label="rest", y=49.185, yerr=0.0
PlotPoint: plot="diag_cu144_broy_timings_32omp", name="grid_collocate_task_list", label="grid_collocate_task_list", y=20.45, yerr=0.0
PlotPoint: plot="diag_cu144_broy_timings_32omp", name="cp_fm_diag_elpa_base", label="cp_fm_diag_elpa_base", y=20.313, yerr=0.0
PlotPoint: plot="diag_cu144_broy_timings_32omp", name="grid_integrate_task_list", label="grid_integrate_task_list", y=13.283, yerr=0.0
PlotPoint: plot="diag_cu144_broy_timings_32omp", name="cp_fm_cholesky_restore", label="cp_fm_cholesky_restore", y=10.034, yerr=0.0
PlotPoint: plot="diag_cu144_broy_timings_32omp", name="fft3d_s", label="fft3d_s", y=10.027, yerr=0.0
PlotPoint: plot="diag_cu144_broy_timings_32omp", name="fft3d_ps", label="fft3d_ps", y=0.0, yerr=0.0

Plot: name="diag_cu144_broy_timings_32mpi", title="Timings of diag_cu144_broy with 32 MPI Ranks", ylabel="time [s]"
PlotPoint: plot="diag_cu144_broy_timings_32mpi", name="rest", label="rest", y=13.338000000000001, yerr=0.0
PlotPoint: plot="diag_cu144_broy_timings_32mpi", name="grid_collocate_task_list", label="grid_collocate_task_list", y=19.016, yerr=0.0
PlotPoint: plot="diag_cu144_broy_timings_32mpi", name="cp_fm_diag_elpa_base", label="cp_fm_diag_elpa_base", y=7.956, yerr=0.0
PlotPoint: plot="diag_cu144_broy_timings_32mpi", name="grid_integrate_task_list", label="grid_integrate_task_list", y=12.784, yerr=0.0
PlotPoint: plot="diag_cu144_broy_timings_32mpi", name="cp_fm_cholesky_restore", label="cp_fm_cholesky_restore", y=3.072, yerr=0.0
PlotPoint: plot="diag_cu144_broy_timings_32mpi", name="fft3d_s", label="fft3d_s", y=0.0, yerr=0.0
PlotPoint: plot="diag_cu144_broy_timings_32mpi", name="fft3d_ps", label="fft3d_ps", y=2.592, yerr=0.0

Running bench_dftb.inp with 1 threads and 32 ranks... done.
Running bench_dftb.inp with 32 threads and 1 ranks... done.

From /workspace/artifacts/bench_dftb_32omp.out:
-------------------------------------------------------------------------------
-                                                                             -
-                                T I M I N G                                  -
-                                                                             -
-------------------------------------------------------------------------------
SUBROUTINE CALLS ASD SELF TIME TOTAL TIME
MAXIMUM AVERAGE MAXIMUM AVERAGE MAXIMUM
CP2K 1 1.0 0.073 0.073 357.521 357.521
qs_energies 1 2.0 0.000 0.000 357.388 357.388
ls_scf 1 3.0 0.000 0.000 356.173 356.173
ls_scf_main 1 4.0 0.002 0.002 347.341 347.341
density_matrix_trs4 11 5.0 0.012 0.012 257.824 257.824
arnoldi_extremal 12 6.1 0.000 0.000 174.555 174.555
arnoldi_normal_ev 12 7.1 0.014 0.014 174.555 174.555
dbcsr_matrix_vector_mult 652 9.0 0.165 0.165 171.598 171.598
build_subspace 23 8.1 0.076 0.076 171.548 171.548
dbcsr_matrix_vector_mult_local 652 10.0 170.089 170.089 170.098 170.098
ls_scf_dm_to_ks 11 5.0 0.000 0.000 84.347 84.347
matrix_ls_to_qs 11 6.0 0.000 0.000 81.057 81.057
dbcsr_multiply_generic 185 6.1 0.818 0.818 72.449 72.449
multiply_cannon 185 7.1 0.283 0.283 43.996 43.996
dbcsr_copy_into_existing 11 7.0 43.601 43.601 43.602 43.602
dbcsr_complete_redistribute 23 7.5 29.893 29.893 41.137 41.137
matrix_decluster 11 7.0 0.000 0.000 37.454 37.454
multiply_cannon_loop 185 8.1 0.222 0.222 31.748 31.748
make_m2s 370 7.1 0.037 0.037 24.279 24.279
make_images 370 8.1 10.502 10.502 22.767 22.767
multiply_cannon_multrec 185 9.1 22.360 22.360 22.385 22.385
dbcsr_finalize 646 7.5 0.154 0.154 14.784 14.784
dbcsr_merge_all 597 8.5 2.289 2.289 13.576 13.576
setup_rec_index_2d 370 8.1 11.878 11.878 11.878 11.878
dbcsr_sort_indices 1103 9.9 10.036 10.036 10.036 10.036
tree_to_linear_d 110 9.4 10.022 10.022 10.022 10.022
calculate_norms 370 9.1 9.141 9.141 9.141 9.141
quick_finalize 395 10.0 0.341 0.341 8.656 8.656
ls_scf_init_scf 1 4.0 0.000 0.000 8.145 8.145
dbcsr_special_finalize 370 9.1 0.002 0.002 8.016 8.016
ls_scf_init_matrix_S 1 5.0 0.000 0.000 7.808 7.808
matrix_sqrt_Newton_Schulz 1 6.0 0.001 0.001 7.172 7.172
-------------------------------------------------------------------------------

From /workspace/artifacts/bench_dftb_32mpi.out:
-------------------------------------------------------------------------------
-                                                                             -
-                                T I M I N G                                  -
-                                                                             -
-------------------------------------------------------------------------------
SUBROUTINE CALLS ASD SELF TIME TOTAL TIME
MAXIMUM AVERAGE MAXIMUM AVERAGE MAXIMUM
CP2K 1 1.0 0.009 0.024 65.188 65.198
qs_energies 1 2.0 0.000 0.000 65.092 65.092
ls_scf 1 3.0 0.000 0.000 65.042 65.043
ls_scf_main 1 4.0 0.001 0.009 62.529 62.530
density_matrix_trs4 11 5.0 0.006 0.018 60.083 60.156
dbcsr_multiply_generic 185 6.1 0.058 0.074 56.948 57.331
multiply_cannon 185 7.1 0.035 0.038 47.674 48.145
multiply_cannon_loop 185 8.1 0.115 0.136 45.256 45.847
multiply_cannon_multrec 1480 9.1 27.903 29.812 28.176 30.084
mp_waitall_1 11936 10.3 14.771 17.686 14.771 17.686
multiply_cannon_metrocomm3 1480 9.1 0.014 0.017 8.768 13.016
make_m2s 370 7.1 0.034 0.037 6.564 6.622
make_images 370 8.1 0.617 0.654 6.439 6.498
calculate_norms 2960 9.1 4.794 5.748 4.794 5.748
multiply_cannon_metrocomm1 1480 9.1 0.008 0.009 3.348 5.500
make_images_data 370 9.1 0.009 0.011 2.902 3.079
hybrid_alltoall_any 393 9.9 0.192 0.995 2.546 2.729
mp_sum_l 1119 5.6 1.663 2.343 1.663 2.343
arnoldi_extremal 12 6.1 0.000 0.000 2.242 2.257
arnoldi_normal_ev 12 7.1 0.001 0.003 2.242 2.257
ls_scf_dm_to_ks 11 5.0 0.000 0.000 2.120 2.168
build_subspace 23 8.1 0.020 0.024 2.157 2.158
dbcsr_complete_redistribute 23 7.5 1.176 1.216 1.915 1.994
matrix_ls_to_qs 11 6.0 0.000 0.000 1.872 1.957
ls_scf_init_scf 1 4.0 0.000 0.000 1.931 1.931
dbcsr_matrix_vector_mult 652 9.0 0.009 0.043 1.871 1.917
ls_scf_init_matrix_S 1 5.0 0.000 0.000 1.907 1.913
make_images_pack 370 9.1 1.642 1.838 1.645 1.841
matrix_decluster 11 7.0 0.000 0.000 1.732 1.815
matrix_sqrt_Newton_Schulz 1 6.0 0.000 0.002 1.743 1.744
dbcsr_matrix_vector_mult_local 652 10.0 1.577 1.623 1.578 1.625
dbcsr_finalize 646 7.5 0.008 0.008 1.400 1.530
buffer_matrices_ensure_size 370 8.1 1.342 1.458 1.342 1.458
dbcsr_multiply_generic_mpsum_f 137 7.1 0.000 0.000 0.862 1.434
-------------------------------------------------------------------------------

Plot: name="bench_dftb_timings_32omp", title="Timings of bench_dftb with 32 OpenMP Threads", ylabel="time [s]"
PlotPoint: plot="bench_dftb_timings_32omp", name="rest", label="rest", y=70.55900000000003, yerr=0.0
PlotPoint: plot="bench_dftb_timings_32omp", name="dbcsr_matrix_vector_mult_local", label="dbcsr_matrix_vector_mult_local", y=170.089, yerr=0.0
PlotPoint: plot="bench_dftb_timings_32omp", name="dbcsr_copy_into_existing", label="dbcsr_copy_into_existing", y=43.601, yerr=0.0
PlotPoint: plot="bench_dftb_timings_32omp", name="dbcsr_complete_redistribute", label="dbcsr_complete_redistribute", y=29.893, yerr=0.0
PlotPoint: plot="bench_dftb_timings_32omp", name="multiply_cannon_multrec", label="multiply_cannon_multrec", y=22.36, yerr=0.0
PlotPoint: plot="bench_dftb_timings_32omp", name="setup_rec_index_2d", label="setup_rec_index_2d", y=11.878, yerr=0.0
PlotPoint: plot="bench_dftb_timings_32omp", name="calculate_norms", label="calculate_norms", y=9.141, yerr=0.0
PlotPoint: plot="bench_dftb_timings_32omp", name="make_images_pack", label="make_images_pack", y=0.0, yerr=0.0
PlotPoint: plot="bench_dftb_timings_32omp", name="mp_waitall_1", label="mp_waitall_1", y=0.0, yerr=0.0
PlotPoint: plot="bench_dftb_timings_32omp", name="mp_sum_l", label="mp_sum_l", y=0.0, yerr=0.0

Plot: name="bench_dftb_timings_32mpi", title="Timings of bench_dftb with 32 MPI Ranks", ylabel="time [s]"
PlotPoint: plot="bench_dftb_timings_32mpi", name="rest", label="rest", y=11.662000000000006, yerr=0.0
PlotPoint: plot="bench_dftb_timings_32mpi", name="dbcsr_matrix_vector_mult_local", label="dbcsr_matrix_vector_mult_local", y=1.577, yerr=0.0
PlotPoint: plot="bench_dftb_timings_32mpi", name="dbcsr_copy_into_existing", label="dbcsr_copy_into_existing", y=0.0, yerr=0.0
PlotPoint: plot="bench_dftb_timings_32mpi", name="dbcsr_complete_redistribute", label="dbcsr_complete_redistribute", y=1.176, yerr=0.0
PlotPoint: plot="bench_dftb_timings_32mpi", name="multiply_cannon_multrec", label="multiply_cannon_multrec", y=27.903, yerr=0.0
PlotPoint: plot="bench_dftb_timings_32mpi", name="setup_rec_index_2d", label="setup_rec_index_2d", y=0.0, yerr=0.0
PlotPoint: plot="bench_dftb_timings_32mpi", name="calculate_norms", label="calculate_norms", y=4.794, yerr=0.0
PlotPoint: plot="bench_dftb_timings_32mpi", name="make_images_pack", label="make_images_pack", y=1.642, yerr=0.0
PlotPoint: plot="bench_dftb_timings_32mpi", name="mp_waitall_1", label="mp_waitall_1", y=14.771, yerr=0.0
PlotPoint: plot="bench_dftb_timings_32mpi", name="mp_sum_l", label="mp_sum_l", y=1.663, yerr=0.0

Running dbcsr.inp with 1 threads and 32 ranks... done.
Running dbcsr.inp with 32 threads and 1 ranks... done.
From /workspace/artifacts/dbcsr_32omp.out: ------------------------------------------------------------------------------- - - - T I M I N G - - - ------------------------------------------------------------------------------- SUBROUTINE CALLS ASD SELF TIME TOTAL TIME MAXIMUM AVERAGE MAXIMUM AVERAGE MAXIMUM CP2K 1 1.0 0.007 0.007 69.722 69.722 lib_test 1 2.0 0.000 0.000 69.714 69.714 dbcsr_run_tests 3 3.0 0.002 0.002 69.714 69.714 test_multiplies_multiproc 3 4.0 0.001 0.001 53.983 53.983 dbcsr_redistribute 9 5.0 35.052 35.052 36.634 36.634 dbcsr_multiply_generic 9 5.0 0.001 0.001 15.985 15.985 dbcsr_make_random_matrix 9 4.0 12.477 12.477 15.622 15.622 multiply_cannon 9 6.0 0.001 0.001 11.199 11.199 multiply_cannon_loop 9 7.0 0.008 0.008 10.840 10.840 multiply_cannon_multrec 9 8.0 10.832 10.832 10.832 10.832 dbcsr_finalize 27 5.7 0.011 0.011 5.696 5.696 dbcsr_merge_all 18 6.5 2.010 2.010 4.952 4.952 dbcsr_data_release 975 7.6 2.884 2.884 2.884 2.884 tree_to_linear_d 9 7.0 1.876 1.876 1.876 1.876 make_m2s 18 6.0 0.001 0.001 1.636 1.636 make_images 18 7.0 0.555 0.555 1.589 1.589 dbcsr_destroy 93 5.8 0.000 0.000 1.419 1.419 ------------------------------------------------------------------------------- From /workspace/artifacts/dbcsr_32mpi.out: ------------------------------------------------------------------------------- - - - T I M I N G - - - ------------------------------------------------------------------------------- SUBROUTINE CALLS ASD SELF TIME TOTAL TIME MAXIMUM AVERAGE MAXIMUM AVERAGE MAXIMUM CP2K 1 1.0 0.003 0.010 17.380 17.385 lib_test 1 2.0 0.000 0.000 17.354 17.372 dbcsr_run_tests 3 3.0 0.000 0.001 17.353 17.371 test_multiplies_multiproc 3 4.0 0.000 0.002 16.523 16.583 dbcsr_multiply_generic 9 5.0 0.001 0.001 15.204 15.307 multiply_cannon 9 6.0 0.001 0.002 13.486 13.834 multiply_cannon_loop 9 7.0 0.002 0.002 13.188 13.492 multiply_cannon_multrec 72 8.0 11.078 11.466 11.079 11.467 mp_waitall_1 576 9.2 2.423 2.896 2.423 2.896 multiply_cannon_metrocomm1 
72 8.0 0.001 0.001 1.888 2.454 mp_sum_l 390 2.5 0.382 0.924 0.382 0.924 dbcsr_multiply_generic_mpsum_f 9 6.0 0.000 0.000 0.379 0.921 dbcsr_data_release 444 7.6 0.763 0.885 0.763 0.885 dbcsr_make_random_matrix 9 4.0 0.660 0.664 0.800 0.826 dbcsr_finalize 27 5.7 0.000 0.000 0.713 0.808 dbcsr_destroy 111 5.9 0.000 0.000 0.604 0.713 make_m2s 18 6.0 0.001 0.001 0.659 0.702 make_images 18 7.0 0.020 0.021 0.656 0.699 multiply_cannon_metrocomm3 72 8.0 0.000 0.000 0.214 0.670 dbcsr_merge_all 18 6.5 0.104 0.127 0.553 0.648 dbcsr_redistribute 9 5.0 0.233 0.271 0.402 0.428 make_images_data 18 8.0 0.000 0.001 0.352 0.426 dbcsr_data_copy_aa2 18 7.5 0.332 0.401 0.332 0.401 hybrid_alltoall_any 18 9.0 0.028 0.125 0.314 0.389 ------------------------------------------------------------------------------- Plot: name="dbcsr_timings_32omp", title="Timings of dbcsr with 32 OpenMP Threads", ylabel="time [s]" PlotPoint: plot="dbcsr_timings_32omp", name="rest", label="rest", y=6.466999999999999, yerr=0.0 PlotPoint: plot="dbcsr_timings_32omp", name="dbcsr_redistribute", label="dbcsr_redistribute", y=35.052, yerr=0.0 PlotPoint: plot="dbcsr_timings_32omp", name="dbcsr_make_random_matrix", label="dbcsr_make_random_matrix", y=12.477, yerr=0.0 PlotPoint: plot="dbcsr_timings_32omp", name="multiply_cannon_multrec", label="multiply_cannon_multrec", y=10.832, yerr=0.0 PlotPoint: plot="dbcsr_timings_32omp", name="dbcsr_data_release", label="dbcsr_data_release", y=2.884, yerr=0.0 PlotPoint: plot="dbcsr_timings_32omp", name="dbcsr_merge_all", label="dbcsr_merge_all", y=2.01, yerr=0.0 PlotPoint: plot="dbcsr_timings_32omp", name="mp_sum_l", label="mp_sum_l", y=0.0, yerr=0.0 PlotPoint: plot="dbcsr_timings_32omp", name="mp_waitall_1", label="mp_waitall_1", y=0.0, yerr=0.0 Plot: name="dbcsr_timings_32mpi", title="Timings of dbcsr with 32 MPI Ranks", ylabel="time [s]" PlotPoint: plot="dbcsr_timings_32mpi", name="rest", label="rest", y=1.737, yerr=0.0 PlotPoint: plot="dbcsr_timings_32mpi", 
name="dbcsr_redistribute", label="dbcsr_redistribute", y=0.233, yerr=0.0 PlotPoint: plot="dbcsr_timings_32mpi", name="dbcsr_make_random_matrix", label="dbcsr_make_random_matrix", y=0.66, yerr=0.0 PlotPoint: plot="dbcsr_timings_32mpi", name="multiply_cannon_multrec", label="multiply_cannon_multrec", y=11.078, yerr=0.0 PlotPoint: plot="dbcsr_timings_32mpi", name="dbcsr_data_release", label="dbcsr_data_release", y=0.763, yerr=0.0 PlotPoint: plot="dbcsr_timings_32mpi", name="dbcsr_merge_all", label="dbcsr_merge_all", y=0.104, yerr=0.0 PlotPoint: plot="dbcsr_timings_32mpi", name="mp_sum_l", label="mp_sum_l", y=0.382, yerr=0.0 PlotPoint: plot="dbcsr_timings_32mpi", name="mp_waitall_1", label="mp_waitall_1", y=2.423, yerr=0.0 Running MQAE_single_node.inp with 1 threads and 32 ranks... done. Running MQAE_single_node.inp with 32 threads and 1 ranks... done. From /workspace/artifacts/MQAE_single_node_32omp.out: ------------------------------------------------------------------------------- - - - T I M I N G - - - ------------------------------------------------------------------------------- SUBROUTINE CALLS ASD SELF TIME TOTAL TIME MAXIMUM AVERAGE MAXIMUM AVERAGE MAXIMUM CP2K 1 1.0 0.052 0.052 127.126 127.126 qs_mol_dyn_low 1 2.0 0.003 0.003 125.767 125.767 velocity_verlet 5 3.0 0.003 0.003 102.633 102.633 qmmm_el_coupling 6 3.8 0.000 0.000 83.580 83.580 qmmm_elec_with_gaussian 6 4.8 0.012 0.012 83.576 83.576 qmmm_elec_with_gaussian_low 6 5.8 0.000 0.000 82.961 82.961 qmmm_elec_gaussian_low_G 6 6.8 82.057 82.057 82.057 82.057 qs_forces 6 3.8 0.000 0.000 33.856 33.856 qs_energies 6 4.8 0.000 0.000 30.010 30.010 scf_env_do_scf 6 5.8 0.001 0.001 27.780 27.780 scf_env_do_scf_inner_loop 39 6.8 0.004 0.004 24.199 24.199 rebuild_ks_matrix 45 8.4 0.000 0.000 23.252 23.252 qs_ks_build_kohn_sham_matrix 45 9.4 0.005 0.005 23.252 23.252 qs_ks_update_qs_env 45 7.8 0.000 0.000 19.839 19.839 pw_transfer 966 12.3 0.052 0.052 16.602 16.602 fft_wrap_pw1pw2 801 13.6 0.006 0.006 16.412 16.412 
fft_wrap_pw1pw2_150 507 15.2 2.267 2.267 16.014 16.014 qs_vxc_create 45 10.4 0.001 0.001 12.678 12.678 xc_vxc_pw_create 45 11.4 0.663 0.663 12.677 12.677 xc_pw_derive 270 13.4 0.002 0.002 9.053 9.053 fft3d_s 802 15.6 7.644 7.644 7.652 7.652 qs_rho_update_rho_low 45 7.9 0.000 0.000 6.967 6.967 calculate_rho_elec 45 8.9 0.562 0.562 6.967 6.967 xc_rho_set_and_dset_create 45 12.4 0.535 0.535 6.488 6.488 xc_pw_divergence 45 12.4 0.001 0.001 5.467 5.467 pw_scatter_s 429 15.8 5.233 5.233 5.233 5.233 qmmm_forces 6 3.8 0.002 0.002 5.212 5.212 qmmm_forces_with_gaussian 6 4.8 0.028 0.028 4.874 4.874 pw_integral_ab 2539 7.4 4.443 4.443 4.443 4.443 qs_ks_ddapc 45 10.4 0.001 0.001 4.163 4.163 qmmm_force_with_gaussian_low 6 5.8 0.000 0.000 4.147 4.147 init_scf_loop 6 6.8 0.000 0.000 3.546 3.546 qs_ks_update_qs_env_forces 6 4.8 0.000 0.000 3.421 3.421 qmmm_forces_gaussian_low_G 6 6.8 3.414 3.414 3.414 3.414 sum_up_and_integrate 45 10.4 0.329 0.329 3.339 3.339 grid_collocate_task_list 45 9.9 3.236 3.236 3.236 3.236 density_rs2pw 45 9.9 0.001 0.001 3.168 3.168 integrate_v_rspace 45 11.4 0.006 0.006 3.011 3.011 cp_ddapc_apply_CD 45 11.4 0.004 0.004 2.556 2.556 ------------------------------------------------------------------------------- From /workspace/artifacts/MQAE_single_node_32mpi.out: ------------------------------------------------------------------------------- - - - T I M I N G - - - ------------------------------------------------------------------------------- SUBROUTINE CALLS ASD SELF TIME TOTAL TIME MAXIMUM AVERAGE MAXIMUM AVERAGE MAXIMUM CP2K 1 1.0 0.034 0.051 52.738 52.749 qs_mol_dyn_low 1 2.0 0.003 0.004 51.748 51.808 qs_forces 6 3.8 0.000 0.001 37.707 37.708 qs_energies 6 4.8 0.000 0.000 35.970 35.970 scf_env_do_scf 6 5.8 0.000 0.001 35.080 35.081 scf_env_do_scf_inner_loop 113 6.2 0.002 0.016 33.674 33.675 rebuild_ks_matrix 119 8.1 0.000 0.000 24.781 24.791 qs_ks_build_kohn_sham_matrix 119 9.1 0.015 0.017 24.781 24.790 qs_ks_update_qs_env 119 7.3 0.001 0.001 23.325 
23.334 velocity_verlet 5 3.0 0.002 0.004 21.777 21.781 pw_transfer 2446 12.3 0.209 0.231 16.485 16.935 fft_wrap_pw1pw2 2059 13.4 0.024 0.026 16.020 16.472 fft_wrap_pw1pw2_150 1321 14.9 1.317 1.430 15.492 16.009 qs_vxc_create 119 10.1 0.002 0.003 13.054 13.058 xc_vxc_pw_create 119 11.1 0.154 0.221 13.052 13.056 fft3d_ps 2059 15.4 7.033 7.664 11.856 12.487 qs_rho_update_rho_low 119 7.3 0.000 0.001 10.134 10.136 calculate_rho_elec 119 8.3 0.049 0.055 10.133 10.135 xc_pw_derive 714 13.1 0.011 0.013 9.779 10.112 sum_up_and_integrate 119 10.1 0.086 0.094 8.506 8.516 integrate_v_rspace 119 11.1 0.003 0.004 8.420 8.435 qmmm_forces 6 3.8 0.002 0.002 7.202 7.202 qmmm_forces_with_gaussian 6 4.8 0.011 0.013 6.949 7.094 xc_pw_divergence 119 12.1 0.005 0.006 6.378 6.610 xc_rho_set_and_dset_create 119 12.1 0.363 0.454 6.340 6.490 rs_pw_transfer 988 11.5 0.011 0.014 5.907 6.154 qmmm_el_coupling 6 3.8 0.000 0.000 5.975 6.086 qmmm_elec_with_gaussian 6 4.8 0.003 0.003 5.974 6.084 density_rs2pw 119 9.3 0.006 0.007 5.684 5.935 potential_pw2rs 119 12.1 0.006 0.006 4.830 4.844 grid_collocate_task_list 119 9.3 4.320 4.607 4.320 4.607 mp_alltoall_z22v 2059 17.4 3.311 4.010 3.311 4.010 qmmm_force_with_gaussian_low 6 5.8 0.000 0.000 3.821 3.860 grid_integrate_task_list 119 12.1 3.369 3.554 3.369 3.554 qmmm_elec_with_gaussian_low 6 5.8 0.000 0.000 3.290 3.317 qmmm_forces_gaussian_low_G 6 6.8 3.142 3.179 3.142 3.179 x_to_yz 1095 16.8 0.870 0.983 2.640 2.983 qmmm_elec_gaussian_low_G 6 6.8 2.701 2.726 2.701 2.726 pw_restrict_s3 18 5.8 1.352 1.372 2.522 2.611 mp_waitany 4028 12.8 2.146 2.507 2.146 2.507 yz_to_x 964 16.0 0.577 0.763 2.119 2.470 rs_pw_transfer_PW2RS_150 125 13.9 0.766 0.819 2.237 2.277 qmmm_elec_with_gaussian:spline 6 5.8 0.000 0.000 2.028 2.138 pw_prolongate_s3 18 6.8 1.113 1.156 2.028 2.138 rs_pw_transfer_RS2PW_150 125 11.2 0.594 0.739 1.789 2.020 qs_ks_ddapc 119 10.1 0.002 0.002 1.936 2.005 qs_scf_new_mos 113 7.2 0.000 0.000 1.939 1.944 qs_scf_loop_do_ot 113 8.2 0.000 0.000 
1.938 1.944 pw_gather_p 964 15.0 1.266 1.926 1.266 1.926 ot_scf_mini 113 9.2 0.001 0.001 1.855 1.859 dbcsr_multiply_generic 2588 12.3 0.056 0.058 1.762 1.789 mp_waitall_1 188862 16.2 1.633 1.753 1.633 1.753 pw_scatter_p 1095 15.8 1.527 1.556 1.527 1.556 pw_integral_ab 2761 7.7 1.252 1.320 1.446 1.541 qs_ks_update_qs_env_forces 6 4.8 0.000 0.000 1.466 1.467 init_scf_loop 6 6.8 0.000 0.000 1.404 1.404 mp_sum_dm3 33 5.7 1.154 1.190 1.154 1.190 ot_mini 113 10.2 0.001 0.001 1.163 1.168 ------------------------------------------------------------------------------- Plot: name="MQAE_single_node_timings_32omp", title="Timings of MQAE_single_node with 32 OpenMP Threads", ylabel="time [s]" PlotPoint: plot="MQAE_single_node_timings_32omp", name="rest", label="rest", y=21.09899999999999, yerr=0.0 PlotPoint: plot="MQAE_single_node_timings_32omp", name="qmmm_elec_gaussian_low_G", label="qmmm_elec_gaussian_low_G", y=82.057, yerr=0.0 PlotPoint: plot="MQAE_single_node_timings_32omp", name="fft3d_s", label="fft3d_s", y=7.644, yerr=0.0 PlotPoint: plot="MQAE_single_node_timings_32omp", name="pw_scatter_s", label="pw_scatter_s", y=5.233, yerr=0.0 PlotPoint: plot="MQAE_single_node_timings_32omp", name="pw_integral_ab", label="pw_integral_ab", y=4.443, yerr=0.0 PlotPoint: plot="MQAE_single_node_timings_32omp", name="qmmm_forces_gaussian_low_G", label="qmmm_forces_gaussian_low_G", y=3.414, yerr=0.0 PlotPoint: plot="MQAE_single_node_timings_32omp", name="grid_collocate_task_list", label="grid_collocate_task_list", y=3.236, yerr=0.0 PlotPoint: plot="MQAE_single_node_timings_32omp", name="grid_integrate_task_list", label="grid_integrate_task_list", y=0.0, yerr=0.0 PlotPoint: plot="MQAE_single_node_timings_32omp", name="fft3d_ps", label="fft3d_ps", y=0.0, yerr=0.0 PlotPoint: plot="MQAE_single_node_timings_32omp", name="mp_alltoall_z22v", label="mp_alltoall_z22v", y=0.0, yerr=0.0 Plot: name="MQAE_single_node_timings_32mpi", title="Timings of MQAE_single_node with 32 MPI Ranks", ylabel="time 
[s]" PlotPoint: plot="MQAE_single_node_timings_32mpi", name="rest", label="rest", y=27.61, yerr=0.0 PlotPoint: plot="MQAE_single_node_timings_32mpi", name="qmmm_elec_gaussian_low_G", label="qmmm_elec_gaussian_low_G", y=2.701, yerr=0.0 PlotPoint: plot="MQAE_single_node_timings_32mpi", name="fft3d_s", label="fft3d_s", y=0.0, yerr=0.0 PlotPoint: plot="MQAE_single_node_timings_32mpi", name="pw_scatter_s", label="pw_scatter_s", y=0.0, yerr=0.0 PlotPoint: plot="MQAE_single_node_timings_32mpi", name="pw_integral_ab", label="pw_integral_ab", y=1.252, yerr=0.0 PlotPoint: plot="MQAE_single_node_timings_32mpi", name="qmmm_forces_gaussian_low_G", label="qmmm_forces_gaussian_low_G", y=3.142, yerr=0.0 PlotPoint: plot="MQAE_single_node_timings_32mpi", name="grid_collocate_task_list", label="grid_collocate_task_list", y=4.32, yerr=0.0 PlotPoint: plot="MQAE_single_node_timings_32mpi", name="grid_integrate_task_list", label="grid_integrate_task_list", y=3.369, yerr=0.0 PlotPoint: plot="MQAE_single_node_timings_32mpi", name="fft3d_ps", label="fft3d_ps", y=7.033, yerr=0.0 PlotPoint: plot="MQAE_single_node_timings_32mpi", name="mp_alltoall_z22v", label="mp_alltoall_z22v", y=3.311, yerr=0.0 Summary: Performance test took 36 minutes. Status: OK Removing intermediate container 784d4539344d ---> 85349116b001 Step 41/42 : CMD cat $(find ./report.log -mmin +10) | sed '/^Summary:/ s/$/ (cached)/' ---> Running in 9f188ac97899 Removing intermediate container 9f188ac97899 ---> ad4eac38bcc4 Step 42/42 : ENTRYPOINT [] ---> Running in a97f3c0d507d Removing intermediate container a97f3c0d507d ---> 672de6fc085a [Warning] One or more build-args [GIT_COMMIT_SHA] were not consumed Successfully built 672de6fc085a Successfully tagged gcr.io/cp2k-org-project/img_cp2k-perf-openmp-arch-be5:master Pushing new image... done. #################### Running Image cp2k-perf-openmp #################### Uploading artifacts... done EndDate: 2022-10-27 12:05:58+00:00