StartDate: 2022-09-05 19:05:59+00:00
CpuId: 32x AMD (unknown model) [Zen 3], 7nm (SMT disabled)
CommitSHA: 98100e47df789b3bcb3b9f7b71530128d5284e8b
CommitTime: 2022-09-05 19:11:34 +0200
CommitAuthor: Matthias Krack
CommitSubject: Fix folder name
Populating docker build cache... done.
#################### Building Image cp2k-perf-openmp ####################
Dockerfile: /tools/docker/Dockerfile.test_performance
Build-Path: /
Build-Args: GIT_COMMIT_SHA=98100e47df789b3bcb3b9f7b71530128d5284e8b
Sending build context to Docker daemon 364.3MB
Step 1/42 : FROM ubuntu:22.04
22.04: Pulling from library/ubuntu
2b55860d4c66: Already exists
Digest: sha256:20fa2d7bb4de7723f542be5923b06c4d704370f0390e4ae9e1c833c8785644c1
Status: Downloaded newer image for ubuntu:22.04
 ---> 2dc39ba059dc
Step 2/42 : WORKDIR /opt/cp2k-toolchain
 ---> Using cache
 ---> bf0c853ea628
Step 3/42 : COPY ./tools/toolchain/install_requirements*.sh ./
 ---> Using cache
 ---> 3f014c28f167
Step 4/42 : RUN ./install_requirements.sh ubuntu:22.04
 ---> Using cache
 ---> 9e0fed4f46ea
Step 5/42 : RUN mkdir scripts
 ---> Using cache
 ---> 51c38e746560
Step 6/42 : COPY ./tools/toolchain/scripts/VERSION ./tools/toolchain/scripts/parse_if.py ./tools/toolchain/scripts/tool_kit.sh ./tools/toolchain/scripts/common_vars.sh ./tools/toolchain/scripts/signal_trap.sh ./tools/toolchain/scripts/get_openblas_arch.sh ./scripts/
 ---> Using cache
 ---> 2dae48987405
Step 7/42 : COPY ./tools/toolchain/install_cp2k_toolchain.sh .
 ---> Using cache
 ---> 3355c055d187
Step 8/42 : RUN ./install_cp2k_toolchain.sh --install-all --mpi-mode=mpich --with-gcc=system --dry-run
 ---> Using cache
 ---> 63ede7454394
Step 9/42 : COPY ./tools/toolchain/scripts/stage0/ ./scripts/stage0/
 ---> Using cache
 ---> a07d07c7de8d
Step 10/42 : RUN ./scripts/stage0/install_stage0.sh && rm -rf ./build
 ---> Using cache
 ---> 7ae1dc1b8967
Step 11/42 : COPY ./tools/toolchain/scripts/stage1/ ./scripts/stage1/
 ---> Using cache
 ---> 1816082fe0dd
Step 12/42 : RUN ./scripts/stage1/install_stage1.sh && rm -rf ./build
 ---> Using cache
 ---> 2f671e6ce544
Step 13/42 : COPY ./tools/toolchain/scripts/stage2/ ./scripts/stage2/
 ---> Using cache
 ---> 7388eb22d3a9
Step 14/42 : RUN ./scripts/stage2/install_stage2.sh && rm -rf ./build
 ---> Using cache
 ---> 598ede48d900
Step 15/42 : COPY ./tools/toolchain/scripts/stage3/ ./scripts/stage3/
 ---> Using cache
 ---> 2466948c5c56
Step 16/42 : RUN ./scripts/stage3/install_stage3.sh && rm -rf ./build
 ---> Using cache
 ---> 06f90dd3dfcc
Step 17/42 : COPY ./tools/toolchain/scripts/stage4/ ./scripts/stage4/
 ---> Using cache
 ---> bc70e277997b
Step 18/42 : RUN ./scripts/stage4/install_stage4.sh && rm -rf ./build
 ---> Using cache
 ---> 39d418a0e1db
Step 19/42 : COPY ./tools/toolchain/scripts/stage5/ ./scripts/stage5/
 ---> 49aea900fab1
Step 20/42 : RUN ./scripts/stage5/install_stage5.sh && rm -rf ./build
 ---> Running in e988abddd8e0
==================== Installing ELPA ====================
elpa-2021.11.002.tar.gz: OK
Checksum of elpa-2021.11.002.tar.gz Ok
patching file nvcc_wrap
Installing from scratch into /opt/cp2k-toolchain/install/elpa-2021.11.002/cpu
Step elpa took 135.00 seconds.
==================== Installing PT-Scotch ====================
scotch_6.0.0.tar.gz: OK
Checksum of scotch_6.0.0.tar.gz Ok
Installing from scratch into /opt/cp2k-toolchain/install/scotch-6.0.0
Step ptscotch took 5.00 seconds.
==================== Installing SuperLU_DIST ====================
superlu_dist_6.1.0.tar.gz: OK
Checksum of superlu_dist_6.1.0.tar.gz Ok
Installing from scratch into /opt/cp2k-toolchain/install/superlu_dist-6.1.0
Step superlu took 7.00 seconds.
==================== Installing PEXSI ====================
pexsi_v1.2.0.tar.gz: OK
Checksum of pexsi_v1.2.0.tar.gz Ok
Installing from scratch into /opt/cp2k-toolchain/install/pexsi-1.2.0
Step pexsi took 52.00 seconds.
Removing intermediate container e988abddd8e0
 ---> afca7a64a49e
Step 21/42 : COPY ./tools/toolchain/scripts/stage6/ ./scripts/stage6/
 ---> 11b2c09cc9e5
Step 22/42 : RUN ./scripts/stage6/install_stage6.sh && rm -rf ./build
 ---> Running in ca835494a447
==================== Installing QUIP ====================
QUIP-b4336484fb65b0e73211a8f920ae4361c7c353fd.tar.gz: OK
Checksum of QUIP-b4336484fb65b0e73211a8f920ae4361c7c353fd.tar.gz Ok
Installing from scratch into /opt/cp2k-toolchain/install/quip-b4336484fb65b0e73211a8f920ae4361c7c353fd
Step quip took 199.00 seconds.
==================== Installing gsl ====================
gsl-2.7.tar.gz: OK
Checksum of gsl-2.7.tar.gz Ok
Installing from scratch into /opt/cp2k-toolchain/install/gsl-2.7
Step gsl took 40.00 seconds.
==================== Installing PLUMED ====================
plumed-src-2.8.0.tgz: OK
Checksum of plumed-src-2.8.0.tgz Ok
Installing from scratch into /opt/cp2k-toolchain/install/plumed-2.8.0
Step plumed took 44.00 seconds.
Removing intermediate container ca835494a447
 ---> edb32ead46fe
Step 23/42 : COPY ./tools/toolchain/scripts/stage7/ ./scripts/stage7/
 ---> 3c5f5ec960e3
Step 24/42 : RUN ./scripts/stage7/install_stage7.sh && rm -rf ./build
 ---> Running in ee1effbbf936
==================== Installing hdf5 ====================
hdf5-1.12.0.tar.bz2: OK
Checksum of hdf5-1.12.0.tar.bz2 Ok
Installing from scratch into /opt/cp2k-toolchain/install/hdf5-1.12.0
Step hdf5 took 83.00 seconds.
==================== Installing libvdwxc ====================
libvdwxc-0.4.0.tar.gz: OK
Checksum of libvdwxc-0.4.0.tar.gz Ok
Installing from scratch into /opt/cp2k-toolchain/install/libvdwxc-0.4.0
Step libvdwxc took 25.00 seconds.
==================== Installing spglib ====================
spglib-1.16.2.tar.gz: OK
Checksum of spglib-1.16.2.tar.gz Ok
Installing from scratch into /opt/cp2k-toolchain/install/spglib-1.16.2
Step spglib took 8.00 seconds.
==================== Installing libvori ====================
libvori-220621.tar.gz: OK
Checksum of libvori-220621.tar.gz Ok
Installing from scratch into /opt/cp2k-toolchain/install/libvori-220621
Step libvori took 15.00 seconds.
Removing intermediate container ee1effbbf936
 ---> 488b39985ea3
Step 25/42 : COPY ./tools/toolchain/scripts/stage8/ ./scripts/stage8/
 ---> acf2466d1518
Step 26/42 : RUN ./scripts/stage8/install_stage8.sh && rm -rf ./build
 ---> Running in edb1c0a87754
==================== Installing spfft ====================
SpFFT-1.0.6.tar.gz: OK
Checksum of SpFFT-1.0.6.tar.gz Ok
Installing from scratch into /opt/cp2k-toolchain/install/SpFFT-1.0.6
Step spfft took 5.00 seconds.
==================== Installing spla ====================
SpLA-1.5.4.tar.gz: OK
Checksum of SpLA-1.5.4.tar.gz Ok
Installing from scratch into /opt/cp2k-toolchain/install/SpLA-1.5.4
Step spla took 4.00 seconds.
==================== Installing SIRIUS ====================
SIRIUS-7.3.2.tar.gz: OK
Checksum of SIRIUS-7.3.2.tar.gz Ok
Installing from scratch into /opt/cp2k-toolchain/install/sirius-7.3.2
Step sirius took 81.00 seconds.
Removing intermediate container edb1c0a87754
 ---> 24160d1ef63e
Step 27/42 : COPY ./tools/toolchain/scripts/arch_base.tmpl ./tools/toolchain/scripts/generate_arch_files.sh ./scripts/
 ---> 02b1ac7a24fe
Step 28/42 : RUN ./scripts/generate_arch_files.sh && rm -rf ./build
 ---> Running in 8493a86b4ce3
==================== generating arch files ====================
arch files can be found in the /opt/cp2k-toolchain/install/arch subdirectory
Wrote /opt/cp2k-toolchain/install/arch/local.ssmp
Wrote /opt/cp2k-toolchain/install/arch/local_static.ssmp
Wrote /opt/cp2k-toolchain/install/arch/local.sdbg
Wrote /opt/cp2k-toolchain/install/arch/local_coverage.sdbg
Wrote /opt/cp2k-toolchain/install/arch/local.psmp
Wrote /opt/cp2k-toolchain/install/arch/local.pdbg
Wrote /opt/cp2k-toolchain/install/arch/local_static.psmp
Wrote /opt/cp2k-toolchain/install/arch/local_warn.psmp
Wrote /opt/cp2k-toolchain/install/arch/local_coverage.pdbg
========================== usage =========================
Done!
Now copy:
  cp /opt/cp2k-toolchain/install/arch/* to the cp2k/arch/ directory
To use the installed tools and libraries and cp2k version compiled with it you will first need to execute at the prompt:
  source /opt/cp2k-toolchain/install/setup
To build CP2K you should change directory:
  cd cp2k/
  make -j 32 ARCH=local VERSION="ssmp sdbg psmp pdbg"
arch files for GPU enabled CUDA versions are named "local_cuda.*"
arch files for GPU enabled HIP versions are named "local_hip.*"
arch files for OpenCL (GPU) versions are named "local_opencl.*"
arch files for coverage versions are named "local_coverage.*"
Note that these pre-built arch files are for the GNU compiler, users have to adapt them for other compilers.
It is possible to use the provided CP2K arch files as guidance.
Removing intermediate container 8493a86b4ce3
 ---> 9a2e1b348e4a
Step 29/42 : WORKDIR /opt/cp2k
 ---> Running in a9cc3dc032b5
Removing intermediate container a9cc3dc032b5
 ---> 85af92b0e7f6
Step 30/42 : COPY ./Makefile .
 ---> 6fd9fdbf4ce6
Step 31/42 : COPY ./src ./src
 ---> 034fe0f2fddd
Step 32/42 : COPY ./exts ./exts
 ---> 85c765ab808e
Step 33/42 : COPY ./tools/build_utils ./tools/build_utils
 ---> d9c8e1be30d9
Step 34/42 : RUN /bin/bash -c " mkdir -p arch && ln -vs /opt/cp2k-toolchain/install/arch/local.psmp ./arch/ && echo 'Compiling cp2k...' && source /opt/cp2k-toolchain/install/setup && ( make -j ARCH=local VERSION=psmp &> /dev/null || true ) && ( [ ! -f ./exe/local/cp2k.psmp ] || ldd ./exe/local/cp2k.psmp | grep -q libmpi )"
 ---> Running in 68f0c607f22c
'./arch/local.psmp' -> '/opt/cp2k-toolchain/install/arch/local.psmp'
Compiling cp2k...
Removing intermediate container 68f0c607f22c
 ---> 1cc87ee602e8
Step 35/42 : COPY ./data ./data
 ---> f13f47c89572
Step 36/42 : COPY ./tests ./tests
 ---> 585a5c4e0964
Step 37/42 : COPY ./tools/regtesting ./tools/regtesting
 ---> 498102a0c47e
Step 38/42 : COPY ./benchmarks ./benchmarks
 ---> eed7115c81dc
Step 39/42 : COPY ./tools/docker/scripts/test_performance.sh ./tools/docker/scripts/plot_performance.py ./
 ---> b021797e37e3
Step 40/42 : RUN ./test_performance.sh "local" 2>&1 | tee report.log
 ---> Running in 60d5db1f3194
========== Compiling CP2K ==========
Compiling cp2k... done.
Checking benchmark inputs... Found 60 input files and 0 errors.
========== Running Performance Test ==========
Running H2O-64.inp with 1 threads and 32 ranks... done.
Running H2O-64.inp with 32 threads and 1 ranks... done.
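Each row of the TIMING tables below has a routine name followed by six numbers. The first is CALLS, the maximum call count over MPI ranks. The second is ASD, which I take to be the average call-stack depth. The remaining four are SELF TIME and TOTAL TIME, each given as an average and a maximum over ranks; TOTAL includes time spent in callees. The Python sketch below splits one row into these fields. It assumes the seven-field, whitespace-separated layout shown here. `TimingRow` and `parse_row` are hypothetical names and are not part of CP2K or plot_performance.py.

```python
from typing import NamedTuple


class TimingRow(NamedTuple):
    """One row of a CP2K TIMING table (hypothetical helper type)."""
    name: str
    calls_max: int    # CALLS, maximum over MPI ranks
    asd: float        # ASD column (assumed: average stack depth)
    self_avg: float   # SELF TIME, average over ranks [s]
    self_max: float   # SELF TIME, maximum over ranks [s]
    total_avg: float  # TOTAL TIME (including callees), average over ranks [s]
    total_max: float  # TOTAL TIME, maximum over ranks [s]


def parse_row(line: str) -> TimingRow:
    """Split a whitespace-separated timing row into typed fields."""
    name, calls, asd, *times = line.split()
    if len(times) != 4:
        raise ValueError(f"expected 4 time columns, got {len(times)}: {line!r}")
    self_avg, self_max, total_avg, total_max = map(float, times)
    return TimingRow(name, int(calls), float(asd),
                     self_avg, self_max, total_avg, total_max)


row = parse_row("mp_waitall_1 169478 16.3 8.734 9.817 8.734 9.817")
print(row.calls_max, row.self_avg, row.self_max)  # 169478 8.734 9.817
```

The 32-thread runs use a single MPI rank. In those tables AVERAGE and MAXIMUM are therefore identical in every row.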
From /workspace/artifacts/H2O-64_32omp.out:
-------------------------------------------------------------------------------
-                                                                             -
-                                T I M I N G                                  -
-                                                                             -
-------------------------------------------------------------------------------
SUBROUTINE                       CALLS  ASD       SELF TIME      TOTAL TIME
                               MAXIMUM      AVERAGE MAXIMUM AVERAGE MAXIMUM
CP2K                                 1  1.0   0.030   0.030  90.714  90.714
qs_mol_dyn_low                       1  2.0   0.003   0.003  90.118  90.118
qs_forces                           11  3.9   0.001   0.001  90.079  90.079
qs_energies                         11  4.9   0.001   0.001  84.049  84.049
scf_env_do_scf                      11  5.9   0.001   0.001  73.220  73.220
velocity_verlet                     10  3.0   0.002   0.002  57.143  57.143
scf_env_do_scf_inner_loop          108  6.5   0.011   0.011  56.451  56.451
qs_scf_new_mos                     108  7.5   0.001   0.001  22.007  22.007
qs_scf_loop_do_ot                  108  8.5   0.001   0.001  22.006  22.006
ot_scf_mini                        108  9.5   0.002   0.002  20.621  20.621
rebuild_ks_matrix                  119  8.3   0.001   0.001  20.259  20.259
qs_ks_build_kohn_sham_matrix       119  9.3   0.012   0.012  20.258  20.258
dbcsr_multiply_generic            2286 12.5   0.159   0.159  19.545  19.545
qs_rho_update_rho_low              119  7.7   0.001   0.001  19.482  19.482
calculate_rho_elec                 119  8.7   0.952   0.952  19.481  19.481
qs_ks_update_qs_env                119  7.6   0.001   0.001  18.669  18.669
init_scf_loop                       11  6.9   0.000   0.000  16.645  16.645
grid_collocate_task_list           119  9.7  14.952  14.952  14.952  14.952
prepare_preconditioner              11  7.9   0.000   0.000  14.259  14.259
make_preconditioner                 11  8.9   0.000   0.000  14.259  14.259
make_full_inverse_cholesky          11  9.9   0.000   0.000  13.139  13.139
sum_up_and_integrate               119 10.3   0.189   0.189  12.559  12.559
integrate_v_rspace                 119 11.3   0.093   0.093  12.370  12.370
ot_mini                            108 10.5   0.001   0.001  11.745  11.745
make_m2s                          4572 13.5   0.047   0.047  10.816  10.816
grid_integrate_task_list           119 12.3  10.423  10.423  10.423  10.423
qs_energies_init_hamiltonians       11  5.9   0.000   0.000   6.442   6.442
qs_ot_get_derivative               108 11.5   0.001   0.001   6.212   6.212
pw_transfer                       1439 11.6   0.057   0.057   6.087   6.087
fft_wrap_pw1pw2                   1201 12.6   0.006   0.006   5.861   5.861
dbcsr_make_dense_low              5837 15.5   0.068   0.068   5.828   5.828
make_dense_data                   5837 16.5   5.156   5.156   5.746   5.746
ot_diis_step                       108 11.5   0.004   0.004   5.530   5.530
qs_ot_get_p                        119 10.4   0.001   0.001   5.428   5.428
make_images                       4572 14.5   2.054   2.054   5.358   5.358
dbcsr_make_images_dense           3978 14.8   0.018   0.018   5.095   5.095
apply_preconditioner_dbcsr         119 12.6   0.000   0.000   4.963   4.963
apply_single                       119 13.6   0.000   0.000   4.963   4.963
fft_wrap_pw1pw2_140                487 13.2   0.392   0.392   4.887   4.887
multiply_cannon                   2286 13.5   0.183   0.183   4.811   4.811
cp_fm_cholesky_decompose            22 10.9   4.802   4.802   4.802   4.802
multiply_cannon_loop              2286 14.5   0.088   0.088   4.321   4.321
cp_fm_cholesky_invert               11 10.9   4.235   4.235   4.235   4.235
multiply_cannon_multrec           2286 15.5   4.182   4.182   4.232   4.232
qs_ot_p2m_diag                      50 11.0   0.154   0.154   4.169   4.169
init_scf_run                        11  5.9   0.002   0.002   3.801   3.801
scf_env_initial_rho_setup           11  6.9   0.001   0.001   3.799   3.799
cp_dbcsr_syevd                      50 12.0   0.002   0.002   3.770   3.770
dbcsr_copy                        2102 12.0   0.225   0.225   3.719   3.719
cp_fm_diag_elpa                     50 13.0   0.000   0.000   3.658   3.658
cp_fm_diag_elpa_base                50 14.0   3.597   3.597   3.657   3.657
dbcsr_complete_redistribute        329 12.2   1.959   1.959   3.634   3.634
density_rs2pw                      119  9.7   0.004   0.004   3.577   3.577
build_core_hamiltonian_matrix_      11  4.9   0.000   0.000   3.498   3.498
qs_env_update_s_mstruct             11  6.9   0.000   0.000   3.462   3.462
dbcsr_copy_into_existing            22  7.9   3.462   3.462   3.462   3.462
wfi_extrapolate                     11  7.9   0.001   0.001   3.284   3.284
qs_create_task_list                 11  7.9   0.000   0.000   3.065   3.065
generate_qs_task_list               11  8.9   2.127   2.127   3.065   3.065
copy_dbcsr_to_fm                   153 11.3   0.002   0.002   2.965   2.965
fft3d_s                           1202 14.6   2.953   2.953   2.958   2.958
build_core_hamiltonian_matrix       11  6.9   0.001   0.001   2.560   2.560
qs_ks_update_qs_env_forces          11  4.9   0.000   0.000   2.530   2.530
pw_poisson_solve                   119 10.3   0.858   0.858   2.354   2.354
transfer_dbcsr_to_fm                11 10.9   0.000   0.000   2.302   2.302
qs_ot_get_derivative_diag           49 12.0   0.001   0.001   1.979   1.979
qs_ot_get_derivative_taylor         59 13.0   0.001   0.001   1.911   1.911
dbcsr_data_release              279534 16.0   1.898   1.898   1.898   1.898
potential_pw2rs                    119 12.3   0.045   0.045   1.853   1.853
-------------------------------------------------------------------------------
From /workspace/artifacts/H2O-64_32mpi.out:
-------------------------------------------------------------------------------
-                                                                             -
-                                T I M I N G                                  -
-                                                                             -
-------------------------------------------------------------------------------
SUBROUTINE                       CALLS  ASD       SELF TIME      TOTAL TIME
                               MAXIMUM      AVERAGE MAXIMUM AVERAGE MAXIMUM
CP2K                                 1  1.0   0.007   0.023  49.052  49.062
qs_mol_dyn_low                       1  2.0   0.003   0.007  48.944  48.954
qs_forces                           11  3.9   0.001   0.001  48.898  48.898
qs_energies                         11  4.9   0.001   0.002  45.573  45.575
scf_env_do_scf                      11  5.9   0.000   0.002  41.828  41.829
scf_env_do_scf_inner_loop          108  6.5   0.002   0.017  38.670  38.670
velocity_verlet                     10  3.0   0.001   0.003  29.265  29.266
rebuild_ks_matrix                  119  8.3   0.000   0.001  18.388  18.546
qs_ks_build_kohn_sham_matrix       119  9.3   0.014   0.020  18.388  18.545
qs_ks_update_qs_env                119  7.6   0.001   0.001  16.393  16.529
dbcsr_multiply_generic            2286 12.5   0.069   0.082  13.880  15.751
qs_rho_update_rho_low              119  7.7   0.001   0.001  14.242  14.253
calculate_rho_elec                 119  8.7   0.029   0.031  14.241  14.253
sum_up_and_integrate               119 10.3   0.017   0.019  14.060  14.151
integrate_v_rspace                 119 11.3   0.005   0.015  14.044  14.137
qs_scf_new_mos                     108  7.5   0.001   0.001  11.015  11.126
qs_scf_loop_do_ot                  108  8.5   0.001   0.001  11.015  11.126
grid_collocate_task_list           119  9.7   8.996  10.812   8.996  10.812
multiply_cannon                   2286 13.5   0.131   0.143  10.100  10.599
grid_integrate_task_list           119 12.3   8.085  10.539   8.085  10.539
ot_scf_mini                        108  9.5   0.002   0.002  10.358  10.460
multiply_cannon_loop              2286 14.5   0.091   0.107   9.546  10.022
mp_waitall_1                    169478 16.3   8.734   9.817   8.734   9.817
multiply_cannon_metrocomm3       18288 15.5   0.037   0.046   5.724   6.704
rs_pw_transfer                     974 11.9   0.010   0.012   6.060   6.529
ot_mini                            108 10.5   0.001   0.001   6.047   6.156
density_rs2pw                      119  9.7   0.005   0.006   4.894   5.356
potential_pw2rs                    119 12.3   0.006   0.007   3.391   3.427
qs_ot_get_derivative               108 11.5   0.001   0.001   3.178   3.272
mp_waitany                        9880 13.7   2.764   3.266   2.764   3.266
pw_transfer                       1439 11.6   0.085   0.100   3.193   3.251
multiply_cannon_multrec          18288 15.5   2.943   3.212   2.953   3.221
mp_alltoall_d11v                  2130 13.8   2.765   3.188   2.765   3.188
init_scf_loop                       11  6.9   0.000   0.000   3.146   3.147
fft_wrap_pw1pw2                   1201 12.6   0.009   0.010   3.046   3.090
rs_gather_matrices                 119 12.3   0.077   0.087   2.531   2.983
rs_pw_transfer_RS2PW_140           130 11.5   0.277   0.346   2.473   2.952
apply_preconditioner_dbcsr         119 12.6   0.000   0.000   2.763   2.850
apply_single                       119 13.6   0.000   0.000   2.762   2.850
ot_diis_step                       108 11.5   0.003   0.004   2.842   2.843
fft_wrap_pw1pw2_140                487 13.2   0.246   0.298   2.544   2.642
mp_sum_l                         11218 13.2   0.998   2.628   0.998   2.628
init_scf_run                        11  5.9   0.000   0.004   2.588   2.588
scf_env_initial_rho_setup           11  6.9   0.000   0.003   2.588   2.588
make_m2s                          4572 13.5   0.044   0.055   2.351   2.423
wfi_extrapolate                     11  7.9   0.001   0.001   2.361   2.362
fft3d_ps                          1201 14.6   1.135   1.264   2.196   2.295
qs_ks_update_qs_env_forces          11  4.9   0.000   0.000   2.129   2.151
make_images                       4572 14.5   0.117   0.137   2.029   2.101
qs_ot_get_p                        119 10.4   0.001   0.001   1.395   1.535
rs_pw_transfer_PW2RS_140           130 13.9   0.520   0.659   1.383   1.450
mp_sum_d                          4129 12.0   0.915   1.257   0.915   1.257
make_images_data                  4572 15.5   0.034   0.047   1.079   1.246
build_core_hamiltonian_matrix_      11  4.9   0.000   0.000   0.865   1.163
prepare_preconditioner              11  7.9   0.000   0.000   1.133   1.146
make_preconditioner                 11  8.9   0.000   0.000   1.133   1.146
multiply_cannon_metrocomm1       18288 15.5   0.018   0.025   0.582   1.092
qs_ot_get_derivative_taylor         59 13.0   0.001   0.001   1.024   1.084
mp_alltoall_z22v                  1201 16.6   0.826   1.075   0.826   1.075
qs_ot_get_derivative_diag           49 12.0   0.001   0.001   1.022   1.061
make_full_inverse_cholesky          11  9.9   0.000   0.000   1.028   1.046
hybrid_alltoall_any               4725 16.4   0.059   0.188   0.932   1.040
rs_pw_transfer_PW2RS_50            119 14.3   0.266   0.318   0.884   1.025
-------------------------------------------------------------------------------
Plot: name="H2O-64_timings_32omp", title="Timings of H2O-64 with 32 OpenMP Threads", ylabel="time [s]"
PlotPoint: plot="H2O-64_timings_32omp", name="rest", label="rest", y=46.964, yerr=0.0
PlotPoint: plot="H2O-64_timings_32omp", name="grid_collocate_task_list", label="grid_collocate_task_list", y=14.952, yerr=0.0
PlotPoint: plot="H2O-64_timings_32omp", name="grid_integrate_task_list", label="grid_integrate_task_list", y=10.423, yerr=0.0
PlotPoint: plot="H2O-64_timings_32omp", name="make_dense_data", label="make_dense_data", y=5.156, yerr=0.0
PlotPoint: plot="H2O-64_timings_32omp", name="cp_fm_cholesky_decompose", label="cp_fm_cholesky_decompose", y=4.802, yerr=0.0
PlotPoint: plot="H2O-64_timings_32omp", name="cp_fm_cholesky_invert", label="cp_fm_cholesky_invert", y=4.235, yerr=0.0
PlotPoint: plot="H2O-64_timings_32omp", name="multiply_cannon_multrec", label="multiply_cannon_multrec", y=4.182, yerr=0.0
PlotPoint: plot="H2O-64_timings_32omp", name="mp_alltoall_d11v", label="mp_alltoall_d11v", y=0.0, yerr=0.0
PlotPoint: plot="H2O-64_timings_32omp", name="mp_waitall_1", label="mp_waitall_1", y=0.0, yerr=0.0
Plot: name="H2O-64_timings_32mpi", title="Timings of H2O-64 with 32 MPI Ranks", ylabel="time [s]"
PlotPoint: plot="H2O-64_timings_32mpi", name="rest", label="rest", y=17.528999999999996, yerr=0.0
PlotPoint: plot="H2O-64_timings_32mpi", name="grid_collocate_task_list", label="grid_collocate_task_list", y=8.996, yerr=0.0
PlotPoint: plot="H2O-64_timings_32mpi", name="grid_integrate_task_list", label="grid_integrate_task_list", y=8.085, yerr=0.0
PlotPoint: plot="H2O-64_timings_32mpi", name="make_dense_data", label="make_dense_data", y=0.0, yerr=0.0
PlotPoint: plot="H2O-64_timings_32mpi", name="cp_fm_cholesky_decompose", label="cp_fm_cholesky_decompose", y=0.0, yerr=0.0
PlotPoint: plot="H2O-64_timings_32mpi", name="cp_fm_cholesky_invert", label="cp_fm_cholesky_invert", y=0.0, yerr=0.0
PlotPoint: plot="H2O-64_timings_32mpi", name="multiply_cannon_multrec", label="multiply_cannon_multrec", y=2.943, yerr=0.0
PlotPoint: plot="H2O-64_timings_32mpi", name="mp_alltoall_d11v", label="mp_alltoall_d11v", y=2.765, yerr=0.0
PlotPoint: plot="H2O-64_timings_32mpi", name="mp_waitall_1", label="mp_waitall_1", y=8.734, yerr=0.0
Running H2O-64_nonortho.inp with 1 threads and 32 ranks... done.
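In the H2O-64 plots above, the `rest` bar seems to be computed as follows: take the run's CP2K TOTAL TIME (AVERAGE) and subtract the SELF TIME (AVERAGE) of every routine that gets its own bar. A routine that does not appear in a run's table counts as 0.0. I inferred this from the logged numbers; it is not taken from plot_performance.py. The Python check below reproduces both `rest` values. `PLOTTED` and `rest_time` are hypothetical names.

```python
# Routines that get their own bar in the H2O-64 plots above.
PLOTTED = [
    "grid_collocate_task_list", "grid_integrate_task_list", "make_dense_data",
    "cp_fm_cholesky_decompose", "cp_fm_cholesky_invert",
    "multiply_cannon_multrec", "mp_alltoall_d11v", "mp_waitall_1",
]


def rest_time(total_avg: float, self_avg: dict[str, float]) -> float:
    """Assumed rule: total time minus the self time of all plotted routines.

    Routines missing from a run's table count as 0.0. The result is rounded
    to the table's three decimals, which also hides float noise such as
    17.528999999999996.
    """
    attributed = sum(self_avg.get(name, 0.0) for name in PLOTTED)
    return round(total_avg - attributed, 3)


# SELF TIME AVERAGE values copied from the H2O-64 tables above.
omp_self = {
    "grid_collocate_task_list": 14.952, "grid_integrate_task_list": 10.423,
    "make_dense_data": 5.156, "cp_fm_cholesky_decompose": 4.802,
    "cp_fm_cholesky_invert": 4.235, "multiply_cannon_multrec": 4.182,
}
mpi_self = {
    "grid_collocate_task_list": 8.996, "grid_integrate_task_list": 8.085,
    "multiply_cannon_multrec": 2.943, "mp_alltoall_d11v": 2.765,
    "mp_waitall_1": 8.734,
}

print(rest_time(90.714, omp_self))  # 46.964
print(rest_time(49.052, mpi_self))  # 17.529
```

Applying the same rule to the other benchmarks also matches their logged `rest` values, for example 114.131 s total for H2O-64_nonortho with 32 OpenMP threads giving 48.599 s.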
Running H2O-64_nonortho.inp with 32 threads and 1 ranks... done.
From /workspace/artifacts/H2O-64_nonortho_32omp.out:
-------------------------------------------------------------------------------
-                                                                             -
-                                T I M I N G                                  -
-                                                                             -
-------------------------------------------------------------------------------
SUBROUTINE                       CALLS  ASD       SELF TIME      TOTAL TIME
                               MAXIMUM      AVERAGE MAXIMUM AVERAGE MAXIMUM
CP2K                                 1  1.0   0.032   0.032 114.131 114.131
qs_mol_dyn_low                       1  2.0   0.003   0.003 113.484 113.484
qs_forces                           11  3.9   0.001   0.001 113.445 113.445
qs_energies                         11  4.9   0.001   0.001 105.816 105.816
scf_env_do_scf                      11  5.9   0.001   0.001  92.982  92.982
scf_env_do_scf_inner_loop           96  6.5   0.010   0.010  74.366  74.366
velocity_verlet                     10  3.0   0.002   0.002  72.162  72.162
rebuild_ks_matrix                  107  8.3   0.001   0.001  33.521  33.521
qs_ks_build_kohn_sham_matrix       107  9.3   0.011   0.011  33.521  33.521
qs_rho_update_rho_low              107  7.7   0.001   0.001  31.306  31.306
calculate_rho_elec                 107  8.7   0.856   0.856  31.306  31.306
qs_ks_update_qs_env                107  7.6   0.001   0.001  30.188  30.188
grid_collocate_task_list           107  9.7  27.266  27.266  27.266  27.266
sum_up_and_integrate               107 10.3   0.169   0.169  26.566  26.566
integrate_v_rspace                 107 11.3   0.082   0.082  26.397  26.397
grid_integrate_task_list           107 12.3  24.618  24.618  24.618  24.618
qs_scf_new_mos                      96  7.5   0.001   0.001  19.772  19.772
qs_scf_loop_do_ot                   96  8.5   0.001   0.001  19.772  19.772
init_scf_loop                       11  6.9   0.000   0.000  18.467  18.467
ot_scf_mini                         96  9.5   0.002   0.002  18.379  18.379
dbcsr_multiply_generic            1966 12.4   0.139   0.139  17.889  17.889
prepare_preconditioner              11  7.9   0.000   0.000  14.350  14.350
make_preconditioner                 11  8.9   0.000   0.000  14.350  14.350
make_full_inverse_cholesky          11  9.9   0.000   0.000  13.202  13.202
ot_mini                             96 10.5   0.001   0.001  10.401  10.401
make_m2s                          3932 13.4   0.041   0.041   9.807   9.807
qs_energies_init_hamiltonians       11  5.9   0.000   0.000   6.675   6.675
init_scf_run                        11  5.9   0.002   0.002   5.509   5.509
scf_env_initial_rho_setup           11  6.9   0.001   0.001   5.508   5.508
pw_transfer                       1295 11.6   0.052   0.052   5.503   5.503
qs_ot_get_derivative                96 11.5   0.001   0.001   5.469   5.469
fft_wrap_pw1pw2                   1081 12.6   0.006   0.006   5.311   5.311
dbcsr_make_dense_low              4961 15.5   0.082   0.082   5.228   5.228
make_dense_data                   4961 16.5   4.613   4.613   5.134   5.134
ot_diis_step                        96 11.5   0.003   0.003   4.929   4.929
make_images                       3932 14.4   1.885   1.885   4.911   4.911
wfi_extrapolate                     11  7.9   0.001   0.001   4.880   4.880
cp_fm_cholesky_decompose            22 10.9   4.767   4.767   4.767   4.767
qs_ot_get_p                        107 10.4   0.001   0.001   4.704   4.704
fft_wrap_pw1pw2_140                439 13.2   0.424   0.424   4.572   4.572
dbcsr_make_images_dense           3386 14.7   0.016   0.016   4.559   4.559
apply_preconditioner_dbcsr         107 12.6   0.000   0.000   4.457   4.457
apply_single                       107 13.6   0.000   0.000   4.456   4.456
multiply_cannon                   1966 13.4   0.148   0.148   4.383   4.383
cp_fm_cholesky_invert               11 10.9   4.268   4.268   4.268   4.268
qs_ks_update_qs_env_forces          11  4.9   0.000   0.000   4.179   4.179
multiply_cannon_loop              1966 14.4   0.130   0.130   3.985   3.985
multiply_cannon_multrec           1966 15.4   3.807   3.807   3.854   3.854
dbcsr_complete_redistribute        317 12.2   1.916   1.916   3.834   3.834
dbcsr_copy                        1855 11.9   0.189   0.189   3.688   3.688
qs_env_update_s_mstruct             11  6.9   0.000   0.000   3.687   3.687
qs_ot_p2m_diag                      44 11.0   0.143   0.143   3.535   3.535
build_core_hamiltonian_matrix_      11  4.9   0.000   0.000   3.447   3.447
dbcsr_copy_into_existing            22  7.9   3.436   3.436   3.437   3.437
qs_create_task_list                 11  7.9   0.000   0.000   3.309   3.309
generate_qs_task_list               11  8.9   2.396   2.396   3.309   3.309
density_rs2pw                      107  9.7   0.003   0.003   3.183   3.183
cp_dbcsr_syevd                      44 12.0   0.002   0.002   3.160   3.160
copy_dbcsr_to_fm                   147 11.2   0.002   0.002   3.123   3.123
cp_fm_diag_elpa                     44 13.0   0.000   0.000   3.032   3.032
cp_fm_diag_elpa_base                44 14.0   2.977   2.977   3.031   3.031
fft3d_s                           1082 14.6   2.656   2.656   2.661   2.661
build_core_hamiltonian_matrix       11  6.9   0.000   0.000   2.572   2.572
transfer_dbcsr_to_fm                11 10.9   0.000   0.000   2.322   2.322
-------------------------------------------------------------------------------
From /workspace/artifacts/H2O-64_nonortho_32mpi.out:
-------------------------------------------------------------------------------
-                                                                             -
-                                T I M I N G                                  -
-                                                                             -
-------------------------------------------------------------------------------
SUBROUTINE                       CALLS  ASD       SELF TIME      TOTAL TIME
                               MAXIMUM      AVERAGE MAXIMUM AVERAGE MAXIMUM
CP2K                                 1  1.0   0.007   0.021  82.907  82.918
qs_mol_dyn_low                       1  2.0   0.003   0.004  82.814  82.818
qs_forces                           11  3.9   0.001   0.001  82.774  82.775
qs_energies                         11  4.9   0.001   0.001  77.202  77.204
scf_env_do_scf                      11  5.9   0.000   0.002  71.501  71.502
scf_env_do_scf_inner_loop           96  6.5   0.002   0.016  66.290  66.290
velocity_verlet                     10  3.0   0.001   0.003  49.316  49.317
rebuild_ks_matrix                  107  8.3   0.000   0.000  36.497  36.604
qs_ks_build_kohn_sham_matrix       107  9.3   0.013   0.017  36.496  36.604
sum_up_and_integrate               107 10.3   0.016   0.020  32.394  32.438
integrate_v_rspace                 107 11.3   0.004   0.005  32.378  32.424
qs_ks_update_qs_env                107  7.6   0.001   0.001  32.267  32.366
qs_rho_update_rho_low              107  7.7   0.000   0.001  30.846  30.863
calculate_rho_elec                 107  8.7   0.027   0.027  30.846  30.863
grid_integrate_task_list           107 12.3  22.944  29.020  22.944  29.020
grid_collocate_task_list           107  9.7  21.960  27.827  21.960  27.827
dbcsr_multiply_generic            1966 12.4   0.062   0.070  12.794  13.016
rs_pw_transfer                     878 11.9   0.009   0.011   9.702  10.507
qs_scf_new_mos                      96  7.5   0.000   0.001  10.023  10.149
qs_scf_loop_do_ot                   96  8.5   0.001   0.001  10.022  10.149
multiply_cannon                   1966 13.4   0.114   0.138   9.453   9.861
ot_scf_mini                         96  9.5   0.002   0.002   9.416   9.548
density_rs2pw                      107  9.7   0.004   0.005   8.542   9.358
multiply_cannon_loop              1966 14.4   0.087   0.104   8.879   9.323
mp_waitall_1                    146670 16.2   8.051   9.020   8.051   9.020
mp_waitany                        8968 13.7   6.762   7.464   6.762   7.464
mp_alltoall_d11v                  1998 13.7   6.459   7.387   6.459   7.387
rs_pw_transfer_RS2PW_140           118 11.5   0.227   0.272   6.396   7.197
rs_gather_matrices                 107 12.3   0.072   0.089   6.222   7.137
multiply_cannon_metrocomm3       15728 15.4   0.034   0.042   5.359   6.305
ot_mini                             96 10.5   0.001   0.001   5.550   5.696
init_scf_loop                       11  6.9   0.000   0.001   5.199   5.199
init_scf_run                        11  5.9   0.000   0.004   4.473   4.473
scf_env_initial_rho_setup           11  6.9   0.000   0.003   4.472   4.473
qs_ks_update_qs_env_forces          11  4.9   0.000   0.000   4.349   4.359
wfi_extrapolate                     11  7.9   0.001   0.001   4.062   4.062
potential_pw2rs                    107 12.3   0.005   0.006   3.165   3.202
multiply_cannon_multrec          15728 15.4   2.733   3.047   2.742   3.056
pw_transfer                       1295 11.6   0.079   0.094   2.961   3.047
qs_ot_get_derivative                96 11.5   0.001   0.001   2.815   2.946
fft_wrap_pw1pw2                   1081 12.6   0.008   0.009   2.826   2.908
apply_preconditioner_dbcsr         107 12.6   0.000   0.000   2.590   2.744
apply_single                       107 13.6   0.000   0.000   2.590   2.744
ot_diis_step                        96 11.5   0.003   0.003   2.708   2.709
fft_wrap_pw1pw2_140                439 13.2   0.226   0.282   2.384   2.536
make_m2s                          3932 13.4   0.039   0.048   2.183   2.273
fft3d_ps                          1081 14.6   1.036   1.202   2.034   2.147
make_images                       3932 14.4   0.105   0.123   1.893   1.972
-------------------------------------------------------------------------------
Plot: name="H2O-64_nonortho_timings_32omp", title="Timings of H2O-64_nonortho with 32 OpenMP Threads", ylabel="time [s]"
PlotPoint: plot="H2O-64_nonortho_timings_32omp", name="rest", label="rest", y=48.599000000000004, yerr=0.0
PlotPoint: plot="H2O-64_nonortho_timings_32omp", name="grid_collocate_task_list", label="grid_collocate_task_list", y=27.266, yerr=0.0
PlotPoint: plot="H2O-64_nonortho_timings_32omp", name="grid_integrate_task_list", label="grid_integrate_task_list", y=24.618, yerr=0.0
PlotPoint: plot="H2O-64_nonortho_timings_32omp", name="cp_fm_cholesky_decompose", label="cp_fm_cholesky_decompose", y=4.767, yerr=0.0
PlotPoint: plot="H2O-64_nonortho_timings_32omp", name="make_dense_data", label="make_dense_data", y=4.613, yerr=0.0
PlotPoint: plot="H2O-64_nonortho_timings_32omp", name="cp_fm_cholesky_invert", label="cp_fm_cholesky_invert", y=4.268, yerr=0.0
PlotPoint: plot="H2O-64_nonortho_timings_32omp", name="mp_waitany", label="mp_waitany", y=0.0, yerr=0.0
PlotPoint: plot="H2O-64_nonortho_timings_32omp", name="mp_alltoall_d11v", label="mp_alltoall_d11v", y=0.0, yerr=0.0
PlotPoint: plot="H2O-64_nonortho_timings_32omp", name="mp_waitall_1", label="mp_waitall_1", y=0.0, yerr=0.0
Plot: name="H2O-64_nonortho_timings_32mpi", title="Timings of H2O-64_nonortho with 32 MPI Ranks", ylabel="time [s]"
PlotPoint: plot="H2O-64_nonortho_timings_32mpi", name="rest", label="rest", y=16.730999999999995, yerr=0.0
PlotPoint: plot="H2O-64_nonortho_timings_32mpi", name="grid_collocate_task_list", label="grid_collocate_task_list", y=21.96, yerr=0.0
PlotPoint: plot="H2O-64_nonortho_timings_32mpi", name="grid_integrate_task_list", label="grid_integrate_task_list", y=22.944, yerr=0.0
PlotPoint: plot="H2O-64_nonortho_timings_32mpi", name="cp_fm_cholesky_decompose", label="cp_fm_cholesky_decompose", y=0.0, yerr=0.0
PlotPoint: plot="H2O-64_nonortho_timings_32mpi", name="make_dense_data", label="make_dense_data", y=0.0, yerr=0.0
PlotPoint: plot="H2O-64_nonortho_timings_32mpi", name="cp_fm_cholesky_invert", label="cp_fm_cholesky_invert", y=0.0, yerr=0.0
PlotPoint: plot="H2O-64_nonortho_timings_32mpi", name="mp_waitany", label="mp_waitany", y=6.762, yerr=0.0
PlotPoint: plot="H2O-64_nonortho_timings_32mpi", name="mp_alltoall_d11v", label="mp_alltoall_d11v", y=6.459, yerr=0.0
PlotPoint: plot="H2O-64_nonortho_timings_32mpi", name="mp_waitall_1", label="mp_waitall_1", y=8.051, yerr=0.0
Running H2O-hyb.inp with 1 threads and 32 ranks... done.
Running H2O-hyb.inp with 32 threads and 1 ranks... done.
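The set of routines that get their own bar also appears consistent with a simple rule: take the five largest SELF TIME (AVERAGE) entries from each of the two runs and plot their union. This is inferred from the logged values only. The Python sketch below applies that rule to the largest self times from the H2O-64_nonortho tables above. It sorts by OpenMP self time and then by MPI self time. That ordering is my own choice and does not reproduce the log's order for the zero-valued entries. `top_self` and `plotted_routines` are hypothetical names.

```python
import heapq


def top_self(self_avg: dict[str, float], n: int = 5) -> list[str]:
    """Names of the n routines with the largest average self time."""
    return heapq.nlargest(n, self_avg, key=self_avg.__getitem__)


def plotted_routines(omp: dict[str, float], mpi: dict[str, float],
                     n: int = 5) -> list[str]:
    """Union of both runs' top-n routines, ordered by OpenMP then MPI self time."""
    names = set(top_self(omp, n)) | set(top_self(mpi, n))
    return sorted(names, key=lambda k: (omp.get(k, 0.0), mpi.get(k, 0.0)),
                  reverse=True)


# Largest SELF TIME AVERAGE entries from the H2O-64_nonortho tables above.
omp = {"grid_collocate_task_list": 27.266, "grid_integrate_task_list": 24.618,
       "cp_fm_cholesky_decompose": 4.767, "make_dense_data": 4.613,
       "cp_fm_cholesky_invert": 4.268, "multiply_cannon_multrec": 3.807}
mpi = {"grid_integrate_task_list": 22.944, "grid_collocate_task_list": 21.960,
       "mp_waitall_1": 8.051, "mp_waitany": 6.762, "mp_alltoall_d11v": 6.459,
       "multiply_cannon_multrec": 2.733}

for name in plotted_routines(omp, mpi):
    print(name)
```

The result contains the same eight routines as the H2O-64_nonortho plots. multiply_cannon_multrec is left out because it ranks sixth in both runs.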
From /workspace/artifacts/H2O-hyb_32omp.out:
-------------------------------------------------------------------------------
-                                                                             -
-                                T I M I N G                                  -
-                                                                             -
-------------------------------------------------------------------------------
SUBROUTINE                       CALLS  ASD       SELF TIME      TOTAL TIME
                               MAXIMUM      AVERAGE MAXIMUM AVERAGE MAXIMUM
CP2K                                 1  1.0   0.177   0.177 111.904 111.904
qs_energies                          1  2.0   0.000   0.000 111.096 111.096
scf_env_do_scf                       1  3.0   0.000   0.000 109.955 109.955
qs_ks_update_qs_env                  8  5.0   0.000   0.000 104.592 104.592
rebuild_ks_matrix                    7  6.0   0.000   0.000 104.536 104.536
qs_ks_build_kohn_sham_matrix         7  7.0   0.001   0.001 104.536 104.536
hfx_ks_matrix                        7  8.0   0.000   0.000  95.137  95.137
integrate_four_center                7  9.0   1.412   1.412  95.117  95.117
integrate_four_center_main           7 10.0   1.023   1.023  81.565  81.565
integrate_four_center_bin          451 11.0  80.542  80.542  80.542  80.542
scf_env_do_scf_inner_loop            7  4.0   0.001   0.001  57.711  57.711
init_scf_loop                        1  4.0   0.000   0.000  52.233  52.233
integrate_four_center_load           7 10.0   0.000   0.000  11.737  11.737
hfx_load_balance                     1 11.0   0.002   0.002  11.737  11.737
hfx_load_balance_count               1 12.0   5.865   5.865   5.865   5.865
hfx_load_balance_bin                 1 12.0   5.854   5.854   5.854   5.854
qs_vxc_create                       14  8.0   0.000   0.000   3.731   3.731
xc_vxc_pw_create                    14  9.0   0.116   0.116   3.731   3.731
xc_rho_set_and_dset_create          14 10.0   0.085   0.085   2.971   2.971
prepare_preconditioner               1  5.0   0.000   0.000   2.474   2.474
make_preconditioner                  1  6.0   0.000   0.000   2.474   2.474
calculate_rho_elec                  15  7.4   0.117   0.117   2.456   2.456
xc_functional_eval                  35 11.0   0.000   0.000   2.336   2.336
-------------------------------------------------------------------------------
From /workspace/artifacts/H2O-hyb_32mpi.out:
-------------------------------------------------------------------------------
-                                                                             -
-                                T I M I N G                                  -
-                                                                             -
-------------------------------------------------------------------------------
SUBROUTINE                       CALLS  ASD       SELF TIME      TOTAL TIME
                               MAXIMUM      AVERAGE MAXIMUM AVERAGE MAXIMUM
CP2K                                 1  1.0   0.196   0.217 132.499 132.510
qs_energies                          1  2.0   0.000   0.000 132.195 132.196
 scf_env_do_scf                      1   3.0    0.000    0.000  131.837  131.838
 qs_ks_update_qs_env                 8   5.0    0.000    0.000  129.782  129.782
 rebuild_ks_matrix                   7   6.0    0.000    0.000  129.774  129.774
 qs_ks_build_kohn_sham_matrix        7   7.0    0.001    0.004  129.774  129.774
 hfx_ks_matrix                       7   8.0    0.000    0.000  123.618  123.619
 integrate_four_center               7   9.0    0.047    0.322  123.609  123.611
 integrate_four_center_main          7  10.0    0.003    0.004   79.746  111.412
 integrate_four_center_bin         448  11.0   79.743  111.408   79.743  111.408
 scf_env_do_scf_inner_loop           7   4.0    0.000    0.001   74.569   74.569
 init_scf_loop                       1   4.0    0.000    0.000   57.267   57.267
 mp_sync                            70  11.3   31.685   34.211   31.685   34.211
 integrate_four_center_load          7  10.0    0.000    0.000   11.527   11.531
 hfx_load_balance                    1  11.0    0.001    0.001   11.527   11.531
 mp_sum_l                         1135   8.3    5.840    6.186    5.840    6.186
 hfx_load_balance_dist               1  12.0    0.000    0.000    5.705    5.997
 hfx_load_balance_bin                1  12.0    2.868    5.728    2.868    5.728
 hfx_load_balance_count              1  12.0    2.871    5.716    2.871    5.716
 qs_vxc_create                      14   8.0    0.000    0.000    3.015    3.015
 xc_vxc_pw_create                   14   9.0    0.008    0.009    3.015    3.015
 xc_rho_set_and_dset_create         14  10.0    0.009    0.012    1.919    2.666
 -------------------------------------------------------------------------------
Plot: name="H2O-hyb_timings_32omp", title="Timings of H2O-hyb with 32 OpenMP Threads", ylabel="time [s]"
PlotPoint: plot="H2O-hyb_timings_32omp", name="rest", label="rest", y=17.208, yerr=0.0
PlotPoint: plot="H2O-hyb_timings_32omp", name="integrate_four_center_bin", label="integrate_four_center_bin", y=80.542, yerr=0.0
PlotPoint: plot="H2O-hyb_timings_32omp", name="hfx_load_balance_count", label="hfx_load_balance_count", y=5.865, yerr=0.0
PlotPoint: plot="H2O-hyb_timings_32omp", name="hfx_load_balance_bin", label="hfx_load_balance_bin", y=5.854, yerr=0.0
PlotPoint: plot="H2O-hyb_timings_32omp", name="integrate_four_center", label="integrate_four_center", y=1.412, yerr=0.0
PlotPoint: plot="H2O-hyb_timings_32omp", name="integrate_four_center_main", label="integrate_four_center_main", y=1.023, yerr=0.0
PlotPoint: plot="H2O-hyb_timings_32omp", name="mp_sync", label="mp_sync", y=0.0, yerr=0.0
PlotPoint: plot="H2O-hyb_timings_32omp", name="mp_sum_l", label="mp_sum_l", y=0.0, yerr=0.0
Plot: name="H2O-hyb_timings_32mpi", title="Timings of H2O-hyb with 32 MPI Ranks", ylabel="time [s]"
PlotPoint: plot="H2O-hyb_timings_32mpi", name="rest", label="rest", y=9.442000000000007, yerr=0.0
PlotPoint: plot="H2O-hyb_timings_32mpi", name="integrate_four_center_bin", label="integrate_four_center_bin", y=79.743, yerr=0.0
PlotPoint: plot="H2O-hyb_timings_32mpi", name="hfx_load_balance_count", label="hfx_load_balance_count", y=2.871, yerr=0.0
PlotPoint: plot="H2O-hyb_timings_32mpi", name="hfx_load_balance_bin", label="hfx_load_balance_bin", y=2.868, yerr=0.0
PlotPoint: plot="H2O-hyb_timings_32mpi", name="integrate_four_center", label="integrate_four_center", y=0.047, yerr=0.0
PlotPoint: plot="H2O-hyb_timings_32mpi", name="integrate_four_center_main", label="integrate_four_center_main", y=0.003, yerr=0.0
PlotPoint: plot="H2O-hyb_timings_32mpi", name="mp_sync", label="mp_sync", y=31.685, yerr=0.0
PlotPoint: plot="H2O-hyb_timings_32mpi", name="mp_sum_l", label="mp_sum_l", y=5.84, yerr=0.0
Running GW_PBE_4benzene.inp with 1 threads and 32 ranks... done.
Running GW_PBE_4benzene.inp with 32 threads and 1 ranks... done.
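The "rest" value in each Plot block is consistent with the CP2K total time (TOTAL TIME, AVERAGE) minus the SELF TIME averages of the routines plotted individually, with routines that do not appear in a report counted as 0.0. This relationship is inferred from the numbers, not taken from the benchmark script. A minimal Python sketch that reproduces the two H2O-hyb values (`rest_time` is a hypothetical helper):

```python
def rest_time(cp2k_total: float, plotted_self: dict[str, float]) -> float:
    """Assumed definition: total run time minus the self time of every plotted routine."""
    return round(cp2k_total - sum(plotted_self.values()), 3)


# SELF TIME (AVERAGE) in seconds, copied from the H2O-hyb reports above.
# Routines absent from a report (mp_sync and mp_sum_l for the OpenMP run) count as 0.0.
h2o_hyb_omp = {
    "integrate_four_center_bin": 80.542,
    "hfx_load_balance_count": 5.865,
    "hfx_load_balance_bin": 5.854,
    "integrate_four_center": 1.412,
    "integrate_four_center_main": 1.023,
    "mp_sync": 0.0,
    "mp_sum_l": 0.0,
}
h2o_hyb_mpi = {
    "integrate_four_center_bin": 79.743,
    "hfx_load_balance_count": 2.871,
    "hfx_load_balance_bin": 2.868,
    "integrate_four_center": 0.047,
    "integrate_four_center_main": 0.003,
    "mp_sync": 31.685,
    "mp_sum_l": 5.84,
}

print(rest_time(111.904, h2o_hyb_omp))  # compare with y=17.208
print(rest_time(132.499, h2o_hyb_mpi))  # compare with y=9.442000000000007
```

The same arithmetic also reproduces, for example, the GW_PBE_4benzene OpenMP value (79.397 - 56.109 = 23.288) and the RI-HFX_H2O-32 MPI value (57.326 - 41.975 = 15.351).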
From /workspace/artifacts/GW_PBE_4benzene_32omp.out:
 -------------------------------------------------------------------------------
 -                                                                             -
 -                                 T I M I N G                                 -
 -                                                                             -
 -------------------------------------------------------------------------------
 SUBROUTINE                       CALLS   ASD         SELF TIME        TOTAL TIME
                               MAXIMUM        AVERAGE  MAXIMUM  AVERAGE  MAXIMUM
 CP2K                                1   1.0    0.013    0.013   79.397   79.397
 qs_energies                         1   2.0    0.000    0.000   79.021   79.021
 mp2_main                            1   3.0    0.000    0.000   75.200   75.200
 mp2_gpw_main                        1   4.0    0.000    0.000   75.031   75.031
 rpa_ri_compute_en                   1   5.0    0.000    0.000   71.760   71.760
 rpa_num_int                         1   6.0    0.001    0.001   71.755   71.755
 compute_mat_P_omega                 1   7.0    0.003    0.003   61.451   61.451
 compute_mat_P_omega_contract       10   8.0    8.817    8.817   61.255   61.255
 dbt_total                        2336   9.6    0.011    0.011   47.795   47.795
 dbt_contract                      787  11.0    0.033    0.033   41.045   41.045
 dbt_tas_total                    1149  12.2    0.209    0.209   39.860   39.860
 dbt_tas_multiply                  807  12.1    0.002    0.002   38.576   38.576
 dbt_tas_dbm                       807  14.1    0.003    0.003   31.914   31.914
 dbm_multiply                      807  16.1   31.905   31.905   31.905   31.905
 dbt_tas_mm_1N                     524  15.1    0.001    0.001   23.706   23.706
 compute_mat_P_omega_calc_M_vir    250   9.0    0.001    0.001   22.420   22.420
 compute_mat_P_omega_calc_M_occ    250   9.0    8.840    8.840   17.002   17.002
 compute_mat_P_omega_calc_P_t      250   9.0    0.001    0.001    7.512    7.512
 dbt_tas_mm_2                      251  15.0    0.001    0.001    6.556    6.556
 dbt_copy                         1103  10.7    0.122    0.122    5.450    5.450
 compute_QP_energies                 1   7.0    0.000    0.000    5.233    5.233
 compute_self_energy_cubic_gw        1   8.0    0.055    0.055    5.232    5.232
 contract_cubic_gw                  21   9.0    0.000    0.000    4.222    4.222
 scf_env_do_scf                      1   3.0    0.000    0.000    3.713    3.713
 scf_env_do_scf_inner_loop          17   4.0    0.002    0.002    3.713    3.713
 dbt_tas_reserve_blocks_index     3261  14.3    0.150    0.150    3.286    3.286
 mp2_ri_gpw_compute_in               1   5.0    0.000    0.000    3.263    3.263
 dbm_reserve_blocks               3628  15.3    3.208    3.208    3.208    3.208
 dbt_reserve_blocks_index         2280  13.1    0.050    0.050    2.523    2.523
 dbt_reserve_blocks_index_array   2222  12.2    0.010    0.010    2.515    2.515
 convert_to_new_pgrid             2421  14.1    0.102    0.102    2.472    2.472
 dbm_copy                         1614  15.1    2.370    2.370    2.370    2.370
 dbt_crop                         1042  12.0    1.559    1.559    2.366    2.366
 qs_scf_new_mos                     17   5.0    0.000    0.000    2.141    2.141
 rpa_num_int_RPA_matrix_operati     10   7.0    0.000    0.000    2.120    2.120
 dbt_tas_copy                      574  11.4    1.324    1.324    2.109    2.109
 compute_W_cubic_GW                 10   7.0    0.004    0.004    2.096    2.096
 dbt_tas_reshape                   367  15.0    0.006    0.006    2.002    2.002
 eigensolver                        18   5.9    0.001    0.001    1.934    1.934
 cp_fm_diag_elpa                    18   6.9    0.000    0.000    1.822    1.822
 cp_fm_diag_elpa_base               18   7.9    1.787    1.787    1.822    1.822
 dbt_reshape                       278  11.9    0.969    0.969    1.815    1.815
 get_2c_integrals                    1   6.0    0.000    0.000    1.737    1.737
 -------------------------------------------------------------------------------
From /workspace/artifacts/GW_PBE_4benzene_32mpi.out:
 -------------------------------------------------------------------------------
 -                                                                             -
 -                                 T I M I N G                                 -
 -                                                                             -
 -------------------------------------------------------------------------------
 SUBROUTINE                       CALLS   ASD         SELF TIME        TOTAL TIME
                               MAXIMUM        AVERAGE  MAXIMUM  AVERAGE  MAXIMUM
 CP2K                                1   1.0    0.006    0.025   32.740   32.750
 qs_energies                         1   2.0    0.000    0.001   32.633   32.634
 mp2_main                            1   3.0    0.000    0.001   31.621   31.622
 mp2_gpw_main                        1   4.0    0.000    0.000   31.584   31.585
 rpa_ri_compute_en                   1   5.0    0.000    0.000   30.363   30.364
 rpa_num_int                         1   6.0    0.000    0.002   30.362   30.364
 dbt_total                        2336   9.6    0.011    0.013   27.001   27.006
 compute_mat_P_omega                 1   7.0    0.001    0.005   25.838   25.847
 compute_mat_P_omega_contract       10   8.0    0.394    0.426   25.707   25.713
 dbt_contract                      787  11.0    0.025    0.037   20.401   20.412
 dbt_tas_total                    1149  12.2    0.049    0.071   18.227   18.227
 dbt_tas_multiply                  807  12.1    0.002    0.002   18.188   18.191
 dbt_tas_dbm                       807  14.1    0.003    0.003   13.088   13.117
 dbm_multiply                      807  16.1    9.760   10.870    9.760   10.870
 compute_mat_P_omega_calc_M_occ    250   9.0    0.376    0.414    7.810    7.810
 compute_mat_P_omega_calc_P_t      250   9.0    0.001    0.001    7.365    7.365
 mp_sync                          8706  11.6    5.604    7.304    5.604    7.304
 dbt_tas_mm_2                      251  15.0    0.001    0.001    6.011    6.027
 dbt_copy                         1111  10.7    0.011    0.012    5.540    5.898
 dbt_reshape                      1098  11.7    2.080    2.700    5.280    5.624
 compute_mat_P_omega_calc_M_vir    250   9.0    0.001    0.001    5.609    5.610
 dbt_tas_mm_1N                     524  15.1    0.001    0.002    4.512    5.387
 mp_waitall_2                     3776  15.3    2.703    2.919    2.703    2.919
 compute_QP_energies                 1   7.0    0.000    0.000    2.915    2.916
 compute_self_energy_cubic_gw        1   8.0    0.002    0.003    2.912    2.915
 dbt_communicate_buffer           1098  12.7    0.053    0.072    2.544    2.724
 contract_cubic_gw                  21   9.0    0.000    0.000    2.293    2.293
 dbt_crop                         1042  12.0    0.874    1.328    1.329    1.832
 dbt_reserve_blocks_index_array   2791  12.2    0.009    0.012    1.474    1.825
 dbt_reserve_blocks_index         2849  13.1    0.064    0.079    1.473    1.822
 dbt_tas_reserve_blocks_index     3300  14.5    0.114    0.160    1.442    1.787
 dbm_reserve_blocks               3696  15.4    1.417    1.756    1.417    1.756
 dbt_tas_replicate                 396  14.1    0.531    0.916    1.458    1.604
 mp2_ri_gpw_compute_in               1   5.0    0.000    0.001    1.218    1.220
 compute_mat_P_omega_copy_M_vir    250   9.0    0.001    0.001    1.041    1.056
 convert_to_new_pgrid             2421  14.1    0.024    0.036    0.782    1.023
 mp_max_i                         1992   9.8    0.763    1.001    0.763    1.001
 dbm_copy                         1608  15.1    0.752    0.994    0.752    0.994
 cp_gemm                           105   8.4    0.000    0.000    0.973    0.981
 cp_gemm_cosma                     105   9.4    0.972    0.981    0.972    0.981
 scf_env_do_scf                      1   3.0    0.000    0.000    0.974    0.974
 scf_env_do_scf_inner_loop          17   4.0    0.000    0.002    0.974    0.974
 compute_mat_P_omega_copy_M_occ    250   9.0    0.001    0.001    0.914    0.927
 mp_sum_l                         6085  13.0    0.555    0.766    0.555    0.766
 compute_W_cubic_GW                 10   7.0    0.001    0.001    0.734    0.741
 dbm_add                           807  14.1    0.581    0.707    0.581    0.707
 dbt_tas_communicate_buffer        792  16.5    0.024    0.031    0.594    0.682
 -------------------------------------------------------------------------------
Plot: name="GW_PBE_4benzene_timings_32omp", title="Timings of GW_PBE_4benzene with 32 OpenMP Threads", ylabel="time [s]"
PlotPoint: plot="GW_PBE_4benzene_timings_32omp", name="rest", label="rest", y=23.288000000000004, yerr=0.0
PlotPoint: plot="GW_PBE_4benzene_timings_32omp", name="dbm_multiply", label="dbm_multiply", y=31.905, yerr=0.0
PlotPoint: plot="GW_PBE_4benzene_timings_32omp", name="compute_mat_P_omega_calc_M_occ", label="compute_mat_P_omega_calc_M_occ", y=8.84, yerr=0.0
PlotPoint: plot="GW_PBE_4benzene_timings_32omp", name="compute_mat_P_omega_contract", label="compute_mat_P_omega_contract", y=8.817, yerr=0.0
PlotPoint: plot="GW_PBE_4benzene_timings_32omp", name="dbm_reserve_blocks", label="dbm_reserve_blocks", y=3.208, yerr=0.0
PlotPoint: plot="GW_PBE_4benzene_timings_32omp", name="dbm_copy", label="dbm_copy", y=2.37, yerr=0.0
PlotPoint: plot="GW_PBE_4benzene_timings_32omp", name="dbt_reshape", label="dbt_reshape", y=0.969, yerr=0.0
PlotPoint: plot="GW_PBE_4benzene_timings_32omp", name="mp_sync", label="mp_sync", y=0.0, yerr=0.0
PlotPoint: plot="GW_PBE_4benzene_timings_32omp", name="mp_waitall_2", label="mp_waitall_2", y=0.0, yerr=0.0
Plot: name="GW_PBE_4benzene_timings_32mpi", title="Timings of GW_PBE_4benzene with 32 MPI Ranks", ylabel="time [s]"
PlotPoint: plot="GW_PBE_4benzene_timings_32mpi", name="rest", label="rest", y=9.654000000000003, yerr=0.0
PlotPoint: plot="GW_PBE_4benzene_timings_32mpi", name="dbm_multiply", label="dbm_multiply", y=9.76, yerr=0.0
PlotPoint: plot="GW_PBE_4benzene_timings_32mpi", name="compute_mat_P_omega_calc_M_occ", label="compute_mat_P_omega_calc_M_occ", y=0.376, yerr=0.0
PlotPoint: plot="GW_PBE_4benzene_timings_32mpi", name="compute_mat_P_omega_contract", label="compute_mat_P_omega_contract", y=0.394, yerr=0.0
PlotPoint: plot="GW_PBE_4benzene_timings_32mpi", name="dbm_reserve_blocks", label="dbm_reserve_blocks", y=1.417, yerr=0.0
PlotPoint: plot="GW_PBE_4benzene_timings_32mpi", name="dbm_copy", label="dbm_copy", y=0.752, yerr=0.0
PlotPoint: plot="GW_PBE_4benzene_timings_32mpi", name="dbt_reshape", label="dbt_reshape", y=2.08, yerr=0.0
PlotPoint: plot="GW_PBE_4benzene_timings_32mpi", name="mp_sync", label="mp_sync", y=5.604, yerr=0.0
PlotPoint: plot="GW_PBE_4benzene_timings_32mpi", name="mp_waitall_2", label="mp_waitall_2", y=2.703, yerr=0.0
Running RI-HFX_H2O-32.inp with 1 threads and 32 ranks... done.
Running RI-HFX_H2O-32.inp with 32 threads and 1 ranks... done.
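Each row of a timing report carries six numeric columns under the two-line header: CALLS (maximum over ranks), ASD (to our understanding, the average call-stack depth), SELF TIME average and maximum, and TOTAL TIME average and maximum. A minimal parsing sketch, assuming whitespace-separated fields and routine names without spaces (CP2K truncates names to 30 characters); `TimingRow` and `parse_row` are hypothetical names, not part of CP2K:

```python
from typing import NamedTuple


class TimingRow(NamedTuple):
    """One row of a CP2K timing report; column meanings follow the report header."""
    name: str
    calls_max: int     # CALLS, MAXIMUM over ranks
    asd: float         # ASD, average stack depth (assumed meaning)
    self_avg: float    # SELF TIME, AVERAGE over ranks [s]
    self_max: float    # SELF TIME, MAXIMUM over ranks [s]
    total_avg: float   # TOTAL TIME, AVERAGE over ranks [s]
    total_max: float   # TOTAL TIME, MAXIMUM over ranks [s]


def parse_row(line: str) -> TimingRow:
    """Split one report row into its seven fields; raise ValueError on malformed rows."""
    name, calls, asd, *times = line.split()
    if len(times) != 4:
        raise ValueError(f"expected 4 time columns, got {len(times)}: {line!r}")
    return TimingRow(name, int(calls), float(asd), *map(float, times))


# Row copied from the GW_PBE_4benzene 32-rank report above.
row = parse_row(" mp_sync   8706  11.6    5.604    7.304    5.604    7.304")
# Max/avg self time: values well above 1 indicate that some ranks spent
# noticeably longer in this routine than the average rank.
imbalance = row.self_max / row.self_avg
print(row.name, row.calls_max, round(imbalance, 2))
```

In the single-rank (OpenMP) reports the AVERAGE and MAXIMUM columns coincide, so the ratio is only informative for the 32-rank runs.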
From /workspace/artifacts/RI-HFX_H2O-32_32omp.out:
 -------------------------------------------------------------------------------
 -                                                                             -
 -                                 T I M I N G                                 -
 -                                                                             -
 -------------------------------------------------------------------------------
 SUBROUTINE                       CALLS   ASD         SELF TIME        TOTAL TIME
                               MAXIMUM        AVERAGE  MAXIMUM  AVERAGE  MAXIMUM
 CP2K                                1   1.0    0.018    0.018  328.915  328.915
 qs_forces                           1   2.0    0.000    0.000  328.347  328.347
 rebuild_ks_matrix                   7   6.6    0.000    0.000  326.834  326.834
 qs_ks_build_kohn_sham_matrix        7   7.6    0.001    0.001  326.834  326.834
 hfx_ks_matrix                       7   8.6    0.000    0.000  324.954  324.954
 dbt_total                        4903  11.6    0.028    0.028  271.760  271.760
 hfx_ri_update_ks                    7   9.6    0.000    0.000  267.220  267.220
 hfx_ri_update_ks_Pmat               7  10.6   31.940   31.940  267.218  267.218
 dbt_tas_total                    2391  14.1    1.511    1.511  245.591  245.591
 qs_energies                         1   3.0    0.000    0.000  236.534  236.534
 scf_env_do_scf                      1   4.0    0.000    0.000  236.219  236.219
 qs_ks_update_qs_env                 8   6.0    0.000    0.000  235.099  235.099
 dbt_contract                     1473  13.0    0.156    0.156  225.702  225.702
 dbt_tas_multiply                 1482  14.0    0.005    0.005  215.422  215.422
 hfx_ri_update_ks_Pmat_KS          567  11.6    0.004    0.004  203.149  203.149
 dbt_tas_dbm                      1482  16.0    0.006    0.006  190.227  190.227
 dbm_multiply                     1482  18.0  190.209  190.209  190.209  190.209
 dbt_tas_mm_2                      649  17.1    0.004    0.004  174.300  174.300
 scf_env_do_scf_inner_loop           6   5.0    0.001    0.001  149.318  149.318
 qs_ks_update_qs_env_forces          1   3.0    0.000    0.000   91.738   91.738
 init_scf_loop                       2   5.0    0.000    0.000   86.899   86.899
 hfx_ri_update_forces                1   7.0    1.603    1.603   57.731   57.731
 hfx_ri_forces_Pmat_3c               1   8.0    1.565    1.565   40.165   40.165
 dbt_tas_reshape                   906  14.4    0.034    0.034   23.403   23.403
 dbt_copy                         2373  12.3    0.192    0.192   15.848   15.848
 dbt_tas_merge                     649  14.1   12.949   12.949   14.502   14.502
 dbt_tas_reshape_buffer_fill       906  15.4   13.535   13.535   13.535   13.535
 precalc_derivatives                 1   8.0    2.204    2.204   13.283   13.283
 hfx_ri_update_ks_Pmat_Px3C        567  11.6    0.002    0.002   11.935   11.935
 hfx_ri_pre_scf_Pmat                 1  12.0    0.001    0.001   11.377   11.377
 dbt_tas_mm_3T                     659  17.1    0.002    0.002   10.580   10.580
 dbm_reserve_blocks               8426  16.7   10.472   10.472   10.472   10.472
 dbt_crop                         2763  14.2    7.087    7.087   10.019   10.019
 dbt_tas_reserve_blocks_index     7520  15.9    0.333    0.333    9.803    9.803
 dbt_reshape                       850  13.9    5.207    5.207    8.861    8.861
 reshape_mm_small                  906  15.6    0.136    0.136    8.077    8.077
 dbt_tas_replicate                 906  15.6    5.720    5.720    7.776    7.776
 build_3c_derivatives                9   9.0    3.261    3.261    7.462    7.462
 dbt_reserve_blocks_index         5073  15.1    0.123    0.123    7.140    7.140
 dbt_reserve_blocks_index_array   5038  14.1    0.019    0.019    7.070    7.070
 dbt_tas_reshape_buffer_obtain     906  15.4    6.061    6.061    6.880    6.880
 -------------------------------------------------------------------------------
From /workspace/artifacts/RI-HFX_H2O-32_32mpi.out:
 -------------------------------------------------------------------------------
 -                                                                             -
 -                                 T I M I N G                                 -
 -                                                                             -
 -------------------------------------------------------------------------------
 SUBROUTINE                       CALLS   ASD         SELF TIME        TOTAL TIME
                               MAXIMUM        AVERAGE  MAXIMUM  AVERAGE  MAXIMUM
 CP2K                                1   1.0    0.006    0.022   57.326   57.337
 qs_forces                           1   2.0    0.000    0.000   57.143   57.143
 rebuild_ks_matrix                   7   6.6    0.000    0.000   56.463   56.464
 qs_ks_build_kohn_sham_matrix        7   7.6    0.002    0.003   56.463   56.464
 hfx_ks_matrix                       7   8.6    0.000    0.000   55.495   55.503
 dbt_total                        4903  11.6    0.026    0.032   49.605   49.614
 hfx_ri_update_ks                    7   9.6    0.000    0.000   38.910   38.910
 hfx_ri_update_ks_Pmat               7  10.6    1.379    2.092   38.909   38.909
 dbt_contract                     1473  13.0    0.089    0.101   38.891   38.904
 dbt_tas_total                    2391  14.1    0.112    0.119   36.727   36.730
 qs_energies                         1   3.0    0.000    0.000   35.773   35.773
 scf_env_do_scf                      1   4.0    0.000    0.001   35.615   35.615
 qs_ks_update_qs_env                 8   6.0    0.000    0.000   35.108   35.108
 dbt_tas_multiply                 1482  14.0    0.005    0.005   32.696   32.701
 dbt_tas_dbm                      1482  16.0    0.005    0.005   25.105   25.155
 hfx_ri_update_ks_Pmat_KS          567  11.6    0.004    0.004   21.674   21.675
 qs_ks_update_qs_env_forces          1   3.0    0.000    0.000   21.357   21.357
 dbm_multiply                     1482  18.0   16.662   21.171   16.662   21.171
 scf_env_do_scf_inner_loop           6   5.0    0.000    0.001   20.690   20.690
 hfx_ri_update_forces                1   7.0    0.060    0.071   16.585   16.593
 mp_sync                         17597  13.5   12.957   16.038   12.957   16.038
 dbt_tas_mm_2                      649  17.1    0.003    0.003   15.223   15.272
 init_scf_loop                       2   5.0    0.000    0.000   14.924   14.924
 hfx_ri_forces_Pmat_3c               1   8.0    0.050    0.063   11.940   11.973
 hfx_ri_update_ks_Pmat_Px3C        567  11.6    0.002    0.002    7.826    7.827
 dbt_copy                         2391  12.4    0.029    0.037    5.432    5.921
 dbt_crop                         2763  14.2    2.850    3.932    3.522    4.668
 dbt_tas_mm_3T                     659  17.1    0.002    0.002    4.009    4.397
 dbt_reshape                      1250  13.5    1.874    2.057    3.879    4.167
 dbt_tas_mm_3N                     163  16.5    0.000    0.001    3.805    3.879
 hfx_ri_pre_scf_Pmat                 1  12.0    0.000    0.000    3.811    3.811
 precalc_derivatives                 1   8.0    0.092    0.126    3.457    3.457
 dbt_tas_merge                     649  14.1    1.546    2.188    2.705    2.978
 mp_waitall_2                     5961  16.5    2.688    2.915    2.688    2.915
 mp_max_i                         3372  12.5    1.982    2.444    1.982    2.444
 dbm_reserve_blocks               8460  16.8    1.911    2.314    1.911    2.314
 dbt_tas_reserve_blocks_index     7551  16.0    0.261    0.341    1.842    2.276
 dbt_tas_replicate                 909  15.6    0.585    0.722    2.140    2.208
 hfx_ri_pre_scf_Pmat_RIx3C          81  13.0    0.000    0.000    2.152    2.161
 dbt_tas_communicate_buffer       1825  16.3    0.058    0.065    1.955    2.155
 build_3c_derivatives                9   9.0    0.229    0.371    1.967    1.974
 dbt_reserve_blocks_index         5473  15.1    0.113    0.140    1.519    1.850
 dbt_reserve_blocks_index_array   5438  14.1    0.013    0.015    1.517    1.845
 mp_alltoall_i                    4327  15.3    1.566    1.737    1.566    1.737
 dbt_tas_reshape                   916  14.4    0.008    0.008    1.609    1.710
 mp_sum_l                        38255  15.3    1.188    1.542    1.188    1.542
 dbt_communicate_buffer           1250  14.5    0.041    0.051    1.350    1.457
 hfx_ri_update_ks_Pmat_copy_2      567  11.6    0.001    0.002    1.416    1.422
 convert_to_new_pgrid             4446  16.0    0.036    0.044    1.181    1.381
 dbm_copy                         3043  16.9    1.145    1.346    1.145    1.346
 -------------------------------------------------------------------------------
Plot: name="RI-HFX_H2O-32_timings_32omp", title="Timings of RI-HFX_H2O-32 with 32 OpenMP Threads", ylabel="time [s]"
PlotPoint: plot="RI-HFX_H2O-32_timings_32omp", name="rest", label="rest", y=62.72300000000001, yerr=0.0
PlotPoint: plot="RI-HFX_H2O-32_timings_32omp", name="dbm_multiply", label="dbm_multiply", y=190.209, yerr=0.0
PlotPoint: plot="RI-HFX_H2O-32_timings_32omp", name="hfx_ri_update_ks_Pmat", label="hfx_ri_update_ks_Pmat", y=31.94, yerr=0.0
PlotPoint: plot="RI-HFX_H2O-32_timings_32omp", name="dbt_tas_reshape_buffer_fill", label="dbt_tas_reshape_buffer_fill", y=13.535, yerr=0.0
PlotPoint: plot="RI-HFX_H2O-32_timings_32omp", name="dbt_tas_merge", label="dbt_tas_merge", y=12.949, yerr=0.0
PlotPoint: plot="RI-HFX_H2O-32_timings_32omp", name="dbm_reserve_blocks", label="dbm_reserve_blocks", y=10.472, yerr=0.0
PlotPoint: plot="RI-HFX_H2O-32_timings_32omp", name="dbt_crop", label="dbt_crop", y=7.087, yerr=0.0
PlotPoint: plot="RI-HFX_H2O-32_timings_32omp", name="mp_waitall_2", label="mp_waitall_2", y=0.0, yerr=0.0
PlotPoint: plot="RI-HFX_H2O-32_timings_32omp", name="mp_sync", label="mp_sync", y=0.0, yerr=0.0
PlotPoint: plot="RI-HFX_H2O-32_timings_32omp", name="mp_max_i", label="mp_max_i", y=0.0, yerr=0.0
Plot: name="RI-HFX_H2O-32_timings_32mpi", title="Timings of RI-HFX_H2O-32 with 32 MPI Ranks", ylabel="time [s]"
PlotPoint: plot="RI-HFX_H2O-32_timings_32mpi", name="rest", label="rest", y=15.350999999999999, yerr=0.0
PlotPoint: plot="RI-HFX_H2O-32_timings_32mpi", name="dbm_multiply", label="dbm_multiply", y=16.662, yerr=0.0
PlotPoint: plot="RI-HFX_H2O-32_timings_32mpi", name="hfx_ri_update_ks_Pmat", label="hfx_ri_update_ks_Pmat", y=1.379, yerr=0.0
PlotPoint: plot="RI-HFX_H2O-32_timings_32mpi", name="dbt_tas_reshape_buffer_fill", label="dbt_tas_reshape_buffer_fill", y=0.0, yerr=0.0
PlotPoint: plot="RI-HFX_H2O-32_timings_32mpi", name="dbt_tas_merge", label="dbt_tas_merge", y=1.546, yerr=0.0
PlotPoint: plot="RI-HFX_H2O-32_timings_32mpi", name="dbm_reserve_blocks", label="dbm_reserve_blocks", y=1.911, yerr=0.0
PlotPoint: plot="RI-HFX_H2O-32_timings_32mpi", name="dbt_crop", label="dbt_crop", y=2.85, yerr=0.0
PlotPoint: plot="RI-HFX_H2O-32_timings_32mpi", name="mp_waitall_2", label="mp_waitall_2", y=2.688, yerr=0.0
PlotPoint: plot="RI-HFX_H2O-32_timings_32mpi", name="mp_sync", label="mp_sync", y=12.957, yerr=0.0
PlotPoint: plot="RI-HFX_H2O-32_timings_32mpi", name="mp_max_i", label="mp_max_i", y=1.982, yerr=0.0
Running RI-MP2_ammonia.inp with 1 threads and 32 ranks... done.
Running RI-MP2_ammonia.inp with 32 threads and 1 ranks... done.
From /workspace/artifacts/RI-MP2_ammonia_32omp.out:
 -------------------------------------------------------------------------------
 -                                                                             -
 -                                 T I M I N G                                 -
 -                                                                             -
 -------------------------------------------------------------------------------
 SUBROUTINE                       CALLS   ASD         SELF TIME        TOTAL TIME
                               MAXIMUM        AVERAGE  MAXIMUM  AVERAGE  MAXIMUM
 CP2K                                1   1.0    0.014    0.014  175.038  175.038
 qs_energies                         1   2.0    0.000    0.000  174.855  174.855
 mp2_main                            1   3.0    0.000    0.000  169.731  169.731
 mp2_gpw_main                        1   4.0    0.001    0.001  169.331  169.331
 mp2_ri_gpw_compute_in               1   5.0    0.370    0.370  121.620  121.620
 mp2_ri_gpw_compute_in_loop          1   6.0    0.009    0.009  110.502  110.502
 mp2_eri_3c_integrate_gpw         2656   7.0    0.012    0.012   82.149   82.149
 integrate_v_rspace               2666   8.0    0.635    0.635   68.620   68.620
 grid_integrate_task_list         2666   9.0   65.795   65.795   65.795   65.795
 mp2_ri_gpw_compute_en               1   5.0    0.080    0.080   47.685   47.685
 mp2_ri_gpw_compute_en_RI_loop       1   6.0    9.868    9.868   45.912   45.912
 mp2_ri_gpw_compute_en_expansio   2080   7.0    2.182    2.182   28.494   28.494
 offload_gemm                     2080   8.0   26.312   26.312   26.312   26.312
 dbcsr_multiply_generic           5322   8.0    0.184    0.184   21.721   21.721
 ao_to_mo_and_store_B_mult_1      2656   7.0    0.009    0.009   21.685   21.685
 calculate_wavefunction           2656   8.0    8.159    8.159   12.226   12.226
 pw_transfer                     63872  10.6    0.931    0.931   11.773   11.773
 multiply_cannon                  5322   9.0    0.437    0.437   10.931   10.931
 get_2c_integrals                    1   6.0    0.000    0.000   10.746   10.746
 fft_wrap_pw1pw2                 53228  11.4    0.110    0.110   10.609   10.609
 compute_2c_integrals                1   7.0    0.006    0.006    9.883    9.883
 compute_2c_integrals_loop_lm        1   8.0    0.011    0.011    9.864    9.864
 mp2_eri_2c_integrate_gpw            1   9.0    3.010    3.010    9.853    9.853
 multiply_cannon_loop             5322  10.0    0.347    0.347    9.542    9.542
 make_m2s                        10644   9.0    0.061    0.061    8.317    8.317
 make_images                     10644  10.0    3.121    3.121    7.968    7.968
 multiply_cannon_multrec          5322  11.0    7.700    7.700    7.736    7.736
 fft_wrap_pw1pw2_20              21271  12.4    0.551    0.551    7.425    7.425
 fft3d_s                         53229  13.4    6.776    6.776    6.811    6.811
 ao_to_mo_and_store_B_E_Ex_1      2656   7.0    2.174    2.174    6.566    6.566
 mp2_ri_gpw_compute_en_ener       2080   7.0    5.738    5.738    5.738    5.738
 copy_dbcsr_to_fm                 2679   8.0    0.025    0.025    4.876    4.876
 scf_env_do_scf                      1   3.0    0.000    0.000    4.726    4.726
 scf_env_do_scf_inner_loop          10   4.0    0.001    0.001    4.726    4.726
 potential_pw2rs                  5322  10.0    0.140    0.140    4.176    4.176
 -------------------------------------------------------------------------------
From /workspace/artifacts/RI-MP2_ammonia_32mpi.out:
 -------------------------------------------------------------------------------
 -                                                                             -
 -                                 T I M I N G                                 -
 -                                                                             -
 -------------------------------------------------------------------------------
 SUBROUTINE                       CALLS   ASD         SELF TIME        TOTAL TIME
                               MAXIMUM        AVERAGE  MAXIMUM  AVERAGE  MAXIMUM
 CP2K                                1   1.0    0.005    0.021   40.446   40.456
 qs_energies                         1   2.0    0.000    0.001   40.377   40.377
 mp2_main                            1   3.0    0.000    0.001   38.058   38.059
 mp2_gpw_main                        1   4.0    0.001    0.002   37.938   37.939
 mp2_ri_gpw_compute_in               1   5.0    0.043    0.043   17.396   22.656
 mp2_ri_gpw_compute_in_loop          1   6.0    0.001    0.001   16.033   21.295
 mp2_ri_gpw_compute_en               1   5.0    0.164    0.172   20.465   21.044
 mp2_eri_3c_integrate_gpw           83   7.0    0.001    0.001   13.977   19.341
 integrate_v_rspace                 93   8.1    0.100    0.109   13.902   19.170
 grid_integrate_task_list           93   9.1   13.537   18.864   13.537   18.864
 mp2_ri_gpw_compute_en_RI_loop       1   6.0    0.699    0.877   14.161   14.190
 mp2_ri_gpw_compute_en_expansio     65   7.0    0.065    0.084   10.696   10.951
 offload_gemm                       65   8.0   10.630   10.884   10.630   10.884
 mp_min_d                            2   7.0    5.295    5.937    5.295    5.937
 mp2_ri_get_integ_group_size         1   6.0    0.000    0.000    5.262    5.841
 mp2_ri_gpw_compute_en_comm         17   7.0    0.091    0.132    2.434    2.907
 mp_sendrecv_dm3                   510   8.0    1.899    2.544    1.899    2.544
 scf_env_do_scf                      1   3.0    0.000    0.000    2.191    2.191
 scf_env_do_scf_inner_loop          10   4.0    0.000    0.002    2.191    2.191
 dbcsr_multiply_generic            176   8.0    0.008    0.009    1.768    2.070
 ao_to_mo_and_store_B_mult_1        83   7.0    0.001    0.001    1.751    2.056
 get_2c_integrals                    1   6.0    0.000    0.000    1.294    1.320
 qs_scf_new_mos                     10   5.0    0.000    0.000    1.119    1.176
 multiply_cannon                   176   9.0    0.014    0.016    1.046    1.172
 multiply_cannon_loop              176  10.0    0.002    0.002    0.990    1.114
 eigensolver                        11   5.8    0.001    0.001    1.069    1.071
 compute_2c_integrals                1   7.0    0.002    0.003    1.002    1.020
 multiply_cannon_multrec           246  11.0    0.852    0.936    0.857    0.943
 compute_2c_integrals_loop_lm        1   8.0    0.001    0.003    0.750    0.940
 mp2_eri_2c_integrate_gpw            1   9.0    0.204    0.319    0.749    0.939
 cp_fm_diag_elpa                    11   6.8    0.000    0.000    0.889    0.891
 cp_fm_redistribute_end             11   7.8    0.335    0.883    0.347    0.884
 make_m2s                          352   9.0    0.003    0.003    0.685    0.877
 make_images                       352  10.0    0.051    0.059    0.673    0.865
 cp_fm_diag_elpa_base               11   7.8    0.522    0.841    0.533    0.855
 pw_transfer                      2120  10.5    0.040    0.050    0.765    0.844
 -------------------------------------------------------------------------------
Plot: name="RI-MP2_ammonia_timings_32omp", title="Timings of RI-MP2_ammonia with 32 OpenMP Threads", ylabel="time [s]"
PlotPoint: plot="RI-MP2_ammonia_timings_32omp", name="rest", label="rest", y=57.20400000000001, yerr=0.0
PlotPoint: plot="RI-MP2_ammonia_timings_32omp", name="grid_integrate_task_list", label="grid_integrate_task_list", y=65.795, yerr=0.0
PlotPoint: plot="RI-MP2_ammonia_timings_32omp", name="offload_gemm", label="offload_gemm", y=26.312, yerr=0.0
PlotPoint: plot="RI-MP2_ammonia_timings_32omp", name="mp2_ri_gpw_compute_en_RI_loop", label="mp2_ri_gpw_compute_en_RI_loop", y=9.868, yerr=0.0
PlotPoint: plot="RI-MP2_ammonia_timings_32omp", name="calculate_wavefunction", label="calculate_wavefunction", y=8.159, yerr=0.0
PlotPoint: plot="RI-MP2_ammonia_timings_32omp", name="multiply_cannon_multrec", label="multiply_cannon_multrec", y=7.7, yerr=0.0
PlotPoint: plot="RI-MP2_ammonia_timings_32omp", name="mp_min_d", label="mp_min_d", y=0.0, yerr=0.0
PlotPoint: plot="RI-MP2_ammonia_timings_32omp", name="mp_sendrecv_dm3", label="mp_sendrecv_dm3", y=0.0, yerr=0.0
Plot: name="RI-MP2_ammonia_timings_32mpi", title="Timings of RI-MP2_ammonia with 32 MPI Ranks", ylabel="time [s]"
PlotPoint: plot="RI-MP2_ammonia_timings_32mpi", name="rest", label="rest", y=7.533999999999992, yerr=0.0
PlotPoint: plot="RI-MP2_ammonia_timings_32mpi", name="grid_integrate_task_list", label="grid_integrate_task_list", y=13.537, yerr=0.0
PlotPoint: plot="RI-MP2_ammonia_timings_32mpi", name="offload_gemm", label="offload_gemm", y=10.63, yerr=0.0
PlotPoint: plot="RI-MP2_ammonia_timings_32mpi", name="mp2_ri_gpw_compute_en_RI_loop", label="mp2_ri_gpw_compute_en_RI_loop", y=0.699, yerr=0.0
PlotPoint: plot="RI-MP2_ammonia_timings_32mpi", name="calculate_wavefunction", label="calculate_wavefunction", y=0.0, yerr=0.0
PlotPoint: plot="RI-MP2_ammonia_timings_32mpi", name="multiply_cannon_multrec", label="multiply_cannon_multrec", y=0.852, yerr=0.0
PlotPoint: plot="RI-MP2_ammonia_timings_32mpi", name="mp_min_d", label="mp_min_d", y=5.295, yerr=0.0
PlotPoint: plot="RI-MP2_ammonia_timings_32mpi", name="mp_sendrecv_dm3", label="mp_sendrecv_dm3", y=1.899, yerr=0.0
Running diag_cu144_broy.inp with 1 threads and 32 ranks... done.
Running diag_cu144_broy.inp with 32 threads and 1 ranks... done.
From /workspace/artifacts/diag_cu144_broy_32omp.out:
 -------------------------------------------------------------------------------
 -                                                                             -
 -                                 T I M I N G                                 -
 -                                                                             -
 -------------------------------------------------------------------------------
 SUBROUTINE                       CALLS   ASD         SELF TIME        TOTAL TIME
                               MAXIMUM        AVERAGE  MAXIMUM  AVERAGE  MAXIMUM
 CP2K                                1   1.0    0.083    0.083  126.743  126.743
 qs_energies                         1   2.0    0.000    0.000  125.361  125.361
 scf_env_do_scf                      1   3.0    0.000    0.000  118.577  118.577
 scf_env_do_scf_inner_loop          15   4.0    0.002    0.002  118.577  118.577
 qs_ks_update_qs_env                15   5.0    0.000    0.000   49.418   49.418
 rebuild_ks_matrix                  15   6.0    0.000    0.000   49.208   49.208
 qs_ks_build_kohn_sham_matrix       15   7.0    0.002    0.002   49.208   49.208
 qs_scf_new_mos                     15   5.0    0.000    0.000   44.464   44.464
 eigensolver                        15   6.0    0.001    0.001   37.037   37.037
 qs_vxc_create                      15   8.0    0.029    0.029   33.864   33.864
 calculate_dispersion_nonloc        15   9.0    7.289    7.289   29.413   29.413
 cp_fm_diag_elpa                    15   7.0    0.000    0.000   23.564   23.564
 cp_fm_diag_elpa_base               15   8.0   20.862   20.862   23.564   23.564
 pw_transfer                      1191  10.0    0.056    0.056   22.153   22.153
 fft_wrap_pw1pw2                  1086  11.0    0.009    0.009   21.984   21.984
 qs_rho_update_rho_low              16   5.0    0.000    0.000   21.836   21.836
 calculate_rho_elec                 16   6.0    0.214    0.214   21.836   21.836
 grid_collocate_task_list           16   7.0   20.556   20.556   20.556   20.556
 fft_wrap_pw1pw2_150               765  12.0    3.204    3.204   16.014   16.014
 sum_up_and_integrate               15   8.0    0.041    0.041   13.951   13.951
 integrate_v_rspace                 15   9.0    0.019    0.019   13.911   13.911
 grid_integrate_task_list           15  10.0   13.389   13.389   13.389   13.389
 cp_fm_cholesky_restore             45   7.0   11.104   11.104   11.104   11.104
 fft3d_s                          1087  13.0   10.348   10.348   10.355   10.355
 pw_scatter_s                      585  13.1    6.539    6.539    6.539    6.539
 fft_wrap_pw1pw2_200               197  12.3    0.681    0.681    5.790    5.790
 copy_dbcsr_to_fm                   16   5.9    0.001    0.001    5.460    5.460
 dbcsr_complete_redistribute        46   8.3    2.235    2.235    5.288    5.288
 cp_fm_upper_to_full                30   8.0    5.069    5.069    5.069    5.069
 vdW_energy                         15  10.0    4.431    4.431    4.431    4.431
 xc_vxc_pw_create                   15   9.0    0.221    0.221    4.421    4.421
 gspace_mixing                      14   5.0    0.171    0.171    4.081    4.081
 broyden_mixing                     14   6.0    3.474    3.474    3.474    3.474
 init_scf_run                        1   3.0    0.000    0.000    3.266    3.266
 qs_energies_init_hamiltonians       1   3.0    0.000    0.000    3.069    3.069
 xc_pw_derive                       90  11.0    0.001    0.001    2.761    2.761
 -------------------------------------------------------------------------------
From /workspace/artifacts/diag_cu144_broy_32mpi.out:
 -------------------------------------------------------------------------------
 -                                                                             -
 -                                 T I M I N G                                 -
 -                                                                             -
 -------------------------------------------------------------------------------
 SUBROUTINE                       CALLS   ASD         SELF TIME        TOTAL TIME
                               MAXIMUM        AVERAGE  MAXIMUM  AVERAGE  MAXIMUM
 CP2K                                1   1.0    0.013    0.026   63.218   63.229
 qs_energies                         1   2.0    0.000    0.001   62.969   62.974
 scf_env_do_scf                      1   3.0    0.000    0.001   58.849   58.850
 scf_env_do_scf_inner_loop          15   4.0    0.001    0.003   58.849   58.850
 qs_ks_update_qs_env                15   5.0    0.000    0.000   24.609   24.637
 rebuild_ks_matrix                  15   6.0    0.000    0.000   24.576   24.604
 qs_ks_build_kohn_sham_matrix       15   7.0    0.002    0.003   24.576   24.604
 qs_rho_update_rho_low              16   5.0    0.000    0.000   20.981   20.986
 calculate_rho_elec                 16   6.0    0.007    0.007   20.981   20.986
 grid_collocate_task_list           16   7.0   19.352   20.041   19.352   20.041
 sum_up_and_integrate               15   8.0    0.006    0.009   14.368   14.427
 integrate_v_rspace                 15   9.0    0.001    0.001   14.362   14.424
 qs_scf_new_mos                     15   5.0    0.000    0.000   13.964   14.091
 grid_integrate_task_list           15  10.0   12.992   13.620   12.992   13.620
 eigensolver                        15   6.0    0.001    0.001   12.975   13.023
 qs_vxc_create                      15   8.0    0.001    0.001    9.921    9.928
 cp_fm_diag_elpa                    15   7.0    0.000    0.000    9.295    9.304
 cp_fm_diag_elpa_base               15   8.0    9.132    9.178    9.280    9.284
 calculate_dispersion_nonloc        15   9.0    0.963    1.775    8.163    8.188
 pw_transfer                      1191  10.0    0.071    0.081    7.529    7.657
 fft_wrap_pw1pw2                  1086  11.0    0.010    0.016    7.380    7.526
 fft3d_ps                         1086  13.0    2.269    2.541    5.814    6.097
 fft_wrap_pw1pw2_150               765  12.0    0.224    0.272    5.075    5.153
 mp_alltoall_z22v                 1086  15.0    3.002    3.712    3.002    3.712
 cp_fm_cholesky_restore             45   7.0    3.533    3.607    3.533    3.607
 yz_to_x                           501  13.9    0.207    0.268    2.282    2.689
 qs_energies_init_hamiltonians       1   3.0    0.000    0.000    2.466    2.467
 build_core_hamiltonian_matrix       1   4.0    0.000    0.000    2.147    2.352
 fft_wrap_pw1pw2_200               197  12.3    0.164    0.201    2.185    2.251
 rs_pw_transfer                    158   9.4    0.001    0.002    1.581    1.869
 xc_vxc_pw_create                   15   9.0    0.014    0.019    1.757    1.789
 density_rs2pw                      16   7.0    0.001    0.001    1.500    1.710
 mp_waitany                        520  11.3    1.154    1.525    1.154    1.525
 x_to_yz                           585  14.1    0.314    0.345    1.241    1.516
 init_scf_run                        1   3.0    0.000    0.000    1.429    1.430
 build_core_ppnl                     1   5.0    1.255    1.384    1.255    1.384
 vdW_energy                         15  10.0    1.308    1.365    1.308    1.365
 scf_env_initial_rho_setup           1   4.0    0.000    0.000    1.316    1.317
 -------------------------------------------------------------------------------
Plot: name="diag_cu144_broy_timings_32omp", title="Timings of diag_cu144_broy with 32 OpenMP Threads", ylabel="time [s]"
PlotPoint: plot="diag_cu144_broy_timings_32omp", name="rest", label="rest", y=50.483999999999995, yerr=0.0
PlotPoint: plot="diag_cu144_broy_timings_32omp", name="cp_fm_diag_elpa_base", label="cp_fm_diag_elpa_base", y=20.862, yerr=0.0
PlotPoint: plot="diag_cu144_broy_timings_32omp", name="grid_collocate_task_list", label="grid_collocate_task_list", y=20.556, yerr=0.0
PlotPoint: plot="diag_cu144_broy_timings_32omp", name="grid_integrate_task_list", label="grid_integrate_task_list", y=13.389, yerr=0.0
PlotPoint: plot="diag_cu144_broy_timings_32omp", name="cp_fm_cholesky_restore", label="cp_fm_cholesky_restore", y=11.104, yerr=0.0
PlotPoint: plot="diag_cu144_broy_timings_32omp", name="fft3d_s", label="fft3d_s", y=10.348, yerr=0.0
PlotPoint: plot="diag_cu144_broy_timings_32omp", name="mp_alltoall_z22v", label="mp_alltoall_z22v", y=0.0, yerr=0.0
Plot: name="diag_cu144_broy_timings_32mpi", title="Timings of diag_cu144_broy with 32 MPI Ranks", ylabel="time [s]"
PlotPoint: plot="diag_cu144_broy_timings_32mpi", name="rest", label="rest", y=15.207, yerr=0.0
PlotPoint: plot="diag_cu144_broy_timings_32mpi", name="cp_fm_diag_elpa_base", label="cp_fm_diag_elpa_base", y=9.132, yerr=0.0
PlotPoint: plot="diag_cu144_broy_timings_32mpi", name="grid_collocate_task_list", label="grid_collocate_task_list", y=19.352, yerr=0.0
PlotPoint: plot="diag_cu144_broy_timings_32mpi", name="grid_integrate_task_list", label="grid_integrate_task_list", y=12.992, yerr=0.0
PlotPoint: plot="diag_cu144_broy_timings_32mpi", name="cp_fm_cholesky_restore", label="cp_fm_cholesky_restore", y=3.533, yerr=0.0
PlotPoint: plot="diag_cu144_broy_timings_32mpi", name="fft3d_s", label="fft3d_s", y=0.0, yerr=0.0
PlotPoint: plot="diag_cu144_broy_timings_32mpi", name="mp_alltoall_z22v", label="mp_alltoall_z22v", y=3.002, yerr=0.0
Running bench_dftb.inp with 1 threads and 32 ranks... done.
Running bench_dftb.inp with 32 threads and 1 ranks... done.
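Every benchmark in this log runs twice on the same 32 cores: once with 32 OpenMP threads on 1 MPI rank and once with 1 thread on each of 32 MPI ranks. A quick way to compare the two modes is the ratio of the CP2K total times (TOTAL TIME, AVERAGE). The sketch below uses only values copied from the reports in this log. `mpi_speedup` is a hypothetical helper, and the ratio reflects wall time on this one machine and commit only:

```python
# CP2K total time (TOTAL TIME, AVERAGE) in seconds, copied from the reports in this log.
# benchmark: (32 OpenMP threads x 1 rank, 1 thread x 32 MPI ranks)
totals = {
    "H2O-hyb": (111.904, 132.499),
    "GW_PBE_4benzene": (79.397, 32.740),
    "RI-HFX_H2O-32": (328.915, 57.326),
    "RI-MP2_ammonia": (175.038, 40.446),
    "diag_cu144_broy": (126.743, 63.218),
    "bench_dftb": (295.982, 65.503),
}


def mpi_speedup(omp_seconds: float, mpi_seconds: float) -> float:
    """Ratio > 1 means the pure-MPI run finished faster than the pure-OpenMP run."""
    return omp_seconds / mpi_seconds


for bench, (omp_s, mpi_s) in totals.items():
    print(f"{bench:18s} {mpi_speedup(omp_s, mpi_s):5.2f}x")
```

By this measure H2O-hyb is the only case where the 32-rank run is slower (about 0.84x); its MPI report attributes 31.685 s of self time to `mp_sync`.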
From /workspace/artifacts/bench_dftb_32omp.out:
 -------------------------------------------------------------------------------
 -                                                                             -
 -                                 T I M I N G                                 -
 -                                                                             -
 -------------------------------------------------------------------------------
 SUBROUTINE                       CALLS   ASD         SELF TIME        TOTAL TIME
                               MAXIMUM        AVERAGE  MAXIMUM  AVERAGE  MAXIMUM
 CP2K                                1   1.0    0.078    0.078  295.982  295.982
 qs_energies                         1   2.0    0.000    0.000  295.843  295.843
 ls_scf                              1   3.0    0.000    0.000  294.594  294.594
 ls_scf_main                         1   4.0    0.002    0.002  285.380  285.380
 density_matrix_trs4                11   5.0    0.012    0.012  192.497  192.497
 arnoldi_extremal                   12   6.1    0.000    0.000  107.986  107.986
 arnoldi_normal_ev                  12   7.1    0.014    0.014  107.986  107.986
 dbcsr_matrix_vector_mult          652   9.0    0.164    0.164  106.168  106.168
 build_subspace                     23   8.1    0.076    0.076  106.145  106.145
 dbcsr_matrix_vector_mult_local    652  10.0  104.825  104.825  104.834  104.834
 ls_scf_dm_to_ks                    11   5.0    0.000    0.000   87.621   87.621
 matrix_ls_to_qs                    11   6.0    0.000    0.000   84.418   84.418
 dbcsr_multiply_generic            185   6.1    0.848    0.848   73.451   73.451
 dbcsr_copy_into_existing           11   7.0   46.532   46.532   46.532   46.532
 multiply_cannon                   185   7.1    0.285    0.285   44.795   44.795
 dbcsr_complete_redistribute        23   7.5   30.506   30.506   41.628   41.628
 matrix_decluster                   11   7.0    0.000    0.000   37.885   37.885
 multiply_cannon_loop              185   8.1    0.251    0.251   32.412   32.412
 make_m2s                          370   7.1    0.038    0.038   24.450   24.450
 multiply_cannon_multrec           185   9.1   22.857   22.857   22.883   22.883
 make_images                       370   8.1   10.537   10.537   22.857   22.857
 dbcsr_finalize                    646   7.5    0.178    0.178   14.503   14.503
 dbcsr_merge_all                   597   8.5    2.113    2.113   13.435   13.435
 setup_rec_index_2d                370   8.1   12.011   12.011   12.011   12.011
 tree_to_linear_d                  110   9.4   10.143   10.143   10.143   10.143
 dbcsr_sort_indices               1103   9.9   10.073   10.073   10.073   10.073
 calculate_norms                   370   9.1    9.279    9.279    9.279    9.279
 quick_finalize                    395  10.0    0.370    0.370    8.745    8.745
 ls_scf_init_scf                     1   4.0    0.000    0.000    8.533    8.533
 ls_scf_init_matrix_S                1   5.0    0.000    0.000    8.208    8.208
 dbcsr_special_finalize            370   9.1    0.002    0.002    8.085    8.085
 matrix_sqrt_Newton_Schulz           1   6.0    0.001    0.001    7.575    7.575
------------------------------------------------------------------------------- From /workspace/artifacts/bench_dftb_32mpi.out: ------------------------------------------------------------------------------- - - - T I M I N G - - - ------------------------------------------------------------------------------- SUBROUTINE CALLS ASD SELF TIME TOTAL TIME MAXIMUM AVERAGE MAXIMUM AVERAGE MAXIMUM CP2K 1 1.0 0.008 0.021 65.503 65.513 qs_energies 1 2.0 0.000 0.000 65.420 65.420 ls_scf 1 3.0 0.000 0.000 65.363 65.364 ls_scf_main 1 4.0 0.000 0.006 62.897 62.897 density_matrix_trs4 11 5.0 0.006 0.016 60.351 60.405 dbcsr_multiply_generic 185 6.1 0.056 0.069 56.190 56.399 multiply_cannon 185 7.1 0.030 0.035 46.633 47.635 multiply_cannon_loop 185 8.1 0.101 0.133 44.317 45.389 multiply_cannon_multrec 1480 9.1 27.010 30.449 27.258 30.723 mp_waitall_1 11936 10.3 15.150 19.920 15.150 19.920 multiply_cannon_metrocomm3 1480 9.1 0.012 0.017 9.005 16.085 multiply_cannon_metrocomm1 1480 9.1 0.007 0.010 3.472 8.226 make_m2s 370 7.1 0.033 0.035 6.567 6.623 make_images 370 8.1 0.622 0.712 6.442 6.495 calculate_norms 2960 9.1 4.426 5.828 4.426 5.828 arnoldi_extremal 12 6.1 0.000 0.000 3.399 3.434 arnoldi_normal_ev 12 7.1 0.001 0.004 3.399 3.433 mp_sum_l 1119 5.6 2.101 3.298 2.101 3.298 build_subspace 23 8.1 0.018 0.024 3.274 3.281 make_images_data 370 9.1 0.009 0.020 2.896 3.152 dbcsr_matrix_vector_mult 652 9.0 0.009 0.055 2.268 3.000 hybrid_alltoall_any 393 9.9 0.174 0.865 2.495 2.744 dbcsr_matrix_vector_mult_local 652 10.0 1.619 2.540 1.621 2.543 dbcsr_multiply_generic_mpsum_f 137 7.1 0.000 0.000 1.251 2.313 ls_scf_dm_to_ks 11 5.0 0.000 0.000 2.217 2.298 dbcsr_complete_redistribute 23 7.5 1.170 1.472 1.884 2.006 matrix_ls_to_qs 11 6.0 0.000 0.000 1.840 1.976 ls_scf_init_scf 1 4.0 0.000 0.000 1.893 1.894 ls_scf_init_matrix_S 1 5.0 0.000 0.000 1.869 1.874 matrix_decluster 11 7.0 0.000 0.000 1.698 1.803 make_images_pack 370 9.1 1.551 1.791 1.555 1.796 matrix_sqrt_Newton_Schulz 1 6.0 0.000 
0.001 1.713 1.714 buffer_matrices_ensure_size 370 8.1 1.228 1.491 1.228 1.491 mp_sum_dv 2907 10.4 1.219 1.338 1.219 1.338 dbcsr_finalize 646 7.5 0.007 0.008 1.179 1.317 ------------------------------------------------------------------------------- Plot: name="bench_dftb_timings_32omp", title="Timings of bench_dftb with 32 OpenMP Threads", ylabel="time [s]" PlotPoint: plot="bench_dftb_timings_32omp", name="rest", label="rest", y=69.97200000000004, yerr=0.0 PlotPoint: plot="bench_dftb_timings_32omp", name="dbcsr_matrix_vector_mult_local", label="dbcsr_matrix_vector_mult_local", y=104.825, yerr=0.0 PlotPoint: plot="bench_dftb_timings_32omp", name="dbcsr_copy_into_existing", label="dbcsr_copy_into_existing", y=46.532, yerr=0.0 PlotPoint: plot="bench_dftb_timings_32omp", name="dbcsr_complete_redistribute", label="dbcsr_complete_redistribute", y=30.506, yerr=0.0 PlotPoint: plot="bench_dftb_timings_32omp", name="multiply_cannon_multrec", label="multiply_cannon_multrec", y=22.857, yerr=0.0 PlotPoint: plot="bench_dftb_timings_32omp", name="setup_rec_index_2d", label="setup_rec_index_2d", y=12.011, yerr=0.0 PlotPoint: plot="bench_dftb_timings_32omp", name="calculate_norms", label="calculate_norms", y=9.279, yerr=0.0 PlotPoint: plot="bench_dftb_timings_32omp", name="mp_sum_l", label="mp_sum_l", y=0.0, yerr=0.0 PlotPoint: plot="bench_dftb_timings_32omp", name="mp_waitall_1", label="mp_waitall_1", y=0.0, yerr=0.0 Plot: name="bench_dftb_timings_32mpi", title="Timings of bench_dftb with 32 MPI Ranks", ylabel="time [s]" PlotPoint: plot="bench_dftb_timings_32mpi", name="rest", label="rest", y=14.027000000000001, yerr=0.0 PlotPoint: plot="bench_dftb_timings_32mpi", name="dbcsr_matrix_vector_mult_local", label="dbcsr_matrix_vector_mult_local", y=1.619, yerr=0.0 PlotPoint: plot="bench_dftb_timings_32mpi", name="dbcsr_copy_into_existing", label="dbcsr_copy_into_existing", y=0.0, yerr=0.0 PlotPoint: plot="bench_dftb_timings_32mpi", name="dbcsr_complete_redistribute", 
label="dbcsr_complete_redistribute", y=1.17, yerr=0.0 PlotPoint: plot="bench_dftb_timings_32mpi", name="multiply_cannon_multrec", label="multiply_cannon_multrec", y=27.01, yerr=0.0 PlotPoint: plot="bench_dftb_timings_32mpi", name="setup_rec_index_2d", label="setup_rec_index_2d", y=0.0, yerr=0.0 PlotPoint: plot="bench_dftb_timings_32mpi", name="calculate_norms", label="calculate_norms", y=4.426, yerr=0.0 PlotPoint: plot="bench_dftb_timings_32mpi", name="mp_sum_l", label="mp_sum_l", y=2.101, yerr=0.0 PlotPoint: plot="bench_dftb_timings_32mpi", name="mp_waitall_1", label="mp_waitall_1", y=15.15, yerr=0.0 Running dbcsr.inp with 1 threads and 32 ranks... done. Running dbcsr.inp with 32 threads and 1 ranks... done. From /workspace/artifacts/dbcsr_32omp.out: ------------------------------------------------------------------------------- - - - T I M I N G - - - ------------------------------------------------------------------------------- SUBROUTINE CALLS ASD SELF TIME TOTAL TIME MAXIMUM AVERAGE MAXIMUM AVERAGE MAXIMUM CP2K 1 1.0 0.007 0.007 69.342 69.342 lib_test 1 2.0 0.000 0.000 69.334 69.334 dbcsr_run_tests 3 3.0 0.002 0.002 69.334 69.334 test_multiplies_multiproc 3 4.0 0.001 0.001 53.776 53.776 dbcsr_redistribute 9 5.0 34.900 34.900 36.429 36.429 dbcsr_multiply_generic 9 5.0 0.001 0.001 15.763 15.763 dbcsr_make_random_matrix 9 4.0 12.502 12.502 15.460 15.460 multiply_cannon 9 6.0 0.001 0.001 11.168 11.168 multiply_cannon_loop 9 7.0 0.054 0.054 10.814 10.814 multiply_cannon_multrec 9 8.0 10.760 10.760 10.761 10.761 dbcsr_finalize 27 5.7 0.047 0.047 5.458 5.458 dbcsr_merge_all 18 6.5 1.953 1.953 4.750 4.750 dbcsr_data_release 975 7.6 2.554 2.554 2.554 2.554 tree_to_linear_d 9 7.0 1.814 1.814 1.814 1.814 make_m2s 18 6.0 0.001 0.001 1.539 1.539 make_images 18 7.0 0.525 0.525 1.493 1.493 ------------------------------------------------------------------------------- From /workspace/artifacts/dbcsr_32mpi.out: 
-------------------------------------------------------------------------------
-                                                                             -
-                                T I M I N G                                  -
-                                                                             -
-------------------------------------------------------------------------------
SUBROUTINE                       CALLS  ASD         SELF TIME        TOTAL TIME
                               MAXIMUM       AVERAGE  MAXIMUM  AVERAGE  MAXIMUM
CP2K                                 1  1.0    0.004    0.016   17.530   17.535
lib_test                             1  2.0    0.000    0.000   17.488   17.505
dbcsr_run_tests                      3  3.0    0.000    0.001   17.488   17.504
test_multiplies_multiproc            3  4.0    0.000    0.002   16.671   16.712
dbcsr_multiply_generic               9  5.0    0.001    0.001   14.822   14.909
multiply_cannon                      9  6.0    0.001    0.002   13.168   13.522
multiply_cannon_loop                 9  7.0    0.002    0.002   12.887   13.252
multiply_cannon_multrec             72  8.0   10.589   11.198   10.589   11.199
mp_waitall_1                       576  9.2    2.618    3.359    2.618    3.359
multiply_cannon_metrocomm1          72  8.0    0.001    0.001    1.923    2.779
multiply_cannon_metrocomm3          72  8.0    0.000    0.000    0.368    1.202
dbcsr_make_random_matrix             9  4.0    0.676    0.925    0.792    1.004
mp_sum_l                           390  2.5    0.497    0.937    0.497    0.937
dbcsr_multiply_generic_mpsum_f       9  6.0    0.000    0.000    0.494    0.933
make_m2s                            18  6.0    0.001    0.001    0.654    0.698
make_images                         18  7.0    0.021    0.025    0.651    0.695
dbcsr_data_release                 444  7.6    0.579    0.676    0.579    0.676
dbcsr_finalize                      27  5.7    0.000    0.000    0.526    0.616
dbcsr_destroy                      111  5.9    0.000    0.000    0.488    0.597
dbcsr_checksum                       6  5.0    0.154    0.517    0.525    0.529
dbcsr_merge_all                     18  6.5    0.080    0.100    0.433    0.502
make_images_data                    18  8.0    0.000    0.001    0.357    0.431
dbcsr_redistribute                   9  5.0    0.224    0.270    0.391    0.419
mp_sum_d                           191  1.2    0.372    0.391    0.372    0.391
mp_cart_create                       9  5.7    0.333    0.385    0.333    0.385
hybrid_alltoall_any                 18  9.0    0.026    0.129    0.304    0.382
dbcsr_mp_make_env                    6  4.5    0.000    0.000    0.321    0.366
-------------------------------------------------------------------------------
Plot: name="dbcsr_timings_32omp", title="Timings of dbcsr with 32 OpenMP Threads", ylabel="time [s]"
PlotPoint: plot="dbcsr_timings_32omp", name="rest", label="rest", y=6.672999999999995, yerr=0.0
PlotPoint: plot="dbcsr_timings_32omp", name="dbcsr_redistribute", label="dbcsr_redistribute", y=34.9, yerr=0.0
PlotPoint: plot="dbcsr_timings_32omp", name="dbcsr_make_random_matrix", label="dbcsr_make_random_matrix", y=12.502, yerr=0.0
PlotPoint: plot="dbcsr_timings_32omp", name="multiply_cannon_multrec", label="multiply_cannon_multrec", y=10.76, yerr=0.0
PlotPoint: plot="dbcsr_timings_32omp", name="dbcsr_data_release", label="dbcsr_data_release", y=2.554, yerr=0.0
PlotPoint: plot="dbcsr_timings_32omp", name="dbcsr_merge_all", label="dbcsr_merge_all", y=1.953, yerr=0.0
PlotPoint: plot="dbcsr_timings_32omp", name="mp_sum_l", label="mp_sum_l", y=0.0, yerr=0.0
PlotPoint: plot="dbcsr_timings_32omp", name="mp_waitall_1", label="mp_waitall_1", y=0.0, yerr=0.0
Plot: name="dbcsr_timings_32mpi", title="Timings of dbcsr with 32 MPI Ranks", ylabel="time [s]"
PlotPoint: plot="dbcsr_timings_32mpi", name="rest", label="rest", y=2.2669999999999995, yerr=0.0
PlotPoint: plot="dbcsr_timings_32mpi", name="dbcsr_redistribute", label="dbcsr_redistribute", y=0.224, yerr=0.0
PlotPoint: plot="dbcsr_timings_32mpi", name="dbcsr_make_random_matrix", label="dbcsr_make_random_matrix", y=0.676, yerr=0.0
PlotPoint: plot="dbcsr_timings_32mpi", name="multiply_cannon_multrec", label="multiply_cannon_multrec", y=10.589, yerr=0.0
PlotPoint: plot="dbcsr_timings_32mpi", name="dbcsr_data_release", label="dbcsr_data_release", y=0.579, yerr=0.0
PlotPoint: plot="dbcsr_timings_32mpi", name="dbcsr_merge_all", label="dbcsr_merge_all", y=0.08, yerr=0.0
PlotPoint: plot="dbcsr_timings_32mpi", name="mp_sum_l", label="mp_sum_l", y=0.497, yerr=0.0
PlotPoint: plot="dbcsr_timings_32mpi", name="mp_waitall_1", label="mp_waitall_1", y=2.618, yerr=0.0
Running MQAE_single_node.inp with 1 threads and 32 ranks... done.
Running MQAE_single_node.inp with 32 threads and 1 ranks... done.
From /workspace/artifacts/MQAE_single_node_32omp.out:
-------------------------------------------------------------------------------
-                                                                             -
-                                T I M I N G                                  -
-                                                                             -
-------------------------------------------------------------------------------
SUBROUTINE                       CALLS  ASD         SELF TIME        TOTAL TIME
                               MAXIMUM       AVERAGE  MAXIMUM  AVERAGE  MAXIMUM
CP2K                                 1  1.0    0.049    0.049  133.484  133.484
qs_mol_dyn_low                       1  2.0    0.004    0.004  131.811  131.811
velocity_verlet                      5  3.0    0.003    0.003  105.939  105.939
qmmm_el_coupling                     6  3.8    0.000    0.000   87.675   87.675
qmmm_elec_with_gaussian              6  4.8    0.087    0.087   87.671   87.671
qmmm_elec_with_gaussian_low          6  5.8    0.000    0.000   86.962   86.962
qmmm_elec_gaussian_low_G             6  6.8   86.055   86.055   86.055   86.055
qs_forces                            6  3.8    0.000    0.000   34.999   34.999
qs_energies                          6  4.8    0.000    0.000   31.069   31.069
scf_env_do_scf                       6  5.8    0.001    0.001   28.703   28.703
scf_env_do_scf_inner_loop           39  6.8    0.004    0.004   24.849   24.849
rebuild_ks_matrix                   45  8.4    0.000    0.000   24.076   24.076
qs_ks_build_kohn_sham_matrix        45  9.4    0.005    0.005   24.076   24.076
qs_ks_update_qs_env                 45  7.8    0.000    0.000   20.572   20.572
pw_transfer                        966 12.3    0.046    0.046   16.414   16.414
fft_wrap_pw1pw2                    801 13.6    0.006    0.006   16.180   16.180
fft_wrap_pw1pw2_150                507 15.2    2.111    2.111   15.686   15.686
qs_vxc_create                       45 10.4    0.001    0.001   12.926   12.926
xc_vxc_pw_create                    45 11.4    0.654    0.654   12.926   12.926
xc_pw_derive                       270 13.4    0.002    0.002    8.953    8.953
fft3d_s                            802 15.6    7.493    7.493    7.501    7.501
qs_rho_update_rho_low               45  7.9    0.000    0.000    6.938    6.938
calculate_rho_elec                  45  8.9    0.564    0.564    6.938    6.938
xc_rho_set_and_dset_create          45 12.4    0.667    0.667    6.827    6.827
qmmm_forces                          6  3.8    0.002    0.002    5.457    5.457
xc_pw_divergence                    45 12.4    0.001    0.001    5.386    5.386
pw_scatter_s                       429 15.8    5.287    5.287    5.287    5.287
qmmm_forces_with_gaussian            6  4.8    0.112    0.112    5.139    5.139
pw_integral_ab                    2539  7.4    4.362    4.362    4.362    4.362
qmmm_force_with_gaussian_low         6  5.8    0.000    0.000    4.285    4.285
qs_ks_ddapc                         45 10.4    0.001    0.001    4.142    4.142
init_scf_loop                        6  6.8    0.000    0.000    3.848    3.848
qmmm_forces_gaussian_low_G           6  6.8    3.566    3.566    3.566    3.566
qs_ks_update_qs_env_forces           6  4.8    0.000    0.000    3.511    3.511
grid_collocate_task_list            45  9.9    3.229    3.229    3.229    3.229
density_rs2pw                       45  9.9    0.001    0.001    3.144    3.144
sum_up_and_integrate                45 10.4    0.117    0.117    2.995    2.995
pw_poisson_solve                    51  9.9    1.241    1.241    2.992    2.992
integrate_v_rspace                  45 11.4    0.011    0.011    2.878    2.878
-------------------------------------------------------------------------------
From /workspace/artifacts/MQAE_single_node_32mpi.out:
-------------------------------------------------------------------------------
-                                                                             -
-                                T I M I N G                                  -
-                                                                             -
-------------------------------------------------------------------------------
SUBROUTINE                       CALLS  ASD         SELF TIME        TOTAL TIME
                               MAXIMUM       AVERAGE  MAXIMUM  AVERAGE  MAXIMUM
CP2K                                 1  1.0    0.029    0.047   56.170   56.181
qs_mol_dyn_low                       1  2.0    0.002    0.004   55.008   55.069
qs_forces                            6  3.8    0.000    0.001   39.525   39.526
qs_energies                          6  4.8    0.001    0.001   37.744   37.744
scf_env_do_scf                       6  5.8    0.000    0.001   36.780   36.780
scf_env_do_scf_inner_loop          113  6.2    0.002    0.015   35.279   35.280
rebuild_ks_matrix                  119  8.1    0.000    0.000   25.621   25.631
qs_ks_build_kohn_sham_matrix       119  9.1    0.014    0.019   25.621   25.631
qs_ks_update_qs_env                119  7.3    0.001    0.002   24.125   24.134
velocity_verlet                      5  3.0    0.002    0.003   23.253   23.256
pw_transfer                       2446 12.3    0.164    0.188   16.274   16.705
fft_wrap_pw1pw2                   2059 13.4    0.020    0.024   15.918   16.425
fft_wrap_pw1pw2_150               1321 14.9    1.089    1.378   15.204   15.611
fft3d_ps                          2059 15.4    6.133    7.170   12.098   13.054
qs_vxc_create                      119 10.1    0.002    0.002   12.888   12.893
xc_vxc_pw_create                   119 11.1    0.139    0.199   12.886   12.891
qs_rho_update_rho_low              119  7.3    0.000    0.001   10.645   10.646
calculate_rho_elec                 119  8.3    0.049    0.058   10.644   10.646
xc_pw_derive                       714 13.1    0.008    0.010    9.651    9.932
sum_up_and_integrate               119 10.1    0.049    0.071    9.258    9.471
integrate_v_rspace                 119 11.1    0.003    0.004    9.209    9.435
qmmm_forces                          6  3.8    0.002    0.002    7.677    7.677
rs_pw_transfer                     988 11.5    0.010    0.013    7.142    7.464
qmmm_forces_with_gaussian            6  4.8    0.283    0.337    7.015    7.387
qmmm_el_coupling                     6  3.8    0.000    0.000    6.753    6.976
qmmm_elec_with_gaussian              6  4.8    0.289    0.358    6.751    6.974
xc_rho_set_and_dset_create         119 12.1    0.339    0.572    6.234    6.852
xc_pw_divergence                   119 12.1    0.004    0.005    6.256    6.507
density_rs2pw                      119  9.3    0.005    0.007    6.168    6.442
mp_alltoall_z22v                  2059 17.4    4.691    6.296    4.691    6.296
potential_pw2rs                    119 12.1    0.005    0.007    5.449    5.484
grid_collocate_task_list           119  9.3    4.275    4.627    4.275    4.627
qmmm_force_with_gaussian_low         6  5.8    0.000    0.000    3.855    4.075
grid_integrate_task_list           119 12.1    3.384    3.890    3.384    3.890
x_to_yz                           1095 16.8    0.755    0.892    3.103    3.831
yz_to_x                            964 16.0    0.469    0.636    2.812    3.777
mp_waitany                        4028 12.8    2.692    3.753    2.692    3.753
qmmm_elec_with_gaussian_low          6  5.8    0.000    0.000    3.331    3.558
qmmm_forces_gaussian_low_G           6  6.8    3.164    3.387    3.164    3.387
qmmm_elec_gaussian_low_G             6  6.8    2.742    2.964    2.742    2.964
rs_pw_transfer_PW2RS_150           125 13.9    1.052    1.333    2.730    2.836
rs_pw_transfer_RS2PW_150           125 11.2    0.806    1.094    2.325    2.665
pw_restrict_s3                      18  5.8    1.281    1.510    2.270    2.522
dbcsr_multiply_generic            2588 12.3    0.056    0.070    2.266    2.429
qs_scf_new_mos                     113  7.2    0.000    0.000    2.353    2.358
qs_scf_loop_do_ot                  113  8.2    0.000    0.000    2.353    2.357
ot_scf_mini                        113  9.2    0.001    0.001    2.252    2.257
mp_waitall_1                    188862 16.2    1.895    2.251    1.895    2.251
qmmm_elec_with_gaussian:spline       6  5.8    0.000    0.000    1.962    2.177
pw_prolongate_s3                    18  6.8    1.099    1.287    1.962    2.177
mp_sum_dm3                          33  5.7    1.689    1.861    1.689    1.861
qs_ks_ddapc                        119 10.1    0.002    0.002    1.757    1.856
mp_sum_d                          5820 12.2    1.106    1.730    1.106    1.730
pw_scatter_p                      1095 15.8    1.544    1.582    1.544    1.582
pw_integral_ab                    2761  7.7    0.976    1.144    1.405    1.536
qs_ks_update_qs_env_forces           6  4.8    0.000    0.000    1.506    1.507
pw_gather_p                        964 15.0    1.140    1.505    1.140    1.505
init_scf_loop                        6  6.8    0.000    0.000    1.498    1.498
ot_mini                            113 10.2    0.001    0.007    1.372    1.378
rs_pw_transfer_PW2RS_40            119 14.1    0.189    0.245    0.960    1.244
-------------------------------------------------------------------------------
Plot: name="MQAE_single_node_timings_32omp", title="Timings of MQAE_single_node with 32 OpenMP Threads", ylabel="time [s]"
PlotPoint: plot="MQAE_single_node_timings_32omp", name="rest", label="rest", y=23.492000000000004, yerr=0.0
PlotPoint: plot="MQAE_single_node_timings_32omp", name="qmmm_elec_gaussian_low_G", label="qmmm_elec_gaussian_low_G", y=86.055, yerr=0.0
PlotPoint: plot="MQAE_single_node_timings_32omp", name="fft3d_s", label="fft3d_s", y=7.493, yerr=0.0
PlotPoint: plot="MQAE_single_node_timings_32omp", name="pw_scatter_s", label="pw_scatter_s", y=5.287, yerr=0.0
PlotPoint: plot="MQAE_single_node_timings_32omp", name="pw_integral_ab", label="pw_integral_ab", y=4.362, yerr=0.0
PlotPoint: plot="MQAE_single_node_timings_32omp", name="qmmm_forces_gaussian_low_G", label="qmmm_forces_gaussian_low_G", y=3.566, yerr=0.0
PlotPoint: plot="MQAE_single_node_timings_32omp", name="grid_collocate_task_list", label="grid_collocate_task_list", y=3.229, yerr=0.0
PlotPoint: plot="MQAE_single_node_timings_32omp", name="grid_integrate_task_list", label="grid_integrate_task_list", y=0.0, yerr=0.0
PlotPoint: plot="MQAE_single_node_timings_32omp", name="mp_alltoall_z22v", label="mp_alltoall_z22v", y=0.0, yerr=0.0
PlotPoint: plot="MQAE_single_node_timings_32omp", name="fft3d_ps", label="fft3d_ps", y=0.0, yerr=0.0
Plot: name="MQAE_single_node_timings_32mpi", title="Timings of MQAE_single_node with 32 MPI Ranks", ylabel="time [s]"
PlotPoint: plot="MQAE_single_node_timings_32mpi", name="rest", label="rest", y=30.805000000000003, yerr=0.0
PlotPoint: plot="MQAE_single_node_timings_32mpi", name="qmmm_elec_gaussian_low_G", label="qmmm_elec_gaussian_low_G", y=2.742, yerr=0.0
PlotPoint: plot="MQAE_single_node_timings_32mpi", name="fft3d_s", label="fft3d_s", y=0.0, yerr=0.0
PlotPoint: plot="MQAE_single_node_timings_32mpi", name="pw_scatter_s", label="pw_scatter_s", y=0.0, yerr=0.0
PlotPoint: plot="MQAE_single_node_timings_32mpi", name="pw_integral_ab", label="pw_integral_ab", y=0.976, yerr=0.0
PlotPoint: plot="MQAE_single_node_timings_32mpi", name="qmmm_forces_gaussian_low_G", label="qmmm_forces_gaussian_low_G", y=3.164, yerr=0.0
PlotPoint: plot="MQAE_single_node_timings_32mpi", name="grid_collocate_task_list", label="grid_collocate_task_list", y=4.275, yerr=0.0
PlotPoint: plot="MQAE_single_node_timings_32mpi", name="grid_integrate_task_list", label="grid_integrate_task_list", y=3.384, yerr=0.0
PlotPoint: plot="MQAE_single_node_timings_32mpi", name="mp_alltoall_z22v", label="mp_alltoall_z22v", y=4.691, yerr=0.0
PlotPoint: plot="MQAE_single_node_timings_32mpi", name="fft3d_ps", label="fft3d_ps", y=6.133, yerr=0.0
Summary: Performance test took 35 minutes.
Status: OK
Removing intermediate container 60d5db1f3194
 ---> e8c162bebd9b
Step 41/42 : CMD cat $(find ./report.log -mmin +10) | sed '/^Summary:/ s/$/ (cached)/'
 ---> Running in 6dc4029f2205
Removing intermediate container 6dc4029f2205
 ---> fa72f9915ae0
Step 42/42 : ENTRYPOINT []
 ---> Running in 685ad4780106
Removing intermediate container 685ad4780106
 ---> 7079adda446b
[Warning] One or more build-args [GIT_COMMIT_SHA] were not consumed
Successfully built 7079adda446b
Successfully tagged gcr.io/cp2k-org-project/img_cp2k-perf-openmp-arch-14b:master
Pushing new image... done.
#################### Running Image cp2k-perf-openmp ####################
Uploading artifacts... done
EndDate: 2022-09-05 20:05:18+00:00