StartDate: 2021-06-16 19:40:56+00:00
CpuId: 64x Intel Xeon W 2000 / D-2100 (Skylake / Cascade Lake) {Skylake}, 14nm
CommitSHA: 3c81dfb14877e2dca5ed75034aede06b886d942b
CommitTime: 2021-06-16 14:22:47 +0200
CommitAuthor: Ole Schütt
CommitSubject: Toolchain: Hardcode HIP_PLATFORM=nvidia for now
Trying to pull image cp2k-toolchain-mpich... success :-)
Trying to pull image cp2k-perf-openmp... image not found.
#################### Building Image cp2k-perf-openmp ####################
Dockerfile: /tools/docker/Dockerfile.test_performance
Build-Args: TOOLCHAIN=gcr.io/cp2k-org-project/img_cp2k-toolchain-mpich-arch-b51:gittree-95f19d0-buildargs-68b329d
Sending build context to Docker daemon 73.73kB
Step 1/9 : ARG TOOLCHAIN=cp2k/toolchain:latest
Step 2/9 : FROM ${TOOLCHAIN}
 ---> 253da5fdd63f
Step 3/9 : WORKDIR /workspace
 ---> Running in 539d39611b38
Removing intermediate container 539d39611b38
 ---> af72d2ae15d3
Step 4/9 : COPY ./scripts/install_basics.sh .
 ---> 87789deaf798
Step 5/9 : RUN ./install_basics.sh
 ---> Running in b45492e4ee3f
Installing Ubuntu packages... debconf: delaying package configuration, since apt-utils is not installed
Selecting previously unselected package libpopt0:amd64.
(Reading database ... 14718 files and directories currently installed.)
Preparing to unpack .../libpopt0_1.16-14_amd64.deb ...
Unpacking libpopt0:amd64 (1.16-14) ...
Selecting previously unselected package rsync.
Preparing to unpack .../rsync_3.1.3-8_amd64.deb ...
Unpacking rsync (3.1.3-8) ...
Setting up libpopt0:amd64 (1.16-14) ...
Setting up rsync (3.1.3-8) ...
invoke-rc.d: could not determine current runlevel
invoke-rc.d: policy-rc.d denied execution of start.
Processing triggers for libc-bin (2.31-0ubuntu9.2) ...
done.
Cloning cp2k repository... done.
Removing intermediate container b45492e4ee3f
 ---> cd9a98a38552
Step 6/9 : COPY ./scripts/install_performance.sh .
 ---> 3efdb1495930
Step 7/9 : RUN ./install_performance.sh "local"
 ---> Running in 94a054de096f
'./local.pdbg' -> '/opt/cp2k-toolchain/install/arch/local.pdbg'
'./local.psmp' -> '/opt/cp2k-toolchain/install/arch/local.psmp'
'./local.sdbg' -> '/opt/cp2k-toolchain/install/arch/local.sdbg'
'./local.ssmp' -> '/opt/cp2k-toolchain/install/arch/local.ssmp'
'./local_coverage.pdbg' -> '/opt/cp2k-toolchain/install/arch/local_coverage.pdbg'
'./local_static.psmp' -> '/opt/cp2k-toolchain/install/arch/local_static.psmp'
'./local_static.ssmp' -> '/opt/cp2k-toolchain/install/arch/local_static.ssmp'
'./local_warn.psmp' -> '/opt/cp2k-toolchain/install/arch/local_warn.psmp'
Warming cache by trying to compile cp2k... done.
Removing intermediate container 94a054de096f
 ---> eb72a3959a3b
Step 8/9 : COPY ./scripts/ci_entrypoint.sh ./scripts/test_performance.sh ./scripts/plot_performance.py ./
 ---> 0765d9e6a3a9
Step 9/9 : CMD ["./ci_entrypoint.sh", "./test_performance.sh", "local"]
 ---> Running in 90df14e239e2
Removing intermediate container 90df14e239e2
 ---> 9f0846a0f20e
Successfully built 9f0846a0f20e
Successfully tagged gcr.io/cp2k-org-project/img_cp2k-perf-openmp-arch-b51:gittree-7c98060-buildargs-2673c8e
Pushing image cp2k-perf-openmp... done.
#################### Running Image cp2k-perf-openmp ####################
========== Fetching Git Commit ==========
CommitSHA: 3c81dfb14877e2dca5ed75034aede06b886d942b
CommitTime: 2021-06-16 14:22:47 +0200
CommitAuthor: Ole Schütt
CommitSubject: Toolchain: Hardcode HIP_PLATFORM=nvidia for now
========== Running Test ==========
========== Compiling CP2K ==========
Compiling cp2k... done.
========== Running Performance Test ==========
Running H2O-64.inp with 1 threads and 32 ranks... done.
Running H2O-64.inp with 32 threads and 1 ranks... done.
From /workspace/artifacts/H2O-64_32omp.out:
-------------------------------------------------------------------------------
- - - T I M I N G - - -
-------------------------------------------------------------------------------
SUBROUTINE CALLS ASD SELF TIME TOTAL TIME
MAXIMUM AVERAGE MAXIMUM AVERAGE MAXIMUM
CP2K 1 1.0 0.037 0.037 160.239 160.239
qs_mol_dyn_low 1 2.0 0.004 0.004 159.433 159.433
qs_forces 11 3.9 0.001 0.001 159.374 159.374
qs_energies 11 4.9 0.001 0.001 148.001 148.001
scf_env_do_scf 11 5.9 0.001 0.001 119.468 119.468
velocity_verlet 10 3.0 0.002 0.002 107.747 107.747
scf_env_do_scf_inner_loop 108 6.5 0.011 0.011 92.161 92.161
rebuild_ks_matrix 119 8.3 0.001 0.001 42.213 42.213
qs_ks_build_kohn_sham_matrix 119 9.3 0.019 0.019 42.212 42.212
qs_rho_update_rho 119 7.7 0.001 0.001 38.161 38.161
calculate_rho_elec 119 8.7 1.572 1.572 38.160 38.160
qs_ks_update_qs_env 119 7.6 0.001 0.001 37.839 37.839
grid_collocate_task_list 119 9.7 31.838 31.838 31.838 31.838
sum_up_and_integrate 119 10.3 0.388 0.388 30.155 30.155
integrate_v_rspace 119 11.3 0.155 0.155 29.766 29.766
init_scf_loop 11 6.9 0.000 0.000 27.110 27.110
grid_integrate_task_list 119 12.3 27.044 27.044 27.044 27.044
qs_scf_new_mos 108 7.5 0.001 0.001 23.769 23.769
qs_scf_loop_do_ot 108 8.5 0.001 0.001 23.768 23.768
dbcsr_multiply_generic 2286 12.5 0.179 0.179 22.468 22.468
prepare_preconditioner 11 7.9 0.000 0.000 22.443 22.443
make_preconditioner 11 8.9 0.000 0.000 22.443 22.443
ot_scf_mini 108 9.5 0.003 0.003 22.339 22.339
make_full_inverse_cholesky 11 9.9 0.000 0.000 20.396 20.396
ot_mini 108 10.5 0.001 0.001 14.505 14.505
init_scf_run 11 5.9 0.001 0.001 13.850 13.850
scf_env_initial_rho_setup 11 6.9 0.001 0.001 13.848 13.848
make_m2s 4572 13.5 0.066 0.066 13.345 13.345
wfi_extrapolate 11 7.9 0.001 0.001 12.985 12.985
qs_energies_init_hamiltonians 11 5.9 0.000 0.000 10.873 10.873
cp_gemm 81 9.0 0.001 0.001 10.706 10.706
cp_gemm_cosma 81 10.0 10.705 10.705 10.705 10.705
cp_fm_cholesky_decompose 22 10.9 8.149 8.149 8.149 8.149
pw_transfer 1439 11.6 0.103 0.103 8.006 8.006
fft_wrap_pw1pw2 1201 12.6 0.010 0.010 7.688 7.688
ot_diis_step 108 11.5 0.006 0.006 7.481 7.481
qs_ot_get_derivative 108 11.5 0.002 0.002 7.020 7.020
make_images 4572 14.5 2.552 2.552 6.984 6.984
dbcsr_complete_redistribute 329 12.2 3.186 3.186 6.713 6.713
dbcsr_make_dense_low 5837 15.5 0.101 0.101 6.603 6.603
fft_wrap_pw1pw2_140 487 13.2 0.688 0.688 6.494 6.494
make_dense_data 5837 16.5 5.769 5.769 6.485 6.485
qs_env_update_s_mstruct 11 6.9 0.000 0.000 6.406 6.406
apply_preconditioner_dbcsr 119 12.6 0.000 0.000 6.278 6.278
apply_single 119 13.6 0.000 0.000 6.277 6.277
qs_ks_update_qs_env_forces 11 4.9 0.000 0.000 6.211 6.211
dbcsr_make_images_dense 3978 14.8 0.027 0.027 5.908 5.908
qs_create_task_list 11 7.9 0.000 0.000 5.844 5.844
generate_qs_task_list 11 8.9 4.016 4.016 5.844 5.844
copy_dbcsr_to_fm 153 11.3 0.004 0.004 5.567 5.567
cp_fm_cholesky_invert 11 10.9 5.531 5.531 5.531 5.531
build_core_hamiltonian_matrix_ 11 4.9 0.001 0.001 5.159 5.159
dbcsr_copy 2102 12.0 0.289 0.289 5.099 5.099
multiply_cannon 2286 13.5 0.308 0.308 4.859 4.859
pw_poisson_solve 119 10.3 2.042 2.042 4.774 4.774
dbcsr_copy_into_existing 22 7.9 4.757 4.757 4.757 4.757
density_rs2pw 119 9.7 0.006 0.006 4.750 4.750
transfer_dbcsr_to_fm 11 10.9 0.000 0.000 4.596 4.596
multiply_cannon_loop 2286 14.5 0.064 0.064 4.100 4.100
qs_ot_get_p 119 10.4 0.001 0.001 4.036 4.036
multiply_cannon_multrec 2286 15.5 3.968 3.968 4.035 4.035
build_core_hamiltonian_matrix 11 6.9 0.001 0.001 3.847 3.847
qs_energies_compute_matrix_w 11 5.9 0.000 0.000 3.639 3.639
calculate_w_matrix_ot 11 6.9 0.009 0.009 3.639 3.639
fft3d_s 1202 14.6 3.499 3.499 3.505 3.505
copy_fm_to_dbcsr 176 11.2 0.002 0.002 3.281 3.281
-------------------------------------------------------------------------------
From /workspace/artifacts/H2O-64_32mpi.out:
-------------------------------------------------------------------------------
- - - T I M I N G - - -
-------------------------------------------------------------------------------
SUBROUTINE CALLS ASD SELF TIME TOTAL TIME
MAXIMUM AVERAGE MAXIMUM AVERAGE MAXIMUM
CP2K 1 1.0 0.019 0.022 77.076 77.077
qs_mol_dyn_low 1 2.0 0.004 0.005 76.952 76.959
qs_forces 11 3.9 0.002 0.002 76.898 76.898
qs_energies 11 4.9 0.001 0.002 71.659 71.660
scf_env_do_scf 11 5.9 0.001 0.001 66.160 66.161
scf_env_do_scf_inner_loop 108 6.5 0.003 0.011 61.276 61.276
velocity_verlet 10 3.0 0.002 0.002 45.355 45.357
rebuild_ks_matrix 119 8.3 0.001 0.001 30.268 30.319
qs_ks_build_kohn_sham_matrix 119 9.3 0.022 0.023 30.267 30.318
qs_ks_update_qs_env 119 7.6 0.001 0.001 26.873 26.925
qs_rho_update_rho 119 7.7 0.001 0.001 23.942 23.971
calculate_rho_elec 119 8.7 0.048 0.051 23.941 23.970
sum_up_and_integrate 119 10.3 0.052 0.058 23.781 23.804
integrate_v_rspace 119 11.3 0.005 0.005 23.728 23.751
dbcsr_multiply_generic 2286 12.5 0.127 0.129 18.733 18.819
grid_collocate_task_list 119 9.7 17.183 17.883 17.183 17.883
grid_integrate_task_list 119 12.3 17.146 17.643 17.146 17.643
qs_scf_new_mos 108 7.5 0.001 0.001 15.417 15.481
qs_scf_loop_do_ot 108 8.5 0.001 0.001 15.416 15.480
ot_scf_mini 108 9.5 0.003 0.004 14.466 14.536
multiply_cannon 2286 13.5 0.226 0.238 12.648 12.946
multiply_cannon_loop 2286 14.5 0.225 0.232 11.428 11.726
mp_waitall_1 169478 16.3 9.376 9.624 9.376 9.624
ot_mini 108 10.5 0.001 0.001 8.582 8.653
rs_pw_transfer 974 11.9 0.018 0.019 7.092 7.908
density_rs2pw 119 9.7 0.008 0.009 6.111 6.922
multiply_cannon_metrocomm3 18288 15.5 0.075 0.079 5.992 6.280
pw_transfer 1439 11.6 0.134 0.143 6.190 6.251
fft_wrap_pw1pw2 1201 12.6 0.014 0.015 5.887 5.955
potential_pw2rs 119 12.3 0.010 0.010 5.451 5.460
fft_wrap_pw1pw2_140 487 13.2 0.562 0.587 5.068 5.225
init_scf_loop 11 6.9 0.000 0.000 4.866 4.867
fft3d_ps 1201 14.6 2.416 2.574 4.440 4.513
ot_diis_step 108 11.5 0.005 0.005 4.347 4.348
apply_preconditioner_dbcsr 119 12.6 0.000 0.000 4.271 4.328
apply_single 119 13.6 0.001 0.001 4.271 4.327
multiply_cannon_multrec 18288 15.5 4.177 4.311 4.193 4.327
qs_ot_get_derivative 108 11.5 0.001 0.002 4.187 4.251
make_m2s 4572 13.5 0.075 0.078 4.073 4.140
init_scf_run 11 5.9 0.000 0.002 3.805 3.806
scf_env_initial_rho_setup 11 6.9 0.000 0.001 3.805 3.805
qs_ks_update_qs_env_forces 11 4.9 0.000 0.000 3.618 3.627
make_images 4572 14.5 0.187 0.192 3.383 3.451
wfi_extrapolate 11 7.9 0.001 0.001 3.408 3.408
mp_waitany 9880 13.7 2.453 3.287 2.453 3.287
rs_pw_transfer_RS2PW_140 130 11.5 0.525 0.560 2.173 3.005
rs_pw_transfer_PW2RS_140 130 13.9 1.263 1.325 2.647 2.679
qs_ot_get_p 119 10.4 0.001 0.001 2.025 2.103
mp_alltoall_d11v 2130 13.8 1.450 2.095 1.450 2.095
rs_gather_matrices 119 12.3 0.131 0.144 1.074 1.763
prepare_preconditioner 11 7.9 0.000 0.000 1.700 1.717
make_preconditioner 11 8.9 0.000 0.000 1.700 1.717
make_images_data 4572 15.5 0.058 0.062 1.564 1.682
mp_alltoall_z22v 1201 16.6 1.366 1.587 1.366 1.587
build_core_hamiltonian_matrix_ 11 4.9 0.001 0.001 1.479 1.579
make_full_inverse_cholesky 11 9.9 0.000 0.000 1.535 1.565
-------------------------------------------------------------------------------
Plot: name="H2O-64_timings_32omp", title="Timings of H2O-64 with 32 OpenMP Threads", ylabel="time [s]"
PlotPoint: plot="H2O-64_timings_32omp", name="rest", label="rest", y=72.76599999999999, yerr=0.0
PlotPoint: plot="H2O-64_timings_32omp", name="grid_collocate_task_list", label="grid_collocate_task_list", y=31.838, yerr=0.0
PlotPoint: plot="H2O-64_timings_32omp", name="grid_integrate_task_list", label="grid_integrate_task_list", y=27.044, yerr=0.0
PlotPoint: plot="H2O-64_timings_32omp", name="cp_gemm_cosma", label="cp_gemm_cosma", y=10.705, yerr=0.0
PlotPoint: plot="H2O-64_timings_32omp", name="cp_fm_cholesky_decompose", label="cp_fm_cholesky_decompose", y=8.149, yerr=0.0
PlotPoint: plot="H2O-64_timings_32omp", name="make_dense_data", label="make_dense_data", y=5.769, yerr=0.0
PlotPoint: plot="H2O-64_timings_32omp", name="multiply_cannon_multrec", label="multiply_cannon_multrec", y=3.968, yerr=0.0
PlotPoint: plot="H2O-64_timings_32omp", name="mp_waitany", label="mp_waitany", y=0.0, yerr=0.0
PlotPoint: plot="H2O-64_timings_32omp", name="mp_waitall_1", label="mp_waitall_1", y=0.0, yerr=0.0
Plot: name="H2O-64_timings_32mpi", title="Timings of H2O-64 with 32 MPI Ranks", ylabel="time [s]"
PlotPoint: plot="H2O-64_timings_32mpi", name="rest", label="rest", y=26.740999999999993, yerr=0.0
PlotPoint: plot="H2O-64_timings_32mpi", name="grid_collocate_task_list", label="grid_collocate_task_list", y=17.183, yerr=0.0
PlotPoint: plot="H2O-64_timings_32mpi", name="grid_integrate_task_list", label="grid_integrate_task_list", y=17.146, yerr=0.0
PlotPoint: plot="H2O-64_timings_32mpi", name="cp_gemm_cosma", label="cp_gemm_cosma", y=0.0, yerr=0.0
PlotPoint: plot="H2O-64_timings_32mpi", name="cp_fm_cholesky_decompose", label="cp_fm_cholesky_decompose", y=0.0, yerr=0.0
PlotPoint: plot="H2O-64_timings_32mpi", name="make_dense_data", label="make_dense_data", y=0.0, yerr=0.0
PlotPoint: plot="H2O-64_timings_32mpi", name="multiply_cannon_multrec", label="multiply_cannon_multrec", y=4.177, yerr=0.0
PlotPoint: plot="H2O-64_timings_32mpi", name="mp_waitany", label="mp_waitany", y=2.453, yerr=0.0
PlotPoint: plot="H2O-64_timings_32mpi", name="mp_waitall_1", label="mp_waitall_1", y=9.376, yerr=0.0
Running H2O-64_nonortho.inp with 1 threads and 32 ranks... done.
Running H2O-64_nonortho.inp with 32 threads and 1 ranks... done.
From /workspace/artifacts/H2O-64_nonortho_32omp.out:
-------------------------------------------------------------------------------
- - - T I M I N G - - -
-------------------------------------------------------------------------------
SUBROUTINE CALLS ASD SELF TIME TOTAL TIME
MAXIMUM AVERAGE MAXIMUM AVERAGE MAXIMUM
CP2K 1 1.0 0.034 0.034 201.724 201.724
qs_mol_dyn_low 1 2.0 0.004 0.004 200.874 200.874
qs_forces 11 3.9 0.001 0.001 200.814 200.814
qs_energies 11 4.9 0.001 0.001 187.156 187.156
scf_env_do_scf 11 5.9 0.001 0.001 154.272 154.272
velocity_verlet 10 3.0 0.002 0.002 132.996 132.996
scf_env_do_scf_inner_loop 96 6.5 0.010 0.010 124.240 124.240
rebuild_ks_matrix 107 8.3 0.001 0.001 63.582 63.582
qs_ks_build_kohn_sham_matrix 107 9.3 0.018 0.018 63.582 63.582
qs_rho_update_rho 107 7.7 0.001 0.001 58.203 58.203
calculate_rho_elec 107 8.7 1.407 1.407 58.202 58.202
qs_ks_update_qs_env 107 7.6 0.001 0.001 56.952 56.952
grid_collocate_task_list 107 9.7 52.473 52.473 52.473 52.473
sum_up_and_integrate 107 10.3 0.370 0.370 52.410 52.410
integrate_v_rspace 107 11.3 0.141 0.141 52.039 52.039
grid_integrate_task_list 107 12.3 49.504 49.504 49.504 49.504
init_scf_loop 11 6.9 0.000 0.000 29.822 29.822
prepare_preconditioner 11 7.9 0.000 0.000 22.421 22.421
make_preconditioner 11 8.9 0.000 0.000 22.421 22.421
qs_scf_new_mos 96 7.5 0.001 0.001 21.811 21.811
qs_scf_loop_do_ot 96 8.5 0.001 0.001 21.810 21.810
dbcsr_multiply_generic 1966 12.4 0.170 0.170 20.730 20.730
ot_scf_mini 96 9.5 0.003 0.003 20.512 20.512
make_full_inverse_cholesky 11 9.9 0.000 0.000 20.342 20.342
init_scf_run 11 5.9 0.001 0.001 16.629 16.629
scf_env_initial_rho_setup 11 6.9 0.001 0.001 16.628 16.628
wfi_extrapolate 11 7.9 0.001 0.001 15.531 15.531
ot_mini 96 10.5 0.001 0.001 13.319 13.319
qs_energies_init_hamiltonians 11 5.9 0.000 0.000 12.354 12.354
make_m2s 3932 13.4 0.059 0.059 12.327 12.327
cp_gemm 81 9.0 0.001 0.001 10.877 10.877
cp_gemm_cosma 81 10.0 10.876 10.876 10.876 10.876
qs_ks_update_qs_env_forces 11 4.9 0.000 0.000 8.351 8.351
cp_fm_cholesky_decompose 22 10.9 8.090 8.090 8.090 8.090
qs_env_update_s_mstruct 11 6.9 0.000 0.000 7.820 7.820
pw_transfer 1295 11.6 0.101 0.101 7.387 7.387
qs_create_task_list 11 7.9 0.000 0.000 7.257 7.257
generate_qs_task_list 11 8.9 5.366 5.366 7.257 7.257
fft_wrap_pw1pw2 1081 12.6 0.010 0.010 7.079 7.079
dbcsr_complete_redistribute 317 12.2 3.142 3.142 6.781 6.781
ot_diis_step 96 11.5 0.005 0.005 6.764 6.764
qs_ot_get_derivative 96 11.5 0.001 0.001 6.551 6.551
make_images 3932 14.4 2.321 2.321 6.436 6.436
dbcsr_make_dense_low 4961 15.5 0.096 0.096 6.080 6.080
fft_wrap_pw1pw2_140 439 13.2 0.606 0.606 5.992 5.992
make_dense_data 4961 16.5 5.320 5.320 5.969 5.969
apply_preconditioner_dbcsr 107 12.6 0.000 0.000 5.770 5.770
apply_single 107 13.6 0.001 0.001 5.770 5.770
copy_dbcsr_to_fm 147 11.2 0.004 0.004 5.557 5.557
cp_fm_cholesky_invert 11 10.9 5.511 5.511 5.511 5.511
dbcsr_make_images_dense 3386 14.7 0.023 0.023 5.444 5.444
build_core_hamiltonian_matrix_ 11 4.9 0.001 0.001 5.304 5.304
dbcsr_copy 1855 11.9 0.266 0.266 5.210 5.210
dbcsr_copy_into_existing 22 7.9 4.889 4.889 4.889 4.889
transfer_dbcsr_to_fm 11 10.9 0.000 0.000 4.585 4.585
pw_poisson_solve 107 10.3 1.935 1.935 4.460 4.460
multiply_cannon 1966 13.4 0.295 0.295 4.451 4.451
density_rs2pw 107 9.7 0.006 0.006 4.323 4.323
-------------------------------------------------------------------------------
From /workspace/artifacts/H2O-64_nonortho_32mpi.out:
-------------------------------------------------------------------------------
- - - T I M I N G - - -
-------------------------------------------------------------------------------
SUBROUTINE CALLS ASD SELF TIME TOTAL TIME
MAXIMUM AVERAGE MAXIMUM AVERAGE MAXIMUM
CP2K 1 1.0 0.006 0.013 132.663 132.664
qs_mol_dyn_low 1 2.0 0.004 0.005 132.556 132.562
qs_forces 11 3.9 0.002 0.002 132.501 132.501
qs_energies 11 4.9 0.001 0.002 123.506 123.508
scf_env_do_scf 11 5.9 0.001 0.001 114.946 114.948
scf_env_do_scf_inner_loop 96 6.5 0.003 0.010 106.780 106.780
velocity_verlet 10 3.0 0.002 0.002 79.181 79.183
rebuild_ks_matrix 107 8.3 0.001 0.001 60.425 60.463
qs_ks_build_kohn_sham_matrix 107 9.3 0.020 0.022 60.425 60.462
sum_up_and_integrate 107 10.3 0.048 0.053 54.451 54.490
integrate_v_rspace 107 11.3 0.004 0.005 54.403 54.442
qs_ks_update_qs_env 107 7.6 0.001 0.001 53.234 53.270
qs_rho_update_rho 107 7.7 0.001 0.001 50.967 50.992
calculate_rho_elec 107 8.7 0.043 0.046 50.966 50.991
grid_integrate_task_list 107 12.3 47.318 48.415 47.318 48.415
grid_collocate_task_list 107 9.7 44.219 45.028 44.219 45.028
dbcsr_multiply_generic 1966 12.4 0.112 0.115 17.042 17.185
qs_scf_new_mos 96 7.5 0.001 0.001 13.791 13.823
qs_scf_loop_do_ot 96 8.5 0.001 0.001 13.791 13.822
ot_scf_mini 96 9.5 0.003 0.004 12.912 12.943
multiply_cannon 1966 13.4 0.198 0.206 11.530 11.827
multiply_cannon_loop 1966 14.4 0.199 0.206 10.408 10.578
mp_waitall_1 146670 16.2 8.559 8.839 8.559 8.839
rs_pw_transfer 878 11.9 0.016 0.017 7.237 8.596
init_scf_loop 11 6.9 0.000 0.001 8.150 8.151
ot_mini 96 10.5 0.001 0.001 7.655 7.684
density_rs2pw 107 9.7 0.007 0.008 6.160 7.536
qs_ks_update_qs_env_forces 11 4.9 0.000 0.000 7.399 7.404
init_scf_run 11 5.9 0.000 0.002 6.813 6.814
scf_env_initial_rho_setup 11 6.9 0.000 0.001 6.813 6.813
wfi_extrapolate 11 7.9 0.001 0.001 6.203 6.203
multiply_cannon_metrocomm3 15728 15.4 0.066 0.068 5.465 5.771
pw_transfer 1295 11.6 0.121 0.127 5.659 5.725
fft_wrap_pw1pw2 1081 12.6 0.013 0.016 5.383 5.453
potential_pw2rs 107 12.3 0.009 0.010 5.117 5.125
fft_wrap_pw1pw2_140 439 13.2 0.510 0.531 4.606 4.765
mp_waitany 8968 13.7 2.940 4.323 2.940 4.323
fft3d_ps 1081 14.6 2.172 2.339 4.052 4.112
rs_pw_transfer_RS2PW_140 118 11.5 0.439 0.467 2.579 3.950
apply_preconditioner_dbcsr 107 12.6 0.000 0.000 3.858 3.916
apply_single 107 13.6 0.001 0.001 3.858 3.916
multiply_cannon_multrec 15728 15.4 3.794 3.888 3.809 3.903
ot_diis_step 96 11.5 0.004 0.004 3.869 3.869
qs_ot_get_derivative 96 11.5 0.001 0.001 3.741 3.776
make_m2s 3932 13.4 0.065 0.068 3.659 3.723
mp_alltoall_d11v 1998 13.7 2.262 3.707 2.262 3.707
rs_gather_matrices 107 12.3 0.124 0.132 1.912 3.363
make_images 3932 14.4 0.164 0.168 3.057 3.124
-------------------------------------------------------------------------------
Plot: name="H2O-64_nonortho_timings_32omp", title="Timings of H2O-64_nonortho with 32 OpenMP Threads", ylabel="time [s]"
PlotPoint: plot="H2O-64_nonortho_timings_32omp", name="rest", label="rest", y=75.26999999999998, yerr=0.0
PlotPoint: plot="H2O-64_nonortho_timings_32omp", name="grid_collocate_task_list", label="grid_collocate_task_list", y=52.473, yerr=0.0
PlotPoint: plot="H2O-64_nonortho_timings_32omp", name="grid_integrate_task_list", label="grid_integrate_task_list", y=49.504, yerr=0.0
PlotPoint: plot="H2O-64_nonortho_timings_32omp", name="cp_gemm_cosma", label="cp_gemm_cosma", y=10.876, yerr=0.0
PlotPoint: plot="H2O-64_nonortho_timings_32omp", name="cp_fm_cholesky_decompose", label="cp_fm_cholesky_decompose", y=8.09, yerr=0.0
PlotPoint: plot="H2O-64_nonortho_timings_32omp", name="cp_fm_cholesky_invert", label="cp_fm_cholesky_invert", y=5.511, yerr=0.0
PlotPoint: plot="H2O-64_nonortho_timings_32omp", name="multiply_cannon_multrec", label="multiply_cannon_multrec", y=0.0, yerr=0.0
PlotPoint: plot="H2O-64_nonortho_timings_32omp", name="mp_waitany", label="mp_waitany", y=0.0, yerr=0.0
PlotPoint: plot="H2O-64_nonortho_timings_32omp", name="mp_waitall_1", label="mp_waitall_1", y=0.0, yerr=0.0
Plot: name="H2O-64_nonortho_timings_32mpi", title="Timings of H2O-64_nonortho with 32 MPI Ranks", ylabel="time [s]"
PlotPoint: plot="H2O-64_nonortho_timings_32mpi", name="rest", label="rest", y=25.833000000000013, yerr=0.0
PlotPoint: plot="H2O-64_nonortho_timings_32mpi", name="grid_collocate_task_list", label="grid_collocate_task_list", y=44.219, yerr=0.0
PlotPoint: plot="H2O-64_nonortho_timings_32mpi", name="grid_integrate_task_list", label="grid_integrate_task_list", y=47.318, yerr=0.0
PlotPoint: plot="H2O-64_nonortho_timings_32mpi", name="cp_gemm_cosma", label="cp_gemm_cosma", y=0.0, yerr=0.0
PlotPoint: plot="H2O-64_nonortho_timings_32mpi", name="cp_fm_cholesky_decompose", label="cp_fm_cholesky_decompose", y=0.0, yerr=0.0
PlotPoint: plot="H2O-64_nonortho_timings_32mpi", name="cp_fm_cholesky_invert", label="cp_fm_cholesky_invert", y=0.0, yerr=0.0
PlotPoint: plot="H2O-64_nonortho_timings_32mpi", name="multiply_cannon_multrec", label="multiply_cannon_multrec", y=3.794, yerr=0.0
PlotPoint: plot="H2O-64_nonortho_timings_32mpi", name="mp_waitany", label="mp_waitany", y=2.94, yerr=0.0
PlotPoint: plot="H2O-64_nonortho_timings_32mpi", name="mp_waitall_1", label="mp_waitall_1", y=8.559, yerr=0.0
Running H2O-hyb.inp with 1 threads and 32 ranks... done.
Running H2O-hyb.inp with 32 threads and 1 ranks... done.
From /workspace/artifacts/H2O-hyb_32omp.out:
-------------------------------------------------------------------------------
- - - T I M I N G - - -
-------------------------------------------------------------------------------
SUBROUTINE CALLS ASD SELF TIME TOTAL TIME
MAXIMUM AVERAGE MAXIMUM AVERAGE MAXIMUM
CP2K 1 1.0 0.365 0.365 257.950 257.950
qs_energies 1 2.0 0.000 0.000 256.714 256.714
scf_env_do_scf 1 3.0 0.000 0.000 253.968 253.968
qs_ks_update_qs_env 8 5.0 0.000 0.000 244.864 244.864
rebuild_ks_matrix 7 6.0 0.000 0.000 244.758 244.758
qs_ks_build_kohn_sham_matrix 7 7.0 0.002 0.002 244.758 244.758
hfx_ks_matrix 7 8.0 0.000 0.000 178.779 178.779
integrate_four_center 7 9.0 9.224 9.224 178.752 178.752
integrate_four_center_main 7 10.0 1.246 1.246 159.928 159.928
integrate_four_center_bin 447 11.0 158.682 158.682 158.682 158.682
scf_env_do_scf_inner_loop 7 4.0 0.001 0.001 146.798 146.798
init_scf_loop 1 4.0 0.000 0.000 107.152 107.152
cp_gemm 129 10.3 0.001 0.001 51.298 51.298
cp_gemm_cosma 129 11.3 51.296 51.296 51.296 51.296
admm_mo_calc_rho_aux 7 8.0 0.000 0.000 30.023 30.023
admm_fit_mo_coeffs 7 9.0 0.000 0.000 27.377 27.377
admm_mo_merge_derivs 7 8.0 0.000 0.000 25.640 25.640
merge_mo_derivs_diag 7 9.0 0.024 0.024 25.640 25.640
purify_mo_diag 7 10.0 0.001 0.001 14.331 14.331
fit_mo_coeffs 7 10.0 0.000 0.000 13.046 13.046
integrate_four_center_load 7 10.0 0.001 0.001 9.065 9.065
hfx_load_balance 1 11.0 0.002 0.002 9.064 9.064
calculate_rho_elec 15 7.4 0.193 0.193 6.199 6.199
grid_collocate_task_list 15 8.4 5.438 5.438 5.438 5.438
-------------------------------------------------------------------------------
From /workspace/artifacts/H2O-hyb_32mpi.out:
-------------------------------------------------------------------------------
- - - T I M I N G - - -
-------------------------------------------------------------------------------
SUBROUTINE CALLS ASD SELF TIME TOTAL TIME
MAXIMUM AVERAGE MAXIMUM AVERAGE MAXIMUM
CP2K 1 1.0 0.014 0.017 185.435 185.436
qs_energies 1 2.0 0.000 0.000 185.295 185.296
scf_env_do_scf 1 3.0 0.000 0.000 184.703 184.703
qs_ks_update_qs_env 8 5.0 0.000 0.000 181.507 181.508
rebuild_ks_matrix 7 6.0 0.000 0.000 181.494 181.494
qs_ks_build_kohn_sham_matrix 7 7.0 0.001 0.002 181.494 181.494
hfx_ks_matrix 7 8.0 0.000 0.001 173.231 173.236
integrate_four_center 7 9.0 0.177 0.524 173.218 173.221
integrate_four_center_main 7 10.0 0.004 0.005 158.727 162.932
integrate_four_center_bin 448 11.0 158.723 162.927 158.723 162.927
scf_env_do_scf_inner_loop 7 4.0 0.000 0.001 108.047 108.048
init_scf_loop 1 4.0 0.000 0.000 76.654 76.654
integrate_four_center_load 7 10.0 0.000 0.000 9.300 9.305
hfx_load_balance 1 11.0 0.001 0.002 9.300 9.305
mp_sync 70 11.3 4.222 7.221 4.222 7.221
hfx_load_balance_count 1 12.0 4.483 4.655 4.483 4.655
hfx_load_balance_bin 1 12.0 4.461 4.643 4.461 4.643
qs_vxc_create 14 8.0 0.000 0.000 3.763 3.763
xc_vxc_pw_create 14 9.0 0.021 0.023 3.762 3.763
-------------------------------------------------------------------------------
Plot: name="H2O-hyb_timings_32omp", title="Timings of H2O-hyb with 32 OpenMP Threads", ylabel="time [s]"
PlotPoint: plot="H2O-hyb_timings_32omp", name="rest", label="rest", y=32.06400000000002, yerr=0.0
PlotPoint: plot="H2O-hyb_timings_32omp", name="integrate_four_center_bin", label="integrate_four_center_bin", y=158.682, yerr=0.0
PlotPoint: plot="H2O-hyb_timings_32omp", name="cp_gemm_cosma", label="cp_gemm_cosma", y=51.296, yerr=0.0
PlotPoint: plot="H2O-hyb_timings_32omp", name="integrate_four_center", label="integrate_four_center", y=9.224, yerr=0.0
PlotPoint: plot="H2O-hyb_timings_32omp", name="grid_collocate_task_list", label="grid_collocate_task_list", y=5.438, yerr=0.0
PlotPoint: plot="H2O-hyb_timings_32omp", name="integrate_four_center_main", label="integrate_four_center_main", y=1.246, yerr=0.0
PlotPoint: plot="H2O-hyb_timings_32omp", name="mp_sync", label="mp_sync", y=0.0, yerr=0.0
PlotPoint: plot="H2O-hyb_timings_32omp", name="hfx_load_balance_count", label="hfx_load_balance_count", y=0.0, yerr=0.0
PlotPoint: plot="H2O-hyb_timings_32omp", name="hfx_load_balance_bin", label="hfx_load_balance_bin", y=0.0, yerr=0.0
Plot: name="H2O-hyb_timings_32mpi", title="Timings of H2O-hyb with 32 MPI Ranks", ylabel="time [s]"
PlotPoint: plot="H2O-hyb_timings_32mpi", name="rest", label="rest", y=13.36499999999998, yerr=0.0
PlotPoint: plot="H2O-hyb_timings_32mpi", name="integrate_four_center_bin", label="integrate_four_center_bin", y=158.723, yerr=0.0
PlotPoint: plot="H2O-hyb_timings_32mpi", name="cp_gemm_cosma", label="cp_gemm_cosma", y=0.0, yerr=0.0
PlotPoint: plot="H2O-hyb_timings_32mpi", name="integrate_four_center", label="integrate_four_center", y=0.177, yerr=0.0
PlotPoint: plot="H2O-hyb_timings_32mpi", name="grid_collocate_task_list", label="grid_collocate_task_list", y=0.0, yerr=0.0
PlotPoint: plot="H2O-hyb_timings_32mpi", name="integrate_four_center_main", label="integrate_four_center_main", y=0.004, yerr=0.0
PlotPoint: plot="H2O-hyb_timings_32mpi", name="mp_sync", label="mp_sync", y=4.222, yerr=0.0
PlotPoint: plot="H2O-hyb_timings_32mpi", name="hfx_load_balance_count", label="hfx_load_balance_count", y=4.483, yerr=0.0
PlotPoint: plot="H2O-hyb_timings_32mpi", name="hfx_load_balance_bin", label="hfx_load_balance_bin", y=4.461, yerr=0.0
Running GW_PBE_4benzene.inp with 1 threads and 32 ranks... done.
Running GW_PBE_4benzene.inp with 32 threads and 1 ranks... done.
From /workspace/artifacts/GW_PBE_4benzene_32omp.out:
-------------------------------------------------------------------------------
- - - T I M I N G - - -
-------------------------------------------------------------------------------
SUBROUTINE CALLS ASD SELF TIME TOTAL TIME
MAXIMUM AVERAGE MAXIMUM AVERAGE MAXIMUM
CP2K 1 1.0 0.019 0.019 359.429 359.429
qs_energies 1 2.0 0.000 0.000 358.896 358.896
mp2_main 1 3.0 0.000 0.000 354.084 354.084
mp2_gpw_main 1 4.0 0.000 0.000 353.882 353.882
rpa_ri_compute_en 1 5.0 0.000 0.000 332.661 332.661
rpa_num_int 1 6.0 0.000 0.000 332.635 332.635
compute_mat_P_omega 1 7.0 0.002 0.002 199.558 199.558
compute_mat_P_omega_contract 10 8.0 13.291 13.291 198.300 198.300
dbcsr_t_total 2336 9.6 0.016 0.016 186.798 186.798
dbcsr_t_contract 787 11.0 49.544 49.544 110.371 110.371
cp_gemm 105 8.4 0.001 0.001 109.689 109.689
cp_gemm_cosma 105 9.4 109.687 109.687 109.687 109.687
dbcsr_t_copy 1103 10.7 21.250 21.250 74.901 74.901
compute_mat_P_omega_calc_M_occ 250 9.0 13.334 13.334 73.663 73.663
GW_matrix_operations 10 7.0 0.007 0.007 73.502 73.502
dbcsr_tas_total 1149 12.2 0.051 0.051 54.615 54.615
dbcsr_tas_multiply 807 12.1 0.002 0.002 53.112 53.112
compute_mat_P_omega_calc_M_vir 250 9.0 0.001 0.001 45.015 45.015
dbcsr_multiply_generic 837 15.8 0.145 0.145 39.524 39.524
dbcsr_tas_dbcsr 807 14.1 0.003 0.003 39.240 39.240
rpa_num_int_RPA_matrix_operati 10 7.0 0.000 0.000 37.868 37.868
contract_P_omega_with_mat_L 10 8.0 0.000 0.000 35.765 35.765
dbcsr_tas_reserve_blocks_index 3261 13.7 7.588 7.588 29.416 29.416
dbcsr_tas_mm_1N 524 15.1 0.002 0.002 27.286 27.286
multiply_cannon 837 16.8 0.417 0.417 25.871 25.871
dbcsr_tas_copy 574 11.4 17.613 17.613 25.662 25.662
dbcsr_t_reserve_blocks_index 2280 12.5 1.172 1.172 22.549 22.549
multiply_cannon_loop 837 17.8 0.143 0.143 22.476 22.476
dbcsr_reserve_blocks 3717 14.7 21.017 21.017 21.469 21.469
multiply_cannon_multrec 837 18.8 20.470 20.470 21.277 21.277
mp2_ri_gpw_compute_in 1 5.0 0.000 0.000 21.205 21.205
dbcsr_t_reserve_blocks_index_a 2222 11.6 0.010 0.010 21.202 21.202
compute_mat_P_omega_copy_M_occ 250 9.0 0.001 0.001 20.364 20.364
compute_QP_energies 1 7.0 0.000 0.000 19.693 19.693
compute_self_energy_cubic_gw 1 8.0 0.106 0.106 19.692 19.692
compute_mat_P_omega_copy_M_vir 250 9.0 0.002 0.002 14.915 14.915
dbcsr_t_copy_nocomm 251 12.0 11.753 11.753 14.243 14.243
compute_mat_P_omega_calc_P_t 250 9.0 0.001 0.001 12.417 12.417
make_m2s 1674 16.8 0.110 0.110 11.044 11.044
dbcsr_tas_mm_2 251 15.0 0.001 0.001 10.592 10.592
make_images 1674 17.8 5.145 5.145 10.532 10.532
dbcsr_finalize 9888 13.6 1.608 1.608 8.160 8.160
contract_cubic_gw 21 9.0 0.000 0.000 7.917 7.917
build_3c_integrals 5 6.0 3.661 3.661 7.847 7.847
mp2_ri_gpw_compute_in_copy_3c 6 6.0 0.723 0.723 7.780 7.780
-------------------------------------------------------------------------------
From /workspace/artifacts/GW_PBE_4benzene_32mpi.out:
-------------------------------------------------------------------------------
- - - T I M I N G - - -
-------------------------------------------------------------------------------
SUBROUTINE CALLS ASD SELF TIME TOTAL TIME
MAXIMUM AVERAGE MAXIMUM AVERAGE MAXIMUM
CP2K 1 1.0 0.009 0.011 48.260 48.261
qs_energies 1 2.0 0.000 0.000 48.132 48.136
mp2_main 1 3.0 0.000 0.000 46.531 46.536
mp2_gpw_main 1 4.0 0.000 0.000 46.466 46.470
rpa_ri_compute_en 1 5.0 0.000 0.000 44.612 44.617
rpa_num_int 1 6.0 0.000 0.000 44.604 44.609
dbcsr_t_total 2336 9.6 0.016 0.017 40.105 40.108
compute_mat_P_omega 1 7.0 0.001 0.002 39.075 39.083
compute_mat_P_omega_contract 10 8.0 0.706 0.731 38.934 38.940
dbcsr_t_contract 787 11.0 1.857 2.054 29.647 29.650
dbcsr_tas_total 1149 12.2 0.064 0.067 25.987 25.988
dbcsr_tas_multiply 807 12.1 0.003 0.003 25.859 25.862
dbcsr_tas_dbcsr 807 14.1 0.003 0.004 18.884 18.886
dbcsr_multiply_generic 837 15.8 0.069 0.073 15.382 16.446
compute_mat_P_omega_calc_M_occ 250 9.0 0.688 0.717 12.991 12.991
multiply_cannon 837 16.8 0.131 0.142 9.334 9.966
compute_mat_P_omega_calc_P_t 250 9.0 0.001 0.001 9.674 9.674
dbcsr_t_copy 1111 10.7 4.049 4.346 8.761 9.142
multiply_cannon_loop 837 17.8 0.039 0.041 8.505 9.112
dbcsr_tas_mm_1N 524 15.1 0.002 0.003 7.990 8.954
compute_mat_P_omega_calc_M_vir 250 9.0 0.001 0.002 8.281 8.282
mp_sync 8696 11.6 6.825 7.841 6.825 7.841
multiply_cannon_multrec 1386 17.8 6.650 7.274 6.886 7.499
dbcsr_tas_mm_2 251 15.0 0.002 0.002 7.346 7.347
make_m2s 1674 16.8 0.042 0.045 5.085 5.666
make_images 1674 17.8 0.251 0.274 5.006 5.585
compute_QP_energies 1 7.0 0.000 0.000 3.869 3.869
compute_self_energy_cubic_gw 1 8.0 0.005 0.006 3.867 3.869
dbcsr_t_communicate_buffer 1098 11.7 0.082 0.088 3.123 3.289
mp_waitall_2 3776 14.7 2.967 3.209 2.967 3.209
contract_cubic_gw 21 9.0 0.000 0.000 3.009 3.010
make_images_data 1674 18.8 0.034 0.036 2.700 2.850
dbcsr_t_reserve_blocks_index 2849 12.4 0.097 0.106 2.330 2.755
hybrid_alltoall_any 1724 19.5 2.044 2.326 2.579 2.741
dbcsr_t_reserve_blocks_index_a 2791 11.4 0.014 0.016 2.288 2.716
dbcsr_tas_reserve_blocks_index 3300 13.8 0.276 0.305 2.285 2.698
dbcsr_reserve_blocks 3785 14.7 1.993 2.370 2.032 2.413
make_images_pack 1674 18.8 1.854 2.397 1.866 2.410
mp_waitall_1 26582 19.0 1.540 1.916 1.540 1.916
mp2_ri_gpw_compute_in 1 5.0 0.000 0.000 1.851 1.851
convert_to_new_pgrid 2421 14.1 0.016 0.018 1.590 1.753
dbcsr_copy 3323 15.8 1.527 1.691 1.555 1.720
compute_mat_P_omega_copy_M_vir 250 9.0 0.002 0.002 1.665 1.672
scf_env_do_scf 1 3.0 0.000 0.000 1.541 1.541
scf_env_do_scf_inner_loop 17 4.0 0.001 0.002 1.541 1.541
compute_mat_P_omega_copy_M_occ 250 9.0 0.001 0.002 1.503 1.509
dbcsr_add_anytype 909 13.7 0.842 0.891 1.329 1.384
mp_max_i 2054 9.6 1.056 1.332 1.056 1.332
dbcsr_tas_replicate 396 14.1 0.773 0.854 1.264 1.326
mp_sum_l 9176 14.6 0.941 1.088 0.941 1.088
dbcsr_finalize 10566 13.5 0.039 0.041 0.953 0.983
-------------------------------------------------------------------------------
Plot: name="GW_PBE_4benzene_timings_32omp", title="Timings of GW_PBE_4benzene with 32 OpenMP Threads", ylabel="time [s]"
PlotPoint: plot="GW_PBE_4benzene_timings_32omp", name="rest", label="rest", y=137.46099999999998, yerr=0.0
PlotPoint: plot="GW_PBE_4benzene_timings_32omp", name="cp_gemm_cosma", label="cp_gemm_cosma", y=109.687, yerr=0.0
PlotPoint: plot="GW_PBE_4benzene_timings_32omp", name="dbcsr_t_contract", label="dbcsr_t_contract", y=49.544, yerr=0.0
PlotPoint: plot="GW_PBE_4benzene_timings_32omp", name="dbcsr_t_copy", label="dbcsr_t_copy", y=21.25, yerr=0.0
PlotPoint: plot="GW_PBE_4benzene_timings_32omp", name="dbcsr_reserve_blocks", label="dbcsr_reserve_blocks", y=21.017, yerr=0.0
PlotPoint: plot="GW_PBE_4benzene_timings_32omp", name="multiply_cannon_multrec", label="multiply_cannon_multrec", y=20.47, yerr=0.0
PlotPoint: plot="GW_PBE_4benzene_timings_32omp", name="hybrid_alltoall_any", label="hybrid_alltoall_any", y=0.0, yerr=0.0
PlotPoint: plot="GW_PBE_4benzene_timings_32omp", name="mp_waitall_2", label="mp_waitall_2", y=0.0, yerr=0.0
PlotPoint: plot="GW_PBE_4benzene_timings_32omp", name="mp_sync", label="mp_sync", y=0.0, yerr=0.0
Plot: name="GW_PBE_4benzene_timings_32mpi", title="Timings of GW_PBE_4benzene with 32 MPI Ranks", ylabel="time [s]"
PlotPoint: plot="GW_PBE_4benzene_timings_32mpi", name="rest", label="rest", y=21.875, yerr=0.0
PlotPoint: plot="GW_PBE_4benzene_timings_32mpi", name="cp_gemm_cosma", label="cp_gemm_cosma", y=0.0, yerr=0.0
PlotPoint: plot="GW_PBE_4benzene_timings_32mpi", name="dbcsr_t_contract", label="dbcsr_t_contract", y=1.857, yerr=0.0
PlotPoint: plot="GW_PBE_4benzene_timings_32mpi", name="dbcsr_t_copy", label="dbcsr_t_copy", y=4.049, yerr=0.0
PlotPoint: plot="GW_PBE_4benzene_timings_32mpi", name="dbcsr_reserve_blocks", label="dbcsr_reserve_blocks", y=1.993, yerr=0.0
PlotPoint: plot="GW_PBE_4benzene_timings_32mpi", name="multiply_cannon_multrec", label="multiply_cannon_multrec", y=6.65, yerr=0.0
PlotPoint: plot="GW_PBE_4benzene_timings_32mpi", name="hybrid_alltoall_any", label="hybrid_alltoall_any", y=2.044, yerr=0.0
PlotPoint: plot="GW_PBE_4benzene_timings_32mpi", name="mp_waitall_2", label="mp_waitall_2", y=2.967, yerr=0.0
PlotPoint: plot="GW_PBE_4benzene_timings_32mpi", name="mp_sync", label="mp_sync", y=6.825, yerr=0.0
Running bench_dftb.inp with 1 threads and 32 ranks... done.
Running bench_dftb.inp with 32 threads and 1 ranks... done.
From /workspace/artifacts/bench_dftb_32omp.out:
-------------------------------------------------------------------------------
-                                                                             -
-                                T I M I N G                                  -
-                                                                             -
-------------------------------------------------------------------------------
SUBROUTINE                       CALLS  ASD         SELF TIME        TOTAL TIME
                               MAXIMUM       AVERAGE  MAXIMUM  AVERAGE  MAXIMUM
CP2K                                 1  1.0    0.110    0.110  287.333  287.333
qs_energies                          1  2.0    0.000    0.000  287.144  287.144
ls_scf                               1  3.0    0.000    0.000  285.248  285.248
ls_scf_main                          1  4.0    0.002    0.002  272.306  272.306
density_matrix_trs4                 11  5.0    0.012    0.012  138.405  138.405
ls_scf_dm_to_ks                     11  5.0    0.000    0.000  126.601  126.601
matrix_ls_to_qs                     11  6.0    0.000    0.000  121.828  121.828
dbcsr_multiply_generic             185  6.1    0.509    0.509   91.328   91.328
dbcsr_copy_into_existing            11  7.0   66.987   66.987   66.988   66.988
dbcsr_complete_redistribute         23  7.5   42.948   42.948   60.031   60.031
multiply_cannon                    185  7.1    0.458    0.458   56.500   56.500
matrix_decluster                    11  7.0    0.000    0.000   54.839   54.839
multiply_cannon_loop               185  8.1    0.407    0.407   36.963   36.963
multiply_cannon_multrec            185  9.1   34.944   34.944   34.994   34.994
make_m2s                           370  7.1    0.037    0.037   29.057   29.057
make_images                        370  8.1    7.245    7.245   26.562   26.562
arnoldi_extremal                    12  6.1    0.000    0.000   24.851   24.851
arnoldi_normal_ev                   12  7.1    0.028    0.028   24.851   24.851
build_subspace                      23  8.1    0.138    0.138   24.224   24.224
dbcsr_matrix_vector_mult           652  9.0    0.218    0.218   23.136   23.136
dbcsr_finalize                     646  7.5    0.221    0.221   22.464   22.464
dbcsr_matrix_vector_mult_local     652 10.0   21.825   21.825   21.843   21.843
dbcsr_merge_all                    597  8.5    3.822    3.822   20.746   20.746
setup_rec_index_2d                 370  8.1   18.929   18.929   18.929   18.929
dbcsr_sort_indices                1103  9.9   15.511   15.511   15.511   15.511
tree_to_linear_d                   110  9.4   14.678   14.678   14.678   14.678
quick_finalize                     395 10.0    0.537    0.537   13.287   13.287
dbcsr_special_finalize             370  9.1    0.003    0.003   12.222   12.222
ls_scf_init_scf                      1  4.0    0.000    0.000   12.019   12.019
ls_scf_init_matrix_S                 1  5.0    0.000    0.000   11.550   11.550
matrix_sqrt_Newton_Schulz            1  6.0    0.001    0.001   10.671   10.671
dbcsr_dot_sd                       144  6.3    9.964    9.964    9.965    9.965
dbcsr_frobenius_norm               142  6.1    8.369    8.369    8.371    8.371
matrix_qs_to_ls                     12  5.1    0.000    0.000    7.613    7.613
matrix_cluster                      12  6.1    0.000    0.000    7.613    7.613
make_images_data                   370  9.1    0.011    0.011    6.919    6.919
dbcsr_new_transposed                 2  7.0    0.145    0.145    6.526    6.526
dbcsr_redistribute                   2  8.0    6.279    6.279    6.348    6.348
dbcsr_add_d                        280  6.0    0.001    0.001    5.807    5.807
dbcsr_add_anytype                  280  7.0    1.459    1.459    5.806    5.806
-------------------------------------------------------------------------------
From /workspace/artifacts/bench_dftb_32mpi.out:
-------------------------------------------------------------------------------
-                                                                             -
-                                T I M I N G                                  -
-                                                                             -
-------------------------------------------------------------------------------
SUBROUTINE                       CALLS  ASD         SELF TIME        TOTAL TIME
                               MAXIMUM       AVERAGE  MAXIMUM  AVERAGE  MAXIMUM
CP2K                                 1  1.0    0.016    0.018   88.310   88.311
qs_energies                          1  2.0    0.000    0.000   88.194   88.194
ls_scf                               1  3.0    0.000    0.000   88.109   88.110
ls_scf_main                          1  4.0    0.000    0.002   84.458   84.458
density_matrix_trs4                 11  5.0    0.009    0.014   80.763   80.826
dbcsr_multiply_generic             185  6.1    0.072    0.085   75.375   75.557
multiply_cannon                    185  7.1    0.044    0.047   63.091   64.344
multiply_cannon_loop               185  8.1    0.196    0.204   59.554   61.454
multiply_cannon_multrec           1480  9.1   40.003   42.726   40.459   43.188
mp_waitall_1                     11936 10.3   17.547   19.912   17.547   19.912
multiply_cannon_metrocomm3        1480  9.1    0.018    0.020   10.370   14.311
make_m2s                           370  7.1    0.033    0.037    8.468    8.576
make_images                        370  8.1    0.732    0.773    8.348    8.461
multiply_cannon_metrocomm1        1480  9.1    0.008    0.010    4.097    5.296
calculate_norms                   2960  9.1    4.366    4.524    4.366    4.524
arnoldi_extremal                    12  6.1    0.000    0.001    4.341    4.355
arnoldi_normal_ev                   12  7.1    0.002    0.008    4.340    4.354
build_subspace                      23  8.1    0.038    0.053    4.194    4.198
make_images_data                   370  9.1    0.012    0.013    3.406    3.762
dbcsr_matrix_vector_mult           652  9.0    0.019    0.080    3.462    3.622
mp_sum_l                          1039  5.9    2.614    3.448    2.614    3.448
ls_scf_dm_to_ks                     11  5.0    0.000    0.000    3.178    3.249
hybrid_alltoall_any                393  9.9    0.259    1.339    2.757    2.949
dbcsr_complete_redistribute         23  7.5    1.848    1.947    2.805    2.930
matrix_ls_to_qs                     11  6.0    0.000    0.000    2.771    2.897
dbcsr_matrix_vector_mult_local     652 10.0    2.649    2.854    2.653    2.858
ls_scf_init_scf                      1  4.0    0.000    0.000    2.854    2.855
ls_scf_init_matrix_S                 1  5.0    0.000    0.000    2.815    2.824
matrix_decluster                    11  7.0    0.000    0.000    2.504    2.632
matrix_sqrt_Newton_Schulz            1  6.0    0.001    0.001    2.578    2.580
dbcsr_multiply_generic_mpsum_f     137  7.1    0.000    0.001    1.803    2.533
make_images_pack                   370  9.1    2.193    2.475    2.197    2.480
dbcsr_add_d                        280  6.0    0.001    0.001    1.939    2.018
dbcsr_add_anytype                  280  7.0    1.037    1.114    1.938    2.017
buffer_matrices_ensure_size        370  8.1    1.838    1.946    1.838    1.946
dbcsr_finalize                     646  7.5    0.014    0.014    1.793    1.866
-------------------------------------------------------------------------------
Plot: name="bench_dftb_timings_32omp", title="Timings of bench_dftb with 32 OpenMP Threads", ylabel="time [s]"
PlotPoint: plot="bench_dftb_timings_32omp", name="rest", label="rest", y=101.70000000000002, yerr=0.0
PlotPoint: plot="bench_dftb_timings_32omp", name="dbcsr_copy_into_existing", label="dbcsr_copy_into_existing", y=66.987, yerr=0.0
PlotPoint: plot="bench_dftb_timings_32omp", name="dbcsr_complete_redistribute", label="dbcsr_complete_redistribute", y=42.948, yerr=0.0
PlotPoint: plot="bench_dftb_timings_32omp", name="multiply_cannon_multrec", label="multiply_cannon_multrec", y=34.944, yerr=0.0
PlotPoint: plot="bench_dftb_timings_32omp", name="dbcsr_matrix_vector_mult_local", label="dbcsr_matrix_vector_mult_local", y=21.825, yerr=0.0
PlotPoint: plot="bench_dftb_timings_32omp", name="setup_rec_index_2d", label="setup_rec_index_2d", y=18.929, yerr=0.0
PlotPoint: plot="bench_dftb_timings_32omp", name="calculate_norms", label="calculate_norms", y=0.0, yerr=0.0
PlotPoint: plot="bench_dftb_timings_32omp", name="mp_sum_l", label="mp_sum_l", y=0.0, yerr=0.0
PlotPoint: plot="bench_dftb_timings_32omp", name="mp_waitall_1", label="mp_waitall_1", y=0.0, yerr=0.0
Plot: name="bench_dftb_timings_32mpi", title="Timings of bench_dftb with 32 MPI Ranks", ylabel="time [s]"
PlotPoint: plot="bench_dftb_timings_32mpi", name="rest", label="rest", y=19.283, yerr=0.0
PlotPoint: plot="bench_dftb_timings_32mpi", name="dbcsr_copy_into_existing", label="dbcsr_copy_into_existing", y=0.0, yerr=0.0
PlotPoint: plot="bench_dftb_timings_32mpi", name="dbcsr_complete_redistribute", label="dbcsr_complete_redistribute", y=1.848, yerr=0.0
PlotPoint: plot="bench_dftb_timings_32mpi", name="multiply_cannon_multrec", label="multiply_cannon_multrec", y=40.003, yerr=0.0
PlotPoint: plot="bench_dftb_timings_32mpi", name="dbcsr_matrix_vector_mult_local", label="dbcsr_matrix_vector_mult_local", y=2.649, yerr=0.0
PlotPoint: plot="bench_dftb_timings_32mpi", name="setup_rec_index_2d", label="setup_rec_index_2d", y=0.0, yerr=0.0
PlotPoint: plot="bench_dftb_timings_32mpi", name="calculate_norms", label="calculate_norms", y=4.366, yerr=0.0
PlotPoint: plot="bench_dftb_timings_32mpi", name="mp_sum_l", label="mp_sum_l", y=2.614, yerr=0.0
PlotPoint: plot="bench_dftb_timings_32mpi", name="mp_waitall_1", label="mp_waitall_1", y=17.547, yerr=0.0
Running dbcsr.inp with 1 threads and 32 ranks... done.
Running dbcsr.inp with 32 threads and 1 ranks... done.
From /workspace/artifacts/dbcsr_32omp.out:
-------------------------------------------------------------------------------
-                                                                             -
-                                T I M I N G                                  -
-                                                                             -
-------------------------------------------------------------------------------
SUBROUTINE                       CALLS  ASD         SELF TIME        TOTAL TIME
                               MAXIMUM       AVERAGE  MAXIMUM  AVERAGE  MAXIMUM
CP2K                                 1  1.0    0.006    0.006  107.787  107.787
lib_test                             1  2.0    0.000    0.000  107.780  107.780
dbcsr_run_tests                      3  3.0    0.003    0.003  107.780  107.780
test_multiplies_multiproc            3  4.0    0.001    0.001   86.276   86.276
dbcsr_redistribute                   9  5.0   57.970   57.970   61.712   61.712
dbcsr_multiply_generic               9  5.0    0.001    0.001   22.762   22.762
dbcsr_make_random_matrix             9  4.0   15.828   15.828   21.421   21.421
multiply_cannon                      9  6.0    0.004    0.004   16.198   16.198
multiply_cannon_loop                 9  7.0    0.005    0.005   15.628   15.628
multiply_cannon_multrec              9  8.0   15.623   15.623   15.624   15.624
dbcsr_finalize                      27  5.7    0.004    0.004    9.540    9.540
dbcsr_merge_all                     18  6.5    3.424    3.424    8.860    8.860
mp_alltoall_d11v                    27  6.0    3.420    3.420    3.420    3.420
tree_to_linear_d                     9  7.0    3.410    3.410    3.410    3.410
dbcsr_data_release                 975  7.6    2.200    2.200    2.200    2.200
-------------------------------------------------------------------------------
From /workspace/artifacts/dbcsr_32mpi.out:
-------------------------------------------------------------------------------
-                                                                             -
-                                T I M I N G                                  -
-                                                                             -
-------------------------------------------------------------------------------
SUBROUTINE                       CALLS  ASD         SELF TIME        TOTAL TIME
                               MAXIMUM       AVERAGE  MAXIMUM  AVERAGE  MAXIMUM
CP2K                                 1  1.0    0.003    0.005   23.419   23.419
lib_test                             1  2.0    0.000    0.000   23.391   23.408
dbcsr_run_tests                      3  3.0    0.000    0.001   23.390   23.406
test_multiplies_multiproc            3  4.0    0.001    0.002   22.262   22.326
dbcsr_multiply_generic               9  5.0    0.001    0.002   20.576   20.656
multiply_cannon                      9  6.0    0.002    0.003   18.503   19.016
multiply_cannon_loop                 9  7.0    0.003    0.004   18.121   18.580
multiply_cannon_multrec             72  8.0   15.296   16.074   15.297   16.075
mp_waitall_1                       576  9.2    3.200    3.888    3.200    3.888
multiply_cannon_metrocomm1          72  8.0    0.001    0.001    2.548    3.423
dbcsr_make_random_matrix             9  4.0    0.889    0.901    1.096    1.127
mp_sum_l                           310  2.7    0.532    1.000    0.532    1.000
dbcsr_multiply_generic_mpsum_f       9  6.0    0.000    0.000    0.526    0.994
make_m2s                            18  6.0    0.001    0.001    0.850    0.922
make_images                         18  7.0    0.027    0.030    0.847    0.919
dbcsr_finalize                      27  5.7    0.000    0.001    0.773    0.849
multiply_cannon_metrocomm3          72  8.0    0.000    0.000    0.267    0.798
dbcsr_merge_all                     18  6.5    0.129    0.142    0.705    0.763
dbcsr_redistribute                   9  5.0    0.371    0.415    0.634    0.662
dbcsr_data_release                 444  7.6    0.482    0.566    0.482    0.566
make_images_data                    18  8.0    0.001    0.001    0.424    0.526
dbcsr_destroy                      111  5.9    0.006    0.052    0.422    0.491
-------------------------------------------------------------------------------
Plot: name="dbcsr_timings_32omp", title="Timings of dbcsr with 32 OpenMP Threads", ylabel="time [s]"
PlotPoint: plot="dbcsr_timings_32omp", name="rest", label="rest", y=9.321999999999989, yerr=0.0
PlotPoint: plot="dbcsr_timings_32omp", name="dbcsr_redistribute", label="dbcsr_redistribute", y=57.97, yerr=0.0
PlotPoint: plot="dbcsr_timings_32omp", name="dbcsr_make_random_matrix", label="dbcsr_make_random_matrix", y=15.828, yerr=0.0
PlotPoint: plot="dbcsr_timings_32omp", name="multiply_cannon_multrec", label="multiply_cannon_multrec", y=15.623, yerr=0.0
PlotPoint: plot="dbcsr_timings_32omp", name="dbcsr_merge_all", label="dbcsr_merge_all", y=3.424, yerr=0.0
PlotPoint: plot="dbcsr_timings_32omp", name="mp_alltoall_d11v", label="mp_alltoall_d11v", y=3.42, yerr=0.0
PlotPoint: plot="dbcsr_timings_32omp", name="dbcsr_data_release", label="dbcsr_data_release", y=2.2, yerr=0.0
PlotPoint: plot="dbcsr_timings_32omp", name="mp_waitall_1", label="mp_waitall_1", y=0.0, yerr=0.0
PlotPoint: plot="dbcsr_timings_32omp", name="mp_sum_l", label="mp_sum_l", y=0.0, yerr=0.0
Plot: name="dbcsr_timings_32mpi", title="Timings of dbcsr with 32 MPI Ranks", ylabel="time [s]"
PlotPoint: plot="dbcsr_timings_32mpi", name="rest", label="rest", y=2.5199999999999996, yerr=0.0
PlotPoint: plot="dbcsr_timings_32mpi", name="dbcsr_redistribute", label="dbcsr_redistribute", y=0.371, yerr=0.0
PlotPoint: plot="dbcsr_timings_32mpi", name="dbcsr_make_random_matrix", label="dbcsr_make_random_matrix", y=0.889, yerr=0.0
PlotPoint: plot="dbcsr_timings_32mpi", name="multiply_cannon_multrec", label="multiply_cannon_multrec", y=15.296, yerr=0.0
PlotPoint: plot="dbcsr_timings_32mpi", name="dbcsr_merge_all", label="dbcsr_merge_all", y=0.129, yerr=0.0
PlotPoint: plot="dbcsr_timings_32mpi", name="mp_alltoall_d11v", label="mp_alltoall_d11v", y=0.0, yerr=0.0
PlotPoint: plot="dbcsr_timings_32mpi", name="dbcsr_data_release", label="dbcsr_data_release", y=0.482, yerr=0.0
PlotPoint: plot="dbcsr_timings_32mpi", name="mp_waitall_1", label="mp_waitall_1", y=3.2, yerr=0.0
PlotPoint: plot="dbcsr_timings_32mpi", name="mp_sum_l", label="mp_sum_l", y=0.532, yerr=0.0
Summary: Performance test works fine.
Status: OK
Uploading artifacts... done
EndDate: 2021-06-16 20:33:41+00:00