StartDate: 2021-04-21 11:08:28+00:00
CpuId: 64x Intel Xeon W 2000 / Scalable Bronze 3000 / Silver 4000 / Gold 5000 / 6000 / Platinum 8000 (Skylake), 14nm
CommitSHA: 4ceb55d7728e483bd4825e6faf97f65e69b001d5
CommitTime: 2021-04-20 23:57:53 +0200
CommitAuthor: Ole Schütt
CommitSubject: grid: Refactor GPU backend
Trying to pull image cp2k-toolchain-mpich... success :-)
Trying to pull image cp2k-perf-openmp... image not found.
#################### Building Image cp2k-perf-openmp ####################
Dockerfile: /tools/docker/Dockerfile.test_performance
Build-Args: TOOLCHAIN=gcr.io/cp2k-org-project/img_cp2k-toolchain-mpich-arch-f73:gittree-27ae894-buildargs-68b329d
Sending build context to Docker daemon 74.24kB
Step 1/9 : ARG TOOLCHAIN=cp2k/toolchain:latest
Step 2/9 : FROM ${TOOLCHAIN}
 ---> 6f11dcde980e
Step 3/9 : WORKDIR /workspace
 ---> Running in f756e72dc84d
Removing intermediate container f756e72dc84d
 ---> 055b9dbc462c
Step 4/9 : COPY ./scripts/install_basics.sh .
 ---> ad83cb144694
Step 5/9 : RUN ./install_basics.sh
 ---> Running in 7cba4c6cd5eb
Installing Ubuntu packages... debconf: delaying package configuration, since apt-utils is not installed
Selecting previously unselected package libpopt0:amd64.
(Reading database ... 14718 files and directories currently installed.)
Preparing to unpack .../libpopt0_1.16-14_amd64.deb ...
Unpacking libpopt0:amd64 (1.16-14) ...
Selecting previously unselected package rsync.
Preparing to unpack .../rsync_3.1.3-8_amd64.deb ...
Unpacking rsync (3.1.3-8) ...
Setting up libpopt0:amd64 (1.16-14) ...
Setting up rsync (3.1.3-8) ...
invoke-rc.d: could not determine current runlevel
invoke-rc.d: policy-rc.d denied execution of start.
Processing triggers for libc-bin (2.31-0ubuntu9.2) ...
done.
Cloning cp2k repository... done.
Removing intermediate container 7cba4c6cd5eb
 ---> e4eeb595a958
Step 6/9 : COPY ./scripts/install_performance.sh .
 ---> 94c425b6f55e
Step 7/9 : RUN ./install_performance.sh "local"
 ---> Running in 97f20902e79c
'./local.pdbg' -> '/opt/cp2k-toolchain/install/arch/local.pdbg'
'./local.psmp' -> '/opt/cp2k-toolchain/install/arch/local.psmp'
'./local.sdbg' -> '/opt/cp2k-toolchain/install/arch/local.sdbg'
'./local.ssmp' -> '/opt/cp2k-toolchain/install/arch/local.ssmp'
'./local_coverage.pdbg' -> '/opt/cp2k-toolchain/install/arch/local_coverage.pdbg'
'./local_valgrind.psmp' -> '/opt/cp2k-toolchain/install/arch/local_valgrind.psmp'
'./local_valgrind.ssmp' -> '/opt/cp2k-toolchain/install/arch/local_valgrind.ssmp'
'./local_warn.psmp' -> '/opt/cp2k-toolchain/install/arch/local_warn.psmp'
Warming cache by trying to compile cp2k... done.
Removing intermediate container 97f20902e79c
 ---> 0f1ec88dd84d
Step 8/9 : COPY ./scripts/ci_entrypoint.sh ./scripts/test_performance.sh ./scripts/plot_performance.py ./
 ---> c93aa51286ac
Step 9/9 : CMD ["./ci_entrypoint.sh", "./test_performance.sh", "local"]
 ---> Running in fc720d55a5b5
Removing intermediate container fc720d55a5b5
 ---> 1c720ef9d74e
Successfully built 1c720ef9d74e
Successfully tagged gcr.io/cp2k-org-project/img_cp2k-perf-openmp-arch-f73:gittree-a03e945-buildargs-200fced
Pushing image cp2k-perf-openmp... done.
#################### Running Image cp2k-perf-openmp #################### ========== Fetching Git Commit ========== CommitSHA: 4ceb55d7728e483bd4825e6faf97f65e69b001d5 CommitTime: 2021-04-20 23:57:53 +0200 CommitAuthor: Ole Schütt CommitSubject: grid: Refactor GPU backend ========== Running Test ========== ========== Compiling CP2K ========== Compiling cp2k... done. ========== Running Performance Test ========== Running H2O-64.inp with 1 threads and 32 ranks... done. Running H2O-64.inp with 32 threads and 1 ranks... done. From /workspace/artifacts/H2O-64_32omp.out: ------------------------------------------------------------------------------- - - - T I M I N G - - - ------------------------------------------------------------------------------- SUBROUTINE CALLS ASD SELF TIME TOTAL TIME MAXIMUM AVERAGE MAXIMUM AVERAGE MAXIMUM CP2K 1 1.0 0.042 0.042 213.330 213.330 qs_mol_dyn_low 1 2.0 0.005 0.005 212.469 212.469 qs_forces 11 3.9 0.002 0.002 212.404 212.404 qs_energies 11 4.9 0.001 0.001 198.303 198.303 scf_env_do_scf 11 5.9 0.001 0.001 165.815 165.815 velocity_verlet 10 3.0 0.002 0.002 148.813 148.813 scf_env_do_scf_inner_loop 108 6.5 0.014 0.014 109.320 109.320 init_scf_loop 11 6.9 0.000 0.000 56.286 56.286 rebuild_ks_matrix 119 8.3 0.001 0.001 51.063 51.063 qs_ks_build_kohn_sham_matrix 119 9.3 0.020 0.020 51.062 51.062 prepare_preconditioner 11 7.9 0.000 0.000 50.601 50.601 make_preconditioner 11 8.9 0.000 0.000 50.601 50.601 make_full_inverse_cholesky 11 9.9 0.000 0.000 48.160 48.160 qs_ks_update_qs_env 119 7.6 0.001 0.001 46.195 46.195 qs_rho_update_rho 119 7.7 0.001 0.001 44.588 44.588 calculate_rho_elec 119 8.7 1.589 1.589 44.587 44.587 grid_collocate_task_list 119 9.7 36.853 36.853 36.853 36.853 sum_up_and_integrate 119 10.3 0.447 0.447 35.820 35.820 integrate_v_rspace 119 11.3 0.173 0.173 35.372 35.372 cp_fm_cholesky_invert 11 10.9 32.014 32.014 32.014 32.014 grid_integrate_task_list 119 12.3 30.380 30.380 30.380 30.380 qs_scf_new_mos 108 7.5 0.001 0.001 
27.828 27.828 qs_scf_loop_do_ot 108 8.5 0.001 0.001 27.827 27.827 dbcsr_multiply_generic 2286 12.5 0.188 0.188 27.632 27.632 ot_scf_mini 108 9.5 0.004 0.004 26.223 26.223 ot_mini 108 10.5 0.001 0.001 17.893 17.893 make_m2s 4572 13.5 0.068 0.068 17.092 17.092 init_scf_run 11 5.9 0.001 0.001 16.482 16.482 scf_env_initial_rho_setup 11 6.9 0.001 0.001 16.481 16.481 wfi_extrapolate 11 7.9 0.001 0.001 15.012 15.012 cp_gemm 81 9.0 0.000 0.000 12.019 12.019 cp_gemm_fm_gemm 81 10.0 0.000 0.000 12.018 12.018 cp_fm_gemm 81 11.0 12.018 12.018 12.018 12.018 qs_energies_init_hamiltonians 11 5.9 0.000 0.000 11.860 11.860 pw_transfer 1439 11.6 0.110 0.110 11.385 11.385 fft_wrap_pw1pw2 1201 12.6 0.011 0.011 11.003 11.003 ot_diis_step 108 11.5 0.006 0.006 9.601 9.601 fft_wrap_pw1pw2_140 487 13.2 1.518 1.518 9.395 9.395 make_images 4572 14.5 2.755 2.755 9.352 9.352 cp_fm_cholesky_decompose 22 10.9 8.825 8.825 8.825 8.825 apply_preconditioner_dbcsr 119 12.6 0.000 0.000 8.644 8.644 apply_single 119 13.6 0.001 0.001 8.644 8.644 qs_ot_get_derivative 108 11.5 0.002 0.002 8.287 8.287 dbcsr_make_dense_low 5837 15.5 0.115 0.115 7.985 7.985 make_dense_data 5837 16.5 6.669 6.669 7.852 7.852 dbcsr_complete_redistribute 329 12.2 3.422 3.422 7.561 7.561 dbcsr_copy 2102 12.0 0.638 0.638 7.202 7.202 dbcsr_make_images_dense 3978 14.8 0.026 0.026 7.201 7.201 build_core_hamiltonian_matrix_ 11 4.9 0.001 0.001 7.174 7.174 qs_ks_update_qs_env_forces 11 4.9 0.000 0.000 6.925 6.925 qs_env_update_s_mstruct 11 6.9 0.000 0.000 6.557 6.557 dbcsr_copy_into_existing 22 7.9 6.507 6.507 6.507 6.507 copy_dbcsr_to_fm 153 11.3 0.004 0.004 6.164 6.164 density_rs2pw 119 9.7 0.007 0.007 6.145 6.145 qs_create_task_list 11 7.9 0.000 0.000 5.965 5.965 generate_qs_task_list 11 8.9 3.938 3.938 5.965 5.965 pw_poisson_solve 119 10.3 2.263 2.263 5.246 5.246 transfer_dbcsr_to_fm 11 10.9 0.000 0.000 5.122 5.122 fft3d_s 1202 14.6 5.039 5.039 5.045 5.045 multiply_cannon 2286 13.5 0.335 0.335 5.019 5.019 potential_pw2rs 119 12.3 
0.085 0.085 4.819 4.819 build_core_hamiltonian_matrix 11 6.9 0.001 0.001 4.629 4.629 ------------------------------------------------------------------------------- From /workspace/artifacts/H2O-64_32mpi.out: ------------------------------------------------------------------------------- - - - T I M I N G - - - ------------------------------------------------------------------------------- SUBROUTINE CALLS ASD SELF TIME TOTAL TIME MAXIMUM AVERAGE MAXIMUM AVERAGE MAXIMUM CP2K 1 1.0 0.010 0.013 123.025 123.026 qs_mol_dyn_low 1 2.0 0.007 0.008 122.904 122.910 qs_forces 11 3.9 0.002 0.003 122.848 122.849 qs_energies 11 4.9 0.001 0.001 116.847 116.850 scf_env_do_scf 11 5.9 0.001 0.001 104.402 104.404 velocity_verlet 10 3.0 0.002 0.002 83.611 83.612 scf_env_do_scf_inner_loop 108 6.5 0.006 0.015 70.111 70.112 init_scf_loop 11 6.9 0.001 0.001 34.271 34.272 rebuild_ks_matrix 119 8.3 0.001 0.001 33.866 33.933 qs_ks_build_kohn_sham_matrix 119 9.3 0.022 0.023 33.865 33.933 prepare_preconditioner 11 7.9 0.000 0.000 30.824 30.835 make_preconditioner 11 8.9 0.000 0.000 30.824 30.835 make_full_inverse_cholesky 11 9.9 0.000 0.000 30.634 30.662 qs_ks_update_qs_env 119 7.6 0.001 0.001 29.827 29.863 cp_fm_cholesky_invert 11 10.9 29.802 29.821 29.802 29.821 qs_rho_update_rho 119 7.7 0.001 0.001 27.308 27.336 calculate_rho_elec 119 8.7 0.049 0.051 27.307 27.335 sum_up_and_integrate 119 10.3 0.058 0.065 25.586 25.706 integrate_v_rspace 119 11.3 0.005 0.005 25.527 25.652 dbcsr_multiply_generic 2286 12.5 0.139 0.144 21.444 21.626 qs_scf_new_mos 108 7.5 0.001 0.001 18.449 18.495 qs_scf_loop_do_ot 108 8.5 0.001 0.001 18.448 18.494 grid_collocate_task_list 119 9.7 17.292 18.030 17.292 18.030 grid_integrate_task_list 119 12.3 17.127 17.686 17.127 17.686 ot_scf_mini 108 9.5 0.003 0.004 17.467 17.516 multiply_cannon 2286 13.5 0.249 0.254 14.912 15.324 multiply_cannon_loop 2286 14.5 0.223 0.232 13.101 13.320 mp_waitall_1 169478 16.3 11.849 13.004 11.849 13.004 ot_mini 108 10.5 0.001 0.001 10.783 
10.830 rs_pw_transfer 974 11.9 0.018 0.020 9.699 10.561 density_rs2pw 119 9.7 0.009 0.009 9.244 10.223 pw_transfer 1439 11.6 0.151 0.158 8.765 8.971 init_scf_run 11 5.9 0.000 0.002 8.862 8.862 scf_env_initial_rho_setup 11 6.9 0.000 0.001 8.861 8.862 fft_wrap_pw1pw2 1201 12.6 0.015 0.016 8.437 8.632 wfi_extrapolate 11 7.9 0.001 0.001 8.294 8.295 multiply_cannon_metrocomm3 18288 15.5 0.075 0.078 7.221 7.952 fft_wrap_pw1pw2_140 487 13.2 0.632 0.676 6.707 7.456 potential_pw2rs 119 12.3 0.010 0.011 7.133 7.262 fft3d_ps 1201 14.6 2.714 2.925 6.817 7.097 cp_gemm 81 9.0 0.000 0.000 6.600 6.610 cp_gemm_fm_gemm 81 10.0 0.000 0.000 6.600 6.610 cp_fm_gemm 81 11.0 6.599 6.610 6.599 6.610 ot_diis_step 108 11.5 0.005 0.005 6.456 6.457 apply_preconditioner_dbcsr 119 12.6 0.000 0.000 5.983 6.193 apply_single 119 13.6 0.001 0.001 5.983 6.193 make_m2s 4572 13.5 0.075 0.080 4.569 5.086 mp_waitany 9880 13.7 3.965 4.870 3.965 4.870 multiply_cannon_multrec 18288 15.5 4.399 4.767 4.416 4.784 qs_ks_update_qs_env_forces 11 4.9 0.000 0.000 4.387 4.428 rs_pw_transfer_RS2PW_140 130 11.5 0.621 0.681 3.420 4.402 make_images 4572 14.5 0.185 0.192 3.857 4.366 qs_ot_get_derivative 108 11.5 0.001 0.002 4.280 4.329 mp_alltoall_z22v 1201 16.6 3.365 3.879 3.365 3.879 rs_pw_transfer_PW2RS_140 130 13.9 1.457 1.855 3.279 3.414 mp_sum_d 4125 12.0 2.320 2.960 2.320 2.960 yz_to_x 487 15.3 0.272 0.346 2.490 2.820 qs_ot_get_p 119 10.4 0.001 0.001 2.486 2.527 make_images_data 4572 15.5 0.058 0.063 1.973 2.502 ------------------------------------------------------------------------------- Plot: name="H2O-64_timings_32omp", title="Timings of H2O-64 with 32 OpenMP Threads", ylabel="time [s]" PlotPoint: plot="H2O-64_timings_32omp", name="rest", label="rest", y=93.24000000000001, yerr=0.0 PlotPoint: plot="H2O-64_timings_32omp", name="grid_collocate_task_list", label="grid_collocate_task_list", y=36.853, yerr=0.0 PlotPoint: plot="H2O-64_timings_32omp", name="cp_fm_cholesky_invert", label="cp_fm_cholesky_invert", 
y=32.014, yerr=0.0 PlotPoint: plot="H2O-64_timings_32omp", name="grid_integrate_task_list", label="grid_integrate_task_list", y=30.38, yerr=0.0 PlotPoint: plot="H2O-64_timings_32omp", name="cp_fm_gemm", label="cp_fm_gemm", y=12.018, yerr=0.0 PlotPoint: plot="H2O-64_timings_32omp", name="cp_fm_cholesky_decompose", label="cp_fm_cholesky_decompose", y=8.825, yerr=0.0 PlotPoint: plot="H2O-64_timings_32omp", name="mp_waitall_1", label="mp_waitall_1", y=0.0, yerr=0.0 Plot: name="H2O-64_timings_32mpi", title="Timings of H2O-64 with 32 MPI Ranks", ylabel="time [s]" PlotPoint: plot="H2O-64_timings_32mpi", name="rest", label="rest", y=40.355999999999995, yerr=0.0 PlotPoint: plot="H2O-64_timings_32mpi", name="grid_collocate_task_list", label="grid_collocate_task_list", y=17.292, yerr=0.0 PlotPoint: plot="H2O-64_timings_32mpi", name="cp_fm_cholesky_invert", label="cp_fm_cholesky_invert", y=29.802, yerr=0.0 PlotPoint: plot="H2O-64_timings_32mpi", name="grid_integrate_task_list", label="grid_integrate_task_list", y=17.127, yerr=0.0 PlotPoint: plot="H2O-64_timings_32mpi", name="cp_fm_gemm", label="cp_fm_gemm", y=6.599, yerr=0.0 PlotPoint: plot="H2O-64_timings_32mpi", name="cp_fm_cholesky_decompose", label="cp_fm_cholesky_decompose", y=0.0, yerr=0.0 PlotPoint: plot="H2O-64_timings_32mpi", name="mp_waitall_1", label="mp_waitall_1", y=11.849, yerr=0.0 Running H2O-64_nonortho.inp with 1 threads and 32 ranks... done. Running H2O-64_nonortho.inp with 32 threads and 1 ranks... done. 
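Note on the `rest` PlotPoint values for H2O-64: in both runs the reported value equals the average total time of the top-level `CP2K` entry minus the sum of the other plotted routines' average self times. The Python sketch below reproduces that arithmetic from the numbers in the timing tables. The function name `rest_time` and the dictionary layout are assumptions made for illustration; they are not taken from `plot_performance.py`, whose actual implementation is not shown in this log.

```python
def rest_time(total, plotted):
    """Return the time not attributed to any of the plotted routines.

    total   -- average total time of the top-level CP2K entry, in seconds
    plotted -- mapping of routine name to average self time, in seconds
    """
    return total - sum(plotted.values())


# Average self times copied from the H2O-64 PlotPoint lines (32 OpenMP threads).
h2o64_omp = {
    "grid_collocate_task_list": 36.853,
    "cp_fm_cholesky_invert": 32.014,
    "grid_integrate_task_list": 30.38,
    "cp_fm_gemm": 12.018,
    "cp_fm_cholesky_decompose": 8.825,
    "mp_waitall_1": 0.0,
}

# Average self times copied from the H2O-64 PlotPoint lines (32 MPI ranks).
h2o64_mpi = {
    "grid_collocate_task_list": 17.292,
    "cp_fm_cholesky_invert": 29.802,
    "grid_integrate_task_list": 17.127,
    "cp_fm_gemm": 6.599,
    "cp_fm_cholesky_decompose": 0.0,
    "mp_waitall_1": 11.849,
}

# Totals are the average TOTAL TIME of the CP2K row in each timing table.
print(round(rest_time(213.330, h2o64_omp), 3))  # 93.24
print(round(rest_time(123.025, h2o64_mpi), 3))  # 40.356
```

A routine reported with `y=0.0` does not appear in that run's truncated timing table, so it contributes nothing to the sum and any time it did take is counted under `rest`.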
From /workspace/artifacts/H2O-64_nonortho_32omp.out: ------------------------------------------------------------------------------- - - - T I M I N G - - - ------------------------------------------------------------------------------- SUBROUTINE CALLS ASD SELF TIME TOTAL TIME MAXIMUM AVERAGE MAXIMUM AVERAGE MAXIMUM CP2K 1 1.0 0.033 0.033 247.171 247.171 qs_mol_dyn_low 1 2.0 0.004 0.004 246.263 246.263 qs_forces 11 3.9 0.002 0.002 246.202 246.202 qs_energies 11 4.9 0.001 0.001 231.563 231.563 scf_env_do_scf 11 5.9 0.001 0.001 195.663 195.663 velocity_verlet 10 3.0 0.002 0.002 169.612 169.612 scf_env_do_scf_inner_loop 96 6.5 0.012 0.012 136.537 136.537 rebuild_ks_matrix 107 8.3 0.001 0.001 67.281 67.281 qs_ks_build_kohn_sham_matrix 107 9.3 0.019 0.019 67.280 67.280 qs_rho_update_rho 107 7.7 0.001 0.001 66.551 66.551 calculate_rho_elec 107 8.7 1.864 1.864 66.550 66.550 qs_ks_update_qs_env 107 7.6 0.001 0.001 60.453 60.453 grid_collocate_task_list 107 9.7 59.305 59.305 59.305 59.305 init_scf_loop 11 6.9 0.001 0.001 58.906 58.906 sum_up_and_integrate 107 10.3 0.395 0.395 53.708 53.708 integrate_v_rspace 107 11.3 0.160 0.160 53.313 53.313 prepare_preconditioner 11 7.9 0.000 0.000 51.057 51.057 make_preconditioner 11 8.9 0.000 0.000 51.057 51.057 grid_integrate_task_list 107 12.3 49.771 49.771 49.771 49.771 make_full_inverse_cholesky 11 9.9 0.000 0.000 48.594 48.594 cp_fm_cholesky_invert 11 10.9 32.268 32.268 32.268 32.268 qs_scf_new_mos 96 7.5 0.001 0.001 22.799 22.799 qs_scf_loop_do_ot 96 8.5 0.001 0.001 22.798 22.798 dbcsr_multiply_generic 1966 12.4 0.165 0.165 22.261 22.261 ot_scf_mini 96 9.5 0.003 0.003 21.067 21.067 init_scf_run 11 5.9 0.001 0.001 18.355 18.355 scf_env_initial_rho_setup 11 6.9 0.001 0.001 18.354 18.354 wfi_extrapolate 11 7.9 0.001 0.001 17.166 17.166 ot_mini 96 10.5 0.001 0.001 14.049 14.049 qs_energies_init_hamiltonians 11 5.9 0.000 0.000 13.298 13.298 make_m2s 3932 13.4 0.060 0.060 13.080 13.080 cp_gemm 81 9.0 0.001 0.001 12.230 12.230 
cp_gemm_fm_gemm 81 10.0 0.000 0.000 12.230 12.230 cp_fm_gemm 81 11.0 12.229 12.229 12.229 12.229 pw_transfer 1295 11.6 0.099 0.099 9.542 9.542 cp_fm_cholesky_decompose 22 10.9 8.825 8.825 8.825 8.825 qs_ks_update_qs_env_forces 11 4.9 0.000 0.000 8.661 8.661 fft_wrap_pw1pw2 1081 12.6 0.010 0.010 8.633 8.633 dbcsr_complete_redistribute 317 12.2 3.502 3.502 7.944 7.944 qs_env_update_s_mstruct 11 6.9 0.000 0.000 7.935 7.935 qs_create_task_list 11 7.9 0.000 0.000 7.330 7.330 generate_qs_task_list 11 8.9 5.255 5.255 7.330 7.330 make_images 3932 14.4 2.366 2.366 7.209 7.209 qs_ot_get_derivative 96 11.5 0.002 0.002 7.187 7.187 fft_wrap_pw1pw2_140 439 13.2 0.714 0.714 7.183 7.183 dbcsr_copy 1855 11.9 0.297 0.297 6.998 6.998 ot_diis_step 96 11.5 0.005 0.005 6.858 6.858 copy_dbcsr_to_fm 147 11.2 0.004 0.004 6.404 6.404 dbcsr_copy_into_existing 22 7.9 6.244 6.244 6.244 6.244 apply_preconditioner_dbcsr 107 12.6 0.000 0.000 6.123 6.123 apply_single 107 13.6 0.001 0.001 6.122 6.122 dbcsr_make_dense_low 4961 15.5 0.098 0.098 6.081 6.081 build_core_hamiltonian_matrix_ 11 4.9 0.001 0.001 5.975 5.975 make_dense_data 4961 16.5 5.276 5.276 5.968 5.968 pw_poisson_solve 107 10.3 2.199 2.199 5.447 5.447 dbcsr_make_images_dense 3386 14.7 0.022 0.022 5.395 5.395 density_rs2pw 107 9.7 0.006 0.006 5.381 5.381 transfer_dbcsr_to_fm 11 10.9 0.000 0.000 5.300 5.300 ------------------------------------------------------------------------------- From /workspace/artifacts/H2O-64_nonortho_32mpi.out: ------------------------------------------------------------------------------- - - - T I M I N G - - - ------------------------------------------------------------------------------- SUBROUTINE CALLS ASD SELF TIME TOTAL TIME MAXIMUM AVERAGE MAXIMUM AVERAGE MAXIMUM CP2K 1 1.0 0.009 0.013 181.072 181.073 qs_mol_dyn_low 1 2.0 0.005 0.006 180.960 180.966 qs_forces 11 3.9 0.002 0.002 180.904 180.905 qs_energies 11 4.9 0.001 0.002 171.746 171.748 scf_env_do_scf 11 5.9 0.001 0.001 155.684 155.685 
scf_env_do_scf_inner_loop 96 6.5 0.006 0.013 117.318 117.318 velocity_verlet 10 3.0 0.002 0.002 116.865 116.867 rebuild_ks_matrix 107 8.3 0.001 0.001 66.636 66.712 qs_ks_build_kohn_sham_matrix 107 9.3 0.020 0.021 66.635 66.711 qs_ks_update_qs_env 107 7.6 0.001 0.001 59.294 59.368 sum_up_and_integrate 107 10.3 0.052 0.059 56.410 58.093 integrate_v_rspace 107 11.3 0.004 0.005 56.357 58.043 qs_rho_update_rho 107 7.7 0.001 0.001 53.846 53.868 calculate_rho_elec 107 8.7 0.044 0.046 53.845 53.867 grid_integrate_task_list 107 12.3 46.825 47.704 46.825 47.704 grid_collocate_task_list 107 9.7 44.077 45.147 44.077 45.147 init_scf_loop 11 6.9 0.001 0.001 38.346 38.347 prepare_preconditioner 11 7.9 0.000 0.000 31.022 31.032 make_preconditioner 11 8.9 0.000 0.000 31.022 31.032 make_full_inverse_cholesky 11 9.9 0.000 0.000 30.845 30.870 cp_fm_cholesky_invert 11 10.9 30.053 30.075 30.053 30.075 dbcsr_multiply_generic 1966 12.4 0.120 0.124 23.075 23.719 qs_scf_new_mos 96 7.5 0.001 0.001 16.750 16.821 qs_scf_loop_do_ot 96 8.5 0.001 0.001 16.749 16.820 ot_scf_mini 96 9.5 0.003 0.003 15.904 15.973 multiply_cannon 1966 13.4 0.216 0.220 14.711 15.232 mp_waitall_1 146670 16.2 11.793 12.603 11.793 12.603 init_scf_run 11 5.9 0.000 0.002 12.400 12.401 scf_env_initial_rho_setup 11 6.9 0.000 0.001 12.400 12.401 multiply_cannon_loop 1966 14.4 0.198 0.334 12.167 12.394 wfi_extrapolate 11 7.9 0.001 0.001 11.564 11.564 rs_pw_transfer 878 11.9 0.016 0.018 9.978 11.241 density_rs2pw 107 9.7 0.008 0.008 8.921 10.205 ot_mini 96 10.5 0.001 0.001 8.813 8.882 pw_transfer 1295 11.6 0.136 0.145 8.106 8.643 fft_wrap_pw1pw2 1081 12.6 0.013 0.014 7.815 8.364 qs_ks_update_qs_env_forces 11 4.9 0.000 0.000 7.574 7.580 multiply_cannon_metrocomm3 15728 15.4 0.065 0.067 7.007 7.568 potential_pw2rs 107 12.3 0.009 0.009 7.268 7.298 fft3d_ps 1081 14.6 2.358 2.675 6.383 6.977 cp_gemm 81 9.0 0.000 0.000 6.744 6.755 cp_gemm_fm_gemm 81 10.0 0.000 0.000 6.744 6.754 cp_fm_gemm 81 11.0 6.744 6.754 6.744 6.754 
fft_wrap_pw1pw2_140 439 13.2 0.565 0.643 5.654 6.281 mp_waitany 8968 13.7 4.380 5.547 4.380 5.547 mp_alltoall_d11v 1998 13.7 2.827 5.504 2.827 5.504 rs_pw_transfer_RS2PW_140 118 11.5 0.480 0.521 3.853 5.113 make_m2s 3932 13.4 0.064 0.069 4.403 4.857 rs_gather_matrices 107 12.3 0.136 0.151 2.204 4.824 apply_preconditioner_dbcsr 107 12.6 0.000 0.000 4.618 4.720 apply_single 107 13.6 0.001 0.001 4.618 4.719 ot_diis_step 96 11.5 0.004 0.004 4.585 4.585 qs_ot_get_derivative 96 11.5 0.001 0.001 4.192 4.264 make_images 3932 14.4 0.163 0.168 3.777 4.196 multiply_cannon_multrec 15728 15.4 3.859 4.007 3.874 4.022 mp_sum_l 9586 13.2 3.172 3.853 3.172 3.853 mp_alltoall_z22v 1081 16.6 3.381 3.701 3.381 3.701 ------------------------------------------------------------------------------- Plot: name="H2O-64_nonortho_timings_32omp", title="Timings of H2O-64_nonortho with 32 OpenMP Threads", ylabel="time [s]" PlotPoint: plot="H2O-64_nonortho_timings_32omp", name="rest", label="rest", y=84.77300000000002, yerr=0.0 PlotPoint: plot="H2O-64_nonortho_timings_32omp", name="grid_collocate_task_list", label="grid_collocate_task_list", y=59.305, yerr=0.0 PlotPoint: plot="H2O-64_nonortho_timings_32omp", name="grid_integrate_task_list", label="grid_integrate_task_list", y=49.771, yerr=0.0 PlotPoint: plot="H2O-64_nonortho_timings_32omp", name="cp_fm_cholesky_invert", label="cp_fm_cholesky_invert", y=32.268, yerr=0.0 PlotPoint: plot="H2O-64_nonortho_timings_32omp", name="cp_fm_gemm", label="cp_fm_gemm", y=12.229, yerr=0.0 PlotPoint: plot="H2O-64_nonortho_timings_32omp", name="cp_fm_cholesky_decompose", label="cp_fm_cholesky_decompose", y=8.825, yerr=0.0 PlotPoint: plot="H2O-64_nonortho_timings_32omp", name="mp_waitall_1", label="mp_waitall_1", y=0.0, yerr=0.0 Plot: name="H2O-64_nonortho_timings_32mpi", title="Timings of H2O-64_nonortho with 32 MPI Ranks", ylabel="time [s]" PlotPoint: plot="H2O-64_nonortho_timings_32mpi", name="rest", label="rest", y=41.58000000000001, yerr=0.0 PlotPoint: 
plot="H2O-64_nonortho_timings_32mpi", name="grid_collocate_task_list", label="grid_collocate_task_list", y=44.077, yerr=0.0 PlotPoint: plot="H2O-64_nonortho_timings_32mpi", name="grid_integrate_task_list", label="grid_integrate_task_list", y=46.825, yerr=0.0 PlotPoint: plot="H2O-64_nonortho_timings_32mpi", name="cp_fm_cholesky_invert", label="cp_fm_cholesky_invert", y=30.053, yerr=0.0 PlotPoint: plot="H2O-64_nonortho_timings_32mpi", name="cp_fm_gemm", label="cp_fm_gemm", y=6.744, yerr=0.0 PlotPoint: plot="H2O-64_nonortho_timings_32mpi", name="cp_fm_cholesky_decompose", label="cp_fm_cholesky_decompose", y=0.0, yerr=0.0 PlotPoint: plot="H2O-64_nonortho_timings_32mpi", name="mp_waitall_1", label="mp_waitall_1", y=11.793, yerr=0.0 Running H2O-hyb.inp with 1 threads and 32 ranks... done. Running H2O-hyb.inp with 32 threads and 1 ranks... done. From /workspace/artifacts/H2O-hyb_32omp.out: ------------------------------------------------------------------------------- - - - T I M I N G - - - ------------------------------------------------------------------------------- SUBROUTINE CALLS ASD SELF TIME TOTAL TIME MAXIMUM AVERAGE MAXIMUM AVERAGE MAXIMUM CP2K 1 1.0 0.424 0.424 270.501 270.501 qs_energies 1 2.0 0.000 0.000 269.134 269.134 scf_env_do_scf 1 3.0 0.000 0.000 266.099 266.099 qs_ks_update_qs_env 8 5.0 0.000 0.000 253.761 253.761 rebuild_ks_matrix 7 6.0 0.000 0.000 253.639 253.639 qs_ks_build_kohn_sham_matrix 7 7.0 0.002 0.002 253.639 253.639 hfx_ks_matrix 7 8.0 0.000 0.000 180.324 180.324 integrate_four_center 7 9.0 9.762 9.762 180.291 180.291 integrate_four_center_main 7 10.0 1.235 1.235 161.071 161.071 integrate_four_center_bin 447 11.0 159.836 159.836 159.836 159.836 scf_env_do_scf_inner_loop 7 4.0 0.001 0.001 151.164 151.164 init_scf_loop 1 4.0 0.000 0.000 114.918 114.918 cp_gemm 129 10.3 0.001 0.001 56.382 56.382 cp_gemm_fm_gemm 129 11.3 0.000 0.000 56.381 56.381 cp_fm_gemm 129 12.3 56.380 56.380 56.380 56.380 admm_mo_calc_rho_aux 7 8.0 0.000 0.000 34.224 
34.224 admm_fit_mo_coeffs 7 9.0 0.000 0.000 31.370 31.370 admm_mo_merge_derivs 7 8.0 0.000 0.000 27.367 27.367 merge_mo_derivs_diag 7 9.0 0.023 0.023 27.367 27.367 purify_mo_diag 7 10.0 0.001 0.001 15.721 15.721 fit_mo_coeffs 7 10.0 0.000 0.000 15.649 15.649 integrate_four_center_load 7 10.0 0.000 0.000 8.877 8.877 hfx_load_balance 1 11.0 0.002 0.002 8.877 8.877 calculate_rho_elec 15 7.4 0.197 0.197 6.919 6.919 prepare_preconditioner 1 5.0 0.000 0.000 6.499 6.499 make_preconditioner 1 6.0 0.000 0.000 6.499 6.499 grid_collocate_task_list 15 8.4 5.704 5.704 5.704 5.704 qs_vxc_create 14 8.0 0.000 0.000 5.563 5.563 xc_vxc_pw_create 14 9.0 0.197 0.197 5.562 5.562 ------------------------------------------------------------------------------- From /workspace/artifacts/H2O-hyb_32mpi.out: ------------------------------------------------------------------------------- - - - T I M I N G - - - ------------------------------------------------------------------------------- SUBROUTINE CALLS ASD SELF TIME TOTAL TIME MAXIMUM AVERAGE MAXIMUM AVERAGE MAXIMUM CP2K 1 1.0 0.027 0.061 215.005 215.006 qs_energies 1 2.0 0.000 0.001 214.816 214.817 scf_env_do_scf 1 3.0 0.000 0.000 214.092 214.092 qs_ks_update_qs_env 8 5.0 0.000 0.000 207.765 207.765 rebuild_ks_matrix 7 6.0 0.000 0.000 207.750 207.750 qs_ks_build_kohn_sham_matrix 7 7.0 0.002 0.002 207.750 207.750 hfx_ks_matrix 7 8.0 0.000 0.001 172.592 172.611 integrate_four_center 7 9.0 0.135 0.408 172.580 172.599 integrate_four_center_main 7 10.0 0.005 0.005 158.788 162.005 integrate_four_center_bin 448 11.0 158.783 162.001 158.783 162.001 scf_env_do_scf_inner_loop 7 4.0 0.000 0.001 127.334 127.334 init_scf_loop 1 4.0 0.000 0.000 86.756 86.757 cp_gemm 129 10.3 0.001 0.001 25.590 25.599 cp_gemm_fm_gemm 129 11.3 0.000 0.000 25.589 25.598 cp_fm_gemm 129 12.3 25.589 25.598 25.589 25.598 admm_mo_merge_derivs 7 8.0 0.000 0.000 15.013 15.016 merge_mo_derivs_diag 7 9.0 0.013 0.013 15.013 15.016 admm_mo_calc_rho_aux 7 8.0 0.000 0.000 12.535 
12.543 admm_fit_mo_coeffs 7 9.0 0.000 0.000 11.373 11.375 integrate_four_center_load 7 10.0 0.000 0.000 8.885 8.890 hfx_load_balance 1 11.0 0.001 0.001 8.885 8.889 mp_sync 56 10.8 3.977 6.853 3.977 6.853 purify_mo_diag 7 10.0 0.000 0.000 6.782 6.786 qs_vxc_create 14 8.0 0.001 0.001 4.884 4.884 xc_vxc_pw_create 14 9.0 0.016 0.017 4.884 4.884 fit_mo_coeffs 7 10.0 0.000 0.000 4.591 4.595 hfx_load_balance_count 1 12.0 4.350 4.441 4.350 4.441 hfx_load_balance_bin 1 12.0 4.345 4.438 4.345 4.438 ------------------------------------------------------------------------------- Plot: name="H2O-hyb_timings_32omp", title="Timings of H2O-hyb with 32 OpenMP Threads", ylabel="time [s]" PlotPoint: plot="H2O-hyb_timings_32omp", name="rest", label="rest", y=37.583999999999946, yerr=0.0 PlotPoint: plot="H2O-hyb_timings_32omp", name="integrate_four_center_bin", label="integrate_four_center_bin", y=159.836, yerr=0.0 PlotPoint: plot="H2O-hyb_timings_32omp", name="cp_fm_gemm", label="cp_fm_gemm", y=56.38, yerr=0.0 PlotPoint: plot="H2O-hyb_timings_32omp", name="integrate_four_center", label="integrate_four_center", y=9.762, yerr=0.0 PlotPoint: plot="H2O-hyb_timings_32omp", name="grid_collocate_task_list", label="grid_collocate_task_list", y=5.704, yerr=0.0 PlotPoint: plot="H2O-hyb_timings_32omp", name="integrate_four_center_main", label="integrate_four_center_main", y=1.235, yerr=0.0 PlotPoint: plot="H2O-hyb_timings_32omp", name="hfx_load_balance_count", label="hfx_load_balance_count", y=0.0, yerr=0.0 PlotPoint: plot="H2O-hyb_timings_32omp", name="hfx_load_balance_bin", label="hfx_load_balance_bin", y=0.0, yerr=0.0 PlotPoint: plot="H2O-hyb_timings_32omp", name="mp_sync", label="mp_sync", y=0.0, yerr=0.0 Plot: name="H2O-hyb_timings_32mpi", title="Timings of H2O-hyb with 32 MPI Ranks", ylabel="time [s]" PlotPoint: plot="H2O-hyb_timings_32mpi", name="rest", label="rest", y=17.821000000000026, yerr=0.0 PlotPoint: plot="H2O-hyb_timings_32mpi", name="integrate_four_center_bin", 
label="integrate_four_center_bin", y=158.783, yerr=0.0 PlotPoint: plot="H2O-hyb_timings_32mpi", name="cp_fm_gemm", label="cp_fm_gemm", y=25.589, yerr=0.0 PlotPoint: plot="H2O-hyb_timings_32mpi", name="integrate_four_center", label="integrate_four_center", y=0.135, yerr=0.0 PlotPoint: plot="H2O-hyb_timings_32mpi", name="grid_collocate_task_list", label="grid_collocate_task_list", y=0.0, yerr=0.0 PlotPoint: plot="H2O-hyb_timings_32mpi", name="integrate_four_center_main", label="integrate_four_center_main", y=0.005, yerr=0.0 PlotPoint: plot="H2O-hyb_timings_32mpi", name="hfx_load_balance_count", label="hfx_load_balance_count", y=4.35, yerr=0.0 PlotPoint: plot="H2O-hyb_timings_32mpi", name="hfx_load_balance_bin", label="hfx_load_balance_bin", y=4.345, yerr=0.0 PlotPoint: plot="H2O-hyb_timings_32mpi", name="mp_sync", label="mp_sync", y=3.977, yerr=0.0 Running bench_dftb.inp with 1 threads and 32 ranks... done. Running bench_dftb.inp with 32 threads and 1 ranks... done. From /workspace/artifacts/bench_dftb_32omp.out: ------------------------------------------------------------------------------- - - - T I M I N G - - - ------------------------------------------------------------------------------- SUBROUTINE CALLS ASD SELF TIME TOTAL TIME MAXIMUM AVERAGE MAXIMUM AVERAGE MAXIMUM CP2K 1 1.0 0.127 0.127 335.031 335.031 qs_energies 1 2.0 0.000 0.000 334.816 334.816 ls_scf 1 3.0 0.000 0.000 332.653 332.653 ls_scf_main 1 4.0 0.002 0.002 315.215 315.215 density_matrix_trs4 11 5.0 0.013 0.013 162.648 162.648 ls_scf_dm_to_ks 11 5.0 0.000 0.000 144.799 144.799 matrix_ls_to_qs 11 6.0 0.000 0.000 139.949 139.949 dbcsr_multiply_generic 185 6.1 0.516 0.516 107.843 107.843 dbcsr_copy_into_existing 11 7.0 80.638 80.638 80.638 80.638 dbcsr_complete_redistribute 23 7.5 43.985 43.985 64.829 64.829 multiply_cannon 185 7.1 0.476 0.476 64.050 64.050 matrix_decluster 11 7.0 0.000 0.000 59.309 59.309 multiply_cannon_loop 185 8.1 0.515 0.515 44.103 44.103 multiply_cannon_multrec 185 9.1 40.653 
40.653 40.711 40.711
make_m2s                           370   7.1    0.034    0.034   37.209   37.209
make_images                        370   8.1    8.416    8.416   34.415   34.415
arnoldi_extremal                    12   6.1    0.000    0.000   30.936   30.936
arnoldi_normal_ev                   12   7.1    0.026    0.026   30.935   30.935
build_subspace                      23   8.1    0.143    0.143   30.266   30.266
dbcsr_finalize                     646   7.5    0.240    0.240   27.571   27.571
dbcsr_matrix_vector_mult           652   9.0    0.227    0.227   27.205   27.205
dbcsr_matrix_vector_mult_local     652  10.0   25.815   25.815   25.835   25.835
dbcsr_merge_all                    597   8.5    4.932    4.932   25.711   25.711
setup_rec_index_2d                 370   8.1   19.296   19.296   19.296   19.296
tree_to_linear_d                   110   9.4   18.300   18.300   18.300   18.300
dbcsr_sort_indices                1103   9.9   16.581   16.581   16.581   16.581
ls_scf_init_scf                      1   4.0    0.000    0.000   16.463   16.463
ls_scf_init_matrix_S                 1   5.0    0.000    0.000   15.969   15.969
matrix_sqrt_Newton_Schulz            1   6.0    0.001    0.001   14.988   14.988
quick_finalize                     395  10.0    1.101    1.101   14.650   14.650
dbcsr_special_finalize             370   9.1    0.003    0.003   13.543   13.543
make_images_data                   370   9.1    0.013    0.013   12.308   12.308
dbcsr_dot_sd                       144   6.3   11.421   11.421   11.422   11.422
hybrid_alltoall_any                393   9.9    9.513    9.513   10.285   10.285
dbcsr_frobenius_norm               142   6.1    9.236    9.236    9.239    9.239
dbcsr_new_transposed                 2   7.0    0.175    0.175    8.215    8.215
matrix_qs_to_ls                     12   5.1    0.000    0.000    8.063    8.063
matrix_cluster                      12   6.1    0.000    0.000    8.063    8.063
dbcsr_redistribute                   2   8.0    7.944    7.944    8.004    8.004
dbcsr_add_d                        280   6.0    0.001    0.001    7.536    7.536
dbcsr_add_anytype                  280   7.0    1.846    1.846    7.535    7.535
-------------------------------------------------------------------------------

From /workspace/artifacts/bench_dftb_32mpi.out:
-------------------------------------------------------------------------------
- - - T I M I N G - - -
-------------------------------------------------------------------------------
SUBROUTINE                       CALLS   ASD         SELF TIME        TOTAL TIME
                               MAXIMUM        AVERAGE  MAXIMUM  AVERAGE  MAXIMUM
CP2K                                 1   1.0    0.009    0.012  113.610  113.611
qs_energies                          1   2.0    0.000    0.000  113.507  113.508
ls_scf                               1   3.0    0.000    0.000  113.425  113.426
ls_scf_main                          1   4.0    0.001    0.002  108.969  108.970
density_matrix_trs4                 11   5.0    0.009    0.015  104.768  104.855
dbcsr_multiply_generic             185   6.1    0.079    0.090   98.546   98.887
multiply_cannon                    185   7.1    0.050    0.053   81.700   82.864
multiply_cannon_loop               185   8.1    0.271    0.465   77.187   78.164
multiply_cannon_multrec           1480   9.1   48.241   51.451   48.832   52.031
mp_waitall_1                     11936  10.3   26.226   31.189   26.226   31.189
multiply_cannon_metrocomm3        1480   9.1    0.021    0.024   16.098   20.862
make_m2s                           370   7.1    0.037    0.041   11.339   11.541
make_images                        370   8.1    0.760    0.803   11.209   11.419
multiply_cannon_metrocomm1        1480   9.1    0.010    0.011    5.791    9.151
calculate_norms                   2960   9.1    6.090    7.129    6.090    7.129
make_images_data                   370   9.1    0.013    0.015    4.762    5.274
mp_sum_l                          1039   5.9    4.070    5.217    4.070    5.217
arnoldi_extremal                    12   6.1    0.000    0.001    4.406    4.436
arnoldi_normal_ev                   12   7.1    0.002    0.008    4.405    4.436
build_subspace                      23   8.1    0.050    0.067    4.265    4.269
hybrid_alltoall_any                393   9.9    0.343    1.735    3.952    4.260
dbcsr_multiply_generic_mpsum_f     137   7.1    0.001    0.001    3.048    4.180
ls_scf_dm_to_ks                     11   5.0    0.000    0.000    3.648    3.783
dbcsr_matrix_vector_mult           652   9.0    0.019    0.085    3.515    3.662
dbcsr_complete_redistribute         23   7.5    1.922    2.118    3.208    3.465
ls_scf_init_scf                      1   4.0    0.000    0.000    3.430    3.431
matrix_ls_to_qs                     11   6.0    0.000    0.000    3.160    3.418
ls_scf_init_matrix_S                 1   5.0    0.000    0.000    3.388    3.398
matrix_decluster                    11   7.0    0.000    0.000    2.897    3.151
matrix_sqrt_Newton_Schulz            1   6.0    0.001    0.001    3.103    3.106
make_images_pack                   370   9.1    2.747    3.079    2.752    3.084
buffer_matrices_ensure_size        370   8.1    2.525    3.047    2.525    3.047
dbcsr_matrix_vector_mult_local     652  10.0    2.688    2.934    2.691    2.938
dbcsr_add_d                        280   6.0    0.001    0.001    2.446    2.627
dbcsr_add_anytype                  280   7.0    1.370    1.561    2.444    2.626
-------------------------------------------------------------------------------

Plot: name="bench_dftb_timings_32omp", title="Timings of bench_dftb with 32 OpenMP Threads", ylabel="time [s]"
PlotPoint: plot="bench_dftb_timings_32omp", name="rest", label="rest", y=124.644, yerr=0.0
PlotPoint: plot="bench_dftb_timings_32omp", name="dbcsr_copy_into_existing", label="dbcsr_copy_into_existing", y=80.638, yerr=0.0
PlotPoint: plot="bench_dftb_timings_32omp", name="dbcsr_complete_redistribute", label="dbcsr_complete_redistribute", y=43.985, yerr=0.0
PlotPoint: plot="bench_dftb_timings_32omp", name="multiply_cannon_multrec", label="multiply_cannon_multrec", y=40.653, yerr=0.0
PlotPoint: plot="bench_dftb_timings_32omp", name="dbcsr_matrix_vector_mult_local", label="dbcsr_matrix_vector_mult_local", y=25.815, yerr=0.0
PlotPoint: plot="bench_dftb_timings_32omp", name="setup_rec_index_2d", label="setup_rec_index_2d", y=19.296, yerr=0.0
PlotPoint: plot="bench_dftb_timings_32omp", name="mp_waitall_1", label="mp_waitall_1", y=0.0, yerr=0.0
PlotPoint: plot="bench_dftb_timings_32omp", name="make_images_pack", label="make_images_pack", y=0.0, yerr=0.0
PlotPoint: plot="bench_dftb_timings_32omp", name="mp_sum_l", label="mp_sum_l", y=0.0, yerr=0.0
PlotPoint: plot="bench_dftb_timings_32omp", name="calculate_norms", label="calculate_norms", y=0.0, yerr=0.0

Plot: name="bench_dftb_timings_32mpi", title="Timings of bench_dftb with 32 MPI Ranks", ylabel="time [s]"
PlotPoint: plot="bench_dftb_timings_32mpi", name="rest", label="rest", y=21.62599999999999, yerr=0.0
PlotPoint: plot="bench_dftb_timings_32mpi", name="dbcsr_copy_into_existing", label="dbcsr_copy_into_existing", y=0.0, yerr=0.0
PlotPoint: plot="bench_dftb_timings_32mpi", name="dbcsr_complete_redistribute", label="dbcsr_complete_redistribute", y=1.922, yerr=0.0
PlotPoint: plot="bench_dftb_timings_32mpi", name="multiply_cannon_multrec", label="multiply_cannon_multrec", y=48.241, yerr=0.0
PlotPoint: plot="bench_dftb_timings_32mpi", name="dbcsr_matrix_vector_mult_local", label="dbcsr_matrix_vector_mult_local", y=2.688, yerr=0.0
PlotPoint: plot="bench_dftb_timings_32mpi", name="setup_rec_index_2d", label="setup_rec_index_2d", y=0.0, yerr=0.0
PlotPoint: plot="bench_dftb_timings_32mpi", name="mp_waitall_1", label="mp_waitall_1", y=26.226, yerr=0.0
PlotPoint: plot="bench_dftb_timings_32mpi", name="make_images_pack", label="make_images_pack", y=2.747, yerr=0.0
PlotPoint: plot="bench_dftb_timings_32mpi", name="mp_sum_l", label="mp_sum_l", y=4.07, yerr=0.0
PlotPoint: plot="bench_dftb_timings_32mpi", name="calculate_norms", label="calculate_norms", y=6.09, yerr=0.0

Running dbcsr.inp with 1 threads and 32 ranks... done.
Running dbcsr.inp with 32 threads and 1 ranks... done.

From /workspace/artifacts/dbcsr_32omp.out:
-------------------------------------------------------------------------------
- - - T I M I N G - - -
-------------------------------------------------------------------------------
SUBROUTINE                       CALLS   ASD         SELF TIME        TOTAL TIME
                               MAXIMUM        AVERAGE  MAXIMUM  AVERAGE  MAXIMUM
CP2K                                 1   1.0    0.005    0.005  130.709  130.709
lib_test                             1   2.0    0.000    0.000  130.702  130.702
dbcsr_run_tests                      3   3.0    0.003    0.003  130.702  130.702
test_multiplies_multiproc            3   4.0    0.001    0.001  107.325  107.325
dbcsr_redistribute                   9   5.0   70.998   70.998   75.256   75.256
dbcsr_multiply_generic               9   5.0    0.001    0.001   30.157   30.157
dbcsr_make_random_matrix             9   4.0   16.126   16.126   23.277   23.277
multiply_cannon                      9   6.0    0.004    0.004   22.263   22.263
multiply_cannon_loop                 9   7.0    0.005    0.005   21.639   21.639
multiply_cannon_multrec              9   8.0   21.633   21.633   21.634   21.634
dbcsr_finalize                      27   5.7    0.005    0.005   11.668   11.668
dbcsr_merge_all                     18   6.5    3.968    3.968   10.941   10.941
tree_to_linear_d                     9   7.0    4.383    4.383    4.383    4.383
mp_alltoall_d11v                    27   6.0    3.944    3.944    3.944    3.944
make_m2s                            18   6.0    0.001    0.001    2.809    2.809
make_images                         18   7.0    0.666    0.666    2.721    2.721
-------------------------------------------------------------------------------

From /workspace/artifacts/dbcsr_32mpi.out:
-------------------------------------------------------------------------------
- - - T I M I N G - - -
-------------------------------------------------------------------------------
SUBROUTINE                       CALLS   ASD         SELF TIME        TOTAL TIME
                               MAXIMUM        AVERAGE  MAXIMUM  AVERAGE  MAXIMUM
CP2K                                 1   1.0    0.003    0.005   29.896   29.897
lib_test                             1   2.0    0.000    0.000   29.869   29.885
dbcsr_run_tests                      3   3.0    0.000    0.001   29.868   29.884
test_multiplies_multiproc            3   4.0    0.001    0.001   28.684   28.820
dbcsr_multiply_generic               9   5.0    0.001    0.002   26.733   26.808
multiply_cannon                      9   6.0    0.002    0.003   24.022   24.618
multiply_cannon_loop                 9   7.0    0.004    0.004   23.534   24.076
multiply_cannon_multrec             72   8.0   19.302   20.433   19.304   20.435
mp_waitall_1                       576   9.2    4.776    6.348    4.776    6.348
multiply_cannon_metrocomm1          72   8.0    0.002    0.002    3.794    5.614
mp_sum_l                           310   2.7    0.750    1.392    0.750    1.392
dbcsr_multiply_generic_mpsum_f       9   6.0    0.000    0.000    0.744    1.385
make_m2s                            18   6.0    0.001    0.001    1.156    1.227
make_images                         18   7.0    0.028    0.031    1.153    1.224
dbcsr_make_random_matrix             9   4.0    0.892    0.908    1.148    1.195
dbcsr_finalize                      27   5.7    0.001    0.001    0.916    1.047
multiply_cannon_metrocomm3          72   8.0    0.000    0.001    0.425    0.998
dbcsr_merge_all                     18   6.5    0.162    0.183    0.860    0.975
dbcsr_redistribute                   9   5.0    0.444    0.504    0.799    0.837
make_images_data                    18   8.0    0.001    0.001    0.613    0.702
hybrid_alltoall_any                 18   9.0    0.049    0.207    0.503    0.611
-------------------------------------------------------------------------------

Plot: name="dbcsr_timings_32omp", title="Timings of dbcsr with 32 OpenMP Threads", ylabel="time [s]"
PlotPoint: plot="dbcsr_timings_32omp", name="rest", label="rest", y=13.600999999999999, yerr=0.0
PlotPoint: plot="dbcsr_timings_32omp", name="dbcsr_redistribute", label="dbcsr_redistribute", y=70.998, yerr=0.0
PlotPoint: plot="dbcsr_timings_32omp", name="multiply_cannon_multrec", label="multiply_cannon_multrec", y=21.633, yerr=0.0
PlotPoint: plot="dbcsr_timings_32omp", name="dbcsr_make_random_matrix", label="dbcsr_make_random_matrix", y=16.126, yerr=0.0
PlotPoint: plot="dbcsr_timings_32omp", name="tree_to_linear_d", label="tree_to_linear_d", y=4.383, yerr=0.0
PlotPoint: plot="dbcsr_timings_32omp", name="dbcsr_merge_all", label="dbcsr_merge_all", y=3.968, yerr=0.0
PlotPoint: plot="dbcsr_timings_32omp", name="mp_waitall_1", label="mp_waitall_1", y=0.0, yerr=0.0
PlotPoint: plot="dbcsr_timings_32omp", name="mp_sum_l", label="mp_sum_l", y=0.0, yerr=0.0

Plot: name="dbcsr_timings_32mpi", title="Timings of dbcsr with 32 MPI Ranks", ylabel="time [s]"
PlotPoint: plot="dbcsr_timings_32mpi", name="rest", label="rest", y=3.570000000000004, yerr=0.0
PlotPoint: plot="dbcsr_timings_32mpi", name="dbcsr_redistribute", label="dbcsr_redistribute", y=0.444, yerr=0.0
PlotPoint: plot="dbcsr_timings_32mpi", name="multiply_cannon_multrec", label="multiply_cannon_multrec", y=19.302, yerr=0.0
PlotPoint: plot="dbcsr_timings_32mpi", name="dbcsr_make_random_matrix", label="dbcsr_make_random_matrix", y=0.892, yerr=0.0
PlotPoint: plot="dbcsr_timings_32mpi", name="tree_to_linear_d", label="tree_to_linear_d", y=0.0, yerr=0.0
PlotPoint: plot="dbcsr_timings_32mpi", name="dbcsr_merge_all", label="dbcsr_merge_all", y=0.162, yerr=0.0
PlotPoint: plot="dbcsr_timings_32mpi", name="mp_waitall_1", label="mp_waitall_1", y=4.776, yerr=0.0
PlotPoint: plot="dbcsr_timings_32mpi", name="mp_sum_l", label="mp_sum_l", y=0.75, yerr=0.0

Summary: Performance test works fine.
Status: OK

Uploading artifacts... done
EndDate: 2021-04-21 12:00:02+00:00