StartDate: 2021-04-12 20:19:58+00:00 CpuId: 64x Intel Xeon W 2000 / Scalable Bronze 3000 / Silver 4000 / Gold 5000 / 6000 / Platinum 8000 (Skylake), 14nm CommitSHA: 264012ee8e04e215969dab685d19eea9a25848f8 CommitTime: 2021-04-12 22:00:21 +0200 CommitAuthor: Matthias Krack CommitSubject: Update to libvori-210412 (Martin Brehm) Trying to pull image cp2k-toolchain-mpich... success :-) Trying to pull image cp2k-perf-openmp... image not found. #################### Building Image cp2k-perf-openmp #################### Dockerfile: /tools/docker/Dockerfile.test_performance Build-Args: TOOLCHAIN=gcr.io/cp2k-org-project/img_cp2k-toolchain-mpich-arch-f73:gittree-27ae894-buildargs-68b329d Sending build context to Docker daemon 74.24kB Step 1/9 : ARG TOOLCHAIN=cp2k/toolchain:latest Step 2/9 : FROM ${TOOLCHAIN} ---> 6f11dcde980e Step 3/9 : WORKDIR /workspace ---> Running in f6b0eb9c6d5a Removing intermediate container f6b0eb9c6d5a ---> 1345bd8904e4 Step 4/9 : COPY ./scripts/install_basics.sh . ---> b4302b1d4452 Step 5/9 : RUN ./install_basics.sh ---> Running in 61db8fd08272 Installing Ubuntu packages... debconf: delaying package configuration, since apt-utils is not installed Selecting previously unselected package libpopt0:amd64. (Reading database ... 14718 files and directories currently installed.) Preparing to unpack .../libpopt0_1.16-14_amd64.deb ... Unpacking libpopt0:amd64 (1.16-14) ... Selecting previously unselected package rsync.
Preparing to unpack .../rsync_3.1.3-8_amd64.deb ... Unpacking rsync (3.1.3-8) ... Setting up libpopt0:amd64 (1.16-14) ... Setting up rsync (3.1.3-8) ... invoke-rc.d: could not determine current runlevel invoke-rc.d: policy-rc.d denied execution of start. Processing triggers for libc-bin (2.31-0ubuntu9.2) ... done. Cloning cp2k repository... done. Removing intermediate container 61db8fd08272 ---> e1b49a45fbdf Step 6/9 : COPY ./scripts/install_performance.sh . ---> 474a54380f8d Step 7/9 : RUN ./install_performance.sh "local" ---> Running in ca3fbf84ade0 './local.pdbg' -> '/opt/cp2k-toolchain/install/arch/local.pdbg' './local.psmp' -> '/opt/cp2k-toolchain/install/arch/local.psmp' './local.sdbg' -> '/opt/cp2k-toolchain/install/arch/local.sdbg' './local.ssmp' -> '/opt/cp2k-toolchain/install/arch/local.ssmp' './local_coverage.pdbg' -> '/opt/cp2k-toolchain/install/arch/local_coverage.pdbg' './local_valgrind.psmp' -> '/opt/cp2k-toolchain/install/arch/local_valgrind.psmp' './local_valgrind.ssmp' -> '/opt/cp2k-toolchain/install/arch/local_valgrind.ssmp' './local_warn.psmp' -> '/opt/cp2k-toolchain/install/arch/local_warn.psmp' Warming cache by trying to compile cp2k... done. Removing intermediate container ca3fbf84ade0 ---> 1faab453406a Step 8/9 : COPY ./scripts/ci_entrypoint.sh ./scripts/test_performance.sh ./scripts/plot_performance.py ./ ---> be7da1f48859 Step 9/9 : CMD ["./ci_entrypoint.sh", "./test_performance.sh", "local"] ---> Running in b63013fb4a29 Removing intermediate container b63013fb4a29 ---> de676b56a8a7 Successfully built de676b56a8a7 Successfully tagged gcr.io/cp2k-org-project/img_cp2k-perf-openmp-arch-f73:gittree-a03e945-buildargs-200fced Pushing image cp2k-perf-openmp... done. 
#################### Running Image cp2k-perf-openmp #################### ========== Fetching Git Commit ========== CommitSHA: 264012ee8e04e215969dab685d19eea9a25848f8 CommitTime: 2021-04-12 22:00:21 +0200 CommitAuthor: Matthias Krack CommitSubject: Update to libvori-210412 (Martin Brehm) ========== Running Test ========== ========== Compiling CP2K ========== Compiling cp2k... done. ========== Running Performance Test ========== Running H2O-64.inp with 1 threads and 32 ranks... done. Running H2O-64.inp with 32 threads and 1 ranks... done. From /workspace/artifacts/H2O-64_32omp.out: ------------------------------------------------------------------------------- - - - T I M I N G - - - ------------------------------------------------------------------------------- SUBROUTINE CALLS ASD SELF TIME TOTAL TIME MAXIMUM AVERAGE MAXIMUM AVERAGE MAXIMUM CP2K 1 1.0 0.040 0.040 216.126 216.126 qs_mol_dyn_low 1 2.0 0.004 0.004 215.260 215.260 qs_forces 11 3.9 0.002 0.002 215.198 215.198 qs_energies 11 4.9 0.001 0.001 201.445 201.445 scf_env_do_scf 11 5.9 0.001 0.001 167.975 167.975 velocity_verlet 10 3.0 0.002 0.002 152.041 152.041 scf_env_do_scf_inner_loop 108 6.5 0.014 0.014 111.045 111.045 init_scf_loop 11 6.9 0.000 0.000 56.722 56.722 prepare_preconditioner 11 7.9 0.000 0.000 51.446 51.446 make_preconditioner 11 8.9 0.000 0.000 51.446 51.446 make_full_inverse_cholesky 11 9.9 0.000 0.000 49.387 49.387 rebuild_ks_matrix 119 8.3 0.001 0.001 49.005 49.005 qs_ks_build_kohn_sham_matrix 119 9.3 0.022 0.022 49.004 49.004 qs_rho_update_rho 119 7.7 0.001 0.001 46.326 46.326 calculate_rho_elec 119 8.7 1.629 1.629 46.325 46.325 qs_ks_update_qs_env 119 7.6 0.001 0.001 43.926 43.926 grid_collocate_task_list 119 9.7 37.983 37.983 37.983 37.983 sum_up_and_integrate 119 10.3 0.438 0.438 35.080 35.080 integrate_v_rspace 119 11.3 0.193 0.193 34.642 34.642 cp_fm_cholesky_invert 11 10.9 32.583 32.583 32.583 32.583 grid_integrate_task_list 119 12.3 30.809 30.809 30.809 30.809 qs_scf_new_mos 108 
7.5 0.001 0.001 29.373 29.373 qs_scf_loop_do_ot 108 8.5 0.001 0.001 29.372 29.372 dbcsr_multiply_generic 2286 12.5 0.216 0.216 28.308 28.308 ot_scf_mini 108 9.5 0.004 0.004 27.602 27.602 ot_mini 108 10.5 0.001 0.001 18.753 18.753 make_m2s 4572 13.5 0.075 0.075 17.384 17.384 init_scf_run 11 5.9 0.001 0.001 16.654 16.654 scf_env_initial_rho_setup 11 6.9 0.001 0.001 16.653 16.653 wfi_extrapolate 11 7.9 0.001 0.001 15.690 15.690 qs_energies_init_hamiltonians 11 5.9 0.000 0.000 12.546 12.546 cp_gemm 81 9.0 0.000 0.000 12.524 12.524 cp_gemm_fm_gemm 81 10.0 0.000 0.000 12.524 12.524 cp_fm_gemm 81 11.0 12.523 12.523 12.523 12.523 pw_transfer 1439 11.6 0.123 0.123 11.702 11.702 fft_wrap_pw1pw2 1201 12.6 0.013 0.013 11.289 11.289 cp_fm_cholesky_decompose 22 10.9 10.160 10.160 10.160 10.160 qs_ot_get_derivative 108 11.5 0.002 0.002 9.636 9.636 fft_wrap_pw1pw2_140 487 13.2 0.747 0.747 9.404 9.404 ot_diis_step 108 11.5 0.006 0.006 9.112 9.112 make_images 4572 14.5 2.867 2.867 9.004 9.004 dbcsr_make_dense_low 5837 15.5 0.139 0.139 8.614 8.614 make_dense_data 5837 16.5 6.776 6.776 8.455 8.455 dbcsr_make_images_dense 3978 14.8 0.029 0.029 7.737 7.737 dbcsr_copy 2102 12.0 0.338 0.338 7.726 7.726 apply_preconditioner_dbcsr 119 12.6 0.000 0.000 7.565 7.565 apply_single 119 13.6 0.001 0.001 7.564 7.564 dbcsr_copy_into_existing 22 7.9 7.280 7.280 7.281 7.281 qs_ks_update_qs_env_forces 11 4.9 0.000 0.000 7.073 7.073 dbcsr_complete_redistribute 329 12.2 3.252 3.252 6.957 6.957 qs_env_update_s_mstruct 11 6.9 0.000 0.000 6.727 6.727 density_rs2pw 119 9.7 0.007 0.007 6.713 6.713 build_core_hamiltonian_matrix_ 11 4.9 0.001 0.001 6.677 6.677 qs_create_task_list 11 7.9 0.000 0.000 6.106 6.106 generate_qs_task_list 11 8.9 4.082 4.082 6.106 6.106 fft3d_s 1202 14.6 5.781 5.781 5.788 5.788 copy_dbcsr_to_fm 153 11.3 0.004 0.004 5.743 5.743 multiply_cannon 2286 13.5 0.421 0.421 5.512 5.512 pw_poisson_solve 119 10.3 2.269 2.269 5.280 5.280 build_core_hamiltonian_matrix 11 6.9 0.001 0.001 5.103 5.103 
transfer_dbcsr_to_fm 11 10.9 0.000 0.000 4.522 4.522 multiply_cannon_loop 2286 14.5 0.103 0.103 4.490 4.490 qs_ot_get_p 119 10.4 0.001 0.001 4.447 4.447 multiply_cannon_multrec 2286 15.5 4.299 4.299 4.385 4.385 ------------------------------------------------------------------------------- From /workspace/artifacts/H2O-64_32mpi.out: ------------------------------------------------------------------------------- - - - T I M I N G - - - ------------------------------------------------------------------------------- SUBROUTINE CALLS ASD SELF TIME TOTAL TIME MAXIMUM AVERAGE MAXIMUM AVERAGE MAXIMUM CP2K 1 1.0 0.010 0.013 132.334 132.335 qs_mol_dyn_low 1 2.0 0.007 0.008 132.202 132.209 qs_forces 11 3.9 0.002 0.003 132.139 132.140 qs_energies 11 4.9 0.002 0.002 126.072 126.077 scf_env_do_scf 11 5.9 0.001 0.001 112.399 112.403 velocity_verlet 10 3.0 0.002 0.002 90.258 90.259 scf_env_do_scf_inner_loop 108 6.5 0.007 0.016 75.831 75.832 init_scf_loop 11 6.9 0.001 0.001 36.536 36.538 rebuild_ks_matrix 119 8.3 0.001 0.001 36.072 36.103 qs_ks_build_kohn_sham_matrix 119 9.3 0.027 0.029 36.071 36.103 prepare_preconditioner 11 7.9 0.000 0.000 32.678 32.700 make_preconditioner 11 8.9 0.000 0.000 32.678 32.699 make_full_inverse_cholesky 11 9.9 0.000 0.000 32.446 32.481 qs_ks_update_qs_env 119 7.6 0.001 0.002 31.987 32.024 cp_fm_cholesky_invert 11 10.9 31.396 31.418 31.396 31.418 qs_rho_update_rho 119 7.7 0.001 0.002 28.520 28.568 calculate_rho_elec 119 8.7 0.051 0.054 28.519 28.567 sum_up_and_integrate 119 10.3 0.073 0.079 27.566 27.595 integrate_v_rspace 119 11.3 0.006 0.007 27.493 27.526 dbcsr_multiply_generic 2286 12.5 0.160 0.163 25.164 25.227 qs_scf_new_mos 108 7.5 0.001 0.002 21.316 21.351 qs_scf_loop_do_ot 108 8.5 0.001 0.002 21.314 21.350 ot_scf_mini 108 9.5 0.004 0.005 20.001 20.034 grid_collocate_task_list 119 9.7 19.022 19.785 19.022 19.785 grid_integrate_task_list 119 12.3 18.555 19.037 18.555 19.037 multiply_cannon 2286 13.5 0.265 0.274 16.943 17.305 multiply_cannon_loop 
2286 14.5 0.274 0.284 15.305 15.690 mp_waitall_1 169478 16.3 13.192 13.502 13.192 13.502 ot_mini 108 10.5 0.001 0.001 11.663 11.693 rs_pw_transfer 974 11.9 0.022 0.023 9.760 10.622 init_scf_run 11 5.9 0.000 0.002 9.834 9.835 scf_env_initial_rho_setup 11 6.9 0.000 0.001 9.834 9.835 density_rs2pw 119 9.7 0.010 0.012 8.656 9.527 wfi_extrapolate 11 7.9 0.001 0.002 9.264 9.265 multiply_cannon_metrocomm3 18288 15.5 0.091 0.094 8.436 8.667 pw_transfer 1439 11.6 0.162 0.179 8.486 8.556 fft_wrap_pw1pw2 1201 12.6 0.017 0.019 8.118 8.192 potential_pw2rs 119 12.3 0.011 0.013 7.278 7.292 cp_gemm 81 9.0 0.000 0.001 7.075 7.085 cp_gemm_fm_gemm 81 10.0 0.000 0.000 7.074 7.084 cp_fm_gemm 81 11.0 7.074 7.084 7.074 7.084 fft_wrap_pw1pw2_140 487 13.2 0.715 0.747 6.779 6.960 fft3d_ps 1201 14.6 3.200 3.427 6.224 6.337 ot_diis_step 108 11.5 0.005 0.006 5.827 5.828 apply_preconditioner_dbcsr 119 12.6 0.000 0.001 5.766 5.796 apply_single 119 13.6 0.001 0.001 5.766 5.795 qs_ot_get_derivative 108 11.5 0.002 0.002 5.758 5.786 multiply_cannon_multrec 18288 15.5 5.226 5.449 5.247 5.470 make_m2s 4572 13.5 0.085 0.091 5.371 5.465 make_images 4572 14.5 0.202 0.212 4.562 4.655 mp_waitany 9880 13.7 3.694 4.590 3.694 4.590 qs_ks_update_qs_env_forces 11 4.9 0.000 0.001 4.384 4.395 rs_pw_transfer_RS2PW_140 130 11.5 0.637 0.658 3.217 4.092 rs_pw_transfer_PW2RS_140 130 13.9 1.507 1.560 3.325 3.380 qs_ot_get_p 119 10.4 0.001 0.002 3.221 3.265 mp_alltoall_d11v 2130 13.8 2.082 2.695 2.082 2.695 ------------------------------------------------------------------------------- Plot: name="H2O-64_timings_32omp", title="Timings of H2O-64 with 32 OpenMP Threads", ylabel="time [s]" PlotPoint: plot="H2O-64_timings_32omp", name="rest", label="rest", y=92.06800000000001, yerr=0.0 PlotPoint: plot="H2O-64_timings_32omp", name="grid_collocate_task_list", label="grid_collocate_task_list", y=37.983, yerr=0.0 PlotPoint: plot="H2O-64_timings_32omp", name="cp_fm_cholesky_invert", label="cp_fm_cholesky_invert", y=32.583, 
yerr=0.0 PlotPoint: plot="H2O-64_timings_32omp", name="grid_integrate_task_list", label="grid_integrate_task_list", y=30.809, yerr=0.0 PlotPoint: plot="H2O-64_timings_32omp", name="cp_fm_gemm", label="cp_fm_gemm", y=12.523, yerr=0.0 PlotPoint: plot="H2O-64_timings_32omp", name="cp_fm_cholesky_decompose", label="cp_fm_cholesky_decompose", y=10.16, yerr=0.0 PlotPoint: plot="H2O-64_timings_32omp", name="mp_waitall_1", label="mp_waitall_1", y=0.0, yerr=0.0 Plot: name="H2O-64_timings_32mpi", title="Timings of H2O-64 with 32 MPI Ranks", ylabel="time [s]" PlotPoint: plot="H2O-64_timings_32mpi", name="rest", label="rest", y=43.095, yerr=0.0 PlotPoint: plot="H2O-64_timings_32mpi", name="grid_collocate_task_list", label="grid_collocate_task_list", y=19.022, yerr=0.0 PlotPoint: plot="H2O-64_timings_32mpi", name="cp_fm_cholesky_invert", label="cp_fm_cholesky_invert", y=31.396, yerr=0.0 PlotPoint: plot="H2O-64_timings_32mpi", name="grid_integrate_task_list", label="grid_integrate_task_list", y=18.555, yerr=0.0 PlotPoint: plot="H2O-64_timings_32mpi", name="cp_fm_gemm", label="cp_fm_gemm", y=7.074, yerr=0.0 PlotPoint: plot="H2O-64_timings_32mpi", name="cp_fm_cholesky_decompose", label="cp_fm_cholesky_decompose", y=0.0, yerr=0.0 PlotPoint: plot="H2O-64_timings_32mpi", name="mp_waitall_1", label="mp_waitall_1", y=13.192, yerr=0.0 Running H2O-64_nonortho.inp with 1 threads and 32 ranks... done. Running H2O-64_nonortho.inp with 32 threads and 1 ranks... done. 
From /workspace/artifacts/H2O-64_nonortho_32omp.out: ------------------------------------------------------------------------------- - - - T I M I N G - - - ------------------------------------------------------------------------------- SUBROUTINE CALLS ASD SELF TIME TOTAL TIME MAXIMUM AVERAGE MAXIMUM AVERAGE MAXIMUM CP2K 1 1.0 0.036 0.036 253.969 253.969 qs_mol_dyn_low 1 2.0 0.004 0.004 253.083 253.083 qs_forces 11 3.9 0.002 0.002 253.019 253.019 qs_energies 11 4.9 0.001 0.001 236.928 236.928 scf_env_do_scf 11 5.9 0.001 0.001 199.456 199.456 velocity_verlet 10 3.0 0.002 0.002 175.075 175.075 scf_env_do_scf_inner_loop 96 6.5 0.013 0.013 138.668 138.668 rebuild_ks_matrix 107 8.3 0.001 0.001 70.307 70.307 qs_ks_build_kohn_sham_matrix 107 9.3 0.021 0.021 70.306 70.306 qs_rho_update_rho 107 7.7 0.001 0.001 62.986 62.986 calculate_rho_elec 107 8.7 1.461 1.461 62.985 62.985 qs_ks_update_qs_env 107 7.6 0.001 0.001 62.456 62.456 init_scf_loop 11 6.9 0.001 0.001 60.539 60.539 sum_up_and_integrate 107 10.3 0.401 0.401 56.121 56.121 grid_collocate_task_list 107 9.7 55.859 55.859 55.859 55.859 integrate_v_rspace 107 11.3 0.205 0.205 55.720 55.720 prepare_preconditioner 11 7.9 0.000 0.000 51.637 51.637 make_preconditioner 11 8.9 0.000 0.000 51.637 51.637 grid_integrate_task_list 107 12.3 51.032 51.032 51.032 51.032 make_full_inverse_cholesky 11 9.9 0.000 0.000 49.533 49.533 cp_fm_cholesky_invert 11 10.9 32.866 32.866 32.866 32.866 qs_scf_new_mos 96 7.5 0.001 0.001 27.326 27.326 qs_scf_loop_do_ot 96 8.5 0.001 0.001 27.325 27.325 dbcsr_multiply_generic 1966 12.4 0.187 0.187 26.147 26.147 ot_scf_mini 96 9.5 0.003 0.003 25.630 25.630 init_scf_run 11 5.9 0.001 0.001 19.141 19.141 scf_env_initial_rho_setup 11 6.9 0.001 0.001 19.140 19.140 wfi_extrapolate 11 7.9 0.001 0.001 17.918 17.918 ot_mini 96 10.5 0.001 0.001 17.849 17.849 make_m2s 3932 13.4 0.065 0.065 15.438 15.438 qs_energies_init_hamiltonians 11 5.9 0.000 0.000 13.987 13.987 pw_transfer 1295 11.6 0.111 0.111 12.575 12.575 
cp_gemm 81 9.0 0.000 0.000 12.232 12.232 cp_gemm_fm_gemm 81 10.0 0.000 0.000 12.231 12.231 cp_fm_gemm 81 11.0 12.231 12.231 12.231 12.231 fft_wrap_pw1pw2 1081 12.6 0.011 0.011 12.191 12.191 fft_wrap_pw1pw2_140 439 13.2 1.246 1.246 10.928 10.928 qs_ot_get_derivative 96 11.5 0.002 0.002 10.120 10.120 cp_fm_cholesky_decompose 22 10.9 9.700 9.700 9.700 9.700 qs_ks_update_qs_env_forces 11 4.9 0.000 0.000 9.681 9.681 make_images 3932 14.4 2.511 2.511 8.325 8.325 dbcsr_copy 1855 11.9 1.189 1.189 8.235 8.235 qs_env_update_s_mstruct 11 6.9 0.000 0.000 8.050 8.050 ot_diis_step 96 11.5 0.006 0.006 7.725 7.725 qs_create_task_list 11 7.9 0.000 0.000 7.428 7.428 generate_qs_task_list 11 8.9 5.396 5.396 7.428 7.428 dbcsr_complete_redistribute 317 12.2 3.284 3.284 7.246 7.246 dbcsr_copy_into_existing 22 7.9 6.969 6.969 6.970 6.970 dbcsr_make_dense_low 4961 15.5 0.117 0.117 6.441 6.441 build_core_hamiltonian_matrix_ 11 4.9 0.001 0.001 6.407 6.407 make_dense_data 4961 16.5 5.367 5.367 6.307 6.307 fft3d_s 1082 14.6 6.249 6.249 6.256 6.256 copy_dbcsr_to_fm 147 11.2 0.004 0.004 5.958 5.958 apply_preconditioner_dbcsr 107 12.6 0.000 0.000 5.739 5.739 apply_single 107 13.6 0.001 0.001 5.739 5.739 dbcsr_make_images_dense 3386 14.7 0.024 0.024 5.677 5.677 density_rs2pw 107 9.7 0.007 0.007 5.665 5.665 build_core_hamiltonian_matrix 11 6.9 0.001 0.001 5.228 5.228 ------------------------------------------------------------------------------- From /workspace/artifacts/H2O-64_nonortho_32mpi.out: ------------------------------------------------------------------------------- - - - T I M I N G - - - ------------------------------------------------------------------------------- SUBROUTINE CALLS ASD SELF TIME TOTAL TIME MAXIMUM AVERAGE MAXIMUM AVERAGE MAXIMUM CP2K 1 1.0 0.010 0.016 183.602 183.603 qs_mol_dyn_low 1 2.0 0.005 0.006 183.470 183.478 qs_forces 11 3.9 0.002 0.003 183.411 183.411 qs_energies 11 4.9 0.001 0.002 173.735 173.738 scf_env_do_scf 11 5.9 0.001 0.002 157.472 157.473 
velocity_verlet 10 3.0 0.002 0.002 119.828 119.829 scf_env_do_scf_inner_loop 96 6.5 0.006 0.015 117.810 117.811 rebuild_ks_matrix 107 8.3 0.001 0.001 65.453 65.486 qs_ks_build_kohn_sham_matrix 107 9.3 0.023 0.025 65.452 65.485 sum_up_and_integrate 107 10.3 0.063 0.068 58.034 58.079 integrate_v_rspace 107 11.3 0.005 0.006 57.971 58.017 qs_ks_update_qs_env 107 7.6 0.001 0.002 57.640 57.671 qs_rho_update_rho 107 7.7 0.001 0.001 54.809 54.840 calculate_rho_elec 107 8.7 0.046 0.049 54.808 54.839 grid_integrate_task_list 107 12.3 48.449 50.115 48.449 50.115 grid_collocate_task_list 107 9.7 45.691 47.240 45.691 47.240 init_scf_loop 11 6.9 0.001 0.001 39.637 39.638 prepare_preconditioner 11 7.9 0.000 0.000 32.463 32.471 make_preconditioner 11 8.9 0.000 0.000 32.462 32.471 make_full_inverse_cholesky 11 9.9 0.000 0.000 32.259 32.291 cp_fm_cholesky_invert 11 10.9 31.222 31.246 31.222 31.246 dbcsr_multiply_generic 1966 12.4 0.134 0.137 21.374 21.517 qs_scf_new_mos 96 7.5 0.001 0.001 17.598 17.635 qs_scf_loop_do_ot 96 8.5 0.001 0.002 17.597 17.634 ot_scf_mini 96 9.5 0.003 0.004 16.540 16.584 multiply_cannon 1966 13.4 0.222 0.227 14.664 14.996 multiply_cannon_loop 1966 14.4 0.231 0.241 13.348 13.757 init_scf_run 11 5.9 0.000 0.002 12.541 12.542 scf_env_initial_rho_setup 11 6.9 0.000 0.001 12.541 12.541 mp_waitall_1 146670 16.2 11.601 11.954 11.601 11.954 wfi_extrapolate 11 7.9 0.001 0.002 11.757 11.758 rs_pw_transfer 878 11.9 0.020 0.021 9.836 11.381 density_rs2pw 107 9.7 0.009 0.010 8.322 9.891 ot_mini 96 10.5 0.001 0.001 9.492 9.534 qs_ks_update_qs_env_forces 11 4.9 0.000 0.000 8.058 8.066 multiply_cannon_metrocomm3 15728 15.4 0.077 0.081 7.338 7.731 pw_transfer 1295 11.6 0.142 0.149 7.534 7.616 fft_wrap_pw1pw2 1081 12.6 0.015 0.015 7.212 7.284 potential_pw2rs 107 12.3 0.010 0.011 6.985 7.000 cp_gemm 81 9.0 0.000 0.001 6.851 6.861 cp_gemm_fm_gemm 81 10.0 0.000 0.000 6.851 6.861 cp_fm_gemm 81 11.0 6.850 6.861 6.850 6.861 fft_wrap_pw1pw2_140 439 13.2 0.632 0.666 6.058 6.250 
mp_waitany 8968 13.7 4.218 5.705 4.218 5.705 fft3d_ps 1081 14.6 2.779 2.973 5.554 5.652 rs_pw_transfer_RS2PW_140 118 11.5 0.524 0.546 3.658 5.189 apply_preconditioner_dbcsr 107 12.6 0.000 0.001 5.020 5.061 apply_single 107 13.6 0.001 0.001 5.019 5.061 ot_diis_step 96 11.5 0.005 0.005 4.950 4.950 multiply_cannon_multrec 15728 15.4 4.594 4.797 4.611 4.814 mp_alltoall_d11v 1998 13.7 2.920 4.633 2.920 4.633 make_m2s 3932 13.4 0.071 0.076 4.522 4.593 qs_ot_get_derivative 96 11.5 0.001 0.002 4.494 4.540 rs_gather_matrices 107 12.3 0.161 0.176 2.471 4.122 make_images 3932 14.4 0.172 0.176 3.824 3.905 ------------------------------------------------------------------------------- Plot: name="H2O-64_nonortho_timings_32omp", title="Timings of H2O-64_nonortho with 32 OpenMP Threads", ylabel="time [s]" PlotPoint: plot="H2O-64_nonortho_timings_32omp", name="rest", label="rest", y=92.281, yerr=0.0 PlotPoint: plot="H2O-64_nonortho_timings_32omp", name="grid_collocate_task_list", label="grid_collocate_task_list", y=55.859, yerr=0.0 PlotPoint: plot="H2O-64_nonortho_timings_32omp", name="grid_integrate_task_list", label="grid_integrate_task_list", y=51.032, yerr=0.0 PlotPoint: plot="H2O-64_nonortho_timings_32omp", name="cp_fm_cholesky_invert", label="cp_fm_cholesky_invert", y=32.866, yerr=0.0 PlotPoint: plot="H2O-64_nonortho_timings_32omp", name="cp_fm_gemm", label="cp_fm_gemm", y=12.231, yerr=0.0 PlotPoint: plot="H2O-64_nonortho_timings_32omp", name="cp_fm_cholesky_decompose", label="cp_fm_cholesky_decompose", y=9.7, yerr=0.0 PlotPoint: plot="H2O-64_nonortho_timings_32omp", name="mp_waitall_1", label="mp_waitall_1", y=0.0, yerr=0.0 Plot: name="H2O-64_nonortho_timings_32mpi", title="Timings of H2O-64_nonortho with 32 MPI Ranks", ylabel="time [s]" PlotPoint: plot="H2O-64_nonortho_timings_32mpi", name="rest", label="rest", y=39.789000000000016, yerr=0.0 PlotPoint: plot="H2O-64_nonortho_timings_32mpi", name="grid_collocate_task_list", label="grid_collocate_task_list", y=45.691, 
yerr=0.0 PlotPoint: plot="H2O-64_nonortho_timings_32mpi", name="grid_integrate_task_list", label="grid_integrate_task_list", y=48.449, yerr=0.0 PlotPoint: plot="H2O-64_nonortho_timings_32mpi", name="cp_fm_cholesky_invert", label="cp_fm_cholesky_invert", y=31.222, yerr=0.0 PlotPoint: plot="H2O-64_nonortho_timings_32mpi", name="cp_fm_gemm", label="cp_fm_gemm", y=6.85, yerr=0.0 PlotPoint: plot="H2O-64_nonortho_timings_32mpi", name="cp_fm_cholesky_decompose", label="cp_fm_cholesky_decompose", y=0.0, yerr=0.0 PlotPoint: plot="H2O-64_nonortho_timings_32mpi", name="mp_waitall_1", label="mp_waitall_1", y=11.601, yerr=0.0 Running H2O-hyb.inp with 1 threads and 32 ranks... done. Running H2O-hyb.inp with 32 threads and 1 ranks... done. From /workspace/artifacts/H2O-hyb_32omp.out: ------------------------------------------------------------------------------- - - - T I M I N G - - - ------------------------------------------------------------------------------- SUBROUTINE CALLS ASD SELF TIME TOTAL TIME MAXIMUM AVERAGE MAXIMUM AVERAGE MAXIMUM CP2K 1 1.0 0.364 0.364 269.755 269.755 qs_energies 1 2.0 0.000 0.000 268.484 268.484 scf_env_do_scf 1 3.0 0.000 0.000 264.083 264.083 qs_ks_update_qs_env 8 5.0 0.000 0.000 250.916 250.916 rebuild_ks_matrix 7 6.0 0.000 0.000 250.802 250.802 qs_ks_build_kohn_sham_matrix 7 7.0 0.002 0.002 250.802 250.802 hfx_ks_matrix 7 8.0 0.000 0.000 179.507 179.507 integrate_four_center 7 9.0 9.855 9.855 179.476 179.476 integrate_four_center_main 7 10.0 1.768 1.768 160.132 160.132 integrate_four_center_bin 446 11.0 158.364 158.364 158.364 158.364 scf_env_do_scf_inner_loop 7 4.0 0.001 0.001 152.171 152.171 init_scf_loop 1 4.0 0.000 0.000 111.893 111.893 cp_gemm 129 10.3 0.001 0.001 54.935 54.935 cp_gemm_fm_gemm 129 11.3 0.000 0.000 54.935 54.935 cp_fm_gemm 129 12.3 54.934 54.934 54.934 54.934 admm_mo_calc_rho_aux 7 8.0 0.000 0.000 32.278 32.278 admm_fit_mo_coeffs 7 9.0 0.000 0.000 29.449 29.449 admm_mo_merge_derivs 7 8.0 0.000 0.000 27.775 27.775 
merge_mo_derivs_diag 7 9.0 0.023 0.023 27.775 27.775 purify_mo_diag 7 10.0 0.001 0.001 15.900 15.900 fit_mo_coeffs 7 10.0 0.000 0.000 13.549 13.549 integrate_four_center_load 7 10.0 0.000 0.000 8.925 8.925 hfx_load_balance 1 11.0 0.003 0.003 8.925 8.925 prepare_preconditioner 1 5.0 0.000 0.000 7.007 7.007 make_preconditioner 1 6.0 0.000 0.000 7.007 7.007 calculate_rho_elec 15 7.4 0.201 0.201 6.893 6.893 grid_collocate_task_list 15 8.4 6.047 6.047 6.047 6.047 ------------------------------------------------------------------------------- From /workspace/artifacts/H2O-hyb_32mpi.out: ------------------------------------------------------------------------------- - - - T I M I N G - - - ------------------------------------------------------------------------------- SUBROUTINE CALLS ASD SELF TIME TOTAL TIME MAXIMUM AVERAGE MAXIMUM AVERAGE MAXIMUM CP2K 1 1.0 0.029 0.070 218.836 218.837 qs_energies 1 2.0 0.000 0.000 218.598 218.599 scf_env_do_scf 1 3.0 0.000 0.000 217.882 217.882 qs_ks_update_qs_env 8 5.0 0.000 0.000 211.186 211.187 rebuild_ks_matrix 7 6.0 0.000 0.000 211.171 211.171 qs_ks_build_kohn_sham_matrix 7 7.0 0.002 0.002 211.171 211.171 hfx_ks_matrix 7 8.0 0.001 0.001 174.594 174.615 integrate_four_center 7 9.0 0.122 0.399 174.580 174.602 integrate_four_center_main 7 10.0 0.005 0.005 160.223 164.344 integrate_four_center_bin 448 11.0 160.219 164.339 160.219 164.339 scf_env_do_scf_inner_loop 7 4.0 0.001 0.002 129.772 129.772 init_scf_loop 1 4.0 0.000 0.000 88.108 88.109 cp_gemm 129 10.3 0.001 0.001 26.576 26.586 cp_gemm_fm_gemm 129 11.3 0.000 0.001 26.575 26.586 cp_fm_gemm 129 12.3 26.575 26.585 26.575 26.585 admm_mo_merge_derivs 7 8.0 0.000 0.000 15.386 15.388 merge_mo_derivs_diag 7 9.0 0.013 0.014 15.386 15.388 admm_mo_calc_rho_aux 7 8.0 0.000 0.000 13.448 13.457 admm_fit_mo_coeffs 7 9.0 0.000 0.000 11.988 11.991 integrate_four_center_load 7 10.0 0.000 0.000 9.294 9.304 hfx_load_balance 1 11.0 0.001 0.001 9.294 9.304 mp_sync 56 10.8 4.166 7.383 4.166 7.383 
purify_mo_diag 7 10.0 0.000 0.001 6.936 6.940 fit_mo_coeffs 7 10.0 0.000 0.000 5.052 5.054 qs_vxc_create 14 8.0 0.001 0.001 4.865 4.865 xc_vxc_pw_create 14 9.0 0.017 0.019 4.864 4.865 hfx_load_balance_count 1 12.0 4.415 4.638 4.415 4.638 hfx_load_balance_bin 1 12.0 4.420 4.610 4.420 4.610 ------------------------------------------------------------------------------- Plot: name="H2O-hyb_timings_32omp", title="Timings of H2O-hyb with 32 OpenMP Threads", ylabel="time [s]" PlotPoint: plot="H2O-hyb_timings_32omp", name="rest", label="rest", y=38.787000000000006, yerr=0.0 PlotPoint: plot="H2O-hyb_timings_32omp", name="integrate_four_center_bin", label="integrate_four_center_bin", y=158.364, yerr=0.0 PlotPoint: plot="H2O-hyb_timings_32omp", name="cp_fm_gemm", label="cp_fm_gemm", y=54.934, yerr=0.0 PlotPoint: plot="H2O-hyb_timings_32omp", name="integrate_four_center", label="integrate_four_center", y=9.855, yerr=0.0 PlotPoint: plot="H2O-hyb_timings_32omp", name="grid_collocate_task_list", label="grid_collocate_task_list", y=6.047, yerr=0.0 PlotPoint: plot="H2O-hyb_timings_32omp", name="integrate_four_center_main", label="integrate_four_center_main", y=1.768, yerr=0.0 PlotPoint: plot="H2O-hyb_timings_32omp", name="mp_sync", label="mp_sync", y=0.0, yerr=0.0 PlotPoint: plot="H2O-hyb_timings_32omp", name="hfx_load_balance_bin", label="hfx_load_balance_bin", y=0.0, yerr=0.0 PlotPoint: plot="H2O-hyb_timings_32omp", name="hfx_load_balance_count", label="hfx_load_balance_count", y=0.0, yerr=0.0 Plot: name="H2O-hyb_timings_32mpi", title="Timings of H2O-hyb with 32 MPI Ranks", ylabel="time [s]" PlotPoint: plot="H2O-hyb_timings_32mpi", name="rest", label="rest", y=18.914000000000044, yerr=0.0 PlotPoint: plot="H2O-hyb_timings_32mpi", name="integrate_four_center_bin", label="integrate_four_center_bin", y=160.219, yerr=0.0 PlotPoint: plot="H2O-hyb_timings_32mpi", name="cp_fm_gemm", label="cp_fm_gemm", y=26.575, yerr=0.0 PlotPoint: plot="H2O-hyb_timings_32mpi", 
name="integrate_four_center", label="integrate_four_center", y=0.122, yerr=0.0 PlotPoint: plot="H2O-hyb_timings_32mpi", name="grid_collocate_task_list", label="grid_collocate_task_list", y=0.0, yerr=0.0 PlotPoint: plot="H2O-hyb_timings_32mpi", name="integrate_four_center_main", label="integrate_four_center_main", y=0.005, yerr=0.0 PlotPoint: plot="H2O-hyb_timings_32mpi", name="mp_sync", label="mp_sync", y=4.166, yerr=0.0 PlotPoint: plot="H2O-hyb_timings_32mpi", name="hfx_load_balance_bin", label="hfx_load_balance_bin", y=4.42, yerr=0.0 PlotPoint: plot="H2O-hyb_timings_32mpi", name="hfx_load_balance_count", label="hfx_load_balance_count", y=4.415, yerr=0.0 Running bench_dftb.inp with 1 threads and 32 ranks... done. Running bench_dftb.inp with 32 threads and 1 ranks... done. From /workspace/artifacts/bench_dftb_32omp.out: ------------------------------------------------------------------------------- - - - T I M I N G - - - ------------------------------------------------------------------------------- SUBROUTINE CALLS ASD SELF TIME TOTAL TIME MAXIMUM AVERAGE MAXIMUM AVERAGE MAXIMUM CP2K 1 1.0 0.090 0.090 328.699 328.699 qs_energies 1 2.0 0.000 0.000 328.529 328.529 ls_scf 1 3.0 0.000 0.000 326.561 326.561 ls_scf_main 1 4.0 0.002 0.002 311.425 311.425 density_matrix_trs4 11 5.0 0.012 0.012 161.039 161.039 ls_scf_dm_to_ks 11 5.0 0.000 0.000 142.167 142.167 matrix_ls_to_qs 11 6.0 0.000 0.000 137.524 137.524 dbcsr_multiply_generic 185 6.1 0.500 0.500 103.927 103.927 dbcsr_copy_into_existing 11 7.0 82.282 82.282 82.282 82.282 multiply_cannon 185 7.1 1.103 1.103 61.122 61.122 dbcsr_complete_redistribute 23 7.5 43.079 43.079 60.369 60.369 matrix_decluster 11 7.0 0.000 0.000 55.240 55.240 multiply_cannon_loop 185 8.1 0.408 0.408 40.180 40.180 multiply_cannon_multrec 185 9.1 37.519 37.519 37.573 37.573 make_m2s 370 7.1 0.032 0.032 35.763 35.763 arnoldi_extremal 12 6.1 0.000 0.000 33.639 33.639 arnoldi_normal_ev 12 7.1 0.027 0.027 33.638 33.638 make_images 370 8.1 7.977 7.977 
                                                                 32.969   32.969
 build_subspace                      23  8.1    0.138    0.138   32.896   32.896
 dbcsr_matrix_vector_mult           652  9.0    0.217    0.217   32.007   32.007
 dbcsr_matrix_vector_mult_local     652 10.0   30.607   30.607   30.625   30.625
 dbcsr_finalize                     646  7.5    0.230    0.230   23.227   23.227
 dbcsr_merge_all                    597  8.5    4.207    4.207   21.369   21.369
 setup_rec_index_2d                 370  8.1   19.681   19.681   19.681   19.681
 dbcsr_sort_indices                1103  9.9   18.797   18.797   18.797   18.797
 quick_finalize                     395 10.0    0.549    0.549   16.028   16.028
 tree_to_linear_d                   110  9.4   14.826   14.826   14.826   14.826
 dbcsr_special_finalize             370  9.1    0.003    0.003   14.766   14.766
 ls_scf_init_scf                      1  4.0    0.000    0.000   13.944   13.944
 ls_scf_init_matrix_S                 1  5.0    0.000    0.000   13.445   13.445
 matrix_sqrt_Newton_Schulz            1  6.0    0.001    0.001   12.508   12.508
 make_images_data                   370  9.1    0.011    0.011   10.033   10.033
 dbcsr_dot_sd                       144  6.3    9.752    9.752    9.754    9.754
 hybrid_alltoall_any                393  9.9    7.857    7.857    8.675    8.675
 matrix_qs_to_ls                     12  5.1    0.000    0.000    8.396    8.396
 matrix_cluster                      12  6.1    0.000    0.000    8.396    8.396
 dbcsr_frobenius_norm               142  6.1    8.382    8.382    8.384    8.384
 dbcsr_new_transposed                 2  7.0    0.147    0.147    8.289    8.289
 dbcsr_redistribute                   2  8.0    8.028    8.028    8.097    8.097
 dbcsr_add_d                        280  6.0    0.001    0.001    6.877    6.877
 dbcsr_add_anytype                  280  7.0    1.559    1.559    6.876    6.876
 -------------------------------------------------------------------------------

From /workspace/artifacts/bench_dftb_32mpi.out:
 -------------------------------------------------------------------------------
 -                                                                             -
 -                                T I M I N G                                  -
 -                                                                             -
 -------------------------------------------------------------------------------
 SUBROUTINE                       CALLS  ASD         SELF TIME        TOTAL TIME
                                MAXIMUM       AVERAGE  MAXIMUM  AVERAGE  MAXIMUM
 CP2K                                 1  1.0    0.008    0.009   97.437   97.438
 qs_energies                          1  2.0    0.000    0.000   97.334   97.334
 ls_scf                               1  3.0    0.000    0.000   97.246   97.247
 ls_scf_main                          1  4.0    0.000    0.003   93.624   93.625
 density_matrix_trs4                 11  5.0    0.009    0.014   89.718   89.777
 dbcsr_multiply_generic             185  6.1    0.073    0.093   83.582   83.714
 multiply_cannon                    185  7.1    0.047    0.058   69.237   70.472
 multiply_cannon_loop               185  8.1    0.213    0.224   64.997   66.762
 multiply_cannon_multrec           1480  9.1   43.027   45.268   43.534   45.765
 mp_waitall_1                     11936 10.3   19.811   22.268   19.811   22.268
 multiply_cannon_metrocomm3        1480  9.1    0.019    0.020   11.902   15.883
 make_m2s                           370  7.1    0.034    0.037    9.566   10.040
 make_images                        370  8.1    0.741    0.767    9.444    9.921
 multiply_cannon_metrocomm1        1480  9.1    0.009    0.010    4.417    6.259
 calculate_norms                   2960  9.1    4.852    5.234    4.852    5.234
 make_images_data                   370  9.1    0.012    0.013    3.846    4.539
 arnoldi_extremal                    12  6.1    0.000    0.001    4.361    4.371
 arnoldi_normal_ev                   12  7.1    0.002    0.008    4.360    4.370
 mp_sum_l                          1039  5.9    3.469    4.318    3.469    4.318
 build_subspace                      23  8.1    0.037    0.052    4.192    4.196
 dbcsr_matrix_vector_mult           652  9.0    0.019    0.079    3.506    3.593
 ls_scf_dm_to_ks                     11  5.0    0.000    0.000    3.393    3.462
 hybrid_alltoall_any                393  9.9    0.282    1.440    3.040    3.365
 dbcsr_complete_redistribute         23  7.5    1.871    1.957    2.936    3.114
 matrix_ls_to_qs                     11  6.0    0.000    0.000    2.904    3.092
 dbcsr_multiply_generic_mpsum_f     137  7.1    0.001    0.001    2.386    3.053
 ls_scf_init_scf                      1  4.0    0.000    0.000    2.830    2.831
 matrix_decluster                    11  7.0    0.000    0.000    2.641    2.817
 ls_scf_init_matrix_S                 1  5.0    0.000    0.000    2.789    2.796
 dbcsr_matrix_vector_mult_local     652 10.0    2.672    2.787    2.675    2.791
 make_images_pack                   370  9.1    2.322    2.589    2.327    2.593
 matrix_sqrt_Newton_Schulz            1  6.0    0.000    0.001    2.563    2.566
 dbcsr_add_d                        280  6.0    0.001    0.002    2.055    2.193
 dbcsr_add_anytype                  280  7.0    1.115    1.247    2.054    2.191
 buffer_matrices_ensure_size        370  8.1    2.019    2.143    2.019    2.143
 dbcsr_finalize                     646  7.5    0.014    0.015    1.883    2.048
 -------------------------------------------------------------------------------

Plot: name="bench_dftb_timings_32omp", title="Timings of bench_dftb with 32 OpenMP Threads", ylabel="time [s]"
PlotPoint: plot="bench_dftb_timings_32omp", name="rest", label="rest", y=115.531, yerr=0.0
PlotPoint: plot="bench_dftb_timings_32omp", name="dbcsr_copy_into_existing", label="dbcsr_copy_into_existing", y=82.282, yerr=0.0
PlotPoint: plot="bench_dftb_timings_32omp", name="dbcsr_complete_redistribute", label="dbcsr_complete_redistribute", y=43.079, yerr=0.0
PlotPoint: plot="bench_dftb_timings_32omp", name="multiply_cannon_multrec", label="multiply_cannon_multrec", y=37.519, yerr=0.0
PlotPoint: plot="bench_dftb_timings_32omp", name="dbcsr_matrix_vector_mult_local", label="dbcsr_matrix_vector_mult_local", y=30.607, yerr=0.0
PlotPoint: plot="bench_dftb_timings_32omp", name="setup_rec_index_2d", label="setup_rec_index_2d", y=19.681, yerr=0.0
PlotPoint: plot="bench_dftb_timings_32omp", name="calculate_norms", label="calculate_norms", y=0.0, yerr=0.0
PlotPoint: plot="bench_dftb_timings_32omp", name="mp_sum_l", label="mp_sum_l", y=0.0, yerr=0.0
PlotPoint: plot="bench_dftb_timings_32omp", name="mp_waitall_1", label="mp_waitall_1", y=0.0, yerr=0.0

Plot: name="bench_dftb_timings_32mpi", title="Timings of bench_dftb with 32 MPI Ranks", ylabel="time [s]"
PlotPoint: plot="bench_dftb_timings_32mpi", name="rest", label="rest", y=21.735, yerr=0.0
PlotPoint: plot="bench_dftb_timings_32mpi", name="dbcsr_copy_into_existing", label="dbcsr_copy_into_existing", y=0.0, yerr=0.0
PlotPoint: plot="bench_dftb_timings_32mpi", name="dbcsr_complete_redistribute", label="dbcsr_complete_redistribute", y=1.871, yerr=0.0
PlotPoint: plot="bench_dftb_timings_32mpi", name="multiply_cannon_multrec", label="multiply_cannon_multrec", y=43.027, yerr=0.0
PlotPoint: plot="bench_dftb_timings_32mpi", name="dbcsr_matrix_vector_mult_local", label="dbcsr_matrix_vector_mult_local", y=2.672, yerr=0.0
PlotPoint: plot="bench_dftb_timings_32mpi", name="setup_rec_index_2d", label="setup_rec_index_2d", y=0.0, yerr=0.0
PlotPoint: plot="bench_dftb_timings_32mpi", name="calculate_norms", label="calculate_norms", y=4.852, yerr=0.0
PlotPoint: plot="bench_dftb_timings_32mpi", name="mp_sum_l", label="mp_sum_l", y=3.469, yerr=0.0
PlotPoint: plot="bench_dftb_timings_32mpi", name="mp_waitall_1", label="mp_waitall_1", y=19.811, yerr=0.0

Running dbcsr.inp with 1 threads and 32 ranks... done.
Running dbcsr.inp with 32 threads and 1 ranks... done.
From /workspace/artifacts/dbcsr_32omp.out:
 -------------------------------------------------------------------------------
 -                                                                             -
 -                                T I M I N G                                  -
 -                                                                             -
 -------------------------------------------------------------------------------
 SUBROUTINE                       CALLS  ASD         SELF TIME        TOTAL TIME
                                MAXIMUM       AVERAGE  MAXIMUM  AVERAGE  MAXIMUM
 CP2K                                 1  1.0    0.006    0.006  122.595  122.595
 lib_test                             1  2.0    0.000    0.000  122.588  122.588
 dbcsr_run_tests                      3  3.0    0.003    0.003  122.588  122.588
 test_multiplies_multiproc            3  4.0    0.001    0.001  101.120  101.120
 dbcsr_redistribute                   9  5.0   68.453   68.453   72.052   72.052
 dbcsr_multiply_generic               9  5.0    0.001    0.001   25.299   25.299
 dbcsr_make_random_matrix             9  4.0   15.786   15.786   21.378   21.378
 multiply_cannon                      9  6.0    0.004    0.004   17.013   17.013
 multiply_cannon_loop                 9  7.0    0.005    0.005   16.444   16.444
 multiply_cannon_multrec              9  8.0   16.438   16.438   16.439   16.439
 dbcsr_finalize                      27  5.7    0.004    0.004    9.620    9.620
 dbcsr_merge_all                     18  6.5    3.314    3.314    8.963    8.963
 make_m2s                            18  6.0    0.001    0.001    3.673    3.673
 make_images                         18  7.0    0.821    0.821    3.593    3.593
 tree_to_linear_d                     9  7.0    3.393    3.393    3.393    3.393
 mp_alltoall_d11v                    27  6.0    3.301    3.301    3.301    3.301
 dbcsr_checksum                       6  5.0    2.758    2.758    2.758    2.758
 dbcsr_data_release                 975  7.6    2.559    2.559    2.559    2.559
 -------------------------------------------------------------------------------

From /workspace/artifacts/dbcsr_32mpi.out:
 -------------------------------------------------------------------------------
 -                                                                             -
 -                                T I M I N G                                  -
 -                                                                             -
 -------------------------------------------------------------------------------
 SUBROUTINE                       CALLS  ASD         SELF TIME        TOTAL TIME
                                MAXIMUM       AVERAGE  MAXIMUM  AVERAGE  MAXIMUM
 CP2K                                 1  1.0    0.003    0.005   26.570   26.570
 lib_test                             1  2.0    0.000    0.000   26.544   26.561
 dbcsr_run_tests                      3  3.0    0.000    0.001   26.542   26.559
 test_multiplies_multiproc            3  4.0    0.001    0.001   25.412   25.467
 dbcsr_multiply_generic               9  5.0    0.001    0.001   23.199   23.274
 multiply_cannon                      9  6.0    0.002    0.003   20.694   21.293
 multiply_cannon_loop                 9  7.0    0.003    0.003   20.029   20.560
 multiply_cannon_multrec             72  8.0   16.243   16.668   16.244   16.670
 mp_waitall_1                       576  9.2    4.201    5.100    4.201    5.100
 multiply_cannon_metrocomm1          72  8.0    0.001    0.001    3.286    4.303
 mp_sum_l                           310  2.7    0.894    1.820    0.894    1.820
 dbcsr_multiply_generic_mpsum_f       9  6.0    0.000    0.000    0.890    1.816
 dbcsr_make_random_matrix             9  4.0    0.887    0.913    1.095    1.301
 make_m2s                            18  6.0    0.001    0.001    0.929    1.218
 make_images                         18  7.0    0.027    0.029    0.926    1.215
 multiply_cannon_metrocomm3          72  8.0    0.000    0.000    0.488    1.025
 dbcsr_finalize                      27  5.7    0.000    0.001    0.762    0.990
 dbcsr_merge_all                     18  6.5    0.127    0.146    0.710    0.962
 dbcsr_redistribute                   9  5.0    0.396    0.450    0.729    0.770
 make_images_data                    18  8.0    0.001    0.001    0.470    0.739
 hybrid_alltoall_any                 18  9.0    0.038    0.152    0.394    0.658
 make_images_pack                    18  8.0    0.312    0.644    0.312    0.644
 dbcsr_data_copy_aa2                 18  7.5    0.412    0.638    0.412    0.638
 dbcsr_data_release                 444  7.6    0.470    0.539    0.470    0.539
 -------------------------------------------------------------------------------

Plot: name="dbcsr_timings_32omp", title="Timings of dbcsr with 32 OpenMP Threads", ylabel="time [s]"
PlotPoint: plot="dbcsr_timings_32omp", name="rest", label="rest", y=12.651999999999987, yerr=0.0
PlotPoint: plot="dbcsr_timings_32omp", name="dbcsr_redistribute", label="dbcsr_redistribute", y=68.453, yerr=0.0
PlotPoint: plot="dbcsr_timings_32omp", name="multiply_cannon_multrec", label="multiply_cannon_multrec", y=16.438, yerr=0.0
PlotPoint: plot="dbcsr_timings_32omp", name="dbcsr_make_random_matrix", label="dbcsr_make_random_matrix", y=15.786, yerr=0.0
PlotPoint: plot="dbcsr_timings_32omp", name="tree_to_linear_d", label="tree_to_linear_d", y=3.393, yerr=0.0
PlotPoint: plot="dbcsr_timings_32omp", name="dbcsr_merge_all", label="dbcsr_merge_all", y=3.314, yerr=0.0
PlotPoint: plot="dbcsr_timings_32omp", name="dbcsr_data_release", label="dbcsr_data_release", y=2.559, yerr=0.0
PlotPoint: plot="dbcsr_timings_32omp", name="mp_sum_l", label="mp_sum_l", y=0.0, yerr=0.0
PlotPoint: plot="dbcsr_timings_32omp", name="mp_waitall_1", label="mp_waitall_1", y=0.0, yerr=0.0

Plot: name="dbcsr_timings_32mpi", title="Timings of dbcsr with 32 MPI Ranks", ylabel="time [s]"
PlotPoint: plot="dbcsr_timings_32mpi", name="rest", label="rest", y=3.352000000000004, yerr=0.0
PlotPoint: plot="dbcsr_timings_32mpi", name="dbcsr_redistribute", label="dbcsr_redistribute", y=0.396, yerr=0.0
PlotPoint: plot="dbcsr_timings_32mpi", name="multiply_cannon_multrec", label="multiply_cannon_multrec", y=16.243, yerr=0.0
PlotPoint: plot="dbcsr_timings_32mpi", name="dbcsr_make_random_matrix", label="dbcsr_make_random_matrix", y=0.887, yerr=0.0
PlotPoint: plot="dbcsr_timings_32mpi", name="tree_to_linear_d", label="tree_to_linear_d", y=0.0, yerr=0.0
PlotPoint: plot="dbcsr_timings_32mpi", name="dbcsr_merge_all", label="dbcsr_merge_all", y=0.127, yerr=0.0
PlotPoint: plot="dbcsr_timings_32mpi", name="dbcsr_data_release", label="dbcsr_data_release", y=0.47, yerr=0.0
PlotPoint: plot="dbcsr_timings_32mpi", name="mp_sum_l", label="mp_sum_l", y=0.894, yerr=0.0
PlotPoint: plot="dbcsr_timings_32mpi", name="mp_waitall_1", label="mp_waitall_1", y=4.201, yerr=0.0

Summary: Performance test works fine.
Status: OK

Uploading artifacts... done
EndDate: 2021-04-12 21:11:42+00:00