StartDate: 2021-06-24 11:38:31+00:00
CpuId: 64x Intel Xeon W 2000 / D-2100 (Skylake / Cascade Lake) {Skylake}, 14nm
CommitSHA: 39cf5242465d8ab0732fd4b4c6aecc5a67344d21
CommitTime: 2021-06-23 21:16:25 +0200
CommitAuthor: Ole Schütt
CommitSubject: fm: Upgrade to ELPA 2021.05.001 and use GPU kernel by default

Trying to pull image cp2k-toolchain-mpich... success :-)
Trying to pull image cp2k-perf-openmp... image not found.

#################### Building Image cp2k-perf-openmp ####################
Dockerfile: /tools/docker/Dockerfile.test_performance
Build-Args: TOOLCHAIN=gcr.io/cp2k-org-project/img_cp2k-toolchain-mpich-arch-b51:gittree-736db60-buildargs-68b329d
Sending build context to Docker daemon 73.73kB
Step 1/9 : ARG TOOLCHAIN=cp2k/toolchain:latest
Step 2/9 : FROM ${TOOLCHAIN}
 ---> ee3a0372b046
Step 3/9 : WORKDIR /workspace
 ---> Running in 797bc6306040
Removing intermediate container 797bc6306040
 ---> 8c514ca8b90c
Step 4/9 : COPY ./scripts/install_basics.sh .
 ---> f3983dae66e9
Step 5/9 : RUN ./install_basics.sh
 ---> Running in 6769c5dcf340
Installing Ubuntu packages...
debconf: delaying package configuration, since apt-utils is not installed
Selecting previously unselected package libpopt0:amd64.
(Reading database ... 14724 files and directories currently installed.)
Preparing to unpack .../libpopt0_1.16-14_amd64.deb ...
Unpacking libpopt0:amd64 (1.16-14) ...
Selecting previously unselected package rsync.
Preparing to unpack .../rsync_3.1.3-8_amd64.deb ...
Unpacking rsync (3.1.3-8) ...
Setting up libpopt0:amd64 (1.16-14) ...
Setting up rsync (3.1.3-8) ...
invoke-rc.d: could not determine current runlevel
invoke-rc.d: policy-rc.d denied execution of start.
Processing triggers for libc-bin (2.31-0ubuntu9.2) ...
done.
Cloning cp2k repository... done.
Removing intermediate container 6769c5dcf340
 ---> f6ce0986e027
Step 6/9 : COPY ./scripts/install_performance.sh .
 ---> 8a4a2f65d3b9
Step 7/9 : RUN ./install_performance.sh "local"
 ---> Running in 38c1aab4dc8a
'./local.pdbg' -> '/opt/cp2k-toolchain/install/arch/local.pdbg'
'./local.psmp' -> '/opt/cp2k-toolchain/install/arch/local.psmp'
'./local.sdbg' -> '/opt/cp2k-toolchain/install/arch/local.sdbg'
'./local.ssmp' -> '/opt/cp2k-toolchain/install/arch/local.ssmp'
'./local_coverage.pdbg' -> '/opt/cp2k-toolchain/install/arch/local_coverage.pdbg'
'./local_static.psmp' -> '/opt/cp2k-toolchain/install/arch/local_static.psmp'
'./local_static.ssmp' -> '/opt/cp2k-toolchain/install/arch/local_static.ssmp'
'./local_warn.psmp' -> '/opt/cp2k-toolchain/install/arch/local_warn.psmp'
Warming cache by trying to compile cp2k... done.
Removing intermediate container 38c1aab4dc8a
 ---> 1d79fa6ecdbf
Step 8/9 : COPY ./scripts/ci_entrypoint.sh ./scripts/test_performance.sh ./scripts/plot_performance.py ./
 ---> cf9608298610
Step 9/9 : CMD ["./ci_entrypoint.sh", "./test_performance.sh", "local"]
 ---> Running in ba919df05793
Removing intermediate container ba919df05793
 ---> bd82237bba20
Successfully built bd82237bba20
Successfully tagged gcr.io/cp2k-org-project/img_cp2k-perf-openmp-arch-b51:gittree-7c98060-buildargs-93ede11
Pushing image cp2k-perf-openmp... done.

#################### Running Image cp2k-perf-openmp ####################

========== Fetching Git Commit ==========
CommitSHA: 39cf5242465d8ab0732fd4b4c6aecc5a67344d21
CommitTime: 2021-06-23 21:16:25 +0200
CommitAuthor: Ole Schütt
CommitSubject: fm: Upgrade to ELPA 2021.05.001 and use GPU kernel by default

========== Running Test ==========

========== Compiling CP2K ==========
Compiling cp2k... done.

========== Running Performance Test ==========

Running H2O-64.inp with 1 threads and 32 ranks... done.
Running H2O-64.inp with 32 threads and 1 ranks... done.

From /workspace/artifacts/H2O-64_32omp.out:
-------------------------------------------------------------------------------
-                                                                             -
-                                T I M I N G                                  -
-                                                                             -
-------------------------------------------------------------------------------
SUBROUTINE                       CALLS  ASD         SELF TIME        TOTAL TIME
                               MAXIMUM       AVERAGE  MAXIMUM  AVERAGE  MAXIMUM
CP2K                                 1  1.0    0.032    0.032  153.402  153.402
qs_mol_dyn_low                       1  2.0    0.004    0.004  152.612  152.612
qs_forces                           11  3.9    0.001    0.001  152.553  152.553
qs_energies                         11  4.9    0.001    0.001  141.577  141.577
scf_env_do_scf                      11  5.9    0.001    0.001  113.790  113.790
velocity_verlet                     10  3.0    0.002    0.002  103.402  103.402
scf_env_do_scf_inner_loop          108  6.5    0.011    0.011   88.370   88.370
rebuild_ks_matrix                  119  8.3    0.001    0.001   41.227   41.227
qs_ks_build_kohn_sham_matrix       119  9.3    0.018    0.018   41.226   41.226
qs_ks_update_qs_env                119  7.6    0.001    0.001   36.923   36.923
qs_rho_update_rho                  119  7.7    0.001    0.001   36.343   36.343
calculate_rho_elec                 119  8.7    1.544    1.544   36.342   36.342
grid_collocate_task_list           119  9.7   30.440   30.440   30.440   30.440
sum_up_and_integrate               119 10.3    0.380    0.380   29.767   29.767
integrate_v_rspace                 119 11.3    0.148    0.148   29.387   29.387
grid_integrate_task_list           119 12.3   26.867   26.867   26.867   26.867
init_scf_loop                       11  6.9    0.000    0.000   25.228   25.228
qs_scf_new_mos                     108  7.5    0.001    0.001   22.432   22.432
qs_scf_loop_do_ot                  108  8.5    0.001    0.001   22.431   22.431
ot_scf_mini                        108  9.5    0.003    0.003   21.117   21.117
dbcsr_multiply_generic            2286 12.5    0.163    0.163   21.038   21.038
prepare_preconditioner              11  7.9    0.000    0.000   20.728   20.728
make_preconditioner                 11  8.9    0.000    0.000   20.728   20.728
make_full_inverse_cholesky          11  9.9    0.000    0.000   18.718   18.718
ot_mini                            108 10.5    0.001    0.001   13.707   13.707
init_scf_run                        11  5.9    0.001    0.001   13.476   13.476
scf_env_initial_rho_setup           11  6.9    0.001    0.001   13.474   13.474
wfi_extrapolate                     11  7.9    0.001    0.001   12.653   12.653
make_m2s                          4572 13.5    0.062    0.062   12.512   12.512
cp_gemm                             81  9.0    0.001    0.001   10.584   10.584
cp_gemm_cosma                       81 10.0   10.583   10.583   10.583   10.583
qs_energies_init_hamiltonians       11  5.9    0.000    0.000   10.572   10.572
pw_transfer                       1439 11.6    0.094    0.094    7.358    7.358
cp_fm_cholesky_decompose            22 10.9    7.303    7.303    7.303    7.303
ot_diis_step                       108 11.5    0.005    0.005    7.212    7.212
fft_wrap_pw1pw2                   1201 12.6    0.010    0.010    7.070    7.070
dbcsr_make_dense_low              5837 15.5    0.087    0.087    6.547    6.547
qs_ot_get_derivative               108 11.5    0.001    0.001    6.491    6.491
make_dense_data                   5837 16.5    5.835    5.835    6.443    6.443
dbcsr_complete_redistribute        329 12.2    3.056    3.056    6.421    6.421
qs_env_update_s_mstruct             11  6.9    0.000    0.000    6.365    6.365
make_images                       4572 14.5    2.309    2.309    6.286    6.286
apply_preconditioner_dbcsr         119 12.6    0.000    0.000    6.137    6.137
apply_single                       119 13.6    0.000    0.000    6.137    6.137
qs_ks_update_qs_env_forces          11  4.9    0.000    0.000    6.133    6.133
fft_wrap_pw1pw2_140                487 13.2    0.590    0.590    5.962    5.962
dbcsr_make_images_dense           3978 14.8    0.025    0.025    5.834    5.834
qs_create_task_list                 11  7.9    0.000    0.000    5.825    5.825
generate_qs_task_list               11  8.9    3.951    3.951    5.825    5.825
copy_dbcsr_to_fm                   153 11.3    0.003    0.003    5.282    5.282
cp_fm_cholesky_invert               11 10.9    5.107    5.107    5.107    5.107
build_core_hamiltonian_matrix_      11  4.9    0.001    0.001    4.841    4.841
multiply_cannon                   2286 13.5    0.258    0.258    4.622    4.622
pw_poisson_solve                   119 10.3    1.922    1.922    4.586    4.586
dbcsr_copy                        2102 12.0    0.272    0.272    4.552    4.552
density_rs2pw                      119  9.7    0.006    0.006    4.358    4.358
transfer_dbcsr_to_fm                11 10.9    0.000    0.000    4.344    4.344
dbcsr_copy_into_existing            22  7.9    4.238    4.238    4.238    4.238
multiply_cannon_loop              2286 14.5    0.044    0.044    3.958    3.958
multiply_cannon_multrec           2286 15.5    3.849    3.849    3.912    3.912
qs_ot_get_p                        119 10.4    0.001    0.001    3.870    3.870
build_core_hamiltonian_matrix       11  6.9    0.001    0.001    3.593    3.593
qs_energies_compute_matrix_w        11  5.9    0.000    0.000    3.573    3.573
calculate_w_matrix_ot               11  6.9    0.009    0.009    3.573    3.573
copy_fm_to_dbcsr                   176 11.2    0.002    0.002    3.179    3.179
fft3d_s                           1202 14.6    3.153    3.153    3.159    3.159
-------------------------------------------------------------------------------

From /workspace/artifacts/H2O-64_32mpi.out:
-------------------------------------------------------------------------------
-                                                                             -
-                                T I M I N G                                  -
-                                                                             -
-------------------------------------------------------------------------------
SUBROUTINE                       CALLS  ASD         SELF TIME        TOTAL TIME
                               MAXIMUM       AVERAGE  MAXIMUM  AVERAGE  MAXIMUM
CP2K                                 1  1.0    0.009    0.013   70.798   70.799
qs_mol_dyn_low                       1  2.0    0.004    0.005   70.691   70.697
qs_forces                           11  3.9    0.002    0.002   70.638   70.639
qs_energies                         11  4.9    0.001    0.001   65.699   65.700
scf_env_do_scf                      11  5.9    0.001    0.001   60.609   60.610
scf_env_do_scf_inner_loop          108  6.5    0.003    0.011   56.209   56.209
velocity_verlet                     10  3.0    0.002    0.002   41.385   41.386
rebuild_ks_matrix                  119  8.3    0.001    0.001   28.331   28.363
qs_ks_build_kohn_sham_matrix       119  9.3    0.020    0.021   28.330   28.363
qs_ks_update_qs_env                119  7.6    0.001    0.001   25.133   25.165
qs_rho_update_rho                  119  7.7    0.001    0.001   22.557   22.565
calculate_rho_elec                 119  8.7    0.048    0.049   22.556   22.564
sum_up_and_integrate               119 10.3    0.044    0.047   22.507   22.550
integrate_v_rspace                 119 11.3    0.004    0.005   22.462   22.507
grid_integrate_task_list           119 12.3   16.539   17.206   16.539   17.206
grid_collocate_task_list           119  9.7   16.505   17.043   16.505   17.043
dbcsr_multiply_generic            2286 12.5    0.118    0.122   16.210   16.396
qs_scf_new_mos                     108  7.5    0.001    0.001   13.230   13.275
qs_scf_loop_do_ot                  108  8.5    0.001    0.001   13.230   13.274
ot_scf_mini                        108  9.5    0.003    0.003   12.425   12.475
multiply_cannon                   2286 13.5    0.214    0.221   10.931   11.251
multiply_cannon_loop              2286 14.5    0.197    0.213    9.939   10.232
mp_waitall_1                    169478 16.3    7.928    8.162    7.928    8.162
ot_mini                            108 10.5    0.001    0.001    7.394    7.441
rs_pw_transfer                     974 11.9    0.016    0.018    6.369    7.213
density_rs2pw                      119  9.7    0.008    0.009    5.474    6.332
pw_transfer                       1439 11.6    0.128    0.134    5.491    5.579
multiply_cannon_metrocomm3       18288 15.5    0.067    0.073    5.029    5.344
fft_wrap_pw1pw2                   1201 12.6    0.014    0.015    5.211    5.285
potential_pw2rs                    119 12.3    0.009    0.010    4.868    4.874
fft_wrap_pw1pw2_140                487 13.2    0.531    0.548    4.530    4.698
init_scf_loop                       11  6.9    0.000    0.000    4.385    4.385
multiply_cannon_multrec          18288 15.5    3.814    3.932    3.828    3.947
fft3d_ps                          1201 14.6    2.131    2.238    3.863    3.925
ot_diis_step                       108 11.5    0.004    0.005    3.777    3.777
apply_preconditioner_dbcsr         119 12.6    0.000    0.000    3.708    3.747
apply_single                       119 13.6    0.001    0.001    3.707    3.746
make_m2s                          4572 13.5    0.071    0.074    3.604    3.665
qs_ot_get_derivative               108 11.5    0.001    0.001    3.588    3.631
init_scf_run                        11  5.9    0.000    0.002    3.509    3.510
scf_env_initial_rho_setup           11  6.9    0.000    0.001    3.509    3.509
qs_ks_update_qs_env_forces          11  4.9    0.000    0.000    3.401    3.405
wfi_extrapolate                     11  7.9    0.001    0.001    3.143    3.143
make_images                       4572 14.5    0.180    0.184    2.961    3.025
mp_waitany                        9880 13.7    2.152    3.016    2.152    3.016
rs_pw_transfer_RS2PW_140           130 11.5    0.521    0.554    1.951    2.810
rs_pw_transfer_PW2RS_140           130 13.9    1.170    1.225    2.454    2.486
mp_alltoall_d11v                  2130 13.8    1.343    1.909    1.343    1.909
qs_ot_get_p                        119 10.4    0.001    0.001    1.697    1.734
rs_gather_matrices                 119 12.3    0.122    0.137    1.006    1.585
build_core_hamiltonian_matrix_      11  4.9    0.001    0.001    1.407    1.507
make_images_data                  4572 15.5    0.055    0.060    1.397    1.507
prepare_preconditioner              11  7.9    0.000    0.000    1.432    1.441
make_preconditioner                 11  8.9    0.000    0.000    1.432    1.441
-------------------------------------------------------------------------------

Plot: name="H2O-64_timings_32omp", title="Timings of H2O-64 with 32 OpenMP Threads", ylabel="time [s]"
PlotPoint: plot="H2O-64_timings_32omp", name="rest", label="rest", y=68.52499999999999, yerr=0.0
PlotPoint: plot="H2O-64_timings_32omp", name="grid_collocate_task_list", label="grid_collocate_task_list", y=30.44, yerr=0.0
PlotPoint: plot="H2O-64_timings_32omp", name="grid_integrate_task_list", label="grid_integrate_task_list", y=26.867, yerr=0.0
PlotPoint: plot="H2O-64_timings_32omp", name="cp_gemm_cosma", label="cp_gemm_cosma", y=10.583, yerr=0.0
PlotPoint: plot="H2O-64_timings_32omp", name="cp_fm_cholesky_decompose", label="cp_fm_cholesky_decompose", y=7.303, yerr=0.0
PlotPoint: plot="H2O-64_timings_32omp", name="make_dense_data", label="make_dense_data", y=5.835, yerr=0.0
PlotPoint: plot="H2O-64_timings_32omp", name="multiply_cannon_multrec", label="multiply_cannon_multrec", y=3.849, yerr=0.0
PlotPoint: plot="H2O-64_timings_32omp", name="mp_waitany", label="mp_waitany", y=0.0, yerr=0.0
PlotPoint: plot="H2O-64_timings_32omp", name="mp_waitall_1", label="mp_waitall_1", y=0.0, yerr=0.0

Plot: name="H2O-64_timings_32mpi", title="Timings of H2O-64 with 32 MPI Ranks", ylabel="time [s]"
PlotPoint: plot="H2O-64_timings_32mpi", name="rest", label="rest", y=23.860000000000007, yerr=0.0
PlotPoint: plot="H2O-64_timings_32mpi", name="grid_collocate_task_list", label="grid_collocate_task_list", y=16.505, yerr=0.0
PlotPoint: plot="H2O-64_timings_32mpi", name="grid_integrate_task_list", label="grid_integrate_task_list", y=16.539, yerr=0.0
PlotPoint: plot="H2O-64_timings_32mpi", name="cp_gemm_cosma", label="cp_gemm_cosma", y=0.0, yerr=0.0
PlotPoint: plot="H2O-64_timings_32mpi", name="cp_fm_cholesky_decompose", label="cp_fm_cholesky_decompose", y=0.0, yerr=0.0
PlotPoint: plot="H2O-64_timings_32mpi", name="make_dense_data", label="make_dense_data", y=0.0, yerr=0.0
PlotPoint: plot="H2O-64_timings_32mpi", name="multiply_cannon_multrec", label="multiply_cannon_multrec", y=3.814, yerr=0.0
PlotPoint: plot="H2O-64_timings_32mpi", name="mp_waitany", label="mp_waitany", y=2.152, yerr=0.0
PlotPoint: plot="H2O-64_timings_32mpi", name="mp_waitall_1", label="mp_waitall_1", y=7.928, yerr=0.0

Running H2O-64_nonortho.inp with 1 threads and 32 ranks... done.
Running H2O-64_nonortho.inp with 32 threads and 1 ranks... done.
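
Note on the "rest" PlotPoint: the H2O-64 values above appear consistent with "rest" being the CP2K total time (AVERAGE column) minus the self times of the individually plotted routines. The sketch below only re-does that arithmetic with numbers copied from this log; the helper name `rest_time` and the `RUNS` layout are hypothetical and are not taken from plot_performance.py.

```python
# Numbers copied from the H2O-64 timing reports and PlotPoint lines in this log.
# Assumption (not confirmed by the log itself): "rest" is the CP2K total time
# (AVERAGE column) minus the sum of the plotted self times.
RUNS = {
    "H2O-64_timings_32omp": {
        "total": 153.402,
        "self_times": {
            "grid_collocate_task_list": 30.440,
            "grid_integrate_task_list": 26.867,
            "cp_gemm_cosma": 10.583,
            "cp_fm_cholesky_decompose": 7.303,
            "make_dense_data": 5.835,
            "multiply_cannon_multrec": 3.849,
            "mp_waitany": 0.0,
            "mp_waitall_1": 0.0,
        },
        "logged_rest": 68.525,
    },
    "H2O-64_timings_32mpi": {
        "total": 70.798,
        "self_times": {
            "grid_collocate_task_list": 16.505,
            "grid_integrate_task_list": 16.539,
            "cp_gemm_cosma": 0.0,
            "cp_fm_cholesky_decompose": 0.0,
            "make_dense_data": 0.0,
            "multiply_cannon_multrec": 3.814,
            "mp_waitany": 2.152,
            "mp_waitall_1": 7.928,
        },
        "logged_rest": 23.860,
    },
}


def rest_time(total, self_times):
    """Return the time not attributed to any individually plotted routine."""
    return total - sum(self_times.values())


for name, run in RUNS.items():
    rest = rest_time(run["total"], run["self_times"])
    print(f"{name}: rest = {rest:.3f} s (log reports {run['logged_rest']:.3f} s)")
```

Under this assumption, routines that do not appear in a given timing report contribute 0.0, which matches the y=0.0 PlotPoint entries.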

From /workspace/artifacts/H2O-64_nonortho_32omp.out:
-------------------------------------------------------------------------------
-                                                                             -
-                                T I M I N G                                  -
-                                                                             -
-------------------------------------------------------------------------------
SUBROUTINE                       CALLS  ASD         SELF TIME        TOTAL TIME
                               MAXIMUM       AVERAGE  MAXIMUM  AVERAGE  MAXIMUM
CP2K                                 1  1.0    0.038    0.038  192.675  192.675
qs_mol_dyn_low                       1  2.0    0.004    0.004  191.867  191.867
qs_forces                           11  3.9    0.002    0.002  191.807  191.807
qs_energies                         11  4.9    0.001    0.001  178.272  178.272
scf_env_do_scf                      11  5.9    0.001    0.001  146.458  146.458
velocity_verlet                     10  3.0    0.002    0.002  126.832  126.832
scf_env_do_scf_inner_loop           96  6.5    0.009    0.009  118.477  118.477
rebuild_ks_matrix                  107  8.3    0.001    0.001   61.836   61.836
qs_ks_build_kohn_sham_matrix       107  9.3    0.017    0.017   61.835   61.835
qs_rho_update_rho                  107  7.7    0.001    0.001   55.615   55.615
calculate_rho_elec                 107  8.7    1.385    1.385   55.614   55.614
qs_ks_update_qs_env                107  7.6    0.001    0.001   55.399   55.399
sum_up_and_integrate               107 10.3    0.363    0.363   51.221   51.221
integrate_v_rspace                 107 11.3    0.136    0.136   50.858   50.858
grid_collocate_task_list           107  9.7   50.260   50.260   50.260   50.260
grid_integrate_task_list           107 12.3   48.522   48.522   48.522   48.522
init_scf_loop                       11  6.9    0.000    0.000   27.776   27.776
prepare_preconditioner              11  7.9    0.000    0.000   20.566   20.566
make_preconditioner                 11  8.9    0.000    0.000   20.566   20.566
qs_scf_new_mos                      96  7.5    0.001    0.001   19.777   19.777
qs_scf_loop_do_ot                   96  8.5    0.001    0.001   19.776   19.776
dbcsr_multiply_generic            1966 12.4    0.145    0.145   18.633   18.633
ot_scf_mini                         96  9.5    0.003    0.003   18.586   18.586
make_full_inverse_cholesky          11  9.9    0.000    0.000   18.455   18.455
init_scf_run                        11  5.9    0.001    0.001   15.818   15.818
scf_env_initial_rho_setup           11  6.9    0.001    0.001   15.817   15.817
wfi_extrapolate                     11  7.9    0.001    0.001   14.766   14.766
qs_energies_init_hamiltonians       11  5.9    0.000    0.000   12.286   12.286
ot_mini                             96 10.5    0.001    0.001   12.003   12.003
make_m2s                          3932 13.4    0.053    0.053   10.920   10.920
cp_gemm                             81  9.0    0.001    0.001   10.375   10.375
cp_gemm_cosma                       81 10.0   10.374   10.374   10.374   10.374
qs_ks_update_qs_env_forces          11  4.9    0.000    0.000    8.124    8.124
qs_env_update_s_mstruct             11  6.9    0.000    0.000    7.438    7.438
cp_fm_cholesky_decompose            22 10.9    7.037    7.037    7.037    7.037
qs_create_task_list                 11  7.9    0.000    0.000    6.888    6.888
generate_qs_task_list               11  8.9    5.117    5.117    6.888    6.888
pw_transfer                       1295 11.6    0.086    0.086    6.767    6.767
dbcsr_complete_redistribute        317 12.2    3.086    3.086    6.601    6.601
fft_wrap_pw1pw2                   1081 12.6    0.009    0.009    6.500    6.500
qs_ot_get_derivative                96 11.5    0.001    0.001    6.062    6.062
ot_diis_step                        96 11.5    0.005    0.005    5.937    5.937
make_images                       3932 14.4    2.070    2.070    5.645    5.645
dbcsr_copy                        1855 11.9    0.252    0.252    5.612    5.612
dbcsr_make_dense_low              4961 15.5    0.084    0.084    5.519    5.519
fft_wrap_pw1pw2_140                439 13.2    0.566    0.566    5.494    5.494
make_dense_data                   4961 16.5    4.842    4.842    5.421    5.421
build_core_hamiltonian_matrix_      11  4.9    0.001    0.001    5.408    5.408
dbcsr_copy_into_existing            22  7.9    5.319    5.319    5.319    5.319
copy_dbcsr_to_fm                   147 11.2    0.003    0.003    5.316    5.316
cp_fm_cholesky_invert               11 10.9    5.064    5.064    5.064    5.064
apply_preconditioner_dbcsr         107 12.6    0.000    0.000    5.050    5.050
apply_single                       107 13.6    0.000    0.000    5.050    5.050
dbcsr_make_images_dense           3386 14.7    0.021    0.021    4.883    4.883
transfer_dbcsr_to_fm                11 10.9    0.000    0.000    4.401    4.401
pw_poisson_solve                   107 10.3    1.890    1.890    4.336    4.336
build_core_hamiltonian_matrix       11  6.9    0.001    0.001    4.167    4.167
multiply_cannon                   1966 13.4    0.233    0.233    4.142    4.142
density_rs2pw                      107  9.7    0.005    0.005    3.969    3.969
-------------------------------------------------------------------------------

From /workspace/artifacts/H2O-64_nonortho_32mpi.out:
-------------------------------------------------------------------------------
-                                                                             -
-                                T I M I N G                                  -
-                                                                             -
-------------------------------------------------------------------------------
SUBROUTINE                       CALLS  ASD         SELF TIME        TOTAL TIME
                               MAXIMUM       AVERAGE  MAXIMUM  AVERAGE  MAXIMUM
CP2K                                 1  1.0    0.006    0.010  122.882  122.883
qs_mol_dyn_low                       1  2.0    0.004    0.005  122.786  122.792
qs_forces                           11  3.9    0.002    0.002  122.734  122.734
qs_energies                         11  4.9    0.001    0.001  114.179  114.180
scf_env_do_scf                      11  5.9    0.001    0.001  106.310  106.311
scf_env_do_scf_inner_loop           96  6.5    0.003    0.010   98.854   98.854
velocity_verlet                     10  3.0    0.002    0.002   72.512   72.513
rebuild_ks_matrix                  107  8.3    0.000    0.001   56.778   56.821
qs_ks_build_kohn_sham_matrix       107  9.3    0.018    0.019   56.778   56.820
sum_up_and_integrate               107 10.3    0.038    0.042   51.624   51.685
integrate_v_rspace                 107 11.3    0.004    0.005   51.586   51.649
qs_ks_update_qs_env                107  7.6    0.001    0.001   49.942   49.981
qs_rho_update_rho                  107  7.7    0.001    0.001   48.204   48.213
calculate_rho_elec                 107  8.7    0.043    0.044   48.203   48.213
grid_integrate_task_list           107 12.3   45.569   46.851   45.569   46.851
grid_collocate_task_list           107  9.7   42.500   43.592   42.500   43.592
dbcsr_multiply_generic            1966 12.4    0.102    0.105   14.102   14.386
qs_scf_new_mos                      96  7.5    0.001    0.001   11.341   11.368
qs_scf_loop_do_ot                   96  8.5    0.001    0.001   11.340   11.368
ot_scf_mini                         96  9.5    0.003    0.003   10.642   10.668
multiply_cannon                   1966 13.4    0.183    0.188    9.549    9.697
multiply_cannon_loop              1966 14.4    0.169    0.179    8.703    8.873
rs_pw_transfer                     878 11.9    0.014    0.016    6.169    7.473
init_scf_loop                       11  6.9    0.000    0.000    7.441    7.442
mp_waitall_1                    146670 16.2    6.925    7.187    6.925    7.187
qs_ks_update_qs_env_forces          11  4.9    0.000    0.000    7.022    7.030
density_rs2pw                      107  9.7    0.007    0.008    5.213    6.553
ot_mini                             96 10.5    0.001    0.001    6.328    6.357
init_scf_run                        11  5.9    0.000    0.002    6.263    6.263
scf_env_initial_rho_setup           11  6.9    0.000    0.001    6.263    6.263
wfi_extrapolate                     11  7.9    0.001    0.001    5.675    5.675
pw_transfer                       1295 11.6    0.114    0.121    4.831    4.886
fft_wrap_pw1pw2                   1081 12.6    0.012    0.014    4.587    4.646
multiply_cannon_metrocomm3       15728 15.4    0.058    0.061    4.372    4.627
potential_pw2rs                    107 12.3    0.008    0.009    4.405    4.412
fft_wrap_pw1pw2_140                439 13.2    0.468    0.488    4.000    4.093
mp_waitany                        8968 13.7    2.376    3.677    2.376    3.677
multiply_cannon_multrec          15728 15.4    3.381    3.542    3.394    3.554
fft3d_ps                          1081 14.6    1.867    1.954    3.386    3.449
rs_pw_transfer_RS2PW_140           118 11.5    0.414    0.442    2.102    3.405
apply_preconditioner_dbcsr         107 12.6    0.000    0.000    3.242    3.269
apply_single                       107 13.6    0.000    0.001    3.242    3.269
ot_diis_step                        96 11.5    0.004    0.004    3.266    3.267
mp_alltoall_d11v                  1998 13.7    1.862    3.197    1.862    3.197
make_m2s                          3932 13.4    0.061    0.064    3.113    3.157
qs_ot_get_derivative                96 11.5    0.001    0.001    3.037    3.063
rs_gather_matrices                 107 12.3    0.108    0.118    1.563    2.870
make_images                       3932 14.4    0.157    0.163    2.560    2.608
-------------------------------------------------------------------------------

Plot: name="H2O-64_nonortho_timings_32omp", title="Timings of H2O-64_nonortho with 32 OpenMP Threads", ylabel="time [s]"
PlotPoint: plot="H2O-64_nonortho_timings_32omp", name="rest", label="rest", y=71.16300000000001, yerr=0.0
PlotPoint: plot="H2O-64_nonortho_timings_32omp", name="grid_collocate_task_list", label="grid_collocate_task_list", y=50.26, yerr=0.0
PlotPoint: plot="H2O-64_nonortho_timings_32omp", name="grid_integrate_task_list", label="grid_integrate_task_list", y=48.522, yerr=0.0
PlotPoint: plot="H2O-64_nonortho_timings_32omp", name="cp_gemm_cosma", label="cp_gemm_cosma", y=10.374, yerr=0.0
PlotPoint: plot="H2O-64_nonortho_timings_32omp", name="cp_fm_cholesky_decompose", label="cp_fm_cholesky_decompose", y=7.037, yerr=0.0
PlotPoint: plot="H2O-64_nonortho_timings_32omp", name="dbcsr_copy_into_existing", label="dbcsr_copy_into_existing", y=5.319, yerr=0.0
PlotPoint: plot="H2O-64_nonortho_timings_32omp", name="mp_waitany", label="mp_waitany", y=0.0, yerr=0.0
PlotPoint: plot="H2O-64_nonortho_timings_32omp", name="multiply_cannon_multrec", label="multiply_cannon_multrec", y=0.0, yerr=0.0
PlotPoint: plot="H2O-64_nonortho_timings_32omp", name="mp_waitall_1", label="mp_waitall_1", y=0.0, yerr=0.0

Plot: name="H2O-64_nonortho_timings_32mpi", title="Timings of H2O-64_nonortho with 32 MPI Ranks", ylabel="time [s]"
PlotPoint: plot="H2O-64_nonortho_timings_32mpi", name="rest", label="rest", y=22.131, yerr=0.0
PlotPoint: plot="H2O-64_nonortho_timings_32mpi", name="grid_collocate_task_list", label="grid_collocate_task_list", y=42.5, yerr=0.0
PlotPoint: plot="H2O-64_nonortho_timings_32mpi", name="grid_integrate_task_list", label="grid_integrate_task_list", y=45.569, yerr=0.0
PlotPoint: plot="H2O-64_nonortho_timings_32mpi", name="cp_gemm_cosma", label="cp_gemm_cosma", y=0.0, yerr=0.0
PlotPoint: plot="H2O-64_nonortho_timings_32mpi", name="cp_fm_cholesky_decompose", label="cp_fm_cholesky_decompose", y=0.0, yerr=0.0
PlotPoint: plot="H2O-64_nonortho_timings_32mpi", name="dbcsr_copy_into_existing", label="dbcsr_copy_into_existing", y=0.0, yerr=0.0
PlotPoint: plot="H2O-64_nonortho_timings_32mpi", name="mp_waitany", label="mp_waitany", y=2.376, yerr=0.0
PlotPoint: plot="H2O-64_nonortho_timings_32mpi", name="multiply_cannon_multrec", label="multiply_cannon_multrec", y=3.381, yerr=0.0
PlotPoint: plot="H2O-64_nonortho_timings_32mpi", name="mp_waitall_1", label="mp_waitall_1", y=6.925, yerr=0.0

Running H2O-hyb.inp with 1 threads and 32 ranks... done.
Running H2O-hyb.inp with 32 threads and 1 ranks... done.

From /workspace/artifacts/H2O-hyb_32omp.out:
-------------------------------------------------------------------------------
-                                                                             -
-                                T I M I N G                                  -
-                                                                             -
-------------------------------------------------------------------------------
SUBROUTINE                       CALLS  ASD         SELF TIME        TOTAL TIME
                               MAXIMUM       AVERAGE  MAXIMUM  AVERAGE  MAXIMUM
CP2K                                 1  1.0    0.341    0.341  254.590  254.590
qs_energies                          1  2.0    0.000    0.000  253.364  253.364
scf_env_do_scf                       1  3.0    0.000    0.000  250.651  250.651
qs_ks_update_qs_env                  8  5.0    0.000    0.000  241.982  241.982
rebuild_ks_matrix                    7  6.0    0.000    0.000  241.874  241.874
qs_ks_build_kohn_sham_matrix         7  7.0    0.002    0.002  241.874  241.874
hfx_ks_matrix                        7  8.0    0.000    0.000  177.288  177.288
integrate_four_center                7  9.0    9.427    9.427  177.259  177.259
integrate_four_center_main           7 10.0    1.335    1.335  158.468  158.468
integrate_four_center_bin          443 11.0  157.132  157.132  157.132  157.132
scf_env_do_scf_inner_loop            7  4.0    0.001    0.001  144.263  144.263
init_scf_loop                        1  4.0    0.000    0.000  106.369  106.369
cp_gemm                            129 10.3    0.001    0.001   50.225   50.225
cp_gemm_cosma                      129 11.3   50.224   50.224   50.224   50.224
admm_mo_calc_rho_aux                 7  8.0    0.000    0.000   29.388   29.388
admm_fit_mo_coeffs                   7  9.0    0.000    0.000   26.804   26.804
admm_mo_merge_derivs                 7  8.0    0.000    0.000   25.027   25.027
merge_mo_derivs_diag                 7  9.0    0.023    0.023   25.027   25.027
purify_mo_diag                       7 10.0    0.001    0.001   13.999   13.999
fit_mo_coeffs                        7 10.0    0.000    0.000   12.805   12.805
integrate_four_center_load           7 10.0    0.000    0.000    8.825    8.825
hfx_load_balance                     1 11.0    0.002    0.002    8.825    8.825
calculate_rho_elec                  15  7.4    0.191    0.191    5.986    5.986
grid_collocate_task_list            15  8.4    5.232    5.232    5.232    5.232
-------------------------------------------------------------------------------

From /workspace/artifacts/H2O-hyb_32mpi.out:
-------------------------------------------------------------------------------
-                                                                             -
-                                T I M I N G                                  -
-                                                                             -
-------------------------------------------------------------------------------
SUBROUTINE                       CALLS  ASD         SELF TIME        TOTAL TIME
                               MAXIMUM       AVERAGE  MAXIMUM  AVERAGE  MAXIMUM
CP2K                                 1  1.0    0.013    0.017  181.284  181.285
qs_energies                          1  2.0    0.000    0.000  181.143  181.144
scf_env_do_scf                       1  3.0    0.000    0.000  180.608  180.608
qs_ks_update_qs_env                  8  5.0    0.000    0.000  177.617  177.617
rebuild_ks_matrix                    7  6.0    0.000    0.000  177.604  177.605
qs_ks_build_kohn_sham_matrix         7  7.0    0.001    0.002  177.604  177.605
hfx_ks_matrix                        7  8.0    0.000    0.001  169.743  169.747
integrate_four_center                7  9.0    0.165    0.499  169.730  169.732
integrate_four_center_main           7 10.0    0.004    0.005  156.696  159.374
integrate_four_center_bin          448 11.0  156.692  159.369  156.692  159.369
scf_env_do_scf_inner_loop            7  4.0    0.000    0.001  105.363  105.363
init_scf_loop                        1  4.0    0.000    0.000   75.244   75.244
integrate_four_center_load           7 10.0    0.000    0.000    8.739    8.741
hfx_load_balance                     1 11.0    0.001    0.001    8.739    8.741
mp_sync                             70 11.3    3.363    5.932    3.363    5.932
hfx_load_balance_bin                 1 12.0    4.309    4.372    4.309    4.372
hfx_load_balance_count               1 12.0    4.298    4.366    4.298    4.366
qs_vxc_create                       14  8.0    0.000    0.000    3.679    3.679
xc_vxc_pw_create                    14  9.0    0.020    0.022    3.679    3.679
-------------------------------------------------------------------------------

Plot: name="H2O-hyb_timings_32omp", title="Timings of H2O-hyb with 32 OpenMP Threads", ylabel="time [s]"
PlotPoint: plot="H2O-hyb_timings_32omp", name="rest", label="rest", y=31.24000000000001, yerr=0.0
PlotPoint: plot="H2O-hyb_timings_32omp", name="integrate_four_center_bin", label="integrate_four_center_bin", y=157.132, yerr=0.0
PlotPoint: plot="H2O-hyb_timings_32omp", name="cp_gemm_cosma", label="cp_gemm_cosma", y=50.224, yerr=0.0
PlotPoint: plot="H2O-hyb_timings_32omp", name="integrate_four_center", label="integrate_four_center", y=9.427, yerr=0.0
PlotPoint: plot="H2O-hyb_timings_32omp", name="grid_collocate_task_list", label="grid_collocate_task_list", y=5.232, yerr=0.0
PlotPoint: plot="H2O-hyb_timings_32omp", name="integrate_four_center_main", label="integrate_four_center_main", y=1.335, yerr=0.0
PlotPoint: plot="H2O-hyb_timings_32omp", name="mp_sync", label="mp_sync", y=0.0, yerr=0.0
PlotPoint: plot="H2O-hyb_timings_32omp", name="hfx_load_balance_count", label="hfx_load_balance_count", y=0.0, yerr=0.0
PlotPoint: plot="H2O-hyb_timings_32omp", name="hfx_load_balance_bin", label="hfx_load_balance_bin", y=0.0, yerr=0.0

Plot: name="H2O-hyb_timings_32mpi", title="Timings of H2O-hyb with 32 MPI Ranks", ylabel="time [s]"
PlotPoint: plot="H2O-hyb_timings_32mpi", name="rest", label="rest", y=12.453000000000003, yerr=0.0
PlotPoint: plot="H2O-hyb_timings_32mpi", name="integrate_four_center_bin", label="integrate_four_center_bin", y=156.692, yerr=0.0
PlotPoint: plot="H2O-hyb_timings_32mpi", name="cp_gemm_cosma", label="cp_gemm_cosma", y=0.0, yerr=0.0
PlotPoint: plot="H2O-hyb_timings_32mpi", name="integrate_four_center", label="integrate_four_center", y=0.165, yerr=0.0
PlotPoint: plot="H2O-hyb_timings_32mpi", name="grid_collocate_task_list", label="grid_collocate_task_list", y=0.0, yerr=0.0
PlotPoint: plot="H2O-hyb_timings_32mpi", name="integrate_four_center_main", label="integrate_four_center_main", y=0.004, yerr=0.0
PlotPoint: plot="H2O-hyb_timings_32mpi", name="mp_sync", label="mp_sync", y=3.363, yerr=0.0
PlotPoint: plot="H2O-hyb_timings_32mpi", name="hfx_load_balance_count", label="hfx_load_balance_count", y=4.298, yerr=0.0
PlotPoint: plot="H2O-hyb_timings_32mpi", name="hfx_load_balance_bin", label="hfx_load_balance_bin", y=4.309, yerr=0.0

Running GW_PBE_4benzene.inp with 1 threads and 32 ranks... done.
Running GW_PBE_4benzene.inp with 32 threads and 1 ranks... done.

From /workspace/artifacts/GW_PBE_4benzene_32omp.out:
-------------------------------------------------------------------------------
-                                                                             -
-                                T I M I N G                                  -
-                                                                             -
-------------------------------------------------------------------------------
SUBROUTINE                       CALLS  ASD         SELF TIME        TOTAL TIME
                               MAXIMUM       AVERAGE  MAXIMUM  AVERAGE  MAXIMUM
CP2K                                 1  1.0    0.014    0.014  346.297  346.297
qs_energies                          1  2.0    0.000    0.000  345.712  345.712
mp2_main                             1  3.0    0.000    0.000  341.190  341.190
mp2_gpw_main                         1  4.0    0.000    0.000  341.006  341.006
rpa_ri_compute_en                    1  5.0    0.000    0.000  320.612  320.612
rpa_num_int                          1  6.0    0.000    0.000  320.586  320.586
compute_mat_P_omega                  1  7.0    0.002    0.002  193.457  193.457
compute_mat_P_omega_contract        10  8.0   12.900   12.900  192.258  192.258
dbcsr_t_total                     2336  9.6    0.015    0.015  181.392  181.392
dbcsr_t_contract                   787 11.0   48.614   48.614  107.070  107.070
cp_gemm                            105  8.4    0.001    0.001  104.383  104.383
cp_gemm_cosma                      105  9.4  104.382  104.382  104.382  104.382
dbcsr_t_copy                      1103 10.7   20.744   20.744   72.839   72.839
compute_mat_P_omega_calc_M_occ     250  9.0   12.940   12.940   71.487   71.487
GW_matrix_operations                10  7.0    0.006    0.006   69.484   69.484
dbcsr_tas_total                   1149 12.2    0.046    0.046   52.441   52.441
dbcsr_tas_multiply                 807 12.1    0.002    0.002   50.979   50.979
compute_mat_P_omega_calc_M_vir     250  9.0    0.001    0.001   43.710   43.710
dbcsr_multiply_generic             837 15.8    0.131    0.131   37.887   37.887
dbcsr_tas_dbcsr                    807 14.1    0.002    0.002   37.607   37.607
rpa_num_int_RPA_matrix_operati      10  7.0    0.000    0.000   36.263   36.263
contract_P_omega_with_mat_L         10  8.0    0.000    0.000   34.419   34.419
dbcsr_tas_reserve_blocks_index    3261 13.7    7.248    7.248   28.174   28.174
dbcsr_tas_mm_1N                    524 15.1    0.002    0.002   26.208   26.208
multiply_cannon                    837 16.8    0.387    0.387   24.955   24.955
dbcsr_tas_copy                     574 11.4   17.290   17.290   24.818   24.818
dbcsr_t_reserve_blocks_index      2280 12.5    1.111    1.111   21.758   21.758
multiply_cannon_loop               837 17.8    0.136    0.136   21.736   21.736
multiply_cannon_multrec            837 18.8   19.857   19.857   20.577   20.577
dbcsr_reserve_blocks              3717 14.7   20.115   20.115   20.569   20.569
dbcsr_t_reserve_blocks_index_a    2222 11.6    0.009    0.009   20.444   20.444
mp2_ri_gpw_compute_in                1  5.0    0.000    0.000   20.379   20.379
compute_mat_P_omega_copy_M_occ     250  9.0    0.001    0.001   19.758   19.758
compute_QP_energies                  1  7.0    0.000    0.000   19.412   19.412
compute_self_energy_cubic_gw         1  8.0    0.104    0.104   19.412   19.412
compute_mat_P_omega_copy_M_vir     250  9.0    0.002    0.002   14.601   14.601
dbcsr_t_copy_nocomm                251 12.0   11.553   11.553   13.952   13.952
compute_mat_P_omega_calc_P_t       250  9.0    0.001    0.001   11.807   11.807
make_m2s                          1674 16.8    0.102    0.102   10.506   10.506
dbcsr_tas_mm_2                     251 15.0    0.001    0.001   10.057   10.057
make_images                       1674 17.8    4.981    4.981   10.026   10.026
dbcsr_finalize                    9888 13.6    1.590    1.590    7.828    7.828
contract_cubic_gw                   21  9.0    0.000    0.000    7.738    7.738
build_3c_integrals                   5  6.0    3.590    3.590    7.695    7.695
mp2_ri_gpw_compute_in_copy_3c        6  6.0    0.682    0.682    7.565    7.565
-------------------------------------------------------------------------------

From /workspace/artifacts/GW_PBE_4benzene_32mpi.out:
-------------------------------------------------------------------------------
-                                                                             -
-                                T I M I N G                                  -
-                                                                             -
-------------------------------------------------------------------------------
SUBROUTINE                       CALLS  ASD         SELF TIME        TOTAL TIME
                               MAXIMUM       AVERAGE  MAXIMUM  AVERAGE  MAXIMUM
CP2K                                 1  1.0    0.008    0.010   49.067   49.069
qs_energies                          1  2.0    0.000    0.000   48.957   48.963
mp2_main                             1  3.0    0.000    0.000   47.485   47.490
mp2_gpw_main                         1  4.0    0.000    0.000   47.424   47.429
rpa_ri_compute_en                    1  5.0    0.000    0.000   45.588   45.593
rpa_num_int                          1  6.0    0.000    0.000   45.580   45.585
dbcsr_t_total                     2336  9.6    0.015    0.016   41.180   41.182
compute_mat_P_omega                  1  7.0    0.001    0.001   40.057   40.063
compute_mat_P_omega_contract        10  8.0    0.741    0.767   39.946   39.951
dbcsr_t_contract                   787 11.0    1.846    2.021   30.359   30.362
dbcsr_tas_total                   1149 12.2    0.060    0.064   26.693   26.693
dbcsr_tas_multiply                 807 12.1    0.002    0.003   26.556   26.558
dbcsr_tas_dbcsr                    807 14.1    0.003    0.003   19.515   19.515
dbcsr_multiply_generic             837 15.8    0.068    0.071   16.152   17.316
compute_mat_P_omega_calc_M_occ     250  9.0    0.722    0.748   13.374   13.375
multiply_cannon                    837 16.8    0.131    0.147    9.538   10.066
compute_mat_P_omega_calc_P_t       250  9.0    0.001    0.001    9.782    9.783
dbcsr_tas_mm_1N                    524 15.1    0.002    0.003    8.565    9.580
dbcsr_t_copy                      1111 10.7    4.182    4.453    9.263    9.573
multiply_cannon_loop               837 17.8    0.040    0.043    8.689    9.191
compute_mat_P_omega_calc_M_vir     250  9.0    0.001    0.001    8.586    8.587
multiply_cannon_multrec           1386 17.8    6.762    7.302    7.006    7.523
mp_sync                           8696 11.6    6.394    7.449    6.394    7.449
dbcsr_tas_mm_2                     251 15.0    0.002    0.002    7.411    7.411
make_m2s                          1674 16.8    0.042    0.045    5.685    6.493
make_images                       1674 17.8    0.246    0.264    5.606    6.413
compute_QP_energies                  1  7.0    0.000    0.000    3.974    3.974
compute_self_energy_cubic_gw         1  8.0    0.005    0.006    3.972    3.974
dbcsr_t_communicate_buffer        1098 11.7    0.081    0.088    3.424    3.609
mp_waitall_2                      3776 14.7    3.247    3.450    3.247    3.450
make_images_data                  1674 18.8    0.034    0.036    3.015    3.162
contract_cubic_gw                   21  9.0    0.000    0.000    3.118    3.118
dbcsr_t_reserve_blocks_index      2849 12.4    0.096    0.103    2.617    3.108
dbcsr_t_reserve_blocks_index_a    2791 11.4    0.014    0.015    2.575    3.065
dbcsr_tas_reserve_blocks_index    3300 13.8    0.268    0.296    2.576    3.061
hybrid_alltoall_any               1724 19.5    2.321    2.624    2.902    3.053
make_images_pack                  1674 18.8    2.156    2.911    2.166    2.924
dbcsr_reserve_blocks              3785 14.7    2.298    2.754    2.337    2.796
mp_waitall_1                     26582 19.0    1.571    2.059    1.571    2.059
convert_to_new_pgrid              2421 14.1    0.015    0.017    1.806    1.981
dbcsr_copy                        3323 15.8    1.746    1.926    1.773    1.952
mp2_ri_gpw_compute_in                1  5.0    0.000    0.000    1.834    1.834
compute_mat_P_omega_copy_M_vir     250  9.0    0.001    0.002    1.689    1.696
dbcsr_add_anytype                  909 13.7    0.964    1.016    1.508    1.572
compute_mat_P_omega_copy_M_occ     250  9.0    0.001    0.001    1.504    1.508
scf_env_do_scf                       1  3.0    0.000    0.000    1.417    1.417
scf_env_do_scf_inner_loop           17  4.0    0.001    0.002    1.417    1.417
dbcsr_tas_replicate                396 14.1    0.751    0.836    1.231    1.303
mp_max_i                          2054  9.6    0.986    1.254    0.986    1.254
dbcsr_finalize                   10566 13.5    0.038    0.040    1.023    1.065
-------------------------------------------------------------------------------

Plot: name="GW_PBE_4benzene_timings_32omp", title="Timings of GW_PBE_4benzene with 32 OpenMP Threads", ylabel="time [s]"
PlotPoint: plot="GW_PBE_4benzene_timings_32omp", name="rest", label="rest", y=132.585, yerr=0.0
PlotPoint: plot="GW_PBE_4benzene_timings_32omp", name="cp_gemm_cosma", label="cp_gemm_cosma", y=104.382, yerr=0.0
PlotPoint: plot="GW_PBE_4benzene_timings_32omp", name="dbcsr_t_contract", label="dbcsr_t_contract", y=48.614, yerr=0.0
PlotPoint: plot="GW_PBE_4benzene_timings_32omp", name="dbcsr_t_copy", label="dbcsr_t_copy", y=20.744, yerr=0.0
PlotPoint: plot="GW_PBE_4benzene_timings_32omp", name="dbcsr_reserve_blocks", label="dbcsr_reserve_blocks", y=20.115, yerr=0.0
PlotPoint: plot="GW_PBE_4benzene_timings_32omp", name="multiply_cannon_multrec", label="multiply_cannon_multrec", y=19.857, yerr=0.0
PlotPoint: plot="GW_PBE_4benzene_timings_32omp", name="mp_waitall_2", label="mp_waitall_2", y=0.0, yerr=0.0
PlotPoint: plot="GW_PBE_4benzene_timings_32omp", name="hybrid_alltoall_any", label="hybrid_alltoall_any", y=0.0, yerr=0.0
PlotPoint: plot="GW_PBE_4benzene_timings_32omp", name="mp_sync", label="mp_sync", y=0.0, yerr=0.0

Plot: name="GW_PBE_4benzene_timings_32mpi", title="Timings of GW_PBE_4benzene with 32 MPI Ranks", ylabel="time [s]"
PlotPoint: plot="GW_PBE_4benzene_timings_32mpi", name="rest", label="rest", y=22.016999999999996, yerr=0.0
PlotPoint: plot="GW_PBE_4benzene_timings_32mpi", name="cp_gemm_cosma", label="cp_gemm_cosma", y=0.0, yerr=0.0
PlotPoint: plot="GW_PBE_4benzene_timings_32mpi", name="dbcsr_t_contract", label="dbcsr_t_contract", y=1.846, yerr=0.0
PlotPoint: plot="GW_PBE_4benzene_timings_32mpi", name="dbcsr_t_copy", label="dbcsr_t_copy", y=4.182, yerr=0.0
PlotPoint: plot="GW_PBE_4benzene_timings_32mpi", name="dbcsr_reserve_blocks", label="dbcsr_reserve_blocks", y=2.298, yerr=0.0
PlotPoint: plot="GW_PBE_4benzene_timings_32mpi", name="multiply_cannon_multrec", label="multiply_cannon_multrec", y=6.762, yerr=0.0
PlotPoint: plot="GW_PBE_4benzene_timings_32mpi", name="mp_waitall_2", label="mp_waitall_2", y=3.247, yerr=0.0
PlotPoint: plot="GW_PBE_4benzene_timings_32mpi", name="hybrid_alltoall_any", label="hybrid_alltoall_any", y=2.321, yerr=0.0
PlotPoint: plot="GW_PBE_4benzene_timings_32mpi", name="mp_sync", label="mp_sync", y=6.394, yerr=0.0

Running bench_dftb.inp with 1 threads and 32 ranks... done.
Running bench_dftb.inp with 32 threads and 1 ranks... done.

From /workspace/artifacts/bench_dftb_32omp.out:
-------------------------------------------------------------------------------
-                                                                             -
-                                T I M I N G                                  -
-                                                                             -
-------------------------------------------------------------------------------
SUBROUTINE                       CALLS  ASD         SELF TIME        TOTAL TIME
                               MAXIMUM       AVERAGE  MAXIMUM  AVERAGE  MAXIMUM
CP2K                                 1  1.0    0.087    0.087  277.188  277.188
qs_energies                          1  2.0    0.000    0.000  277.024  277.024
ls_scf                               1  3.0    0.000    0.000  275.126  275.126
ls_scf_main                          1  4.0    0.002    0.002  263.048  263.048
density_matrix_trs4                 11  5.0    0.012    0.012  137.094  137.094
ls_scf_dm_to_ks                     11  5.0    0.000    0.000  118.901  118.901
matrix_ls_to_qs                     11  6.0    0.000    0.000  114.162  114.162
dbcsr_multiply_generic             185  6.1    0.498    0.498   91.136   91.136
dbcsr_copy_into_existing            11  7.0   60.670   60.670   60.670   60.670
dbcsr_complete_redistribute         23  7.5   42.157   42.157   58.494   58.494
multiply_cannon                    185  7.1    0.433    0.433   56.485   56.485
matrix_decluster                    11  7.0    0.000    0.000   53.490   53.490
multiply_cannon_loop               185  8.1    0.396    0.396   37.287   37.287
multiply_cannon_multrec            185  9.1   35.136   35.136   35.186   35.186
make_m2s                           370  7.1    0.030    0.030   28.875   28.875
make_images                        370  8.1    7.282    7.282   26.482   26.482
arnoldi_extremal                    12  6.1    0.000    0.000   24.391   24.391
arnoldi_normal_ev                   12  7.1    0.027    0.027   24.391   24.391
build_subspace                      23  8.1    0.133    0.133   23.766   23.766
dbcsr_matrix_vector_mult           652  9.0    0.231    0.231   22.907   22.907
dbcsr_matrix_vector_mult_local     652 10.0   21.616   21.616   21.635   21.635
dbcsr_finalize                     646  7.5    0.218    0.218   21.416   21.416
dbcsr_merge_all                    597  8.5    3.536    3.536   19.788   19.788
setup_rec_index_2d                 370  8.1   18.615   18.615   18.615   18.615
dbcsr_sort_indices                1103  9.9   15.159   15.159   15.159   15.159
tree_to_linear_d                   110  9.4   13.963   13.963   13.963   13.963
quick_finalize                     395 10.0    0.518    0.518   12.969   12.969
dbcsr_special_finalize             370  9.1    0.002    0.002   11.921   11.921
ls_scf_init_scf                      1  4.0    0.000    0.000   11.210   11.210
ls_scf_init_matrix_S                 1  5.0    0.000    0.000   10.753   10.753
matrix_sqrt_Newton_Schulz            1  6.0    0.001    0.001    9.891    9.891
dbcsr_dot_sd                       144  6.3    9.465    9.465    9.466    9.466
dbcsr_frobenius_norm               142  6.1    8.018    8.018    8.020    8.020
matrix_qs_to_ls                     12  5.1    0.000    0.000    7.340    7.340
matrix_cluster                      12  6.1    0.000    0.000    7.340    7.340
make_images_data                   370  9.1    0.011    0.011    7.146    7.146
dbcsr_new_transposed                 2  7.0    0.171    0.171    5.974    5.974
hybrid_alltoall_any                393  9.9    5.126    5.126    5.948    5.948
dbcsr_redistribute                   2  8.0    5.704    5.704    5.768    5.768
dbcsr_add_d                        280  6.0    0.001    0.001    5.678    5.678
dbcsr_add_anytype                  280  7.0    1.597    1.597    5.678    5.678
-------------------------------------------------------------------------------
From /workspace/artifacts/bench_dftb_32mpi.out:
-------------------------------------------------------------------------------
-                                                                             -
-                                T I M I N G                                  -
-                                                                             -
-------------------------------------------------------------------------------
SUBROUTINE                       CALLS  ASD         SELF TIME        TOTAL TIME
                                MAXIMUM       AVERAGE  MAXIMUM  AVERAGE  MAXIMUM
CP2K                                 1  1.0    0.007    0.009   93.673   93.674
qs_energies                          1  2.0    0.000    0.000   93.591   93.591
ls_scf                               1  3.0    0.000    0.000   93.512   93.512
ls_scf_main                          1  4.0    0.000    0.003   89.881   89.881
density_matrix_trs4                 11  5.0    0.009    0.013   86.156   86.212
dbcsr_multiply_generic             185  6.1    0.070    0.080   80.937   81.147
multiply_cannon                    185  7.1    0.043    0.047   67.735   68.583
multiply_cannon_loop               185  8.1    0.209    0.219   63.896   64.900
multiply_cannon_multrec           1480  9.1   41.686   43.714   42.155   44.183
mp_waitall_1                     11936 10.3   19.989   23.301   19.989   23.301
multiply_cannon_metrocomm3        1480  9.1    0.018    0.020   11.779   15.597
make_m2s                           370  7.1    0.034    0.037    9.041    9.160
make_images                        370  8.1    0.719    0.746    8.921    9.044
multiply_cannon_metrocomm1        1480  9.1    0.008    0.010    4.881    7.428
calculate_norms                   2960  9.1    4.801    5.022    4.801    5.022
make_images_data                   370  9.1    0.012    0.013    3.699    4.137
arnoldi_extremal                    12  6.1    0.000    0.001    3.898    3.911
arnoldi_normal_ev                   12  7.1    0.002    0.008    3.898    3.910
build_subspace                      23  8.1    0.037    0.051    3.769    3.772
mp_sum_l                          1039  5.9    2.940    3.616    2.940    3.616
ls_scf_dm_to_ks                     11  5.0    0.000    0.000    3.228    3.306
dbcsr_matrix_vector_mult           652  9.0    0.018    0.077    3.168    3.242
hybrid_alltoall_any                393  9.9    0.302    1.538    3.022    3.232
dbcsr_complete_redistribute         23  7.5    1.769    1.863    2.818    2.885
matrix_ls_to_qs                     11  6.0    0.000    0.000    2.790    2.865
ls_scf_init_scf                      1  4.0    0.000    0.000    2.776    2.778
ls_scf_init_matrix_S                 1  5.0    0.000    0.000    2.740    2.750
dbcsr_multiply_generic_mpsum_f     137  7.1    0.000    0.000    2.117    2.697
dbcsr_matrix_vector_mult_local     652 10.0    2.493    2.636    2.496    2.640
matrix_decluster                    11  7.0    0.000    0.000    2.533    2.600
make_images_pack                   370  9.1    2.402    2.594    2.407    2.599
matrix_sqrt_Newton_Schulz            1  6.0    0.000    0.001    2.510    2.512
buffer_matrices_ensure_size        370  8.1    2.182    2.285    2.182    2.285
dbcsr_add_d                        280  6.0    0.001    0.002    2.075    2.158
dbcsr_add_anytype                  280  7.0    1.120    1.170    2.074    2.157
dbcsr_finalize                     646  7.5    0.014    0.014    1.850    1.913
-------------------------------------------------------------------------------
Plot: name="bench_dftb_timings_32omp", title="Timings of bench_dftb with 32 OpenMP Threads", ylabel="time [s]"
PlotPoint: plot="bench_dftb_timings_32omp", name="rest", label="rest", y=98.99399999999997, yerr=0.0
PlotPoint: plot="bench_dftb_timings_32omp", name="dbcsr_copy_into_existing", label="dbcsr_copy_into_existing", y=60.67, yerr=0.0
PlotPoint: plot="bench_dftb_timings_32omp", name="dbcsr_complete_redistribute", label="dbcsr_complete_redistribute", y=42.157, yerr=0.0
PlotPoint: plot="bench_dftb_timings_32omp", name="multiply_cannon_multrec", label="multiply_cannon_multrec", y=35.136, yerr=0.0
PlotPoint: plot="bench_dftb_timings_32omp", name="dbcsr_matrix_vector_mult_local", label="dbcsr_matrix_vector_mult_local", y=21.616, yerr=0.0
PlotPoint: plot="bench_dftb_timings_32omp", name="setup_rec_index_2d", label="setup_rec_index_2d", y=18.615, yerr=0.0
PlotPoint: plot="bench_dftb_timings_32omp", name="mp_waitall_1", label="mp_waitall_1", y=0.0, yerr=0.0
PlotPoint: plot="bench_dftb_timings_32omp", name="calculate_norms", label="calculate_norms", y=0.0, yerr=0.0
PlotPoint: plot="bench_dftb_timings_32omp", name="mp_sum_l", label="mp_sum_l", y=0.0, yerr=0.0
Plot: name="bench_dftb_timings_32mpi", title="Timings of bench_dftb with 32 MPI Ranks", ylabel="time [s]"
PlotPoint: plot="bench_dftb_timings_32mpi", name="rest", label="rest", y=19.995000000000005, yerr=0.0
PlotPoint: plot="bench_dftb_timings_32mpi", name="dbcsr_copy_into_existing", label="dbcsr_copy_into_existing", y=0.0, yerr=0.0
PlotPoint: plot="bench_dftb_timings_32mpi", name="dbcsr_complete_redistribute", label="dbcsr_complete_redistribute", y=1.769, yerr=0.0
PlotPoint: plot="bench_dftb_timings_32mpi", name="multiply_cannon_multrec", label="multiply_cannon_multrec", y=41.686, yerr=0.0
PlotPoint: plot="bench_dftb_timings_32mpi", name="dbcsr_matrix_vector_mult_local", label="dbcsr_matrix_vector_mult_local", y=2.493, yerr=0.0
PlotPoint: plot="bench_dftb_timings_32mpi", name="setup_rec_index_2d", label="setup_rec_index_2d", y=0.0, yerr=0.0
PlotPoint: plot="bench_dftb_timings_32mpi", name="mp_waitall_1", label="mp_waitall_1", y=19.989, yerr=0.0
PlotPoint: plot="bench_dftb_timings_32mpi", name="calculate_norms", label="calculate_norms", y=4.801, yerr=0.0
PlotPoint: plot="bench_dftb_timings_32mpi", name="mp_sum_l", label="mp_sum_l", y=2.94, yerr=0.0
Running dbcsr.inp with 1 threads and 32 ranks... done.
Running dbcsr.inp with 32 threads and 1 ranks... done.
From /workspace/artifacts/dbcsr_32omp.out:
-------------------------------------------------------------------------------
-                                                                             -
-                                T I M I N G                                  -
-                                                                             -
-------------------------------------------------------------------------------
SUBROUTINE                       CALLS  ASD         SELF TIME        TOTAL TIME
                                MAXIMUM       AVERAGE  MAXIMUM  AVERAGE  MAXIMUM
CP2K                                 1  1.0    0.007    0.007  104.651  104.651
lib_test                             1  2.0    0.000    0.000  104.643  104.643
dbcsr_run_tests                      3  3.0    0.003    0.003  104.642  104.642
test_multiplies_multiproc            3  4.0    0.001    0.001   83.639   83.639
dbcsr_redistribute                   9  5.0   54.490   54.490   58.121   58.121
dbcsr_multiply_generic               9  5.0    0.001    0.001   23.787   23.787
dbcsr_make_random_matrix             9  4.0   15.435   15.435   20.922   20.922
multiply_cannon                      9  6.0    0.004    0.004   17.525   17.525
multiply_cannon_loop                 9  7.0    0.005    0.005   16.961   16.961
multiply_cannon_multrec              9  8.0   16.955   16.955   16.956   16.956
dbcsr_finalize                      27  5.7    0.004    0.004    9.213    9.213
dbcsr_merge_all                     18  6.5    3.234    3.234    8.566    8.566
tree_to_linear_d                     9  7.0    3.335    3.335    3.335    3.335
mp_alltoall_d11v                    27  6.0    3.325    3.325    3.325    3.325
dbcsr_data_release                 975  7.6    2.104    2.104    2.104    2.104
-------------------------------------------------------------------------------
From /workspace/artifacts/dbcsr_32mpi.out:
-------------------------------------------------------------------------------
-                                                                             -
-                                T I M I N G                                  -
-                                                                             -
-------------------------------------------------------------------------------
SUBROUTINE                       CALLS  ASD         SELF TIME        TOTAL TIME
                                MAXIMUM       AVERAGE  MAXIMUM  AVERAGE  MAXIMUM
CP2K                                 1  1.0    0.003    0.006   25.830   25.831
lib_test                             1  2.0    0.000    0.000   25.804   25.821
dbcsr_run_tests                      3  3.0    0.000    0.001   25.801   25.818
test_multiplies_multiproc            3  4.0    0.001    0.001   24.685   24.751
dbcsr_multiply_generic               9  5.0    0.001    0.001   22.927   22.996
multiply_cannon                      9  6.0    0.002    0.003   20.740   21.134
multiply_cannon_loop                 9  7.0    0.004    0.004   20.312   20.726
multiply_cannon_multrec             72  8.0   17.038   17.607   17.039   17.608
mp_waitall_1                       576  9.2    3.695    4.513    3.695    4.513
multiply_cannon_metrocomm1          72  8.0    0.002    0.002    2.880    3.913
mp_sum_l                           310  2.7    0.559    1.280    0.559    1.280
dbcsr_multiply_generic_mpsum_f       9  6.0    0.000    0.000    0.555    1.276
dbcsr_make_random_matrix             9  4.0    0.880    0.909    1.084    1.124
multiply_cannon_metrocomm3          72  8.0    0.000    0.000    0.384    1.082
make_m2s                            18  6.0    0.001    0.001    0.941    0.997
make_images                         18  7.0    0.027    0.028    0.938    0.994
dbcsr_finalize                      27  5.7    0.000    0.001    0.762    0.824
dbcsr_merge_all                     18  6.5    0.134    0.153    0.706    0.763
dbcsr_redistribute                   9  5.0    0.392    0.457    0.689    0.735
make_images_data                    18  8.0    0.001    0.001    0.478    0.557
dbcsr_data_release                 444  7.6    0.460    0.525    0.460    0.525
-------------------------------------------------------------------------------
Plot: name="dbcsr_timings_32omp", title="Timings of dbcsr with 32 OpenMP Threads", ylabel="time [s]"
PlotPoint: plot="dbcsr_timings_32omp", name="rest", label="rest", y=9.007000000000005, yerr=0.0
PlotPoint: plot="dbcsr_timings_32omp", name="dbcsr_redistribute", label="dbcsr_redistribute", y=54.49, yerr=0.0
PlotPoint: plot="dbcsr_timings_32omp", name="multiply_cannon_multrec", label="multiply_cannon_multrec", y=16.955, yerr=0.0
PlotPoint: plot="dbcsr_timings_32omp", name="dbcsr_make_random_matrix", label="dbcsr_make_random_matrix", y=15.435, yerr=0.0
PlotPoint: plot="dbcsr_timings_32omp", name="tree_to_linear_d", label="tree_to_linear_d", y=3.335, yerr=0.0
PlotPoint: plot="dbcsr_timings_32omp", name="mp_alltoall_d11v", label="mp_alltoall_d11v", y=3.325, yerr=0.0
PlotPoint: plot="dbcsr_timings_32omp", name="dbcsr_data_release", label="dbcsr_data_release", y=2.104, yerr=0.0
PlotPoint: plot="dbcsr_timings_32omp", name="mp_waitall_1", label="mp_waitall_1", y=0.0, yerr=0.0
PlotPoint: plot="dbcsr_timings_32omp", name="mp_sum_l", label="mp_sum_l", y=0.0, yerr=0.0
Plot: name="dbcsr_timings_32mpi", title="Timings of dbcsr with 32 MPI Ranks", ylabel="time [s]"
PlotPoint: plot="dbcsr_timings_32mpi", name="rest", label="rest", y=2.8059999999999974, yerr=0.0
PlotPoint: plot="dbcsr_timings_32mpi", name="dbcsr_redistribute", label="dbcsr_redistribute", y=0.392, yerr=0.0
PlotPoint: plot="dbcsr_timings_32mpi", name="multiply_cannon_multrec", label="multiply_cannon_multrec", y=17.038, yerr=0.0
PlotPoint: plot="dbcsr_timings_32mpi", name="dbcsr_make_random_matrix", label="dbcsr_make_random_matrix", y=0.88, yerr=0.0
PlotPoint: plot="dbcsr_timings_32mpi", name="tree_to_linear_d", label="tree_to_linear_d", y=0.0, yerr=0.0
PlotPoint: plot="dbcsr_timings_32mpi", name="mp_alltoall_d11v", label="mp_alltoall_d11v", y=0.0, yerr=0.0
PlotPoint: plot="dbcsr_timings_32mpi", name="dbcsr_data_release", label="dbcsr_data_release", y=0.46, yerr=0.0
PlotPoint: plot="dbcsr_timings_32mpi", name="mp_waitall_1", label="mp_waitall_1", y=3.695, yerr=0.0
PlotPoint: plot="dbcsr_timings_32mpi", name="mp_sum_l", label="mp_sum_l", y=0.559, yerr=0.0
Summary: Performance test works fine.
Status: OK
Uploading artifacts... done
EndDate: 2021-06-24 12:29:30+00:00