StartDate: 2021-11-20 11:50:48+00:00
CpuId: 64x Intel Xeon W 2000 / D-2100 (Skylake / Cascade Lake) {Skylake}, 14nm
CommitSHA: e36e3b6d4cf19f459fbc373f385ef161ce6d7aa8
CommitTime: 2021-11-20 12:09:54 +0100
CommitAuthor: Ole Schuett
CommitSubject: fm: Simplify initialization of seed variable

Trying to pull image cp2k-toolchain-mpich... success :-)
Trying to pull image cp2k-perf-openmp... image not found.

#################### Building Image cp2k-perf-openmp ####################
Dockerfile: /tools/docker/Dockerfile.test_performance
Build-Args: TOOLCHAIN=gcr.io/cp2k-org-project/img_cp2k-toolchain-mpich-arch-b51:gittree-d90c8ee-buildargs-68b329d
Sending build context to Docker daemon 77.31kB
Step 1/9 : ARG TOOLCHAIN=cp2k/toolchain:latest
Step 2/9 : FROM ${TOOLCHAIN}
 ---> 4391ba7f365d
Step 3/9 : WORKDIR /workspace
 ---> Running in 622156fe0445
Removing intermediate container 622156fe0445
 ---> 48ef93b23696
Step 4/9 : COPY ./scripts/install_basics.sh .
 ---> 7329d69569a7
Step 5/9 : RUN ./install_basics.sh
 ---> Running in 565ac2387039
Installing Ubuntu packages... debconf: delaying package configuration, since apt-utils is not installed
Selecting previously unselected package libpopt0:amd64.
(Reading database ... 15383 files and directories currently installed.)
Preparing to unpack .../libpopt0_1.16-14_amd64.deb ...
Unpacking libpopt0:amd64 (1.16-14) ...
Selecting previously unselected package rsync.
Preparing to unpack .../rsync_3.1.3-8ubuntu0.1_amd64.deb ...
Unpacking rsync (3.1.3-8ubuntu0.1) ...
Setting up libpopt0:amd64 (1.16-14) ...
Setting up rsync (3.1.3-8ubuntu0.1) ...
invoke-rc.d: could not determine current runlevel
invoke-rc.d: policy-rc.d denied execution of start.
Processing triggers for libc-bin (2.31-0ubuntu9.2) ...
done.
Cloning cp2k repository... done.
Removing intermediate container 565ac2387039
 ---> 6fcd16e3c8c8
Step 6/9 : COPY ./scripts/install_performance.sh .
 ---> edff43acc1ed
Step 7/9 : RUN ./install_performance.sh "local"
 ---> Running in d77692af0d80
'./local.pdbg' -> '/opt/cp2k-toolchain/install/arch/local.pdbg'
'./local.psmp' -> '/opt/cp2k-toolchain/install/arch/local.psmp'
'./local.sdbg' -> '/opt/cp2k-toolchain/install/arch/local.sdbg'
'./local.ssmp' -> '/opt/cp2k-toolchain/install/arch/local.ssmp'
'./local_coverage.pdbg' -> '/opt/cp2k-toolchain/install/arch/local_coverage.pdbg'
'./local_static.psmp' -> '/opt/cp2k-toolchain/install/arch/local_static.psmp'
'./local_static.ssmp' -> '/opt/cp2k-toolchain/install/arch/local_static.ssmp'
'./local_warn.psmp' -> '/opt/cp2k-toolchain/install/arch/local_warn.psmp'
Warming cache by trying to compile cp2k... done.
Removing intermediate container d77692af0d80
 ---> 6046b48ab10a
Step 8/9 : COPY ./scripts/ci_entrypoint.sh ./scripts/test_performance.sh ./scripts/plot_performance.py ./
 ---> c873ba58d6ab
Step 9/9 : CMD ["./ci_entrypoint.sh", "./test_performance.sh", "local"]
 ---> Running in 408994d3a615
Removing intermediate container 408994d3a615
 ---> a8a8f3daf633
Successfully built a8a8f3daf633
Successfully tagged gcr.io/cp2k-org-project/img_cp2k-perf-openmp-arch-b51:gittree-cfe3b56-buildargs-122d696
Pushing image cp2k-perf-openmp... done.

#################### Running Image cp2k-perf-openmp ####################

========== Fetching Git Commit ==========
CommitSHA: e36e3b6d4cf19f459fbc373f385ef161ce6d7aa8
CommitTime: 2021-11-20 12:09:54 +0100
CommitAuthor: Ole Schuett
CommitSubject: fm: Simplify initialization of seed variable

========== Running Test ==========

========== Compiling CP2K ==========
Compiling cp2k... done.

========== Running Performance Test ==========

Running H2O-64.inp with 1 threads and 32 ranks... done.
Running H2O-64.inp with 32 threads and 1 ranks... done.

From /workspace/artifacts/H2O-64_32omp.out:
-------------------------------------------------------------------------------
- - - T I M I N G - - -
-------------------------------------------------------------------------------
SUBROUTINE CALLS ASD SELF TIME TOTAL TIME
MAXIMUM AVERAGE MAXIMUM AVERAGE MAXIMUM
CP2K 1 1.0 0.036 0.036 155.260 155.260
qs_mol_dyn_low 1 2.0 0.004 0.004 154.487 154.487
qs_forces 11 3.9 0.001 0.001 154.432 154.432
qs_energies 11 4.9 0.001 0.001 144.340 144.340
scf_env_do_scf 11 5.9 0.001 0.001 117.306 117.306
velocity_verlet 10 3.0 0.002 0.002 106.791 106.791
scf_env_do_scf_inner_loop 108 6.5 0.009 0.009 79.796 79.796
init_scf_loop 11 6.9 0.000 0.000 37.324 37.324
prepare_preconditioner 11 7.9 0.000 0.000 33.342 33.342
make_preconditioner 11 8.9 0.000 0.000 33.342 33.342
rebuild_ks_matrix 119 8.3 0.001 0.001 33.134 33.134
qs_ks_build_kohn_sham_matrix 119 9.3 0.017 0.017 33.133 33.133
make_full_inverse_cholesky 11 9.9 0.000 0.000 31.388 31.388
qs_ks_update_qs_env 119 7.6 0.001 0.001 30.943 30.943
qs_scf_new_mos 108 7.5 0.001 0.001 27.473 27.473
qs_scf_loop_do_ot 108 8.5 0.001 0.001 27.472 27.472
qs_rho_update_rho 119 7.7 0.001 0.001 27.345 27.345
calculate_rho_elec 119 8.7 1.538 1.538 27.345 27.345
ot_scf_mini 108 9.5 0.003 0.003 25.630 25.630
dbcsr_multiply_generic 2286 12.5 0.172 0.172 23.138 23.138
grid_collocate_task_list 119 9.7 21.356 21.356 21.356 21.356
sum_up_and_integrate 119 10.3 0.396 0.396 21.246 21.246
integrate_v_rspace 119 11.3 0.554 0.554 20.849 20.849
cp_fm_cholesky_invert 11 10.9 19.244 19.244 19.244 19.244
grid_integrate_task_list 119 12.3 17.910 17.910 17.910 17.910
ot_mini 108 10.5 0.001 0.001 15.115 15.115
init_scf_run 11 5.9 0.001 0.001 12.992 12.992
scf_env_initial_rho_setup 11 6.9 0.001 0.001 12.991 12.991
make_m2s 4572 13.5 0.065 0.065 12.929 12.929
wfi_extrapolate 11 7.9 0.001 0.001 12.233 12.233
cp_gemm 81 9.0 0.000 0.000 10.603 10.603
cp_gemm_cosma 81 10.0 10.603 10.603 10.603 10.603
qs_energies_init_hamiltonians 11 5.9 0.000 0.000 10.340 10.340
qs_ot_get_derivative 108 11.5 0.001 0.001 7.811 7.811
pw_transfer 1439 11.6 0.087 0.087 7.510 7.510
ot_diis_step 108 11.5 0.006 0.006 7.300 7.300
fft_wrap_pw1pw2 1201 12.6 0.010 0.010 7.209 7.209
make_images 4572 14.5 2.531 2.531 6.754 6.754
qs_ot_get_p 119 10.4 0.001 0.001 6.586 6.586
dbcsr_make_dense_low 5837 15.5 0.100 0.100 6.356 6.356
make_dense_data 5837 16.5 5.597 5.597 6.234 6.234
apply_preconditioner_dbcsr 119 12.6 0.000 0.000 6.207 6.207
apply_single 119 13.6 0.000 0.000 6.207 6.207
cp_fm_cholesky_decompose 22 10.9 6.153 6.153 6.153 6.153
build_core_hamiltonian_matrix_ 11 4.9 0.001 0.001 6.136 6.136
fft_wrap_pw1pw2_140 487 13.2 0.691 0.691 6.106 6.106
dbcsr_complete_redistribute 329 12.2 2.876 2.876 6.061 6.061
multiply_cannon 2286 13.5 0.957 0.957 5.729 5.729
qs_env_update_s_mstruct 11 6.9 0.000 0.000 5.721 5.721
dbcsr_make_images_dense 3978 14.8 0.027 0.027 5.719 5.719
dbcsr_copy 2102 12.0 0.284 0.284 5.480 5.480
qs_create_task_list 11 7.9 0.000 0.000 5.204 5.204
generate_qs_task_list 11 8.9 3.611 3.611 5.204 5.204
dbcsr_copy_into_existing 22 7.9 5.151 5.151 5.151 5.151
qs_ot_p2m_diag 50 11.0 0.220 0.220 5.035 5.035
copy_dbcsr_to_fm 153 11.3 0.003 0.003 4.950 4.950
pw_poisson_solve 119 10.3 1.993 1.993 4.631 4.631
cp_dbcsr_syevd 50 12.0 0.004 0.004 4.467 4.467
density_rs2pw 119 9.7 0.006 0.006 4.451 4.451
cp_fm_diag_elpa 50 13.0 0.000 0.000 4.307 4.307
cp_fm_diag_elpa_base 50 14.0 4.249 4.249 4.306 4.306
multiply_cannon_loop 2286 14.5 0.048 0.048 4.225 4.225
multiply_cannon_multrec 2286 15.5 4.109 4.109 4.176 4.176
transfer_dbcsr_to_fm 11 10.9 0.000 0.000 4.098 4.098
build_core_hamiltonian_matrix 11 6.9 0.001 0.001 4.080 4.080
qs_ks_update_qs_env_forces 11 4.9 0.000 0.000 3.954 3.954
qs_energies_compute_matrix_w 11 5.9 0.000 0.000 3.546 3.546
calculate_w_matrix_ot 11 6.9 0.008 0.008 3.546 3.546
copy_fm_to_dbcsr 176 11.2 0.002 0.002 3.263 3.263
fft3d_s 1202 14.6 3.150 3.150 3.157 3.157
-------------------------------------------------------------------------------

From /workspace/artifacts/H2O-64_32mpi.out:
-------------------------------------------------------------------------------
- - - T I M I N G - - -
-------------------------------------------------------------------------------
SUBROUTINE CALLS ASD SELF TIME TOTAL TIME
MAXIMUM AVERAGE MAXIMUM AVERAGE MAXIMUM
CP2K 1 1.0 0.009 0.012 67.300 67.301
qs_mol_dyn_low 1 2.0 0.005 0.005 67.178 67.184
qs_forces 11 3.9 0.002 0.002 67.122 67.122
qs_energies 11 4.9 0.001 0.001 62.381 62.383
scf_env_do_scf 11 5.9 0.001 0.001 56.467 56.468
scf_env_do_scf_inner_loop 108 6.5 0.003 0.011 52.379 52.379
velocity_verlet 10 3.0 0.002 0.003 40.026 40.028
rebuild_ks_matrix 119 8.3 0.001 0.001 26.584 26.628
qs_ks_build_kohn_sham_matrix 119 9.3 0.021 0.022 26.583 26.628
qs_ks_update_qs_env 119 7.6 0.001 0.001 23.634 23.676
sum_up_and_integrate 119 10.3 0.039 0.042 21.228 21.249
integrate_v_rspace 119 11.3 0.004 0.005 21.189 21.210
qs_rho_update_rho 119 7.7 0.001 0.001 20.861 20.866
calculate_rho_elec 119 8.7 0.047 0.049 20.860 20.866
grid_integrate_task_list 119 12.3 15.501 16.076 15.501 16.076
grid_collocate_task_list 119 9.7 15.267 15.985 15.267 15.985
dbcsr_multiply_generic 2286 12.5 0.128 0.129 14.852 15.002
qs_scf_new_mos 108 7.5 0.001 0.001 12.246 12.283
qs_scf_loop_do_ot 108 8.5 0.001 0.001 12.245 12.282
ot_scf_mini 108 9.5 0.003 0.003 11.490 11.522
multiply_cannon 2286 13.5 0.235 0.241 9.833 10.109
multiply_cannon_loop 2286 14.5 0.206 0.216 8.918 9.355
mp_waitall_1 169478 16.3 7.269 7.684 7.269 7.684
rs_pw_transfer 974 11.9 0.015 0.017 5.988 6.815
ot_mini 108 10.5 0.001 0.001 6.742 6.780
density_rs2pw 119 9.7 0.008 0.009 5.095 5.952
multiply_cannon_metrocomm3 18288 15.5 0.075 0.079 4.648 5.185
pw_transfer 1439 11.6 0.144 0.160 5.024 5.096
fft_wrap_pw1pw2 1201 12.6 0.013 0.015 4.748 4.822
potential_pw2rs 119 12.3 0.010 0.010 4.518 4.529
fft_wrap_pw1pw2_140 487 13.2 0.514 0.536 4.159 4.303
init_scf_loop 11 6.9 0.000 0.001 4.073 4.073
init_scf_run 11 5.9 0.000 0.001 4.050 4.051
scf_env_initial_rho_setup 11 6.9 0.000 0.001 4.050 4.050
wfi_extrapolate 11 7.9 0.001 0.001 3.655 3.655
fft3d_ps 1201 14.6 1.922 2.025 3.463 3.553
make_m2s 4572 13.5 0.074 0.076 3.458 3.507
qs_ot_get_derivative 108 11.5 0.001 0.001 3.410 3.440
multiply_cannon_multrec 18288 15.5 3.202 3.312 3.218 3.328
ot_diis_step 108 11.5 0.005 0.006 3.308 3.308
apply_preconditioner_dbcsr 119 12.6 0.000 0.000 3.216 3.284
apply_single 119 13.6 0.001 0.001 3.215 3.284
qs_ks_update_qs_env_forces 11 4.9 0.000 0.000 3.152 3.156
make_images 4572 14.5 0.191 0.194 2.795 2.856
mp_waitany 9880 13.7 2.034 2.824 2.034 2.824
rs_pw_transfer_RS2PW_140 130 11.5 0.497 0.538 1.854 2.693
rs_pw_transfer_PW2RS_140 130 13.9 1.081 1.148 2.291 2.322
mp_alltoall_d11v 2130 13.8 1.421 1.983 1.421 1.983
qs_ot_get_p 119 10.4 0.001 0.001 1.686 1.747
rs_gather_matrices 119 12.3 0.121 0.130 1.121 1.710
build_core_hamiltonian_matrix_ 11 4.9 0.001 0.001 1.434 1.552
make_images_data 4572 15.5 0.062 0.069 1.295 1.398
qs_energies_init_hamiltonians 11 5.9 0.000 0.001 1.391 1.391
prepare_preconditioner 11 7.9 0.000 0.000 1.344 1.362
make_preconditioner 11 8.9 0.000 0.000 1.344 1.362
-------------------------------------------------------------------------------

Plot: name="H2O-64_timings_32omp", title="Timings of H2O-64 with 32 OpenMP Threads", ylabel="time [s]"
PlotPoint: plot="H2O-64_timings_32omp", name="rest", label="rest", y=75.88499999999999, yerr=0.0
PlotPoint: plot="H2O-64_timings_32omp", name="grid_collocate_task_list", label="grid_collocate_task_list", y=21.356, yerr=0.0
PlotPoint: plot="H2O-64_timings_32omp", name="cp_fm_cholesky_invert", label="cp_fm_cholesky_invert", y=19.244, yerr=0.0
PlotPoint: plot="H2O-64_timings_32omp", name="grid_integrate_task_list", label="grid_integrate_task_list", y=17.91, yerr=0.0
PlotPoint: plot="H2O-64_timings_32omp", name="cp_gemm_cosma", label="cp_gemm_cosma", y=10.603, yerr=0.0
PlotPoint: plot="H2O-64_timings_32omp", name="cp_fm_cholesky_decompose", label="cp_fm_cholesky_decompose", y=6.153, yerr=0.0
PlotPoint: plot="H2O-64_timings_32omp", name="multiply_cannon_multrec", label="multiply_cannon_multrec", y=4.109, yerr=0.0
PlotPoint: plot="H2O-64_timings_32omp", name="mp_waitall_1", label="mp_waitall_1", y=0.0, yerr=0.0
PlotPoint: plot="H2O-64_timings_32omp", name="mp_waitany", label="mp_waitany", y=0.0, yerr=0.0

Plot: name="H2O-64_timings_32mpi", title="Timings of H2O-64 with 32 MPI Ranks", ylabel="time [s]"
PlotPoint: plot="H2O-64_timings_32mpi", name="rest", label="rest", y=24.027, yerr=0.0
PlotPoint: plot="H2O-64_timings_32mpi", name="grid_collocate_task_list", label="grid_collocate_task_list", y=15.267, yerr=0.0
PlotPoint: plot="H2O-64_timings_32mpi", name="cp_fm_cholesky_invert", label="cp_fm_cholesky_invert", y=0.0, yerr=0.0
PlotPoint: plot="H2O-64_timings_32mpi", name="grid_integrate_task_list", label="grid_integrate_task_list", y=15.501, yerr=0.0
PlotPoint: plot="H2O-64_timings_32mpi", name="cp_gemm_cosma", label="cp_gemm_cosma", y=0.0, yerr=0.0
PlotPoint: plot="H2O-64_timings_32mpi", name="cp_fm_cholesky_decompose", label="cp_fm_cholesky_decompose", y=0.0, yerr=0.0
PlotPoint: plot="H2O-64_timings_32mpi", name="multiply_cannon_multrec", label="multiply_cannon_multrec", y=3.202, yerr=0.0
PlotPoint: plot="H2O-64_timings_32mpi", name="mp_waitall_1", label="mp_waitall_1", y=7.269, yerr=0.0
PlotPoint: plot="H2O-64_timings_32mpi", name="mp_waitany", label="mp_waitany", y=2.034, yerr=0.0

Running H2O-64_nonortho.inp with 1 threads and 32 ranks... done.
Running H2O-64_nonortho.inp with 32 threads and 1 ranks... done.

From /workspace/artifacts/H2O-64_nonortho_32omp.out:
-------------------------------------------------------------------------------
- - - T I M I N G - - -
-------------------------------------------------------------------------------
SUBROUTINE CALLS ASD SELF TIME TOTAL TIME
MAXIMUM AVERAGE MAXIMUM AVERAGE MAXIMUM
CP2K 1 1.0 0.031 0.031 207.740 207.740
qs_mol_dyn_low 1 2.0 0.004 0.004 206.963 206.963
qs_forces 11 3.9 0.002 0.002 206.905 206.905
qs_energies 11 4.9 0.001 0.001 193.044 193.044
scf_env_do_scf 11 5.9 0.001 0.001 161.497 161.497
velocity_verlet 10 3.0 0.002 0.002 139.170 139.170
scf_env_do_scf_inner_loop 96 6.5 0.008 0.008 120.228 120.228
rebuild_ks_matrix 107 8.3 0.001 0.001 61.208 61.208
qs_ks_build_kohn_sham_matrix 107 9.3 0.017 0.017 61.207 61.207
qs_ks_update_qs_env 107 7.6 0.001 0.001 55.117 55.117
qs_rho_update_rho 107 7.7 0.001 0.001 53.455 53.455
calculate_rho_elec 107 8.7 1.382 1.382 53.454 53.454
sum_up_and_integrate 107 10.3 0.357 0.357 50.610 50.610
integrate_v_rspace 107 11.3 0.505 0.505 50.253 50.253
grid_collocate_task_list 107 9.7 48.197 48.197 48.197 48.197
grid_integrate_task_list 107 12.3 47.584 47.584 47.584 47.584
init_scf_loop 11 6.9 0.000 0.000 41.071 41.071
prepare_preconditioner 11 7.9 0.000 0.000 33.868 33.868
make_preconditioner 11 8.9 0.000 0.000 33.868 33.868
make_full_inverse_cholesky 11 9.9 0.000 0.000 31.969 31.969
qs_scf_new_mos 96 7.5 0.001 0.001 23.859 23.859
qs_scf_loop_do_ot 96 8.5 0.001 0.001 23.859 23.859
ot_scf_mini 96 9.5 0.003 0.003 22.226 22.226
dbcsr_multiply_generic 1966 12.4 0.151 0.151 20.264 20.264
cp_fm_cholesky_invert 11 10.9 19.262 19.262 19.262 19.262
init_scf_run 11 5.9 0.001 0.001 15.905 15.905
scf_env_initial_rho_setup 11 6.9 0.001 0.001 15.904 15.904
wfi_extrapolate 11 7.9 0.001 0.001 14.884 14.884
ot_mini 96 10.5 0.001 0.001 13.035 13.035
qs_energies_init_hamiltonians 11 5.9 0.000 0.000 11.895 11.895
make_m2s 3932 13.4 0.056 0.056 11.318 11.318
cp_gemm 81 9.0 0.000 0.000 10.527 10.527
cp_gemm_cosma 81 10.0 10.527 10.527 10.527 10.527
qs_ks_update_qs_env_forces 11 4.9 0.000 0.000 7.678 7.678
qs_env_update_s_mstruct 11 6.9 0.000 0.000 7.005 7.005
qs_ot_get_derivative 96 11.5 0.001 0.001 6.769 6.769
pw_transfer 1295 11.6 0.083 0.083 6.606 6.606
cp_fm_cholesky_decompose 22 10.9 6.570 6.570 6.570 6.570
qs_create_task_list 11 7.9 0.000 0.000 6.475 6.475
generate_qs_task_list 11 8.9 4.837 4.837 6.475 6.475
fft_wrap_pw1pw2 1081 12.6 0.009 0.009 6.344 6.344
ot_diis_step 96 11.5 0.005 0.005 6.263 6.263
build_core_hamiltonian_matrix_ 11 4.9 0.001 0.001 6.180 6.180
dbcsr_complete_redistribute 317 12.2 2.886 2.886 6.154 6.154
make_images 3932 14.4 2.314 2.314 6.004 6.004
qs_ot_get_p 107 10.4 0.001 0.001 5.775 5.775
dbcsr_copy 1855 11.9 0.261 0.261 5.628 5.628
dbcsr_make_dense_low 4961 15.5 0.086 0.086 5.483 5.483
apply_preconditioner_dbcsr 107 12.6 0.000 0.000 5.384 5.384
apply_single 107 13.6 0.001 0.001 5.384 5.384
make_dense_data 4961 16.5 4.868 4.868 5.378 5.378
fft_wrap_pw1pw2_140 439 13.2 0.561 0.561 5.350 5.350
dbcsr_copy_into_existing 22 7.9 5.323 5.323 5.324 5.324
copy_dbcsr_to_fm 147 11.2 0.003 0.003 5.131 5.131
multiply_cannon 1966 13.4 0.843 0.843 5.094 5.094
dbcsr_make_images_dense 3386 14.7 0.023 0.023 4.913 4.913
qs_ot_p2m_diag 44 11.0 0.185 0.185 4.457 4.457
build_core_hamiltonian_matrix 11 6.9 0.001 0.001 4.300 4.300
transfer_dbcsr_to_fm 11 10.9 0.000 0.000 4.288 4.288
-------------------------------------------------------------------------------

From /workspace/artifacts/H2O-64_nonortho_32mpi.out:
-------------------------------------------------------------------------------
- - - T I M I N G - - -
-------------------------------------------------------------------------------
SUBROUTINE CALLS ASD SELF TIME TOTAL TIME
MAXIMUM AVERAGE MAXIMUM AVERAGE MAXIMUM
CP2K 1 1.0 0.007 0.012 121.263 121.264
qs_mol_dyn_low 1 2.0 0.005 0.005 121.148 121.154
qs_forces 11 3.9 0.002 0.002 121.092 121.092
qs_energies 11 4.9 0.001 0.001 112.651 112.653
scf_env_do_scf 11 5.9 0.001 0.001 103.822 103.823
scf_env_do_scf_inner_loop 96 6.5 0.003 0.010 96.420 96.420
velocity_verlet 10 3.0 0.002 0.002 72.179 72.181
rebuild_ks_matrix 107 8.3 0.001 0.001 56.000 56.032
qs_ks_build_kohn_sham_matrix 107 9.3 0.019 0.020 55.999 56.031
sum_up_and_integrate 107 10.3 0.035 0.037 51.144 51.168
integrate_v_rspace 107 11.3 0.004 0.004 51.109 51.131
qs_ks_update_qs_env 107 7.6 0.001 0.001 49.351 49.381
qs_rho_update_rho 107 7.7 0.001 0.001 46.992 46.996
calculate_rho_elec 107 8.7 0.043 0.045 46.992 46.995
grid_integrate_task_list 107 12.3 45.427 46.078 45.427 46.078
grid_collocate_task_list 107 9.7 41.562 42.139 41.562 42.139
dbcsr_multiply_generic 1966 12.4 0.111 0.114 13.180 13.262
qs_scf_new_mos 96 7.5 0.001 0.001 10.665 10.704
qs_scf_loop_do_ot 96 8.5 0.001 0.001 10.664 10.703
ot_scf_mini 96 9.5 0.003 0.003 9.994 10.028
multiply_cannon 1966 13.4 0.201 0.206 8.764 9.018
multiply_cannon_loop 1966 14.4 0.179 0.194 7.965 8.284
init_scf_loop 11 6.9 0.000 0.000 7.386 7.387
rs_pw_transfer 878 11.9 0.014 0.015 5.945 7.140
init_scf_run 11 5.9 0.000 0.002 6.900 6.900
scf_env_initial_rho_setup 11 6.9 0.000 0.001 6.900 6.900
mp_waitall_1 146670 16.2 6.535 6.888 6.535 6.888
qs_ks_update_qs_env_forces 11 4.9 0.000 0.000 6.837 6.844
wfi_extrapolate 11 7.9 0.001 0.001 6.316 6.316
density_rs2pw 107 9.7 0.007 0.008 4.980 6.192
ot_mini 96 10.5 0.001 0.001 5.855 5.894
pw_transfer 1295 11.6 0.130 0.138 4.560 4.627
multiply_cannon_metrocomm3 15728 15.4 0.065 0.068 4.163 4.442
fft_wrap_pw1pw2 1081 12.6 0.012 0.013 4.310 4.380
potential_pw2rs 107 12.3 0.009 0.009 4.174 4.184
fft_wrap_pw1pw2_140 439 13.2 0.463 0.484 3.772 3.910
mp_waitany 8968 13.7 2.331 3.558 2.331 3.558
rs_pw_transfer_RS2PW_140 118 11.5 0.391 0.412 2.074 3.271
fft3d_ps 1081 14.6 1.746 1.869 3.144 3.240
make_m2s 3932 13.4 0.063 0.066 3.074 3.116
mp_alltoall_d11v 1998 13.7 1.747 3.044 1.747 3.044
multiply_cannon_multrec 15728 15.4 2.874 3.010 2.888 3.024
qs_ot_get_derivative 96 11.5 0.001 0.001 2.936 2.969
apply_preconditioner_dbcsr 107 12.6 0.000 0.000 2.860 2.906
apply_single 107 13.6 0.001 0.001 2.860 2.905
ot_diis_step 96 11.5 0.005 0.005 2.898 2.899
rs_gather_matrices 107 12.3 0.107 0.119 1.459 2.746
make_images 3932 14.4 0.167 0.171 2.502 2.550
-------------------------------------------------------------------------------

Plot: name="H2O-64_nonortho_timings_32omp", title="Timings of H2O-64_nonortho with 32 OpenMP Threads", ylabel="time [s]"
PlotPoint: plot="H2O-64_nonortho_timings_32omp", name="rest", label="rest", y=75.6, yerr=0.0
PlotPoint: plot="H2O-64_nonortho_timings_32omp", name="grid_collocate_task_list", label="grid_collocate_task_list", y=48.197, yerr=0.0
PlotPoint: plot="H2O-64_nonortho_timings_32omp", name="grid_integrate_task_list", label="grid_integrate_task_list", y=47.584, yerr=0.0
PlotPoint: plot="H2O-64_nonortho_timings_32omp", name="cp_fm_cholesky_invert", label="cp_fm_cholesky_invert", y=19.262, yerr=0.0
PlotPoint: plot="H2O-64_nonortho_timings_32omp", name="cp_gemm_cosma", label="cp_gemm_cosma", y=10.527, yerr=0.0
PlotPoint: plot="H2O-64_nonortho_timings_32omp", name="cp_fm_cholesky_decompose", label="cp_fm_cholesky_decompose", y=6.57, yerr=0.0
PlotPoint: plot="H2O-64_nonortho_timings_32omp", name="multiply_cannon_multrec", label="multiply_cannon_multrec", y=0.0, yerr=0.0
PlotPoint: plot="H2O-64_nonortho_timings_32omp", name="mp_waitany", label="mp_waitany", y=0.0, yerr=0.0
PlotPoint: plot="H2O-64_nonortho_timings_32omp", name="mp_waitall_1", label="mp_waitall_1", y=0.0, yerr=0.0

Plot: name="H2O-64_nonortho_timings_32mpi", title="Timings of H2O-64_nonortho with 32 MPI Ranks", ylabel="time [s]"
PlotPoint: plot="H2O-64_nonortho_timings_32mpi", name="rest", label="rest", y=22.534000000000006, yerr=0.0
PlotPoint: plot="H2O-64_nonortho_timings_32mpi", name="grid_collocate_task_list", label="grid_collocate_task_list", y=41.562, yerr=0.0
PlotPoint: plot="H2O-64_nonortho_timings_32mpi", name="grid_integrate_task_list", label="grid_integrate_task_list", y=45.427, yerr=0.0
PlotPoint: plot="H2O-64_nonortho_timings_32mpi", name="cp_fm_cholesky_invert", label="cp_fm_cholesky_invert", y=0.0, yerr=0.0
PlotPoint: plot="H2O-64_nonortho_timings_32mpi", name="cp_gemm_cosma", label="cp_gemm_cosma", y=0.0, yerr=0.0
PlotPoint: plot="H2O-64_nonortho_timings_32mpi", name="cp_fm_cholesky_decompose", label="cp_fm_cholesky_decompose", y=0.0, yerr=0.0
PlotPoint: plot="H2O-64_nonortho_timings_32mpi", name="multiply_cannon_multrec", label="multiply_cannon_multrec", y=2.874, yerr=0.0
PlotPoint: plot="H2O-64_nonortho_timings_32mpi", name="mp_waitany", label="mp_waitany", y=2.331, yerr=0.0
PlotPoint: plot="H2O-64_nonortho_timings_32mpi", name="mp_waitall_1", label="mp_waitall_1", y=6.535, yerr=0.0

Running H2O-hyb.inp with 1 threads and 32 ranks... done.
Running H2O-hyb.inp with 32 threads and 1 ranks... done.

From /workspace/artifacts/H2O-hyb_32omp.out:
-------------------------------------------------------------------------------
- - - T I M I N G - - -
-------------------------------------------------------------------------------
SUBROUTINE CALLS ASD SELF TIME TOTAL TIME
MAXIMUM AVERAGE MAXIMUM AVERAGE MAXIMUM
CP2K 1 1.0 0.500 0.500 252.332 252.332
qs_energies 1 2.0 0.000 0.000 251.003 251.003
scf_env_do_scf 1 3.0 0.000 0.000 248.624 248.624
qs_ks_update_qs_env 8 5.0 0.000 0.000 231.152 231.152
rebuild_ks_matrix 7 6.0 0.000 0.000 231.047 231.047
qs_ks_build_kohn_sham_matrix 7 7.0 0.002 0.002 231.047 231.047
hfx_ks_matrix 7 8.0 0.000 0.000 168.297 168.297
integrate_four_center 7 9.0 2.873 2.873 168.267 168.267
integrate_four_center_main 7 10.0 1.094 1.094 155.891 155.891
integrate_four_center_bin 449 11.0 154.796 154.796 154.796 154.796
scf_env_do_scf_inner_loop 7 4.0 0.001 0.001 142.448 142.448
init_scf_loop 1 4.0 0.000 0.000 106.160 106.160
cp_gemm 129 10.3 0.000 0.000 47.895 47.895
cp_gemm_cosma 129 11.3 47.895 47.895 47.895 47.895
admm_mo_calc_rho_aux 7 8.0 0.000 0.000 29.520 29.520
admm_fit_mo_coeffs 7 9.0 0.000 0.000 27.858 27.858
admm_mo_merge_derivs 7 8.0 0.000 0.000 24.504 24.504
merge_mo_derivs_diag 7 9.0 0.022 0.022 24.504 24.504
purify_mo_diag 7 10.0 0.001 0.001 16.136 16.136
prepare_preconditioner 1 5.0 0.000 0.000 13.615 13.615
make_preconditioner 1 6.0 0.000 0.000 13.615 13.615
fit_mo_coeffs 7 10.0 0.000 0.000 11.722 11.722
integrate_four_center_load 7 10.0 0.000 0.000 9.109 9.109
hfx_load_balance 1 11.0 0.002 0.002 9.109 9.109
arnoldi_normal_ev 11 9.3 0.002 0.002 8.234 8.234
estimate_cond_num 1 7.0 0.000 0.000 8.159 8.159
build_subspace 28 9.5 0.013 0.013 8.133 8.133
qs_vxc_create 14 8.0 0.000 0.000 5.107 5.107
xc_vxc_pw_create 14 9.0 0.912 0.912 5.107 5.107
-------------------------------------------------------------------------------

From /workspace/artifacts/H2O-hyb_32mpi.out:
-------------------------------------------------------------------------------
- - - T I M I N G - - -
-------------------------------------------------------------------------------
SUBROUTINE CALLS ASD SELF TIME TOTAL TIME
MAXIMUM AVERAGE MAXIMUM AVERAGE MAXIMUM
CP2K 1 1.0 0.205 0.211 182.228 182.229
qs_energies 1 2.0 0.000 0.001 181.884 181.885
scf_env_do_scf 1 3.0 0.000 0.000 181.341 181.341
qs_ks_update_qs_env 8 5.0 0.000 0.000 178.542 178.543
rebuild_ks_matrix 7 6.0 0.000 0.000 178.530 178.530
qs_ks_build_kohn_sham_matrix 7 7.0 0.002 0.002 178.530 178.530
hfx_ks_matrix 7 8.0 0.000 0.000 167.831 167.834
integrate_four_center 7 9.0 0.094 0.410 167.816 167.818
integrate_four_center_main 7 10.0 0.004 0.005 153.274 156.862
integrate_four_center_bin 448 11.0 153.270 156.858 153.270 156.858
scf_env_do_scf_inner_loop 7 4.0 0.000 0.001 106.044 106.044
init_scf_loop 1 4.0 0.000 0.000 75.296 75.296
integrate_four_center_load 7 10.0 0.000 0.001 9.212 9.218
hfx_load_balance 1 11.0 0.001 0.002 9.212 9.218
mp_sync 70 11.3 4.496 6.724 4.496 6.724
hfx_load_balance_bin 1 12.0 4.406 4.623 4.406 4.623
hfx_load_balance_count 1 12.0 4.389 4.577 4.389 4.577
cp_gemm 129 10.3 0.000 0.000 3.724 3.730
cp_gemm_cosma 129 11.3 3.724 3.730 3.724 3.730
-------------------------------------------------------------------------------

Plot: name="H2O-hyb_timings_32omp", title="Timings of H2O-hyb with 32 OpenMP Threads", ylabel="time [s]"
PlotPoint: plot="H2O-hyb_timings_32omp", name="rest", label="rest", y=44.762, yerr=0.0
PlotPoint: plot="H2O-hyb_timings_32omp", name="integrate_four_center_bin", label="integrate_four_center_bin", y=154.796, yerr=0.0
PlotPoint: plot="H2O-hyb_timings_32omp", name="cp_gemm_cosma", label="cp_gemm_cosma", y=47.895, yerr=0.0
PlotPoint: plot="H2O-hyb_timings_32omp", name="integrate_four_center", label="integrate_four_center", y=2.873, yerr=0.0
PlotPoint: plot="H2O-hyb_timings_32omp", name="integrate_four_center_main", label="integrate_four_center_main", y=1.094, yerr=0.0
PlotPoint: plot="H2O-hyb_timings_32omp", name="xc_vxc_pw_create", label="xc_vxc_pw_create", y=0.912, yerr=0.0
PlotPoint: plot="H2O-hyb_timings_32omp", name="mp_sync", label="mp_sync", y=0.0, yerr=0.0
PlotPoint: plot="H2O-hyb_timings_32omp", name="hfx_load_balance_count", label="hfx_load_balance_count", y=0.0, yerr=0.0
PlotPoint: plot="H2O-hyb_timings_32omp", name="hfx_load_balance_bin", label="hfx_load_balance_bin", y=0.0, yerr=0.0

Plot: name="H2O-hyb_timings_32mpi", title="Timings of H2O-hyb with 32 MPI Ranks", ylabel="time [s]"
PlotPoint: plot="H2O-hyb_timings_32mpi", name="rest", label="rest", y=11.844999999999999, yerr=0.0
PlotPoint: plot="H2O-hyb_timings_32mpi", name="integrate_four_center_bin", label="integrate_four_center_bin", y=153.27, yerr=0.0
PlotPoint: plot="H2O-hyb_timings_32mpi", name="cp_gemm_cosma", label="cp_gemm_cosma", y=3.724, yerr=0.0
PlotPoint: plot="H2O-hyb_timings_32mpi", name="integrate_four_center", label="integrate_four_center", y=0.094, yerr=0.0
PlotPoint: plot="H2O-hyb_timings_32mpi", name="integrate_four_center_main", label="integrate_four_center_main", y=0.004, yerr=0.0
PlotPoint: plot="H2O-hyb_timings_32mpi", name="xc_vxc_pw_create", label="xc_vxc_pw_create", y=0.0, yerr=0.0
PlotPoint: plot="H2O-hyb_timings_32mpi", name="mp_sync", label="mp_sync", y=4.496, yerr=0.0
PlotPoint: plot="H2O-hyb_timings_32mpi", name="hfx_load_balance_count", label="hfx_load_balance_count", y=4.389, yerr=0.0
PlotPoint: plot="H2O-hyb_timings_32mpi", name="hfx_load_balance_bin", label="hfx_load_balance_bin", y=4.406, yerr=0.0

Running GW_PBE_4benzene.inp with 1 threads and 32 ranks... done.
Running GW_PBE_4benzene.inp with 32 threads and 1 ranks... done.

From /workspace/artifacts/GW_PBE_4benzene_32omp.out:
-------------------------------------------------------------------------------
- - - T I M I N G - - -
-------------------------------------------------------------------------------
SUBROUTINE CALLS ASD SELF TIME TOTAL TIME
MAXIMUM AVERAGE MAXIMUM AVERAGE MAXIMUM
CP2K 1 1.0 0.014 0.014 358.532 358.532
qs_energies 1 2.0 0.000 0.000 358.046 358.046
mp2_main 1 3.0 0.000 0.000 351.743 351.743
mp2_gpw_main 1 4.0 0.000 0.000 351.385 351.385
rpa_ri_compute_en 1 5.0 0.000 0.000 337.217 337.217
rpa_num_int 1 6.0 0.000 0.000 337.192 337.192
compute_mat_P_omega 1 7.0 0.002 0.002 205.767 205.767
compute_mat_P_omega_contract 10 8.0 11.834 11.834 204.524 204.524
dbcsr_t_total 2336 9.6 0.015 0.015 195.450 195.450
dbcsr_t_contract 787 11.0 46.110 46.110 124.409 124.409
cp_gemm 105 8.4 0.000 0.000 101.938 101.938
cp_gemm_cosma 105 9.4 101.938 101.938 101.938 101.938
compute_mat_P_omega_calc_M_occ 250 9.0 11.887 11.887 78.648 78.648
GW_matrix_operations 10 7.0 0.005 0.005 74.990 74.990
dbcsr_tas_total 1149 12.2 0.048 0.048 72.317 72.317
dbcsr_tas_multiply 807 12.1 0.003 0.003 70.927 70.927
dbcsr_t_copy 1103 10.7 19.558 19.558 69.629 69.629
dbcsr_multiply_generic 837 15.8 0.129 0.129 57.623 57.623
dbcsr_tas_dbcsr 807 14.1 0.003 0.003 57.199 57.199
compute_mat_P_omega_calc_M_vir 250 9.0 0.001 0.001 51.886 51.886
dbcsr_tas_mm_1N 524 15.1 0.002 0.002 45.360 45.360
multiply_cannon 837 16.8 18.258 18.258 44.810 44.810
rpa_num_int_RPA_matrix_operati 10 7.0 0.000 0.000 35.388 35.388
contract_P_omega_with_mat_L 10 8.0 0.000 0.000 33.598 33.598
dbcsr_tas_reserve_blocks_index 3261 13.7 7.178 7.178 26.728 26.728
multiply_cannon_loop 837 17.8 0.165 0.165 23.965 23.965
dbcsr_tas_copy 574 11.4 16.238 16.238 23.793 23.793
multiply_cannon_multrec 837 18.8 22.380 22.380 22.880 22.880
dbcsr_t_reserve_blocks_index 2280 12.5 1.237 1.237 20.469 20.469
dbcsr_t_reserve_blocks_index_a 2222 11.6 0.009 0.009 20.175 20.175
dbcsr_reserve_blocks 3717 14.7 18.875 18.875 19.252 19.252
compute_mat_P_omega_copy_M_occ 250 9.0 0.001 0.001 19.037 19.037
compute_QP_energies 1 7.0 0.000 0.000 18.536 18.536
compute_self_energy_cubic_gw 1 8.0 0.095 0.095 18.536 18.536
mp2_ri_gpw_compute_in 1 5.0 0.001 0.001 14.153 14.153
compute_mat_P_omega_copy_M_vir 250 9.0 0.002 0.002 13.946 13.946
dbcsr_t_copy_nocomm 251 12.0 10.855 10.855 13.200 13.200
compute_mat_P_omega_calc_P_t 250 9.0 0.001 0.001 11.772 11.772
make_m2s 1674 16.8 0.103 0.103 10.357 10.357
dbcsr_tas_mm_2 251 15.0 0.001 0.001 10.033 10.033
make_images 1674 17.8 4.841 4.841 9.808 9.808
cp_fm_cholesky_invert 10 8.0 8.987 8.987 8.987 8.987
dbcsr_finalize 9888 13.6 1.533 1.533 7.994 7.994
contract_cubic_gw 21 9.0 0.000 0.000 7.702 7.702
-------------------------------------------------------------------------------

From /workspace/artifacts/GW_PBE_4benzene_32mpi.out:
-------------------------------------------------------------------------------
- - - T I M I N G - - -
-------------------------------------------------------------------------------
SUBROUTINE CALLS ASD SELF TIME TOTAL TIME
MAXIMUM AVERAGE MAXIMUM AVERAGE MAXIMUM
CP2K 1 1.0 0.007 0.010 54.498 54.499
qs_energies 1 2.0 0.001 0.001 54.370 54.376
mp2_main 1 3.0 0.001 0.001 52.978 52.985
mp2_gpw_main 1 4.0 0.000 0.001 52.923 52.930
rpa_ri_compute_en 1 5.0 0.000 0.000 50.952 50.958
rpa_num_int 1 6.0 0.000 0.000 50.944 50.950
dbcsr_t_total 2336 9.6 0.016 0.017 40.023 40.024
compute_mat_P_omega 1 7.0 0.001 0.002 39.061 39.073
compute_mat_P_omega_contract 10 8.0 0.729 0.764 38.826 38.831
dbcsr_t_contract 787 11.0 1.848 2.017 29.428 29.434
dbcsr_tas_total 1149 12.2 0.061 0.065 25.785 25.786
dbcsr_tas_multiply 807 12.1 0.003 0.003 25.626 25.628
dbcsr_tas_dbcsr 807 14.1 0.003 0.004 18.613 18.613
dbcsr_multiply_generic 837 15.8 0.068 0.072 15.361 16.148
compute_mat_P_omega_calc_M_occ 250 9.0 0.706 0.741 13.148 13.148
multiply_cannon 837 16.8 0.127 0.142 8.970 9.408
compute_mat_P_omega_calc_P_t 250 9.0 0.001 0.001 9.354 9.354
dbcsr_t_copy 1111 10.7 4.036 4.278 9.018 9.321
dbcsr_tas_mm_1N 524 15.1 0.003 0.003 8.185 8.923
multiply_cannon_loop 837 17.8 0.039 0.042 8.158 8.554
compute_mat_P_omega_calc_M_vir 250 9.0 0.001 0.001 8.299 8.300
cp_gemm 105 8.4 0.000 0.000 7.429 7.446
cp_gemm_cosma 105 9.4 7.429 7.446 7.429 7.446
mp_sync 8696 11.6 6.318 7.434 6.318 7.434
dbcsr_tas_mm_2 251 15.0 0.002 0.002 7.030 7.030
multiply_cannon_multrec 1386 17.8 6.358 6.717 6.585 6.931
make_m2s 1674 16.8 0.042 0.045 5.494 5.985
make_images 1674 17.8 0.242 0.253 5.415 5.908
GW_matrix_operations 10 7.0 0.001 0.001 4.821 4.827
compute_QP_energies 1 7.0 0.000 0.001 4.012 4.012
compute_self_energy_cubic_gw 1 8.0 0.005 0.005 4.008 4.011
dbcsr_t_communicate_buffer 1098 11.7 0.091 0.101 3.318 3.450
mp_waitall_2 3776 14.7 3.111 3.332 3.111 3.332
contract_cubic_gw 21 9.0 0.000 0.000 3.007 3.007
make_images_data 1674 18.8 0.035 0.038 2.840 2.989
hybrid_alltoall_any 1724 19.5 2.186 2.480 2.733 2.867
dbcsr_t_reserve_blocks_index_a 2791 11.4 0.017 0.020 2.500 2.849
dbcsr_t_reserve_blocks_index 2849 12.4 0.102 0.108 2.496 2.847
dbcsr_tas_reserve_blocks_index 3300 13.8 0.271 0.293 2.450 2.797
rpa_num_int_RPA_matrix_operati 10 7.0 0.000 0.000 2.695 2.704
make_images_pack 1674 18.8 2.143 2.602 2.156 2.616
contract_P_omega_with_mat_L 10 8.0 0.000 0.000 2.584 2.592
dbcsr_reserve_blocks 3785 14.7 2.168 2.491 2.207 2.532
mp2_ri_gpw_compute_in 1 5.0 0.001 0.001 1.969 1.970
convert_to_new_pgrid 2421 14.1 0.017 0.019 1.749 1.866
dbcsr_copy 3323 15.8 1.687 1.810 1.716 1.839
mp_waitall_1 26582 19.0 1.474 1.835 1.474 1.835
compute_mat_P_omega_copy_M_vir 250 9.0 0.002 0.002 1.645 1.651
dbcsr_add_anytype 909 13.7 0.934 0.998 1.465 1.534
compute_mat_P_omega_copy_M_occ 250 9.0 0.001 0.002 1.463 1.468
dbcsr_tas_replicate 396 14.1 0.797 0.877 1.333 1.414
scf_env_do_scf 1 3.0 0.000 0.000 1.329 1.329
scf_env_do_scf_inner_loop 17 4.0 0.000 0.002 1.329 1.329
mp_max_i 2058 9.6 0.996 1.251 0.996 1.251
-------------------------------------------------------------------------------

Plot: name="GW_PBE_4benzene_timings_32omp", title="Timings of GW_PBE_4benzene with 32 OpenMP Threads", ylabel="time [s]"
PlotPoint: plot="GW_PBE_4benzene_timings_32omp", name="rest", label="rest", y=149.671, yerr=0.0
PlotPoint: plot="GW_PBE_4benzene_timings_32omp", name="cp_gemm_cosma", label="cp_gemm_cosma", y=101.938, yerr=0.0
PlotPoint: plot="GW_PBE_4benzene_timings_32omp", name="dbcsr_t_contract", label="dbcsr_t_contract", y=46.11, yerr=0.0
PlotPoint: plot="GW_PBE_4benzene_timings_32omp", name="multiply_cannon_multrec", label="multiply_cannon_multrec", y=22.38, yerr=0.0
PlotPoint: plot="GW_PBE_4benzene_timings_32omp", name="dbcsr_t_copy", label="dbcsr_t_copy", y=19.558, yerr=0.0
PlotPoint: plot="GW_PBE_4benzene_timings_32omp", name="dbcsr_reserve_blocks", label="dbcsr_reserve_blocks", y=18.875, yerr=0.0
PlotPoint: plot="GW_PBE_4benzene_timings_32omp", name="mp_waitall_2", label="mp_waitall_2", y=0.0, yerr=0.0
PlotPoint: plot="GW_PBE_4benzene_timings_32omp", name="mp_sync", label="mp_sync", y=0.0, yerr=0.0

Plot: name="GW_PBE_4benzene_timings_32mpi", title="Timings of GW_PBE_4benzene with 32 MPI Ranks", ylabel="time [s]"
PlotPoint: plot="GW_PBE_4benzene_timings_32mpi", name="rest", label="rest", y=23.229999999999997, yerr=0.0
PlotPoint: plot="GW_PBE_4benzene_timings_32mpi", name="cp_gemm_cosma", label="cp_gemm_cosma", y=7.429, yerr=0.0
PlotPoint: plot="GW_PBE_4benzene_timings_32mpi", name="dbcsr_t_contract", label="dbcsr_t_contract", y=1.848, yerr=0.0
PlotPoint: plot="GW_PBE_4benzene_timings_32mpi", name="multiply_cannon_multrec", label="multiply_cannon_multrec", y=6.358, yerr=0.0
PlotPoint: plot="GW_PBE_4benzene_timings_32mpi", name="dbcsr_t_copy", label="dbcsr_t_copy", y=4.036, yerr=0.0
PlotPoint: plot="GW_PBE_4benzene_timings_32mpi", name="dbcsr_reserve_blocks", label="dbcsr_reserve_blocks", y=2.168, yerr=0.0
PlotPoint: plot="GW_PBE_4benzene_timings_32mpi", name="mp_waitall_2",
label="mp_waitall_2", y=3.111, yerr=0.0 PlotPoint: plot="GW_PBE_4benzene_timings_32mpi", name="mp_sync", label="mp_sync", y=6.318, yerr=0.0 Running diag_cu144_broy.inp with 1 threads and 32 ranks... done. Running diag_cu144_broy.inp with 32 threads and 1 ranks... done. From /workspace/artifacts/diag_cu144_broy_32omp.out: ------------------------------------------------------------------------------- - - - T I M I N G - - - ------------------------------------------------------------------------------- SUBROUTINE CALLS ASD SELF TIME TOTAL TIME MAXIMUM AVERAGE MAXIMUM AVERAGE MAXIMUM CP2K 1 1.0 0.094 0.094 189.737 189.737 qs_energies 1 2.0 0.000 0.000 188.036 188.036 scf_env_do_scf 1 3.0 0.000 0.000 178.171 178.171 scf_env_do_scf_inner_loop 15 4.0 0.002 0.002 178.171 178.171 qs_scf_new_mos 15 5.0 0.000 0.000 78.909 78.909 qs_ks_update_qs_env 15 5.0 0.000 0.000 68.801 68.801 rebuild_ks_matrix 15 6.0 0.000 0.000 68.443 68.443 qs_ks_build_kohn_sham_matrix 15 7.0 0.002 0.002 68.443 68.443 eigensolver 15 6.0 0.002 0.002 65.502 65.502 cp_fm_diag_elpa 15 7.0 0.000 0.000 51.761 51.761 cp_fm_diag_elpa_base 15 8.0 47.163 47.163 51.761 51.761 qs_vxc_create 15 8.0 0.044 0.044 45.057 45.057 calculate_dispersion_nonloc 15 9.0 9.083 9.083 39.210 39.210 pw_transfer 1191 9.8 0.088 0.088 26.141 26.141 fft_wrap_pw1pw2 1086 10.9 0.013 0.013 25.857 25.857 qs_rho_update_rho 16 5.0 0.000 0.000 24.529 24.529 calculate_rho_elec 16 6.0 0.341 0.341 24.529 24.529 grid_collocate_task_list 16 7.0 23.015 23.015 23.015 23.015 sum_up_and_integrate 15 8.0 0.084 0.084 21.831 21.831 integrate_v_rspace 15 9.0 0.034 0.034 21.747 21.747 grid_integrate_task_list 15 10.0 21.139 21.139 21.139 21.139 fft_wrap_pw1pw2_150 765 12.0 3.385 3.385 19.685 19.685 fft3d_s 1087 12.8 10.666 10.666 10.677 10.677 copy_dbcsr_to_fm 16 5.9 0.001 0.001 10.475 10.475 pw_scatter_s 585 13.0 10.229 10.229 10.229 10.229 dbcsr_complete_redistribute 46 8.3 3.440 3.440 9.506 9.506 cp_fm_cholesky_restore 45 7.0 9.279 9.279 9.279 9.279 
cp_fm_upper_to_full 30 8.0 9.058 9.058 9.058 9.058 vdW_energy 15 10.0 8.103 8.103 8.103 8.103 gspace_mixing 14 5.0 0.273 0.273 7.226 7.226 broyden_mixing 14 6.0 6.500 6.500 6.500 6.500 fft_wrap_pw1pw2_200 197 11.5 0.335 0.335 5.918 5.918 xc_vxc_pw_create 15 9.0 1.633 1.633 5.803 5.803 qs_energies_init_hamiltonians 1 3.0 0.000 0.000 4.619 4.619 init_scf_run 1 3.0 0.000 0.000 4.470 4.470 dbcsr_finalize 159 9.9 0.025 0.025 4.166 4.166 dbcsr_merge_all 91 11.1 0.079 0.079 4.011 4.011 ------------------------------------------------------------------------------- From /workspace/artifacts/diag_cu144_broy_32mpi.out: ------------------------------------------------------------------------------- - - - T I M I N G - - - ------------------------------------------------------------------------------- SUBROUTINE CALLS ASD SELF TIME TOTAL TIME MAXIMUM AVERAGE MAXIMUM AVERAGE MAXIMUM CP2K 1 1.0 0.015 0.016 81.717 81.718 qs_energies 1 2.0 0.001 0.001 81.325 81.325 scf_env_do_scf 1 3.0 0.000 0.000 76.204 76.204 scf_env_do_scf_inner_loop 15 4.0 0.001 0.002 76.204 76.204 qs_ks_update_qs_env 15 5.0 0.000 0.000 37.248 37.265 rebuild_ks_matrix 15 6.0 0.000 0.000 37.201 37.217 qs_ks_build_kohn_sham_matrix 15 7.0 0.004 0.004 37.201 37.217 qs_rho_update_rho 16 5.0 0.000 0.000 23.169 23.171 calculate_rho_elec 16 6.0 0.011 0.012 23.169 23.171 sum_up_and_integrate 15 8.0 0.011 0.013 22.413 22.451 integrate_v_rspace 15 9.0 0.001 0.001 22.402 22.438 grid_collocate_task_list 16 7.0 21.407 22.119 21.407 22.119 grid_integrate_task_list 15 10.0 20.784 21.503 20.784 21.503 qs_scf_new_mos 15 5.0 0.001 0.001 16.309 16.347 eigensolver 15 6.0 0.002 0.002 14.973 14.982 qs_vxc_create 15 8.0 0.001 0.001 14.303 14.310 calculate_dispersion_nonloc 15 9.0 1.409 1.472 11.609 11.621 cp_fm_diag_elpa 15 7.0 0.000 0.000 10.992 10.999 cp_fm_diag_elpa_base 15 8.0 10.754 10.793 10.987 10.990 pw_transfer 1191 9.8 0.129 0.149 10.500 10.604 fft_wrap_pw1pw2 1086 10.9 0.021 0.023 10.219 10.334 fft3d_ps 1086 12.9 4.400 
4.584 7.662 7.818 fft_wrap_pw1pw2_150 765 12.0 0.630 0.685 6.745 6.814 cp_fm_cholesky_restore 45 7.0 3.755 3.807 3.755 3.807 fft_wrap_pw1pw2_200 197 11.5 0.345 0.369 3.340 3.416 qs_energies_init_hamiltonians 1 3.0 0.000 0.000 3.163 3.163 build_core_hamiltonian_matrix 1 4.0 0.000 0.000 2.716 2.989 xc_vxc_pw_create 15 9.0 0.050 0.070 2.693 2.715 mp_alltoall_z22v 1086 14.9 1.955 2.264 1.955 2.264 vdW_energy 15 10.0 2.077 2.172 2.077 2.172 rs_pw_transfer 158 9.4 0.002 0.003 1.697 2.127 build_core_ppnl 1 5.0 1.826 2.036 1.826 2.036 x_to_yz 585 14.0 0.819 0.863 1.833 1.896 density_rs2pw 16 7.0 0.001 0.002 1.610 1.814 mp_waitany 520 11.3 1.104 1.660 1.104 1.660 ------------------------------------------------------------------------------- Plot: name="diag_cu144_broy_timings_32omp", title="Timings of diag_cu144_broy with 32 OpenMP Threads", ylabel="time [s]" PlotPoint: plot="diag_cu144_broy_timings_32omp", name="rest", label="rest", y=68.24600000000001, yerr=0.0 PlotPoint: plot="diag_cu144_broy_timings_32omp", name="cp_fm_diag_elpa_base", label="cp_fm_diag_elpa_base", y=47.163, yerr=0.0 PlotPoint: plot="diag_cu144_broy_timings_32omp", name="grid_collocate_task_list", label="grid_collocate_task_list", y=23.015, yerr=0.0 PlotPoint: plot="diag_cu144_broy_timings_32omp", name="grid_integrate_task_list", label="grid_integrate_task_list", y=21.139, yerr=0.0 PlotPoint: plot="diag_cu144_broy_timings_32omp", name="fft3d_s", label="fft3d_s", y=10.666, yerr=0.0 PlotPoint: plot="diag_cu144_broy_timings_32omp", name="pw_scatter_s", label="pw_scatter_s", y=10.229, yerr=0.0 PlotPoint: plot="diag_cu144_broy_timings_32omp", name="cp_fm_cholesky_restore", label="cp_fm_cholesky_restore", y=9.279, yerr=0.0 PlotPoint: plot="diag_cu144_broy_timings_32omp", name="fft3d_ps", label="fft3d_ps", y=0.0, yerr=0.0 Plot: name="diag_cu144_broy_timings_32mpi", title="Timings of diag_cu144_broy with 32 MPI Ranks", ylabel="time [s]" PlotPoint: plot="diag_cu144_broy_timings_32mpi", name="rest", 
label="rest", y=20.616999999999997, yerr=0.0 PlotPoint: plot="diag_cu144_broy_timings_32mpi", name="cp_fm_diag_elpa_base", label="cp_fm_diag_elpa_base", y=10.754, yerr=0.0 PlotPoint: plot="diag_cu144_broy_timings_32mpi", name="grid_collocate_task_list", label="grid_collocate_task_list", y=21.407, yerr=0.0 PlotPoint: plot="diag_cu144_broy_timings_32mpi", name="grid_integrate_task_list", label="grid_integrate_task_list", y=20.784, yerr=0.0 PlotPoint: plot="diag_cu144_broy_timings_32mpi", name="fft3d_s", label="fft3d_s", y=0.0, yerr=0.0 PlotPoint: plot="diag_cu144_broy_timings_32mpi", name="pw_scatter_s", label="pw_scatter_s", y=0.0, yerr=0.0 PlotPoint: plot="diag_cu144_broy_timings_32mpi", name="cp_fm_cholesky_restore", label="cp_fm_cholesky_restore", y=3.755, yerr=0.0 PlotPoint: plot="diag_cu144_broy_timings_32mpi", name="fft3d_ps", label="fft3d_ps", y=4.4, yerr=0.0 Running bench_dftb.inp with 1 threads and 32 ranks... done. Running bench_dftb.inp with 32 threads and 1 ranks... done. From /workspace/artifacts/bench_dftb_32omp.out: ------------------------------------------------------------------------------- - - - T I M I N G - - - ------------------------------------------------------------------------------- SUBROUTINE CALLS ASD SELF TIME TOTAL TIME MAXIMUM AVERAGE MAXIMUM AVERAGE MAXIMUM CP2K 1 1.0 0.084 0.084 329.947 329.947 qs_energies 1 2.0 0.000 0.000 329.792 329.792 ls_scf 1 3.0 0.000 0.000 328.044 328.044 ls_scf_main 1 4.0 0.002 0.002 312.640 312.640 density_matrix_trs4 11 5.0 0.011 0.011 173.873 173.873 ls_scf_dm_to_ks 11 5.0 0.000 0.000 132.329 132.329 matrix_ls_to_qs 11 6.0 0.000 0.000 127.988 127.988 dbcsr_multiply_generic 185 6.1 0.465 0.465 108.585 108.585 dbcsr_copy_into_existing 11 7.0 78.562 78.562 78.562 78.562 multiply_cannon 185 7.1 3.138 3.138 74.815 74.815 dbcsr_complete_redistribute 23 7.5 38.727 38.727 54.001 54.001 multiply_cannon_loop 185 8.1 0.394 0.394 53.964 53.964 multiply_cannon_multrec 185 9.1 51.817 51.817 51.868 51.868 
matrix_decluster 11 7.0 0.000 0.000 49.425 49.425 arnoldi_extremal 12 6.1 0.000 0.000 46.668 46.668 arnoldi_normal_ev 12 7.1 0.029 0.029 46.668 46.668 build_subspace 23 8.1 0.130 0.130 46.044 46.044 dbcsr_matrix_vector_mult 652 9.0 0.250 0.250 35.917 35.917 dbcsr_matrix_vector_mult_local 652 10.0 34.339 34.339 34.347 34.347 make_m2s 370 7.1 0.030 0.030 27.701 27.701 make_images 370 8.1 7.243 7.243 25.337 25.337 dbcsr_finalize 646 7.5 0.220 0.220 20.867 20.867 dbcsr_merge_all 597 8.5 3.681 3.681 18.847 18.847 setup_rec_index_2d 370 8.1 17.566 17.566 17.566 17.566 dbcsr_sort_indices 1103 9.9 14.243 14.243 14.243 14.243 ls_scf_init_scf 1 4.0 0.000 0.000 14.131 14.131 ls_scf_init_matrix_S 1 5.0 0.000 0.000 13.702 13.702 tree_to_linear_d 110 9.4 12.989 12.989 12.989 12.989 matrix_sqrt_Newton_Schulz 1 6.0 0.001 0.001 12.887 12.887 quick_finalize 395 10.0 0.473 0.473 12.140 12.140 dbcsr_special_finalize 370 9.1 0.003 0.003 11.192 11.192 dbcsr_dot_sd 144 6.3 8.748 8.748 8.749 8.749 dbcsr_new_transposed 2 7.0 0.138 0.138 7.723 7.723 dbcsr_frobenius_norm 142 6.1 7.720 7.720 7.722 7.722 dbcsr_redistribute 2 8.0 7.478 7.478 7.548 7.548 make_images_data 370 9.1 0.009 0.009 6.733 6.733 matrix_qs_to_ls 12 5.1 0.000 0.000 6.693 6.693 matrix_cluster 12 6.1 0.000 0.000 6.693 6.693 ------------------------------------------------------------------------------- From /workspace/artifacts/bench_dftb_32mpi.out: ------------------------------------------------------------------------------- - - - T I M I N G - - - ------------------------------------------------------------------------------- SUBROUTINE CALLS ASD SELF TIME TOTAL TIME MAXIMUM AVERAGE MAXIMUM AVERAGE MAXIMUM CP2K 1 1.0 0.013 0.015 89.004 89.005 qs_energies 1 2.0 0.000 0.000 88.896 88.896 ls_scf 1 3.0 0.000 0.000 88.786 88.787 ls_scf_main 1 4.0 0.001 0.002 85.247 85.247 density_matrix_trs4 11 5.0 0.008 0.013 81.638 81.717 dbcsr_multiply_generic 185 6.1 0.072 0.082 76.653 76.878 multiply_cannon 185 7.1 0.038 0.041 63.639 
64.744 multiply_cannon_loop 185 8.1 0.192 0.205 60.019 61.871 multiply_cannon_multrec 1480 9.1 38.797 40.810 39.246 41.269 mp_waitall_1 11936 10.3 19.194 22.606 19.194 22.606 multiply_cannon_metrocomm3 1480 9.1 0.016 0.018 11.209 15.741 make_m2s 370 7.1 0.033 0.036 8.680 8.761 make_images 370 8.1 0.693 0.724 8.562 8.647 multiply_cannon_metrocomm1 1480 9.1 0.010 0.011 4.802 7.614 calculate_norms 2960 9.1 4.493 4.757 4.493 4.757 make_images_data 370 9.1 0.012 0.012 3.540 3.818 arnoldi_extremal 12 6.1 0.000 0.001 3.672 3.682 arnoldi_normal_ev 12 7.1 0.002 0.008 3.672 3.682 mp_sum_l 1039 5.9 3.073 3.569 3.073 3.569 build_subspace 23 8.1 0.037 0.049 3.550 3.553 ls_scf_dm_to_ks 11 5.0 0.000 0.000 3.123 3.179 hybrid_alltoall_any 393 9.9 0.290 1.459 2.864 3.175 dbcsr_matrix_vector_mult 652 9.0 0.018 0.078 3.007 3.130 dbcsr_complete_redistribute 23 7.5 1.754 1.817 2.780 2.850 matrix_ls_to_qs 11 6.0 0.000 0.000 2.771 2.841 ls_scf_init_scf 1 4.0 0.000 0.000 2.724 2.724 ls_scf_init_matrix_S 1 5.0 0.000 0.000 2.690 2.696 dbcsr_matrix_vector_mult_local 652 10.0 2.433 2.663 2.437 2.666 dbcsr_multiply_generic_mpsum_f 137 7.1 0.000 0.001 2.121 2.584 matrix_decluster 11 7.0 0.000 0.000 2.508 2.578 matrix_sqrt_Newton_Schulz 1 6.0 0.001 0.001 2.460 2.463 make_images_pack 370 9.1 2.280 2.446 2.284 2.450 buffer_matrices_ensure_size 370 8.1 2.057 2.185 2.057 2.185 dbcsr_add_d 280 6.0 0.001 0.002 1.903 1.979 dbcsr_add_anytype 280 7.0 1.009 1.074 1.902 1.978 dbcsr_finalize 646 7.5 0.013 0.014 1.852 1.944 ------------------------------------------------------------------------------- Plot: name="bench_dftb_timings_32omp", title="Timings of bench_dftb with 32 OpenMP Threads", ylabel="time [s]" PlotPoint: plot="bench_dftb_timings_32omp", name="rest", label="rest", y=108.936, yerr=0.0 PlotPoint: plot="bench_dftb_timings_32omp", name="dbcsr_copy_into_existing", label="dbcsr_copy_into_existing", y=78.562, yerr=0.0 PlotPoint: plot="bench_dftb_timings_32omp", name="multiply_cannon_multrec", 
label="multiply_cannon_multrec", y=51.817, yerr=0.0 PlotPoint: plot="bench_dftb_timings_32omp", name="dbcsr_complete_redistribute", label="dbcsr_complete_redistribute", y=38.727, yerr=0.0 PlotPoint: plot="bench_dftb_timings_32omp", name="dbcsr_matrix_vector_mult_local", label="dbcsr_matrix_vector_mult_local", y=34.339, yerr=0.0 PlotPoint: plot="bench_dftb_timings_32omp", name="setup_rec_index_2d", label="setup_rec_index_2d", y=17.566, yerr=0.0 PlotPoint: plot="bench_dftb_timings_32omp", name="calculate_norms", label="calculate_norms", y=0.0, yerr=0.0 PlotPoint: plot="bench_dftb_timings_32omp", name="mp_waitall_1", label="mp_waitall_1", y=0.0, yerr=0.0 PlotPoint: plot="bench_dftb_timings_32omp", name="mp_sum_l", label="mp_sum_l", y=0.0, yerr=0.0 Plot: name="bench_dftb_timings_32mpi", title="Timings of bench_dftb with 32 MPI Ranks", ylabel="time [s]" PlotPoint: plot="bench_dftb_timings_32mpi", name="rest", label="rest", y=19.26000000000002, yerr=0.0 PlotPoint: plot="bench_dftb_timings_32mpi", name="dbcsr_copy_into_existing", label="dbcsr_copy_into_existing", y=0.0, yerr=0.0 PlotPoint: plot="bench_dftb_timings_32mpi", name="multiply_cannon_multrec", label="multiply_cannon_multrec", y=38.797, yerr=0.0 PlotPoint: plot="bench_dftb_timings_32mpi", name="dbcsr_complete_redistribute", label="dbcsr_complete_redistribute", y=1.754, yerr=0.0 PlotPoint: plot="bench_dftb_timings_32mpi", name="dbcsr_matrix_vector_mult_local", label="dbcsr_matrix_vector_mult_local", y=2.433, yerr=0.0 PlotPoint: plot="bench_dftb_timings_32mpi", name="setup_rec_index_2d", label="setup_rec_index_2d", y=0.0, yerr=0.0 PlotPoint: plot="bench_dftb_timings_32mpi", name="calculate_norms", label="calculate_norms", y=4.493, yerr=0.0 PlotPoint: plot="bench_dftb_timings_32mpi", name="mp_waitall_1", label="mp_waitall_1", y=19.194, yerr=0.0 PlotPoint: plot="bench_dftb_timings_32mpi", name="mp_sum_l", label="mp_sum_l", y=3.073, yerr=0.0 Running dbcsr.inp with 1 threads and 32 ranks... done. 
Running dbcsr.inp with 32 threads and 1 ranks... done.

From /workspace/artifacts/dbcsr_32omp.out:
-------------------------------------------------------------------------------
-                                                                             -
-                                T I M I N G                                  -
-                                                                             -
-------------------------------------------------------------------------------
SUBROUTINE                       CALLS  ASD         SELF TIME        TOTAL TIME
                               MAXIMUM       AVERAGE  MAXIMUM  AVERAGE  MAXIMUM
CP2K                                 1  1.0    0.006    0.006  107.533  107.533
lib_test                             1  2.0    0.000    0.000  107.526  107.526
dbcsr_run_tests                      3  3.0    0.003    0.003  107.526  107.526
test_multiplies_multiproc            3  4.0    0.001    0.001   88.152   88.152
dbcsr_redistribute                   9  5.0   61.091   61.091   64.492   64.492
dbcsr_multiply_generic               9  5.0    0.001    0.001   21.850   21.850
dbcsr_make_random_matrix             9  4.0   14.069   14.069   19.286   19.286
multiply_cannon                      9  6.0    0.002    0.002   15.261   15.261
multiply_cannon_loop                 9  7.0    0.002    0.002   14.760   14.760
multiply_cannon_multrec              9  8.0   14.757   14.757   14.757   14.757
dbcsr_finalize                      27  5.7    0.004    0.004    9.007    9.007
dbcsr_merge_all                     18  6.5    3.238    3.238    8.284    8.284
tree_to_linear_d                     9  7.0    3.141    3.141    3.141    3.141
mp_alltoall_d11v                    27  6.0    3.065    3.065    3.065    3.065
dbcsr_data_release                 975  7.6    2.429    2.429    2.429    2.429
make_m2s                            18  6.0    0.001    0.001    2.233    2.233
make_images                         18  7.0    0.733    0.733    2.164    2.164
-------------------------------------------------------------------------------

From /workspace/artifacts/dbcsr_32mpi.out:
-------------------------------------------------------------------------------
-                                                                             -
-                                T I M I N G                                  -
-                                                                             -
-------------------------------------------------------------------------------
SUBROUTINE                       CALLS  ASD         SELF TIME        TOTAL TIME
                               MAXIMUM       AVERAGE  MAXIMUM  AVERAGE  MAXIMUM
CP2K                                 1  1.0    0.003    0.005   24.534   24.535
lib_test                             1  2.0    0.000    0.000   24.505   24.525
dbcsr_run_tests                      3  3.0    0.000    0.001   24.504   24.523
test_multiplies_multiproc            3  4.0    0.000    0.001   23.356   23.422
dbcsr_multiply_generic               9  5.0    0.002    0.002   21.470   21.564
multiply_cannon                      9  6.0    0.002    0.002   19.241   19.603
multiply_cannon_loop                 9  7.0    0.003    0.004   18.801   19.148
multiply_cannon_multrec             72  8.0   15.779   16.651   15.780   16.652
mp_waitall_1                       576  9.2    3.429    4.143    3.429    4.143
multiply_cannon_metrocomm1          72  8.0    0.001    0.002    2.673    3.179
dbcsr_make_random_matrix             9  4.0    0.891    0.908    1.107    1.142
make_m2s                            18  6.0    0.001    0.001    0.918    0.985
make_images                         18  7.0    0.026    0.028    0.915    0.982
mp_sum_l                           310  2.7    0.505    0.977    0.505    0.977
dbcsr_multiply_generic_mpsum_f       9  6.0    0.000    0.000    0.501    0.973
dbcsr_finalize                      27  5.7    0.000    0.001    0.848    0.954
dbcsr_merge_all                     18  6.5    0.139    0.163    0.751    0.846
dbcsr_data_release                 444  7.6    0.653    0.755    0.653    0.755
dbcsr_redistribute                   9  5.0    0.389    0.433    0.655    0.686
dbcsr_destroy                      111  5.9    0.009    0.056    0.567    0.670
multiply_cannon_metrocomm3          72  8.0    0.000    0.000    0.337    0.644
make_images_data                    18  8.0    0.001    0.001    0.465    0.540
dbcsr_data_copy_aa2                 18  7.5    0.443    0.531    0.443    0.531
-------------------------------------------------------------------------------

Plot: name="dbcsr_timings_32omp", title="Timings of dbcsr with 32 OpenMP Threads", ylabel="time [s]"
PlotPoint: plot="dbcsr_timings_32omp", name="rest", label="rest", y=8.807999999999993, yerr=0.0
PlotPoint: plot="dbcsr_timings_32omp", name="dbcsr_redistribute", label="dbcsr_redistribute", y=61.091, yerr=0.0
PlotPoint: plot="dbcsr_timings_32omp", name="multiply_cannon_multrec", label="multiply_cannon_multrec", y=14.757, yerr=0.0
PlotPoint: plot="dbcsr_timings_32omp", name="dbcsr_make_random_matrix", label="dbcsr_make_random_matrix", y=14.069, yerr=0.0
PlotPoint: plot="dbcsr_timings_32omp", name="dbcsr_merge_all", label="dbcsr_merge_all", y=3.238, yerr=0.0
PlotPoint: plot="dbcsr_timings_32omp", name="tree_to_linear_d", label="tree_to_linear_d", y=3.141, yerr=0.0
PlotPoint: plot="dbcsr_timings_32omp", name="dbcsr_data_release", label="dbcsr_data_release", y=2.429, yerr=0.0
PlotPoint: plot="dbcsr_timings_32omp", name="mp_sum_l", label="mp_sum_l", y=0.0, yerr=0.0
PlotPoint: plot="dbcsr_timings_32omp", name="mp_waitall_1", label="mp_waitall_1", y=0.0, yerr=0.0

Plot: name="dbcsr_timings_32mpi", title="Timings of dbcsr with 32 MPI Ranks", ylabel="time [s]"
PlotPoint: plot="dbcsr_timings_32mpi", name="rest", label="rest", y=2.749000000000006, yerr=0.0
PlotPoint: plot="dbcsr_timings_32mpi", name="dbcsr_redistribute", label="dbcsr_redistribute", y=0.389, yerr=0.0
PlotPoint: plot="dbcsr_timings_32mpi", name="multiply_cannon_multrec", label="multiply_cannon_multrec", y=15.779, yerr=0.0
PlotPoint: plot="dbcsr_timings_32mpi", name="dbcsr_make_random_matrix", label="dbcsr_make_random_matrix", y=0.891, yerr=0.0
PlotPoint: plot="dbcsr_timings_32mpi", name="dbcsr_merge_all", label="dbcsr_merge_all", y=0.139, yerr=0.0
PlotPoint: plot="dbcsr_timings_32mpi", name="tree_to_linear_d", label="tree_to_linear_d", y=0.0, yerr=0.0
PlotPoint: plot="dbcsr_timings_32mpi", name="dbcsr_data_release", label="dbcsr_data_release", y=0.653, yerr=0.0
PlotPoint: plot="dbcsr_timings_32mpi", name="mp_sum_l", label="mp_sum_l", y=0.505, yerr=0.0
PlotPoint: plot="dbcsr_timings_32mpi", name="mp_waitall_1", label="mp_waitall_1", y=3.429, yerr=0.0

Running MQAE_single_node.inp with 1 threads and 32 ranks... done.
Running MQAE_single_node.inp with 32 threads and 1 ranks... done.
From /workspace/artifacts/MQAE_single_node_32omp.out:
-------------------------------------------------------------------------------
-                                                                             -
-                                T I M I N G                                  -
-                                                                             -
-------------------------------------------------------------------------------
SUBROUTINE                       CALLS  ASD         SELF TIME        TOTAL TIME
                               MAXIMUM       AVERAGE  MAXIMUM  AVERAGE  MAXIMUM
CP2K                                 1  1.0    0.042    0.042  140.063  140.063
qs_mol_dyn_low                       1  2.0    0.005    0.005  138.226  138.226
velocity_verlet                      5  3.0    0.004    0.004  113.041  113.041
qmmm_el_coupling                     6  3.8    0.000    0.000   62.780   62.780
qmmm_elec_with_gaussian              6  4.8    0.189    0.189   62.774   62.774
qmmm_elec_with_gaussian_low          6  5.8    0.000    0.000   61.131   61.131
qmmm_elec_gaussian_low_G             6  6.8   59.766   59.766   59.766   59.766
qs_forces                            6  3.8    0.001    0.001   55.330   55.330
qs_energies                          6  4.8    0.000    0.000   49.238   49.238
scf_env_do_scf                       6  5.8    0.000    0.000   45.347   45.347
scf_env_do_scf_inner_loop           39  6.8    0.003    0.003   38.212   38.212
rebuild_ks_matrix                   45  8.4    0.000    0.000   38.018   38.018
qs_ks_build_kohn_sham_matrix        45  9.4    0.007    0.007   38.018   38.018
qs_ks_update_qs_env                 45  7.8    0.000    0.000   32.650   32.650
pw_transfer                        966 11.9    0.066    0.066   22.958   22.958
fft_wrap_pw1pw2                    801 13.0    0.008    0.008   22.615   22.615
fft_wrap_pw1pw2_150                507 14.3    2.420    2.420   22.106   22.106
qs_vxc_create                       45 10.4    0.001    0.001   20.639   20.639
xc_vxc_pw_create                    45 11.4    4.255    4.255   20.639   20.639
fist_calc_energy_force               6  3.8    0.002    0.002   10.443   10.443
pw_scatter_s                       429 15.4   10.310   10.310   10.310   10.310
qs_rho_update_rho                   45  7.9    0.000    0.000   10.102   10.102
calculate_rho_elec                  45  8.9    0.886    0.886   10.102   10.102
xc_rho_set_and_dset_create          45 12.4    0.237    0.237    9.512    9.512
force_nonbond                        6  4.8    9.263    9.263    9.263    9.263
qmmm_forces                          6  3.8    0.001    0.001    8.889    8.889
fft3d_s                            802 15.0    8.610    8.610    8.620    8.620
pw_integral_ab                    2539  7.4    8.556    8.556    8.556    8.556
qmmm_forces_with_gaussian            6  4.8    0.127    0.127    8.424    8.424
init_scf_loop                        6  6.8    0.000    0.000    7.130    7.130
qmmm_force_with_gaussian_low         6  5.8    0.000    0.000    6.505    6.505
qs_ks_ddapc                         45 10.4    0.001    0.001    6.465    6.465
qmmm_forces_gaussian_low_G           6  6.8    5.421    5.421    5.421    5.421
qs_ks_update_qs_env_forces           6  4.8    0.000    0.000    5.380    5.380
pw_poisson_solve                    51  9.9    2.255    2.255    5.179    5.179
grid_collocate_task_list            45  9.9    4.640    4.640    4.640    4.640
density_rs2pw                       45  9.9    0.002    0.002    4.576    4.576
sum_up_and_integrate                45 10.4    0.243    0.243    4.301    4.301
integrate_v_rspace                  45 11.4    0.012    0.012    4.058    4.058
cp_ddapc_apply_CD                   45 11.4    0.006    0.006    4.010    4.010
-------------------------------------------------------------------------------

From /workspace/artifacts/MQAE_single_node_32mpi.out:
-------------------------------------------------------------------------------
-                                                                             -
-                                T I M I N G                                  -
-                                                                             -
-------------------------------------------------------------------------------
SUBROUTINE                       CALLS  ASD         SELF TIME        TOTAL TIME
                               MAXIMUM       AVERAGE  MAXIMUM  AVERAGE  MAXIMUM
CP2K                                 1  1.0    0.034    0.036   82.612   82.614
qs_mol_dyn_low                       1  2.0    0.005    0.005   81.019   81.115
qs_forces                            6  3.8    0.001    0.001   58.916   58.916
qs_energies                          6  4.8    0.001    0.001   56.147   56.147
scf_env_do_scf                       6  5.8    0.000    0.001   54.704   54.704
scf_env_do_scf_inner_loop          113  6.2    0.002    0.009   52.540   52.541
rebuild_ks_matrix                  119  8.1    0.000    0.000   38.576   38.594
qs_ks_build_kohn_sham_matrix       119  9.1    0.020    0.022   38.576   38.594
qs_ks_update_qs_env                119  7.3    0.001    0.001   36.257   36.274
velocity_verlet                      5  3.0    0.002    0.003   34.321   34.326
pw_transfer                       2446 11.8    0.271    0.289   24.068   24.336
fft_wrap_pw1pw2                   2059 12.8    0.032    0.035   23.314   23.616
fft_wrap_pw1pw2_150               1321 14.0    2.079    2.244   22.674   22.937
qs_vxc_create                      119 10.1    0.003    0.004   19.335   19.341
xc_vxc_pw_create                   119 11.1    0.395    0.513   19.332   19.337
fft3d_ps                          2059 14.8   10.517   11.462   17.652   18.115
qs_rho_update_rho                  119  7.3    0.001    0.001   15.427   15.428
calculate_rho_elec                 119  8.3    0.086    0.095   15.426   15.427
sum_up_and_integrate               119 10.1    0.084    0.091   14.187   14.248
integrate_v_rspace                 119 11.1    0.004    0.005   14.103   14.168
qmmm_forces                          6  3.8    0.002    0.003   12.292   12.293
qmmm_forces_with_gaussian            6  4.8    0.370    0.459   11.868   12.072
rs_pw_transfer                     988 11.5    0.015    0.018   10.584   11.091
xc_rho_set_and_dset_create         119 12.1    0.498    0.586    9.352    9.753
density_rs2pw                      119  9.3    0.011    0.012    9.134    9.568
qmmm_el_coupling                     6  3.8    0.000    0.000    8.695    8.756
qmmm_elec_with_gaussian              6  4.8    0.320    0.450    8.692    8.753
potential_pw2rs                    119 12.1    0.011    0.012    8.200    8.209
grid_collocate_task_list           119  9.3    6.040    6.605    6.040    6.605
qmmm_force_with_gaussian_low         6  5.8    0.000    0.000    5.761    5.936
grid_integrate_task_list           119 12.1    5.552    5.840    5.552    5.840
mp_alltoall_z22v                  2059 16.8    4.370    5.726    4.370    5.726
qmmm_forces_gaussian_low_G           6  6.8    4.696    4.862    4.696    4.862
rs_pw_transfer_PW2RS_150           125 13.9    2.322    2.411    4.620    4.643
pw_restrict_s3                      18  5.8    2.096    2.142    4.535    4.585
rs_pw_transfer_RS2PW_150           125 11.2    1.872    2.016    3.784    4.304
x_to_yz                           1095 16.3    1.719    1.833    3.949    4.116
yz_to_x                            964 15.3    1.000    1.178    3.140    4.109
mp_waitany                        4028 12.8    3.319    4.058    3.319    4.058
qmmm_elec_with_gaussian:spline       6  5.8    0.000    0.000    3.676    3.739
pw_prolongate_s3                    18  6.8    1.688    1.709    3.676    3.739
pw_integral_ab                    2761  7.7    3.100    3.133    3.433    3.584
qs_scf_new_mos                     113  7.2    0.001    0.001    3.537    3.546
qs_scf_loop_do_ot                  113  8.2    0.001    0.001    3.536    3.546
qmmm_elec_with_gaussian_low          6  5.8    0.000    0.000    3.402    3.494
ot_scf_mini                        113  9.2    0.002    0.002    3.380    3.388
dbcsr_multiply_generic            2588 12.3    0.095    0.112    3.189    3.241
qs_ks_ddapc                        119 10.1    0.002    0.003    2.643    2.777
qmmm_elec_gaussian_low_G             6  6.8    2.474    2.555    2.474    2.555
mp_sum_dm3                          33  5.7    2.311    2.533    2.311    2.533
qs_ks_update_qs_env_forces           6  4.8    0.000    0.000    2.328    2.329
pw_gather_p                        964 14.3    1.818    2.271    1.818    2.271
init_scf_loop                        6  6.8    0.000    0.000    2.160    2.160
ot_mini                            113 10.2    0.001    0.001    2.134    2.146
mp_waitall_1                    188862 16.2    1.851    2.038    1.851    2.038
pw_derive                          732 12.5    1.592    1.757    1.592    1.757
pw_scatter_p                      1095 15.3    1.673    1.733    1.673    1.733
qs_ot_get_derivative               113 11.2    0.001    0.001    1.689    1.698
-------------------------------------------------------------------------------

Plot: name="MQAE_single_node_timings_32omp", title="Timings of MQAE_single_node with 32 OpenMP Threads", ylabel="time [s]"
PlotPoint: plot="MQAE_single_node_timings_32omp", name="rest", label="rest", y=33.496999999999986, yerr=0.0
PlotPoint: plot="MQAE_single_node_timings_32omp", name="qmmm_elec_gaussian_low_G", label="qmmm_elec_gaussian_low_G", y=59.766, yerr=0.0
PlotPoint: plot="MQAE_single_node_timings_32omp", name="pw_scatter_s", label="pw_scatter_s", y=10.31, yerr=0.0
PlotPoint: plot="MQAE_single_node_timings_32omp", name="force_nonbond", label="force_nonbond", y=9.263, yerr=0.0
PlotPoint: plot="MQAE_single_node_timings_32omp", name="fft3d_s", label="fft3d_s", y=8.61, yerr=0.0
PlotPoint: plot="MQAE_single_node_timings_32omp", name="pw_integral_ab", label="pw_integral_ab", y=8.556, yerr=0.0
PlotPoint: plot="MQAE_single_node_timings_32omp", name="qmmm_forces_gaussian_low_G", label="qmmm_forces_gaussian_low_G", y=5.421, yerr=0.0
PlotPoint: plot="MQAE_single_node_timings_32omp", name="grid_collocate_task_list", label="grid_collocate_task_list", y=4.64, yerr=0.0
PlotPoint: plot="MQAE_single_node_timings_32omp", name="grid_integrate_task_list", label="grid_integrate_task_list", y=0.0, yerr=0.0
PlotPoint: plot="MQAE_single_node_timings_32omp", name="fft3d_ps", label="fft3d_ps", y=0.0, yerr=0.0
PlotPoint: plot="MQAE_single_node_timings_32omp", name="mp_alltoall_z22v", label="mp_alltoall_z22v", y=0.0, yerr=0.0

Plot: name="MQAE_single_node_timings_32mpi", title="Timings of MQAE_single_node with 32 MPI Ranks", ylabel="time [s]"
PlotPoint: plot="MQAE_single_node_timings_32mpi", name="rest", label="rest", y=45.863, yerr=0.0
PlotPoint: plot="MQAE_single_node_timings_32mpi", name="qmmm_elec_gaussian_low_G", label="qmmm_elec_gaussian_low_G", y=2.474, yerr=0.0
PlotPoint: plot="MQAE_single_node_timings_32mpi", name="pw_scatter_s", label="pw_scatter_s", y=0.0, yerr=0.0
PlotPoint: plot="MQAE_single_node_timings_32mpi", name="force_nonbond", label="force_nonbond", y=0.0, yerr=0.0
PlotPoint: plot="MQAE_single_node_timings_32mpi", name="fft3d_s", label="fft3d_s", y=0.0, yerr=0.0
PlotPoint: plot="MQAE_single_node_timings_32mpi", name="pw_integral_ab", label="pw_integral_ab", y=3.1, yerr=0.0
PlotPoint: plot="MQAE_single_node_timings_32mpi", name="qmmm_forces_gaussian_low_G", label="qmmm_forces_gaussian_low_G", y=4.696, yerr=0.0
PlotPoint: plot="MQAE_single_node_timings_32mpi", name="grid_collocate_task_list", label="grid_collocate_task_list", y=6.04, yerr=0.0
PlotPoint: plot="MQAE_single_node_timings_32mpi", name="grid_integrate_task_list", label="grid_integrate_task_list", y=5.552, yerr=0.0
PlotPoint: plot="MQAE_single_node_timings_32mpi", name="fft3d_ps", label="fft3d_ps", y=10.517, yerr=0.0
PlotPoint: plot="MQAE_single_node_timings_32mpi", name="mp_alltoall_z22v", label="mp_alltoall_z22v", y=4.37, yerr=0.0

Summary: Performance test works fine.
Status: OK

Uploading artifacts... done
EndDate: 2021-11-20 12:45:17+00:00