StartDate: 2021-11-26 11:02:54+00:00 CpuId: 64x Intel Xeon W 2000 / D-2100 (Skylake / Cascade Lake) {Skylake}, 14nm CommitSHA: 092bd514a9f216825f0db50d7e1786e61e8af5b4 CommitTime: 2021-11-26 11:28:05 +0100 CommitAuthor: Matthias Krack CommitSubject: Adjust tolerance Trying to pull image cp2k-toolchain-mpich... success :-) Trying to pull image cp2k-perf-openmp... image not found. #################### Building Image cp2k-perf-openmp #################### Dockerfile: /tools/docker/Dockerfile.test_performance Build-Args: TOOLCHAIN=gcr.io/cp2k-org-project/img_cp2k-toolchain-mpich-arch-b51:gittree-6064eb4-buildargs-68b329d Sending build context to Docker daemon 77.31kB Step 1/9 : ARG TOOLCHAIN=cp2k/toolchain:latest Step 2/9 : FROM ${TOOLCHAIN} ---> 50b3b84820f3 Step 3/9 : WORKDIR /workspace ---> Running in e176928ffbe3 Removing intermediate container e176928ffbe3 ---> 7e92ca6eb6d1 Step 4/9 : COPY ./scripts/install_basics.sh . ---> b1fa2a29db4f Step 5/9 : RUN ./install_basics.sh ---> Running in 78a5de60145a Installing Ubuntu packages... debconf: delaying package configuration, since apt-utils is not installed Selecting previously unselected package libpopt0:amd64. (Reading database ... 15383 files and directories currently installed.) Preparing to unpack .../libpopt0_1.16-14_amd64.deb ... Unpacking libpopt0:amd64 (1.16-14) ... Selecting previously unselected package rsync. Preparing to unpack .../rsync_3.1.3-8ubuntu0.1_amd64.deb ... 
Unpacking rsync (3.1.3-8ubuntu0.1) ... Preparing to unpack .../wget_1.20.3-1ubuntu2_amd64.deb ... Unpacking wget (1.20.3-1ubuntu2) over (1.20.3-1ubuntu1) ... Setting up wget (1.20.3-1ubuntu2) ... Setting up libpopt0:amd64 (1.16-14) ... Setting up rsync (3.1.3-8ubuntu0.1) ... invoke-rc.d: could not determine current runlevel invoke-rc.d: policy-rc.d denied execution of start. Processing triggers for libc-bin (2.31-0ubuntu9.2) ... done. Cloning cp2k repository... done. Removing intermediate container 78a5de60145a ---> 41d29e4a55c9 Step 6/9 : COPY ./scripts/install_performance.sh . ---> f71a23abd257 Step 7/9 : RUN ./install_performance.sh "local" ---> Running in 13f2d27dc770 './local.pdbg' -> '/opt/cp2k-toolchain/install/arch/local.pdbg' './local.psmp' -> '/opt/cp2k-toolchain/install/arch/local.psmp' './local.sdbg' -> '/opt/cp2k-toolchain/install/arch/local.sdbg' './local.ssmp' -> '/opt/cp2k-toolchain/install/arch/local.ssmp' './local_coverage.pdbg' -> '/opt/cp2k-toolchain/install/arch/local_coverage.pdbg' './local_static.psmp' -> '/opt/cp2k-toolchain/install/arch/local_static.psmp' './local_static.ssmp' -> '/opt/cp2k-toolchain/install/arch/local_static.ssmp' './local_warn.psmp' -> '/opt/cp2k-toolchain/install/arch/local_warn.psmp' Warming cache by trying to compile cp2k... done. Removing intermediate container 13f2d27dc770 ---> bc7f6a1e4742 Step 8/9 : COPY ./scripts/ci_entrypoint.sh ./scripts/test_performance.sh ./scripts/plot_performance.py ./ ---> daac4c248ce9 Step 9/9 : CMD ["./ci_entrypoint.sh", "./test_performance.sh", "local"] ---> Running in 09575dfe4ede Removing intermediate container 09575dfe4ede ---> c38c670d97d3 Successfully built c38c670d97d3 Successfully tagged gcr.io/cp2k-org-project/img_cp2k-perf-openmp-arch-b51:gittree-cfe3b56-buildargs-69e6612 Pushing image cp2k-perf-openmp... done. 
#################### Running Image cp2k-perf-openmp #################### ========== Fetching Git Commit ========== CommitSHA: 092bd514a9f216825f0db50d7e1786e61e8af5b4 CommitTime: 2021-11-26 11:28:05 +0100 CommitAuthor: Matthias Krack CommitSubject: Adjust tolerance ========== Running Test ========== ========== Compiling CP2K ========== Compiling cp2k... done. ========== Running Performance Test ========== Running H2O-64.inp with 1 threads and 32 ranks... done. Running H2O-64.inp with 32 threads and 1 ranks... done. From /workspace/artifacts/H2O-64_32omp.out: ------------------------------------------------------------------------------- - - - T I M I N G - - - ------------------------------------------------------------------------------- SUBROUTINE CALLS ASD SELF TIME TOTAL TIME MAXIMUM AVERAGE MAXIMUM AVERAGE MAXIMUM CP2K 1 1.0 0.032 0.032 166.485 166.485 qs_mol_dyn_low 1 2.0 0.004 0.004 165.716 165.716 qs_forces 11 3.9 0.001 0.001 165.657 165.657 qs_energies 11 4.9 0.001 0.001 155.212 155.212 scf_env_do_scf 11 5.9 0.001 0.001 122.869 122.869 velocity_verlet 10 3.0 0.002 0.002 116.650 116.650 scf_env_do_scf_inner_loop 108 6.5 0.010 0.010 83.705 83.705 init_scf_loop 11 6.9 0.000 0.000 38.973 38.973 prepare_preconditioner 11 7.9 0.000 0.000 34.835 34.835 make_preconditioner 11 8.9 0.000 0.000 34.835 34.835 rebuild_ks_matrix 119 8.3 0.001 0.001 34.095 34.095 qs_ks_build_kohn_sham_matrix 119 9.3 0.019 0.019 34.094 34.094 make_full_inverse_cholesky 11 9.9 0.000 0.000 32.820 32.820 qs_ks_update_qs_env 119 7.6 0.001 0.001 31.722 31.722 qs_rho_update_rho 119 7.7 0.001 0.001 29.727 29.727 calculate_rho_elec 119 8.7 1.561 1.561 29.726 29.726 qs_scf_new_mos 108 7.5 0.001 0.001 28.593 28.593 qs_scf_loop_do_ot 108 8.5 0.001 0.001 28.592 28.592 ot_scf_mini 108 9.5 0.003 0.003 26.619 26.619 dbcsr_multiply_generic 2286 12.5 0.189 0.189 24.307 24.307 grid_collocate_task_list 119 9.7 23.063 23.063 23.063 23.063 sum_up_and_integrate 119 10.3 0.398 0.398 21.453 21.453 
integrate_v_rspace 119 11.3 0.534 0.534 21.055 21.055 cp_fm_cholesky_invert 11 10.9 19.484 19.484 19.484 19.484 grid_integrate_task_list 119 12.3 17.812 17.812 17.812 17.812 init_scf_run 11 5.9 0.001 0.001 16.592 16.592 scf_env_initial_rho_setup 11 6.9 0.001 0.001 16.591 16.591 wfi_extrapolate 11 7.9 0.001 0.001 15.734 15.734 ot_mini 108 10.5 0.001 0.001 15.683 15.683 cp_gemm 81 9.0 0.000 0.000 15.019 15.019 cp_gemm_cosma 81 10.0 15.019 15.019 15.019 15.019 make_m2s 4572 13.5 0.068 0.068 13.654 13.654 qs_energies_init_hamiltonians 11 5.9 0.000 0.000 10.825 10.825 pw_transfer 1439 11.6 0.098 0.098 8.681 8.681 fft_wrap_pw1pw2 1201 12.6 0.011 0.011 8.341 8.341 qs_ot_get_derivative 108 11.5 0.002 0.002 8.148 8.148 ot_diis_step 108 11.5 0.006 0.006 7.530 7.530 cp_fm_cholesky_decompose 22 10.9 7.280 7.280 7.280 7.280 make_images 4572 14.5 2.636 2.636 7.249 7.249 fft_wrap_pw1pw2_140 487 13.2 0.954 0.954 7.169 7.169 qs_ot_get_p 119 10.4 0.001 0.001 6.789 6.789 dbcsr_make_dense_low 5837 15.5 0.103 0.103 6.580 6.580 make_dense_data 5837 16.5 5.804 5.804 6.454 6.454 apply_preconditioner_dbcsr 119 12.6 0.000 0.000 6.433 6.433 apply_single 119 13.6 0.001 0.001 6.433 6.433 build_core_hamiltonian_matrix_ 11 4.9 0.001 0.001 6.279 6.279 dbcsr_complete_redistribute 329 12.2 2.935 2.935 6.234 6.234 dbcsr_make_images_dense 3978 14.8 0.029 0.029 5.923 5.923 qs_env_update_s_mstruct 11 6.9 0.000 0.000 5.898 5.898 multiply_cannon 2286 13.5 1.001 1.001 5.876 5.876 dbcsr_copy 2102 12.0 0.299 0.299 5.560 5.560 qs_create_task_list 11 7.9 0.000 0.000 5.330 5.330 generate_qs_task_list 11 8.9 3.577 3.577 5.330 5.330 dbcsr_copy_into_existing 22 7.9 5.212 5.212 5.212 5.212 qs_ot_p2m_diag 50 11.0 0.222 0.222 5.195 5.195 density_rs2pw 119 9.7 0.007 0.007 5.103 5.103 copy_dbcsr_to_fm 153 11.3 0.004 0.004 5.086 5.086 pw_poisson_solve 119 10.3 2.215 2.215 4.924 4.924 qs_energies_compute_matrix_w 11 5.9 0.000 0.000 4.747 4.747 calculate_w_matrix_ot 11 6.9 0.008 0.008 4.747 4.747 cp_dbcsr_syevd 50 12.0 
0.004 0.004 4.621 4.621 cp_fm_diag_elpa 50 13.0 0.001 0.001 4.455 4.455 cp_fm_diag_elpa_base 50 14.0 4.401 4.401 4.454 4.454 multiply_cannon_loop 2286 14.5 0.050 0.050 4.307 4.307 build_core_hamiltonian_matrix 11 6.9 0.001 0.001 4.301 4.301 multiply_cannon_multrec 2286 15.5 4.173 4.173 4.255 4.255 transfer_dbcsr_to_fm 11 10.9 0.000 0.000 4.207 4.207 qs_ks_update_qs_env_forces 11 4.9 0.000 0.000 4.163 4.163 fft3d_s 1202 14.6 3.570 3.570 3.577 3.577 copy_fm_to_dbcsr 176 11.2 0.002 0.002 3.355 3.355 ------------------------------------------------------------------------------- From /workspace/artifacts/H2O-64_32mpi.out: ------------------------------------------------------------------------------- - - - T I M I N G - - - ------------------------------------------------------------------------------- SUBROUTINE CALLS ASD SELF TIME TOTAL TIME MAXIMUM AVERAGE MAXIMUM AVERAGE MAXIMUM CP2K 1 1.0 0.009 0.014 71.731 71.732 qs_mol_dyn_low 1 2.0 0.005 0.007 71.600 71.606 qs_forces 11 3.9 0.002 0.002 71.544 71.544 qs_energies 11 4.9 0.001 0.002 66.660 66.662 scf_env_do_scf 11 5.9 0.001 0.001 60.003 60.003 scf_env_do_scf_inner_loop 108 6.5 0.003 0.010 55.550 55.551 velocity_verlet 10 3.0 0.002 0.002 42.920 42.921 rebuild_ks_matrix 119 8.3 0.001 0.001 27.838 27.895 qs_ks_build_kohn_sham_matrix 119 9.3 0.022 0.023 27.837 27.894 qs_ks_update_qs_env 119 7.6 0.001 0.001 24.763 24.819 qs_rho_update_rho 119 7.7 0.001 0.001 21.846 21.858 calculate_rho_elec 119 8.7 0.048 0.050 21.845 21.857 sum_up_and_integrate 119 10.3 0.046 0.049 21.734 21.779 integrate_v_rspace 119 11.3 0.005 0.005 21.688 21.734 dbcsr_multiply_generic 2286 12.5 0.134 0.138 16.706 16.835 grid_collocate_task_list 119 9.7 15.553 16.076 15.553 16.076 grid_integrate_task_list 119 12.3 15.484 16.015 15.484 16.015 qs_scf_new_mos 108 7.5 0.001 0.001 13.567 13.604 qs_scf_loop_do_ot 108 8.5 0.001 0.001 13.566 13.602 ot_scf_mini 108 9.5 0.003 0.004 12.724 12.759 multiply_cannon 2286 13.5 0.225 0.232 11.113 11.289 
multiply_cannon_loop 2286 14.5 0.224 0.236 10.098 10.483 mp_waitall_1 169478 16.3 8.385 8.807 8.385 8.807 rs_pw_transfer 974 11.9 0.016 0.018 6.654 7.564 ot_mini 108 10.5 0.001 0.002 7.461 7.498 density_rs2pw 119 9.7 0.009 0.010 5.707 6.632 pw_transfer 1439 11.6 0.135 0.143 5.829 5.896 multiply_cannon_metrocomm3 18288 15.5 0.080 0.083 5.428 5.866 fft_wrap_pw1pw2 1201 12.6 0.015 0.016 5.543 5.614 potential_pw2rs 119 12.3 0.010 0.011 5.150 5.159 fft_wrap_pw1pw2_140 487 13.2 0.572 0.601 4.851 5.047 init_scf_run 11 5.9 0.000 0.002 4.587 4.587 scf_env_initial_rho_setup 11 6.9 0.000 0.001 4.586 4.587 init_scf_loop 11 6.9 0.000 0.001 4.436 4.436 wfi_extrapolate 11 7.9 0.001 0.001 4.189 4.189 fft3d_ps 1201 14.6 2.286 2.422 4.096 4.160 make_m2s 4572 13.5 0.075 0.077 3.832 3.889 ot_diis_step 108 11.5 0.005 0.005 3.781 3.781 apply_preconditioner_dbcsr 119 12.6 0.000 0.000 3.729 3.764 apply_single 119 13.6 0.001 0.001 3.728 3.764 qs_ot_get_derivative 108 11.5 0.001 0.002 3.651 3.688 multiply_cannon_multrec 18288 15.5 3.502 3.594 3.519 3.612 qs_ks_update_qs_env_forces 11 4.9 0.000 0.000 3.298 3.305 make_images 4572 14.5 0.186 0.192 3.142 3.206 mp_waitany 9880 13.7 2.124 3.076 2.124 3.076 rs_pw_transfer_RS2PW_140 130 11.5 0.602 0.654 1.978 2.902 rs_pw_transfer_PW2RS_140 130 13.9 1.225 1.288 2.584 2.621 mp_alltoall_d11v 2130 13.8 1.356 2.044 1.356 2.044 qs_ot_get_p 119 10.4 0.001 0.001 1.780 1.827 rs_gather_matrices 119 12.3 0.129 0.141 0.998 1.705 cp_gemm 81 9.0 0.000 0.000 1.639 1.643 cp_gemm_cosma 81 10.0 1.639 1.643 1.639 1.643 make_images_data 4572 15.5 0.062 0.068 1.475 1.590 build_core_hamiltonian_matrix_ 11 4.9 0.001 0.001 1.439 1.552 prepare_preconditioner 11 7.9 0.000 0.000 1.449 1.460 make_preconditioner 11 8.9 0.000 0.000 1.449 1.460 qs_energies_init_hamiltonians 11 5.9 0.000 0.001 1.453 1.454 hybrid_alltoall_any 4725 16.4 0.124 0.468 1.320 1.435 ------------------------------------------------------------------------------- Plot: name="H2O-64_timings_32omp", 
title="Timings of H2O-64 with 32 OpenMP Threads", ylabel="time [s]" PlotPoint: plot="H2O-64_timings_32omp", name="rest", label="rest", y=79.65400000000001, yerr=0.0 PlotPoint: plot="H2O-64_timings_32omp", name="grid_collocate_task_list", label="grid_collocate_task_list", y=23.063, yerr=0.0 PlotPoint: plot="H2O-64_timings_32omp", name="cp_fm_cholesky_invert", label="cp_fm_cholesky_invert", y=19.484, yerr=0.0 PlotPoint: plot="H2O-64_timings_32omp", name="grid_integrate_task_list", label="grid_integrate_task_list", y=17.812, yerr=0.0 PlotPoint: plot="H2O-64_timings_32omp", name="cp_gemm_cosma", label="cp_gemm_cosma", y=15.019, yerr=0.0 PlotPoint: plot="H2O-64_timings_32omp", name="cp_fm_cholesky_decompose", label="cp_fm_cholesky_decompose", y=7.28, yerr=0.0 PlotPoint: plot="H2O-64_timings_32omp", name="multiply_cannon_multrec", label="multiply_cannon_multrec", y=4.173, yerr=0.0 PlotPoint: plot="H2O-64_timings_32omp", name="mp_waitall_1", label="mp_waitall_1", y=0.0, yerr=0.0 PlotPoint: plot="H2O-64_timings_32omp", name="fft3d_ps", label="fft3d_ps", y=0.0, yerr=0.0 Plot: name="H2O-64_timings_32mpi", title="Timings of H2O-64 with 32 MPI Ranks", ylabel="time [s]" PlotPoint: plot="H2O-64_timings_32mpi", name="rest", label="rest", y=24.88199999999999, yerr=0.0 PlotPoint: plot="H2O-64_timings_32mpi", name="grid_collocate_task_list", label="grid_collocate_task_list", y=15.553, yerr=0.0 PlotPoint: plot="H2O-64_timings_32mpi", name="cp_fm_cholesky_invert", label="cp_fm_cholesky_invert", y=0.0, yerr=0.0 PlotPoint: plot="H2O-64_timings_32mpi", name="grid_integrate_task_list", label="grid_integrate_task_list", y=15.484, yerr=0.0 PlotPoint: plot="H2O-64_timings_32mpi", name="cp_gemm_cosma", label="cp_gemm_cosma", y=1.639, yerr=0.0 PlotPoint: plot="H2O-64_timings_32mpi", name="cp_fm_cholesky_decompose", label="cp_fm_cholesky_decompose", y=0.0, yerr=0.0 PlotPoint: plot="H2O-64_timings_32mpi", name="multiply_cannon_multrec", label="multiply_cannon_multrec", y=3.502, yerr=0.0 
PlotPoint: plot="H2O-64_timings_32mpi", name="mp_waitall_1", label="mp_waitall_1", y=8.385, yerr=0.0 PlotPoint: plot="H2O-64_timings_32mpi", name="fft3d_ps", label="fft3d_ps", y=2.286, yerr=0.0 Running H2O-64_nonortho.inp with 1 threads and 32 ranks... done. Running H2O-64_nonortho.inp with 32 threads and 1 ranks... done. From /workspace/artifacts/H2O-64_nonortho_32omp.out: ------------------------------------------------------------------------------- - - - T I M I N G - - - ------------------------------------------------------------------------------- SUBROUTINE CALLS ASD SELF TIME TOTAL TIME MAXIMUM AVERAGE MAXIMUM AVERAGE MAXIMUM CP2K 1 1.0 0.035 0.035 216.244 216.244 qs_mol_dyn_low 1 2.0 0.004 0.004 215.465 215.465 qs_forces 11 3.9 0.001 0.001 215.408 215.408 qs_energies 11 4.9 0.001 0.001 200.994 200.994 scf_env_do_scf 11 5.9 0.001 0.001 164.514 164.514 velocity_verlet 10 3.0 0.002 0.002 146.739 146.739 scf_env_do_scf_inner_loop 96 6.5 0.009 0.009 121.841 121.841 rebuild_ks_matrix 107 8.3 0.001 0.001 61.586 61.586 qs_ks_build_kohn_sham_matrix 107 9.3 0.017 0.017 61.585 61.585 qs_ks_update_qs_env 107 7.6 0.001 0.001 55.456 55.456 qs_rho_update_rho 107 7.7 0.001 0.001 55.006 55.006 calculate_rho_elec 107 8.7 1.393 1.393 55.005 55.005 sum_up_and_integrate 107 10.3 0.362 0.362 50.627 50.627 integrate_v_rspace 107 11.3 0.469 0.469 50.265 50.265 grid_collocate_task_list 107 9.7 49.435 49.435 49.435 49.435 grid_integrate_task_list 107 12.3 47.497 47.497 47.497 47.497 init_scf_loop 11 6.9 0.000 0.000 42.472 42.472 prepare_preconditioner 11 7.9 0.000 0.000 35.188 35.188 make_preconditioner 11 8.9 0.000 0.000 35.188 35.188 make_full_inverse_cholesky 11 9.9 0.000 0.000 33.194 33.194 qs_scf_new_mos 96 7.5 0.001 0.001 23.732 23.732 qs_scf_loop_do_ot 96 8.5 0.001 0.001 23.731 23.731 ot_scf_mini 96 9.5 0.003 0.003 22.017 22.017 dbcsr_multiply_generic 1966 12.4 0.163 0.163 20.233 20.233 cp_fm_cholesky_invert 11 10.9 19.766 19.766 19.766 19.766 init_scf_run 11 5.9 0.001 
0.001 19.527 19.527 scf_env_initial_rho_setup 11 6.9 0.001 0.001 19.526 19.526 wfi_extrapolate 11 7.9 0.001 0.001 18.429 18.429 cp_gemm 81 9.0 0.000 0.000 14.976 14.976 cp_gemm_cosma 81 10.0 14.976 14.976 14.976 14.976 ot_mini 96 10.5 0.001 0.001 12.832 12.832 qs_energies_init_hamiltonians 11 5.9 0.000 0.000 11.921 11.921 make_m2s 3932 13.4 0.058 0.058 11.065 11.065 qs_ks_update_qs_env_forces 11 4.9 0.000 0.000 7.761 7.761 cp_fm_cholesky_decompose 22 10.9 7.374 7.374 7.374 7.374 pw_transfer 1295 11.6 0.087 0.087 7.137 7.137 qs_ot_get_derivative 96 11.5 0.001 0.001 6.966 6.966 qs_env_update_s_mstruct 11 6.9 0.000 0.000 6.895 6.895 fft_wrap_pw1pw2 1081 12.6 0.009 0.009 6.843 6.843 build_core_hamiltonian_matrix_ 11 4.9 0.001 0.001 6.650 6.650 qs_create_task_list 11 7.9 0.000 0.000 6.352 6.352 generate_qs_task_list 11 8.9 4.628 4.628 6.352 6.352 dbcsr_complete_redistribute 317 12.2 2.906 2.906 6.340 6.340 make_images 3932 14.4 2.166 2.166 5.977 5.977 dbcsr_copy 1855 11.9 0.267 0.267 5.907 5.907 ot_diis_step 96 11.5 0.005 0.005 5.861 5.861 fft_wrap_pw1pw2_140 439 13.2 0.599 0.599 5.815 5.815 qs_ot_get_p 107 10.4 0.001 0.001 5.748 5.748 dbcsr_copy_into_existing 22 7.9 5.595 5.595 5.596 5.596 dbcsr_make_dense_low 4961 15.5 0.088 0.088 5.249 5.249 multiply_cannon 1966 13.4 0.840 0.840 5.176 5.176 copy_dbcsr_to_fm 147 11.2 0.003 0.003 5.145 5.145 make_dense_data 4961 16.5 4.624 4.624 5.142 5.142 apply_preconditioner_dbcsr 107 12.6 0.000 0.000 5.064 5.064 apply_single 107 13.6 0.000 0.000 5.064 5.064 qs_energies_compute_matrix_w 11 5.9 0.000 0.000 4.864 4.864 calculate_w_matrix_ot 11 6.9 0.008 0.008 4.864 4.864 dbcsr_make_images_dense 3386 14.7 0.024 0.024 4.668 4.668 build_core_hamiltonian_matrix 11 6.9 0.001 0.001 4.423 4.423 qs_ot_p2m_diag 44 11.0 0.205 0.205 4.390 4.390 pw_poisson_solve 107 10.3 1.891 1.891 4.340 4.340 ------------------------------------------------------------------------------- From /workspace/artifacts/H2O-64_nonortho_32mpi.out: 
------------------------------------------------------------------------------- - - - T I M I N G - - - ------------------------------------------------------------------------------- SUBROUTINE CALLS ASD SELF TIME TOTAL TIME MAXIMUM AVERAGE MAXIMUM AVERAGE MAXIMUM CP2K 1 1.0 0.007 0.013 127.383 127.384 qs_mol_dyn_low 1 2.0 0.005 0.005 127.261 127.267 qs_forces 11 3.9 0.002 0.002 127.207 127.208 qs_energies 11 4.9 0.001 0.002 118.483 118.485 scf_env_do_scf 11 5.9 0.001 0.001 108.797 108.797 scf_env_do_scf_inner_loop 96 6.5 0.003 0.009 101.126 101.127 velocity_verlet 10 3.0 0.002 0.002 75.811 75.812 rebuild_ks_matrix 107 8.3 0.001 0.001 58.365 58.408 qs_ks_build_kohn_sham_matrix 107 9.3 0.021 0.022 58.365 58.408 sum_up_and_integrate 107 10.3 0.042 0.045 52.813 52.878 integrate_v_rspace 107 11.3 0.004 0.005 52.771 52.835 qs_ks_update_qs_env 107 7.6 0.001 0.001 51.414 51.448 qs_rho_update_rho 107 7.7 0.001 0.001 48.683 48.689 calculate_rho_elec 107 8.7 0.043 0.045 48.682 48.688 grid_integrate_task_list 107 12.3 45.910 47.561 45.910 47.561 grid_collocate_task_list 107 9.7 42.211 43.683 42.211 43.683 dbcsr_multiply_generic 1966 12.4 0.118 0.120 14.945 15.615 qs_scf_new_mos 96 7.5 0.001 0.001 11.902 11.944 qs_scf_loop_do_ot 96 8.5 0.001 0.001 11.901 11.943 ot_scf_mini 96 9.5 0.003 0.003 11.168 11.206 multiply_cannon 1966 13.4 0.197 0.202 10.070 10.305 multiply_cannon_loop 1966 14.4 0.198 0.208 9.184 9.532 rs_pw_transfer 878 11.9 0.015 0.016 7.043 8.249 mp_waitall_1 146670 16.2 7.634 8.020 7.634 8.020 init_scf_loop 11 6.9 0.000 0.001 7.653 7.654 init_scf_run 11 5.9 0.001 0.002 7.567 7.567 scf_env_initial_rho_setup 11 6.9 0.000 0.001 7.566 7.566 density_rs2pw 107 9.7 0.008 0.009 5.944 7.174 qs_ks_update_qs_env_forces 11 4.9 0.000 0.000 7.155 7.164 wfi_extrapolate 11 7.9 0.001 0.001 6.932 6.932 ot_mini 96 10.5 0.001 0.001 6.564 6.604 pw_transfer 1295 11.6 0.123 0.133 5.229 5.288 multiply_cannon_metrocomm3 15728 15.4 0.069 0.071 4.954 5.286 fft_wrap_pw1pw2 1081 12.6 0.013 
0.015 4.969 5.040 potential_pw2rs 107 12.3 0.009 0.010 4.780 4.790 fft_wrap_pw1pw2_140 439 13.2 0.522 0.544 4.375 4.521 mp_waitany 8968 13.7 2.968 4.145 2.968 4.145 rs_pw_transfer_RS2PW_140 118 11.5 0.434 0.462 2.655 3.876 fft3d_ps 1081 14.6 2.050 2.158 3.649 3.725 mp_alltoall_d11v 1998 13.7 2.352 3.669 2.352 3.669 apply_preconditioner_dbcsr 107 12.6 0.000 0.000 3.396 3.442 apply_single 107 13.6 0.001 0.001 3.396 3.442 make_m2s 3932 13.4 0.065 0.068 3.393 3.437 ot_diis_step 96 11.5 0.004 0.005 3.381 3.382 rs_gather_matrices 107 12.3 0.130 0.142 2.027 3.298 multiply_cannon_multrec 15728 15.4 3.172 3.251 3.188 3.266 qs_ot_get_derivative 96 11.5 0.001 0.001 3.158 3.195 make_images 3932 14.4 0.164 0.168 2.786 2.837 ------------------------------------------------------------------------------- Plot: name="H2O-64_nonortho_timings_32omp", title="Timings of H2O-64_nonortho with 32 OpenMP Threads", ylabel="time [s]" PlotPoint: plot="H2O-64_nonortho_timings_32omp", name="rest", label="rest", y=77.196, yerr=0.0 PlotPoint: plot="H2O-64_nonortho_timings_32omp", name="grid_collocate_task_list", label="grid_collocate_task_list", y=49.435, yerr=0.0 PlotPoint: plot="H2O-64_nonortho_timings_32omp", name="grid_integrate_task_list", label="grid_integrate_task_list", y=47.497, yerr=0.0 PlotPoint: plot="H2O-64_nonortho_timings_32omp", name="cp_fm_cholesky_invert", label="cp_fm_cholesky_invert", y=19.766, yerr=0.0 PlotPoint: plot="H2O-64_nonortho_timings_32omp", name="cp_gemm_cosma", label="cp_gemm_cosma", y=14.976, yerr=0.0 PlotPoint: plot="H2O-64_nonortho_timings_32omp", name="cp_fm_cholesky_decompose", label="cp_fm_cholesky_decompose", y=7.374, yerr=0.0 PlotPoint: plot="H2O-64_nonortho_timings_32omp", name="mp_waitany", label="mp_waitany", y=0.0, yerr=0.0 PlotPoint: plot="H2O-64_nonortho_timings_32omp", name="multiply_cannon_multrec", label="multiply_cannon_multrec", y=0.0, yerr=0.0 PlotPoint: plot="H2O-64_nonortho_timings_32omp", name="mp_waitall_1", label="mp_waitall_1", y=0.0, 
yerr=0.0 Plot: name="H2O-64_nonortho_timings_32mpi", title="Timings of H2O-64_nonortho with 32 MPI Ranks", ylabel="time [s]" PlotPoint: plot="H2O-64_nonortho_timings_32mpi", name="rest", label="rest", y=25.488, yerr=0.0 PlotPoint: plot="H2O-64_nonortho_timings_32mpi", name="grid_collocate_task_list", label="grid_collocate_task_list", y=42.211, yerr=0.0 PlotPoint: plot="H2O-64_nonortho_timings_32mpi", name="grid_integrate_task_list", label="grid_integrate_task_list", y=45.91, yerr=0.0 PlotPoint: plot="H2O-64_nonortho_timings_32mpi", name="cp_fm_cholesky_invert", label="cp_fm_cholesky_invert", y=0.0, yerr=0.0 PlotPoint: plot="H2O-64_nonortho_timings_32mpi", name="cp_gemm_cosma", label="cp_gemm_cosma", y=0.0, yerr=0.0 PlotPoint: plot="H2O-64_nonortho_timings_32mpi", name="cp_fm_cholesky_decompose", label="cp_fm_cholesky_decompose", y=0.0, yerr=0.0 PlotPoint: plot="H2O-64_nonortho_timings_32mpi", name="mp_waitany", label="mp_waitany", y=2.968, yerr=0.0 PlotPoint: plot="H2O-64_nonortho_timings_32mpi", name="multiply_cannon_multrec", label="multiply_cannon_multrec", y=3.172, yerr=0.0 PlotPoint: plot="H2O-64_nonortho_timings_32mpi", name="mp_waitall_1", label="mp_waitall_1", y=7.634, yerr=0.0 Running H2O-hyb.inp with 1 threads and 32 ranks... done. Running H2O-hyb.inp with 32 threads and 1 ranks... done. 
From /workspace/artifacts/H2O-hyb_32omp.out: ------------------------------------------------------------------------------- - - - T I M I N G - - - ------------------------------------------------------------------------------- SUBROUTINE CALLS ASD SELF TIME TOTAL TIME MAXIMUM AVERAGE MAXIMUM AVERAGE MAXIMUM CP2K 1 1.0 0.371 0.371 273.017 273.017 qs_energies 1 2.0 0.000 0.000 271.812 271.812 scf_env_do_scf 1 3.0 0.000 0.000 269.311 269.311 qs_ks_update_qs_env 8 5.0 0.000 0.000 251.935 251.935 rebuild_ks_matrix 7 6.0 0.000 0.000 251.834 251.834 qs_ks_build_kohn_sham_matrix 7 7.0 0.002 0.002 251.834 251.834 hfx_ks_matrix 7 8.0 0.000 0.000 168.297 168.297 integrate_four_center 7 9.0 2.386 2.386 168.266 168.266 scf_env_do_scf_inner_loop 7 4.0 0.001 0.001 157.832 157.832 integrate_four_center_main 7 10.0 0.593 0.593 156.674 156.674 integrate_four_center_bin 451 11.0 156.081 156.081 156.081 156.081 init_scf_loop 1 4.0 0.000 0.000 111.465 111.465 cp_gemm 129 10.3 0.001 0.001 68.887 68.887 cp_gemm_cosma 129 11.3 68.886 68.886 68.886 68.886 admm_mo_calc_rho_aux 7 8.0 0.000 0.000 39.470 39.470 admm_fit_mo_coeffs 7 9.0 0.000 0.000 37.794 37.794 admm_mo_merge_derivs 7 8.0 0.000 0.000 35.299 35.299 merge_mo_derivs_diag 7 9.0 0.022 0.022 35.299 35.299 purify_mo_diag 7 10.0 0.001 0.001 22.267 22.267 fit_mo_coeffs 7 10.0 0.000 0.000 15.527 15.527 prepare_preconditioner 1 5.0 0.000 0.000 13.426 13.426 make_preconditioner 1 6.0 0.000 0.000 13.426 13.426 integrate_four_center_load 7 10.0 0.001 0.001 8.823 8.823 hfx_load_balance 1 11.0 0.002 0.002 8.822 8.822 arnoldi_normal_ev 11 9.3 0.002 0.002 8.124 8.124 estimate_cond_num 1 7.0 0.000 0.000 8.048 8.048 build_subspace 28 9.5 0.016 0.016 8.022 8.022 ------------------------------------------------------------------------------- From /workspace/artifacts/H2O-hyb_32mpi.out: ------------------------------------------------------------------------------- - - - T I M I N G - - - 
------------------------------------------------------------------------------- SUBROUTINE CALLS ASD SELF TIME TOTAL TIME MAXIMUM AVERAGE MAXIMUM AVERAGE MAXIMUM CP2K 1 1.0 0.208 0.214 183.996 183.997 qs_energies 1 2.0 0.000 0.001 183.650 183.651 scf_env_do_scf 1 3.0 0.000 0.000 183.087 183.087 qs_ks_update_qs_env 8 5.0 0.000 0.000 180.171 180.171 rebuild_ks_matrix 7 6.0 0.000 0.000 180.158 180.158 qs_ks_build_kohn_sham_matrix 7 7.0 0.002 0.003 180.158 180.158 hfx_ks_matrix 7 8.0 0.000 0.000 168.040 168.040 integrate_four_center 7 9.0 0.097 0.405 168.024 168.024 integrate_four_center_main 7 10.0 0.005 0.005 154.416 158.128 integrate_four_center_bin 448 11.0 154.411 158.123 154.411 158.123 scf_env_do_scf_inner_loop 7 4.0 0.000 0.001 107.117 107.117 init_scf_loop 1 4.0 0.000 0.000 75.968 75.969 integrate_four_center_load 7 10.0 0.000 0.000 8.968 8.972 hfx_load_balance 1 11.0 0.001 0.002 8.968 8.972 mp_sync 70 11.3 3.787 6.055 3.787 6.055 cp_gemm 129 10.3 0.000 0.001 4.920 4.925 cp_gemm_cosma 129 11.3 4.919 4.924 4.919 4.924 hfx_load_balance_count 1 12.0 4.357 4.523 4.357 4.523 hfx_load_balance_bin 1 12.0 4.363 4.457 4.363 4.457 ------------------------------------------------------------------------------- Plot: name="H2O-hyb_timings_32omp", title="Timings of H2O-hyb with 32 OpenMP Threads", ylabel="time [s]" PlotPoint: plot="H2O-hyb_timings_32omp", name="rest", label="rest", y=44.70000000000002, yerr=0.0 PlotPoint: plot="H2O-hyb_timings_32omp", name="integrate_four_center_bin", label="integrate_four_center_bin", y=156.081, yerr=0.0 PlotPoint: plot="H2O-hyb_timings_32omp", name="cp_gemm_cosma", label="cp_gemm_cosma", y=68.886, yerr=0.0 PlotPoint: plot="H2O-hyb_timings_32omp", name="integrate_four_center", label="integrate_four_center", y=2.386, yerr=0.0 PlotPoint: plot="H2O-hyb_timings_32omp", name="integrate_four_center_main", label="integrate_four_center_main", y=0.593, yerr=0.0 PlotPoint: plot="H2O-hyb_timings_32omp", name="CP2K", label="CP2K", y=0.371, yerr=0.0 
PlotPoint: plot="H2O-hyb_timings_32omp", name="hfx_load_balance_count", label="hfx_load_balance_count", y=0.0, yerr=0.0 PlotPoint: plot="H2O-hyb_timings_32omp", name="mp_sync", label="mp_sync", y=0.0, yerr=0.0 PlotPoint: plot="H2O-hyb_timings_32omp", name="hfx_load_balance_bin", label="hfx_load_balance_bin", y=0.0, yerr=0.0 Plot: name="H2O-hyb_timings_32mpi", title="Timings of H2O-hyb with 32 MPI Ranks", ylabel="time [s]" PlotPoint: plot="H2O-hyb_timings_32mpi", name="rest", label="rest", y=11.84899999999999, yerr=0.0 PlotPoint: plot="H2O-hyb_timings_32mpi", name="integrate_four_center_bin", label="integrate_four_center_bin", y=154.411, yerr=0.0 PlotPoint: plot="H2O-hyb_timings_32mpi", name="cp_gemm_cosma", label="cp_gemm_cosma", y=4.919, yerr=0.0 PlotPoint: plot="H2O-hyb_timings_32mpi", name="integrate_four_center", label="integrate_four_center", y=0.097, yerr=0.0 PlotPoint: plot="H2O-hyb_timings_32mpi", name="integrate_four_center_main", label="integrate_four_center_main", y=0.005, yerr=0.0 PlotPoint: plot="H2O-hyb_timings_32mpi", name="CP2K", label="CP2K", y=0.208, yerr=0.0 PlotPoint: plot="H2O-hyb_timings_32mpi", name="hfx_load_balance_count", label="hfx_load_balance_count", y=4.357, yerr=0.0 PlotPoint: plot="H2O-hyb_timings_32mpi", name="mp_sync", label="mp_sync", y=3.787, yerr=0.0 PlotPoint: plot="H2O-hyb_timings_32mpi", name="hfx_load_balance_bin", label="hfx_load_balance_bin", y=4.363, yerr=0.0 Running GW_PBE_4benzene.inp with 1 threads and 32 ranks... done. Running GW_PBE_4benzene.inp with 32 threads and 1 ranks... done. 
From /workspace/artifacts/GW_PBE_4benzene_32omp.out:
-------------------------------------------------------------------------------
-                                                                             -
-                                T I M I N G                                  -
-                                                                             -
-------------------------------------------------------------------------------
SUBROUTINE                      CALLS  ASD         SELF TIME        TOTAL TIME
                              MAXIMUM       AVERAGE  MAXIMUM  AVERAGE  MAXIMUM
CP2K                                1  1.0    0.016    0.016  409.887  409.887
qs_energies                         1  2.0    0.000    0.000  409.399  409.399
mp2_main                            1  3.0    0.000    0.000  403.087  403.087
mp2_gpw_main                        1  4.0    0.001    0.001  402.668  402.668
rpa_ri_compute_en                   1  5.0    0.000    0.000  387.630  387.630
rpa_num_int                         1  6.0    0.000    0.000  387.605  387.605
compute_mat_P_omega                 1  7.0    0.002    0.002  202.607  202.607
compute_mat_P_omega_contract       10  8.0   11.908   11.908  201.095  201.095
dbcsr_t_total                    2336  9.6    0.017    0.017  191.745  191.745
cp_gemm                           105  8.4    0.000    0.000  156.397  156.397
cp_gemm_cosma                     105  9.4  156.397  156.397  156.397  156.397
dbcsr_t_contract                  787 11.0   45.914   45.914  119.594  119.594
GW_matrix_operations               10  7.0    0.006    0.006  110.911  110.911
compute_mat_P_omega_calc_M_occ    250  9.0   11.927   11.927   76.249   76.249
dbcsr_t_copy                     1103 10.7   19.772   19.772   70.750   70.750
dbcsr_tas_total                  1149 12.2    0.051    0.051   67.326   67.326
dbcsr_tas_multiply                807 12.1    0.003    0.003   65.948   65.948
rpa_num_int_RPA_matrix_operati     10  7.0    0.000    0.000   52.677   52.677
dbcsr_multiply_generic            837 15.8    0.125    0.125   52.543   52.543
dbcsr_tas_dbcsr                   807 14.1    0.003    0.003   52.118   52.118
contract_P_omega_with_mat_L        10  8.0    0.000    0.000   50.801   50.801
compute_mat_P_omega_calc_M_vir    250  9.0    0.001    0.001   49.702   49.702
dbcsr_tas_mm_1N                   524 15.1    0.002    0.002   40.240   40.240
multiply_cannon                   837 16.8   13.957   13.957   39.257   39.257
dbcsr_tas_reserve_blocks_index   3261 13.7    7.269    7.269   27.178   27.178
dbcsr_tas_copy                    574 11.4   16.713   16.713   24.461   24.461
multiply_cannon_loop              837 17.8    0.159    0.159   22.614   22.614
multiply_cannon_multrec           837 18.8   20.920   20.920   21.525   21.525
dbcsr_t_reserve_blocks_index     2280 12.5    1.353    1.353   20.858   20.858
dbcsr_t_reserve_blocks_index_a   2222 11.6    0.012    0.012   20.566   20.566
dbcsr_reserve_blocks             3717 14.7   19.203   19.203   19.582   19.582
compute_mat_P_omega_copy_M_occ    250  9.0    0.002    0.002   19.331   19.331
compute_QP_energies                 1  7.0    0.000    0.000   19.214   19.214
compute_self_energy_cubic_gw        1  8.0    0.093    0.093   19.214   19.214
mp2_ri_gpw_compute_in               1  5.0    0.001    0.001   15.023   15.023
compute_mat_P_omega_copy_M_vir    250  9.0    0.002    0.002   14.064   14.064
dbcsr_t_copy_nocomm               251 12.0   10.898   10.898   13.297   13.297
compute_mat_P_omega_calc_P_t      250  9.0    0.001    0.001   11.991   11.991
make_m2s                         1674 16.8    0.105    0.105   10.804   10.804
make_images                      1674 17.8    5.071    5.071   10.253   10.253
dbcsr_tas_mm_2                    251 15.0    0.001    0.001   10.236   10.236
cp_fm_cholesky_invert              10  8.0    9.253    9.253    9.253    9.253
-------------------------------------------------------------------------------

From /workspace/artifacts/GW_PBE_4benzene_32mpi.out:
-------------------------------------------------------------------------------
-                                                                             -
-                                T I M I N G                                  -
-                                                                             -
-------------------------------------------------------------------------------
SUBROUTINE                      CALLS  ASD         SELF TIME        TOTAL TIME
                              MAXIMUM       AVERAGE  MAXIMUM  AVERAGE  MAXIMUM
CP2K                                1  1.0    0.007    0.010   58.095   58.096
qs_energies                         1  2.0    0.001    0.001   57.967   57.973
mp2_main                            1  3.0    0.000    0.001   56.573   56.579
mp2_gpw_main                        1  4.0    0.000    0.001   56.518   56.524
rpa_ri_compute_en                   1  5.0    0.000    0.000   54.417   54.423
rpa_num_int                         1  6.0    0.001    0.001   54.409   54.416
dbcsr_t_total                    2336  9.6    0.015    0.016   41.605   41.606
compute_mat_P_omega                 1  7.0    0.001    0.002   40.610   40.620
compute_mat_P_omega_contract       10  8.0    0.761    0.787   40.332   40.336
dbcsr_t_contract                  787 11.0    1.868    2.020   30.632   30.635
dbcsr_tas_total                  1149 12.2    0.062    0.068   26.924   26.925
dbcsr_tas_multiply                807 12.1    0.003    0.003   26.785   26.787
dbcsr_tas_dbcsr                   807 14.1    0.003    0.004   19.564   19.565
dbcsr_multiply_generic            837 15.8    0.070    0.075   16.274   17.179
compute_mat_P_omega_calc_M_occ    250  9.0    0.742    0.773   13.562   13.562
multiply_cannon                   837 16.8    0.133    0.148    9.591   10.067
compute_mat_P_omega_calc_P_t      250  9.0    0.001    0.001    9.792    9.792
dbcsr_t_copy                     1111 10.7    4.226    4.447    9.417    9.790
dbcsr_tas_mm_1N                   524 15.1    0.003    0.003    8.648    9.568
cp_gemm                           105  8.4    0.000    0.000    9.320    9.330
cp_gemm_cosma                     105  9.4    9.320    9.330    9.320    9.330
multiply_cannon_loop              837 17.8    0.043    0.045    8.731    9.207
compute_mat_P_omega_calc_M_vir    250  9.0    0.001    0.001    8.687    8.688
multiply_cannon_multrec          1386 17.8    6.804    7.195    7.055    7.422
dbcsr_tas_mm_2                    251 15.0    0.002    0.002    7.393    7.393
mp_sync                          8696 11.6    6.339    7.293    6.339    7.293
make_m2s                         1674 16.8    0.044    0.047    5.757    6.296
make_images                      1674 17.8    0.240    0.248    5.676    6.213
GW_matrix_operations               10  7.0    0.001    0.002    5.953    5.961
compute_QP_energies                 1  7.0    0.000    0.000    4.203    4.203
compute_self_energy_cubic_gw        1  8.0    0.005    0.006    4.199    4.203
dbcsr_t_communicate_buffer       1098 11.7    0.092    0.099    3.481    3.680
mp_waitall_2                     3776 14.7    3.272    3.569    3.272    3.569
rpa_num_int_RPA_matrix_operati     10  7.0    0.000    0.000    3.283    3.293
contract_P_omega_with_mat_L        10  8.0    0.000    0.000    3.171    3.181
make_images_data                 1674 18.8    0.037    0.039    2.999    3.171
contract_cubic_gw                  21  9.0    0.000    0.000    3.149    3.149
hybrid_alltoall_any              1724 19.5    2.316    2.626    2.887    3.053
dbcsr_t_reserve_blocks_index_a   2791 11.4    0.018    0.021    2.668    2.994
dbcsr_t_reserve_blocks_index     2849 12.4    0.108    0.111    2.664    2.993
dbcsr_tas_reserve_blocks_index   3300 13.8    0.265    0.286    2.613    2.933
make_images_pack                 1674 18.8    2.238    2.747    2.252    2.760
dbcsr_reserve_blocks             3785 14.7    2.339    2.643    2.378    2.684
mp2_ri_gpw_compute_in               1  5.0    0.001    0.001    2.099    2.099
convert_to_new_pgrid             2421 14.1    0.017    0.019    1.857    1.972
dbcsr_copy                       3323 15.8    1.794    1.914    1.823    1.943
mp_waitall_1                    26582 19.0    1.524    1.903    1.524    1.903
compute_mat_P_omega_copy_M_vir    250  9.0    0.002    0.002    1.706    1.712
dbcsr_add_anytype                 909 13.7    0.998    1.054    1.551    1.622
compute_mat_P_omega_copy_M_occ    250  9.0    0.001    0.001    1.512    1.516
dbcsr_tas_replicate               396 14.1    0.796    0.873    1.342    1.426
scf_env_do_scf                      1  3.0    0.000    0.000    1.340    1.341
scf_env_do_scf_inner_loop          17  4.0    0.000    0.002    1.340    1.341
mp_max_i                         2058  9.6    0.974    1.233    0.974    1.233
-------------------------------------------------------------------------------

Plot: name="GW_PBE_4benzene_timings_32omp", title="Timings of GW_PBE_4benzene with 32 OpenMP Threads", ylabel="time [s]"
PlotPoint: plot="GW_PBE_4benzene_timings_32omp", name="rest", label="rest", y=147.68100000000004, yerr=0.0
PlotPoint: plot="GW_PBE_4benzene_timings_32omp", name="cp_gemm_cosma", label="cp_gemm_cosma", y=156.397, yerr=0.0
PlotPoint: plot="GW_PBE_4benzene_timings_32omp", name="dbcsr_t_contract", label="dbcsr_t_contract", y=45.914, yerr=0.0
PlotPoint: plot="GW_PBE_4benzene_timings_32omp", name="multiply_cannon_multrec", label="multiply_cannon_multrec", y=20.92, yerr=0.0
PlotPoint: plot="GW_PBE_4benzene_timings_32omp", name="dbcsr_t_copy", label="dbcsr_t_copy", y=19.772, yerr=0.0
PlotPoint: plot="GW_PBE_4benzene_timings_32omp", name="dbcsr_reserve_blocks", label="dbcsr_reserve_blocks", y=19.203, yerr=0.0
PlotPoint: plot="GW_PBE_4benzene_timings_32omp", name="mp_waitall_2", label="mp_waitall_2", y=0.0, yerr=0.0
PlotPoint: plot="GW_PBE_4benzene_timings_32omp", name="mp_sync", label="mp_sync", y=0.0, yerr=0.0

Plot: name="GW_PBE_4benzene_timings_32mpi", title="Timings of GW_PBE_4benzene with 32 MPI Ranks", ylabel="time [s]"
PlotPoint: plot="GW_PBE_4benzene_timings_32mpi", name="rest", label="rest", y=23.927, yerr=0.0
PlotPoint: plot="GW_PBE_4benzene_timings_32mpi", name="cp_gemm_cosma", label="cp_gemm_cosma", y=9.32, yerr=0.0
PlotPoint: plot="GW_PBE_4benzene_timings_32mpi", name="dbcsr_t_contract", label="dbcsr_t_contract", y=1.868, yerr=0.0
PlotPoint: plot="GW_PBE_4benzene_timings_32mpi", name="multiply_cannon_multrec", label="multiply_cannon_multrec", y=6.804, yerr=0.0
PlotPoint: plot="GW_PBE_4benzene_timings_32mpi", name="dbcsr_t_copy", label="dbcsr_t_copy", y=4.226, yerr=0.0
PlotPoint: plot="GW_PBE_4benzene_timings_32mpi", name="dbcsr_reserve_blocks", label="dbcsr_reserve_blocks", y=2.339, yerr=0.0
PlotPoint: plot="GW_PBE_4benzene_timings_32mpi", name="mp_waitall_2", label="mp_waitall_2", y=3.272, yerr=0.0
PlotPoint: plot="GW_PBE_4benzene_timings_32mpi", name="mp_sync", label="mp_sync", y=6.339, yerr=0.0

Running diag_cu144_broy.inp with 1 threads and 32 ranks... done.
Running diag_cu144_broy.inp with 32 threads and 1 ranks... done.

From /workspace/artifacts/diag_cu144_broy_32omp.out:
-------------------------------------------------------------------------------
-                                                                             -
-                                T I M I N G                                  -
-                                                                             -
-------------------------------------------------------------------------------
SUBROUTINE                      CALLS  ASD         SELF TIME        TOTAL TIME
                              MAXIMUM       AVERAGE  MAXIMUM  AVERAGE  MAXIMUM
CP2K                                1  1.0    0.101    0.101  192.693  192.693
qs_energies                         1  2.0    0.000    0.000  191.014  191.014
scf_env_do_scf                      1  3.0    0.000    0.000  181.070  181.070
scf_env_do_scf_inner_loop          15  4.0    0.002    0.002  181.070  181.070
qs_scf_new_mos                     15  5.0    0.000    0.000   80.556   80.556
qs_ks_update_qs_env                15  5.0    0.000    0.000   69.522   69.522
rebuild_ks_matrix                  15  6.0    0.000    0.000   69.158   69.158
qs_ks_build_kohn_sham_matrix       15  7.0    0.002    0.002   69.158   69.158
eigensolver                        15  6.0    0.002    0.002   67.058   67.058
cp_fm_diag_elpa                    15  7.0    0.000    0.000   52.541   52.541
cp_fm_diag_elpa_base               15  8.0   48.087   48.087   52.540   52.540
qs_vxc_create                      15  8.0    0.045    0.045   45.238   45.238
calculate_dispersion_nonloc        15  9.0    9.035    9.035   39.317   39.317
pw_transfer                      1191  9.8    0.093    0.093   26.990   26.990
fft_wrap_pw1pw2                  1086 10.9    0.013    0.013   26.693   26.693
qs_rho_update_rho                  16  5.0    0.000    0.000   24.338   24.338
calculate_rho_elec                 16  6.0    0.341    0.341   24.338   24.338
grid_collocate_task_list           16  7.0   22.755   22.755   22.755   22.755
sum_up_and_integrate               15  8.0    0.079    0.079   22.299   22.299
integrate_v_rspace                 15  9.0    0.035    0.035   22.220   22.220
grid_integrate_task_list           15 10.0   21.587   21.587   21.587   21.587
fft_wrap_pw1pw2_150               765 12.0    3.339    3.339   20.187   20.187
fft3d_s                          1087 12.8   11.104   11.104   11.116   11.116
copy_dbcsr_to_fm                   16  5.9    0.001    0.001   10.636   10.636
pw_scatter_s                      585 13.0   10.573   10.573   10.573   10.573
cp_fm_cholesky_restore             45  7.0   10.269   10.269   10.269   10.269
dbcsr_complete_redistribute        46  8.3    3.458    3.458    9.527    9.527
cp_fm_upper_to_full                30  8.0    8.700    8.700    8.700    8.700
gspace_mixing                      14  5.0    0.271    0.271    7.918    7.918
vdW_energy                         15 10.0    7.722    7.722    7.722    7.722
broyden_mixing                     14  6.0    7.159    7.159    7.160    7.160
fft_wrap_pw1pw2_200               197 11.5    0.351    0.351    6.254    6.254
xc_vxc_pw_create                   15  9.0    1.588    1.588    5.877    5.877
qs_energies_init_hamiltonians       1  3.0    0.000    0.000    4.575    4.575
init_scf_run                        1  3.0    0.000    0.000    4.540    4.540
dbcsr_finalize                    159  9.9    0.023    0.023    4.141    4.141
dbcsr_merge_all                    91 11.1    0.084    0.084    3.979    3.979
-------------------------------------------------------------------------------

From /workspace/artifacts/diag_cu144_broy_32mpi.out:
-------------------------------------------------------------------------------
-                                                                             -
-                                T I M I N G                                  -
-                                                                             -
-------------------------------------------------------------------------------
SUBROUTINE                      CALLS  ASD         SELF TIME        TOTAL TIME
                              MAXIMUM       AVERAGE  MAXIMUM  AVERAGE  MAXIMUM
CP2K                                1  1.0    0.015    0.020   84.371   84.372
qs_energies                         1  2.0    0.001    0.001   83.983   83.983
scf_env_do_scf                      1  3.0    0.000    0.000   78.931   78.932
scf_env_do_scf_inner_loop          15  4.0    0.001    0.002   78.931   78.932
qs_ks_update_qs_env                15  5.0    0.000    0.000   39.026   39.042
rebuild_ks_matrix                  15  6.0    0.000    0.000   38.977   38.994
qs_ks_build_kohn_sham_matrix       15  7.0    0.004    0.005   38.977   38.994
sum_up_and_integrate               15  8.0    0.013    0.016   23.004   23.038
integrate_v_rspace                 15  9.0    0.001    0.001   22.992   23.024
qs_rho_update_rho                  16  5.0    0.000    0.000   22.973   22.976
calculate_rho_elec                 16  6.0    0.011    0.013   22.973   22.975
grid_integrate_task_list           15 10.0   21.229   22.077   21.229   22.077
grid_collocate_task_list           16  7.0   21.112   21.756   21.112   21.756
qs_scf_new_mos                     15  5.0    0.001    0.001   17.353   17.406
eigensolver                        15  6.0    0.002    0.002   15.916   15.925
qs_vxc_create                      15  8.0    0.001    0.001   15.445   15.454
calculate_dispersion_nonloc        15  9.0    1.416    1.465   12.570   12.591
pw_transfer                      1191  9.8    0.134    0.149   11.706   11.800
cp_fm_diag_elpa                    15  7.0    0.000    0.000   11.643   11.649
cp_fm_diag_elpa_base               15  8.0   11.401   11.436   11.638   11.642
fft_wrap_pw1pw2                  1086 10.9    0.021    0.023   11.409   11.516
fft3d_ps                         1086 12.9    5.046    5.162    8.569    8.762
fft_wrap_pw1pw2_150               765 12.0    0.698    0.792    7.618    7.684
cp_fm_cholesky_restore             45  7.0    4.037    4.095    4.037    4.095
fft_wrap_pw1pw2_200               197 11.5    0.369    0.400    3.649    3.698
qs_energies_init_hamiltonians       1  3.0    0.000    0.000    3.150    3.150
build_core_hamiltonian_matrix       1  4.0    0.000    0.000    2.729    2.976
xc_vxc_pw_create                   15  9.0    0.056    0.079    2.874    2.891
mp_alltoall_z22v                 1086 14.9    2.047    2.335    2.047    2.335
vdW_energy                         15 10.0    2.104    2.196    2.104    2.196
rs_pw_transfer                    158  9.4    0.002    0.003    1.747    2.165
x_to_yz                           585 14.0    0.927    0.971    2.003    2.080
build_core_ppnl                     1  5.0    1.822    2.002    1.822    2.002
density_rs2pw                      16  7.0    0.002    0.002    1.703    1.958
-------------------------------------------------------------------------------

Plot: name="diag_cu144_broy_timings_32omp", title="Timings of diag_cu144_broy with 32 OpenMP Threads", ylabel="time [s]"
PlotPoint: plot="diag_cu144_broy_timings_32omp", name="rest", label="rest", y=68.31800000000001, yerr=0.0
PlotPoint: plot="diag_cu144_broy_timings_32omp", name="cp_fm_diag_elpa_base", label="cp_fm_diag_elpa_base", y=48.087, yerr=0.0
PlotPoint: plot="diag_cu144_broy_timings_32omp", name="grid_collocate_task_list", label="grid_collocate_task_list", y=22.755, yerr=0.0
PlotPoint: plot="diag_cu144_broy_timings_32omp", name="grid_integrate_task_list", label="grid_integrate_task_list", y=21.587, yerr=0.0
PlotPoint: plot="diag_cu144_broy_timings_32omp", name="fft3d_s", label="fft3d_s", y=11.104, yerr=0.0
PlotPoint: plot="diag_cu144_broy_timings_32omp", name="pw_scatter_s", label="pw_scatter_s", y=10.573, yerr=0.0
PlotPoint: plot="diag_cu144_broy_timings_32omp", name="cp_fm_cholesky_restore", label="cp_fm_cholesky_restore", y=10.269, yerr=0.0
PlotPoint: plot="diag_cu144_broy_timings_32omp", name="fft3d_ps", label="fft3d_ps", y=0.0, yerr=0.0

Plot: name="diag_cu144_broy_timings_32mpi", title="Timings of diag_cu144_broy with 32 MPI Ranks", ylabel="time [s]"
PlotPoint: plot="diag_cu144_broy_timings_32mpi", name="rest", label="rest", y=21.546, yerr=0.0
PlotPoint: plot="diag_cu144_broy_timings_32mpi", name="cp_fm_diag_elpa_base", label="cp_fm_diag_elpa_base", y=11.401, yerr=0.0
PlotPoint: plot="diag_cu144_broy_timings_32mpi", name="grid_collocate_task_list", label="grid_collocate_task_list", y=21.112, yerr=0.0
PlotPoint: plot="diag_cu144_broy_timings_32mpi", name="grid_integrate_task_list", label="grid_integrate_task_list", y=21.229, yerr=0.0
PlotPoint: plot="diag_cu144_broy_timings_32mpi", name="fft3d_s", label="fft3d_s", y=0.0, yerr=0.0
PlotPoint: plot="diag_cu144_broy_timings_32mpi", name="pw_scatter_s", label="pw_scatter_s", y=0.0, yerr=0.0
PlotPoint: plot="diag_cu144_broy_timings_32mpi", name="cp_fm_cholesky_restore", label="cp_fm_cholesky_restore", y=4.037, yerr=0.0
PlotPoint: plot="diag_cu144_broy_timings_32mpi", name="fft3d_ps", label="fft3d_ps", y=5.046, yerr=0.0

Running bench_dftb.inp with 1 threads and 32 ranks... done.
Running bench_dftb.inp with 32 threads and 1 ranks... done.

From /workspace/artifacts/bench_dftb_32omp.out:
-------------------------------------------------------------------------------
-                                                                             -
-                                T I M I N G                                  -
-                                                                             -
-------------------------------------------------------------------------------
SUBROUTINE                      CALLS  ASD         SELF TIME        TOTAL TIME
                              MAXIMUM       AVERAGE  MAXIMUM  AVERAGE  MAXIMUM
CP2K                                1  1.0    0.084    0.084  337.954  337.954
qs_energies                         1  2.0    0.000    0.000  337.798  337.798
ls_scf                              1  3.0    0.000    0.000  335.993  335.993
ls_scf_main                         1  4.0    0.002    0.002  320.401  320.401
density_matrix_trs4                11  5.0    0.011    0.011  175.392  175.392
ls_scf_dm_to_ks                    11  5.0    0.000    0.000  138.325  138.325
matrix_ls_to_qs                    11  6.0    0.000    0.000  134.142  134.142
dbcsr_multiply_generic            185  6.1    0.515    0.515  111.022  111.022
dbcsr_copy_into_existing           11  7.0   83.248   83.248   83.248   83.248
multiply_cannon                   185  7.1    2.982    2.982   73.980   73.980
dbcsr_complete_redistribute        23  7.5   39.873   39.873   55.606   55.606
multiply_cannon_loop              185  8.1    0.406    0.406   52.741   52.741
matrix_decluster                   11  7.0    0.000    0.000   50.893   50.893
multiply_cannon_multrec           185  9.1   50.512   50.512   50.565   50.565
arnoldi_extremal                   12  6.1    0.000    0.000   45.557   45.557
arnoldi_normal_ev                  12  7.1    0.029    0.029   45.556   45.556
build_subspace                     23  8.1    0.131    0.131   44.913   44.913
dbcsr_matrix_vector_mult          652  9.0    0.256    0.256   34.983   34.983
dbcsr_matrix_vector_mult_local    652 10.0   33.439   33.439   33.448   33.448
make_m2s                          370  7.1    0.030    0.030   30.455   30.455
make_images                       370  8.1    7.415    7.415   27.957   27.957
dbcsr_finalize                    646  7.5    0.215    0.215   21.155   21.155
dbcsr_merge_all                   597  8.5    3.450    3.450   19.086   19.086
setup_rec_index_2d                370  8.1   18.103   18.103   18.103   18.103
dbcsr_sort_indices               1103  9.9   16.746   16.746   16.746   16.746
quick_finalize                    395 10.0    0.537    0.537   14.311   14.311
ls_scf_init_scf                     1  4.0    0.000    0.000   14.303   14.303
ls_scf_init_matrix_S                1  5.0    0.000    0.000   13.869   13.869
tree_to_linear_d                  110  9.4   13.358   13.358   13.358   13.358
dbcsr_special_finalize            370  9.1    0.003    0.003   13.193   13.193
matrix_sqrt_Newton_Schulz           1  6.0    0.001    0.001   13.032   13.032
dbcsr_dot_sd                      144  6.3    8.851    8.851    8.852    8.852
dbcsr_new_transposed                2  7.0    0.134    0.134    7.883    7.883
dbcsr_frobenius_norm              142  6.1    7.786    7.786    7.789    7.789
dbcsr_redistribute                  2  8.0    7.639    7.639    7.711    7.711
make_images_data                  370  9.1    0.011    0.011    7.157    7.157
matrix_qs_to_ls                    12  5.1    0.000    0.000    6.947    6.947
matrix_cluster                     12  6.1    0.000    0.000    6.947    6.947
-------------------------------------------------------------------------------

From /workspace/artifacts/bench_dftb_32mpi.out:
-------------------------------------------------------------------------------
-                                                                             -
-                                T I M I N G                                  -
-                                                                             -
-------------------------------------------------------------------------------
SUBROUTINE                      CALLS  ASD         SELF TIME        TOTAL TIME
                              MAXIMUM       AVERAGE  MAXIMUM  AVERAGE  MAXIMUM
CP2K                                1  1.0    0.009    0.010   94.600   94.601
qs_energies                         1  2.0    0.000    0.000   94.504   94.504
ls_scf                              1  3.0    0.000    0.000   94.426   94.427
ls_scf_main                         1  4.0    0.001    0.003   90.647   90.648
density_matrix_trs4                11  5.0    0.009    0.013   86.887   86.975
dbcsr_multiply_generic            185  6.1    0.075    0.092   81.748   82.098
multiply_cannon                   185  7.1    0.041    0.043   68.150   69.047
multiply_cannon_loop              185  8.1    0.221    0.234   64.363   66.132
multiply_cannon_multrec          1480  9.1   42.376   45.599   42.865   46.097
mp_waitall_1                    11936 10.3   19.465   22.453   19.465   22.453
multiply_cannon_metrocomm3       1480  9.1    0.018    0.020   11.385   15.161
make_m2s                          370  7.1    0.034    0.037    9.204    9.301
make_images                       370  8.1    0.695    0.741    9.085    9.184
multiply_cannon_metrocomm1       1480  9.1    0.010    0.012    4.751    7.464
calculate_norms                  2960  9.1    5.063    5.312    5.063    5.312
make_images_data                  370  9.1    0.012    0.013    3.723    4.040
mp_sum_l                         1039  5.9    3.056    3.861    3.056    3.861
arnoldi_extremal                   12  6.1    0.000    0.001    3.732    3.739
arnoldi_normal_ev                  12  7.1    0.002    0.008    3.731    3.738
build_subspace                     23  8.1    0.039    0.052    3.610    3.613
hybrid_alltoall_any               393  9.9    0.322    1.629    3.032    3.331
ls_scf_dm_to_ks                    11  5.0    0.000    0.000    3.254    3.301
dbcsr_matrix_vector_mult          652  9.0    0.018    0.078    3.059    3.130
dbcsr_complete_redistribute        23  7.5    1.793    1.887    2.893    2.967
matrix_ls_to_qs                    11  6.0    0.000    0.000    2.862    2.943
ls_scf_init_scf                     1  4.0    0.000    0.000    2.900    2.901
ls_scf_init_matrix_S                1  5.0    0.000    0.000    2.861    2.871
dbcsr_multiply_generic_mpsum_f    137  7.1    0.000    0.001    2.088    2.748
make_images_pack                  370  9.1    2.481    2.715    2.486    2.721
matrix_decluster                   11  7.0    0.000    0.000    2.609    2.689
dbcsr_matrix_vector_mult_local    652 10.0    2.504    2.628    2.508    2.632
matrix_sqrt_Newton_Schulz           1  6.0    0.001    0.001    2.617    2.619
buffer_matrices_ensure_size       370  8.1    2.191    2.290    2.191    2.290
dbcsr_add_d                       280  6.0    0.002    0.002    2.081    2.173
dbcsr_add_anytype                 280  7.0    1.124    1.202    2.079    2.172
dbcsr_finalize                    646  7.5    0.014    0.015    1.977    2.037
-------------------------------------------------------------------------------

Plot: name="bench_dftb_timings_32omp", title="Timings of bench_dftb with 32 OpenMP Threads", ylabel="time [s]"
PlotPoint: plot="bench_dftb_timings_32omp", name="rest", label="rest", y=112.77900000000002, yerr=0.0
PlotPoint: plot="bench_dftb_timings_32omp", name="dbcsr_copy_into_existing", label="dbcsr_copy_into_existing", y=83.248, yerr=0.0
PlotPoint: plot="bench_dftb_timings_32omp", name="multiply_cannon_multrec", label="multiply_cannon_multrec", y=50.512, yerr=0.0
PlotPoint: plot="bench_dftb_timings_32omp", name="dbcsr_complete_redistribute", label="dbcsr_complete_redistribute", y=39.873, yerr=0.0
PlotPoint: plot="bench_dftb_timings_32omp", name="dbcsr_matrix_vector_mult_local", label="dbcsr_matrix_vector_mult_local", y=33.439, yerr=0.0
PlotPoint: plot="bench_dftb_timings_32omp", name="setup_rec_index_2d", label="setup_rec_index_2d", y=18.103, yerr=0.0
PlotPoint: plot="bench_dftb_timings_32omp", name="calculate_norms", label="calculate_norms", y=0.0, yerr=0.0
PlotPoint: plot="bench_dftb_timings_32omp", name="mp_waitall_1", label="mp_waitall_1", y=0.0, yerr=0.0
PlotPoint: plot="bench_dftb_timings_32omp", name="mp_sum_l", label="mp_sum_l", y=0.0, yerr=0.0

Plot: name="bench_dftb_timings_32mpi", title="Timings of bench_dftb with 32 MPI Ranks", ylabel="time [s]"
PlotPoint: plot="bench_dftb_timings_32mpi", name="rest", label="rest", y=20.343000000000004, yerr=0.0
PlotPoint: plot="bench_dftb_timings_32mpi", name="dbcsr_copy_into_existing", label="dbcsr_copy_into_existing", y=0.0, yerr=0.0
PlotPoint: plot="bench_dftb_timings_32mpi", name="multiply_cannon_multrec", label="multiply_cannon_multrec", y=42.376, yerr=0.0
PlotPoint: plot="bench_dftb_timings_32mpi", name="dbcsr_complete_redistribute", label="dbcsr_complete_redistribute", y=1.793, yerr=0.0
PlotPoint: plot="bench_dftb_timings_32mpi", name="dbcsr_matrix_vector_mult_local", label="dbcsr_matrix_vector_mult_local", y=2.504, yerr=0.0
PlotPoint: plot="bench_dftb_timings_32mpi", name="setup_rec_index_2d", label="setup_rec_index_2d", y=0.0, yerr=0.0
PlotPoint: plot="bench_dftb_timings_32mpi", name="calculate_norms", label="calculate_norms", y=5.063, yerr=0.0
PlotPoint: plot="bench_dftb_timings_32mpi", name="mp_waitall_1", label="mp_waitall_1", y=19.465, yerr=0.0
PlotPoint: plot="bench_dftb_timings_32mpi", name="mp_sum_l", label="mp_sum_l", y=3.056, yerr=0.0

Running dbcsr.inp with 1 threads and 32 ranks... done.
Running dbcsr.inp with 32 threads and 1 ranks... done.

From /workspace/artifacts/dbcsr_32omp.out:
-------------------------------------------------------------------------------
-                                                                             -
-                                T I M I N G                                  -
-                                                                             -
-------------------------------------------------------------------------------
SUBROUTINE                      CALLS  ASD         SELF TIME        TOTAL TIME
                              MAXIMUM       AVERAGE  MAXIMUM  AVERAGE  MAXIMUM
CP2K                                1  1.0    0.006    0.006  112.988  112.988
lib_test                            1  2.0    0.000    0.000  112.981  112.981
dbcsr_run_tests                     3  3.0    0.002    0.002  112.981  112.981
test_multiplies_multiproc           3  4.0    0.001    0.001   93.453   93.453
dbcsr_redistribute                  9  5.0   64.333   64.333   67.867   67.867
dbcsr_multiply_generic              9  5.0    0.001    0.001   23.725   23.725
dbcsr_make_random_matrix            9  4.0   14.156   14.156   19.438   19.438
multiply_cannon                     9  6.0    0.002    0.002   17.159   17.159
multiply_cannon_loop                9  7.0    0.003    0.003   16.625   16.625
multiply_cannon_multrec             9  8.0   16.621   16.621   16.622   16.622
dbcsr_finalize                     27  5.7    0.005    0.005    8.979    8.979
dbcsr_merge_all                    18  6.5    3.116    3.116    8.226    8.226
tree_to_linear_d                    9  7.0    3.207    3.207    3.207    3.207
mp_alltoall_d11v                   27  6.0    3.186    3.186    3.186    3.186
dbcsr_data_release                975  7.6    2.495    2.495    2.495    2.495
make_m2s                           18  6.0    0.001    0.001    2.271    2.271
-------------------------------------------------------------------------------

From /workspace/artifacts/dbcsr_32mpi.out:
-------------------------------------------------------------------------------
-                                                                             -
-                                T I M I N G                                  -
-                                                                             -
-------------------------------------------------------------------------------
SUBROUTINE                      CALLS  ASD         SELF TIME        TOTAL TIME
                              MAXIMUM       AVERAGE  MAXIMUM  AVERAGE  MAXIMUM
CP2K                                1  1.0    0.003    0.005   28.204   28.205
lib_test                            1  2.0    0.000    0.000   28.170   28.192
dbcsr_run_tests                     3  3.0    0.000    0.001   28.169   28.191
test_multiplies_multiproc           3  4.0    0.001    0.001   26.997   27.080
dbcsr_multiply_generic              9  5.0    0.002    0.002   24.929   25.035
multiply_cannon                     9  6.0    0.002    0.003   22.480   22.835
multiply_cannon_loop                9  7.0    0.004    0.004   22.030   22.404
multiply_cannon_multrec            72  8.0   18.481   19.641   18.483   19.642
mp_waitall_1                      576  9.2    3.985    4.872    3.985    4.872
multiply_cannon_metrocomm1         72  8.0    0.002    0.002    3.156    4.033
multiply_cannon_metrocomm3         72  8.0    0.000    0.001    0.379    1.557
mp_sum_l                          310  2.7    0.556    1.556    0.556    1.556
dbcsr_multiply_generic_mpsum_f      9  6.0    0.000    0.000    0.550    1.549
dbcsr_make_random_matrix            9  4.0    0.891    0.975    1.129    1.236
make_m2s                           18  6.0    0.001    0.001    1.014    1.071
make_images                        18  7.0    0.026    0.027    1.011    1.068
dbcsr_finalize                     27  5.7    0.001    0.001    0.952    1.065
dbcsr_merge_all                    18  6.5    0.154    0.176    0.827    0.927
dbcsr_data_release                444  7.6    0.700    0.774    0.700    0.774
dbcsr_redistribute                  9  5.0    0.412    0.458    0.728    0.759
dbcsr_destroy                     111  5.9    0.008    0.067    0.586    0.659
make_images_data                   18  8.0    0.001    0.001    0.505    0.592
-------------------------------------------------------------------------------

Plot: name="dbcsr_timings_32omp", title="Timings of dbcsr with 32 OpenMP Threads", ylabel="time [s]"
PlotPoint: plot="dbcsr_timings_32omp", name="rest", label="rest", y=8.990000000000009, yerr=0.0
PlotPoint: plot="dbcsr_timings_32omp", name="dbcsr_redistribute", label="dbcsr_redistribute", y=64.333, yerr=0.0
PlotPoint: plot="dbcsr_timings_32omp", name="multiply_cannon_multrec", label="multiply_cannon_multrec", y=16.621, yerr=0.0
PlotPoint: plot="dbcsr_timings_32omp", name="dbcsr_make_random_matrix", label="dbcsr_make_random_matrix", y=14.156, yerr=0.0
PlotPoint: plot="dbcsr_timings_32omp", name="tree_to_linear_d", label="tree_to_linear_d", y=3.207, yerr=0.0
PlotPoint: plot="dbcsr_timings_32omp", name="mp_alltoall_d11v", label="mp_alltoall_d11v", y=3.186, yerr=0.0
PlotPoint: plot="dbcsr_timings_32omp", name="dbcsr_data_release", label="dbcsr_data_release", y=2.495, yerr=0.0
PlotPoint: plot="dbcsr_timings_32omp", name="mp_waitall_1", label="mp_waitall_1", y=0.0, yerr=0.0
PlotPoint: plot="dbcsr_timings_32omp", name="mp_sum_l", label="mp_sum_l", y=0.0, yerr=0.0

Plot: name="dbcsr_timings_32mpi", title="Timings of dbcsr with 32 MPI Ranks", ylabel="time [s]"
PlotPoint: plot="dbcsr_timings_32mpi", name="rest", label="rest", y=3.179000000000002, yerr=0.0
PlotPoint: plot="dbcsr_timings_32mpi", name="dbcsr_redistribute", label="dbcsr_redistribute", y=0.412, yerr=0.0
PlotPoint: plot="dbcsr_timings_32mpi", name="multiply_cannon_multrec", label="multiply_cannon_multrec", y=18.481, yerr=0.0
PlotPoint: plot="dbcsr_timings_32mpi", name="dbcsr_make_random_matrix", label="dbcsr_make_random_matrix", y=0.891, yerr=0.0
PlotPoint: plot="dbcsr_timings_32mpi", name="tree_to_linear_d", label="tree_to_linear_d", y=0.0, yerr=0.0
PlotPoint: plot="dbcsr_timings_32mpi", name="mp_alltoall_d11v", label="mp_alltoall_d11v", y=0.0, yerr=0.0
PlotPoint: plot="dbcsr_timings_32mpi", name="dbcsr_data_release", label="dbcsr_data_release", y=0.7, yerr=0.0
PlotPoint: plot="dbcsr_timings_32mpi", name="mp_waitall_1", label="mp_waitall_1", y=3.985, yerr=0.0
PlotPoint: plot="dbcsr_timings_32mpi", name="mp_sum_l", label="mp_sum_l", y=0.556, yerr=0.0

Running MQAE_single_node.inp with 1 threads and 32 ranks... done.
Running MQAE_single_node.inp with 32 threads and 1 ranks... done.

From /workspace/artifacts/MQAE_single_node_32omp.out:
-------------------------------------------------------------------------------
-                                                                             -
-                                T I M I N G                                  -
-                                                                             -
-------------------------------------------------------------------------------
SUBROUTINE                      CALLS  ASD         SELF TIME        TOTAL TIME
                              MAXIMUM       AVERAGE  MAXIMUM  AVERAGE  MAXIMUM
CP2K                                1  1.0    0.043    0.043  138.042  138.042
qs_mol_dyn_low                      1  2.0    0.005    0.005  136.165  136.165
velocity_verlet                     5  3.0    0.004    0.004  110.019  110.019
qmmm_el_coupling                    6  3.8    0.000    0.000   63.970   63.970
qmmm_elec_with_gaussian             6  4.8    0.186    0.186   63.963   63.963
qmmm_elec_with_gaussian_low         6  5.8    0.000    0.000   62.310   62.310
qmmm_elec_gaussian_low_G            6  6.8   60.830   60.830   60.830   60.830
qs_forces                           6  3.8    0.001    0.001   57.839   57.839
qs_energies                         6  4.8    0.001    0.001   51.453   51.453
scf_env_do_scf                      6  5.8    0.001    0.001   47.604   47.604
scf_env_do_scf_inner_loop          39  6.8    0.003    0.003   40.052   40.052
rebuild_ks_matrix                  45  8.4    0.000    0.000   39.414   39.414
qs_ks_build_kohn_sham_matrix       45  9.4    0.008    0.008   39.414   39.414
qs_ks_update_qs_env                45  7.8    0.000    0.000   33.778   33.778
pw_transfer                       966 11.9    0.072    0.072   24.102   24.102
fft_wrap_pw1pw2                   801 13.0    0.009    0.009   23.734   23.734
fft_wrap_pw1pw2_150               507 14.3    2.452    2.452   23.204   23.204
qs_vxc_create                      45 10.4    0.001    0.001   21.550   21.550
xc_vxc_pw_create                   45 11.4    4.316    4.316   21.549   21.549
pw_scatter_s                      429 15.4   10.617   10.617   10.617   10.617
qs_rho_update_rho                  45  7.9    0.000    0.000   10.488   10.488
calculate_rho_elec                 45  8.9    0.888    0.888   10.488   10.488
xc_rho_set_and_dset_create         45 12.4    0.247    0.247    9.927    9.927
fft3d_s                           802 15.0    9.210    9.210    9.221    9.221
qmmm_forces                         6  3.8    0.001    0.001    9.057    9.057
pw_integral_ab                   2539  7.4    8.650    8.650    8.650    8.650
qmmm_forces_with_gaussian           6  4.8    0.141    0.141    8.577    8.577
init_scf_loop                       6  6.8    0.000    0.000    7.546    7.546
qs_ks_ddapc                        45 10.4    0.001    0.001    6.649    6.649
qmmm_force_with_gaussian_low        6  5.8    0.000    0.000    6.608    6.608
qs_ks_update_qs_env_forces          6  4.8    0.000    0.000    5.648    5.648
qmmm_forces_gaussian_low_G          6  6.8    5.540    5.540    5.540    5.540
pw_poisson_solve                   51  9.9    2.342    2.342    5.321    5.321
grid_collocate_task_list           45  9.9    4.804    4.804    4.804    4.804
density_rs2pw                      45  9.9    0.003    0.003    4.795    4.795
fist_calc_energy_force              6  3.8    0.002    0.002    4.507    4.507
sum_up_and_integrate               45 10.4    0.236    0.236    4.407    4.407
integrate_v_rspace                 45 11.4    0.014    0.014    4.171    4.171
cp_ddapc_apply_CD                  45 11.4    0.006    0.006    4.067    4.067
force_nonbond                       6  4.8    3.278    3.278    3.278    3.278
qs_scf_new_mos                     39  7.8    0.000    0.000    2.810    2.810
qs_scf_loop_do_ot                  39  8.8    0.000    0.000    2.810    2.810
-------------------------------------------------------------------------------

From /workspace/artifacts/MQAE_single_node_32mpi.out:
-------------------------------------------------------------------------------
-                                                                             -
-                                T I M I N G                                  -
-                                                                             -
-------------------------------------------------------------------------------
SUBROUTINE                      CALLS  ASD         SELF TIME        TOTAL TIME
                              MAXIMUM       AVERAGE  MAXIMUM  AVERAGE  MAXIMUM
CP2K                                1  1.0    0.034    0.037   88.608   88.609
qs_mol_dyn_low                      1  2.0    0.005    0.006   86.872   86.968
qs_forces                           6  3.8    0.001    0.001   64.096   64.096
qs_energies                         6  4.8    0.001    0.001   61.074   61.075
scf_env_do_scf                      6  5.8    0.000    0.001   59.542   59.542
scf_env_do_scf_inner_loop         113  6.2    0.003    0.010   57.167   57.168
rebuild_ks_matrix                 119  8.1    0.000    0.000   42.280   42.299
qs_ks_build_kohn_sham_matrix      119  9.1    0.021    0.022   42.280   42.298
qs_ks_update_qs_env               119  7.3    0.001    0.001   39.712   39.728
velocity_verlet                     5  3.0    0.003    0.003   36.260   36.265
pw_transfer                      2446 11.8    0.273    0.293   27.184   27.501
fft_wrap_pw1pw2                  2059 12.8    0.034    0.038   26.368   26.688
fft_wrap_pw1pw2_150              1321 14.0    2.337    2.467   25.629   25.829
qs_vxc_create                     119 10.1    0.003    0.004   21.605   21.611
xc_vxc_pw_create                  119 11.1    0.466    0.615   21.601   21.607
fft3d_ps                         2059 14.8   11.974   13.096   19.829   20.422
qs_rho_update_rho                 119  7.3    0.001    0.001   16.762   16.763
calculate_rho_elec                119  8.3    0.086    0.096   16.761   16.762
sum_up_and_integrate              119 10.1    0.088    0.097   15.116   15.159
integrate_v_rspace                119 11.1    0.004    0.005   15.027   15.077
qmmm_forces                         6  3.8    0.003    0.003   12.633   12.633
qmmm_forces_with_gaussian           6  4.8    0.405    0.478   12.272   12.421
rs_pw_transfer                    988 11.5    0.016    0.018   11.556   12.156
density_rs2pw                     119  9.3    0.011    0.013   10.132   10.631
xc_rho_set_and_dset_create        119 12.1    0.516    0.604   10.148   10.494
potential_pw2rs                   119 12.1    0.011    0.012    9.037    9.051
qmmm_el_coupling                    6  3.8    0.000    0.000    9.004    9.046
qmmm_elec_with_gaussian             6  4.8    0.376    0.479    9.001    9.043
grid_collocate_task_list          119  9.3    6.332    6.707    6.332    6.707
mp_alltoall_z22v                 2059 16.8    4.719    6.206    4.719    6.206
qmmm_force_with_gaussian_low        6  5.8    0.000    0.000    5.822    5.960
grid_integrate_task_list          119 12.1    5.598    5.860    5.598    5.860
rs_pw_transfer_PW2RS_150          125 13.9    2.564    2.662    5.032    5.080
qmmm_forces_gaussian_low_G          6  6.8    4.767    4.907    4.767    4.907
pw_restrict_s3                     18  5.8    2.194    2.266    4.815    4.891
rs_pw_transfer_RS2PW_150          125 11.2    2.079    2.248    4.163    4.762
yz_to_x                           964 15.3    1.177    1.351    3.499    4.570
mp_waitany                       4028 12.8    3.584    4.562    3.584    4.562
x_to_yz                          1095 16.3    1.908    2.059    4.305    4.461
qmmm_elec_with_gaussian:spline      6  5.8    0.000    0.000    3.890    3.942
pw_prolongate_s3                   18  6.8    1.768    1.789    3.889    3.942
pw_integral_ab                   2761  7.7    3.213    3.237    3.588    3.810
qs_scf_new_mos                    113  7.2    0.001    0.001    3.635    3.644
qs_scf_loop_do_ot                 113  8.2    0.001    0.001    3.634    3.643
ot_scf_mini                       113  9.2    0.002    0.002    3.476    3.487
qmmm_elec_with_gaussian_low         6  5.8    0.000    0.000    3.366    3.423
dbcsr_multiply_generic           2588 12.3    0.097    0.113    3.259    3.375
qs_ks_ddapc                       119 10.1    0.003    0.003    2.935    3.075
qs_ks_update_qs_env_forces          6  4.8    0.000    0.000    2.579    2.580
mp_sum_dm3                         33  5.7    2.390    2.565    2.390    2.565
pw_gather_p                       964 14.3    2.135    2.490    2.135    2.490
qmmm_elec_gaussian_low_G            6  6.8    2.419    2.474    2.419    2.474
init_scf_loop                       6  6.8    0.000    0.000    2.372    2.372
mp_waitall_1                   188862 16.2    2.025    2.206    2.025    2.206
ot_mini                           113 10.2    0.001    0.001    2.191    2.204
pw_scatter_p                     1095 15.3    1.971    2.053    1.971    2.053
pw_derive                         732 12.5    1.772    1.929    1.772    1.929
-------------------------------------------------------------------------------

Plot: name="MQAE_single_node_timings_32omp", title="Timings of MQAE_single_node with 32 OpenMP Threads", ylabel="time [s]"
PlotPoint: plot="MQAE_single_node_timings_32omp", name="rest", label="rest", y=38.39099999999998, yerr=0.0
PlotPoint: plot="MQAE_single_node_timings_32omp", name="qmmm_elec_gaussian_low_G", label="qmmm_elec_gaussian_low_G", y=60.83, yerr=0.0
PlotPoint: plot="MQAE_single_node_timings_32omp", name="pw_scatter_s", label="pw_scatter_s", y=10.617, yerr=0.0
PlotPoint: plot="MQAE_single_node_timings_32omp", name="fft3d_s", label="fft3d_s", y=9.21, yerr=0.0
PlotPoint: plot="MQAE_single_node_timings_32omp", name="pw_integral_ab", label="pw_integral_ab", y=8.65, yerr=0.0
PlotPoint: plot="MQAE_single_node_timings_32omp", name="qmmm_forces_gaussian_low_G", label="qmmm_forces_gaussian_low_G", y=5.54, yerr=0.0
PlotPoint: plot="MQAE_single_node_timings_32omp", name="grid_collocate_task_list", label="grid_collocate_task_list", y=4.804, yerr=0.0
PlotPoint: plot="MQAE_single_node_timings_32omp", name="grid_integrate_task_list", label="grid_integrate_task_list", y=0.0, yerr=0.0
PlotPoint: plot="MQAE_single_node_timings_32omp", name="fft3d_ps", label="fft3d_ps", y=0.0, yerr=0.0
PlotPoint: plot="MQAE_single_node_timings_32omp", name="mp_alltoall_z22v", label="mp_alltoall_z22v", y=0.0, yerr=0.0

Plot: name="MQAE_single_node_timings_32mpi", title="Timings of MQAE_single_node with 32 MPI Ranks", ylabel="time [s]"
PlotPoint: plot="MQAE_single_node_timings_32mpi", name="rest", label="rest", y=49.586000000000006, yerr=0.0
PlotPoint: plot="MQAE_single_node_timings_32mpi", name="qmmm_elec_gaussian_low_G", label="qmmm_elec_gaussian_low_G", y=2.419, yerr=0.0
PlotPoint: plot="MQAE_single_node_timings_32mpi", name="pw_scatter_s", label="pw_scatter_s", y=0.0, yerr=0.0
PlotPoint: plot="MQAE_single_node_timings_32mpi", name="fft3d_s", label="fft3d_s", y=0.0, yerr=0.0
PlotPoint: plot="MQAE_single_node_timings_32mpi", name="pw_integral_ab", label="pw_integral_ab", y=3.213, yerr=0.0
PlotPoint: plot="MQAE_single_node_timings_32mpi", name="qmmm_forces_gaussian_low_G", label="qmmm_forces_gaussian_low_G", y=4.767, yerr=0.0
PlotPoint: plot="MQAE_single_node_timings_32mpi", name="grid_collocate_task_list", label="grid_collocate_task_list", y=6.332, yerr=0.0
PlotPoint: plot="MQAE_single_node_timings_32mpi", name="grid_integrate_task_list", label="grid_integrate_task_list", y=5.598, yerr=0.0
PlotPoint: plot="MQAE_single_node_timings_32mpi", name="fft3d_ps", label="fft3d_ps", y=11.974, yerr=0.0
PlotPoint: plot="MQAE_single_node_timings_32mpi", name="mp_alltoall_z22v", label="mp_alltoall_z22v", y=4.719, yerr=0.0

Summary: Performance test works fine.
Status: OK

Uploading artifacts... done
EndDate: 2021-11-26 11:59:48+00:00
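
The log does not say how the `rest` entry of each plot is derived. The reported values are consistent with `rest` being the average total time of the `CP2K` row minus the average self times of the other plotted routines, so the `y` values of one plot add up to that run's total. The following is a minimal sketch for checking this on the `dbcsr` plots; the function name `plot_totals` and the `SAMPLE` excerpt are illustrative only and are not part of `plot_performance.py`.

```python
import re
from collections import defaultdict

# Matches log lines of the form:
#   PlotPoint: plot="<plot>", name="<name>", label="<label>", y=<float>, yerr=<float>
PLOT_POINT = re.compile(
    r'PlotPoint: plot="(?P<plot>[^"]+)", name="(?P<name>[^"]+)", '
    r'label="[^"]*", y=(?P<y>[-+0-9.eE]+), yerr=(?P<yerr>[-+0-9.eE]+)'
)


def plot_totals(log_text):
    """Sum the y values of all PlotPoint entries, grouped by plot name."""
    totals = defaultdict(float)
    for match in PLOT_POINT.finditer(log_text):
        totals[match["plot"]] += float(match["y"])
    return dict(totals)


# Excerpt copied from the dbcsr section of this log (zero-valued points omitted).
SAMPLE = '''
PlotPoint: plot="dbcsr_timings_32omp", name="rest", label="rest", y=8.990000000000009, yerr=0.0
PlotPoint: plot="dbcsr_timings_32omp", name="dbcsr_redistribute", label="dbcsr_redistribute", y=64.333, yerr=0.0
PlotPoint: plot="dbcsr_timings_32omp", name="multiply_cannon_multrec", label="multiply_cannon_multrec", y=16.621, yerr=0.0
PlotPoint: plot="dbcsr_timings_32omp", name="dbcsr_make_random_matrix", label="dbcsr_make_random_matrix", y=14.156, yerr=0.0
PlotPoint: plot="dbcsr_timings_32omp", name="tree_to_linear_d", label="tree_to_linear_d", y=3.207, yerr=0.0
PlotPoint: plot="dbcsr_timings_32omp", name="mp_alltoall_d11v", label="mp_alltoall_d11v", y=3.186, yerr=0.0
PlotPoint: plot="dbcsr_timings_32omp", name="dbcsr_data_release", label="dbcsr_data_release", y=2.495, yerr=0.0
PlotPoint: plot="dbcsr_timings_32mpi", name="rest", label="rest", y=3.179000000000002, yerr=0.0
PlotPoint: plot="dbcsr_timings_32mpi", name="dbcsr_redistribute", label="dbcsr_redistribute", y=0.412, yerr=0.0
PlotPoint: plot="dbcsr_timings_32mpi", name="multiply_cannon_multrec", label="multiply_cannon_multrec", y=18.481, yerr=0.0
PlotPoint: plot="dbcsr_timings_32mpi", name="dbcsr_make_random_matrix", label="dbcsr_make_random_matrix", y=0.891, yerr=0.0
PlotPoint: plot="dbcsr_timings_32mpi", name="dbcsr_data_release", label="dbcsr_data_release", y=0.7, yerr=0.0
PlotPoint: plot="dbcsr_timings_32mpi", name="mp_waitall_1", label="mp_waitall_1", y=3.985, yerr=0.0
PlotPoint: plot="dbcsr_timings_32mpi", name="mp_sum_l", label="mp_sum_l", y=0.556, yerr=0.0
'''

if __name__ == "__main__":
    totals = plot_totals(SAMPLE)
    for plot, total in totals.items():
        # Compare against the CP2K row: 112.988 s (32omp) and 28.204 s (32mpi).
        print(f"{plot}: {total:.3f} s")
    ratio = totals["dbcsr_timings_32omp"] / totals["dbcsr_timings_32mpi"]
    print(f"32 OpenMP threads / 32 MPI ranks wall-time ratio: {ratio:.2f}")
```

Run against the full log text instead of `SAMPLE`, the same function gives one total per benchmark and parallelization mode, which makes the OpenMP versus MPI comparison easier to read than the individual timing tables.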