StartDate: 2021-12-03 19:08:08+00:00
CpuId: 64x Intel Xeon W 2000 / D-2100 (Skylake / Cascade Lake) {Skylake}, 14nm
CommitSHA: d099a73e17a197937428483d9ba3595b5f6e500d
CommitTime: 2021-12-03 18:42:52 +0100
CommitAuthor: Matthias Krack
CommitSubject: Remove obsolete GNU arch files
Trying to pull image cp2k-toolchain-mpich... success :-)
Trying to pull image cp2k-perf-openmp... image not found.

#################### Building Image cp2k-perf-openmp ####################
Dockerfile: /tools/docker/Dockerfile.test_performance
Build-Args: TOOLCHAIN=gcr.io/cp2k-org-project/img_cp2k-toolchain-mpich-arch-b51:gittree-1dbc4c1-buildargs-68b329d
Sending build context to Docker daemon 77.31kB
Step 1/9 : ARG TOOLCHAIN=cp2k/toolchain:latest
Step 2/9 : FROM ${TOOLCHAIN}
 ---> 494c61c1676e
Step 3/9 : WORKDIR /workspace
 ---> Running in cc02de4d906f
Removing intermediate container cc02de4d906f
 ---> f5948f28fc03
Step 4/9 : COPY ./scripts/install_basics.sh .
 ---> ba1934f3dff2
Step 5/9 : RUN ./install_basics.sh
 ---> Running in b6c0f7c97076
Installing Ubuntu packages...
debconf: delaying package configuration, since apt-utils is not installed
Selecting previously unselected package libpopt0:amd64.
(Reading database ... 15383 files and directories currently installed.)
Preparing to unpack .../libpopt0_1.16-14_amd64.deb ...
Unpacking libpopt0:amd64 (1.16-14) ...
Selecting previously unselected package rsync.
Preparing to unpack .../rsync_3.1.3-8ubuntu0.1_amd64.deb ...
Unpacking rsync (3.1.3-8ubuntu0.1) ...
Preparing to unpack .../wget_1.20.3-1ubuntu2_amd64.deb ...
Unpacking wget (1.20.3-1ubuntu2) over (1.20.3-1ubuntu1) ...
Setting up wget (1.20.3-1ubuntu2) ...
Setting up libpopt0:amd64 (1.16-14) ...
Setting up rsync (3.1.3-8ubuntu0.1) ...
invoke-rc.d: could not determine current runlevel
invoke-rc.d: policy-rc.d denied execution of start.
Processing triggers for libc-bin (2.31-0ubuntu9.2) ...
done.
Cloning cp2k repository... done.
Removing intermediate container b6c0f7c97076
 ---> cdaf3d89a71e
Step 6/9 : COPY ./scripts/install_performance.sh .
 ---> 92330d10a0a4
Step 7/9 : RUN ./install_performance.sh "local"
 ---> Running in 5d4b17d8d1c8
'./local.pdbg' -> '/opt/cp2k-toolchain/install/arch/local.pdbg'
'./local.psmp' -> '/opt/cp2k-toolchain/install/arch/local.psmp'
'./local.sdbg' -> '/opt/cp2k-toolchain/install/arch/local.sdbg'
'./local.ssmp' -> '/opt/cp2k-toolchain/install/arch/local.ssmp'
'./local_coverage.pdbg' -> '/opt/cp2k-toolchain/install/arch/local_coverage.pdbg'
'./local_static.psmp' -> '/opt/cp2k-toolchain/install/arch/local_static.psmp'
'./local_static.ssmp' -> '/opt/cp2k-toolchain/install/arch/local_static.ssmp'
'./local_warn.psmp' -> '/opt/cp2k-toolchain/install/arch/local_warn.psmp'
Warming cache by trying to compile cp2k... done.
Removing intermediate container 5d4b17d8d1c8
 ---> 341f3ab9b530
Step 8/9 : COPY ./scripts/ci_entrypoint.sh ./scripts/test_performance.sh ./scripts/plot_performance.py ./
 ---> 36cd7f1ad7ba
Step 9/9 : CMD ["./ci_entrypoint.sh", "./test_performance.sh", "local"]
 ---> Running in aa88ae92f78b
Removing intermediate container aa88ae92f78b
 ---> 48cdf639deae
Successfully built 48cdf639deae
Successfully tagged gcr.io/cp2k-org-project/img_cp2k-perf-openmp-arch-b51:gittree-cfe3b56-buildargs-25bfbc9
Pushing image cp2k-perf-openmp... done.
#################### Running Image cp2k-perf-openmp #################### ========== Fetching Git Commit ========== CommitSHA: d099a73e17a197937428483d9ba3595b5f6e500d CommitTime: 2021-12-03 18:42:52 +0100 CommitAuthor: Matthias Krack CommitSubject: Remove obsolete GNU arch files ========== Running Test ========== ========== Compiling CP2K ========== Compiling cp2k... done. ========== Running Performance Test ========== Running H2O-64.inp with 1 threads and 32 ranks... done. Running H2O-64.inp with 32 threads and 1 ranks... done. From /workspace/artifacts/H2O-64_32omp.out: ------------------------------------------------------------------------------- - - - T I M I N G - - - ------------------------------------------------------------------------------- SUBROUTINE CALLS ASD SELF TIME TOTAL TIME MAXIMUM AVERAGE MAXIMUM AVERAGE MAXIMUM CP2K 1 1.0 0.033 0.033 163.466 163.466 qs_mol_dyn_low 1 2.0 0.004 0.004 162.524 162.524 qs_forces 11 3.9 0.001 0.001 162.465 162.465 qs_energies 11 4.9 0.001 0.001 152.132 152.132 scf_env_do_scf 11 5.9 0.001 0.001 120.033 120.033 velocity_verlet 10 3.0 0.002 0.002 113.648 113.648 scf_env_do_scf_inner_loop 108 6.5 0.010 0.010 81.184 81.184 init_scf_loop 11 6.9 0.000 0.000 38.649 38.649 prepare_preconditioner 11 7.9 0.000 0.000 34.633 34.633 make_preconditioner 11 8.9 0.000 0.000 34.633 34.633 rebuild_ks_matrix 119 8.3 0.001 0.001 33.243 33.243 qs_ks_build_kohn_sham_matrix 119 9.3 0.019 0.019 33.242 33.242 make_full_inverse_cholesky 11 9.9 0.000 0.000 32.617 32.617 qs_ks_update_qs_env 119 7.6 0.001 0.001 30.966 30.966 qs_rho_update_rho 119 7.7 0.001 0.001 28.496 28.496 calculate_rho_elec 119 8.7 1.547 1.547 28.495 28.495 qs_scf_new_mos 108 7.5 0.001 0.001 27.850 27.850 qs_scf_loop_do_ot 108 8.5 0.001 0.001 27.849 27.849 ot_scf_mini 108 9.5 0.003 0.003 25.931 25.931 dbcsr_multiply_generic 2286 12.5 0.181 0.181 23.535 23.535 grid_collocate_task_list 119 9.7 22.436 22.436 22.436 22.436 sum_up_and_integrate 119 10.3 0.384 0.384 21.316 21.316 
integrate_v_rspace 119 11.3 0.553 0.553 20.931 20.931 cp_fm_cholesky_invert 11 10.9 19.534 19.534 19.534 19.534 grid_integrate_task_list 119 12.3 17.934 17.934 17.934 17.934 init_scf_run 11 5.9 0.001 0.001 16.532 16.532 scf_env_initial_rho_setup 11 6.9 0.001 0.001 16.531 16.531 wfi_extrapolate 11 7.9 0.001 0.001 15.667 15.667 ot_mini 108 10.5 0.001 0.001 15.435 15.435 cp_gemm 81 9.0 0.000 0.000 15.099 15.099 cp_gemm_cosma 81 10.0 15.098 15.098 15.098 15.098 make_m2s 4572 13.5 0.067 0.067 13.159 13.159 qs_energies_init_hamiltonians 11 5.9 0.000 0.000 10.581 10.581 qs_ot_get_derivative 108 11.5 0.002 0.002 8.044 8.044 pw_transfer 1439 11.6 0.098 0.098 7.644 7.644 ot_diis_step 108 11.5 0.006 0.006 7.387 7.387 fft_wrap_pw1pw2 1201 12.6 0.011 0.011 7.325 7.325 cp_fm_cholesky_decompose 22 10.9 6.984 6.984 6.984 6.984 make_images 4572 14.5 2.602 2.602 6.925 6.925 qs_ot_get_p 119 10.4 0.001 0.001 6.531 6.531 dbcsr_make_dense_low 5837 15.5 0.100 0.100 6.447 6.447 dbcsr_complete_redistribute 329 12.2 3.000 3.000 6.391 6.391 make_dense_data 5837 16.5 5.675 5.675 6.324 6.324 build_core_hamiltonian_matrix_ 11 4.9 0.001 0.001 6.273 6.273 apply_preconditioner_dbcsr 119 12.6 0.000 0.000 6.246 6.246 apply_single 119 13.6 0.001 0.001 6.245 6.245 fft_wrap_pw1pw2_140 487 13.2 0.629 0.629 6.210 6.210 qs_env_update_s_mstruct 11 6.9 0.000 0.000 5.889 5.889 multiply_cannon 2286 13.5 0.975 0.975 5.788 5.788 dbcsr_make_images_dense 3978 14.8 0.028 0.028 5.764 5.764 qs_create_task_list 11 7.9 0.000 0.000 5.356 5.356 generate_qs_task_list 11 8.9 3.641 3.641 5.356 5.356 dbcsr_copy 2102 12.0 0.288 0.288 5.268 5.268 copy_dbcsr_to_fm 153 11.3 0.004 0.004 5.149 5.149 qs_ot_p2m_diag 50 11.0 0.220 0.220 4.947 4.947 dbcsr_copy_into_existing 22 7.9 4.933 4.933 4.934 4.934 qs_energies_compute_matrix_w 11 5.9 0.000 0.000 4.823 4.823 calculate_w_matrix_ot 11 6.9 0.008 0.008 4.823 4.823 pw_poisson_solve 119 10.3 1.938 1.938 4.603 4.603 density_rs2pw 119 9.7 0.006 0.006 4.512 4.512 cp_dbcsr_syevd 50 12.0 
0.005 0.005 4.375 4.375 multiply_cannon_loop 2286 14.5 0.051 0.051 4.251 4.251 transfer_dbcsr_to_fm 11 10.9 0.000 0.000 4.248 4.248 cp_fm_diag_elpa 50 13.0 0.000 0.000 4.209 4.209 cp_fm_diag_elpa_base 50 14.0 4.154 4.154 4.208 4.208 multiply_cannon_multrec 2286 15.5 4.118 4.118 4.198 4.198 build_core_hamiltonian_matrix 11 6.9 0.001 0.001 4.065 4.065 qs_ks_update_qs_env_forces 11 4.9 0.000 0.000 4.057 4.057 copy_fm_to_dbcsr 176 11.2 0.002 0.002 3.452 3.452 fft3d_s 1202 14.6 3.307 3.307 3.314 3.314 ------------------------------------------------------------------------------- From /workspace/artifacts/H2O-64_32mpi.out: ------------------------------------------------------------------------------- - - - T I M I N G - - - ------------------------------------------------------------------------------- SUBROUTINE CALLS ASD SELF TIME TOTAL TIME MAXIMUM AVERAGE MAXIMUM AVERAGE MAXIMUM CP2K 1 1.0 0.008 0.010 71.080 71.081 qs_mol_dyn_low 1 2.0 0.005 0.005 70.954 70.960 qs_forces 11 3.9 0.002 0.002 70.899 70.899 qs_energies 11 4.9 0.001 0.002 66.070 66.071 scf_env_do_scf 11 5.9 0.001 0.001 59.556 59.557 scf_env_do_scf_inner_loop 108 6.5 0.003 0.011 55.283 55.284 velocity_verlet 10 3.0 0.002 0.002 42.434 42.436 rebuild_ks_matrix 119 8.3 0.001 0.001 27.801 27.844 qs_ks_build_kohn_sham_matrix 119 9.3 0.021 0.022 27.800 27.844 qs_ks_update_qs_env 119 7.6 0.001 0.001 24.759 24.791 sum_up_and_integrate 119 10.3 0.046 0.049 21.861 21.896 integrate_v_rspace 119 11.3 0.004 0.005 21.815 21.848 qs_rho_update_rho 119 7.7 0.001 0.001 21.648 21.658 calculate_rho_elec 119 8.7 0.047 0.049 21.647 21.657 dbcsr_multiply_generic 2286 12.5 0.133 0.137 16.434 16.549 grid_integrate_task_list 119 12.3 15.770 16.134 15.770 16.134 grid_collocate_task_list 119 9.7 15.612 16.092 15.612 16.092 qs_scf_new_mos 108 7.5 0.001 0.001 13.364 13.400 qs_scf_loop_do_ot 108 8.5 0.001 0.001 13.363 13.399 ot_scf_mini 108 9.5 0.003 0.004 12.551 12.585 multiply_cannon 2286 13.5 0.221 0.226 10.991 11.236 
multiply_cannon_loop 2286 14.5 0.222 0.237 9.982 10.405 mp_waitall_1 169478 16.3 8.215 8.484 8.215 8.484 ot_mini 108 10.5 0.001 0.001 7.396 7.433 rs_pw_transfer 974 11.9 0.016 0.017 6.355 7.364 density_rs2pw 119 9.7 0.009 0.009 5.468 6.478 multiply_cannon_metrocomm3 18288 15.5 0.079 0.083 5.380 5.727 pw_transfer 1439 11.6 0.129 0.142 5.546 5.633 fft_wrap_pw1pw2 1201 12.6 0.014 0.015 5.267 5.353 potential_pw2rs 119 12.3 0.010 0.011 4.892 4.901 fft_wrap_pw1pw2_140 487 13.2 0.543 0.562 4.612 4.797 init_scf_run 11 5.9 0.000 0.002 4.491 4.491 scf_env_initial_rho_setup 11 6.9 0.000 0.001 4.490 4.491 init_scf_loop 11 6.9 0.000 0.001 4.257 4.257 wfi_extrapolate 11 7.9 0.001 0.001 4.104 4.104 fft3d_ps 1201 14.6 2.165 2.267 3.900 3.958 ot_diis_step 108 11.5 0.005 0.005 3.768 3.769 make_m2s 4572 13.5 0.075 0.077 3.715 3.767 apply_preconditioner_dbcsr 119 12.6 0.000 0.000 3.675 3.727 apply_single 119 13.6 0.001 0.001 3.675 3.726 qs_ot_get_derivative 108 11.5 0.001 0.002 3.598 3.633 multiply_cannon_multrec 18288 15.5 3.436 3.516 3.454 3.534 qs_ks_update_qs_env_forces 11 4.9 0.000 0.000 3.254 3.268 mp_waitany 9880 13.7 2.144 3.140 2.144 3.140 make_images 4572 14.5 0.190 0.195 3.028 3.083 rs_pw_transfer_RS2PW_140 130 11.5 0.538 0.578 1.957 2.980 rs_pw_transfer_PW2RS_140 130 13.9 1.181 1.235 2.479 2.503 mp_alltoall_d11v 2130 13.8 1.434 2.075 1.434 2.075 qs_ot_get_p 119 10.4 0.001 0.001 1.748 1.793 rs_gather_matrices 119 12.3 0.123 0.135 1.102 1.770 cp_gemm 81 9.0 0.000 0.000 1.620 1.625 cp_gemm_cosma 81 10.0 1.619 1.624 1.619 1.624 build_core_hamiltonian_matrix_ 11 4.9 0.001 0.001 1.427 1.541 make_images_data 4572 15.5 0.062 0.066 1.416 1.533 ------------------------------------------------------------------------------- Plot: name="H2O-64_timings_32omp", title="Timings of H2O-64 with 32 OpenMP Threads", ylabel="time [s]" PlotPoint: plot="H2O-64_timings_32omp", name="rest", label="rest", y=77.36200000000002, yerr=0.0 PlotPoint: plot="H2O-64_timings_32omp", 
name="grid_collocate_task_list", label="grid_collocate_task_list", y=22.436, yerr=0.0
PlotPoint: plot="H2O-64_timings_32omp", name="cp_fm_cholesky_invert", label="cp_fm_cholesky_invert", y=19.534, yerr=0.0
PlotPoint: plot="H2O-64_timings_32omp", name="grid_integrate_task_list", label="grid_integrate_task_list", y=17.934, yerr=0.0
PlotPoint: plot="H2O-64_timings_32omp", name="cp_gemm_cosma", label="cp_gemm_cosma", y=15.098, yerr=0.0
PlotPoint: plot="H2O-64_timings_32omp", name="cp_fm_cholesky_decompose", label="cp_fm_cholesky_decompose", y=6.984, yerr=0.0
PlotPoint: plot="H2O-64_timings_32omp", name="multiply_cannon_multrec", label="multiply_cannon_multrec", y=4.118, yerr=0.0
PlotPoint: plot="H2O-64_timings_32omp", name="mp_waitall_1", label="mp_waitall_1", y=0.0, yerr=0.0
PlotPoint: plot="H2O-64_timings_32omp", name="fft3d_ps", label="fft3d_ps", y=0.0, yerr=0.0
Plot: name="H2O-64_timings_32mpi", title="Timings of H2O-64 with 32 MPI Ranks", ylabel="time [s]"
PlotPoint: plot="H2O-64_timings_32mpi", name="rest", label="rest", y=24.262999999999998, yerr=0.0
PlotPoint: plot="H2O-64_timings_32mpi", name="grid_collocate_task_list", label="grid_collocate_task_list", y=15.612, yerr=0.0
PlotPoint: plot="H2O-64_timings_32mpi", name="cp_fm_cholesky_invert", label="cp_fm_cholesky_invert", y=0.0, yerr=0.0
PlotPoint: plot="H2O-64_timings_32mpi", name="grid_integrate_task_list", label="grid_integrate_task_list", y=15.77, yerr=0.0
PlotPoint: plot="H2O-64_timings_32mpi", name="cp_gemm_cosma", label="cp_gemm_cosma", y=1.619, yerr=0.0
PlotPoint: plot="H2O-64_timings_32mpi", name="cp_fm_cholesky_decompose", label="cp_fm_cholesky_decompose", y=0.0, yerr=0.0
PlotPoint: plot="H2O-64_timings_32mpi", name="multiply_cannon_multrec", label="multiply_cannon_multrec", y=3.436, yerr=0.0
PlotPoint: plot="H2O-64_timings_32mpi", name="mp_waitall_1", label="mp_waitall_1", y=8.215, yerr=0.0
PlotPoint: plot="H2O-64_timings_32mpi", name="fft3d_ps", label="fft3d_ps", y=2.165, yerr=0.0
Running
H2O-64_nonortho.inp with 1 threads and 32 ranks... done. Running H2O-64_nonortho.inp with 32 threads and 1 ranks... done. From /workspace/artifacts/H2O-64_nonortho_32omp.out: ------------------------------------------------------------------------------- - - - T I M I N G - - - ------------------------------------------------------------------------------- SUBROUTINE CALLS ASD SELF TIME TOTAL TIME MAXIMUM AVERAGE MAXIMUM AVERAGE MAXIMUM CP2K 1 1.0 0.031 0.031 213.971 213.971 qs_mol_dyn_low 1 2.0 0.004 0.004 213.184 213.184 qs_forces 11 3.9 0.001 0.001 213.127 213.127 qs_energies 11 4.9 0.001 0.001 199.208 199.208 scf_env_do_scf 11 5.9 0.001 0.001 163.116 163.116 velocity_verlet 10 3.0 0.002 0.002 144.515 144.515 scf_env_do_scf_inner_loop 96 6.5 0.009 0.009 121.038 121.038 rebuild_ks_matrix 107 8.3 0.001 0.001 61.490 61.490 qs_ks_build_kohn_sham_matrix 107 9.3 0.016 0.016 61.490 61.490 qs_ks_update_qs_env 107 7.6 0.001 0.001 55.384 55.384 qs_rho_update_rho 107 7.7 0.001 0.001 54.117 54.117 calculate_rho_elec 107 8.7 1.386 1.386 54.116 54.116 sum_up_and_integrate 107 10.3 0.338 0.338 50.997 50.997 integrate_v_rspace 107 11.3 0.506 0.506 50.659 50.659 grid_collocate_task_list 107 9.7 48.759 48.759 48.759 48.759 grid_integrate_task_list 107 12.3 47.988 47.988 47.988 47.988 init_scf_loop 11 6.9 0.000 0.000 41.880 41.880 prepare_preconditioner 11 7.9 0.000 0.000 34.593 34.593 make_preconditioner 11 8.9 0.000 0.000 34.593 34.593 make_full_inverse_cholesky 11 9.9 0.000 0.000 32.662 32.662 qs_scf_new_mos 96 7.5 0.001 0.001 23.839 23.839 qs_scf_loop_do_ot 96 8.5 0.001 0.001 23.838 23.838 ot_scf_mini 96 9.5 0.003 0.003 22.137 22.137 dbcsr_multiply_generic 1966 12.4 0.154 0.154 20.268 20.268 cp_fm_cholesky_invert 11 10.9 20.036 20.036 20.036 20.036 init_scf_run 11 5.9 0.001 0.001 19.502 19.502 scf_env_initial_rho_setup 11 6.9 0.001 0.001 19.501 19.501 wfi_extrapolate 11 7.9 0.001 0.001 18.414 18.414 cp_gemm 81 9.0 0.000 0.000 15.036 15.036 cp_gemm_cosma 81 10.0 15.036 15.036 
15.036 15.036 ot_mini 96 10.5 0.001 0.001 12.922 12.922 qs_energies_init_hamiltonians 11 5.9 0.000 0.000 11.580 11.580 make_m2s 3932 13.4 0.057 0.057 11.086 11.086 qs_ks_update_qs_env_forces 11 4.9 0.000 0.000 7.666 7.666 qs_ot_get_derivative 96 11.5 0.001 0.001 6.932 6.932 qs_env_update_s_mstruct 11 6.9 0.000 0.000 6.846 6.846 pw_transfer 1295 11.6 0.084 0.084 6.733 6.733 cp_fm_cholesky_decompose 22 10.9 6.632 6.632 6.632 6.632 fft_wrap_pw1pw2 1081 12.6 0.009 0.009 6.457 6.457 qs_create_task_list 11 7.9 0.000 0.000 6.327 6.327 generate_qs_task_list 11 8.9 4.659 4.659 6.327 6.327 dbcsr_complete_redistribute 317 12.2 2.896 2.896 6.255 6.255 build_core_hamiltonian_matrix_ 11 4.9 0.001 0.001 6.250 6.250 make_images 3932 14.4 2.253 2.253 5.994 5.994 ot_diis_step 96 11.5 0.005 0.005 5.987 5.987 qs_ot_get_p 107 10.4 0.001 0.001 5.777 5.777 fft_wrap_pw1pw2_140 439 13.2 0.554 0.554 5.466 5.466 dbcsr_copy 1855 11.9 0.256 0.256 5.300 5.300 dbcsr_make_dense_low 4961 15.5 0.083 0.083 5.260 5.260 multiply_cannon 1966 13.4 0.905 0.905 5.236 5.236 make_dense_data 4961 16.5 4.603 4.603 5.157 5.157 apply_preconditioner_dbcsr 107 12.6 0.000 0.000 5.130 5.130 apply_single 107 13.6 0.001 0.001 5.130 5.130 copy_dbcsr_to_fm 147 11.2 0.003 0.003 5.066 5.066 dbcsr_copy_into_existing 22 7.9 5.003 5.003 5.003 5.003 qs_energies_compute_matrix_w 11 5.9 0.000 0.000 4.847 4.847 calculate_w_matrix_ot 11 6.9 0.008 0.008 4.847 4.847 dbcsr_make_images_dense 3386 14.7 0.023 0.023 4.675 4.675 qs_ot_p2m_diag 44 11.0 0.193 0.193 4.435 4.435 ------------------------------------------------------------------------------- From /workspace/artifacts/H2O-64_nonortho_32mpi.out: ------------------------------------------------------------------------------- - - - T I M I N G - - - ------------------------------------------------------------------------------- SUBROUTINE CALLS ASD SELF TIME TOTAL TIME MAXIMUM AVERAGE MAXIMUM AVERAGE MAXIMUM CP2K 1 1.0 0.007 0.009 127.475 127.476 qs_mol_dyn_low 1 2.0 0.005 0.005 
127.358 127.364 qs_forces 11 3.9 0.002 0.002 127.304 127.304 qs_energies 11 4.9 0.001 0.001 118.587 118.589 scf_env_do_scf 11 5.9 0.001 0.001 108.998 108.999 scf_env_do_scf_inner_loop 96 6.5 0.003 0.009 101.313 101.313 velocity_verlet 10 3.0 0.002 0.002 75.818 75.820 rebuild_ks_matrix 107 8.3 0.001 0.001 58.544 58.610 qs_ks_build_kohn_sham_matrix 107 9.3 0.019 0.020 58.544 58.609 sum_up_and_integrate 107 10.3 0.042 0.046 53.086 53.127 integrate_v_rspace 107 11.3 0.004 0.004 53.044 53.083 qs_ks_update_qs_env 107 7.6 0.001 0.001 51.594 51.648 qs_rho_update_rho 107 7.7 0.001 0.001 48.714 48.721 calculate_rho_elec 107 8.7 0.042 0.044 48.713 48.720 grid_integrate_task_list 107 12.3 46.309 47.618 46.309 47.618 grid_collocate_task_list 107 9.7 42.243 43.685 42.243 43.685 dbcsr_multiply_generic 1966 12.4 0.117 0.122 14.931 15.098 qs_scf_new_mos 96 7.5 0.001 0.001 11.916 11.953 qs_scf_loop_do_ot 96 8.5 0.001 0.001 11.915 11.953 ot_scf_mini 96 9.5 0.003 0.003 11.186 11.224 multiply_cannon 1966 13.4 0.191 0.196 10.072 10.312 multiply_cannon_loop 1966 14.4 0.196 0.208 9.156 9.428 rs_pw_transfer 878 11.9 0.014 0.016 6.922 8.491 mp_waitall_1 146670 16.2 7.563 7.871 7.563 7.871 init_scf_loop 11 6.9 0.000 0.001 7.669 7.670 density_rs2pw 107 9.7 0.008 0.008 5.957 7.526 init_scf_run 11 5.9 0.000 0.002 7.507 7.508 scf_env_initial_rho_setup 11 6.9 0.000 0.001 7.507 7.507 qs_ks_update_qs_env_forces 11 4.9 0.000 0.000 7.145 7.156 wfi_extrapolate 11 7.9 0.001 0.001 6.888 6.888 ot_mini 96 10.5 0.001 0.001 6.589 6.632 multiply_cannon_metrocomm3 15728 15.4 0.069 0.074 4.968 5.339 pw_transfer 1295 11.6 0.117 0.125 5.043 5.111 fft_wrap_pw1pw2 1081 12.6 0.013 0.014 4.786 4.861 mp_waitany 8968 13.7 3.093 4.629 3.093 4.629 potential_pw2rs 107 12.3 0.009 0.009 4.552 4.560 rs_pw_transfer_RS2PW_140 118 11.5 0.397 0.429 2.777 4.355 fft_wrap_pw1pw2_140 439 13.2 0.493 0.511 4.187 4.354 mp_alltoall_d11v 1998 13.7 2.441 4.013 2.441 4.013 rs_gather_matrices 107 12.3 0.120 0.130 2.131 3.659 fft3d_ps 1081 
14.6 1.974 2.081 3.542 3.621 apply_preconditioner_dbcsr 107 12.6 0.000 0.000 3.360 3.418 apply_single 107 13.6 0.001 0.001 3.360 3.418 ot_diis_step 96 11.5 0.004 0.005 3.392 3.392 make_m2s 3932 13.4 0.065 0.068 3.321 3.379 multiply_cannon_multrec 15728 15.4 3.127 3.234 3.142 3.250 qs_ot_get_derivative 96 11.5 0.001 0.001 3.173 3.213 make_images 3932 14.4 0.167 0.171 2.719 2.781 ------------------------------------------------------------------------------- Plot: name="H2O-64_nonortho_timings_32omp", title="Timings of H2O-64_nonortho with 32 OpenMP Threads", ylabel="time [s]" PlotPoint: plot="H2O-64_nonortho_timings_32omp", name="rest", label="rest", y=75.52000000000001, yerr=0.0 PlotPoint: plot="H2O-64_nonortho_timings_32omp", name="grid_collocate_task_list", label="grid_collocate_task_list", y=48.759, yerr=0.0 PlotPoint: plot="H2O-64_nonortho_timings_32omp", name="grid_integrate_task_list", label="grid_integrate_task_list", y=47.988, yerr=0.0 PlotPoint: plot="H2O-64_nonortho_timings_32omp", name="cp_fm_cholesky_invert", label="cp_fm_cholesky_invert", y=20.036, yerr=0.0 PlotPoint: plot="H2O-64_nonortho_timings_32omp", name="cp_gemm_cosma", label="cp_gemm_cosma", y=15.036, yerr=0.0 PlotPoint: plot="H2O-64_nonortho_timings_32omp", name="cp_fm_cholesky_decompose", label="cp_fm_cholesky_decompose", y=6.632, yerr=0.0 PlotPoint: plot="H2O-64_nonortho_timings_32omp", name="multiply_cannon_multrec", label="multiply_cannon_multrec", y=0.0, yerr=0.0 PlotPoint: plot="H2O-64_nonortho_timings_32omp", name="mp_waitany", label="mp_waitany", y=0.0, yerr=0.0 PlotPoint: plot="H2O-64_nonortho_timings_32omp", name="mp_waitall_1", label="mp_waitall_1", y=0.0, yerr=0.0 Plot: name="H2O-64_nonortho_timings_32mpi", title="Timings of H2O-64_nonortho with 32 MPI Ranks", ylabel="time [s]" PlotPoint: plot="H2O-64_nonortho_timings_32mpi", name="rest", label="rest", y=25.14, yerr=0.0 PlotPoint: plot="H2O-64_nonortho_timings_32mpi", name="grid_collocate_task_list", 
label="grid_collocate_task_list", y=42.243, yerr=0.0
PlotPoint: plot="H2O-64_nonortho_timings_32mpi", name="grid_integrate_task_list", label="grid_integrate_task_list", y=46.309, yerr=0.0
PlotPoint: plot="H2O-64_nonortho_timings_32mpi", name="cp_fm_cholesky_invert", label="cp_fm_cholesky_invert", y=0.0, yerr=0.0
PlotPoint: plot="H2O-64_nonortho_timings_32mpi", name="cp_gemm_cosma", label="cp_gemm_cosma", y=0.0, yerr=0.0
PlotPoint: plot="H2O-64_nonortho_timings_32mpi", name="cp_fm_cholesky_decompose", label="cp_fm_cholesky_decompose", y=0.0, yerr=0.0
PlotPoint: plot="H2O-64_nonortho_timings_32mpi", name="multiply_cannon_multrec", label="multiply_cannon_multrec", y=3.127, yerr=0.0
PlotPoint: plot="H2O-64_nonortho_timings_32mpi", name="mp_waitany", label="mp_waitany", y=3.093, yerr=0.0
PlotPoint: plot="H2O-64_nonortho_timings_32mpi", name="mp_waitall_1", label="mp_waitall_1", y=7.563, yerr=0.0
Running H2O-hyb.inp with 1 threads and 32 ranks... done.
Running H2O-hyb.inp with 32 threads and 1 ranks... done.
From /workspace/artifacts/H2O-hyb_32omp.out:
-------------------------------------------------------------------------------
-                                                                             -
-                                T I M I N G                                  -
-                                                                             -
-------------------------------------------------------------------------------
SUBROUTINE                       CALLS  ASD         SELF TIME        TOTAL TIME
                               MAXIMUM       AVERAGE  MAXIMUM  AVERAGE  MAXIMUM
CP2K                                 1  1.0    0.382    0.382  273.221  273.221
qs_energies                          1  2.0    0.000    0.000  271.986  271.986
scf_env_do_scf                       1  3.0    0.000    0.000  269.542  269.542
qs_ks_update_qs_env                  8  5.0    0.000    0.000  252.576  252.576
rebuild_ks_matrix                    7  6.0    0.000    0.000  252.473  252.473
qs_ks_build_kohn_sham_matrix         7  7.0    0.001    0.001  252.473  252.473
hfx_ks_matrix                        7  8.0    0.000    0.000  169.263  169.263
integrate_four_center                7  9.0    2.283    2.283  169.235  169.235
integrate_four_center_main           7 10.0    0.586    0.586  157.851  157.851
scf_env_do_scf_inner_loop            7  4.0    0.001    0.001  157.433  157.433
integrate_four_center_bin          451 11.0  157.265  157.265  157.265  157.265
init_scf_loop                        1  4.0    0.000    0.000  112.095  112.095
cp_gemm                            129 10.3    0.000    0.000   68.684   68.684
cp_gemm_cosma                      129 11.3   68.683   68.683   68.683   68.683
admm_mo_calc_rho_aux                 7  8.0    0.000    0.000   39.705   39.705
admm_fit_mo_coeffs                   7  9.0    0.000    0.000   38.125   38.125
admm_mo_merge_derivs                 7  8.0    0.000    0.000   35.009   35.009
merge_mo_derivs_diag                 7  9.0    0.021    0.021   35.009   35.009
purify_mo_diag                       7 10.0    0.001    0.001   22.367   22.367
fit_mo_coeffs                        7 10.0    0.000    0.000   15.758   15.758
prepare_preconditioner               1  5.0    0.000    0.000   13.307   13.307
make_preconditioner                  1  6.0    0.000    0.000   13.307   13.307
integrate_four_center_load           7 10.0    0.001    0.001    8.741    8.741
hfx_load_balance                     1 11.0    0.002    0.002    8.741    8.741
arnoldi_normal_ev                   11  9.3    0.002    0.002    8.166    8.166
estimate_cond_num                    1  7.0    0.000    0.000    8.096    8.096
build_subspace                      28  9.5    0.014    0.014    8.042    8.042
-------------------------------------------------------------------------------
From /workspace/artifacts/H2O-hyb_32mpi.out:
-------------------------------------------------------------------------------
-                                                                             -
-                                T I M I N G                                  -
-                                                                             -
-------------------------------------------------------------------------------
SUBROUTINE                       CALLS  ASD         SELF TIME        TOTAL TIME
                               MAXIMUM       AVERAGE  MAXIMUM  AVERAGE  MAXIMUM
CP2K                                 1  1.0    0.212    0.218  183.718  183.719
qs_energies                          1  2.0    0.001    0.001  183.348  183.349
scf_env_do_scf                       1  3.0    0.000    0.000  182.712  182.712
qs_ks_update_qs_env                  8  5.0    0.000    0.000  179.850  179.850
rebuild_ks_matrix                    7  6.0    0.000    0.000  179.837  179.838
qs_ks_build_kohn_sham_matrix         7  7.0    0.002    0.002  179.837  179.838
hfx_ks_matrix                        7  8.0    0.000    0.000  168.000  168.002
integrate_four_center                7  9.0    0.080    0.384  167.985  167.986
integrate_four_center_main           7 10.0    0.005    0.005  154.414  158.007
integrate_four_center_bin          448 11.0  154.409  158.002  154.409  158.002
scf_env_do_scf_inner_loop            7  4.0    0.000    0.001  107.069  107.069
init_scf_loop                        1  4.0    0.000    0.000   75.642   75.642
integrate_four_center_load           7 10.0    0.000    0.000    8.936    8.937
hfx_load_balance                     1 11.0    0.001    0.001    8.936    8.937
mp_sync                             70 11.3    3.840    6.022    3.840    6.022
cp_gemm                            129 10.3    0.000    0.001    4.854    4.859
cp_gemm_cosma                      129 11.3    4.853    4.859    4.853    4.859
hfx_load_balance_bin                 1 12.0    4.310    4.467    4.310    4.467
hfx_load_balance_count               1 12.0    4.300    4.460    4.300    4.460
-------------------------------------------------------------------------------
Plot: name="H2O-hyb_timings_32omp", title="Timings of H2O-hyb with 32 OpenMP Threads", ylabel="time [s]"
PlotPoint: plot="H2O-hyb_timings_32omp", name="rest", label="rest", y=44.02200000000002, yerr=0.0
PlotPoint: plot="H2O-hyb_timings_32omp", name="integrate_four_center_bin", label="integrate_four_center_bin", y=157.265, yerr=0.0
PlotPoint: plot="H2O-hyb_timings_32omp", name="cp_gemm_cosma", label="cp_gemm_cosma", y=68.683, yerr=0.0
PlotPoint: plot="H2O-hyb_timings_32omp", name="integrate_four_center", label="integrate_four_center", y=2.283, yerr=0.0
PlotPoint: plot="H2O-hyb_timings_32omp", name="integrate_four_center_main", label="integrate_four_center_main", y=0.586, yerr=0.0
PlotPoint: plot="H2O-hyb_timings_32omp", name="CP2K", label="CP2K", y=0.382, yerr=0.0
PlotPoint: plot="H2O-hyb_timings_32omp", name="hfx_load_balance_count", label="hfx_load_balance_count", y=0.0, yerr=0.0
PlotPoint: plot="H2O-hyb_timings_32omp", name="mp_sync", label="mp_sync", y=0.0, yerr=0.0
PlotPoint: plot="H2O-hyb_timings_32omp", name="hfx_load_balance_bin", label="hfx_load_balance_bin", y=0.0, yerr=0.0
Plot: name="H2O-hyb_timings_32mpi", title="Timings of H2O-hyb with 32 MPI Ranks", ylabel="time [s]"
PlotPoint: plot="H2O-hyb_timings_32mpi", name="rest", label="rest", y=11.708999999999975, yerr=0.0
PlotPoint: plot="H2O-hyb_timings_32mpi", name="integrate_four_center_bin", label="integrate_four_center_bin", y=154.409, yerr=0.0
PlotPoint: plot="H2O-hyb_timings_32mpi", name="cp_gemm_cosma", label="cp_gemm_cosma", y=4.853, yerr=0.0
PlotPoint: plot="H2O-hyb_timings_32mpi", name="integrate_four_center", label="integrate_four_center", y=0.08, yerr=0.0
PlotPoint: plot="H2O-hyb_timings_32mpi", name="integrate_four_center_main", label="integrate_four_center_main", y=0.005, yerr=0.0
PlotPoint: plot="H2O-hyb_timings_32mpi", name="CP2K", label="CP2K", y=0.212, yerr=0.0
PlotPoint: plot="H2O-hyb_timings_32mpi", name="hfx_load_balance_count", label="hfx_load_balance_count", y=4.3, yerr=0.0
PlotPoint: plot="H2O-hyb_timings_32mpi", name="mp_sync", label="mp_sync", y=3.84, yerr=0.0
PlotPoint: plot="H2O-hyb_timings_32mpi", name="hfx_load_balance_bin", label="hfx_load_balance_bin", y=4.31, yerr=0.0
Running GW_PBE_4benzene.inp with 1 threads and 32 ranks... done.
Running GW_PBE_4benzene.inp with 32 threads and 1 ranks... done.
From /workspace/artifacts/GW_PBE_4benzene_32omp.out: ------------------------------------------------------------------------------- - - - T I M I N G - - - ------------------------------------------------------------------------------- SUBROUTINE CALLS ASD SELF TIME TOTAL TIME MAXIMUM AVERAGE MAXIMUM AVERAGE MAXIMUM CP2K 1 1.0 0.015 0.015 393.008 393.008 qs_energies 1 2.0 0.000 0.000 392.516 392.516 mp2_main 1 3.0 0.000 0.000 386.326 386.326 mp2_gpw_main 1 4.0 0.000 0.000 385.891 385.891 rpa_ri_compute_en 1 5.0 0.000 0.000 370.695 370.695 rpa_num_int 1 6.0 0.000 0.000 370.671 370.671 compute_mat_P_omega 1 7.0 0.002 0.002 197.979 197.979 compute_mat_P_omega_contract 10 8.0 11.876 11.876 196.486 196.486 dbcsr_t_total 2336 9.6 0.014 0.014 187.160 187.160 cp_gemm 105 8.4 0.000 0.000 145.456 145.456 cp_gemm_cosma 105 9.4 145.455 145.455 145.455 145.455 dbcsr_t_contract 787 11.0 46.062 46.062 116.535 116.535 GW_matrix_operations 10 7.0 0.006 0.006 102.594 102.594 compute_mat_P_omega_calc_M_occ 250 9.0 11.866 11.866 74.623 74.623 dbcsr_t_copy 1103 10.7 19.608 19.608 69.237 69.237 dbcsr_tas_total 1149 12.2 0.048 0.048 64.600 64.600 dbcsr_tas_multiply 807 12.1 0.003 0.003 63.233 63.233 dbcsr_multiply_generic 837 15.8 0.125 0.125 50.469 50.469 dbcsr_tas_dbcsr 807 14.1 0.003 0.003 50.059 50.059 rpa_num_int_RPA_matrix_operati 10 7.0 0.000 0.000 49.316 49.316 compute_mat_P_omega_calc_M_vir 250 9.0 0.001 0.001 48.685 48.685 contract_P_omega_with_mat_L 10 8.0 0.000 0.000 47.593 47.593 dbcsr_tas_mm_1N 524 15.1 0.002 0.002 38.788 38.788 multiply_cannon 837 16.8 9.104 9.104 38.235 38.235 dbcsr_tas_reserve_blocks_index 3261 13.7 7.180 7.180 26.525 26.525 dbcsr_tas_copy 574 11.4 16.177 16.177 23.713 23.713 multiply_cannon_loop 837 17.8 0.143 0.143 20.948 20.948 dbcsr_t_reserve_blocks_index 2280 12.5 1.182 1.182 20.159 20.159 multiply_cannon_multrec 837 18.8 15.766 15.766 19.952 19.952 dbcsr_t_reserve_blocks_index_a 2222 11.6 0.009 0.009 19.864 19.864 dbcsr_reserve_blocks 3717 14.7 
18.670 18.670 19.050 19.050
compute_mat_P_omega_copy_M_occ     250  9.0    0.001    0.001   18.879   18.879
compute_QP_energies                  1  7.0    0.000    0.000   18.617   18.617
compute_self_energy_cubic_gw         1  8.0    0.094    0.094   18.616   18.616
mp2_ri_gpw_compute_in                1  5.0    0.001    0.001   15.181   15.181
compute_mat_P_omega_copy_M_vir     250  9.0    0.002    0.002   13.814   13.814
dbcsr_t_copy_nocomm                251 12.0   10.788   10.788   13.116   13.116
compute_mat_P_omega_calc_P_t       250  9.0    0.001    0.001   11.370   11.370
make_m2s                          1674 16.8    0.105    0.105    9.935    9.935
dbcsr_tas_mm_2                     251 15.0    0.001    0.001    9.742    9.742
make_images                       1674 17.8    4.629    4.629    9.456    9.456
cp_fm_cholesky_invert               10  8.0    8.626    8.626    8.626    8.626
-------------------------------------------------------------------------------

From /workspace/artifacts/GW_PBE_4benzene_32mpi.out:
-------------------------------------------------------------------------------
-                                                                             -
-                                T I M I N G                                  -
-                                                                             -
-------------------------------------------------------------------------------
SUBROUTINE                       CALLS  ASD         SELF TIME        TOTAL TIME
                               MAXIMUM       AVERAGE  MAXIMUM  AVERAGE  MAXIMUM
CP2K                                 1  1.0    0.006    0.008   54.536   54.537
qs_energies                          1  2.0    0.001    0.001   54.425   54.431
mp2_main                             1  3.0    0.001    0.001   53.105   53.111
mp2_gpw_main                         1  4.0    0.000    0.000   53.042   53.048
rpa_ri_compute_en                    1  5.0    0.000    0.000   50.967   50.972
rpa_num_int                          1  6.0    0.000    0.001   50.959   50.965
dbcsr_t_total                     2336  9.6    0.015    0.016   38.725   38.726
compute_mat_P_omega                  1  7.0    0.001    0.002   37.785   37.800
compute_mat_P_omega_contract        10  8.0    0.706    0.732   37.505   37.510
dbcsr_t_contract                   787 11.0    1.774    1.935   28.551   28.556
dbcsr_tas_total                   1149 12.2    0.059    0.066   25.042   25.042
dbcsr_tas_multiply                 807 12.1    0.003    0.003   24.886   24.888
dbcsr_tas_dbcsr                    807 14.1    0.003    0.003   18.101   18.102
dbcsr_multiply_generic             837 15.8    0.066    0.070   14.982   15.794
compute_mat_P_omega_calc_M_occ     250  9.0    0.693    0.717   12.699   12.699
multiply_cannon                    837 16.8    0.126    0.142    8.811    9.162
compute_mat_P_omega_calc_P_t       250  9.0    0.001    0.001    9.045    9.045
cp_gemm                            105  8.4    0.000    0.000    8.951    8.972
cp_gemm_cosma                      105  9.4    8.950    8.972    8.950    8.972
dbcsr_t_copy                      1111 10.7    3.917    4.111    8.685    8.946
dbcsr_tas_mm_1N                    524 15.1    0.003    0.003    7.993    8.750
multiply_cannon_loop               837 17.8    0.039    0.042    8.006    8.344
compute_mat_P_omega_calc_M_vir     250  9.0    0.001    0.001    8.111    8.111
mp_sync                           8696 11.6    6.044    7.046    6.044    7.046
dbcsr_tas_mm_2                     251 15.0    0.002    0.002    6.799    6.799
multiply_cannon_multrec           1386 17.8    6.275    6.589    6.497    6.795
make_m2s                          1674 16.8    0.041    0.044    5.310    5.865
make_images                       1674 17.8    0.241    0.259    5.234    5.788
GW_matrix_operations                10  7.0    0.001    0.001    5.729    5.736
compute_QP_energies                  1  7.0    0.000    0.001    3.982    3.982
compute_self_energy_cubic_gw         1  8.0    0.005    0.005    3.979    3.982
dbcsr_t_communicate_buffer        1098 11.7    0.089    0.094    3.128    3.293
mp_waitall_2                      3776 14.7    2.938    3.161    2.938    3.161
rpa_num_int_RPA_matrix_operati      10  7.0    0.000    0.000    3.115    3.125
contract_P_omega_with_mat_L         10  8.0    0.000    0.000    3.007    3.017
contract_cubic_gw                   21  9.0    0.000    0.000    2.948    2.948
make_images_data                  1674 18.8    0.035    0.036    2.725    2.834
hybrid_alltoall_any               1724 19.5    2.109    2.404    2.621    2.745
dbcsr_t_reserve_blocks_index_a    2791 11.4    0.017    0.019    2.442    2.743
dbcsr_t_reserve_blocks_index      2849 12.4    0.097    0.103    2.438    2.742
dbcsr_tas_reserve_blocks_index    3300 13.8    0.265    0.283    2.395    2.699
make_images_pack                  1674 18.8    2.081    2.525    2.094    2.538
dbcsr_reserve_blocks              3785 14.7    2.117    2.403    2.155    2.442
mp2_ri_gpw_compute_in                1  5.0    0.001    0.001    2.073    2.073
convert_to_new_pgrid              2421 14.1    0.016    0.018    1.691    1.829
dbcsr_copy                        3323 15.8    1.632    1.772    1.660    1.799
mp_waitall_1                     26582 19.0    1.384    1.740    1.384    1.740
compute_mat_P_omega_copy_M_vir     250  9.0    0.001    0.002    1.556    1.562
dbcsr_add_anytype                  909 13.7    0.912    0.963    1.429    1.494
compute_mat_P_omega_copy_M_occ     250  9.0    0.001    0.001    1.390    1.394
dbcsr_tas_replicate                396 14.1    0.771    0.851    1.290    1.361
scf_env_do_scf                       1  3.0    0.000    0.000    1.269    1.269
scf_env_do_scf_inner_loop           17  4.0    0.000    0.001    1.269    1.269
mp_max_i                          2058  9.6    0.969    1.216    0.969    1.216
-------------------------------------------------------------------------------

Plot: name="GW_PBE_4benzene_timings_32omp", title="Timings of GW_PBE_4benzene with 32 OpenMP Threads", ylabel="time [s]"
PlotPoint: plot="GW_PBE_4benzene_timings_32omp", name="rest", label="rest", y=131.26999999999998, yerr=0.0
PlotPoint: plot="GW_PBE_4benzene_timings_32omp", name="cp_gemm_cosma", label="cp_gemm_cosma", y=145.455, yerr=0.0
PlotPoint: plot="GW_PBE_4benzene_timings_32omp", name="dbcsr_t_contract", label="dbcsr_t_contract", y=46.062, yerr=0.0
PlotPoint: plot="GW_PBE_4benzene_timings_32omp", name="dbcsr_t_copy", label="dbcsr_t_copy", y=19.608, yerr=0.0
PlotPoint: plot="GW_PBE_4benzene_timings_32omp", name="dbcsr_reserve_blocks", label="dbcsr_reserve_blocks", y=18.67, yerr=0.0
PlotPoint: plot="GW_PBE_4benzene_timings_32omp", name="dbcsr_tas_copy", label="dbcsr_tas_copy", y=16.177, yerr=0.0
PlotPoint: plot="GW_PBE_4benzene_timings_32omp", name="multiply_cannon_multrec", label="multiply_cannon_multrec", y=15.766, yerr=0.0
PlotPoint: plot="GW_PBE_4benzene_timings_32omp", name="mp_waitall_2", label="mp_waitall_2", y=0.0, yerr=0.0
PlotPoint: plot="GW_PBE_4benzene_timings_32omp", name="mp_sync", label="mp_sync", y=0.0, yerr=0.0

Plot: name="GW_PBE_4benzene_timings_32mpi", title="Timings of GW_PBE_4benzene with 32 MPI Ranks", ylabel="time [s]"
PlotPoint: plot="GW_PBE_4benzene_timings_32mpi", name="rest", label="rest", y=22.521, yerr=0.0
PlotPoint: plot="GW_PBE_4benzene_timings_32mpi", name="cp_gemm_cosma", label="cp_gemm_cosma", y=8.95, yerr=0.0
PlotPoint: plot="GW_PBE_4benzene_timings_32mpi", name="dbcsr_t_contract", label="dbcsr_t_contract", y=1.774, yerr=0.0
PlotPoint: plot="GW_PBE_4benzene_timings_32mpi", name="dbcsr_t_copy", label="dbcsr_t_copy", y=3.917, yerr=0.0
PlotPoint: plot="GW_PBE_4benzene_timings_32mpi", name="dbcsr_reserve_blocks", label="dbcsr_reserve_blocks", y=2.117, yerr=0.0
PlotPoint: plot="GW_PBE_4benzene_timings_32mpi", name="dbcsr_tas_copy", label="dbcsr_tas_copy", y=0.0, yerr=0.0
PlotPoint: plot="GW_PBE_4benzene_timings_32mpi", name="multiply_cannon_multrec", label="multiply_cannon_multrec", y=6.275, yerr=0.0
PlotPoint: plot="GW_PBE_4benzene_timings_32mpi", name="mp_waitall_2", label="mp_waitall_2", y=2.938, yerr=0.0
PlotPoint: plot="GW_PBE_4benzene_timings_32mpi", name="mp_sync", label="mp_sync", y=6.044, yerr=0.0

Running diag_cu144_broy.inp with 1 threads and 32 ranks... done.
Running diag_cu144_broy.inp with 32 threads and 1 ranks... done.

From /workspace/artifacts/diag_cu144_broy_32omp.out:
-------------------------------------------------------------------------------
-                                                                             -
-                                T I M I N G                                  -
-                                                                             -
-------------------------------------------------------------------------------
SUBROUTINE                       CALLS  ASD         SELF TIME        TOTAL TIME
                               MAXIMUM       AVERAGE  MAXIMUM  AVERAGE  MAXIMUM
CP2K                                 1  1.0    0.095    0.095  193.739  193.739
qs_energies                          1  2.0    0.000    0.000  192.033  192.033
scf_env_do_scf                       1  3.0    0.000    0.000  182.411  182.411
scf_env_do_scf_inner_loop           15  4.0    0.002    0.002  182.410  182.410
qs_scf_new_mos                      15  5.0    0.000    0.000   81.020   81.020
qs_ks_update_qs_env                 15  5.0    0.000    0.000   70.082   70.082
rebuild_ks_matrix                   15  6.0    0.000    0.000   69.685   69.685
qs_ks_build_kohn_sham_matrix        15  7.0    0.002    0.002   69.685   69.685
eigensolver                         15  6.0    0.002    0.002   65.981   65.981
cp_fm_diag_elpa                     15  7.0    0.000    0.000   51.511   51.511
cp_fm_diag_elpa_base                15  8.0   46.649   46.649   51.511   51.511
qs_vxc_create                       15  8.0    0.036    0.036   46.161   46.161
calculate_dispersion_nonloc         15  9.0    8.963    8.963   40.270   40.270
pw_transfer                       1191  9.8    0.097    0.097   26.835   26.835
fft_wrap_pw1pw2                   1086 10.9    0.014    0.014   26.556   26.556
qs_rho_update_rho                   16  5.0    0.000    0.000   24.416   24.416
calculate_rho_elec                  16  6.0    0.336    0.336   24.416   24.416
grid_collocate_task_list            16  7.0   22.901   22.901   22.901   22.901
sum_up_and_integrate                15  8.0    0.086    0.086   21.835   21.835
integrate_v_rspace                  15  9.0    0.032    0.032   21.749   21.749
grid_integrate_task_list            15 10.0   21.130   21.130   21.130   21.130
fft_wrap_pw1pw2_150                765 12.0    3.349    3.349   20.151   20.151
copy_dbcsr_to_fm                    16  5.9    0.001    0.001   11.888   11.888
pw_scatter_s                       585 13.0   11.294   11.294   11.294   11.294
dbcsr_complete_redistribute         46  8.3    3.778    3.778   10.739   10.739
fft3d_s                           1087 12.8   10.377   10.377   10.388   10.388
cp_fm_cholesky_restore              45  7.0    9.745    9.745    9.745    9.745
cp_fm_upper_to_full                 30  8.0    9.583    9.583    9.583    9.583
vdW_energy                          15 10.0    8.645    8.645    8.645    8.645
gspace_mixing                       14  5.0    0.270    0.270    8.194    8.194
broyden_mixing                      14  6.0    7.416    7.416    7.417    7.417
fft_wrap_pw1pw2_200                197 11.5    0.328    0.328    6.147    6.147
xc_vxc_pw_create                    15  9.0    1.594    1.594    5.856    5.856
dbcsr_finalize                     159  9.9    0.021    0.021    4.726    4.726
qs_energies_init_hamiltonians        1  3.0    0.000    0.000    4.575    4.575
dbcsr_merge_all                     91 11.1    0.072    0.072    4.574    4.574
init_scf_run                         1  3.0    0.000    0.000    4.312    4.312
-------------------------------------------------------------------------------

From /workspace/artifacts/diag_cu144_broy_32mpi.out:
-------------------------------------------------------------------------------
-                                                                             -
-                                T I M I N G                                  -
-                                                                             -
-------------------------------------------------------------------------------
SUBROUTINE                       CALLS  ASD         SELF TIME        TOTAL TIME
                               MAXIMUM       AVERAGE  MAXIMUM  AVERAGE  MAXIMUM
CP2K                                 1  1.0    0.016    0.018   82.028   82.029
qs_energies                          1  2.0    0.000    0.001   81.624   81.624
scf_env_do_scf                       1  3.0    0.000    0.000   76.601   76.602
scf_env_do_scf_inner_loop           15  4.0    0.001    0.002   76.601   76.602
qs_ks_update_qs_env                 15  5.0    0.000    0.000   37.312   37.339
rebuild_ks_matrix                   15  6.0    0.000    0.000   37.272   37.300
qs_ks_build_kohn_sham_matrix        15  7.0    0.003    0.004   37.272   37.300
qs_rho_update_rho                   16  5.0    0.000    0.000   23.279   23.284
calculate_rho_elec                  16  6.0    0.011    0.012   23.279   23.284
sum_up_and_integrate                15  8.0    0.011    0.013   22.523   22.568
integrate_v_rspace                  15  9.0    0.001    0.001   22.512   22.556
grid_collocate_task_list            16  7.0   21.351   22.059   21.351   22.059
grid_integrate_task_list            15 10.0   20.830   21.404   20.830   21.404
qs_scf_new_mos                      15  5.0    0.001    0.001   16.555   16.689
eigensolver                         15  6.0    0.002    0.002   15.227   15.239
qs_vxc_create                       15  8.0    0.001    0.001   14.281   14.289
calculate_dispersion_nonloc         15  9.0    1.373    1.424   11.636   11.648
cp_fm_diag_elpa                     15  7.0    0.000    0.000   11.151   11.157
cp_fm_diag_elpa_base                15  8.0   10.932   10.965   11.146   11.150
pw_transfer                       1191  9.8    0.125    0.147   10.548   10.710
fft_wrap_pw1pw2                   1086 10.9    0.019    0.023   10.274   10.473
fft3d_ps                          1086 12.9    4.550    4.754    7.771    7.914
fft_wrap_pw1pw2_150                765 12.0    0.625    0.664    6.877    6.954
cp_fm_cholesky_restore              45  7.0    3.860    3.912    3.860    3.912
fft_wrap_pw1pw2_200                197 11.5    0.328    0.357    3.270    3.410
qs_energies_init_hamiltonians        1  3.0    0.000    0.000    3.111    3.112
build_core_hamiltonian_matrix        1  4.0    0.000    0.000    2.709    2.945
xc_vxc_pw_create                    15  9.0    0.054    0.070    2.644    2.667
rs_pw_transfer                     158  9.4    0.002    0.003    1.839    2.396
vdW_energy                          15 10.0    2.002    2.170    2.002    2.170
density_rs2pw                       16  7.0    0.001    0.002    1.787    2.169
mp_alltoall_z22v                  1086 14.9    1.858    2.163    1.858    2.163
build_core_ppnl                      1  5.0    1.816    1.990    1.816    1.990
x_to_yz                            585 14.0    0.836    0.894    1.800    1.904
mp_waitany                         520 11.3    1.215    1.801    1.215    1.801
rs_pw_transfer_RS2PW_200            18  8.8    0.068    0.077    0.972    1.723
-------------------------------------------------------------------------------

Plot: name="diag_cu144_broy_timings_32omp", title="Timings of diag_cu144_broy with 32 OpenMP Threads", ylabel="time [s]"
PlotPoint: plot="diag_cu144_broy_timings_32omp", name="rest", label="rest", y=71.64300000000001, yerr=0.0
PlotPoint: plot="diag_cu144_broy_timings_32omp", name="cp_fm_diag_elpa_base", label="cp_fm_diag_elpa_base", y=46.649, yerr=0.0
PlotPoint: plot="diag_cu144_broy_timings_32omp", name="grid_collocate_task_list", label="grid_collocate_task_list", y=22.901, yerr=0.0
PlotPoint: plot="diag_cu144_broy_timings_32omp", name="grid_integrate_task_list", label="grid_integrate_task_list", y=21.13, yerr=0.0
PlotPoint: plot="diag_cu144_broy_timings_32omp", name="pw_scatter_s", label="pw_scatter_s", y=11.294, yerr=0.0
PlotPoint: plot="diag_cu144_broy_timings_32omp", name="fft3d_s", label="fft3d_s", y=10.377, yerr=0.0
PlotPoint: plot="diag_cu144_broy_timings_32omp", name="cp_fm_cholesky_restore", label="cp_fm_cholesky_restore", y=9.745, yerr=0.0
PlotPoint: plot="diag_cu144_broy_timings_32omp", name="fft3d_ps", label="fft3d_ps", y=0.0, yerr=0.0

Plot: name="diag_cu144_broy_timings_32mpi", title="Timings of diag_cu144_broy with 32 MPI Ranks", ylabel="time [s]"
PlotPoint: plot="diag_cu144_broy_timings_32mpi", name="rest", label="rest", y=20.50500000000001, yerr=0.0
PlotPoint: plot="diag_cu144_broy_timings_32mpi", name="cp_fm_diag_elpa_base", label="cp_fm_diag_elpa_base", y=10.932, yerr=0.0
PlotPoint: plot="diag_cu144_broy_timings_32mpi", name="grid_collocate_task_list", label="grid_collocate_task_list", y=21.351, yerr=0.0
PlotPoint: plot="diag_cu144_broy_timings_32mpi", name="grid_integrate_task_list", label="grid_integrate_task_list", y=20.83, yerr=0.0
PlotPoint: plot="diag_cu144_broy_timings_32mpi", name="pw_scatter_s", label="pw_scatter_s", y=0.0, yerr=0.0
PlotPoint: plot="diag_cu144_broy_timings_32mpi", name="fft3d_s", label="fft3d_s", y=0.0, yerr=0.0
PlotPoint: plot="diag_cu144_broy_timings_32mpi", name="cp_fm_cholesky_restore", label="cp_fm_cholesky_restore", y=3.86, yerr=0.0
PlotPoint: plot="diag_cu144_broy_timings_32mpi", name="fft3d_ps", label="fft3d_ps", y=4.55, yerr=0.0

Running bench_dftb.inp with 1 threads and 32 ranks... done.
Running bench_dftb.inp with 32 threads and 1 ranks... done.
From /workspace/artifacts/bench_dftb_32omp.out:
-------------------------------------------------------------------------------
-                                                                             -
-                                T I M I N G                                  -
-                                                                             -
-------------------------------------------------------------------------------
SUBROUTINE                       CALLS  ASD         SELF TIME        TOTAL TIME
                               MAXIMUM       AVERAGE  MAXIMUM  AVERAGE  MAXIMUM
CP2K                                 1  1.0    0.083    0.083  311.368  311.368
qs_energies                          1  2.0    0.000    0.000  311.182  311.182
ls_scf                               1  3.0    0.000    0.000  309.488  309.488
ls_scf_main                          1  4.0    0.002    0.002  295.037  295.037
density_matrix_trs4                 11  5.0    0.010    0.010  166.070  166.070
ls_scf_dm_to_ks                     11  5.0    0.000    0.000  122.273  122.273
matrix_ls_to_qs                     11  6.0    0.000    0.000  118.026  118.026
dbcsr_multiply_generic             185  6.1    0.464    0.464  102.852  102.852
multiply_cannon                    185  7.1    2.921    2.921   69.625   69.625
dbcsr_copy_into_existing            11  7.0   68.287   68.287   68.288   68.288
dbcsr_complete_redistribute         23  7.5   39.073   39.073   54.521   54.521
matrix_decluster                    11  7.0    0.000    0.000   49.737   49.737
multiply_cannon_loop               185  8.1    0.370    0.370   49.140   49.140
multiply_cannon_multrec            185  9.1   47.080   47.080   47.130   47.130
arnoldi_extremal                    12  6.1    0.000    0.000   45.771   45.771
arnoldi_normal_ev                   12  7.1    0.028    0.028   45.771   45.771
build_subspace                      23  8.1    0.131    0.131   45.166   45.166
dbcsr_matrix_vector_mult           652  9.0    0.269    0.269   34.796   34.796
dbcsr_matrix_vector_mult_local     652 10.0   33.173   33.173   33.181   33.181
make_m2s                           370  7.1    0.029    0.029   27.354   27.354
make_images                        370  8.1    7.202    7.202   25.098   25.098
dbcsr_finalize                     646  7.5    0.205    0.205   20.420   20.420
dbcsr_merge_all                    597  8.5    3.232    3.232   18.511   18.511
setup_rec_index_2d                 370  8.1   17.420   17.420   17.420   17.420
dbcsr_sort_indices                1103  9.9   14.391   14.391   14.391   14.391
ls_scf_init_scf                      1  4.0    0.000    0.000   13.452   13.452
tree_to_linear_d                   110  9.4   13.126   13.126   13.126   13.126
ls_scf_init_matrix_S                 1  5.0    0.000    0.000   13.024   13.024
quick_finalize                     395 10.0    0.459    0.459   12.245   12.245
matrix_sqrt_Newton_Schulz            1  6.0    0.001    0.001   12.157   12.157
dbcsr_special_finalize             370  9.1    0.002    0.002   11.289   11.289
dbcsr_dot_sd                       144  6.3    8.586    8.586    8.587    8.587
dbcsr_frobenius_norm               142  6.1    7.555    7.555    7.557    7.557
matrix_qs_to_ls                     12  5.1    0.000    0.000    6.978    6.978
matrix_cluster                      12  6.1    0.000    0.000    6.978    6.978
dbcsr_new_transposed                 2  7.0    0.133    0.133    6.461    6.461
make_images_data                   370  9.1    0.009    0.009    6.444    6.444
dbcsr_redistribute                   2  8.0    6.226    6.226    6.293    6.293
-------------------------------------------------------------------------------

From /workspace/artifacts/bench_dftb_32mpi.out:
-------------------------------------------------------------------------------
-                                                                             -
-                                T I M I N G                                  -
-                                                                             -
-------------------------------------------------------------------------------
SUBROUTINE                       CALLS  ASD         SELF TIME        TOTAL TIME
                               MAXIMUM       AVERAGE  MAXIMUM  AVERAGE  MAXIMUM
CP2K                                 1  1.0    0.007    0.009   87.955   87.956
qs_energies                          1  2.0    0.000    0.000   87.867   87.867
ls_scf                               1  3.0    0.000    0.000   87.792   87.793
ls_scf_main                          1  4.0    0.000    0.002   84.286   84.286
density_matrix_trs4                 11  5.0    0.008    0.012   80.637   80.736
dbcsr_multiply_generic             185  6.1    0.069    0.084   75.668   76.012
multiply_cannon                    185  7.1    0.038    0.045   63.080   64.889
multiply_cannon_loop               185  8.1    0.181    0.219   59.479   61.473
multiply_cannon_multrec           1480  9.1   38.290   43.066   38.735   43.531
mp_waitall_1                     11936 10.3   19.142   23.259   19.142   23.259
multiply_cannon_metrocomm3        1480  9.1    0.016    0.019   11.862   18.957
multiply_cannon_metrocomm1        1480  9.1    0.009    0.013    4.345   12.440
make_m2s                           370  7.1    0.032    0.036    8.350    8.491
make_images                        370  8.1    0.697    0.743    8.234    8.375
calculate_norms                   2960  9.1    4.283    5.126    4.283    5.126
mp_sum_l                          1039  5.9    3.007    4.827    3.007    4.827
arnoldi_extremal                    12  6.1    0.000    0.001    3.673    3.682
arnoldi_normal_ev                   12  7.1    0.002    0.008    3.673    3.681
dbcsr_multiply_generic_mpsum_f     137  7.1    0.000    0.001    2.101    3.680
make_images_data                   370  9.1    0.011    0.013    3.286    3.664
build_subspace                      23  8.1    0.036    0.051    3.552    3.555
ls_scf_dm_to_ks                     11  5.0    0.000    0.000    3.159    3.254
hybrid_alltoall_any                393  9.9    0.286    1.448    2.688    3.232
dbcsr_matrix_vector_mult           652  9.0    0.018    0.079    2.996    3.161
matrix_ls_to_qs                     11  6.0    0.000    0.000    2.765    2.867
dbcsr_complete_redistribute         23  7.5    1.761    1.914    2.775    2.866
ls_scf_init_scf                      1  4.0    0.000    0.000    2.713    2.714
ls_scf_init_matrix_S                 1  5.0    0.000    0.000    2.673    2.686
matrix_decluster                    11  7.0    0.000    0.000    2.498    2.598
dbcsr_matrix_vector_mult_local     652 10.0    2.455    2.587    2.460    2.592
make_images_pack                   370  9.1    2.227    2.578    2.232    2.582
matrix_sqrt_Newton_Schulz            1  6.0    0.001    0.001    2.446    2.448
buffer_matrices_ensure_size        370  8.1    2.038    2.155    2.038    2.155
dbcsr_add_d                        280  6.0    0.001    0.002    1.899    2.079
dbcsr_add_anytype                  280  7.0    1.014    1.128    1.897    2.078
dbcsr_finalize                     646  7.5    0.013    0.015    1.796    1.980
-------------------------------------------------------------------------------

Plot: name="bench_dftb_timings_32omp", title="Timings of bench_dftb with 32 OpenMP Threads", ylabel="time [s]"
PlotPoint: plot="bench_dftb_timings_32omp", name="rest", label="rest", y=106.33499999999998, yerr=0.0
PlotPoint: plot="bench_dftb_timings_32omp", name="dbcsr_copy_into_existing", label="dbcsr_copy_into_existing", y=68.287, yerr=0.0
PlotPoint: plot="bench_dftb_timings_32omp", name="multiply_cannon_multrec", label="multiply_cannon_multrec", y=47.08, yerr=0.0
PlotPoint: plot="bench_dftb_timings_32omp", name="dbcsr_complete_redistribute", label="dbcsr_complete_redistribute", y=39.073, yerr=0.0
PlotPoint: plot="bench_dftb_timings_32omp", name="dbcsr_matrix_vector_mult_local", label="dbcsr_matrix_vector_mult_local", y=33.173, yerr=0.0
PlotPoint: plot="bench_dftb_timings_32omp", name="setup_rec_index_2d", label="setup_rec_index_2d", y=17.42, yerr=0.0
PlotPoint: plot="bench_dftb_timings_32omp", name="mp_sum_l", label="mp_sum_l", y=0.0, yerr=0.0
PlotPoint: plot="bench_dftb_timings_32omp", name="mp_waitall_1", label="mp_waitall_1", y=0.0, yerr=0.0
PlotPoint: plot="bench_dftb_timings_32omp", name="calculate_norms", label="calculate_norms", y=0.0, yerr=0.0

Plot: name="bench_dftb_timings_32mpi", title="Timings of bench_dftb with 32 MPI Ranks", ylabel="time [s]"
PlotPoint: plot="bench_dftb_timings_32mpi", name="rest", label="rest", y=19.016999999999996, yerr=0.0
PlotPoint: plot="bench_dftb_timings_32mpi", name="dbcsr_copy_into_existing", label="dbcsr_copy_into_existing", y=0.0, yerr=0.0
PlotPoint: plot="bench_dftb_timings_32mpi", name="multiply_cannon_multrec", label="multiply_cannon_multrec", y=38.29, yerr=0.0
PlotPoint: plot="bench_dftb_timings_32mpi", name="dbcsr_complete_redistribute", label="dbcsr_complete_redistribute", y=1.761, yerr=0.0
PlotPoint: plot="bench_dftb_timings_32mpi", name="dbcsr_matrix_vector_mult_local", label="dbcsr_matrix_vector_mult_local", y=2.455, yerr=0.0
PlotPoint: plot="bench_dftb_timings_32mpi", name="setup_rec_index_2d", label="setup_rec_index_2d", y=0.0, yerr=0.0
PlotPoint: plot="bench_dftb_timings_32mpi", name="mp_sum_l", label="mp_sum_l", y=3.007, yerr=0.0
PlotPoint: plot="bench_dftb_timings_32mpi", name="mp_waitall_1", label="mp_waitall_1", y=19.142, yerr=0.0
PlotPoint: plot="bench_dftb_timings_32mpi", name="calculate_norms", label="calculate_norms", y=4.283, yerr=0.0

Running dbcsr.inp with 1 threads and 32 ranks... done.
Running dbcsr.inp with 32 threads and 1 ranks... done.
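Each row of the timing reports above carries seven fields: routine name, call count, average stack depth (ASD), and the average and maximum of self time and total time across ranks. The following is a minimal sketch of how one such row could be split into typed fields; the function name `parse_timing_row` and the dictionary keys are illustrative assumptions and are not part of the CP2K tooling. The ratio computed at the end is just one possible way to look at the spread between ranks.

```python
def parse_timing_row(row):
    """Split one timing-report row into its seven fields (hypothetical helper)."""
    # Split from the right: the six numeric columns never contain spaces,
    # so whatever remains on the left is the routine name.
    name, calls, asd, self_avg, self_max, total_avg, total_max = row.rsplit(None, 6)
    return {
        "name": name,
        "calls": int(calls),
        "asd": float(asd),
        "self_avg": float(self_avg),
        "self_max": float(self_max),
        "total_avg": float(total_avg),
        "total_max": float(total_max),
    }


# Row copied from the bench_dftb_32mpi.out report in this log.
row = parse_timing_row("multiply_cannon_multrec 1480 9.1 38.290 43.066 38.735 43.531")

# Ratio of maximum to average self time over the 32 ranks.
imbalance = row["self_max"] / row["self_avg"]
print(f'{row["name"]}: max/avg self time = {imbalance:.3f}')
```

Because the split works on whitespace, the same helper applies whether the report is laid out in aligned columns or flattened onto one line, as long as it is fed one row at a time.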
From /workspace/artifacts/dbcsr_32omp.out:
-------------------------------------------------------------------------------
-                                                                             -
-                                T I M I N G                                  -
-                                                                             -
-------------------------------------------------------------------------------
SUBROUTINE                       CALLS  ASD         SELF TIME        TOTAL TIME
                               MAXIMUM       AVERAGE  MAXIMUM  AVERAGE  MAXIMUM
CP2K                                 1  1.0    0.005    0.005   98.697   98.697
lib_test                             1  2.0    0.000    0.000   98.690   98.690
dbcsr_run_tests                      3  3.0    0.002    0.002   98.690   98.690
test_multiplies_multiproc            3  4.0    0.001    0.001   79.498   79.498
dbcsr_redistribute                   9  5.0   53.542   53.542   56.971   56.971
dbcsr_multiply_generic               9  5.0    0.001    0.001   20.915   20.915
dbcsr_make_random_matrix             9  4.0   13.931   13.931   19.105   19.105
multiply_cannon                      9  6.0    0.002    0.002   14.588   14.588
multiply_cannon_loop                 9  7.0    0.003    0.003   14.095   14.095
multiply_cannon_multrec              9  8.0   14.091   14.091   14.092   14.092
dbcsr_finalize                      27  5.7    0.004    0.004    8.797    8.797
dbcsr_merge_all                     18  6.5    3.095    3.095    8.102    8.102
tree_to_linear_d                     9  7.0    3.126    3.126    3.126    3.126
mp_alltoall_d11v                    27  6.0    3.108    3.108    3.108    3.108
dbcsr_data_release                 975  7.6    2.377    2.377    2.377    2.377
make_m2s                            18  6.0    0.001    0.001    2.144    2.144
make_images                         18  7.0    0.694    0.694    2.078    2.078
-------------------------------------------------------------------------------

From /workspace/artifacts/dbcsr_32mpi.out:
-------------------------------------------------------------------------------
-                                                                             -
-                                T I M I N G                                  -
-                                                                             -
-------------------------------------------------------------------------------
SUBROUTINE                       CALLS  ASD         SELF TIME        TOTAL TIME
                               MAXIMUM       AVERAGE  MAXIMUM  AVERAGE  MAXIMUM
CP2K                                 1  1.0    0.003    0.005   24.117   24.118
lib_test                             1  2.0    0.000    0.000   24.087   24.107
dbcsr_run_tests                      3  3.0    0.000    0.001   24.085   24.106
test_multiplies_multiproc            3  4.0    0.000    0.001   22.961   23.038
dbcsr_multiply_generic               9  5.0    0.001    0.001   21.115   21.213
multiply_cannon                      9  6.0    0.002    0.003   18.972   19.463
multiply_cannon_loop                 9  7.0    0.003    0.004   18.569   18.993
multiply_cannon_multrec             72  8.0   15.705   16.619   15.706   16.620
mp_waitall_1                       576  9.2    3.242    4.173    3.242    4.173
multiply_cannon_metrocomm1          72  8.0    0.001    0.002    2.489    3.516
multiply_cannon_metrocomm3          72  8.0    0.000    0.001    0.363    1.387
dbcsr_make_random_matrix             9  4.0    0.867    0.955    1.086    1.166
mp_sum_l                           310  2.7    0.514    1.134    0.514    1.134
dbcsr_multiply_generic_mpsum_f       9  6.0    0.000    0.000    0.510    1.130
dbcsr_finalize                      27  5.7    0.000    0.001    0.833    1.051
dbcsr_merge_all                     18  6.5    0.132    0.178    0.736    0.929
make_m2s                            18  6.0    0.001    0.001    0.853    0.907
make_images                         18  7.0    0.026    0.027    0.850    0.904
dbcsr_data_release                 444  7.6    0.634    0.686    0.634    0.686
dbcsr_redistribute                   9  5.0    0.370    0.417    0.629    0.653
dbcsr_destroy                      111  5.9    0.005    0.051    0.544    0.628
dbcsr_data_copy_aa2                 18  7.5    0.433    0.588    0.433    0.588
make_images_data                    18  8.0    0.001    0.001    0.433    0.522
-------------------------------------------------------------------------------

Plot: name="dbcsr_timings_32omp", title="Timings of dbcsr with 32 OpenMP Threads", ylabel="time [s]"
PlotPoint: plot="dbcsr_timings_32omp", name="rest", label="rest", y=8.522000000000006, yerr=0.0
PlotPoint: plot="dbcsr_timings_32omp", name="dbcsr_redistribute", label="dbcsr_redistribute", y=53.542, yerr=0.0
PlotPoint: plot="dbcsr_timings_32omp", name="multiply_cannon_multrec", label="multiply_cannon_multrec", y=14.091, yerr=0.0
PlotPoint: plot="dbcsr_timings_32omp", name="dbcsr_make_random_matrix", label="dbcsr_make_random_matrix", y=13.931, yerr=0.0
PlotPoint: plot="dbcsr_timings_32omp", name="tree_to_linear_d", label="tree_to_linear_d", y=3.126, yerr=0.0
PlotPoint: plot="dbcsr_timings_32omp", name="mp_alltoall_d11v", label="mp_alltoall_d11v", y=3.108, yerr=0.0
PlotPoint: plot="dbcsr_timings_32omp", name="dbcsr_data_release", label="dbcsr_data_release", y=2.377, yerr=0.0
PlotPoint: plot="dbcsr_timings_32omp", name="mp_sum_l", label="mp_sum_l", y=0.0, yerr=0.0
PlotPoint: plot="dbcsr_timings_32omp", name="mp_waitall_1", label="mp_waitall_1", y=0.0, yerr=0.0

Plot: name="dbcsr_timings_32mpi", title="Timings of dbcsr with 32 MPI Ranks", ylabel="time [s]"
PlotPoint: plot="dbcsr_timings_32mpi", name="rest", label="rest", y=2.785, yerr=0.0
PlotPoint: plot="dbcsr_timings_32mpi", name="dbcsr_redistribute", label="dbcsr_redistribute", y=0.37, yerr=0.0
PlotPoint: plot="dbcsr_timings_32mpi", name="multiply_cannon_multrec", label="multiply_cannon_multrec", y=15.705, yerr=0.0
PlotPoint: plot="dbcsr_timings_32mpi", name="dbcsr_make_random_matrix", label="dbcsr_make_random_matrix", y=0.867, yerr=0.0
PlotPoint: plot="dbcsr_timings_32mpi", name="tree_to_linear_d", label="tree_to_linear_d", y=0.0, yerr=0.0
PlotPoint: plot="dbcsr_timings_32mpi", name="mp_alltoall_d11v", label="mp_alltoall_d11v", y=0.0, yerr=0.0
PlotPoint: plot="dbcsr_timings_32mpi", name="dbcsr_data_release", label="dbcsr_data_release", y=0.634, yerr=0.0
PlotPoint: plot="dbcsr_timings_32mpi", name="mp_sum_l", label="mp_sum_l", y=0.514, yerr=0.0
PlotPoint: plot="dbcsr_timings_32mpi", name="mp_waitall_1", label="mp_waitall_1", y=3.242, yerr=0.0

Running MQAE_single_node.inp with 1 threads and 32 ranks... done.
Running MQAE_single_node.inp with 32 threads and 1 ranks... done.
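The "rest" plot points in this log are numerically consistent with the average total time of the CP2K row minus the sum of the average self times of the other plotted routines. This relation is inferred from the numbers printed here, not taken from `plot_performance.py`, so the sketch below is only a cross-check under that assumption; the variable names are illustrative.

```python
# Values copied from the dbcsr_32mpi.out report and its PlotPoint lines.
total_avg = 24.117  # TOTAL TIME / AVERAGE of the CP2K row

plotted_self_avg = {
    "dbcsr_redistribute": 0.370,
    "multiply_cannon_multrec": 15.705,
    "dbcsr_make_random_matrix": 0.867,
    "tree_to_linear_d": 0.0,      # routine absent from the MPI report
    "mp_alltoall_d11v": 0.0,      # routine absent from the MPI report
    "dbcsr_data_release": 0.634,
    "mp_sum_l": 0.514,
    "mp_waitall_1": 3.242,
}

# Assumed relation: rest = total - sum of the individually plotted self times.
rest = total_avg - sum(plotted_self_avg.values())
print(f"rest = {rest:.3f} s")
```

The same arithmetic on the 32-thread OpenMP run (98.697 s total, 90.175 s in the six plotted routines) gives about 8.522 s, in line with the logged `y=8.522000000000006`.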
From /workspace/artifacts/MQAE_single_node_32omp.out:
-------------------------------------------------------------------------------
-                                                                             -
-                                T I M I N G                                  -
-                                                                             -
-------------------------------------------------------------------------------
SUBROUTINE                       CALLS  ASD         SELF TIME        TOTAL TIME
                               MAXIMUM       AVERAGE  MAXIMUM  AVERAGE  MAXIMUM
CP2K                                 1  1.0    0.047    0.047  140.012  140.012
qs_mol_dyn_low                       1  2.0    0.005    0.005  138.123  138.123
velocity_verlet                      5  3.0    0.004    0.004  112.555  112.555
qmmm_el_coupling                     6  3.8    0.000    0.000   63.941   63.941
qmmm_elec_with_gaussian              6  4.8    0.197    0.197   63.935   63.935
qmmm_elec_with_gaussian_low          6  5.8    0.000    0.000   62.297   62.297
qmmm_elec_gaussian_low_G             6  6.8   61.103   61.103   61.103   61.103
qs_forces                            6  3.8    0.001    0.001   55.878   55.878
qs_energies                          6  4.8    0.000    0.000   49.584   49.584
scf_env_do_scf                       6  5.8    0.000    0.000   45.813   45.813
rebuild_ks_matrix                   45  8.4    0.000    0.000   38.399   38.399
qs_ks_build_kohn_sham_matrix        45  9.4    0.007    0.007   38.399   38.399
scf_env_do_scf_inner_loop           39  6.8    0.003    0.003   38.337   38.337
qs_ks_update_qs_env                 45  7.8    0.000    0.000   32.832   32.832
pw_transfer                        966 11.9    0.070    0.070   22.966   22.966
fft_wrap_pw1pw2                    801 13.0    0.009    0.009   22.650   22.650
fft_wrap_pw1pw2_150                507 14.3    2.312    2.312   22.150   22.150
qs_vxc_create                       45 10.4    0.001    0.001   20.566   20.566
xc_vxc_pw_create                    45 11.4    4.188    4.188   20.565   20.565
pw_scatter_s                       429 15.4   10.777   10.777   10.777   10.777
qs_rho_update_rho                   45  7.9    0.000    0.000    9.787    9.787
calculate_rho_elec                  45  8.9    0.865    0.865    9.786    9.786
xc_rho_set_and_dset_create          45 12.4    0.233    0.233    9.597    9.597
qmmm_forces                          6  3.8    0.002    0.002    8.790    8.790
pw_integral_ab                    2539  7.4    8.759    8.759    8.759    8.759
fist_calc_energy_force               6  3.8    0.002    0.002    8.743    8.743
fft3d_s                            802 15.0    8.338    8.338    8.348    8.348
qmmm_forces_with_gaussian            6  4.8    0.145    0.145    8.308    8.308
force_nonbond                        6  4.8    7.549    7.549    7.549    7.549
init_scf_loop                        6  6.8    0.000    0.000    7.471    7.471
qs_ks_ddapc                         45 10.4    0.001    0.001    6.699    6.699
qmmm_force_with_gaussian_low         6  5.8    0.000    0.000    6.384    6.384
qs_ks_update_qs_env_forces           6  4.8    0.000    0.000    5.579    5.579
pw_poisson_solve                    51  9.9    2.431    2.431    5.466    5.466
qmmm_forces_gaussian_low_G           6  6.8    5.320    5.320    5.320    5.320
density_rs2pw                       45  9.9    0.002    0.002    4.541    4.541
grid_collocate_task_list            45  9.9    4.381    4.381    4.381    4.381
sum_up_and_integrate                45 10.4    0.264    0.264    4.249    4.249
cp_ddapc_apply_CD                   45 11.4    0.006    0.006    4.172    4.172
integrate_v_rspace                  45 11.4    0.013    0.013    3.985    3.985
-------------------------------------------------------------------------------

From /workspace/artifacts/MQAE_single_node_32mpi.out:
-------------------------------------------------------------------------------
-                                                                             -
-                                T I M I N G                                  -
-                                                                             -
-------------------------------------------------------------------------------
SUBROUTINE                       CALLS  ASD         SELF TIME        TOTAL TIME
                               MAXIMUM       AVERAGE  MAXIMUM  AVERAGE  MAXIMUM
CP2K                                 1  1.0    0.030    0.032   79.377   79.378
qs_mol_dyn_low                       1  2.0    0.005    0.005   77.848   77.940
qs_forces                            6  3.8    0.001    0.001   56.433   56.433
qs_energies                          6  4.8    0.001    0.001   53.757   53.757
scf_env_do_scf                       6  5.8    0.000    0.001   52.379   52.379
scf_env_do_scf_inner_loop          113  6.2    0.002    0.008   50.292   50.293
rebuild_ks_matrix                  119  8.1    0.000    0.001   36.854   36.874
qs_ks_build_kohn_sham_matrix       119  9.1    0.019    0.025   36.854   36.874
qs_ks_update_qs_env                119  7.3    0.001    0.001   34.623   34.642
velocity_verlet                      5  3.0    0.002    0.002   33.078   33.082
pw_transfer                       2446 11.8    0.247    0.283   22.885   23.407
fft_wrap_pw1pw2                   2059 12.8    0.031    0.035   22.184   22.845
fft_wrap_pw1pw2_150               1321 14.0    2.004    2.259   21.583   22.004
qs_vxc_create                      119 10.1    0.003    0.004   18.381   18.387
xc_vxc_pw_create                   119 11.1    0.403    0.552   18.378   18.383
fft3d_ps                          2059 14.8    9.993   10.824   16.575   17.664
qs_rho_update_rho                  119  7.3    0.001    0.001   14.788   14.789
calculate_rho_elec                 119  8.3    0.086    0.095   14.787   14.788
sum_up_and_integrate               119 10.1    0.078    0.085   13.696   13.723
integrate_v_rspace                 119 11.1    0.004    0.006   13.617   13.651
qmmm_forces                          6  3.8    0.002    0.003   11.903   11.904
qmmm_forces_with_gaussian            6  4.8    0.330    0.411   11.510   11.684
rs_pw_transfer                     988 11.5    0.014    0.019   10.011   10.549
xc_rho_set_and_dset_create         119 12.1    0.487    0.583    8.824    9.179
density_rs2pw                      119  9.3    0.010    0.011    8.678    9.072
qmmm_el_coupling                     6  3.8    0.000    0.000    8.398    8.482
qmmm_elec_with_gaussian              6  4.8    0.285    0.397    8.395    8.479
potential_pw2rs                    119 12.1    0.010    0.011    7.693    7.702
grid_collocate_task_list           119  9.3    5.879    6.297    5.879    6.297
mp_alltoall_z22v                  2059 16.8    3.945    6.225    3.945    6.225
grid_integrate_task_list           119 12.1    5.517    5.820    5.517    5.820
qmmm_force_with_gaussian_low         6  5.8    0.000    0.000    5.651    5.803
qmmm_forces_gaussian_low_G           6  6.8    4.616    4.747    4.616    4.747
mp_waitany                        4028 12.8    3.094    4.539    3.094    4.539
pw_restrict_s3                      18  5.8    2.021    2.059    4.393    4.458
rs_pw_transfer_PW2RS_150           125 13.9    2.206    2.341    4.348    4.427
yz_to_x                            964 15.3    0.969    1.126    2.927    4.418
x_to_yz                           1095 16.3    1.625    1.853    3.611    4.132
rs_pw_transfer_RS2PW_150           125 11.2    1.862    2.045    3.632    4.099
qmmm_elec_with_gaussian:spline       6  5.8    0.000    0.000    3.565    3.653
pw_prolongate_s3                    18  6.8    1.627    1.658    3.565    3.653
pw_integral_ab                    2761  7.7    2.994    3.063    3.287    3.515
qmmm_elec_with_gaussian_low          6  5.8    0.000    0.000    3.338    3.475
qs_scf_new_mos                     113  7.2    0.001    0.001    3.460    3.470
qs_scf_loop_do_ot                  113  8.2    0.001    0.001    3.460    3.469
ot_scf_mini                        113  9.2    0.002    0.002    3.309    3.318
dbcsr_multiply_generic            2588 12.3    0.095    0.112    3.107    3.174
mp_sum_dm3                          33  5.7    2.197    2.693    2.197    2.693
qs_ks_ddapc                        119 10.1    0.002    0.003    2.534    2.679
qmmm_elec_gaussian_low_G             6  6.8    2.422    2.538    2.422    2.538
qs_ks_update_qs_env_forces           6  4.8    0.000    0.000    2.240    2.241
pw_gather_p                        964 14.3    1.833    2.169    1.833    2.169
ot_mini                            113 10.2    0.001    0.001    2.099    2.112
init_scf_loop                        6  6.8    0.000    0.000    2.083    2.084
mp_waitall_1                    188862 16.2    1.744    1.938    1.744    1.938
mp_sum_d                          5818 12.2    0.908    1.821    0.908    1.821
pw_scatter_p                      1095 15.3    1.688    1.760    1.688    1.760
pw_derive                          732 12.5    1.504    1.721    1.504    1.721
qs_ot_get_derivative               113 11.2    0.001    0.001    1.658    1.668
-------------------------------------------------------------------------------

Plot: name="MQAE_single_node_timings_32omp", title="Timings of MQAE_single_node with 32 OpenMP Threads", ylabel="time [s]"
PlotPoint: plot="MQAE_single_node_timings_32omp", name="rest", label="rest", y=33.785, yerr=0.0
PlotPoint: plot="MQAE_single_node_timings_32omp", name="qmmm_elec_gaussian_low_G", label="qmmm_elec_gaussian_low_G", y=61.103, yerr=0.0
PlotPoint: plot="MQAE_single_node_timings_32omp", name="pw_scatter_s", label="pw_scatter_s", y=10.777, yerr=0.0
PlotPoint: plot="MQAE_single_node_timings_32omp", name="pw_integral_ab", label="pw_integral_ab", y=8.759, yerr=0.0
PlotPoint: plot="MQAE_single_node_timings_32omp", name="fft3d_s", label="fft3d_s", y=8.338, yerr=0.0
PlotPoint: plot="MQAE_single_node_timings_32omp", name="force_nonbond", label="force_nonbond", y=7.549, yerr=0.0
PlotPoint: plot="MQAE_single_node_timings_32omp", name="qmmm_forces_gaussian_low_G", label="qmmm_forces_gaussian_low_G", y=5.32, yerr=0.0
PlotPoint: plot="MQAE_single_node_timings_32omp", name="grid_collocate_task_list", label="grid_collocate_task_list", y=4.381, yerr=0.0
PlotPoint: plot="MQAE_single_node_timings_32omp", name="grid_integrate_task_list", label="grid_integrate_task_list", y=0.0, yerr=0.0
PlotPoint: plot="MQAE_single_node_timings_32omp", name="mp_alltoall_z22v", label="mp_alltoall_z22v", y=0.0, yerr=0.0
PlotPoint: plot="MQAE_single_node_timings_32omp", name="fft3d_ps", label="fft3d_ps", y=0.0, yerr=0.0

Plot: name="MQAE_single_node_timings_32mpi", title="Timings of MQAE_single_node with 32 MPI Ranks", ylabel="time [s]"
PlotPoint: plot="MQAE_single_node_timings_32mpi", name="rest", label="rest", y=44.010999999999996, yerr=0.0
PlotPoint: plot="MQAE_single_node_timings_32mpi", name="qmmm_elec_gaussian_low_G", label="qmmm_elec_gaussian_low_G", y=2.422, yerr=0.0
PlotPoint: plot="MQAE_single_node_timings_32mpi", name="pw_scatter_s", label="pw_scatter_s", y=0.0, yerr=0.0
PlotPoint: plot="MQAE_single_node_timings_32mpi", name="pw_integral_ab", label="pw_integral_ab", y=2.994, yerr=0.0
PlotPoint: plot="MQAE_single_node_timings_32mpi", name="fft3d_s", label="fft3d_s", y=0.0, yerr=0.0
PlotPoint: plot="MQAE_single_node_timings_32mpi", name="force_nonbond", label="force_nonbond", y=0.0, yerr=0.0
PlotPoint: plot="MQAE_single_node_timings_32mpi", name="qmmm_forces_gaussian_low_G", label="qmmm_forces_gaussian_low_G", y=4.616, yerr=0.0
PlotPoint: plot="MQAE_single_node_timings_32mpi", name="grid_collocate_task_list", label="grid_collocate_task_list", y=5.879, yerr=0.0
PlotPoint: plot="MQAE_single_node_timings_32mpi", name="grid_integrate_task_list", label="grid_integrate_task_list", y=5.517, yerr=0.0
PlotPoint: plot="MQAE_single_node_timings_32mpi", name="mp_alltoall_z22v", label="mp_alltoall_z22v", y=3.945, yerr=0.0
PlotPoint: plot="MQAE_single_node_timings_32mpi", name="fft3d_ps", label="fft3d_ps", y=9.993, yerr=0.0

Summary: Performance test works fine.
Status: OK
Uploading artifacts... done
EndDate: 2021-12-03 20:04:23+00:00
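The `PlotPoint:` records in this log follow a fixed `key="value"` layout, so they can be collected with a regular expression. The sketch below is an assumption about how a reader might post-process the log; `PLOTPOINT_RE` and `parse_plotpoints` are hypothetical names and do not come from the CP2K scripts.

```python
import re

# Matches one PlotPoint record; field order is taken from the lines in this log.
PLOTPOINT_RE = re.compile(
    r'PlotPoint: plot="(?P<plot>[^"]+)", name="(?P<name>[^"]+)", '
    r'label="(?P<label>[^"]+)", y=(?P<y>[-+0-9.eE]+), yerr=(?P<yerr>[-+0-9.eE]+)'
)


def parse_plotpoints(text):
    """Return {plot name: {point name: y value}} for every PlotPoint in text."""
    points = {}
    for match in PLOTPOINT_RE.finditer(text):
        points.setdefault(match["plot"], {})[match["name"]] = float(match["y"])
    return points


# Two records copied from the MQAE_single_node section of this log.
sample = (
    'PlotPoint: plot="MQAE_single_node_timings_32mpi", name="fft3d_ps", '
    'label="fft3d_ps", y=9.993, yerr=0.0 '
    'PlotPoint: plot="MQAE_single_node_timings_32omp", name="fft3d_s", '
    'label="fft3d_s", y=8.338, yerr=0.0'
)
points = parse_plotpoints(sample)
print(points)
```

Using `finditer` rather than line-by-line matching means the parser does not depend on each record sitting on its own line.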