StartDate: 2021-11-10 19:25:57+00:00
CpuId: 64x Intel Xeon W 2000 / D-2100 (Skylake / Cascade Lake) {Skylake}, 14nm
CommitSHA: 5c07750f97d53e738194bcd6f1cd8be811cfa537
CommitTime: 2021-11-10 20:22:00 +0100
CommitAuthor: Ole Schütt
CommitSubject: Prettify cp_lbfgs.F

Trying to pull image cp2k-toolchain-mpich... success :-)
Trying to pull image cp2k-perf-openmp... image not found.

#################### Building Image cp2k-perf-openmp ####################
Dockerfile: /tools/docker/Dockerfile.test_performance
Build-Args: TOOLCHAIN=gcr.io/cp2k-org-project/img_cp2k-toolchain-mpich-arch-b51:gittree-6d79758-buildargs-68b329d
Sending build context to Docker daemon 76.8kB
Step 1/9 : ARG TOOLCHAIN=cp2k/toolchain:latest
Step 2/9 : FROM ${TOOLCHAIN}
 ---> 7c1b5df9b9d6
Step 3/9 : WORKDIR /workspace
 ---> Running in 7d3869824e96
Removing intermediate container 7d3869824e96
 ---> 735b1d4109e5
Step 4/9 : COPY ./scripts/install_basics.sh .
 ---> cf2ab879611c
Step 5/9 : RUN ./install_basics.sh
 ---> Running in 5c784c13ae62
Installing Ubuntu packages... debconf: delaying package configuration, since apt-utils is not installed
Selecting previously unselected package libpopt0:amd64.
(Reading database ... 15383 files and directories currently installed.)
Preparing to unpack .../libpopt0_1.16-14_amd64.deb ...
Unpacking libpopt0:amd64 (1.16-14) ...
Selecting previously unselected package rsync.
Preparing to unpack .../rsync_3.1.3-8_amd64.deb ...
Unpacking rsync (3.1.3-8) ...
Setting up libpopt0:amd64 (1.16-14) ...
Setting up rsync (3.1.3-8) ...
invoke-rc.d: could not determine current runlevel
invoke-rc.d: policy-rc.d denied execution of start.
Processing triggers for libc-bin (2.31-0ubuntu9.2) ...
done.
Cloning cp2k repository... done.
Removing intermediate container 5c784c13ae62
 ---> 2ca29823daeb
Step 6/9 : COPY ./scripts/install_performance.sh .
 ---> 2bd7474541ac
Step 7/9 : RUN ./install_performance.sh "local"
 ---> Running in c02bc3a5ae3e
'./local.pdbg' -> '/opt/cp2k-toolchain/install/arch/local.pdbg'
'./local.psmp' -> '/opt/cp2k-toolchain/install/arch/local.psmp'
'./local.sdbg' -> '/opt/cp2k-toolchain/install/arch/local.sdbg'
'./local.ssmp' -> '/opt/cp2k-toolchain/install/arch/local.ssmp'
'./local_coverage.pdbg' -> '/opt/cp2k-toolchain/install/arch/local_coverage.pdbg'
'./local_static.psmp' -> '/opt/cp2k-toolchain/install/arch/local_static.psmp'
'./local_static.ssmp' -> '/opt/cp2k-toolchain/install/arch/local_static.ssmp'
'./local_warn.psmp' -> '/opt/cp2k-toolchain/install/arch/local_warn.psmp'
Warming cache by trying to compile cp2k... done.
Removing intermediate container c02bc3a5ae3e
 ---> 8df9830e0c3c
Step 8/9 : COPY ./scripts/ci_entrypoint.sh ./scripts/test_performance.sh ./scripts/plot_performance.py ./
 ---> a95b4a673189
Step 9/9 : CMD ["./ci_entrypoint.sh", "./test_performance.sh", "local"]
 ---> Running in 399d5b60edba
Removing intermediate container 399d5b60edba
 ---> 5e895eb9093d
Successfully built 5e895eb9093d
Successfully tagged gcr.io/cp2k-org-project/img_cp2k-perf-openmp-arch-b51:gittree-f70e0f5-buildargs-9178cc9
Pushing image cp2k-perf-openmp... done.

#################### Running Image cp2k-perf-openmp ####################

========== Fetching Git Commit ==========
CommitSHA: 5c07750f97d53e738194bcd6f1cd8be811cfa537
CommitTime: 2021-11-10 20:22:00 +0100
CommitAuthor: Ole Schütt
CommitSubject: Prettify cp_lbfgs.F

========== Running Test ==========

========== Compiling CP2K ==========
Compiling cp2k... done.

========== Running Performance Test ==========

Running H2O-64.inp with 1 threads and 32 ranks... done.
Running H2O-64.inp with 32 threads and 1 ranks... done.

From /workspace/artifacts/H2O-64_32omp.out:
-------------------------------------------------------------------------------
-                                                                             -
-                                 T I M I N G                                 -
-                                                                             -
-------------------------------------------------------------------------------
SUBROUTINE CALLS ASD SELF TIME TOTAL TIME
MAXIMUM AVERAGE MAXIMUM AVERAGE MAXIMUM
CP2K 1 1.0 0.037 0.037 167.811 167.811
qs_mol_dyn_low 1 2.0 0.004 0.004 166.960 166.960
qs_forces 11 3.9 0.002 0.002 166.902 166.902
qs_energies 11 4.9 0.001 0.001 155.976 155.976
scf_env_do_scf 11 5.9 0.001 0.001 123.129 123.129
velocity_verlet 10 3.0 0.002 0.002 117.222 117.222
scf_env_do_scf_inner_loop 108 6.5 0.010 0.010 83.546 83.546
init_scf_loop 11 6.9 0.000 0.000 39.368 39.368
prepare_preconditioner 11 7.9 0.000 0.000 35.213 35.213
make_preconditioner 11 8.9 0.000 0.000 35.213 35.213
rebuild_ks_matrix 119 8.3 0.001 0.001 34.207 34.207
qs_ks_build_kohn_sham_matrix 119 9.3 0.019 0.019 34.206 34.206
make_full_inverse_cholesky 11 9.9 0.000 0.000 33.275 33.275
qs_ks_update_qs_env 119 7.6 0.001 0.001 31.767 31.767
qs_rho_update_rho 119 7.7 0.001 0.001 29.447 29.447
calculate_rho_elec 119 8.7 1.563 1.563 29.446 29.446
qs_scf_new_mos 108 7.5 0.001 0.001 28.595 28.595
qs_scf_loop_do_ot 108 8.5 0.001 0.001 28.594 28.594
ot_scf_mini 108 9.5 0.003 0.003 26.587 26.587
dbcsr_multiply_generic 2286 12.5 0.185 0.185 24.344 24.344
grid_collocate_task_list 119 9.7 23.168 23.168 23.168 23.168
sum_up_and_integrate 119 10.3 0.396 0.396 21.901 21.901
integrate_v_rspace 119 11.3 0.572 0.572 21.505 21.505
cp_fm_cholesky_invert 11 10.9 19.443 19.443 19.443 19.443
grid_integrate_task_list 119 12.3 18.314 18.314 18.314 18.314
init_scf_run 11 5.9 0.001 0.001 16.684 16.684
scf_env_initial_rho_setup 11 6.9 0.001 0.001 16.683 16.683
wfi_extrapolate 11 7.9 0.001 0.001 15.838 15.838
ot_mini 108 10.5 0.001 0.001 15.756 15.756
cp_gemm 81 9.0 0.000 0.000 15.223 15.223
cp_gemm_cosma 81 10.0 15.222 15.222 15.222 15.222
make_m2s 4572 13.5 0.069 0.069 13.534 13.534
qs_energies_init_hamiltonians 11 5.9 0.000 0.000 11.019 11.019
qs_ot_get_derivative 108 11.5 0.002 0.002 8.305 8.305
pw_transfer 1439 11.6 0.099 0.099 8.115 8.115
fft_wrap_pw1pw2 1201 12.6 0.010 0.010 7.778 7.778
cp_fm_cholesky_decompose 22 10.9 7.593 7.593 7.593 7.593
ot_diis_step 108 11.5 0.006 0.006 7.447 7.447
make_images 4572 14.5 2.652 2.652 7.238 7.238
qs_ot_get_p 119 10.4 0.001 0.001 6.724 6.724
build_core_hamiltonian_matrix_ 11 4.9 0.001 0.001 6.686 6.686
fft_wrap_pw1pw2_140 487 13.2 0.674 0.674 6.604 6.604
dbcsr_make_dense_low 5837 15.5 0.108 0.108 6.475 6.475
dbcsr_complete_redistribute 329 12.2 3.002 3.002 6.440 6.440
make_dense_data 5837 16.5 5.683 5.683 6.343 6.343
apply_preconditioner_dbcsr 119 12.6 0.000 0.000 6.290 6.290
apply_single 119 13.6 0.001 0.001 6.290 6.290
dbcsr_copy 2102 12.0 0.301 0.301 6.175 6.175
multiply_cannon 2286 13.5 1.057 1.057 5.966 5.966
qs_env_update_s_mstruct 11 6.9 0.000 0.000 5.902 5.902
dbcsr_copy_into_existing 22 7.9 5.820 5.820 5.821 5.821
dbcsr_make_images_dense 3978 14.8 0.028 0.028 5.789 5.789
copy_dbcsr_to_fm 153 11.3 0.004 0.004 5.347 5.347
qs_create_task_list 11 7.9 0.000 0.000 5.331 5.331
generate_qs_task_list 11 8.9 3.629 3.629 5.331 5.331
qs_ot_p2m_diag 50 11.0 0.215 0.215 5.063 5.063
qs_energies_compute_matrix_w 11 5.9 0.000 0.000 4.968 4.968
calculate_w_matrix_ot 11 6.9 0.008 0.008 4.967 4.967
pw_poisson_solve 119 10.3 1.996 1.996 4.721 4.721
density_rs2pw 119 9.7 0.007 0.007 4.715 4.715
build_core_hamiltonian_matrix 11 6.9 0.001 0.001 4.515 4.515
cp_dbcsr_syevd 50 12.0 0.005 0.005 4.489 4.489
transfer_dbcsr_to_fm 11 10.9 0.000 0.000 4.421 4.421
cp_fm_diag_elpa 50 13.0 0.000 0.000 4.318 4.318
cp_fm_diag_elpa_base 50 14.0 4.264 4.264 4.317 4.317
multiply_cannon_loop 2286 14.5 0.058 0.058 4.311 4.311
multiply_cannon_multrec 2286 15.5 4.171 4.171 4.252 4.252
qs_ks_update_qs_env_forces 11 4.9 0.000 0.000 4.238 4.238
fft3d_s 1202 14.6 3.530 3.530 3.536 3.536
-------------------------------------------------------------------------------

From /workspace/artifacts/H2O-64_32mpi.out:
-------------------------------------------------------------------------------
-                                                                             -
-                                 T I M I N G                                 -
-                                                                             -
-------------------------------------------------------------------------------
SUBROUTINE CALLS ASD SELF TIME TOTAL TIME
MAXIMUM AVERAGE MAXIMUM AVERAGE MAXIMUM
CP2K 1 1.0 0.009 0.013 75.642 75.643
qs_mol_dyn_low 1 2.0 0.006 0.007 75.507 75.513
qs_forces 11 3.9 0.002 0.002 75.450 75.451
qs_energies 11 4.9 0.001 0.002 70.375 70.376
scf_env_do_scf 11 5.9 0.001 0.001 63.494 63.495
scf_env_do_scf_inner_loop 108 6.5 0.003 0.011 58.900 58.900
velocity_verlet 10 3.0 0.002 0.002 44.682 44.684
rebuild_ks_matrix 119 8.3 0.001 0.001 29.523 29.538
qs_ks_build_kohn_sham_matrix 119 9.3 0.023 0.024 29.523 29.537
qs_ks_update_qs_env 119 7.6 0.001 0.001 26.262 26.274
sum_up_and_integrate 119 10.3 0.052 0.055 22.961 22.984
integrate_v_rspace 119 11.3 0.005 0.005 22.909 22.932
qs_rho_update_rho 119 7.7 0.001 0.001 22.635 22.650
calculate_rho_elec 119 8.7 0.048 0.050 22.634 22.649
dbcsr_multiply_generic 2286 12.5 0.138 0.141 18.132 18.188
grid_integrate_task_list 119 12.3 16.224 16.672 16.224 16.672
grid_collocate_task_list 119 9.7 15.811 16.429 15.811 16.429
qs_scf_new_mos 108 7.5 0.001 0.001 14.729 14.750
qs_scf_loop_do_ot 108 8.5 0.001 0.001 14.728 14.749
ot_scf_mini 108 9.5 0.004 0.004 13.828 13.851
multiply_cannon 2286 13.5 0.238 0.244 12.112 12.352
multiply_cannon_loop 2286 14.5 0.232 0.249 10.990 11.186
mp_waitall_1 169478 16.3 9.356 9.662 9.356 9.662
ot_mini 108 10.5 0.001 0.002 8.146 8.167
rs_pw_transfer 974 11.9 0.018 0.019 7.179 8.140
density_rs2pw 119 9.7 0.009 0.010 6.166 7.130
pw_transfer 1439 11.6 0.145 0.153 6.328 6.402
multiply_cannon_metrocomm3 18288 15.5 0.084 0.089 6.082 6.361
fft_wrap_pw1pw2 1201 12.6 0.015 0.015 6.013 6.096
potential_pw2rs 119 12.3 0.010 0.011 5.592 5.601
fft_wrap_pw1pw2_140 487 13.2 0.604 0.632 5.236 5.412
init_scf_run 11 5.9 0.001 0.002 4.748 4.748
scf_env_initial_rho_setup 11 6.9 0.000 0.001 4.747 4.748
fft3d_ps 1201 14.6 2.481 2.598 4.493 4.595
init_scf_loop 11 6.9 0.001 0.001 4.576 4.576
wfi_extrapolate 11 7.9 0.001 0.001 4.318 4.319
make_m2s 4572 13.5 0.077 0.081 4.147 4.211
ot_diis_step 108 11.5 0.005 0.006 4.164 4.165
apply_preconditioner_dbcsr 119 12.6 0.000 0.001 4.073 4.112
apply_single 119 13.6 0.001 0.001 4.072 4.112
qs_ot_get_derivative 108 11.5 0.001 0.002 3.946 3.965
multiply_cannon_multrec 18288 15.5 3.632 3.822 3.651 3.841
qs_ks_update_qs_env_forces 11 4.9 0.000 0.000 3.499 3.508
make_images 4572 14.5 0.190 0.195 3.436 3.501
mp_waitany 9880 13.7 2.324 3.283 2.324 3.283
rs_pw_transfer_RS2PW_140 130 11.5 0.614 0.648 2.134 3.110
rs_pw_transfer_PW2RS_140 130 13.9 1.338 1.410 2.786 2.825
mp_alltoall_d11v 2130 13.8 1.420 1.983 1.420 1.983
qs_ot_get_p 119 10.4 0.001 0.002 1.876 1.919
make_images_data 4572 15.5 0.063 0.069 1.616 1.736
cp_gemm 81 9.0 0.000 0.000 1.678 1.682
cp_gemm_cosma 81 10.0 1.678 1.682 1.678 1.682
rs_gather_matrices 119 12.3 0.147 0.158 1.034 1.634
hybrid_alltoall_any 4725 16.4 0.129 0.435 1.442 1.554
build_core_hamiltonian_matrix_ 11 4.9 0.001 0.001 1.439 1.544
prepare_preconditioner 11 7.9 0.000 0.000 1.529 1.542
make_preconditioner 11 8.9 0.000 0.000 1.529 1.542
-------------------------------------------------------------------------------

Plot: name="H2O-64_timings_32omp", title="Timings of H2O-64 with 32 OpenMP Threads", ylabel="time [s]"
PlotPoint: plot="H2O-64_timings_32omp", name="rest", label="rest", y=79.89999999999999, yerr=0.0
PlotPoint: plot="H2O-64_timings_32omp", name="grid_collocate_task_list", label="grid_collocate_task_list", y=23.168, yerr=0.0
PlotPoint: plot="H2O-64_timings_32omp", name="cp_fm_cholesky_invert", label="cp_fm_cholesky_invert", y=19.443, yerr=0.0
PlotPoint: plot="H2O-64_timings_32omp", name="grid_integrate_task_list", label="grid_integrate_task_list", y=18.314, yerr=0.0
PlotPoint: plot="H2O-64_timings_32omp", name="cp_gemm_cosma", label="cp_gemm_cosma", y=15.222, yerr=0.0
PlotPoint: plot="H2O-64_timings_32omp", name="cp_fm_cholesky_decompose", label="cp_fm_cholesky_decompose", y=7.593, yerr=0.0
PlotPoint: plot="H2O-64_timings_32omp", name="multiply_cannon_multrec", label="multiply_cannon_multrec", y=4.171, yerr=0.0
PlotPoint: plot="H2O-64_timings_32omp", name="mp_waitall_1", label="mp_waitall_1", y=0.0, yerr=0.0
PlotPoint: plot="H2O-64_timings_32omp", name="fft3d_ps", label="fft3d_ps", y=0.0, yerr=0.0

Plot: name="H2O-64_timings_32mpi", title="Timings of H2O-64 with 32 MPI Ranks", ylabel="time [s]"
PlotPoint: plot="H2O-64_timings_32mpi", name="rest", label="rest", y=26.46, yerr=0.0
PlotPoint: plot="H2O-64_timings_32mpi", name="grid_collocate_task_list", label="grid_collocate_task_list", y=15.811, yerr=0.0
PlotPoint: plot="H2O-64_timings_32mpi", name="cp_fm_cholesky_invert", label="cp_fm_cholesky_invert", y=0.0, yerr=0.0
PlotPoint: plot="H2O-64_timings_32mpi", name="grid_integrate_task_list", label="grid_integrate_task_list", y=16.224, yerr=0.0
PlotPoint: plot="H2O-64_timings_32mpi", name="cp_gemm_cosma", label="cp_gemm_cosma", y=1.678, yerr=0.0
PlotPoint: plot="H2O-64_timings_32mpi", name="cp_fm_cholesky_decompose", label="cp_fm_cholesky_decompose", y=0.0, yerr=0.0
PlotPoint: plot="H2O-64_timings_32mpi", name="multiply_cannon_multrec", label="multiply_cannon_multrec", y=3.632, yerr=0.0
PlotPoint: plot="H2O-64_timings_32mpi", name="mp_waitall_1", label="mp_waitall_1", y=9.356, yerr=0.0
PlotPoint: plot="H2O-64_timings_32mpi", name="fft3d_ps", label="fft3d_ps", y=2.481, yerr=0.0

Running H2O-64_nonortho.inp with 1 threads and 32 ranks... done.
Running H2O-64_nonortho.inp with 32 threads and 1 ranks... done.

From /workspace/artifacts/H2O-64_nonortho_32omp.out:
-------------------------------------------------------------------------------
-                                                                             -
-                                 T I M I N G                                 -
-                                                                             -
-------------------------------------------------------------------------------
SUBROUTINE CALLS ASD SELF TIME TOTAL TIME
MAXIMUM AVERAGE MAXIMUM AVERAGE MAXIMUM
CP2K 1 1.0 0.038 0.038 218.236 218.236
qs_mol_dyn_low 1 2.0 0.004 0.004 217.436 217.436
qs_forces 11 3.9 0.002 0.002 217.380 217.380
qs_energies 11 4.9 0.001 0.001 202.963 202.963
scf_env_do_scf 11 5.9 0.001 0.001 166.073 166.073
velocity_verlet 10 3.0 0.002 0.002 147.498 147.498
scf_env_do_scf_inner_loop 96 6.5 0.009 0.009 123.631 123.631
rebuild_ks_matrix 107 8.3 0.001 0.001 62.356 62.356
qs_ks_build_kohn_sham_matrix 107 9.3 0.018 0.018 62.355 62.355
qs_ks_update_qs_env 107 7.6 0.001 0.001 56.141 56.141
qs_rho_update_rho 107 7.7 0.001 0.001 55.484 55.484
calculate_rho_elec 107 8.7 1.399 1.399 55.483 55.483
sum_up_and_integrate 107 10.3 0.359 0.359 51.368 51.368
integrate_v_rspace 107 11.3 0.476 0.476 51.009 51.009
grid_collocate_task_list 107 9.7 49.896 49.896 49.896 49.896
grid_integrate_task_list 107 12.3 48.228 48.228 48.228 48.228
init_scf_loop 11 6.9 0.000 0.000 42.224 42.224
prepare_preconditioner 11 7.9 0.000 0.000 34.921 34.921
make_preconditioner 11 8.9 0.000 0.000 34.921 34.921
make_full_inverse_cholesky 11 9.9 0.000 0.000 32.920 32.920
qs_scf_new_mos 96 7.5 0.001 0.001 24.557 24.557
qs_scf_loop_do_ot 96 8.5 0.001 0.001 24.556 24.556
ot_scf_mini 96 9.5 0.003 0.003 22.820 22.820
dbcsr_multiply_generic 1966 12.4 0.162 0.162 20.927 20.927
init_scf_run 11 5.9 0.001 0.001 19.835 19.835
scf_env_initial_rho_setup 11 6.9 0.001 0.001 19.834 19.834
cp_fm_cholesky_invert 11 10.9 19.689 19.689 19.689 19.689
wfi_extrapolate 11 7.9 0.001 0.001 18.713 18.713
cp_gemm 81 9.0 0.000 0.000 15.194 15.194
cp_gemm_cosma 81 10.0 15.194 15.194 15.194 15.194
ot_mini 96 10.5 0.001 0.001 13.494 13.494
qs_energies_init_hamiltonians 11 5.9 0.000 0.000 11.975 11.975
make_m2s 3932 13.4 0.058 0.058 11.593 11.593
qs_ks_update_qs_env_forces 11 4.9 0.000 0.000 7.866 7.866
qs_ot_get_derivative 96 11.5 0.001 0.001 7.274 7.274
cp_fm_cholesky_decompose 22 10.9 7.209 7.209 7.209 7.209
pw_transfer 1295 11.6 0.088 0.088 7.163 7.163
qs_env_update_s_mstruct 11 6.9 0.000 0.000 6.950 6.950
fft_wrap_pw1pw2 1081 12.6 0.009 0.009 6.860 6.860
build_core_hamiltonian_matrix_ 11 4.9 0.001 0.001 6.548 6.548
qs_create_task_list 11 7.9 0.000 0.000 6.398 6.398
generate_qs_task_list 11 8.9 4.723 4.723 6.398 6.398
dbcsr_complete_redistribute 317 12.2 2.929 2.929 6.286 6.286
make_images 3932 14.4 2.264 2.264 6.254 6.254
ot_diis_step 96 11.5 0.005 0.005 6.216 6.216
dbcsr_copy 1855 11.9 0.270 0.270 6.004 6.004
fft_wrap_pw1pw2_140 439 13.2 0.599 0.599 5.820 5.820
qs_ot_get_p 107 10.4 0.001 0.001 5.799 5.799
dbcsr_copy_into_existing 22 7.9 5.688 5.688 5.689 5.689
dbcsr_make_dense_low 4961 15.5 0.090 0.090 5.492 5.492
make_dense_data 4961 16.5 4.848 4.848 5.382 5.382
apply_preconditioner_dbcsr 107 12.6 0.000 0.000 5.376 5.376
apply_single 107 13.6 0.000 0.000 5.376 5.376
multiply_cannon 1966 13.4 0.865 0.865 5.192 5.192
copy_dbcsr_to_fm 147 11.2 0.004 0.004 5.110 5.110
dbcsr_make_images_dense 3386 14.7 0.024 0.024 4.912 4.912
qs_energies_compute_matrix_w 11 5.9 0.000 0.000 4.902 4.902
calculate_w_matrix_ot 11 6.9 0.008 0.008 4.902 4.902
build_core_hamiltonian_matrix 11 6.9 0.001 0.001 4.440 4.440
qs_ot_p2m_diag 44 11.0 0.195 0.195 4.423 4.423
-------------------------------------------------------------------------------

From /workspace/artifacts/H2O-64_nonortho_32mpi.out:
-------------------------------------------------------------------------------
-                                                                             -
-                                 T I M I N G                                 -
-                                                                             -
-------------------------------------------------------------------------------
SUBROUTINE CALLS ASD SELF TIME TOTAL TIME
MAXIMUM AVERAGE MAXIMUM AVERAGE MAXIMUM
CP2K 1 1.0 0.007 0.011 130.718 130.719
qs_mol_dyn_low 1 2.0 0.005 0.006 130.595 130.601
qs_forces 11 3.9 0.002 0.003 130.539 130.539
qs_energies 11 4.9 0.001 0.001 121.564 121.565
scf_env_do_scf 11 5.9 0.001 0.001 111.699 111.701
scf_env_do_scf_inner_loop 96 6.5 0.003 0.009 103.811 103.812
velocity_verlet 10 3.0 0.002 0.002 77.663 77.665
rebuild_ks_matrix 107 8.3 0.001 0.001 59.552 59.588
qs_ks_build_kohn_sham_matrix 107 9.3 0.021 0.025 59.551 59.587
sum_up_and_integrate 107 10.3 0.046 0.051 53.639 53.676
integrate_v_rspace 107 11.3 0.005 0.005 53.592 53.629
qs_ks_update_qs_env 107 7.6 0.001 0.001 52.400 52.434
qs_rho_update_rho 107 7.7 0.001 0.001 49.664 49.676
calculate_rho_elec 107 8.7 0.043 0.045 49.663 49.676
grid_integrate_task_list 107 12.3 46.825 47.526 46.825 47.526
grid_collocate_task_list 107 9.7 42.848 43.586 42.848 43.586
dbcsr_multiply_generic 1966 12.4 0.120 0.123 16.083 16.171
qs_scf_new_mos 96 7.5 0.001 0.001 12.806 12.830
qs_scf_loop_do_ot 96 8.5 0.001 0.001 12.806 12.830
ot_scf_mini 96 9.5 0.003 0.003 12.012 12.041
multiply_cannon 1966 13.4 0.205 0.210 10.820 11.080
multiply_cannon_loop 1966 14.4 0.205 0.213 9.843 10.077
rs_pw_transfer 878 11.9 0.016 0.017 7.332 8.738
mp_waitall_1 146670 16.2 8.345 8.725 8.345 8.725
init_scf_loop 11 6.9 0.000 0.001 7.870 7.871
init_scf_run 11 5.9 0.001 0.002 7.718 7.718
scf_env_initial_rho_setup 11 6.9 0.000 0.001 7.717 7.718
density_rs2pw 107 9.7 0.008 0.009 6.225 7.657
qs_ks_update_qs_env_forces 11 4.9 0.000 0.000 7.362 7.367
wfi_extrapolate 11 7.9 0.001 0.001 7.090 7.090
ot_mini 96 10.5 0.001 0.001 7.062 7.089
pw_transfer 1295 11.6 0.131 0.140 5.641 5.708
multiply_cannon_metrocomm3 15728 15.4 0.073 0.075 5.449 5.669
fft_wrap_pw1pw2 1081 12.6 0.013 0.014 5.361 5.436
potential_pw2rs 107 12.3 0.009 0.010 5.120 5.131
fft_wrap_pw1pw2_140 439 13.2 0.539 0.558 4.671 4.866
mp_waitany 8968 13.7 3.003 4.393 3.003 4.393
rs_pw_transfer_RS2PW_140 118 11.5 0.456 0.487 2.673 4.077
fft3d_ps 1081 14.6 2.209 2.353 3.989 4.052
make_m2s 3932 13.4 0.067 0.069 3.652 3.703
apply_preconditioner_dbcsr 107 12.6 0.000 0.000 3.649 3.682
apply_single 107 13.6 0.001 0.001 3.649 3.682
ot_diis_step 96 11.5 0.005 0.005 3.661 3.661
qs_ot_get_derivative 96 11.5 0.001 0.001 3.373 3.399
multiply_cannon_multrec 15728 15.4 3.282 3.382 3.298 3.398
mp_alltoall_d11v 1998 13.7 1.947 3.266 1.947 3.266
make_images 3932 14.4 0.167 0.172 3.031 3.086
rs_gather_matrices 107 12.3 0.130 0.147 1.589 2.871
rs_pw_transfer_PW2RS_140 118 13.9 1.263 1.307 2.654 2.701
-------------------------------------------------------------------------------

Plot: name="H2O-64_nonortho_timings_32omp", title="Timings of H2O-64_nonortho with 32 OpenMP Threads", ylabel="time [s]"
PlotPoint: plot="H2O-64_nonortho_timings_32omp", name="rest", label="rest", y=78.02000000000001, yerr=0.0
PlotPoint: plot="H2O-64_nonortho_timings_32omp", name="grid_collocate_task_list", label="grid_collocate_task_list", y=49.896, yerr=0.0
PlotPoint: plot="H2O-64_nonortho_timings_32omp", name="grid_integrate_task_list", label="grid_integrate_task_list", y=48.228, yerr=0.0
PlotPoint: plot="H2O-64_nonortho_timings_32omp", name="cp_fm_cholesky_invert", label="cp_fm_cholesky_invert", y=19.689, yerr=0.0
PlotPoint: plot="H2O-64_nonortho_timings_32omp", name="cp_gemm_cosma", label="cp_gemm_cosma", y=15.194, yerr=0.0
PlotPoint: plot="H2O-64_nonortho_timings_32omp", name="cp_fm_cholesky_decompose", label="cp_fm_cholesky_decompose", y=7.209, yerr=0.0
PlotPoint: plot="H2O-64_nonortho_timings_32omp", name="mp_waitall_1", label="mp_waitall_1", y=0.0, yerr=0.0
PlotPoint: plot="H2O-64_nonortho_timings_32omp", name="multiply_cannon_multrec", label="multiply_cannon_multrec", y=0.0, yerr=0.0
PlotPoint: plot="H2O-64_nonortho_timings_32omp", name="mp_waitany", label="mp_waitany", y=0.0, yerr=0.0

Plot: name="H2O-64_nonortho_timings_32mpi", title="Timings of H2O-64_nonortho with 32 MPI Ranks", ylabel="time [s]"
PlotPoint: plot="H2O-64_nonortho_timings_32mpi", name="rest", label="rest", y=26.414999999999992, yerr=0.0
PlotPoint: plot="H2O-64_nonortho_timings_32mpi", name="grid_collocate_task_list", label="grid_collocate_task_list", y=42.848, yerr=0.0
PlotPoint: plot="H2O-64_nonortho_timings_32mpi", name="grid_integrate_task_list", label="grid_integrate_task_list", y=46.825, yerr=0.0
PlotPoint: plot="H2O-64_nonortho_timings_32mpi", name="cp_fm_cholesky_invert", label="cp_fm_cholesky_invert", y=0.0, yerr=0.0
PlotPoint: plot="H2O-64_nonortho_timings_32mpi", name="cp_gemm_cosma", label="cp_gemm_cosma", y=0.0, yerr=0.0
PlotPoint: plot="H2O-64_nonortho_timings_32mpi", name="cp_fm_cholesky_decompose", label="cp_fm_cholesky_decompose", y=0.0, yerr=0.0
PlotPoint: plot="H2O-64_nonortho_timings_32mpi", name="mp_waitall_1", label="mp_waitall_1", y=8.345, yerr=0.0
PlotPoint: plot="H2O-64_nonortho_timings_32mpi", name="multiply_cannon_multrec", label="multiply_cannon_multrec", y=3.282, yerr=0.0
PlotPoint: plot="H2O-64_nonortho_timings_32mpi", name="mp_waitany", label="mp_waitany", y=3.003, yerr=0.0

Running H2O-hyb.inp with 1 threads and 32 ranks... done.
Running H2O-hyb.inp with 32 threads and 1 ranks... done.

From /workspace/artifacts/H2O-hyb_32omp.out:
-------------------------------------------------------------------------------
-                                                                             -
-                                 T I M I N G                                 -
-                                                                             -
-------------------------------------------------------------------------------
SUBROUTINE CALLS ASD SELF TIME TOTAL TIME
MAXIMUM AVERAGE MAXIMUM AVERAGE MAXIMUM
CP2K 1 1.0 0.417 0.417 276.515 276.515
qs_energies 1 2.0 0.000 0.000 275.138 275.138
scf_env_do_scf 1 3.0 0.000 0.000 272.657 272.657
qs_ks_update_qs_env 8 5.0 0.000 0.000 255.097 255.097
rebuild_ks_matrix 7 6.0 0.000 0.000 254.994 254.994
qs_ks_build_kohn_sham_matrix 7 7.0 0.002 0.002 254.994 254.994
hfx_ks_matrix 7 8.0 0.000 0.000 170.527 170.527
integrate_four_center 7 9.0 2.301 2.301 170.497 170.497
scf_env_do_scf_inner_loop 7 4.0 0.001 0.001 159.958 159.958
integrate_four_center_main 7 10.0 1.108 1.108 158.854 158.854
integrate_four_center_bin 457 11.0 157.746 157.746 157.746 157.746
init_scf_loop 1 4.0 0.000 0.000 112.685 112.685
cp_gemm 129 10.3 0.001 0.001 69.530 69.530
cp_gemm_cosma 129 11.3 69.529 69.529 69.529 69.529
admm_mo_calc_rho_aux 7 8.0 0.000 0.000 39.984 39.984
admm_fit_mo_coeffs 7 9.0 0.000 0.000 38.268 38.268
admm_mo_merge_derivs 7 8.0 0.000 0.000 35.580 35.580
merge_mo_derivs_diag 7 9.0 0.022 0.022 35.580 35.580
purify_mo_diag 7 10.0 0.001 0.001 22.743 22.743
fit_mo_coeffs 7 10.0 0.000 0.000 15.525 15.525
prepare_preconditioner 1 5.0 0.000 0.000 13.561 13.561
make_preconditioner 1 6.0 0.000 0.000 13.561 13.561
integrate_four_center_load 7 10.0 0.001 0.001 8.964 8.964
hfx_load_balance 1 11.0 0.002 0.002 8.962 8.962
arnoldi_normal_ev 11 9.3 0.002 0.002 8.120 8.120
estimate_cond_num 1 7.0 0.000 0.000 8.041 8.041
build_subspace 28 9.5 0.015 0.015 7.988 7.988
-------------------------------------------------------------------------------

From /workspace/artifacts/H2O-hyb_32mpi.out:
-------------------------------------------------------------------------------
-                                                                             -
-                                 T I M I N G                                 -
-                                                                             -
-------------------------------------------------------------------------------
SUBROUTINE CALLS ASD SELF TIME TOTAL TIME
MAXIMUM AVERAGE MAXIMUM AVERAGE MAXIMUM
CP2K 1 1.0 0.220 0.229 185.432 185.433
qs_energies 1 2.0 0.000 0.001 185.065 185.065
scf_env_do_scf 1 3.0 0.000 0.000 184.496 184.496
qs_ks_update_qs_env 8 5.0 0.000 0.000 181.514 181.515
rebuild_ks_matrix 7 6.0 0.000 0.000 181.501 181.501
qs_ks_build_kohn_sham_matrix 7 7.0 0.002 0.003 181.501 181.501
hfx_ks_matrix 7 8.0 0.000 0.000 169.254 169.255
integrate_four_center 7 9.0 0.099 0.418 169.238 169.238
integrate_four_center_main 7 10.0 0.005 0.005 155.495 158.840
integrate_four_center_bin 448 11.0 155.490 158.836 155.490 158.836
scf_env_do_scf_inner_loop 7 4.0 0.000 0.001 107.768 107.768
init_scf_loop 1 4.0 0.000 0.000 76.727 76.727
integrate_four_center_load 7 10.0 0.000 0.000 9.055 9.060
hfx_load_balance 1 11.0 0.001 0.001 9.055 9.060
mp_sync 70 11.3 3.828 5.916 3.828 5.916
cp_gemm 129 10.3 0.000 0.001 4.964 4.969
cp_gemm_cosma 129 11.3 4.963 4.969 4.963 4.969
hfx_load_balance_bin 1 12.0 4.411 4.521 4.411 4.521
hfx_load_balance_count 1 12.0 4.406 4.519 4.406 4.519
-------------------------------------------------------------------------------

Plot: name="H2O-hyb_timings_32omp", title="Timings of H2O-hyb with 32 OpenMP Threads", ylabel="time [s]"
PlotPoint: plot="H2O-hyb_timings_32omp", name="rest", label="rest", y=45.41399999999999, yerr=0.0
PlotPoint: plot="H2O-hyb_timings_32omp", name="integrate_four_center_bin", label="integrate_four_center_bin", y=157.746, yerr=0.0
PlotPoint: plot="H2O-hyb_timings_32omp", name="cp_gemm_cosma", label="cp_gemm_cosma", y=69.529, yerr=0.0
PlotPoint: plot="H2O-hyb_timings_32omp", name="integrate_four_center", label="integrate_four_center", y=2.301, yerr=0.0
PlotPoint: plot="H2O-hyb_timings_32omp", name="integrate_four_center_main", label="integrate_four_center_main", y=1.108, yerr=0.0
PlotPoint: plot="H2O-hyb_timings_32omp", name="CP2K", label="CP2K", y=0.417, yerr=0.0
PlotPoint: plot="H2O-hyb_timings_32omp", name="hfx_load_balance_bin", label="hfx_load_balance_bin", y=0.0, yerr=0.0
PlotPoint: plot="H2O-hyb_timings_32omp", name="mp_sync", label="mp_sync", y=0.0, yerr=0.0
PlotPoint: plot="H2O-hyb_timings_32omp", name="hfx_load_balance_count", label="hfx_load_balance_count", y=0.0, yerr=0.0

Plot: name="H2O-hyb_timings_32mpi", title="Timings of H2O-hyb with 32 MPI Ranks", ylabel="time [s]"
PlotPoint: plot="H2O-hyb_timings_32mpi", name="rest", label="rest", y=12.009999999999991, yerr=0.0
PlotPoint: plot="H2O-hyb_timings_32mpi", name="integrate_four_center_bin", label="integrate_four_center_bin", y=155.49, yerr=0.0
PlotPoint: plot="H2O-hyb_timings_32mpi", name="cp_gemm_cosma", label="cp_gemm_cosma", y=4.963, yerr=0.0
PlotPoint: plot="H2O-hyb_timings_32mpi", name="integrate_four_center", label="integrate_four_center", y=0.099, yerr=0.0
PlotPoint: plot="H2O-hyb_timings_32mpi", name="integrate_four_center_main", label="integrate_four_center_main", y=0.005, yerr=0.0
PlotPoint: plot="H2O-hyb_timings_32mpi", name="CP2K", label="CP2K", y=0.22, yerr=0.0
PlotPoint: plot="H2O-hyb_timings_32mpi", name="hfx_load_balance_bin", label="hfx_load_balance_bin", y=4.411, yerr=0.0
PlotPoint: plot="H2O-hyb_timings_32mpi", name="mp_sync", label="mp_sync", y=3.828, yerr=0.0
PlotPoint: plot="H2O-hyb_timings_32mpi", name="hfx_load_balance_count", label="hfx_load_balance_count", y=4.406, yerr=0.0

Running GW_PBE_4benzene.inp with 1 threads and 32 ranks... done.
Running GW_PBE_4benzene.inp with 32 threads and 1 ranks... done.

From /workspace/artifacts/GW_PBE_4benzene_32omp.out:
-------------------------------------------------------------------------------
-                                                                             -
-                                 T I M I N G                                 -
-                                                                             -
-------------------------------------------------------------------------------
SUBROUTINE CALLS ASD SELF TIME TOTAL TIME
MAXIMUM AVERAGE MAXIMUM AVERAGE MAXIMUM
CP2K 1 1.0 0.016 0.016 452.329 452.329
qs_energies 1 2.0 0.000 0.000 451.817 451.817
mp2_main 1 3.0 0.000 0.000 445.306 445.306
mp2_gpw_main 1 4.0 0.000 0.000 444.928 444.928
rpa_ri_compute_en 1 5.0 0.000 0.000 429.271 429.271
rpa_num_int 1 6.0 0.000 0.000 429.246 429.246
compute_mat_P_omega 1 7.0 0.002 0.002 210.264 210.264
compute_mat_P_omega_contract 10 8.0 12.934 12.934 208.709 208.709
dbcsr_t_total 2336 9.6 0.017 0.017 198.617 198.617
cp_gemm 105 8.4 0.000 0.000 188.364 188.364
cp_gemm_cosma 105 9.4 188.364 188.364 188.364 188.364
GW_matrix_operations 10 7.0 0.007 0.007 131.758 131.758
dbcsr_t_contract 787 11.0 47.944 47.944 124.008 124.008
compute_mat_P_omega_calc_M_occ 250 9.0 12.931 12.931 78.901 78.901
dbcsr_t_copy 1103 10.7 20.616 20.616 73.069 73.069
dbcsr_tas_total 1149 12.2 0.054 0.054 69.835 69.835
dbcsr_tas_multiply 807 12.1 0.004 0.004 68.321 68.321
rpa_num_int_RPA_matrix_operati 10 7.0 0.000 0.000 64.139 64.139
contract_P_omega_with_mat_L 10 8.0 0.000 0.000 61.878 61.878
dbcsr_multiply_generic 837 15.8 0.139 0.139 54.052 54.052
dbcsr_tas_dbcsr 807 14.1 0.003 0.003 53.490 53.490
compute_mat_P_omega_calc_M_vir 250 9.0 0.001 0.001 51.358 51.358
dbcsr_tas_mm_1N 524 15.1 0.002 0.002 41.012 41.012
multiply_cannon 837 16.8 13.454 13.454 39.312 39.312
dbcsr_tas_reserve_blocks_index 3261 13.7 7.198 7.198 27.631 27.631
dbcsr_tas_copy 574 11.4 17.131 17.131 24.848 24.848
multiply_cannon_loop 837 17.8 0.135 0.135 23.030 23.030
multiply_cannon_multrec 837 18.8 21.133 21.133 21.885 21.885
dbcsr_t_reserve_blocks_index 2280 12.5 1.283 1.283 21.229 21.229
dbcsr_t_reserve_blocks_index_a 2222 11.6 0.012 0.012 20.935 20.935
compute_QP_energies 1 7.0 0.000 0.000 20.878 20.878
compute_self_energy_cubic_gw 1 8.0 0.109 0.109 20.878 20.878
dbcsr_reserve_blocks 3717 14.7 19.722 19.722 20.104 20.104
compute_mat_P_omega_copy_M_occ 250 9.0 0.002 0.002 20.054 20.054
mp2_ri_gpw_compute_in 1 5.0 0.001 0.001 15.629 15.629
compute_mat_P_omega_copy_M_vir 250 9.0 0.002 0.002 14.657 14.657
dbcsr_t_copy_nocomm 251 12.0 11.414 11.414 13.865 13.865
compute_mat_P_omega_calc_P_t 250 9.0 0.001 0.001 12.612 12.612
make_m2s 1674 16.8 0.107 0.107 11.983 11.983
make_images 1674 17.8 5.622 5.622 11.419 11.419
dbcsr_tas_mm_2 251 15.0 0.001 0.001 10.706 10.706
cp_fm_cholesky_invert 10 8.0 9.079 9.079 9.079 9.079
-------------------------------------------------------------------------------

From /workspace/artifacts/GW_PBE_4benzene_32mpi.out:
-------------------------------------------------------------------------------
-                                                                             -
-                                 T I M I N G                                 -
-                                                                             -
-------------------------------------------------------------------------------
SUBROUTINE CALLS ASD SELF TIME TOTAL TIME
MAXIMUM AVERAGE MAXIMUM AVERAGE MAXIMUM
CP2K 1 1.0 0.007 0.012 60.696 60.698
qs_energies 1 2.0 0.000 0.001 60.538 60.544
mp2_main 1 3.0 0.000 0.001 59.055 59.061
mp2_gpw_main 1 4.0 0.000 0.001 58.998 59.004
rpa_ri_compute_en 1 5.0 0.000 0.000 56.878 56.884
rpa_num_int 1 6.0 0.001 0.001 56.870 56.876
dbcsr_t_total 2336 9.6 0.016 0.018 44.041 44.042
compute_mat_P_omega 1 7.0 0.001 0.007 43.103 43.115
compute_mat_P_omega_contract 10 8.0 0.801 0.831 42.800 42.808
dbcsr_t_contract 787 11.0 1.946 2.128 32.327 32.331
dbcsr_tas_total 1149 12.2 0.065 0.071 28.405 28.406
dbcsr_tas_multiply 807 12.1 0.003 0.003 28.265 28.267
dbcsr_tas_dbcsr 807 14.1 0.003 0.004 20.562 20.563
dbcsr_multiply_generic 837 15.8 0.074 0.079 17.152 18.261
compute_mat_P_omega_calc_M_occ 250 9.0 0.783 0.814 14.476 14.477
multiply_cannon 837 16.8 0.139 0.160 10.051 10.525
dbcsr_t_copy 1111 10.7 4.425 4.670 10.002 10.422
compute_mat_P_omega_calc_P_t 250 9.0 0.001 0.001 10.392 10.392
dbcsr_tas_mm_1N 524 15.1 0.003 0.003 9.100 9.981
multiply_cannon_loop 837 17.8 0.044 0.046 9.145 9.563
compute_mat_P_omega_calc_M_vir 250 9.0 0.001 0.001 9.115 9.115
cp_gemm 105 8.4 0.000 0.000 9.070 9.083
cp_gemm_cosma 105 9.4 9.070 9.083 9.070 9.083
dbcsr_tas_mm_2 251 15.0 0.002 0.002 7.828 7.829
mp_sync 8696 11.6 6.709 7.827 6.709 7.827
multiply_cannon_multrec 1386 17.8 7.085 7.426 7.349 7.669
make_m2s 1674 16.8 0.046 0.050 6.114 6.758
make_images 1674 17.8 0.249 0.259 6.028 6.671
GW_matrix_operations 10 7.0 0.001 0.002 5.836 5.844
compute_QP_energies 1 7.0 0.000 0.000 4.364 4.364
compute_self_energy_cubic_gw 1 8.0 0.005 0.005 4.362 4.364
dbcsr_t_communicate_buffer 1098 11.7 0.096 0.105 3.717 3.892
mp_waitall_2 3776 14.7 3.491 3.724 3.491 3.724
make_images_data 1674 18.8 0.038 0.041 3.203 3.347
contract_cubic_gw 21 9.0 0.000 0.000 3.265 3.265
hybrid_alltoall_any 1724 19.5 2.459 2.787 3.079 3.224
rpa_num_int_RPA_matrix_operati 10 7.0 0.000 0.000 3.185 3.192
dbcsr_t_reserve_blocks_index_a 2791 11.4 0.020 0.021 2.840 3.113
dbcsr_t_reserve_blocks_index 2849 12.4 0.109 0.116 2.834 3.111
contract_P_omega_with_mat_L 10 8.0 0.000 0.000 3.065 3.072
dbcsr_tas_reserve_blocks_index 3300 13.8 0.271 0.293 2.786 3.051
make_images_pack 1674 18.8 2.362 2.986 2.377 3.000
dbcsr_reserve_blocks 3785 14.7 2.504 2.756 2.546 2.798
mp2_ri_gpw_compute_in 1 5.0 0.001 0.001 2.118 2.118
convert_to_new_pgrid 2421 14.1 0.018 0.020 1.979 2.109
dbcsr_copy 3323 15.8 1.911 2.044 1.941 2.073
mp_waitall_1 26582 19.0 1.662 2.007 1.662 2.007
compute_mat_P_omega_copy_M_vir 250 9.0 0.002 0.002 1.812 1.817
dbcsr_add_anytype 909 13.7 1.064 1.127 1.650 1.720
compute_mat_P_omega_copy_M_occ 250 9.0 0.001 0.002 1.596 1.601
dbcsr_tas_replicate 396 14.1 0.830 0.913 1.408 1.465
scf_env_do_scf 1 3.0 0.000 0.000 1.421 1.422
scf_env_do_scf_inner_loop 17 4.0 0.001 0.002 1.421 1.421
mp_max_i 2057 9.6 1.043 1.331 1.043 1.331
-------------------------------------------------------------------------------

Plot: name="GW_PBE_4benzene_timings_32omp", title="Timings of GW_PBE_4benzene with 32 OpenMP Threads", ylabel="time [s]"
PlotPoint: plot="GW_PBE_4benzene_timings_32omp", name="rest", label="rest", y=154.55000000000007, yerr=0.0
PlotPoint: plot="GW_PBE_4benzene_timings_32omp", name="cp_gemm_cosma", label="cp_gemm_cosma", y=188.364, yerr=0.0
PlotPoint: plot="GW_PBE_4benzene_timings_32omp", name="dbcsr_t_contract", label="dbcsr_t_contract", y=47.944, yerr=0.0
PlotPoint: plot="GW_PBE_4benzene_timings_32omp", name="multiply_cannon_multrec", label="multiply_cannon_multrec", y=21.133, yerr=0.0
PlotPoint: plot="GW_PBE_4benzene_timings_32omp", name="dbcsr_t_copy", label="dbcsr_t_copy", y=20.616, yerr=0.0
PlotPoint: plot="GW_PBE_4benzene_timings_32omp", name="dbcsr_reserve_blocks", label="dbcsr_reserve_blocks", y=19.722, yerr=0.0
PlotPoint: plot="GW_PBE_4benzene_timings_32omp", name="mp_waitall_2", label="mp_waitall_2", y=0.0, yerr=0.0
PlotPoint: plot="GW_PBE_4benzene_timings_32omp", name="mp_sync", label="mp_sync", y=0.0, yerr=0.0

Plot: name="GW_PBE_4benzene_timings_32mpi", title="Timings of GW_PBE_4benzene with 32 MPI Ranks", ylabel="time [s]"
PlotPoint: plot="GW_PBE_4benzene_timings_32mpi", name="rest", label="rest", y=25.465999999999994, yerr=0.0
PlotPoint: plot="GW_PBE_4benzene_timings_32mpi", name="cp_gemm_cosma", label="cp_gemm_cosma", y=9.07, yerr=0.0
PlotPoint: plot="GW_PBE_4benzene_timings_32mpi", name="dbcsr_t_contract", label="dbcsr_t_contract", y=1.946, yerr=0.0
PlotPoint: plot="GW_PBE_4benzene_timings_32mpi", name="multiply_cannon_multrec", label="multiply_cannon_multrec", y=7.085, yerr=0.0
PlotPoint: plot="GW_PBE_4benzene_timings_32mpi", name="dbcsr_t_copy", label="dbcsr_t_copy", y=4.425, yerr=0.0
PlotPoint: plot="GW_PBE_4benzene_timings_32mpi", name="dbcsr_reserve_blocks", label="dbcsr_reserve_blocks", y=2.504, yerr=0.0
PlotPoint: plot="GW_PBE_4benzene_timings_32mpi", name="mp_waitall_2", label="mp_waitall_2", y=3.491, yerr=0.0
PlotPoint: plot="GW_PBE_4benzene_timings_32mpi", name="mp_sync",
label="mp_sync", y=6.709, yerr=0.0 Running diag_cu144_broy.inp with 1 threads and 32 ranks... done. Running diag_cu144_broy.inp with 32 threads and 1 ranks... done. From /workspace/artifacts/diag_cu144_broy_32omp.out: ------------------------------------------------------------------------------- - - - T I M I N G - - - ------------------------------------------------------------------------------- SUBROUTINE CALLS ASD SELF TIME TOTAL TIME MAXIMUM AVERAGE MAXIMUM AVERAGE MAXIMUM CP2K 1 1.0 0.123 0.123 206.787 206.787 qs_energies 1 2.0 0.000 0.000 204.876 204.876 scf_env_do_scf 1 3.0 0.000 0.000 194.031 194.031 scf_env_do_scf_inner_loop 15 4.0 0.002 0.002 194.031 194.031 qs_scf_new_mos 15 5.0 0.000 0.000 86.732 86.732 qs_ks_update_qs_env 15 5.0 0.000 0.000 75.508 75.508 rebuild_ks_matrix 15 6.0 0.000 0.000 75.125 75.125 qs_ks_build_kohn_sham_matrix 15 7.0 0.003 0.003 75.125 75.125 eigensolver 15 6.0 0.002 0.002 72.244 72.244 cp_fm_diag_elpa 15 7.0 0.000 0.000 54.999 54.999 cp_fm_diag_elpa_base 15 8.0 50.134 50.134 54.998 54.998 qs_vxc_create 15 8.0 0.041 0.041 50.588 50.588 calculate_dispersion_nonloc 15 9.0 9.914 9.914 44.041 44.041 pw_transfer 1191 9.8 0.105 0.105 30.840 30.840 fft_wrap_pw1pw2 1086 10.9 0.014 0.014 30.493 30.493 qs_rho_update_rho 16 5.0 0.000 0.000 25.195 25.195 calculate_rho_elec 16 6.0 0.349 0.349 25.194 25.194 grid_collocate_task_list 16 7.0 23.448 23.448 23.448 23.448 fft_wrap_pw1pw2_150 765 12.0 3.589 3.589 23.150 23.150 sum_up_and_integrate 15 8.0 0.084 0.084 22.788 22.788 integrate_v_rspace 15 9.0 0.038 0.038 22.704 22.704 grid_integrate_task_list 15 10.0 21.987 21.987 21.987 21.987 fft3d_s 1087 12.8 13.134 13.134 13.146 13.146 cp_fm_cholesky_restore 45 7.0 12.656 12.656 12.656 12.656 pw_scatter_s 585 13.0 11.699 11.699 11.699 11.699 copy_dbcsr_to_fm 16 5.9 0.001 0.001 11.421 11.421 dbcsr_complete_redistribute 46 8.3 3.646 3.646 10.060 10.060 cp_fm_upper_to_full 30 8.0 9.451 9.451 9.451 9.451 vdW_energy 15 10.0 8.456 8.456 8.456 8.456 
 gspace_mixing                       14  5.0    0.277    0.277    7.905    7.905
 broyden_mixing                      14  6.0    7.109    7.109    7.109    7.109
 fft_wrap_pw1pw2_200                197 11.5    0.413    0.413    7.067    7.067
 xc_vxc_pw_create                    15  9.0    1.608    1.608    6.507    6.507
 init_scf_run                         1  3.0    0.000    0.000    5.071    5.071
 qs_energies_init_hamiltonians        1  3.0    0.000    0.000    4.750    4.750
 dbcsr_finalize                     159  9.9    0.024    0.024    4.323    4.323
 dbcsr_merge_all                     91 11.1    0.090    0.090    4.161    4.161
 -------------------------------------------------------------------------------

From /workspace/artifacts/diag_cu144_broy_32mpi.out:
 -------------------------------------------------------------------------------
 -                                                                             -
 -                                T I M I N G                                  -
 -                                                                             -
 -------------------------------------------------------------------------------
 SUBROUTINE                       CALLS  ASD         SELF TIME        TOTAL TIME
                                MAXIMUM       AVERAGE  MAXIMUM  AVERAGE  MAXIMUM
 CP2K                                 1  1.0    0.017    0.019   93.957   93.958
 qs_energies                          1  2.0    0.001    0.001   93.558   93.558
 scf_env_do_scf                       1  3.0    0.000    0.000   88.295   88.296
 scf_env_do_scf_inner_loop           15  4.0    0.002    0.003   88.295   88.296
 qs_ks_update_qs_env                 15  5.0    0.000    0.000   43.676   43.721
 rebuild_ks_matrix                   15  6.0    0.000    0.000   43.621   43.665
 qs_ks_build_kohn_sham_matrix        15  7.0    0.005    0.005   43.620   43.665
 sum_up_and_integrate                15  8.0    0.016    0.018   23.891   23.955
 integrate_v_rspace                  15  9.0    0.001    0.001   23.875   23.938
 qs_rho_update_rho                   16  5.0    0.000    0.000   23.727   23.730
 calculate_rho_elec                  16  6.0    0.012    0.012   23.726   23.730
 grid_integrate_task_list            15 10.0   21.872   22.761   21.872   22.761
 grid_collocate_task_list            16  7.0   21.579   22.411   21.579   22.411
 qs_scf_new_mos                      15  5.0    0.001    0.001   21.116   21.527
 eigensolver                         15  6.0    0.002    0.003   19.267   19.280
 qs_vxc_create                       15  8.0    0.001    0.002   19.082   19.102
 calculate_dispersion_nonloc         15  9.0    1.490    1.562   15.502   15.517
 pw_transfer                       1191  9.8    0.146    0.153   15.351   15.482
 fft_wrap_pw1pw2                   1086 10.9    0.023    0.025   15.021   15.155
 cp_fm_diag_elpa                     15  7.0    0.000    0.000   13.930   13.938
 cp_fm_diag_elpa_base                15  8.0   13.626   13.680   13.923   13.925
 fft3d_ps                          1086 12.9    6.633    6.874   11.495   11.793
 fft_wrap_pw1pw2_150                765 12.0    0.802    0.862   10.140   10.177
 cp_fm_cholesky_restore              45  7.0    5.056    5.122    5.056    5.122
 fft_wrap_pw1pw2_200                197 11.5    0.445    0.465    4.691    4.790
 xc_vxc_pw_create                    15  9.0    0.069    0.096    3.579    3.593
 mp_alltoall_z22v                  1086 14.9    2.990    3.404    2.990    3.404
 qs_energies_init_hamiltonians        1  3.0    0.001    0.001    3.266    3.266
 build_core_hamiltonian_matrix        1  4.0    0.000    0.000    2.806    3.075
 x_to_yz                            585 14.0    1.165    1.212    2.729    2.870
 rs_pw_transfer                     158  9.4    0.003    0.003    1.919    2.424
 vdW_energy                          15 10.0    2.299    2.404    2.299    2.404
 yz_to_x                            501 13.7    0.667    0.814    2.093    2.335
 density_rs2pw                       16  7.0    0.002    0.002    1.936    2.202
 build_core_ppnl                      1  5.0    1.882    2.062    1.882    2.062
 -------------------------------------------------------------------------------

Plot: name="diag_cu144_broy_timings_32omp", title="Timings of diag_cu144_broy with 32 OpenMP Threads", ylabel="time [s]"
PlotPoint: plot="diag_cu144_broy_timings_32omp", name="rest", label="rest", y=85.42800000000001, yerr=0.0
PlotPoint: plot="diag_cu144_broy_timings_32omp", name="cp_fm_diag_elpa_base", label="cp_fm_diag_elpa_base", y=50.134, yerr=0.0
PlotPoint: plot="diag_cu144_broy_timings_32omp", name="grid_collocate_task_list", label="grid_collocate_task_list", y=23.448, yerr=0.0
PlotPoint: plot="diag_cu144_broy_timings_32omp", name="grid_integrate_task_list", label="grid_integrate_task_list", y=21.987, yerr=0.0
PlotPoint: plot="diag_cu144_broy_timings_32omp", name="fft3d_s", label="fft3d_s", y=13.134, yerr=0.0
PlotPoint: plot="diag_cu144_broy_timings_32omp", name="cp_fm_cholesky_restore", label="cp_fm_cholesky_restore", y=12.656, yerr=0.0
PlotPoint: plot="diag_cu144_broy_timings_32omp", name="fft3d_ps", label="fft3d_ps", y=0.0, yerr=0.0

Plot: name="diag_cu144_broy_timings_32mpi", title="Timings of diag_cu144_broy with 32 MPI Ranks", ylabel="time [s]"
PlotPoint: plot="diag_cu144_broy_timings_32mpi", name="rest", label="rest", y=25.191000000000003, yerr=0.0
PlotPoint: plot="diag_cu144_broy_timings_32mpi", name="cp_fm_diag_elpa_base", label="cp_fm_diag_elpa_base", y=13.626, yerr=0.0
PlotPoint: plot="diag_cu144_broy_timings_32mpi", name="grid_collocate_task_list", label="grid_collocate_task_list", y=21.579, yerr=0.0
PlotPoint: plot="diag_cu144_broy_timings_32mpi", name="grid_integrate_task_list", label="grid_integrate_task_list", y=21.872, yerr=0.0
PlotPoint: plot="diag_cu144_broy_timings_32mpi", name="fft3d_s", label="fft3d_s", y=0.0, yerr=0.0
PlotPoint: plot="diag_cu144_broy_timings_32mpi", name="cp_fm_cholesky_restore", label="cp_fm_cholesky_restore", y=5.056, yerr=0.0
PlotPoint: plot="diag_cu144_broy_timings_32mpi", name="fft3d_ps", label="fft3d_ps", y=6.633, yerr=0.0

Running bench_dftb.inp with 1 threads and 32 ranks... done.
Running bench_dftb.inp with 32 threads and 1 ranks... done.

From /workspace/artifacts/bench_dftb_32omp.out:
 -------------------------------------------------------------------------------
 -                                                                             -
 -                                T I M I N G                                  -
 -                                                                             -
 -------------------------------------------------------------------------------
 SUBROUTINE                       CALLS  ASD         SELF TIME        TOTAL TIME
                                MAXIMUM       AVERAGE  MAXIMUM  AVERAGE  MAXIMUM
 CP2K                                 1  1.0    0.091    0.091  342.603  342.603
 qs_energies                          1  2.0    0.000    0.000  342.434  342.434
 ls_scf                               1  3.0    0.000    0.000  340.520  340.520
 ls_scf_main                          1  4.0    0.002    0.002  323.581  323.581
 density_matrix_trs4                 11  5.0    0.012    0.012  186.323  186.323
 ls_scf_dm_to_ks                     11  5.0    0.000    0.000  130.326  130.326
 matrix_ls_to_qs                     11  6.0    0.000    0.000  125.948  125.948
 dbcsr_multiply_generic             185  6.1    0.501    0.501  119.017  119.017
 multiply_cannon                    185  7.1    3.343    3.343   79.625   79.625
 dbcsr_copy_into_existing            11  7.0   74.284   74.284   74.284   74.284
 multiply_cannon_loop               185  8.1    0.419    0.419   57.042   57.042
 dbcsr_complete_redistribute         23  7.5   40.207   40.207   56.514   56.514
 multiply_cannon_multrec            185  9.1   54.560   54.560   54.649   54.649
 matrix_decluster                    11  7.0    0.000    0.000   51.663   51.663
 arnoldi_extremal                    12  6.1    0.000    0.000   48.080   48.080
 arnoldi_normal_ev                   12  7.1    0.030    0.030   48.080   48.080
 build_subspace                      23  8.1    0.139    0.139   47.410   47.410
 dbcsr_matrix_vector_mult           652  9.0    0.270    0.270   37.040   37.040
 dbcsr_matrix_vector_mult_local     652 10.0   35.443   35.443   35.452   35.452
 make_m2s                           370  7.1    0.032    0.032   32.520   32.520
 make_images                        370  8.1    7.879    7.879   29.747   29.747
 dbcsr_finalize                     646  7.5    0.217    0.217   22.175   22.175
 dbcsr_merge_all                    597  8.5    3.662    3.662   19.992   19.992
 setup_rec_index_2d                 370  8.1   18.951   18.951   18.951   18.951
 dbcsr_sort_indices                1103  9.9   17.674   17.674   17.674   17.674
 ls_scf_init_scf                      1  4.0    0.000    0.000   15.739   15.739
 ls_scf_init_matrix_S                 1  5.0    0.000    0.000   15.289   15.289
 quick_finalize                     395 10.0    0.573    0.573   15.074   15.074
 matrix_sqrt_Newton_Schulz            1  6.0    0.001    0.001   14.313   14.313
 dbcsr_special_finalize             370  9.1    0.004    0.004   13.897   13.897
 tree_to_linear_d                   110  9.4   13.853   13.853   13.853   13.853
 dbcsr_dot_sd                       144  6.3    9.462    9.462    9.463    9.463
 dbcsr_frobenius_norm               142  6.1    8.270    8.270    8.273    8.273
 dbcsr_new_transposed                 2  7.0    0.154    0.154    7.770    7.770
 make_images_data                   370  9.1    0.012    0.012    7.769    7.769
 dbcsr_redistribute                   2  8.0    7.501    7.501    7.578    7.578
 matrix_qs_to_ls                     12  5.1    0.000    0.000    7.208    7.208
 matrix_cluster                      12  6.1    0.000    0.000    7.208    7.208
 -------------------------------------------------------------------------------

From /workspace/artifacts/bench_dftb_32mpi.out:
 -------------------------------------------------------------------------------
 -                                                                             -
 -                                T I M I N G                                  -
 -                                                                             -
 -------------------------------------------------------------------------------
 SUBROUTINE                       CALLS  ASD         SELF TIME        TOTAL TIME
                                MAXIMUM       AVERAGE  MAXIMUM  AVERAGE  MAXIMUM
 CP2K                                 1  1.0    0.009    0.011  112.097  112.098
 qs_energies                          1  2.0    0.000    0.000  111.992  111.992
 ls_scf                               1  3.0    0.000    0.000  111.911  111.912
 ls_scf_main                          1  4.0    0.001    0.004  107.419  107.420
 density_matrix_trs4                 11  5.0    0.010    0.015  103.221  103.303
 dbcsr_multiply_generic             185  6.1    0.085    0.091   97.137   97.534
 multiply_cannon                    185  7.1    0.052    0.055   81.545   82.933
 multiply_cannon_loop               185  8.1    0.285    0.292   77.201   79.204
 multiply_cannon_multrec           1480  9.1   50.165   53.538   50.760   54.118
 mp_waitall_1                     11936 10.3   23.686   26.486   23.686   26.486
 multiply_cannon_metrocomm3        1480  9.1    0.024    0.025   14.113   18.803
 make_m2s                           370  7.1    0.039    0.045   10.936   11.016
 make_images                        370  8.1    0.730    0.763   10.802   10.883
 multiply_cannon_metrocomm1        1480  9.1    0.013    0.016    5.465    7.954
 calculate_norms                   2960  9.1    6.480    6.761    6.480    6.761
 make_images_data                   370  9.1    0.015    0.016    4.600    4.970
 arnoldi_extremal                    12  6.1    0.001    0.001    4.388    4.403
 arnoldi_normal_ev                   12  7.1    0.002    0.008    4.387    4.402
 build_subspace                      23  8.1    0.049    0.063    4.250    4.254
 hybrid_alltoall_any                393  9.9    0.398    2.055    3.855    4.222
 mp_sum_l                          1039  5.9    3.068    4.010    3.068    4.010
 ls_scf_dm_to_ks                     11  5.0    0.000    0.000    3.618    3.685
 dbcsr_matrix_vector_mult           652  9.0    0.022    0.099    3.557    3.659
 ls_scf_init_scf                      1  4.0    0.000    0.000    3.453    3.454
 ls_scf_init_matrix_S                 1  5.0    0.000    0.000    3.413    3.422
 dbcsr_complete_redistribute         23  7.5    1.998    2.136    3.289    3.387
 matrix_ls_to_qs                     11  6.0    0.000    0.000    3.223    3.317
 make_images_pack                   370  9.1    2.964    3.298    2.971    3.306
 matrix_sqrt_Newton_Schulz            1  6.0    0.001    0.001    3.127    3.131
 matrix_decluster                    11  7.0    0.000    0.000    2.961    3.057
 dbcsr_matrix_vector_mult_local     652 10.0    2.780    2.928    2.786    2.934
 dbcsr_multiply_generic_mpsum_f     137  7.1    0.001    0.001    2.066    2.847
 buffer_matrices_ensure_size        370  8.1    2.626    2.778    2.626    2.778
 dbcsr_add_d                        280  6.0    0.002    0.002    2.569    2.679
 dbcsr_add_anytype                  280  7.0    1.418    1.514    2.567    2.678
 dbcsr_finalize                     646  7.5    0.016    0.017    2.334    2.474
 -------------------------------------------------------------------------------

Plot: name="bench_dftb_timings_32omp", title="Timings of bench_dftb with 32 OpenMP Threads", ylabel="time [s]"
PlotPoint: plot="bench_dftb_timings_32omp", name="rest", label="rest", y=119.15800000000004, yerr=0.0
PlotPoint: plot="bench_dftb_timings_32omp", name="dbcsr_copy_into_existing", label="dbcsr_copy_into_existing", y=74.284, yerr=0.0
PlotPoint: plot="bench_dftb_timings_32omp", name="multiply_cannon_multrec", label="multiply_cannon_multrec", y=54.56, yerr=0.0
PlotPoint: plot="bench_dftb_timings_32omp", name="dbcsr_complete_redistribute", label="dbcsr_complete_redistribute", y=40.207, yerr=0.0
PlotPoint: plot="bench_dftb_timings_32omp", name="dbcsr_matrix_vector_mult_local", label="dbcsr_matrix_vector_mult_local", y=35.443, yerr=0.0
PlotPoint: plot="bench_dftb_timings_32omp", name="setup_rec_index_2d", label="setup_rec_index_2d", y=18.951, yerr=0.0
PlotPoint: plot="bench_dftb_timings_32omp", name="make_images_pack", label="make_images_pack", y=0.0, yerr=0.0
PlotPoint: plot="bench_dftb_timings_32omp", name="calculate_norms", label="calculate_norms", y=0.0, yerr=0.0
PlotPoint: plot="bench_dftb_timings_32omp", name="mp_sum_l", label="mp_sum_l", y=0.0, yerr=0.0
PlotPoint: plot="bench_dftb_timings_32omp", name="mp_waitall_1", label="mp_waitall_1", y=0.0, yerr=0.0

Plot: name="bench_dftb_timings_32mpi", title="Timings of bench_dftb with 32 MPI Ranks", ylabel="time [s]"
PlotPoint: plot="bench_dftb_timings_32mpi", name="rest", label="rest", y=20.956000000000003, yerr=0.0
PlotPoint: plot="bench_dftb_timings_32mpi", name="dbcsr_copy_into_existing", label="dbcsr_copy_into_existing", y=0.0, yerr=0.0
PlotPoint: plot="bench_dftb_timings_32mpi", name="multiply_cannon_multrec", label="multiply_cannon_multrec", y=50.165, yerr=0.0
PlotPoint: plot="bench_dftb_timings_32mpi", name="dbcsr_complete_redistribute", label="dbcsr_complete_redistribute", y=1.998, yerr=0.0
PlotPoint: plot="bench_dftb_timings_32mpi", name="dbcsr_matrix_vector_mult_local", label="dbcsr_matrix_vector_mult_local", y=2.78, yerr=0.0
PlotPoint: plot="bench_dftb_timings_32mpi", name="setup_rec_index_2d", label="setup_rec_index_2d", y=0.0, yerr=0.0
PlotPoint: plot="bench_dftb_timings_32mpi", name="make_images_pack", label="make_images_pack", y=2.964, yerr=0.0
PlotPoint: plot="bench_dftb_timings_32mpi", name="calculate_norms", label="calculate_norms", y=6.48, yerr=0.0
PlotPoint: plot="bench_dftb_timings_32mpi", name="mp_sum_l", label="mp_sum_l", y=3.068, yerr=0.0
PlotPoint: plot="bench_dftb_timings_32mpi", name="mp_waitall_1", label="mp_waitall_1", y=23.686, yerr=0.0

Running dbcsr.inp with 1 threads and 32 ranks... done.
Running dbcsr.inp with 32 threads and 1 ranks... done.
From /workspace/artifacts/dbcsr_32omp.out:
 -------------------------------------------------------------------------------
 -                                                                             -
 -                                T I M I N G                                  -
 -                                                                             -
 -------------------------------------------------------------------------------
 SUBROUTINE                       CALLS  ASD         SELF TIME        TOTAL TIME
                                MAXIMUM       AVERAGE  MAXIMUM  AVERAGE  MAXIMUM
 CP2K                                 1  1.0    0.007    0.007  111.152  111.152
 lib_test                             1  2.0    0.000    0.000  111.144  111.144
 dbcsr_run_tests                      3  3.0    0.003    0.003  111.144  111.144
 test_multiplies_multiproc            3  4.0    0.001    0.001   91.202   91.202
 dbcsr_redistribute                   9  5.0   59.909   59.909   63.621   63.621
 dbcsr_multiply_generic               9  5.0    0.001    0.001   25.334   25.334
 dbcsr_make_random_matrix             9  4.0   14.433   14.433   19.842   19.842
 multiply_cannon                      9  6.0    0.002    0.002   18.465   18.465
 multiply_cannon_loop                 9  7.0    0.004    0.004   17.936   17.936
 multiply_cannon_multrec              9  8.0   17.932   17.932   17.932   17.932
 dbcsr_finalize                      27  5.7    0.005    0.005    9.252    9.252
 dbcsr_merge_all                     18  6.5    3.223    3.223    8.451    8.451
 mp_alltoall_d11v                    27  6.0    3.342    3.342    3.342    3.342
 tree_to_linear_d                     9  7.0    3.258    3.258    3.258    3.258
 dbcsr_data_release                 975  7.6    2.677    2.677    2.677    2.677
 make_m2s                            18  6.0    0.001    0.001    2.396    2.396
 make_images                         18  7.0    0.734    0.734    2.315    2.315
 -------------------------------------------------------------------------------

From /workspace/artifacts/dbcsr_32mpi.out:
 -------------------------------------------------------------------------------
 -                                                                             -
 -                                T I M I N G                                  -
 -                                                                             -
 -------------------------------------------------------------------------------
 SUBROUTINE                       CALLS  ASD         SELF TIME        TOTAL TIME
                                MAXIMUM       AVERAGE  MAXIMUM  AVERAGE  MAXIMUM
 CP2K                                 1  1.0    0.003    0.006   29.208   29.209
 lib_test                             1  2.0    0.000    0.000   29.176   29.198
 dbcsr_run_tests                      3  3.0    0.001    0.001   29.174   29.197
 test_multiplies_multiproc            3  4.0    0.001    0.002   27.963   28.025
 dbcsr_multiply_generic               9  5.0    0.002    0.002   25.780   25.894
 multiply_cannon                      9  6.0    0.003    0.003   23.228   23.789
 multiply_cannon_loop                 9  7.0    0.004    0.004   22.729   23.256
 multiply_cannon_multrec             72  8.0   19.189   19.734   19.191   19.736
 mp_waitall_1                       576  9.2    4.001    4.805    4.001    4.805
 multiply_cannon_metrocomm1          72  8.0    0.002    0.002    3.191    4.148
 dbcsr_make_random_matrix             9  4.0    0.907    0.958    1.167    1.215
 make_m2s                            18  6.0    0.001    0.001    1.069    1.155
 make_images                         18  7.0    0.027    0.029    1.066    1.152
 dbcsr_finalize                      27  5.7    0.001    0.001    1.030    1.124
 mp_sum_l                           310  2.7    0.532    1.119    0.532    1.119
 dbcsr_multiply_generic_mpsum_f       9  6.0    0.000    0.000    0.528    1.115
 dbcsr_merge_all                     18  6.5    0.163    0.185    0.895    1.010
 dbcsr_data_release                 444  7.6    0.744    0.832    0.744    0.832
 dbcsr_redistribute                   9  5.0    0.430    0.495    0.757    0.811
 multiply_cannon_metrocomm3          72  8.0    0.000    0.001    0.337    0.728
 dbcsr_destroy                      111  5.9    0.007    0.061    0.619    0.704
 make_images_data                    18  8.0    0.001    0.001    0.531    0.671
 dbcsr_data_copy_aa2                 18  7.5    0.525    0.601    0.525    0.601
 hybrid_alltoall_any                 18  9.0    0.051    0.213    0.451    0.600
 -------------------------------------------------------------------------------

Plot: name="dbcsr_timings_32omp", title="Timings of dbcsr with 32 OpenMP Threads", ylabel="time [s]"
PlotPoint: plot="dbcsr_timings_32omp", name="rest", label="rest", y=9.600999999999999, yerr=0.0
PlotPoint: plot="dbcsr_timings_32omp", name="dbcsr_redistribute", label="dbcsr_redistribute", y=59.909, yerr=0.0
PlotPoint: plot="dbcsr_timings_32omp", name="multiply_cannon_multrec", label="multiply_cannon_multrec", y=17.932, yerr=0.0
PlotPoint: plot="dbcsr_timings_32omp", name="dbcsr_make_random_matrix", label="dbcsr_make_random_matrix", y=14.433, yerr=0.0
PlotPoint: plot="dbcsr_timings_32omp", name="mp_alltoall_d11v", label="mp_alltoall_d11v", y=3.342, yerr=0.0
PlotPoint: plot="dbcsr_timings_32omp", name="tree_to_linear_d", label="tree_to_linear_d", y=3.258, yerr=0.0
PlotPoint: plot="dbcsr_timings_32omp", name="dbcsr_data_release", label="dbcsr_data_release", y=2.677, yerr=0.0
PlotPoint: plot="dbcsr_timings_32omp", name="mp_waitall_1", label="mp_waitall_1", y=0.0, yerr=0.0
PlotPoint: plot="dbcsr_timings_32omp", name="mp_sum_l", label="mp_sum_l", y=0.0, yerr=0.0

Plot: name="dbcsr_timings_32mpi", title="Timings of dbcsr with 32 MPI Ranks", ylabel="time [s]"
PlotPoint: plot="dbcsr_timings_32mpi", name="rest", label="rest", y=3.4049999999999976, yerr=0.0
PlotPoint: plot="dbcsr_timings_32mpi", name="dbcsr_redistribute", label="dbcsr_redistribute", y=0.43, yerr=0.0
PlotPoint: plot="dbcsr_timings_32mpi", name="multiply_cannon_multrec", label="multiply_cannon_multrec", y=19.189, yerr=0.0
PlotPoint: plot="dbcsr_timings_32mpi", name="dbcsr_make_random_matrix", label="dbcsr_make_random_matrix", y=0.907, yerr=0.0
PlotPoint: plot="dbcsr_timings_32mpi", name="mp_alltoall_d11v", label="mp_alltoall_d11v", y=0.0, yerr=0.0
PlotPoint: plot="dbcsr_timings_32mpi", name="tree_to_linear_d", label="tree_to_linear_d", y=0.0, yerr=0.0
PlotPoint: plot="dbcsr_timings_32mpi", name="dbcsr_data_release", label="dbcsr_data_release", y=0.744, yerr=0.0
PlotPoint: plot="dbcsr_timings_32mpi", name="mp_waitall_1", label="mp_waitall_1", y=4.001, yerr=0.0
PlotPoint: plot="dbcsr_timings_32mpi", name="mp_sum_l", label="mp_sum_l", y=0.532, yerr=0.0

Running MQAE_single_node.inp with 1 threads and 32 ranks... done.
Running MQAE_single_node.inp with 32 threads and 1 ranks... done.
From /workspace/artifacts/MQAE_single_node_32omp.out:
 -------------------------------------------------------------------------------
 -                                                                             -
 -                                T I M I N G                                  -
 -                                                                             -
 -------------------------------------------------------------------------------
 SUBROUTINE                       CALLS  ASD         SELF TIME        TOTAL TIME
                                MAXIMUM       AVERAGE  MAXIMUM  AVERAGE  MAXIMUM
 CP2K                                 1  1.0    0.052    0.052  149.879  149.879
 qs_mol_dyn_low                       1  2.0    0.005    0.005  147.785  147.785
 velocity_verlet                      5  3.0    0.004    0.004  118.944  118.944
 qmmm_el_coupling                     6  3.8    0.000    0.000   71.234   71.234
 qmmm_elec_with_gaussian              6  4.8    0.196    0.196   71.227   71.227
 qmmm_elec_with_gaussian_low          6  5.8    0.000    0.000   69.519   69.519
 qmmm_elec_gaussian_low_G             6  6.8   68.023   68.023   68.023   68.023
 qs_forces                            6  3.8    0.001    0.001   59.720   59.720
 qs_energies                          6  4.8    0.001    0.001   53.188   53.188
 scf_env_do_scf                       6  5.8    0.001    0.001   49.141   49.141
 scf_env_do_scf_inner_loop           39  6.8    0.003    0.003   41.316   41.316
 rebuild_ks_matrix                   45  8.4    0.000    0.000   40.671   40.671
 qs_ks_build_kohn_sham_matrix        45  9.4    0.008    0.008   40.671   40.671
 qs_ks_update_qs_env                 45  7.8    0.000    0.000   34.873   34.873
 pw_transfer                        966 11.9    0.076    0.076   25.021   25.021
 fft_wrap_pw1pw2                    801 13.0    0.009    0.009   24.644   24.644
 fft_wrap_pw1pw2_150                507 14.3    2.539    2.539   24.079   24.079
 qs_vxc_create                       45 10.4    0.001    0.001   22.272   22.272
 xc_vxc_pw_create                    45 11.4    4.388    4.388   22.271   22.271
 qs_rho_update_rho                   45  7.9    0.000    0.000   11.010   11.010
 calculate_rho_elec                  45  8.9    0.902    0.902   11.009   11.009
 pw_scatter_s                       429 15.4   10.796   10.796   10.796   10.796
 xc_rho_set_and_dset_create          45 12.4    0.249    0.249   10.232   10.232
 fft3d_s                            802 15.0    9.745    9.745    9.755    9.755
 qmmm_forces                          6  3.8    0.001    0.001    8.957    8.957
 pw_integral_ab                    2539  7.4    8.865    8.865    8.865    8.865
 qmmm_forces_with_gaussian            6  4.8    0.141    0.141    8.458    8.458
 init_scf_loop                        6  6.8    0.000    0.000    7.819    7.819
 fist_calc_energy_force               6  3.8    0.002    0.002    7.074    7.074
 qs_ks_ddapc                         45 10.4    0.001    0.001    6.840    6.840
 qmmm_force_with_gaussian_low         6  5.8    0.000    0.000    6.462    6.462
 qs_ks_update_qs_env_forces           6  4.8    0.000    0.000    5.811    5.811
 force_nonbond                        6  4.8    5.771    5.771    5.771    5.771
 pw_poisson_solve                    51  9.9    2.399    2.399    5.472    5.472
 qmmm_forces_gaussian_low_G           6  6.8    5.360    5.360    5.360    5.360
 grid_collocate_task_list            45  9.9    5.124    5.124    5.124    5.124
 density_rs2pw                       45  9.9    0.003    0.003    4.983    4.983
 sum_up_and_integrate                45 10.4    0.241    0.241    4.594    4.594
 integrate_v_rspace                  45 11.4    0.012    0.012    4.353    4.353
 cp_ddapc_apply_CD                   45 11.4    0.006    0.006    4.186    4.186
 -------------------------------------------------------------------------------

From /workspace/artifacts/MQAE_single_node_32mpi.out:
 -------------------------------------------------------------------------------
 -                                                                             -
 -                                T I M I N G                                  -
 -                                                                             -
 -------------------------------------------------------------------------------
 SUBROUTINE                       CALLS  ASD         SELF TIME        TOTAL TIME
                                MAXIMUM       AVERAGE  MAXIMUM  AVERAGE  MAXIMUM
 CP2K                                 1  1.0    0.036    0.039   96.215   96.217
 qs_mol_dyn_low                       1  2.0    0.006    0.006   94.510   94.607
 qs_forces                            6  3.8    0.001    0.001   70.700   70.700
 qs_energies                          6  4.8    0.001    0.001   67.446   67.447
 scf_env_do_scf                       6  5.8    0.000    0.001   65.788   65.788
 scf_env_do_scf_inner_loop          113  6.2    0.003    0.010   63.195   63.196
 rebuild_ks_matrix                  119  8.1    0.000    0.001   46.959   46.977
 qs_ks_build_kohn_sham_matrix       119  9.1    0.022    0.024   46.959   46.977
 qs_ks_update_qs_env                119  7.3    0.001    0.002   44.168   44.185
 velocity_verlet                      5  3.0    0.003    0.005   39.055   39.060
 pw_transfer                       2446 11.8    0.299    0.314   30.748   30.914
 fft_wrap_pw1pw2                   2059 12.8    0.037    0.039   29.838   30.031
 fft_wrap_pw1pw2_150               1321 14.0    2.641    2.792   28.937   29.105
 qs_vxc_create                      119 10.1    0.004    0.004   24.185   24.193
 xc_vxc_pw_create                   119 11.1    0.518    0.686   24.181   24.189
 fft3d_ps                          2059 14.8   13.449   14.595   22.523   22.793
 qs_rho_update_rho                  119  7.3    0.001    0.001   18.347   18.349
 calculate_rho_elec                 119  8.3    0.087    0.096   18.346   18.348
 sum_up_and_integrate               119 10.1    0.099    0.108   16.542   16.623
 integrate_v_rspace                 119 11.1    0.005    0.006   16.443   16.516
 rs_pw_transfer                     988 11.5    0.017    0.020   12.944   13.696
 qmmm_forces                          6  3.8    0.003    0.003   13.144   13.145
 qmmm_forces_with_gaussian            6  4.8    0.455    0.558   12.785   12.955
 density_rs2pw                      119  9.3    0.012    0.014   11.427   12.094
 xc_rho_set_and_dset_create         119 12.1    0.541    0.661   11.217   11.615
 potential_pw2rs                    119 12.1    0.012    0.013   10.151   10.163
 qmmm_el_coupling                     6  3.8    0.000    0.000    9.494    9.560
 qmmm_elec_with_gaussian              6  4.8    0.410    0.574    9.490    9.556
 mp_alltoall_z22v                  2059 16.8    5.534    7.216    5.534    7.216
 grid_collocate_task_list           119  9.3    6.575    7.065    6.575    7.065
 grid_integrate_task_list           119 12.1    5.857    6.199    5.857    6.199
 qmmm_force_with_gaussian_low         6  5.8    0.000    0.000    5.844    5.986
 rs_pw_transfer_PW2RS_150           125 13.9    2.866    2.937    5.589    5.627
 rs_pw_transfer_RS2PW_150           125 11.2    2.276    2.474    4.656    5.409
 mp_waitany                        4028 12.8    4.034    5.335    4.034    5.335
 yz_to_x                            964 15.3    1.372    1.557    4.084    5.217
 pw_restrict_s3                      18  5.8    2.331    2.374    5.118    5.191
 x_to_yz                           1095 16.3    2.113    2.306    4.935    5.182
 qmmm_forces_gaussian_low_G           6  6.8    4.766    4.895    4.766    4.895
 qmmm_elec_with_gaussian:spline       6  5.8    0.000    0.001    4.130    4.196
 pw_prolongate_s3                    18  6.8    1.868    1.918    4.130    4.196
 pw_integral_ab                    2761  7.7    3.409    3.467    3.817    4.019
 qs_scf_new_mos                     113  7.2    0.001    0.001    3.898    3.908
 qs_scf_loop_do_ot                  113  8.2    0.001    0.001    3.897    3.907
 ot_scf_mini                        113  9.2    0.002    0.002    3.722    3.733
 dbcsr_multiply_generic            2588 12.3    0.100    0.120    3.586    3.669
 qmmm_elec_with_gaussian_low          6  5.8    0.000    0.000    3.413    3.497
 qs_ks_ddapc                        119 10.1    0.003    0.004    3.244    3.385
 mp_sum_dm3                          33  5.7    2.695    2.814    2.695    2.814
 qs_ks_update_qs_env_forces           6  4.8    0.000    0.000    2.805    2.806
 init_scf_loop                        6  6.8    0.000    0.000    2.588    2.589
 pw_gather_p                        964 14.3    2.335    2.586    2.335    2.586
 qmmm_elec_gaussian_low_G             6  6.8    2.457    2.552    2.457    2.552
 mp_waitall_1                    188862 16.2    2.296    2.498    2.296    2.498
 pw_scatter_p                      1095 15.3    2.230    2.364    2.230    2.364
 ot_mini                            113 10.2    0.001    0.001    2.349    2.361
 pw_derive                          732 12.5    1.933    2.121    1.933    2.121
 -------------------------------------------------------------------------------

Plot: name="MQAE_single_node_timings_32omp", title="Timings of MQAE_single_node with 32 OpenMP Threads", ylabel="time [s]"
PlotPoint: plot="MQAE_single_node_timings_32omp", name="rest", label="rest", y=36.19500000000001, yerr=0.0
PlotPoint: plot="MQAE_single_node_timings_32omp", name="qmmm_elec_gaussian_low_G", label="qmmm_elec_gaussian_low_G", y=68.023, yerr=0.0
PlotPoint: plot="MQAE_single_node_timings_32omp", name="pw_scatter_s", label="pw_scatter_s", y=10.796, yerr=0.0
PlotPoint: plot="MQAE_single_node_timings_32omp", name="fft3d_s", label="fft3d_s", y=9.745, yerr=0.0
PlotPoint: plot="MQAE_single_node_timings_32omp", name="pw_integral_ab", label="pw_integral_ab", y=8.865, yerr=0.0
PlotPoint: plot="MQAE_single_node_timings_32omp", name="force_nonbond", label="force_nonbond", y=5.771, yerr=0.0
PlotPoint: plot="MQAE_single_node_timings_32omp", name="qmmm_forces_gaussian_low_G", label="qmmm_forces_gaussian_low_G", y=5.36, yerr=0.0
PlotPoint: plot="MQAE_single_node_timings_32omp", name="grid_collocate_task_list", label="grid_collocate_task_list", y=5.124, yerr=0.0
PlotPoint: plot="MQAE_single_node_timings_32omp", name="mp_alltoall_z22v", label="mp_alltoall_z22v", y=0.0, yerr=0.0
PlotPoint: plot="MQAE_single_node_timings_32omp", name="grid_integrate_task_list", label="grid_integrate_task_list", y=0.0, yerr=0.0
PlotPoint: plot="MQAE_single_node_timings_32omp", name="fft3d_ps", label="fft3d_ps", y=0.0, yerr=0.0

Plot: name="MQAE_single_node_timings_32mpi", title="Timings of MQAE_single_node with 32 MPI Ranks", ylabel="time [s]"
PlotPoint: plot="MQAE_single_node_timings_32mpi", name="rest", label="rest", y=54.168000000000006, yerr=0.0
PlotPoint: plot="MQAE_single_node_timings_32mpi", name="qmmm_elec_gaussian_low_G", label="qmmm_elec_gaussian_low_G", y=2.457, yerr=0.0
PlotPoint: plot="MQAE_single_node_timings_32mpi", name="pw_scatter_s", label="pw_scatter_s", y=0.0, yerr=0.0
PlotPoint: plot="MQAE_single_node_timings_32mpi", name="fft3d_s", label="fft3d_s", y=0.0, yerr=0.0
PlotPoint: plot="MQAE_single_node_timings_32mpi", name="pw_integral_ab", label="pw_integral_ab", y=3.409, yerr=0.0
PlotPoint: plot="MQAE_single_node_timings_32mpi", name="force_nonbond", label="force_nonbond", y=0.0, yerr=0.0
PlotPoint: plot="MQAE_single_node_timings_32mpi", name="qmmm_forces_gaussian_low_G", label="qmmm_forces_gaussian_low_G", y=4.766, yerr=0.0
PlotPoint: plot="MQAE_single_node_timings_32mpi", name="grid_collocate_task_list", label="grid_collocate_task_list", y=6.575, yerr=0.0
PlotPoint: plot="MQAE_single_node_timings_32mpi", name="mp_alltoall_z22v", label="mp_alltoall_z22v", y=5.534, yerr=0.0
PlotPoint: plot="MQAE_single_node_timings_32mpi", name="grid_integrate_task_list", label="grid_integrate_task_list", y=5.857, yerr=0.0
PlotPoint: plot="MQAE_single_node_timings_32mpi", name="fft3d_ps", label="fft3d_ps", y=13.449, yerr=0.0

Summary: Performance test works fine.
Status: OK

Uploading artifacts... done
EndDate: 2021-11-10 20:26:37+00:00