StartDate: 2022-10-27 22:01:03+00:00
CpuId: 32x AMD EPYC (3rd Gen) (Milan) [Zen 3], 7nm (SMT disabled)
CommitSHA: f36b488847463310f25cb3cd7662924bf5e00627
CommitTime: 2022-10-27 23:21:38 +0200
CommitAuthor: Ole Schütt
CommitSubject: Docker: Upgrade to nvidia/cuda:11.8.0-devel-ubuntu22.04

Populating docker build cache... done.

#################### Building Image cp2k-perf-openmp ####################
Dockerfile: /tools/docker/Dockerfile.test_performance
Build-Path: /
Build-Args: GIT_COMMIT_SHA=f36b488847463310f25cb3cd7662924bf5e00627
Sending build context to Docker daemon 365.6MB
Step 1/42 : FROM ubuntu:22.04
22.04: Pulling from library/ubuntu
301a8b74f71f: Already exists
Digest: sha256:7cfe75438fc77c9d7235ae502bf229b15ca86647ac01c844b272b56326d56184
Status: Downloaded newer image for ubuntu:22.04
 ---> cdb68b455a14
Step 2/42 : WORKDIR /opt/cp2k-toolchain
 ---> Using cache
 ---> e36f08f73260
Step 3/42 : COPY ./tools/toolchain/install_requirements*.sh ./
 ---> Using cache
 ---> 682c309d9640
Step 4/42 : RUN ./install_requirements.sh ubuntu:22.04
 ---> Using cache
 ---> 40a433e6df84
Step 5/42 : RUN mkdir scripts
 ---> Using cache
 ---> 99ad9eee5fce
Step 6/42 : COPY ./tools/toolchain/scripts/VERSION ./tools/toolchain/scripts/parse_if.py ./tools/toolchain/scripts/tool_kit.sh ./tools/toolchain/scripts/common_vars.sh ./tools/toolchain/scripts/signal_trap.sh ./tools/toolchain/scripts/get_openblas_arch.sh ./scripts/
 ---> Using cache
 ---> e9da3ca0f434
Step 7/42 : COPY ./tools/toolchain/install_cp2k_toolchain.sh .
 ---> Using cache
 ---> a1b4d22eda67
Step 8/42 : RUN ./install_cp2k_toolchain.sh --install-all --mpi-mode=mpich --target-cpu=native --with-gcc=system --dry-run
 ---> Using cache
 ---> 99d8c439deed
Step 9/42 : COPY ./tools/toolchain/scripts/stage0/ ./scripts/stage0/
 ---> Using cache
 ---> 396b14ea4f0a
Step 10/42 : RUN ./scripts/stage0/install_stage0.sh && rm -rf ./build
 ---> Using cache
 ---> 8c8bdd8cd062
Step 11/42 : COPY ./tools/toolchain/scripts/stage1/ ./scripts/stage1/
 ---> Using cache
 ---> 318710a932c5
Step 12/42 : RUN ./scripts/stage1/install_stage1.sh && rm -rf ./build
 ---> Using cache
 ---> 0a03f027fa49
Step 13/42 : COPY ./tools/toolchain/scripts/stage2/ ./scripts/stage2/
 ---> Using cache
 ---> 396a23ac81a6
Step 14/42 : RUN ./scripts/stage2/install_stage2.sh && rm -rf ./build
 ---> Using cache
 ---> 9fceba6a66c4
Step 15/42 : COPY ./tools/toolchain/scripts/stage3/ ./scripts/stage3/
 ---> Using cache
 ---> 48c084976c13
Step 16/42 : RUN ./scripts/stage3/install_stage3.sh && rm -rf ./build
 ---> Using cache
 ---> 0434b1b150d8
Step 17/42 : COPY ./tools/toolchain/scripts/stage4/ ./scripts/stage4/
 ---> Using cache
 ---> d309c7027db9
Step 18/42 : RUN ./scripts/stage4/install_stage4.sh && rm -rf ./build
 ---> Using cache
 ---> 6777d98b65e0
Step 19/42 : COPY ./tools/toolchain/scripts/stage5/ ./scripts/stage5/
 ---> 0caa85841636
Step 20/42 : RUN ./scripts/stage5/install_stage5.sh && rm -rf ./build
 ---> Running in 6c0c23f8583b
==================== Installing ELPA ====================
elpa-2022.05.001.tar.gz: OK
Checksum of elpa-2022.05.001.tar.gz Ok
patching file nvcc_wrap
Installing from scratch into /opt/cp2k-toolchain/install/elpa-2022.05.001/cpu
Step elpa took 46.00 seconds.
==================== Installing PT-Scotch ====================
scotch_6.0.0.tar.gz: OK
Checksum of scotch_6.0.0.tar.gz Ok
Installing from scratch into /opt/cp2k-toolchain/install/scotch-6.0.0
Step ptscotch took 5.00 seconds.
==================== Installing SuperLU_DIST ====================
superlu_dist_6.1.0.tar.gz: OK
Checksum of superlu_dist_6.1.0.tar.gz Ok
Installing from scratch into /opt/cp2k-toolchain/install/superlu_dist-6.1.0
Step superlu took 7.00 seconds.
==================== Installing PEXSI ====================
pexsi_v1.2.0.tar.gz: OK
Checksum of pexsi_v1.2.0.tar.gz Ok
Installing from scratch into /opt/cp2k-toolchain/install/pexsi-1.2.0
Step pexsi took 53.00 seconds.
Removing intermediate container 6c0c23f8583b
 ---> 2d373bad4e71
Step 21/42 : COPY ./tools/toolchain/scripts/stage6/ ./scripts/stage6/
 ---> 637e1687e116
Step 22/42 : RUN ./scripts/stage6/install_stage6.sh && rm -rf ./build
 ---> Running in ea64d6af978d
==================== Installing QUIP ====================
QUIP-0.9.10.tar.gz: OK
Checksum of QUIP-0.9.10.tar.gz Ok
fox-b5b69ef9a46837bd944ba5c9bc1cf9d00a6198a7.tar.gz: OK
Checksum of fox-b5b69ef9a46837bd944ba5c9bc1cf9d00a6198a7.tar.gz Ok
Installing from scratch into /opt/cp2k-toolchain/install/quip-0.9.10
Step quip took 191.00 seconds.
==================== Installing gsl ====================
gsl-2.7.tar.gz: OK
Checksum of gsl-2.7.tar.gz Ok
Installing from scratch into /opt/cp2k-toolchain/install/gsl-2.7
Step gsl took 40.00 seconds.
==================== Installing PLUMED ====================
plumed-src-2.8.0.tgz: OK
Checksum of plumed-src-2.8.0.tgz Ok
Installing from scratch into /opt/cp2k-toolchain/install/plumed-2.8.0
Step plumed took 46.00 seconds.
Removing intermediate container ea64d6af978d
 ---> 03ff7059dbe5
Step 23/42 : COPY ./tools/toolchain/scripts/stage7/ ./scripts/stage7/
 ---> b3e186436238
Step 24/42 : RUN ./scripts/stage7/install_stage7.sh && rm -rf ./build
 ---> Running in 7a406430ebb2
==================== Installing hdf5 ====================
hdf5-1.12.0.tar.bz2: OK
Checksum of hdf5-1.12.0.tar.bz2 Ok
Installing from scratch into /opt/cp2k-toolchain/install/hdf5-1.12.0
Step hdf5 took 80.00 seconds.
==================== Installing libvdwxc ====================
libvdwxc-0.4.0.tar.gz: OK
Checksum of libvdwxc-0.4.0.tar.gz Ok
Installing from scratch into /opt/cp2k-toolchain/install/libvdwxc-0.4.0
Step libvdwxc took 33.00 seconds.
==================== Installing spglib ====================
spglib-1.16.2.tar.gz: OK
Checksum of spglib-1.16.2.tar.gz Ok
Installing from scratch into /opt/cp2k-toolchain/install/spglib-1.16.2
Step spglib took 6.00 seconds.
==================== Installing libvori ====================
libvori-220621.tar.gz: OK
Checksum of libvori-220621.tar.gz Ok
Installing from scratch into /opt/cp2k-toolchain/install/libvori-220621
Step libvori took 15.00 seconds.
Removing intermediate container 7a406430ebb2
 ---> 70172e612eae
Step 25/42 : COPY ./tools/toolchain/scripts/stage8/ ./scripts/stage8/
 ---> e00710b51530
Step 26/42 : RUN ./scripts/stage8/install_stage8.sh && rm -rf ./build
 ---> Running in 3d0839815b74
==================== Installing spfft ====================
SpFFT-1.0.6.tar.gz: OK
Checksum of SpFFT-1.0.6.tar.gz Ok
Installing from scratch into /opt/cp2k-toolchain/install/SpFFT-1.0.6
Step spfft took 5.00 seconds.
==================== Installing spla ====================
SpLA-1.5.4.tar.gz: OK
Checksum of SpLA-1.5.4.tar.gz Ok
Installing from scratch into /opt/cp2k-toolchain/install/SpLA-1.5.4
Step spla took 6.00 seconds.
==================== Installing SIRIUS ====================
SIRIUS-7.3.2.tar.gz: OK
Checksum of SIRIUS-7.3.2.tar.gz Ok
Installing from scratch into /opt/cp2k-toolchain/install/sirius-7.3.2
Step sirius took 71.00 seconds.
Removing intermediate container 3d0839815b74
 ---> 1a29371ff0aa
Step 27/42 : COPY ./tools/toolchain/scripts/arch_base.tmpl ./tools/toolchain/scripts/generate_arch_files.sh ./scripts/
 ---> d69628de9309
Step 28/42 : RUN ./scripts/generate_arch_files.sh && rm -rf ./build
 ---> Running in e23b77fc0584
==================== generating arch files ====================
arch files can be found in the /opt/cp2k-toolchain/install/arch subdirectory
Wrote /opt/cp2k-toolchain/install/arch/local.ssmp
Wrote /opt/cp2k-toolchain/install/arch/local_static.ssmp
Wrote /opt/cp2k-toolchain/install/arch/local.sdbg
Wrote /opt/cp2k-toolchain/install/arch/local_asan.ssmp
Wrote /opt/cp2k-toolchain/install/arch/local_coverage.sdbg
Wrote /opt/cp2k-toolchain/install/arch/local.psmp
Wrote /opt/cp2k-toolchain/install/arch/local.pdbg
Wrote /opt/cp2k-toolchain/install/arch/local_asan.psmp
Wrote /opt/cp2k-toolchain/install/arch/local_static.psmp
Wrote /opt/cp2k-toolchain/install/arch/local_warn.psmp
Wrote /opt/cp2k-toolchain/install/arch/local_coverage.pdbg
========================== usage =========================
Done!
Now copy:
  cp /opt/cp2k-toolchain/install/arch/* to the cp2k/arch/ directory
To use the installed tools and libraries and cp2k version
compiled with it you will first need to execute at the prompt:
  source /opt/cp2k-toolchain/install/setup
To build CP2K you should change directory:
  cd cp2k/
  make -j 32 ARCH=local VERSION="ssmp sdbg psmp pdbg"

arch files for GPU enabled CUDA versions are named "local_cuda.*"
arch files for GPU enabled HIP versions are named "local_hip.*"
arch files for OpenCL (GPU) versions are named "local_opencl.*"
arch files for coverage versions are named "local_coverage.*"

Note that these pre-built arch files are for the GNU compiler, users have to adapt them for other compilers.
It is possible to use the provided CP2K arch files as guidance.
Removing intermediate container e23b77fc0584
 ---> c3be91881a8e
Step 29/42 : WORKDIR /opt/cp2k
 ---> Running in ec08a0cbc81a
Removing intermediate container ec08a0cbc81a
 ---> 8183ef72218d
Step 30/42 : COPY ./Makefile .
 ---> 5c730a7b47c9
Step 31/42 : COPY ./src ./src
 ---> bccb308fc8e1
Step 32/42 : COPY ./exts ./exts
 ---> 0f5b57cae8e3
Step 33/42 : COPY ./tools/build_utils ./tools/build_utils
 ---> 724b5c8e2d59
Step 34/42 : RUN /bin/bash -c " mkdir -p arch && ln -vs /opt/cp2k-toolchain/install/arch/local.psmp ./arch/ && echo 'Compiling cp2k...' && source /opt/cp2k-toolchain/install/setup && ( make -j ARCH=local VERSION=psmp &> /dev/null || true ) && ( [ ! -f ./exe/local/cp2k.psmp ] || ldd ./exe/local/cp2k.psmp | grep -q libmpi )"
 ---> Running in 527571be4d3d
'./arch/local.psmp' -> '/opt/cp2k-toolchain/install/arch/local.psmp'
Compiling cp2k...
Removing intermediate container 527571be4d3d
 ---> b8281bb348e9
Step 35/42 : COPY ./data ./data
 ---> 82b7712e9346
Step 36/42 : COPY ./tests ./tests
 ---> c2cde581170b
Step 37/42 : COPY ./tools/regtesting ./tools/regtesting
 ---> 8805c34b4516
Step 38/42 : COPY ./benchmarks ./benchmarks
 ---> a96220efe279
Step 39/42 : COPY ./tools/docker/scripts/test_performance.sh ./tools/docker/scripts/plot_performance.py ./
 ---> 3445d1062ef5
Step 40/42 : RUN ./test_performance.sh "local" 2>&1 | tee report.log
 ---> Running in 243a18226d35

========== Compiling CP2K ==========
Compiling cp2k... done.
Checking benchmark inputs... Found 60 input files and 0 errors.

========== Running Performance Test ==========

Running H2O-64.inp with 1 threads and 32 ranks... done.
Running H2O-64.inp with 32 threads and 1 ranks... done.
From /workspace/artifacts/H2O-64_32omp.out: ------------------------------------------------------------------------------- - - - T I M I N G - - - ------------------------------------------------------------------------------- SUBROUTINE CALLS ASD SELF TIME TOTAL TIME MAXIMUM AVERAGE MAXIMUM AVERAGE MAXIMUM CP2K 1 1.0 0.036 0.036 86.462 86.462 qs_mol_dyn_low 1 2.0 0.003 0.003 85.855 85.855 qs_forces 11 3.9 0.001 0.001 85.816 85.816 qs_energies 11 4.9 0.001 0.001 79.947 79.947 scf_env_do_scf 11 5.9 0.001 0.001 69.439 69.439 velocity_verlet 10 3.0 0.002 0.002 56.096 56.096 scf_env_do_scf_inner_loop 108 6.5 0.011 0.011 52.707 52.707 rebuild_ks_matrix 119 8.3 0.001 0.001 19.839 19.839 qs_ks_build_kohn_sham_matrix 119 9.3 0.012 0.012 19.838 19.838 dbcsr_multiply_generic 2286 12.5 0.150 0.150 19.423 19.423 qs_scf_new_mos 108 7.5 0.001 0.001 19.339 19.339 qs_scf_loop_do_ot 108 8.5 0.001 0.001 19.338 19.338 qs_rho_update_rho_low 119 7.7 0.001 0.001 18.862 18.862 calculate_rho_elec 119 8.7 0.949 0.949 18.861 18.861 qs_ks_update_qs_env 119 7.6 0.001 0.001 18.215 18.215 ot_scf_mini 108 9.5 0.002 0.002 18.070 18.070 init_scf_loop 11 6.9 0.000 0.000 16.608 16.608 grid_collocate_task_list 119 9.7 14.553 14.553 14.553 14.553 prepare_preconditioner 11 7.9 0.000 0.000 14.088 14.088 make_preconditioner 11 8.9 0.000 0.000 14.088 14.088 make_full_inverse_cholesky 11 9.9 0.000 0.000 12.926 12.926 sum_up_and_integrate 119 10.3 0.510 0.510 12.836 12.836 integrate_v_rspace 119 11.3 0.090 0.090 12.325 12.325 ot_mini 108 10.5 0.001 0.001 11.698 11.698 make_m2s 4572 13.5 0.045 0.045 10.772 10.772 grid_integrate_task_list 119 12.3 10.443 10.443 10.443 10.443 qs_energies_init_hamiltonians 11 5.9 0.000 0.000 6.142 6.142 qs_ot_get_derivative 108 11.5 0.001 0.001 6.139 6.139 pw_transfer 1439 11.6 0.061 0.061 5.852 5.852 dbcsr_make_dense_low 5837 15.5 0.065 0.065 5.790 5.790 make_dense_data 5837 16.5 5.125 5.125 5.711 5.711 fft_wrap_pw1pw2 1201 12.6 0.006 0.006 5.656 5.656 ot_diis_step 108 11.5 
0.004 0.004 5.556 5.556 make_images 4572 14.5 2.095 2.095 5.371 5.371 dbcsr_make_images_dense 3978 14.8 0.017 0.017 5.115 5.115 apply_preconditioner_dbcsr 119 12.6 0.000 0.000 5.048 5.048 apply_single 119 13.6 0.000 0.000 5.048 5.048 fft_wrap_pw1pw2_140 487 13.2 0.427 0.427 4.848 4.848 cp_fm_cholesky_decompose 22 10.9 4.632 4.632 4.632 4.632 multiply_cannon 2286 13.5 0.172 0.172 4.586 4.586 multiply_cannon_loop 2286 14.5 0.034 0.034 4.134 4.134 cp_fm_cholesky_invert 11 10.9 4.130 4.130 4.130 4.130 multiply_cannon_multrec 2286 15.5 4.054 4.054 4.099 4.099 init_scf_run 11 5.9 0.002 0.002 3.768 3.768 scf_env_initial_rho_setup 11 6.9 0.000 0.000 3.766 3.766 dbcsr_complete_redistribute 329 12.2 1.933 1.933 3.716 3.716 dbcsr_copy 2102 12.0 0.225 0.225 3.479 3.479 density_rs2pw 119 9.7 0.004 0.004 3.359 3.359 wfi_extrapolate 11 7.9 0.001 0.001 3.295 3.295 build_core_hamiltonian_matrix_ 11 4.9 0.000 0.000 3.293 3.293 qs_env_update_s_mstruct 11 6.9 0.000 0.000 3.276 3.276 dbcsr_copy_into_existing 22 7.9 3.222 3.222 3.223 3.223 copy_dbcsr_to_fm 153 11.3 0.002 0.002 3.051 3.051 qs_ot_get_p 119 10.4 0.001 0.001 2.977 2.977 qs_create_task_list 11 7.9 0.000 0.000 2.902 2.902 generate_qs_task_list 11 8.9 1.983 1.983 2.902 2.902 fft3d_s 1202 14.6 2.750 2.750 2.755 2.755 qs_ks_update_qs_env_forces 11 4.9 0.000 0.000 2.574 2.574 build_core_hamiltonian_matrix 11 6.9 0.000 0.000 2.464 2.464 transfer_dbcsr_to_fm 11 10.9 0.000 0.000 2.328 2.328 dbcsr_data_release 279534 16.0 2.172 2.172 2.172 2.172 pw_poisson_solve 119 10.3 0.331 0.331 1.973 1.973 qs_ot_get_derivative_diag 49 12.0 0.001 0.001 1.924 1.924 qs_ot_get_derivative_taylor 59 13.0 0.001 0.001 1.907 1.907 copy_fm_to_dbcsr 176 11.2 0.001 0.001 1.838 1.838 potential_pw2rs 119 12.3 0.046 0.046 1.791 1.791 cp_fm_upper_to_full 72 14.2 1.766 1.766 1.766 1.766 ------------------------------------------------------------------------------- From /workspace/artifacts/H2O-64_32mpi.out: 
------------------------------------------------------------------------------- - - - T I M I N G - - - ------------------------------------------------------------------------------- SUBROUTINE CALLS ASD SELF TIME TOTAL TIME MAXIMUM AVERAGE MAXIMUM AVERAGE MAXIMUM CP2K 1 1.0 0.007 0.022 42.894 42.904 qs_mol_dyn_low 1 2.0 0.003 0.004 42.783 42.787 qs_forces 11 3.9 0.001 0.001 42.743 42.743 qs_energies 11 4.9 0.001 0.001 39.951 39.952 scf_env_do_scf 11 5.9 0.000 0.002 36.607 36.608 scf_env_do_scf_inner_loop 108 6.5 0.003 0.018 33.832 33.832 velocity_verlet 10 3.0 0.002 0.003 25.493 25.494 rebuild_ks_matrix 119 8.3 0.000 0.001 15.534 15.622 qs_ks_build_kohn_sham_matrix 119 9.3 0.014 0.018 15.534 15.622 qs_ks_update_qs_env 119 7.6 0.001 0.001 13.840 13.918 dbcsr_multiply_generic 2286 12.5 0.068 0.073 12.713 12.924 qs_rho_update_rho_low 119 7.7 0.001 0.001 12.557 12.563 calculate_rho_elec 119 8.7 0.029 0.031 12.556 12.563 sum_up_and_integrate 119 10.3 0.029 0.035 11.491 11.517 integrate_v_rspace 119 11.3 0.004 0.006 11.462 11.491 qs_scf_new_mos 108 7.5 0.001 0.001 9.968 10.141 qs_scf_loop_do_ot 108 8.5 0.001 0.001 9.968 10.140 multiply_cannon 2286 13.5 0.145 0.174 9.451 9.883 ot_scf_mini 108 9.5 0.002 0.002 9.370 9.529 multiply_cannon_loop 2286 14.5 0.087 0.113 8.945 9.363 grid_collocate_task_list 119 9.7 8.867 9.265 8.867 9.265 mp_waitall_1 169478 16.3 8.081 8.817 8.081 8.817 grid_integrate_task_list 119 12.3 7.962 8.138 7.962 8.138 multiply_cannon_metrocomm3 18288 15.5 0.036 0.042 5.339 6.168 ot_mini 108 10.5 0.001 0.001 5.452 5.629 rs_pw_transfer 974 11.9 0.010 0.013 4.027 4.452 density_rs2pw 119 9.7 0.005 0.006 3.356 3.755 pw_transfer 1439 11.6 0.101 0.117 3.170 3.244 multiply_cannon_multrec 18288 15.5 2.802 3.202 2.811 3.212 fft_wrap_pw1pw2 1201 12.6 0.009 0.010 3.004 3.065 potential_pw2rs 119 12.3 0.006 0.007 2.998 3.021 qs_ot_get_derivative 108 11.5 0.001 0.001 2.799 2.956 init_scf_loop 11 6.9 0.000 0.000 2.762 2.763 fft_wrap_pw1pw2_140 487 13.2 0.244 0.295 
2.574 2.697 apply_preconditioner_dbcsr 119 12.6 0.000 0.000 2.563 2.664 apply_single 119 13.6 0.000 0.000 2.563 2.664 ot_diis_step 108 11.5 0.003 0.003 2.622 2.627 init_scf_run 11 5.9 0.000 0.004 2.376 2.377 scf_env_initial_rho_setup 11 6.9 0.000 0.003 2.376 2.376 make_m2s 4572 13.5 0.044 0.046 2.203 2.247 fft3d_ps 1201 14.6 1.149 1.317 2.159 2.232 wfi_extrapolate 11 7.9 0.001 0.001 2.149 2.150 make_images 4572 14.5 0.113 0.117 1.889 1.937 qs_ks_update_qs_env_forces 11 4.9 0.000 0.000 1.831 1.845 mp_waitany 9880 13.7 1.244 1.744 1.244 1.744 qs_ot_get_p 119 10.4 0.001 0.001 1.201 1.414 rs_pw_transfer_RS2PW_140 130 11.5 0.185 0.227 0.938 1.366 multiply_cannon_metrocomm1 18288 15.5 0.017 0.021 0.519 1.362 mp_sum_l 11218 13.2 0.641 1.218 0.641 1.218 make_images_data 4572 15.5 0.034 0.039 1.073 1.199 rs_pw_transfer_PW2RS_140 130 13.9 0.336 0.419 1.129 1.171 prepare_preconditioner 11 7.9 0.000 0.000 1.058 1.091 make_preconditioner 11 8.9 0.000 0.000 1.058 1.091 hybrid_alltoall_any 4725 16.4 0.058 0.173 0.944 1.067 mp_alltoall_d11v 2130 13.8 0.688 1.032 0.688 1.032 mp_alltoall_z22v 1201 16.6 0.776 1.026 0.776 1.026 make_full_inverse_cholesky 11 9.9 0.000 0.000 0.954 0.982 qs_ot_get_derivative_taylor 59 13.0 0.001 0.001 0.869 0.978 qs_ot_get_derivative_diag 49 12.0 0.001 0.001 0.888 0.948 build_core_hamiltonian_matrix_ 11 4.9 0.000 0.000 0.851 0.932 cp_dbcsr_sm_fm_multiply 37 9.5 0.001 0.001 0.922 0.924 mp_sum_d 4129 12.0 0.562 0.882 0.562 0.882 rs_pw_transfer_PW2RS_50 119 14.3 0.251 0.334 0.780 0.862 ------------------------------------------------------------------------------- Plot: name="H2O-64_timings_32omp", title="Timings of H2O-64 with 32 OpenMP Threads", ylabel="time [s]" PlotPoint: plot="H2O-64_timings_32omp", name="rest", label="rest", y=43.525, yerr=0.0 PlotPoint: plot="H2O-64_timings_32omp", name="grid_collocate_task_list", label="grid_collocate_task_list", y=14.553, yerr=0.0 PlotPoint: plot="H2O-64_timings_32omp", name="grid_integrate_task_list", 
label="grid_integrate_task_list", y=10.443, yerr=0.0 PlotPoint: plot="H2O-64_timings_32omp", name="make_dense_data", label="make_dense_data", y=5.125, yerr=0.0 PlotPoint: plot="H2O-64_timings_32omp", name="cp_fm_cholesky_decompose", label="cp_fm_cholesky_decompose", y=4.632, yerr=0.0 PlotPoint: plot="H2O-64_timings_32omp", name="cp_fm_cholesky_invert", label="cp_fm_cholesky_invert", y=4.13, yerr=0.0 PlotPoint: plot="H2O-64_timings_32omp", name="multiply_cannon_multrec", label="multiply_cannon_multrec", y=4.054, yerr=0.0 PlotPoint: plot="H2O-64_timings_32omp", name="mp_waitany", label="mp_waitany", y=0.0, yerr=0.0 PlotPoint: plot="H2O-64_timings_32omp", name="mp_waitall_1", label="mp_waitall_1", y=0.0, yerr=0.0 Plot: name="H2O-64_timings_32mpi", title="Timings of H2O-64 with 32 MPI Ranks", ylabel="time [s]" PlotPoint: plot="H2O-64_timings_32mpi", name="rest", label="rest", y=13.937999999999999, yerr=0.0 PlotPoint: plot="H2O-64_timings_32mpi", name="grid_collocate_task_list", label="grid_collocate_task_list", y=8.867, yerr=0.0 PlotPoint: plot="H2O-64_timings_32mpi", name="grid_integrate_task_list", label="grid_integrate_task_list", y=7.962, yerr=0.0 PlotPoint: plot="H2O-64_timings_32mpi", name="make_dense_data", label="make_dense_data", y=0.0, yerr=0.0 PlotPoint: plot="H2O-64_timings_32mpi", name="cp_fm_cholesky_decompose", label="cp_fm_cholesky_decompose", y=0.0, yerr=0.0 PlotPoint: plot="H2O-64_timings_32mpi", name="cp_fm_cholesky_invert", label="cp_fm_cholesky_invert", y=0.0, yerr=0.0 PlotPoint: plot="H2O-64_timings_32mpi", name="multiply_cannon_multrec", label="multiply_cannon_multrec", y=2.802, yerr=0.0 PlotPoint: plot="H2O-64_timings_32mpi", name="mp_waitany", label="mp_waitany", y=1.244, yerr=0.0 PlotPoint: plot="H2O-64_timings_32mpi", name="mp_waitall_1", label="mp_waitall_1", y=8.081, yerr=0.0 Running H2O-64_nonortho.inp with 1 threads and 32 ranks... done. Running H2O-64_nonortho.inp with 32 threads and 1 ranks... done. 
From /workspace/artifacts/H2O-64_nonortho_32omp.out: ------------------------------------------------------------------------------- - - - T I M I N G - - - ------------------------------------------------------------------------------- SUBROUTINE CALLS ASD SELF TIME TOTAL TIME MAXIMUM AVERAGE MAXIMUM AVERAGE MAXIMUM CP2K 1 1.0 0.035 0.035 109.893 109.893 qs_mol_dyn_low 1 2.0 0.003 0.003 109.229 109.229 qs_forces 11 3.9 0.001 0.001 109.190 109.190 qs_energies 11 4.9 0.001 0.001 101.640 101.640 scf_env_do_scf 11 5.9 0.001 0.001 89.100 89.100 scf_env_do_scf_inner_loop 96 6.5 0.010 0.010 71.293 71.293 velocity_verlet 10 3.0 0.002 0.002 70.433 70.433 rebuild_ks_matrix 107 8.3 0.001 0.001 33.057 33.057 qs_ks_build_kohn_sham_matrix 107 9.3 0.011 0.011 33.056 33.056 qs_rho_update_rho_low 107 7.7 0.001 0.001 31.407 31.407 calculate_rho_elec 107 8.7 0.851 0.851 31.407 31.407 qs_ks_update_qs_env 107 7.6 0.001 0.001 29.708 29.708 grid_collocate_task_list 107 9.7 27.244 27.244 27.244 27.244 sum_up_and_integrate 107 10.3 0.457 0.457 26.667 26.667 integrate_v_rspace 107 11.3 0.102 0.102 26.209 26.209 grid_integrate_task_list 107 12.3 24.420 24.420 24.420 24.420 init_scf_loop 11 6.9 0.000 0.000 17.683 17.683 dbcsr_multiply_generic 1966 12.4 0.148 0.148 17.368 17.368 qs_scf_new_mos 96 7.5 0.001 0.001 16.945 16.945 qs_scf_loop_do_ot 96 8.5 0.001 0.001 16.945 16.945 ot_scf_mini 96 9.5 0.002 0.002 15.784 15.784 prepare_preconditioner 11 7.9 0.000 0.000 13.680 13.680 make_preconditioner 11 8.9 0.000 0.000 13.680 13.680 make_full_inverse_cholesky 11 9.9 0.000 0.000 12.544 12.544 ot_mini 96 10.5 0.001 0.001 10.154 10.154 make_m2s 3932 13.4 0.039 0.039 9.569 9.569 qs_energies_init_hamiltonians 11 5.9 0.000 0.000 6.558 6.558 pw_transfer 1295 11.6 0.057 0.057 5.664 5.664 fft_wrap_pw1pw2 1081 12.6 0.006 0.006 5.487 5.487 init_scf_run 11 5.9 0.002 0.002 5.384 5.384 qs_ot_get_derivative 96 11.5 0.001 0.001 5.382 5.382 scf_env_initial_rho_setup 11 6.9 0.000 0.000 5.382 5.382 
dbcsr_make_dense_low 4961 15.5 0.068 0.068 5.089 5.089 make_dense_data 4961 16.5 4.565 4.565 5.009 5.009 make_images 3932 14.4 1.917 1.917 4.923 4.923 ot_diis_step 96 11.5 0.003 0.003 4.769 4.769 fft_wrap_pw1pw2_140 439 13.2 0.502 0.502 4.766 4.766 wfi_extrapolate 11 7.9 0.001 0.001 4.748 4.748 apply_preconditioner_dbcsr 107 12.6 0.000 0.000 4.432 4.432 apply_single 107 13.6 0.000 0.000 4.432 4.432 dbcsr_make_images_dense 3386 14.7 0.014 0.014 4.397 4.397 multiply_cannon 1966 13.4 0.148 0.148 4.274 4.274 cp_fm_cholesky_decompose 22 10.9 4.253 4.253 4.253 4.253 qs_ks_update_qs_env_forces 11 4.9 0.000 0.000 4.206 4.206 cp_fm_cholesky_invert 11 10.9 4.039 4.039 4.039 4.039 multiply_cannon_loop 1966 14.4 0.026 0.026 3.879 3.879 multiply_cannon_multrec 1966 15.4 3.811 3.811 3.851 3.851 qs_env_update_s_mstruct 11 6.9 0.000 0.000 3.701 3.701 dbcsr_complete_redistribute 317 12.2 1.881 1.881 3.622 3.622 dbcsr_copy 1855 11.9 0.191 0.191 3.413 3.413 build_core_hamiltonian_matrix_ 11 4.9 0.000 0.000 3.342 3.342 density_rs2pw 107 9.7 0.003 0.003 3.311 3.311 qs_create_task_list 11 7.9 0.000 0.000 3.301 3.301 generate_qs_task_list 11 8.9 2.399 2.399 3.301 3.301 dbcsr_copy_into_existing 22 7.9 3.191 3.191 3.191 3.191 copy_dbcsr_to_fm 147 11.2 0.002 0.002 3.035 3.035 fft3d_s 1082 14.6 2.665 2.665 2.670 2.670 qs_ot_get_p 107 10.4 0.001 0.001 2.524 2.524 build_core_hamiltonian_matrix 11 6.9 0.000 0.000 2.435 2.435 transfer_dbcsr_to_fm 11 10.9 0.000 0.000 2.354 2.354 ------------------------------------------------------------------------------- From /workspace/artifacts/H2O-64_nonortho_32mpi.out: ------------------------------------------------------------------------------- - - - T I M I N G - - - ------------------------------------------------------------------------------- SUBROUTINE CALLS ASD SELF TIME TOTAL TIME MAXIMUM AVERAGE MAXIMUM AVERAGE MAXIMUM CP2K 1 1.0 0.007 0.022 68.333 68.342 qs_mol_dyn_low 1 2.0 0.003 0.004 68.233 68.237 qs_forces 11 3.9 0.001 0.001 68.185 68.185 
qs_energies 11 4.9 0.001 0.001 63.576 63.578 scf_env_do_scf 11 5.9 0.000 0.002 58.851 58.852 scf_env_do_scf_inner_loop 96 6.5 0.002 0.017 54.465 54.466 velocity_verlet 10 3.0 0.002 0.010 40.756 40.758 rebuild_ks_matrix 107 8.3 0.000 0.001 29.622 29.688 qs_ks_build_kohn_sham_matrix 107 9.3 0.013 0.015 29.622 29.688 qs_ks_update_qs_env 107 7.6 0.001 0.001 26.086 26.149 sum_up_and_integrate 107 10.3 0.030 0.036 26.048 26.078 integrate_v_rspace 107 11.3 0.004 0.005 26.018 26.051 qs_rho_update_rho_low 107 7.7 0.000 0.001 25.269 25.274 calculate_rho_elec 107 8.7 0.026 0.028 25.269 25.273 grid_integrate_task_list 107 12.3 22.660 22.987 22.660 22.987 grid_collocate_task_list 107 9.7 21.721 22.167 21.721 22.167 dbcsr_multiply_generic 1966 12.4 0.061 0.065 11.120 11.301 qs_scf_new_mos 96 7.5 0.001 0.001 8.656 8.741 qs_scf_loop_do_ot 96 8.5 0.001 0.001 8.655 8.740 multiply_cannon 1966 13.4 0.135 0.154 8.223 8.564 ot_scf_mini 96 9.5 0.002 0.002 8.116 8.203 multiply_cannon_loop 1966 14.4 0.080 0.091 7.750 8.111 mp_waitall_1 146670 16.2 6.747 7.348 6.747 7.348 multiply_cannon_metrocomm3 15728 15.4 0.032 0.037 4.422 5.133 ot_mini 96 10.5 0.001 0.001 4.725 4.819 rs_pw_transfer 878 11.9 0.009 0.011 3.855 4.426 init_scf_loop 11 6.9 0.000 0.000 4.373 4.373 density_rs2pw 107 9.7 0.005 0.006 3.249 3.843 init_scf_run 11 5.9 0.000 0.004 3.724 3.724 scf_env_initial_rho_setup 11 6.9 0.000 0.003 3.724 3.724 qs_ks_update_qs_env_forces 11 4.9 0.000 0.000 3.651 3.660 wfi_extrapolate 11 7.9 0.001 0.001 3.379 3.379 pw_transfer 1295 11.6 0.092 0.114 2.884 2.961 fft_wrap_pw1pw2 1081 12.6 0.008 0.010 2.734 2.804 multiply_cannon_multrec 15728 15.4 2.606 2.747 2.615 2.756 potential_pw2rs 107 12.3 0.006 0.007 2.682 2.708 qs_ot_get_derivative 96 11.5 0.001 0.001 2.421 2.508 fft_wrap_pw1pw2_140 439 13.2 0.224 0.275 2.358 2.489 apply_preconditioner_dbcsr 107 12.6 0.000 0.000 2.234 2.328 apply_single 107 13.6 0.000 0.000 2.234 2.328 ot_diis_step 96 11.5 0.003 0.003 2.284 2.284 make_m2s 3932 13.4 0.038 
0.041 2.003 2.065 fft3d_ps 1081 14.6 1.040 1.144 1.950 2.023 mp_waitany 8968 13.7 1.458 1.992 1.458 1.992 make_images 3932 14.4 0.101 0.105 1.726 1.788 rs_pw_transfer_RS2PW_140 118 11.5 0.185 0.224 1.173 1.745 mp_alltoall_d11v 1998 13.7 0.846 1.449 0.846 1.449 ------------------------------------------------------------------------------- Plot: name="H2O-64_nonortho_timings_32omp", title="Timings of H2O-64_nonortho with 32 OpenMP Threads", ylabel="time [s]" PlotPoint: plot="H2O-64_nonortho_timings_32omp", name="rest", label="rest", y=41.56100000000001, yerr=0.0 PlotPoint: plot="H2O-64_nonortho_timings_32omp", name="grid_collocate_task_list", label="grid_collocate_task_list", y=27.244, yerr=0.0 PlotPoint: plot="H2O-64_nonortho_timings_32omp", name="grid_integrate_task_list", label="grid_integrate_task_list", y=24.42, yerr=0.0 PlotPoint: plot="H2O-64_nonortho_timings_32omp", name="make_dense_data", label="make_dense_data", y=4.565, yerr=0.0 PlotPoint: plot="H2O-64_nonortho_timings_32omp", name="cp_fm_cholesky_decompose", label="cp_fm_cholesky_decompose", y=4.253, yerr=0.0 PlotPoint: plot="H2O-64_nonortho_timings_32omp", name="cp_fm_cholesky_invert", label="cp_fm_cholesky_invert", y=4.039, yerr=0.0 PlotPoint: plot="H2O-64_nonortho_timings_32omp", name="multiply_cannon_multrec", label="multiply_cannon_multrec", y=3.811, yerr=0.0 PlotPoint: plot="H2O-64_nonortho_timings_32omp", name="mp_waitany", label="mp_waitany", y=0.0, yerr=0.0 PlotPoint: plot="H2O-64_nonortho_timings_32omp", name="mp_waitall_1", label="mp_waitall_1", y=0.0, yerr=0.0 Plot: name="H2O-64_nonortho_timings_32mpi", title="Timings of H2O-64_nonortho with 32 MPI Ranks", ylabel="time [s]" PlotPoint: plot="H2O-64_nonortho_timings_32mpi", name="rest", label="rest", y=13.140999999999998, yerr=0.0 PlotPoint: plot="H2O-64_nonortho_timings_32mpi", name="grid_collocate_task_list", label="grid_collocate_task_list", y=21.721, yerr=0.0 PlotPoint: plot="H2O-64_nonortho_timings_32mpi", name="grid_integrate_task_list", 
label="grid_integrate_task_list", y=22.66, yerr=0.0 PlotPoint: plot="H2O-64_nonortho_timings_32mpi", name="make_dense_data", label="make_dense_data", y=0.0, yerr=0.0 PlotPoint: plot="H2O-64_nonortho_timings_32mpi", name="cp_fm_cholesky_decompose", label="cp_fm_cholesky_decompose", y=0.0, yerr=0.0 PlotPoint: plot="H2O-64_nonortho_timings_32mpi", name="cp_fm_cholesky_invert", label="cp_fm_cholesky_invert", y=0.0, yerr=0.0 PlotPoint: plot="H2O-64_nonortho_timings_32mpi", name="multiply_cannon_multrec", label="multiply_cannon_multrec", y=2.606, yerr=0.0 PlotPoint: plot="H2O-64_nonortho_timings_32mpi", name="mp_waitany", label="mp_waitany", y=1.458, yerr=0.0 PlotPoint: plot="H2O-64_nonortho_timings_32mpi", name="mp_waitall_1", label="mp_waitall_1", y=6.747, yerr=0.0 Running H2O-hyb.inp with 1 threads and 32 ranks... done. Running H2O-hyb.inp with 32 threads and 1 ranks... done. From /workspace/artifacts/H2O-hyb_32omp.out: ------------------------------------------------------------------------------- - - - T I M I N G - - - ------------------------------------------------------------------------------- SUBROUTINE CALLS ASD SELF TIME TOTAL TIME MAXIMUM AVERAGE MAXIMUM AVERAGE MAXIMUM CP2K 1 1.0 0.188 0.188 103.574 103.574 qs_energies 1 2.0 0.000 0.000 102.754 102.754 scf_env_do_scf 1 3.0 0.000 0.000 101.640 101.640 qs_ks_update_qs_env 8 5.0 0.000 0.000 96.904 96.904 rebuild_ks_matrix 7 6.0 0.000 0.000 96.848 96.848 qs_ks_build_kohn_sham_matrix 7 7.0 0.001 0.001 96.848 96.848 hfx_ks_matrix 7 8.0 0.000 0.000 87.947 87.947 integrate_four_center 7 9.0 1.611 1.611 87.928 87.928 integrate_four_center_main 7 10.0 0.505 0.505 80.153 80.153 integrate_four_center_bin 451 11.0 79.648 79.648 79.648 79.648 scf_env_do_scf_inner_loop 7 4.0 0.001 0.001 55.727 55.727 init_scf_loop 1 4.0 0.000 0.000 45.903 45.903 integrate_four_center_load 7 10.0 0.000 0.000 5.924 5.924 hfx_load_balance 1 11.0 0.001 0.001 5.924 5.924 qs_vxc_create 14 8.0 0.000 0.000 3.093 3.093 xc_vxc_pw_create 14 9.0 
0.124 0.124 3.093 3.093 hfx_load_balance_count 1 12.0 2.956 2.956 2.956 2.956 hfx_load_balance_bin 1 12.0 2.950 2.950 2.950 2.950 calculate_rho_elec 15 7.4 0.117 0.117 2.393 2.393 xc_rho_set_and_dset_create 14 10.0 0.098 0.098 2.355 2.355 admm_mo_calc_rho_aux 7 8.0 0.000 0.000 2.286 2.286 prepare_preconditioner 1 5.0 0.000 0.000 2.233 2.233 make_preconditioner 1 6.0 0.000 0.000 2.233 2.233 ------------------------------------------------------------------------------- From /workspace/artifacts/H2O-hyb_32mpi.out: ------------------------------------------------------------------------------- - - - T I M I N G - - - ------------------------------------------------------------------------------- SUBROUTINE CALLS ASD SELF TIME TOTAL TIME MAXIMUM AVERAGE MAXIMUM AVERAGE MAXIMUM CP2K 1 1.0 0.213 0.233 93.418 93.428 qs_energies 1 2.0 0.000 0.000 93.086 93.092 scf_env_do_scf 1 3.0 0.000 0.000 92.765 92.765 qs_ks_update_qs_env 8 5.0 0.000 0.000 90.947 90.947 rebuild_ks_matrix 7 6.0 0.000 0.000 90.939 90.940 qs_ks_build_kohn_sham_matrix 7 7.0 0.001 0.001 90.939 90.940 hfx_ks_matrix 7 8.0 0.000 0.000 85.846 85.847 integrate_four_center 7 9.0 0.050 0.323 85.837 85.838 integrate_four_center_main 7 10.0 0.003 0.003 78.220 79.352 integrate_four_center_bin 448 11.0 78.217 79.349 78.217 79.349 scf_env_do_scf_inner_loop 7 4.0 0.000 0.001 51.600 51.600 init_scf_loop 1 4.0 0.000 0.000 41.164 41.164 integrate_four_center_load 7 10.0 0.000 0.000 5.802 5.802 hfx_load_balance 1 11.0 0.001 0.001 5.801 5.802 hfx_load_balance_bin 1 12.0 2.771 2.899 2.771 2.899 hfx_load_balance_count 1 12.0 2.771 2.897 2.771 2.897 mp_sync 70 11.3 1.174 2.578 1.174 2.578 qs_vxc_create 14 8.0 0.000 0.000 2.312 2.312 xc_vxc_pw_create 14 9.0 0.006 0.007 2.312 2.312 xc_rho_set_and_dset_create 14 10.0 0.009 0.010 1.881 1.982 ------------------------------------------------------------------------------- Plot: name="H2O-hyb_timings_32omp", title="Timings of H2O-hyb with 32 OpenMP Threads", ylabel="time [s]" 
PlotPoint: plot="H2O-hyb_timings_32omp", name="rest", label="rest", y=15.715999999999994, yerr=0.0
PlotPoint: plot="H2O-hyb_timings_32omp", name="integrate_four_center_bin", label="integrate_four_center_bin", y=79.648, yerr=0.0
PlotPoint: plot="H2O-hyb_timings_32omp", name="hfx_load_balance_count", label="hfx_load_balance_count", y=2.956, yerr=0.0
PlotPoint: plot="H2O-hyb_timings_32omp", name="hfx_load_balance_bin", label="hfx_load_balance_bin", y=2.95, yerr=0.0
PlotPoint: plot="H2O-hyb_timings_32omp", name="integrate_four_center", label="integrate_four_center", y=1.611, yerr=0.0
PlotPoint: plot="H2O-hyb_timings_32omp", name="integrate_four_center_main", label="integrate_four_center_main", y=0.505, yerr=0.0
PlotPoint: plot="H2O-hyb_timings_32omp", name="CP2K", label="CP2K", y=0.188, yerr=0.0
PlotPoint: plot="H2O-hyb_timings_32omp", name="mp_sync", label="mp_sync", y=0.0, yerr=0.0
Plot: name="H2O-hyb_timings_32mpi", title="Timings of H2O-hyb with 32 MPI Ranks", ylabel="time [s]"
PlotPoint: plot="H2O-hyb_timings_32mpi", name="rest", label="rest", y=8.219000000000008, yerr=0.0
PlotPoint: plot="H2O-hyb_timings_32mpi", name="integrate_four_center_bin", label="integrate_four_center_bin", y=78.217, yerr=0.0
PlotPoint: plot="H2O-hyb_timings_32mpi", name="hfx_load_balance_count", label="hfx_load_balance_count", y=2.771, yerr=0.0
PlotPoint: plot="H2O-hyb_timings_32mpi", name="hfx_load_balance_bin", label="hfx_load_balance_bin", y=2.771, yerr=0.0
PlotPoint: plot="H2O-hyb_timings_32mpi", name="integrate_four_center", label="integrate_four_center", y=0.05, yerr=0.0
PlotPoint: plot="H2O-hyb_timings_32mpi", name="integrate_four_center_main", label="integrate_four_center_main", y=0.003, yerr=0.0
PlotPoint: plot="H2O-hyb_timings_32mpi", name="CP2K", label="CP2K", y=0.213, yerr=0.0
PlotPoint: plot="H2O-hyb_timings_32mpi", name="mp_sync", label="mp_sync", y=1.174, yerr=0.0
Running GW_PBE_4benzene.inp with 1 threads and 32 ranks... done.
Running GW_PBE_4benzene.inp with 32 threads and 1 ranks... done. From /workspace/artifacts/GW_PBE_4benzene_32omp.out: ------------------------------------------------------------------------------- - - - T I M I N G - - - ------------------------------------------------------------------------------- SUBROUTINE CALLS ASD SELF TIME TOTAL TIME MAXIMUM AVERAGE MAXIMUM AVERAGE MAXIMUM CP2K 1 1.0 0.013 0.013 76.756 76.756 qs_energies 1 2.0 0.000 0.000 76.383 76.383 mp2_main 1 3.0 0.000 0.000 73.613 73.613 mp2_gpw_main 1 4.0 0.000 0.000 73.508 73.508 rpa_ri_compute_en 1 5.0 0.000 0.000 69.943 69.943 rpa_num_int 1 6.0 0.001 0.001 69.937 69.937 compute_mat_P_omega 1 7.0 0.003 0.003 60.261 60.261 compute_mat_P_omega_contract 10 8.0 8.748 8.748 60.058 60.058 dbt_total 2336 9.6 0.011 0.011 46.413 46.413 dbt_contract 787 11.0 0.034 0.034 39.883 39.883 dbt_tas_total 1149 12.2 0.190 0.190 39.050 39.050 dbt_tas_multiply 807 12.1 0.002 0.002 37.726 37.726 dbt_tas_dbm 807 14.1 0.003 0.003 31.320 31.320 dbm_multiply 807 16.1 31.311 31.311 31.311 31.311 dbt_tas_mm_1N 524 15.1 0.002 0.002 24.324 24.324 compute_mat_P_omega_calc_M_vir 250 9.0 0.001 0.001 23.019 23.019 compute_mat_P_omega_calc_M_occ 250 9.0 8.774 8.774 16.441 16.441 compute_mat_P_omega_calc_P_t 250 9.0 0.001 0.001 6.547 6.547 dbt_tas_mm_2 251 15.0 0.001 0.001 5.556 5.556 dbt_copy 1103 10.7 0.065 0.065 5.190 5.190 compute_QP_energies 1 7.0 0.000 0.000 4.869 4.869 compute_self_energy_cubic_gw 1 8.0 0.053 0.053 4.868 4.868 contract_cubic_gw 21 9.0 0.000 0.000 3.898 3.898 mp2_ri_gpw_compute_in 1 5.0 0.001 0.001 3.558 3.558 dbt_tas_reserve_blocks_index 3261 14.3 0.139 0.139 3.168 3.168 dbm_reserve_blocks 3628 15.3 3.091 3.091 3.091 3.091 scf_env_do_scf 1 3.0 0.000 0.000 2.667 2.667 scf_env_do_scf_inner_loop 17 4.0 0.002 0.002 2.667 2.667 dbt_reserve_blocks_index 2280 13.1 0.051 0.051 2.385 2.385 convert_to_new_pgrid 2421 14.1 0.099 0.099 2.379 2.379 dbt_reserve_blocks_index_array 2222 12.2 0.010 0.010 2.342 2.342 dbm_copy 
1614 15.1 2.280 2.280 2.280 2.280 dbt_tas_copy 574 11.4 1.280 1.280 2.101 2.101 dbt_crop 1042 12.0 1.320 1.320 2.052 2.052 rpa_num_int_RPA_matrix_operati 10 7.0 0.000 0.000 2.008 2.008 dbt_tas_reshape 367 15.0 0.006 0.006 2.006 2.006 compute_W_cubic_GW 10 7.0 0.004 0.004 1.975 1.975 get_2c_integrals 1 6.0 0.000 0.000 1.748 1.748 dbt_reshape 278 11.9 0.920 0.920 1.726 1.726 ------------------------------------------------------------------------------- From /workspace/artifacts/GW_PBE_4benzene_32mpi.out: ------------------------------------------------------------------------------- - - - T I M I N G - - - ------------------------------------------------------------------------------- SUBROUTINE CALLS ASD SELF TIME TOTAL TIME MAXIMUM AVERAGE MAXIMUM AVERAGE MAXIMUM CP2K 1 1.0 0.006 0.020 29.842 29.852 qs_energies 1 2.0 0.000 0.001 29.660 29.676 mp2_main 1 3.0 0.000 0.000 28.732 28.748 mp2_gpw_main 1 4.0 0.000 0.000 28.664 28.680 rpa_ri_compute_en 1 5.0 0.000 0.000 27.409 27.425 rpa_num_int 1 6.0 0.000 0.002 27.409 27.425 dbt_total 2336 9.6 0.010 0.011 24.376 24.377 compute_mat_P_omega 1 7.0 0.001 0.005 23.396 23.414 compute_mat_P_omega_contract 10 8.0 0.382 0.407 23.281 23.288 dbt_contract 787 11.0 0.024 0.025 18.415 18.417 dbt_tas_total 1149 12.2 0.047 0.057 16.572 16.573 dbt_tas_multiply 807 12.1 0.002 0.002 16.519 16.521 dbt_tas_dbm 807 14.1 0.003 0.003 12.292 12.303 dbm_multiply 807 16.1 9.600 10.303 9.600 10.303 compute_mat_P_omega_calc_P_t 250 9.0 0.001 0.001 7.075 7.075 compute_mat_P_omega_calc_M_occ 250 9.0 0.368 0.393 6.800 6.800 dbt_tas_mm_2 251 15.0 0.001 0.001 5.885 5.888 mp_sync 8706 11.6 4.331 5.486 4.331 5.486 dbt_copy 1111 10.7 0.011 0.012 5.234 5.455 dbt_reshape 1098 11.7 2.006 2.167 4.994 5.213 compute_mat_P_omega_calc_M_vir 250 9.0 0.001 0.001 4.949 4.949 dbt_tas_mm_1N 524 15.1 0.001 0.001 4.360 4.692 compute_QP_energies 1 7.0 0.000 0.000 2.516 2.517 compute_self_energy_cubic_gw 1 8.0 0.002 0.003 2.515 2.516 mp_waitall_2 3776 15.3 2.349 2.491 
2.349 2.491 dbt_communicate_buffer 1098 12.7 0.052 0.058 2.384 2.486 contract_cubic_gw 21 9.0 0.000 0.000 2.006 2.006 dbt_reserve_blocks_index 2849 13.1 0.061 0.066 1.451 1.705 dbt_reserve_blocks_index_array 2791 12.2 0.009 0.011 1.443 1.693 dbt_tas_reserve_blocks_index 3300 14.5 0.111 0.120 1.422 1.675 dbm_reserve_blocks 3696 15.4 1.399 1.656 1.399 1.656 dbt_crop 1042 12.0 0.851 0.959 1.303 1.464 mp2_ri_gpw_compute_in 1 5.0 0.000 0.001 1.253 1.254 dbt_tas_replicate 396 14.1 0.522 0.682 1.069 1.125 convert_to_new_pgrid 2421 14.1 0.023 0.028 0.737 0.925 compute_mat_P_omega_copy_M_vir 250 9.0 0.001 0.001 0.911 0.913 parallel_gemm_fm 105 8.4 0.000 0.000 0.891 0.901 parallel_gemm_fm_cosma 105 9.4 0.891 0.901 0.891 0.901 dbm_copy 1608 15.1 0.708 0.896 0.708 0.896 scf_env_do_scf 1 3.0 0.000 0.000 0.889 0.889 scf_env_do_scf_inner_loop 17 4.0 0.000 0.003 0.889 0.889 compute_mat_P_omega_copy_M_occ 250 9.0 0.001 0.001 0.859 0.862 dbm_add 807 14.1 0.602 0.705 0.602 0.705 compute_W_cubic_GW 10 7.0 0.001 0.001 0.670 0.676 mp_max_i 1994 9.8 0.479 0.644 0.479 0.644 mp_bcast_im 6 9.7 0.589 0.610 0.589 0.610 ------------------------------------------------------------------------------- Plot: name="GW_PBE_4benzene_timings_32omp", title="Timings of GW_PBE_4benzene with 32 OpenMP Threads", ylabel="time [s]" PlotPoint: plot="GW_PBE_4benzene_timings_32omp", name="rest", label="rest", y=21.631999999999998, yerr=0.0 PlotPoint: plot="GW_PBE_4benzene_timings_32omp", name="dbm_multiply", label="dbm_multiply", y=31.311, yerr=0.0 PlotPoint: plot="GW_PBE_4benzene_timings_32omp", name="compute_mat_P_omega_calc_M_occ", label="compute_mat_P_omega_calc_M_occ", y=8.774, yerr=0.0 PlotPoint: plot="GW_PBE_4benzene_timings_32omp", name="compute_mat_P_omega_contract", label="compute_mat_P_omega_contract", y=8.748, yerr=0.0 PlotPoint: plot="GW_PBE_4benzene_timings_32omp", name="dbm_reserve_blocks", label="dbm_reserve_blocks", y=3.091, yerr=0.0 PlotPoint: plot="GW_PBE_4benzene_timings_32omp", 
name="dbm_copy", label="dbm_copy", y=2.28, yerr=0.0
PlotPoint: plot="GW_PBE_4benzene_timings_32omp", name="dbt_reshape", label="dbt_reshape", y=0.92, yerr=0.0
PlotPoint: plot="GW_PBE_4benzene_timings_32omp", name="mp_waitall_2", label="mp_waitall_2", y=0.0, yerr=0.0
PlotPoint: plot="GW_PBE_4benzene_timings_32omp", name="mp_sync", label="mp_sync", y=0.0, yerr=0.0
Plot: name="GW_PBE_4benzene_timings_32mpi", title="Timings of GW_PBE_4benzene with 32 MPI Ranks", ylabel="time [s]"
PlotPoint: plot="GW_PBE_4benzene_timings_32mpi", name="rest", label="rest", y=8.699000000000002, yerr=0.0
PlotPoint: plot="GW_PBE_4benzene_timings_32mpi", name="dbm_multiply", label="dbm_multiply", y=9.6, yerr=0.0
PlotPoint: plot="GW_PBE_4benzene_timings_32mpi", name="compute_mat_P_omega_calc_M_occ", label="compute_mat_P_omega_calc_M_occ", y=0.368, yerr=0.0
PlotPoint: plot="GW_PBE_4benzene_timings_32mpi", name="compute_mat_P_omega_contract", label="compute_mat_P_omega_contract", y=0.382, yerr=0.0
PlotPoint: plot="GW_PBE_4benzene_timings_32mpi", name="dbm_reserve_blocks", label="dbm_reserve_blocks", y=1.399, yerr=0.0
PlotPoint: plot="GW_PBE_4benzene_timings_32mpi", name="dbm_copy", label="dbm_copy", y=0.708, yerr=0.0
PlotPoint: plot="GW_PBE_4benzene_timings_32mpi", name="dbt_reshape", label="dbt_reshape", y=2.006, yerr=0.0
PlotPoint: plot="GW_PBE_4benzene_timings_32mpi", name="mp_waitall_2", label="mp_waitall_2", y=2.349, yerr=0.0
PlotPoint: plot="GW_PBE_4benzene_timings_32mpi", name="mp_sync", label="mp_sync", y=4.331, yerr=0.0
Running RI-HFX_H2O-32.inp with 1 threads and 32 ranks... done.
Running RI-HFX_H2O-32.inp with 32 threads and 1 ranks... done.
From /workspace/artifacts/RI-HFX_H2O-32_32omp.out: ------------------------------------------------------------------------------- - - - T I M I N G - - - ------------------------------------------------------------------------------- SUBROUTINE CALLS ASD SELF TIME TOTAL TIME MAXIMUM AVERAGE MAXIMUM AVERAGE MAXIMUM CP2K 1 1.0 0.019 0.019 298.510 298.510 qs_forces 1 2.0 0.000 0.000 297.956 297.956 rebuild_ks_matrix 7 6.6 0.000 0.000 296.458 296.458 qs_ks_build_kohn_sham_matrix 7 7.6 0.001 0.001 296.458 296.458 hfx_ks_matrix 7 8.6 0.000 0.000 294.630 294.630 hfx_ri_update_ks 7 9.6 0.000 0.000 257.974 257.974 hfx_ri_update_ks_Pmat 7 10.6 31.407 31.407 257.969 257.969 dbt_total 841 11.0 0.005 0.005 241.120 241.120 qs_energies 1 3.0 0.000 0.000 229.969 229.969 scf_env_do_scf 1 4.0 0.000 0.000 229.612 229.612 qs_ks_update_qs_env 8 6.0 0.000 0.000 228.521 228.521 dbt_contract 207 12.4 0.027 0.027 223.376 223.376 dbt_tas_total 375 13.4 1.533 1.533 223.052 223.052 dbt_tas_multiply 216 13.5 0.001 0.001 218.729 218.729 dbt_tas_dbm 216 15.5 0.001 0.001 207.953 207.953 dbm_multiply 216 17.5 207.950 207.950 207.950 207.950 hfx_ri_update_ks_Pmat_KS 63 11.6 0.000 0.000 203.234 203.234 dbt_tas_mm_2 91 16.5 0.001 0.001 195.054 195.054 scf_env_do_scf_inner_loop 6 5.0 0.001 0.001 150.482 150.482 init_scf_loop 2 5.0 0.000 0.000 79.127 79.127 qs_ks_update_qs_env_forces 1 3.0 0.000 0.000 67.940 67.940 hfx_ri_update_forces 1 7.0 1.591 1.591 36.653 36.653 hfx_ri_forces_Pmat_3c 1 8.0 4.672 4.672 21.231 21.231 dbt_copy 409 11.7 0.036 0.036 13.413 13.413 precalc_derivatives 1 8.0 2.121 2.121 11.951 11.951 dbt_reshape 132 13.2 5.848 5.848 9.467 9.467 hfx_ri_pre_scf_Pmat 1 12.0 0.000 0.000 9.203 9.203 dbt_tas_mm_3T 77 17.1 0.000 0.000 8.884 8.884 build_3c_derivatives 3 9.0 2.579 2.579 6.773 6.773 hfx_ri_update_ks_Pmat_Px3C 63 11.6 0.000 0.000 6.721 6.721 dbt_tas_reserve_blocks_index 1287 15.4 0.273 0.273 6.139 6.139 dbm_reserve_blocks 1461 16.2 6.056 6.056 6.056 6.056 
------------------------------------------------------------------------------- From /workspace/artifacts/RI-HFX_H2O-32_32mpi.out: ------------------------------------------------------------------------------- - - - T I M I N G - - - ------------------------------------------------------------------------------- SUBROUTINE CALLS ASD SELF TIME TOTAL TIME MAXIMUM AVERAGE MAXIMUM AVERAGE MAXIMUM CP2K 1 1.0 0.005 0.024 42.368 42.377 qs_forces 1 2.0 0.000 0.000 42.249 42.250 rebuild_ks_matrix 7 6.6 0.000 0.000 41.609 41.610 qs_ks_build_kohn_sham_matrix 7 7.6 0.001 0.002 41.609 41.610 hfx_ks_matrix 7 8.6 0.000 0.000 40.702 40.713 dbt_total 841 11.0 0.005 0.005 36.059 36.067 dbt_contract 207 12.4 0.019 0.020 28.536 28.565 dbt_tas_total 375 13.4 0.052 0.159 26.505 26.505 dbt_tas_multiply 216 13.5 0.001 0.001 25.559 25.559 hfx_ri_update_ks 7 9.6 0.000 0.000 24.030 24.030 hfx_ri_update_ks_Pmat 7 10.6 1.170 1.284 24.027 24.029 qs_energies 1 3.0 0.000 0.000 22.604 22.604 scf_env_do_scf 1 4.0 0.000 0.001 22.464 22.465 qs_ks_update_qs_env 8 6.0 0.000 0.000 21.975 21.976 dbt_tas_dbm 216 15.5 0.001 0.001 20.323 20.327 qs_ks_update_qs_env_forces 1 3.0 0.000 0.000 19.636 19.636 dbm_multiply 216 17.5 17.784 19.234 17.784 19.234 hfx_ri_update_forces 1 7.0 0.053 0.057 16.671 16.683 scf_env_do_scf_inner_loop 6 5.0 0.000 0.001 12.651 12.651 hfx_ri_forces_Pmat_3c 1 8.0 0.157 0.169 12.514 12.529 hfx_ri_update_ks_Pmat_KS 63 11.6 0.001 0.001 11.545 11.545 init_scf_loop 2 5.0 0.000 0.000 9.813 9.813 dbt_tas_mm_2 91 16.5 0.001 0.001 9.588 9.588 mp_sync 2909 12.8 4.732 7.448 4.732 7.448 dbt_copy 421 11.8 0.009 0.010 6.113 6.392 dbt_tas_mm_3T 77 17.1 0.000 0.000 5.201 5.901 hfx_ri_update_ks_Pmat_Px3C 63 11.6 0.000 0.000 4.634 4.634 dbt_reshape 252 12.8 2.125 2.235 4.205 4.353 dbt_tas_mm_3N 37 15.4 0.000 0.000 3.808 3.995 precalc_derivatives 1 8.0 0.077 0.087 3.158 3.158 hfx_ri_pre_scf_Pmat 1 12.0 0.000 0.000 3.049 3.049 dbm_reserve_blocks 1477 16.3 2.344 2.723 2.344 2.723 
dbt_tas_reserve_blocks_index 1302 15.5 0.212 0.221 2.340 2.694 dbt_crop 372 13.7 1.653 1.754 2.236 2.447 mp_waitall_2 1204 16.3 2.139 2.249 2.139 2.249 dbt_reserve_blocks_index 938 14.4 0.088 0.096 1.836 2.110 dbt_reserve_blocks_index_array 915 13.4 0.004 0.005 1.814 2.088 build_3c_derivatives 3 9.0 0.219 0.235 1.741 1.753 dbt_tas_replicate 175 15.2 0.628 0.668 1.584 1.689 hfx_ri_pre_scf_Pmat_RIx3C 9 13.0 0.000 0.000 1.623 1.625 convert_to_new_pgrid 648 15.5 0.029 0.062 1.215 1.505 dbt_communicate_buffer 252 13.8 0.011 0.012 1.422 1.489 dbt_tas_copy 169 12.8 0.732 0.788 1.307 1.454 dbm_copy 452 16.3 1.044 1.346 1.044 1.346 hfx_ri_update_ks_Pmat_copy_2 63 11.6 0.000 0.000 1.312 1.321 dbt_tas_communicate_buffer 352 16.4 0.013 0.014 0.935 1.003 ------------------------------------------------------------------------------- Plot: name="RI-HFX_H2O-32_timings_32omp", title="Timings of RI-HFX_H2O-32 with 32 OpenMP Threads", ylabel="time [s]" PlotPoint: plot="RI-HFX_H2O-32_timings_32omp", name="rest", label="rest", y=42.57699999999997, yerr=0.0 PlotPoint: plot="RI-HFX_H2O-32_timings_32omp", name="dbm_multiply", label="dbm_multiply", y=207.95, yerr=0.0 PlotPoint: plot="RI-HFX_H2O-32_timings_32omp", name="hfx_ri_update_ks_Pmat", label="hfx_ri_update_ks_Pmat", y=31.407, yerr=0.0 PlotPoint: plot="RI-HFX_H2O-32_timings_32omp", name="dbm_reserve_blocks", label="dbm_reserve_blocks", y=6.056, yerr=0.0 PlotPoint: plot="RI-HFX_H2O-32_timings_32omp", name="dbt_reshape", label="dbt_reshape", y=5.848, yerr=0.0 PlotPoint: plot="RI-HFX_H2O-32_timings_32omp", name="hfx_ri_forces_Pmat_3c", label="hfx_ri_forces_Pmat_3c", y=4.672, yerr=0.0 PlotPoint: plot="RI-HFX_H2O-32_timings_32omp", name="mp_sync", label="mp_sync", y=0.0, yerr=0.0 PlotPoint: plot="RI-HFX_H2O-32_timings_32omp", name="mp_waitall_2", label="mp_waitall_2", y=0.0, yerr=0.0 Plot: name="RI-HFX_H2O-32_timings_32mpi", title="Timings of RI-HFX_H2O-32 with 32 MPI Ranks", ylabel="time [s]" PlotPoint: 
plot="RI-HFX_H2O-32_timings_32mpi", name="rest", label="rest", y=11.917000000000002, yerr=0.0
PlotPoint: plot="RI-HFX_H2O-32_timings_32mpi", name="dbm_multiply", label="dbm_multiply", y=17.784, yerr=0.0
PlotPoint: plot="RI-HFX_H2O-32_timings_32mpi", name="hfx_ri_update_ks_Pmat", label="hfx_ri_update_ks_Pmat", y=1.17, yerr=0.0
PlotPoint: plot="RI-HFX_H2O-32_timings_32mpi", name="dbm_reserve_blocks", label="dbm_reserve_blocks", y=2.344, yerr=0.0
PlotPoint: plot="RI-HFX_H2O-32_timings_32mpi", name="dbt_reshape", label="dbt_reshape", y=2.125, yerr=0.0
PlotPoint: plot="RI-HFX_H2O-32_timings_32mpi", name="hfx_ri_forces_Pmat_3c", label="hfx_ri_forces_Pmat_3c", y=0.157, yerr=0.0
PlotPoint: plot="RI-HFX_H2O-32_timings_32mpi", name="mp_sync", label="mp_sync", y=4.732, yerr=0.0
PlotPoint: plot="RI-HFX_H2O-32_timings_32mpi", name="mp_waitall_2", label="mp_waitall_2", y=2.139, yerr=0.0
Running RI-MP2_ammonia.inp with 1 threads and 32 ranks... done.
Running RI-MP2_ammonia.inp with 32 threads and 1 ranks... done.
From /workspace/artifacts/RI-MP2_ammonia_32omp.out: ------------------------------------------------------------------------------- - - - T I M I N G - - - ------------------------------------------------------------------------------- SUBROUTINE CALLS ASD SELF TIME TOTAL TIME MAXIMUM AVERAGE MAXIMUM AVERAGE MAXIMUM CP2K 1 1.0 0.014 0.014 183.667 183.667 qs_energies 1 2.0 0.000 0.000 183.487 183.487 mp2_main 1 3.0 0.000 0.000 178.633 178.633 mp2_gpw_main 1 4.0 0.001 0.001 178.225 178.225 mp2_ri_gpw_compute_in 1 5.0 0.378 0.378 130.075 130.075 mp2_ri_gpw_compute_in_loop 1 6.0 0.010 0.010 118.812 118.812 mp2_eri_3c_integrate_gpw 2656 7.0 0.013 0.013 88.968 88.968 integrate_v_rspace 2666 8.0 0.604 0.604 75.961 75.961 grid_integrate_task_list 2666 9.0 73.418 73.418 73.418 73.418 mp2_ri_gpw_compute_en 1 5.0 0.088 0.088 48.125 48.125 mp2_ri_gpw_compute_en_RI_loop 1 6.0 9.943 9.943 46.308 46.308 mp2_ri_gpw_compute_en_expansio 2080 7.0 2.106 2.106 28.632 28.632 local_gemm 2080 8.0 26.526 26.526 26.526 26.526 dbcsr_multiply_generic 5322 8.0 0.172 0.172 20.627 20.627 ao_to_mo_and_store_B_mult_1 2656 7.0 0.010 0.010 20.595 20.595 calculate_wavefunction 2656 8.0 8.254 8.254 11.713 11.713 pw_transfer 63872 10.6 1.020 1.020 11.496 11.496 get_2c_integrals 1 6.0 0.000 0.000 10.883 10.883 multiply_cannon 5322 9.0 0.430 0.430 10.380 10.380 fft_wrap_pw1pw2 53228 11.4 0.104 0.104 10.265 10.265 compute_2c_integrals 1 7.0 0.006 0.006 10.027 10.027 compute_2c_integrals_loop_lm 1 8.0 0.011 0.011 10.008 10.008 mp2_eri_2c_integrate_gpw 1 9.0 3.367 3.367 9.996 9.996 ao_to_mo_and_store_B_E_Ex_1 2656 7.0 2.203 2.203 9.147 9.147 multiply_cannon_loop 5322 10.0 0.121 0.121 9.044 9.044 make_m2s 10644 9.0 0.062 0.062 7.957 7.957 make_images 10644 10.0 3.183 3.183 7.637 7.637 multiply_cannon_multrec 5322 11.0 7.569 7.569 7.608 7.608 copy_dbcsr_to_fm 2679 8.0 0.028 0.028 7.505 7.505 fft_wrap_pw1pw2_20 21271 12.4 0.453 0.453 7.352 7.352 fft3d_s 53229 13.4 6.321 6.321 6.356 6.356 
mp2_ri_gpw_compute_en_ener 2080 7.0 5.876 5.876 5.876 5.876 dbcsr_complete_redistribute 2689 9.0 1.189 1.189 5.743 5.743 dbcsr_finalize 10708 9.5 0.218 0.218 5.188 5.188 dbcsr_merge_all 8011 10.3 3.434 3.434 4.498 4.498 scf_env_do_scf 1 3.0 0.000 0.000 4.450 4.450 scf_env_do_scf_inner_loop 10 4.0 0.001 0.001 4.450 4.450 potential_pw2rs 5322 10.0 0.143 0.143 3.841 3.841 ------------------------------------------------------------------------------- From /workspace/artifacts/RI-MP2_ammonia_32mpi.out: ------------------------------------------------------------------------------- - - - T I M I N G - - - ------------------------------------------------------------------------------- SUBROUTINE CALLS ASD SELF TIME TOTAL TIME MAXIMUM AVERAGE MAXIMUM AVERAGE MAXIMUM CP2K 1 1.0 0.005 0.022 33.760 33.771 qs_energies 1 2.0 0.001 0.012 33.696 33.696 mp2_main 1 3.0 0.000 0.000 31.480 31.481 mp2_gpw_main 1 4.0 0.001 0.001 31.364 31.365 mp2_ri_gpw_compute_in 1 5.0 0.052 0.053 16.717 16.893 mp2_ri_gpw_compute_in_loop 1 6.0 0.001 0.001 15.557 15.736 mp2_ri_gpw_compute_en 1 5.0 0.141 0.149 14.560 14.729 mp2_ri_gpw_compute_en_RI_loop 1 6.0 0.748 0.880 13.754 13.758 mp2_eri_3c_integrate_gpw 83 7.0 0.001 0.001 13.613 13.679 integrate_v_rspace 93 8.1 0.097 0.105 13.547 13.620 grid_integrate_task_list 93 9.1 13.198 13.272 13.198 13.272 mp2_ri_gpw_compute_en_expansio 65 7.0 0.099 0.127 10.328 10.520 local_gemm 65 8.0 10.229 10.398 10.229 10.398 mp2_ri_gpw_compute_en_comm 17 7.0 0.066 0.093 2.339 2.788 mp_sendrecv_dm3 1054 8.0 1.836 2.449 1.836 2.449 scf_env_do_scf 1 3.0 0.000 0.000 2.098 2.100 scf_env_do_scf_inner_loop 10 4.0 0.001 0.009 2.098 2.100 dbcsr_multiply_generic 176 8.0 0.007 0.008 1.673 1.851 ao_to_mo_and_store_B_mult_1 83 7.0 0.001 0.001 1.654 1.829 get_2c_integrals 1 6.0 0.000 0.000 1.092 1.110 multiply_cannon 176 9.0 0.014 0.017 1.000 1.062 qs_scf_new_mos 10 5.0 0.000 0.000 1.018 1.027 multiply_cannon_loop 176 10.0 0.002 0.002 0.944 1.007 eigensolver 11 5.8 0.001 0.001 
0.992 0.994 multiply_cannon_multrec 246 11.0 0.817 0.852 0.822 0.857 cp_fm_diag_elpa 11 6.8 0.000 0.000 0.837 0.838 compute_2c_integrals 1 7.0 0.002 0.003 0.815 0.834 cp_fm_redistribute_end 11 7.8 0.312 0.826 0.327 0.831 pw_transfer 2120 10.5 0.044 0.045 0.800 0.813 cp_fm_diag_elpa_base 11 7.8 0.490 0.788 0.501 0.803 fft_wrap_pw1pw2 1768 11.4 0.004 0.004 0.744 0.755 make_m2s 352 9.0 0.003 0.003 0.638 0.751 compute_2c_integrals_loop_lm 1 8.0 0.002 0.003 0.724 0.750 mp2_eri_2c_integrate_gpw 1 9.0 0.196 0.210 0.722 0.749 make_images 352 10.0 0.050 0.052 0.626 0.739 ------------------------------------------------------------------------------- Plot: name="RI-MP2_ammonia_timings_32omp", title="Timings of RI-MP2_ammonia with 32 OpenMP Threads", ylabel="time [s]" PlotPoint: plot="RI-MP2_ammonia_timings_32omp", name="rest", label="rest", y=57.956999999999994, yerr=0.0 PlotPoint: plot="RI-MP2_ammonia_timings_32omp", name="grid_integrate_task_list", label="grid_integrate_task_list", y=73.418, yerr=0.0 PlotPoint: plot="RI-MP2_ammonia_timings_32omp", name="local_gemm", label="local_gemm", y=26.526, yerr=0.0 PlotPoint: plot="RI-MP2_ammonia_timings_32omp", name="mp2_ri_gpw_compute_en_RI_loop", label="mp2_ri_gpw_compute_en_RI_loop", y=9.943, yerr=0.0 PlotPoint: plot="RI-MP2_ammonia_timings_32omp", name="calculate_wavefunction", label="calculate_wavefunction", y=8.254, yerr=0.0 PlotPoint: plot="RI-MP2_ammonia_timings_32omp", name="multiply_cannon_multrec", label="multiply_cannon_multrec", y=7.569, yerr=0.0 PlotPoint: plot="RI-MP2_ammonia_timings_32omp", name="mp_sendrecv_dm3", label="mp_sendrecv_dm3", y=0.0, yerr=0.0 Plot: name="RI-MP2_ammonia_timings_32mpi", title="Timings of RI-MP2_ammonia with 32 MPI Ranks", ylabel="time [s]" PlotPoint: plot="RI-MP2_ammonia_timings_32mpi", name="rest", label="rest", y=6.931999999999999, yerr=0.0 PlotPoint: plot="RI-MP2_ammonia_timings_32mpi", name="grid_integrate_task_list", label="grid_integrate_task_list", y=13.198, yerr=0.0 PlotPoint: 
plot="RI-MP2_ammonia_timings_32mpi", name="local_gemm", label="local_gemm", y=10.229, yerr=0.0 PlotPoint: plot="RI-MP2_ammonia_timings_32mpi", name="mp2_ri_gpw_compute_en_RI_loop", label="mp2_ri_gpw_compute_en_RI_loop", y=0.748, yerr=0.0 PlotPoint: plot="RI-MP2_ammonia_timings_32mpi", name="calculate_wavefunction", label="calculate_wavefunction", y=0.0, yerr=0.0 PlotPoint: plot="RI-MP2_ammonia_timings_32mpi", name="multiply_cannon_multrec", label="multiply_cannon_multrec", y=0.817, yerr=0.0 PlotPoint: plot="RI-MP2_ammonia_timings_32mpi", name="mp_sendrecv_dm3", label="mp_sendrecv_dm3", y=1.836, yerr=0.0 Running diag_cu144_broy.inp with 1 threads and 32 ranks... done. Running diag_cu144_broy.inp with 32 threads and 1 ranks... done. From /workspace/artifacts/diag_cu144_broy_32omp.out: ------------------------------------------------------------------------------- - - - T I M I N G - - - ------------------------------------------------------------------------------- SUBROUTINE CALLS ASD SELF TIME TOTAL TIME MAXIMUM AVERAGE MAXIMUM AVERAGE MAXIMUM CP2K 1 1.0 0.083 0.083 123.952 123.952 qs_energies 1 2.0 0.000 0.000 122.587 122.587 scf_env_do_scf 1 3.0 0.000 0.000 115.876 115.876 scf_env_do_scf_inner_loop 15 4.0 0.002 0.002 115.875 115.875 qs_ks_update_qs_env 15 5.0 0.000 0.000 48.237 48.237 rebuild_ks_matrix 15 6.0 0.000 0.000 48.026 48.026 qs_ks_build_kohn_sham_matrix 15 7.0 0.002 0.002 48.026 48.026 qs_scf_new_mos 15 5.0 0.000 0.000 42.984 42.984 eigensolver 15 6.0 0.001 0.001 35.333 35.333 qs_vxc_create 15 8.0 0.036 0.036 33.005 33.005 calculate_dispersion_nonloc 15 9.0 6.988 6.988 28.779 28.779 cp_fm_diag_elpa 15 7.0 0.000 0.000 23.324 23.324 cp_fm_diag_elpa_base 15 8.0 20.675 20.675 23.324 23.324 pw_transfer 1191 10.0 0.058 0.058 22.386 22.386 fft_wrap_pw1pw2 1086 11.0 0.008 0.008 22.205 22.205 qs_rho_update_rho_low 16 5.0 0.000 0.000 21.807 21.807 calculate_rho_elec 16 6.0 0.223 0.223 21.807 21.807 grid_collocate_task_list 16 7.0 20.435 20.435 20.435 20.435 
fft_wrap_pw1pw2_150 765 12.0 3.565 3.565 16.130 16.130 sum_up_and_integrate 15 8.0 0.110 0.110 13.881 13.881 integrate_v_rspace 15 9.0 0.018 0.018 13.770 13.770 grid_integrate_task_list 15 10.0 13.233 13.233 13.233 13.233 fft3d_s 1087 13.0 9.981 9.981 9.988 9.988 cp_fm_cholesky_restore 45 7.0 9.702 9.702 9.702 9.702 pw_scatter_s 585 13.1 6.738 6.738 6.738 6.738 fft_wrap_pw1pw2_200 197 12.3 0.740 0.740 5.895 5.895 copy_dbcsr_to_fm 16 5.9 0.000 0.000 5.588 5.588 dbcsr_complete_redistribute 46 8.3 2.253 2.253 5.472 5.472 cp_fm_upper_to_full 30 8.0 4.954 4.954 4.954 4.954 vdW_energy 15 10.0 4.364 4.364 4.364 4.364 xc_vxc_pw_create 15 9.0 0.224 0.224 4.191 4.191 gspace_mixing 14 5.0 0.170 0.170 4.052 4.052 broyden_mixing 14 6.0 3.445 3.445 3.446 3.446 init_scf_run 1 3.0 0.000 0.000 3.210 3.210 qs_energies_init_hamiltonians 1 3.0 0.000 0.000 3.039 3.039 xc_pw_derive 90 11.0 0.001 0.001 2.720 2.720 ------------------------------------------------------------------------------- From /workspace/artifacts/diag_cu144_broy_32mpi.out: ------------------------------------------------------------------------------- - - - T I M I N G - - - ------------------------------------------------------------------------------- SUBROUTINE CALLS ASD SELF TIME TOTAL TIME MAXIMUM AVERAGE MAXIMUM AVERAGE MAXIMUM CP2K 1 1.0 0.014 0.029 58.707 58.719 qs_energies 1 2.0 0.000 0.000 58.456 58.462 scf_env_do_scf 1 3.0 0.000 0.001 54.376 54.376 scf_env_do_scf_inner_loop 15 4.0 0.001 0.003 54.376 54.376 qs_ks_update_qs_env 15 5.0 0.000 0.000 22.338 22.346 rebuild_ks_matrix 15 6.0 0.000 0.000 22.305 22.313 qs_ks_build_kohn_sham_matrix 15 7.0 0.002 0.003 22.305 22.313 qs_rho_update_rho_low 16 5.0 0.000 0.000 20.220 20.225 calculate_rho_elec 16 6.0 0.007 0.007 20.220 20.225 grid_collocate_task_list 16 7.0 19.130 19.316 19.130 19.316 sum_up_and_integrate 15 8.0 0.011 0.015 13.470 13.522 integrate_v_rspace 15 9.0 0.001 0.001 13.460 13.516 grid_integrate_task_list 15 10.0 12.793 12.891 12.793 12.891 
qs_scf_new_mos 15 5.0 0.000 0.000 12.543 12.581 eigensolver 15 6.0 0.001 0.002 11.632 11.669 qs_vxc_create 15 8.0 0.001 0.001 8.559 8.569 cp_fm_diag_elpa 15 7.0 0.000 0.000 8.471 8.476 cp_fm_diag_elpa_base 15 8.0 8.319 8.349 8.466 8.466 calculate_dispersion_nonloc 15 9.0 0.898 0.909 6.834 6.860 pw_transfer 1191 10.0 0.080 0.096 6.321 6.386 fft_wrap_pw1pw2 1086 11.0 0.010 0.013 6.165 6.251 fft3d_ps 1086 13.0 2.313 2.688 4.603 4.839 fft_wrap_pw1pw2_150 765 12.0 0.234 0.287 3.892 3.935 cp_fm_cholesky_restore 45 7.0 3.011 3.084 3.011 3.084 qs_energies_init_hamiltonians 1 3.0 0.000 0.000 2.451 2.451 mp_alltoall_z22v 1086 15.0 1.760 2.390 1.760 2.390 build_core_hamiltonian_matrix 1 4.0 0.000 0.000 2.146 2.353 fft_wrap_pw1pw2_200 197 12.3 0.162 0.186 2.158 2.216 xc_vxc_pw_create 15 9.0 0.015 0.021 1.724 1.753 x_to_yz 585 14.1 0.303 0.339 1.228 1.494 init_scf_run 1 3.0 0.000 0.001 1.407 1.408 build_core_ppnl 1 5.0 1.256 1.387 1.256 1.387 vdW_energy 15 10.0 1.239 1.340 1.239 1.340 yz_to_x 501 13.9 0.205 0.298 1.040 1.339 scf_env_initial_rho_setup 1 4.0 0.000 0.000 1.320 1.320 rs_pw_transfer 158 9.4 0.001 0.002 0.983 1.222 xc_pw_derive 90 11.0 0.001 0.001 1.137 1.207 ------------------------------------------------------------------------------- Plot: name="diag_cu144_broy_timings_32omp", title="Timings of diag_cu144_broy with 32 OpenMP Threads", ylabel="time [s]" PlotPoint: plot="diag_cu144_broy_timings_32omp", name="rest", label="rest", y=49.926, yerr=0.0 PlotPoint: plot="diag_cu144_broy_timings_32omp", name="cp_fm_diag_elpa_base", label="cp_fm_diag_elpa_base", y=20.675, yerr=0.0 PlotPoint: plot="diag_cu144_broy_timings_32omp", name="grid_collocate_task_list", label="grid_collocate_task_list", y=20.435, yerr=0.0 PlotPoint: plot="diag_cu144_broy_timings_32omp", name="grid_integrate_task_list", label="grid_integrate_task_list", y=13.233, yerr=0.0 PlotPoint: plot="diag_cu144_broy_timings_32omp", name="fft3d_s", label="fft3d_s", y=9.981, yerr=0.0 PlotPoint: 
plot="diag_cu144_broy_timings_32omp", name="cp_fm_cholesky_restore", label="cp_fm_cholesky_restore", y=9.702, yerr=0.0
PlotPoint: plot="diag_cu144_broy_timings_32omp", name="fft3d_ps", label="fft3d_ps", y=0.0, yerr=0.0
Plot: name="diag_cu144_broy_timings_32mpi", title="Timings of diag_cu144_broy with 32 MPI Ranks", ylabel="time [s]"
PlotPoint: plot="diag_cu144_broy_timings_32mpi", name="rest", label="rest", y=13.140999999999998, yerr=0.0
PlotPoint: plot="diag_cu144_broy_timings_32mpi", name="cp_fm_diag_elpa_base", label="cp_fm_diag_elpa_base", y=8.319, yerr=0.0
PlotPoint: plot="diag_cu144_broy_timings_32mpi", name="grid_collocate_task_list", label="grid_collocate_task_list", y=19.13, yerr=0.0
PlotPoint: plot="diag_cu144_broy_timings_32mpi", name="grid_integrate_task_list", label="grid_integrate_task_list", y=12.793, yerr=0.0
PlotPoint: plot="diag_cu144_broy_timings_32mpi", name="fft3d_s", label="fft3d_s", y=0.0, yerr=0.0
PlotPoint: plot="diag_cu144_broy_timings_32mpi", name="cp_fm_cholesky_restore", label="cp_fm_cholesky_restore", y=3.011, yerr=0.0
PlotPoint: plot="diag_cu144_broy_timings_32mpi", name="fft3d_ps", label="fft3d_ps", y=2.313, yerr=0.0
Running bench_dftb.inp with 1 threads and 32 ranks... done.
Running bench_dftb.inp with 32 threads and 1 ranks... done.
From /workspace/artifacts/bench_dftb_32omp.out: ------------------------------------------------------------------------------- - - - T I M I N G - - - ------------------------------------------------------------------------------- SUBROUTINE CALLS ASD SELF TIME TOTAL TIME MAXIMUM AVERAGE MAXIMUM AVERAGE MAXIMUM CP2K 1 1.0 0.080 0.080 295.333 295.333 qs_energies 1 2.0 0.000 0.000 295.193 295.193 ls_scf 1 3.0 0.000 0.000 293.954 293.954 ls_scf_main 1 4.0 0.002 0.002 284.974 284.974 density_matrix_trs4 11 5.0 0.012 0.012 194.637 194.637 arnoldi_extremal 12 6.1 0.000 0.000 110.708 110.708 arnoldi_normal_ev 12 7.1 0.014 0.014 110.708 110.708 build_subspace 23 8.1 0.075 0.075 108.801 108.801 dbcsr_matrix_vector_mult 652 9.0 0.168 0.168 108.573 108.573 dbcsr_matrix_vector_mult_local 652 10.0 107.126 107.126 107.136 107.136 ls_scf_dm_to_ks 11 5.0 0.000 0.000 85.190 85.190 matrix_ls_to_qs 11 6.0 0.000 0.000 81.813 81.813 dbcsr_multiply_generic 185 6.1 0.832 0.832 72.832 72.832 dbcsr_copy_into_existing 11 7.0 44.671 44.671 44.672 44.672 multiply_cannon 185 7.1 0.278 0.278 44.645 44.645 dbcsr_complete_redistribute 23 7.5 29.860 29.860 40.790 40.790 matrix_decluster 11 7.0 0.000 0.000 37.140 37.140 multiply_cannon_loop 185 8.1 0.228 0.228 32.339 32.339 make_m2s 370 7.1 0.037 0.037 24.015 24.015 make_images 370 8.1 10.441 10.441 22.507 22.507 multiply_cannon_multrec 185 9.1 22.137 22.137 22.163 22.163 dbcsr_finalize 646 7.5 0.209 0.209 14.559 14.559 dbcsr_merge_all 597 8.5 2.209 2.209 13.388 13.388 setup_rec_index_2d 370 8.1 11.941 11.941 11.941 11.941 dbcsr_sort_indices 1103 9.9 10.016 10.016 10.016 10.016 tree_to_linear_d 110 9.4 9.952 9.952 9.952 9.952 calculate_norms 370 9.1 9.948 9.948 9.948 9.948 quick_finalize 395 10.0 0.327 0.327 8.635 8.635 ls_scf_init_scf 1 4.0 0.000 0.000 8.301 8.301 dbcsr_special_finalize 370 9.1 0.002 0.002 7.990 7.990 ls_scf_init_matrix_S 1 5.0 0.000 0.000 7.957 7.957 matrix_sqrt_Newton_Schulz 1 6.0 0.001 0.001 7.314 7.314 
------------------------------------------------------------------------------- From /workspace/artifacts/bench_dftb_32mpi.out: ------------------------------------------------------------------------------- - - - T I M I N G - - - ------------------------------------------------------------------------------- SUBROUTINE CALLS ASD SELF TIME TOTAL TIME MAXIMUM AVERAGE MAXIMUM AVERAGE MAXIMUM CP2K 1 1.0 0.009 0.021 62.974 62.985 qs_energies 1 2.0 0.000 0.000 62.878 62.878 ls_scf 1 3.0 0.000 0.000 62.819 62.820 ls_scf_main 1 4.0 0.000 0.007 60.357 60.359 density_matrix_trs4 11 5.0 0.006 0.018 57.920 58.002 dbcsr_multiply_generic 185 6.1 0.055 0.074 54.911 55.130 multiply_cannon 185 7.1 0.031 0.035 45.698 46.583 multiply_cannon_loop 185 8.1 0.100 0.122 43.444 44.580 multiply_cannon_multrec 1480 9.1 27.016 30.500 27.258 30.753 mp_waitall_1 11936 10.3 14.233 19.958 14.233 19.958 multiply_cannon_metrocomm3 1480 9.1 0.011 0.015 8.139 15.793 multiply_cannon_metrocomm1 1480 9.1 0.006 0.009 3.463 8.186 make_m2s 370 7.1 0.033 0.036 6.427 6.501 make_images 370 8.1 0.617 0.661 6.304 6.380 calculate_norms 2960 9.1 4.427 5.544 4.427 5.544 mp_sum_l 1119 5.6 1.853 3.167 1.853 3.167 make_images_data 370 9.1 0.008 0.012 2.851 3.119 hybrid_alltoall_any 393 9.9 0.174 1.110 2.464 2.722 arnoldi_extremal 12 6.1 0.000 0.000 2.257 2.273 arnoldi_normal_ev 12 7.1 0.001 0.004 2.257 2.273 ls_scf_dm_to_ks 11 5.0 0.000 0.000 2.114 2.238 dbcsr_multiply_generic_mpsum_f 137 7.1 0.000 0.000 1.038 2.190 build_subspace 23 8.1 0.018 0.024 2.169 2.170 dbcsr_complete_redistribute 23 7.5 1.178 1.272 1.860 1.959 dbcsr_matrix_vector_mult 652 9.0 0.009 0.043 1.876 1.916 matrix_ls_to_qs 11 6.0 0.000 0.000 1.815 1.915 ls_scf_init_scf 1 4.0 0.000 0.000 1.892 1.893 ls_scf_init_matrix_S 1 5.0 0.000 0.000 1.866 1.873 make_images_pack 370 9.1 1.590 1.787 1.593 1.790 matrix_decluster 11 7.0 0.000 0.000 1.675 1.775 matrix_sqrt_Newton_Schulz 1 6.0 0.000 0.001 1.706 1.709 dbcsr_matrix_vector_mult_local 652 10.0 1.567 
1.615 1.569 1.617 buffer_matrices_ensure_size 370 8.1 1.194 1.419 1.194 1.419 dbcsr_finalize 646 7.5 0.007 0.008 1.226 1.368 ------------------------------------------------------------------------------- Plot: name="bench_dftb_timings_32omp", title="Timings of bench_dftb with 32 OpenMP Threads", ylabel="time [s]" PlotPoint: plot="bench_dftb_timings_32omp", name="rest", label="rest", y=69.65000000000003, yerr=0.0 PlotPoint: plot="bench_dftb_timings_32omp", name="dbcsr_matrix_vector_mult_local", label="dbcsr_matrix_vector_mult_local", y=107.126, yerr=0.0 PlotPoint: plot="bench_dftb_timings_32omp", name="dbcsr_copy_into_existing", label="dbcsr_copy_into_existing", y=44.671, yerr=0.0 PlotPoint: plot="bench_dftb_timings_32omp", name="dbcsr_complete_redistribute", label="dbcsr_complete_redistribute", y=29.86, yerr=0.0 PlotPoint: plot="bench_dftb_timings_32omp", name="multiply_cannon_multrec", label="multiply_cannon_multrec", y=22.137, yerr=0.0 PlotPoint: plot="bench_dftb_timings_32omp", name="setup_rec_index_2d", label="setup_rec_index_2d", y=11.941, yerr=0.0 PlotPoint: plot="bench_dftb_timings_32omp", name="calculate_norms", label="calculate_norms", y=9.948, yerr=0.0 PlotPoint: plot="bench_dftb_timings_32omp", name="make_images_pack", label="make_images_pack", y=0.0, yerr=0.0 PlotPoint: plot="bench_dftb_timings_32omp", name="mp_waitall_1", label="mp_waitall_1", y=0.0, yerr=0.0 PlotPoint: plot="bench_dftb_timings_32omp", name="mp_sum_l", label="mp_sum_l", y=0.0, yerr=0.0 Plot: name="bench_dftb_timings_32mpi", title="Timings of bench_dftb with 32 MPI Ranks", ylabel="time [s]" PlotPoint: plot="bench_dftb_timings_32mpi", name="rest", label="rest", y=11.109999999999985, yerr=0.0 PlotPoint: plot="bench_dftb_timings_32mpi", name="dbcsr_matrix_vector_mult_local", label="dbcsr_matrix_vector_mult_local", y=1.567, yerr=0.0 PlotPoint: plot="bench_dftb_timings_32mpi", name="dbcsr_copy_into_existing", label="dbcsr_copy_into_existing", y=0.0, yerr=0.0 PlotPoint: 
plot="bench_dftb_timings_32mpi", name="dbcsr_complete_redistribute", label="dbcsr_complete_redistribute", y=1.178, yerr=0.0 PlotPoint: plot="bench_dftb_timings_32mpi", name="multiply_cannon_multrec", label="multiply_cannon_multrec", y=27.016, yerr=0.0 PlotPoint: plot="bench_dftb_timings_32mpi", name="setup_rec_index_2d", label="setup_rec_index_2d", y=0.0, yerr=0.0 PlotPoint: plot="bench_dftb_timings_32mpi", name="calculate_norms", label="calculate_norms", y=4.427, yerr=0.0 PlotPoint: plot="bench_dftb_timings_32mpi", name="make_images_pack", label="make_images_pack", y=1.59, yerr=0.0 PlotPoint: plot="bench_dftb_timings_32mpi", name="mp_waitall_1", label="mp_waitall_1", y=14.233, yerr=0.0 PlotPoint: plot="bench_dftb_timings_32mpi", name="mp_sum_l", label="mp_sum_l", y=1.853, yerr=0.0 Running dbcsr.inp with 1 threads and 32 ranks... done. Running dbcsr.inp with 32 threads and 1 ranks... done. From /workspace/artifacts/dbcsr_32omp.out: ------------------------------------------------------------------------------- - - - T I M I N G - - - ------------------------------------------------------------------------------- SUBROUTINE CALLS ASD SELF TIME TOTAL TIME MAXIMUM AVERAGE MAXIMUM AVERAGE MAXIMUM CP2K 1 1.0 0.007 0.007 69.184 69.184 lib_test 1 2.0 0.000 0.000 69.177 69.177 dbcsr_run_tests 3 3.0 0.002 0.002 69.177 69.177 test_multiplies_multiproc 3 4.0 0.001 0.001 53.633 53.633 dbcsr_redistribute 9 5.0 34.974 34.974 36.543 36.543 dbcsr_multiply_generic 9 5.0 0.001 0.001 15.750 15.750 dbcsr_make_random_matrix 9 4.0 12.375 12.375 15.434 15.434 multiply_cannon 9 6.0 0.001 0.001 11.039 11.039 multiply_cannon_loop 9 7.0 0.015 0.015 10.682 10.682 multiply_cannon_multrec 9 8.0 10.667 10.667 10.668 10.668 dbcsr_finalize 27 5.7 0.015 0.015 5.602 5.602 dbcsr_merge_all 18 6.5 1.963 1.963 4.839 4.839 dbcsr_data_release 975 7.6 2.883 2.883 2.883 2.883 tree_to_linear_d 9 7.0 1.846 1.846 1.846 1.846 make_m2s 18 6.0 0.001 0.001 1.600 1.600 make_images 18 7.0 0.547 0.547 1.554 1.554 
dbcsr_destroy 93 5.8 0.000 0.000 1.407 1.407 ------------------------------------------------------------------------------- From /workspace/artifacts/dbcsr_32mpi.out: ------------------------------------------------------------------------------- - - - T I M I N G - - - ------------------------------------------------------------------------------- SUBROUTINE CALLS ASD SELF TIME TOTAL TIME MAXIMUM AVERAGE MAXIMUM AVERAGE MAXIMUM CP2K 1 1.0 0.003 0.011 16.998 17.003 lib_test 1 2.0 0.000 0.000 16.971 16.989 dbcsr_run_tests 3 3.0 0.000 0.001 16.970 16.989 test_multiplies_multiproc 3 4.0 0.000 0.002 16.144 16.192 dbcsr_multiply_generic 9 5.0 0.001 0.001 14.850 14.941 multiply_cannon 9 6.0 0.001 0.002 13.201 13.441 multiply_cannon_loop 9 7.0 0.002 0.002 12.921 13.141 multiply_cannon_multrec 72 8.0 10.778 11.450 10.779 11.451 mp_waitall_1 576 9.2 2.454 3.400 2.454 3.400 multiply_cannon_metrocomm1 72 8.0 0.001 0.001 1.893 2.631 dbcsr_data_release 444 7.6 0.751 0.858 0.751 0.858 dbcsr_make_random_matrix 9 4.0 0.659 0.665 0.795 0.821 dbcsr_finalize 27 5.7 0.000 0.000 0.677 0.793 mp_sum_l 390 2.5 0.355 0.756 0.355 0.756 dbcsr_multiply_generic_mpsum_f 9 6.0 0.000 0.000 0.352 0.752 dbcsr_destroy 111 5.9 0.000 0.000 0.601 0.723 make_m2s 18 6.0 0.001 0.001 0.648 0.681 make_images 18 7.0 0.020 0.021 0.644 0.678 dbcsr_merge_all 18 6.5 0.098 0.126 0.525 0.602 multiply_cannon_metrocomm3 72 8.0 0.000 0.000 0.243 0.551 make_images_data 18 8.0 0.000 0.001 0.348 0.443 dbcsr_redistribute 9 5.0 0.228 0.277 0.386 0.418 hybrid_alltoall_any 18 9.0 0.028 0.129 0.308 0.389 dbcsr_data_copy_aa2 18 7.5 0.315 0.382 0.315 0.382 ------------------------------------------------------------------------------- Plot: name="dbcsr_timings_32omp", title="Timings of dbcsr with 32 OpenMP Threads", ylabel="time [s]" PlotPoint: plot="dbcsr_timings_32omp", name="rest", label="rest", y=6.321999999999996, yerr=0.0 PlotPoint: plot="dbcsr_timings_32omp", name="dbcsr_redistribute", label="dbcsr_redistribute", 
y=34.974, yerr=0.0 PlotPoint: plot="dbcsr_timings_32omp", name="dbcsr_make_random_matrix", label="dbcsr_make_random_matrix", y=12.375, yerr=0.0 PlotPoint: plot="dbcsr_timings_32omp", name="multiply_cannon_multrec", label="multiply_cannon_multrec", y=10.667, yerr=0.0 PlotPoint: plot="dbcsr_timings_32omp", name="dbcsr_data_release", label="dbcsr_data_release", y=2.883, yerr=0.0 PlotPoint: plot="dbcsr_timings_32omp", name="dbcsr_merge_all", label="dbcsr_merge_all", y=1.963, yerr=0.0 PlotPoint: plot="dbcsr_timings_32omp", name="mp_waitall_1", label="mp_waitall_1", y=0.0, yerr=0.0 PlotPoint: plot="dbcsr_timings_32omp", name="mp_sum_l", label="mp_sum_l", y=0.0, yerr=0.0 Plot: name="dbcsr_timings_32mpi", title="Timings of dbcsr with 32 MPI Ranks", ylabel="time [s]" PlotPoint: plot="dbcsr_timings_32mpi", name="rest", label="rest", y=1.674999999999999, yerr=0.0 PlotPoint: plot="dbcsr_timings_32mpi", name="dbcsr_redistribute", label="dbcsr_redistribute", y=0.228, yerr=0.0 PlotPoint: plot="dbcsr_timings_32mpi", name="dbcsr_make_random_matrix", label="dbcsr_make_random_matrix", y=0.659, yerr=0.0 PlotPoint: plot="dbcsr_timings_32mpi", name="multiply_cannon_multrec", label="multiply_cannon_multrec", y=10.778, yerr=0.0 PlotPoint: plot="dbcsr_timings_32mpi", name="dbcsr_data_release", label="dbcsr_data_release", y=0.751, yerr=0.0 PlotPoint: plot="dbcsr_timings_32mpi", name="dbcsr_merge_all", label="dbcsr_merge_all", y=0.098, yerr=0.0 PlotPoint: plot="dbcsr_timings_32mpi", name="mp_waitall_1", label="mp_waitall_1", y=2.454, yerr=0.0 PlotPoint: plot="dbcsr_timings_32mpi", name="mp_sum_l", label="mp_sum_l", y=0.355, yerr=0.0 Running MQAE_single_node.inp with 1 threads and 32 ranks... done. Running MQAE_single_node.inp with 32 threads and 1 ranks... done. 
From /workspace/artifacts/MQAE_single_node_32omp.out: ------------------------------------------------------------------------------- - - - T I M I N G - - - ------------------------------------------------------------------------------- SUBROUTINE CALLS ASD SELF TIME TOTAL TIME MAXIMUM AVERAGE MAXIMUM AVERAGE MAXIMUM CP2K 1 1.0 0.054 0.054 127.287 127.287 qs_mol_dyn_low 1 2.0 0.003 0.003 125.949 125.949 velocity_verlet 5 3.0 0.003 0.003 102.930 102.930 qmmm_el_coupling 6 3.8 0.000 0.000 83.797 83.797 qmmm_elec_with_gaussian 6 4.8 0.011 0.011 83.793 83.793 qmmm_elec_with_gaussian_low 6 5.8 0.000 0.000 83.174 83.174 qmmm_elec_gaussian_low_G 6 6.8 82.266 82.266 82.266 82.266 qs_forces 6 3.8 0.000 0.000 33.810 33.810 qs_energies 6 4.8 0.000 0.000 30.017 30.017 scf_env_do_scf 6 5.8 0.001 0.001 27.794 27.794 scf_env_do_scf_inner_loop 39 6.8 0.004 0.004 24.055 24.055 rebuild_ks_matrix 45 8.4 0.000 0.000 23.158 23.158 qs_ks_build_kohn_sham_matrix 45 9.4 0.005 0.005 23.158 23.158 qs_ks_update_qs_env 45 7.8 0.000 0.000 19.785 19.785 pw_transfer 966 12.3 0.049 0.049 16.569 16.569 fft_wrap_pw1pw2 801 13.6 0.006 0.006 16.367 16.367 fft_wrap_pw1pw2_150 507 15.2 2.236 2.236 15.967 15.967 qs_vxc_create 45 10.4 0.001 0.001 12.563 12.563 xc_vxc_pw_create 45 11.4 0.646 0.646 12.563 12.563 xc_pw_derive 270 13.4 0.002 0.002 8.938 8.938 fft3d_s 802 15.6 7.635 7.635 7.643 7.643 qs_rho_update_rho_low 45 7.9 0.000 0.000 6.881 6.881 calculate_rho_elec 45 8.9 0.563 0.563 6.881 6.881 xc_rho_set_and_dset_create 45 12.4 0.538 0.538 6.377 6.377 xc_pw_divergence 45 12.4 0.001 0.001 5.494 5.494 qmmm_forces 6 3.8 0.002 0.002 5.270 5.270 pw_scatter_s 429 15.8 5.243 5.243 5.243 5.243 qmmm_forces_with_gaussian 6 4.8 0.029 0.029 4.948 4.948 pw_integral_ab 2539 7.4 4.470 4.470 4.470 4.470 qmmm_force_with_gaussian_low 6 5.8 0.000 0.000 4.219 4.219 qs_ks_ddapc 45 10.4 0.001 0.001 4.209 4.209 init_scf_loop 6 6.8 0.000 0.000 3.734 3.734 qmmm_forces_gaussian_low_G 6 6.8 3.531 3.531 3.531 3.531 
qs_ks_update_qs_env_forces 6 4.8 0.000 0.000 3.380 3.380 sum_up_and_integrate 45 10.4 0.330 0.330 3.304 3.304 grid_collocate_task_list 45 9.9 3.161 3.161 3.161 3.161 density_rs2pw 45 9.9 0.001 0.001 3.157 3.157 integrate_v_rspace 45 11.4 0.007 0.007 2.974 2.974 cp_ddapc_apply_CD 45 11.4 0.004 0.004 2.582 2.582 ------------------------------------------------------------------------------- From /workspace/artifacts/MQAE_single_node_32mpi.out: ------------------------------------------------------------------------------- - - - T I M I N G - - - ------------------------------------------------------------------------------- SUBROUTINE CALLS ASD SELF TIME TOTAL TIME MAXIMUM AVERAGE MAXIMUM AVERAGE MAXIMUM CP2K 1 1.0 0.033 0.050 51.792 51.803 qs_mol_dyn_low 1 2.0 0.003 0.004 50.815 50.872 qs_forces 6 3.8 0.001 0.001 36.491 36.491 qs_energies 6 4.8 0.000 0.001 34.805 34.805 scf_env_do_scf 6 5.8 0.000 0.001 33.927 33.928 scf_env_do_scf_inner_loop 113 6.2 0.002 0.016 32.552 32.553 rebuild_ks_matrix 119 8.1 0.000 0.001 23.841 23.851 qs_ks_build_kohn_sham_matrix 119 9.1 0.014 0.016 23.840 23.851 qs_ks_update_qs_env 119 7.3 0.001 0.001 22.434 22.444 velocity_verlet 5 3.0 0.002 0.003 21.335 21.339 pw_transfer 2446 12.3 0.192 0.225 15.716 16.262 fft_wrap_pw1pw2 2059 13.4 0.021 0.026 15.316 15.960 fft_wrap_pw1pw2_150 1321 14.9 1.119 1.424 14.760 15.273 fft3d_ps 2059 15.4 6.302 7.476 11.461 12.633 qs_vxc_create 119 10.1 0.002 0.002 12.451 12.454 xc_vxc_pw_create 119 11.1 0.137 0.217 12.449 12.452 qs_rho_update_rho_low 119 7.3 0.000 0.001 9.878 9.879 calculate_rho_elec 119 8.3 0.048 0.054 9.877 9.878 xc_pw_derive 714 13.1 0.009 0.012 9.341 9.638 sum_up_and_integrate 119 10.1 0.076 0.095 8.396 8.415 integrate_v_rspace 119 11.1 0.003 0.003 8.321 8.351 qmmm_forces 6 3.8 0.002 0.002 7.119 7.120 qmmm_forces_with_gaussian 6 4.8 0.010 0.013 6.771 7.000 qmmm_el_coupling 6 3.8 0.000 0.000 6.287 6.449 qmmm_elec_with_gaussian 6 4.8 0.003 0.004 6.285 6.447 xc_rho_set_and_dset_create 119 12.1 
0.340 0.441 6.108 6.357 xc_pw_divergence 119 12.1 0.004 0.006 6.012 6.278 rs_pw_transfer 988 11.5 0.010 0.014 5.856 6.146 mp_alltoall_z22v 2059 17.4 3.900 5.831 3.900 5.831 density_rs2pw 119 9.3 0.005 0.007 5.446 5.710 potential_pw2rs 119 12.1 0.005 0.006 4.735 4.769 grid_collocate_task_list 119 9.3 4.274 4.555 4.274 4.555 qmmm_force_with_gaussian_low 6 5.8 0.000 0.000 3.826 3.880 grid_integrate_task_list 119 12.1 3.346 3.518 3.346 3.518 yz_to_x 964 16.0 0.476 0.621 2.354 3.490 qmmm_elec_with_gaussian_low 6 5.8 0.000 0.000 3.317 3.415 x_to_yz 1095 16.8 0.727 0.855 2.749 3.377 qmmm_forces_gaussian_low_G 6 6.8 3.148 3.200 3.148 3.200 mp_waitany 4028 12.8 2.252 2.970 2.252 2.970 qmmm_elec_gaussian_low_G 6 6.8 2.726 2.817 2.726 2.817 pw_restrict_s3 18 5.8 1.297 1.359 2.368 2.558 rs_pw_transfer_PW2RS_150 125 13.9 0.666 0.834 2.179 2.263 mp_waitall_1 188862 16.2 1.689 2.164 1.689 2.164 qmmm_elec_with_gaussian:spline 6 5.8 0.000 0.000 1.967 2.119 pw_prolongate_s3 18 6.8 1.089 1.136 1.967 2.119 rs_pw_transfer_RS2PW_150 125 11.2 0.519 0.739 1.751 2.020 qs_scf_new_mos 113 7.2 0.000 0.001 1.926 1.934 qs_scf_loop_do_ot 113 8.2 0.000 0.000 1.926 1.933 ot_scf_mini 113 9.2 0.001 0.001 1.843 1.849 qs_ks_ddapc 119 10.1 0.002 0.002 1.722 1.847 dbcsr_multiply_generic 2588 12.3 0.056 0.059 1.759 1.785 mp_sum_dm3 33 5.7 1.475 1.553 1.475 1.553 pw_scatter_p 1095 15.8 1.508 1.547 1.508 1.547 pw_gather_p 964 15.0 1.180 1.521 1.180 1.521 pw_integral_ab 2761 7.7 1.078 1.252 1.359 1.437 qs_ks_update_qs_env_forces 6 4.8 0.000 0.000 1.416 1.416 init_scf_loop 6 6.8 0.000 0.000 1.373 1.373 mp_sum_d 5820 12.2 0.756 1.358 0.756 1.358 ot_mini 113 10.2 0.000 0.001 1.134 1.141 rs_pw_transfer_PW2RS_40 119 14.1 0.198 0.283 0.879 1.042 ------------------------------------------------------------------------------- Plot: name="MQAE_single_node_timings_32omp", title="Timings of MQAE_single_node with 32 OpenMP Threads", ylabel="time [s]" PlotPoint: plot="MQAE_single_node_timings_32omp", name="rest", 
label="rest", y=20.980999999999995, yerr=0.0 PlotPoint: plot="MQAE_single_node_timings_32omp", name="qmmm_elec_gaussian_low_G", label="qmmm_elec_gaussian_low_G", y=82.266, yerr=0.0 PlotPoint: plot="MQAE_single_node_timings_32omp", name="fft3d_s", label="fft3d_s", y=7.635, yerr=0.0 PlotPoint: plot="MQAE_single_node_timings_32omp", name="pw_scatter_s", label="pw_scatter_s", y=5.243, yerr=0.0 PlotPoint: plot="MQAE_single_node_timings_32omp", name="pw_integral_ab", label="pw_integral_ab", y=4.47, yerr=0.0 PlotPoint: plot="MQAE_single_node_timings_32omp", name="qmmm_forces_gaussian_low_G", label="qmmm_forces_gaussian_low_G", y=3.531, yerr=0.0 PlotPoint: plot="MQAE_single_node_timings_32omp", name="grid_collocate_task_list", label="grid_collocate_task_list", y=3.161, yerr=0.0 PlotPoint: plot="MQAE_single_node_timings_32omp", name="mp_alltoall_z22v", label="mp_alltoall_z22v", y=0.0, yerr=0.0 PlotPoint: plot="MQAE_single_node_timings_32omp", name="grid_integrate_task_list", label="grid_integrate_task_list", y=0.0, yerr=0.0 PlotPoint: plot="MQAE_single_node_timings_32omp", name="fft3d_ps", label="fft3d_ps", y=0.0, yerr=0.0 Plot: name="MQAE_single_node_timings_32mpi", title="Timings of MQAE_single_node with 32 MPI Ranks", ylabel="time [s]" PlotPoint: plot="MQAE_single_node_timings_32mpi", name="rest", label="rest", y=27.018, yerr=0.0 PlotPoint: plot="MQAE_single_node_timings_32mpi", name="qmmm_elec_gaussian_low_G", label="qmmm_elec_gaussian_low_G", y=2.726, yerr=0.0 PlotPoint: plot="MQAE_single_node_timings_32mpi", name="fft3d_s", label="fft3d_s", y=0.0, yerr=0.0 PlotPoint: plot="MQAE_single_node_timings_32mpi", name="pw_scatter_s", label="pw_scatter_s", y=0.0, yerr=0.0 PlotPoint: plot="MQAE_single_node_timings_32mpi", name="pw_integral_ab", label="pw_integral_ab", y=1.078, yerr=0.0 PlotPoint: plot="MQAE_single_node_timings_32mpi", name="qmmm_forces_gaussian_low_G", label="qmmm_forces_gaussian_low_G", y=3.148, yerr=0.0 PlotPoint: plot="MQAE_single_node_timings_32mpi", 
name="grid_collocate_task_list", label="grid_collocate_task_list", y=4.274, yerr=0.0 PlotPoint: plot="MQAE_single_node_timings_32mpi", name="mp_alltoall_z22v", label="mp_alltoall_z22v", y=3.9, yerr=0.0 PlotPoint: plot="MQAE_single_node_timings_32mpi", name="grid_integrate_task_list", label="grid_integrate_task_list", y=3.346, yerr=0.0 PlotPoint: plot="MQAE_single_node_timings_32mpi", name="fft3d_ps", label="fft3d_ps", y=6.302, yerr=0.0 Summary: Performance test took 33 minutes. Status: OK Removing intermediate container 243a18226d35 ---> ec08e8964b9c Step 41/42 : CMD cat $(find ./report.log -mmin +10) | sed '/^Summary:/ s/$/ (cached)/' ---> Running in b4b86d0c93b9 Removing intermediate container b4b86d0c93b9 ---> f3e2bf628fa4 Step 42/42 : ENTRYPOINT [] ---> Running in f139fa5ac967 Removing intermediate container f139fa5ac967 ---> 4a781a36ff14 [Warning] One or more build-args [GIT_COMMIT_SHA] were not consumed Successfully built 4a781a36ff14 Successfully tagged gcr.io/cp2k-org-project/img_cp2k-perf-openmp-arch-be5:master Pushing new image... done. #################### Running Image cp2k-perf-openmp #################### Uploading artifacts... done EndDate: 2022-10-27 22:55:12+00:00