--- dev:regtesting [2020/08/21 10:15] – external edit 127.0.0.1
+++ dev:regtesting [2021/06/02 13:11] – tmueller
@@ Line 102: / Line 102: @@
 | ''TEST_TYPES''       | this file allows you to create a new test type, i.e. to specify which words should be grepped for and which field should be used in the numerical comparison. |
+===== Run with sbatch =====
+The ''%%do_regtest%%'' script in ''%%tools/regtesting%%'' is often run via ''%%make VERSION=... ARCH=... test%%'', which sets everything up nicely. On a cluster, however, you usually want to separate the steps (building and testing) and invoke ''%%do_regtest%%'' manually.
+What you need:
+  * an ''%%sbatch%%'' template script
+  * a CP2K source tree containing a compiled CP2K
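+If you do not have such a template yet, a minimal ''%%sbatch%%'' preamble could look like the following sketch. The job name and resource numbers are placeholders, and the ''%%module load%%'' line has to be filled in with the environment your CP2K was built with:
+<code bash>
+#!/bin/bash -l
+#SBATCH --job-name=cp2k-regtest
+#SBATCH --time=01:00:00
+#SBATCH --nodes=2
+#SBATCH --ntasks=128
+#SBATCH --cpus-per-task=1
+# load the same environment (compiler, MPI, libraries) CP2K was built with, e.g.
+# module load ...
+# do_regtest is called with -ompthreads ${OMP_NUM_THREADS}, so make sure it is set:
+export OMP_NUM_THREADS=${SLURM_CPUS_PER_TASK}
+</code>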
+==== Instructions ====
+The regtest script goes through all test directories (for example ''%%tests/QS/regtest-admm-1/%%'') and launches all tests in each directory. After the tests of a directory have been started, it checks whether the maximum number of tasks has been reached; if not, it also spawns the tests of the next directory. If the maximum has been reached, it waits until enough tasks have finished before spawning the tests of the next directory. Since the tests are usually rather short, this procedure seldom causes oversubscription.
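+The spawning strategy can be illustrated with a small, self-contained bash sketch. It is only a hypothetical model and //not// the actual ''%%do_regtest%%'' code: each "directory" is simulated by a list of ''%%sleep%%'' durations, ''%%MAXTASKS%%'' plays the role of ''%%-maxtasks%%'' and ''%%TASKS_PER_TEST%%'' that of ''%%-mpiranks%%'' (both names are made up for this example). It needs bash 4.3 or newer for ''%%wait -n%%''.
+<code bash>
+#!/bin/bash
+# Hypothetical model of the spawning strategy, NOT the real do_regtest implementation.
+MAXTASKS=4          # stands in for -maxtasks
+TASKS_PER_TEST=2    # stands in for -mpiranks
+# each entry is one "directory", each number the runtime (s) of one test in it
+DIRS=("0.2 0.2" "0.1 0.1 0.1" "0.2")
+running=0           # number of tasks currently in use (bookkeeping only)
+peak=0              # highest number of tasks in use at any time
+for dir in "${DIRS[@]}"; do
+  # only start the next directory once we are below the task limit
+  while (( running >= MAXTASKS )); do
+    wait -n                          # block until one test has finished
+    running=$(( running - TASKS_PER_TEST ))
+  done
+  # launch all tests of this directory at once; this is where
+  # the limit can be exceeded (oversubscription)
+  for duration in ${dir}; do
+    sleep "${duration}" &
+    running=$(( running + TASKS_PER_TEST ))
+    if (( running > peak )); then peak=${running}; fi
+  done
+done
+wait                                 # wait for the remaining tests
+echo "peak number of tasks in use: ${peak} (limit: ${MAXTASKS})"
+</code>
+The sketch reports a peak of 8 tasks for a limit of 4: the second directory is started while, according to the bookkeeping, one 2-task test is still running, and all three of its tests are launched at once. With real regtests, which mostly finish quickly, such overshoots are usually short-lived.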
+Moreover, if no free nodes are left within the given allocation, ''%%srun%%'' should simply wait until some become available. Hence, the more nodes (or the larger the total number of tasks) you allocate for the ''%%sbatch%%'' job, the more tests can run in parallel. However, ''%%do_regtest%%'' has to know about that number, which is achieved by passing ''%%-maxtasks ${SLURM_NTASKS}%%''. ''%%SLURM_NTASKS%%'' is set automatically by ''%%sbatch%%'' to the number of tasks you requested, either on the ''%%sbatch%%'' command line or in the preamble of the ''%%sbatch%%'' script.
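+As a worked example of this relation, assume a job with 2 nodes and 64 tasks per node (numbers chosen only for illustration) and the default of 2 MPI ranks per test. The sketch hard-codes ''%%SLURM_NTASKS%%'', which ''%%sbatch%%'' would normally set for you, and assumes that ''%%do_regtest%%'' counts one task per MPI rank; if your version also accounts for OpenMP threads, divide by the product of ranks and threads instead.
+<code bash>
+# Illustration only: inside a real job SLURM_NTASKS is set by sbatch.
+SLURM_NTASKS=$(( 2 * 64 ))   # 2 nodes x 64 tasks per node
+NTASKS_SINGLE_TEST=2         # MPI ranks per test (-mpiranks)
+max_parallel_tests=$(( SLURM_NTASKS / NTASKS_SINGLE_TEST ))
+echo "-maxtasks ${SLURM_NTASKS} allows about ${max_parallel_tests} tests at a time"
+</code>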
+Append the following to your ''%%sbatch%%'' template and adapt at least the value of ''%%CP2K_BASE_DIR%%'', and possibly also ''%%CP2K_TEST_DIR%%'':
+<code bash>
+CP2K_BASE_DIR="/PATH/TO/YOUR/CP2K/SOURCE/TREE"
+CP2K_TEST_DIR="${SCRATCH}/cp2k_regtesting"
+# CP2K_REGTEST_SCRIPT_DIR=""  # only set if needed (see below)
+CP2K_ARCH="local"
+CP2K_VERSION="psmp"
+# the following is the default, adjust if you want to run single tests with more than 2 ranks/tasks
+NTASKS_SINGLE_TEST=2
+NNODES_SINGLE_TEST=1  # otherwise srun will distribute the 2 tasks over 2 nodes
+SRUN_CMD="srun --cpu-bind=verbose,cores"
+# the following should be sufficiently generic:
+mkdir -p "${CP2K_TEST_DIR}"
+cd "${CP2K_TEST_DIR}"
+cp2k_rel_dir=$(realpath --relative-to="${CP2K_TEST_DIR}" "${CP2K_BASE_DIR}")
+# srun does not like `-np`, override the complete command instead:
+export cp2k_run_prefix="${SRUN_CMD} -N ${NNODES_SINGLE_TEST} -n ${NTASKS_SINGLE_TEST}"
+# OMP_NUM_THREADS falls back to 1 if it is not set in the environment:
+"${CP2K_REGTEST_SCRIPT_DIR:-${CP2K_BASE_DIR}/tools/regtesting}/do_regtest" \
+  -arch "${CP2K_ARCH}" \
+  -version "${CP2K_VERSION}" \
+  -nobuild \
+  -mpiranks "${NTASKS_SINGLE_TEST}" \
+  -ompthreads "${OMP_NUM_THREADS:-1}" \
+  -maxtasks "${SLURM_NTASKS}" \
+  -cp2kdir "${cp2k_rel_dir}" \
+  |& tee "${CP2K_TEST_DIR}/${CP2K_ARCH}.${CP2K_VERSION}.log"
+# the above writes the output both to the slurm-*.out file and to a log file;
+# if you want only the log file, replace the `|& tee` with a `>&`.
+# More options:
+# -farming   ... enable farming mode, see below
+# -retest    ... only do tests which failed in a previous run
+</code>
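+Assuming you saved the resulting script as ''%%run_regtests.sbatch%%'' (a hypothetical name), it can be submitted as usual. Options given on the ''%%sbatch%%'' command line take precedence over those in the script preamble, which is a convenient way to change the number of tasks, and hence ''%%SLURM_NTASKS%%'', without editing the script:
+<code bash>
+# request 4 nodes with 512 tasks in total for this run (numbers are placeholders)
+sbatch --nodes=4 --ntasks=512 run_regtests.sbatch
+# follow the log written by tee, named ${CP2K_ARCH}.${CP2K_VERSION}.log in ${CP2K_TEST_DIR}
+tail -f "${SCRATCH}/cp2k_regtesting/local.psmp.log"
+</code>
+To rerun only the tests that failed, add ''%%-retest%%'' to the ''%%do_regtest%%'' options and submit the script again.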
+A complete ''%%sbatch%%'' script to run the regtests on CSCS’ Alps (Eiger) could look as follows:
+<code bash>
+#!/bin/bash -l
+#SBATCH --time=01:00:00
+#SBATCH --nodes=4
+#SBATCH --ntasks-per-node=128
+#SBATCH --cpus-per-task=2
+#SBATCH --ntasks-per-core=1
+# More SBATCH options:
+# If you need 512GB memory nodes (otherwise only 256GB guaranteed):
+#    #SBATCH --mem=497G
+# To run on the debug queue (max 10 nodes, 30 min):
+#    #SBATCH --partition=debug
+set -o errexit
+set -o nounset
+set -o pipefail
+export MPICH_OFI_STARTUP_CONNECT=1
+export OMP_NUM_THREADS=${SLURM_CPUS_PER_TASK}
+export OMP_PROC_BIND=close
+export OMP_PLACES=cores
+source "${MODULESHOME}/init/bash"
+module load cpeGNU
+module load \
+    cray-fftw \
+    ELPA/2020.11.001 \
+    libxsmm/1.16.1 \
+    libxc/5.1.3 \
+    Libint-CP2K/2.6.0 \
+    gcc/10.2.0
+# Let the user see the currently loaded modules in the slurm log for completeness:
+module list
+CP2K_BASE_DIR="/users/timuel/work/cp2k"
+CP2K_TEST_DIR="${SCRATCH}/cp2k_regtesting"
+CP2K_ARCH="Eiger-gfortran"
+CP2K_VERSION="psmp"
+NTASKS_SINGLE_TEST=2
+NNODES_SINGLE_TEST=1
+SRUN_CMD="srun --cpu-bind=verbose,cores"
+# the following should be sufficiently generic:
+mkdir -p "${CP2K_TEST_DIR}"
+cd "${CP2K_TEST_DIR}"
+cp2k_rel_dir=$(realpath --relative-to="${CP2K_TEST_DIR}" "${CP2K_BASE_DIR}")
+# srun does not like `-np`, override the complete command instead:
+export cp2k_run_prefix="${SRUN_CMD} -N ${NNODES_SINGLE_TEST} -n ${NTASKS_SINGLE_TEST}"
+"${CP2K_REGTEST_SCRIPT_DIR:-${CP2K_BASE_DIR}/tools/regtesting}/do_regtest" \
+  -arch "${CP2K_ARCH}" \
+  -version "${CP2K_VERSION}" \
+  -nobuild \
+  -mpiranks ${NTASKS_SINGLE_TEST} \
+  -ompthreads ${OMP_NUM_THREADS} \
+  -maxtasks ${SLURM_NTASKS} \
+  -cp2kdir "${cp2k_rel_dir}" \
+  |& tee "${CP2K_TEST_DIR}/${CP2K_ARCH}.${CP2K_VERSION}.log"
+</code>
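+For a quick check on the debug queue mentioned in the script comments (at most 10 nodes and 30 minutes), the partition, node count and time limit can be overridden at submission time instead of editing the preamble. ''%%regtest-eiger.sbatch%%'' is a hypothetical file name for the script above:
+<code bash>
+# 2 nodes x 128 tasks per node (from the preamble) should give 256 tasks in total
+sbatch --partition=debug --nodes=2 --time=00:30:00 regtest-eiger.sbatch
+</code>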
+==== Test parallelization via Farming ====
+Since the startup time of ''%%srun%%'' can be significant (the ''%%--bcast%%'' option of ''%%srun%%'' might help if the binary is statically linked, as might putting the CP2K directory onto a scratch volume), there is the option to use CP2K's //Farming// mode to run all tests within a single ''%%srun%%'' and a single CP2K instance. This mode is directly supported by ''%%do_regtest%%'', which then dynamically creates the required CP2K input file. The drawback is that if one regtest triggers a ''%%CPABORT%%'' (or is terminated by a signal), the remaining tests will never run.
+Simply add the option ''%%-farming%%'' to the list of options for the ''%%do_regtest%%'' call to enable this.
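+With the variables from the template above, the call could then look like the following sketch, assuming the remaining options can stay unchanged in farming mode; the ''%%-farming%%'' line is the relevant addition. The commented ''%%--bcast%%'' variant of ''%%SRUN_CMD%%'' is optional and only worth trying as an experiment, since whether it helps depends on your binary and file system:
+<code bash>
+# optional: replace the SRUN_CMD line of the template (before cp2k_run_prefix is exported) with
+# SRUN_CMD="srun --bcast --cpu-bind=verbose,cores"
+"${CP2K_REGTEST_SCRIPT_DIR:-${CP2K_BASE_DIR}/tools/regtesting}/do_regtest" \
+  -arch "${CP2K_ARCH}" \
+  -version "${CP2K_VERSION}" \
+  -nobuild \
+  -mpiranks "${NTASKS_SINGLE_TEST}" \
+  -ompthreads "${OMP_NUM_THREADS:-1}" \
+  -maxtasks "${SLURM_NTASKS}" \
+  -farming \
+  -cp2kdir "${cp2k_rel_dir}" \
+  |& tee "${CP2K_TEST_DIR}/${CP2K_ARCH}.${CP2K_VERSION}.log"
+</code>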
+==== Minimal directory setup ====
+If you want to test a precompiled executable, you have to reproduce a minimal directory layout to run the regtests:
+  * ''%%cp2k-prebuilt/exe/prebuilt/*.psmp%%'' … directory with all the executables
+  * ''%%cp2k-prebuilt/tests%%'' … directory containing the tests (must NOT be a symlink)
+  * ''%%cp2k-prebuilt/data%%'' … directory containing CP2K's data files
+An example, if your HPC center uses EasyBuild to provide the CP2K package:
+<code>
+cp2k-prebuilt
+├── data -> /apps/eiger/UES/jenkins/1.4.0/software/CP2K/8.1-cpeGNU-21.04/data
+├── exe
+│   └── prebuilt -> /apps/eiger/UES/jenkins/1.4.0/software/CP2K/8.1-cpeGNU-21.04/bin
+└── tests
+</code>
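+A layout like this can be created with a few commands. The sketch assumes the EasyBuild installation prefix from the example and a separate, full CP2K source tree (presumably best of the same version as the installed binary) from which the ''%%tests%%'' directory is copied, since it must not be a symlink; both paths are placeholders to adapt:
+<code bash>
+CP2K_INSTALL_DIR="/apps/eiger/UES/jenkins/1.4.0/software/CP2K/8.1-cpeGNU-21.04"
+CP2K_SOURCE_DIR="/PATH/TO/A/FULL/CP2K/DIR"
+mkdir -p cp2k-prebuilt/exe
+cd cp2k-prebuilt
+# data and executables can be symlinked ...
+ln -s "${CP2K_INSTALL_DIR}/data" data
+ln -s "${CP2K_INSTALL_DIR}/bin" exe/prebuilt
+# ... but tests has to be a real directory, hence the copy
+cp -r "${CP2K_SOURCE_DIR}/tests" tests
+</code>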
+and then update the variables as follows:
+<code bash>
+CP2K_BASE_DIR="/PATH/TO/THE/MINIMAL/DIR/cp2k-prebuilt"
+CP2K_TEST_DIR="${SCRATCH}/cp2k_regtesting"
+CP2K_REGTEST_SCRIPT_DIR="/PATH/TO/A/FULL/CP2K/DIR/tools/regtesting"
+CP2K_ARCH="prebuilt"
+CP2K_VERSION="psmp"
+</code>