Differences

This shows you the differences between two versions of the page.

--- dev:regtesting [2014/10/12 06:36] – Adjust number krack
+++ dev:regtesting [2023/10/19 14:03] (current) – Update flags krack
@@ Line 1: / Line 1: @@
-==== CP2K Regression Testing =====
+====== CP2K Regression Testing ======
-=== Regression Testing Script ===
+CP2K comes with over 3000 test input files (located in [[src>tests]]) which serve both as examples of how to use the many features in CP2K and as a way for developers to test modifications and extensions to CP2K. They reduce the chance of bugs being introduced into the code and help ensure that all parts of the code are working. We also recommend that all users run the regression tests before using a self-compiled binary for their projects.
-CP2K comes with over 2500 test input files (located in ''cp2k/tests'') which serve as both examples on how to use the many features in CP2K and also as a method for developers to test modifications and extensions to CP2K. In order to reduce the chance of bugs being introduced into the code, and ensure that all parts of the code are working, we recommend that all developers complete a successful run of the full regression test suite before committing changes to SVN.
+==== Dashboard ====
+A number of regtests are run automatically by various members of our community. The results of these tests are collected centrally at the [[http://dashboard.cp2k.org | Dashboard ]]. If errors are detected, the developer responsible for the change should fix them immediately. The output logs show the arch file used for each test, which may suggest useful settings for that particular architecture.
-The regression test suite is run using the ''do_regtest'' script in ''cp2k/tools/regtesting''.  Instructions are included at the top of the script.
+==== Code Coverage ====
+We aim for the regression test suite to cover all the functionality of CP2K. For this purpose we regularly create [[http://www.cp2k.org/static/coverage/|Coverage Reports]] of the test suite. If you see parts of the code which are not well tested, please contribute to improving coverage by writing new tests!
-In addition, when new features are added to CP2K, please add new tests which cover those features and commit them to SVN along with the code.
+===== How does it work? =====
-=== Automated Regression Tester ===
+The regression test suite is run using the [[src>tests/do_regtest.py | do_regtest]] script.
+It goes through the following steps:
+   * performs a realclean build of the source
+   * executes a list of tests
+   * compares the results (outputs) with those of the last known run (reference)
+   * produces a summary
-An automated regression test service is available at http://cp2k-www.epcc.ed.ac.uk .  This runs the test suite on a range of architectures and versions of CP2K every time a change is made to SVN.  If errors are found, the developer responsible for the change will receive and email notifying them of the problem(s) and these should be fixed immediately.
-We plan to extend the range of platforms, compilers etc. that are covered by the automated regression tester in future.  If you would like to set up automated regression testing on a local machine, or have other ideas about how the test service can be improved, please email [[mailto:ibethuneNOSPAM@epcc.ed.ac.uk| Iain Bethune ]](remove NOSPAM) from address before sending
+===== Running the regtests =====
-=== CP2K Code Coverage ===
+==== Step 0: make based testing ====
-We aim that the regression test suite covers all the functionality of CP2K.  Code coverage reports from recent runs of the tests are available at  http://www.cp2k.org/static/coverage/ If you see parts of the code which are not well tested, please contribute to improving coverage by writing new tests!
+  * If you are able to build and run CP2K on the local machine, the easiest way to start the regtests is to run ''make ARCH=... VERSION=... test''. The options listed in Step 2 can also be passed to ''make'' via ''TESTOPTS=<options>'', e.g. ''TESTOPTS+="--mpiranks 4 --ompthreads 4"'' runs each test with 4 MPI ranks and 4 OpenMP threads per MPI rank.
+  * Be careful with the value of the ''-j'' parameter: running too many tests in parallel can cause tests to fail due to a lack of system and/or GPU memory.
+  * If this fails (e.g. on batch systems), continue with Step 1; otherwise, go to Step 3 (Interpretation).
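As a sketch of Step 0, the wrapper below assembles the ''make'' call with ''TESTOPTS''. The function name ''regtest_with_make'' and the ''MAKE'' override (handy for a dry run with ''MAKE=echo'') are illustrative assumptions, not part of CP2K; the value passed to ''-j'' is left to you, following the caution above.

```bash
#!/usr/bin/env bash
# Hypothetical helper: run the make-based regtests with a given rank/thread layout.
# Usage: regtest_with_make ARCH VERSION MPIRANKS OMPTHREADS JOBS
# Set MAKE=echo to only print the arguments instead of running make.
regtest_with_make() {
  local arch="$1" version="$2" ranks="$3" threads="$4" jobs="$5"
  # TESTOPTS forwards the do_regtest.py options listed in Step 2.
  "${MAKE:-make}" -j "$jobs" ARCH="$arch" VERSION="$version" \
    TESTOPTS="--mpiranks $ranks --ompthreads $threads" test
}

# Example (commented out so that sourcing this file has no side effects):
# regtest_with_make local psmp 2 2 4
```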
-=== Results of the Automatic Regression Tester ===
+==== Step 1: Preparation ====
-  * [[http://sourceforge.net/p/cp2k/code/HEAD/tree/trunk/cp2k/arch/Linux-x86-64-gfortran-regtest.pdbg | GFortran compiler (v4.9.1): ARCH=Linux-x86-64-gfortran-regtest VERSION=pdbg]]
+  * Decide on a directory for the regtests. It will accumulate plenty of files over time, so choose a dedicated one such as ''$HOME/rt''.
+  * Clone a version of CP2K into ''$HOME/rt''.
+  * Set up the arch files so that you can cleanly build CP2K, and verify that the build works.
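The preparation steps can be scripted roughly as follows. This is a sketch that assumes ''git'' is available and that the sources come from the public repository at ''https://github.com/cp2k/cp2k.git'' (cloned with ''--recursive'' to also fetch submodules). The checkout is placed in a ''cp2k'' subdirectory of the regtest directory, and the function name is hypothetical. Setting up and testing the arch file still has to be done by hand.

```bash
#!/usr/bin/env bash
# Hypothetical helper for Step 1: create the regtest directory and clone CP2K.
# Usage: prepare_regtest_dir [DIR] [REPO_URL]
# Assumption: the upstream repository URL below; pass another URL to override it.
prepare_regtest_dir() {
  local rt_dir="${1:-$HOME/rt}"
  local repo_url="${2:-https://github.com/cp2k/cp2k.git}"
  mkdir -p "$rt_dir" || return 1
  if [ -d "$rt_dir/cp2k/.git" ]; then
    # Do not clone twice; an existing checkout is reused as is.
    echo "reusing existing checkout in $rt_dir/cp2k"
  else
    git clone --recursive "$repo_url" "$rt_dir/cp2k" || return 1
  fi
}

# Example:
# prepare_regtest_dir "$HOME/rt"
```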
-[[http://cp2k.web.psi.ch/regtest/trunk/Linux-x86-64-gfortran-regtest/pdbg|{{http://cp2k.web.psi.ch/regtest/trunk/Linux-x86-64-gfortran-regtest/pdbg/regplot.png}}]]
+==== Step 2: Running ====
+<code>
+$ tests/do_regtest.py -h
+usage: do_regtest.py [-h] [--mpiranks MPIRANKS] [--ompthreads OMPTHREADS]
+                     [--maxtasks MAXTASKS] [--num_gpus NUM_GPUS]
+                     [--timeout TIMEOUT] [--maxerrors MAXERRORS]
+                     [--mpiexec MPIEXEC] [--smoketest] [--valgrind]
+                     [--keepalive] [--flagslow] [--debug]
+                     [--restrictdir RESTRICTDIR] [--skipdir SKIPDIR]
+                     [--workbasedir WORKBASEDIR] arch version
-  * [[http://sourceforge.net/p/cp2k/code/HEAD/tree/trunk/cp2k/arch/Linux-x86-64-gfortran-regtest.ssmp | GFortran compiler (v4.9.1): ARCH=Linux-x86-64-gfortran-regtest VERSION=ssmp]]
+Runs CP2K regression test suite.
-[[http://cp2k.web.psi.ch/regtest/trunk/Linux-x86-64-gfortran-regtest/ssmp|{{http://cp2k.web.psi.ch/regtest/trunk/Linux-x86-64-gfortran-regtest/ssmp/regplot.png}}]]
+positional arguments:
+  arch
+  version
-=== Latest commits ===
+options:
+  -h, --help            show this help message and exit
+  --mpiranks MPIRANKS
+  --ompthreads OMPTHREADS
+  --maxtasks MAXTASKS
+  --num_gpus NUM_GPUS
+  --timeout TIMEOUT
+  --maxerrors MAXERRORS
+  --mpiexec MPIEXEC
+  --smoketest           Runs only the first test of each directory.
+  --valgrind            Runs tests under Valgrind memcheck. Best used together with --keepalive.
+  --keepalive           Use a persistent cp2k-shell process to reduce startup time.
+  --flagslow            Flag slow tests in the final summary and status report.
+  --debug
+  --restrictdir RESTRICTDIR
+  --skipdir SKIPDIR
+  --workbasedir WORKBASEDIR
+</code>
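For a direct call, the positional ''arch'' and ''version'' arguments come last, as in the usage line above. The sketch below uses only options from that help text. It takes ''--maxtasks'' from ''SLURM_NTASKS'' when running inside a Slurm allocation and from ''nproc'' otherwise, which is an assumption about a sensible default rather than documented behaviour. ''run_do_regtest'' and the ''DO_REGTEST'' override (e.g. ''DO_REGTEST=echo'' for a dry run) are hypothetical names.

```bash
#!/usr/bin/env bash
# Hypothetical wrapper around tests/do_regtest.py (run from the CP2K source root).
# Usage: run_do_regtest ARCH VERSION MPIRANKS OMPTHREADS [EXTRA_OPTIONS...]
run_do_regtest() {
  local arch="$1" version="$2" ranks="$3" threads="$4"
  shift 4
  # Assumption: inside Slurm use the allocated task count, otherwise all local cores.
  local maxtasks="${SLURM_NTASKS:-$(nproc)}"
  "${DO_REGTEST:-tests/do_regtest.py}" \
    --mpiranks "$ranks" --ompthreads "$threads" --maxtasks "$maxtasks" \
    "$@" "$arch" "$version"
}

# Example: quick check running only the first test of each directory.
# run_do_regtest local psmp 2 1 --smoketest
```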
-{{rss>http://sourceforge.net/p/cp2k/code/feed 20 author date 10m}}
+==== Step 3: Interpretation ====
+A test result can be any of the following:
+^  Test Result         ^  Meaning |
+| ''OK''               | if the results match those of a previous run precisely. The execution time is also given. |
+| ''NEW''              | if they have not been executed previously. The reference result is generated automatically in this run. Tests can also be NEW if they have been reset, i.e. newly added to the TEST_FILES_RESET files. |
+| ''RUNTIME FAILURE''  | if they stopped unexpectedly (e.g. core dump, or stop) |
+| ''WRONG RESULT''     | if they produce a result that deviates (even a tiny bit) from an old reference |
+The last two outcomes generally mean that a bug has been introduced, which requires investigation.
+Since regtesting only yields information relative to a previously known result, it is most useful to run the regtests both before and after you make changes. To allow a per-test numerical difference larger than the default, add a third column to the corresponding entry in the appropriate TEST_FILES file, giving the allowed relative difference.
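The tolerance check can be illustrated with a small function. It assumes, as the text above suggests, that the comparison is relative, i.e. |value - reference| / |reference| <= tolerance, and it falls back to the absolute difference for a zero reference. The exact rule in ''do_regtest.py'' may differ, and ''within_tolerance'' is a hypothetical name.

```bash
#!/usr/bin/env bash
# Hypothetical check: succeed (exit status 0) if VALUE matches REFERENCE
# within the relative TOLERANCE.
# Usage: within_tolerance VALUE REFERENCE TOLERANCE
within_tolerance() {
  awk -v val="$1" -v ref="$2" -v tol="$3" 'BEGIN {
    diff = val - ref; if (diff < 0) diff = -diff
    mag = ref; if (mag < 0) mag = -mag
    # Relative error; use the absolute difference if the reference is zero.
    err = (mag > 0) ? diff / mag : diff
    exit !(err <= tol)
  }'
}

# Example: a relative deviation of about 6e-7 passes with 1.0E-6 but not with 1.0E-14.
# within_tolerance -17.15520 -17.15521 1.0E-6 && echo OK
```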
+===== Adding and Resetting Tests =====
+The test suite is fully controlled by the following files in the [[src>tests]] directories:
+^ File Name            ^ Content   |
+| ''TEST_DIRS''        | is just a list of directories that contain tests. You can add your directory here. |
+| ''TEST_FILES''       | the list of input files that need to be executed. You can add your file name here. Adding a comment about what the test checks can help later when debugging a failing regtest. |
+| ''TEST_FILES_RESET'' | you can add files whose reference output has become invalid (e.g. after a bug fix) to this list. However, be absolutely sure that the change is due to a bug fix; do not reset tests that fail for unclear reasons. Try to add a comment to the git commit message and/or the file itself. |
+| ''TEST_TYPES''       | this file allows you to create a new test type, i.e. to specify which words should be grepped for and which field should be used in the numerical comparison. |
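To give an idea of what a test type does, the function below extracts one whitespace-separated field from the last output line containing a given text. The sample output line used here is hypothetical, and taking the last match is an assumption made for the sketch; the actual patterns and field numbers are defined in ''TEST_TYPES''.

```bash
#!/usr/bin/env bash
# Hypothetical helper mimicking a test type: print field FIELD of the last
# line in FILE that contains the literal text PATTERN.
# Usage: extract_value FILE PATTERN FIELD
extract_value() {
  local file="$1" pattern="$2" field="$3"
  grep -F -- "$pattern" "$file" | tail -n 1 | awk -v f="$field" '{ print $f }'
}

# Example with a made-up output line "TOTAL ENERGY: -1.5" (field 3 is the number):
# extract_value h2o.out "TOTAL ENERGY" 3
```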
+===== Run with sbatch =====
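+The ''%%do_regtest%%'' script in ''%%tools/regtesting%%'' is often run via ''%%make VERSION=... ARCH=... test%%'', which sets everything up automatically. On a cluster you usually want to separate the steps and invoke ''%%do_regtest%%'' manually. Note that the commands in this section use the single-dash options of that script (e.g. ''%%-arch%%'', ''%%-nobuild%%''), which differ from the options of ''%%tests/do_regtest.py%%'' listed in Step 2.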
+What you need:
+  * ''%%sbatch%%'' template script
+  * a CP2K source tree with a built CP2K
+==== Instructions ====
+The regtest script goes through all the directories (for example ''%%tests/QS/regtest-admm-1/%%'') and launches all tests in each directory. After the tests of a directory have been started, it checks whether the maximum number of tasks has been reached; if not, it also spawns the tests from the next directory. If the maximum has been reached, it waits until enough tests have finished before spawning the tests from the next directory. Since the tests are usually rather short, this procedure rarely causes oversubscription.
+Also, ''%%srun%%'' should simply wait until nodes become free if no free nodes are available within the given allocation. Hence, the more nodes (or total number of tasks) you allocate for the ''%%sbatch%%'' job, the more tests can run in parallel. However, ''%%do_regtest%%'' must know about that number, which is done by setting ''%%-maxtasks ${SLURM_NTASKS}%%''. ''%%SLURM_NTASKS%%'' is set automatically by ''%%sbatch%%'' to the number of tasks you specified either on the ''%%sbatch%%'' command line or in the preamble of the ''%%sbatch%%'' script.
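The directory-wise scheduling described above can be sketched in Bash as follows. It is an illustration, not the code of ''do_regtest'': ''run_test'' is a hypothetical stand-in that would launch one test (e.g. via ''srun''), and ''wait -n'' requires Bash 4.3 or newer.

```bash
#!/usr/bin/env bash
# Hypothetical stand-in for launching a single test; a real version would call
# CP2K through srun with the configured ranks and threads.
run_test() {
  sleep 0.1
  echo "done $1"
}

# Launch all *.inp files of each directory in the background; before moving on
# to the next directory, wait until fewer than MAXTASKS tests are still running.
# Usage: run_all_dirs MAXTASKS DIR...
run_all_dirs() {
  local maxtasks="$1"
  shift
  local dir inp
  for dir in "$@"; do
    for inp in "$dir"/*.inp; do
      [ -e "$inp" ] || continue  # skip directories without inputs
      run_test "$inp" &
    done
    # The limit is only checked between directories, so a large directory
    # can briefly exceed it; with short tests this is rare.
    while [ "$(jobs -rp | wc -l)" -ge "$maxtasks" ]; do
      wait -n || true
    done
  done
  wait  # let the remaining tests finish
}
```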
+Append the following to your ''%%sbatch%%'' template and adapt at least the value of ''%%CP2K_BASE_DIR%%'' and possibly also ''%%CP2K_TEST_DIR%%'':
+<code bash>
+CP2K_BASE_DIR="/PATH/TO/YOUR/CP2K/SOURCE/TREE"
+CP2K_TEST_DIR="${SCRATCH}/cp2k_regtesting"
+# CP2K_REGTEST_SCRIPT_DIR=""  # only set if needed (see below)
+CP2K_ARCH="local"
+CP2K_VERSION="psmp"
+# the following is the default, adjust if you want to run single tests with more than 2 ranks/tasks
+NTASKS_SINGLE_TEST=2
+NNODES_SINGLE_TEST=1  # otherwise srun will distribute the 2 tasks over 2 nodes
+SRUN_CMD="srun --cpu-bind=verbose,cores"
+# the following should be sufficiently generic:
+mkdir -p "${CP2K_TEST_DIR}"
+cd "${CP2K_TEST_DIR}"
+cp2k_rel_dir=$(realpath --relative-to="${CP2K_TEST_DIR}" "${CP2K_BASE_DIR}")
+# srun does not like `-np`, override the complete command instead:
+export cp2k_run_prefix="${SRUN_CMD} -N ${NNODES_SINGLE_TEST} -n ${NTASKS_SINGLE_TEST}"
+"${CP2K_REGTEST_SCRIPT_DIR:-${CP2K_BASE_DIR}/tools/regtesting}/do_regtest" \
+  -arch "${CP2K_ARCH}" \
+  -version "${CP2K_VERSION}" \
+  -nobuild \
+  -mpiranks ${NTASKS_SINGLE_TEST} \
+  -ompthreads ${OMP_NUM_THREADS} \
+  -maxtasks ${SLURM_NTASKS} \
+  -cp2kdir "${cp2k_rel_dir}" \
+  |& tee "${CP2K_TEST_DIR}/${CP2K_ARCH}.${CP2K_VERSION}.log"
+# the above will output both to the slurm-*.out as well as a log file,
+# if you want only the log file replace the `|& tee` with a `>&`.
+# More options:
+# -farming   ... enable farming mode, see below
+# -retest    ... only do tests which failed in a previous run
+</code>
+A complete ''%%sbatch%%'' script to run the regtests on CSCS’ Alps (Eiger) could look as follows:
+<code bash>
+#!/bin/bash -l
+#SBATCH --time=01:00:00
+#SBATCH --nodes=4
+#SBATCH --ntasks-per-node=128
+#SBATCH --cpus-per-task=2
+#SBATCH --ntasks-per-core=1
+# More SBATCH options:
+# If you need 512GB memory nodes (otherwise only 256GB guaranteed):
+#    #SBATCH --mem=497G
+# To run on the debug queue (max 10 nodes, 30 min):
+#    #SBATCH --partition=debug
+set -o errexit
+set -o nounset
+set -o pipefail
+export MPICH_OFI_STARTUP_CONNECT=1
+export OMP_NUM_THREADS=${SLURM_CPUS_PER_TASK}
+export OMP_PROC_BIND=close
+export OMP_PLACES=cores
+source "${MODULESHOME}/init/bash"
+module load cpeGNU
+module load \
+    cray-fftw \
+    ELPA/2020.11.001 \
+    libxsmm/1.16.1 \
+    libxc/5.1.3 \
+    Libint-CP2K/2.6.0 \
+    gcc/10.2.0
+# Let the user see the currently loaded modules in the slurm log for completeness:
+module list
+CP2K_BASE_DIR="/users/timuel/work/cp2k"
+CP2K_TEST_DIR="${SCRATCH}/cp2k_regtesting"
+CP2K_ARCH="Eiger-gfortran"
+CP2K_VERSION="psmp"
+NTASKS_SINGLE_TEST=2
+NNODES_SINGLE_TEST=1
+SRUN_CMD="srun --cpu-bind=verbose,cores"
+# to run tests across nodes (to check for communication effects), use:
+# NNODES_SINGLE_TEST=4
+# SRUN_CMD="srun --cpu-bind=verbose,cores --ntasks-per-node 2"
+# the following should be sufficiently generic:
+mkdir -p "${CP2K_TEST_DIR}"
+cd "${CP2K_TEST_DIR}"
+cp2k_rel_dir=$(realpath --relative-to="${CP2K_TEST_DIR}" "${CP2K_BASE_DIR}")
+# srun does not like `-np`, override the complete command instead:
+export cp2k_run_prefix="${SRUN_CMD} -N ${NNODES_SINGLE_TEST} -n ${NTASKS_SINGLE_TEST}"
+"${CP2K_REGTEST_SCRIPT_DIR:-${CP2K_BASE_DIR}/tools/regtesting}/do_regtest" \
+  -arch "${CP2K_ARCH}" \
+  -version "${CP2K_VERSION}" \
+  -nobuild \
+  -mpiranks ${NTASKS_SINGLE_TEST} \
+  -ompthreads ${OMP_NUM_THREADS} \
+  -maxtasks ${SLURM_NTASKS} \
+  -cp2kdir "${cp2k_rel_dir}" \
+  |& tee "${CP2K_TEST_DIR}/${CP2K_ARCH}.${CP2K_VERSION}.log"
+</code>
+==== Test parallelization via Farming ====
+Since the startup time of ''%%srun%%'' can be significant (the ''%%--bcast%%'' option of ''%%srun%%'' might help if the binary is statically linked, as might putting the CP2K directory on a scratch volume), there is the option to use CP2K's //Farming// mode to run all tests within one ''%%srun%%'' and one CP2K instance. This mode is directly supported by ''%%do_regtest%%'', which then dynamically creates the required CP2K input file. The drawback is that if one regtest triggers a ''%%CPABORT%%'' (or another signal), the remaining tests will never run.
+Simply add the option ''%%-farming%%'' to the list of options for the ''%%do_regtest%%'' call to enable this.
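For orientation, the sketch below writes a farming input of the kind ''do_regtest'' generates, with one ''JOB'' per test input. It is an assumption-laden illustration, not the generated file itself: the keyword names (''PROGRAM_NAME FARMING'', ''NGROUPS'', ''DIRECTORY'', ''INPUT_FILE_NAME'') should be checked against the CP2K input reference for your version, and ''write_farming_input'' is a hypothetical name. With a single group, the jobs run one after another inside one CP2K instance, which matches the drawback described above: an aborting test ends the whole run.

```bash
#!/usr/bin/env bash
# Hypothetical generator: write a CP2K farming input that runs each given
# test input as one JOB. Keyword names are assumptions to be checked against
# the CP2K input reference.
# Usage: write_farming_input OUTPUT_FILE TEST_INPUT...
write_farming_input() {
  local out="$1"
  shift
  local inp
  {
    printf '&GLOBAL\n  PROJECT regtest-farming\n  PROGRAM_NAME FARMING\n&END GLOBAL\n'
    printf '&FARMING\n  NGROUPS 1\n'
    for inp in "$@"; do
      # Each job runs in the directory of its input file.
      printf '  &JOB\n    DIRECTORY %s\n    INPUT_FILE_NAME %s\n  &END JOB\n' \
        "$(dirname "$inp")" "$(basename "$inp")"
    done
    printf '&END FARMING\n'
  } > "$out"
}

# Example:
# write_farming_input farming.inp QS/regtest-admm-1/*.inp
```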
+==== Minimal directory setup ====
+If you want to test a precompiled executable, you have to reproduce the following minimal directory layout to run the regtests:
+  * ''%%cp2k-prebuilt/exe/prebuilt/*.psmp%%'' … directory with all the executables
+  * ''%%cp2k-prebuilt/tests%%'' … directory containing the tests (can NOT be a symlink)
+  * ''%%cp2k-prebuilt/data%%'' … directory containing CP2K's data
+For example, if your HPC center uses EasyBuild to provide the CP2K package, the layout could look like this:
+<code>
+cp2k-prebuilt
+├── data -> /apps/eiger/UES/jenkins/1.4.0/software/CP2K/8.1-cpeGNU-21.04/data
+├── exe
+│   └── prebuilt -> /apps/eiger/UES/jenkins/1.4.0/software/CP2K/8.1-cpeGNU-21.04/bin
+└── tests
+</code>
+and then update the variables as follows:
+<code bash>
+CP2K_BASE_DIR="/PATH/TO/THE/MINIMAL/DIR/cp2k-prebuilt"
+CP2K_TEST_DIR="${SCRATCH}/cp2k_regtesting"
+CP2K_REGTEST_SCRIPT_DIR="/PATH/TO/A/FULL/CP2K/DIR/tools/regtesting"
+CP2K_ARCH="prebuilt"
+CP2K_VERSION="psmp"
+</code>
+**Note**: if ''%%tools/regtesting%%'' is not present in the minimal directory tree shown above, you may get an error that ''%%timings.py%%'' was not found, and there will be no timings. If you need them, link or copy the regtesting scripts into ''%%tools/regtesting%%'' of that minimal directory tree, at which point you can leave the ''%%CP2K_REGTEST_SCRIPT_DIR%%'' variable undefined again.
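The minimal layout above can be created with a few commands. The sketch assumes an installation prefix that contains ''bin'' and ''data'' subdirectories, as in the EasyBuild example, plus a CP2K source checkout from which ''tests'' is copied (it must not be a symlink). The function name and arguments are hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical helper: build the minimal directory layout for a prebuilt CP2K.
# Usage: make_prebuilt_tree INSTALL_PREFIX CP2K_SOURCE_DIR DEST_DIR
make_prebuilt_tree() {
  local prefix="$1" src="$2" dest="$3"
  [ -n "$dest" ] || return 1
  mkdir -p "$dest/exe" || return 1
  # Executables and data can be symlinks into the installation.
  ln -sfn "$prefix/bin" "$dest/exe/prebuilt"
  ln -sfn "$prefix/data" "$dest/data"
  # The tests directory has to be a real copy, not a symlink.
  rm -rf "$dest/tests"
  cp -R "$src/tests" "$dest/tests"
}

# Example matching the layout above (then use CP2K_ARCH="prebuilt"):
# make_prebuilt_tree /apps/eiger/UES/jenkins/1.4.0/software/CP2K/8.1-cpeGNU-21.04 "$HOME/cp2k" "$HOME/cp2k-prebuilt"
```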