How to Train a Neural Network Interatomic Potential Using Allegro and Perform Molecular Dynamics with CP2K
This Colab tutorial illustrates how to train an equivariant neural network interatomic potential for bulk water using the Allegro framework. You will learn how to train a model, deploy it in production, and run molecular dynamics simulations in CP2K. Both training and inference are carried out on the GPU provided by the Colab environment.
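As a rough sketch, the training and deployment steps follow the nequip command-line tools; the YAML file name and output paths below are placeholders, and the exact commands and options may differ between nequip/allegro versions, so the Colab notebook and the nequip documentation remain the reference:

```bash
# Install the frameworks (Allegro builds on top of nequip)
pip install nequip
pip install git+https://github.com/mir-group/allegro.git

# Train a model from a YAML configuration describing the dataset and architecture
# (allegro-water.yaml is a placeholder name)
nequip-train allegro-water.yaml

# Deploy the trained model as a self-contained TorchScript file usable from CP2K
nequip-deploy build --train-dir results/allegro-water/ water-deployed.pth
```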
Allegro is designed for constructing highly accurate and scalable interatomic potentials for molecular dynamics simulations. The methodology is described in detail in this paper (10.1038/s41467-023-36329-y). An open-source package (https://github.com/mir-group/allegro) that implements Allegro, built on the NequIP framework (https://github.com/mir-group/nequip), was developed by the Allegro and NequIP authors: A. Musaelian, S. Batzner, A. Johansson, L. Sun, C. J. Owen, M. Kornbluth, and B. Kozinsky.
Input Section
Inference in CP2K is performed through Fist, the MM package of CP2K. As an example, the relevant input section for Allegro (or, analogously, for NequIP) is:
- Allegro_si_MD.inp
&ALLEGRO
  ATOMS Si
  PARM_FILE_NAME Allegro/si-deployed.pth
  UNIT_COORDS angstrom
  UNIT_ENERGY eV
  UNIT_FORCES eV*angstrom^-1
&END ALLEGRO
where si-deployed.pth refers to the PyTorch model that was deployed using the Allegro framework, and the UNIT tags refer to the units of the coordinates, energy, and forces used by the model itself. An example of the full input file can be found in the Colab tutorial and in the regtests, see tests/Fist/regtest-allegro/Allegro_si_MD.inp.
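For orientation, a minimal skeleton showing where the &ALLEGRO section sits in the input hierarchy could look as follows. This is only a sketch: cell size, coordinate file, and MD settings are placeholder values, and additional sections (e.g. &SPLINE or per-kind &CHARGE entries) may be required depending on the CP2K version, so the regtest input mentioned above should be taken as the reference.

```
&GLOBAL
  PROJECT Allegro_si_MD
  RUN_TYPE MD
&END GLOBAL
&FORCE_EVAL
  METHOD FIST
  &MM
    &FORCEFIELD
      &NONBONDED
        &ALLEGRO
          ATOMS Si
          PARM_FILE_NAME Allegro/si-deployed.pth
          UNIT_COORDS angstrom
          UNIT_ENERGY eV
          UNIT_FORCES eV*angstrom^-1
        &END ALLEGRO
      &END NONBONDED
    &END FORCEFIELD
    &POISSON
      &EWALD
        EWALD_TYPE none
      &END EWALD
    &END POISSON
  &END MM
  &SUBSYS
    &CELL
      ABC 10.86 10.86 10.86
    &END CELL
    &TOPOLOGY
      COORD_FILE_NAME ./si.xyz
      COORD_FILE_FORMAT XYZ
    &END TOPOLOGY
  &END SUBSYS
&END FORCE_EVAL
&MOTION
  &MD
    ENSEMBLE NVE
    STEPS 100
    TIMESTEP 0.5
    TEMPERATURE 300
  &END MD
&END MOTION
```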
Input Details
The tag ATOMS expects a list of elements/kinds in an order that is consistent with the YAML file used by NequIP and Allegro; otherwise unphysical results will be obtained. Likewise, the atomic coordinates themselves, given in &COORD…&END or in &TOPOLOGY…&END, have to be provided in a way that is consistent with the YAML file. Spotting such issues is straightforward, as the energy is significantly wrong. For example, inverting the order of one of the elements in the test regtest-nequip/NequIP_water.inp yields an error of the order of 1 eV with respect to the reference value, and running MD rapidly leads to highly unstable simulations.
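To make the ordering requirement concrete, here is a hypothetical pairing for a water model trained on H and O. The chemical_symbols key shown in the comment is commonly used in nequip/allegro YAML configurations, but the exact key name may differ between versions; the file name is a placeholder.

```
# In the training YAML, the elements are listed in the order H, O:
#   chemical_symbols:
#     - H
#     - O
#
# The ATOMS tag must then list the kinds in the same order:
&ALLEGRO
  ATOMS H O
  PARM_FILE_NAME water-deployed.pth
  UNIT_COORDS angstrom
  UNIT_ENERGY eV
  UNIT_FORCES eV*angstrom^-1
&END ALLEGRO
```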
Compiling CP2K with LibTorch
Running with NequIP or Allegro requires compiling CP2K with the libtorch library. For CP2K binaries running on CPUs, installing the toolchain with the flag --with-libtorch is sufficient. To benefit from (often significant) GPU acceleration, the precompiled libtorch library for CUDA can be obtained from https://pytorch.org/, for example for CUDA 11.8:
wget https://download.pytorch.org/libtorch/cu118/libtorch-cxx11-abi-shared-with-deps-2.0.0%2Bcu118.zip
After extracting the libtorch CUDA binaries, the toolchain script ./install_cp2k_toolchain.sh can be run, providing the appropriate path with the flag --with-libtorch=<path-to-libtorch-cuda>.
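The end-to-end steps could look as follows; this is a minimal sketch in which the extraction directory is a placeholder, and additional toolchain flags may be needed for your system:

```bash
# Extract the CUDA build of libtorch (the archive unpacks into a libtorch/ directory)
unzip libtorch-cxx11-abi-shared-with-deps-2.0.0+cu118.zip -d /opt

# Run the CP2K toolchain from the CP2K source tree, pointing it at the extracted libtorch
cd cp2k/tools/toolchain
./install_cp2k_toolchain.sh --with-libtorch=/opt/libtorch

# Then follow the toolchain's final instructions (copy the generated arch files
# and compile CP2K as usual).
```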
Further Resources
For additional references on NequIP, Allegro, and equivariant neural networks (e3nn), see:
- Allegro paper 10.1038/s41467-023-36329-y and code https://github.com/mir-group/allegro
- NequIP paper 10.1038/s41467-022-29939-5 and code https://github.com/mir-group/nequip
- A LAMMPS tutorial by the NequIP/Allegro authors is available as a Colab notebook here
- For an introduction to e3nn, see 10.5281/zenodo.7430260