====== PAO-ML ======
PAO-ML stands for Polarized Atomic Orbitals from Machine Learning. It uses machine learning to generate geometry-adapted small basis sets and also provides exact ionic forces. The scheme can serve as an almost drop-in replacement for conventional basis sets to speed up otherwise standard DFT calculations. The method is similar to semi-empirical models based on minimal basis sets, but offers improved accuracy and quasi-automatic parameterization. However, the method is still at an early stage, so use it with caution. For more information see: [[doi>...]]
===== Step 1: Obtain training structures =====
===== Step 2: Calculate reference data in primary basis =====
Choose a primary basis set, e.g. ''...'', and calculate the reference data for all training structures with it.
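As an illustration, a reference single-point run for one training structure could look like the sketch below. All concrete values (project name, basis and potential names, cutoff) are assumptions for illustration, not settings taken from this page:

<code>
&GLOBAL
  PROJECT training_frame_001      ! illustrative project name
  RUN_TYPE ENERGY
&END GLOBAL
&FORCE_EVAL
  METHOD Quickstep
  &DFT
    BASIS_SET_FILE_NAME BASIS_MOLOPT
    POTENTIAL_FILE_NAME GTH_POTENTIALS
    &MGRID
      CUTOFF 400                  ! illustrative plane-wave cutoff in Ry
    &END MGRID
  &END DFT
  &SUBSYS
    &KIND O
      BASIS_SET DZVP-MOLOPT-GTH  ! primary basis, which PAO-ML will later mimic
      POTENTIAL GTH-PBE-q6
    &END KIND
  &END SUBSYS
&END FORCE_EVAL
</code>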
===== Step 3: Optimize PAO basis for training structures =====
Choose a [[inp>...]] ...

Most of the PAO settings are in the [[inp>...]] section:
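A minimal sketch of the relevant input fragments is shown below. The numerical values are illustrative only; check the CP2K input reference for your version:

<code>
&DFT
  &LS_SCF
    &PAO
      EPS_PAO 1.0E-5   ! convergence threshold for the PAO optimization (illustrative)
      MAX_PAO 5000     ! maximum number of PAO optimization steps (illustrative)
    &END PAO
  &END LS_SCF
&END DFT

! ... and, per atomic kind, the size of the small PAO basis:
&KIND O
  PAO_BASIS_SIZE 4    ! number of PAO basis functions for this kind (illustrative)
&END KIND
</code>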
==== Tuning the PAO Optimization ====
Finding the optimal PAO basis poses an intricate minimization problem, because the rotation matrix U and the Kohn-Sham matrix H have to be optimized in a self-consistent manner. In order to speed up the optimization, the following settings are available:
  * The frequency with which H is recalculated is determined by [[inp>...]].
  * Overshooting during the U optimization is damped via [[inp>...]].
+ | |||
+ | The progress of the PAO optimization can be tracked from lines that start with '' | ||
<code>
 PAO| step ...
</code>
  * The step number counts the number of energy evaluations, i.e. the number of U matrices probed. It can increase in varying intervals, depending on the [[inp>...]] setting.
  * The energy is the quantity that is optimized. It contains **only the first order term** of the total energy, i.e. $Tr[HP]$, but shares the same variational minima. It furthermore contains the contributions from the various regularization terms.
  * The convergence criterion is the norm of the gradient, normalized by system size. It is compared against [[inp>...]].
  * The step length is the outcome of the line search. It should be of order 1. If it starts to behave erratically towards the end of the optimization, ...
  * The time is the time spent on this optimization step in seconds. It can vary with the number of line-search steps performed.
===== Step 4: Optimize machine learning hyper-parameters =====
For the simulation of larger systems the PAO-ML scheme infers new PAO basis sets from the training data. For this, two heuristics are employed: a [[https://...]] ...
In order to obtain good results from the learning machinery, a small number of so-called [[https://...]] hyper-parameters has to be chosen, e.g. the length scale of the ...

No gradient exists for the optimization of the hyper-parameters; hence one has to use a derivative-free method like the one by [[https://...]].
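For orientation, the machine-learning settings live in a subsection of the PAO input. The sketch below is a rough guess at where such hyper-parameters would be set; the keyword names and values are assumptions and must be verified against the CP2K input reference:

<code>
&LS_SCF
  &PAO
    &MACHINE_LEARNING
      METHOD GAUSSIAN_PROCESS   ! assumed keyword and value
      GP_SCALE 0.05             ! length-scale hyper-parameter (assumed name, illustrative value)
      GP_NOISE_VAR 0.001        ! noise-variance hyper-parameter (assumed name, illustrative value)
    &END MACHINE_LEARNING
  &END PAO
&END LS_SCF
</code>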
===== Step 5: Run simulation with PAO-ML =====
Unfortunately, ...

The most critical parameters for learnability are [[inp>...]] ...
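In a production run each atomic kind is then pointed at its trained model. A minimal sketch, assuming the &KIND keyword ''PAO_MODEL_FILE'' and an illustrative model file name (both should be checked against the input reference):

<code>
&KIND O
  BASIS_SET DZVP-MOLOPT-GTH     ! primary basis (illustrative)
  PAO_BASIS_SIZE 4              ! small PAO basis size (illustrative)
  PAO_MODEL_FILE O_model.pao    ! trained ML model for this kind (assumed keyword, illustrative file)
&END KIND
</code>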
howto/pao-ml.txt · Last modified: 2024/01/03 13:19 by oschuett