howto:pao-ml

howto:pao-ml [2018/07/13 13:04] oschuett
===== Step 2: Calculate reference data in primary basis =====
  
Choose a primary basis set, e.g. ''DZVP-MOLOPT-GTH'', and perform a full [[inp>CP2K_INPUT/FORCE_EVAL/DFT/LS_SCF|LS_SCF]] optimization. You should also enable [[inp>FORCE_EVAL/DFT/LS_SCF#RESTART_WRITE|RESTART_WRITE]] to save the final density matrix; it can be used to speed up the next step significantly.
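A minimal input fragment for such a reference run might look as follows (a sketch only: the ''EPS_FILTER'' value is illustrative and all remaining DFT settings are omitted):

<code>
&FORCE_EVAL
  &DFT
    &QS
      LS_SCF              ! use the linear-scaling SCF code
    &END QS
    &LS_SCF
      EPS_FILTER 1.0E-7   ! illustrative filtering threshold
      RESTART_WRITE       ! save the final density matrix for the next step
    &END LS_SCF
  &END DFT
&END FORCE_EVAL
</code>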
  
===== Step 3: Optimize PAO basis for training structures =====
  
Choose a [[inp>FORCE_EVAL/SUBSYS/KIND#PAO_BASIS_SIZE|PAO_BASIS_SIZE]] for each atomic kind. Good results can already be obtained with a minimal basis set. Slightly larger-than-minimal PAO basis sets can significantly increase the accuracy; however, they are also tougher to optimize and machine-learn.
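For a water system, for instance, minimal PAO basis sizes could be assigned like this (a sketch; the sizes, basis set, and potential names are merely illustrative):

<code>
&SUBSYS
  &KIND O
    BASIS_SET DZVP-MOLOPT-GTH
    POTENTIAL GTH-PBE
    PAO_BASIS_SIZE 4    ! minimal for O: 2s + 2p
  &END KIND
  &KIND H
    BASIS_SET DZVP-MOLOPT-GTH
    POTENTIAL GTH-PBE
    PAO_BASIS_SIZE 1    ! minimal for H: 1s
  &END KIND
&END SUBSYS
</code>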
  
Most of the PAO settings are in the [[inp>FORCE_EVAL/DFT/LS_SCF/PAO|PAO]] section:
For the simulation of larger systems the PAO-ML scheme infers new PAO basis sets from the training data. For this, two heuristics are employed: a [[https://en.wikipedia.org/wiki/Feature_(machine_learning)|descriptor]] and an inference algorithm. Currently, only one simple descriptor and [[https://en.wikipedia.org/wiki/Gaussian_process|Gaussian processes]] are implemented. However, this part offers great opportunities for future research.
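The inference step can be illustrated with a toy Gaussian-process regression in plain Python (a sketch under simplifying assumptions: scalar descriptors, a squared-exponential kernel whose width plays a role analogous to ''GP_SCALE'', and made-up training values — the real implementation infers whole PAO parameter blocks):

```python
import math

def rbf(x1, x2, scale):
    # squared-exponential kernel; "scale" is the kernel width hyperparameter
    return math.exp(-((x1 - x2) ** 2) / (2.0 * scale ** 2))

def solve(A, b):
    # Gaussian elimination with partial pivoting, fine for small systems
    n = len(A)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for i in range(n):
        p = max(range(i, n), key=lambda r: abs(M[r][i]))
        M[i], M[p] = M[p], M[i]
        for r in range(i + 1, n):
            f = M[r][i] / M[i][i]
            for c in range(i, n + 1):
                M[r][c] -= f * M[i][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (M[i][n] - sum(M[i][c] * x[c] for c in range(i + 1, n))) / M[i][i]
    return x

def gp_predict(xs, ys, x_new, scale, noise=1e-8):
    # solve (K + noise*I) alpha = y, then predict k(x_new)^T alpha
    K = [[rbf(a, b, scale) + (noise if i == j else 0.0)
          for j, b in enumerate(xs)] for i, a in enumerate(xs)]
    alpha = solve(K, ys)
    return sum(rbf(x_new, a, scale) * w for a, w in zip(xs, alpha))

# made-up descriptors of three training structures and one target value each
xs = [0.0, 1.0, 2.0]
ys = [0.0, 1.0, 0.0]
print(gp_predict(xs, ys, 1.0, scale=1.0))  # interpolates the training point
```

With near-zero noise the GP interpolates the training data; the kernel width controls how far the influence of each training structure reaches, which is why this hyperparameter must be tuned carefully.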
  
In order to obtain good results from the learning machinery, a small number of so-called [[https://en.wikipedia.org/wiki/Hyperparameter|hyperparameters]] has to be carefully tuned for each application. For the current implementation this includes [[inp>FORCE_EVAL/DFT/LS_SCF/PAO/MACHINE_LEARNING#GP_SCALE|GP_SCALE]] and the descriptor's [[inp>FORCE_EVAL/SUBSYS/KIND/PAO_DESCRIPTOR#BETA|BETA]] and [[inp>FORCE_EVAL/SUBSYS/KIND/PAO_DESCRIPTOR#SCREENING|SCREENING]].
  
No gradient exists for the hyperparameter optimization, hence one has to use a derivative-free method like [[https://en.wikipedia.org/wiki/Powell%27s_method|Powell's]]. A versatile implementation is e.g. the [[src>cp2k/tools/scriptmini/|scriptmini]] tool. A good optimization criterion is the variance of the energy difference w.r.t. the primary basis across the training set. Alternatively, atomic forces could be compared. Despite the missing gradients, this optimization is rather quick because it only performs calculations in the small PAO basis set.
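The derivative-free search can be sketched as follows (a toy example, not the actual scriptmini code: a cyclic coordinate search with golden-section line searches stands in for Powell's method, and the objective is a made-up stand-in for the energy-difference variance):

```python
def mock_energy_error_variance(params):
    # hypothetical objective: stands in for the variance of the PAO-ML vs.
    # primary-basis energy difference across the training set
    gp_scale, beta = params
    return (gp_scale - 0.8) ** 2 + 2.0 * (beta - 1.5) ** 2

def line_search(f, x, i, lo=-5.0, hi=5.0, iters=60):
    # golden-section search along coordinate i -- no derivatives needed
    g = (5 ** 0.5 - 1) / 2
    a, b = lo, hi
    for _ in range(iters):
        c, d = b - g * (b - a), a + g * (b - a)
        if f(x[:i] + [c] + x[i+1:]) < f(x[:i] + [d] + x[i+1:]):
            b = d
        else:
            a = c
    x[i] = (a + b) / 2
    return x

def coordinate_descent(f, x0, sweeps=5):
    # cycle through the coordinates, minimizing one at a time
    x = list(x0)
    for _ in range(sweeps):
        for i in range(len(x)):
            x = line_search(f, x, i)
    return x

best = coordinate_descent(mock_energy_error_variance, [0.0, 0.0])
print(best)  # approaches the minimum at (0.8, 1.5)
```

Each objective evaluation corresponds to one pass over the training set in the small PAO basis, which is why even many function evaluations remain affordable.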
Unfortunately, there is not yet a simple way to assess learnability. One way to investigate is to create a set of structures along a reaction coordinate, e.g. a dimer dissociation. One can then plot the numbers from the ''Xblock'' in the ''.pao'' files vs. the reaction coordinate.
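Extracting those numbers could be sketched like this (purely hypothetical: it assumes ''Xblock'' lines of the form ''Xblock <atom-index> <value> …'' — check the actual layout of your ''.pao'' files before relying on it):

```python
def read_xblocks(text):
    # hypothetical parser: collects per-atom Xblock values from a .pao file,
    # assuming lines of the form "Xblock <atom-index> <value> <value> ..."
    blocks = {}
    for line in text.splitlines():
        parts = line.split()
        if parts and parts[0] == "Xblock":
            blocks.setdefault(int(parts[1]), []).extend(
                float(v) for v in parts[2:])
    return blocks

# collect the values for one structure; repeating this over all files along
# the reaction coordinate yields the data to plot
sample = "Xblock 1 0.10 0.20\nXblock 2 0.30\nXblock 1 0.40\nother line"
print(read_xblocks(sample))
```

Smooth curves of these values along the reaction coordinate suggest that the mapping from structure to PAO parameters is learnable; jumps or discontinuities indicate trouble.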
  
The most critical parameters for learnability are [[inp>FORCE_EVAL/DFT/LS_SCF/PAO#LINPOT_REGULARIZATION_DELTA|LINPOT_REGULARIZATION_DELTA]] and the potential's [[inp>FORCE_EVAL/SUBSYS/KIND/PAO_POTENTIAL#BETA|BETA]].
howto/pao-ml.txt · Last modified: 2024/01/03 13:19 by oschuett