====== Profile guided optimization for CP2K ====== 
Using profile guided optimization (PGO) helps to generate faster CP2K executables, typically by a few percent. The basic procedure is straightforward with a recent gcc/gfortran. The steps below were tested with gcc 4.9.2, and older versions may not work.
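Because older compilers may not work, it can help to check the installed gfortran version before starting. The following is a minimal sketch that assumes a POSIX shell and GNU sort (for its -V version-sort option). The helper name version_ge is hypothetical. Passing the check does not guarantee that PGO will work. It only flags versions older than the tested 4.9.2.

<code bash>
#!/bin/sh
# Hypothetical helper: succeeds if version $1 >= version $2 (relies on GNU sort -V).
version_ge() {
    [ "$(printf '%s\n%s\n' "$2" "$1" | sort -V | head -n 1)" = "$2" ]
}

# -dumpfullversion exists from GCC 7 on; older releases only understand -dumpversion.
fc_version=$(gfortran -dumpfullversion 2>/dev/null || gfortran -dumpversion 2>/dev/null)

if [ -z "$fc_version" ]; then
    echo "gfortran not found" >&2
elif version_ge "$fc_version" "4.9.2"; then
    echo "gfortran $fc_version: at least the tested 4.9.2"
else
    echo "gfortran $fc_version is older than 4.9.2; PGO may not work" >&2
fi
</code>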
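1. Add the variable $(PROFOPT) to FCFLAGS in the arch file you use (e.g. local.sopt). If PROFOPT is not set on the make command line, it expands to nothing, so normal builds are unaffected. Because gcc needs -fprofile-generate both when compiling and when linking, make sure the flag also reaches the link step. For example, if LDFLAGS does not already include $(FCFLAGS), add $(PROFOPT) to LDFLAGS as well.
<code>
FCFLAGS  = -I$(CP2KINSTALLDIR)/include -std=f2003 -fimplicit-none -ffree-form -fno-omit-frame-pointer -g -O3 -march=native -ffast-math $(PROFOPT) $(DFLAGS) $(WFLAGS)
</code>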
2. Remove any leftovers from previous compilations by deleting all build directories (realclean):
<code>
make -j ARCH=local VERSION=sopt realclean
</code>
3. Build the code with extra instrumentation. This binary is slow and is used only for the training run.
<code>
make -j ARCH=local VERSION=sopt PROFOPT=-fprofile-generate
</code>
4. Run the binary either on a specific test case or on the full test suite (do_regtest) for good coverage. Only the parts of the code that are executed during the training run can benefit from PGO. The run writes the profile data as .gcda files in the obj directory.
<code>
../../exe/local/cp2k.sopt -i test.inp -o test.out
</code>
5. Remove the instrumented object files but keep the .gcda files (clean):
<code>
make -j ARCH=local VERSION=sopt clean
</code>
6. Recompile to build the optimized binary using the profile data. 
<code>
make -j ARCH=local VERSION=sopt PROFOPT=-fprofile-use
</code>
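Steps 2 to 6 can be collected into a single shell function, sketched below. This is not part of CP2K. The function name pgo_build, the variables ARCH, VERSION, TRAIN_CMD and DRY_RUN, and their defaults are assumptions chosen to match the commands above. The sketch also assumes that it runs from the directory where those make commands work and that "make clean" keeps the .gcda files, as described in step 5.

<code bash>
#!/bin/sh
# Hypothetical PGO driver for the procedure above (not shipped with CP2K).
# Settings (all illustrative): ARCH, VERSION, TRAIN_CMD, DRY_RUN=1 to only print.

# Print a command, then run it unless DRY_RUN=1; abort the procedure on failure.
run() {
    echo "+ $*"
    if [ "${DRY_RUN:-0}" != 1 ]; then
        "$@" || { echo "failed: $*" >&2; exit 1; }
    fi
}

# The body is a subshell, so "exit" in run() stops only this function.
pgo_build() (
    arch=${ARCH:-local}
    version=${VERSION:-sopt}
    train_cmd="${TRAIN_CMD:-../../exe/$arch/cp2k.$version -i test.inp -o test.out}"

    # Step 2: start from a clean tree so no stale objects are reused.
    run make -j ARCH="$arch" VERSION="$version" realclean
    # Step 3: instrumented (slow) training binary.
    run make -j ARCH="$arch" VERSION="$version" PROFOPT=-fprofile-generate
    # Step 4: training run; writes the .gcda profile files.
    run sh -c "$train_cmd"
    # Step 5: drop the instrumented objects, keep the .gcda files.
    run make -j ARCH="$arch" VERSION="$version" clean
    # Step 6: optimized build that uses the collected profiles.
    run make -j ARCH="$arch" VERSION="$version" PROFOPT=-fprofile-use
)
</code>

For example, after sourcing the file, "DRY_RUN=1 pgo_build" prints the five commands without running them. To run the full procedure on a training input of your choice, call "TRAIN_CMD='../../exe/local/cp2k.sopt -i my.inp -o my.out' pgo_build", where my.inp is a placeholder. Because the function body runs in a subshell, a failing step stops the procedure without closing the calling shell.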
