Profile-guided optimization (PGO) helps generate faster CP2K executables, typically by a few percent. The basic procedure is straightforward with a recent gcc/gfortran (e.g. gcc 4.9.2, as tested below; older versions may not work).
1. In the arch file being used (e.g. local.sopt), add the variable $(PROFOPT) to the FCFLAGS.
FCFLAGS = -I$(CP2KINSTALLDIR)/include -std=f2003 -fimplicit-none -ffree-form -fno-omit-frame-pointer -g -O3 -march=native -ffast-math $(PROFOPT) $(DFLAGS) $(WFLAGS)
2. Clean up any leftovers from previous compilations by removing all relevant directories (i.e. realclean).
make -j ARCH=local VERSION=sopt realclean
3. Build the code with extra instrumentation (this binary is slow and is used only for training).
make -j ARCH=local VERSION=sopt PROFOPT=-fprofile-generate
4. Run the binary either on a specific test case or on the full test suite for good coverage. Only those parts of the code executed during the training run can benefit from PGO. This writes additional profile data files (.gcda) to the obj directory.
make -j ARCH=local VERSION=sopt PROFOPT=-fprofile-generate test
5. Remove the old instrumented object files, retaining the .gcda files (i.e. clean)
make -j ARCH=local VERSION=sopt PROFOPT=-fprofile-use clean
6. Recompile to build the optimized binary using the profile data.
make -j ARCH=local VERSION=sopt PROFOPT=-fprofile-use
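The make invocations in steps 2 to 6 can be collected into one small wrapper script. The following is only a sketch: the helper names `run` and `pgo_build` and the `DRY_RUN` switch are hypothetical and not part of CP2K, and the script assumes it is started from the directory containing the CP2K makefile, with the arch file already edited as in step 1. By default it only prints the commands.

```shell
#!/bin/sh
# Sketch of the PGO build sequence (steps 2-6 above) as a single script.
# Assumptions, not taken from CP2K itself: the helper names run/pgo_build
# and the DRY_RUN switch are hypothetical illustrations.
set -eu

ARCH="${ARCH:-local}"        # arch file name, as in the examples above
VERSION="${VERSION:-sopt}"   # build version, as in the examples above
DRY_RUN="${DRY_RUN:-1}"      # 1 = only print the commands, 0 = execute them

# Print a command and, unless in dry-run mode, execute it.
run() {
    echo "+ $*"
    if [ "$DRY_RUN" != "1" ]; then
        "$@"
    fi
}

pgo_build() {
    # Step 2: remove leftovers from previous compilations.
    run make -j ARCH="$ARCH" VERSION="$VERSION" realclean
    # Step 3: build the instrumented (slow) training binary.
    run make -j ARCH="$ARCH" VERSION="$VERSION" PROFOPT=-fprofile-generate
    # Step 4: training run on the test suite; writes the .gcda files.
    run make -j ARCH="$ARCH" VERSION="$VERSION" PROFOPT=-fprofile-generate test
    # Step 5: remove the instrumented object files, keeping the .gcda files.
    run make -j ARCH="$ARCH" VERSION="$VERSION" PROFOPT=-fprofile-use clean
    # Step 6: rebuild the optimized binary using the profile data.
    run make -j ARCH="$ARCH" VERSION="$VERSION" PROFOPT=-fprofile-use
}

pgo_build
```

Running the script as is lists the five make commands without executing them; setting `DRY_RUN=0` in the environment runs them in order and stops at the first failure because of `set -e`.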