This is an old revision of the document!

Protein Folding in Solution

In this exercise, you will calculate the protein folding free energy using thermodynamic integration, a method based on molecular dynamics (MD). The protein will be described by the empirical force field, CHARMM, http://mackerell.umaryland.edu/charmm_ff.shtml

Background

A model protein you will have to deal with is the alanine decapeptide. The folding/unfolding will be achieved by stretching/compressing the chain. This in practice is achieved by constraining the distance between the end carbon atoms in the protein and performing different simulations for each value of the distance: the atoms are the number 7 and 98. This distance is called a collective variable. At each distance one runs the MD simulation (constrained MD) to extract the time-averaged forces acting on the collective variable, $F(x)$. Then, a free energy difference can be calculated via thermodynamic integration (TI):

\begin{equation} \Delta A = -\int_a^b F(x)\, dx \end{equation}

Here $a$ and $b$ are the initial and the final values of the collective variable. TI is a general method, which can be applied to a variety of processes, e.g. phase transitions, electron transfer etc.

Task 1: Familiarize yourself

Download the files: deca_ala_ex3.tar.gz

in the directory “deca_ala” you will find

deca_ala.pdb (protein data base) file contains the coordinates

deca_ala.psf (protein structure file) file contains connectivity data

par_all27_prot_lipid.inp contains the force field parameters

md_std.inp is the CP2K input file

Open the deca_ala.pdb protein data bank format file with vmd. Create a new representation for the protein, e.g. of type Ribbon to observe the alpha-helix.

Although the image below shows the deca-alanine in water, it is expensive to run thermodynamic integration for a solvated protein with many values of the constraints (here we choose 5 to 10) on small laptops. So we will run TI for the protein in the gas-phase.

Task 2: Perform constrained MD simulations

To perform thermodynamic integration one has to run MD for different values of the distance between atoms 7 and 98, in each run it will be constrained. In the original file md_std.inp it is set to $18.36$ Å as is in the deca_ala.pdb file.

We have made this process automatic. To run TI for different values of the constraint, execute the script “run_ti_jobs.sh” that you find inside the compressed file “deca_ala.tar.gz”. Take a look at the script and familiarize yourself with it. At which values are we constraining the distances between the carbon atoms? In this case we are performing 5 different simulations, each with a different value of the constraint. Feel free to use a larger or smaller number of constraints and to increase or reduce the upper and/or lower bound.

run_ti_jobs.sh

#!/bin/bash -l
for d in $(seq 16 1 20); do
  mkdir $d
  cp deca_ala/* $d/.
  cd $d
  sed -e "s|18.36|${d}|" md_std.inp > d_${d}.inp
  cp2k.sopt -i d_${d}.inp -o d_${d}.out 
  cd ..
done

Be careful with the values chosen for the upper and lower bound of the constraints as the simulations might crash or the SHAKE algorithm for the computation of the constraints might not converge if the values of the constrained distances are unphysical.
We have set the number of step of each constrained MD to 200000, to get good statistics. Try to increase this number if you want to achieve better statistics or to decrease it to get the results faster, at the expense of a better converged free energy.

Relevant constraint section for constrained MD

constraint section

 &CONSTRAINT
    &COLLECTIVE
      COLVAR 1
      INTERMOLECULAR
      TARGET [angstrom] 18.36
    &END COLLECTIVE
    &LAGRANGE_MULTIPLIERS
      COMMON_ITERATION_LEVELS 1
    &END
 &END CONSTRAINT

Task 3: Evaluate the free energy difference

⇒ Each constrained MD will produce a .LagrangeMultLog-files, which look like this:

Shake  Lagrangian Multipliers:           -63.547262596
Rattle Lagrangian Multipliers:            63.240598387
Shake  Lagrangian Multipliers:            -0.326901815
Rattle Lagrangian Multipliers:            -0.318145579

From these files you can calculate the average Lagrange multiplier of the Shake-algorithm like this:

grep Shake yourprojectname.LagrangeMultLog | awk '{c++ ; s=s+$4}END{print s/c}'

The average Lagrange multiplier is the average force $F(x)$ required to constrain the atoms at the distance $x$.
From these forces the free energy difference can be obtained via thermodynamic integration between the two states. Given that state $a$ and $b$ are the initial and the final values of the collective variable, extract the free energy difference from

\begin{equation} \Delta A = -\int_a^b F(x)\, dx \end{equation}

Also, extract the free energy profile as a function of the collective variable $x^\prime$. To do this, one can choose the minimum value of the collective variable, as the lower bound of the integral and plot the free energy as a function of $x^\prime$, the upper bound of the integral, according to the same relation

\begin{equation} \Delta A(x^\prime) = -\int_a^{x^\prime} F(x)\, dx \end{equation}

Discuss the form of the free energy profile and comment on what is the most stable state of the protein. Is this result physical? Explain why or why not. How can the presence of water affect the conformation of a protein?

* We have provided you with a useful script called “generate_plots.sh” that extracts the average force for each constrained MD simulation, and it prints out the file “force_vs_x.dat” containing the force as a function of the collective variable. Take a look at the script and modify it if necessary, e.g. if you have changed the lower and upper bound for the constraint or if you have changed the number of constraints.

* From the file containing the average force as a function of collective variable you need to integrate $F(x) dx$ numerically to obtain $\Delta A$. You may use the trapezoidal rule (or equivalent) with EXCEL, ORIGIN or any scripting language.

Make sure that you get the units right when performing the integration. The Largange multipliers are written in atomic units (Hartree/bohr), while the distances are in Angstrom.

CP2K Open Source Molecular Dynamics

Table of Contents