Differences

This shows you the differences between two versions of the page.

--- exercises:2018_uzh_acpc2:prot_fol [2018/05/18 10:05] – gtocci
+++ exercises:2018_uzh_acpc2:prot_fol [2020/08/21 10:15] (current) – external edit 127.0.0.1
@@ Line 1: / Line 1: @@
 ====== Protein Folding in Solution ======
-In this exercise, you will calculate the protein folding free energy using thermodynamic integration, a method based on molecular dynamics (MD). The protein will be described by the empirical force field, CHARMM, [[http://mackerell.umaryland.edu/charmm_ff.shtml]]
+In this exercise, you will calculate the protein folding free energy using thermodynamic integration, a method based on molecular dynamics (MD). The protein will be described by the empirical force field, CHARMM22, [[http://mackerell.umaryland.edu/charmm_ff.shtml]]
 ===== Background =====
-A model protein you will have to deal with is the alanine decapeptide. The folding/unfolding will be achieved by stretching/compressing the chain. This in practice is achieved by constraining the distance between the end carbon atoms in the protein and performing different simulations for each value of the distance: the atoms are the number 7 and 98. This distance is called a collective variable. At each distance one runs the MD simulation (constrained MD) to extract the time-averaged forces acting on the collective variable, $F(x)$. Then, a free energy difference can be calculated via thermodynamic integration (TI):
+A model protein you will have to deal with is the alanine decapeptide. The folding/unfolding will be achieved by stretching/compressing the chain. This in practice can be achieved by constraining the distance between the end carbon atoms in the protein and performing different simulations for each value of the distance. The atoms are the number 11 and 91, which are the carbon atoms of the carbonyl groups at the edges of the protein. The distance between these atoms is the collective variable chosen for the system. At each distance one runs a constrained MD to extract the time-averaged forces acting on the collective variable, $F(x)$. Then, a free energy difference can be calculated via thermodynamic integration (TI):
 \begin{equation}
@@ Line 14: / Line 14: @@
 Download the files: {{ :exercises:2018_uzh_acpc2:deca_ala_ex3.tar.gz |}}
-in the directory "deca_ala" you will find
+in the directory ''deca_ala'' you will find
 ''deca_ala.pdb'' (protein data base) file contains the coordinates
@@ Line 20: / Line 20: @@
 ''deca_ala.psf'' (protein structure file) file contains connectivity data
-''par_all27_prot_lipid.inp'' contains the force field parameters
+''par_all27_prot_lipid.inp'' contains the force field parameters. You will be using the CHARMM v.22, a popular force field for biologically relevant systems.
 ''md_std.inp'' is the CP2K input file
@@ Line 27: / Line 27: @@
 Although the image below shows the deca-alanine in water, it is expensive to run thermodynamic integration
- for a solvated protein with many values  of the constraints (here we choose 5 to 10) on small laptops. So we will run TI for the protein in the gas-phase.
+ for a solvated protein with many values of the constraints on small laptops. So we will run TI for the protein in the gas-phase.
 {{ :exercises:2017_uzh_acpc2:deca_ala.gif?400 |}}
 ===== Task 2: Perform constrained MD simulations =====
-To perform thermodynamic integration one has to run MD for different values of the distance between atoms 7 and 98, in each run it will be constrained. In the original file ''md_std.inp'' it is set to $18.36$ Å as is in the ''deca_ala.pdb'' file.
+Here you are asked to run several MD simulations for different values of the distance between atoms 11 and 91, in each run it will be constrained. In the original file ''md_std.inp'' the distance is set to $14.37$ Å as is in the ''deca_ala.pdb'' file. This is the first step to carry out the termodynamic integration, as described in the equation above.
-We have made this process automatic. To run TI for different values of the constraint, execute the script "run_ti_jobs.sh" that you find inside the compressed file "deca_ala.tar.gz". Take a look at the script and familiarize yourself with it. At which values are we constraining the distances between the carbon atoms? In this case we are performing 5 different simulations, each with a different value of the constraint. Feel free to use a larger or smaller number of constraints and to increase or reduce the upper and/or lower bound.
+We have made the script ''run_ti_jobs.sh'' to run these simulations, which you can find inside the compressed file ''deca_ala.tar.gz''. Take a look at the script and familiarize yourself with it. At which values are we constraining the distances between the carbon atoms? In this case we are performing 5 different simulations, each with a different value of the constraint. You can edit this script to use a larger or smaller number of constraints and to increase or reduce the upper and/or lower bound of integration. Can you guess where in the script we are specifying the values of the constraints?
 <code - run_ti_jobs.sh >
@@ Line 42: / Line 42: @@
   cp deca_ala/* $d/.
   cd $d
-  sed -e "s|18.36|${d}|" md_std.inp > d_${d}.inp
+  sed -e "s|14.37|${d}|" md_std.inp > d_${d}.inp
   cp2k.sopt -i d_${d}.inp -o d_${d}.out
   cd ..
@@ Line 50: / Line 50: @@
 <note tip>
   * Be careful with the values chosen for the upper and lower bound of the constraints as the simulations might crash or the SHAKE algorithm for the computation of the constraints might not converge if the values of the constrained distances are unphysical.
-  * We have set the number of step of each constrained MD to 200000, to get good statistics. Try to increase this number if you want to achieve better statistics or to decrease it to get the results faster, at the expense of a better converged free energy.
+  * We have set the number of steps of each constrained MD to 5000. Try to increase this number if you want to achieve better statistics or to decrease it to get the results faster, at the expense of a more converged free energy.
 </note>
-==== Relevant constraint section for constrained MD ====
+==== Relevant constraint sections for constrained MD ====
+Look into the main input file of cp2k ''md_std.inp'', and try to understand the keywords used as much as possible, by now you should be able to understand most of it, and you can experiment changing some of the keywords to see what happens. Look in particular at the definition of the section ''CONSTRAINT'' where the target value of the distance between the two carbon atoms at the edges of the proteins are constrained for instance to 14.37, and at the ''COLVAR'' section where the the collective variable for the distance between the two C atoms is defined.
 <code - constraint section>
  &CONSTRAINT
@@ Line 59: / Line 62: @@
       COLVAR 1
       INTERMOLECULAR
-      TARGET [angstrom] 18.36
+      TARGET [angstrom] 14.37
     &END COLLECTIVE
     &LAGRANGE_MULTIPLIERS
@@ Line 65: / Line 68: @@
     &END
  &END CONSTRAINT
+</code>
+<code - colvar section>
+    &COLVAR
+      &DISTANCE
+        ATOMS 11 91
+      &END DISTANCE
+    &END
 </code>
@@ Line 77: / Line 88: @@
-  * From these files you can calculate the average Lagrange multiplier of the Shake-algorithm like this:
+  * From these files you can calculate the average Lagrange multiplier of the Shake-algorithm and the standard error like this:
 <code>
-grep Shake yourprojectname.LagrangeMultLog | awk '{c++ ; s=s+$4}END{print s/c}'
+grep Shake yourprojectname.LagrangeMultLog | awk '{c++ ; s=s+$4; sq=sq+$4*$4 }END{print s/c, sqrt(sq/c - s*s/(c*c))/(sqrt(c)) }'
 </code>
-  * The average Lagrange multiplier is the average force $F(x)$ required to constrain the atoms at the distance $x$.
+  * The average Lagrange multiplier is the average force $F(x)$ required to constrain the atoms at the distance $x$. First of all, plot the force $F(x)$ with its standard error as a function of the collective variable to see if the simulation carried out so far is statistically relevant or the relative error is too large.
-  * From these forces the free energy difference can be obtained via thermodynamic integration between the two states. Given that state $a$ and $b$ are the initial and the final values of the collective variable, extract the free energy difference from
+  * From the forces, the free energy difference can be obtained via thermodynamic integration between the two states. Given that state $a$ and $b$ are the initial and the final values of the collective variable, extract the free energy difference from
 \begin{equation}
@@ Line 95: / Line 106: @@
 \end{equation}
-  * Discuss the form of the free energy profile and comment on what is the most stable state of the protein. Is this result physical? Explain why or why not. How can the presence of water affect the conformation of a protein?
+  * Discuss the form of the free energy profile and comment on what is the most stable state of the protein. Is it more stable when it is stretched or when it is in the $\alpha$-helix conformation? Is this result physical? Explain why or why not. How can the presence of water affect the conformation of the protein?
+  *__Tip 1__:  the most stable state will be that where the free energy is at the global minimum.
-<note tip>
+  *__Tip 2__: In order to understand whether the result obtained from thermodynamic integration is physical or not, have a look at the ''.xyz'' files for some of the constrained MD trajectories and think about what are the fundamental interactions between the constituents of the protein that we are taking into account with the CHARMM force field (e.g. electrostatic, van-der Waals, covalent bonds) and how these may contribute to the stabilization of the protein in a given state.
-* We have provided you with a useful script called "generate_plots.sh" that extracts the average force for each constrained MD simulation, and it prints out the file "force_vs_x.dat" containing the force as a function of the collective variable. Take a look at the script and modify it if necessary, e.g. if you have changed the lower and upper bound for the constraint or if you have changed the number of constraints.
+  * The two articles at the links below show how the free energy profile should look like, using thermodynamic integration or a different enhanced sampling method. Compare the free energy profile obtained from your simulations to either of those papers. Most likely, the free energy profile you obtained will not be as converged as theirs. What are some possible reasons for this, and how can one obtain better converged free energy profiles?
+  * __Paper 1__: [[ https://arxiv.org/pdf/0711.2726.pdf |  https://arxiv.org/pdf/0711.2726.pdf ]] see figure 2, solid line obtained with thermodynamic integration, using the same force field (CHARMM v.22) used here. This paper however, uses a different collective variable, i.e. the distance between the N-atoms at the opposite edges.
+  * __Paper 2__: [[ https://pubs.acs.org/doi/pdf/10.1021/ct5002076 | https://pubs.acs.org/doi/pdf/10.1021/ct5002076 ]] see figure 1, obtained with umbrella sampling and adaptive bias force sampling, for two versions of the CHARMM force field, v.22 and v.36. The collective variable in this case is the same as the one specified in our input.
+  * Finally, in principle we could have performed a direct MD simulation (as we did in the past exercises) to compute the free energy profile as a function of the distance between two of the atoms at the opposite edges of the protein (the collective variable we chose for this particular problem). Instead, we chose to perform an enhanced simulation technique. Can you think of a problem we would face if we had decided to perform a direct MD simulation? What could be a possible way to overcome this problem?
-* From the file containing the average force as a function of collective variable you need to integrate $F(x) dx$ numerically to obtain $\Delta A$. You may use the trapezoidal rule (or equivalent) with EXCEL, ORIGIN or any scripting language.
+<note tip>
+  *  We have provided you with a useful script called ''generate_plots.sh'' that extracts the average force and the standard error for each constrained MD simulation (see the ''grep'' command line above), and it prints out the file ''av_force_vs_x.dat'' containing the force as a function of the collective variable, and the error on the force (third column). Take a look at the script and modify it if necessary, e.g. if you have changed the lower and upper bound for the constraint or if you have changed the number of constraints.
+  * In order to check the convergence of the free energy profile one should look at the error on the average force for each constrained MD simulation. The error on the free energy profile can be obtained by propagating the error on the average force upon integration.
+  * From the file containing the average force as a function of collective variable you need to integrate $F(x) dx$ numerically to obtain $\Delta A$. You may use the trapezoidal rule (or equivalent) with EXCEL, ORIGIN or any scripting language.
 </note>
 <note warning>
-Make sure that you get the units right when performing the integration. The Largange multipliers are written in atomic units (Hartree/bohr), while the distances are in Angstrom.
+Make sure that you get the units right when performing the integration. The Largange multipliers are written in atomic units (Hartree/bohr, dimension of a force), while the distances are in Angstrom.
 </note>