In this section we perform molecular dynamics simulations of a thermostable subdomain from a chicken villin headpiece (aka HP-36). The structure consists of 36 residues (295 atoms) as obtained from NMR minimized average structure by C.J. McKnight, P.T. Matsudaira, and P.S. Kim (Nat. Struct. Biol. 1997), and is available in the Protein Databank under identification number 1VII. The N-terminal MET residue is from the expression system and is not from villin, but was kept in the structure during all simulations.
HP-36 (i.e. thermostable subdomain of chicken villin headpiece) is a fast folding subdomain consisting of 3 alpha-helices connected by loops. Recent microsecond (and longer) simulations of HP-36 produced true folding trajectories, leading to new insight on protein folding pathways. Although HP-36 is a relatively small fragment, it is generally believed that many larger proteins fold by first folding subdomains (i.e. local folding precedes global folding). In order to observe folding in a molecular dynamics simulation, one must follow trajectories over very long time intervals, using numerically stable algorithms.
Our primary interest is to illustrate the steps involved with performing a molecular dynamics simulation of a small protein using a typical molecular dynamics software package (Gromacs). In 1998, Y. Duan, L. Wang, and P. A. Kollman generated a 200 nanosecond trajectory of a fully solvated HP-36 using the AMBER software package (Proc. Nat. Acad. Sci. 1998). A more recent distributed simulation project called Folding@Home showed that at least 10 microseconds are needed to observe true folding. Using a massive distributed computing package, they were able to produce 35 different folding simulations (folding.stanford.edu). Since our trajectories will be limited to a few nanoseconds at the most, we do not expect to see any folding events in our simulations.
All of the following images were rendered using VMD, and the 1VII pdb file.
Backbone Cartoon Style Rendering (large) |
Licorice Style Rendering (large) |
Backbone Ribbon Style Rendering (large) |
Probe Accessible Surface Rendering (large) |
All of the following images were rendered using PyMOL, and the 1VII pdb file.
Animated Cartoon Style. |
Animated Mesh Style. |
In the following sections we will outline the steps involved in performing a molecular dynamics simulation of HP-36 using Gromacs. Although the exact details will be different for other software packages, the basic steps are the same.
Build the protein topology from a PDB file.
In Gromacs this involves using the program pdb2gmx. According the the Gromacs manual pdb2gmx "reads a pdb file, lets you choose a forcefield, reads some database files, adds hydrogens to the molecules and generates coordinates in Gromacs (Gromos) format and a topology in Gromacs format." The program gives you a choice between a number of established forcefields, some of which are better at modeling certain systems, others are better at modeling other systems. It's best to read the literature on the various forcefields if you are interested in getting an accurate description. Adding hydrogens is sometimes known as protonation (adding protons) since a hydrogen atom consists of a single proton and (depending on the charge state) one or two electrons. The exact number of electrons will often depend on the pH of the solvent. The database files are used to determine the correct locations to place the hydrogens, and what the charge state is. The topology refers to the determination of which atoms are bonded, and which bonds form angle bonds, and which sequence of bonds form dihedral angles (torsional bonds). Once the topology is determined, the exact form of the forcefield can be built by placing the appropriate constants infront of each term in the forcfield. These constants are pulled from a database distributed with the molecular dynamics package.
# pdb2gmx -ignh -f 1VII.pdb -o 1VII.gro -p 1VII.top
Create a periodic cell around the protein.
To minimize finite volume effects, most molecular dynamics packages use periodic boundary conditions. Any regular 3-dimensional solid which tiles 3-dimesional space can be used as the unit cell. By far the most common choice is a regular box, which works very well for small systems. If the system is much larger, one can use a dodecahedron or octohedron. The main advantage here is that for a spherical protein one obtains a more uniform layer around the exterior of the protein, resulting in less wasted space. In Gromacs, the periodic cell is built using the "editconf" command.
# editconf -f 1VII.gro -o 1VII.gro -d 0.5
Fill the cell with solvent
Most molecular dynamics forcefields are parameterized for solvated systems. The protein is treated as a single solute molecule surrounded by a collection of smaller solvent molecules. By far the most popular solvent is water. The defaults solvent in Gromacs is Simple Point Charge (SPC) water, which models water as three atoms (H-O-H) with point charges centered on each atom. This assumption leads to an inaccurate value for the permanent dipole. THis is corrected by increasing the HOH bond angle to 109.42 degrees from the experimentally observed value of 104.45 degrees. The number of solvent molecules is chosen to achieve a target density for the system. In Gromacs solvent is added using the "genbox" command.
# genbox -cp 1VII.gro -cs -o 1VII_b4em.gro -p 1VII.top
Perform energy minimization to remove overlaps
During the addition of the solvent it is quite possible that some of the solvent molecules may be in close proximity of either the protein or each other. This can cause problems for any molecular dynamics simulations, since close proximity leads to high potential energies. When the system relaxes these near contacts will disappear, but cause problems for any MD algorithm since small stepsizes are need to resolve the motion during the relaxation phase. To eliminate these near overlaps, one typically performs several steps of local optimization to find a more favorable conformation. In Gromacs energy minimization is performed using the "mdrun" command, with parameters specified by the "grompp" command (using an MDP file).
# cat > em.mdp << _EOF_
;
; ** Options for preprocessor **
title = 1VII
cpp = /lib/cpp
define = -DFLEX_SPC
;
; ** Options for Run control **
; Type of simulation
integrator = steep ; steepest descent minimization
; ** Options for Energy minimizing
emtol = 1000.0 ; Tolerance in kJ mol^{-1} nm^{-1}
emstep = 0.01 ; Initial Stepsize in nm
;
; ** Options for bonded interactions **
; Which groups are constrained
constraints = none ; no constraints
;
; ** Options for Neighbor Searching **
nstlist = 10 ; nblist update frequency
ns_type = grid ; nb search type (simple or grid)
rlist = 1.0 ; cut-off for short-range nblist in nm
;
; ** Options for Electrostatics and VDW/LJ **
; We want the ordering rlist <= rvdw <= rcoulomb
coulombtype = cut-off
rcoulomb-switch = 0
rcoulomb = 1.2 ; electrostatic cutoff in nm
vdwtype = cut-off
rvdw-switch = 0
rvdw = 1.0 ; vdw/lj cutoff in nm
_EOF_
# grompp -f em -c 1VII_b4em -p 1VII -o 1VII_em
# mdrun -s 1VII_em -o 1VII_em -c 1VII_b4pr -v
Perform position restrained Molecular Dynamics
To further equilbrate the solvent before starting a full molecular dynamics simulation, we can hold the protein (or protein backbone) fixed while allowing the solvent to move around at constant temperature. This allows the solvent to relax to a state which is natural for the current conformation of the protein. This is only necessary if you want the MD trajectory to start from a particular protein configuration, and not immediately move to a different conformation in an effort to conform to the arrangement of the solvent. To perform restrained MD in Gromacs, one executes the "grompp" command to setup the parameters (with an MDP file) followed by the "mdrun" command to run the dynamics.
# cat > pr.mdp << _EOF_ ; ; ** Options for preprocessor ** title = 1VII position restraining warnings = 10 cpp = /lib/cpp define = -DPOSRES ; ; ** Options for Run control ** ; Type of simulation integrator = md ; Size of timestep in picoseconds dt = 0.002 ; 0.002 = 2 femtoseconds ; Total number of steps nsteps = 10000 ; nsteps x dt = simulation time ; Starting Time (t_0) tinit = 0 ; Number of steps for center of mass motion removal nstcomm = 1 ; ; ** Options for Output Control ** ; Output freq for coordinates, velocities, and forces nstxout = 1000 ; coordinates nstvout = 1000 ; velocities nstfout = 0 ; forces ; Output freq for energies to log and energy files nstlog = 50 nstenergy = 0 ; ; ** Options for bonded interactions ** ; Which groups are constrained constraints = hbonds constraint_algorithm = shake ; Relative tolerance of shake shake-tol = 0.0001 ; ; ** Options for Neighbor Searching ** ; freq for update of neighbor lists nstlist = 10 ; neighbor search algorithm (simple or grid) ns_type = grid ; neighbor list cut-off rlist = 1.0 ; ; ** Options for Electrostatics and VDW ** ; We want the ordering rlist <= rvdw <= rcoulomb coulombtype = cut-off rcoulomb-switch = 0 rcoulomb = 1.2 ; electrostatic cutoff in nm vdwtype = cut-off rvdw-switch = 0 rvdw = 1.0 ; vdw/lj cutoff in nm ; ; ** Options for Temperature Coupling ** Tcoupl = nose-hoover tc_grps = System tau_t = 0.01 ref_t = 300 ; ; ** Options for Pressure Coupling ** Pcoupl = no ; ; ** Options for Initial Velocity Distribution ** ; Generate velocites is on at 300 K. gen_vel = yes gen_temp = 300.0 gen_seed = 173529 _EOF_ # grompp -f pr -c 1VII_b4pr -r 1VII_b4pr -p 1VII -o 1VII_pr # mdrun -s 1VII_pr -o 1VII_pr -c 1VII_b4md -v
Run Molecular Dynamics Simulations
At this point, the system is reasonably equilibrated and one can start the real simulation. All MD simulations are run with the "mdrun" command, with the exact type of simulation specified by the "grompp" command. We have performed a total of three simulations: 300K NVT, 1000K NVT, and ca. 300K NVE. Since there is nothing restraining the temperature in the NVE case, the actual temperature is only approximately 300K.
Analyze the Results
After the simulations are finished, we can start analyzing the results. We have saved snapshots of the configurations after every 100 steps, and information about the various components of the energy after every 50 steps. Since the energy and coordinate files are saved in a portable binary format (XDR), we must use postprocessing programs to extract any desired information in plain text format. To extract energies, we use the "g_energy" command. This program will prompt you to input a list of exactly which components you want from the binary file.
# g_energy -f ener.edr -o md_energy.xvg
Pairs of dihedral angles within a single residue can be plotted against each other in what's known as a Ramachandran plot. These plots are typically used to illustrate what types of larger scale (i.e. backbone) configurations are present in the protein. Each type of structure (helices, sheets, turns, strands) occupy a particular region on the plot:
Unstructured (flexible) strand features are smeared over a large region on the Ramachandran plot, while structured (inflexible) helices are confined to a small region on the plot. To generate data points for a Ramachandran plot we use the "g_rama" command. This generates a plain text time series of all the phi/psi pairs, with a third column indicating to which pair (i.e. residue) the particular point belongs. This is handy since we can isolate a particular pair using "grep".
# g_rama -f 1VII_md.trr -s 1VII_md.tpr -o 1VII_md_rama.xvg
We start from the conformation produced by the preparation and equilibration phase. Using the "grompp" command, with a suitable input file, we can setup our simulation. After the parameters are set the actual simulation is run with the "mdrun" command.
# cat > md.mdp << _EOF_ ; ; ** Options for preprocessor ** title = 1VII MD cpp = /lib/cpp ; ; ** Options for Run control ** ; Type of simulation integrator = md ; Size of timestep in picoseconds dt = 0.002 ; 0.002 = 2 femtoseconds ; Total number of steps nsteps = 500000 ; nsteps x dt = simulation time ; Starting Time (t_0) tinit = 0 ; Number of steps for center of mass motion removal nstcomm = 1 ; ; ** Options for Output Control ** ; Output freq for coordinates, velocities, and forces nstxout = 100 ; coordinates nstvout = 0 ; velocities nstfout = 0 ; forces ; Output freq for energies to log and energy files nstlog = 0 nstenergy = 50 ; ; ** Options for bonded interactions ** ; Which groups are constrained constraints = hbonds constraint_algorithm = shake ; Relative tolerance of shake shake-tol = 0.0001 ; ; ** Options for Neighbor Searching ** ; freq for update of neighbor lists nstlist = 10 ; neighbor search algorithm (simple or grid) ns_type = grid ; neighbor list cut-off rlist = 1.0 ; ; ** Options for Electrostatics and VDW ** ; We want the ordering rlist <= rvdw <= rcoulomb coulombtype = cut-off rcoulomb-switch = 0 rcoulomb = 1.2 ; electrostatic cutoff in nm vdwtype = cut-off rvdw-switch = 0 rvdw = 1.0 ; vdw/lj cutoff in nm ; ; ** Options for Temperature Coupling ** Tcoupl = nose-hoover tc_grps = System tau_t = 0.01 ref_t = 300 ; ; ** Options for Pressure Coupling ** Pcoupl = no ; ; ** Options for Initial Velocity Distribution ** ; Generate velocites is on at 300 K. gen_vel = yes gen_temp = 300.0 gen_seed = 173529 _EOF_ # grompp -f md -c 1VII_b4md -p 1VII -o 1VII_md # mdrun -s 1VII_md -o 1VII_md -c 1VII_after_md -v
![]() Ramachandran Plot Data Points, 1.5M, gzipped |
![]() Temperature Plot Data Points, 296k, gzipped |
![]() Pressure Plot Data Points, 296k, gzipped |
![]() Energy Plot Data Points, 296k, gzipped |
![]() Potential Energy Plot Data Points, 296k, gzipped |
![]() Kinetic Energy Plot Data Points, 296k, gzipped |
We start from the conformation produced by the preparation and equilibration phase. The only modification made to the equilibration was the use of 1000K instead of 300K in the position restrained MD. Using the "grompp" command, with a suitable input file, we can setup our simulation. After the parameters are set the actual simulation is run with the "mdrun" command.
# cat > md.mdp << _EOF_ ; ; ** Options for preprocessor ** title = 1VII MD cpp = /lib/cpp ; ; ** Options for Run control ** ; Type of simulation integrator = md ; Size of timestep in picoseconds dt = 0.002 ; 0.002 = 2 femtoseconds ; Total number of steps nsteps = 500000 ; nsteps x dt = simulation time ; Starting Time (t_0) tinit = 0 ; Number of steps for center of mass motion removal nstcomm = 1 ; ; ** Options for Output Control ** ; Output freq for coordinates, velocities, and forces nstxout = 100 ; coordinates nstvout = 0 ; velocities nstfout = 0 ; forces ; Output freq for energies to log and energy files nstlog = 0 nstenergy = 50 ; ; ** Options for bonded interactions ** ; Which groups are constrained constraints = hbonds constraint_algorithm = shake ; Relative tolerance of shake shake-tol = 0.0001 ; ; ** Options for Neighbor Searching ** ; freq for update of neighbor lists nstlist = 10 ; neighbor search algorithm (simple or grid) ns_type = grid ; neighbor list cut-off rlist = 1.0 ; ; ** Options for Electrostatics and VDW ** ; We want the ordering rlist <= rvdw <= rcoulomb coulombtype = cut-off rcoulomb-switch = 0 rcoulomb = 1.2 ; electrostatic cutoff in nm vdwtype = cut-off rvdw-switch = 0 rvdw = 1.0 ; vdw/lj cutoff in nm ; ; ** Options for Temperature Coupling ** Tcoupl = nose-hoover tc_grps = System tau_t = 0.01 ref_t = 1000 ; ; ** Options for Pressure Coupling ** Pcoupl = no ; ; ** Options for Initial Velocity Distribution ** ; Generate velocites is on at 1000 K. gen_vel = yes gen_temp = 1000.0 gen_seed = 173529 _EOF_ # grompp -f md -c 1VII_b4md -p 1VII -o 1VII_md # mdrun -s 1VII_md -o 1VII_md -c 1VII_after_md -v
![]() Ramachandran Plot Data Points, 1.5M, gzipped |
![]() Temperature Plot Data Points, 296k, gzipped |
![]() Pressure Plot Data Points, 296k, gzipped |
![]() Energy Plot Data Points, 296k, gzipped |
![]() Potential Energy Plot Data Points, 296k, gzipped |
![]() Kinetic Energy Plot Data Points, 296k, gzipped |
editconf: converts generic structure format to .gro, .g96 or .pdb.
g_energy: extracts energy components or distance restraint data from an energy file.
g_rama: selects Phi/Psi dihedral combos from a topology file and computes them as a function of time.
genbox: generates a box of solvent (and other solvent related tasks).
mdp file: a molecular dynamics run parameter file used as input for grompp.
mdrun: the main computational chemistry engine within Gromacs (MD, BD, LD, Energy Minimization).
pdb2gmx: reads the pdb file and generates coordinates and topology in Gromacs format.