MD Simulations of Villin Headpiece Subdomain (1VII)

Contents

  1. Introduction
  2. Gallery of 1VII Images
  3. Preparation and Equilibration
  4. Constant Energy Simulation
  5. Constant Temperature Simulation (300K)
  6. Constant Temperature Simulation (1000K)
  7. Glossary of Gromacs Commands

Introduction

In this section we perform molecular dynamics simulations of a thermostable subdomain from a chicken villin headpiece (aka HP-36). The structure consists of 36 residues (295 atoms) as obtained from NMR minimized average structure by C.J. McKnight, P.T. Matsudaira, and P.S. Kim (Nat. Struct. Biol. 1997), and is available in the Protein Databank under identification number 1VII. The N-terminal MET residue is from the expression system and is not from villin, but was kept in the structure during all simulations.

HP-36 (i.e. thermostable subdomain of chicken villin headpiece) is a fast folding subdomain consisting of 3 alpha-helices connected by loops. Recent microsecond (and longer) simulations of HP-36 produced true folding trajectories, leading to new insight on protein folding pathways. Although HP-36 is a relatively small fragment, it is generally believed that many larger proteins fold by first folding subdomains (i.e. local folding precedes global folding). In order to observe folding in a molecular dynamics simulation, one must follow trajectories over very long time intervals, using numerically stable algorithms.

Our primary interest is to illustrate the steps involved with performing a molecular dynamics simulation of a small protein using a typical molecular dynamics software package (Gromacs). In 1998, Y. Duan, L. Wang, and P. A. Kollman generated a 200 nanosecond trajectory of a fully solvated HP-36 using the AMBER software package (Proc. Nat. Acad. Sci. 1998). A more recent distributed simulation project called Folding@Home showed that at least 10 microseconds are needed to observe true folding. Using a massive distributed computing package, they were able to produce 35 different folding simulations (folding.stanford.edu). Since our trajectories will be limited to a few nanoseconds at the most, we do not expect to see any folding events in our simulations.


Gallery of 1VII Images

All of the following images were rendered using VMD, and the 1VII pdb file.

cartoon style Backbone Cartoon Style Rendering (large) licorice style Licorice Style Rendering (large)
ribbon style Backbone Ribbon Style Rendering (large) surface style Probe Accessible Surface Rendering (large)

All of the following images were rendered using PyMOL, and the 1VII pdb file.

spinning animationAnimated Cartoon Style. spinning animation2Animated Mesh Style.

Preparation and Equilibration

In the following sections we will outline the steps involved in performing a molecular dynamics simulation of HP-36 using Gromacs. Although the exact details will be different for other software packages, the basic steps are the same.

  1. Build the protein topology from a PDB file.

    In Gromacs this involves using the program pdb2gmx. According the the Gromacs manual pdb2gmx "reads a pdb file, lets you choose a forcefield, reads some database files, adds hydrogens to the molecules and generates coordinates in Gromacs (Gromos) format and a topology in Gromacs format." The program gives you a choice between a number of established forcefields, some of which are better at modeling certain systems, others are better at modeling other systems. It's best to read the literature on the various forcefields if you are interested in getting an accurate description. Adding hydrogens is sometimes known as protonation (adding protons) since a hydrogen atom consists of a single proton and (depending on the charge state) one or two electrons. The exact number of electrons will often depend on the pH of the solvent. The database files are used to determine the correct locations to place the hydrogens, and what the charge state is. The topology refers to the determination of which atoms are bonded, and which bonds form angle bonds, and which sequence of bonds form dihedral angles (torsional bonds). Once the topology is determined, the exact form of the forcefield can be built by placing the appropriate constants infront of each term in the forcfield. These constants are pulled from a database distributed with the molecular dynamics package.

    # pdb2gmx -ignh -f 1VII.pdb -o 1VII.gro -p 1VII.top
    
  2. Create a periodic cell around the protein.

    To minimize finite volume effects, most molecular dynamics packages use periodic boundary conditions. Any regular 3-dimensional solid which tiles 3-dimesional space can be used as the unit cell. By far the most common choice is a regular box, which works very well for small systems. If the system is much larger, one can use a dodecahedron or octohedron. The main advantage here is that for a spherical protein one obtains a more uniform layer around the exterior of the protein, resulting in less wasted space. In Gromacs, the periodic cell is built using the "editconf" command.

    # editconf -f 1VII.gro -o 1VII.gro -d 0.5
    
  3. Fill the cell with solvent

    Most molecular dynamics forcefields are parameterized for solvated systems. The protein is treated as a single solute molecule surrounded by a collection of smaller solvent molecules. By far the most popular solvent is water. The defaults solvent in Gromacs is Simple Point Charge (SPC) water, which models water as three atoms (H-O-H) with point charges centered on each atom. This assumption leads to an inaccurate value for the permanent dipole. THis is corrected by increasing the HOH bond angle to 109.42 degrees from the experimentally observed value of 104.45 degrees. The number of solvent molecules is chosen to achieve a target density for the system. In Gromacs solvent is added using the "genbox" command.

    # genbox -cp 1VII.gro -cs -o 1VII_b4em.gro -p 1VII.top
    
  4. Perform energy minimization to remove overlaps

    During the addition of the solvent it is quite possible that some of the solvent molecules may be in close proximity of either the protein or each other. This can cause problems for any molecular dynamics simulations, since close proximity leads to high potential energies. When the system relaxes these near contacts will disappear, but cause problems for any MD algorithm since small stepsizes are need to resolve the motion during the relaxation phase. To eliminate these near overlaps, one typically performs several steps of local optimization to find a more favorable conformation. In Gromacs energy minimization is performed using the "mdrun" command, with parameters specified by the "grompp" command (using an MDP file).

    # cat > em.mdp << _EOF_
    ;
    ; ** Options for preprocessor **
    title               =  1VII
    cpp                 =  /lib/cpp
    define              =  -DFLEX_SPC
    ;
    ; ** Options for Run control **
    ;    Type of simulation
    integrator          =  steep    ; steepest descent minimization
    ; ** Options for Energy minimizing
    emtol               =  1000.0   ; Tolerance in kJ mol^{-1} nm^{-1}
    emstep              =  0.01     ; Initial Stepsize in nm
    ;
    ; ** Options for bonded interactions **
    ;    Which groups are constrained
    constraints         =  none     ; no constraints
    ;
    ; ** Options for Neighbor Searching **
    nstlist             =  10       ; nblist update frequency
    ns_type             =  grid     ; nb search type (simple or grid)
    rlist               =  1.0      ; cut-off for short-range nblist in nm
    ;
    ; ** Options for Electrostatics and VDW/LJ **
    ;    We want the ordering rlist <= rvdw <= rcoulomb
    coulombtype         =  cut-off
    rcoulomb-switch     =  0
    rcoulomb            =  1.2      ; electrostatic cutoff in nm
    vdwtype             =  cut-off
    rvdw-switch         =  0
    rvdw                =  1.0      ; vdw/lj cutoff in nm
    _EOF_
    
    # grompp -f em -c 1VII_b4em -p 1VII -o 1VII_em
    # mdrun -s 1VII_em -o 1VII_em -c 1VII_b4pr -v
    
  5. Perform position restrained Molecular Dynamics

    To further equilbrate the solvent before starting a full molecular dynamics simulation, we can hold the protein (or protein backbone) fixed while allowing the solvent to move around at constant temperature. This allows the solvent to relax to a state which is natural for the current conformation of the protein. This is only necessary if you want the MD trajectory to start from a particular protein configuration, and not immediately move to a different conformation in an effort to conform to the arrangement of the solvent. To perform restrained MD in Gromacs, one executes the "grompp" command to setup the parameters (with an MDP file) followed by the "mdrun" command to run the dynamics.

    # cat > pr.mdp << _EOF_
    ;
    ; ** Options for preprocessor **
    title               =  1VII position restraining
    warnings            =  10
    cpp                 =  /lib/cpp
    define              =  -DPOSRES
    ;
    ; ** Options for Run control **
    ;    Type of simulation
    integrator          =  md
    ;    Size of timestep in picoseconds
    dt                  =  0.002    ; 0.002 = 2 femtoseconds
    ;    Total number of steps
    nsteps              =  10000    ; nsteps x dt = simulation time
    ;    Starting Time (t_0)
    tinit               = 0
    ;    Number of steps for center of mass motion removal
    nstcomm             =  1
    ;
    ; ** Options for Output Control **
    ;    Output freq for coordinates, velocities, and forces
    nstxout             =  1000     ; coordinates
    nstvout             =  1000     ; velocities
    nstfout             =  0        ; forces
    ;    Output freq for energies to log and energy files
    nstlog              = 50
    nstenergy           = 0
    ;
    ; ** Options for bonded interactions **
    ;    Which groups are constrained
    constraints         =  hbonds
    constraint_algorithm = shake
    ;    Relative tolerance of shake
    shake-tol           = 0.0001
    ;
    ; ** Options for Neighbor Searching **
    ;    freq for update of neighbor lists
    nstlist             =  10
    ;    neighbor search algorithm (simple or grid)
    ns_type             =  grid
    ;    neighbor list cut-off
    rlist               =  1.0
    ;
    ; ** Options for Electrostatics and VDW **
    ;    We want the ordering rlist <= rvdw <= rcoulomb
    coulombtype         =  cut-off
    rcoulomb-switch     =  0
    rcoulomb            =  1.2      ; electrostatic cutoff in nm
    vdwtype             =  cut-off
    rvdw-switch         =  0
    rvdw                =  1.0      ; vdw/lj cutoff in nm
    ;
    ; ** Options for Temperature Coupling **
    Tcoupl              =  nose-hoover
    tc_grps             =  System
    tau_t               =  0.01
    ref_t               =  300
    ;
    ; ** Options for Pressure Coupling **
    Pcoupl              =  no
    ;
    ; ** Options for Initial Velocity Distribution **
    ; Generate velocites is on at 300 K.
    gen_vel             =  yes
    gen_temp            =  300.0
    gen_seed            =  173529
    _EOF_
    
    # grompp -f pr -c 1VII_b4pr -r 1VII_b4pr -p 1VII -o 1VII_pr
    
    # mdrun -s 1VII_pr -o 1VII_pr -c 1VII_b4md -v
    
  6. Run Molecular Dynamics Simulations

    At this point, the system is reasonably equilibrated and one can start the real simulation. All MD simulations are run with the "mdrun" command, with the exact type of simulation specified by the "grompp" command. We have performed a total of three simulations: 300K NVT, 1000K NVT, and ca. 300K NVE. Since there is nothing restraining the temperature in the NVE case, the actual temperature is only approximately 300K.

  7. Analyze the Results

    After the simulations are finished, we can start analyzing the results. We have saved snapshots of the configurations after every 100 steps, and information about the various components of the energy after every 50 steps. Since the energy and coordinate files are saved in a portable binary format (XDR), we must use postprocessing programs to extract any desired information in plain text format. To extract energies, we use the "g_energy" command. This program will prompt you to input a list of exactly which components you want from the binary file.

    # g_energy -f ener.edr -o md_energy.xvg
    

    Pairs of dihedral angles within a single residue can be plotted against each other in what's known as a Ramachandran plot. These plots are typically used to illustrate what types of larger scale (i.e. backbone) configurations are present in the protein. Each type of structure (helices, sheets, turns, strands) occupy a particular region on the plot:

    Ramachandran Plot

    Unstructured (flexible) strand features are smeared over a large region on the Ramachandran plot, while structured (inflexible) helices are confined to a small region on the plot. To generate data points for a Ramachandran plot we use the "g_rama" command. This generates a plain text time series of all the phi/psi pairs, with a third column indicating to which pair (i.e. residue) the particular point belongs. This is handy since we can isolate a particular pair using "grep".

    # g_rama -f 1VII_md.trr -s 1VII_md.tpr -o 1VII_md_rama.xvg
    

Constant Energy Simulation


Constant Temperature Simulation (300K)

We start from the conformation produced by the preparation and equilibration phase. Using the "grompp" command, with a suitable input file, we can setup our simulation. After the parameters are set the actual simulation is run with the "mdrun" command.

# cat > md.mdp << _EOF_
;
; ** Options for preprocessor **
title               =  1VII MD
cpp                 =  /lib/cpp
;
; ** Options for Run control **
;    Type of simulation
integrator          =  md
;    Size of timestep in picoseconds
dt                  =  0.002    ; 0.002 = 2 femtoseconds
;    Total number of steps
nsteps              =  500000   ; nsteps x dt = simulation time
;    Starting Time (t_0)
tinit               = 0
;    Number of steps for center of mass motion removal
nstcomm             =  1
;
; ** Options for Output Control **
;    Output freq for coordinates, velocities, and forces
nstxout             =  100      ; coordinates
nstvout             =  0        ; velocities
nstfout             =  0        ; forces
;    Output freq for energies to log and energy files
nstlog              = 0
nstenergy           = 50
;
; ** Options for bonded interactions **
;    Which groups are constrained
constraints         =  hbonds
constraint_algorithm = shake
;    Relative tolerance of shake
shake-tol           = 0.0001
;
; ** Options for Neighbor Searching **
;    freq for update of neighbor lists
nstlist             =  10
;    neighbor search algorithm (simple or grid)
ns_type             =  grid
;    neighbor list cut-off
rlist               =  1.0
;
; ** Options for Electrostatics and VDW **
;    We want the ordering rlist <= rvdw <= rcoulomb
coulombtype         =  cut-off
rcoulomb-switch     =  0
rcoulomb            =  1.2      ; electrostatic cutoff in nm
vdwtype             =  cut-off
rvdw-switch         =  0
rvdw                =  1.0      ; vdw/lj cutoff in nm
;
; ** Options for Temperature Coupling **
Tcoupl              =  nose-hoover
tc_grps             =  System
tau_t               =  0.01
ref_t               =  300
;
; ** Options for Pressure Coupling **
Pcoupl              =  no
;
; ** Options for Initial Velocity Distribution **
; Generate velocites is on at 300 K.
gen_vel             =  yes
gen_temp            =  300.0
gen_seed            =  173529
_EOF_

# grompp -f md -c 1VII_b4md  -p 1VII -o 1VII_md

# mdrun -s 1VII_md -o 1VII_md -c 1VII_after_md -v

Summary of Results:

Ramachandran Plot
Ramachandran Plot
Data Points, 1.5M, gzipped
Temperature Plot
Temperature Plot
Data Points, 296k, gzipped
Pressure Plot
Pressure Plot
Data Points, 296k, gzipped
Energy Plot
Energy Plot
Data Points, 296k, gzipped
Potential Energy Plot
Potential Energy Plot
Data Points, 296k, gzipped
Kinetic Energy Plot
Kinetic Energy Plot
Data Points, 296k, gzipped

Constant Temperature Simulation (1000K)

We start from the conformation produced by the preparation and equilibration phase. The only modification made to the equilibration was the use of 1000K instead of 300K in the position restrained MD. Using the "grompp" command, with a suitable input file, we can setup our simulation. After the parameters are set the actual simulation is run with the "mdrun" command.

# cat > md.mdp << _EOF_
;
; ** Options for preprocessor **
title               =  1VII MD
cpp                 =  /lib/cpp
;
; ** Options for Run control **
;    Type of simulation
integrator          =  md
;    Size of timestep in picoseconds
dt                  =  0.002    ; 0.002 = 2 femtoseconds
;    Total number of steps
nsteps              =  500000   ; nsteps x dt = simulation time
;    Starting Time (t_0)
tinit               = 0
;    Number of steps for center of mass motion removal
nstcomm             =  1
;
; ** Options for Output Control **
;    Output freq for coordinates, velocities, and forces
nstxout             =  100      ; coordinates
nstvout             =  0        ; velocities
nstfout             =  0        ; forces
;    Output freq for energies to log and energy files
nstlog              = 0
nstenergy           = 50
;
; ** Options for bonded interactions **
;    Which groups are constrained
constraints         =  hbonds
constraint_algorithm = shake
;    Relative tolerance of shake
shake-tol           = 0.0001
;
; ** Options for Neighbor Searching **
;    freq for update of neighbor lists
nstlist             =  10
;    neighbor search algorithm (simple or grid)
ns_type             =  grid
;    neighbor list cut-off
rlist               =  1.0
;
; ** Options for Electrostatics and VDW **
;    We want the ordering rlist <= rvdw <= rcoulomb
coulombtype         =  cut-off
rcoulomb-switch     =  0
rcoulomb            =  1.2      ; electrostatic cutoff in nm
vdwtype             =  cut-off
rvdw-switch         =  0
rvdw                =  1.0      ; vdw/lj cutoff in nm
;
; ** Options for Temperature Coupling **
Tcoupl              =  nose-hoover
tc_grps             =  System
tau_t               =  0.01
ref_t               =  1000
;
; ** Options for Pressure Coupling **
Pcoupl              =  no
;
; ** Options for Initial Velocity Distribution **
; Generate velocites is on at 1000 K.
gen_vel             =  yes
gen_temp            =  1000.0
gen_seed            =  173529
_EOF_

# grompp -f md -c 1VII_b4md  -p 1VII -o 1VII_md

# mdrun -s 1VII_md -o 1VII_md -c 1VII_after_md -v

Summary of Results:

Ramachandran Plot
Ramachandran Plot
Data Points, 1.5M, gzipped
Temperature Plot
Temperature Plot
Data Points, 296k, gzipped
Pressure Plot
Pressure Plot
Data Points, 296k, gzipped
Energy Plot
Energy Plot
Data Points, 296k, gzipped
Potential Energy Plot
Potential Energy Plot
Data Points, 296k, gzipped
Kinetic Energy Plot
Kinetic Energy Plot
Data Points, 296k, gzipped

Constant Energy Simulation


Glossary of Gromacs Commands


cow bullet Last Updated: 06-Nov-03
HTML 4.01
Up