Getting started with MD simulations
Learn the Python needed to write MD analysis scripts.
Learn the Unix needed to install MD simulation programs, run MD simulations, and access supercomputers by following the tutorial here.
Amber, one of the commonly used MD simulation engines, has well-documented tutorials for everyone to follow here. Please download and install Amber and VMD from their respective websites first, then run your very first alanine dipeptide simulation by following the tutorial here.
GROMACS is another commonly used MD simulation engine and can be downloaded for free here; there are many others, such as LAMMPS.
PyMOL is another commonly used MD simulation visualization program (besides VMD) and can be downloaded for free here (for college students). ChimeraX is also a great visualization program.
How to set up MD simulations (taken from https://ctlee.github.io/BioChemCoRe-2018/system-prep/)
Protein structures from various structure determination methods are often incomplete. For example, structures from X-ray crystallography typically do not have resolved hydrogens. Because hydrogen bonding, which requires hydrogen atoms, is important for protein stability and receptor-ligand interactions, X-ray crystal structures cannot be used in molecular dynamics (MD) right “off the shelf.”
Please refer here for example Amber scripts.
Protein preparation: Please download Schrodinger Maestro (the academic version is free) for this step, since that will be the easiest route. After you have loaded your structure, please select the following choices for each step. Make sure to set the pH appropriately (usually around 7, but please check the relevant experimental papers to confirm).
Essentially, we're adding missing residues, loops, and hydrogens, deleting crystal water molecules, and preparing the structure before parametrization and minimization.
Ligand parameterization: If your system has ligands, we will need to create parameters for these ligands using Amber's Antechamber and Generalized Amber Force Field (GAFF). Please complete this tutorial before proceeding with this step. If you need to deal with DNA or RNA, please complete this tutorial before proceeding with this step. Make sure to save the protein and ligand system as a PDB file and the ligand separately as a MOL2 file from Schrodinger Maestro. Then convert the Schrodinger Maestro MOL2 file to a suitable MOL2 file for Amber with the following command:
antechamber -i LIGAND.mol2 -fi mol2 -o LIGAND_NEW.mol2 -fo mol2 -c bcc -s 2
If Antechamber fails to converge within the default number of iterations, you can pass additional QM keywords with the -ek option:
antechamber -i LIGAND.mol2 -fi mol2 -o LIGAND_NEW.mol2 -fo mol2 -c bcc -s 2 -ek "qm_theory='AM1', grms_tol=0.0005, scfconv=1.d-10, ndiis_attempts=700, itrmax=2000"
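The ligand step above can also be scripted. The sketch below only assembles command lines; it does not run anything. It rebuilds the antechamber call from this guide (adding antechamber's -nc option for ligands with a nonzero net charge) and a follow-up parmchk2 call, which writes the frcmod file of missing GAFF parameters that tleap loads later. The function names and file names are illustrative assumptions, not part of Amber.

```python
import shlex


def antechamber_cmd(ligand_in, ligand_out, net_charge=0, sqm_keywords=None):
    """Return the antechamber call from this guide as an argument list."""
    cmd = [
        "antechamber",
        "-i", ligand_in, "-fi", "mol2",
        "-o", ligand_out, "-fo", "mol2",
        "-c", "bcc",             # AM1-BCC partial charges
        "-nc", str(net_charge),  # net charge of the ligand (0 if neutral)
        "-s", "2",               # verbose status output
    ]
    if sqm_keywords:
        # Extra QM keywords, e.g. looser iteration limits for hard cases
        cmd += ["-ek", sqm_keywords]
    return cmd


def parmchk2_cmd(ligand_mol2, frcmod_out):
    """Return a parmchk2 call that writes missing parameters to a frcmod file."""
    return ["parmchk2", "-i", ligand_mol2, "-f", "mol2", "-o", frcmod_out]


# Print the commands so they can be checked before running them in a shell
print(shlex.join(antechamber_cmd("LIGAND.mol2", "LIGAND_NEW.mol2")))
print(shlex.join(parmchk2_cmd("LIGAND_NEW.mol2", "LIGAND.frcmod")))
```

To execute a command from Python instead of printing it, the same list can be passed to subprocess.run(cmd, check=True) on a machine where AmberTools is installed.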
Protein parameterization: For proteins, we can use the Amber ff19SB force field together with the OPC water model, which is the water model recommended for ff19SB (do not use the TIP3P water model with this force field). If your protein is an intrinsically disordered protein (IDP), use the a99SB-disp force field and its associated water model (see here).
Load the prepared structure file from Step 1 (PDB or PQR) with the loadpdb command in Amber tleap, then use tleap to create your prmtop and rst7 files as done in the Amber tutorials. Please complete this tutorial before proceeding with this step.
Remember that we need to add ions (Na+, K+, and Cl- are common ones) both to neutralize the charge of the system and to set the appropriate ionic concentration (usually 0.1-0.15 M NaCl or KCl, but please check the relevant experimental papers to confirm); please refer to this tutorial for finding the number of ions to add. If the system has ligands, we also need to load their parameter files from Step 2 at the tleap stage.
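As a rough illustration of this step, the sketch below estimates the number of salt ion pairs from the box volume (N = concentration x Avogadro's number x volume) and writes out a tleap input that combines ff19SB, OPC, an optional GAFF ligand, neutralization, and added salt. It is a minimal sketch under assumptions, not the lab's script: the function names, file names, the truncated octahedron box, the 10 A buffer, and the use of leaprc.gaff (it must match the atom types Antechamber assigned) are all placeholders. The volume-based count is only one simple estimate; the tutorial referenced above may use a different method. The ligand unit name is assumed to match the ligand residue name in the PDB file.

```python
AVOGADRO = 6.02214076e23           # particles per mole
LITERS_PER_CUBIC_ANGSTROM = 1e-27  # 1 cubic angstrom expressed in liters


def estimate_salt_pairs(conc_molar, box_volume_A3):
    """Cation/anion pairs that give roughly conc_molar in the given box volume."""
    return round(conc_molar * AVOGADRO * box_volume_A3 * LITERS_PER_CUBIC_ANGSTROM)


def render_tleap_input(pdb_file, prefix, n_salt_pairs,
                       ligand_resname=None, ligand_mol2=None, ligand_frcmod=None,
                       neutralize_with="Na+", cation="Na+", anion="Cl-",
                       buffer_A=10.0):
    """Return the text of a tleap input file (hypothetical layout)."""
    lines = [
        "source leaprc.protein.ff19SB",
        "source leaprc.water.opc",
    ]
    if ligand_mol2:
        lines += [
            "source leaprc.gaff",  # assumption: ligand was typed with GAFF
            f"{ligand_resname} = loadmol2 {ligand_mol2}",
            f"loadamberparams {ligand_frcmod}",
        ]
    lines += [
        f"system = loadpdb {pdb_file}",
        f"solvateoct system OPCBOX {buffer_A}",
        # A count of 0 asks tleap to add just enough ions to neutralize;
        # use Cl- instead for a positively charged system.
        f"addionsrand system {neutralize_with} 0",
        # Extra salt on top of the neutralizing ions
        f"addionsrand system {cation} {n_salt_pairs} {anion} {n_salt_pairs}",
        f"saveamberparm system {prefix}.prmtop {prefix}.rst7",
        "quit",
    ]
    return "\n".join(lines) + "\n"


# Example: 0.15 M salt in a 500,000 cubic angstrom box (volume is made up)
pairs = estimate_salt_pairs(0.15, 500000.0)
print(render_tleap_input("complex.pdb", "complex", pairs,
                         ligand_resname="LIG",
                         ligand_mol2="LIGAND_NEW.mol2",
                         ligand_frcmod="LIGAND.frcmod"))
```

The box volume is only known after solvation, so in practice one would run tleap once without the salt line, read the volume that tleap reports, and then rerun with the estimated count. A saved input file is run with tleap -f followed by the file name.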
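One of the common errors that you will encounter comes from the difference in residue naming between Schrodinger Maestro and Amber. Please visually inspect every HIS residue and rename it to HID, HIE, or HIP according to its protonation state (HID: protonated at the delta nitrogen, HIE: protonated at the epsilon nitrogen, HIP: protonated at both).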
Please also delete all HXT atoms from your PDB file, since HXT doesn't exist in Amber force fields.
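The two clean-up tasks above can be partly automated. The sketch below removes HXT atom records and lists the residues still named HIS so that they can be inspected and renamed by hand. It relies on the fixed-column PDB layout (atom name in columns 13-16, residue name in 18-20, chain ID in 22, residue number in 23-26). The function name is a hypothetical helper, and the script deliberately does not choose HID/HIE/HIP for you.

```python
def clean_pdb_lines(lines):
    """Drop HXT atom records and collect residues still named HIS.

    Returns (kept_lines, his_residues), where his_residues is a sorted list
    of (chain_id, residue_number) tuples that need manual inspection.
    """
    kept = []
    his_residues = set()
    for line in lines:
        if line.startswith(("ATOM", "HETATM")):
            atom_name = line[12:16].strip()
            res_name = line[17:20].strip()
            if atom_name == "HXT":
                continue  # skip this record entirely
            if res_name == "HIS":
                his_residues.add((line[21], int(line[22:26])))
        kept.append(line)
    return kept, sorted(his_residues)


# Usage (file names are placeholders):
# with open("complex.pdb") as handle:
#     kept, his = clean_pdb_lines(handle.readlines())
# with open("complex_clean.pdb", "w") as handle:
#     handle.writelines(kept)
# print("HIS residues to rename by hand:", his)
```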
Running on the supercomputer cluster: Now that you have completed building the system, please get access to a supercomputer cluster to run your simulations and get started by following these steps:
First, create an account with your UCD email at https://allocations.access-ci.org/ and let the PI know your user ID.
Second, refer to the Expanse user guide: https://hpc-training.sdsc.edu/expanse-101/
Third, ssh to Expanse from your terminal by: ssh email@example.com
Fourth, go to your scratch directory in Expanse (which is /expanse/lustre/scratch/youruserid/temp_project) and transfer your files there by using the scp command.
Useful website for getting familiar with HPC: https://ngs-docs.github.io/2021-august-remote-computing/
Relaxing the system: Now that you have all of your simulation files on the supercomputer cluster, we can run our simulations! Before moving on to production MD, we need to relax the system: minimize it, heat it to the appropriate temperature (usually room temperature, 298 K or 300 K, but again please check the relevant experimental papers to confirm), and equilibrate it as done in the introductory Amber tutorials. Follow these steps on the supercomputer cluster:
First, find your project account by typing in the following commands:
module load sdsc
expanse-client user -r expanse_gpu
Second, put the listed project account in the #SBATCH --account line of the run script that is used to submit jobs and run simulations on the supercomputer cluster.
Third, in the run script, replace my email with yours to get notifications about your job, and replace the file names in the Amber pmemd.cuda command with the names of your own files.
Fourth, submit your job by: sbatch run_expanse.sh
Fifth, you can check the status of your job with squeue -u youruserid and cancel your job with scancel jobID.
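For orientation, the sketch below renders what a run script for the relaxation stages might look like: SLURM header lines for the account and email notifications, followed by one pmemd.cuda call per stage, each starting from the restart file of the previous one. This is not the lab's actual run_expanse.sh. The function name, the stage names (min, heat, equil), the input file naming, the gpu-shared partition, the wall time, and the -ref flag (only needed when positional restraints are used) are assumptions, and the module commands for loading Amber are left as a site-specific placeholder.

```python
def render_run_script(account, email, prefix,
                      stages=("min", "heat", "equil"),
                      partition="gpu-shared", walltime="24:00:00",
                      module_lines=()):
    """Return the text of a SLURM script that chains Amber relaxation stages."""
    header = [
        "#!/bin/bash",
        f"#SBATCH --job-name={prefix}",
        f"#SBATCH --account={account}",
        f"#SBATCH --partition={partition}",
        "#SBATCH --nodes=1",
        "#SBATCH --ntasks-per-node=1",
        "#SBATCH --gpus=1",
        f"#SBATCH --time={walltime}",
        f"#SBATCH --mail-user={email}",
        "#SBATCH --mail-type=END,FAIL",
        "",
        "set -e  # stop at the first failing stage",
        "# Load the Amber environment for your cluster below (site-specific).",
        *module_lines,
        "",
    ]
    body = []
    previous = prefix  # the first stage starts from the tleap coordinates
    for stage in stages:
        body.append(
            f"pmemd.cuda -O -i {stage}.in -p {prefix}.prmtop "
            f"-c {previous}.rst7 -ref {previous}.rst7 "
            f"-o {stage}.out -r {stage}.rst7 -x {stage}.nc -inf {stage}.mdinfo"
        )
        previous = stage  # the next stage restarts from this stage's output
    return "\n".join(header + body) + "\n"


# Account and email are placeholders
print(render_run_script("abc123", "you@example.edu", "complex"))
```

Comparing the printed text against the lab's own run script is a quick way to see which lines need your account, your email, and your file names.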
Production MD: Depending on what you want from your MD simulations (free energies? rate constants? continuous pathways?), we will run production MD with the appropriate enhanced sampling method, usually Gaussian accelerated MD (GaMD), the weighted ensemble (WE) method, or a combination of the two. Please read about the enhanced sampling method that you will use in the relevant papers and websites.
Great introductory slides on MD and WE by Prof. Matthew Zwier!
Seminal papers for Ahn lab members:
WESTPA 2.0: Russo, John D., et al. "WESTPA 2.0: High-performance upgrades for weighted ensemble simulations and analysis of longer-timescale applications." Journal of Chemical Theory and Computation 18.2 (2022): 638-649.
WESTPA 1.0 (older version but still has a valuable overview of the WESTPA program): Bogetti, Anthony T., et al. "A suite of tutorials for the WESTPA rare-events sampling software [Article v1.0]." Living Journal of Computational Molecular Science 1.2 (2019).
GaMD: Miao, Yinglong, Victoria A. Feher, and J. Andrew McCammon. "Gaussian accelerated molecular dynamics: unconstrained enhanced sampling and free energy calculation." Journal of Chemical Theory and Computation 11.8 (2015): 3584-3595.
Additional reading and websites to get started on the weighted ensemble (WE) method:
Download the latest WESTPA here for your workstation and supercomputer cluster
Installation for PowerPC architecture, etc. (alternative installations) can be found here