Medusa
Medusa is a computational platform for molecular modeling, which includes a physics-based force field to evaluate the energetics
of given conformations of molecules and molecular complexes, as well as rapid algorithms to search conformational space via a Monte
Carlo based algorithm. Benchmark studies of native sequence recapitulation from protein backbone and the prediction of protein
stability changes (Eris) highlight the accuracy of the Medusa force field. We have extended the Medusa force field to
model small molecule ligands (MedusaScore), which allows us to predict the binding energy of ligand-receptor complexes. In terms of
sampling, Medusa allows for the rapid search of protein sequence space and side chain conformational space, which enables us to study
protein evolution and perform protein design. Medusa can also rapidly sample ligand conformational space, which made it possible for us
to develop a flexible ligand-receptor docking algorithm, MedusaDock, to simultaneously sample the conformational flexibility of ligand
and receptor.
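As an illustration of the kind of Monte Carlo conformational search described above, the following is a minimal Metropolis sketch in Python. It is not Medusa's implementation: the energy function, the move set, and the names `metropolis_search`, `toy_energy`, and `toy_propose` are all hypothetical stand-ins chosen for this example.

```python
import math
import random

def metropolis_search(energy, propose, state, steps=2000, kT=1.0, seed=0):
    """Generic Metropolis Monte Carlo search; returns (best_state, best_energy)."""
    rng = random.Random(seed)
    current_e = energy(state)
    best, best_e = state, current_e
    for _ in range(steps):
        candidate = propose(state, rng)
        cand_e = energy(candidate)
        delta = cand_e - current_e
        # Downhill moves are always accepted; uphill moves are accepted
        # with the Boltzmann probability exp(-delta / kT).
        if delta <= 0 or rng.random() < math.exp(-delta / kT):
            state, current_e = candidate, cand_e
            if current_e < best_e:
                best, best_e = state, current_e
    return best, best_e

# Toy problem (hypothetical): five "residues" each choose one of four rotamer
# indices, and the energy is lowest when every residue chooses rotamer 2.
def toy_energy(rotamers):
    return sum((r - 2) ** 2 for r in rotamers)

def toy_propose(rotamers, rng):
    # Change the rotamer of one randomly chosen residue.
    i = rng.randrange(len(rotamers))
    new = list(rotamers)
    new[i] = rng.randrange(4)
    return tuple(new)

best, best_e = metropolis_search(toy_energy, toy_propose, (0, 0, 0, 0, 0))
print(best, best_e)
```

In a real application the toy energy would be replaced by a force-field evaluation and the move set by rotamer or sequence substitutions.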
Eris
Estimating protein thermostability and structural changes upon mutation is of great importance for molecular biologists. Therefore,
we have developed a computational tool, Eris, for accurately predicting mutation-induced protein stability changes [1]. Due to the
complex nature of the interactions involved in protein folding, existing stability prediction methods often use empirical parameters
trained on experimental protein stability data. Moreover, limited by their capability to model the structural changes induced by
mutations, the applications of these methods are often restricted to mutations from large residues to small ones. We address these
deficiencies with a unique approach that combines a physical force field with a fast conformation-sampling algorithm in an atomic
framework of proteins. We show that Eris can effectively detect and resolve the atomic clashes and structural strain introduced by
mutation and yield reliable predictions of the stability change for these mutants. We test Eris on 595 mutants and find significant
correlation between the predicted and experimental stability changes. Eris is accessible through the Dokholyan laboratory server (Eris)
and as a standalone software package.
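The comparison between predicted and experimental stability changes can be sketched as follows. This is a generic illustration, not Eris itself: the sign convention (stability change as mutant free energy minus wild-type free energy) is an assumption, and the function names `ddg` and `pearson_r` are hypothetical.

```python
import math

def ddg(dg_wild_type, dg_mutant):
    """Stability change upon mutation (assumed convention: mutant minus wild type)."""
    return dg_mutant - dg_wild_type

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)
```

Given a list of predicted values and a list of measured values for the same mutants, `pearson_r(predicted, measured)` gives the correlation that such a benchmark would report.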
πDMD
Discrete molecular dynamics (DMD) is a special type of molecular dynamics (MD) algorithm that uses stepwise potential functions to
approximate the continuous interaction potentials in traditional MD. This simplification reduces time-driven dynamics to event-driven
motion, which has been highly optimized to increase computational efficiency. As a result, DMD features increased sampling
efficiency over traditional MD. In combination with simplified protein models, DMD simulation is orders of magnitude faster than traditional
MD simulation. Over the years, the Dokholyan laboratory has developed a series of models of proteins, nucleotides, and lipids for DMD
simulations. The models include various levels of coarse-graining as well as atomic resolution. The Dokholyan laboratory has successfully
applied DMD to the study of various biological problems, including protein folding dynamics, protein misfolding and aggregation, ensemble
reconstruction using experimental constraints, protein design, self-assembly of lipids, and RNA folding. Molecules in Action has developed
a highly efficient parallel version of the DMD software, πDMD.
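The two ideas behind DMD, a stepwise pair potential and event-driven motion, can be sketched in a few lines of Python. This is a textbook-style illustration rather than πDMD code: the potential parameters and the function names `square_well` and `time_to_distance` are assumptions made for the example. Because particles move ballistically between events, the time of the next event for a pair is found by solving a quadratic equation instead of integrating forces.

```python
import math

def square_well(r, hard_core=3.0, well_edge=5.0, depth=-1.0):
    """Stepwise pair potential: infinite core, flat attractive well, zero beyond."""
    if r < hard_core:
        return math.inf
    if r < well_edge:
        return depth
    return 0.0

def time_to_distance(r1, v1, r2, v2, d):
    """Time until two ballistic particles, starting farther apart than d,
    reach separation d. Returns None if they never do."""
    rel_r = [b - a for a, b in zip(r1, r2)]
    rel_v = [b - a for a, b in zip(v1, v2)]
    b = sum(p * q for p, q in zip(rel_r, rel_v))   # r . v
    vv = sum(q * q for q in rel_v)                 # v . v
    rr = sum(p * p for p in rel_r)                 # r . r
    if vv == 0 or b >= 0:
        return None                                # not approaching
    disc = b * b - vv * (rr - d * d)
    if disc < 0:
        return None                                # trajectories miss
    return (-b - math.sqrt(disc)) / vv

# Two particles approaching head-on along x; event at the hard-core distance.
t = time_to_distance((0, 0, 0), (1, 0, 0), (10, 0, 0), (-1, 0, 0), d=3.0)
print(t)
```

A full DMD engine would keep such event times for all interacting pairs in a priority queue and process them in order, updating velocities at each step of the potential.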
MedusaScore
The application of virtual screening processes is still limited by the lack of accurate scoring functions, as has been shown by recent
benchmark studies. To address this problem,
the Dokholyan laboratory has developed a novel scoring function,
MedusaScore, for the fast and accurate evaluation of protein-ligand binding. MedusaScore is a direct extension of the Medusa force field,
which has demonstrated superior performance in modeling protein stabilities. Using publicly available benchmark datasets, we find that
MedusaScore can recognize native-like docking poses and predict binding affinity with high fidelity. Its overall accuracy is
superior to that of other widely used scoring functions that have been tested on the same dataset, including AutoDock, ChemScore, DrugScore,
D-Score (DOCK), F-Score (FlexX), G-Score (GOLD), HINT, LigScore, LUDI, PLP, PMF, and X-Score. In contrast to most other scoring functions,
MedusaScore was developed without the use of any protein-ligand complex structures for parameter training, which preserves the
transferability of the scoring function to a wide range of targets and ligands. Both the Dokholyan laboratory
and Molecules in Action have been using MedusaScore for virtual screening in structure-based drug design.
MedusaDock
A principal source of inaccuracies in molecular docking is the inability of current approaches to capture both protein and ligand dynamics during
docking. While a number of attempts have been made to circumvent this weakness by performing "ensemble" docking
of multiple (sampled) protein conformations and multiple ligand conformations, such approaches still do not address the synergism of
protein-ligand reconfigurations upon docking. The Dokholyan laboratory developed a methodology, MedusaDock, that performs
fully flexible docking of both the ligand and the protein target, and validated this approach on a number of targets. MedusaDock has also been used
by Molecules in Action for virtual drug screening for clients.
Surface matching through fingerprints
Matching of protein surfaces is important for protein function annotation, protein-protein interaction prediction, and protein-protein
interface design. However, traditional surface comparison methods are too computationally expensive to be applied to a dataset containing a large
number of protein structures, such as the Protein Data Bank (PDB). The Dokholyan laboratory has developed a novel approach to match protein
surfaces that uses geometric invariant fingerprints. Borrowed from the computer
vision field, these fingerprints accurately describe the 3D features of an object, so that similarity between fingerprints reflects the similarity
between the corresponding objects. Using fingerprints, the comparison of 3D objects can be achieved at high speed. We introduce a novel
neighbor-averaging protocol, which not only significantly improves the accuracy of the fingerprint-based method, but also suggests a tentative 3D
alignment to allow further explicit alignment of the objects without undue computational cost. Using our approach, we successfully screened the
entire PDB for local surface similarities between proteins and protein inhibitors that are identified as binders to the pocket of a common enzyme.
The identified inhibitors belong to unrelated fold families, and could not be detected using traditional sequence or fold comparison methods.
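The reason fingerprint comparison is fast is that each 3D object is reduced to a fixed-length vector, so screening becomes a vector-distance ranking. The sketch below illustrates only that final step. The fingerprint values are invented, the Euclidean metric is an assumption, and the neighbor-averaging protocol is not reproduced; `fingerprint_distance` and `rank_by_similarity` are hypothetical names.

```python
import math

def fingerprint_distance(a, b):
    """Euclidean distance between two equal-length fingerprint vectors."""
    if len(a) != len(b):
        raise ValueError("fingerprints must have the same length")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def rank_by_similarity(query, library):
    """Return (name, distance) pairs from `library`, most similar first."""
    scored = [(name, fingerprint_distance(query, fp)) for name, fp in library.items()]
    return sorted(scored, key=lambda item: item[1])

# Invented four-component fingerprints for three surface patches.
library = {
    "surface_A": [0.9, 0.1, 0.4, 0.0],
    "surface_B": [0.2, 0.8, 0.1, 0.5],
    "surface_C": [0.8, 0.2, 0.5, 0.1],
}
query = [0.88, 0.12, 0.42, 0.02]
ranking = rank_by_similarity(query, library)
for name, dist in ranking:
    print(name, round(dist, 3))
```

Each comparison costs time proportional to the fingerprint length, independent of the size of the underlying structures, which is what makes screening a whole structure database feasible.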
CryoEM fitting
The Dokholyan laboratory has developed an algorithm that can rapidly screen hundreds of thousands of structures and
identify those that best fit a given cryoEM density. The method is based on geometric invariant fingerprints constructed using 3D Zernike functions.
Using those fingerprints, the comparison of electron densities to structures is reduced to a comparison of fingerprints, which is extremely
fast. To demonstrate the feasibility of this method, the Dokholyan laboratory used experimental cryoEM densities of GroEL
and rhodopsin to screen the entire Protein Data Bank, and successfully identified other GroEL and rhodopsin
structures as the top hits.
Structural filters and high-resolution structure refinement
The Dokholyan laboratory has developed a set of filters to assess the quality of a structural model or a
low-resolution structure. Three important qualities of a protein structure are assessed: (i) extent of steric clashes, (ii) extent of buried
voids, and (iii) percentage of hydrogen bond donors/acceptors that are buried but do not form hydrogen bonds. Distributions of each of these measures in
high-resolution crystal structures (0-2.5 Å) allow comparison of a given model to structures of natural proteins. The value of each measure for a given
structure is compared to the high-resolution distribution to obtain a P-value, which reflects the quality of the structure.
The Dokholyan laboratory offers online access to these filters via Gaia. Furthermore, the Dokholyan laboratory has developed a method,
Chiron, which allows high-resolution structural refinement.
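The comparison of a model's measure against the high-resolution distribution can be sketched as an empirical one-sided P-value. This is an assumption about the general idea, not the procedure used by Gaia: the function name `empirical_p_value`, the reference values, and the convention that larger values are worse are all hypothetical.

```python
def empirical_p_value(value, reference):
    """Fraction of reference structures whose measure is at least as large
    (assumed here to mean at least as bad) as `value`."""
    if not reference:
        raise ValueError("reference distribution is empty")
    at_least_as_bad = sum(1 for r in reference if r >= value)
    return at_least_as_bad / len(reference)

# Invented clash scores for five high-resolution reference structures.
reference_scores = [0.01, 0.02, 0.02, 0.03, 0.05]
print(empirical_p_value(0.03, reference_scores))
print(empirical_p_value(0.10, reference_scores))
```

A very small P-value means that almost no reference structure scores as badly as the model, which flags the model as unusual with respect to that filter.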
Loop modeling and grafting
Loop modeling is a crucial step when building structural models of proteins, and loop grafting is important in protein design. Molecules in
Action has developed a software suite, MiA Suite, which allows building loop structures using DMD simulations and grafting them
into a host protein.
Homology modeling
If a protein sequence of unknown structure is at least 30% similar to the sequence of any experimentally-determined structure, one can use
homology modeling to predict its structure. The main steps in homology modeling are: (i) obtaining sequence alignment between query and template,
(ii) changing the sequence of the template to reflect that of the query, (iii) processing the insertions and deletions in the template to obtain
a final structure. Once an alignment of query and template is obtained using other tools (for example, ClustalW or PSI-BLAST), one may perform
the remaining steps using Medusa and DMD with high efficiency. Step (ii) can be performed using the Medusa suite. Step (iii), which involves
breaking and annealing peptide bonds and folding the insertions in conjunction with the rest of the protein, can be performed by DMD.
DMD is highly efficient in loop modeling and satisfying peptide-bond constraints in order to bring distant pieces of protein structure together
after a deletion.
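To make steps (ii) and (iii) concrete, the sketch below turns a pairwise alignment into the list of operations they would have to carry out: substitutions for step (ii), and insertions and deletions for step (iii). It is bookkeeping only and does not call Medusa or DMD. The function name `plan_from_alignment` and the example sequences are hypothetical.

```python
def plan_from_alignment(query_aln, template_aln, gap="-"):
    """Classify each alignment column relative to the template.

    Returns (mutations, insertions, deletions) where
      mutations  = [(template_pos, template_residue, query_residue), ...]
      insertions = [(template_pos_before_insert, query_residue), ...]
      deletions  = [template_pos, ...]
    Template positions are 1-based and count only non-gap template residues.
    """
    if len(query_aln) != len(template_aln):
        raise ValueError("aligned sequences must have equal length")
    mutations, insertions, deletions = [], [], []
    t_pos = 0
    for q, t in zip(query_aln, template_aln):
        if t != gap:
            t_pos += 1
        if q == gap and t == gap:
            continue                          # empty column, nothing to do
        if t == gap:
            insertions.append((t_pos, q))     # query residue absent from template
        elif q == gap:
            deletions.append(t_pos)           # template residue absent from query
        elif q != t:
            mutations.append((t_pos, t, q))   # side-chain replacement
    return mutations, insertions, deletions

mutations, insertions, deletions = plan_from_alignment("MK-LVAG", "MRTLV-G")
print(mutations, insertions, deletions)
```

In this toy alignment the template residue at position 2 is replaced, position 3 is deleted, and one residue is inserted after position 5.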
Protein structure refinement
Homology modeling, a viable alternative for protein structure prediction, introduces artifacts into the final structural model,
either due to inaccuracies in the force field used for model building, or due to the model-building protocol. Steric clashing is
one such common structural artifact, characterized by the unphysical overlap of any two atoms in a protein structure. Refinement of
structural models to resolve steric clashes is critical for making further predictions using the generated model. Although there
exist programs for identifying clashes in protein structures based on a predetermined distance cutoff, tools for efficient resolution
of such artifacts are sparse.
The Dokholyan laboratory has developed a DMD-based protocol, Chiron, to efficiently
relax the protein backbone in order to remove steric clashes from protein structures.
Based on statistics obtained from a large dataset of high-resolution crystal structures, the Dokholyan laboratory
derived a metric to determine the quality of a model in terms of steric clashes and to resolve them if required. Using this protocol, one
can minimize steric clashes in protein structures and homology models with an overall backbone deviation of less than 1 Å from the initial
structure.
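For reference, the cutoff-based clash identification mentioned above can be sketched as follows. This shows only the detection step, not the DMD-based resolution performed by Chiron. The radii, the 0.4 Å tolerance, and the function name `find_clashes` are assumptions for the example, and a practical tool would also exclude covalently bonded pairs.

```python
import math
from itertools import combinations

def find_clashes(atoms, tolerance=0.4):
    """Return (name1, name2, overlap) for atom pairs whose van der Waals
    spheres overlap by more than `tolerance` (all lengths in Å).

    `atoms` is a list of (name, (x, y, z), vdw_radius) tuples.
    """
    clashes = []
    for (n1, p1, r1), (n2, p2, r2) in combinations(atoms, 2):
        overlap = r1 + r2 - math.dist(p1, p2)
        if overlap > tolerance:
            clashes.append((n1, n2, round(overlap, 3)))
    return clashes

# Three invented atoms: the two carbons are too close, the oxygen is not.
atoms = [
    ("C1", (0.0, 0.0, 0.0), 1.7),
    ("C2", (2.5, 0.0, 0.0), 1.7),
    ("O1", (0.0, 5.0, 0.0), 1.52),
]
clashes = find_clashes(atoms)
print(clashes)
```

The all-pairs loop is quadratic in the number of atoms; for whole proteins a spatial grid or neighbor list would normally be used instead.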
Peptide-protein binding motif prediction
Protein-peptide interactions form the basis of many signaling pathways in a cell, mediating an array of functions, from basic cellular processes
like phosphorylation to specialized processes like epitope recognition. Knowledge of specific protein-peptide interactions helps in the
identification of natural binding partners and specific members of cellular signaling pathways. The Dokholyan laboratory
has developed a semi-automated protocol to rapidly screen all possible peptide sequences that fit a protein-peptide complex and select those that
energetically favor the complex scaffold. Using this protocol, the Dokholyan laboratory has screened the peptide combinations
for a chaperone-peptide complex and identified a consensus motif that, when present in a protein, can make it a substrate for the chaperone. The
binding motif was then validated experimentally and shown to be a sufficient condition for substrate recognition by the chaperone.
This protocol can be applied to any protein-peptide complex in order to identify the peptide combinations best suited for binding to the target protein.
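The enumerate, score, and summarize logic of such a screen can be sketched as below. The scoring function is a toy stand-in for an energy evaluation of the peptide in the complex scaffold, and `screen_peptides`, `consensus`, and `toy_score` are hypothetical names. The sketch does not reflect the laboratory's actual protocol.

```python
from collections import Counter
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def screen_peptides(length, score, top_n):
    """Enumerate every peptide of the given length and keep the top_n
    lowest-scoring (most favorable) sequences."""
    peptides = ("".join(p) for p in product(AMINO_ACIDS, repeat=length))
    return sorted(peptides, key=score)[:top_n]

def consensus(peptides, min_fraction=0.5):
    """Per-position consensus; 'x' marks positions with no dominant residue."""
    motif = []
    for column in zip(*peptides):
        residue, count = Counter(column).most_common(1)[0]
        motif.append(residue if count / len(column) >= min_fraction else "x")
    return "".join(motif)

# Toy score (invented): the pocket prefers L at position 1 and K at position 3.
def toy_score(peptide):
    return (peptide[0] != "L") + (peptide[2] != "K")

hits = screen_peptides(3, toy_score, top_n=20)
motif = consensus(hits)
print(motif)
```

Exhaustive enumeration grows as 20 to the power of the peptide length, so it is only practical for short peptides or when positions are screened with a restricted residue set.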
Protein design
Accurate prediction of stability change upon mutation (Eris/Medusa) can be utilized in the rational design of mutations for
different purposes. For a protein of known structure, one can screen for mutations to either stabilize or destabilize the protein as required.
Similarly, one may attempt to rescue mutants that are experimentally characterized as unstable by rationally selecting further
mutations that restore stability. Furthermore, a discrepancy between the thermodynamic and evolutionary preferences of various amino acids at different
positions in a protein (evaluated using Eris) points to functional sites in the protein structure. The fast repacking and selection
of optimal residue side chains afforded by Eris can be used as an intermediate step in protein design. Cycles of backbone
conformational sampling by πDMD and sequence selection by Eris can be used as a viable protocol
for the design of proteins with a specific structure, stability, and/or function.
Protein folding: from structure prediction to dynamic characterization
πDMD uses the Medusa force field in all-atom protein simulations. Using replica exchange DMD simulations, the Dokholyan
laboratory demonstrated ab initio folding of six small proteins, which highlights the accuracy of the force field as well as the sampling
efficiency. Although ab initio folding of large proteins is too time-consuming, we can incorporate secondary structure constraints in the simulations
to simulate the rearrangement and packing in order to predict the fully folded structure.
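Replica exchange runs several copies of the system at different temperatures and periodically attempts to swap configurations between neighboring temperatures. The standard Metropolis criterion for such a swap is sketched below. This is the textbook formula, not code from πDMD, and the function name, reduced units, and example numbers are assumptions.

```python
import math

def exchange_probability(e_i, e_j, t_i, t_j, k_b=1.0):
    """Probability of swapping configurations between replica i (energy e_i,
    temperature t_i) and replica j (energy e_j, temperature t_j):
        P = min(1, exp[(beta_i - beta_j) * (e_i - e_j)]),  beta = 1 / (k_b * T)
    """
    beta_i = 1.0 / (k_b * t_i)
    beta_j = 1.0 / (k_b * t_j)
    delta = (beta_i - beta_j) * (e_i - e_j)
    return 1.0 if delta >= 0 else math.exp(delta)

# The cold replica holding the higher energy always swaps; the reverse
# arrangement swaps only with a probability below one.
p_favorable = exchange_probability(e_i=-10.0, e_j=-12.0, t_i=1.0, t_j=1.2)
p_unfavorable = exchange_probability(e_i=-12.0, e_j=-10.0, t_i=1.0, t_j=1.2)
print(p_favorable, p_unfavorable)
```

Swaps let configurations trapped in local minima at low temperature escape through the high-temperature replicas, which is the source of the improved sampling efficiency.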
Another important application of all-atom DMD simulation is the characterization of protein dynamics in the folded state. Conformational dynamics
can be used to identify functionally important motions. For example, the near-native dynamics of the SOD1 monomer are associated with a propensity for
misfolding and aggregation. Dynamic coupling analysis from simulations can help to identify remotely coupled regions, which can then be used to
understand allostery-related functions and to engineer novel allosteric proteins.
RNA folding and structure prediction
The Dokholyan laboratory has developed a coarse-grained RNA model for DMD simulations, which can successfully fold
small RNAs (<50 nt) ab initio. To fold large RNA molecules with complex tertiary structures, we incorporate experimentally derived structural
information into RNA modeling. The Dokholyan laboratory has developed an automated RNA structure refinement method
that uses base pairs and distance constraints. RNA base pairs can often be accurately derived from RNA secondary structure predictions that use
biochemical probing data, such as SHAPE chemistry. Distance constraints can be inferred from a variety of biochemical and bioinformatic
techniques, including site-directed hydroxyl radical probing, fluorescence resonance energy transfer, cross-linking, and sequence covariation.
This methodology has been applied to tRNA(Asp) and benchmarked on four different RNAs ranging from 49 nt to 158 nt.
In all cases, the RNA model structure can be refined to a statistically significant native-like structure.
Virtual drug screening
If the structure of a protein target is known, one can computationally screen millions of compounds to identify molecules that bind to the target.
The Molecules in Action approach combines chemoinformatics-based methods with MedusaDock, an innovative fully flexible docking algorithm, to predict
the correct binding poses of the molecules, from which the binding affinities are estimated using MedusaScore. Only the top-ranked molecules are
tested with an experimental assay. Our method features chemically diverse molecules among the top hits, which is useful for the identification of
novel drug molecules. In addition, the false positive rate is low among the top hits, which allows the discovery of true binders while performing
fewer experimental assays. This virtual screening scheme has been validated in several benchmark studies and experimental tests.
Dynamics in drug screening
For some difficult drug screening targets, all available virtual screening scoring functions fail to identify the native binding pose of a drug
in the pocket. This failure of traditional dock-and-score virtual screening methods is due to dynamic interactions between the
target and ligand that are not captured in a single static structure. Therefore, the Dokholyan laboratory has
developed a methodology combining MedusaDock and MedusaScore with πDMD simulations. By using traditional methods as a filter and performing
DMD simulations of the most promising ligand poses, it is possible to identify the native binding pose by its dynamic behavior in the pocket.