Publications by authors named "Hahnbeom Park"

39 Publications

Protein tertiary structure prediction and refinement using deep learning and Rosetta in CASP14.

Proteins 2021 Jul 30. Epub 2021 Jul 30.

Department of Biochemistry and Institute for Protein Design, University of Washington, Seattle, Washington, USA.

The trRosetta structure prediction method employs deep learning to generate predicted residue-residue distance and orientation distributions from which 3D models are built. We sought to improve the method by incorporating as inputs (in addition to sequence information) both language model embeddings and template information weighted by sequence similarity to the target. We also developed a refinement pipeline that recombines models generated by template-free and template-utilizing versions of trRosetta guided by the DeepAccNet accuracy predictor. Both benchmark tests and CASP results show that the new pipeline is a considerable improvement over the original trRosetta, and it is faster and requires fewer computing resources, completing the entire modeling process in a median time of under 3 h in CASP14. Our human group improved results with this pipeline primarily by identifying additional homologous sequences for input into the network. We also used the DeepAccNet accuracy predictor to guide Rosetta high-resolution refinement for submissions in the regular and refinement categories; although performance was quite good on a CASP relative scale, the overall improvements were rather modest, in part due to missing inter-domain or inter-chain contacts.

Source
DOI: http://dx.doi.org/10.1002/prot.26194
July 2021

Protein oligomer modeling guided by predicted interchain contacts in CASP14.

Proteins 2021 Jul 29. Epub 2021 Jul 29.

Department of Biochemistry, University of Washington, Seattle, Washington, USA.

For CASP14, we developed deep learning-based methods for predicting homo-oligomeric and hetero-oligomeric contacts and used them for oligomer modeling. To build structure models, we developed an oligomer structure generation method that utilizes predicted interchain contacts to guide iterative restrained minimization from random backbone structures. We supplemented this gradient-based fold-and-dock method with template-based and ab initio docking approaches using deep learning-based subunit predictions on 29 assembly targets. These methods produced oligomer models with summed Z-scores 5.5 units higher than the next best group, with the fold-and-dock method having the best relative performance. Over the eight targets for which this method was used, the best of the five submitted models had average oligomer TM-score of 0.71 (average oligomer TM-score of the next best group: 0.64), and explicit modeling of inter-subunit interactions improved modeling of six out of 40 individual domains (ΔGDT-TS > 2.0).

Source
DOI: http://dx.doi.org/10.1002/prot.26197
July 2021

Accurate prediction of protein structures and interactions using a three-track neural network.

Science 2021 08 15;373(6557):871-876. Epub 2021 Jul 15.

Molecular Biophysics & Integrated Bioimaging Division, Lawrence Berkeley National Laboratory, Berkeley, CA, USA.

DeepMind presented notably accurate predictions at the recent 14th Critical Assessment of Structure Prediction (CASP14) conference. We explored network architectures that incorporate related ideas and obtained the best performance with a three-track network in which information at the one-dimensional (1D) sequence level, the 2D distance map level, and the 3D coordinate level is successively transformed and integrated. The three-track network produces structure predictions with accuracies approaching those of DeepMind in CASP14, enables the rapid solution of challenging x-ray crystallography and cryo-electron microscopy structure modeling problems, and provides insights into the functions of proteins of currently unknown structure. The network also enables rapid generation of accurate protein-protein complex models from sequence information alone, short-circuiting traditional approaches that require modeling of individual subunits followed by docking. We make the method available to the scientific community to speed biological research.

Source
DOI: http://dx.doi.org/10.1126/science.abj8754
August 2021

Improved protein structure refinement guided by deep learning based accuracy estimation.

Nat Commun 2021 02 26;12(1):1340. Epub 2021 Feb 26.

Department of Biochemistry and Institute for Protein Design, University of Washington, Seattle, WA, USA.

We develop a deep learning framework (DeepAccNet) that estimates per-residue accuracy and residue-residue distance signed error in protein models and uses these predictions to guide Rosetta protein structure refinement. The network uses 3D convolutions to evaluate local atomic environments followed by 2D convolutions to provide their global contexts and outperforms other methods that similarly predict the accuracy of protein structure models. Overall accuracy predictions for X-ray and cryoEM structures in the PDB correlate with their resolution, and the network should be broadly useful for assessing the accuracy of both predicted structure models and experimentally determined structures and identifying specific regions likely to be in error. Incorporation of the accuracy predictions at multiple stages in the Rosetta refinement protocol considerably increased the accuracy of the resulting protein structure models, illustrating how deep learning can improve search for global energy minima of biomolecules.

Source
DOI: http://dx.doi.org/10.1038/s41467-021-21511-x
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC7910447
February 2021

Force Field Optimization Guided by Small Molecule Crystal Lattice Data Enables Consistent Sub-Angstrom Protein-Ligand Docking.

J Chem Theory Comput 2021 Mar 12;17(3):2000-2010. Epub 2021 Feb 12.

Department of Biochemistry and Institute for Protein Design, University of Washington, Seattle, Washington 98195, United States.

Accurate and rapid calculation of protein-small molecule interaction free energies is critical for computational drug discovery. Because of the large chemical space spanned by drug-like molecules, classical force fields contain thousands of parameters describing atom-pair distance and torsional preferences; each parameter is typically optimized independently on simple representative molecules. Here, we describe a new approach in which small molecule force field parameters are jointly optimized guided by the rich source of information contained within thousands of available small molecule crystal structures. We optimize parameters by requiring that the experimentally determined molecular lattice arrangements have lower energy than all alternative lattice arrangements. Thousands of independent crystal lattice-prediction simulations were run on each of 1386 small molecule crystal structures, and energy function parameters of an implicit solvent energy model were optimized, so native crystal lattice arrangements had the lowest energy. The resulting energy model was implemented in Rosetta, together with a rapid genetic algorithm docking method employing grid-based scoring and receptor flexibility. The success rate of bound structure recapitulation in cross-docking on 1112 complexes was improved by more than 10% over previously published methods, with solutions within <1 Å in over half of the cases. Our results demonstrate that small molecule crystal structures are a rich source of information for guiding molecular force field development, and the improved Rosetta energy function should increase accuracy in a wide range of small molecule structure prediction and design studies.
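The optimization criterion described above (the experimentally observed lattice should have lower energy than every alternative lattice) can be illustrated with a small sketch. The hinge-style penalty, the `margin` parameter, and the function names below are assumptions chosen for illustration; the abstract does not state which objective function the authors actually used.

```python
def lattice_margin_loss(native_energy, decoy_energies, margin=1.0):
    """Hinge-style penalty (an illustrative assumption, not the published objective).

    Each alternative ("decoy") lattice contributes a penalty when its energy is
    not at least `margin` above the energy of the native lattice.
    """
    return sum(max(0.0, native_energy + margin - e) for e in decoy_energies)


def native_is_lowest(native_energy, decoy_energies):
    """True when the native lattice has strictly lower energy than every decoy."""
    return all(native_energy < e for e in decoy_energies)


if __name__ == "__main__":
    # Hypothetical energies in arbitrary units.
    native = -12.0
    decoys = [-10.0, -11.5, -13.0]
    print("loss:", lattice_margin_loss(native, decoys))
    print("native lowest:", native_is_lowest(native, decoys))
```

In a scheme like this, the force field parameters would be adjusted until the penalty reaches zero across all training crystals, which is one way of expressing "native lattice arrangements had the lowest energy".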

Source
DOI: http://dx.doi.org/10.1021/acs.jctc.0c01184
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC8218654
March 2021

Prediction of Protein Mutational Free Energy: Benchmark and Sampling Improvements Increase Classification Accuracy.

Front Bioeng Biotechnol 2020 8;8:558247. Epub 2020 Oct 8.

Cyrus Biotechnology, Seattle, WA, United States.

Software to predict the change in protein stability upon point mutation is a valuable tool for a number of biotechnological and scientific problems. To facilitate the development of such software and provide easy access to the available experimental data, the ProTherm database was created. Biases in the methods and types of information collected have led to disparity in the types of mutations for which experimental data are available. For example, mutations to alanine are hugely overrepresented whereas those involving charged residues, especially from one charged residue to another, are underrepresented. ProTherm subsets created as benchmark sets that do not account for this often underrepresent certain mutational types. This issue introduces systematic biases into previously published protocols' ability to accurately predict the change in folding energy on these classes of mutations. To resolve this issue, we have generated a new benchmark set with these problems corrected. We have then used the benchmark set to test a number of improvements to the point mutation energetics tools in the Rosetta software suite.

Source
DOI: http://dx.doi.org/10.3389/fbioe.2020.558247
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC7579412
October 2020

Efficient consideration of coordinated water molecules improves computational protein-protein and protein-ligand docking discrimination.

PLoS Comput Biol 2020 09 21;16(9):e1008103. Epub 2020 Sep 21.

Department of Biochemistry, University of Washington, Seattle, Washington, United States of America.

Highly coordinated water molecules are frequently an integral part of protein-protein and protein-ligand interfaces. We introduce an updated energy model that efficiently captures the energetic effects of these ordered water molecules on the surfaces of proteins. A two-stage method is developed in which polar groups arranged in geometries suitable for water placement are first identified, then a modified Monte Carlo simulation allows highly coordinated waters to be placed on the surface of a protein while simultaneously sampling amino acid side chain orientations. This "semi-explicit" water model is implemented in Rosetta and is suitable for both structure prediction and protein design. We show that our new approach and energy model yield significant improvements in native structure recovery of protein-protein and protein-ligand docking discrimination tests.
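For readers unfamiliar with Monte Carlo sampling, the sketch below shows the standard Metropolis acceptance test that such simulations are commonly built on. It is a generic textbook criterion, not the modified scheme or the Rosetta implementation described in the abstract; the function name and the convention that `temperature` is expressed in the same units as the energy are assumptions.

```python
import math
import random


def metropolis_accept(delta_energy, temperature, rng=random):
    """Standard Metropolis test for a proposed move.

    delta_energy: energy(new state) - energy(old state)
    temperature:  kT, in the same units as delta_energy (assumed convention)
    rng:          any object with a random() method returning a float in [0, 1)
    """
    if delta_energy <= 0.0:
        # Downhill or neutral moves are always accepted.
        return True
    # Uphill moves are accepted with Boltzmann probability exp(-dE / kT).
    return rng.random() < math.exp(-delta_energy / temperature)


if __name__ == "__main__":
    rng = random.Random(0)
    trials = 20000
    accepted = sum(metropolis_accept(1.0, 1.0, rng) for _ in range(trials))
    print("uphill acceptance rate:", accepted / trials)
```

In a water-placement setting, a proposed move might add, remove, or shift a water molecule or change a side-chain rotamer, with `delta_energy` taken from the energy model; those move types are illustrative and not taken from the paper.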

Source
DOI: http://dx.doi.org/10.1371/journal.pcbi.1008103
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC7529342
September 2020

Macromolecular modeling and design in Rosetta: recent methods and frameworks.

Nat Methods 2020 07 1;17(7):665-680. Epub 2020 Jun 1.

Department of Biochemistry, Stanford University School of Medicine, Stanford, CA, USA.

The Rosetta software for macromolecular modeling, docking and design is extensively used in laboratories worldwide. During two decades of development by a community of laboratories at more than 60 institutions, Rosetta has been continuously refactored and extended. Its advantages are its performance and interoperability between broad modeling capabilities. Here we review tools developed in the last 5 years, including over 80 methods. We discuss improvements to the score function, user interfaces and usability. Rosetta is available at http://www.rosettacommons.org.

Source
DOI: http://dx.doi.org/10.1038/s41592-020-0848-2
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC7603796
July 2020

Improved protein structure prediction using predicted interresidue orientations.

Proc Natl Acad Sci U S A 2020 01 2;117(3):1496-1503. Epub 2020 Jan 2.

Department of Biochemistry, University of Washington, Seattle, WA 98105.

The prediction of interresidue contacts and distances from coevolutionary data using deep learning has considerably advanced protein structure prediction. Here, we build on these advances by developing a deep residual network for predicting interresidue orientations, in addition to distances, and a Rosetta-constrained energy-minimization protocol for rapidly and accurately generating structure models guided by these restraints. In benchmark tests on 13th Community-Wide Experiment on the Critical Assessment of Techniques for Protein Structure Prediction (CASP13)- and Continuous Automated Model Evaluation (CAMEO)-derived sets, the method outperforms all previously described structure-prediction methods. Although trained entirely on native proteins, the network consistently assigns higher probability to de novo-designed proteins, identifying the key fold-determining residues and providing an independent quantitative measure of the "ideality" of a protein structure. The method promises to be useful for a broad range of protein structure prediction and design problems.
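One common way to turn a predicted distance distribution into a restraint for energy minimization is to take the negative logarithm of the binned probabilities. The sketch below illustrates that general idea only. The bin layout, the probability floor, and the function names are hypothetical, and the published protocol may normalize or smooth its potentials differently.

```python
import math


def distribution_to_potential(probabilities, floor=1e-4):
    """Convert binned probabilities into a per-bin pseudo-energy, -log(p).

    Probabilities are renormalized to sum to 1, and values below `floor`
    are clamped so that empty bins do not yield infinite energies.
    """
    total = sum(probabilities)
    if total <= 0.0:
        raise ValueError("probabilities must have a positive sum")
    return [-math.log(max(p / total, floor)) for p in probabilities]


def most_likely_bin_center(probabilities, first_edge, bin_width):
    """Distance at the center of the highest-probability bin."""
    best = max(range(len(probabilities)), key=probabilities.__getitem__)
    return first_edge + (best + 0.5) * bin_width


if __name__ == "__main__":
    # Hypothetical three-bin distribution for one residue pair.
    probs = [0.1, 0.7, 0.2]
    print(distribution_to_potential(probs))
    print(most_likely_bin_center(probs, first_edge=2.0, bin_width=0.5))
```

The lowest pseudo-energy falls in the most probable bin, so a minimizer guided by such restraints is pulled toward the distances the network considers most likely.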

Source
DOI: http://dx.doi.org/10.1073/pnas.1914677117
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6983395
January 2020

High-accuracy refinement using Rosetta in CASP13.

Proteins 2019 12 5;87(12):1276-1282. Epub 2019 Aug 5.

Department of Biochemistry and Institute for Protein Design, University of Washington, Seattle, Washington.

Because proteins generally fold to their lowest free energy states, energy-guided refinement in principle should be able to systematically improve the quality of protein structure models generated using homologous structure or co-evolution derived information. However, because of the high dimensionality of the search space, there are far more ways to degrade the quality of a near native model than to improve it, and hence, refinement methods are very sensitive to energy function errors. In the 13th Critical Assessment of techniques for protein Structure Prediction (CASP13), we sought to carry out a thorough search for low energy states in the neighborhood of a starting model using restraints to avoid straying too far. The approach was reasonably successful in improving both regions largely incorrect in the starting models as well as core regions that started out closer to the correct structure. Models with GDT-HA over 70 were obtained for five targets and for one of those, an accuracy of 0.5 Å backbone root-mean-square deviation (RMSD) was achieved. An important current challenge is to improve performance in refining oligomers and larger proteins, for which the search problem remains extremely difficult.
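The two accuracy measures quoted above can be sketched as follows. This is a simplified illustration that assumes the model and reference coordinates are already superimposed and matched atom by atom. Official GDT scores search over many superpositions for each cutoff, so the values from this single-superposition version are not directly comparable to CASP numbers. The function names are hypothetical.

```python
import math

# Distance cutoffs in angstroms for the two common GDT variants.
GDT_HA_CUTOFFS = (0.5, 1.0, 2.0, 4.0)
GDT_TS_CUTOFFS = (1.0, 2.0, 4.0, 8.0)


def pair_distances(model, reference):
    """Per-atom distances between two equally long lists of (x, y, z) tuples."""
    if len(model) != len(reference):
        raise ValueError("model and reference must have the same length")
    return [math.dist(a, b) for a, b in zip(model, reference)]


def rmsd(distances):
    """Root-mean-square deviation over the per-atom distances."""
    return math.sqrt(sum(d * d for d in distances) / len(distances))


def gdt(distances, cutoffs):
    """Mean percentage of atoms within each cutoff (single superposition only)."""
    n = len(distances)
    fractions = [sum(d <= c for d in distances) / n for c in cutoffs]
    return 100.0 * sum(fractions) / len(cutoffs)


if __name__ == "__main__":
    # Hypothetical four-atom example.
    model = [(0, 0, 0), (1, 0, 0), (0, 3, 0), (0, 0, 0)]
    reference = [(0, 0, 0), (0, 0, 0), (0, 0, 0), (0, 0, 10)]
    d = pair_distances(model, reference)
    print("RMSD:", rmsd(d))
    print("GDT-HA:", gdt(d, GDT_HA_CUTOFFS))
    print("GDT-TS:", gdt(d, GDT_TS_CUTOFFS))
```

Because GDT-HA uses cutoffs half as large as GDT-TS, it is the stricter of the two, which is why it is the measure typically reported for high-accuracy refinement.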

Source
DOI: http://dx.doi.org/10.1002/prot.25784
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6851472
December 2019

De novo design of a fluorescence-activating β-barrel.

Nature 2018 09 12;561(7724):485-491. Epub 2018 Sep 12.

Department of Biochemistry, University of Washington, Seattle, WA, USA.

The regular arrangements of β-strands around a central axis in β-barrels and of α-helices in coiled coils contrast with the irregular tertiary structures of most globular proteins, and have fascinated structural biologists since they were first discovered. Simple parametric models have been used to design a wide range of α-helical coiled-coil structures, but to date there has been no success with β-barrels. Here we show that accurate de novo design of β-barrels requires considerable symmetry-breaking to achieve continuous hydrogen-bond connectivity and eliminate backbone strain. We then build ensembles of β-barrel backbone models with cavity shapes that match the fluorogenic compound DFHBI, and use a hierarchical grid-based search method to simultaneously optimize the rigid-body placement of DFHBI in these cavities and the identities of the surrounding amino acids to achieve high shape and chemical complementarity. The designs have high structural accuracy and bind and fluorescently activate DFHBI in vitro and in Escherichia coli, yeast and mammalian cells. This de novo design of small-molecule binding activity, using backbones custom-built to bind the ligand, should enable the design of increasingly sophisticated ligand-binding proteins, sensors and catalysts that are not limited by the backbone geometries available in known protein structures.

Source
DOI: http://dx.doi.org/10.1038/s41586-018-0509-0
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6275156
September 2018

GalaxyGPCRloop: Template-Based and Ab Initio Structure Sampling of the Extracellular Loops of G-Protein-Coupled Receptors.

J Chem Inf Model 2018 06 7;58(6):1234-1243. Epub 2018 Jun 7.

Department of Chemistry, Seoul National University, Seoul 08826, Republic of Korea.

The second extracellular loops (ECL2s) of G-protein-coupled receptors (GPCRs) are often involved in GPCR functions, and their structures have important implications in drug discovery. However, structure prediction of ECL2 is difficult because of its long length and the structural diversity among different GPCRs. In this study, a new ECL2 conformational sampling method involving both template-based and ab initio sampling was developed. Inspired by the observation of similar ECL2 structures of closely related GPCRs, a template-based sampling method employing loop structure templates selected from the structure database was developed. A new metric for evaluating similarity of the target loop to templates was introduced for template selection. An ab initio loop sampling method was also developed to treat cases without highly similar templates. The ab initio method is based on the previously developed fragment assembly and loop closure method. A new sampling component that takes advantage of secondary structure prediction was added. In addition, a conserved disulfide bridge restraining ECL2 conformation was predicted and analytically incorporated into sampling, reducing the effective dimension of the conformational search space. The sampling method was combined with an existing energy function for comparison with previously reported loop structure prediction methods, and the benchmark test demonstrated outstanding performance.

Source
DOI: http://dx.doi.org/10.1021/acs.jcim.8b00148
June 2018

Protein homology model refinement by large-scale energy optimization.

Proc Natl Acad Sci U S A 2018 03 5;115(12):3054-3059. Epub 2018 Mar 5.

Department of Biochemistry, University of Washington, Seattle, WA 98105.

Proteins fold to their lowest free-energy structures, and hence the most straightforward way to increase the accuracy of a partially incorrect protein structure model is to search for the lowest-energy nearby structure. This direct approach has met with little success for two reasons: first, energy function inaccuracies can lead to false energy minima, resulting in model degradation rather than improvement; and second, even with an accurate energy function, the search problem is formidable because the energy only drops considerably in the immediate vicinity of the global minimum, and there are a very large number of degrees of freedom. Here we describe a large-scale energy optimization-based refinement method that incorporates advances in both search and energy function accuracy that can substantially improve the accuracy of low-resolution homology models. The method refined low-resolution homology models into correct folds for 50 of 84 diverse protein families and generated improved models in recent blind structure prediction experiments. Analyses of the basis for these improvements reveal contributions from both the improvements in conformational sampling techniques and the energy function.

Source
DOI: http://dx.doi.org/10.1073/pnas.1719115115
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC5866580
March 2018

Sampling and energy evaluation challenges in ligand binding protein design.

Protein Sci 2017 Dec 30;26(12):2426-2437. Epub 2017 Oct 30.

Institute for Protein Design, University of Washington, Seattle, Washington.

The steroid hormone 17α-hydroxyprogesterone (17-OHP) is a biomarker for congenital adrenal hyperplasia and hence there is considerable interest in development of sensors for this compound. We used computational protein design to generate protein models with binding sites for 17-OHP containing an extended, nonpolar, shape-complementary binding pocket for the four-ring core of the compound, and hydrogen bonding residues at the base of the pocket to interact with carbonyl and hydroxyl groups at the more polar end of the ligand. Eight of 16 designed proteins experimentally tested bind 17-OHP with micromolar affinity. A co-crystal structure of one of the designs revealed that 17-OHP is rotated 180° around a pseudo-two-fold axis in the compound and displays multiple binding modes within the pocket, while still interacting with all of the designed residues in the engineered site. Subsequent rounds of mutagenesis and binding selection improved the ligand affinity to nanomolar range, while appearing to constrain the ligand to a single bound conformation that maintains the same "flipped" orientation relative to the original design. We trace the discrepancy in the design calculations to two sources: first, a failure to model subtle backbone changes which alter the distribution of sidechain rotameric states and second, an underestimation of the energetic cost of desolvating the carbonyl and hydroxyl groups of the ligand. The difference between design model and crystal structure thus arises from both sampling limitations and energy function inaccuracies that are exacerbated by the near two-fold symmetry of the molecule.

Source
DOI: http://dx.doi.org/10.1002/pro.3317
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC5699494
December 2017

Protein structure prediction using Rosetta in CASP12.

Proteins 2018 03 8;86 Suppl 1:113-121. Epub 2017 Oct 8.

Department of Biochemistry, University of Washington, Seattle, Washington.

We describe several notable aspects of our structure predictions using Rosetta in CASP12 in the free modeling (FM) and refinement (TR) categories. First, we had previously generated (and published) models for most large protein families lacking experimentally determined structures using Rosetta guided by co-evolution based contact predictions, and for several targets these models proved better starting points for comparative modeling than any known crystal structure; our model database thus starts to fulfill one of the goals of the original Protein Structure Initiative. Second, while our "human" group simply submitted ROBETTA models for most targets, for six targets expert intervention improved predictions considerably; the largest improvement was for T0886, where we correctly parsed two discontinuous domains guided by predicted contact maps to accurately identify a structural homolog of the same fold. Third, Rosetta all-atom refinement followed by MD simulations led to consistent but small improvements when starting models were close to the native structure, and larger but less consistent improvements when starting models were further away.

Source
DOI: http://dx.doi.org/10.1002/prot.25390
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6802495
March 2018

Automatic structure prediction of oligomeric assemblies using Robetta in CASP12.

Proteins 2018 03 16;86 Suppl 1:283-291. Epub 2017 Oct 16.

Department of Biochemistry, University of Washington, Seattle, Washington, 98195.

Many naturally occurring protein systems function primarily as symmetric assemblies. Prediction of the quaternary structure of these assemblies is an important biological problem. This article describes automated tools we have developed for predicting the structures of symmetric protein assemblies in the Robetta structure prediction server. We assess the performance of this pipeline on a set of targets from the recent CASP12/CAPRI blind quaternary structure prediction experiment. Our approach successfully predicted 5 of 7 symmetric assemblies in this challenge, and was assessed as the best participating server group, and 1 of only 2 groups (human or server) with 2 predictions judged as high quality by the assessors. We also assess the method on a broader set of 22 natively symmetric CASP12 targets, where we show that oligomeric modeling can improve the accuracy of monomeric structure determination, particularly in highly intertwined oligomers.

Source
DOI: http://dx.doi.org/10.1002/prot.25387
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6019630
March 2018

The Rosetta All-Atom Energy Function for Macromolecular Modeling and Design.

J Chem Theory Comput 2017 Jun 12;13(6):3031-3048. Epub 2017 May 12.

Department of Chemical and Biomolecular Engineering, Johns Hopkins University, 3400 North Charles Street, Baltimore, Maryland 21218, United States.

Over the past decade, the Rosetta biomolecular modeling suite has informed diverse biological questions and engineering challenges ranging from interpretation of low-resolution structural data to design of nanomaterials, protein therapeutics, and vaccines. Central to Rosetta's success is the energy function: a model parametrized from small-molecule and X-ray crystal structure data used to approximate the energy associated with each biomolecule conformation. This paper describes the mathematical models and physical concepts that underlie the latest Rosetta energy function, called the Rosetta Energy Function 2015 (REF15). Applying these concepts, we explain how to use Rosetta energies to identify and analyze the features of biomolecular models. Finally, we discuss the latest advances in the energy function that extend its capabilities from soluble proteins to also include membrane proteins, peptides containing noncanonical amino acids, small molecules, carbohydrates, nucleic acids, and other macromolecules.

Source
DOI: http://dx.doi.org/10.1021/acs.jctc.7b00125
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC5717763
June 2017

Protein structure determination using metagenome sequence data.

Science 2017 Jan;355(6322):294-298

Department of Biochemistry, University of Washington, Seattle, WA 98105, USA.

Despite decades of work by structural biologists, there are still ~5200 protein families with unknown structure outside the range of comparative modeling. We show that Rosetta structure prediction guided by residue-residue contacts inferred from evolutionary information can accurately model proteins that belong to large families and that metagenome sequence data more than triple the number of protein families with sufficient sequences for accurate modeling. We then integrate metagenome data, contact-based structure matching, and Rosetta structure calculations to generate models for 614 protein families with currently unknown structures; 206 are membrane proteins and 137 have folds not represented in the Protein Data Bank. This approach provides the representative models for large protein families originally envisioned as the goal of the Protein Structure Initiative at a fraction of the cost.

Source
DOI: http://dx.doi.org/10.1126/science.aah4043
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC5493203
January 2017

Simultaneous Optimization of Biomolecular Energy Functions on Features from Small Molecules and Macromolecules.

J Chem Theory Comput 2016 Dec 7;12(12):6201-6212. Epub 2016 Nov 7.

Division of Public Health Sciences, Fred Hutchinson Cancer Research Center, 1100 Fairview Avenue N., Seattle, Washington 98019, United States.

Most biomolecular modeling energy functions for structure prediction, sequence design, and molecular docking have been parametrized using existing macromolecular structural data; this contrasts with molecular mechanics force fields, which are largely optimized using small-molecule data. In this study, we describe an integrated method that enables optimization of a biomolecular modeling energy function simultaneously against small-molecule thermodynamic data and high-resolution macromolecular structural data. We use this approach to develop a next-generation Rosetta energy function that utilizes a new anisotropic implicit solvation model, and an improved electrostatics and Lennard-Jones model, illustrating how energy functions can be considerably improved in their ability to describe large-scale energy landscapes by incorporating both small-molecule and macromolecule data. The energy function improves performance in a wide range of protein structure prediction challenges, including monomeric structure prediction, protein-protein and protein-ligand docking, protein sequence design, and prediction of the free energy changes by mutation, while reasonably recapitulating small-molecule thermodynamic properties.
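As background for the Lennard-Jones term mentioned above, the sketch below evaluates the textbook 12-6 potential for a single atom pair. It is the generic form found in physical chemistry references, not the improved model developed in the paper; the parameter names `epsilon` (well depth) and `sigma` (zero-crossing distance) follow the usual textbook convention.

```python
def lennard_jones(r, epsilon, sigma):
    """Textbook 12-6 Lennard-Jones energy for one atom pair.

    r:       interatomic distance (must be positive)
    epsilon: depth of the energy well
    sigma:   distance at which the energy crosses zero
    """
    if r <= 0.0:
        raise ValueError("distance must be positive")
    s6 = (sigma / r) ** 6
    # Repulsive r^-12 term minus attractive r^-6 term.
    return 4.0 * epsilon * (s6 * s6 - s6)


if __name__ == "__main__":
    # Hypothetical parameters in arbitrary units.
    epsilon, sigma = 0.2, 3.5
    r_min = 2.0 ** (1.0 / 6.0) * sigma  # location of the energy minimum
    for r in (3.0, sigma, r_min, 6.0):
        print(round(r, 3), lennard_jones(r, epsilon, sigma))
```

In this form the minimum lies at r = 2^(1/6) sigma with energy -epsilon. Parameters such as `epsilon` and `sigma` per atom type are the kind of quantities a joint optimization against small-molecule and macromolecular data would tune.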

Source
DOI: http://dx.doi.org/10.1021/acs.jctc.6b00819
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC5515585
December 2016

Structure prediction using sparse simulated NOE restraints with Rosetta in CASP11.

Proteins 2016 09 6;84 Suppl 1:181-8. Epub 2016 Mar 6.

Department of Biochemistry, University of Washington, Seattle, Washington, 98195.

In CASP11 we generated protein structure models using simulated ambiguous and unambiguous nuclear Overhauser effect (NOE) restraints with a two-stage protocol. Low-resolution models were generated guided by the unambiguous restraints using continuous chain folding for alpha and alpha-beta proteins, and iterative annealing for all-beta proteins to take advantage of the strand pairing information implicit in the restraints. The Rosetta fragment/model hybridization protocol was then used to recombine and regularize these models, and refine them in the Rosetta full-atom energy function guided by both the unambiguous and the ambiguous restraints. Fifteen out of 19 targets were modeled with GDT-TS quality scores greater than 60 for Model 1, significantly improving upon the non-assisted predictions. Our results suggest that atomic-level accuracy is achievable using sparse NOE data when there is at least one correctly assigned NOE for every residue.

Source
DOI: http://dx.doi.org/10.1002/prot.25006
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC5490372
September 2016

Large-scale determination of previously unsolved protein structures using evolutionary information.

Elife 2015 Sep 3;4:e09248. Epub 2015 Sep 3.

Department of Biochemistry, University of Washington, Seattle, United States.

The prediction of the structures of proteins without detectable sequence similarity to any protein of known structure remains an outstanding scientific challenge. Here we report significant progress in this area. We first describe de novo blind structure predictions of unprecedented accuracy we made for two proteins in large families in the recent CASP11 blind test of protein structure prediction methods by incorporating residue-residue co-evolution information in the Rosetta structure prediction program. We then describe the use of this method to generate structure models for 58 of the 121 large protein families in prokaryotes for which three-dimensional structures are not available. These models, which are posted online for public access, provide structural information for the over 400,000 proteins belonging to the 58 families and suggest hypotheses about mechanism for the subset for which the function is known, and hypotheses about function for the remainder.

Source
DOI: http://dx.doi.org/10.7554/eLife.09248
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC4602095
September 2015

High-resolution protein-protein docking by global optimization: recent advances and future challenges.

Curr Opin Struct Biol 2015 Dec 1;35:24-31. Epub 2015 Sep 1.

Department of Chemistry, Seoul National University, Seoul 151-747, Republic of Korea. Electronic address:

A computational protein-protein docking method that predicts atomic details of protein-protein interactions from protein monomer structures is an invaluable tool for understanding the molecular mechanisms of protein interactions and for designing molecules that control such interactions. Compared to low-resolution docking, high-resolution docking explores the conformational space in atomic resolution to provide predictions with atomic details. This allows for applications to more challenging docking problems that involve conformational changes induced by binding. Recently, high-resolution methods have become more promising as additional information such as global shapes or residue contacts are now available from experiments or sequence/structure data. In this review article, we highlight developments in high-resolution docking made during the last decade, specifically regarding global optimization methods employed by the docking methods. We also discuss two major challenges in high-resolution docking: prediction of backbone flexibility and water-mediated interactions.

Source
DOI: http://dx.doi.org/10.1016/j.sbi.2015.08.001
December 2015

CASP11 refinement experiments with ROSETTA.

Proteins 2016 09 14;84 Suppl 1:314-22. Epub 2015 Aug 14.

Department of Biochemistry, University of Washington, Seattle, Washington, 98195.

We report new Rosetta-based approaches to tackling the major issues that confound protein structure refinement, and the testing of these approaches in the CASP11 experiment. Automated refinement protocols were developed that integrate a range of sampling methods using parallel computation and multiobjective optimization. In CASP11, we used a more aggressive large-scale structure rebuilding approach for poor starting models, and a less aggressive local rebuilding plus core refinement approach for starting models likely to be closer to the native structure. The more incorrectly modeled a structure was predicted to be, the more it was allowed to vary during refinement. The CASP11 experiment revealed strengths and weaknesses of the approaches: the high-resolution strategy incorporating local rebuilding with core refinement consistently improved starting structures, while the low-resolution strategy incorporating the reconstruction of large parts of the structures improved starting models in some cases but often considerably worsened them, largely because of model selection issues. Overall, the results suggest the high-resolution refinement protocol is a promising method orthogonal to other approaches, while the low-resolution refinement method clearly requires further development. Proteins 2016; 84(Suppl 1):314-322. © 2015 Wiley Periodicals, Inc.

Source
DOI: http://dx.doi.org/10.1002/prot.24862
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC4724349
September 2016

The origin of consistent protein structure refinement from structural averaging.

Structure 2015 Jun 7;23(6):1123-8. Epub 2015 May 7.

Department of Biochemistry, University of Washington, Seattle, WA 98195, USA; Institute for Protein Design, University of Washington, Seattle, WA 98195, USA; Howard Hughes Medical Institute, University of Washington, Box 357370, Seattle, WA 98195, USA. Electronic address:

Recent studies have shown that explicit solvent molecular dynamics (MD) simulation followed by structural averaging can consistently improve protein structure models. We find that improvement upon averaging is not limited to explicit water MD simulation, as consistent improvements are also observed for more efficient implicit solvent MD or Monte Carlo minimization simulations. To determine the origin of these improvements, we examine the changes in model accuracy brought about by averaging at the individual residue level. We find that the improvement in model quality from averaging results from the superposition of two effects: a dampening of deviations from the correct structure in the least well modeled regions, and a reinforcement of consistent movements towards the correct structure in better modeled regions. These observations are consistent with an energy landscape model in which the magnitude of the energy gradient toward the native structure decreases with increasing distance from the native state.
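
The averaging step described in this abstract can be illustrated with a small sketch. It assumes all snapshots have already been superimposed onto a common frame, and it uses synthetic snapshots (a made-up "native" chain plus uncorrelated Gaussian noise) rather than MD output. It therefore shows only the dampening of random deviations, not the reinforcement of consistent movements toward the native structure that the authors also report. All names here are hypothetical.

```python
import math
import random

def average_structure(snapshots):
    """Per-atom mean coordinates over snapshots that share one frame of reference."""
    n_snap = len(snapshots)
    n_atoms = len(snapshots[0])
    return [
        tuple(sum(s[i][k] for s in snapshots) / n_snap for k in range(3))
        for i in range(n_atoms)
    ]

def rmsd(a, b):
    """Root-mean-square deviation between two paired coordinate lists (no fitting)."""
    return math.sqrt(sum(math.dist(p, q) ** 2 for p, q in zip(a, b)) / len(a))

# Synthetic data (assumption): a straight 20-atom chain stands in for the native
# structure, and each snapshot adds independent Gaussian noise with sigma = 1 A.
rng = random.Random(0)
native = [(3.8 * i, 0.0, 0.0) for i in range(20)]
snapshots = [
    [tuple(x + rng.gauss(0.0, 1.0) for x in atom) for atom in native]
    for _ in range(50)
]

averaged = average_structure(snapshots)
mean_snapshot_rmsd = sum(rmsd(s, native) for s in snapshots) / len(snapshots)

print("mean RMSD of individual snapshots:", round(mean_snapshot_rmsd, 2))
print("RMSD of averaged structure:", round(rmsd(averaged, native), 2))
```

With a fixed frame of reference, the RMSD of the averaged coordinates can never exceed the mean RMSD of the individual snapshots, so averaging is at worst neutral by this measure.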

Source
DOI: http://dx.doi.org/10.1016/j.str.2015.03.022
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC4456269
June 2015

Protein loop modeling using a new hybrid energy function and its application to modeling in inaccurate structural environments.

PLoS One 2014 24;9(11):e113811. Epub 2014 Nov 24.

Department of Chemistry, Seoul National University, Seoul, Republic of Korea.

Protein loop modeling is a tool for predicting protein local structures of particular interest, providing opportunities for applications involving protein structure prediction and de novo protein design. Until recently, the majority of loop modeling methods have been developed and tested by reconstructing loops in frameworks of experimentally resolved structures. In many practical applications, however, the protein loops to be modeled are located in inaccurate structural environments. These include loops in model structures, low-resolution experimental structures, or experimental structures of different functional forms. Accordingly, discrepancies in the accuracy of the structural environment assumed in development of the method and that in practical applications present additional challenges to modern loop modeling methods. This study demonstrates a new strategy for employing a hybrid energy function combining physics-based and knowledge-based components to help tackle this challenge. The hybrid energy function is designed to combine the strengths of each energy component, simultaneously maintaining accurate loop structure prediction in a high-resolution framework structure and tolerating minor environmental errors in low-resolution structures. A loop modeling method based on global optimization of this new energy function is tested on loop targets situated in different levels of environmental errors, ranging from experimental structures to structures perturbed in backbone as well as side chains and template-based model structures. The new method performs comparably to force field-based approaches in loop reconstruction in crystal structures and better in loop prediction in inaccurate framework structures. This result suggests that higher-accuracy predictions would be possible for a broader range of applications. The web server for this method is available at http://galaxy.seoklab.org/loop with the PS2 option for the scoring function.

Source
PLOS: http://journals.plos.org/plosone/article?id=10.1371/journal.pone.0113811
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC4242723
January 2016

Blind prediction of interfacial water positions in CAPRI.

Proteins 2014 Apr 23;82(4):620-32. Epub 2013 Nov 23.

Interdisciplinary Research Institute USR3078 CNRS, University Lille North of France, Villeneuve d'Ascq, France.

We report the first assessment of blind predictions of water positions at protein-protein interfaces, performed as part of the critical assessment of predicted interactions (CAPRI) community-wide experiment. Groups submitting docking predictions for the complex of the DNase domain of colicin E2 and Im2 immunity protein (CAPRI Target 47) were invited to predict the positions of interfacial water molecules using the method of their choice. The predictions (20 groups submitted a total of 195 models) were assessed by measuring the recall fraction of water-mediated protein contacts. Of the 176 high- or medium-quality docking models (a very good docking performance per se), only 44% had a recall fraction above 0.3, and a mere 6% above 0.5. The actual water positions were in general predicted to an accuracy level no better than 1.5 Å, and even in good models about half of the contacts represented false positives. This notwithstanding, three hotspot interface water positions were quite well predicted, and so was one of the water positions that is believed to stabilize the loop that confers specificity in these complexes. Overall, the best interface water predictions were achieved by groups that also produced high-quality docking models, indicating that accurate modelling of the protein portion is a determinant factor. The use of established molecular mechanics force fields, coupled to sampling and optimization procedures, also seemed to confer an advantage. Insights gained from this analysis should help improve the prediction of protein-water interactions and their role in stabilizing protein complexes.
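
The recall fraction used in this assessment can be sketched as a simple set comparison. The sketch assumes each water-mediated contact is represented as a pair of residue labels, one from each protein; the distance criteria that decide whether a water bridges two residues are not reproduced here, and the residue labels and function names below are made up for illustration.

```python
def contact_recall(predicted, reference):
    """Fraction of reference water-mediated contacts that the model reproduces."""
    if not reference:
        raise ValueError("reference contact set must not be empty")
    return len(predicted & reference) / len(reference)

def false_positive_fraction(predicted, reference):
    """Fraction of predicted contacts that are absent from the reference."""
    if not predicted:
        return 0.0
    return len(predicted - reference) / len(predicted)

# Hypothetical contacts: (residue in protein A, residue in protein B).
reference = {("A:10", "B:5"), ("A:12", "B:7"), ("A:20", "B:9"), ("A:31", "B:14")}
predicted = {("A:10", "B:5"), ("A:12", "B:7"), ("A:15", "B:2"), ("A:40", "B:3")}

print(contact_recall(predicted, reference))           # 0.5
print(false_positive_fraction(predicted, reference))  # 0.5
```

In this toy case the model recovers half of the reference contacts, and half of its predicted contacts are false positives, which is the kind of outcome the abstract describes for even the good models.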

Source
DOI: http://dx.doi.org/10.1002/prot.24439
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC4582081
April 2014

Community-wide evaluation of methods for predicting the effect of mutations on protein-protein interactions.

Proteins 2013 Nov 23;81(11):1980-7. Epub 2013 Aug 23.

Department of Biochemistry, University of Washington, Seattle, Washington, 98195.

Community-wide blind prediction experiments such as CAPRI and CASP provide an objective measure of the current state of predictive methodology. Here we describe a community-wide assessment of methods to predict the effects of mutations on protein-protein interactions. Twenty-two groups predicted the effects of comprehensive saturation mutagenesis for two designed influenza hemagglutinin binders and the results were compared with experimental yeast display enrichment data obtained using deep sequencing. The most successful methods explicitly considered the effects of mutation on monomer stability in addition to binding affinity, carried out explicit side-chain sampling and backbone relaxation, evaluated packing, electrostatic, and solvation effects, and correctly identified around a third of the beneficial mutations. Much room for improvement remains for even the best techniques, and large-scale fitness landscapes should continue to provide an excellent test bed for continued evaluation of both existing and new prediction methodologies.

Source
DOI: http://dx.doi.org/10.1002/prot.24356
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC4143140
November 2013

GalaxyRefine: Protein structure refinement driven by side-chain repacking.

Nucleic Acids Res 2013 Jul 3;41(Web Server issue):W384-8. Epub 2013 Jun 3.

Department of Chemistry, Seoul National University, Seoul 151-747, Korea.

The quality of model structures generated by contemporary protein structure prediction methods strongly depends on the degree of similarity between the target and available template structures. Therefore, the importance of improving template-based model structures beyond the accuracy available from template information has been emphasized in the structure prediction community. The GalaxyRefine web server, freely available at http://galaxy.seoklab.org/refine, is based on a refinement method that has been successfully tested in CASP10. The method first rebuilds side chains and performs side-chain repacking and subsequent overall structure relaxation by molecular dynamics simulation. According to the CASP10 assessment, this method showed the best performance in improving the local structure quality. The method can improve both global and local structure quality on average, when used for refining the models generated by state-of-the-art protein structure prediction servers.

Source
DOI: http://dx.doi.org/10.1093/nar/gkt458
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3692086
July 2013

GalaxyGemini: a web server for protein homo-oligomer structure prediction based on similarity.

Bioinformatics 2013 Apr 14;29(8):1078-80. Epub 2013 Feb 14.

Department of Chemistry, Seoul National University, Seoul, Republic of Korea.

Summary: A large number of proteins function as homo-oligomers; therefore, predicting homo-oligomeric structure of proteins is of primary importance for understanding protein function at the molecular level. Here, we introduce a web server for prediction of protein homo-oligomer structure. The server takes a protein monomer structure as input and predicts its homo-oligomer structure from oligomer templates selected based on sequence and tertiary/quaternary structure similarity. Using protein model structures as input, the server shows clear improvement over the best methods of CASP9 in predicting oligomeric structures from amino acid sequences.

Availability: http://galaxy.seoklab.org/gemini.

Supplementary Information: Supplementary data are available at Bioinformatics online.

Source
DOI: http://dx.doi.org/10.1093/bioinformatics/btt079
April 2013

GalaxyTBM: template-based modeling by building a reliable core and refining unreliable local regions.

BMC Bioinformatics 2012 Aug 10;13:198. Epub 2012 Aug 10.

Department of Chemistry, Seoul National University, Seoul, 151-747, Republic of Korea.

Background: Protein structures can be reliably predicted by template-based modeling (TBM) when experimental structures of homologous proteins are available. However, it is challenging to obtain structures more accurate than the single best templates by either combining information from multiple templates or by modeling regions that vary among templates or are not covered by any templates.

Results: We introduce GalaxyTBM, a new TBM method in which the more reliable core region is modeled first from multiple templates and less reliable, variable local regions, such as loops or termini, are then detected and re-modeled by an ab initio method. This TBM method is based on "Seok-server," which was tested in CASP9 and assessed to be amongst the top TBM servers. The accuracy of the initial core modeling is enhanced by focusing on more conserved regions in the multiple-template selection and multiple sequence alignment stages. Additional improvement is achieved by ab initio modeling of up to 3 unreliable local regions in the fixed framework of the core structure. Overall, GalaxyTBM reproduced the performance of Seok-server, with GalaxyTBM and Seok-server resulting in average GDT-TS of 68.1 and 68.4, respectively, when tested on 68 single-domain CASP9 TBM targets. For application to multi-domain proteins, GalaxyTBM must be combined with domain-splitting methods.

Conclusion: Application of GalaxyTBM to CASP9 targets demonstrates that accurate protein structure prediction is possible by use of a multiple-template-based approach, and ab initio modeling of variable regions can further enhance the model quality.

Source
DOI: http://dx.doi.org/10.1186/1471-2105-13-198
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3462707
August 2012
-->