Publications by authors named "Roman A Laskowski"

63 Publications

AlphaFold heralds a data-driven revolution in biology and medicine.

Nat Med 2021 Oct;27(10):1666-1669

European Bioinformatics Institute - European Molecular Biology Laboratory EMBL-EBI, South Building, Wellcome Genome Campus, Hinxton, UK.

Source
http://dx.doi.org/10.1038/s41591-021-01533-0 (DOI)
October 2021

A computational and structural analysis of germline and somatic variants affecting the DDR mechanism, and their impact on human diseases.

Sci Rep 2021 07 12;11(1):14268. Epub 2021 Jul 12.

European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Cambridge, UK.

DNA-Damage Response (DDR) proteins are crucial for maintaining the integrity of the genome by identifying and repairing errors in DNA. Variants affecting their function can have severe consequences since failure to repair damaged DNA can result in cells turning cancerous. Here, we compare germline and somatic variants in DDR genes, specifically looking at their locations in the corresponding three-dimensional (3D) structures, Pfam domains, and protein-protein interaction interfaces. We show that somatic variants in metastatic cases are more likely to be found in Pfam domains and protein interaction interfaces than are pathogenic germline variants or variants of unknown significance (VUS). We also show that there are hotspots in the structures of ATM and BRCA2 proteins where pathogenic germline, and recurrent somatic variants from primary and metastatic tumours, cluster together in 3D. Moreover, in the ATM, BRCA1 and BRCA2 genes from prostate cancer patients, the distributions of germline benign, pathogenic, VUS, and recurrent somatic variants differ across Pfam domains. Together, these results provide a better characterisation of the most recurrent affected regions in DDRs and could help in the understanding of individual susceptibility to tumour development.

Source
http://dx.doi.org/10.1038/s41598-021-93715-6 (DOI)
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC8275599 (PMC)
July 2021

Impact of Structural Observables From Simulations to Predict the Effect of Single-Point Mutations in MHC Class II Peptide Binders.

Front Mol Biosci 2021 30;8:636562. Epub 2021 Mar 30.

Biophysics of Tropical Diseases, Max Planck Tandem Group, University of Antioquia UdeA, Medellin, Colombia.

The prediction of peptide binders to Major Histocompatibility Complex (MHC) class II receptors is of great interest to study autoimmune diseases and for vaccine development. Most approaches predict the affinities using sequence-based models trained on experimental data and multiple alignments from known peptide substrates. However, detecting activity differences caused by single-point mutations is a challenging task. In this work, we used interactions calculated from simulations to build scoring matrices for quickly estimating binding differences caused by single-point mutations. We modelled a set of 837 peptides bound to an MHC class II allele, and optimized the sampling of the conformations using the Rosetta backrub method by comparing the results to molecular dynamics simulations. From the dynamic trajectories of each complex, we averaged and compared structural observables for each amino acid at each position of the 9-mer peptide core region. With this information, we generated the scoring matrices to predict the sign of the binding differences. We then compared the performance of the best scoring matrix to different computational methodologies that range in computational costs. Overall, the prediction of the activity differences caused by single mutated peptides was lower than 60% for all the methods. However, the developed scoring matrix in combination with existing methods reports an increase in the performance, up to 86% with a scoring method that uses molecular dynamics.
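
To make the idea of predicting the sign of a binding difference from a scoring matrix concrete, here is a minimal sketch. It is not the authors' implementation: the function name, the matrix layout (position, then amino acid), the toy scores, and the convention that a higher score means stronger binding are all assumptions introduced for illustration.

```python
def predict_change_sign(matrix, position, wild_type, mutant):
    """Return +1, -1 or 0 for the predicted direction of a binding change.

    `matrix` is a hypothetical position-specific scoring matrix:
    matrix[position][amino_acid] -> score. We assume (not stated in the
    abstract) that a higher score means stronger binding.
    """
    delta = matrix[position][mutant] - matrix[position][wild_type]
    return (delta > 0) - (delta < 0)


# Made-up scores for two positions of a peptide core; not real data.
toy_matrix = {
    1: {"A": 0.2, "L": 0.9},
    4: {"A": 0.5, "D": 0.1},
}

print(predict_change_sign(toy_matrix, 1, "A", "L"))  # A -> L at position 1
print(predict_change_sign(toy_matrix, 4, "A", "D"))  # A -> D at position 4
```

Only the sign is returned, which mirrors the abstract's stated goal of predicting the direction rather than the magnitude of the change.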

Source
http://dx.doi.org/10.3389/fmolb.2021.636562 (DOI)
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC8253603 (PMC)
March 2021

An automated protocol for modelling peptide substrates to proteases.

BMC Bioinformatics 2020 Dec 29;21(1):586. Epub 2020 Dec 29.

European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SD, UK.

Background: Proteases are key drivers in many biological processes, in part due to their specificity towards their substrates. However, depending on the family and molecular function, they can also display substrate promiscuity, which can also be essential. Databases compiling specificity matrices derived from experimental assays have provided valuable insights into protease substrate recognition. Despite this, there are still gaps in our knowledge of the structural determinants. Here, we compile a set of protease crystal structures with bound peptide-like ligands to create a protocol for modelling substrates bound to protease structures, and for studying observables associated with the binding recognition.

Results: As an application, we modelled a subset of protease-peptide complexes for which experimental cleavage data are available to compare with informational entropies obtained from protease-specificity matrices. The modelled complexes were subjected to conformational sampling using the Backrub method in Rosetta, and multiple observables from the simulations were calculated and compared per peptide position. We found that some of the calculated structural observables, such as the relative accessible surface area and the interaction energy, can help characterize a protease's substrate recognition, giving insights for the potential prediction of novel substrates by combining additional approaches.

Conclusion: Overall, our approach provides a repository of protease structures with annotated data, and an open source computational protocol to reproduce the modelling and dynamic analysis of the protease-peptide complexes.
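
The informational entropies mentioned in the Results can be illustrated with a short sketch of Shannon entropy for a single substrate position. This is a generic calculation, not the paper's protocol: the function name, the input layout (amino acid mapped to an observed count), and the toy counts are assumptions made for this example.

```python
import math


def positional_entropy(counts):
    """Shannon entropy (in bits) of one column of a specificity matrix.

    `counts` maps amino acid -> observed count at a single substrate
    position. This layout is a hypothetical choice for the sketch.
    """
    total = sum(counts.values())
    if total == 0:
        raise ValueError("position has no observations")
    entropy = 0.0
    for n in counts.values():
        if n > 0:
            p = n / total
            entropy -= p * math.log2(p)
    return entropy


# Made-up counts, not taken from any real protease.
strict_position = {"R": 100}                           # one residue only
loose_position = {"A": 25, "G": 25, "L": 25, "S": 25}  # four equally likely

print(positional_entropy(strict_position))
print(positional_entropy(loose_position))
```

A position that accepts a single residue has zero entropy, while a position that accepts several residues equally often has higher entropy, which is one way to express substrate promiscuity numerically.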

Source
http://dx.doi.org/10.1186/s12859-020-03931-6 (DOI)
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC7771086 (PMC)
December 2020

VarSite: Disease variants and protein structure.

Protein Sci 2020 01 27;29(1):111-119. Epub 2019 Oct 27.

European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Cambridge, UK.

VarSite is a web server mapping known disease-associated variants from UniProt and ClinVar, together with natural variants from gnomAD, onto protein 3D structures in the Protein Data Bank. The analyses are primarily image-based and provide both an overview for each human protein, as well as a report for any specific variant of interest. The information can be useful in assessing whether a given variant might be pathogenic or benign. The structural annotations for each position in the protein include protein secondary structure, interactions with ligand, metal, DNA/RNA, or other protein, and various measures of a given variant's possible impact on the protein's function. The 3D locations of the disease-associated variants can be viewed interactively via the 3dmol.js JavaScript viewer, as well as in RasMol and PyMOL. Users can search for specific variants, or sets of variants, by providing the DNA coordinates of the base change(s) of interest. Additionally, various agglomerative analyses are given, such as the mapping of disease and natural variants onto specific Pfam or CATH domains. The server is freely accessible to all at: https://www.ebi.ac.uk/thornton-srv/databases/VarSite.

Source
http://dx.doi.org/10.1002/pro.3746 (DOI)
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6933866 (PMC)
January 2020

VarMap: a web tool for mapping genomic coordinates to protein sequence and structure and retrieving protein structural annotations.

Bioinformatics 2019 11;35(22):4854-4856

European Molecular Biology Laboratory-European Bioinformatics Institute (EMBL-EBI), Hinxton, CB10 1SD, UK.

Motivation: Understanding the protein structural context and patterning on proteins of genomic variants can help to separate benign from pathogenic variants and reveal molecular consequences. However, mapping genomic coordinates to protein structures is non-trivial, complicated by alternative splicing and transcript evidence.

Results: Here we present VarMap, a web tool for mapping a list of chromosome coordinates to canonical UniProt sequences and associated protein 3D structures, including validation checks, and annotating them with structural information.

Availability And Implementation: https://www.ebi.ac.uk/thornton-srv/databases/VarMap.

Supplementary Information: Supplementary data are available at Bioinformatics online.

Source
http://dx.doi.org/10.1093/bioinformatics/btz482 (DOI)
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6853667 (PMC)
November 2019

Protein structure and phenotypic analysis of pathogenic and population missense variants in STXBP1.

Mol Genet Genomic Med 2017 Sep 20;5(5):495-507. Epub 2017 Jun 20.

Wellcome Trust Sanger Institute, Wellcome Genome Campus, Hinxton, Cambridge, CB1 8RQ, UK.

Background: Syntaxin-binding protein 1, encoded by STXBP1, is highly expressed in the brain and involved in fusing synaptic vesicles with the plasma membrane. Studies have shown that pathogenic loss-of-function variants in this gene result in various types of epilepsies, mostly beginning early in life. We were interested to model pathogenic missense variants on the protein structure to investigate the mechanism of pathogenicity and genotype-phenotype correlations.

Methods: We report 11 patients with pathogenic de novo mutations in STXBP1 identified in the first 4293 trios of the Deciphering Developmental Disorder (DDD) study, including six missense variants. We analyzed the structural locations of the pathogenic missense variants from this study and the literature, as well as population missense variants extracted from Exome Aggregation Consortium (ExAC).

Results: Pathogenic variants are significantly more likely to occur at highly conserved locations than population variants, and be buried inside the protein domain. Pathogenic mutations are also more likely to destabilize the domain structure compared with population variants, increasing the proportion of (partially) unfolded domains that are prone to aggregation or degradation. We were unable to detect any genotype-phenotype correlation, but unlike previously reported cases, most of the DDD patients with pathogenic variants did not present with very early-onset or severe epilepsy and encephalopathy, though all have developmental delay with intellectual disability and most display behavioral problems and suffered seizures in later childhood.

Conclusion: Variants across STXBP1 that cause loss of function can result in severe intellectual disability with or without seizures, consistent with a haploinsufficiency mechanism. Pathogenic missense mutations act through destabilization of the protein domain, making it prone to aggregation or degradation. The presence or absence of early seizures may reflect ascertainment bias in the literature as well as the broad recruitment strategy of the DDD study.

Source
http://dx.doi.org/10.1002/mgg3.304 (DOI)
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC5606886 (PMC)
September 2017

PDBsum: Structural summaries of PDB entries.

Protein Sci 2018 01 27;27(1):129-134. Epub 2017 Oct 27.

European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SD, United Kingdom.

PDBsum is a web server providing structural information on the entries in the Protein Data Bank (PDB). The analyses are primarily image-based and include protein secondary structure, protein-ligand and protein-DNA interactions, PROCHECK analyses of structural quality, and many others. The 3D structures can be viewed interactively in RasMol, PyMOL, and a JavaScript viewer called 3Dmol.js. Users can upload their own PDB files and obtain a set of password-protected PDBsum analyses for each. The server is freely accessible to all at: http://www.ebi.ac.uk/pdbsum.

Source
http://dx.doi.org/10.1002/pro.3289 (DOI)
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC5734310 (PMC)
January 2018

The ProFunc Function Prediction Server.

Methods Mol Biol 2017 ;1611:75-95

European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, CB10 1SD, UK.

The ProFunc web server is a tool for helping identify the function of a given protein whose 3D coordinates have been experimentally determined or homology modeled. It uses a cocktail of both sequence- and structure-based methods to identify matches to other proteins that may, in turn, suggest the query protein's most likely function. The server was originally developed to aid the worldwide structural genomics effort at the start of the millennium. It accepts a file containing the protein's 3D coordinates in PDB format, and, when processing is complete, sends an email containing a link to the password-protected result pages. The results include an at-a-glance summary, as well as separate pages containing more detailed analyses. The server can be found at: http://www.ebi.ac.uk/thornton-srv/databases/profunc .

Source
http://dx.doi.org/10.1007/978-1-4939-7015-5_7 (DOI)
February 2018

Structural analysis of pathogenic mutations in the DYRK1A gene in patients with developmental disorders.

Hum Mol Genet 2017 02;26(3):519-526

Wellcome Trust Sanger Institute, Wellcome Genome Campus, Hinxton, Cambridge, UK.

Haploinsufficiency in DYRK1A is associated with a recognizable developmental syndrome, though the mechanism of action of pathogenic missense mutations is currently unclear. Here we present 19 de novo mutations in this gene, including five missense mutations, identified by the Deciphering Developmental Disorder study. Protein structural analysis reveals that the missense mutations are either close to the ATP or peptide binding-sites within the kinase domain, or are important for protein stability, suggesting they lead to a loss of the protein's function mechanism. Furthermore, there is some correlation between the magnitude of the change and the severity of the resultant phenotype. A comparison of the distribution of the pathogenic mutations along the length of DYRK1A with that of natural variants, as found in the ExAC database, confirms that mutations in the N-terminal end of the kinase domain are more disruptive of protein function. In particular, pathogenic mutations occur in significantly closer proximity to the ATP and the substrate peptide than the natural variants. Overall, we suggest that de novo dominant mutations in DYRK1A account for nearly 0.5% of severe developmental disorders due to substantially reduced kinase function.

Source
http://dx.doi.org/10.1093/hmg/ddw409 (DOI)
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC5409128 (PMC)
February 2017

Chopping and Changing: the Evolution of the Flavin-dependent Monooxygenases.

J Mol Biol 2016 07 14;428(15):3131-46. Epub 2016 Jul 14.

EMBL-EBI, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK.

Flavin-dependent monooxygenases play a variety of key physiological roles and are also very powerful biotechnological tools. These enzymes have been classified into eight different classes (A-H) based on their sequences and biochemical features. By combining structural and sequence analysis, and phylogenetic inference, we have explored the evolutionary history of classes A, B, E, F, and G and demonstrate that their multidomain architectures reflect their phylogenetic relationships, suggesting that the main evolutionary steps in their divergence are likely to have arisen from the recruitment of different domains. Additionally, the functional divergence within each class appears to have been the result of other mechanisms such as a complex set of single-point mutations. Our results reinforce the idea that a main constraint on the evolution of cofactor-dependent enzymes is the functional binding of the cofactor. Additionally, a remarkable feature of this family is that the sequence of the key flavin adenine dinucleotide-binding domain is split into at least two parts in all classes studied here. We propose a complex set of evolutionary events that gave rise to the origin of the different classes within this family.

Source
http://dx.doi.org/10.1016/j.jmb.2016.07.003 (DOI)
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC4981433 (PMC)
July 2016

Sequence-Specific Recognition of DNA by Proteins: Binding Motifs Discovered Using a Novel Statistical/Computational Analysis.

PLoS One 2016 6;11(7):e0158704. Epub 2016 Jul 6.

Institute of Organic Chemistry and Biochemistry, Prague 6, Czech Republic.

Decades of intensive experimental studies of the recognition of DNA sequences by proteins have provided us with a view of a diverse and complicated world in which few to no features are shared between individual DNA-binding protein families. The originally conceived direct readout of DNA residue sequences by amino acid side chains offers very limited capacity for sequence recognition, while the effects of the dynamic properties of the interacting partners remain difficult to quantify and almost impossible to generalise. In this work we investigated the energetic characteristics of all DNA residue-amino acid side chain combinations in the conformations found at the interaction interface in a very large set of protein-DNA complexes by the means of empirical potential-based calculations. General specificity-defining criteria were derived and utilised to look beyond the binding motifs considered in previous studies. Linking energetic favourability to the observed geometrical preferences, our approach reveals several additional amino acid motifs which can distinguish between individual DNA bases. Our results remained valid in environments with various dielectric properties.

Source
http://journals.plos.org/plosone/article?id=10.1371/journal.pone.0158704 (PLOS)
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC4934765 (PMC)
July 2017

Rising levels of atmospheric oxygen and evolution of Nrf2.

Sci Rep 2016 06 14;6:27740. Epub 2016 Jun 14.

Department of Chemistry, King's College London, United Kingdom.

In mammals, the master transcription regulator of antioxidant defences is provided by the Nrf2 protein. Phylogenetic analyses of Nrf2 sequences are used here to derive a molecular clock that manifests persuasive evidence that Nrf2 orthologues emerged, and then diverged, at two time points that correlate with well-established geochemical and palaeobiological chronologies during progression of the 'Great Oxygenation Event'. We demonstrate that orthologues of Nrf2 first appeared in fungi around 1.5 Ga during the Paleoproterozoic when photosynthetic oxygen was being absorbed into the oceans. A subsequent significant divergence in Nrf2 is seen during the split between fungi and the Metazoa approximately 1.0-1.2 Ga, at a time when oceanic ventilation released free oxygen to the atmosphere, but with most being absorbed by methane oxidation and oxidative weathering of land surfaces until approximately 800 Ma. Atmospheric oxygen levels thereafter accumulated giving rise to metazoan success known as the Cambrian explosion commencing at ~541 Ma. Atmospheric O2 levels then rose in the mid Paleozoic (359-252 Ma), and Nrf2 diverged once again at the division between mammals and non-mammalian vertebrates during the Permian-Triassic boundary (~252 Ma). Understanding Nrf2 evolution as an effective antioxidant response may have repercussions for improved human health.

Source
http://dx.doi.org/10.1038/srep27740 (DOI)
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC4906274 (PMC)
June 2016

BetaSCPWeb: side-chain prediction for protein structures using Voronoi diagrams and geometry prioritization.

Nucleic Acids Res 2016 07 5;44(W1):W416-23. Epub 2016 May 5.

School of Mechanical Engineering, Hanyang University, Korea.

Many applications, such as protein design, homology modeling, and flexible docking, require the prediction of a protein's optimal side-chain conformations from just its amino acid sequence and backbone structure. Side-chain prediction (SCP) is an NP-hard energy minimization problem. Here, we present BetaSCPWeb, which efficiently computes a conformation close to optimal using a geometry-prioritization method based on the Voronoi diagram of spherical atoms. Its outputs are provided in visual, textual and PDB file formats. The web server is free and open to all users at http://voronoi.hanyang.ac.kr/betascpweb with no login requirement.

Source
http://dx.doi.org/10.1093/nar/gkw368 (DOI)
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC4987919 (PMC)
July 2016

Protein Structure Databases.

Methods Mol Biol 2016 ;1415:31-53

European Bioinformatics Institute, European Molecular Biology Laboratory, Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SD, UK.

Web-based protein structure databases come in a wide variety of types and levels of information content. Those having the most general interest are the various atlases that describe each experimentally determined protein structure and provide useful links, analyses, and schematic diagrams relating to its 3D structure and biological function. Also of great interest are the databases that classify 3D structures by their folds as these can reveal evolutionary relationships which may be hard to detect from sequence comparison alone. Related to these are the numerous servers that compare folds, particularly useful for newly solved structures, and especially those of unknown function. Beyond these are a vast number of databases for the more specialized user, dealing with specific families, diseases, structural features, and so on.

Source
http://dx.doi.org/10.1007/978-1-4939-3572-7_2 (DOI)
December 2017

Large-Scale Quantitative Assessment of Binding Preferences in Protein-Nucleic Acid Complexes.

J Chem Theory Comput 2015 Apr;11(4):1939-48

The growing number of high-quality experimental (X-ray, NMR) structures of protein–DNA complexes now provides sufficient information to assess whether universal rules governing the DNA sequence recognition process apply. While previous studies have investigated the relative abundance of various modes of amino acid–base contacts (van der Waals contacts, hydrogen bonds), relatively little is known about the energetics of these noncovalent interactions. In the present study, we have performed the first large-scale quantitative assessment of binding preferences in protein–DNA complexes by calculating the interaction energies in all 80 possible amino acid–DNA base combinations. We found that several mutual amino acid–base orientations featuring bidentate hydrogen bonds capable of unambiguous one-to-one recognition correspond to unique minima in the potential energy space of the amino acid–base pairs. A clustering algorithm revealed that these contacts form a spatially well-defined group offering relatively little conformational freedom. Various molecular mechanics force field and DFT-D ab initio calculations were performed, yielding similar results.
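
The figure of 80 combinations is consistent with pairing the 20 standard amino acids with the 4 DNA bases. The abstract states only the count, so that reading is an assumption here, and the short sketch below simply enumerates the pairs under it. The variable names are illustrative.

```python
from itertools import product

# The 20 standard amino acids (three-letter codes) and the 4 DNA bases.
AMINO_ACIDS = [
    "ALA", "ARG", "ASN", "ASP", "CYS", "GLN", "GLU", "GLY", "HIS", "ILE",
    "LEU", "LYS", "MET", "PHE", "PRO", "SER", "THR", "TRP", "TYR", "VAL",
]
DNA_BASES = ["A", "C", "G", "T"]

# Every (amino acid, base) pair, assuming 80 = 20 x 4 as described above.
pairs = list(product(AMINO_ACIDS, DNA_BASES))

print(len(pairs))
```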

Source
http://dx.doi.org/10.1021/ct501168n (DOI)
April 2015

Integrating population variation and protein structural analysis to improve clinical interpretation of missense variation: application to the WD40 domain.

Hum Mol Genet 2016 Mar 5;25(5):927-35. Epub 2016 Jan 5.

Wellcome Trust Sanger Institute, Wellcome Genome Campus, Hinxton, Cambridge, UK.

We present a generic, multidisciplinary approach for improving our understanding of novel missense variants in recently discovered disease genes exhibiting genetic heterogeneity, by combining clinical and population genetics with protein structural analysis. Using six new de novo missense diagnoses in TBL1XR1 from the Deciphering Developmental Disorders study, together with population variation data, we show that the β-propeller structure of the ubiquitous WD40 domain provides a convincing way to discriminate between pathogenic and benign variation. Children with likely pathogenic mutations in this gene have severely delayed language development, often accompanied by intellectual disability, autism, dysmorphology and gastrointestinal problems. Amino acids affected by likely pathogenic missense mutations are either crucial for the stability of the fold, forming part of a highly conserved symmetrically repeating hydrogen-bonded tetrad, or located at the top face of the β-propeller, where 'hotspot' residues affect the binding of β-catenin to the TBLR1 protein. In contrast, those altered by population variation are significantly less likely to be spatially clustered towards the top face or to be at buried or highly conserved residues. This result is useful not only for interpreting benign and pathogenic missense variants in this gene, but also in other WD40 domains, many of which are associated with disease.

Source
http://dx.doi.org/10.1093/hmg/ddv625 (DOI)
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC4754046 (PMC)
March 2016

Proteins: interaction at a distance.

IUCrJ 2015 Nov 30;2(Pt 6):609-10. Epub 2015 Oct 30.

European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK.

How do the surface side chains of a protein behave when it binds to another protein? Do they optimize interactions by crumpling inwards or by extending outwards?

Source
http://dx.doi.org/10.1107/S2052252515020217 (DOI)
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC4645104 (PMC)
November 2015

Representative Amino Acid Side-Chain Interactions in Protein-DNA Complexes: A Comparison of Highly Accurate Correlated Ab Initio Quantum Mechanical Calculations and Efficient Approaches for Applications to Large Systems.

J Chem Theory Comput 2015 Sep 6;11(9):4086-92. Epub 2015 Aug 6.

Institute of Organic Chemistry and Biochemistry, Academy of Sciences of the Czech Republic, 166 10 Prague, Czech Republic.

Representative pairs of amino acid side chains and nucleic acid bases extracted from available high-quality structures of protein-DNA complexes were analyzed using a range of computational methods. CCSD(T)/CBS interaction energies were calculated for the chosen 272 pairs. These reference interaction energies were used to test the MP2.5/CBS, MP2.X/CBS, MP2-F12, DFT-D3, PM6, and Amber force field methods. The MP2.5 method provided excellent agreement with the reference data (root-mean-square error (RMSE) of 0.11 kcal/mol) while being more than 1 order of magnitude faster than the CCSD(T) method. When MP2-F12 and MP2.5 were combined, the results were within reasonable accuracy (0.20 kcal/mol), with computational savings of almost 2 orders of magnitude. Therefore, this method is a promising tool for accurate calculations of interaction energies in protein-DNA motifs of up to ∼100 atoms, for which CCSD(T)/CBS benchmark calculations are not feasible. B3-LYP-D3 calculated with def2-TZVPP and def2-QZVP basis sets yielded sufficiently good results with a reasonably small RMSE. This method provided better results for neutral systems, whereas positively charged species exhibited the worst agreement with the benchmark data. The Amber force field yielded unbalanced results, performing well for systems containing nonpolar amino acids but severely underestimating interaction energies for charged complexes. The semiempirical PM6 method with corrections for hydrogen bonding and dispersion energy (PM6-D3H4) exhibited considerably smaller error than the Amber force field, which makes it an effective tool for modeling extended protein-ligand complexes (of up to 10,000 atoms).
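
The root-mean-square error used above to compare each method against the reference energies is a standard quantity. A minimal sketch of how it could be computed follows. The function name and the three energy values are made up for illustration and are not data from the paper.

```python
import math


def rmse(reference, predicted):
    """Root-mean-square error between two equal-length lists of energies."""
    if len(reference) != len(predicted):
        raise ValueError("lists must have the same length")
    if not reference:
        raise ValueError("lists must not be empty")
    squared = [(p - r) ** 2 for r, p in zip(reference, predicted)]
    return math.sqrt(sum(squared) / len(squared))


# Made-up interaction energies in kcal/mol; not values from the study.
reference_energies = [-5.0, -3.0, -1.0]   # stand-in for a benchmark method
cheaper_method = [-5.1, -2.9, -1.1]       # stand-in for a faster method

print(rmse(reference_energies, cheaper_method))
```

Because every toy prediction is off by 0.1 kcal/mol, the RMSE comes out at about 0.1 kcal/mol, which shows how a single number summarizes the typical deviation across all pairs.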

Source
http://dx.doi.org/10.1021/acs.jctc.5b00398 (DOI)
September 2015

BetaCavityWeb: a webserver for molecular voids and channels.

Nucleic Acids Res 2015 Jul 22;43(W1):W413-8. Epub 2015 Apr 22.

Voronoi Diagram Research Center, Hanyang University, Korea; School of Mechanical Engineering, Hanyang University, Korea.

Molecular cavities, which include voids and channels, are critical for molecular function. We present a webserver, BetaCavityWeb, which computes these cavities for a given molecular structure and a given spherical probe, and reports their geometrical properties: volume, boundary area, buried area, etc. The server's algorithms are based on the Voronoi diagram of atoms and its derivative construct: the beta-complex. The correctness of the computed result and computational efficiency are both mathematically guaranteed. BetaCavityWeb is freely accessible at the Voronoi Diagram Research Center (VDRC) (http://voronoi.hanyang.ac.kr/betacavityweb).

Source
http://dx.doi.org/10.1093/nar/gkv360 (DOI)
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC4489219 (PMC)
July 2015

Anatomy of enzyme channels.

BMC Bioinformatics 2014 Nov 18;15:379. Epub 2014 Nov 18.

Department of Physical Chemistry, Regional Centre of Advanced Technologies and Materials, Faculty of Science, Palacký University Olomouc, tř. 17. listopadu 12, Olomouc, 771 46, Czech Republic.

Background: Enzyme active sites can be connected to the exterior environment by one or more channels passing through the protein. Despite our current knowledge of enzyme structure and function, surprisingly little is known about how often channels are present or about any structural features such channels may have in common.

Results: Here, we analyze the long channels (i.e. >15 Å) leading to the active sites of 4,306 enzyme structures. We find that over 64% of enzymes contain two or more long channels, their typical length being 28 Å. We show that the amino acid composition of the channels differs significantly from the compositions of the active site, the surface and the interior of the protein.

Conclusions: The majority of enzymes have buried active sites accessible via a network of access channels. This indicates that enzymes tend to have buried active sites, with channels controlling access to, and egress from, them, and that suggests channels may play a key role in helping determine enzyme substrate.

Source
http://dx.doi.org/10.1186/s12859-014-0379-x (DOI)
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC4245731 (PMC)
November 2014

CATH: comprehensive structural and functional annotations for genome sequences.

Nucleic Acids Res 2015 Jan 27;43(Database issue):D376-81. Epub 2014 Oct 27.

Institute of Structural and Molecular Biology, UCL, 636 Darwin Building, Gower Street, WC1E 6BT, UK.

The latest version of the CATH-Gene3D protein structure classification database (4.0, http://www.cathdb.info) provides annotations for over 235,000 protein domain structures and includes 25 million domain predictions. This article provides an update on the major developments in the 2 years since the last publication in this journal including: significant improvements to the predictive power of our functional families (FunFams); the release of our 'current' putative domain assignments (CATH-B); a new, strictly non-redundant data set of CATH domains suitable for homology benchmarking experiments (CATH-40) and a number of improvements to the web pages.

Source
http://dx.doi.org/10.1093/nar/gku947
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC4384018
January 2015

BetaVoid: molecular voids via beta-complexes and Voronoi diagrams.

Proteins 2014 Sep 20;82(9):1829-49. Epub 2014 Mar 20.

Voronoi Diagram Research Center, Hanyang University, Korea.

Molecular external structure is important for molecular function, with voids on the surface and interior being one of the most important features. Hence, recognition of molecular voids and accurate computation of their geometrical properties, such as volume, area and topology, are crucial, yet most popular algorithms are based on the crude use of sampling points and thus are approximations even with a significant amount of computation. In this article, we propose an analytic approach to the problem using the Voronoi diagram of atoms and the beta-complex. The correctness and efficiency of the proposed algorithm are mathematically proved and experimentally verified. The benchmark test clearly shows the superiority of BetaVoid over two popular programs: VOIDOO and CASTp. The proposed algorithm is implemented in the BetaVoid program, which is freely available at the Voronoi Diagram Research Center (http://voronoi.hanyang.ac.kr).

Source
http://dx.doi.org/10.1002/prot.24537
September 2014

Amino acid changes in disease-associated variants differ radically from variants observed in the 1000 genomes project dataset.

PLoS Comput Biol 2013 12;9(12):e1003382. Epub 2013 Dec 12.

European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Cambridge, Cambridgeshire, United Kingdom.

The 1000 Genomes Project data provides a natural background dataset for amino acid germline mutations in humans. Since the direction of mutation is known, the amino acid exchange matrix generated from the observed nucleotide variants is asymmetric and the mutabilities of the different amino acids are very different. These differences predominantly reflect preferences for nucleotide mutations in the DNA (especially the high mutation rate of the CpG dinucleotide, which makes arginine mutability very much higher than other amino acids) rather than selection imposed by protein structure constraints, although there is evidence for the latter as well. The variants occur predominantly on the surface of proteins (82%), with a slight preference for sites which are more exposed and less well conserved than random. Mutations to functional residues occur about half as often as expected by chance. The disease-associated amino acid variant distributions in OMIM are radically different from those expected on the basis of the 1000 Genomes dataset. The disease-associated variants preferentially occur in more conserved sites, compared to 1000 Genomes mutations. Many of the amino acid exchange profiles appear to exhibit an anti-correlation, with common exchanges in one dataset being rare in the other. Disease-associated variants exhibit more extreme differences in amino acid size and hydrophobicity. More modelling of the mutational processes at the nucleotide level is needed, but these observations should contribute to an improved prediction of the effects of specific variants in humans.
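
The idea of a directed (and therefore asymmetric) exchange matrix can be illustrated with a minimal sketch. The variant pairs below are invented for illustration and are not taken from the 1000 Genomes or OMIM data; the function names are hypothetical, and the "mutability" here is a raw count of changes away from each residue, which is an assumption rather than the normalisation used in the paper.

```python
from collections import Counter

# Hypothetical directed variants: (reference amino acid, variant amino acid).
# These pairs are made up for illustration only.
variants = [("R", "H"), ("R", "C"), ("R", "H"), ("H", "R"), ("A", "V"), ("R", "Q")]


def exchange_matrix(pairs):
    """Count directed exchanges, keeping (ref, alt) separate from (alt, ref)."""
    return Counter(pairs)


def mutability(counts):
    """Raw number of observed changes away from each reference amino acid."""
    totals = Counter()
    for (ref, _alt), n in counts.items():
        totals[ref] += n
    return totals


counts = exchange_matrix(variants)
totals = mutability(counts)

# Because direction is kept, R->H and H->R are counted separately.
print(counts[("R", "H")], counts[("H", "R")])
# The residue with the most changes away from it in this toy data.
print(totals.most_common(1))
```

Because the direction of each change is retained, the count for one exchange need not equal the count for its reverse, which is what makes the matrix asymmetric.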

Source
http://dx.doi.org/10.1371/journal.pcbi.1003382
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3861039
August 2014

LigSearch: a knowledge-based web server to identify likely ligands for a protein target.

Acta Crystallogr D Biol Crystallogr 2013 Dec 19;69(Pt 12):2395-402. Epub 2013 Nov 19.

European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, England.

Identifying which ligands might bind to a protein before crystallization trials could provide a significant saving in time and resources. LigSearch, a web server aimed at predicting ligands that might bind to and stabilize a given protein, has been developed. Using a protein sequence and/or structure, the system searches against a variety of databases, combining available knowledge, and provides a clustered and ranked output of possible ligands. LigSearch can be accessed at http://www.ebi.ac.uk/thornton-srv/databases/LigSearch.

Source
http://dx.doi.org/10.1107/S0907444913022294
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3852652
December 2013

PDBsum additions.

Nucleic Acids Res 2014 Jan 22;42(Database issue):D292-6. Epub 2013 Oct 22.

European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK and Department of Physical Chemistry, Regional Centre of Advanced Technologies and Materials, Faculty of Science, Palacký University Olomouc, tř. 17. listopadu 12, 771 46 Olomouc, Czech Republic.

PDBsum, http://www.ebi.ac.uk/pdbsum, is a website providing numerous pictorial analyses of each entry in the Protein Data Bank. It portrays the structural features of all proteins, DNA and ligands in the entry, as well as depicting the interactions between them. The latest features, described here, include annotation of human protein sequences with their naturally occurring amino acid variants, dynamic graphs showing the relationships between related protein domain architectures, analyses of ligand binding clusters across different experimental determinations of the same protein, analyses of tunnels in proteins and new search options.

Source
http://dx.doi.org/10.1093/nar/gkt940
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3965036
January 2014

Abstracting knowledge from the Protein Data Bank.

Biopolymers 2013 Mar 29;99(3):183-8. Epub 2012 Sep 29.

EMBL-EBI, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK.

In the 40 years since its inception, the Protein Data Bank (PDB) has amassed over 80,000 experimentally determined structural models of proteins, plus many models of DNA and RNA fragments. The majority of the protein models have contributed, in some way, to an understanding of their particular protein's function, be it through the conformation of its catalytic residues, the details of its interactions with other proteins, substrate molecules, DNA, and so on. However, the totality of the data in the PDB provides a rich source of more generalized knowledge about proteins, their molecular biology, and evolution. Here, we describe how the focus of protein structural analysis has developed over the past 40 years. © 2012 Wiley Periodicals, Inc. Biopolymers 99: 183-188, 2013.

Source
http://dx.doi.org/10.1002/bip.22107
March 2013

Exploring the evolution of novel enzyme functions within structurally defined protein superfamilies.

PLoS Comput Biol 2012 1;8(3):e1002403. Epub 2012 Mar 1.

EMBL-EBI, Wellcome Trust Genome Campus, Hinxton, Cambridge, United Kingdom.

In order to understand the evolution of enzyme reactions and to gain an overview of biological catalysis, we have combined sequence and structural data to generate phylogenetic trees in an analysis of 276 structurally defined enzyme superfamilies, and used these to study how enzyme functions have evolved. We describe in detail the analysis of two superfamilies to illustrate different paradigms of enzyme evolution. Gathering together data from all the superfamilies supports and develops the observation that they have all evolved to act on a diverse set of substrates, whilst the evolution of new chemistry is much less common. Despite that, by bringing together so much data, we can provide a comprehensive overview of the most common and rare types of changes in function. Our analysis demonstrates, on a larger scale than previously studied, that modifications in overall chemistry still occur, with all possible changes at the primary level of the Enzyme Commission (E.C.) classification observed to a greater or lesser extent. The phylogenetic trees map out the evolutionary route taken within a superfamily, as well as all the possible changes within a superfamily. This has been used to generate a matrix of observed exchanges from one enzyme function to another, revealing the scale and nature of enzyme evolution and that some types of exchanges between and within E.C. classes are more prevalent than others. Surprisingly, a large proportion (71%) of all known enzyme functions are performed by this relatively small set of 276 superfamilies. This reinforces the hypothesis that relatively few ancient enzymatic domain superfamilies were progenitors for most of the chemistry required for life.

Source
http://dx.doi.org/10.1371/journal.pcbi.1002403
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3291543
June 2012

FunTree: a resource for exploring the functional evolution of structurally defined enzyme superfamilies.

Nucleic Acids Res 2012 Jan 17;40(Database issue):D776-82. Epub 2011 Oct 17.

European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK.

FunTree is a new resource that brings together sequence, structure, phylogenetic, chemical and mechanistic information for structurally defined enzyme superfamilies. Gathering together this range of data into a single resource allows the investigation of how novel enzyme functions have evolved within a structurally defined superfamily as well as providing a means to analyse trends across many superfamilies. This is done not only within the context of an enzyme's sequence and structure but also the relationships of their reactions. Developed in tandem with the CATH database, it currently comprises 276 superfamilies covering ~1800 (70%) of sequence assigned enzyme reactions. Central to the resource are phylogenetic trees generated from structurally informed multiple sequence alignments using both domain structural alignments supplemented with domain sequences and whole sequence alignments based on commonality of multi-domain architectures. These trees are decorated with functional annotations such as metabolite similarity as well as annotations from manually curated resources such as the Catalytic Site Atlas and MACiE for enzyme mechanisms. The resource is freely available through a web interface: www.ebi.ac.uk/thornton-srv/databases/FunTree.

Source
http://dx.doi.org/10.1093/nar/gkr852
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3245072
January 2012

Exome sequencing identifies a missense mutation in Isl1 associated with low penetrance otitis media in dearisch mice.

Genome Biol 2011 Sep 21;12(9):R90. Epub 2011 Sep 21.

Wellcome Trust Sanger Institute, Hinxton, Cambridge CB10 1SA, UK.

Background: Inflammation of the middle ear (otitis media) is very common and can lead to serious complications if not resolved. Genetic studies suggest an inherited component, but few of the genes that contribute to this condition are known. Mouse mutants have contributed significantly to the identification of genes predisposing to otitis media.

Results: The dearisch mouse mutant is an ENU-induced mutant detected by its impaired Preyer reflex (ear flick in response to sound). Auditory brainstem responses revealed raised thresholds from as early as three weeks old. Pedigree analysis suggested a dominant but partially penetrant mode of inheritance. The middle ear of dearisch mutants shows a thickened mucosa and cellular effusion suggesting chronic otitis media with effusion with superimposed acute infection. The inner ear, including the sensory hair cells, appears normal. Due to the low penetrance of the phenotype, normal backcross mapping of the mutation was not possible. Exome sequencing was therefore employed to identify a non-conservative tyrosine to cysteine (Y71C) missense mutation in the Islet1 gene, Isl1(Drsh). Isl1 is expressed in the normal middle ear mucosa. The findings suggest the Isl1(Drsh) mutation is likely to predispose carriers to otitis media.

Conclusions: Dearisch, Isl1(Drsh), represents the first point mutation in the mouse Isl1 gene and suggests a previously unrecognized role for this gene. It is also the first recorded exome sequencing of the C3HeB/FeJ background relevant to many ENU-induced mutants. Most importantly, the power of exome resequencing to identify ENU-induced mutations without a mapped gene locus is illustrated.

Source
http://dx.doi.org/10.1186/gb-2011-12-9-r90
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3308053
September 2011