Publications by authors named "Pralay Mitra"

22 Publications

High-Performance Whole-Cell Simulation Exploiting Modular Cell Biology Principles.

J Chem Inf Model 2021 Mar 8;61(3):1481-1492. Epub 2021 Mar 8.

Department of Computer Science and Engineering, Indian Institute of Technology, Kharagpur, West Bengal 721302, India.

One of the grand challenges of this century is modeling and simulating a whole cell. Extreme regulation of an extensive quantity of model and simulation data during whole-cell modeling and simulation renders it a computationally expensive research problem in systems biology. In this article, we present a high-performance whole-cell simulation exploiting modular cell biology principles. We prepare the simulation by dividing a unicellular bacterium into subcells utilizing the spatially localized densely connected protein clusters/modules. We set up a Brownian dynamics-based parallel whole-cell simulation framework by utilizing the Hamiltonian mechanics-based equations of motion. Though the velocity Verlet integration algorithm can solve the equations of motion, it lacks the ability to capture and deal with particle-collision scenarios. Hence, we propose an algorithm for detecting and resolving both elastic and inelastic collisions and subsequently modify the velocity Verlet integrator by incorporating our algorithm into it. Also, we address the boundary conditions to arrest the molecules' motion outside the subcell. For efficiency, we define one hashing-based data structure called the cellular dictionary to store all of the subcell-related information. A benchmark analysis of our CUDA C/C++ simulation code on a CPU-GPU cluster indicates that the computational time requirement decreases with the increase in the number of computing cores and becomes stable at around 128 cores. Additional testing on higher organisms informs us that our proposed work can be extended to any organism and is scalable for high-end CPU-GPU clusters.
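
To make the integrator idea concrete, here is a minimal one-dimensional sketch of a velocity Verlet step with a reflecting wall, plus the textbook elastic-collision velocity update. It is not the authors' CUDA code: the function names, the reflective treatment of the boundary, and the one-dimensional setting are all assumptions chosen for illustration.

```python
def velocity_verlet_step(x, v, force, mass, dt, lo, hi):
    """Advance one particle by dt (kick-drift-kick form of velocity Verlet).

    Assumption: the subcell boundary is modeled as a reflecting wall at
    lo and hi; the abstract only says motion outside the subcell is arrested.
    """
    a = force(x) / mass
    v_half = v + 0.5 * a * dt          # first half kick
    x_new = x + v_half * dt            # drift
    if x_new < lo:                     # mirror the overshoot back inside
        x_new, v_half = 2 * lo - x_new, -v_half
    elif x_new > hi:
        x_new, v_half = 2 * hi - x_new, -v_half
    a_new = force(x_new) / mass
    v_new = v_half + 0.5 * a_new * dt  # second half kick
    return x_new, v_new


def elastic_collision_1d(m1, v1, m2, v2):
    """Post-collision velocities conserving momentum and kinetic energy."""
    total = m1 + m2
    v1_new = ((m1 - m2) * v1 + 2 * m2 * v2) / total
    v2_new = ((m2 - m1) * v2 + 2 * m1 * v1) / total
    return v1_new, v2_new
```

In a full simulation the collision update would be applied between the drift and the second half kick whenever two particles overlap; how the published method detects overlaps is not described in the abstract.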

Source
http://dx.doi.org/10.1021/acs.jcim.0c01282 (DOI Listing)
March 2021

Estimating Change in Foldability Due to Multipoint Deletions in Protein Structures.

J Chem Inf Model 2020 Dec 22;60(12):6679-6690. Epub 2020 Nov 22.

Department of Computer Science and Engineering, Indian Institute of Technology Kharagpur, Kharagpur, West Bengal 721302, India.

Insertions/deletions of amino acids in the protein backbone potentially result in altered structural/functional specifications. They can either contribute positively to the evolutionary process or can result in disease conditions. Despite being the second most prevalent form of protein modification, there are no databases or computational frameworks that delineate harmful multipoint deletions (MPD) from beneficial ones. We introduce a positive unlabeled learning-based prediction framework (PROFOUND) that utilizes fold-level attributes, environment-specific properties, and deletion site-specific properties to predict the change in foldability arising from such MPDs, both in the non-loop and loop regions of protein structures. In the absence of any protein structure dataset to study MPDs, we introduce a dataset with 153 MPD instances that lead to native-like folded structures and 7650 unlabeled MPD instances whose effect on the foldability of the corresponding proteins is unknown. PROFOUND on 10-fold cross-validation on our newly introduced dataset reports a recall of 82.2% (86.6%) and a fall out rate (FR) of 14.2% (20.6%), corresponding to MPDs in the protein loop (non-loop) region. The low FR suggests that the foldability in proteins subject to MPDs is not random and necessitates unique specifications of the deleted region. In addition, we find that additional evolutionary attributes contribute to higher recall and lower FR. The first of a kind foldability prediction system owing to MPD instances and the newly introduced dataset will potentially aid in novel protein engineering endeavors.

Source
http://dx.doi.org/10.1021/acs.jcim.0c00802 (DOI Listing)
December 2020

Ebola Virus VP35 Protein: Modeling of the Tetrameric Structure and an Analysis of Its Interaction with Human PKR.

J Proteome Res 2020 Nov 18;19(11):4533-4542. Epub 2020 Sep 18.

Department of Computer Science and Engineering, Indian Institute of Technology Kharagpur, West Bengal 721302, India.

The Viral Protein 35 (VP35), a crucial protein of the Zaire Ebolavirus (EBOV), interacts with a plethora of human proteins to cripple the human immune system. Despite its importance, the entire structure of the tetrameric assembly of EBOV VP35 and the means by which it antagonizes the autophosphorylation of the kinase domain of human protein kinase R (PKR) are still elusive. We consult existing structural information to model a tetrameric assembly of the VP35 protein where 93% of the protein is modeled using crystal structure templates. We analyze our modeled tetrameric structure to identify interchain bonding networks and use molecular dynamics simulations and normal-mode analysis to unravel the flexibility and deformability of the different regions of the VP35 protein. We establish that the C-terminal of VP35 directly interacts with PKR to prevent it from autophosphorylation. Further, we identify three plausible VP35-PKR complexes with better affinity than the PKR dimer formed during autophosphorylation and use protein design to establish a new stretch in VP35 that interacts with PKR. The proposed tetrameric assembly will aid in better understanding of the VP35 protein, and the reported VP35-PKR complexes along with their interacting sites will help in the shortlisting of small molecule inhibitors.

Source
http://dx.doi.org/10.1021/acs.jproteome.0c00473 (DOI Listing)
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC7640970 (PMC)
November 2020

ETV6 gene aberrations in non-haematological malignancies: A review highlighting ETV6 associated fusion genes in solid tumors.

Biochim Biophys Acta Rev Cancer 2020 Aug 10;1874(1):188389. Epub 2020 Jul 10.

School of Medical Science and Technology, Indian Institute of Technology Kharagpur, Kharagpur 721302, India.

The ETV6 (translocation-Ets-leukemia virus) gene is a transcriptional repressor mainly involved in haematopoiesis and maintenance of vascular networks, and it has emerged as a major oncogene able to form fusions with many other genes, with carcinogenic consequences. ETV6 fusions function primarily by constitutive activation of kinase activity of the fusion partners, modifications in the normal functions of the ETV6 transcription factor, loss of function of ETV6 or the partner gene, and activation of a proto-oncogene near the site of translocation. The role of ETV6 fusion genes in tumorigenesis has been well documented, and such fusions are found in greater variety in haematological malignancies. However, the role of the ETV6 oncogene in solid tumors has also risen to prominence due to an increasing number of cases being reported with this malignancy. Since solid tumors can be well targeted, the diagnosis of this genre of tumors based on ETV6 malignancy is of crucial importance for treatment. This review highlights the important ETV6-associated fusions in solid tumors along with critical insights into existing and novel means of targeting them. A consolidation of novel therapies such as immune, gene, RNAi, stem cell therapy and protein degradation, hitherto unused in the case of ETV6 solid tumor malignancies, may open further therapeutic avenues.

Source
http://dx.doi.org/10.1016/j.bbcan.2020.188389 (DOI Listing)
August 2020

Estimating the Effect of Single-Point Mutations on Protein Thermodynamic Stability and Analyzing the Mutation Landscape of the p53 Protein.

J Chem Inf Model 2020 Jun 21;60(6):3315-3323. Epub 2020 May 21.

Department of Computer Science and Engineering, Indian Institute of Technology Kharagpur, West Bengal 721302, India.

Nonsynonymous single-nucleotide polymorphisms often result in altered protein stability while playing crucial roles both in the evolution process and in the development of human diseases. Prediction of change in the thermodynamic stability due to such missense mutations will help in protein engineering endeavors and will contribute to a better understanding of different disease conditions. Here, we develop a machine-learning-based framework, viz., ProTSPoM, to estimate the change in protein thermodynamic stability arising out of single-point mutations (SPMs). ProTSPoM outperforms existing methods on the S2648 and S1925 databases and reports a Pearson correlation coefficient of 0.82 (0.88) and a root-mean-squared-error of 0.92 (1.06) kcal/mol between the predicted and experimental ΔΔG values on the long-established S350 (tumor suppressor p53 protein) data set. Further, we estimate the change in thermodynamic stability for all possible SPMs in the DNA binding domain of the p53 protein. We identify single-nucleotide polymorphisms in p53 which are plausibly detrimental to its structural integrity and interaction affinity with the DNA molecule. ProTSPoM with its reliable estimates and time-efficient prediction is well suited to be integrated with existing protein engineering techniques. The ProTSPoM web server is accessible at http://cosmos.iitkgp.ac.in/ProTSPoM/.
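
The two evaluation metrics quoted in this abstract are standard, and a small sketch may help readers reproduce them on their own predictions. The function names and the toy numbers in the usage note are illustrative assumptions, not data from the paper.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


def rmse(predicted, experimental):
    """Root-mean-squared error, in the same units as the inputs."""
    n = len(predicted)
    return math.sqrt(sum((p - e) ** 2 for p, e in zip(predicted, experimental)) / n)
```

For example, `pearson_r(predicted, experimental)` and `rmse(predicted, experimental)` would be called on two lists of stability changes in kcal/mol, one predicted and one measured.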

Source
http://dx.doi.org/10.1021/acs.jcim.0c00256 (DOI Listing)
June 2020

Boosting phosphorylation site prediction with sequence feature-based machine learning.

Proteins 2020 Feb 22;88(2):284-291. Epub 2019 Aug 22.

Department of Computer Science and Engineering, Indian Institute of Technology Kharagpur, West Bengal, India.

Protein phosphorylation is one of the essential post-translational modifications, playing a vital role in the regulation of many fundamental cellular processes. We propose a LightGBM-based computational approach that uses evolutionary, geometric, sequence environment, and amino acid-specific features to decipher phosphate binding sites from a protein sequence. Our method, when compared with other existing methods on 2429 protein sequences taken from the standard Phospho.ELM (P.ELM) benchmark data set featuring 11 organisms, reports a higher F score = 0.504 (harmonic mean of the precision and recall) and ROC AUC = 0.836 (area under the curve of the receiver operating characteristics). The computation time of our proposed approach is much less than that of the recently developed deep learning-based framework. Structural analysis on selected protein sequences indicates that our prediction is a superset of the phosphorylation sites mentioned in the P.ELM data set. The foundation of our scheme is manual feature engineering and a decision tree-based classification. Hence, it is intuitive, and one can interpret the final tree as a set of rules, resulting in a deeper understanding of the relationships between biophysical features and phosphorylation sites. Our innovative problem transformation method permits more control over precision and recall, as is demonstrated by the fact that if we incorporate the output probability of the existing deep learning framework as an additional feature, then our prediction improves (F score = 0.546; ROC AUC = 0.849). The implementation of our method can be accessed at http://cse.iitkgp.ac.in/~pralay/resources/PPSBoost/ and is mirrored at https://cosmos.iitkgp.ac.in/PPSBoost.
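
The F score reported here is defined in the abstract as the harmonic mean of precision and recall. The short sketch below computes it from confusion-matrix counts; the function name and the convention of returning 0.0 when the score is undefined are assumptions made for this example.

```python
def f_score(true_pos, false_pos, false_neg):
    """Harmonic mean of precision and recall from raw counts.

    Assumption: returns 0.0 when precision or recall is undefined,
    a common convention rather than anything stated in the abstract.
    """
    predicted = true_pos + false_pos
    actual = true_pos + false_neg
    precision = true_pos / predicted if predicted else 0.0
    recall = true_pos / actual if actual else 0.0
    if precision + recall == 0.0:
        return 0.0
    return 2 * precision * recall / (precision + recall)
```

Because the harmonic mean is dominated by the smaller of its two inputs, a classifier cannot reach a high F score by trading away recall for precision or the reverse.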

Source
http://dx.doi.org/10.1002/prot.25801 (DOI Listing)
February 2020

An Evolutionary Profile Guided Greedy Parallel Replica-Exchange Monte Carlo Search Algorithm for Rapid Convergence in Protein Design.

IEEE/ACM Trans Comput Biol Bioinform 2021 Mar-Apr;18(2):489-499. Epub 2021 Apr 8.

Protein design, also known as the inverse protein folding problem, is the identification of a protein sequence that folds into a target protein structure. Protein design has been proved to be an NP-hard problem. While researchers are working on designing heuristics with an emphasis on new scoring functions, we propose a replica-exchange Monte Carlo (REMC) search algorithm that ensures faster convergence using a greedy strategy. Using biological insights, we construct an evolutionary profile to encode the amino acid variability in different positions of the target protein from its structural homologs. The evolutionary profile guides the REMC search, and the greedy approach confirms appreciable exploration and exploitation of the sequence-structure fitness surface. We allow termination of a simulation trajectory once a stagnant situation is detected. A series of sequence- and structure-level validations establish the goodness of our design. On a benchmark dataset, our algorithm reports an average root-mean-square deviation of 1.21 Å between the target and the design proteins when modeled with an existing protein folding software. Besides, our algorithm assures a 6.16 times overall speedup. In Molecular Dynamics simulations, we observe that four out of five selected design proteins report better to comparable stability relative to the corresponding target proteins.
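
For readers unfamiliar with replica exchange, the sketch below shows the standard Metropolis criterion for swapping configurations between two replicas at different temperatures. It covers only that textbook step; the greedy strategy and the evolutionary profile described in the abstract are not reproduced, and the function names, reduced units, and default constant `k` are assumptions.

```python
import math
import random


def swap_probability(energy_i, energy_j, temp_i, temp_j, k=1.0):
    """Acceptance probability for exchanging replicas i and j.

    Standard criterion: min(1, exp((beta_i - beta_j) * (E_i - E_j))),
    with beta = 1 / (k * T). Units are reduced (k = 1) by assumption.
    """
    beta_i = 1.0 / (k * temp_i)
    beta_j = 1.0 / (k * temp_j)
    delta = (beta_i - beta_j) * (energy_i - energy_j)
    return 1.0 if delta >= 0 else math.exp(delta)


def attempt_swap(energy_i, energy_j, temp_i, temp_j, rng=random):
    """Return True if the exchange is accepted for this random draw."""
    return rng.random() < swap_probability(energy_i, energy_j, temp_i, temp_j)
```

A swap is always accepted when the colder replica currently holds the higher-energy configuration, which is what lets low-energy sequences migrate toward low temperatures while hotter replicas keep exploring.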

Source
http://dx.doi.org/10.1109/TCBB.2019.2928809 (DOI Listing)
April 2021

Delineation of crosstalk between HSP27 and MMP-2/MMP-9: A synergistic therapeutic avenue for glioblastoma management.

Biochim Biophys Acta Gen Subj 2019 Jul 25;1863(7):1196-1209. Epub 2019 Apr 25.

School of Medical Science and Technology, Indian Institute of Technology, Kharagpur, India.

Background: Epithelial to mesenchymal transition (EMT) and extracellular matrix (ECM) remodeling are the two elemental processes promoting glioblastoma (GBM). In the present work we propose a mechanistic model of GBM and in the process establish a hypothesis elucidating critical crosstalk between heat shock proteins (HSPs) and matrix metalloproteinases (MMPs), with synergistic upregulation of EMT-like processes and ECM remodeling.

Methods: The interaction and the precise binding site between the HSP and MMP proteins were assayed computationally, in vitro, and in GBM clinical samples.

Results: A positive crosstalk of HSP27 with MMP-2 and MMP-9 was established in both GBM patient tissues and cell-lines. This association was found to be of prime significance for ECM remodeling and promotion of EMT-like characteristics. In-silico predictions revealed 3 plausible interaction sites of HSP27 interacting with MMP-2 and MMP-9. Site-directed mutagenesis followed by in-vitro immunoprecipitation assay (IP) with 3 mutated recombinant HSP27, confirmed an interface stretch containing residues 29-40 of HSP27 to be a common interaction site for both MMP-2 and MMP-9. This was further validated with in-vitro IP of truncated (sans AA 29-40) recombinant HSP27 with MMP-2 and MMP-9.

Conclusion: The association of HSP27 with MMP-2 and MMP-9 proteins along with the identified interacting stretch has the potential to contribute towards drug development to inhibit GBM infiltration and migration.

General Significance: Current findings provide a novel therapeutic target for GBM opening a new horizon in the field of GBM management.

Source
http://dx.doi.org/10.1016/j.bbagen.2019.04.015 (DOI Listing)
July 2019

Analyzing Change in Protein Stability Associated with Single Point Deletions in a Newly Defined Protein Structure Database.

J Proteome Res 2019 Mar 18;18(3):1402-1410. Epub 2019 Feb 18.

Department of Structural Biology, Weizmann Institute of Science, Rehovot 76100, Israel.

Protein backbone alteration due to an insertion/deletion or mutation operation often results in a change of fundamental biophysical properties of proteins. The proposed work intends to encode the protein stability changes associated with single point deletions (SPDs) of amino acids in proteins. The encoding will help in the primary screening of detrimental backbone modifications before opting for expensive in vitro experimentation. In the absence of any benchmark database documenting SPDs, we curate a data set containing SPDs that lead to both folded conformations and the unfolded state. We differentiate these SPD instances with the help of simple structural and physicochemical features and eventually classify the foldability resulting from SPDs using a Random Forest classifier and an Elliptic Envelope based outlier detector. Under leave-one-out cross-validation, the accuracy of the Random Forest classifier and the Elliptic Envelope is 99.4% and 98.1%, respectively. The newly defined database and the delineation of SPD instances based on their resulting foldability provide a head start toward finding a solution to the given problem.

Source
http://dx.doi.org/10.1021/acs.jproteome.9b00048 (DOI Listing)
March 2019

Changing the Apoptosis Pathway through Evolutionary Protein Design.

J Mol Biol 2019 Feb 6;431(4):825-841. Epub 2019 Jan 6.

Department of Computational Medicine and Bioinformatics, University of Michigan, 100 Washtenaw Avenue, Ann Arbor, MI 48109, USA; Department of Biological Chemistry, University of Michigan, 1150 West Medical Center Drive, Ann Arbor, MI 48109, USA.

One obstacle in de novo protein design is the vast sequence space that needs to be searched through to obtain functional proteins. We developed a new method using structural profiles created from evolutionarily related proteins to constrain the simulation search process, with functions specified by atomic-level ligand-protein binding interactions. The approach was applied to redesigning the BIR3 domain of the X-linked inhibitor of apoptosis protein (XIAP), whose primary function is to suppress the cell death by inhibiting caspase-9 activity; however, the function of the wild-type XIAP can be eliminated by the binding of Smac peptides. Isothermal calorimetry and luminescence assay reveal that the designed XIAP domains can bind strongly with the Smac peptides but do not significantly inhibit the caspase-9 proteolytic activity in vitro compared with the wild-type XIAP protein. Detailed mutation assay experiments suggest that the binding specificity in the designs is essentially determined by the interplay of structural profile and physical interactions, which demonstrates the potential to modify apoptosis pathways through computational design.

Source
http://dx.doi.org/10.1016/j.jmb.2018.12.016 (DOI Listing)
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6876990 (PMC)
February 2019

Bacterial flagellar switching: a molecular mechanism directed by the logic of an electric motor.

J Mol Model 2018 Sep 13;24(10):280. Epub 2018 Sep 13.

Department of Computer Science and Engineering, Indian Institute of Technology, Kharagpur, West Bengal, 721302, India.

Flagellar rotation regulates the phenomenon of chemotaxis in bacteria. The interaction between the stator unit and the rotor unit of the flagellar motors is responsible for switching the direction of bacterial flagellar rotation. However, the molecular interaction mechanism between the stator (MotA/MotB) and the rotor (FliG/FliM/FliN) proteins for the flagellar rotational direction switching has remained unclear. To address this, the asymmetry in the copies of FliG, FliM, and FliN molecules was resolved by reconstructing the switch complex using a modeled rotor unit that fulfills the experimentally available geometric constraints. The diameter of our assembled switch complex agrees with the existing literature. Experimental evidence and the conformational spread model validate our constructed switch complex. Subsequently, normal mode analysis (NMA) on these constructed protomer units revealed that the most fluctuating molecule in the rotor unit is FliG, which interacts with the bacterial stator through its C-terminal domain. NMA also facilitates our understanding of the reorientation mechanism of FliG between the two states of its flagellar rotation, i.e., counter-clockwise to clockwise and vice versa. Our observations regarding speed regulation, the gap between rotor and stator, and the flagellar switching due to the activity of cytoplasmic proteins indicate that the bacterial flagellar motor uses the same mechanism as that of an electric motor. Graphical abstract: Molecular mechanism of the bacterial flagellar switch.

Source
http://dx.doi.org/10.1007/s00894-018-3819-0 (DOI Listing)
September 2018

A network-based zoning for parallel whole-cell simulation.

Bioinformatics 2019 Jan;35(1):88-94

Department of Computer Science and Engineering, Indian Institute of Technology Kharagpur, West Bengal, India.

Motivation: In Computational Cell Biology, whole-cell modeling and simulation is an absolute requirement to analyze and explore the cell of an organism. Despite a few individual efforts on modeling, the prime obstacle hindering its development and progress is its compute-intensive nature. Moreover, little knowledge is available on how to reduce the enormous computational overhead and which computational systems will be of use.

Results: In this article, we present a network-based zoning approach that could potentially be utilized in the parallelization of whole-cell simulations. Firstly, we construct the protein-protein interaction graph of the whole-cell of an organism using experimental data from various sources. Based on protein interaction information, we predict protein locality and allocate confidence score to the interactions accordingly. We then identify the modules of strictly localized interacting proteins by performing interaction graph clustering based on the confidence score of the interactions. By applying this method to Escherichia coli K12, we identified 188 spatially localized clusters. After a thorough Gene Ontology-based analysis, we proved that the clusters are also in functional proximity. We then conducted Principal Coordinates Analysis to predict the spatial distribution of the clusters in the simulation space. Our automated computational techniques can partition the entire simulation space (cell) into simulation sub-cells. Each of these sub-cells can be simulated on separate computing units of the High-Performance Computing (HPC) systems. We benchmarked our method using proteins. However, our method can be extended easily to add other cellular components like DNA, RNA and metabolites.

Supplementary Information: Supplementary data are available at Bioinformatics online.

Source
http://dx.doi.org/10.1093/bioinformatics/bty530 (DOI Listing)
January 2019

Ebolavirus interferon antagonists-protein interaction perspectives to combat pathogenesis.

Brief Funct Genomics 2018 Nov;17(6):392-401

Department of Computer Science and Engineering, IIT Kharagpur, India.

Zaire ebolavirus, one of the most pathogenic species of Ebolavirus, is a significant threat to the human community, being both highly infectious and lethal. The viral proteins (VPs), specifically VP24 and VP35, antagonize the interferon (IFN) proteins accountable for human immune response. Several efforts have been made to design vaccines and therapeutic drugs. However, the success is not encouraging because of limited knowledge about the binding site information of the VPs. Such limitations stem largely from the highly infectious nature of the virus, which requires specialized personnel and biosafety laboratories. As an alternative, computational techniques have also been adopted to improve the success rate of drug discovery. This article elaborates on the interactions between viral and human IFN proteins that lead to IFN antagonism. A computational framework is proposed after evaluating existing computational studies. This protein interaction and protein design-based computational framework identified critical interacting residues of the VP (VP24) responsible for the formation of a stable complex with the human KPNA5 (karyopherin alpha proteins 5). The mutations of those critical residues, as demonstrated in this article, affected the overall stability of the complex because of a sharp decrease in both the number of hydrogen bonds and possible charge-charge interactions. Therefore, we propose that the framework could be an effective alternative to experimental work for destabilizing interactions between the VPs and human proteins responsible for IFN induction and response.

Source
http://dx.doi.org/10.1093/bfgp/elx034 (DOI Listing)
November 2018

An evolution-based approach to De Novo protein design and case study on Mycobacterium tuberculosis.

PLoS Comput Biol 2013 Oct 24;9(10):e1003298. Epub 2013 Oct 24.

Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, Michigan, United States of America.

Computational protein design is a reverse procedure of protein folding and structure prediction, where constructing structures from evolutionarily related proteins has been demonstrated to be the most reliable method for protein 3-dimensional structure prediction. Following this spirit, we developed a novel method to design new protein sequences based on evolutionarily related protein families. For a given target structure, a set of proteins having similar fold are identified from the PDB library by structural alignments. A structural profile is then constructed from the protein templates and used to guide the conformational search of amino acid sequence space, where physicochemical packing is accommodated by single-sequence based solvation, torsion angle, and secondary structure predictions. The method was tested on a computational folding experiment based on a large set of 87 protein structures covering different fold classes, which showed that the evolution-based design significantly enhances the foldability and biological functionality of the designed sequences compared to the traditional physics-based force field methods. Without using homologous proteins, the designed sequences can be folded with an average root-mean-square-deviation of 2.1 Å to the target. As a case study, the method is extended to redesign all 243 structurally resolved proteins in the pathogenic bacteria Mycobacterium tuberculosis, which is the second leading cause of death from infectious disease. On a smaller scale, five sequences were randomly selected from the design pool and subjected to experimental validation. The results showed that all the designed proteins are soluble with distinct secondary structure and three have well ordered tertiary structure, as demonstrated by circular dichroism and NMR spectroscopy. 
Together, these results demonstrate a new avenue in computational protein design that uses knowledge of evolutionary conservation from protein structural families to engineer new protein molecules of improved fold stability and biological functionality.

Source
http://dx.doi.org/10.1371/journal.pcbi.1003298 (DOI Listing)
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3812052 (PMC)
October 2013

EvoDesign: De novo protein design based on structural and evolutionary profiles.

Nucleic Acids Res 2013 Jul 13;41(Web Server issue):W273-80. Epub 2013 May 13.

Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI 48109 USA.

Protein design aims to identify new protein sequences of desirable structure and biological function. Most current de novo protein design methods rely on physics-based force fields to search for low free-energy states following Anfinsen's thermodynamic hypothesis. A major obstacle of such approaches is the inaccuracy of the force field design, which cannot accurately describe the atomic interactions or distinguish correct folds. We developed a new web server, EvoDesign, to design optimal protein sequences of given scaffolds along with multiple sequence and structure-based features to assess the foldability and goodness of the designs. EvoDesign uses an evolution-profile-based Monte Carlo search with the profiles constructed from homologous structure families in the Protein Data Bank. A set of local structure features, including secondary structure, torsion angle and solvation, are predicted by single-sequence neural-network training and used to smooth the sequence motif and accommodate the physicochemical packing. The EvoDesign algorithm has been extensively tested in large-scale protein design experiments, which demonstrate enhanced foldability and structural stability of designed sequences compared with the physics-based designing methods. The EvoDesign server is freely available at http://zhanglab.ccmb.med.umich.edu/EvoDesign.

Source
http://dx.doi.org/10.1093/nar/gkt384 (DOI Listing)
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3692067 (PMC)
July 2013

How many protein-protein interactions types exist in nature?

PLoS One 2012 13;7(6):e38913. Epub 2012 Jun 13.

Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, Michigan, United States of America.

"Protein quaternary structure universe" refers to the ensemble of all protein-protein complexes across all organisms in nature. The number of quaternary folds thus corresponds to the number of ways proteins physically interact with other proteins. This study focuses on answering two basic questions: whether the number of protein-protein interactions is limited and, if yes, how many different quaternary folds exist in nature. By all-to-all sequence and structure comparisons, we grouped the protein complexes in the Protein Data Bank (PDB) into 3,629 families and 1,761 folds. A statistical model was introduced to obtain the quantitative relation between the numbers of quaternary families and quaternary folds in nature. The total number of possible protein-protein interactions was estimated to be around 4,000, which indicates that the current protein repository contains only 42% of quaternary folds in nature and that full coverage needs approximately a quarter century of experimental effort. The results have important implications for protein complex structural modeling and the structure genomics of protein-protein interactions.

Source
http://journals.plos.org/plosone/article?id=10.1371/journal.pone.0038913 (PLOS)
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3374795 (PMC)
December 2012

PRUNE and PROBE--two modular web services for protein-protein docking.

Nucleic Acids Res 2011 Jul 16;39(Web Server issue):W229-34. Epub 2011 May 16.

Bioinformatics Centre, Indian Institute of Science, Bangalore 560 012, India.

Protein-protein docking programs typically perform four major tasks: (i) generation of docking poses, (ii) selection of a subset of poses, (iii) their structural refinement and (iv) scoring and ranking for the final assessment of the true quaternary structure. Although the tasks can be integrated or performed in a serial order, they are by nature modular, allowing an opportunity to substitute one algorithm with another. We have implemented two modular web services, (i) PRUNE: to select a subset of docking poses generated during sampling search (http://pallab.serc.iisc.ernet.in/prune) and (ii) PROBE: to refine, score and rank them (http://pallab.serc.iisc.ernet.in/probe). The former uses a new interface area based edge-scoring function to eliminate >95% of the poses generated during docking search. In contrast to other multi-parameter-based screening functions, this single parameter based elimination reduces the computational time significantly, in addition to increasing the chances of selecting native-like models in the top rank list. The PROBE server performs ranking of pruned poses, after structure refinement and scoring using a regression model for geometric compatibility, and normalized interaction energy. While web services similar to PROBE are rare, no web service akin to PRUNE has been described before. Both servers are publicly accessible and free to use.

Source
http://dx.doi.org/10.1093/nar/gkr317 (DOI)
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3125751 (PMC)
July 2011

Combining Bayes classification and point group symmetry under Boolean framework for enhanced protein quaternary structure inference.

Structure 2011 Mar;19(3):304-12

Bioinformatics Centre, Supercomputer Education Research Centre, Indian Institute of Science, Bangalore 560 012, India.

Our ability to infer the protein quaternary structure automatically from atom and lattice information is inadequate, especially for weak complexes and heteromeric quaternary structures. Several approaches exist, but they have limited performance. Here, we present a new scheme to infer protein quaternary structure from lattice and protein information, with all-around coverage for strong, weak and very weak affinity homomeric and heteromeric complexes. The scheme combines a naive Bayes classifier and point group symmetry under a Boolean framework to detect quaternary structures in the crystal lattice. It consistently produces ≥90% coverage across diverse benchmarking data sets, including a notably superior 95% coverage for recognition of heteromeric complexes, compared with 53% on the same data set by the current state-of-the-art method. The detailed study of a limited number of prediction-failed cases offers interesting insights into the intriguing nature of protein contacts in the lattice. The findings have implications for accurate inference of quaternary states of proteins, especially weak affinity complexes.

Source
http://dx.doi.org/10.1016/j.str.2011.01.009 (DOI)
March 2011

Using correlated parameters for improved ranking of protein-protein docking decoys.

J Comput Chem 2011 Apr 12;32(5):787-96. Epub 2010 Oct 12.

Bioinformatics Centre, Indian Institute of Science, Bangalore, Karnataka, India.

A successful protein-protein docking study culminates in the identification of decoys at top ranks with near-native quaternary structures. However, this task remains enigmatic because no generalized scoring functions exist that effectively rank decoys according to their similarity to near-native quaternary structures. Difficulties arise because of the highly irregular nature of the protein surface and the significant variation of the nonbonding and solvation energies based on the chemical composition of the protein-protein interface. In this work, we describe a novel method combining an interface-size filter, a regression model for geometric compatibility (based on two correlated surface and packing parameters), and normalized interaction energy (calculated from correlated nonbonded and solvation energies), to effectively rank decoys from a set of 10,000 decoys. Tests on 30 unbound binary protein-protein complexes show that in 16 cases we can identify at least one decoy in the top three ranks having ≤10 Å backbone root mean square deviation from the true binding geometry. Comparisons with other state-of-the-art methods confirm the improved ranking power of our method without the use of any experiment-guided restraints, evolutionary information, statistical propensities, or modified interaction energy equations. Tests on 118 less-difficult bound binary protein-protein complexes with ≤35% sequence redundancy at the interface showed that in 77% of cases, at least 1 in 10,000 decoys was identified with ≤5 Å backbone root mean square deviation from the true geometry at first rank. The work will promote the use of new concepts where correlations among parameters provide more robust scoring models. It will facilitate studies involving molecular interactions, including modeling of large macromolecular assemblies and protein structure prediction.
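
The success criterion above rests on backbone root mean square deviation (RMSD) and a "hit within the top N ranks" test. The sketch below illustrates both under simplifying assumptions: the two coordinate sets are taken to be matched atom-for-atom and already expressed in the same reference frame (no superposition is performed), and the helper names `rmsd` and `has_hit_in_top` are hypothetical, not taken from the paper.

```python
import math
from typing import Sequence, Tuple

Coord = Tuple[float, float, float]


def rmsd(a: Sequence[Coord], b: Sequence[Coord]) -> float:
    """Root mean square deviation between two matched coordinate lists."""
    if len(a) != len(b) or not a:
        raise ValueError("coordinate lists must be non-empty and of equal length")
    squared = sum(
        (x1 - x2) ** 2 + (y1 - y2) ** 2 + (z1 - z2) ** 2
        for (x1, y1, z1), (x2, y2, z2) in zip(a, b)
    )
    return math.sqrt(squared / len(a))


def has_hit_in_top(ranked_rmsds: Sequence[float], top_n: int = 3,
                   cutoff: float = 10.0) -> bool:
    """True if any of the first top_n ranked decoys has RMSD at or below cutoff."""
    return any(value <= cutoff for value in ranked_rmsds[:top_n])


# Two atoms, each displaced by 3 units along z.
native = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
decoy = [(0.0, 0.0, 3.0), (1.0, 0.0, 3.0)]
print(rmsd(native, decoy))                    # 3.0
print(has_hit_in_top([12.0, 9.5, 20.0]))      # True
```

With `top_n=3` and `cutoff=10.0` the helper mirrors the unbound-set criterion quoted above; `top_n=1` and `cutoff=5.0` would mirror the bound-set criterion.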

Source
http://dx.doi.org/10.1002/jcc.21657 (DOI)
April 2011

dockYard--a repository to assist modeling of protein-protein docking.

J Mol Model 2011 Mar 4;17(3):599-606. Epub 2010 Jun 4.

Bioinformatics Centre, Supercomputer Education Research Centre, Indian Institute of Science, Bangalore, 560 012, India.

In the absence of interlogs, building docking models is a time-intensive task, involving generation of a large pool of docking decoys followed by refinement and screening to identify near-native docking solutions. This restricts researchers interested in building docking methods to benchmarking only a limited number of protein complexes. We have created a repository called dockYard (http://pallab.serc.iisc.ernet.in/dockYard) that allows modelers interested in protein-protein interaction to access a large volume of information on protein dimers and their interlogs, and also to download decoys for their work if they are interested in building modeling methods. dockYard currently offers four categories of docking decoys derived from: Bound (native dimer co-crystallized), Unbound (individual subunits are crystallized, as well as the target dimer), Variants (match the previous two categories in at least one subunit with 100% sequence identity), and Interlogs (match the previous categories in at least one subunit with ≥90% or ≥50% sequence identity). The web service offers options for full or selective download based on search parameters. Our portal also serves as a repository for modelers who may want to share their decoy sets with the community.

Source
http://dx.doi.org/10.1007/s00894-010-0758-9 (DOI)
March 2011

New measures for estimating surface complementarity and packing at protein-protein interfaces.

FEBS Lett 2010 Mar 12;584(6):1163-8. Epub 2010 Feb 12.

Bioinformatics Centre, Indian Institute of Science, Bangalore, India.

A number of methods exist that use different approaches to assess geometric properties like the surface complementarity and atom packing at the protein-protein interface. We have developed two new and conceptually different measures using the Delaunay tessellation and interface slice selection to compute the surface complementarity and atom packing at the protein-protein interface in a straightforward manner. Our measures show a strong correlation among themselves and with other existing measures, and can be calculated in a highly time-efficient manner. The measures are discriminative for evaluating biological, as well as non-biological protein-protein contacts, especially from large protein complexes and large-scale structural studies (http://pallab.serc.iisc.ernet.in/nip_nsc).

Source
http://dx.doi.org/10.1016/j.febslet.2010.02.021 (DOI)
March 2010

Interface of apoptotic protein complexes has distinct properties.

In Silico Biol 2009 ;9(5-6):365-78

Supercomputer Education and Research Centre, Bangalore, India; Bioinformatics Centre, Indian Institute of Science, Bangalore, India.

Apoptosis is a programmed mechanism of cell death that is a normal component of the development and health of multi-cellular organisms. In this study, we ask whether the interface properties of apoptotic protein complexes differ from those of protein complexes in general. We find that although in apoptotic protein complexes the overall distributions of interface size, surface complementarity, hydrogen bonding and hydrophobicity are similar to general interface properties, apoptotic complexes tend to have more fragmented interfaces and different secondary structural preferences. The statistics on the number of interfaces where specific amino acid(s) occur with significantly enhanced frequency suggest that Arg, Met and Asp are the most important functional residues. The role of Met is believed to be unique, as evidenced by the existing data on the hot spot potential of residues. Together, these findings provide insight into the possible role of various physico-chemical attributes at the protein interface in regulation of the apoptosis process.

Source
http://dx.doi.org/10.3233/ISB-2009-0411 (DOI)
September 2013