Publications by authors named "Igor B Kuznetsov"

21 Publications


Immuno-transcriptomic profiling of extracranial pediatric solid malignancies.

Cell Rep 2021 Nov;37(8):110047

University of Toronto Musculoskeletal Oncology Unit, Sinai Health System; Department of Surgery, University of Toronto, Toronto, ON, Canada.

We perform an immunogenomics analysis utilizing whole-transcriptome sequencing of 657 pediatric extracranial solid cancer samples representing 14 diagnoses, and additionally utilize transcriptomes of 131 pediatric cancer cell lines and 147 normal tissue samples for comparison. We describe patterns of infiltrating immune cells, T cell receptor (TCR) clonal expansion, and translationally relevant immune checkpoints. We find that tumor-infiltrating lymphocytes and TCR counts vary widely across cancer types and within each diagnosis, and notably are significantly predictive of survival in osteosarcoma patients. We identify potential cancer-specific immunotherapeutic targets for adoptive cell therapies, including cell-surface proteins, tumor germline antigens, and lineage-specific transcription factors. Using an orthogonal immunopeptidomics approach, we find several potential immunotherapeutic targets in osteosarcoma and Ewing sarcoma and validate PRAME as a bona fide multi-pediatric cancer target. Importantly, this work provides a critical framework for immune targeting of extracranial solid tumors using parallel immuno-transcriptomic and -peptidomic approaches.

DOI: http://dx.doi.org/10.1016/j.celrep.2021.110047
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC8642810
November 2021

Outcome-Related Signatures Identified by Whole Transcriptome Sequencing of Resectable Stage III/IV Melanoma Evaluated after Starting Hu14.18-IL2.

Clin Cancer Res 2020 07 9;26(13):3296-3306. Epub 2020 Mar 9.

Department of Human Oncology, University of Wisconsin-Madison, Madison, Wisconsin.

Purpose: We analyzed whole transcriptome sequencing in tumors from 23 patients with stage III or IV melanoma from a pilot trial of the anti-GD2 immunocytokine, hu14.18-IL2, to identify predictive immune and/or tumor biomarkers in patients with melanoma at high risk for recurrence.

Experimental Design: Patients were randomly assigned to receive the first of three monthly courses of hu14.18-IL2 immunotherapy either before (Group A) or after (Group B) complete surgical resection of all known disease. Tumors were evaluated by histology and whole transcriptome sequencing.

Results: Tumor-infiltrating lymphocyte (TIL) levels directly associated with relapse-free survival (RFS) and overall survival (OS) in resected tumors from Group A, where early responses to the immunotherapy agent could be assessed. TIL levels directly associated with a previously reported immune signature, which associated with RFS and OS, particularly in Group A tumors. In Group A tumors, there were decreased cell-cycling gene RNA transcripts, but increased RNA transcripts for repair and growth genes. We found that outcome (RFS and OS) was directly associated with several immune signatures and immune-related RNA transcripts and inversely associated with several tumor growth-associated transcripts, particularly in Group A tumors. Most of these associations were not seen in Group B tumors.

Conclusions: We interpret these data to signify that both immunologic and tumoral cell processes, as measured by RNA-sequencing analyses detected shortly after initiation of hu14.18-IL2 therapy, are associated with long-term survival and could potentially be used as prognostic biomarkers in tumor resection specimens obtained after initiating neoadjuvant immunotherapy.

DOI: http://dx.doi.org/10.1158/1078-0432.CCR-19-3294
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC7334053
July 2020

Clinically Relevant Cytotoxic Immune Cell Signatures and Clonal Expansion of T-Cell Receptors in High-Risk MYCN-Not-Amplified Human Neuroblastoma.

Clin Cancer Res 2018 11 21;24(22):5673-5684. Epub 2018 May 21.

Oncogenomics Section, Genetics Branch, Center for Cancer Research, National Cancer Institute, National Institutes of Health, Bethesda, Maryland.

High-risk neuroblastoma is an aggressive disease. DNA sequencing studies have revealed a paucity of actionable genomic alterations and a low mutation burden, posing challenges to developing effective novel therapies. We used RNA sequencing (RNA-seq) to investigate the biology of this disease, with a focus on tumor-infiltrating lymphocytes (TIL). We performed deep RNA-seq on pretreatment diagnostic tumors from 129 high-risk and 21 low- or intermediate-risk patients with neuroblastoma. We used single-sample gene set enrichment analysis to detect gene expression signatures of TILs in tumors and examined their association with clinical and molecular parameters, including patient outcome. The expression profiles of 190 additional pretreatment diagnostic neuroblastomas, a neuroblastoma tissue microarray, and T-cell receptor (TCR) sequencing were used to validate our findings. We found that MYCN-not-amplified (MYCN-NA) tumors had significantly higher cytotoxic TIL signatures than MYCN-amplified (MYCN-A) tumors. A reported MYCN activation signature was significantly associated with poor outcome for high-risk patients with MYCN-NA tumors; however, a subgroup of these patients who had elevated activated natural killer (NK) cell, CD8 T cell, and cytolytic signatures showed improved outcome and expansion of infiltrating TCR clones. Furthermore, we observed upregulation of immune exhaustion marker genes, indicating an immune-suppressive microenvironment in these neuroblastomas. This study provides evidence that RNA signatures of cytotoxic TIL are associated with the presence of activated NK/T cells and improved outcomes in high-risk neuroblastoma patients harboring MYCN-NA tumors. Our findings suggest that these high-risk patients with MYCN-NA neuroblastoma may benefit from additional immunotherapies incorporated into current therapeutic strategies.

DOI: http://dx.doi.org/10.1158/1078-0432.CCR-18-0599
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6504934
November 2018

Identification of non-random sequence properties in groups of signature peptides obtained in random sequence peptide microarray experiments.

Authors:
Igor B Kuznetsov

Biopolymers 2016 May;106(3):318-29

Cancer Research Center and Department of Epidemiology and Biostatistics, University at Albany, State University of New York, One Discovery Drive, Rensselaer, NY, 12144.

Immunosignaturing is an emerging experimental technique that uses random sequence peptide microarrays to detect antibodies produced by the immune system in response to a particular disease. Two important questions regarding immunosignaturing are "Do microarray peptides that exhibit a strong affinity to a given type of antibodies share common sequence properties?" and "If so, what are those properties?" In this work, three statistical tests designed to detect non-random patterns in the amino acid makeup of a group of microarray peptides are presented. One test detects patterns of significantly biased amino acid usage, whereas the other two detect patterns of significant bias in the biochemical properties. These tests do not require a large number of peptides per group. The tests were applied to analyze 19 groups of peptides identified in immunosignaturing experiments as being specific for antibodies produced in response to various types of cancer and other diseases. The positional distribution of the biochemical properties of the amino acids in these 19 peptide groups was also studied. Remarkably, despite the random nature of the sequence libraries used to design the microarrays, a unique group-specific non-random pattern was identified in the majority of the peptide groups studied.
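The simplest version of a test for biased amino acid usage can be sketched as follows: pool the residues of one peptide group and ask, for each amino acid, whether its count exceeds what the library composition would produce by chance. This is a hypothetical illustration using an exact one-sided binomial tail with a Bonferroni correction, not a reproduction of the three tests in the paper; the helper names, the uniform default background and the significance level are assumptions.

```python
from math import comb

# The 20 standard amino acids; a real array library may exclude some (assumption).
STANDARD = "ACDEFGHIKLMNPQRSTVWY"


def binom_sf(k: int, n: int, p: float) -> float:
    """Exact upper tail P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def biased_residues(peptides, background=None, alpha=0.05):
    """Residues over-represented in a pooled peptide group.

    `background` maps residue -> expected frequency in the random library.
    By default every standard residue is assumed equally likely (hypothetical).
    Returns {residue: one-sided p-value} for residues that remain significant
    after a Bonferroni correction over all residues tested.
    """
    if background is None:
        background = {aa: 1 / len(STANDARD) for aa in STANDARD}
    pooled = "".join(peptides)
    n = len(pooled)
    n_tests = len(background)
    hits = {}
    for aa, p in background.items():
        pval = binom_sf(pooled.count(aa), n, p)
        if pval * n_tests < alpha:  # Bonferroni-adjusted threshold
            hits[aa] = pval
    return hits
```

The binomial model treats every pooled residue as an independent draw from the library composition, so in practice `background` should hold the actual design frequencies of the array library rather than the uniform default.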

DOI: http://dx.doi.org/10.1002/bip.22845
May 2016

PR2ALIGN: a stand-alone software program and a web-server for protein sequence alignment using weighted biochemical properties of amino acids.

BMC Res Notes 2015 May 7;8:187. Epub 2015 May 7.

Cancer Research Center and Department of Epidemiology and Biostatistics, University at Albany, State University of New York, One Discovery Drive, Rensselaer, NY, 12144, USA.

Background: Alignment of amino acid sequences is the main sequence comparison method used in computational molecular biology. The selection of the amino acid substitution matrix best suitable for a given alignment problem is one of the most important decisions the user has to make. In a conventional amino acid substitution matrix all elements are fixed and their values cannot be easily adjusted. Moreover, most existing amino acid substitution matrices account for the average (dis)similarities between amino acid types and do not distinguish the contribution of a specific biochemical property to these (dis)similarities.

Findings: PR2ALIGN is a stand-alone software program and a web-server that provide the functionality for implementing flexible user-specified alignment scoring functions and aligning pairs of amino acid sequences based on a comparison of the biochemical property profiles of these sequences. Unlike conventional sequence alignment methods that use fixed 20x20 amino acid substitution matrices, PR2ALIGN uses a set of weighted biochemical properties of amino acids to measure the distance between pairs of aligned residues and finds an optimal, minimum-distance global alignment. The user can provide any number of amino acid properties and specify a weight for each property; the higher the weight of a property, the more it affects the final alignment. We show that in many cases the approach implemented in PR2ALIGN produces better-quality pairwise alignments than the conventional matrix-based approach.
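As an illustration of the idea (not PR2ALIGN's code or scoring), the cost of aligning residues a and b can be a weighted sum of absolute differences in property values, and a Needleman-Wunsch-style dynamic program then finds the minimum-distance global alignment. This sketch assumes two properties, the Kyte-Doolittle hydropathy scale and a simplified side-chain charge, and a linear gap cost of 4.0; the property choice, weights, gap cost and function names are hypothetical.

```python
from math import isclose

# Kyte-Doolittle hydropathy scale.
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5, "Q": -3.5, "E": -3.5,
    "G": -0.4, "H": -3.2, "I": 4.5, "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8,
    "P": -1.6, "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}
# Simplified side-chain charge at neutral pH (His treated as neutral; assumption).
CHARGE = {aa: 0.0 for aa in KD}
CHARGE.update({"K": 1.0, "R": 1.0, "D": -1.0, "E": -1.0})
PROPERTIES = (KD, CHARGE)


def residue_distance(a, b, weights):
    """Weighted sum of absolute property differences between residues a and b."""
    return sum(w * abs(prop[a] - prop[b]) for w, prop in zip(weights, PROPERTIES))


def align(s, t, weights=(1.0, 1.0), gap=4.0):
    """Minimum-distance global alignment with a linear gap cost.

    Returns (total_distance, aligned_s, aligned_t).
    """
    n, m = len(s), len(t)
    # D[i][j] = minimal distance for aligning s[:i] with t[:j].
    D = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        D[i][0] = i * gap
    for j in range(1, m + 1):
        D[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            D[i][j] = min(
                D[i - 1][j - 1] + residue_distance(s[i - 1], t[j - 1], weights),
                D[i - 1][j] + gap,  # residue of s against a gap
                D[i][j - 1] + gap,  # residue of t against a gap
            )
    # Trace back one optimal path from the bottom-right cell.
    i, j, top, bottom = n, m, [], []
    while i > 0 or j > 0:
        diag = D[i - 1][j - 1] + residue_distance(s[i - 1], t[j - 1], weights) if i and j else None
        if diag is not None and isclose(D[i][j], diag, abs_tol=1e-9):
            top.append(s[i - 1])
            bottom.append(t[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and isclose(D[i][j], D[i - 1][j] + gap, abs_tol=1e-9):
            top.append(s[i - 1])
            bottom.append("-")
            i -= 1
        else:
            top.append("-")
            bottom.append(t[j - 1])
            j -= 1
    return D[n][m], "".join(reversed(top)), "".join(reversed(bottom))
```

For example, `align("ILK", "IK")` gaps the leucine rather than pairing it with the charged lysine. Increasing the second weight, e.g. `weights=(1.0, 5.0)`, makes charge mismatches costlier, which is the kind of per-property tuning the abstract describes.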

Conclusions: PR2ALIGN will be helpful for researchers who wish to align amino acid sequences by using flexible user-specified alignment scoring functions based on the biochemical properties of amino acids instead of the amino acid substitution matrix. To the best of the authors' knowledge, there are no existing stand-alone software programs or web-servers analogous to PR2ALIGN. The software is freely available from http://pr2align.rit.albany.edu.

DOI: http://dx.doi.org/10.1186/s13104-015-1152-6
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC4477417
May 2015

Protein sequence alignment with family-specific amino acid similarity matrices.

Authors:
Igor B Kuznetsov

BMC Res Notes 2011 Aug 16;4:296. Epub 2011 Aug 16.

Cancer Research Center, Department of Epidemiology and Biostatistics, University at Albany, State University of New York, One Discovery Drive, Rensselaer, NY, USA 12144.

Background: Alignment of amino acid sequences by means of dynamic programming is a cornerstone sequence comparison method. The quality of alignments produced by dynamic programming critically depends on the choice of the alignment scoring function. Therefore, for a specific alignment problem one needs a way of selecting the best performing scoring function. This work is focused on the issue of finding optimized protein family- and fold-specific scoring functions for global similarity matrix-based sequence alignment.

Findings: I utilize a comprehensive set of reference alignments obtained from structural superposition of homologous and analogous proteins to design a quantitative statistical framework for evaluating the performance of alignment scoring functions in global pairwise sequence alignment. This framework is applied to study how existing general-purpose amino acid similarity matrices perform on individual protein families and structural folds, and to compare them to family-specific and fold-specific matrices derived in this work. I describe an adaptive alignment procedure that automatically selects an appropriate similarity matrix and optimized gap penalties based on the properties of the sequences being aligned.

Conclusions: The results of this work indicate that using family-specific similarity matrices significantly improves the quality of the alignment of homologous sequences over the traditional sequence alignment based on a single general-purpose similarity matrix. However, using fold-specific similarity matrices can only marginally improve sequence alignment of proteins that share the same structural fold but do not share a common evolutionary origin. The family-specific matrices derived in this work and the optimized gap penalties are available at http://taurus.crc.albany.edu/fsm.

DOI: http://dx.doi.org/10.1186/1756-0500-4-296
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3201029
August 2011

Simplified computational methods for the analysis of protein flexibility.

Authors:
Igor B Kuznetsov

Curr Protein Pept Sci 2009 Dec;10(6):607-13

Gen*NY*sis Center for Excellence in Cancer Genomics, Department of Epidemiology and Biostatistics, University at Albany, One Discovery Drive, Rensselaer, NY 12144, USA.

Conformational flexibility is an inherent property of protein structure. Large-scale changes in protein conformation play a key role in a variety of fundamental biological activities and have been implicated in a number of diseases. The time scales of functionally relevant dynamic processes in proteins generally preclude studying them with detailed atomic-level simulations. Therefore, less computationally demanding methods based on coarse-grained models of protein structure, as well as bioinformatics approaches, are particularly important for flexibility-related studies. This review focuses on two broad categories of protein flexibility: protein disorder and conformational switches. In the case of protein disorder, a flexible protein segment or an entire protein is structurally disordered, meaning that it does not have a well-defined folded 3D structure. In the case of conformational switches, the backbone of a flexible segment can change, or "switch", from one specific folded 3D conformation to another. The review discusses the relative strengths and limitations of existing computational tools, mostly from the bioinformatics domain, used to study and predict protein disorder and conformational switches, and highlights the main challenges.

DOI: http://dx.doi.org/10.2174/138920309789630552
December 2009

A web server for inferring the human N-acetyltransferase-2 (NAT2) enzymatic phenotype from NAT2 genotype.

Bioinformatics 2009 May 4;25(9):1185-6. Epub 2009 Mar 4.

Gen*NY*Sis Center for Excellence in Cancer Genomics, Department of Epidemiology and Biostatistics, University at Albany, One Discovery Drive, Rensselaer, NY 12144, USA.

N-acetyltransferase-2 (NAT2) is an important enzyme that catalyzes the acetylation of aromatic and heterocyclic amine carcinogens. Individuals in human populations are divided into three NAT2 acetylator phenotypes: slow, intermediate and rapid. NAT2PRED is a web server that implements a supervised pattern recognition method to infer NAT2 phenotype from SNPs at NAT2 gene positions 282, 341, 481, 590, 803 and 857. The web server can be used for fast determination of NAT2 phenotypes in genetic screens.
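A supervised classifier of this kind needs each genotype turned into a fixed-length feature vector. The sketch below assumes genotypes are given as the number of variant alleles (0, 1 or 2) at each of the six positions listed above and uses a 1-nearest-neighbor rule as a stand-in; it is not the method implemented in NAT2PRED, the function names are hypothetical, and no real genotype-phenotype data are included.

```python
# NAT2 SNP positions used as features (from the abstract).
POSITIONS = (282, 341, 481, 590, 803, 857)


def encode(genotype):
    """Map {position: variant-allele count} to a fixed-order feature vector.

    Positions not mentioned are assumed homozygous for the reference allele (0).
    """
    for pos, count in genotype.items():
        if pos not in POSITIONS:
            raise ValueError(f"unexpected NAT2 position: {pos}")
        if count not in (0, 1, 2):
            raise ValueError(f"variant-allele count must be 0, 1 or 2, got {count}")
    return tuple(genotype.get(pos, 0) for pos in POSITIONS)


def predict_phenotype(genotype, training):
    """1-nearest-neighbor label; `training` is a list of (genotype, phenotype) pairs."""
    query = encode(genotype)

    def distance(example):
        # Manhattan distance between allele-count vectors.
        return sum(abs(a - b) for a, b in zip(query, encode(example[0])))

    return min(training, key=distance)[1]
```

Here `training` would hold genotyped individuals with known acetylator phenotypes. Per-position allele counts discard phase, which a haplotype-based assignment would need.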

Availability: Freely available at http://nat2pred.rit.albany.edu.

Supplementary Information: Supplementary data are available at Bioinformatics online.

DOI: http://dx.doi.org/10.1093/bioinformatics/btp121
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC2672629
May 2009

FlexPred: a web-server for predicting residue positions involved in conformational switches in proteins.

Bioinformation 2008 Nov 5;3(3):134-6. Epub 2008 Nov 5.

Gen*NY*sis Center for Excellence in Cancer Genomics, Department of Epidemiology and Biostatistics, One Discovery Drive, University at Albany, Rensselaer, NY 12144, USA.

Conformational switches observed in the protein backbone play a key role in a variety of fundamental biological activities. This paper describes a web-server that implements a pattern recognition algorithm, trained on examples from the Database of Macromolecular Movements, to predict residue positions involved in conformational switches. Prediction can be performed at an adjustable false positive rate using either a user-supplied protein sequence in FASTA format or a structure in a Protein Data Bank (PDB) file. If a protein sequence is submitted, the web-server uses sequence-derived information only (such as the evolutionary conservation of residue positions). If a PDB file is submitted, the web-server uses sequence-derived information together with residue solvent accessibility calculated from that file.
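Predicting at an adjustable false positive rate amounts to choosing a score cutoff from the scores of known negative (non-switching) positions. A minimal sketch, assuming higher scores mean a position is more likely to be involved in a switch and that negative-class scores are available for calibration; `threshold_for_fpr` and `predict_switching` are hypothetical names, not FlexPred's interface.

```python
import math


def threshold_for_fpr(negative_scores, target_fpr):
    """Score cutoff t such that at most target_fpr of the negatives score above t.

    Residues are then predicted as switching when their score is > t.
    """
    if not 0.0 <= target_fpr <= 1.0:
        raise ValueError("target_fpr must lie in [0, 1]")
    ranked = sorted(negative_scores, reverse=True)
    allowed = math.floor(target_fpr * len(ranked))  # negatives allowed above t
    if allowed >= len(ranked):
        return -math.inf  # every residue may be called positive
    return ranked[allowed]


def predict_switching(scores, cutoff):
    """Indices of residues whose score exceeds the cutoff."""
    return [i for i, s in enumerate(scores) if s > cutoff]
```

Lowering `target_fpr` raises the cutoff, trading sensitivity for fewer false positives.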

Availability: FlexPred is publicly available at http://flexpred.rit.albany.edu.

DOI: http://dx.doi.org/10.6026/97320630003134
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC2639688
June 2010

CFP: a web-server for constructing sequence-based protein conformational flexibility profiles.

Bioinformation 2009 Oct 19;4(5):176-8. Epub 2009 Oct 19.

Cancer Research Center, Department of Epidemiology and Biostatistics, University at Albany, One Discovery Drive, Rensselaer, NY 12144, USA.

Many proteins contain conformationally flexible segments that undergo significant changes in backbone conformation or completely lack a well-defined conformation. We previously developed the generalized local propensity (GLP), a quantitative sequence-based measure of protein backbone flexibility. In this paper, we present the CFP (Conformational Flexibility Profile) web-server, which constructs the GLP flexibility profile for a user-submitted sequence and uses this profile to identify segments with high backbone flexibility. The statistical significance of a flexible sequence segment is assessed with discrete scan statistics based on the density of flexible residues observed in that segment.

Availability: CFP is publicly available at http://cfp.rit.albany.edu.

DOI: http://dx.doi.org/10.6026/97320630004176
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC2859570
October 2009

On the Accuracy of Sequence-Based Computational Inference of Protein Residues Involved in Interactions with DNA.

Trends Appl Sci Res 2008 Dec;3(4):285-291

Gen*NY*sis Center for Excellence in Cancer Genomics, Department of Epidemiology and Biostatistics, University at Albany, One Discovery Drive, Rensselaer, NY 12144, USA.

Methods for computational inference of DNA-binding residues in DNA-binding proteins are usually developed using classification techniques trained to distinguish between binding and non-binding residues on the basis of known examples observed in experimentally determined high-resolution structures of protein-DNA complexes. What degree of accuracy can be expected when a computational method is applied to a particular novel protein remains largely unknown. We test the utility of classification methods using Kernel Logistic Regression (KLR) predictors of DNA-binding residues as an example. We show that predictors that use sequence properties of proteins can successfully predict DNA-binding residues in proteins from a novel structural class. We use Multiple Linear Regression (MLR) to establish a quantitative relationship between protein properties and the expected accuracy of KLR predictors. The present results indicate that, for novel proteins, the expected accuracy provided by an MLR model is close to the actual accuracy and can be used to assess the overall confidence of the prediction.
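The MLR step can be pictured as an ordinary least-squares fit of per-protein prediction accuracy on a few per-protein descriptors, which then yields an expected accuracy for a new protein. The sketch below solves the normal equations by Gauss-Jordan elimination so that it needs only the standard library; the descriptors are left abstract, and `fit_mlr` and `predict` are hypothetical names (the paper's actual descriptors are not reproduced here).

```python
def fit_mlr(X, y):
    """Least-squares coefficients [b0, b1, ..., bk] for y ~ b0 + sum_j bj * X[i][j]."""
    rows = [[1.0, *map(float, xi)] for xi in X]  # prepend an intercept column
    k = len(rows[0])
    # Augmented normal equations [A^T A | A^T y].
    M = [
        [sum(r[a] * r[b] for r in rows) for b in range(k)]
        + [sum(r[a] * yi for r, yi in zip(rows, y))]
        for a in range(k)
    ]
    for col in range(k):
        # Partial pivoting: bring the largest remaining entry into the pivot row.
        pivot = max(range(col, k), key=lambda r: abs(M[r][col]))
        if abs(M[pivot][col]) < 1e-12:
            raise ValueError("design matrix is rank-deficient")
        M[col], M[pivot] = M[pivot], M[col]
        # Eliminate this column from every other row (Gauss-Jordan).
        for r in range(k):
            if r != col:
                factor = M[r][col] / M[col][col]
                M[r] = [a - factor * b for a, b in zip(M[r], M[col])]
    return [M[i][k] / M[i][i] for i in range(k)]


def predict(coef, x):
    """Expected accuracy for one protein with descriptor vector x."""
    return coef[0] + sum(b * xi for b, xi in zip(coef[1:], x))
```

With NumPy available, `numpy.linalg.lstsq` is the more robust choice, since forming the normal equations squares the condition number of the design matrix.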

DOI: http://dx.doi.org/10.3923/tasr.2008.285.291
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC2832327
December 2008

ProBias: a web-server for the identification of user-specified types of compositionally biased segments in protein sequences.

Authors:
Igor B Kuznetsov

Bioinformatics 2008 Jul 14;24(13):1534-5. Epub 2008 May 14.

Gen*NY*sis Center for Excellence in Cancer Genomics, Department of Epidemiology and Biostatistics, One Discovery Drive, University at Albany, Rensselaer, NY 12144, USA.

Most proteins contain compositionally biased segments (CBS) in which one or more amino acid types are significantly overrepresented. CBS that contain amino acids with similar chemical properties can have functional and structural importance. This article describes ProBias, a web-server that searches a protein sequence for CBS composed of user-specified amino acid types. ProBias utilizes the discrete scan statistics to estimate statistical significance of CBS and is able to detect even subtle local deviations from the random independence model. The web-server also analyzes the global compositional bias of the input sequence. In the case of novel proteins that lack functional annotation, statistically significant CBS reported by ProBias can be used to guide the search for potential functionally important sites or domains.

Availability: Freely available at http://lcg.rit.albany.edu/ProBias.

Supplementary Information: Supplementary data are available at Bioinformatics online.

DOI: http://dx.doi.org/10.1093/bioinformatics/btn233
July 2008

Ordered conformational change in the protein backbone: prediction of conformationally variable positions from sequence and low-resolution structural data.

Authors:
Igor B Kuznetsov

Proteins 2008 Jul;72(1):74-87

Department of Epidemiology and Biostatistics, Gen*NY*sis Center for Excellence in Cancer Genomics, University at Albany, Rensselaer, New York 12144, USA.

Ordered conformational changes are an important structural property of proteins and are involved in a variety of fundamental biological activities. Large-scale analyses of the implications of such changes for protein function and dysfunction require efficient methods for automated recognition of conformationally variable residue positions. The goal of this work was to study sequence and low-resolution structural properties of residue positions that change backbone conformation upon changes in protein environment, and the utility of these properties for automated recognition of such conformationally variable positions. This study was performed using a large nonredundant set of experimentally characterized proteins, obtained from the Database of Macromolecular Movements, that undergo ordered conformational transitions. The results of this study show that ordered changes in backbone conformation are not limited to solvent-accessible loop regions. A considerable fraction of conformationally variable positions is observed in helices and strands, and in buried positions. Conformationally variable positions are less conserved in evolution. Local patterns of (a) sequence neighbors, (b) evolutionary conservation, and (c) solvent accessibility can be used to predict conformationally variable positions with balanced sensitivity and specificity, albeit with large variance at the level of individual proteins. However, including a pattern of secondary structure in the prediction scheme results in highly unbalanced performance, in which all conformationally variable positions located in regular secondary structure are misclassified. Application of the present methodology to the prion protein (PrP) shows that conformationally variable positions predicted in its ordered C-terminal domain are located within segments presumed to be involved in refolding of PrP.
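Sensitivity is the fraction of conformationally variable positions that are predicted as such, and specificity is the fraction of invariant positions correctly left unflagged; "balanced" performance means the two are similar. A minimal sketch with a hypothetical function name, assuming per-residue labels where 1 marks a conformationally variable position:

```python
def sensitivity_specificity(y_true, y_pred):
    """Return (sensitivity, specificity, balanced accuracy) for 0/1 labels."""
    if len(y_true) != len(y_pred):
        raise ValueError("label lists must have the same length")
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    # Undefined rates (no positives or no negatives) are reported as NaN.
    sensitivity = tp / (tp + fn) if tp + fn else float("nan")
    specificity = tn / (tn + fp) if tn + fp else float("nan")
    return sensitivity, specificity, (sensitivity + specificity) / 2
```

A predictor that never flags a position (all zeros) scores a specificity of 1.0 but a sensitivity of 0.0, an extreme case of unbalanced performance.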

DOI: http://dx.doi.org/10.1002/prot.21899
July 2008

DP-Bind: a web server for sequence-based prediction of DNA-binding residues in DNA-binding proteins.

Bioinformatics 2007 Mar 19;23(5):634-6. Epub 2007 Jan 19.

Gen*NY*Sis Center for Excellence in Cancer Genomics, Department of Epidemiology and Biostatistics, One Discovery Drive, University at Albany, Rensselaer, NY 12144, USA.

This article describes DP-Bind, a web server for predicting DNA-binding sites in a DNA-binding protein from its amino acid sequence. The web server implements three machine learning methods: support vector machine, kernel logistic regression and penalized logistic regression. Prediction can be performed using either the input sequence alone or an automatically generated profile of evolutionary conservation of the input sequence in the form of a PSI-BLAST position-specific scoring matrix (PSSM). PSSM-based kernel logistic regression achieves an accuracy of 77.2%, a sensitivity of 76.4% and a specificity of 76.6%. The outputs of all three individual methods are combined into a consensus prediction to help identify positions predicted with a high level of confidence.
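One simple way to combine three binary per-residue predictions, and to flag positions on which all methods agree, is a majority vote. This is a sketch under that assumption only: the abstract does not state DP-Bind's exact consensus rule, and `consensus` is a hypothetical name.

```python
def consensus(svm, klr, plr):
    """Majority vote over three equal-length 0/1 prediction lists.

    Returns (votes, unanimous): the majority label for each residue, and a flag
    marking residues on which all three methods agree.
    """
    if not len(svm) == len(klr) == len(plr):
        raise ValueError("predictions must cover the same residues")
    votes, unanimous = [], []
    for a, b, c in zip(svm, klr, plr):
        total = a + b + c
        votes.append(1 if total >= 2 else 0)  # at least two of three say "binding"
        unanimous.append(total in (0, 3))     # all three agree either way
    return votes, unanimous
```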

Availability: Freely available at http://lcg.rit.albany.edu/dp-bind.

Supplementary Information: http://lcg.rit.albany.edu/dp-bind/dpbind_supplement.html.

DOI: http://dx.doi.org/10.1093/bioinformatics/btl672
March 2007

Using evolutionary and structural information to predict DNA-binding sites on DNA-binding proteins.

Proteins 2006 Jul;64(1):19-27

Gen*NY*sis Center for Excellence in Cancer Genomics, Department of Epidemiology and Biostatistics, University at Albany, Rensselaer, New York 12144, USA.

Proteins that interact with DNA are involved in a number of fundamental biological activities such as DNA replication, transcription, and repair. Reliable identification of DNA-binding sites in DNA-binding proteins is important for functional annotation, site-directed mutagenesis, and modeling protein-DNA interactions. We apply Support Vector Machine (SVM), a supervised pattern recognition method, to predict DNA-binding sites in DNA-binding proteins using the following features: amino acid sequence, profile of evolutionary conservation of sequence positions, and low-resolution structural information. We use a rigorous statistical approach to study the performance of predictors that use different combinations of features and how this performance is affected by structural and sequence properties of proteins. Our results indicate that an SVM predictor based on a properly scaled profile of evolutionary conservation in the form of a position-specific scoring matrix (PSSM) significantly outperforms a PSSM-based neural network predictor. The highest accuracy is achieved by the SVM predictor that combines the profile of evolutionary conservation with low-resolution structural information. Our results also show that knowledge-based predictors of DNA-binding sites perform significantly better on proteins from the mainly-alpha structural class and that the performance of these predictors is significantly correlated with certain structural and sequence properties of proteins. These observations suggest that it may be possible to assign a reliability index to the overall accuracy of the prediction of DNA-binding sites in any given protein using its sequence and structural properties. A web-server implementation of the predictors is freely available online at http://lcg.rit.albany.edu/dp-bind/.
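Window-based residue predictors typically describe each residue by the PSSM rows in a window centered on it. A minimal sketch, assuming the PSSM has already been parsed into one list of 20 log-odds scores per residue and that scaling means mapping each score into (0, 1) with the logistic function, a common choice that may differ from the scaling used in the paper; the function names and the default half-width of 5 are hypothetical.

```python
import math


def scale_pssm(pssm):
    """Apply the logistic function 1 / (1 + exp(-x)) to every PSSM entry."""
    return [[1.0 / (1.0 + math.exp(-x)) for x in row] for row in pssm]


def window_features(scaled, i, half_width=5):
    """Concatenate scaled PSSM rows i - half_width .. i + half_width.

    Window positions that fall off either end of the sequence are padded with
    zeros, so every residue gets (2 * half_width + 1) * row_length values.
    """
    row_length = len(scaled[0])
    features = []
    for j in range(i - half_width, i + half_width + 1):
        if 0 <= j < len(scaled):
            features.extend(scaled[j])
        else:
            features.extend([0.0] * row_length)
    return features
```

Each residue then yields a vector of (2 * half_width + 1) * 20 values that can be passed to an SVM implementation of choice.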

DOI: http://dx.doi.org/10.1002/prot.20977
July 2006

A novel sensitive method for the detection of user-defined compositional bias in biological sequences.

Bioinformatics 2006 May 24;22(9):1055-63. Epub 2006 Feb 24.

Gen*NY*sis Center for Excellence in Cancer Genomics, Department of Epidemiology and Biostatistics, University at Albany, State University of New York, One Discovery Drive, Rensselaer, NY 12144, USA.

Motivation: Most biological sequences contain compositionally biased segments in which one or more residue types are significantly overrepresented. The function and evolution of these segments are poorly understood. Usually, all types of compositionally biased segments are masked and ignored during sequence analysis. However, it has been shown for a number of proteins that biased segments that contain amino acids with similar chemical properties are involved in a variety of molecular functions and human diseases. A detailed large-scale analysis of the functional implications and evolutionary conservation of different compositionally biased segments requires a sensitive method capable of detecting user-specified types of compositional bias.

Results: We present BIAS, a novel sensitive method for the detection of compositionally biased segments composed of a user-specified set of residue types. BIAS uses discrete scan statistics, which provide a highly accurate correction for multiple testing, to compute analytical estimates of the significance of each compositionally biased segment. The method can take global compositional bias into account when computing analytical estimates of the significance of local clusters. BIAS is benchmarked against the SEG, SAPS and CAST programs. We also use BIAS to show that groups of proteins with the same biological function are significantly associated with particular types of compositionally biased segments.
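The quantity at the heart of a discrete scan statistic is the largest number of target residues observed in any window of fixed length. BIAS computes analytical significance estimates; the sketch below instead approximates significance by shuffling the sequence, which preserves its global composition, so it should be read as a Monte Carlo stand-in rather than the BIAS method. The window length, number of shuffles and function names are illustrative assumptions.

```python
import random


def max_window_count(seq, targets, w):
    """Largest number of residues from `targets` in any length-w window of seq."""
    if w < 1:
        raise ValueError("window length must be at least 1")
    hits = [1 if aa in targets else 0 for aa in seq]
    if w >= len(hits):
        return sum(hits)
    current = best = sum(hits[:w])
    for i in range(w, len(hits)):
        current += hits[i] - hits[i - w]  # slide the window one residue right
        best = max(best, current)
    return best


def scan_pvalue(seq, targets, w, n_shuffles=1000, seed=0):
    """Monte Carlo p-value of the observed scan statistic under random shuffling."""
    rng = random.Random(seed)
    observed = max_window_count(seq, targets, w)
    residues = list(seq)
    exceed = 0
    for _ in range(n_shuffles):
        rng.shuffle(residues)  # keeps global composition, destroys local clustering
        if max_window_count(residues, targets, w) >= observed:
            exceed += 1
    # Add-one correction so the estimate is never exactly zero.
    return (exceed + 1) / (n_shuffles + 1)
```

For example, `targets={"D", "E"}` asks whether the sequence contains an unusually dense acidic segment.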

DOI: http://dx.doi.org/10.1093/bioinformatics/btl049
May 2006

Comparative computational analysis of prion proteins reveals two fragments with unusual structural properties and a pattern of increase in hydrophobicity associated with disease-promoting mutations.

Protein Sci 2004 Dec;13(12):3230-44

Department of Biomathematical Sciences, Mount Sinai School of Medicine, New York, NY 10029, USA.

Prion diseases are a group of neurodegenerative disorders associated with conversion of a normal prion protein, PrPC, into a pathogenic conformation, PrPSc. PrPSc is thought to promote the conversion of PrPC. The structure and stability of PrPC are well characterized, whereas little is known about the structure of PrPSc, what parts of PrPC undergo conformational transition, or how mutations facilitate this transition. We use a computational knowledge-based approach to analyze the intrinsic structural propensities of the C-terminal domain of PrP and gain insights into possible mechanisms of structural conversion. We compare the properties of PrP sequences to those of a PrP paralog, Doppel, and to the distributions of structural propensities observed in known protein structures from the Protein Data Bank. We show that the prion protein contains at least two sequence fragments with highly unusual intrinsic propensities, PrP(114-125) and helix B. No segments with unusual properties were found in the Doppel protein, which is topologically identical to PrP but does not undergo structural rearrangements. Known disease-promoting PrP mutations form a statistically significant cluster in the region comprising helices B and C. Due to their unusual properties, PrP(114-125) and the C terminus of helix B may be considered primary candidates for sites involved in the conformational transition from PrPC to PrPSc. The results of our study also show that most PrP mutations associated with neurodegenerative disorders increase local hydrophobicity. We suggest that the observed increase in hydrophobicity may facilitate PrP-to-PrP and/or PrP-to-cofactor interactions, and thus promote structural conversion.
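A simple way to quantify a change in local hydrophobicity is to compare the mean hydropathy in a window centered on the mutated position before and after the substitution. This sketch assumes the Kyte-Doolittle scale and a 7-residue window purely for illustration; the paper's own hydrophobicity measure may differ, positions are 0-based, and the function names are hypothetical.

```python
# Kyte-Doolittle hydropathy scale.
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5, "Q": -3.5, "E": -3.5,
    "G": -0.4, "H": -3.2, "I": 4.5, "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8,
    "P": -1.6, "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def local_hydropathy(seq, pos, half_width=3):
    """Mean hydropathy of seq[pos - half_width : pos + half_width + 1], clipped at the ends."""
    if not 0 <= pos < len(seq):
        raise IndexError("position outside the sequence")
    window = seq[max(0, pos - half_width): min(len(seq), pos + half_width + 1)]
    return sum(KD[aa] for aa in window) / len(window)


def mutation_effect(seq, pos, new_aa, half_width=3):
    """Change in local hydropathy when seq[pos] is replaced by new_aa."""
    mutant = seq[:pos] + new_aa + seq[pos + 1:]
    return local_hydropathy(mutant, pos, half_width) - local_hydropathy(seq, pos, half_width)
```

A positive value means the substitution makes the neighborhood more hydrophobic; a proline-to-leucine change, for instance, always gives a positive value on this scale.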

Source
DOI: http://dx.doi.org/10.1110/ps.04833404
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC2287303
December 2004

Similarity between the C-terminal domain of the prion protein and chimpanzee cytomegalovirus glycoprotein UL9.

Protein Eng 2003 Dec;16(12):861-3

Department of Biomathematical Sciences, Mount Sinai School of Medicine, Box 1023, One Gustave L. Levy Place, New York, NY 10029, USA.

Prion diseases are a group of fatal neurodegenerative disorders associated with structural conversion of a normal, mostly alpha-helical cellular prion protein, PrP(C), into a pathogenic beta-sheet-rich conformation, PrP(Sc). The structure of PrP(C) is well studied, whereas the insolubility of PrP(Sc) makes the characterization of its structure problematic. Apart from its paralog PrP-Doppel, which has the same fold, no proteins similar to PrP are known. However, PrP-Doppel does not undergo a structural transition into a beta-sheet-rich conformation. Structural information from proteins that share a weak but significant sequence similarity with PrP may be used to gain additional insights into the conformation of PrP(Sc). We construct a sequence profile corresponding to the structured domain of PrP and use this profile to search the SWISS-PROT and TrEMBL databases. We identify a significant sequence similarity between PrP and chimpanzee cytomegalovirus glycoprotein UL9. This glycoprotein scores higher than all PrP-Doppel sequences. Fold recognition methods assign a mainly-beta fold to UL9. Owing to the observed sequence similarity with PrP and a putative mainly-beta fold, the UL9 glycoprotein may represent a potential target for experimental structure determination aimed at obtaining a structural template for PrP(Sc) modeling.
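
The abstract does not specify how the sequence profile was built or scored. A generic way to illustrate profile-based searching is a position-specific log-odds matrix built from an alignment and slid along a target sequence. The sketch below assumes an ungapped alignment, a pseudocount of 1 and a uniform 1/20 background; these choices, the function names and the toy data are hypothetical and are not the authors' method.

```python
import math

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def build_profile(alignment, pseudocount=1.0):
    """Log2-odds profile from an ungapped alignment, against an assumed uniform background."""
    width = len(alignment[0])
    if any(len(s) != width for s in alignment):
        raise ValueError("this sketch expects equal-length, ungapped sequences")
    background = 1.0 / len(AMINO_ACIDS)
    profile = []
    for col in range(width):
        # Start every residue at the pseudocount so unseen residues get a finite score.
        counts = {aa: pseudocount for aa in AMINO_ACIDS}
        for seq in alignment:
            counts[seq[col]] += 1
        total = sum(counts.values())
        profile.append({aa: math.log2(counts[aa] / total / background) for aa in AMINO_ACIDS})
    return profile


def best_hit(profile, target):
    """Slide the profile along target and return (best_score, start_index)."""
    width = len(profile)
    best_score, best_start = float("-inf"), -1
    for start in range(len(target) - width + 1):
        score = sum(profile[i][target[start + i]] for i in range(width))
        if score > best_score:
            best_score, best_start = score, start
    return best_score, best_start


# Hypothetical toy alignment and target, not real PrP or UL9 data.
profile = build_profile(["ACDE", "ACDE", "ACDF"])
score, start = best_hit(profile, "GGACDEGG")
print(start)  # 2
```

Real profile searches also handle gaps and assess statistical significance of hits; this sketch only shows the scoring idea.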

Source
DOI: http://dx.doi.org/10.1093/protein/gzg113
December 2003

Class-specific correlations between protein folding rate, structure-derived, and sequence-derived descriptors.

Proteins 2004 Feb;54(2):333-41

Department of Biomathematical Sciences, Mount Sinai School of Medicine, New York, New York 10029, USA.

Small single-domain proteins that fold by simple two-state kinetics have been shown to exhibit a wide variation in their folding rates. It has been proposed that folding mechanisms in these proteins are largely determined by the native-state topology, and a significant correlation between folding rate and measures of the average topological complexity, such as relative contact order (RCO), has been reported. We perform a statistical analysis of folding rate and RCO in all three major structural classes (alpha, beta, and alpha/beta) of small two-state proteins and of RCO in groups of analogous and homologous small single-domain proteins with the same topology. We also study the correlation between folding rate and the average physicochemical properties of amino acid sequences in two-state proteins. Our results indicate that 1) helical proteins have statistically distinguishable, class-specific folding rates; 2) RCO accounts for essentially all the variation of folding rate in helical proteins, but for only part of the variation in beta-sheet-containing proteins; and 3) only a small fraction of the protein topologies studied show a topology-specific RCO. We also report a highly significant correlation between the folding rate and average intrinsic structural propensities of protein sequences. These results suggest that intrinsic structural propensities may be an important determinant of the rate of folding in small two-state proteins.
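
Relative contact order is usually defined as the mean sequence separation of contacting residue pairs, normalized by chain length: RCO = (1 / (L * N)) * sum of |i - j| over the N contacts of an L-residue chain. The minimal sketch below computes it from a precomputed contact list; how contacts are detected (for example, a distance cutoff between atoms) is left to the caller and is an assumption here, and the function name and example contacts are hypothetical.

```python
def relative_contact_order(contacts, length):
    """RCO = (1 / (L * N)) * sum of |i - j| over the N contacts, for a chain of L residues."""
    contacts = list(contacts)
    if not contacts or length <= 0:
        raise ValueError("need at least one contact and a positive chain length")
    total_separation = sum(abs(i - j) for i, j in contacts)
    return total_separation / (length * len(contacts))


# Hypothetical contact list for a 10-residue chain: separations 4 and 8.
print(relative_contact_order([(0, 4), (1, 9)], 10))  # 0.6
```

Because RCO is normalized by L, it compares topological complexity across proteins of different sizes; higher values indicate more long-range contacts.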

Source
DOI: http://dx.doi.org/10.1002/prot.10518
February 2004

On the properties and sequence context of structurally ambivalent fragments in proteins.

Protein Sci 2003 Nov;12(11):2420-33

Department of Biomathematical Sciences, Mount Sinai School of Medicine, New York, New York 10029, USA.

The goal of this work is to characterize structurally ambivalent fragments in proteins. We have searched the Protein Data Bank and identified all structurally ambivalent peptides (SAPs) of length five or greater that exist in two different backbone conformations. The SAPs were classified into five distinct categories based on their structure. We propose a novel index that provides a quantitative measure of conformational variability of a sequence fragment. It measures the context-dependent width of the distribution of (phi,psi) dihedral angles associated with each amino acid type. This index was used to analyze the local structural propensity of both SAPs and the sequence fragments contiguous to them. We also analyzed type-specific amino acid composition, solvent accessibility, and overall structural properties of SAPs and their sequence context. We show that each type of SAP has an unusual, type-specific amino acid composition and, as a result, simultaneous intrinsic preferences for two distinct types of backbone conformation. All types of SAPs have lower sequence complexity than average. Fragments that adopt helical conformation in one protein and sheet conformation in another have the lowest sequence complexity and are sampled from a relatively limited repertoire of possible residue combinations. A statistically significant difference between two distinct conformations of the same SAP is observed not only in the overall structural properties of proteins harboring the SAP but also in the properties of its flanking regions and in the pattern of solvent accessibility. These results have implications for protein design and structure prediction.
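
The abstract compares fragments by "sequence complexity" without defining the measure. One simple stand-in is the Shannon entropy of a fragment's residue composition: a homopolymer scores 0 bits, and a fragment of n distinct residues scores log2(n). This is an assumed, illustrative measure (related to, but not the same as, the Wootton-Federhen complexity used by low-complexity filters), not necessarily what the authors used; the function name and fragments are hypothetical.

```python
import math
from collections import Counter


def compositional_entropy(fragment: str) -> float:
    """Shannon entropy (bits) of residue composition; low values indicate low complexity."""
    if not fragment:
        raise ValueError("fragment must be non-empty")
    n = len(fragment)
    # sum of p * log2(1/p) over observed residue types, with p = count / n
    return sum((c / n) * math.log2(n / c) for c in Counter(fragment).values())


# Hypothetical fragments: a homopolymer versus five distinct residues.
print(compositional_entropy("AAAAA"))            # 0.0
print(round(compositional_entropy("ACDEF"), 4))  # 2.3219
```

Entropy of composition ignores residue order, so it captures only the "limited repertoire of residue combinations" aspect, not sequence patterns.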

Source
DOI: http://dx.doi.org/10.1110/ps.03209703
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC2366964
November 2003

Discriminative ability with respect to amino acid types: assessing the performance of knowledge-based potentials without threading.

Proteins 2002 Nov;49(2):266-84

Department of Biomathematical Sciences, Mount Sinai School of Medicine, New York, New York 10029, USA.

We present a novel method designed to analyze the discriminative ability of knowledge-based potentials with respect to the 20 residue types. The method is based on the preference of amino acids for specific types of protein environment, and uses a virtual mutagenesis experiment to estimate how much information a given potential can provide about environments of each amino acid type. This allows one to test and optimize the performance of real potentials at the level of individual amino acids, using actual data on residue environments from a dataset of known protein structures. We have applied our method to long-range and medium-range pairwise distance-dependent potentials. The results of our study indicate that these potentials are only able to discriminate between a very limited number of residue types, and that discriminative ability is extremely sensitive to the choice of parameters used to construct the potentials, and even to the size of the training dataset. We also show that different types of pairwise distance potentials are dominated by different types of interactions. These dominant interactions strongly depend on the type of approximation used to define residue position. For each potential, our methodology is able to identify a potential-specific amino acid distance matrix and a reduced amino acid alphabet of any specified size, which may have implications for sequence alignment and multibody models.
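
The abstract frames discriminative ability as how much information a potential provides about the environments of each residue type. The authors' virtual-mutagenesis procedure is not detailed here, so as a generic, swapped-in illustration only, the sketch below estimates the mutual information (in bits) between residue type and an environment class from observed pairs. The environment labels, function name and data are hypothetical.

```python
import math
from collections import Counter


def mutual_information(pairs) -> float:
    """I(A; E) in bits, estimated from observed (residue_type, environment_class) pairs."""
    pairs = list(pairs)
    if not pairs:
        raise ValueError("need at least one observation")
    n = len(pairs)
    joint = Counter(pairs)
    aa_counts = Counter(a for a, _ in pairs)
    env_counts = Counter(e for _, e in pairs)
    mi = 0.0
    for (aa, env), c in joint.items():
        p_joint = c / n
        p_indep = (aa_counts[aa] / n) * (env_counts[env] / n)
        mi += p_joint * math.log2(p_joint / p_indep)
    return mi


# Hypothetical observations: environment fully determined by residue type -> 1 bit.
print(mutual_information([("L", "buried"), ("K", "exposed")]))  # 1.0
# Environment independent of residue type -> 0 bits.
print(mutual_information([("L", "buried"), ("L", "exposed"), ("K", "buried"), ("K", "exposed")]))  # 0.0
```

This plug-in estimate is biased upward on small samples, which is consistent in spirit with the abstract's observation that results are sensitive to training-set size.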

Source
DOI: http://dx.doi.org/10.1002/prot.10211
November 2002
-->