Publications by authors named "Asa Ben-Hur"

48 Publications

Decoding co-/post-transcriptional complexities of plant transcriptomes and epitranscriptome using next-generation sequencing technologies.

Biochem Soc Trans 2020 12;48(6):2399-2414

Basic Forestry and Proteomics Research Center, College of Forestry, Fujian Agriculture and Forestry University, Fuzhou 350002, China.

Next-generation sequencing (NGS) technologies - Illumina RNA-seq, Pacific Biosciences isoform sequencing (PacBio Iso-seq), and Oxford Nanopore direct RNA sequencing (DRS) - have revealed the complexity of plant transcriptomes and their regulation at the co-/post-transcriptional level. Global analysis of mature mRNAs, transcripts from nuclear run-on assays, and nascent chromatin-bound mRNAs using short as well as full-length and single-molecule DRS reads has uncovered potential roles of different forms of RNA polymerase II during the transcription process, and the extent of co-transcriptional pre-mRNA splicing and polyadenylation. These tools have also allowed mapping of transcriptome-wide start sites in cap-containing RNAs, poly(A) site choice, poly(A) tail length, and RNA base modifications. The emerging theme from recent studies is that reprogramming of gene expression in response to developmental cues and stresses at the co-/post-transcriptional level likely plays a crucial role in eliciting appropriate responses for optimal growth and plant survival under adverse conditions. Although the mechanisms by which developmental cues and different stresses regulate co-/post-transcriptional splicing are largely unknown, a few recent studies indicate that the external cues target spliceosomal and splicing regulatory proteins to modulate alternative splicing. In this review, we provide an overview of recent discoveries on the dynamics and complexities of plant transcriptomes, present mechanistic insights into splicing regulation, and discuss critical gaps in co-/post-transcriptional research that need to be addressed using diverse genomic and biochemical approaches.

Source
http://dx.doi.org/10.1042/BST20190492
December 2020

Splicing Factor Transcript Abundance in Saliva as a Diagnostic Tool for Breast Cancer.

Genes (Basel) 2020 08 3;11(8). Epub 2020 Aug 3.

Department of Biochemistry and Molecular Biology, The Institute for Medical Research Israel-Canada, Faculty of Medicine, The Hebrew University of Jerusalem, Jerusalem 9112102, Israel.

Breast cancer is the second leading cause of death in women above 60 years in the US. Screening mammography is recommended for women above 50 years; however, 22% of breast cancer cases are diagnosed in women below this age. We set out to develop a test based on the detection of cell-free RNA from saliva. To this end, we sequenced RNA from a pool of ten women. The 1254 transcripts identified were enriched for genes with an annotation of alternative pre-mRNA splicing. Pre-mRNA splicing is a tightly regulated process and its misregulation in cancer cells promotes the formation of cancer-driving isoforms. For these reasons, we chose to focus on splicing factors as biomarkers for the early detection of breast cancer. We found that the level of the splicing factors is unique to each woman and consistent in the same woman at different time points. Next, we extracted RNA from 36 healthy subjects and 31 breast cancer patients. Recording the mRNA level of seven splicing factors in these samples demonstrated that the combination of all these factors is different in the two groups (p value = 0.005). Our results demonstrate a differential abundance of splicing factor mRNA in the saliva of breast cancer patients.

Source
http://dx.doi.org/10.3390/genes11080880
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC7463790
August 2020

Digital Image Analysis of Heterogeneous Tuberculosis Pulmonary Pathology in Non-Clinical Animal Models using Deep Convolutional Neural Networks.

Sci Rep 2020 04 8;10(1):6047. Epub 2020 Apr 8.

Mycobacteria Research Laboratories, Department of Microbiology, Immunology and Pathology, Colorado State University, Fort Collins, Colorado, United States of America.

Efforts to develop effective and safe drugs for treatment of tuberculosis require preclinical evaluation in animal models. Alongside efficacy testing of novel therapies, effects on pulmonary pathology and disease progression are monitored by using histopathology images from these infected animals. To compare the severity of disease across treatment cohorts, pathologists have historically assigned a semi-quantitative histopathology score that may be subjective, depending on their training, experience, and personal bias. Manual histopathology therefore has limitations regarding reproducibility between studies and pathologists, potentially masking successful treatments. This report describes a pathologist-assistive software tool that reduces these user limitations, while providing a rapid, quantitative scoring system for digital histopathology image analysis. The software, called 'Lesion Image Recognition and Analysis' (LIRA), employs convolutional neural networks to classify seven different pathology features, including three different lesion types from pulmonary tissues of the C3HeB/FeJ tuberculosis mouse model. LIRA was developed to improve the efficiency of histopathology analysis for mouse tuberculosis infection models, but this approach also has broader applications to other disease models and tissues. The full source code and documentation are available from https://Github.com/TB-imaging/LIRA.

Source
http://dx.doi.org/10.1038/s41598-020-62960-6
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC7142129
April 2020

Transcriptome Analysis of Drought-Resistant and Drought-Sensitive Sorghum (Sorghum bicolor) Genotypes in Response to PEG-Induced Drought Stress.

Int J Mol Sci 2020 Jan 24;21(3). Epub 2020 Jan 24.

Department of Biology and Cell and Molecular Biology Program, Colorado State University, Fort Collins, CO 80523, USA.

Drought is a major limiting factor of crop yields. In response to drought, plants reprogram their gene expression, which ultimately regulates a multitude of biochemical and physiological processes. The timing of this reprogramming and the nature of the drought-regulated genes in different genotypes are thought to confer differential tolerance to drought stress. Sorghum is a highly drought-tolerant crop and has been increasingly used as a model cereal to identify genes that confer tolerance. Also, there is considerable natural variation in resistance to drought in different sorghum genotypes. Here, we evaluated drought resistance in four genotypes to polyethylene glycol (PEG)-induced drought stress at the seedling stage and performed transcriptome analysis in seedlings of sorghum genotypes that are either drought-resistant or drought-sensitive to identify drought-regulated changes in gene expression that are unique to drought-resistant genotypes of sorghum. Our analysis revealed that about 180 genes are differentially regulated in response to drought stress only in drought-resistant genotypes and most of these (over 70%) are up-regulated in response to drought. Among these, about 70 genes are novel with no known function and the remaining are transcription factors, signaling and stress-related proteins implicated in drought tolerance in other crops. This study revealed a set of drought-regulated genes, including many genes encoding uncharacterized proteins that are associated with drought tolerance at the seedling stage.

Source
http://dx.doi.org/10.3390/ijms21030772
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC7037816
January 2020

The CAFA challenge reports improved protein function prediction and new functional annotations for hundreds of genes through experimental screens.

Authors:
Naihui Zhou Yuxiang Jiang Timothy R Bergquist Alexandra J Lee Balint Z Kacsoh Alex W Crocker Kimberley A Lewis George Georghiou Huy N Nguyen Md Nafiz Hamid Larry Davis Tunca Dogan Volkan Atalay Ahmet S Rifaioglu Alperen Dalkıran Rengul Cetin Atalay Chengxin Zhang Rebecca L Hurto Peter L Freddolino Yang Zhang Prajwal Bhat Fran Supek José M Fernández Branislava Gemovic Vladimir R Perovic Radoslav S Davidović Neven Sumonja Nevena Veljkovic Ehsaneddin Asgari Mohammad R K Mofrad Giuseppe Profiti Castrense Savojardo Pier Luigi Martelli Rita Casadio Florian Boecker Heiko Schoof Indika Kahanda Natalie Thurlby Alice C McHardy Alexandre Renaux Rabie Saidi Julian Gough Alex A Freitas Magdalena Antczak Fabio Fabris Mark N Wass Jie Hou Jianlin Cheng Zheng Wang Alfonso E Romero Alberto Paccanaro Haixuan Yang Tatyana Goldberg Chenguang Zhao Liisa Holm Petri Törönen Alan J Medlar Elaine Zosa Itamar Borukhov Ilya Novikov Angela Wilkins Olivier Lichtarge Po-Han Chi Wei-Cheng Tseng Michal Linial Peter W Rose Christophe Dessimoz Vedrana Vidulin Saso Dzeroski Ian Sillitoe Sayoni Das Jonathan Gill Lees David T Jones Cen Wan Domenico Cozzetto Rui Fa Mateo Torres Alex Warwick Vesztrocy Jose Manuel Rodriguez Michael L Tress Marco Frasca Marco Notaro Giuliano Grossi Alessandro Petrini Matteo Re Giorgio Valentini Marco Mesiti Daniel B Roche Jonas Reeb David W Ritchie Sabeur Aridhi Seyed Ziaeddin Alborzi Marie-Dominique Devignes Da Chen Emily Koo Richard Bonneau Vladimir Gligorijević Meet Barot Hai Fang Stefano Toppo Enrico Lavezzo Marco Falda Michele Berselli Silvio C E Tosatto Marco Carraro Damiano Piovesan Hafeez Ur Rehman Qizhong Mao Shanshan Zhang Slobodan Vucetic Gage S Black Dane Jo Erica Suh Jonathan B Dayton Dallas J Larsen Ashton R Omdahl Liam J McGuffin Danielle A Brackenridge Patricia C Babbitt Jeffrey M Yunes Paolo Fontana Feng Zhang Shanfeng Zhu Ronghui You Zihan Zhang Suyang Dai Shuwei Yao Weidong Tian Renzhi Cao Caleb Chandler Miguel Amezola Devon Johnson Jia-Ming Chang 
Wen-Hung Liao Yi-Wei Liu Stefano Pascarelli Yotam Frank Robert Hoehndorf Maxat Kulmanov Imane Boudellioua Gianfranco Politano Stefano Di Carlo Alfredo Benso Kai Hakala Filip Ginter Farrokh Mehryary Suwisa Kaewphan Jari Björne Hans Moen Martti E E Tolvanen Tapio Salakoski Daisuke Kihara Aashish Jain Tomislav Šmuc Adrian Altenhoff Asa Ben-Hur Burkhard Rost Steven E Brenner Christine A Orengo Constance J Jeffery Giovanni Bosco Deborah A Hogan Maria J Martin Claire O'Donovan Sean D Mooney Casey S Greene Predrag Radivojac Iddo Friedberg

Genome Biol 2019 11 19;20(1):244. Epub 2019 Nov 19.

Veterinary Microbiology and Preventive Medicine, Iowa State University, Ames, IA, USA.

Background: The Critical Assessment of Functional Annotation (CAFA) is an ongoing, global, community-driven effort to evaluate and improve the computational annotation of protein function.

Results: Here, we report on the results of the third CAFA challenge, CAFA3, that featured an expanded analysis over the previous CAFA rounds, both in terms of volume of data analyzed and the types of analysis performed. In a major new development, computational predictions and assessment goals drove some of the experimental assays, resulting in new functional annotations for more than 1000 genes. Specifically, we performed experimental whole-genome mutation screening in Candida albicans and Pseudomonas aeruginosa genomes, which provided us with genome-wide experimental data for genes associated with biofilm formation and motility. We further performed targeted assays on selected genes in Drosophila melanogaster, which we suspected of being involved in long-term memory.

Conclusion: We conclude that while predictions of the molecular function and biological process annotations have slightly improved over time, those of the cellular component have not. Term-centric prediction of experimental annotations remains equally challenging; although the performance of the top methods is significantly better than the expectations set by baseline methods in C. albicans and D. melanogaster, it leaves considerable room and need for improvement. Finally, we report that the CAFA community now involves a broad range of participants with expertise in bioinformatics, biological experimentation, biocuration, and bio-ontologies, working together to improve functional annotation, computational function prediction, and our ability to manage big data in the era of large experimental screens.

Source
http://dx.doi.org/10.1186/s13059-019-1835-8
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6864930
November 2019

Comprehensive evaluation of deep learning architectures for prediction of DNA/RNA sequence binding specificities.

Bioinformatics 2019 07;35(14):i269-i277

Department of Computer Science, Colorado State University, Fort Collins, CO, USA.

Motivation: Deep learning architectures have recently demonstrated their power in predicting DNA- and RNA-binding specificity. Existing methods fall into three classes: Some are based on convolutional neural networks (CNNs), others use recurrent neural networks (RNNs) and others rely on hybrid architectures combining CNNs and RNNs. However, based on existing studies the relative merit of the various architectures remains unclear.

Results: In this study we present a systematic exploration of deep learning architectures for predicting DNA- and RNA-binding specificity. For this purpose, we present deepRAM, an end-to-end deep learning tool that provides an implementation of a wide selection of architectures; its fully automatic model selection procedure allows us to perform a fair and unbiased comparison of deep learning architectures. We find that deeper more complex architectures provide a clear advantage with sufficient training data, and that hybrid CNN/RNN architectures outperform other methods in terms of accuracy. Our work provides guidelines that can assist the practitioner in choosing an appropriate network architecture, and provides insight on the difference between the models learned by convolutional and recurrent networks. In particular, we find that although recurrent networks improve model accuracy, this comes at the expense of a loss in the interpretability of the features learned by the model.

Availability And Implementation: The source code for deepRAM is available at https://github.com/MedChaabane/deepRAM.

Supplementary Information: Supplementary data are available at Bioinformatics online.
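
The convolutional models compared in this abstract take a nucleotide sequence as a numeric matrix and slide learned filters across it. The following is a minimal sketch of that input encoding and of a single filter scan, written in plain Python for illustration only; it is not code from deepRAM, and the names `one_hot` and `scan` are hypothetical.

```python
ALPHABET = "ACGT"  # column order of the one-hot encoding (an assumption of this sketch)


def one_hot(seq):
    """Encode a DNA sequence as a list of 4-element rows, one row per base.

    Bases outside ACGT (for example N) become all-zero rows.
    """
    return [[1.0 if base == b else 0.0 for b in ALPHABET] for base in seq.upper()]


def scan(seq, filt):
    """Slide a filter (k rows of 4 weights) along the sequence.

    Returns one score per window start, i.e. the output of a single
    1D convolutional filter with stride 1 and no padding.
    """
    x = one_hot(seq)
    k = len(filt)
    return [
        sum(x[i + j][c] * filt[j][c] for j in range(k) for c in range(4))
        for i in range(len(x) - k + 1)
    ]
```

Using the one-hot encoding of a short motif as the filter, for example `scan("ACGT", one_hot("CG"))`, gives the highest score at the window where the motif occurs. In a trained network the filter weights are learned rather than set by hand.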

Source
http://dx.doi.org/10.1093/bioinformatics/btz339
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6612801
July 2019

Development of the Automated Primer Design Workflow Uniqprimer and Diagnostic Primers for the Broad-Host-Range Plant Pathogen Dickeya dianthicola.

Plant Dis 2019 Nov 20;103(11):2893-2902. Epub 2019 Aug 20.

The Connecticut Agricultural Experiment Station, New Haven, CT 06511, U.S.A.

Uniqprimer, a software pipeline developed in Python, was deployed as a user-friendly internet tool in Rice Galaxy for comparative genome analyses to design primer sets for PCR assays capable of detecting target bacterial taxa. The pipeline was trialed with Dickeya dianthicola, a destructive broad-host-range bacterial pathogen found in most potato-growing regions. Dickeya is a highly variable genus, and some primers available to detect this genus and species exhibit common diagnostic failures. Upon uploading a selection of target and nontarget genomes, six primer sets were rapidly identified with Uniqprimer, of which two were specific and sensitive when tested with D. dianthicola. The remaining four amplified a minority of the nontarget strains tested. The two promising candidate primer sets were trialed with DNA isolated from 116 field samples from across the United States that were previously submitted for testing. D. dianthicola was detected in 41 samples, demonstrating the applicability of our detection primers and suggesting widespread occurrence of D. dianthicola in North America.

Source
http://dx.doi.org/10.1094/PDIS-10-18-1819-RE
November 2019

Learning protein binding affinity using privileged information.

BMC Bioinformatics 2018 Nov 15;19(1):425. Epub 2018 Nov 15.

Biomedical Informatics Research Laboratory (BIRL), Department of Computer and Information Sciences (DCIS), Pakistan Institute of Engineering and Applied Sciences (PIEAS), Nilore, ISL, 45650, Pakistan.

Background: Determining protein-protein interactions and their binding affinity is important in understanding cellular biological processes, discovery and design of novel therapeutics, protein engineering, and mutagenesis studies. Due to the time and effort required in wet lab experiments, computational prediction of binding affinity from sequence or structure is an important area of research. Structure-based methods, though more accurate than sequence-based techniques, are limited in their applicability due to limited availability of protein structure data.

Results: In this study, we propose a novel machine learning method for predicting binding affinity that uses protein 3D structure as privileged information at training time while expecting only protein sequence information during testing. Using the method, which is based on the framework of learning using privileged information (LUPI), we have achieved improved performance over corresponding sequence-based binding affinity prediction methods that do not have access to privileged information during training. Our experiments show that with the proposed framework which uses structure only during training, it is possible to achieve classification performance comparable to that which is obtained using structure-based features. Evaluation on an independent test set shows improved performance over the PPA-Pred2 method as well.

Conclusions: The proposed method outperforms several baseline learners and a state-of-the-art binding affinity predictor not only in cross-validation, but also on an additional validation dataset, demonstrating the utility of the LUPI framework for problems that would benefit from classification using structure-based features. The implementation of LUPI developed for this work is expected to be useful in other areas of bioinformatics as well.

Source
http://dx.doi.org/10.1186/s12859-018-2448-z
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6238365
November 2018

Abiotic Stresses Modulate Landscape of Poplar Transcriptome via Alternative Splicing, Differential Intron Retention, and Isoform Ratio Switching.

Front Plant Sci 2018 12;9. Epub 2018 Feb 12.

Department of Botany and Plant Pathology, Oregon State University, Corvallis, OR, United States.

Abiotic stresses affect plant physiology, development, and growth, and alter pre-mRNA splicing. Western poplar is a model woody tree and a potential bioenergy feedstock. To investigate the extent of stress-regulated alternative splicing (AS), we conducted an in-depth survey of leaf, root, and stem xylem transcriptomes under drought, salt, or temperature stress. Analysis of approximately one billion genome-aligned RNA-Seq reads from tissue- or stress-specific libraries revealed over fifteen million novel splice junctions. Transcript models supported by both RNA-Seq and single molecule isoform sequencing (Iso-Seq) data revealed a broad array of novel stress- and/or tissue-specific isoforms. Analysis of Iso-Seq data also resulted in the discovery of 15,087 novel transcribed regions, of which 164 show AS. Our findings demonstrate that abiotic stresses profoundly perturb transcript isoform profiles and trigger widespread intron retention (IR) events. Stress treatments often increased or decreased retention of specific introns - a phenomenon described here as differential intron retention (DIR). Many differentially retained introns were regulated in a stress- and/or tissue-specific manner. A subset of transcripts harboring super stress-responsive DIR events showed persisting fluctuations in the degree of IR across all treatments and tissue types. To investigate coordinated dynamics of intron-containing transcripts in the study, we quantified absolute copy number of isoforms of two conserved transcription factors (TFs) using Droplet Digital PCR. This case study suggests that stress treatments can be associated with coordinated switches in relative ratios between fully spliced and intron-retaining isoforms, and that these switches may play a role in adjusting the transcriptome to abiotic stresses.

Source
http://dx.doi.org/10.3389/fpls.2018.00005
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC5816337
February 2018

Exploring the relationship between intron retention and chromatin accessibility in plants.

BMC Genomics 2018 01 5;19(1):21. Epub 2018 Jan 5.

Computer Science Department, Colorado State University, 1873 Campus Delivery, Fort Collins, 80523, CO, USA.

Background: Intron retention (IR) is the most prevalent form of alternative splicing in plants. IR, like other forms of alternative splicing, has an important role in increasing gene product diversity and regulating transcript functionality. Splicing is known to occur co-transcriptionally and is influenced by the speed of transcription which in turn, is affected by chromatin structure. It follows that chromatin structure may have an important role in the regulation of splicing, and there is preliminary evidence in metazoans to suggest that this is indeed the case; however, nothing is known about the role of chromatin structure in regulating IR in plants. DNase I-seq is a useful experimental tool for genome-wide interrogation of chromatin accessibility, providing information on regions of chromatin with very high likelihood of cleavage by the enzyme DNase I, known as DNase I Hypersensitive Sites (DHSs). While it is well-established that promoter regions are highly accessible and are over-represented with DHSs, not much is known about DHSs in the bodies of genes, and their relationship to splicing in general, and IR in particular.

Results: In this study we use publicly available DNase I-seq data in arabidopsis and rice to investigate the relationship between IR and chromatin structure. We find that IR events are highly enriched in DHSs in both species. This implies that chromatin is more open in retained introns, which is consistent with a kinetic model of the process whereby higher speeds of transcription in those regions give less time for the spliceosomal machinery to recognize and splice out those introns co-transcriptionally. The more open chromatin in IR can also be the result of regulation mediated by DNA-binding proteins. To test this, we performed an exhaustive search for footprints left by DNA-binding proteins that are associated with IR. We identified several hundred short sequence elements that exhibit footprints in their DNase I-seq coverage, the telltale sign for binding events of a regulatory protein, protecting its binding site from cleavage by DNase I. A highly significant fraction of those sequence elements are conserved between arabidopsis and rice, a strong indication of their functional importance.

Conclusions: In this study we have established an association between IR and chromatin accessibility, and presented a mechanistic hypothesis that explains the observed association from the perspective of the co-transcriptional nature of splicing. Furthermore, we identified conserved sequence elements for DNA-binding proteins that affect splicing.

Source
http://dx.doi.org/10.1186/s12864-017-4393-z
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC5756433
January 2018

Amino acid composition predicts prion activity.

PLoS Comput Biol 2017 04 10;13(4):e1005465. Epub 2017 Apr 10.

Department of Computer Science, Colorado State University, Fort Collins, Colorado, United States of America.

Many prion-forming proteins contain glutamine/asparagine (Q/N) rich domains, and there are conflicting opinions as to the role of primary sequence in their conversion to the prion form: is this phenomenon driven primarily by amino acid composition, or, as a recent computational analysis suggested, is it dependent on the presence of short sequence elements with high amyloid-forming potential? The argument for the importance of short sequence elements hinged on the relatively high accuracy obtained using a method that utilizes a collection of length-six sequence elements with known amyloid-forming potential. We weigh in on this question and demonstrate that when those sequence elements are permuted, even higher accuracy is obtained; we also propose a novel multiple-instance machine learning method that uses sequence composition alone, and achieves better accuracy than all existing prion prediction approaches. While we expect there to be elements of primary sequence that affect the process, our experiments suggest that sequence composition alone is sufficient for predicting protein sequences that are likely to form prions. A web-server for the proposed method is available at http://faculty.pieas.edu.pk/fayyaz/prank.html, and the code for reproducing our experiments is available at http://doi.org/10.5281/zenodo.167136.
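
The distinction the abstract draws between composition and primary sequence can be made concrete: an amino acid composition vector ignores residue order, so any permutation of a sequence yields exactly the same features. The following is a minimal illustrative sketch in Python, not the authors' multiple-instance method; the function names `composition` and `qn_fraction` are hypothetical.

```python
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def composition(seq):
    """Return the fraction of each standard amino acid in a protein sequence.

    Residue order is discarded, so permuted sequences give identical output.
    """
    counts = Counter(seq.upper())
    total = sum(counts[a] for a in AMINO_ACIDS)
    if total == 0:
        raise ValueError("sequence contains no standard amino acids")
    return {a: counts[a] / total for a in AMINO_ACIDS}


def qn_fraction(seq):
    """Combined fraction of glutamine (Q) and asparagine (N) residues."""
    comp = composition(seq)
    return comp["Q"] + comp["N"]
```

For instance, `composition("QQNNSY")` and `composition("YSNQNQ")` are equal, which is why a composition-only predictor cannot be affected by shuffling short sequence elements.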

Source
http://dx.doi.org/10.1371/journal.pcbi.1005465
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC5402983
April 2017

An expanded evaluation of protein function prediction methods shows an improvement in accuracy.

Authors:
Yuxiang Jiang Tal Ronnen Oron Wyatt T Clark Asma R Bankapur Daniel D'Andrea Rosalba Lepore Christopher S Funk Indika Kahanda Karin M Verspoor Asa Ben-Hur Da Chen Emily Koo Duncan Penfold-Brown Dennis Shasha Noah Youngs Richard Bonneau Alexandra Lin Sayed M E Sahraeian Pier Luigi Martelli Giuseppe Profiti Rita Casadio Renzhi Cao Zhaolong Zhong Jianlin Cheng Adrian Altenhoff Nives Skunca Christophe Dessimoz Tunca Dogan Kai Hakala Suwisa Kaewphan Farrokh Mehryary Tapio Salakoski Filip Ginter Hai Fang Ben Smithers Matt Oates Julian Gough Petri Törönen Patrik Koskinen Liisa Holm Ching-Tai Chen Wen-Lian Hsu Kevin Bryson Domenico Cozzetto Federico Minneci David T Jones Samuel Chapman Dukka Bkc Ishita K Khan Daisuke Kihara Dan Ofer Nadav Rappoport Amos Stern Elena Cibrian-Uhalte Paul Denny Rebecca E Foulger Reija Hieta Duncan Legge Ruth C Lovering Michele Magrane Anna N Melidoni Prudence Mutowo-Meullenet Klemens Pichler Aleksandra Shypitsyna Biao Li Pooya Zakeri Sarah ElShal Léon-Charles Tranchevent Sayoni Das Natalie L Dawson David Lee Jonathan G Lees Ian Sillitoe Prajwal Bhat Tamás Nepusz Alfonso E Romero Rajkumar Sasidharan Haixuan Yang Alberto Paccanaro Jesse Gillis Adriana E Sedeño-Cortés Paul Pavlidis Shou Feng Juan M Cejuela Tatyana Goldberg Tobias Hamp Lothar Richter Asaf Salamov Toni Gabaldon Marina Marcet-Houben Fran Supek Qingtian Gong Wei Ning Yuanpeng Zhou Weidong Tian Marco Falda Paolo Fontana Enrico Lavezzo Stefano Toppo Carlo Ferrari Manuel Giollo Damiano Piovesan Silvio C E Tosatto Angela Del Pozo José M Fernández Paolo Maietta Alfonso Valencia Michael L Tress Alfredo Benso Stefano Di Carlo Gianfranco Politano Alessandro Savino Hafeez Ur Rehman Matteo Re Marco Mesiti Giorgio Valentini Joachim W Bargsten Aalt D J van Dijk Branislava Gemovic Sanja Glisic Vladmir Perovic Veljko Veljkovic Nevena Veljkovic Danillo C Almeida-E-Silva Ricardo Z N Vencio Malvika Sharan Jörg Vogel Lakesh Kansakar Shanshan Zhang Slobodan Vucetic Zheng Wang Michael J E Sternberg Mark N Wass Rachael P Huntley Maria J Martin Claire O'Donovan Peter N Robinson Yves Moreau Anna Tramontano Patricia C Babbitt Steven E Brenner Michal Linial Christine A Orengo Burkhard Rost Casey S Greene Sean D Mooney Iddo Friedberg Predrag Radivojac

Genome Biol 2016 09 7;17(1):184. Epub 2016 Sep 7.

Department of Computer Science and Informatics, Indiana University, Bloomington, IN, USA.

Background: A major bottleneck in our understanding of the molecular underpinnings of life is the assignment of function to proteins. While molecular experiments provide the most reliable annotation of proteins, their relatively low throughput and restricted purview have led to an increasing role for computational function prediction. However, assessing methods for protein function prediction and tracking progress in the field remain challenging.

Results: We conducted the second critical assessment of functional annotation (CAFA), a timed challenge to assess computational methods that automatically assign protein function. We evaluated 126 methods from 56 research groups for their ability to predict biological functions using Gene Ontology and gene-disease associations using Human Phenotype Ontology on a set of 3681 proteins from 18 species. CAFA2 featured expanded analysis compared with CAFA1, with regards to data set size, variety, and assessment metrics. To review progress in the field, the analysis compared the best methods from CAFA1 to those of CAFA2.

Conclusions: The top-performing methods in CAFA2 outperformed those from CAFA1. This increased accuracy can be attributed to a combination of the growing number of experimental annotations and improved methods for function prediction. The assessment also revealed that the definition of top-performing algorithms is ontology specific, that different performance metrics can be used to probe the nature of accurate predictions, and the relative diversity of predictions in the biological process and human phenotype ontologies. While there was methodological improvement between CAFA1 and CAFA2, the interpretation of results and usefulness of individual methods remain context-dependent.

Source
http://dx.doi.org/10.1186/s13059-016-1037-6
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC5015320
September 2016

aPPRove: An HMM-Based Method for Accurate Prediction of RNA-Pentatricopeptide Repeat Protein Binding Events.

PLoS One 2016 25;11(8):e0160645. Epub 2016 Aug 25.

Department of Computer and Information Science and Engineering, University of Florida, Gainesville, FL, 32611, United States of America.

Pentatricopeptide repeat containing proteins (PPRs) bind to RNA transcripts originating from mitochondria and plastids. There are two classes of PPR proteins. The P class contains tandem P-type motif sequences, and the PLS class contains alternating P, L and S type sequences. In this paper, we describe a novel tool that predicts PPR-RNA interaction; specifically, our method, which we call aPPRove, determines where and how a [Formula: see text]-class PPR protein will bind to RNA when given a PPR and one or more RNA transcripts by using a combinatorial binding code for site specificity proposed by Barkan et al. Our results demonstrate that aPPRove successfully locates how and where a PPR protein belonging to the [Formula: see text] class can bind to RNA. For each binding event it outputs the binding site, the amino-acid-nucleotide interaction, and its statistical significance. Furthermore, we show that our method can be used to predict binding events for [Formula: see text]-class proteins using a known edit site and the statistical significance of aligning the PPR protein to that site. In particular, we use our method to make a conjecture regarding an interaction between CLB19 and the second intronic region of ycf3. The aPPRove web server can be found at www.cs.colostate.edu/~approve.

Source
http://journals.plos.org/plosone/article?id=10.1371/journal.pone.0160645
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC4999063
August 2017

A survey of the sorghum transcriptome using single-molecule long reads.

Nat Commun 2016 06 24;7:11706. Epub 2016 Jun 24.

Department of Biology, Program in Molecular Plant Biology, Program in Cell and Molecular Biology, Colorado State University, Fort Collins, Colorado 80523, USA.

Alternative splicing and alternative polyadenylation (APA) of pre-mRNAs greatly contribute to transcriptome diversity, coding capacity of a genome and gene regulatory mechanisms in eukaryotes. Second-generation sequencing technologies have been extensively used to analyse transcriptomes. However, a major limitation of short-read data is that it is difficult to accurately predict full-length splice isoforms. Here we sequenced the sorghum transcriptome using Pacific Biosciences single-molecule real-time long-read isoform sequencing and developed a pipeline called TAPIS (Transcriptome Analysis Pipeline for Isoform Sequencing) to identify full-length splice isoforms and APA sites. Our analysis reveals transcriptome-wide full-length isoforms at an unprecedented scale with over 11,000 novel splice isoforms. Additionally, we uncover APA of ∼11,000 expressed genes and more than 2,100 novel genes. These results greatly enhance sorghum gene annotations and aid in studying gene regulation in this important bioenergy crop. The TAPIS pipeline will serve as a useful tool to analyse Iso-Seq data from any organism.

Source
DOI: http://dx.doi.org/10.1038/ncomms11706
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC4931028
June 2016

PHENOstruct: Prediction of human phenotype ontology terms using heterogeneous data sources.

F1000Res 2015 16;4:259. Epub 2015 Jul 16.

Department of Computer Science, Colorado State University, Fort Collins, CO, 80523, USA.

The human phenotype ontology (HPO) was recently developed as a standardized vocabulary for describing the phenotype abnormalities associated with human diseases. At present, only a small fraction of human protein-coding genes have HPO annotations. However, researchers believe that a large portion of currently unannotated genes are related to disease phenotypes. Therefore, it is important to predict gene-HPO term associations using accurate computational methods. In this work we demonstrate the performance advantage of the structured SVM approach, which was previously shown to be highly effective for Gene Ontology term prediction, over several baseline methods. Furthermore, we highlight a collection of informative data sources suitable for the problem of predicting gene-HPO associations, including large-scale literature mining data.

Source
DOI: http://dx.doi.org/10.12688/f1000research.6670.1
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC4722686
February 2016

Transcriptome-Wide Identification of RNA Targets of Arabidopsis SERINE/ARGININE-RICH45 Uncovers the Unexpected Roles of This RNA Binding Protein in RNA Processing.

Plant Cell 2015 Dec 24;27(12):3294-308. Epub 2015 Nov 24.

Department of Biology and Program in Molecular Plant Biology, Colorado State University, Fort Collins, Colorado 80523.

Plant SR45 and its metazoan ortholog RNPS1 are serine/arginine-rich (SR)-like RNA binding proteins that function in splicing/postsplicing events and regulate diverse processes in eukaryotes. Interactions of SR45 with both RNAs and proteins are crucial for regulating RNA processing. However, in vivo RNA targets of SR45 are currently unclear. Using RNA immunoprecipitation followed by high-throughput sequencing, we identified over 4000 Arabidopsis thaliana RNAs that directly or indirectly associate with SR45, designated as SR45-associated RNAs (SARs). Comprehensive analyses of these SARs revealed several roles for SR45. First, SR45 associates with and regulates the expression of 30% of abscisic acid (ABA) signaling genes at the postsplicing level. Second, although most SARs are derived from intron-containing genes, surprisingly, 340 SARs are derived from intronless genes. Expression analysis of the SARs suggests that SR45 differentially regulates intronless and intron-containing SARs. Finally, we identified four overrepresented RNA motifs in SARs that likely mediate SR45's recognition of its targets. Therefore, SR45 plays an unexpected role in mRNA processing of intronless genes, and numerous ABA signaling genes are targeted for regulation at the posttranscriptional level. The diverse molecular functions of SR45 uncovered in this study are likely applicable to other species in view of its conservation across eukaryotes.

Source
DOI: http://dx.doi.org/10.1105/tpc.15.00641
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC4707455
December 2015

A close look at protein function prediction evaluation protocols.

Gigascience 2015 14;4:41. Epub 2015 Sep 14.

Department of Computer Science, Colorado State University, Fort Collins, CO 80523, USA.

Background: The recently held Critical Assessment of Function Annotation challenge (CAFA2) required its participants to submit predictions for a large number of target proteins regardless of whether they have previous annotations or not. This is in contrast to the original CAFA challenge in which participants were asked to submit predictions for proteins with no existing annotations. The CAFA2 task is more realistic, in that it more closely mimics the accumulation of annotations over time. In this study we compare these tasks in terms of their difficulty, and determine whether cross-validation provides a good estimate of performance.

Results: The CAFA2 task is a combination of two subtasks: making predictions on annotated proteins and making predictions on previously unannotated proteins. In this study we analyze the performance of several function prediction methods in these two scenarios. Our results show that several methods (structured support vector machine, binary support vector machines and guilt-by-association methods) do not usually achieve the same level of accuracy on these two tasks as that achieved by cross-validation, and that predicting novel annotations for previously annotated proteins is a harder problem than predicting annotations for uncharacterized proteins. We also find that different methods have different performance characteristics in these tasks, and that cross-validation is not adequate at estimating performance and ranking methods.

Conclusions: These results have implications for the design of computational experiments in the area of automated function prediction and can provide useful insight for the understanding and design of future CAFA competitions.

Source
DOI: http://dx.doi.org/10.1186/s13742-015-0082-5
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC4570743
July 2016

Evaluating a variety of text-mined features for automatic protein function prediction with GOstruct.

J Biomed Semantics 2015 18;6. Epub 2015 Mar 18.

Department of Computing and Information Systems, University of Melbourne, Parkville, 3010 Victoria, Australia; Health and Biomedical Informatics Centre, University of Melbourne, Parkville, 3010 Victoria, Australia.

Most computational methods that predict protein function do not take advantage of the large amount of information contained in the biomedical literature. In this work we evaluate both ontology term co-mention and bag-of-words features mined from the biomedical literature and analyze their impact in the context of a structured output support vector machine model, GOstruct. We find that even simple literature-based features are useful for predicting human protein function (F-max: Molecular Function = 0.408, Biological Process = 0.461, Cellular Component = 0.608). One advantage of using literature features is their ability to offer easy verification of automated predictions. We find through manual inspection of misclassifications that some false positive predictions could be biologically valid predictions based upon support extracted from the literature. Additionally, we present a "medium-throughput" pipeline that was used to annotate a large subset of co-mentions; we suggest that this strategy could help to speed up the rate at which proteins are curated.
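
The F-max figures quoted in this abstract are the maximum, over decision thresholds, of the harmonic mean of precision and recall. The sketch below shows a protein-centric version of that metric as it is commonly defined in CAFA-style evaluations. It is an illustration only, not the GOstruct evaluation code: the function name, the threshold grid, and the toy terms and scores are assumptions, and propagation of terms to their ontology ancestors is omitted.

```python
def f_max(predictions, truth, thresholds=None):
    """Protein-centric F-max (illustrative sketch).

    predictions: {protein: {term: score in [0, 1]}}
    truth:       {protein: non-empty set of true terms}
    """
    if thresholds is None:
        # Assumed threshold grid: 0.01, 0.02, ..., 1.00.
        thresholds = [i / 100 for i in range(1, 101)]
    best = 0.0
    for t in thresholds:
        precisions, recalls = [], []
        for protein, true_terms in truth.items():
            scores = predictions.get(protein, {})
            predicted = {term for term, s in scores.items() if s >= t}
            hits = len(predicted & true_terms)
            # Precision is averaged only over proteins with a prediction at t.
            if predicted:
                precisions.append(hits / len(predicted))
            # Recall is averaged over every benchmark protein.
            recalls.append(hits / len(true_terms))
        if not precisions:
            continue
        p = sum(precisions) / len(precisions)
        r = sum(recalls) / len(recalls)
        if p + r > 0:
            best = max(best, 2 * p * r / (p + r))
    return best


# Toy data: hypothetical term identifiers and scores.
toy_predictions = {
    "P1": {"GO:A": 0.9, "GO:B": 0.4},
    "P2": {"GO:A": 0.2, "GO:C": 0.8},
}
toy_truth = {"P1": {"GO:A"}, "P2": {"GO:C", "GO:D"}}
toy_fmax = f_max(toy_predictions, toy_truth)
```

On the toy data the best trade-off occurs at thresholds between 0.41 and 0.80, where average precision is 1.0 and average recall is 0.75, giving an F-max of 6/7.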

Source
DOI: http://dx.doi.org/10.1186/s13326-015-0006-4
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC4441003
May 2015

PAIRpred: partner-specific prediction of interacting residues from sequence and structure.

Proteins 2014 Jul 6;82(7):1142-55. Epub 2013 Dec 6.

Department of Computer Science, Colorado State University, Fort Collins, Colorado, 80523.

We present a novel partner-specific protein-protein interaction site prediction method called PAIRpred. Unlike most existing machine learning binding site prediction methods, PAIRpred uses information from both proteins in a protein complex to predict pairs of interacting residues from the two proteins. PAIRpred captures sequence and structure information about residue pairs through pairwise kernels that are used for training a support vector machine classifier. As a result, PAIRpred presents a more detailed model of protein binding, and offers state of the art accuracy in predicting binding sites at the protein level as well as inter-protein residue contacts at the complex level. We demonstrate PAIRpred's performance on Docking Benchmark 4.0 and recent CAPRI targets. We present a detailed performance analysis outlining the contribution of different sequence and structure features, together with a comparison to a variety of existing interface prediction techniques. We have also studied the impact of binding-associated conformational change on prediction accuracy and found PAIRpred to be more robust to such structural changes than existing schemes. As an illustration of the potential applications of PAIRpred, we provide a case study in which PAIRpred is used to analyze the nature and specificity of the interface in the interaction of human ISG15 protein with NS1 protein from influenza A virus. Python code for PAIRpred is available at http://combi.cs.colostate.edu/supplements/pairpred/.
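
A pairwise kernel measures similarity between two residue pairs rather than between two single residues. The sketch below shows one common construction, the symmetric tensor-product pairwise kernel, built on top of a base kernel between residue feature vectors. The abstract does not state which pairwise kernel PAIRpred uses, so this is an assumed illustration; the function names and feature vectors are hypothetical.

```python
def linear_kernel(x, y):
    """Base kernel between two residue feature vectors (assumed linear)."""
    return sum(a * b for a, b in zip(x, y))


def pairwise_kernel(pair1, pair2, base=linear_kernel):
    """Symmetric tensor-product kernel between residue pairs (a, b) and (c, d).

    Summing both pairings makes the value independent of the order in
    which the two residues of a pair are listed.
    """
    (a, b), (c, d) = pair1, pair2
    return base(a, c) * base(b, d) + base(a, d) * base(b, c)


# Hypothetical two-dimensional feature vectors for four residues.
res_a, res_b = [1.0, 0.0], [0.0, 1.0]
res_c, res_d = [1.0, 1.0], [2.0, 0.0]
k_forward = pairwise_kernel((res_a, res_b), (res_c, res_d))
k_swapped = pairwise_kernel((res_b, res_a), (res_c, res_d))
```

A kernel matrix built this way over all training pairs can be passed to any support vector machine implementation that accepts precomputed kernels.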

Source
DOI: http://dx.doi.org/10.1002/prot.24479
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC4329725
July 2014

A bioinformatics method for identifying Q/N-rich prion-like domains in proteins.

Methods Mol Biol 2013;1017:219-28

Department of Biochemistry, Colorado State University, Fort Collins, CO, USA.

Numerous proteins contain domains that are enriched in glutamine and asparagine residues, and aggregation of some of these proteins has been linked to both prion formation in yeast and a number of human diseases. Unfortunately, predicting whether a given glutamine/asparagine-rich protein will aggregate has proven difficult. Here we describe a recently developed algorithm designed to predict the aggregation propensity of glutamine/asparagine-rich proteins. We discuss the basis for the algorithm, its limitations, and usage of recently developed online and downloadable versions of the algorithm.

Source
DOI: http://dx.doi.org/10.1007/978-1-62703-438-8_16
December 2013

Combining heterogeneous data sources for accurate functional annotation of proteins.

BMC Bioinformatics 2013 28;14 Suppl 3:S10. Epub 2013 Feb 28.

Department of Biomolecular Engineering, University of California Santa Cruz, Santa Cruz, California 95064, USA.

Combining heterogeneous sources of data is essential for accurate prediction of protein function. The task is complicated by the fact that while sequence-based features can be readily compared across species, most other data are species-specific. In this paper, we present a multi-view extension to GOstruct, a structured-output framework for function annotation of proteins. The extended framework can learn from disparate data sources, with each data source provided to the framework in the form of a kernel. Our empirical results demonstrate that the multi-view framework is able to utilize all available information, yielding better performance than sequence-based models trained across species and models trained from collections of data within a given species. This version of GOstruct participated in the recent Critical Assessment of Functional Annotations (CAFA) challenge; since then we have significantly improved the natural language processing component of the method, which now provides performance that is on par with that provided by sequence information. The GOstruct framework is available for download at http://strut.sourceforge.net.
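
Representing every data source as a kernel makes disparate sources interchangeable, because kernels over the same set of proteins can be combined by simple matrix arithmetic. The abstract does not say how GOstruct combines its views, so the sketch below assumes the simplest standard option, a non-negative weighted sum of kernel matrices, which is again a valid kernel. The function name and the toy matrices are hypothetical.

```python
def combine_kernels(kernel_matrices, weights=None):
    """Weighted sum of square kernel matrices defined over the same proteins."""
    if not kernel_matrices:
        raise ValueError("at least one kernel matrix is required")
    if weights is None:
        weights = [1.0] * len(kernel_matrices)
    n = len(kernel_matrices[0])
    combined = [[0.0] * n for _ in range(n)]
    for weight, matrix in zip(weights, kernel_matrices):
        if weight < 0:
            raise ValueError("weights must be non-negative to keep a valid kernel")
        if len(matrix) != n or any(len(row) != n for row in matrix):
            raise ValueError("all kernel matrices must be n x n")
        for i in range(n):
            for j in range(n):
                combined[i][j] += weight * matrix[i][j]
    return combined


# Hypothetical kernels for two proteins from two data sources.
sequence_kernel = [[1.0, 0.0], [0.0, 1.0]]
literature_kernel = [[2.0, 1.0], [1.0, 2.0]]
combined_kernel = combine_kernels([sequence_kernel, literature_kernel])
```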

Source
DOI: http://dx.doi.org/10.1186/1471-2105-14-S3-S10
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3584846
July 2013

A large-scale evaluation of computational protein function prediction.

Authors:
Predrag Radivojac, Wyatt T Clark, Tal Ronnen Oron, Alexandra M Schnoes, Tobias Wittkop, Artem Sokolov, Kiley Graim, Christopher Funk, Karin Verspoor, Asa Ben-Hur, Gaurav Pandey, Jeffrey M Yunes, Ameet S Talwalkar, Susanna Repo, Michael L Souza, Damiano Piovesan, Rita Casadio, Zheng Wang, Jianlin Cheng, Hai Fang, Julian Gough, Patrik Koskinen, Petri Törönen, Jussi Nokso-Koivisto, Liisa Holm, Domenico Cozzetto, Daniel W A Buchan, Kevin Bryson, David T Jones, Bhakti Limaye, Harshal Inamdar, Avik Datta, Sunitha K Manjari, Rajendra Joshi, Meghana Chitale, Daisuke Kihara, Andreas M Lisewski, Serkan Erdin, Eric Venner, Olivier Lichtarge, Robert Rentzsch, Haixuan Yang, Alfonso E Romero, Prajwal Bhat, Alberto Paccanaro, Tobias Hamp, Rebecca Kaßner, Stefan Seemayer, Esmeralda Vicedo, Christian Schaefer, Dominik Achten, Florian Auer, Ariane Boehm, Tatjana Braun, Maximilian Hecht, Mark Heron, Peter Hönigschmid, Thomas A Hopf, Stefanie Kaufmann, Michael Kiening, Denis Krompass, Cedric Landerer, Yannick Mahlich, Manfred Roos, Jari Björne, Tapio Salakoski, Andrew Wong, Hagit Shatkay, Fanny Gatzmann, Ingolf Sommer, Mark N Wass, Michael J E Sternberg, Nives Škunca, Fran Supek, Matko Bošnjak, Panče Panov, Sašo Džeroski, Tomislav Šmuc, Yiannis A I Kourmpetis, Aalt D J van Dijk, Cajo J F ter Braak, Yuanpeng Zhou, Qingtian Gong, Xinran Dong, Weidong Tian, Marco Falda, Paolo Fontana, Enrico Lavezzo, Barbara Di Camillo, Stefano Toppo, Liang Lan, Nemanja Djuric, Yuhong Guo, Slobodan Vucetic, Amos Bairoch, Michal Linial, Patricia C Babbitt, Steven E Brenner, Christine Orengo, Burkhard Rost, Sean D Mooney, Iddo Friedberg

Nat Methods 2013 Mar 27;10(3):221-7. Epub 2013 Jan 27.

School of Informatics and Computing, Indiana University, Bloomington, Indiana, USA.

Automated annotation of protein function is challenging. As the number of sequenced genomes rapidly grows, the overwhelming majority of protein products can only be annotated computationally. If computational predictions are to be relied upon, it is crucial that the accuracy of these methods be high. Here we report the results from the first large-scale community-based critical assessment of protein function annotation (CAFA) experiment. Fifty-four methods representing the state of the art for protein function prediction were evaluated on a target set of 866 proteins from 11 organisms. Two findings stand out: (i) today's best protein function prediction algorithms substantially outperform widely used first-generation methods, with large gains on all types of targets; and (ii) although the top methods perform well enough to guide experiments, there is considerable need for improvement of currently available tools.

Source
DOI: http://dx.doi.org/10.1038/nmeth.2340
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3584181
March 2013

ChiTaRS: a database of human, mouse and fruit fly chimeric transcripts and RNA-sequencing data.

Nucleic Acids Res 2013 Jan 9;41(Database issue):D142-51. Epub 2012 Nov 9.

Structural Biology and BioComputing Program, Spanish National Cancer Research Centre (CNIO), Madrid 28029, Spain.

Chimeric RNAs that comprise two or more different transcripts have been identified in many cancers and among the Expressed Sequence Tags (ESTs) isolated from different organisms; they might represent functional proteins and produce different disease phenotypes. The ChiTaRS database of Chimeric Transcripts and RNA-Sequencing data (http://chitars.bioinfo.cnio.es/) collects more than 16 000 chimeric RNAs from humans, mice and fruit flies, 233 chimeras confirmed by RNA-seq reads and ∼2000 cancer breakpoints. The database indicates the expression and tissue specificity of these chimeras, as confirmed by RNA-seq data, and it includes mass spectrometry results for some human entries at their junctions. Moreover, the database has advanced features to analyze junction consistency and to rank chimeras based on the evidence of repeated junction sites. Finally, 'Junction Search' screens through the RNA-seq reads found at the chimeras' junction sites to identify putative junctions in novel sequences entered by users. Thus, ChiTaRS is an extensive catalog of human, mouse and fruit fly chimeras that will extend our understanding of the evolution of chimeric transcripts in eukaryotes and can be advantageous in the analysis of human cancer breakpoints.

Source
DOI: http://dx.doi.org/10.1093/nar/gks1041
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3531201
January 2013

Multiple instance learning of Calmodulin binding sites.

Bioinformatics 2012 Sep;28(18):i416-i422

Department of Computer Science, Colorado State University, Fort Collins, CO 80523-1873, USA.

Motivation: Calmodulin (CaM) is a ubiquitously conserved protein that acts as a calcium sensor, and interacts with a large number of proteins. Detection of CaM binding proteins and their interaction sites experimentally requires a significant effort, so accurate methods for their prediction are important.

Results: We present a novel algorithm (MI-1 SVM) for binding site prediction and evaluate its performance on a set of CaM-binding proteins extracted from the Calmodulin Target Database. Our approach directly models the problem of binding site prediction as a large-margin classification problem, and is able to take into account uncertainty in binding site location. We show that the proposed algorithm performs better than the standard SVM formulation, and illustrate its ability to recover known CaM binding motifs. A highly accurate cascaded classification approach using the proposed binding site prediction method to predict CaM binding proteins in Arabidopsis thaliana is also presented.

Availability: Matlab code for training MI-1 SVM and the cascaded classification approach is available on request.

Contact: fayyazafsar@gmail.com or asa@cs.colostate.edu.
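
The multiple-instance view of this problem treats each protein as a bag of candidate sequence windows: the protein is labeled as binding, but which window holds the site is uncertain, so the bag is scored by its best window. The sketch below illustrates only that bag-level step. It is not MI-1 SVM: the toy scoring function stands in for a learned large-margin model, and the residue set, window width, and sequence are hypothetical.

```python
def windows(sequence, width):
    """All contiguous windows of the given width (the instances of the bag)."""
    return [sequence[i:i + width] for i in range(len(sequence) - width + 1)]


def toy_window_score(window):
    """Stand-in scorer: fraction of residues from an assumed favorable set."""
    favorable = set("KRLIVFW")
    return sum(res in favorable for res in window) / len(window)


def best_window(sequence, width, score=toy_window_score):
    """Bag-level score: the maximum instance score and where it starts."""
    scored = [(score(w), start) for start, w in enumerate(windows(sequence, width))]
    if not scored:
        raise ValueError("sequence is shorter than the window width")
    # max() keeps the first window among ties.
    return max(scored, key=lambda item: item[0])


# Hypothetical sequence with one favorable stretch in the middle.
bag_score, site_start = best_window("AAAAKRLLKRAAAA", 4)
```

During training, a multiple-instance learner would use the same maximum to decide which window of each positive protein should act as the positive example.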

Source
DOI: http://dx.doi.org/10.1093/bioinformatics/bts416
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3436843
September 2012

Identification of an intronic splicing regulatory element involved in auto-regulation of alternative splicing of SCL33 pre-mRNA.

Plant J 2012 Dec 29;72(6):935-46. Epub 2012 Oct 29.

Department of Biology, Program in Molecular Plant Biology, Program in Cell and Molecular Biology, Colorado State University, Fort Collins, CO 80523, USA; Department of Computer Science and Program in Molecular Plant Biology, Colorado State University, Fort Collins, CO 80523, USA.

In Arabidopsis, pre-mRNAs of serine/arginine-rich (SR) proteins undergo extensive alternative splicing (AS). However, little is known about the cis-elements and trans-acting proteins involved in regulating AS. Using a splicing reporter (GFP-intron-GFP), consisting of the GFP coding sequence interrupted by an alternatively spliced intron of SCL33, we investigated whether cis-elements within this intron are sufficient for AS, and which SR proteins are necessary for regulated AS. Expression of the splicing reporter in protoplasts faithfully produced all splice variants from the intron, suggesting that cis-elements required for AS reside within the intron. To determine which SR proteins are responsible for AS, the splicing pattern of the GFP-intron-GFP reporter was investigated in protoplasts of three single and three double mutants of SR genes. These analyses revealed that SCL33 and a closely related paralog, SCL30a, are functionally redundant in generating specific splice variants from this intron. Furthermore, SCL33 protein bound to a conserved sequence in this intron, indicating auto-regulation of AS. Mutations in four GAAG repeats within the conserved region impaired generation of the same splice variants that are affected in the scl33 scl30a double mutant. In conclusion, we have identified the first intronic cis-element involved in AS of a plant SR gene, and elucidated a mechanism for auto-regulation of AS of this intron.

Source
DOI: http://dx.doi.org/10.1111/tpj.12004
December 2012

Deciphering the plant splicing code: experimental and computational approaches for predicting alternative splicing and splicing regulatory elements.

Front Plant Sci 2012 7;3:18. Epub 2012 Feb 7.

Program in Molecular Plant Biology, Department of Biology, Colorado State University, Fort Collins, CO, USA.

Extensive alternative splicing (AS) of precursor mRNAs (pre-mRNAs) in multicellular eukaryotes increases the protein-coding capacity of a genome and allows novel ways to regulate gene expression. In flowering plants, up to 48% of intron-containing genes exhibit AS. However, the full extent of AS in plants is not yet known, as only a few high-throughput RNA-Seq studies have been performed. As the cost of obtaining RNA-Seq reads continues to fall, it is anticipated that huge amounts of plant sequence data will accumulate and help in obtaining a more complete picture of AS in plants. Although it is not an onerous task to obtain hundreds of millions of reads using high-throughput sequencing technologies, computational tools to accurately predict and visualize AS are still being developed and refined. This review will discuss the tools to predict and visualize transcriptome-wide AS in plants using short-reads and highlight their limitations. Comparative studies of AS events between plants and animals have revealed that there are major differences in the most prevalent types of AS events, suggesting that plants and animals differ in the way they recognize exons and introns. Extensive studies have been performed in animals to identify cis-elements involved in regulating AS, especially in exon skipping. However, few such studies have been carried out in plants. Here, we review the current state of research on splicing regulatory elements (SREs) and briefly discuss emerging experimental and computational tools to identify cis-elements involved in regulation of AS in plants. The availability of curated alternative splice forms in plants makes it possible to use computational tools to predict SREs involved in AS regulation, which can then be verified experimentally. Such studies will permit identification of plant-specific features involved in AS regulation and contribute to deciphering the splicing code in plants.

Source
DOI: http://dx.doi.org/10.3389/fpls.2012.00018
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3355732
August 2012

De novo design of synthetic prion domains.

Proc Natl Acad Sci U S A 2012 Apr 2;109(17):6519-24. Epub 2012 Apr 2.

Department of Biochemistry and Molecular Biology, Colorado State University, Fort Collins, CO 80523, USA.

Prions are important disease agents and epigenetic regulatory elements. Prion formation involves the structural conversion of proteins from a soluble form into an insoluble amyloid form. In many cases, this structural conversion is driven by a glutamine/asparagine (Q/N)-rich prion-forming domain. However, our understanding of the sequence requirements for prion formation and propagation by Q/N-rich domains has been insufficient for accurate prion propensity prediction or prion domain design. By focusing exclusively on amino acid composition, we have developed a prion aggregation prediction algorithm (PAPA), specifically designed to predict prion propensity of Q/N-rich proteins. Here, we show not only that this algorithm is far more effective than traditional amyloid prediction algorithms at predicting prion propensity of Q/N-rich proteins, but remarkably, also that PAPA is capable of rationally designing protein domains that function as prions in vivo.
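
Scoring a domain from amino acid composition alone means the score depends only on which residues are present and how often, not on their order. The sketch below illustrates that property with a mean per-residue score. It is not the PAPA implementation: the propensity table contains made-up integer values chosen only for the example, and the function name is hypothetical.

```python
# Made-up per-residue values for illustration; not PAPA's fitted propensities.
TOY_PROPENSITY = {"Q": 1, "N": 1, "F": 3, "Y": 3, "P": -4, "K": -4}


def composition_score(segment, propensity=TOY_PROPENSITY):
    """Mean per-residue value; residues missing from the table count as 0."""
    if not segment:
        raise ValueError("segment must not be empty")
    return sum(propensity.get(res, 0) for res in segment) / len(segment)


# Shuffling a segment leaves a composition-only score unchanged.
score_original = composition_score("QNFY")
score_shuffled = composition_score("YFNQ")
```

A practical predictor would typically slide such a score along the protein in fixed-width windows and report the best-scoring region.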

Source
DOI: http://dx.doi.org/10.1073/pnas.1119366109
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3340034
April 2012

SpliceGrapher: detecting patterns of alternative splicing from RNA-Seq data in the context of gene models and EST data.

Genome Biol 2012 Jan 31;13(1):R4. Epub 2012 Jan 31.

Department of Computer Science, 1873 Campus Delivery, Colorado State University, Fort Collins, CO 80523-1873, USA.

We propose a method for predicting splice graphs that enhances curated gene models using evidence from RNA-Seq and EST alignments. Results obtained using RNA-Seq experiments in Arabidopsis thaliana show that predictions made by our SpliceGrapher method are more consistent with current gene models than predictions made by TAU and Cufflinks. Furthermore, analysis of plant and human data indicates that the machine learning approach used by SpliceGrapher is useful for discriminating between real and spurious splice sites, and can improve the reliability of detection of alternative splicing. SpliceGrapher is available for download at http://SpliceGrapher.sf.net.
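
A splice graph represents a gene as a directed graph in which nodes are exons and edges are the splice junctions that join them, so that alternative isoforms appear as alternative paths. The sketch below builds such a graph from a list of isoforms and flags skipped exons. It is a simplified illustration and does not use SpliceGrapher's code or API; the function names and coordinates are hypothetical.

```python
from collections import defaultdict


def build_splice_graph(isoforms):
    """isoforms: list of exon lists, each exon a (start, end) tuple in order."""
    graph = defaultdict(set)
    for exons in isoforms:
        for upstream, downstream in zip(exons, exons[1:]):
            graph[upstream].add(downstream)
        if exons:
            graph[exons[-1]]  # make sure the terminal exon exists as a node
    return graph


def skipped_exons(graph):
    """Exons b for which both a -> b -> c and the direct edge a -> c exist."""
    skipped = set()
    for a, successors in graph.items():
        for b in successors:
            for c in graph.get(b, ()):
                if c in successors:
                    skipped.add(b)
    return skipped


# Two hypothetical isoforms; the second skips the middle exon.
toy_isoforms = [
    [(100, 200), (300, 400), (500, 600)],
    [(100, 200), (500, 600)],
]
toy_graph = build_splice_graph(toy_isoforms)
toy_skipped = skipped_exons(toy_graph)
```

Other event types, such as intron retention or alternative 5' and 3' splice sites, would need overlap tests on exon coordinates, which this sketch leaves out.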

Source
DOI: http://dx.doi.org/10.1186/gb-2012-13-1-r4
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3334585
January 2012

Comparative analysis of serine/arginine-rich proteins across 27 eukaryotes: insights into sub-family classification and extent of alternative splicing.

PLoS One 2011 14;6(9):e24542. Epub 2011 Sep 14.

Department of Bioinformatics and Population Genetics, Universität zu Köln, Köln, Germany.

Alternative splicing (AS) of pre-mRNA is a fundamental molecular process that generates diversity in the transcriptome and proteome of eukaryotic organisms. SR proteins, a family of splicing regulators with one or two RNA recognition motifs (RRMs) at the N-terminus and an arg/ser-rich domain at the C-terminus, function in both constitutive and alternative splicing. We identified SR proteins in 27 eukaryotic species, which include plants, animals, fungi and "basal" eukaryotes that lie outside of these lineages. Using RNA recognition motifs (RRMs) as a phylogenetic marker, we classified 272 SR genes into robust sub-families. The SR gene family can be split into five major groupings, which can be further separated into 11 distinct sub-families. Most flowering plants have double or nearly double the number of SR genes found in vertebrates. The majority of plant SR genes are under purifying selection. Moreover, in all paralogous SR genes in Arabidopsis, rice, soybean and maize, one of the two paralogs is preferentially expressed throughout plant development. We also assessed the extent of AS in SR genes based on a splice graph approach (http://combi.cs.colostate.edu/as/gmap_SRgenes). AS of SR genes is a widespread phenomenon throughout multiple lineages, with alternative 3' or 5' splicing events being the most prominent type of event. However, plant-enriched sub-families have 57%-88% of their SR genes experiencing some type of AS compared to the 40%-54% seen in other sub-families. The SR gene family is pervasive throughout multiple eukaryotic lineages, conserved in sequence and domain organization, but differs in gene number across lineages with an abundance of SR genes in flowering plants. The higher number of alternatively spliced SR genes in plants emphasizes the importance of AS in generating splice variants in these organisms.

Source
PLOS: http://journals.plos.org/plosone/article?id=10.1371/journal.pone.0024542
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3173450
February 2012

Experimental and computational approaches for the study of calmodulin interactions.

Phytochemistry 2011 Jul 19;72(10):1007-19. Epub 2011 Feb 19.

Department of Biology, Program in Molecular Plant Biology, Program in Cell and Molecular Biology, Colorado State University, Fort Collins, CO 80523, USA.

Ca(2+), a universal messenger in eukaryotes, plays a major role in signaling pathways that control many growth and developmental processes in plants as well as their responses to various biotic and abiotic stresses. Cellular changes in Ca(2+) in response to diverse signals are recognized by protein sensors that either have their activity modulated or that interact with other proteins and modulate their activity. Calmodulins (CaMs) and CaM-like proteins (CMLs) are Ca(2+) sensors that have no enzymatic activity of their own but upon binding Ca(2+) interact with and modulate the activity of other proteins involved in a large number of plant processes. Protein-protein interactions play a key role in Ca(2+)/CaM-mediated signaling pathways. In this review, using CaM as an example, we discuss various experimental approaches and computational tools to identify protein-protein interactions. During the last two decades hundreds of CaM-binding proteins in plants have been identified using a variety of approaches ranging from simple screening of expression libraries with labeled CaM to high-throughput screens using protein chips. However, the high-throughput methods have not been applied to the entire proteome of any plant system. Nevertheless, the data provided by these screens allow the development of computational tools to predict CaM-interacting proteins. Using all known binding sites of CaM, we developed a computational method that predicted over 700 high confidence CaM interactors in the Arabidopsis proteome. Most (>600) of these are not known to bind calmodulin, suggesting that there are likely many more CaM targets than previously known. Functional analyses of some of the experimentally identified Ca(2+) sensor target proteins have uncovered their precise role in Ca(2+)-mediated processes. Further studies on identifying novel targets of CaM and CMLs and generating their interaction network - "calcium sensor interactome" - will help us understand how Ca(2+) regulates a myriad of cellular and physiological processes.

Source
DOI: http://dx.doi.org/10.1016/j.phytochem.2010.12.022
July 2011