Publications by authors named "Yoseph Barash"

38 Publications

Rapid and Scalable Profiling of Nascent RNA with fastGRO.

Cell Rep 2020 Nov;33(6):108373

The Wistar Institute, Gene Expression and Regulation Program, 3601 Spruce Street, Philadelphia, PA 19104, USA.

Genome-wide profiling of nascent RNA has become a fundamental tool to study transcription regulation. Unlike steady-state RNA-sequencing (RNA-seq), nascent RNA profiling mirrors real-time activity of RNA polymerases and provides an accurate readout of transcriptome-wide variations. Some species of nuclear RNAs (i.e., large intergenic noncoding RNAs [lincRNAs] and eRNAs) have a short half-life and can only be accurately gauged by nascent RNA techniques. Furthermore, nascent RNA-seq detects post-cleavage RNA at termination sites and promoter-associated antisense RNAs, providing insights into RNA polymerase II (RNAPII) dynamics and processivity. Here, we present a run-on assay with 4-thio ribonucleotide (4-S-UTP) labeling, followed by reversible biotinylation and affinity purification via streptavidin. Our protocol allows streamlined sample preparation within less than 3 days. We named the technique fastGRO (fast Global Run-On). We show that fastGRO is highly reproducible and yields a more complete and extensive coverage of nascent RNA than comparable techniques can. Importantly, we demonstrate that fastGRO is scalable and can be performed with as few as 0.5 × 10^6 cells.

Source
DOI: http://dx.doi.org/10.1016/j.celrep.2020.108373
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC7702699
November 2020

Enhanced Integrated Gradients: improving interpretability of deep learning models using splicing codes as a case study.

Genome Biol 2020 06 19;21(1):149. Epub 2020 Jun 19.

Department of Computer and Information Science, School of Engineering and Applied Science, University of Pennsylvania, Philadelphia, USA.

Despite the success and fast adaptation of deep learning models in biomedical domains, their lack of interpretability remains an issue. Here, we introduce Enhanced Integrated Gradients (EIG), a method to identify significant features associated with a specific prediction task. Using RNA splicing prediction as well as digit classification as case studies, we demonstrate that EIG improves upon the original Integrated Gradients method and produces sets of informative features. We then apply EIG to identify A1CF as a key regulator of liver-specific alternative splicing, supporting this finding with subsequent analysis of relevant A1CF functional (RNA-seq) and binding data (PAR-CLIP).
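As a point of reference for the abstract above, the following is a minimal sketch of the original Integrated Gradients method that EIG builds on, not of EIG itself. The toy logistic model, its weights `w`, the all-zero baseline, and the function names are assumptions chosen for illustration; none of them come from the paper.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def model(x, w):
    # Toy differentiable model (hypothetical): logistic output of a linear score.
    return sigmoid(np.dot(w, x))

def model_grad(x, w):
    # Analytic gradient of the toy model with respect to the input x.
    p = model(x, w)
    return p * (1.0 - p) * w

def integrated_gradients(x, baseline, w, steps=200):
    # Approximate the path integral of the gradient along the straight line
    # from the baseline to x, using a midpoint Riemann sum.
    alphas = (np.arange(steps) + 0.5) / steps
    grads = np.array([model_grad(baseline + a * (x - baseline), w) for a in alphas])
    return (x - baseline) * grads.mean(axis=0)

# Hypothetical weights, input, and baseline.
w = np.array([1.5, -2.0, 0.5])
x = np.array([1.0, 0.5, 2.0])
baseline = np.zeros_like(x)

attributions = integrated_gradients(x, baseline, w)
# Completeness check: attributions should sum to model(x) - model(baseline).
gap = abs(attributions.sum() - (model(x, w) - model(baseline, w)))
```

The `gap` value gives a quick sanity check that the attributions account for the full change in model output between the baseline and the input.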

Source
DOI: http://dx.doi.org/10.1186/s13059-020-02055-7
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC7305616
June 2020

Meta-analysis of transcriptomic variation in T-cell populations reveals both variable and consistent signatures of gene expression and splicing.

RNA 2020 10 17;26(10):1320-1333. Epub 2020 Jun 17.

Cell and Molecular Biology Graduate Group, Perelman School of Medicine, University of Pennsylvania, Philadelphia, Pennsylvania 19104, USA.

Human CD4 T cells are often subdivided into distinct subtypes, including Th1, Th2, Th17, and Treg cells, that are thought to carry out distinct functions in the body. Typically, these T-cell subpopulations are defined by the expression of distinct gene repertoires; however, there is variability between studies regarding the methods used for isolation and the markers used to define each T-cell subtype. Therefore, how reliably studies can be compared to one another remains an open question. Moreover, previous analysis of gene expression in CD4 T-cell subsets has largely focused on gene expression rather than alternative splicing. Here we take a meta-analysis approach, comparing eleven independent RNA-seq studies of human Th1, Th2, Th17, and/or Treg cells to determine the consistency in gene expression and splicing within each subtype across studies. We find that known master-regulators are consistently enriched in the appropriate subtype; however, cytokines and other genes often used as markers are more variable. Importantly, we also identify previously unknown transcriptomic markers that appear to consistently differentiate between subsets, including a few Treg-specific splicing patterns. Together this work highlights the heterogeneity in gene expression between samples designated as the same subtype, but also suggests additional markers that can be used to define functional groupings.

Source
DOI: http://dx.doi.org/10.1261/rna.075929.120
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC7491319
October 2020

Mapping RNA splicing variations in clinically accessible and nonaccessible tissues to facilitate Mendelian disease diagnosis using RNA-seq.

Genet Med 2020 Jul 30;22(7):1181-1190. Epub 2020 Mar 30.

Department of Genetics, University of Pennsylvania, Philadelphia, PA, USA.

Purpose: RNA-seq is a promising approach to improve diagnoses by detecting pathogenic aberrations in RNA splicing that are missed by DNA sequencing. RNA-seq is typically performed on clinically accessible tissues (CATs) from blood and skin. RNA tissue specificity makes it difficult to identify aberrations in relevant but nonaccessible tissues (non-CATs). We determined how RNA-seq from CATs represents splicing in and across genes and non-CATs.

Methods: We quantified RNA splicing in 801 RNA-seq samples from 56 different adult and fetal tissues from Genotype-Tissue Expression Project (GTEx) and ArrayExpress. We identified genes and splicing events in each non-CAT and determined when RNA-seq in each CAT would inadequately represent them. We developed an online resource, MAJIQ-CAT, for exploring our analysis for specific genes and tissues.

Results: In non-CATs, 40.2% of genes have splicing that is inadequately represented by at least one CAT; 6.3% of genes have splicing inadequately represented by all CATs. A majority (52.1%) of inadequately represented genes are lowly expressed in CATs (transcripts per million (TPM) < 1), but 5.8% are inadequately represented despite being well expressed (TPM > 10).

Conclusion: Many splicing events in non-CATs are inadequately evaluated using RNA-seq from CATs. MAJIQ-CAT allows users to explore which accessible tissues, if any, best represent splicing in genes and tissues of interest.

Source
DOI: http://dx.doi.org/10.1038/s41436-020-0780-y
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC7335339
July 2020

Integrative analysis reveals RNA G-quadruplexes in UTRs are selectively constrained and enriched for functional associations.

Nat Commun 2020 Jan 27;11(1):527. Epub 2020 Jan 27.

Department of Genetics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, 19104, USA.

G-quadruplex (G4) sequences are abundant in untranslated regions (UTRs) of human messenger RNAs, but their functional importance remains unclear. By integrating multiple sources of genetic and genomic data, we show that putative G-quadruplex forming sequences (pG4) in 5' and 3' UTRs are selectively constrained, and enriched for cis-eQTLs and RNA-binding protein (RBP) interactions. Using over 15,000 whole-genome sequences, we find that negative selection acting on central guanines of UTR pG4s is comparable to that of missense variation in protein-coding sequences. At multiple GWAS-implicated SNPs within pG4 UTR sequences, we find robust allelic imbalance in gene expression across diverse tissue contexts in GTEx, suggesting that variants affecting G-quadruplex formation within UTRs may also contribute to phenotypic variation. Our results establish UTR G4s as important cis-regulatory elements and point to a link between disruption of UTR pG4 and disease.
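To make the notion of a putative G-quadruplex forming sequence concrete, here is a minimal sketch of a motif scan. It assumes a commonly used canonical pattern (four runs of three or more guanines separated by loops of 1 to 7 nucleotides). The abstract does not state the definition used in the study, so the pattern, the function name `find_pg4`, and the example sequence are illustrative assumptions only.

```python
import re

# Assumed canonical pattern: G3+ N1-7 G3+ N1-7 G3+ N1-7 G3+.
# The study's actual pG4 definition may differ.
PG4_PATTERN = re.compile(
    r"G{3,}[ACGTU]{1,7}G{3,}[ACGTU]{1,7}G{3,}[ACGTU]{1,7}G{3,}"
)

def find_pg4(seq):
    """Return (start, end, matched_sequence) for each non-overlapping hit."""
    seq = seq.upper()
    return [(m.start(), m.end(), m.group()) for m in PG4_PATTERN.finditer(seq)]

# Hypothetical UTR fragment containing one candidate motif.
utr = "ACUGGGAGGGUCGGGAAGGGCUA"
hits = find_pg4(utr)
```

Coordinates are 0-based and half-open, so `utr[start:end]` reproduces each matched sequence.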

Source
DOI: http://dx.doi.org/10.1038/s41467-020-14404-y
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6985247
January 2020

Genomic profiling of human vascular cells identifies TWIST1 as a causal gene for common vascular diseases.

PLoS Genet 2020 01 9;16(1):e1008538. Epub 2020 Jan 9.

Department of Medicine, Perelman School of Medicine, University of Pennsylvania, Philadelphia, Pennsylvania, United States of America.

Genome-wide association studies have identified multiple novel genomic loci associated with vascular diseases. Many of these loci are common non-coding variants that affect the expression of disease-relevant genes within coronary vascular cells. To identify such genes on a genome-wide level, we performed deep transcriptomic analysis of genotyped primary human coronary artery smooth muscle cells (HCASMCs) and coronary endothelial cells (HCAECs) from the same subjects, including splicing Quantitative Trait Loci (sQTL), allele-specific expression (ASE), and colocalization analyses. We identified sQTLs for TARS2, YAP1, CFDP1, and STAT6 in HCASMCs and HCAECs, and 233 ASE genes, a subset of which are also GTEx eGenes in arterial tissues. Colocalization of GWAS association signals for coronary artery disease (CAD), migraine, stroke and abdominal aortic aneurysm with GTEx eGenes in aorta, coronary artery and tibial artery discovered novel candidate risk genes for these diseases. At the CAD and stroke locus tagged by rs2107595 we demonstrate colocalization with expression of the proximal gene TWIST1. We show that disrupting the rs2107595 locus alters TWIST1 expression and that the risk allele has increased binding of the NOTCH signaling protein RBPJ. Finally, we provide data that TWIST1 expression influences vascular SMC phenotypes, including proliferation and calcification, as a potential mechanism supporting a role for TWIST1 in CAD.

Source
DOI: http://dx.doi.org/10.1371/journal.pgen.1008538
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6975560
January 2020

Missense Mutations in NKAP Cause a Disorder of Transcriptional Regulation Characterized by Marfanoid Habitus and Cognitive Impairment.

Am J Hum Genet 2019 11 3;105(5):987-995. Epub 2019 Oct 3.

Division of Human Genetics, Department of Pediatrics, The Children's Hospital of Philadelphia, Philadelphia, PA 19104, USA; Department of Pediatrics, Perelman School of Medicine at the University of Pennsylvania, Philadelphia, PA 19104, USA; Division of Genomic Diagnostics, Department of Pathology and Laboratory Medicine, The Children's Hospital of Philadelphia, Philadelphia, PA 19104, USA; Laboratory of Rare Disease Research, Institute for Quantitative Biosciences, The University of Tokyo, Tokyo 113-8657, Japan.

NKAP is a ubiquitously expressed nucleoplasmic protein that is currently known as a transcriptional regulatory molecule via its interaction with HDAC3 and spliceosomal proteins. Here, we report a disorder of transcriptional regulation due to missense mutations in the X chromosome gene, NKAP. These mutations are clustered in the C-terminal region of NKAP where NKAP interacts with HDAC3 and post-catalytic spliceosomal complex proteins. Consistent with a role for the C-terminal region of NKAP in embryogenesis, nkap mutant zebrafish with a C-terminally truncated NKAP demonstrate severe developmental defects. The clinical features of affected individuals are highly conserved and include developmental delay, hypotonia, joint contractures, behavioral abnormalities, Marfanoid habitus, and scoliosis. In affected cases, transcriptome analysis revealed the presence of a unique transcriptome signature, which is characterized by the downregulation of long genes with higher exon numbers. These observations indicate the critical role of NKAP in transcriptional regulation and demonstrate that perturbations of the C-terminal region lead to developmental defects in both humans and zebrafish.

Source
DOI: http://dx.doi.org/10.1016/j.ajhg.2019.09.009
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6848994
November 2019

An ancient germ cell-specific RNA-binding protein protects the germline from cryptic splice site poisoning.

Elife 2019 01 24;8. Epub 2019 Jan 24.

Institute of Genetic Medicine, Newcastle University, Newcastle, United Kingdom.

Male germ cells of all placental mammals express an ancient nuclear RNA binding protein of unknown function called RBMXL2. Here we find that deletion of the retrogene encoding RBMXL2 blocks spermatogenesis. Transcriptome analyses of age-matched deletion mice show that RBMXL2 controls splicing patterns during meiosis. In particular, RBMXL2 represses the selection of aberrant splice sites and the insertion of cryptic and premature terminal exons. Our data suggest a retrogene has been conserved across mammals as part of a splicing control mechanism that is fundamentally important to germ cell biology. We propose that this mechanism is essential to meiosis because it buffers the high ambient concentrations of splicing activators, thereby preventing poisoning of key transcripts and disruption to gene expression by aberrant splice site selection.

Source
DOI: http://dx.doi.org/10.7554/eLife.39304
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6345566
January 2019

Aberrant splicing in B-cell acute lymphoblastic leukemia.

Nucleic Acids Res 2018 11;46(21):11357-11369

Department of Pathology, Children's Hospital of Philadelphia, Philadelphia, PA 19104, USA.

Aberrant splicing is a hallmark of leukemias with mutations in splicing factor (SF)-encoding genes. Here we investigated its prevalence in pediatric B-cell acute lymphoblastic leukemias (B-ALL), where SFs are not mutated. By comparing these samples to normal pro-B cells, we found thousands of aberrant local splice variations (LSVs) per sample, with 279 LSVs in 241 genes present in every comparison. These genes were enriched in RNA processing pathways and encoded ∼100 SFs, e.g. hnRNPA1. HNRNPA1 3'UTR was most pervasively mis-spliced, yielding the transcript subject to nonsense-mediated decay. To mimic this event, we knocked it down in B-lymphoblastoid cells and identified 213 hnRNPA1-regulated exon usage events comprising the hnRNPA1 splicing signature in pediatric leukemia. Some of its elements were LSVs in DICER1 and NT5C2, known cancer drivers. We searched for LSVs in other leukemia and lymphoma drivers and discovered 81 LSVs in 41 additional genes. Seventy-seven LSVs out of 81 were confirmed using two large independent B-ALL RNA-seq datasets, and the twenty most common B-ALL drivers, including NT5C2, showed higher prevalence of aberrant splicing than of somatic mutations. Thus, post-transcriptional deregulation of SF can drive widespread changes in B-ALL splicing and likely contributes to disease pathogenesis.

Source
DOI: http://dx.doi.org/10.1093/nar/gky946
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6277088
November 2018

Poly(C)-Binding Protein Pcbp2 Enables Differentiation of Definitive Erythropoiesis by Directing Functional Splicing of the Runx1 Transcript.

Mol Cell Biol 2018 08 30;38(16). Epub 2018 Jul 30.

Department of Genetics, Perelman School of Medicine at the University of Pennsylvania, Philadelphia, Pennsylvania, USA

Formation of the mammalian hematopoietic system is under a complex set of developmental controls. Here, we report that mouse embryos lacking the KH domain poly(C) binding protein, Pcbp2, are selectively deficient in the definitive erythroid lineage. Compared to wild-type controls, transcript splicing analysis of the Pcbp2-null embryonic liver reveals accentuated exclusion of an exon (exon 6) that encodes a highly conserved transcriptional control segment of the hematopoietic master regulator, Runx1. Embryos rendered homozygous for a Runx1 locus lacking this cassette exon (Runx1ΔE6) effectively phenocopy the loss of the definitive erythroid lineage in Pcbp2-null embryos. These data support a model in which enhancement of Runx1 cassette exon 6 inclusion by Pcbp2 serves a critical role in development of hematopoietic progenitors and constitutes a critical step in the developmental pathway of the definitive erythropoietic lineage.

Source
DOI: http://dx.doi.org/10.1128/MCB.00175-18
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6066754
August 2018

Outlier detection for improved differential splicing quantification from RNA-Seq experiments with replicates.

Bioinformatics 2018 05;34(9):1488-1497

Department of Genetics, Perelman School of Medicine.

Motivation: A key component in many RNA-Seq-based studies is contrasting multiple replicates from different experimental conditions. In this setup, replicates play a key role because they capture both the underlying biological variability inherent to the compared conditions and the experimental variability. However, what constitutes a 'bad' replicate is not necessarily well defined. Consequently, researchers might discard valuable data, or downstream analysis may be hampered by failed experiments.

Results: Here we develop a probability model to weigh a given RNA-Seq sample as a representative of an experimental condition when performing alternative splicing analysis. We demonstrate that this model detects outlier samples which are consistently and significantly different compared with other samples from the same condition. Moreover, we show that instead of discarding such samples the proposed weighting scheme can be used to downweight samples and specific splicing variations suspected as outliers, gaining statistical power. These weights can then be used for differential splicing (DS) analysis, where the resulting algorithm offers a generalization of the MAJIQ algorithm. Using both synthetic and real-life data, we perform an extensive evaluation of the improved MAJIQ algorithm in different scenarios involving perturbed samples, mislabeled samples, same condition groups, and different levels of coverage, showing it compares favorably to other tools. Overall, this work offers an outlier detection algorithm that can be combined with any splicing pipeline, a generalized and improved version of MAJIQ for DS detection, and evaluation metrics with matching code and data for DS algorithms.

Availability And Implementation: Software and data are accessible via majiq.biociphers.org/norton_et_al_2017/.

Contact: yosephb@upenn.edu.

Supplementary Information: Supplementary data are available at Bioinformatics online.

Source
DOI: http://dx.doi.org/10.1093/bioinformatics/btx790
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6454425
May 2018

ESRP1 Mutations Cause Hearing Loss due to Defects in Alternative Splicing that Disrupt Cochlear Development.

Dev Cell 2017 11 26;43(3):318-331.e5. Epub 2017 Oct 26.

Department of Genetics, Perelman School of Medicine, University of Pennsylvania, Clinical Research Building, Room 463, 415 Curie Boulevard, Philadelphia, PA 19104, USA.

Alternative splicing contributes to gene expression dynamics in many tissues, yet its role in auditory development remains unclear. We performed whole-exome sequencing in individuals with sensorineural hearing loss (SNHL) and identified pathogenic mutations in Epithelial Splicing-Regulatory Protein 1 (ESRP1). Patient-derived induced pluripotent stem cells showed alternative splicing defects that were restored upon repair of an ESRP1 mutant allele. To determine how ESRP1 mutations cause hearing loss, we evaluated Esrp1 mouse embryos and uncovered alterations in cochlear morphogenesis, auditory hair cell differentiation, and cell fate specification. Transcriptome analysis revealed impaired expression and splicing of genes with essential roles in cochlea development and auditory function. Aberrant splicing of Fgfr2 blocked stria vascularis formation due to erroneous ligand usage, which was corrected by reducing Fgf9 gene dosage. These findings implicate mutations in ESRP1 as a cause of SNHL and demonstrate the complex interplay between alternative splicing, inner ear development, and auditory function.

Source
DOI: http://dx.doi.org/10.1016/j.devcel.2017.09.026
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC5687886
November 2017

MAJIQ-SPEL: web-tool to interrogate classical and complex splicing variations from RNA-Seq data.

Bioinformatics 2018 Jan;34(2):300-302

Department of Genetics, Perelman School of Medicine, Philadelphia, PA, USA.

Summary: Analysis of RNA sequencing (RNA-Seq) data has highlighted the fact that most genes undergo alternative splicing (AS) and that these patterns are tightly regulated. Many of these events are complex, resulting in numerous possible isoforms that quickly become difficult to visualize, interpret and experimentally validate. To address these challenges we developed MAJIQ-SPEL, a web-tool that takes as input local splicing variations (LSVs) quantified from RNA-Seq data and provides users with visualization and quantification of the gene isoforms associated with them. Importantly, MAJIQ-SPEL is able to handle both classical (binary) and complex (non-binary) splicing variations. Using a matching primer design algorithm, it also suggests possible primers for experimental validation by RT-PCR and displays them, along with the matching protein domains affected by the LSV, on the UCSC Genome Browser for further downstream analysis.

Availability And Implementation: Program and code will be available at http://majiq.biociphers.org/majiq-spel.

Supplementary Information: Supplementary data are available at Bioinformatics online.

Source
DOI: http://dx.doi.org/10.1093/bioinformatics/btx565
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC7263396
January 2018

Transcriptome analysis of hypoxic cancer cells uncovers intron retention in EIF2B5 as a mechanism to inhibit translation.

PLoS Biol 2017 Sep 29;15(9):e2002623. Epub 2017 Sep 29.

Department of Radiation Oncology Perelman School of Medicine, University of Pennsylvania, Philadelphia, Pennsylvania, United States of America.

Cells adjust to hypoxic stress within the tumor microenvironment by downregulating energy-consuming processes including translation. To delineate mechanisms of cellular adaptation to hypoxia, we performed RNA-Seq of normoxic and hypoxic head and neck cancer cells. These data revealed a significant downregulation of genes known to regulate RNA processing and splicing. Exon-level analyses classified > 1,000 mRNAs as alternatively spliced under hypoxia and uncovered a unique retained intron (RI) in the master regulator of translation initiation, EIF2B5. Notably, this intron was expressed in solid tumors in a stage-dependent manner. We investigated the biological consequence of this RI and demonstrate that its inclusion creates a premature termination codon (PTC) that leads to a 65kDa truncated protein isoform that opposes full-length eIF2Bε to inhibit global translation. Furthermore, expression of 65kDa eIF2Bε led to increased survival of head and neck cancer cells under hypoxia, providing evidence that this isoform enables cells to adapt to conditions of low oxygen. Additional work to uncover cis and trans regulators of EIF2B5 splicing identified several factors that influence intron retention in EIF2B5: a weak splicing potential at the RI, hypoxia-induced expression and binding of the splicing factor SRSF3, and increased binding of total and phospho-Ser2 RNA polymerase II specifically at the intron retained under hypoxia. Altogether, these data reveal differential splicing as a previously uncharacterized mode of translational control under hypoxia and are supported by a model in which hypoxia-induced changes to cotranscriptional processing lead to selective retention of a PTC-containing intron in EIF2B5.

Source
DOI: http://dx.doi.org/10.1371/journal.pbio.2002623
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC5636171
September 2017

Phosphoproteomics reveals that glycogen synthase kinase-3 phosphorylates multiple splicing factors and is associated with alternative splicing.

J Biol Chem 2017 11 15;292(44):18240-18255. Epub 2017 Sep 15.

From the Pharmacology Graduate Group,

Glycogen synthase kinase-3 (GSK-3) is a constitutively active, ubiquitously expressed protein kinase that regulates multiple signaling pathways. In vitro kinase assays and genetic and pharmacological manipulations of GSK-3 have identified more than 100 putative GSK-3 substrates in diverse cell types. Many more have been predicted on the basis of a recurrent GSK-3 consensus motif ((pS/pT)XXX(S/T)), but this prediction has not been tested by analyzing the GSK-3 phosphoproteome. Using stable isotope labeling of amino acids in culture (SILAC) and MS techniques to analyze the repertoire of GSK-3-dependent phosphorylation in mouse embryonic stem cells (ESCs), we found that ∼2.4% of (pS/pT)XXX(S/T) sites are phosphorylated in a GSK-3-dependent manner. A comparison of WT and Gsk3 double knock-out (DKO) ESCs revealed prominent GSK-3-dependent phosphorylation of multiple splicing factors and regulators of RNA biosynthesis as well as proteins that regulate transcription, translation, and cell division. DKO reduced phosphorylation of the splicing factors RBM8A, SRSF9, and PSF as well as the nucleolar proteins NPM1 and PHF6, and recombinant GSK-3β phosphorylated these proteins in vitro. RNA-Seq of WT and DKO ESCs identified ∼190 genes that are alternatively spliced in a GSK-3-dependent manner, supporting a broad role for GSK-3 in regulating alternative splicing. The MS data also identified posttranscriptional regulation of protein abundance by GSK-3, with ∼47 proteins (1.4%) whose levels increased and ∼78 (2.4%) whose levels decreased in the absence of GSK-3. This study provides the first unbiased analysis of the GSK-3 phosphoproteome and strong evidence that GSK-3 broadly regulates alternative splicing.

Source
DOI: http://dx.doi.org/10.1074/jbc.M117.813527
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC5672046
November 2017

Integrative deep models for alternative splicing.

Bioinformatics 2017 Jul;33(14):i274-i282

Department of Computer and Information Science, School of Engineering, University of Pennsylvania, Philadelphia, PA, USA.

Motivation: Advancements in sequencing technologies have highlighted the role of alternative splicing (AS) in increasing transcriptome complexity. This role of AS, combined with the relation of aberrant splicing to malignant states, motivated two streams of research, experimental and computational. The first involves a myriad of techniques such as RNA-Seq and CLIP-Seq to identify splicing regulators and their putative targets. The second involves probabilistic models, also known as splicing codes, which infer regulatory mechanisms and predict splicing outcome directly from genomic sequence. To date, these models have utilized only expression data. In this work, we address two related challenges: Can we improve on previous models for AS outcome prediction? And can we integrate additional sources of data to improve predictions for AS regulatory factors?

Results: We perform a detailed comparison of two previous modeling approaches, Bayesian and Deep Neural networks, dissecting the confounding effects of datasets and target functions. We then develop a new target function for AS prediction in exon skipping events and show it significantly improves model accuracy. Next, we develop a modeling framework that leverages transfer learning to incorporate CLIP-Seq, knockdown and over expression experiments, which are inherently noisy and suffer from missing values. Using several datasets involving key splice factors in mouse brain, muscle and heart we demonstrate both the prediction improvements and biological insights offered by our new models. Overall, the framework we propose offers a scalable integrative solution to improve splicing code modeling as vast amounts of relevant genomic data become available.

Availability And Implementation: Code and data available at: majiq.biociphers.org/jha_et_al_2017/.

Contact: yosephb@upenn.edu.

Supplementary Information: Supplementary data are available at Bioinformatics online.

Source
DOI: http://dx.doi.org/10.1093/bioinformatics/btx268
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC5870723
July 2017

PRiMeUM: A Model for Predicting Risk of Metastasis in Uveal Melanoma.

Invest Ophthalmol Vis Sci 2017 08;58(10):4096-4105

Department of Genetics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, United States.

Purpose: To create an interactive web-based tool for the Prediction of Risk of Metastasis in Uveal Melanoma (PRiMeUM) that can provide a personalized risk estimate of developing metastases within 48 months of primary uveal melanoma (UM) treatment. The model utilizes routinely collected clinical and tumor characteristics on 1227 UM, with the option of including chromosome information when available.

Methods: Using a cohort of 1227 UM cases, Cox proportional hazard modeling was used to assess significant predictors of metastasis including clinical and chromosomal characteristics. A multivariate model to predict risk of metastasis was evaluated using machine learning methods including logistic regression, decision trees, survival random forest, and survival-based regression models. Based on cross-validation results, a logistic regression classifier was developed to compute an individualized risk of metastasis based on clinical and chromosomal information.

Results: The PRiMeUM model provides prognostic information for personalized risk of metastasis in UM. The accuracy of the risk prediction was 80% using chromosomal features only, 83% using clinical features only (age, sex, tumor location, and size), and 85% using both clinical and chromosomal information. Kaplan-Meier analysis showed these risk scores to be highly predictive of metastasis (P < 0.0001).

Conclusions: PRiMeUM provides a tool for predicting an individual's personal risk of metastasis based on their individual and tumor characteristics. It will aid physicians with decisions concerning frequency of systemic surveillance and can be used as a criterion for entering clinical trials for adjuvant therapies.

Source
DOI: http://dx.doi.org/10.1167/iovs.17-22255
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6108308
August 2017

Ancient antagonism between CELF and RBFOX families tunes mRNA splicing outcomes.

Genome Res 2017 08 16;27(8):1360-1370. Epub 2017 May 16.

Department of Genetics.

Over 95% of human multi-exon genes undergo alternative splicing, a process important in normal development and often dysregulated in disease. We sought to analyze the global splicing regulatory network of CELF2 in human T cells, a well-studied splicing regulator critical to T cell development and function. By integrating high-throughput sequencing data for binding and splicing quantification with sequence features and probabilistic splicing code models, we find evidence of splicing antagonism between CELF2 and the RBFOX family of splicing factors. We validate this functional antagonism through knockdown and overexpression experiments in human cells and find CELF2 represses mRNA and protein levels. Because both families of proteins have been implicated in the development and maintenance of neuronal, muscle, and heart tissues, we analyzed publicly available data in these systems. Our analysis suggests global, antagonistic coregulation of splicing by the CELF and RBFOX proteins in mouse muscle and heart in several physiologically relevant targets, including proteins involved in calcium signaling and members of the MEF2 family of transcription factors. Importantly, a number of these coregulated events are aberrantly spliced in mouse models and human patients with diseases that affect these tissues, including heart failure, diabetes, or myotonic dystrophy. Finally, analysis of exons regulated by ancient CELF family homologs in chicken and other species suggests this antagonism is conserved throughout evolution.

Source
DOI: http://dx.doi.org/10.1101/gr.220517.117
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC5538552
August 2017

A SLM2 Feedback Pathway Controls Cortical Network Activity and Mouse Behavior.

Cell Rep 2016 12;17(12):3269-3280

Institute of Genetic Medicine, Newcastle University, Newcastle upon Tyne NE1 3BZ, UK. Electronic address:

The brain is made up of trillions of synaptic connections that together form neural networks needed for normal brain function and behavior. SLM2 is a member of a conserved family of RNA binding proteins, including Sam68 and SLM1, that control splicing of Neurexin1-3 pre-mRNAs. Whether SLM2 affects neural network activity is unknown. Here, we find that SLM2 levels are maintained by a homeostatic feedback control pathway that predates the divergence of SLM2 and Sam68. SLM2 also controls the splicing of Tomosyn2, LysoPLD/ATX, Dgkb, Kif21a, and Cask, each of which is important for synapse function. Cortical neural network activity dependent on synaptic connections between SLM2-expressing pyramidal neurons and interneurons is decreased in Slm2-null mice. Additionally, these mice are anxious and have a decreased ability to recognize novel objects. Our data reveal a pathway of SLM2 homeostatic auto-regulation controlling brain network activity and behavior.

Source
DOI: http://dx.doi.org/10.1016/j.celrep.2016.12.002
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC5199341
December 2016

A new view of transcriptome complexity and regulation through the lens of local splicing variations.

Elife 2016 Feb 1;5:e11752. Epub 2016 Feb 1.

Department of Genetics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, United States.

Alternative splicing (AS) can critically affect gene function and disease, yet mapping splicing variations remains a challenge. Here, we propose a new approach to define and quantify mRNA splicing in units of local splicing variations (LSVs). LSVs capture previously defined types of alternative splicing as well as more complex transcript variations. Building the first genome-wide map of LSVs from twelve mouse tissues, we find complex LSVs constitute over 30% of tissue-dependent transcript variations and affect specific protein families. We show the prevalence of complex LSVs is conserved in humans and identify hundreds of LSVs that are specific to brain subregions or altered in Alzheimer's patients. Amongst those are novel isoforms in the Camk2 family and a novel poison exon in Ptbp1, a key splice factor in neurogenesis. We anticipate the approach presented here will advance the ability to relate tissue-specific splice variation to genetic variation, phenotype, and disease.
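
As a rough illustration of what quantifying splicing "in units of LSVs" could look like, the sketch below computes a naive relative usage for each junction in one LSV as its share of the supporting reads. The read-ratio estimate, the junction names, the read counts, and the rule that an LSV with more than two junctions counts as "complex" are all assumptions made for this example; the abstract does not specify the authors' estimator or definitions.

```python
def lsv_psi(junction_reads):
    """Naive relative usage of each junction in one LSV: reads / total reads."""
    total = sum(junction_reads.values())
    if total == 0:
        raise ValueError("no reads support this LSV")
    return {junction: n / total for junction, n in junction_reads.items()}


def is_complex(junction_reads):
    """Assumed definition: an LSV with more than two junctions is 'complex'."""
    return len(junction_reads) > 2


# Hypothetical read counts for three junctions leaving one reference exon.
reads = {"e1-e2": 30, "e1-e3": 10, "e1-e4": 10}
psi = lsv_psi(reads)
```

A plain ratio ignores read depth and sampling noise, so it is only a starting point for reasoning about the idea, not a substitute for a statistical model.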

Source
DOI: http://dx.doi.org/10.7554/eLife.11752
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC4801060
February 2016

Convergence of Acquired Mutations and Alternative Splicing of CD19 Enables Resistance to CART-19 Immunotherapy.

Cancer Discov 2015 Dec 29;5(12):1282-95. Epub 2015 Oct 29.

Division of Cancer Pathobiology, The Children's Hospital of Philadelphia, Philadelphia, Pennsylvania. Immunology Graduate Group, Perelman School of Medicine at the University of Pennsylvania, Philadelphia, Pennsylvania. Cell & Molecular Biology Graduate Group, Perelman School of Medicine at the University of Pennsylvania, Philadelphia, Pennsylvania. Department of Pathology & Laboratory Medicine, Perelman School of Medicine at the University of Pennsylvania, Philadelphia, Pennsylvania.

The CD19 antigen, expressed on most B-cell acute lymphoblastic leukemias (B-ALL), can be targeted with chimeric antigen receptor-armed T cells (CART-19), but relapses with epitope loss occur in 10% to 20% of pediatric responders. We detected hemizygous deletions spanning the CD19 locus and de novo frameshift and missense mutations in exon 2 of CD19 in some relapse samples. However, we also discovered alternatively spliced CD19 mRNA species, including one lacking exon 2. Pull-down/siRNA experiments identified SRSF3 as a splicing factor involved in exon 2 retention, and its levels were lower in relapsed B-ALL. Using genome editing, we demonstrated that exon 2 skipping bypasses exon 2 mutations in B-ALL cells and allows expression of the N-terminally truncated CD19 variant, which fails to trigger killing by CART-19 but partly rescues defects associated with CD19 loss. Thus, this mechanism of resistance is based on a combination of deleterious mutations and ensuing selection for alternatively spliced RNA isoforms.

Significance: CART-19 yield 70% response rates in patients with B-ALL, but also produce escape variants. We discovered that the underlying mechanism is the selection for preexisting alternatively spliced CD19 isoforms with the compromised CART-19 epitope. This mechanism suggests a possibility of targeting alternative CD19 ectodomains, which could improve survival of patients with B-cell neoplasms.

Source
DOI: http://dx.doi.org/10.1158/2159-8290.CD-15-1020
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC4670800
December 2015

Widespread JNK-dependent alternative splicing induces a positive feedback loop through CELF2-mediated regulation of MKK7 during T-cell activation.

Genes Dev 2015 Oct;29(19):2054-66

Department of Biochemistry and Biophysics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, Pennsylvania 19104, USA; Biochemistry and Molecular Biophysics Graduate Group, Perelman School of Medicine, University of Pennsylvania, Philadelphia, Pennsylvania 19104, USA; Department of Genetics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, Pennsylvania 19104, USA.

Alternative splicing is prevalent among genes encoding signaling molecules; however, the functional consequence of differential isoform expression remains largely unknown. Here we demonstrate that, in response to T-cell activation, the Jun kinase (JNK) kinase MAP kinase kinase 7 (MKK7) is alternatively spliced to favor an isoform that lacks exon 2. This isoform restores a JNK-docking site within MKK7 that is disrupted in the larger isoform. Consistently, we show that skipping of MKK7 exon 2 enhances JNK pathway activity, as indicated by c-Jun phosphorylation and up-regulation of TNF-α. Moreover, this splicing event is itself dependent on JNK signaling. Thus, MKK7 alternative splicing represents a positive feedback loop through which JNK promotes its own signaling. We further show that repression of MKK7 exon 2 is dependent on the presence of flanking sequences and the JNK-induced expression of the RNA-binding protein CELF2, which binds to these regulatory elements. Finally, we found that ∼25% of T-cell receptor-mediated alternative splicing events are dependent on JNK signaling. Strikingly, these JNK-dependent events are also significantly enriched for responsiveness to CELF2. Together, our data demonstrate a widespread role for the JNK-CELF2 axis in controlling splicing during T-cell activation, including a specific role in propagating JNK signaling.

Source
DOI: http://dx.doi.org/10.1101/gad.267245.115
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC4604346
October 2015

RNA splicing. The human splicing code reveals new insights into the genetic determinants of disease.

Science 2015 Jan 18;347(6218):1254806. Epub 2014 Dec 18.

Department of Electrical and Computer Engineering, University of Toronto, Toronto, Ontario M5S 3G4, Canada. Donnelly Centre for Cellular and Biomolecular Research, University of Toronto, Toronto, Ontario M5S 3E1, Canada. Program on Genetic Networks and Program on Neural Computation & Adaptive Perception, Canadian Institute for Advanced Research, Toronto, Ontario M5G 1Z8, Canada. Department of Computer Science, University of Toronto, Toronto, Ontario M5S 3G4, Canada. McLaughlin Centre, University of Toronto, Toronto, Ontario M5G 0A4, Canada. Department of Molecular Genetics, University of Toronto, Toronto, Ontario M5S 1A8, Canada. eScience Group, Microsoft Research, Redmond, WA 98052, USA.

To facilitate precision medicine and whole-genome annotation, we developed a machine-learning technique that scores how strongly genetic variants affect RNA splicing, whose alteration contributes to many diseases. Analysis of more than 650,000 intronic and exonic variants revealed widespread patterns of mutation-driven aberrant splicing. Intronic disease mutations that are more than 30 nucleotides from any splice site alter splicing nine times as often as common variants, and missense exonic disease mutations that have the least impact on protein function are five times as likely as others to alter splicing. We detected tens of thousands of disease-causing mutations, including those involved in cancers and spinal muscular atrophy. Examination of intronic and exonic variants found using whole-genome sequencing of individuals with autism revealed misspliced genes with neurodevelopmental phenotypes. Our approach provides evidence for causal variants and should enable new discoveries in precision medicine.

Source
DOI: http://dx.doi.org/10.1126/science.1254806
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC4362528
January 2015

Splicing code modeling.

Adv Exp Med Biol 2014 ;825:451-66

Department of Genetics, University of Pennsylvania, University Park, PA, USA.

How do cis and trans elements involved in pre-mRNA splicing come together to form a splicing "code"? This question has been a driver of much of the research involving RNA biogenesis. The variability of splicing outcome across developmental stages and between tissues, coupled with the association of splicing defects with numerous diseases, highlights the importance of such a code. However, the sheer number of elements involved in splicing regulation and the context-specific manner of their operation have made the derivation of such a code challenging. Recently, machine learning-based methods have been developed to infer computational models for a splicing code. These methods use high-throughput experiments measuring mRNA expression at exonic resolution and binding locations of RNA-binding proteins (RBPs) to infer which regulatory elements control the inclusion of a given pre-mRNA segment. The inferred regulatory models can then be applied to genomic sequences or experimental conditions that have not been measured to predict splicing outcome. Moreover, the models themselves can be interrogated to identify new regulatory mechanisms, which can be subsequently tested experimentally. In this chapter, we survey the current state of this technology, and illustrate how it can be applied by non-computational or RNA splicing experts to study regulation of specific exons by using the AVISPA web tool.

Source
DOI: http://dx.doi.org/10.1007/978-1-4939-1221-6_13
December 2014

Predicting alternative splicing.

Methods Mol Biol 2014 ;1126:411-23

Department of Genetics, University of Pennsylvania, Philadelphia, PA, USA.

Alternative splicing of pre-mRNA is a complex process whose outcome depends on elements reviewed in the previous chapters such as the core spliceosome units, how the core spliceosome units interact between themselves and with other splicing enhancers and repressors, primary sequence motifs, and local RNA secondary structure. Connections between RNA splicing, transcription, and other processes have also been reviewed in the previous chapters. Splicing is inherently a stochastic process: Some defective transcripts are produced and handled by mechanisms such as nonsense-mediated decay (NMD), and studies report high variability at the transcript level between cells supposedly in similar states. Nonetheless, splicing is obviously not a random process: Many determinants of splicing regulation have been identified, and experimental measurements detect highly robust and conserved splicing changes between developmental stages and tissues. These observations naturally lead to the following questions: Can we devise a method that predicts, given a cellular context and the primary transcript, what the splicing outcome would be? What can such a method tell us about the underlying mechanisms that govern alternative splicing?

This chapter describes how these questions can be framed and addressed using machine-learning methodology. We describe how to extract putative RNA regulatory features from genomic sequence of exons and proximal introns, how to define target values based on experimental measurements of exon inclusion, how to learn a simple splicing model that optimizes the prediction of the observed exon inclusion levels from the identified RNA features, and how to subsequently evaluate the learned model accuracy.
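
The four steps named in this abstract (feature extraction, target definition, model learning, evaluation) can be sketched in a few lines. The example below is a minimal illustration and not the chapter's actual method: the dinucleotide features, the 0.5 inclusion threshold, the logistic regression model, and every sequence and inclusion level in the toy data are assumptions invented for this sketch.

```python
import math
from itertools import product

# All 16 RNA dinucleotides; a deliberately tiny, hypothetical feature set.
KMERS = ["".join(p) for p in product("ACGU", repeat=2)]

def kmer_features(seq):
    """Normalized dinucleotide frequencies of one RNA sequence."""
    counts = dict.fromkeys(KMERS, 0)
    for i in range(len(seq) - 1):
        kmer = seq[i:i + 2]
        if kmer in counts:
            counts[kmer] += 1
    total = max(sum(counts.values()), 1)
    return [counts[k] / total for k in KMERS]

def featurize(upstream_intron, exon, downstream_intron):
    """Step 1: concatenate features of the exon and its proximal introns."""
    return (kmer_features(upstream_intron) + kmer_features(exon)
            + kmer_features(downstream_intron))

def inclusion_label(psi, threshold=0.5):
    """Step 2: binary target from a measured inclusion level in [0, 1]."""
    return 1 if psi > threshold else 0

def score(w, b, x):
    return sum(wj * xj for wj, xj in zip(w, x)) + b

def train_logistic(X, y, lr=1.0, epochs=200):
    """Step 3: fit logistic regression by full-batch gradient descent."""
    w, b = [0.0] * len(X[0]), 0.0
    for _ in range(epochs):
        errs = [1.0 / (1.0 + math.exp(-score(w, b, x))) - t
                for x, t in zip(X, y)]
        for j in range(len(w)):
            w[j] -= lr * sum(e * x[j] for e, x in zip(errs, X)) / len(X)
        b -= lr * sum(errs) / len(X)
    return w, b

def predict(w, b, x):
    return 1 if score(w, b, x) >= 0 else 0

def accuracy(w, b, X, y):
    """Step 4: fraction of exons whose label is predicted correctly."""
    return sum(predict(w, b, x) == t for x, t in zip(X, y)) / len(y)

# Invented toy data: (upstream intron, measured inclusion level).
EXON, DOWNSTREAM = "AUGGCUACGGAU", "GUAAGUACGU"
toy = [("UCUUCCUUUCCU", 0.92), ("CCUUUCUCUUCC", 0.81), ("UUUCUCCUCUUU", 0.77),
       ("AGAAGGAAAGGA", 0.08), ("GGAAAGAGAAGG", 0.19), ("AAAGAGGAGAAA", 0.23)]
X = [featurize(up, EXON, DOWNSTREAM) for up, _ in toy]
y = [inclusion_label(psi) for _, psi in toy]
w, b = train_logistic(X, y)
train_accuracy = accuracy(w, b, X, y)
```

The accuracy here is measured on the training exons only, which is enough to show the mechanics; a meaningful evaluation would score exons held out from training.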

Source
DOI: http://dx.doi.org/10.1007/978-1-62703-980-2_28
October 2014

In silico to in vivo splicing analysis using splicing code models.

Methods 2014 May 7;67(1):3-12. Epub 2013 Dec 7.

Department of Genetics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA 19104, USA; Department of Computer and Information Science, University of Pennsylvania, Philadelphia, PA 19104, USA. Electronic address:

With the growing appreciation of RNA splicing's role in gene regulation, development, and disease, researchers from diverse fields find themselves investigating exons of interest. Commonly, researchers are interested in knowing if an exon is alternatively spliced, if it is differentially included in specific tissues or in developmental stages, and what regulatory elements control its inclusion. An important step towards the ability to perform such analysis in silico was made with the development of computational splicing code models. In this practical how-to guide, we demonstrate how researchers can now use these code models to analyze a gene of interest, focusing on Bin1 as a case study. Bridging integrator 1 (BIN1) is a nucleocytoplasmic adaptor protein known to be functionally regulated through alternative splicing in a tissue-specific manner. Specific Bin1 isoforms have been associated with muscular diseases and cancers, making the study of its splicing regulation of wide interest. Using AVISPA, a recently released web tool based on splicing code models, we show that many Bin1 tissue-dependent isoforms are correctly predicted, along with many of its known regulators. We review the best practices and constraints of using the tool, demonstrate how AVISPA is used to generate high confidence novel regulatory hypotheses, and experimentally validate predicted regulators of Bin1 alternative splicing.

Source
DOI: http://dx.doi.org/10.1016/j.ymeth.2013.11.006
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC4321995
May 2014

AVISPA: a web tool for the prediction and analysis of alternative splicing.

Genome Biol 2013 ;14(10):R114

Transcriptome complexity and its relation to numerous diseases underpin the need to predict in silico splice variants and the regulatory elements that affect them. Building upon our recently described splicing code, we developed AVISPA, a Galaxy-based web tool for splicing prediction and analysis. Given an exon and its proximal sequence, the tool predicts whether the exon is alternatively spliced, whether it displays tissue-dependent splicing patterns, and whether it has associated regulatory elements. We assess AVISPA's accuracy on an independent dataset of tissue-dependent exons, and illustrate how the tool can be applied to analyze a gene of interest. AVISPA is available at http://avispa.biociphers.org.

Source
DOI: http://dx.doi.org/10.1186/gb-2013-14-10-r114
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC4014802
September 2014

Bayesian prediction of tissue-regulated splicing using RNA sequence and cellular context.

Bioinformatics 2011 Sep 29;27(18):2554-62. Epub 2011 Jul 29.

Department of Electrical and Computer Engineering, University of Toronto, Toronto, Canada.

Motivation: Alternative splicing is a major contributor to cellular diversity in mammalian tissues and relates to many human diseases. An important goal in understanding this phenomenon is to infer a 'splicing code' that predicts how splicing is regulated in different cell types by features derived from RNA, DNA and epigenetic modifiers.

Methods: We formulate the assembly of a splicing code as a problem of statistical inference and introduce a Bayesian method that uses an adaptively selected number of hidden variables to combine subgroups of features into a network, allows different tissues to share feature subgroups and uses a Gibbs sampler to hedge predictions and ascertain the statistical significance of identified features.

Results: Using data for 3665 cassette exons, 1014 RNA features and 4 tissue types derived from 27 mouse tissues (http://genes.toronto.edu/wasp), we benchmarked several methods. Our method outperforms all others, and achieves relative improvements of 52% in splicing code quality and up to 22% in classification error, compared with the state of the art. Novel combinations of regulatory features and novel combinations of tissues that share feature subgroups were identified using our method.

Contact: frey@psi.toronto.edu

Supplementary Information: Supplementary data are available at Bioinformatics online.

Source
DOI: http://dx.doi.org/10.1093/bioinformatics/btr444
September 2011