Publications by authors named "Colin Dewey"

46 Publications

CellO: comprehensive and hierarchical cell type classification of human cells with the Cell Ontology.

iScience 2021 Jan 8;24(1):101913. Epub 2020 Dec 8.

Department of Computer Sciences, University of Wisconsin - Madison, Madison, WI 53706, USA.

Cell type annotation is a fundamental task in the analysis of single-cell RNA-sequencing data. In this work, we present CellO, a machine learning-based tool for annotating human RNA-seq data with the Cell Ontology. CellO enables accurate and standardized cell type classification of cell clusters by considering the rich hierarchical structure of known cell types. Furthermore, CellO comes pre-trained on a comprehensive data set of human, healthy, untreated primary samples in the Sequence Read Archive. CellO's comprehensive training set enables it to run out of the box on diverse cell types and achieves competitive or even superior performance when compared to existing state-of-the-art methods. Lastly, CellO's linear models are easily interpreted, thereby enabling exploration of cell-type-specific expression signatures across the ontology. To this end, we also present the CellO Viewer: a web application for exploring CellO's models across the ontology.
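One way to picture classification that respects the ontology's hierarchy is to require that a cell type never scores higher than any of its ancestors. The sketch below illustrates that idea only; it is not CellO's actual algorithm. The ontology fragment, the scores, and the function name `consistent_scores` are all hypothetical.

```python
def consistent_scores(raw, parents):
    """Cap each term's score by the scores of all of its ancestors.

    raw:     term -> score from an independent per-term classifier
    parents: term -> list of direct parent terms (a DAG, assumed acyclic)
    """
    memo = {}

    def resolve(term):
        if term not in memo:
            ancestor_caps = [resolve(p) for p in parents.get(term, [])]
            memo[term] = min([raw[term]] + ancestor_caps)
        return memo[term]

    return {term: resolve(term) for term in raw}


# Hypothetical, heavily simplified ontology fragment and classifier scores.
parents = {"leukocyte": [], "lymphocyte": ["leukocyte"], "T cell": ["lymphocyte"]}
raw = {"leukocyte": 0.9, "lymphocyte": 0.6, "T cell": 0.8}
scores = consistent_scores(raw, parents)
```

Here the raw score for "T cell" (0.8) exceeds that of its parent "lymphocyte" (0.6), so the adjusted score is capped at 0.6 and the output stays consistent with the hierarchy.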

Source
DOI: http://dx.doi.org/10.1016/j.isci.2020.101913
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC7753962
January 2021

PLK1 and NOTCH Positively Correlate in Melanoma and Their Combined Inhibition Results in Synergistic Modulations of Key Melanoma Pathways.

Mol Cancer Ther 2021 01 11;20(1):161-172. Epub 2020 Nov 11.

Department of Dermatology, University of Wisconsin, Madison, Wisconsin.

Melanoma is one of the most serious forms of skin cancer, and its increasing incidence coupled with nonlasting therapeutic options for metastatic disease highlights the need for additional novel approaches for its management. In this study, we determined the potential interactions between polo-like kinase 1 (PLK1, a serine/threonine kinase involved in mitotic regulation) and NOTCH1 (a type I transmembrane protein deciding cell fate during development) in melanoma. Employing an in-house human melanoma tissue microarray (TMA) containing multiple cases of melanomas and benign nevi, coupled with high-throughput, multispectral quantitative fluorescence imaging analysis, we found a positive correlation between PLK1 and NOTCH1 in melanoma. Furthermore, The Cancer Genome Atlas database analysis of patients with melanoma showed an association of higher mRNA levels of both genes with poor overall, as well as disease-free, survival. Next, utilizing small-molecule inhibitors of PLK1 and NOTCH (BI 6727 and MK-0752, respectively), we found a synergistic antiproliferative response of combined treatment in multiple human melanoma cells. To determine the molecular targets of the overall and synergistic responses of combined PLK1 and NOTCH inhibition, we conducted RNA-sequencing analysis employing a unique regression model with interaction terms. We identified modulations of several key genes relevant to melanoma progression/metastasis, as well as some new genes that have not been well studied in melanoma. In conclusion, our study demonstrated a synergistic antiproliferative response of concomitant targeting of PLK1 and NOTCH in melanoma, unraveling a potential novel therapeutic approach for detailed preclinical/clinical evaluation.
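The abstract does not give the form of the regression model, so the following is only a generic sketch of what "a regression model with interaction terms" can look like for a two-inhibitor design: expression = b0 + bA·A + bB·B + bAB·A·B, where A and B are 0/1 treatment indicators. The data values and the function name `interaction_fit` are hypothetical.

```python
from statistics import mean


def interaction_fit(samples):
    """Fit y = b0 + bA*A + bB*B + bAB*A*B for 0/1 treatment indicators.

    With all four treatment groups present, the least-squares coefficients
    of this saturated model are simple contrasts of the group means.
    """
    groups = {}
    for a, b, y in samples:
        groups.setdefault((a, b), []).append(y)
    m = {key: mean(values) for key, values in groups.items()}
    return {
        "intercept": m[0, 0],
        "drug_a": m[1, 0] - m[0, 0],
        "drug_b": m[0, 1] - m[0, 0],
        "interaction": m[1, 1] - m[1, 0] - m[0, 1] + m[0, 0],
    }


# Hypothetical log2 expression of one gene: (inhibitor A given, inhibitor B given, value)
samples = [
    (0, 0, 5.0), (0, 0, 5.2),
    (1, 0, 4.6), (1, 0, 4.4),
    (0, 1, 4.8), (0, 1, 4.6),
    (1, 1, 2.9), (1, 1, 3.1),
]
fit = interaction_fit(samples)
```

In this toy data the interaction coefficient is negative, meaning the combined treatment lowers expression by more than the sum of the two single-agent effects on the log scale. That is the kind of signal a synergy analysis would look for.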

Source
DOI: http://dx.doi.org/10.1158/1535-7163.MCT-20-0654
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC7790869
January 2021

Cell type specific gene expression profiling reveals a role for complement component C3 in neutrophil responses to tissue damage.

Sci Rep 2020 09 24;10(1):15716. Epub 2020 Sep 24.

Department of Medical Microbiology and Immunology, University of Wisconsin-Madison, Madison, WI, USA.

Tissue damage induces rapid recruitment of leukocytes and changes in the transcriptional landscape that influence wound healing. However, the cell-type-specific transcriptional changes that influence leukocyte function and tissue repair have not been well characterized. Here, we employed translating ribosome affinity purification (TRAP) and RNA sequencing, TRAP-seq, in larval zebrafish to identify genes differentially expressed in neutrophils, macrophages, and epithelial cells in response to wounding. We identified the complement pathway and c3a.1, homologous to the C3 component of human complement, as significantly increased in neutrophils in response to wounds. c3a.1-deficient zebrafish larvae have impaired neutrophil directed migration to tail wounds with an initial lag in recruitment early after wounding. Moreover, c3a.1-deficient zebrafish larvae have impaired recruitment to localized bacterial infections and reduced survival that is, at least in part, neutrophil mediated. Together, our findings support the power of TRAP-seq to identify cell-type-specific changes in gene expression that influence neutrophil behavior in response to tissue damage.

Source
DOI: http://dx.doi.org/10.1038/s41598-020-72750-9
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC7518243
September 2020

PRAM: a novel pooling approach for discovering intergenic transcripts from large-scale RNA sequencing experiments.

Genome Res 2020 11 21;30(11):1655-1666. Epub 2020 Sep 21.

Department of Biostatistics and Medical Informatics, University of Wisconsin, Madison, Wisconsin 53706, USA.

Publicly available RNA-seq data is routinely used for retrospective analysis to elucidate new biology. Novel transcript discovery enabled by joint analysis of large collections of RNA-seq data sets has emerged as one such analysis. Current methods for transcript discovery rely on a '2-Step' approach where the first step encompasses building transcripts from individual data sets, followed by the second step that merges predicted transcripts across data sets. To increase the power of transcript discovery from large collections of RNA-seq data sets, we developed a novel '1-Step' approach named Pooling RNA-seq and Assembling Models (PRAM) that builds transcript models from pooled RNA-seq data sets. We demonstrate in a computational benchmark that 1-Step outperforms 2-Step approaches in predicting overall transcript structures and individual splice junctions, while performing competitively in detecting exonic nucleotides. Applying PRAM to 30 human ENCODE RNA-seq data sets identified unannotated transcripts with epigenetic and RAMPAGE signatures similar to those of recently annotated transcripts. In a case study, we discovered and experimentally validated new transcripts through the application of PRAM to mouse hematopoietic RNA-seq data sets. We uncovered new transcripts that share a differential expression pattern with a neighboring gene implicated in human hematopoietic phenotypes, and we provided evidence for the conservation of this relationship in human. PRAM is implemented as an R/Bioconductor package.
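The contrast between the '2-Step' and '1-Step' strategies can be illustrated with a toy evidence threshold on splice-junction read counts. This is a simplified sketch of the pooling idea only, not PRAM's implementation; the junction names, counts, threshold, and function names are hypothetical.

```python
from collections import Counter


def two_step(datasets, min_reads):
    """Call junctions within each data set, then merge the calls."""
    called = set()
    for counts in datasets:
        called |= {junction for junction, n in counts.items() if n >= min_reads}
    return called


def one_step(datasets, min_reads):
    """Pool read counts across data sets, then call junctions once."""
    pooled = Counter()
    for counts in datasets:
        pooled.update(counts)  # Counter.update adds counts per key
    return {junction for junction, n in pooled.items() if n >= min_reads}


# Hypothetical junction read counts from three data sets.
datasets = [
    {"chr1:100-200": 5, "chr1:300-400": 1},
    {"chr1:300-400": 2},
    {"chr1:300-400": 1},
]
separate = two_step(datasets, min_reads=3)
pooled = one_step(datasets, min_reads=3)
```

The weakly expressed junction `chr1:300-400` never reaches the threshold in any single data set, so merging per-data-set calls misses it, while pooling the reads first recovers it. This is the intuition behind the power gain the abstract describes.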

Source
DOI: http://dx.doi.org/10.1101/gr.252445.119
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC7605252
November 2020

Giant Island Mice Exhibit Widespread Gene Expression Changes in Key Metabolic Organs.

Genome Biol Evol 2020 08;12(8):1277-1301

Laboratory of Genetics, University of Wisconsin - Madison.

Island populations repeatedly evolve extreme body sizes, but the genomic basis of this pattern remains largely unknown. To understand how organisms on islands evolve gigantism, we compared genome-wide patterns of gene expression in Gough Island mice, the largest wild house mice in the world, and mainland mice from the WSB/EiJ wild-derived inbred strain. We used RNA-seq to quantify differential gene expression in three key metabolic organs: gonadal adipose depot, hypothalamus, and liver. Between 4,000 and 8,800 genes were significantly differentially expressed across the evaluated organs, representing between 20% and 50% of detected transcripts, with 20% or more of differentially expressed transcripts in each organ exhibiting expression fold changes of at least 2×. A minimum of 73 candidate genes for extreme size evolution, including Irs1 and Lrp1, were identified by considering differential expression jointly with other data sets: 1) genomic positions of published quantitative trait loci for body weight and growth rate, 2) whole-genome sequencing of 16 wild-caught Gough Island mice that revealed fixed single-nucleotide differences between the strains, and 3) publicly available tissue-specific regulatory elements. Additionally, patterns of differential expression across three time points in the liver revealed that Arid5b potentially regulates hundreds of genes. Functional enrichment analyses pointed to cell cycling, mitochondrial function, signaling pathways, inflammatory response, and nutrient metabolism as potential causes of weight accumulation in Gough Island mice. Collectively, our results indicate that extensive gene regulatory evolution in metabolic organs accompanied the rapid evolution of gigantism during the short time house mice have inhabited Gough Island.

Source
DOI: http://dx.doi.org/10.1093/gbe/evaa118
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC7487164
August 2020

Whole-Genome Alignment.

Authors:
Colin N Dewey

Methods Mol Biol 2019 ;1910:121-147

Department of Biostatistics and Medical Informatics, University of Wisconsin-Madison, Madison, WI, USA.

Whole-genome alignment (WGA) is the prediction of evolutionary relationships at the nucleotide level between two or more genomes. It combines aspects of both colinear sequence alignment and gene orthology prediction and is typically more challenging to address than either of these tasks due to the size and complexity of whole genomes. Despite the difficulty of this problem, numerous methods have been developed for its solution because WGAs are valuable for genome-wide analyses such as phylogenetic inference, genome annotation, and function prediction. In this chapter, we discuss the meaning and significance of WGA and present an overview of the methods that address it. We also examine the problem of evaluating whole-genome aligners and offer a set of methodological challenges that need to be tackled in order to make most effective use of our rapidly growing databases of whole genomes.

Source
DOI: http://dx.doi.org/10.1007/978-1-4939-9074-0_4
January 2020

Genome-wide effects on transcription from ppGpp binding to its two sites on RNA polymerase.

Proc Natl Acad Sci U S A 2019 04 10;116(17):8310-8319. Epub 2019 Apr 10.

Department of Bacteriology, University of Wisconsin-Madison, Madison, WI 53706.

The second messenger nucleotide ppGpp dramatically alters gene expression in bacteria to adjust cellular metabolism to nutrient availability. ppGpp binds to two sites on RNA polymerase (RNAP), but it has also been reported to bind to many other proteins. To determine the role of the RNAP binding sites in the genome-wide effects of ppGpp on transcription, we used RNA-seq to analyze transcripts produced in response to elevated ppGpp levels in strains with/without the ppGpp binding sites on RNAP. We examined RNAs rapidly after ppGpp production without an accompanying nutrient starvation. This procedure enriched for direct effects of ppGpp on RNAP rather than for indirect effects on transcription resulting from starvation-induced changes in metabolism or on secondary events from the initial effects on RNAP. The transcriptional responses of all 757 genes identified after 5 minutes of ppGpp induction depended on ppGpp binding to RNAP. Most (>75%) were not reported in earlier studies. The regulated transcripts encode products involved not only in translation but also in many other cellular processes. In vitro transcription analysis of more than 100 promoters from the in vivo dataset identified a large collection of directly regulated promoters, unambiguously demonstrated that most effects of ppGpp on transcription in vivo were direct, and allowed comparison of DNA sequences from inhibited, activated, and unaffected promoter classes. Our analysis greatly expands our understanding of the breadth of the stringent response and suggests promoter sequence features that contribute to the specific effects of ppGpp.

Source
DOI: http://dx.doi.org/10.1073/pnas.1819682116
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6486775
April 2019

GATA Factor-Regulated Samd14 Enhancer Confers Red Blood Cell Regeneration and Survival in Severe Anemia.

Dev Cell 2017 08;42(3):213-225.e4

Department of Cell and Regenerative Biology, UW-Madison Blood Research Program, Carbone Cancer Center, University of Wisconsin School of Medicine and Public Health, Madison, WI, USA.

An enhancer with amalgamated E-box and GATA motifs (+9.5) controls expression of the regulator of hematopoiesis GATA-2. While similar GATA-2-occupied elements are common in the genome, occupancy does not predict function, and GATA-2-dependent genetic networks are incompletely defined. A "+9.5-like" element resides in an intron of Samd14 (Samd14-Enh) encoding a sterile alpha motif (SAM) domain protein. Deletion of Samd14-Enh in mice strongly decreased Samd14 expression in bone marrow and spleen. Although steady-state hematopoiesis was normal, mice lacking Samd14-Enh died in response to severe anemia. Samd14-Enh stimulated stem cell factor/c-Kit signaling, which promotes erythrocyte regeneration. Anemia activated Samd14-Enh by inducing enhancer components and enhancer chromatin accessibility. Thus, a GATA-2/anemia-regulated enhancer controls expression of an SAM domain protein that confers survival in anemia. We propose that Samd14-Enh and an ensemble of anemia-responsive enhancers are essential for erythrocyte regeneration in stress erythropoiesis, a vital process in pathologies, including β-thalassemia, myelodysplastic syndrome, and viral infection.

Source
DOI: http://dx.doi.org/10.1016/j.devcel.2017.07.009
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC5578808
August 2017

Zebrafish zic2 controls formation of periocular neural crest and choroid fissure morphogenesis.

Dev Biol 2017 09 6;429(1):92-104. Epub 2017 Jul 6.

Department of Integrative Biology, University of Wisconsin, Madison, WI 53706, USA; Department of Neuroscience, University of Wisconsin, Madison, WI 53706, USA; McPherson Eye Research Institute, University of Wisconsin, Madison, WI, 53706, USA.

The vertebrate retina develops in close proximity to the forebrain and neural crest-derived cartilages of the face and jaw. Coloboma, a congenital eye malformation, is associated with aberrant forebrain development (holoprosencephaly) and with craniofacial defects (frontonasal dysplasia) in humans, suggesting a critical role for cross-lineage interactions during retinal morphogenesis. ZIC2, a zinc-finger transcription factor, is linked to human holoprosencephaly. We have previously used morpholino assays to show zebrafish zic2 functions in the developing forebrain, retina and craniofacial cartilage. We now report that zebrafish with genetic lesions in zebrafish zic2 orthologs, zic2a and zic2b, develop with retinal coloboma and craniofacial anomalies. We demonstrate a requirement for zic2 in restricting pax2a expression and show evidence that zic2 function limits Hh signaling. RNA-seq transcriptome analysis identified an early requirement for zic2 in periocular neural crest as an activator of alx1, a transcription factor with essential roles in craniofacial and ocular morphogenesis in human and zebrafish. Collectively, these data establish zic2 mutant zebrafish as a powerful new genetic model for in-depth dissection of cell interactions and genetic controls during craniofacial complex development.

Source
DOI: http://dx.doi.org/10.1016/j.ydbio.2017.07.003
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC5603172
September 2017

MetaSRA: normalized human sample-specific metadata for the Sequence Read Archive.

Bioinformatics 2017 Sep;33(18):2914-2923

Department of Computer Sciences.

Motivation: The NCBI's Sequence Read Archive (SRA) promises great biological insight if one could analyze the data in the aggregate; however, the data remain largely underutilized, in part, due to the poor structure of the metadata associated with each sample. The rules governing submissions to the SRA do not dictate a standardized set of terms that should be used to describe the biological samples from which the sequencing data are derived. As a result, the metadata include many synonyms, spelling variants and references to outside sources of information. Furthermore, manual annotation of the data remains intractable due to the large number of samples in the archive. For these reasons, it has been difficult to perform large-scale analyses that study the relationships between biomolecular processes and phenotype across diverse diseases, tissues and cell types present in the SRA.

Results: We present MetaSRA, a database of normalized SRA human sample-specific metadata following a schema inspired by the metadata organization of the ENCODE project. This schema involves mapping samples to terms in biomedical ontologies, labeling each sample with a sample-type category, and extracting real-valued properties. We automated these tasks via a novel computational pipeline.

Availability And Implementation: The MetaSRA is available at metasra.biostat.wisc.edu via both a searchable web interface and bulk downloads. Software implementing our computational pipeline is available at http://github.com/deweylab/metasra-pipeline.

Contact: cdewey@biostat.wisc.edu.

Supplementary Information: Supplementary data are available at Bioinformatics online.

Source
DOI: http://dx.doi.org/10.1093/bioinformatics/btx334
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC5870770
September 2017

Analysis of embryonic development in the unsequenced axolotl: Waves of transcriptomic upheaval and stability.

Dev Biol 2017 06 28;426(2):143-154. Epub 2016 Jul 28.

Regenerative Biology, Morgridge Institute for Research, Madison, WI, United States.

The axolotl (Ambystoma mexicanum) has long been the subject of biological research, primarily owing to its outstanding regenerative capabilities. However, the gene expression programs governing its embryonic development are particularly underexplored, especially when compared to other amphibian model species. Therefore, we performed whole transcriptome polyA+ RNA sequencing experiments on 17 stages of embryonic development. As the axolotl genome is unsequenced and its gene annotation is incomplete, we built de novo transcriptome assemblies for each stage and garnered functional annotation by comparing expressed contigs with known genes in other organisms. In evaluating the number of differentially expressed genes over time, we identify three waves of substantial transcriptome upheaval each followed by a period of relative transcriptome stability. The first wave of upheaval is between the one and two cell stage. We show that the number of differentially expressed genes per unit time is higher between the one and two cell stage than it is across the mid-blastula transition (MBT), the period of zygotic genome activation. We use total RNA sequencing to demonstrate that the vast majority of genes with increasing polyA+ signal between the one and two cell stage result from polyadenylation rather than de novo transcription. The first stable phase begins after the two cell stage and continues until the mid-blastula transition, corresponding with the pre-MBT phase of transcriptional quiescence in amphibian development. Following this is a peak of differential gene expression corresponding with the activation of the zygotic genome and a phase of transcriptomic stability from stages 9-11. We observe a third wave of transcriptomic change between stages 11 and 14, followed by a final stable period. The last two stable phases have not been documented in amphibians previously and correspond to times of major morphogenic change in the axolotl embryo: gastrulation and neurulation. 
These results yield new insights into global gene expression during early stages of amphibian embryogenesis and will help to further develop the axolotl as a model species for developmental and regenerative biology.

Source
DOI: http://dx.doi.org/10.1016/j.ydbio.2016.05.024
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC5272911
June 2017

Integrative analysis with ChIP-seq advances the limits of transcript quantification from RNA-seq.

Genome Res 2016 08 12;26(8):1124-33. Epub 2016 Jul 12.

Department of Biostatistics and Medical Informatics, University of Wisconsin, Madison, Wisconsin 53706, USA; Department of Computer Sciences, University of Wisconsin, Madison, Wisconsin 53706, USA.

RNA-seq is currently the technology of choice for global measurement of transcript abundances in cells. Despite its successes, isoform-level quantification remains difficult because short RNA-seq reads are often compatible with multiple alternatively spliced isoforms. Existing methods rely heavily on uniquely mapping reads, which are not available for numerous isoforms that lack regions of unique sequence. To improve quantification accuracy in such difficult cases, we developed a novel computational method, prior-enhanced RSEM (pRSEM), which uses a complementary data type in addition to RNA-seq data. We found that ChIP-seq data of RNA polymerase II and histone modifications were particularly informative in this approach. In qRT-PCR validations, pRSEM was shown to be superior to competing methods in estimating relative isoform abundances within or across conditions. Data-driven simulations suggested that pRSEM has a greatly decreased false-positive rate at the expense of a small increase in false-negative rate. In aggregate, our study demonstrates that pRSEM transforms existing capacity to precisely estimate transcript abundances, especially at the isoform level.
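To see why a prior helps when no read maps uniquely, consider a bare-bones expectation-maximization (EM) estimate of isoform proportions in which prior evidence enters as pseudocounts. This is a generic illustration under strong simplifications (no isoform lengths, no sequencing-error model), not the pRSEM model itself; the function name `em_abundance` and all numbers are hypothetical.

```python
def em_abundance(read_compat, n_isoforms, pseudocounts=None, n_iter=200):
    """Estimate isoform proportions from ambiguous reads with a simple EM.

    read_compat:  one tuple per read listing the isoforms it is compatible with
    pseudocounts: optional per-isoform prior counts (e.g. derived from external
                  evidence); added to the expected read counts at every step
    """
    prior = pseudocounts or [0.0] * n_isoforms
    theta = [1.0 / n_isoforms] * n_isoforms
    for _ in range(n_iter):
        expected = list(prior)
        for compat in read_compat:
            total = sum(theta[i] for i in compat)
            for i in compat:
                # E-step: split the read in proportion to current abundances.
                expected[i] += theta[i] / total
        norm = sum(expected)
        theta = [e / norm for e in expected]  # M-step: renormalize
    return theta


# Ten reads, each compatible with both isoforms (no unique sequence).
reads = [(0, 1)] * 10
flat = em_abundance(reads, n_isoforms=2)
informed = em_abundance(reads, n_isoforms=2, pseudocounts=[4.0, 0.0])
```

Without a prior the reads alone cannot distinguish the two isoforms, so the estimate stays at an even split. With pseudocounts favoring isoform 0, the estimate moves decisively toward it, which is the kind of tie-breaking an external data type could provide.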

Source
DOI: http://dx.doi.org/10.1101/gr.199174.115
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC4971760
August 2016

Mechanism governing heme synthesis reveals a GATA factor/heme circuit that controls differentiation.

EMBO Rep 2016 Feb 23;17(2):249-65. Epub 2015 Dec 23.

Department of Cell and Regenerative Biology, UW-Madison Blood Research Program, Carbone Cancer Center, University of Wisconsin School of Medicine and Public Health, Madison, WI, USA.

Metal ion-containing macromolecules have fundamental roles in essentially all biological processes throughout the evolutionary tree. For example, iron-containing heme is a cofactor in enzyme catalysis and electron transfer and an essential hemoglobin constituent. To meet the intense demand for hemoglobin assembly in red blood cells, the cell type-specific factor GATA-1 activates transcription of Alas2, encoding the rate-limiting enzyme in heme biosynthesis, 5-aminolevulinic acid synthase-2 (ALAS-2). Using genetic editing to unravel mechanisms governing heme biosynthesis, we discovered a GATA factor- and heme-dependent circuit that establishes the erythroid cell transcriptome. CRISPR/Cas9-mediated ablation of two Alas2 intronic cis elements strongly reduces GATA-1-induced Alas2 transcription, heme biosynthesis, and surprisingly, GATA-1 regulation of other vital constituents of the erythroid cell transcriptome. Bypassing ALAS-2 function in Alas2 cis element-mutant cells by providing its catalytic product 5-aminolevulinic acid rescues heme biosynthesis and the GATA-1-dependent genetic network. Heme amplifies GATA-1 function by downregulating the heme-sensing transcriptional repressor Bach1 and via a Bach1-insensitive mechanism. Through this dual mechanism, heme and a master regulator collaborate to orchestrate a cell type-specific transcriptional program that promotes cellular differentiation.

Source
DOI: http://dx.doi.org/10.15252/embr.201541465
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC5290819
February 2016

Cis-regulatory mechanisms governing stem and progenitor cell transitions.

Sci Adv 2015 Sep 4;1(8):e1500503. Epub 2015 Sep 4.

Carbone Cancer Center, Department of Cell and Regenerative Biology, University of Wisconsin School of Medicine and Public Health, Madison, WI 53705, USA; University of Wisconsin-Madison Blood Research Program, Madison, WI 53705, USA.

Cis-element encyclopedias provide information on phenotypic diversity and disease mechanisms. Although cis-element polymorphisms and mutations are instructive, deciphering function remains challenging. Mutation of an intronic GATA motif (+9.5) in GATA2, encoding a master regulator of hematopoiesis, underlies an immunodeficiency associated with myelodysplastic syndrome (MDS) and acute myeloid leukemia (AML). Whereas an inversion relocalizes another GATA2 cis-element (-77) to the proto-oncogene EVI1, inducing EVI1 expression and AML, whether this reflects ectopic or physiological activity is unknown. We describe a mouse strain that decouples -77 function from proto-oncogene deregulation. The -77(-/-) mice exhibited a novel phenotypic constellation including late embryonic lethality and anemia. The -77 established a vital sector of the myeloid progenitor transcriptome, conferring multipotentiality. Unlike the +9.5(-/-) embryos, hematopoietic stem cell genesis was unaffected in -77(-/-) embryos. These results illustrate a paradigm in which cis-elements in a locus differentially control stem and progenitor cell transitions, and therefore the individual cis-element alterations cause unique and overlapping disease phenotypes.

Source
DOI: http://dx.doi.org/10.1126/sciadv.1500503
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC4643771
September 2015

Perm-seq: Mapping Protein-DNA Interactions in Segmental Duplication and Highly Repetitive Regions of Genomes with Prior-Enhanced Read Mapping.

PLoS Comput Biol 2015 Oct 20;11(10):e1004491. Epub 2015 Oct 20.

Department of Statistics, University of Wisconsin, Madison, Wisconsin, United States of America; Department of Biostatistics and Medical Informatics, University of Wisconsin, Madison, Wisconsin, United States of America.

Segmental duplications and other highly repetitive regions of genomes contribute significantly to cells' regulatory programs. Advancements in next-generation sequencing enabled genome-wide profiling of protein-DNA interactions by chromatin immunoprecipitation followed by high-throughput sequencing (ChIP-seq). However, interactions in highly repetitive regions of genomes have proven difficult to map since short reads of 50-100 base pairs (bps) from these regions map to multiple locations in reference genomes. Standard analytical methods discard such multi-mapping reads, and the few that can accommodate them are prone to large false positive and negative rates. We developed Perm-seq, a prior-enhanced read allocation method for ChIP-seq experiments that can allocate multi-mapping reads in highly repetitive regions of genomes with high accuracy. We comprehensively evaluated Perm-seq and found that our prior-enhanced approach significantly improves multi-read allocation accuracy over approaches that do not utilize additional data types. The statistical formalism underlying our approach facilitates supervising of multi-read allocation with a variety of data sources including histone ChIP-seq. We applied Perm-seq to 64 ENCODE ChIP-seq datasets from GM12878 and K562 cells and identified many novel protein-DNA interactions in segmental duplication regions. Our analysis reveals that although the protein-DNA interaction sites are evolutionarily less conserved in repetitive regions, they share the overall sequence characteristics of the protein-DNA interactions in non-repetitive regions.

Source
DOI: http://dx.doi.org/10.1371/journal.pcbi.1004491
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC4618727
October 2015

Hematopoietic Signaling Mechanism Revealed from a Stem/Progenitor Cell Cistrome.

Mol Cell 2015 Jul 11;59(1):62-74. Epub 2015 Jun 11.

Department of Cell and Regenerative Biology, Carbone Cancer Center, University of Wisconsin School of Medicine and Public Health, Madison, WI 53705, USA; UW-Madison Blood Research Program, Madison, WI 53706, USA.

Thousands of cis-elements in genomes are predicted to have vital functions. Although conservation, activity in surrogate assays, polymorphisms, and disease mutations provide functional clues, deletion from endogenous loci constitutes the gold-standard test. A GATA-2-binding, Gata2 intronic cis-element (+9.5) required for hematopoietic stem cell genesis in mice is mutated in a human immunodeficiency syndrome. Because +9.5 is the only cis-element known to mediate stem cell genesis, we devised a strategy to identify functionally comparable enhancers ("+9.5-like") genome-wide. Gene editing revealed +9.5-like activity to mediate GATA-2 occupancy, chromatin opening, and transcriptional activation. A +9.5-like element resided in Samd14, which encodes a protein of unknown function. Samd14 increased hematopoietic progenitor levels/activity and promoted signaling by a pathway vital for hematopoietic stem/progenitor cell regulation (stem cell factor/c-Kit), and c-Kit rescued Samd14 loss-of-function phenotypes. Thus, the hematopoietic stem/progenitor cell cistrome revealed a mediator of a signaling pathway that has broad importance for stem/progenitor cell biology.

Source
DOI: http://dx.doi.org/10.1016/j.molcel.2015.05.020
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC4499333
July 2015

EBSeq-HMM: a Bayesian approach for identifying gene-expression changes in ordered RNA-seq experiments.

Bioinformatics 2015 Aug 5;31(16):2614-22. Epub 2015 Apr 5.

Department of Biostatistics and Medical Informatics, University of Wisconsin, Madison, WI, USA.

Motivation: With improvements in next-generation sequencing technologies and reductions in price, ordered RNA-seq experiments are becoming common. Of primary interest in these experiments is identifying genes that are changing over time or space, for example, and then characterizing the specific expression changes. A number of robust statistical methods are available to identify genes showing differential expression among multiple conditions, but most assume conditions are exchangeable and thereby sacrifice power and precision when applied to ordered data.

Results: We propose an empirical Bayes mixture modeling approach called EBSeq-HMM. In EBSeq-HMM, an auto-regressive hidden Markov model is implemented to accommodate dependence in gene expression across ordered conditions. As demonstrated in simulation and case studies, the output proves useful in identifying differentially expressed genes and in specifying gene-specific expression paths. EBSeq-HMM may also be used for inference regarding isoform expression.
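The idea of modeling dependence across ordered conditions can be illustrated with a small hidden Markov model whose hidden states label each step as Up, Down, or EE (equally expressed), decoded with the Viterbi algorithm. This is a generic sketch: the Gaussian emissions on log fold changes, the transition probabilities, and every name below are assumptions for illustration and do not reproduce EBSeq-HMM's empirical Bayes model or the API of its R package.

```python
import math

STATES = ("Up", "Down", "EE")
MEANS = {"Up": 1.0, "Down": -1.0, "EE": 0.0}  # hypothetical mean log fold changes


def log_emission(state, lfc, sd=0.5):
    """Log density of an observed log fold change under a state's Gaussian."""
    z = (lfc - MEANS[state]) / sd
    return -0.5 * z * z - math.log(sd * math.sqrt(2 * math.pi))


def viterbi_path(lfcs, stay=0.6):
    """Most probable state sequence for a series of log fold changes."""
    switch = (1 - stay) / (len(STATES) - 1)
    log_trans = {
        (s, t): math.log(stay if s == t else switch)
        for s in STATES for t in STATES
    }
    # Uniform initial distribution over states.
    score = {s: -math.log(len(STATES)) + log_emission(s, lfcs[0]) for s in STATES}
    back = []
    for x in lfcs[1:]:
        prev, score, pointers = score, {}, {}
        for t in STATES:
            best = max(STATES, key=lambda s: prev[s] + log_trans[(s, t)])
            score[t] = prev[best] + log_trans[(best, t)] + log_emission(t, x)
            pointers[t] = best
        back.append(pointers)
    # Trace back from the best final state.
    state = max(STATES, key=score.get)
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    return path[::-1]


# Hypothetical log fold changes between consecutive time points for one gene.
path = viterbi_path([1.2, 0.9, 0.0, -1.1])
```

The decoded sequence plays the role of a gene-specific expression path: one label per transition between consecutive conditions, with the `stay` parameter encoding the assumption that neighboring conditions tend to behave alike.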

Availability And Implementation: An R package containing examples and sample datasets is available at Bioconductor.

Contact: kendzior@biostat.wisc.edu

Supplementary Information: Supplementary data are available at Bioinformatics online.
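To make the idea in the Results paragraph concrete, the following is a minimal sketch of how a hidden Markov model can label expression changes across ordered conditions. It is not the EBSeq-HMM implementation: it uses a plain (non-autoregressive) HMM with Gaussian emissions on log2 fold changes, and the state names, means, standard deviation, and transition probabilities are all illustrative assumptions.

```python
import math

# All parameters below are assumptions chosen for illustration only.
STATES = ("Up", "EE", "Down")                  # EE = equally expressed
MEANS = {"Up": 1.0, "EE": 0.0, "Down": -1.0}   # assumed mean log2 fold change
SIGMA = 0.5                                    # assumed emission std. deviation
STAY = 0.6                                     # assumed prob. of keeping a state


def log_emission(state, lfc):
    """Log density of a Gaussian emission for one log2 fold change."""
    z = (lfc - MEANS[state]) / SIGMA
    return -0.5 * z * z - math.log(SIGMA * math.sqrt(2 * math.pi))


def log_transition(prev, cur):
    """Sticky transitions: stay with prob. STAY, otherwise split evenly."""
    p = STAY if prev == cur else (1 - STAY) / (len(STATES) - 1)
    return math.log(p)


def viterbi_path(expr):
    """Most likely Up/EE/Down label for each consecutive pair of conditions."""
    lfcs = [math.log2(b / a) for a, b in zip(expr, expr[1:])]
    # score[s] = best log probability of any state path ending in s
    score = {s: -math.log(len(STATES)) + log_emission(s, lfcs[0]) for s in STATES}
    back = []
    for lfc in lfcs[1:]:
        new_score, pointers = {}, {}
        for cur in STATES:
            prev = max(STATES, key=lambda p: score[p] + log_transition(p, cur))
            new_score[cur] = (score[prev] + log_transition(prev, cur)
                              + log_emission(cur, lfc))
            pointers[cur] = prev
        score = new_score
        back.append(pointers)
    # Trace the best path backwards from the best final state.
    state = max(STATES, key=lambda s: score[s])
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    return path[::-1]


# Hypothetical mean expression of one gene across five ordered conditions.
print(viterbi_path([10, 20, 40, 40, 20]))
```

The returned list is a gene-specific "expression path" in the sense of the abstract: one label per transition between adjacent conditions.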

Source
http://dx.doi.org/10.1093/bioinformatics/btv193
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC4528625
August 2015

Evaluation of de novo transcriptome assemblies from RNA-Seq data.

Genome Biol 2014 Dec 21;15(12):553. Epub 2014 Dec 21.

De novo RNA-Seq assembly facilitates the study of transcriptomes for species without sequenced genomes, but it is challenging to select the most accurate assembly in this context. To address this challenge, we developed a model-based score, RSEM-EVAL, for evaluating assemblies when the ground truth is unknown. We show that RSEM-EVAL correctly reflects assembly accuracy, as measured by REF-EVAL, a refined set of ground-truth-based scores that we also developed. Guided by RSEM-EVAL, we assembled the transcriptome of the regenerating axolotl limb; this assembly compares favorably to a previous assembly. A software package implementing our methods, DETONATE, is freely available at http://deweylab.biostat.wisc.edu/detonate.

Source
http://dx.doi.org/10.1186/s13059-014-0553-5
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC4298084
December 2014

Gata2 cis-element is required for hematopoietic stem cell generation in the mammalian embryo.

J Exp Med 2013 Dec 2;210(13):2833-42. Epub 2013 Dec 2.

Department of Cell and Regenerative Biology, Carbone Cancer Center; McArdle Laboratory for Cancer Research; UW-Madison Blood Research Program; and Department of Biostatistics and Medical Informatics, University of Wisconsin School of Medicine and Public Health, Madison, WI 53705.

The generation of hematopoietic stem cells (HSCs) from hemogenic endothelium within the aorta, gonad, mesonephros (AGM) region of the mammalian embryo is crucial for development of the adult hematopoietic system. We described a deletion of a Gata2 cis-element (+9.5) that depletes fetal liver HSCs, is lethal at E13-14 of embryogenesis, and is mutated in an immunodeficiency that progresses to myelodysplasia/leukemia. Here, we demonstrate that the +9.5 element enhances Gata2 expression and is required to generate long-term repopulating HSCs in the AGM. Deletion of the +9.5 element abrogated the capacity of hemogenic endothelium to generate HSC-containing clusters in the aorta. Genomic analyses indicated that the +9.5 element regulated a rich ensemble of genes that control hemogenic endothelium and HSCs, as well as genes not implicated in hematopoiesis. These results reveal a mechanism that controls stem cell emergence from hemogenic endothelium to establish the adult hematopoietic system.

Source
http://dx.doi.org/10.1084/jem.20130733
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3865483
December 2013

Bicaudal-C spatially controls translation of vertebrate maternal mRNAs.

RNA 2013 Nov 23;19(11):1575-82. Epub 2013 Sep 23.

The Xenopus Cripto-1 protein is confined to the cells of the animal hemisphere during early embryogenesis where it regulates the formation of anterior structures. Cripto-1 protein accumulates only in animal cells because cripto-1 mRNA in cells of the vegetal hemisphere is translationally repressed. Here, we show that the RNA binding protein, Bicaudal-C (Bic-C), functioned directly in this vegetal cell-specific repression. While Bic-C protein is normally confined to vegetal cells, ectopic expression of Bic-C in animal cells repressed a cripto-1 mRNA reporter and associated with endogenous cripto-1 mRNA. Repression by Bic-C required its N-terminal domain, comprised of multiple KH motifs, for specific binding to relevant control elements within the cripto-1 mRNA and a functionally separable C-terminal translation repression domain. Bic-C-mediated repression required the 5' CAP and translation initiation factors, but not a poly(A) tail or the conserved SAM domain within Bic-C. Bic-C-directed immunoprecipitation followed by deep sequencing of associated mRNAs identified multiple Bic-C-regulated mRNA targets, including cripto-1 mRNA, providing new insights and tools for understanding the role of Bic-C in vertebrate development.

Source
http://dx.doi.org/10.1261/rna.041665.113
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3851724
November 2013

Inference of alternative splicing from RNA-Seq data with probabilistic splice graphs.

Bioinformatics 2013 Sep 11;29(18):2300-10. Epub 2013 Jul 11.

Department of Computer Sciences, University of Wisconsin, Madison, WI 53706, USA.

Motivation: Alternative splicing and other processes that allow for different transcripts to be derived from the same gene are significant forces in the eukaryotic cell. RNA-Seq is a promising technology for analyzing alternative transcripts, as it does not require prior knowledge of transcript structures or genome sequences. However, analysis of RNA-Seq data in the presence of genes with large numbers of alternative transcripts is currently challenging due to efficiency, identifiability and representation issues.

Results: We present RNA-Seq models and associated inference algorithms based on the concept of probabilistic splice graphs, which alleviate these issues. We prove that our models are often identifiable and demonstrate that our inference methods for quantification and differential processing detection are efficient and accurate.

Availability: Software implementing our methods is available at http://deweylab.biostat.wisc.edu/psginfer.

Contact: cdewey@biostat.wisc.edu

Supplementary Information: Supplementary data are available at Bioinformatics online.
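As a rough illustration of the probabilistic splice graph idea named in the Results paragraph, the sketch below treats a gene as a directed acyclic graph whose edge weights are transition probabilities, so that the probability of a transcript is the product of the edge weights along its path. The graph, exon names, and weights are hypothetical, and this is not the psginfer software or its inference algorithm.

```python
from math import prod  # Python 3.8+

# Hypothetical splice graph: vertices are exon segments plus START/END.
# Each edge weight is the (assumed) probability of taking that edge
# from its source vertex, so weights leaving a vertex sum to 1.
EDGES = {
    "START": {"e1": 1.0},
    "e1": {"e2": 0.7, "e3": 0.3},   # e2 is a skippable (cassette) exon
    "e2": {"e3": 1.0},
    "e3": {"END": 1.0},
}


def path_probability(path):
    """Probability of one transcript: product of its edge weights."""
    return prod(EDGES[u][v] for u, v in zip(path, path[1:]))


def enumerate_transcripts(vertex="START", prefix=("START",)):
    """Yield (path, probability) for every START-to-END path."""
    if vertex == "END":
        yield prefix, path_probability(prefix)
        return
    for nxt in EDGES[vertex]:
        yield from enumerate_transcripts(nxt, prefix + (nxt,))


for path, prob in enumerate_transcripts():
    print(" -> ".join(path), round(prob, 3))
```

A compact graph of this kind can represent many alternative transcripts with few parameters, which is one plausible reading of how such models ease the efficiency and representation issues mentioned in the Motivation paragraph.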

Source
http://dx.doi.org/10.1093/bioinformatics/btt396
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3753571
September 2013

De novo transcript sequence reconstruction from RNA-seq using the Trinity platform for reference generation and analysis.

Nat Protoc 2013 Aug 11;8(8):1494-512. Epub 2013 Jul 11.

Broad Institute of MIT and Harvard, 7 Cambridge Center, Cambridge, MA, 02142, USA.

De novo assembly of RNA-seq data enables researchers to study transcriptomes without the need for a genome sequence; this approach can be usefully applied, for instance, in research on 'non-model organisms' of ecological and evolutionary importance, cancer samples or the microbiome. In this protocol we describe the use of the Trinity platform for de novo transcriptome assembly from RNA-seq data in non-model organisms. We also present Trinity-supported companion utilities for downstream applications, including RSEM for transcript abundance estimation, R/Bioconductor packages for identifying differentially expressed transcripts across samples and approaches to identify protein-coding genes. In the procedure, we provide a workflow for genome-independent transcriptome analysis leveraging the Trinity platform. The software, documentation and demonstrations are freely available from http://trinityrnaseq.sourceforge.net. The run time of this protocol is highly dependent on the size and complexity of data to be analyzed. The example data set analyzed in the procedure detailed herein can be processed in less than 5 h.

Source
http://dx.doi.org/10.1038/nprot.2013.084
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3875132
August 2013

Comparative RNA-seq analysis in the unsequenced axolotl: the oncogene burst highlights early gene expression in the blastema.

PLoS Comput Biol 2013 7;9(3):e1002936. Epub 2013 Mar 7.

Regenerative Biology, Morgridge Institute for Research, Madison, Wisconsin, United States of America.

The salamander has the remarkable ability to regenerate its limb after amputation. Cells at the site of amputation form a blastema and then proliferate and differentiate to regrow the limb. To better understand this process, we performed deep RNA sequencing of the blastema over a time course in the axolotl, a species whose genome has not been sequenced. Using a novel comparative approach to analyzing RNA-seq data, we characterized the transcriptional dynamics of the regenerating axolotl limb with respect to the human gene set. This approach involved de novo assembly of axolotl transcripts, RNA-seq transcript quantification without a reference genome, and transformation of abundances from axolotl contigs to human genes. We found a prominent burst in oncogene expression during the first day and blastemal/limb bud genes peaking at 7 to 14 days. In addition, we found that limb patterning genes, SALL genes, and genes involved in angiogenesis, wound healing, defense/immunity, and bone development are enriched during blastema formation and development. Finally, we identified a category of genes with no prior literature support for limb regeneration that are candidates for further evaluation based on their expression pattern during the regenerative process.

Source
http://dx.doi.org/10.1371/journal.pcbi.1002936
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3591270
November 2013

Rbm20 regulates titin alternative splicing as a splicing repressor.

Nucleic Acids Res 2013 Feb 9;41(4):2659-72. Epub 2013 Jan 9.

Muscle Biology Laboratory, Department of Animal Sciences, University of Wisconsin-Madison, Madison, WI 53706, USA.

Titin, a sarcomeric protein expressed primarily in striated muscles, is responsible for maintaining the structure and biomechanical properties of muscle cells. Cardiac titin undergoes developmental size reduction from 3.7 megadaltons in neonates to primarily 2.97 megadaltons in the adult. This size reduction results from gradually increased exon skipping between exons 50 and 219 of titin mRNA. Our previous study reported that Rbm20 is the splicing factor responsible for this process. In this work, we investigated its molecular mechanism. We demonstrate that Rbm20 mediates exon skipping by binding to titin pre-mRNA to repress the splicing of some regions; the exons/introns in these Rbm20-repressed regions are ultimately skipped. Rbm20 was also found to mediate intron retention and exon shuffling. The two Rbm20 speckles found in nuclei from muscle tissues were identified as aggregates of Rbm20 protein on the partially processed titin pre-mRNAs. Cooperative repression and alternative 3' splice site selection were found to be used by Rbm20 to skip different subsets of titin exons, and the splicing pathway selected depended on the ratio of Rbm20 to other splicing factors that vary with tissue type and developmental age.

Source
http://dx.doi.org/10.1093/nar/gks1362
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3575840
February 2013

Genomic variation in natural populations of Drosophila melanogaster.

Genetics 2012 Oct 5;192(2):533-98. Epub 2012 Jun 5.

Department of Evolution and Ecology, University of California, Davis, CA 95616, USA.

This report of independent genome sequences of two natural populations of Drosophila melanogaster (37 from North America and 6 from Africa) provides unique insight into forces shaping genomic polymorphism and divergence. Evidence of interactions between natural selection and genetic linkage is abundant not only in centromere- and telomere-proximal regions, but also throughout the euchromatic arms. Linkage disequilibrium, which decays within 1 kbp, exhibits a strong bias toward coupling of the more frequent alleles and provides a high-resolution map of recombination rate. The juxtaposition of population genetics statistics in small genomic windows with gene structures and chromatin states yields a rich, high-resolution annotation, including the following: (1) 5'- and 3'-UTRs are enriched for regions of reduced polymorphism relative to lineage-specific divergence; (2) exons overlap with windows of excess relative polymorphism; (3) epigenetic marks associated with active transcription initiation sites overlap with regions of reduced relative polymorphism and relatively reduced estimates of the rate of recombination; (4) the rate of adaptive nonsynonymous fixation increases with the rate of crossing over per base pair; and (5) both duplications and deletions are enriched near origins of replication and their density correlates negatively with the rate of crossing over. Available demographic models of X and autosome descent cannot account for the increased divergence on the X and loss of diversity associated with the out-of-Africa migration. Comparison of the variation among these genomes to variation among genomes from D. simulans suggests that many targets of directional selection are shared between these species.

Source
http://dx.doi.org/10.1534/genetics.112.142018
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3454882
October 2012

Whole-genome alignment.

Authors:
Colin N Dewey

Methods Mol Biol 2012;855:237-57

Biostatistics and Medical Informatics and Computer Sciences, Genome Center of Wisconsin, University of Wisconsin-Madison, Madison, WI, USA.

Whole-genome alignment (WGA) is the prediction of evolutionary relationships at the nucleotide level between two or more genomes. It combines aspects of both colinear sequence alignment and gene orthology prediction, and is typically more challenging to address than either of these tasks due to the size and complexity of whole genomes. Despite the difficulty of this problem, numerous methods have been developed for its solution because WGAs are valuable for genome-wide analyses, such as phylogenetic inference, genome annotation, and function prediction. In this chapter, we discuss the meaning and significance of WGA and present an overview of the methods that address it. We also examine the problem of evaluating whole-genome aligners and offer a set of methodological challenges that need to be tackled in order to make the most effective use of our rapidly growing databases of whole genomes.

Source
http://dx.doi.org/10.1007/978-1-61779-582-4_8
July 2012

Sequence Surveyor: leveraging overview for scalable genomic alignment visualization.

IEEE Trans Vis Comput Graph 2011 Dec;17(12):2392-401

University of Wisconsin-Madison, USA.

In this paper, we introduce overview visualization tools for large-scale multiple genome alignment data. Genome alignment visualization and, more generally, sequence alignment visualization are an important tool for understanding genomic sequence data. As sequencing techniques improve and more data become available, greater demand is being placed on visualization tools to scale to the size of these new datasets. When viewing such large data, we necessarily cannot convey details; rather, we specifically design overview tools to help elucidate large-scale patterns. Perceptual science, signal processing theory, and generality provide a framework for the design of such visualizations that can scale well beyond current approaches. We present Sequence Surveyor, a prototype that embodies these ideas for scalable multiple whole-genome alignment overview visualization. Sequence Surveyor visualizes sequences in parallel, displaying data using variable color, position, and aggregation encodings. We demonstrate how perceptual science can inform the design of visualization techniques that remain visually manageable at scale and how signal processing concepts can inform aggregation schemes that highlight global trends, outliers, and overall data distributions as the problem scales. These techniques allow us to visualize alignments with over 100 whole bacterial-sized genomes.

Source
http://dx.doi.org/10.1109/TVCG.2011.232
December 2011

RSEM: accurate transcript quantification from RNA-Seq data with or without a reference genome.

Authors:
Bo Li, Colin N Dewey

BMC Bioinformatics 2011 Aug 4;12:323. Epub 2011 Aug 4.

Department of Computer Sciences, University of Wisconsin-Madison, Madison, WI, USA.

Background: RNA-Seq is revolutionizing the way transcript abundances are measured. A key challenge in transcript quantification from RNA-Seq data is the handling of reads that map to multiple genes or isoforms. This issue is particularly important for quantification with de novo transcriptome assemblies in the absence of sequenced genomes, as it is difficult to determine which transcripts are isoforms of the same gene. A second significant issue is the design of RNA-Seq experiments, in terms of the number of reads, read length, and whether reads come from one or both ends of cDNA fragments.

Results: We present RSEM, a user-friendly software package for quantifying gene and isoform abundances from single-end or paired-end RNA-Seq data. RSEM outputs abundance estimates, 95% credibility intervals, and visualization files and can also simulate RNA-Seq data. In contrast to other existing tools, the software does not require a reference genome. Thus, in combination with a de novo transcriptome assembler, RSEM enables accurate transcript quantification for species without sequenced genomes. On simulated and real data sets, RSEM has superior or comparable performance to quantification methods that rely on a reference genome. Taking advantage of RSEM's ability to effectively use ambiguously-mapping reads, we show that accurate gene-level abundance estimates are best obtained with large numbers of short single-end reads. On the other hand, estimates of the relative frequencies of isoforms within single genes may be improved through the use of paired-end reads, depending on the number of possible splice forms for each gene.

Conclusions: RSEM is an accurate and user-friendly software tool for quantifying transcript abundances from RNA-Seq data. As it does not rely on the existence of a reference genome, it is particularly useful for quantification with de novo transcriptome assemblies. In addition, RSEM has enabled valuable guidance for cost-efficient design of quantification experiments with RNA-Seq, which is currently relatively expensive.
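The handling of reads that map to multiple isoforms, described in the Background and Results paragraphs, can be illustrated with a toy expectation-maximization loop. This is only a sketch under strong simplifying assumptions (every compatible transcript is equally likely to have produced a read apart from its abundance, and transcript length, read quality, and fragment orientation are ignored); it is not RSEM's statistical model or code, and the function name and inputs are hypothetical.

```python
def em_abundances(reads, transcripts, n_iter=100):
    """Estimate the fraction of reads from each transcript.

    reads: list of tuples, each holding the transcripts a read maps to.
    transcripts: list of all transcript names.
    """
    # Start from a uniform guess.
    theta = {t: 1.0 / len(transcripts) for t in transcripts}
    for _ in range(n_iter):
        # E-step: split each read among its compatible transcripts
        # in proportion to the current abundance estimates.
        counts = {t: 0.0 for t in transcripts}
        for compatible in reads:
            total = sum(theta[t] for t in compatible)
            for t in compatible:
                counts[t] += theta[t] / total
        # M-step: renormalize expected counts into abundances.
        theta = {t: counts[t] / len(reads) for t in transcripts}
    return theta


# Hypothetical data: two reads unique to A, one unique to B, one ambiguous.
reads = [("A",), ("A",), ("B",), ("A", "B")]
print(em_abundances(reads, ["A", "B"]))
```

In this toy case the ambiguous read is shared roughly two to one in favor of A, because the uniquely mapping reads already suggest that A is more abundant.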

Source
http://dx.doi.org/10.1186/1471-2105-12-323
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3163565
August 2011

Discovering transcription factor binding sites in highly repetitive regions of genomes with multi-read analysis of ChIP-Seq data.

PLoS Comput Biol 2011 Jul 14;7(7):e1002111. Epub 2011 Jul 14.

Department of Statistics, University of Wisconsin, Madison, Wisconsin, United States of America.

Chromatin immunoprecipitation followed by high-throughput sequencing (ChIP-seq) is rapidly replacing chromatin immunoprecipitation combined with genome-wide tiling array analysis (ChIP-chip) as the preferred approach for mapping transcription-factor binding sites and chromatin modifications. The state of the art for analyzing ChIP-seq data relies on using only reads that map uniquely to a relevant reference genome (uni-reads). This can lead to the omission of up to 30% of alignable reads. We describe a general approach for utilizing reads that map to multiple locations on the reference genome (multi-reads). Our approach is based on allocating multi-reads as fractional counts using a weighted alignment scheme. Using human STAT1 and mouse GATA1 ChIP-seq datasets, we illustrate that incorporation of multi-reads significantly increases sequencing depths, leads to detection of novel peaks that are not otherwise identifiable with uni-reads, and improves detection of peaks in mappable regions. We investigate various genome-wide characteristics of peaks detected only by utilization of multi-reads via computational experiments. Overall, peaks from multi-read analysis have similar characteristics to peaks that are identified by uni-reads except that the majority of them reside in segmental duplications. We further validate a number of GATA1 multi-read only peaks by independent quantitative real-time ChIP analysis and identify novel target genes of GATA1. These computational and experimental results establish that multi-reads can be of critical importance for studying transcription factor binding in highly repetitive regions of genomes with ChIP-seq experiments.
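The allocation of multi-reads as fractional counts can be sketched as follows. The abstract does not spell out the weighting scheme, so this example assumes, purely for illustration, that each candidate location is weighted by the number of uni-reads already observed there plus a pseudocount; the function name, bin labels, and pseudocount are hypothetical.

```python
def allocate_multireads(uni_counts, multi_reads, pseudocount=1.0):
    """Distribute each multi-read across its candidate locations.

    uni_counts: dict mapping a genomic bin to its uni-read count.
    multi_reads: list of lists, each holding the candidate bins of one read.
    Returns a dict of total (uni + fractional multi) counts per bin.
    """
    counts = dict(uni_counts)  # copy so the input is not modified
    for candidates in multi_reads:
        # Assumed weighting: uni-read support at each candidate bin,
        # with a pseudocount so unsupported bins still get some weight.
        weights = [uni_counts.get(loc, 0) + pseudocount for loc in candidates]
        total = sum(weights)
        for loc, w in zip(candidates, weights):
            counts[loc] = counts.get(loc, 0) + w / total
    return counts


# Hypothetical example: one multi-read that aligns to two bins.
print(allocate_multireads({"binA": 9, "binB": 0}, [["binA", "binB"]]))
```

Each multi-read contributes exactly one count in total, so sequencing depth increases by the number of multi-reads, while the fractions favor locations with stronger independent support.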

Source
http://dx.doi.org/10.1371/journal.pcbi.1002111
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3136429
July 2011

Positional orthology: putting genomic evolutionary relationships into context.

Authors:
Colin N Dewey

Brief Bioinform 2011 Sep 24;12(5):401-12. Epub 2011 Jun 24.

Department of Biostatistics and Medical Informatics, University of Wisconsin-Madison, 5785 Medical Sciences Center, 1300 University Ave, Madison, WI 53706, USA.

Orthology is a powerful refinement of homology that allows us to describe more precisely the evolution of genomes and understand the function of the genes they contain. However, because orthology is not concerned with genomic position, it is limited in its ability to describe genes that are likely to have equivalent roles in different genomes. Because of this limitation, the concept of 'positional orthology' has emerged, which describes the relation between orthologous genes that retain their ancestral genomic positions. In this review, we formally define this concept, for which we introduce the shorter term 'toporthology', with respect to the evolutionary events experienced by a gene's ancestors. Through a discussion of recent studies on the role of genomic context in gene evolution, we show that the distinction between orthology and toporthology is biologically significant. We then review a number of orthology prediction methods that take genomic context into account and thus that may be used to infer the important relation of toporthology.

Source
http://dx.doi.org/10.1093/bib/bbr040
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3178058
September 2011