Publications by authors named "Jason G Underwood"

35 Publications

Robust single-cell discovery of RNA targets of RNA-binding proteins and ribosomes.

Nat Methods 2021 May 7;18(5):507-519. Epub 2021 May 7.

Department of Cellular and Molecular Medicine, University of California San Diego, La Jolla, CA, USA.

RNA-binding proteins (RBPs) are critical regulators of gene expression and RNA processing that are required for gene function. Yet the dynamics of RBP regulation in single cells is unknown. To address this gap in understanding, we developed STAMP (Surveying Targets by APOBEC-Mediated Profiling), which efficiently detects RBP-RNA interactions. STAMP does not rely on ultraviolet cross-linking or immunoprecipitation and, when coupled with single-cell capture, can identify RBP-specific and cell-type-specific RNA-protein interactions for multiple RBPs and cell types in single, pooled experiments. Pairing STAMP with long-read sequencing yields RBP target sites in an isoform-specific manner. Finally, Ribo-STAMP leverages small ribosomal subunits to measure transcriptome-wide ribosome association in single cells. STAMP enables the study of RBP-RNA interactomes and translational landscapes with unprecedented cellular resolution.

Source
DOI Listing: http://dx.doi.org/10.1038/s41592-021-01128-0
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC8148648
May 2021

A high-quality bonobo genome refines the analysis of hominid evolution.

Nature 2021 Jun 5;594(7861):77-81. Epub 2021 May 5.

National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD, USA.

The divergence of chimpanzee and bonobo provides one of the few examples of recent hominid speciation. Here we describe a fully annotated, high-quality bonobo genome assembly, which was constructed without guidance from reference genomes by applying a multiplatform genomics approach. We generate a bonobo genome assembly in which more than 98% of genes are completely annotated and 99% of the gaps are closed, including the resolution of about half of the segmental duplications and almost all of the full-length mobile elements. We compare the bonobo genome to those of other great apes and identify more than 5,569 fixed structural variants that specifically distinguish the bonobo and chimpanzee lineages. We focus on genes that have been lost, changed in structure or expanded in the last few million years of bonobo evolution. We produce a high-resolution map of incomplete lineage sorting and estimate that around 5.1% of the human genome is genetically closer to chimpanzee or bonobo and that more than 36.5% of the genome shows incomplete lineage sorting if we consider a deeper phylogeny including gorilla and orangutan. We also show that 26% of the segments of incomplete lineage sorting between human and chimpanzee or human and bonobo are non-randomly distributed and that genes within these clustered segments show significant excess of amino acid replacement compared to the rest of the genome.

Source
DOI Listing: http://dx.doi.org/10.1038/s41586-021-03519-x
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC8172381
June 2021

Sequence diversity analyses of an improved rhesus macaque genome enhance its biomedical utility.

Science 2020 12;370(6523)

Department of Biology, University of Bari 'Aldo Moro', 70125 Bari, Italy.

The rhesus macaque (Macaca mulatta) is the most widely studied nonhuman primate (NHP) in biomedical research. We present an updated reference genome assembly (Mmul_10, contig N50 = 46 Mbp) that increases the sequence contiguity 120-fold and annotate it using 6.5 million full-length transcripts, thus improving our understanding of gene content, isoform diversity, and repeat organization. With the improved assembly of segmental duplications, we discovered new lineage-specific genes and expanded gene families that are potentially informative in studies of evolution and disease susceptibility. Whole-genome sequencing (WGS) data from 853 rhesus macaques identified 85.7 million single-nucleotide variants (SNVs) and 10.5 million indel variants, including potentially damaging variants in genes associated with human autism and developmental delay, providing a framework for developing noninvasive NHP models of human disease.

Source
DOI Listing: http://dx.doi.org/10.1126/science.abc6617
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC7818670
December 2020

Massively multiplex single-molecule oligonucleosome footprinting.

Elife 2020 12 2;9. Epub 2020 Dec 2.

Department of Biochemistry & Biophysics, University of California San Francisco, San Francisco, United States.

Our understanding of the beads-on-a-string arrangement of nucleosomes has been built largely on high-resolution sequence-agnostic imaging methods and sequence-resolved bulk biochemical techniques. To bridge the divide between these approaches, we present the single-molecule adenine methylated oligonucleosome sequencing assay (SAMOSA). SAMOSA is a high-throughput single-molecule sequencing method that combines adenine methyltransferase footprinting and single-molecule real-time DNA sequencing to natively and nondestructively measure nucleosome positions on individual chromatin fibres. SAMOSA data allows unbiased classification of single-molecular 'states' of nucleosome occupancy on individual chromatin fibres. We leverage this to estimate nucleosome regularity and spacing on single chromatin fibres genome-wide, at predicted transcription factor binding motifs, and across human epigenomic domains. Our analyses suggest that chromatin is comprised of both regular and irregular single-molecular oligonucleosome patterns that differ subtly in their relative abundance across epigenomic domains. This irregularity is particularly striking in constitutive heterochromatin, which has typically been viewed as a conformationally static entity. Our proof-of-concept study provides a powerful new methodology for studying nucleosome organization at a previously intractable resolution and offers up new avenues for modeling and visualizing higher order chromatin structure.

Source
DOI Listing: http://dx.doi.org/10.7554/eLife.59404
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC7735760
December 2020

An evolutionary driver of interspersed segmental duplications in primates.

Genome Biol 2020 08 10;21(1):202. Epub 2020 Aug 10.

Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, 98195, USA.

Background: The complex interspersed pattern of segmental duplications in humans is responsible for rearrangements associated with neurodevelopmental disease, including the emergence of novel genes important in human brain evolution. We investigate the evolution of LCR16a, a putative driver of this phenomenon that encodes one of the most rapidly evolving human-ape gene families, nuclear pore interacting protein (NPIP).

Results: Comparative analysis shows that LCR16a has independently expanded in five primate lineages over the last 35 million years of primate evolution. The expansions are associated with independent lineage-specific segmental duplications flanking LCR16a leading to the emergence of large interspersed duplication blocks at non-orthologous chromosomal locations in each primate lineage. The intron-exon structure of the NPIP gene family has changed dramatically throughout primate evolution with different branches showing characteristic gene models yet maintaining an open reading frame. In the African ape lineage, we detect signatures of positive selection that occurred after a transition to more ubiquitous expression among great ape tissues when compared to Old World and New World monkeys. Mouse transgenic experiments from baboon and human genomic loci confirm these expression differences and suggest that the broader ape expression pattern arose due to mutational changes that emerged in cis.

Conclusions: LCR16a promotes serial interspersed duplications and creates hotspots of genomic instability that appear to be an ancient property of primate genomes. Dramatic changes to NPIP gene structure and altered tissue expression preceded major bouts of positive selection in the African ape lineage, suggestive of a gene undergoing strong adaptive evolution.

Source
DOI Listing: http://dx.doi.org/10.1186/s13059-020-02074-4
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC7419210
August 2020

ORF Capture-Seq as a versatile method for targeted identification of full-length isoforms.

Nat Commun 2020 05 11;11(1):2326. Epub 2020 May 11.

Center for Cancer Systems Biology (CCSB), Dana-Farber Cancer Institute, Boston, MA, 02215, USA.

Most human protein-coding genes are expressed as multiple isoforms, which greatly expands the functional repertoire of the encoded proteome. While at least one reliable open reading frame (ORF) model has been assigned for every coding gene, the majority of alternative isoforms remains uncharacterized due to (i) vast differences of overall levels between different isoforms expressed from common genes, and (ii) the difficulty of obtaining full-length transcript sequences. Here, we present ORF Capture-Seq (OCS), a flexible method that addresses both challenges for targeted full-length isoform sequencing applications using collections of cloned ORFs as probes. As a proof-of-concept, we show that an OCS pipeline focused on genes coding for transcription factors increases isoform detection by an order of magnitude when compared to unenriched samples. In short, OCS enables rapid discovery of isoforms from custom-selected genes and will accelerate mapping of the human transcriptome.

Source
DOI Listing: http://dx.doi.org/10.1038/s41467-020-16174-z
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC7214433
May 2020

Adaptive archaic introgression of copy number variants and the discovery of previously unknown human genes.

Science 2019 10;366(6463)

Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA.

Copy number variants (CNVs) are subject to stronger selective pressure than single-nucleotide variants, but their roles in archaic introgression and adaptation have not been systematically investigated. We show that stratified CNVs are significantly associated with signatures of positive selection in Melanesians and provide evidence for adaptive introgression of large CNVs at chromosomes 16p11.2 and 8p21.3 from Denisovans and Neanderthals, respectively. Using long-read sequence data, we reconstruct the structure and complex evolutionary history of these polymorphisms and show that both encode positively selected genes absent from most human populations. Our results collectively suggest that large CNVs originating in archaic hominins and introgressed into modern humans have played an important role in local population adaptation and represent an insufficiently studied source of large-scale genetic variation.

Source
DOI Listing: http://dx.doi.org/10.1126/science.aax2083
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6860971
October 2019

In This Issue, Volume 14, Issue 6.

ACS Chem Biol 2019 06;14(6):1065

Source
DOI Listing: http://dx.doi.org/10.1021/acschembio.9b00441
June 2019

In This Issue, Volume 14, Issue 5.

ACS Chem Biol 2019 05;14(5):822-823

Source
DOI Listing: http://dx.doi.org/10.1021/acschembio.9b00352
May 2019

In This Issue, Volume 14, Issue 4.

ACS Chem Biol 2019 04;14(4):566

Source
DOI Listing: http://dx.doi.org/10.1021/acschembio.9b00261
April 2019

In This Issue, Volume 13, Issue 3.

ACS Chem Biol 2018 03;13(3):495

Source
DOI Listing: http://dx.doi.org/10.1021/acschembio.8b00207
March 2018

In This Issue, Volume 13, Issue 4.

ACS Chem Biol 2018 04;13(4):841

Source
DOI Listing: http://dx.doi.org/10.1021/acschembio.8b00310
April 2018

In This Issue, Volume 14, Issue 1.

ACS Chem Biol 2019 01;14(1)

Source
DOI Listing: http://dx.doi.org/10.1021/acschembio.9b00017
January 2019

In This Issue, Volume 14, Issue 2.

ACS Chem Biol 2019 02;14(2):141

Source
DOI Listing: http://dx.doi.org/10.1021/acschembio.9b00100
February 2019

In This Issue, Volume 13, Issue 11.

ACS Chem Biol 2018 11;13(11):3042

Source
DOI Listing: http://dx.doi.org/10.1021/acschembio.8b00980
November 2018

In This Issue, Volume 13, Issue 10.

ACS Chem Biol 2018 10;13(10):2824

Source
DOI Listing: http://dx.doi.org/10.1021/acschembio.8b00897
October 2018

In This Issue, Volume 13, Issue 9.

ACS Chem Biol 2018 09;13(9):2359

Source
DOI Listing: http://dx.doi.org/10.1021/acschembio.8b00836
September 2018

Transcriptional fates of human-specific segmental duplications in brain.

Genome Res 2018 10 18;28(10):1566-1576. Epub 2018 Sep 18.

Department of Genome Sciences, University of Washington School of Medicine, Seattle, Washington 98195, USA.

Despite the importance of duplicate genes for evolutionary adaptation, accurate gene annotation is often incomplete, incorrect, or lacking in regions of segmental duplication. We developed an approach combining long-read sequencing and hybridization capture to yield full-length transcript information and confidently distinguish between nearly identical genes/paralogs. We used biotinylated probes to enrich for full-length cDNA from duplicated regions, which were then amplified, size-fractionated, and sequenced using single-molecule, long-read sequencing technology, permitting us to distinguish between highly identical genes by virtue of multiple paralogous sequence variants. We examined 19 gene families as expressed in developing and adult human brain, selected for their high sequence identity (average >99%) and overlap with human-specific segmental duplications (SDs). We characterized the transcriptional differences between related paralogs to better understand the birth-death process of duplicate genes and particularly how the process leads to gene innovation. In 48% of the cases, we find that the expressed duplicates have changed substantially from their ancestral models due to novel sites of transcription initiation, splicing, and polyadenylation, as well as fusion transcripts that connect duplication-derived exons with neighboring genes. We detect unannotated open reading frames in genes currently annotated as pseudogenes, while relegating other duplicates to nonfunctional status. Our method significantly improves gene annotation, specifically defining full-length transcripts, isoforms, and open reading frames for new genes in highly identical SDs. The approach will be more broadly applicable to genes in structurally complex regions of other genomes where the duplication process creates novel genes important for adaptive traits.

Source
DOI Listing: http://dx.doi.org/10.1101/gr.237610.118
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6169893
October 2018

In This Issue, Volume 13, Issue 7.

ACS Chem Biol 2018 07;13(7):1699

Source
DOI Listing: http://dx.doi.org/10.1021/acschembio.8b00612
July 2018

Genomic characterization of the RH locus detects complex and novel structural variation in multi-ethnic cohorts.

Genet Med 2019 02 29;21(2):477-486. Epub 2018 Jun 29.

Bloodworks NW Research Institute, Seattle, Washington, USA.

Purpose: Rh antigens can provoke severe alloimmune reactions, particularly in high-risk transfusion contexts, such as sickle cell disease. Rh antigens are encoded by the paralogs, RHD and RHCE, located in one of the most complex genetic loci. Our goal was to characterize RH genetic variation in multi-ethnic cohorts, with the focus on detecting RH structural variation (SV).

Methods: We customized analytical methods to estimate paralog-specific copy number from next-generation sequencing (NGS) data. We applied these methods to clinically characterized samples, including four World Health Organization (WHO) genotyping references and 1135 Asian and Native American blood donors. Subsequently, we surveyed 1715 African American samples from the Jackson Heart Study.

Results: Most samples in each dataset exhibited SV. SV detection enabled prediction of the immunogenic RhD and RhC antigens in concordance (>99%) with serological phenotyping. RhC antigen expression was associated with exon 2 hybrid alleles (RHCE*CE-D(2)-CE). Clinically relevant exon 4-7 hybrid alleles (RHD*D-CE(4-7)-D) and exon 9 hybrid alleles (RHCE*CE-D(9)-CE) were prevalent in African Americans.

Conclusion: This study shows custom NGS methods can accurately detect RH SV, and that SV is important to inform prediction of relevant RH alleles. Additionally, this study provides the first large NGS survey of RH alleles in African Americans.

Source
DOI Listing: http://dx.doi.org/10.1038/s41436-018-0074-9
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6311147
February 2019

Comparative Annotation Toolkit (CAT) - simultaneous clade and personal genome annotation.

Genome Res 2018 07 8;28(7):1029-1038. Epub 2018 Jun 8.

Genomics Institute, University of California Santa Cruz and Howard Hughes Medical Institute, Santa Cruz, California 95064, USA.

The recent introductions of low-cost, long-read, and read-cloud sequencing technologies coupled with intense efforts to develop efficient algorithms have made affordable, high-quality de novo sequence assembly a realistic proposition. The result is an explosion of new, ultracontiguous genome assemblies. To compare these genomes, we need robust methods for genome annotation. We describe the fully open source Comparative Annotation Toolkit (CAT), which provides a flexible way to simultaneously annotate entire clades and identify orthology relationships. We show that CAT can be used to improve annotations on the rat genome, annotate the great apes, annotate a diverse set of mammals, and annotate personal, diploid human genomes. We demonstrate the resulting discovery of novel genes, isoforms, and structural variants (even in genomes as well studied as rat and the great apes) and how these annotations improve cross-species RNA expression experiments.

Source
DOI Listing: http://dx.doi.org/10.1101/gr.233460.117
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6028123
July 2018

High-resolution comparative analysis of great ape genomes.

Science 2018 06;360(6393)

Bionano Genomics, San Diego, CA 92121, USA.

Genetic studies of human evolution require high-quality contiguous ape genome assemblies that are not guided by the human reference. We coupled long-read sequence assembly and full-length complementary DNA sequencing with a multiplatform scaffolding approach to produce ab initio chimpanzee and orangutan genome assemblies. By comparing these with two long-read de novo human genome assemblies and a gorilla genome assembly, we characterized lineage-specific and shared great ape genetic variation ranging from single- to mega-base pair-sized variants. We identified ~17,000 fixed human-specific structural variants identifying genic and putative regulatory changes that have emerged in humans since divergence from nonhuman apes. Interestingly, these variants are enriched near genes that are down-regulated in human compared to chimpanzee cerebral organoids, particularly in cells analogous to radial glial neural progenitors.

Source
DOI Listing: http://dx.doi.org/10.1126/science.aar6343
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6178954
June 2018

In This Issue, Volume 13, Issue 1.

ACS Chem Biol 2018 01;13(1)

Source
DOI Listing: http://dx.doi.org/10.1021/acschembio.8b00013
January 2018

Fallacy of the Unique Genome: Sequence Diversity within Single Helicobacter pylori Strains.

mBio 2017 02 21;8(1). Epub 2017 Feb 21.

Department of Microbiology & Environmental Toxicology, UC Santa Cruz, Santa Cruz, California, USA

Many bacterial genomes are highly variable but nonetheless are typically published as a single assembled genome. Experiments tracking bacterial genome evolution have not looked at the variation present at a given point in time. Here, we analyzed the mouse-passaged Helicobacter pylori strain SS1 and its parent PMSS1 to assess intra- and intergenomic variability. Using high sequence coverage depth and experimental validation, we detected extensive genome plasticity within these isolates, including movement of the transposable element IS, large and small inversions, multiple single nucleotide polymorphisms, and variation in copy number. The cagA gene was found as 1 to 4 tandem copies located off the cag island in both SS1 and PMSS1; this copy number variation correlated with protein expression. To gain insight into the changes that occurred during mouse adaptation, we also compared SS1 and PMSS1 and observed 46 differences that were distinct from the within-genome variation. The most substantial was an insertion in cagY, which encodes a protein required for a type IV secretion system function. We detected modifications in genes coding for two proteins known to affect mouse colonization, the HpaA neuraminyllactose-binding protein and the FutB α-1,3 lipopolysaccharide (LPS) fucosyltransferase, as well as genes predicted to modulate diverse properties. In sum, our work suggests that data from consensus genome assemblies from single colonies may be misleading by failing to represent the variability present. Furthermore, we show that high-depth genomic sequencing data of a population can be analyzed to gain insight into the normal variation within bacterial strains.

Although it is well known that many bacterial genomes are highly variable, it is nonetheless traditional to refer to, analyze, and publish "the genome" of a bacterial strain. Variability is usually reduced ("only sequence from a single colony"), ignored ("just publish the consensus"), or placed in the "too-hard" basket ("analysis of raw read data is more robust"). Now that whole-genome sequences are regularly used to assess virulence and track outbreaks, a better understanding of the baseline genomic variation present within single strains is needed. Here, we describe the variability seen in typical working stocks and colonies of pathogen model strains SS1 and PMSS1 as revealed by use of high-coverage mate pair next-generation sequencing (NGS) and confirmed by traditional laboratory techniques. This work demonstrates that reliance on a consensus assembly as "the genome" of a bacterial strain may be misleading.

Source
DOI Listing: http://dx.doi.org/10.1128/mBio.02321-16
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC5358919
February 2017

Massively parallel digital transcriptional profiling of single cells.

Nat Commun 2017 01 16;8:14049. Epub 2017 Jan 16.

Clinical Research Division, Fred Hutchinson Cancer Research Center, Seattle, Washington 98109, USA.

Characterizing the transcriptome of individual cells is fundamental to understanding complex biological systems. We describe a droplet-based system that enables 3' mRNA counting of tens of thousands of single cells per sample. Cell encapsulation, of up to 8 samples at a time, takes place in ∼6 min, with ∼50% cell capture efficiency. To demonstrate the system's technical performance, we collected transcriptome data from ∼250k single cells across 29 samples. We validated the sensitivity of the system and its ability to detect rare populations using cell lines and synthetic RNAs. We profiled 68k peripheral blood mononuclear cells to demonstrate the system's ability to characterize large immune populations. Finally, we used sequence variation in the transcriptome data to determine host and donor chimerism at single-cell resolution from bone marrow mononuclear cells isolated from transplant patients.

Source
DOI Listing: http://dx.doi.org/10.1038/ncomms14049
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC5241818
January 2017

High-Throughput Nuclease Probing of RNA Structures Using FragSeq.

Methods Mol Biol 2016 ;1490:105-34

Research and Development, Pacific Biosciences, Menlo Park, CA, USA.

High-throughput sequencing of cDNA (RNA-Seq) can be used to generate nuclease accessibility data for many distinct transcripts in the same mixture simultaneously. Such assays accelerate RNA structure analysis and provide researchers with new technologies to tackle biological questions on a transcriptome-wide scale. FragSeq is an experimental assay for transcriptome-wide RNA structure probing using RNA-Seq, coupled with data analysis tools that allow quantitative determination of nuclease accessibility at single-base resolution. We provide a practical guide to designing and carrying out FragSeq experiments and data analysis.

Source
DOI Listing: http://dx.doi.org/10.1007/978-1-4939-6433-8_8
January 2018

Characterization of fusion genes and the significantly expressed fusion isoforms in breast cancer by hybrid sequencing.

Nucleic Acids Res 2015 Oct 3;43(18):e116. Epub 2015 Jun 3.

Department of Internal Medicine, University of Iowa, 200 Hawkins Dr, Iowa City, IA 52242, USA

We developed an innovative hybrid sequencing approach, IDP-fusion, to detect fusion genes, determine fusion sites and identify and quantify fusion isoforms. IDP-fusion is the first method to study gene fusion events by integrating Third Generation Sequencing long reads and Second Generation Sequencing short reads. We applied IDP-fusion to PacBio data and Illumina data from the MCF-7 breast cancer cells. Compared with the existing tools, IDP-fusion detects fusion genes at higher precision and a very low false positive rate. The results show that IDP-fusion will be useful for unraveling the complexity of multiple fusion splices and fusion isoforms within tumorigenesis-relevant fusion genes.

Source
DOI Listing: http://dx.doi.org/10.1093/nar/gkv562
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC4605286
October 2015

Multi-platform assessment of transcriptome profiling using RNA-seq in the ABRF next-generation sequencing study.

Nat Biotechnol 2014 Sep 24;32(9):915-925. Epub 2014 Aug 24.

Department of Physiology and Biophysics, Weill Cornell Medical College, New York, New York, USA.

High-throughput RNA sequencing (RNA-seq) greatly expands the potential for genomics discoveries, but the wide variety of platforms, protocols and performance capabilities has created the need for comprehensive reference data. Here we describe the Association of Biomolecular Resource Facilities next-generation sequencing (ABRF-NGS) study on RNA-seq. We carried out replicate experiments across 15 laboratory sites using reference RNA standards to test four protocols (poly-A-selected, ribo-depleted, size-selected and degraded) on five sequencing platforms (Illumina HiSeq, Life Technologies PGM and Proton, Pacific Biosciences RS and Roche 454). The results show high intraplatform (Spearman rank R > 0.86) and inter-platform (R > 0.83) concordance for expression measures across the deep-count platforms, but highly variable efficiency and cost for splice junction and variant detection between all platforms. For intact RNA, gene expression profiles from rRNA-depletion and poly-A enrichment are similar. In addition, rRNA depletion enables effective analysis of degraded RNA samples. This study provides a broad foundation for cross-platform standardization, evaluation and improvement of RNA-seq.

Source
DOI Listing: http://dx.doi.org/10.1038/nbt.2972
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC4167418
September 2014

Long-read sequencing of chicken transcripts and identification of new transcript isoforms.

PLoS One 2014 15;9(4):e94650. Epub 2014 Apr 15.

The Gladstone Institutes, San Francisco, California, United States of America; Department of Epidemiology and Biostatistics, University of California San Francisco, San Francisco, California, United States of America.

The chicken has long served as an important model organism in many fields, and continues to aid our understanding of animal development. Functional genomics studies aimed at probing the mechanisms that regulate development require high-quality genomes and transcript annotations. The quality of these resources has improved dramatically over the last several years, but many isoforms and genes have yet to be identified. We hope to contribute to the process of improving these resources with the data presented here: a set of long cDNA sequencing reads, and a curated set of new genes and transcript isoforms not currently represented in the most up-to-date genome annotation currently available to the community of researchers who rely on the chicken genome.

Source
PLOS: http://journals.plos.org/plosone/article?id=10.1371/journal.pone.0094650
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3988055
January 2015

Characterization of the human ESC transcriptome by hybrid sequencing.

Proc Natl Acad Sci U S A 2013 Dec 26;110(50):E4821-30. Epub 2013 Nov 26.

Department of Statistics and Department of Health Research and Policy, Stanford University, Stanford, CA 94305.

Although transcriptional and posttranscriptional events are detected in RNA-Seq data from second-generation sequencing, full-length mRNA isoforms are not captured. On the other hand, third-generation sequencing, which yields much longer reads, has current limitations of lower raw accuracy and throughput. Here, we combine second-generation sequencing and third-generation sequencing with a custom-designed method for isoform identification and quantification to generate a high-confidence isoform dataset for human embryonic stem cells (hESCs). We report 8,084 RefSeq-annotated isoforms detected as full-length and an additional 5,459 isoforms predicted through statistical inference. Over one-third of these are novel isoforms, including 273 RNAs from gene loci that have not previously been identified. Further characterization of the novel loci indicates that a subset is expressed in pluripotent cells but not in diverse fetal and adult tissues; moreover, their reduced expression perturbs the network of pluripotency-associated genes. Results suggest that gene identification, even in well-characterized human cell lines and tissues, is likely far from complete.

Source
DOI Listing: http://dx.doi.org/10.1073/pnas.1320101110
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3864310
December 2013