Publications by authors named "Nathan C Sheffield"

38 Publications

Bedshift: perturbation of genomic interval sets.

Genome Biol 2021 08 20;22(1):238. Epub 2021 Aug 20.

Center for Public Health Genomics, University of Virginia, Charlottesville, VA, USA.

Functional genomics experiments, like ChIP-Seq or ATAC-Seq, produce results that are summarized as a region set. There is no way to objectively evaluate the effectiveness of region set similarity metrics. We present Bedshift, a tool for perturbing BED files by randomly shifting, adding, and dropping regions from a reference file. The perturbed files can be used to benchmark similarity metrics, as well as for other applications. We highlight differences in behavior between metrics, such as that the Jaccard score is most sensitive to added or dropped regions, while coverage score is most sensitive to shifted regions.
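
The perturbations named in this abstract (shifting, adding, and dropping regions) and the Jaccard score can be illustrated with a minimal Python sketch. This is not the Bedshift implementation: the function names, parameters, fixed 100 bp length for added regions, and the base-pair-level Jaccard definition are all assumptions made for illustration.

```python
import random

# A region is a (chrom, start, end) tuple with a half-open interval [start, end).

def perturb(regions, shift_rate=0.0, drop_rate=0.0, add_count=0,
            max_shift=100, chrom="chr1", chrom_len=10_000, seed=0):
    """Return a perturbed copy of `regions` (hypothetical helper, not Bedshift's API)."""
    rng = random.Random(seed)
    out = []
    for c, start, end in regions:
        if rng.random() < drop_rate:        # drop this region
            continue
        if rng.random() < shift_rate:       # shift this region left or right
            delta = rng.randint(-max_shift, max_shift)
            delta = max(delta, -start)      # never shift below coordinate 0
            start, end = start + delta, end + delta
        out.append((c, start, end))
    for _ in range(add_count):              # add random 100 bp regions
        start = rng.randrange(0, chrom_len - 100)
        out.append((chrom, start, start + 100))
    return sorted(out)

def covered_bases(regions):
    """Set of (chrom, position) pairs covered by any region."""
    bases = set()
    for c, start, end in regions:
        bases.update((c, pos) for pos in range(start, end))
    return bases

def jaccard(a, b):
    """Base-pair Jaccard score: |intersection| / |union| of covered bases."""
    A, B = covered_bases(a), covered_bases(b)
    union = A | B
    return len(A & B) / len(union) if union else 1.0

reference = [("chr1", 100, 200), ("chr1", 500, 650)]
perturbed = perturb(reference, shift_rate=0.5, drop_rate=0.2, add_count=1)
score = jaccard(reference, perturbed)
```

Sweeping one rate at a time while holding the others at zero is one way to see how a given metric responds to each kind of perturbation.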

Source
DOI: http://dx.doi.org/10.1186/s13059-021-02440-w
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC8379854
August 2021

Refget: standardised access to reference sequences.

Bioinformatics 2021 Jul 14. Epub 2021 Jul 14.

European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK.

Motivation: Reference sequences are essential in creating a baseline of knowledge for many common bioinformatics methods, especially those using genomic sequencing.

Results: We have created refget, a Global Alliance for Genomics and Health API specification to access reference sequences and sub-sequences using an identifier derived from the sequence itself. We present four reference implementations across in-house and cloud infrastructure, a compliance suite and a web report used to ensure specification conformity across implementations.

Availability: The Refget specification can be found at: https://w3id.org/ga4gh/refget.

Supplementary Information: Supplementary data are available at Bioinformatics online.
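
The idea of "an identifier derived from the sequence itself" can be sketched in Python as a content digest. The abstract does not state which checksum is used; the sketch below assumes an MD5 hex digest and a SHA-512 digest truncated to 24 bytes and base64url-encoded, and it assumes sequences are upper-cased before hashing. The function names are hypothetical, and the specification remains the authority on the exact algorithm.

```python
import base64
import hashlib

def md5_id(sequence: str) -> str:
    """Hex MD5 digest of the upper-cased sequence (assumed normalization)."""
    return hashlib.md5(sequence.upper().encode("ascii")).hexdigest()

def sha512t24u_id(sequence: str) -> str:
    """First 24 bytes of the SHA-512 digest, base64url-encoded (32 characters)."""
    digest = hashlib.sha512(sequence.upper().encode("ascii")).digest()
    return base64.urlsafe_b64encode(digest[:24]).decode("ascii")

# Identical sequences always yield identical identifiers, wherever they are stored.
identifier = sha512t24u_id("ACGT")
```

Because the identifier depends only on the sequence content, two servers holding the same sequence would report the same identifier without any central registry.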

Source
DOI: http://dx.doi.org/10.1093/bioinformatics/btab524
July 2021

Embeddings of genomic region sets capture rich biological associations in lower dimensions.

Bioinformatics 2021 Jun 22. Epub 2021 Jun 22.

Center for Public Health Genomics, University of Virginia.

Motivation: Genomic region sets summarize functional genomics data and define locations of interest in the genome such as regulatory regions or transcription factor binding sites. The number of publicly available region sets has increased dramatically, leading to challenges in data analysis.

Results: We propose a new method to represent genomic region sets as vectors, or embeddings, using an adapted word2vec approach. We compared our approach to two simpler methods based on interval unions or term frequency-inverse document frequency and evaluated the methods in three ways: First, by classifying the cell line, antibody, or tissue type of the region set; second, by assessing whether similarity among embeddings can reflect simulated random perturbations of genomic regions; and third, by testing robustness of the proposed representations to different signal thresholds for calling peaks. Our word2vec-based region set embeddings reduce dimensionality from more than a hundred thousand to 100 without significant loss in classification performance. The vector representation could identify cell line, antibody, and tissue type with over 90% accuracy. We also found that the vectors could quantitatively summarize simulated random perturbations to region sets and are more robust to subsampling the data derived from different peak calling thresholds. Our evaluations demonstrate that the vectors retain useful biological information in relatively lower-dimensional spaces. We propose that vector representation of region sets is a promising approach for efficient analysis of genomic region data.

Availability: https://github.com/databio/regionset-embedding.

Source
DOI: http://dx.doi.org/10.1093/bioinformatics/btab439
June 2021

Multimodal analysis of cell-free DNA whole-genome sequencing for pediatric cancers with low mutational burden.

Nat Commun 2021 05 28;12(1):3230. Epub 2021 May 28.

St. Anna Children's Cancer Research Institute (CCRI), Vienna, Austria.

Sequencing of cell-free DNA in the blood of cancer patients (liquid biopsy) provides attractive opportunities for early diagnosis, assessment of treatment response, and minimally invasive disease monitoring. To unlock liquid biopsy analysis for pediatric tumors with few genetic aberrations, we introduce an integrated genetic/epigenetic analysis method and demonstrate its utility on 241 deep whole-genome sequencing profiles of 95 patients with Ewing sarcoma and 31 patients with other pediatric sarcomas. Our method achieves sensitive detection and classification of circulating tumor DNA in peripheral blood independent of any genetic alterations. Moreover, we benchmark different metrics for cell-free DNA fragmentation analysis, and we introduce the LIQUORICE algorithm for detecting circulating tumor DNA based on cancer-specific chromatin signatures. Finally, we combine several fragmentation-based metrics into an integrated machine learning classifier for liquid biopsy analysis that exploits widespread epigenetic deregulation and is tailored to cancers with low mutation rates. Clinical associations highlight the potential value of cfDNA fragmentation patterns as prognostic biomarkers in Ewing sarcoma. In summary, our study provides a comprehensive analysis of circulating tumor DNA beyond recurrent genetic aberrations, and it renders the benefits of liquid biopsy more readily accessible for childhood cancers.

Source
DOI: http://dx.doi.org/10.1038/s41467-021-23445-w
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC8163828
May 2021

Identity and compatibility of reference genome resources.

NAR Genom Bioinform 2021 Jun 14;3(2):lqab036. Epub 2021 May 14.

Center for Public Health Genomics, University of Virginia, Virginia, 22908, USA.

Genome analysis relies on reference data like sequences, feature annotations, and aligner indexes. These data can be found in many versions from many sources, making it challenging to identify and assess compatibility among them. For example, how can you determine which indexes are derived from identical raw sequence files, or which annotations share a compatible coordinate system? Here, we describe a novel approach to establish identity and compatibility of reference genome resources. We approach this with three advances: first, we derive unique identifiers for each resource; second, we record parent-child relationships among resources; and third, we describe recursive identifiers that determine identity as well as compatibility of coordinate systems and sequence names. These advances facilitate portability, reproducibility, and re-use of genome reference data. https://refgenie.databio.org.

Source
DOI: http://dx.doi.org/10.1093/nargab/lqab036
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC8121092
June 2021

PEPPRO: quality control and processing of nascent RNA profiling data.

Genome Biol 2021 05 15;22(1):155. Epub 2021 May 15.

Center for Public Health Genomics, University of Virginia, Charlottesville, USA.

Nascent RNA profiling is growing in popularity; however, there is no standard analysis pipeline to uniformly process the data and assess quality. Here, we introduce PEPPRO, a comprehensive, scalable workflow for GRO-seq, PRO-seq, and ChRO-seq data. PEPPRO produces uniformly processed output files for downstream analysis and assesses adapter abundance, RNA integrity, library complexity, nascent RNA purity, and run-on efficiency. PEPPRO is restartable and fault-tolerant, records copious logs, and provides a web-based project report. PEPPRO can be run locally or using a cluster, providing a portable first step for genomic nascent RNA analysis.

Source
DOI: http://dx.doi.org/10.1186/s13059-021-02349-4
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC8126160
May 2021

Chromatin conformation capture (Hi-C) sequencing of patient-derived xenografts: analysis guidelines.

Gigascience 2021 Apr;10(4)

Department of Pathology, Virginia Commonwealth University, Richmond, VA 23284, USA.

Background: Sequencing of patient-derived xenograft (PDX) mouse models allows investigation of the molecular mechanisms of human tumor samples engrafted in a mouse host. Thus, both human and mouse genetic material is sequenced. Several methods have been developed to remove mouse sequencing reads from RNA-seq or exome sequencing PDX data and improve the downstream signal. However, for more recent chromatin conformation capture technologies (Hi-C), the effect of mouse reads remains undefined.

Results: We evaluated the effect of mouse read removal on the quality of Hi-C data using in silico created PDX Hi-C data with 10% and 30% mouse reads. Additionally, we generated 2 experimental PDX Hi-C datasets using different library preparation strategies. We evaluated 3 alignment strategies (Direct, Xenome, Combined) and 3 pipelines (Juicer, HiC-Pro, HiCExplorer) on Hi-C data quality.

Conclusions: Removal of mouse reads had little-to-no effect on data quality as compared with the results obtained with the Direct alignment strategy. Juicer extracted more valid chromatin interactions for Hi-C matrices, regardless of the mouse read removal strategy. However, the pipeline effect was minimal, while the library preparation strategy had the largest effect on all quality metrics. Together, our study presents comprehensive guidelines on PDX Hi-C data processing.

Source
DOI: http://dx.doi.org/10.1093/gigascience/giab022
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC8058593
April 2021

Seqpare: a novel metric of similarity between genomic interval sets.

F1000Res 2020 9;9:581. Epub 2020 Jun 9.

Center for Public Health Genomics, University of Virginia, Charlottesville, VA, USA.

Searching genomic interval sets produced by sequencing methods has been widely and routinely performed; however, existing metrics for quantifying similarities among interval sets are inconsistent. Here we introduce a self-consistent and effective metric of similarity and tool for comparing sequences based on their interval sets. With this metric, the similarity of two interval sets is quantified by a single index, the ratio of their effective overlap over the union: an index of 0 indicates unrelated interval sets, and an index of 1 means that the interval sets are identical. Analysis and tests confirm the effectiveness and self-consistency of the metric.

Source
DOI: http://dx.doi.org/10.12688/f1000research.23390.2
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC7808057
April 2021

IGD: high-performance search for large-scale genomic interval datasets.

Bioinformatics 2020 Dec 26. Epub 2020 Dec 26.

Center for Public Health Genomics, University of Virginia.

Summary: Databases of large-scale genome projects now contain thousands of genomic interval datasets. These data are a critical resource for understanding the function of DNA. However, our ability to examine and integrate interval data of this scale is limited. Here, we introduce the integrated genome database (IGD), a method and tool for searching genome interval datasets more than three orders of magnitude faster than existing approaches, while using only one hundredth of the memory. IGD uses a novel linear binning method that allows us to scale analysis to billions of genomic regions.

Availability: https://github.com/databio/IGD.
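
The abstract mentions a linear binning method without giving details. As a generic illustration of binning for interval search (not IGD's actual on-disk format or algorithm), the Python sketch below assigns each interval to every fixed-width bin it touches, so a query only inspects the bins it overlaps. The bin width and function names are assumptions.

```python
from collections import defaultdict

BIN_SIZE = 16_384  # hypothetical bin width in base pairs

def build_index(intervals, bin_size=BIN_SIZE):
    """Map bin number -> list of indexes of intervals touching that bin."""
    index = defaultdict(list)
    for idx, (start, end) in enumerate(intervals):
        # Half-open [start, end): the last covered base is end - 1.
        for b in range(start // bin_size, (end - 1) // bin_size + 1):
            index[b].append(idx)
    return index

def query(index, intervals, q_start, q_end, bin_size=BIN_SIZE):
    """Return sorted indexes of intervals overlapping [q_start, q_end)."""
    hits = set()  # a set, because an interval may appear in several bins
    for b in range(q_start // bin_size, (q_end - 1) // bin_size + 1):
        for idx in index.get(b, ()):
            start, end = intervals[idx]
            if start < q_end and q_start < end:
                hits.add(idx)
    return sorted(hits)

intervals = [(0, 10), (5, 20), (100, 200), (16_000, 17_000)]
index = build_index(intervals)
overlapping = query(index, intervals, 8, 12)
```

The cost of a query then depends on the number of intervals in the touched bins rather than on the size of the whole dataset.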

Source
DOI: http://dx.doi.org/10.1093/bioinformatics/btaa1062
December 2020

COCOA: coordinate covariation analysis of epigenetic heterogeneity.

Genome Biol 2020 09 7;21(1):240. Epub 2020 Sep 7.

Department of Biomedical Engineering, University of Virginia, Charlottesville, VA, USA.

A key challenge in epigenetics is to determine the biological significance of epigenetic variation among individuals. We present Coordinate Covariation Analysis (COCOA), a computational framework that uses covariation of epigenetic signals across individuals and a database of region sets to annotate epigenetic heterogeneity. COCOA is the first such tool for DNA methylation data and can also analyze any epigenetic signal with genomic coordinates. We demonstrate COCOA's utility by analyzing DNA methylation, ATAC-seq, and multi-omic data in supervised and unsupervised analyses, showing that COCOA provides new understanding of inter-sample epigenetic variation. COCOA is available on Bioconductor (http://bioconductor.org/packages/COCOA).

Source
DOI: http://dx.doi.org/10.1186/s13059-020-02139-4
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC7487606
September 2020

Analytical Approaches for ATAC-seq Data Analysis.

Curr Protoc Hum Genet 2020 06;106(1):e101

Center for Public Health Genomics, University of Virginia, Charlottesville, Virginia.

ATAC-seq, the assay for transposase-accessible chromatin using sequencing, is a quick and efficient approach to investigating the chromatin accessibility landscape. Investigating chromatin accessibility has broad utility for answering many biological questions, such as mapping nucleosomes, identifying transcription factor binding sites, and measuring differential activity of DNA regulatory elements. Because the ATAC-seq protocol is both simple and relatively inexpensive, there has been a rapid increase in the availability of chromatin accessibility data. Furthermore, advances in ATAC-seq protocols are rapidly extending its breadth to additional experimental conditions, cell types, and species. Accompanying the increase in data, there has also been an explosion of new tools and analytical approaches for analyzing it. Here, we explain the fundamentals of ATAC-seq data processing, summarize common analysis approaches, and review computational tools to provide recommendations for different research questions. This primer provides a starting point and a reference for analysis of ATAC-seq data. © 2020 Wiley Periodicals LLC.

Source
DOI: http://dx.doi.org/10.1002/cphg.101
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC8191135
June 2020

Refgenie: a reference genome resource manager.

Gigascience 2020 02;9(2)

Center for Public Health Genomics, University of Virginia, PO Box 800717, Charlottesville, VA, 22908, USA.

Background: Reference genome assemblies are essential for high-throughput sequencing analysis projects. Typically, genome assemblies are stored on disk alongside related resources; e.g., many sequence aligners require the assembly to be indexed. The resulting indexes are broadly applicable for downstream analysis, so it makes sense to share them. However, there is no simple tool to do this.

Results: Here, we introduce refgenie, a reference genome assembly asset manager. Refgenie makes it easier to organize, retrieve, and share genome analysis resources. In addition to genome indexes, refgenie can manage any files related to reference genomes, including sequences and annotation files. Refgenie includes a command line interface and a server application that provides a RESTful API, so it is useful for both tool development and analysis.

Conclusions: Refgenie streamlines sharing genome analysis resources among groups and across computing environments. Refgenie is available at https://refgenie.databio.org.

Source
DOI: http://dx.doi.org/10.1093/gigascience/giz149
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6988606
February 2020

Augmented Interval List: a novel data structure for efficient genomic interval search.

Bioinformatics 2019 12;35(23):4907-4911

Center for Public Health Genomics, University of Virginia, Charlottesville, VA, USA.

Motivation: Genomic data is frequently stored as segments or intervals. Because this data type is so common, interval-based comparisons are fundamental to genomic analysis. As the volume of available genomic data grows, developing efficient and scalable methods for searching interval data is necessary.

Results: We present a new data structure, the Augmented Interval List (AIList), to enumerate intersections between a query interval q and an interval set R. An AIList is constructed by first sorting R as a list by the interval start coordinate, then decomposing it into a few approximately flattened components (sublists), and then augmenting each sublist with the running maximum interval end. The query time for AIList is O(log_2 N + n + m), where n is the number of overlaps between R and q, N is the number of intervals in the set R and m is the average number of extra comparisons required to find the n overlaps. Tested on real genomic interval datasets, AIList code runs 5-18 times faster than standard high-performance code based on augmented interval-trees, nested containment lists or R-trees (BEDTools). For large datasets, the memory-usage for AIList is 4-60% of other methods. The AIList data structure, therefore, provides a significantly improved fundamental operation for highly scalable genomic data analysis.

Availability And Implementation: An implementation of the AIList data structure with both construction and search algorithms is available at http://ailist.databio.org.

Supplementary Information: Supplementary data are available at Bioinformatics online.
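
The construction described in the Results (sort by start, augment with the running maximum end, then search) can be sketched in Python. This is a simplified, single-component version: it omits the decomposition into sublists that the abstract describes, and the function names are hypothetical rather than taken from the AIList code.

```python
from bisect import bisect_left
from itertools import accumulate

def build(intervals):
    """Sort half-open (start, end) intervals and record the running maximum end."""
    ivs = sorted(intervals)
    starts = [s for s, _ in ivs]
    max_end = list(accumulate((e for _, e in ivs), max))
    return ivs, starts, max_end

def query(ailist, q_start, q_end):
    """Return all intervals overlapping [q_start, q_end), in sorted order."""
    ivs, starts, max_end = ailist
    # Binary search: index of the last interval whose start is < q_end.
    i = bisect_left(starts, q_end) - 1
    hits = []
    # Scan backward; once the running maximum end is <= q_start,
    # no earlier interval can reach the query, so the scan stops.
    while i >= 0 and max_end[i] > q_start:
        if ivs[i][1] > q_start:
            hits.append(ivs[i])
        i -= 1
    return hits[::-1]

ailist = build([(11, 20), (1, 5), (30, 40), (3, 8), (10, 12)])
found = query(ailist, 4, 11)
```

In this single-list form, one very long interval keeps the running maximum high and forces extra comparisons during the backward scan; the decomposition into sublists described in the abstract is what addresses that case.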

Source
DOI: http://dx.doi.org/10.1093/bioinformatics/btz407
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6901075
December 2019

The chromatin accessibility landscape of primary human cancers.

Science 2018 10;362(6413)

We present the genome-wide chromatin accessibility profiles of 410 tumor samples spanning 23 cancer types from The Cancer Genome Atlas (TCGA). We identify 562,709 transposase-accessible DNA elements that substantially extend the compendium of known cis-regulatory elements. Integration of ATAC-seq (the assay for transposase-accessible chromatin using sequencing) with TCGA multi-omic data identifies a large number of putative distal enhancers that distinguish molecular subtypes of cancers, uncovers specific driving transcription factors via protein-DNA footprints, and nominates long-range gene-regulatory interactions in cancer. These data reveal genetic risk loci of cancer predisposition as active DNA regulatory elements in cancer, identify gene-regulatory interactions underlying cancer immune evasion, and pinpoint noncoding mutations that drive enhancer activation and may affect patient survival. These results suggest a systematic approach to understanding the noncoding genome in cancer to advance diagnosis and therapy.

Source
DOI: http://dx.doi.org/10.1126/science.aav1898
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6408149
October 2018

The DNA methylation landscape of glioblastoma disease progression shows extensive heterogeneity in time and space.

Nat Med 2018 10 27;24(10):1611-1624. Epub 2018 Aug 27.

Department of Internal Medicine, Neuromed Campus Wagner-Jauregg, Kepler University Hospital, Johannes Kepler University of Linz, Linz, Austria.

Glioblastoma is characterized by widespread genetic and transcriptional heterogeneity, yet little is known about the role of the epigenome in glioblastoma disease progression. Here, we present genome-scale maps of DNA methylation in matched primary and recurring glioblastoma tumors, using data from a highly annotated clinical cohort that was selected through a national patient registry. We demonstrate the feasibility of DNA methylation mapping in a large set of routinely collected FFPE samples, and we validate bisulfite sequencing as a multipurpose assay that allowed us to infer a range of different genetic, epigenetic, and transcriptional characteristics of the profiled tumor samples. On the basis of these data, we identified subtle differences between primary and recurring tumors, links between DNA methylation and the tumor microenvironment, and an association of epigenetic tumor heterogeneity with patient survival. In summary, this study establishes an open resource for dissecting DNA methylation heterogeneity in a genetically diverse and heterogeneous cancer, and it demonstrates the feasibility of integrating epigenomics, radiology, and digital pathology for a national cohort, thereby leveraging existing samples and data collected as part of routine clinical practice.

Source
DOI: http://dx.doi.org/10.1038/s41591-018-0156-x
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6181207
October 2018

LOLAweb: a containerized web server for interactive genomic locus overlap enrichment analysis.

Nucleic Acids Res 2018 07;46(W1):W194-W199

Center for Public Health Genomics, University of Virginia, USA.

The past few years have seen an explosion of interest in understanding the role of regulatory DNA. This interest has driven large-scale production of functional genomics data and analytical methods. One popular analysis is to test for enrichment of overlaps between a query set of genomic regions and a database of region sets. In this way, new genomic data can be easily connected to annotations from external data sources. Here, we present an interactive interface for enrichment analysis of genomic locus overlaps using a web server called LOLAweb. LOLAweb accepts a set of genomic ranges from the user and tests it for enrichment against a database of region sets. LOLAweb renders results in an R Shiny application to provide interactive visualization features, enabling users to filter, sort, and explore enrichment results dynamically. LOLAweb is built and deployed in a Linux container, making it scalable to many concurrent users on our servers and also enabling users to download and run LOLAweb locally.
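
The abstract does not say which statistical test is used to score enrichment. As one common choice, the Python sketch below assumes a one-sided Fisher's exact test, computed as a hypergeometric tail probability over a "universe" of candidate regions. The function and argument names are hypothetical and are not LOLAweb's interface.

```python
from math import comb

def enrichment_pvalue(universe_size, db_size, query_size, overlap):
    """P(X >= overlap) for X ~ Hypergeometric(universe_size, db_size, query_size).

    universe_size: number of regions that could have been in the query set
    db_size:       number of universe regions overlapping the database region set
    query_size:    number of regions in the user's query set
    overlap:       number of query regions overlapping the database region set
    """
    total = comb(universe_size, query_size)
    upper = min(db_size, query_size)
    tail = sum(
        comb(db_size, k) * comb(universe_size - db_size, query_size - k)
        for k in range(overlap, upper + 1)
    )
    return tail / total

# Example: 1000 universe regions, 100 covered by the database set,
# 50 query regions of which 20 overlap the database set.
p = enrichment_pvalue(1000, 100, 50, 20)
```

A small p-value suggests that the query set overlaps the database region set more often than expected if the query regions had been drawn at random from the universe.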

Source
DOI: http://dx.doi.org/10.1093/nar/gky464
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6030814
July 2018

Coloc-stats: a unified web interface to perform colocalization analysis of genomic features.

Nucleic Acids Res 2018 07;46(W1):W186-W193

Department of Informatics, University of Oslo, Gaustadalléen 23 B, N-0373 Oslo, Norway.

Functional genomics assays produce sets of genomic regions as one of their main outputs. To biologically interpret such region-sets, researchers often use colocalization analysis, where the statistical significance of colocalization (overlap, spatial proximity) between two or more region-sets is tested. Existing colocalization analysis tools vary in the statistical methodology and analysis approaches, thus potentially providing different conclusions for the same research question. As the findings of colocalization analysis are often the basis for follow-up experiments, it is helpful to use several tools in parallel and to compare the results. We developed the Coloc-stats web service to facilitate such analyses. Coloc-stats provides a unified interface to perform colocalization analysis across various analytical methods and method-specific options (e.g. colocalization measures, resolution, null models). Coloc-stats helps the user to find a method that supports their experimental requirements and allows for a straightforward comparison across methods. Coloc-stats is implemented as a web server with a graphical user interface that assists users with configuring their colocalization analyses. Coloc-stats is freely available at https://hyperbrowser.uio.no/coloc-stats/.

Source
DOI: http://dx.doi.org/10.1093/nar/gky474
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6030976
July 2018

BART: a transcription factor prediction tool with query gene sets or epigenomic profiles.

Bioinformatics 2018 08;34(16):2867-2869

Center for Public Health Genomics, Charlottesville, VA, USA.

Summary: Identification of functional transcription factors that regulate a given gene set is an important problem in gene regulation studies. Conventional approaches for identifying transcription factors, such as DNA sequence motif analysis, are unable to predict functional binding of specific factors and not sensitive enough to detect factors binding at distal enhancers. Here, we present binding analysis for regulation of transcription (BART), a novel computational method and software package for predicting functional transcription factors that regulate a query gene set or associate with a query genomic profile, based on more than 6000 existing ChIP-seq datasets for over 400 factors in human or mouse. This method demonstrates the advantage of utilizing publicly available data for functional genomics research.

Availability And Implementation: BART is implemented in Python and available at http://faculty.virginia.edu/zanglab/bart.

Supplementary Information: Supplementary data are available at Bioinformatics online.

Source
DOI: http://dx.doi.org/10.1093/bioinformatics/bty194
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6084568
August 2018

MIRA: an R package for DNA methylation-based inference of regulatory activity.

Bioinformatics 2018 08;34(15):2649-2650

Center for Public Health Genomics, University of Virginia, Charlottesville, VA, USA.

Summary: DNA methylation contains information about the regulatory state of the cell. MIRA aggregates genome-scale DNA methylation data into a DNA methylation profile for a given region set with shared biological annotation. Using this profile, MIRA infers and scores the collective regulatory activity for the region set. MIRA facilitates regulatory analysis in situations where classical regulatory assays would be difficult and allows public sources of region sets to be leveraged for novel insight into the regulatory state of DNA methylation datasets.

Availability And Implementation: http://bioconductor.org/packages/MIRA.

Source
DOI: http://dx.doi.org/10.1093/bioinformatics/bty083
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6061852
August 2018

DNA methylation heterogeneity defines a disease spectrum in Ewing sarcoma.

Nat Med 2017 Mar 30;23(3):386-395. Epub 2017 Jan 30.

Children's Cancer Research Institute, St. Anna Kinderkrebsforschung, Vienna, Austria.

Developmental tumors in children and young adults carry few genetic alterations, yet they have diverse clinical presentation. Focusing on Ewing sarcoma, we sought to establish the prevalence and characteristics of epigenetic heterogeneity in genetically homogeneous cancers. We performed genome-scale DNA methylation sequencing for a large cohort of Ewing sarcoma tumors and analyzed epigenetic heterogeneity on three levels: between cancers, between tumors, and within tumors. We observed consistent DNA hypomethylation at enhancers regulated by the disease-defining EWS-FLI1 fusion protein, thus establishing epigenomic enhancer reprogramming as a ubiquitous and characteristic feature of Ewing sarcoma. DNA methylation differences between tumors identified a continuous disease spectrum underlying Ewing sarcoma, which reflected the strength of an EWS-FLI1 regulatory signature and a continuum between mesenchymal and stem cell signatures. There was substantial epigenetic heterogeneity within tumors, particularly in patients with metastatic disease. In summary, our study provides a comprehensive assessment of epigenetic heterogeneity in Ewing sarcoma and thereby highlights the importance of considering nongenetic aspects of tumor heterogeneity in the context of cancer biology and personalized medicine.

Source
DOI: http://dx.doi.org/10.1038/nm.4273
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC5951283
March 2017

Single-cell epigenomic variability reveals functional cancer heterogeneity.

Genome Biol 2017 01 24;18(1):15. Epub 2017 Jan 24.

Center for Personal Dynamic Regulomes, Stanford University School of Medicine, Stanford, CA, 94305, USA.

Background: Cell-to-cell heterogeneity is a major driver of cancer evolution, progression, and emergence of drug resistance. Epigenomic variation at the single-cell level can rapidly create cancer heterogeneity but is difficult to detect and assess functionally.

Results: We develop a strategy to bridge the gap between measurement and function in single-cell epigenomics. Using single-cell chromatin accessibility and RNA-seq data in K562 leukemic cells, we identify the cell surface marker CD24 as co-varying with chromatin accessibility changes linked to GATA transcription factors in single cells. Fluorescence-activated cell sorting of CD24 high versus low cells prospectively isolated GATA1 and GATA2 high versus low cells. GATA high versus low cells express differential gene regulatory networks, differential sensitivity to the drug imatinib mesylate, and differential self-renewal capacity. Lineage tracing experiments show that GATA/CD24hi cells have the capability to rapidly reconstitute the heterogeneity within the entire starting population, suggesting that GATA expression levels drive a phenotypically relevant source of epigenomic plasticity.

Conclusion: Single-cell chromatin accessibility can guide prospective characterization of cancer heterogeneity. Epigenomic subpopulations in cancer impact drug sensitivity and the clonal dynamics of cancer evolution.

Source
DOI: http://dx.doi.org/10.1186/s13059-016-1133-7
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC5259890
January 2017

Multi-Omics of Single Cells: Strategies and Applications.

Trends Biotechnol 2016 08 20;34(8):605-608. Epub 2016 May 20.

CeMM Research Center for Molecular Medicine of the Austrian Academy of Sciences, Vienna, Austria.

Most genome-wide assays provide averages across large numbers of cells, but recent technological advances promise to overcome this limitation. Pioneering single-cell assays are now available for genome, epigenome, transcriptome, proteome, and metabolome profiling. Here, we describe how these different dimensions can be combined into multi-omics assays that provide comprehensive profiles of the same cell.

Source
DOI: http://dx.doi.org/10.1016/j.tibtech.2016.04.004
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC4959511
August 2016

The second European interdisciplinary Ewing sarcoma research summit--A joint effort to deconstructing the multiple layers of a complex disease.

Oncotarget 2016 Feb;7(8):8613-24

Department of Oncology and Children's Research Center, University Children's Hospital, Zurich, Switzerland.

Despite multimodal treatment, long-term outcome for patients with Ewing sarcoma is still poor. The second "European interdisciplinary Ewing sarcoma research summit" assembled a large group of scientific experts in the field to discuss their latest unpublished findings on the way to the identification of novel therapeutic targets and strategies. Ewing sarcoma is characterized by a quiet genome with presence of an EWSR1-ETS gene rearrangement as the only and defining genetic aberration. RNA-sequencing of recently described Ewing-like sarcomas with variant translocations identified them as biologically distinct diseases. Various presentations addressed mechanisms of EWS-ETS fusion protein activities with a focus on EWS-FLI1. Data were presented shedding light on the molecular underpinnings of genetic permissiveness to this disease, uncovering interaction of EWS-FLI1 with recently discovered susceptibility loci. Epigenetic context as a consequence of the interaction between the oncoprotein, cell type, developmental stage, and tissue microenvironment emerged as a dominant theme in the discussion of the molecular pathogenesis and inter- and intra-tumor heterogeneity of Ewing sarcoma, and of the difficulty of generating animal models that faithfully recapitulate the human disease. The problem of preclinical development of biologically targeted therapeutics was discussed and promising perspectives were offered from the study of novel in vitro models. Finally, it was concluded that in order to facilitate rapid pre-clinical and clinical development of novel therapies in Ewing sarcoma, the community needs a platform to maintain knowledge of unpublished results, systems and models used in drug testing and to continue the open dialogue initiated at the first two Ewing sarcoma summits.

Source
http://dx.doi.org/10.18632/oncotarget.6937 (DOI Listing)
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC4890991 (PMC)
February 2016

Differential DNA Methylation Analysis without a Reference Genome.

Cell Rep 2015 Dec 8;13(11):2621-2633. Epub 2015 Dec 8.

CeMM Research Center for Molecular Medicine of the Austrian Academy of Sciences, 1090 Vienna, Austria; Department of Laboratory Medicine, Medical University of Vienna, 1090 Vienna, Austria; Max Planck Institute for Informatics, 66123 Saarbrücken, Germany.

Genome-wide DNA methylation mapping uncovers epigenetic changes associated with animal development, environmental adaptation, and species evolution. To address the lack of high-throughput methods for DNA methylation analysis in non-model organisms, we developed an integrated approach for studying DNA methylation differences independent of a reference genome. Experimentally, our method relies on an optimized 96-well protocol for reduced representation bisulfite sequencing (RRBS), which we have validated in nine species (human, mouse, rat, cow, dog, chicken, carp, sea bass, and zebrafish). Bioinformatically, we developed the RefFreeDMA software to deduce ad hoc genomes directly from RRBS reads and to pinpoint differentially methylated regions between samples or groups of individuals (http://RefFreeDMA.computational-epigenetics.org). The identified regions are interpreted using motif enrichment analysis and/or cross-mapping to annotated genomes. We validated our method by reference-free analysis of cell-type-specific DNA methylation in the blood of human, cow, and carp. In summary, we present a cost-effective method for epigenome analysis in ecology and evolution, which enables epigenome-wide association studies in natural populations and species without a reference genome.

Source
http://dx.doi.org/10.1016/j.celrep.2015.11.024 (DOI Listing)
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC4695333 (PMC)
December 2015

LOLA: enrichment analysis for genomic region sets and regulatory elements in R and Bioconductor.

Bioinformatics 2016 Feb 27;32(4):587-9. Epub 2015 Oct 27.

CeMM Research Center for Molecular Medicine of the Austrian Academy of Sciences, Department of Laboratory Medicine, Medical University of Vienna, 1090 Vienna, Austria and Max Planck Institute for Informatics, 66123 Saarbrücken, Germany.

Unlabelled: Genomic datasets are often interpreted in the context of large-scale reference databases. One approach is to identify significantly overlapping gene sets, which works well for gene-centric data. However, many types of high-throughput data are based on genomic regions. Locus Overlap Analysis (LOLA) provides easy and automatable enrichment analysis for genomic region sets, thus facilitating the interpretation of functional genomics and epigenomics data.

Availability And Implementation: R package available in Bioconductor and on the following website: http://lola.computational-epigenetics.org.

Source
http://dx.doi.org/10.1093/bioinformatics/btv612 (DOI Listing)
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC4743627 (PMC)
February 2016

ChIPmentation: fast, robust, low-input ChIP-seq for histones and transcription factors.

Nat Methods 2015 Oct 17;12(10):963-965. Epub 2015 Aug 17.

CeMM Research Center for Molecular Medicine of the Austrian Academy of Sciences, Vienna, Austria.

Chromatin immunoprecipitation followed by sequencing (ChIP-seq) is widely used to map histone marks and transcription factor binding throughout the genome. Here we present ChIPmentation, a method that combines chromatin immunoprecipitation with sequencing library preparation by Tn5 transposase ('tagmentation'). ChIPmentation introduces sequencing-compatible adaptors in a single-step reaction directly on bead-bound chromatin, which reduces time, cost and input requirements, thus providing a convenient and broadly useful alternative to existing ChIP-seq protocols.

Source
http://dx.doi.org/10.1038/nmeth.3542 (DOI Listing)
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC4589892 (PMC)
October 2015

Single-cell DNA methylome sequencing and bioinformatic inference of epigenomic cell-state dynamics.

Cell Rep 2015 Mar 26;10(8):1386-97. Epub 2015 Feb 26.

CeMM Research Center for Molecular Medicine of the Austrian Academy of Sciences, 1090 Vienna, Austria; Department of Laboratory Medicine, Medical University of Vienna, 1090 Vienna, Austria; Max Planck Institute for Informatics, 66123 Saarbrücken, Germany.

Methods for single-cell genome and transcriptome sequencing have contributed to our understanding of cellular heterogeneity, whereas methods for single-cell epigenomics are much less established. Here, we describe a whole-genome bisulfite sequencing (WGBS) assay that enables DNA methylation mapping in very small cell populations (μWGBS) and single cells (scWGBS). Our assay is optimized for profiling many samples at low coverage, and we describe a bioinformatic method that analyzes collections of single-cell methylomes to infer cell-state dynamics. Using these technological advances, we studied epigenomic cell-state dynamics in three in vitro models of cellular differentiation and pluripotency, where we observed characteristic patterns of epigenome remodeling and cell-to-cell heterogeneity. The described method enables single-cell analysis of DNA methylation in a broad range of biological systems, including embryonic development, stem cell differentiation, and cancer. It can also be used to establish composite methylomes that account for cell-to-cell heterogeneity in complex tissue samples.

Source
http://dx.doi.org/10.1016/j.celrep.2015.02.001 (DOI Listing)
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC4542311 (PMC)
March 2015

Epigenome mapping reveals distinct modes of gene regulation and widespread enhancer reprogramming by the oncogenic fusion protein EWS-FLI1.

Cell Rep 2015 Feb 19;10(7):1082-95. Epub 2015 Feb 19.

Children's Cancer Research Institute, St. Anna Kinderkrebsforschung, 1090 Vienna, Austria; Department of Pediatrics, Medical University of Vienna, 1090 Vienna, Austria.

Transcription factor fusion proteins can transform cells by inducing global changes of the transcriptome, often creating a state of oncogene addiction. Here, we investigate the role of epigenetic mechanisms in this process, focusing on Ewing sarcoma cells that are dependent on the EWS-FLI1 fusion protein. We established reference epigenome maps comprising DNA methylation, seven histone marks, open chromatin states, and RNA levels, and we analyzed the epigenome dynamics upon downregulation of the driving oncogene. Reduced EWS-FLI1 expression led to widespread epigenetic changes in promoters, enhancers, and super-enhancers, and we identified histone H3K27 acetylation as the most strongly affected mark. Clustering of epigenetic promoter signatures defined classes of EWS-FLI1-regulated genes that responded differently to low-dose treatment with histone deacetylase inhibitors. Furthermore, we observed strong and opposing enrichment patterns for E2F and AP-1 among EWS-FLI1-correlated and anticorrelated genes. Our data describe extensive genome-wide rewiring of epigenetic cell states driven by an oncogenic fusion protein.

Source
http://dx.doi.org/10.1016/j.celrep.2015.01.042 (DOI Listing)
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC4542316 (PMC)
February 2015

Patterns of regulatory activity across diverse human cell types predict tissue identity, transcription factor binding, and long-range interactions.

Genome Res 2013 May 12;23(5):777-88. Epub 2013 Mar 12.

Program in Computational Biology and Bioinformatics, Duke University, Durham, North Carolina 27710, USA.

Regulatory elements recruit transcription factors that modulate gene expression distinctly across cell types, but the relationships among these remain elusive. To address this, we analyzed matched DNase-seq and gene expression data for 112 human samples representing 72 cell types. We first defined more than 1800 clusters of DNase I hypersensitive sites (DHSs) with similar tissue specificity of DNase-seq signal patterns. We then used these to uncover distinct associations between DHSs and promoters, CpG islands, conserved elements, and transcription factor motif enrichment. Motif analysis within clusters identified known and novel motifs in cell-type-specific and ubiquitous regulatory elements and supports a role for AP-1 regulating open chromatin. We developed a classifier that accurately predicts cell-type lineage based on only 43 DHSs and evaluated the tissue of origin for cancer cell types. A similar classifier identified three sex-specific loci on the X chromosome, including the XIST lincRNA locus. By correlating DNase I signal and gene expression, we predicted regulated genes for more than 500K DHSs. Finally, we introduce a web resource to enable researchers to use these results to explore these regulatory patterns and better understand how expression is modulated within and across human cell types.

Source
http://dx.doi.org/10.1101/gr.152140.112 (DOI Listing)
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3638134 (PMC)
May 2013

Chromatin accessibility reveals insights into androgen receptor activation and transcriptional specificity.

Genome Biol 2012 Oct 3;13(10):R88. Epub 2012 Oct 3.

Background: Epigenetic mechanisms such as chromatin accessibility impact transcription factor binding to DNA and transcriptional specificity. The androgen receptor (AR), a master regulator of the male phenotype and prostate cancer pathogenesis, acts primarily through ligand-activated transcription of target genes. Although several determinants of AR transcriptional specificity have been elucidated, our understanding of the interplay between chromatin accessibility and AR function remains incomplete.

Results: We used deep sequencing to assess chromatin structure via DNase I hypersensitivity and mRNA abundance, and paired these datasets with three independent AR ChIP-seq datasets. Our analysis revealed qualitative and quantitative differences in chromatin accessibility that corresponded to both AR binding and an enrichment of motifs for potential collaborating factors, one of which was identified as SP1. These quantitative differences were significantly associated with AR-regulated mRNA transcription across the genome. Base-pair resolution of the DNase I cleavage profile revealed three distinct footprinting patterns associated with the AR-DNA interaction, suggesting multiple modes of AR interaction with the genome.

Conclusions: In contrast with other DNA-binding factors, AR binding to the genome does not only target regions that are accessible to DNase I cleavage prior to hormone induction. AR binding is invariably associated with an increase in chromatin accessibility and, consequently, changes in gene expression. Furthermore, we present the first in vivo evidence that a significant fraction of AR binds only to half of the full AR DNA motif. These findings indicate a dynamic quantitative relationship between chromatin structure and AR-DNA binding that impacts AR transcriptional specificity.

Source
http://dx.doi.org/10.1186/gb-2012-13-10-r88 (DOI Listing)
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3491416 (PMC)
October 2012