Publications by authors named "William Meyerson"

9 Publications

  • Page 1 of 1

Retrospective evaluation of whole exome and genome mutation calls in 746 cancer samples.

Nat Commun 2020 09 21;11(1):4748. Epub 2020 Sep 21.

The McDonnell Genome Institute at Washington University, St. Louis, MO, 63108, USA.

The Cancer Genome Atlas (TCGA) and International Cancer Genome Consortium (ICGC) curated consensus somatic mutation calls using whole exome sequencing (WES) and whole genome sequencing (WGS), respectively. Here, as part of the ICGC/TCGA Pan-Cancer Analysis of Whole Genomes (PCAWG) Consortium, which aggregated whole genome sequencing data from 2,658 cancers across 38 tumour types, we compare WES and WGS side-by-side from 746 TCGA samples, finding that ~80% of mutations overlap in covered exonic regions. We estimate that low variant allele fraction (VAF < 15%) and clonal heterogeneity contribute up to 68% of private WGS mutations and 71% of private WES mutations. We observe that ~30% of private WGS mutations trace to mutations identified by a single variant caller in WES consensus efforts. WGS captures both ~50% more variation in exonic regions and un-observed mutations in loci with variable GC-content. Together, our analysis highlights technological divergences between two reproducible somatic variant detection efforts.
View Article and Find Full Text PDF

Download full-text PDF

Source
http://dx.doi.org/10.1038/s41467-020-18151-yDOI Listing
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC7505971PMC
September 2020

An integrative ENCODE resource for cancer genomics.

Nat Commun 2020 07 29;11(1):3696. Epub 2020 Jul 29.

Program in Computational Biology & Bioinformatics, Yale University, New Haven, CT, 06520, USA.

ENCODE comprises thousands of functional genomics datasets, and the encyclopedia covers hundreds of cell types, providing a universal annotation for genome interpretation. However, for particular applications, it may be advantageous to use a customized annotation. Here, we develop such a custom annotation by leveraging advanced assays, such as eCLIP, Hi-C, and whole-genome STARR-seq on a number of data-rich ENCODE cell types. A key aspect of this annotation is comprehensive and experimentally derived networks of both transcription factors and RNA-binding proteins (TFs and RBPs). Cancer, a disease of system-wide dysregulation, is an ideal application for such a network-based annotation. Specifically, for cancer-associated cell types, we put regulators into hierarchies and measure their network change (rewiring) during oncogenesis. We also extensively survey TF-RBP crosstalk, highlighting how SUB1, a previously uncharacterized RBP, drives aberrant tumor expression and amplifies the effect of MYC, a well-known oncogenic TF. Furthermore, we show how our annotation allows us to place oncogenic transformations in the context of a broad cell space; here, many normal-to-tumor transitions move towards a stem-like state, while oncogene knockdowns show an opposing trend. Finally, we organize the resource into a coherent workflow to prioritize key elements and variants, in addition to regulators. We showcase the application of this prioritization to somatic burdening, cancer differential expression and GWAS. Targeted validations of the prioritized regulators, elements and variants using siRNA knockdowns, CRISPR-based editing, and luciferase assays demonstrate the value of the ENCODE resource.
View Article and Find Full Text PDF

Download full-text PDF

Source
http://dx.doi.org/10.1038/s41467-020-14743-wDOI Listing
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC7391744PMC
July 2020

Origins and characterization of variants shared between databases of somatic and germline human mutations.

BMC Bioinformatics 2020 Jun 4;21(1):227. Epub 2020 Jun 4.

Computational Biology & Bioinformatics, Yale University, New Haven, CT, 06511, USA.

Background: Mutations arise in the human genome in two major settings: the germline and the soma. These settings involve different inheritance patterns, time scales, chromatin structures, and environmental exposures, all of which impact the resulting distribution of substitutions. Nonetheless, many of the same single nucleotide variants (SNVs) are shared between germline and somatic mutation databases, such as between the gnomAD database of 120,000 germline exomes and the TCGA database of 10,000 somatic exomes. Here, we sought to explain this overlap.

Results: After strict filtering to exclude common germline polymorphisms and sites with poor coverage or mappability, we found 336,987 variants shared between the somatic and germline databases. A uniform statistical model explains 34% of these shared variants; a model that incorporates the varying mutation rates of the basic mutation types explains another 50% of shared variants; and a model that includes extended nucleotide contexts (e.g. surrounding 3 bases on either side) explains an additional 4% of shared variants. Analysis of read depth finds mixed evidence that up to 4% of the shared variants may represent germline variants leaked into somatic call sets. 9% of the shared variants are not explained by any model. Sequencing errors and convergent evolution did not account for these. We surveyed other factors as well: Cancers driven by endogenous mutational processes share a greater fraction of variants with the germline, and recently derived germline variants were more likely to be somatically shared than were ancient germline ones.

Conclusions: Overall, we find that shared variants largely represent bona fide biological occurrences of the same variant in the germline and somatic setting and arise primarily because DNA has some of the same basic chemical vulnerabilities in either setting. Moreover, we find mixed evidence that somatic call-sets leak appreciable numbers of germline variants, which is relevant to genomic privacy regulations. In future studies, the similar chemical vulnerability of DNA between the somatic and germline settings might be used to help identify disease-related genes by guiding the development of background-mutation models that are informed by both somatic and germline patterns of variation.
View Article and Find Full Text PDF

Download full-text PDF

Source
http://dx.doi.org/10.1186/s12859-020-3508-8DOI Listing
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC7273669PMC
June 2020

Passenger Mutations in More Than 2,500 Cancer Genomes: Overall Molecular Functional Impact and Consequences.

Cell 2020 03 20;180(5):915-927.e16. Epub 2020 Feb 20.

Program in Computational Biology and Bioinformatics, Yale University, New Haven, CT 06520, USA; Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, CT 06520, USA; Department of Computer Science, Yale University, New Haven, CT 06511, USA. Electronic address:

The dichotomous model of "drivers" and "passengers" in cancer posits that only a few mutations in a tumor strongly affect its progression, with the remaining ones being inconsequential. Here, we leveraged the comprehensive variant dataset from the ICGC/TCGA Pan-Cancer Analysis of Whole Genomes (PCAWG) project to demonstrate that-in addition to the dichotomy of high- and low-impact variants-there is a third group of medium-impact putative passengers. Moreover, we also found that molecular impact correlates with subclonal architecture (i.e., early versus late mutations), and different signatures encode for mutations with divergent impact. Furthermore, we adapted an additive-effects model from complex-trait studies to show that the aggregated effect of putative passengers, including undetected weak drivers, provides significant additional power (∼12% additive variance) for predicting cancerous phenotypes, beyond PCAWG-identified driver mutations. Finally, this framework allowed us to estimate the frequency of potential weak-driver mutations in PCAWG samples lacking any well-characterized driver alterations.
View Article and Find Full Text PDF

Download full-text PDF

Source
http://dx.doi.org/10.1016/j.cell.2020.01.032DOI Listing
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC7210002PMC
March 2020

Estimating growth patterns and driver effects in tumor evolution from individual samples.

Nat Commun 2020 02 5;11(1):732. Epub 2020 Feb 5.

Program in Computational Biology and Bioinformatics, Yale University, New Haven, CT, 06520, USA.

Tumors accumulate thousands of mutations, and sequencing them has given rise to methods for finding cancer drivers via mutational recurrence. However, these methods require large cohorts and underperform for low recurrence. Recently, ultra-deep sequencing has enabled accurate measurement of VAFs (variant-allele frequencies) for mutations, allowing the determination of evolutionary trajectories. Here, based solely on the VAF spectrum for an individual sample, we report on a method that identifies drivers and quantifies tumor growth. Drivers introduce perturbations into the spectrum, and our method uses the frequency of hitchhiking mutations preceding a driver to measure this. As validation, we use simulation models and 993 tumors from the Pan-Cancer Analysis of Whole Genomes (PCAWG) Consortium with previously identified drivers. Then we apply our method to an ultra-deep sequenced acute myeloid leukemia (AML) tumor and identify known cancer genes and additional driver candidates. In summary, our framework presents opportunities for personalized driver diagnosis using sequencing data from a single individual.
View Article and Find Full Text PDF

Download full-text PDF

Source
http://dx.doi.org/10.1038/s41467-020-14407-9DOI Listing
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC7002450PMC
February 2020

Building a Hybrid Physical-Statistical Classifier for Predicting the Effect of Variants Related to Protein-Drug Interactions.

Structure 2019 09 3;27(9):1469-1481.e3. Epub 2019 Jul 3.

Program in Computational Biology and Bioinformatics, Yale University, New Haven, CT 06520, USA; Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, CT 06520, USA; Department of Computer Science, Yale University, New Haven, CT 06520, USA. Electronic address:

A key issue in drug design is how population variation affects drug efficacy by altering binding affinity (BA) in different individuals, an essential consideration for government regulators. Ideally, we would like to evaluate the BA perturbations of millions of single-nucleotide variants (SNVs). However, only hundreds of protein-drug complexes with SNVs have experimentally characterized BAs, constituting too small a gold standard for straightforward statistical model training. Thus, we take a hybrid approach: using physically based calculations to bootstrap the parameterization of a full model. In particular, we do 3D structure-based docking on ∼10,000 SNVs modifying known protein-drug complexes to construct a pseudo gold standard. Then we use this augmented set of BAs to train a statistical model combining structure, ligand and sequence features and illustrate how it can be applied to millions of SNVs. Finally, we show that our model has good cross-validated performance (97% AUROC) and can also be validated by orthogonal ligand-binding data.
View Article and Find Full Text PDF

Download full-text PDF

Source
http://dx.doi.org/10.1016/j.str.2019.06.001DOI Listing
September 2019

Genomics and data science: an application within an umbrella.

Genome Biol 2019 05 29;20(1):109. Epub 2019 May 29.

Program in Computational Biology and Bioinformatics, Yale University, Bass 432, 266 Whitney Avenue, New Haven, CT, 06520, USA.

Data science allows the extraction of practical insights from large-scale data. Here, we contextualize it as an umbrella term, encompassing several disparate subdomains. We focus on how genomics fits as a specific application subdomain, in terms of well-known 3 V data and 4 M process frameworks (volume-velocity-variety and measurement-mining-modeling-manipulation, respectively). We further analyze the technical and cultural "exports" and "imports" between genomics and other data-science subdomains (e.g., astronomy). Finally, we discuss how data value, privacy, and ownership are pressing issues for data science applications, in general, and are especially relevant to genomics, due to the persistent nature of DNA.
View Article and Find Full Text PDF

Download full-text PDF

Source
http://dx.doi.org/10.1186/s13059-019-1724-1DOI Listing
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6540394PMC
May 2019

Loregic: a method to characterize the cooperative logic of regulatory factors.

PLoS Comput Biol 2015 Apr 17;11(4):e1004132. Epub 2015 Apr 17.

Program in Computational Biology and Bioinformatics, Yale University, New Haven, Connecticut, United States of America; Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, Connecticut, United States of America; Department of Computer Science, Yale University, New Haven, Connecticut, United States of America.

The topology of the gene-regulatory network has been extensively analyzed. Now, given the large amount of available functional genomic data, it is possible to go beyond this and systematically study regulatory circuits in terms of logic elements. To this end, we present Loregic, a computational method integrating gene expression and regulatory network data, to characterize the cooperativity of regulatory factors. Loregic uses all 16 possible two-input-one-output logic gates (e.g. AND or XOR) to describe triplets of two factors regulating a common target. We attempt to find the gate that best matches each triplet's observed gene expression pattern across many conditions. We make Loregic available as a general-purpose tool (github.com/gersteinlab/loregic). We validate it with known yeast transcription-factor knockout experiments. Next, using human ENCODE ChIP-Seq and TCGA RNA-Seq data, we are able to demonstrate how Loregic characterizes complex circuits involving both proximally and distally regulating transcription factors (TFs) and also miRNAs. Furthermore, we show that MYC, a well-known oncogenic driving TF, can be modeled as acting independently from other TFs (e.g., using OR gates) but antagonistically with repressing miRNAs. Finally, we inter-relate Loregic's gate logic with other aspects of regulation, such as indirect binding via protein-protein interactions, feed-forward loop motifs and global regulatory hierarchy.
View Article and Find Full Text PDF

Download full-text PDF

Source
http://dx.doi.org/10.1371/journal.pcbi.1004132DOI Listing
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC4401777PMC
April 2015