Publications by authors named "Francois Aguet"

53 Publications

Unannotated proteins expand the MHC-I-restricted immunopeptidome in cancer.

Nat Biotechnol 2021 Oct 18. Epub 2021 Oct 18.

Klarman Cell Observatory, Broad Institute of Harvard and MIT, Cambridge, MA, USA.

Tumor-associated epitopes presented on MHC-I that can activate the immune system against cancer cells are typically identified from annotated protein-coding regions of the genome, but whether peptides originating from novel or unannotated open reading frames (nuORFs) can contribute to antitumor immune responses remains unclear. Here we show that peptides originating from nuORFs detected by ribosome profiling of malignant and healthy samples can be displayed on MHC-I of cancer cells, acting as additional sources of cancer antigens. We constructed a high-confidence database of translated nuORFs across tissues (nuORFdb) and used it to detect 3,555 translated nuORFs from MHC-I immunopeptidome mass spectrometry analysis, including peptides that result from somatic mutations in nuORFs of cancer samples as well as tumor-specific nuORFs translated in melanoma, chronic lymphocytic leukemia and glioblastoma. NuORFs are an unexplored pool of MHC-I-presented, tumor-specific peptides with potential as immunotherapy targets.
DOI: http://dx.doi.org/10.1038/s41587-021-01021-3
October 2021

Leveraging supervised learning for functionally informed fine-mapping of cis-eQTLs identifies an additional 20,913 putative causal eQTLs.

Nat Commun 2021 06 7;12(1):3394. Epub 2021 Jun 7.

Broad Institute of MIT and Harvard, Cambridge, MA, USA.

The large majority of variants identified by GWAS are non-coding, motivating detailed characterization of the function of non-coding variants. Experimental methods to assess variants' effect on gene expression in native chromatin context via direct perturbation are low-throughput. Existing high-throughput computational predictors have therefore lacked large gold-standard sets of regulatory variants for training and validation. Here, we leverage a set of 14,807 putative causal eQTLs in humans obtained through statistical fine-mapping, and we use 6121 features to directly train a predictor of whether a variant modifies nearby gene expression. We call the resulting prediction the expression modifier score (EMS). We validate EMS by comparing its ability to prioritize functional variants with other major scores. We then use EMS as a prior for statistical fine-mapping of eQTLs to identify an additional 20,913 putatively causal eQTLs, and we incorporate EMS into co-localization analysis to identify 310 additional candidate genes across UK Biobank phenotypes.
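To make the idea of a functional prior concrete, here is a minimal sketch assuming a single causal variant per locus and precomputed Bayes factors. The function name `posterior_inclusion_probs` and the rescaling of raw scores into prior probabilities are illustrative assumptions, not the EMS authors' fine-mapping procedure, which may use a different model.

```python
import numpy as np


def posterior_inclusion_probs(bayes_factors, prior_scores):
    """Posterior inclusion probabilities (PIPs) under a single-causal-variant model.

    bayes_factors: per-variant Bayes factors (causal vs. null), assumed precomputed.
    prior_scores: non-negative per-variant scores (a hypothetical stand-in for a
        functional score such as EMS), rescaled here to sum to 1.
    """
    bf = np.asarray(bayes_factors, dtype=float)
    prior = np.asarray(prior_scores, dtype=float)
    prior = prior / prior.sum()        # prior probability that each variant is causal
    weighted = prior * bf              # unnormalized posterior
    return weighted / weighted.sum()   # PIPs sum to 1 across the locus


# Variants 1 and 2 are in perfect LD, so their Bayes factors are identical;
# only the prior can separate them.
bfs = [50.0, 50.0, 2.0]
uniform = posterior_inclusion_probs(bfs, [1.0, 1.0, 1.0])
informed = posterior_inclusion_probs(bfs, [3.0, 1.0, 1.0])
```

With a flat prior the two linked variants split the signal (a PIP of about 0.49 each); tripling the first variant's prior score raises its PIP to about 0.74. This is the kind of reprioritization that a functional prior can provide when association statistics alone cannot distinguish variants.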
DOI: http://dx.doi.org/10.1038/s41467-021-23134-8
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC8184741
June 2021

Benchmarking association analyses of continuous exposures with RNA-seq in observational studies.

Brief Bioinform 2021 May 20. Epub 2021 May 20.

Harbor-UCLA Medical Center at the Lundquist Institute, USA.

Large datasets of hundreds to thousands of individuals measuring RNA-seq in observational studies are becoming available. Many popular software packages for analysis of RNA-seq data were constructed to study differences in expression signatures in an experimental design with well-defined conditions (exposures). In contrast, observational studies may have varying levels of confounding transcript-exposure associations; further, exposure measures may vary from discrete (exposed, yes/no) to continuous (levels of exposure), with non-normal distributions of exposure. We compare popular software for gene expression analysis (DESeq2, edgeR and limma) as well as linear regression-based analyses for studying the association of continuous exposures with RNA-seq. We developed a computational pipeline that includes transformation, filtering and generation of empirical null distribution of association P-values, and we apply the pipeline to compute empirical P-values with multiple testing correction. We employ a resampling approach that allows for assessment of false positive detection across methods, power comparison and the computation of quantile empirical P-values. The results suggest that linear regression methods are substantially faster with better control of false detections than other methods, even with the resampling method to compute empirical P-values. We provide the proposed pipeline with fast algorithms in an R package Olivia, and implemented it to study the associations of measures of sleep disordered breathing with RNA-seq in peripheral blood mononuclear cells in participants from the Multi-Ethnic Study of Atherosclerosis.
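The resampling idea can be illustrated with a generic permutation test for a single transcript. This is a minimal sketch, not code from the Olivia package: it assumes expression has already been transformed and filtered, uses |Pearson r| as the test statistic (for simple linear regression this ranks permutations identically to |t| for the slope), and the function name `empirical_pvalue` is hypothetical. It also omits the pooling across genes needed for quantile empirical P-values and multiple testing correction.

```python
import numpy as np


def empirical_pvalue(expression, exposure, n_perm=1000, seed=0):
    """Permutation P-value for the association of one transcript with a
    continuous exposure, using |Pearson r| as the test statistic."""
    rng = np.random.default_rng(seed)
    y = np.asarray(expression, dtype=float)
    x = np.asarray(exposure, dtype=float)
    observed = abs(np.corrcoef(x, y)[0, 1])
    null = np.empty(n_perm)
    for b in range(n_perm):
        # Shuffling the exposure breaks any true exposure-expression link
        # while keeping the exposure's (possibly skewed) distribution intact.
        null[b] = abs(np.corrcoef(rng.permutation(x), y)[0, 1])
    # Add-one correction keeps the empirical P-value strictly positive.
    return (1 + np.sum(null >= observed)) / (n_perm + 1)


# Simulated data: a skewed, non-normal continuous exposure.
rng = np.random.default_rng(42)
exposure = rng.gamma(shape=2.0, scale=1.0, size=200)
log_expr_assoc = 0.5 * exposure + rng.normal(size=200)   # true association
log_expr_null = rng.normal(size=200)                      # no association
p_assoc = empirical_pvalue(log_expr_assoc, exposure)
p_null = empirical_pvalue(log_expr_null, exposure)
```

Because the null distribution is built from the data themselves, this approach does not rely on the exposure being normally distributed, which is one reason resampling is attractive for observational exposures.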
DOI: http://dx.doi.org/10.1093/bib/bbab194
May 2021

Population-scale tissue transcriptomics maps long non-coding RNAs to complex disease.

Cell 2021 May 16;184(10):2633-2648.e19. Epub 2021 Apr 16.

Department of Genetics, Stanford University, Stanford, CA 94305, USA; Department of Pathology, Stanford University, Stanford, CA 94305, USA.

Long non-coding RNA (lncRNA) genes have well-established and important impacts on molecular and cellular functions. However, among the thousands of lncRNA genes, it is still a major challenge to identify the subset with disease or trait relevance. To systematically characterize these lncRNA genes, we used Genotype-Tissue Expression (GTEx) project v8 genetic and multi-tissue transcriptomic data to profile the expression, genetic regulation, cellular contexts, and trait associations of 14,100 lncRNA genes across 49 tissues for 101 distinct complex genetic traits. Using these approaches, we identified 1,432 lncRNA gene-trait associations, 800 of which were not explained by stronger effects of neighboring protein-coding genes. This included associations between lncRNA quantitative trait loci and inflammatory bowel disease, type 1 and type 2 diabetes, and coronary artery disease, as well as rare variant associations with body mass index.
DOI: http://dx.doi.org/10.1016/j.cell.2021.03.050
May 2021

RNA-SeQC 2: Efficient RNA-seq quality control and quantification for large cohorts.

Bioinformatics 2021 Mar 2. Epub 2021 Mar 2.

Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA.

Summary: Post-sequencing quality control is a crucial component of RNA sequencing (RNA-seq) data generation and analysis, as sample quality can be affected by sample storage, extraction, and sequencing protocols. RNA-seq is increasingly applied to cohorts ranging from hundreds to tens of thousands of samples in size, but existing tools do not readily scale to these sizes, and were not designed for a wide range of sample types and qualities. Here, we describe RNA-SeQC 2, an efficient reimplementation of RNA-SeQC (DeLuca et al., 2012) that adds multiple metrics designed to characterize sample quality across a wide range of RNA-seq protocols.

Availability And Implementation: The command-line tool, documentation, and C++ source code are available at the GitHub repository https://github.com/getzlab/rnaseqc.

Supplementary Information: Supplementary data are available at Bioinformatics online.
DOI: http://dx.doi.org/10.1093/bioinformatics/btab135
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC8479667
March 2021

A scalable unified framework of total and allele-specific counts for cis-QTL, fine-mapping, and prediction.

Nat Commun 2021 03 3;12(1):1424. Epub 2021 Mar 3.

Section of Genetic Medicine, The University of Chicago, Chicago, IL, USA.

Genetic studies of the transcriptome help bridge the gap between genetic variation and phenotypes. To maximize the potential of such studies, efficient methods to identify expression quantitative trait loci (eQTLs) and perform fine-mapping and genetic prediction of gene expression traits are needed. Current methods that leverage both total read counts and allele-specific expression to identify eQTLs are generally computationally intractable for large transcriptomic studies. Here, we describe mixQTL, a unified framework that addresses these needs and is scalable to thousands of samples. Using simulations and data from GTEx, we demonstrate its calibration and performance. For example, mixQTL shows a power gain equivalent to a 29% increase in sample size for genes with sufficient allele-specific read coverage. To showcase the potential of mixQTL, we apply it to 49 GTEx tissues and find 20% additional eQTLs (FDR < 0.05, per tissue) that are significantly more enriched among trait-associated variants and candidate cis-regulatory elements compared with the standard approach.
DOI: http://dx.doi.org/10.1038/s41467-021-21592-8
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC7930098
March 2021

Sequencing of 53,831 diverse genomes from the NHLBI TOPMed Program.

Nature 2021 02 10;590(7845):290-299. Epub 2021 Feb 10.

The Broad Institute of MIT and Harvard, Cambridge, MA, USA.

The Trans-Omics for Precision Medicine (TOPMed) programme seeks to elucidate the genetic architecture and biology of heart, lung, blood and sleep disorders, with the ultimate goal of improving diagnosis, treatment and prevention of these diseases. The initial phases of the programme focused on whole-genome sequencing of individuals with rich phenotypic data and diverse backgrounds. Here we describe the TOPMed goals and design as well as the available resources and early insights obtained from the sequence data. The resources include a variant browser, a genotype imputation server, and genomic and phenotypic data that are available through dbGaP (Database of Genotypes and Phenotypes). In the first 53,831 TOPMed samples, we detected more than 400 million single-nucleotide and insertion or deletion variants after alignment with the reference genome. Additional previously undescribed variants were detected through assembly of unmapped reads and customized analysis in highly variable loci. Among the more than 400 million detected variants, 97% have frequencies of less than 1% and 46% are singletons that are present in only one individual (53% among unrelated individuals). These rare variants provide insights into mutational processes and recent human evolutionary history. The extensive catalogue of genetic variation in TOPMed studies provides unique opportunities for exploring the contributions of rare and noncoding sequence variants to phenotypic variation. Furthermore, combining TOPMed haplotypes with modern imputation methods improves the power and reach of genome-wide association studies to include variants down to a frequency of approximately 0.01%.
DOI: http://dx.doi.org/10.1038/s41586-021-03205-y
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC7875770
February 2021

Exploiting the GTEx resources to decipher the mechanisms at GWAS loci.

Genome Biol 2021 01 26;22(1):49. Epub 2021 Jan 26.

Section of Genetic Medicine, Department of Medicine, The University of Chicago, Chicago, IL, USA.

The resources generated by the GTEx consortium offer unprecedented opportunities to advance our understanding of the biology of human diseases. Here, we present an in-depth examination of the phenotypic consequences of transcriptome regulation and a blueprint for the functional interpretation of genome-wide association study-discovered loci. Across a broad set of complex traits and diseases, we demonstrate widespread dose-dependent effects of RNA expression and splicing. We develop a data-driven framework to benchmark methods that prioritize causal genes and find that no single approach outperforms the combination of multiple approaches. Using colocalization and association approaches that take into account the observed allelic heterogeneity of gene expression, we propose potential target genes for 47% (2519 out of 5385) of the GWAS loci examined.
DOI: http://dx.doi.org/10.1186/s13059-020-02252-4
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC7836161
January 2021

The Lipogenic Regulator SREBP2 Induces Transferrin in Circulating Melanoma Cells and Suppresses Ferroptosis.

Cancer Discov 2021 03 17;11(3):678-695. Epub 2020 Nov 17.

Massachusetts General Hospital Cancer Center and Harvard Medical School, Boston, Massachusetts.

Circulating tumor cells (CTC) are shed by cancer into the bloodstream, where a viable subset overcomes oxidative stress to initiate metastasis. We show that single CTCs from patients with melanoma coordinately upregulate lipogenesis and iron homeostasis pathways. These are correlated with both intrinsic and acquired resistance to BRAF inhibitors across clonal cultures of -mutant CTCs. The lipogenesis regulator SREBP2 directly induces transcription of the iron carrier Transferrin (), reducing intracellular iron pools, reactive oxygen species, and lipid peroxidation, thereby conferring resistance to inducers of ferroptosis. Knockdown of endogenous impairs tumor formation by melanoma CTCs, and their tumorigenic defects are partially rescued by the lipophilic antioxidants ferrostatin-1 and vitamin E. In a prospective melanoma cohort, presence of CTCs with high lipogenic and iron metabolic RNA signatures is correlated with adverse clinical outcome, irrespective of treatment regimen. Thus, SREBP2-driven iron homeostatic pathways contribute to cancer progression, drug resistance, and metastasis.

SIGNIFICANCE: Through single-cell analysis of primary and cultured melanoma CTCs, we have uncovered intrinsic cancer cell heterogeneity within lipogenic and iron homeostatic pathways that modulates resistance to BRAF inhibitors and to ferroptosis inducers. Activation of these pathways within CTCs is correlated with adverse clinical outcome, pointing to therapeutic opportunities.
DOI: http://dx.doi.org/10.1158/2159-8290.CD-19-1500
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC7933049
March 2021

Inherited causes of clonal haematopoiesis in 97,691 whole genomes.

Nature 2020 10 14;586(7831):763-768. Epub 2020 Oct 14.

Department of Epidemiology, School of Public Health, University of Michigan, Ann Arbor, MI, USA.

Age is the dominant risk factor for most chronic human diseases, but the mechanisms through which ageing confers this risk are largely unknown. The age-related acquisition of somatic mutations that lead to clonal expansion in regenerating haematopoietic stem cell populations has recently been associated with both haematological cancer and coronary heart disease; this phenomenon is termed clonal haematopoiesis of indeterminate potential (CHIP). Simultaneous analyses of germline and somatic whole-genome sequences provide the opportunity to identify root causes of CHIP. Here we analyse high-coverage whole-genome sequences from 97,691 participants of diverse ancestries in the National Heart, Lung, and Blood Institute Trans-omics for Precision Medicine (TOPMed) programme, and identify 4,229 individuals with CHIP. We identify associations with blood cell, lipid and inflammatory traits that are specific to different CHIP driver genes. Association of a genome-wide set of germline genetic variants enabled the identification of three genetic loci associated with CHIP status, including one locus at TET2 that was specific to individuals of African ancestry. In silico-informed in vitro evaluation of the TET2 germline locus enabled the identification of a causal variant that disrupts a TET2 distal enhancer, resulting in increased self-renewal of haematopoietic stem cells. Overall, we observe that germline genetic variation shapes haematopoietic stem cell function, leading to CHIP through mechanisms that are specific to clonal haematopoiesis as well as shared mechanisms that lead to somatic mutations across tissues.
DOI: http://dx.doi.org/10.1038/s41586-020-2819-2
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC7944936
October 2020

Whole genome sequence analysis of pulmonary function and COPD in 19,996 multi-ethnic participants.

Nat Commun 2020 10 14;11(1):5182. Epub 2020 Oct 14.

The Institute for Translational Genomics and Population Sciences, The Department of Pediatrics, The Lundquist Institute for Biomedical Innovation at Harbor-UCLA Medical Center, Torrance, CA, 90502, USA.

Chronic obstructive pulmonary disease (COPD), diagnosed by reduced lung function, is a leading cause of morbidity and mortality. We performed whole genome sequence (WGS) analysis of lung function and COPD in a multi-ethnic sample of 11,497 participants from population- and family-based studies, and 8499 individuals from COPD-enriched studies in the NHLBI Trans-Omics for Precision Medicine (TOPMed) Program. We identify at genome-wide significance 10 known GWAS loci and 22 distinct, previously unreported loci, including two common variant signals from stratified analysis of African Americans. Four novel common variants within the regions of PIAS1, RGN (two variants) and FTO show evidence of replication in the UK Biobank (European ancestry n ~ 320,000), while colocalization analyses leveraging multi-omic data from GTEx and TOPMed identify potential molecular mechanisms underlying four of the 22 novel loci. Our study demonstrates the value of performing WGS analyses and multi-omic follow-up in cohorts of diverse ancestry.
DOI: http://dx.doi.org/10.1038/s41467-020-18334-7
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC7598941
October 2020

Fine-mapping and QTL tissue-sharing information improves the reliability of causal gene identification.

Genet Epidemiol 2020 Sep 10. Epub 2020 Sep 10.

Section of Genetic Medicine, Department of Medicine, The University of Chicago, Chicago, Illinois.

The integration of transcriptomic studies and genome-wide association studies (GWAS) via imputed expression has seen extensive application in recent years, enabling the functional characterization and causal gene prioritization of GWAS loci. However, the techniques for imputing transcriptomic traits from DNA variation remain underdeveloped. Furthermore, associations found when linking eQTL studies to complex traits through methods like PrediXcan can lead to false positives due to linkage disequilibrium between distinct causal variants. Therefore, the models with the best prediction performance may not lead to more reliable causal gene discovery. With the goal of improving discoveries without increasing false positives, we develop and compare multiple transcriptomic imputation approaches using the most recent GTEx release of expression and splicing data on 17,382 RNA-sequencing samples from 948 post-mortem donors in 54 tissues. We find that informing prediction models with posterior causal probability from fine-mapping (dap-g) and borrowing information across tissues (mashr) can lead to better performance in terms of number and proportion of significant associations that are colocalized and the proportion of silver standard genes identified, as indicated by precision-recall and receiver operating characteristic curves. All prediction models are made publicly available at predictdb.org.
DOI: http://dx.doi.org/10.1002/gepi.22346
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC7693040
September 2020

Cell type-specific genetic regulation of gene expression across human tissues.

Science 2020 09;369(6509)

Centre for Genomic Regulation (CRG), The Barcelona Institute for Science and Technology, Barcelona, Catalonia, Spain.

The Genotype-Tissue Expression (GTEx) project has identified expression and splicing quantitative trait loci in cis (QTLs) for the majority of genes across a wide range of human tissues. However, the functional characterization of these QTLs has been limited by the heterogeneous cellular composition of GTEx tissue samples. We mapped interactions between computational estimates of cell type abundance and genotype to identify cell type-interaction QTLs for seven cell types and show that cell type-interaction expression QTLs (eQTLs) provide finer resolution to tissue specificity than bulk tissue cis-eQTLs. Analyses of genetic associations with 87 complex traits show a contribution from cell type-interaction QTLs and enable the discovery of hundreds of previously unidentified colocalized loci that are masked in bulk tissue.
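A cell type-interaction eQTL test of the kind described above can be sketched as an ordinary least squares model with a genotype-by-abundance term. The sketch below is a minimal illustration under simplifying assumptions (no additional covariates, normally distributed residuals, simulated data); the function name `interaction_test` is hypothetical and this is not the GTEx analysis pipeline.

```python
import numpy as np


def interaction_test(expression, genotype, cell_abundance):
    """Fit expression ~ 1 + g + c + g*c by OLS.

    Returns the genotype-by-cell-abundance coefficient and its t-statistic.
    """
    y = np.asarray(expression, dtype=float)
    g = np.asarray(genotype, dtype=float)
    c = np.asarray(cell_abundance, dtype=float)
    X = np.column_stack([np.ones_like(g), g, c, g * c])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    dof = X.shape[0] - X.shape[1]
    sigma2 = resid @ resid / dof                  # residual variance estimate
    cov = sigma2 * np.linalg.inv(X.T @ X)         # OLS covariance of coefficients
    return beta[3], beta[3] / np.sqrt(cov[3, 3])


# Simulated example: the eQTL effect grows with the (estimated) cell-type abundance.
rng = np.random.default_rng(7)
n = 500
genotype = rng.binomial(2, 0.3, size=n)           # allele dosage 0/1/2
abundance = rng.uniform(0.0, 1.0, size=n)         # hypothetical abundance estimate
expression = (0.2 * genotype + 0.5 * abundance
              + 1.0 * genotype * abundance + rng.normal(0.0, 0.5, size=n))
beta_gxc, t_gxc = interaction_test(expression, genotype, abundance)
```

A large interaction t-statistic suggests that the genotype effect depends on cell-type abundance; in practice one would also adjust for covariates and correct for the many gene-variant pairs tested.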
DOI: http://dx.doi.org/10.1126/science.aaz8528
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC8051643
September 2020

Determinants of telomere length across human tissues.

Science 2020 09;369(6509)

Department of Public Health Sciences, University of Chicago, Chicago, IL, USA.

Telomere shortening is a hallmark of aging. Telomere length (TL) in blood cells has been studied extensively as a biomarker of human aging and disease; however, little is known regarding variability in TL in nonblood, disease-relevant tissue types. Here, we characterize variability in TLs from 6391 tissue samples, representing >20 tissue types and 952 individuals from the Genotype-Tissue Expression (GTEx) project. We describe differences across tissue types, positive correlation among tissue types, and associations with age and ancestry. We show that genetic variation affects TL in multiple tissue types and that TL may mediate the effect of age on gene expression. Our results provide the foundational knowledge regarding TL in healthy tissues that is needed to interpret epidemiological studies of TL and human health.
DOI: http://dx.doi.org/10.1126/science.aaz6876
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC8108546
September 2020

Transcriptomic signatures across human tissues identify functional rare genetic variation.

Science 2020 09 10;369(6509). Epub 2020 Sep 10.

University of Mississippi Medical Center, Jackson, MS, USA.

Rare genetic variants are abundant across the human genome, and identifying their function and phenotypic impact is a major challenge. Measuring aberrant gene expression has aided in identifying functional, large-effect rare variants (RVs). Here, we expanded detection of genetically driven transcriptome abnormalities by analyzing gene expression, allele-specific expression, and alternative splicing from multitissue RNA-sequencing data, and demonstrate that each signal informs unique classes of RVs. We developed Watershed, a probabilistic model that integrates multiple genomic and transcriptomic signals to predict variant function, validated these predictions in additional cohorts and through experimental assays, and used them to assess RVs in the UK Biobank, the Million Veterans Program, and the Jackson Heart Study. Our results link thousands of RVs to diverse molecular effects and provide evidence to associate RVs affecting the transcriptome with human traits.
DOI: http://dx.doi.org/10.1126/science.aaz5900
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC7646251
September 2020

The impact of sex on gene expression across human tissues.

Science 2020 09;369(6509)

Department of Statistics, University of Chicago, Chicago, IL, USA.

Many complex human phenotypes exhibit sex-differentiated characteristics. However, the molecular mechanisms underlying these differences remain largely unknown. We generated a catalog of sex differences in gene expression and in the genetic regulation of gene expression across 44 human tissue sources surveyed by the Genotype-Tissue Expression project (GTEx, v8 release). We demonstrate that sex influences gene expression levels and cellular composition of tissue samples across the human body. A total of 37% of all genes exhibit sex-biased expression in at least one tissue. We identify cis expression quantitative trait loci (eQTLs) with sex-differentiated effects and characterize their cellular origin. By integrating sex-biased eQTLs with genome-wide association study data, we identify 58 gene-trait associations that are driven by genetic regulation of gene expression in a single sex. These findings provide an extensive characterization of sex differences in the human transcriptome and its genetic regulation.
DOI: http://dx.doi.org/10.1126/science.aba3066
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC8136152
September 2020

Impact of admixture and ancestry on eQTL analysis and GWAS colocalization in GTEx.

Genome Biol 2020 09 11;21(1):233. Epub 2020 Sep 11.

Department of Genetics, Stanford University, Stanford, CA, USA.

Background: Population structure among study subjects may confound genetic association studies, and lack of proper correction can lead to spurious findings. The Genotype-Tissue Expression (GTEx) project largely contains individuals of European ancestry, but the v8 release also includes up to 15% of individuals of non-European ancestry. Assessing ancestry-based adjustments in GTEx improves portability of this research across populations and further characterizes the impact of population structure on GWAS colocalization.

Results: Here, we identify a subset of 117 individuals in GTEx (v8) with a high degree of population admixture and estimate genome-wide local ancestry. We perform genome-wide cis-eQTL mapping using admixed samples in seven tissues, adjusted by either global or local ancestry. Consistent with previous work, we observe improved power with local ancestry adjustment. At loci where the two adjustments produce different lead variants, we observe 31 loci (0.02%) where a significant colocalization is called only with one eQTL ancestry adjustment method. Notably, both adjustments produce similar numbers of significant colocalizations within each of two different colocalization methods, COLOC and FINEMAP. Finally, we identify a small subset of eQTL-associated variants highly correlated with local ancestry, providing a resource to enhance functional follow-up.

Conclusions: We provide a local ancestry map for admixed individuals in the GTEx v8 release and describe the impact of ancestry and admixture on gene expression, eQTLs, and GWAS colocalization. While the majority of the results are concordant between local and global ancestry-based adjustments, we identify distinct advantages and disadvantages to each approach.
DOI: http://dx.doi.org/10.1186/s13059-020-02113-0
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC7488497
September 2020

A vast resource of allelic expression data spanning human tissues.

Genome Biol 2020 09 11;21(1):234. Epub 2020 Sep 11.

New York Genome Center, New York, NY, USA.

Allele expression (AE) analysis robustly measures cis-regulatory effects. Here, we present and demonstrate the utility of a vast AE resource generated from the GTEx v8 release, containing 15,253 samples spanning 54 human tissues for a total of 431 million measurements of AE at the SNP level and 153 million measurements at the haplotype level. In addition, we develop an extension of our tool phASER that allows effect sizes of cis-regulatory variants to be estimated using haplotype-level AE data. This AE resource is the largest to date, and we are able to make haplotype-level data publicly available. We anticipate that the availability of this resource will enable future studies of regulatory variation across human tissues.
DOI: http://dx.doi.org/10.1186/s13059-020-02122-z
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC7488534
September 2020

sn-spMF: matrix factorization informs tissue-specific genetic regulation of gene expression.

Genome Biol 2020 09 11;21(1):235. Epub 2020 Sep 11.

Department of Biomedical Engineering, Johns Hopkins University, Baltimore, 21218, MD, USA.

Genetic regulation of gene expression, revealed by expression quantitative trait loci (eQTLs), exhibits complex patterns of tissue-specific effects. Characterization of these patterns may allow us to better understand mechanisms of gene regulation and disease etiology. We develop a constrained matrix factorization model, sn-spMF, to learn patterns of tissue-sharing and apply it to 49 human tissues from the Genotype-Tissue Expression (GTEx) project. The learned factors reflect tissues with known biological similarity and identify transcription factors that may mediate tissue-specific effects. sn-spMF, available at https://github.com/heyuan7676/ts_eQTLs, can be applied to learn biologically interpretable patterns of eQTL tissue-specificity and generate testable mechanistic hypotheses.
DOI: http://dx.doi.org/10.1186/s13059-020-02129-6
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC7488540
September 2020

The FDA-approved drug Alectinib compromises SARS-CoV-2 nucleocapsid phosphorylation and inhibits viral infection in vitro.

bioRxiv 2020 Dec 16. Epub 2020 Dec 16.

While vaccines are vital for preventing COVID-19 infections, it is critical to develop new therapies to treat patients who become infected. Pharmacological targeting of a host factor required for viral replication can suppress viral spread with a low probability of viral mutation leading to resistance. In particular, host kinases are highly druggable targets and a number of conserved coronavirus proteins, notably the nucleoprotein (N), require phosphorylation for full functionality. In order to understand how targeting kinases could be used to compromise viral replication, we used a combination of phosphoproteomics and bioinformatics as well as genetic and pharmacological kinase inhibition to define the enzymes important for SARS-CoV-2 N protein phosphorylation and viral replication. From these data, we propose a model whereby SRPK1/2 initiates phosphorylation of the N protein, which primes for further phosphorylation by GSK-3a/b and CK1 to achieve extensive phosphorylation of the N protein SR-rich domain. Importantly, we were able to leverage our data to identify an FDA-approved kinase inhibitor, Alectinib, that suppresses N phosphorylation by SRPK1/2 and limits SARS-CoV-2 replication. Together, these data suggest that repurposing or developing novel host-kinase directed therapies may be an efficacious strategy to prevent or treat COVID-19 and other coronavirus-mediated diseases.
DOI: http://dx.doi.org/10.1101/2020.08.14.251207
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC7430567
December 2020

Genomic Profiling of Smoldering Multiple Myeloma Identifies Patients at a High Risk of Disease Progression.

J Clin Oncol 2020 07 22;38(21):2380-2389. Epub 2020 May 22.

Broad Institute of MIT and Harvard, Cambridge, MA.

Purpose: Smoldering multiple myeloma (SMM) is a precursor condition of multiple myeloma (MM) with a 10% annual risk of progression. Various prognostic models exist for risk stratification; however, these are based solely on clinical metrics. The discovery of genomic alterations that underlie disease progression to MM could improve current risk models.

Methods: We used next-generation sequencing to study 214 patients with SMM. We performed whole-exome sequencing on 166 tumors, including 5 with serial samples, and deep targeted sequencing on 48 tumors.

Results: We observed that most of the genetic alterations necessary for progression have already been acquired by the diagnosis of SMM. In particular, we found that alterations of the mitogen-activated protein kinase pathway ( and single nucleotide variants [SNVs]), the DNA repair pathway (deletion 17p, , and SNVs), and (translocations or copy number variations) were all independent risk factors of progression after accounting for clinical risk staging. We validated these findings in an external SMM cohort by showing that patients who have any of these three features have a higher risk of progressing to MM. Moreover, APOBEC-associated mutations were enriched in patients who progressed and were associated with a shorter time to progression in our cohort.

Conclusion: SMM is a genetically mature entity whereby most driver genetic alterations have already occurred, which suggests the existence of a right-skewed model of genetic evolution from monoclonal gammopathy of undetermined significance to MM. We identified and externally validated genomic predictors of progression that could distinguish patients at high risk of progression to MM and, thus, improve on the precision of current clinical models.

Source
http://dx.doi.org/10.1200/JCO.20.00437
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC7367550
July 2020

Transcriptional and Cellular Diversity of the Human Heart.

Circulation 2020 08 14;142(5):466-482. Epub 2020 May 14.

Precision Cardiology Laboratory (N.R.T., M.C., S.J.F., A.W.H., A.-D.A., C.N.H., A.A., I.P., C.R., S.H.C., M.B., C.M.S., P.T.E.), Cambridge, MA.

Background: The human heart requires a complex ensemble of specialized cell types to perform its essential function. A greater knowledge of the intricate cellular milieu of the heart is critical to increase our understanding of cardiac homeostasis and pathology. As recent advances in low-input RNA sequencing have allowed definitions of cellular transcriptomes at single-cell resolution at scale, we have applied these approaches to assess the cellular and transcriptional diversity of the nonfailing human heart.

Methods: Microfluidic encapsulation and barcoding were used to perform single nuclear RNA sequencing with samples from 7 human donors, selected for their absence of overt cardiac disease. Individual nuclear transcriptomes were then clustered based on transcriptional profiles of highly variable genes. These clusters were used as the basis for between-chamber and between-sex differential gene expression analyses and intersection with genetic and pharmacologic data.

Results: We sequenced the transcriptomes of 287 269 single cardiac nuclei, revealing 9 major cell types and 20 subclusters of cell types within the human heart. Cellular subclasses include 2 distinct groups of resident macrophages, 4 endothelial subtypes, and 2 fibroblast subsets. Comparisons of cellular transcriptomes by cardiac chamber or sex reveal diversity not only in cardiomyocyte transcriptional programs but also in subtypes involved in extracellular matrix remodeling and vascularization. Using genetic association data, we identified strong enrichment for the role of cell subtypes in cardiac traits and diseases. Intersection of our data set with genes on cardiac clinical testing panels and the druggable genome reveals striking patterns of cellular specificity.

Conclusions: Using large-scale single nuclei RNA sequencing, we defined the transcriptional and cellular diversity in the normal human heart. Our identification of discrete cell subtypes and differentially expressed genes within the heart will ultimately facilitate the development of new therapeutics for cardiovascular diseases.

Source
http://dx.doi.org/10.1161/CIRCULATIONAHA.119.045401
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC7666104
August 2020

Scaling computational genomics to millions of individuals with GPUs.

Genome Biol 2019 11 1;20(1):228. Epub 2019 Nov 1.

Broad Institute of MIT and Harvard, Cambridge, MA, USA.

Current genomics methods are designed to handle tens to thousands of samples but will need to scale to millions to match the pace of data and hypothesis generation in biomedical science. Here, we show that high efficiency at low cost can be achieved by leveraging general-purpose libraries for computing using graphics processing units (GPUs), such as PyTorch and TensorFlow. We demonstrate > 200-fold decreases in runtime and ~ 5-10-fold reductions in cost relative to CPUs. We anticipate that the accessibility of these libraries will lead to widespread adoption of GPUs in computational genomics.

Source
http://dx.doi.org/10.1186/s13059-019-1836-7
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6823959
November 2019

Liquid versus tissue biopsy for detecting acquired resistance and tumor heterogeneity in gastrointestinal cancers.

Nat Med 2019 09 9;25(9):1415-1421. Epub 2019 Sep 9.

Cancer Center, Massachusetts General Hospital, Boston, MA, USA.

During cancer therapy, tumor heterogeneity can drive the evolution of multiple tumor subclones harboring unique resistance mechanisms in an individual patient. Previous case reports and small case series have suggested that liquid biopsy (specifically, cell-free DNA (cfDNA)) may better capture the heterogeneity of acquired resistance. However, the effectiveness of cfDNA versus standard single-lesion tumor biopsies has not been directly compared in larger-scale prospective cohorts of patients following progression on targeted therapy. Here, in a prospective cohort of 42 patients with molecularly defined gastrointestinal cancers and acquired resistance to targeted therapy, direct comparison of postprogression cfDNA versus tumor biopsy revealed that cfDNA more frequently identified clinically relevant resistance alterations and multiple resistance mechanisms, detecting resistance alterations not found in the matched tumor biopsy in 78% of cases. Whole-exome sequencing of serial cfDNA, tumor biopsies and rapid autopsy specimens elucidated substantial geographic and evolutionary differences across lesions. Our data suggest that acquired resistance is frequently characterized by profound tumor heterogeneity, and that the emergence of multiple resistance alterations in an individual patient may represent the 'rule' rather than the 'exception'. These findings have profound therapeutic implications and highlight the potential advantages of cfDNA over tissue biopsy in the setting of acquired resistance.

Source
http://dx.doi.org/10.1038/s41591-019-0561-9
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6741444
September 2019

RNA sequence analysis reveals macroscopic somatic clonal expansion across normal tissues.

Science 2019 Jun;364(6444)

Broad Institute of MIT and Harvard, Cambridge, MA, USA.

How somatic mutations accumulate in normal cells is poorly understood. A comprehensive analysis of RNA sequencing data from ~6700 samples across 29 normal tissues revealed multiple somatic variants, demonstrating that macroscopic clones can be found in many normal tissues. We found that sun-exposed skin, esophagus, and lung have a higher mutation burden than other tested tissues, which suggests that environmental factors can promote somatic mosaicism. Mutation burden was associated with both age and tissue-specific cell proliferation rate, highlighting that mutations accumulate over both time and number of cell divisions. Finally, normal tissues were found to harbor mutations in known cancer genes and hotspots. This study provides a broad view of macroscopic clonal expansion in human tissues, thus serving as a foundation for associating clonal expansion with environmental factors, aging, and risk of disease.

Source
http://dx.doi.org/10.1126/science.aaw0726
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC7350423
June 2019

Next-generation characterization of the Cancer Cell Line Encyclopedia.

Nature 2019 05 8;569(7757):503-508. Epub 2019 May 8.

Broad Institute of Harvard and MIT, Cambridge, MA, USA.

Large panels of comprehensively characterized human cancer models, including the Cancer Cell Line Encyclopedia (CCLE), have provided a rigorous framework with which to study genetic variants, candidate targets, and small-molecule and biological therapeutics and to identify new marker-driven cancer dependencies. To improve our understanding of the molecular features that contribute to cancer phenotypes, including drug responses, here we have expanded the characterizations of cancer cell lines to include genetic, RNA splicing, DNA methylation, histone H3 modification, microRNA expression and reverse-phase protein array data for 1,072 cell lines from individuals of various lineages and ethnicities. Integration of these data with functional characterizations such as drug-sensitivity, short hairpin RNA knockdown and CRISPR-Cas9 knockout data reveals potential targets for cancer drugs and associated biomarkers. Together, this dataset and an accompanying public data portal provide a resource for the acceleration of cancer research using model cancer cell lines.

Source
http://dx.doi.org/10.1038/s41586-019-1186-3
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6697103
May 2019

Measuring Clathrin-Coated Vesicle Formation with Single-Molecule Resolution.

Methods Mol Biol 2018;1847:197-216

Division of Pharmaceutics and Pharmaceutical Chemistry, College of Pharmacy and the Comprehensive Cancer Center, The Ohio State University, Columbus, OH, USA.

High-resolution fluorescence microscopy is increasingly contributing to our understanding of molecular processes. By utilizing single-molecule intensity information, imaging experiments can be rendered quantitative, yielding insights into the stoichiometry and kinetics of the components of a molecular assembly. Here, we describe the experimental and analytical steps needed to study the assembly of clathrin-coated vesicles with single-molecule resolution, using total internal reflection fluorescence microscopy. Many components of the protocol are broadly applicable to the characterization of other molecular processes.

Source
http://dx.doi.org/10.1007/978-1-4939-8719-1_15
April 2019

Modified penetrance of coding variants by cis-regulatory variation contributes to disease risk.

Nat Genet 2018 09 20;50(9):1327-1334. Epub 2018 Aug 20.

New York Genome Center, New York, NY, USA.

Coding variants represent many of the strongest associations between genotype and phenotype; however, they exhibit inter-individual differences in effect, termed 'variable penetrance'. Here, we study how cis-regulatory variation modifies the penetrance of coding variants. Using functional genomic and genetic data from the Genotype-Tissue Expression Project (GTEx), we observed that in the general population, purifying selection has depleted haplotype combinations predicted to increase pathogenic coding variant penetrance. Conversely, in cancer and autism patients, we observed an enrichment of penetrance-increasing haplotype configurations for pathogenic variants in disease-implicated genes, providing evidence that regulatory haplotype configuration of coding variants affects disease risk. Finally, we experimentally validated this model by editing a Mendelian single-nucleotide polymorphism (SNP) using CRISPR/Cas9 on distinct expression haplotypes with the transcriptome as a phenotypic readout. Our results demonstrate that joint regulatory and coding variant effects are an important part of the genetic architecture of human traits and contribute to modified penetrance of disease-causing variants.

Source
http://dx.doi.org/10.1038/s41588-018-0192-y
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6119105
September 2018
-->