Publications by authors named "Shamil R Sunyaev"

78 Publications

Replicate sequencing libraries are important for quantification of allelic imbalance.

Nat Commun 2021 06 7;12(1):3370. Epub 2021 Jun 7.

Center for Cancer Systems Biology, Dana-Farber Cancer Institute, Harvard Medical School, Boston, USA.

A sensitive approach to quantitative analysis of transcriptional regulation in diploid organisms is analysis of allelic imbalance (AI) in RNA sequencing (RNA-seq) data. A near-universal practice in such studies is to prepare and sequence only one library per RNA sample. We present theoretical and experimental evidence that data from a single RNA-seq library is insufficient for reliable quantification of the contribution of technical noise to the observed AI signal; consequently, reliance on one-replicate experimental design can lead to unaccounted-for variation in error rates in allele-specific analysis. We develop a computational approach, Qllelic, that accurately accounts for technical noise by making use of replicate RNA-seq libraries. Testing on new and existing datasets shows that application of Qllelic greatly decreases false positive rate in allele-specific analysis while conserving appropriate signal, and thus greatly improves reproducibility of AI estimates. We explore sources of technical overdispersion in observed AI signal and conclude by discussing design of RNA-seq studies addressing two biologically important questions: quantification of transcriptome-wide AI in one sample, and differential analysis of allele-specific expression between samples.
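
The role of replicate libraries described above can be illustrated with a rough sketch. It is not the Qllelic method; it only compares allelic imbalance (AI) between two technical replicates against the spread expected from binomial sampling alone. The function name, the input layout (per-gene reference and alternative allele counts), and the example counts are assumptions made for illustration.

```python
def ai_overdispersion(rep1, rep2):
    """Mean squared z-score of the between-replicate difference in AI.

    rep1, rep2: per-gene (ref_count, alt_count) pairs from two technical
    replicates (hypothetical input layout). Under pure binomial sampling the
    result is expected to be close to 1; values well above 1 point to
    additional technical noise.
    """
    z_squared = []
    for (r1, a1), (r2, a2) in zip(rep1, rep2):
        n1, n2 = r1 + a1, r2 + a2
        if n1 == 0 or n2 == 0:
            continue  # no allelic coverage in one replicate
        pooled = (r1 + r2) / (n1 + n2)  # pooled AI estimate
        # Binomial variance of the difference between the two AI estimates
        var = pooled * (1 - pooled) * (1 / n1 + 1 / n2)
        if var == 0:
            continue  # fully monoallelic counts carry no binomial variance
        diff = r1 / n1 - r2 / n2
        z_squared.append(diff * diff / var)
    if not z_squared:
        raise ValueError("no informative genes")
    return sum(z_squared) / len(z_squared)


if __name__ == "__main__":
    # Made-up counts for three genes in two replicate libraries
    replicate_1 = [(60, 40), (120, 80), (15, 35)]
    replicate_2 = [(40, 60), (110, 90), (20, 30)]
    print(ai_overdispersion(replicate_1, replicate_2))
```

With a single library there is no second set of counts to compare against, so a ratio of this kind cannot be computed at all, which is the design point the abstract makes.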

Source
DOI: http://dx.doi.org/10.1038/s41467-021-23544-8
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC8184992
June 2021

Purifying selection on noncoding deletions of human regulatory loci detected using their cellular pleiotropy.

Genome Res 2021 Jun 7;31(6):935-946. Epub 2021 May 7.

Department of Biomedical Informatics, Harvard Medical School, Boston, Massachusetts 02115, USA.

Genomic deletions provide a powerful loss-of-function model in noncoding regions to assess the role of purifying selection on genetic variation. Regulatory element function is characterized by nonuniform tissue and cell type activity, necessarily linking the study of fitness consequences from regulatory variants to their corresponding cellular activity. We generated a callset of deletions from genomes in the Alzheimer's Disease Neuroimaging Initiative (ADNI) and used deletions from The 1000 Genomes Project Consortium (1000GP) in order to examine whether purifying selection preserves noncoding sites of chromatin accessibility marked by DNase I hypersensitivity (DHS), histone modification (enhancer, transcribed, Polycomb-repressed, heterochromatin), and chromatin loop anchors. To examine this in a cellular activity-aware manner, we developed a statistical method, pleiotropy ratio score (PlyRS), which calculates a correlation-adjusted count of "cellular pleiotropy" for each noncoding base pair by analyzing shared regulatory annotations across tissues and cell types. By comparing real deletion PlyRS values to simulations in a length-matched framework and by using genomic covariates in analyses, we found that purifying selection acts to preserve both DHS and enhancer noncoding sites. However, we did not find evidence of purifying selection for noncoding transcribed, Polycomb-repressed, or heterochromatin sites beyond that of the noncoding background. Additionally, we found evidence that purifying selection is acting on chromatin loop integrity by preserving colocalized CTCF binding sites. At regions of DHS, enhancer, and CTCF within chromatin loop anchors, we found evidence that both sites of activity specific to a particular tissue or cell type and sites of cellularly pleiotropic activity are preserved by selection.

Source
DOI: http://dx.doi.org/10.1101/gr.275263.121
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC8168579
June 2021

Shared associations identify causal relationships between gene expression and immune cell phenotypes.

Commun Biol 2021 03 4;4(1):279. Epub 2021 Mar 4.

Department of Neurology, Yale School of Medicine, New Haven, CT, USA.

Genetic mapping studies have identified thousands of associations between common variants and hundreds of human traits. Translating these associations into mechanisms is complicated by two factors: they fall into gene regulatory regions; and they are rarely mapped to one causal variant. One way around these limitations is to find groups of traits that share associations, using this genetic link to infer a biological connection. Here, we assess how many trait associations in the same locus are due to the same genetic variant, and thus shared; and if these shared associations are due to causal relationships between traits. We find that only a subset of traits share associations, with many due to causal relationships rather than pleiotropy. We therefore suggest that simply observing overlapping associations at a genetic locus is insufficient to infer causality; direct evidence of shared associations is required to support mechanistic hypotheses in genetic studies of complex traits.

Source
DOI: http://dx.doi.org/10.1038/s42003-021-01823-w
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC7933159
March 2021

Population-specific causal disease effect sizes in functionally important regions impacted by selection.

Nat Commun 2021 02 17;12(1):1098. Epub 2021 Feb 17.

Department of Epidemiology, Harvard T.H. Chan School of Public Health, Boston, MA, USA.

Many diseases exhibit population-specific causal effect sizes with trans-ethnic genetic correlations significantly less than 1, limiting trans-ethnic polygenic risk prediction. We develop a new method, S-LDXR, for stratifying squared trans-ethnic genetic correlation across genomic annotations, and apply S-LDXR to genome-wide summary statistics for 31 diseases and complex traits in East Asians (average N = 90K) and Europeans (average N = 267K) with an average trans-ethnic genetic correlation of 0.85. We determine that squared trans-ethnic genetic correlation is 0.82× (s.e. 0.01) depleted in the top quintile of background selection statistic, implying more population-specific causal effect sizes. Accordingly, causal effect sizes are more population-specific in functionally important regions, including conserved and regulatory regions. In regions surrounding specifically expressed genes, causal effect sizes are most population-specific for skin and immune genes, and least population-specific for brain genes. Our results could potentially be explained by stronger gene-environment interaction at loci impacted by selection, particularly positive selection.

Source
DOI: http://dx.doi.org/10.1038/s41467-021-21286-1
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC7889654
February 2021

Commonalities across computational workflows for uncovering explanatory variants in undiagnosed cases.

Genet Med 2021 06 12;23(6):1075-1085. Epub 2021 Feb 12.

Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA.

Purpose: Genomic sequencing has become an increasingly powerful and relevant tool to be leveraged for the discovery of genetic aberrations underlying rare, Mendelian conditions. Although the computational tools incorporated into diagnostic workflows for this task are continually evolving and improving, we nevertheless sought to investigate commonalities across sequencing processing workflows to reveal consensus and standard practice tools and highlight exploratory analyses where technical and theoretical method improvements would be most impactful.

Methods: We collected details regarding the computational approaches used by a genetic testing laboratory and 11 clinical research sites in the United States participating in the Undiagnosed Diseases Network via meetings with bioinformaticians, online survey forms, and analyses of internal protocols.

Results: We found that tools for processing genomic sequencing data can be grouped into four distinct categories. Whereas well-established practices exist for initial variant calling and quality control steps, there is substantial divergence across sites in later stages for variant prioritization and multimodal data integration, demonstrating a diversity of approaches for solving the most mysterious undiagnosed cases.

Conclusion: The largest differences across diagnostic workflows suggest that advances in structural variant detection, noncoding variant interpretation, and integration of additional biomedical data may be especially promising for solving chronically undiagnosed cases.

Source
DOI: http://dx.doi.org/10.1038/s41436-020-01084-8
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC8187147
June 2021

Evidence for secondary-variant genetic burden and non-random distribution across biological modules in a recessive ciliopathy.

Nat Genet 2020 11 12;52(11):1145-1150. Epub 2020 Oct 12.

Center for Human Disease Modeling, Duke University Medical Center, Durham, NC, USA.

The influence of genetic background on driver mutations is well established; however, the mechanisms by which the background interacts with Mendelian loci remain unclear. We performed a systematic secondary-variant burden analysis of two independent cohorts of patients with Bardet-Biedl syndrome (BBS) with known recessive biallelic pathogenic mutations in one of 17 BBS genes for each individual. We observed a significant enrichment of trans-acting rare nonsynonymous secondary variants in patients with BBS compared with either population controls or a cohort of individuals with a non-BBS diagnosis and recessive variants in the same gene set. Strikingly, we found a significant over-representation of secondary alleles in chaperonin-encoding genes, a finding corroborated by the observation of epistatic interactions involving this complex in vivo. These data indicate a complex genetic architecture for BBS that informs the biological properties of disease modules and presents a model for secondary-variant burden analysis in recessive disorders.

Source
DOI: http://dx.doi.org/10.1038/s41588-020-0707-1
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC8272915
November 2020

Dynamic incorporation of multiple in silico functional annotations empowers rare variant association analysis of large whole-genome sequencing studies at scale.

Nat Genet 2020 09 24;52(9):969-983. Epub 2020 Aug 24.

Department of Data Sciences, Dana-Farber Cancer Institute and Harvard T.H. Chan School of Public Health, Boston, MA, USA.

Large-scale whole-genome sequencing studies have enabled the analysis of rare variants (RVs) associated with complex phenotypes. Commonly used RV association tests have limited scope to leverage variant functions. We propose STAAR (variant-set test for association using annotation information), a scalable and powerful RV association test method that effectively incorporates both variant categories and multiple complementary annotations using a dynamic weighting scheme. For the latter, we introduce 'annotation principal components', multidimensional summaries of in silico variant annotations. STAAR accounts for population structure and relatedness and is scalable for analyzing very large cohort and biobank whole-genome sequencing studies of continuous and dichotomous traits. We applied STAAR to identify RVs associated with four lipid traits in 12,316 discovery and 17,822 replication samples from the Trans-Omics for Precision Medicine Program. We discovered and replicated new RV associations, including disruptive missense RVs of NPC1L1 and an intergenic region near APOC1P1 associated with low-density lipoprotein cholesterol.

Source
DOI: http://dx.doi.org/10.1038/s41588-020-0676-4
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC7483769
September 2020

Non-parametric Polygenic Risk Prediction via Partitioned GWAS Summary Statistics.

Am J Hum Genet 2020 07 28;107(1):46-59. Epub 2020 May 28.

Division of Genetics, Brigham and Women's Hospital, Boston, MA 02115, USA; Department of Biomedical Informatics, Harvard Medical School, Boston, MA 02115, USA; Broad Institute of Harvard and MIT, Cambridge, MA 02142, USA; Altius Institute for Biomedical Sciences, Seattle, WA 98121, USA.

In complex trait genetics, the ability to predict phenotype from genotype is the ultimate measure of our understanding of genetic architecture underlying the heritability of a trait. A complete understanding of the genetic basis of a trait should allow for predictive methods with accuracies approaching the trait's heritability. The highly polygenic nature of quantitative traits and most common phenotypes has motivated the development of statistical strategies focused on combining myriad individually non-significant genetic effects. Now that predictive accuracies are improving, there is a growing interest in the practical utility of such methods for predicting risk of common diseases responsive to early therapeutic intervention. However, existing methods require individual-level genotypes or depend on accurately specifying the genetic architecture underlying each disease to be predicted. Here, we propose a polygenic risk prediction method that does not require explicitly modeling any underlying genetic architecture. We start with summary statistics in the form of SNP effect sizes from a large GWAS cohort. We then remove the correlation structure across summary statistics arising due to linkage disequilibrium and apply a piecewise linear interpolation on conditional mean effects. In both simulated and real datasets, this new non-parametric shrinkage (NPS) method can reliably allow for linkage disequilibrium in summary statistics of 5 million dense genome-wide markers and consistently improves prediction accuracy. We show that NPS improves the identification of groups at high risk for breast cancer, type 2 diabetes, inflammatory bowel disease, and coronary heart disease, all of which have available early intervention or prevention treatments.

Source
DOI: http://dx.doi.org/10.1016/j.ajhg.2020.05.004
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC7332650
July 2020

Identification of cancer driver genes based on nucleotide context.

Nat Genet 2020 02 3;52(2):208-218. Epub 2020 Feb 3.

Division of Genetics, Brigham and Women's Hospital, Harvard Medical School, Boston, MA, USA.

Cancer genomes contain large numbers of somatic mutations but few of these mutations drive tumor development. Current approaches either identify driver genes on the basis of mutational recurrence or approximate the functional consequences of nonsynonymous mutations by using bioinformatic scores. Passenger mutations are enriched in characteristic nucleotide contexts, whereas driver mutations occur in functional positions, which are not necessarily surrounded by a particular nucleotide context. We observed that mutations in contexts that deviate from the characteristic contexts around passenger mutations provide a signal in favor of driver genes. We therefore developed a method that combines this feature with the signals traditionally used for driver-gene identification. We applied our method to whole-exome sequencing data from 11,873 tumor-normal pairs and identified 460 driver genes that clustered into 21 cancer-related pathways. Our study provides a resource of driver genes across 28 tumor types with additional driver genes identified according to mutations in unusual nucleotide contexts.

Source
DOI: http://dx.doi.org/10.1038/s41588-019-0572-y
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC7031046
February 2020

Mutations in RABL3 alter KRAS prenylation and are associated with hereditary pancreatic cancer.

Nat Genet 2019 09 12;51(9):1308-1314. Epub 2019 Aug 12.

Weill Cornell Medical College and New York Presbyterian Hospital, New York, NY, USA.

Pancreatic ductal adenocarcinoma is an aggressive cancer with limited treatment options. Approximately 10% of cases exhibit familial predisposition, but causative genes are not known in most families. We perform whole-genome sequence analysis in a family with multiple cases of pancreatic ductal adenocarcinoma and identify a germline truncating mutation in the member of the RAS oncogene family-like 3 (RABL3) gene. Heterozygous rabl3 mutant zebrafish show increased susceptibility to cancer formation. Transcriptomic and mass spectrometry approaches implicate RABL3 in RAS pathway regulation and identify an interaction with RAP1GDS1 (SmgGDS), a chaperone regulating prenylation of RAS GTPases. Indeed, the truncated mutant RABL3 protein accelerates KRAS prenylation and requires RAS proteins to promote cell proliferation. Finally, evidence in patient cohorts with developmental disorders implicates germline RABL3 mutations in RASopathy syndromes. Our studies identify RABL3 mutations as a target for genetic testing in cancer families and uncover a mechanism for dysregulated RAS activity in development and cancer.

Source
DOI: http://dx.doi.org/10.1038/s41588-019-0475-y
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC7159804
September 2019

Applicability of the Mutation-Selection Balance Model to Population Genetics of Heterozygous Protein-Truncating Variants in Humans.

Mol Biol Evol 2019 08;36(8):1701-1710

Division of Genetics, Brigham and Women's Hospital, Harvard Medical School, Boston, MA.

The fate of alleles in the human population is believed to be highly affected by the stochastic force of genetic drift. Estimation of the strength of natural selection in humans generally necessitates a careful modeling of drift including complex effects of the population history and structure. Protein-truncating variants (PTVs) are expected to evolve under strong purifying selection and to have a relatively high per-gene mutation rate. Thus, it is appealing to model the population genetics of PTVs under a simple deterministic mutation-selection balance, as has been proposed earlier (Cassa et al. 2017). Here, we investigated the limits of this approximation using both computer simulations and data-driven approaches. Our simulations rely on a model of demographic history estimated from 33,370 individual exomes of the Non-Finnish European subset of the ExAC data set (Lek et al. 2016). Additionally, we compared the African and European subset of the ExAC study and analyzed de novo PTVs. We show that the mutation-selection balance model is applicable to the majority of human genes, but not to genes under the weakest selection.
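
For orientation, the deterministic mutation-selection balance mentioned above can be sketched in a few lines. Under the standard textbook approximation for rare deleterious alleles selected against in heterozygotes, the cumulative allele frequency approaches u / s_het, where u is the per-gene PTV mutation rate and s_het the selection coefficient against heterozygotes. The function names, the linearized recursion, and the parameter values below are assumptions for illustration; the study's own simulations additionally model drift and demographic history, which this sketch ignores.

```python
def equilibrium_ptv_frequency(u, s_het):
    """Deterministic equilibrium frequency q = u / s_het (rare-allele approximation)."""
    if s_het <= 0:
        raise ValueError("s_het must be positive")
    return u / s_het


def iterate_frequency(u, s_het, generations, q0=0.0):
    """Iterate the linearized recursion q' = q * (1 - s_het) + u.

    Selection removes a fraction s_het of existing copies each generation,
    and mutation adds new copies at rate u. No drift is modeled.
    """
    q = q0
    for _ in range(generations):
        q = q * (1.0 - s_het) + u
    return q


if __name__ == "__main__":
    u, s_het = 1e-6, 0.05  # hypothetical values
    print(equilibrium_ptv_frequency(u, s_het))
    print(iterate_frequency(u, s_het, generations=2000))
```

The recursion approaches equilibrium at a rate set by s_het, so for weakly selected genes the deterministic value is reached slowly and drift has more room to act, consistent with the abstract's conclusion that the approximation fails for genes under the weakest selection.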

Source
DOI: http://dx.doi.org/10.1093/molbev/msz092
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6738481
August 2019

Associations of variants in the hexokinase 1 and interleukin 18 receptor regions with oxyhemoglobin saturation during sleep.

PLoS Genet 2019 04 16;15(4):e1007739. Epub 2019 Apr 16.

Division of Sleep and Circadian Disorders, Brigham and Women's Hospital, Boston, MA, United States of America.

Sleep disordered breathing (SDB)-related overnight hypoxemia is associated with cardiometabolic disease and other comorbidities. Understanding the genetic bases for variations in nocturnal hypoxemia may help understand mechanisms influencing oxygenation and SDB-related mortality. We conducted genome-wide association tests across 10 cohorts and 4 populations to identify genetic variants associated with three correlated measures of overnight oxyhemoglobin saturation: average and minimum oxyhemoglobin saturation during sleep and the percent of sleep with oxyhemoglobin saturation under 90%. The discovery sample consisted of 8,326 individuals. Variants with p < 1 × 10^-6 were analyzed in a replication group of 14,410 individuals. We identified 3 significantly associated regions, including 2 regions in multi-ethnic analyses (2q12, 10q22). SNPs in the 2q12 region were associated with minimum SpO2 (rs78136548 p = 2.70 × 10^-10). SNPs at 10q22 were associated with all three traits including average SpO2 (rs72805692 p = 4.58 × 10^-8). SNPs in both regions were associated in over 20,000 individuals and are supported by prior associations or functional evidence. Four additional significant regions were detected in secondary sex-stratified and combined discovery and replication analyses, including a region overlapping Reelin, a known marker of respiratory complex neurons. These are the first genome-wide significant findings reported for oxyhemoglobin saturation during sleep, a phenotype of high clinical interest. Our replicated associations with HK1 and IL18R1 suggest that variants in inflammatory pathways, such as the biologically-plausible NLRP3 inflammasome, may contribute to nocturnal hypoxemia.

Source
DOI: http://dx.doi.org/10.1371/journal.pgen.1007739
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6467367
April 2019

Polygenic adaptation on height is overestimated due to uncorrected stratification in genome-wide association studies.

Elife 2019 03 21;8. Epub 2019 Mar 21.

Department of Biomedical Informatics, Harvard Medical School, Boston, United States.

Genetic predictions of height differ among human populations and these differences have been interpreted as evidence of polygenic adaptation. These differences were first detected using SNPs genome-wide significantly associated with height, and shown to grow stronger when large numbers of sub-significant SNPs were included, leading to excitement about the prospect of analyzing large fractions of the genome to detect polygenic adaptation for multiple traits. Previous studies of height have been based on SNP effect size measurements in the GIANT Consortium meta-analysis. Here we repeat the analyses in the UK Biobank, a much more homogeneously designed study. We show that polygenic adaptation signals based on large numbers of SNPs below genome-wide significance are extremely sensitive to biases due to uncorrected population stratification. More generally, our results imply that typical constructions of polygenic scores are sensitive to population stratification and that population-level differences should be interpreted with caution.

Editorial Note: This article has been through an editorial process in which the authors decide how to respond to the issues raised during peer review. The Reviewing Editor's assessment is that all the issues have been addressed (see decision letter).

Source
DOI: http://dx.doi.org/10.7554/eLife.39702
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6428571
March 2019

Quantification of frequency-dependent genetic architectures in 25 UK Biobank traits reveals action of negative selection.

Nat Commun 2019 02 15;10(1):790. Epub 2019 Feb 15.

Department of Epidemiology, Harvard T.H. Chan School of Public Health, Boston, 02115, MA, USA.

Understanding the role of rare variants is important in elucidating the genetic basis of human disease. Negative selection can cause rare variants to have larger per-allele effect sizes than common variants. Here, we develop a method to estimate the minor allele frequency (MAF) dependence of SNP effect sizes. We use a model in which per-allele effect sizes have variance proportional to [p(1 - p)]^α, where p is the MAF and negative values of α imply larger effect sizes for rare variants. We estimate α for 25 UK Biobank diseases and complex traits. All traits produce negative α estimates, with best-fit mean of -0.38 (s.e. 0.02) across traits. Despite larger rare variant effect sizes, rare variants (MAF < 1%) explain less than 10% of total SNP-heritability for most traits analyzed. Using evolutionary modeling and forward simulations, we validate the α model of MAF-dependent trait effects and assess plausible values of relevant evolutionary parameters.
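
To make the α model concrete, the sketch below evaluates the per-allele effect-size variance, taken as proportional to [p(1 - p)]^α, together with the resulting expected variance explained per SNP, 2p(1 - p) times that variance, at a few MAFs. The function names and the proportionality constant (set to 1) are assumptions for illustration, and the code does not reproduce the paper's estimation procedure; only α = -0.38 comes from the abstract.

```python
def per_allele_effect_variance(maf, alpha, scale=1.0):
    """Per-allele effect-size variance under the alpha model: scale * [p(1 - p)]^alpha."""
    if not 0.0 < maf <= 0.5:
        raise ValueError("maf must be in (0, 0.5]")
    return scale * (maf * (1.0 - maf)) ** alpha


def expected_variance_explained(maf, alpha, scale=1.0):
    """Expected trait variance explained by one SNP: 2p(1 - p) * per-allele variance."""
    return 2.0 * maf * (1.0 - maf) * per_allele_effect_variance(maf, alpha, scale)


if __name__ == "__main__":
    alpha = -0.38  # best-fit mean across traits reported in the abstract
    for maf in (0.001, 0.01, 0.1, 0.5):
        print(
            maf,
            per_allele_effect_variance(maf, alpha),
            expected_variance_explained(maf, alpha),
        )
```

With a negative α greater than -1, rarer variants get larger per-allele effects, yet each one still explains less trait variance than a common variant, because the factor 2p(1 - p) shrinks faster than the effect variance grows.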

Source
DOI: http://dx.doi.org/10.1038/s41467-019-08424-6
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6377669
February 2019

Error-prone bypass of DNA lesions during lagging-strand replication is a common source of germline and cancer mutations.

Nat Genet 2019 01 3;51(1):36-41. Epub 2018 Dec 3.

Division of Genetics, Brigham and Women's Hospital, Harvard Medical School, Boston, MA, USA.

Studies in experimental systems have identified a multitude of mutational mechanisms including DNA replication infidelity and DNA damage followed by inefficient repair or replicative bypass. However, the relative contributions of these mechanisms to human germline mutation remain unknown. Here, we show that error-prone damage bypass on the lagging strand plays a major role in human mutagenesis. Transcription-coupled DNA repair removes lesions on the transcribed strand; lesions on the non-transcribed strand are preferentially converted into mutations. In human polymorphism we detect a striking similarity between mutation types predominant on the non-transcribed strand and on the strand lagging during replication. Moreover, damage-induced mutations in cancers accumulate asymmetrically with respect to the direction of replication, suggesting that DNA lesions are resolved asymmetrically. We experimentally demonstrate that replication delay greatly attenuates the mutagenic effect of ultraviolet irradiation, confirming that replication converts DNA damage into mutations. We estimate that at least 10% of human mutations arise due to DNA damage.

Source
DOI: http://dx.doi.org/10.1038/s41588-018-0285-7
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6317876
January 2019

Admixture mapping identifies novel loci for obstructive sleep apnea in Hispanic/Latino Americans.

Hum Mol Genet 2019 02;28(4):675-687

Physiology and Biophysics, University of Mississippi, Jackson, MS, USA.

Obstructive sleep apnea (OSA) is a common disorder associated with increased risk of cardiovascular disease and mortality. Its prevalence and severity vary across ancestral background. Although OSA traits are heritable, few genetic associations have been identified. To identify genetic regions associated with OSA and improve statistical power, we applied admixture mapping on three primary OSA traits [the apnea hypopnea index (AHI), overnight average oxyhemoglobin saturation (SaO2) and percentage time SaO2 < 90%] and a secondary trait (respiratory event duration) in a Hispanic/Latino American population study of 11 575 individuals with significant variation in ancestral background. Linear mixed models were performed using previously inferred African, European and Amerindian local genetic ancestry markers. Global African ancestry was associated with a lower AHI, higher SaO2 and shorter event duration. Admixture mapping analysis of the primary OSA traits identified local African ancestry at the chromosomal region 2q37 as genome-wide significantly associated with AHI (P < 5.7 × 10^-5), and European and Amerindian ancestries at 18q21 suggestively associated with both AHI and percentage time SaO2 < 90% (P < 10^-3). Follow-up joint ancestry-SNP association analyses identified novel variants in ferrochelatase (FECH), significantly associated with AHI and percentage time SaO2 < 90% after adjusting for multiple tests (P < 8 × 10^-6). These signals contributed to the admixture mapping associations and were replicated in independent cohorts. In this first admixture mapping study of OSA, novel associations with variants in the iron/heme metabolism pathway suggest a role for iron in influencing respiratory traits underlying OSA.

Source
DOI: http://dx.doi.org/10.1093/hmg/ddy387
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6360325
February 2019

PINES: phenotype-informed tissue weighting improves prediction of pathogenic noncoding variants.

Genome Biol 2018 10 25;19(1):173. Epub 2018 Oct 25.

Division of Genetics, Department of Medicine, Brigham and Women's Hospital and Harvard Medical School, Boston, MA, USA.

Functional characterization of the noncoding genome is essential for biological understanding of gene regulation and disease. Here, we introduce the computational framework PINES (Phenotype-Informed Noncoding Element Scoring), which predicts the functional impact of noncoding variants by integrating epigenetic annotations in a phenotype-dependent manner. PINES enables analyses to be customized towards genomic annotations from cell types of the highest relevance given the phenotype of interest. We illustrate that PINES identifies functional noncoding variation more accurately than methods that do not use phenotype-weighted knowledge, while at the same time being flexible and easy to use via a dedicated web portal.

Source
DOI: http://dx.doi.org/10.1186/s13059-018-1546-6
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6203199
October 2018

An integrated clinical program and crowdsourcing strategy for genomic sequencing and Mendelian disease gene discovery.

NPJ Genom Med 2018 08 13;3:21. Epub 2018 Aug 13.

Endocrine Unit, Massachusetts General Hospital and Harvard Medical School, Boston, MA 02114, USA.

Despite major progress in defining the genetic basis of Mendelian disorders, the molecular etiology of many cases remains unknown. Patients with these undiagnosed disorders often have complex presentations and require treatment by multiple health care specialists. Here, we describe an integrated clinical diagnostic and research program using whole-exome and whole-genome sequencing (WES/WGS) for Mendelian disease gene discovery. This program employs specific case ascertainment parameters, a WES/WGS computational analysis pipeline that is optimized for Mendelian disease gene discovery with variant callers tuned to specific inheritance modes, an interdisciplinary crowdsourcing strategy for genomic sequence analysis, matchmaking for additional cases, and integration of the findings regarding gene causality with the clinical management plan. The interdisciplinary gene discovery team includes clinical, computational, and experimental biomedical specialists who interact to identify the genetic etiology of the disease, and when so warranted, to devise improved or novel treatments for affected patients. This program effectively integrates the clinical and research missions of an academic medical center and affords both diagnostic and therapeutic options for patients suffering from genetic disease. It may therefore be germane to other academic medical institutions engaged in implementing genomic medicine programs.

Source
DOI: http://dx.doi.org/10.1038/s41525-018-0060-9
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6089983
August 2018

Multiethnic Meta-Analysis Identifies RAI1 as a Possible Obstructive Sleep Apnea-related Quantitative Trait Locus in Men.

Am J Respir Cell Mol Biol 2018 03;58(3):391-401

School of Public Health, University of Adelaide, Adelaide, South Australia, Australia.

Obstructive sleep apnea (OSA) is a common heritable disorder displaying marked sexual dimorphism in disease prevalence and progression. Previous genetic association studies have identified a few genetic loci associated with OSA and related quantitative traits, but they have only focused on single ethnic groups, and a large proportion of the heritability remains unexplained. The apnea-hypopnea index (AHI) is a commonly used quantitative measure characterizing OSA severity. Because OSA differs by sex, and the pathophysiology of obstructive events differ in rapid eye movement (REM) and non-REM (NREM) sleep, we hypothesized that additional genetic association signals would be identified by analyzing the NREM/REM-specific AHI and by conducting sex-specific analyses in multiethnic samples. We performed genome-wide association tests for up to 19,733 participants of African, Asian, European, and Hispanic/Latino American ancestry in 7 studies. We identified rs12936587 on chromosome 17 as a possible quantitative trait locus for NREM AHI in men (N = 6,737; P = 1.7 × 10) but not in women (P = 0.77). The association with NREM AHI was replicated in a physiological research study (N = 67; P = 0.047). This locus overlapping the RAI1 gene and encompassing genes PEMT1, SREBF1, and RASD1 was previously reported to be associated with coronary artery disease, lipid metabolism, and implicated in Potocki-Lupski syndrome and Smith-Magenis syndrome, which are characterized by abnormal sleep phenotypes. We also identified gene-by-sex interactions in suggestive association regions, suggesting that genetic variants for AHI appear to vary by sex, consistent with the clinical observations of strong sexual dimorphism.

Source
DOI: http://dx.doi.org/10.1165/rcmb.2017-0237OC
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC5854957
March 2018

Identifying DNase I hypersensitive sites as driver distal regulatory elements in breast cancer.

Nat Commun 2017 09 5;8(1):436. Epub 2017 Sep 5.

Moores Cancer Center, University of California, La Jolla, San Diego, CA, 92093, USA.

Efforts to identify driver mutations in cancer have largely focused on genes, whereas non-coding sequences remain relatively unexplored. Here we develop a statistical method based on characteristics known to influence local mutation rate and a series of enrichment filters in order to identify distal regulatory elements harboring putative driver mutations in breast cancer. We identify ten DNase I hypersensitive sites that are significantly mutated in breast cancers and associated with the aberrant expression of neighboring genes. A pan-cancer analysis shows that three of these elements are significantly mutated across multiple cancer types and have mutation densities similar to protein-coding driver genes. Functional characterization of the most highly mutated DNase I hypersensitive sites in breast cancer (using in silico and experimental approaches) confirms that they are regulatory elements and affect the expression of cancer genes. Our study suggests that mutations of regulatory elements in tumors likely play an important role in cancer development.

Cancer driver mutations can occur within noncoding genomic sequences. Here, the authors develop a statistical approach to identify candidate noncoding driver mutations in DNase I hypersensitive sites in breast cancer and experimentally demonstrate they are regulatory elements of known cancer genes.

Source
DOI: http://dx.doi.org/10.1038/s41467-017-00100-x
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC5585396
September 2017

Negative selection in humans and fruit flies involves synergistic epistasis.

Science 2017 05;356(6337):539-542

Negative selection against deleterious alleles produced by mutation influences within-population variation as the most pervasive form of natural selection. However, it is not known whether deleterious alleles affect fitness independently, so that cumulative fitness loss depends exponentially on the number of deleterious alleles, or synergistically, so that each additional deleterious allele results in a larger decrease in relative fitness. Negative selection with synergistic epistasis should produce negative linkage disequilibrium between deleterious alleles and, therefore, an underdispersed distribution of the number of deleterious alleles in the genome. Indeed, we detected underdispersion of the number of rare loss-of-function alleles in eight independent data sets from human and fly populations. Thus, selection against rare protein-disrupting alleles is characterized by synergistic epistasis, which may explain how human and fly populations persist despite high genomic mutation rates.

Source
DOI: http://dx.doi.org/10.1126/science.aah5238
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6200135
May 2017

Estimating the selective effects of heterozygous protein-truncating variants from human exome data.

Nat Genet 2017 May 3;49(5):806-810. Epub 2017 Apr 3.

Division of Genetics, Brigham and Women's Hospital, Harvard Medical School, Boston, Massachusetts, USA.

The evolutionary cost of gene loss is a central question in genetics and has been investigated in model organisms and human cell lines. In humans, tolerance of the loss of one or both functional copies of a gene is related to the gene's causal role in disease. However, estimates of the selection and dominance coefficients in humans have been elusive. Here we analyze exome sequence data from 60,706 individuals to make genome-wide estimates of selection against heterozygous loss of gene function. Using this distribution of selection coefficients for heterozygous protein-truncating variants (PTVs), we provide corresponding Bayesian estimates for individual genes. We find that genes under the strongest selection are enriched in embryonic lethal mouse knockouts, Mendelian disease-associated genes, and regulators of transcription. Screening by essentiality, we find a large set of genes under strong selection that are likely to have crucial functions but have not yet been thoroughly characterized.

Source
DOI: http://dx.doi.org/10.1038/ng.3831
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC5618255
May 2017

Limited statistical evidence for shared genetic effects of eQTLs and autoimmune-disease-associated loci in three major immune-cell types.

Nat Genet 2017 Apr 20;49(4):600-605. Epub 2017 Feb 20.

Broad Institute of Harvard and MIT, Cambridge, Massachusetts, USA.

Most autoimmune-disease-risk effects identified by genome-wide association studies (GWAS) localize to open chromatin with gene-regulatory activity. GWAS loci are also enriched in expression quantitative trait loci (eQTLs), thus suggesting that most risk variants alter gene expression. However, because causal variants are difficult to identify, and cis-eQTLs occur frequently, it remains challenging to identify specific instances of disease-relevant changes to gene regulation. Here, we used a novel joint likelihood framework with higher resolution than that of previous methods to identify loci where autoimmune-disease risk and an eQTL are driven by a single shared genetic effect. Using eQTLs from three major immune subpopulations, we found shared effects in only ∼25% of the loci examined. Thus, we show that a fraction of gene-regulatory changes suggest strong mechanistic hypotheses for disease risk, but we conclude that most risk mechanisms are not likely to involve changes in basal gene expression.

Source
DOI: http://dx.doi.org/10.1038/ng.3795
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC5374036
April 2017

Variants in angiopoietin-2 (ANGPT2) contribute to variation in nocturnal oxyhaemoglobin saturation level.

Hum Mol Genet 2016 12;25(23):5244-5253

Department of Epidemiology and Biostatistics, Case Western Reserve University, Cleveland, OH, USA.

Genetic determinants of sleep-disordered breathing (SDB), a common set of disorders that contribute to significant cardiovascular and neuropsychiatric morbidity, are not clear. Overnight nocturnal oxygen saturation (SaO2) is a clinically relevant and easily measured indicator of SDB severity but its genetic contribution has never been studied. Our recent study suggests nocturnal SaO2 is heritable. We performed linkage analysis, association analysis and haplotype analysis of average nocturnal oxyhaemoglobin saturation in participants in the Cleveland Family Study (CFS), followed by gene-based association and additional tests in four independent samples. Linkage analysis identified a peak (LOD = 4.29) on chromosome 8p23. Follow-up association analysis identified two haplotypes in angiopoietin-2 (ANGPT2) that significantly contributed to the variation of SaO2 (P = 8 × 10(-5)) and accounted for a portion of the linkage evidence. Gene-based association analysis replicated the association of ANGPT2 and nocturnal SaO2. A rare missense SNP rs200291021 in ANGPT2 was associated with serum angiopoietin-2 level (P = 1.29 × 10(-4)), which was associated with SaO2 (P = 0.002). Our study provides the first evidence for the association of ANGPT2, a gene previously implicated in acute lung injury syndromes, with nocturnal SaO2, suggesting that this gene has a broad range of effects on gas exchange, including influencing oxygenation during sleep.

Source
DOI: http://dx.doi.org/10.1093/hmg/ddw324
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6078634
December 2016

Excess of Deleterious Mutations around HLA Genes Reveals Evolutionary Cost of Balancing Selection.

Mol Biol Evol 2016 10 28;33(10):2555-64. Epub 2016 Jun 28.

Division of Genetics, Department of Medicine, Brigham and Women's Hospital, Harvard Medical School Program in Medical and Population Genetics, The Broad Institute, Cambridge, MA

Deleterious mutations are expected to evolve under negative selection and are usually purged from the population. However, deleterious alleles segregate in the human population and some disease-associated variants are maintained at considerable frequencies. Here, we test the hypothesis that balancing selection may counteract purifying selection in neighboring regions and thus maintain deleterious variants at higher frequency than expected from their detrimental fitness effect. We first show in realistic simulations that balancing selection reduces the density of polymorphic sites surrounding a locus under balancing selection, but at the same time markedly increases the population frequency of the remaining variants, including even substantially deleterious alleles. To test the predictions of our simulations empirically, we then use whole-exome sequencing data from 6,500 human individuals and focus on the most established example of balancing selection in the human genome, the major histocompatibility complex (MHC). Our analysis shows an elevated frequency of putatively deleterious coding variants in non-HLA (non-human leukocyte antigen) genes localized in the MHC region. The mean frequency of these variants declined with physical distance from the classical HLA genes, indicating dependency on genetic linkage. These results reveal an indirect cost of the genetic diversity maintained by balancing selection, which has hitherto been perceived as mostly advantageous, and have implications both for the evolution of recombination and also for the epidemiology of various MHC-associated diseases.

Source
DOI: http://dx.doi.org/10.1093/molbev/msw127
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC5026253
October 2016

Genetic Associations with Obstructive Sleep Apnea Traits in Hispanic/Latino Americans.

Am J Respir Crit Care Med 2016 Oct;194(7):886-897

Human Genetics Center, School of Public Health, The University of Texas Health Science Center at Houston, Houston, Texas.

Rationale: Obstructive sleep apnea is a common disorder associated with increased risk for cardiovascular disease, diabetes, and premature mortality. Although there is strong clinical and epidemiologic evidence supporting the importance of genetic factors in influencing obstructive sleep apnea, its genetic basis is still largely unknown. Prior genetic studies focused on traits defined using the apnea-hypopnea index, which contains limited information on potentially important genetically determined physiologic factors, such as propensity for hypoxemia and respiratory arousability.

Objectives: To define novel genetic risk loci for obstructive sleep apnea, we conducted genome-wide association studies of quantitative traits in Hispanic/Latino Americans from three cohorts.

Methods: Genome-wide data from as many as 12,558 participants in the Hispanic Community Health Study/Study of Latinos, Multi-Ethnic Study of Atherosclerosis, and Starr County Health Studies population-based cohorts were metaanalyzed for association with the apnea-hypopnea index, average oxygen saturation during sleep, and average respiratory event duration.

Measurements And Main Results: Two novel loci were identified at genome-level significance (rs11691765, GPR83, P = 1.90 × 10 for the apnea-hypopnea index, and rs35424364; C6ORF183/CCDC162P, P = 4.88 × 10 for respiratory event duration) and seven additional loci were identified with suggestive significance (P < 5 × 10). Secondary sex-stratified analyses also identified one significant and several suggestive associations. Multiple loci overlapped genes with biologic plausibility.

Conclusions: These are the first genome-level significant findings reported for obstructive sleep apnea-related physiologic traits in any population. These findings identify novel associations in inflammatory, hypoxia signaling, and sleep pathways.

Source
DOI: http://dx.doi.org/10.1164/rccm.201512-2431OC
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC5074655
October 2016

Genes with monoallelic expression contribute disproportionately to genetic diversity in humans.

Nat Genet 2016 Mar 25;48(3):231-237. Epub 2016 Jan 25.

Dana-Farber Cancer Institute, Boston, USA.

An unexpectedly large number of human autosomal genes are subject to monoallelic expression (MAE). Our analysis of 4,227 such genes uncovers surprisingly high genetic variation across human populations. This increased diversity is unlikely to reflect relaxed purifying selection. Remarkably, MAE genes exhibit an elevated recombination rate and an increased density of hypermutable sequence contexts. However, these factors do not fully account for the increased diversity. We find that the elevated nucleotide diversity of MAE genes is also associated with greater allelic age: variants in these genes tend to be older and are enriched in polymorphisms shared by Neanderthals and chimpanzees. Both synonymous and nonsynonymous alleles of MAE genes have elevated average population frequencies. We also observed strong enrichment of the MAE signature among genes reported to evolve under balancing selection. We propose that an important biological function of widespread MAE might be the generation of cell-to-cell heterogeneity; the increased genetic variation contributes to this heterogeneity.

Source
DOI: http://dx.doi.org/10.1038/ng.3493
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC4942303
March 2016

Leveraging Distant Relatedness to Quantify Human Mutation and Gene-Conversion Rates.

Am J Hum Genet 2015 Dec 12;97(6):775-89. Epub 2015 Nov 12.

Department of Epidemiology, Harvard T.H. Chan School of Public Health, Boston, MA 02115, USA; Program in Medical and Population Genetics, Broad Institute of Harvard and MIT, Cambridge, MA 02142, USA; Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, MA 02115, USA.

The rate at which human genomes mutate is a central biological parameter that has many implications for our ability to understand demographic and evolutionary phenomena. We present a method for inferring mutation and gene-conversion rates by using the number of sequence differences observed in identical-by-descent (IBD) segments together with a reconstructed model of recent population-size history. This approach is robust to, and can quantify, the presence of substantial genotyping error, as validated in coalescent simulations. We applied the method to 498 trio-phased sequenced Dutch individuals and inferred a point mutation rate of 1.66 × 10(-8) per base per generation and a rate of 1.26 × 10(-9) for <20 bp indels. By quantifying how estimates varied as a function of allele frequency, we inferred the probability that a site is involved in non-crossover gene conversion as 5.99 × 10(-6). We found that recombination does not have observable mutagenic effects after gene conversion is accounted for and that local gene-conversion rates reflect recombination rates. We detected a strong enrichment of recent deleterious variation among mismatching variants found within IBD regions and observed summary statistics of local sharing of IBD segments to closely match previously proposed metrics of background selection; however, we found no significant effects of selection on our mutation-rate estimates. We detected no evidence of strong variation of mutation rates in a number of genomic annotations obtained from several recent studies. Our analysis suggests that a mutation-rate estimate higher than that reported by recent pedigree-based studies should be adopted in the context of DNA-based demographic reconstruction.

Source
DOI: http://dx.doi.org/10.1016/j.ajhg.2015.10.006
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC4678427
December 2015
-->