Publications by authors named "Thomas W Blackwell"

23 Publications

Whole genome sequence analysis of platelet traits in the NHLBI trans-omics for precision medicine initiative.

Hum Mol Genet 2021 Sep 6. Epub 2021 Sep 6.

Human Genetics Center, Department of Epidemiology, Human Genetics, and Environmental Sciences, School of Public Health, The University of Texas Health Science Center at Houston, Houston, TX, 77030, USA.

Platelets play a key role in thrombosis and hemostasis. Platelet count (PLT) and mean platelet volume (MPV) are highly heritable quantitative traits, with hundreds of genetic signals previously identified, mostly in European ancestry populations. Here we use whole genome sequencing from NHLBI's Trans-Omics for Precision Medicine Initiative (TOPMed) in a large multi-ethnic sample to further explore common and rare variation contributing to PLT (n = 61 200) and MPV (n = 23 485). We identified and replicated secondary signals at MPL (rs532784633) and PECAM1 (rs73345162), both more common in African ancestry populations. We also observed rare variation in Mendelian platelet-related disorder genes influencing variation in platelet traits in TOPMed cohorts (not enriched for blood disorders). For example, association of GP9 with lower PLT and higher MPV was partly driven by a pathogenic Bernard-Soulier syndrome variant (rs5030764, p.Asn61Ser), and the signals at TUBB1 and CD36 were partly driven by loss-of-function variants not annotated as pathogenic in ClinVar (rs199948010 and rs571975065). However, residual signal remained for these gene-based signals after adjusting for lead variants, suggesting that additional variants in Mendelian genes with impacts in general population cohorts remain to be identified. Gene-based signals were also identified at several GWAS-identified loci for genes not annotated for Mendelian platelet disorders (PTPRH, TET2, CHEK2), with somatic variation driving the result at TET2. These results highlight the value of whole genome sequencing in populations of diverse genetic ancestry to identify novel regulatory and coding signals, even for well-studied traits such as platelet traits.

Source
http://dx.doi.org/10.1093/hmg/ddab252
September 2021

Presence and transmission of mitochondrial heteroplasmic mutations in human populations of European and African ancestry.

Mitochondrion 2021 Sep 21;60:33-42. Epub 2021 Jul 21.

Framingham Heart Study, Framingham, MA 01702, USA; Population Sciences Branch, NHLBI/NIH, Bethesda, MD 20892, USA.

We investigated the concordance of mitochondrial DNA heteroplasmic mutations (heteroplasmies) in 6745 maternal pairs of European (EA, n = 4718 pairs) and African (AA, n = 2027 pairs) Americans in whole blood. Mother-offspring pairs displayed the highest concordance rate, followed by sibling-sibling and more distantly related maternal pairs. The allele fractions of concordant heteroplasmies exhibited high correlation (R = 0.8) between paired individuals. Discordant heteroplasmies were more likely to be in coding regions and to be nonsynonymous or nonsynonymous-deleterious (p < 0.001). The number of deleterious heteroplasmies was significantly correlated with advancing age (20-44, 45-64, and ≥65 years, p-trend = 0.01). One standard deviation increase in heteroplasmic burden (i.e., the number of heteroplasmies carried by an individual) was associated with a 0.17 to 0.26 (p < 1e-23) standard deviation decrease in mtDNA copy number, independent of age. White blood cell count and differential count jointly explained 0.5% to 1.3% (p ≤ 0.001) of the variance in heteroplasmic burden. A genome-wide association and meta-analysis identified a region at 11p11.12 (top signal rs779031139, p = 2.0e-18, minor allele frequency = 0.38) associated with the heteroplasmic burden. However, the 11p11.12 region is adjacent to a nuclear mitochondrial DNA (NUMT) corresponding to a 542 bp area of the D-loop. This region was no longer significant after excluding heteroplasmies within the 542 bp from the heteroplasmic burden. The discovery that blood mtDNA heteroplasmies were of both inherited and somatic origin, and that an increase in heteroplasmic burden was strongly associated with a decrease in average mtDNA copy number in blood, are important findings to be considered in association studies of mtDNA with disease traits.

Source
http://dx.doi.org/10.1016/j.mito.2021.07.004
September 2021

Whole-genome sequencing association analysis of quantitative red blood cell phenotypes: The NHLBI TOPMed program.

Am J Hum Genet 2021 05 21;108(5):874-893. Epub 2021 Apr 21.

Department of Medicine, University of Mississippi Medical Center, Jackson, MS 39216, USA.

Whole-genome sequencing (WGS), a powerful tool for detecting novel coding and non-coding disease-causing variants, has largely been applied to clinical diagnosis of inherited disorders. Here we leveraged WGS data in up to 62,653 ethnically diverse participants from the NHLBI Trans-Omics for Precision Medicine (TOPMed) program and assessed statistical association of variants with seven red blood cell (RBC) quantitative traits. We discovered 14 single variant-RBC trait associations at 12 genomic loci, which have not been reported previously. Several of the RBC trait-variant associations (RPN1, ELL2, MIDN, HBB, HBA1, PIEZO1, and G6PD) were replicated in independent GWAS datasets imputed to the TOPMed reference panel. Most of these discovered variants are rare/low frequency, and several are observed disproportionately among non-European ancestry (African, Hispanic/Latino, or East Asian) populations. We identified a 3 bp indel p.Lys2169del (g.88717175_88717177TCT[4]) (common only in the Ashkenazi Jewish population) of PIEZO1, a gene responsible for the Mendelian red cell disorder hereditary xerocytosis (MIM: 194380), associated with higher mean corpuscular hemoglobin concentration (MCHC). In stepwise conditional analysis and in gene-based rare variant aggregated association analysis, we identified several of the variants in HBB, HBA1, TMPRSS6, and G6PD that represent the carrier state for known coding, promoter, or splice site loss-of-function variants that cause inherited RBC disorders. Finally, we applied base and nuclease editing to demonstrate that the sentinel variant rs112097551 (nearest gene RPN1) acts through a cis-regulatory element that exerts long-range control of the gene RUVBL1, which is essential for hematopoiesis. Together, these results demonstrate the utility of WGS in ethnically diverse population-based samples and gene editing for expanding knowledge of the genetic architecture of quantitative hematologic traits and suggest a continuum between complex trait and Mendelian red cell disorders.

Source
http://dx.doi.org/10.1016/j.ajhg.2021.04.003
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC8206199
May 2021

Robust, flexible, and scalable tests for Hardy-Weinberg equilibrium across diverse ancestries.

Genetics 2021 May;218(1)

Department of Biochemistry, Wake Forest School of Medicine, Winston-Salem, NC 27157, USA.

Traditional Hardy-Weinberg equilibrium (HWE) tests (the χ² test and the exact test) have long been used as a metric for evaluating genotype quality, as technical artifacts leading to incorrect genotype calls often can be identified as deviations from HWE. However, in data sets composed of individuals from diverse ancestries, HWE can be violated even without genotyping error, complicating the use of HWE testing to assess genotype data quality. In this manuscript, we present the Robust Unified Test for HWE (RUTH) to test for HWE while accounting for population structure and genotype uncertainty, and to evaluate the impact of population heterogeneity and genotype uncertainty on the standard HWE tests and alternative methods using simulated and real sequence data sets. Our results demonstrate that ignoring population structure or genotype uncertainty in HWE tests can inflate false-positive rates by many orders of magnitude. Our evaluations demonstrate different tradeoffs between false positives and statistical power across the methods, with RUTH consistently among the best across all evaluations. RUTH is implemented as a practical and scalable software tool to rapidly perform HWE tests across millions of markers and hundreds of thousands of individuals while supporting standard VCF/BCF formats. RUTH is publicly available at https://www.github.com/statgen/ruth.
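
As a rough illustration of the traditional χ² test for HWE that the abstract takes as its baseline, the sketch below tests a single biallelic marker from hard genotype counts. It is not RUTH and does not reproduce its method: it assumes one homogeneous population and error-free genotype calls, which are exactly the assumptions the abstract says fail in diverse-ancestry data. The function name and arguments are hypothetical.

```python
import math


def hwe_chi_square(n_aa, n_ab, n_bb):
    """Pearson chi-square test for HWE at one biallelic marker.

    Takes observed counts of the three genotypes (AA, AB, BB) and
    returns (statistic, p_value) using 1 degree of freedom.
    Assumes a single unstructured population and hard genotype calls.
    """
    n = n_aa + n_ab + n_bb
    if n == 0:
        raise ValueError("no genotypes observed")

    # Estimate allele frequencies from the observed genotypes.
    p = (2 * n_aa + n_ab) / (2 * n)
    q = 1.0 - p

    # Expected genotype counts under Hardy-Weinberg proportions.
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)

    # Skip cells with zero expectation (monomorphic markers).
    stat = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)

    # Chi-square survival function with 1 df: P(X > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return stat, p_value
```

For example, `hwe_chi_square(25, 50, 25)` describes a marker in exact Hardy-Weinberg proportions, while `hwe_chi_square(50, 0, 50)` describes a complete absence of heterozygotes, the kind of pattern that pooling two populations fixed for different alleles can produce without any genotyping error.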

Source
http://dx.doi.org/10.1093/genetics/iyab044
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC8128395
May 2021

Sequencing of 53,831 diverse genomes from the NHLBI TOPMed Program.

Nature 2021 02 10;590(7845):290-299. Epub 2021 Feb 10.

The Broad Institute of MIT and Harvard, Cambridge, MA, USA.

The Trans-Omics for Precision Medicine (TOPMed) programme seeks to elucidate the genetic architecture and biology of heart, lung, blood and sleep disorders, with the ultimate goal of improving diagnosis, treatment and prevention of these diseases. The initial phases of the programme focused on whole-genome sequencing of individuals with rich phenotypic data and diverse backgrounds. Here we describe the TOPMed goals and design as well as the available resources and early insights obtained from the sequence data. The resources include a variant browser, a genotype imputation server, and genomic and phenotypic data that are available through dbGaP (Database of Genotypes and Phenotypes). In the first 53,831 TOPMed samples, we detected more than 400 million single-nucleotide and insertion or deletion variants after alignment with the reference genome. Additional previously undescribed variants were detected through assembly of unmapped reads and customized analysis in highly variable loci. Among the more than 400 million detected variants, 97% have frequencies of less than 1% and 46% are singletons that are present in only one individual (53% among unrelated individuals). These rare variants provide insights into mutational processes and recent human evolutionary history. The extensive catalogue of genetic variation in TOPMed studies provides unique opportunities for exploring the contributions of rare and noncoding sequence variants to phenotypic variation. Furthermore, combining TOPMed haplotypes with modern imputation methods improves the power and reach of genome-wide association studies to include variants down to a frequency of approximately 0.01%.

Source
http://dx.doi.org/10.1038/s41586-021-03205-y
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC7875770
February 2021

Asthma and its relationship to mitochondrial copy number: Results from the Asthma Translational Genomics Collaborative (ATGC) of the Trans-Omics for Precision Medicine (TOPMed) program.

PLoS One 2020 25;15(11):e0242364. Epub 2020 Nov 25.

Center for Individualized and Genomic Medicine Research (CIGMA), Department of Internal Medicine, Henry Ford Health System, Detroit, Michigan, United States of America.

Background: Mitochondria support critical cellular functions, such as energy production through oxidative phosphorylation, regulation of reactive oxygen species, apoptosis, and calcium homeostasis.

Objective: Given the heightened level of cellular activity in patients with asthma, we sought to determine whether mitochondrial DNA (mtDNA) copy number measured in peripheral blood differed between individuals with and without asthma.

Methods: Whole genome sequence data were generated as part of the Trans-Omics for Precision Medicine (TOPMed) Program on participants from the Study of Asthma Phenotypes and Pharmacogenomic Interactions by Race-ethnicity (SAPPHIRE) and the Study of African Americans, Asthma, Genes, & Environment II (SAGE II). We restricted our analysis to individuals who self-identified as African American (3,651 asthma cases and 1,344 controls). Mitochondrial copy number was estimated using the sequencing read depth ratio for the mitochondrial and nuclear genomes. Respiratory complex expression was assessed using RNA-sequencing.

Results: Average mitochondrial copy number was significantly higher among individuals with asthma when compared with controls (SAPPHIRE: 218.60 vs. 200.47, P<0.001; SAGE II: 235.99 vs. 223.07, P<0.001). Asthma status was significantly associated with mitochondrial copy number after accounting for potential explanatory variables, such as participant age, sex, leukocyte counts, and mitochondrial haplogroup. Despite the consistent relationship between asthma status and mitochondrial copy number, the latter was not associated with time-to-exacerbation or patient-reported asthma control. Mitochondrial respiratory complex gene expression was disproportionately lower in individuals with asthma when compared with individuals without asthma and other protein-encoding genes.

Conclusions: We observed a robust association between asthma and higher mitochondrial copy number. An effect of asthma on mitochondrial function was also supported by lower respiratory complex gene expression in this group.
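
The Methods above state only that mitochondrial copy number was estimated from the read depth ratio of the mitochondrial and nuclear genomes. A minimal sketch of one common way to turn such a ratio into a per-cell estimate follows. The scaling by nuclear ploidy (2 for a diploid genome) is an assumption here, not something the abstract specifies, and the function and parameter names are hypothetical.

```python
def mtdna_copy_number(mt_mean_depth, nuclear_mean_depth, nuclear_ploidy=2):
    """Estimate mtDNA copies per cell from mean sequencing depths.

    mt_mean_depth:      mean read depth over the mitochondrial genome
    nuclear_mean_depth: mean read depth over the nuclear (autosomal) genome
    nuclear_ploidy:     assumed copies of the nuclear genome per cell
                        (2 for diploid; an assumption, not from the source)
    """
    if nuclear_mean_depth <= 0:
        raise ValueError("nuclear depth must be positive")
    if mt_mean_depth < 0:
        raise ValueError("mitochondrial depth cannot be negative")

    # Each cell contributes `nuclear_ploidy` nuclear genome copies, so the
    # depth ratio is scaled by ploidy to give mitochondrial copies per cell.
    return nuclear_ploidy * mt_mean_depth / nuclear_mean_depth
```

Under these assumptions, a sample with 3000x mitochondrial depth and 30x nuclear depth would give `mtdna_copy_number(3000.0, 30.0)`, about 200 copies per cell.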

Source
http://journals.plos.org/plosone/article?id=10.1371/journal.pone.0242364
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC7688161
January 2021

Inherited causes of clonal haematopoiesis in 97,691 whole genomes.

Nature 2020 10 14;586(7831):763-768. Epub 2020 Oct 14.

Department of Epidemiology, School of Public Health, University of Michigan, Ann Arbor, MI, USA.

Age is the dominant risk factor for most chronic human diseases, but the mechanisms through which ageing confers this risk are largely unknown. The age-related acquisition of somatic mutations that lead to clonal expansion in regenerating haematopoietic stem cell populations has recently been associated with both haematological cancer and coronary heart disease; this phenomenon is termed clonal haematopoiesis of indeterminate potential (CHIP). Simultaneous analyses of germline and somatic whole-genome sequences provide the opportunity to identify root causes of CHIP. Here we analyse high-coverage whole-genome sequences from 97,691 participants of diverse ancestries in the National Heart, Lung, and Blood Institute Trans-omics for Precision Medicine (TOPMed) programme, and identify 4,229 individuals with CHIP. We identify associations with blood cell, lipid and inflammatory traits that are specific to different CHIP driver genes. Association of a genome-wide set of germline genetic variants enabled the identification of three genetic loci associated with CHIP status, including one locus at TET2 that was specific to individuals of African ancestry. In silico-informed in vitro evaluation of the TET2 germline locus enabled the identification of a causal variant that disrupts a TET2 distal enhancer, resulting in increased self-renewal of haematopoietic stem cells. Overall, we observe that germline genetic variation shapes haematopoietic stem cell function, leading to CHIP through mechanisms that are specific to clonal haematopoiesis as well as shared mechanisms that lead to somatic mutations across tissues.

Source
http://dx.doi.org/10.1038/s41586-020-2819-2
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC7944936
October 2020

Mapping the 17q12-21.1 Locus for Variants Associated with Early-Onset Asthma in African Americans.

Am J Respir Crit Care Med 2021 02;203(4):424-436

Department of Internal Medicine, Center for Individualized and Genomic Medicine Research and.

The 17q12-21.1 locus is one of the most highly replicated genetic associations with asthma. Individuals of African descent have lower linkage disequilibrium in this region, which could facilitate identifying causal variants. The objective was to identify functional variants at 17q12-21.1 associated with early-onset asthma among African American individuals. We evaluated African American participants from SAPPHIRE (Study of Asthma Phenotypes and Pharmacogenomic Interactions by Race-Ethnicity) (n = 1,940), SAGE II (Study of African Americans, Asthma, Genes and Environment) (n = 885), and GCPD-A (Study of the Genetic Causes of Complex Pediatric Disorders-Asthma) (n = 2,805). Associations with asthma onset at ages under 5 years were meta-analyzed across cohorts. The lead signal was reevaluated considering haplotypes informed by genetic ancestry (i.e., African vs. European). Both an expression-quantitative trait locus analysis and a phenome-wide association study were performed on the lead variant. The meta-analyzed results from SAPPHIRE, SAGE II, and the GCPD-A identified rs11078928 as the top association for early-onset asthma. A haplotype analysis suggested that the asthma association partitioned most closely with the rs11078928 genotype. Genetic ancestry did not appear to influence the effect of this variant. In the expression-quantitative trait locus analysis, rs11078928 was related to alternative splicing of GSDMB (gasdermin-B) transcripts. The phenome-wide association study of rs11078928 suggested that this variant was predominantly associated with asthma and asthma-associated symptoms. A splice-acceptor polymorphism appears to be a causal variant for asthma at the 17q12-21.1 locus. This variant appears to have the same magnitude of effect in individuals of African and European descent.

Source
http://dx.doi.org/10.1164/rccm.202006-2623OC
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC7885840
February 2021

Identification of CFTR variants in Latino patients with cystic fibrosis from the Dominican Republic and Puerto Rico.

Pediatr Pulmonol 2020 02 30;55(2):533-540. Epub 2019 Oct 30.

Department of Pediatrics, Centro de Neumología Pediátrica, San Juan, Puerto Rico.

Background: In cystic fibrosis (CF), the spectrum and frequency of CFTR variants differ by geography and race/ethnicity. CFTR variants are well described in White patients compared with Latino patients. No studies of CFTR variants have been done in patients with CF in the Dominican Republic or Puerto Rico.

Methods: CFTR was sequenced in 61 Dominican Republican patients and 21 Puerto Rican patients with CF and greater than 60 mmol/L sweat chloride. The spectrum of CFTR variants was identified and the proportion of patients with 0, 1, or 2 CFTR variants identified was determined. The functional effects of identified CFTR variants were investigated using clinical annotation databases and computational prediction tools.

Results: Our study found 10% of Dominican patients had two CFTR variants identified compared with 81% of Puerto Rican patients. No CFTR variants were identified in 69% of Dominican patients and 10% of Puerto Rican patients. In Dominican patients, there were 19 identified CFTR variants, accounting for 25 out of 122 disease alleles (20%). In Puerto Rican patients, there were 16 identified CFTR variants, accounting for 36 out of 42 disease alleles (86%). Thirty CFTR variants were identified overall. The most frequent variants for Dominican patients were p.Phe508del and p.Ala559Thr and for Puerto Rican patients were p.Phe508del, p.Arg1066Cys, p.Arg334Trp, and p.Ile507del.

Conclusions: In this first description of the CFTR variants in patients with CF from the Dominican Republic and Puerto Rico, there was a low detection rate of two CFTR variants after full sequencing with the majority of patients from the Dominican Republic without identified variants.

Source
http://dx.doi.org/10.1002/ppul.24549
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC7571374
February 2020

Exome sequencing of 20,791 cases of type 2 diabetes and 24,440 controls.

Nature 2019 06 22;570(7759):71-76. Epub 2019 May 22.

Division of Genome Research, Center for Genome Science, National Institute of Health, Chungcheongbuk-do, South Korea.

Protein-coding genetic variants that strongly affect disease risk can yield relevant clues to disease pathogenesis. Here we report exome-sequencing analyses of 20,791 individuals with type 2 diabetes (T2D) and 24,440 non-diabetic control participants from 5 ancestries. We identify gene-level associations of rare variants (with minor allele frequencies of less than 0.5%) in 4 genes at exome-wide significance, including a series of more than 30 SLC30A8 alleles that conveys protection against T2D, and in 12 gene sets, including those corresponding to T2D drug targets (P = 6.1 × 10) and candidate genes from knockout mice (P = 5.2 × 10). Within our study, the strongest T2D gene-level signals for rare variants explain at most 25% of the heritability of the strongest common single-variant signals, and the gene-level effect sizes of the rare variants that we observed in established T2D drug targets will require 75,000-185,000 sequenced cases to achieve exome-wide significance. We propose a method to interpret these modest rare-variant associations and to incorporate these associations into future target or gene prioritization efforts.

Source
http://dx.doi.org/10.1038/s41586-019-1231-2
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6699738
June 2019

Association Between Titin Loss-of-Function Variants and Early-Onset Atrial Fibrillation.

JAMA 2018 12;320(22):2354-2364

Department of Molecular and Functional Genomics, Geisinger, Danville, Pennsylvania.

Importance: Atrial fibrillation (AF) is the most common arrhythmia, affecting 1% of the population. Young individuals with AF have a strong genetic association with the disease, but the mechanisms remain incompletely understood.

Objective: To perform large-scale whole-genome sequencing to identify genetic variants related to AF.

Design, Setting, And Participants: The National Heart, Lung, and Blood Institute's Trans-Omics for Precision Medicine Program includes longitudinal and cohort studies that underwent high-depth whole-genome sequencing between 2014 and 2017 in 18 526 individuals from the United States, Mexico, Puerto Rico, Costa Rica, Barbados, and Samoa. This case-control study included 2781 patients with early-onset AF from 9 studies and identified 4959 controls of European ancestry from the remaining participants. Results were replicated in the UK Biobank (346 546 participants) and the MyCode Study (42 782 participants).

Exposures: Loss-of-function (LOF) variants in genes at AF loci and common genetic variation across the whole genome.

Main Outcomes And Measures: Early-onset AF (defined as AF onset in persons <66 years of age). Due to multiple testing, the significance threshold for the rare variant analysis was P = 4.55 × 10^-3.

Results: Among 2781 participants with early-onset AF (the case group), 72.1% were men, and the mean (SD) age of AF onset was 48.7 (10.2) years. Participants underwent whole-genome sequencing at a mean depth of 37.8 fold and mean genome coverage of 99.1%. At least 1 LOF variant in TTN, the gene encoding the sarcomeric protein titin, was present in 2.1% of case participants compared with 1.1% in control participants (odds ratio [OR], 1.76 [95% CI, 1.04-2.97]). The proportion of individuals with early-onset AF who carried a LOF variant in TTN increased with an earlier age of AF onset (P value for trend, 4.92 × 10^-4), and 6.5% of individuals with AF onset prior to age 30 carried a TTN LOF variant (OR, 5.94 [95% CI, 2.64-13.35]; P = 1.65 × 10^-5). The association between TTN LOF variants and AF was replicated in an independent study of 1582 patients with early-onset AF (cases) and 41 200 control participants (OR, 2.16 [95% CI, 1.19-3.92]; P = .01).

Conclusions And Relevance: In a case-control study, there was a statistically significant association between an LOF variant in the TTN gene and early-onset AF, with the variant present in a small percentage of participants with early-onset AF (the case group). Further research is necessary to understand whether this is a causal relationship.

Source
http://dx.doi.org/10.1001/jama.2018.18179
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6436530
December 2018

Evaluating the contribution of rare variants to type 2 diabetes and related traits using pedigrees.

Proc Natl Acad Sci U S A 2018 01 26;115(2):379-384. Epub 2017 Dec 26.

Department of Statistics, Seoul National University, Seoul 08826, Republic of Korea.

A major challenge in evaluating the contribution of rare variants to complex disease is identifying enough copies of the rare alleles to permit informative statistical analysis. To investigate the contribution of rare variants to the risk of type 2 diabetes (T2D) and related traits, we performed deep whole-genome analysis of 1,034 members of 20 large Mexican-American families with high prevalence of T2D. If rare variants of large effect accounted for much of the diabetes risk in these families, our experiment was powered to detect association. Using gene expression data on 21,677 transcripts for 643 pedigree members, we identified evidence for large-effect rare-variant expression quantitative trait loci that could not be detected in population studies, validating our approach. However, we did not identify any rare variants of large effect associated with T2D, or the related traits of fasting glucose and insulin, suggesting that large-effect rare variants account for only a modest fraction of the genetic risk of these traits in this sample of families. Reliable identification of large-effect rare variants will require larger samples of extended pedigrees or different study designs that further enrich for such variants.

Source
http://dx.doi.org/10.1073/pnas.1705859115
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC5777025
January 2018

Sequence data and association statistics from 12,940 type 2 diabetes cases and controls.

Sci Data 2017 12 19;4:170179. Epub 2017 Dec 19.

Wellcome Trust Centre for Human Genetics, Nuffield Department of Medicine, University of Oxford, Oxford, UK.

To investigate the genetic basis of type 2 diabetes (T2D) to high resolution, the GoT2D and T2D-GENES consortia catalogued variation from whole-genome sequencing of 2,657 European individuals and exome sequencing of 12,940 individuals of multiple ancestries. Over 27M SNPs, indels, and structural variants were identified, including 99% of low-frequency (minor allele frequency [MAF] 0.1-5%) non-coding variants in the whole-genome sequenced individuals and 99.7% of low-frequency coding variants in the whole-exome sequenced individuals. Each variant was tested for association with T2D in the sequenced individuals, and, to increase power, most were tested in larger numbers of individuals (>80% of low-frequency coding variants in ~82 K Europeans via the exome chip, and ~90% of low-frequency non-coding variants in ~44 K Europeans via genotype imputation). The variants, genotypes, and association statistics from these analyses provide the largest reference to date of human genetic information relevant to T2D, for use in activities such as T2D-focused genotype imputation, functional characterization of variants or genes, and other novel analyses to detect associations between sequence variation and T2D.

Source
http://dx.doi.org/10.1038/sdata.2017.179
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC5735917
December 2017

A Low-Frequency Inactivating AKT2 Variant Enriched in the Finnish Population Is Associated With Fasting Insulin Levels and Type 2 Diabetes Risk.

Diabetes 2017 07 24;66(7):2019-2032. Epub 2017 Mar 24.

Diabetes and Endocrinology Unit, Department of Clinical Sciences Malmö, Lund University Diabetes Centre, Malmö, Sweden.

To identify novel coding association signals and facilitate characterization of mechanisms influencing glycemic traits and type 2 diabetes risk, we analyzed 109,215 variants derived from exome array genotyping together with an additional 390,225 variants from exome sequence in up to 39,339 normoglycemic individuals from five ancestry groups. We identified a novel association between fasting plasma insulin (FI) and the coding variant (p.Pro50Thr) in AKT2, a gene in which rare fully penetrant mutations are causal for monogenic glycemic disorders. The low-frequency allele is associated with a 12% increase in FI levels. This variant is present at 1.1% frequency in Finns but virtually absent in individuals from other ancestries. Carriers of the FI-increasing allele had increased 2-h insulin values, decreased insulin sensitivity, and increased risk of type 2 diabetes (odds ratio 1.05). In cellular studies, the AKT2-Thr50 protein exhibited a partial loss of function. We extend the allelic spectrum for coding variants in AKT2 associated with disorders of glucose homeostasis and demonstrate bidirectional effects of variants within the pleckstrin homology domain of AKT2.

Source
http://dx.doi.org/10.2337/db16-1329
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC5482074
July 2017

The genetic architecture of type 2 diabetes.

Nature 2016 08 11;536(7614):41-47. Epub 2016 Jul 11.

Wellcome Trust Centre for Human Genetics, Nuffield Department of Medicine, University of Oxford, Oxford, UK.

The genetic architecture of common traits, including the number, frequency, and effect sizes of inherited variants that contribute to individual risk, has been long debated. Genome-wide association studies have identified scores of common variants associated with type 2 diabetes, but in aggregate, these explain only a fraction of the heritability of this disease. Here, to test the hypothesis that lower-frequency variants explain much of the remainder, the GoT2D and T2D-GENES consortia performed whole-genome sequencing in 2,657 European individuals with and without diabetes, and exome sequencing in 12,940 individuals from five ancestry groups. To increase statistical power, we expanded the sample size via genotyping and imputation in a further 111,548 subjects. Variants associated with type 2 diabetes after sequencing were overwhelmingly common and most fell within regions previously identified by genome-wide association studies. Comprehensive enumeration of sequence variation is necessary to identify functional alleles that provide important clues to disease pathophysiology, but large-scale sequencing does not support the idea that lower-frequency variants have a major role in predisposition to type 2 diabetes.

Source
http://dx.doi.org/10.1038/nature18642
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC5034897
August 2016

Evolutionary-conserved gene expression response profiles across mammalian tissues.

OMICS 2007 ;11(1):96-115

Bioinformatics Program, University of Michigan, Ann Arbor, Michigan 48109, USA.

Gene expression responses are complex and frequently involve the actions of many genes to effect coordinated patterns. We hypothesized these coordinated responses are evolutionarily conserved and used a comparison of human and mouse gene expression profiles to identify the most prominent conserved features across a set of normal mammalian tissues. Based on data from multiple studies across multiple tissues in human and mouse, 13 gene expression modes across multiple tissues were identified in each of these species using principal component analysis. Strikingly, 1-to-1 pairing of human and mouse modes was observed in 12 out of 13 modes obtained from the two species independently. These paired modes define evolutionarily conserved gene expression response modes (CGEMs). Notably, in this study we were able to extract biological responses that are not overwhelmed by laboratory-to-laboratory or species-to-species variation. Of the variation in our gene expression dataset, 84% can be explained using these CGEMs. Functional annotation was performed using Gene Ontology, pathway, and transcription factor binding site overrepresentation. We conclude that this is an unbiased way of obtaining conserved gene response modes that account for a considerable portion of gene expression variation in a given dataset, and that it validates the conservation of major gene expression response modes across mammals.

Source
DOI: http://dx.doi.org/10.1089/omi.2006.0007
May 2007

Integration of genome and chromatin structure with gene expression profiles to predict c-MYC recognition site binding and function.

PLoS Comput Biol 2007 Apr;3(4):e63

Bioinformatics Program, University of Michigan Medical School, Ann Arbor, Michigan, United States of America.

The MYC genes encode nuclear, sequence-specific DNA-binding proteins that are pleiotropic regulators of cellular function, and the c-MYC proto-oncogene is deregulated and/or mutated in most human cancers. Experimental studies of MYC binding to the genome are not fully consistent. While many c-MYC recognition sites can be identified in c-MYC responsive genes, other motif matches (even experimentally confirmed sites) are associated with genes showing no c-MYC response. We have developed a computational model that integrates multiple sources of evidence to predict which genes will be bound and regulated by MYC in vivo. First, a Bayesian network classifier is used to predict those c-MYC recognition sites that are most likely to exhibit high-occupancy binding in chromatin immunoprecipitation studies. This classifier incorporates genomic sequence, experimentally determined genomic chromatin acetylation islands, and predicted methylation status from a computational model estimating the likelihood of genomic DNA methylation. We find that the predictions from this classifier are also applicable to other transcription factors, such as cAMP-response element-binding protein, whose binding sites are sensitive to DNA methylation. Second, the MYC binding probability is combined with the gene expression profile data from nine independent microarray datasets in multiple tissues. Finally, we may consider gene function annotations in Gene Ontology to predict the c-MYC targets. We assess the performance of our prediction results by comparing them with the c-MYC targets identified in the biomedical literature. In total, we predict 460 likely c-MYC target genes in the human genome, of which 67 have been reported to be both bound and regulated by MYC, 68 are bound by MYC, and another 80 are MYC-regulated. The approach thus successfully identifies many known c-MYC targets and suggests many novel sites. Our findings suggest that integrating different data sources improves the accuracy of c-MYC genomic target identification.

Source
DOI: http://dx.doi.org/10.1371/journal.pcbi.0030063
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC1847699
April 2007

Novel gene and gene model detection using a whole genome open reading frame analysis in proteomics.

Genome Biol 2006;7(4):R35. Epub 2006 Apr 28.

Bioinformatics Program, University of Michigan, Ann Arbor, MI 48109, USA.

Background: Defining the location of genes and the precise nature of gene products remains a fundamental challenge in genome annotation. Interrogating tandem mass spectrometry data using genomic sequence provides an unbiased method to identify novel translation products. A six-frame translation of the entire human genome was used as the query database to search for novel blood proteins in the data from the Human Proteome Organization Plasma Proteome Project. Because this target database is orders of magnitude larger than the databases traditionally employed in tandem mass spectra analysis, careful attention to significance testing is required. Confidence of identification is assessed using our previously described Poisson statistic, which estimates the significance of multi-peptide identifications incorporating the length of the matching sequence, number of spectra searched and size of the target sequence database.

Results: Applying a false discovery rate threshold of 0.05, we identified 282 significant open reading frames, each containing two or more peptide matches. There were 627 novel peptides associated with these open reading frames that mapped to a unique genomic coordinate placed within the start/stop points of previously annotated genes. These peptides matched 1,110 distinct tandem MS spectra. Peptides fell into four categories based upon where their genomic coordinates placed them relative to annotated exons within the parent gene.

Conclusion: This work provides evidence for novel alternative splice variants in many previously annotated genes. These findings suggest that annotation of the genome is not yet complete and that proteomics has the potential to further add to our understanding of gene structures.

Source
DOI: http://dx.doi.org/10.1186/gb-2006-7-4-r35
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC1557991
September 2006

Challenges in deriving high-confidence protein identifications from data gathered by a HUPO plasma proteome collaborative study.

Nat Biotechnol 2006 Mar;24(3):333-8

University of Michigan, 100 Washtenaw Rd., Palmer Commons 2035B, Ann Arbor, Michigan 48109, USA.

The Human Proteome Organization (HUPO) recently completed the first large-scale collaborative study to characterize the human serum and plasma proteomes. The study was carried out in different locations and used diverse methods and instruments to compare and integrate tandem mass spectrometry (MS/MS) data on aliquots of pooled serum and plasma from healthy subjects. Liquid chromatography (LC)-MS/MS data sets from 18 laboratories were matched to the International Protein Index database, and an initial integration exercise resulted in 9,504 proteins identified with one or more peptides, and 3,020 proteins identified with two or more peptides. This article uses a rigorous statistical approach to take into account the length of coding regions in genes, and multiple hypothesis-testing techniques. On this basis, we now present a reduced set of 889 proteins identified with a confidence level of at least 95%. We also discuss the importance of such an integrated analysis in providing an accurate representation of a proteome as well as the value such data sets contain for the high-confidence identification of protein matches to novel exons, some of which may be localized in alternatively spliced forms of known plasma proteins and some in previously nonannotated gene sequences.

Source
DOI: http://dx.doi.org/10.1038/nbt1183
March 2006

Overview of the HUPO Plasma Proteome Project: results from the pilot phase with 35 collaborating laboratories and multiple analytical groups, generating a core dataset of 3020 proteins and a publicly-available database.

Proteomics 2005 Aug;5(13):3226-45

Internal Medicine, University of Michigan, Ann Arbor, MI 48109-0656, USA.

HUPO initiated the Plasma Proteome Project (PPP) in 2002. Its pilot phase has (1) evaluated advantages and limitations of many depletion, fractionation, and MS technology platforms; (2) compared PPP reference specimens of human serum and EDTA, heparin, and citrate-anti-coagulated plasma; and (3) created a publicly available knowledge base (www.bioinformatics.med.umich.edu/hupo/ppp; www.ebi.ac.uk/pride). Thirty-five participating laboratories in 13 countries submitted datasets. Working groups addressed (a) specimen stability and protein concentrations; (b) protein identifications from 18 MS/MS datasets; (c) independent analyses from raw MS/MS spectra; (d) search engine performance, subproteome analyses, and biological insights; (e) antibody arrays; and (f) direct MS/SELDI analyses. MS/MS datasets had 15 710 different International Protein Index (IPI) protein IDs; our integration algorithm applied to multiple matches of peptide sequences yielded 9504 IPI proteins identified with one or more peptides and 3020 proteins identified with two or more peptides (the Core Dataset). These proteins have been characterized with Gene Ontology, InterPro, Novartis Atlas, OMIM, and immunoassay-based concentration determinations. The database permits examination of many other subsets, such as 1274 proteins identified with three or more peptides. Reverse protein to DNA matching identified proteins for 118 previously unidentified ORFs. We recommend use of plasma instead of serum, with EDTA (or citrate) for anticoagulation. To improve resolution, sensitivity and reproducibility of peptide identifications and protein matches, we recommend combinations of depletion, fractionation, and MS/MS technologies, with explicit criteria for evaluation of spectra, use of search algorithms, and integration of homologous protein matches.
This Special Issue of PROTEOMICS presents papers integral to the collaborative analysis plus many reports of supplementary work on various aspects of the PPP workplan. These PPP results on complexity, dynamic range, incomplete sampling, false-positive matches, and integration of diverse datasets for plasma and serum proteins lay a foundation for development and validation of circulating protein biomarkers in health and disease.

Source
DOI: http://dx.doi.org/10.1002/pmic.200500358
August 2005