Publications by authors named "Dmitry Prokopenko"

26 Publications

TMEM106B and CPOX are genetic determinants of cerebrospinal fluid Alzheimer's disease biomarker levels.

Alzheimers Dement 2021 May 14. Epub 2021 May 14.

Neurodegenerative Brain Diseases Group, Center for Molecular Neurology, VIB, Antwerp, Belgium.

Introduction: Neurofilament light (NfL), chitinase-3-like protein 1 (YKL-40), and neurogranin (Ng) are biomarkers for Alzheimer's disease (AD) to monitor axonal damage, astroglial activation, and synaptic degeneration, respectively.

Methods: We performed genome-wide association studies (GWAS) using DNA and cerebrospinal fluid (CSF) samples from the EMIF-AD Multimodal Biomarker Discovery study for discovery, and the Alzheimer's Disease Neuroimaging Initiative study for validation analyses. GWAS were performed for all three CSF biomarkers using linear regression models adjusting for relevant covariates.

Results: We identify novel genome-wide significant associations between DNA variants in TMEM106B and CSF levels of NfL, and between CPOX and YKL-40. We confirm previous work suggesting that YKL-40 levels are associated with DNA variants in CHI3L1.

Discussion: Our study provides important new insights into the genetic architecture underlying interindividual variation in three AD-related CSF biomarkers. In particular, our data shed light on the sequence of events regarding the initiation and progression of neuropathological processes relevant in AD.

Source
http://dx.doi.org/10.1002/alz.12330
May 2021

An APP ectodomain mutation outside of the Aβ domain promotes Aβ production in vitro and deposition in vivo.

J Exp Med 2021 Jun;218(6)

Department of Neurobiology, University of Chicago, Chicago, IL.

Familial Alzheimer's disease (FAD)-linked mutations in the APP gene occur either within the Aβ-coding region or immediately proximal and are located in exons 16 and 17, which encode Aβ peptides. We have identified an extremely rare, partially penetrant, single nucleotide variant (SNV), rs145081708, in APP that corresponds to a Ser198Pro substitution in exon 5. We now report that in stably transfected cells, expression of APP harboring the S198P mutation (APPS198P) leads to elevated production of Aβ peptides by an unconventional mechanism in which the folding and exit of APPS198P from the endoplasmic reticulum is accelerated. More importantly, coexpression of APP S198P and the FAD-linked PS1ΔE9 variant in the brains of male and female transgenic mice leads to elevated steady-state Aβ peptide levels and acceleration of Aβ deposition compared with age- and gender-matched mice expressing APP and PS1ΔE9. This is the first AD-linked mutation in APP present outside of exons 16 and 17 that enhances Aβ production and deposition.

Source
http://dx.doi.org/10.1084/jem.20210313
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC8034382
June 2021

Whole-genome sequencing reveals new Alzheimer's disease-associated rare variants in loci related to synaptic function and neuronal development.

Alzheimers Dement 2021 Apr 2. Epub 2021 Apr 2.

Genetics and Aging Research Unit and The Henry and Allison McCance Center for Brain Health, Department of Neurology, Massachusetts General Hospital, Boston, Massachusetts, USA.

Introduction: Genome-wide association studies have led to numerous genetic loci associated with Alzheimer's disease (AD). Whole-genome sequencing (WGS) now permits genome-wide analyses to identify rare variants contributing to AD risk.

Methods: We performed single-variant and spatial clustering-based testing on rare variants (minor allele frequency [MAF] ≤1%) in a family-based WGS-based association study of 2247 subjects from 605 multiplex AD families, followed by replication in 1669 unrelated individuals.

Results: We identified 13 new AD candidate loci that yielded consistent rare-variant signals in discovery and replication cohorts (4 from single-variant, 9 from spatial-clustering), implicating these genes: FNBP1L, SEL1L, LINC00298, PRKCH, C15ORF41, C2CD3, KIF2A, APC, LHX9, NALCN, CTNNA2, SYTL3, and CLSTN2.

Discussion: Downstream analyses of these novel loci highlight synaptic function, in contrast to common AD-associated variants, which implicate innate immunity and amyloid processing. These loci have not been associated previously with AD, emphasizing the ability of WGS to identify AD-associated rare variants, particularly outside of the exome.

Source
http://dx.doi.org/10.1002/alz.12319
April 2021

Estimating the effective sample size in association studies of quantitative traits.

G3 (Bethesda) 2021 Mar 18. Epub 2021 Mar 18.

Department of Epidemiology, Harvard T.H. Chan School of Public Health, Boston, MA, USA.

The effective sample size (ESS) is a metric used to summarize in a single term the amount of correlation in a sample. It is of particular interest when predicting the statistical power of genome-wide association studies (GWAS) based on linear mixed models. Here, we introduce an analytical form of the ESS for mixed-model GWAS of quantitative traits and relate it to empirical estimators recently proposed. Using our framework, we derived approximations of the ESS for analyses of related and unrelated samples and for both marginal genetic and gene-environment interaction tests. We conducted simulations to validate our approximations and to provide a quantitative perspective on the statistical power of various scenarios, including power loss due to family relatedness and power gains due to conditioning on the polygenic signal. Our analyses also demonstrate that the power of gene-environment interaction GWAS in related individuals strongly depends on the family structure and exposure distribution. Finally, we performed a series of mixed-model GWAS on data from the UK Biobank and confirmed the simulation results. We notably found that the expected power drop due to family relatedness in the UK Biobank is negligible.
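To give a rough sense of how relatedness reduces the effective sample size, the sketch below uses the textbook design-effect approximation for equally sized, equicorrelated clusters, n_eff = n / (1 + (m - 1) * rho). This is an assumption chosen for illustration and is not the analytical form derived in the paper; the function name and the example numbers are hypothetical.

```python
def effective_sample_size(n_total: int, cluster_size: int, rho: float) -> float:
    """Textbook design-effect approximation of the effective sample size.

    n_total      -- total number of subjects
    cluster_size -- subjects per family (assumed equal across families)
    rho          -- assumed within-family phenotypic correlation, in [0, 1]
    """
    if n_total <= 0 or cluster_size <= 0:
        raise ValueError("n_total and cluster_size must be positive")
    if not 0.0 <= rho <= 1.0:
        raise ValueError("rho must lie in [0, 1]")
    # Each additional correlated relative contributes less independent information.
    design_effect = 1.0 + (cluster_size - 1) * rho
    return n_total / design_effect


if __name__ == "__main__":
    # Unrelated subjects: no loss of information.
    print(effective_sample_size(1000, 1, 0.25))
    # Families of four with a within-family correlation of 0.25.
    print(effective_sample_size(1000, 4, 0.25))
```

Under this approximation, the loss grows with both family size and within-family correlation, which is the qualitative behavior the abstract describes as power loss due to family relatedness.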

Source
http://dx.doi.org/10.1093/g3journal/jkab057
March 2021

Genome-wide association study of Alzheimer's disease CSF biomarkers in the EMIF-AD Multimodal Biomarker Discovery dataset.

Transl Psychiatry 2020 11 22;10(1):403. Epub 2020 Nov 22.

Department of Psychiatry, University Hospital of Lausanne, Lausanne, Switzerland.

Alzheimer's disease (AD) is the most prevalent neurodegenerative disorder and the most common form of dementia in the elderly. Susceptibility to AD is considerably determined by genetic factors which hitherto were primarily identified using case-control designs. Elucidating the genetic architecture of additional AD-related phenotypic traits, ideally those linked to the underlying disease process, holds great promise in gaining deeper insights into the genetic basis of AD and in developing better clinical prediction models. To this end, we generated genome-wide single-nucleotide polymorphism (SNP) genotyping data in 931 participants of the European Medical Information Framework Alzheimer's Disease Multimodal Biomarker Discovery (EMIF-AD MBD) sample to search for novel genetic determinants of AD biomarker variability. Specifically, we performed genome-wide association study (GWAS) analyses on 16 traits, including 14 measures derived from quantifications of five separate amyloid-beta (Aβ) and tau-protein species in the cerebrospinal fluid (CSF). In addition to confirming the well-established effects of apolipoprotein E (APOE) on diagnostic outcome and phenotypes related to Aβ42, we detected novel potential signals in the zinc finger homeobox 3 (ZFHX3) for CSF-Aβ38 and CSF-Aβ40 levels, and confirmed the previously described sex-specific association between SNPs in geminin coiled-coil domain containing (GMNC) and CSF-tau. Utilizing the results from independent case-control AD GWAS to construct polygenic risk scores (PRS) revealed that AD risk variants only explain a small fraction of CSF biomarker variability. In conclusion, our study represents a detailed first account of GWAS analyses on CSF-Aβ and -tau-related traits in the EMIF-AD MBD dataset. In subsequent work, we will utilize the genomics data generated here in GWAS of other AD-relevant clinical outcomes ascertained in this unique dataset.
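The polygenic risk scores mentioned in the abstract above are conventionally computed as a weighted sum of risk-allele counts, with weights taken from an independent GWAS. The following is a minimal sketch under that assumption; the SNP identifiers, weights, and function name are hypothetical and do not come from the study.

```python
from typing import Mapping


def polygenic_risk_score(dosages: Mapping[str, float],
                         weights: Mapping[str, float]) -> float:
    """Weighted sum of allele dosages over SNPs present in both inputs.

    dosages -- effect-allele counts (0, 1, 2, or imputed dosages) per SNP
    weights -- per-SNP effect sizes (e.g. log odds ratios) from an external GWAS
    """
    # Only SNPs that were both genotyped and assigned a weight contribute.
    shared = sorted(set(dosages) & set(weights))
    return sum(weights[snp] * dosages[snp] for snp in shared)


if __name__ == "__main__":
    subject = {"snpA": 2, "snpB": 1, "snpC": 0}          # hypothetical genotypes
    gwas = {"snpA": 0.10, "snpB": -0.05, "snpD": 0.30}   # hypothetical weights
    print(polygenic_risk_score(subject, gwas))
```

Restricting the sum to shared SNPs is a design choice of this sketch; real pipelines usually also handle strand alignment, linkage disequilibrium pruning, and p-value thresholds.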

Source
http://dx.doi.org/10.1038/s41398-020-01074-z
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC7680793
November 2020

Gga3 deletion and a GGA3 rare variant associated with late onset Alzheimer's disease trigger BACE1 accumulation in axonal swellings.

Sci Transl Med 2020 11;12(570)

Department of Neuroscience, Tufts University School of Medicine, Boston, MA 02111, USA.

Axonal dystrophy, indicative of perturbed axonal transport, occurs early during Alzheimer's disease (AD) pathogenesis. Little is known about the mechanisms underlying this initial sign of the pathology. Here we report that Golgi-localized γ-ear-containing ARF binding protein 3 (GGA3) loss of function, due to genetic deletion or a rare variant that cosegregates with late-onset AD, disrupts the axonal trafficking of the β-site APP-cleaving enzyme 1 (BACE1), resulting in its accumulation in axonal swellings in cultured neurons and in vivo. We show that pharmacological inhibition of BACE ameliorates BACE1 axonal trafficking and diminishes axonal dystrophies in Gga3 null neurons in vitro and in vivo. These data indicate that axonal accumulation of BACE1 engendered by GGA3 loss of function results in local toxicity leading to axonopathy. Gga3 deletion exacerbates axonal dystrophies in a mouse model of AD before β-amyloid (Aβ) deposition. Our study strongly supports a role for GGA3 in AD pathogenesis, where GGA3 loss of function triggers BACE1 axonal accumulation independently of extracellular Aβ and initiates a cascade of events leading to the axonal damage distinctive of the early stage of AD.

Source
http://dx.doi.org/10.1126/scitranslmed.aba1871
November 2020

Whole-genome sequencing reveals new Alzheimer's disease-associated rare variants in loci related to synaptic function and neuronal development.

medRxiv 2020 Nov 4. Epub 2020 Nov 4.

Introduction: Genome-wide association studies have led to numerous genetic loci associated with Alzheimer's disease (AD). Whole-genome sequencing (WGS) now permits genome-wide analyses to identify rare variants contributing to AD risk.

Methods: We performed single-variant and spatial clustering-based testing on rare variants (minor allele frequency ≤1%) in a family-based WGS-based association study of 2,247 subjects from 605 multiplex AD families, followed by replication in 1,669 unrelated individuals.

Results: We identified 13 new AD candidate loci that yielded consistent rare-variant signals in discovery and replication cohorts (4 from single-variant, 9 from spatial-clustering), implicating these genes: FNBP1L, SEL1L, LINC00298, PRKCH, C15ORF41, C2CD3, KIF2A, APC, LHX9, NALCN, CTNNA2, SYTL3, CLSTN2.

Discussion: Downstream analyses of these novel loci highlight synaptic function, in contrast to common AD-associated variants, which implicate innate immunity. These loci have not been previously associated with AD, emphasizing the ability of WGS to identify AD-associated rare variants, particularly outside of coding regions.

Source
http://dx.doi.org/10.1101/2020.11.03.20225540
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC7654884
November 2020

Genome-Wide Gene-by-Smoking Interaction Study of Chronic Obstructive Pulmonary Disease.

Am J Epidemiol 2021 05;190(5):875-885

Risk of chronic obstructive pulmonary disease (COPD) is determined by both cigarette smoking and genetic susceptibility, but little is known about gene-by-smoking interactions. We performed a genome-wide association analysis of 179,689 controls and 21,077 COPD cases from UK Biobank subjects of European ancestry recruited from 2006 to 2010, considering genetic main effects and gene-by-smoking interaction effects simultaneously (2-degrees-of-freedom (df) test) as well as interaction effects alone (1-df interaction test). We sought to replicate significant results in COPDGene (United States, 2008-2010) and SpiroMeta Consortium (multiple countries, 1947-2015) data. We considered 2 smoking variables: 1) ever/never and 2) current/noncurrent. In the 1-df test, we identified 1 genome-wide significant locus on 15q25.1 (cholinergic receptor nicotinic β4 subunit, or CHRNB4) for ever- and current smoking and identified the PI*Z allele (rs28929474) of serpin family A member 1 (SERPINA1) for ever-smoking and 3q26.2 (MDS1 and EVI1 complex locus, or MECOM) for current smoking in an analysis of previously reported COPD loci. In the 2-df test, most of the significant signals were also significant for genetic marginal effects, aside from 16q22.1 (sphingomyelin phosphodiesterase 3, or SMPD3) and 19q13.2 (Egl-9 family hypoxia inducible factor 2, or EGLN2). The significant effects at the 15q25.1 and 19q13.2 loci, both described in prior genome-wide association studies of COPD or smoking, were replicated in COPDGene and SpiroMeta. We identified interaction effects at previously reported COPD loci; however, we failed to identify novel susceptibility loci.
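The difference between the 2-df joint test and the 1-df interaction test can be illustrated with Wald statistics. The sketch below assumes that a regression model has already produced estimates for the genetic main effect and the gene-by-smoking interaction together with their variances and covariance; the function names and input values are hypothetical, and the study's actual software and model details are not given in the abstract.

```python
import math


def joint_2df_test(b_g, b_gxe, var_g, var_gxe, cov):
    """Wald test of H0: main effect = interaction effect = 0 (chi-square, 2 df)."""
    det = var_g * var_gxe - cov ** 2
    if det <= 0:
        raise ValueError("covariance matrix must be positive definite")
    # Quadratic form b' V^{-1} b written out for the 2 x 2 case.
    wald = (b_g ** 2 * var_gxe - 2 * b_g * b_gxe * cov + b_gxe ** 2 * var_g) / det
    # The chi-square survival function with 2 df has the closed form exp(-x / 2).
    return wald, math.exp(-wald / 2)


def interaction_1df_test(b_gxe, var_gxe):
    """Wald test of H0: interaction effect = 0 (chi-square, 1 df)."""
    z = b_gxe / math.sqrt(var_gxe)
    # Two-sided normal tail probability, equal to the 1-df chi-square p-value.
    return z * z, math.erfc(abs(z) / math.sqrt(2))


if __name__ == "__main__":
    # Hypothetical estimates for a single variant.
    print(joint_2df_test(0.08, 0.05, 0.0004, 0.0009, -0.0002))
    print(interaction_1df_test(0.05, 0.0009))
```

The joint test can flag a variant whose signal comes mostly from the main effect, which is consistent with the abstract's observation that most 2-df signals were also significant for marginal effects.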

Source
http://dx.doi.org/10.1093/aje/kwaa227
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC8096488
May 2021

Whole genome sequence analysis of pulmonary function and COPD in 19,996 multi-ethnic participants.

Nat Commun 2020 10 14;11(1):5182. Epub 2020 Oct 14.

The Institute for Translational Genomics and Population Sciences, The Department of Pediatrics, The Lundquist Institute for Biomedical Innovation at Harbor-UCLA Medical Center, Torrance, CA, 90502, USA.

Chronic obstructive pulmonary disease (COPD), diagnosed by reduced lung function, is a leading cause of morbidity and mortality. We performed whole genome sequence (WGS) analysis of lung function and COPD in a multi-ethnic sample of 11,497 participants from population- and family-based studies, and 8499 individuals from COPD-enriched studies in the NHLBI Trans-Omics for Precision Medicine (TOPMed) Program. We identify at genome-wide significance 10 known GWAS loci and 22 distinct, previously unreported loci, including two common variant signals from stratified analysis of African Americans. Four novel common variants within the regions of PIAS1, RGN (two variants) and FTO show evidence of replication in the UK Biobank (European ancestry n ~ 320,000), while colocalization analyses leveraging multi-omic data from GTEx and TOPMed identify potential molecular mechanisms underlying four of the 22 novel loci. Our study demonstrates the value of performing WGS analyses and multi-omic follow-up in cohorts of diverse ancestry.

Source
http://dx.doi.org/10.1038/s41467-020-18334-7
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC7598941
October 2020

Aβ-accelerated neurodegeneration caused by Alzheimer's-associated ACE variant R1279Q is rescued by angiotensin system inhibition in mice.

Sci Transl Med 2020 09;12(563)

Ken and Ruth Davee Department of Neurology, Northwestern University Feinberg School of Medicine, Chicago, IL 60611, USA.

Recent genome-wide association studies identified the angiotensin-converting enzyme gene (ACE) as an Alzheimer's disease (AD) risk locus. However, the pathogenic mechanism by which ACE causes AD is unknown. Using whole-genome sequencing, we identified rare coding variants in AD families and investigated one, ACE1 R1279Q, in knockin (KI) mice. Similar to AD, ACE1 was increased in neurons, but not microglia or astrocytes, of KI brains, which became elevated further with age. Angiotensin II (angII) and angII receptor AT1R signaling were also increased in KI brains. Autosomal dominant neurodegeneration and neuroinflammation occurred with aging in KI hippocampus, which were absent in the cortex and cerebellum. Female KI mice exhibited greater hippocampal electroencephalograph disruption and memory impairment compared to males. ACE1 variant effects were more pronounced in female KI mice, suggesting a mechanism for higher AD risk in women. Hippocampal neurodegeneration was completely rescued by treatment with brain-penetrant drugs that inhibit ACE1 and AT1R. Although variant-induced neurodegeneration did not depend on β-amyloid (Aβ) pathology, amyloidosis in 5XFAD mice crossed to KI mice accelerated neurodegeneration and neuroinflammation, whereas Aβ deposition was unchanged. KI mice had normal blood pressure and cerebrovascular functions. Our findings strongly suggest that increased ACE1/angII signaling causes aging-dependent, Aβ-accelerated selective hippocampal neuron vulnerability and female susceptibility, hallmarks of AD that have hitherto been enigmatic. We conclude that repurposed brain-penetrant ACE inhibitors and AT1R blockers may protect against AD.

Source
http://dx.doi.org/10.1126/scitranslmed.aaz2541
September 2020

locStra: Fast analysis of regional/global stratification in whole-genome sequencing studies.

Genet Epidemiol 2021 02 14;45(1):82-98. Epub 2020 Sep 14.

Department of Biostatistics, T.H. Chan School of Public Health, Harvard University, Boston, Massachusetts, USA.

locStra is an R-package for the analysis of regional and global population stratification in whole-genome sequencing (WGS) studies, where regional stratification refers to the substructure defined by the loci in a particular region on the genome. Population substructure can be assessed based on the genetic covariance matrix, the genomic relationship matrix, and the unweighted/weighted genetic Jaccard similarity matrix. Using a sliding window approach, the regional similarity matrices are compared with the global ones, based on user-defined window sizes and metrics, for example, the correlation between regional and global eigenvectors. An algorithm for the specification of the window size is provided. As the implementation fully exploits sparse matrix algebra and is written in C++, the analysis is highly efficient. Even on single cores, for realistic study sizes (several thousand subjects, several million rare variants per subject), the runtime for the genome-wide computation of all regional similarity matrices typically does not exceed one hour, enabling an unprecedented investigation of regional stratification across the entire genome. The package is applied to three WGS studies, illustrating the varying patterns of regional substructure across the genome and its beneficial effects on association testing.
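As a rough illustration of the unweighted Jaccard similarity matrix named in the abstract above, the sketch below computes pairwise Jaccard indices between subjects from a binary carrier matrix. It is written in plain Python for readability and does not use or reproduce the locStra API; the function name, the input layout (rows are subjects, columns are variants), and the convention of returning 0 when neither subject carries any variant are assumptions.

```python
def jaccard_matrix(genotypes):
    """Pairwise Jaccard similarity between subjects.

    genotypes -- list of rows, one per subject; each entry is truthy if the
                 subject carries the minor allele at that variant.
    """
    # Represent each subject by the set of variant indices it carries.
    carriers = [{k for k, g in enumerate(row) if g} for row in genotypes]
    n = len(carriers)
    sim = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i, n):
            union = carriers[i] | carriers[j]
            # Assumed convention: similarity is 0 when the union is empty.
            value = len(carriers[i] & carriers[j]) / len(union) if union else 0.0
            sim[i][j] = sim[j][i] = value
    return sim


if __name__ == "__main__":
    toy = [
        [1, 1, 0, 0],
        [1, 0, 1, 0],
        [0, 0, 0, 0],
    ]
    for row in jaccard_matrix(toy):
        print(row)
```

A regional matrix in the sliding-window sense could be obtained by passing only the columns that fall inside one window, and then compared with the matrix computed from all columns.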

Source
http://dx.doi.org/10.1002/gepi.22356
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC7856019
February 2021

Identification of Novel Alzheimer's Disease Loci Using Sex-Specific Family-Based Association Analysis of Whole-Genome Sequence Data.

Sci Rep 2020 03 19;10(1):5029. Epub 2020 Mar 19.

Genetics and Aging Unit and McCance Center for Brain Health, Department of Neurology, Massachusetts General Hospital, Boston, MA, USA.

With the advent of whole-genome sequencing (WGS) studies, family-based designs enable sex-specific analysis approaches that can be applied to only affected individuals; tests using family-based designs are attractive because they are completely robust against the effects of population substructure. These advantages make family-based association tests (FBATs) that use siblings as well as parents especially suited for the analysis of late-onset diseases such as Alzheimer's Disease (AD). However, the application of FBATs to assess sex-specific effects can require additional filtering steps, as sensitivity to sequencing errors is amplified in this type of analysis. Here, we illustrate the implementation of robust analysis approaches and additional filtering steps that can minimize the chances of false-positive findings due to sex-specific sequencing errors. We apply this approach to two family-based AD datasets and identify four novel loci (GRID1, RIOK3, MCPH1, ZBTB7C) showing sex-specific association with AD risk. Following stringent quality control filtering, the strongest candidate is ZBTB7C (P = 1.83 × 10), in which the minor allele of rs1944572 confers increased risk for AD in females and protection in males. ZBTB7C encodes the Zinc Finger and BTB Domain Containing 7C, a transcriptional repressor of membrane metalloproteases (MMP). Members of this MMP family were implicated in AD neuropathology.

Source
http://dx.doi.org/10.1038/s41598-020-61883-6
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC7081222
March 2020

Negative evidence for a role of APH1B T27I variant in Alzheimer's disease.

Hum Mol Genet 2020 04;29(6):955-966

Department of Neurobiology, The University of Chicago, Chicago, IL 60637, USA.

γ-secretase is a macromolecular complex that catalyzes intramembranous hydrolysis of more than 100 membrane-bound substrates. The complex is composed of presenilin (PS1 or PS2), anterior pharynx defect-1 (APH-1), nicastrin (NCT) and PEN-2, and early-onset, autosomal dominant forms of Alzheimer's disease (AD) are caused by inheritance of mutations of PS. No mutations in genes encoding NCT or PEN-2 have been identified to date that cause AD. In this regard, a large genetic meta-analysis of four cohorts consisting of more than 600 000 individuals identified a common missense variant, rs117618017 in the APH1B gene that results in a T27I mutation, as a novel genome-wide significant locus. In order to confirm the findings that rs117618017 is associated with risk of AD, we performed a genetic screen from deep whole genome sequencing of the large NIMH family-based Alzheimer's Disease (AD) dataset. In parallel, we sought to uncover potential molecular mechanism(s) by which APH-1B T27I might be associated with AD by generating stable HEK293 cell lines, wherein endogenous APH-1A and APH-1B expression was silenced and into which either the wild type APH-1B or the APH-1B T27I variant was stably expressed. We then tested the impact of expressing either the wild type APH-1B or the APH-1B T27I variant on γ-secretase processing of human APP, the murine Notch derivative mNΔE and human neuregulin-1. We now report that we fail to confirm the association of rs117618017 with AD in our cohort and that cells expressing the APH-1B T27I variant show no discernible impact on the γ-secretase processing of established substrates compared with cells expressing wild-type APH-1B.

Source
http://dx.doi.org/10.1093/hmg/ddaa017
April 2020

Genetic landscape of chronic obstructive pulmonary disease identifies heterogeneous cell-type and phenotype associations.

Nat Genet 2019 03 25;51(3):494-505. Epub 2019 Feb 25.

Department of Internal Medicine and Environmental Health Center, School of Medicine, Kangwon National University, Chuncheon, South Korea.

Chronic obstructive pulmonary disease (COPD) is the leading cause of respiratory mortality worldwide. Genetic risk loci provide new insights into disease pathogenesis. We performed a genome-wide association study in 35,735 cases and 222,076 controls from the UK Biobank and additional studies from the International COPD Genetics Consortium. We identified 82 loci associated with P < 5 × 10; 47 of these were previously described in association with either COPD or population-based measures of lung function. Of the remaining 35 new loci, 13 were associated with lung function in 79,055 individuals from the SpiroMeta consortium. Using gene expression and regulation data, we identified functional enrichment of COPD risk loci in lung tissue, smooth muscle, and several lung cell types. We found 14 COPD loci shared with either asthma or pulmonary fibrosis. COPD genetic risk loci clustered into groups based on associations with quantitative imaging features and comorbidities. Our analyses provide further support for the genetic susceptibility and heterogeneity of COPD.

Source
http://dx.doi.org/10.1038/s41588-018-0342-2
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6546635
March 2019

New genetic signals for lung function highlight pathways and chronic obstructive pulmonary disease associations across multiple ancestries.

Nat Genet 2019 03 25;51(3):481-493. Epub 2019 Feb 25.

Medical Research Council Human Genetics Unit, Institute of Genetics and Molecular Medicine, University of Edinburgh, Edinburgh, UK.

Reduced lung function predicts mortality and is key to the diagnosis of chronic obstructive pulmonary disease (COPD). In a genome-wide association study in 400,102 individuals of European ancestry, we define 279 lung function signals, 139 of which are new. In combination, these variants strongly predict COPD in independent populations. Furthermore, the combined effect of these variants showed generalizability across smokers and never smokers, and across ancestral groups. We highlight biological pathways, known and potential drug targets for COPD and, in phenome-wide association studies, autoimmune-related and other pleiotropic effects of lung function-associated variants. This new genetic evidence has potential to improve future preventive and therapeutic strategies for COPD.

Source
http://dx.doi.org/10.1038/s41588-018-0321-7
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6397078
March 2019

Whole exome sequencing analysis in severe chronic obstructive pulmonary disease.

Hum Mol Genet 2018 11;27(21):3801-3812

Department of Biostatistics, Harvard School of Public Health, Boston, Massachusetts, United States of America.

Chronic obstructive pulmonary disease (COPD), one of the leading causes of death worldwide, is substantially influenced by genetic factors. Alpha-1 antitrypsin deficiency demonstrates that rare coding variants of large effect can influence COPD susceptibility. To identify additional rare coding variants in patients with severe COPD, we conducted whole exome sequencing analysis in 2543 subjects from two family-based studies (Boston Early-Onset COPD Study and International COPD Genetics Network) and one case-control study (COPDGene). Applying a gene-based segregation test in the family-based data, we identified significant segregation of rare loss of function variants in TBC1D10A and RFPL1 (P-value < 2 × 10⁻⁶), but were unable to find similar variants in the case-control study. In single-variant, gene-based and pathway association analyses, we were unable to find significant findings that replicated or were significant in meta-analysis. However, we found that the top results in the two datasets were in proximity to each other in the protein-protein interaction network (P-value = 0.014), suggesting enrichment of these results for similar biological processes. A network of these association results and their neighbors was significantly enriched in the transforming growth factor beta-receptor binding and cilia-related pathways. Finally, in a more detailed examination of candidate genes, we identified individuals with putative high-risk variants, including patients harboring homozygous mutations in genes associated with cutis laxa and Niemann-Pick Disease Type C. Our results likely reflect heterogeneity of genetic risk for COPD along with limitations of statistical power and functional annotation, and highlight the potential of network analysis to gain insight into genetic association studies.

Source
http://dx.doi.org/10.1093/hmg/ddy269
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6196654
November 2018

Whole-Genome Sequencing in Severe Chronic Obstructive Pulmonary Disease.

Am J Respir Cell Mol Biol 2018 11;59(5):614-622

Channing Division of Network Medicine, Brigham and Women's Hospital and Harvard Medical School, Boston, Massachusetts.

Genome-wide association studies have identified common variants associated with chronic obstructive pulmonary disease (COPD). Whole-genome sequencing (WGS) offers comprehensive coverage of the entire genome, as compared with genotyping arrays or exome sequencing. We hypothesized that WGS in subjects with severe COPD and smoking control subjects with normal pulmonary function would allow us to identify novel genetic determinants of COPD. We sequenced 821 patients with severe COPD and 973 control subjects from the COPDGene and Boston Early-Onset COPD studies, including both non-Hispanic white and African American individuals. We performed single-variant and grouped-variant analyses, and in addition, we assessed the overlap of variants between sequencing- and array-based imputation. Our most significantly associated variant was in a known region near HHIP (combined P = 1.6 × 10); additional variants approaching genome-wide significance included previously described regions in CHRNA5, TNS1, and SERPINA6/SERPINA1 (the latter in African American individuals). None of our associations were clearly driven by rare variants, and we found minimal evidence of replication of genes identified by previously reported smaller sequencing studies. With WGS, we identified more than 20 million new variants, not seen with imputation, including more than 10,000 of potential importance in previously identified COPD genome-wide association study regions. WGS in severe COPD identifies a large number of potentially important functional variants, with the strongest associations being in known COPD risk loci, including HHIP and SERPINA1. Larger sample sizes will be needed to identify associated variants in novel regions of the genome.

Source
http://dx.doi.org/10.1165/rcmb.2018-0088OC
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6236690
November 2018

PolyGEE: a generalized estimating equation approach to the efficient and robust estimation of polygenic effects in large-scale association studies.

Biostatistics 2018 07;19(3):295-306

Department of Biostatistics, Harvard T.H. Chan School of Public Health, 655 Huntington Avenue, Boston, MA 02115, USA and Department of Genomic Mathematics, University of Bonn, Sigmund-Freud-Strasse 25, 53127 Bonn, Germany.

To quantify polygenic effects, i.e. undetected genetic effects, in large-scale association studies, we propose a generalized estimating equation (GEE) based estimation framework. We develop a marginal model for single-variant association test statistics of complex diseases that generalizes existing approaches such as LD Score regression and that is applicable to population-based designs, to family-based designs or to arbitrary combinations of both. We extend the standard GEE approach so that the parameters of the proposed marginal model can be estimated based on working-correlation/linkage-disequilibrium (LD) matrices from external reference panels. Our method achieves substantial efficiency gains over standard approaches, while it is robust against misspecification of the LD structure, i.e. the LD structure of the reference panel can differ substantially from the true LD structure in the study population. In simulation studies and in applications to population-based and family-based studies, we illustrate the features of the proposed GEE framework. Our results suggest that our approach can be up to 100% more efficient than existing methodology.

Source
http://dx.doi.org/10.1093/biostatistics/kxx040 (DOI Listing)
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC5991211 (PMC)
July 2018

Reporting Correct p Values in VEGAS Analyses.

Twin Res Hum Genet 2017 06 27;20(3):257-259. Epub 2017 Mar 27.

Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, Massachusetts, USA.

VEGAS (versatile gene-based association study) is a popular methodological framework for performing gene-based tests based on summary statistics from single-variant analyses. The approach incorporates linkage disequilibrium information from reference panels to account for the correlation of test statistics. The gene-based test can utilize three different types of tests. In 2015, the improved framework VEGAS2, using more detailed reference panels, was published. Both versions provide user-friendly web- and offline-based tools for the analysis. However, the implementation of the popular top-percentage test is erroneous in both versions. The p values provided by VEGAS2 are deflated/anti-conservative. Based on real data examples, we demonstrate that this can substantially increase the rate of false-positive findings and can lead to inconsistencies between different test options. We also provide code that allows the user of VEGAS to compute correct p values.

Source
http://dx.doi.org/10.1017/thg.2017.16 (DOI Listing)
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC5516093 (PMC)
June 2017

On the association analysis of genome-sequencing data: A spatial clustering approach for partitioning the entire genome into nonoverlapping windows.

Genet Epidemiol 2017 05 20;41(4):332-340. Epub 2017 Mar 20.

Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, Massachusetts, United States of America.

For the association analysis of whole-genome sequencing (WGS) studies, we propose an efficient and fast spatial-clustering algorithm. Compared to existing analysis approaches for WGS data that define the tested regions either by sliding or consecutive windows of fixed sizes along variants, a meaningful grouping of nearby variants into consecutive regions has the advantage that, compared to sliding window approaches, the number of tested regions is likely to be smaller. In comparison to consecutive, fixed-window approaches, our approach is likely to group nearby variants together. Given existing biological evidence that disease-associated mutations tend to physically cluster in specific regions along the chromosome, the identification of meaningful groups of nearby variants could thus lead to a potential power gain for association analysis. Our algorithm defines consecutive genomic regions based on the physical positions of the variants, assuming an inhomogeneous Poisson process, and groups nearby variants together. As parameters are estimated locally, the algorithm takes the differing variant density along the chromosome into account and provides locally optimal partitioning of variants into consecutive regions. An R-implementation of the algorithm is provided. We discuss the theoretical advantages of our algorithm compared to existing, window-based approaches and show the performance and advantage of our introduced algorithm in a simulation study and by an application to Alzheimer's disease WGS data. Our analysis identifies a region in the ITGB3 gene that potentially harbors disease susceptibility loci for Alzheimer's disease. The region-based association signal of ITGB3 replicates in an independent data set and formally achieves genome-wide significance. Software Implementation: An implementation of the algorithm in R is available at: https://github.com/heidefier/cluster_wgs_data.

Source
http://dx.doi.org/10.1002/gepi.22040 (DOI Listing)
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC5525021 (PMC)
May 2017

Utilizing the Jaccard index to reveal population stratification in sequencing data: a simulation study and an application to the 1000 Genomes Project.

Bioinformatics 2016 05 31;32(9):1366-72. Epub 2015 Dec 31.

Institute of Genomic Mathematics, University of Bonn, Bonn, Germany, and Department of Biostatistics, Harvard School of Public Health, Boston, USA.

Motivation: Population stratification is one of the major sources of confounding in genetic association studies, potentially causing false-positive and false-negative results. Here, we present a novel approach for the identification of population substructure in high-density genotyping data/next generation sequencing data. The approach exploits the co-appearances of rare genetic variants in individuals. The method can be applied to all available genetic loci and is computationally fast. Using sequencing data from the 1000 Genomes Project, the features of the approach are illustrated and compared to existing methodology (i.e. EIGENSTRAT). We examine the effects of different cutoffs for the minor allele frequency on the performance of the approach. We find that our approach works particularly well for genetic loci with very small minor allele frequencies. The results suggest that the inclusion of rare-variant data/sequencing data in our approach provides a much higher resolution picture of population substructure than can be obtained with existing methodology. Furthermore, in simulation studies, we find scenarios where our method was able to control the type 1 error more precisely and showed higher power.

Availability And Implementation:

Supplementary Information: Supplementary data are available at Bioinformatics online.
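
The abstract above does not give the exact similarity measure, but the title names the Jaccard index and the text refers to co-appearances of rare variants in individuals. As a minimal sketch under that reading, and not the authors' implementation, the following Python computes the Jaccard index between the sets of rare variants carried by each pair of individuals. The function names, the variant identifiers and the toy data are hypothetical.

```python
from itertools import combinations


def jaccard_index(a, b):
    """Return |A ∩ B| / |A ∪ B| for two sets; 0.0 if both sets are empty."""
    union = a | b
    if not union:
        return 0.0
    return len(a & b) / len(union)


def pairwise_jaccard(carriers):
    """Map each unordered pair of individuals to the Jaccard index
    of the sets of rare variants they carry."""
    return {
        (i, j): jaccard_index(carriers[i], carriers[j])
        for i, j in combinations(sorted(carriers), 2)
    }


# Hypothetical toy data: individual -> set of rare variant IDs carried.
rare_variants = {
    "ind1": {"v1", "v2", "v3"},
    "ind2": {"v2", "v3", "v4"},
    "ind3": {"v9"},
}

similarity = pairwise_jaccard(rare_variants)
for pair, value in similarity.items():
    print(pair, value)
```

In this toy example ind1 and ind2 share two of four distinct rare variants, so their index is 0.5, while ind3 shares none with either. A matrix of such pairwise values is the kind of input a clustering or dimension-reduction step could use to look for substructure.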

Source
http://dx.doi.org/10.1093/bioinformatics/btv752 (DOI Listing)
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC5860507 (PMC)
May 2016

On the Recombination Rate Estimation in the Presence of Population Substructure.

PLoS One 2015 30;10(12):e0145152. Epub 2015 Dec 30.

Institute of Genomic Mathematics, University of Bonn, Bonn, Germany.

As recombination events are not uniformly distributed along the human genome, the estimation of fine-scale recombination maps, e.g. in the HapMap Project, has been one of the major research endeavors over the last couple of years. For simulation studies, these estimates provide realistic reference scenarios to design future studies and to develop novel methodology. To achieve a feasible framework for the estimation of such recombination maps, existing methodology uses sample probabilities for a two-locus model with recombination, with recent advances allowing for computationally fast implementations. In this work, we extend the existing theoretical framework for recombination rate estimation to the presence of population substructure. We show under which assumptions the existing methodology can still be applied. We illustrate our extension of the methodology by an extensive simulation study.

Source
http://journals.plos.org/plosone/article?id=10.1371/journal.pone.0145152 (PLOS)
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC4696844 (PMC)
June 2016

Using Network Methodology to Infer Population Substructure.

PLoS One 2015 22;10(6):e0130708. Epub 2015 Jun 22.

Institute of Genomic Mathematics, University of Bonn, Bonn, Germany; Institute of Human Genetics, University of Bonn, Bonn, Germany.

One of the main caveats of association studies is their susceptibility to bias due to population stratification. Existing methods rely on model-based approaches like structure and ADMIXTURE or on principal component analysis like EIGENSTRAT. Here we provide a novel visualization technique and describe the problem of population substructure from a graph-theoretical point of view. We group the sequenced individuals into triads, which depict the relational structure, on the basis of a predefined pairwise similarity measure. We then merge the triads into a network and apply community detection algorithms in order to identify homogeneous subgroups or communities, which can further be incorporated as covariates into logistic regression. We apply our method to populations from different continents in the 1000 Genomes Project and evaluate the type 1 error based on the empirical p-values. The application to 1000 Genomes data suggests that the network approach provides a very fine resolution of the underlying ancestral population structure. In addition, we show in simulations that, in the presence of discrete population structures, our developed approach maintains the type 1 error more precisely than existing approaches.

Source
http://journals.plos.org/plosone/article?id=10.1371/journal.pone.0130708 (PLOS)
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC4476755 (PMC)
March 2016

'Location, Location, Location': a spatial approach for rare variant analysis and an application to a study on non-syndromic cleft lip with or without cleft palate.

Bioinformatics 2012 Dec 8;28(23):3027-33. Epub 2012 Oct 8.

Department of Genomic Mathematics, University of Bonn, 53127, Germany.

Motivation: For the analysis of rare variants in sequence data, numerous approaches have been suggested. Fixed and flexible threshold approaches collapse the rare variant information of a genomic region into a test statistic with reduced dimensionality. Alternatively, the rare variant information can be combined in statistical frameworks that are based on suitable regression models, machine learning, etc. Although the existing approaches provide powerful tests that can incorporate information on allele frequencies and prior biological knowledge, differences in the spatial clustering of rare variants between cases and controls cannot be incorporated. Based on the assumption that deleterious variants and protective variants cluster or occur in different parts of the genomic region of interest, we propose a testing strategy for rare variants that builds on spatial cluster methodology and that guides the identification of the biologically relevant segments of the region. Our approach does not require any assumption about the directions of the genetic effects.

Results: In simulation studies, we assess the power of the clustering approach and compare it with existing methodology. Our simulation results suggest that the clustering approach for rare variants is well powered, even in situations that are ideal for standard methods. The efficiency of our spatial clustering approach is not affected by the presence of rare variants that have opposite effect size directions. An application to a sequencing study for non-syndromic cleft lip with or without cleft palate (NSCL/P) demonstrates its practical relevance. The proposed testing strategy is applied to a genomic region on chromosome 15q13.3 that was implicated in NSCL/P etiology in a previous genome-wide association study, and its results are compared with standard approaches.

Availability: Source code and documentation for the implementation in R will be provided online. Currently, the R-implementation only supports genotype data. We are currently working on an extension for VCF files.

Source
http://dx.doi.org/10.1093/bioinformatics/bts568 (DOI Listing)
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3516147 (PMC)
December 2012

Gel-based oligonucleotide microarray approach to analyze protein-ssDNA binding specificity.

Nucleic Acids Res 2008 Jun 12;36(10):e61. Epub 2008 May 12.

Engelhardt Institute of Molecular Biology, Russian Academy of Sciences, 32 Vavilov Street, 119991 Moscow, Russian Federation.

A gel-based oligonucleotide microarray approach was developed for quantitative profiling of the binding affinity of a protein to single-stranded DNA (ssDNA). To demonstrate additional capabilities of this method, we analyzed the binding specificity of ribonuclease (RNase) binase from Bacillus intermedius (EC 3.1.27.3) to ssDNA using a generic hexamer oligodeoxyribonucleotide microchip. Single-stranded octamer oligonucleotides were immobilized within 3D hemispherical gel pads. The octanucleotides in individual pads, 5'-{N}N(1)N(2)N(3)N(4)N(5)N(6){N}-3', consisted of a fixed hexamer motif N(1)N(2)N(3)N(4)N(5)N(6) in the middle and variable parts {N} at the ends, where {N} represents A, C, G and T in equal proportions. The chip has 4096 pads with a complete set of hexamer sequences. The affinity was determined by measuring dissociation of the RNase-ssDNA complexes with the temperature increasing from 0 °C to 50 °C in quasi-equilibrium conditions. RNase binase showed the highest sequence-specificity of binding to motifs 5'-NNG(A/T/C)GNN-3' with the order of preference: GAG > GTG > GCG. High specificity towards G(A/T/C)G triplets was also confirmed by measuring fluorescent anisotropy of complexes of binase with selected oligodeoxyribonucleotides in solution. The affinity of RNase binase to other 3-nt sequences was also ranked. These results demonstrate the applicability of the method and provide the ground for further investigations of nonenzymatic functions of RNases.
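
The pad count quoted in the abstract follows from the four-letter DNA alphabet: a fixed hexamer core has 4^6 = 4096 possible sequences, one per pad. The short Python sketch below only illustrates that arithmetic by enumerating the cores. It is not the authors' chip-design software, and the variable names are assumptions.

```python
from itertools import product

BASES = "ACGT"

# One fixed hexamer core N1..N6 per gel pad; the flanking {N} positions
# are equimolar mixtures and therefore do not add further pads.
hexamers = ["".join(core) for core in product(BASES, repeat=6)]

print(len(hexamers))  # 4096
```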

Source
http://dx.doi.org/10.1093/nar/gkn246 (DOI Listing)
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC2425478 (PMC)
June 2008