Publications by authors named "Maarten van Iterson"

31 Publications

Large-scale cis- and trans-eQTL analyses identify thousands of genetic loci and polygenic scores that regulate blood gene expression.

Nat Genet 2021 Sep 2;53(9):1300-1310. Epub 2021 Sep 2.

Department of Genetic Medicine and Development, University of Geneva Medical School, Geneva, Switzerland.

Trait-associated genetic variants affect complex phenotypes primarily via regulatory mechanisms on the transcriptome. To investigate the genetics of gene expression, we performed cis- and trans-expression quantitative trait locus (eQTL) analyses using blood-derived expression from 31,684 individuals through the eQTLGen Consortium. We detected cis-eQTL for 88% of genes, and these were replicable in numerous tissues. Distal trans-eQTL (detected for 37% of 10,317 trait-associated variants tested) showed lower replication rates, partially due to low replication power and confounding by cell type composition. However, replication analyses in single-cell RNA-seq data prioritized intracellular trans-eQTL. Trans-eQTL exerted their effects via several mechanisms, primarily through regulation by transcription factors. Expression of 13% of the genes correlated with polygenic scores for 1,263 phenotypes, pinpointing potential drivers for those traits. In summary, this work represents a large eQTL resource, and its results serve as a starting point for in-depth interpretation of complex phenotypes.

Source
DOI: http://dx.doi.org/10.1038/s41588-021-00913-z
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC8432599
September 2021

Genome-wide identification of genes regulating DNA methylation using genetic anchors for causal inference.

Genome Biol 2020 08 28;21(1):220. Epub 2020 Aug 28.

Molecular Epidemiology, Department of Biomedical Data Sciences, Leiden University Medical Center, 2333 ZC, Leiden, The Netherlands.

Background: DNA methylation is a key epigenetic modification in human development and disease, yet there is limited understanding of its highly coordinated regulation. Here, we identify 818 genes that affect DNA methylation patterns in blood using large-scale population genomics data.

Results: By employing genetic instruments as causal anchors, we establish directed associations between gene expression and distant DNA methylation levels, while ensuring specificity of the associations by correcting for linkage disequilibrium and pleiotropy among neighboring genes. The identified genes are enriched for transcription factors, of which many consistently increased or decreased DNA methylation levels at multiple CpG sites. In addition, we show that a substantial number of transcription factors affected DNA methylation at their experimentally determined binding sites. We also observe genes encoding proteins with heterogenous functions that have widespread effects on DNA methylation, e.g., NFKBIE, CDCA7(L), and NLRC5, and for several examples, we suggest plausible mechanisms underlying their effect on DNA methylation.

Conclusion: We report hundreds of genes that affect DNA methylation and provide key insights in the principles underlying epigenetic regulation.

Source
DOI: http://dx.doi.org/10.1186/s13059-020-02114-z
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC7453518
August 2020

A characterization of cis- and trans-heritability of RNA-Seq-based gene expression.

Eur J Hum Genet 2020 02 26;28(2):253-263. Epub 2019 Sep 26.

Department of Biological Psychology, Amsterdam Public Health research institute, Vrije Universiteit Amsterdam, Amsterdam, The Netherlands.

Insights into individual differences in gene expression and its heritability (h²) can help in understanding pathways from DNA to phenotype. We estimated the heritability of gene expression of 52,844 genes measured in whole blood in the largest twin RNA-Seq sample to date (1497 individuals including 459 monozygotic twin pairs and 150 dizygotic twin pairs) from classical twin modeling and identity-by-state-based approaches. We estimated for each gene the total h², composed of cis-heritability (h²cis, the variance explained by single nucleotide polymorphisms in the cis-window of the gene), and trans-heritability (h²trans, the residual variance explained by all other genome-wide variants). Mean total h² was 0.26, which was significantly higher than heritability estimates earlier found in a microarray-based study using largely overlapping (>60%) RNA samples (mean h² = 0.14, p = 6.15 × 10). Mean h²cis was 0.06 and strongly correlated with beta of the top cis expression quantitative loci (eQTL, ρ = 0.76, p < 10) and with estimates from earlier RNA-Seq-based studies. Mean h²trans was 0.20 and correlated with the beta of the corresponding trans-eQTL (ρ = 0.04, p < 1.89 × 10) and was significantly higher for genes involved in cytokine-cytokine interactions (p = 4.22 × 10), many other immune system pathways, and genes identified in genome-wide association studies for various traits including behavioral disorders and cancer. This study provides a thorough characterization of cis- and trans-h² estimates of gene expression, which is of value for interpretation of GWAS and gene expression studies.
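
As a rough illustration of the quantities in this abstract, the sketch below uses Falconer's formula, a simplified stand-in for the classical twin modeling the authors mention and not their actual method. The function name and the twin correlations are hypothetical; only the mean cis and trans heritabilities (0.06 and 0.20) come from the abstract, and the sketch simply shows that they add up to the reported mean total.

```python
def falconer_h2(r_mz: float, r_dz: float) -> float:
    """Falconer's estimate of heritability from twin correlations.

    r_mz: trait correlation within monozygotic twin pairs
    r_dz: trait correlation within dizygotic twin pairs
    The result is clamped to the interval [0, 1].
    """
    h2 = 2.0 * (r_mz - r_dz)
    return min(max(h2, 0.0), 1.0)


# Hypothetical twin correlations, chosen for illustration only
# (these are NOT values from the study).
example_h2 = falconer_h2(r_mz=0.6, r_dz=0.4)

# Means reported in the abstract: total = cis + trans.
mean_h2_cis = 0.06
mean_h2_trans = 0.20
mean_h2_total = mean_h2_cis + mean_h2_trans

print(f"Falconer h2 for the illustrative correlations: {example_h2:.2f}")
print(f"Mean total h2 from the reported components: {mean_h2_total:.2f}")
```

The clamp reflects that a heritability outside [0, 1] is not interpretable; full twin models handle this through constrained variance components instead.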

Source
DOI: http://dx.doi.org/10.1038/s41431-019-0511-5
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6974598
February 2020

DNA methylation signatures of educational attainment.

NPJ Sci Learn 2018 Mar 23;3. Epub 2018 Mar 23.

Department of Biological Psychology, Vrije Universiteit Amsterdam, Amsterdam, The Netherlands.

Educational attainment is a key behavioural measure in studies of cognitive and physical health, and socioeconomic status. We measured DNA methylation at 410,746 CpGs (N = 4152) and identified 58 CpGs associated with educational attainment at loci characterized by pleiotropic functions shared with neuronal, immune and developmental processes. Associations overlapped with those for smoking behaviour, but remained after accounting for smoking at many CpGs: Effect sizes were on average 28% smaller and genome-wide significant at 11 CpGs after adjusting for smoking and were 62% smaller in never smokers. We examined sources and biological implications of education-related methylation differences, demonstrating correlations with maternal prenatal folate, smoking and air pollution signatures, and associations with gene expression in cis, dynamic methylation in foetal brain, and correlations between blood and brain. Our findings show that the methylome of lower-educated people resembles that of smokers beyond effects of their own smoking behaviour and shows traces of various other exposures.

Source
DOI: http://dx.doi.org/10.1038/s41539-018-0020-2
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6220239
March 2018

Smoking is Associated to DNA Methylation in Atherosclerotic Carotid Lesions.

Circ Genom Precis Med 2018 09;11(9):e002030

Laboratory of Clinical Chemistry and Hematology, University Medical Center Utrecht, University of Utrecht, the Netherlands (S.H., G.P.).

Background: Tobacco smoking is a major risk factor for atherosclerotic disease and has been associated with DNA methylation (DNAm) changes in blood cells. However, whether smoking influences DNAm in the diseased vascular wall is unknown but may prove crucial in understanding the pathophysiology of atherosclerosis. In this study, we associated current tobacco smoking to epigenome-wide DNAm in atherosclerotic plaques from patients undergoing carotid endarterectomy.

Methods: DNAm at commonly methylated sites (cytosine-guanine nucleotide pairs separated by a phospho-group [CpGs]) was assessed in atherosclerotic plaque samples and peripheral blood samples from 485 carotid endarterectomy patients. We tested the association of current tobacco smoking with DNAm corrected for age and sex. To control for bias and inflation because of cellular heterogeneity, we applied a Bayesian method to estimate an empirical null distribution as implemented by the R package bacon. Replication of the smoking-associated methylated CpGs in atherosclerotic plaques was executed in the second sample of 190 carotid endarterectomy patients, and results were meta-analyzed using a fixed-effects model.

Results: Tobacco smoking was significantly associated with differential DNAm in atherosclerotic lesions at 4 CpGs (false discovery rate <0.05) mapped to 2 different genes (AHRR, ITPK1) and with 17 CpGs mapped to 8 genes and RNAs in blood. The strongest associations were found for CpGs mapped to the gene AHRR, a repressor of the aryl hydrocarbon receptor transcription factor involved in xenobiotic detoxification. One of these methylated CpGs was found to be regulated by local genetic variation.

Conclusions: The risk factor tobacco smoking associates with DNAm at multiple loci in carotid atherosclerotic lesions. These observations support further investigation of the relationship between risk factors and epigenetic regulation in atherosclerotic disease.

Source
DOI: http://dx.doi.org/10.1161/CIRCGEN.117.002030
September 2018

Genome-wide identification of directed gene networks using large-scale population genomics data.

Nat Commun 2018 08 6;9(1):3097. Epub 2018 Aug 6.

Medical Statistics Section, Department of Medical Statistics and Bioinformatics, Leiden University Medical Center, Leiden, Zuid-Holland, 2333 ZC, The Netherlands.

Identification of causal drivers behind regulatory gene networks is crucial in understanding gene function. Here, we develop a method for the large-scale inference of gene-gene interactions in observational population genomics data that are both directed (using local genetic instruments as causal anchors, akin to Mendelian Randomization) and specific (by controlling for linkage disequilibrium and pleiotropy). Analysis of genotype and whole-blood RNA-sequencing data from 3072 individuals identified 49 genes as drivers of downstream transcriptional changes (Wald P < 7 × 10), among which transcription factors were overrepresented (Fisher's P = 3.3 × 10). Our analysis suggests new gene functions and targets, including for SENP7 (zinc-finger genes involved in retroviral repression) and BCL2A1 (target genes possibly involved in auditory dysfunction). Our work highlights the utility of population genomics data in deriving directed gene expression networks. A resource of trans-effects for all 6600 genes with a genetic instrument can be explored individually using a web-based browser.

Source
DOI: http://dx.doi.org/10.1038/s41467-018-05452-6
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6079029
August 2018

omicsPrint: detection of data linkage errors in multiple omics studies.

Bioinformatics 2018 06;34(12):2142-2143

Molecular Epidemiology, Department of Medical Statistics and Bioinformatics, Leiden University Medical Center, ZC Leiden, The Netherlands.

Summary: OmicsPrint is a versatile method for the detection of data linkage errors in multiple omics studies encompassing genetic, transcriptome and/or methylome data. OmicsPrint evaluates data linkage within and between omics data types using genotype calls from SNP arrays, DNA- or RNA-sequencing data and includes an algorithm to infer genotypes from Illumina DNA methylation array data. The method uses classification to verify assumed relationships and detect any data linkage errors, e.g. arising from sample mix-ups and mislabeling. Graphical and text output is provided to inspect and resolve putative data linkage errors. If sufficient genotype calls are available, first degree family relations also are revealed which can be used to check parent-offspring relations or zygosity in twin studies.

Availability And Implementation: omicsPrint is available from BioConductor; http://bioconductor.org/packages/omicsPrint.

Supplementary Information: Supplementary data are available at Bioinformatics online.

Source
DOI: http://dx.doi.org/10.1093/bioinformatics/bty062
June 2018

A SNP panel for identification of DNA and RNA specimens.

BMC Genomics 2018 01 25;19(1):90. Epub 2018 Jan 25.

Department of Human Genetics, Leiden University Medical Center, Postzone S4-P, PO Box 9600, 2300 RC, Leiden, The Netherlands.

Background: SNP panels that uniquely identify an individual are useful for genetic and forensic research. Previously recommended SNP panels are based on DNA profiles and mostly contain intragenic SNPs. With the increasing interest in RNA expression profiles, we aimed for establishing a SNP panel for both DNA and RNA-based genotyping.

Results: To determine a small set of SNPs with maximally discriminative power, genotype calls were obtained from DNA and blood-derived RNA sequencing data belonging to healthy, geographically dispersed, Dutch individuals. SNPs were selected based on different criteria like genotype call rate, minor allele frequency, Hardy-Weinberg equilibrium and linkage disequilibrium. A panel of 50 SNPs was sufficient to identify an individual uniquely: the probability of identity was 6.9 × 10 when assuming no family relations and 1.2 × 10 when accounting for the presence of full sibs. The ability of the SNP panel to uniquely identify individuals on DNA and RNA level was validated in an independent population dataset. The panel is applicable to individuals from European descent, with slightly lower power in non-Europeans. Whereas most of the genes containing the 50 SNPs are expressed in various tissues, our SNP panel needs optimization for other tissues than blood.

Conclusions: This first DNA/RNA SNP panel will be useful to identify sample mix-ups in biomedical research and for assigning DNA and RNA stains in crime scenes to unique individuals.
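
The probability of identity quoted in the Results can be illustrated with a minimal sketch. It assumes unrelated individuals, biallelic SNPs in Hardy-Weinberg equilibrium, and independence between SNPs (no linkage disequilibrium). The function names and the uniform allele frequency of 0.5 are hypothetical and are not taken from the paper, so the resulting number is not expected to match the published value.

```python
from math import prod


def probability_of_identity(maf: float) -> float:
    """Chance that two unrelated individuals share a genotype at one SNP.

    Under Hardy-Weinberg equilibrium the genotype frequencies are
    p^2, 2pq and q^2; the match probability is the sum of their squares.
    """
    p, q = 1.0 - maf, maf
    genotype_freqs = (p * p, 2.0 * p * q, q * q)
    return sum(f * f for f in genotype_freqs)


def panel_probability_of_identity(mafs) -> float:
    """Match probability across a panel of SNPs assumed to be independent."""
    return prod(probability_of_identity(m) for m in mafs)


# Hypothetical panel: 50 SNPs, each with minor allele frequency 0.5.
hypothetical_panel = [0.5] * 50
pi_panel = panel_probability_of_identity(hypothetical_panel)
print(f"Probability of identity for the hypothetical panel: {pi_panel:.2e}")
```

Selecting SNPs with a minor allele frequency close to 0.5 and low linkage disequilibrium, as the selection criteria in the abstract suggest, keeps the per-SNP match probability near its minimum of 0.375.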

Source
DOI: http://dx.doi.org/10.1186/s12864-018-4482-7
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC5785835
January 2018

Genetically defined elevated homocysteine levels do not result in widespread changes of DNA methylation in leukocytes.

PLoS One 2017 Oct 30;12(10):e0182472. Epub 2017 Oct 30.

Framingham Heart Study, Framingham, MA, United States of America.

Background: DNA methylation is affected by the activities of the key enzymes and intermediate metabolites of the one-carbon pathway, one of which involves homocysteine. We investigated the effect of the well-known genetic variant associated with mildly elevated homocysteine: MTHFR 677C>T independently and in combination with other homocysteine-associated variants, on genome-wide leukocyte DNA-methylation.

Methods: Methylation levels were assessed using Illumina 450k arrays on 9,894 individuals of European ancestry from 12 cohort studies. Linear-mixed-models were used to study the association of additive MTHFR 677C>T and genetic-risk score (GRS) based on 18 homocysteine-associated SNPs, with genome-wide methylation.

Results: Meta-analysis revealed that the MTHFR 677C>T variant was associated with 35 CpG sites in cis, and the GRS showed association with 113 CpG sites near the homocysteine-associated variants. Genome-wide analysis revealed that the MTHFR 677C>T variant was associated with 1 trans-CpG (nearest gene ZNF184), while the GRS model showed association with 5 significant trans-CpGs annotated to nearest genes PTF1A, MRPL55, CTDSP2, CRYM and FKBP5.

Conclusions: Our results do not show widespread changes in DNA-methylation across the genome, and therefore do not support the hypothesis that mildly elevated homocysteine is associated with widespread methylation changes in leukocytes.

Source
PLOS: http://journals.plos.org/plosone/article?id=10.1371/journal.pone.0182472
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC5662081
November 2017

IgG glycosylation and DNA methylation are interconnected with smoking.

Biochim Biophys Acta Gen Subj 2018 Mar 18;1862(3):637-648. Epub 2017 Oct 18.

Department of Twin Research and Genetic Epidemiology, King's College London, London, UK.

Background: Glycosylation is one of the most common post-translation modifications with large influences on protein structure and function. The effector function of immunoglobulin G (IgG) alters between pro- and anti-inflammatory, based on its glycosylation. IgG glycan synthesis is highly complex and dynamic.

Methods: With the use of two different analytical methods for assessing IgG glycosylation, we aim to elucidate the link between DNA methylation and glycosylation of IgG by means of epigenome-wide association studies. In total, 3000 individuals from 4 cohorts were analyzed.

Results: The overlap of the results from the two glycan measurement panels yielded DNA methylation of 7 CpG-sites on 5 genomic locations to be associated with IgG glycosylation: cg25189904 (chr.1, GNG12); cg05951221, cg21566642 and cg01940273 (chr.2, ALPPL2); cg05575921 (chr.5, AHRR); cg06126421 (6p21.33); and cg03636183 (chr.19, F2RL3). Mediation analyses with respect to smoking revealed that the effect of smoking on IgG glycosylation may be at least partially mediated via DNA methylation levels at these 7 CpG-sites.

Conclusion: Our results suggest the presence of an indirect link between DNA methylation and IgG glycosylation that may in part capture environmental exposures.

General Significance: An epigenome-wide analysis conducted in four population-based cohorts revealed an association between DNA methylation and IgG glycosylation patterns. Presumably, DNA methylation mediates the effect of smoking on IgG glycosylation.

Source
DOI: http://dx.doi.org/10.1016/j.bbagen.2017.10.012
March 2018

Differentiation-Defective Human Induced Pluripotent Stem Cells Reveal Strengths and Limitations of the Teratoma Assay and In Vitro Pluripotency Assays.

Stem Cell Reports 2017 05;8(5):1340-1353

Department of Anatomy & Embryology, Leiden University Medical Center, Einthovenweg 20, 2333 ZC Leiden, the Netherlands.

The ability to form teratomas in vivo containing multiple somatic cell types is regarded as functional evidence of pluripotency for human pluripotent stem cells (hPSCs). Since the Teratoma assay is animal dependent, laborious, and only qualitative, the PluriTest and the hPSC ScoreCard assay have been developed as in vitro alternatives. Here we compared normal hPSCs, induced hPSCs (hiPSCs) with reactivated reprogramming transgenes, and human embryonal carcinoma cells (hECs) in these assays. While normal hPSCs gave rise to typical teratomas, the xenografts of the hECs and the hiPSCs with reactivated reprogramming transgenes were largely undifferentiated and malignant. The hPSC ScoreCard assay confirmed the line-specific differentiation propensities in vitro. However, when undifferentiated cells were analyzed by the PluriTest, only hECs were identified as abnormal whereas all other cell lines were indistinguishable and resembled normal hPSCs. Our results indicate that pluripotency assays are best selected on the basis of intended downstream applications.

Source
DOI: http://dx.doi.org/10.1016/j.stemcr.2017.03.009
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC5425621
May 2017

Controlling bias and inflation in epigenome- and transcriptome-wide association studies using the empirical null distribution.

Genome Biol 2017 01 27;18(1):19. Epub 2017 Jan 27.

Molecular Epidemiology section, Department of Medical Statistics and Bioinformatics, Leiden University Medical Center, Leiden, the Netherlands.

We show that epigenome- and transcriptome-wide association studies (EWAS and TWAS) are prone to significant inflation and bias of test statistics, an unrecognized phenomenon introducing spurious findings if left unaddressed. Neither GWAS-based methodology nor state-of-the-art confounder adjustment methods completely remove bias and inflation. We propose a Bayesian method to control bias and inflation in EWAS and TWAS based on estimation of the empirical null distribution. Using simulations and real data, we demonstrate that our method maximizes power while properly controlling the false positive rate. We illustrate the utility of our method in large-scale EWAS and TWAS meta-analyses of age and smoking.
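
The idea of an empirical null can be sketched as follows. This is not the Bayesian method proposed in the paper; it swaps in a simple robust estimate (median and scaled median absolute deviation) under the assumption that most tests are null. The function names and the simulated test statistics are hypothetical.

```python
import random
import statistics


def estimate_empirical_null(z_scores):
    """Robustly estimate the centre (bias) and spread (inflation) of the null.

    Assumes most z-scores come from the null, so the median and the
    scaled median absolute deviation are dominated by null tests.
    """
    center = statistics.median(z_scores)
    mad = statistics.median([abs(z - center) for z in z_scores])
    scale = 1.4826 * mad  # makes the MAD consistent with a normal SD
    return center, scale


def correct_z_scores(z_scores):
    """Recentre and rescale z-scores using the estimated empirical null."""
    bias, inflation = estimate_empirical_null(z_scores)
    return [(z - bias) / inflation for z in z_scores]


# Simulated null test statistics with bias 0.2 and inflation 1.3
# (illustrative values only).
rng = random.Random(0)
simulated = [rng.gauss(0.2, 1.3) for _ in range(20000)]

bias, inflation = estimate_empirical_null(simulated)
corrected = correct_z_scores(simulated)
print(f"Estimated bias: {bias:.2f}, estimated inflation: {inflation:.2f}")
```

Under the theoretical null the bias would be 0 and the inflation 1; estimates that depart from these values indicate that uncorrected test statistics would give misleading p-values.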

Source
DOI: http://dx.doi.org/10.1186/s13059-016-1131-9
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC5273857
January 2017

Molecular dissection of germline chromothripsis in a developmental context using patient-derived iPS cells.

Genome Med 2017 01 26;9(1). Epub 2017 Jan 26.

Center for Molecular Medicine and Cancer Genomics Netherlands, Division Biomedical Genetics, University Medical Center Utrecht, Universiteitsweg 100, Utrecht, 3584CG, The Netherlands.

Background: Germline chromothripsis causes complex genomic rearrangements that are likely to affect multiple genes and their regulatory contexts. The contribution of individual rearrangements and affected genes to the phenotypes of patients with complex germline genomic rearrangements is generally unknown.

Methods: To dissect the impact of germline chromothripsis in a relevant developmental context, we performed trio-based RNA expression analysis on blood cells, induced pluripotent stem cells (iPSCs), and iPSC-derived neuronal cells from a patient with de novo germline chromothripsis and both healthy parents. In addition, Hi-C and 4C-seq experiments were performed to determine the effects of the genomic rearrangements on transcription regulation of genes in the proximity of the breakpoint junctions.

Results: Sixty-seven genes are located within 1 Mb of the complex chromothripsis rearrangements involving 17 breakpoints on four chromosomes. We find that three of these genes (FOXP1, DPYD, and TWIST1) are both associated with developmental disorders and differentially expressed in the patient. Interestingly, the effect on TWIST1 expression was exclusively detectable in the patient's iPSC-derived neuronal cells, stressing the need for studying developmental disorders in the biologically relevant context. Chromosome conformation capture analyses show that TWIST1 lost genomic interactions with several enhancers due to the chromothripsis event, which likely led to deregulation of TWIST1 expression and contributed to the patient's craniosynostosis phenotype.

Conclusions: We demonstrate that a combination of patient-derived iPSC differentiation and trio-based molecular profiling is a powerful approach to improve the interpretation of pathogenic complex genomic rearrangements. Here we have applied this approach to identify misexpression of TWIST1, FOXP1, and DPYD as key contributors to the complex congenital phenotype resulting from germline chromothripsis rearrangements.

Source
DOI: http://dx.doi.org/10.1186/s13073-017-0399-z
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC5270341
January 2017

Disease variants alter transcription factor levels and methylation of their binding sites.

Nat Genet 2017 01 5;49(1):131-138. Epub 2016 Dec 5.

Department of Epidemiology, ErasmusMC, Rotterdam, the Netherlands.

Most disease-associated genetic variants are noncoding, making it challenging to design experiments to understand their functional consequences. Identification of expression quantitative trait loci (eQTLs) has been a powerful approach to infer the downstream effects of disease-associated variants, but most of these variants remain unexplained. The analysis of DNA methylation, a key component of the epigenome, offers highly complementary data on the regulatory potential of genomic regions. Here we show that disease-associated variants have widespread effects on DNA methylation in trans that likely reflect differential occupancy of trans binding sites by cis-regulated transcription factors. Using multiple omics data sets from 3,841 Dutch individuals, we identified 1,907 established trait-associated SNPs that affect the methylation levels of 10,141 different CpG sites in trans (false discovery rate (FDR) < 0.05). These included SNPs that affect both the expression of a nearby transcription factor (such as NFKB1, CTCF and NKX2-3) and methylation of its respective binding site across the genome. Trans methylation QTLs effectively expose the downstream effects of disease-associated variants.

Source
DOI: http://dx.doi.org/10.1038/ng.3721
January 2017

Identification of context-dependent expression quantitative trait loci in whole blood.

Nat Genet 2017 01 5;49(1):139-145. Epub 2016 Dec 5.

Molecular Epidemiology Section, Department of Medical Statistics and Bioinformatics, Leiden University Medical Center, Leiden, the Netherlands.

Genetic risk factors often localize to noncoding regions of the genome with unknown effects on disease etiology. Expression quantitative trait loci (eQTLs) help to explain the regulatory mechanisms underlying these genetic associations. Knowledge of the context that determines the nature and strength of eQTLs may help identify cell types relevant to pathophysiology and the regulatory networks underlying disease. Here we generated peripheral blood RNA-seq data from 2,116 unrelated individuals and systematically identified context-dependent eQTLs using a hypothesis-free strategy that does not require previous knowledge of the identity of the modifiers. Of the 23,060 significant cis-regulated genes (false discovery rate (FDR) ≤ 0.05), 2,743 (12%) showed context-dependent eQTL effects. The majority of these effects were influenced by cell type composition. A set of 145 cis-eQTLs depended on type I interferon signaling. Others were modulated by specific transcription factors binding to the eQTL SNPs.

Source
DOI: http://dx.doi.org/10.1038/ng.3737
January 2017

Age-related accrual of methylomic variability is linked to fundamental ageing mechanisms.

Genome Biol 2016 Sep 22;17(1):191. Epub 2016 Sep 22.

Department of Internal Medicine and School for Cardiovascular Diseases (CARIM), Maastricht University Medical Center, Universiteitssingel 50, 6229 ER, Maastricht, The Netherlands.

Background: Epigenetic change is a hallmark of ageing but its link to ageing mechanisms in humans remains poorly understood. While DNA methylation at many CpG sites closely tracks chronological age, DNA methylation changes relevant to biological age are expected to gradually dissociate from chronological age, mirroring the increased heterogeneity in health status at older ages.

Results: Here, we report on the large-scale identification of 6366 age-related variably methylated positions (aVMPs) identified in 3295 whole blood DNA methylation profiles, 2044 of which have a matching RNA-seq gene expression profile. aVMPs are enriched at polycomb repressed regions and, accordingly, methylation at those positions is associated with the expression of genes encoding components of polycomb repressive complex 2 (PRC2) in trans. Further analysis revealed trans-associations for 1816 aVMPs with an additional 854 genes. These trans-associated aVMPs are characterized by either an age-related gain of methylation at CpG islands marked by PRC2 or a loss of methylation at enhancers. This distinct pattern extends to other tissues and multiple cancer types. Finally, genes associated with aVMPs in trans whose expression is variably upregulated with age (733 genes) play a key role in DNA repair and apoptosis, whereas downregulated aVMP-associated genes (121 genes) are mapped to defined pathways in cellular metabolism.

Conclusions: Our results link age-related changes in DNA methylation to fundamental mechanisms that are thought to drive human ageing.

Source
DOI: http://dx.doi.org/10.1186/s13059-016-1053-6
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC5032245
September 2016

Blood lipids influence DNA methylation in circulating cells.

Genome Biol 2016 06 27;17(1):138. Epub 2016 Jun 27.

Department of Genetics, University of Groningen, University Medical Centre Groningen, Broerstraat 5, Groningen, The Netherlands.

Background: Cells can be primed by external stimuli to obtain a long-term epigenetic memory. We hypothesize that long-term exposure to elevated blood lipids can prime circulating immune cells through changes in DNA methylation, a process that may contribute to the development of atherosclerosis. To interrogate the causal relationship between triglyceride, low-density lipoprotein (LDL) cholesterol, and high-density lipoprotein (HDL) cholesterol levels and genome-wide DNA methylation while excluding confounding and pleiotropy, we perform a stepwise Mendelian randomization analysis in whole blood of 3296 individuals.

Results: This analysis shows that differential methylation is the consequence of inter-individual variation in blood lipid levels and not vice versa. Specifically, we observe an effect of triglycerides on DNA methylation at three CpGs, of LDL cholesterol at one CpG, and of HDL cholesterol at two CpGs using multivariable Mendelian randomization. Using RNA-seq data available for a large subset of individuals (N = 2044), DNA methylation of these six CpGs is associated with the expression of CPT1A and SREBF1 (for triglycerides), DHCR24 (for LDL cholesterol) and ABCG1 (for HDL cholesterol), which are all key regulators of lipid metabolism.

Conclusions: Our analysis suggests a role for epigenetic priming in end-product feedback control of lipid metabolism and highlights Mendelian randomization as an effective tool to infer causal relationships in integrative genomics data.
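
For readers unfamiliar with Mendelian randomization, the simplest single-instrument version is the Wald ratio, sketched below. It is far simpler than the stepwise and multivariable analyses described in this abstract and is shown only to convey the principle. The function name and the input effect sizes are hypothetical.

```python
def wald_ratio(beta_gx: float, beta_gy: float, se_gy: float):
    """Single-instrument Mendelian randomization estimate.

    beta_gx: effect of the genetic instrument on the exposure (e.g. a lipid)
    beta_gy: effect of the same instrument on the outcome (e.g. methylation)
    se_gy:   standard error of beta_gy
    Returns the causal estimate and a first-order standard error that
    ignores the uncertainty in beta_gx.
    """
    if beta_gx == 0:
        raise ValueError("The instrument must be associated with the exposure.")
    estimate = beta_gy / beta_gx
    standard_error = se_gy / abs(beta_gx)
    return estimate, standard_error


# Hypothetical effect sizes, for illustration only.
estimate, standard_error = wald_ratio(beta_gx=0.5, beta_gy=0.1, se_gy=0.02)
print(f"Causal estimate: {estimate:.2f} (SE {standard_error:.2f})")
```

The ratio is only interpretable as a causal effect if the instrument affects the outcome solely through the exposure, which is why the abstract stresses excluding confounding and pleiotropy.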

Source
DOI: http://dx.doi.org/10.1186/s13059-016-1000-6
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC4922056
June 2016

Genetic and environmental influences interact with age and sex in shaping the human methylome.

Nat Commun 2016 Apr 7;7:11115. Epub 2016 Apr 7.

Department of Biological Psychology, VU Amsterdam, Van der Boechorststraat 1, 1081BT Amsterdam, The Netherlands.

The methylome is subject to genetic and environmental effects. Their impact may depend on sex and age, resulting in sex- and age-related physiological variation and disease susceptibility. Here we estimate the total heritability of DNA methylation levels in whole blood and estimate the variance explained by common single nucleotide polymorphisms at 411,169 sites in 2,603 individuals from twin families, to establish a catalogue of between-individual variation in DNA methylation. Heritability estimates vary across the genome (mean=19%) and interaction analyses reveal thousands of sites with sex-specific heritability as well as sites where the environmental variance increases with age. Integration with previously published data illustrates the impact of genome and environment across the lifespan at methylation sites associated with metabolic traits, smoking and ageing. These findings demonstrate that our catalogue holds valuable information on locations in the genome where methylation variation between people may reflect disease-relevant environmental exposures or genetic variation.

Source
DOI: http://dx.doi.org/10.1038/ncomms11115
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC4820961
April 2016

MethylAid: visual and interactive quality control of large Illumina 450k datasets.

Bioinformatics 2014 Dec 21;30(23):3435-7. Epub 2014 Aug 21.

Department of Molecular Epidemiology, Leiden University Medical Center, 2333 ZC Leiden, the Netherlands.

The Illumina 450k array is a frequently used platform for large-scale genome-wide DNA methylation studies, i.e. epigenome-wide association studies. Currently, quality control of 450k data can be performed with Illumina's GenomeStudio and is part of a limited number of 450k analysis pipelines. However, GenomeStudio cannot handle large-scale studies, and existing pipelines provide limited options for quality control and do not support interactive exploration by the user. To make the detection of bad-quality samples in large-scale genome-wide DNA methylation studies as flexible and transparent as possible, we have developed MethylAid, a visual and interactive Web application using RStudio's shiny package. Bad-quality samples are detected using sample-dependent and sample-independent quality control probes present on the array and user-adjustable thresholds. In-depth exploration of bad-quality samples can be performed using several interactive diagnostic plots. Furthermore, plots can be annotated with user-provided metadata, for example, to identify outlying batches. Our new tool makes quality assessment of 450k array data interactive, flexible and efficient and is, therefore, expected to be useful for both data analysts and core facilities.

Availability And Implementation: MethylAid is implemented as an R/Bioconductor package (www.bioconductor.org/packages/3.0/bioc/html/MethylAid.html). A demo application is available from shiny.bioexp.nl/MethylAid.

Source
http://dx.doi.org/10.1093/bioinformatics/btu566 (DOI Listing)
December 2014

Extensive localization of long noncoding RNAs to the cytosol and mono- and polyribosomal complexes.

Genome Biol 2014 Jan 7;15(1):R6. Epub 2014 Jan 7.

Background: Long noncoding RNAs (lncRNAs) form an abundant class of transcripts, but the function of the majority of them remains elusive. While it has been shown that some lncRNAs are bound by ribosomes, it has also been convincingly demonstrated that these transcripts do not code for proteins. To obtain a comprehensive understanding of the extent to which lncRNAs bind ribosomes, we performed systematic RNA sequencing on ribosome-associated RNA pools obtained through ribosomal fractionation and compared the RNA content with nuclear and (non-ribosome bound) cytosolic RNA pools.

Results: The RNA composition of the subcellular fractions differs significantly from each other, but lncRNAs are found in all locations. A subset of specific lncRNAs is enriched in the nucleus but surprisingly the majority is enriched in the cytosol and in ribosomal fractions. The ribosomal enriched lncRNAs include H19 and TUG1.

Conclusions: Most studies on lncRNAs have focused on the regulatory function of these transcripts in the nucleus. We demonstrate that only a minority of all lncRNAs are nuclear enriched. Our findings suggest that many lncRNAs may have a function in cytoplasmic processes, and in particular in ribosome complexes.

Source
http://dx.doi.org/10.1186/gb-2014-15-1-r6 (DOI Listing)
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC4053777 (PMC)
January 2014

Transcriptome and genome sequencing uncovers functional variation in humans.

Nature 2013 Sep 15;501(7468):506-11. Epub 2013 Sep 15.

Department of Genetic Medicine and Development, University of Geneva Medical School, 1211 Geneva, Switzerland.

Genome sequencing projects are discovering millions of genetic variants in humans, and interpretation of their functional effects is essential for understanding the genetic basis of variation in human traits. Here we report sequencing and deep analysis of messenger RNA and microRNA from lymphoblastoid cell lines of 462 individuals from the 1000 Genomes Project--the first uniformly processed high-throughput RNA-sequencing data from multiple human populations with high-quality genome sequences. We discover extremely widespread genetic variation affecting the regulation of most genes, with transcript structure and expression level variation being equally common but genetically largely independent. Our characterization of causal regulatory variation sheds light on the cellular mechanisms of regulatory and loss-of-function variation, and allows us to infer putative causal variants for dozens of disease-associated loci. Altogether, this study provides a deep understanding of the cellular mechanisms of transcriptome variation and of the landscape of functional variants in the human genome.

Source
http://dx.doi.org/10.1038/nature12531 (DOI Listing)
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3918453 (PMC)
September 2013

General power and sample size calculations for high-dimensional genomic data.

Stat Appl Genet Mol Biol 2013 Aug;12(4):449-67

Center for Human and Clinical Genetics, Leiden University Medical Center, Leiden, The Netherlands.

In the design of microarray or next-generation sequencing experiments it is crucial to choose the appropriate number of biological replicates. Often the number of differentially expressed genes and their effect sizes are small, and too few replicates will lead to insufficient power to detect them. On the other hand, too many replicates unnecessarily lead to high experimental costs. Power and sample size analysis can guide experimentalists in choosing the appropriate number of biological replicates. Several methods for power and sample size analysis have recently been proposed for microarray data. However, most of these are restricted to two-group comparisons and require user-defined effect sizes. Here we propose a pilot-data based method for power and sample size analysis which can handle more general experimental designs and uses pilot data to obtain estimates of the effect sizes. The method can also handle χ2-distributed test statistics, which enables power and sample size calculations for a much wider class of models, including high-dimensional generalized linear models which are used, e.g., for RNA-seq data analysis. The performance of the method is evaluated using simulated and experimental data from several microarray and next-generation sequencing experiments. Furthermore, we compare our proposed method for estimation of the density of effect sizes from pilot data with a recently proposed method specific to two-group comparisons.
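
For orientation only, the sketch below shows the kind of simple two-group power calculation with a user-defined effect size that the abstract contrasts its method against. It is not the pilot-data method of the paper. The function name `two_group_power`, the normal (z-test) approximation, and the example numbers are all assumptions made for illustration.

```python
from statistics import NormalDist


def two_group_power(effect_size, n_per_group, alpha=0.05):
    """Approximate power of a two-sided, two-sample z-test.

    effect_size is the standardized mean difference (difference in means
    divided by the common standard deviation); this is a user-defined
    quantity, which is exactly what a pilot-data method tries to avoid.
    """
    z = NormalDist()  # standard normal distribution
    z_crit = z.inv_cdf(1 - alpha / 2)  # two-sided critical value
    # Expected value of the test statistic under the alternative.
    shift = effect_size * (n_per_group / 2) ** 0.5
    # Probability of rejecting in either tail.
    return z.cdf(shift - z_crit) + z.cdf(-shift - z_crit)


if __name__ == "__main__":
    # Hypothetical example: power for a fixed effect size as replicates grow.
    for n in (4, 8, 16, 32, 64):
        print(n, round(two_group_power(0.5, n), 3))
```

With an effect size of zero the function returns the significance level itself, which is a quick sanity check on the approximation.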

Source
http://dx.doi.org/10.1515/sagmb-2012-0046 (DOI Listing)
August 2013

Integrated analysis of microRNA and mRNA expression: adding biological significance to microRNA target predictions.

Nucleic Acids Res 2013 Aug 14;41(15):e146. Epub 2013 Jun 14.

Center for Human and Clinical Genetics and Leiden University Medical Center, Leiden Genome Technology Center, Leiden University Medical Center, Einthovenweg 20, 2300 ZC Leiden, The Netherlands, Netherlands Bioinformatics Centre, P.O. Box 9101, 6500 HB Nijmegen, The Netherlands, Department of Epidemiology and Biostatistics, VU University Medical Center, De Boelelaan 1118, 1081 HZ Amsterdam, The Netherlands and Department of Pediatric Oncology, Erasmus Medical Center - Sophia Children's Hospital, Dr. Molewaterplein 60, 3015 GJ Rotterdam, The Netherlands.

Current microRNA target predictions are based on sequence information and empirically derived rules but do not make use of the expression of microRNAs and their targets. This study aimed to improve microRNA target predictions in a given biological context, using in silico predictions, microRNA and mRNA expression. We used target prediction tools to produce lists of predicted targets and used a gene set test designed to detect consistent effects of microRNAs on the joint expression of multiple targets. In a single test, association between microRNA expression and target gene set expression as well as the contribution of the individual target genes on the association are determined. The strongest negatively associated mRNAs as measured by the test were prioritized. We applied our integration method to a well-defined muscle differentiation model. Validation of our predictions in C2C12 cells confirmed predicted targets of known as well as novel muscle-related microRNAs. We further studied associations between microRNA-mRNA pairs in human prostate cancer, finding some pairs that have been recently experimentally validated by others. Using the same study, we showed the advantages of the global test over Pearson correlation and lasso. We conclude that our integrated approach successfully identifies regulated microRNAs and their targets.

Source
http://dx.doi.org/10.1093/nar/gkt525 (DOI Listing)
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3753644 (PMC)
August 2013

A novel and fast normalization method for high-density arrays.

Stat Appl Genet Mol Biol 2012 Jul 12;11(4). Epub 2012 Jul 12.

Center for Human and Clinical Genetics, Leiden University Medical Center.

Background: Among the most commonly applied microarray normalization methods are intensity-dependent normalization methods such as lowess or loess algorithms. Their computational complexity makes them slow and thus less suitable for normalization of large datasets. Current implementations try to circumvent this problem by using a random subset of the data for normalization, but the impact of this modification has not been previously assessed. We developed a novel intensity-dependent normalization method for microarrays that is fast, simple and can include weighting of observations.

Results: Our normalization method is based on the P-spline scatterplot smoother using all data points for normalization. We show that using a random subset of the data for normalization should be avoided as unstable results can be produced. However, in certain cases normalization based on an invariant subset is desirable, for example, when groups of samples before and after intervention are compared. We show in the context of DNA methylation arrays that a constant weighted P-spline normalization yields a more reliable normalization curve than the one obtained by normalization on the invariant subset only.

Conclusions: Our novel intensity-dependent normalization method is simpler and faster than current loess algorithms, and can be applied to one- and two-colour array data, similar to normalization based on loess.

Availability: An implementation of the method is currently available as an R package called TurboNorm from www.bioconductor.org.

Source
http://dx.doi.org/10.1515/1544-6115.1753 (DOI Listing)
July 2012

The effects of low levels of dystrophin on mouse muscle function and pathology.

PLoS One 2012 16;7(2):e31937. Epub 2012 Feb 16.

Department of Human Genetics, Leiden University Medical Center, Leiden, The Netherlands.

Duchenne muscular dystrophy (DMD) is a severe progressive muscular disorder caused by reading frame disrupting mutations in the DMD gene, preventing the synthesis of functional dystrophin. As dystrophin provides muscle fiber stability during contractions, dystrophin negative fibers are prone to exercise-induced damage. Upon exhaustion of the regenerative capacity, fibers will be replaced by fibrotic and fat tissue resulting in a progressive loss of function eventually leading to death in the early thirties. With several promising approaches for the treatment of DMD aiming at dystrophin restoration in clinical trials, there is an increasing need to determine more precisely which dystrophin levels are sufficient to restore muscle fiber integrity, protect against muscle damage and improve muscle function. To address this we generated a new mouse model (mdx-Xist(Δhs)) with varying, low dystrophin levels (3-47%, mean 22.7%, stdev 12.1, n = 24) due to skewed X-inactivation. Longitudinal sections revealed that within individual fibers, some nuclei did and some did not express dystrophin, resulting in a random, mosaic pattern of dystrophin expression within fibers. Mdx-Xist(Δhs), mdx and wild type females underwent a 12-week functional test regime consisting of different tests to assess muscle function at baseline, or after chronic treadmill running exercise. Overall, mdx-Xist(Δhs) mice with 3-14% dystrophin outperformed mdx mice in the functional tests. Improved histopathology was observed in mice with 15-29% dystrophin and these levels also resulted in normalized expression of pro-inflammatory biomarker genes, while for other parameters >30% of dystrophin was needed. Chronic exercise clearly worsened pathology, which needed dystrophin levels >20% for protection. Based on these findings, we conclude that while even dystrophin levels below 15% can improve pathology and performance, levels of >20% are needed to fully protect muscle fibers from exercise-induced damage.

Source
http://journals.plos.org/plosone/article?id=10.1371/journal.pone.0031937 (PLOS)
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3281102 (PMC)
August 2012

Comparison of skeletal muscle pathology and motor function of dystrophin and utrophin deficient mouse strains.

Neuromuscul Disord 2012 May 27;22(5):406-17. Epub 2012 Jan 27.

Department of Human Genetics, Leiden University Medical Center, Leiden, The Netherlands.

The genetic defect of mdx mice resembles that of Duchenne muscular dystrophy, although their functional performance and life expectancy are nearly normal. By contrast, mice lacking utrophin and dystrophin (mdx/utrn -/-) are severely affected and die prematurely. Mice with one utrophin allele (mdx/utrn +/-) are more severely affected than mdx mice, but outlive mdx/utrn -/- mice. We subjected mdx/utrn +/+, +/-, -/- and wild type males to a 12-week functional test regime of four different functional tests. Mdx/utrn +/+ and +/- mice completed the regime, while mdx/utrn -/- mice died prematurely. Mdx/utrn +/- mice performed significantly worse compared to mdx/utrn +/+ mice in functional tests. Creatine kinase levels, percentage of fibrotic/necrotic tissue, morphology of neuromuscular synapses and expression of biomarker genes were comparable, whereas mdx/utrn +/- and -/- mice had increased levels of regenerating fibers. This makes mdx/utrn +/- mice valuable for testing the benefit of potential therapies on muscle function parameters.

Source
http://dx.doi.org/10.1016/j.nmd.2011.10.011 (DOI Listing)
May 2012

Single molecule sequencing of free DNA from maternal plasma for noninvasive trisomy 21 detection.

Clin Chem 2012 Apr 25;58(4):699-706. Epub 2012 Jan 25.

Center for Human and Clinical Genetics, Laboratory for Diagnostic Genome Analysis, Leiden University Medical Center, Leiden, the Netherlands.

Background: Noninvasive fetal aneuploidy detection by use of free DNA from maternal plasma has recently been shown to be achievable by whole genome shotgun sequencing. The high-throughput next-generation sequencing platforms previously tested use a PCR step during sample preparation, which results in amplification bias in GC-rich areas of the human genome. To eliminate this bias, and thereby experimental noise, we have used single molecule sequencing as an alternative method.

Methods: For noninvasive trisomy 21 detection, we performed single molecule sequencing on the Helicos platform using free DNA isolated from maternal plasma from 9 weeks of gestation onwards. Relative sequence tag density ratios were calculated and results were directly compared to the previously described Illumina GAII platform.

Results: Sequence data generated without an amplification step show no GC bias. Therefore, with the use of single molecule sequencing all trisomy 21 fetuses could be distinguished more clearly from euploid fetuses.

Conclusions: This study shows for the first time that single molecule sequencing is an attractive and easy to use alternative for reliable noninvasive fetal aneuploidy detection in diagnostics. With this approach, previously described experimental noise associated with PCR amplification, such as GC bias, can be overcome.

Source
http://dx.doi.org/10.1373/clinchem.2011.174698 (DOI Listing)
April 2012

Resolving confusion of tongues in statistics and machine learning: a primer for biologists and bioinformaticians.

Proteomics 2012 Feb 23;12(4-5):543-9. Epub 2012 Jan 23.

Center for Human and Clinical Genetics, Leiden University Medical Center, The Netherlands.

Bioinformatics is the field where computational methods from various domains have come together for analysis of biological data. Each domain has introduced its own specific jargon. However, in closely related domains, e.g. machine learning and statistics, both concordant and discordant terminology occurs, and the latter can lead to confusion. This article aims to help solve the confusion of tongues arising from these two closely related domains, which are frequently used in bioinformatics. We provide a short summary of the most commonly applied machine learning and statistical approaches to data analysis in bioinformatics, i.e. classification and statistical hypothesis testing. We explain differences and similarities in common terminology used in various domains, such as precision, recall, sensitivity and true positive rate. This primer can serve as a guide to the terminology used in these fields.
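
As a small illustration of the overlapping terminology mentioned in the abstract, the sketch below computes several of these quantities from one confusion matrix using their standard definitions. The function name `confusion_metrics` and the example counts are hypothetical and are not taken from the article.

```python
def confusion_metrics(tp, fp, fn, tn):
    """Return common classification metrics from confusion-matrix counts.

    Several names refer to the same quantity:
      recall == sensitivity == true positive rate
      precision == positive predictive value == 1 - false discovery rate
    """
    recall = tp / (tp + fn)
    precision = tp / (tp + fp)
    return {
        "recall": recall,
        "sensitivity": recall,          # same quantity, statistics jargon
        "true_positive_rate": recall,   # same quantity, ROC jargon
        "precision": precision,
        "false_discovery_rate": fp / (tp + fp),
        "specificity": tn / (tn + fp),
        "false_positive_rate": fp / (fp + tn),
    }


if __name__ == "__main__":
    # Hypothetical counts chosen only to make the arithmetic easy to follow.
    for name, value in confusion_metrics(tp=40, fp=10, fn=20, tn=30).items():
        print(f"{name}: {value:.3f}")
```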

Source
http://dx.doi.org/10.1002/pmic.201100395 (DOI Listing)
February 2012

Filtering, FDR and power.

BMC Bioinformatics 2010 Sep 7;11:450. Epub 2010 Sep 7.

Department of Epidemiology and Biostatistics, VU University Medical Center, Amsterdam, 1007 MB, The Netherlands.

Background: In high-dimensional data analysis such as differential gene expression analysis, people often use filtering methods like fold-change or variance filters in an attempt to reduce the multiple testing penalty and improve power. However, filtering may introduce a bias on the multiple testing correction. The precise amount of bias depends on many quantities, such as fraction of probes filtered out, filter statistic and test statistic used.

Results: We show that a biased multiple testing correction results if non-differentially expressed probes are not filtered out with equal probability from the entire range of p-values. We illustrate our results using both a simulation study and an experimental dataset, where the FDR is shown to be biased mostly by filters that are associated with the hypothesis being tested, such as the fold change. Filters that induce little bias on the FDR yield less additional power of detecting differentially expressed genes. Finally, we propose a statistical test that can be used in practice to determine whether any chosen filter introduces bias on the FDR estimate used, given a general experimental setup.

Conclusions: Filtering out of probes must be used with care as it may bias the multiple testing correction. Researchers can use our test for FDR bias to guide their choice of filter and amount of filtering in practice.
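
To make the "multiple testing penalty" from the Background concrete, the sketch below applies a Benjamini-Hochberg adjustment before and after dropping some probes. The abstract does not say which FDR procedure is used, so Benjamini-Hochberg is an assumption here, and the p-values and the choice of which probes to drop are invented for illustration. The sketch shows only why filtering is tempting; it does not implement the bias test proposed in the paper.

```python
def bh_adjust(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvalues)
    # Indices of the p-values from smallest to largest.
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


if __name__ == "__main__":
    all_probes = [0.01, 0.5, 0.9, 0.8]  # hypothetical p-values
    kept_probes = [0.01, 0.5]           # hypothetical filter drops two probes
    print(bh_adjust(all_probes))
    print(bh_adjust(kept_probes))
```

Fewer tests give the smallest p-value a smaller adjusted value, which is the apparent gain in power. The abstract's warning is that this gain is only trustworthy when the filter removes non-differentially expressed probes with equal probability across the whole range of p-values.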

Source
http://dx.doi.org/10.1186/1471-2105-11-450 (DOI Listing)
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC2949886 (PMC)
September 2010

Finding transcriptomics biomarkers for in vivo identification of (non-)genotoxic carcinogens using wild-type and Xpa/p53 mutant mouse models.

Carcinogenesis 2009 Oct 20;30(10):1805-12. Epub 2009 Aug 20.

MicroArray Department and Integrative Bioinformatics Unit, Swammerdam Institute for Life Sciences, Faculty of Science, University of Amsterdam, Amsterdam, The Netherlands.

The carcinogenic potential of chemicals and pharmaceuticals is traditionally tested in the chronic, 2 year rodent bioassay. This assay is not only time consuming, expensive and often with a limited sensitivity and specificity but it also causes major distress to the experimental animals. A major improvement in carcinogenicity testing, especially regarding reduction and refinement of animal experimentation, could be the application of toxicogenomics. The ultimate aim of this study is to demonstrate a proof-of-principle for transcriptomics biomarkers in various tissues for identification of (subclasses of) carcinogenic compounds after short-term in vivo exposure studies. Both wild-type and DNA repair-deficient Xpa(-/-)/p53(+/-) (Xpa/p53) mice were exposed up to 14 days to compounds of three distinct classes: genotoxic carcinogens (GTXC), non-genotoxic carcinogens (NGTXC) and non-carcinogens. Subsequently, extensive transcriptomics analyses were performed on several tissues, and transcriptomics data were screened for potential biomarkers using advanced statistical learning techniques. For all tissues analyzed, we identified multigene gene-expression signatures that are, with a high confidence, predictive for GTXC and NGTXC exposures in both mouse genotypes. Xpa/p53 mice did not perform better in the short-term bioassay. We were able to achieve a proof-of-principle for the identification and use of transcriptomics biomarkers for GTXC or NGTXC. This supports the view that toxicogenomics with short-term in vivo exposure provides a viable tool for classifying (geno)toxic compounds.

Source
http://dx.doi.org/10.1093/carcin/bgp190 (DOI Listing)
October 2009
-->