Publications by authors named "Olivier Delaneau"

46 Publications

Differentially expressed genes reflect disease-induced rather than disease-causing changes in the transcriptome.

Nat Commun 2021 09 24;12(1):5647. Epub 2021 Sep 24.

Swiss Institute of Bioinformatics, Lausanne, Switzerland.

Comparing transcript levels between healthy and diseased individuals allows the identification of differentially expressed genes, which may be causes, consequences or mere correlates of the disease under scrutiny. We propose a method to decompose the observational correlation between gene expression and phenotypes into components driven by confounders and by forward and reverse causal effects. The bi-directional causal effects between gene expression and complex traits are obtained by Mendelian Randomization integrating summary-level data from GWAS and whole-blood eQTLs. Applying this approach to complex traits reveals that forward effects make a negligible contribution. For example, BMI- and triglycerides-gene expression correlation coefficients robustly correlate with trait-to-expression causal effects (r= 0.11, P= 2.0 × 10 and r= 0.13, P= 1.1 × 10), but not detectably with expression-to-trait effects. Our results demonstrate that studies comparing the transcriptome of diseased and healthy subjects are more likely to reveal disease-induced gene expression changes than disease-causing ones.
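
As an illustration of the bi-directional Mendelian randomization idea in this abstract, the sketch below computes a fixed-effect inverse-variance-weighted (IVW) estimate from summary statistics, once with expression as the exposure and once with the trait as the exposure. The abstract does not say which estimator the authors used, so IVW is an assumption here, and the function name and all numbers are hypothetical toy values rather than results from the paper.

```python
from math import sqrt


def ivw_estimate(beta_exposure, beta_outcome, se_outcome):
    """Fixed-effect IVW causal estimate and its first-order standard error.

    Each list holds one entry per genetic instrument (SNP): the
    SNP-exposure effect, the SNP-outcome effect, and the standard
    error of the SNP-outcome effect.
    """
    # Weight of each per-SNP Wald ratio (beta_outcome / beta_exposure).
    weights = [bx * bx / (se * se) for bx, se in zip(beta_exposure, se_outcome)]
    ratios = [by / bx for bx, by in zip(beta_exposure, beta_outcome)]
    total = sum(weights)
    estimate = sum(w * r for w, r in zip(weights, ratios)) / total
    return estimate, sqrt(1.0 / total)


# Hypothetical toy summary statistics, not taken from the paper.
# Forward direction: eQTLs as instruments, expression -> trait.
forward, forward_se = ivw_estimate([0.5, 0.4], [0.05, 0.04], [0.01, 0.01])
# Reverse direction: trait-associated SNPs as instruments, trait -> expression.
reverse, reverse_se = ivw_estimate([0.2, 0.1], [0.06, 0.03], [0.02, 0.02])
```

Running the same estimator in both directions, each with its own set of instruments, is what makes the analysis bi-directional. Comparing the two estimates with the observed expression-trait correlation is the kind of decomposition the abstract describes.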

DOI: http://dx.doi.org/10.1038/s41467-021-25805-y
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC8463674
September 2021

The molecular basis, genetic control and pleiotropic effects of local gene co-expression.

Nat Commun 2021 08 10;12(1):4842. Epub 2021 Aug 10.

Department of Computational Biology, University of Lausanne, Lausanne, Switzerland.

Nearby genes are often expressed as a group. Yet, the prevalence, molecular mechanisms and genetic control of local gene co-expression are far from being understood. Here, by leveraging gene expression measurements across 49 human tissues and hundreds of individuals, we find that local gene co-expression occurs in 13% to 53% of genes per tissue. By integrating various molecular assays (e.g. ChIP-seq and Hi-C), we estimate the ability of several mechanisms, such as enhancer-gene interactions, to distinguish gene pairs that are co-expressed from those that are not. Notably, we identify 32,636 expression quantitative trait loci (eQTLs) which associate with co-expressed gene pairs and often overlap enhancer regions. Because they affect several genes, these eQTLs are more often associated with multiple human traits than other eQTLs. Our study paves the way to understanding trait pleiotropy and the functional interpretation of QTL and GWAS findings. All local gene co-expression identified here is available through a public database (https://glcoex.unil.ch/).

DOI: http://dx.doi.org/10.1038/s41467-021-25129-x
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC8355184
August 2021

The genomic history of the Aegean palatial civilizations.

Cell 2021 05 29;184(10):2565-2586.e21. Epub 2021 Apr 29.

Department of Computational Biology, University of Lausanne, 1015 Lausanne, Switzerland; Swiss Institute of Bioinformatics, 1015 Lausanne, Switzerland.

The Cycladic, the Minoan, and the Helladic (Mycenaean) cultures define the Bronze Age (BA) of Greece. Urbanism, complex social structures, craft and agricultural specialization, and the earliest forms of writing characterize this iconic period. We sequenced six Early to Middle BA whole genomes, along with 11 mitochondrial genomes, sampled from the three BA cultures of the Aegean Sea. The Early BA (EBA) genomes are homogeneous and derive most of their ancestry from Neolithic Aegeans, contrary to earlier hypotheses that the Neolithic-EBA cultural transition was due to massive population turnover. EBA Aegeans were shaped by relatively small-scale migration from East of the Aegean, as evidenced by the Caucasus-related ancestry also detected in Anatolians. In contrast, Middle BA (MBA) individuals of northern Greece differ from EBA populations in showing ∼50% Pontic-Caspian Steppe-related ancestry, dated at ca. 2,600-2,000 BCE. Such gene flow events during the MBA contributed toward shaping present-day Greek genomes.

DOI: http://dx.doi.org/10.1016/j.cell.2021.03.039
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC8127963
May 2021

Gene regulation contributes to explain the impact of early life socioeconomic disadvantage on adult inflammatory levels in two cohort studies.

Sci Rep 2021 02 4;11(1):3100. Epub 2021 Feb 4.

Center for Primary Care and Public Health (Unisanté), University of Lausanne, Lausanne, Switzerland.

Individuals experiencing socioeconomic disadvantage in childhood have a higher rate of inflammation-related diseases decades later. Little is known about the mechanisms linking early life experiences to the functioning of the immune system in adulthood. To address this, we explore the relationship across social-to-biological layers of early life social exposures on levels of adulthood inflammation and the mediating role of gene regulatory mechanisms, epigenetic and transcriptomic profiling from blood, in 2,329 individuals from two European cohort studies. Consistently across both studies, we find transcriptional activity explains a substantive proportion (78% and 26%) of the estimated effect of early life disadvantaged social exposures on levels of adulthood inflammation. Furthermore, we show that mechanisms other than cis DNA methylation may regulate those transcriptional fingerprints. These results further our understanding of social-to-biological transitions by pinpointing the role of gene regulation that cannot fully be explained by differential cis DNA methylation.

DOI: http://dx.doi.org/10.1038/s41598-021-82714-2
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC7862626
February 2021

Publisher Correction: Efficient phasing and imputation of low-coverage sequencing data using large reference panels.

Nat Genet 2021 Mar;53(3):412

Department of Computational Biology, University of Lausanne, Lausanne, Switzerland.

DOI: http://dx.doi.org/10.1038/s41588-021-00788-0
March 2021

Efficient phasing and imputation of low-coverage sequencing data using large reference panels.

Nat Genet 2021 01 7;53(1):120-126. Epub 2021 Jan 7.

Department of Computational Biology, University of Lausanne, Lausanne, Switzerland.

Low-coverage whole-genome sequencing followed by imputation has been proposed as a cost-effective genotyping approach for disease and population genetics studies. However, its competitiveness against SNP arrays is undermined because current imputation methods are computationally expensive and unable to leverage large reference panels. Here, we describe a method, GLIMPSE, for phasing and imputation of low-coverage sequencing datasets from modern reference panels. We demonstrate its remarkable performance across different coverages and human populations. GLIMPSE achieves imputation of a genome for less than US$1 in computational cost, considerably outperforming other methods and improving imputation accuracy over the full allele frequency range. As a proof of concept, we show that 1× coverage enables effective gene expression association studies and outperforms dense SNP arrays in rare variant burden tests. Overall, this study illustrates the promising potential of low-coverage imputation and suggests a paradigm shift in the design of future genomic studies.

DOI: http://dx.doi.org/10.1038/s41588-020-00756-0
January 2021

Genotype imputation using the Positional Burrows Wheeler Transform.

PLoS Genet 2020 11 16;16(11):e1009049. Epub 2020 Nov 16.

Regeneron Genetics Center, Tarrytown, New York, USA.

Genotype imputation is the process of predicting unobserved genotypes in a sample of individuals using a reference panel of haplotypes. In the last 10 years reference panels have increased in size by more than 100-fold. Increasing reference panel size improves accuracy of markers with low minor allele frequencies but poses ever-increasing computational challenges for imputation methods. Here we present IMPUTE5, a genotype imputation method that can scale to reference panels with millions of samples. This method continues to refine the observation made in the IMPUTE2 method, that accuracy is optimized via use of a custom subset of haplotypes when imputing each individual. It achieves fast, accurate, and memory-efficient imputation by selecting haplotypes using the Positional Burrows Wheeler Transform (PBWT). By using the PBWT data structure at genotyped markers, IMPUTE5 identifies locally best matching haplotypes and long identical-by-state segments. The method then uses the selected haplotypes as conditioning states within the IMPUTE model. Using the HRC reference panel, which has ∼65,000 haplotypes, we show that IMPUTE5 is up to 30x faster than MINIMAC4 and up to 3x faster than BEAGLE5.1, and uses less memory than both these methods. Using simulated reference panels we show that IMPUTE5 scales sub-linearly with reference panel size. For example, keeping the number of imputed markers constant, increasing the reference panel size from 10,000 to 1 million haplotypes requires less than twice the computation time. As the reference panel increases in size IMPUTE5 is able to utilize a smaller number of reference haplotypes, thus reducing computational cost.
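
To make the PBWT-based haplotype selection more concrete, here is a minimal sketch of the positional prefix ordering that the PBWT maintains: at each marker, haplotypes are stably partitioned by allele, so haplotypes that end up adjacent share long matches ending at that marker. This is not IMPUTE5's implementation. The function names, the toy panel, and the simplification of treating the target as one more row of the panel are all assumptions made for illustration.

```python
def pbwt_prefix_orders(haplotypes):
    """Return the PBWT haplotype order after each marker.

    `haplotypes` is a list of equal-length lists of 0/1 alleles.
    orders[k] sorts haplotype indices by their reversed prefixes
    ending at marker k.
    """
    order = list(range(len(haplotypes)))
    orders = []
    for k in range(len(haplotypes[0])):
        # Stable partition: zeros first, then ones, keeping previous order.
        zeros = [h for h in order if haplotypes[h][k] == 0]
        ones = [h for h in order if haplotypes[h][k] == 1]
        order = zeros + ones
        orders.append(order)
    return orders


def neighbours(order, hap, width=1):
    """Haplotypes within `width` positions of `hap` in a PBWT order."""
    pos = order.index(hap)
    lo = max(0, pos - width)
    return [h for h in order[lo:pos + width + 1] if h != hap]


# Hypothetical toy panel of four haplotypes at four markers.
panel = [
    [0, 1, 0, 1],
    [1, 1, 0, 1],
    [0, 0, 1, 0],
    [1, 1, 0, 0],
]
orders = pbwt_prefix_orders(panel)
print(orders[-1])                 # [3, 2, 0, 1]
print(neighbours(orders[-1], 0))  # [2, 1]
```

In the last order, haplotype 0 sits next to haplotype 1, with which it shares the final three markers. Picking neighbours in this order is one simple way to choose a small set of locally well-matching conditioning haplotypes.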

DOI: http://dx.doi.org/10.1371/journal.pgen.1009049
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC7704051
November 2020

High-throughput SARS-CoV-2 and host genome sequencing from single nasopharyngeal swabs.

medRxiv 2020 Sep 1. Epub 2020 Sep 1.

During COVID19 and other viral pandemics, rapid generation of host and pathogen genomic data is critical to tracking infection and informing therapies. There is an urgent need for efficient approaches to this data generation at scale. We have developed a scalable, high throughput approach to generate high fidelity low pass whole genome and HLA sequencing, viral genomes, and representation of human transcriptome from single nasopharyngeal swabs of COVID19 patients.

DOI: http://dx.doi.org/10.1101/2020.07.27.20163147
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC7402057
September 2020

Accurate, scalable and integrative haplotype estimation.

Nat Commun 2019 11 28;10(1):5436. Epub 2019 Nov 28.

Department of Genetic Medicine and Development, University of Geneva Medical School, 1 rue Michel-Servet, 1211, Geneva, Switzerland.

The number of human genomes being genotyped or sequenced increases exponentially and efficient haplotype estimation methods able to handle this amount of data are now required. Here we present a method, SHAPEIT4, which substantially improves upon other methods to process large genotype and high coverage sequencing datasets. It notably exhibits sub-linear running times with sample size, provides highly accurate haplotypes and allows integrating external phasing information such as large reference panels of haplotypes, collections of pre-phased variants and long sequencing reads. We provide SHAPEIT4 in an open source format and demonstrate its performance in terms of accuracy and running times on two gold standard datasets: the UK Biobank data and the Genome In A Bottle.

DOI: http://dx.doi.org/10.1038/s41467-019-13225-y
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6882857
November 2019

Expression estimation and eQTL mapping for HLA genes with a personalized pipeline.

PLoS Genet 2019 04 22;15(4):e1008091. Epub 2019 Apr 22.

Department of Genetics and Evolutionary Biology, Institute of Biosciences, University of São Paulo, São Paulo, Brazil.

The HLA (Human Leukocyte Antigens) genes are well-documented targets of balancing selection, and variation at these loci is associated with many disease phenotypes. Variation in expression levels also influences disease susceptibility and resistance, but little information exists about the regulation and population-level patterns of expression. This results from the difficulty in mapping short reads originating from these highly polymorphic loci, and in accounting for the existence of several paralogues. We developed a computational pipeline to accurately estimate expression for HLA genes based on RNA-seq, improving both locus-level and allele-level estimates. First, reads are aligned to all known HLA sequences in order to infer HLA genotypes, then quantification of expression is carried out using a personalized index. We use simulations to show that expression estimates obtained in this way are not biased due to divergence from the reference genome. We applied our pipeline to the GEUVADIS dataset, and compared the quantifications to those obtained with a reference transcriptome. Although the personalized pipeline recovers more reads, we found that using the reference transcriptome produces estimates similar to the personalized pipeline (r ≥ 0.87) with the exception of HLA-DQA1. We describe the impact of the HLA-personalized approach on downstream analyses for nine classical HLA loci (HLA-A, HLA-C, HLA-B, HLA-DRA, HLA-DRB1, HLA-DQA1, HLA-DQB1, HLA-DPA1, HLA-DPB1). Although the influence of the HLA-personalized approach is modest for eQTL mapping, the p-values and the causality of the eQTLs obtained are better than when the reference transcriptome is used. We investigate how the eQTLs we identified explain variation in expression among lineages of HLA alleles. Finally, we discuss possible causes underlying differences between expression estimates obtained using RNA-seq, antibody-based approaches and qPCR.

DOI: http://dx.doi.org/10.1371/journal.pgen.1008091
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6497317
April 2019

A genome-wide association study of shared risk across psychiatric disorders implicates gene regulation during fetal neurodevelopment.

Nat Neurosci 2019 03 28;22(3):353-361. Epub 2019 Jan 28.

Institute of Biological Psychiatry, Mental Health Center Sct. Hans, Mental Health Services Copenhagen, Roskilde, Denmark.

There is mounting evidence that seemingly diverse psychiatric disorders share genetic etiology, but the biological substrates mediating this overlap are not well characterized. Here we leverage the unique Integrative Psychiatric Research Consortium (iPSYCH) study, a nationally representative cohort ascertained through clinical psychiatric diagnoses indicated in Danish national health registers. We confirm previous reports of individual and cross-disorder single-nucleotide polymorphism heritability for major psychiatric disorders and perform a cross-disorder genome-wide association study. We identify four novel genome-wide significant loci encompassing variants predicted to regulate genes expressed in radial glia and interneurons in the developing neocortex during mid-gestation. This epoch is supported by partitioning cross-disorder single-nucleotide polymorphism heritability, which is enriched at regulatory chromatin active during fetal neurodevelopment. These findings suggest that dysregulation of genes that direct neurodevelopment by common genetic variants may result in general liability for many later psychiatric outcomes.

DOI: http://dx.doi.org/10.1038/s41593-018-0320-0
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6497521
March 2019

The UK Biobank resource with deep phenotyping and genomic data.

Nature 2018 10 10;562(7726):203-209. Epub 2018 Oct 10.

Wellcome Centre for Human Genetics, University of Oxford, Oxford, UK.

The UK Biobank project is a prospective cohort study with deep genetic and phenotypic data collected on approximately 500,000 individuals from across the United Kingdom, aged between 40 and 69 at recruitment. The open resource is unique in its size and scope. A rich variety of phenotypic and health-related information is available on each participant, including biological measurements, lifestyle indicators, biomarkers in blood and urine, and imaging of the body and brain. Follow-up information is provided by linking health and medical records. Genome-wide genotype data have been collected on all participants, providing many opportunities for the discovery of new genetic associations and the genetic bases of complex traits. Here we describe the centralized analysis of the genetic data, including genotype quality, properties of population structure and relatedness of the genetic data, and efficient phasing and genotype imputation that increases the number of testable variants to around 96 million. Classical allelic variation at 11 human leukocyte antigen genes was imputed, resulting in the recovery of signals with known associations between human leukocyte antigen alleles and many diseases.

DOI: http://dx.doi.org/10.1038/s41586-018-0579-z
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6786975
October 2018

The effect of genetic variation on promoter usage and enhancer activity.

Nat Commun 2017 11 7;8(1):1358. Epub 2017 Nov 7.

Department of Genetic Medicine and Development, University of Geneva, 1 Michel Servet, Geneva, CH1211, Switzerland.

The identification of genetic variants affecting gene expression, namely expression quantitative trait loci (eQTLs), has contributed to the understanding of mechanisms underlying human traits and diseases. The majority of these variants map in non-coding regulatory regions of the genome and their identification remains challenging. Here, we use natural genetic variation and CAGE transcriptomes from 154 EBV-transformed lymphoblastoid cell lines, derived from unrelated individuals, to map 5376 and 110 regulatory variants associated with promoter usage (puQTLs) and enhancer activity (eaQTLs), respectively. We characterize five categories of genes associated with puQTLs, distinguishing single from multi-promoter genes. Among multi-promoter genes, we find puQTL effects either specific to a single promoter or to multiple promoters with variable effect orientations. Regulatory variants associated with opposite effects on different mRNA isoforms suggest compensatory mechanisms occurring between alternative promoters. Our analyses identify differential promoter usage and modulation of enhancer activity as molecular mechanisms underlying eQTLs related to regulatory elements.

DOI: http://dx.doi.org/10.1038/s41467-017-01467-7
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC5677018
November 2017

Estimating the causal tissues for complex traits and diseases.

Nat Genet 2017 Dec 23;49(12):1676-1683. Epub 2017 Oct 23.

Department of Genetic Medicine and Development, University of Geneva Medical School, Geneva, Switzerland.

How to interpret the biological causes underlying the predisposing markers identified through genome-wide association studies (GWAS) remains an open question. One direct and powerful way to assess the genetic causality behind GWAS is through analysis of expression quantitative trait loci (eQTLs). Here we describe a new approach to estimate the tissues behind the genetic causality of a variety of GWAS traits, using the cis-eQTLs in 44 tissues from the Genotype-Tissue Expression (GTEx) Consortium. We have adapted the regulatory trait concordance (RTC) score to measure the probability of eQTLs being active in multiple tissues and to calculate the probability that a GWAS-associated variant and an eQTL tag the same functional effect. By normalizing the GWAS-eQTL probabilities by the tissue-sharing estimates for eQTLs, we generate relative tissue-causality profiles for GWAS traits. Our approach not only implicates the gene likely mediating individual GWAS signals, but also highlights tissues where the genetic causality for an individual trait is likely manifested.

DOI: http://dx.doi.org/10.1038/ng.3981
December 2017

Predicting causal variants affecting expression by using whole-genome sequencing and RNA-seq from multiple human tissues.

Nat Genet 2017 Dec 23;49(12):1747-1751. Epub 2017 Oct 23.

Department of Genetic Medicine and Development, University of Geneva Medical School, Geneva, Switzerland.

Genetic association mapping produces statistical links between phenotypes and genomic regions, but identifying causal variants remains difficult. Whole-genome sequencing (WGS) can help by providing complete knowledge of all genetic variants, but it is financially prohibitive for well-powered GWAS studies. We performed mapping of expression quantitative trait loci (eQTLs) with WGS and RNA-seq, and found that lead eQTL variants called with WGS were more likely to be causal. Through simulations, we derived properties of causal variants and used them to develop a method for identifying likely causal SNPs. We estimated that 25-70% of causal variants were located in open-chromatin regions, depending on the tissue and experiment. Finally, we identified a set of high-confidence causal variants and showed that these were more enriched in GWAS associations than other eQTLs. Of those, we found 65 associations with GWAS traits and provide examples in which genes implicated by expression are functionally validated as being relevant for complex traits.

DOI: http://dx.doi.org/10.1038/ng.3979
December 2017

A complete tool set for molecular QTL discovery and analysis.

Nat Commun 2017 05 18;8:15452. Epub 2017 May 18.

Department of Genetic Medicine and Development, University of Geneva, 1 Michel Servet, Geneva CH1211, Switzerland.

Population-scale studies combining genetic information with molecular phenotypes (for example, gene expression) have become a standard way to dissect the effects of genetic variants on organismal phenotypes. These kinds of data sets require powerful, fast and versatile methods able to discover molecular Quantitative Trait Loci (molQTL). Here we propose such a solution, QTLtools, a modular framework that contains multiple new and well-established methods to prepare the data, to discover proximal and distal molQTLs and, finally, to integrate them with GWAS variants and functional annotations of the genome. We demonstrate its utility by performing a complete expression QTL study in a few easy-to-perform steps. QTLtools is open source and available at https://qtltools.github.io/qtltools/.

DOI: http://dx.doi.org/10.1038/ncomms15452
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC5454369
May 2017

MBV: a method to solve sample mislabeling and detect technical bias in large combined genotype and sequencing assay datasets.

Bioinformatics 2017 Jun;33(12):1895-1897

Department of Genetic Medicine and Development, University of Geneva Medical School, Geneva, Switzerland.

Motivation: Large genomic datasets combining genotype and sequence data, such as for expression quantitative trait loci (eQTL) detection, require perfect matching between both data types.

Results: Here we describe MBV (Match BAM to VCF), a method to quickly solve sample mislabeling and detect cross-sample contamination and PCR amplification bias.

Availability And Implementation: MBV is implemented in C++ as an independent component of the QTLtools software package; the binary and source code are freely available at https://qtltools.github.io/qtltools/.

Supplementary Information: Supplementary data are available at Bioinformatics online.

DOI: http://dx.doi.org/10.1093/bioinformatics/btx074
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6044394
June 2017

The Allelic Landscape of Human Blood Cell Trait Variation and Links to Common Complex Disease.

Cell 2016 11;167(5):1415-1429.e19

Wellcome Trust Centre for Human Genetics, University of Oxford, Roosevelt Drive, Oxford OX3 7BN, UK; Department of Statistics, University of Oxford, 1 South Parks Road, Oxford OX1 3TG, UK.

Many common variants have been associated with hematological traits, but identification of causal genes and pathways has proven challenging. We performed a genome-wide association analysis in the UK Biobank and INTERVAL studies, testing 29.5 million genetic variants for association with 36 red cell, white cell, and platelet properties in 173,480 European-ancestry participants. This effort yielded hundreds of low frequency (<5%) and rare (<1%) variants with a strong impact on blood cell phenotypes. Our data highlight general properties of the allelic architecture of complex traits, including the proportion of the heritable component of each blood trait explained by the polygenic signal across different genome regulatory domains. Finally, through Mendelian randomization, we provide evidence of shared genetic pathways linking blood cell indices with complex pathologies, including autoimmune diseases, schizophrenia, and coronary heart disease and evidence suggesting previously reported population associations between blood cell indices and cardiovascular disease may be non-causal.

DOI: http://dx.doi.org/10.1016/j.cell.2016.10.042
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC5300907
November 2016

A reference panel of 64,976 haplotypes for genotype imputation.

Nat Genet 2016 10 22;48(10):1279-83. Epub 2016 Aug 22.

IRGB, CNR, Sardinia, Italy.

We describe a reference panel of 64,976 human haplotypes at 39,235,157 SNPs constructed using whole-genome sequence data from 20 studies of predominantly European ancestry. Using this resource leads to accurate genotype imputation at minor allele frequencies as low as 0.1% and a large increase in the number of SNPs tested in association studies, and it can help to discover and refine causal loci. We describe remote server resources that allow researchers to carry out imputation and phasing consistently and efficiently.

DOI: http://dx.doi.org/10.1038/ng.3643
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC5388176
October 2016

Haplotype estimation for biobank-scale data sets.

Nat Genet 2016 07 6;48(7):817-20. Epub 2016 Jun 6.

Wellcome Trust Centre for Human Genetics, University of Oxford, Oxford, UK.

The UK Biobank (UKB) has recently released genotypes on 152,328 individuals together with extensive phenotypic and lifestyle information. We present a new phasing method, SHAPEIT3, that can handle such biobank-scale data sets and results in switch error rates as low as ∼0.3%. The method exhibits O(N log N) scaling with sample size N, enabling fast and accurate phasing of even larger cohorts.
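
The switch error rate quoted here is a standard measure of phasing accuracy. As a rough sketch of how it is commonly computed for one individual (the abstract does not give the exact evaluation procedure, so the function below and its toy data are assumptions): walk along the heterozygous sites, record which true haplotype the first estimated haplotype matches, and count how often that assignment flips between consecutive heterozygous sites.

```python
def switch_error_rate(true_h1, true_h2, est_h1, est_h2):
    """Fraction of consecutive heterozygous site pairs with a phase switch.

    All four arguments are equal-length lists of 0/1 alleles for the
    same individual; the estimated genotype is assumed to agree with
    the true genotype at every site.
    """
    phases = []
    for t1, t2, e1, e2 in zip(true_h1, true_h2, est_h1, est_h2):
        if t1 == t2:
            continue  # Homozygous sites carry no phase information.
        if e1 == e2:
            raise ValueError("estimated genotype disagrees with truth")
        # True if the first estimated haplotype follows the first true one.
        phases.append(e1 == t1)
    if len(phases) < 2:
        return 0.0
    switches = sum(a != b for a, b in zip(phases, phases[1:]))
    return switches / (len(phases) - 1)


# Hypothetical toy example: one switch among four consecutive pairs.
truth_1, truth_2 = [0, 1, 0, 1, 0], [1, 0, 1, 0, 1]
guess_1, guess_2 = [0, 1, 1, 0, 1], [1, 0, 0, 1, 0]
print(switch_error_rate(truth_1, truth_2, guess_1, guess_2))  # 0.25
```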

DOI: http://dx.doi.org/10.1038/ng.3583
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC4926957
July 2016

Phasing for medical sequencing using rare variants and large haplotype reference panels.

Bioinformatics 2016 07 27;32(13):1974-80. Epub 2016 Feb 27.

Department of Statistics, University of Oxford, Oxford, UK; Wellcome Trust Centre for Human Genetics, University of Oxford, Oxford, UK.

Motivation: There is growing recognition that estimating haplotypes from high coverage sequencing of single samples in clinical settings is an important problem. At the same time very large datasets consisting of tens and hundreds of thousands of high-coverage sequenced samples will soon be available. We describe a method that takes advantage of these huge human genetic variation resources and rare variant sharing patterns to estimate haplotypes on single sequenced samples. Sharing rare variants between two individuals is more likely to arise from a recent common ancestor and, hence, also more likely to indicate similar shared haplotypes over a substantial flanking region of sequence.

Results: Our method exploits this idea to select a small set of highly informative copying states within a Hidden Markov Model (HMM) phasing algorithm. Using rare variants in this way allows us to avoid iterative MCMC methods to infer haplotypes. Compared to other approaches that do not explicitly use rare variants we obtain significant gains in phasing accuracy, less variation over phasing runs and improvements in speed. For example, using a reference panel of 7420 haplotypes from the UK10K project, we are able to reduce switch error rates by up to 50% when phasing samples sequenced at high-coverage. In addition, a single step rephasing of the UK10K panel, using rare variant information, has a downstream impact on phasing performance. These results represent a proof of concept that rare variant sharing patterns can be utilized to phase large high-coverage sequencing studies such as the 100 000 Genomes Project dataset.

Availability And Implementation: A webserver that includes an implementation of this new method and allows phasing of high-coverage clinical samples is available at https://phasingserver.stats.ox.ac.uk/

Supplementary Information: Supplementary data are available at Bioinformatics online.

DOI: http://dx.doi.org/10.1093/bioinformatics/btw065
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC4920110
July 2016

Fast and efficient QTL mapper for thousands of molecular phenotypes.

Bioinformatics 2016 05 26;32(10):1479-85. Epub 2015 Dec 26.

Department of Genetic Medicine and Development, University of Geneva Medical School, Geneva, Switzerland; Institute for Genetics and Genomics in Geneva (iGE3), University of Geneva, Geneva, 1211, Switzerland; Swiss Institute of Bioinformatics, Geneva, 1211, Switzerland.

Motivation: In order to discover quantitative trait loci, multi-dimensional genomic datasets combining DNA-seq and ChIP-/RNA-seq require methods that rapidly correlate tens of thousands of molecular phenotypes with millions of genetic variants while appropriately controlling for multiple testing.

Results: We have developed FastQTL, a method that implements a popular cis-QTL mapping strategy in a user- and cluster-friendly tool. FastQTL also proposes an efficient permutation procedure to control for multiple testing. The outcome of permutations is modeled using beta distributions trained from a few permutations and from which adjusted P-values can be estimated at any level of significance with little computational cost. The Geuvadis and GTEx pilot datasets can now be analyzed an order of magnitude faster than with previous approaches.
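
The idea of replacing many permutations with a fitted beta distribution can be sketched as follows. This is a deliberately simplified illustration, not FastQTL's code: it fixes the first shape parameter at 1 and fits only the second by maximum likelihood, so that the beta CDF has a closed form and only the standard library is needed. A two-parameter fit would need the regularized incomplete beta function. The function names and inputs are hypothetical, and every null P-value is assumed to lie strictly between 0 and 1.

```python
from math import log


def fit_beta_second_shape(null_min_pvalues):
    """Maximum-likelihood n for Beta(1, n), given the best nominal
    P-value from each permutation of the phenotype."""
    return -len(null_min_pvalues) / sum(log(1.0 - p) for p in null_min_pvalues)


def adjusted_pvalue(nominal_p, n_effective):
    """Beta(1, n) CDF at nominal_p: the chance that the best null
    P-value in the cis window is at least as small as nominal_p."""
    return 1.0 - (1.0 - nominal_p) ** n_effective


# Hypothetical best P-values from a handful of permutations.
null_best = [0.01, 0.02, 0.005, 0.03]
n_eff = fit_beta_second_shape(null_best)
print(adjusted_pvalue(0.001, n_eff))
```

Under this simplification the fitted parameter behaves like an effective number of independent tests in the window. Once it is estimated from a few permutations, an adjusted P-value can be read off for any nominal P-value, however small, without running more permutations.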

Availability And Implementation: Source code, binaries and comprehensive documentation of FastQTL are freely available to download at http://fastqtl.sourceforge.net/

Supplementary Information: Supplementary data are available at Bioinformatics online.

DOI: http://dx.doi.org/10.1093/bioinformatics/btv722
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC4866519
May 2016

Novel insights into the genetics of smoking behaviour, lung function, and chronic obstructive pulmonary disease (UK BiLEVE): a genetic association study in UK Biobank.

Lancet Respir Med 2015 Oct 27;3(10):769-81. Epub 2015 Sep 27.

Division of Respiratory Medicine, Queen's Medical Centre, University of Nottingham, Nottingham, UK.

Background: Understanding the genetic basis of airflow obstruction and smoking behaviour is key to determining the pathophysiology of chronic obstructive pulmonary disease (COPD). We used UK Biobank data to study the genetic causes of smoking behaviour and lung health.

Methods: We sampled individuals of European ancestry from UK Biobank, from the middle and extremes of the forced expiratory volume in 1 s (FEV1) distribution among heavy smokers (mean 35 pack-years) and never smokers. We developed a custom array for UK Biobank to provide optimum genome-wide coverage of common and low-frequency variants, dense coverage of genomic regions already implicated in lung health and disease, and to assay rare coding variants relevant to the UK population. We investigated whether there were shared genetic causes between different phenotypes defined by extremes of FEV1. We also looked for novel variants associated with extremes of FEV1 and smoking behaviour and assessed regions of the genome that had already shown evidence for a role in lung health and disease. We set genome-wide significance at p < 5 × 10⁻⁸.

Findings: UK Biobank participants were recruited from March 15, 2006, to July 7, 2010. Sample selection for the UK BiLEVE study started on Nov 22, 2012, and was completed on Dec 20, 2012. We selected 50,008 unique samples: 10,002 individuals with low FEV1, 10,000 with average FEV1, and 5,002 with high FEV1 from each of the heavy smoker and never smoker groups. We noted a substantial sharing of genetic causes of low FEV1 between heavy smokers and never smokers (p = 2.29 × 10⁻¹⁶) and between individuals with and without doctor-diagnosed asthma (p = 6.06 × 10⁻¹¹). We discovered six novel genome-wide significant signals of association with extremes of FEV1, including signals at four novel loci (KANSL1, TSEN54, TET2, and RBM19/TBX5) and independent signals at two previously reported loci (NPNT and HLA-DQB1/HLA-DQA2). These variants also showed association with COPD, including in individuals with no history of smoking. The number of copies of a 150 kb region containing the 5' end of KANSL1, a gene that is important for epigenetic gene regulation, was associated with extremes of FEV1. We also discovered five new genome-wide significant signals for smoking behaviour, including a variant in NCAM1 (chromosome 11) and a variant on chromosome 2 (between TEX41 and PABPC1P2) that has a trans effect on expression of NCAM1 in brain tissue.

Interpretation: By sampling from the extremes of the lung function distribution in UK Biobank, we identified novel genetic causes of lung function and smoking behaviour. These results provide new insight into the specific mechanisms underlying airflow obstruction, COPD, and tobacco addiction, and show substantial shared genetic architecture underlying airflow obstruction across individuals, irrespective of smoking behaviour and other airway disease.

Funding: Medical Research Council.

Source
http://dx.doi.org/10.1016/S2213-2600(15)00283-0 (DOI Listing)
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC4593935 (PMC)
October 2015

Identification of Genes Whose Expression Profile Is Associated with Non-Progression towards AIDS Using eQTLs.

PLoS One 2015 Sep 14;10(9):e0136989. Epub 2015 Sep 14.

Chaire de Bioinformatique; Laboratoire Génomique, Bioinformatique, et Applications (EA 4627), Conservatoire National des Arts et Métiers, Paris, France.

Background: Many genome-wide association studies have been performed on progression towards the acquired immune deficiency syndrome (AIDS) and they mainly identified associations within the HLA loci. In this study, we demonstrate that the integration of biological information, namely gene expression data, can enhance the sensitivity of genetic studies to unravel new genetic associations relevant to AIDS.

Methods: We collated the biological information compiled from three databases of expression quantitative trait loci (eQTLs) involved in cells of the immune system. We derived a list of single nucleotide polymorphisms (SNPs) that are functional in that they correlate with differential expression of genes in at least two of the databases. We tested the association of those SNPs with AIDS progression in two cohorts, GRIV and ACS. Tests on permuted phenotypes of the GRIV and ACS cohorts or on randomised sets of equivalent SNPs allowed us to assess the statistical robustness of this method and to estimate the true positive rate.

Results: Eight genes were identified with high confidence (p = 0.001, rate of true positives 75%). Some of those genes had previously been linked with HIV infection. Notably, ENTPD4 belongs to the same family as CD39, whose expression has already been associated with AIDS progression; while DNAJB12 is part of the HSP90 pathway, which is involved in the control of HIV latency. Our study also drew our attention to lesser-known functions such as mitochondrial ribosomal proteins and a zinc finger protein, ZFP57, which could be central to the effectiveness of HIV infection. Interestingly, for six out of those eight genes, down-regulation is associated with non-progression, which makes them appealing targets to develop drugs against HIV.
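
The SNP selection step in the Methods above (keeping SNPs reported as eQTLs in at least two of the three databases) can be sketched as a simple counting filter. The function name and the SNP identifiers below are placeholders invented for illustration; they are not taken from the study or from any of its databases.

```python
from collections import Counter


def snps_in_at_least(databases, min_count=2):
    """Return the SNP identifiers that appear in at least min_count of the given collections."""
    # set(db) ensures a SNP listed twice in one database is counted once for that database
    counts = Counter(snp for db in databases for snp in set(db))
    return {snp for snp, seen in counts.items() if seen >= min_count}


if __name__ == "__main__":
    # Placeholder identifiers standing in for three eQTL databases
    db_one = ["snpA", "snpB", "snpC"]
    db_two = ["snpB", "snpC", "snpD"]
    db_three = ["snpC", "snpE", "snpE"]
    print(sorted(snps_in_at_least([db_one, db_two, db_three])))
```

Only the SNPs that pass this filter would then be carried forward to the association tests, which reduces the number of hypotheses compared with a genome-wide scan.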

Source
http://journals.plos.org/plosone/article?id=10.1371/journal.pone.0136989 (PLOS)
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC4569262 (PMC)
May 2016

Population Variation and Genetic Control of Modular Chromatin Architecture in Humans.

Cell 2015 Aug 20;162(5):1039-50. Epub 2015 Aug 20.

Swiss Institute of Bioinformatics, Lausanne 1015, Switzerland; Department of Genetic Medicine and Development, University of Geneva Medical School, Geneva 1211, Switzerland; Institute of Genetics and Genomics in Geneva, University of Geneva, Geneva 1211, Switzerland.

Chromatin state variation at gene regulatory elements is abundant across individuals, yet we understand little about the genetic basis of this variability. Here, we profiled several histone modifications, the transcription factor (TF) PU.1, RNA polymerase II, and gene expression in lymphoblastoid cell lines from 47 whole-genome sequenced individuals. We observed that distinct cis-regulatory elements exhibit coordinated chromatin variation across individuals in the form of variable chromatin modules (VCMs) at sub-Mb scale. VCMs were associated with thousands of genes and preferentially cluster within chromosomal contact domains. We mapped strong proximal and weak, yet more ubiquitous, distal-acting chromatin quantitative trait loci (cQTL) that frequently explain this variation. cQTLs were associated with molecular activity at clusters of cis-regulatory elements and mapped preferentially within TF-bound regions. We propose that local, sequence-independent chromatin variation emerges as a result of genetic perturbations in cooperative interactions between cis-regulatory elements that are located within the same genomic domain.

Source
http://dx.doi.org/10.1016/j.cell.2015.08.001 (DOI Listing)
August 2015

Integrating sequence and array data to create an improved 1000 Genomes Project haplotype reference panel.

Nat Commun 2014 Jun 13;5:3934. Epub 2014 Jun 13.

[1] Department of Statistics, University of Oxford, Oxford OX1 3TG, UK; [2] Wellcome Trust Centre for Human Genetics, University of Oxford, Oxford OX3 7BN, UK.

A major use of the 1000 Genomes Project (1000 GP) data is genotype imputation in genome-wide association studies (GWAS). Here we develop a method to estimate haplotypes from low-coverage sequencing data that can take advantage of single-nucleotide polymorphism (SNP) microarray genotypes on the same samples. First the SNP array data are phased to build a backbone (or 'scaffold') of haplotypes across each chromosome. We then phase the sequence data 'onto' this haplotype scaffold. This approach can take advantage of relatedness between sequenced and non-sequenced samples to improve accuracy. We use this method to create a new 1000 GP haplotype reference set for use by the human genetic community. Using a set of validation genotypes at SNP and bi-allelic indels we show that these haplotypes have lower genotype discordance and improved imputation performance into downstream GWAS samples, especially at low-frequency variants.

Source
http://dx.doi.org/10.1038/ncomms4934 (DOI Listing)
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC4338501 (PMC)
June 2014

Biased allelic expression in human primary fibroblast single cells.

Am J Hum Genet 2015 Jan 31;96(1):70-80. Epub 2014 Dec 31.

Department of Genetic Medicine and Development, University of Geneva, 1211 Geneva, Switzerland; Institute of Genetics and Genomics of Geneva, 1211 Geneva, Switzerland; Service of Genetic Medicine, University Hospitals of Geneva, 1211 Geneva, Switzerland.

The study of gene expression in mammalian single cells via genomic technologies now provides the possibility to investigate the patterns of allelic gene expression. We used single-cell RNA sequencing to detect the allele-specific mRNA level in 203 single human primary fibroblasts over 133,633 unique heterozygous single-nucleotide variants (hetSNVs). We observed that at the snapshot of analyses, each cell contained mostly transcripts from one allele from the majority of genes; indeed, 76.4% of the hetSNVs displayed stochastic monoallelic expression in single cells. Remarkably, adjacent hetSNVs exhibited a haplotype-consistent allelic ratio; in contrast, distant sites located in two different genes were independent of the haplotype structure. Moreover, the allele-specific expression in single cells correlated with the abundance of the cellular transcript. We observed that genes expressing both alleles in the majority of the single cells at a given time point were rare and enriched with highly expressed genes. The relative abundance of each allele in a cell was controlled by some regulatory mechanisms given that we observed related single-cell allelic profiles according to genes. Overall, these results have direct implications in cellular phenotypic variability.

Source
http://dx.doi.org/10.1016/j.ajhg.2014.12.001 (DOI Listing)
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC4289680 (PMC)
January 2015

Evidence after imputation for a role of MICA variants in nonprogression and elite control of HIV type 1 infection.

J Infect Dis 2014 Dec 16;210(12):1946-50. Epub 2014 Jun 16.

Chaire de Bioinformatique, EA4627, Conservatoire National des Arts et Métiers.

Past genome-wide association studies (GWAS) involving individuals with AIDS have mainly identified associations in the HLA region. Using the latest software, we imputed 7 million single-nucleotide polymorphisms (SNPs)/indels of the 1000 Genomes Project from the GWAS-determined genotypes of individuals in the Genomics of Resistance to Immunodeficiency Virus AIDS nonprogression cohort and compared them with those of control cohorts. The strongest signals were in MICA, the gene encoding major histocompatibility class I polypeptide-related sequence A (P = 3.31 × 10⁻¹²), with a particular exonic deletion (P = 1.59 × 10⁻⁸) in full linkage disequilibrium with the reference HCP5 rs2395029 SNP. Haplotype analysis also revealed an additive effect between HLA-C, HLA-B, and MICA variants. These data suggest a role for MICA in progression and elite control of human immunodeficiency virus type 1 infection.

Source
http://dx.doi.org/10.1093/infdis/jiu342 (DOI Listing)
December 2014

A general approach for haplotype phasing across the full spectrum of relatedness.

PLoS Genet 2014 Apr 17;10(4):e1004234. Epub 2014 Apr 17.

Wellcome Trust Centre for Human Genetics, University of Oxford, Oxford, United Kingdom; Department of Statistics, University of Oxford, Oxford, United Kingdom.

Many existing cohorts contain a range of relatedness between genotyped individuals, either by design or by chance. Haplotype estimation in such cohorts is a central step in many downstream analyses. Using genotypes from six cohorts from isolated populations and two cohorts from non-isolated populations, we have investigated the performance of different phasing methods designed for nominally 'unrelated' individuals. We find that SHAPEIT2 produces much lower switch error rates in all cohorts compared to other methods, including those designed specifically for isolated populations. In particular, when large amounts of IBD sharing are present, SHAPEIT2 infers close to perfect haplotypes. Based on these results we have developed a general strategy for phasing cohorts with any level of implicit or explicit relatedness between individuals. First SHAPEIT2 is run ignoring all explicit family information. We then apply a novel HMM method (duoHMM) to combine the SHAPEIT2 haplotypes with any family information to infer the inheritance pattern of each meiosis at all sites across each chromosome. This allows the correction of switch errors, detection of recombination events and genotyping errors. We show that the method detects numbers of recombination events that align very well with expectations based on genetic maps, and that it infers far fewer spurious recombination events than Merlin. The method can also detect genotyping errors and infer recombination events in otherwise uninformative families, such as trios and duos. The detected recombination events can be used in association scans for recombination phenotypes. The method provides a simple and unified approach to haplotype estimation that will be of interest to researchers in the fields of human, animal and plant genetics.
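
The switch error rate used above to compare phasing methods can be illustrated with a short sketch. It assumes the commonly used definition (the fraction of consecutive heterozygous site pairs whose relative phase disagrees with the truth) and is not taken from SHAPEIT2 or duoHMM; the function name and the toy haplotypes are hypothetical.

```python
def switch_error_rate(true_hap, inferred_hap):
    """Switch error rate for one individual.

    Both arguments list the 0/1 alleles carried by one haplotype at the
    individual's heterozygous sites, in genomic order. The other haplotype is
    implied, because at a heterozygous site it carries the opposite allele.
    """
    if len(true_hap) != len(inferred_hap):
        raise ValueError("haplotypes must cover the same heterozygous sites")
    # At each site, does the inferred haplotype carry the same allele as the true one?
    agree = [t == i for t, i in zip(true_hap, inferred_hap)]
    # A switch occurs whenever agreement flips between two consecutive sites
    switches = sum(1 for prev, cur in zip(agree, agree[1:]) if prev != cur)
    opportunities = len(agree) - 1
    return switches / opportunities if opportunities > 0 else 0.0


if __name__ == "__main__":
    truth = [0, 1, 0, 1, 1, 0]
    inferred = [0, 1, 1, 0, 0, 0]
    print(switch_error_rate(truth, inferred))
```

Because only flips in agreement are counted, an inferred haplotype that is the exact complement of the true one scores zero: it describes the same pair of haplotypes with the labels swapped.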

Source
http://dx.doi.org/10.1371/journal.pgen.1004234 (DOI Listing)
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3990520 (PMC)
April 2014

Haplotype estimation using sequencing reads.

Am J Hum Genet 2013 Oct;93(4):687-96

Department of Statistics, University of Oxford, Oxford OX1 3TG, UK.

High-throughput sequencing technologies produce short sequence reads that can contain phase information if they span two or more heterozygote genotypes. This information is not routinely used by current methods that infer haplotypes from genotype data. We have extended the SHAPEIT2 method to use phase-informative sequencing reads to improve phasing accuracy. Our model incorporates the read information in a probabilistic model through base quality scores within each read. The method is primarily designed for high-coverage sequence data or data sets that already have genotypes called. One important application is phasing of single samples sequenced at high coverage for use in medical sequencing and studies of rare diseases. Our method can also use existing panels of reference haplotypes. We tested the method by using a mother-father-child trio sequenced at high coverage by Illumina together with the low-coverage sequence data from the 1000 Genomes Project (1000GP). We found that use of phase-informative reads increases the mean distance between switch errors by 22% from 274.4 kb to 328.6 kb. We also used male chromosome X haplotypes from the 1000GP samples to simulate sequencing reads with varying insert size, read length, and base error rate. When using short 100 bp paired-end reads, we found that using mixtures of insert sizes produced the best results. When using longer reads with high error rates (5-20 kb reads with 4%-15% error per base), phasing performance was substantially improved.

Source
http://dx.doi.org/10.1016/j.ajhg.2013.09.002 (DOI Listing)
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3791270 (PMC)
October 2013
-->