Publications by authors named "Thorfinn Sand Korneliussen"

19 Publications

Extensive genome-wide phylogenetic discordance is due to incomplete lineage sorting and not ongoing introgression in a rapidly radiated bryophyte genus.

Mol Biol Evol 2021 Mar 3. Epub 2021 Mar 3.

Department of Natural History, NTNU University Museum, Norwegian University of Science and Technology, Trondheim, Norway.

The relative importance of introgression for diversification has long been a highly disputed topic in speciation research and remains an open question despite the great attention it has received over the past decade. Gene flow leaves traces in the genome similar to those created by incomplete lineage sorting (ILS), and identification and quantification of gene flow in the presence of ILS is challenging and requires knowledge about the true phylogenetic relationship among the species. We use whole nuclear, plastid and organellar genomes from 12 species in the rapidly radiated, ecologically diverse, actively hybridizing genus of peatmoss (Sphagnum) to reconstruct the species phylogeny and quantify introgression using a suite of phylogenomic methods. We found extensive phylogenetic discordance among nuclear and organellar phylogenies, as well as across the nuclear genome and the nodes in the species tree, best explained by extensive ILS following the rapid radiation of the genus rather than by post-speciation introgression. Our analyses support the idea of ancient introgression among the ancestral lineages followed by ILS, whereas recent gene flow among the species is highly restricted despite widespread interspecific hybridization known in the group. Our results contribute to phylogenomic understanding of how speciation proceeds in rapidly radiated, actively hybridizing species groups, and demonstrate that employing a combination of diverse phylogenomic methods can facilitate untangling complex phylogenetic patterns created by ILS and introgression.

Source
http://dx.doi.org/10.1093/molbev/msab063
March 2021

A reference-free approach to analyse RADseq data using standard next generation sequencing toolkits.

Mol Ecol Resour 2021 May 8;21(4):1085-1097. Epub 2021 Feb 8.

Section for Computational and RNA Biology, Department of Biology, University of Copenhagen, Copenhagen N, Denmark.

Genotyping-by-sequencing methods such as RADseq are popular for generating genomic and population-scale data sets from a diverse range of organisms. These often lack a usable reference genome, restricting users to RADseq-specific software for processing. However, these come with limitations compared to generic next generation sequencing (NGS) toolkits. Here, we describe and test a simple pipeline for reference-free RADseq data processing that blends de novo elements from STACKS with the full suite of state-of-the-art NGS tools. Specifically, we use the de novo RADseq assembly employed by STACKS to create a catalogue of RAD loci that serves as a reference for read mapping, variant calling and site filters. Using RADseq data from 28 zebra sequenced to ~8x depth-of-coverage we evaluate our approach by comparing the site frequency spectra (SFS) to those from alternative pipelines. Most pipelines yielded similar SFS at 8x depth, but only a genotype-likelihood-based pipeline performed similarly at low sequencing depth (2-4x). We compared the RADseq SFS with medium-depth (~13x) shotgun sequencing of eight overlapping samples, revealing that the RADseq SFS was persistently slightly skewed towards rare and invariant alleles. Using simulations and human data we confirm that this is expected when there is allelic dropout (AD) in the RADseq data. AD in the RADseq data caused a heterozygosity deficit of ~16%, which dropped to ~5% after filtering AD. Hence, AD was the most important source of bias in our RADseq data.
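
As a rough illustration of the site frequency spectrum (SFS) that the abstract above uses to compare pipelines, the sketch below tabulates a folded SFS from called diploid genotypes. The function name and the genotype encoding (0, 1 or 2 copies of the alternate allele per individual) are assumptions made for this example; it is not the pipeline from the paper. Since the abstract reports that only a genotype-likelihood-based pipeline held up at 2-4x depth, a called-genotype tabulation like this one is best read as a sketch for higher-depth data.

```python
def folded_sfs(genotypes):
    """Tabulate a folded site frequency spectrum.

    genotypes: list of sites; each site is a list with one entry per diploid
    individual, giving the number of alternate alleles (0, 1 or 2).
    Returns a list where entry k is the number of sites whose minor allele
    is observed k times (entry 0 counts invariant sites).
    """
    if not genotypes:
        return []
    n_chrom = 2 * len(genotypes[0])          # chromosomes sampled per site
    sfs = [0] * (n_chrom // 2 + 1)
    for site in genotypes:
        if 2 * len(site) != n_chrom:
            raise ValueError("every site needs the same number of individuals")
        alt_count = sum(site)
        # Folding: count the rarer allele, whichever one it is.
        minor_count = min(alt_count, n_chrom - alt_count)
        sfs[minor_count] += 1
    return sfs


if __name__ == "__main__":
    # Hypothetical toy data: five sites, two individuals.
    example_sites = [[0, 0], [1, 0], [2, 2], [1, 1], [2, 1]]
    print(folded_sfs(example_sites))  # [2, 2, 1]
```

Allelic dropout, as described in the abstract, would make some true heterozygotes appear as homozygotes in the input, shifting counts towards the low-frequency and invariant bins of such a spectrum.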

Source
http://dx.doi.org/10.1111/1755-0998.13324
May 2021

Targeted conservation genetics of the endangered chimpanzee.

Heredity (Edinb) 2020 Aug 28;125(1-2):15-27. Epub 2020 Apr 28.

Research and Conservation, Copenhagen Zoo, Roskildevej 38, 2000, Frederiksberg, Denmark.

Populations of the common chimpanzee (Pan troglodytes) are at imminent risk of going extinct in the wild as a consequence of damaging anthropogenic impact on their natural habitat and illegal pet and bushmeat trade. Conservation management programmes for the chimpanzee have been established outside their natural range (ex situ), and chimpanzees from these programmes could potentially be used to supplement future conservation initiatives in the wild (in situ). However, these programmes have often suffered from inadequate information about the geographical origin and subspecies ancestry of the founders. Here, we present a newly designed capture array with ~60,000 ancestry informative markers used to infer ancestry of individual chimpanzees in ex situ populations and determine geographical origin of confiscated sanctuary individuals. From a test panel of 167 chimpanzees with unknown origins or subspecies labels, we identify 90 suitable non-admixed individuals in the European Association of Zoos and Aquaria (EAZA) Ex situ Programme (EEP). Equally important, another 46 individuals have been identified with admixed subspecies ancestries, which should therefore, over time, be naturally phased out of the breeding populations. With potential for future re-introduction to the wild, we determine the geographical origin of 31 individuals that were confiscated from the illegal trade and demonstrate the promise of using non-invasive sampling in future conservation action plans. Collectively, our genomic approach provides an exemplar for ex situ management of endangered species and offers an efficient tool in future in situ efforts to combat the illegal wildlife trade.

Source
http://dx.doi.org/10.1038/s41437-020-0313-0
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC7413263
August 2020

A likelihood method for estimating present-day human contamination in ancient male samples using low-depth X-chromosome data.

Bioinformatics 2020 02;36(3):828-841

Department of Computational Biology, University of Lausanne, 1015 Lausanne, Switzerland.

Motivation: The presence of present-day human contaminating DNA fragments is one of the challenges defining ancient DNA (aDNA) research. This is especially relevant to the ancient human DNA field where it is difficult to distinguish endogenous molecules from human contaminants due to their genetic similarity. Recently, with the advent of high-throughput sequencing and new aDNA protocols, hundreds of ancient human genomes have become available. Contamination in those genomes has been measured with computational methods often developed specifically for these empirical studies. Consequently, some of these methods have not been implemented and tested for general use while few are aimed at low-depth nuclear data, a common feature in aDNA datasets.

Results: We develop a new X-chromosome-based maximum likelihood method for estimating present-day human contamination in low-depth sequencing data from male individuals. We implement our method for general use, assess its performance under conditions typical of ancient human DNA research, and compare it to previous nuclear data-based methods through extensive simulations. For low-depth data, we show that existing methods can produce unusable estimates or substantially underestimate contamination. In contrast, our method provides accurate estimates for a depth of coverage as low as 0.5× on the X-chromosome when contamination is below 25%. Moreover, our method still yields meaningful estimates in very challenging situations, i.e. when the contaminant and the target come from closely related populations or with increased error rates. With a running time below 5 min, our method is applicable to large scale aDNA genomic studies.

Availability and Implementation: The method is implemented in C++ and R and is available at github.com/sapfo/contaminationX and popgen.dk/angsd.

Source
http://dx.doi.org/10.1093/bioinformatics/btz660
February 2020

Joint Estimates of Heterozygosity and Runs of Homozygosity for Modern and Ancient Samples.

Genetics 2019 07 14;212(3):587-614. Epub 2019 May 14.

Lundbeck Foundation GeoGenetics Center, Globe Institute, University of Copenhagen, 1350K, Denmark.

Both the total amount and the distribution of heterozygous sites within individual genomes are informative about the genetic diversity of the population they belong to. Detecting true heterozygous sites in ancient genomes is complicated by the generally limited coverage achieved and the presence of post-mortem damage inflating sequencing errors. Additionally, large runs of homozygosity found in the genomes of particularly inbred individuals and of domestic animals can skew estimates of genome-wide heterozygosity rates. Current computational tools aimed at estimating runs of homozygosity and genome-wide heterozygosity levels are generally sensitive to such limitations. Here, we introduce ROHan, a probabilistic method which substantially improves the estimate of heterozygosity rates both genome-wide and for genomic local windows. It combines a local Bayesian model and a Hidden Markov Model at the genome-wide level and can work both on modern and ancient samples. We show that our algorithm outperforms currently available methods for predicting heterozygosity rates for ancient samples. Specifically, ROHan can delineate large runs of homozygosity (at megabase scales) and produce a reliable confidence interval for the genome-wide rate of heterozygosity outside of such regions from modern genomes with a depth of coverage as low as 5-6× and down to 7-8× for ancient samples showing moderate DNA damage. We apply ROHan to a series of modern and ancient genomes previously published and revise available estimates of heterozygosity for humans, chimpanzees and horses.

Source
http://dx.doi.org/10.1534/genetics.119.302057
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6614887
July 2019

Fast and accurate relatedness estimation from high-throughput sequencing data in the presence of inbreeding.

Gigascience 2019 05;8(5)

Centre for GeoGenetics, Natural History Museum of Denmark, University of Copenhagen, 1350 Copenhagen K, Denmark.

Background: The estimation of relatedness between pairs of possibly inbred individuals from high-throughput sequencing (HTS) data has previously not been possible for samples where we cannot obtain reliable genotype calls, as in the case of low-coverage data.

Results: We introduce ngsRelateV2, a major revision of ngsRelateV1, a program that originally allowed for estimation of relatedness from HTS data among non-inbred individuals only. The new revised version takes into account the possibility of individuals being inbred by estimating the 9 condensed Jacquard coefficients along with various other relatedness statistics. The program is threaded and scales linearly with the number of cores allocated to the process.

Conclusion: The program is available as an open source C/C++ program under the GPL license and hosted at https://github.com/ANGSD/ngsRelate. To facilitate easy analysis, the program is able to work directly on the most commonly used container formats for raw sequence (BAM/CRAM) and summary data (VCF/BCF).
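
The nine condensed Jacquard coefficients mentioned in the Results above can be collapsed into more familiar summary statistics. The sketch below uses the standard textbook relations for the kinship coefficient and the two individual inbreeding coefficients, assuming the conventional ordering of the nine identity states (J1 to J9, with J1 to J4 the states in which the first individual is autozygous). The function name and input layout are hypothetical and are not taken from ngsRelate or its output format.

```python
def summarize_jacquard(j):
    """Derive kinship and inbreeding coefficients from J1..J9.

    j: sequence of nine non-negative numbers summing to 1, in the
    conventional Jacquard ordering (assumed here, not read from any file).
    """
    if len(j) != 9:
        raise ValueError("expected nine coefficients")
    if any(x < 0 for x in j) or abs(sum(j) - 1.0) > 1e-6:
        raise ValueError("coefficients must be non-negative and sum to 1")
    j1, j2, j3, j4, j5, j6, j7, j8, j9 = j
    return {
        # Probability that two alleles, one drawn from each individual, are IBD.
        "kinship": j1 + 0.5 * (j3 + j5 + j7) + 0.25 * j8,
        # Probability that the two alleles within an individual are IBD.
        "inbreeding_a": j1 + j2 + j3 + j4,
        "inbreeding_b": j1 + j2 + j5 + j6,
    }


if __name__ == "__main__":
    # Non-inbred full siblings: J7 = 0.25, J8 = 0.5, J9 = 0.25.
    print(summarize_jacquard([0, 0, 0, 0, 0, 0, 0.25, 0.5, 0.25]))
```

For non-inbred pairs only J7, J8 and J9 can be non-zero, which is the three-coefficient setting the abstract attributes to the original ngsRelateV1.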

Source
http://dx.doi.org/10.1093/gigascience/giz034
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6488770
May 2019

NgsRelate: a software tool for estimating pairwise relatedness from next-generation sequencing data.

Bioinformatics 2015 Dec 30;31(24):4009-11. Epub 2015 Aug 30.

Department of Biology, University of Copenhagen, 2200 Copenhagen, Denmark.

Motivation: Pairwise relatedness estimation is important in many contexts such as disease mapping and population genetics. However, all existing estimation methods are based on called genotypes, which is not ideal for next-generation sequencing (NGS) data of low depth from which genotypes cannot be called with high certainty.

Results: We present a software tool, NgsRelate, for estimating pairwise relatedness from NGS data. It provides maximum likelihood estimates that are based on genotype likelihoods instead of genotypes and thereby takes the inherent uncertainty of the genotypes into account. Using both simulated and real data, we show that NgsRelate provides markedly better estimates for low-depth NGS data than two state-of-the-art genotype-based methods.
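
To make concrete why genotype likelihoods carry more information than called genotypes at low depth, here is a minimal sketch of a genotype likelihood calculation at a single biallelic site. It assumes a deliberately simplified error model in which a sequencing error always flips a read to the other of the two alleles, and a Hardy-Weinberg prior for the posterior. The function names and the default error rate are assumptions for illustration and do not describe the model implemented in NgsRelate.

```python
def genotype_likelihoods(n_ref, n_alt, error=0.01):
    """Likelihood of the observed reads for genotypes with 0, 1, 2 alt alleles.

    Simplified model: each read samples one of the two chromosomes at random
    and is flipped to the other allele with probability `error`.
    The binomial coefficient is omitted since it is identical for all genotypes.
    """
    likes = []
    for g in (0, 1, 2):
        p_alt = (g / 2.0) * (1.0 - error) + (1.0 - g / 2.0) * error
        likes.append((p_alt ** n_alt) * ((1.0 - p_alt) ** n_ref))
    return likes


def genotype_posteriors(likes, alt_freq):
    """Posterior genotype probabilities under a Hardy-Weinberg prior."""
    priors = [(1 - alt_freq) ** 2, 2 * alt_freq * (1 - alt_freq), alt_freq ** 2]
    weighted = [l * p for l, p in zip(likes, priors)]
    total = sum(weighted)
    if total == 0:
        raise ValueError("data have zero probability under this prior")
    return [w / total for w in weighted]


if __name__ == "__main__":
    # A single read carrying the alternate allele: a heterozygote and an
    # alt homozygote remain plausible, so a hard genotype call would be unsafe.
    likes = genotype_likelihoods(n_ref=0, n_alt=1)
    print(likes)
    print(genotype_posteriors(likes, alt_freq=0.2))
```

With one read, the likelihoods of the heterozygous and alt-homozygous genotypes differ by less than a factor of two, which is the kind of uncertainty a likelihood-based estimator can propagate rather than discard.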

Availability: NgsRelate is implemented in C++ and is available under the GNU license at www.popgen.dk/software.

Source
http://dx.doi.org/10.1093/bioinformatics/btv509
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC4673978
December 2015

The ancestry and affiliations of Kennewick Man.

Nature 2015 Jul;523(7561):455-458

Centre for GeoGenetics, Natural History Museum of Denmark, University of Copenhagen, Øster Voldgade 5-7, DK-1350 Copenhagen K, Denmark.

Kennewick Man, referred to as the Ancient One by Native Americans, is a male human skeleton discovered in Washington state (USA) in 1996 and initially radiocarbon dated to 8,340-9,200 calibrated years before present (BP). His population affinities have been the subject of scientific debate and legal controversy. Based on an initial study of cranial morphology it was asserted that Kennewick Man was neither Native American nor closely related to the claimant Plateau tribes of the Pacific Northwest, who claimed ancestral relationship and requested repatriation under the Native American Graves Protection and Repatriation Act (NAGPRA). The morphological analysis was important to judicial decisions that Kennewick Man was not Native American and that therefore NAGPRA did not apply. Instead of repatriation, additional studies of the remains were permitted. Subsequent craniometric analysis affirmed Kennewick Man to be more closely related to circumpacific groups such as the Ainu and Polynesians than he is to modern Native Americans. In order to resolve Kennewick Man's ancestry and affiliations, we have sequenced his genome to ∼1× coverage and compared it to worldwide genomic data including for the Ainu and Polynesians. We find that Kennewick Man is closer to modern Native Americans than to any other population worldwide. Among the Native American groups for whom genome-wide data are available for comparison, several seem to be descended from a population closely related to that of Kennewick Man, including the Confederated Tribes of the Colville Reservation (Colville), one of the five tribes claiming Kennewick Man. We revisit the cranial analyses and find that, as opposed to genome-wide comparisons, it is not possible on that basis to affiliate Kennewick Man to specific contemporary groups. We therefore conclude based on genetic comparisons that Kennewick Man shows continuity with Native North Americans over at least the last eight millennia.

Source
http://dx.doi.org/10.1038/nature14625
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC4878456
July 2015

A recent bottleneck of Y chromosome diversity coincides with a global change in culture.

Genome Res 2015 Apr 13;25(4):459-66. Epub 2015 Mar 13.

Center of Molecular Diagnosis and Genetic Research, University Hospital of Obstetrics and Gynecology, Tirana, ALB1005, Albania.

It is commonly thought that human genetic diversity in non-African populations was shaped primarily by an out-of-Africa dispersal 50-100 thousand yr ago (kya). Here, we present a study of 456 geographically diverse high-coverage Y chromosome sequences, including 299 newly reported samples. Applying ancient DNA calibration, we date the Y-chromosomal most recent common ancestor (MRCA) in Africa at 254 (95% CI 192-307) kya and detect a cluster of major non-African founder haplogroups in a narrow time interval at 47-52 kya, consistent with a rapid initial colonization model of Eurasia and Oceania after the out-of-Africa bottleneck. In contrast to demographic reconstructions based on mtDNA, we infer a second strong bottleneck in Y-chromosome lineages dating to the last 10 ky. We hypothesize that this bottleneck is caused by cultural changes affecting variance of reproductive success among males.

Source
http://dx.doi.org/10.1101/gr.186684.114
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC4381518
April 2015

The origin and evolution of maize in the Southwestern United States.

Nat Plants 2015 Jan 8;1:14003. Epub 2015 Jan 8.

Centre for GeoGenetics, University of Copenhagen, 1350 Copenhagen, Denmark.

The origin of maize (Zea mays mays) in the US Southwest remains contentious, with conflicting archaeological data supporting either coastal(1-4) or highland(5,6) routes of diffusion of maize into the United States. Furthermore, the genetics of adaptation to the new environmental and cultural context of the Southwest is largely uncharacterized(7). To address these issues, we compared nuclear DNA from 32 archaeological maize samples spanning 6,000 years of evolution to modern landraces. We found that the initial diffusion of maize into the Southwest about 4,000 years ago is likely to have occurred along a highland route, followed by gene flow from a lowland coastal maize beginning at least 2,000 years ago. Our population genetic analysis also enabled us to differentiate selection during domestication for adaptation to the climatic and cultural environment of the Southwest, identifying adaptation loci relevant to drought tolerance and sugar content.

Source
http://dx.doi.org/10.1038/nplants.2014.3
January 2015

ANGSD: Analysis of Next Generation Sequencing Data.

BMC Bioinformatics 2014 Nov 25;15:356. Epub 2014 Nov 25.

Centre for GeoGenetics, Natural History Museum of Denmark, Copenhagen, Denmark.

Background: High-throughput DNA sequencing technologies are generating vast amounts of data. Fast, flexible and memory efficient implementations are needed in order to facilitate analyses of thousands of samples simultaneously.

Results: We present a multithreaded program suite called ANGSD. This program can calculate various summary statistics, and perform association mapping and population genetic analyses utilizing the full information in next generation sequencing data by working directly on the raw sequencing data or by using genotype likelihoods.

Conclusions: The open source C/C++ program ANGSD is available at http://www.popgen.dk/angsd. The program is tested and validated on GNU/Linux systems. The program supports multiple input formats including BAM and imputed beagle genotype probability files. The program allows the user to choose between combinations of existing methods and can perform analyses that are not implemented elsewhere.

Source
http://dx.doi.org/10.1186/s12859-014-0356-4
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC4248462
November 2014

Population genomics reveal recent speciation and rapid evolutionary adaptation in polar bears.

Cell 2014 May;157(4):785-94

BGI-Shenzhen, Shenzhen 518083, China; Department of Biology, University of Copenhagen, Ole Maaløes Vej 5, 2200 Copenhagen Ø, Denmark; Princess Al Jawhara Center of Excellence in the Research of Hereditary Disorders, King Abdulaziz University, Jeddah 21589, Saudi Arabia; Macau University of Science and Technology, Avenida Wai Long, Taipa, Macau 999078, China; Department of Medicine, University of Hong Kong, Sassoon Road, Pokfulam, Hong Kong.

Polar bears are uniquely adapted to life in the High Arctic and have undergone drastic physiological changes in response to Arctic climates and a hyper-lipid diet of primarily marine mammal prey. We analyzed 89 complete genomes of polar bear and brown bear using population genomic modeling and show that the species diverged only 479-343 thousand years BP. We find that genes on the polar bear lineage have been under stronger positive selection than in brown bears; nine of the top 16 genes under strong positive selection are associated with cardiomyopathy and vascular disease, implying important reorganization of the cardiovascular system. One of the genes showing the strongest evidence of selection, APOB, encodes the primary lipoprotein component of low-density lipoprotein (LDL); functional mutations in APOB may explain how polar bears are able to cope with life-long elevated LDL levels that are associated with high risk of heart disease in humans.

Source
http://dx.doi.org/10.1016/j.cell.2014.03.054
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC4089990
May 2014

The genome of a Late Pleistocene human from a Clovis burial site in western Montana.

Nature 2014 Feb;506(7487):225-9

Center for Biological Sequence Analysis, Department of Systems Biology, Technical University of Denmark, Kemitorvet 208, Kgs. Lyngby DK-2800, Denmark.

Clovis, with its distinctive biface, blade and osseous technologies, is the oldest widespread archaeological complex defined in North America, dating from 11,100 to 10,700 ¹⁴C years before present (bp) (13,000 to 12,600 calendar years bp). Nearly 50 years of archaeological research point to the Clovis complex as having developed south of the North American ice sheets from an ancestral technology. However, both the origins and the genetic legacy of the people who manufactured Clovis tools remain under debate. It is generally believed that these people ultimately derived from Asia and were directly related to contemporary Native Americans. An alternative, Solutrean, hypothesis posits that the Clovis predecessors emigrated from southwestern Europe during the Last Glacial Maximum. Here we report the genome sequence of a male infant (Anzick-1) recovered from the Anzick burial site in western Montana. The human bones date to 10,705 ± 35 ¹⁴C years bp (approximately 12,707-12,556 calendar years bp) and were directly associated with Clovis tools. We sequenced the genome to an average depth of 14.4× and show that the gene flow from the Siberian Upper Palaeolithic Mal'ta population into Native American ancestors is also shared by the Anzick-1 individual and thus happened before 12,600 years bp. We also show that the Anzick-1 individual is more closely related to all indigenous American populations than to any other group. Our data are compatible with the hypothesis that Anzick-1 belonged to a population directly ancestral to many contemporary Native Americans. Finally, we find evidence of a deep divergence in Native American populations that predates the Anzick-1 individual.

Source
http://dx.doi.org/10.1038/nature13025
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC4878442
February 2014

Calculation of Tajima's D and other neutrality test statistics from low depth next-generation sequencing data.

BMC Bioinformatics 2013 Oct 2;14:289. Epub 2013 Oct 2.

Centre for GeoGenetics, Natural History Museum of Denmark, University of Copenhagen, Oestervoldgade 5-7, DK-1350, Copenhagen, Denmark.

Background: A number of different statistics are used for detecting natural selection using DNA sequencing data, including statistics that are summaries of the frequency spectrum, such as Tajima's D. These statistics are now often being applied in the analysis of Next Generation Sequencing (NGS) data. However, estimates of frequency spectra from NGS data are strongly affected by low sequencing coverage; the inherent technology dependent variation in sequencing depth causes systematic differences in the value of the statistic among genomic regions.

Results: We have developed an approach that accommodates the uncertainty of the data when calculating site frequency based neutrality test statistics. A salient feature of this approach is that it implicitly solves the problems of varying sequencing depth and missing data, and avoids the need to infer variable sites for the analysis, thereby avoiding ascertainment problems introduced by a SNP discovery process.

Conclusion: Using an empirical Bayes approach for fast computations, we show that this method produces results for low-coverage NGS data comparable to those achieved when the genotypes are known without uncertainty. We also validate the method in an analysis of data from the 1000 genomes project. The method is implemented in a fast framework which enables researchers to perform these neutrality tests on a genome-wide scale.
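
For readers unfamiliar with the statistic named in the title above, the sketch below computes the classical Tajima's D from an unfolded site frequency spectrum of known genotypes, using the standard Tajima (1989) constants. It is a plain textbook calculation offered for orientation only: it does not implement the empirical Bayes approach for low-depth data described in the abstract, and the function name and input layout are assumptions.

```python
import math


def tajimas_d(sfs, n):
    """Classical Tajima's D from an unfolded SFS.

    sfs: list of length n - 1; sfs[i - 1] is the number of segregating sites
         at which the derived allele is seen i times among n chromosomes.
    n:   number of sampled chromosomes (n >= 4 for a usable variance).
    """
    if len(sfs) != n - 1:
        raise ValueError("sfs must have n - 1 entries")
    seg_sites = sum(sfs)
    if seg_sites == 0:
        return 0.0  # no variation: D is undefined, reported as 0 in this sketch
    a1 = sum(1.0 / i for i in range(1, n))
    a2 = sum(1.0 / (i * i) for i in range(1, n))
    # Mean number of pairwise differences (theta_pi).
    pi = sum(2.0 * i * (n - i) * c for i, c in zip(range(1, n), sfs)) / (n * (n - 1))
    # Watterson's estimator (theta_W).
    theta_w = seg_sites / a1
    b1 = (n + 1) / (3.0 * (n - 1))
    b2 = 2.0 * (n * n + n + 3) / (9.0 * n * (n - 1))
    c1 = b1 - 1.0 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / (a1 * a1)
    e1 = c1 / a1
    e2 = c2 / (a1 * a1 + a2)
    variance = e1 * seg_sites + e2 * seg_sites * (seg_sites - 1)
    return (pi - theta_w) / math.sqrt(variance)


if __name__ == "__main__":
    # Counts proportional to 1/i (the neutral expectation) give D close to 0;
    # an excess of singletons gives a negative D.
    print(tajimas_d([12, 6, 4, 3], n=5))
    print(tajimas_d([20, 0, 0, 0], n=5))
```

Because both estimators are sums over the frequency spectrum, any bias in the spectrum from low or uneven depth propagates directly into D, which is the problem the abstract sets out to address.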

Source
http://dx.doi.org/10.1186/1471-2105-14-289
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC4015034
October 2013

Estimating individual admixture proportions from next generation sequencing data.

Genetics 2013 Nov 11;195(3):693-702. Epub 2013 Sep 11.

The Bioinformatics Centre, Department of Biology, University of Copenhagen, DK-2200 Copenhagen N.

Inference of population structure and individual ancestry is important both for population genetics and for association studies. With next generation sequencing technologies it is possible to obtain genetic data for all accessible genetic variations in the genome. Existing methods for admixture analysis rely on known genotypes. However, individual genotypes cannot be inferred from low-depth sequencing data without introducing errors. This article presents a new method for inferring an individual's ancestry that takes the uncertainty introduced in next generation sequencing data into account. This is achieved by working directly with genotype likelihoods that contain all relevant information of the unobserved genotypes. Using simulations as well as publicly available sequencing data, we demonstrate that the presented method has great accuracy even for very low-depth data. At the same time, we demonstrate that applying existing methods to genotypes called from the same data can introduce severe biases. The presented method is implemented in the NGSadmix software available at http://www.popgen.dk/software.

Source
http://dx.doi.org/10.1534/genetics.113.154138
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3813857
November 2013

Quantifying population genetic differentiation from next-generation sequencing data.

Genetics 2013 Nov 26;195(3):979-92. Epub 2013 Aug 26.

Department of Integrative Biology, University of California, Berkeley, California 94720.

Over the past few years, new high-throughput DNA sequencing technologies have dramatically increased speed and reduced sequencing costs. However, the use of these sequencing technologies is often challenged by errors and biases associated with the bioinformatical methods used for analyzing the data. In particular, the use of naïve methods to identify polymorphic sites and infer genotypes can inflate downstream analyses. Recently, explicit modeling of genotype probability distributions has been proposed as a method for taking genotype call uncertainty into account. Based on this idea, we propose a novel method for quantifying population genetic differentiation from next-generation sequencing data. In addition, we present a strategy for investigating population structure via principal components analysis. Through extensive simulations, we compare the new method herein proposed to approaches based on genotype calling and demonstrate a marked improvement in estimation accuracy for a wide range of conditions. We apply the method to a large-scale genomic data set of domesticated and wild silkworms sequenced at low coverage. We find that we can infer the fine-scale genetic structure of the sampled individuals, suggesting that employing this new method is useful for investigating the genetic relationships of populations sampled at low coverage.

Source
http://dx.doi.org/10.1534/genetics.113.154740
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3813878
November 2013

Association testing for next-generation sequencing data using score statistics.

Genet Epidemiol 2012 Jul 8;36(5):430-7. Epub 2012 May 8.

Department of Biology, University of Copenhagen, Copenhagen, Denmark.

The advances in sequencing technology have made large-scale sequencing studies for large cohorts feasible. Often, the primary goal for large-scale studies is to identify genetic variants associated with a disease or other phenotypes. Even when deep sequencing is performed, there will be many sites where there is not enough data to call genotypes accurately. Ignoring the genotype classification uncertainty by basing subsequent analyses on called genotypes leads to a loss in power. Additionally, using called genotypes can lead to spurious association signals. Some methods taking the uncertainty of genotype calls into account have been proposed; most require numerical optimization which for large-scale data is not always computationally feasible. We show that using a score statistic for the joint likelihood of observed phenotypes and observed sequencing data provides an attractive approach to association testing for next-generation sequencing data. The joint model accounts for the genotype classification uncertainty via the posterior probabilities of the genotypes given the observed sequencing data, which gives the approach higher power than methods based on called genotypes. This strategy remains computationally feasible due to the use of score statistics. As part of the joint likelihood, we model the distribution of the phenotypes using a generalized linear model framework, which works for both quantitative and discrete phenotypes. Thus, the method presented here is applicable to case-control studies as well as mapping of quantitative traits. The model allows additional covariates that enable correction for confounding factors such as population stratification or cohort effects.
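
The general shape of a score test that uses genotype posteriors rather than called genotypes can be sketched as follows. This is a heavily simplified stand-in, not the statistic derived in the paper: it assumes a quantitative trait, an intercept-only null model with no covariates, and it simply substitutes the expected genotype (dosage) computed from the posteriors into the usual linear-model score test. The function name and input layout are hypothetical.

```python
import math


def dosage_score_test(phenotypes, genotype_posteriors):
    """Score test of 'no association' using expected genotypes.

    phenotypes:          list of quantitative trait values, one per individual.
    genotype_posteriors: list of (p0, p1, p2) tuples giving the posterior
                         probability of carrying 0, 1 or 2 alternate alleles.
    Returns (statistic, p_value); the statistic is compared to chi-square, 1 df.
    """
    n = len(phenotypes)
    if n == 0 or n != len(genotype_posteriors):
        raise ValueError("need one posterior triple per phenotype")
    # Expected number of alternate alleles given the sequencing data.
    dosages = [p1 + 2.0 * p2 for (_p0, p1, p2) in genotype_posteriors]
    y_mean = sum(phenotypes) / n
    d_mean = sum(dosages) / n
    # Score for the genotype effect, evaluated under the null (effect = 0).
    score = sum((y - y_mean) * d for y, d in zip(phenotypes, dosages))
    sigma2 = sum((y - y_mean) ** 2 for y in phenotypes) / n
    variance = sigma2 * sum((d - d_mean) ** 2 for d in dosages)
    if variance == 0.0:
        return 0.0, 1.0  # no variation in trait or dosage: nothing to test
    statistic = score * score / variance
    # Upper tail of a chi-square distribution with one degree of freedom.
    p_value = math.erfc(math.sqrt(statistic / 2.0))
    return statistic, p_value
```

No numerical optimization is needed because everything is evaluated under the null model, which is the computational advantage the abstract attributes to score statistics. Covariates and discrete phenotypes, which the abstract handles through a generalized linear model, are outside the scope of this sketch.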

Source
http://dx.doi.org/10.1002/gepi.21636
July 2012

Sequencing of 50 human exomes reveals adaptation to high altitude.

Science 2010 Jul;329(5987):75-8

BGI-Shenzhen, Shenzhen 518083, China.

Residents of the Tibetan Plateau show heritable adaptations to extreme altitude. We sequenced 50 exomes of ethnic Tibetans, encompassing coding sequences of 92% of human genes, with an average coverage of 18x per individual. Genes showing population-specific allele frequency changes, which represent strong candidates for altitude adaptation, were identified. The strongest signal of natural selection came from endothelial Per-Arnt-Sim (PAS) domain protein 1 (EPAS1), a transcription factor involved in response to hypoxia. One single-nucleotide polymorphism (SNP) at EPAS1 shows a 78% frequency difference between Tibetan and Han samples, representing the fastest allele frequency change observed at any human gene to date. This SNP's association with erythrocyte abundance supports the role of EPAS1 in adaptation to hypoxia. Thus, a population genomic survey has revealed a functionally important locus in genetic adaptation to high altitude.

Source
http://dx.doi.org/10.1126/science.1190371
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3711608
July 2010

Relatedness mapping and tracts of relatedness for genome-wide data in the presence of linkage disequilibrium.

Genet Epidemiol 2009 Apr;33(3):266-74

Department of Biostatistics, Copenhagen University, Copenhagen, Denmark.

Estimates of relatedness have several applications, such as identifying relatives or identifying disease-related genes through identity by descent (IBD) mapping. Here we present a new method for identifying IBD tracts among individuals from genome-wide single nucleotide polymorphism data. We use a continuous time Markov model where the hidden states are the number of alleles shared IBD between pairs of individuals at a given position. In contrast to previous methods, our method accurately accounts for linkage disequilibrium using pairwise haplotype probabilities. The method provides a map of the local relatedness along the genome. We illustrate the potential of the method for mapping disease genes on a real data set, and show that the method has the potential to map causative disease mutations using only a handful of affected individuals. The new IBD mapping method provides considerable improvement in mapping power in natural populations compared to standard association mapping methods.

Source
http://dx.doi.org/10.1002/gepi.20378
April 2009