Publications by authors named "Simon Gravel"

44 Publications

Recent shifts in the genomic ancestry of Mexican Americans may alter the genetic architecture of biomedical traits.

Elife 2020 12 29;9. Epub 2020 Dec 29.

Department of Bioengineering and Therapeutic Sciences, University of California, San Francisco, San Francisco, United States.

People in the Americas represent a diverse continuum of populations with varying degrees of admixture among African, European, and Amerindigenous ancestries. In the United States, populations with non-European ancestry remain understudied, and thus little is known about the genetic architecture of phenotypic variation in these populations. Using genotype data from the Hispanic Community Health Study/Study of Latinos, we find that Amerindigenous ancestry in Mexican Americans increased by an average of ~20% between the 1940s and the 1990s. These patterns result from complex interactions between several population and cultural factors that shaped patterns of genetic variation and influenced the genetic architecture of complex traits in Mexican Americans. We show for height how polygenic risk scores based on summary statistics from a European-based genome-wide association study perform poorly in Mexican Americans. Our findings reveal temporal changes in population structure within Hispanics/Latinos that may influence biomedical traits, demonstrating a need to improve our understanding of admixed populations.
Source
DOI: http://dx.doi.org/10.7554/eLife.56029
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC7771964
December 2020

A review of UMAP in population genetics.

J Hum Genet 2021 Jan 14;66(1):85-91. Epub 2020 Oct 14.

Department of Human Genetics, McGill University, Montreal, QC, Canada.

Uniform manifold approximation and projection (UMAP) has been rapidly adopted by the population genetics community to study population structure. It has become common in visualizing the ancestral composition of human genetic datasets, as well as searching for unique clusters of data, and for identifying geographic patterns. Here we give an overview of applications of UMAP in population genetics, provide recommendations for best practices, and offer insights on optimal uses for the technique.
Source
DOI: http://dx.doi.org/10.1038/s10038-020-00851-4
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC7728596
January 2021

Lessons Learned from Bugs in Models of Human History.

Am J Hum Genet 2020 10;107(4):583-588

Big Data Institute, Li Ka Shing Centre for Health Information and Discovery, University of Oxford, Oxford OX3 7LF, UK.

Simulation plays a central role in population genomics studies. Recent years have seen rapid improvements in software efficiency that make it possible to simulate large genomic regions for many individuals sampled from large numbers of populations. As the complexity of the demographic models we study grows, however, there is an ever-increasing opportunity to introduce bugs in their implementation. Here, we describe two errors made in defining population genetic models using the msprime coalescent simulator that have found their way into the published record. We discuss how these errors have affected downstream analyses and give recommendations for software developers and users to reduce the risk of such errors.
Source
DOI: http://dx.doi.org/10.1016/j.ajhg.2020.08.017
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC7536610
October 2020

A community-maintained standard library of population genetic models.

Elife 2020 06 23;9. Epub 2020 Jun 23.

Department of Biology and Institute of Ecology and Evolution, University of Oregon, Eugene, United States.

The explosion in population genomic data demands ever more complex modes of analysis, and increasingly, these analyses depend on sophisticated simulations. Recent advances in population genetic simulation have made it possible to simulate large and complex models, but specifying such models for a particular simulation engine remains a difficult and error-prone task. Computational genetics researchers currently re-implement simulation models independently, leading to inconsistency and duplication of effort. This situation presents a major barrier to empirical researchers seeking to use simulations for power analyses of upcoming studies or sanity checks on existing genomic data. Population genetics, as a field, also lacks standard benchmarks by which new tools for inference might be measured. Here, we describe a new resource, stdpopsim, that attempts to rectify this situation. Stdpopsim is a community-driven open source project, which provides easy access to a growing catalog of published simulation models from a range of organisms and supports multiple simulation engine backends. This resource is available as a well-documented python library with a simple command-line interface. We share some examples demonstrating how stdpopsim can be used to systematically compare demographic inference methods, and we encourage a broader community of developers to contribute to this growing resource.
Source
DOI: http://dx.doi.org/10.7554/eLife.54967
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC7438115
June 2020

Accounting for long-range correlations in genome-wide simulations of large cohorts.

PLoS Genet 2020 05 5;16(5):e1008619. Epub 2020 May 5.

McGill University and Genome Québec Innovation Centre, McGill University, Montréal, Québec, Canada.

Coalescent simulations are widely used to examine the effects of evolution and demographic history on the genetic makeup of populations. Thanks to recent progress in algorithms and data structures, simulators such as the widely-used msprime now provide genome-wide simulations for millions of individuals. However, this software relies on classic coalescent theory and its assumptions that sample sizes are small and that the region being simulated is short. Here we show that coalescent simulations of long regions of the genome exhibit large biases in identity-by-descent (IBD), long-range linkage disequilibrium (LD), and ancestry patterns, particularly when the sample size is large. We present a Wright-Fisher extension to msprime, and show that it produces more realistic distributions of IBD, LD, and ancestry proportions, while also addressing more subtle biases of the coalescent. Further, these extensions are more computationally efficient than state-of-the-art coalescent simulations when simulating long regions, including whole-genome data. For shorter regions, efficiency can be maintained via a hybrid model which simulates the recent past under the Wright-Fisher model and uses coalescent simulations in the distant past.
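
To make the idea of a discrete-generation, backward-in-time Wright-Fisher model concrete, here is a toy sketch. It is not the msprime extension described above: it assumes a haploid population of constant size `n_pop`, tracks only two lineages, and the function name `wf_pairwise_coalescence_time` is hypothetical.

```python
import random

def wf_pairwise_coalescence_time(n_pop, rng):
    """Generations until two lineages share a parent in a haploid
    Wright-Fisher population of constant size n_pop (toy assumption)."""
    generations = 0
    while True:
        generations += 1
        # Each lineage picks its parent uniformly at random from the
        # previous generation; they coalesce if they pick the same one.
        if rng.randrange(n_pop) == rng.randrange(n_pop):
            return generations

rng = random.Random(1)
n_pop = 100
times = [wf_pairwise_coalescence_time(n_pop, rng) for _ in range(5000)]
mean_time = sum(times) / len(times)
# The waiting time is geometric with success probability 1/n_pop,
# so the mean should be close to n_pop generations.
print(mean_time)
```

In this toy setting the generation-by-generation loop is the Wright-Fisher part; a coalescent simulator would instead draw the waiting time from a continuous approximation.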
Source
DOI: http://dx.doi.org/10.1371/journal.pgen.1008619
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC7266353
May 2020

Unbiased Estimation of Linkage Disequilibrium from Unphased Data.

Mol Biol Evol 2020 03;37(3):923-932

Department of Human Genetics, McGill University, Montreal, QC, Canada.

Linkage disequilibrium (LD) is used to infer evolutionary history, to identify genomic regions under selection, and to dissect the relationship between genotype and phenotype. In each case, we require accurate estimates of LD statistics from sequencing data. Unphased data present a challenge because multilocus haplotypes cannot be inferred exactly. Widely used estimators for the common statistics r² and D² exhibit large and variable upward biases that complicate interpretation and comparison across cohorts. Here, we show how to find unbiased estimators for a wide range of two-locus statistics, including D², for both single and multiple randomly mating populations. These unbiased statistics are particularly well suited to estimate effective population sizes from unlinked loci in small populations. We develop a simple inference pipeline and use it to refine estimates of recent effective population sizes of the threatened Channel Island Fox populations.
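
For reference, the two-locus statistics named above can be written down from phased haplotypes. The sketch below computes plug-in sample values of D and r² under the assumption that phase is known; it is not the unbiased unphased estimator the paper develops, and the function name `ld_from_haplotypes` and the toy data are hypothetical.

```python
def ld_from_haplotypes(haplotypes):
    """Compute plug-in D and r^2 between two biallelic loci.

    haplotypes: list of (a, b) tuples, each allele coded 0 or 1.
    """
    n = len(haplotypes)
    p_a = sum(a for a, _ in haplotypes) / n   # frequency of allele 1 at locus A
    p_b = sum(b for _, b in haplotypes) / n   # frequency of allele 1 at locus B
    p_ab = sum(1 for a, b in haplotypes if a == 1 and b == 1) / n
    d = p_ab - p_a * p_b                      # covariance between the two loci
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    r2 = d * d / denom if denom > 0 else float("nan")
    return d, r2

# Toy sample of ten phased haplotypes (illustrative only).
haps = [(1, 1)] * 4 + [(0, 0)] * 4 + [(1, 0)] + [(0, 1)]
d, r2 = ld_from_haplotypes(haps)
print(d, r2)
```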
Source
DOI: http://dx.doi.org/10.1093/molbev/msz265
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC7038669
March 2020

UMAP reveals cryptic population structure and phenotype heterogeneity in large genomic cohorts.

PLoS Genet 2019 11 1;15(11):e1008432. Epub 2019 Nov 1.

McGill University and Genome Quebec Innovation Centre, Montreal, Québec, Canada.

Human populations feature both discrete and continuous patterns of variation. Current analysis approaches struggle to jointly identify these patterns because of modelling assumptions, mathematical constraints, or numerical challenges. Here we apply uniform manifold approximation and projection (UMAP), a non-linear dimension reduction tool, to three well-studied genotype datasets and discover overlooked subpopulations within the American Hispanic population, fine-scale relationships between geography, genotypes, and phenotypes in the UK population, and cryptic structure in the Thousand Genomes Project data. This approach is well-suited to the influx of large and diverse data and opens new lines of inquiry in population-scale datasets.
Source
DOI: http://dx.doi.org/10.1371/journal.pgen.1008432
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6853336
November 2019

Legacy Data Confound Genomics Studies.

Mol Biol Evol 2020 Jan;37(1):2-10

Department of Human Genetics, McGill University, Montreal, QC, Canada.

Recent reports have identified differences in the mutational spectra across human populations. Although some of these reports have been replicated in other cohorts, most have been reported only in the 1000 Genomes Project (1kGP) data. While investigating an intriguing putative population stratification within the Japanese population, we identified a previously unreported batch effect leading to spurious mutation calls in the 1kGP data and to the apparent population stratification. Because the 1kGP data are used extensively, we find that the batch effects also lead to incorrect imputation by leading imputation servers and a small number of suspicious GWAS associations. Lower quality data from the early phases of the 1kGP thus continue to contaminate modern studies in hidden ways. It may be time to retire or upgrade such legacy sequencing data.
Source
DOI: http://dx.doi.org/10.1093/molbev/msz201
January 2020

Models of archaic admixture and recent history from two-locus statistics.

PLoS Genet 2019 06 10;15(6):e1008204. Epub 2019 Jun 10.

Department of Human Genetics, McGill University, Montreal, QC, Canada.

We learn about population history and underlying evolutionary biology through patterns of genetic polymorphism. Many approaches to reconstruct evolutionary histories focus on a limited number of informative statistics describing distributions of allele frequencies or patterns of linkage disequilibrium. We show that many commonly used statistics are part of a broad family of two-locus moments whose expectation can be computed jointly and rapidly under a wide range of scenarios, including complex multi-population demographies with continuous migration and admixture events. A full inspection of these statistics reveals that widely used models of human history fail to predict simple patterns of linkage disequilibrium. To jointly capture the information contained in classical and novel statistics, we implemented a tractable likelihood-based inference framework for demographic history. Using this approach, we show that human evolutionary models that include archaic admixture in Africa, Asia, and Europe provide a much better description of patterns of genetic diversity across the human genome. We estimate that an unidentified, deeply diverged population admixed with modern humans within Africa both before and after the split of African and Eurasian populations, contributing 4 - 8% genetic ancestry to individuals in world-wide populations.
Source
DOI: http://dx.doi.org/10.1371/journal.pgen.1008204
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6586359
June 2019

Mutations in ACTL6B Cause Neurodevelopmental Deficits and Epilepsy and Lead to Loss of Dendrites in Human Neurons.

Am J Hum Genet 2019 05 25;104(5):815-834. Epub 2019 Apr 25.

Department of Molecular Neuroscience, UCL Institute of Neurology, Queen Square, WC1N 3BG London, UK.

We identified individuals with variations in ACTL6B, a component of the chromatin remodeling machinery including the BAF complex. Ten individuals harbored bi-allelic mutations and presented with global developmental delay, epileptic encephalopathy, and spasticity, and ten individuals with de novo heterozygous mutations displayed intellectual disability, ambulation deficits, severe language impairment, hypotonia, Rett-like stereotypies, and minor facial dysmorphisms (wide mouth, diastema, bulbous nose). Nine of these ten unrelated individuals had the identical de novo c.1027G>A (p.Gly343Arg) mutation. Human-derived neurons were generated that recaptured ACTL6B expression patterns in development from progenitor cell to post-mitotic neuron, validating the use of this model. Engineered knock-out of ACTL6B in wild-type human neurons resulted in profound deficits in dendrite development, a result recapitulated in two individuals with different bi-allelic mutations, and reversed on clonal genetic repair or exogenous expression of ACTL6B. Whole-transcriptome analyses and whole-genomic profiling of the BAF complex in wild-type and bi-allelic mutant ACTL6B neural progenitor cells and neurons revealed increased genomic binding of the BAF complex in ACTL6B mutants, with corresponding transcriptional changes in several genes including TPPP and FSCN1, suggesting that altered regulation of some cytoskeletal genes contributes to altered dendrite development. Assessment of bi-allelic and heterozygous ACTL6B mutations on an ACTL6B knock-out human background demonstrated that bi-allelic mutations mimic engineered deletion deficits while heterozygous mutations do not, suggesting that the former are loss of function and the latter are gain of function. These results reveal a role for ACTL6B in neurodevelopment and implicate another component of chromatin remodeling machinery in brain disease.
Source
DOI: http://dx.doi.org/10.1016/j.ajhg.2019.03.022
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6507050
May 2019

A study in scarlet: MC1R as the main predictor of red hair and exemplar of the flip-flop effect.

Hum Mol Genet 2019 06;28(12):2093-2106

Anesthesia and the Alan Edwards Centre for Research on Pain, McGill University, Montreal, Canada.

Genetic variation in melanocortin-1 receptor (MC1R) is a known contributor to disease-free red hair in humans. Three loss-of-function single-nucleotide variants (rs1805007, rs1805008 and rs1805009) have been established as strongly correlated with red hair. The contribution of other loss-of-function MC1R variants (in particular rs1805005, rs2228479 and rs885479) and the extent to which other genetic loci are involved in red hair colour is less well understood. Here, we used the UK Biobank cohort to capture a comprehensive list of MC1R variants contributing to red hair colour. We report a correlation with red hair for both strong-effect variants (rs1805007, rs1805008 and rs1805009) and weak-effect variants (rs1805005, rs2228479 and rs885479) and show that their coefficients differ by two orders of magnitude. On the haplotype level, both strong- and weak-effect variants contribute to the red hair phenotype, but when considered individually, weak-effect variants show a reverse, negative association with red hair. The reversal of association direction in the single-variant analysis is facilitated by a distinguishing structure of MC1R, in which loss-of-function variants are never found to co-occur on the same haplotype. The other previously reported hair colour genes' variants do not substantially improve the MC1R red hair colour predictive model. Our best model for predicting red versus other hair colours yields an unparalleled area under the receiver operating characteristic of 0.96 using only MC1R variants. In summary, we present a comprehensive statistically derived characterization of the role of MC1R variants in red hair colour and offer a powerful, economical and parsimonious model that achieves unsurpassed performance.
Source
DOI: http://dx.doi.org/10.1093/hmg/ddz018
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6548228
June 2019

Inferring Transmission Histories of Rare Alleles in Population-Scale Genealogies.

Am J Hum Genet 2018 12;103(6):893-906

McGill University and Genome Quebec Innovation Centre, Montréal, QC H3A 0G1, Canada.

Learning the transmission history of alleles through a family or population plays an important role in evolutionary, demographic, and medical genetic studies. Most classical models of population genetics have attempted to do so under the assumption that the genealogy of a population is unavailable and that its idiosyncrasies can be described by a small number of parameters describing population size and mate choice dynamics. Large genetic samples have increased sensitivity to such modeling assumptions, and large-scale genealogical datasets become a useful tool to investigate realistic genealogies. However, analyses in such large datasets are often intractable using conventional methods. We present an efficient method to infer transmission paths of rare alleles through population-scale genealogies. Based on backward-time Monte Carlo simulations of genetic inheritance, we use an importance sampling scheme to dramatically speed up convergence. The approach can take advantage of available genotypes of subsets of individuals in the genealogy including haplotype structure as well as information about the mode of inheritance and general prevalence of a mutation or disease in the population. Using a high-quality genealogical dataset of more than three million married individuals in the Quebec founder population, we apply the method to reconstruct the transmission history of chronic atrial and intestinal dysrhythmia (CAID), a rare recessive disease. We identify the most likely early carriers of the mutation and geographically map the expected carrier rate in the present-day French-Canadian population of Quebec.
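
Importance sampling, the variance-reduction idea mentioned above, can be illustrated on a toy problem that has nothing to do with genealogies: estimating a small tail probability of a standard normal by sampling from a shifted proposal and reweighting each draw. This is only a sketch of the general technique, not the authors' scheme; the function name `tail_prob_importance` and all parameter values are assumptions.

```python
import math
import random

def tail_prob_importance(threshold, n_samples, rng):
    """Estimate P(Z > threshold) for Z ~ N(0, 1) by sampling from the
    proposal N(threshold, 1) and reweighting each draw."""
    total = 0.0
    for _ in range(n_samples):
        x = rng.gauss(threshold, 1.0)
        if x > threshold:
            # Weight = target density / proposal density
            #        = exp(-threshold * x + threshold**2 / 2)
            total += math.exp(-threshold * x + threshold ** 2 / 2)
    return total / n_samples

rng = random.Random(0)
estimate = tail_prob_importance(4.0, 20000, rng)
exact = 0.5 * math.erfc(4.0 / math.sqrt(2))  # closed form for comparison
print(estimate, exact)
```

Sampling directly from N(0, 1) would almost never produce a draw above the threshold, which is why biasing the proposal toward the rare event speeds up convergence.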
Source
DOI: http://dx.doi.org/10.1016/j.ajhg.2018.10.017
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6288464
December 2018

Author Correction: Germline HAVCR2 mutations altering TIM-3 characterize subcutaneous panniculitis-like T cell lymphomas with hemophagocytic lymphohistiocytic syndrome.

Nat Genet 2019 01;51(1):196

Department of Pathology, Montreal Children's Hospital, McGill University Health Centre, Montreal, Quebec, Canada.

In the version of this article originally published, the main-text sentence "In three patients of European ancestry, we identified the germline variant encoding p.Ile97Met in TIM-3, which was homozygous in two (P12 and P13) and heterozygous in one (P15) in the germline but with no TIM-3 plasma membrane expression in the tumor" misstated the identifiers of the two homozygous individuals, which should have been P13 and P14. The error has been corrected in the HTML, PDF and print versions of the paper.
Source
DOI: http://dx.doi.org/10.1038/s41588-018-0304-8
January 2019

Germline HAVCR2 mutations altering TIM-3 characterize subcutaneous panniculitis-like T cell lymphomas with hemophagocytic lymphohistiocytic syndrome.

Nat Genet 2018 12 29;50(12):1650-1657. Epub 2018 Oct 29.

Department of Pathology, Montreal Children's Hospital, McGill University Health Centre, Montreal, Quebec, Canada.

Subcutaneous panniculitis-like T cell lymphoma (SPTCL), a non-Hodgkin lymphoma, can be associated with hemophagocytic lymphohistiocytosis (HLH), a life-threatening immune activation that adversely affects survival. T cell immunoglobulin mucin 3 (TIM-3) is a modulator of immune responses expressed on subgroups of T and innate immune cells. We identify in ~60% of SPTCL cases germline, loss-of-function, missense variants altering highly conserved residues of TIM-3, c.245A>G (p.Tyr82Cys) and c.291A>G (p.Ile97Met), each with specific geographic distribution. The variant encoding p.Tyr82Cys TIM-3 occurs on a potential founder chromosome in patients with East Asian and Polynesian ancestry, while p.Ile97Met TIM-3 occurs in patients with European ancestry. Both variants induce protein misfolding and abrogate TIM-3's plasma membrane expression, leading to persistent immune activation and increased production of inflammatory cytokines, including tumor necrosis factor-α and interleukin-1β, promoting HLH and SPTCL. Our findings highlight HLH-SPTCL as a new genetic entity and identify mutations causing TIM-3 alterations as a causative genetic defect in SPTCL. While HLH-SPTCL patients with mutant TIM-3 benefit from immunomodulation, therapeutic repression of the TIM-3 checkpoint may have adverse consequences.
Source
DOI: http://dx.doi.org/10.1038/s41588-018-0251-4
December 2018

Genomic inference using diffusion models and the allele frequency spectrum.

Curr Opin Genet Dev 2018 12 23;53:140-147. Epub 2018 Oct 23.

Department of Human Genetics, McGill University, Montreal, QC, Canada.

Evolutionary, biological, and demographic processes together shape observed variation in populations. Understanding how these processes influence variation allows us to infer past demography and the nature of selection in populations. Forward in time models such as the diffusion approximation provide a powerful tool for performing inference based on the distribution of allele frequencies. Here, we discuss recent computational developments and their application to reconstructing human demographic history. Using whole-genome sequence data for 797 French Canadian individuals, we assess the neutrality of synonymous variants and show that selection can bias inferred demography, mutation rates, and distributions of fitness effects. We argue that the simple evolutionary models investigated by Kimura and Ohta still provide important insight into modern genetic research.
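
The "distribution of allele frequencies" used for this kind of inference is usually summarized as a site frequency spectrum. As a minimal sketch, assuming phased 0/1 haplotype data with known ancestral state, the spectrum can be tabulated as follows; the function name `site_frequency_spectrum` and the toy matrix are hypothetical.

```python
def site_frequency_spectrum(haplotype_matrix):
    """Return the unfolded SFS: entry i-1 counts sites where the derived
    allele appears in exactly i of the n sampled haplotypes (0 < i < n).

    haplotype_matrix: list of sites, each a list of 0/1 alleles.
    """
    n = len(haplotype_matrix[0])
    sfs = [0] * (n - 1)
    for site in haplotype_matrix:
        count = sum(site)          # number of derived alleles at this site
        if 0 < count < n:          # skip monomorphic sites
            sfs[count - 1] += 1
    return sfs

# Five toy sites typed in four haplotypes (illustrative only).
sites = [
    [1, 0, 0, 0],
    [0, 1, 0, 0],
    [1, 1, 0, 0],
    [0, 1, 1, 1],
    [0, 0, 0, 0],
]
sfs = site_frequency_spectrum(sites)
print(sfs)
```

Under the standard neutral model with constant population size, the expected count in entry i is proportional to 1/i; departures from that shape are the kind of signal that demography and selection leave in the spectrum.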
Source
DOI: http://dx.doi.org/10.1016/j.gde.2018.10.001
December 2018

Intratumor Heterogeneity and Circulating Tumor Cell Clusters.

Mol Biol Evol 2018 09 12;35(9):2135-2144. Epub 2017 Jun 12.

Department of Human Genetics, McGill University, Montréal, QC, Canada.

Genetic diversity plays a central role in tumor progression, metastasis, and resistance to treatment. Experiments are shedding light on this diversity at ever finer scales, but interpretation is challenging. Using recent progress in numerical models, we simulate macroscopic tumors to investigate the interplay between growth dynamics, microscopic composition, and circulating tumor cell cluster diversity. We find that modest differences in growth parameters can profoundly change microscopic diversity. Simple outwards expansion leads to spatially segregated clones and low diversity, as expected. However, a modest cell turnover can result in an increased number of divisions and mixing among clones resulting in increased microscopic diversity in the tumor core. Using simulations to estimate power to detect such spatial trends, we find that multiregion sequencing data from contemporary studies is marginally powered to detect the predicted effects. Slightly larger samples, improved detection of rare variants, or sequencing of smaller biopsies or circulating tumor cell clusters would allow one to distinguish between leading models of tumor evolution. The genetic composition of circulating tumor cell clusters, which can be obtained from non-invasive blood draws, is therefore informative about tumor evolution and its metastatic potential.
Source
DOI: http://dx.doi.org/10.1093/molbev/msy115
September 2018

On the decidability of population size histories from finite allele frequency spectra.

Theor Popul Biol 2018 03 3;120:42-51. Epub 2018 Jan 3.

Department of Human Genetics, McGill University, Montreal, QC, Canada; McGill University and Genome Quebec Innovation Centre, Montreal, QC, Canada.

Understanding the historical events that shaped current genomic diversity has applications in historical, biological, and medical research. However, the amount of historical information that can be inferred from genetic data is finite, which leads to an identifiability problem. For example, different historical processes can lead to identical distribution of allele frequencies. This identifiability issue casts a shadow of uncertainty over the results of any study which uses the frequency spectrum to infer past demography. It has been argued that imposing mild 'reasonableness' constraints on demographic histories can enable unique reconstruction, at least in an idealized setting where the length of the genome is nearly infinite. Here, we discuss this problem for finite sample size and genome length. Using the diffusion approximation, we obtain bounds on likelihood differences between similar demographic histories, and use them to construct pairs of very different reasonable histories that produce almost-identical frequency distributions. The finite-genome problem therefore remains poorly determined even among reasonable histories. Where fits to few-parameter models produce narrow parameter confidence intervals, large uncertainties lurk hidden by model assumption.
Source
DOI: http://dx.doi.org/10.1016/j.tpb.2017.12.008
March 2018

Inferring the Joint Demographic History of Multiple Populations: Beyond the Diffusion Approximation.

Genetics 2017 07 11;206(3):1549-1567. Epub 2017 May 11.

Department of Human Genetics and Genome Quebec Innovation Centre, McGill University, Montreal, QC H3A 0G1, Canada

Understanding variation in allele frequencies across populations is a central goal of population genetics. Classical models for the distribution of allele frequencies, using forward simulation, coalescent theory, or the diffusion approximation, have been applied extensively for demographic inference, medical study design, and evolutionary studies. Here we propose a tractable model of ordinary differential equations for the evolution of allele frequencies that is closely related to the diffusion approximation but avoids many of its limitations and approximations. We show that the approach is typically faster, more numerically stable, and more easily generalizable than the state-of-the-art software implementation of the diffusion approximation. We present a number of applications to human sequence data, including demographic inference with a five-population joint frequency spectrum and a discussion of the robustness of the out-of-Africa model inference to the choice of modern population.
Source
DOI: http://dx.doi.org/10.1534/genetics.117.200493
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC5500150
July 2017

Human Demographic History Impacts Genetic Risk Prediction across Diverse Populations.

Am J Hum Genet 2017 Apr 30;100(4):635-649. Epub 2017 Mar 30.

Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA; The Charles Bronfman Institute of Personalized Medicine, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA; Center of Statistical Genetics, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA; Icahn Institute of Genomics and Multiscale Biology, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA.

The vast majority of genome-wide association studies (GWASs) are performed in Europeans, and their transferability to other populations is dependent on many factors (e.g., linkage disequilibrium, allele frequencies, genetic architecture). As medical genomics studies become increasingly large and diverse, gaining insights into population history and consequently the transferability of disease risk measurement is critical. Here, we disentangle recent population history in the widely used 1000 Genomes Project reference panel, with an emphasis on populations underrepresented in medical studies. To examine the transferability of single-ancestry GWASs, we used published summary statistics to calculate polygenic risk scores for eight well-studied phenotypes. We identify directional inconsistencies in all scores; for example, height is predicted to decrease with genetic distance from Europeans, despite robust anthropological evidence that West Africans are as tall as Europeans on average. To gain deeper quantitative insights into GWAS transferability, we developed a complex trait coalescent-based simulation framework considering effects of polygenicity, causal allele frequency divergence, and heritability. As expected, correlations between true and inferred risk are typically highest in the population from which summary statistics were derived. We demonstrate that scores inferred from European GWASs are biased by genetic drift in other populations even when choosing the same causal variants and that biases in any direction are possible and unpredictable. This work cautions that summarizing findings from large-scale GWASs may have limited portability to other populations using standard approaches and highlights the need for generalized risk prediction methods and the inclusion of more diverse individuals in medical genomics.
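
A polygenic risk score of the kind discussed above is, in its simplest form, a weighted sum of allele dosages, with weights taken from GWAS summary statistics. The sketch below shows only that arithmetic; it leaves out variant selection, LD clumping, and any ancestry adjustment, and the function name `polygenic_score` and the numbers are hypothetical.

```python
def polygenic_score(dosages, effect_sizes):
    """Weighted sum of allele dosages (0, 1, or 2 copies of the effect
    allele) using per-variant effect sizes from GWAS summary statistics."""
    if len(dosages) != len(effect_sizes):
        raise ValueError("dosages and effect_sizes must have equal length")
    return sum(d * b for d, b in zip(dosages, effect_sizes))

effect_sizes = [0.5, -0.25, 1.0]  # hypothetical per-variant effect sizes
individual = [2, 1, 0]            # hypothetical genotype dosages
score = polygenic_score(individual, effect_sizes)
print(score)
```

Because the effect sizes are estimated in one population, differences in allele frequency and linkage disequilibrium elsewhere can shift such scores, which is the transferability problem the abstract describes.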
Source
DOI: http://dx.doi.org/10.1016/j.ajhg.2017.03.004
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC5384097
April 2017

The Great Migration and African-American Genomic Diversity.

PLoS Genet 2016 05 27;12(5):e1006059. Epub 2016 May 27.

Department of Human Genetics, McGill University, Montreal, Quebec, Canada.

We present a comprehensive assessment of genomic diversity in the African-American population by studying three genotyped cohorts comprising 3,726 African-Americans from across the United States that provide a representative description of the population across all US states and socioeconomic status. An estimated 82.1% of ancestors to African-Americans lived in Africa prior to the advent of transatlantic travel, 16.7% in Europe, and 1.2% in the Americas, with increased African ancestry in the southern United States compared to the North and West. Combining demographic models of ancestry and those of relatedness suggests that admixture occurred predominantly in the South prior to the Civil War and that ancestry-biased migration is responsible for regional differences in ancestry. We find that recent migrations also caused a strong increase in genetic relatedness among geographically distant African-Americans. Long-range relatedness among African-Americans and between African-Americans and European-Americans thus track north- and west-bound migration routes followed during the Great Migration of the twentieth century. By contrast, short-range relatedness patterns suggest comparable mobility of ~15-16 km per generation for African-Americans and European-Americans, as estimated using a novel analytical model of isolation-by-distance.

Source
DOI: http://dx.doi.org/10.1371/journal.pgen.1006059
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC4883799
May 2016

When Is Selection Effective?

Authors:
Simon Gravel

Genetics 2016 05 23;203(1):451-62. Epub 2016 Mar 23.

Department of Human Genetics, McGill University, and Genome Quebec Innovation Centre, Montreal, Québec, Canada, H3A 1A4

Deleterious alleles can reach high frequency in small populations because of random fluctuations in allele frequency. This may lead, over time, to reduced average fitness. In this sense, selection is more "effective" in larger populations. Recent studies have considered whether the different demographic histories across human populations have resulted in differences in the number, distribution, and severity of deleterious variants, leading to an animated debate. This article first seeks to clarify some terms of the debate by identifying differences in definitions and assumptions used in recent studies. We argue that variants of Morton, Crow, and Muller's "total mutational damage" provide the soundest and most practical basis for such comparisons. Using simulations, analytical calculations, and 1000 Genomes Project data, we provide an intuitive and quantitative explanation for the observed similarity in genetic load across populations. We show that recent demography has likely modulated the effect of selection and still affects it, but the net result of the accumulated differences is small. Direct observation of differential efficacy of selection for specific allele classes is nevertheless possible with contemporary data sets. By contrast, identifying average genome-wide differences in the efficacy of selection across populations will require many modeling assumptions and is unlikely to provide much biological insight about human populations.

Source
DOI: http://dx.doi.org/10.1534/genetics.115.184630
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC4858791
May 2016

Genomic Insights into the Ancestry and Demographic History of South America.

PLoS Genet 2015 Dec 4;11(12):e1005602. Epub 2015 Dec 4.

Department of Genetics, Stanford University, Stanford, California, United States of America.

South America has a complex demographic history shaped by multiple migration and admixture events in pre- and post-colonial times. Settled over 14,000 years ago by Native Americans, South America has experienced migrations of European and African individuals, similar to other regions in the Americas. However, the timing and magnitude of these events resulted in markedly different patterns of admixture throughout Latin America. We use genome-wide SNP data for 437 admixed individuals from 5 countries (Colombia, Ecuador, Peru, Chile, and Argentina) to explore the population structure and demographic history of South American Latinos. We combined these data with population reference panels from Africa, Asia, Europe and the Americas to perform global ancestry analysis and infer the subcontinental origin of the European and Native American ancestry components of the admixed individuals. By applying ancestry-specific PCA analyses we find that most of the European ancestry in South American Latinos is from the Iberian Peninsula; however, many individuals trace their ancestry back to Italy, especially within Argentina. We find a strong gradient in the Native American ancestry component of South American Latinos associated with country of origin and the geography of local indigenous populations. For example, Native American genomic segments in Peruvians show greater affinities with Andean indigenous peoples like Quechua and Aymara, whereas Native American haplotypes from Colombians tend to cluster with Amazonian and coastal tribes from northern South America. Using ancestry tract length analysis we modeled post-colonial South American migration history as the youngest in Latin America during European colonization (9-14 generations ago), with an additional strong pulse of European migration occurring between 3 and 9 generations ago. These genetic footprints can impact our understanding of population-level differences in biomedical traits and, thus, inform future medical genetic studies in the region.

Source
DOI: http://dx.doi.org/10.1371/journal.pgen.1005602
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC4670080
December 2015

Computationally Efficient Composite Likelihood Statistics for Demographic Inference.

Mol Biol Evol 2016 Feb 5;33(2):591-3. Epub 2015 Nov 5.

Department of Molecular and Cellular Biology, University of Arizona

Many population genetics tools employ composite likelihoods, because fully modeling genomic linkage is challenging. But traditional approaches to estimating parameter uncertainties and performing model selection require full likelihoods, so these tools have relied on computationally expensive maximum-likelihood estimation (MLE) on bootstrapped data. Here, we demonstrate that statistical theory can be applied to adjust composite likelihoods and perform robust computationally efficient statistical inference in two demographic inference tools: ∂a∂i and TRACTS. On both simulated and real data, the adjustments perform comparably to MLE bootstrapping while using orders of magnitude less computational time.

Source
DOI: http://dx.doi.org/10.1093/molbev/msv255
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC5854098
February 2016

Estimating the mutation load in human genomes.

Nat Rev Genet 2015 Jun 12;16(6):333-43. Epub 2015 May 12.

McGill University, Department of Human Genetics and Genome Quebec Innovation Centre, 740 Dr Penfield Avenue, Montreal, Quebec H3A 0G1, Canada.

Next-generation sequencing technology has facilitated the discovery of millions of genetic variants in human genomes. A sizeable fraction of these variants are predicted to be deleterious. Here, we review the pattern of deleterious alleles as ascertained in genome sequencing data sets and ask whether human populations differ in their predicted burden of deleterious alleles - a phenomenon known as mutation load. We discuss three demographic models that are predicted to affect mutation load and relate these models to the evidence (or the lack thereof) for variation in the efficacy of purifying selection in diverse human genomes. We also emphasize why accurate estimation of mutation load depends on assumptions regarding the distribution of dominance and selection coefficients - quantities that remain poorly characterized for current genomic data sets.

Source
DOI: http://dx.doi.org/10.1038/nrg3931
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC4959039
June 2015

The existence and abundance of ghost ancestors in biparental populations.

Theor Popul Biol 2015 May 19;101:47-53. Epub 2015 Feb 19.

Biomathematics Research Centre, University of Canterbury, New Zealand.

In a randomly mating biparental population of size N there are, with high probability, individuals who are genealogical ancestors of every extant individual within approximately log2(N) generations into the past. We use this result of J. Chang to prove a curious corollary under standard models of recombination: there exist, with high probability, individuals within a constant multiple of log2(N) generations into the past who are simultaneously (i) genealogical ancestors of each of the individuals at the present, and (ii) genetic ancestors to none of the individuals at the present. Such ancestral individuals (ancestors of everyone today who left no genetic trace) represent 'ghost' ancestors in a strong sense. In this short note, we use simple analytical arguments and simulations to estimate how many such individuals exist in finite Wright-Fisher populations.
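
The log2(N) scaling quoted from Chang's result can be explored with a small simulation. The sketch below is an illustration under simplifying assumptions, not the authors' code: each individual draws two parents uniformly at random from the previous generation (the two draws may coincide), population size is constant, and recombination is ignored, so it tracks genealogical ancestry only. The function name is hypothetical.

```python
import math
import random


def generations_to_common_ancestor(n: int, rng: random.Random) -> int:
    """Generations back until someone is a genealogical ancestor of everyone today."""
    everyone = (1 << n) - 1
    # descendants[i] is a bitmask of the present-day individuals
    # who descend from individual i of the current past generation.
    descendants = [1 << i for i in range(n)]
    generation = 0
    while everyone not in descendants:
        parents = [0] * n
        for mask in descendants:
            # Each individual passes its descendant set to two randomly
            # chosen parents (the same parent may be drawn twice).
            for _ in range(2):
                parents[rng.randrange(n)] |= mask
        descendants = parents
        generation += 1
    return generation


n = 256
waits = [generations_to_common_ancestor(n, random.Random(seed)) for seed in range(20)]
print("mean generations:", sum(waits) / len(waits), "log2(N):", math.log2(n))
```

Counting ghost ancestors as defined in the abstract would additionally require simulating recombination to check which genealogical ancestors contributed no genetic material; that step is left out here.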

Source
DOI: http://dx.doi.org/10.1016/j.tpb.2015.02.002
May 2015

Adaptive, convergent origins of the pygmy phenotype in African rainforest hunter-gatherers.

Proc Natl Acad Sci U S A 2014 09 18;111(35):E3596-603. Epub 2014 Aug 18.

Sainte-Justine Hospital Research Centre, Montreal, QC, Canada H3T 1C5; Institute of Ecology and Evolution, University of Bern, 3012 Bern, Switzerland.

The evolutionary history of the human pygmy phenotype (small body size), a characteristic of African and Southeast Asian rainforest hunter-gatherers, is largely unknown. Here we use a genome-wide admixture mapping analysis to identify 16 genomic regions that are significantly associated with the pygmy phenotype in the Batwa, a rainforest hunter-gatherer population from Uganda (east central Africa). The identified genomic regions have multiple attributes that provide supporting evidence of genuine association with the pygmy phenotype, including enrichments for SNPs previously associated with stature variation in Europeans and for genes with growth hormone receptor and regulation functions. To test adaptive evolutionary hypotheses, we computed the haplotype-based integrated haplotype score (iHS) statistic and the level of population differentiation (FST) between the Batwa and their agricultural neighbors, the Bakiga, for each genomic SNP. Both |iHS| and FST values were significantly higher for SNPs within the Batwa pygmy phenotype-associated regions than the remainder of the genome, a signature of polygenic adaptation. In contrast, when we expanded our analysis to include Baka rainforest hunter-gatherers from Cameroon and Gabon (west central Africa) and Nzebi and Nzime neighboring agriculturalists, we did not observe elevated |iHS| or FST values in these genomic regions. Together, these results suggest adaptive and at least partially convergent origins of the pygmy phenotype even within Africa, supporting the hypothesis that small body size confers a selective advantage for tropical rainforest hunter-gatherers but raising questions about the antiquity of this behavior.

Source
DOI: http://dx.doi.org/10.1073/pnas.1402875111
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC4156716
September 2014

Predicting discovery rates of genomic features.

Authors:
Simon Gravel

Genetics 2014 Jun 17;197(2):601-10. Epub 2014 Mar 17.

Department of Human Genetics and Génome Québec Innovation Centre, McGill University, Montréal, Quebec H3A 0G1, Canada.

Successful sequencing experiments require judicious sample selection. However, this selection must often be performed on the basis of limited preliminary data. Predicting the statistical properties of the final sample based on preliminary data can be challenging, because numerous uncertain model assumptions may be involved. Here, we ask whether we can predict "omics" variation across many samples by sequencing only a fraction of them. In the infinite-genome limit, we find that a pilot study sequencing 5% of a population is sufficient to predict the number of genetic variants in the entire population within 6% of the correct value, using an estimator agnostic to demography, selection, or population structure. To reach similar accuracy in a finite genome with millions of polymorphisms, the pilot study would require ∼15% of the population. We present computationally efficient jackknife and linear programming methods that exhibit substantially less bias than the state of the art when applied to simulated data and subsampled 1000 Genomes Project data. Extrapolating based on the National Heart, Lung, and Blood Institute Exome Sequencing Project data, we predict that 7.2% of sites in the capture region would be variable in a sample of 50,000 African Americans and 8.8% in a European sample of equal size. Finally, we show how the linear programming method can also predict discovery rates of various genomic features, such as the number of transcription factor binding sites across different cell types.
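
To give a feel for jackknife-style extrapolation from a pilot sample, the sketch below implements the classical first-order jackknife richness estimator from ecology: the number of observed variants plus a correction proportional to the number of variants seen in exactly one sample. This is a textbook estimator shown for illustration only and should not be read as the estimator developed in the paper; the function name and toy data are hypothetical.

```python
from collections import Counter
from typing import Iterable, Set


def first_order_jackknife(samples: Iterable[Set[str]]) -> float:
    """Estimate total variant count as S_obs + f1 * (n - 1) / n."""
    samples = list(samples)
    n = len(samples)
    if n == 0:
        raise ValueError("at least one sample is required")
    # For each variant, count how many samples carry it.
    carriers = Counter(variant for sample in samples for variant in sample)
    observed = len(carriers)
    # f1: variants carried by exactly one sample.
    singletons = sum(1 for count in carriers.values() if count == 1)
    return observed + singletons * (n - 1) / n


# Hypothetical pilot data: the set of variant IDs carried by each sample.
pilot = [{"v1", "v2"}, {"v1", "v3"}, {"v1"}, {"v1", "v4"}]
estimate = first_order_jackknife(pilot)
print(estimate)
```

The correction grows with the number of singletons, reflecting the intuition that a pilot sample rich in rare variants implies many more variants remain undiscovered.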

Source
DOI: http://dx.doi.org/10.1534/genetics.114.162149
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC4063918
June 2014

Pulling out the 1%: whole-genome capture for the targeted enrichment of ancient DNA sequencing libraries.

Am J Hum Genet 2013 Nov 25;93(5):852-64. Epub 2013 Oct 25.

Department of Genetics, Stanford University School of Medicine, Stanford, CA 94305, USA.

Most ancient specimens contain very low levels of endogenous DNA, precluding the shotgun sequencing of many interesting samples because of cost. Ancient DNA (aDNA) libraries often contain <1% endogenous DNA, with the majority of sequencing capacity taken up by environmental DNA. Here we present a capture-based method for enriching the endogenous component of aDNA sequencing libraries. By using biotinylated RNA baits transcribed from genomic DNA libraries, we are able to capture DNA fragments from across the human genome. We demonstrate this method on libraries created from four Iron Age and Bronze Age human teeth from Bulgaria, as well as bone samples from seven Peruvian mummies and a Bronze Age hair sample from Denmark. Prior to capture, shotgun sequencing of these libraries yielded an average of 1.2% of reads mapping to the human genome (including duplicates). After capture, this fraction increased substantially, with up to 59% of reads mapped to human and enrichment ranging from 6- to 159-fold. Furthermore, we maintained coverage of the majority of regions sequenced in the precapture library. Intersection with the 1000 Genomes Project reference panel yielded an average of 50,723 SNPs (range 3,062-147,243) for the postcapture libraries sequenced with 1 million reads, compared with 13,280 SNPs (range 217-73,266) for the precapture libraries, increasing resolution in population genetic analyses. Our whole-genome capture approach makes it less costly to sequence aDNA from specimens containing very low levels of endogenous DNA, enabling the analysis of larger numbers of samples.

Source
DOI: http://dx.doi.org/10.1016/j.ajhg.2013.10.002
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3824117
November 2013