Publications by authors named "Aaron P Ragsdale"

14 Publications

Assumptions about frequency-dependent architectures of complex traits bias measures of functional enrichment.

Genet Epidemiol 2021 Jun 22. Epub 2021 Jun 22.

Department of Human Genetics, McGill University, Montreal, Quebec, Canada.

Linkage-Disequilibrium Score Regression (LDSC) is a popular framework for analyzing Genome-wide Association Studies (GWAS) summary statistics that allows for estimating single nucleotide polymorphism heritability, confounding, and functional enrichment of genetic variants with different annotations. Recent work has highlighted the influence of implicit and explicit assumptions of the model on the biological interpretation of the results. In this study, we explored a formulation of LDSC that replaces the measure of LD with a recently proposed unbiased estimator of the statistic. In addition to modest statistical difference across estimators, this derivation highlighted implicit and unrealistic assumptions about the relationship between allele frequency, effect size, and annotation status. We carry out a systematic comparison of alternative LDSC formulations by applying them to summary statistics from 47 GWAS traits. Our results show that commonly used models likely underestimate functional enrichment. These results highlight the importance of calibrating the LDSC model to achieve a more robust understanding of polygenic traits.

Source
DOI: http://dx.doi.org/10.1002/gepi.22388
June 2021

Inferring genome-wide correlations of mutation fitness effects between populations.

Mol Biol Evol 2021 May 27. Epub 2021 May 27.

University of Arizona, USA.

The effect of a mutation on fitness may differ between populations depending on environmental and genetic context, but little is known about the factors that underlie such differences. To quantify genome-wide correlations in mutation fitness effects, we developed a novel concept called a joint distribution of fitness effects (DFE) between populations. We then proposed a new statistic w to measure the DFE correlation between populations. Using simulation, we showed that inferring the DFE correlation from the joint allele frequency spectrum is statistically precise and robust. Using population genomic data, we inferred DFE correlations of populations in humans, Drosophila melanogaster, and wild tomatoes. In these species, we found that the overall correlation of the joint DFE was inversely related to genetic differentiation. In humans and D. melanogaster, deleterious mutations had a lower DFE correlation than tolerated mutations, indicating a complex joint DFE. Altogether, the DFE correlation can be reliably inferred, and it offers extensive insight into the genetics of population divergence.

Source
DOI: http://dx.doi.org/10.1093/molbev/msab162
May 2021

Nonparametric coalescent inference of mutation spectrum history and demography.

Proc Natl Acad Sci U S A 2021 May;118(21)

Department of Genome Sciences, University of Washington, Seattle, WA 98195.

As populations boom and bust, the accumulation of genetic diversity is modulated, encoding histories of living populations in present-day variation. Many methods exist to decode these histories, and all must make strong model assumptions. It is typical to assume that mutations accumulate uniformly across the genome at a constant rate that does not vary between closely related populations. However, recent work shows that mutational processes in human and great ape populations vary across genomic regions and evolve over time. This perturbs the mutation spectrum (relative mutation rates in different local nucleotide contexts). Here, we develop theoretical tools in the framework of Kingman's coalescent to accommodate mutation spectrum dynamics. We present mutation spectrum history inference (mushi), a method to perform nonparametric inference of demographic and mutation spectrum histories from allele frequency data. We use mushi to reconstruct trajectories of effective population size and mutation spectrum divergence between human populations, identify mutation signatures and their dynamics in different human populations, and calibrate the timing of a previously reported mutational pulse in the ancestors of Europeans. We show that mutation spectrum histories can be placed in a well-studied theoretical setting and rigorously inferred from genomic variation data, like other features of evolutionary history.

Source
DOI: http://dx.doi.org/10.1073/pnas.2013798118
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC8166128
May 2021

Brassica rapa domestication: untangling wild and feral forms and convergence of crop morphotypes.

Mol Biol Evol 2021 Apr 30. Epub 2021 Apr 30.

University of Wisconsin-Madison, Department of Botany.

The study of domestication contributes to our knowledge of evolution and crop genetic resources. Human selection has shaped wild Brassica rapa into diverse turnip, leafy, and oilseed crops. Despite its worldwide economic importance and potential as a model for understanding diversification under domestication, insights into the number of domestication events and initial crop(s) domesticated in B. rapa have been limited due to a lack of clarity about the wild or feral status of conspecific non-crop relatives. To address this gap and reconstruct the domestication history of B. rapa, we analyzed 68,468 genotyping-by-sequencing-derived SNPs for 416 samples in the largest diversity panel of domesticated and weedy B. rapa to date. To further understand the center of origin, we modeled the potential range of wild B. rapa during the mid-Holocene. Our analyses of genetic diversity across B. rapa morphotypes suggest that non-crop samples from the Caucasus, Siberia, and Italy may be truly wild, while those occurring in the Americas and much of Europe are feral. Clustering, tree-based analyses, and parameterized demographic inference further indicate that turnips were likely the first crop type domesticated, from which leafy types in East Asia and Europe were selected from distinct lineages. These findings clarify the domestication history and nature of wild crop genetic resources for B. rapa, which provides the first step toward investigating cases of possible parallel selection, the domestication and feralization syndrome, and novel germplasm for Brassica crop improvement.

Source
DOI: http://dx.doi.org/10.1093/molbev/msab108
April 2021

Lessons Learned from Bugs in Models of Human History.

Am J Hum Genet 2020 10;107(4):583-588

Big Data Institute, Li Ka Shing Centre for Health Information and Discovery, University of Oxford, Oxford OX3 7LF, UK.

Simulation plays a central role in population genomics studies. Recent years have seen rapid improvements in software efficiency that make it possible to simulate large genomic regions for many individuals sampled from large numbers of populations. As the complexity of the demographic models we study grows, however, there is an ever-increasing opportunity to introduce bugs in their implementation. Here, we describe two errors made in defining population genetic models using the msprime coalescent simulator that have found their way into the published record. We discuss how these errors have affected downstream analyses and give recommendations for software developers and users to reduce the risk of such errors.

Source
DOI: http://dx.doi.org/10.1016/j.ajhg.2020.08.017
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC7536610
October 2020

A community-maintained standard library of population genetic models.

Elife 2020 06 23;9. Epub 2020 Jun 23.

Department of Biology and Institute of Ecology and Evolution, University of Oregon, Eugene, United States.

The explosion in population genomic data demands ever more complex modes of analysis, and increasingly, these analyses depend on sophisticated simulations. Recent advances in population genetic simulation have made it possible to simulate large and complex models, but specifying such models for a particular simulation engine remains a difficult and error-prone task. Computational genetics researchers currently re-implement simulation models independently, leading to inconsistency and duplication of effort. This situation presents a major barrier to empirical researchers seeking to use simulations for power analyses of upcoming studies or sanity checks on existing genomic data. Population genetics, as a field, also lacks standard benchmarks by which new tools for inference might be measured. Here, we describe a new resource, stdpopsim, that attempts to rectify this situation. Stdpopsim is a community-driven open source project, which provides easy access to a growing catalog of published simulation models from a range of organisms and supports multiple simulation engine backends. This resource is available as a well-documented python library with a simple command-line interface. We share some examples demonstrating how stdpopsim can be used to systematically compare demographic inference methods, and we encourage a broader community of developers to contribute to this growing resource.

Source
DOI: http://dx.doi.org/10.7554/eLife.54967
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC7438115
June 2020

Accounting for long-range correlations in genome-wide simulations of large cohorts.

PLoS Genet 2020 05 5;16(5):e1008619. Epub 2020 May 5.

McGill University and Genome Québec Innovation Centre, McGill University, Montréal, Québec, Canada.

Coalescent simulations are widely used to examine the effects of evolution and demographic history on the genetic makeup of populations. Thanks to recent progress in algorithms and data structures, simulators such as the widely-used msprime now provide genome-wide simulations for millions of individuals. However, this software relies on classic coalescent theory and its assumptions that sample sizes are small and that the region being simulated is short. Here we show that coalescent simulations of long regions of the genome exhibit large biases in identity-by-descent (IBD), long-range linkage disequilibrium (LD), and ancestry patterns, particularly when the sample size is large. We present a Wright-Fisher extension to msprime, and show that it produces more realistic distributions of IBD, LD, and ancestry proportions, while also addressing more subtle biases of the coalescent. Further, these extensions are more computationally efficient than state-of-the-art coalescent simulations when simulating long regions, including whole-genome data. For shorter regions, efficiency can be maintained via a hybrid model which simulates the recent past under the Wright-Fisher model and uses coalescent simulations in the distant past.

Source
DOI: http://dx.doi.org/10.1371/journal.pgen.1008619
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC7266353
May 2020
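
For readers unfamiliar with the Wright-Fisher model named in the abstract above, the following is a minimal forward-in-time sketch of single-locus genetic drift. It is an illustration only: it is not the backward-in-time Wright-Fisher extension to msprime that the paper describes, and the function name `wright_fisher_trajectory` and its parameters are hypothetical.

```python
import random


def wright_fisher_trajectory(pop_size, init_freq, generations, seed=None):
    """Simulate neutral drift of one allele in a diploid Wright-Fisher population.

    Each generation, all 2N gene copies are resampled with replacement from the
    previous generation (binomial sampling at the current allele frequency).
    Returns the list of allele frequencies, one per generation, including the start.
    """
    rng = random.Random(seed)
    two_n = 2 * pop_size                      # number of gene copies
    count = round(init_freq * two_n)          # starting copies of the allele
    freqs = [count / two_n]
    for _ in range(generations):
        p = count / two_n
        # Binomial(2N, p) draw, written out as 2N independent Bernoulli trials.
        count = sum(1 for _ in range(two_n) if rng.random() < p)
        freqs.append(count / two_n)
    return freqs


trajectory = wright_fisher_trajectory(pop_size=50, init_freq=0.5, generations=20, seed=1)
print(trajectory[:5])
```

Loss (frequency 0) and fixation (frequency 1) are absorbing states in this sketch, as they are in the model without mutation.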

Unbiased Estimation of Linkage Disequilibrium from Unphased Data.

Mol Biol Evol 2020 03;37(3):923-932

Department of Human Genetics, McGill University, Montreal, QC, Canada.

Linkage disequilibrium (LD) is used to infer evolutionary history, to identify genomic regions under selection, and to dissect the relationship between genotype and phenotype. In each case, we require accurate estimates of LD statistics from sequencing data. Unphased data present a challenge because multilocus haplotypes cannot be inferred exactly. Widely used estimators for the common statistics r² and D² exhibit large and variable upward biases that complicate interpretation and comparison across cohorts. Here, we show how to find unbiased estimators for a wide range of two-locus statistics, including D², for both single and multiple randomly mating populations. These unbiased statistics are particularly well suited to estimate effective population sizes from unlinked loci in small populations. We develop a simple inference pipeline and use it to refine estimates of recent effective population sizes of the threatened Channel Island Fox populations.

Source
DOI: http://dx.doi.org/10.1093/molbev/msz265
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC7038669
March 2020
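
As background for the statistics named in the abstract above, here is a small sketch of the textbook plug-in calculation of D and r² from phased two-locus haplotypes, where D = p_AB - p_A p_B and r² = D² / (p_A (1 - p_A) p_B (1 - p_B)). This is not the paper's unbiased estimator for unphased data; it is the simple sample-frequency version, and the function name `two_locus_ld` is hypothetical.

```python
def two_locus_ld(haplotypes):
    """Return (D, r2) from phased haplotypes at two biallelic loci.

    haplotypes: list of (a, b) tuples, each allele coded 0 or 1.
    Uses sample frequencies directly (plug-in estimates, no bias correction).
    """
    n = len(haplotypes)
    p_a = sum(a for a, _ in haplotypes) / n                            # freq of allele 1 at locus A
    p_b = sum(b for _, b in haplotypes) / n                            # freq of allele 1 at locus B
    p_ab = sum(1 for a, b in haplotypes if a == 1 and b == 1) / n      # freq of the 1-1 haplotype
    d = p_ab - p_a * p_b
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    r2 = d * d / denom if denom > 0 else float("nan")                  # undefined if a locus is monomorphic
    return d, r2


# Two loci in complete association: the allele at A always matches the allele at B.
d, r2 = two_locus_ld([(1, 1), (1, 1), (0, 0), (0, 0)])
print(d, r2)
```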

Models of archaic admixture and recent history from two-locus statistics.

PLoS Genet 2019 06 10;15(6):e1008204. Epub 2019 Jun 10.

Department of Human Genetics, McGill University, Montreal, QC, Canada.

We learn about population history and underlying evolutionary biology through patterns of genetic polymorphism. Many approaches to reconstruct evolutionary histories focus on a limited number of informative statistics describing distributions of allele frequencies or patterns of linkage disequilibrium. We show that many commonly used statistics are part of a broad family of two-locus moments whose expectation can be computed jointly and rapidly under a wide range of scenarios, including complex multi-population demographies with continuous migration and admixture events. A full inspection of these statistics reveals that widely used models of human history fail to predict simple patterns of linkage disequilibrium. To jointly capture the information contained in classical and novel statistics, we implemented a tractable likelihood-based inference framework for demographic history. Using this approach, we show that human evolutionary models that include archaic admixture in Africa, Asia, and Europe provide a much better description of patterns of genetic diversity across the human genome. We estimate that an unidentified, deeply diverged population admixed with modern humans within Africa both before and after the split of African and Eurasian populations, contributing 4–8% genetic ancestry to individuals in world-wide populations.

Source
DOI: http://dx.doi.org/10.1371/journal.pgen.1008204
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6586359
June 2019

Genomic inference using diffusion models and the allele frequency spectrum.

Curr Opin Genet Dev 2018 12 23;53:140-147. Epub 2018 Oct 23.

Department of Human Genetics, McGill University, Montreal, QC, Canada.

Evolutionary, biological, and demographic processes together shape observed variation in populations. Understanding how these processes influence variation allows us to infer past demography and the nature of selection in populations. Forward in time models such as the diffusion approximation provide a powerful tool for performing inference based on the distribution of allele frequencies. Here, we discuss recent computational developments and their application to reconstructing human demographic history. Using whole-genome sequence data for 797 French Canadian individuals, we assess the neutrality of synonymous variants and show that selection can bias inferred demography, mutation rates, and distributions of fitness effects. We argue that the simple evolutionary models investigated by Kimura and Ohta still provide important insight into modern genetic research.

Source
DOI: http://dx.doi.org/10.1016/j.gde.2018.10.001
December 2018

Inferring the Joint Demographic History of Multiple Populations: Beyond the Diffusion Approximation.

Genetics 2017 07 11;206(3):1549-1567. Epub 2017 May 11.

Department of Human Genetics and Genome Quebec Innovation Centre, McGill University, Montreal, QC H3A 0G1, Canada.

Understanding variation in allele frequencies across populations is a central goal of population genetics. Classical models for the distribution of allele frequencies, using forward simulation, coalescent theory, or the diffusion approximation, have been applied extensively for demographic inference, medical study design, and evolutionary studies. Here we propose a tractable model of ordinary differential equations for the evolution of allele frequencies that is closely related to the diffusion approximation but avoids many of its limitations and approximations. We show that the approach is typically faster, more numerically stable, and more easily generalizable than the state-of-the-art software implementation of the diffusion approximation. We present a number of applications to human sequence data, including demographic inference with a five-population joint frequency spectrum and a discussion of the robustness of the out-of-Africa model inference to the choice of modern population.

Source
DOI: http://dx.doi.org/10.1534/genetics.117.200493
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC5500150
July 2017

Inferring Demographic History Using Two-Locus Statistics.

Genetics 2017 06 16;206(2):1037-1048. Epub 2017 Apr 16.

Department of Molecular and Cellular Biology, University of Arizona, Tucson, Arizona 85721.

Population demographic history may be learned from contemporary genetic variation data. Methods based on aggregating the statistics of many single loci into an allele frequency spectrum (AFS) have proven powerful, but such methods ignore potentially informative patterns of linkage disequilibrium (LD) between neighboring loci. To leverage such patterns, we developed a composite-likelihood framework for inferring demographic history from aggregated statistics of pairs of loci. Using this framework, we show that two-locus statistics are more sensitive to demographic history than single-locus statistics such as the AFS. In particular, two-locus statistics escape the notorious confounding of depth and duration of a bottleneck, and they provide a means to estimate effective population size based on the recombination rather than mutation rate. We applied our approach to a Zambian population of Drosophila melanogaster. Notably, using both single- and two-locus statistics, we inferred a substantially lower ancestral effective population size than previous works and did not infer a bottleneck history. Together, our results demonstrate the broad potential for two-locus statistics to enable powerful population genetic inference.

Source
DOI: http://dx.doi.org/10.1534/genetics.117.201251
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC5499162
June 2017
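
The allele frequency spectrum (AFS) mentioned in the abstract above aggregates single-site information by counting how many sites carry the derived allele in exactly k of n sampled sequences. The sketch below shows that tabulation for a toy haplotype matrix. It is an illustrative assumption about input layout (rows are samples, columns are sites, 0 = ancestral, 1 = derived), not the paper's software, and the function name `allele_frequency_spectrum` is hypothetical.

```python
from collections import Counter


def allele_frequency_spectrum(haplotypes):
    """Return the unfolded AFS as a list of length n + 1.

    haplotypes: list of n equal-length sequences of 0/1 (0 = ancestral, 1 = derived).
    Entry k counts the sites at which exactly k samples carry the derived allele.
    """
    n = len(haplotypes)
    num_sites = len(haplotypes[0])
    # Derived-allele count at each site, tallied across sites.
    counts = Counter(sum(h[s] for h in haplotypes) for s in range(num_sites))
    return [counts.get(k, 0) for k in range(n + 1)]


haps = [
    [0, 1, 1, 0, 1],
    [0, 1, 0, 0, 1],
    [1, 1, 0, 0, 0],
    [0, 1, 0, 0, 1],
]
print(allele_frequency_spectrum(haps))  # [1, 2, 0, 1, 1]
```

Because each site is tallied independently, this summary discards the pairing of alleles across sites, which is the LD information the two-locus statistics are designed to use.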

Genomic inferences of domestication events are corroborated by written records in Brassica rapa.

Mol Ecol 2017 Jul 5;26(13):3373-3388. Epub 2017 May 5.

Department of Ecology & Evolutionary Biology, University of Arizona, Tucson, AZ, USA.

Demographic modelling is often used with population genomic data to infer the relationships and ages among populations. However, relatively few analyses are able to validate these inferences with independent data. Here, we leverage written records that describe distinct Brassica rapa crops to corroborate demographic models of domestication. Brassica rapa crops are renowned for their outstanding morphological diversity, but the relationships and order of domestication remain unclear. We generated genomewide SNPs from 126 accessions collected globally using high-throughput transcriptome data. Analyses of more than 31,000 SNPs across the B. rapa genome revealed evidence for five distinct genetic groups and supported a European-Central Asian origin of B. rapa crops. Our results supported the traditionally recognized South Asian and East Asian B. rapa groups with evidence that pak choi, Chinese cabbage and yellow sarson are likely monophyletic groups. In contrast, the oil-type B. rapa subsp. oleifera and brown sarson were polyphyletic. We also found no evidence to support the contention that rapini is the wild type or the earliest domesticated subspecies of B. rapa. Demographic analyses suggested that B. rapa was introduced to Asia 2,400-4,100 years ago, and that Chinese cabbage originated 1,200-2,100 years ago via admixture of pak choi and European-Central Asian B. rapa. We also inferred significantly different levels of founder effect among the B. rapa subspecies. Written records from antiquity that document these crops are consistent with these inferences. The concordance between our age estimates of domestication events with historical records provides unique support for our demographic inferences.

Source
DOI: http://dx.doi.org/10.1111/mec.14131
July 2017

Triallelic Population Genomics for Inferring Correlated Fitness Effects of Same Site Nonsynonymous Mutations.

Genetics 2016 05 30;203(1):513-23. Epub 2016 Mar 30.

Department of Molecular and Cellular Biology, University of Arizona, Tucson, Arizona 85721.

The distribution of mutational effects on fitness is central to evolutionary genetics. Typical univariate distributions, however, cannot model the effects of multiple mutations at the same site, so we introduce a model in which mutations at the same site have correlated fitness effects. To infer the strength of that correlation, we developed a diffusion approximation to the triallelic frequency spectrum, which we applied to data from Drosophila melanogaster. We found a moderate positive correlation between the fitness effects of nonsynonymous mutations at the same codon, suggesting that both mutation identity and location are important for determining fitness effects in proteins. We validated our approach by comparing it to biochemical mutational scanning experiments, finding strong quantitative agreement, even between different organisms. We also found that the correlation of mutational fitness effects was not affected by protein solvent exposure or structural disorder. Together, our results suggest that the correlation of fitness effects at the same site is a previously overlooked yet fundamental property of protein evolution.

Source
DOI: http://dx.doi.org/10.1534/genetics.115.184812
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC4858796
May 2016