Publications by authors named "Brian L Browning"

61 Publications

Fast two-stage phasing of large-scale sequence data.

Am J Hum Genet 2021 10 2;108(10):1880-1890. Epub 2021 Sep 2.

Department of Biostatistics, University of Washington, Seattle, WA 98195, USA.

Haplotype phasing is the estimation of haplotypes from genotype data. We present a fast, accurate, and memory-efficient haplotype phasing method that scales to large-scale SNP array and sequence data. The method uses marker windowing and composite reference haplotypes to reduce memory usage and computation time. It incorporates a progressive phasing algorithm that identifies confidently phased heterozygotes in each iteration and fixes the phase of these heterozygotes in subsequent iterations. For data with many low-frequency variants, such as whole-genome sequence data, the method employs a two-stage phasing algorithm that phases high-frequency markers via progressive phasing in the first stage and phases low-frequency markers via genotype imputation in the second stage. This haplotype phasing method is implemented in the open-source Beagle 5.2 software package. We compare Beagle 5.2 and SHAPEIT 4.2.1 by using expanding subsets of 485,301 UK Biobank samples and 38,387 TOPMed samples. Both methods have very similar accuracy and computation time for UK Biobank SNP array data. However, for TOPMed sequence data, Beagle is more than 20 times faster than SHAPEIT, achieves similar accuracy, and scales to larger sample sizes.

Source
DOI: http://dx.doi.org/10.1016/j.ajhg.2021.08.005
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC8551421
October 2021

Sequencing of 53,831 diverse genomes from the NHLBI TOPMed Program.

Nature 2021 02 10;590(7845):290-299. Epub 2021 Feb 10.

The Broad Institute of MIT and Harvard, Cambridge, MA, USA.

The Trans-Omics for Precision Medicine (TOPMed) programme seeks to elucidate the genetic architecture and biology of heart, lung, blood and sleep disorders, with the ultimate goal of improving diagnosis, treatment and prevention of these diseases. The initial phases of the programme focused on whole-genome sequencing of individuals with rich phenotypic data and diverse backgrounds. Here we describe the TOPMed goals and design as well as the available resources and early insights obtained from the sequence data. The resources include a variant browser, a genotype imputation server, and genomic and phenotypic data that are available through dbGaP (Database of Genotypes and Phenotypes). In the first 53,831 TOPMed samples, we detected more than 400 million single-nucleotide and insertion or deletion variants after alignment with the reference genome. Additional previously undescribed variants were detected through assembly of unmapped reads and customized analysis in highly variable loci. Among the more than 400 million detected variants, 97% have frequencies of less than 1% and 46% are singletons that are present in only one individual (53% among unrelated individuals). These rare variants provide insights into mutational processes and recent human evolutionary history. The extensive catalogue of genetic variation in TOPMed studies provides unique opportunities for exploring the contributions of rare and noncoding sequence variants to phenotypic variation. Furthermore, combining TOPMed haplotypes with modern imputation methods improves the power and reach of genome-wide association studies to include variants down to a frequency of approximately 0.01%.

Source
DOI: http://dx.doi.org/10.1038/s41586-021-03205-y
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC7875770
February 2021

Haplotype analysis of the internationally distributed BRCA1 c.3331_3334delCAAG founder mutation reveals a common ancestral origin in Iberia.

Breast Cancer Res 2020 10 21;22(1):108. Epub 2020 Oct 21.

Pontificia Universidad Católica de Chile, Santiago, Chile.

Background: The BRCA1 c.3331_3334delCAAG founder mutation has been reported in hereditary breast and ovarian cancer families from multiple Hispanic groups. We aimed to evaluate BRCA1 c.3331_3334delCAAG haplotype diversity in cases of European, African, and Latin American ancestry.

Methods: BC mutation carrier cases from Colombia (n = 32), Spain (n = 13), Portugal (n = 2), Chile (n = 10), Africa (n = 1), and Brazil (n = 2) were genotyped with genome-wide single nucleotide polymorphism (SNP) arrays to evaluate haplotype diversity around BRCA1 c.3331_3334delCAAG. Additional Portuguese (n = 13) and Brazilian (n = 18) BC mutation carriers were genotyped for 15 informative SNPs surrounding BRCA1. Data were phased using SHAPEIT2, and identical-by-descent regions were determined using BEAGLE and GERMLINE. DMLE+ was used to date the mutation in Colombia and Iberia.

Results: The haplotype reconstruction revealed a shared 264.4-kb region among carriers from all six countries. The estimated mutation age was ~100 generations in Iberia, and the mutation was introduced to South America early during the European colonization period.

Conclusions: Our results suggest that this mutation originated in Iberia and was later introduced to Colombia and South America at the time of Spanish colonization during the early 1500s. We also found that the Colombian mutation carriers had higher European ancestry on chromosome 17, which harbors the BRCA1 gene, than controls, which further supports the European origin of the mutation. Understanding founder mutations in diverse populations has implications for implementing cost-effective, ancestry-informed screening.

Source
DOI: http://dx.doi.org/10.1186/s13058-020-01341-3
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC7579869
October 2020

Probabilistic Estimation of Identity by Descent Segment Endpoints and Detection of Recent Selection.

Am J Hum Genet 2020 11 13;107(5):895-910. Epub 2020 Oct 13.

Department of Biostatistics, University of Washington, Seattle, WA 98195, USA; Department of Medicine, Division of Medical Genetics, University of Washington, Seattle, WA 98195, USA.

Most methods for fast detection of identity by descent (IBD) segments report identity by state segments without any quantification of the uncertainty in the endpoints and lengths of the IBD segments. We present a method for determining the posterior probability distribution of IBD segment endpoints. Our approach accounts for genotype errors, recent mutations, and gene conversions which disrupt DNA sequence identity within IBD segments, and it can be applied to large cohorts with whole-genome sequence or SNP array data. We find that our method's estimates of uncertainty are well calibrated for homogeneous samples. We quantify endpoint uncertainty for 77.7 billion IBD segments from 408,883 individuals of white British ancestry in the UK Biobank, and we use these IBD segments to find regions showing evidence of recent natural selection. We show that many spurious selection signals are eliminated by the use of unbiased estimates of IBD segment endpoints and a pedigree-based genetic map. Eleven of the twelve regions with the greatest evidence for recent selection in our scan have been identified as selected in previous analyses using different approaches. Our computationally efficient method for quantifying IBD segment endpoint uncertainty is implemented in the open source ibd-ends software package.

Source
DOI: http://dx.doi.org/10.1016/j.ajhg.2020.09.010
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC7553009
November 2020

IBDkin: fast estimation of kinship coefficients from identity by descent segments.

Bioinformatics 2020 08;36(16):4519-4520

Department of Biostatistics.

Motivation: Estimation of pairwise kinship coefficients in large datasets is computationally challenging because the number of pairs of related individuals increases quadratically with sample size.

Results: We present IBDkin, a software package written in C for estimating kinship coefficients from identity by descent (IBD) segments. We use IBDkin to estimate kinship coefficients for 7.95 billion pairs of individuals in the UK Biobank who share at least one detected IBD segment with length ≥ 4 cM.

Availability And Implementation: https://github.com/YingZhou001/IBDkin.

Supplementary Information: Supplementary data are available at Bioinformatics online.
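
The idea of estimating a kinship coefficient from IBD segments can be illustrated with a minimal Python sketch. This is not IBDkin's code (IBDkin is written in C); it assumes the textbook relation kinship = (IBD1 fraction)/4 + (IBD2 fraction)/2, and the function name, the segment tuple layout, and the genome length argument are hypothetical choices made for illustration. Segments are assumed to be non-overlapping and already classified as IBD1 or IBD2.

```python
def kinship_from_ibd(segments, genome_length_cm):
    """Estimate a pairwise kinship coefficient from IBD segments.

    segments: iterable of (start_cm, end_cm, n_shared), where n_shared is
              1 (one haplotype pair shared, IBD1) or 2 (both shared, IBD2).
              This layout is an assumption for illustration only.
    genome_length_cm: total genetic length of the analyzed genome in cM.
    """
    if genome_length_cm <= 0:
        raise ValueError("genome_length_cm must be positive")
    ibd1_cm = 0.0
    ibd2_cm = 0.0
    for start_cm, end_cm, n_shared in segments:
        if end_cm < start_cm:
            raise ValueError("segment end precedes segment start")
        length = end_cm - start_cm
        if n_shared == 1:
            ibd1_cm += length
        elif n_shared == 2:
            ibd2_cm += length
        else:
            raise ValueError("n_shared must be 1 or 2")
    # Kinship = P(IBD1)/4 + P(IBD2)/2, with probabilities estimated as
    # the fraction of the genome covered by each IBD state.
    return ibd1_cm / (4.0 * genome_length_cm) + ibd2_cm / (2.0 * genome_length_cm)


# Toy example: a parent-offspring pair shares one haplotype genome-wide.
parent_child = kinship_from_ibd([(0.0, 100.0, 1)], genome_length_cm=100.0)
```

In this toy example the whole 100 cM genome is IBD1, so the estimate equals the expected parent-offspring kinship of 0.25.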

Source
DOI: http://dx.doi.org/10.1093/bioinformatics/btaa569
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC7750976
August 2020

Population-Specific Recombination Maps from Segments of Identity by Descent.

Am J Hum Genet 2020 07 12;107(1):137-148. Epub 2020 Jun 12.

Department of Biostatistics, University of Washington, Seattle, WA 98195, USA.

Recombination rates vary significantly across the genome, and estimates of recombination rates are needed for downstream analyses such as haplotype phasing and genotype imputation. Existing methods for recombination rate estimation are limited by insufficient amounts of informative genetic data or by high computational cost. We present a method and software, called IBDrecomb, for using segments of identity by descent to infer recombination rates. IBDrecomb can be applied to sequenced population cohorts to obtain high-resolution, population-specific recombination maps. In simulated admixed data, IBDrecomb obtains higher accuracy than admixture-based estimation of recombination rates. When applied to 2,500 simulated individuals, IBDrecomb obtains similar accuracy to a linkage-disequilibrium (LD)-based method applied to 96 individuals (the largest number for which computation is tractable). Compared to LD-based maps, our IBD-based maps have the advantage of estimating recombination rates in the recent past rather than the distant past. We used IBDrecomb to generate new recombination maps for European Americans and for African Americans from TOPMed sequence data from the Framingham Heart Study (1,626 unrelated individuals) and the Jackson Heart Study (2,046 unrelated individuals), and we compare them to LD-based, admixture-based, and family-based maps.

Source
DOI: http://dx.doi.org/10.1016/j.ajhg.2020.05.016
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC7332656
July 2020

A Fast and Simple Method for Detecting Identity-by-Descent Segments in Large-Scale Data.

Am J Hum Genet 2020 04 12;106(4):426-437. Epub 2020 Mar 12.

Department of Biostatistics, University of Washington, Seattle, WA 98195, USA; Department of Medicine, Division of Medical Genetics, University of Washington, Seattle, WA 98195, USA.

Segments of identity by descent (IBD) are used in many genetic analyses. We present a method for detecting identical-by-descent haplotype segments in phased genotype data. Our method, called hap-IBD, combines a compressed representation of haplotype data, the positional Burrows-Wheeler transform, and multi-threaded execution to produce very fast analysis times. An attractive feature of hap-IBD is its simplicity: the input parameters clearly and precisely define the IBD segments that are reported, so that program correctness can be confirmed by users. We evaluate hap-IBD and four state-of-the-art IBD segment detection methods (GERMLINE, iLASH, RaPID, and TRUFFLE) using UK Biobank chromosome 20 data and simulated sequence data. We show that hap-IBD detects IBD segments faster and more accurately than competing methods, and that hap-IBD is the only method that can rapidly and accurately detect short 2-4 centiMorgan (cM) IBD segments in the full UK Biobank data. Analysis of 485,346 UK Biobank samples through the use of hap-IBD with 12 computational threads detects 231.5 billion autosomal IBD segments with length ≥2 cM in 24.4 h.
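
The positional Burrows-Wheeler transform mentioned in this abstract can be illustrated with a minimal Python sketch of its core step, the prefix-array update. This is not hap-IBD's implementation; the function name and the list-of-lists haplotype layout are assumptions made for illustration, and alleles are assumed to be coded 0 or 1. At each marker, haplotypes are stably partitioned by their allele, which keeps them sorted by their reversed allele sequence up to that marker, so haplotypes sharing long matches ending at a marker become adjacent.

```python
def pbwt_prefix_arrays(haplotypes):
    """Return the positional prefix array before and after every marker.

    haplotypes: list of equal-length lists of 0/1 alleles, one list per
                haplotype (a hypothetical layout for illustration).
    The k-th returned list orders haplotype indices by their reversed
    allele sequence over markers 0..k-1.
    """
    n_haps = len(haplotypes)
    n_markers = len(haplotypes[0]) if n_haps else 0
    order = list(range(n_haps))
    orders = [order]
    for k in range(n_markers):
        # Stable partition: haplotypes carrying allele 0 at marker k keep
        # their relative order and precede those carrying allele 1.
        zeros = [i for i in order if haplotypes[i][k] == 0]
        ones = [i for i in order if haplotypes[i][k] == 1]
        order = zeros + ones
        orders.append(order)
    return orders


# Toy example with four haplotypes and two markers.
toy_haplotypes = [[0, 1], [1, 0], [0, 0], [1, 1]]
toy_orders = pbwt_prefix_arrays(toy_haplotypes)
```

Each update is linear in the number of haplotypes, which is the property that makes this representation attractive for very large cohorts.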

Source
DOI: http://dx.doi.org/10.1016/j.ajhg.2020.02.010
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC7118582
April 2020

Estimating the Genome-wide Mutation Rate with Three-Way Identity by Descent.

Am J Hum Genet 2019 11 3;105(5):883-893. Epub 2019 Oct 3.

Department of Biostatistics, University of Washington, Seattle, WA 98195, USA.

The two primary methods for estimating the genome-wide mutation rate have been counting de novo mutations in parent-offspring trios and comparing sequence data between closely related species. With parent-offspring trio analysis it is difficult to control for genotype error, and resolution is limited because each trio provides information from only two meioses. Inter-species comparison is difficult to calibrate due to uncertainty in the number of meioses separating species, and it can be biased by selection and by changing mutation rates over time. An alternative class of approaches for estimating mutation rates that avoids these limitations is based on identity by descent (IBD) segments that arise from common ancestry within the past few thousand years. Existing IBD-based methods are limited to highly inbred samples, or lack robustness to genotype error and error in the estimated demographic history. We present an IBD-based method that uses sharing of IBD segments among sets of three individuals to estimate the mutation rate. Our method is applicable to accurately phased genotype data, such as parent-offspring trio data phased using Mendelian rules of inheritance. Unlike standard parent-offspring analysis, our method utilizes distant relationships and is robust to genotype error. We apply our method to data from 1,307 European-ancestry individuals in the Framingham Heart Study sequenced by the NHLBI TOPMed project. We obtain an estimate of 1.29 × 10⁻⁸ mutations per base pair per meiosis with a 95% confidence interval of [1.02 × 10⁻⁸, 1.56 × 10⁻⁸].

Source
DOI: http://dx.doi.org/10.1016/j.ajhg.2019.09.012
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6848988
November 2019

Genetic history of the population of Crete.

Ann Hum Genet 2019 11 13;83(6):373-388. Epub 2019 Jun 13.

Departments of Medicine and Genome Sciences, University of Washington, Seattle, Washington.

The medieval history of several populations often suffers from a scarcity of contemporary records, resulting in contradictory and sometimes biased interpretations by historians. This is the situation with the population of the island of Crete, which remained relatively undisturbed until the Middle Ages, when multiple wars, invasions, and occupations by foreigners took place. Historians have considered the effects of the occupation of Crete by the Arabs (in the 9th and 10th centuries C.E.) and the Venetians (in the 13th to the 17th centuries C.E.) on the local population. To obtain insights into such effects from a genetic perspective, we studied representative samples from 17 Cretan districts using the Illumina 1 million or 2.5 million arrays and compared the Cretans to the populations of origin of the medieval conquerors and settlers. Highlights of our findings include (1) small genetic contributions from the Arab occupation to the extant Cretan population, (2) low genetic contribution of the Venetians to the extant Cretan population, and (3) evidence of a genetic relationship among the Cretans and Central, Northern, and Eastern Europeans, which could be explained by the settlement on the island of tribes of northern origin during the medieval period. Our results show how genetics, combined with the historical record, can help clarify that record.

Source
DOI: http://dx.doi.org/10.1111/ahg.12328
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6851683
November 2019

A One-Penny Imputed Genome from Next-Generation Reference Panels.

Am J Hum Genet 2018 09 9;103(3):338-348. Epub 2018 Aug 9.

Department of Biostatistics, University of Washington, Seattle, WA 98195, USA.

Genotype imputation is commonly performed in genome-wide association studies because it greatly increases the number of markers that can be tested for association with a trait. In general, one should perform genotype imputation using the largest reference panel that is available because the number of accurately imputed variants increases with reference panel size. However, one impediment to using larger reference panels is the increased computational cost of imputation. We present a new genotype imputation method, Beagle 5.0, which greatly reduces the computational cost of imputation from large reference panels. We compare Beagle 5.0 with Beagle 4.1, Impute4, Minimac3, and Minimac4 using 1000 Genomes Project data, Haplotype Reference Consortium data, and simulated data for 10k, 100k, 1M, and 10M reference samples. All methods produce nearly identical accuracy, but Beagle 5.0 has the lowest computation time and the best scaling of computation time with increasing reference panel size. For 10k, 100k, 1M, and 10M reference samples and 1,000 phased target samples, Beagle 5.0's computation time is 3× (10k), 12× (100k), 43× (1M), and 533× (10M) faster than the fastest alternative method. Cost data from the Amazon Elastic Compute Cloud show that Beagle 5.0 can perform genome-wide imputation from 10M reference samples into 1,000 phased target samples at a cost of less than one US cent per sample.

Source
DOI: http://dx.doi.org/10.1016/j.ajhg.2018.07.015
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6128308
September 2018

Genotype Imputation from Large Reference Panels.

Annu Rev Genomics Hum Genet 2018 08 23;19:73-96. Epub 2018 May 23.

Division of Medical Genetics, Department of Medicine, University of Washington, Seattle, Washington 98195-7720, USA.

Genotype imputation has become a standard tool in genome-wide association studies because it enables researchers to inexpensively approximate whole-genome sequence data from genome-wide single-nucleotide polymorphism array data. Genotype imputation increases statistical power, facilitates fine mapping of causal variants, and plays a key role in meta-analyses of genome-wide association studies. Only variants that were previously observed in a reference panel of sequenced individuals can be imputed. However, the rapid increase in the number of deeply sequenced individuals will soon make it possible to assemble enormous reference panels that greatly increase the number of imputable variants. In this review, we present an overview of genotype imputation and describe the computational techniques that make it possible to impute genotypes from reference panels with millions of individuals.

Source
DOI: http://dx.doi.org/10.1146/annurev-genom-083117-021602
August 2018

Ancestry-specific recent effective population size in the Americas.

PLoS Genet 2018 05 24;14(5):e1007385. Epub 2018 May 24.

Department of Biostatistics, University of Washington, Seattle, WA, United States of America.

Populations change in size over time due to factors such as population growth, migration, bottleneck events, natural disasters, and disease. The historical effective size of a population affects the power and resolution of genetic association studies. For admixed populations, it is not only the overall effective population size that is of interest, but also the effective sizes of the component ancestral populations. We use identity by descent and local ancestry inferred from genome-wide genetic data to estimate overall and ancestry-specific effective population size during the past hundred generations for nine admixed American populations from the Hispanic Community Health Study/Study of Latinos, and for African-American and European-American populations from two US cities. In these populations, the estimated pre-admixture effective sizes of the ancestral populations vary by sampled population, suggesting that the ancestors of different sampled populations were drawn from different sub-populations. In addition, we estimate that overall effective population sizes dropped substantially in the generations immediately after the commencement of European and African immigration, reaching a minimum around 12 generations ago, but rebounded within a small number of generations afterwards. Of the populations that we considered, the population of individuals originating from Puerto Rico has the smallest bottleneck size of one thousand, while the Pittsburgh African-American population has the largest bottleneck size of two hundred thousand.

Source
DOI: http://dx.doi.org/10.1371/journal.pgen.1007385
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC5967706
May 2018

POPdemog: visualizing population demographic history from simulation scripts.

Bioinformatics 2018 08;34(16):2854-2855

Department of Biostatistics, University of Washington, WA, USA.

Summary: We present POPdemog, an R package which converts coalescent simulation program input parameters into a visual representation of the demographic model. This package is useful for preparing figures, for checking that demographic simulation parameters have been correctly specified, and for understanding demographic models that other researchers have used to simulate genetic data. The POPdemog package supports the ms, msa, msHot, MaCS, msprime, scrm and Cosi2 programs, and includes options for customizing the output figures.

Availability And Implementation: The POPdemog package and its tutorial can be freely downloaded from https://github.com/YingZhou001/POPdemog.

Supplementary Information: Supplementary data are available at Bioinformatics online.

Source
DOI: http://dx.doi.org/10.1093/bioinformatics/bty184
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6084562
August 2018

Analysis of Human Sequence Data Reveals Two Pulses of Archaic Denisovan Admixture.

Cell 2018 03 15;173(1):53-61.e9. Epub 2018 Mar 15.

Ecology and Evolutionary Biology, Princeton University, Princeton, NJ 08544, USA; The Lewis-Sigler Institute for Integrative Genomics, Princeton University, Princeton, NJ 08544, USA.

Anatomically modern humans interbred with Neanderthals and with a related archaic population known as Denisovans. Genomes of several Neanderthals and one Denisovan have been sequenced, and these reference genomes have been used to detect introgressed genetic material in present-day human genomes. Segments of introgression also can be detected without use of reference genomes, and doing so can be advantageous for finding introgressed segments that are less closely related to the sequenced archaic genomes. We apply a new reference-free method for detecting archaic introgression to 5,639 whole-genome sequences from Eurasia and Oceania. We find Denisovan ancestry in populations from East and South Asia and Papuans. Denisovan ancestry comprises two components with differing similarity to the sequenced Altai Denisovan individual. This indicates that at least two distinct instances of Denisovan admixture into modern humans occurred, involving Denisovan populations that had different levels of relatedness to the sequenced Altai Denisovan.

Source
DOI: http://dx.doi.org/10.1016/j.cell.2018.02.031
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC5866234
March 2018

Genome-wide association study of heart rate and its variability in Hispanic/Latino cohorts.

Heart Rhythm 2017 11 10;14(11):1675-1684. Epub 2017 Jun 10.

Department of Epidemiology, University of Washington, Seattle, Washington; Cardiovascular Health Research Unit, Department of Medicine, University of Washington, Seattle, Washington.

Background: Although time-domain measures of heart rate variability (HRV) are used to estimate cardiac autonomic tone and disease risk in multiethnic populations, the genetic epidemiology of HRV in Hispanics/Latinos has not been characterized.

Objective: The purpose of this study was to conduct a genome-wide association study of heart rate (HR) and its variability in the Hispanic Community Health Study/Study of Latinos, Multi-Ethnic Study of Atherosclerosis, and Women's Health Initiative Hispanic SNP-Health Association Resource project (n = 13,767).

Methods: We estimated HR (bpm), standard deviation of normal-to-normal interbeat intervals (SDNN, ms), and root mean squared difference in successive, normal-to-normal interbeat intervals (RMSSD, ms) from resting, standard 12-lead ECGs. We estimated associations between each phenotype and 17 million genotyped or imputed single nucleotide polymorphisms (SNPs), accounting for relatedness and adjusting for age, sex, study site, and ancestry. Cohort-specific estimates were combined using fixed-effects, inverse-variance meta-analysis. We investigated replication for select SNPs exceeding genome-wide (P < 5 × 10⁻⁸) or suggestive (P < 10⁻⁶) significance thresholds.

Results: Two genome-wide significant SNPs replicated in a European ancestry cohort, one for RMSSD (rs4963772; chromosome 12) and another for SDNN (rs12982903; chromosome 19). A suggestive SNP for HR (rs236352; chromosome 6) replicated in an African-American cohort. Functional annotation of replicated SNPs in cardiac and neuronal tissues identified potentially causal variants and mechanisms.

Conclusion: This first genome-wide association study of HRV and HR in Hispanics/Latinos underscores the potential for even modestly sized samples of non-European ancestry to inform the genetic epidemiology of complex traits.
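
The fixed-effects, inverse-variance meta-analysis named in the Methods can be sketched in a few lines of Python. This is a generic illustration of the standard estimator, not the study's analysis pipeline; the function name and the example effect sizes are hypothetical. Each cohort's effect estimate is weighted by the reciprocal of its squared standard error.

```python
import math


def fixed_effects_meta(betas, standard_errors):
    """Combine per-cohort effect estimates by inverse-variance weighting.

    betas: per-cohort effect estimates for one SNP.
    standard_errors: matching per-cohort standard errors (all positive).
    Returns (pooled_beta, pooled_se, z_score).
    """
    if not betas or len(betas) != len(standard_errors):
        raise ValueError("need equal, non-zero numbers of betas and standard errors")
    if any(se <= 0 for se in standard_errors):
        raise ValueError("standard errors must be positive")
    # Weight each cohort by the inverse of its sampling variance.
    weights = [1.0 / (se * se) for se in standard_errors]
    total_weight = sum(weights)
    pooled_beta = sum(w * b for w, b in zip(weights, betas)) / total_weight
    pooled_se = math.sqrt(1.0 / total_weight)
    return pooled_beta, pooled_se, pooled_beta / pooled_se


# Hypothetical estimates from two cohorts with equal precision.
pooled_beta, pooled_se, z_score = fixed_effects_meta([0.10, 0.20], [0.05, 0.05])
```

With equal standard errors the pooled estimate is the simple average of the two effects, and the pooled standard error shrinks by a factor of the square root of two.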

Source
DOI: http://dx.doi.org/10.1016/j.hrthm.2017.06.018
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC5671896
November 2017

Genome-wide association study of red blood cell traits in Hispanics/Latinos: The Hispanic Community Health Study/Study of Latinos.

PLoS Genet 2017 04 28;13(4):e1006760. Epub 2017 Apr 28.

Division of Public Health Sciences, Fred Hutchinson Cancer Research Center, Seattle, WA, United States of America.

Prior GWAS have identified loci associated with red blood cell (RBC) traits in populations of European, African, and Asian ancestry. These studies have not included individuals with an Amerindian ancestral background, such as Hispanics/Latinos, nor evaluated the full spectrum of genomic variation beyond single nucleotide variants. Using a custom genotyping array enriched for Amerindian ancestral content and 1000 Genomes imputation, we performed GWAS in 12,502 participants of the Hispanic Community Health Study/Study of Latinos (HCHS/SOL) for hematocrit, hemoglobin, RBC count, RBC distribution width (RDW), and RBC indices. Approximately 60% of previously reported RBC trait loci generalized to HCHS/SOL Hispanics/Latinos, including African ancestral alpha- and beta-globin gene variants. In addition to the known 3.8kb alpha-globin copy number variant, we identified an Amerindian ancestral association in an alpha-globin regulatory region on chromosome 16p13.3 for mean corpuscular volume and mean corpuscular hemoglobin. We also discovered and replicated three genome-wide significant variants in previously unreported loci for RDW (SLC12A2 rs17764730, PSMB5 rs941718) and hematocrit (PROX1 rs3754140). Among the proxy variants at the SLC12A2 locus we identified rs3812049, located in a bi-directional promoter between SLC12A2 (which encodes a red cell membrane ion-transport protein) and an upstream anti-sense long-noncoding RNA, LINC01184, as the likely causal variant. We further demonstrate that disruption of the regulatory element harboring rs3812049 affects transcription of SLC12A2 and LINC01184 in human erythroid progenitor cells. Together, these results reinforce the importance of genetic study of diverse ancestral populations, in particular Hispanics/Latinos.

Source
DOI: http://dx.doi.org/10.1371/journal.pgen.1006760
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC5428979
April 2017

Genetics of the Peloponnesean populations and the theory of extinction of the medieval Peloponnesean Greeks.

Eur J Hum Genet 2017 05 8;25(5):637-645. Epub 2017 Mar 8.

Department of Computer Sciences, Purdue University, West Lafayette, Indiana.

The Peloponnese has been one of the cradles of Classical European civilization and an important contributor to ancient European history. It has also been the subject of a controversy about the ancestry of its population. In a theory hotly debated by scholars for over 170 years, the German historian Jacob Philipp Fallmerayer proposed that the medieval Peloponneseans were totally extinguished by Slavic and Avar invaders and replaced by Slavic settlers during the 6th century CE. Here we use 2.5 million single-nucleotide polymorphisms to investigate the genetic structure of Peloponnesean populations in a sample of 241 individuals originating from all districts of the peninsula and to examine predictions of the theory of replacement of the medieval Peloponneseans by Slavs. We find considerable heterogeneity of Peloponnesean populations exemplified by genetically distinct subpopulations and by gene flow gradients within the Peloponnese. By principal component analysis (PCA) and ADMIXTURE analysis, the Peloponneseans are clearly distinguishable from the populations of the Slavic homeland and are very similar to Sicilians and Italians. Using a novel method of quantitative analysis of ADMIXTURE output, we find that the Slavic ancestry of Peloponnesean subpopulations ranges from 0.2 to 14.4%. Subpopulations considered by Fallmerayer to be Slavic tribes or to have Near Eastern origin have no significant ancestry of either. This study rejects the theory of extinction of medieval Peloponneseans and illustrates how genetics can clarify important aspects of the history of a human population.

Source
DOI: http://dx.doi.org/10.1038/ejhg.2017.18
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC5437898
May 2017

Genome-wide association of white blood cell counts in Hispanic/Latino Americans: the Hispanic Community Health Study/Study of Latinos.

Hum Mol Genet 2017 03;26(6):1193-1204

Division of Public Health Sciences, Fred Hutchinson Cancer Research Center, Seattle, WA 98195, USA.

Circulating white blood cell (WBC) counts (neutrophils, monocytes, lymphocytes, eosinophils, basophils) differ by ethnicity. The genetic factors underlying basal WBC traits in Hispanics/Latinos are unknown. We performed a genome-wide association study of total WBC and differential counts in a large, ethnically diverse US population sample of Hispanics/Latinos ascertained by the Hispanic Community Health Study and Study of Latinos (HCHS/SOL). We demonstrate that several previously known WBC-associated genetic loci (e.g. the African Duffy antigen receptor for chemokines null variant for neutrophil count) are generalizable to WBC traits in Hispanics/Latinos. We identified and replicated common and rare germ-line variants at FLT3 (a gene often somatically mutated in leukemia) associated with monocyte count. The common FLT3 variant rs76428106 has a large allele frequency differential between African and non-African populations. We also identified several novel genetic loci involving or regulating hematopoietic transcription factors (CEBPE-SLC7A7, CEBPA and CRBN-TRNT1) associated with basophil count. The minor allele of the CEBPE variant associated with lower basophil count has been previously associated with Amerindian ancestry and higher risk of acute lymphoblastic leukemia in Hispanics. Together, these data suggest that germline genetic variation affecting transcriptional and signaling pathways that underlie WBC development and lineage specification can contribute to inter-individual as well as ethnic differences in peripheral blood cell counts (normal hematopoiesis) in addition to susceptibility to leukemia (malignant hematopoiesis).

Source
http://dx.doi.org/10.1093/hmg/ddx024 (DOI Listing)
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC5968624 (PMC)
March 2017

Robust Inference of Identity by Descent from Exome-Sequencing Data.

Am J Hum Genet 2016 Nov 13;99(5):1106-1116. Epub 2016 Oct 13.

Department of Genome Sciences, University of Washington, Seattle, WA 98195, USA.

Identifying and characterizing genomic regions that are shared identical by descent (IBD) among individuals can yield insight into population history, facilitate the identification of adaptively evolving loci, and be an important tool in disease gene mapping. Although increasingly large collections of exome sequences have been generated, it is challenging to detect IBD segments in exomes, precluding many potentially informative downstream analyses. Here, we describe an approach, ExIBD, to robustly detect IBD segments in exome-sequencing data, rigorously evaluate its performance, and apply this method to high-coverage exomes from 6,515 European and African Americans. Furthermore, we show how IBD networks, constructed from patterns of pairwise IBD between individuals, and principles from graph theory provide insight into recent population history and reveal cryptic population structure in European Americans. Our results enable IBD analyses to be performed on exome data, which will expand the scope of inferences that can be made from existing massively large exome-sequencing datasets.

Source
http://dx.doi.org/10.1016/j.ajhg.2016.09.011 (DOI Listing)
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC5097937 (PMC)
November 2016

Consideration of Cosegregation in the Pathogenicity Classification of Genomic Variants.

Am J Hum Genet 2016 06 26;98(6):1077-1081. Epub 2016 May 26.

Division of Medical Genetics, Department of Medicine and Department of Genome Sciences, The University of Washington, Seattle, WA 98195, USA.

The American College of Medical Genetics and Genomics (ACMG) and Association of Molecular Pathology (AMP) recently published important new guidelines aiming to improve and standardize the pathogenicity classification of genomic variants. The Clinical Sequencing Exploratory Research (CSER) consortium evaluated the use of these guidelines across nine laboratories. One identified obstacle to consistent usage of the ACMG-AMP guidelines is the lack of a definition of cosegregation as criteria for pathogenicity classification. Cosegregation data differ from many other types of pathogenicity data in being quantitative. However, the ACMG-AMP guidelines do not define quantitative criteria for use of these data. Here, such quantitative criteria, in an easily implementable form, are proposed.

Source
http://dx.doi.org/10.1016/j.ajhg.2016.04.003 (DOI Listing)
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC4908147 (PMC)
June 2016

Local Ancestry Inference in a Large US-Based Hispanic/Latino Study: Hispanic Community Health Study/Study of Latinos (HCHS/SOL).

G3 (Bethesda) 2016 06 1;6(6):1525-34. Epub 2016 Jun 1.

Department of Biostatistics, University of Washington, Seattle, Washington 98195.

We estimated local ancestry on the autosomes and X chromosome in a large US-based study of 12,793 Hispanic/Latino individuals using the RFMix method, and we compared different reference panels and approaches to local ancestry estimation on the X chromosome by means of Mendelian inconsistency rates as a proxy for accuracy. We developed a novel and straightforward approach to performing ancestry-specific PCA after finding artifactual behavior in the results from an existing approach. Using the ancestry-specific PCA, we found significant population structure within African, European, and Amerindian ancestries in the Hispanic/Latino individuals in our study. In the African ancestral component of the admixed individuals, individuals whose grandparents were from Central America clustered separately from individuals whose grandparents were from the Caribbean, and also from reference Yoruba and Mandenka West African individuals. In the European component, individuals whose grandparents were from Puerto Rico diverged partially from other background groups. In the Amerindian ancestral component, individuals clustered into multiple different groups depending on the grandparental country of origin. Therefore, local ancestry estimation provides further insight into the complex genetic structure of US Hispanic/Latino populations, which must be properly accounted for in genotype-phenotype association studies. It also provides a basis for admixture mapping and ancestry-specific allele frequency estimation, which are useful in the identification of risk factors for disease.

Source
http://dx.doi.org/10.1534/g3.116.028779 (DOI Listing)
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC4889649 (PMC)
June 2016

ASAFE: ancestry-specific allele frequency estimation.

Bioinformatics 2016 07 3;32(14):2227-9. Epub 2016 May 3.

Department of Biostatistics and.

In a genome-wide association study (GWAS) of an admixed population, such as Hispanic Americans, ancestry-specific allele frequencies can inform the design of a replication GWAS. We derive an EM algorithm to estimate ancestry-specific allele frequencies for a bi-allelic marker given genotypes and local ancestries on a 3-way admixed population, when the phase of each admixed individual's genotype relative to the pair of local ancestries is unknown. We call our algorithm Ancestry Specific Allele Frequency Estimation (ASAFE). We demonstrate that ASAFE has low error on simulated data.

Availability And Implementation: The R source code for ASAFE is available for download at https://github.com/BiostatQian/ASAFE

Contact: [email protected]

Supplementary Information: Supplementary data are available at Bioinformatics online.

Source
http://dx.doi.org/10.1093/bioinformatics/btw220 (DOI Listing)
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC4937201 (PMC)
July 2016

Genome-wide Association Study of Platelet Count Identifies Ancestry-Specific Loci in Hispanic/Latino Americans.

Am J Hum Genet 2016 Feb 21;98(2):229-42. Epub 2016 Jan 21.

Charles Bronfman Institute for Personalized Medicine, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA; Genetics of Obesity and Related Metabolic Traits Program, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA; Mindich Child Health and Development Institute, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA.

Platelets play an essential role in hemostasis and thrombosis. We performed a genome-wide association study of platelet count in 12,491 participants of the Hispanic Community Health Study/Study of Latinos by using a mixed-model method that accounts for admixture and family relationships. We discovered and replicated associations with five genes (ACTN1, ETV7, GABBR1-MOG, MEF2C, and ZBTB9-BAK1). Our strongest association was with Amerindian-specific variant rs117672662 (p value = 1.16 × 10⁻²⁸) in ACTN1, a gene implicated in congenital macrothrombocytopenia. rs117672662 exhibited allelic differences in transcriptional activity and protein binding in hematopoietic cells. Our results underscore the value of diverse populations to extend insights into the allelic architecture of complex traits.

Source
http://dx.doi.org/10.1016/j.ajhg.2015.12.003 (DOI Listing)
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC4746331 (PMC)
February 2016

Genetic Diversity and Association Studies in US Hispanic/Latino Populations: Applications in the Hispanic Community Health Study/Study of Latinos.

Am J Hum Genet 2016 Jan;98(1):165-84

Division of Cardiovascular Sciences, NHLBI, NIH, Bethesda, MD 20892, USA.

US Hispanic/Latino individuals are diverse in genetic ancestry, culture, and environmental exposures. Here, we characterized and controlled for this diversity in genome-wide association studies (GWASs) for the Hispanic Community Health Study/Study of Latinos (HCHS/SOL). We simultaneously estimated population-structure principal components (PCs) robust to familial relatedness and pairwise kinship coefficients (KCs) robust to population structure, admixture, and Hardy-Weinberg departures. The PCs revealed substantial genetic differentiation within and among six self-identified background groups (Cuban, Dominican, Puerto Rican, Mexican, and Central and South American). To control for variation among groups, we developed a multi-dimensional clustering method to define a "genetic-analysis group" variable that retains many properties of self-identified background while achieving substantially greater genetic homogeneity within groups and including participants with non-specific self-identification. In GWASs of 22 biomedical traits, we used a linear mixed model (LMM) including pairwise empirical KCs to account for familial relatedness, PCs for ancestry, and genetic-analysis groups for additional group-associated effects. Including the genetic-analysis group as a covariate accounted for significant trait variation in 8 of 22 traits, even after we fit 20 PCs. Additionally, genetic-analysis groups had significant heterogeneity of residual variance for 20 of 22 traits, and modeling this heteroscedasticity within the LMM reduced genomic inflation for 19 traits. Furthermore, fitting an LMM that utilized a genetic-analysis group rather than a self-identified background group achieved higher power to detect previously reported associations. We expect that the methods applied here will be useful in other studies with multiple ethnic groups, admixture, and relatedness.

Source
http://dx.doi.org/10.1016/j.ajhg.2015.12.001 (DOI Listing)
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC4716704 (PMC)
January 2016

Genotype Imputation with Millions of Reference Samples.

Am J Hum Genet 2016 Jan;98(1):116-26

Department of Biostatistics, University of Washington, Seattle, WA 98195, USA.

We present a genotype imputation method that scales to millions of reference samples. The imputation method, based on the Li and Stephens model and implemented in Beagle v.4.1, is parallelized and memory efficient, making it well suited to multi-core computer processors. It achieves fast, accurate, and memory-efficient genotype imputation by restricting the probability model to markers that are genotyped in the target samples and by performing linear interpolation to impute ungenotyped variants. We compare Beagle v.4.1 with Impute2 and Minimac3 by using 1000 Genomes Project data, UK10K Project data, and simulated data. All three methods have similar accuracy but different memory requirements and different computation times. When imputing 10 Mb of sequence data from 50,000 reference samples, Beagle's throughput was more than 100× greater than Impute2's throughput on our computer servers. When imputing 10 Mb of sequence data from 200,000 reference samples in VCF format, Minimac3 consumed 26× more memory per computational thread and 15× more CPU time than Beagle. We demonstrate that Beagle v.4.1 scales to much larger reference panels by performing imputation from a simulated reference panel having 5 million samples and a mean marker density of one marker per four base pairs.
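
The interpolation step described in this abstract can be illustrated with a minimal Python sketch. It assumes that per-reference-haplotype state probabilities have already been computed at the two genotyped markers flanking an ungenotyped variant, and that the interpolation weight is proportional to position. The function names, the position units, and the 0/1 allele coding are hypothetical choices for illustration and are not Beagle's actual API or implementation.

```python
from typing import List, Sequence


def interpolate_state_probs(
    pos: float,
    left_pos: float,
    right_pos: float,
    left_probs: Sequence[float],
    right_probs: Sequence[float],
) -> List[float]:
    """Linearly interpolate state probabilities at an ungenotyped position.

    left_probs[i] and right_probs[i] are the (assumed, precomputed)
    probabilities that the target haplotype copies reference haplotype i
    at the flanking genotyped markers.
    """
    if len(left_probs) != len(right_probs):
        raise ValueError("flanking probability vectors must have equal length")
    if not left_pos <= pos <= right_pos:
        raise ValueError("pos must lie between the flanking markers")
    if right_pos == left_pos:
        return list(left_probs)
    # Weight on the right-hand marker grows as pos approaches it.
    w = (pos - left_pos) / (right_pos - left_pos)
    return [(1.0 - w) * l + w * r for l, r in zip(left_probs, right_probs)]


def imputed_alt_prob(state_probs: Sequence[float], ref_alleles: Sequence[int]) -> float:
    """Sum the probability of reference haplotypes carrying the alternate allele (coded 1)."""
    if len(state_probs) != len(ref_alleles):
        raise ValueError("one allele is required per reference haplotype")
    return sum(p for p, a in zip(state_probs, ref_alleles) if a == 1)


# Toy example: two reference haplotypes, ungenotyped variant a quarter of
# the way between the flanking genotyped markers.
probs = interpolate_state_probs(2.5, 0.0, 10.0, [0.8, 0.2], [0.4, 0.6])
alt_prob = imputed_alt_prob(probs, ref_alleles=[0, 1])
```

Because the hidden Markov model is evaluated only at genotyped markers, the cost of the expensive step in this sketch does not grow with the number of ungenotyped variants; each additional imputed variant costs one weighted sum.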

Source
http://dx.doi.org/10.1016/j.ajhg.2015.11.020 (DOI Listing)
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC4716681 (PMC)
January 2016

Accurate Non-parametric Estimation of Recent Effective Population Size from Segments of Identity by Descent.

Am J Hum Genet 2015 Sep 20;97(3):404-18. Epub 2015 Aug 20.

Division of Medical Genetics, Department of Medicine, University of Washington, Seattle, WA 98195, USA.

Existing methods for estimating historical effective population size from genetic data have been unable to accurately estimate effective population size during the most recent past. We present a non-parametric method for accurately estimating recent effective population size by using inferred long segments of identity by descent (IBD). We found that inferred segments of IBD contain information about effective population size from around 4 generations to around 50 generations ago for SNP array data and to over 200 generations ago for sequence data. In human populations that we examined, the estimates of effective size were approximately one-third of the census size. We estimate the effective population size of European-ancestry individuals in the UK four generations ago to be eight million and the effective population size of Finland four generations ago to be 0.7 million. Our method is implemented in the open-source IBDNe software package.

Source
http://dx.doi.org/10.1016/j.ajhg.2015.07.012 (DOI Listing)
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC4564943 (PMC)
September 2015

Genome-wide haplotypic testing in a Finnish cohort identifies a novel association with low-density lipoprotein cholesterol.

Eur J Hum Genet 2015 May 4;23(5):672-7. Epub 2014 Jun 4.

Department of Biostatistics, University of Washington, Seattle, WA, USA.

We performed genome-wide tests for association between haplotype clusters and each of 9 metabolic traits in a cohort of 5402 Northern Finnish individuals genotyped for 330 000 single-nucleotide polymorphisms. The metabolic traits were body mass index, C-reactive protein, diastolic blood pressure, glucose, high-density lipoprotein (HDL), insulin, low-density lipoprotein (LDL), systolic blood pressure, and triglycerides. Haplotype clusters were determined using Beagle. There were LDL-associated clusters in the chromosome 4q13.3-q21.1 region containing the albumin (ALB) and platelet factor 4 (PF4) genes. This region has not been associated with LDL in previous genome-wide association studies. The most significant haplotype cluster in this region was associated with 0.488 mmol/l higher LDL (95% CI: 0.361-0.615 mmol/l, P-value: 6.4 × 10⁻¹⁴). We also observed three previously reported associations: chromosome 16q13 with HDL, chromosome 1p32.3-p32.2 with LDL and chromosome 19q13.31-q13.32 with LDL. The chromosome 1 and chromosome 4 LDL associations do not reach genome-wide significance in single-marker analyses of these data, illustrating the power of haplotypic association testing.

Source
http://dx.doi.org/10.1038/ejhg.2014.105 (DOI Listing)
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC4402615 (PMC)
May 2015

Efficient clustering of identity-by-descent between multiple individuals.

Bioinformatics 2014 Apr 19;30(7):915-22. Epub 2013 Dec 19.

Bioinformatics Research Center, Aarhus Universitet, 8000C Aarhus, Denmark, Department of Biostatistics and Division of Medical Genetics, Department of Medicine, University of Washington, Seattle, WA, USA.

Motivation: Most existing identity-by-descent (IBD) detection methods only consider haplotype pairs; less attention has been paid to considering multiple haplotypes simultaneously, even though IBD is an equivalence relation on haplotypes that partitions a set of haplotypes into IBD clusters. Multiple-haplotype IBD clusters may have advantages over pairwise IBD in some applications, such as IBD mapping. Existing methods for detecting multiple-haplotype IBD clusters are often computationally expensive and unable to handle large samples with thousands of haplotypes.

Results: We present a clustering method, efficient multiple-IBD, which uses pairwise IBD segments to infer multiple-haplotype IBD clusters. It expands clusters from seed haplotypes by adding qualified neighbors and extends clusters across sliding windows in the genome. Our method is an order of magnitude faster than existing methods and has comparable performance with respect to the quality of clusters it uncovers. We further investigate the potential application of multiple-haplotype IBD clusters in association studies by testing for association between multiple-haplotype IBD clusters and low-density lipoprotein cholesterol in the Northern Finland Birth Cohort. Using our multiple-haplotype IBD cluster approach, we found an association with a genomic interval covering the PCSK9 gene in these data that is missed by standard single-marker association tests. Previously published studies confirm association of PCSK9 with low-density lipoprotein.

Availability And Implementation: Source code is available under the GNU Public License at http://cs.au.dk/~qianyuxx/EMI/.
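
The observation that IBD is an equivalence relation partitioning haplotypes into clusters can be made concrete with a short Python sketch. This is not the efficient multiple-IBD method described above, which expands clusters from seed haplotypes and extends them across sliding windows. It is a plain union-find (connected-components) baseline at a single genomic position, and the segment tuple layout `(hap_a, hap_b, start, end)` is a hypothetical input format.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

Segment = Tuple[str, str, int, int]  # (hap_a, hap_b, start, end), hypothetical layout


def ibd_clusters_at(position: int, segments: Iterable[Segment]) -> List[List[str]]:
    """Group haplotypes into clusters by transitive closure of pairwise IBD.

    Only segments covering `position` are used. Haplotypes that appear in
    no covering segment are omitted (they would be singleton clusters).
    """
    parent: Dict[str, str] = {}

    def find(x: str) -> str:
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps trees shallow
            x = parent[x]
        return x

    for hap_a, hap_b, start, end in segments:
        if start <= position <= end:
            root_a, root_b = find(hap_a), find(hap_b)
            if root_a != root_b:
                parent[root_a] = root_b

    clusters = defaultdict(set)
    for hap in list(parent):
        clusters[find(hap)].add(hap)
    return sorted(sorted(members) for members in clusters.values())


segments = [
    ("h1", "h2", 0, 100),
    ("h2", "h3", 50, 150),
    ("h4", "h5", 0, 30),
]
clusters_at_60 = ibd_clusters_at(60, segments)
clusters_at_20 = ibd_clusters_at(20, segments)
```

Pure transitive closure merges clusters on the strength of a single pairwise segment, so one falsely detected segment can join two unrelated groups. A method that admits only qualified neighbors, as the abstract describes, is one way to guard against that.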

Source
http://dx.doi.org/10.1093/bioinformatics/btt734 (DOI Listing)
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3967111 (PMC)
April 2014

Large numbers of individuals are required to classify and define risk for rare variants in known cancer risk genes.

Genet Med 2014 Jul 19;16(7):529-34. Epub 2013 Dec 19.

Division of Medical Genetics, Department of Medicine, University of Washington, Seattle, Washington, USA.

Purpose: Up to half of unique genetic variants in genomic evaluations of familial cancer risk will be rare variants of uncertain significance. Classification of rare variants will be an ongoing issue as genomic testing becomes more common.

Methods: We modified standard power calculations to explore sample sizes necessary to classify and estimate relative disease risk for rare variant frequencies (0.001-0.00001) and varying relative risk (20-1.5), using population-based and family-based designs focusing on breast and colon cancer. We required 80% power and tolerated a 10% false-positive rate because variants tested will be in known genes with high pretest probability.

Results: Using population-based strategies, hundreds to millions of cases are necessary to classify rare cancer variants. Larger samples are necessary for less frequent and less penetrant variants. Family-based strategies are robust to changes in variant frequency and require between 8 and 1,175 individuals, depending on risk.

Conclusion: It is unlikely that most rare missense variants will be classifiable in the near future, and accurate relative risk estimates may never be available for very rare variants. This knowledge may alter strategies for communicating information about variants of uncertain significance to patients.

Source
http://dx.doi.org/10.1038/gim.2013.187 (DOI Listing)
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC4063879 (PMC)
July 2014

Detecting identity by descent and estimating genotype error rates in sequence data.

Am J Hum Genet 2013 Nov 24;93(5):840-51. Epub 2013 Oct 24.

Department of Medicine, Division of Medical Genetics, University of Washington, Seattle, WA 98195, USA.

Existing methods for identity by descent (IBD) segment detection were designed for SNP array data, not sequence data. Sequence data have a much higher density of genetic variants and a different allele frequency distribution, and can have higher genotype error rates. Consequently, best practices for IBD detection in SNP array data do not necessarily carry over to sequence data. We present a method, IBDseq, for detecting IBD segments in sequence data and a method, SEQERR, for estimating genotype error rates at low-frequency variants by using detected IBD. The IBDseq method estimates probabilities of genotypes observed with error for each pair of individuals under IBD and non-IBD models. The ratio of estimated probabilities under the two models gives a LOD score for IBD. We evaluate several IBD detection methods that are fast enough for application to sequence data (IBDseq, Beagle Refined IBD, PLINK, and GERMLINE) under multiple parameter settings, and we show that IBDseq achieves high power and accuracy for IBD detection in sequence data. The SEQERR method estimates genotype error rates by comparing observed and expected rates of pairs of homozygote and heterozygote genotypes at low-frequency variants in IBD segments. We demonstrate the accuracy of SEQERR in simulated data, and we apply the method to estimate genotype error rates in sequence data from the UK10K and 1000 Genomes projects.
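
The LOD score mentioned in this abstract is a base-10 log ratio of the data probability under the IBD model to that under the non-IBD model. The Python sketch below shows only that arithmetic. It assumes the per-marker genotype probabilities under each model are already available and, as a simplification, treats markers as independent so that per-marker log ratios can be summed. The function name and inputs are hypothetical and do not reflect how IBDseq estimates those probabilities.

```python
import math
from typing import Sequence


def lod_score(probs_ibd: Sequence[float], probs_non_ibd: Sequence[float]) -> float:
    """Sum per-marker log10 likelihood ratios (IBD model vs. non-IBD model).

    probs_ibd[m] and probs_non_ibd[m] are the assumed probabilities of the
    observed genotype pair at marker m under each model. Markers are
    treated as independent, which is a simplifying assumption.
    """
    if len(probs_ibd) != len(probs_non_ibd):
        raise ValueError("one probability per marker is required under each model")
    if any(p <= 0.0 for p in probs_ibd) or any(q <= 0.0 for q in probs_non_ibd):
        raise ValueError("probabilities must be strictly positive")
    return sum(math.log10(p / q) for p, q in zip(probs_ibd, probs_non_ibd))


# Two markers, each ten times more probable under IBD than under non-IBD.
score = lod_score([0.1, 0.2], [0.01, 0.02])
```

A positive score favors IBD and a negative score favors non-IBD; markers whose genotypes are equally probable under both models contribute zero.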

Source
http://dx.doi.org/10.1016/j.ajhg.2013.09.014 (DOI Listing)
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3824133 (PMC)
November 2013