Publications by authors named "Carlos D Bustamante"

222 Publications

Bayesian model comparison for rare-variant association studies.

Am J Hum Genet 2021 Nov 18. Epub 2021 Nov 18.

Department of Biomedical Data Science, Stanford University, Stanford, CA 94305, USA.

Whole-genome sequencing studies applied to large populations or biobanks with extensive phenotyping raise new analytic challenges. The need to consider many variants at a locus or group of genes simultaneously and the potential to study many correlated phenotypes with shared genetic architecture provide opportunities for discovery not addressed by the traditional one variant, one phenotype association study. Here, we introduce a Bayesian model comparison approach called MRP (multiple rare variants and phenotypes) for rare-variant association studies that considers correlation, scale, and direction of genetic effects across a group of genetic variants, phenotypes, and studies, requiring only summary statistic data. We apply our method to exome sequencing data (n = 184,698) across 2,019 traits from the UK Biobank, aggregating signals in genes. MRP demonstrates an ability to recover signals such as associations between PCSK9 and LDL cholesterol levels. We additionally find MRP effective in conducting meta-analyses in exome data. Non-biomarker findings include associations between MC1R and red hair color and skin color, IL17RA and monocyte count, and IQGAP2 and mean platelet volume. Finally, we apply MRP in a multi-phenotype setting; after clustering the 35 biomarker phenotypes based on genetic correlation estimates, we find that joint analysis of these phenotypes results in substantial power gains for gene-trait associations, such as in TNFRSF13B in one of the clusters containing diabetes- and lipid-related traits. Overall, we show that the MRP model comparison approach improves upon useful features from widely used meta-analysis approaches for rare-variant association analyses and prioritizes protective modifiers of disease risk.

Source
http://dx.doi.org/10.1016/j.ajhg.2021.11.005 (DOI)
November 2021

Paths and timings of the peopling of Polynesia inferred from genomic networks.

Nature 2021 09 22;597(7877):522-526. Epub 2021 Sep 22.

National Laboratory of Genomics for Biodiversity (LANGEBIO)-Advanced Genomics Unit (UGA), CINVESTAV, Irapuato, Guanajuato, Mexico.

Polynesia was settled in a series of extraordinary voyages across an ocean spanning one third of the Earth, but the sequences of islands settled remain unknown and their timings disputed. Currently, several centuries separate the dates suggested by different archaeological surveys. Here, using genome-wide data from merely 430 modern individuals from 21 key Pacific island populations and novel ancestry-specific computational analyses, we unravel the detailed genetic history of this vast, dispersed island network. Our reconstruction of the branching Polynesian migration sequence reveals a serial founder expansion, characterized by directional loss of variants, that originated in Samoa and spread first through the Cook Islands (Rarotonga), then to the Society (Tōtaiete mā) Islands (11th century), the western Austral (Tuha'a Pae) Islands and Tuāmotu Archipelago (12th century), and finally to the widely separated, but genetically connected, megalithic statue-building cultures of the Marquesas (Te Henua 'Enana) Islands in the north, Raivavae in the south, and Easter Island (Rapa Nui), the easternmost of the Polynesian islands, settled in approximately AD 1200 via Mangareva.

Source
http://dx.doi.org/10.1038/s41586-021-03902-8 (DOI)
September 2021

Dynamic RNA Regulation in the Brain Underlies Physiological Plasticity in a Hibernating Mammal.

Front Physiol 2020 18;11:624677. Epub 2021 Jan 18.

RNA Bioscience Initiative, University of Colorado School of Medicine, Aurora, CO, United States.

Hibernation is a physiological and behavioral phenotype that minimizes energy expenditure. Hibernators cycle between profound depression and rapid hyperactivation of multiple physiological processes, challenging our concept of mammalian homeostasis. How the hibernator orchestrates and survives these extremes while maintaining cell to organismal viability is unknown. Here, we enhance the genome integrity and annotation of a model hibernator, the 13-lined ground squirrel. Our new assembly brings this genome to near chromosome-level contiguity and adds thousands of previously unannotated genes. These new genomic resources were used to identify 6,505 hibernation-related, differentially-expressed and processed transcripts using RNA-seq data from three brain regions in animals whose physiological status was precisely defined using body temperature telemetry. A software tool, squirrelBox, was developed to foster further data analyses and visualization. SquirrelBox includes a comprehensive toolset for rapid visualization of gene level and cluster group dynamics, sequence scanning of k-mers and domains, and interactive exploration of gene lists. Using these new tools and data, we deconvolute seasonal from temperature-dependent effects on the brain transcriptome during hibernation for the first time, highlighting the importance of carefully timed samples for studies of differential gene expression in hibernation. The identified genes include a regulatory network of RNA binding proteins that are dynamic in hibernation along with the composition of the RNA pool. In addition to passive effects of temperature, we provide evidence for regulated transcription and RNA turnover during hibernation. Significant alternative splicing, largely temperature dependent, also occurs during hibernation. These findings form a crucial first step and provide a roadmap for future work toward defining novel mechanisms of tissue protection and metabolic depression that may one day be applied toward improving human health.

Source
http://dx.doi.org/10.3389/fphys.2020.624677 (DOI)
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC7848201 (PMC)
January 2021

Discovering prescription patterns in pediatric acute-onset neuropsychiatric syndrome patients.

J Biomed Inform 2021 01 28;113:103664. Epub 2020 Dec 28.

Department of Biomedical Data Science, Stanford University, CA, USA; Department of Epidemiology, Bloomberg School of Public Health, Johns Hopkins University, Baltimore, MD, USA.

Objective: Pediatric acute-onset neuropsychiatric syndrome (PANS) is a complex neuropsychiatric syndrome characterized by an abrupt onset of obsessive-compulsive symptoms and/or severe eating restrictions, along with at least two concomitant debilitating cognitive, behavioral, or neurological symptoms. A wide range of pharmacological interventions along with behavioral and environmental modifications, and psychotherapies have been adopted to treat symptoms and underlying etiologies. Our goal was to develop a data-driven approach to identify treatment patterns in this cohort.

Materials And Methods: In this cohort study, we extracted medical prescription histories from electronic health records. We developed a modified dynamic programming approach to perform global alignment of those medication histories. Our approach is unique since it considers time gaps in prescription patterns as part of the similarity strategy.

Results: This study included 43 consecutive new-onset pre-pubertal patients who had at least 3 clinic visits. Our algorithm identified six clusters with distinct medication usage histories, which may represent clinicians' practice of treating PANS of different severities and etiologies: two most severe groups requiring high-dose intravenous steroids; two arthritic or inflammatory groups requiring prolonged nonsteroidal anti-inflammatory drug (NSAID) treatment; and two mild relapsing/remitting groups treated with a short course of NSAIDs. The psychometric scores as outcomes in each cluster generally improved within the first two years.

Discussion And Conclusion: Our algorithm shows potential to improve our knowledge of treatment patterns in the PANS cohort, while helping clinicians understand how patients respond to a combination of drugs.
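
The alignment step named in Materials and Methods can be illustrated with a minimal sketch. The abstract does not give the authors' scoring scheme, so everything below is an assumption made for illustration: the `Event` representation (drug name plus days since the previous prescription), the function name `align_score`, and the `match`, `mismatch`, `gap`, and `time_weight` values. It is a plain Needleman-Wunsch global alignment in which matching drugs score lower when their time gaps differ.

```python
from typing import List, Tuple

# Hypothetical representation: (drug name, days since the previous prescription).
Event = Tuple[str, int]


def align_score(a: List[Event], b: List[Event],
                match: float = 2.0, mismatch: float = -1.0,
                gap: float = -1.0, time_weight: float = 0.01) -> float:
    """Global alignment score of two medication histories.

    Illustrative scoring only (not the published method): identical drugs
    earn `match`, reduced in proportion to the difference in time gaps.
    """
    n, m = len(a), len(b)
    # dp[i][j] holds the best score for aligning a[:i] with b[:j].
    dp = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        dp[i][0] = dp[i - 1][0] + gap
    for j in range(1, m + 1):
        dp[0][j] = dp[0][j - 1] + gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            (drug_a, days_a), (drug_b, days_b) = a[i - 1], b[j - 1]
            if drug_a == drug_b:
                # Same drug: reward, minus a penalty for differing waiting times.
                sub = match - time_weight * abs(days_a - days_b)
            else:
                sub = mismatch
            dp[i][j] = max(dp[i - 1][j - 1] + sub,  # align the two events
                           dp[i - 1][j] + gap,      # skip an event in a
                           dp[i][j - 1] + gap)      # skip an event in b
    return dp[n][m]
```

Pairwise scores of this kind could then feed a standard clustering step; how the study derived its six clusters from the alignments is not stated in the abstract.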

Source
http://dx.doi.org/10.1016/j.jbi.2020.103664 (DOI)
January 2021

High-throughput SARS-CoV-2 and host genome sequencing from single nasopharyngeal swabs.

medRxiv 2020 Sep 1. Epub 2020 Sep 1.

During COVID-19 and other viral pandemics, rapid generation of host and pathogen genomic data is critical to tracking infection and informing therapies. There is an urgent need for efficient approaches to this data generation at scale. We have developed a scalable, high-throughput approach to generate high-fidelity, low-pass whole-genome and HLA sequencing, viral genomes, and a representation of the human transcriptome from single nasopharyngeal swabs of COVID-19 patients.

Source
http://dx.doi.org/10.1101/2020.07.27.20163147 (DOI)
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC7402057 (PMC)
September 2020

Native American gene flow into Polynesia predating Easter Island settlement.

Nature 2020 07 8;583(7817):572-577. Epub 2020 Jul 8.

Center for Computational, Evolutionary and Human Genomics (CEHG), Stanford University, Stanford, CA, USA.

The possibility of voyaging contact between prehistoric Polynesian and Native American populations has long intrigued researchers. Proponents have pointed to the existence of New World crops, such as the sweet potato and bottle gourd, in the Polynesian archaeological record, but nowhere else outside the pre-Columbian Americas, while critics have argued that these botanical dispersals need not have been human mediated. The Norwegian explorer Thor Heyerdahl controversially suggested that prehistoric South American populations had an important role in the settlement of east Polynesia and particularly of Easter Island (Rapa Nui). Several limited molecular genetic studies have reached opposing conclusions, and the possibility continues to be as hotly contested today as it was when first suggested. Here we analyse genome-wide variation in individuals from islands across Polynesia for signs of Native American admixture, analysing 807 individuals from 17 island populations and 15 Pacific coast Native American groups. We find conclusive evidence for prehistoric contact of Polynesian individuals with Native American individuals (around AD 1200) contemporaneous with the settlement of remote Oceania. Our analyses suggest strongly that a single contact event occurred in eastern Polynesia, before the settlement of Rapa Nui, between Polynesian individuals and a Native American group most closely related to the indigenous inhabitants of present-day Colombia.

Source
http://dx.doi.org/10.1038/s41586-020-2487-2 (DOI)
July 2020

FasTag: Automatic text classification of unstructured medical narratives.

PLoS One 2020 22;15(6):e0234647. Epub 2020 Jun 22.

Department of Biomedical Data Science, School of Medicine, Stanford University, Stanford, CA, United States of America.

Unstructured clinical narratives are continuously being recorded as part of delivery of care in electronic health records, and dedicated tagging staff spend considerable effort manually assigning clinical codes for billing purposes. Despite these efforts, however, label availability and accuracy are both suboptimal. In this retrospective study, we aimed to automate the assignment of top-level International Classification of Diseases version 9 (ICD-9) codes to clinical records from human and veterinary data stores using minimal manual labor and feature curation. Automating top-level annotations could in turn enable rapid cohort identification, especially in a veterinary setting. To this end, we trained long short-term memory (LSTM) recurrent neural networks (RNNs) on 52,722 human and 89,591 veterinary records. We investigated the accuracy of both separate-domain and combined-domain models and probed model portability. We established relevant baseline classification performances by training Decision Trees (DT) and Random Forests (RF). We also investigated whether transforming the data using MetaMap Lite, a clinical natural language processing tool, affected classification performance. We showed that the LSTM-RNNs accurately classify veterinary and human text narratives into top-level categories with an average weighted macro F1 score of 0.74 and 0.68 respectively. In the "neoplasia" category, the model trained on veterinary data had a high validation accuracy in veterinary data and moderate accuracy in human data, with F1 scores of 0.91 and 0.70 respectively. Our LSTM method scored slightly higher than that of the DT and RF models. The use of LSTM-RNN models represents a scalable structure that could prove useful in cohort identification for comparative oncology studies. Digitization of human and veterinary health information will continue to be a reality, particularly in the form of unstructured narratives. Our approach is a step forward for these two domains to learn from and inform one another.

Source
http://journals.plos.org/plosone/article?id=10.1371/journal.pone.0234647 (PLOS)
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC7307763 (PMC)
August 2020

Clinical Genetics Lacks Standard Definitions and Protocols for the Collection and Use of Diversity Measures.

Am J Hum Genet 2020 07 6;107(1):72-82. Epub 2020 Jun 6.

Stanford Center for Biomedical Ethics, Stanford University School of Medicine, Stanford, CA 94305, USA; Department of Genetics, Stanford University, Stanford, CA 94305, USA.

Genetics researchers and clinical professionals rely on diversity measures such as race, ethnicity, and ancestry (REA) to stratify study participants and patients for a variety of applications in research and precision medicine. However, there are no comprehensive, widely accepted standards or guidelines for collecting and using such data in clinical genetics practice. Two NIH-funded research consortia, the Clinical Genome Resource (ClinGen) and Clinical Sequencing Evidence-generating Research (CSER), have partnered to address this issue and report how REA are currently collected, conceptualized, and used. Surveying clinical genetics professionals and researchers (n = 448), we found heterogeneity in the way REA are perceived, defined, and measured, with variation in the perceived importance of REA in both clinical and research settings. The majority of respondents (>55%) felt that REA are at least somewhat important for clinical variant interpretation, ordering genetic tests, and communicating results to patients. However, there was no consensus on the relevance of REA, including how each of these measures should be used in different scenarios and what information they can convey in the context of human genetics. A lack of common definitions and applications of REA across the precision medicine pipeline may contribute to inconsistencies in data collection, missing or inaccurate classifications, and misleading or inconclusive results. Thus, our findings support the need for standardization and harmonization of REA data collection and use in clinical genetics and precision health research.

Source
http://dx.doi.org/10.1016/j.ajhg.2020.05.005 (DOI)
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC7332657 (PMC)
July 2020

Development of a small panel of SNPs to infer ancestry in Chileans that distinguishes Aymara and Mapuche components.

Biol Res 2020 Apr 16;53(1):15. Epub 2020 Apr 16.

Mathomics, Centro de Modelamiento Matemático y Centro para la Regulación del Genoma, Facultad de Ciencias Físicas y Matemáticas, Universidad de Chile, Santiago, Chile.

Background: Current South American populations trace their origins mainly to three continental ancestries, i.e. European, Amerindian and African. Individual variation in relative proportions of each of these ancestries may be confounded with socio-economic factors due to population stratification. Therefore, ancestry is a potential confounder variable that should be considered in epidemiologic studies and in public health plans. However, there are few studies that have assessed the ancestry of the current admixed Chilean population. This is partly due to the high cost of genome-scale technologies commonly used to estimate ancestry. In this study we have designed a small panel of SNPs to accurately assess ancestry in the largest sampling to date of the Chilean mestizo population (n = 3349) from eight cities. Our panel is also able to distinguish between the two main Amerindian components of Chileans: Aymara from the north and Mapuche from the south.

Results: A panel of 150 ancestry-informative markers (AIMs) of SNP type was selected to maximize ancestry informativeness and genome coverage. Of these, 147 were successfully genotyped by KASPar assays in 2843 samples, with an average missing rate of 0.012, and a 0.95 concordance with microarray data. The ancestries estimated with the panel of AIMs had relatively high correlations (0.88 for European, 0.91 for Amerindian, 0.70 for Aymara, and 0.68 for Mapuche components) with those obtained with the AXIOM LAT1 array. The country's average ancestry was 0.53 ± 0.14 European, 0.04 ± 0.04 African, and 0.42 ± 0.14 Amerindian, disaggregated into 0.18 ± 0.15 Aymara and 0.25 ± 0.13 Mapuche. However, Mapuche ancestry was highest in the south (40.03%) and Aymara in the north (35.61%) as expected from the historical location of these ethnic groups. We make our results available through an online app and demonstrate how it can be used to adjust for ancestry when testing association between incidence of a disease and nongenetic risk factors.

Conclusions: We have conducted the most extensive sampling, across many different cities, of the current Chilean population. Ancestry varied significantly by latitude and human development. The panel of AIMs is available to the community for estimating ancestry at low cost in Chileans and other populations with similar ancestry.

Source
http://dx.doi.org/10.1186/s40659-020-00284-5 (DOI)
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC7161194 (PMC)
April 2020

Genetic variation drives seasonal onset of hibernation in the 13-lined ground squirrel.

Commun Biol 2019 20;2:478. Epub 2019 Dec 20.

Department of Genetics and Department of Biomedical Data Science, Stanford University School of Medicine, Stanford, CA USA.

Hibernation in sciurid rodents is a dynamic phenotype timed by a circannual clock. When housed in an animal facility, 13-lined ground squirrels exhibit variation in seasonal onset of hibernation, which is not explained by environmental or biological factors. We hypothesized that genetic factors instead drive variation in timing. After increasing genome contiguity, here, we employ a genotype-by-sequencing approach to characterize genetic variation in 153 ground squirrels. Combined with datalogger records (n = 72), we estimate high heritability (61-100%) for hibernation onset. Applying a genome-wide scan with 46,996 variants, we identify 2 loci significantly (P < 7.14 × 10), and 12 loci suggestively (P < 2.13 × 10), associated with onset. At the most significant locus, whole-genome resequencing reveals a putative causal variant in the promoter of . Expression quantitative trait loci (eQTL) analyses further reveal gene associations for 8/14 loci. Our results highlight the power of applying genetic mapping to hibernation and present new insight into genetics driving its onset.

Source
http://dx.doi.org/10.1038/s42003-019-0719-5 (DOI)
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6925185 (PMC)
July 2020

Population History and Gene Divergence in Native Mexicans Inferred from 76 Human Exomes.

Mol Biol Evol 2020 04;37(4):994-1006

National Laboratory of Genomics for Biodiversity (LANGEBIO), UGA, CINVESTAV, Irapuato, Guanajuato 36821, Mexico.

Native American genetic variation remains underrepresented in most catalogs of human genome sequencing data. Previous genotyping efforts have revealed that Mexico's Indigenous population is highly differentiated and substructured, thus potentially harboring higher proportions of private genetic variants of functional and biomedical relevance. Here we have targeted the coding fraction of the genome and characterized its full site frequency spectrum by sequencing 76 exomes from five Indigenous populations across Mexico. Using diffusion approximations, we modeled the demographic history of Indigenous populations from Mexico with northern and southern ethnic groups splitting 7.2 KYA and subsequently diverging locally 6.5 and 5.7 KYA, respectively. Scans for positive selection revealed the BCL2L13 and KBTBD8 genes as potential candidates for adaptive evolution in Rarámuris and Triquis, respectively. BCL2L13 is highly expressed in skeletal muscle and could be related to physical endurance, a well-known phenotype of the northern Mexico Rarámuri. The KBTBD8 gene has been associated with idiopathic short stature and we found it to be highly differentiated in Triqui, a southern Indigenous group from Oaxaca whose height is extremely low compared to other Native populations.

Source
http://dx.doi.org/10.1093/molbev/msz282 (DOI)
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC7086176 (PMC)
April 2020

LitGen: Genetic Literature Recommendation Guided by Human Explanations.

Pac Symp Biocomput 2020 ;25:67-78

Department of Biomedical Data Science, Stanford University School of Medicine, USA.

As genetic sequencing costs decrease, the lack of clinical interpretation of variants has become the bottleneck in using genetics data. A major rate-limiting step in clinical interpretation is the manual curation of evidence in the genetic literature by highly trained biocurators. What makes curation particularly time-consuming is that the curator needs to identify papers that study variant pathogenicity using different types of approaches and evidence, e.g., biochemical assays or case-control analysis. In collaboration with the Clinical Genome Resource (ClinGen), the flagship NIH program for clinical curation, we propose the first machine learning system, LitGen, that can retrieve papers for a particular variant and filter them by specific evidence types used by curators to assess for pathogenicity. LitGen uses semi-supervised deep learning to predict the type of evidence provided by each paper. It is trained on papers annotated by ClinGen curators and systematically evaluated on new test data collected by ClinGen. LitGen further leverages rich human explanations and unlabeled data to gain 7.9%-12.6% relative performance improvement over models learned only on the annotated papers. It is a useful framework to improve clinical variant curation.

Source
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC7478937 (PMC)
March 2021

Ancient DNA Reconstructs the Genetic Legacies of Precontact Puerto Rico Communities.

Mol Biol Evol 2020 03;37(3):611-626

School of Human Evolution and Social Change, Arizona State University, Tempe, AZ.

Indigenous peoples have occupied the island of Puerto Rico since at least 3000 BC. Due to the demographic shifts that occurred after European contact, the origin(s) of these ancient populations, and their genetic relationship to present-day islanders, are unclear. We use ancient DNA to characterize the population history and genetic legacies of precontact Indigenous communities from Puerto Rico. Bone, tooth, and dental calculus samples were collected from 124 individuals from three precontact archaeological sites: Tibes, Punta Candelero, and Paso del Indio. Despite poor DNA preservation, we used target enrichment and high-throughput sequencing to obtain complete mitochondrial genomes (mtDNA) from 45 individuals and autosomal genotypes from two individuals. We found a high proportion of Native American mtDNA haplogroups A2 and C1 in the precontact Puerto Rico sample (40% and 44%, respectively). This distribution, as well as the haplotypes represented, supports a primarily Amazonian South American origin for these populations and mirrors the Native American mtDNA diversity patterns found in present-day islanders. Three mtDNA haplotypes from precontact Puerto Rico persist among Puerto Ricans and other Caribbean islanders, indicating that present-day populations are reservoirs of precontact mtDNA diversity. Lastly, we find similarity in autosomal ancestry patterns between precontact individuals from Puerto Rico and the Bahamas, suggesting a shared component of Indigenous Caribbean ancestry with close affinity to South American populations. Our findings contribute to a more complete reconstruction of precontact Caribbean population history and explore the role of Indigenous peoples in shaping the biocultural diversity of present-day Puerto Ricans and other Caribbean islanders.

Source
http://dx.doi.org/10.1093/molbev/msz267 (DOI)
March 2020

The inference of sex-biased human demography from whole-genome data.

PLoS Genet 2019 09 20;15(9):e1008293. Epub 2019 Sep 20.

Center for Computational Molecular Biology, Brown University, Providence, RI, USA.

Sex-biased demographic events ("sex-bias") involve unequal numbers of females and males. These events are typically inferred from the relative amount of X-chromosomal to autosomal genetic variation and have led to conflicting conclusions about human demographic history. Though population size changes alter the relative amount of X-chromosomal to autosomal genetic diversity even in the absence of sex-bias, this has generally not been accounted for in sex-bias estimators to date. Here, we present a novel method to identify sex-bias from genetic sequence data that models population size changes and estimates the female fraction of the effective population size during each time epoch. Compared to recent sex-bias inference methods, our approach can detect sex-bias that changes on a single population branch without requiring data from an outgroup or knowledge of divergence events. When applied to simulated data, conventional sex-bias estimators are biased by population size changes, especially recent growth or bottlenecks, while our estimator is unbiased. We next apply our method to high-coverage exome data from the 1000 Genomes Project and estimate a male bias in Yorubans (47% female) and Europeans (44%), possibly due to stronger background selection on the X chromosome than on the autosomes. Finally, we apply our method to the 1000 Genomes Project Phase 3 high-coverage Complete Genomics whole-genome data and estimate a female bias in Yorubans (63% female), Europeans (84%), Punjabis (82%), as well as Peruvians (56%), and a male bias in the Southern Han Chinese (45%). Our method additionally identifies a male-biased migration out of Africa based on data from Europeans (20% female). Our results demonstrate that modeling population size change is necessary to estimate sex-bias parameters accurately. Our approach gives insight into signatures of sex-bias in sexual species, and the demographic models it produces can serve as more accurate null models for tests of selection.
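
For context, the conventional estimators that the abstract contrasts with its own method rest on a classical relation between the female fraction and the X-to-autosome ratio of effective population size. The sketch below illustrates that textbook relation only, under the assumption of constant population size (the very assumption the abstract says biases such estimators); it is not the authors' method, and the function names are hypothetical. With female fraction p, the expected ratio is 9 / (8(2 - p)).

```python
def expected_x_to_a_ratio(female_fraction: float) -> float:
    """Expected X/autosome effective-size ratio at constant population size.

    Follows from N_A = 4*Nf*Nm/(Nf + Nm) and N_X = 9*Nf*Nm/(2*Nf + 4*Nm),
    with female_fraction = Nf / (Nf + Nm).
    """
    if not 0.0 <= female_fraction <= 1.0:
        raise ValueError("female_fraction must lie in [0, 1]")
    return 9.0 / (8.0 * (2.0 - female_fraction))


def female_fraction_from_ratio(x_to_a_ratio: float) -> float:
    """Invert the relation above: naive female-fraction estimate from X/A."""
    return 2.0 - 9.0 / (8.0 * x_to_a_ratio)
```

An equal sex ratio gives the familiar value of 0.75, and the ratio ranges from 0.5625 (no females) to 1.125 (all females). Because population size changes shift the observed ratio on their own, inverting it directly can misattribute demographic history to sex bias, which is the problem the abstract describes.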

Source
http://dx.doi.org/10.1371/journal.pgen.1008293 (DOI)
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6774570 (PMC)
September 2019

Genomic Evidence for Local Adaptation of Hunter-Gatherers to the African Rainforest.

Curr Biol 2019 09 8;29(17):2926-2935.e4. Epub 2019 Aug 8.

Human Evolutionary Genetics Unit, Institut Pasteur, UMR2000, CNRS, Paris 75015, France.

African rainforests support exceptionally high biodiversity and host the world's largest number of active hunter-gatherers [1-3]. The genetic history of African rainforest hunter-gatherers and neighboring farmers is characterized by an ancient divergence more than 100,000 years ago, together with recent population collapses and expansions, respectively [4-12]. While the demographic past of rainforest hunter-gatherers has been deeply characterized, important aspects of their history of genetic adaptation remain unclear. Here, we investigated how these groups have adapted (through classic selective sweeps, polygenic adaptation, and selection since admixture) to the challenging rainforest environments. To do so, we analyzed a combined dataset of 566 high-coverage exomes, including 266 newly generated exomes, from 14 populations of rainforest hunter-gatherers and farmers, together with 40 newly generated, low-coverage genomes. We find evidence for a strong, shared selective sweep among all hunter-gatherer groups in the regulatory region of TRPS1, which is primarily involved in morphological traits. We detect strong signals of polygenic adaptation for height and life history traits such as reproductive age; however, the latter appear to result from pervasive pleiotropy of height-associated genes. Furthermore, polygenic adaptation signals for functions related to responses of mast cells to allergens and microbes, the IL-2 signaling pathway, and host interactions with viruses support a history of pathogen-driven selection in the rainforest. Finally, we find that genes involved in heart and bone development and immune responses are enriched in both selection signals and local hunter-gatherer ancestry in admixed populations, suggesting that selection has maintained adaptive variation in the face of recent gene flow from farmers.

Source
http://dx.doi.org/10.1016/j.cub.2019.07.013 (DOI)
September 2019

DeepTag: inferring diagnoses from veterinary clinical notes.

NPJ Digit Med 2018 24;1:60. Epub 2018 Oct 24.

Department of Biomedical Data Science, Stanford University, Stanford, CA 94305 USA.

Large-scale veterinary clinical records can become a powerful resource for patient care and research. However, clinicians lack the time and resources to annotate patient records with standard medical diagnostic codes and most veterinary visits are captured in free-text notes. The lack of standard coding makes it challenging to use the clinical data to improve patient care. It is also a major impediment to cross-species translational research, which relies on the ability to accurately identify patient cohorts with specific diagnostic criteria in humans and animals. In order to reduce the coding burden for veterinary clinical practice and aid translational research, we have developed a deep learning algorithm, DeepTag, which automatically infers diagnostic codes from veterinary free-text notes. DeepTag is trained on a newly curated dataset of 112,558 veterinary notes manually annotated by experts. DeepTag extends multitask LSTM with an improved hierarchical objective that captures the semantic structures between diseases. To foster human-machine collaboration, DeepTag also learns to abstain in examples when it is uncertain and defers them to human experts, resulting in improved performance. DeepTag accurately infers disease codes from free-text even in challenging cross-hospital settings where the text comes from different clinical settings than the ones used for training. It enables automated disease annotation across a broad range of clinical diagnoses with minimal preprocessing. The technical framework in this work can be applied in other medical domains that currently lack medical coding resources.
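
The idea of abstaining on uncertain examples and deferring them to a human can be shown with a deliberately simplified sketch. The abstract says DeepTag learns when to abstain; the version below swaps in a fixed confidence threshold instead, and the function name `predict_or_abstain`, the `threshold` value, and the example labels are all hypothetical.

```python
from typing import Dict, Optional


def predict_or_abstain(probs: Dict[str, float],
                       threshold: float = 0.8) -> Optional[str]:
    """Return the most probable label, or None to defer to a human expert.

    Simplification for illustration: a fixed threshold on the top
    probability stands in for a learned abstention rule.
    """
    label, confidence = max(probs.items(), key=lambda item: item[1])
    return label if confidence >= threshold else None
```

A note scored as `{"neoplasia": 0.93, "infection": 0.07}` would be labeled automatically, whereas one scored `{"neoplasia": 0.55, "infection": 0.45}` would return `None` and be routed to an annotator. Raising the threshold trades coverage for accuracy on the notes that are labeled.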

Source
http://dx.doi.org/10.1038/s41746-018-0067-8 (DOI)
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6550285 (PMC)
October 2018

Genetic analyses of diverse populations improves discovery for complex traits.

Nature 2019 06 19;570(7762):514-518. Epub 2019 Jun 19.

Department of Preventive Medicine, Keck School of Medicine, University of Southern California, Los Angeles, CA, USA.

Genome-wide association studies (GWAS) have laid the foundation for investigations into the biology of complex traits, drug development and clinical guidelines. However, the majority of discovery efforts are based on data from populations of European ancestry. In light of the differential genetic architecture that is known to exist between populations, bias in representation can exacerbate existing disease and healthcare disparities. Critical variants may be missed if they have a low frequency or are completely absent in European populations, especially as the field shifts its attention towards rare variants, which are more likely to be population-specific. Additionally, effect sizes and their derived risk prediction scores derived in one population may not accurately extrapolate to other populations. Here we demonstrate the value of diverse, multi-ethnic participants in large-scale genomic studies. The Population Architecture using Genomics and Epidemiology (PAGE) study conducted a GWAS of 26 clinical and behavioural phenotypes in 49,839 non-European individuals. Using strategies tailored for analysis of multi-ethnic and admixed populations, we describe a framework for analysing diverse populations, identify 27 novel loci and 38 secondary signals at known loci, as well as replicate 1,444 GWAS catalogue associations across these traits. Our data show evidence of effect-size heterogeneity across ancestries for published GWAS associations, substantial benefits for fine-mapping using diverse cohorts and insights into clinical implications. In the United States-where minority populations have a disproportionately higher burden of chronic conditions-the lack of representation of diverse populations in genetic research will result in inequitable access to precision medicine for those with the highest burden of disease. We strongly advocate for continued, large genome-wide efforts in diverse populations to maximize genetic discovery and reduce health disparities.

Source
http://dx.doi.org/10.1038/s41586-019-1310-4 (DOI)
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6785182 (PMC)
June 2019

A genetic counseling needs assessment of Mexico.

Mol Genet Genomic Med 2019 05 1;7(5):e668. Epub 2019 Apr 1.

Department of Biomedical Data Science, School of Medicine, Stanford University, Stanford, California.

Background: While genetic counseling has expanded globally, Mexico has not adopted it as a separate profession. Given the rapid expansion of genetic and genomic services, understanding the current genetic counseling landscape in Mexico is crucial to improving healthcare outcomes.

Methods: Our needs assessment strategy has two components. First, we gathered quantitative data about genetics education and medical geneticists' geographic distribution through an exhaustive compilation of available information across several medical schools and public databases. Second, we conducted semi-structured interviews of 19 key-informants from 10 Mexican states remotely with digital recording and transcription.

Results: Across 32 states, ~54% of enrolled medical students receive no medical genetics training, and only Mexico City averages at least one medical geneticist per 100,000 people. Barriers to genetic counseling services include: geographic distribution of medical geneticists, lack of access to diagnostic tools, patient health literacy and cultural beliefs, and education in medical genetics/genetic counseling. Participants reported generally positive attitudes towards a genetic counseling profession; concerns regarding a current shortage of available jobs for medical geneticists persisted.

Conclusion: To create a foundation that can support a genetic counseling profession in Mexico, the clinical significance of medical genetics must be promoted nationwide. Potential approaches include: requiring medical genetics coursework, developing community genetics services, and increasing jobs for medical geneticists.

Source
http://dx.doi.org/10.1002/mgg3.668 (DOI)
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6503023 (PMC)
May 2019

Mitogenomes illuminate the origin and migration patterns of the indigenous people of the Canary Islands.

PLoS One 2019 20;14(3):e0209125. Epub 2019 Mar 20.

Department of Genetics, Stanford University, Stanford, California, United States of America.

The Canary Islands' indigenous people have been the subject of substantial archaeological, anthropological, linguistic and genetic research pointing to a most probable North African Berber source. However, neither agreement about the exact point of origin nor a model for the indigenous colonization of the islands has been established. To shed light on these questions, we analyzed 48 ancient mitogenomes from 25 archaeological sites from the seven main islands. Most lineages observed in the ancient samples have a Mediterranean distribution, and belong to lineages associated with the Neolithic expansion in the Near East and Europe (T2c, J2a, X3a…). This phylogeographic analysis of Canarian ancient mitogenomes, the first of its kind, shows that some lineages are restricted to Central North Africa (H1cf, J2a2d and T2c1d3), while others have a wider distribution, including both West and Central North Africa, and, in some cases, Europe and the Near East (U6a1a1, U6a7a1, U6b, X3a, U6c1). In addition, we identify four new Canarian-specific lineages (H1e1a9, H4a1e, J2a2d1a and L3b1a12) whose coalescence dates correlate with the estimated time for the colonization of the islands (1st millennia CE). Additionally, we observe an asymmetrical distribution of mtDNA haplogroups in the ancient population, with certain haplogroups appearing more frequently in the islands closer to the continent. This reinforces results based on modern mtDNA and Y-chromosome data, and archaeological evidence suggesting the existence of two distinct migrations. Comparisons between insular populations show that some populations had high genetic diversity, while others were probably affected by genetic drift and/or bottlenecks. In spite of observing interinsular differences in the survival of indigenous lineages, modern populations, with the sole exception of La Gomera, are homogenous across the islands, supporting the theory of extensive human mobility after the European conquest.

Source
http://journals.plos.org/plosone/article?id=10.1371/journal.pone.0209125 (PLOS)
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6426200 (PMC)
November 2019

Xrare: a machine learning method jointly modeling phenotypes and genetic evidence for rare disease diagnosis.

Genet Med 2019 09 24;21(9):2126-2134. Epub 2019 Jan 24.

Department of Biomedical Data Science, Stanford University, Stanford, CA, USA.

Purpose: Despite the successful progress next-generation sequencing technologies have achieved in diagnosing the genetic cause of rare Mendelian diseases, the current diagnostic rate is still far from satisfactory because of heterogeneity, imprecision, and noise in disease phenotype descriptions and insufficient utilization of expert knowledge in clinical genetics. To overcome these difficulties, we present a novel method called Xrare for the prioritization of causative gene variants in rare disease diagnosis.

Methods: We propose a new phenotype similarity scoring method called Emission-Reception Information Content (ERIC), which is highly tolerant of noise and imprecision in clinical phenotypes. We utilize medical genetic domain knowledge by designing genetic features implementing American College of Medical Genetics and Genomics (ACMG) guidelines.

Results: ERIC score ranked consistently higher for disease genes than other phenotypic similarity scores in the presence of imprecise and noisy phenotypes. Extensive simulations and real clinical data demonstrated that Xrare outperforms existing alternative methods by 10-40% at various genetic diagnosis scenarios.

Conclusion: The Xrare model is learned from a large database of clinical variants, and derives its strength from the tight integration of medical genetics features and phenotypic features similarity scores. Xrare provides the clinical community with a robust and powerful tool for variant prioritization.

Source
http://dx.doi.org/10.1038/s41436-019-0439-8 (DOI)
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6752318 (PMC)
September 2019

Structural Variation Detection by Proximity Ligation from Formalin-Fixed, Paraffin-Embedded Tumor Tissue.

J Mol Diagn 2019 05 31;21(3):375-383. Epub 2018 Dec 31.

Department of Pathology, Stanford University School of Medicine, Stanford, California; Department of Biomedical Data Science, Stanford University School of Medicine, Stanford, California. Electronic address:

The clinical management and therapy of many solid tumor malignancies depends on detection of medically actionable or diagnostically relevant genetic variation. However, a principal challenge for genetic assays from tumors is the fragmented and chemically damaged state of DNA in formalin-fixed, paraffin-embedded (FFPE) samples. From highly fragmented DNA and RNA there is no current technology for generating long-range DNA sequence data as is required to detect genomic structural variation or long-range genotype phasing. We have developed a high-throughput chromosome conformation capture approach for FFPE samples that we call Fix-C, which is similar in concept to Hi-C. Fix-C enables structural variation detection from archival FFPE samples. This method was applied to 15 clinical adenocarcinoma- and sarcoma-positive control specimens spanning a broad range of tumor purities. In this panel, Fix-C analysis achieves a 90% concordance rate with fluorescence in situ hybridization assays, the current clinical gold standard. In addition, novel structural variation undetected by other methods could be identified, and long-range chromatin configuration information recovered from these FFPE samples harboring highly degraded DNA. This powerful approach will enable detailed resolution of global genome rearrangement events during cancer progression from FFPE material and will inform the development of targeted molecular diagnostic assays for patient care.

Source
http://dx.doi.org/10.1016/j.jmoldx.2018.11.003 (DOI)
May 2019

Polygenic risk scores: a biased prediction?

Genome Med 2018 12 27;10(1):100. Epub 2018 Dec 27.

Department of Biomedical Data Science, Stanford University School of Medicine, Campus Drive, Stanford, CA, 94305, USA.

A new study highlights the biases and inaccuracies of polygenic risk scores (PRS) when predicting disease risk in individuals from populations other than those used in their derivation. The design bias of workhorse tools used for research, particularly genotyping arrays, contributes to these distortions. To avoid further inequities in health outcomes, the inclusion of diverse populations in research, unbiased genotyping, and methods of bias reduction in PRS are critical.

Source
http://dx.doi.org/10.1186/s13073-018-0610-x (DOI)
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6309089 (PMC)
December 2018

Rapid evolution of a skin-lightening allele in southern African KhoeSan.

Proc Natl Acad Sci U S A 2018 12 10;115(52):13324-13329. Epub 2018 Dec 10.

Department of Ecology and Evolution, State University of New York at Stony Brook, Stony Brook, NY 11794;

Skin pigmentation is under strong directional selection in northern European and Asian populations. The indigenous KhoeSan populations of far southern Africa have lighter skin than other sub-Saharan African populations, potentially reflecting local adaptation to a region of Africa with reduced UV radiation. Here, we demonstrate that a canonical Eurasian skin pigmentation gene, , was introduced to southern Africa via recent migration and experienced strong adaptive evolution in the KhoeSan. To reconstruct the evolution of skin pigmentation, we collected phenotypes from over 400 ≠Khomani San and Nama individuals and high-throughput sequenced candidate pigmentation genes. The derived causal allele in , p.Ala111Thr, significantly lightens basal skin pigmentation in the KhoeSan and explains 8 to 15% of phenotypic variance in these populations. The frequency of this allele (33 to 53%) is far greater than expected from colonial period European gene flow; however, the most common derived haplotype is identical among European, eastern African, and KhoeSan individuals. Using four-population demographic simulations with selection, we show that the allele was introduced into the KhoeSan only 2,000 y ago via a back-to-Africa migration and then experienced a selective sweep (s = 0.04 to 0.05 in ≠Khomani and Nama). The locus is both a rare example of intense, ongoing adaptation in very recent human history, as well as an adaptive gene flow at a pigmentation locus in humans.

Source
http://dx.doi.org/10.1073/pnas.1801948115 (DOI)
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6310813 (PMC)
December 2018

Standardized Biogeographic Grouping System for Annotating Populations in Pharmacogenetic Research.

Clin Pharmacol Ther 2019 05 21;105(5):1256-1262. Epub 2019 Jan 21.

Department of Biomedical Data Science, Stanford University, Stanford, California, USA.

The varying frequencies of pharmacogenetic alleles among populations have important implications for the impact of these alleles in different populations. Current population grouping methods to communicate these patterns are insufficient as they are inconsistent and fail to reflect the global distribution of genetic variability. To facilitate and standardize the reporting of variability in pharmacogenetic allele frequencies, we present seven geographically defined groups: American, Central/South Asian, East Asian, European, Near Eastern, Oceanian, and Sub-Saharan African, and two admixed groups: African American/Afro-Caribbean and Latino. These nine groups are defined by global autosomal genetic structure and based on data from large-scale sequencing initiatives. We recognize that broadly grouping global populations is an oversimplification of human diversity and does not capture complex social and cultural identity. However, these groups meet a key need in pharmacogenetics research by enabling consistent communication of the scale of variability in global allele frequencies and are now used by Pharmacogenomics Knowledgebase (PharmGKB).

Source
http://dx.doi.org/10.1002/cpt.1322 (DOI)
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6465129 (PMC)
May 2019

Gut microbiome transition across a lifestyle gradient in Himalaya.

PLoS Biol 2018 11 15;16(11):e2005396. Epub 2018 Nov 15.

Department of Microbiology and Immunology, Stanford University, Stanford, California, United States of America.

The composition of the gut microbiome in industrialized populations differs from those living traditional lifestyles. However, it has been difficult to separate the contributions of human genetic and geographic factors from lifestyle. Whether shifts away from the foraging lifestyle that characterize much of humanity's past influence the gut microbiome, and to what degree, remains unclear. Here, we characterize the stool bacterial composition of four Himalayan populations to investigate how the gut community changes in response to shifts in traditional human lifestyles. These groups led seminomadic hunting-gathering lifestyles until transitioning to varying levels of agricultural dependence upon farming. The Tharu began farming 250-300 years ago, the Raute and Raji transitioned 30-40 years ago, and the Chepang retain many aspects of a foraging lifestyle. We assess the contributions of dietary and environmental factors on their gut-associated microbes and find that differences in the lifestyles of Himalayan foragers and farmers are strongly correlated with microbial community variation. Furthermore, the gut microbiomes of all four traditional Himalayan populations are distinct from that of the Americans, indicating that industrialization may further exacerbate differences in the gut community. The Chepang foragers harbor an elevated abundance of taxa associated with foragers around the world. Conversely, the gut microbiomes of the populations that have transitioned to farming are more similar to those of Americans, with agricultural dependence and several associated lifestyle and environmental factors correlating with the extent of microbiome divergence from the foraging population. The gut microbiomes of Raute and Raji reveal an intermediate state between the Chepang and Tharu, indicating that divergence from a stereotypical foraging microbiome can occur within a single generation. 
Our results also show that environmental factors such as drinking water source and solid cooking fuel are significantly associated with the gut microbiome. Despite the pronounced differences in gut bacterial composition across populations, we found little differences in alpha diversity across lifestyles. These findings in genetically similar populations living in the same geographical region establish the key role of lifestyle in determining human gut microbiome composition and point to the next challenging steps of determining how large-scale gut microbiome reconfiguration impacts human biology.

Source
http://dx.doi.org/10.1371/journal.pbio.2005396 (DOI)
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6237292 (PMC)
November 2018

Population genomic analyses of the chocolate tree, L., provide insights into its domestication process.

Commun Biol 2018 16;1:167. Epub 2018 Oct 16.

Mars, Incorporated, 6885 Elm Street, McLean, VA, 22101, USA.

Domestication has had a strong impact on the development of modern societies. We sequenced 200 genomes of the chocolate plant L. to show for the first time to our knowledge that a single population, the Criollo population, underwent strong domestication ~3600 years ago (95% CI: 2481-13,806 years ago). We also show that during the process of domestication, there was strong selection for genes involved in the metabolism of the colored protectants anthocyanins and the stimulant theobromine, as well as disease resistance genes. Our analyses show that domesticated populations of (Criollo) maintain a higher proportion of high-frequency deleterious mutations. We also show for the first time the negative consequences of the increased accumulation of deleterious mutations during domestication on the fitness of individuals (significant reduction in kilograms of beans per hectare per year as Criollo ancestry increases, as estimated from a GLM,  = 0.000425).

Source
http://dx.doi.org/10.1038/s42003-018-0168-6 (DOI)
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6191438 (PMC)
October 2018

Gene expression imputation identifies candidate genes and susceptibility loci associated with cutaneous squamous cell carcinoma.

Nat Commun 2018 10 15;9(1):4264. Epub 2018 Oct 15.

Department of Biomedical Data Science, Stanford University School of Medicine, Stanford, CA, 94305, USA.

Cutaneous squamous cell carcinoma (cSCC) is a common skin cancer with genetic susceptibility loci identified in recent genome-wide association studies (GWAS). Transcriptome-wide association studies (TWAS) using imputed gene expression levels can identify additional gene-level associations. Here we impute gene expression levels in 6891 cSCC cases and 54,566 controls in the Kaiser Permanente Genetic Epidemiology Research in Adult Health and Aging (GERA) cohort and 25,558 self-reported cSCC cases and 673,788 controls from 23andMe. In a discovery-validation study, we identify 19 loci containing 33 genes whose imputed expression levels are associated with cSCC at false discovery rate < 10% in the GERA cohort and validate 15 of these candidate genes at Bonferroni significance in the 23andMe dataset, including eight genes in five novel susceptibility loci and seven genes in four previously associated loci. These results suggest genetic mechanisms contributing to cSCC risk and illustrate advantages and disadvantages of TWAS as a supplement to traditional GWAS analyses.

Source
http://dx.doi.org/10.1038/s41467-018-06149-6 (DOI)
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6189170 (PMC)
October 2018

The clinical imperative for inclusivity: Race, ethnicity, and ancestry (REA) in genomics.

Hum Mutat 2018 11;39(11):1713-1720

Department of Biomedical Data Science, Stanford University, Stanford, California.

The Clinical Genome Resource (ClinGen) Ancestry and Diversity Working Group highlights the need to develop guidance on race, ethnicity, and ancestry (REA) data collection and use in clinical genomics. We present quantitative and qualitative evidence to characterize: (1) acquisition of REA data via clinical laboratory requisition forms, and (2) information disparity across populations in the Genome Aggregation Database (gnomAD) at clinically relevant sites ascertained from annotations in ClinVar. Our requisition form analysis showed substantial heterogeneity in clinical laboratory ascertainment of REA, as well as marked incongruity among terms used to define REA categories. There was also striking disparity across REA populations in the amount of information available about clinically relevant variants in gnomAD. European ancestral populations constituted the majority of observations (55.8%), allele counts (59.7%), and private alleles (56.1%) in gnomAD at 550 loci with "pathogenic" and "likely pathogenic" expert-reviewed variants in ClinVar. Our findings highlight the importance of implementing and supporting programs to increase diversity in genome sequencing and clinical genomics, as well as measuring uncertainty around population-level datasets that are used in variant interpretation. Finally, we suggest the need for a standardized REA data collection framework to be developed through partnerships and collaborations and adopted across clinical genomics.

Source
http://dx.doi.org/10.1002/humu.23644 (DOI)
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6188707 (PMC)
November 2018

Data mining of digitized health records in a resource-constrained setting reveals that timely immunophenotyping is associated with improved breast cancer outcomes.

BMC Cancer 2018 Sep 27;18(1):933. Epub 2018 Sep 27.

Department of Biomedical Data Science, School of Medicine, Stanford University, 1265 Welch Road, Stanford, California, 94305, USA.

Background: Organizations that issue guidance on breast cancer recommend the use of immunohistochemistry (IHC) for providing appropriate and precise care. However, little focus has been directed to the identification of maximum allowable turnaround times for IHC, which is necessary given the diversity of hospital settings in the world. Much less effort has been committed to the development of digital tools that allow hospital administrators to monitor service utilization histories of their patients.

Methods: In this retrospective cohort study, we reviewed electronic and paper medical records of all suspected breast cancer patients treated at one secondary-care hospital of the Mexican Institute of Social Security (IMSS), located in western Mexico. We then followed three years of medical history of those patients with IHC testing.

Results: In 2014, there were 402 breast cancer patients, of which 30 (7.4% of total) were tested for some IHC biomarker (ER, PR, HER2). The subtyping allowed doctors to adjust (56.7%) or confirm (43.3%) the initial therapeutic regimen. The average turnaround time was 56 days. Opportune IHC testing was found to be beneficial when it was available before or during the first rounds of chemotherapy.

Conclusions: The use of data mining tools applied to health record data revealed that there is an association between timely immunohistochemistry and improved outcomes in breast cancer patients. Based on this finding, inclusion of turnaround time in clinical guidelines is recommended. As much of the health data in the country becomes digitized, our visualization tools allow a digital dashboard of the hospital service utilization histories.

Source
http://dx.doi.org/10.1186/s12885-018-4833-4 (DOI)
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6161369 (PMC)
September 2018
-->