Publications by authors named "Anders Albrechtsen"

108 Publications

Loss of sucrase-isomaltase function increases acetate levels and improves metabolic health in Greenlandic cohorts.

Gastroenterology 2021 Dec 13. Epub 2021 Dec 13.

Section for Computational and RNA Biology, Department of Biology, University of Copenhagen, Copenhagen, Denmark.

Background & Aims: The sucrase-isomaltase (SI) c.273_274delAG loss-of-function variant is common in Arctic populations and causes congenital sucrase-isomaltase deficiency, an inability to break down and absorb sucrose and isomaltose. Children with this condition experience gastrointestinal symptoms when dietary sucrose is introduced. Here we aimed to describe the health of adults with sucrase-isomaltase deficiency.

Methods: Association between c.273_274delAG and phenotypes related to metabolic health was assessed in two cohorts of Greenlandic adults (N=4,922 and N=1,629). A sucrase-isomaltase knock-out (Sis-KO) mouse model was used to further elucidate the findings.

Results: Homozygous carriers of the variant had a markedly healthier metabolic profile than the remaining population, including lower BMI (β (SE), -2.0 kg/m² (0.5), P=3.1x10), body weight (-4.8 kg (1.4), P=5.1x10), fat percentage (-3.3% (1.0), P=3.7x10), fasting triglyceride (-0.27 mmol/L (0.07), P=2.3x10), and remnant cholesterol (-0.11 mmol/L (0.03), P=4.2x10). Further analyses suggested that this was likely mediated partly by higher circulating levels of acetate observed in homozygous carriers (0.056 mmol/L (0.002), P=2.1x10), and partly by reduced sucrose uptake, but not lower caloric intake. These findings were verified in Sis-KO mice, which, compared to wild-type mice, were leaner on a sucrose-containing diet despite similar caloric intake, had significantly higher plasma acetate levels in response to a sucrose gavage, and had lower plasma glucose levels in response to a sucrose-tolerance test.

Conclusions: These results suggest that sucrase-isomaltase constitutes a promising drug target for improvement of metabolic health, and that the health benefits are mediated by reduced dietary sucrose uptake and possibly also by higher levels of circulating acetate.

DOI: http://dx.doi.org/10.1053/j.gastro.2021.12.236
December 2021

Efficient approaches for large-scale GWAS with genotype uncertainty.

G3 (Bethesda) 2021 Dec 4. Epub 2021 Dec 4.

The Bioinformatics Centre, Department of Biology, University of Copenhagen, 2200 Copenhagen N, Denmark.

Introduction: Association studies using genetic data from SNP-chip based imputation or low depth sequencing data provide a cost-efficient design for large-scale association studies. We explore methods for performing association studies applicable to such genetic data and investigate how using different priors when estimating genotype probabilities affects the association results.

Methods: Our proposed method, ANGSD-asso's latent model, models the unobserved genotype as a latent variable in a generalised linear model framework. The software is implemented in C/C++ and can be run multi-threaded. ANGSD-asso is based on genotype probabilities, which can be estimated using either the sample allele frequency or the individual allele frequencies as a prior. We explore through simulations how genotype probability-based methods compare to using genetic dosages.

Results And Discussion: Our simulations show that, in a structured population, using the individual allele frequency prior gives better power than using the sample allele frequency prior. In scenarios where sequencing depth and phenotype are correlated, ANGSD-asso's latent model has higher statistical power and less bias than using dosages. With additional covariates in the linear model, ANGSD-asso's latent model has higher statistical power and less bias than other methods that accommodate genotype uncertainty, while also being much faster. This is shown with imputed data from UK Biobank and simulations.
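
Illustrative sketch (not taken from the paper or from the ANGSD-asso source): the abstract contrasts genotype probabilities computed under different allele frequency priors with genetic dosages. The Python below shows one common way such quantities can be derived from genotype likelihoods, assuming a Hardy-Weinberg prior. The function names and the example numbers are hypothetical.

```python
def hwe_prior(freq):
    """Hardy-Weinberg prior for carrying 0, 1 or 2 copies of an allele with frequency `freq`."""
    return [(1 - freq) ** 2, 2 * freq * (1 - freq), freq ** 2]


def genotype_posterior(likelihoods, freq):
    """Posterior genotype probabilities: likelihood times prior, normalised to sum to 1."""
    prior = hwe_prior(freq)
    unnormalised = [lik * pri for lik, pri in zip(likelihoods, prior)]
    total = sum(unnormalised)
    return [u / total for u in unnormalised]


def dosage(posterior):
    """Expected number of allele copies given the genotype probabilities."""
    return posterior[1] + 2 * posterior[2]


# Hypothetical genotype likelihoods for one individual at one low-depth site.
gl = [0.2, 0.6, 0.2]

# Same likelihoods, two different priors (both frequencies are made-up values):
post_sample = genotype_posterior(gl, 0.10)  # frequency taken across the whole sample
post_indiv = genotype_posterior(gl, 0.40)   # frequency tailored to the individual's ancestry

print(dosage(post_sample), dosage(post_indiv))
```

With weakly informative likelihoods the choice of prior moves the genotype probabilities, and hence the dosage, considerably, which is the sensitivity the abstract investigates.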

DOI: http://dx.doi.org/10.1093/g3journal/jkab385
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC8727990
December 2021

Detecting selection in low-coverage high-throughput sequencing data using principal component analysis.

BMC Bioinformatics 2021 Sep 29;22(1):470. Epub 2021 Sep 29.

Department of Biology, The Bioinformatics Centre, University of Copenhagen, Copenhagen, Denmark.

Background: Identification of selection signatures between populations is often an important part of a population genetic study. With high-throughput DNA sequencing, larger sample sizes of populations with similar ancestries have become increasingly common. This has led to the need for methods capable of identifying signals of selection in populations with a continuous cline of genetic differentiation. Individuals from continuous populations are inherently challenging to group into meaningful units, which is why existing methods rely on principal component analysis for inference of the selection signals. These existing methods require called genotypes as input, which is problematic for studies based on low-coverage sequencing data.

Materials And Methods: We have extended two principal component analysis based selection statistics to genotype likelihood data and applied them to low-coverage sequencing data from the 1000 Genomes Project for populations with European and East Asian ancestry to detect signals of selection in samples with continuous population structure.

Results: Here, we present two selection statistics which we have implemented in the PCAngsd framework. These methods account for genotype uncertainty, opening the opportunity to conduct selection scans in continuous populations from low and/or variable coverage sequencing data. To illustrate their use, we applied the methods to low-coverage sequencing data from human populations of East Asian and European ancestries and show that the implemented selection statistics can control the false positive rate and that they identify the same signatures of selection from low-coverage sequencing data as state-of-the-art software using high quality called genotypes.

Conclusion: We show that selection scans of low-coverage sequencing data of populations with similar ancestry perform on par with those obtained from high quality genotype data. Moreover, we demonstrate that PCAngsd outperforms selection statistics obtained from called genotypes from low-coverage sequencing data without the need for ad-hoc filtering.

DOI: http://dx.doi.org/10.1186/s12859-021-04375-2
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC8480091
September 2021

Joint identification of sex and sex-linked scaffolds in non-model organisms using low depth sequencing data.

Mol Ecol Resour 2022 Feb 9;22(2):458-467. Epub 2021 Sep 9.

Section for Computational and RNA Biology, Department of Biology, University of Copenhagen, Copenhagen, Denmark.

Being able to assign sex to individuals and identify autosomal and sex-linked scaffolds is essential in most population genomic analyses. Non-model organisms often have genome assemblies at scaffold-level and lack characterization of sex-linked scaffolds. Previous methods to identify sex and sex-linked scaffolds have relied on synteny between the non-model organism and a closely related species or prior knowledge about the sex of the samples to identify sex-linked scaffolds. In the latter case, the difference in depth of coverage between the autosomes and the sex chromosomes is used. Here, we present "sex assignment through coverage" (SATC), a method to assign sex to samples and identify sex-linked scaffolds from next generation sequencing (NGS) data. The method works for species with a homogametic/heterogametic sex determination system and only requires a scaffold-level reference assembly and sampling of both sexes with whole genome sequencing (WGS) data. We use the sequencing depth distribution across scaffolds to jointly identify: (i) male and female individuals, and (ii) sex-linked scaffolds. This is achieved through projecting the scaffold depths into a low-dimensional space using principal component analysis (PCA) and subsequent Gaussian mixture clustering. We demonstrate the applicability of our method using data from five mammal species and a bird species complex. The method is freely available at https://github.com/popgenDK/SATC as R code and a graphical user interface (GUI).
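
Illustrative sketch (not SATC itself): the idea of separating the sexes by depth of coverage can be shown with a much simpler stand-in. Instead of PCA followed by Gaussian mixture clustering, the Python below clusters a single depth ratio per sample with a one-dimensional two-means procedure. It also assumes that one autosomal scaffold (`scaf1`) and one candidate sex-linked scaffold (`scafX`) are already known, which the published method does not require. All names and depths are hypothetical.

```python
def two_means(values, iterations=20):
    """Split 1-D values into a low (0) and a high (1) cluster by iterating two centroids."""
    low, high = min(values), max(values)
    for _ in range(iterations):
        low_group = [v for v in values if abs(v - low) <= abs(v - high)]
        high_group = [v for v in values if abs(v - low) > abs(v - high)]
        if not high_group:  # all values identical: nothing to separate
            break
        low = sum(low_group) / len(low_group)
        high = sum(high_group) / len(high_group)
    return [0 if abs(v - low) <= abs(v - high) else 1 for v in values]


# Hypothetical mean sequencing depths per sample and scaffold.
samples = {
    "ind1": {"scaf1": 10.2, "scafX": 10.0},
    "ind2": {"scaf1": 8.1, "scafX": 4.2},
    "ind3": {"scaf1": 12.0, "scafX": 11.5},
    "ind4": {"scaf1": 9.0, "scafX": 4.4},
}

# Normalise the candidate scaffold by the autosomal one so samples are comparable.
ratios = {name: depth["scafX"] / depth["scaf1"] for name, depth in samples.items()}
labels = two_means(list(ratios.values()))

# The heterogametic sex carries one copy of the scaffold, so its ratio is near 0.5.
sex = {
    name: "homogametic" if label == 1 else "heterogametic"
    for name, label in zip(ratios, labels)
}
print(sex)
```

The sketch only makes sense when both sexes are present in the sample, mirroring the requirement stated in the abstract.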

DOI: http://dx.doi.org/10.1111/1755-0998.13491
February 2022

NGSremix: a software tool for estimating pairwise relatedness between admixed individuals from next-generation sequencing data.

G3 (Bethesda) 2021 08;11(8)

Department of Biology, The Bioinformatics Centre, University of Copenhagen, 2200 Copenhagen N, Denmark.

Estimation of relatedness between pairs of individuals is important in many genetic research areas. When estimating relatedness, it is important to account for admixture if this is present. However, the methods that can account for admixture are all based on genotype data as input, which is a problem for low-depth next-generation sequencing (NGS) data from which genotypes are called with high uncertainty. Here, we present a software tool, NGSremix, for maximum likelihood estimation of relatedness between pairs of admixed individuals from low-depth NGS data, which takes the uncertainty of the genotypes into account via genotype likelihoods. Using both simulated and real NGS data for admixed individuals with an average depth of 4x or below, we show that our method works well and clearly outperforms the commonly used state-of-the-art relatedness estimation methods PLINK, KING, relateAdmix, and ngsRelate, all of which perform quite poorly. Hence, NGSremix is a useful new tool for estimating relatedness in admixed populations from low-depth NGS data. NGSremix is implemented in C/C++ as multi-threaded software and is freely available on GitHub at https://github.com/KHanghoj/NGSremix.

DOI: http://dx.doi.org/10.1093/g3journal/jkab174
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC8496226
August 2021

Predictors and trajectories of treatment response to SSRIs in patients suffering from PTSD.

Psychiatry Res 2021 07 26;301:113964. Epub 2021 Apr 26.

H. Lundbeck A/S, Valby, Copenhagen, Denmark.

Paroxetine and sertraline are the only FDA approved drugs for treatment of posttraumatic stress disorder (PTSD). Although both drugs show better outcomes than placebo, not all patients benefit from treatment. We examined predictors and latent classes of SSRI treatment response in patients with PTSD. Symptom severity was measured over a 12-week period in 390 patients suffering from PTSD treated with open-label sertraline or paroxetine and a double-blinded placebo. First, growth curve modeling (GCM) was used to examine population-level predictors of treatment response. Second, growth mixture modeling (GMM) was used to group patients into latent classes based on their treatment response trajectories over time and to investigate predictors of latent class membership. Gender, childhood sexual trauma, and sexual assault as index trauma moderated the population-level treatment response using GCM. GMM identified three classes: fast responders, responders with low pretreatment symptom severity and responders with high pretreatment symptom severity. Class membership was predicted based on time since index trauma, severity of depression, and severity of anxiety. The study shows that higher severity of comorbid disorders does not result in an inferior response to treatment and suggests that patients with longer time since index trauma might particularly benefit from treatment with sertraline or paroxetine.

DOI: http://dx.doi.org/10.1016/j.psychres.2021.113964
July 2021

Physical activity attenuates postprandial hyperglycaemia in homozygous TBC1D4 loss-of-function mutation carriers.

Diabetologia 2021 Aug 29;64(8):1795-1804. Epub 2021 Apr 29.

Novo Nordisk Foundation Center for Basic Metabolic Research, Faculty of Health and Medical Sciences, University of Copenhagen, Copenhagen, Denmark.

Aims/hypothesis: The common muscle-specific TBC1D4 p.Arg684Ter loss-of-function variant defines a subtype of non-autoimmune diabetes in Arctic populations. Homozygous carriers are characterised by elevated postprandial glucose and insulin levels. Because 3.8% of the Greenlandic population are homozygous carriers, it is important to explore possibilities for precision medicine. We aimed to investigate whether physical activity attenuates the effect of this variant on 2 h plasma glucose levels after an oral glucose load.

Methods: In a Greenlandic population cohort (n = 2655), 2 h plasma glucose levels were obtained after an OGTT, physical activity was estimated as physical activity energy expenditure and TBC1D4 genotype was determined. We performed TBC1D4-physical activity interaction analysis, applying a linear mixed model to correct for genetic admixture and relatedness.

Results: Physical activity was inversely associated with 2 h plasma glucose levels (β[main effect of physical activity] -0.0033 [mmol/l] / [kJ kg⁻¹ day⁻¹], p = 6.5 × 10), and significantly more so among homozygous carriers of the TBC1D4 risk variant compared with heterozygous carriers and non-carriers (β[interaction] -0.015 [mmol/l] / [kJ kg⁻¹ day⁻¹], p = 0.0085). The estimated effect size suggests that 1 h of vigorous physical activity per day (compared with resting) reduces 2 h plasma glucose levels by an additional ~0.7 mmol/l in homozygous carriers of the risk variant.

Conclusions/interpretation: Physical activity improves glucose homeostasis particularly in homozygous TBC1D4 risk variant carriers via a skeletal muscle TBC1 domain family member 4-independent pathway. This provides a rationale to implement physical activity as lifestyle precision medicine in Arctic populations.

Data Repository: The Greenlandic Cardio-Metabochip data for the Inuit Health in Transition study has been deposited at the European Genome-phenome Archive ( https://www.ebi.ac.uk/ega/dacs/EGAC00001000736 ) under accession EGAD00010001428.

DOI: http://dx.doi.org/10.1007/s00125-021-05461-z
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC8245392
August 2021

A large-scale genome-wide gene expression analysis in peripheral blood identifies very few differentially expressed genes related to antidepressant treatment and response in patients with major depressive disorder.

Neuropsychopharmacology 2021 06 8;46(7):1324-1332. Epub 2021 Apr 8.

The Bioinformatics Centre, Department of Biology, University of Copenhagen, Copenhagen N, Denmark.

A better understanding of the biological factors underlying antidepressant treatment in patients with major depressive disorder (MDD) is needed. We perform gene expression analyses and explore sources of variability in peripheral blood related to antidepressant treatment and treatment response in patients suffering from recurrent MDD at baseline and after 8 weeks of treatment. The study includes 281 patients, who were randomized to 8 weeks of treatment with vortioxetine (N = 184) or placebo (N = 97). To our knowledge, this is the largest dataset including both gene expression in blood and placebo-controlled treatment response measured by a clinical scale in a randomized clinical trial. We identified three novel genes whose RNA expression levels at baseline and week 8 are significantly (FDR < 0.05) associated with treatment response after 8 weeks of treatment. Among these genes were SOCS3 (FDR = 0.0039) and PROK2 (FDR = 0.0028), which have previously both been linked to depression. Downregulation of these genes was associated with poorer treatment response. We did not identify any genes that were differentially expressed between placebo and vortioxetine groups at week 8 or between baseline and week 8 of treatment. Nor did we replicate any genes identified in previous peripheral blood gene expression studies examining treatment response. Analysis of genome-wide expression variability showed that type of treatment and treatment response explain very little of the variance in gene expression, a median of <0.0001% and 0.05% across all genes, respectively. Given the relatively large size of the study, the limited findings suggest that peripheral blood gene expression might not be the best approach to explore the biological factors underlying antidepressant treatment.

DOI: http://dx.doi.org/10.1038/s41386-021-01002-9
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC8134553
June 2021

The genetic history of Greenlandic-European contact.

Curr Biol 2021 05 11;31(10):2214-2219.e4. Epub 2021 Mar 11.

Section for Computational and RNA Biology, Department of Biology, University of Copenhagen, Copenhagen, Denmark.

The Inuit ancestors of the Greenlandic people arrived in Greenland close to 1,000 years ago. Since then, Europeans from many different countries have been present in Greenland. Consequently, the present-day Greenlandic population has ∼25% of its genetic ancestry from Europe. In this study, we investigated to what extent different European countries have contributed to this genetic ancestry. We combined dense SNP chip data from 3,972 Greenlanders and 8,275 Europeans from 14 countries and inferred the ancestry contribution from each of these 14 countries using haplotype-based methods. Due to the rapid increase in population size in Greenland over the past ∼100 years, we hypothesized that earlier European interactions, such as pre-colonial Dutch whalers and early German and Danish-Norwegian missionaries, as well as the later Danish colonists and post-colonial immigrants, all contributed European genetic ancestry. However, we found that the European ancestry is almost entirely Danish and that a substantial fraction is from admixture that took place within the last few generations.

DOI: http://dx.doi.org/10.1016/j.cub.2021.02.041
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC8284823
May 2021

High genetic diversity and low differentiation reflect the ecological versatility of the African leopard.

Curr Biol 2021 05 25;31(9):1862-1871.e5. Epub 2021 Feb 25.

Department of Biology, University of Copenhagen, Ole Maaløes Vej 5, 2200 Copenhagen N, Denmark.

Large carnivores are generally sensitive to ecosystem changes because their specialized diet and position at the top of the trophic pyramid are associated with small population sizes. Accordingly, low genetic diversity at the whole-genome level has been reported for all big cat species, including the widely distributed leopard. However, all previous whole-genome analyses of leopards are based on the Far Eastern Amur leopards that live at the extremity of the species' distribution and therefore are not necessarily representative of the whole species. We sequenced 53 whole genomes of African leopards. Strikingly, we found that the genomic diversity in the African leopard is 2- to 5-fold higher than in other big cats, including the Amur leopard, likely because of an exceptionally high effective population size maintained by the African leopard throughout the Pleistocene. Furthermore, we detected ongoing gene flow and very low population differentiation within African leopards compared with those of other big cats. We corroborated this by showing a complete absence of an otherwise ubiquitous equatorial forest barrier to gene flow. This sets the leopard apart from most other widely distributed large African mammals, including lions. These results revise our understanding of trophic sensitivity and highlight the remarkable resilience of the African leopard, likely because of its extraordinary habitat versatility and broad dietary niche.

DOI: http://dx.doi.org/10.1016/j.cub.2021.01.064
May 2021

Large-scale Inference of Population Structure in Presence of Missingness using PCA.

Bioinformatics 2021 Jan 18. Epub 2021 Jan 18.

Department of Biology, University of Copenhagen, Copenhagen, DK-2200, Denmark.

Motivation: Principal component analysis (PCA) is a commonly used tool in genetics to capture and visualize population structure. Due to technological advances in sequencing, such as the widely used non-invasive prenatal test, massive datasets of ultra-low coverage sequencing are being generated. These datasets are characterized by having a large amount of missing genotype information.

Results: We present EMU, a method for inferring population structure in the presence of rampant non-random missingness. We show through simulations that several commonly used PCA methods cannot handle missing data arising from various sources, which leads to biased results as individuals are projected into the PC space based on their amount of missingness. In terms of accuracy, EMU outperforms an existing method that also accommodates missingness while being competitively fast. We further tested EMU on around 100K individuals of the Phase 1 dataset of the Chinese Millionome Project, who were shallowly sequenced to around 0.08x. From these data we are able to capture the population structure of the Han Chinese and to reproduce previous analyses in a matter of CPU hours instead of CPU years. EMU's capability to accurately infer population structure in the presence of missingness will be of increasing importance with the rising number of large-scale genetic datasets.

Availability: EMU is written in Python and is freely available at https://github.com/rosemeis/emu.

Supplementary Information: Supplementary data are available at Bioinformatics online.

DOI: http://dx.doi.org/10.1093/bioinformatics/btab027
January 2021

A reference-free approach to analyse RADseq data using standard next generation sequencing toolkits.

Mol Ecol Resour 2021 May 8;21(4):1085-1097. Epub 2021 Feb 8.

Section for Computational and RNA Biology, Department of Biology, University of Copenhagen, Copenhagen N, Denmark.

Genotyping-by-sequencing methods such as RADseq are popular for generating genomic and population-scale data sets from a diverse range of organisms. These often lack a usable reference genome, restricting users to RADseq-specific software for processing. However, these come with limitations compared to generic next generation sequencing (NGS) toolkits. Here, we describe and test a simple pipeline for reference-free RADseq data processing that blends de novo elements from STACKS with the full suite of state-of-the-art NGS tools. Specifically, we use the de novo RADseq assembly employed by STACKS to create a catalogue of RAD loci that serves as a reference for read mapping, variant calling and site filters. Using RADseq data from 28 zebra sequenced to ~8x depth-of-coverage, we evaluate our approach by comparing the site frequency spectra (SFS) to those from alternative pipelines. Most pipelines yielded similar SFS at 8x depth, but only a genotype likelihood based pipeline performed similarly at low sequencing depth (2-4x). We compared the RADseq SFS with medium-depth (~13x) shotgun sequencing of eight overlapping samples, revealing that the RADseq SFS was persistently slightly skewed towards rare and invariant alleles. Using simulations and human data we confirm that this is expected when there is allelic dropout (AD) in the RADseq data. AD in the RADseq data caused a heterozygosity deficit of ~16%, which dropped to ~5% after filtering AD. Hence, AD was the most important source of bias in our RADseq data.
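
Illustrative sketch (not the pipeline from the paper): the site frequency spectrum used for the comparisons above is simply a histogram of allele counts across sites. The Python below computes an unfolded and a folded SFS from called diploid genotypes, assuming no missing data. The genotype-likelihood-based estimation that the abstract favours at low depth is not shown, and the function names and genotypes are hypothetical.

```python
def site_frequency_spectrum(genotypes):
    """Unfolded SFS: entry i counts sites where the allele is seen i times.

    `genotypes` is a list of sites; each site lists per-individual allele
    counts (0, 1 or 2) for the same set of diploid individuals.
    """
    n_chromosomes = 2 * len(genotypes[0])
    sfs = [0] * (n_chromosomes + 1)
    for site in genotypes:
        sfs[sum(site)] += 1
    return sfs


def fold(sfs):
    """Folded SFS: counts by minor allele count, for when the ancestral allele is unknown."""
    n = len(sfs) - 1
    folded = [0] * (n // 2 + 1)
    for count, n_sites in enumerate(sfs):
        folded[min(count, n - count)] += n_sites
    return folded


# Hypothetical genotypes: six sites typed in three diploid individuals.
genotypes = [
    [0, 0, 0],
    [1, 0, 0],
    [0, 1, 0],
    [2, 1, 0],
    [2, 2, 2],
    [1, 1, 2],
]

sfs = site_frequency_spectrum(genotypes)
print(sfs)        # [1, 2, 0, 1, 1, 0, 1]
print(fold(sfs))  # [2, 2, 1, 1]
```

Allelic dropout turns some heterozygous calls into homozygous ones, which shifts counts in such a histogram towards the rare and invariant classes, consistent with the skew the abstract reports.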

DOI: http://dx.doi.org/10.1111/1755-0998.13324
May 2021

Vicariance followed by secondary gene flow in a young gazelle species complex.

Mol Ecol 2021 01 22;30(2):528-544. Epub 2020 Dec 22.

Department of Biology, Section for Computational and RNA Biology, University of Copenhagen, Copenhagen N, Denmark.

Grant's gazelles have recently been proposed to be a species complex comprising three highly divergent mtDNA lineages (Nanger granti, N. notata and N. petersii). The three lineages have nonoverlapping distributions in East Africa, but without any obvious geographical divisions, making them an interesting model for studying the early-stage evolutionary dynamics of allopatric speciation in detail. Here, we use genomic data obtained by restriction site-associated (RAD) sequencing of 106 gazelle individuals to shed light on the evolutionary processes underlying Grant's gazelle divergence, to characterize their genetic structure and to assess the presence of gene flow between the main lineages in the species complex. We date the species divergence to 134,000 years ago, which is recent in evolutionary terms. We find population subdivision within N. granti, which coincides with the previously suggested two subspecies, N. g. granti and N. g. robertsii. Moreover, these two lineages seem to have hybridized in Masai Mara. Perhaps more surprisingly given their extreme genetic differentiation, N. granti and N. petersii also show signs of prolonged admixture in Mkomazi, which we identified as a hybrid population most likely founded by allopatric lineages coming into secondary contact. Despite the admixed composition of this population, elevated X chromosomal differentiation suggests that selection may be shaping the outcome of hybridization in this population. Our results therefore provide detailed insights into the processes of allopatric speciation and secondary contact in a recently radiated species complex.

DOI: http://dx.doi.org/10.1111/mec.15738
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC7898927
January 2021

Omega-3 fatty acids and risk of cardiovascular disease in Inuit: First prospective cohort study.

Atherosclerosis 2020 11 8;312:28-34. Epub 2020 Sep 8.

Steno Diabetes Center Copenhagen, Gentofte, Denmark; National Institute of Public Health, Southern Denmark University, Denmark.

Background And Aims: No prospective study has ever assessed whether marine n-3 polyunsaturated fatty acids protect Inuit against cardiovascular disease as claimed. It is highly relevant as cardiovascular disease (CVD) incidence rates are rising concurrent with a westernization of diet. We aimed to assess the association between blood cell membrane phospholipid content of eicosapentaenoic acid and docosahexaenoic acid (EPA + DHA) and CVD risk in Inuit.

Methods: We used data from a cohort of adult Greenlanders with follow-up in national registers. The main outcome was fatal and non-fatal CVD incidence among participants without previous CVD. The continuous effect of EPA + DHA was calculated as incidence rate ratios (IRRs) using Poisson regression with age as time scale, adjusting for age, sex, genetic admixture, lifestyle and dietary risk factors.

Results: Out of 3095 eligible participants, 2924 were included. During a median follow-up of 9.7 years, 216 had their first CVD event (8.3 events/1000 person years). No association between EPA + DHA and CVD risk was seen, with IRR = 0.99 per percentage point EPA + DHA increase (95% CI: 0.95-1.03, p = 0.59). No association was seen with risk of ischemic heart disease (IHD) (IRR = 1.03, 95% CI: 0.97-1.09) and stroke (IRR = 0.98, 95% CI: 0.93-1.03) as separate outcomes or for intake of EPA and DHA.

Conclusions: We can exclude that the CVD risk reduction is larger than 21% for individuals at the 75% EPA + DHA percentile compared to the 25% percentile. We need a larger sample size and/or longer follow-up to detect smaller effects and associations with IHD and/or stroke.

DOI: http://dx.doi.org/10.1016/j.atherosclerosis.2020.08.032
November 2020

Population genomics of the Viking world.

Nature 2020 09 16;585(7825):390-396. Epub 2020 Sep 16.

NTNU University Museum, Department of Archaeology and Cultural History, Trondheim, Norway.

The maritime expansion of Scandinavian populations during the Viking Age (about AD 750-1050) was a far-flung transformation in world history. Here we sequenced the genomes of 442 humans from archaeological sites across Europe and Greenland (to a median depth of about 1×) to understand the global influence of this expansion. We find the Viking period involved gene flow into Scandinavia from the south and east. We observe genetic structure within Scandinavia, with diversity hotspots in the south and restricted gene flow within Scandinavia. We find evidence for a major influx of Danish ancestry into England; a Swedish influx into the Baltic; and Norwegian influx into Ireland, Iceland and Greenland. Additionally, we see substantial ancestry from elsewhere in Europe entering Scandinavia during the Viking Age. Our ancient DNA analysis also revealed that a Viking expedition included close family members. By comparing with modern populations, we find that pigmentation-associated loci have undergone strong population differentiation during the past millennium, and trace positively selected loci (including the lactase-persistence allele of LCT and alleles of ANKA that are associated with the immune response) in detail. We conclude that the Viking diaspora was characterized by substantial transregional engagement: distinct populations influenced the genomic makeup of different regions of Europe, and Scandinavia experienced increased contact with the rest of the continent.

DOI: http://dx.doi.org/10.1038/s41586-020-2688-8
September 2020

Genetic study of the Arctic CPT1A variant suggests that its effect on fatty acid levels is modulated by traditional Inuit diet.

Eur J Hum Genet 2020 11 19;28(11):1592-1601. Epub 2020 Jun 19.

Department of Biology, Bioinformatics Centre, University of Copenhagen, Copenhagen, Denmark.

Several recent studies have found signs of recent selection on the carnitine palmitoyl-transferase 1A (CPT1A) gene in the ancestors of Arctic populations, likely as a result of their traditional diet. CPT1A is involved in fatty acid transportation and is known to affect circulating fatty acid profiles in Inuit, as does the unique traditional diet rich in marine animals. We aimed to assess which fatty acids may have driven the selection of rs80356779, a c.1436C>T (p.(Pro479Leu)) variant in CPT1A, by analyzing a potential interaction between the variant and traditional Inuit diet. We included 3005 genome-wide genotyped individuals living in Greenland, who had blood cell membrane fatty acid levels measured. Consumption of 25 traditional food items was expressed as percentage of total energy intake. We tested for CPT1A × traditional diet interaction while taking relatedness and admixture into account. Increasing intake of traditional diet was estimated to attenuate the effect of 479L on 20:3 omega-6 levels (p = 0.000399), but increase the effect of the variant on 22:5 omega-3 levels (p = 0.000963). The 479L effect on 22:5 omega-3 more than doubled in individuals with a high intake of traditional diet (90th percentile) compared with individuals with a low intake (10th percentile). Similar results were found when assessing interactions with marine foods. Our results suggest that the association between traditional diet and blood cell fatty acid composition is affected by the CPT1A genotype, or other variants in linkage disequilibrium, and support the hypothesis that omega-3 fatty acids may have been important for adaptation to the Arctic diet.

Source
DOI: http://dx.doi.org/10.1038/s41431-020-0674-0
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC7576585
November 2020

Evaluation of model fit of inferred admixture proportions.

Mol Ecol Resour 2020 Jul 25;20(4):936-949. Epub 2020 May 25.

Section for Computational and RNA Biology, Department of Biology, University of Copenhagen, Copenhagen N, Denmark.

Model-based methods for genetic clustering of individuals, such as those implemented in structure or ADMIXTURE, allow the user to infer individual ancestries and study population structure. The underlying model makes several assumptions about the demographic history that shaped the analysed genetic data. One assumption is that all individuals are a result of K homogeneous ancestral populations that are all well represented in the data, while another assumption is that no drift happened after the admixture event. The histories of many real-world populations do not conform to that model, and in that case taking the inferred admixture proportions at face value might be misleading. We propose a method to evaluate the fit of admixture models based on estimating the correlation of the residual difference between the true genotypes and the genotypes predicted by the model. When the model assumptions are not violated, the residuals from a pair of individuals are not correlated. In the case of a poorly fitting admixture model, individuals with similar demographic histories have a positive correlation of their residuals. Using simulated and real data, we show how the method is able to detect a bad fit of inferred admixture proportions due to using an insufficient number of clusters K or to demographic histories that deviate significantly from the admixture model assumptions, such as admixture from ghost populations, drift after admixture events and nondiscrete ancestral populations. We have implemented the method as open-source software that can be applied to both unphased genotypes and low-depth sequencing data.
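
The residual-correlation idea can be sketched in a few lines of Python. The sketch assumes called genotypes, known admixture proportions `q` (individuals × K) and known ancestral allele frequencies `f` (K × sites), and uses a plain Pearson correlation. It is a naive illustration with hypothetical function names, not the published implementation, which may estimate the correlation differently (for example to handle low-depth data or bias from using the same data to fit the model).

```python
import math


def predicted_genotypes(q, f):
    """Expected genotype 2 * sum_k q[i][k] * f[k][j] for individual i, site j."""
    n_sites = len(f[0])
    return [
        [2.0 * sum(q_ik * f_k[j] for q_ik, f_k in zip(q_i, f)) for j in range(n_sites)]
        for q_i in q
    ]


def pearson(x, y):
    """Plain Pearson correlation of two equally long sequences."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def residual_correlation(genotypes, q, f, a, b):
    """Correlation across sites of (observed - predicted) genotypes
    for individuals a and b."""
    pred = predicted_genotypes(q, f)
    res_a = [g - p for g, p in zip(genotypes[a], pred[a])]
    res_b = [g - p for g, p in zip(genotypes[b], pred[b])]
    return pearson(res_a, res_b)


# Toy data (made up): individuals 0 and 1 deviate from the model in the same way.
toy_q = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
toy_f = [[0.5, 0.5, 0.5, 0.5], [0.1, 0.9, 0.1, 0.9]]
toy_g = [[2, 0, 2, 0], [2, 0, 2, 0], [0, 2, 0, 2]]
toy_corr = residual_correlation(toy_g, toy_q, toy_f, 0, 1)
```

In the toy data the first two individuals share the same deviation from the model, so their residuals are positively correlated, which is the signal the abstract associates with a poor fit.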

Source
DOI: http://dx.doi.org/10.1111/1755-0998.13171
July 2020

Estimating narrow-sense heritability using family data from admixed populations.

Heredity (Edinb) 2020 06 9;124(6):751-762. Epub 2020 Apr 9.

Department of Biology, Section for Computational and RNA Biology, University of Copenhagen, 2200, Copenhagen, Denmark.

Estimating total narrow-sense heritability in admixed populations remains an open question. In this work, we used extensive simulations to evaluate existing linear mixed-model frameworks for estimating total narrow-sense heritability in two population-based cohorts from Greenland, and compared the results with data from unadmixed individuals from Denmark. When our analysis focused on Greenlandic sib pairs, and under the assumption that shared environment among siblings has a negligible effect, the model with two relationship matrices, one capturing identity by descent and one capturing identity by state, returned heritability estimates close to the true simulated value, while using each of the two matrices alone led to downward biases. When phenotypes correlated with ancestry, heritability estimates were inflated. Based on these observations, we propose a PCA-based adjustment that recovers the true simulated heritability. We use this knowledge to estimate the heritability of ten quantitative traits from the two Greenlandic cohorts, and report differences such as lower heritability for height in Greenlanders compared with Europeans. In conclusion, narrow-sense heritability in admixed populations is best estimated when using a mixture of genetic relationship matrices on individuals with at least one first-degree relative included in the sample.

Source
DOI: http://dx.doi.org/10.1038/s41437-020-0311-2
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC7239878
June 2020

The derived allele of a novel intergenic variant at chromosome 11 associates with lower body mass index and a favorable metabolic phenotype in Greenlanders.

PLoS Genet 2020 01 24;16(1):e1008544. Epub 2020 Jan 24.

Novo Nordisk Foundation Center for Basic Metabolic Research, Faculty of Health and Medical Sciences, University of Copenhagen, Copenhagen, Denmark.

The genetic architecture of the small and isolated Greenlandic population is advantageous for identification of novel genetic variants associated with cardio-metabolic traits. We aimed to identify genetic loci associated with body mass index (BMI), to expand the knowledge of the genetic and biological mechanisms underlying obesity. Stage 1 BMI-association analyses were performed in 4,626 Greenlanders. Stage 2 replication and meta-analysis were performed in additional cohorts comprising 1,058 Yup'ik Alaska Native people, and 1,529 Greenlanders. Obesity-related traits were assessed in the stage 1 study population. We identified a common variant on chromosome 11, rs4936356, where the derived G-allele had a frequency of 24% in the stage 1 study population. The derived allele was genome-wide significantly associated with lower BMI (beta (SE), -0.14 SD (0.03), p = 3.2 × 10⁻⁸), corresponding to 0.64 kg/m² lower BMI per G allele in the stage 1 study population. We observed a similar effect in the Yup'ik cohort (-0.09 SD, p = 0.038), and a non-significant effect in the same direction in the independent Greenlandic stage 2 cohort (-0.03 SD, p = 0.514). The association remained genome-wide significant in meta-analysis of the Arctic cohorts (-0.10 SD (0.02), p = 4.7 × 10⁻⁸). Moreover, the variant was associated with a leaner body type (weight, -1.68 (0.37) kg; waist circumference, -1.52 (0.33) cm; hip circumference, -0.85 (0.24) cm; lean mass, -0.84 (0.19) kg; fat mass and percent, -1.66 (0.33) kg and -1.39 (0.27) %; visceral adipose tissue, -0.30 (0.07) cm; subcutaneous adipose tissue, -0.16 (0.05) cm, all p<0.0002), lower insulin resistance (HOMA-IR, -0.12 (0.04), p = 0.00021), and favorable lipid levels (triglyceride, -0.05 (0.02) mmol/l, p = 0.025; HDL-cholesterol, 0.04 (0.01) mmol/l, p = 0.0015). In conclusion, we identified a novel variant, where the derived G-allele was possibly associated with lower BMI in Arctic populations, and as a consequence also a leaner body type, lower insulin resistance, and a favorable lipid profile.

Source
DOI: http://dx.doi.org/10.1371/journal.pgen.1008544
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC7001991
January 2020

Detection of internal N7-methylguanosine (m7G) RNA modifications by mutational profiling sequencing.

Nucleic Acids Res 2019 11;47(20):e126

Department of Biology, University of Copenhagen, Ole Maaløes Vej 5, DK-2200 Copenhagen N, Denmark.

Methylation of guanosine on position N7 (m7G) on internal RNA positions has been found in all domains of life and has been implicated in human disease. Here, we present m7G Mutational Profiling sequencing (m7G-MaP-seq), which allows high throughput detection of m7G modifications at nucleotide resolution. In our method, m7G modified positions are converted to abasic sites by reduction with sodium borohydride, directly recorded as cDNA mutations through reverse transcription and sequenced. We detect positions with increased mutation rates in the reduced and control samples taking the possibility of sequencing/alignment error into account and use replicates to calculate statistical significance based on log likelihood ratio tests. We show that m7G-MaP-seq efficiently detects known m7G modifications in rRNA with mutational rates up to 25% and we map a previously uncharacterised evolutionarily conserved rRNA modification at position 1581 in Arabidopsis thaliana SSU rRNA. Furthermore, we identify m7G modifications in budding yeast, human and Arabidopsis tRNAs and demonstrate that m7G modification occurs before tRNA splicing. We do not find any evidence for internal m7G modifications being present in other small RNA, such as miRNA, snoRNA and sRNA, including human Let-7e. Likewise, high sequencing depth m7G-MaP-seq analysis of mRNA from E. coli or yeast cells did not identify any internal m7G modifications.

Source
DOI: http://dx.doi.org/10.1093/nar/gkz736
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6847341
November 2019

A likelihood method for estimating present-day human contamination in ancient male samples using low-depth X-chromosome data.

Bioinformatics 2020 02;36(3):828-841

Department of Computational Biology, University of Lausanne, 1015 Lausanne, Switzerland.

Motivation: The presence of present-day human contaminating DNA fragments is one of the challenges defining ancient DNA (aDNA) research. This is especially relevant to the ancient human DNA field where it is difficult to distinguish endogenous molecules from human contaminants due to their genetic similarity. Recently, with the advent of high-throughput sequencing and new aDNA protocols, hundreds of ancient human genomes have become available. Contamination in those genomes has been measured with computational methods often developed specifically for these empirical studies. Consequently, some of these methods have not been implemented and tested for general use while few are aimed at low-depth nuclear data, a common feature in aDNA datasets.

Results: We develop a new X-chromosome-based maximum likelihood method for estimating present-day human contamination in low-depth sequencing data from male individuals. We implement our method for general use, assess its performance under conditions typical of ancient human DNA research, and compare it to previous nuclear data-based methods through extensive simulations. For low-depth data, we show that existing methods can produce unusable estimates or substantially underestimate contamination. In contrast, our method provides accurate estimates for a depth of coverage as low as 0.5× on the X-chromosome when contamination is below 25%. Moreover, our method still yields meaningful estimates in very challenging situations, i.e. when the contaminant and the target come from closely related populations or with increased error rates. With a running time below 5 min, our method is applicable to large scale aDNA genomic studies.

Availability and Implementation: The method is implemented in C++ and R and is available at github.com/sapfo/contaminationX and popgen.dk/angsd.

Source
DOI: http://dx.doi.org/10.1093/bioinformatics/btz660
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC8215924
February 2020

Genomic diversity and novel genome-wide association with fruit morphology in Capsicum, from 746k polymorphic sites.

Sci Rep 2019 07 11;9(1):10067. Epub 2019 Jul 11.

CREA Research Centre for Vegetable and Ornamental Crops, Pontecagnano Faiano, Italy.

Capsicum is one of the major vegetable crops grown worldwide. Current subdivision in clades and species is based on morphological traits and coarse sets of genetic markers. Broad variability of fruits has been driven by breeding programs and has been mainly studied by linkage analysis. We discovered 746k variable sites by sequencing 1.8% of the genome in a collection of 373 accessions belonging to 11 Capsicum species from 51 countries. We describe genomic variation at the population level, confirm major subdivision in clades and species, and show that the known major subdivision of C. annuum separates large and bulky fruits from small ones. In C. annuum, we identify four novel loci associated with phenotypes determining the fruit shape, including a non-synonymous mutation in the gene Longifolia 1-like (CA03g16080). Our collection covers all the economically important species of Capsicum widely used in breeding programs and represents the widest and largest study so far in terms of the number of species and number of genetic variants analyzed. We identified a large set of markers that can be used for population genetic studies and genetic association analyses. Our results provide a comprehensive and precise perspective on genomic variability in Capsicum at the population level and suggest that future fine genetic association studies will yield useful results for breeding.

Source
DOI: http://dx.doi.org/10.1038/s41598-019-46136-5
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6624249
July 2019

DamMet: ancient methylome mapping accounting for errors, true variants, and post-mortem DNA damage.

Gigascience 2019 04;8(4)

Lundbeck Foundation GeoGenetics Center, University of Copenhagen, Øster Voldgade 5-7, 1350K Copenhagen, Denmark.

Background: Recent computational advances in ancient DNA research have opened access to the detection of ancient DNA methylation footprints at the genome-wide scale. The most commonly used approach infers the methylation state of a given genomic region on the basis of the amount of nucleotide mis-incorporations observed at CpG dinucleotide sites. However, this approach overlooks a number of confounding factors, including the presence of sequencing errors and true variants. The scale and distribution of the inferred methylation measurements are also variable across samples, precluding direct comparisons.

Findings: Here, we present DamMet, an open-source software program retrieving maximum likelihood estimates of regional CpG methylation levels from ancient DNA sequencing data. It builds on a novel statistical model of post-mortem DNA damage for dinucleotides, accounting for sequencing errors, genotypes, and differential post-mortem cytosine deamination rates at both methylated and unmethylated sites. To validate DamMet, we extended gargammel, a sequence simulator for ancient DNA data, by introducing methylation-dependent features of post-mortem DNA decay. This new simulator provides direct validation of DamMet predictions. Additionally, the methylation levels inferred by DamMet were found to be correlated to those inferred by epiPALEOMIX and both on par and directly comparable to those measured from whole-genome bisulphite sequencing experiments of fresh tissues.

Conclusions: DamMet provides genuine estimates for local DNA methylation levels in ancient individual genomes. The returned estimates are directly cross-sample comparable, and the software is available as an open-source C++ program hosted at https://gitlab.com/KHanghoj/DamMet along with a manual and tutorial.

Source
DOI: http://dx.doi.org/10.1093/gigascience/giz025
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6474913
April 2019

Testing for Hardy-Weinberg equilibrium in structured populations using genotype or low-depth next generation sequencing data.

Mol Ecol Resour 2019 Sep 12;19(5):1144-1152. Epub 2019 Jun 12.

Section for Computational and RNA Biology, Department of Biology, University of Copenhagen, Copenhagen, Denmark.

Testing for deviations from Hardy-Weinberg equilibrium (HWE) is a common practice for quality control in genetic studies. Variable sites violating HWE may be identified as technical errors in the sequencing or genotyping process, or they may be of particular evolutionary interest. Large-scale genetic studies based on next-generation sequencing (NGS) methods have become more prevalent as cost is decreasing, but these methods are still associated with statistical uncertainty. The large-scale studies usually consist of samples from diverse ancestries that make the existence of some degree of population structure almost inevitable. Precautions are therefore needed when analysing these data sets, as population structure causes deviations from HWE. Here we propose a method that takes population structure into account in the testing for HWE, such that other factors causing deviations from HWE can be detected. We show the effectiveness of PCAngsd in low-depth NGS data, as well as in genotype data, for both simulated and real data sets, where the use of genotype likelihoods enables us to model the uncertainty.
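
Why population structure alone produces a deviation from HWE can be seen with the textbook chi-square test on genotype counts. The Python sketch below is that standard single-population test, not the structure-aware method proposed in the paper; the function names and the genotype counts are made up for illustration. Two toy populations are each exactly in HWE, yet pooling them yields a strong heterozygote deficit.

```python
def hwe_chi_square(n_aa, n_ab, n_bb):
    """Chi-square statistic comparing observed genotype counts with the
    Hardy-Weinberg expectation. Assumes both alleles are present (0 < p < 1)."""
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)  # frequency of allele A
    q = 1 - p
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


def inbreeding_coefficient(n_aa, n_ab, n_bb):
    """F = 1 - observed heterozygosity / expected heterozygosity."""
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)
    return 1 - (n_ab / n) / (2 * p * (1 - p))


# Made-up counts: each population is in HWE at allele frequency 0.1 or 0.9.
pop1 = (1, 18, 81)
pop2 = (81, 18, 1)
pooled = tuple(a + b for a, b in zip(pop1, pop2))  # (82, 36, 82)

chi2_pop1 = hwe_chi_square(*pop1)
chi2_pooled = hwe_chi_square(*pooled)
f_pooled = inbreeding_coefficient(*pooled)
```

Each toy population on its own gives a statistic of essentially zero, while the pooled sample gives a large statistic and a clearly positive F. A test that ignores structure would flag such a site even though nothing is wrong with the genotyping.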

Source
DOI: http://dx.doi.org/10.1111/1755-0998.13019
September 2019

Ancestry-specific association mapping in admixed populations.

Genet Epidemiol 2019 07 18;43(5):506-521. Epub 2019 Mar 18.

Department of Biology, The Bioinformatics Centre, University of Copenhagen, Copenhagen, Denmark.

During the last decade, genome-wide association studies have proven to be a powerful approach to identifying disease-causing variants. However, for admixed populations, most current methods for association testing are based on the assumption that the effect of a genetic variant is the same regardless of its ancestry. This is a reasonable assumption for a causal variant but may not hold for the genetic variants that are tested in genome-wide association studies, which are usually not causal. The effects of noncausal genetic variants depend on how strongly their presence correlates with the presence of the causal variant, which may vary between ancestral populations because of different linkage disequilibrium patterns and allele frequencies. Motivated by this, we here introduce a new statistical method for association testing in recently admixed populations, where the effect size is allowed to depend on the ancestry of a given allele. Our method does not rely on accurate inference of local ancestry, yet using simulations we show that in some scenarios it gives a substantial increase in statistical power to detect associations. In addition, the method allows for testing for difference in effect size between ancestral populations, which can be used to help determine if a given genetic variant is causal. We demonstrate the usefulness of the method on data from the Greenlandic population.

Source
DOI: http://dx.doi.org/10.1002/gepi.22200
July 2019

Allele frequency-free inference of close familial relationships from genotypes or low-depth sequencing data.

Mol Ecol 2019 01;28(1):35-48

Section for Computational and RNA Biology, Department of Biology, University of Copenhagen, Copenhagen N, Denmark.

Knowledge of how individuals are related is important in many areas of research, and numerous methods for inferring pairwise relatedness from genetic data have been developed. However, the majority of these methods were not developed for situations where data are limited. Specifically, most methods rely on the availability of population allele frequencies, the relative genomic position of variants and accurate genotype data. But in studies of non-model organisms or ancient samples, such data are not always available. Motivated by this, we present a new method for pairwise relatedness inference, which requires neither allele frequency information nor information on genomic position. Furthermore, it can be applied not only to accurate genotype data but also to low-depth sequencing data from which genotypes cannot be accurately called. We evaluate it using data from a range of human populations and show that it can be used to infer close familial relationships with accuracy similar to that of a widely used method that relies on population allele frequencies. Additionally, we show that our method is robust to SNP ascertainment and applicable to low-depth sequencing data generated using different strategies, including resequencing and RADseq, which is important for application to a diverse range of populations and species.

Source
DOI: http://dx.doi.org/10.1111/mec.14954
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6850436
January 2019

Genomic Analyses from Non-invasive Prenatal Testing Reveal Genetic Associations, Patterns of Viral Infections, and Chinese Population History.

Cell 2018 10;175(2):347-359.e14

BGI-Shenzhen, Shenzhen 518083, Guangdong, China.

We analyze whole-genome sequencing data from 141,431 Chinese women generated for non-invasive prenatal testing (NIPT). We use these data to characterize the population genetic structure and to investigate genetic associations with maternal and infectious traits. We show that the present day distribution of alleles is a function of both ancient migration and very recent population movements. We reveal novel phenotype-genotype associations, including several replicated associations with height and BMI, an association between maternal age and EMB, and between twin pregnancy and NRG1. Finally, we identify a unique pattern of circulating viral DNA in plasma with high prevalence of hepatitis B and other clinically relevant maternal infections. A GWAS for viral infections identifies an exceptionally strong association between integrated herpesvirus 6 and MOV10L1, which affects piwi-interacting RNA (piRNA) processing and PIWI protein function. These findings demonstrate the great value and potential of accumulating NIPT data for worldwide medical and genetic analyses.

Source
DOI: http://dx.doi.org/10.1016/j.cell.2018.08.016
October 2018

Inferring Population Structure and Admixture Proportions in Low-Depth NGS Data.

Genetics 2018 10 21;210(2):719-731. Epub 2018 Aug 21.

The Bioinformatics Centre, Department of Biology, University of Copenhagen, DK-2200, Denmark.

We here present two methods for inferring population structure and admixture proportions in low-depth next-generation sequencing (NGS) data. Inference of population structure is essential in both population genetics and association studies, and is often performed using principal component analysis (PCA) or clustering-based approaches. NGS methods provide large amounts of genetic data but are associated with statistical uncertainty, especially for low-depth sequencing data. Models can account for this uncertainty by working directly on genotype likelihoods of the unobserved genotypes. We propose a method for inferring population structure through PCA in an iterative heuristic approach of estimating individual allele frequencies, where we demonstrate improved accuracy in samples with low and variable sequencing depth for both simulated and real datasets. We also use the estimated individual allele frequencies in a fast non-negative matrix factorization method to estimate admixture proportions. Both methods have been implemented in the PCAngsd framework available at http://www.popgen.dk/software/.
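
To illustrate what working directly on genotype likelihoods can look like, the Python sketch below combines the three genotype likelihoods at one site with a Hardy-Weinberg prior derived from an allele frequency and returns the posterior expected genotype. It is a generic textbook calculation under those assumptions, not code from PCAngsd, and the function names are hypothetical.

```python
def genotype_posterior(likelihoods, freq):
    """Posterior P(G = 0, 1, 2 | data) from the three genotype likelihoods and a
    Hardy-Weinberg prior with alternative-allele frequency `freq`.
    Assumes at least one genotype has non-zero likelihood times prior."""
    prior = ((1 - freq) ** 2, 2 * freq * (1 - freq), freq ** 2)
    weighted = [lik * pri for lik, pri in zip(likelihoods, prior)]
    total = sum(weighted)
    return [w / total for w in weighted]


def expected_genotype(likelihoods, freq):
    """Posterior mean number of alternative alleles (a value between 0 and 2)."""
    posterior = genotype_posterior(likelihoods, freq)
    return sum(g * p for g, p in enumerate(posterior))


# No reads at the site: all genotypes equally likely, so the prior decides.
uninformative = expected_genotype((1.0, 1.0, 1.0), 0.5)
# Data that only fit the homozygous alternative genotype override the prior.
confident = expected_genotype((0.0, 0.0, 1.0), 0.2)
```

With flat likelihoods the expected genotype equals twice the allele frequency, so at low depth the frequency used in the prior dominates. This is why a per-individual frequency, rather than a single population frequency, matters when samples differ in ancestry.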

Source
DOI: http://dx.doi.org/10.1534/genetics.118.301336
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6216594
October 2018

Identification of novel high-impact recessively inherited type 2 diabetes risk variants in the Greenlandic population.

Diabetologia 2018 09 20;61(9):2005-2015. Epub 2018 Jun 20.

Novo Nordisk Foundation Center for Basic Metabolic Research, Faculty of Health and Medical Sciences, University of Copenhagen, Blegdamsvej 3B, 2200, Copenhagen, Denmark.

Aims/hypothesis: In a recent study using a standard additive genetic model, we identified a TBC1D4 loss-of-function variant with a large recessive impact on risk of type 2 diabetes in Greenlanders. The aim of the current study was to identify additional genetic variation underlying type 2 diabetes using a recessive genetic model, thereby increasing the power to detect variants with recessive effects.

Methods: We investigated three cohorts of Greenlanders (B99, n = 1401; IHIT, n = 3115; and BBH, n = 547), which were genotyped using Illumina MetaboChip. Of the 4674 genotyped individuals passing quality control, 4648 had phenotype data available, and type 2 diabetes association analyses were performed for 317 individuals with type 2 diabetes and 2631 participants with normal glucose tolerance. Statistical association analyses were performed using a linear mixed model.

Results: Using a recessive genetic model, we identified two novel loci associated with type 2 diabetes in Greenlanders, namely rs870992 in ITGA1 on chromosome 5 (OR 2.79, p = 1.8 × 10), and rs16993330 upstream of LARGE1 on chromosome 22 (OR 3.52, p = 1.3 × 10). The LARGE1 variant did not reach the conventional threshold for genome-wide significance (p < 5 × 10⁻⁸) but did withstand a study-wide Bonferroni-corrected significance threshold. Both variants were common in Greenlanders, with minor allele frequencies of 23% and 16%, respectively, and were estimated to have large recessive effects on risk of type 2 diabetes in Greenlanders, compared with additively inherited variants previously observed in European populations.

Conclusions/interpretation: We demonstrate the value of using a recessive genetic model in a historically small and isolated population to identify genetic risk variants. Our findings give new insights into the genetic architecture of type 2 diabetes, and further support the existence of high-effect genetic risk factors of potential clinical relevance, particularly in isolated populations.

Data Availability: The Greenlandic MetaboChip-genotype data are available at European Genome-Phenome Archive (EGA; https://ega-archive.org/) under the accession EGAS00001002641.
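
The difference between an additive and a recessive genetic model comes down to how genotypes are coded before testing. The Python sketch below shows both codings and a crude odds ratio from a 2×2 table of homozygous carriers versus everyone else. It is a simplified illustration with hypothetical function names and made-up genotypes; the study itself used a linear mixed model, which this sketch does not reproduce.

```python
def additive_coding(genotype):
    """Additive model: the predictor is the number of risk alleles (0, 1 or 2)."""
    return genotype


def recessive_coding(genotype):
    """Recessive model: 1 only for carriers of two risk alleles, else 0."""
    return 1 if genotype == 2 else 0


def recessive_odds_ratio(case_genotypes, control_genotypes):
    """Odds ratio of disease for homozygous carriers versus all others.
    Assumes every cell of the 2x2 table is non-zero."""
    hom_cases = sum(recessive_coding(g) for g in case_genotypes)
    hom_controls = sum(recessive_coding(g) for g in control_genotypes)
    other_cases = len(case_genotypes) - hom_cases
    other_controls = len(control_genotypes) - hom_controls
    return (hom_cases * other_controls) / (other_cases * hom_controls)


# Made-up genotypes (risk-allele counts), not data from the study.
toy_cases = [2, 2, 1, 0]
toy_controls = [2, 1, 1, 0, 0, 0]
toy_or = recessive_odds_ratio(toy_cases, toy_controls)
```

Under the recessive coding, heterozygotes are grouped with non-carriers, so a variant whose effect appears only in homozygotes is not diluted by the heterozygotes, which is the power gain the abstract refers to.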

Source
DOI: http://dx.doi.org/10.1007/s00125-018-4659-2
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6096637
September 2018
-->