Publications by authors named "Hafid Laayouni"

43 Publications

A fully integrated machine learning scan of selection in the chimpanzee genome.

NAR Genom Bioinform 2020 Sep 3;2(3):lqaa061. Epub 2020 Sep 3.

Institut de Biologia Evolutiva (UPF-CSIC), Universitat Pompeu Fabra, Doctor Aiguader 88, 08003 Barcelona, Catalonia, Spain.

After diverging, each chimpanzee subspecies has been the target of unique selective pressures. Here, we employ a machine learning approach to classify regions as under positive selection or neutrality genome-wide. The regions determined to be under selection reflect the unique demographic and adaptive history of each subspecies. The results indicate that effective population size is important for determining the proportion of the genome under positive selection. The chimpanzee subspecies share signals of selection in genes associated with immunity and gene regulation. With these results, we have created a selection map for each population that can be displayed in a genome browser (www.hsb.upf.edu/chimp_browser). This study is the first to use a detailed demographic history and machine learning to map selection genome-wide in chimpanzee. The chimpanzee selection map will improve our understanding of the impact of selection on closely related subspecies and will empower future studies of chimpanzee.
DOI: http://dx.doi.org/10.1093/nargab/lqaa061
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC7671310
September 2020

Positive selection in admixed populations from Ethiopia.

BMC Genet 2020 Oct 22;21(Suppl 1):108. Epub 2020 Oct 22.

Institut de Biologia Evolutiva (UPF-CSIC), Universitat Pompeu Fabra, Dr. Aiguader 88, 08003 Barcelona, Catalonia, Spain.

Background: In the process of adaptation of humans to their environment, positive or adaptive selection has played a main role. Positive selection has, however, been under-studied in African populations, despite their diversity and importance for understanding human history.

Results: Here, we have used 119 available whole-genome sequences from five Ethiopian populations (Amhara, Oromo, Somali, Wolayta and Gumuz) to investigate the modes and targets of positive selection in this part of the world. The site frequency spectrum-based test SFselect was applied to identify a wide range of events of selection (old and recent), and the haplotype-based statistic integrated haplotype score to detect more recent events, in each case with evaluation of the significance of candidate signals by extensive simulations. Additional insights were provided by considering admixture proportions and functional categories of genes. We identified both individual loci that are likely targets of classic sweeps and groups of genes that may have experienced polygenic adaptation. We found population-specific as well as shared signals of selection, with folate metabolism and the related ultraviolet response and skin pigmentation standing out as a shared pathway, perhaps as a response to the high levels of ultraviolet irradiation, and in addition strong signals in genes such as IFNA, MRC1, immunoglobulins and T-cell receptors, which contribute to defence against pathogens.

Conclusions: Signals of positive selection were detected in Ethiopian populations revealing novel adaptations in East Africa, and abundant targets for functional follow-up.
DOI: http://dx.doi.org/10.1186/s12863-020-00908-5
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC7580818
October 2020

The shaping of immunological responses through natural selection after the Roma Diaspora.

Sci Rep 2020 Sep 30;10(1):16134. Epub 2020 Sep 30.

Department of Internal Medicine and Radboud Center for Infectious Diseases, Radboud University Medical Center, 6525 GA, Nijmegen, The Netherlands.

The Roma people are the largest transnational ethnic minority in Europe and can be considered the last human migration of South Asian origin into the continent. They left Northwest India approximately 1,000 years ago, reaching the Balkan Peninsula around the twelfth century and Romania in the fourteenth century. Here, we analyze whole-genome sequencing data of 40 Roma and 40 non-Roma individuals from Romania. We performed a genome-wide scan of selection comparing Roma, their local host population, and a Northwestern Indian population, to identify the selective pressures faced by the Roma mainly after they settled in Europe. We identify under recent selection several pathways implicated in immune responses, among them cellular metabolism pathways known to be rewired after immune stimulation. We validated the interaction between PIK3-mTOR-HIF-1α and cytokine response influenced by bacterial and fungal infections. Our results point to a significant role of these pathways for host defense against the most prevalent pathogens in Europe during the last millennium.
DOI: http://dx.doi.org/10.1038/s41598-020-73182-1
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC7528012
September 2020

Large multiple sequence alignments with a root-to-leaf regressive method.

Nat Biotechnol 2019 Dec 2;37(12):1466-1470. Epub 2019 Dec 2.

Centre for Genomic Regulation, The Barcelona Institute of Science and Technology, Barcelona, Spain.

Multiple sequence alignments (MSAs) are used for structural and evolutionary predictions, but the complexity of aligning large datasets requires the use of approximate solutions, including the progressive algorithm. Progressive MSA methods start by aligning the most similar sequences and subsequently incorporate the remaining sequences, from leaf to root, based on a guide tree. Their accuracy declines substantially as the number of sequences is scaled up. We introduce a regressive algorithm that enables MSA of up to 1.4 million sequences on a standard workstation and substantially improves accuracy on datasets larger than 10,000 sequences. Our regressive algorithm works the other way around from the progressive algorithm and begins by aligning the most dissimilar sequences. It uses an efficient divide-and-conquer strategy to run third-party alignment methods in linear time, regardless of their original complexity. Our approach will enable analyses of extremely large genomic datasets such as the recently announced Earth BioGenome Project, which comprises 1.5 million eukaryotic genomes.
DOI: http://dx.doi.org/10.1038/s41587-019-0333-6
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6894943
December 2019

Gene connectivity and enzyme evolution in the human metabolic network.

Biol Direct 2019 Sep 3;14(1):17. Epub 2019 Sep 3.

Institut de Biologia Evolutiva (UPF-CSIC), Universitat Pompeu Fabra, Dr. Aiguader 88, 08003, Barcelona, Catalonia, Spain.

Background: Determining the factors involved in the likelihood of a gene being under adaptive selection is still a challenging goal in Evolutionary Biology. Here, we perform an evolutionary analysis of the human metabolic genes to explore the associations between network structure and the presence and strength of natural selection in the genes whose products are involved in metabolism. Purifying and positive selection are estimated at interspecific (among mammals) and intraspecific (among human populations) levels, and the connections between enzymatic reactions are differentiated between incoming (in-degree) and outgoing (out-degree) links.

Results: We confirm that purifying selection has been stronger in highly connected genes. Long-term positive selection has targeted poorly connected enzymes, whereas short-term positive selection has targeted different enzymes depending on whether the selective sweep has reached fixation in the population: genes under a complete selective sweep are poorly connected, whereas those under an incomplete selective sweep have high out-degree connectivity. The last steps of pathways are more conserved due to stronger purifying selection, with long-term positive selection targeting preferentially enzymes that catalyze the first steps. However, short-term positive selection has targeted enzymes that catalyze the last steps in the metabolic network. Strong signals of positive selection have been found for metabolic processes involved in lipid transport and membrane fluidity and permeability.

Conclusions: Our analysis highlights the importance of analyzing the same biological system at different evolutionary timescales to understand the evolution of metabolic genes and of distinguishing between incoming and outgoing links in a metabolic network. Short-term positive selection has targeted enzymes with a different connectivity profile depending on the completeness of the selective sweep, while long-term positive selection has targeted genes with fewer connections that code for enzymes that catalyze the first steps in the network.

Reviewers: This article was reviewed by Diamantis Sellis and Brandon Invergo.
DOI: http://dx.doi.org/10.1186/s13062-019-0248-7
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6724310
September 2019
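
The distinction that the Biol Direct abstract above draws between incoming (in-degree) and outgoing (out-degree) links can be illustrated with a small sketch. The reaction graph, the enzyme labels E1 to E4, and the function name `degree_profile` are all hypothetical; this is not the authors' pipeline, only a minimal illustration of counting directed links per node.

```python
from collections import Counter


def degree_profile(edges):
    """Return {node: (in_degree, out_degree)} for a directed edge list.

    Each edge is a (source, target) pair, read here as "the product of
    the source enzyme's reaction is a substrate of the target enzyme's".
    """
    out_degree = Counter()
    in_degree = Counter()
    for source, target in edges:
        out_degree[source] += 1
        in_degree[target] += 1
    nodes = set(out_degree) | set(in_degree)
    # Counter returns 0 for nodes that never appear on one side.
    return {node: (in_degree[node], out_degree[node]) for node in sorted(nodes)}


# Hypothetical toy pathway: E1 feeds E2 and E3, which both feed E4.
toy_edges = [("E1", "E2"), ("E1", "E3"), ("E2", "E4"), ("E3", "E4")]
profile = degree_profile(toy_edges)
for enzyme, (indeg, outdeg) in profile.items():
    print(enzyme, "in-degree:", indeg, "out-degree:", outdeg)
```

In this toy graph E1 sits at the first step (in-degree 0, out-degree 2) and E4 at the last step (in-degree 2, out-degree 0), which is the kind of positional contrast the abstract relates to selection.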

Influence of pathway topology and functional class on the molecular evolution of human metabolic genes.

PLoS One 2018 Dec 14;13(12):e0208782. Epub 2018 Dec 14.

Institute of Systems Biology, Department of Biochemistry and Molecular Biology, University of Valencia, Valencia, Spain.

Metabolic networks comprise thousands of enzymatic reactions functioning in a controlled manner and have been shaped by natural selection. Thanks to the genome data, the footprints of adaptive (positive) selection are detectable, and the strength of purifying selection can be measured. This has made it possible to know where, in the metabolic network, adaptive selection has acted and where purifying selection is more or less strong and efficient. We have carried out a comprehensive molecular evolutionary study of all the genes involved in the human metabolism. We investigated the type and strength of the selective pressures that acted on the enzyme-coding genes belonging to metabolic pathways during the divergence of primates and rodents. Then, we related those selective pressures to the functional and topological characteristics of the pathways. We have used DNA sequences of all enzymes (956) of the metabolic pathways comprised in the HumanCyc database, using genome data for humans and five other mammalian species. We have found that the evolution of metabolic genes is primarily constrained by the layer of the metabolism in which the genes participate: while genes encoding enzymes of the inner core of metabolism are highly conserved, those encoding enzymes participating in the outer layer, mediating the interaction with the environment, are evolutionarily less constrained and more plastic, having experienced faster functional evolution. Genes that have been targeted by adaptive selection are endowed with higher out-degree centralities than non-adaptive genes, while genes with high in-degree centralities are under stronger purifying selection. When the position along the pathway is considered, a funnel-like distribution of the strength of the purifying selection is found. Genes at bottom positions are highly preserved by purifying selection, whereas genes at top positions, catalyzing the first steps, are open to evolutionary changes. These results show how functional and topological characteristics of metabolic pathways contribute to shape the patterns of evolutionary pressures driven by natural selection and how pathway network structure matters in the evolutionary process that shapes the evolution of the system.
PLOS: http://journals.plos.org/plosone/article?id=10.1371/journal.pone.0208782
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6294346
May 2019

Evaluating the Genetics of Common Variable Immunodeficiency: Monogenetic Model and Beyond.

Front Immunol 2018 May 14;9:636. Epub 2018 May 14.

Servei de Genòmica, Departament de Ciències Experimentals i de la Salut, Universitat Pompeu Fabra, Parc de Recerca Biomèdica de Barcelona, Barcelona, Spain.

Common variable immunodeficiency (CVID) is the most frequent symptomatic primary immunodeficiency characterized by recurrent infections, hypogammaglobulinemia and poor response to vaccines. Its diagnosis is made based on clinical and immunological criteria, after exclusion of other diseases that can cause similar phenotypes. Currently, fewer than 20% of cases of CVID have a known underlying genetic cause. We have analyzed whole-exome sequencing and copy number variants data of 36 children and adolescents diagnosed with CVID and healthy relatives to estimate the proportion of monogenic cases. We have replicated an association of CVID to p.C104R in TNFRSF13B and reported the second case of a homozygous patient to date. Our results also identify five causative genetic variants in , and , as well as other very likely causative variants in , or among others. We experimentally validate the effect of the stop-gain mutation which abolishes protein production and downregulates the expression of CTLA4, and of the frameshift indel in producing expression downregulation of the protein. Our results indicate a monogenic origin of at least 15-24% of the CVID cases included in the study. The proportion of monogenic patients seems to be lower in CVID than in other PID that have also been analyzed by whole-exome or targeted gene panel sequencing. Regardless of the exact proportion of CVID monogenic cases, other genetic models have to be considered for CVID. We propose that because of its prevalence and other features such as intermediate penetrance and phenotypic variation within families, CVID could fit with other more complex genetic scenarios. In particular, in this work, we explore the possibility of CVID originating from an oligogenic model with the presence of heterozygous mutations in interacting proteins or from the accumulation of detrimental variants in particular immunological pathways, and we perform association tests to detect association with rare genetic functional variation in the CVID cohort compared to healthy controls.
DOI: http://dx.doi.org/10.3389/fimmu.2018.00636
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC5960686
June 2019

Selection in the Introgressed Regions of the Chimpanzee Genome.

Genome Biol Evol 2018 Apr;10(4):1132-1138

Institut de Biologia Evolutiva (UPF-CSIC), Universitat Pompeu Fabra, Barcelona, Catalonia, Spain.

During the demographic history of the Pan clade, there has been gene-flow between species, likely >200,000 years ago. Bonobo haplotypes in three subspecies of chimpanzee have been identified to be segregating in modern-day chimpanzee populations, suggesting that these haplotypes, with increased differentiation, may be a target of natural selection. Here, we investigate signatures of adaptive introgression within the bonobo-like haplotypes in chimpanzees using site frequency spectrum-based tests. We find evidence for subspecies-specific adaptations in introgressed regions involved with male reproduction in central chimpanzees, the immune system in eastern chimpanzees, female reproduction and the nervous system in Nigeria-Cameroon chimpanzees. Furthermore, our results indicate signatures of balancing selection in some of the putatively introgressed regions. This might be the product of long-term balancing selection resulting in a similar genomic signature as introgression, or possibly balancing selection acting on alleles reintroduced through gene flow.
DOI: http://dx.doi.org/10.1093/gbe/evy077
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC5905441
April 2018

A 3-way hybrid approach to generate a new high-quality chimpanzee reference genome (Pan_tro_3.0).

Gigascience 2017 Nov;6(11):1-6

Institut de Biologia Evolutiva (CSIC-Universitat Pompeu Fabra), PRBB, Doctor Aiguader 88, Barcelona, Catalonia 08003, Spain.

The chimpanzee is arguably the most important species for the study of human origins. A key resource for these studies is a high-quality reference genome assembly; however, as with most mammalian genomes, the current iteration of the chimpanzee reference genome assembly is highly fragmented. In the current iteration of the chimpanzee reference genome assembly (Pan_tro_2.1.4), the sequence is scattered across more than 183 000 contigs, incorporating more than 159 000 gaps, with a genome-wide contig N50 of 51 Kbp. In this work, we produce an extensive and diverse array of sequencing datasets to rapidly assemble a new chimpanzee reference that surpasses previous iterations in bases represented and organized in large scaffolds. To this end, we show substantial improvements over the current release of the chimpanzee genome (Pan_tro_2.1.4) by several metrics, such as increased contiguity by >750% and 300% on contigs and scaffolds, respectively, and closure of 77% of gaps in the Pan_tro_2.1.4 assembly gaps spanning >850 Kbp of the novel coding sequence based on RNASeq data. We further report more than 2700 genes that had putatively erroneous frame-shift predictions to human in Pan_tro_2.1.4 and show a substantial increase in the annotation of repetitive elements. We apply a simple 3-way hybrid approach to considerably improve the reference genome assembly for the chimpanzee, providing a valuable resource for the study of human origins. Furthermore, we produce extensive sequencing datasets that are all derived from the same cell line, generating a broad non-human benchmark dataset.
DOI: http://dx.doi.org/10.1093/gigascience/gix098
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC5714192
November 2017
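
The contig N50 quoted in the Pan_tro_3.0 abstract above is a standard contiguity metric: the length L such that contigs of length L or longer together cover at least half of the assembled bases. A minimal sketch of that calculation follows; the function name `n50` and the toy contig lengths are illustrative assumptions and are not taken from the paper.

```python
def n50(lengths):
    """Return the N50 of a collection of contig (or scaffold) lengths.

    Contigs are visited from longest to shortest; the N50 is the length
    of the contig at which the running total first reaches half of the
    total assembly size.
    """
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    raise ValueError("lengths must contain at least one positive value")


# Hypothetical contig lengths in Kbp, for illustration only.
toy_contigs = [80, 70, 50, 40, 30, 20]
print("N50 (Kbp):", n50(toy_contigs))
```

Because N50 depends only on the length distribution, it says nothing about whether the contigs are correctly assembled, which is one reason the abstract reports several other metrics alongside it.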

PopHuman: the human population genomics browser.

Nucleic Acids Res 2018 Jan;46(D1):D1003-D1010

Institut de Biotecnologia i de Biomedicina and Departament de Genètica i de Microbiologia, Universitat Autònoma de Barcelona, 08193 Bellaterra, Barcelona, Spain.

The 1000 Genomes Project (1000GP) represents the most comprehensive world-wide nucleotide variation data set so far in humans, providing the sequencing and analysis of 2504 genomes from 26 populations and reporting >84 million variants. The availability of this sequence data provides the human lineage with an invaluable resource for population genomics studies, allowing the testing of molecular population genetics hypotheses and eventually the understanding of the evolutionary dynamics of genetic variation in human populations. Here we present PopHuman, a new population genomics-oriented genome browser based on JBrowse that allows the interactive visualization and retrieval of an extensive inventory of population genetics metrics. Efficient and reliable parameter estimates have been computed using a novel pipeline that faces the unique features and limitations of the 1000GP data, and include a battery of nucleotide variation measures, divergence and linkage disequilibrium parameters, as well as different tests of neutrality, estimated in non-overlapping windows along the chromosomes and in annotated genes for all 26 populations of the 1000GP. PopHuman is open and freely available at http://pophuman.uab.cat.
DOI: http://dx.doi.org/10.1093/nar/gkx943
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC5753332
January 2018
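
The PopHuman abstract above mentions nucleotide variation measures estimated in non-overlapping windows along the chromosomes. As a rough illustration of what such a windowed estimate looks like, the sketch below computes nucleotide diversity (pi) per window from biallelic sites, using the textbook per-site estimator 2k(n-k)/(n(n-1)) for k alternate alleles among n sampled chromosomes. This is not the PopHuman pipeline; the function name `windowed_pi`, the input layout, and the toy data are all assumptions made for the example.

```python
def windowed_pi(sites, n_chrom, window_size, chrom_length):
    """Nucleotide diversity per site in non-overlapping windows.

    sites: iterable of (position, alt_allele_count), positions 1-based.
    n_chrom: number of sampled chromosomes (n >= 2).
    Returns a list of (start, end, pi) tuples, one per window.
    """
    n_windows = -(-chrom_length // window_size)  # ceiling division
    sums = [0.0] * n_windows
    pairs = n_chrom * (n_chrom - 1) / 2  # number of chromosome pairs
    for pos, alt_count in sites:
        # Fraction of chromosome pairs that differ at this site.
        sums[(pos - 1) // window_size] += alt_count * (n_chrom - alt_count) / pairs
    result = []
    for i, total in enumerate(sums):
        start = i * window_size + 1
        end = min((i + 1) * window_size, chrom_length)
        # Monomorphic positions contribute zero, so divide by window length.
        result.append((start, end, total / (end - start + 1)))
    return result


# Hypothetical data: 4 chromosomes, three variable sites on a 20 bp region.
toy_sites = [(5, 2), (12, 1), (18, 3)]
for start, end, pi in windowed_pi(toy_sites, n_chrom=4, window_size=10, chrom_length=20):
    print(start, end, round(pi, 4))
```

A real pipeline would also have to account for positions that are missing or inaccessible in the sequencing data, which this sketch treats as invariant.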

Y-chromosomal sequences of diverse Indian populations and the ancestry of the Andamanese.

Hum Genet 2017 May 25;136(5):499-510. Epub 2017 Apr 25.

Institute of Evolutionary Biology (CSIC-UPF), Universitat Pompeu Fabra, Doctor Aiguader 88 (PRBB), 08003, Barcelona, Catalonia, Spain.

We present 42 new Y-chromosomal sequences from diverse Indian tribal and non-tribal populations, including the Jarawa and Onge from the Andaman Islands, which are analysed within a calibrated Y-chromosomal phylogeny incorporating South Asian (in total 305 individuals) and worldwide (in total 1286 individuals) data from the 1000 Genomes Project. In contrast to the more ancient ancestry in the South than in the North that has been claimed, we detected very similar coalescence times within Northern and Southern non-tribal Indian populations. A closest neighbour analysis in the phylogeny showed that Indian populations have an affinity towards Southern European populations and that the time of divergence from these populations substantially predated the Indo-European migration into India, probably reflecting ancient shared ancestry rather than the Indo-European migration, which had little effect on Indian male lineages. Among the tribal populations, the Birhor (Austro-Asiatic-speaking) and Irula (Dravidian-speaking) are the nearest neighbours of South Asian non-tribal populations, with a common origin in the last few millennia. In contrast, the Riang (Tibeto-Burman-speaking) and Andamanese have their nearest neighbour lineages in East Asia. The Jarawa and Onge shared haplogroup D lineages with each other within the last ~7000 years, but had diverged from Japanese haplogroup D Y-chromosomes ~53000 years ago, most likely by a split from a shared ancestral population. This analysis suggests that Indian populations have complex ancestry which cannot be explained by a single expansion model.
DOI: http://dx.doi.org/10.1007/s00439-017-1800-0
May 2017

Natural Selection in the Great Apes.

Mol Biol Evol 2016 Dec 30;33(12):3268-3283. Epub 2016 Oct 30.

Department of Evolutionary Genetics, Max Planck Institute for Evolutionary Anthropology, Leipzig, Germany.

Natural selection is crucial for the adaptation of populations to their environments. Here, we present the first global study of natural selection in the Hominidae (humans and great apes) based on genome-wide information from population samples representing all extant species (including most subspecies). Combining several neutrality tests we create a multi-species map of signatures of natural selection covering all major types of natural selection. We find that the estimated efficiency of both purifying and positive selection varies between species and is significantly correlated with their long-term effective population size. Thus, even the modest differences in population size among the closely related Hominidae lineages have resulted in differences in their ability to remove deleterious alleles and to adapt to changing environments. Most signatures of balancing and positive selection are species-specific, with signatures of balancing selection more often being shared among species. We also identify loci with evidence of positive selection across several lineages. Notably, we detect signatures of positive selection in several genes related to brain function, anatomy, diet and immune processes. Our results contribute to a better understanding of human evolution by putting the evidence of natural selection in humans within its larger evolutionary context. The global map of natural selection in our closest living relatives is available as an interactive browser at http://tinyurl.com/nf8qmzh.
DOI: http://dx.doi.org/10.1093/molbev/msw215
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC5100057
December 2016

Genomic analysis of Andamanese provides insights into ancient human migration into Asia and adaptation.

Nat Genet 2016 Sep 25;48(9):1066-70. Epub 2016 Jul 25.

Institut de Biologia Evolutiva (UPF-CSIC), Universitat Pompeu Fabra, Barcelona, Spain.

To shed light on the peopling of South Asia and the origins of the morphological adaptations found there, we analyzed whole-genome sequences from 10 Andamanese individuals and compared them with sequences for 60 individuals from mainland Indian populations with different ethnic histories and with publicly available data from other populations. We show that all Asian and Pacific populations share a single origin and expansion out of Africa, contradicting an earlier proposal of two independent waves of migration. We also show that populations from South and Southeast Asia harbor a small proportion of ancestry from an unknown extinct hominin, and this ancestry is absent from Europeans and East Asians. The footprints of adaptive selection in the genomes of the Andamanese show that the characteristic distinctive phenotypes of this population (including very short stature) do not reflect an ancient African origin but instead result from strong natural selection on genes related to human body size.
DOI: http://dx.doi.org/10.1038/ng.3621
September 2016

MicroRNA Genetic Variation: From Population Analysis to Functional Implications of Three Allele Variants Associated with Cancer.

Hum Mutat 2016 Oct 29;37(10):1060-73. Epub 2016 Aug 29.

Department of Experimental and Health Sciences, IBE, Institute of Evolutionary Biology (Universitat Pompeu Fabra-CSIC), Barcelona, Catalonia, Spain.

Nucleotide variants in microRNA regions have been associated with disease; nevertheless, few studies have yet addressed the allele-dependent effect of these changes. We studied microRNA genetic variation in human populations and found that while low-frequency variants accumulate indistinctly in microRNA regions, the mature and seed regions tend to be depleted of high-frequency variants, probably as a result of purifying selection. Comparison of pairwise population fixation indexes among regions showed that the seed had higher population fixation indexes than the other regions, suggesting the existence of local adaptation in the seed region. We further performed functional studies of three microRNA variants associated with cancer (rs2910164:C > G in MIR146A, rs11614913:C > T in MIR196A2, and rs3746444:A > G in both MIR499A and MIR499B). We found differences in the expression between alleles and in the regulation of several genes involved in cancer, such as TP53, KIT, CDH1, CLH, and TERT, which may result in changes in regulatory networks related to tumorigenesis. Furthermore, luciferase-based assays showed that MIR499A could be regulating the cadherin CDH1 and the cell adhesion molecule CLH1 in an allele-dependent fashion. A better understanding of the effect of microRNA variants associated with disease could be key on our way to a more personalized medicine.
DOI: http://dx.doi.org/10.1002/humu.23045
October 2016

Hierarchical boosting: a machine-learning framework to detect and classify hard selective sweeps in human populations.

Bioinformatics 2015 Dec 26;31(24):3946-52. Epub 2015 Aug 26.

Institut de Biologia Evolutiva (UPF-CSIC), Universitat Pompeu Fabra, Barcelona 08003, Spain.

Motivation: Detecting positive selection in genomic regions is a recurrent topic in natural population genetic studies. However, there is little consistency among the regions detected in several genome-wide scans using different tests and/or populations. Furthermore, few methods address the challenge of classifying selective events according to specific features such as age, intensity or state (completeness).

Results: We have developed a machine-learning classification framework that exploits the combined ability of some selection tests to uncover different polymorphism features expected under the hard sweep model, while controlling for population-specific demography. As a result, we achieve high sensitivity toward hard selective sweeps while adding insights about their completeness (whether a selected variant is fixed or not) and age of onset. Our method also determines the relevance of the individual methods implemented so far to detect positive selection under specific selective scenarios. We calibrated and applied the method to three reference human populations from The 1000 Genomes Project to generate a genome-wide classification map of hard selective sweeps. This study improves detection of selective sweeps by overcoming the classical selection versus no-selection classification strategy, and offers an explanation for the lack of consistency observed among selection tests when applied to real data. Very few signals were observed in the African population studied, while our method presents higher sensitivity in this population demography.

Availability And Implementation: The genome-wide results for three human populations from The 1000 Genomes Project and an R-package implementing the 'Hierarchical Boosting' framework are available at http://hsb.upf.edu/.
DOI: http://dx.doi.org/10.1093/bioinformatics/btv493
December 2015

The genetics of East African populations: a Nilo-Saharan component in the African genetic landscape.

Sci Rep 2015 May 28;5:9996. Epub 2015 May 28.

Institut de Biologia Evolutiva (UPF-CSIC), Departament de Ciències Experimentals i de la Salut, Universitat Pompeu Fabra, Barcelona, Catalonia, Spain.

East Africa is a strategic region to study human genetic diversity due to the presence of ethnically, linguistically, and geographically diverse populations. Here, we provide new insight into the genetic history of populations living in the Sudanese region of East Africa by analysing nine ethnic groups belonging to three African linguistic families: Niger-Kordofanian, Nilo-Saharan and Afro-Asiatic. A total of 500 individuals were genotyped for 200,000 single-nucleotide polymorphisms. Principal component analysis, clustering analysis using ADMIXTURE, FST statistics, and the three-population test were used to investigate the underlying genetic structure and ancestry of the different ethno-linguistic groups. Our analyses revealed a genetic component for Sudanese Nilo-Saharan speaking groups (Darfurians and part of Nuba populations) related to Nilotes of South Sudan, but not to other Sudanese populations or other sub-Saharan populations. Populations inhabiting the North of the region showed close genetic affinities with North Africa, with a component that could be remnant of North Africans before the migrations of Arabs from Arabia. In addition, we found very low genetic distances between populations in genes important for anti-malarial and anti-bacterial host defence, suggesting similar selective pressures on these genes and stressing the importance of considering functional pathways to understand the evolutionary history of populations.
DOI: http://dx.doi.org/10.1038/srep09996
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC4446898
May 2015

Recent positive selection has acted on genes encoding proteins with more interactions within the whole human interactome.

Genome Biol Evol 2015 Apr 2;7(4):1141-54. Epub 2015 Apr 2.

Institute of Evolutionary Biology, Universitat Pompeu Fabra-CSIC, CEXS-UPF-PRBB, Barcelona, Catalonia, Spain; Departament de Genètica i de Microbiologia, Grup de Biologia Evolutiva (GBE), Universitat Autònoma de Barcelona, Bellaterra, Spain.

Genes vary in their likelihood to undergo adaptive evolution. The genomic factors that determine adaptability, however, remain poorly understood. Genes function in the context of molecular networks, with some occupying more important positions than others and thus being likely to be under stronger selective pressures. However, how positive selection distributes across the different parts of molecular networks is still not fully understood. Here, we inferred positive selection using comparative genomics and population genetics approaches through the comparison of 10 mammalian and 270 human genomes, respectively. In agreement with previous results, we found that genes with lower network centralities are more likely to evolve under positive selection (as inferred from divergence data). Surprisingly, polymorphism data yield results in the opposite direction than divergence data: Genes with higher centralities are more likely to have been targeted by recent positive selection during recent human evolution. Our results indicate that the relationship between centrality and the impact of adaptive evolution highly depends on the mode of positive selection and/or the evolutionary time-scale.

Source
http://dx.doi.org/10.1093/gbe/evv055 (DOI)
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC4419801 (PMC)
April 2015

VCF2Networks: applying genotype networks to single-nucleotide variants data.

Bioinformatics 2015 Feb 4;31(3):438-9. Epub 2014 Oct 4.

Department of Experimental and Health Sciences, Institut de Biologia Evolutiva (CSIC-Universitat Pompeu Fabra), 08003 Barcelona, Catalonia, Spain, Institute of Evolutionary Biology and Environmental Studies/Swiss Institute of Bioinformatics, University of Zurich, 8057 Zurich, Switzerland, SIB, CIG Quartier Sorge, bâtiment Génopode 1015 Lausanne, Switzerland, The Santa Fe Institute, 1399 Hyde Park Road, 87501 Santa Fe, New Mexico, USA and Departament de Genetica i de Microbiologia, Grup de Biologia Evolutiva (GBE), Universitat Autonoma de Barcelona, 08913 Bellaterra, Barcelona.

Summary: A wealth of large-scale genome sequencing projects opens the doors to new approaches to study the relationship between genotype and phenotype. One such opportunity is the possibility to apply genotype networks analysis to population genetics data. Genotype networks are a representation of the set of genotypes associated with a single phenotype, and they allow one to estimate properties such as the robustness of the phenotype to mutations, and the ability of its associated genotypes to evolve new adaptations. So far, though, genotype networks analysis has rarely been applied to population genetics data. To help fill this gap, here we present VCF2Networks, a tool to determine and study genotype network structure from single-nucleotide variant data.

Availability and Implementation: VCF2Networks is available at https://bitbucket.org/dalloliogm/vcf2networks.

Contact: giovanni.dallolio@kcl.ac.uk

Supplementary Information: Supplementary data are available at Bioinformatics online.

Source
http://dx.doi.org/10.1093/bioinformatics/btu650 (DOI)
February 2015

Population and genomic lessons from genetic analysis of two Indian populations.

Hum Genet 2014 Oct 1;133(10):1273-87. Epub 2014 Jul 1.

Department of Genetics, University of Delhi South Campus, New Delhi, 110 021, India.

Indian demographic history includes special features such as founder effects, interpopulation segregation, complex social structure with a caste system and elevated frequency of consanguineous marriages. It also presents a higher frequency of some rare Mendelian disorders and, in the last two decades, an increased prevalence of some complex disorders. Despite the fact that India represents about one-sixth of the human population, deep genetic studies from this terrain have been scarce. In this study, we analyzed high-density genotyping and whole-exome sequencing data of a North and a South Indian population. Indian populations show higher differentiation levels than those reported between populations of other continents. In this work, we have analyzed the consequences of this differentiation by specifically assessing the transferability of genetic markers from or to Indian populations. We show that there is limited genetic marker portability from available genetic resources such as HapMap or the 1000 Genomes Project to Indian populations, which also present an excess of private rare variants. Conversely, tagSNPs show a high level of portability between the two Indian populations, in contrast to the common belief that North and South Indian populations are genetically very different. By estimating kinship from mates and consanguinity in our data from trios, we also describe different patterns of assortative mating and inbreeding in the two populations, in agreement with distinct mating preferences and social structures. In addition, this analysis has allowed us to describe genomic regions under recent adaptive selection, indicating differential adaptive histories for North and South Indian populations. Our findings highlight the importance of considering demography for design and analysis of genetic studies, as well as the need for extending human genetic variation catalogs to new populations, especially those with particular demographic histories.

Source
http://dx.doi.org/10.1007/s00439-014-1462-0 (DOI)
October 2014

Human genome variation and the concept of genotype networks.

PLoS One 2014 9;9(6):e99424. Epub 2014 Jun 9.

Institut de Biologia Evolutiva, CSIC-Universitat Pompeu Fabra, Barcelona, Catalonia, Spain; Universitat Autònoma de Barcelona, Barcelona, Spain.

Genotype networks are a concept used in systems biology to study sets of genotypes having the same phenotype, and the ability of these to bring forth novel phenotypes. In the past they have been applied to determine the genetic heterogeneity, and stability to mutations, of systems such as metabolic networks and RNA folds. Recently, they have been the basis for reconciling the neutralist and selectionist views on evolution. Here, we adapted this concept to the study of population genetics data. Specifically, we applied genotype networks to the human 1000 Genomes dataset, and analyzed networks composed of short haplotypes of Single Nucleotide Variants (SNV). The result is a scan of how properties related to genetic heterogeneity and stability to mutations are distributed along the human genome. We found that genes involved in acquired immunity, such as some HLA and MHC genes, tend to have the most heterogeneous and connected networks, and that coding regions tend to be more heterogeneous and stable to mutations than non-coding regions. We also found, using coalescent simulations, that regions under selection have more extended and connected networks. The application of the concept of genotype networks can provide a new opportunity to understand the evolutionary processes that shaped our genome. Learning how the genotype space of each region of our genome has been explored during the evolutionary history of the human species can lead to a better understanding of how selective pressures and neutral factors have shaped genetic diversity within populations and among individuals. Combined with the availability of larger datasets of sequencing data, genotype networks represent a new approach to the study of human genetic diversity that looks at the whole genome, and goes beyond the classical division between selection and neutrality methods.

Source
http://journals.plos.org/plosone/article?id=10.1371/journal.pone.0099424 (PLOS)
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC4049842 (PMC)
October 2015

Convergent evolution in European and Rroma populations reveals pressure exerted by plague on Toll-like receptors.

Proc Natl Acad Sci U S A 2014 Feb 3;111(7):2668-73. Epub 2014 Feb 3.

Institut de Biologia Evolutiva (Consejo Superior de Investigaciones Cientificas-Universitat Pompeu Fabra), Universitat Pompeu Fabra, 08003 Barcelona, Spain.

Recent historical periods in Europe have been characterized by severe epidemic events such as plague, smallpox, or influenza that shaped the immune system of modern populations. This study aims to identify signals of convergent evolution of the immune system, based on the peculiar demographic history in which two populations with different genetic ancestry, Europeans and Rroma (Gypsies), have lived in the same geographic area and have been exposed to similar environments, including infections, during the last millennium. We identified several genes under evolutionary pressure in European/Romanian and Rroma/Gypsy populations, but not in a Northwest Indian population, the geographic origin of the Rroma. Genes in the immune system were highly represented among those under strong evolutionary pressures in Europeans, and infections are likely to have played an important role. For example, the Toll-like receptor 1 (TLR1)/TLR6/TLR10 gene cluster showed a strong signal of adaptive selection. Their gene products are functional receptors for Yersinia pestis, the agent of plague and one possible infection that may have exerted evolutionary pressures, as shown by overexpression studies demonstrating induction of proinflammatory cytokines such as TNF, IL-1β, and IL-6. Immunogenetic analysis showed that TLR1, TLR6, and TLR10 single-nucleotide polymorphisms modulate Y. pestis-induced cytokine responses. Other infections may also have played an important role. Thus, reconstruction of evolutionary history of European populations has identified several immune pathways, among them TLR1/TLR6/TLR10, as being shaped by convergent evolution in two human populations with different origins under the same infectious environment.

Source
http://dx.doi.org/10.1073/pnas.1317723111 (DOI)
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3932890 (PMC)
February 2014

1000 Genomes Selection Browser 1.0: a genome browser dedicated to signatures of natural selection in modern humans.

Nucleic Acids Res 2014 Jan 25;42(Database issue):D903-9. Epub 2013 Nov 25.

Program for Population Genetics, Institute of Evolutionary Biology (CSIC-Universitat Pompeu Fabra), 08003 Barcelona, Spain, Population Genomics Node, National Institute for Bioinformatics (INB), Universitat Pompeu Fabra, 08003 Barcelona, Spain, Institute of Molecular Biology and Biotechnology-FORTH, Heraklion, Crete GR 700 13, Greece and Department of Evolutionary Genetics, Max Planck Institute for Evolutionary Anthropology, 04103 Leipzig, Germany.

Searching for Darwinian selection in natural populations has been the focus of a multitude of studies over the last decades. Here we present the 1000 Genomes Selection Browser 1.0 (http://hsb.upf.edu) as a resource for signatures of recent natural selection in modern humans. We have implemented and applied a large number of neutrality tests as well as summary statistics informative for the action of selection, such as Tajima's D, CLR, Fay and Wu's H, Fu and Li's F* and D*, XPEHH, ΔiHH, iHS, F(ST), ΔDAF and XPCLR among others, to low coverage sequencing data from the 1000 Genomes Project (Phase 1; release April 2012). We have implemented a publicly available genome-wide browser to communicate the results from three different populations of West African, Northern European and East Asian ancestry (YRI, CEU, CHB). Information is provided in UCSC-style format to facilitate the integration with the rich UCSC browser tracks, and an access page with instructions is provided for convenient visualization. We believe that this expandable resource will facilitate the interpretation of signals of selection on different temporal, geographical and genomic scales.

Source
http://dx.doi.org/10.1093/nar/gkt1188 (DOI)
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3965045 (PMC)
January 2014

Metabolic flux is a determinant of the evolutionary rates of enzyme-encoding genes.

Evolution 2014 Feb 19;68(2):605-13. Epub 2013 Sep 19.

Institute of Evolutionary Biology (CSIC-Pompeu Fabra University), CEXS-UPF-PRBB, Dr. Aiguader 88, 08003 Barcelona, Catalonia, Spain.

Relationships between evolutionary rates and gene properties on a genomic, functional, pathway, or system level are being explored to unravel the principles of the evolutionary process. In particular, functional network properties have been analyzed to recognize the constraints they may impose on the evolutionary fate of genes. Here we took as a case study the core metabolic network in human erythrocytes and we analyzed the relationship between the evolutionary rates of its genes and the metabolic flux distribution throughout it. We found that metabolic flux correlates with the ratio of nonsynonymous to synonymous substitution rates. Genes encoding enzymes that carry high fluxes have been more constrained in their evolution, while purifying selection is more relaxed in genes encoding enzymes carrying low metabolic fluxes. These results demonstrate the importance of considering the dynamical functioning of gene networks when assessing the action of selection on system-level properties.

Source
http://dx.doi.org/10.1111/evo.12262 (DOI)
February 2014

Great ape genetic diversity and population history.

Nature 2013 Jul 3;499(7459):471-5. Epub 2013 Jul 3.

Institut de Biologia Evolutiva, CSIC-Universitat Pompeu Fabra, PRBB, Doctor Aiguader 88, Barcelona, Catalonia 08003, Spain.

Most great ape genetic variation remains uncharacterized; however, its study is critical for understanding population history, recombination, selection and susceptibility to disease. Here we sequence to high coverage a total of 79 wild- and captive-born individuals representing all six great ape species and seven subspecies and report 88.8 million single nucleotide polymorphisms. Our analysis provides support for genetically distinct populations within each species, signals of gene flow, and the split of common chimpanzees into two distinct groups: Nigeria-Cameroon/western and central/eastern populations. We find extensive inbreeding in almost all wild populations, with eastern gorillas being the most extreme. Inferred effective population sizes have varied radically over time in different lineages and this appears to have a profound effect on the genetic diversity at, or close to, genes in almost all species. We discover and assign 1,982 loss-of-function variants throughout the human and great ape lineages, determining that the rate of gene loss has not been different in the human branch compared to other internal branches in the great ape phylogeny. This comprehensive catalogue of great ape genome diversity provides a framework for understanding evolution and a resource for more effective management of wild and captive great ape populations.

Source
http://dx.doi.org/10.1038/nature12228 (DOI)
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3822165 (PMC)
July 2013

A system-level, molecular evolutionary analysis of mammalian phototransduction.

BMC Evol Biol 2013 Feb 23;13:52. Epub 2013 Feb 23.

Institute of Evolutionary Biology (CSIC-Universitat Pompeu Fabra), CEXS-UPF-PRBB, Barcelona, Catalonia, Spain.

Background: Visual perception is initiated in the photoreceptor cells of the retina via the phototransduction system. This system has shown marked evolution during mammalian divergence in such complex attributes as activation time and recovery time. We have performed a molecular evolutionary analysis of proteins involved in mammalian phototransduction in order to unravel how the action of natural selection has been distributed throughout the system to evolve such traits.

Results: We found selective pressures to be non-randomly distributed according to both a simple protein classification scheme and a protein-interaction network representation of the signaling pathway. Proteins which are topologically central in the signaling pathway, such as the G proteins, as well as retinoid cycle chaperones and proteins involved in photoreceptor cell-type determination, were found to be more constrained in their evolution. Proteins peripheral to the pathway, such as ion channels and exchangers, as well as the retinoid cycle enzymes, have experienced a relaxation of selective pressures. Furthermore, signals of positive selection were detected in two genes: the short-wave (blue) opsin (OPN1SW) in hominids and the rod-specific Na+/Ca2+, K+ ion exchanger (SLC24A1) in rodents.

Conclusions: The functions of the proteins involved in phototransduction and the topology of the interactions between them have imposed non-random constraints on their evolution. Thus, in shaping or conserving system-level phototransduction traits, natural selection has targeted the underlying proteins in a concerted manner.

Source
http://dx.doi.org/10.1186/1471-2148-13-52 (DOI)
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3616935 (PMC)
February 2013

Distribution of events of positive selection and population differentiation in a metabolic pathway: the case of asparagine N-glycosylation.

BMC Evol Biol 2012 Jun 25;12:98. Epub 2012 Jun 25.

IBE, Institut de Biologia Evolutiva (UPF-CSIC), Parc de Recerca Biomèdica de Barcelona (PRBB), Dr. Aiguader, 88, 08003, Barcelona, Catalonia, Spain.

Background: Asparagine N-Glycosylation is one of the most important forms of protein post-translational modification in eukaryotes. This metabolic pathway can be subdivided into two parts: an upstream sub-pathway required for achieving proper folding for most of the proteins synthesized in the secretory pathway, and a downstream sub-pathway required to give variability to trans-membrane proteins, and involved in adaptation to the environment and innate immunity. Here we analyze the nucleotide variability of the genes of this pathway in human populations, identifying which genes show greater population differentiation and which genes show signatures of recent positive selection. We also compare how these signals are distributed between the upstream and the downstream parts of the pathway, with the aim of exploring how forces of population differentiation and positive selection vary among genes involved in the same metabolic pathway but subject to different functional constraints.

Results: Our results show that genes in the downstream part of the pathway are more likely to show a signature of population differentiation, while events of positive selection are equally distributed between the two parts of the pathway. Moreover, events of positive selection are frequent in genes that are known to be at bifurcation points and that a network-level analysis identifies as occupying key positions, such as MGAT3 and GCS1.

Conclusions: These findings indicate that the upstream part of the Asparagine N-Glycosylation pathway has lower diversity among populations, while the downstream part is freer to tolerate diversity among populations. Moreover, the distribution of signatures of population differentiation and positive selection can change between parts of a pathway, especially between parts that are exposed to different functional constraints. Our results support the hypothesis that genes involved in constitutive processes can be expected to show lower population differentiation, while genes involved in traits related to the environment should show higher variability. Taken together, this work broadens our knowledge of how events of population differentiation and of positive selection are distributed among different parts of a metabolic pathway.

Source
http://dx.doi.org/10.1186/1471-2148-12-98 (DOI)
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3426484 (PMC)
June 2012

Network-level and population genetics analysis of the insulin/TOR signal transduction pathway across human populations.

Mol Biol Evol 2012 May 1;29(5):1379-92. Epub 2011 Dec 1.

Institute of Evolutionary Biology CEXS-UPF-PRBB, Barcelona, Catalonia, Spain.

Genes and proteins rarely act in isolation; rather, they operate as components of complex networks of interacting molecules. Therefore, for understanding their evolution, it may be helpful to take into account the interaction networks in which they participate. It has been shown that selective constraints acting on genes depend on the position that they occupy in the network. Less understood is how the impact of local adaptation at the intraspecific level is affected by the network structure. Here, we analyzed the patterns of molecular evolution of 67 genes involved in the insulin/target of rapamycin (TOR) signal transduction pathway. This well-characterized pathway plays a key role in fundamental processes such as energetic metabolism, growth, reproduction, and aging and is involved in metabolic disorders such as obesity, insulin resistance, and diabetes. For that purpose, we combined genotype data from worldwide human populations with current knowledge of the structure and function of the pathway. We identified the footprint of recent positive selection in nine of the studied genomic regions. Most of the adaptation signals were observed among Middle East and North African, European, and Central South Asian populations. We found that positive selection preferentially targets the most central elements in the pathway, in contrast to previous observations in the whole human interactome. This observation indicates that the impact of positive selection on genes involved in the insulin/TOR pathway is affected by the pathway structure.

Source
http://dx.doi.org/10.1093/molbev/msr298 (DOI)
May 2012

A targeted association study of immunity genes and networks suggests novel associations with placental malaria infection.

PLoS One 2011 19;6(9):e24996. Epub 2011 Sep 19.

Institute of Evolutionary Biology (UPF-CSIC), CEXS-UPF-PRBB, Barcelona, Catalonia, Spain.

A large proportion of the death toll associated with malaria is a consequence of malaria infection during pregnancy, causing up to 200,000 infant deaths annually. We previously published the first extensive genetic association study of placental malaria infection, and here we extend this analysis considerably, investigating genetic variation in over 9,000 SNPs in more than 1,000 genes involved in immunity and inflammation for their involvement in susceptibility to placental malaria infection. We applied a new approach incorporating results from both single gene analysis as well as gene-gene interactions on a protein-protein interaction network. We found suggestive associations of variants in the gene KLRK1 in the single gene analysis, as well as evidence for associations of multiple members of the IL-7/IL-7R signalling cascade in the combined analysis. To our knowledge, this is the first large-scale genetic study on placental malaria infection to date, opening the door for follow-up studies trying to elucidate the genetic basis of this neglected form of malaria.

Source
http://journals.plos.org/plosone/article?id=10.1371/journal.pone.0024996 (PLOS)
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3176307 (PMC)
February 2012