Publications by authors named "Sergey V Nuzhdin"

110 Publications

Interspecific Sample Prioritization Can Improve QTL Detection With Tree-Based Predictive Models.

Front Genet 2021 Sep 6;12:684882. Epub 2021 Sep 6.

Department of Biological Sciences, University of Southern California, Los Angeles, CA, United States.

Due to increasing demand for new advanced crops, considerable efforts have been made to improve stress and disease resistance traits in cultivars through the study of wild crops. When both wild and interspecific hybrid materials are available, a common approach has been to study the two types of materials separately and simply compare the quantitative trait locus (QTL) regions. However, combining the two types of materials can potentially create a more efficient method of finding predictive QTLs. In this simulation study, we focused on scenarios involving causal marker expression suppressed by regulatory mechanisms, where the otherwise easily lost associated signals benefit the most from combining the two types of data. A probabilistic sampling approach was used to prioritize consistent genotypic-phenotypic patterns across both types of data sets. We chose random forest and gradient boosting to apply the prioritization scheme and found that both facilitated the investigation of predictive causal markers in most of the biological scenarios simulated.

Source
DOI: http://dx.doi.org/10.3389/fgene.2021.684882
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC8450460
September 2021

Genotyping and lipid profiling of 601 cultivated sunflower lines reveals novel genetic determinants of oil fatty acid content.

BMC Genomics 2021 Jul 5;22(1):505. Epub 2021 Jul 5.

Department of Biological Sciences, University of Southern California, Los Angeles, CA, 90089, USA.

Background: Sunflower is an important oilseed crop domesticated in North America approximately 4000 years ago. During the last century, oil content in sunflower was under strong selection. Further improvement of oil properties achieved by modulating its fatty acid composition is one of the main directions in modern oilseed crop breeding.

Results: We searched for the genetic basis of fatty acid content variation by genotyping 601 inbred sunflower lines and assessing their lipid and fatty acid composition. Our genome-wide association analysis based on the genotypes for 15,483 SNPs and the concentrations of 23 fatty acids, including minor fatty acids, revealed significant genetic associations for eleven of them. Identified genomic regions included the loci involved in rare fatty acids variation on chromosomes 3 and 14, explaining up to 34.5% of the total variation of docosanoic acid (22:0) in sunflower oil.

Conclusions: This is the first large scale implementation of high-throughput lipidomic profiling to sunflower germplasm characterization. This study contributes to the genetic characterization of Russian sunflower collections, which made a substantial contribution to the development of sunflower as the oilseed crop worldwide, and provides new insights into the genetic control of oil composition that can be implemented in future studies.

Source
DOI: http://dx.doi.org/10.1186/s12864-021-07768-y
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC8256595
July 2021

Genomic diversity and genome-wide association analysis related to yield and fatty acid composition of wild American oil palm.

Plant Sci 2021 Mar 24;304:110731. Epub 2020 Oct 24.

Malaysian Palm Oil Board, 6, Persiaran Institusi, Bandar Baru Bangi, Kajang, Selangor, 43000, Malaysia.

Existing Elaeis guineensis cultivars lack sufficient genetic diversity due to extensive breeding. Harnessing variation in wild crop relatives is necessary to expand the breadth of agronomically valuable traits. Using RAD sequencing, we examine the natural diversity of wild American oil palm populations (Elaeis oleifera), a sister species of the cultivated Elaeis guineensis oil palm. We genotyped 192 wild E. oleifera palms collected from seven Latin American countries along with four cultivated E. guineensis palms. Honduras, Costa Rica, Panama and Colombia palms are panmictic and genetically similar. Genomic patterns of diversity suggest that these populations likely originated from the Amazon Basin. Despite evidence of a genetic bottleneck and high inbreeding observed in these populations, there is considerable genetic and phenotypic variation for agronomically valuable traits. Genome-wide association revealed several candidate genes associated with fatty acid composition along with vegetative and yield-related traits. These observations provide valuable insight into the geographic distribution of diversity, phenotypic variation and its genetic architecture that will guide choices of wild genotypes for crop improvement.

Source
DOI: http://dx.doi.org/10.1016/j.plantsci.2020.110731
March 2021

Key Considerations for the Use of Seaweed to Reduce Enteric Methane Emissions From Cattle.

Front Vet Sci 2020 Dec 23;7:597430. Epub 2020 Dec 23.

Foundation for Food and Agriculture Research, Washington, DC, United States.

Enteric methane emissions are the single largest source of direct greenhouse gas emissions (GHG) in beef and dairy value chains and a substantial contributor to anthropogenic methane emissions globally. In late 2019, the World Wildlife Fund (WWF), the Advanced Research Projects Agency-Energy (ARPA-E) and the Foundation for Food and Agriculture Research (FFAR) convened approximately 50 stakeholders representing research and production of seaweeds, animal feeds, dairy cattle, and beef and dairy foods to discuss challenges and opportunities associated with the use of seaweed-based ingredients to reduce enteric methane emissions. This article describes the considerations identified by the workshop participants and suggests next steps for the further development and evaluation of seaweed-based feed ingredients as enteric methane mitigants. Although numerous compounds derived from sources other than seaweed have been identified as having enteric methane mitigation potential, these mitigants are outside the scope of this article.

Source
DOI: http://dx.doi.org/10.3389/fvets.2020.597430
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC7785520
December 2020

Expression of fatty acid and triacylglycerol synthesis genes in interspecific hybrids of oil palm.

Sci Rep 2020 10 1;10(1):16296. Epub 2020 Oct 1.

Advanced Biotechnology and Breeding Centre, Malaysian Palm Oil Board (MPOB), P.O. Box 10620, 50720, Kuala Lumpur, Malaysia.

Evaluation of transcriptome data in combination with QTL information has been applied in many crops to study the expression of genes responsible for specific phenotypes. In oil palm, the mesocarp oil extracted from E. oleifera × E. guineensis interspecific hybrids is known to have lower palmitic acid (C16:0) content compared to pure African palms. The present study demonstrates the effectiveness of transcriptome data in revealing the expression profiles of genes in the fatty acid (FA) and triacylglycerol (TAG) biosynthesis processes in interspecific hybrids. The transcriptome assembly yielded 43,920 putative genes of which a large proportion were homologous to known genes in the public databases. Most of the genes encoding key enzymes involved in the FA and TAG synthesis pathways were identified. Of these, 27, including two candidate genes located within the QTL associated with C16:0 content, showed differential expression between developmental stages, populations and/or palms with contrasting C16:0 content. Further evaluation using quantitative real-time PCR revealed that differentially expressed patterns are generally consistent with those observed in the transcriptome data. Our results also suggest that different isoforms are likely to be responsible for some of the variation observed in FA composition of interspecific hybrids.

Source
DOI: http://dx.doi.org/10.1038/s41598-020-73170-5
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC7529811
October 2020

Multi-trait multi-locus SEM model discriminates SNPs of different effects.

BMC Genomics 2020 Jul 28;21(Suppl 8):490. Epub 2020 Jul 28.

Peter the Great Saint-Petersburg Polytechnic University, Russian Federation, Polytechnicheskaya, 29, St. Petersburg, 195251, Russia.

Background: There is a plethora of methods for genome-wide association studies. However, only a few of them may be classified as multi-trait and multi-locus, i.e. consider the influence of multiple genetic variants on several correlated phenotypes.

Results: We propose a multi-trait multi-locus model which employs structural equation modeling (SEM) to describe complex associations between SNPs and traits - multi-trait multi-locus SEM (mtmlSEM). The structure of our model makes it possible to discriminate pleiotropic and single-trait SNPs of direct and indirect effect. We also propose an automatic procedure to construct the model using factor analysis and the maximum likelihood method. For estimating a large number of parameters in the model, we performed Bayesian inference and implemented Gibbs sampling. An important feature of the model is that it correctly copes with non-normally distributed variables, such as some traits and variants.

Conclusions: We applied the model to Vavilov's collection of 404 chickpea (Cicer arietinum L.) accessions with 20-fold cross-validation. We analyzed 16 phenotypic traits which we organized into five groups and found around 230 SNPs associated with traits, 60 of which were of pleiotropic effect. The model demonstrated high accuracy in predicting trait values.
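
The 20-fold cross-validation over 404 accessions mentioned above can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the shuffling seed, and the round-robin fold assignment are assumptions made here for illustration, and only the standard library is used.

```python
import random


def k_fold_indices(n_samples, k, seed=0):
    """Split range(n_samples) into k shuffled folds of near-equal size."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Fold i takes every k-th shuffled index starting at position i,
    # so fold sizes differ by at most one.
    return [indices[i::k] for i in range(k)]


def train_test_splits(n_samples, k, seed=0):
    """Yield (train, test) index lists, holding out each fold once."""
    folds = k_fold_indices(n_samples, k, seed)
    for i, test in enumerate(folds):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test


if __name__ == "__main__":
    # 404 accessions and 20 folds, as in the abstract; fit and evaluate
    # a model inside this loop (the model itself is outside this sketch).
    for train, test in train_test_splits(404, 20):
        print(len(train), len(test))
```

Each accession is held out exactly once, so every trait prediction is made by a model that never saw that accession during fitting.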

Source
DOI: http://dx.doi.org/10.1186/s12864-020-06833-2
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC7385891
July 2020

Genomic Analysis of Vavilov's Historic Chickpea Landraces Reveals Footprints of Environmental and Human Selection.

Int J Mol Sci 2020 May 31;21(11). Epub 2020 May 31.

Department of Applied Mathematics, Peter the Great St. Petersburg Polytechnic University, 195251 St. Petersburg, Russia.

A defining challenge of the 21st century is meeting the nutritional demands of the growing human population, under a scenario of limited land and water resources and under the specter of climate change. The Vavilov seed bank contains numerous landraces collected nearly a hundred years ago, and thus may contain 'genetic gems' with the potential to enhance modern breeding efforts. Here, we analyze 407 landraces, sampled from major historic centers of chickpea cultivation and secondary diversification. Genome-Wide Association Studies (GWAS) conducted on both phenotypic traits and bioclimatic variables at landrace sampling sites as extended phenotypes resulted in 84 GWAS hits associated with various regions. The novel haploblock-based test identified haploblocks enriched for single nucleotide polymorphisms (SNPs) associated with phenotypes and bioclimatic variables. Subsequent bi-clustering of traits sharing enriched haploblocks underscored both non-random distribution of SNPs among several haploblocks and their association with multiple traits. We hypothesize that these clusters of pleiotropic SNPs represent co-adapted genetic complexes to a range of environmental conditions that chickpea experienced during domestication and subsequent geographic radiation. Linking genetic variation to phenotypic data and a wealth of historic information preserved in historic seed banks are the keys for genome-based and environment-informed breeding intensification.

Source
DOI: http://dx.doi.org/10.3390/ijms21113952
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC7313079
May 2020

The Post-mating Response: Gene Expression and Behavioral Changes Reveal Perdurance and Variation in Cross-Tissue Interactions.

G3 (Bethesda) 2020 03 5;10(3):967-983. Epub 2020 Mar 5.

Department of Biomedical Sciences, College of Medicine, Florida State University, Tallahassee, FL and

Examining cross-tissue interactions is important for understanding physiology and homeostasis. In animals, the female gonad produces signaling molecules that act distally. We examine gene expression in female head tissues in 1) virgins without a germline compared to virgins with a germline, 2) post-mated females with and without a germline compared to virgins, and 3) post-mated females mated to males with and without a germline compared to virgins. In virgins, the absence of a female germline results in expression changes in genes with known roles in nutrient homeostasis. At one- and three-day(s) post-mating, genes that change expression are enriched with those that function in metabolic pathways, in all conditions. We systematically examine female post-mating impacts on sleep, food preference and re-mating, in the strains and time points used for gene expression analyses and compare to published studies. We show that post-mating, gene expression changes vary by strain, prompting us to examine variation in female re-mating. We perform a genome-wide association study that identifies several DNA polymorphisms, including four in/near Wnt signaling pathway genes. Together, these data reveal how gene expression and behavior in females are influenced by cross-tissue interactions, by examining the impact of mating, fertility, and genotype.

Source
DOI: http://dx.doi.org/10.1534/g3.119.400963
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC7056969
March 2020

Compensatory Evolution of Gene Expression.

Trends Genet 2019 12 20;35(12):890-891. Epub 2019 Oct 20.

Molecular and Computational Biology, Department of Biology, University of Southern California, Los Angeles, CA 90089, USA.

Source
DOI: http://dx.doi.org/10.1016/j.tig.2019.09.008
December 2019

Multi-trait analysis of domestication genes in Cicer arietinum - Cicer reticulatum hybrids with a multidimensional approach: Modeling wide crosses for crop improvement.

Plant Sci 2019 Aug 25;285:122-131. Epub 2019 Apr 25.

University of Southern California, Program Molecular & Computational Biology, Dornsife College of Letters Arts & Science, Los Angeles, CA 90089, USA; Peter the Great St Petersburg Polytechnic University, Department of Applied Mathematics, St Petersburg, Russia.

Domestication and subsequent breeding have eroded genetic diversity in the modern chickpea crop by ~100-fold. Corresponding reductions to trait variation create the need, and an opportunity, to identify and harness the genetic capacity of wild species for crop improvement. Here we analyze trait segregation in a series of wild x cultivated hybrid populations to delineate the genetic underpinnings of domestication traits. Two species of wild chickpea, C. reticulatum and C. echinospermum, were crossed with the elite, early flowering C. arietinum cultivar ICCV96029. KASP genotyping of F2 parents with an FT-linked molecular marker enabled selection of 284 F3 families with reduced phenological variation: 255 F3 families of C. arietinum x reticulatum (AR) derived from 17 diverse wild parents and 29 F3 families of C. arietinum x echinospermum (AE) from 3 wild parents. The combined 284 lineages were genotyped using a genotyping-by-sequencing strategy and phenotyped for agronomic traits. 50 QTLs in 11 traits were detected from AR and 35 QTLs in 10 traits from the combined data. Using hierarchical clustering to assign traits to six correlated groups and mixed model based multi-trait mapping, four pleiotropic loci were identified. Bayesian analysis further identified four inter-trait relationships controlling the duration of vegetative growth and seed maturation, for which the underlying pleiotropic genes were mapped. A random forest approach was used to explore the most extreme trait differences between AR and AE progenies, identifying traits most characteristic of wild species origin. Knowledge of the genomic basis of traits that segregate in wild-cultivated hybrid populations will facilitate chickpea improvement by linking genetic and phenotypic variation in a quantitative genetic framework.

Source
DOI: http://dx.doi.org/10.1016/j.plantsci.2019.04.018
August 2019

Bayesian model selection for the Drosophila gap gene network.

BMC Bioinformatics 2019 Jun 13;20(1):327. Epub 2019 Jun 13.

Molecular and Computational Biology, USC, 1050 Childs Way, Los Angeles, CA 90089-2532, US.

Background: The gap gene system controls the early cascade of the segmentation pathway in Drosophila melanogaster as well as other insects. Owing to its tractability and key role in embryo patterning, this system has been the focus for both computational modelers and experimentalists. The gap gene expression dynamics can be considered strictly as a one-dimensional process and modeled as a system of reaction-diffusion equations. While substantial progress has been made in modeling this phenomenon, there still remains a deficit of approaches to evaluate competing hypotheses. Most of the model development has happened in isolation and there has been little attempt to compare candidate models.

Results: The Bayesian framework offers a means of doing formal model evaluation. Here, we demonstrate how this framework can be used to compare different models of gene expression. We focus on the Papatsenko-Levine formalism, which exploits a fractional occupancy based approach to incorporate activation of the gap genes by the maternal genes and cross-regulation by the gap genes themselves. The Bayesian approach provides insight into the relationships between system parameters. In the regulatory pathway of segmentation, the parameters for number of binding sites and binding affinity have a negative correlation. The model selection analysis supports a stronger binding affinity for Bicoid compared to other regulatory edges, as shown by a larger posterior mean. The procedure does not show support for activation of Kruppel by Bicoid.

Conclusions: We provide an efficient solver for the general representation of the Papatsenko-Levine model. We also demonstrate the utility of the Bayes factor for evaluating candidate spatial patterning models. In addition, by using the parallel tempering sampler, the convergence of Markov chains can be remarkably improved and robust estimates of Bayes factors obtained.
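
For readers unfamiliar with the Bayes factor used in this abstract, a minimal sketch of the arithmetic follows. It assumes the log marginal likelihoods (log evidences) of two candidate models have already been estimated by some sampler; the function names and the numeric values in the example are hypothetical and are not taken from the paper.

```python
import math


def log_bayes_factor(log_evidence_a, log_evidence_b):
    """Log Bayes factor of model A over model B: log p(D|A) - log p(D|B)."""
    return log_evidence_a - log_evidence_b


def posterior_model_probability(log_evidence_a, log_evidence_b, prior_a=0.5):
    """Posterior probability of model A when only A and B are considered."""
    log_odds = (log_bayes_factor(log_evidence_a, log_evidence_b)
                + math.log(prior_a) - math.log(1.0 - prior_a))
    # Logistic transform of the posterior log odds, written so that
    # math.exp never receives a large positive argument.
    if log_odds >= 0:
        return 1.0 / (1.0 + math.exp(-log_odds))
    odds = math.exp(log_odds)
    return odds / (1.0 + odds)


if __name__ == "__main__":
    # Hypothetical log evidences for two candidate regulatory models.
    print(log_bayes_factor(-100.0, -103.0))
    print(posterior_model_probability(-100.0, -103.0))
```

Working on the log scale avoids underflow, since marginal likelihoods of models fitted to many data points are usually far too small to represent directly.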

Source
DOI: http://dx.doi.org/10.1186/s12859-019-2888-0
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6567646
June 2019

WhoGEM: an admixture-based prediction machine accurately predicts quantitative functional traits in plants.

Genome Biol 2019 05 28;20(1):106. Epub 2019 May 28.

University of La Verne, 1950 3rd Street, La Verne, CA, 91750, USA.

The explosive growth of genomic data provides an opportunity to make increased use of sequence variations for phenotype prediction. We have developed a prediction machine for quantitative phenotypes (WhoGEM) that overcomes some of the bottlenecks limiting the current methods. We demonstrated its performance by predicting quantitative disease resistance and quantitative functional traits in the wild model plant species, Medicago truncatula, using geographical locations as covariates for admixture analysis. The method's prediction reliability equals or outperforms all existing algorithms for quantitative phenotype prediction. WhoGEM analysis produces evidence that variation in genome admixture proportions explains most of the phenotypic variation for quantitative phenotypes.

Source
DOI: http://dx.doi.org/10.1186/s13059-019-1697-0
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6537182
May 2019

Novel approach to quantitative spatial gene expression uncovers genetic stochasticity in the developing Drosophila eye.

Evol Dev 2019 05 12;21(3):157-171. Epub 2019 Feb 12.

Molecular and Computational Biology, University of Southern California, Los Angeles, California.

Robustness in development allows for the accumulation of genetically based variation in expression. However, this variation is usually examined in response to large perturbations, and examination of this variation has been limited to being spatial, or quantitative, but because of technical restrictions not both. Here we bridge these gaps by investigating replicated quantitative spatial gene expression using rigorous statistical models, in different genotypes, sexes, and species (Drosophila melanogaster and D. simulans). Using this type of quantitative approach with molecular developmental data allows for comparison among conditions, such as different genetic backgrounds. We apply this approach to the morphogenetic furrow, a wave of differentiation that patterns the developing eye disc. Within the morphogenetic furrow, we focus on four genes, hairy, atonal, hedgehog, and Delta. Hybridization chain reaction quantitatively measures spatial gene expression, co-staining for all four genes simultaneously. We find considerable variation in the spatial expression pattern of these genes in the eye between species, genotypes, and sexes. We also find that there has been evolution of the regulatory relationship between these genes, and that their spatial interrelationships have evolved between species. This variation has no phenotypic effect, and could be buffered by network thresholds or compensation from other genes. Both of these mechanisms could potentially be contributing to long term developmental systems drift.

Source
DOI: http://dx.doi.org/10.1111/ede.12283
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC7461728
May 2019

Inference of Transcription Factor Regulation Patterns Using Gene Expression Covariation in Natural Populations of .

Biophysics (Oxf) 2018 Jan 23;63(1):43-51. Epub 2018 Apr 23.

University of Southern California, Los Angeles, CA.

Gene regulatory networks control the complex programs that drive development. Deciphering the connections between transcription factors (TFs) and target genes is challenging, in part because TFs bind to thousands of places in the genome but control expression through a subset of these binding events. We hypothesize that we can combine natural variation of expression levels and predictions of TF binding sites to identify TF targets. We gather RNA-seq data from 71 genetically distinct F1 embryos and calculate the correlations between TF and potential target genes' expression levels, which we call "regulatory strength." To separate direct and indirect TF targets, we hypothesize that direct TF targets will have a preponderance of binding sites in their upstream regions. Using 14 TFs active during embryogenesis, we find that 12 TFs showed a significant correlation between their binding strength and regulatory strength on downstream targets, and 10 TFs showed a significant correlation between the number of binding sites and the regulatory effect on target genes. The general roles, e.g. 's role as an activator, and the particular interactions we observed between our TFs, e.g. role as a repressor of and , generally coincide with the literature.
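
The "regulatory strength" described above is a correlation between a TF's expression and a candidate target's expression across samples. The sketch below computes such a value under the assumption that a Pearson correlation is meant; the abstract does not name the correlation type, and the function name and toy expression values are hypothetical.

```python
import math


def pearson_correlation(x, y):
    """Pearson correlation between two equal-length expression vectors."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two vectors of equal length >= 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    # Sums of centered cross-products and squares; the 1/(n-1) factors cancel.
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant vector")
    return cov / math.sqrt(var_x * var_y)


if __name__ == "__main__":
    # Toy expression levels for one TF and one candidate target across
    # five samples (the study used 71 F1 embryos; these numbers are made up).
    tf_expression = [2.1, 3.4, 1.8, 4.0, 2.9]
    target_expression = [5.0, 7.2, 4.1, 8.3, 6.0]
    print(pearson_correlation(tf_expression, target_expression))
```

A positive value would be read as consistent with activation and a negative value with repression, subject to the direct-versus-indirect caveat the abstract raises.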

Source
DOI: http://dx.doi.org/10.1134/S0006350918010128
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6368187
January 2018

Threshold response to stochasticity in morphogenesis.

PLoS One 2019 Jan 30;14(1):e0210088. Epub 2019 Jan 30.

Department of Physics and Astronomy, University of Southern California, Los Angeles, California, United States of America.

During development of biological organisms, multiple complex structures are formed. In many instances, these structures need to exhibit a high degree of order to be functional, although many of their constituents are intrinsically stochastic. Hence, it has been suggested that biological robustness ultimately must rely on complex gene regulatory networks and clean-up mechanisms. Here we explore developmental processes that have evolved inherent robustness against stochasticity. In the context of the Drosophila eye disc, multiple optical units, ommatidia, develop into crystal-like patterns. During the larva-to-pupa stage of metamorphosis, the centers of the ommatidia are specified initially through the diffusion of morphogens, followed by the specification of R8 cells. Establishing the R8 cell is crucial in setting up the geometric, and functional, relationships of cells within an ommatidium and among neighboring ommatidia. Here we study a PDE mathematical model of these spatio-temporal processes in the presence of parametric stochasticity, defining and applying measures that quantify order within the resulting spatial patterns. We observe a universal sigmoidal response to increasing transcriptional noise. Ordered patterns persist up to a threshold noise level in the model parameters. In accordance with prior qualitative observations, as the noise is further increased past a threshold point of no return, these ordered patterns rapidly become disordered. Such robustness in development allows for the accumulation of genetic variation without any observable changes in phenotype. We argue that the observed sigmoidal dependence introduces robustness allowing for sizable amounts of genetic variation and transcriptional noise to be tolerated in natural populations without resulting in phenotype variation.

Source
PLOS: http://journals.plos.org/plosone/article?id=10.1371/journal.pone.0210088
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6353092
September 2019

Quantitative analysis reveals genotype- and domain- specific differences between mRNA and protein expression of segmentation genes in Drosophila.

Dev Biol 2019 04 7;448(1):48-58. Epub 2019 Jan 7.

Peter the Great St. Petersburg Polytechnic University, Polytechnicheskaya, 29, St. Petersburg 195251, Russia.

In many biological systems gene expression at mRNA and protein levels is not identical. Rigorous comparison of such differences on a spatio-temporal scale is still not feasible by high-throughput transcriptomic and proteomic analyses of early embryo development. Here, we characterize differences between mRNA and protein expression of Drosophila segmentation genes at the level of individual gene expression domains. We obtained quantitative imaging data on expression of gap genes gt and hb and pair-rule gene eve for Drosophila wild type embryos, Kr null mutants and Kr+/Kr- heterozygotes. To compare mRNA and protein expression we use several criteria including difference in amplitude and positions of expression domains, pattern shape and positional variability. For a number of gene expression domains we show examples where protein expression does not repeat mRNA expression even after a temporal delay. We calculated time delays between eve pattern formation at the level of mRNA and protein for wild type embryos, Kr mutants and Kr+/Kr- heterozygotes. We detect that in wild type embryos, the amplitudes of eve stripes 3 and 7 do not differ significantly at the level of mRNA, however, stripe 3 is higher than stripe 7 at the protein level. We further show that hb mRNA and protein expression in both anterior and posterior domains significantly differs at specific time points. The formation of hb PS4 stripe at the mRNA level proceeds five times faster than at the level of protein. With regard to spatial expression, we show that the offset between posterior gt mRNA and protein domains is much larger in Kr mutants than in wild type embryos and heterozygotes. Finally, we analyze differences in positional variability of eve stripe 7 expression in Kr mutants and Kr+/Kr- heterozygotes at the level of mRNA and protein. These results enable further perspectives to uncover mechanisms underlying discrepancies between mRNA and protein expression in early embryo.

Source
DOI: http://dx.doi.org/10.1016/j.ydbio.2019.01.006
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6477536
April 2019

Prediction of deleterious mutations in coding regions of mammals with transfer learning.

Evol Appl 2019 Jan 9;12(1):18-28. Epub 2018 May 9.

Peter the Great St. Petersburg Polytechnic University St. Petersburg Russia.

The genomes of mammals contain thousands of deleterious mutations. It is important to be able to recognize them with high precision. In conservation biology, the small size of fragmented populations results in accumulation of damaging variants. Preserving animals with less damaged genomes could optimize conservation efforts. In breeding of farm animals, trade-offs between farm performance and general fitness might be better avoided if deleterious mutations are well classified. In humans, the problem of such a precise classification has been successfully solved, in large part due to large databases of disease-causing mutations. However, this kind of information is very limited for other mammals. Here, we propose to better use information available on human mutations to enable classification of damaging mutations in other mammalian species. Specifically, we apply transfer learning, a class of machine learning methods that improves performance on a small dataset for a focal problem (recognizing damaging mutations in our companion and farm animals) by using the much larger datasets available for a related problem (recognizing damaging mutations in humans). We validate our tools using mouse and dog annotated datasets and obtain significantly better results in comparison to the SIFT classifier. Then, we apply them to predict deleterious mutations in a cattle genome-wide dataset.

Source
DOI: http://dx.doi.org/10.1111/eva.12607
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6304693
January 2019

A Pipeline for Classifying Deleterious Coding Mutations in Agricultural Plants.

Front Plant Sci 2018 Nov 28;9:1734. Epub 2018 Nov 28.

Department of Applied Mathematics, Peter the Great St.Petersburg Polytechnic University, St. Petersburg, Russia.

The impact of deleterious variation on both plant fitness and crop productivity is not completely understood and is a hot topic of debates. The deleterious mutations in plants have been solely predicted using sequence conservation methods rather than function-based classifiers due to lack of well-annotated mutational datasets in these organisms. Here, we developed a machine learning classifier based on a dataset of deleterious and neutral mutations in by extracting 18 informative features that discriminate deleterious mutations from neutral, including 9 novel features not used in previous studies. We examined linear SVM, Gaussian SVM, and Random Forest classifiers, with the latter performing best. Random Forest classifiers exhibited a markedly higher accuracy than the popular PolyPhen-2 tool in the dataset. Additionally, we tested whether the Random Forest, trained on the dataset, accurately predicts deleterious mutations in and and observed satisfactory levels of performance accuracy (87% and 93%, respectively) higher than obtained by the PolyPhen-2. Application of Transfer learning in classifiers did not improve their performance. To additionally test the performance of the Random Forest classifier across different angiosperm species, we applied it to annotate deleterious mutations in and validated them using population frequency data. Overall, we devised a classifier with the potential to improve the annotation of putative functional mutations in QTL and GWAS hit regions, as well as for the evolutionary analysis of proliferation of deleterious mutations during plant domestication; thus optimizing breeding improvement and development of new cultivars.

Source
DOI: http://dx.doi.org/10.3389/fpls.2018.01734
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6279870
November 2018

Dynamical Modeling of the Core Gene Network Controlling Flowering Suggests Cumulative Activation From the Gene Homologs in Chickpea.

Front Genet 2018 20;9:547. Epub 2018 Nov 20.

Systems Biology and Bioinformatics Laboratory, Peter the Great Saint Petersburg Polytechnic University, Saint Petersburg, Russia.

Initiation of flowering moves plants from vegetative to reproductive development. The time when this transition happens (flowering time), an important indicator of productivity, depends on both endogenous and environmental factors. The core genetic regulatory network canalizing the flowering signals to the decision to flower has been studied extensively in the model plant Arabidopsis thaliana and has been shown to preserve its main regulatory blocks in other species. It integrates activation from the () gene or its homologs to the flowering decision expressed as high expression of the meristem identity genes, including . We elaborated a dynamical model of this flowering gene regulatory network and applied it to the previously published expression data from two cultivars of domesticated chickpea (Cicer arietinum), obtained for two photoperiod durations. Due to a large number of free parameters in the model, we used an ensemble approach analyzing the model solutions at many parameter sets that provide equally good fit to data. Testing several alternative hypotheses about regulatory roles of the five homologs present in chickpea revealed no preference in segregating individual copies as singled-out activators with their own regulatory parameters, thus favoring the hypothesis that the five genes possess similar regulatory properties and provide cumulative activation in the network. The analysis reveals that different levels of activation from can explain a small difference observed in the expression of the two homologs of the repressor gene . Finally, the model predicts highly reduced activation between and , thus suggesting that this regulatory block is not conserved in chickpea and needs other mechanisms. Overall, this study provides the first attempt to quantitatively test the flowering time gene network in chickpea based on data-driven modeling.

Source
DOI: http://dx.doi.org/10.3389/fgene.2018.00547
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6262361
November 2018

Analysis of Gene Expression Variance in Schizophrenia Using Structural Equation Modeling.

Front Mol Neurosci 2018 11;11:192. Epub 2018 Jun 11.

Institute of Applied Mathematics and Mechanics, Peter the Great St. Petersburg Polytechnic University, St. Petersburg, Russia.

Schizophrenia (SCZ) is a psychiatric disorder of unknown etiology. There is evidence suggesting that aberrations in neurodevelopment are a significant attribute of schizophrenia pathogenesis and progression. To identify biologically relevant molecular abnormalities affecting neurodevelopment in SCZ, we used cultured neural progenitor cells derived from olfactory neuroepithelium (CNON cells). Here, we tested the hypothesis that variance in gene expression differs between individuals from SCZ and control groups. In CNON cells, variance in gene expression was significantly higher in SCZ samples in comparison with control samples. Variance in gene expression was enriched in five molecular pathways: serine biosynthesis, PI3K-Akt, MAPK, neurotrophin and focal adhesion. More than 14% of variance in disease status was explained within the logistic regression model (C-value = 0.70) by predictors accounting for gene expression in 69 genes from these five pathways. Structural equation modeling (SEM) was applied to explore how the structure of these five pathways was altered between SCZ patients and controls. Four out of five pathways showed differences in the estimated relationships among genes: between KRAS and NF1, and KRAS and SOS1 in the MAPK pathway; between PSPH and SHMT2 in serine biosynthesis; between AKT3 and TSC2 in the PI3K-Akt signaling pathway; and between CRK and RAPGEF1 in the focal adhesion pathway. Our analysis provides evidence that variance in gene expression is an important characteristic of SCZ, and that SEM is a promising method for uncovering altered relationships between specific genes, thus suggesting affected gene regulation associated with the disease. We identified altered gene-gene interactions in pathways enriched for genes with increased variance in expression in SCZ. These pathways and loci were previously implicated in SCZ, providing further support for the hypothesis that gene expression variance plays an important role in the etiology of SCZ.

Source
DOI: http://dx.doi.org/10.3389/fnmol.2018.00192
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6004421
June 2018

Comparative Transcriptomics in Two Bivalve Species Offers Different Perspectives on the Evolution of Sex-Biased Genes.

Genome Biol Evol 2018 06;10(6):1389-1402

Department of Biological, Geological, and Environmental Sciences, University of Bologna, Italy.

Comparative genomics has become a central tool for evolutionary biology, and a better knowledge of understudied taxa represents the foundation for future work. In this study, we characterized the transcriptome of male and female mature gonads in the European clam Ruditapes decussatus and compared it with that of the Manila clam Ruditapes philippinarum, providing, for the first time in bivalves, information about transcription dynamics and sequence evolution of sex-biased genes. In both species, we found a relatively low number of sex-biased genes (1,284, corresponding to 41.3% of the orthologous genes between the two species), probably due to the absence of sexual dimorphism, and the transcriptional bias is maintained in only 33% of the orthologs. The dN/dS is generally low, indicating purifying selection, with genes where the female-biased transcription is maintained between the two species showing a significantly higher dN/dS. Genes involved in embryo development, cell proliferation, and maintenance of genome stability show a faster sequence evolution. Finally, we report a lack of clear correlation between transcription level and evolutionary rate in these species, in contrast with studies that reported a negative correlation. We discuss this discrepancy and call into question some methodological approaches and rationales generally used in this type of comparative study.

Source
DOI: http://dx.doi.org/10.1093/gbe/evy082
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6007409
June 2018

Analysis of Genetic Variation Indicates DNA Shape Involvement in Purifying Selection.

Mol Biol Evol 2018 08;35(8):1958-1967

Molecular and Computational Biology Program, Department of Biological Sciences, University of Southern California, Los Angeles, CA.

Noncoding DNA sequences, which play various roles in gene expression and regulation, are under evolutionary pressure. Gene regulation requires specific protein-DNA binding events, and our previous studies showed that both DNA sequence and shape readout are employed by transcription factors (TFs) to achieve DNA binding specificity. By investigating the shape-disrupting properties of single nucleotide polymorphisms (SNPs) in human regulatory regions, we established a link between disruptive local DNA shape changes and loss of specific TF binding. Furthermore, we described cases where disease-associated SNPs may alter TF binding through DNA shape changes. This link led us to hypothesize that local DNA shape within and around TF binding sites is under selection pressure. To verify this hypothesis, we analyzed SNP data derived from 216 natural strains of Drosophila melanogaster. Comparing SNPs located in functional and nonfunctional regions within experimentally validated cis-regulatory modules (CRMs) from D. melanogaster that are active in the blastoderm stage of development, we found that SNPs within functional regions tended to cause smaller DNA shape variations. Furthermore, SNPs with higher minor allele frequency were more likely to result in smaller DNA shape variations. The same analysis based on a large number of SNPs in putative CRMs of the D. melanogaster genome derived from DNase I accessibility data confirmed these observations. Taken together, our results indicate that common SNPs in functional regions tend to maintain DNA shape, whereas shape-disrupting SNPs are more likely to be eliminated through purifying selection.

Source
DOI: http://dx.doi.org/10.1093/molbev/msy099
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6063282
August 2018

The Evolution of Gene Expression in cis and trans.

Trends Genet 2018 07 18;34(7):532-544. Epub 2018 Apr 18.

Molecular and Computational Biology, University of Southern California, Los Angeles, CA 90089, USA.

There is abundant variation in gene expression between individuals, populations, and species. The evolution of gene regulation and expression within and between species is thought to frequently contribute to adaptation. Yet considerable evidence suggests that the primary evolutionary force acting on variation in gene expression is stabilizing selection. We review here the results of recent studies characterizing the evolution of gene expression occurring in cis (via linked polymorphisms) or in trans (through diffusible products of other genes) and their contribution to adaptation and response to the environment. We review the evidence for buffering of variation in gene expression at the level of both transcription and translation, and the possible mechanisms for this buffering. Lastly, we summarize unresolved questions about the evolution of gene regulation.

Source
DOI: http://dx.doi.org/10.1016/j.tig.2018.03.007
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6094946
July 2018

Genetic Diversity, Population Structure, and Genetic Correlation with Climatic Variation in Chickpea (Cicer arietinum) Landraces from Pakistan.

Plant Genome 2018 03;11(1)

Chickpea (Cicer arietinum L.) production in arid regions, such as those predominant in Pakistan, faces immense challenges of drought and heat stress. Addressing these challenges is made more difficult by the lack of genetic and phenotypic characterization of available cultivated varieties and breeding materials. Genotyping-by-sequencing offers a rapid and cost-effective means to identify genome-wide nucleotide variation in crop germplasm. When combined with extended crop phenotypes deduced from climatic variation at sites of collection, the data can predict which portions of genetic variation might have roles in climate resilience. Here we use 8113 single nucleotide polymorphism markers to determine genetic variation and compare population structure within a previously uncharacterized collection of 77 landraces and 5 elite cultivars, currently grown in situ on farms throughout the chickpea growing regions of Pakistan. The compiled landraces span a striking aridity gradient into the Thal Desert of the Punjab. Despite low levels of variation across the collection and limited genetic structure, we found some differentiation between accessions from arid, semiarid, irrigated, and coastal areas. In a subset of 232 markers, we found evidence of differentiation along gradients of elevation and isothermality. Our results highlight the utility of exploring large germplasm collections for nucleotide variation associated with environmental extremes, and the use of such data to nominate germplasm accessions with the potential to improve crop drought tolerance and other environmental traits.

Source
DOI: http://dx.doi.org/10.3835/plantgenome2017.08.0067
March 2018

Ecology and genomics of an important crop wild relative as a prelude to agricultural innovation.

Nat Commun 2018 02 13;9(1):649. Epub 2018 Feb 13.

Department of Plant Biology, Michigan State University, East Lansing, MI, 48823, USA.

Domesticated species are impacted in unintended ways during domestication and breeding. Changes in the nature and intensity of selection impart genetic drift, reduce diversity, and increase the frequency of deleterious alleles. Such outcomes constrain our ability to expand the cultivation of crops into environments that differ from those under which domestication occurred. We address this need in chickpea, an important pulse legume, by harnessing the diversity of wild crop relatives. We document an extreme domestication-related genetic bottleneck and decipher the genetic history of wild populations. We provide evidence of ancestral adaptations for seed coat color crypsis, estimate the impact of environment on genetic structure and trait values, and demonstrate variation between wild and cultivated accessions for agronomic properties. A resource of genotyped, association mapping progeny functionally links the wild and cultivated gene pools and is an essential resource for chickpea improvement, while our methods inform collection of other wild crop progenitor species.

Source
DOI: http://dx.doi.org/10.1038/s41467-018-02867-z
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC5811434
February 2018

Conservation of social effects (Ψ) between two species of Drosophila despite reversal of sexual dimorphism.

Ecol Evol 2017 12 22;7(23):10031-10041. Epub 2017 Oct 22.

Program in Molecular and Computational Biology Dornsife College of Letters, Arts and Sciences University of Southern California Los Angeles CA USA.

Indirect genetic effects (IGEs) describe the effect of the genes of social partners on the phenotype of a focal individual. Here, we measure indirect genetic effects using the "coefficient of interaction" (Ψ) to test whether Ψ evolved between and . We compare Ψ for locomotion between ethanol and nonethanol environments in both species, but only utilizes ethanol ecologically. We find that while sexual dimorphism for locomotion has been reversed in , there has been no evolution of social effects between these two species. What did evolve was the interaction between genotype-specific Ψ and the environment, as  varies unpredictably between environments and  does not. In this system, this suggests evolutionary lability of sexual dimorphism but a conservation of social effects, which brings forth interesting questions about the role of the social environment in sexual selection.

Source
DOI: http://dx.doi.org/10.1002/ece3.3523
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC5723616
December 2017

Development of F1 hybrid population and the high-density linkage map for European aspen (Populus tremula L.) using RADseq technology.

BMC Plant Biol 2017 Nov 14;17(Suppl 1):180. Epub 2017 Nov 14.

Saint Petersburg State Forest Technical University, Institutskiy per, 5, 194021, St. Petersburg, Russia.

Background: Restriction-site associated DNA sequencing (RADseq) technology was recently employed to identify a large number of single nucleotide polymorphisms (SNP) for linkage mapping of a North American and Eastern Asian Populus species. However, there is also the need for high-density genetic linkage maps for the European aspen (P. tremula) as a tool for further mapping of quantitative trait loci (QTLs) and marker-assisted selection of the Populus species native to Europe.

Results: We established a hybrid F1 population from the cross of two aspen parental genotypes that diverged in their phenological and morphological traits. We performed RADseq of 122 F1 progenies and two parents, yielding 15,732 high-quality SNPs that were successfully identified using the reference genome of P. trichocarpa. In total, 2055 SNPs were employed for the construction of maternal and paternal linkage maps. The maternal linkage map was assembled with 1000 SNPs, containing 19 linkage groups and spanning 3054.9 cM of the genome, with an average distance of 3.05 cM between adjacent markers. The paternal map consisted of 1055 SNPs and the same number of linkage groups, with a total length of 3090.56 cM and an average interval distance of 2.93 cM. The linkage maps were employed for QTL mapping of height variation in one-year-old seedlings. The most significant QTL (LOD = 5.73) was localized to LG5 (96.94 cM) of the male linkage map, explaining 18% of the phenotypic variation.

Conclusions: The set of 15,732 SNPs polymorphic in aspen and high-density genetic linkage maps constructed for the P. tremula intra-specific cross will provide a valuable source for QTL mapping and identification of candidate genes facilitating marker-assisted selection in European aspen.

Source
DOI: http://dx.doi.org/10.1186/s12870-017-1127-y
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC5688504
November 2017

Mapping Quantitative Trait Loci Underlying Circadian Light Sensitivity in Drosophila.

J Biol Rhythms 2017 Oct 8;32(5):394-405. Epub 2017 Oct 8.

Department of Genetics, University of Leicester, Leicester, UK.

Despite significant advances in our understanding of the molecular basis of light entrainment of the circadian clock in Drosophila, the underlying genetic architecture is still largely unknown. The aim of this study was to identify loci associated with variation in circadian photosensitivity, which are important for the evolution of this trait. We have used complementary approaches that combined quantitative trait loci (QTL) mapping, complementation testing, and transcriptome profiling to dissect this variation. We identified a major QTL on chromosome 2, which was subsequently fine mapped using deficiency complementation mapping into 2 smaller regions spanning 139 genes, some of which are known to be involved in functions that have been previously implicated in light entrainment. Two genes implicated in the clock and located within that interval, timeless and cycle, failed to complement the QTL, indicating that alleles of these genes contribute to the variation in light response. Specifically, we find that the timeless s/ls polymorphism that has been previously shown to constitute a latitudinal cline in Europe is also segregating in our recombinant inbred lines and is contributing to the phenotypic variation in light sensitivity. We also profiled gene expression in 2 recombinant inbred strains that differ significantly in their photosensitivity and identified a total of 368 transcripts that showed differential expression (false discovery rate < 0.1). Of 131 transcripts that showed a significant recombinant inbred line by treatment interaction (i.e., putative expression QTL), 4 are located within QTL2.

Source
DOI: http://dx.doi.org/10.1177/0748730417731863
October 2017

Translating natural genetic variation to gene expression in a computational model of the Drosophila gap gene regulatory network.

PLoS One 2017 12;12(9):e0184657. Epub 2017 Sep 12.

Systems Biology and Bioinformatics Laboratory, Peter the Great Saint Petersburg Polytechnic University, Saint Petersburg, Russia.

Annotating the genotype-phenotype relationship, and developing a proper quantitative description of the relationship, requires understanding the impact of natural genomic variation on gene expression. We apply a sequence-level model of gap gene expression in the early development of Drosophila to analyze single nucleotide polymorphisms (SNPs) in a panel of natural sequenced D. melanogaster lines. Using a thermodynamic modeling framework, we provide both analytical and computational descriptions of how single-nucleotide variants affect gene expression. The analysis reveals that the sequence variants increase (decrease) gene expression if located within binding sites of repressors (activators). We show that the sign of SNP influence (activation or repression) may change in time and space and elucidate the origin of this change in specific examples. The thermodynamic modeling approach predicts non-local and non-linear effects arising from SNPs, and combinations of SNPs, in individual fly genotypes. Simulation of individual fly genotypes using our model reveals that this non-linearity reduces to almost additive inputs from multiple SNPs. Further, we see signatures of the action of purifying selection in the gap gene regulatory regions. To infer the specific targets of purifying selection, we analyze the patterns of polymorphism in the data at two phenotypic levels: the strengths of binding and expression. We find that combinations of SNPs show evidence of being under selective pressure, while individual SNPs do not. The model predicts that SNPs appear to accumulate in the genotypes of the natural population in a way biased towards small increases in activating action on the expression pattern. Taken together, these results provide a systems-level view of how genetic variation translates to the level of gene regulatory networks via combinatorial SNP effects.

Source
PLOS: http://journals.plos.org/plosone/article?id=10.1371/journal.pone.0184657
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC5595321
October 2017