Publications by authors named "Aalt D J van Dijk"

68 Publications

Chasing breeding footprints through structural variations in Cucumis melo and wild relatives.

G3 (Bethesda) 2021 Jan;11(1)

Department of Bioscience, Wageningen Plant Research, Wageningen University & Research, 6708 PB, Wageningen, the Netherlands.

Cucumis melo (melon or muskmelon) is an important crop in the family of the Cucurbitaceae. Melon is cross-pollinated and was domesticated at several locations throughout its breeding history, resulting in a highly diverse genetic structure in the germplasm. Yet, knowledge of the relations among the groups and cultivars is still incomplete. We shed light on the melon breeding history, analyzing structural variations ranging from 50 bp up to 100 kb, identified from whole genome sequences of 100 selected melon accessions and wild relatives. Phylogenetic trees based on SV types completely resolve cultivars and wild accessions into two monophyletic groups and clustering of cultivars largely correlates with their geographic origin. Taking into account morphology, we found six mis-categorized cultivars. Unique inversions are more often shared between cultivars, carrying advantageous genes and do not directly originate from wild species. Approximately 60% of the inversion breaks carry a long poly A/T motif, and following observations in other plant species, suggest that inversions in melon likely resulted from meiotic recombination events. We show that resistance genes in the linkage V region are expanded in the cultivar genomes compared to wild relatives. Furthermore, particular agronomic traits such as fruit ripening, fragrance, and stress response are specifically selected for in the melon subspecies. These results represent distinctive footprints of selective breeding that shaped today's melon. The sequences and genomic relations between land races, wild relatives, and cultivars will serve the community to identify genetic diversity, optimize experimental designs, and enhance crop development.

Source
http://dx.doi.org/10.1093/g3journal/jkaa038
January 2021

Prior Biological Knowledge Improves Genomic Prediction of Growth-Related Traits in .

Front Genet 2020 20;11:609117. Epub 2021 Jan 20.

Bioinformatics Group, Wageningen University, Wageningen, Netherlands.

Prediction of growth-related complex traits is highly important for crop breeding. Photosynthesis efficiency and biomass are direct indicators of overall plant performance and therefore even minor improvements in these traits can result in significant breeding gains. Crop breeding for complex traits has been revolutionized by technological developments in genomics and phenomics. Capitalizing on the growing availability of genomics data, genome-wide marker-based prediction models allow for efficient selection of the best parents for the next generation without the need for phenotypic information. Until now such models mostly predict the phenotype directly from the genotype and fail to make use of relevant biological knowledge. It is an open question to what extent the use of such biological knowledge is beneficial for improving genomic prediction accuracy and reliability. In this study, we explored the use of publicly available biological information for genomic prediction of photosynthetic light use efficiency (Φ ) and projected leaf area (PLA) in . To explore the use of various types of knowledge, we mapped genomic polymorphisms to Gene Ontology (GO) terms and transcriptomics-based gene clusters, and applied these in a Genomic Feature Best Linear Unbiased Predictor (GFBLUP) model, which is an extension to the traditional Genomic BLUP (GBLUP) benchmark. Our results suggest that incorporation of prior biological knowledge can improve genomic prediction accuracy for both Φ and PLA. The improvement achieved depends on the trait, type of knowledge and trait heritability. Moreover, transcriptomics offers complementary evidence to the Gene Ontology for improvement when used to define functional groups of genes. In conclusion, prior knowledge about trait-specific groups of genes can be directly translated into improved genomic prediction.
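The split of markers that a genomic feature model relies on can be illustrated with a small sketch. In a GFBLUP-style model the genetic effect is divided into a part captured by markers inside a feature (for example, SNPs mapped to one GO term) and a part captured by all remaining markers, each with its own genomic relationship matrix. The code below only builds those two matrices with the commonly used VanRaden scaling; the toy genotypes, the GO labels, and all function names are hypothetical, and variance component estimation is left to a mixed-model package.

```python
def allele_frequencies(genotypes):
    """Frequency of the counted allele per marker (genotypes coded 0/1/2)."""
    n = len(genotypes)
    m = len(genotypes[0])
    return [sum(row[j] for row in genotypes) / (2 * n) for j in range(m)]


def relationship_matrix(genotypes, markers):
    """VanRaden-style relationship matrix using only the given marker indices."""
    freqs = allele_frequencies(genotypes)
    scale = 2 * sum(freqs[j] * (1 - freqs[j]) for j in markers)
    centered = [[row[j] - 2 * freqs[j] for j in markers] for row in genotypes]
    n = len(genotypes)
    return [
        [sum(a * b for a, b in zip(centered[i], centered[k])) / scale
         for k in range(n)]
        for i in range(n)
    ]


def split_markers(marker_terms, term):
    """Split marker indices into those annotated with `term` and the rest."""
    feature = [j for j, terms in enumerate(marker_terms) if term in terms]
    rest = [j for j, terms in enumerate(marker_terms) if term not in terms]
    return feature, rest


# Toy data (hypothetical): 4 plants x 5 SNPs, coded as allele counts 0/1/2.
genotypes = [
    [0, 1, 2, 0, 1],
    [1, 1, 0, 2, 0],
    [2, 0, 1, 1, 1],
    [1, 2, 1, 1, 2],
]
# Illustrative GO labels per SNP; a real mapping would come from gene annotation.
marker_terms = [{"GO:0015979"}, set(), {"GO:0015979"}, set(), {"GO:0006950"}]

feature, rest = split_markers(marker_terms, "GO:0015979")
G_feature = relationship_matrix(genotypes, feature)
G_rest = relationship_matrix(genotypes, rest)
```

Setting the feature to the full marker set reduces this to the single relationship matrix of ordinary GBLUP, which is why GBLUP serves as the natural benchmark.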

Source
http://dx.doi.org/10.3389/fgene.2020.609117
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC7855462
January 2021

Geometricus represents protein structures as shape-mers derived from moment invariants.

Bioinformatics 2020 Dec;36(Supplement_2):i718-i725

Bioinformatics Group, Department of Plant Sciences.

Motivation: As the number of experimentally solved protein structures rises, it becomes increasingly appealing to use structural information for predictive tasks involving proteins. Due to the large variation in protein sizes, folds and topologies, an attractive approach is to embed protein structures into fixed-length vectors, which can be used in machine learning algorithms aimed at predicting and understanding functional and physical properties. Many existing embedding approaches are alignment based, which is both time-consuming and ineffective for distantly related proteins. On the other hand, library- or model-based approaches depend on a small library of fragments or require the use of a trained model, both of which may not generalize well.

Results: We present Geometricus, a novel and universally applicable approach to embedding proteins in a fixed-dimensional space. The approach is fast, accurate, and interpretable. Geometricus uses a set of 3D moment invariants to discretize fragments of protein structures into shape-mers, which are then counted to describe the full structure as a vector of counts. We demonstrate the applicability of this approach in various tasks, ranging from fast structure similarity search and unsupervised clustering to structure classification across proteins from different superfamilies as well as within the same family.

Availability And Implementation: Python code available at https://git.wur.nl/durai001/geometricus.
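The shape-mer idea can be sketched in a few lines of plain Python. This is not the Geometricus implementation: it assumes overlapping fragments of consecutive alpha-carbon coordinates, uses only two rotation-invariant combinations of second-order central moments, and bins them on a log scale with a hypothetical `resolution` parameter. It is meant only to show how invariants become discrete labels that are then counted.

```python
import math
from collections import Counter


def second_order_invariants(points):
    """Two rotation- and translation-invariant values for a set of 3D points."""
    n = len(points)
    cx = sum(p[0] for p in points) / n
    cy = sum(p[1] for p in points) / n
    cz = sum(p[2] for p in points) / n
    xx = yy = zz = xy = xz = yz = 0.0
    for x, y, z in points:
        dx, dy, dz = x - cx, y - cy, z - cz
        xx += dx * dx
        yy += dy * dy
        zz += dz * dz
        xy += dx * dy
        xz += dx * dz
        yz += dy * dz
    trace = xx + yy + zz
    minors = xx * yy + xx * zz + yy * zz - xy * xy - xz * xz - yz * yz
    return trace, minors


def shape_mer(points, resolution=2.0):
    """Discretize the invariants of one fragment into a tuple of integers."""
    return tuple(int(math.log1p(v) * resolution)
                 for v in second_order_invariants(points))


def embed(coords, k=4, resolution=2.0):
    """Count the shape-mers of all overlapping fragments of k residues."""
    return Counter(shape_mer(coords[i:i + k], resolution)
                   for i in range(len(coords) - k + 1))


def to_vector(counts, vocabulary):
    """Fixed-length count vector over an agreed list of shape-mers."""
    return [counts.get(s, 0) for s in vocabulary]
```

Because the invariants do not change when a structure is rotated or translated, two copies of the same structure in different orientations yield the same counts without any alignment step. A larger `resolution` gives finer bins and therefore a larger shape-mer vocabulary.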

Source
http://dx.doi.org/10.1093/bioinformatics/btaa839
December 2020

The santalene synthase from Cinnamomum camphora: Reconstruction of a sesquiterpene synthase from a monoterpene synthase.

Arch Biochem Biophys 2020 11 26;695:108647. Epub 2020 Oct 26.

Bioscience, Wageningen Plant Research, Netherlands. Electronic address:

Plant terpene synthases (TPSs) can mediate formation of a large variety of terpenes, and their diversification contributes to the specific chemical profiles of different plant species and chemotypes. Plant genomes often encode a number of related terpene synthases, which can produce very different terpenes. The relationship between TPS sequence and resulting terpene product is not completely understood. In this work we describe two TPSs from the Camphor tree Cinnamomum camphora (L.) Presl. One of these, CiCaMS, acts as a monoterpene synthase (monoTPS), and mediates the production of myrcene, while the other, CiCaSSy, acts as a sesquiterpene synthase (sesquiTPS), and catalyses the production of α-santalene, β-santalene and trans-α-bergamotene. Interestingly, these enzymes share 97% DNA sequence identity and differ only in 22 amino acid residues out of 553. To understand which residues are essential for the catalysis of monoterpenes and sesquiterpenes, respectively, a number of hybrid synthases were prepared, and supplemented by a set of single-residue variants. These were tested for their ability to produce monoterpenes and sesquiterpenes by in vivo production of sesquiterpenes in E. coli, and by in vitro enzyme assays. This analysis pinpointed three residues in the sequence which could mediate the change in product specificity from a monoterpene synthase to a sesquiterpene synthase. Another set of three residues defined the sesquiterpene product profile, including the ratios between sesquiterpene products.

Source
http://dx.doi.org/10.1016/j.abb.2020.108647
November 2020

CAPICE: a computational method for Consequence-Agnostic Pathogenicity Interpretation of Clinical Exome variations.

Genome Med 2020 08 24;12(1):75. Epub 2020 Aug 24.

Department of Genetics, University of Groningen, University Medical Center Groningen, Groningen, the Netherlands.

Exome sequencing is now mainstream in clinical practice. However, identification of pathogenic Mendelian variants remains time-consuming, in part because the limited accuracy of current computational prediction methods requires manual classification by experts. Here we introduce CAPICE, a new machine-learning-based method for prioritizing pathogenic variants, including SNVs and short InDels. CAPICE outperforms the best general (CADD, GAVIN) and consequence-type-specific (REVEL, ClinPred) computational prediction methods, for both rare and ultra-rare variants. CAPICE is easily added to diagnostic pipelines as a pre-computed score file or command-line software, or through the online MOLGENIS web service with API. Download CAPICE for free and open-source (LGPLv3) at https://github.com/molgenis/capice.

Source
http://dx.doi.org/10.1186/s13073-020-00775-w
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC7446154
August 2020

Coevolution-based prediction of protein-protein interactions in polyketide biosynthetic assembly lines.

Bioinformatics 2020 12;36(19):4846-4853

Bioinformatics Group.

Motivation: Polyketide synthases (PKSs) are enzymes that generate diverse molecules of great pharmaceutical importance, including a range of clinically used antimicrobials and antitumor agents. Many polyketides are synthesized by cis-AT modular PKSs, which are organized in assembly lines, in which multiple enzymes line up in a specific order. This order is defined by specific protein-protein interactions (PPIs). The unique modular structure and catalyzing mechanism of these assembly lines make their products predictable and have also spurred combinatorial biosynthesis studies to produce novel polyketides using synthetic biology. However, predicting the interactions of PKSs, and thereby inferring the order of their assembly line, is still challenging, especially for cases in which this order is not reflected by the ordering of the PKS-encoding genes in the genome.

Results: Here, we introduce PKSpop, which uses a coevolution-based PPI algorithm to infer protein order in PKS assembly lines. Our method accurately predicts protein orders (93% accuracy). Additionally, we identify new residue pairs that are key in determining interaction specificity, and show that coevolution of N- and C-terminal docking domains of PKSs is significantly more predictive for PPIs than coevolution between ketosynthase and acyl carrier protein domains.

Availability And Implementation: The code is available on http://www.bif.wur.nl/ (under 'Software').

Supplementary Information: Supplementary data are available at Bioinformatics online.
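Turning pairwise interaction predictions into an assembly-line order can be illustrated as a small search problem. The sketch below is not PKSpop's algorithm: it assumes that a coevolution method has already produced a score for each ordered pair (the upstream protein's C-terminal docking domain against the downstream protein's N-terminal docking domain) and simply enumerates all orderings to find the one with the highest summed score. The protein names and scores are hypothetical, and exhaustive enumeration is only practical for a handful of proteins.

```python
from itertools import permutations


def best_order(proteins, score):
    """Return (order, total) maximizing the summed score of consecutive pairs.

    `score` maps (upstream, downstream) tuples to predicted interaction
    scores; missing pairs count as 0.0.
    """
    best, best_total = None, float("-inf")
    for order in permutations(proteins):
        total = sum(score.get((a, b), 0.0) for a, b in zip(order, order[1:]))
        if total > best_total:
            best, best_total = order, total
    return best, best_total


# Hypothetical scores for four PKS proteins; higher means a more likely contact.
proteins = ["PksA", "PksB", "PksC", "PksD"]
score = {(a, b): 0.1 for a in proteins for b in proteins if a != b}
score[("PksA", "PksB")] = 0.9
score[("PksB", "PksC")] = 0.8
score[("PksC", "PksD")] = 0.7

order, total = best_order(proteins, score)
```

Scores are kept directional because docking is asymmetric: the pair (PksA, PksB) describes a different interface than (PksB, PksA).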

Source
http://dx.doi.org/10.1093/bioinformatics/btaa595
December 2020

Caretta - A multiple protein structure alignment and feature extraction suite.

Comput Struct Biotechnol J 2020 6;18:981-992. Epub 2020 Apr 6.

Bioinformatics Group, Department of Plant Sciences, Wageningen University and Research, The Netherlands.

The vast number of protein structures currently available opens exciting opportunities for machine learning on proteins, aimed at predicting and understanding functional properties. In particular, in combination with homology modelling, it is now possible to not only use sequence features as input for machine learning, but also structure features. However, in order to do so, robust multiple structure alignments are imperative. Here we present Caretta, a multiple structure alignment suite meant for homologous but sequentially divergent protein families which consistently returns accurate alignments with a higher coverage than current state-of-the-art tools. Caretta is available as a GUI and command-line application and additionally outputs an aligned structure feature matrix for a given set of input structures, which can readily be used in downstream steps for supervised or unsupervised machine learning. We show Caretta's performance on two benchmark datasets, and present an example application of Caretta in predicting the conformational state of cyclin-dependent kinases.

Source
http://dx.doi.org/10.1016/j.csbj.2020.03.011
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC7186369
April 2020

Novel routes towards bioplastics from plants: elucidation of the methylperillate biosynthesis pathway from Salvia dorisiana trichomes.

J Exp Bot 2020 05;71(10):3052-3065

Wageningen Plant Research, 6700 AA, Wageningen, The Netherlands.

Plants produce a large variety of highly functionalized terpenoids. Functional groups such as partially unsaturated rings and carboxyl groups provide handles to use these compounds as feedstock for biobased commodity chemicals. For instance, methylperillate, a monoterpenoid found in Salvia dorisiana, may be used for this purpose, as it carries both an unsaturated ring and a methylated carboxyl group. The biosynthetic pathway of methylperillate in plants is still unclear. In this work, we identified glandular trichomes from S. dorisiana as the location of biosynthesis and storage of methylperillate. mRNA from purified trichomes was used to identify four genes that can encode the pathway from geranyl diphosphate towards methylperillate. This pathway includes a (-)-limonene synthase (SdLS), a limonene 7-hydroxylase (SdL7H, CYP71A76), and a perillyl alcohol dehydrogenase (SdPOHDH). We also identified a terpene acid methyltransferase, perillic acid O-methyltransferase (SdPAOMT), with homology to salicylic acid OMTs. Transient expression in Nicotiana benthamiana of these four genes, in combination with a geranyl diphosphate synthase to boost precursor formation, resulted in production of methylperillate. This demonstrates the potential of these enzymes for metabolic engineering of a feedstock for biobased commodity chemicals.

Source
http://dx.doi.org/10.1093/jxb/eraa086
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC7260718
May 2020

Meiotic recombination profiling of interspecific hybrid F1 tomato pollen by linked read sequencing.

Plant J 2020 05 22;102(3):480-492. Epub 2020 Jan 22.

Business Unit of Bioscience, Cluster Applied Bioinformatics, Wageningen University and Research, Droevendaalsesteeg 1, 6708 PB, Wageningen, The Netherlands.

Genome-wide screening of pooled pollen samples from a single interspecific F1 hybrid obtained from a cross between tomato, Solanum lycopersicum, and its wild relative, Solanum pimpinellifolium, using linked read sequencing of the haploid nuclei, allowed profiling of the crossover (CO) and gene conversion (GC) landscape. We observed a striking overlap between cold regions of CO in the male gametes and our previously established F6 recombinant inbred lines (RILs) population. COs were overrepresented in non-coding regions in the promoter and 5'UTR regions of genes. Poly-A/T and AT-rich motifs were found to be enriched in 1 kb promoter regions flanking the CO sites. Non-crossover associated allelic and ectopic GCs were detected in most chromosomes, confirming that besides CO, GC also represents a source for genetic diversity and genome plasticity in tomato. Furthermore, we identified processed break junctions pointing at the involvement of both homology directed and non-homology directed repair pathways, suggesting a recombination machinery in tomato that is more complex than currently anticipated.

Source
http://dx.doi.org/10.1111/tpj.14640
May 2020

Designing Eukaryotic Gene Expression Regulation Using Machine Learning.

Trends Biotechnol 2020 02 17;38(2):191-201. Epub 2019 Aug 17.

Bioinformatics Group, Wageningen University and Research, Wageningen, The Netherlands. Electronic address:

Controlling the expression of genes is one of the key challenges of synthetic biology. Until recently fine-tuned control has been out of reach, particularly in eukaryotes owing to their complexity of gene regulation. With advances in machine learning (ML) and in particular with increasing dataset sizes, models predicting gene expression levels from regulatory sequences can now be successfully constructed. Such models form the cornerstone of algorithms that allow users to design regulatory regions to achieve a specific gene expression level. In this review we discuss strategies for data collection, data encoding, ML practices, design algorithm choices, and finally model interpretation. Ultimately, these developments will provide synthetic biologists with highly specific genetic building blocks to rationally engineer complex pathways and circuits.

Source
http://dx.doi.org/10.1016/j.tibtech.2019.07.007
February 2020

Comprehensive phenotyping reveals interactions and functions of Arabidopsis thaliana TCP genes in yield determination.

Plant J 2019 07 29;99(2):316-328. Epub 2019 Apr 29.

Bioscience, Wageningen Plant Research, Wageningen University and Research, 6708 PB, Wageningen, The Netherlands.

Members of the Arabidopsis thaliana TCP transcription factor (TF) family affect plant growth and development. We systematically quantified the effect of mutagenizing single or multiple TCP TFs and how altered vegetative growth or branching influences final seed yield. We monitored rosette growth over time and branching patterns and seed yield characteristics at the end of the lifecycle. Subsequently, an approach was developed to disentangle vegetative growth and to determine possible effects on seed yield. Analysis of growth parameters showed all investigated tcp mutants to be affected in certain growth aspects compared with wild-type plants, highlighting the importance of TCP TFs in plant development. Furthermore, we found evidence that all class II TCPs are involved in axillary branch outgrowth, either as inhibitors (BRANCHED-like genes) or enhancers (JAW- and TCP5-like genes). Comprehensive phenotyping of plants mutant for single or multiple TCP TFs reveals that the proposed opposite functions of class I and class II TCPs in plant growth need revision and shows complex interactions between closely related TCP genes instead of full genetic redundancy. In various instances, the alterations in vegetative growth or in branching patterns result in negative trade-off effects on seed yield that were missed in previous studies, showing the importance of comprehensive and quantitative phenotyping.

Source
http://dx.doi.org/10.1111/tpj.14326
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6767503
July 2019

Improved inference of intermolecular contacts through protein-protein interaction prediction using coevolutionary analysis.

Bioinformatics 2019 06;35(12):2036-2042

Bioinformatics Group, Department of Plant Sciences.

Motivation: Predicting residue-residue contacts between interacting proteins is an important problem in bioinformatics. The growing wealth of sequence data can be used to infer these contacts through correlated mutation analysis on multiple sequence alignments of interacting homologs of the proteins of interest. This requires correct identification of pairs of interacting proteins for many species, in order to avoid introducing noise (i.e. non-interacting sequences) in the analysis that will decrease predictive performance.

Results: We have designed Ouroboros, a novel algorithm to reduce such noise in intermolecular contact prediction. Our method iterates between weighting proteins according to how likely they are to interact based on the correlated mutations signal, and predicting correlated mutations based on the weighted sequence alignment. We show that this approach accurately discriminates between protein interaction versus non-interaction and simultaneously improves the prediction of intermolecular contact residues compared to a naive application of correlated mutation analysis. This requires no training labels concerning interactions or contacts. Furthermore, the method relaxes the assumption of one-to-one interaction of previous approaches, allowing for the study of many-to-many interactions.

Availability And Implementation: Source code and test data are available at www.bif.wur.nl/.

Supplementary Information: Supplementary data are available at Bioinformatics online.

Source
http://dx.doi.org/10.1093/bioinformatics/bty924
June 2019

Comparative analysis of binding patterns of MADS-domain proteins in Arabidopsis thaliana.

BMC Plant Biol 2018 Jun 25;18(1):131. Epub 2018 Jun 25.

Bioscience, Wageningen UR, Droevendaalsesteeg 1, Wageningen, The Netherlands.

Background: Correct flower formation requires highly specific temporal and spatial regulation of gene expression. In Arabidopsis thaliana the majority of the master regulators that determine flower organ identity belong to the MADS-domain transcription factor family. The canonical DNA binding motif for this transcription factor family is the CArG-box, which has the consensus CC(A/T)6GG. However, a comprehensive analysis of MADS-domain binding patterns has not yet been performed.

Results: Eight publicly available ChIP-seq datasets of MADS-domain proteins that regulate the floral transition and flower formation were analyzed. Surprisingly, the preferred DNA binding motif of each protein was a CArG-box with an NAA extension. Furthermore, motifs of other transcription factors were found in the vicinity of binding sites of MADS-domain transcription factors, suggesting that interaction of MADS-domain proteins with other transcription factors is important for target gene regulation. Finally, conservation of CArG-boxes between Arabidopsis ecotypes was assessed to obtain information about their evolutionary importance. CArG-boxes that fully matched the consensus were more conserved than other CArG-boxes, suggesting that the perfect CArG-box is evolutionarily more important than other CArG-box variants.

Conclusion: Our analysis provides detailed insight into MADS-domain protein binding patterns. The results underline the importance of an extended version of the CArG-box and provide a first view on evolutionary conservation of MADS-domain protein binding sites in Arabidopsis ecotypes.
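A motif of this kind is easy to scan for with a regular expression. The sketch below assumes the commonly cited perfect CArG-box, CC followed by six A/T bases and GG, and additionally assumes that the NAA extension directly follows the box on its 3' side; the abstract does not state the position of the extension, so that placement is a guess made for illustration. Function and pattern names are hypothetical, and only the given strand is scanned.

```python
import re

# Lookahead patterns so that overlapping matches are all reported.
PERFECT_CARG = re.compile(r"(?=(CC[AT]{6}GG))")
# Assumption: the NAA extension sits immediately 3' of the box.
EXTENDED_CARG = re.compile(r"(?=(CC[AT]{6}GG[ACGT]AA))")


def find_motifs(sequence, pattern):
    """Return (0-based start, matched sequence) for every motif occurrence."""
    return [(m.start(1), m.group(1)) for m in pattern.finditer(sequence.upper())]


# Hypothetical promoter fragment.
fragment = "ttCCAAATTTGGGAAcc"
perfect = find_motifs(fragment, PERFECT_CARG)
extended = find_motifs(fragment, EXTENDED_CARG)
```

A fuller scan would also search the reverse complement and would typically use a position weight matrix derived from ChIP-seq peaks rather than a fixed pattern, since many bound sites deviate from the perfect consensus.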

Source
http://dx.doi.org/10.1186/s12870-018-1348-8
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6019531
June 2018

DNA sequence and shape are predictive for meiotic crossovers throughout the plant kingdom.

Plant J 2018 May 29. Epub 2018 May 29.

Business Unit Bioscience, Cluster Applied Bioinformatics, Wageningen University and Research, Droevendaalsesteeg 1, 6708 PB, Wageningen, the Netherlands.

A better understanding of genomic features influencing the location of meiotic crossovers (COs) in plant species is both of fundamental importance and of practical relevance for plant breeding. Using CO positions with sufficiently high resolution from four plant species [Arabidopsis thaliana, Solanum lycopersicum (tomato), Zea mays (maize) and Oryza sativa (rice)] we have trained machine-learning models to predict the susceptibility to CO formation. Our results show that CO occurrence within various plant genomes can be predicted by DNA sequence and shape features. Several features related to genome content and to genomic accessibility were consistently either positively or negatively related to COs in all four species. Other features were found as predictive only in specific species. Gene annotation-related features were especially predictive for maize, whereas in tomato and Arabidopsis propeller twist and helical twist (DNA shape features) and AT/TA dinucleotides were found to be the most important. In rice, high roll (another DNA shape feature) and low CA dinucleotide frequency in particular were found to be associated with CO occurrence. The accuracy of our models was sufficient for Arabidopsis and rice (area under receiver operating characteristic curve, AUROC > 0.5), and was high for tomato and maize (AUROC ≫ 0.5), demonstrating that DNA sequence and shape are predictive for meiotic COs throughout the plant kingdom.
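For readers unfamiliar with the AUROC values quoted above, the metric can be computed directly from its definition: it is the probability that a randomly chosen positive example (here, a region with a CO) receives a higher model score than a randomly chosen negative example, with ties counted as half. The sketch below is a generic illustration of that definition, not the evaluation code of the study; labels and scores are hypothetical.

```python
def auroc(labels, scores):
    """Area under the ROC curve via pairwise comparison of positives and negatives.

    `labels` are 1 for positive (e.g. CO region) and 0 for negative examples.
    """
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect separation, which is why 0.5 is the reference point in the abstract. The pairwise loop is quadratic in the number of examples; rank-based implementations give the same value more efficiently on large genomes.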

Source
http://dx.doi.org/10.1111/tpj.13979
May 2018

Corrigendum: Towards recommendations for metadata and data handling in plant phenotyping.

J Exp Bot 2018 03;69(7):1819

Leibniz Institute of Plant Genetics and Crop Plant Research (IPK), OT Gatersleben, Corrensstrasse, Stadt Seeland, Germany.

Source
http://dx.doi.org/10.1093/jxb/ery006
March 2018

Divergent regulation of Arabidopsis SAUR genes: a focus on the SAUR10-clade.

BMC Plant Biol 2017 Dec 19;17(1):245. Epub 2017 Dec 19.

Laboratory of Molecular Biology, Wageningen University & Research, Droevendaalsesteeg 1, 6708 PB, Wageningen, the Netherlands.

Background: Small Auxin-Upregulated RNA (SAUR) genes encode growth regulators that induce cell elongation. Arabidopsis contains more than 70 SAUR genes, of which the growth-promoting function has been unveiled in seedlings, while their role in other tissues remained largely unknown. Here, we focus on the regulatory regions of Arabidopsis SAUR genes, to predict the processes in which they play a role, and understand the dynamics of plant growth.

Results: In this study, we characterized in detail the entire SAUR10-clade: SAUR8, SAUR9, SAUR10, SAUR12, SAUR16, SAUR50, SAUR51 and SAUR54. Overexpression analysis revealed that the different proteins fulfil similar functions, while the SAUR expression patterns were highly diverse, showing expression throughout plant development in a variety of tissues. In addition, the response to application of different hormones largely varied between the different genes. These tissue-specific and hormone-specific responses could be linked to transcription factor binding sites using in silico analyses. These analyses also supported the existence of two groups of SAURs in Arabidopsis: Class I genes can be induced by combinatorial action of ARF-BZR-PIF transcription factors, while Class II genes are not regulated by auxin.

Conclusions: SAUR10-clade genes generally induce cell-elongation, but exhibit diverse expression patterns and responses to hormones. Our experimental and in silico analyses suggest that transcription factors involved in plant development determine the tissue specific expression of the different SAUR genes, whereas the amplitude of this expression can often be controlled by hormone response transcription factors. This allows the plant to fine tune growth in a variety of tissues in response to internal and external signals.

Source
http://dx.doi.org/10.1186/s12870-017-1210-4
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC5735953
December 2017

Transcription Factor-Mediated Control of Anthocyanin Biosynthesis in Vegetative Tissues.

Plant Physiol 2018 02 30;176(2):1862-1878. Epub 2017 Nov 30.

Wageningen Plant Research, Bioscience, 6700 AA, Wageningen, The Netherlands

Plants accumulate secondary metabolites to adapt to environmental conditions. These compounds, here exemplified by the purple-colored anthocyanins, are accumulated upon high temperatures, UV-light, drought, and nutrient deficiencies, and may contribute to tolerance to these stresses. Producing compounds is often part of a broader response of the plant to changes in the environment. Here we investigate how a transcription-factor-mediated program for controlling anthocyanin biosynthesis also has effects on formation of specialized cell structures and changes in the plant root architecture. A systems biology approach was developed in tomato () for coordinated induction of biosynthesis of anthocyanins, in a tissue- and development-independent manner. A transcription factor couple from that is known to control anthocyanin biosynthesis was introduced in tomato under control of a dexamethasone-inducible promoter. By application of dexamethasone, anthocyanin formation was induced within 24 h in vegetative tissues and in undifferentiated cells. Profiles of metabolites and gene expression were analyzed in several tomato tissues. Changes in concentration of anthocyanins and other phenolic compounds were observed in all tested tissues, accompanied by induction of the biosynthetic pathways leading from Glc to anthocyanins. A number of pathways that are not known to be involved in anthocyanin biosynthesis were observed to be regulated. Anthocyanin-producing plants displayed profound physiological and architectural changes, depending on the tissue, including root branching, root epithelial cell morphology, seed germination, and leaf conductance. The inducible anthocyanin-production system reveals a range of phenomena that accompany anthocyanin biosynthesis in tomato, including adaptations of the plant's architecture and physiology.

Source
http://dx.doi.org/10.1104/pp.17.01662
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC5813534
February 2018

Tulipa gesneriana and Lilium longiflorum PEBP Genes and Their Putative Roles in Flowering Time Control.

Plant Cell Physiol 2018 Jan;59(1):90-106

Department of Life Sciences, Ben Gurion University of the Negev, Beersheva 84105, Israel.

Floral induction in Tulipa gesneriana and Lilium longiflorum is triggered by contrasting temperature conditions, high and low temperature, respectively. In Arabidopsis, the floral integrator FLOWERING LOCUS T (FT), a member of the PEBP (phosphatidyl ethanolamine-binding protein) gene family, is a key player in flowering time control. In this study, one PEBP gene was identified and characterized in lily (LlFT) and three PEBP genes were isolated from tulip (TgFT1, TgFT2 and TgFT3). Overexpression of these genes in Arabidopsis thaliana resulted in an early flowering phenotype for LlFT and TgFT2, but a late flowering phenotype for TgFT1 and TgFT3. Overexpression of LlFT in L. longiflorum also resulted in an early flowering phenotype, confirming its proposed role as a flowering time-controlling gene. The tulip PEBP genes TgFT2 and TgFT3 have a similar expression pattern in tulip, but show opposite effects on the timing of flowering in Arabidopsis. Therefore, the difference between these two proteins was further investigated by interchanging amino acids thought to be important for the FT function. This resulted in the conversion of phenotypes in Arabidopsis upon overexpressing the substituted TgFT2 and TgFT3 genes, revealing the importance of these interchanged amino acid residues. Based on all obtained results, we hypothesize that LlFT is involved in creating meristem competence to flowering-related cues in lily, and TgFT2 is considered to act as a florigen involved in the floral induction in tulip. The function of TgFT3 remains unclear, but, based on our observations and phylogenetic analysis, we propose a bulb-specific function for this gene.

Source
http://dx.doi.org/10.1093/pcp/pcx164
January 2018

Similarities between plant traits based on their connection to underlying gene functions.

PLoS One 2017 10;12(8):e0182097. Epub 2017 Aug 10.

Applied Bioinformatics, Wageningen University & Research, Droevendaalsesteeg 1, PB Wageningen, The Netherlands.

Understanding of phenotypes and their genetic basis is a major focus in current plant biology. Large amounts of phenotype data are being generated, both for macroscopic phenotypes such as size or yield, and for molecular phenotypes such as expression levels and metabolite levels. More insight into the underlying genetic and molecular mechanisms that influence phenotypes will enable a better understanding of how various phenotypes are related to each other. This will be a major step forward in understanding plant biology, with immediate value for plant breeding and academic plant research. Currently, however, the genetic basis of most phenotypes remains to be discovered, and the relatedness of different traits is unclear. We here present a novel approach to connect phenotypes to underlying biological processes and molecular functions. These connections define similarities between different types of phenotypes. The approach starts by using Quantitative Trait Locus (QTL) data, which are abundantly available for many phenotypes of interest. Overrepresentation analysis of gene functions based on Gene Ontology term enrichment across multiple QTL regions for a given phenotype, be it macroscopic or molecular, results in a small set of biological processes and molecular functions for each phenotype. Subsequently, similarity between different phenotypes can be defined in terms of these gene functions. Using publicly available rice data as an example, a close relationship with defined molecular phenotypes is demonstrated for many macroscopic phenotypes. This includes for example a link between 'leaf senescence' and 'aspartic acid', as well as between 'days to maturity' and 'choline'. Relationships between macroscopic and molecular phenotypes may result in more efficient marker-assisted breeding and are likely to direct future research aimed at a better understanding of plant phenotypes.
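
The overrepresentation step mentioned in this abstract can be illustrated with a one-sided hypergeometric test. This is a minimal sketch and not the authors' implementation: the abstract does not state which statistical test was used, and the function name `enrichment_p_value` and the toy counts in the usage lines are assumptions made purely for illustration.

```python
from math import comb


def enrichment_p_value(n_genome: int, n_annotated: int,
                       n_region: int, n_hits: int) -> float:
    """One-sided hypergeometric tail probability P(X >= n_hits).

    n_genome    -- total number of genes considered
    n_annotated -- genes in the genome annotated with the GO term
    n_region    -- genes located in the QTL regions of the trait
    n_hits      -- genes in the QTL regions annotated with the GO term
    """
    total = comb(n_genome, n_region)
    upper = min(n_annotated, n_region)
    tail = sum(
        comb(n_annotated, k) * comb(n_genome - n_annotated, n_region - k)
        for k in range(n_hits, upper + 1)
    )
    return tail / total


# Hypothetical counts, chosen only to show the call signature.
p = enrichment_p_value(n_genome=1000, n_annotated=50, n_region=100, n_hits=12)
print(f"p-value for the toy example: {p:.3g}")
```

In practice such a test would be repeated for every GO term, so a multiple-testing correction would normally follow; that step is omitted here.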

Source
http://journals.plos.org/plosone/article?id=10.1371/journal.pone.0182097
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC5552327
October 2017

Sequence-based analysis of protein degradation rates.

Proteins 2017 Sep 10;85(9):1593-1601. Epub 2017 Jun 10.

Bioinformatics Group, Wageningen University, Wageningen, The Netherlands.

Protein turnover is a key aspect of cellular homeostasis and proteome dynamics. However, there is little consensus on which properties of a protein determine its lifetime in the cell. In this work, we exploit two reliable datasets of experimental protein degradation rates to learn models and uncover determinants of protein degradation, with particular focus on properties that can be derived from the sequence. Our work shows that simple sequence features suffice to obtain predictive models of which the output correlates reasonably well with the experimentally measured values. We also show that intrinsic disorder may have a larger effect than previously reported, and that the effect of PEST regions, long thought to act as specific degradation signals, can be better explained by their disorder. We also find that determinants of protein degradation depend on the cell types or experimental conditions studied. This analysis serves as a first step towards the development of more complex, mature computational models of degradation of proteins and eventually of their full life cycle.
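
As an illustration of what a "simple sequence feature" can look like, the sketch below computes the amino acid composition of a protein sequence. The choice of feature, the function name `composition_features` and the example sequence are assumptions; the abstract does not list the features the authors actually used, and disorder or PEST-region features are not computed here.

```python
from collections import Counter

# The 20 standard amino acids in one-letter code.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def composition_features(sequence: str) -> dict:
    """Return the fraction of each standard amino acid in the sequence."""
    seq = sequence.upper()
    if not seq:
        raise ValueError("sequence must not be empty")
    counts = Counter(seq)
    return {aa: counts.get(aa, 0) / len(seq) for aa in AMINO_ACIDS}


# Hypothetical sequence, used only to show the call.
features = composition_features("MKTAYIAKQR")
print(features["K"])
```

A feature vector of this kind could then be passed to any regression model together with measured degradation rates.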

Source
http://dx.doi.org/10.1002/prot.25323
September 2017

Floral pathway integrator gene expression mediates gradual transmission of environmental and endogenous cues to flowering time.

PeerJ 2017 19;5:e3197. Epub 2017 Apr 19.

Biometris, Department for Mathematical and Statistical Methods, Wageningen University, Wageningen, The Netherlands.

The appropriate timing of flowering is crucial for the reproductive success of plants. Hence, intricate genetic networks integrate various environmental and endogenous cues such as temperature or hormonal status. These signals integrate into a network of floral pathway integrator genes. At a quantitative level, it is currently unclear how the impact of genetic variation in signaling pathways on flowering time is mediated by floral pathway integrator genes. Here, using datasets available from literature, we connect flowering time in genetic backgrounds varying in upstream signalling components with the expression levels of floral pathway integrator genes in these genetic backgrounds. Our modelling results indicate that flowering time depends in a quite linear way on expression levels of floral pathway integrator genes. This gradual, proportional response of flowering time to upstream changes enables a gradual adaptation to changing environmental factors such as temperature and light.

Source
http://dx.doi.org/10.7717/peerj.3197
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC5399868
April 2017

An interactomics overview of the human and bovine milk proteome over lactation.

Proteome Sci 2016 5;15. Epub 2017 Jan 5.

Dairy Science and Technology, Food Quality and Design Group, Wageningen University, Postbox 8129, 6700EV Wageningen, The Netherlands.

Background: Milk is the most important food for growth and development of the neonate, because of its nutrient composition and presence of many bioactive proteins. Differences between human and bovine milk in low abundant proteins have not been extensively studied. To better understand the differences between human and bovine milk, the qualitative and quantitative differences in the milk proteome as well as their changes over lactation were compared using both label-free and labelled proteomics techniques. These datasets were analysed and compared, to better understand the role of milk proteins in development of the newborn.

Methods: Human and bovine milk samples were prepared by using filter-aided sample preparation (FASP) combined with dimethyl labelling and analysed by nano LC LTQ-Orbitrap XL mass spectrometry.

Results: The human and bovine milk proteomes show similarities with regard to the distribution over biological functions, especially the dominant presence of enzymes, transport and immune-related proteins. At a quantitative level, the human and bovine milk proteomes differed not only between species but also over lactation within species. Dominant enzymes that differed between species were those assisting in nutrient digestion, with bile salt-activated lipase being abundant in human milk and pancreatic ribonuclease being abundant in bovine milk. As lactation advances, immune-related proteins decreased more slowly in human milk than in bovine milk. Notwithstanding these quantitative differences, analysis of human and bovine co-expression networks and protein-protein interaction networks indicated that a subset of milk proteins displayed highly similar interactions in each of the different networks, which may be related to the general importance of milk in nutrition and healthy development of the newborn.

Conclusions: Our findings promote a better understanding of the differences and similarities in dynamics of human and bovine milk proteins, thereby also providing guidance for further improvement of infant formula.

Source
http://dx.doi.org/10.1186/s12953-016-0110-0
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC5267443
January 2017

Selected proceedings of Machine Learning in Systems Biology: MLSB 2016.

BMC Bioinformatics 2016 Dec 13;17(Suppl 16):437. Epub 2016 Dec 13.

Department of Computer Science, Aalto University, 00076, Aalto, Finland.

Source
http://dx.doi.org/10.1186/s12859-016-1305-1
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC5249013
December 2016

Cross-Family Transcription Factor Interactions: An Additional Layer of Gene Regulation.

Trends Plant Sci 2017 01 1;22(1):66-80. Epub 2016 Nov 1.

Wageningen University and Research, Bioscience, Plant Developmental Systems, Wageningen, The Netherlands; Wageningen University and Research, Laboratory of Molecular Biology, Wageningen, The Netherlands.

Specific and dynamic gene expression strongly depends on transcription factor (TF) activity and most plant TFs function in a combinatorial fashion. They can bind to DNA and control the expression of the corresponding gene in an additive fashion or cooperate by physical interactions, forming larger protein complexes. The importance of protein-protein interactions between members of a particular plant TF family has long been recognised; however, a significant number of interfamily TF interactions has recently been reported. The biological implications and the molecular mechanisms involved in cross-family interactions have now started to be elucidated and the examples illustrate potential roles in the bridging of biological processes. Hence, cross-family TF interactions expand the molecular toolbox for plants with additional mechanisms to control and fine-tune robust gene expression patterns and to adapt to their continuously changing environment.

Source
http://dx.doi.org/10.1016/j.tplants.2016.10.007
January 2017

Distribution, position and genomic characteristics of crossovers in tomato recombinant inbred lines derived from an interspecific cross between Solanum lycopersicum and Solanum pimpinellifolium.

Plant J 2017 02 3;89(3):554-564. Epub 2017 Feb 3.

Business Unit of Bioscience, Cluster Applied Bioinformatics, Wageningen University and Research, Droevendaalsesteeg 1, 6708 PB, Wageningen, The Netherlands.

We determined the crossover (CO) distribution, frequency and genomic sequences involved in interspecies meiotic recombination by using parent-assigned variants of 52 F recombinant inbred lines obtained from a cross between tomato, Solanum lycopersicum, and its wild relative, Solanum pimpinellifolium. The interspecific CO frequency was 80% lower than reported for intraspecific tomato crosses. We detected regions showing a relatively high and low CO frequency, so-called hot and cold regions. Cold regions coincide to a large extent with the heterochromatin, although we found a limited number of smaller cold regions in the euchromatin. The CO frequency was higher at the distal ends of chromosomes than in pericentromeric regions and higher in short arm euchromatin. Hot regions of CO were detected in euchromatin, and COs were more often located in non-coding regions near the 5' untranslated region of genes than expected by chance. Besides overrepresented CCN repeats, we detected poly-A/T and AT-rich motifs enriched in 1-kb promoter regions flanking the CO sites. The most abundant sequence motifs at CO sites share weak similarity to transcription factor-binding sites, such as for the C2H2 zinc finger factors class and MADS box factors, while InterPro scans detected enrichment for genes possibly involved in the repair of DNA breaks.

Source
http://dx.doi.org/10.1111/tpj.13406
February 2017

An expanded evaluation of protein function prediction methods shows an improvement in accuracy.

Authors:
Yuxiang Jiang Tal Ronnen Oron Wyatt T Clark Asma R Bankapur Daniel D'Andrea Rosalba Lepore Christopher S Funk Indika Kahanda Karin M Verspoor Asa Ben-Hur Da Chen Emily Koo Duncan Penfold-Brown Dennis Shasha Noah Youngs Richard Bonneau Alexandra Lin Sayed M E Sahraeian Pier Luigi Martelli Giuseppe Profiti Rita Casadio Renzhi Cao Zhaolong Zhong Jianlin Cheng Adrian Altenhoff Nives Skunca Christophe Dessimoz Tunca Dogan Kai Hakala Suwisa Kaewphan Farrokh Mehryary Tapio Salakoski Filip Ginter Hai Fang Ben Smithers Matt Oates Julian Gough Petri Törönen Patrik Koskinen Liisa Holm Ching-Tai Chen Wen-Lian Hsu Kevin Bryson Domenico Cozzetto Federico Minneci David T Jones Samuel Chapman Dukka Bkc Ishita K Khan Daisuke Kihara Dan Ofer Nadav Rappoport Amos Stern Elena Cibrian-Uhalte Paul Denny Rebecca E Foulger Reija Hieta Duncan Legge Ruth C Lovering Michele Magrane Anna N Melidoni Prudence Mutowo-Meullenet Klemens Pichler Aleksandra Shypitsyna Biao Li Pooya Zakeri Sarah ElShal Léon-Charles Tranchevent Sayoni Das Natalie L Dawson David Lee Jonathan G Lees Ian Sillitoe Prajwal Bhat Tamás Nepusz Alfonso E Romero Rajkumar Sasidharan Haixuan Yang Alberto Paccanaro Jesse Gillis Adriana E Sedeño-Cortés Paul Pavlidis Shou Feng Juan M Cejuela Tatyana Goldberg Tobias Hamp Lothar Richter Asaf Salamov Toni Gabaldon Marina Marcet-Houben Fran Supek Qingtian Gong Wei Ning Yuanpeng Zhou Weidong Tian Marco Falda Paolo Fontana Enrico Lavezzo Stefano Toppo Carlo Ferrari Manuel Giollo Damiano Piovesan Silvio C E Tosatto Angela Del Pozo José M Fernández Paolo Maietta Alfonso Valencia Michael L Tress Alfredo Benso Stefano Di Carlo Gianfranco Politano Alessandro Savino Hafeez Ur Rehman Matteo Re Marco Mesiti Giorgio Valentini Joachim W Bargsten Aalt D J van Dijk Branislava Gemovic Sanja Glisic Vladmir Perovic Veljko Veljkovic Nevena Veljkovic Danillo C Almeida-E-Silva Ricardo Z N Vencio Malvika Sharan Jörg Vogel Lakesh Kansakar Shanshan Zhang Slobodan Vucetic Zheng Wang Michael J E Sternberg Mark N Wass Rachael P Huntley Maria J Martin Claire O'Donovan Peter N Robinson Yves Moreau Anna Tramontano Patricia C Babbitt Steven E Brenner Michal Linial Christine A Orengo Burkhard Rost Casey S Greene Sean D Mooney Iddo Friedberg Predrag Radivojac

Genome Biol 2016 09 7;17(1):184. Epub 2016 Sep 7.

Department of Computer Science and Informatics, Indiana University, Bloomington, IN, USA.

Background: A major bottleneck in our understanding of the molecular underpinnings of life is the assignment of function to proteins. While molecular experiments provide the most reliable annotation of proteins, their relatively low throughput and restricted purview have led to an increasing role for computational function prediction. However, assessing methods for protein function prediction and tracking progress in the field remain challenging.

Results: We conducted the second critical assessment of functional annotation (CAFA), a timed challenge to assess computational methods that automatically assign protein function. We evaluated 126 methods from 56 research groups for their ability to predict biological functions using Gene Ontology and gene-disease associations using Human Phenotype Ontology on a set of 3681 proteins from 18 species. CAFA2 featured expanded analysis compared with CAFA1, with regards to data set size, variety, and assessment metrics. To review progress in the field, the analysis compared the best methods from CAFA1 to those of CAFA2.

Conclusions: The top-performing methods in CAFA2 outperformed those from CAFA1. This increased accuracy can be attributed to a combination of the growing number of experimental annotations and improved methods for function prediction. The assessment also revealed that the definition of top-performing algorithms is ontology specific, that different performance metrics can be used to probe the nature of accurate predictions, and the relative diversity of predictions in the biological process and human phenotype ontologies. While there was methodological improvement between CAFA1 and CAFA2, the interpretation of results and usefulness of individual methods remain context-dependent.

Source
http://dx.doi.org/10.1186/s13059-016-1037-6
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC5015320
September 2016

Towards recommendations for metadata and data handling in plant phenotyping.

J Exp Bot 2015 Sep 4;66(18):5417-27. Epub 2015 Jun 4.

Leibniz Institute of Plant Genetics and Crop Plant Research (IPK), OT Gatersleben, Corrensstrasse 3, D-06466 Stadt Seeland, Germany.

Recent methodological developments in plant phenotyping, as well as the growing importance of its applications in plant science and breeding, are resulting in a fast accumulation of multidimensional data. There is great potential for expediting both discovery and application if these data are made publicly available for analysis. However, collection and storage of phenotypic observations is not yet sufficiently governed by standards that would ensure interoperability among data providers and precisely link specific phenotypes and associated genomic sequence information. This lack of standards is mainly a result of a large variability of phenotyping protocols, the multitude of phenotypic traits that are measured, and the dependence of these traits on the environment. This paper discusses the current situation of standardization in the area of phenomics, points out the problems and shortages, and presents the areas that would benefit from improvement in this field. In addition, the foundations of the work that could revise the situation are proposed, and practical solutions developed by the authors are introduced.

Source
http://dx.doi.org/10.1093/jxb/erv271
September 2015

A quantitative and dynamic model of the Arabidopsis flowering time gene regulatory network.

PLoS One 2015 26;10(2):e0116973. Epub 2015 Feb 26.

Bioscience, Plant Research International, Wageningen UR, Wageningen, The Netherlands; Biometris, Wageningen UR, Wageningen, The Netherlands; Netherlands Consortium for Systems Biology, Amsterdam, The Netherlands.

Various environmental signals integrate into a network of floral regulatory genes leading to the final decision on when to flower. Although a wealth of qualitative knowledge is available on how flowering time genes regulate each other, only a few studies incorporated this knowledge into predictive models. Such models are invaluable as they make it possible to investigate how various types of inputs are combined to give a quantitative readout. To investigate the effect of gene expression disturbances on flowering time, we developed a dynamic model for the regulation of flowering time in Arabidopsis thaliana. Model parameters were estimated based on expression time-courses for relevant genes, and a consistent set of flowering times for plants of various genetic backgrounds. Validation was performed by predicting changes in expression level in mutant backgrounds and comparing these predictions with independent expression data, and by comparison of predicted and experimental flowering times for several double mutants. Remarkably, the model predicts that a disturbance in a particular gene does not necessarily have the largest impact on directly connected genes. For example, the model predicts that SUPPRESSOR OF OVEREXPRESSION OF CONSTANS (SOC1) mutation has a larger impact on APETALA1 (AP1), which is not directly regulated by SOC1, compared to its effect on LEAFY (LFY) which is under direct control of SOC1. This was confirmed by expression data. Another model prediction involves the importance of cooperativity in the regulation of APETALA1 (AP1) by LFY, a prediction supported by experimental evidence. In conclusion, our model for flowering time gene regulation makes it possible to address how different quantitative inputs are combined into one quantitative output, flowering time.
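
Cooperativity in the regulation of one gene by another is commonly described with a Hill function inside an ordinary differential equation. The sketch below is a generic illustration of that idea and not the published model: the equation form, the parameter names (`beta`, `K`, `n`, `delta`) and every numeric value are assumptions, and the abstract does not give the equations the authors used.

```python
def hill_activation(x: float, beta: float, K: float, n: float) -> float:
    """Production rate of a target gene activated by a regulator at level x.

    beta -- maximal production rate (hypothetical parameter)
    K    -- regulator level giving half-maximal production
    n    -- Hill coefficient; n > 1 models cooperative regulation
    """
    return beta * x**n / (K**n + x**n)


def simulate_target(regulator_level: float, beta: float, K: float, n: float,
                    delta: float, t_end: float = 50.0, dt: float = 0.01,
                    y0: float = 0.0) -> float:
    """Forward-Euler integration of dy/dt = hill_activation(x) - delta * y
    for a constant regulator level x; returns y at time t_end."""
    y = y0
    production = hill_activation(regulator_level, beta, K, n)
    for _ in range(int(round(t_end / dt))):
        y += dt * (production - delta * y)
    return y


# Compare a non-cooperative (n = 1) and a cooperative (n = 4) response
# to the same low regulator level; all values are illustrative only.
for n in (1, 4):
    level = simulate_target(regulator_level=0.5, beta=1.0, K=1.0, n=n, delta=1.0)
    print(f"n = {n}: target level at t_end = {level:.4f}")
```

With a larger Hill coefficient the target responds more sharply around `K`, which is the qualitative behaviour usually meant by cooperativity.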

Source
http://journals.plos.org/plosone/article?id=10.1371/journal.pone.0116973
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC4342252
January 2016

Prioritization of candidate genes in QTL regions based on associations between traits and biological processes.

BMC Plant Biol 2014 Dec 10;14:330. Epub 2014 Dec 10.

Background: Elucidation of genotype-to-phenotype relationships is a major challenge in biology. In plants, it is the basis for molecular breeding. Quantitative Trait Locus (QTL) mapping makes it possible to link variation at the trait level to variation at the genomic level. However, QTL regions typically contain tens to hundreds of genes. In order to prioritize such candidate genes, we show that we can identify potentially causal genes for a trait based on overrepresentation of biological processes (gene functions) for the candidate genes in the QTL regions of that trait.

Results: The prioritization method was applied to rice QTL data, using gene functions predicted on the basis of sequence- and expression-information. The average reduction of the number of genes was over ten-fold. Comparison with various types of experimental datasets (including QTL fine-mapping and Genome Wide Association Study results) indicated both statistical significance and biological relevance of the obtained connections between genes and traits. A detailed analysis of flowering time QTLs illustrates that genes with completely unknown function are likely to play a role in this important trait.

Conclusions: Our approach can guide further experimentation and validation of causal genes for quantitative traits. This way it capitalizes on QTL data to uncover how individual genes influence trait variation.

Source
http://dx.doi.org/10.1186/s12870-014-0330-3
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC4274756
December 2014

Rice cytochrome P450 MAX1 homologs catalyze distinct steps in strigolactone biosynthesis.

Nat Chem Biol 2014 Dec 26;10(12):1028-33. Epub 2014 Oct 26.

[1] Laboratory of Plant Physiology, Wageningen University, Wageningen, the Netherlands. [2] Centre for Biosystems Genomics, Wageningen, the Netherlands.

Strigolactones (SLs) are a class of phytohormones and rhizosphere signaling compounds with high structural diversity. Three enzymes, carotenoid isomerase DWARF27 and carotenoid cleavage dioxygenases CCD7 and CCD8, were previously shown to convert all-trans-β-carotene to carlactone (CL), the SL precursor. However, how CL is metabolized to SLs has remained elusive. Here, by reconstituting the SL biosynthetic pathway in Nicotiana benthamiana, we show that a rice homolog of Arabidopsis More Axillary Growth 1 (MAX1) encodes a cytochrome P450 CYP711 subfamily member that acts as a CL oxidase to stereoselectively convert CL into ent-2'-epi-5-deoxystrigol (B-C lactone ring formation), the presumed precursor of rice SLs. A protein encoded by a second rice MAX1 homolog then catalyzes the conversion of ent-2'-epi-5-deoxystrigol to orobanchol. We therefore report that two members of the CYP711 enzymes can catalyze two distinct steps in SL biosynthesis, identifying the first enzymes involved in B-C ring closure and a subsequent structural diversification step of SLs.

Source
http://dx.doi.org/10.1038/nchembio.1660
December 2014