Publications by authors named "Neil Swainston"

56 Publications

DeepGraphMolGen, a multi-objective, computational strategy for generating molecules with desirable properties: a graph convolution and reinforcement learning approach.

J Cheminform 2020 Sep 4;12(1):53. Epub 2020 Sep 4.

Department of Biochemistry and Systems Biology, Institute of Systems, Molecular and Integrative Biology, University of Liverpool, Crown St, Liverpool, L69 7ZB, UK.

We address the problem of generating novel molecules with desired interaction properties as a multi-objective optimization problem. Interaction binding models are learned from binding data using graph convolution networks (GCNs). Since the experimentally obtained property scores are recognised as having potentially gross errors, we adopted a robust loss for the model. Combinations of these terms, including drug likeness and synthetic accessibility, are then optimized using reinforcement learning based on a graph convolution policy approach. Some of the molecules generated, while legitimate chemically, can have excellent drug-likeness scores but appear unusual. We provide an example based on the binding potency of small molecules to dopamine transporters. We extend our method successfully to use a multi-objective reward function, in this case for generating novel molecules that bind with dopamine transporters but not with those for norepinephrine. Our method should be generally applicable to the generation in silico of molecules with desirable properties.
DOI: http://dx.doi.org/10.1186/s13321-020-00454-3
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC7487898
September 2020
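The DeepGraphMolGen abstract mentions two technical ingredients: a robust loss for binding data that may contain gross errors, and a multi-objective reward (bind dopamine transporters, avoid norepinephrine transporters). Below is a minimal sketch of both ideas. It assumes the Huber loss as the robust loss (the abstract does not name one), and the reward function and its weights are hypothetical, not the paper's reward.

```python
"""Sketch: a robust (Huber) regression loss and a weighted selectivity reward.

Assumptions: Huber stands in for the unspecified "robust loss"; the reward
form and weights are hypothetical illustrations, not the published method.
"""


def mean_squared_error(y_true, y_pred):
    """Plain squared-error loss; a single gross outlier dominates it."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def huber_loss(y_true, y_pred, delta=1.0):
    """Mean Huber loss: quadratic for |residual| <= delta, linear beyond it."""
    total = 0.0
    for t, p in zip(y_true, y_pred):
        r = abs(t - p)
        if r <= delta:
            total += 0.5 * r * r
        else:
            total += delta * (r - 0.5 * delta)
    return total / len(y_true)


def selectivity_reward(dat_score, net_score, drug_likeness,
                       w_dat=1.0, w_net=1.0, w_qed=1.0):
    """Hypothetical reward: favour predicted DAT binding, penalise predicted
    NET binding, and add a drug-likeness term."""
    return w_dat * dat_score - w_net * net_score + w_qed * drug_likeness


if __name__ == "__main__":
    truth = [0.0, 0.0]
    preds = [0.5, 3.0]  # the second residual mimics a gross experimental error
    print(mean_squared_error(truth, preds))  # 4.625
    print(huber_loss(truth, preds))          # 1.3125
```

With one gross error present, the squared-error loss is dominated by that single point, while the Huber loss grows only linearly with it, which is the property that makes a robust loss attractive for noisy experimental scores.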

Deep learning and generative methods in cheminformatics and chemical biology: navigating small molecule space intelligently.

Biochem J 2020 12;477(23):4559-4580

Department of Biochemistry and Systems Biology, Institute of Systems, Molecular and Integrative Biology, Faculty of Health and Life Sciences, University of Liverpool, Crown St, Liverpool L69 7ZB, U.K.

The number of 'small' molecules that may be of interest to chemical biologists - chemical space - is enormous, but the fraction that have ever been made is tiny. Most strategies are discriminative, i.e. have involved 'forward' problems (have molecule, establish properties). However, we normally wish to solve the much harder generative or inverse problem (describe desired properties, find molecule). 'Deep' (machine) learning based on large-scale neural networks underpins technologies such as computer vision, natural language processing, driverless cars, and world-leading performance in games such as Go; it can also be applied to the solution of inverse problems in chemical biology. In particular, recent developments in deep learning admit the in silico generation of candidate molecular structures and the prediction of their properties, thereby allowing one to navigate (bio)chemical space intelligently. These methods are revolutionary but require an understanding of both (bio)chemistry and computer science to be exploited to best advantage. We give a high-level (non-mathematical) background to the deep learning revolution, and set out the crucial issue for chemical biology and informatics as a two-way mapping from the discrete nature of individual molecules to the continuous but high-dimensional latent representation that may best reflect chemical space. A variety of architectures can do this; we focus on a particular type known as variational autoencoders. We then provide some examples of recent successes of these kinds of approach, and a look towards the future.
DOI: http://dx.doi.org/10.1042/BCJ20200781
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC7733676
December 2020

Engineering towards production of gatekeeper (2S)-flavanones: naringenin, pinocembrin, eriodictyol and homoeriodictyol.

Synth Biol (Oxf) 2020 6;5(1):ysaa012. Epub 2020 Aug 6.

Manchester Synthetic Biology Research Centre for Fine and Speciality Chemicals (SYNBIOCHEM), Manchester Institute of Biotechnology and Department of Chemistry, The University of Manchester, Manchester M1 7DN, UK.

Natural plant-based flavonoids have drawn significant attention as dietary supplements due to their potential health benefits, including anti-cancer, anti-oxidant and anti-asthmatic activities. Naringenin, pinocembrin, eriodictyol and homoeriodictyol are classified as (2S)-flavanones, an important sub-group of naturally occurring flavonoids, with wide-reaching applications in human health and nutrition. These four compounds occupy a central position as branch point intermediates towards a broad spectrum of naturally occurring flavonoids. Here, we report the development of production chassis for each of these key gatekeeper flavonoids. Selection of key enzymes, genetic construct design and the optimization of process conditions resulted in the highest reported titers for naringenin (484 mg/l), improved production of pinocembrin (198 mg/l) and eriodictyol (55 mg/l from caffeic acid), and provided the first example of production of homoeriodictyol directly from glycerol (17 mg/l). This work provides a springboard for future production of diverse downstream natural and non-natural flavonoid targets.
DOI: http://dx.doi.org/10.1093/synbio/ysaa012
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC7644443
August 2020

Highly multiplexed, fast and accurate nanopore sequencing for verification of synthetic DNA constructs and sequence libraries.

Synth Biol (Oxf) 2019 29;4(1):ysz025. Epub 2019 Oct 29.

Manchester Centre for Synthetic Biology of Fine and Speciality Chemicals (SYNBIOCHEM), Manchester Institute of Biotechnology, The University of Manchester, Manchester M1 7DN, UK.

Synthetic biology utilizes the Design-Build-Test-Learn pipeline for the engineering of biological systems. Typically, this requires the construction of specifically designed, large and complex DNA assemblies. The availability of cheap DNA synthesis and automation enables high-throughput assembly approaches, which generates a heavy demand for DNA sequencing to verify correctly assembled constructs. Next-generation sequencing is ideally positioned to perform this task; however, because of expensive hardware costs and bespoke data analysis requirements, few laboratories utilize this technology in-house. Here a workflow for highly multiplexed sequencing is presented, capable of fast and accurate sequence verification of DNA assemblies using nanopore technology. A novel sample barcoding system using polymerase chain reaction is introduced, and sequencing data are analyzed through a bespoke analysis algorithm. Crucially, this algorithm overcomes the problem of high-error-rate nanopore data (which typically prevents identification of single nucleotide variants) through statistical analysis of strand bias, permitting accurate sequence analysis with single-base resolution. As an example, 576 constructs (6 × 96 well plates) were processed in a single workflow in 72 h (from colonies to analyzed data). Given its low hardware costs and highly multiplexed capability, our procedure provides cost-effective access to powerful DNA sequencing for any laboratory, with applications beyond synthetic biology including directed evolution, single nucleotide polymorphism analysis and gene synthesis.
DOI: http://dx.doi.org/10.1093/synbio/ysz025
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC7445882
October 2019
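The strand-bias idea in the nanopore abstract can be illustrated with a small sketch. It assumes reference and alternate base counts have already been tallied per strand at one position of a clonal construct, and it uses a two-sided Fisher exact test as one plausible strand-bias statistic. The thresholds and function names are hypothetical; this is not the published algorithm.

```python
"""Sketch: strand-bias filtering of a candidate variant in a nanopore pileup.

Assumptions: the 2x2 layout (strand x allele), the 0.8 alternate-allele
threshold and alpha = 0.05 are illustrative, not the published algorithm.
"""
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    n = row1 + row2

    def p_table(x):
        # Hypergeometric probability of the table with x in the top-left cell.
        return comb(row1, x) * comb(row2, col1 - x) / comb(n, col1)

    p_obs = p_table(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum all tables with the same margins that are no more likely than the observed one.
    total = sum(p for p in map(p_table, range(lo, hi + 1)) if p <= p_obs * (1 + 1e-9))
    return min(1.0, total)


def supported_variant(fwd_ref, fwd_alt, rev_ref, rev_alt,
                      min_alt_frac=0.8, alpha=0.05):
    """Accept an alternate base only if it dominates on BOTH strands
    and the strand-by-allele table shows no significant bias."""
    fwd_total, rev_total = fwd_ref + fwd_alt, rev_ref + rev_alt
    if fwd_total == 0 or rev_total == 0:
        return False  # strand bias cannot be assessed without reads on both strands
    if fwd_alt / fwd_total < min_alt_frac or rev_alt / rev_total < min_alt_frac:
        return False
    return fisher_exact_two_sided(fwd_ref, fwd_alt, rev_ref, rev_alt) >= alpha


if __name__ == "__main__":
    print(supported_variant(2, 48, 3, 47))   # True: alternate base on both strands
    print(supported_variant(2, 48, 45, 5))   # False: alternate base mostly on one strand
```

The working assumption behind this design is that a genuine variant in a clonal construct is supported by reads from both strands, whereas systematic sequencing errors tend to be strand-specific.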

SBML Level 3: an extensible format for the exchange and reuse of biological models.

Mol Syst Biol 2020 08;16(8):e9110

Computing and Mathematical Sciences, California Institute of Technology, Pasadena, CA, USA.

Systems biology has experienced dramatic growth in the number, size, and complexity of computational models. To reproduce simulation results and reuse models, researchers must exchange unambiguous model descriptions. We review the latest edition of the Systems Biology Markup Language (SBML), a format designed for this purpose. A community of modelers and software authors developed SBML Level 3 over the past decade. Its modular form consists of a core suited to representing reaction-based models and packages that extend the core with features suited to other model types including constraint-based models, reaction-diffusion models, logical network models, and rule-based models. The format leverages two decades of SBML and a rich software ecosystem that transformed how systems biologists build and interact with models. More recently, the rise of multiscale models of whole cells and organs, and new data sources such as single-cell measurements and live imaging, has precipitated new ways of integrating data with models. We provide our perspectives on the challenges presented by these developments and how SBML Level 3 provides the foundation needed to support this evolution.
DOI: http://dx.doi.org/10.15252/msb.20199110
August 2020
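To make the SBML Level 3 core description concrete, here is a minimal sketch that writes a Level 3 Version 2 core document with Python's standard `xml.etree.ElementTree`. The toy model (one compartment, species A and B, one irreversible reaction A -> B) is hypothetical; it contains no kinetic law and no package extensions, and real projects would more likely use libSBML and run its validator.

```python
"""Sketch: a minimal SBML Level 3 Version 2 core model built with the standard library.

Assumption: the model content is a hypothetical toy example, not taken from
any published model.
"""
import xml.etree.ElementTree as ET

SBML_NS = "http://www.sbml.org/sbml/level3/version2/core"


def _q(tag):
    """Qualify a tag name with the SBML Level 3 Version 2 core namespace."""
    return f"{{{SBML_NS}}}{tag}"


def toy_sbml_model():
    """Build a one-reaction (A -> B) SBML L3V2 core document and return it as a string."""
    ET.register_namespace("", SBML_NS)  # serialize SBML as the default namespace
    sbml = ET.Element(_q("sbml"), {"level": "3", "version": "2"})
    model = ET.SubElement(sbml, _q("model"), {"id": "toy_model"})

    compartments = ET.SubElement(model, _q("listOfCompartments"))
    ET.SubElement(compartments, _q("compartment"),
                  {"id": "cell", "size": "1", "constant": "true"})

    species = ET.SubElement(model, _q("listOfSpecies"))
    for sid, conc in (("A", "10"), ("B", "0")):
        ET.SubElement(species, _q("species"), {
            "id": sid, "compartment": "cell", "initialConcentration": conc,
            "hasOnlySubstanceUnits": "false", "boundaryCondition": "false",
            "constant": "false"})

    reactions = ET.SubElement(model, _q("listOfReactions"))
    reaction = ET.SubElement(reactions, _q("reaction"),
                             {"id": "A_to_B", "reversible": "false"})
    for list_tag, sid in (("listOfReactants", "A"), ("listOfProducts", "B")):
        refs = ET.SubElement(reaction, _q(list_tag))
        ET.SubElement(refs, _q("speciesReference"),
                      {"species": sid, "stoichiometry": "1", "constant": "true"})
    return ET.tostring(sbml, encoding="unicode")


if __name__ == "__main__":
    print(toy_sbml_model())
```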

VAE-Sim: A Novel Molecular Similarity Measure Based on a Variational Autoencoder.

Molecules 2020 Jul 29;25(15). Epub 2020 Jul 29.

Department of Biochemistry and Systems Biology, Institute of Systems, Molecular and Integrative Biology, University of Liverpool, Crown St, Liverpool L69 7ZB, UK.

Molecular similarity is an elusive but core "unsupervised" cheminformatics concept, yet different "fingerprint" encodings of molecular structures return very different similarity values, even when using the same similarity metric. Each encoding may be of value when applied to other problems with objective or target functions, implying that a priori none are "better" than the others, nor than encoding-free metrics such as maximum common substructure (MCSS). We here introduce a novel approach to molecular similarity, in the form of a variational autoencoder (VAE). This learns the conditional distribution p(z|x), where z is a latent vector and x are the (same) input/output data. It takes the form of a "bowtie"-shaped artificial neural network. In the middle is a "bottleneck layer" or latent vector in which inputs are transformed into, and represented as, a vector of numbers (encoding), with a reverse process (decoding) seeking to return the SMILES string that was the input. We train a VAE on over six million druglike molecules and natural products (including over one million in the final holdout set). The VAE vector distances provide a novel metric for molecular similarity that is both easy and rapid to calculate. We describe the method and its application to a typical similarity problem in cheminformatics.
DOI: http://dx.doi.org/10.3390/molecules25153446
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC7435890
July 2020
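The VAE-Sim abstract describes similarity as a distance between latent vectors. A minimal sketch of that last step follows. It assumes the latent vectors have already been produced by a trained encoder (the vectors below are made-up stand-ins), and it uses 1 / (1 + Euclidean distance) as one illustrative way to turn a distance into a similarity score, not necessarily the paper's formula.

```python
"""Sketch: ranking molecules by distance between VAE latent vectors.

Assumptions: latent vectors are hypothetical stand-ins for encoder output;
the distance-to-similarity mapping is an illustrative choice.
"""
import math


def euclidean(u, v):
    """Euclidean distance between two equal-length latent vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def latent_similarity(u, v):
    """Map distance into (0, 1]; identical vectors score 1.0."""
    return 1.0 / (1.0 + euclidean(u, v))


def rank_by_similarity(query, library):
    """Return library keys ordered from most to least similar to the query vector."""
    return sorted(library,
                  key=lambda name: latent_similarity(query, library[name]),
                  reverse=True)


if __name__ == "__main__":
    library = {"mol_a": [0.1, 0.2, 0.0],
               "mol_b": [2.0, -1.0, 0.5],
               "mol_c": [0.0, 0.3, 0.1]}
    print(rank_by_similarity([0.0, 0.25, 0.05], library))  # ['mol_c', 'mol_a', 'mol_b']
```

Once molecules are encoded, each comparison is just a vector distance, which is why this kind of metric is cheap to compute over large libraries.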

Rapid prototyping of microbial production strains for the biomanufacture of potential materials monomers.

Metab Eng 2020 07 23;60:168-182. Epub 2020 Apr 23.

Manchester Centre for Synthetic Biology of Fine and Speciality Chemicals (SYNBIOCHEM), Manchester Institute of Biotechnology, The University of Manchester, Manchester, M1 7DN, UK; Department of Chemistry, The University of Manchester, Manchester, M13 9PL, UK.

Bio-based production of industrial chemicals using synthetic biology can provide alternative green routes from renewable resources, allowing for cleaner production processes. To efficiently produce chemicals on-demand through microbial strain engineering, biomanufacturing foundries have developed automated pipelines that are largely compound agnostic in their time to delivery. Here we benchmark the capabilities of a biomanufacturing pipeline to enable rapid prototyping of microbial cell factories for the production of chemically diverse industrially relevant material building blocks. Over 85 days the pipeline was able to produce 17 potential material monomers and key intermediates by combining 160 genetic parts into 115 unique biosynthetic pathways. To explore the scale-up potential of our prototype production strains, we optimized the enantioselective production of mandelic acid and hydroxymandelic acid, achieving gram-scale production in fed-batch fermenters. The high success rate in the rapid design and prototyping of microbially-produced material building blocks reveals the potential role of biofoundries in leading the transition to sustainable materials production.
DOI: http://dx.doi.org/10.1016/j.ymben.2020.04.008
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC7225752
July 2020

The RESOLUTE consortium: unlocking SLC transporters for drug discovery.

Authors:
Giulio Superti-Furga Daniel Lackner Tabea Wiedmer Alvaro Ingles-Prieto Barbara Barbosa Enrico Girardi Ulrich Goldmann Bettina Gürtl Kristaps Klavins Christoph Klimek Sabrina Lindinger Eva Liñeiro-Retes André C Müller Svenja Onstein Gregor Redinger Daniela Reil Vitaly Sedlyarov Gernot Wolf Matthew Crawford Robert Everley David Hepworth Shenping Liu Stephen Noell Mary Piotrowski Robert Stanton Hui Zhang Salvatore Corallino Andrea Faedo Maria Insidioso Giovanna Maresca Loredana Redaelli Francesca Sassone Lia Scarabottolo Michela Stucchi Paola Tarroni Sara Tremolada Helena Batoulis Andreas Becker Eckhard Bender Yung-Ning Chang Alexander Ehrmann Anke Müller-Fahrnow Vera Pütter Diana Zindel Bradford Hamilton Martin Lenter Diana Santacruz Coralie Viollet Charles Whitehurst Kai Johnsson Philipp Leippe Birgit Baumgarten Lena Chang Yvonne Ibig Martin Pfeifer Jürgen Reinhardt Julian Schönbett Paul Selzer Klaus Seuwen Charles Bettembourg Bruno Biton Jörg Czech Hélène de Foucauld Michel Didier Thomas Licher Vincent Mikol Antje Pommereau Frédéric Puech Veeranagouda Yaligara Aled Edwards Brandon J Bongers Laura H Heitman Ad P IJzerman Huub J Sijben Gerard J P van Westen Justine Grixti Douglas B Kell Farah Mughal Neil Swainston Marina Wright-Muelas Tina Bohstedt Nicola Burgess-Brown Liz Carpenter Katharina Dürr Jesper Hansen Andreea Scacioc Giulia Banci Claire Colas Daniela Digles Gerhard Ecker Barbara Füzi Viktoria Gamsjäger Melanie Grandits Riccardo Martini Florentina Troger Patrick Altermatt Cédric Doucerain Franz Dürrenberger Vania Manolova Anna-Lena Steck Hanna Sundström Maria Wilhelm Claire M Steppan

Nat Rev Drug Discov 2020 07;19(7):429-430

DOI: http://dx.doi.org/10.1038/d41573-020-00056-6
July 2020

An automated pipeline for the screening of diverse monoterpene synthase libraries.

Sci Rep 2019 08 15;9(1):11936. Epub 2019 Aug 15.

Manchester Synthetic Biology Research Centre for Fine and Speciality Chemicals (SYNBIOCHEM), Manchester Institute of Biotechnology and School of Chemistry, University of Manchester, Manchester, United Kingdom.

Monoterpenoids are a structurally diverse group of natural products with applications as pharmaceuticals, flavourings, fragrances, pesticides, and biofuels. Recent advances in synthetic biology offer new routes to this chemical diversity through the introduction of heterologous isoprenoid production pathways into engineered microorganisms. Due to the nature of the branched reaction mechanism, monoterpene synthases often produce multiple products when expressed in monoterpenoid production platforms. Rational engineering of terpene synthases is challenging due to a lack of correlation between protein sequence and cyclisation reaction catalysed. Directed evolution offers an attractive alternative protein engineering strategy as limited prior sequence-function knowledge is required. However, directed evolution of terpene synthases is hampered by the lack of a convenient high-throughput screening assay for the detection of multiple volatile terpene products. Here we applied an automated pipeline for the screening of diverse monoterpene synthase libraries, employing robotic liquid handling platforms coupled to GC-MS, and automated data extraction. We used the pipeline to screen pinene synthase variant libraries, with mutations in three areas of plasticity, capable of producing multiple monoterpene products. We successfully identified variants with altered product profiles and demonstrated good agreement between the results of the automated screen and traditional shake-flask cultures. In addition, useful insights into the cyclisation reaction catalysed by pinene synthase were obtained, including the identification of positions with the highest level of plasticity, and the significance of region 2 in carbocation cyclisation. The results obtained will aid the prediction and design of novel terpene synthase activities towards clean monoterpenoid products.
DOI: http://dx.doi.org/10.1038/s41598-019-48452-2
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6695433
August 2019

GeneORator: An Effective Strategy for Navigating Protein Sequence Space More Efficiently through Boolean OR-Type DNA Libraries.

ACS Synth Biol 2019 06 7;8(6):1371-1378. Epub 2019 Jun 7.

Manchester Centre for Synthetic Biology of Fine and Speciality Chemicals (SYNBIOCHEM), Manchester Institute of Biotechnology, The University of Manchester, Manchester M1 7DN, United Kingdom.

Directed evolution requires the creation of genetic diversity and subsequent screening or selection for improved variants. For DNA mutagenesis, conventional site-directed methods implicitly utilize the Boolean AND operator (creating all mutations simultaneously), producing a combinatorial explosion in the number of genetic variants as the number of mutations increases. We introduce GeneORator, a novel strategy for creating DNA libraries based on the Boolean logical OR operator. Here, a single library is divided into many subsets, each containing different combinations of the desired mutations. Consequently, the effect of adding more mutations on the number of genetic combinations is additive (Boolean OR logic) rather than exponential (AND logic). We demonstrate this strategy with large-scale mutagenesis studies, using monoamine oxidase-N (Aspergillus niger) as the exemplar target. First, we mutated every residue in the secondary structure-containing regions (276 out of a total of 495 amino acids) to screen for improvements in kcat. Second, combinatorial OR-type libraries permitted screening of diverse mutation combinations in the enzyme active site to detect activity toward novel substrates. In both examples, OR-type libraries reduced the number of variants searched by up to 10-fold, dramatically reducing the screening effort required to discover variants with improved and/or novel activity. Importantly, this approach enables the screening of a greater diversity of mutation combinations, accessing a larger area of a protein's sequence space. OR-type libraries can be applied to any biological engineering objective requiring DNA mutagenesis, and the approach has wide-ranging applications in, for example, enzyme engineering, antibody engineering, and synthetic biology.
DOI: http://dx.doi.org/10.1021/acssynbio.9b00063
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC7007284
June 2019
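The AND-versus-OR argument in the GeneORator abstract is essentially arithmetic. A minimal sketch under a simplified counting model (assumed here, not taken from the paper): n targeted sites, each with the same number of alternative residues; the AND library varies every site at once, while the OR library is reduced to single-site variants plus the wild type.

```python
"""Sketch: library sizes for AND-type vs OR-type mutagenesis (simplified model).

Assumptions: equal numbers of alternatives per site; the OR-type library is
modelled as single-site variants only, whereas the published GeneORator
subsets also combine mutations.
"""


def and_library_size(n_sites, alternatives_per_site):
    """All sites varied simultaneously: each site is wild type or any alternative."""
    return (alternatives_per_site + 1) ** n_sites


def or_library_size(n_sites, alternatives_per_site):
    """One site varied per member (Boolean OR over sites), plus the wild type."""
    return n_sites * alternatives_per_site + 1


if __name__ == "__main__":
    for n in (1, 2, 5, 10):
        print(n, and_library_size(n, 19), or_library_size(n, 19))
```

With 19 alternatives at each of 5 sites, the AND count is 20^5 = 3,200,000 while the single-site OR count is 5 × 19 + 1 = 96. Because real OR-type subsets also carry combinations of mutations, their sizes would be expected to lie between these two extremes.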

Machine Learning of Designed Translational Control Allows Predictive Pathway Optimization in Escherichia coli.

ACS Synth Biol 2019 01 7;8(1):127-136. Epub 2019 Jan 7.

Manchester Synthetic Biology Research Centre for Fine and Speciality Chemicals (SYNBIOCHEM), Manchester Institute of Biotechnology and School of Chemistry, University of Manchester, Manchester M1 7DN, United Kingdom.

The field of synthetic biology aims to make the design of biological systems predictable, shrinking the huge design space to practical numbers for testing. When designing microbial cell factories, most optimization efforts have focused on enzyme and strain selection/engineering, pathway regulation, and process development. In silico tools for the predictive design of bacterial ribosome binding sites (RBSs) and RBS libraries now allow translational tuning of biochemical pathways; however, methods for predicting optimal RBS combinations in multigene pathways are desirable. Here we present the implementation of machine learning algorithms to model the RBS sequence-phenotype relationship from representative subsets of large combinatorial RBS libraries allowing the accurate prediction of optimal high-producers. Applied to a recombinant monoterpenoid production pathway in Escherichia coli, our approach was able to boost production titers by over 60% when screening under 3% of a library. To facilitate library screening, a multiwell plate fermentation procedure was developed, allowing increased screening throughput with sufficient resolution to discriminate between high and low producers. High producers from one library did not translate during scale-up, but the reduced screening requirements allowed rapid rescreening at the larger scale. This methodology is potentially compatible with any biochemical pathway and provides a powerful tool toward predictive design of bacterial production chassis.
DOI: http://dx.doi.org/10.1021/acssynbio.8b00398
January 2019
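The RBS abstract describes learning a sequence-phenotype model from a screened subset of a combinatorial library and using it to predict the best combination. The sketch below swaps in a deliberately simple technique, an additive main-effects model, in place of the paper's machine-learning algorithms. The RBS labels and titers are hypothetical.

```python
"""Sketch: predicting the best RBS combination from a screened library subset.

Assumption: an additive main-effects model (each gene's RBS choice shifts the
titer independently) stands in for the paper's machine-learning models.
"""
from itertools import product
from statistics import mean


def fit_additive(screened):
    """screened: list of (design, titer); design[i] is the RBS label used for gene i."""
    overall = mean(titer for _, titer in screened)
    effects = []
    for i in range(len(screened[0][0])):
        by_label = {}
        for design, titer in screened:
            by_label.setdefault(design[i], []).append(titer)
        # Effect of an RBS label = its mean titer minus the overall mean.
        effects.append({label: mean(v) - overall for label, v in by_label.items()})
    return overall, effects


def predict(model, design):
    overall, effects = model
    # Labels never seen in the screened subset contribute no effect.
    return overall + sum(effects[i].get(label, 0.0) for i, label in enumerate(design))


def best_design(model, labels_per_gene):
    """Score every combination in the full library and return the top prediction."""
    return max(product(*labels_per_gene), key=lambda d: predict(model, d))


if __name__ == "__main__":
    screened = [(("A", "X"), 10), (("A", "Y"), 20), (("B", "X"), 30), (("B", "Y"), 40)]
    model = fit_additive(screened)
    print(best_design(model, [["A", "B"], ["X", "Y"]]))  # ('B', 'Y')
```

A main-effects model ignores interactions between genes and gives biased effect estimates when the screened subset is unbalanced, which is one reason more flexible learners are attractive for real pathway libraries.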

An automated Design-Build-Test-Learn pipeline for enhanced microbial production of fine chemicals.

Commun Biol 2018 8;1:66. Epub 2018 Jun 8.

Manchester Centre for Synthetic Biology of Fine and Speciality Chemicals (SYNBIOCHEM), Manchester Institute of Biotechnology, The University of Manchester, Manchester, M1 7DN, UK.

The microbial production of fine chemicals provides a promising biosustainable manufacturing solution that has led to the successful production of a growing catalog of natural products and high-value chemicals. However, development at industrial levels has been hindered by the large resource investments required. Here we present an integrated Design-Build-Test-Learn (DBTL) pipeline for the discovery and optimization of biosynthetic pathways, which is designed to be compound agnostic and automated throughout. We initially applied the pipeline to the production of the flavonoid (2S)-pinocembrin to demonstrate rapid iterative DBTL cycling with automation at every stage. In this case, application of two DBTL cycles successfully established a production pathway improved by 500-fold, with competitive titers up to 88 mg L⁻¹. The further application of the pipeline to optimize an alkaloid pathway demonstrates how it could facilitate the rapid optimization of microbial strains for production of any chemical compound of interest.
DOI: http://dx.doi.org/10.1038/s42003-018-0076-9
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6123781
June 2018

Fast and Flexible Synthesis of Combinatorial Libraries for Directed Evolution.

Methods Enzymol 2018 24;608:59-79. Epub 2018 May 24.

School of Chemistry, Faculty of Science and Engineering, Manchester Synthetic Biology Research Centre for Fine and Speciality Chemicals (SYNBIOCHEM), Manchester Institute of Biotechnology, University of Manchester, Manchester, United Kingdom.

Directed evolution (DE) is a powerful tool for optimizing an enzyme's properties toward a particular objective, such as broader substrate scope, greater thermostability, or increased kcat. A successful DE project requires the generation of genetic diversity and subsequent screening or selection to identify variants with improved fitness. In contrast to random methods (error-prone PCR or DNA shuffling), site-directed mutagenesis enables the rational design of variant libraries and provides control over the nature and frequency of the encoded mutations. Knowledge of protein structure, dynamics, enzyme mechanisms, and natural evolution demonstrates that multiple (combinatorial) mutations are required to discover the most improved variants. To this end, we describe an experimentally straightforward and low-cost method for the preparation of combinatorial variant libraries. Our approach employs a two-step PCR protocol, first producing mutagenic megaprimers, which can then be combined in a "mix-and-match" fashion to generate diverse sets of combinatorial variant libraries both quickly and accurately.
DOI: http://dx.doi.org/10.1016/bs.mie.2018.04.006
June 2019

Multifragment DNA Assembly of Biochemical Pathways via Automated Ligase Cycling Reaction.

Methods Enzymol 2018;608:369-392

Manchester Centre for Synthetic Biology of Fine and Speciality Chemicals (SYNBIOCHEM), Manchester Institute of Biotechnology, The University of Manchester, Manchester, United Kingdom; School of Chemistry, The University of Manchester, Manchester, United Kingdom.

The microbial production of commodity, fine, and specialty chemicals is a driving force in biotechnology. An essential requirement is to introduce biosynthetic pathways to the target compound(s) into chassis organisms. First suitable enzymes must be selected and characterized, and then genetic pathways must be designed and assembled into suitable expression vectors. The design of these pathways is crucial for balancing the pathway for efficient in vivo activity. This can be achieved through optimization of the pathway regulation by altering transcription and translation rates. The possible permutations of a multigene pathway create a vast design space which is intractable to explore using traditional time-consuming and laborious pathway assembly methods. The advent of multifragment DNA assembly technologies has enabled simultaneous, multiplexed pathway construction allowing an increased capability to sample the design space. Furthermore, the implementation of laboratory automation allows error-reduced, high-throughput (HTP) construction of pathways. In this chapter, we present a workflow that combines automated in silico design of DNA parts followed by pathway assembly using the ligase cycling reaction on robotics platforms, to allow multiplexed assembly of plasmid-borne gene pathways with high efficiency. Details and considerations in designing DNA parts for expression in bacterial chassis are discussed, followed by laboratory protocols for HTP pathway assembly and screening using robotics platforms. This workflow is employed in the SYNBIOCHEM Synthetic Biology Research Center, providing the capability to assemble over 96 plasmids simultaneously, with over 40% of clones from each assembly harboring the correctly assembled plasmids. This workflow is easy to modify for use in other laboratories and will help to accelerate synthetic biology projects with diverse applications.
DOI: http://dx.doi.org/10.1016/bs.mie.2018.04.011
June 2019
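In the ligase cycling reaction (LCR) workflow above, adjacent DNA parts are joined by single-stranded bridging oligos that anneal across each junction. A minimal sketch of bridging-oligo design follows. It assumes a fixed overlap of `half` bases on each side of a junction, a simplification of practice, where each half is usually tuned to a target melting temperature; part sequences and function names are hypothetical.

```python
"""Sketch: bridging-oligo design for LCR assembly of an ordered list of parts.

Assumptions: fixed-length halves on each side of a junction (no melting
temperature tuning); part sequences are hypothetical.
"""

_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    return seq.translate(_COMPLEMENT)[::-1]


def bridging_oligo(upstream, downstream, half=20):
    """Oligo complementary to one strand across the upstream/downstream junction."""
    junction = upstream[-half:] + downstream[:half]
    return reverse_complement(junction)


def bridging_oligos(parts, half=20, circular=True):
    """One bridging oligo per junction; a circular plasmid also joins last to first."""
    pairs = list(zip(parts, parts[1:]))
    if circular:
        pairs.append((parts[-1], parts[0]))
    return [bridging_oligo(a, b, half) for a, b in pairs]
```

For a circular plasmid assembled from k parts this yields k oligos, one per junction including the one that closes the circle.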

PartsGenie: an integrated tool for optimizing and sharing synthetic biology parts.

Bioinformatics 2018 07;34(13):2327-2329

Manchester Centre for Synthetic Biology of Fine and Speciality Chemicals (SYNBIOCHEM), Manchester Institute of Biotechnology, The University of Manchester, Manchester, UK.

Motivation: Synthetic biology is typified by developing novel genetic constructs from the assembly of reusable synthetic DNA parts, which contain one or more features such as promoters, ribosome binding sites, coding sequences and terminators. PartsGenie is introduced to facilitate the computational design of such synthetic biology parts, bridging the gap between optimization tools for the design of novel parts, the representation of such parts in community-developed data standards such as Synthetic Biology Open Language, and their sharing in journal-recommended data repositories. Consisting of a drag-and-drop web interface, a number of DNA optimization algorithms, and an interface to the well-used data repository JBEI ICE, PartsGenie facilitates the design, optimization and dissemination of reusable synthetic biology parts through an integrated application.

Availability And Implementation: PartsGenie is freely available at https://parts.synbiochem.co.uk.
DOI: http://dx.doi.org/10.1093/bioinformatics/bty105
July 2018
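PartsGenie combines several DNA optimization algorithms. As a hedged illustration of the kind of constraint such tools check, the sketch below tests a candidate part for GC content within a window and for forbidden restriction sites on either strand. The GC window and the default site list (BsaI, GGTCTC) are hypothetical settings; this is not PartsGenie's code or API.

```python
"""Sketch: two sequence checks a part-design tool might apply.

Assumptions: GC window and forbidden-site list are hypothetical settings,
not PartsGenie defaults.
"""

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    return seq.upper().translate(_COMPLEMENT)[::-1]


def gc_content(seq):
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)


def forbidden_site_hits(seq, sites):
    """Return sorted (site, position) pairs for sites found on either strand."""
    seq = seq.upper()
    hits = []
    for site in sites:
        site = site.upper()
        # A set avoids double-counting palindromic sites.
        for motif in {site, reverse_complement(site)}:
            start = seq.find(motif)
            while start != -1:
                hits.append((site, start))
                start = seq.find(motif, start + 1)
    return sorted(hits)


def passes_checks(seq, sites=("GGTCTC",), gc_range=(0.4, 0.6)):
    """True if GC content is in range and no forbidden site occurs."""
    in_range = gc_range[0] <= gc_content(seq) <= gc_range[1]
    return in_range and not forbidden_site_hits(seq, sites)
```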

Rationalizing Context-Dependent Performance of Dynamic RNA Regulatory Devices.

ACS Synth Biol 2018 07 29;7(7):1660-1668. Epub 2018 Jun 29.

Manchester Institute of Biotechnology, School of Chemistry, University of Manchester, Manchester, M13 9PL, United Kingdom.

The ability of RNA to sense, regulate, and store information is an attractive attribute for a variety of functional applications including the development of regulatory control devices for synthetic biology. RNA folding and function are known to be highly context sensitive, which limits the modularity and reuse of RNA regulatory devices to control different heterologous sequences and genes. We explored the cause and effect of sequence context sensitivity for translational ON riboswitches located in the 5' UTR, by constructing and screening a library of N-terminal synonymous codon variants. By altering the N-terminal codon usage we were able to obtain RNA devices with a broad range of functional performance properties (ON, OFF, fold-change). Linear regression and calculated metrics were used to rationalize the major determining features leading to optimal riboswitch performance, and to identify multiple interactions between the explanatory metrics. Finally, partial least squares (PLS) analysis was employed in order to understand the metrics and their respective effects on performance. This PLS model was shown to provide a good explanation of our library. This study provides a novel multivariate analysis framework to rationalize the codon context performance of allosteric RNA devices. The framework will also serve as a platform for future riboswitch context engineering endeavors.
DOI: http://dx.doi.org/10.1021/acssynbio.8b00041
July 2018

Engineering the "Missing Link" in Biosynthetic (-)-Menthol Production: Bacterial Isopulegone Isomerase.

ACS Catal 2018 Mar 24;8(3):2012-2020. Epub 2018 Jan 24.

Manchester Centre for Fine and Speciality Chemicals (SYNBIOCHEM) and School of Chemistry, Manchester Institute of Biotechnology, University of Manchester, Manchester M1 7DN, United Kingdom.

The realization of a synthetic biology approach to microbial (1R,2S,5R)-(-)-menthol production relies on the identification of a gene encoding an isopulegone isomerase (IPGI), the only enzyme in the biosynthetic pathway as yet unidentified. We demonstrate that a bacterial Δ5-3-ketosteroid isomerase (KSI) can act as an IPGI, producing (R)-(+)-pulegone from (+)-cis-isopulegone. Using a robotics-driven semirational design strategy, we identified a key KSI variant encoding four active site mutations, which confer a 4.3-fold increase in activity over the wild-type enzyme. This was assisted by the generation of crystal structures of four KSI variants, combined with molecular modeling of substrate binding to identify key active site residue targets. The KSI variant was demonstrated to function efficiently within cascade biocatalytic reactions with the downstream enzymes pulegone reductase and (-)-menthone:(-)-menthol reductase to generate (-)-menthol from (+)-cis-isopulegone. This study introduces the use of a recombinant IPGI, engineered to function efficiently within a biosynthetic pathway for the production of (-)-menthol in microorganisms.
DOI: http://dx.doi.org/10.1021/acscatal.7b04115
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC5937688
March 2018

STRENDA DB: enabling the validation and sharing of enzyme kinetics data.

FEBS J 2018 06 23;285(12):2193-2204. Epub 2018 Mar 23.

Beilstein-Institut, Frankfurt am Main, Germany.

Standards for reporting enzymology data (STRENDA) DB is a validation and storage system for enzyme function data that incorporates the STRENDA Guidelines. It provides authors preparing a manuscript with a user-friendly, web-based service that automatically checks whether the enzymology data sets entered in the submission form are complete and valid before they are submitted to a journal as part of a publication.
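An automated completeness and validity check of the kind described can be sketched in a few lines. This is a minimal illustration, assuming each data set is a flat dictionary; the field names, the required-field list and the range checks are hypothetical and do not reproduce the STRENDA DB schema or the full STRENDA Guidelines.

```python
from typing import Any

# Hypothetical required fields; the real STRENDA Guidelines cover many more items.
REQUIRED_FIELDS = ("enzyme_name", "organism", "temperature_celsius", "ph", "buffer", "kcat_per_s")


def validate_record(record: dict[str, Any]) -> list[str]:
    """Return every problem found in a record; an empty list means it passed."""
    # Completeness: a field counts as missing if absent, None or an empty string.
    problems = [f"missing field: {name}" for name in REQUIRED_FIELDS
                if record.get(name) in (None, "")]
    # Validity: simple range checks on numeric values (illustrative only).
    ph = record.get("ph")
    if isinstance(ph, (int, float)) and not 0 <= ph <= 14:
        problems.append(f"pH out of range: {ph}")
    kcat = record.get("kcat_per_s")
    if isinstance(kcat, (int, float)) and kcat <= 0:
        problems.append(f"kcat must be positive: {kcat}")
    return problems


# Hypothetical submission with an empty buffer field and an impossible pH.
record = {"enzyme_name": "example isomerase", "organism": "Escherichia coli",
          "temperature_celsius": 30, "ph": 15.2, "buffer": "", "kcat_per_s": 4.1}
for issue in validate_record(record):
    print(issue)
```

Collecting all problems instead of stopping at the first one is a design choice that lets an author see every missing or invalid item in a single pass.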

Source
DOI: http://dx.doi.org/10.1111/febs.14427
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6005732
June 2018

Selenzyme: enzyme selection tool for pathway design.

Bioinformatics 2018 06;34(12):2153-2154

BBSRC/EPSRC Manchester Centre for Synthetic Biology of Fine and Speciality Chemicals (SYNBIOCHEM), Manchester Institute of Biotechnology.

Summary: Synthetic biology applies the principles of engineering to biology in order to create biological functionalities not seen before in nature. One of the most exciting applications of synthetic biology is the design of new organisms with the ability to produce valuable chemicals, including pharmaceuticals and biomaterials, in a greener, sustainable fashion. Selecting the right enzymes to catalyze each reaction step in order to produce a desired target compound is, however, not trivial. Here, we present Selenzyme, a free online enzyme selection tool for metabolic pathway design. The user is guided through several decision steps in order to shortlist the best candidates for a given pathway step. The tool graphically presents key information about enzymes based on existing databases and tools such as: similarity of sequences and of catalyzed reactions; phylogenetic distance between source organism and intended host species; multiple alignment highlighting conserved regions, the predicted catalytic site and active regions; and relevant properties such as predicted solubility and transmembrane regions. Selenzyme provides bespoke sequence selection for automated workflows in biofoundries.

Availability And Implementation: The tool is integrated as part of the pathway design stage into the design-build-test-learn SYNBIOCHEM pipeline. The Selenzyme web server is available at http://selenzyme.synbiochem.co.uk.

Supplementary Information: Supplementary data are available at Bioinformatics online.

Source
DOI: http://dx.doi.org/10.1093/bioinformatics/bty065
June 2018

biochem4j: Integrated and extensible biochemical knowledge through graph databases.

PLoS One 2017 14;12(7):e0179130. Epub 2017 Jul 14.

Manchester Centre for Synthetic Biology of Fine and Specialty Chemicals (SYNBIOCHEM), Manchester Institute of Biotechnology, The University of Manchester, Manchester, United Kingdom.

Biologists and biochemists have at their disposal a number of excellent, publicly available data resources such as UniProt, KEGG, and NCBI Taxonomy, which catalogue biological entities. Despite the usefulness of these resources, they remain fundamentally unconnected. While links may appear between entries across these databases, users are typically only able to follow such links by manual browsing or through specialised workflows. Although many of the resources provide web-service interfaces for computational access, performing federated queries across databases remains a non-trivial but essential activity in interdisciplinary systems and synthetic biology programmes. What is needed are integrated repositories to catalogue both biological entities and, crucially, the relationships between them. Such a resource should be extensible, such that newly discovered relationships (for example, those between novel, synthetic enzymes and non-natural products) can be added over time. With the introduction of graph databases, the barrier to the rapid generation, extension and querying of such a resource has been lowered considerably. With a particular focus on metabolic engineering as an illustrative application domain, biochem4j, freely available at http://biochem4j.org, is introduced to provide an integrated, queryable database that warehouses chemical, reaction, enzyme and taxonomic data from a range of reliable resources. The biochem4j framework establishes a starting point for the flexible integration and exploitation of an ever-wider range of biological data sources, from public databases to laboratory-specific experimental datasets, for the benefit of systems biologists, biosystems engineers and the wider community of molecular biologists and biological chemists.
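Why storing relationships, and not just entities, matters can be shown with a toy graph: once chemicals, reactions, enzymes and organisms are nodes joined by typed edges, a cross-resource question becomes a path search. This sketch uses a plain in-memory adjacency list and breadth-first search as a stand-in for a graph database query; the node identifiers and relationship names are hypothetical and are not the biochem4j data model.

```python
from collections import deque

# Hypothetical typed edges (source, relationship, target) linking entities that
# would otherwise live in separate resources (chemical, reaction, enzyme, taxonomy).
EDGES = [
    ("chemical:X", "PRODUCT_OF", "reaction:R1"),
    ("reaction:R1", "CATALYSED_BY", "enzyme:E1"),
    ("enzyme:E1", "FROM_ORGANISM", "taxon:T1"),
    ("chemical:Y", "PRODUCT_OF", "reaction:R2"),
]


def build_graph(edges):
    """Directed adjacency list: node -> list of (relationship, neighbour)."""
    graph = {}
    for source, relationship, target in edges:
        graph.setdefault(source, []).append((relationship, target))
        graph.setdefault(target, [])
    return graph


def find_path(graph, start, goal):
    """Breadth-first search; returns the hops as (source, relationship, target)
    tuples along a shortest path, or None if goal is unreachable."""
    queue = deque([(start, [])])
    seen = {start}
    while queue:
        node, path = queue.popleft()
        if node == goal:
            return path
        for relationship, neighbour in graph.get(node, []):
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append((neighbour, path + [(node, relationship, neighbour)]))
    return None


graph = build_graph(EDGES)
# Which organism can supply an enzyme that makes chemical X?
for hop in find_path(graph, "chemical:X", "taxon:T1"):
    print(hop)
```

Adding a newly discovered relationship is just appending another edge, which is the kind of extensibility the abstract argues for.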

Source
PLOS: http://journals.plos.org/plosone/article?id=10.1371/journal.pone.0179130
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC5510799
September 2017

Identifiers for the 21st century: How to design, provision, and reuse persistent identifiers to maximize utility and impact of life science data.

PLoS Biol 2017 Jun 29;15(6):e2001414. Epub 2017 Jun 29.

European Bioinformatics Institute, European Molecular Biology Laboratory, Wellcome Genome Campus, Hinxton, Cambridge, United Kingdom.

In many disciplines, data are highly decentralized across thousands of online databases (repositories, registries, and knowledgebases). Wringing value from such databases depends on the discipline of data science and on the humble bricks and mortar that make integration possible; identifiers are a core component of this integration infrastructure. Drawing on our experience and on work by other groups, we outline 10 lessons we have learned about the identifier qualities and best practices that facilitate large-scale data integration. Specifically, we propose actions that identifier practitioners (database providers) should take in the design, provision and reuse of identifiers. We also outline the important considerations for those referencing identifiers in various circumstances, including by authors and data generators. While the importance and relevance of each lesson will vary by context, there is a need for increased awareness about how to avoid and manage common identifier problems, especially those related to persistence and web-accessibility/resolvability. We focus strongly on web-based identifiers in the life sciences; however, the principles are broadly relevant to other disciplines.
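Two of the recurring concerns, well-formed identifiers and web resolvability, can be illustrated with compact identifiers of the form prefix:accession. The sketch below assumes a resolver that accepts URLs of the form https://identifiers.org/prefix:accession; the local pattern table is a hypothetical, simplified stand-in for a real registry, and its regular expressions are illustrations rather than authoritative accession rules.

```python
import re

# Hypothetical local registry: prefix -> regular expression the accession must match.
# These patterns are simplified illustrations, not authoritative registry entries.
PATTERNS = {
    "taxonomy": r"\d+",
    "uniprot": r"[A-Z0-9]{6,10}",
}
RESOLVER = "https://identifiers.org/"  # assumed resolver base URL


def parse_curie(curie: str) -> tuple[str, str]:
    """Split 'prefix:accession'; the prefix is lower-cased (an assumed normalisation)."""
    prefix, sep, accession = curie.partition(":")
    if not sep or not prefix or not accession:
        raise ValueError(f"not a prefix:accession identifier: {curie!r}")
    return prefix.lower(), accession


def to_url(curie: str) -> str:
    """Validate a compact identifier locally, then build a resolvable URL for it."""
    prefix, accession = parse_curie(curie)
    pattern = PATTERNS.get(prefix)
    if pattern is None:
        raise ValueError(f"unknown prefix: {prefix}")
    if not re.fullmatch(pattern, accession):
        raise ValueError(f"accession {accession!r} does not match the pattern for {prefix}")
    return f"{RESOLVER}{prefix}:{accession}"


print(to_url("taxonomy:9606"))
```

Checking the accession against a pattern before building the link catches malformed identifiers locally, rather than producing a URL that fails to resolve later.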

Source
DOI: http://dx.doi.org/10.1371/journal.pbio.2001414
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC5490878
June 2017

SpeedyGenes: Exploiting an Improved Gene Synthesis Method for the Efficient Production of Synthetic Protein Libraries for Directed Evolution.

Methods Mol Biol 2017;1472:63-78

Manchester Institute of Biotechnology, The University of Manchester, 131, Princess St, Manchester, M1 7DN, UK.

Gene synthesis is a fundamental technology underpinning much research in the life sciences. In particular, synthetic biology and biotechnology utilize gene synthesis to assemble any desired DNA sequence, which can then be incorporated into novel parts and pathways. Here, we describe SpeedyGenes, a gene synthesis method that can assemble DNA sequences with greater fidelity (fewer errors) than existing methods, but that can also be used to encode extensive, statistically designed sequence variation at any position in the sequence to create diverse (but accurate) variant libraries. We summarize the integrated use of GeneGenie to design DNA and oligonucleotide sequences, followed by the procedure for assembling these accurately and efficiently using SpeedyGenes.
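Assembly-based gene synthesis begins by splitting the target sequence into overlapping oligonucleotides. The sketch below is a simplified illustration of that design step only, assuming fixed-length oligos, a fixed overlap and alternating strands; it is not the GeneGenie or SpeedyGenes algorithm, which the abstract does not detail, and the default lengths and toy sequence are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence (A/C/G/T only)."""
    return seq.translate(COMPLEMENT)[::-1]


def design_oligos(seq: str, oligo_len: int = 60, overlap: int = 20) -> list[str]:
    """Split seq into oligos of at most oligo_len bases, each overlapping the next
    by `overlap` bases; every second oligo is reverse-complemented so that
    neighbouring oligos come from opposite strands and can anneal."""
    if not seq:
        raise ValueError("sequence must not be empty")
    if not 0 < overlap < oligo_len:
        raise ValueError("overlap must be positive and shorter than oligo_len")
    step = oligo_len - overlap
    oligos = []
    start = 0
    while True:
        fragment = seq[start:start + oligo_len]
        oligos.append(fragment if len(oligos) % 2 == 0 else reverse_complement(fragment))
        if start + oligo_len >= len(seq):
            break  # the last fragment reaches the end of the sequence
        start += step
    return oligos


target = "ATGC" * 25  # hypothetical 100-nt toy sequence
for oligo in design_oligos(target):
    print(oligo)
```

Practical design tools typically also balance the melting temperatures of the overlaps rather than using a fixed overlap length; this sketch leaves that out for brevity.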

Source
DOI: http://dx.doi.org/10.1007/978-1-4939-6343-0_5
January 2018

Recon 2.2: from reconstruction to model of human metabolism.

Metabolomics 2016;12:109. Epub 2016 Jun 7.

Manchester Centre for Synthetic Biology of Fine and Speciality Chemicals (SYNBIOCHEM), Manchester Institute of Biotechnology, The University of Manchester, Manchester, M1 7DN UK ; School of Computer Science, The University of Manchester, Manchester, M13 9PL UK ; Center for Quantitative Medicine, UConn Health, 263 Farmington Avenue, Farmington, CT 06030-6033 USA.

Introduction: The human genome-scale metabolic reconstruction details all known metabolic reactions occurring in humans, and thereby holds substantial promise for studying complex diseases and phenotypes. Capturing the whole human metabolic reconstruction is an ongoing task, and since the last community effort generated a consensus reconstruction, several updates have been developed.

Objectives: We report a new consensus version, Recon 2.2, which integrates various alternative versions with significant additional updates. In addition to re-establishing a consensus reconstruction, further key objectives included providing more comprehensive annotation of metabolites and genes, ensuring full mass and charge balance in all reactions, and developing a model that correctly predicts ATP production on a range of carbon sources.

Methods: Recon 2.2 has been developed through a combination of manual curation and automated error checking. Specific and significant manual updates include a respecification of fatty acid metabolism, oxidative phosphorylation and a coupling of the electron transport chain to ATP synthase activity. All metabolites have definitive chemical formulae and charges specified, and these are used to ensure full mass and charge reaction balancing through an automated linear programming approach. Additionally, improved integration with transcriptomics and proteomics data has been facilitated with the updated curation of relationships between genes, proteins and reactions.

Results: Recon 2.2 now represents the most predictive model of human metabolism to date as demonstrated here. Extensive manual curation has increased the reconstruction size to 5324 metabolites, 7785 reactions and 1675 associated genes, which now are mapped to a single standard. The focus upon mass and charge balancing of all reactions, along with better representation of energy generation, has produced a flux model that correctly predicts ATP yield on different carbon sources.

Conclusion: Through these updates we have achieved the most complete and best annotated consensus human metabolic reconstruction available, thereby increasing the ability of this resource to provide novel insights into normal and disease states in humans. The model is freely available from the Biomodels database (http://identifiers.org/biomodels.db/MODEL1603150001).
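The mass and charge balance requirement described in the Methods can be made concrete with a small check: a reaction is balanced when, for every element and for charge, the stoichiometry-weighted sum over all participants is zero (substrates counted with negative coefficients). This is a minimal sketch, assuming simple formulae without parentheses or hydrates; the species identifiers are hypothetical rather than Recon 2.2 identifiers, and it only detects imbalance, it does not reproduce the linear-programming approach used to enforce balancing.

```python
import re
from collections import Counter


def parse_formula(formula: str) -> Counter:
    """Count atoms in a simple formula such as 'C10H12N5O13P3'."""
    if not re.fullmatch(r"(?:[A-Z][a-z]?\d*)+", formula):
        raise ValueError(f"unsupported formula: {formula!r}")
    counts = Counter()
    for element, number in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        counts[element] += int(number) if number else 1
    return counts


def imbalance(reaction: dict[str, float], species: dict[str, tuple[str, int]]) -> dict[str, float]:
    """Net element and charge totals that are non-zero; an empty dict means balanced.
    reaction maps species id -> coefficient (negative for substrates);
    species maps species id -> (formula, charge)."""
    totals = Counter()
    for species_id, coeff in reaction.items():
        formula, charge = species[species_id]
        for element, n in parse_formula(formula).items():
            totals[element] += coeff * n
        totals["charge"] += coeff * charge
    return {key: value for key, value in totals.items() if abs(value) > 1e-9}


# ATP hydrolysis with charged species (hypothetical ids, standard formulae).
SPECIES = {
    "atp": ("C10H12N5O13P3", -4),
    "h2o": ("H2O", 0),
    "adp": ("C10H12N5O10P2", -3),
    "pi": ("HO4P", -2),
    "h": ("H", 1),
}
balanced = {"atp": -1, "h2o": -1, "adp": 1, "pi": 1, "h": 1}
missing_proton = {"atp": -1, "h2o": -1, "adp": 1, "pi": 1}
print(imbalance(balanced, SPECIES))
print(imbalance(missing_proton, SPECIES))
```

Dropping the proton leaves the reaction short of one hydrogen and one unit of positive charge, which is exactly the kind of error that specifying a formula and charge for every metabolite makes detectable.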

Source
DOI: http://dx.doi.org/10.1007/s11306-016-1051-4
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC4896983
June 2016

SYNBIOCHEM-a SynBio foundry for the biosynthesis and sustainable production of fine and speciality chemicals.

Biochem Soc Trans 2016 06;44(3):675-7

BBSRC/EPSRC Synthetic Biology Research Centre for Fine and Speciality Chemicals (SYNBIOCHEM), Manchester Institute of Biotechnology, The University of Manchester, Manchester, U.K. SYNBIOCHEM Platform/Theme Lead, School of Chemistry, The University of Manchester, Manchester, U.K. SYNBIOCHEM, The University of Manchester, Manchester, U.K.

The Manchester Synthetic Biology Research Centre (SYNBIOCHEM) is a foundry for the biosynthesis and sustainable production of fine and speciality chemicals. The Centre's integrated technology platforms provide a unique capability to facilitate predictable engineering of microbial bio-factories for chemicals production. An overview of these capabilities is described.

Source
DOI: http://dx.doi.org/10.1042/BST20160009
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC4900749
June 2016

Bioinformatics for the synthetic biology of natural products: integrating across the Design-Build-Test cycle.

Nat Prod Rep 2016 Aug 17;33(8):925-32. Epub 2016 May 17.

Manchester Centre for Fine and Specialty Chemicals (SYNBIOCHEM), Manchester Institute of Biotechnology, University of Manchester, Manchester M1 7DN, UK.

Covering: 2000 to 2016

Progress in synthetic biology is enabled by powerful bioinformatics tools allowing the integration of the design, build and test stages of the biological engineering cycle. In this review we illustrate how this integration can be achieved, with a particular focus on natural products discovery and production. Bioinformatics tools for the DESIGN and BUILD stages include tools for the selection, synthesis, assembly and optimization of parts (enzymes and regulatory elements), devices (pathways) and systems (chassis). TEST tools include those for screening, identification and quantification of metabolites for rapid prototyping. The main advantages and limitations of these tools as well as their interoperability capabilities are highlighted.

Source
DOI: http://dx.doi.org/10.1039/c6np00018e
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC5063057
August 2016

libChEBI: an API for accessing the ChEBI database.

J Cheminform 2016 1;8:11. Epub 2016 Mar 1.

Manchester Centre for Synthetic Biology of Fine and Specialty Chemicals (SYNBIOCHEM), Manchester Institute of Biotechnology, University of Manchester, Manchester, M1 7DN UK ; School of Computer Science, University of Manchester, Manchester, M13 9PL UK ; Center for Quantitative Medicine, UConn Health, Farmington, CT 06030 USA.

Background: ChEBI is a database and ontology of chemical entities of biological interest. It is widely used as a source of identifiers to facilitate unambiguous reference to chemical entities within biological models, databases, ontologies and literature. ChEBI contains a wealth of chemical data, covering over 46,500 distinct chemical entities, and related data such as chemical formula, charge, molecular mass, structure, synonyms and links to external databases. Furthermore, ChEBI is an ontology, and thus provides meaningful links between chemical entities. Unlike many other resources, ChEBI is fully human-curated, providing a reliable, non-redundant collection of chemical entities and related data. While ChEBI is supported by a web service for programmatic access and a number of download files, it does not have an API library to facilitate the use of ChEBI and its data in cheminformatics software.

Results: To provide this missing functionality, libChEBI, a comprehensive API library for accessing ChEBI data, is introduced. libChEBI is available in Java, Python and MATLAB versions from http://github.com/libChEBI, and provides full programmatic access to all data held within the ChEBI database through a simple and documented API. libChEBI is reliant upon the (automated) download and regular update of flat files that are held locally. As such, libChEBI can be embedded in both on- and off-line software applications.

Conclusions: libChEBI allows better support of ChEBI and its data in the development of new cheminformatics software. Covering three key programming languages, it allows for the entirety of the ChEBI database to be accessed easily and quickly through a simple API. All code is open access and freely available.

Source
DOI: http://dx.doi.org/10.1186/s13321-016-0123-9
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC4772646
March 2016

SBOL Visual: A Graphical Language for Genetic Designs.

PLoS Biol 2015 Dec 3;13(12):e1002310. Epub 2015 Dec 3.

Bioengineering, University of Washington, Seattle, Washington, United States of America.

Synthetic Biology Open Language (SBOL) Visual is a graphical standard for genetic engineering. It consists of symbols representing DNA subsequences, including regulatory elements and DNA assembly features. These symbols can be used to draw illustrations for communication and instruction, and as image assets for computer-aided design. SBOL Visual is a community standard, freely available for personal, academic, and commercial use (Creative Commons CC0 license). We provide prototypical symbol images that have been used in scientific publications and software tools. We encourage users to use and modify them freely, and to join the SBOL Visual community: http://www.sbolstandard.org/visual.

Source
DOI: http://dx.doi.org/10.1371/journal.pbio.1002310
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC4669170
December 2015

ChEBI in 2016: Improved services and an expanding collection of metabolites.

Nucleic Acids Res 2016 Jan 13;44(D1):D1214-9. Epub 2015 Oct 13.

Cheminformatics and Metabolism, European Molecular Biology Laboratory-European Bioinformatics Institute (EMBL-EBI), Hinxton, UK

ChEBI is a database and ontology containing information about chemical entities of biological interest. It currently includes over 46,000 entries, each of which is classified within the ontology and assigned multiple annotations including (where relevant) a chemical structure, database cross-references, synonyms and literature citations. All content is freely available and can be accessed online at http://www.ebi.ac.uk/chebi. In this update paper, we describe recent improvements and additions to the ChEBI offering. We have substantially extended our collection of endogenous metabolites for several organisms including human, mouse, Escherichia coli and yeast. Our front-end has also been reworked and updated, improving the user experience, removing our dependency on Java applets in favour of embedded JavaScript components and moving from a monthly release update to a 'live' website. Programmatic access has been improved by the introduction of a library, libChEBI, in Java, Python and Matlab. Furthermore, we have added two new tools, namely an analysis tool, BiNChE, and a query tool for the ontology, OntoQuery.

Source
DOI: http://dx.doi.org/10.1093/nar/gkv1031
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC4702775
January 2016

RobOKoD: microbial strain design for (over)production of target compounds.

Front Cell Dev Biol 2015 24;3:17. Epub 2015 Mar 24.

Manchester Institute of Biotechnology, University of Manchester, Manchester, UK ; School of Computer Science, University of Manchester, Manchester, UK.

Sustainable production of target compounds such as biofuels and high-value chemicals for the pharmaceutical, agrochemical, and chemical industries is becoming an increasing priority given their current dependency upon diminishing petrochemical resources. Designing microbial strains for such production is difficult: current methods focus primarily on gene knockouts and neglect other vital steps of strain design, including the overexpression and dampening of genes. The design predictions from current methods also do not translate well into successful strains in the laboratory. Here, we introduce RobOKoD (Robust, Overexpression, Knockout and Dampening), a method for predicting strain designs for the overproduction of targets. The method uses flux variability analysis to profile each reaction within the system under differing production percentages of target compound and biomass. Using these profiles, reactions are identified as potential knockout, overexpression, or dampening targets. The identified reactions are ranked according to their suitability, providing flexibility in strain design for users. The software was tested by designing a butanol-producing Escherichia coli strain, and was compared against the popular OptKnock and RobustKnock methods. RobOKoD shows favorable design predictions when its predictions and those of these methods are compared against a successful, experimentally validated butanol-producing strain. Overall, RobOKoD provides users with rankings of predicted beneficial genetic interventions with which to support optimized strain design.

Source
DOI: http://dx.doi.org/10.3389/fcell.2015.00017
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC4371745
April 2015