Publications by authors named "Koji Tsuda"

73 Publications

Molecular generation by Fast Assembly of (Deep)SMILES fragments.

J Cheminform 2021 Nov 14;13(1):88. Epub 2021 Nov 14.

Graduate School of Frontier Sciences, The University of Tokyo, 5-1-5 Kashiwa-no-ha, Kashiwa, Chiba, 277-8561, Japan.

Background: In recent years, in silico molecular design has been regaining interest. To generate molecules with optimized properties on a computer, scoring functions can be coupled with a molecular generator to design novel molecules with a desired property profile.

Results: In this article, a simple method is described to generate only valid molecules at high frequency ([Formula: see text] molecule/s using a single CPU core), given a molecular training set. The proposed method generates diverse SMILES (or DeepSMILES) encoded molecules while also showing some propensity at training set distribution matching. When working with DeepSMILES, the method reaches peak performance ([Formula: see text] molecule/s) because it relies almost exclusively on string operations. The "Fast Assembly of SMILES Fragments" software is released as open-source at https://github.com/UnixJunkie/FASMIFRA . Experiments regarding speed, training set distribution matching, molecular diversity and benchmark against several other methods are also shown.

Source
DOI: http://dx.doi.org/10.1186/s13321-021-00566-4
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC8591910
November 2021

Efficient Search for Energetically Favorable Molecular Conformations against Metastable States via Gray-Box Optimization.

J Chem Theory Comput 2021 Aug 14;17(8):5419-5427. Epub 2021 Jul 14.

Medical Sciences Innovation Hub Program, RIKEN, Yokohama 230-0045, Japan.

In order to accurately understand and estimate molecular properties, finding energetically favorable molecular conformations is the most fundamental task for atomistic computational research on molecules and materials. Geometry optimization based on quantum chemical calculations has enabled the conformation prediction of arbitrary molecules, including ones. However, it is computationally expensive to perform geometry optimizations for enormous conformers. In this study, we introduce the gray-box optimization (GBO) framework, which enables optimal control over the entire geometry optimization process, among multiple conformers. Algorithms designed for GBO roughly estimate energetically preferable conformers during their geometry optimization iterations. They then preferentially compute promising conformers. To evaluate the performance of the GBO framework, we applied it to a test set consisting of seven dipeptides and mycophenolic acid to determine their stable conformations at the density functional theory level. We thus preferentially obtained energetically favorable conformations. Furthermore, the computational costs required to find the most stable conformation were significantly reduced (approximately 1% on average, compared to the naive approach for the dipeptides).

Source
DOI: http://dx.doi.org/10.1021/acs.jctc.1c00301
August 2021

Determination of quasi-primary odors by endpoint detection.

Sci Rep 2021 Jun 8;11(1):12070. Epub 2021 Jun 8.

Graduate School of Frontier Sciences, The University of Tokyo, Chiba, 277-8568, Japan.

It is known that there are no primary odors whose combinations can represent every other odor. Here, we propose an alternative approach: "quasi" primary odors. This approach rests on one condition and one method: (1) it operates within a collected dataset, and (2) it uses machine learning-based endpoint detection. The quasi-primary odors are selected from the odors included in a collected odor dataset according to the endpoint score. Although the result is limited to the given dataset, combining such quasi-primary odors in certain ratios can reproduce any other odor in the dataset. To visually demonstrate this approach, the three quasi-primary odors with the three highest endpoint scores are assigned to the vertices of a chromaticity triangle with red, green, and blue. The other odors in the dataset are then projected onto the chromaticity triangle, which gives each of them a unique color. The number of quasi-primary odors is not limited to three but can be set to an arbitrary number. With this approach, one can first find "extreme" odors (i.e., quasi-primary odors) in a given odor dataset and then reproduce any other odor in the dataset, or even synthesize a new arbitrary odor, by combining such quasi-primary odors in certain ratios.

Source
DOI: http://dx.doi.org/10.1038/s41598-021-91210-6
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC8187439
June 2021
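
The projection onto a chromaticity triangle described in the abstract above can be illustrated with barycentric coordinates. This is a minimal sketch, not the authors' implementation: it assumes the three mixing ratios of an odor are already known (the abstract does not say how endpoint detection produces them), and the names `VERTICES` and `project_odor` are hypothetical.

```python
import math

# Assumed layout: an equilateral triangle whose vertices stand for the
# red, green, and blue quasi-primary odors, in that order.
VERTICES = ((0.0, 0.0), (1.0, 0.0), (0.5, math.sqrt(3) / 2))


def project_odor(ratios):
    """Map three non-negative mixing ratios to a triangle position and an RGB color."""
    if len(ratios) != 3 or any(r < 0 for r in ratios):
        raise ValueError("expected three non-negative ratios")
    total = sum(ratios)
    if total == 0:
        raise ValueError("at least one ratio must be positive")
    # Normalised ratios are the barycentric coordinates of the odor.
    weights = [r / total for r in ratios]
    x = sum(w * vx for w, (vx, _) in zip(weights, VERTICES))
    y = sum(w * vy for w, (_, vy) in zip(weights, VERTICES))
    # The same weights, scaled to 0..255, give the display color.
    rgb = tuple(round(255 * w) for w in weights)
    return (x, y), rgb
```

An odor made only of the first quasi-primary odor lands on the red vertex, and an equal mixture lands at the centroid with a gray color.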

Fe-Al-Si Thermoelectric (FAST) Materials and Modules: Diffusion Couple and Machine-Learning-Assisted Materials Development.

ACS Appl Mater Interfaces 2021 Nov 21;13(45):53346-53354. Epub 2021 May 21.

Aisin Corporation, Kariya, Aichi 448-8650, Japan.

To lower the introduction and maintenance costs of autonomous power supplies for driving Internet-of-things (IoT) devices, we have developed low-cost Fe-Al-Si-based thermoelectric (FAST) materials and power generation modules. Our development approach combines computational science, experiments, mapping measurements, and machine learning (ML). FAST materials have a good balance of mechanical properties and excellent chemical stability, superior to that of conventional Bi-Te-based materials. However, it remains challenging to enhance the power factor (PF) and lower the thermal conductivity of FAST materials to develop reliable power generation devices. This forum paper describes the current status of materials development based on experiments and ML with limited data, together with power generation module fabrication related to FAST materials with a view to commercialization. Combining bulk combinatorial methods with diffusion couple and mapping measurements could accelerate the search to enhance PF for FAST materials. We report that ML prediction is a powerful tool for finding unexpected off-stoichiometric compositions of the Fe-Al-Si system and dopant concentrations of a fourth element to enhance the PF, i.e., Co substitution for Fe atoms in FAST materials.

Source
DOI: http://dx.doi.org/10.1021/acsami.1c04583
November 2021

Using molecular dynamics simulations to prioritize and understand AI-generated cell penetrating peptides.

Sci Rep 2021 May 20;11(1):10630. Epub 2021 May 20.

Graduate School of Frontier Sciences, The University of Tokyo, 5-1-5 Kashiwa-no-ha, Kashiwa, Chiba, 277-8561, Japan.

Cell-penetrating peptides have important therapeutic applications in drug delivery, but the variety of known cell-penetrating peptides is still limited. With their promise to accelerate peptide development, artificial intelligence (AI) techniques, including deep generative models, are currently in the spotlight. Scientists, however, are often overwhelmed by the excessive number of unannotated sequences generated by AI and find it difficult to obtain insights to prioritize them for experimental validation. To avoid this pitfall, we leverage molecular dynamics (MD) simulations to obtain mechanistic information to prioritize and understand AI-generated peptides. A mechanistic score of permeability is computed from five steered MD simulations starting from different initial structures predicted by homology modelling. To compensate for variability of the predicted structures, the score is computed with sample variance penalization so that a peptide with consistent behaviour is rated highly. Our computational pipeline involving deep learning, homology modelling, MD simulations and synthesizability assessment generated 24 novel peptide sequences. The top-scoring peptide showed a consistent pattern of conformational change in all simulations regardless of initial structure. In wet-lab experiments, our peptide showed better permeability and weaker toxicity than a clinically used peptide, TAT. Our result demonstrates how MD simulations can support de novo peptide design by providing mechanistic information that supplements statistical inference.

Source
DOI: http://dx.doi.org/10.1038/s41598-021-90245-z
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC8137933
May 2021
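
The idea of scoring with a variance penalty, as mentioned in the abstract above, can be sketched as follows. The exact formula used in the paper is not given in the abstract; this sketch assumes the common form "mean minus a multiple of the sample standard deviation", and the function name `penalized_score`, the `penalty` weight, and the example values are all hypothetical.

```python
from statistics import mean, stdev


def penalized_score(values, penalty=1.0):
    """Mean of per-simulation scores minus a penalty on their spread.

    Assumed form: mean(values) - penalty * sample_std(values).
    """
    if len(values) < 2:
        raise ValueError("need at least two simulations to estimate variance")
    return mean(values) - penalty * stdev(values)


# Two made-up peptides with the same average score over five simulations.
consistent = [0.80, 0.82, 0.79, 0.81, 0.80]
erratic = [1.40, 0.20, 1.30, 0.10, 1.02]
```

With these illustrative numbers both peptides have the same mean, so only the penalty term separates them, and the peptide that behaves consistently across simulations ranks higher.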

Determinants of bone health in elderly Japanese men: study design and key findings of the Fujiwara-kyo Osteoporosis Risk in Men (FORMEN) cohort study.

Environ Health Prev Med 2021 Apr 23;26(1):51. Epub 2021 Apr 23.

Department of Public Health, Kindai University Faculty of Medicine, 377-2 Oono-higashi, Osaka-Sayama, Osaka, 589-8511, Japan.

Background: The Fujiwara-kyo Osteoporosis Risk in Men (FORMEN) study was launched to investigate risk factors for osteoporotic fractures, interactions of osteoporosis with other non-communicable chronic diseases, and effects of fracture on QOL and mortality.

Methods: FORMEN baseline study participants (in 2007 and 2008) included 2012 community-dwelling men (aged 65-93 years) in Nara prefecture, Japan. Clinical follow-up surveys were conducted 5 and 10 years after the baseline survey, and 1539 and 906 men completed them, respectively. Supplemental mail, telephone, and visit surveys were conducted with non-participants to obtain outcome information. Survival and fracture outcomes were determined for 2006 men, with 566 deaths identified and 1233 men remaining in the cohort at 10-year follow-up.

Comments: The baseline survey covered a wide range of bone health-related indices including bone mineral density, trabecular microarchitecture assessment, vertebral imaging for detecting vertebral fractures, and biochemical markers of bone turnover, as well as comprehensive geriatric assessment items. Follow-up surveys were conducted to obtain outcomes including osteoporotic fracture, cardiovascular diseases, initiation of long-term care, and mortality. A complete list of publications relating to the FORMEN study can be found at https://www.med.kindai.ac.jp/pubheal/FORMEN/Publications.html .

Source
DOI: http://dx.doi.org/10.1186/s12199-021-00972-y
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC8066970
April 2021

Comparative Analysis of Patient-Matched PDOs Revealed a Reduction in OLFM4-Associated Clusters in Metastatic Lesions in Colorectal Cancer.

Stem Cell Reports 2021 Apr 11;16(4):954-967. Epub 2021 Mar 11.

Department of Cell Biology, Cancer Institute, Japanese Foundation for Cancer Research, Tokyo, Japan.

Metastasis is the major cause of cancer-related death, but whether metastatic lesions exhibit the same cellular composition as primary tumors has yet to be elucidated. To investigate the cellular heterogeneity of metastatic colorectal cancer (CRC), we established 72 patient-derived organoids (PDOs) from 21 patients. Combined bulk transcriptomic and single-cell RNA-sequencing analysis revealed decreased gene expression of markers for differentiated cells in PDOs derived from metastatic lesions. Paradoxically, expression of potential intestinal stem cell markers was also decreased. We identified OLFM4 as the gene most strongly correlating with a stem-like cell cluster, and found OLFM4 cells to be capable of initiating organoid culture growth and differentiation capacity in primary PDOs. These cells were required for the efficient growth of primary PDOs but dispensable for metastatic PDOs. These observations demonstrate that metastatic lesions have a cellular composition distinct from that of primary tumors; patient-matched PDOs are a useful resource for analyzing metastatic CRC.

Source
DOI: http://dx.doi.org/10.1016/j.stemcr.2021.02.012
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC8072036
April 2021

Black-Box Optimization for Automated Discovery.

Acc Chem Res 2021 Mar 26;54(6):1334-1346. Epub 2021 Feb 26.

RIKEN Center for Advanced Intelligence Project, Tokyo 103-0027, Japan.

In chemistry and materials science, researchers and engineers discover, design, and optimize chemical compounds or materials with their professional knowledge and techniques. At the highest level of abstraction, this process is formulated as black-box optimization. For instance, the trial-and-error process of synthesizing various molecules for better material properties can be regarded as optimizing a black-box function describing the relation between a chemical formula and its properties. Various black-box optimization algorithms have been developed in the machine learning and statistics communities. Recently, a number of researchers have reported successful applications of such algorithms to chemistry. They include the design of photofunctional molecules and medical drugs, optimization of thermal emission materials and high Li-ion conductive solid electrolytes, and discovery of a new phase in inorganic thin films for solar cells.

There are a wide variety of algorithms available for black-box optimization, such as Bayesian optimization, reinforcement learning, and active learning. Practitioners need to select an appropriate algorithm or, in some cases, develop novel algorithms to meet their demands. It is also necessary to determine how to best combine machine learning techniques with quantum mechanics- and molecular mechanics-based simulations, and experiments. In this Account, we give an overview of recent studies regarding automated discovery, design, and optimization based on black-box optimization. The Account covers the following algorithms: Bayesian optimization to optimize the chemical or physical properties, an optimization method using a quantum annealer, best-arm identification, gray-box optimization, and reinforcement learning. In addition, we introduce active learning and boundless objective-free exploration, which may not fall into the category of black-box optimization.

Data quality and quantity are key for the success of these automated discovery techniques. As laboratory automation and robotics are put forward, automated discovery algorithms would be able to match human performance at least in some domains in the near future.

Source
DOI: http://dx.doi.org/10.1021/acs.accounts.0c00713
March 2021

First-principles study of electronic structures and elasticity of Al2Fe3Si3.

J Phys Condens Matter 2021 Apr 26;33(19). Epub 2021 Apr 26.

Research and Services Division of Materials Data and Integrated System, National Institute for Materials Science, 1-2-1 Sengen, Tsukuba, Ibaraki 305-0047, Japan.

The Al2Fe3Si3 intermetallic compound shows promising application in low-cost and non-toxic thermoelectric devices because of its relatively high power factor of ∼700 μW m⁻¹ K⁻² at 400 K. Herein we performed first-principles calculations with the projector augmented-wave (PAW) method to study the formation energies, elastic constants, electronic structures, and electronic transport properties of Al2Fe3Si3. We discussed the thermodynamical stability of Al2Fe3Si3 against other ternary crystalline compounds in the Al-Fe-Si phase. The band gap of Al2Fe3Si3 was particularly examined using the semilocal and hybrid functionals and the on-site Hubbard correction, which were also applied to β-FeSi2 to calibrate the prediction reliability of our employed computational methods. Our calculations show that Al2Fe3Si3 is a narrow-gap semiconductor. The semilocal functional within the generalized gradient approximation (GGA) shows an exceptional agreement between the predicted band gap of Al2Fe3Si3 and the available experimental data, which is in contrast to the typical trend and is rationally understood through a comprehensive comparison. We found that both HSE06 and PBE0 hybrid functionals with a standard setup significantly overestimated the band gaps of Al2Fe3Si3 and β-FeSi2. The underlying reasons may be ascribed to a large electronic screening, which arises from the unique characteristics of Fe 3d states appearing on both sides of the band gaps of Al2Fe3Si3 and β-FeSi2, and to a reduced delocalization error thanks to the covalent Fe-Si and Si-Si bonding nature. The chemical bonding and elasticity of Al2Fe3Si3 were compared with those of β-FeSi2 and FeAl. In Al2Fe3Si3 the Fe-Al bonding is more ionic and the Fe-Si bonding is more covalent. The elastic moduli of Al2Fe3Si3 are comparable to those of β-FeSi2 and larger than those of FeAl. Our calculation results indicate that the mechanical strength of Al2Fe3Si3 could be strong enough for practical application in thermoelectric devices.

Source
DOI: http://dx.doi.org/10.1088/1361-648X/abe474
April 2021

Vision-based egg quality prediction in Pacific bluefin tuna (Thunnus orientalis) by deep neural network.

Sci Rep 2021 Jan 12;11(1). Epub 2021 Jan 12.

RIKEN Center for Advanced Intelligence Project (AIP), Nihonbashi, Tokyo, 103-0027, Japan.

Closed-cycle aquaculture using hatchery-produced seed stocks is vital to the sustainability of endangered species such as Pacific bluefin tuna (Thunnus orientalis) because this aquaculture system does not depend on aquaculture seeds collected from the wild. High egg quality promotes efficient aquaculture production by improving hatch rates and subsequent growth and survival of hatched larvae. In this study, we investigate the possibility of a simple, low-cost, and accurate egg quality prediction system based only on photographic images using deep neural networks. We photographed individual eggs immediately after spawning and assessed their qualities, i.e., whether they hatched normally and how many days larvae survived without feeding. The proposed system predicted normally hatching eggs with higher accuracy than human experts. It was also successful in predicting which eggs would produce longer-surviving larvae. We also analyzed the image aspects that contributed to the prediction to discover important egg features. Our results suggest the applicability of deep learning techniques to efficient egg quality prediction and to the analysis of early developmental stages.

Source
DOI: http://dx.doi.org/10.1038/s41598-020-80001-0
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC7804258
January 2021

CompRet: a comprehensive recommendation framework for chemical synthesis planning with algorithmic enumeration.

J Cheminform 2020 Sep 1;12(1):52. Epub 2020 Sep 1.

Graduate School of Frontier Sciences, The University of Tokyo, Kashiwa, Chiba, Japan.

In computer-assisted synthesis planning (CASP) programs, providing as many chemical synthetic routes as possible is essential for considering optimal and alternative routes in a chemical reaction network. As the majority of CASP programs have been designed to provide one or a few optimal routes, it is likely that the desired one will not be included. To avoid this, an exact algorithm that lists possible synthetic routes within the chemical reaction network is required, alongside a recommendation of synthetic routes that meet specified criteria based on the chemist's objectives. Herein, we propose a chemical-reaction-network-based synthetic route recommendation framework called "CompRet" with a mathematically guaranteed enumeration algorithm. In a preliminary experiment, CompRet was shown to successfully provide alternative routes for a known antihistaminic drug, cetirizine. CompRet is expected to promote desirable enumeration-based chemical synthesis searches and aid the development of an interactive CASP framework for chemists.

Source
DOI: http://dx.doi.org/10.1186/s13321-020-00452-5
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC7465358
September 2020

Machine learning to reveal hidden risk combinations for the trajectory of posttraumatic stress disorder symptoms.

Sci Rep 2020 Dec 10;10(1):21726. Epub 2020 Dec 10.

Graduate School of Medicine, Tohoku University, Sendai, 980-0872, Japan.

The nature of the recovery process of posttraumatic stress disorder (PTSD) symptoms is multifactorial. The Massive Parallel Limitless-Arity Multiple-testing Procedure (MP-LAMP), which was developed to detect significant combinational risk factors comprehensively, was utilized to reveal hidden combinational risk factors to explain the long-term trajectory of the PTSD symptoms. In 624 population-based subjects severely affected by the Great East Japan Earthquake, 61 potential risk factors encompassing sociodemographics, lifestyle, and traumatic experiences were analyzed by MP-LAMP regarding combinational associations with the trajectory of PTSD symptoms, as evaluated by the Impact of Event Scale-Revised score after eight years adjusted by the baseline score. The comprehensive combinational analysis detected 56 significant combinational risk factors, including 15 independent variables, although the conventional bivariate analysis between single risk factors and the trajectory detected no significant risk factors. The strongest association was observed with the combination of short resting time, short walking time, unemployment, and evacuation without preparation (adjusted P value = 2.2 × 10, and raw P value = 3.1 × 10). Although short resting time had no association with the poor trajectory, it had a significant interaction with short walking time (P value = 1.2 × 10), which was further strengthened by the other two components (P value = 9.7 × 10). Likewise, components that were not associated with a poor trajectory in bivariate analysis were included in every observed significant risk combination due to their interactions with other components. Comprehensive combination detection by MP-LAMP is essential for explaining multifactorial psychiatric symptoms by revealing the hidden combinations of risk factors.

Source
DOI: http://dx.doi.org/10.1038/s41598-020-78966-z
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC7730124
December 2020

Generating Ampicillin-Level Antimicrobial Peptides with Activity-Aware Generative Adversarial Networks.

ACS Omega 2020 Sep 28;5(36):22847-22851. Epub 2020 Aug 28.

Graduate School of Frontier Sciences, The University of Tokyo, 5-1-5 Kashiwa-no-ha, Kashiwa, Chiba 277-8561, Japan.

Antimicrobial peptides are a potential solution to the threat of multidrug-resistant bacterial pathogens. Recently, deep generative models including generative adversarial networks (GANs) have been shown to be capable of designing new antimicrobial peptides. Intuitively, a GAN controls the probability distribution of generated sequences to cover active peptides as much as possible. This paper presents a peptide-specialized model called PepGAN that strikes a balance between covering active peptides and dodging nonactive peptides. As a result, PepGAN has superior statistical fidelity with respect to physicochemical descriptors including charge, hydrophobicity, and weight. The top six peptides were synthesized, and one of them was confirmed to be highly antimicrobial. The minimum inhibitory concentration was 3.1 μg/mL, indicating that the peptide is twice as strong as ampicillin.

Source
DOI: http://dx.doi.org/10.1021/acsomega.0c02088
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC7495458
September 2020

NMR-TS: de novo molecule identification from NMR spectra.

Sci Technol Adv Mater 2020 Jul 30;21(1):552-561. Epub 2020 Jul 30.

Department of Computational Biology and Medical Sciences, Graduate School of Frontier Sciences, The University of Tokyo, Kashiwa, Japan.

Nuclear magnetic resonance (NMR) spectroscopy is an effective tool for identifying molecules in a sample. Although many previously observed NMR spectra are accumulated in public databases, they cover only a tiny fraction of the chemical space, and molecule identification is typically accomplished manually based on expert knowledge. Herein, we propose NMR-TS, a machine-learning-based Python library, to automatically identify a molecule from its NMR spectrum. NMR-TS discovers candidate molecules whose NMR spectra match the target spectrum by using deep learning and density functional theory (DFT)-computed spectra. As a proof-of-concept, we identify prototypical metabolites from their computed spectra. After an average of 5451 DFT runs for each spectrum, six of the nine molecules are identified correctly, and proximal molecules are obtained in the other cases. This encouraging result implies that de novo molecule generation can contribute to the fully automated identification of chemical structures. NMR-TS is available at https://github.com/tsudalab/NMR-TS.

Source
DOI: http://dx.doi.org/10.1080/14686996.2020.1793382
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC7476483
July 2020

Pushing property limits in materials discovery via boundless objective-free exploration.

Chem Sci 2020 Jun 28;11(23):5959-5968. Epub 2020 May 28.

RIKEN Center for Advanced Intelligence Project, 1-4-1 Nihonbashi, Chuo-ku, Tokyo 103-0027, Japan.

Materials chemists develop chemical compounds to meet often conflicting demands of industrial applications. This process may not be properly modeled by black-box optimization because the target property is not well defined in some cases. Herein, we propose a new algorithm for automated materials discovery called BoundLess Objective-free eXploration (BLOX) that uses a novel criterion based on kernel-based Stein discrepancy in the property space. Unlike other objective-free exploration methods, a boundary for the materials properties is not needed; hence, BLOX is suitable for open-ended scientific endeavors. We demonstrate the effectiveness of BLOX by finding light-absorbing molecules from a drug database. Our goal is to minimize the number of density functional theory calculations required to discover out-of-trend compounds in the intensity-wavelength property space. Using absorption spectroscopy, we experimentally verified that eight compounds identified as outstanding exhibit the expected optical properties. Our results show that BLOX is useful for chemical repurposing, and we expect this search method to have numerous applications in various scientific disciplines.

Source
DOI: http://dx.doi.org/10.1039/d0sc00982b
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC7409358
June 2020

Computer Vision-Based Approach for Quantifying Occupational Therapists' Qualitative Evaluations of Postural Control.

Occup Ther Int 2020 Apr 27;2020:8542191. Epub 2020 Apr 27.

RIKEN Center for Advanced Intelligence Project, Tokyo, Japan.

This study aimed to leverage computer vision (CV) technology to develop a technique for quantifying postural control. A conventional quantitative index, occupational therapists' qualitative clinical evaluations, and CV-based quantitative indices using an image analysis algorithm were applied to evaluate the postural control of 34 typically developed preschoolers. The effectiveness of the CV-based indices was investigated relative to current methods to explore the clinical applicability of the proposed method. The capacity of the CV-based indices to reflect therapists' qualitative evaluations was confirmed. Furthermore, compared to the conventional quantitative index, the CV-based indices provided more detailed quantitative information with lower costs. CV-based evaluations enable therapists to quantify details of motor performance that are currently observed qualitatively. The development of such precise quantification methods will improve the science and practice of occupational therapy and allow therapists to perform to their full potential.

Source
DOI: http://dx.doi.org/10.1155/2020/8542191
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC7201486
October 2020

Artificial Neural Networks Applied as Molecular Wave Function Solvers.

J Chem Theory Comput 2020 Jun 8;16(6):3513-3529. Epub 2020 May 8.

Department of Chemistry, Nagoya University, Furocho, Chikusa Ward, Nagoya, Aichi 464-8601, Japan.

We use artificial neural networks (ANNs) based on the Boltzmann machine (BM) architectures as an encoder of molecular many-electron wave functions represented with the complete active space configuration interaction (CAS-CI) model. As first introduced by the work of Carleo and Troyer for physical systems, the coefficients of the electronic configurations in the CI expansion are parametrized with the BMs as a function of their occupancies that act as descriptors. This ANN-based wave function ansatz is referred to as the neural-network quantum state (NQS). The machine learning is used for training the BMs in terms of finding a variationally optimal form of the ground-state wave function on the basis of the energy minimization. It is relevant to reinforcement learning and does not use any reference data or prior knowledge of the wave function, while the Hamiltonian is given based on a user-specified chemical structure in the first-principles manner. Carleo and Troyer used the restricted Boltzmann machine (RBM), which has hidden units, for the neural network architecture of NQS, while, in this study, we further introduce its replacement with the BM that has only visible units but with different orders of connectivity. For this hidden-node free BM, the second- and third-order BMs based on quadratic and cubic energy functions, respectively, were implemented. We denote these second- and third-order BMs as BM2 and BM3, respectively. The pilot implementation of the NQS solver into an exact diagonalization module of the quantum chemistry program was made to assess the capability of variants of the BM-based NQS. The test calculations were performed by determining the CAS-CI wave functions of illustrative molecular systems, indocyanine green, and dinitrogen dissociation. The simulated energies have been shown to converge to CAS-CI energy in most cases by improving RBM with an increasing number of hidden nodes. BM3 systematically yields lower energies than BM2, reproducing the CAS-CI energies of dinitrogen across potential energy curves within an error of 50 .

Source
DOI: http://dx.doi.org/10.1021/acs.jctc.9b01132
June 2020

Exploring Successful Parameter Region for Coarse-Grained Simulation of Biomolecules by Bayesian Optimization and Active Learning.

Biomolecules 2020 Mar 21;10(3). Epub 2020 Mar 21.

RIKEN Medical Sciences Innovation Hub Program, Yokohama 230-0045, Japan.

As the number of resolved biomolecular structures grows owing to advancements in structural biology, the molecular dynamics (MD) approach, especially coarse-grained (CG) MD suitable for macromolecules, is becoming increasingly important for elucidating their dynamics and behavior. In fact, CG-MD simulation has succeeded in qualitatively reproducing numerous biological processes for various biomolecules, such as conformational changes and protein folding, at reasonable calculation costs. However, CG-MD simulations strongly depend on various parameters, and selecting an appropriate parameter set is necessary to reproduce a particular biological process. Because exhaustive examination of all candidate parameters is inefficient, it is important to identify successful parameters efficiently. Furthermore, the successful region, in which the desired process is reproducible, is essential for describing the detailed mechanics of functional processes and environmental sensitivity and robustness. We propose an efficient search method for identifying the successful region by using two machine learning techniques, Bayesian optimization and active learning. We evaluated its performance using F1-ATPase, a biological rotary motor, with CG-MD simulations. We identified the successful region at lower computational cost (12.3% in the best case) than exhaustive search, without sacrificing accuracy. This method can accelerate not only parameter search but also biological discussion of the detailed mechanics of functional processes and environmental sensitivity based on MD simulation studies.

Source
DOI: http://dx.doi.org/10.3390/biom10030482
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC7175118
March 2020

evERdock BAI: Machine-learning-guided selection of protein-protein complex structure.

J Chem Phys 2019 Dec;151(21):215104

School of Life Sciences and Technology, Tokyo Institute of Technology, 2-12-1, Ookayama, Meguro-ku, Tokyo 152-8550, Japan.

Computational techniques for accurate and efficient prediction of protein-protein complex structures are widely used for elucidating protein-protein interactions, which play important roles in biological systems. Recently, it has been reported that a structure similar to the native structure can be selected from among generated structure candidates (decoys) by calculating binding free energies of the decoys based on all-atom molecular dynamics (MD) simulations with explicit solvent and the solution theory in the energy representation; this approach is called evERdock. A recent version of evERdock achieves higher-accuracy decoy selection by introducing MD relaxation and multiple MD simulations/energy calculations, but at a huge computational cost. In this paper, we propose an efficient decoy selection method using evERdock and the best arm identification (BAI) framework, which is one of the techniques of reinforcement learning. The BAI framework realizes efficient selection by suppressing calculations for nonpromising decoys and preferentially calculating the promising ones. We evaluate the performance of the proposed method on decoy selection problems for three protein-protein complex systems. The results show that computational costs are reduced by a factor of 4.05 (in the best case) compared to a standard decoy selection approach without sacrificing accuracy.
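
As a rough illustration of the best arm identification idea in this abstract (not the authors' implementation), the Python sketch below uses a generic successive-elimination scheme. Every surviving candidate is evaluated once per round, and a candidate is dropped once its lower confidence bound exceeds the upper bound of the current leader. The function name, the `scale` and `delta` parameters, the confidence-radius formula, and the toy noise model are all assumptions made for illustration; lower scores stand in for more favorable binding free energies.

```python
import math
import random


def successive_elimination(evaluate, n_arms, max_rounds=200, delta=0.05, scale=1.0):
    """Return (best_arm, evaluations_used) for a minimization problem.

    evaluate(arm) returns one noisy score for the arm; lower is better.
    `scale` is an assumed bound on the spread of the noise (hypothetical).
    """
    active = list(range(n_arms))
    sums = [0.0] * n_arms
    counts = [0] * n_arms
    used = 0
    for t in range(1, max_rounds + 1):
        # Spend one evaluation on every candidate that is still in play
        for arm in active:
            sums[arm] += evaluate(arm)
            counts[arm] += 1
            used += 1
        # Hoeffding-style confidence radius shared by all active arms
        radius = scale * math.sqrt(math.log(4.0 * n_arms * t * t / delta) / (2.0 * t))
        means = {arm: sums[arm] / counts[arm] for arm in active}
        best_upper = min(means.values()) + radius
        # Keep only arms whose lower bound could still beat the leader
        active = [arm for arm in active if means[arm] - radius <= best_upper]
        if len(active) == 1:
            break
    best = min(active, key=lambda arm: sums[arm] / counts[arm])
    return best, used


if __name__ == "__main__":
    rng = random.Random(0)
    toy_means = [0.0, 1.0, 1.2, 2.0]  # hypothetical mean scores of four decoys
    result = successive_elimination(
        lambda arm: toy_means[arm] + rng.uniform(-0.5, 0.5), len(toy_means)
    )
    print(result)
```

The saving comes from the elimination step: clearly unfavorable candidates stop consuming evaluations after a few rounds, so most of the budget goes to the candidates that are hard to tell apart.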

Source
DOI: http://dx.doi.org/10.1063/1.5129551
December 2019

Enhancing Biomolecular Sampling with Reinforcement Learning: A Tree Search Molecular Dynamics Simulation Method.

ACS Omega 2019 Aug 19;4(9):13853-13862. Epub 2019 Aug 19.

Graduate School of Frontier Sciences, The University of Tokyo, 5-1-5 Kashiwa-no-ha, Kashiwa, Chiba 277-8561, Japan.

This paper proposes a novel molecular simulation method, called tree search molecular dynamics (TS-MD), to accelerate the sampling of conformational transition pathways, which require considerable computation. In TS-MD, a tree search algorithm, called upper confidence bounds for trees, which is a type of reinforcement learning algorithm, is applied to sample the transition pathway. By learning from the results of the previous simulations, TS-MD efficiently searches conformational space and avoids being trapped in local stable structures. TS-MD exhibits better performance than parallel cascade selection molecular dynamics, which is one of the state-of-the-art methods, for the folding of miniproteins, Chignolin and Trp-cage, in explicit water.
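
The selection step at the heart of upper confidence bounds for trees can be sketched with the textbook UCB1 rule, shown below in Python. This is a generic illustration and not the TS-MD code: the function names, the `(mean_reward, visits)` representation of a child node, and the default exploration constant are assumptions, and the reward used in the paper is not specified here.

```python
import math


def ucb1_score(mean_reward, visits, parent_visits, c=math.sqrt(2.0)):
    """Textbook UCB1 score: exploitation term plus an exploration bonus."""
    if visits == 0:
        return math.inf  # unvisited children are always tried first
    return mean_reward + c * math.sqrt(math.log(parent_visits) / visits)


def select_child(children, c=math.sqrt(2.0)):
    """children is a list of (mean_reward, visits); return the index to expand next."""
    parent_visits = max(sum(visits for _, visits in children), 1)
    scores = [ucb1_score(mean, visits, parent_visits, c) for mean, visits in children]
    return max(range(len(children)), key=scores.__getitem__)
```

With a large `c` the search favors rarely visited branches, while `c = 0` reduces to greedy selection by mean reward. This trade-off is what lets a tree search keep exploring instead of staying trapped near one locally stable structure.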

Source
DOI: http://dx.doi.org/10.1021/acsomega.9b01480
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6714528
August 2019

Machine-Learning-Assisted Development and Theoretical Consideration for the AlFeSi Thermoelectric Material.

ACS Appl Mater Interfaces 2019 Mar 18;11(12):11545-11554. Epub 2019 Mar 18.

Graduate School of Frontier Sciences, The University of Tokyo, 5-1-5 Kashiwa-no-ha, Kashiwa 277-8561, Japan.

Chemical composition alteration is a general strategy to optimize the thermoelectric properties of a thermoelectric material to achieve high-efficiency conversion of waste heat into electricity. Recent studies show that the AlFeSi intermetallic compound with a relatively high power factor of ∼700 μW m⁻¹ K⁻² at 400 K is promising for applications in low-cost and nontoxic thermoelectric devices. To accelerate the exploration of the thermoelectric properties of this material in a mid-temperature range and to enhance its power factor, a machine-learning method was employed herein to assist the synthesis of off-stoichiometric samples (namely, AlFeSi) of the AlFeSi compound by tuning the Al/Si ratio. The optimal Al/Si ratio for a high power factor in the mid-temperature range was found rapidly and efficiently, and the optimal ratio of the sample at x = 0.9 was found to increase the power factor at ∼510 K by about 40% with respect to that of the initial sample at x = 0.0. The possible mechanism for the enhanced power factor is discussed in terms of the precipitations of the metallic secondary phases in the AlFeSi samples. Furthermore, the maximum achievable thermal conductivity of AlFeSi estimated by the Slack model is ∼10 W m⁻¹ K⁻¹ at the Debye temperature. An avoided-crossing behavior of the acoustic and the low-lying optical modes along several crystallographic directions is found in the phonon dispersion of AlFeSi calculated by the ab initio density functional theory method. These preliminary results suggest that AlFeSi can have a low thermal conductivity. The calculated formation energies of point defects suggest that the antisite defects between Al and Si are likely to cause the Al and Si off-stoichiometries in AlFeSi. The theoretically obtained insight provides additional information for the further understanding of AlFeSi.

Source
DOI: http://dx.doi.org/10.1021/acsami.9b02381
March 2019

An interpretable machine learning model for diagnosis of Alzheimer's disease.

PeerJ 2019 1;7:e6543. Epub 2019 Mar 1.

Department of Computational Biology and Medical Sciences, Graduate School of Frontier Sciences, The University of Tokyo, Chiba, Japan.

We present an interpretable machine learning model for medical diagnosis called sparse high-order interaction model with rejection option (SHIMR). A decision tree explains to a patient the diagnosis with a long rule (i.e., conjunction of many intervals), while SHIMR employs a weighted sum of short rules. Using proteomics data of 151 subjects in the Alzheimer's Disease Neuroimaging Initiative (ADNI) dataset, SHIMR is shown to be as accurate as other non-interpretable methods (Sensitivity, SN = 0.84 ± 0.1, Specificity, SP = 0.69 ± 0.15 and Area Under the Curve, AUC = 0.86 ± 0.09). For clinical usage, SHIMR has a function to abstain from making any diagnosis when it is not confident enough, so that a medical doctor can choose more accurate but invasive and/or more costly pathologies. The incorporation of a rejection option complements SHIMR in designing a multistage cost-effective diagnosis framework. Using a baseline concentration of cerebrospinal fluid (CSF) and plasma proteins from a common cohort of 141 subjects, SHIMR is shown to be effective in designing a patient-specific cost-effective Alzheimer's disease (AD) pathology. Thus, interpretability, reliability and having the potential to design a patient-specific multistage cost-effective diagnosis framework can make SHIMR serve as an indispensable tool in the era of precision medicine that can cater to the demand of both doctors and patients, and reduce the overwhelming financial burden of medical diagnosis.
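
To make the contrast between one long decision-tree rule and a weighted sum of short rules concrete, here is a minimal Python sketch of rule scoring with an abstain option. It is an illustration under stated assumptions, not SHIMR itself: the rule format, the weights, the feature names `protein_a`, `protein_b` and `protein_c`, and the margin-based rejection criterion are all hypothetical.

```python
def rule_score(sample, rules, bias=0.0):
    """Weighted sum of short rules; each rule is (weight, {feature: (low, high)})."""
    score = bias
    for weight, intervals in rules:
        # A rule fires only if every listed feature lies inside its interval
        if all(low <= sample[name] <= high for name, (low, high) in intervals.items()):
            score += weight
    return score


def diagnose(sample, rules, bias=0.0, reject_margin=0.5):
    """Return 'positive' or 'negative', or 'abstain' when the score is near zero."""
    score = rule_score(sample, rules, bias)
    if abs(score) < reject_margin:
        return "abstain"  # defer to a more accurate, costlier test
    return "positive" if score > 0 else "negative"


# Hypothetical rules over made-up protein concentration features
EXAMPLE_RULES = [
    (1.2, {"protein_a": (0.0, 1.0)}),
    (-0.9, {"protein_b": (2.0, 5.0)}),
    (0.4, {"protein_a": (0.0, 1.0), "protein_c": (3.0, 9.0)}),
]
```

Each rule that fires can be shown to the reader together with its weight, which is what keeps the explanation short. The abstain branch is the hook for a multistage workflow in which only uncertain cases proceed to more invasive or costly tests.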

Source
DOI: http://dx.doi.org/10.7717/peerj.6543
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6398390
March 2019

Ultranarrow-Band Wavelength-Selective Thermal Emission with Aperiodic Multilayered Metamaterials Designed by Bayesian Optimization.

ACS Cent Sci 2019 Feb 22;5(2):319-326. Epub 2019 Jan 22.

National Institute for Materials Science, 1-2-1 Sengen, Tsukuba 305-0047, Japan.

We computationally designed an ultranarrow-band wavelength-selective thermal radiator via a materials informatics method alternating between Bayesian optimization and thermal electromagnetic field calculation. For a given target infrared wavelength, the optimal structure was efficiently identified from over 8 billion candidates of multilayers consisting of multiple components (Si, Ge, and SiO₂). The resulting optimized structure is an aperiodic multilayered metamaterial exhibiting high and sharp emissivity with a Q-factor of 273. The designed metamaterials were then fabricated, and reasonable experimental realization of the optimal performance was achieved with a Q-factor of 188, which is significantly higher than those of structures empirically designed and fabricated in the past. This is the first demonstration of the experimental realization of metamaterials designed by Bayesian optimization. The results facilitate the machine-learning-based design of metamaterials and advance our understanding of the narrow-band thermal emission mechanism of aperiodic multilayered metamaterials.

Source
DOI: http://dx.doi.org/10.1021/acscentsci.8b00802
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6396383
February 2019

Improving the Accuracy of Protein-Ligand Binding Mode Prediction Using a Molecular Dynamics-Based Pocket Generation Approach.

J Comput Chem 2018 Dec;39(32):2679-2689

Graduate School of Medicine, Kyoto University, 53 Shogoin-Kawaharacho, Sakyo-ku, Kyoto, 606-8507, Japan.

Protein-drug binding mode prediction from the apo-protein structure is challenging because drug binding often induces significant protein conformational changes. Here, the authors report a computational workflow that incorporates a novel pocket generation method. First, the closed protein pocket is expanded by repeatedly filling virtual atoms during molecular dynamics (MD) simulations. Second, after ligand docking toward the prepared pocket structures, binding mode candidates are ranked by MD/Molecular Mechanics Poisson-Boltzmann Surface Area. The authors validated the workflow using CDK2 kinase, which has an especially closed ATP-binding pocket in the apo-form, and several inhibitors. The crystallographic pose coincided with the top-ranked docking pose for 59% (34/58) of the compounds and was within the top five ranked poses for 88% (51/58), while the corresponding rates for a conventional prediction protocol were 9% (5/58) and 50% (29/58), respectively. The study demonstrates that the prediction accuracy is significantly improved by preceding pocket expansion, leading to generation of conformationally diverse binding mode candidates. © 2018 Wiley Periodicals, Inc.

Source
DOI: http://dx.doi.org/10.1002/jcc.25715
December 2018

Hunting for Organic Molecules with Artificial Intelligence: Molecules Optimized for Desired Excitation Energies.

ACS Cent Sci 2018 Sep 20;4(9):1126-1133. Epub 2018 Aug 20.

Center for Advanced Intelligence Project, RIKEN, 1-4-1 Nihombashi, Chuo-ku, Tokyo 103-0027, Japan.

This work presents a proof-of-concept study in artificial-intelligence-assisted (AI-assisted) chemistry where a machine-learning-based molecule generator is coupled with density functional theory (DFT) calculations, synthesis, and measurement. Although deep-learning-based molecule generators have shown promise, it is unclear to what extent they can be useful in real-world materials development. To assess the reliability of AI-assisted chemistry, we prepared a platform using a molecule generator and a DFT simulator, and attempted to generate novel photofunctional molecules whose lowest excited states lie at desired energetic levels. A 10-day run on a 12-core server discovered 86 potential photofunctional molecules around target lowest excitation levels, designated as 200, 300, 400, 500, and 600 nm. Among the molecules discovered, six were synthesized, and five were confirmed to reproduce DFT predictions in ultraviolet-visible absorption measurements. This result shows the potential of AI-assisted chemistry to discover ready-to-synthesize novel molecules with modest computational resources.

Source
DOI: http://dx.doi.org/10.1021/acscentsci.8b00213
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6161049
September 2018

Data-driven approach for the prediction and interpretation of core-electron loss spectroscopy.

Sci Rep 2018 Sep 6;8(1):13548. Epub 2018 Sep 6.

Institute of Industrial Science, The University of Tokyo, 153-8505, Tokyo, Japan.

Spectroscopy is indispensable for determining atomic configurations, chemical bondings, and vibrational behaviours, which are crucial information for materials development. Despite their importance, the interpretation of spectra using "human-driven" methods, such as the manual comparison of experimental spectra with reference/simulated spectra, is difficult due to the explosive increase in the number of experimental spectra to be observed. To overcome the limitations of the "human-driven" approach, we develop a new "data-driven" approach based on machine learning techniques by combining the layer clustering and decision tree methods. The proposed method is applied to the 46 oxygen-K edges of the ELNES/XANES spectra of oxide compounds. With this method, the spectra can be interpreted in accordance with the material information. Furthermore, we demonstrate that our method can predict spectral features from the material information. Our approach has the potential to provide information about a material that cannot be determined manually as well as predict a plausible spectrum from the geometric information alone.

Source
DOI: http://dx.doi.org/10.1038/s41598-018-30994-6
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6127203
September 2018

Functional Nanoparticles-Coated Nanomechanical Sensor Arrays for Machine Learning-Based Quantitative Odor Analysis.

ACS Sens 2018 Aug 15;3(8):1592-1600. Epub 2018 Aug 15.

Materials Science and Engineering, Graduate School of Pure and Applied Science, University of Tsukuba, 1-1-1 Tennodai, Tsukuba, Ibaraki 305-8571, Japan.

A sensing signal obtained by measuring an odor usually contains varied information that reflects an origin of the odor itself, while an effective approach is required to reasonably analyze informative data to derive the desired information. Herein, we demonstrate that quantitative odor analysis was achieved through systematic material design-based nanomechanical sensing combined with machine learning. A ternary mixture consisting of water, ethanol, and methanol was selected as a model system where a target molecule coexists with structurally similar species in a humidified condition. To predict the concentration of each species in the system via the data-driven approach, six types of nanoparticles functionalized with hydroxyl, aminopropyl, phenyl, and/or octadecyl groups were synthesized as a receptor coating of a nanomechanical sensor. Then, a machine learning model based on Gaussian process regression was trained with sensing data sets obtained from the samples with diverse concentrations. As a result, the octadecyl-modified nanoparticles enhanced prediction accuracy for water while the use of both octadecyl and aminopropyl groups was indicated to be a key for a better prediction accuracy for ethanol and methanol. As the prediction accuracy for ethanol and methanol was improved by introducing two additional nanoparticles with finely controlled octadecyl and aminopropyl amount, the feedback obtained by the present machine learning was effectively utilized to optimize material design for better performance. We demonstrate through this study that various information which was extracted from plenty of experimental data sets was successfully combined with our knowledge to produce wisdom for addressing a critical issue in gas phase sensing.

Source
DOI: http://dx.doi.org/10.1021/acssensors.8b00450
August 2018

Machine-Learning-Guided Mutagenesis for Directed Evolution of Fluorescent Proteins.

ACS Synth Biol 2018 Sep 20;7(9):2014-2022. Epub 2018 Aug 20.

Department of Biomolecular Engineering, Graduate School of Engineering, Tohoku University, 6-6-11 Aoba, Aramaki, Aoba-ku, Sendai 980-8579, Japan.

Molecular evolution based on mutagenesis is widely used in protein engineering. However, optimal proteins are often difficult to obtain due to a large sequence space. Here, we propose a novel approach that combines molecular evolution with machine learning. In this approach, we conduct two rounds of mutagenesis where an initial library of protein variants is used to train a machine-learning model to guide mutagenesis for the second-round library. This enables us to prepare a small library suited for screening experiments with high enrichment of functional proteins. We demonstrated a proof-of-concept of our approach by altering the reference green fluorescent protein (GFP) so that its fluorescence is changed into yellow. We successfully obtained a number of proteins showing yellow fluorescence, 12 of which had longer wavelengths than the reference yellow fluorescent protein (YFP). These results show the potential of our approach as a powerful method for directed evolution of fluorescent proteins.

Source
DOI: http://dx.doi.org/10.1021/acssynbio.8b00155
September 2018

Multiple Testing Tool to Detect Combinatorial Effects in Biology.

Methods Mol Biol 2018;1807:83-94

Department of Computational Biology and Medical Sciences, Graduate School of Frontier Sciences, The University of Tokyo, Tokyo, Japan.

Detecting combinatorial effects is important to various research areas, including biology, genomics, and medical sciences. However, this task is not only computationally nontrivial but also extremely difficult to achieve because of the necessity of a multiple testing procedure; hence few methods can comprehensively analyze high-order combinations. Recently, the Limitless Arity Multiple-testing Procedure (LAMP) was introduced, allowing us to enumerate statistically significant combinations from a given dataset. This chapter provides instructions for LAMP using simple examples of combinatorial transcription factor regulation discovery and visualization of the results. This chapter also introduces LAMPLINK, an extension of the LAMP software. LAMPLINK can handle genetic datasets to detect statistically significant interactions among multiple SNPs from a genome-wide association study (GWAS) dataset.

Source
DOI: http://dx.doi.org/10.1007/978-1-4939-8561-6_7
March 2019

Structure prediction of boron-doped graphene by machine learning.

J Chem Phys 2018 Jun;148(24):241716

Graduate School of Frontier Sciences, The University of Tokyo, Kashiwa, Japan.

Heteroatom doping has endowed graphene with manifold aspects of material properties and boosted its applications. Determining the atomic structure of doped graphene is vital to understanding its material properties. Motivated by the recently synthesized boron-doped graphene with relatively high concentration, here we employ machine learning methods, in conjunction with atomistic simulations, to search for the most stable structures of doped boron atoms in graphene. From the determined stable structures, we find that in free-standing pristine graphene, the doped boron atoms energetically prefer to substitute for carbon atoms at different sublattice sites and that the para configuration of the boron-boron pair is dominant at high boron concentrations. Boron doping can increase the work function of graphene by 0.7 eV for a boron content higher than 3.1%.

Source
DOI: http://dx.doi.org/10.1063/1.5018065
June 2018