563 results match your criteria Journal of Cheminformatics [Journal]


Detecting drug communities and predicting comprehensive drug-drug interactions via balance regularized semi-nonnegative matrix factorization.

J Cheminform 2019 Apr 8;11(1):28. Epub 2019 Apr 8.

Department of Computer Science, The University of Hong Kong, Hong Kong, China.

Background: Because drug-drug interactions (DDIs) may cause adverse drug reactions or contribute to complex-disease treatments, it is important to identify DDIs before multiple-drug medications are prescribed. As an alternative to high-cost experimental identification, computational approaches provide much cheaper large-scale screening for potential DDIs. Nevertheless, most of them only predict whether or not one drug interacts with another, and neglect the enhancive (positive) and depressive (negative) changes in pharmacological effects.

DOI: http://dx.doi.org/10.1186/s13321-019-0352-9

TeachOpenCADD: a teaching platform for computer-aided drug design using open source packages and data.

J Cheminform 2019 Apr 8;11(1):29. Epub 2019 Apr 8.

In Silico Toxicology, Institute of Physiology, Charité - Universitätsmedizin Berlin, Charitéplatz 1, 10117, Berlin, Germany.

Owing to the increase in freely available software and data for cheminformatics and structural bioinformatics, research for computer-aided drug design (CADD) is increasingly built on modular, reproducible, and easy-to-share pipelines. While documentation for such tools is available, there are only a few freely accessible examples that teach the underlying concepts focused on CADD, especially addressing users new to the field. Here, we present TeachOpenCADD, a teaching platform developed by students for students, using open source compound and protein data as well as basic and CADD-related Python packages.

DOI: http://dx.doi.org/10.1186/s13321-019-0351-x

Methodology of aiQSAR: a group-specific approach to QSAR modelling.

J Cheminform 2019 Apr 3;11(1):27. Epub 2019 Apr 3.

Istituto di Ricerche Farmacologiche Mario Negri-IRCCS, Via Mario Negri 2, 20156, Milan, Italy.

Background: Several QSAR methodology developments have shown promise in recent years. These include the consensus approach to generating a model's final prediction, the use of new, advanced machine learning algorithms, and the streamlining, standardization and automation of various QSAR steps. One approach that seems under-explored is the runtime generation of local models specific to individual compounds.

DOI: http://dx.doi.org/10.1186/s13321-019-0350-y
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6446381

QligFEP: an automated workflow for small molecule free energy calculations in Q.

J Cheminform 2019 Apr 2;11(1):26. Epub 2019 Apr 2.

Department of Cell and Molecular Biology, Uppsala University, Uppsala, 75124, Sweden.

The process of ligand binding to a biological target can be represented as the equilibrium between the relevant solvated and bound states of the ligand. This is the basis of rigorous structure-based methods such as the estimation of relative binding affinities by free energy perturbation (FEP). Despite the growing capacity of computing power and the development of more accurate force fields, high-throughput application of FEP is currently hampered by the need, in current schemes, for an expert user to define the "alchemical" transformations between the molecules in the series explored.

DOI: http://dx.doi.org/10.1186/s13321-019-0348-5
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6444553

Algorithm-supported, mass and sequence diversity-oriented random peptide library design.

J Cheminform 2019 Mar 28;11(1):25. Epub 2019 Mar 28.

Institute for Research in Biomedicine (IRB Barcelona), The Barcelona Institute of Science and Technology (BIST), Baldiri Reixac 10, 08028, Barcelona, Spain.

Random peptide libraries that cover large search spaces are often used for the discovery of new binders, even when the target is unknown. To ensure an accurate population representation, there is a tendency to use large libraries. However, parameters such as the synthesis scale, the number of library members, sequence deconvolution and peptide structure elucidation become challenging as the library size increases.

DOI: http://dx.doi.org/10.1186/s13321-019-0347-6
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6437963

Binding mode information improves fragment docking.

J Cheminform 2019 Mar 22;11(1):24. Epub 2019 Mar 22.

Laboratoire d'innovation thérapeutique, UMR7200, CNRS, Université de Strasbourg, 67400, Illkirch, France.

Docking is commonly used in drug discovery to predict how a ligand binds to a protein target. The best programs are generally able to generate a correct solution, yet often fail to identify it. In the case of drug-like molecules, the correct and incorrect poses can be sorted by similarity to the crystallographic structure of the protein in complex with reference ligands.

DOI: http://dx.doi.org/10.1186/s13321-019-0346-7
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6431075

Too many tags spoil the metadata: investigating the knowledge management of scientific research with semantic web technologies.

J Cheminform 2019 Mar 21;11(1):23. Epub 2019 Mar 21.

University of Southampton, Southampton, SO17 1BJ, UK.

Scientific research is increasingly characterised by the volume of documents and data that it produces, from experimental plans and raw data to reports and papers. Researchers frequently struggle to manage and curate these materials, both individually and collectively. Previous studies of Electronic Lab Notebooks (ELNs) in academia and industry have identified semantic web technologies as a means for organising scientific documents to improve current workflows and knowledge management practices.

DOI: http://dx.doi.org/10.1186/s13321-019-0345-8
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6427892

KMR: knowledge-oriented medicine representation learning for drug-drug interaction and similarity computation.

J Cheminform 2019 Mar 14;11(1):22. Epub 2019 Mar 14.

The Shenzhen Key Lab for Information Centric Networking and Blockchain Technologies (ICNLab), School of Electronics and Computer Engineering, Peking University Shenzhen Graduate School, 518055, Shenzhen, People's Republic of China.

Efficient representations of drugs provide important support for healthcare analytics, such as drug-drug interaction (DDI) prediction and drug-drug similarity (DDS) computation. However, incomplete annotated data and drug feature sparseness create substantial barriers for drug representation learning, making it difficult to accurately identify new drug properties prior to public release. To alleviate these deficiencies, we propose KMR, a knowledge-oriented feature-driven method which can learn drug-related knowledge with an accurate representation.

DOI: http://dx.doi.org/10.1186/s13321-019-0342-y
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6419809

CRFVoter: gene and protein related object recognition using a conglomerate of CRF-based tools.

J Cheminform 2019 Mar 14;11(1):21. Epub 2019 Mar 14.

Text Technology Lab, Goethe-University Frankfurt, Robert-Mayer-Straße 10, 60325, Frankfurt am Main, Germany.

Background: Gene and protein related objects are an important class of entities in biomedical research, whose identification and extraction from scientific articles is attracting increasing interest. In this work, we describe an approach to the BioCreative V.5 challenge regarding the recognition and classification of gene and protein related objects.

DOI: http://dx.doi.org/10.1186/s13321-019-0343-x
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6419804

Exploring the GDB-13 chemical space using deep generative models.

J Cheminform 2019 Mar 12;11(1):20. Epub 2019 Mar 12.

Hit Discovery, Discovery Sciences, IMED Biotech Unit, AstraZeneca, Gothenburg, Pepparedsleden 1, 43183, Mölndal, Sweden.

Recent applications of recurrent neural networks (RNNs) enable training models that sample the chemical space. In this study we train RNNs on molecular string representations (SMILES) using a subset of the enumerated database GDB-13 (975 million molecules). We show that a model trained with 1 million structures (0.

DOI: http://dx.doi.org/10.1186/s13321-019-0341-z
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6419837

Design, implementation, and operation of a rapid, robust named entity recognition web service.

J Cheminform 2019 Mar 8;11(1):19. Epub 2019 Mar 8.

Disease Systems Biology Program, Novo Nordisk Foundation Center for Protein Research, Faculty of Health and Medical Sciences, University of Copenhagen, Copenhagen, Denmark.

Most BioCreative tasks to date have focused on assessing the quality of text-mining annotations in terms of precision and recall. Interoperability, speed, and stability are, however, other important factors to consider for practical applications of text mining. For about a decade, we have run named entity recognition (NER) web services, which are designed to be efficient, implemented using a multi-threaded queueing system to robustly handle many simultaneous requests, and hosted at a supercomputer facility.

DOI: http://dx.doi.org/10.1186/s13321-019-0344-9
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6419787

Automated simultaneous assignment of bond orders and formal charges.

J Cheminform 2019 Mar 6;11(1):18. Epub 2019 Mar 6.

Centre for Theoretical Chemistry and Physics, Institute of Natural and Mathematical Sciences, Massey University, Private Bag 102904, Auckland, New Zealand.

Bond orders and formal charges are fundamental chemical descriptors. In cheminformatic applications it is necessary to be able to assign these properties to a given molecular structure automatically, given minimal input information. Here we describe a method for determining the bond order and formal charge assignments from only the atom types and connectivity.

DOI: http://dx.doi.org/10.1186/s13321-019-0340-0
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6419789

CSgator: an integrated web platform for compound set analysis.

J Cheminform 2019 Mar 4;11(1):17. Epub 2019 Mar 4.

Ewha Research Center for Systems Biology, Department of Life Science, Division of Molecular and Life Sciences, Ewha Womans University, Seoul, Korea.

Drug discovery typically involves investigation of a set of compounds (e.g. drug screening hits) in terms of target, disease, and bioactivity.

DOI: http://dx.doi.org/10.1186/s13321-019-0339-6
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6419788

Software solutions for evaluation and visualization of laser ablation inductively coupled plasma mass spectrometry imaging (LA-ICP-MSI) data: a short overview.

J Cheminform 2019 Feb 18;11(1):16. Epub 2019 Feb 18.

Department of Biochemistry and Biotechnology, Center for Research and Advanced Studies (CINVESTAV) Irapuato, Km. 9.6 Libramiento Norte Carr. Irapuato-León, 36824, Irapuato, Gto., Mexico.

Mass spectrometry imaging (MSI) using laser ablation (LA) inductively coupled plasma (ICP) is an innovative and exciting methodology to perform highly sensitive elemental analyses. LA-ICP-MSI of metals, trace elements or isotopes in tissues has been applied to a range of biological samples. Several LA-ICP-MSI studies have shown that metals have a highly compartmentalized distribution in some organs, which might be altered as a consequence of genetic diseases, intoxication, or malnutrition.

DOI: http://dx.doi.org/10.1186/s13321-019-0338-7

Identification of novel small molecule inhibitors for solute carrier SGLT1 using proteochemometric modeling.

J Cheminform 2019 Feb 14;11(1):15. Epub 2019 Feb 14.

Division of Drug Discovery & Safety, Leiden Academic Centre for Drug Research, Leiden University, Einsteinweg 55, 2333 CC, Leiden, The Netherlands.

Sodium-dependent glucose co-transporter 1 (SGLT1) is a solute carrier responsible for active glucose absorption. SGLT1 is present in both the renal tubules and small intestine. In contrast, the closely related sodium-dependent glucose co-transporter 2 (SGLT2), a protein that is targeted in the treatment of diabetes type II, is only expressed in the renal tubules.

DOI: http://dx.doi.org/10.1186/s13321-019-0337-8

Dimorphite-DL: an open-source program for enumerating the ionization states of drug-like small molecules.

J Cheminform 2019 Feb 14;11(1):14. Epub 2019 Feb 14.

Department of Biological Sciences, University of Pittsburgh, 4249 Fifth Avenue, Pittsburgh, PA, 15260, USA.

Small-molecule protonation can promote or discourage protein binding by altering hydrogen-bond, electrostatic, and van-der-Waals interactions. To improve virtual-screen pose and affinity predictions, researchers must account for all major small-molecule ionization states. But existing programs for calculating these states have notable limitations such as high cost, restrictive licenses, slow execution times, and poor modularity.

DOI: http://dx.doi.org/10.1186/s13321-019-0336-9

rBAN: retro-biosynthetic analysis of nonribosomal peptides.

J Cheminform 2019 Feb 8;11(1):13. Epub 2019 Feb 8.

Proteome Informatics Group, SIB Swiss Institute of Bioinformatics, CMU, Rue Michel-Servet 1, 1211, Geneva, Switzerland.

Proteinogenic and non-proteinogenic amino acids, fatty acids or glycans are some of the main building blocks of nonribosomal peptides (NRPs) and as such may give insight into the origin, biosynthesis and bioactivities of their constitutive peptides. Hence, the structural representation of NRPs using monomers provides a biologically interesting skeleton of these secondary metabolites. Databases dedicated to NRPs such as Norine already integrate monomer-based annotations in order to facilitate the development of structural analysis tools.

DOI: http://dx.doi.org/10.1186/s13321-019-0335-x

Programming languages in chemistry: a review of HTML5/JavaScript.

Authors:
Kevin J Theisen

J Cheminform 2019 Feb 5;11(1):11. Epub 2019 Feb 5.

iChemLabs, LLC., 7305 Hancock Village Dr #525, Chesterfield, VA, 23112, USA.

This is one part of a series of reviews concerning the application of programming languages in chemistry, edited by Dr. Rajarshi Guha. This article reviews the JavaScript technology as it applies to the chemistry discipline.

DOI: http://dx.doi.org/10.1186/s13321-019-0331-1

Implementing cheminformatics.

Authors:
Rajarshi Guha

J Cheminform 2019 Feb 5;11(1):12. Epub 2019 Feb 5.

Vertex Pharmaceuticals, 50 Northern Ave, Boston, MA, 02210, USA.

DOI: http://dx.doi.org/10.1186/s13321-019-0333-z

Chemoinformatics and structural bioinformatics in OCaml.

J Cheminform 2019 Feb 5;11(1):10. Epub 2019 Feb 5.

Department of Bioscience and Bioinformatics, Faculty of Computer Science and Systems Engineering, Kyushu Institute of Technology, Iizuka, Fukuoka, Japan.

Background: OCaml is a functional programming language with strong static types, Hindley-Milner type inference and garbage collection. In this article, we share our experience in prototyping chemoinformatics and structural bioinformatics software in OCaml.

Results: First, we introduce the language, list entry points for chemoinformaticians who would be interested in OCaml and give code examples.

DOI: http://dx.doi.org/10.1186/s13321-019-0332-0

Avoiding hERG-liability in drug design via synergetic combinations of different (Q)SAR methodologies and data sources: a case study in an industrial setting.

J Cheminform 2019 Feb 2;11(1). Epub 2019 Feb 2.

Merck KGaA, Darmstadt, Germany.

In this paper, we explore the impact of combining different in silico prediction approaches and data sources on the predictive performance of the resulting system. We use inhibition of the hERG ion channel target as the endpoint for this study as it constitutes a key safety concern in drug development and a potential cause of attrition. We show that combining data sources can improve the relevance of the training set with regard to the target chemical space, leading to improved performance.

DOI: http://dx.doi.org/10.1186/s13321-019-0334-y

The nature of ligand efficiency.

Authors:
Peter W Kenny

J Cheminform 2019 Jan 31;11(1). Epub 2019 Jan 31.

Berwick-on-Sea, North Coast Road, Blanchisseuse, Saint George, Trinidad and Tobago.

Ligand efficiency is a widely used design parameter in drug discovery. It is calculated by scaling affinity by molecular size and has a nontrivial dependency on the concentration unit used to express affinity that stems from the inability of the logarithm function to take dimensioned arguments. Consequently, perception of efficiency varies with the choice of concentration unit and it is argued that the ligand efficiency metric is not physically meaningful nor should it be considered to be a metric.

DOI: http://dx.doi.org/10.1186/s13321-019-0330-2
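
The unit dependence described in this abstract can be seen in a small worked example. The sketch below assumes the common definition LE = RT ln(C_ref / Kd) / N_heavy, where C_ref is the reference concentration that makes the logarithm's argument dimensionless. The two compounds, their Kd values, and their heavy-atom counts are hypothetical and are not taken from the article.

```python
import math

R_KCAL = 1.987e-3   # gas constant in kcal/(mol*K)
TEMPERATURE = 298.15  # K


def ligand_efficiency(kd_molar, heavy_atoms, c_ref_molar=1.0):
    """Return LE = RT * ln(C_ref / Kd) / N_heavy in kcal/mol per heavy atom.

    c_ref_molar is the reference concentration (in mol/L) used to make the
    argument of the logarithm dimensionless; 1 M is the usual convention.
    """
    return R_KCAL * TEMPERATURE * math.log(c_ref_molar / kd_molar) / heavy_atoms


# Hypothetical compounds: (Kd in mol/L, number of heavy atoms).
fragment = (1e-3, 10)  # 1 mM binder with 10 heavy atoms
lead = (1e-9, 30)      # 1 nM binder with 30 heavy atoms

for c_ref in (1.0, 1e-3):  # 1 M and 1 mM reference concentrations
    le_fragment = ligand_efficiency(*fragment, c_ref_molar=c_ref)
    le_lead = ligand_efficiency(*lead, c_ref_molar=c_ref)
    print(f"C_ref = {c_ref:g} M: fragment LE = {le_fragment:.3f}, "
          f"lead LE = {le_lead:.3f}")
```

With a 1 M reference the two compounds come out equally efficient (about 0.41 kcal/mol per heavy atom), whereas with a 1 mM reference the fragment drops to 0 and the lead to about 0.27, so the comparison depends on an arbitrary choice of unit.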

OGER++: hybrid multi-type entity recognition.

J Cheminform 2019 Jan 21;11(1). Epub 2019 Jan 21.

Institute of Computational Linguistics, University of Zurich, Andreasstr. 15, 8050, Zürich, Switzerland.

Background: We present a text-mining tool for recognizing biomedical entities in scientific literature. OGER++ is a hybrid system for named entity recognition and concept recognition (linking), which combines a dictionary-based annotator with a corpus-based disambiguation component. The annotator uses an efficient look-up strategy combined with a normalization method for matching spelling variants.

DOI: http://dx.doi.org/10.1186/s13321-018-0326-3
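
The idea of dictionary look-up combined with normalization of spelling variants can be illustrated with a short sketch. This is not the OGER++ implementation: the normalization rule (lowercasing and dropping non-alphanumeric characters), the longest-match strategy, and the concept identifiers are all assumptions made for illustration, and the corpus-based disambiguation step is not shown.

```python
import re


def normalize(text):
    """Lowercase and drop everything except letters and digits."""
    return re.sub(r"[^a-z0-9]+", "", text.lower())


def build_index(dictionary):
    """Map each normalized term to its concept identifier."""
    return {normalize(term): concept for term, concept in dictionary.items()}


def annotate(text, index, max_tokens=3):
    """Greedy longest-match look-up over windows of up to max_tokens words.

    Returns (start, end, matched_text, concept_id) tuples.
    """
    tokens = [(m.start(), m.end()) for m in re.finditer(r"\w+", text)]
    annotations = []
    i = 0
    while i < len(tokens):
        step = 1
        # Try the longest window first so multi-word terms win.
        for n in range(min(max_tokens, len(tokens) - i), 0, -1):
            start, end = tokens[i][0], tokens[i + n - 1][1]
            key = normalize(text[start:end])
            if key in index:
                annotations.append((start, end, text[start:end], index[key]))
                step = n
                break
        i += step
    return annotations


# Hypothetical dictionary; the identifiers are made up for this example.
terms = {"NF-kappaB": "GENE:0001", "interleukin 6": "GENE:0002"}
index = build_index(terms)

sentence = "Nf-kappa B activates Interleukin-6 expression."
for start, end, matched, concept in annotate(sentence, index):
    print(start, end, matched, concept)
```

Because both the dictionary terms and the candidate spans pass through the same normalization, "Nf-kappa B" and "Interleukin-6" match their dictionary entries despite differences in case, hyphenation, and spacing.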

Universal nanohydrophobicity predictions using virtual nanoparticle library.

J Cheminform 2019 Jan 18;11(1). Epub 2019 Jan 18.

The Rutgers Center for Computational and Integrative Biology, Camden, NJ, 08102, USA.

To facilitate the development of new nanomaterials, especially nanomedicines, a novel computational approach was developed to precisely predict the hydrophobicity of gold nanoparticles (GNPs). The core of this study was to develop a large virtual gold nanoparticle (vGNP) library with computational nanostructure simulations. Based on the vGNP library, a nanohydrophobicity model was developed and then validated against externally synthesized and tested GNPs.

DOI: http://dx.doi.org/10.1186/s13321-019-0329-8

QBMG: quasi-biogenic molecule generator with deep recurrent neural network.

J Cheminform 2019 Jan 17;11(1). Epub 2019 Jan 17.

Research Center for Drug Discovery, School of Pharmaceutical Sciences, Sun Yat-Sen University, 132 East Circle at University City, Guangzhou, 510006, China.

Biogenic compounds are important materials for drug discovery and chemical biology. In this work, we report a quasi-biogenic molecule generator (QBMG) to compose virtual quasi-biogenic compound libraries by means of gated recurrent unit recurrent neural networks. The library includes stereo-chemical properties, which are crucial features of natural products.

DOI: http://dx.doi.org/10.1186/s13321-019-0328-9

Large scale comparison of QSAR and conformal prediction methods and their applications in drug discovery.

J Cheminform 2019 Jan 10;11(1). Epub 2019 Jan 10.

Chemogenomics Team, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridge, CB10 1SD, UK.

Structure-activity relationship modelling is frequently used in the early stage of drug discovery to assess the activity of a compound on one or several targets, and can also be used to assess the interaction of compounds with liability targets. QSAR models have been used for these and related applications over many years, with good success. Conformal prediction is a relatively new QSAR approach that provides information on the certainty of a prediction, and so helps in decision-making.

DOI: http://dx.doi.org/10.1186/s13321-018-0325-4
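
How conformal prediction conveys the certainty of a prediction can be sketched with a generic inductive conformal classifier. The example below is an illustration under stated assumptions, not the procedure used in the article: the nonconformity score is assumed to be 1 minus the model's predicted probability for a label, and the calibration scores and test compounds are hypothetical.

```python
def conformal_p_value(calibration_scores, test_score):
    """Fraction of calibration examples at least as nonconforming as the test.

    Uses the standard (count + 1) / (n + 1) correction.
    """
    at_least = sum(1 for s in calibration_scores if s >= test_score)
    return (at_least + 1) / (len(calibration_scores) + 1)


def prediction_set(calibration_scores, scores_by_label, significance):
    """Keep every label whose p-value exceeds the significance level."""
    return {
        label
        for label, score in scores_by_label.items()
        if conformal_p_value(calibration_scores, score) > significance
    }


# Hypothetical nonconformity scores from a held-out calibration set.
calibration = [0.05, 0.10, 0.15, 0.20, 0.30, 0.40, 0.55, 0.70, 0.90]

# Hypothetical test compounds: nonconformity score for each candidate label.
confident = {"active": 0.15, "inactive": 0.85}
ambiguous = {"active": 0.50, "inactive": 0.50}

print(prediction_set(calibration, confident, significance=0.25))
print(prediction_set(calibration, ambiguous, significance=0.25))
```

The first compound receives a single-label set, while the second receives both labels: the size of the prediction set, rather than a bare class label, is what signals how certain the prediction is at the chosen significance level.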

LSTMVoter: chemical named entity recognition using a conglomerate of sequence labeling tools.

J Cheminform 2019 Jan 10;11(1). Epub 2019 Jan 10.

Text Technology Lab, Goethe-University Frankfurt, Robert-Mayer-Straße 10, 60325, Frankfurt am Main, Germany.

Background: Chemical and biomedical named entity recognition (NER) is an essential preprocessing task in natural language processing. The identification and extraction of named entities from scientific articles is also attracting increasing interest in many scientific disciplines. Locating chemical named entities in the literature is an essential step in chemical text mining pipelines for identifying chemical mentions, their properties, and relations as discussed in the literature.

DOI: http://dx.doi.org/10.1186/s13321-018-0327-2

BioTransformer: a comprehensive computational tool for small molecule metabolism prediction and metabolite identification.

J Cheminform 2019 Jan 5;11(1). Epub 2019 Jan 5.

Department of Biological Sciences, University of Alberta, Edmonton, AB, T6G 2E9, Canada.

Background: A number of computational tools for metabolism prediction have been developed over the last 20 years to predict the structures of small molecules undergoing biological transformation or environmental degradation. These tools were largely developed to facilitate absorption, distribution, metabolism, excretion, and toxicity (ADMET) studies, although there is now a growing interest in using such tools to facilitate metabolomics and exposomics studies. However, their use and widespread adoption are still hampered by several factors, including their limited scope, breadth of coverage, availability, and performance.

DOI: http://dx.doi.org/10.1186/s13321-018-0324-5

A retrosynthetic analysis algorithm implementation.

J Cheminform 2019 Jan 3;11(1). Epub 2019 Jan 3.

Discovery Chemistry, Lilly Research Laboratories, Eli Lilly and Company, Indianapolis, IN, 46285, USA.

The need for synthetic route design arises frequently in discovery-oriented chemistry organizations. While finding solutions to this problem has traditionally been the domain of human experts, several computational approaches, aided by algorithmic advances and the availability of large reaction collections, have recently been reported. Herein we present our own implementation of a retrosynthetic analysis method and demonstrate its capabilities in an attempt to identify synthetic routes for a collection of approved drugs.

DOI: http://dx.doi.org/10.1186/s13321-018-0323-6

Configurable web-services for biomedical document annotation.

Authors:
Sérgio Matos

J Cheminform 2018 Dec 21;10(1):68. Epub 2018 Dec 21.

DETI/IEETA, University of Aveiro, Campus Universitário de Santiago, Aveiro, Portugal.

The need to efficiently find and extract information from the continuously growing biomedical literature has led to the development of various annotation tools aimed at identifying mentions of entities and relations. Many of these tools have been integrated in user-friendly applications facilitating their use by non-expert text miners and database curators. In this paper we describe the latest version of Neji, a web-services ready text processing and annotation framework.

DOI: http://dx.doi.org/10.1186/s13321-018-0317-4

A probabilistic molecular fingerprint for big data settings.

J Cheminform 2018 Dec 18;10(1):66. Epub 2018 Dec 18.

Department of Chemistry and Biochemistry, National Center for Competence in Research NCCR TransCure, University of Berne, Freiestrasse 3, 3012, Bern, Switzerland.

Background: Among the various molecular fingerprints available to describe small organic molecules, the extended connectivity fingerprint, up to four bonds (ECFP4), performs best in benchmarking drug analog recovery studies as it encodes substructures with a high level of detail. Unfortunately, ECFP4 requires high dimensional representations (≥ 1024D) to perform well, so ECFP4 nearest neighbor searches in very large databases such as GDB, PubChem or ZINC perform very slowly due to the curse of dimensionality.

Results: Herein we report a new fingerprint, called MinHash fingerprint, up to six bonds (MHFP6), which encodes detailed substructures using the extended connectivity principle of ECFP in a fundamentally different manner, increasing the performance of exact nearest neighbor searches in benchmarking studies and enabling the application of locality sensitive hashing (LSH) approximate nearest neighbor search algorithms.

DOI: http://dx.doi.org/10.1186/s13321-018-0321-8
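
The MinHash principle that the fingerprint's name refers to can be sketched generically. The code below is not MHFP6 itself: it skips the extraction of circular substructures and simply treats each molecule as a non-empty set of substructure strings. The hash family (seeded BLAKE2b), the signature length, and the example token sets are assumptions made for illustration.

```python
import hashlib


def _hash(token, seed):
    """Seeded 64-bit hash of a token (one member of the hash family)."""
    data = f"{seed}:{token}".encode("utf-8")
    return int.from_bytes(hashlib.blake2b(data, digest_size=8).digest(), "big")


def minhash_signature(tokens, num_hashes=256):
    """For each hash function, keep the minimum hash over a non-empty set."""
    unique = set(tokens)
    return [min(_hash(t, seed) for t in unique) for seed in range(num_hashes)]


def estimate_jaccard(sig_a, sig_b):
    """The fraction of matching positions estimates the Jaccard similarity."""
    return sum(a == b for a, b in zip(sig_a, sig_b)) / len(sig_a)


def exact_jaccard(a, b):
    a, b = set(a), set(b)
    return len(a & b) / len(a | b)


# Hypothetical substructure strings standing in for two molecules.
mol_a = {"CC", "CCO", "C=O", "CN", "c1ccccc1", "CCN"}
mol_b = {"CC", "CCO", "C=O", "CO", "c1ccncc1", "CCN"}

print("exact Jaccard:    ", exact_jaccard(mol_a, mol_b))
print("MinHash estimate: ",
      estimate_jaccard(minhash_signature(mol_a), minhash_signature(mol_b)))
```

Since two signatures agree at a given position with probability equal to the Jaccard similarity of the underlying sets, fixed-length signatures can stand in for the sets themselves, which is the property that locality sensitive hashing schemes build on.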

"We were here before the Web and hype…": a brief history of and tribute to the Computational Chemistry List.

J Cheminform 2018 Dec 18;10(1):67. Epub 2018 Dec 18.

Archives Poincaré - Philosophie et Recherches sur les Sciences et les Technologies, UMR 7117 CNRS & Université de Lorraine, Nancy, France.

The Computational Chemistry List is a mailing list, portal, and community which brings together people interested in computational chemistry, mostly practitioners. It was formed in 1991 and continues to exist as a vibrant discussion space, highly valued by its members, and serving both its original and new functions. Its duration has been unusual for online communities.

DOI: http://dx.doi.org/10.1186/s13321-018-0322-7

A neural network approach to chemical and gene/protein entity recognition in patents.

J Cheminform 2018 Dec 18;10(1):65. Epub 2018 Dec 18.

College of Computer Science and Technology, Dalian University of Technology, Dalian, China.

In biomedical research, patents contain a significant amount of information, and biomedical text mining of patents has recently received much attention. To accelerate the development of biomedical text mining for patents, the BioCreative V.5 challenge organized three tracks, i.

DOI: http://dx.doi.org/10.1186/s13321-018-0318-3

Statistical principle-based approach for gene and protein related object recognition.

J Cheminform 2018 Dec 17;10(1):64. Epub 2018 Dec 17.

Intelligent Information Service Research Laboratory, Department of Computer Science and Information Engineering, National Central University, Taoyuan, Taiwan.

The large number of chemical and pharmaceutical patents has attracted researchers doing biomedical text mining to extract valuable information such as chemicals, genes and gene products. To facilitate gene and gene product annotations in patents, BioCreative V.5 organized a gene- and protein-related object (GPRO) recognition task, in which participants were assigned to identify GPRO mentions and determine whether they could be linked to their unique biological database records.

DOI: http://dx.doi.org/10.1186/s13321-018-0314-7

JPlogP: an improved logP predictor trained using predicted data.

J Cheminform 2018 Dec 14;10(1):61. Epub 2018 Dec 14.

Lhasa Limited, Granary Wharf House, 2 Canal Wharf, Leeds, LS11 5PS, UK.

The partition coefficient between octanol and water (logP) has been an important descriptor in QSAR predictions for many years and therefore the prediction of logP has been examined countless times. One of the best-performing approaches is to predict logP using multiple methods and average the results. We have used those averaged predictions to develop a training set which was able to distil the information present across the disparate logP methods into one single model.

DOI: http://dx.doi.org/10.1186/s13321-018-0316-5

SIA: a scalable interoperable annotation server for biomedical named entities.

J Cheminform 2018 Dec 14;10(1):63. Epub 2018 Dec 14.

DFKI Language Technology Lab, Alt-Moabit 91c, Berlin, Germany.

Recent years have seen strong growth in the biomedical sciences and an accompanying increase in publication volume. Extraction of specific information from these sources requires highly sophisticated text mining and information extraction tools. However, the integration of freely available tools into customized workflows is often cumbersome and difficult.

DOI: http://dx.doi.org/10.1186/s13321-018-0319-2

Chaos-embedded particle swarm optimization approach for protein-ligand docking and virtual screening.

J Cheminform 2018 Dec 14;10(1):62. Epub 2018 Dec 14.

Department of Computer and Information Science, University of Macau, Avenida da Universidade, Taipa, Macau, China.

Background: Protein-ligand docking programs are routinely used in structure-based drug design to find the optimal binding pose of a ligand in the protein's active site. These programs are also used to identify potential drug candidates by ranking large sets of compounds. As more accurate and efficient docking programs are always desirable, constant efforts focus on developing better docking algorithms or improving the scoring function. Read More

http://dx.doi.org/10.1186/s13321-018-0320-9
December 2018

Chemlistem: chemical named entity recognition using recurrent neural networks.

J Cheminform 2018 Dec 6;10(1):59. Epub 2018 Dec 6.

Data Science Group, Technology Department, The Royal Society of Chemistry, Cambridge, UK.

Chemical named entity recognition (NER) has traditionally been dominated by conditional random field (CRF)-based approaches, but given the success of the artificial neural network techniques known as "deep learning", we decided to examine them as an alternative to CRFs. We present here several chemical named entity recognition systems. The first system translates the traditional CRF-based idioms into a deep learning framework, using rich per-token features and neural word embeddings, and producing a sequence of tags using bidirectional long short-term memory (LSTM) networks, a type of recurrent neural net.

http://dx.doi.org/10.1186/s13321-018-0313-8
December 2018

A new semi-automated workflow for chemical data retrieval and quality checking for modeling applications.

J Cheminform 2018 Dec 10;10(1):60. Epub 2018 Dec 10.

Laboratory of Environmental Chemistry and Toxicology, Department of Environmental Health Sciences, Istituto di Ricerche Farmacologiche Mario Negri IRCCS, Via la Masa 19, 20156, Milan, Italy.

The quality of data used for QSAR model derivation is extremely important, as it strongly affects the final robustness and predictive power of the model. Ambiguous or wrong structures need to be carefully checked, because they lead to errors in the calculation of descriptors and hence to meaningless results. The increasing amounts of data, however, have often made it hard to check very large databases manually.

http://dx.doi.org/10.1186/s13321-018-0315-6
December 2018

MER: a shell script and annotation server for minimal named entity recognition and linking.

J Cheminform 2018 Dec 5;10(1):58. Epub 2018 Dec 5.

LASIGE, Faculdade de Ciências, Universidade de Lisboa, 1749 016, Lisbon, Portugal.

Named-entity recognition aims to identify the fragments of text that mention entities of interest, which can afterwards be linked to a knowledge base where those entities are described. This manuscript presents our minimal named-entity recognition and linking tool (MER), designed with flexibility, autonomy and efficiency in mind. To annotate a given text, MER only requires: (1) a lexicon (text file) with the list of terms representing the entities of interest; (2) optionally, a tab-separated values file with a link for each term; and (3) a Unix shell.

http://dx.doi.org/10.1186/s13321-018-0312-9
December 2018
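
The lexicon-plus-links idea in the abstract above can be illustrated with a short sketch. It is written in Python for readability and is not MER itself, which is a shell script; the function name `annotate`, the example terms, and the example.org link are all hypothetical.

```python
import re


def annotate(text, lexicon, links=None):
    """Return (start, end, matched_text, link) tuples for lexicon terms found in text."""
    links = links or {}
    annotations = []
    for term in lexicon:
        # Whole-word, case-insensitive match; re.escape guards special characters.
        pattern = r"\b" + re.escape(term) + r"\b"
        for match in re.finditer(pattern, text, flags=re.IGNORECASE):
            annotations.append(
                (match.start(), match.end(), match.group(0), links.get(term))
            )
    # Sort by position only, so a missing link (None) is never compared.
    return sorted(annotations, key=lambda a: (a[0], a[1]))


# Hypothetical lexicon and link table (a real setup would read them from files).
lexicon = ["aspirin", "caffeine"]
links = {"aspirin": "http://example.org/aspirin"}
result = annotate("Aspirin and caffeine are common.", lexicon, links)
print(result)
```

Terms without an entry in the link table are still recognized; their link field is simply `None`, mirroring the optional nature of the links file.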

chemmodlab: a cheminformatics modeling laboratory R package for fitting and assessing machine learning models.

J Cheminform 2018 Nov 28;10(1):57. Epub 2018 Nov 28.

Department of Statistics, North Carolina State University, 2311 Stinson Drive, Campus Box 8203, Raleigh, NC, 27695-8203, USA.

The goal of chemmodlab is to streamline the fitting and assessment pipeline for many machine learning models in R, making it easy for researchers to compare the utility of these models. While focused on implementing methods for model fitting and assessment that have been accepted by experts in the cheminformatics field, all of the methods in chemmodlab have broad utility for the machine learning community. chemmodlab contains several assessment utilities, including a plotting function that constructs accumulation curves and a function that computes many performance measures.

http://dx.doi.org/10.1186/s13321-018-0309-4
November 2018

Statistical-based database fingerprint: chemical space dependent representation of compound databases.

J Cheminform 2018 Nov 22;10(1):55. Epub 2018 Nov 22.

Department of Pharmacy, School of Chemistry, Universidad Nacional Autónoma de México, Avenida Universidad 3000, 04510, Mexico City, Mexico.

Background: Simplified representation of compound databases has several applications in cheminformatics. Herein, we introduce an alternative and general method to build single fingerprint representations of compound databases. The approach is inspired by the previously published modal fingerprints, which aim to capture the most significant bits of a fingerprint representation for a compound data set.

http://dx.doi.org/10.1186/s13321-018-0311-x
November 2018
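
The modal fingerprint mentioned in the abstract above can be illustrated with a short sketch. It assumes the usual threshold-based reading, in which a bit is kept when it is set in at least a chosen fraction of the compounds; the function name, the 0.5 threshold, and the toy fingerprints are illustrative assumptions, and this is not the statistical-based database fingerprint the paper introduces.

```python
def modal_fingerprint(fingerprints, threshold=0.5):
    """Keep each bit that is set in at least `threshold` of the fingerprints."""
    n = len(fingerprints)
    if n == 0:
        raise ValueError("need at least one fingerprint")
    length = len(fingerprints[0])
    if any(len(fp) != length for fp in fingerprints):
        raise ValueError("fingerprints must have equal length")
    # Count how many compounds have each bit set.
    counts = [sum(fp[i] for fp in fingerprints) for i in range(length)]
    return [1 if count / n >= threshold else 0 for count in counts]


# Toy 4-bit fingerprints for four hypothetical compounds.
database = [
    [1, 1, 0, 0],
    [1, 0, 0, 1],
    [1, 1, 0, 0],
    [1, 0, 1, 0],
]
modal = modal_fingerprint(database, threshold=0.5)
print(modal)
```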

Implicit-descriptor ligand-based virtual screening by means of collaborative filtering.

J Cheminform 2018 Nov 22;10(1):56. Epub 2018 Nov 22.

Department of Computer Science and Engineering, Bobby B. Lyle School of Engineering, Southern Methodist University, 3145 Dyer Street, Dallas, TX, 75205, USA.

Current ligand-based machine learning methods in virtual screening rely heavily on molecular fingerprinting for preprocessing, i.e., explicit description of ligands' structural and physicochemical properties in a vectorized form.

http://dx.doi.org/10.1186/s13321-018-0310-y
November 2018

Improved understanding of aqueous solubility modeling through topological data analysis.

J Cheminform 2018 Nov 20;10(1):54. Epub 2018 Nov 20.

Mathematical Sciences, University of Southampton, Southampton, UK.

Topological data analysis (TDA) is a family of recent mathematical techniques that seek to understand the 'shape' of data, and it has been used to understand the structure of the descriptor space produced by standard chemical informatics software from the point of view of solubility. We have used the mapper algorithm, a TDA method that creates low-dimensional representations of data, to create a network visualization of the solubility space. While descriptors with clear chemical implications are prominent features in this space, reflecting their importance to the chemical properties, an unexpected and interesting correlation between chlorine content and rings, and its implication for solubility prediction, is revealed.

http://dx.doi.org/10.1186/s13321-018-0308-5
November 2018

Cheminformatics-based enumeration and analysis of large libraries of macrolide scaffolds.

J Cheminform 2018 Nov 12;10(1):53. Epub 2018 Nov 12.

Department of Chemistry, North Carolina State University, Raleigh, NC, USA.

We report on the development of a cheminformatics enumeration technology and the analysis of a resulting large dataset of virtual macrolide scaffolds. Although macrolides have been shown to have valuable biological properties, there is no ready-to-screen virtual library of diverse macrolides in the public domain. Conducting molecular modeling (especially virtual screening) of these complex molecules is highly relevant as the organic synthesis of these compounds, when feasible, typically requires many synthetic steps, and thus dramatically slows the discovery of new bioactive macrolides.

http://dx.doi.org/10.1186/s13321-018-0307-6
November 2018

An automated framework for NMR chemical shift calculations of small organic molecules.

J Cheminform 2018 Oct 26;10(1):52. Epub 2018 Oct 26.

The Gene and Linda Voiland School of Chemical Engineering and Bioengineering, Washington State University, Pullman, WA, USA.

When using nuclear magnetic resonance (NMR) to assist in chemical identification in complex samples, researchers commonly rely on databases of chemical shift spectra. However, building these libraries experimentally typically depends on authentic standards. For complex biological samples, such as blood and soil, it would be infeasible to acquire the NMR spectra required for all possible compounds, given the limited availability of standards and the experimental processing time.

http://dx.doi.org/10.1186/s13321-018-0305-8
October 2018

Choquet integral-based fuzzy molecular characterizations: when global definitions are computed from the dependency among atom/bond contributions (LOVIs/LOEIs).

J Cheminform 2018 Oct 25;10(1):51. Epub 2018 Oct 25.

Grupo de Química Cuántica y Teórica, Facultad de Ciencias Exactas y Naturales, Programa de Química, Universidad de Cartagena, Campus de San Pablo, Cartagena, Colombia.

Background: Several topological (2D) and geometric (3D) molecular descriptors (MDs) are calculated from local vertex/edge invariants (LOVIs/LOEIs) by performing an aggregation process. To this end, norm-, mean- and statistic-based (non-fuzzy) operators are used, under the assumption that LOVIs/LOEIs are independent of (orthogonal to) one another. These operators are based on additive and/or linear measures and, consequently, they cannot be used to encode information from interrelated criteria.

http://dx.doi.org/10.1186/s13321-018-0306-7
October 2018

A new chemoinformatics approach with improved strategies for effective predictions of potential drugs.

J Cheminform 2018 Oct 11;10(1):50. Epub 2018 Oct 11.

National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD, 20894, USA.

Background: Fast and accurate identification of potential drug candidates against therapeutic targets (i.e., drug-target interactions, DTIs) is a fundamental step in the early drug discovery process.

http://dx.doi.org/10.1186/s13321-018-0303-x
October 2018

Evaluating parameters for ligand-based modeling with random forest on sparse data sets.

J Cheminform 2018 Oct 11;10(1):49. Epub 2018 Oct 11.

Department of Pharmaceutical Biosciences, Uppsala University, Uppsala, Sweden.

Ligand-based predictive modeling is widely used to generate predictive models that aid decision making in, for example, drug discovery projects.

http://dx.doi.org/10.1186/s13321-018-0304-9
October 2018

Life beyond the Tanimoto coefficient: similarity measures for interaction fingerprints.

J Cheminform 2018 Oct 4;10(1):48. Epub 2018 Oct 4.

Plasma Chemistry Research Group, Research Centre for Natural Sciences, Hungarian Academy of Sciences, Magyar tudósok krt. 2, Budapest, 1117, Hungary.

Background: Interaction fingerprints (IFPs) have been repeatedly shown to be valuable tools in virtual screening to identify novel hit compounds that can subsequently be optimized into drug candidates. As a complementary method to ligand docking, IFPs can be applied to quantify the similarity of predicted binding poses to a reference binding pose. For this purpose, a large number of similarity metrics can be applied, and various parameters of the IFPs themselves can be customized.

http://dx.doi.org/10.1186/s13321-018-0302-y
October 2018
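
As a point of reference for the abstract above, the Tanimoto coefficient named in the title can be computed for two binary interaction fingerprints as the number of bits set in both divided by the number of bits set in either. The sketch below is illustrative only: the 8-bit fingerprints are invented, and returning 0.0 for two all-zero fingerprints is a convention assumed here, not something taken from the paper.

```python
def tanimoto(fp_a, fp_b):
    """Tanimoto coefficient of two equal-length binary fingerprints."""
    if len(fp_a) != len(fp_b):
        raise ValueError("fingerprints must have equal length")
    both = sum(1 for a, b in zip(fp_a, fp_b) if a and b)
    either = sum(1 for a, b in zip(fp_a, fp_b) if a or b)
    # Assumed convention: two empty fingerprints score 0.0.
    return both / either if either else 0.0


# Invented interaction fingerprints for a reference pose and a docked pose.
reference_pose = [1, 1, 0, 1, 0, 0, 1, 0]
docked_pose = [1, 0, 0, 1, 0, 1, 1, 0]
score = tanimoto(reference_pose, docked_pose)
print(score)
```

Here three interactions are shared and five are present in at least one pose, so the score is 3/5.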