Publications by authors named "Xiangxiang Zeng"

69 Publications

Deep learning methods for biomedical named entity recognition: a survey and qualitative comparison.

Brief Bioinform 2021 Jul 23. Epub 2021 Jul 23.

College of Information Science and Engineering, Hunan University, 2 Lushan S Rd, Yuelu District, 410086, Changsha, China.

The biomedical literature is growing rapidly, and extracting meaningful information from it is increasingly important. Biomedical named entity (BioNE) identification is one of the critical and fundamental tasks in biomedical text mining, and accurate identification of entities in the literature improves the performance of downstream tasks. Because end-to-end neural networks can extract features automatically, several deep learning-based methods have been proposed for BioNE recognition (BioNER), yielding state-of-the-art performance. In this review, we comprehensively summarize deep learning-based methods for BioNER and the datasets used for training and testing. The deep learning methods are classified into four categories: single neural network-based, multitask learning-based, transfer learning-based and hybrid model-based methods. They can be applied to BioNER in multiple domains, and their performance depends on the size and type of the dataset. Lastly, we discuss future developments and opportunities for BioNER methods.

Source
DOI: http://dx.doi.org/10.1093/bib/bbab282
July 2021

3DMol-Net: Learn 3D Molecular Representation using Adaptive Graph Convolutional Network Based on Rotation Invariance.

IEEE J Biomed Health Inform 2021 Jun 14;PP. Epub 2021 Jun 14.

Learning molecular representations with deep learning is important for predicting molecular properties, which supports drug screening and new drug discovery and can ultimately improve human well-being by helping to prevent illness. Learning a characterization of a drug is essential for various downstream tasks, such as molecular property prediction. In particular, the 3D structural features of molecules play an important role in predicting biochemical function and activity: the 3D characteristics of a molecule largely determine the properties of the drug and its binding to the target. However, most current methods rely only on 1D or 2D properties and ignore the 3D topological structure, which degrades the performance of molecular inference. In this paper, we propose 3DMol-Net to enhance the molecular representation by considering both the topology and the rotation invariance (RI) of the 3D molecular structure. Specifically, we construct a molecular graph with soft relations derived from the spatial arrangement of the 3D coordinates to learn the 3D topology of an arbitrary graph structure, and we employ an adaptive graph convolutional network to predict molecular properties and biochemical activities. Compared with current graph-based methods, 3DMol-Net demonstrates superior performance on both regression and classification tasks. Further verification of RI and visualization also show the better robustness and representation capacity of our model.
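As an illustration of why relations built from interatomic distances support rotation invariance, the following minimal Python sketch (not 3DMol-Net's implementation) rotates a set of 3D coordinates and checks that the pairwise distances do not change. The atom coordinates and the helper names `rotate_z` and `pairwise_distances` are hypothetical, and only a rotation about the z-axis is shown for brevity.

```python
import math


def rotate_z(coords, theta):
    """Rotate 3D points about the z-axis by theta radians."""
    c, s = math.cos(theta), math.sin(theta)
    return [(c * x - s * y, s * x + c * y, z) for x, y, z in coords]


def pairwise_distances(coords):
    """Matrix of Euclidean distances between every pair of points."""
    return [[math.dist(p, q) for q in coords] for p in coords]


# Hypothetical coordinates (in angstroms) for a three-atom fragment.
atoms = [(0.0, 0.0, 0.0), (1.09, 0.0, 0.0), (-0.36, 1.03, 0.0)]
rotated = rotate_z(atoms, math.pi / 3)

before = pairwise_distances(atoms)
after = pairwise_distances(rotated)

# Raw coordinates change under rotation, but the distance matrix does not.
max_diff = max(
    abs(b - a) for row_b, row_a in zip(before, after) for b, a in zip(row_b, row_a)
)
print(f"max distance change after rotation: {max_diff:.2e}")
```

Any feature computed only from such distances inherits the same invariance; raw coordinates fed directly into a network do not.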

Source
DOI: http://dx.doi.org/10.1109/JBHI.2021.3089162
June 2021

A novel antibacterial peptide recognition algorithm based on BERT.

Brief Bioinform 2021 May 26. Epub 2021 May 26.

Xiamen University, Xiamen 361005, China.

As the best substitute for antibiotics, antimicrobial peptides (AMPs) are of great research significance. Because experimental methods for identifying AMPs are costly and difficult, a growing body of research uses computational methods to address this problem. Most existing computational methods can identify AMPs from the sequence alone, but their recognition accuracy leaves room for improvement, and models built on one dataset do not generalize well to others. Pre-training strategies have been applied to many natural language processing (NLP) tasks with impressive results, and they also hold great promise for AMP recognition and prediction. In this paper, we apply a pre-training strategy to the training of AMP classifiers and propose a novel recognition algorithm. Our model is based on BERT, pre-trained on protein data from UniProt, and then fine-tuned and evaluated on six markedly different AMP datasets. Our model outperforms existing methods and accurately identifies AMPs even on datasets with small sample sizes. We try different word segmentation methods for peptide chains and demonstrate the influence of pre-training steps and of dataset balancing on recognition performance. We find that pre-training on a large amount of diverse AMP data, followed by fine-tuning on new data, helps capture both the specific features of the new data and the features common to AMP sequences. Finally, we construct a new AMP dataset, on which we train a general AMP recognition model.
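The abstract does not say which word segmentation methods were compared. One common choice for peptide language models is k-mer segmentation, so the hypothetical sketch below assumes it: a sequence is split into overlapping (stride 1) or non-overlapping (stride k) k-mers. The function name `kmer_tokens` and the example sequence are illustrative only.

```python
def kmer_tokens(seq, k, stride=1):
    """Split a peptide into k-mers; stride=1 overlaps, stride=k does not."""
    if k <= 0 or stride <= 0:
        raise ValueError("k and stride must be positive")
    # Trailing residues that do not fill a complete k-mer are dropped.
    return [seq[i:i + k] for i in range(0, len(seq) - k + 1, stride)]


peptide = "GIGKFLHSAKKF"  # hypothetical example sequence
overlapping = kmer_tokens(peptide, 3)
non_overlapping = kmer_tokens(peptide, 3, stride=3)
print(overlapping)
print(non_overlapping)
```

Overlapping segmentation yields more tokens per sequence and a smoother context for each residue; non-overlapping segmentation yields shorter inputs.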

Source
DOI: http://dx.doi.org/10.1093/bib/bbab200
May 2021

CarSite-II: an integrated classification algorithm for identifying carbonylated sites based on K-means similarity-based undersampling and synthetic minority oversampling techniques.

BMC Bioinformatics 2021 Apr 26;22(1):216. Epub 2021 Apr 26.

Department of Computer Science, Xiamen University, Xiamen, 361005, China.

Background: Carbonylation is a non-enzymatic, irreversible protein post-translational modification in which the side chains of amino acid residues are attacked by reactive oxygen species and ultimately converted into carbonyl products. Studies have shown that protein carbonylation caused by reactive oxygen species is involved in the etiology and pathophysiology of aging, neurodegenerative diseases, inflammation, diabetes, amyotrophic lateral sclerosis, Huntington's disease, and tumors. Current experimental approaches for detecting carbonylation sites are expensive, time-consuming, and limited in their protein-processing capacity. Computational prediction of carbonylated residues enhances the functional characterization of proteins.

Results: In this study, an integrated classification algorithm, CarSite-II, was developed to identify carbonylated K, P, R, and T sites. A resampling method combining K-means similarity-based undersampling with the synthetic minority oversampling technique (SMOTE-KSU) was used to balance the proportions of carbonylated K, P, R, and T training samples. Next, the integrated classifier system Rotation Forest, with support vector machines as subclassifiers, divides three types of feature spaces into several subsets. With tenfold cross-validation, CarSite-II achieved Matthews correlation coefficient (MCC) values of 0.2287/0.3125/0.2787/0.2814, false positive rates of 0.2628/0.1084/0.1383/0.1313, and false negative rates of 0.2252/0.0205/0.0976/0.0608 for K/P/R/T carbonylation sites, respectively. On our independent test dataset, CarSite-II yielded MCC values of 0.6358/0.2910/0.4629/0.3685, false positive rates of 0.0165/0.0203/0.0188/0.0094, and false negative rates of 0.1026/0.1875/0.2037/0.3333 for K/P/R/T carbonylation sites. These results show that CarSite-II performs markedly better than all currently available prediction tools.

Conclusion: These results show that CarSite-II outperforms five currently available programs and demonstrate the usefulness of the SMOTE-KSU resampling approach and the integration algorithm. For the convenience of experimental scientists, the CarSite-II web tool is available at http://47.100.136.41:8081/.
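The MCC, false positive rate and false negative rate reported above are standard confusion-matrix metrics. This minimal sketch computes them from hypothetical counts (not CarSite-II data), using MCC = (TP*TN - FP*FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN)), FPR = FP/(FP+TN) and FNR = FN/(FN+TP). The function name `confusion_metrics` is illustrative.

```python
import math


def confusion_metrics(tp, fp, tn, fn):
    """Return (MCC, false positive rate, false negative rate) from confusion counts."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # By convention, MCC is reported as 0 when any marginal total is zero.
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    fpr = fp / (fp + tn) if (fp + tn) else 0.0
    fnr = fn / (fn + tp) if (fn + tp) else 0.0
    return mcc, fpr, fnr


# Hypothetical counts, not taken from the CarSite-II evaluation.
mcc, fpr, fnr = confusion_metrics(tp=40, fp=10, tn=90, fn=10)
print(f"MCC={mcc:.3f} FPR={fpr:.3f} FNR={fnr:.3f}")
```

MCC is informative on imbalanced site data because it uses all four cells of the confusion matrix, which is one reason resampling and MCC are often reported together.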

Source
DOI: http://dx.doi.org/10.1186/s12859-021-04134-3
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC8077735
April 2021

ADMETlab 2.0: an integrated online platform for accurate and comprehensive predictions of ADMET properties.

Nucleic Acids Res 2021 07;49(W1):W5-W14

Xiangya School of Pharmaceutical Sciences, Central South University, Changsha 410013, Hunan, China.

Because undesirable pharmacokinetics and toxicity of candidate compounds are the main reasons drug development fails, it is widely recognized that absorption, distribution, metabolism, excretion and toxicity (ADMET) should be evaluated as early as possible. In silico ADMET evaluation models have been developed as an additional tool to help medicinal chemists design and optimize leads. Here, we announce the release of ADMETlab 2.0, a completely redesigned version of the widely used ADMETlab web server for predicting the pharmacokinetic and toxicity properties of chemicals. It supports approximately twice as many ADMET-related endpoints as the previous version, including 17 physicochemical properties, 13 medicinal chemistry properties, 23 ADME properties, 27 toxicity endpoints and 8 toxicophore rules (751 substructures). A multi-task graph attention framework was employed to develop robust and accurate models in ADMETlab 2.0. A batch computation module was added in response to numerous user requests, and the presentation of results was further optimized. The ADMETlab 2.0 server is freely available, without registration, at https://admetmesh.scbdd.com/.

Source
DOI: http://dx.doi.org/10.1093/nar/gkab255
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC8262709
July 2021

A spatial-temporal gated attention module for molecular property prediction based on molecular geometry.

Brief Bioinform 2021 Apr 5. Epub 2021 Apr 5.

College of Computer Science and Electronic Engineering, Hunan University, Changsha 410205, China.

Motivation: The geometry-based properties and characteristics of drug molecules play an important role in virtual screening for drug development in computational chemistry. The 3D characteristics of a molecule largely determine the properties of the drug and its binding to the target. However, most previous studies focused on 1D or 2D molecular descriptors and ignored the 3D topological structure, which degrades the performance of molecule-related prediction. Because simulating 3D molecular conformers with molecular dynamics is very time-consuming, we aim to use machine learning to represent 3D molecules using 3D coordinates generated from the 2D structure.

Results: We propose Drug3D-Net, a novel deep neural network architecture based on the spatial geometric structure of molecules for predicting molecular properties. It is a grid-based 3D convolutional neural network with a spatial-temporal gated attention module, which extracts geometric features for molecular prediction tasks during convolution. The effectiveness of Drug3D-Net is verified on public molecular datasets. Compared with other deep learning methods, Drug3D-Net shows superior performance in predicting molecular properties and biochemical activities.

Availability And Implementation: https://github.com/anny0316/Drug3D-Net.

Supplementary Data: Supplementary data are available online at https://academic.oup.com/bib.

Source
DOI: http://dx.doi.org/10.1093/bib/bbab078
April 2021

MUFFIN: Multi-Scale Feature Fusion for Drug-Drug Interaction Prediction.

Bioinformatics 2021 Mar 15. Epub 2021 Mar 15.

School of Computer Science and Engineering, Hunan University, Changsha, 410012, China.

Motivation: Adverse drug-drug interactions (DDIs) are a crucial concern in drug research because they are a major cause of morbidity and mortality. Identifying potential DDIs is therefore essential for doctors, patients, and society. Traditional machine learning models rely heavily on handcrafted features and generalize poorly. Recently, deep learning approaches that automatically learn drug features from the molecular graph or from drug-related networks have improved the ability of computational models to predict unknown DDIs. However, previous works required large amounts of labeled data and either considered only the structure or sequence information of drugs, ignoring the relations and topological information between drugs and other biomedical objects (e.g., genes, diseases, and pathways), or considered a knowledge graph (KG) while ignoring information from the drug molecular structure.

Results: Accordingly, to explore the joint effect of drug molecular structure and the semantic information of drugs in a knowledge graph on DDI prediction, we propose a multi-scale feature fusion deep learning model named MUFFIN. MUFFIN jointly learns drug representations from both the drug's own structural information and a KG with rich biomedical information. In MUFFIN, we designed a bi-level cross strategy comprising cross- and scalar-level components to fuse multi-modal features effectively. By crossing the features learned from a large-scale KG and the drug molecular graph, MUFFIN alleviates the limitation that scarce labeled data imposes on deep learning models. We evaluated our approach on three datasets and three tasks: binary-class, multi-class, and multi-label DDI prediction. The results show that MUFFIN outperforms other state-of-the-art baselines.

Availability: The source code and data are available at https://github.com/xzenglab/MUFFIN.

Supplementary Information: Supplementary data are available at Bioinformatics online.

Source
DOI: http://dx.doi.org/10.1093/bioinformatics/btab169
March 2021

ITP-Pred: an interpretable method for predicting, therapeutic peptides with fused features low-dimension representation.

Brief Bioinform 2021 Jul;22(4)

University of Electronic Science and Technology of China.

The peptide therapeutics market is creating new opportunities for the biotechnology and pharmaceutical industries, so identifying therapeutic peptides and exploring their properties is important. Although several studies have proposed machine learning methods to predict whether peptides are therapeutic, most do not explain the model's decision factors in detail. In this work, we developed an Interpretable Therapeutic Peptide Prediction (ITP-Pred) model based on efficient feature fusion. First, we proposed three kinds of feature descriptors that encode sequence and physicochemical properties, namely amino acid composition (AAC), group AAC and coding autocorrelation, and concatenated them to obtain the feature representation of each therapeutic peptide. We then fed this representation into a CNN-bidirectional long short-term memory (BiLSTM) model that learns to recognize therapeutic peptides. The results of cross-validation and independent validation experiments indicated that ITP-Pred achieves higher prediction performance on the benchmark dataset than the other methods compared. Finally, we analyzed the model's output from two aspects, sequence order and physicochemical properties, mining important features to guide the design of better models that can complement existing methods.
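Of the three descriptors named above, amino acid composition (AAC) is the simplest: the relative frequency of each of the 20 standard residues. The sketch below assumes that standard definition; the grouping used for group AAC and the exact autocorrelation encoding are not given in the abstract, so they are not shown. The function name and toy peptide are hypothetical.

```python
# The 20 standard amino acids, in alphabetical order of one-letter codes.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def amino_acid_composition(seq):
    """Return the 20-dimensional AAC vector: frequency of each standard residue."""
    seq = seq.upper()
    counts = [seq.count(aa) for aa in AMINO_ACIDS]
    total = sum(counts)  # non-standard symbols are ignored
    if total == 0:
        raise ValueError("sequence contains no standard amino acids")
    return [c / total for c in counts]


vec = amino_acid_composition("ACDAA")  # hypothetical toy peptide
print(dict(zip(AMINO_ACIDS, vec)))
```

Fixed-length vectors like this one can be concatenated with other descriptors before being passed to a downstream model, which is the fusion step the abstract describes.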

Source
DOI: http://dx.doi.org/10.1093/bib/bbaa367
July 2021

A Robust Algorithm Based on Link Label Propagation for Identifying Functional Modules from Protein-protein Interaction Networks.

IEEE/ACM Trans Comput Biol Bioinform 2020 Nov 19;PP. Epub 2020 Nov 19.

Identifying functional modules in protein-protein interaction (PPI) networks helps elucidate cellular organization and mechanisms. Various methods have been proposed to identify functional modules in PPI networks, but most do not account for noisy links. They perform competitively on PPI networks without noisy links, but their performance deteriorates considerably on noisy PPI networks, and noisy links are inevitable in PPI networks. In this paper, we propose a novel link-driven label propagation algorithm (LLPA) to identify functional modules in PPI networks. LLPA first finds link clusters in a PPI network and then identifies functional modules from those link clusters. Two strategies are proposed to ensure the robustness of LLPA. First, LLPA updates link labels according to a designed link weight, which reduces the effect of noisy links. Second, some noisy labels are filtered out of the link clusters to further reduce the influence of noisy links. A performance evaluation on three real PPI networks shows that LLPA outperforms eight other state-of-the-art detection algorithms in terms of accuracy and robustness.

Source
DOI: http://dx.doi.org/10.1109/TCBB.2020.3038815
November 2020

iEnhancer-XG: interpretable sequence-based enhancers and their strength predictor.

Bioinformatics 2021 05;37(8):1060-1067

College of Computer Science and Electronic Engineering, Hunan University, 410082 Changsha, Hunan, China.

Motivation: Enhancers are non-coding DNA fragments with high positional variability and free scattering. They play an important role in controlling gene expression. As machine learning has become more widely used to identify enhancers, a number of bioinformatic tools have been developed. Although several models for identifying enhancers and their strength have been proposed, their accuracy and efficiency still need improvement.

Results: We propose a two-layer predictor called 'iEnhancer-XG'. It comprises a first-layer predictor (for identifying enhancers) and a second-layer classifier (for predicting their strength), uses XGBoost as the base classifier, and uses five feature extraction methods, namely k-Spectrum Profile, Mismatch k-tuple, Subsequence Profile, Position-specific scoring matrix (PSSM) and Pseudo dinucleotide composition (PseDNC). Each method produces an independent output, and we feed the resulting feature vector matrix into ensemble learning for fusion. We also apply SHapley Additive exPlanations (SHAP) to make these otherwise black-box machine learning methods interpretable and to improve their credibility. The ensemble learning method achieves accuracies of 0.811 (first layer) and 0.657 (second layer). Rigorous 10-fold cross-validation confirms that the proposed method is significantly better than existing approaches.

Availability And Implementation: The source code and dataset for the enhancer predictions have been uploaded to https://github.com/jimmyrate/ienhancer-xg.

Supplementary Information: Supplementary data are available at Bioinformatics online.
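The k-Spectrum Profile listed among iEnhancer-XG's features counts how often each length-k DNA word occurs in a sequence. This minimal sketch assumes raw counts over all 4^k words in lexicographic order and skips windows containing non-ACGT symbols; iEnhancer-XG may normalize or combine values differently, and the function name is hypothetical.

```python
from itertools import product


def k_spectrum(seq, k):
    """Count every length-k DNA word (4**k features, lexicographic order)."""
    words = ["".join(p) for p in product("ACGT", repeat=k)]
    index = {w: i for i, w in enumerate(words)}
    counts = [0] * len(words)
    for i in range(len(seq) - k + 1):
        word = seq[i:i + k].upper()
        if word in index:  # skip windows containing N or other symbols
            counts[index[word]] += 1
    return counts


profile = k_spectrum("ACGTAC", 2)  # 16 features: AA, AC, AG, ..., TT
print(profile)
```

Because the vector length is fixed at 4^k regardless of sequence length, profiles from enhancers of different lengths can be fed to the same tree-based classifier.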

Source
DOI: http://dx.doi.org/10.1093/bioinformatics/btaa914
May 2021

Minirmd: accurate and fast duplicate removal tool for short reads via multiple minimizers.

Bioinformatics 2021 07;37(11):1604-1606

College of Information Science and Engineering, Hunan University, Changsha, Hunan 410012, China.

Summary: Removing duplicate and near-duplicate reads generated by high-throughput sequencing technologies reduces the computational resources needed by downstream applications. Here we develop minirmd, a de novo tool that removes duplicate reads via multiple rounds of clustering using minimizers of different lengths. Experiments demonstrate that minirmd removes more near-duplicate reads than existing clustering approaches and is faster than existing multi-core tools. To the best of our knowledge, minirmd is the first tool to remove near-duplicates on the reverse-complementary strand.

Availability And Implementation: https://github.com/yuansliu/minirmd.

Supplementary Information: Supplementary data are available at Bioinformatics online.
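minirmd groups reads by their minimizers. The ordering, window size and reverse-complement handling that minirmd actually uses are not given in the summary, so this sketch assumes plain lexicographic order on forward-strand k-mers and a window of w consecutive k-mers, and computes minimizers for a single (k, w) pair. Running it with several k values would loosely mirror the "multiple rounds with different minimizer lengths" idea. The reads and function name are hypothetical.

```python
def minimizers(seq, k, w):
    """Return {(position, k-mer)}: the smallest k-mer (lexicographic order) in each
    window of w consecutive k-mers; ties go to the leftmost position."""
    if k <= 0 or w <= 0:
        raise ValueError("k and w must be positive")
    kmers = [seq[i:i + k] for i in range(len(seq) - k + 1)]
    picked = set()
    for start in range(len(kmers) - w + 1):
        window = kmers[start:start + w]
        # min() returns the first minimal index, so ties resolve leftmost.
        offset = min(range(w), key=lambda j: window[j])
        picked.add((start + offset, window[offset]))
    return picked


# Reads sharing minimizers become candidate (near-)duplicates for clustering.
read_a = "CATGAC"
read_b = "TATGAC"  # hypothetical read differing at the first base
shared = {m for _, m in minimizers(read_a, 2, 3)} & {m for _, m in minimizers(read_b, 2, 3)}
print(sorted(shared))
```

A single substitution changes only the k-mers that overlap it, so near-duplicate reads usually still share most minimizers; that is what makes them useful as cheap clustering keys.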

Source
DOI: http://dx.doi.org/10.1093/bioinformatics/btaa915
July 2021

Predicting enhancer-promoter interactions by deep learning and matching heuristic.

Brief Bioinform 2021 Jul;22(4)

School of Information Science and Engineering, Hunan University, Changsha, China.

Enhancer-promoter interactions (EPIs) play an important role in transcriptional regulation. Recently, machine learning-based methods have been widely used for genome-scale identification of EPIs because of their promising predictive performance. In this paper, we propose a novel method, termed EPI-DLMH, for predicting EPIs using DNA sequences only. EPI-DLMH consists of three major steps. First, a two-layer convolutional neural network learns local features, and a bidirectional gated recurrent unit network captures long-range dependencies in the promoter and enhancer sequences. Second, an attention mechanism focuses on relatively important features. Finally, a matching heuristic mechanism is introduced to explore the interaction between enhancers and promoters. We use benchmark datasets to evaluate the proposed method and compare it with existing methods. The comparative results show that our model outperforms existing models across multiple cell lines. In particular, we found that the matching heuristic mechanism accounts for most of the improvement in overall accuracy. Additionally, our model is computationally faster than existing models.

Source
DOI: http://dx.doi.org/10.1093/bib/bbaa254
July 2021

Repurpose Open Data to Discover Therapeutics for COVID-19 Using Deep Learning.

J Proteome Res 2020 11 24;19(11):4624-4636. Epub 2020 Jul 24.

Genomic Medicine Institute, Lerner Research Institute, Cleveland Clinic, Cleveland, Ohio 44106, United States.

There have been more than 2.2 million confirmed cases and over 120 000 deaths from the human coronavirus disease 2019 (COVID-19) pandemic, caused by the novel severe acute respiratory syndrome coronavirus (SARS-CoV-2), in the United States alone. However, there is currently a lack of proven effective medications against COVID-19. Drug repurposing offers a promising route for developing prevention and treatment strategies for COVID-19. This study reports an integrative, network-based deep-learning methodology, termed CoV-KGE, to identify repurposable drugs for COVID-19. Specifically, we built a comprehensive knowledge graph that includes 15 million edges across 39 types of relationships connecting drugs, diseases, proteins/genes, pathways, and expression, drawn from a large scientific corpus of 24 million PubMed publications. Using Amazon's AWS computing resources and a network-based deep-learning framework, we identified 41 repurposable drugs (including dexamethasone, indomethacin, niclosamide, and toremifene) whose therapeutic associations with COVID-19 were validated by transcriptomic and proteomic data in SARS-CoV-2-infected human cells and by data from ongoing clinical trials. Although this study by no means recommends specific drugs, it demonstrates a powerful deep-learning methodology for prioritizing existing drugs for further investigation, which holds the potential to accelerate therapeutic development for COVID-19.

Source
DOI: http://dx.doi.org/10.1021/acs.jproteome.0c00316
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC7384389
November 2020

Monodirectional Tissue P Systems With Promoters.

IEEE Trans Cybern 2021 Jan 22;51(1):438-450. Epub 2020 Dec 22.

Tissue P systems with promoters are nondeterministic, parallel, bioinspired computing devices that evolve by exchanging objects between regions, with the exchanges determined by the presence of special objects called promoters. However, in cellular biology, molecules are transported across a membrane from high to low concentration. Inspired by this biological fact, this article considers a type of tissue P system, called monodirectional tissue P systems with promoters, in which communication between two regions happens in only one direction. The results show that such P systems produce finite sets of numbers when they have one cell and symport rules of any length, or any number of cells and symport rules of maximal length 1, working in the maximally parallel mode. Monodirectional tissue P systems are Turing universal with two cells, a maximal length of 2, and at most one promoter per symport rule, working in the maximally parallel mode, or with three cells, a maximal length of 1, and at most one promoter per symport rule, working in the flat maximally parallel mode. We also prove that monodirectional tissue P systems with two cells, a maximal length of 1, and at most one promoter per symport rule (under certain restrictive conditions), working in the flat maximally parallel mode, characterize regular sets of natural numbers. Furthermore, the computational efficiency of monodirectional tissue P systems with promoters is analyzed when cell division rules are incorporated, and different uniform solutions to the Boolean satisfiability problem (SAT problem) are provided. These results show that, even under the restriction of monodirectionality, tissue P systems with promoters remain computationally powerful. Given this computational power, developing membrane algorithms for monodirectional tissue P systems with promoters is potentially worthwhile.

Source
DOI: http://dx.doi.org/10.1109/TCYB.2020.3003060
January 2021

Application of deep learning methods in biological networks.

Brief Bioinform 2021 Mar;22(2):1902-1917

The growth of biological data and the formation of various biomolecule interaction databases enable us to obtain diverse biological networks. These networks provide a wealth of raw material for further understanding biological systems, discovering complex diseases and searching for therapeutic drugs. However, the increase in data also makes biological network analysis more difficult. Algorithms that can handle large, heterogeneous and complex data are therefore needed to better analyze these network-structured data and mine useful information from them. Deep learning is a branch of machine learning that extracts more abstract features from larger sets of training data. By building artificial neural networks with a hierarchical structure, deep learning extracts and screens input information layer by layer and has representation learning ability. Improved deep learning algorithms can process complex and heterogeneous graph data structures and are increasingly being applied to mining network data. In this paper, we first introduce the deep learning models commonly used for network data. Afterwards, we summarize applications of deep learning to biological networks. Finally, we discuss future prospects for this field.

Source
DOI: http://dx.doi.org/10.1093/bib/bbaa043
March 2021

StackCPPred: a stacking and pairwise energy content-based prediction of cell-penetrating peptides and their uptake efficiency.

Bioinformatics 2020 05;36(10):3028-3034

Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology of China, Chengdu 610054, China.

Motivation: Cell-penetrating peptides (CPPs) are vehicles for transporting pharmacologically active molecules, such as short interfering RNAs, nanoparticles, plasmid DNAs and small peptides, into living cells, and thus offer great potential as future therapeutics. Existing experimental techniques for identifying CPPs are time-consuming and expensive, so predicting CPPs from peptide sequences with computational methods can help annotate candidates and guide experiments quickly. Many machine learning-based methods for identifying CPPs have recently emerged. Although considerable progress has been made, existing methods still have limited feature representation capability, which constrains further performance improvements.

Results: We propose a method called StackCPPred, which introduces three feature representations based on the pairwise energy content of residues: RECM-composition, PseRECM and RECM-DWT. These features are used to train stacking-based machine learning models to predict CPPs effectively. With jackknife validation on the CPP924 and CPPsite3 datasets, StackCPPred achieved 94.5% and 78.3% accuracy, respectively, which is 2.9% and 5.8% higher than the state-of-the-art CPP predictors. StackCPPred can be a powerful tool for predicting CPPs and their uptake efficiency, facilitating hypothesis-driven experimental design and accelerating their applications in clinical therapy.

Availability And Implementation: Source code and data can be downloaded from https://github.com/Excelsior511/StackCPPred.

Supplementary Information: Supplementary data are available at Bioinformatics online.

Source
DOI: http://dx.doi.org/10.1093/bioinformatics/btaa131
May 2020

A Polar-Metric-Based Evolutionary Algorithm.

IEEE Trans Cybern 2021 Jul 23;51(7):3429-3440. Epub 2021 Jun 23.

Over the past two decades, numerous multi- and many-objective evolutionary algorithms (MOEAs and MaOEAs) have been proposed to solve multi- and many-objective optimization problems (MOPs and MaOPs), respectively. It is known that maintaining convergence and diversity becomes rapidly more difficult as the number of objectives increases. This phenomenon is especially evident for Pareto-dominance-based EAs, because nondominated sorting often fails to provide enough convergence pressure toward the Pareto front (PF). Therefore, many researchers have proposed non-Pareto-dominance-based EAs, based on indicators, decomposition, and so on. In this article, we propose a polar-metric (p-metric)-based EA (PMEA) for tackling both MOPs and MaOPs. The p-metric is a recently proposed performance indicator that adopts a set of uniformly distributed direction vectors. PMEA uses a two-phase selection that combines nondominated sorting and the p-metric. Moreover, a modification is proposed to adjust the direction vectors of the p-metric dynamically. In the experiments, PMEA is compared with six state-of-the-art EAs and evaluated with three performance metrics, including the p-metric. The empirical results show that PMEA performs promisingly on most test problems, covering both MOPs and MaOPs.
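The first phase of PMEA's selection relies on Pareto dominance. This minimal sketch, assuming minimization of every objective, shows the dominance test and extraction of the first nondominated front only; the p-metric phase and PMEA's direction-vector adjustment are not reproduced, and the function names and points are hypothetical.

```python
def dominates(a, b):
    """True if objective vector a Pareto-dominates b (all objectives minimized)."""
    no_worse = all(x <= y for x, y in zip(a, b))
    strictly_better = any(x < y for x, y in zip(a, b))
    return no_worse and strictly_better


def nondominated(points):
    """Return the first front: points that no other point dominates."""
    # A point never dominates itself, so no identity check is needed.
    return [p for p in points if not any(dominates(q, p) for q in points)]


population = [(1, 5), (2, 2), (3, 1), (4, 4), (2, 3)]
front = nondominated(population)
print(front)
```

As the number of objectives grows, the share of mutually nondominated points rises quickly, which is the loss of convergence pressure the abstract describes and the reason PMEA adds a second, indicator-based phase.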

Source
DOI: http://dx.doi.org/10.1109/TCYB.2020.2965230
July 2021

Network-based prediction of drug-target interactions using an arbitrary-order proximity embedded deep forest.

Bioinformatics 2020 05;36(9):2805-2812

Genomic Medicine Institute, Lerner Research Institute, Cleveland Clinic, Cleveland, OH 44195, USA.

Motivation: Systematic identification of molecular targets among known drugs plays an essential role in drug repurposing and understanding of their unexpected side effects. Computational approaches for prediction of drug-target interactions (DTIs) are highly desired in comparison to traditional experimental assays. Furthermore, recent advances of multiomics technologies and systems biology approaches have generated large-scale heterogeneous, biological networks, which offer unexpected opportunities for network-based identification of new molecular targets among known drugs.

Results: In this study, we present a network-based computational framework, termed AOPEDF (arbitrary-order proximity embedded deep forest), for predicting DTIs. AOPEDF learns a low-dimensional vector representation of features that preserves arbitrary-order proximity from a highly integrated, heterogeneous biological network connecting drugs, targets (proteins) and diseases. We construct this heterogeneous network by integrating 15 networks covering chemical, genomic, phenotypic and network profiles among drugs, proteins/targets and diseases. We then build a cascade deep forest classifier to infer new DTIs. In systematic performance evaluation, AOPEDF achieves high accuracy in identifying molecular targets among known drugs on two external validation sets collected from the DrugCentral [area under the receiver operating characteristic curve (AUROC) = 0.868] and ChEMBL (AUROC = 0.768) databases, outperforming several state-of-the-art methods. In a case study, we show that multiple molecular targets predicted by AOPEDF are associated with the mechanism of action in substance abuse disorder for several marketed drugs (such as aripiprazole, risperidone and haloperidol).

Availability And Implementation: Source code and data can be downloaded from https://github.com/ChengF-Lab/AOPEDF.
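AOPEDF's embeddings preserve arbitrary-order proximity. One common way to express such proximity (an assumption here; the abstract gives no formula) is a weighted sum of adjacency-matrix powers, S = Σ_k w_k A^k, which a low-rank factorization would then compress into node embeddings. The sketch computes only S, in plain Python; the function names and weights are illustrative, not taken from the AOPEDF code.

```python
from typing import List

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain-Python matrix product (adequate for tiny illustrative graphs)."""
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]


def high_order_proximity(adj: Matrix, weights: List[float]) -> Matrix:
    """S = sum_k weights[k-1] * A^k for k = 1..len(weights).

    weights[k-1] sets how much k-hop connectivity contributes; choosing the
    weights is what makes the proximity order "arbitrary"."""
    n = len(adj)
    power = [row[:] for row in adj]          # A^1
    total = [[0.0] * n for _ in range(n)]
    for k, w in enumerate(weights, start=1):
        if k > 1:
            power = matmul(power, adj)       # A^k
        for i in range(n):
            for j in range(n):
                total[i][j] += w * power[i][j]
    return total
```

On the path graph 0-1-2 with weights `[1.0, 0.5]`, nodes 0 and 2 are not adjacent but receive proximity 0.5 through their shared neighbor, which is the kind of higher-order signal a first-order embedding would miss.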

Source
http://dx.doi.org/10.1093/bioinformatics/btaa010
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC7203727
May 2020

Target identification among known drugs by deep learning from heterogeneous networks.

Chem Sci 2020 Jan 13;11(7):1775-1797. Epub 2020 Jan 13.

Genomic Medicine Institute, Lerner Research Institute, Cleveland Clinic, 9500 Euclid Avenue, Cleveland, OH 44106, USA; +1-216-6361609; +1-216-4447654.

Without foreknowledge of complete drug-target information, developing promising and affordable approaches for effective treatment of human diseases is challenging. Here, we develop deepDTnet, a deep learning methodology for new target identification and drug repurposing in a heterogeneous drug-gene-disease network that embeds 15 types of chemical, genomic, phenotypic and cellular network profiles. Trained on 732 U.S. Food and Drug Administration-approved small-molecule drugs, deepDTnet shows high accuracy (area under the receiver operating characteristic curve = 0.963) in identifying novel molecular targets for known drugs, outperforming previously published state-of-the-art methodologies. We then experimentally validate that topotecan (an approved topoisomerase inhibitor), predicted by deepDTnet, is a new, direct inhibitor (IC50 = 0.43 μM) of human retinoic-acid-receptor-related orphan receptor-gamma t (ROR-γt). Furthermore, by specifically targeting ROR-γt, topotecan shows a potential therapeutic effect in a mouse model of multiple sclerosis. In summary, deepDTnet offers a powerful network-based deep learning methodology for target identification to accelerate drug repurposing and minimize the translational gap in drug development.
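deepDTnet, AOPEDF and deepDR all report performance as AUROC. For readers unfamiliar with the metric, a minimal pairwise (Mann-Whitney) computation is sketched below: AUROC equals the probability that a randomly chosen positive is scored above a randomly chosen negative. It is quadratic in the number of examples and meant for illustration only, not as the authors' evaluation code.

```python
from typing import Sequence


def auroc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUROC as the fraction of positive/negative pairs in which the positive
    is scored higher; tied pairs count as half a win."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))
```

For labels `[1, 1, 0, 0]` and scores `[0.9, 0.4, 0.6, 0.1]`, three of the four positive/negative pairs are ordered correctly, giving an AUROC of 0.75.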

Source
http://dx.doi.org/10.1039/c9sc04336e
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC8150105
January 2020

A network-based approach to uncover microRNA-mediated disease comorbidities and potential pathobiological implications.

NPJ Syst Biol Appl 2019 13;5:41. Epub 2019 Nov 13.

Genomic Medicine Institute, Lerner Research Institute, Cleveland Clinic, Cleveland, OH 44195, USA.

Disease-disease relationships (e.g., disease comorbidities) play crucial roles in the pathobiological manifestations of diseases and in personalized approaches to managing those conditions. In this study, we develop a network-based methodology, termed the meta-path-based Disease Network (mpDisNet) capturing algorithm, to infer disease-disease relationships by assembling four biological networks: disease-miRNA, miRNA-gene, disease-gene, and the human protein-protein interactome. mpDisNet uses meta-path-based random walks to reconstruct the heterogeneous neighbors of a given node and a heterogeneous skip-gram model to learn the network representation of the nodes. We find that mpDisNet performs well in inferring clinically reported disease-disease relationships, outperforming traditional gene/miRNA-overlap approaches. In addition, mpDisNet identifies network-based comorbidities for pulmonary diseases driven by underlying miRNA-mediated pathobiological pathways (i.e., hsa-let-7a- or hsa-let-7b-mediated airway epithelial apoptosis and pro-inflammatory cytokine pathways), as derived from human interactome network analysis. mpDisNet offers a powerful tool for network-based identification of disease-disease relationships with miRNA-mediated pathobiological pathways.
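The abstract names two ingredients: meta-path-based random walks over a typed network and a heterogeneous skip-gram model. A minimal sketch of the first, plus the (center, context) pairs a skip-gram model would consume, is shown below in the style of metapath2vec. The graph, node types and function names are hypothetical, and the embedding training itself is omitted.

```python
import random
from typing import Dict, List, Tuple


def metapath_walk(graph: Dict[str, List[str]], node_type: Dict[str, str],
                  start: str, metapath: List[str], length: int,
                  rng: random.Random) -> List[str]:
    """Random walk whose node types cycle through a symmetric meta-path,
    e.g. disease -> miRNA -> disease -> miRNA -> ..."""
    if len(metapath) < 2 or metapath[0] != metapath[-1]:
        raise ValueError("expected a symmetric meta-path such as D-M-D")
    if node_type[start] != metapath[0]:
        raise ValueError("walk must start on the first meta-path type")
    period = len(metapath) - 1
    walk = [start]
    while len(walk) < length:
        wanted = metapath[len(walk) % period]
        candidates = [v for v in graph[walk[-1]] if node_type[v] == wanted]
        if not candidates:
            break  # dead end: stop the walk early
        walk.append(rng.choice(candidates))
    return walk


def skipgram_pairs(walk: List[str], window: int) -> List[Tuple[str, str]]:
    """(center, context) pairs within `window` steps, as a skip-gram model sees them."""
    pairs = []
    for i, center in enumerate(walk):
        for j in range(max(0, i - window), min(len(walk), i + window + 1)):
            if j != i:
                pairs.append((center, walk[j]))
    return pairs
```

Restricting each step to the next type on the meta-path is what keeps, for example, gene neighbors of a miRNA out of a disease-miRNA-disease walk.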

Source
http://dx.doi.org/10.1038/s41540-019-0115-2
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6853960
April 2020

A novel molecular representation with BiGRU neural networks for learning atom.

Brief Bioinform 2020 12;21(6):2099-2111

College of Computer Science and Technology, Hunan University, Changsha, 410082, China.

Molecular representations play critical roles in research on drug design and drug properties, and effective representation methods help with molecular computation and related problems in drug discovery. Traditionally, most molecular representations have been based on hand-crafted features and rely heavily on biological experiments, which are often costly and time-consuming. Recently, however, machine learning has achieved promising results in various domains. In this article, we present a novel method named Smi2Vec-BiGRU, designed to learn atom representations and to solve single- and multitask binary classification problems, which are fundamental and key problems in drug discovery. Specifically, our approach transforms molecules in SMILES format into a set of sample vectors and feeds them into a bidirectional gated recurrent unit (BiGRU) neural network for training, which learns low-dimensional vector representations of drug molecules. We conduct extensive experiments on several widely used benchmarks, including Tox21, SIDER and ClinTox. The experimental results show that our approach achieves state-of-the-art performance on these benchmark datasets, demonstrating its feasibility and competitiveness.
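Smi2Vec-BiGRU starts by turning SMILES strings into token sequences that are mapped to vectors before the BiGRU. The abstract does not specify the tokenizer, so the sketch below assumes a common regular-expression SMILES tokenizer (atoms, bracket atoms, bonds, ring-closure digits) and a simple growing vocabulary. The embedding lookup and the BiGRU itself are omitted, and the function names are hypothetical.

```python
import re
from typing import Dict, List

# Assumed tokenization scheme: bracket atoms, two-letter halogens, organic-subset
# atoms, aromatic atoms, branches, bonds, ring-closure digits and %NN closures.
SMILES_TOKEN = re.compile(
    r"(\[[^\]]+\]|Br?|Cl?|N|O|S|P|F|I|b|c|n|o|s|p|\(|\)|\.|=|#|-|\+|\\|/|:|~|@|\?|>|\*|\$|%[0-9]{2}|[0-9])"
)


def tokenize_smiles(smiles: str) -> List[str]:
    """Split a SMILES string into tokens; fail loudly on unmatched characters."""
    tokens = SMILES_TOKEN.findall(smiles)
    if "".join(tokens) != smiles:
        raise ValueError(f"untokenizable characters in {smiles!r}")
    return tokens


def encode(tokens: List[str], vocab: Dict[str, int]) -> List[int]:
    """Map tokens to integer ids, adding unseen tokens to the vocabulary."""
    return [vocab.setdefault(t, len(vocab)) for t in tokens]
```

Matching `Cl` and `Br` before single letters keeps chlorine and bromine as single atom tokens instead of splitting them into a carbon or boron plus a stray letter.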

Source
http://dx.doi.org/10.1093/bib/bbz125
December 2020

Predicting disease-associated circular RNAs using deep forests combined with positive-unlabeled learning methods.

Brief Bioinform 2020 07;21(4):1425-1436

Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology of China, Chengdu, China.

Identification of disease-associated circular RNAs (circRNAs) is of critical importance, especially given the dramatic increase in the number of circRNAs. However, experimentally validated disease-associated circRNAs are scarce, which restricts the development of effective computational methods. To our knowledge, systematic approaches for predicting disease-associated circRNAs are still lacking. In this study, we propose using deep forests combined with positive-unlabeled learning methods to predict potential disease-related circRNAs. In particular, we constructed a heterogeneous biological network involving 17 961 circRNAs, 469 miRNAs and 248 diseases, and extracted 24 meta-path-based topological features. We applied 5-fold cross-validation on 15 disease data sets to benchmark the proposed approach against other competitive methods and used [email protected] and [email protected] to evaluate their performance. In general, our method performed better than the other methods. In addition, the performance of all methods improved as known positive labels accumulated. Our results provide a new framework for investigating the associations between circRNAs and diseases and might improve our understanding of circRNA functions.
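The 24 topological features come from meta-paths in the circRNA-miRNA-disease network. One simple feature of this kind (assumed here for illustration; the abstract does not define the paper's features) is the number of path instances between a circRNA and a disease along a meta-path, obtained by multiplying the relation matrices along that path. Matrix names and contents below are hypothetical.

```python
from functools import reduce
from typing import List

Matrix = List[List[int]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain-Python matrix product."""
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]


def metapath_counts(*relations: Matrix) -> Matrix:
    """Path-instance counts along a meta-path, e.g. circRNA-miRNA-disease is
    A_cm x A_md. Entry [i][j] can serve as one feature for the pair (i, j)."""
    return reduce(matmul, relations)
```

With two circRNAs, three miRNAs and two diseases, `metapath_counts(A_cm, A_md)` returns a 2 x 2 matrix whose entries count circRNA-miRNA-disease paths; appending a disease-disease matrix extends the meta-path by one hop.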

Source
http://dx.doi.org/10.1093/bib/bbz080
July 2020

Identifying enhancer-promoter interactions with neural network based on pre-trained DNA vectors and attention mechanism.

Bioinformatics 2020 02;36(4):1037-1043

Department of Computer Science, Xiamen University, Xiamen 361005, China.

Motivation: Identification of enhancer-promoter interactions (EPIs) is of great significance to human development. However, experimental methods for identifying EPIs are costly in time, labor and money. Consequently, increasing research effort has focused on developing computational methods for this problem. Unfortunately, most existing computational methods require a variety of genomic data, which are not always available, especially for a new cell line, and this limits their large-scale practical application. As an alternative, computational methods that use only sequences hold great promise for genome-scale application.

Results: In this article, we propose a new deep learning method, EPIVAN, that predicts long-range EPIs using only genomic sequences. To explore the key sequential characteristics, we first use pre-trained DNA vectors to encode enhancers and promoters; we then use one-dimensional convolution and a gated recurrent unit to extract local and global features; finally, an attention mechanism boosts the contribution of key features, further improving the performance of EPIVAN. Benchmarking comparisons on six cell lines show that EPIVAN performs better than state-of-the-art predictors. Moreover, we build a general model that has transfer ability and can be used to predict EPIs in various cell lines.

Availability And Implementation: The source code and data are available at: https://github.com/hzy95/EPIVAN.
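EPIVAN encodes sequences with pre-trained DNA vectors, extracts features with 1D convolution and a GRU, and then applies attention. The sketch below illustrates only the two ends of that pipeline under stated assumptions: splitting a sequence into overlapping k-mers (the unit pre-trained DNA vectors are typically indexed by; k = 6 is an assumption) and a simple dot-product attention pooling over per-position feature vectors. The convolution and GRU stages, the embedding table and the learned query vector are not shown, and the function names are hypothetical.

```python
import math
from typing import List, Sequence


def kmers(seq: str, k: int = 6, stride: int = 1) -> List[str]:
    """Overlapping k-mers; each would be looked up in a pre-trained k-mer
    embedding table (k = 6 is an assumed default)."""
    seq = seq.upper()
    return [seq[i:i + k] for i in range(0, len(seq) - k + 1, stride)]


def attention_pool(features: Sequence[Sequence[float]], query: Sequence[float]) -> List[float]:
    """Dot-product attention pooling: softmax over positions of <h_t, q>,
    then the attention-weighted sum of the feature vectors h_t."""
    logits = [sum(h_i * q_i for h_i, q_i in zip(h, query)) for h in features]
    m = max(logits)  # subtract the max for numerical stability
    weights = [math.exp(z - m) for z in logits]
    total = sum(weights)
    weights = [w / total for w in weights]
    dim = len(features[0])
    return [sum(w * h[d] for w, h in zip(weights, features)) for d in range(dim)]
```

A query aligned with one position's features pushes nearly all attention weight onto that position, which is how the attention step "boosts the contribution of key features".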

Source
http://dx.doi.org/10.1093/bioinformatics/btz694
February 2020

A Consensus Community-Based Particle Swarm Optimization for Dynamic Community Detection.

IEEE Trans Cybern 2020 Jun 23;50(6):2502-2513. Epub 2019 Sep 23.

Community detection in dynamic networks is essential for important applications such as social network analysis. It requires simultaneously maximizing clustering accuracy at the current time step and minimizing clustering drift between two successive time steps, and these objectives often conflict. This article proposes the concept of a consensus community. Knowledge from the previous step is obtained by extracting intrapopulation consensus communities from the optimal population of that step. These intrapopulation consensus communities are then voted on by the population of the current time step during the evolutionary process. The subset that receives a high support rate is recognized as the interpopulation consensus communities of the previous and current steps; these constitute the knowledge transferred from the previous step to the current one. By retaining them during evolution, the current population evolves in a direction similar to that of the previous population. Evaluation, update and mutation of the community structure are all guided by interpopulation consensus community information. Experimental results on many artificial and real-world dynamic networks show that the proposed method produces more accurate and robust results than state-of-the-art approaches.
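The consensus-community idea can be illustrated with partitions encoded as label lists (`partition[v]` is the community of node v). The sketch assumes, since the abstract gives no formula, that a consensus community is a group of nodes co-assigned in at least a threshold fraction of a population's partitions, and that a community's support rate is the fraction of current partitions that keep it intact. Function names are hypothetical.

```python
from itertools import combinations
from typing import Dict, List, Sequence, Set

Partition = Sequence[int]  # partition[v] = community label of node v


def consensus_communities(population: List[Partition], threshold: float) -> List[Set[int]]:
    """Group nodes co-assigned in >= threshold of the partitions (pairwise
    vote), merging agreeing pairs with union-find; singletons are dropped."""
    n = len(population[0])
    parent = list(range(n))

    def find(v: int) -> int:
        while parent[v] != v:
            parent[v] = parent[parent[v]]  # path halving
            v = parent[v]
        return v

    for u, v in combinations(range(n), 2):
        votes = sum(1 for p in population if p[u] == p[v])
        if votes / len(population) >= threshold:
            parent[find(u)] = find(v)

    groups: Dict[int, Set[int]] = {}
    for v in range(n):
        groups.setdefault(find(v), set()).add(v)
    return [g for g in groups.values() if len(g) > 1]


def support_rate(community: Set[int], population: List[Partition]) -> float:
    """Fraction of partitions that keep every member of `community` together."""
    kept = sum(1 for p in population if len({p[v] for v in community}) == 1)
    return kept / len(population)
```

The union-find merge is transitive, so two nodes can end up in one consensus community without voting together directly; a stricter definition would require every member pair to pass the threshold.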

Source
http://dx.doi.org/10.1109/TCYB.2019.2938895
June 2020

Corrigendum Investigation and development of maize fused network analysis with multi-omics [Plant Physiol. Biochem. 141 (2019) 380-387].

Plant Physiol Biochem 2019 Sep 27;142:536-537. Epub 2019 Aug 27.

Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology of China, Chengdu, 610000, China. Electronic address:


Source
http://dx.doi.org/10.1016/j.plaphy.2019.08.017
September 2019

Investigation and development of maize fused network analysis with multi-omics.

Plant Physiol Biochem 2019 Aug 15;141:380-387. Epub 2019 Jun 15.

Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology of China, Chengdu, 610000, China. Electronic address:

Maize is a critically important staple crop worldwide that contributes to both economic security and food supply in planting areas. Improving maize quality and yield is the main target of researchers and breeders. Computational biology methods combined with multi-omics data for selecting biomolecules of interest for maize breeding have received increasing attention, and the rapid growth of high-throughput sequencing data provides the opportunity to explore such biomolecules at the molecular level. In this study, we constructed a weighted network for each omics layer and then integrated them into a final fused weighted network using a nonlinear combination method. We also analyzed the final fused network and mined its orphan nodes, some of which were shown to be transcription factors that play a key role in maize development. This study could help to improve maize production via insights at the multi-omics level and provide a new perspective for maize researchers. All related data have been released at http://lab.malab.cn/~jj/maize.htm.
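The abstract states only that per-omics weighted networks were fused with a nonlinear combination. As a stand-in (not the paper's method), the sketch builds each layer with a WGCNA-style soft threshold |r|^β on Pearson correlations and fuses layers with a noisy-OR rule, w = 1 − Π(1 − w_k), which is nonlinear in the layer weights. All names and the default β = 6 are assumptions.

```python
import math
from typing import List, Sequence

Matrix = List[List[float]]


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation; returns 0.0 if either profile is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx and sy else 0.0


def weighted_network(profiles: List[Sequence[float]], beta: float = 6.0) -> Matrix:
    """Edge weight |r|^beta between node profiles (soft threshold); zero diagonal."""
    n = len(profiles)
    return [[0.0 if i == j else abs(pearson(profiles[i], profiles[j])) ** beta
             for j in range(n)] for i in range(n)]


def fuse_noisy_or(networks: List[Matrix]) -> Matrix:
    """Nonlinear fusion w = 1 - prod_k (1 - w_k) for weights in [0, 1]: an edge
    strong in any one layer stays strong, and weak edges reinforce each other."""
    n = len(networks[0])
    fused = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            keep = 1.0
            for net in networks:
                keep *= 1.0 - net[i][j]
            fused[i][j] = 1.0 - keep
    return fused
```

Two layers that each give an edge weight 0.5 fuse to 0.75 rather than the 0.5 an average would produce, which is the kind of nonlinear reinforcement such a rule is chosen for.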

Source
http://dx.doi.org/10.1016/j.plaphy.2019.06.016
August 2019

Investigating Maize Yield-Related Genes in Multiple Omics Interaction Network Data.

IEEE Trans Nanobioscience 2020 01 3;19(1):142-151. Epub 2019 Jun 3.

Zea mays (maize) is the highest-yielding food crop globally, feeding large numbers of people across the planet. It is thus especially important to explore, using prior knowledge, the key genes that affect maize production. Merging multiple datasets of different types can improve the accuracy of candidate gene prediction, so we constructed interaction networks using gene, mRNA, protein and expression profile datasets. Following the guilt-by-association principle, we applied network propagation with a combined score for each candidate gene that integrates its network score and its significance score. An SVM model was used to optimize the weighting parameters according to label-classification accuracy, yielding more reliable results. We found that integrating multiple omics data with more data types improves the reliability of the results. We investigated the GO terms specifically associated with the top 100 candidate genes and the known genes, and analyzed the roles these genes play in determining the maize phenotype. We hope that the candidate genes identified here will provide a biological perspective and contribute to maize breeding research.
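Network propagation under the guilt-by-association principle is often implemented as a random walk with restart (RWR) from known (seed) genes. The abstract does not give the exact schedule, so the sketch assumes RWR with restart probability r on an undirected graph and a hypothetical linear blend with a per-gene significance score. In the paper, the weighting parameters were tuned with an SVM, which is not reproduced here.

```python
from typing import Dict, List


def random_walk_with_restart(graph: Dict[str, List[str]], seeds: List[str],
                             restart: float = 0.5, tol: float = 1e-10,
                             max_iter: int = 1000) -> Dict[str, float]:
    """Iterate p <- r * p0 + (1 - r) * W p, where W spreads each node's score
    evenly over its neighbors and p0 is uniform over the seed genes."""
    if not seeds:
        raise ValueError("need at least one seed gene")
    nodes = list(graph)
    p0 = {v: (1.0 / len(seeds) if v in seeds else 0.0) for v in nodes}
    p = dict(p0)
    for _ in range(max_iter):
        nxt = {v: restart * p0[v] for v in nodes}
        for u in nodes:
            if graph[u]:  # isolated nodes leak their walk mass in this sketch
                share = (1.0 - restart) * p[u] / len(graph[u])
                for v in graph[u]:
                    nxt[v] += share
        delta = sum(abs(nxt[v] - p[v]) for v in nodes)
        p = nxt
        if delta < tol:
            break
    return p


def combined_score(network: Dict[str, float], significance: Dict[str, float],
                   alpha: float = 0.5) -> Dict[str, float]:
    """Hypothetical blend: alpha * propagation score + (1 - alpha) * significance."""
    return {g: alpha * network.get(g, 0.0) + (1 - alpha) * significance.get(g, 0.0)
            for g in set(network) | set(significance)}
```

On the path a-b-c-d seeded at a with r = 0.5, the stationary scores are 26/45, 14/45, 4/45 and 1/45: closeness to a known gene translates directly into a higher candidate score.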

Source
http://dx.doi.org/10.1109/TNB.2019.2920419
January 2020

deepDR: a network-based deep learning approach to in silico drug repositioning.

Bioinformatics 2019 12;35(24):5191-5198

Genomic Medicine Institute, Lerner Research Institute, Cleveland Clinic, Cleveland, OH 44195, USA.

Motivation: Traditional drug discovery and development are often time-consuming and high-risk. Repurposing/repositioning of approved drugs offers a relatively low-cost and high-efficiency approach toward rapid development of efficacious treatments. The emergence of large-scale, heterogeneous biological networks has offered unprecedented opportunities for developing in silico drug repositioning approaches. However, most existing drug repositioning approaches struggle to capture highly nonlinear, heterogeneous network structures.

Results: In this study, we developed a network-based deep learning approach, termed deepDR, for in silico drug repurposing by integrating 10 networks: one drug-disease, one drug-side-effect, one drug-target and seven drug-drug networks. Specifically, deepDR learns high-level features of drugs from the heterogeneous networks with a multi-modal deep autoencoder. The learned low-dimensional representations of drugs, together with clinically reported drug-disease pairs, are then encoded and decoded collectively via a variational autoencoder to infer new indications for approved drugs, that is, diseases for which they were not originally approved. deepDR achieved high performance [area under the receiver operating characteristic curve (AUROC) = 0.908], outperforming conventional network-based and machine learning-based approaches. Importantly, deepDR-predicted drug-disease associations were validated against the ClinicalTrials.gov database (AUROC = 0.826), and we showcased several novel deepDR-predicted approved drugs for Alzheimer's disease (e.g. risperidone and aripiprazole) and Parkinson's disease (e.g. methylphenidate and pergolide).

Availability And Implementation: Source code and data can be downloaded from https://github.com/ChengF-Lab/deepDR.

Supplementary Information: Supplementary data are available online at Bioinformatics.
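deepDR's second stage is a variational autoencoder. Two pieces common to any VAE, the reparameterization trick and the KL term of its loss for a diagonal-Gaussian encoder and a standard-normal prior, are sketched below in plain Python for illustration. The network architecture and reconstruction loss used by deepDR are not specified in the abstract and are omitted; function names are hypothetical.

```python
import math
import random
from typing import List, Sequence


def reparameterize(mu: Sequence[float], logvar: Sequence[float],
                   rng: random.Random) -> List[float]:
    """z = mu + sigma * eps with eps ~ N(0, 1); in an autodiff framework this
    keeps the sample differentiable with respect to mu and logvar."""
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0) for m, lv in zip(mu, logvar)]


def kl_to_standard_normal(mu: Sequence[float], logvar: Sequence[float]) -> float:
    """KL(N(mu, diag(exp(logvar))) || N(0, I))
    = 0.5 * sum(mu^2 + sigma^2 - 1 - log sigma^2)."""
    return 0.5 * sum(m * m + math.exp(lv) - 1.0 - lv for m, lv in zip(mu, logvar))
```

The KL term is zero exactly when the encoder outputs the prior (mu = 0, logvar = 0) and grows as the latent code drifts from it, which regularizes the drug-disease latent space.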

Source
http://dx.doi.org/10.1093/bioinformatics/btz418
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6954645
December 2019

Prediction of Potential Disease-Associated MicroRNAs by Using Neural Networks.

Mol Ther Nucleic Acids 2019 Jun 18;16:566-575. Epub 2019 Apr 18.

Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology of China, Chengdu 610000, China; Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu 610000, China. Electronic address:

Identifying disease-related microRNAs (miRNAs) is an essential but challenging task in bioinformatics research. Much effort has been devoted to discovering the underlying associations between miRNAs and diseases. However, most studies focus on designing advanced methods to improve prediction accuracy while neglecting the link predictability of miRNA-disease relationships. In this work, we construct a heterogeneous network by integrating neighborhood information in a neural network to predict potential miRNA-disease associations, taking dataset imbalance into account. We propose a new computational method, a neural network model for miRNA-disease association prediction (NNMDA), which predicts associations by integrating multiple biological data resources. Compared with other algorithms, NNMDA shows reliable performance: it achieved an average AUC of 0.937 over 15 diseases in 5-fold cross-validation and an AUC of 0.8439 in leave-one-out cross-validation. These results indicate that NNMDA could be used to evaluate the accuracy of miRNA-disease associations. Moreover, NNMDA was applied to two common human diseases in two types of case studies. In the first, 26 of the top 30 predicted miRNAs for lung neoplasms were confirmed by experiments. In the second, which simulates a new disease without any known related miRNAs, we selected breast neoplasms and hid all of its known miRNA associations; all 50 of the top 50 predicted breast-neoplasm-related miRNAs were verified.
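NNMDA is evaluated with 5-fold and leave-one-out cross-validation, plus a case study that hides all known associations of one disease. The sketch below shows how such splits could be built over known miRNA-disease pairs. The shuffling scheme and function names are assumptions, and negative sampling for the imbalanced data is not shown.

```python
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")
Pair = Tuple[str, str]  # (miRNA, disease)


def k_fold_splits(items: Sequence[T], k: int = 5, seed: int = 0) -> List[Tuple[List[T], List[T]]]:
    """Shuffle once, deal items round-robin into k folds, and return
    (train, test) pairs in which every item is tested exactly once."""
    order = list(items)
    random.Random(seed).shuffle(order)
    folds = [order[i::k] for i in range(k)]
    return [([x for j, fold in enumerate(folds) if j != i for x in fold], folds[i])
            for i in range(k)]


def hide_disease(pairs: List[Pair], disease: str) -> Tuple[List[Pair], List[Pair]]:
    """New-disease setting: drop every known association of `disease` from
    training and keep them as the held-out set to verify predictions against."""
    train = [(m, d) for m, d in pairs if d != disease]
    held_out = [(m, d) for m, d in pairs if d == disease]
    return train, held_out
```

Setting `k` to the number of known pairs turns `k_fold_splits` into leave-one-out cross-validation.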

Source
http://dx.doi.org/10.1016/j.omtn.2019.04.010
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6510966
June 2019

Details in the evaluation of circular RNA detection tools: Reply to Chen and Chuang.

PLoS Comput Biol 2019 04 25;15(4):e1006916. Epub 2019 Apr 25.

Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology of China, Chengdu, China.


Source
http://dx.doi.org/10.1371/journal.pcbi.1006916
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6527241
April 2019