Publications by authors named "Yuan-Ke Zhou"

6 Publications

NPI-GNN: Predicting ncRNA-protein interactions with deep graph neural networks.

Brief Bioinform 2021 Apr 5. Epub 2021 Apr 5.

College of Intelligence and Computing, Tianjin University, Tianjin 300350, China.

Noncoding RNAs (ncRNAs) play crucial roles in many biological processes. Experimental methods for identifying ncRNA-protein interactions (NPIs) are costly and time-consuming, so many computational approaches have been developed as alternatives. In this work, we collected five benchmarking datasets for predicting NPIs. Based on these datasets, we evaluated and compared the prediction performance of existing machine-learning-based methods. The graph neural network (GNN) is a recently developed deep learning algorithm for link prediction on complex networks, which had not previously been applied to predicting NPIs. We constructed a GNN-based method, called Noncoding RNA-Protein Interaction prediction using Graph Neural Networks (NPI-GNN), to predict NPIs. NPI-GNN achieved performance comparable to state-of-the-art methods in a 5-fold cross-validation. In addition, it is capable of predicting novel interactions based on network information and sequence information. We also found that insufficient sequence information has little effect on the prediction performance of NPI-GNN, which makes NPI-GNN more robust than other methods. To the best of our knowledge, NPI-GNN is the first end-to-end GNN predictor for NPIs. All benchmarking datasets in this work and all source code of NPI-GNN have been deposited, with documentation, in a GitHub repository (https://github.com/AshuiRUA/NPI-GNN).
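The 5-fold cross-validation mentioned in the abstract can be illustrated with a minimal sketch. This is not the NPI-GNN evaluation code: the function name `k_fold_splits`, the seed, and the toy interaction pairs are assumptions made only for illustration, and a real link-prediction benchmark would also need negative samples.

```python
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def k_fold_splits(
    items: Sequence[T], k: int = 5, seed: int = 0
) -> List[Tuple[List[T], List[T]]]:
    """Return k (train, test) splits; each item is a test item exactly once."""
    if k < 2 or k > len(items):
        raise ValueError("k must be between 2 and the number of items")
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)  # seeded for reproducibility
    # Deal the shuffled items into k folds whose sizes differ by at most one.
    folds = [shuffled[i::k] for i in range(k)]
    splits = []
    for i in range(k):
        test = folds[i]
        train = [x for j, fold in enumerate(folds) if j != i for x in fold]
        splits.append((train, test))
    return splits


# Hypothetical ncRNA-protein pairs, for illustration only.
pairs = [(f"ncRNA{i}", f"protein{i % 4}") for i in range(12)]
splits = k_fold_splits(pairs, k=5)
```

Each of the five splits trains on roughly four fifths of the known pairs and tests on the held-out fifth; the reported score is then typically the average over the five test folds.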

DOI: http://dx.doi.org/10.1093/bib/bbab051
April 2021

LPI-SKF: Predicting lncRNA-Protein Interactions Using Similarity Kernel Fusions.

Front Genet 2020 Dec 9;11:615144. Epub 2020 Dec 9.

College of Intelligence and Computing, Tianjin University, Tianjin, China.

Long non-coding RNAs (lncRNAs) play an important role in several biological activities, including transcription, splicing, translation, and other cellular regulation processes. lncRNAs perform their biological functions by interacting with various proteins. Studies of lncRNA-protein interactions are of great value for understanding lncRNA functional mechanisms. In this paper, we proposed a novel model to predict potential lncRNA-protein interactions using the SKF (similarity kernel fusion) and LapRLS (Laplacian regularized least squares) algorithms. We named this method LPI-SKF. Various similarities of both lncRNAs and proteins were integrated into LPI-SKF. LPI-SKF can be applied to predicting potential interactions involving novel proteins or lncRNAs. We obtained an AUROC (area under the receiver operating characteristic curve) of 0.909 in a 5-fold cross-validation, which outperforms other state-of-the-art methods. A total of 19 out of the top 20 ranked interaction predictions were verified by existing data, which implies that LPI-SKF has great potential for accurately discovering unknown lncRNA-protein interactions. All data and code of this work can be downloaded from a GitHub repository (https://github.com/zyk2118216069/LPI-SKF).
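The AUROC figure quoted above has a simple pairwise interpretation: it is the probability that a randomly chosen true interaction is scored higher than a randomly chosen non-interaction. The sketch below computes it that way. It is an illustration only, not the LPI-SKF evaluation code, and the labels and scores are made-up values.

```python
from typing import Sequence


def auroc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUROC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Made-up labels (1 = known interaction) and prediction scores.
toy_labels = [1, 1, 0, 0]
toy_scores = [0.9, 0.4, 0.5, 0.1]
toy_auc = auroc(toy_labels, toy_scores)
```

The double loop is quadratic in the number of samples, which is fine for a small illustration; rank-based formulas give the same value more efficiently on large test sets.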

DOI: http://dx.doi.org/10.3389/fgene.2020.615144
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC7758075
December 2020

KNIndex: a comprehensive database of physicochemical properties for k-tuple nucleotides.

Brief Bioinform 2020 Nov 5. Epub 2020 Nov 5.

College of Intelligence and Computing, Tianjin University.

With the development of high-throughput sequencing technology, the volume of genomic sequences has increased exponentially over the last decade. In order to decode these new genomic data, machine learning methods were introduced for genome annotation and analysis. Because most machine learning methods require it, biological sequences must be represented as fixed-length numerical vectors. In this representation procedure, the physicochemical properties of k-tuple nucleotides are important information. However, the values of the physicochemical properties of k-tuple nucleotides are scattered across different resources. To facilitate studies on genomic sequences, we developed the first comprehensive database, namely KNIndex (https://knindex.pufengdu.org), for depositing and visualizing physicochemical properties of k-tuple nucleotides. Currently, the KNIndex database contains 182 properties, including one for mononucleotides (DNA), 169 for dinucleotides (147 for DNA and 22 for RNA) and 12 for trinucleotides (DNA). The KNIndex database also provides a user-friendly web-based interface for users to browse, query, visualize and download the physicochemical properties of k-tuple nucleotides. With the built-in conversion and visualization functions, users can display DNA/RNA sequences as curves of multiple physicochemical properties. We hope that KNIndex will facilitate related studies in computational biology.
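The two ideas in this abstract, turning a sequence into a property curve and into a fixed-length vector, can be sketched as follows. This is a hedged illustration: the property table holds placeholder numbers generated in code, not values from KNIndex, and the function names are assumptions rather than any KNIndex interface.

```python
from itertools import product
from typing import Dict, List

BASES = "ACGT"

# Hypothetical property table: placeholder values, NOT real KNIndex data.
# AA -> 0.0, AC -> 1.0, ..., TT -> 15.0
TOY_DINUCLEOTIDE_PROPERTY: Dict[str, float] = {
    a + b: float(i) for i, (a, b) in enumerate(product(BASES, repeat=2))
}


def property_curve(seq: str, table: Dict[str, float], k: int = 2) -> List[float]:
    """Map each overlapping k-tuple of the sequence to its property value."""
    seq = seq.upper()
    # A k-tuple missing from the table (e.g. one containing N) raises KeyError.
    return [table[seq[i:i + k]] for i in range(len(seq) - k + 1)]


def kmer_frequency_vector(seq: str, k: int = 2) -> List[float]:
    """Fixed-length (4**k) vector of overlapping k-tuple frequencies."""
    seq = seq.upper()
    total = len(seq) - k + 1
    if total <= 0:
        raise ValueError("sequence is shorter than k")
    kmers = ["".join(p) for p in product(BASES, repeat=k)]
    counts = {kmer: 0 for kmer in kmers}
    for i in range(total):
        counts[seq[i:i + k]] += 1
    return [counts[kmer] / total for kmer in kmers]


curve = property_curve("ACGT", TOY_DINUCLEOTIDE_PROPERTY)
vector = kmer_frequency_vector("ACGTACGT")
```

The curve has one point per sequence position and so varies in length, whereas the frequency vector always has 4^k entries regardless of sequence length, which is the property most machine learning methods need.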

DOI: http://dx.doi.org/10.1093/bib/bbaa284
November 2020

Predicting lncRNA-Protein Interactions With miRNAs as Mediators in a Heterogeneous Network Model.

Front Genet 2019 22;10:1341. Epub 2020 Jan 22.

College of Intelligence and Computing, Tianjin University, Tianjin, China.

Long non-coding RNAs (lncRNAs) play important roles in various biological processes, where lncRNA-protein interactions are usually involved. Therefore, identifying lncRNA-protein interactions is of great significance for understanding the molecular functions of lncRNAs. Since experiments to identify lncRNA-protein interactions are costly and time-consuming, computational methods have been developed as alternative approaches. However, existing lncRNA-protein interaction predictors usually require prior knowledge of lncRNA-protein interactions with experimental evidence. Their performance is limited by the number of known lncRNA-protein interactions. In this paper, we explored a novel way to predict lncRNA-protein interactions without direct prior knowledge. miRNAs were selected as mediators to estimate potential interactions between lncRNAs and proteins. When validated against known lncRNA-protein interactions, our method achieved an AUROC (area under the receiver operating characteristic curve) of 0.821, which is comparable to state-of-the-art methods. Moreover, our method achieved an improved AUROC of 0.852 by further expanding the training dataset. We believe that our method can be a useful supplement to the existing methods, as it provides an alternative way to estimate lncRNA-protein interactions in a heterogeneous network without direct prior knowledge. All data and code of this work can be downloaded from GitHub (https://github.com/zyk2118216069/LncRNA-protein-interactions-prediction).
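The mediator idea can be illustrated with a deliberately simple score: count the two-step paths lncRNA -> miRNA -> protein in a heterogeneous network. The abstract does not specify the scoring used in the paper, so the path count below is an assumption chosen for illustration, and all node names are hypothetical.

```python
from typing import Dict, Set, Tuple


def mediated_scores(
    lnc_to_mirna: Dict[str, Set[str]],
    mirna_to_protein: Dict[str, Set[str]],
) -> Dict[Tuple[str, str], int]:
    """Score each (lncRNA, protein) pair by its number of shared miRNAs."""
    scores: Dict[Tuple[str, str], int] = {}
    for lnc, mirnas in lnc_to_mirna.items():
        for mirna in mirnas:
            # miRNAs with no known protein partner contribute nothing.
            for protein in mirna_to_protein.get(mirna, set()):
                key = (lnc, protein)
                scores[key] = scores.get(key, 0) + 1
    return scores


# Hypothetical toy network, for illustration only.
lnc_to_mirna = {"lncA": {"miR-1", "miR-2"}, "lncB": {"miR-2"}}
mirna_to_protein = {"miR-1": {"P1"}, "miR-2": {"P1", "P2"}}
scores = mediated_scores(lnc_to_mirna, mirna_to_protein)
```

No known lncRNA-protein interaction is used to produce these scores, which is the sense in which such a predictor works without direct prior knowledge; known interactions are needed only to validate the ranking.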

DOI: http://dx.doi.org/10.3389/fgene.2019.01341
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6988623
January 2020

VisFeature: a stand-alone program for visualizing and analyzing statistical features of biological sequences.

Bioinformatics 2020 Feb;36(4):1277-1278.

Innovative Institute of Chinese Medicine and Pharmacy, Chengdu University of Traditional Chinese Medicine, Chengdu 611137, China.

Summary: Many efforts have been made in developing bioinformatics algorithms to predict functional attributes of genes and proteins from their primary sequences. One challenge in this process is to intuitively analyze and understand the statistical features that have been selected by heuristic or iterative methods. In this paper, we developed VisFeature, a software tool that allows users to intuitively visualize and analyze statistical features of all types of biological sequences, including DNA, RNA and proteins. VisFeature also integrates sequence data retrieval, multiple sequence alignment and statistical feature generation functions.

Availability and Implementation: VisFeature is a desktop application implemented using JavaScript/Electron and R. The source code of VisFeature is freely accessible from the GitHub repository (https://github.com/wangjun1996/VisFeature). The binary release, which includes an example dataset, can be freely downloaded from the same GitHub repository (https://github.com/wangjun1996/VisFeature/releases).

Supplementary Information: Supplementary data are available at Bioinformatics online.

DOI: http://dx.doi.org/10.1093/bioinformatics/btz689
February 2020

Predicting protein sub-Golgi locations by combining functional domain enrichment scores with pseudo-amino acid compositions.

J Theor Biol 2019 Jul 30;473:38-43. Epub 2019 Apr 30.

College of Intelligence and Computing, Tianjin University, Tianjin 300350, China.

The Golgi apparatus is an important subcellular organelle that participates in the secretory pathway. The role of the Golgi apparatus in cellular processes is related to Golgi-resident proteins. Knowing the sub-Golgi locations of Golgi-resident proteins is helpful in understanding their molecular functions. In this work, we proposed a computational method to predict the sub-Golgi locations of Golgi-resident proteins. We take three sub-Golgi locations into consideration: the cis-Golgi network (CGN), the Golgi stack and the trans-Golgi network (TGN). By combining Pseudo-Amino Acid Compositions (Type-II PseAAC) and the Functional Domain Enrichment Score (FunDES), our method not only achieved better performance than existing methods, but is also capable of recognizing proteins of the Golgi stack location, which has not been considered in other state-of-the-art works.
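To give a feel for what a pseudo-amino acid composition looks like, the sketch below builds the simpler Type-I (parallel-correlation) form, not the Type-II variant the abstract names, and it is not the authors' implementation. It uses a single placeholder property whose values are generated in code rather than taken from any real scale; the weight and lambda values are also assumptions.

```python
from typing import Dict, List

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# Placeholder property (the residue's index), standardized to zero mean and
# unit variance. NOT a real physicochemical scale; for illustration only.
_raw = [float(i) for i in range(len(AMINO_ACIDS))]
_mu = sum(_raw) / len(_raw)
_sigma = (sum((v - _mu) ** 2 for v in _raw) / len(_raw)) ** 0.5
TOY_PROPERTY: Dict[str, float] = {
    aa: (v - _mu) / _sigma for aa, v in zip(AMINO_ACIDS, _raw)
}


def pse_aac(
    seq: str, prop: Dict[str, float], lam: int = 2, weight: float = 0.05
) -> List[float]:
    """Type-I PseAAC: 20 composition terms plus lam sequence-order terms."""
    seq = seq.upper()
    n = len(seq)
    if n <= lam:
        raise ValueError("sequence must be longer than lam")
    # Normalized occurrence frequency of each of the 20 amino acids.
    freqs = [seq.count(aa) / n for aa in AMINO_ACIDS]
    # theta_gap: mean squared property difference between residues `gap`
    # apart. Non-standard residues raise KeyError here.
    thetas = [
        sum((prop[seq[i]] - prop[seq[i + gap]]) ** 2 for i in range(n - gap))
        / (n - gap)
        for gap in range(1, lam + 1)
    ]
    denom = sum(freqs) + weight * sum(thetas)
    return [f / denom for f in freqs] + [weight * t / denom for t in thetas]


# Made-up peptide, for illustration only.
features = pse_aac("MKVLAAGIVGLLLAQ", TOY_PROPERTY, lam=2)
```

The first 20 components carry plain composition and the last `lam` components carry sequence-order information, so proteins of any length map to a vector of length 20 + `lam` that sums to one.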

DOI: http://dx.doi.org/10.1016/j.jtbi.2019.04.025
July 2019