Publications by authors named "Qiguo Dai"

11 Publications

DHNLDA: A novel deep hierarchical network based method for predicting lncRNA-disease associations.

IEEE/ACM Trans Comput Biol Bioinform 2021 Sep 20;PP. Epub 2021 Sep 20.

Recent studies have found that long non-coding RNAs (lncRNAs), a class of non-coding RNAs (ncRNAs), are not only involved in many biological processes but also abnormally expressed in many complex diseases. Accurately identifying lncRNA-disease associations is of great significance for understanding lncRNA function and disease mechanisms. In this paper, a deep learning framework named DHNLDA, consisting of a stacked autoencoder (SAE), a multi-scale ResNet and a stacked ensemble module, was constructed to predict lncRNA-disease associations by integrating multiple biological data sources and constructing feature matrices. The integrated biological data include the similarities and interactions of lncRNAs, diseases and miRNAs. The feature matrices are obtained by node2vec embedding and feature extraction, respectively. Then, the SAE and the multi-scale ResNet are used to learn the complementary information between nodes, and the high-level features of node attributes are obtained. Finally, the fused high-level features are input into the stacked ensemble module to obtain the predicted lncRNA-disease associations. Five-fold cross-validation shows that the AUC of DHNLDA reaches 0.975, better than that of existing methods. Case studies of stomach cancer, breast cancer and lung cancer demonstrate the ability of DHNLDA to discover potential lncRNA-disease associations.

Source
DOI: http://dx.doi.org/10.1109/TCBB.2021.3113326
September 2021

Predicting RBP Binding Sites of RNA with High-order Encoding Features and CNN-BLSTM Hybrid Model.

IEEE/ACM Trans Comput Biol Bioinform 2021 May 26;PP. Epub 2021 May 26.

RNA binding proteins (RBPs) are extensively involved in various cellular regulatory processes through their interactions with RNAs. Capturing RBP binding preferences is fundamental for revealing the pathogenesis of complex diseases. Because many experimental detection techniques are still time-consuming and labor-intensive, it is indispensable to develop a computational method with convincing accuracy. In this study, we proposed a CNN-BLSTM hybrid deep learning framework, named DeepDW, for predicting RBP binding sites on RNAs with high-order encoding features of RNA sequence and secondary structure. The high-order encoding strategy was used to characterize the dependencies among adjacent nucleotides. In the CNN-BLSTM hybrid model, DeepDW first employed two 1-D convolutional neural networks (CNNs) to learn local features from the high-order encoded matrices of RNA sequence and structure separately, and then applied two bidirectional long short-term memory networks (BLSTMs) to capture global information at a higher level. A series of experiments was carried out on 31 public datasets to evaluate the proposed framework, and DeepDW outperformed the state-of-the-art methods. The results indicated that combining the high-order encoding method with the CNN-BLSTM hybrid model had advantages in identifying RBP-RNA binding sites.

Source
DOI: http://dx.doi.org/10.1109/TCBB.2021.3083930
May 2021

MD-MLI: Prediction of miRNA-lncRNA Interaction by Using Multiple Features and Hierarchical Deep Learning.

IEEE/ACM Trans Comput Biol Bioinform 2020 Oct 30;PP. Epub 2020 Oct 30.

Long non-coding RNA (lncRNA) can interact with microRNA (miRNA) and plays an important role in inhibiting or activating the expression of target genes and in the occurrence and development of tumors. A growing number of studies focus on predicting miRNA-lncRNA interactions, mostly through biological experiments and machine learning methods. These methods suffer from long cycles, high costs, and a need for excessive human intervention. In this paper, a data-driven hierarchical deep learning framework was proposed, composed of a capsule network, an independent recurrent neural network with an attention mechanism, and a bi-directional long short-term memory network. This framework combines the advantages of the different networks and uses multiple sequence-derived features of the original sequence together with secondary structure features to mine the dependencies between features, with the aim of obtaining better results. In the experiments, five-fold cross-validation was used to evaluate the performance of the model, and on the Zea mays data set the model achieved better classification performance than the other models it was compared with. In addition, sorghum, Brachypodium distachyon and bryophyte data sets were used to test the model, and the accuracy reached 0.9850, 0.9859 and 0.9777, respectively, which verified the model's good generalization ability.

Source
DOI: http://dx.doi.org/10.1109/TCBB.2020.3034922
October 2020

AC-Caps: Attention Based Capsule Network for Predicting RBP Binding Sites of LncRNA.

Interdiscip Sci 2020 Dec 22;12(4):414-423. Epub 2020 Jun 22.

Dalian Key Lab of Digital Technology for National Culture, Dalian Minzu University, Dalian, 116600, China.

Long non-coding RNA (lncRNA) is a class of non-coding RNA longer than 200 nucleotides that has no protein-coding function. LncRNA plays a key role in many biological processes. Studying the RNA-binding protein (RBP) binding sites on the lncRNA chain helps to reveal epigenetic and post-transcriptional mechanisms, to explore the physiological and pathological processes of cancer, and to discover new therapeutic breakthroughs. To improve the recognition rate of RBP binding sites and reduce experimental time and cost, many computational methods based on domain knowledge have emerged for predicting RBP binding sites. However, these prediction methods treat nucleotides independently and do not take nucleotide statistics into account. In this paper, we use a high-order statistics-based encoding scheme, and the encoded lncRNA sequences are then fed into a hybrid deep learning architecture named AC-Caps. It consists of a joint processing layer (composed of an attention mechanism and a convolutional neural network) and a capsule network. The AC-Caps model was evaluated using 31 independent experimental data sets from 12 lncRNA-binding proteins. In experiments, our method achieves excellent performance, with an average area under the curve (AUC) of 0.967 and an average accuracy (ACC) of 92.5%, which are 0.014 and 2.3% higher than HOCCNNLB, 0.261 and 28.9% higher than iDeepS, and 0.189 and 21.8% higher than DeepBind, respectively. The results show that the AC-Caps method can reliably process large-scale RBP binding site data on the lncRNA chain, and that its prediction performance is better than that of existing deep-learning models. The source code of AC-Caps and the datasets used in this paper are available at https://github.com/JinmiaoS/AC-Caps .

Source
DOI: http://dx.doi.org/10.1007/s12539-020-00379-3
December 2020

Construction of Complex Features for Computational Predicting ncRNA-Protein Interaction.

Front Genet 2019 1;10:18. Epub 2019 Feb 1.

Department of Hematology, The First Affiliated Hospital of Harbin Medical University, Harbin, China.

Non-coding RNA (ncRNA) plays important roles in many critical regulatory processes. Many ncRNAs perform their regulatory functions in the form of RNA-protein complexes. Therefore, identifying the interaction between ncRNA and protein is fundamental to understanding the functions of ncRNA. Given the high cost of experimental techniques, developing an accurate computational predictive model has become an indispensable way to identify ncRNA-protein interactions. A powerful predictive model of ncRNA-protein interaction needs a good feature set for characterizing the interaction. In this paper, a novel method (named CFRP) is put forward to generate complex features for characterizing ncRNA-protein interaction. To obtain a comprehensive description of ncRNA-protein interaction, complex features are generated by non-linear transformations of the traditional k-mer features of ncRNA and protein sequences. To further reduce the dimensionality of the complex features, a group of discriminative features is selected by random forest. To validate the performance of the proposed method, a series of experiments is carried out on several widely used public datasets. Compared with the traditional k-mer features, the CFRP complex features boost the performance of the ncRNA-protein interaction prediction model. In addition, the CFRP-based prediction model is compared with several state-of-the-art methods, and the results show that the proposed method achieves better performance than the others in terms of the evaluation metrics. In conclusion, the complex features generated by CFRP are beneficial for building a powerful predictive model of ncRNA-protein interaction.
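
The traditional k-mer features that this abstract starts from can be illustrated with a minimal sketch. It assumes normalized counts of overlapping k-mers over a fixed alphabet; the function name `kmer_frequencies`, the default `k=2` and the RNA alphabet are illustrative choices, not details taken from the CFRP paper. The non-linear transformation and random-forest selection steps are not shown.

```python
from itertools import product


def kmer_frequencies(seq, k=2, alphabet="ACGU"):
    """Return normalized counts of all overlapping k-mers in seq.

    The output vector has len(alphabet) ** k entries, ordered as
    itertools.product enumerates the k-mers (AA, AC, AG, ... for k=2).
    Windows containing symbols outside the alphabet are skipped.
    """
    kmers = ["".join(p) for p in product(alphabet, repeat=k)]
    counts = dict.fromkeys(kmers, 0)
    total = 0
    for i in range(len(seq) - k + 1):
        window = seq[i:i + k]
        if window in counts:
            counts[window] += 1
            total += 1
    if total == 0:
        # Sequence shorter than k, or no valid windows at all.
        return [0.0] * len(kmers)
    return [counts[km] / total for km in kmers]


# Hypothetical toy sequence, for illustration only.
features = kmer_frequencies("ACGUACGU", k=2)
```

For a protein sequence, the same function could be called with a 20-letter amino acid alphabet, at the cost of a much longer feature vector.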

Source
DOI: http://dx.doi.org/10.3389/fgene.2019.00018
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6367266
February 2019

Mining disease genes using integrated protein-protein interaction and gene-gene co-regulation information.

FEBS Open Bio 2015 27;5:251-6. Epub 2015 Mar 27.

College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, Heilongjiang, China.

In humans, despite the rapid increase in disease-associated gene discovery, a large proportion of disease-associated genes are still unknown. Many network-based approaches have been used to prioritize disease genes. Many networks, such as the protein-protein interaction (PPI), KEGG, and gene co-expression networks, have been used. Expression quantitative trait loci (eQTLs) have been successfully applied for the determination of genes associated with several diseases. In this study, we constructed an eQTL-based gene-gene co-regulation network (GGCRN) and used it to mine for disease genes. We adopted the random walk with restart (RWR) algorithm to mine for genes associated with Alzheimer disease. Compared to the Human Protein Reference Database (HPRD) PPI network alone, the integrated HPRD PPI and GGCRN networks provided faster convergence and revealed new disease-related genes. Therefore, using the RWR algorithm for integrated PPI and GGCRN is an effective method for disease-associated gene mining.
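
The random walk with restart (RWR) algorithm named in this abstract can be sketched as follows. This is a generic textbook formulation, not the authors' implementation: the restart probability of 0.7, the convergence tolerance, the function name and the toy network are all assumptions made for illustration.

```python
def random_walk_with_restart(adj, seeds, restart=0.7, tol=1e-10, max_iter=1000):
    """Score every node of a network by its proximity to the seed nodes.

    adj   -- symmetric adjacency matrix as a list of lists
    seeds -- indices of known disease genes (the restart set)
    Iterates p <- (1 - restart) * W p + restart * p0, where W is the
    column-normalized adjacency matrix, until the L1 change is below tol.
    """
    n = len(adj)
    col_sums = [sum(adj[i][j] for i in range(n)) for j in range(n)]
    # Column-normalize so each column of W sums to 1 (or 0 if isolated).
    W = [[adj[i][j] / col_sums[j] if col_sums[j] else 0.0 for j in range(n)]
         for i in range(n)]
    p0 = [0.0] * n
    for s in seeds:
        p0[s] = 1.0 / len(seeds)
    p = p0[:]
    for _ in range(max_iter):
        nxt = [(1 - restart) * sum(W[i][j] * p[j] for j in range(n))
               + restart * p0[i] for i in range(n)]
        if sum(abs(a - b) for a, b in zip(nxt, p)) < tol:
            return nxt
        p = nxt
    return p


# Hypothetical 4-gene chain 0-1-2-3 with gene 0 as the only seed.
toy_network = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]
scores = random_walk_with_restart(toy_network, seeds=[0])
```

Genes are then ranked by score. Integrating two networks, as the abstract describes for PPI and GGCRN, would amount to running the same iteration on a combined adjacency matrix; how the two are combined is not specified here.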

Source
DOI: http://dx.doi.org/10.1016/j.fob.2015.03.011
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC4392065
April 2015

A least square method based model for identifying protein complexes in protein-protein interaction network.

Biomed Res Int 2014 23;2014:720960. Epub 2014 Oct 23.

School of Computer Science and Technology, Harbin Institute of Technology, P.O. Box 319, 92 Xidazhi Street, Harbin 150001, China ; School of Information and Computer Engineering, Northeast Forestry University, Harbin 150040, China.

A protein complex, formed by a group of physically interacting proteins, plays a crucial role in cell activities. Great effort has been made to computationally identify protein complexes from protein-protein interaction (PPI) networks. However, the accuracy of the prediction is still far from satisfactory, because the topological structures of protein complexes in the PPI network are too complicated. This paper proposes a novel optimization framework, named PLSMC, to detect complexes from PPI networks. The method is based on the fact that if two proteins are in a common complex, they are likely to interact. PLSMC employs this relation to determine complexes by a penalized least squares method. PLSMC is applied to several public yeast PPI networks and compared with several state-of-the-art methods. The results indicate that PLSMC outperforms other methods. In particular, complexes predicted by PLSMC match known complexes with higher accuracy than those of other methods. Furthermore, the predicted complexes have high functional homogeneity.

Source
DOI: http://dx.doi.org/10.1155/2014/720960
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC4227386
July 2015

Computational prediction of protein function based on weighted mapping of domains and GO terms.

Biomed Res Int 2014 23;2014:641469. Epub 2014 Apr 23.

Department of Computer Science and Engineering, Harbin Institute of Technology, Harbin 150001, China.

In this paper, we propose a novel method, SeekFun, to predict protein function based on a weighted mapping of domains and GO terms. First, a weighted mapping of domains and GO terms is constructed according to the GO annotations and domain composition of the proteins. The association strength between a domain and a GO term is weighted by symmetrical conditional probability. Second, the mapping is extended along the true paths of the terms based on the GO hierarchy. Finally, the terms associated with resident domains are transferred to the host protein, and the real annotations of the host protein are determined by the association strengths. Our careful comparisons demonstrate that SeekFun outperforms the compared methods in most cases. SeekFun provides a flexible and effective way to predict protein function. It benefits from the well-constructed mapping of domains and GO terms, as well as from the reasonable strategy for inferring the annotations of a protein from those of its domains.

Source
DOI: http://dx.doi.org/10.1155/2014/641469
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC4017789
October 2015

Prediction of microRNAs associated with human diseases based on weighted k most similar neighbors.

PLoS One 2013 8;8(8):e70204. Epub 2013 Aug 8.

Key Laboratory of Database and Parallel Computing of Heilongjiang Province, School of Computer Science and Technology, Heilongjiang University, Harbin, China.

Background: The identification of human disease-related microRNAs (disease miRNAs) is important for further investigating their involvement in the pathogenesis of diseases. More experimentally validated miRNA-disease associations have been accumulated recently. On the basis of these associations, it is essential to predict disease miRNAs for various human diseases. It is useful in providing reliable disease miRNA candidates for subsequent experimental studies.

Methodology/Principal Findings: It is known that miRNAs with similar functions are often associated with similar diseases and vice versa. Therefore, the functional similarity of two miRNAs has been successfully estimated by measuring the semantic similarity of their associated diseases. To effectively predict disease miRNAs, we calculated the functional similarity by incorporating the information content of disease terms and the phenotype similarity between diseases. Furthermore, the members of a miRNA family or cluster are assigned a higher weight, since they are more likely to be associated with similar diseases. A new prediction method, HDMP, based on weighted k most similar neighbors, is presented for predicting disease miRNAs. Experiments validated that HDMP achieved significantly higher prediction performance than existing methods. In addition, case studies examining prostatic neoplasms, breast neoplasms, and lung neoplasms showed that HDMP can uncover potential disease miRNA candidates.

Conclusions: The superior performance of HDMP can be attributed to the accurate measurement of miRNA functional similarity, the weight assignment based on miRNA family or cluster, and the effective prediction based on weighted k most similar neighbors. The online prediction and analysis tool is freely available at http://nclab.hit.edu.cn/hdmpred.
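
The idea of scoring a candidate miRNA from its weighted k most similar neighbors can be sketched as below. This is a simplified illustration under stated assumptions, not HDMP's actual scoring formula: the function name, the family weight of 2.0 and the toy similarity values are hypothetical, and the functional similarity is taken as a precomputed input.

```python
def weighted_knn_score(candidate, similarity, disease_mirnas, family_of,
                       k=3, family_weight=2.0):
    """Score a candidate miRNA for one disease.

    similarity     -- dict of dicts with precomputed functional similarities
    disease_mirnas -- set of miRNAs already known to be linked to the disease
    family_of      -- dict mapping a miRNA to its family or cluster label
    Only the k most similar neighbors contribute; neighbors that share the
    candidate's family or cluster get the higher (assumed) family_weight.
    """
    neighbors = sorted(
        (m for m in similarity[candidate] if m != candidate),
        key=lambda m: similarity[candidate][m],
        reverse=True,
    )[:k]
    score = 0.0
    for m in neighbors:
        if m not in disease_mirnas:
            continue  # neighbors without a known association add nothing
        same_family = (family_of.get(m) is not None
                       and family_of.get(m) == family_of.get(candidate))
        weight = family_weight if same_family else 1.0
        score += weight * similarity[candidate][m]
    return score


# Hypothetical toy data, for illustration only.
sim = {"mir-a": {"mir-b": 0.9, "mir-c": 0.5, "mir-d": 0.1}}
known = {"mir-b", "mir-d"}
families = {"mir-a": "fam1", "mir-b": "fam1"}
score_k2 = weighted_knn_score("mir-a", sim, known, families, k=2)
score_k3 = weighted_knn_score("mir-a", sim, known, families, k=3)
```

Candidates would then be ranked by score for each disease, with `k` and the family weight tuned on validation data.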

Source
PLOS: http://journals.plos.org/plosone/article?id=10.1371/journal.pone.0070204
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3738541
April 2014

Measuring gene functional similarity based on group-wise comparison of GO terms.

Bioinformatics 2013 Jun 9;29(11):1424-32. Epub 2013 Apr 9.

Department of Computer Science and Engineering, Harbin Institute of Technology, Harbin, P.R. China.

Motivation: Compared with sequence and structure similarity, functional similarity is more informative for understanding the biological roles and functions of genes. Many important applications in computational molecular biology require functional similarity, such as gene clustering, protein function prediction, protein interaction evaluation and disease gene prioritization. Gene Ontology (GO) is now widely used as the basis for measuring gene functional similarity. Some existing methods combine semantic similarity scores of single term pairs to estimate gene functional similarity, whereas others compare terms in groups to measure it. However, these methods may make error-prone judgments about gene functional similarity. Measuring gene functional similarity reliably remains a challenge.

Result: We propose a novel method called SORA to measure gene functional similarity in the GO context. First, SORA computes the information content (IC) of a term using semantic specificity and coverage. Second, SORA measures the IC of a term set by combining the inherited and extended IC of the terms based on the structure of GO. Finally, SORA estimates gene functional similarity using the IC overlap ratio of term sets. SORA is evaluated against five state-of-the-art methods in the file on the public platform for collaborative evaluation of GO-based semantic similarity measures. Careful comparisons show that SORA is superior to the other methods in general. Further analysis suggests that it primarily benefits from the structure of GO, which carries expressive information about gene function. SORA offers an effective and reliable way to compare gene function.

Availability: The web service of SORA is freely available at http://nclab.hit.edu.cn/SORA/

Source
DOI: http://dx.doi.org/10.1093/bioinformatics/btt160
June 2013