Deep Robust Framework for Protein Function Prediction using Variable-Length Protein Sequences.

Authors:
Ashish Ranjan
National Institutes of Health
United States
Akshay Deepak
Iowa State University

IEEE/ACM Trans Comput Biol Bioinform 2019 Apr 16. Epub 2019 Apr 16.

The order of amino acids in a protein sequence enables the protein to acquire a conformation suitable for performing its functions, which motivates analysing these sequences to predict function. Although machine learning based approaches are fast compared to methods using BLAST, FASTA, etc., they fail to perform well for long protein sequences (more than 300 amino acids). In this paper, we introduce a novel method for constructing two separate feature sets for proteins using a bi-directional long short-term memory network, based on the analysis of fixed 1) single-sized segments and 2) multi-sized segments. The model trained on the proposed feature set based on multi-sized segments is combined with a model trained on state-of-the-art Multi-label Linear Discriminant Analysis (MLDA) features to further improve accuracy. Extensive evaluations using separate datasets for biological processes and molecular functions demonstrate not only improved results for long sequences but also a significant improvement in overall accuracy over the state-of-the-art method. The single-sized approach produces an improvement of +3.37% for biological processes and +5.48% for molecular functions over the MLDA based classifier. The corresponding numbers for the multi-sized approach are +5.38% and +8.00%. Combining the two models raises the improvement further, to +7.41% and +9.21% respectively.
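
The segment-based analysis described in the abstract can be illustrated with a minimal Python sketch. The abstract does not state the segment sizes, whether segments overlap, or how a trailing partial segment is handled, so the sizes, the non-overlapping default stride, and the tail handling below are all assumptions made for illustration. The function names `split_into_segments` and `multi_size_segments` are hypothetical, and the bi-directional LSTM that would consume the segments is not shown.

```python
from typing import Dict, List, Optional


def split_into_segments(sequence: str, size: int,
                        stride: Optional[int] = None) -> List[str]:
    """Cut a protein sequence into fixed-size segments.

    Assumption: the stride defaults to the segment size (no overlap).
    Assumption: residues left over at the end are covered by one extra
    segment aligned to the end of the sequence.
    """
    if size <= 0:
        raise ValueError("size must be positive")
    stride = size if stride is None else stride
    if stride <= 0:
        raise ValueError("stride must be positive")
    if not sequence:
        return []
    if len(sequence) <= size:
        # Short sequences are returned whole as a single segment.
        return [sequence]

    last_valid_start = len(sequence) - size
    segments = [sequence[i:i + size]
                for i in range(0, last_valid_start + 1, stride)]

    # If the regular windows stop short of the end, add an end-aligned one.
    last_start = (last_valid_start // stride) * stride
    if last_start + size < len(sequence):
        segments.append(sequence[-size:])
    return segments


def multi_size_segments(sequence: str,
                        sizes: List[int]) -> Dict[int, List[str]]:
    """Apply the single-size split once per segment size (sizes are hypothetical)."""
    return {size: split_into_segments(sequence, size) for size in sizes}


if __name__ == "__main__":
    toy_sequence = "ACDEFGHIKL"  # toy 10-residue sequence, not real data
    print(split_into_segments(toy_sequence, 4))
    print(multi_size_segments(toy_sequence, [4, 5]))
```

In this sketch the single-sized variant corresponds to one call to `split_into_segments`, while the multi-sized variant repeats the split for several sizes so that each size yields its own list of segments.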

Source
https://ieeexplore.ieee.org/document/8692646/
Publisher Site
http://dx.doi.org/10.1109/TCBB.2019.2911609