Amino acid encoding methods for protein sequences: a comprehensive review and assessment.

Authors:
Xiaoyang Jing
School of Computer Science
Qiwen Dong
School of Computer Science and Technology
China
Ruqian Lu
School of Computer Science
Austin | United States

IEEE/ACM Trans Comput Biol Bioinform 2019 Apr 16. Epub 2019 Apr 16.

As the first step of machine-learning-based protein structure and function prediction, amino acid encoding plays a fundamental role in the final success of those methods. Unlike protein sequence encoding, amino acid encoding can be used in both residue-level and sequence-level prediction of protein properties when combined with different algorithms. However, it has not attracted enough attention in the past decades, and there have been no comprehensive reviews or assessments of encoding methods so far. In this article, we present a systematic classification, together with a comprehensive review and assessment, of various amino acid encoding methods. The methods are grouped into five categories according to their information sources and information extraction methodologies: binary encoding, physicochemical properties encoding, evolution-based encoding, structure-based encoding, and machine-learning encoding. Sixteen representative methods from the five categories are then selected and compared on protein secondary structure prediction and protein fold recognition tasks using large-scale benchmark datasets. The results show that the evolution-based position-dependent encoding method PSSM achieves the best performance, and that the structure-based and machine-learning encoding methods show some potential for further application; in particular, the neural-network-based distributed representation of amino acids may shed new light on this area. We hope that this review and assessment will be useful for future studies of amino acid encoding.
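As an illustration of the simplest of the five categories, binary encoding is commonly realized as a one-hot vector per residue. The following is a minimal sketch under stated assumptions, not the implementation assessed in the article: the alphabetical ordering of the 20 standard amino acids, the all-zero vector for unknown residues, and the names `AMINO_ACIDS`, `AA_INDEX`, and `one_hot_encode` are choices made here for illustration only.

```python
# Illustrative one-hot (binary) encoding of a protein sequence.
# Assumption: the 20 standard amino acids in alphabetical order of their
# one-letter codes; this ordering is arbitrary and chosen for this sketch.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
AA_INDEX = {aa: i for i, aa in enumerate(AMINO_ACIDS)}


def one_hot_encode(sequence):
    """Return one 20-dimensional binary vector per residue of `sequence`."""
    encoded = []
    for residue in sequence.upper():
        vector = [0] * len(AMINO_ACIDS)
        index = AA_INDEX.get(residue)
        # Assumption: residues outside the 20-letter alphabet (e.g. 'X')
        # are left as an all-zero vector.
        if index is not None:
            vector[index] = 1
        encoded.append(vector)
    return encoded


if __name__ == "__main__":
    for residue, row in zip("ACDX", one_hot_encode("ACDX")):
        print(residue, row)
```

Because each residue is encoded independently, the resulting per-residue vectors can serve as input for residue-level prediction, or be combined over the whole sequence for sequence-level prediction.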


Source
https://ieeexplore.ieee.org/document/8692651/
Publisher Site
http://dx.doi.org/10.1109/TCBB.2019.2911677
April 2019

