Maqsood Hayat Pakistan Institute of Engineering and Applied Sciences
Artif Intell Med 2017 05 10;78:14-22. Epub 2017 May 10.
Department of Computer Science, Abdul Wali Khan University, Mardan, Pakistan. Electronic address:
Golgi is one of the core proteins of a cell, constitutes in both plants and animals, which is involved in protein synthesis. Golgi is responsible for receiving and processing the macromolecules and trafficking of newly processed protein to its intended destination. Dysfunction in Golgi protein is expected to cause many neurodegenerative and inherited diseases that may be cured well if they are detected effectively and timely. Golgi protein is categorized into two parts cis-Golgi and trans-Golgi. The identification of Golgi protein via direct method is very hard due to limited available recognized structures. Therefore, the researchers divert their attention toward the sequences from structures. However, owing to technological advancement, exploration of huge amount of sequences was reported in the databases. So recognition of large amount of unprocessed data using conventional methods is very difficult. Therefore, the concept of intelligence was incorporated with computational model. Intelligence based computational model obtained reasonable results, but the gap of improvement is still under consideration. In this regard, an intelligent automatic recognition model is developed in order to enhance the true classification rate of sub-Golgi proteins. In this approach, discrete and evolutionary feature extraction methods are applied on the benchmark Golgi protein datasets to excerpt salient, propound and variant numerical descriptors. After that, an oversampling technique Syntactic Minority over Sampling Technique is employed to balance the data. Hybrid spaces are also generated with combination of these feature spaces. Further, Fisher feature selection method is utilized to reduce the extra noisy and redundant features from feature vector. Finally, k-nearest neighbor algorithm is used as learning hypothesis. Three distinct cross validation tests are used to examine the stability and efficiency of the proposed model. The predicted outcomes of proposed model are better than the existing models in the literature so far. Finally, it is anticipated that the proposed model will provide the foundation to pharmaceutical industry in drug design and research community to innovate new ideas in the area of computational biology and bioinformatics.