Prediction on protein crystallization by ensembling multiple heterogeneous classifiers


Author: Liang Liang[1] (Institute of Network and Communication Technology, Sichuan Normal University, Chengdu 610066, China)

Affiliation: [1] Institute of Network and Communication Technology, Sichuan Normal University, Chengdu, Sichuan 610066, China

Source: Journal of Nanjing University of Science and Technology, 2021, No. 5, pp. 582-588 (7 pages)

Funding: National Natural Science Foundation of China (61871089).

Abstract: X-ray crystallography is one of the most important methods for determining protein molecular structure. Accurate prediction of protein crystallization propensity is important for improving the success rate of structure determination based on X-ray crystallography. This paper proposes a method based on an ensemble of multiple heterogeneous classifiers to further improve the accuracy of protein crystallization propensity prediction. First, four types of sequence-based features are extracted from the protein sequence and combined into a discriminative feature vector: amino acid composition, pseudo amino acid composition, pseudo position-specific scoring matrix, and pseudo-predicted solvent accessibility. Then, multiple heterogeneous classifiers, namely a support vector machine, a radial basis function network, and random forests, are trained on this feature space and combined into an ensemble. On a publicly available benchmark dataset, the proposed method achieves Matthews correlation coefficient (MCC) values of 0.64 under five-fold cross-validation on the training set and 0.73 on the independent test set. Comparison with existing sequence-based protein crystallization prediction methods further demonstrates the effectiveness of the proposed method.
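The pipeline outlined in the abstract (sequence features, several classifiers, an MCC score) can be illustrated with a minimal standard-library sketch. It is not the paper's implementation. It computes only the simplest of the four feature types, amino acid composition. The abstract does not state how the classifier outputs are combined, so the majority vote here is an assumption, and the function names `amino_acid_composition`, `majority_vote`, and `matthews_corrcoef` are hypothetical. The MCC formula itself is the standard one for binary labels.

```python
from collections import Counter
from math import sqrt

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def amino_acid_composition(sequence):
    """Return a 20-dimensional vector of residue frequencies.

    Characters outside the 20 standard residues are ignored.
    """
    counts = Counter(ch for ch in sequence.upper() if ch in AMINO_ACIDS)
    total = sum(counts.values())
    if total == 0:
        return [0.0] * len(AMINO_ACIDS)
    return [counts[aa] / total for aa in AMINO_ACIDS]


def majority_vote(predictions):
    """Combine 0/1 labels from several classifiers (one list per classifier).

    Assumption: simple majority voting; the paper may use another rule.
    """
    n_classifiers = len(predictions)
    return [1 if 2 * sum(votes) > n_classifiers else 0
            for votes in zip(*predictions)]


def matthews_corrcoef(y_true, y_pred):
    """Matthews correlation coefficient for binary 0/1 labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # By convention, return 0 when any marginal count is zero.
    return 0.0 if denom == 0 else (tp * tn - fp * fn) / denom


if __name__ == "__main__":
    # Toy labels standing in for the outputs of three trained classifiers.
    svm_out = [1, 0, 1, 0]
    rbf_out = [1, 1, 0, 0]
    rf_out = [0, 0, 1, 0]
    ensemble = majority_vote([svm_out, rbf_out, rf_out])
    print(ensemble, matthews_corrcoef([1, 0, 1, 1], ensemble))
```

In practice the three base learners would be trained on the full combined feature vector rather than on toy labels. A vote over an odd number of classifiers avoids ties.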

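Keywords: protein crystallization propensity; feature extraction; radial basis function neural network; support vector machine; random forest; classifier ensemble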

Classification code: TP391.4 [Automation and Computer Technology: Computer Application Technology]

 
