Prediction of protein thermostability with a k-nearest neighbors algorithm  Cited by: 1


Authors: Zhang Guangya (张光亚)[1], Li Hongchun (李红春)[1], Fang Baishan (方柏山)[1]

Affiliation: [1] Institute of Industrial Biotechnology, Huaqiao University, Quanzhou, Fujian 362021, China

Source: Computers and Applied Chemistry (《计算机与应用化学》), 2008, No. 1, pp. 39-41 (3 pages)

Funding: National "863" Program (2006AA020102); Research Fund of the Overseas Chinese Affairs Office of the State Council (No. 05Q0018).

Abstract: Predicting protein thermostability from primary structure information is important for the computational screening of thermostable proteins. In this work, k-nearest neighbors (kNN) classifiers were applied to discriminate thermophilic from mesophilic proteins starting from the amino acid sequence. Three methods, namely a self-consistency test, 5-fold cross-validation, and independent testing on a separate dataset, were used to evaluate the performance and robustness of the models. With only the composition of the 20 amino acids as feature variables, the overall accuracy reached 100%, 87.7%, and 89.6%, respectively. After 8 additional variables were introduced, the accuracy was 100%, 89.6%, and 90.2%, respectively, and the prediction accuracy for small proteins improved by 2.4%. The influence of protein size on prediction accuracy is also discussed.
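The approach summarized in the abstract (20-dimensional amino acid composition as features, classified by k-nearest neighbors) can be illustrated with a minimal sketch. This is not the authors' implementation: the distance metric (Euclidean), the value of k, the function names, and the toy sequences are all assumptions made for illustration, and the eight additional variables are omitted because the abstract does not specify them.

```python
import math
from collections import Counter

# The 20 standard amino acids, in a fixed order for the feature vector.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def composition(seq):
    """Return the 20-dimensional amino acid composition (fractions) of a sequence."""
    counts = Counter(ch for ch in seq.upper() if ch in AMINO_ACIDS)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sequence contains no standard amino acids")
    return [counts[a] / total for a in AMINO_ACIDS]


def knn_predict(train_X, train_y, x, k=3):
    """Classify x by majority vote among its k nearest training vectors.

    Euclidean distance is an assumption; the abstract does not name a metric.
    """
    neighbors = sorted(
        (math.dist(v, x), label) for v, label in zip(train_X, train_y)
    )
    votes = Counter(label for _, label in neighbors[:k])
    return votes.most_common(1)[0][0]


# Toy, made-up sequences (hypothetical; not real proteins and not the paper's data).
train_seqs = ["KKKEEEKKEE", "EEKKEKEKKE", "QQNNSSTTQN", "NNQQSTSTQQ"]
train_y = ["thermophilic", "thermophilic", "mesophilic", "mesophilic"]
train_X = [composition(s) for s in train_seqs]

if __name__ == "__main__":
    query = composition("KEKEKEKEKE")
    print(knn_predict(train_X, train_y, query, k=3))
```

In a real study the training set would hold labeled thermophilic and mesophilic protein sequences, and k would be chosen by cross-validation rather than fixed in advance.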

Keywords: k-nearest neighbors; protein thermostability; pattern recognition; computational screening

Classification: Q811 [Biology: Bioengineering]

 
