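Predicting Protein Subcellular Localization Using Pseudo Amino Acid Composition That Incorporates Protein Evolutionary Conservation (in English)  Cited by: 2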

USING PSEUDO AMINO ACID COMPOSITION TO PREDICT PROTEIN SUBCELLULAR LOCALIZATION: APPROACHED BY INCORPORATING EVOLUTIONARY CONSERVATION INFORMATION


Authors: LI Lizhen (李利珍)[1], DONG Zimei (董自梅)[1,2]

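Affiliations: [1] Experiment and Training Center, Puyang Vocational and Technical College, Puyang, Henan 45700; [2] College of Life Sciences, Henan Normal University, Xinxiang, Henan 453007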

Source: Acta Biophysica Sinica (《生物物理学报》), 2009, No. 2, pp. 125-132 (8 pages)

Funding: Supported by a grant from the Young College Teachers Projects in Henan Province (2007-335)

Abstract: Information on the subcellular locations of proteins is important because it provides insight into their functions, as well as how and in what kind of cellular environments they interact with each other and with other molecules. Knowledge of protein localization within cellular compartments also helps in understanding the intricate pathways that regulate biological processes at the cellular level. With the explosion of newly generated protein sequences in the post-genomic era, automated methods that annotate subcellular locations quickly and reliably are increasingly needed. Here, a novel approach is developed by incorporating protein evolutionary conservation information. Based on the concept of Chou's pseudo amino acid composition (PseAAC), a per-residue conservation score is calculated with an improved evolutionary conservation algorithm, so that each protein can be represented as a feature vector built from wavelet multi-scale energy (MSE). In addition, each protein is represented by feature vectors from three other extraction methods: amino acid composition (AAC), the weighted auto-correlation function, and moment descriptors. The feature vectors of all protein sequences are then input into multi-class support vector machines to predict 12 kinds of subcellular locations, and the results of the four kinds of feature classifiers are fused through a product rule system. Compared with results reported by previous investigators, higher success rates were obtained in both the jackknife cross-validation test and the independent dataset test, suggesting that incorporating protein evolutionary conservation information and fusing multi-feature classifiers is effective for predicting protein subcellular localization and may complement existing methods.
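The abstract names two steps that are simple enough to illustrate: computing the amino acid composition (AAC) feature, and fusing the outputs of several classifiers with a product rule. The following is a minimal sketch, not the authors' implementation. It assumes that each classifier outputs one probability per class and that fusion multiplies those probabilities class by class; the abstract does not give these details. The function names and the example probabilities are hypothetical.

```python
from math import prod

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def amino_acid_composition(sequence):
    """Return the 20-dimensional AAC vector (fraction of each standard residue)."""
    seq = sequence.upper()
    counts = [seq.count(aa) for aa in AMINO_ACIDS]
    total = sum(counts)  # non-standard letters such as X are ignored
    if total == 0:
        raise ValueError("sequence contains no standard amino acids")
    return [c / total for c in counts]


def product_rule_fusion(classifier_probs):
    """Fuse classifier outputs by multiplying the class probabilities.

    classifier_probs: one row per classifier, one column per class
    (assumed here to be probability estimates from each classifier).
    Returns (index of the winning class, normalized fused scores).
    """
    n_classes = len(classifier_probs[0])
    if any(len(row) != n_classes for row in classifier_probs):
        raise ValueError("all classifiers must score the same classes")
    fused = [prod(row[k] for row in classifier_probs) for k in range(n_classes)]
    norm = sum(fused)
    if norm > 0:
        fused = [f / norm for f in fused]
    best = max(range(n_classes), key=fused.__getitem__)
    return best, fused


# Hypothetical outputs of four feature classifiers over three classes.
example_probs = [
    [0.6, 0.3, 0.1],
    [0.2, 0.7, 0.1],
    [0.5, 0.4, 0.1],
    [0.4, 0.5, 0.1],
]
best_class, fused_scores = product_rule_fusion(example_probs)
```

Because the product rule multiplies scores, a single classifier that assigns a class a probability near zero effectively vetoes that class, which is why this rule is usually applied to calibrated probability estimates rather than raw decision values.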

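Keywords: evolutionary information; multi-scale energy; weighted auto-correlation function; moment descriptor; fusion; subcellular localization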

Classification code: Q617 [Biology: Biophysics]

 
