An algorithm for identifying protein complexes based on the integration of PPI network and gene expression

Cited by: 2

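Authors: Li Min (李敏)[1], Wu Xuehong (武学鸿)[1], Fei Yaoping (费耀平)[1]

Affiliation: [1] School of Information Science and Engineering, Central South University, Changsha 410083

Source: Systems Engineering-Theory & Practice (《系统工程理论与实践》), 2014, No. 2, pp. 437-443 (7 pages)

Funding: National Natural Science Foundation of China (61003124; 61370024); Program for New Century Excellent Talents in University, Ministry of Education (NCET-12-0547)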

Abstract: Identifying protein complexes from large-scale protein interaction networks is important for explaining specific biological processes and predicting protein functions, and is one of the most important research topics of the post-genomic era. Traditional protein complex identification algorithms rely only on the protein-protein interaction network (PPI network) and are not very reliable. This paper proposes a new algorithm, IPCIPG, that integrates the PPI network with gene expression data. Unlike previous methods, which use gene expression data to evaluate the reliability of PPIs, IPCIPG combines the PPI network and the gene expression data during the identification of protein complexes itself. IPCIPG first computes the weight of each node in the PPI network from the edge clustering coefficient (ECC) and the co-expression correlation between proteins (PCC). The node with the highest weight is selected as the seed, and a dense subgraph is then generated by extending from the seed. Experimental results on Saccharomyces cerevisiae (yeast) data show that IPCIPG identifies protein complexes with specific biological meaning more effectively, precisely, and comprehensively than the algorithms HUNTER, HC-PIN, CMC, SPICI, MCODE, and MCL.
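
The abstract names the ingredients of the seed-and-extend procedure but gives no formulas, so the following Python sketch only illustrates the general idea and should not be read as the authors' implementation. Several things are assumptions made for the example: ECC is taken as the number of triangles through an edge divided by min(d_u - 1, d_v - 1); PCC is taken to be the Pearson correlation of two expression profiles; a node's weight is the sum of ECC × PCC over its incident edges; and the seed is extended greedily while the subgraph density stays at or above a hypothetical threshold of 0.7. All function names and the toy data are likewise illustrative.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length expression profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx > 0 and sy > 0 else 0.0


def ecc(adj, u, v):
    """ASSUMED ECC form: triangles through (u, v) / min(d_u - 1, d_v - 1)."""
    denom = min(len(adj[u]) - 1, len(adj[v]) - 1)
    return len(adj[u] & adj[v]) / denom if denom > 0 else 0.0


def node_weights(adj, expr):
    """ASSUMED combination: weight(u) = sum over neighbours v of ECC * PCC."""
    return {
        u: sum(ecc(adj, u, v) * pearson(expr[u], expr[v]) for v in adj[u])
        for u in adj
    }


def density(adj, nodes):
    """Fraction of possible edges that are present among `nodes`."""
    n = len(nodes)
    if n < 2:
        return 1.0
    edges = sum(1 for a, b in combinations(nodes, 2) if b in adj[a])
    return edges / (n * (n - 1) / 2)


def grow_from_seed(adj, seed, min_density=0.7):
    """ASSUMED extension rule: greedily add the neighbour that keeps the
    subgraph densest; stop when density would fall below min_density."""
    cluster = {seed}
    while True:
        frontier = set().union(*(adj[u] for u in cluster)) - cluster
        best = max(sorted(frontier),
                   key=lambda c: density(adj, cluster | {c}),
                   default=None)
        if best is None or density(adj, cluster | {best}) < min_density:
            return cluster
        cluster.add(best)


# Toy PPI network: a triangle A-B-C with a tail C-D-E (illustrative data only).
adj = {
    "A": {"B", "C"},
    "B": {"A", "C"},
    "C": {"A", "B", "D"},
    "D": {"C", "E"},
    "E": {"D"},
}
# Toy expression profiles, one list of measurements per protein.
expr = {
    "A": [1, 2, 3, 4],
    "B": [1, 2, 3, 5],
    "C": [1, 3, 2, 4],
    "D": [4, 3, 2, 1],
    "E": [1, 3, 2, 4],
}

weights = node_weights(adj, expr)
seed = max(weights, key=weights.get)   # node with the highest weight
cluster = grow_from_seed(adj, seed)    # dense subgraph grown from the seed
print(seed, sorted(cluster))
```

On this toy network the triangle is recovered as the predicted complex, because adding D would drop the density to 4/6, below the assumed threshold. A full implementation would also need a rule for removing or down-weighting already-clustered nodes before choosing the next seed, which the abstract does not specify.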

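Keywords: systems biology; protein-protein interaction network; protein complex; gene expression data

Classification: TP391 [Automation and Computer Technology: Computer Application Technology]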

 
