Authors: XU Zhoubo[1], YANG Jian[1], LIU Huadong[1,2], HUANG Wenwen
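Affiliations: [1] Guangxi Key Laboratory of Trusted Software (Guilin University of Electronic Technology), Guilin, Guangxi 541004, China; [2] School of Mechanical and Electrical Engineering, Guilin University of Electronic Technology, Guilin, Guangxi 541004, China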
Source: Journal of Computer Applications (《计算机应用》), 2020, No. 5, pp. 1510-1514 (5 pages)
Funding: National Natural Science Foundation of China (61762027); Guangxi Natural Science Foundation (2017GXNSFAA198172).
Abstract: Protein-protein interaction (PPI) networks contain a large amount of uncertainty, and the data on known protein complexes are incomplete. As a result, methods that search using topological structure information alone, or that apply supervised learning only to known complexes, fall short in the accuracy with which they identify protein complexes. To address this, a search method that combines an XGBoost model with the topological structure information of complexes, called XGBoost model for Predicting protein complex (XGBP), was proposed. First, features were extracted from the topological structure information of complexes. Then, the extracted features were used to train an XGBoost model. Finally, topological structure information and the supervised learning method were combined to build a mapping between features and protein complexes, improving the accuracy of protein complex prediction. The proposed method was compared with eight popular unsupervised algorithms: Markov CLustering (MCL), Clustering based on Maximal Clique (CMC), Core-Attachment based method (COACH), Fast Hierarchical clustering algorithm for functional modules discovery in Protein Interaction networks (HC-PIN), Cluster with Overlapping Neighborhood Expansion (ClusterONE), Molecular COmplex DEtection (MCODE), Detecting Complex based on Uncertain graph model (DCU) and Weighted COACH (WCOACH). It was also compared with three supervised methods: Bayesian Network (BN), Support Vector Machine (SVM) and Regression Model (RM). The results show that the proposed method performs well in terms of precision, sensitivity and F-measure.
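The abstract describes the XGBP pipeline only at a high level: extract topological features of candidate complexes, train an XGBoost model on them, and evaluate by precision, sensitivity and F-measure. The Python sketch below illustrates the first and last of these steps under stated assumptions. The feature set (size, edge count, density, degree statistics, mean clustering coefficient) and the helper names `subgraph_features`, `overlap_score` and `precision_sensitivity_f` are hypothetical and not taken from the paper. The matching rule (overlap score of at least 0.2) is a convention common in protein-complex evaluation and may differ from the definitions the authors used. The XGBoost training step is omitted to keep the example dependency-free; the feature vectors could, for instance, be passed to `xgboost.XGBClassifier` together with labels marking which candidates match known complexes.

```python
from itertools import combinations
from statistics import mean


def subgraph_features(adj, nodes):
    """Hypothetical topological feature vector for one candidate complex.

    adj   -- dict mapping each protein to the set of its PPI neighbours
             (undirected graph, so adjacency is assumed symmetric)
    nodes -- proteins forming the candidate complex (a subgraph)
    Returns [size, edges, density, mean degree, max degree, mean clustering].
    """
    nodes = set(nodes)
    n = len(nodes)
    if n == 0:
        return [0, 0, 0.0, 0.0, 0, 0.0]
    # Degree of each protein counted only inside the candidate subgraph.
    inner_deg = {v: len(adj.get(v, set()) & nodes) for v in nodes}
    m = sum(inner_deg.values()) // 2  # each edge is counted from both ends
    density = 2 * m / (n * (n - 1)) if n > 1 else 0.0

    def local_clustering(v):
        # Fraction of pairs of v's in-subgraph neighbours that are linked.
        nbrs = adj.get(v, set()) & nodes
        k = len(nbrs)
        if k < 2:
            return 0.0
        links = sum(1 for a, b in combinations(nbrs, 2) if b in adj.get(a, set()))
        return 2 * links / (k * (k - 1))

    return [
        n,
        m,
        density,
        float(mean(list(inner_deg.values()))),
        max(inner_deg.values()),
        float(mean([local_clustering(v) for v in nodes])),
    ]


def overlap_score(a, b):
    """Neighbourhood-affinity overlap |A & B|^2 / (|A| * |B|) of two complexes."""
    a, b = set(a), set(b)
    if not a or not b:
        return 0.0
    inter = len(a & b)
    return inter * inter / (len(a) * len(b))


def precision_sensitivity_f(predicted, reference, threshold=0.2):
    """Match-based precision, sensitivity and F-measure (assumed convention)."""
    def matched(xs, ys):
        # Count items in xs that overlap some item in ys above the threshold.
        return sum(1 for x in xs if any(overlap_score(x, y) >= threshold for y in ys))

    precision = matched(predicted, reference) / len(predicted) if predicted else 0.0
    sensitivity = matched(reference, predicted) / len(reference) if reference else 0.0
    total = precision + sensitivity
    f_measure = 2 * precision * sensitivity / total if total else 0.0
    return precision, sensitivity, f_measure


if __name__ == "__main__":
    # Toy PPI graph: triangle A-B-C with D attached to C.
    ppi = {"A": {"B", "C"}, "B": {"A", "C"}, "C": {"A", "B", "D"}, "D": {"C"}}
    print(subgraph_features(ppi, {"A", "B", "C"}))  # [3, 3, 1.0, 2.0, 2, 1.0]
    # Precision 0.5, sensitivity 1.0, F-measure about 0.667.
    print(precision_sensitivity_f([{"A", "B", "C"}, {"X", "Y"}], [{"A", "B", "C", "D"}]))
```

In this sketch the F-measure is the harmonic mean of precision and sensitivity, so a method scores well only if it both avoids spurious predicted complexes and recovers most reference complexes.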
Keywords: protein complex; XGBoost model; protein-protein interaction network; graph data mining; machine learning
CLC number: TP399 [Automation and Computer Technology / Computer Application Technology]