Authors: Bin Yu, Cheng Chen, Hongyan Zhou, Bingqiang Liu, Qin Ma
Affiliations: [1] School of Life Sciences, University of Science and Technology of China, Hefei 230027, China; [2] College of Mathematics and Physics, Qingdao University of Science and Technology, Qingdao 266061, China; [3] Artificial Intelligence and Biomedical Big Data Research Center, Qingdao University of Science and Technology, Qingdao 266061, China; [4] School of Mathematics, Shandong University, Jinan 250100, China; [5] Department of Biomedical Informatics, College of Medicine, The Ohio State University, Columbus, OH 43210, USA
Source: Genomics, Proteomics & Bioinformatics, 2020, Issue 5, pp. 582-592 (11 pages). Chinese journal title: 基因组蛋白质组与生物信息学报 (English edition)
Funding: This work was supported by the National Natural Science Foundation of China (Grant No. 61863010), the Key Research and Development Program of Shandong Province of China (Grant No. 2019GGX101001), and the Natural Science Foundation of Shandong Province of China (Grant No. ZR2018MC007).
Abstract: Protein-protein interactions (PPIs) are of great importance for understanding genetic mechanisms, delineating disease pathogenesis, and guiding drug design. With the growth of PPI data and the development of machine learning technologies, the prediction and identification of PPIs have become a research hotspot in proteomics. In this study, we propose a new prediction pipeline for PPIs based on gradient tree boosting (GTB). First, the initial feature vector is extracted by fusing pseudo amino acid composition (PseAAC), pseudo position-specific scoring matrix (PsePSSM), reduced sequence and index-vectors (RSIV), and autocorrelation descriptor (AD). Second, to remove redundancy and noise, we employ L1-regularized logistic regression (L1-RLR) to select an optimal feature subset. Finally, the GTB-PPI model is constructed. Five-fold cross-validation showed that GTB-PPI achieved accuracies of 95.15% and 90.47% on the Saccharomyces cerevisiae and Helicobacter pylori datasets, respectively. In addition, GTB-PPI could be applied to predict the independent test datasets for Caenorhabditis elegans, Escherichia coli, Homo sapiens, and Mus musculus, the one-core PPI network for CD9, and the crossover PPI network for the Wnt-related signaling pathways. The results show that GTB-PPI can significantly improve the accuracy of PPI prediction. The code and datasets of GTB-PPI can be downloaded from https://github.com/QUST-AIBBDRC/GTB-PPI/.
Keywords: Protein-protein interaction; Feature fusion; L1-regularized logistic regression; Gradient tree boosting; Machine learning
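
Illustrative sketch: the abstract names gradient tree boosting evaluated by five-fold cross-validation as the core of the pipeline. The following is a minimal, standard-library-only Python sketch of that idea, not the authors' implementation (their code is at the GitHub URL given in the abstract). It assumes depth-1 trees (stumps) as base learners and the logistic loss, and it omits the feature fusion and L1-RLR selection steps entirely. The two-feature toy data, all function names, and the hyperparameter values are hypothetical and chosen only for illustration.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_stump(X, residuals):
    """Fit a depth-1 regression tree to the residuals by least squares."""
    best = None  # (squared error, feature, threshold, left value, right value)
    for j in range(len(X[0])):
        values = sorted(set(row[j] for row in X))
        # Candidate thresholds are midpoints between consecutive distinct values.
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2.0
            left = [r for row, r in zip(X, residuals) if row[j] <= t]
            right = [r for row, r in zip(X, residuals) if row[j] > t]
            lm, rm = sum(left) / len(left), sum(right) / len(right)
            err = sum((r - lm) ** 2 for r in left) + sum((r - rm) ** 2 for r in right)
            if best is None or err < best[0]:
                best = (err, j, t, lm, rm)
    if best is None:  # every feature is constant: predict the mean residual
        m = sum(residuals) / len(residuals)
        return (0, float("inf"), m, m)
    return best[1:]


def stump_predict(stump, row):
    j, t, left_value, right_value = stump
    return left_value if row[j] <= t else right_value


def fit_gtb(X, y, n_trees=30, learning_rate=0.3):
    """Gradient boosting on the logistic loss with stumps as base learners."""
    p = min(max(sum(y) / len(y), 1e-6), 1 - 1e-6)
    f0 = math.log(p / (1 - p))  # initial score: log-odds of the positive class
    scores = [f0] * len(X)
    stumps = []
    for _ in range(n_trees):
        # Negative gradient of the log loss with respect to the score.
        residuals = [yi - sigmoid(s) for yi, s in zip(y, scores)]
        stump = fit_stump(X, residuals)
        stumps.append(stump)
        scores = [s + learning_rate * stump_predict(stump, row)
                  for s, row in zip(scores, X)]
    return f0, learning_rate, stumps


def predict(model, row):
    f0, learning_rate, stumps = model
    score = f0 + learning_rate * sum(stump_predict(s, row) for s in stumps)
    return 1 if score > 0 else 0


def five_fold_accuracy(X, y, seed=0):
    """Shuffle once, split into five folds, and pool held-out predictions."""
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)
    correct = 0
    for k in range(5):
        held_out = set(idx[k::5])
        train = [i for i in idx if i not in held_out]
        model = fit_gtb([X[i] for i in train], [y[i] for i in train])
        correct += sum(predict(model, X[i]) == y[i] for i in held_out)
    return correct / len(X)


# Hypothetical toy data: feature 0 separates the classes, feature 1 is noise.
rng = random.Random(42)
X, y = [], []
for i in range(100):
    label = i % 2
    x0 = rng.uniform(0.6, 1.0) if label else rng.uniform(0.0, 0.4)
    X.append([x0, rng.random()])
    y.append(label)

acc = five_fold_accuracy(X, y)
print(f"five-fold accuracy on toy data: {acc:.2f}")
```

In a real setting, each row of `X` would be the fused and selected feature vector for one protein pair and `y` the interaction label. A maintained library implementation of gradient boosting would normally replace the hand-written stumps used here.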