Authors: 吴疆 (WU Jiang) [1]; 董婷 (DONG Ting) [1]; 蒋平 (JIANG Ping)
Affiliations: [1] Department of Information Engineering, Yulin University, Yulin, Shaanxi 719000, China; [2] School of Computer Science and Technology, Xidian University, Xi'an, Shaanxi 710071, China
Source: Microcomputer Applications (《微型电脑应用》), 2020, No. 8, pp. 5-8 (4 pages)
Funding: National Natural Science Foundation of China (51864046); Shaanxi Provincial Department of Science and Technology project (2019NY-182).
Abstract: This study predicts protein structural classes with the Laplace support vector machine (LapSVM), a semi-supervised learning method. First, seven physicochemical property parameters of amino acids are used to convert each protein sequence into numeric sequences. The auto covariance (AC) transformation, which describes the interactions between amino acid residues separated by a given distance along the sequence, then maps these numeric sequences to vectors of uniform length, and these vectors form the feature space of the samples. Second, 20, 50, 80, 110, 140, and 170 samples are randomly selected from the dataset as unlabelled samples to construct the training sets, and the one-against-all decomposition strategy and the leave-one-out method are used to evaluate the predictive ability of the LapSVM model. The classifier reaches a prediction accuracy of 94.12% on the protein samples, which compares favorably with the 90.69% obtained by the standard support vector machine (SVM). The results indicate that the distribution information of unlabelled samples, supplied as weak rules, can improve the classifier's predictive performance. The work also suggests a new idea: applying a semi-supervised method to a fully supervised learning problem, which yields a smaller optimization problem and better predictive ability.
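The auto covariance step in the abstract can be illustrated with a minimal sketch. The record does not give the paper's formula or its seven property scales, so this assumes the commonly used definition AC(lag, j) = 1/(L - lag) * sum over i of (P[i][j] - mean_j) * (P[i+lag][j] - mean_j), where L is the sequence length. The function name `auto_covariance`, the parameter `max_lag`, and the two toy property tables are hypothetical and carry no real physicochemical values.

```python
from statistics import fmean


def auto_covariance(sequence, property_tables, max_lag):
    """Map one sequence to a vector of length len(property_tables) * max_lag.

    Assumed definition (not taken from the paper): for each property and
    each lag, average the product of mean-centered property values over
    all residue pairs that are `lag` positions apart.
    """
    if max_lag >= len(sequence):
        raise ValueError("max_lag must be smaller than the sequence length")
    features = []
    for table in property_tables:
        # Replace each residue letter by its value for this property.
        values = [table[residue] for residue in sequence]
        mean = fmean(values)
        centered = [v - mean for v in values]
        for lag in range(1, max_lag + 1):
            n_pairs = len(centered) - lag
            total = sum(centered[i] * centered[i + lag] for i in range(n_pairs))
            features.append(total / n_pairs)
    return features


# Hypothetical toy tables over four residue letters (illustrative values only).
TOY_TABLES = [
    {"A": 1.0, "G": -1.0, "L": 2.0, "K": -2.0},
    {"A": 0.5, "G": 0.0, "L": 1.5, "K": -0.5},
]

short_vec = auto_covariance("AGLKA", TOY_TABLES, max_lag=3)
long_vec = auto_covariance("AGLKAGLKKLGA", TOY_TABLES, max_lag=3)
print(len(short_vec), len(long_vec))
```

Both calls return vectors of the same length even though the sequences differ in length, which is the property that lets sequences of varying length share one feature space.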
Keywords: semi-supervised learning; protein structural classes; Laplace support vector machine; auto covariance transformation
Classification: TP391 [Automation and Computer Technology: Computer Application Technology]