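Affiliations: [1] Department of Computer Science, Hengyang Normal University, Hengyang, Hunan 421002 [2] Department of Mathematics and Computational Science, Hengyang Normal University, Hengyang, Hunan 421002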
Source: Computers and Applied Chemistry (《计算机与应用化学》), 2013, No. 9, pp. 1038-1042 (5 pages)
Funding: Natural Science Foundation of Hunan Province (12JJ4058); Scientific Research Fund of Hengyang Normal University (09A36)
Abstract: Traditional stochastic grammar models for RNA secondary structure prediction require a sufficiently large sample of related sequences, which limits their practical use. To exploit the large number of unlabeled RNA sequences available for structure prediction, semi-supervised learning is incorporated into the stochastic grammar model, with a small number of labeled RNA samples and a large number of unlabeled samples together forming the training set. A semi-supervised prediction model based on the EM algorithm is designed, in which a generative SCFG (stochastic context-free grammar) model serves as the classifier. Through training, the model labels the unlabeled RNA sequences and gradually merges the newly labeled sequences into the labeled sample set. It can also adjust the relative proportions of labeled and unlabeled samples, and it finally outputs the structure label sequence. Experiments on several sequence sets that mix labeled and unlabeled RNA sequences show that the method makes effective use of unlabeled sequence data, greatly reduces the number of labeled sequence samples required, and improves prediction accuracy. The effect of the number of unlabeled sequences added on the model's prediction performance is also measured.
Classification number: TP301.2 [Automation and Computer Technology: Computer System Architecture]
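
The training procedure outlined in the abstract (fit a generative model on the labeled samples, label the unlabeled sequences, merge them step by step into the labeled set, and control how much the unlabeled samples count) can be illustrated with a minimal sketch. This is not the paper's implementation: the SCFG classifier is replaced here by a toy class-conditional nucleotide-frequency model, the loop is plain self-training with hard labels rather than full EM, and every name (`train`, `posterior`, `self_train`, `unlabeled_weight`, `batch`) is hypothetical.

```python
import math
from collections import Counter, defaultdict

ALPHABET = "ACGU"  # assumption: sequences contain only these four symbols


def train(samples):
    """Fit a weighted class-conditional nucleotide model (toy stand-in for an SCFG).

    samples: list of (sequence, label, weight) triples.
    Returns {label: (log_prior, {nucleotide: log_probability})}.
    """
    # Start every count at 1.0 (add-one smoothing) so no probability is zero.
    counts = defaultdict(lambda: Counter({c: 1.0 for c in ALPHABET}))
    prior = Counter()
    for seq, label, weight in samples:
        prior[label] += weight
        for ch in seq:
            counts[label][ch] += weight
    total = sum(prior.values())
    model = {}
    for label in prior:
        n = sum(counts[label].values())
        emissions = {c: math.log(counts[label][c] / n) for c in ALPHABET}
        model[label] = (math.log(prior[label] / total), emissions)
    return model


def posterior(model, seq):
    """Return the normalized posterior over labels for one sequence."""
    scores = {
        label: log_prior + sum(emissions[ch] for ch in seq)
        for label, (log_prior, emissions) in model.items()
    }
    top = max(scores.values())  # subtract the maximum for numerical stability
    z = sum(math.exp(s - top) for s in scores.values())
    return {label: math.exp(s - top) / z for label, s in scores.items()}


def self_train(labeled, unlabeled, unlabeled_weight=0.5, batch=2, max_rounds=10):
    """Gradually merge confidently labeled sequences into the training pool.

    unlabeled_weight scales the contribution of self-labeled samples, which is
    one simple way to adjust the labeled/unlabeled balance.
    """
    pool = [(seq, label, 1.0) for seq, label in labeled]
    remaining = list(unlabeled)
    model = train(pool)
    for _ in range(max_rounds):
        if not remaining:
            break
        scored = []
        for seq in remaining:
            post = posterior(model, seq)
            best = max(post, key=post.get)
            scored.append((post[best], seq, best))
        scored.sort(reverse=True)  # most confident predictions first
        for _, seq, label in scored[:batch]:
            pool.append((seq, label, unlabeled_weight))
            remaining.remove(seq)
        model = train(pool)  # refit on the enlarged pool
    return model, pool
```

Calling `self_train([("GGGGCCCC", "stem"), ("AAAAUUUA", "loop")], ["GGCCGGCC", "AUAUAAUU"])` returns the refitted model together with the enlarged pool. The labels `"stem"` and `"loop"` are placeholders; a structure predictor as described in the abstract would emit a structure label sequence per input rather than a single class.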