Gene function analysis of semi-supervised multi-label learning

Cited by: 5


Authors: Chen Xiaofeng[1], Wang Shitong[1], Cao Suqun[1]

Affiliation: [1] School of Information Engineering, Jiangnan University, Wuxi, Jiangsu 214122, China

Source: CAAI Transactions on Intelligent Systems (《智能系统学报》), 2008, No. 1, pp. 83-90 (8 pages)

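Funding: National "863" Program (2006AA10Z313); National Natural Science Foundation of China (60773206/F020106, 60704047/F030304); National Defense Applied Basic Research Fund (A1420461266); Ministry of Education Cross-Century Excellent Talents Support Program (NCET-04-0496); Ministry of Education Key Scientific Research Fund (105087)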

Abstract: Conventional machine learning mainly addresses single-label learning, in which every sample has exactly one label. In bioinformatics, however, a gene usually has one or more functions and therefore one or more labels, so multi-label learning can identify the functions of biologically related gene groups more effectively than conventional learning approaches. Current research focuses mainly on supervised multi-label learning; semi-supervised multi-label learning, which learns from both labeled and unlabeled gene expression data, remains an open problem. This paper presents SML_SVM, a semi-supervised multi-label learning algorithm for gene function analysis. First, SML_SVM uses the PT4 method to transform the semi-supervised multi-label learning problem into semi-supervised single-label learning problems. It then estimates the labels of the unlabeled samples using the maximum a posteriori (MAP) principle in combination with the K-nearest neighbor method. Finally, it solves the resulting single-label learning problems with SVM. The distinctive characteristic of the proposed algorithm is its integration of SVM-based single-label learning with the MAP and K-nearest neighbor methods. Experiments on a yeast gene expression dataset and the genbase protein dataset show that SML_SVM outperforms both the PT4-based MLSVM and self-training MLSVM.
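The first two steps described in the abstract can be sketched roughly as follows. This is only an illustrative sketch, not the authors' implementation: it assumes that PT4 denotes the binary-relevance transformation (one binary problem per label), and that the MAP step resembles a per-label rule over counts of positive neighbors with Laplace smoothing. The function names (`pt4_transform`, `map_knn_label`, `pseudo_label`), the smoothing constant `s`, and the toy data are all hypothetical. The final SVM step is omitted; each resulting binary problem could be handed to any SVM implementation.

```python
from collections import Counter
from math import dist


def pt4_transform(Y, n_labels):
    """Assumed PT4 (binary relevance): one 0/1 target column per label."""
    return [[1 if l in labels else 0 for labels in Y] for l in range(n_labels)]


def knn_indices(x, X, k, skip=None):
    """Indices of the k rows of X nearest to x (Euclidean), optionally skipping one row."""
    order = sorted((i for i in range(len(X)) if i != skip),
                   key=lambda i: dist(x, X[i]))
    return order[:k]


def map_knn_label(X_lab, y, X_unl, k=3, s=1.0):
    """Estimate one binary label for each row of X_unl with a MAP rule over k-NN counts."""
    n = len(X_lab)
    # Smoothed prior probability that the label is present.
    prior1 = (s + sum(y)) / (2 * s + n)
    # c[b][j]: number of labeled samples with label value b whose k neighbors
    # (found leave-one-out) contain exactly j positives.
    c = {0: Counter(), 1: Counter()}
    for i in range(n):
        j = sum(y[t] for t in knn_indices(X_lab[i], X_lab, k, skip=i))
        c[y[i]][j] += 1
    preds = []
    for x in X_unl:
        j = sum(y[t] for t in knn_indices(x, X_lab, k))
        # Smoothed likelihoods P(j positive neighbors | label = b).
        like1 = (s + c[1][j]) / (s * (k + 1) + sum(c[1].values()))
        like0 = (s + c[0][j]) / (s * (k + 1) + sum(c[0].values()))
        # MAP decision: choose the label value with the larger posterior.
        preds.append(1 if prior1 * like1 > (1 - prior1) * like0 else 0)
    return preds


def pseudo_label(X_lab, Y_lab, X_unl, n_labels, k=3):
    """Return one column of estimated binary labels for X_unl per label."""
    columns = pt4_transform(Y_lab, n_labels)
    return [map_knn_label(X_lab, y, X_unl, k) for y in columns]


# Toy data (hypothetical): two clusters; label 0 is everywhere, label 1 only in the second.
X_lab = [[0, 0], [0, 1], [1, 0], [1, 1], [5, 5], [5, 6], [6, 5], [6, 6]]
Y_lab = [{0}] * 4 + [{0, 1}] * 4
X_unl = [[0.5, 0.5], [5.5, 5.5]]
estimated = pseudo_label(X_lab, Y_lab, X_unl, n_labels=2)
```

In this sketch the estimated columns for the unlabeled samples would be appended to the labeled ones before a binary SVM is trained for each label. Whether the original algorithm combines the two sets in exactly this way is not stated in the abstract.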

Keywords: semi-supervised; multi-label; self-training; support vector machine

Classification code: TP18 [Automation and Computer Technology: Control Theory and Control Engineering]

 
