Application of multiple support vector machine recursive feature elimination model in cancer feature gene selection  (times cited: 3)

Authors: Xu Wenbin, Xia Hong[1], Zheng Weiying[1], Hua Lin[1] (School of Biomedical Engineering, Capital Medical University, Beijing 100069, China)

Affiliation: [1] School of Biomedical Engineering, Capital Medical University, Beijing 100069

Source: International Journal of Biomedical Engineering, 2019, No. 1, pp. 33-38 (6 pages)

Funding: Beijing Natural Science Foundation (7142015).

Abstract:
Objective: To analyze cancer gene expression profile data with the multiple support vector machine recursive feature elimination algorithm (MSVM-RFE), compute gene ranking scores, and obtain the optimal feature gene subset.
Methods: Gene expression profiles of bladder cancer, breast cancer, colon cancer and lung cancer were downloaded from the GEO (Gene Expression Omnibus) high-throughput gene expression database, and differentially expressed genes were identified by differential expression analysis. The differentially expressed genes were ranked with the MSVM-RFE algorithm and the average test error of each gene subset was calculated; the subset with the minimum average test error was taken as the optimal gene subset for each of the four cancers. Linear SVM classifiers were then built on the datasets of the four cancers before and after feature gene selection to verify the classification performance of the optimal feature gene subsets.
Results: With the optimal feature gene subsets obtained by MSVM-RFE, the classification accuracy improved from (96.77±1.28)% to (99.85±0.46)% for bladder cancer, from (83.77±4.93)% to (88.30±3.85)% for breast cancer, and from (72.69±2.41)% to (90.21±3.31)% for lung cancer, while the accuracy for colon cancer remained high (>99.5%).
Conclusions: Feature gene extraction based on the MSVM-RFE algorithm can improve cancer classification performance to some extent.
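The abstract describes the MSVM-RFE pipeline only at a high level, so the following is a minimal sketch under stated assumptions, not the authors' implementation. It assumes one commonly used formulation of multiple SVM-RFE: in each round, linear SVMs are trained on several resampled training sets (here, the k cross-validation training folds), each weight vector is normalized to unit length, every remaining gene is scored by the mean divided by the standard deviation of its squared normalized weight across those SVMs, and the lowest-scoring gene is eliminated. It then picks the top-m subset with the lowest average test error, as the abstract describes. The gradient-descent SVM solver, the round-robin folds, the hyperparameters and all function names (`train_linear_svm`, `msvm_rfe_rank`, `cv_error`, `best_subset`) are hypothetical, and the differential-expression pre-filtering step is not shown.

```python
import random
import statistics


def train_linear_svm(X, y, lam=0.01, lr=0.05, epochs=200):
    """Fit a linear SVM (L2-regularized squared hinge loss) by full-batch
    gradient descent. A toy stand-in for a dedicated solver such as LIBSVM."""
    n, p = len(X), len(X[0])
    w, b = [0.0] * p, 0.0
    for _ in range(epochs):
        gw, gb = [lam * wj for wj in w], 0.0
        for xi, yi in zip(X, y):
            slack = 1.0 - yi * (sum(wj * xj for wj, xj in zip(w, xi)) + b)
            if slack > 0:  # only margin violators contribute to the gradient
                coef = -2.0 * slack * yi / n
                for j in range(p):
                    gw[j] += coef * xi[j]
                gb += coef
        w = [wj - lr * g for wj, g in zip(w, gw)]
        b -= lr * gb
    return w, b


def predict(w, b, x):
    return 1 if sum(wj * xj for wj, xj in zip(w, x)) + b >= 0 else -1


def folds_of(n, k):
    # Hypothetical round-robin fold assignment; shuffle samples in practice.
    return [i % k for i in range(n)]


def msvm_rfe_rank(X, y, k=5):
    """Return gene (column) indices ordered from most to least important."""
    fold = folds_of(len(X), k)
    remaining, eliminated = list(range(len(X[0]))), []
    while len(remaining) > 1:
        v_runs = []  # squared unit-normalized weights, one vector per training fold
        for f in range(k):
            idx = [i for i in range(len(X)) if fold[i] != f]
            w, _ = train_linear_svm([[X[i][j] for j in remaining] for i in idx],
                                    [y[i] for i in idx])
            norm = sum(wj * wj for wj in w) ** 0.5 or 1.0
            v_runs.append([(wj / norm) ** 2 for wj in w])
        # Assumed ranking score: mean / standard deviation of v across the k SVMs.
        scores = []
        for j in range(len(remaining)):
            col = [run[j] for run in v_runs]
            scores.append(statistics.mean(col) / (statistics.stdev(col) + 1e-12))
        worst = min(range(len(remaining)), key=scores.__getitem__)
        eliminated.append(remaining.pop(worst))
    return remaining + eliminated[::-1]  # the last survivor ranks first


def cv_error(X, y, features, k=5):
    """Average held-out misclassification rate of a linear SVM on `features`."""
    fold = folds_of(len(X), k)
    errs = []
    for f in range(k):
        tr = [i for i in range(len(X)) if fold[i] != f]
        te = [i for i in range(len(X)) if fold[i] == f]
        w, b = train_linear_svm([[X[i][j] for j in features] for i in tr],
                                [y[i] for i in tr])
        wrong = sum(predict(w, b, [X[i][j] for j in features]) != y[i] for i in te)
        errs.append(wrong / len(te))
    return statistics.mean(errs)


def best_subset(X, y, ranking, k=5):
    """Top-m genes with the minimum average test error (ties -> smaller m)."""
    errors = [cv_error(X, y, ranking[:m], k) for m in range(1, len(ranking) + 1)]
    m_best = errors.index(min(errors)) + 1
    return ranking[:m_best], errors


if __name__ == "__main__":
    rng = random.Random(0)
    y = [1 if i % 2 == 0 else -1 for i in range(40)]
    # Hypothetical toy data: gene 0 separates the classes, genes 1-4 are noise.
    X = [[2.0 * yi + rng.gauss(0, 0.3)] + [rng.gauss(0, 1) for _ in range(4)]
         for yi in y]
    ranking = msvm_rfe_rank(X, y)
    subset, errors = best_subset(X, y, ranking)
    print(ranking, subset, errors)
```

Dividing the mean weight by its standard deviation favors genes whose importance is stable across resampled training sets, which is the motivation usually given for using multiple SVMs instead of a single one. On real expression data with thousands of genes, one would typically use an optimized SVM library and eliminate genes in chunks rather than one per round.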

Keywords: gene expression profile; recursive feature elimination; support vector machine; feature gene

CLC number: R73-3 [Medicine and Health: Oncology]
