Application of Naive Bayesian Classifier Based on Maximum Relevance Minimum Redundancy Method  Cited by: 1


Authors: 陈江鹏[1], 彭斌[1], 文雯[1], 曾庆[1], 唐小静[1], 胡珊[1], 文小焱, 阙萍

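Affiliation: [1] School of Public Health and Management, Research Center for Medicine and Social Development, Collaborative Innovation Center for Social Risk Prediction and Governance in Health, Chongqing Medical University, 400016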

Source: Chinese Journal of Health Statistics (《中国卫生统计》), 2015, No. 6, pp. 932-934 (3 pages)

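Funding: National Natural Science Foundation of China (81373103); Basic and Frontier Research Program of the Chongqing Science and Technology Commission (cstc2013jcyjA10009)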

Abstract: Objective: To apply a naive Bayesian classifier (NBC) with maximum relevance minimum redundancy (MRMR) feature selection to gene expression data, and to compare it with the classical NBC and random forests (RF). Methods: The three methods were implemented in Matlab and R and compared on colon cancer and lung cancer gene expression datasets. 10-fold cross-validation was used to estimate the classification accuracy of the classical NBC and RF. Results: On the colon cancer dataset, MRMR-NBC reached a classification accuracy of 93.55% with m = 11 features under the mutual information quotient (MIQ) criterion, and 95.16% with m = 15 features under the mutual information difference (MID) criterion. On the lung cancer dataset, MRMR-NBC reached its highest accuracy of 98.63% with m = 14 features under MIQ, and 97.26% with m = 12 features under MID. The classical NBC reached 66.67% and 80.00% on the colon and lung cancer datasets, respectively; RF reached 81.89% and 77.62%. Conclusion: MRMR-NBC achieves high classification accuracy with only a very small number of features and outperforms the classical NBC and RF.

Keywords: maximum relevance minimum redundancy; naive Bayesian classifier; random forests; feature selection

Classification code: R195.1 [Medicine and Health: Health Statistics]
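Illustrative sketch: the abstract names the MID and MIQ criteria but does not spell them out. The Python sketch below shows one common way greedy MRMR selection is formulated, assuming the usual definitions: a candidate feature is scored by its mutual information with the class label (relevance) minus (MID) or divided by (MIQ) its mean mutual information with the features already selected (redundancy). It is not the authors' Matlab/R code. The function names, the toy data, and the small constant guarding against division by zero are all assumptions made for illustration, and the features are assumed to be already discretized.

```python
import math
from collections import Counter


def mutual_information(x, y):
    """Mutual information (in nats) between two equal-length discrete sequences."""
    n = len(x)
    count_x, count_y, count_xy = Counter(x), Counter(y), Counter(zip(x, y))
    # Sum over observed pairs of p(u,v) * log(p(u,v) / (p(u) * p(v))).
    return sum(
        (c / n) * math.log(c * n / (count_x[u] * count_y[v]))
        for (u, v), c in count_xy.items()
    )


def mrmr_select(columns, labels, m, scheme="MID"):
    """Greedily pick m feature indices (m >= 1) using the MID or MIQ score.

    columns: list of feature columns, each a list of discrete values.
    labels:  list of class labels, same length as each column.
    """
    if scheme not in ("MID", "MIQ"):
        raise ValueError("scheme must be 'MID' or 'MIQ'")
    relevance = [mutual_information(col, labels) for col in columns]
    # Start with the single most relevant feature.
    selected = [max(range(len(columns)), key=lambda j: relevance[j])]
    # Running sum of MI between each candidate and the selected features.
    redundancy_sum = [0.0] * len(columns)
    while len(selected) < min(m, len(columns)):
        last = selected[-1]
        best_j, best_score = None, -math.inf
        for j in range(len(columns)):
            if j in selected:
                continue
            redundancy_sum[j] += mutual_information(columns[j], columns[last])
            redundancy = redundancy_sum[j] / len(selected)
            if scheme == "MID":
                score = relevance[j] - redundancy
            else:
                # Assumed guard: tiny constant avoids division by zero.
                score = relevance[j] / (redundancy + 1e-12)
            if score > best_score:
                best_j, best_score = j, score
        selected.append(best_j)
    return selected


# Toy data (hypothetical): gene_a_copy duplicates gene_a, gene_b is independent of it.
gene_a = [0, 0, 1, 1, 0, 0, 1, 1]
gene_a_copy = list(gene_a)
gene_b = [0, 1, 0, 1, 0, 1, 0, 1]
labels = [2 * a + b for a, b in zip(gene_a, gene_b)]
columns = [gene_a, gene_a_copy, gene_b]

print(mrmr_select(columns, labels, 2, "MID"))  # [0, 2]
print(mrmr_select(columns, labels, 2, "MIQ"))  # [0, 2]
```

In both runs the duplicated column is skipped in favour of the non-redundant one, which is the behaviour MRMR is designed for. In a pipeline like the one the abstract describes, the selected columns would then be passed to a naive Bayes classifier.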

 
