Authors: 陈江鹏[1] 彭斌[1] 文雯[1] 曾庆[1] 唐小静[1] 胡珊[1] 文小焱 阙萍
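Affiliation: [1] Research Center for Medicine and Society, Collaborative Innovation Center for Social Risk Prediction and Governance in Health, School of Public Health and Management, Chongqing Medical University, Chongqing 400016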
Source: Chinese Journal of Health Statistics (《中国卫生统计》), 2015, No. 6, pp. 932-934 (3 pages)
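Funding: National Natural Science Foundation of China (81373103); Basic and Frontier Research Program of the Chongqing Science and Technology Commission (cstc2013jcyj A10009)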
Abstract: Objective: To apply a naive Bayes classifier (NBC) combined with maximum relevance minimum redundancy (MRMR) feature selection to gene expression data, and to compare it with the classic NBC and random forests (RF). Methods: The three methods were implemented in Matlab and R and applied to a colon cancer and a lung cancer gene expression dataset; 10-fold cross-validation was used to estimate the classification accuracy of the classic NBC and RF. Results: On the colon cancer dataset, MRMR-NBC reached 93.55% accuracy with m = 11 features under the mutual information quotient (MIQ) criterion, and 95.16% with m = 15 features under the mutual information difference (MID) criterion. On the lung cancer dataset, MRMR-NBC reached its highest accuracy, 98.63%, with m = 14 features under MIQ, and 97.26% with m = 12 features under MID. The classic NBC achieved 66.67% on the colon cancer data and 80.00% on the lung cancer data; RF achieved 81.89% and 77.62%, respectively. Conclusion: Using only a very small number of features, MRMR-NBC achieves higher classification accuracy than both the classic NBC and RF.
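The abstract contrasts two MRMR criteria, MID and MIQ, without spelling them out. In the standard MRMR formulation, features are added greedily: the first is the one with the largest mutual information with the class, I(x_j; c), and each later one maximizes either the difference (MID) or the quotient (MIQ) between that relevance and its mean mutual information with the features already selected. The sketch below illustrates this in plain Python. It is not the authors' Matlab/R code, and it assumes the gene expression values have already been discretized. The plug-in frequency estimator for mutual information, the small `eps` guarding MIQ against division by zero, and the toy data are all illustrative choices, not details taken from the paper. The downstream naive Bayes classifier and the 10-fold cross-validation are not shown.

```python
import math
from collections import Counter
from collections.abc import Sequence


def mutual_information(x: Sequence, y: Sequence) -> float:
    """Plug-in estimate of I(X;Y) in nats from two equal-length discrete sequences."""
    n = len(x)
    count_x, count_y = Counter(x), Counter(y)
    mi = 0.0
    for (a, b), c_ab in Counter(zip(x, y)).items():
        # p(a,b) * log(p(a,b) / (p(a) p(b))), written with raw counts
        mi += (c_ab / n) * math.log(c_ab * n / (count_x[a] * count_y[b]))
    return mi


def mrmr(features: Sequence[Sequence], labels: Sequence, m: int,
         scheme: str = "MID", eps: float = 1e-12) -> list[int]:
    """Greedy MRMR selection of up to m feature indices (MID or MIQ criterion).

    features[j] is the (assumed already discretized) column of feature j.
    eps is an illustrative guard against division by zero in MIQ.
    """
    if scheme not in ("MID", "MIQ"):
        raise ValueError("scheme must be 'MID' or 'MIQ'")
    k = min(m, len(features))
    if k <= 0:
        return []
    # Relevance of each feature to the class label: I(x_j; c)
    relevance = [mutual_information(f, labels) for f in features]
    # Start with the single most relevant feature
    selected = [max(range(len(features)), key=relevance.__getitem__)]
    # redundancy_sum[j] = sum of I(x_j; x_i) over selected i, updated incrementally
    redundancy_sum = [0.0] * len(features)
    while len(selected) < k:
        last = selected[-1]
        best, best_score = -1, -math.inf
        for j in range(len(features)):
            if j in selected:
                continue
            redundancy_sum[j] += mutual_information(features[j], features[last])
            redundancy = redundancy_sum[j] / len(selected)  # mean redundancy with S
            if scheme == "MID":
                score = relevance[j] - redundancy           # difference criterion
            else:
                score = relevance[j] / (redundancy + eps)   # quotient criterion
            if score > best_score:
                best, best_score = j, score
        selected.append(best)
    return selected


if __name__ == "__main__":
    # Hypothetical toy data: class c = f0 OR f2; f1 duplicates f0; f3 is independent of c
    c = [0, 0, 1, 1, 1, 1, 1, 1]
    f0 = [0, 0, 0, 0, 1, 1, 1, 1]
    f2 = [0, 0, 1, 1, 0, 0, 1, 1]
    f3 = [0, 1, 0, 1, 0, 1, 0, 1]
    X = [f0, list(f0), f2, f3]
    print("MID:", mrmr(X, c, 3, "MID"))
    print("MIQ:", mrmr(X, c, 3, "MIQ"))
```

On this toy data, both criteria first pick f0 and f2 (in either order, since their relevance ties). They disagree on the third feature: MID picks f3, which has no relevance but also no redundancy, while MIQ picks the duplicate f1, whose nonzero relevance dominates the quotient. This is one illustration of why the two schemes can reach their best accuracy at different values of m.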
Keywords: maximum relevance minimum redundancy; naive Bayes classifier; random forest; feature selection
CLC number: R195.1 [Medicine and Health / Health Statistics]