Research on Application of Feature Selection Algorithm Based on Combination of Random Forest and Game Theory in Near Infrared Spectroscopy

Cited by: 8


Authors: Kong Qingqing (孔清清), Ding Xiangqian (丁香乾)[1], Gong Huili (宫会丽)[1], Li Zhongren (李忠任), Tang Xinghong (唐兴宏), Yu Chunxia (于春霞)

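Affiliations: [1] College of Information Science and Engineering, Ocean University of China, Qingdao, Shandong 266100; [2] Technology Center, China Tobacco Yunnan Industrial Co., Ltd., Kunming, Yunnan 650024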

Source: Journal of Instrumental Analysis (《分析测试学报》), 2017, No. 10, pp. 1203-1207 (5 pages)

Funding: National Key Technology R&D Program of China (2015BAF12B01); China Tobacco Yunnan Industrial Co., Ltd. projects (JSZX2014YL01; 20530001020152000086)

Abstract: Noise and redundant information in near infrared spectra lower the recognition rate of classification models. To address this, a feature selection algorithm that combines random forest with game theory is proposed. The algorithm first measures feature importance with a random forest and preselects the features that are relevant to classification. It then computes a weight for each preselected feature using an improved Shapley value combined with mutual information, and removes redundant features from the weighted set to obtain the optimal feature subset. To validate the algorithm, it was applied to a model for identifying the production area of tobacco leaves. The experimental results show that the proposed algorithm identifies the production area well, with a classification recognition rate of up to 95.88%.
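The weighting and redundancy-removal stage described in the abstract can be illustrated with a small sketch. This is not the paper's method: the abstract does not specify the "improved" Shapley value, so the code below uses the classic Shapley value, and the coalition value function (mutual information between a feature subset and the label), the `max_shared_bits` threshold, and all function names are assumptions made for illustration. It also assumes the features have already been preselected (for example by random forest importance) and discretized.

```python
from collections import Counter
from itertools import combinations
from math import comb, log2


def mutual_information(xs, ys):
    """Mutual information I(X; Y) in bits between two discrete sequences."""
    n = len(xs)
    px, py, pxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    return sum((c / n) * log2(c * n / (px[x] * py[y])) for (x, y), c in pxy.items())


def coalition_value(columns, subset, labels):
    """Assumed value function: information the joint subset carries about the label."""
    if not subset:
        return 0.0
    joint = list(zip(*(columns[name] for name in subset)))
    return mutual_information(joint, labels)


def shapley_weights(columns, labels):
    """Exact classic Shapley value per feature (cost is exponential in feature count)."""
    names = sorted(columns)
    n = len(names)
    weights = {}
    for name in names:
        others = [m for m in names if m != name]
        total = 0.0
        for k in range(n):
            # Coalition weight k!(n-k-1)!/n!, written as 1 / (n * C(n-1, k)).
            coeff = 1.0 / (n * comb(n - 1, k))
            for subset in combinations(others, k):
                with_feature = coalition_value(columns, subset + (name,), labels)
                without_feature = coalition_value(columns, subset, labels)
                total += coeff * (with_feature - without_feature)
        weights[name] = total
    return weights


def drop_redundant(columns, weights, max_shared_bits=0.9):
    """Greedy pass: keep weighted features that share little information with kept ones."""
    kept = []
    for name in sorted(weights, key=lambda m: (-weights[m], m)):
        if weights[name] <= 1e-12:
            continue  # feature adds nothing to any coalition
        if all(mutual_information(columns[name], columns[k]) < max_shared_bits for k in kept):
            kept.append(name)
    return kept


# Toy data (hypothetical): f1 is informative, f2 duplicates f1, f3 is unrelated noise.
labels = [0, 0, 0, 0, 1, 1, 1, 1]
columns = {
    "f1": [0, 0, 0, 0, 1, 1, 1, 1],
    "f2": [0, 0, 0, 0, 1, 1, 1, 1],
    "f3": [0, 1, 0, 1, 0, 1, 0, 1],
}
weights = shapley_weights(columns, labels)
selected = drop_redundant(columns, weights)
print(weights, selected)
```

On this toy data the two duplicated features split the credit equally and the noise feature receives none, so the greedy pass keeps a single informative feature. Exact enumeration is only practical for a handful of features; a real spectrum with hundreds of wavelengths would need a sampling approximation or a restricted coalition size.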

Keywords: near infrared spectroscopy; random forest; feature selection; Shapley value; production area identification

Classification codes: O657.3 [Science: Analytical Chemistry]; O433.4 [Science: Chemistry]
