Variable Selection in Near-Infrared Spectroscopic Data Using a Monte Carlo-Partial Least Squares Regression Coefficient Method  Cited by: 6


Authors: ZHANG Mingjin [1], DU Yiping [2]

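Affiliations: [1] Department of Chemistry, Qinghai Normal University, Xining 810008; [2] Shanghai Key Laboratory of Functional Materials Chemistry, East China University of Science and Technology, Shanghai 200237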

Source: Chinese Journal of Analysis Laboratory (《分析试验室》), 2013, No. 2, pp. 12-16 (5 pages)

Funding: Supported by the National Natural Science Foundation of China (20975039)

Abstract: A Monte Carlo-partial least squares regression coefficient (MC-PLSRC) method is proposed for variable selection in near-infrared (NIR) spectra. The method has four main steps: (1) generate multiple sample subsets by Monte Carlo sampling; (2) build a model on each subset, compute its regression coefficients, and rank the variables in each sub-model by the absolute value of their regression coefficients; (3) rank the wavelengths by frequency counts over the sub-model rankings; (4) add the ranked wavelengths cumulatively, one step at a time, to the candidate variable subset and use cross-validation to choose the optimal variable set. The method was applied to NIR spectra of biological sample solutions and of tobacco samples. It selected 27 and 68 characteristic variables from the original 1234 and 1557 variables, respectively, and the RMSEP on independent test sets fell from 0.02716 and 0.06411 with the full spectrum to 0.02372 and 0.03977 with the selected variables. The method can effectively select variables from NIR spectra.
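The four steps in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract does not say how the frequency count in step 3 is defined, so the sketch assumes a variable is counted each time it falls among the `top_k` largest absolute coefficients of a sub-model. The function names, the NIPALS-style PLS1 routine, and all parameter values (`n_runs`, `ratio`, `top_k`, `n_folds`, `max_vars`) are hypothetical choices made for the example.

```python
import numpy as np


def pls1_coefficients(X, y, n_components):
    """Regression coefficients of a PLS1 model (NIPALS) for mean-centered data."""
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    n_features = X.shape[1]
    W = np.zeros((n_features, n_components))
    P = np.zeros((n_features, n_components))
    q = np.zeros(n_components)
    for a in range(n_components):
        w = Xc.T @ yc
        norm = np.linalg.norm(w)
        if norm < 1e-12:  # nothing left to explain; stop extracting components
            break
        w = w / norm
        t = Xc @ w
        tt = t @ t
        p = Xc.T @ t / tt
        q[a] = yc @ t / tt
        Xc = Xc - np.outer(t, p)  # deflate X
        yc = yc - q[a] * t        # deflate y
        W[:, a], P[:, a] = w, p
    # b = W (P^T W)^{-1} q; pinv tolerates unused (all-zero) components
    return W @ np.linalg.pinv(P.T @ W) @ q


def pls1_fit_predict(X_train, y_train, X_test, n_components):
    """Fit PLS1 on the training data and predict the test samples."""
    b = pls1_coefficients(X_train, y_train, n_components)
    return y_train.mean() + (X_test - X_train.mean(axis=0)) @ b


def mc_plsrc_rank(X, y, n_components=3, n_runs=100, ratio=0.8, top_k=20, seed=0):
    """Steps 1-3: Monte Carlo subsets, per-subset |coefficient| ranking, frequency count.

    ASSUMPTION: a variable scores one count per sub-model in which it is among
    the top_k variables by absolute regression coefficient.
    """
    rng = np.random.default_rng(seed)
    n_samples, n_features = X.shape
    n_sub = min(n_samples, max(n_components + 1, int(ratio * n_samples)))
    counts = np.zeros(n_features, dtype=int)
    for _ in range(n_runs):
        idx = rng.choice(n_samples, size=n_sub, replace=False)   # step 1
        b = pls1_coefficients(X[idx], y[idx], n_components)      # step 2
        counts[np.argsort(-np.abs(b))[:top_k]] += 1
    order = np.argsort(-counts, kind="stable")                   # step 3
    return order, counts


def select_by_cv(X, y, order, n_components=3, max_vars=50, n_folds=5, seed=0):
    """Step 4: add ranked variables cumulatively and keep the set with lowest CV RMSE."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(y)), n_folds)
    all_idx = np.arange(len(y))
    best_k, best_rmse = 1, np.inf
    for k in range(1, min(max_vars, len(order)) + 1):
        cols = order[:k]
        sq_err = 0.0
        for test in folds:
            train = np.setdiff1d(all_idx, test)
            pred = pls1_fit_predict(X[np.ix_(train, cols)], y[train],
                                    X[np.ix_(test, cols)], min(n_components, k))
            sq_err += np.sum((y[test] - pred) ** 2)
        rmse = np.sqrt(sq_err / len(y))
        if rmse < best_rmse:
            best_k, best_rmse = k, rmse
    return order[:best_k], best_rmse
```

With a spectral matrix `X` (samples by wavelengths) and a reference vector `y`, one would call `mc_plsrc_rank(X, y)` to obtain the wavelength ordering and then `select_by_cv(X, y, order)` to pick the subset. The RMSEP on an independent test set would be computed separately, using only the selected columns.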

Keywords: Monte Carlo sampling; regression coefficient; near-infrared spectroscopy; variable selection

Classification: O657.3 [Science - Analytical Chemistry]

 
