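Affiliation: [1] Institute of Intelligent Electromechanical Equipment Innovation, East China Jiaotong University, Nanchang 330013, Jiangxi, China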
Source: Spectroscopy and Spectral Analysis (《光谱学与光谱分析》), 2023, No. 5, pp. 1419-1425 (7 pages)
Funding: Supported by the National Natural Science Foundation of China (31760344) and the National Award Reserve Project (20192AEI91007).
Abstract: Sugar content is one of the key indicators of an apple's internal quality. When building a prediction model for apple sugar content, the quality of the calibration samples and wavelengths affects both the model's accuracy and its later updating and maintenance. Taking 90 apples as the study samples, near-infrared diffuse transmission spectra were collected at 1 044 wavelength points across the 350~1 150 nm band, and the effectiveness and feasibility of selecting calibration samples and wavelengths with the Lasso implemented by least angle regression (LASSOLars) were investigated. The spectra were preprocessed with a combination of Norris smoothing, first derivative and variable sorting for normalization. After ranking by concentration, 75% of the samples were assigned to the original training set (68 samples) and 25% to the prediction set (22 samples), and LASSOLars was used to build an optimized training set. LASSOLars was then compared with Monte Carlo uninformative variable elimination and competitive adaptive reweighted sampling in terms of the number and distribution of samples and wavelengths and the resulting models. The results show that the optimized training set removed 16% of the samples in the original training set; without changing the average level of the original training set, its distribution was closer to that of the prediction set, and model quality was not weakened. For the optimized and original training sets respectively, the root mean square error of cross-validation (RMSECV) was 0.460 and 0.491, the cross-validation coefficient of determination (R²_CV) was 0.913 and 0.916, the root mean square error of prediction (RMSEP) was 0.462 and 0.471, and the prediction coefficient of determination (R²_P) was 0.909 and 0.906. LASSOLars selected 40 wavelengths with a high signal-to-noise ratio, the fewest of the three methods, and produced the best model, with R²_CV, RMSECV, R²_P, RMSEP and RPD of 0.933, 0.400, 0.944, 0.373 and 2.838. Building the apple sugar content model on samples and wavelengths optimized by LASSOLars extends the use of the LASSOLars algorithm in subset selection and offers ideas for optimizing, updating and maintaining such models.
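The sample split and the evaluation metrics named in the abstract can be illustrated with a small sketch. It is not the authors' code: the rule of sending every fourth concentration-ranked sample to the prediction set is an assumption that happens to reproduce the 68/22 split, the sugar values are synthetic, the function names `ranked_split`, `rmse`, `r2` and `rpd` are hypothetical, and RPD is taken here as the sample standard deviation of the reference values divided by the RMSE.

```python
import math
import random


def ranked_split(y, every=4):
    """Rank samples by reference value; every `every`-th ranked sample goes to the prediction set.

    Assumed rule for illustration only; the abstract does not state the exact scheme.
    """
    order = sorted(range(len(y)), key=lambda i: y[i])
    test = [idx for pos, idx in enumerate(order) if pos % every == every - 1]
    train = [idx for pos, idx in enumerate(order) if pos % every != every - 1]
    return train, test


def rmse(y_true, y_pred):
    """Root mean square error (RMSECV or RMSEP, depending on which predictions are passed)."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def rpd(y_true, y_pred):
    """Ratio of the sample standard deviation of the reference values to the RMSE (assumed definition)."""
    n = len(y_true)
    mean = sum(y_true) / n
    sd = math.sqrt(sum((t - mean) ** 2 for t in y_true) / (n - 1))
    return sd / rmse(y_true, y_pred)


# Synthetic sugar values for 90 hypothetical apples (not data from the paper).
rng = random.Random(0)
sugar = [rng.uniform(9.0, 16.0) for _ in range(90)]

train_idx, test_idx = ranked_split(sugar)
print(len(train_idx), len(test_idx))  # 68 22
```

Ranking before splitting keeps both subsets spread over the whole concentration range, which is the usual motivation for this kind of partition.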
Keywords: near-infrared spectroscopy; Lasso implemented by least angle regression (LASSOLars); sample selection; wavelength selection