Source: Spectroscopy and Spectral Analysis (《光谱学与光谱分析》), 2022, No. 2, pp. 440-445 (6 pages)
Funding: Supported by the National Key Research and Development Program of China (2018YFF01011204).
Abstract: In near-infrared (NIR) spectral analysis, full-spectrum data contain many wavelength points, are highly redundant, and exhibit severe collinearity. As a result, some wavelength points contribute nothing to the calibration model and can even degrade its predictive ability. Wavelength selection has proven to be an effective way to avoid these problems. Targeting the characteristics of NIR spectra, this paper proposes a wavelength selection algorithm, MC-DOSC, that combines direct orthogonal signal correction (DOSC) with the Monte Carlo (MC) method. Unlike most methods, which select wavelengths according to their “importance”, MC-DOSC selects them according to their “unimportance”, measured by the DOSC weight w. First, w is normalized and used as the probability that each wavelength is filtered out, which defines a probability model for wavelength selection; Monte Carlo random sampling then yields a set of N wavelength subsets. In each sampling step, a PLS model is built from the selected wavelength points and the corresponding root mean square error of cross-validation (RMSECV) is computed. After N random samplings, the wavelength subset whose PLS model has the smallest RMSECV is taken as the candidate subset. The spectral data contained in the candidate subset then serve as the new spectral matrix, and the process is repeated until the RMSECV no longer decreases. When the iteration stops, the candidate subset with the smallest RMSECV is taken as the optimal wavelength subset. The algorithm was tested on a corn data set and a gasoline data set and compared with three algorithms: Monte Carlo uninformative variable elimination (MCUVE), the genetic algorithm (GA), and competitive adaptive reweighted sampling (CARS). The results show that the algorithm greatly reduces the number of wavelength points while improving the predictive ability of the corresponding PLS model. On the corn data set, the number of wavelength points fell from 700 (full spectrum) to 15, the prediction-set correlation coefficient rose from 0.8282 to 0.9314, and the RMSEP fell from 0.1098 to 0.0713. On the gasoline data set, the number of wavelength points fell from 301 (full spectrum) to 31, the prediction-set correlation coefficient rose from 0.9875 to 0.9939, and the RMSEP fell from 0.2555 to 0.1788. On both data sets the algorithm outperformed the three comparison algorithms.
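The sampling loop described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. The abstract does not say how w is normalized, so the sketch assumes |w| is scaled by its maximum. The callables `weight_fn` (returning one DOSC weight per wavelength in a subset) and `rmsecv_fn` (returning the RMSECV of a PLS model built on a subset) are hypothetical placeholders, and the full spectrum is used as the starting baseline.

```python
import random
from typing import Callable, List, Optional, Sequence, Tuple

Subset = List[int]  # indices of the wavelength points currently kept


def filter_probabilities(w: Sequence[float]) -> List[float]:
    """Map DOSC weights to filter-out probabilities in [0, 1].

    Assumption: |w| is divided by its maximum, so a larger weight means
    a wavelength is more likely to be dropped.
    """
    mags = [abs(v) for v in w]
    top = max(mags)
    if top == 0.0:
        return [0.0] * len(mags)
    return [m / top for m in mags]


def mc_round(
    subset: Subset,
    weight_fn: Callable[[Subset], Sequence[float]],
    rmsecv_fn: Callable[[Subset], float],
    n_samples: int,
    rng: random.Random,
) -> Tuple[Optional[Subset], float]:
    """Draw n_samples random subsets and return the one with the lowest RMSECV."""
    probs = filter_probabilities(weight_fn(subset))
    best_subset, best_err = None, float("inf")
    for _ in range(n_samples):
        # Each wavelength is dropped independently with its own probability.
        kept = [j for j, p in zip(subset, probs) if rng.random() >= p]
        if not kept:
            continue  # an empty subset cannot support a model
        err = rmsecv_fn(kept)
        if err < best_err:
            best_subset, best_err = kept, err
    return best_subset, best_err


def mc_dosc_select(
    n_wavelengths: int,
    weight_fn: Callable[[Subset], Sequence[float]],
    rmsecv_fn: Callable[[Subset], float],
    n_samples: int = 100,
    seed: int = 0,
) -> Tuple[Subset, float]:
    """Repeat Monte Carlo rounds until the RMSECV stops decreasing."""
    rng = random.Random(seed)
    best = list(range(n_wavelengths))  # baseline: full spectrum (assumption)
    best_err = rmsecv_fn(best)
    while True:
        cand, err = mc_round(best, weight_fn, rmsecv_fn, n_samples, rng)
        if cand is None or err >= best_err:
            return best, best_err
        # The candidate subset becomes the new spectral matrix.
        best, best_err = cand, err
```

In practice, `weight_fn` would refit DOSC on the spectral columns listed in the subset, and `rmsecv_fn` would cross-validate a PLS model on those same columns. Both are kept as callables here because the abstract does not specify those details.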