Selection of Near-Infrared Spectral Variables of Food Crops Combining Orthogonal Matching Pursuit and Partial Least Squares

Authors: Li Sihai[1]; Zhu Gang; Liu Mingqi; Dong Wen (School of Medical Information Engineering, Gansu University of Chinese Medicine, Lanzhou 730000)

Affiliation: [1] School of Medical Information Engineering, Gansu University of Chinese Medicine, Lanzhou 730000

Source: Journal of the Chinese Cereals and Oils Association, 2025, No. 1, pp. 220-224 (5 pages)

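Funding: Gansu Provincial Science and Technology Program (21JR1RA272); Innovation Fund for University Teachers of the Gansu Provincial Department of Education (2023B-105).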

Abstract: When the orthogonal matching pursuit (OMP) algorithm is used for quantitative analysis of near-infrared spectra, it shows low bias but high variance, selects many variables, and yields models that overfit easily. To address these problems, a variable selection method named OMPLS (orthogonal matching pursuit based partial least squares regression) was proposed, which combines orthogonal matching pursuit with partial least squares (PLS) regression. OMPLS is a forward variable selection method. It evaluates the importance of spectral variables by the absolute values of their OMP regression coefficients, then uses partial least squares regression and the Bayesian information criterion (BIC) to identify the important variables among the remaining spectral variables, finally obtaining an optimal variable set that meets a given size requirement. Variable selection experiments were conducted on the corn dataset and the wheat kernels dataset, and PLS, OMP, and OMPLS were compared in terms of the number of selected variables, the root mean square error of calibration (RMSEC), and the root mean square error of prediction (RMSEP). The results showed that, on both datasets, OMPLS selected fewer variables and gave lower RMSEP values than OMP, indicating that the generalization ability of the model had improved to some extent. By using BIC as the model selection criterion, OMPLS strikes a balance between model complexity and predictive ability. Compared with OMP, it can further reduce the number of selected variables, prevent overfitting, and improve the predictive ability and interpretability of the model.
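The abstract gives only an outline of the procedure, so the following is a minimal sketch of one possible reading of it, not the authors' implementation. It ranks variables with a plain greedy OMP, then walks through the ranked candidates and keeps one only if a PLS1 fit on the enlarged set lowers BIC. The function names (`omp_ranking`, `pls1_rss`, `omp_pls_forward`), the Gaussian form of BIC with the number of selected variables as the penalty term, the fixed number of PLS components, the acceptance rule, and the synthetic data are all assumptions made for illustration.

```python
import numpy as np

def omp_ranking(X, y, n_nonzero):
    """Run a plain greedy OMP and return column indices sorted by |coefficient|."""
    Xs = (X - X.mean(axis=0)) / X.std(axis=0)    # standardise columns (assumes none is constant)
    yc = y - y.mean()
    residual, selected = yc.copy(), []
    coef = np.zeros(0)
    for _ in range(n_nonzero):
        corr = np.abs(Xs.T @ residual)
        if selected:
            corr[selected] = -np.inf             # never pick a column twice
        selected.append(int(np.argmax(corr)))
        # Refit all selected columns by least squares (the "orthogonal" step).
        coef, *_ = np.linalg.lstsq(Xs[:, selected], yc, rcond=None)
        residual = yc - Xs[:, selected] @ coef
    order = np.argsort(-np.abs(coef))
    return [selected[i] for i in order]

def pls1_rss(X, y, n_components):
    """Residual sum of squares of a PLS1 (NIPALS) fit on the training data."""
    Xk = X - X.mean(axis=0)
    yc = y - y.mean()
    fitted = np.zeros_like(yc)
    for _ in range(min(n_components, X.shape[1])):
        w = Xk.T @ yc                            # weight vector (scale does not affect the fit)
        if np.linalg.norm(w) < 1e-12:
            break
        t = Xk @ w                               # score vector
        tt = t @ t
        fitted += t * (yc @ t) / tt              # project y onto the score
        Xk -= np.outer(t, Xk.T @ t) / tt         # deflate X
    return float(np.sum((yc - fitted) ** 2))

def omp_pls_forward(X, y, max_vars, n_components=2):
    """Forward selection over OMP-ranked variables, accepting a variable if BIC drops."""
    n = len(y)
    chosen, best_bic = [], np.inf
    for j in omp_ranking(X, y, max_vars):
        trial = chosen + [j]
        rss = pls1_rss(X[:, trial], y, n_components)
        # Assumed BIC form: n*ln(RSS/n) + k*ln(n), with k = number of variables.
        bic = n * np.log(max(rss, 1e-12) / n) + len(trial) * np.log(n)
        if bic < best_bic:
            chosen, best_bic = trial, bic
    return chosen, best_bic

# Synthetic stand-in for spectra: only columns 3 and 7 carry signal.
rng = np.random.default_rng(0)
X = rng.normal(size=(60, 20))
y = 2.0 * X[:, 3] - 1.5 * X[:, 7] + 0.05 * rng.normal(size=60)
chosen, bic = omp_pls_forward(X, y, max_vars=5)
print(chosen, round(bic, 2))
```

In this sketch the `k*ln(n)` penalty is what discourages adding variables that barely reduce the residual, which is the trade-off between model complexity and predictive ability that the abstract attributes to BIC. Real spectra have strongly correlated wavelengths, so behaviour on the corn or wheat kernels data may differ from this toy example.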

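Keywords: near-infrared spectroscopy; variable selection; orthogonal matching pursuit; partial least squares; Bayesian information criterion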

Classification: TP391.4 [Automation and Computer Technology: Computer Application Technology]

 
