Study on Sugar Content Detection of Kiwifruit Using Near-Infrared Spectroscopy Combined With Stacking Ensemble Learning


Authors: GUO Zhi-qiang[1], ZHANG Bo-tao, ZENG Yun-liu[2] (College of Information Engineering, Wuhan University of Technology, Wuhan 430070, China; National Key Laboratory for Germplasm Innovation & Utilization of Horticultural Crops, Huazhong Agricultural University, National R&D Center for Citrus Preservation, Wuhan 430070, China)

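Affiliations: [1] College of Information Engineering, Wuhan University of Technology, Wuhan 430070, Hubei, China; [2] National Key Laboratory for Germplasm Innovation & Utilization of Horticultural Crops / National R&D Center for Citrus Preservation, Huazhong Agricultural University, Wuhan 430070, Hubei, China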

Source: Spectroscopy and Spectral Analysis (《光谱学与光谱分析》), 2024, Issue 10, pp. 2932-2940 (9 pages)

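Funding: Supported by the Key R&D Project of the Jiangxi Academy of Sciences (2021YSBG21019); the Kiwifruit Quality Safety and Processing and Preservation Post Project (CARS26); and the Key R&D Project of Hubei Province (2023BBB064).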

Abstract: This study applies near-infrared spectroscopy combined with Stacking ensemble learning to the non-destructive detection of sugar content in kiwifruit. "Yunhai No.1" kiwifruit from Hubei served as the study material. An infrared analyzer was used to acquire spectra from 280 samples, each comprising 1557 wavelength variables in the 4000~10000 cm^(-1) range, and sugar content was measured with a refractometer. Outlier samples were removed with a singular-sample identification algorithm that combines Monte Carlo random sampling with a t-test. The SPXY algorithm then split the data into training and test sets at a 4:1 ratio. Five preprocessing methods were applied: multiplicative scatter correction (MSC), Savitzky-Golay smoothing (SG), de-trending (DT), vector normalization (VN), and standard normal variate (SNV) transformation. Feature wavelengths were extracted with uninformative variable elimination (UVE), competitive adaptive reweighted sampling (CARS), and the interval variable iterative space shrinkage approach (iVISSA), followed by a second extraction with the successive projections algorithm (SPA) to eliminate collinear variables. Because a single model has limited generalization ability, an ensemble learning model based on the Stacking algorithm was designed to extend modeling capacity. Bayesian ridge regression (BRR), partial least squares regression (PLSR), support vector regression (SVR), and an artificial neural network (ANN) were chosen as base learners, with linear regression (LR) as the meta-learner, and the performance of ensembles built from different base-learner combinations was compared. Pearson correlation coefficients were used to analyze the relationship between the base learners and the ensemble model. The results show that vector normalization was the most effective of the five preprocessing methods. After feature wavelength extraction on the preprocessed spectra, the VN-CARS-PLSR model performed best, with an R_P^2 of 0.805 and an RMSEP of 0.498 on the test set. This model selected 177 feature wavelengths, reducing the data volume by 88.6% relative to the original spectrum. When the base learners were fused with the Stacking algorithm and the different combinations were compared, the PLS+SVR+ANN ensemble achieved the highest prediction accuracy, with R_P^2 reaching 0.853 and RMSEP falling to 0.433. The Pearson correlation analysis characterized the influence of the base learners on ensemble performance. Compared with a single model, the Stacking ensemble models the data more comprehensively and generalizes better, and the method provides technical support for the non-destructive detection of kiwifruit sugar content.

Keywords: kiwifruit; near-infrared spectroscopy; sugar content; Stacking ensemble learning; model fusion

Classification: TP391.4 [Automation and Computer Technology: Computer Application Technology]

 
