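Affiliations: [1] School of Electronic Information and Artificial Intelligence, Shaanxi University of Science and Technology, Xi'an, Shaanxi 710021, China [2] Department of Optoelectronic Engineering, Jinan University, Guangzhou, Guangdong 510632, China [3] Jiangxi Baoli Pharmaceutical Co., Ltd., Ganzhou, Jiangxi 341900, China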
Source: Spectroscopy and Spectral Analysis (《光谱学与光谱分析》), 2024, No. 3, pp. 737-743 (7 pages)
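Funding: Supported by the Open Project of the National Medical Products Administration Key Laboratory for Rapid Drug Testing Technology (KF2022006); the Pearl River Science and Technology Nova Program of the Guangzhou Science and Technology Plan (201610010113); and the National Natural Science Foundation of China (62031021).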
Abstract: In wavelength screening for near-infrared spectroscopy (NIRS), selecting characteristic wavelengths is a challenging problem when the number of variables far exceeds the number of samples. The Lasso and Elastic Net algorithms are used for variable selection on high-dimensional small-sample data, but both select variables using the least-squares error as the loss function. When the samples contain outliers, models built with either algorithm are therefore more sensitive to them: the fit shifts toward the outliers and robustness decreases. To address this problem, this paper uses the Huber function as the loss function and proposes the Lasso-Huber method for selecting NIR characteristic wavelengths. Combined with partial least squares (PLS), the method is used to build a quantitative calibration model for the quality-control index components of Antai pills, and its performance is compared with that of full-wavelength modeling and of modeling after wavelength selection by Lasso and Elastic Net. A total of 116 NIR spectra were collected from 21 batches of Antai pills. Of these, 101 spectra formed the calibration set, on which the model was validated internally by leave-one-out cross-validation, and the remaining 15 spectra formed the validation set for external validation. Abnormal spectra in the calibration set were detected with the Mahalanobis distance (MD) method based on principal component analysis (PCA). Taking ferulic acid, one of the quality-control index components of Antai pills, as an example, the Lasso, Elastic Net, and Lasso-Huber methods selected 69, 155, and 87 characteristic wavelengths, respectively, from the spectra free of outliers. The prediction model built with Lasso-Huber combined with PLS performed best, with an external-validation R_p^2 of 0.9531 and an SEP of 0.0587. In addition, comparing the prediction performance of calibration models built with and without the abnormal spectra in the calibration set showed that Lasso-Huber is more advantageous when outliers are included. In that case, Lasso-Huber selected an optimal set of 88 wavelength points, and the resulting PLS model reached an R_v^2 of 0.9673, compared with 0.8405 for Lasso, 0.8347 for Elastic Net, and 0.8520 for full-wavelength modeling. Thus, for samples containing abnormal spectra, Lasso-Huber not only reduces the number of characteristic wavelengths but also lowers the sensitivity of the algorithm to abnormal spectra, improving the accuracy and robustness of the model. In terms of model simplification, the modeling times of the Lasso and Elastic Net methods were 61.8260 s and 79.9599 s, respectively, while the Lasso-Huber modeling…
Keywords: near-infrared spectroscopy; wavelength selection; high-dimensional small-sample data; quantitative calibration model; Lasso-Huber
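The core idea in the abstract, replacing the squared-error loss of Lasso with the Huber loss while keeping the L1 penalty, can be illustrated with a minimal sketch. It is not the authors' implementation: the record does not give their solver or parameter values, so the proximal-gradient solver, the function names (`huber_grad`, `soft_threshold`, `lasso_huber`), and the values of `lam`, `delta`, and `n_iter` are all assumptions made for illustration. Wavelengths whose coefficient stays non-zero would be the ones passed on to a PLS model.

```python
import numpy as np

def huber_grad(r, delta):
    """Derivative of the Huber loss w.r.t. the residual r:
    r in the quadratic zone (|r| <= delta), +/-delta in the linear zone."""
    return np.clip(r, -delta, delta)

def soft_threshold(w, t):
    """Proximal operator of t * ||w||_1 (shrinks coefficients toward zero)."""
    return np.sign(w) * np.maximum(np.abs(w) - t, 0.0)

def lasso_huber(X, y, lam=0.1, delta=1.35, n_iter=2000):
    """Minimise (1/n) * sum_i huber(x_i . w + b - y_i) + lam * ||w||_1
    by proximal gradient descent. lam, delta and n_iter are illustrative
    assumptions, not values taken from the paper."""
    n, p = X.shape
    w = np.zeros(p)
    b = float(np.median(y))  # robust starting value for the intercept
    # The Huber loss has curvature <= 1, so the gradient of the smooth part
    # is Lipschitz with constant at most ||[X, 1]||_2^2 / n.
    Xa = np.hstack([X, np.ones((n, 1))])
    step = n / np.linalg.norm(Xa, 2) ** 2
    for _ in range(n_iter):
        g = huber_grad(X @ w + b - y, delta)  # clipped residuals
        w = soft_threshold(w - step * (X.T @ g) / n, step * lam)
        b = b - step * g.mean()               # intercept is not penalised
    return w, b

def selected_wavelengths(w):
    """Indices of variables (wavelength points) with non-zero coefficient."""
    return np.flatnonzero(w)
```

Because the clipped gradient caps the pull of any single sample at `delta`, a spectrum with a grossly wrong reference value cannot dominate the fit the way it does under squared error, which is the robustness argument the abstract makes. In practice the spectra would be standardised first and `lam` chosen by cross-validation.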