Near Infrared Spectral Analysis Based on Data Augmentation Strategy and Convolutional Neural Network


Authors: ZHENG Yun, YANG Si-Yu, WANG Tao [1], DENG Zhuo-Wen, LAN Wei-Jie, YUN Yong-Huan, PAN Lei-Qing [2]

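Affiliations: [1] College of Food Science and Engineering, Hainan University, Haikou 570228, China; [2] College of Food Science and Technology, Nanjing Agricultural University, Nanjing 430000, China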

Source: Chinese Journal of Analytical Chemistry, 2024, Issue 9, pp. 1266-1276 (11 pages)

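Funding: Key Research and Development Project of Hainan Province (No. ZDYF2024XDNY197); Natural Science Foundation of Hainan Province (Nos. 323QN202, 322CXTD523); National Natural Science Foundation of China (No. 22164008); Hainan Province Academician Team Innovation Center platform.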

Abstract: Near-infrared spectroscopy (NIRS) combined with chemometric algorithms has been widely used for qualitative and quantitative analysis in fields such as food and medicine. However, traditional chemometric methods, especially linear classification methods, often perform poorly on multi-class classification problems. Convolutional neural networks (CNNs) can extract deep-level features from data and are well suited to non-linear relationships, but their modeling performance depends on the size and diversity of the sample set, while collecting and preprocessing NIRS sample data is often time-consuming and labor-intensive, which makes samples costly to obtain. This study proposes a NIRS qualitative analysis method based on a data augmentation strategy and a CNN. The data augmentation strategy has two steps: (1) Bootstrap resampling and a generative adversarial network (GAN) are each used to augment three NIRS datasets (medicine tablets, coffee, and grape); (2) the original samples (Y) are combined with the Bootstrap-augmented samples (B) and the GAN-augmented samples (G) to obtain three augmented datasets (Y-B, Y-G, and Y-B-G). On this basis, a CNN model structure suited to these datasets was designed, consisting of two one-dimensional convolutional layers, one max-pooling layer, and one fully connected layer. Compared with the optimal models of partial least squares discriminant analysis (PLS-DA), support vector machine (SVM), and back-propagation neural network (BP), the CNN model based on the Y-B dataset improved average accuracy by 3.998%, 9.364%, and 4.689%, respectively, for medicine tablets (2 classes); the CNN model based on the Y-B-G dataset improved average accuracy by 6.001%, 2.004%, and 7.523%, respectively, for coffee (7 classes); and the CNN model based on the Y-B dataset improved average accuracy by 33.408%, 51.994%, and 34.378%, respectively, for grapes (20 classes). These results indicate that models built with the data augmentation strategy and CNN showed better classification accuracy and generalization performance across the different datasets and numbers of classes.
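The two-step augmentation described in the abstract can be illustrated with a minimal sketch. The abstract does not specify how the Bootstrap step generates new spectra, so the variant below (averaging a few spectra drawn with replacement from the same class) is an assumption. The function names `bootstrap_augment` and `combine`, the parameters `n_new_per_class` and `draw_size`, and the toy numbers are all hypothetical. Only the Python standard library is used to keep the sketch self-contained; array libraries would be more typical in practice. GAN-generated samples (G) are not sketched, but they would be merged with `combine` in the same way to form Y-G or Y-B-G.

```python
import random
from collections import defaultdict


def bootstrap_augment(spectra, labels, n_new_per_class, draw_size=5, seed=0):
    """Create synthetic spectra by averaging bootstrap draws within each class.

    Assumption: the abstract does not describe the exact Bootstrap procedure;
    averaging `draw_size` spectra drawn with replacement is one common
    variant, used here for illustration only.
    """
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for spectrum, label in zip(spectra, labels):
        by_class[label].append(spectrum)

    new_spectra, new_labels = [], []
    for label, members in by_class.items():
        for _ in range(n_new_per_class):
            # Draw with replacement from the spectra of this class only.
            draw = [rng.choice(members) for _ in range(draw_size)]
            # Average the draw wavelength by wavelength.
            mean_spectrum = [sum(values) / draw_size for values in zip(*draw)]
            new_spectra.append(mean_spectrum)
            new_labels.append(label)
    return new_spectra, new_labels


def combine(*parts):
    """Concatenate (spectra, labels) pairs, e.g. Y and B into Y-B."""
    spectra, labels = [], []
    for part_spectra, part_labels in parts:
        spectra.extend(part_spectra)
        labels.extend(part_labels)
    return spectra, labels


# Toy original set Y: 4 spectra, 3 wavelengths, 2 classes (made-up numbers).
Y = ([[0.10, 0.20, 0.30], [0.12, 0.22, 0.28],
      [0.50, 0.40, 0.30], [0.52, 0.38, 0.33]],
     ["a", "a", "b", "b"])

B = bootstrap_augment(*Y, n_new_per_class=3)
Y_B = combine(Y, B)
print(len(Y_B[0]), len(Y_B[1]))  # 4 original + 6 synthetic spectra -> 10 10
```

Because each synthetic spectrum is an average of real spectra from a single class, it stays inside the range spanned by that class at every wavelength. This keeps the labels trustworthy but adds limited diversity, which is presumably one reason the study also evaluates GAN-generated samples.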

Keywords: data augmentation; near-infrared spectroscopy; convolutional neural network; chemometrics

Classification number: O657.33 [Science: Analytical Chemistry]; TP183 [Science: Chemistry]

 
