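Affiliations: [1] Beijing Laboratory of Food Quality and Safety, College of Information and Electrical Engineering, China Agricultural University, Beijing 100083 [2] Yantai Institute of China Agricultural University, Yantai, Shandong 264000 [3] School of Biology and Basic Medical Sciences, Soochow University, Suzhou, Jiangsu 215200 [4] College of Engineering, China Agricultural University, Beijing 100083
Source: Spectroscopy and Spectral Analysis (《光谱学与光谱分析》), 2020, Issue 1, pp. 195-201 (7 pages)
Funding: Supported by the National Key Research and Development Program of China (2017YFE0111200)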
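Abstract: Ultraviolet (UV) spectroscopy is applied to the rapid determination of total nitrogen content in aquaculture water. To eliminate the influence of systematic and random errors on the model's predictive performance, the measured total nitrogen concentrations and spectral absorbance data of 88 water samples were taken as the raw data, and model building was divided into five stages (sample set division, data preprocessing, feature band extraction, model selection, and selection of the number of LVs) in pursuit of the best predictive performance; in each of the first four stages, several methods were compared. The results show that every stage is indispensable, and that only by comparing the strengths and weaknesses of the methods can the modeling process and methods best suited to total nitrogen determination be found. First, the raw data were given the same sample set division using the concentration gradient (CG) method, and on that basis three models were built: principal component regression (PCR), stepwise regression (SR), and partial least squares regression (PLSR). PLSR, which predicted best, was chosen as the prediction model for this paper. The performance of a PLSR model depends heavily on the number of latent variables (LVs); the number of LVs at which the root mean square error of prediction (RMSEP) is smallest is usually taken as optimal. Second, four sample set division algorithms (CG, random sampling (RS), Kennard-Stone (KS), and SPXY) were applied to the samples and the predictive performance of the resulting PLSR models was compared; SPXY was selected as the best division algorithm. Third, with the sample set divided by SPXY, the spectral absorbance data were preprocessed with several algorithms: three single algorithms, namely wavelet transform (WT), first derivative (Der1st), and second derivative (Der2nd), and two combinations of the wavelet transform with the derivative methods, WT-Der1st and WT-Der2nd. Then, on the preprocessed data, two feature band extraction methods were applied, the successive projections algorithm (SPA) and stepwise regression (SR); the comparison shows that SPA extracts features more efficiently than SR and gives better models. SPA both greatly simplifies the model and improves its prediction accuracy to some extent. The feature band extracted with WT-Der1st-SPA is 218 nm, which is consistent with the characteristic band range of total nitrogen and indicates that the method is sound. Taking the 10 PLSR models built above together, and considering prediction accuracy and model compl…
This paper aims to achieve rapid determination of the total nitrogen (TN) concentration, one of the most important indicators of pollution in aquaculture water, using ultraviolet (UV) spectroscopy. The original dataset contains 88 samples, each with a measured concentration value and spectral absorbance values. The optimal model is selected through five stages: sample set division, data preprocessing, feature band extraction, model selection, and selection of the number of latent variables (LVs). In the first four stages, comparing different methods shows that each stage is necessary, and that only by comparing the advantages and disadvantages of the modeling results from the various algorithms can the most suitable modeling process and method be found.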
First, the original sample set is divided by the concentration gradient (CG) method, and three models are built: principal component regression (PCR), stepwise regression (SR), and partial least squares regression (PLSR); PLSR proves to be the best prediction model. The number of LVs greatly influences the accuracy of the model, and the LV number is usually taken as optimal when the root mean square error of prediction (RMSEP) is at its minimum. Second, comparing the random sampling (RS), concentration gradient (CG), Kennard-Stone (KS), and SPXY algorithms shows that SPXY is the best. Third, based on the SPXY division, the paper uses five preprocessing algorithms: three single algorithms, namely wavelet transform (WT), first derivative (Der1st), and second derivative (Der2nd), and two combined algorithms, WT-Der1st and WT-Der2nd. Fourth, on the preprocessed data, the successive projections algorithm (SPA) and stepwise regression (SR) are used for feature band extraction; the results show that the extraction efficiency of SPA not o
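
The sample set division stage described in the abstract can be sketched as follows. This is a minimal illustration and not the authors' code: it assumes the commonly cited formulation of SPXY, in which the Kennard-Stone selection rule is applied to the sum of the x-distance (spectra) and y-distance (concentration), each normalized by its maximum. The function name `spxy_split`, the helper `pairwise`, and the six-sample toy data are hypothetical.

```python
import math
from itertools import combinations


def pairwise(rows):
    """Euclidean distance matrix for a list of equal-length numeric rows."""
    n = len(rows)
    d = [[0.0] * n for _ in range(n)]
    for i, j in combinations(range(n), 2):
        d[i][j] = d[j][i] = math.dist(rows[i], rows[j])
    return d


def spxy_split(X, y, n_cal):
    """Return (calibration indices, prediction indices).

    Assumed SPXY distance: d_x / max(d_x) + d_y / max(d_y).
    With y=None only the x-distance is used, i.e. plain Kennard-Stone.
    """
    n = len(X)
    if not 2 <= n_cal <= n:
        raise ValueError("n_cal must be between 2 and the number of samples")

    dx = pairwise(X)
    max_dx = max(map(max, dx)) or 1.0  # avoid dividing by zero
    dist = [[v / max_dx for v in row] for row in dx]
    if y is not None:
        dy = pairwise([[v] for v in y])
        max_dy = max(map(max, dy)) or 1.0
        for i in range(n):
            for j in range(n):
                dist[i][j] += dy[i][j] / max_dy

    # Start from the two samples that are farthest apart.
    first, second = max(combinations(range(n), 2),
                        key=lambda p: dist[p[0]][p[1]])
    selected = [first, second]
    remaining = [i for i in range(n) if i not in selected]

    # Repeatedly add the sample whose nearest selected neighbour is farthest.
    while len(selected) < n_cal:
        best = max(remaining, key=lambda i: min(dist[i][s] for s in selected))
        selected.append(best)
        remaining.remove(best)
    return selected, remaining


# Hypothetical toy data: 6 samples, absorbance at 2 wavelengths, TN values.
X = [[0.10, 0.20], [0.12, 0.22], [0.50, 0.55],
     [0.90, 1.00], [0.52, 0.50], [0.11, 0.21]]
y = [0.5, 0.6, 2.4, 4.8, 2.6, 0.55]

cal, pred = spxy_split(X, y, n_cal=4)
print("calibration:", cal, "prediction:", pred)
```

Calling `spxy_split(X, None, n_cal)` gives the KS division for comparison, since the only difference between the two rules in this sketch is whether the concentration distance enters the metric.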