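Affiliations: [1] School of Information Science and Technology, Beijing Forestry University, Beijing 100083 [2] Engineering Research Center for Forestry-oriented Intelligent Information Processing, National Forestry and Grassland Administration, Beijing 100083 [3] Nanjing Institute of Software Technology, Institute of Software, Chinese Academy of Sciences, Nanjing, Jiangsu 210049
Source: Spectroscopy and Spectral Analysis (《光谱学与光谱分析》), 2022, No. 8, pp. 2353-2358 (6 pages)
Funding: Supported by the Scientific Research Equipment Development Project of the Chinese Academy of Sciences (YJKYYQ20170044) and the National Natural Science Foundation of China (61772078).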
Abstract: Chemical oxygen demand (COD) is an important indicator of organic pollution in water, so detecting the COD content of a water body quickly and accurately is particularly important. Machine learning is increasingly applied to water quality inversion and has produced many results, and hyperspectral remote sensing, with its high spectral and spatial resolution and large number of imaging channels, has great potential for retrieving COD in water. This study applies different hyperspectral preprocessing methods to the raw hyperspectral data and uses the data before and after preprocessing to compare the COD inversion performance of different machine learning models and different preprocessing methods. First, 1548 samples of COD and the corresponding hyperspectral data (400~1000 nm) were collected on site in the Baodai River, Yangzhou, with a ZK-UVIR-I in-situ spectral water quality online monitor. To reduce spectral noise and eliminate scattering effects, the raw spectra were preprocessed with Savitzky-Golay (SG) smoothing, multiplicative scatter correction (MSC), and SG smoothing combined with MSC. Second, the sample set was randomly divided into a training set (80%) and a test set (20%). COD hyperspectral inversion models were built on the full-band spectra of the preprocessed training set with four machine learning methods: linear regression, random forest, AdaBoost, and XGBoost. Three metrics, the coefficient of determination (R²), root mean square error (RMSE), and relative analysis error (RPD), were used to evaluate model accuracy on the test set. The results show that random forest, AdaBoost, and XGBoost all outperform linear regression, and that XGBoost gives the best predictions whether or not the spectra are preprocessed. The XGBoost model built on spectra processed with SG smoothing and MSC is the most accurate, with R² of 0.92, RMSE of 7.1 mg·L⁻¹, and RPD of 3.4. Because the original spectra may be redundant, principal component analysis (PCA) was applied to the SG-smoothed and MSC-corrected spectra for dimensionality reduction, and the first ten principal components, with a cumulative contribution rate of 95%, were used as model inputs. With XGBoost, the PCA-based inversion model not only improved in accuracy, with RPD reaching 3.8, but also cut training time from 72 s to 2.9 s. This work can inform the building of hyperspectral water quality inversion models for this and similar waters…
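The scatter-correction step and the three evaluation metrics named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes that MSC regresses each spectrum on the mean spectrum of the sample set, and that RPD is the sample standard deviation of the measured COD values divided by the RMSE; the abstract states neither definition. The function names `msc` and `evaluate` are hypothetical.

```python
import math
import statistics


def msc(spectra):
    """Multiplicative scatter correction against the mean spectrum (assumed reference)."""
    n_bands = len(spectra[0])
    # Reference spectrum: band-wise mean over all samples.
    ref = [statistics.fmean(s[j] for s in spectra) for j in range(n_bands)]
    ref_mean = statistics.fmean(ref)
    ref_var = sum((r - ref_mean) ** 2 for r in ref)
    corrected = []
    for s in spectra:
        s_mean = statistics.fmean(s)
        # Least-squares fit: s ≈ intercept + slope * ref.
        slope = sum((r - ref_mean) * (x - s_mean) for r, x in zip(ref, s)) / ref_var
        intercept = s_mean - slope * ref_mean
        # Remove the additive and multiplicative scatter terms.
        corrected.append([(x - intercept) / slope for x in s])
    return corrected


def evaluate(y_true, y_pred):
    """Return (R^2, RMSE, RPD) for measured vs. predicted COD values."""
    n = len(y_true)
    mean_true = statistics.fmean(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    rmse = math.sqrt(ss_res / n)
    r2 = 1.0 - ss_res / ss_tot
    # Assumption: RPD = sample standard deviation of measurements / RMSE.
    rpd = statistics.stdev(y_true) / rmse
    return r2, rmse, rpd
```

In use, `msc` would be applied to the (optionally SG-smoothed) spectra before model fitting, and `evaluate` to the measured and predicted COD values of the held-out 20% test set.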