Estimation of Soil Organic Matter Content in Wei-Ku Oasis Based on Variables Screening and Machine Learning Algorithms


Authors: LI Dun; WANG Xuemei[1,2]; LI Kunyu; AN Baisong

Affiliations: [1] College of Geographic Science and Tourism, Xinjiang Normal University, Urumqi 830054, China; [2] Xinjiang Uygur Autonomous Region Key Laboratory "Xinjiang Arid Lake Environment and Resources Laboratory", Urumqi 830054, China

Source: Earth and Environment (《地球与环境》), 2024, Issue 3, pp. 375-385 (11 pages)

Funding: National Natural Science Foundation of China (41561051); Natural Science Foundation of Xinjiang Uygur Autonomous Region (2020D01A79).

Abstract: Choosing an appropriate variable screening method and model can effectively improve the accuracy of soil organic matter content estimation. This study takes the Weigan-Kuqa River oasis (Wei-Ku Oasis) in Xinjiang as the research area. Based on Sentinel-2 satellite imagery and measured soil organic matter, correlation analysis was conducted between soil organic matter and the image bands as well as multiple spectral indices. Variables were screened using the Boruta algorithm and the Successive Projections Algorithm (SPA), and a Random Forest (RF) model and a Back Propagation Neural Network (BPNN) model were built to estimate the organic matter content of the topsoil. The results indicate that: (1) bands B3, B4, B5, B7, and B8A, together with the Transformed Vegetation Index (TVI) and the Color Index (CI), play an important role in estimating soil organic matter content; (2) variable sets screened by the Boruta algorithm alone or by SPA alone give better models than either the full variable set or the set screened by combining the two algorithms, and Boruta outperforms SPA; (3) the RF model estimates better than the BPNN model. For the optimal model, the coefficient of determination (R²) exceeds 0.74 on both the training and validation sets, the root mean square error (RMSE) is below 2.0 g/kg, and the ratio of performance to deviation (RPD) is above 1.6, indicating a good fit and a usable estimate of soil organic matter content. The Boruta algorithm combined with the random forest model can retrieve the spatial distribution of topsoil organic matter in the oasis reasonably well and provides a reference for soil nutrient evaluation in this region.
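The three accuracy measures quoted in the abstract (R², RMSE, RPD) can be sketched as follows. This is a minimal illustration, not the authors' code: the function name `r2_rmse_rpd` and the sample values are hypothetical, R² is taken here as 1 - SS_res/SS_tot (some studies report the squared Pearson correlation instead), and RPD is assumed to be the sample standard deviation of the observed values divided by the RMSE.

```python
import math


def r2_rmse_rpd(observed, predicted):
    """Return (R2, RMSE, RPD) for paired observed and predicted values.

    Assumptions (not stated in the abstract):
      - R2 = 1 - SS_res / SS_tot
      - RPD = sample standard deviation of the observations / RMSE
    """
    n = len(observed)
    if n != len(predicted) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")

    mean_obs = sum(observed) / n
    # Residual and total sums of squares
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    if ss_res == 0 or ss_tot == 0:
        raise ValueError("degenerate input: zero residuals or constant observations")

    r2 = 1.0 - ss_res / ss_tot
    rmse = math.sqrt(ss_res / n)
    sd_obs = math.sqrt(ss_tot / (n - 1))  # sample standard deviation
    rpd = sd_obs / rmse
    return r2, rmse, rpd


# Hypothetical soil organic matter values in g/kg, for illustration only
observed = [8.0, 10.0, 12.0, 14.0, 16.0]
predicted = [9.0, 10.0, 11.0, 14.0, 17.0]
r2, rmse, rpd = r2_rmse_rpd(observed, predicted)
print(f"R2={r2:.3f}  RMSE={rmse:.3f} g/kg  RPD={rpd:.3f}")
```

Applied separately to the training and validation sets, a function like this gives the quantities that the abstract compares against its reported thresholds (R² above 0.74, RMSE below 2.0 g/kg, RPD above 1.6).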

Keywords: Boruta algorithm; successive projections algorithm; random forest; BP neural network; soil organic matter

Classification: X87 [Environmental Science and Engineering: Environmental Engineering]; S151.9 [Agricultural Sciences: Soil Science]

 
