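Affiliations: [1] College of Agronomy, Zhengzhou University, Zhengzhou 450001 [2] Henan Soil and Fertilizer Station, Zhengzhou 450002 [3] School of Politics and Public Administration, Zhengzhou University, Zhengzhou 450001
Source: Acta Pedologica Sinica (《土壤学报》), 2023, No. 6, pp. 1595-1609 (15 pages)
Funding: Supported by the National Key Research and Development Program of China (2021YFD1700900).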
Abstract: 【Objective】With soil resources under high-intensity use, digital soil mapping has become an effective way to obtain and characterize soil information quickly, efficiently and accurately. The accuracy and reliability of soil spatial prediction and digital mapping are jointly constrained by multiple factors, including soil sample size, sampling strategy, choice of prediction model, the complexity of the geomorphology and soil-forming environment in the target region, and the quality of covariate data. 【Method】Taking Henan Province as the study region, we combined nine soil sample sizes with five sampling methods and applied five of the most representative machine learning (ML) algorithms to spatially predict and digitally map the topsoil pH of croplands. We then compared how sample size and sampling method affected the performance of the ML models and the accuracy of topsoil pH prediction. 【Result】(1) As the soil sample size rose from 200 through 400, 800, 1200 and 1600 to 2000, the performance and prediction accuracy of all ML models showed a general trend of rapid increase, regardless of the sampling method. Once the sample size reached and exceeded 2000, the performance of most ML models stabilized and the gain in prediction accuracy slowed, suggesting that 2000 soil samples may be the sample size threshold for these ML models to predict cropland topsoil pH in the study region. (2) The five ML models differed markedly in performance and in topsoil pH prediction accuracy. The tree-based models, Random forests (RF) and Cubist, performed best: whichever sampling method was used, once the sample size exceeded 2000 the coefficient of determination (R²) achieved by the two models held steady between 0.75 and 0.80 and the RMSE stayed below 0.50. (3) When the soil sample size was large enough, the sampling method had little effect on ML model performance or topsoil pH prediction accuracy, and the five sampling methods gave similar results. When the sample size fell below 2000, the influence of the sampling method gradually became apparent. By comparison, conditioned Latin hypercube sampling had an advantage at small sample sizes: with 1000 samples it still kept the R² of RF and Cubist predictions at about 0.80, and with as few as 200 samples the R² of all five ML models remained above 0.55. (4) Uncertainty analysis showed that, on average, 73.9% of the topsoil pH observations at validation sites fell within the 90% prediction interval of the RF model, indicating that the model's reliability was slightly overestimated but remained within an acceptable range. In addition, the data showed no clear relationship between prediction uncertainty and sample size.
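Keywords: soil spatial prediction; digital soil mapping; machine learning; sample size; sampling method; soil pH
Classification: S159.9 [Agricultural Science: Soil Science] P934 [Agricultural Science: Basic Agricultural Science]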
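The abstract reports three evaluation quantities: the coefficient of determination (R²), the RMSE, and the share of validation observations that fall inside the model's 90% prediction interval. The sketch below shows one common way to compute them. It is an illustration only and not the authors' code: the function names (`r2_score`, `rmse`, `picp`) and all pH values and interval bounds are hypothetical, and how the study derived its prediction intervals is not stated in this record.

```python
import math


def r2_score(observed, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


def rmse(observed, predicted):
    """Root mean squared error, in the units of the target (here pH units)."""
    squared = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def picp(observed, lower, upper):
    """Prediction interval coverage: percent of observations inside [lower, upper]."""
    inside = sum(1 for o, lo, hi in zip(observed, lower, upper) if lo <= o <= hi)
    return 100.0 * inside / len(observed)


# Hypothetical validation data (made up for illustration, not from the study).
ph_observed = [6.2, 7.8, 5.5, 8.1, 6.9]
ph_predicted = [6.0, 7.5, 6.1, 8.0, 7.0]
pi_lower = [5.6, 7.0, 5.7, 7.4, 6.3]  # assumed lower bounds of a 90% interval
pi_upper = [6.6, 8.0, 6.5, 8.6, 7.5]  # assumed upper bounds of a 90% interval

r2_value = r2_score(ph_observed, ph_predicted)
rmse_value = rmse(ph_observed, ph_predicted)
coverage = picp(ph_observed, pi_lower, pi_upper)

print(f"R2 = {r2_value:.3f}, RMSE = {rmse_value:.3f}, coverage = {coverage:.1f}%")
```

When the coverage is below the nominal level, as with the 73.9% reported against a 90% interval, the intervals are narrower than they should be, which is why the abstract describes the model's reliability as slightly overestimated.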