Interpreting machine learning models based on SHAP values in predicting suspended sediment concentration  


Authors: Houda Lamane [1,2], Latifa Mouhir [1], Rachid Moussadek [2,3], Bouamar Baghdad [4], Ozgur Kisi [5,6], Ali El Bilali [1,7]

Affiliations: [1] Department of Process Engineering and Environment, Faculty of Sciences and Techniques of Mohammedia, Hassan II University of Casablanca, Mohammedia 28806, Morocco; [2] Department of Environment and Natural Resources, National Institute for Agricultural Research (INRA), Rabat 10000, Morocco; [3] International Centre for Agriculture Research in the Dry Areas (ICARDA), Rabat 10000, Morocco; [4] School of Architecture and Landscape, Casablanca 20100, Morocco; [5] Department of Civil Engineering, Luebeck University of Applied Sciences, Lübeck 23562, Germany; [6] Department of Civil Engineering, Ilia State University, Tbilisi 0162, Georgia; [7] River Basin Agency of Bouregreg and Chaouia, Benslimane 13000, Morocco

Source: International Journal of Sediment Research (国际泥沙研究, English edition), 2025, Issue 1, pp. 91-107 (17 pages)

Abstract: Machine learning (ML) has become a powerful tool for predicting suspended sediment concentration (SSC). Nonetheless, the difficulty of interpreting the underlying physical process is considered the main obstacle to applying most ML approaches. In this regard, the current study presents a novel framework involving four standalone ML models (extra trees (ET), random forest (RF), categorical boosting (CatBoost), and extreme gradient boosting (XGBoost)) and their combination with genetic programming (GP). Three metrics (coefficient of correlation (r), root mean square error (RMSE), and Nash-Sutcliffe model-fit efficiency (NSE)) and a more advanced interpretation system, SHapley Additive exPlanations (SHAP), are used to assess the performance of these models applied to hydroclimatic datasets for the prediction of SSC. The calibration process was based on data from 2016 to 2020, and the validation was done on 2021 data. Further description and application of the framework are provided based on a case study of the Bouregreg watershed. The results revealed that all implemented models are efficient in SSC prediction, with NSE, RMSE, and r varying from 0.53 to 0.86, 1.20 to 2.55 g/L, and 0.83 to 0.91, respectively. Box plot diagrams confirm the enhanced performance of the combined models. The best-performing models at the four hydrological stations are the combined RF+GP model at the Aguibat Ziar station, the combined XGBoost+GP model at the Ain Loudah station, the CatBoost model at the Ras Fathia station, and the RF model at the Sidi Mᵉᵈ Cherif station. The interpretability results showed that flow (Q) and seasonality (S) are the features most impacting SSC. These outcomes indicate that the applied models can extract accurate and detailed information from the interactions between the hydroclimatic factors and the generation of sediment by erosion (output). ML approaches illustrated the good reliability and transparency of the models developed for predicting SSC in a semi-arid setting, offered new perspectives for reducing ML models' "black box" character, and provided a useful source of ...
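The three performance metrics named in the abstract (r, RMSE, and NSE) can be sketched in a few lines. The following is a minimal illustration using the standard textbook definitions, not the authors' implementation; the function names and the sample SSC values are hypothetical and invented purely for demonstration.

```python
import math


def rmse(observed, simulated):
    """Root mean square error, in the same unit as the data (e.g. g/L)."""
    n = len(observed)
    return math.sqrt(sum((o - s) ** 2 for o, s in zip(observed, simulated)) / n)


def nse(observed, simulated):
    """Nash-Sutcliffe efficiency: 1 is a perfect fit, 0 matches the
    skill of always predicting the observed mean."""
    mean_obs = sum(observed) / len(observed)
    residual = sum((o - s) ** 2 for o, s in zip(observed, simulated))
    variance = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - residual / variance


def pearson_r(observed, simulated):
    """Pearson coefficient of correlation (dimensionless)."""
    mean_obs = sum(observed) / len(observed)
    mean_sim = sum(simulated) / len(simulated)
    cov = sum((o - mean_obs) * (s - mean_sim)
              for o, s in zip(observed, simulated))
    var_obs = sum((o - mean_obs) ** 2 for o in observed)
    var_sim = sum((s - mean_sim) ** 2 for s in simulated)
    return cov / math.sqrt(var_obs * var_sim)


# Hypothetical SSC values in g/L (illustrative only, not from the study).
observed = [0.5, 1.2, 3.4, 2.1, 0.8]
simulated = [0.6, 1.0, 3.0, 2.4, 0.9]

print("r    =", round(pearson_r(observed, simulated), 3))
print("RMSE =", round(rmse(observed, simulated), 3), "g/L")
print("NSE  =", round(nse(observed, simulated), 3))
```

Note that RMSE carries the unit of SSC (g/L), whereas r and NSE are dimensionless.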

Keywords: Interpretability; Machine learning (ML); Shapley values; Suspended sediment concentration (SSC); Soil erosion; Bouregreg watershed (BW)

Classification code: R73 [Medicine and Health: Oncology]

 
