Exploration of predicting occupational injury severity based on LightGBM model and model interpretability method


Authors: MO Youhua, ZHANG Peng, GU YiShuo, ZHU Xiaojun, FAN Jingguang

Affiliations: [1] National Center for Occupational Safety and Health, National Health Commission of the People's Republic of China; NHC Key Laboratory for Engineering Control of Dust Hazard, Beijing 102308, China; [2] School of Public Health, Guangdong Pharmaceutical University, Guangzhou, Guangdong 510240, China

Source: Journal of Environmental and Occupational Medicine, 2025, Issue 2, pp. 157-164 (8 pages)

Abstract:
[Background] The light gradient boosting machine (LightGBM) algorithm has become a popular choice for building prediction models because of its efficiency and speed. However, the "black box" nature of machine learning models makes them difficult to interpret. Few studies to date have assessed occupational injury severity using a LightGBM model together with model interpretability methods.
[Objective] To evaluate the application value of the LightGBM model and model interpretability methods in predicting occupational injury.
[Methods] The U.S. Mine Safety and Health Administration (MSHA) data set of occupational injuries among mining industry workers from 1983 to 2022 was used. Injury severity (death/fatal occupational injury vs. permanent/partial disability) was the outcome variable, and 17 indicators served as predictor variables: month of occurrence, age, sex, time of accident, shift start time, interval between accident time and shift start, total work experience, total mining experience, experience at the current mine, cause of injury, accident type, activity at the time of injury (i.e., what the worker was doing when injured), source of injury, body part injured, work environment type, product category, and nature of injury. Features were screened with least absolute shrinkage and selection operator (Lasso) regression. A LightGBM model was then built to predict occupational injury, with the area under the receiver operating characteristic curve (AUC) as the primary evaluation metric; the closer the AUC is to 1, the better the model's predictive performance. Model interpretability was evaluated with Shapley additive explanations (SHAP).
[Results] Lasso regression identified 7 key influencing factors: interval between accident time and shift start, experience at the current mine, cause of injury, accident type, body part injured, nature of injury, and work environment type. The LightGBM model built on the Lasso-selected features showed good predictive performance, with an AUC of 0.9941 (95% CI: 0.9917, 0.9966), accuracy of 0.9743, specificity of 0.9781, and sensitivity of 0.9640, and the predicted probability of fatal occupational injury agreed closely with the actual probability. SHAP-based analysis of feature importance showed that body part injured and nature of injury were the two main features influencing the model's predictions, whereas the other features had smaller effects. The SHAP values for body part injured were widely distributed, especially for the head and neck and for multiple body parts...
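The Methods screen candidate predictors with Lasso regression before fitting LightGBM. The abstract does not say whether a linear or logistic Lasso was used, how categorical predictors were encoded, or how the penalty was tuned, so the following is only a minimal sketch of the general idea: least-squares Lasso solved by cyclic coordinate descent on standardised features with a fixed penalty `lam`. The function names (`soft_threshold`, `lasso_coordinate_descent`, `select_features`), the toy data and the penalty value are all hypothetical. Features whose coefficients are shrunk exactly to zero are dropped; the survivors form the selected feature set.

```python
"""Minimal Lasso feature screening via cyclic coordinate descent (illustrative sketch).

Assumptions (hypothetical, not from the study): least-squares loss, standardised
features, centred outcome, no intercept, fixed penalty `lam`.
"""
from typing import List, Sequence


def soft_threshold(z: float, gamma: float) -> float:
    """Shrink z towards zero by gamma; anything inside [-gamma, gamma] becomes exactly 0."""
    if z > gamma:
        return z - gamma
    if z < -gamma:
        return z + gamma
    return 0.0


def lasso_coordinate_descent(
    X: Sequence[Sequence[float]], y: Sequence[float], lam: float, n_iter: int = 200
) -> List[float]:
    """Minimise (1/2n) * ||y - X b||^2 + lam * ||b||_1 one coefficient at a time."""
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    residual = list(y)  # r = y - X b, and b starts at zero
    for _ in range(n_iter):
        for j in range(p):
            col = [row[j] for row in X]
            col_sq = sum(c * c for c in col) / n
            if col_sq == 0.0:
                continue  # a constant column carries no information
            # correlation of feature j with the partial residual that excludes feature j
            rho = sum(c * r for c, r in zip(col, residual)) / n + col_sq * beta[j]
            new_bj = soft_threshold(rho, lam) / col_sq
            delta = new_bj - beta[j]
            if delta != 0.0:
                residual = [r - c * delta for r, c in zip(residual, col)]
                beta[j] = new_bj
    return beta


def select_features(names: Sequence[str], beta: Sequence[float], tol: float = 1e-12) -> List[str]:
    """Keep the features whose Lasso coefficient was not shrunk to zero."""
    return [name for name, b in zip(names, beta) if abs(b) > tol]


if __name__ == "__main__":
    # Toy data: three orthogonal, standardised columns; y depends mainly on x1.
    x1 = [1, -1, 1, -1, 1, -1, 1, -1]
    x2 = [1, 1, -1, -1, 1, 1, -1, -1]
    x3 = [1, 1, 1, 1, -1, -1, -1, -1]
    X = [list(row) for row in zip(x1, x2, x3)]
    y = [3 * a + 0.2 * b for a, b in zip(x1, x2)]
    beta = lasso_coordinate_descent(X, y, lam=0.5)
    print(beta, select_features(["x1", "x2", "x3"], beta))
```

With these toy columns, a penalty of 0.5 keeps `x1` (coefficient about 2.5) and zeroes out the weak `x2` and the irrelevant `x3`. In practice the penalty is usually chosen by cross-validation, and for a binary outcome such as fatal vs. non-fatal injury a logistic (L1-penalised) loss would be a natural alternative to the least-squares form used here.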
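The abstract reports AUC with a 95% CI together with accuracy, specificity and sensitivity. The sketch below shows how these quantities are commonly computed from true labels and predicted probabilities. It assumes that label 1 marks a fatal injury and label 0 a permanent/partial disability, a decision threshold of 0.5, and a percentile bootstrap for the CI; the abstract does not state which threshold or CI method (bootstrap, DeLong, or another) the authors used, and the function names and data here are hypothetical.

```python
"""Discrimination metrics for a binary classifier (illustrative sketch).

Assumptions (hypothetical): label 1 = fatal injury, label 0 = permanent/partial
disability; a 0.5 decision threshold; a percentile bootstrap for the AUC CI.
"""
import random
from typing import Dict, Sequence, Tuple


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as P(score of a random positive > score of a random negative), ties counting 1/2."""
    pos = [s for lab, s in zip(labels, scores) if lab == 1]
    neg = [s for lab, s in zip(labels, scores) if lab == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for q in neg:
            wins += 1.0 if p > q else 0.5 if p == q else 0.0
    return wins / (len(pos) * len(neg))


def bootstrap_auc_ci(
    labels: Sequence[int], scores: Sequence[float],
    n_boot: int = 1000, alpha: float = 0.05, seed: int = 0,
) -> Tuple[float, float]:
    """Percentile bootstrap CI: resample cases with replacement and recompute the AUC."""
    rng = random.Random(seed)
    n = len(labels)
    aucs = []
    while len(aucs) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [labels[i] for i in idx]
        if 0 < sum(ys) < n:  # skip resamples that contain only one class
            aucs.append(roc_auc(ys, [scores[i] for i in idx]))
    aucs.sort()
    lo = aucs[int(alpha / 2 * n_boot)]
    hi = aucs[min(int((1 - alpha / 2) * n_boot), n_boot - 1)]
    return lo, hi


def threshold_metrics(
    labels: Sequence[int], scores: Sequence[float], threshold: float = 0.5
) -> Dict[str, float]:
    """Accuracy, sensitivity (recall on label 1) and specificity (recall on label 0)."""
    tp = fn = tn = fp = 0
    for lab, s in zip(labels, scores):
        predicted = 1 if s >= threshold else 0
        if lab == 1:
            tp += predicted
            fn += 1 - predicted
        else:
            fp += predicted
            tn += 1 - predicted
    return {
        "accuracy": (tp + tn) / len(labels),
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
    }


if __name__ == "__main__":
    y_true = [1, 1, 1, 0, 0, 0, 0, 0]
    y_prob = [0.9, 0.8, 0.3, 0.6, 0.2, 0.1, 0.4, 0.05]
    print(roc_auc(y_true, y_prob), bootstrap_auc_ci(y_true, y_prob, n_boot=200))
    print(threshold_metrics(y_true, y_prob))
```

The pairwise AUC loop costs O(n_pos × n_neg) and is written for readability; for large data sets a rank-based formula or an established library routine would be the practical choice.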
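The Results state that the predicted probability of fatal injury agreed closely with the actual probability. One common way to check this kind of agreement is a binned calibration table: group cases by predicted probability and compare the mean prediction with the observed event rate in each bin. The sketch below assumes equal-width bins and summarises the gaps as a count-weighted mean absolute difference (often called the expected calibration error). The abstract does not say how the authors assessed calibration; the function names, bin count and data are hypothetical.

```python
"""Binned calibration check for predicted probabilities (illustrative sketch).

Assumptions (hypothetical): equal-width bins on [0, 1]; label 1 = fatal injury.
"""
from typing import List, Sequence, Tuple


def calibration_table(
    labels: Sequence[int], probs: Sequence[float], n_bins: int = 10
) -> List[Tuple[float, float, int]]:
    """Per non-empty bin: (mean predicted probability, observed event rate, case count)."""
    bins: List[List[Tuple[int, float]]] = [[] for _ in range(n_bins)]
    for lab, p in zip(labels, probs):
        k = min(int(p * n_bins), n_bins - 1)  # p == 1.0 falls into the last bin
        bins[k].append((lab, p))
    table = []
    for members in bins:
        if members:
            mean_pred = sum(p for _, p in members) / len(members)
            observed = sum(lab for lab, _ in members) / len(members)
            table.append((mean_pred, observed, len(members)))
    return table


def expected_calibration_error(table: Sequence[Tuple[float, float, int]]) -> float:
    """Count-weighted mean of |mean predicted - observed| across bins; 0 means perfect agreement."""
    total = sum(count for _, _, count in table)
    return sum(count / total * abs(pred - obs) for pred, obs, count in table)


if __name__ == "__main__":
    y_true = [0, 0, 0, 1, 0, 1, 1, 1]
    y_prob = [0.05, 0.1, 0.15, 0.2, 0.8, 0.85, 0.9, 0.95]
    table = calibration_table(y_true, y_prob, n_bins=5)
    for mean_pred, observed, count in table:
        print(f"predicted {mean_pred:.2f}  observed {observed:.2f}  n={count}")
    print("ECE:", expected_calibration_error(table))
```

Plotting the first two columns of the table against each other gives the usual calibration curve; points near the diagonal indicate that predicted and observed probabilities agree.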
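SHAP attributes each prediction to the input features using Shapley values from cooperative game theory: feature i receives its weighted average marginal contribution over all coalitions of the other features, and the attributions sum to the difference between the prediction and a baseline prediction. For tree ensembles such as LightGBM, the `shap` library's `TreeExplainer` computes these values efficiently with the TreeSHAP algorithm. The sketch below instead brute-forces the definition for a tiny hypothetical model, and it assumes that features outside a coalition are set to a single baseline vector, a simplification of what SHAP explainers do with background data. Enumerating coalitions is exponential in the number of features, so this is for illustration only; `exact_shapley` and the toy model are hypothetical names.

```python
"""Exact Shapley values by coalition enumeration (illustrative sketch).

Assumption (a simplification): features outside a coalition are replaced by a
single baseline vector. Practical explainers such as shap.TreeExplainer use
TreeSHAP and background data instead of brute force.
"""
from itertools import combinations
from math import factorial
from typing import Callable, List, Sequence


def exact_shapley(
    model: Callable[[Sequence[float]], float],
    x: Sequence[float],
    baseline: Sequence[float],
) -> List[float]:
    """phi_i = sum over S not containing i of |S|!(M-|S|-1)!/M! * [v(S + {i}) - v(S)]."""
    m = len(x)

    def value(coalition: set) -> float:
        # Features in the coalition take their observed value; the rest take the baseline value.
        z = [x[k] if k in coalition else baseline[k] for k in range(m)]
        return model(z)

    phi = [0.0] * m
    for i in range(m):
        others = [k for k in range(m) if k != i]
        for size in range(m):  # coalition sizes 0 .. m-1
            weight = factorial(size) * factorial(m - size - 1) / factorial(m)
            for subset in combinations(others, size):
                s = set(subset)
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi


if __name__ == "__main__":
    # Hypothetical toy model: one additive feature plus a two-way interaction.
    def toy_model(z: Sequence[float]) -> float:
        return 2.0 * z[0] + z[1] * z[2]

    phi = exact_shapley(toy_model, x=[1.0, 1.0, 1.0], baseline=[0.0, 0.0, 0.0])
    print(phi, sum(phi))
```

For the toy model `2*z0 + z1*z2`, the additive feature receives its full effect (2.0), the interaction is split equally between `z1` and `z2` (0.5 each), and the three values sum to f(x) - f(baseline) = 3. Global feature importance, such as the ranking that placed body part injured and nature of injury first, is typically summarised as the mean absolute SHAP value of each feature across cases.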

Keywords: occupational injury; light gradient boosting machine; prediction model; model interpretability; Shapley additive explanations

CLC number: R13 [Medicine and Health: Labor Hygiene]

 
