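Affiliations: [1] School of Public Health, Guangdong Pharmaceutical University, Guangzhou, Guangdong 510240, China [2] National Center for Occupational Safety and Health, National Health Commission, Beijing 102308, China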
Source: Journal of Environmental and Occupational Medicine (《环境与职业医学》), 2023, No. 10, pp. 1115-1120 (6 pages)
Abstract: [Background] Identifying and analyzing the factors that influence occupational injury is an important application of feature selection. With the rise of machine learning algorithms, combining feature selection with Boosting-based models offers a new approach to predicting occupational injury. [Objective] To evaluate the applicability of Boosting-based models in predicting the severity level of miners' non-fatal occupational injuries, and to provide a basis for predicting that severity level in a sound and rational way. [Methods] Publicly available data from the US Mine Safety and Health Administration (MSHA) on non-fatal occupational injuries among metal miners from 2001 to 2021 were used. The outcome variable was injury severity defined by lost workdays: <105 d as minor injury and ≥105 d as serious injury. Four feature selection methods, namely least absolute shrinkage and selection operator (Lasso) regression, stepwise regression, single factor + Lasso regression, and single factor + stepwise regression, were used to screen out four different feature sets. Two Boosting-based models, gradient boosting decision tree (GBDT) and extreme gradient boosting (XGBoost), were selected, and each of the four feature sets was used to train logistic regression, GBDT, and XGBoost models, yielding 12 prediction models of the severity level of miners' non-fatal occupational injuries. The area under the curve (AUC), sensitivity, specificity, and Youden index served as the main evaluation metrics. [Results] Across the four feature selection methods, eight features were identified as the main factors influencing the severity level of miners' non-fatal occupational injuries: age, time of accident occurrence, total length of service, cause of injury, activity at the time of injury, injured body part, nature of injury, and outcome of injury. Feature set 4, screened by single factor + stepwise regression, was the optimal feature set, and the GBDT model built on it showed the best predictive performance for injury severity, with specificity, sensitivity, and Youden index of 0.7530, 0.9490, and 0.7020, respectively. The AUC values of the logistic regression, GBDT, and XGBoost models built on feature set 4 were 0.8526 (95%CI: 0.8387~0.8750), 0.8640 (95%CI: 0.8474~0.8806), and 0.8603 (95%CI: 0.8439~0.8773), respectively, all higher than those of the corresponding models built on feature set 2, which was screened by stepwise regression [0.8487 (95%CI: 0.8203~0.8669), 0.8110 (95%CI: 0.8012~0.8344), and 0.8439 (95%CI: 0.8245~0.8561)]. With feature set 4, both GBDT and XGBoost achieved higher AUC values than logistic regression. [Conclusion] Two feature selection…
Keywords: non-fatal occupational injury; machine learning; Boosting algorithm; feature selection; lost workdays
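The outcome definition and evaluation metrics named in the abstract can be illustrated with a minimal sketch. It is not the authors' code: the function names `severity_label` and `evaluate`, and the toy lost-workday and prediction values, are hypothetical and chosen only for illustration. The 105-day cut-off and the reported specificity and sensitivity (0.7530 and 0.9490) are taken from the abstract. The sketch assumes both severity classes are present in the data, so no denominator is zero.

```python
SEVERE_THRESHOLD_DAYS = 105  # cut-off stated in the abstract


def severity_label(lost_workdays):
    """Return 1 (serious injury) if lost workdays >= 105, else 0 (minor injury)."""
    return 1 if lost_workdays >= SEVERE_THRESHOLD_DAYS else 0


def evaluate(y_true, y_pred):
    """Compute sensitivity, specificity, and Youden index from binary labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    sensitivity = tp / (tp + fn)  # share of serious injuries correctly flagged
    specificity = tn / (tn + fp)  # share of minor injuries correctly flagged
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "youden_index": sensitivity + specificity - 1,
    }


# Hypothetical toy data, not drawn from the MSHA records.
lost_days = [3, 120, 45, 200, 105, 10, 0, 150]
y_true = [severity_label(d) for d in lost_days]
y_pred = [0, 1, 1, 1, 0, 0, 0, 1]  # hypothetical model predictions
metrics = evaluate(y_true, y_pred)
print(metrics)

# Consistency check on the values reported in the abstract:
# Youden index = sensitivity + specificity - 1.
reported_youden = 0.9490 + 0.7530 - 1
print(round(reported_youden, 4))
```

The last two lines confirm that the Youden index reported for the GBDT model on feature set 4 follows from the reported sensitivity and specificity.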