Research on the application of ensemble learning in predicting inorganic arsenic content in aquatic products


Authors: LIANG Huaixin, WANG Rundong, WANG Haoran, LIU Bin [1], ZHENG Cunfang [3]

Affiliations: [1] Physical and Chemical Testing Department, Qinhuangdao Center for Disease Control and Prevention, Qinhuangdao, Hebei 066000, China; [2] Laboratory Department, Port Hospital of Hebei Port Group, Qinhuangdao, Hebei 066000, China; [3] School of Electrical Engineering, Yanshan University, Qinhuangdao, Hebei 066044, China

Source: Occupation and Health (《职业与健康》), 2024, No. 7, pp. 913-916, 922 (5 pages in total)

Funding: Key Science and Technology Research Program of the Hebei Provincial Health Commission (20231899).

Abstract: Objective: Chronic exposure to inorganic arsenic in aquatic products seriously endangers human health, and its determination is time-consuming. To enable rapid prediction of inorganic arsenic content, an ensemble learning model based on a small sample size and a small number of features was constructed. Methods: Heavy metal test data for aquatic products in the Qinhuangdao area from 2018 to 2022 were collected. Pearson correlation analysis was performed on lead, cadmium, mercury and inorganic arsenic, and multicollinearity was tested. A stepwise regression vector combination method was used to test the goodness of fit (R²) and mean squared error (MSE) of the gradient boosting regressor (GBR) and random forest (RF) models under different feature combinations in order to select the optimal combination. Five ensemble learning algorithms were then compared in terms of model evaluation metrics, the distribution of prediction error ratios, and the target hazard quotient (THQ) to assess the feasibility of the method. Results: The four heavy metal elements were only weakly correlated, with no multicollinearity. The R² values of the RF and GBR algorithms were 89.9% and 93.3%, respectively. The extremely randomized trees (ET) model achieved R² values of 100.00%, 99.42%, 100.00% and 99.92% for shellfish, fish, crab and shrimp, respectively; its prediction errors showed the smallest deviation in the box plots, with outliers within an acceptable range, and the THQ-based dietary risk assessment indicated a safe level both before and after prediction. Conclusion: The proposed method targets rapid prediction of inorganic arsenic from a small sample size and a small number of features, and can provide a low-cost, efficient prediction technique for early warning of food safety risks.
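The workflow summarized in the abstract (correlation and multicollinearity checks, then comparing tree ensembles over feature combinations by R² and MSE) might be sketched roughly as follows. This is an illustrative sketch only, not the authors' code: the data are synthetic, the feature names and hyperparameters are assumptions, multicollinearity is checked with a variance inflation factor (the abstract does not say which test was used), and a plain exhaustive search over feature subsets stands in for the paper's stepwise regression vector combination method. Only the three ensembles named in the abstract (RF, GBR, ET) are included.

```python
from itertools import combinations

import numpy as np
from sklearn.ensemble import (
    ExtraTreesRegressor,
    GradientBoostingRegressor,
    RandomForestRegressor,
)
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in data (NOT the study's measurements). The columns are
# hypothetical lead, cadmium and mercury concentrations; the target is a
# made-up "inorganic arsenic" value depending on them plus noise.
rng = np.random.default_rng(0)
FEATURES = ["Pb", "Cd", "Hg"]
X = rng.lognormal(mean=-2.0, sigma=0.5, size=(120, 3))
y = 0.5 * X[:, 0] + 0.3 * X[:, 1] * X[:, 2] + rng.normal(0.0, 0.005, 120)


def pearson_matrix(X, y):
    """Pearson correlation matrix over all features plus the target."""
    return np.corrcoef(np.column_stack([X, y]), rowvar=False)


def vif(X):
    """Variance inflation factor per feature: 1 / (1 - R_j^2), where R_j^2
    comes from regressing feature j on the other features (with intercept)."""
    factors = []
    for j in range(X.shape[1]):
        design = np.column_stack([np.ones(len(X)), np.delete(X, j, axis=1)])
        coef, *_ = np.linalg.lstsq(design, X[:, j], rcond=None)
        resid = X[:, j] - design @ coef
        r2 = 1.0 - resid.var() / X[:, j].var()
        factors.append(1.0 / (1.0 - r2))
    return np.array(factors)


def evaluate(model, X_tr, X_te, y_tr, y_te):
    """Fit on the training split and return (R^2, MSE) on the test split."""
    model.fit(X_tr, y_tr)
    pred = model.predict(X_te)
    return r2_score(y_te, pred), mean_squared_error(y_te, pred)


corr = pearson_matrix(X, y)
vifs = vif(X)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=0)

# Model factories; hyperparameters are illustrative defaults.
models = {
    "RF": lambda: RandomForestRegressor(n_estimators=100, random_state=0),
    "GBR": lambda: GradientBoostingRegressor(random_state=0),
    "ET": lambda: ExtraTreesRegressor(n_estimators=100, random_state=0),
}

# Score every non-empty feature subset with every model.
results = {}
for k in range(1, len(FEATURES) + 1):
    for idx in combinations(range(len(FEATURES)), k):
        cols = list(idx)
        names = tuple(FEATURES[i] for i in idx)
        for model_name, make in models.items():
            results[(model_name, names)] = evaluate(
                make(), X_tr[:, cols], X_te[:, cols], y_tr, y_te
            )

# Pick the (model, feature subset) pair with the highest test R^2.
best = max(results, key=lambda key: results[key][0])
print("VIF per feature:", dict(zip(FEATURES, vifs.round(2))))
print("Best combination:", best, "R2/MSE:", results[best])
```

With only three candidate features, enumerating all seven subsets is cheap; a stepwise procedure like the one the abstract mentions becomes more attractive as the number of candidate features grows.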

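Keywords: inorganic arsenic; regression prediction; aquatic products; small sample

Classification code: R155.55 [Medicine and Health: Nutrition and Food Hygiene]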

 
