Identification and Analysis Method of Duplicate Reports of Brucellosis Based on Machine Learning in China

Authors: DONG Shuai-bing (董帅兵) [1]; WANG Li-ping (王丽萍) [2]; ZHANG Ye-wu (张业武) [3]; LI Yan-fei (李言飞) [3]

Affiliations: [1] Institute for Infectious Disease and Endemic Disease Control, Beijing Center for Disease Prevention and Control, Beijing 100013, China; [2] Division of Infectious Disease Control and Prevention, Key Laboratory for Surveillance and Early Warning of Infectious Disease, Chinese Center for Disease Control and Prevention, Beijing 102206, China; [3] Public Health Surveillance and Information Service Center, Chinese Center for Disease Control and Prevention, Beijing 102206, China

Source: Journal of Public Health and Preventive Medicine (《公共卫生与预防医学》), 2022, No. 5, pp. 29-31 (3 pages)

Funding: National Science and Technology Major Project (No. 2017ZX10303401-005).

Abstract:
Objective: To identify duplicate brucellosis case report cards ("duplicate cards") using machine learning.
Methods: We used the 499,577 brucellosis case cards reported to the China Information System for Disease Control and Prevention (the National Notifiable Disease Report System) from 2005 to 2017. Using manual identification as the reference standard, 3,785 of these cards were duplicates. After building a data set and constructing features, we trained three models: K-nearest neighbors (KNN), support vector machine (SVM), and random forest. The trained models were then used to classify the cards, and the classification results were evaluated.
Results: The areas under the ROC curve (AUC) of the KNN, SVM, and random forest models were 0.97, 0.97, and 0.98, respectively.
Conclusions: All three models identified duplicate cards well; the random forest model performed best, followed by SVM. Machine learning can effectively identify the duplicate brucellosis cards accumulated over the study period and has practical value for analyzing and managing notifiable infectious disease report data.
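The Methods step (turning card pairs into features, training a classifier, and scoring it with AUC) can be illustrated with a minimal, dependency-free Python sketch. Everything in it is an assumption. The abstract does not list the features actually used, so the card fields (`name`, `id_number`, `sex`, `onset`), the helper names `pair_features`, `knn_score`, and `auc`, and the toy data are all hypothetical. Only KNN is sketched here. SVM and random forest would normally come from a library such as scikit-learn. AUC is computed via its rank interpretation: the probability that a randomly chosen duplicate pair scores higher than a randomly chosen non-duplicate pair.

```python
import math
from datetime import date


def pair_features(a, b):
    """Hypothetical similarity features for a pair of case cards."""
    return [
        1.0 if a["name"] == b["name"] else 0.0,            # same patient name
        1.0 if a["id_number"] == b["id_number"] else 0.0,  # same ID number
        1.0 if a["sex"] == b["sex"] else 0.0,              # same sex
        # onset-date gap in days, capped at one year and scaled to [0, 1]
        min(abs((a["onset"] - b["onset"]).days), 365) / 365.0,
    ]


def knn_score(train_X, train_y, x, k=3):
    """Fraction of the k nearest labeled pairs (Euclidean) that are duplicates (label 1)."""
    dists = sorted((math.dist(x, tx), ty) for tx, ty in zip(train_X, train_y))
    return sum(label for _, label in dists[:k]) / k


def auc(scores, labels):
    """AUC as P(positive score > negative score); ties count as 0.5."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical labeled pairs: 1 = duplicate, 0 = distinct cases.
train_X = [[1, 1, 1, 0.0], [1, 1, 1, 0.02], [1, 0, 1, 0.05],
           [0, 0, 0, 1.0], [0, 0, 1, 0.5], [1, 0, 0, 0.9]]
train_y = [1, 1, 1, 0, 0, 0]

card_a = {"name": "A", "id_number": "X1", "sex": "M", "onset": date(2015, 3, 1)}
card_b = {"name": "A", "id_number": "X1", "sex": "M", "onset": date(2015, 3, 4)}
card_c = {"name": "B", "id_number": "X2", "sex": "F", "onset": date(2016, 7, 9)}

score_ab = knn_score(train_X, train_y, pair_features(card_a, card_b))
score_ac = knn_score(train_X, train_y, pair_features(card_a, card_c))
print(score_ab, score_ac)
```

Pairwise features make duplicate detection a binary classification problem over card pairs. Real data would also need blocking (for example, comparing only cards within the same region or period), because comparing all pairs of roughly 500,000 cards is impractical.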

Keywords: Brucella; National Notifiable Disease Report System; machine learning; data quality; duplicate reports

CLC number: R181 (Medicine and Health: Epidemiology)