Authors: DONG Shuai-bing[1], WANG Li-ping[2], ZHANG Ye-wu[3], LI Yan-fei[3] (董帅兵, 王丽萍, 张业武, 李言飞)
Affiliations: [1] Institute for Infectious Disease and Endemic Disease Control, Beijing Center for Disease Prevention and Control, Beijing 100013, China; [2] Division of Infectious Disease Control and Prevention, Key Laboratory for Surveillance and Early Warning of Infectious Disease, Chinese Center for Disease Control and Prevention, Beijing 102206, China; [3] Public Health Surveillance and Information Service Center, Chinese Center for Disease Control and Prevention, Beijing 102206, China
Source: Journal of Public Health and Preventive Medicine (《公共卫生与预防医学》), 2022, No. 5, pp. 29-31 (3 pages)
Funding: National Science and Technology Major Project (No. 2017ZX10303401-005).
Abstract:
Objective: To identify duplicate brucellosis case report cards ("duplicate cards") using machine learning.
Methods: The data were the 499,577 brucellosis case cards reported to the China Information System for Disease Control and Prevention (the National Notifiable Disease Reporting System) from 2005 to 2017, of which 3,785 were identified as duplicates by the manual reference method. A data set was built from these cards and features were constructed for machine learning. Three models, KNN (K-Nearest Neighbor), support vector machine (SVM), and random forest, were trained; the resulting models were used for classification and prediction, and the classification results were evaluated.
Results: The AUC (Area Under the Curve) values of the KNN, SVM, and random forest models were 0.97, 0.97, and 0.98, respectively.
Conclusions: All three models (KNN, SVM, and random forest) identified duplicate cards well; the random forest model performed best, followed by SVM. Machine learning can effectively identify accumulated brucellosis duplicate cards and has practical value for the analysis and management of infectious disease report data.
Keywords: Brucella; National Notifiable Disease Reporting System; machine learning; data quality; duplicate reporting
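The abstract does not specify the features or how the models were implemented. As a rough illustration only, the sketch below assumes duplicate detection is framed as classifying candidate pairs of case cards. It builds a few hypothetical similarity features (the `CaseCard` fields and `pair_features` are assumptions, not the paper's schema), scores a pair with a plain k-nearest-neighbour vote, and computes AUC as the probability that a randomly chosen duplicate outscores a randomly chosen non-duplicate. The SVM and random forest models are not reproduced here.

```python
import math
from dataclasses import dataclass
from datetime import date
from difflib import SequenceMatcher


@dataclass
class CaseCard:
    # Hypothetical card fields for illustration; the paper's actual features are not given.
    name: str
    id_number: str
    birth_date: date
    onset_date: date
    address: str


def pair_features(a: CaseCard, b: CaseCard) -> list[float]:
    """Hypothetical similarity features for a candidate pair, each scaled to [0, 1]."""
    return [
        SequenceMatcher(None, a.name, b.name).ratio(),            # name similarity
        1.0 if a.id_number and a.id_number == b.id_number else 0.0,  # same ID number
        1.0 if a.birth_date == b.birth_date else 0.0,             # same birth date
        min(abs((a.onset_date - b.onset_date).days), 365) / 365.0,  # onset gap, capped at 1 year
        SequenceMatcher(None, a.address, b.address).ratio(),      # address similarity
    ]


def knn_score(x, train_X, train_y, k=5):
    """Fraction of the k nearest training pairs (Euclidean distance) labelled duplicate (1)."""
    nearest = sorted(range(len(train_X)), key=lambda i: math.dist(x, train_X[i]))[:k]
    return sum(train_y[i] for i in nearest) / len(nearest)


def auc(labels, scores):
    """ROC AUC via pairwise comparison: P(positive score > negative score), ties count 1/2."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```

In use, one would score held-out pairs with `knn_score` and pass their true labels and scores to `auc`. Scaling every feature into [0, 1] keeps any single feature from dominating the Euclidean distance that k-NN relies on.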