Evaluation of automated ICD-O-3 tumor morphology classification using text analysis combined with a support vector machine

Automated classification of ICD-O-3 morphology code from pathology reports using text-mining and support vector machine


Authors: PAN Jin[1], GONG Weiwei[1], FEI Fangrong[1], WANG Meng, ZHOU Xiaoyan, HU Ruying[1], ZHONG Jieming[1] (Department of Non-communicable Disease Control and Prevention, Zhejiang Provincial Center for Disease Control and Prevention, Hangzhou, Zhejiang 310051, China)

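Affiliation: [1] Department of Non-communicable Disease Control and Prevention, Zhejiang Provincial Center for Disease Control and Prevention, Hangzhou, Zhejiang 310051, China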

Source: 《预防医学》 (CHINA PREVENTIVE MEDICINE JOURNAL), 2021, Issue 3, pp. 255-258, 263 (5 pages in total)

Funding: Zhejiang Provincial Medical and Health Science and Technology Program (2018PY007, 2019KY355).

Abstract: Objective: To evaluate the accuracy of automated classification of tumor ICD-O-3 morphology codes from pathology reports using text mining combined with a support vector machine (SVM), and to provide a reference for automated tumor coding in Chinese-language settings. Methods: Tumor report cards of registered residents of Zhejiang Province from 2017 to 2019 were collected from the Chronic Disease Surveillance Information Management System of Zhejiang Province. Keywords were extracted from the pathology text according to ICD-O-3 codes, and an SVM was used for automated classification. The results were compared with classifications made by 16 professionals with more than two years of tumor coding experience; the accuracy rate, the recall rate and their harmonic mean (F-score) were calculated to evaluate classification performance. Results: A total of 83082 tumor report cards from 2017 to 2019 were included, covering 17 morphological classifications. Adenocarcinoma, squamous carcinoma and transitional cell carcinoma predominated, accounting for 52877 cases (63.65%). Text analysis yielded 1090 keywords, which formed the main corpus. The accuracy rate was 77.20%, the recall rate was 96.27% and the F-score was 85.69. Conclusion: Text mining combined with SVM can improve the efficiency of automated ICD-O-3 morphology coding; however, its accuracy needs further improvement.
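The F-score reported in the abstract can be checked from the two rates it is built from. The sketch below is a minimal illustration, not the authors' code: it assumes the "accuracy rate" plays the role of precision in the harmonic mean, and the function names and toy labels are hypothetical. The abstract does not say how the rates were aggregated across the 17 classes, so the per-label helper shows only one possible reading.

```python
def f_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall, both given as fractions in [0, 1]."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def precision_recall(machine, human, label):
    """Precision and recall for one label, treating the human coding as reference."""
    pairs = list(zip(machine, human))
    tp = sum(1 for m, h in pairs if m == label and h == label)  # both agree on label
    fp = sum(1 for m, h in pairs if m == label and h != label)  # machine only
    fn = sum(1 for m, h in pairs if m != label and h == label)  # human only
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Rates quoted in the abstract: 77.20% and 96.27%.
reported_f = round(f_score(0.7720, 0.9627) * 100, 2)
print(reported_f)

# Hypothetical toy labels (not data from the study).
machine = ["adeno", "adeno", "squamous", "adeno"]
human = ["adeno", "squamous", "squamous", "adeno"]
p, r = precision_recall(machine, human, "adeno")
print(p, r, f_score(p, r))
```

Under these assumptions the two quoted rates reproduce the reported F-score of 85.69 after rounding to two decimals.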

Keywords: tumor; pathology; text analysis; support vector machine; automated classification

Classification number: R181.2 [Medicine and Health: Epidemiology]

 
