Exploration of the Change Trend Analysis of the Epidemiological History of COVID-19 Patients Based on Natural Language Processing

Times cited: 4


Authors: FEI Xiao-lu (费晓璐)[1], JIANG Lan (江澜), CHEN Peng-yu (陈鹏宇), LI Jia (李嘉)[1], WEI Lan (魏岚)[1], JIANG Rui (江瑞)[2], LÜ Hai-rong (闾海荣) (Xuanwu Hospital, Capital Medical University, Beijing 100053, P.R.C.)

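Affiliations: [1] Xuanwu Hospital, Capital Medical University, Xicheng District, Beijing 100053; [2] Department of Automation, Tsinghua University / Beijing National Research Center for Information Science and Technology, Haidian District, Beijing 100084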

Source: China Digital Medicine (《中国数字医学》), 2020, Issue 5, pp. 76-78, 106 (4 pages in total)

Funding: Beijing Natural Science Foundation, Haidian Original Innovation Joint Fund (Grant No. L192047).

Abstract:

Objective: To explore how the contact history of confirmed COVID-19 patients in Shaanxi Province changed from January 23, 2020 onward, and which places deserved attention in different periods; and to verify the feasibility of applying natural language processing (NLP) and big data analysis techniques to epidemiological history analysis.

Methods: Epidemic data for Shaanxi Province released by the Health Commission from January 23 to February 20, 2020 were collected. NLP was used to segment the descriptions of confirmed patients into words, and the daily frequency of hot words was calculated and ranked. Word frequency and co-word frequency statistics were then used to analyze the trend in the epidemiological history of confirmed patients.

Results: In the case descriptions from the first 8 days after January 23, words signaling imported cases, such as "return", had high frequencies. From day 9 onward, the frequencies of kinship terms, which signal family-centered clustered outbreaks, rose markedly. From day 14 onward, the frequencies of terms reflecting clustered outbreaks in other places also rose markedly. Within the province, the most frequent locations of onset included Xi'an and Ankang; outside the province, the places most frequently associated with Shaanxi's confirmed cases were Wuhan/Hubei, Xiaogan, and Hangzhou.

Conclusion: Word frequency analysis indicates that the epidemic in Shaanxi Province turned from imported to clustered cases at around February 1. Transmission within families was the main mode of transmission, and other transmission settings warranted attention in the later period. This study also verifies that big data analysis techniques such as NLP and word frequency analysis can build on classical epidemiological history analysis to offer new approaches and new forms of presentation.
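The daily word-frequency ranking and co-word statistics described in the Methods can be sketched as follows. This is a minimal illustration, not the authors' code: it assumes the case descriptions have already been segmented into word lists (the abstract does not name the segmentation tool) and that each record carries a day index. The record format, the function names `daily_word_frequencies` and `co_word_frequencies`, and the sample tokens are all hypothetical. Co-word frequency is taken here as the number of case descriptions in which two words appear together, which is one common definition.

```python
from collections import Counter
from itertools import combinations

# Hypothetical record format (not from the paper): (day_index, tokens), where
# day_index 1 is 2020-01-23 and tokens are the already-segmented words of one
# published case description.
Record = tuple[int, list[str]]


def daily_word_frequencies(records: list[Record]) -> dict[int, list[tuple[str, int]]]:
    """Count word occurrences per day and rank words by frequency within each day."""
    per_day: dict[int, Counter] = {}
    for day, tokens in records:
        per_day.setdefault(day, Counter()).update(tokens)
    # most_common() sorts by count, descending; ties keep first-seen order.
    return {day: counts.most_common() for day, counts in sorted(per_day.items())}


def co_word_frequencies(records: list[Record]) -> Counter:
    """Count, for each unordered pair of distinct words, the records containing both."""
    pairs: Counter = Counter()
    for _, tokens in records:
        # Deduplicate and sort so each pair is counted once per record
        # and is always stored under the same key order.
        unique = sorted(set(tokens))
        pairs.update(combinations(unique, 2))
    return pairs


if __name__ == "__main__":
    # Toy, made-up tokens for illustration only.
    sample = [
        (1, ["Wuhan", "return", "Xi'an"]),
        (1, ["Wuhan", "return"]),
        (9, ["wife", "confirmed", "Xi'an"]),
    ]
    print(daily_word_frequencies(sample)[1][:2])  # [('Wuhan', 2), ('return', 2)]
    print(co_word_frequencies(sample)[("Wuhan", "return")])  # 2
```

Tracking how the rank of category words (for example import-related words versus kinship terms) shifts across day indices is what would expose a turning point of the kind the abstract reports; how the authors grouped words into such categories is not stated here.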

Keywords: natural language processing; word frequency analysis; COVID-19; epidemiological history

CLC numbers: R319 [Medicine and Health: Basic Medicine]; TP391 [Automation and Computer Technology: Computer Application Technology]
