Thematic Analysis of Time Series of Health Media Data Based on LDA Model


Authors: WU Xusheng [1]; ZHA Yadong; ZHANG Dongyun; PENG Zusheng; LIN Sheng; LIU Yufeng; HE Xiaofeng

Affiliations: [1] Shenzhen Health Development Research and Data Management Center, Shenzhen 518028, China; [2] Shenzhen Media Group, Shenzhen 518026, China

Source: Journal of Medical Informatics (《医学信息学杂志》), 2025, No. 2, pp. 62-67, 75 (7 pages)

Funding: General Program of the Natural Science Foundation of Guangdong Province (Grant No. 2022A1515012077).

Abstract: Purpose/Significance: To explore the topics of media data in the health field and their evolution trends. Method/Process: Taking 160,549 health-related media records from the Shenzhen Media Group media asset database as the research object, topic clustering analysis is performed using the latent Dirichlet allocation (LDA) model combined with time series, and the resulting topics are compared against expert experience. Result/Conclusion: A total of 25 topics strongly related to the health field are obtained and divided into 6 groups according to the evolution trend of topic intensity. The content division and intensity changes produced by topic modeling effectively reflect the occurrence and evolution of hot events in the health field. Using the LDA model for topic modeling, combined with time series to analyze topic distribution and interpret topic meaning, helps explore the application of media data in the health field and empowers public health undertakings.
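Note: The abstract does not state how topic intensity is computed. As a minimal sketch, one common convention is assumed here: the intensity of a topic in a time period is the mean of that topic's weight over all documents dated in that period. The function name `topic_intensity_by_period` and the toy document-topic weights are hypothetical and are not taken from the paper; in practice the weights would come from a fitted LDA model.

```python
from collections import defaultdict


def topic_intensity_by_period(doc_periods, doc_topic):
    """Average each topic's per-document weight within every time period.

    doc_periods: one period label per document (e.g. a year string).
    doc_topic:   one list of topic weights per document, as an LDA model
                 would produce (each list sums to roughly 1).
    Assumed definition of intensity: mean topic weight per period.
    """
    sums = {}
    counts = defaultdict(int)
    for period, weights in zip(doc_periods, doc_topic):
        if period not in sums:
            sums[period] = [0.0] * len(weights)
        for k, w in enumerate(weights):
            sums[period][k] += w
        counts[period] += 1
    # Return periods in sorted order so the series reads chronologically.
    return {p: [s / counts[p] for s in sums[p]] for p in sorted(sums)}


# Hypothetical toy data: four documents, two topics, two years.
doc_periods = ["2020", "2020", "2021", "2021"]
doc_topic = [
    [0.8, 0.2],
    [0.6, 0.4],
    [0.3, 0.7],
    [0.1, 0.9],
]

intensity = topic_intensity_by_period(doc_periods, doc_topic)
for period, values in intensity.items():
    print(period, [round(v, 2) for v in values])
```

Plotting each topic's column of `intensity` against the period labels gives the kind of intensity curve that could then be grouped by trend shape, as the abstract describes for its 6 groups.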

Keywords: health media data; latent Dirichlet allocation model; hot events; topic evolution

Classification number: R-058 [Medicine and Health]

 
