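Clustering of Textual Subject Words Based on Co-word Analysis (times cited: 14)

Chinese title: 基于共现分析的文本主题词聚类研究 ("A Study on Clustering of Textual Subject Words Based on Co-occurrence Analysis")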

Authors: Ruan Guangce (阮光册) [1]; Xia Lei (夏磊) [2]

Affiliations: [1] Faculty of Economics and Management, East China Normal University; [2] Shanghai Library

Source: Library Journal (《图书馆杂志》), 2018, No. 11, pp. 99-104, 119 (7 pages in total)

Abstract: This paper applies co-occurrence (co-word) analysis to unstructured text documents in order to mine the semantic associations among text topics. Unlike scientific literature, such text documents lack descriptive metadata such as keywords, so the authors use a topic model (LDA) to reduce the semantic dimensionality of the texts and take the generated topic words as the objects of co-word analysis. Experiments show that medium-frequency topic words better reflect the thematic features of a text. The authors therefore combine Zipf's law with the same-frequency theory to select medium-frequency topic words, identify their semantic associations through co-word analysis, and cluster them with the K-means algorithm. In an experiment on news texts related to "innovation and entrepreneurship", the method clusters the topic words of the text collection, and a comparative analysis indicates that it better reflects the semantic relations among text topics.
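The abstract names the pipeline (topic words, medium-frequency selection via Zipf's law, co-word association, K-means) but gives no formulas, so the following is only a minimal standard-library Python sketch under stated assumptions, not the authors' implementation. Assumptions: the LDA step is skipped and each document is already a list of topic words; the high-frequency cut-off uses Donohue's Zipf-based rule T = (-1 + sqrt(1 + 8 * I1)) / 2, where I1 is the number of words occurring once, which the paper may or may not use; the medium band [2, T], the equivalence index as association strength, the farthest-point K-means initialization, and all function names and sample words are hypothetical.

```python
import math
from collections import Counter
from itertools import combinations


def donohue_threshold(freq):
    """High-frequency cut-off T = (-1 + sqrt(1 + 8 * I1)) / 2 (Donohue's
    Zipf-based rule), where I1 is the number of words that occur once."""
    i1 = sum(1 for f in freq.values() if f == 1)
    return (-1 + math.sqrt(1 + 8 * i1)) / 2


def select_mid_frequency(docs, low=2):
    """Keep words whose document frequency lies in [low, T].
    The band itself is a hypothetical choice for illustration."""
    freq = Counter(w for doc in docs for w in set(doc))
    t = donohue_threshold(freq)
    mid = sorted(w for w, f in freq.items() if low <= f <= t)
    return mid, freq, t


def equivalence_matrix(docs, words, freq):
    """Co-word association E_ij = c_ij**2 / (c_i * c_j) for the given words."""
    index = {w: i for i, w in enumerate(words)}
    co = [[0] * len(words) for _ in words]
    for doc in docs:
        present = sorted(index[w] for w in set(doc) if w in index)
        for i, j in combinations(present, 2):
            co[i][j] += 1
            co[j][i] += 1
    for w, i in index.items():
        co[i][i] = freq[w]  # self co-occurrence, so E_ii == 1
    return [[co[i][j] ** 2 / (freq[a] * freq[b]) for j, b in enumerate(words)]
            for i, a in enumerate(words)]


def sq_dist(p, q):
    return sum((x - y) ** 2 for x, y in zip(p, q))


def kmeans(points, k, iters=100):
    """Plain K-means with deterministic farthest-point initialisation."""
    centroids = [list(points[0])]
    while len(centroids) < k:
        # next seed: the point farthest from its nearest existing centroid
        far = max(points, key=lambda p: min(sq_dist(p, c) for c in centroids))
        centroids.append(list(far))
    labels = None
    for _ in range(iters):
        new_labels = [min(range(k), key=lambda c: sq_dist(p, centroids[c]))
                      for p in points]
        if new_labels == labels:
            break  # assignments unchanged: converged
        labels = new_labels
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def cluster_topic_words(docs, k):
    """Mid-frequency selection -> association matrix rows -> K-means groups."""
    words, freq, _ = select_mid_frequency(docs)
    matrix = equivalence_matrix(docs, words, freq)
    labels = kmeans(matrix, k)
    groups = {}
    for w, lab in zip(words, labels):
        groups.setdefault(lab, []).append(w)
    return sorted(groups.values())


# Hypothetical topic-word lists standing in for per-document LDA output.
DOCS = [
    ["innovation", "funding", "startup", "incubator", "policy"],
    ["innovation", "funding", "startup", "market"],
    ["innovation", "startup", "incubator", "loan"],
    ["innovation", "campus", "student", "course", "teacher"],
    ["innovation", "student", "course", "exam"],
    ["innovation", "campus", "course", "lab"],
]

print(cluster_topic_words(DOCS, k=2))
# → [['campus', 'course', 'student'], ['funding', 'incubator', 'startup']]
```

In this toy data the six one-off words give I1 = 6 and T = 3, so the ubiquitous "innovation" (frequency 6) and the one-off words are filtered out, and the two hypothetical themes fall into separate clusters. The equivalence index is used because it normalizes raw co-occurrence counts by each word's own frequency; Salton's cosine c_ij / sqrt(c_i * c_j) would be an equally plausible substitute.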

Keywords: topic model; Zipf's law; co-word analysis; topic word clustering

Classification number: G254 [Culture and Science / Library Science]
