Authors: Ruan Guangce (阮光册) [1]; Xia Lei (夏磊) [2] (Faculty of Economics and Management, East China Normal University; Shanghai Library)
Affiliations: [1] Faculty of Economics and Management, East China Normal University; [2] Shanghai Library
Source: Library Journal (《图书馆杂志》), 2018, No. 11, pp. 99-104, 119 (7 pages)
Abstract: This paper applies co-word (co-occurrence) analysis to unstructured text documents in order to mine the semantic associations among text topics. Unlike scientific literature, general text documents lack keywords and other descriptive metadata, so the authors use a topic model (LDA) to reduce the semantic dimensionality of the text and take the generated topic words as the objects of co-word analysis. Experiments show that medium-frequency topic words better reflect the thematic features of a text. The authors therefore combine Zipf's law with the same-word-frequency theory to select medium-frequency topic words, identify semantic associations through co-word analysis, and cluster the topic words with the K-means algorithm. An experiment on news texts about "innovation and entrepreneurship" clusters the topic words of the corpus, and a comparative analysis indicates that the method better captures the semantic relations among text topics.
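The co-word step named in the abstract can be illustrated with a minimal sketch. It assumes the topic words have already been selected (the LDA and medium-frequency steps are omitted) and stops before K-means clustering. The abstract does not say which association measure the authors use; the equivalence index below is a common choice in co-word analysis and is an assumption here, as are the toy documents and the function names `coword_counts` and `equivalence_index`.

```python
from collections import Counter
from itertools import combinations


def coword_counts(docs, topic_words):
    """Count document frequency per topic word and per pair of topic words."""
    vocab = set(topic_words)
    word_counts = Counter()
    pair_counts = Counter()
    for tokens in docs:
        # Topic words present in this document, sorted so each pair has one key.
        present = sorted(vocab.intersection(tokens))
        word_counts.update(present)
        pair_counts.update(combinations(present, 2))
    return word_counts, pair_counts


def equivalence_index(word_counts, pair_counts):
    """E(a, b) = c_ab^2 / (c_a * c_b); an assumed measure, not taken from the paper."""
    return {
        (a, b): c * c / (word_counts[a] * word_counts[b])
        for (a, b), c in pair_counts.items()
    }


# Hypothetical tokenized documents, for illustration only.
docs = [
    ["innovation", "policy", "startup"],
    ["innovation", "startup", "funding"],
    ["policy", "funding"],
]
topic_words = ["innovation", "startup", "policy", "funding"]

word_counts, pair_counts = coword_counts(docs, topic_words)
strength = equivalence_index(word_counts, pair_counts)
```

The resulting `strength` values could be arranged into a symmetric matrix whose rows serve as feature vectors for a clustering algorithm such as K-means.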