Microblog Subject Words Extract Based on Semantic Concept and Word Co-occurrence (基于语义概念和词共现的微博主题词提取研究)  Cited by: 11


Authors: ZHANG Xiao-fei (张孝飞) [1]; CHEN Hang-xing (陈航行); ZHANG Chun-hua (张春花) [1]

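Affiliations: [1] Library of Xizang Minzu University, Xianyang, Shaanxi 712082, China; [2] School of Journalism & Communication, Xizang Minzu University, Xianyang, Shaanxi 712082, China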

Source: Information Science (《情报科学》), 2021, No. 1, pp. 142-147 (6 pages)

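Funding: National Social Science Fund of China, Western Project, "Research on the Transformation of Online Public Opinion in Tibetan Areas in the We-Media Environment and Its Governance Strategies" (18XXW010); Ministry of Education Humanities and Social Sciences Research Planning Fund, Tibet Project, "Research on User Profiles of Tibetan University Libraries in the Smart Campus Environment and Their Application" (19XZJA870001); Tibet Autonomous Region Higher Education Humanities and Social Sciences Research Project, "Research on Hot Topic Discovery Methods Based on Public Opinion Analysis of Tibetan-language Online Media" (SK2017-13).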

Abstract: [Purpose/significance] To extract accurate subject words from massive volumes of microblog posts, providing a useful reference for public opinion analysis by government and enterprises. [Method/process] After analyzing the characteristics and shortcomings of traditional microblog subject word extraction methods, the paper proposes a method based on semantic concepts and word co-occurrence. The method uses a text expansion strategy to expand each microblog from a short text into a longer one, uses a semantic dictionary to extend the words in the microblog text with semantic concepts, assigns word weights according to the structural characteristics of microblog text, and then takes the degree of word co-occurrence into account to extract the subject words. [Result/conclusion] Experimental results show that the proposed microblog subject word extraction algorithm outperforms traditional methods and effectively improves extraction performance. [Innovation/limitation] Combining semantic concepts with word co-occurrence to extract microblog subject words is a new exploration. Because the word segmentation method used in the algorithm may segment some new Internet words inappropriately, the accuracy of keyword extraction is slightly affected.
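The pipeline outlined in the abstract can be illustrated with a small sketch. This is not the authors' algorithm: the abstract gives no formulas, so the toy concept dictionary `CONCEPTS`, the structural weights in `FIELD_WEIGHT`, the mixing factor `alpha`, and the scoring rule are all hypothetical choices made for illustration. The text expansion and word segmentation steps are assumed to have been done already, so each post arrives as lists of tokens.

```python
from collections import Counter
from itertools import combinations

# Hypothetical toy semantic dictionary: word -> concept label.
CONCEPTS = {"phone": "device", "handset": "device", "price": "cost", "fee": "cost"}

# Hypothetical structural weights: hashtag words count more than body words.
FIELD_WEIGHT = {"hashtag": 2.0, "body": 1.0}


def to_concepts(tokens):
    """Map each token to its concept; unknown words stand for themselves."""
    return [CONCEPTS.get(t, t) for t in tokens]


def extract_subject_words(posts, top_k=3, alpha=0.5):
    """Rank concepts by structural weight plus a co-occurrence bonus.

    posts: list of dicts mapping a field name ("hashtag", "body") to a
    list of tokens. Expansion and segmentation are assumed done upstream.
    """
    weight = Counter()  # accumulated structural weight per concept
    cooc = Counter()    # number of post-level co-occurrence links per concept
    for post in posts:
        seen = set()
        for field, tokens in post.items():
            for concept in to_concepts(tokens):
                weight[concept] += FIELD_WEIGHT.get(field, 1.0)
                seen.add(concept)
        # Every pair of distinct concepts in the same post co-occurs once.
        for a, b in combinations(sorted(seen), 2):
            cooc[a] += 1
            cooc[b] += 1
    score = {c: weight[c] + alpha * cooc[c] for c in weight}
    # Sort by descending score, breaking ties alphabetically.
    return sorted(score, key=lambda c: (-score[c], c))[:top_k]
```

In this sketch, "phone" and "handset" collapse into one concept, so their weights and co-occurrence counts pool together, which is the intuition behind extending short texts with semantic concepts before ranking.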

Keywords: microblog; subject words; semantic concept; word co-occurrence; feature words

Classification No.: G254.9 [Cultural Science - Library Science]; C912 [Economics and Management]

 
