Keyword Extraction Based on Word-Node Removal in Complex Networks  Cited by: 1

Extracting Keywords Based on Removed Network Word Nodes


Authors: Wang An; Gu Yijun; Li Kunming; Li Wenzheng (College of Information Technology and Cyber Security, People's Public Security University of China, Beijing 102600, China)

Affiliation: [1] College of Information Technology and Cyber Security, People's Public Security University of China

Source: Data Analysis and Knowledge Discovery, 2019, No. 11, pp. 35-42 (8 pages)

Funding: An outcome of the National Key R&D Program of China (Grant No. 2017YFC0820100)

Abstract: [Objective] This study integrates word-node removal into the TextRank algorithm to improve keyword extraction from Chinese text. [Methods] We propose RemoveRank, an improved algorithm for Chinese keyword extraction. It introduces word-node removal and alternates between a ranking step and a removal step, taking into account the complex-network structure of the word graph. The removal queue is used as the ranking of word nodes, from which keywords are extracted. [Results] We evaluated the method on a keyword-annotated dataset from the Southern Weekend website. Introducing word-node removal outperformed the traditional algorithm: when 3, 5, and 7 keywords were extracted, the F-value was 4%, 6%, and 5% higher, respectively, than that of TextRank. [Limitations] The word graph only records whether two word nodes are connected; edge weights between word nodes are not yet considered. [Conclusions] With an appropriate sliding-window size, RemoveRank can extract keywords effectively.
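The alternating rank-and-remove loop described in the abstract can be illustrated with a minimal Python sketch. The abstract does not say which node is removed in each round or how ties are resolved, so the sketch assumes that the highest-scoring node is removed and that ties are broken alphabetically. The function names (`build_word_graph`, `rank_scores`, `remove_rank`), the damping factor 0.85, and the fixed iteration count are also illustrative assumptions rather than details taken from the paper. Input is assumed to be already segmented into words.

```python
from collections import defaultdict


def build_word_graph(words, window=3):
    """Unweighted, undirected co-occurrence graph.

    Two distinct words are linked if they appear within the same
    sliding window of `window` consecutive tokens (an assumption).
    """
    graph = defaultdict(set)
    for i, w in enumerate(words):
        graph[w]  # make sure every word becomes a node, even if isolated
        for v in words[i + 1:i + window]:
            if v != w:
                graph[w].add(v)
                graph[v].add(w)
    return {w: set(nbrs) for w, nbrs in graph.items()}


def rank_scores(graph, damping=0.85, iterations=50):
    """TextRank-style scores on an unweighted graph (fixed-count power iteration)."""
    n = len(graph)
    scores = {w: 1.0 / n for w in graph}
    for _ in range(iterations):
        new_scores = {}
        for w in graph:
            # Each neighbour shares its score equally among its own neighbours.
            incoming = sum(scores[v] / len(graph[v]) for v in graph[w])
            new_scores[w] = (1 - damping) / n + damping * incoming
        scores = new_scores
    return scores


def remove_rank(words, window=3):
    """Alternate ranking and removal; return the removal queue as the ranking.

    Assumption: each round removes the currently highest-scoring node.
    """
    graph = build_word_graph(words, window)
    removal_queue = []
    while graph:
        scores = rank_scores(graph)
        # Iterating in sorted order makes ties resolve alphabetically.
        top = max(sorted(graph), key=lambda w: scores[w])
        removal_queue.append(top)
        for v in graph.pop(top):
            graph[v].discard(top)
    return removal_queue


# Hypothetical toy input: "hub" co-occurs with every other token.
tokens = ["a", "hub", "b", "hub", "c", "hub", "d"]
top_keywords = remove_rank(tokens, window=2)[:3]
```

In this toy input the word "hub" is adjacent to every other token, so it is removed first. Re-ranking after each removal lets the scores reflect the structure of the remaining graph, which is the intuition the abstract gives for using the removal queue as the final ordering.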

Keywords: keyword extraction; TextRank; graph model; word nodes; subgraph partitioning

Classification code: TP391 [Automation and Computer Technology: Computer Application Technology]

 
