Improved RAKE Algorithm for Automatic Keyword Extraction in Chinese Short Text  Cited by: 11


Authors: CHEN Ke-jia; HUANG Si-yi (School of Economics and Management, Fuzhou University, Fuzhou 350108, China)

Affiliation: [1] School of Economics and Management, Fuzhou University, Fuzhou 350108, China

Source: Journal of Chinese Computer Systems, 2021, No. 6, pp. 1171-1175 (5 pages)

Funding: Supported by the National Natural Science Foundation of China (71701019).

Abstract: When applied to keyword extraction from Chinese short text, the RAKE (Rapid Automatic Keyword Extraction) algorithm does not consider word semantics and tends to produce overly long candidate keywords. To address these problems, an improved method based on RAKE is proposed. In the word feature value calculation stage, a co-occurrence matrix is constructed from term distance, inter-word relation frequency, and co-occurrence frequency, and the feature value of each candidate keyword is computed with a contextual value formula. Candidate keywords are output in descending order of feature value; if a candidate keyword contains more than n words, a window output algorithm limits its length. Experiments show that the proposed method outperforms RAKE and other algorithms in keyword extraction from Chinese short text.
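The ranking-and-truncation flow described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' method: the abstract does not give the contextual value formula or the weighting by term distance and inter-word relation frequency, so the sketch falls back on the classic RAKE word score deg(w)/freq(w). The function `best_window`, the default `n=3`, and the assumption that the text is already segmented into tokens are all hypothetical choices made for illustration only.

```python
from collections import defaultdict


def split_candidates(tokens, stopwords):
    """Split a token sequence into candidate phrases at stopwords."""
    phrases, current = [], []
    for tok in tokens:
        if tok in stopwords:
            if current:
                phrases.append(tuple(current))
            current = []
        else:
            current.append(tok)
    if current:
        phrases.append(tuple(current))
    return phrases


def word_scores(phrases):
    """Classic RAKE word score deg(w) / freq(w); NOT the paper's contextual value."""
    freq = defaultdict(int)
    degree = defaultdict(int)
    for phrase in phrases:
        for word in phrase:
            freq[word] += 1
            # Co-occurrences inside the phrase, counting the word itself.
            degree[word] += len(phrase)
    return {w: degree[w] / freq[w] for w in freq}


def best_window(phrase, scores, n):
    """Hypothetical length limit: keep the best-scoring contiguous n-word window."""
    if len(phrase) <= n:
        return phrase
    windows = [phrase[i:i + n] for i in range(len(phrase) - n + 1)]
    return max(windows, key=lambda w: sum(scores[x] for x in w))


def extract_keywords(tokens, stopwords, n=3):
    """Rank candidate phrases by summed word score, truncated to at most n words."""
    phrases = split_candidates(tokens, stopwords)
    scores = word_scores(phrases)
    ranked = {}
    for phrase in phrases:
        kept = best_window(phrase, scores, n)
        ranked[kept] = sum(scores[w] for w in kept)
    return sorted(ranked.items(), key=lambda kv: kv[1], reverse=True)
```

For Chinese input, `tokens` would come from a word segmenter run beforehand. Swapping `word_scores` for a function that implements the paper's co-occurrence weighting would leave the rest of the pipeline unchanged.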

Keywords: RAKE algorithm; automatic keyword extraction; context; window output

Classification: TP391 [Automation and Computer Technology: Computer Application Technology]

 
