Research on Automatic Digest Algorithm of Web Blog based on Keyword Extraction

Cited by: 2


Authors: LI Min [1]; TAO Hongcai [1]

Affiliation: [1] School of Information Science & Technology, Southwest Jiaotong University, Chengdu, Sichuan 611756, China

Source: Journal of Chengdu University of Information Technology, 2020, No. 2, pp. 158-162 (5 pages)

Funding: National Natural Science Foundation of China (61806170).

Abstract: The TextRank algorithm is graph-based and takes the overall structure of a text into account, while keywords are closely tied to the topic of a text. As an emerging form of publishing, web blogs differ from news articles and academic papers: they are edited more casually and follow no conventional general format. This paper combines keyword extraction with the TextRank algorithm and proposes a keyword-extraction-based automatic summarization algorithm suited to blog text. First, keywords are extracted with TextRank, and sentence similarity is computed with the BM25 algorithm. Next, a weighted graph is built with sentence similarity as the edge weight, and TextRank scores are obtained by iterative computation. The TextRank score and the keyword score are then added to give each sentence's final score; the i highest-scoring sentences are selected and output in their original order to form the summary. Comparison experiments evaluated with the ROUGE toolkit show that the algorithm performs well.
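The pipeline described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the tokenizer, the co-occurrence window, the BM25 parameters `k1` and `b`, the damping factor, the definition of a sentence's keyword score (sum of the TextRank scores of the keywords it contains), and the normalization before adding the two scores are all assumptions, since the abstract does not specify them. All function names are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(sentence):
    # Assumption: simple lowercase word tokens; blog text in Chinese would
    # need a proper word segmenter and stop-word filtering instead.
    return re.findall(r"[a-z0-9]+", sentence.lower())


def textrank(weights, damping=0.85, iterations=50):
    # Weighted TextRank iteration; weights[j][i] is the weight of edge j -> i.
    n = len(weights)
    out_sum = [sum(row) for row in weights]
    scores = [1.0] * n
    for _ in range(iterations):
        scores = [
            (1 - damping) + damping * sum(
                weights[j][i] / out_sum[j] * scores[j]
                for j in range(n) if out_sum[j] > 0
            )
            for i in range(n)
        ]
    return scores


def extract_keywords(token_lists, top_k=5, window=3):
    # Keyword TextRank over a word co-occurrence graph (window size assumed).
    vocab = sorted({w for tokens in token_lists for w in tokens})
    index = {w: i for i, w in enumerate(vocab)}
    weights = [[0.0] * len(vocab) for _ in vocab]
    for tokens in token_lists:
        for pos, word in enumerate(tokens):
            for other in tokens[pos + 1:pos + window]:
                if other != word:
                    weights[index[word]][index[other]] += 1.0
                    weights[index[other]][index[word]] += 1.0
    scores = textrank(weights)
    ranked = sorted(vocab, key=lambda w: scores[index[w]], reverse=True)
    return {w: scores[index[w]] for w in ranked[:top_k]}


def bm25(query, doc, idf, avg_len, k1=1.5, b=0.75):
    # BM25 score of sentence `doc` for the words of sentence `query`.
    tf = Counter(doc)
    score = 0.0
    for word in query:
        f = tf.get(word, 0)
        if f == 0:
            continue
        norm = f + k1 * (1 - b + b * len(doc) / avg_len)
        score += idf[word] * f * (k1 + 1) / norm
    return score


def summarize(sentences, top_i=2, top_k=5):
    if not sentences:
        return []
    token_lists = [tokenize(s) for s in sentences]
    n = len(sentences)
    avg_len = sum(len(t) for t in token_lists) / n
    df = Counter(w for t in token_lists for w in set(t))
    idf = {w: math.log(1 + (n - c + 0.5) / (c + 0.5)) for w, c in df.items()}

    # Weighted sentence graph: edge weight = BM25 similarity.
    weights = [
        [0.0 if i == j else bm25(token_lists[i], token_lists[j], idf, avg_len)
         for j in range(n)]
        for i in range(n)
    ]
    rank = textrank(weights)

    # Assumed keyword score: sum of keyword TextRank scores in the sentence.
    keywords = extract_keywords(token_lists, top_k)
    kw = [sum(keywords.get(w, 0.0) for w in set(t)) for t in token_lists]

    # Assumption: scale both scores to [0, 1] before adding them.
    max_rank, max_kw = max(rank) or 1.0, max(kw) or 1.0
    final = [r / max_rank + k / max_kw for r, k in zip(rank, kw)]

    # Pick the top-i sentences, then restore original order.
    chosen = sorted(sorted(range(n), key=lambda i: final[i], reverse=True)[:top_i])
    return [sentences[i] for i in chosen]
```

Calling `summarize(sentences, top_i=3)` on a list of sentence strings returns three of them in their original order. How the two scores are weighted against each other is a design choice the abstract leaves open; the equal-weight sum after normalization used here is only one possibility.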

Keywords: automatic summarization; TextRank; keywords; BM25; ROUGE

Classification: TP391.1 [Automation and Computer Technology: Computer Application Technology]

 
