An Unsupervised Extractive Summary Method for Long Documents Based on TextRank and Self-Attention

Authors: Xing Ling; Cheng Bing [1]; Yan Qiang

Affiliations: [1] Academy of Mathematics and Systems Science, Chinese Academy of Sciences, Beijing 100190, China; [2] School of Mathematical Sciences, University of Chinese Academy of Sciences, Beijing 100049, China; [3] Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190, China

Source: Computer Applications and Software (《计算机应用与软件》), 2025, No. 3, pp. 274-283 (10 pages)

Funding: Project of the Key Laboratory of Random Complex Structures and Data Science, Chinese Academy of Sciences (2008DP173182).

Abstract: To address automatic summarization of long Chinese documents, this paper proposes two models, TRAI and TRAO, that combine TextRank with self-attention. TRAI takes a weighted sum of two quantities, sentence similarity based on the number of co-occurring words and sentence relevance derived from self-attention, and uses the result as the TextRank edge weight in the iterative computation that scores each sentence. TRAO scores sentences twice: once with standard TextRank, and once with a TextRank whose edge weights are cosine similarities between distributed sentence vectors that self-attention has re-encoded to incorporate information from the whole document. The weighted sum of these two scores is the final score of each sentence. Both models rank sentences by score to obtain a candidate summary. To reduce redundancy, Maximal Marginal Relevance (MMR) is then used to select the summary sentences from the candidates. In experiments on a constructed long-document dataset, both models improve significantly over TextRank on the ROUGE metrics.
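The TRAI pipeline described in the abstract (fused edge weights, TextRank iteration, MMR selection) can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the abstract does not give formulas, so the overlap similarity uses the classic TextRank log-length normalization, the self-attention relevance is an unparameterized scaled dot-product softmax over sentence vectors that are assumed to be given, and all function names, the mixing weight `alpha`, the damping factor, and the MMR trade-off `lam` are hypothetical choices.

```python
import math


def overlap_similarity(a, b):
    """Shared tokens divided by the sum of log lengths (classic TextRank form)."""
    denom = math.log(len(a)) + math.log(len(b))
    return len(set(a) & set(b)) / denom if denom > 0 else 0.0


def attention_relevance(vectors):
    """Row-wise softmax of scaled dot products (assumed: no learned projections)."""
    dim = len(vectors[0])
    rows = []
    for q in vectors:
        logits = [sum(x * y for x, y in zip(q, k)) / math.sqrt(dim) for k in vectors]
        peak = max(logits)  # subtract the max for numerical stability
        exps = [math.exp(v - peak) for v in logits]
        total = sum(exps)
        rows.append([e / total for e in exps])
    return rows


def fused_weights(tokens, vectors, alpha=0.5):
    """TRAI-style edge weights: alpha * overlap + (1 - alpha) * attention."""
    n = len(tokens)
    rel = attention_relevance(vectors)
    return [
        [
            0.0 if i == j
            else alpha * overlap_similarity(tokens[i], tokens[j]) + (1 - alpha) * rel[i][j]
            for j in range(n)
        ]
        for i in range(n)
    ]


def textrank(weights, damping=0.85, iterations=50):
    """Weighted PageRank; weights[j][i] is the weight of the edge from j to i."""
    n = len(weights)
    out_sum = [sum(row) for row in weights]
    scores = [1.0 / n] * n
    for _ in range(iterations):
        scores = [
            (1 - damping) / n
            + damping * sum(
                weights[j][i] / out_sum[j] * scores[j]
                for j in range(n)
                if out_sum[j] > 0
            )
            for i in range(n)
        ]
    return scores


def mmr_select(scores, similarity, k, lam=0.7):
    """Greedy MMR: trade a sentence's score against its redundancy with picks so far."""
    selected = []
    remaining = list(range(len(scores)))
    while remaining and len(selected) < k:
        def gain(i):
            redundancy = max((similarity[i][j] for j in selected), default=0.0)
            return lam * scores[i] - (1 - lam) * redundancy
        best = max(remaining, key=gain)
        selected.append(best)
        remaining.remove(best)
    return selected


# Toy input: pre-tokenized sentences and made-up 2-d sentence vectors.
tokens = [
    ["textrank", "ranks", "sentences"],
    ["attention", "relates", "sentences"],
    ["mmr", "removes", "redundant", "sentences"],
]
vectors = [[1.0, 0.0], [0.8, 0.6], [0.0, 1.0]]

weights = fused_weights(tokens, vectors, alpha=0.5)
scores = textrank(weights)
print(mmr_select(scores, weights, k=2))
```

For Chinese text the token lists would come from a word segmenter or from individual characters, and the sentence vectors from whatever encoder is available. A TRAO-style variant would instead run `textrank` twice, once on overlap weights and once on cosine similarities of the attention-re-encoded vectors, and combine the two score lists with a weighted sum before calling `mmr_select`.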

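Keywords: Chinese long-text summarization; TextRank; self-attention mechanism; distributed vector representation; semantic information; fusion of document information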

Classification: TP391 [Automation and Computer Technology: Computer Application Technology]
