Authors: Xing Ling (邢玲); Cheng Bing (程兵)[1]; Yan Qiang (闫强)
Affiliations: [1] Academy of Mathematics and Systems Science, Chinese Academy of Sciences, Beijing 100190, China; [2] School of Mathematical Sciences, University of Chinese Academy of Sciences, Beijing 100049, China; [3] Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190, China
Source: Computer Applications and Software (《计算机应用与软件》), 2025, No. 3, pp. 274-283 (10 pages)
Funding: Project of the Key Laboratory of Random Complex Structures and Data Science, Chinese Academy of Sciences (2008DP173182).
Abstract: To address automatic text summarization of long Chinese documents, two models that integrate TextRank with self-attention are proposed: TRAI and TRAO. TRAI takes a weighted sum of sentence similarity, computed from the number of co-occurring characters, and sentence relevance, computed from self-attention, and uses it as the TextRank edge weight in the iterative calculation that scores sentences. TRAO first scores sentences with TextRank. It then uses self-attention to re-represent each sentence as a distributed vector that incorporates information from the whole document, computes the cosine similarity between these vectors, and uses it as the TextRank edge weight in a second iterative scoring. The weighted sum of the two scores is the final score of each sentence. Both models rank sentences by score to obtain a candidate summary. To remove redundancy, the maximal marginal relevance (MMR) method is used to select summary sentences from the candidate summary. Both models were evaluated on a long-document dataset constructed by the authors. Compared with TextRank, the proposed methods show a significant improvement on the ROUGE evaluation metrics.
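The pipeline described in the abstract (blended edge weights, TextRank iteration, MMR selection) can be illustrated with a minimal sketch. This is not the authors' implementation. The function names `blend_weights`, `textrank`, and `mmr_select`, the mixing weight `alpha`, the damping factor, the MMR trade-off `lam`, and the toy matrices are all assumptions made for illustration. In particular, the self-attention relevance matrix is supplied as hypothetical numbers rather than computed by a model, and MMR is adapted so that the TextRank score plays the role of relevance.

```python
def blend_weights(sim_a, sim_b, alpha):
    """Weighted sum of two sentence-by-sentence matrices (TRAI-style edge weights)."""
    return [[alpha * a + (1 - alpha) * b for a, b in zip(row_a, row_b)]
            for row_a, row_b in zip(sim_a, sim_b)]


def textrank(weights, damping=0.85, iters=100, tol=1e-8):
    """Weighted TextRank: WS(i) = (1 - d) + d * sum_j w[j][i] / strength[j] * WS(j)."""
    n = len(weights)
    # Total edge weight leaving each sentence, ignoring self-loops.
    strength = [sum(w for j, w in enumerate(row) if j != i)
                for i, row in enumerate(weights)]
    scores = [1.0] * n
    for _ in range(iters):
        new = [
            (1 - damping) + damping * sum(
                weights[j][i] / strength[j] * scores[j]
                for j in range(n) if j != i and strength[j] > 0)
            for i in range(n)
        ]
        delta = max(abs(a - b) for a, b in zip(new, scores))
        scores = new
        if delta < tol:
            break
    return scores


def mmr_select(scores, sim, k, lam=0.5):
    """Greedy MMR: trade off a sentence's score against similarity to chosen sentences."""
    top = max(scores)
    rel = [s / top for s in scores]  # rescale so scores and similarities are comparable
    selected, remaining = [], list(range(len(scores)))
    while remaining and len(selected) < k:
        def gain(i):
            redundancy = max((sim[i][j] for j in selected), default=0.0)
            return lam * rel[i] - (1 - lam) * redundancy
        best = max(remaining, key=gain)
        selected.append(best)
        remaining.remove(best)
    return selected


# Hypothetical 4-sentence document; sentences 0 and 1 are near-duplicates.
cooc = [[0.0, 0.9, 0.2, 0.1],   # co-occurrence-based similarity (made-up values)
        [0.9, 0.0, 0.2, 0.1],
        [0.2, 0.2, 0.0, 0.3],
        [0.1, 0.1, 0.3, 0.0]]
attn = [[0.0, 0.7, 0.4, 0.1],   # attention-based relevance (made-up values)
        [0.7, 0.0, 0.4, 0.1],
        [0.4, 0.4, 0.0, 0.5],
        [0.1, 0.1, 0.5, 0.0]]

edges = blend_weights(cooc, attn, alpha=0.5)
scores = textrank(edges)
summary = mmr_select(scores, edges, k=2)
print(scores, summary)
```

In this toy example the two near-duplicate sentences receive the same TextRank score, and the MMR step keeps only one of them because the other is penalized for its high similarity to the sentence already selected. A TRAO-style variant would instead run `textrank` twice, once per edge-weight matrix, and take a weighted sum of the two score lists before calling `mmr_select`.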
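Keywords: Chinese long-text summarization; TextRank; self-attention mechanism; distributed vector representation; semantic information; document information fusion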
Classification code: TP391 [Automation and Computer Technology: Computer Application Technology]