Authors: ZHOU Weixiao (周伟枭), LAN Wenfei (蓝雯飞)[1], XU Zhiming (许智明), ZHU Rongbo (朱容波)[1]
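Affiliations: [1] School of Computer Science, South-Central University for Nationalities, Wuhan 430074, China; [2] School of Mechanical Engineering and Automation, Fuzhou University, Fuzhou 350108, China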
Source: Journal of Frontiers of Computer Science and Technology (《计算机科学与探索》), 2021, No. 5, pp. 907-921 (15 pages)
Funding: National Natural Science Foundation of China (61772562).
Abstract: Extractive methods produce long-document summaries that lack fluency, abstractive methods produce summaries that lack accuracy, and truncating the original document before encoding discards important information. To address these problems, this paper proposes SFExt-PGAbs, a two-stage long-document summarization model composed of a submodular-function extractive summarizer (SFExt) and a pointer-generator abstractive summarizer (PGAbs). SFExt-PGAbs imitates the way a human summarizes a long document: SFExt first extracts the important sentences and filters out unimportant and redundant ones to form a transitional document, and PGAbs then takes the transitional document as input to generate a fluent and accurate summary. To obtain a transitional document closer to the central idea of the original document, the traditional SFExt is extended with two sub-aspects, positional importance and accuracy, and a new greedy algorithm is designed. To study how different feature extractors affect the quality of the generated summary, two kinds of recurrent neural networks are applied in PGAbs. Experimental results on the CNNDM test set show that SFExt-PGAbs generates more fluent and more accurate summaries than the baseline model, with a large improvement in ROUGE scores. The SFExt with the extended sub-aspects also extracts more accurate summaries.
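Keywords: two-stage summarization model; long document summarization; extractive summarization; abstractive summarization; submodular function; pointer generator; sub-aspect fusion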
Classification number: TP391.4 [Automation and Computer Technology: Computer Application Technology]
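The first stage described in the abstract, greedy selection of sentences under a submodular objective, can be illustrated with a minimal sketch. This is not the paper's SFExt: the abstract does not specify its objective or its new greedy algorithm, and the positional-importance and accuracy sub-aspects are omitted here. The function names (`tokenize`, `coverage_score`, `greedy_select`) and the square-root word-coverage objective are assumptions chosen only because that objective is monotone submodular, so repeating already-covered words earns diminishing gain and redundant sentences are naturally skipped.

```python
import math
from collections import Counter

PUNCT = ".,;:!?\"'()"


def tokenize(sentence):
    """Lowercase whitespace tokenizer (hypothetical helper for this sketch)."""
    words = (w.strip(PUNCT).lower() for w in sentence.split())
    return [w for w in words if w]


def coverage_score(word_counts):
    """Assumed objective: sum of sqrt(count) over covered words.

    A concave function of a modular count is monotone submodular,
    so each extra occurrence of a word adds less than the previous one.
    """
    return sum(math.sqrt(c) for c in word_counts.values())


def greedy_select(sentences, max_sentences):
    """Repeatedly add the sentence with the largest marginal gain.

    Returns the chosen sentences in original document order, which is
    how a transitional document for a second stage would be assembled.
    """
    counts = [Counter(tokenize(s)) for s in sentences]
    selected = []
    covered = Counter()
    current = 0.0
    while len(selected) < max_sentences:
        best_index, best_gain = None, 0.0
        for i, sentence_counts in enumerate(counts):
            if i in selected:
                continue
            gain = coverage_score(covered + sentence_counts) - current
            if gain > best_gain:
                best_index, best_gain = i, gain
        if best_index is None:  # nothing left that adds coverage
            break
        selected.append(best_index)
        covered += counts[best_index]
        current += best_gain
    return [sentences[i] for i in sorted(selected)]


if __name__ == "__main__":
    document = [
        "the cat sat on the mat",
        "the cat sat on the mat",
        "dogs chase birds in parks",
    ]
    # The duplicated sentence has a small marginal gain once its twin is chosen.
    print(greedy_select(document, 2))
```

In a two-stage pipeline of the kind the abstract describes, the list returned by `greedy_select` would play the role of the transitional document passed to the abstractive model.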