Abstractive Summarization Based on BERT Model


Authors: ZHOU Yuan; ZHANG Kun[1]; CHEN Zhiyuan; JIANG Haojun; FANG Zizheng (Nanjing University of Science and Technology, Nanjing 210094)

Affiliation: [1] Nanjing University of Science and Technology, Nanjing 210094

Source: Computer & Digital Engineering, 2024, No. 10, pp. 3052-3058 (7 pages)

Abstract: With the continuous development of deep learning, pre-trained language models have achieved strong results in natural language processing. Automatic text summarization, an important research direction in this field, has likewise benefited from large-scale pre-trained language models; in abstractive summarization in particular, such models are used to generate a summary that accurately reflects the main idea of the original text. However, current research still suffers from several problems: the semantic information of the source document is not fully captured, polysemous words cannot be represented effectively, and the generated summaries contain repeated content and lack coherence. To alleviate these problems, this paper proposes a new abstractive summarization model based on the BERT pre-trained language model, TextRank-BERT-PGN-Coverage (TBPC). The model adopts the classical Encoder-Decoder framework with pre-trained weights to generate summaries. Experiments are conducted on the CNN/Daily Mail dataset, and the results show that, compared with existing results in this field, the proposed model achieves better performance.
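The model name reflects the components listed in the keywords: the TextRank algorithm, a BERT-based encoder, a pointer-generator network (PGN), and a coverage mechanism that discourages repetition. As a rough illustration of the latter two components, the sketch below shows one decoding step of a coverage-aware pointer-generator in the style of See et al. (2017); all class names, dimensions, and interfaces here are illustrative assumptions, not the paper's actual implementation.

```python
# Minimal PyTorch sketch (assumed, not the paper's code) of one pointer-generator
# decoding step with a coverage mechanism, following See et al. (2017).
import torch
import torch.nn as nn
import torch.nn.functional as F


class PointerGeneratorStep(nn.Module):
    """One decoding step: coverage-aware attention over encoder states,
    a generation probability p_gen, and the final mixed vocabulary distribution."""

    def __init__(self, hidden_dim: int, emb_dim: int, vocab_size: int):
        super().__init__()
        self.W_h = nn.Linear(hidden_dim, hidden_dim, bias=False)   # encoder states
        self.W_s = nn.Linear(hidden_dim, hidden_dim, bias=True)    # decoder state
        self.w_c = nn.Linear(1, hidden_dim, bias=False)            # coverage feature
        self.v = nn.Linear(hidden_dim, 1, bias=False)
        self.p_gen_proj = nn.Linear(hidden_dim * 2 + emb_dim, 1)   # -> p_gen
        self.out_proj = nn.Linear(hidden_dim * 2, vocab_size)      # -> P_vocab

    def forward(self, enc_states, dec_state, dec_input_emb, src_ids, coverage):
        # enc_states:    (batch, src_len, hidden_dim), e.g. BERT encoder outputs
        # dec_state:     (batch, hidden_dim)
        # dec_input_emb: (batch, emb_dim)
        # src_ids:       (batch, src_len) source token ids, for the copy distribution
        # coverage:      (batch, src_len) running sum of past attention weights

        # Coverage-aware attention scores: e_i = v^T tanh(W_h h_i + W_s s_t + w_c c_i)
        scores = self.v(torch.tanh(
            self.W_h(enc_states)
            + self.W_s(dec_state).unsqueeze(1)
            + self.w_c(coverage.unsqueeze(-1))
        )).squeeze(-1)                                              # (batch, src_len)
        attn = F.softmax(scores, dim=-1)

        # Coverage loss for this step: sum_i min(a_i, c_i), which penalises
        # attending repeatedly to the same source positions (i.e. repetition).
        cov_loss = torch.sum(torch.min(attn, coverage), dim=-1)
        new_coverage = coverage + attn

        context = torch.bmm(attn.unsqueeze(1), enc_states).squeeze(1)  # (batch, hidden)

        # p_gen decides between generating from the vocabulary and copying.
        p_gen = torch.sigmoid(self.p_gen_proj(
            torch.cat([context, dec_state, dec_input_emb], dim=-1)))

        p_vocab = F.softmax(self.out_proj(torch.cat([context, dec_state], dim=-1)), dim=-1)

        # Final distribution: p_gen * P_vocab + (1 - p_gen) * copy distribution,
        # where the copy probability mass is scattered onto the source token ids.
        final_dist = p_gen * p_vocab
        final_dist = final_dist.scatter_add(1, src_ids, (1.0 - p_gen) * attn)

        return final_dist, new_coverage, cov_loss


# Example with dummy tensors (batch=2, src_len=6, hidden=768, emb=128, vocab=30522):
step = PointerGeneratorStep(hidden_dim=768, emb_dim=128, vocab_size=30522)
enc = torch.randn(2, 6, 768)          # e.g. last hidden states from a BERT encoder
dist, cov, loss = step(enc, torch.randn(2, 768), torch.randn(2, 128),
                       torch.randint(0, 30522, (2, 6)), torch.zeros(2, 6))
```

At training time the per-step coverage loss returned here is typically added to the negative log-likelihood of the target token, which is the standard way a coverage mechanism suppresses repeated content in the generated summary.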

Keywords: abstractive text summarization; TextRank algorithm; BERT model; pointer-generator network; coverage mechanism

Classification: TP391.1 [Automation and Computer Technology / Computer Application Technology]

 
