Text Summary Generation Model Based on Sentence Fusion and Self-Supervised Training (Cited by: 3)


Authors: ZOU Ao; HAO Wenning; JIN Dawei; CHEN Gang (College of Command and Control Engineering, Army Engineering University of PLA, Nanjing 210007)

Affiliation: [1] College of Command and Control Engineering, Army Engineering University of PLA, Nanjing 210007

Source: Pattern Recognition and Artificial Intelligence, 2022, No. 5, pp. 401-411 (11 pages)

Funding: Supported by the National Natural Science Foundation of China (No. 61806221).

Abstract: To improve the sentence-fusion capability of deep neural network text generation, a text summary generation model based on sentence fusion and self-supervised training is proposed. Before model training, the training data are preprocessed according to the concept of points of correspondence from sentence-fusion theory, so that the data meet the needs of the subsequent training. Training proceeds in two stages. In the first stage, guided by the distribution of sentence-fusion phenomena in the dataset, a permutation language model training task is designed with points of correspondence as the minimum semantic unit, strengthening the model's ability to capture contextual information about the sentences to be fused. In the second stage, an attention-masking strategy based on sentence-fusion information controls how much information the model takes in during text generation, enhancing its fusion ability at the generation stage. Experiments on a public dataset show that the proposed model is superior on several evaluation metrics, including those based on statistics, deep semantics, and sentence-fusion ratio.
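The attention-masking idea described for the second stage can be sketched roughly as follows. This is a minimal illustration only, assuming a binary token-level mask in which each token sees a local context window plus the positions of the points of correspondence; the function and parameter names (`fusion_attention_mask`, `poc_positions`, `window`) are hypothetical, and the paper's actual masking strategy may differ.

```python
import numpy as np

def fusion_attention_mask(seq_len, poc_positions, window=1):
    """Build a binary attention mask (1 = may attend, 0 = blocked).

    Each token attends to its local window of neighbors; in addition,
    every token attends to the 'points of correspondence' (PoC)
    positions, so that the information shared across the sentences
    being fused remains visible throughout generation.
    Illustrative sketch only, not the paper's implementation.
    """
    mask = np.zeros((seq_len, seq_len), dtype=np.int64)
    for i in range(seq_len):
        lo, hi = max(0, i - window), min(seq_len, i + window + 1)
        mask[i, lo:hi] = 1           # local context window
        mask[i, poc_positions] = 1   # PoC tokens always visible
    return mask

# 6-token toy sequence whose PoC tokens sit at positions 2 and 4
mask = fusion_attention_mask(6, poc_positions=[2, 4], window=1)
```

In a Transformer, such a mask would be applied to the attention logits (blocked entries set to a large negative value before the softmax), which is one common way to restrict information intake per position.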

Keywords: automatic text summarization; sentence fusion; pre-trained language model; deep neural network; self-supervised training

Classification: TP391 [Automation and Computer Technology - Computer Application Technology]
