Automatic Text Summarization Technology Based on the ALBERT-UniLM Model    Cited by: 6

Authors: SUN Baoshan [1,2]; TAN Hao (School of Computer Science and Technology, Tiangong University, Tianjin 300387, China; Tianjin Key Laboratory of Autonomous Intelligence Technology and Systems, Tiangong University, Tianjin 300387, China)

Affiliations: [1] School of Computer Science and Technology, Tiangong University, Tianjin 300387, China; [2] Tianjin Key Laboratory of Autonomous Intelligence Technology and Systems, Tiangong University, Tianjin 300387, China

Source: Computer Engineering and Applications, 2022, No. 15, pp. 184-190 (7 pages)

Funding: National Natural Science Foundation of China (61972456, 61173032); Natural Science Foundation of Tianjin (20JCYBJC00140); Open Project of the Key Laboratory of Universal Wireless Communications, Ministry of Education (BUPT) (KFKT-2020101).

Abstract: Aiming at the problems that abstractive summarization models do not fully understand the source text and tend to generate repeated text, an algorithm combining the dynamic word-embedding model ALBERT with the unified pre-training language model UniLM is proposed, yielding an ALBERT-UniLM summarization model. The model first uses pre-trained dynamic ALBERT word vectors in place of the conventional BERT baseline to extract features and obtain word representations. A UniLM language model fused with a pointer network is then fine-tuned on the downstream generation task, and a coverage mechanism is incorporated to reduce repeated words and produce the summary text. Experiments use ROUGE as the evaluation metric and are carried out on the single-document Chinese news summarization dataset of the 2018 CCF International Conference on Natural Language Processing and Chinese Computing (NLPCC 2018). Compared with the BERT baseline, the ALBERT-UniLM model improves Rouge-1, Rouge-2 and Rouge-L by 1.57%, 1.37% and 1.60%, respectively. The results show that the proposed ALBERT-UniLM model clearly outperforms the other baseline models on text summarization tasks and can effectively improve the quality of the generated summaries.
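
Note: the pointer network and coverage mechanism mentioned in the abstract follow the familiar pointer-generator recipe. The sketch below shows one decoding step of such a head in PyTorch; the tensor names, shapes, and the helper function itself are illustrative assumptions, not the authors' implementation.

    import torch
    import torch.nn.functional as F

    def pointer_generator_step(vocab_logits, attn_weights, p_gen, src_ids, coverage):
        # vocab_logits: (batch, vocab_size)  generator scores over the vocabulary
        # attn_weights: (batch, src_len)     decoder-to-source attention weights
        # p_gen:        (batch, 1)           probability of generating vs. copying
        # src_ids:      (batch, src_len)     source token ids used as copy targets
        # coverage:     (batch, src_len)     running sum of past attention weights
        p_vocab = F.softmax(vocab_logits, dim=-1)
        # Mix the generation distribution with the copy distribution: the copy
        # mass (1 - p_gen) * attention is scattered onto the source token ids.
        final_dist = (p_gen * p_vocab).scatter_add(1, src_ids, (1.0 - p_gen) * attn_weights)
        # Coverage penalty: attending again to positions that are already well
        # covered is penalised, which discourages repeated words in the summary.
        cov_loss = torch.min(attn_weights, coverage).sum(dim=1).mean()
        new_coverage = coverage + attn_weights
        return final_dist, cov_loss, new_coverage

In a full model, final_dist would feed the negative log-likelihood term and cov_loss would be added as a weighted penalty; that training wiring is omitted here.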

Keywords: natural language processing; pre-trained language model; ALBERT model; UniLM model; abstractive summarization

CLC Number: TP391 (Automation and Computer Technology: Computer Application Technology)
