Semi-supervised Adversarial Chinese-Vietnamese Cross-lingual Summarization Generation Method Using Word Alignment

Cited by: 4

Authors: WANG Jian [1], ZHANG Ying, YU Zheng-tao [1,2], HUANG Yu-xin [1,2]

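Affiliations: [1] Faculty of Information Engineering and Automation, Kunming University of Science and Technology, Kunming 650500, China; [2] Yunnan Key Laboratory of Artificial Intelligence, Kunming University of Science and Technology, Kunming 650500, China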

Source: Journal of Chinese Computer Systems (小型微型计算机系统), 2022, No. 5, pp. 992-997 (6 pages)

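Funding: National Key Research and Development Program of China (2018YFC0830105, 2018YFC0830101, 2018YFC0830100); National Natural Science Foundation of China (61972186, 61762056, 61472168); Yunnan Provincial Major Science and Technology Special Plan Project (202002AD080001); Yunnan Provincial High-Tech Industry Special Project (201606).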

Abstract: Cross-lingual summarization is the process of generating a summary in a target language from input text in a source language. Most current cross-lingual summarization approaches rely on machine translation, but for low-resource languages such as Vietnamese, poor translation quality is a major challenge for Chinese-Vietnamese cross-lingual summarization. To address this problem, the paper proposes a word-alignment-based semi-supervised adversarial learning method for Chinese-Vietnamese cross-lingual summarization. The idea is to align Chinese and Vietnamese in a shared semantic space to obtain aligned bilingual features, and then to use both sets of features together to generate the cross-lingual summary. Concretely, within an encoder-decoder framework, the model first uses a BERT encoder to produce vector representations of the input Chinese and Vietnamese texts separately; it then applies semi-supervised adversarial learning based on a Chinese-Vietnamese bilingual dictionary to align the bilingual word vectors in the same semantic space; finally, an attention mechanism attends to the bilingual context vectors simultaneously, and a Transformer decoder generates the target-language summary. Experiments on a collected Chinese-Vietnamese summarization dataset show that the method effectively improves the performance of Chinese-Vietnamese cross-lingual summarization models.
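
The final step of the abstract, attending to both languages' context vectors at once, can be illustrated with a small sketch. This is not the authors' implementation: the abstract does not say which attention form is used or how the two contexts are combined, so the dot-product scoring, the concatenation, and the names `attend` and `bilingual_context` are all assumptions made for illustration. The encoder states are toy lists standing in for BERT outputs that have already been aligned into a shared space.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(query, states):
    """Dot-product attention (assumed form): return (weights, context)."""
    scores = [sum(q * s for q, s in zip(query, state)) for state in states]
    weights = softmax(scores)
    dim = len(states[0])
    context = [
        sum(w * state[i] for w, state in zip(weights, states))
        for i in range(dim)
    ]
    return weights, context


def bilingual_context(query, zh_states, vi_states):
    """Attend to Chinese and Vietnamese states separately, then concatenate.

    Concatenation is a hypothetical fusion choice; the abstract only says
    that both context vectors are attended to simultaneously.
    """
    _, zh_ctx = attend(query, zh_states)
    _, vi_ctx = attend(query, vi_states)
    return zh_ctx + vi_ctx


# Toy decoder query and toy encoder states (stand-ins for aligned BERT outputs).
query = [1.0, 0.0]
zh_states = [[1.0, 0.0], [0.0, 1.0]]
vi_states = [[0.5, 0.5]]
fused = bilingual_context(query, zh_states, vi_states)
```

In this sketch the fused vector has twice the encoder dimension, so a decoder consuming it would need an input projection of matching size; other fusion choices, such as gating or summation, would be equally consistent with the abstract.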

Keywords: cross-lingual summarization; BERT; semi-supervised adversarial learning; word alignment

Classification: TP391 [Automation and Computer Technology: Computer Application Technology]
