Semi-Supervised Method for Cross-Lingual Word Embedding Based on an Adversarial Model with Double Discriminators


Authors: Zhang Yuhong [1,2], Zhi Wenwu [1,2], Li Peipei, Hu Xuegang

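Affiliations: [1] Key Laboratory of Knowledge Engineering with Big Data (Hefei University of Technology), Ministry of Education, Hefei 230009; [2] School of Computer Science and Information Engineering, Hefei University of Technology, Hefei 230601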

Source: Journal of Computer Research and Development (《计算机研究与发展》), 2023, No. 9, pp. 2127-2136 (10 pages)

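Funding: National Key Research and Development Program of China (2020AAA0106100); National Natural Science Foundation of China (62076087, 61976077); Natural Science Foundation of Anhui Province (2208085MF170).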

Abstract: Cross-lingual word embedding aims to use the embedding space of resource-rich languages to improve the embeddings of resource-scarce languages, and it is widely used in a variety of cross-lingual tasks. Most existing methods address word alignment by learning a linear mapping between two embedding spaces. Among them, methods based on adversarial models have received widespread attention because they can achieve good performance without using any dictionary. However, these methods do not perform well on dissimilar (distant) language pairs. The reason may be that, without the guidance of a seed dictionary, the mapping is learned only from a distance measured over the entire embedding space, which leaves multiple candidate alignments for each word and leads to unsatisfactory alignment. This paper therefore proposes a semi-supervised cross-lingual word embedding method based on an adversarial model with double discriminators. On top of the existing adversarial model, a fine-grained discriminator shared by both mapping directions is added, yielding an adversarial model with double discriminators. In addition, a negative sample dictionary is introduced to supplement the supervised seed dictionary and guide the fine-grained discriminator in a semi-supervised way. By minimizing the distance between the initially generated word pairs and the supervised dictionary (the seed dictionary together with the negative dictionary), the fine-grained discriminator reduces the number of candidate word pairs and identifies the correctly aligned pairs among the initially generated dictionaries. Experimental results on two cross-lingual datasets show that the proposed method effectively improves the performance of cross-lingual word embedding.
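The role of the negative sample dictionary can be illustrated with a small NumPy sketch. This is not the authors' implementation: the abstract does not describe the fine-grained discriminator's architecture, so a cosine-similarity scorer followed by a sigmoid stands in for it here. The function names, the `scale` parameter, and the toy data are all hypothetical. The sketch pairs each seed source word with a random non-translation to form negatives, then computes a binary cross-entropy loss over seed (label 1) and negative (label 0) pairs under a candidate linear mapping `W`.

```python
import numpy as np

def make_negative_dictionary(seed_pairs, n_target, rng):
    """Pair each seed source word with a random target word that is
    NOT one of its known translations (hypothetical sampling scheme)."""
    positives = set(seed_pairs)
    negatives = []
    for src, _ in seed_pairs:
        tgt = int(rng.integers(n_target))
        while (src, tgt) in positives:
            tgt = int(rng.integers(n_target))
        negatives.append((src, tgt))
    return negatives

def pair_scores(W, X, Y, pairs):
    """Cosine similarity between mapped source vectors X @ W and target vectors Y."""
    src = X[[s for s, _ in pairs]] @ W
    tgt = Y[[t for _, t in pairs]]
    src = src / np.linalg.norm(src, axis=1, keepdims=True)
    tgt = tgt / np.linalg.norm(tgt, axis=1, keepdims=True)
    return np.sum(src * tgt, axis=1)

def pair_discriminator_loss(W, X, Y, seed_pairs, negative_pairs, scale=5.0):
    """Binary cross-entropy with seed pairs labeled 1 and negative pairs labeled 0.
    The sigmoid-of-cosine scorer is an assumed stand-in for a learned discriminator."""
    sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))
    p_pos = sigmoid(scale * pair_scores(W, X, Y, seed_pairs))
    p_neg = sigmoid(scale * pair_scores(W, X, Y, negative_pairs))
    eps = 1e-12
    return float(-(np.log(p_pos + eps).mean() + np.log(1.0 - p_neg + eps).mean()) / 2)

# Toy data: the target space is an exact rotation of the source space,
# so word i in the source language translates to word i in the target language.
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 8))                      # source embeddings
Q, _ = np.linalg.qr(rng.normal(size=(8, 8)))      # ground-truth orthogonal mapping
Y = X @ Q                                         # target embeddings

seed_pairs = [(i, i) for i in range(10)]          # small pre-aligned seed dictionary
negative_pairs = make_negative_dictionary(seed_pairs, n_target=len(Y), rng=rng)

pos = pair_scores(Q, X, Y, seed_pairs)
neg = pair_scores(Q, X, Y, negative_pairs)
loss = pair_discriminator_loss(Q, X, Y, seed_pairs, negative_pairs)
```

Under the ground-truth mapping, every seed pair scores higher than every sampled negative pair. That is the kind of pair-level signal the abstract attributes to the fine-grained discriminator, in contrast to a discriminator that only compares the two spaces globally.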

Keywords: cross-lingual; word embedding; adversarial training; double discriminators; semi-supervised

Classification: TP391 [Automation and Computer Technology: Computer Application Technology]

 
