Synthesizing Mandarin-English Code-Switching Text Using Encoder-Decoder Model


Authors: HUANG Zheying; LIU Zuozhen; XU Ji; ZHAO Qingwei [1,2] (University of Chinese Academy of Sciences, Beijing 100049, China; Key Laboratory of Speech Acoustics and Content Understanding, Institute of Acoustics, Chinese Academy of Sciences, Beijing 100190, China)

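Affiliations: [1] University of Chinese Academy of Sciences, Beijing 100049 [2] Speech and Intelligent Information Processing Laboratory, Institute of Acoustics, Chinese Academy of Sciences, Beijing 100190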

Source: Journal of Signal Processing (《信号处理》), 2022, No. 10, pp. 2074-2081 (8 pages)

Funding: National Natural Science Foundation of China (61901466).

Abstract: To address the scarcity of Mandarin-English code-switching text data, this paper proposes a method for synthesizing code-switching text with an encoder-decoder model. The text generator implicitly learns the linguistic constraints of code-switching from a limited amount of code-switching text, and the linguistic constraints within each language from a large amount of monolingual parallel data, and uses them to synthesize code-switching text. However, the text synthesized by this model has low naturalness. The paper therefore further proposes a synthesis method based on an encoder-decoder model with a copy mechanism: on top of the encoder-decoder generator, a gate is added that decides whether the next word is generated from the decoder's prediction or taken from the source text fed to the encoder. With this method, the perplexity of the language model on the SEAME test set decreases by 13.96 absolute. The paper concludes that the proposed method can synthesize highly natural code-switching text at scale and alleviate the scarcity of code-switching text data.
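The gate described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the abstract only states that a gate chooses between the decoder's prediction and the source text, so the soft mixture below (in the style of pointer-generator networks), the function name `mix_generate_and_copy`, and all toy tokens and scores are assumptions made for illustration.

```python
import math


def softmax(scores):
    """Turn a list of raw scores into a probability distribution."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def mix_generate_and_copy(vocab, vocab_logits, source_tokens,
                          attention_scores, gate_logit):
    """Hypothetical gated copy step: returns {word: probability}.

    p_gen weights the decoder's vocabulary prediction, and
    (1 - p_gen) weights copying from the source tokens, where the
    copy distribution is the attention over source positions.
    """
    p_gen = sigmoid(gate_logit)          # gate value in (0, 1)
    p_vocab = softmax(vocab_logits)      # decoder's prediction
    p_copy = softmax(attention_scores)   # attention over the source

    final = {}
    for word, p in zip(vocab, p_vocab):
        final[word] = final.get(word, 0.0) + p_gen * p
    for token, a in zip(source_tokens, p_copy):
        # A source token that is also in the vocabulary accumulates
        # probability mass from both paths.
        final[token] = final.get(token, 0.0) + (1.0 - p_gen) * a
    return final


# Toy data (all values are made up for illustration).
vocab = ["我", "明天", "有", "meeting", "<unk>"]
vocab_logits = [0.1, 0.2, 0.3, 1.5, 0.0]
source_tokens = ["我", "明天", "开会"]
attention_scores = [0.2, 0.1, 2.0]

dist = mix_generate_and_copy(vocab, vocab_logits, source_tokens,
                             attention_scores, gate_logit=-1.0)
best = max(dist, key=dist.get)
print(best, round(dist[best], 3))
```

A negative `gate_logit` pushes the mixture toward copying from the source, and a positive one pushes it toward the decoder's vocabulary. In a trained model the gate value would be predicted from the decoder state at each step rather than fixed by hand as it is here.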

Keywords: code-switching; encoder-decoder; synthesized text; language model; speech recognition

Classification: TP391 [Automation and Computer Technology: Computer Application Technology]

 
