Transformer-Based Multilingual Grapheme-to-Phoneme Conversion

Authors: Zhang Yating (张亚停), Zhang Han (张寒), Cao Shaozhong (曹少中), Jiang Dan (姜丹), Xiao Kejing (肖克晶)

Affiliation: [1] School of Information Engineering, Beijing Institute of Graphic Communication, Beijing

Source: Computer Science and Application (《计算机科学与应用》), 2023, No. 3, pp. 510-517 (8 pages)

Abstract: Grapheme-to-phoneme (G2P) conversion is a key component of the speech synthesis front end and directly affects the quality of the synthesized speech. Most current G2P research targets a single language, yet in practical applications monolingual speech synthesis is far less useful than multilingual synthesis. This paper therefore uses the Transformer architecture to study G2P conversion for three languages (English, Japanese, and Korean) under a text cross-mixing condition, with phoneme error rate (PER) and word error rate (WER) as the evaluation metrics. English is evaluated on CMUDict, a pronunciation dictionary of American English. For Korean and Japanese, the Korean and Japanese datasets from the SIGMORPHON 2021 G2P shared task were first augmented, and evaluation was then carried out on the augmented datasets. Experimental results show that, under the text cross-mixing condition, Transformer-based G2P for English, Japanese, and Korean achieves substantially lower PER and WER than Transformer-based monolingual G2P for each of the three languages.
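The abstract reports results as PER and WER without spelling out how they are computed. In G2P evaluation these are commonly defined as follows: PER is the total Levenshtein (edit) distance between predicted and reference phoneme sequences divided by the total number of reference phonemes, and WER is the percentage of words whose predicted pronunciation differs from the reference in any position. The Python below is a minimal sketch assuming those standard definitions and exactly one reference pronunciation per word; the function names `edit_distance` and `per_wer` and the example phoneme strings are hypothetical illustrations, not taken from the paper.

```python
from typing import List, Sequence, Tuple


def edit_distance(ref: Sequence[str], hyp: Sequence[str]) -> int:
    """Levenshtein distance between two phoneme sequences, all edits costing 1."""
    prev = list(range(len(hyp) + 1))  # distances from the empty ref prefix
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            curr[j] = min(
                prev[j] + 1,             # delete r
                curr[j - 1] + 1,         # insert h
                prev[j - 1] + (r != h),  # substitute (cost 1) or match (cost 0)
            )
        prev = curr
    return prev[-1]


def per_wer(refs: List[Sequence[str]], hyps: List[Sequence[str]]) -> Tuple[float, float]:
    """Corpus-level PER and WER in percent (assumes one reference per word)."""
    if len(refs) != len(hyps):
        raise ValueError("refs and hyps must contain the same number of words")
    total_edits = sum(edit_distance(r, h) for r, h in zip(refs, hyps))
    total_phonemes = sum(len(r) for r in refs)
    # A word counts as an error if its predicted sequence is not an exact match.
    wrong_words = sum(1 for r, h in zip(refs, hyps) if list(r) != list(h))
    per = 100.0 * total_edits / total_phonemes
    wer = 100.0 * wrong_words / len(refs)
    return per, wer


if __name__ == "__main__":
    # Hypothetical CMUDict-style (ARPAbet) references and model outputs.
    refs = [["HH", "AH0", "L", "OW1"], ["W", "ER1", "L", "D"]]
    hyps = [["HH", "AH0", "L", "OW1"], ["W", "ER1", "D"]]  # second word drops "L"
    per, wer = per_wer(refs, hyps)
    print(f"PER={per:.1f}% WER={wer:.1f}%")  # prints PER=12.5% WER=50.0%
```

Because WER here scores a word as wrong on any single phoneme error, it is always at least as strict as PER per word, which is why G2P papers typically report both. Dictionaries such as CMUDict list alternative pronunciations for some words; a fuller evaluation might score each prediction against the closest reference, which this sketch does not do.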

Keywords: grapheme-to-phoneme conversion; Transformer; multilingual; cross-mixing

CLC number: TP3 [Automation and Computer Technology: Computer Science and Technology]
