Affiliation: [1] School of Information Engineering, Beijing Institute of Graphic Communication, Beijing
Source: Computer Science and Application, 2023, No. 3, pp. 510-517 (8 pages)
Abstract: Grapheme-to-phoneme (G2P) conversion is an important component of the speech synthesis front end and affects the quality of the synthesized speech. Most current G2P research targets a single language. In practical applications, however, speech synthesized in a single language is far less useful than multilingual synthesis. This paper therefore uses the Transformer architecture to study multilingual (English, Japanese, and Korean) G2P conversion under mixed-language (cross-mixed) text conditions. Phoneme error rate (PER) and word error rate (WER) serve as the evaluation metrics. English is evaluated on CMUDict, a pronunciation dictionary of American English. For Korean and Japanese, the authors first augmented the Korean and Japanese datasets from the SIGMORPHON 2021 G2P shared task and then evaluated on the augmented datasets. Experimental results show that under cross-mixed text conditions, Transformer-based G2P conversion for English, Japanese, and Korean achieves substantially lower PER and WER than Transformer-based monolingual G2P for each of the three languages.
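The abstract names PER and WER as its evaluation metrics but does not define them. The following is a minimal Python sketch that assumes the definitions commonly used in G2P work:

- PER is the total Levenshtein distance between the predicted and reference phoneme sequences, divided by the total number of reference phonemes.
- WER is the fraction of words whose predicted pronunciation is not an exact match.

The function names, the corpus-level pooling, and the CMUDict-style ARPAbet examples (including the deliberate error `AA1`) are illustrative assumptions, not the paper's code or data.

```python
from typing import List, Sequence, Tuple


def edit_distance(ref: Sequence[str], hyp: Sequence[str]) -> int:
    """Levenshtein distance between two phoneme sequences (each edit costs 1)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            curr[j] = min(
                prev[j] + 1,             # deletion of a reference phoneme
                curr[j - 1] + 1,         # insertion of a predicted phoneme
                prev[j - 1] + (r != h),  # substitution (cost 1) or match (cost 0)
            )
        prev = curr
    return prev[-1]


def per_wer(refs: List[List[str]], hyps: List[List[str]]) -> Tuple[float, float]:
    """Corpus-level PER and WER (assumed definitions, pooled over all words)."""
    if len(refs) != len(hyps) or not refs:
        raise ValueError("need equal, non-empty lists of reference and predicted words")
    total_edits = sum(edit_distance(r, h) for r, h in zip(refs, hyps))
    total_phonemes = sum(len(r) for r in refs)
    # A word counts as wrong if its predicted phoneme sequence differs at all.
    wrong_words = sum(1 for r, h in zip(refs, hyps) if list(r) != list(h))
    return total_edits / total_phonemes, wrong_words / len(refs)


if __name__ == "__main__":
    # Hypothetical references for "cat" and "dog"; the second prediction has one wrong vowel.
    refs = [["K", "AE1", "T"], ["D", "AO1", "G"]]
    hyps = [["K", "AE1", "T"], ["D", "AA1", "G"]]
    per, wer = per_wer(refs, hyps)
    print(round(per, 4), wer)  # prints: 0.1667 0.5
```

With these definitions, a single wrong phoneme costs only 1/6 in PER but makes the whole word wrong for WER. This is why WER is usually the stricter of the two numbers.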
Keywords: grapheme-to-phoneme conversion; Transformer; multilingual; cross-mixing
Classification number: TP3 [Automation and Computer Technology / Computer Science and Technology]