Algorithm of Grapheme-to-Phoneme Conversion for Russian Based on WFST (基于WFST的俄语字音转换算法研究)

Cited by: 3

Authors: FENG Wei (冯伟); YI Mianzhu (易绵竹); MA Yanzhou (马延周) (The PLA Strategic Support Force Information Engineering University, Luoyang Campus, Luoyang, Henan 471003, China)

Affiliation: [1] The PLA Strategic Support Force Information Engineering University, Luoyang Campus, Luoyang, Henan 471003, China

Source: Journal of Chinese Information Processing (《中文信息学报》), 2018, No. 2, pp. 87-93, 101 (8 pages in total)

Funding: Luoyang Social Science Planning Project (2016B285)

Abstract: Grapheme-to-phoneme (G2P) conversion plays a crucial role in building resources for Russian speech information processing. This paper revises the SAMPA-based Russian phoneme set so that transcriptions reflect the stress position and vowel reduction of Russian words. A 20,000-word Russian pronunciation dictionary is constructed according to the new phoneme set. On this basis, the paper implements a data-driven Russian G2P algorithm that applies the Weighted Finite-State Transducer (WFST) to alignment, model building, and decoding. First, an Expectation-Maximization (EM) algorithm aligns Russian grapheme and phoneme sequences in a many-to-many fashion. Then a joint N-gram model is trained on the alignment result and converted into a WFST that serves as the pronunciation model. Finally, the pronunciation of any input word is predicted with a WFST decoding algorithm. In cross-validation experiments, the average word accuracy is 62.9% and the average phoneme accuracy is 92.2%.
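The modeling and decoding steps named in the abstract can be illustrated with a small sketch. This is not the authors' implementation: it assumes the many-to-many alignment has already been produced (the toy pairs below are hand-aligned rather than learned by EM), it uses a joint bigram model with additive smoothing, and it decodes with a plain dictionary-based Viterbi search instead of a WFST. The names `ALIGNED`, `train_bigram`, `log_prob`, and `decode`, the smoothing constant, and the phoneme symbols are all hypothetical and do not reproduce the paper's phoneme set.

```python
import math
from collections import defaultdict

# Sentinel units marking word start and end in the joint sequence.
START, END = ("<s>", "<s>"), ("</s>", "</s>")

# Toy hand-aligned data: each word is a list of (grapheme chunk, phoneme chunk)
# units. The phoneme symbols are illustrative only (assumption).
ALIGNED = [
    [("д", "d"), ("о", "o"), ("м", "m")],              # дом
    [("в", "v"), ("о", "a"), ("д", "d"), ("а", "a")],  # вода: unstressed о reduced
    [("м", "m"), ("а", "a"), ("ть", "t'")],            # мать: two graphemes, one unit
    [("т", "t"), ("а", "a"), ("м", "m")],              # там
]

def train_bigram(aligned_words):
    """Count bigrams over joint grapheme-phoneme units."""
    bigram = defaultdict(lambda: defaultdict(int))
    vocab = {END}
    for pairs in aligned_words:
        seq = [START] + list(pairs) + [END]
        vocab.update(pairs)
        for prev, cur in zip(seq, seq[1:]):
            bigram[prev][cur] += 1
    return bigram, vocab

def log_prob(bigram, vocab, prev, cur, alpha=0.1):
    """Additively smoothed log P(cur | prev); alpha is an arbitrary choice."""
    counts = bigram.get(prev, {})
    total = sum(counts.values())
    return math.log((counts.get(cur, 0) + alpha) / (total + alpha * len(vocab)))

def decode(word, bigram, vocab):
    """Viterbi search over segmentations of `word` into known units."""
    units = [u for u in vocab if u != END]
    # best[(position, last unit)] = (log score, phonemes so far)
    best = {(0, START): (0.0, [])}
    for pos in range(len(word)):
        states = [(k, v) for k, v in best.items() if k[0] == pos]
        for (_, last), (score, phones) in states:
            for unit in units:
                chunk, phone = unit
                if not word.startswith(chunk, pos):
                    continue
                new_score = score + log_prob(bigram, vocab, last, unit)
                key = (pos + len(chunk), unit)
                if key not in best or new_score > best[key][0]:
                    best[key] = (new_score, phones + [phone])
    finals = [
        (score + log_prob(bigram, vocab, last, END), phones)
        for (pos, last), (score, phones) in best.items()
        if pos == len(word)
    ]
    # None means the word cannot be covered by the known units.
    return max(finals, key=lambda f: f[0])[1] if finals else None

bigram, vocab = train_bigram(ALIGNED)
print(decode("вода", bigram, vocab))
```

In this toy model the letter о has two candidate units, and the bigram context decides which one is chosen for a given word. A WFST-based system would encode the same joint model as a transducer and obtain the prediction through composition with the input word and a shortest-path search.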

Keywords: grapheme-to-phoneme conversion; Russian; pronunciation dictionary; weighted finite-state transducer

Classification code: TP391 [Automation and Computer Technology: Computer Application Technology]
