Authors: GAO Beibei (高贝贝); ZHANG Yangsen (张仰森) [1]
Affiliation: [1] Institute of Intelligent Information Processing, Beijing Information Science and Technology University, Beijing 100192, China
Source: Computer Science (《计算机科学》), 2024, No. 11, pp. 273-279 (7 pages)
Funding: National Natural Science Foundation of China (62176023).
Abstract: Grapheme-to-phoneme (G2P) conversion is an important component of Chinese text-to-speech (TTS) systems. Its core problem is polyphone disambiguation: selecting the correct pronunciation for a polyphonic character from several candidates. Existing methods usually fail to fully capture the semantics of the word containing the polyphonic character, and polyphone datasets are unevenly distributed. To address these problems, this paper proposes CLTRoBERTa (Cross-lingual Translation RoBERTa), a polyphone disambiguation method based on the pre-trained model RoBERTa. First, a cross-lingual translation module produces a translation, in another language, of the word containing the polyphonic character and feeds it to the model as an additional feature to improve semantic understanding of the word. Second, the layer-wise learning rate optimization strategy from discriminative fine-tuning is used to accommodate the different learning characteristics of the network's layers. Finally, a sample weight module is added to address the imbalanced distribution of the polyphone dataset. Experimental results show that CLTRoBERTa mitigates the performance differences caused by the imbalanced distribution and achieves 99.08% accuracy on the public CPP (Chinese Polyphone with Pinyin) benchmark dataset, outperforming the other baseline models.
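The abstract names two training techniques, layer-wise learning rates and sample weighting, without giving formulas. The sketch below illustrates one common form of each in plain Python; it is not the authors' implementation. The function names, the geometric decay factor, the default rates, and the inverse-frequency weighting rule are all assumptions made for illustration, and the pinyin labels are toy data.

```python
from collections import Counter


def layerwise_learning_rates(num_layers, top_lr=2e-5, decay=0.95):
    """Return one learning rate per layer; index 0 is the layer nearest the input.

    Assumed scheme: the top layer gets top_lr, and each layer below it gets
    the rate of the layer above multiplied by decay.
    """
    return [top_lr * decay ** (num_layers - 1 - i) for i in range(num_layers)]


def inverse_frequency_weights(labels):
    """Map each label to N / (K * count), so rarer readings get larger weights.

    N is the number of samples and K the number of distinct labels.
    This weighting rule is an assumption, not taken from the paper.
    """
    counts = Counter(labels)
    total = len(labels)
    num_classes = len(counts)
    return {label: total / (num_classes * n) for label, n in counts.items()}


def weighted_mean_loss(losses, labels, weights):
    """Average per-sample losses, weighting each sample by its label's weight."""
    sample_weights = [weights[y] for y in labels]
    weighted_sum = sum(w * l for w, l in zip(sample_weights, losses))
    return weighted_sum / sum(sample_weights)


if __name__ == "__main__":
    # Twelve encoder layers is a typical base-size configuration (assumed here).
    rates = layerwise_learning_rates(num_layers=12)
    print("lowest layer lr:", rates[0], "top layer lr:", rates[-1])

    # Toy imbalanced labels for one polyphonic character.
    toy_labels = ["zhong4", "zhong4", "zhong4", "chong2"]
    print(inverse_frequency_weights(toy_labels))
```

In a real fine-tuning setup, the per-layer rates would typically be passed to the optimizer as separate parameter groups, and the label weights would scale the per-sample classification loss.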
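Keywords: polyphone disambiguation; pre-trained model; grapheme-to-phoneme conversion; cross-lingual translation; layer-wise learning rate; sample weight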
Classification: TP391 [Automation and Computer Technology: Computer Application Technology]