Authors: CAI Wenbin (蔡文彬), WEI Yunlong (魏云龙) [1], XU Haihua (徐海华) [2], PAN Lin (潘林) [1]
Affiliations: [1] College of Physics and Information Engineering, Fuzhou University, Fuzhou 350108, China; [2] Temasek Laboratory, Nanyang Technological University, Singapore 639798, Singapore
Source: Computer Engineering and Applications (《计算机工程与应用》), 2018, No. 24, pp. 20-25 (6 pages)
Funding: Major Science and Technology Project of Fujian Province (No. 2017H6009)
Abstract: In unit-selection speech synthesis, units are chosen by minimizing the sum of a target cost and a concatenation cost. Because candidate units involve complex linguistic and acoustic properties, the key to improving synthesized speech quality is choosing acoustic (or linguistic) features that accurately describe each unit and constructing the corresponding target cost. This paper explores target cost construction from two aspects: acoustic features (Mel-generalized cepstrum, log fundamental frequency, and bottleneck features) and acoustic models. Experimental results show that a deep neural network (DNN) acoustic model trained on a large similar corpus and then fine-tuned on the current corpus predicts more robust bottleneck features that better characterize the concatenation units. Using these features in the target cost guides the selection of better candidate units and improves the quality of the synthesized speech.
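Keywords: speech synthesis; target cost; acoustic features; acoustic model; concatenation units
Classification: TP391 [Automation and Computer Technology: Computer Application Technology]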
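
The abstract describes unit selection as minimizing the sum of target and concatenation costs. As a rough illustration only, and not the authors' implementation, that minimization can be sketched as a Viterbi-style dynamic program. The function name `select_units`, the integer unit ids, and the two cost callables are all assumptions made for this sketch; in the paper the target cost is built from acoustic features such as DNN-predicted bottleneck features, which are not modeled here.

```python
from typing import Callable, List, Sequence, Tuple


def select_units(
    candidates: Sequence[Sequence[int]],
    target_cost: Callable[[int, int], float],
    concat_cost: Callable[[int, int], float],
) -> Tuple[List[int], float]:
    """Pick one unit per position so that total cost is minimal.

    candidates[t] holds hypothetical unit ids available at position t.
    target_cost(t, u) scores how well unit u matches position t.
    concat_cost(p, u) scores how smoothly unit p joins to unit u.
    """
    if not candidates or any(len(c) == 0 for c in candidates):
        raise ValueError("every position needs at least one candidate")

    # best[t][j]: minimal cost of any path ending at candidates[t][j]
    best = [[target_cost(0, u) for u in candidates[0]]]
    # back[t][j]: index of the predecessor chosen at position t - 1
    back = [[-1] * len(candidates[0])]

    for t in range(1, len(candidates)):
        row_cost, row_back = [], []
        for u in candidates[t]:
            # Cost of reaching u from each candidate at the previous position
            joins = [
                best[t - 1][i] + concat_cost(p, u)
                for i, p in enumerate(candidates[t - 1])
            ]
            i_min = min(range(len(joins)), key=joins.__getitem__)
            row_cost.append(joins[i_min] + target_cost(t, u))
            row_back.append(i_min)
        best.append(row_cost)
        back.append(row_back)

    # Trace back from the cheapest final candidate
    j = min(range(len(best[-1])), key=best[-1].__getitem__)
    total = best[-1][j]
    path: List[int] = []
    for t in range(len(candidates) - 1, -1, -1):
        path.append(candidates[t][j])
        j = back[t][j]
    path.reverse()
    return path, total
```

Any distance over predicted and stored unit features could be plugged in as `target_cost`; the search itself is independent of which features define the costs.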