Authors: ZHOU Jian (周健) [1,2]; DOU Yunfeng (窦云峰) [1,2]; LIU Rongmin (刘荣敏); WANG Huabin (王华彬); TAO Liang (陶亮) [1]
Affiliations: [1] Key Laboratory of Intelligent Computing and Signal Processing of Ministry of Education, Anhui University, Hefei 230039; [2] Institute of Media Computing, Anhui University, Hefei 230601
Source: Acta Acustica (《声学学报》), 2018, No. 5, pp. 855-863 (9 pages)
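Funding: National Natural Science Foundation of China (61301295, 61371217); Natural Science Foundation of Anhui Province (1708085MF151); Anhui University Doctoral Research Start-up Fund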
Abstract: To study how dimensionality-reduced speech features affect whisper-to-normal speech conversion, the spectral envelopes of whisper and of normal speech are each encoded with a sparse auto-encoder to extract low-dimensional features. Two BP networks are then trained in the low-dimensional space: one models the mapping between the low-dimensional spectral envelope features of whisper and those of the corresponding normal speech, and the other models the relation between the low-dimensional whisper spectral envelope features and the pitch (fundamental frequency) of normal speech. In the conversion stage, the spectral envelope of whisper is sparsely encoded to obtain its low-dimensional feature, from which the trained BP networks estimate the low-dimensional normal speech feature and the pitch. Sparse decoding then recovers the spectral envelope of normal speech, which is used to reconstruct the normal speech. Experimental results show that the cepstral distance between the converted speech and normal speech decreases by 10% compared with the Gaussian mixture model (GMM) based method. Subjective listening tests also show improved naturalness and intelligibility.
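The abstract reports its objective result as a relative drop in cepstral distance. The record does not give the formula the authors used, so the following is only a minimal sketch under an assumption: it uses one commonly quoted dB-scaled cepstral distance between two cepstral-coefficient vectors. The function names and all numeric values are hypothetical and are not taken from the paper.

```python
import math


def cepstral_distance_db(c_ref, c_test):
    """Cepstral distance in dB between two cepstral-coefficient vectors.

    Assumed definition (not confirmed by the source record):
        CD = (10 / ln 10) * sqrt(2 * sum_k (c_ref[k] - c_test[k])**2)
    """
    if len(c_ref) != len(c_test):
        raise ValueError("coefficient vectors must have the same length")
    squared_error = sum((a - b) ** 2 for a, b in zip(c_ref, c_test))
    return (10.0 / math.log(10.0)) * math.sqrt(2.0 * squared_error)


def relative_reduction(baseline, proposed):
    """Fractional decrease of `proposed` relative to `baseline`."""
    return (baseline - proposed) / baseline


# Hypothetical per-frame cepstra: a reference frame of normal speech and
# two converted frames, one from a baseline system and one from another.
reference = [1.00, 0.50, -0.20]
converted_baseline = [1.20, 0.30, -0.10]
converted_proposed = [1.15, 0.35, -0.12]

cd_baseline = cepstral_distance_db(reference, converted_baseline)
cd_proposed = cepstral_distance_db(reference, converted_proposed)
print(f"baseline CD: {cd_baseline:.3f} dB")
print(f"proposed CD: {cd_proposed:.3f} dB")
print(f"relative reduction: {relative_reduction(cd_baseline, cd_proposed):.1%}")
```

In practice the distance would be averaged over all time-aligned frames of an utterance before the two systems are compared; the single-frame comparison here only illustrates the arithmetic behind a statement such as "decreases by 10%".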
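Keywords: feature mapping; whisper; low dimension; Gaussian mixture model; speech conversion; spectral envelope; adaptive encoding; mapping relation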
Classification No.: TN912.3 [Electronics and Telecommunications: Communication and Information Systems]