A Two-layer Prosodic Structure Generation System for Chinese Speech Synthesis Systems (article in English)  Cited by: 2

A Two-stage Prosodic Structure Generation Strategy for Mandarin Text-to-speech Systems

Authors: Dong Yuan (董远), Zhou Tao (周涛), Dong Chengyu (董乘宇), Wang Haila (王海拉)

Affiliations: [1] Beijing University of Posts and Telecommunications; [2] France Telecom R&D (Beijing)

Source: Acta Automatica Sinica (《自动化学报》), 2010, No. 11, pp. 1569-1574 (6 pages)

Funding: Supported by the National Natural Science Foundation of China (90920001); the Key Project of the Ministry of Education of China (108012); and a Joint-research Project between France Telecom R&D Beijing and Beijing University of Posts and Telecommunications (SEV01100474)

Abstract: Prosodic structure generation is a key component in improving the intelligibility and naturalness of synthetic speech in a text-to-speech (TTS) system. This paper investigates the automatic segmentation of prosodic words and prosodic phrases, the two fundamental layers in the hierarchical prosodic structure of Mandarin, and presents a two-stage prosodic structure generation strategy. In the front end, conditional random field (CRF) models are built for both prosodic word and prosodic phrase prediction, each with a different feature selection. In the back end, a transformation-based error-driven learning (TBL) modification module amends the initial prediction. Experimental results show that the approach combining CRF and TBL achieves an F-score of 94.66%.
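Note: the two-stage idea in the abstract (an initial boundary prediction followed by rule-based correction, scored by F-score) can be illustrated with a minimal Python sketch. Everything below is an assumption made for illustration and is not taken from the paper: the label scheme ("B" for a boundary after a word, "N" for none), the helper names `apply_rules` and `boundary_f_score`, the single hand-written rule, and the toy sentence. In actual TBL the rules are learned from the errors of the first stage, and the first stage here is simply a given list of labels rather than a trained CRF.

```python
from typing import Callable, List, Tuple

# Hypothetical label scheme: "B" = a prosodic boundary follows this word,
# "N" = no boundary follows it.
Condition = Callable[[List[str], List[str], int], bool]
Rule = Tuple[Condition, str]


def apply_rules(words: List[str], labels: List[str], rules: List[Rule]) -> List[str]:
    """Apply each (condition, new_label) rule in order, in the style of TBL."""
    labels = list(labels)  # work on a copy so the initial prediction is kept
    for condition, new_label in rules:
        for i in range(len(words)):
            if condition(words, labels, i):
                labels[i] = new_label
    return labels


def boundary_f_score(predicted: List[str], reference: List[str]) -> float:
    """F-score over positions labeled "B" (harmonic mean of precision and recall)."""
    true_pos = sum(p == "B" and r == "B" for p, r in zip(predicted, reference))
    if true_pos == 0:
        return 0.0
    precision = true_pos / predicted.count("B")
    recall = true_pos / reference.count("B")
    return 2 * precision * recall / (precision + recall)


# Hand-written illustrative rule (an assumption, not a learned rule):
# drop a boundary that falls directly before the particle "的".
def boundary_before_de(words: List[str], labels: List[str], i: int) -> bool:
    return labels[i] == "B" and i + 1 < len(words) and words[i + 1] == "的"


words = ["我们", "的", "系统", "很", "好"]
initial = ["B", "N", "B", "N", "B"]    # stand-in for a first-stage prediction
reference = ["N", "N", "B", "N", "B"]  # toy gold annotation

corrected = apply_rules(words, initial, [(boundary_before_de, "N")])
score_before = boundary_f_score(initial, reference)
score_after = boundary_f_score(corrected, reference)
```

On this toy input the rule removes the single spurious boundary, so the corrected labels match the reference; the numbers say nothing about the results reported in the paper.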

Keywords: Mandarin text-to-speech system; two-layer prosodic structure generation system; computer technology; automation system

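Classification: TN912.33 [Electronics and Telecommunications: Communication and Information Systems]; TP3 [Electronics and Telecommunications: Information and Communication Engineering]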

 
