3D motion control model of tongue for Mandarin pronunciation (普通话发音过程中的舌3维运动控制模型)  Cited by: 3


Authors: Liu Chan (刘蝉), Zhang Shaochuan (张少川), Qian Zhaopeng (钱兆鹏), Niu Haijun (牛海军) [1,2]

Affiliations: [1] School of Biological Science and Medical Engineering, Beihang University, Beijing 100083, China; [2] Beijing Advanced Innovation Center for Biomedical Engineering, Beihang University, Beijing 100083, China

Source: Journal of Image and Graphics (《中国图象图形学报》), 2019, No. 11, pp. 1942-1951 (10 pages)

Funding: Open Project Fund of the State Key Laboratory of Virtual Reality Technology and Systems (Beihang University) (VRLAB2018B06)

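Abstract: Objective: Accurate visualization of the articulatory organs and their movement patterns during speech is important for understanding pronunciation mechanisms, for diagnosing and treating speech disorders, and for research on human-computer speech interaction. The tongue, a key organ of speech production, is difficult to visualize because it moves quickly, deforms in complex ways, and is hidden from view during articulation. This study therefore proposes a 3D dynamic control model of the tongue, built by statistical modeling, for the pronunciation of Mandarin vowels and consonants. Methods: First, magnetic resonance images (MRI) of a speaker were acquired during the pronunciation of Mandarin vowels and consonants; tongue contours were extracted by manual labeling, and static 3D mesh models were built. Second, with the model vertices as variables, control parameters were extracted by linear principal component analysis and the tongue motion control equations were established. Finally, the simulation of tongue motion control during pronunciation was evaluated. Results: Six 3D model motion control parameters were extracted, covering the tongue tip, tongue body, tongue dorsum, and jaw. The jaw parameter controls the tongue rotation caused by jaw opening and closing. The tongue body parameter controls front-back movement of the tongue, and the tongue dorsum parameter controls its arching and depression. The tongue tip parameters control up-down, front-back, and upward-curling movements of the tip, respectively. Together the six parameters account for 87.4% of the variation in 3D tongue motion, and the simulation results are better than the motion control results reported for other languages. Conclusion: The proposed method can be applied effectively to tongue modeling and 3D motion control for Mandarin pronunciation and reduces the complexity of 3D tongue motion modeling. The results provide useful information for visualizing the articulatory organs during Mandarin pronunciation.

Objective: The accurate visualization of vocal organs and their movement patterns during pronunciation is crucial for understanding pronunciation mechanisms, for the diagnosis and treatment of speech diseases, and for human-computer interaction research. As an important vocal organ, the tongue is not completely visible and moves rapidly and flexibly during speech; it is therefore difficult to visualize. Advances in medical imaging techniques in recent years have made it possible to capture clear tongue images, promoting the development of modeling strategies. The three most common strategies are parametric modeling, physiological modeling, and statistical modeling. Statistical modeling offers simple calculation, few control parameters, fast simulation, and strong interpretability, which makes it suitable for developing a real-time speech training system. However, few studies have applied this method to tongue modeling for Mandarin Chinese pronunciation, and existing statistical models have drawbacks in precision and simulation capability. This study therefore proposes an improved 3D dynamic control model of the tongue based on statistical modeling for Mandarin vowel-consonant pronunciation. Method: The control parameters were extracted using statistical modeling based on linear principal component analysis. The model assumes that tongue motion depends linearly on the control parameters.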
First, a representative corpus was established on the basis of tongue shape variation during Mandarin vowel-consonant pronunciation. The corpus included a set of 49 artificially sustained articulations designed to cover the maximal range of Mandarin allophones, namely, 8 vowels, 40 consonants in consonant-vowel (CV) sequences, and a rest position. On the basis of the corpus, sagittal volume images of the tongue from one speaker were acquired by magnetic resonance imaging (MRI), and supplementary images of the hard palate, jaw, and teeth were acquired by computed tomography (CT). The images were

Keywords: vocal organ visualization; magnetic resonance imaging; statistical model; Mandarin Chinese

Classification number: TP391.9 [Automation and Computer Technology: Computer Application Technology]
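
Illustrative sketch: the abstract describes a linear control model in which mesh vertex coordinates are the variables and a small number of control parameters are obtained by linear principal component analysis. A minimal Python sketch of that general idea follows. It is plain PCA computed with an SVD, not the authors' parameter-specific extraction procedure, and everything in it is an assumption made for illustration: the synthetic data stands in for the MRI-derived meshes, the vertex count is arbitrary, and the function names are hypothetical.

```python
import numpy as np


def fit_linear_control_model(shapes, n_params=6):
    """Fit a linear shape model with plain PCA (hypothetical helper).

    shapes: array of shape (n_articulations, 3 * n_vertices), one flattened
    3D mesh per articulation.
    Returns the mean shape, the first n_params principal directions, and the
    fraction of total variance those directions explain.
    """
    mean_shape = shapes.mean(axis=0)
    centered = shapes - mean_shape
    # Rows of vt are orthonormal principal directions; s holds singular values.
    _, s, vt = np.linalg.svd(centered, full_matrices=False)
    basis = vt[:n_params]
    variance = s ** 2
    explained = variance[:n_params].sum() / variance.sum()
    return mean_shape, basis, explained


def encode(shape, mean_shape, basis):
    """Project one mesh onto the control parameters."""
    return basis @ (shape - mean_shape)


def decode(params, mean_shape, basis):
    """Linear control equation: mean shape plus weighted principal directions."""
    return mean_shape + params @ basis


# Synthetic stand-in data (assumption): 49 articulations, 200 vertices,
# generated from 6 latent factors plus a small amount of noise.
rng = np.random.default_rng(0)
n_articulations, n_vertices, n_latent = 49, 200, 6
latent = rng.normal(size=(n_articulations, n_latent))
mixing = rng.normal(size=(n_latent, 3 * n_vertices))
noise = 0.01 * rng.normal(size=(n_articulations, 3 * n_vertices))
shapes = latent @ mixing + noise

mean_shape, basis, explained = fit_linear_control_model(shapes, n_params=6)
params = encode(shapes[0], mean_shape, basis)
reconstructed = decode(params, mean_shape, basis)
rms_error = np.sqrt(np.mean((reconstructed - shapes[0]) ** 2))

print(f"explained variance fraction: {explained:.4f}")
print(f"RMS reconstruction error for articulation 0: {rms_error:.4f}")
```

The explained-variance fraction computed here is the same kind of quantity as the 87.4% reported in the abstract, but the number printed by this sketch reflects only the synthetic data and says nothing about the paper's results.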

 
