Two-Stage Tongue Position Modeling Based on Small-Sample Data Statistics

Tongue Shapes Modeling from Small Data Using Two-Stage Autoencoder


Authors: XU Zhengli (徐正丽), XIAO Sufang (肖素芳), JIAN Min (简敏), YANG Minghao (杨明浩)[2] (Guilin University of Electronic Technology, Guilin, Guangxi 541004, China; Institute of Automation of the Chinese Academy of Sciences, Beijing 100190, China)

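Affiliations: [1] Guilin University of Electronic Technology, Guilin, Guangxi 541004, China; [2] Institute of Automation of the Chinese Academy of Sciences, Beijing 100190, China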

Source: Guangxi Sciences (《广西科学》), 2023, No. 4, pp. 745-753 (9 pages)

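Funding: Supported by the National Natural Science Foundation of China (71463010, 22180155466), the Guangxi Science and Technology Program (2021GXNSFBA220048, 桂科AB21220038), and the Guilin Science and Technology Program (2023010123).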

Abstract: The tongue is a key articulator in human speech production, and dimensionality reduction of its shape during articulation can help linguists analyze pronunciation patterns. Principal Component Analysis (PCA) is currently the most commonly used method for dimensionality reduction of tongue contours. In recent years, deep-learning-based autoencoders have been shown to outperform PCA at dimensionality reduction. However, autoencoders require a large number of training samples, and because the tongue is hidden inside the oral cavity, large amounts of tongue movement data are difficult to obtain, so conventional autoencoders cannot be applied directly to tongue contour modeling. To address these limitations, this paper proposes a two-stage autoencoder dimensionality reduction method for small-sample tongue motion contour data. In the first stage, an Active Shape Model (ASM) is used to generate a large amount of physiological deformation data of the tongue contour, and a general tongue contour reconstruction model is built on a conventional autoencoder. In the second stage, a dimensionality reduction layer is added to the first-stage autoencoder to compress and analyze the tongue contour data. The experiments use 240 vowel tongue shapes obtained from X-ray films of human speech and compare the proposed method with conventional PCA. The results show that the vowel tongue position map produced by the proposed method is better separated in the two-dimensional plane than the one produced by PCA, and that the method achieves better tongue shape dimensionality reduction and reconstruction.
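The two-stage idea described in the abstract can be sketched in a few lines of NumPy. This is only an illustration under stated assumptions, not the authors' implementation: the 240 "contours" are toy half-ellipses standing in for X-ray tongue shapes, the ASM is reduced to a PCA point-distribution model sampled uniformly within ±3 standard deviations per mode, and the layer sizes, learning rates, and the choice to freeze the stage-1 weights in stage 2 are all hypothetical.

```python
import numpy as np

def fit_asm(shapes, var_kept=0.98):
    """Point-distribution model: mean shape, deformation modes, per-mode variance."""
    mean = shapes.mean(axis=0)
    _, s, vt = np.linalg.svd(shapes - mean, full_matrices=False)
    var = s**2 / (len(shapes) - 1)
    k = int(np.searchsorted(np.cumsum(var) / var.sum(), var_kept)) + 1
    return mean, vt[:k], var[:k]

def sample_asm(mean, modes, var, n, rng):
    """Synthetic shapes x = mean + b @ modes with each |b_i| <= 3 * sqrt(var_i)."""
    b = rng.uniform(-3.0, 3.0, size=(n, len(var))) * np.sqrt(var)
    return mean + b @ modes

def dense(n_in, n_out, use_tanh, rng, scale):
    """One layer stored as [weights, bias, use_tanh]."""
    return [rng.normal(0.0, scale, (n_in, n_out)), np.zeros(n_out), use_tanh]

def forward(x, layers):
    """Return the input followed by the activation of every layer."""
    acts = [x]
    for w, b, use_tanh in layers:
        z = acts[-1] @ w + b
        acts.append(np.tanh(z) if use_tanh else z)
    return acts

def mse(x, layers):
    """Mean squared reconstruction error over all coordinates."""
    return float(np.mean((forward(x, layers)[-1] - x) ** 2))

def train(x, layers, trainable, epochs, lr):
    """Full-batch gradient descent; only layers whose index is in `trainable` move."""
    for _ in range(epochs):
        acts = forward(x, layers)
        grad = 2.0 * (acts[-1] - x) / len(x)
        for i in reversed(range(len(layers))):
            w, b, use_tanh = layers[i]
            if use_tanh:
                grad = grad * (1.0 - acts[i + 1] ** 2)
            grad_w, grad_b = acts[i].T @ grad, grad.sum(axis=0)
            grad = grad @ w.T  # back-propagate before w is updated
            if i in trainable:
                w -= lr * grad_w  # in-place update of the shared arrays
                b -= lr * grad_b

# Toy stand-in for the small real dataset: 240 contours of 20 (x, y) points each.
rng = np.random.default_rng(0)
t = np.linspace(0.0, np.pi, 20)
width = rng.uniform(0.8, 1.2, (240, 1))
height = rng.uniform(0.5, 1.0, (240, 1))
real = np.hstack([width * np.cos(t), height * np.sin(t)])
real += rng.normal(0.0, 0.01, real.shape)

# Stage 1: enlarge the data with the shape model, then train a reconstruction model.
mean, modes, var = fit_asm(real)
synthetic = sample_asm(mean, modes, var, 2000, rng)
x_syn = synthetic - mean
stage1 = [dense(40, 16, True, rng, 0.1), dense(16, 40, False, rng, 0.1)]
s1_before = mse(x_syn, stage1)
train(x_syn, stage1, {0, 1}, epochs=500, lr=0.05)
s1_after = mse(x_syn, stage1)

# Stage 2: insert a 2-unit bottleneck and train only the new layers on the real data.
x_real = real - mean
stage2 = [stage1[0], dense(16, 2, False, rng, 0.3),
          dense(2, 16, True, rng, 0.3), stage1[1]]
s2_before = mse(x_real, stage2)
train(x_real, stage2, {1, 2}, epochs=1000, lr=0.02)
s2_after = mse(x_real, stage2)
codes = forward(x_real, stage2)[2]  # one 2-D point per contour
```

Here `codes` plays the role of the two-dimensional tongue position map; with real data, each row could be plotted and labelled by vowel to compare its separation against a two-component PCA projection of the same contours.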

Keywords: deep neural network; autoencoder; principal component analysis; tongue contour; hidden units

Classification number: TP389 [Automation and Computer Technology: Computer System Architecture]

 
