Affiliation: [1] School of Computer Science, Northwestern Polytechnical University, Xi'an, Shaanxi 710072, China
Source: Journal of Northwestern Polytechnical University (《西北工业大学学报》), 2004, No. 5, pp. 674-678 (5 pages)
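Funding: Supported by an international science and technology cooperation project between the Ministry of Science and Technology of China and the Flemish Region of Belgium (国科外 19990209号)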
Abstract: In audio-visual speech recognition and lipreading, the widely used ASM (Active Shape Model) for lip contour extraction lacks robustness and cannot extract exact lip contours, because the mouth shape varies widely during speech. We present a more robust model, the Multiple Active Shape Model (MASM), for lip contour extraction. The model classifies mouth shapes into a closed-mouth set, a half-opened-mouth set, and a round-mouth set, and builds an independent ASM for each set from a small amount of training data. The MASM contour extraction algorithm automatically selects the most accurate lip contour from the multiple shape-search procedures. Because the mouth shape changes continuously from frame to frame, we also present a contour-smoothing method that corrects contour extraction errors. Experimental results on the AVCONDIG database show that the extraction accuracy of MASM is 13% higher than that of a conventional ASM, and that combining MASM with the contour-smoothing method yields a further 7% improvement, for a total gain of 20 percentage points. When the extracted lip contour feature is fused with the audio MFCC (Mel Frequency Cepstral Coefficients) feature, the average word recognition rate on the connected-digits speech recognition task increases considerably under noisy acoustic conditions.
Keywords: speech recognition; audio-visual speech recognition; ASM; MASM; lip contour extraction
Classification: TN912.3 [Electronics and Telecommunications: Communication and Information Systems]
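Note: the abstract states that MASM runs several shape searches, one per mouth-shape class, and keeps the most accurate contour. The sketch below illustrates only that selection step, and it rests on several assumptions that are not in the record: each per-class search is modelled as a hypothetical callable returning a contour and a fit error, a lower fit error is taken to mean a better contour, and the stub models and their numbers are invented purely for illustration. The paper's actual selection criterion is not given here.

```python
from typing import Callable, Dict, List, Tuple

Contour = List[Tuple[float, float]]
# Hypothetical interface: a shape search takes an image and returns
# (contour, fit_error). Lower fit_error is assumed to mean a better fit.
ShapeSearch = Callable[[object], Tuple[Contour, float]]


def masm_extract(image: object,
                 models: Dict[str, ShapeSearch]) -> Tuple[str, Contour, float]:
    """Run every per-class search and keep the contour with the lowest fit error."""
    best = None
    for name, search in models.items():
        contour, error = search(image)
        if best is None or error < best[2]:
            best = (name, contour, error)
    if best is None:
        raise ValueError("at least one model is required")
    return best


def make_stub(contour: Contour, error: float) -> ShapeSearch:
    """Build a fake search that ignores the image (illustration only)."""
    return lambda image: (contour, error)


# Three stand-in models for the three mouth-shape classes named in the abstract.
# The contours and error values are made up.
models = {
    "closed": make_stub([(0.0, 0.0), (1.0, 0.0)], 0.42),
    "half_open": make_stub([(0.0, 0.0), (1.0, 0.5)], 0.17),
    "round": make_stub([(0.0, 0.0), (0.5, 0.5)], 0.30),
}

name, contour, error = masm_extract(None, models)
print(name, error)
```

In a real system each stub would be replaced by a trained ASM search for its class; the selection loop itself would not need to change.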