Voice navigation anti-interference system based on multimodal lip state recognition


Authors: WANG Han[1,2], CHEN Yilin, JI Yujiao, DU Ruolin

Affiliations: [1] School of Information Science and Technology, Nantong University, Nantong, Jiangsu 226019, China; [2] School of Transportation and Civil Engineering, Nantong University, Nantong, Jiangsu 226019, China

Source: Journal of Jiangsu University: Natural Science Edition, 2025, No. 1, pp. 82-90 (9 pages)

Funding: National Natural Science Foundation of China (61872425); Jiangsu Province Graduate Innovation Program project (SJCX24_2009).

Abstract: Existing in-vehicle voice navigation devices are susceptible to noise from inside and outside the vehicle and cannot accurately determine the source of a sound signal. To address this problem, a voice navigation anti-interference system based on lip state recognition is proposed. A camera recognizes the driver's lip state in real time, so that the start and end points of the driver's voice signal are accurately determined and the voice navigation input signal is switched on and off accordingly. This strengthens the driver's control over voice navigation and reduces interference from noise inside and outside the vehicle. To ensure the accuracy and robustness of lip state recognition, a multimodal lip state recognition network based on key point-appearance short-term feature fusion is proposed. Three groups of experiments were conducted: an effectiveness test of the key point short-term features, an ablation test of multimodal feature fusion for lip state recognition, and voice navigation anti-interference tests in a simulated laboratory environment and a real in-vehicle environment. The results show that the proposed key point short-term feature operator enhances the representation of lip state changes by more than 14%, and that the key point-appearance fusion network improves recognition accuracy by more than 8.98% through feature complementarity. The voice navigation anti-interference system built on this network achieves high accuracy (92.6%) and good real-time performance (a detection speed of 35 frames/s). Even under large head pose changes of more than 70° to the driver's left or right, it effectively reduces the interference of noise inside and outside the vehicle with voice control of the navigation, demonstrating high robustness.
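The gating idea in the abstract (lip movement marks the start and end of the driver's speech, and the navigation audio input is opened only in between) can be illustrated with a minimal sketch. This is not the authors' network: the abstract does not define the key point short-term feature operator or the fusion model, so the sketch below assumes a much simpler stand-in, namely the standard deviation of a mouth opening ratio over a short sliding window. All names (`mouth_opening_ratio`, `LipGate`, `speech_segments`) and the `window`, `threshold`, and `hangover` values are hypothetical.

```python
from collections import deque
from statistics import pstdev


def mouth_opening_ratio(top, bottom, left, right):
    """Vertical lip gap divided by mouth width, from four (x, y) key points."""
    width = ((right[0] - left[0]) ** 2 + (right[1] - left[1]) ** 2) ** 0.5
    height = ((bottom[0] - top[0]) ** 2 + (bottom[1] - top[1]) ** 2) ** 0.5
    return height / width if width > 0 else 0.0


class LipGate:
    """Opens the audio input while the lips are moving (assumed heuristic)."""

    def __init__(self, window=8, threshold=0.02, hangover=10):
        self.ratios = deque(maxlen=window)  # short-time history of the ratio
        self.threshold = threshold          # minimum std-dev counted as movement
        self.hangover = hangover            # still frames tolerated before closing
        self.quiet_frames = hangover        # start in the closed state
        self.is_open = False

    def update(self, ratio):
        """Feed one frame's ratio; return True if audio input should be open."""
        self.ratios.append(ratio)
        window_full = len(self.ratios) == self.ratios.maxlen
        moving = window_full and pstdev(self.ratios) > self.threshold
        if moving:
            self.quiet_frames = 0
            self.is_open = True
        else:
            self.quiet_frames += 1
            if self.quiet_frames >= self.hangover:
                self.is_open = False
        return self.is_open


def speech_segments(ratios, **gate_args):
    """Return (start_frame, end_frame) pairs during which the gate was open."""
    gate = LipGate(**gate_args)
    segments, start = [], None
    for i, ratio in enumerate(ratios):
        is_open = gate.update(ratio)
        if is_open and start is None:
            start = i
        elif not is_open and start is not None:
            segments.append((start, i))
            start = None
    if start is not None:
        segments.append((start, len(ratios)))
    return segments


if __name__ == "__main__":
    # Synthetic trace: closed lips, 20 frames of alternating open/closed, closed lips.
    trace = [0.05] * 20 + [0.05, 0.35] * 10 + [0.05] * 30
    print(speech_segments(trace))
```

The hangover counter keeps the gate open across brief pauses between syllables, at the cost of closing a few frames late. A windowed statistic of this kind uses key points only; the paper reports that fusing key point and appearance features improves recognition accuracy.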

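Keywords: voice navigation anti-interference system; lip state recognition; key points; appearance features; feature fusion; long short-term memory network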

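CLC number: TP391.41 [Automation and Computer Technology: Computer Application Technology]; TP183 [Automation and Computer Technology: Computer Science and Technology]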

 
