Research on Multimodal Rejection Model of Cockpit Based on ChatGLM2 Large Model


Authors: Zhang Qiang, Shi Qin [1,2,3], Cheng Teng, Ni Hao

Affiliations: [1] School of Automotive and Transportation Engineering, Hefei University of Technology, Hefei 230009; [2] Key Laboratory for Automated Vehicle Safety Technology of Anhui Province, Hefei 230009; [3] Engineering Research Center for Intelligent Transportation and Cooperative Vehicle-Infrastructure of Anhui Province, Hefei 230009; [4] Chery Automobile Co., Ltd., Wuhu 241000

Source: Automotive Engineering (《汽车工程》), 2025, No. 3, pp. 412-417, 429 (7 pages)

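Funding: Supported by the Natural Science Foundation of Anhui Province (2208085MF171); the Fundamental Research Funds for the Central Universities (JZ2023YQTD0073, PA2023GDSK0112); and the Key Research and Development Program of Anhui Province (202304A05020087).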

Abstract: In the field of intelligent connected vehicles, how accurately an in-vehicle system recognizes non-command voice input in complex environments (that is, the proportion of voice inputs the system classifies correctly) is of great significance. To address this challenge, this paper proposes a multimodal rejection model. The model is built on the open-source ChatGLM2-6B large language model, with a dedicated rejection dataset constructed and the model fine-tuned for in-vehicle human-machine interaction. The rejection dataset is collected from real driving scenarios and combines voice information with non-verbal signals such as the driver's facial orientation, gestures, and emotion, providing richer interaction information and overcoming the limitations of purely language-based recognition in complex environments. Experiments show that, compared with the language-only rejection model, the multimodal rejection model achieves higher recognition accuracy (ACC) and a lower false rejection rate (FRR) on the test set.
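The abstract names the two evaluation metrics, ACC and FRR, but gives no formulas. The Python sketch below shows one common way to compute them, under stated assumptions: the label encoding (1 for a command addressed to the system, 0 for non-command speech to be rejected), the function name `rejection_metrics`, and the sample data are all hypothetical and are not taken from the paper. FRR is taken here as the share of genuine commands that the model wrongly rejects.

```python
from typing import Dict, Sequence


def rejection_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Compute accuracy (ACC) and false rejection rate (FRR).

    Assumed encoding (not from the paper):
    1 = command addressed to the system (should be accepted),
    0 = non-command speech (should be rejected).
    """
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("y_true and y_pred must be non-empty and equally long")

    # ACC: share of all utterances whose accept/reject decision is correct.
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    acc = correct / len(y_true)

    # FRR: share of genuine commands that were wrongly rejected.
    commands = sum(1 for t in y_true if t == 1)
    false_rejections = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    frr = false_rejections / commands if commands else 0.0

    return {"acc": acc, "frr": frr}


# Illustrative data only: 4 commands followed by 4 non-command utterances.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 1, 0]
metrics = rejection_metrics(y_true, y_pred)
print(metrics)  # acc = 0.75, frr = 0.25
```

With these sample labels, one of the four commands is rejected and one of the four non-command utterances is accepted, so six of eight decisions are correct. The paper may define the metrics differently, so the full text should be checked before comparing numbers.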

Keywords: intelligent connected vehicles; in-vehicle voice interaction; rejection; large model

Classification number: TP391.4 [Automation and Computer Technology: Computer Application Technology]

 
