End-to-End Mandarin Speech Recognition for Air Traffic Control Utilizing EfficientNetV2-RetNet

Authors: LIANG Haijun[1], CHANG Hanwen, HE Yimin, ZHAO Zhiwei, KONG Jianguo[1] (College of Air Traffic Management, Civil Aviation Flight University of China, Guanghan 618307, China)

Affiliation: [1] College of Air Traffic Management, Civil Aviation Flight University of China, Guanghan, Sichuan 618307, China

Source: Telecommunication Engineering (《电讯技术》), 2025, No. 2, pp. 254-260 (7 pages)

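Funding: National Key R&D Program of China (2021YFF0603904); Fundamental Research Funds for the Central Universities, special fund (PHD2023-035); Fundamental Research Funds for the Central Universities, funded project (24CAFUC10195).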

Abstract: Applying automatic speech recognition (ASR) to air traffic control (ATC) has the potential to improve communication efficiency, reduce human error, enhance safety, and drive innovation in air traffic management (ATM) systems. However, because ATC communications are sensitive, large amounts of labeled ATC speech data are difficult to obtain, which is a major obstacle to building highly accurate ASR systems. This paper develops a novel end-to-end ASR framework for ATC systems, EfficientNetV2-RetNet-CTC, based on the Retentive Network (RetNet) and transfer learning. The multi-layer convolutional architecture of EfficientNetV2 extracts more complex feature representations from speech signals. RetNet uses a multi-scale retention mechanism to learn global temporal dynamics in sequence data, which lets it handle long-range dependencies efficiently. Connectionist temporal classification (CTC) removes the need for forced alignment of labels and supports variable-length label sequences. In addition, transfer learning improves performance on the target task by reusing knowledge learned on the source task, which helps overcome the scarcity of data in the civil aviation domain and improves the model's generalization. Experimental results show that the proposed model outperforms the other baselines: pre-trained on the Aishell corpus, it reaches minimum character error rates of 7.6% and 8.7%, which fall to 5.6% and 6.8% on the ATC corpus.
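The abstract relies on two ideas that a few lines of code can make concrete: CTC needs no frame-level alignment, and the results are reported as character error rates. The sketch below is an illustration only, not the authors' implementation. It shows the standard CTC greedy-decoding rule (merge repeated frame labels, then drop blanks) and a character error rate computed as edit distance divided by reference length. The blank index, the toy vocabulary, and the example utterances are all assumptions made for the example.

```python
from itertools import groupby

BLANK = 0  # assumed index of the CTC blank symbol (hypothetical choice)


def ctc_greedy_decode(frame_ids, blank=BLANK):
    """Collapse per-frame labels: merge consecutive repeats, then drop blanks."""
    merged = [label for label, _ in groupby(frame_ids)]
    return [label for label in merged if label != blank]


def character_error_rate(reference, hypothesis):
    """Levenshtein distance between the two sequences / reference length."""
    if len(reference) == 0:
        raise ValueError("reference must not be empty")
    prev = list(range(len(hypothesis) + 1))  # distances for an empty reference
    for i, ref_char in enumerate(reference, start=1):
        curr = [i]
        for j, hyp_char in enumerate(hypothesis, start=1):
            cost = 0 if ref_char == hyp_char else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1] / len(reference)


# Toy vocabulary and frame-level argmax output, both invented for illustration.
vocab = ["<blank>", "上", "升", "到", "八", "千"]
frames = [0, 1, 1, 0, 2, 2, 2, 0, 3, 0, 4, 5, 5]

decoded = "".join(vocab[i] for i in ctc_greedy_decode(frames))
print(decoded)
print(character_error_rate("上升到八千", "上升到九千"))
```

Thirteen frames collapse to five characters without any per-frame alignment labels, which is the property the abstract attributes to CTC. The second call contains one substituted character out of five, so the error rate in this toy case is 0.2.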

Keywords: air traffic control; automatic speech recognition; end-to-end deep learning; transfer learning

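Classification: V355.1 [Aerospace Science and Technology: Man-Machine and Environmental Engineering]; TN912.34 [Electronics and Telecommunications: Communication and Information Systems]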

 
