Open-set learning for a robust small-footprint keyword spotting system with limited training data


Authors: HUANG Zijun; ZHANG Xiaolei (School of Marine Science and Technology, Northwestern Polytechnical University, Xi'an 710072, China; Shenzhen Research Institute, Northwestern Polytechnical University, Shenzhen 518057, China)

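Affiliations: [1] School of Marine Science and Technology, Northwestern Polytechnical University, Xi'an 710072; [2] Shenzhen Research Institute, Northwestern Polytechnical University, Shenzhen 518057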

Source: Journal of Tsinghua University (Science and Technology), 2024, No. 11, pp. 1927-1935 (9 pages)

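Funding: General Program of the National Natural Science Foundation of China (62176211); International Cooperation Research Project of the Shenzhen Science and Technology Innovation Commission (GJHZ20240218114401004).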

Abstract: Keyword spotting aims to detect target keywords in speech, and deep neural networks provide an effective solution for small-footprint keyword spotting. Most existing keyword spotting methods minimize a Softmax-based cross-entropy loss, assume that test and training samples come from the same distribution, and focus on maximizing classification accuracy on the training set without considering unknown speech outside the training set. When training data is limited, it remains difficult for a keyword spotting system to stay robust and accurate when it encounters unknown speech. This paper studies open-set learning methods that combine a deep feature encoder with classifiers based on convolutional prototype learning and reciprocal point learning for open-set keyword spotting. The proposed method not only improves keyword classification accuracy but also detects non-keywords well. Experimental results on the Google Speech Commands datasets V0.01 and V0.02 and on the LibriWords dataset derived from LibriSpeech show that the proposed method outperforms the baseline methods on most evaluation metrics.

[Objective] Keyword spotting (KWS) aims to detect recognizable keywords from speech. Deep neural networks have provided effective solutions for small-footprint KWS. However, most KWS methods employ a Softmax-based cross-entropy loss, assuming that the test and training samples have identical distributions. These methods focus on maximizing classification accuracy on the training set and often neglect unknown speech outside the training samples. This leads to significant challenges in real-world scenarios, where training data is limited and the system frequently encounters unfamiliar speech.

[Methods] This paper introduces an approach to KWS that explores open-set learning methods able to accommodate the open vocabulary of KWS tasks. These methods combine deep feature encoders with classifiers based on convolutional prototype learning and reciprocal point learning. For convolutional prototype learning, this paper first replaces the Softmax network with a prototype network to eliminate the closed-world assumption. It then constructs prototypes for each keyword that represent class-level features in the feature space. A distance-based measure represents the similarity between a sample and each keyword for classification, and training maximizes the likelihood of the sample. To reject non-keywords effectively, this paper applies a regularization constraint on the boundary of the prototypes, which improves the robustness of the system. For reciprocal point learning, this paper constructs reciprocal points that represent features not associated with the keyword class. It assumes that the probability of a sample belonging to a keyword is proportional to the distance between the sample and that keyword's reciprocal point, and uses this as the classification criterion. To detect non-keywords, this paper restricts the boundary range of the reciprocal points. In addition, this paper explores variants of reciprocal point learning, such as adversarial reciprocal point learning, which uses a more effe
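The two decision rules described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the squared Euclidean distance, the fixed thresholds `reject_distance` and `min_distance`, the function names, and the two-dimensional toy features are all assumptions made for illustration. In the paper the features come from a trained deep encoder, and the prototypes and reciprocal points are learned.

```python
import math

def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def prototype_classify(feature, prototypes, reject_distance):
    """Prototype rule: a sample is more likely to be keyword k the CLOSER it
    is to prototype k. Samples far from every prototype are rejected.
    `reject_distance` is a hypothetical hand-set threshold."""
    labels = list(prototypes)
    dists = [squared_distance(feature, prototypes[k]) for k in labels]
    probs = dict(zip(labels, softmax([-d for d in dists])))
    best = min(range(len(labels)), key=lambda i: dists[i])
    if dists[best] > reject_distance:
        return None, probs  # non-keyword
    return labels[best], probs

def reciprocal_classify(feature, reciprocal_points, min_distance):
    """Reciprocal point rule: reciprocal point k stands for everything that
    is NOT keyword k, so a sample is more likely to be keyword k the FARTHER
    it is from reciprocal point k. Samples close to all reciprocal points
    are rejected. `min_distance` is a hypothetical hand-set threshold."""
    labels = list(reciprocal_points)
    dists = [squared_distance(feature, reciprocal_points[k]) for k in labels]
    probs = dict(zip(labels, softmax(dists)))
    best = max(range(len(labels)), key=lambda i: dists[i])
    if dists[best] < min_distance:
        return None, probs  # non-keyword
    return labels[best], probs

# Toy 2-D features standing in for encoder outputs (illustrative values only).
prototypes = {"yes": [1.0, 0.0], "no": [0.0, 1.0]}
reciprocal_points = {"yes": [0.0, 1.0], "no": [1.0, 0.0]}

label_a, _ = prototype_classify([0.9, 0.1], prototypes, reject_distance=0.5)
label_b, _ = prototype_classify([5.0, 5.0], prototypes, reject_distance=0.5)
label_c, _ = reciprocal_classify([0.9, 0.1], reciprocal_points, min_distance=1.0)
label_d, _ = reciprocal_classify([0.5, 0.5], reciprocal_points, min_distance=1.0)
```

In this sketch a return label of `None` marks a non-keyword. Unlike a plain Softmax classifier, both rules can refuse to assign any keyword, which is the open-set behavior the abstract describes.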

Keywords: limited training data; keyword spotting; open-set recognition; prototype learning

Classification code: TN912.3 [Electronics and Telecommunications: Communication and Information Systems]

 
