Authors: WANG Mengwei (王萌威), YANG Zhe (杨哲)[1] (School of Computer Science and Technology, Soochow University, Suzhou, Jiangsu 215006, China)
Affiliation: [1] School of Computer Science and Technology, Soochow University, Suzhou, Jiangsu 215006, China
Source: Computer Science (《计算机科学》), 2025, No. 3, pp. 214-221 (8 pages)
Funding: Ministry of Education Industry-University Collaborative Education Program (220606363154256).
Abstract: In existing speaker verification methods, the time delay neural networks (TDNNs) used to extract frame-level features have two problems: they lack the ability to model local frequency features, and their multi-layer feature fusion cannot effectively model the complex relationships between high-level and low-level features. To address these problems, this paper proposes a new front-end model and a new multi-layer feature fusion method. In the front-end model, the input feature map is divided into multiple sub-bands whose frequency ranges expand layer by layer, so that the TDNN models local frequency features progressively. In the backbone model, a new reverse path running from higher to lower layers is added to model the relationship between the output features of each pair of adjacent layers, and the outputs of every layer in the reverse path are concatenated to form the fused features. In addition, an inverted bottleneck design is used in the backbone to further improve performance. Experimental results on the VoxCeleb1 test set show that, compared with current TDNN methods, the proposed method reduces the equal error rate (EER) by 9% and the minimum detection cost function (minDCF) by 14% (relative), while using only 52% of the parameters.
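The abstract says the front end splits the input feature map into sub-bands and widens each sub-band's frequency range layer by layer. The record does not give the band count or the widening rule, so the sketch below is only an illustration under stated assumptions: an 80-bin filterbank input, 8 equal initial sub-bands, and pairwise merging of neighbouring bands at every layer (so each band's range doubles). The function names (`split_bands`, `merge_adjacent`, `progressive_schedule`, `band_slices`) are hypothetical and not taken from the paper; the sketch only tracks which frequency bins each per-band unit would see, not the TDNN layers themselves.

```python
from typing import List, Tuple

# A sub-band is a half-open range [start, end) of frequency bins (e.g. Fbank channels).
Band = Tuple[int, int]


def split_bands(num_bins: int, num_bands: int) -> List[Band]:
    """Divide the frequency axis [0, num_bins) into equal-width sub-bands."""
    if num_bands <= 0 or num_bins % num_bands != 0:
        raise ValueError("num_bins must be a positive multiple of num_bands")
    width = num_bins // num_bands
    return [(i * width, (i + 1) * width) for i in range(num_bands)]


def merge_adjacent(bands: List[Band]) -> List[Band]:
    """Merge neighbouring sub-bands pairwise, doubling each band's frequency range."""
    if len(bands) % 2 != 0:
        raise ValueError("pairwise merging needs an even number of sub-bands")
    return [(bands[i][0], bands[i + 1][1]) for i in range(0, len(bands), 2)]


def progressive_schedule(num_bins: int, num_bands: int) -> List[List[Band]]:
    """Sub-band layout for each successive layer, until one band spans all bins.

    Assumption: pairwise merging, so num_bands must be a power of two.
    """
    if num_bands & (num_bands - 1) != 0:
        raise ValueError("num_bands must be a power of two for pairwise merging")
    layers = [split_bands(num_bins, num_bands)]
    while len(layers[-1]) > 1:
        layers.append(merge_adjacent(layers[-1]))
    return layers


def band_slices(frame: List[float], bands: List[Band]) -> List[List[float]]:
    """Cut one feature frame into the local pieces each per-band unit would see."""
    return [frame[start:end] for start, end in bands]


if __name__ == "__main__":
    # Hypothetical setting: 80 filterbank bins, 8 initial sub-bands.
    for depth, bands in enumerate(progressive_schedule(80, 8)):
        print(depth, bands)
```

With these assumed settings the schedule has four layers: eight 10-bin bands, then four 20-bin bands, then two 40-bin bands, and finally a single band covering bins 0 to 80. Early layers therefore only mix information within a narrow frequency range, which is the "local frequency feature" modeling the abstract describes.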
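The abstract also describes a reverse path that runs from higher to lower backbone layers, relates each pair of adjacent layers, and concatenates every reverse-path output as the fused feature. The record does not say how two adjacent layers are combined, so the sketch below uses a hypothetical element-wise sum (`add_combine`) as a placeholder for what is presumably a learned block in the actual model. It also assumes that the reverse path starts from the top layer's output unchanged and that the outputs are concatenated in top-down order; each layer output is reduced to one channel vector for a single frame. All names here are illustrative, not the authors' code.

```python
from typing import Callable, List

# One layer's output at a single frame, viewed as a flat channel vector.
Vector = List[float]
Combine = Callable[[Vector, Vector], Vector]


def add_combine(lower: Vector, upper_reverse: Vector) -> Vector:
    """Hypothetical stand-in for the learned block relating two adjacent layers."""
    if len(lower) != len(upper_reverse):
        raise ValueError("adjacent features must have the same number of channels")
    return [a + b for a, b in zip(lower, upper_reverse)]


def reverse_path_fusion(layer_outputs: List[Vector], combine: Combine = add_combine) -> Vector:
    """Fuse backbone outputs via a top-down path, then concatenate every path output.

    layer_outputs[0] is the lowest backbone layer, layer_outputs[-1] the highest.
    """
    if not layer_outputs:
        raise ValueError("need at least one layer output")
    path = [layer_outputs[-1]]  # assumption: the reverse path starts at the top layer
    for lower in reversed(layer_outputs[:-1]):
        # Each step relates layer l to the reverse-path output coming from layer l+1.
        path.append(combine(lower, path[-1]))
    fused: Vector = []
    for out in path:  # channel-wise concatenation of all reverse-path outputs
        fused.extend(out)
    return fused


if __name__ == "__main__":
    # Three toy 2-channel layer outputs, lowest first.
    print(reverse_path_fusion([[1.0, 1.0], [2.0, 2.0], [3.0, 3.0]]))
```

Because the combination step is passed in as `combine`, replacing the placeholder sum with a learned module would change only that argument; the top-down traversal and the final concatenation stay the same.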
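Keywords: voiceprint recognition; speaker verification; time delay neural network; sub-band feature extraction; multi-layer feature fusion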
CLC number: TP183 [Automation and Computer Technology: Control Theory and Control Engineering]