Research on Speaker Speech Clustering Based on Improved SKA-TDNN

Authors: LU Si-yu; JIANG Nan; WANG Zhi-yi (College of Public Security Information Technology and Intelligence, Criminal Investigation Police University of China, Shenyang, Liaoning 110854, China; Key Laboratory of Evidence Science, Ministry of Education, China University of Political Science and Law, Beijing 100088, China)

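Affiliations: [1] College of Public Security Information Technology and Intelligence, Criminal Investigation Police University of China, Shenyang, Liaoning 110854, China; [2] Key Laboratory of Evidence Science, Ministry of Education, China University of Political Science and Law, Beijing 100088, China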

Source: Computer Simulation (《计算机仿真》), 2025, Issue 3, pp. 358-364 (7 pages)

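Funding: Public Security Discipline Basic Theory Research Innovation Program (2022XKGJ0110); Joint Open Fund of the Department of Science and Technology of Liaoning Province and Open Fund of the State Key Laboratory of Robotics (2020-KF-12-11); Open Fund of the Key Laboratory of Evidence Science, Ministry of Education, China University of Political Science and Law (2021KFKT09); Fundamental Research Funds for the Central Universities (3242019010); Natural Science Foundation of Liaoning Province (2019-ZD-0168); Key Research Project of the Ministry of Education (E-AQGABQ20202710).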

Abstract: Speaker speech clustering is widely applicable to the preprocessing of large-scale unlabeled speech data. To address weak feature extraction on short speech segments and the instability of clustering algorithms, this paper proposes a network structure with multi-scale attention based on the selective kernel attention time delay neural network (SKA-TDNN). Without increasing the size of the network, the structure further improves the capture of frequency-domain and channel information and enlarges the network's global receptive field. In addition, based on the distribution characteristics of speaker speech data, a k-means clustering algorithm based on peak statistics is proposed, which effectively addresses the low clustering accuracy and slow convergence caused by random initialization of cluster centers in the original algorithm. Experimental results on the Aishell4 Chinese conference dataset show that the improved SKA-TDNN feature extraction network and the improved k-means clustering algorithm improve both clustering accuracy and convergence speed.
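The abstract does not describe how the peak-statistics initialization works, so the following is only a minimal sketch of the general idea of replacing random k-means seeding with a deterministic, data-driven one. It swaps in a generic density-peak style seeding (local density multiplied by distance to the nearest denser point), which is an assumption and not necessarily the authors' method. The function names `density_peak_init` and `kmeans`, the `n_neighbors` parameter, and the synthetic 2-D data standing in for speaker embeddings are all hypothetical.

```python
import numpy as np

def density_peak_init(X, k, n_neighbors=10):
    """Choose k seeds that are locally dense and far from any denser point.

    Assumption: a generic density-peak heuristic, not the paper's algorithm.
    """
    # Pairwise Euclidean distances, shape (n, n).
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    # Local density: inverse mean distance to the n_neighbors nearest points
    # (column 0 of the sorted row is the point itself, so it is skipped).
    knn = np.sort(d, axis=1)[:, 1:n_neighbors + 1]
    rho = 1.0 / (knn.mean(axis=1) + 1e-12)
    # delta: distance to the nearest point of higher density.
    order = np.argsort(-rho)
    delta = np.empty(len(X))
    delta[order[0]] = d[order[0]].max()  # densest point gets the largest distance
    for rank in range(1, len(X)):
        i = order[rank]
        delta[i] = d[i, order[:rank]].min()
    # Points with both high density and high delta are treated as peaks.
    score = rho * delta
    return X[np.argsort(-score)[:k]].copy()

def kmeans(X, centers, n_iter=100, tol=1e-6):
    """Plain Lloyd iterations starting from the given centers."""
    centers = centers.copy()
    for it in range(1, n_iter + 1):
        d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=-1)
        labels = d.argmin(axis=1)
        # Recompute each center; keep the old one if a cluster is empty.
        new_centers = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(len(centers))
        ])
        shift = np.linalg.norm(new_centers - centers)
        centers = new_centers
        if shift < tol:
            break
    return labels, centers, it

# Synthetic stand-in for speaker embeddings: three well-separated groups.
rng = np.random.default_rng(0)
means = np.array([[0.0, 0.0], [6.0, 0.0], [3.0, 5.0]])
X = np.vstack([m + 0.4 * rng.standard_normal((50, 2)) for m in means])

seeds = density_peak_init(X, k=3)
labels, centers, n_iter = kmeans(X, seeds)
print("iterations:", n_iter)
```

Because the seeds are computed from the data rather than drawn at random, repeated runs on the same input give the same partition, which is the stability property the abstract attributes to replacing random initialization.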

Keywords: speaker speech clustering; time delay neural network; attention mechanism

Classification code: TP391.9 [Automation and Computer Technology - Computer Application Technology]
