Research on Stuttering Type Detection Based on YOLOv5

Authors: CHENG Zhen; JIA Jia-min; JIANG Zuo [2]; WANG Xin (School of Electrical and Information Technology, Yunnan Minzu University, Kunming 650500, China; School of Mathematics and Computer Science, Yunnan Minzu University, Kunming 650500, China)

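Affiliations: [1] School of Electrical and Information Technology, Yunnan Minzu University, Kunming, Yunnan 650500; [2] School of Mathematics and Computer Science, Yunnan Minzu University, Kunming, Yunnan 650500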

Source: Journal of Yunnan Minzu University: Natural Sciences Edition, 2025, No. 1, pp. 84-92 (9 pages)

Funding: National Natural Science Foundation of China (61866040).

Abstract: The language communication efficiency score is a method for quantifying stuttering severity, and it requires the times at which stuttering occurs. Existing studies, however, can only determine whether a speech segment contains stuttering; they cannot locate where it occurs, which hinders the assessment of severity. To address the inability of current deep-learning stuttering-type detectors to localize their targets visually, this paper first converts speech into spectrograms using the short-time Fourier transform, then labels the stuttering types on the spectrograms, and finally detects the stuttering types with YOLOv5. Within the basic YOLOv5 framework, four models of different depth and width (YOLOv5s, YOLOv5m, YOLOv5l, and YOLOv5x) are tried in order to classify and localize stuttering types. The best-performing model, YOLOv5l, is then improved by introducing an efficient channel attention mechanism and the CIoU bounding-box loss function. Experimental results show that the improved YOLOv5l model has a clearly lower training loss, that accuracy, recall, and mAP_0.5 rise by 1.2, 0.6, and 0.4 percentage points respectively, and that missed detections are reduced relative to the original model.
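The first and last steps of the pipeline described in the abstract (speech to spectrogram, then detected box to a time span) can be illustrated with a small sketch. This is not the authors' code. It is a dependency-free illustration under stated assumptions: the function names `stft_magnitude` and `box_to_time_span`, the frame length and hop, the synthetic test tone, and the 3-second clip duration are all hypothetical, and the abstract does not give the paper's STFT parameters or labeling tool. The box format assumed here is the normalized `x_center, width` convention used by YOLO-style label files, with the spectrogram's horizontal axis taken to span the clip linearly in time.

```python
import cmath
import math

FRAME_LEN = 64   # samples per analysis frame (assumed, not from the paper)
HOP = 32         # samples between frame starts (assumed)


def stft_magnitude(signal, frame_len=FRAME_LEN, hop=HOP):
    """Return a list of frames; each frame holds magnitudes for bins 0..frame_len//2."""
    # A Hann window reduces spectral leakage at the frame edges.
    window = [0.5 - 0.5 * math.cos(2 * math.pi * n / frame_len)
              for n in range(frame_len)]
    n_bins = frame_len // 2 + 1
    frames = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        chunk = [signal[start + n] * window[n] for n in range(frame_len)]
        # Naive DFT kept for readability; an FFT routine would be used in practice.
        spectrum = [
            abs(sum(chunk[n] * cmath.exp(-2j * math.pi * k * n / frame_len)
                    for n in range(frame_len)))
            for k in range(n_bins)
        ]
        frames.append(spectrum)
    return frames


def box_to_time_span(x_center, width, duration_s):
    """Map a normalized box (x_center, width in [0, 1]) to (start, end) in seconds.

    Assumes the image's horizontal axis covers the whole clip linearly.
    """
    start = max(0.0, (x_center - width / 2) * duration_s)
    end = min(duration_s, (x_center + width / 2) * duration_s)
    return start, end


# A synthetic 1 kHz tone sampled at 8 kHz stands in for a speech clip.
sample_rate = 8000
signal = [math.sin(2 * math.pi * 1000 * t / sample_rate) for t in range(1024)]

spec = stft_magnitude(signal)
peak_bin = max(range(len(spec[0])), key=lambda k: spec[0][k])
peak_hz = peak_bin * sample_rate / FRAME_LEN

# A hypothetical detection centered mid-image, 20% of the image wide, in a 3 s clip.
span = box_to_time_span(0.5, 0.2, duration_s=3.0)
print(len(spec), len(spec[0]), peak_hz, span)
```

The horizontal extent of a detected box is what supplies the stuttering onset and offset times that the severity score needs; the vertical extent only indicates the frequency band the labeled event occupies.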

Keywords: YOLOv5; stuttering recognition; spectrogram; object detection

Classification code: TN912.34 [Electronics and Telecommunications: Communication and Information Systems]
