Subthreshold Depression Risk Prediction by Fusing Different Speech Feature Parameters


Authors: HE Wanting; LIN Qinyun; YANG Xudong; YAN Hongli; XU Pan; YANG Zhaoyang [3]; GAO Yueming [1,2,4]

Affiliations: [1] College of Advanced Manufacturing, Fuzhou University, Quanzhou, Fujian 362251, China; [2] International Joint Laboratory of Intelligent Perception of Health Information, Fuzhou University, Fuzhou, Fujian 350108, China; [3] School of Traditional Chinese Medicine, Fujian University of Traditional Chinese Medicine, Fuzhou, Fujian 350122, China; [4] School of Physics and Information Engineering, Fuzhou University, Fuzhou, Fujian 350108, China

Source: Journal of Fudan University: Natural Science, 2024, No. 3, pp. 344-350 (7 pages)

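Funding: National Key R&D Program of China, "Intergovernmental International Science and Technology Innovation Cooperation" key special project (2022YFE0115500).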

Abstract: Voiceprint recognition offers an objective reference for identifying and diagnosing subthreshold depression and for evaluating interventions. This study used four speech tasks (sustained vowel /a/, text reading, picture description, and free interview) and three emotional stimuli (positive, neutral, and negative). Four categories of speech features (prosody, timbre, spectrum, and formants) were fused, and 16 feature parameters were extracted, including Mel-frequency cepstral coefficients, speech rate (音速), fundamental frequency, and formants. A subthreshold depression risk prediction model was built with a random forest classifier and compared with other classifiers. Before feature fusion, picture description and free interview yielded higher recognition rates than the other speech tasks, with positive stimuli giving the best predictions, at accuracies of 72.50% and 67.39%. After feature fusion, the sustained vowel /a/ and free interview reached accuracies of 93.00% and 85.00%, respectively. These results suggest that, after fusion, the speech information learned by the model reflects not only the subject's emotional state but also the relationships among feature types. The sustained vowel /a/ and free interview retain more vocal tract information: the sustained vowel involves prolonged phonation with steady intensity, while the free interview provides a large amount of speech with comprehensive features and is close to natural speech. The findings may serve as a reference for early risk prediction of subthreshold depression.
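As an illustration of the feature-fusion step described in the abstract, the following is a minimal sketch of feature-level fusion by concatenation. It is not the authors' implementation: the group names, the numeric values, and the helper functions `fuse` and `standardize` are all hypothetical, and the abstract does not state how the fusion was actually performed. The fused matrix `X` is the kind of input one would pass to a classifier such as a random forest.

```python
from statistics import mean, pstdev

# Hypothetical per-recording feature groups; all values are made up for illustration.
recordings = [
    {"prosody": [182.0, 4.1], "spectrum": [12.3, -3.4, 0.8], "formant": [710.0, 1220.0]},
    {"prosody": [151.0, 3.2], "spectrum": [10.9, -2.1, 1.5], "formant": [690.0, 1180.0]},
    {"prosody": [205.0, 4.8], "spectrum": [13.0, -4.0, 0.2], "formant": [740.0, 1300.0]},
]
GROUPS = ["prosody", "spectrum", "formant"]  # assumed group names and order


def fuse(recording, groups=GROUPS):
    """Feature-level fusion: concatenate the group vectors in a fixed order."""
    fused = []
    for name in groups:
        fused.extend(recording[name])
    return fused


def standardize(rows):
    """Z-score each column so features on different scales are comparable."""
    columns = list(zip(*rows))
    means = [mean(c) for c in columns]
    stds = [pstdev(c) or 1.0 for c in columns]  # fall back to 1.0 for constant columns
    return [[(x - m) / s for x, m, s in zip(row, means, stds)] for row in rows]


fused_rows = [fuse(r) for r in recordings]  # one fused vector per recording
X = standardize(fused_rows)                 # rows: recordings, columns: fused features
```

Tree-based models do not require standardization, but it can matter when the same fused matrix is also fed to scale-sensitive classifiers in a comparison.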

Keywords: subthreshold depression; speech features; classifier; voiceprint recognition

Classification code: TN912 [Electronics and Telecommunications: Communication and Information Systems]

 
