The Efficacy of Deep Learning-Based Mixed Model for Speech Emotion Recognition  (Cited by: 1)

Authors: Mohammad Amaz Uddin, Mohammad Salah Uddin Chowdury, Mayeen Uddin Khandaker, Nissren Tamam, Abdelmoneim Sulieman

Affiliations: [1] Department of Computer Science and Engineering, BGC Trust University Bangladesh, Chittagong, 4381, Bangladesh [2] Centre for Applied Physics and Radiation Technologies, School of Engineering and Technology, Sunway University, Bandar Sunway, Selangor, 47500, Malaysia [3] Department of Physics, College of Sciences, Princess Nourah bint Abdulrahman University, P.O. Box 84428, Riyadh, 11671, Saudi Arabia [4] Department of Radiology and Medical Imaging, Prince Sattam bin Abdulaziz University, Alkharj, Saudi Arabia

Source: Computers, Materials & Continua (English-language journal), 2023, Issue 1, pp. 1709-1722 (14 pages)

Funding: Princess Nourah bint Abdulrahman University Researchers Supporting Project (Grant No. PNURSP2022R12), Princess Nourah bint Abdulrahman University, Riyadh, Saudi Arabia.

Abstract: Human speech indirectly conveys the speaker's mental state or emotion. Artificial Intelligence (AI)-based techniques may bring about a revolution in the modern era by recognizing emotion from speech. In this study, we introduce a robust method for emotion recognition from human speech that combines an effective preprocessing technique with a deep learning-based mixed model consisting of Long Short-Term Memory (LSTM) and Convolutional Neural Network (CNN) components. About 2800 audio files were taken from the Toronto emotional speech set (TESS) database for this study. A high-pass filter and a Savitzky-Golay filter were used to obtain noise-free, smooth audio data. Seven emotions were considered: Angry, Disgust, Fear, Happy, Neutral, Pleasant-surprise, and Sad. Energy, fundamental frequency, and Mel Frequency Cepstral Coefficients (MFCC) were used as emotion features, and with these features the mixed LSTM+CNN model achieved 97.5% accuracy. The mixed model was found to perform better than the usual state-of-the-art models in emotion recognition from speech. The results also indicate that this mixed model could be used effectively in advanced research dealing with sound processing.
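The preprocessing and energy-feature steps named in the abstract can be illustrated with a minimal, standard-library-only sketch. It is not the authors' implementation: the abstract does not give filter orders, cutoffs, or window sizes, so the first-order pre-emphasis filter used here as the high-pass stage, its coefficient `alpha`, the 5-point quadratic Savitzky-Golay kernel, and the frame and hop lengths are all assumptions chosen for illustration. In practice a library routine such as `scipy.signal.savgol_filter` would likely be used instead of the hand-written kernel.

```python
import math

# Tabulated Savitzky-Golay smoothing kernel for a 5-point window and a
# quadratic fit. Window size and order are assumptions, not from the paper.
SAVGOL_5_2 = (-3 / 35, 12 / 35, 17 / 35, 12 / 35, -3 / 35)


def high_pass(signal, alpha=0.97):
    """First-order high-pass (pre-emphasis): y[n] = x[n] - alpha * x[n-1].

    Stands in for the unspecified high-pass filter; alpha is an assumption.
    """
    if not signal:
        return []
    out = [signal[0]]
    for prev, cur in zip(signal, signal[1:]):
        out.append(cur - alpha * prev)
    return out


def savgol_smooth(signal, kernel=SAVGOL_5_2):
    """Smooth with a Savitzky-Golay kernel, padding edges with the end samples."""
    if not signal:
        return []
    half = len(kernel) // 2
    padded = [signal[0]] * half + list(signal) + [signal[-1]] * half
    return [
        sum(k * padded[i + j] for j, k in enumerate(kernel))
        for i in range(len(signal))
    ]


def short_time_energy(signal, frame_len=400, hop=160):
    """Energy feature: sum of squared samples in each (overlapping) frame."""
    return [
        sum(s * s for s in signal[start:start + frame_len])
        for start in range(0, len(signal) - frame_len + 1, hop)
    ]


if __name__ == "__main__":
    # Synthetic stand-in for a TESS recording: a 220 Hz tone with a DC offset.
    sample_rate = 16000
    tone = [0.5 + math.sin(2 * math.pi * 220 * n / sample_rate)
            for n in range(sample_rate // 10)]
    cleaned = savgol_smooth(high_pass(tone))
    energy = short_time_energy(cleaned)
    print(len(cleaned), "samples,", len(energy), "energy frames")
```

The per-frame energy values would be combined with fundamental-frequency and MFCC features before being passed to a classifier; those stages are omitted here because the abstract does not describe their configuration.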

Keywords: Emotion recognition; Savitzky-Golay; fundamental frequency; MFCC; neural networks

Classification number: TN912.34 [Electronics and Telecommunications: Communication and Information Systems]

 
