Affiliations: [1] College of Information Science and Engineering, Xinjiang University, Urumqi 830046; [2] Department of Physics, Changji University, Changji 831100
Source: Chinese Journal of Acoustics (声学学报, English edition), 2020, No. 1, pp. 117-132 (16 pages)
Funding: Supported by the Xinjiang Uygur Autonomous Region Key Laboratory Project (2015KL013); the National Key Basic Research and Development Program (973 Program) sub-topics (2014CB340506, 213-61590); and the National Natural Science Foundation of China (61433012, U1435215, U1603262).
Abstract: Motivated by the practical needs of speech applications such as speech recognition and voiceprint recognition, this work studies the acoustic characteristics and recognition of the Hotan (Hetian) dialect for the first time. First, Hotan dialect speech was selected and manually annotated at multiple levels, and the formants, duration and intensity of the vowels were analyzed to describe statistically the main patterns of the dialect and the pronunciation characteristics of male and female speakers. Then, analysis of variance and nonparametric tests were applied to formant samples from three dialects of the Uyghur language; the results show significant differences among the three dialects in the formant distribution patterns of male vowels, female vowels and all vowels combined. Finally, GMM-UBM (Gaussian Mixture Model-Universal Background Model), DNN-UBM (Deep Neural Network-Universal Background Model) and LSTM-UBM (Long Short-Term Memory Network-Universal Background Model) Uyghur dialect recognition models were constructed. Using Mel-frequency cepstral coefficients alone and in combination with formant frequencies as input features, comparative experiments on the discriminability of dialect i-vectors were carried out. The experimental results show that adding the formant features improves dialect recognition, and that the LSTM-UBM model extracts more discriminative dialect i-vectors than the GMM-UBM and DNN-UBM models.
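The analysis-of-variance step mentioned in the abstract can be illustrated with a minimal sketch. It computes a one-way ANOVA F statistic over formant samples grouped by dialect. The dialect labels, the F1 values and the function name `one_way_anova_f` are hypothetical and chosen only for illustration; they are not data or code from the paper, and the paper's actual test procedure may differ.

```python
from statistics import mean


def one_way_anova_f(groups):
    """Return the one-way ANOVA F statistic for a list of sample groups."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    # Between-group sum of squares: spread of group means around the grand mean.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Within-group sum of squares: spread of samples around their own group mean.
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    ms_between = ss_between / (k - 1)
    ms_within = ss_within / (n_total - k)
    return ms_between / ms_within


# Hypothetical first-formant (F1) values in Hz for one vowel in three dialects.
# These numbers are made up for illustration and are NOT taken from the paper.
f1_by_dialect = {
    "dialect_a": [700.0, 720.0, 710.0, 690.0],
    "dialect_b": [650.0, 640.0, 660.0, 655.0],
    "dialect_c": [600.0, 610.0, 590.0, 605.0],
}

f_stat = one_way_anova_f(list(f1_by_dialect.values()))
print(f"F = {f_stat:.2f}")
```

A large F value relative to the F distribution with (k - 1, N - k) degrees of freedom suggests that the group means differ. A rank-based test such as Kruskal-Wallis would be one nonparametric counterpart when the normality assumption is doubtful.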