Authors: SHEN Zhijie (沈之杰), GUO Wu (郭武) [1]
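Affiliation: [1] Department of Electronic Engineering and Information Science, University of Science and Technology of China, Hefei 230027, China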
Source: Journal of Data Acquisition and Processing (《数据采集与处理》), 2023, No. 1, pp. 101-110 (10 pages)
Funding: National Natural Science Foundation of China (U1836219).
Abstract: wav2vec 2.0, which is built on unsupervised pre-training, performs well on many low-resource languages and has become a research hotspot. This paper carries out Vietnamese continuous speech recognition on top of the pre-trained model. Phonetic information is introduced into acoustic modeling based on the connectionist temporal classification (CTC) loss function, with phones and position-dependent phones selected as the basic modeling units. To balance the number of modeling units against the granularity of the model, a byte-pair encoding (BPE) algorithm is used to generate phone-based subwords, which brings contextual information into the acoustic modeling process. Experiments are carried out on the low-resource Vietnamese development set of NIST's BABEL task. The proposed method clearly improves on the wav2vec 2.0 baseline system, reducing the word error rate from 37.3% to 29.4%.
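Keywords: low-resource speech recognition; modeling units; byte-pair encoding; phone-based subwords; pre-training; Vietnamese speech recognition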
Classification code: TN912.34 [Electronics and Telecommunications: Communication and Information Systems]
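
Illustration: the abstract says that BPE is applied to phone sequences to produce phone-based subwords, but it gives no details of the procedure. The sketch below is a generic BPE merge loop over phone sequences, not the authors' implementation. The toy corpus, the phone symbols, the `+` joiner, and the function names are all assumptions made for illustration.

```python
from collections import Counter


def count_pairs(corpus):
    """Count adjacent unit pairs across all phone sequences."""
    pairs = Counter()
    for seq in corpus:
        for a, b in zip(seq, seq[1:]):
            pairs[(a, b)] += 1
    return pairs


def merge_pair(seq, pair, joiner="+"):
    """Replace each non-overlapping occurrence of `pair` with one merged unit."""
    merged, i = [], 0
    while i < len(seq):
        if i + 1 < len(seq) and (seq[i], seq[i + 1]) == pair:
            merged.append(seq[i] + joiner + seq[i + 1])
            i += 2
        else:
            merged.append(seq[i])
            i += 1
    return merged


def learn_phone_bpe(corpus, num_merges):
    """Greedily merge the most frequent adjacent pair, `num_merges` times.

    Returns the ordered merge list and the re-segmented corpus.
    """
    corpus = [list(seq) for seq in corpus]
    merges = []
    for _ in range(num_merges):
        pairs = count_pairs(corpus)
        if not pairs:
            break
        # Highest count wins; ties are broken lexicographically so the
        # result is deterministic.
        best = max(pairs, key=lambda p: (pairs[p], p))
        merges.append(best)
        corpus = [merge_pair(seq, best) for seq in corpus]
    return merges, corpus


# Hypothetical toy data: each inner list is the phone sequence of one word.
toy_corpus = [
    ["t", "o", "j"],
    ["t", "o", "j"],
    ["t", "o"],
    ["t", "o"],
    ["m", "o", "j"],
]

merges, segmented = learn_phone_bpe(toy_corpus, num_merges=2)
inventory = sorted({unit for seq in segmented for unit in seq})
print(merges)
print(segmented)
print(inventory)
```

The number of merges controls the trade-off the abstract mentions: more merges give longer, more context-rich units but a larger inventory of CTC output labels.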