Vietnamese Speech Recognition Based on Pre-training and Phone-Based Byte-Pair Encoding  Cited by: 2


Authors: SHEN Zhijie (沈之杰), GUO Wu (郭武) [1] (Department of Electronic Engineering & Information Science, University of Science and Technology of China, Hefei 230027, China)

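Affiliation: [1] Department of Electronic Engineering and Information Science, University of Science and Technology of China, Hefei 230027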

Source: Journal of Data Acquisition and Processing (《数据采集与处理》), 2023, No. 1, pp. 101-110 (10 pages)

Funding: National Natural Science Foundation of China (U1836219).

Abstract: wav2vec 2.0, which is based on unsupervised pre-training, has achieved strong performance on many low-resource languages and has become a research hotspot. In this paper, Vietnamese continuous speech recognition is performed on top of a pre-trained model. Phonetic information is introduced into acoustic modeling based on the connectionist temporal classification (CTC) loss function, with phones and position-dependent phones selected as the basic modeling units. To balance the number of modeling units against the granularity of the model, the byte-pair encoding (BPE) algorithm is used to generate phone-based subwords, which brings contextual information into the acoustic modeling process. Experiments are conducted on the low-resource Vietnamese development set of the US NIST BABEL task. The proposed method clearly improves on the wav2vec 2.0 baseline system, reducing the word error rate from 37.3% to 29.4%.
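To make "phone-based subwords" concrete, here is a minimal sketch of learning BPE merges over phone sequences instead of characters. It assumes a toy lexicon in which each word's pronunciation is a tuple of phone symbols weighted by its count, and merges are learned only within a word. The phone symbols, the `+` joiner for merged units, the Kaldi-style `_B/_I/_E/_S` word-position suffixes, and all function names are illustrative assumptions; the paper's actual phone set, tokenizer, and merge count are not specified here.

```python
from collections import Counter

# Illustrative sketch only: phone symbols, the "+" joiner and the
# Kaldi-style position suffixes are assumptions, not the paper's setup.


def add_position_tags(phones):
    """Mark word position: _B begin, _I inside, _E end, _S single-phone word."""
    if len(phones) == 1:
        return (phones[0] + "_S",)
    last = len(phones) - 1
    return tuple(
        p + ("_B" if i == 0 else "_E" if i == last else "_I")
        for i, p in enumerate(phones)
    )


def count_pairs(corpus):
    """Count adjacent symbol pairs, weighted by how often each sequence occurs."""
    pairs = Counter()
    for seq, freq in corpus.items():
        for a, b in zip(seq, seq[1:]):
            pairs[(a, b)] += freq
    return pairs


def merge_pair(seq, pair):
    """Replace non-overlapping occurrences of `pair` (scanning left to right)."""
    new_symbol = pair[0] + "+" + pair[1]
    out, i = [], 0
    while i < len(seq):
        if i + 1 < len(seq) and (seq[i], seq[i + 1]) == pair:
            out.append(new_symbol)
            i += 2
        else:
            out.append(seq[i])
            i += 1
    return tuple(out)


def learn_phone_bpe(corpus, num_merges):
    """Learn up to `num_merges` merges; return (merge list, re-segmented corpus)."""
    merges = []
    for _ in range(num_merges):
        pairs = count_pairs(corpus)
        if not pairs:
            break  # every sequence is already a single unit
        # Most frequent pair; ties broken deterministically by the pair itself.
        best = max(pairs, key=lambda p: (pairs[p], p))
        merges.append(best)
        merged = Counter()
        for seq, freq in corpus.items():
            merged[merge_pair(seq, best)] += freq
        corpus = merged
    return merges, corpus


def apply_merges(seq, merges):
    """Segment a new phone sequence by replaying the learned merges in order."""
    for pair in merges:
        seq = merge_pair(seq, pair)
    return seq


if __name__ == "__main__":
    # Hypothetical toy lexicon: pronunciation tuple -> word count.
    lexicon = Counter({("t", "a", "n"): 5, ("t", "a", "m"): 3, ("a", "n"): 4})
    merges, segmented = learn_phone_bpe(lexicon, num_merges=2)
    print(merges)
    print(apply_merges(("t", "a", "n"), merges))
    print(add_position_tags(["t", "a", "n"]))
```

The number of merges is the knob the abstract refers to: more merges yield longer units that carry more phonetic context, at the cost of a larger CTC output vocabulary with fewer training examples per unit. Running `add_position_tags` on each pronunciation before `learn_phone_bpe` would give subwords built from position-dependent phones instead of plain phones.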

Keywords: low-resource speech recognition; modeling unit; byte-pair encoding; phone subword; pre-training; Vietnamese speech recognition

CLC number: TN912.34 [Electronics and Telecommunications: Communication and Information Systems]

 
