Authors: XING Mengyang (幸梦阳); MA Yanzhou (马延周); YANG Zheng (杨政) (Strategic Support Force Information Engineering University, Luoyang Campus, Luoyang, Henan 471003)
Affiliation: [1] Strategic Support Force Information Engineering University, Luoyang Campus, Luoyang, Henan 471003
Source: Software (《软件》), 2022, No. 5, pp. 85-87 (3 pages)
Abstract: With the rapid development of neural networks, speech translation research has begun to explore end-to-end approaches. Training a well-performing speech translation model usually requires a speech corpus of sufficient size and quality, and Russian-Chinese speech translation is no exception. Because speech translation research started relatively late, it often faces a shortage of publicly available, high-quality speech corpora, so building such corpora independently to meet the training needs of neural networks is important. After weighing the cost and quality of corpus construction, this paper manually selects 70 hours of Russian-Chinese film and television works from publicly accessible subtitle websites and, through three stages (formulating norms, processing, and manual evaluation), builds a small-scale Russian-Chinese speech corpus. The result demonstrates the feasibility of the method and provides a data foundation for end-to-end speech translation research.
Classification: TP391 [Automation and Computer Technology: Computer Application Technology]