The Algorithm of Text Error-Detecting Based on Improved N-gram Model and Knowledge Bases

Cited by: 9

Authors: Wang Qiong; Kuang Wenzhen [2]; Xu Li [1]

Affiliations: [1] College of Automation and Electrical Engineering, Lanzhou Jiaotong University, Lanzhou 730070, Gansu, China; [2] Gansu Research Center of Automation Engineering Technology for Industry & Transportation, Research Institute, Lanzhou Jiaotong University, Lanzhou 730070, Gansu, China

Source: Computer Applications and Software (《计算机应用与软件》), 2021, No. 10, pp. 310-315, 320 (7 pages in total)

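Funding: Key Project of the Science and Technology Research and Development Program of China Railway Corporation (2016X003-H); 2019 Open Fund Project of the Gansu Research Center of Automation Engineering Technology for Industry & Transportation (GSITA201904).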

Abstract: Text produced by a speech recognition engine is prone to scattered-string errors and homophone errors. To address these error types, an error-detecting algorithm is proposed that combines an improved N-gram model with a professional terminology error-detecting knowledge base. The Witten-Bell smoothing algorithm is used to solve the data sparsity problem in N-gram model training, and weight allocation is added to the N-gram model to raise its detection rate for scattered-string errors. To handle the special phraseology rules of railway operations and homophone errors, a keyword-adapted professional terminology error-detecting knowledge base is constructed, and the knowledge base is updated automatically. In experimental comparison, the algorithm achieves an error-detecting rate of 87.9%, which is 52.8 percentage points higher than the general N-gram error-detecting model. The algorithm lays a foundation for subsequent error correction and for improving speech recognition accuracy, and it is significant for applying speech recognition technology in the railway train operation system.

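Keywords: N-gram model; standard phraseology for railway train operation; scattered-string errors; professional terminology error-detecting knowledge base; homophone errors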

Classification: TP391 [Automation and Computer Technology: Computer Application Technology]
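
The abstract names Witten-Bell smoothing as the remedy for data sparsity in N-gram training. As a rough illustration only, the sketch below shows one common interpolated Witten-Bell formulation for a bigram model and uses it to flag low-probability positions. It is not the authors' implementation: the class `WittenBellBigram`, the helper `flag_suspects`, the add-one smoothing of the unigram fallback, and the threshold value are all assumptions made for this example, and the paper's weight allocation and knowledge base are not modeled.

```python
from collections import Counter, defaultdict


class WittenBellBigram:
    """Interpolated Witten-Bell bigram model (illustrative sketch)."""

    def __init__(self, sentences):
        self.unigrams = Counter()
        # Maps a history token to a Counter of the tokens that follow it.
        self.followers = defaultdict(Counter)
        for tokens in sentences:
            self.unigrams.update(tokens)
            for prev, cur in zip(tokens, tokens[1:]):
                self.followers[prev][cur] += 1
        self.total = sum(self.unigrams.values())
        # One extra slot stands for any token never seen in training.
        self.vocab_size = len(self.unigrams) + 1

    def unigram_prob(self, word):
        # Add-one smoothing for the lower-order model (an assumption here).
        return (self.unigrams[word] + 1) / (self.total + self.vocab_size)

    def prob(self, prev, word):
        nexts = self.followers.get(prev)
        if not nexts:
            # Unseen history: fall back to the unigram estimate.
            return self.unigram_prob(word)
        history_count = sum(nexts.values())  # c(prev)
        distinct = len(nexts)                # T(prev): distinct followers
        # P(w | prev) = (c(prev, w) + T(prev) * P_uni(w)) / (c(prev) + T(prev))
        return (nexts[word] + distinct * self.unigram_prob(word)) / (
            history_count + distinct
        )


def flag_suspects(model, tokens, threshold):
    """Return positions whose bigram probability falls below the threshold."""
    return [
        i
        for i in range(1, len(tokens))
        if model.prob(tokens[i - 1], tokens[i]) < threshold
    ]
```

In use, the model would be trained on tokenized reference sentences and `flag_suspects` called on each recognized sentence. The threshold has to be tuned on held-out data; nothing in the abstract specifies how the authors set theirs.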
