Authors: Wang Qiong (王琼), Kuang Wenzhen (旷文珍) [2], Xu Li (许丽) [1]
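Affiliations: [1] College of Automation and Electrical Engineering, Lanzhou Jiaotong University, Lanzhou 730070, Gansu, China; [2] Gansu Research Center of Automation Engineering Technology for Industry & Transportation, Research Institute, Lanzhou Jiaotong University, Lanzhou 730070, Gansu, China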
Source: Computer Applications and Software (《计算机应用与软件》), 2021, No. 10, pp. 310-315, 320 (7 pages)
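Funding: Key Project of the Science and Technology Research and Development Program of China Railway Corporation (2016X003-H); 2019 Open Fund Project of the Gansu Research Center of Automation Engineering Technology for Industry & Transportation (GSITA201904).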
Abstract: Text produced by a speech recognition engine is prone to scattered-string errors and homophone errors. To detect these errors, an algorithm is proposed that combines an improved N-gram model with a professional-terminology error-detection knowledge base. Witten-Bell smoothing is used to address data sparsity during N-gram training, and a weight allocation is added to the N-gram model to raise its detection rate for scattered-string errors. To handle railway-specific phrasing rules and homophone errors, a keyword-adapted professional-terminology error-detection knowledge base is constructed, and the knowledge base is updated automatically. In experimental comparison, the algorithm reaches an error-detection accuracy of 87.9%, which is 52.8 percentage points higher than a general N-gram error-detection model. The algorithm lays a foundation for subsequent error correction and for improving speech recognition accuracy, and it is significant for applying speech recognition technology in railway train operation systems.
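Keywords: N-gram model; railway train operation standard phraseology; scattered-string errors; professional-terminology error-detection knowledge base; homophone errors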
Classification: TP391 [Automation and Computer Technology / Computer Application Technology]
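
The abstract names Witten-Bell smoothing as the remedy for data sparsity in N-gram training. The following is a minimal sketch of that idea for a bigram model, not the authors' implementation: the class `WittenBellBigram`, the helper `flag_suspects`, the toy English corpus, the add-one unigram fallback, and the threshold value are all assumptions made for illustration. The paper's weight allocation and its terminology knowledge base are not described in this record, so they are not modelled here.

```python
from collections import Counter, defaultdict


class WittenBellBigram:
    """Bigram model with interpolated Witten-Bell smoothing (illustrative)."""

    def __init__(self, sentences):
        self.unigrams = Counter()            # c(w)
        self.bigrams = Counter()             # c(h, w)
        self.history_counts = Counter()      # c(h): times h appears as a history
        self.followers = defaultdict(set)    # distinct words seen after h
        for tokens in sentences:
            padded = ["<s>"] + list(tokens)
            self.unigrams.update(tokens)
            for prev, cur in zip(padded, padded[1:]):
                self.bigrams[(prev, cur)] += 1
                self.history_counts[prev] += 1
                self.followers[prev].add(cur)
        self.total = sum(self.unigrams.values())
        # One extra slot reserves probability mass for an unseen word.
        self.vocab_size = len(self.unigrams) + 1

    def unigram_prob(self, word):
        # Assumption: add-one smoothing as the lower-order distribution.
        return (self.unigrams[word] + 1) / (self.total + self.vocab_size)

    def prob(self, prev, word):
        c_h = self.history_counts[prev]
        t_h = len(self.followers.get(prev, ()))  # T(h): distinct continuations
        if c_h == 0:
            return self.unigram_prob(word)       # unseen history: back off fully
        # Witten-Bell: P(w|h) = (c(h,w) + T(h) * P_uni(w)) / (c(h) + T(h))
        return (self.bigrams[(prev, word)] + t_h * self.unigram_prob(word)) / (c_h + t_h)


def flag_suspects(model, tokens, threshold):
    """Return indices of tokens whose smoothed bigram probability is below threshold."""
    padded = ["<s>"] + list(tokens)
    return [
        i
        for i, (prev, cur) in enumerate(zip(padded, padded[1:]))
        if model.prob(prev, cur) < threshold
    ]


# Toy corpus standing in for segmented dispatcher phrases (hypothetical data).
corpus = [
    ["train", "ready", "depart"],
    ["train", "ready", "depart"],
    ["signal", "clear"],
]
model = WittenBellBigram(corpus)
print(flag_suspects(model, ["train", "clear"], threshold=0.1))  # prints [1]
```

In this sketch, a token is flagged when its probability given the preceding token falls below a fixed threshold. The transition "train" to "clear" never occurs in the toy corpus, so it receives only the smoothed mass and is flagged, while the sentence-initial "train" is not.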