Comparative Study on Smoothing Algorithms for Domain-Specific Chinese Language Models

Times cited: 5

Source: Computer Engineering and Applications (《计算机工程与应用》), 2006, No. 32, pp. 14-16 (3 pages)

Funding: Supported by the National Natural Science Foundation of China (Grant No. 60535030).

Abstract: For speech recognition in a specific domain, building a high-performance language model from a limited corpus is the key to improving system performance. To address this problem, this paper studies domain-specific language models and proposes using high-frequency new words to strengthen the domain characteristics of the model. Two schemes are adopted. The first adds the high-frequency new words directly to the existing dictionary and increases their weight during training, so that the model better captures domain-related features. The second compiles a small domain-related vocabulary from the high-frequency new words. The two schemes are compared, and smoothing strategies suited to Chinese are studied experimentally. The results show that, for domain-specific tasks, the smoothing algorithm has a large effect on model performance: Witten-Bell interpolated smoothing, which suits Chinese, raises the recognition rate to 88.4%, a relative improvement of 18.18% over the general-purpose model.
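The abstract names interpolated Witten-Bell smoothing and a scheme that up-weights high-frequency new words during training, but gives no formulas or implementation details. The following is a minimal sketch under stated assumptions, not the authors' implementation. It assumes a word bigram model over pre-segmented text, with Witten-Bell interpolation P(w|h) = (c(h,w) + T(h)·P(w)) / (c(h) + T(h)), where T(h) is the number of distinct words observed after history h; the unigram level is itself interpolated with a uniform distribution over the vocabulary so unseen words keep some probability. The `new_words` set and `new_word_weight` factor are hypothetical stand-ins for the first scheme: every bigram event that involves a new word is counted with that weight instead of 1.

```python
from collections import defaultdict


class WittenBellBigram:
    """Interpolated Witten-Bell bigram model over pre-segmented words.

    Hypothetical illustration of "scheme one": any bigram event involving a
    word in new_words is counted with new_word_weight instead of 1.
    """

    def __init__(self, sentences, vocab, new_words=(), new_word_weight=1.0):
        self.new_words = set(new_words)
        self.weight = new_word_weight
        self.uni = defaultdict(float)                      # c(w)
        self.bi = defaultdict(lambda: defaultdict(float))  # c(h, w)
        for words in sentences:
            tokens = ["<s>"] + list(words) + ["</s>"]
            for h, w in zip(tokens, tokens[1:]):
                c = self._event_weight(h, w)
                self.uni[w] += c
                self.bi[h][w] += c
        # Predictable vocabulary: given words, end marker, and any seen word.
        self.vocab = set(vocab) | {"</s>"} | set(self.uni)
        self.total = sum(self.uni.values())

    def _event_weight(self, *words):
        # Up-weight the event if it touches one of the domain's new words.
        return self.weight if any(x in self.new_words for x in words) else 1.0

    def p_unigram(self, w):
        # Witten-Bell interpolation of the ML unigram with a uniform distribution.
        types = len(self.uni)
        uniform = 1.0 / len(self.vocab)
        return (self.uni.get(w, 0.0) + types * uniform) / (self.total + types)

    def prob(self, w, h):
        followers = self.bi.get(h)
        if not followers:                # unseen history: use the lower order only
            return self.p_unigram(w)
        c_h = sum(followers.values())    # c(h)
        t_h = len(followers)             # T(h): distinct words seen after h
        return (followers.get(w, 0.0) + t_h * self.p_unigram(w)) / (c_h + t_h)


if __name__ == "__main__":
    corpus = [["speech", "recognition"], ["speech", "model"], ["language", "model"]]
    vocab = {"speech", "recognition", "model", "language", "smoothing"}
    base = WittenBellBigram(corpus, vocab)
    boosted = WittenBellBigram(corpus, vocab, new_words={"recognition"}, new_word_weight=3.0)
    print(base.prob("recognition", "speech"), boosted.prob("recognition", "speech"))
```

The weight factor here is only one plausible reading of "increasing the weight of new words during training"; the paper may apply it differently, for example by duplicating domain sentences. Witten-Bell is convenient in this setting because the interpolation weight for each history depends only on counts already collected, so no held-out data is needed to tune it.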

Keywords: language model; specific domain; speech recognition; smoothing; dictionary

CLC number: TP391.4 [Automation and Computer Technology: Computer Application Technology]
