Comparative Study on Smoothing Algorithms for Domain-Specific Chinese Language Models

Times cited: 5


Authors: Yang Lin [1], Zhang Jianping [1], Yan Yonghong [1]

Affiliation: [1] ThinkIT Speech Laboratory, Institute of Acoustics, Chinese Academy of Sciences, Beijing 100080

Source: Computer Engineering and Applications, 2006, No. 32, pp. 14-16 (3 pages)

Funding: National Natural Science Foundation of China (Grant No. 60535030).

Abstract: For speech recognition in a specific domain, building a high-performance language model from limited corpora is the key to improving system performance. To address this, the paper studies domain-specific language models and proposes using high-frequency new words to strengthen the domain characteristics of the model. Two schemes are adopted. The first adds the high-frequency new words directly to the original dictionary and increases their weight during training, so that the model better captures domain-related features. The second compiles a small domain-related vocabulary from the high-frequency new words. The two schemes are compared experimentally, and smoothing strategies suited to Chinese are also investigated through experiments. The results show that, for domain-specific tasks, the smoothing algorithm has a large effect on model performance: with Witten-Bell interpolated smoothing, which suits Chinese well, the recognition rate reaches 88.4%, a relative improvement of 18.18% over the general-purpose language model.
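The abstract names Witten-Bell interpolated smoothing but this record gives no implementation details. The following is a minimal sketch of the standard interpolated Witten-Bell estimator for a bigram model over pre-segmented Chinese words, assuming a bigram order, a uniform distribution over the vocabulary as the lowest-order floor, and sentence markers `<s>`/`</s>`. The class name `WittenBellBigram` and the toy corpus are hypothetical illustrations, not the authors' code or data. For a history h with count c(h) and T(h) distinct observed successors, it computes P(w|h) = (c(h,w) + T(h)·P_uni(w)) / (c(h) + T(h)).

```python
from collections import Counter, defaultdict


class WittenBellBigram:
    """Hypothetical sketch: interpolated Witten-Bell bigram LM over segmented words."""

    def __init__(self, sentences, vocab):
        self.unigram = Counter()             # counts of predicted tokens (excludes <s>)
        self.bigram = Counter()              # counts of (history, word) pairs
        self.followers = defaultdict(set)    # history -> distinct words seen after it
        self.history_count = Counter()       # history -> number of bigrams it starts
        for sent in sentences:
            tokens = ["<s>"] + list(sent) + ["</s>"]
            self.unigram.update(tokens[1:])
            for h, w in zip(tokens, tokens[1:]):
                self.bigram[(h, w)] += 1
                self.followers[h].add(w)
                self.history_count[h] += 1
        # Assumed vocabulary: the given dictionary plus every observed token,
        # so the distributions below sum to 1 over v_set.
        self.v_set = set(vocab) | set(self.unigram)
        self.total = sum(self.unigram.values())   # N: number of predicted tokens
        self.types = len(self.unigram)            # T: distinct observed types

    def p_unigram(self, w):
        # Witten-Bell interpolation of the ML unigram with a uniform floor
        uniform = 1.0 / len(self.v_set)
        return (self.unigram[w] + self.types * uniform) / (self.total + self.types)

    def p(self, w, h):
        c_h = self.history_count[h]
        t_h = len(self.followers[h])
        if c_h == 0:
            # Unseen history: no bigram evidence, use the lower order only
            return self.p_unigram(w)
        return (self.bigram[(h, w)] + t_h * self.p_unigram(w)) / (c_h + t_h)


# Toy usage with a made-up, already segmented "flight query" corpus
corpus = [["查询", "航班"], ["查询", "天气"], ["预订", "航班"]]
vocab = ["查询", "航班", "天气", "预订", "酒店"]
lm = WittenBellBigram(corpus, vocab)
print(lm.p("航班", "查询"))   # seen bigram
print(lm.p("酒店", "查询"))   # unseen bigram still gets nonzero mass
```

The weight given to the lower order, T(h)/(c(h)+T(h)), grows when a history has been followed by many different words, which is why Witten-Bell tends to behave reasonably on sparse domain corpora; whether this matches the authors' exact configuration is not stated in the record.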

Keywords: language model; specific domain; speech recognition; smoothing; dictionary

CLC number: TP391.4 [Automation and Computer Technology: Computer Application Technology]
