Language Model Adaptation Based on the Classification of a Trigram's Language Style Feature (基于trigram语体特征分类的语言模型自适应方法)  Cited by: 6

Authors: Liang Qi (梁奇)[1], Zheng Fang (郑方)[1], Xu Mingxing (徐明星)[1], Wu Wenhu (吴文虎)[1]

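Affiliation: [1] Center for Speech Technology, State Key Laboratory of Intelligent Technology and Systems, Department of Computer Science and Technology, Tsinghua University, Beijing 100084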

Source: Journal of Chinese Information Processing (《中文信息学报》), 2006, No. 4, pp. 68-74 (7 pages)

Abstract: This paper proposes a language-style adaptation method for language models, motivated by the differences between written and spoken language. Adaptation is performed with several interpolation algorithms that operate on trigram counts. An interpolation algorithm that takes Katz smoothing into account assigns weights according to the confidence of each trigram. An adaptation algorithm based on the classification of a trigram's style feature assigns weights dynamically according to the trigram's language-style tendency, and several weight-generation functions are proposed for it. Pinyin-to-character conversion experiments on spoken Chinese corpora show that each of these algorithms improves on the baseline models to a varying degree. The algorithm that considers both a trigram's confidence and its style tendency performs best, with relative Chinese character error rate reductions of 50.2% and 23.7% against the paper's two baselines, respectively.
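The idea of count-level interpolation with style-dependent weights can be illustrated with a minimal Python sketch. The abstract does not give the paper's formulas, so everything here is an assumption made for illustration: the names `style_tendency`, `linear_weight`, and `adapt_counts` are hypothetical, the tendency is taken to be the spoken corpus's share of a trigram's relative frequency, and the linear weight function with bounds 0.2 and 0.8 is a stand-in for the weight-generation functions the paper proposes. The Katz-smoothing confidence term is not modelled.

```python
from collections import Counter


def style_tendency(tri, spoken, written, n_spoken, n_written):
    """Assumed style tendency: the spoken corpus's share of the trigram's
    relative frequency, from 0.0 (written only) to 1.0 (spoken only)."""
    f_s = spoken[tri] / n_spoken
    f_w = written[tri] / n_written
    total = f_s + f_w
    return 0.5 if total == 0 else f_s / total


def linear_weight(tendency, lo=0.2, hi=0.8):
    """Hypothetical weight-generation function: maps a tendency in [0, 1]
    linearly onto a spoken-corpus weight in [lo, hi]."""
    return lo + (hi - lo) * tendency


def adapt_counts(spoken, written, weight_fn=linear_weight):
    """Interpolate trigram counts from two non-empty corpora, choosing the
    interpolation weight separately for every trigram."""
    n_s = sum(spoken.values())
    n_w = sum(written.values())
    adapted = {}
    for tri in set(spoken) | set(written):
        lam = weight_fn(style_tendency(tri, spoken, written, n_s, n_w))
        # Counter returns 0 for trigrams missing from one corpus.
        adapted[tri] = lam * spoken[tri] + (1 - lam) * written[tri]
    return adapted


# Toy counts, invented purely for the example.
spoken = Counter({("我", "想", "去"): 3, ("a", "b", "c"): 1})
written = Counter({("a", "b", "c"): 4, ("x", "y", "z"): 4})
adapted = adapt_counts(spoken, written)
```

In this sketch a trigram seen only in the spoken corpus receives the largest spoken weight, while one seen in both corpora receives a weight between the two bounds. The adapted counts would then be passed to an ordinary smoothing and estimation step.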

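Keywords: computer application; Chinese information processing; statistical language model; trigram; adaptation; language style; interpolation algorithm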

Classification code: TP391 [Automation and Computer Technology: Computer Application Technology]

 
