Survey on N-gram Model  (Cited by: 22)


Authors: YIN Chen; WU Min [1] (School of Software Engineering, University of Science and Technology of China, Hefei 230051, China)

Affiliation: [1] School of Software Engineering, University of Science and Technology of China, Hefei 230051, China

Source: Computer Systems & Applications, 2018, No. 10, pp. 33-38 (6 pages)

Abstract: The N-gram model is one of the most commonly used language models in natural language processing and is widely applied in tasks such as speech recognition, handwriting recognition, spelling correction, machine translation, and search engines. However, the N-gram model often suffers from the zero-probability problem during training and application, which prevents it from yielding a good language model. To address this, smoothing methods such as Laplace smoothing, Katz back-off, and Kneser-Ney smoothing have been proposed. After introducing the basic principles of these smoothing methods, we use perplexity as a metric to compare the language models trained with each of them.
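To make the abstract's ideas concrete, the following is a minimal illustrative Python sketch (not the paper's implementation) of a bigram model with Laplace (add-one) smoothing, evaluated by perplexity on held-out text; the toy corpus, sentences, and function names are assumptions made purely for illustration.

# Minimal sketch (illustrative only, not the paper's code): bigram model with
# Laplace (add-one) smoothing, scored by perplexity on a toy held-out set.
from collections import Counter
import math

def train_bigram_laplace(sentences):
    # Count history unigrams and bigrams over sentences padded with <s> and </s>.
    unigrams, bigrams, vocab = Counter(), Counter(), set()
    for sent in sentences:
        tokens = ["<s>"] + sent + ["</s>"]
        vocab.update(tokens)
        unigrams.update(tokens[:-1])                   # context (history) counts
        bigrams.update(zip(tokens[:-1], tokens[1:]))   # adjacent word pairs
    return unigrams, bigrams, vocab

def bigram_prob(w_prev, w, unigrams, bigrams, vocab_size):
    # Laplace smoothing: P(w | w_prev) = (c(w_prev, w) + 1) / (c(w_prev) + |V|),
    # so unseen bigrams receive a small non-zero probability.
    return (bigrams[(w_prev, w)] + 1) / (unigrams[w_prev] + vocab_size)

def perplexity(sentences, unigrams, bigrams, vocab_size):
    # Perplexity = exp(-(1/N) * sum of log P(w_i | w_{i-1})) over all test bigrams.
    log_prob_sum, n = 0.0, 0
    for sent in sentences:
        tokens = ["<s>"] + sent + ["</s>"]
        for w_prev, w in zip(tokens[:-1], tokens[1:]):
            log_prob_sum += math.log(bigram_prob(w_prev, w, unigrams, bigrams, vocab_size))
            n += 1
    return math.exp(-log_prob_sum / n)

# Toy usage: train on two sentences, score a third (all data is illustrative).
train = [["the", "cat", "sat"], ["the", "dog", "sat"]]
test = [["the", "cat", "sat"]]
unigrams, bigrams, vocab = train_bigram_laplace(train)
print(perplexity(test, unigrams, bigrams, len(vocab)))

The same evaluation loop could be reused to compare Katz back-off or Kneser-Ney estimators, since only bigram_prob would change: a lower perplexity on held-out text indicates a better language model, which is the comparison the paper carries out.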

Keywords: N-gram model; Laplace smoothing; Katz back-off; Kneser-Ney smoothing; perplexity

CLC Classification: TP391.1 [Automation and Computer Technology - Computer Application Technology]

 
