Kazakh Organization Name Recognition Based on the N-gram Language Model  Cited by: 2

Altenbek, Mayra Hapar. Kazakh organization name recognition based on N-gram model


Authors: Feng Jinghua[1], Gulila Altenbek, Mayra Hapar

Affiliation: [1] School of Information Science and Engineering, Xinjiang University, Urumqi 830046

Source: Computer Engineering and Applications, 2010, No. 31, pp. 135-138 (4 pages)

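Funding: National Natural Science Foundation of China (No. 60763005); Ministry of Education of China; State Language Commission research project on the standardization and informatization of ethnic minority languages and scripts (No. MZ115-92)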

Abstract: Based on how organization names are composed in Kazakh text, this paper proposes a method that uses an N-gram language model to compute the confidence of Kazakh organization name candidates, and builds a Kazakh organization name recognition system that uses the tail words of organization names as trigger words. The system consists of a training module and a recognition module. First, features are extracted from the training corpus and trained to obtain a feature model; then this trained model, together with a small set of additional rules, is used to recognize organization names in the test text. Experimental results show that the method is feasible.
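The abstract does not give the confidence formula or the feature set. The sketch below illustrates the general idea under stated assumptions: a bigram model with add-one (Laplace) smoothing is trained on token sequences of known organization names, and a candidate's confidence is taken to be its length-normalized log probability, including start and end markers. Recognition is triggered by tail words: for each trigger, spans extending up to `max_len` tokens to the left are scored and the best one above `threshold` is kept. The names `BigramNameModel`, `recognize`, `max_len`, `threshold`, and the lowercase English toy tokens (stand-ins for Kazakh words) are hypothetical. The paper's additional rules are not reproduced.

```python
import math
from collections import Counter

BOS, EOS = "<s>", "</s>"

# Hypothetical toy training data: English glosses standing in for Kazakh tokens.
NAMES = [
    ["xinjiang", "university"],
    ["xinjiang", "normal", "university"],
    ["urumqi", "bank"],
    ["kazakh", "language", "committee"],
]
# Hypothetical trigger set: tail words of organization names.
TRIGGERS = {"university", "bank", "committee"}


class BigramNameModel:
    """Add-one smoothed bigram model over organization-name token sequences (assumed setup)."""

    def __init__(self, names):
        self.history = Counter()   # c(prev): how often a token precedes another
        self.bigrams = Counter()   # c(prev, word)
        vocab = set()
        for name in names:
            seq = [BOS] + list(name) + [EOS]
            vocab.update(seq)
            self.history.update(seq[:-1])
            self.bigrams.update(zip(seq[:-1], seq[1:]))
        self.vocab_size = len(vocab) + 1  # +1 reserves probability mass for unseen tokens

    def log_prob(self, prev, word):
        # Laplace smoothing: (c(prev, word) + 1) / (c(prev) + V)
        return math.log((self.bigrams[(prev, word)] + 1)
                        / (self.history[prev] + self.vocab_size))

    def confidence(self, tokens):
        """Average log probability per transition of tokens read as a complete name."""
        seq = [BOS] + list(tokens) + [EOS]
        total = sum(self.log_prob(p, w) for p, w in zip(seq[:-1], seq[1:]))
        return total / (len(seq) - 1)


def recognize(tokens, model, triggers, max_len=4, threshold=-2.5):
    """Return (start, end) spans (end exclusive) that end at a trigger word."""
    spans = []
    for i, tok in enumerate(tokens):
        if tok not in triggers:
            continue
        best, best_score = None, threshold
        # Try every left boundary within max_len tokens of the trigger.
        for start in range(max(0, i - max_len + 1), i + 1):
            score = model.confidence(tokens[start:i + 1])
            if score > best_score:
                best, best_score = (start, i + 1), score
        if best is not None:
            spans.append(best)
    return spans


if __name__ == "__main__":
    model = BigramNameModel(NAMES)
    sentence = "he studied at xinjiang university last year".split()
    for s, e in recognize(sentence, model, TRIGGERS):
        print(" ".join(sentence[s:e]))  # prints "xinjiang university"
```

Normalizing by the number of transitions keeps longer candidates from being penalized simply for having more factors, while the start and end markers let the model reward spans that begin and end where training names do. With the toy data above, "xinjiang university" scores higher than both "university" alone and "at xinjiang university".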

Keywords: N-gram language model; Kazakh organization name recognition; named entity recognition

CLC number: TP391 [Automation and Computer Technology: Computer Application Technology]
