中文分词交叉型歧义消解算法被引量：2

Resolution Algorithm of Cross Ambiguity in Chinese Word Segmentation

作　　者：甘蓉[1] GAN Rong(School of Automotive Engineering,Shanxi Polytechnic Institute,Xianyang 712000 China)

机构地区：[1]陕西工业职业技术学院汽车工程学院,陕西咸阳712000

出　　处：《西华大学学报（自然科学版）》2018年第6期32-36,共5页Journal of Xihua University:Natural Science Edition

摘　　要：中文分词是自然语言处理的基础。交叉型歧义是提高中文分词精度的瓶颈之一。文章提出一种基于正向、负向最大匹配算法和passive aggressive(PA)算法结合的交叉型歧义消解算法。基于PA算法训练分词模型;利用正向、负向最大匹配算法检测交叉型歧义的位置;把可能出现交叉型歧义的句子或者句子的部分传递给分词模型,解码得到分词结果;最后,把正向、负向最大匹配结果和分词模型解码结果拼接成最终的分词结果。利用PA算法基于2014年2—12月份人民日报数据训练分词模型、2014年1月份人民日报数据作为测试语料进行实验,得到交叉型歧义的准确率、召回率和F-score分别为98. 32%、98. 14%和98. 23%,说明该方法有效可行。Chinese word segmentation is the foundation of natural language processing, and cross ambiguity is one of the bottlenecks to improve the accuracy of Chinese word segmentation. This paper proposes a method combining max- imunl matching algorithm and passive aggressive （ PA ） algorithm to eliminate cross ambiguity. Firstly, segmentation model was trained based on PA. Secondly, we checked the position of cross ambiguity based on forward maxinmnl matching algorithm and negative maximum matching algorithm. Thirdly, the position of cross ambiguity and the context were submitted to the segmentation model, and they were decoded. Lastly, the final result was obtained. The experi- ment results on Renmin Daily 2014 show flint the precision, recall and F - score of cross ambiguity are 98.32% ,98. 14% and 98.23% respectively.

关键词：中文分词交叉型歧义最大匹配算法 PA算法

分类号：TP391.1[自动化与计算机技术—计算机应用技术]

参考文献：

正在载入数据...

二级参考文献：

正在载入数据...

耦合文献：

正在载入数据...

引证文献：

正在载入数据...

二级引证文献：

正在载入数据...

同被引文献：

正在载入数据...

高级检索 检索式检索

时间限定

期刊范围

学科限定全选

高级检索 检索式检索

时间限定

期刊范围

学科限定全选

中文分词交叉型歧义消解算法被引量：2

我的收藏

参考文献：

二级参考文献：

耦合文献：

引证文献：

二级引证文献：

同被引文献：

相关期刊文献：

相关的主题

相关的作者对象

相关的机构对象

下载全文

高级检索检索式检索

时间限定

期刊范围

学科限定全选

高级检索 检索式检索

时间限定

期刊范围

学科限定全选

中文分词交叉型歧义消解算法 被引量：2

我的收藏

参考文献：

二级参考文献：

耦合文献：

引证文献：

二级引证文献：

同被引文献：

相关期刊文献：

相关的主题

相关的作者对象

相关的机构对象

下载全文

用户登录

高级检索检索式检索

中文分词交叉型歧义消解算法被引量：2