Chinese compound-word recognition and word segmentation modification

Cited by: 4

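Affiliations: [1] School of Computer Science and Engineering, South China University of Technology, Guangzhou 510640, China; [2] School of Computer Science, Wuyi University, Jiangmen, Guangdong 529020, China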

Source: Application Research of Computers (《计算机应用研究》), 2011, No. 8, pp. 2905-2908 (4 pages)

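Funding: Natural Science Foundation of Guangdong Province (9451064101003233); Science and Technology Planning Project of Guangdong Province (2010B010600039); Fundamental Research Funds for the Central Universities, South China University of Technology (2009ZM0125; 2009ZM0189; 2009ZM0255)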

Abstract: This paper proposes a method for Chinese compound-word recognition and word segmentation modification. The method first extracts word strings from a text by part-of-speech detection, then builds a word co-occurrence directed graph from the extracted strings. Borrowing the idea of the Bellman-Ford algorithm, it runs a recognition algorithm on this graph: starting from multiple source nodes, it searches for the longest paths whose weight values satisfy a given condition, and the word string corresponding to each such path is taken to be a compound word. Finally, head-feature percolation is used to assign a part-of-speech tag to each compound word, and the word segmentation result is corrected at the same time. Experimental results show that compound-word recognition reaches a precision of 91.60% (the English abstract on the source page gives 91.16%) and that the segmentation modification performs well.

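Keywords: compound word; word co-occurrence directed graph; part-of-speech tagging; word segmentation modification; natural language processing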

Classification: TP301 [Automation and Computer Technology: Computer System Architecture]
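
The graph-based recognition step described in the abstract can be illustrated with a small sketch. It is not the authors' algorithm: the abstract does not specify how edge weights are computed or what condition a path must satisfy, so this sketch assumes that an edge weight is the raw count of adjacent co-occurrences and that every edge on a path must reach a threshold. The names `build_cooccurrence_graph`, `find_compounds`, `min_weight`, and `max_len` are hypothetical, and the input word strings are assumed to have already been extracted by part-of-speech detection.

```python
from collections import defaultdict


def build_cooccurrence_graph(word_strings):
    """Count how often word v directly follows word u (assumed weight definition)."""
    weights = defaultdict(int)
    for words in word_strings:
        for u, v in zip(words, words[1:]):
            weights[(u, v)] += 1
    return dict(weights)


def find_compounds(weights, min_weight=2, max_len=4):
    """Bellman-Ford-style rounds of edge relaxation that lengthen paths.

    Every word is a source. best[v] holds the longest path found so far
    that ends at v; only edges with weight >= min_weight are followed
    (an assumed stand-in for the paper's weight condition).
    """
    nodes = {w for edge in weights for w in edge}
    best = {w: (w,) for w in nodes}
    strong = [edge for edge, count in weights.items() if count >= min_weight]

    for _ in range(max_len - 1):
        prev = dict(best)  # relax against the previous round only
        changed = False
        for u, v in strong:
            candidate = prev[u] + (v,)
            # Skip cycles; keep the candidate only if it is strictly longer.
            if v not in prev[u] and len(candidate) > len(best[v]):
                best[v] = candidate
                changed = True
        if not changed:
            break

    paths = {p for p in best.values() if len(p) >= 2}

    def inside(p, q):
        """True if p is a shorter contiguous sub-path of q."""
        return len(p) < len(q) and any(
            q[i:i + len(p)] == p for i in range(len(q) - len(p) + 1)
        )

    # Keep only maximal paths and join their words into candidate compounds.
    maximal = [p for p in paths if not any(inside(p, q) for q in paths)]
    return sorted("".join(p) for p in maximal)


sample = [
    ["自然", "语言", "处理"],
    ["自然", "语言", "处理"],
    ["语言", "处理"],
    ["机器", "学习"],
    ["机器", "学习"],
    ["学习", "方法"],
]
graph = build_cooccurrence_graph(sample)
print(find_compounds(graph))
```

On this toy input the edge 学习→方法 occurs only once and falls below the threshold, so 机器学习 and 自然语言处理 are the only candidates returned. A real system would also need the part-of-speech tagging and segmentation-correction steps that the abstract mentions, which this sketch omits.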
