Chinese Word Segmentation Based on the Alternative Covering Algorithm  Cited by: 4


Authors: Liu Zhengyi (刘政怡)[1,2], Wu Jianguo (吴建国)[1,2], Li Wei (李炜)[1,2]

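Affiliations: [1] School of Computer Science and Technology, Anhui University, Hefei, Anhui 230039, China; [2] Key Laboratory of Intelligent Computing and Signal Processing, Ministry of Education, Anhui University, Hefei, Anhui 230039, China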

Source: Computer Engineering and Design (《计算机工程与设计》), 2010, No. 6, pp. 1355-1357, 1361 (4 pages)

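Funding: National Natural Science Foundation of China (60773114); Key Scientific Research Fund of the Anhui Provincial Department of Education (2006kj013A); Anhui University Talent Team Construction Fund (02203105)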

Abstract: Chinese word segmentation is a prerequisite and foundation for natural language processing. This paper implements it with the alternative covering algorithm, which performs well on classification tasks. Segmentation is treated as a character classification process: each character is placed in the context of its two neighboring characters before and after (four surrounding characters in total), and its category is judged in that context. Every character belongs to one of four categories: it stands alone, it combines with the preceding character, it combines with the following character, or it combines with both the preceding and following characters. The method is trained on the annotated People's Daily corpus and needs no dictionary. It handles overlapping ambiguity in Chinese word segmentation well and achieves a segmentation accuracy of 90.6%.

Keywords: Chinese word segmentation; covering; alternative covering algorithm; mutual information; overlapping ambiguity

Classification: TP39 [Automation and Computer Technology: Computer Application Technology]
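
The four-category view of segmentation described in the abstract can be illustrated with a small sketch. It only shows how a segmented sentence maps to per-character categories, how categories map back to words, and how a context window around a character might be extracted. The label names (S, B, E, M), the function names, the padding symbol, and the window width of two characters per side are assumptions made for illustration; the classifier itself (the alternative covering algorithm) and the paper's actual feature encoding are not reproduced here.

```python
from typing import List

# Hypothetical label names for the four categories named in the abstract.
SINGLE = "S"  # the character stands alone as a word
BEGIN = "B"   # the character combines with the following character only
END = "E"     # the character combines with the preceding character only
MIDDLE = "M"  # the character combines with both neighbors


def words_to_labels(words: List[str]) -> List[str]:
    """Turn a segmented sentence into one category label per character."""
    labels: List[str] = []
    for word in words:
        if len(word) == 1:
            labels.append(SINGLE)
        else:
            labels.append(BEGIN)
            labels.extend([MIDDLE] * (len(word) - 2))
            labels.append(END)
    return labels


def labels_to_words(chars: str, labels: List[str]) -> List[str]:
    """Rebuild words from per-character labels (the decoding step)."""
    if len(chars) != len(labels):
        raise ValueError("chars and labels must have the same length")
    words: List[str] = []
    current = ""
    for ch, label in zip(chars, labels):
        current += ch
        # A word ends at a standalone character or at a word-final character.
        if label in (SINGLE, END):
            words.append(current)
            current = ""
    if current:  # tolerate a label sequence that ends mid-word
        words.append(current)
    return words


def context_window(chars: str, i: int, width: int = 2, pad: str = "#") -> str:
    """Return character i with `width` neighbors on each side (assumed width)."""
    padded = pad * width + chars + pad * width
    return padded[i : i + 2 * width + 1]


if __name__ == "__main__":
    segmented = ["中文", "分词", "是", "基础"]
    sentence = "".join(segmented)
    labels = words_to_labels(segmented)
    print(labels)
    print(labels_to_words(sentence, labels))
    print(context_window(sentence, 0))
```

In a full system, a classifier trained on an annotated corpus would predict the label of each character from its context window, and `labels_to_words` would then produce the segmentation.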

 
