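Affiliations: [1] School of Computer Science and Technology, Anhui University, Hefei, Anhui 230039; [2] Key Laboratory of Intelligent Computing and Signal Processing, Ministry of Education, Anhui University, Hefei, Anhui 230039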
Source: Computer Engineering and Design (《计算机工程与设计》), 2010, No. 6, pp. 1355-1357, 1361 (4 pages)
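Funding: National Natural Science Foundation of China (60773114); Key Scientific Research Fund of the Anhui Provincial Department of Education (2006kj013A); Anhui University Talent Team Construction Fund (02203105)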
Abstract: Chinese word segmentation is a prerequisite and foundation for natural language processing. This work implements it with the alternative covering algorithm (交叉覆盖算法), which has good classification performance. Segmentation is treated as a character classification process: each character is placed in the context of the two adjacent characters before it and the two after it, and is assigned to one of four categories: standing alone, joining with the preceding character, joining with the following character, or joining with both the preceding and following characters. The method is trained on the annotated People's Daily corpus and does not need a dictionary. It handles overlapping ambiguity in Chinese word segmentation well and reaches a segmentation accuracy of 90.6%.
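The four-category character labeling described in the abstract can be illustrated with a minimal Python sketch. It covers only how a segmented sentence maps to per-character labels, how a five-character context window might be built, and how labels map back to words; it does not implement the alternative covering classifier itself. The label names, the padding symbol `#`, and all function names are assumptions made for illustration and do not come from the paper.

```python
from typing import List, Tuple

# Hypothetical names for the four categories listed in the abstract.
ALONE, PREV, NEXT, BOTH = "alone", "prev", "next", "both"
PAD = "#"  # assumed padding symbol for positions outside the sentence


def label_words(words: List[str]) -> List[Tuple[str, str]]:
    """Turn a segmented sentence into (character, category) pairs."""
    pairs = []
    for word in words:
        n = len(word)
        for i, ch in enumerate(word):
            if n == 1:
                label = ALONE   # the character is a word by itself
            elif i == 0:
                label = NEXT    # joins with the following character
            elif i == n - 1:
                label = PREV    # joins with the preceding character
            else:
                label = BOTH    # joins with both neighbours
            pairs.append((ch, label))
    return pairs


def context_window(chars: str, i: int, width: int = 2) -> str:
    """Return the character at position i with `width` neighbours on each side."""
    padded = PAD * width + chars + PAD * width
    return padded[i : i + 2 * width + 1]


def decode(chars: str, labels: List[str]) -> List[str]:
    """Rebuild words from per-character categories."""
    words, current = [], ""
    for ch, label in zip(chars, labels):
        current += ch
        if label in (ALONE, PREV):  # these two categories end a word
            words.append(current)
            current = ""
    if current:  # flush a trailing unfinished word
        words.append(current)
    return words
```

Under these assumptions, a classifier would take `context_window(sentence, i)` as the input for character `i` and predict one of the four labels; `decode` then turns the predicted labels into a segmentation.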
Classification No.: TP39 [Automation and Computer Technology: Computer Application Technology]