Source: Computer Engineering and Design (《计算机工程与设计》), 2013, No. 2, pp. 654-659 (6 pages)
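Funding: National Natural Science Foundation of China (60970083); Open Project Fund of the National Laboratory of Pattern Recognition; Henan Province Science and Technology Innovation Talents Outstanding Youth Fund (104100510026)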
Abstract: Multi-category words (words that can take more than one part of speech) are one of the key problems in Chinese part-of-speech tagging. This work studies category recognition for commonly used multi-category words, taking into account the different features that affect recognition. Three statistical methods are applied: conditional random fields, maximum entropy, and k-nearest neighbors. Based on the characteristics of each multi-category word and its relations within the surrounding sentence, a different feature template is used for each method, built from word information, part-of-speech information, and similar features, to extract features from the training corpus; these methods achieve good experimental results. For words whose recognition results remain unsatisfactory, a rule-based method is also tried: rules for the multi-category words are constructed, and the rule base is refined through repeated testing. Under the same conditions, this yields better experimental results than the statistical methods.
Keywords: Chinese information processing; multi-category words; conditional random fields; maximum entropy; k-nearest neighbors
Classification number: TP391 [Automation and Computer Technology: Computer Application Technology]