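Affiliation: [1] School of Computer Science and Technology, Dalian University of Technology
Source: Journal of Computer Research and Development (《计算机研究与发展》), 2010, No. 5, pp. 962-968 (7 pages)
Funding: National High Technology Research and Development Program of China (863 Program) (2006AA012140)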
Abstract: A subword-based dual-layer CRFs (conditional random fields) method for Chinese word segmentation is proposed, which aims to resolve segmentation ambiguity and to recognize unknown (out-of-vocabulary) words. Previous work on CRFs reported that subword-based tagging outperforms character-based tagging in all comparative experiments. However, subword-based tagging often produces errors in which a subword crosses a word boundary. The proposed method builds on subword-based sequence labeling, with the subwords selected by a subword filtering algorithm. Learning is divided into two parts: one trains the first-layer subword tagging CRF with character-based tagging, and the other trains the second-layer word tagging CRF with subword-based tagging. When labeling a sequence, the first layer uses the character-based subword tagging CRFs model to recognize the subwords in the test corpus, which reduces the errors caused by labels that span word boundaries and raises the precision of subword recognition; the second layer applies subword-based sequence labeling to the output of the first layer to produce the final segmentation. The method is evaluated on the Simplified Chinese test data from SIGHAN Bakeoff 2006 and achieves F-scores of 93.3% and 96.1% on the UPUC and MSRA corpora, respectively. The experimental results show that the subword-based dual-layer CRFs model makes more effective use of subwords to improve segmentation accuracy and can achieve state-of-the-art performance on Chinese word segmentation.
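The last two steps summarized in the abstract, turning second-layer tags over subwords into words and scoring the result with a word-level F-score, can be illustrated with a minimal Python sketch. It does not train or run a CRF. The B/I/S tag set, the function names, and the example sentence are assumptions made for illustration; the record does not state which tag set or scoring script the authors used.

```python
from typing import List, Set, Tuple


def decode_words(units: List[str], tags: List[str]) -> List[str]:
    """Join tagged units (characters or subwords) into words.

    Assumed tag set (not taken from the paper):
    B = begins a multi-unit word, I = continues it, S = single-unit word.
    """
    if len(units) != len(tags):
        raise ValueError("units and tags must have the same length")
    words: List[str] = []
    for unit, tag in zip(units, tags):
        if tag == "I" and words:
            words[-1] += unit   # extend the word currently being built
        else:
            words.append(unit)  # B, S, or a stray leading I starts a new word
    return words


def word_spans(words: List[str]) -> Set[Tuple[int, int]]:
    """Represent a segmentation as a set of (start, end) character offsets."""
    spans: Set[Tuple[int, int]] = set()
    start = 0
    for word in words:
        spans.add((start, start + len(word)))
        start += len(word)
    return spans


def f_score(gold: List[str], predicted: List[str]) -> float:
    """Word-level F-score: harmonic mean of precision and recall."""
    gold_spans, pred_spans = word_spans(gold), word_spans(predicted)
    correct = len(gold_spans & pred_spans)
    if correct == 0:
        return 0.0
    precision = correct / len(pred_spans)
    recall = correct / len(gold_spans)
    return 2 * precision * recall / (precision + recall)


# Hypothetical first-layer output (subwords) and second-layer tags.
subwords = ["我们", "喜欢", "自然", "语言", "处理"]
tags = ["S", "S", "B", "I", "I"]
predicted = decode_words(subwords, tags)

# Hypothetical gold segmentation of the same sentence.
gold = ["我们", "喜欢", "自然语言", "处理"]
print(predicted, round(f_score(gold, predicted), 4))
```

Comparing character-offset spans instead of word strings means a predicted word counts as correct only when both of its boundaries match the gold segmentation, which is the usual convention for segmentation scoring.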
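Keywords: Chinese word segmentation; conditional random fields; dual-layer conditional random fields; subword; subword filtering
Classification code: TP391 [Automation and Computer Technology: Computer Application Technology]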