Research on a Chinese Part-of-Speech Tagging Method Based on Dual-Layer Conditional Random Fields  Cited by: 2

The Quantitative Analysis of the Context Effective Range in Chinese Word Segmentation Based on Word Boundary Tagging


Authors: WANG Yi-fan (王艺帆)[1], WANG Xi-jie (王希杰)[2] (School of Electronic Information and Communications, Huazhong University of Science and Technology, Wuhan 430000, China; School of Computer and Information Engineering, Anyang Normal University, Anyang 455002, China)

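Affiliations: [1] School of Electronic Information and Communications, Huazhong University of Science and Technology, Wuhan, Hubei 430000; [2] School of Computer and Information Engineering, Anyang Normal University, Anyang, Henan 455000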

Source: Journal of Anyang Normal University (《安阳师范学院学报》), 2016, No. 5, pp. 87-91 (5 pages)

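Funding: National Natural Science Foundation of China (60663004); Henan Province Young Backbone Teachers Program for Higher Education Institutions (2009GGJS-108)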

Abstract: Chinese part-of-speech (POS) tagging uses fine-grained tag sets with many categories. To address this, the paper proposes a tagging method based on dual-layer conditional random fields (CRFs). Tagging is split into two stages, each modeled by a single-layer CRF. In the first stage, the lower-layer CRF uses context to produce a coarse-grained POS tag for each word. In the second stage, the upper-layer CRF takes each word together with its coarse-grained tag as contextual features and refines the result into the final fine-grained POS tag. Using the CRF++ 0.53 toolkit, closed evaluations were run on the NCC and CTB corpora from Bakeoff-2007 (an international Chinese word segmentation evaluation), along with comparative experiments on different feature templates. The results indicate that the method is feasible and achieves good tagging results.
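The two-stage procedure in the abstract can be illustrated with a small data-preparation sketch. It is not the authors' implementation: the coarse-tag mapping (first character of the fine tag), the sample tags, the helper names, and the feature template are all assumptions made for illustration, since the abstract does not give them. Only the tab-separated column layout and the `%x[row,col]` template macros follow CRF++ conventions.

```python
# Hypothetical sketch of preparing data for a two-layer CRF tagger.
# Nothing here is taken from the paper beyond the two-stage idea.

def coarse_tag(fine_tag):
    """Assumed coarse mapping: keep only the first character of the fine tag."""
    return fine_tag[:1]

def stage1_rows(sentence):
    """Rows for the lower-layer CRF: column 0 = word, last column = coarse tag (label)."""
    return [f"{word}\t{coarse_tag(tag)}" for word, tag in sentence]

def stage2_rows(sentence, coarse_tags):
    """Rows for the upper-layer CRF: word, coarse tag (feature), fine tag (label).

    At prediction time coarse_tags would come from the stage-1 model's output;
    for training, gold-derived coarse tags are used here as a simplifying assumption.
    """
    return [f"{word}\t{c}\t{tag}" for (word, tag), c in zip(sentence, coarse_tags)]

# An assumed CRF++ feature template for stage 2. Column 0 is the word,
# column 1 is the coarse tag; "B" adds label-bigram features.
STAGE2_TEMPLATE = "\n".join([
    "U00:%x[-1,0]",
    "U01:%x[0,0]",
    "U02:%x[1,0]",
    "U03:%x[-1,1]",
    "U04:%x[0,1]",
    "U05:%x[1,1]",
    "U06:%x[0,0]/%x[0,1]",
    "B",
])

if __name__ == "__main__":
    # Toy sentence with made-up fine-grained tags.
    sentence = [("我", "rr"), ("爱", "v"), ("北京", "ns")]
    coarse = [coarse_tag(tag) for _, tag in sentence]
    print("\n".join(stage1_rows(sentence)))
    print()
    print("\n".join(stage2_rows(sentence, coarse)))
```

Each stage's rows would be written to a training file (one blank line between sentences) and passed to CRF++ with its own template, so the second model can condition on coarse tags of neighboring words as well as the words themselves.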

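Keywords: Chinese part-of-speech tagging; dual-layer conditional random fields; contextual features; feature templates; coarse-grained part-of-speech results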

Classification: TP391 [Automation and Computer Technology: Computer Application Technology]

 
