Three Word-positions-based Tagging for Chinese Word Segmentation  Cited by: 1

Authors: Wang Xijie (王希杰)[1], Huang Yongjie (黄勇杰)[1]

Affiliation: [1] School of Computer and Information Engineering, Anyang Normal University, Anyang, Henan 455000, China

Source: Journal of Anyang Normal University (《安阳师范学院学报》), 2013, No. 5, pp. 49-52 (4 pages)

Abstract: Recasting Chinese word segmentation as a character-sequence tagging problem, with the help of a statistical language model, has become the mainstream approach in recent years. The long training time of the statistical model, however, has remained the biggest problem with this approach. This paper proposes a character-tagging method for Chinese word segmentation based on three word positions and reports comparative experiments on the corpora provided by Bakeoff-2005, the Second International Chinese Word Segmentation Bakeoff. The results show that the method achieves performance close to that of four-word-position character tagging, while its model training time is clearly shorter than that of the four-position method.
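
To make the idea of word-position tagging concrete, the following is a minimal sketch of how a segmented sentence maps to per-character tags and back. The abstract does not state which three tags the authors use, so the three-position scheme here (B, M, E, with single-character words tagged B) is an assumption for illustration only; the four-position scheme uses the common B/M/E/S convention. The function names `tag_words` and `decode` and the sample sentence are hypothetical, and no statistical model is trained here.

```python
def tag_words(words, scheme="4"):
    """Map a list of segmented words to one position tag per character.

    scheme="4": B (begin), M (middle), E (end), S (single-character word).
    scheme="3": assumed variant with B, M, E only; a single-character
                word is tagged B (this choice is an assumption, not
                taken from the paper).
    """
    tags = []
    for w in words:
        if len(w) == 1:
            tags.append("S" if scheme == "4" else "B")
        else:
            tags.extend(["B"] + ["M"] * (len(w) - 2) + ["E"])
    return tags


def decode(chars, tags):
    """Rebuild words from characters and tags.

    A new word starts at every B or S tag; M and E extend the current word.
    The same rule works for both schemes above.
    """
    words = []
    for ch, t in zip(chars, tags):
        if t in ("B", "S") or not words:
            words.append(ch)
        else:
            words[-1] += ch
    return words


words = ["我", "喜欢", "自然语言"]   # illustrative gold segmentation
chars = "".join(words)

tags4 = tag_words(words, scheme="4")
tags3 = tag_words(words, scheme="3")

print(tags4)
print(tags3)
print(decode(chars, tags3))
```

Under this assumption the two schemes differ only in how single-character words are labelled. A smaller label set means fewer label-dependent features and label transitions for a sequence model to estimate, which is consistent with the shorter training time the abstract reports.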

Keywords: Chinese word segmentation; three word positions; conditional random fields; feature template; context window

Classification: TP391 [Automation and Computer Technology: Computer Application Technology]

 
