Identification of Chinese Base Chunk Based on Real-Valued Word Distributed Representations

Cited by: 4

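Affiliations: [1] School of Mathematical Sciences, Shanxi University, Taiyuan, Shanxi 030006; [2] Computer Center, Shanxi University, Taiyuan, Shanxi 030006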

Source: Journal of North University of China (Natural Science Edition), 2013, No. 5, pp. 582-585 (4 pages)

Funding: National Natural Science Foundation of China (60873128)

Abstract: Real-valued vector representations of Chinese words are generated with a neural language model; these are called distributed representations of words, and word features built from them are called distributed word features. The distributed word features replace the word features of the conditional random field (CRF) model commonly used for base chunk identification, and Chinese base chunk identification experiments were carried out on the Tsinghua University TCT corpus. The results show that, in the model using only word features over the window [-2, 2] and in the model using word features over the window [-2, 2] plus part-of-speech features, distributed word features give a tagging precision 38.01% and 1.86% higher, respectively, than traditional word features. This indicates that distributed representations of words are useful for Chinese base chunk identification.

Keywords: neural language model; distributed word features; base chunk analysis; boundary identification

Classification: TP391 [Automation and Computer Technology: Computer Application Technology]
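
The abstract describes swapping the usual word-identity features over the window [-2, 2] for real-valued distributed word features. The Python sketch below is only an illustration of that idea, not the authors' implementation: the embedding values, the 3-dimensional vector size, the zero vector used for padding and unknown words, the feature naming scheme, and both function names are assumptions made for this example. It builds, for one token position, either the traditional one-hot style features or one real-valued feature per embedding dimension and window offset.

```python
from typing import Dict, List

# Toy 3-dimensional embeddings (made-up values for illustration only).
# In the paper's setting these would come from a trained neural language model.
EMBEDDINGS: Dict[str, List[float]] = {
    "我们": [0.12, -0.40, 0.33],
    "学习": [0.05, 0.21, -0.17],
    "自然": [-0.30, 0.08, 0.44],
    "语言": [0.27, 0.19, -0.02],
    "处理": [-0.11, -0.25, 0.36],
}
DIM = 3  # assumed embedding size for this sketch


def discrete_word_features(words: List[str], i: int, window: int = 2) -> Dict[str, float]:
    """Traditional features: one indicator per (offset, word identity)."""
    feats: Dict[str, float] = {}
    for offset in range(-window, window + 1):
        j = i + offset
        word = words[j] if 0 <= j < len(words) else "<PAD>"
        feats[f"w[{offset}]={word}"] = 1.0
    return feats


def distributed_word_features(
    words: List[str],
    i: int,
    embeddings: Dict[str, List[float]],
    dim: int,
    window: int = 2,
) -> Dict[str, float]:
    """Distributed features: one real value per (offset, embedding dimension)."""
    feats: Dict[str, float] = {}
    zero = [0.0] * dim  # assumed fallback for padding and out-of-vocabulary words
    for offset in range(-window, window + 1):
        j = i + offset
        vec = embeddings.get(words[j], zero) if 0 <= j < len(words) else zero
        for d, value in enumerate(vec):
            feats[f"e[{offset}][{d}]"] = value
    return feats


sentence = ["我们", "学习", "自然", "语言", "处理"]
disc = discrete_word_features(sentence, 0)
dist = distributed_word_features(sentence, 0, EMBEDDINGS, DIM)
print(len(disc), len(dist))
```

With a window of [-2, 2] and 3-dimensional vectors, each token gets 5 indicator features in the traditional scheme and 15 real-valued features in the distributed scheme. Either dictionary could then be passed to a CRF toolkit that accepts real-valued feature weights.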
