Weighted Lattice Based Recurrent Neural Networks for Sentence Semantic Representation Modeling

Authors: Zhang Xiangwen; Lu Ziyao; Yang Jing[1]; Lin Qian[1]; Lu Yu; Wang Hongji; Su Jinsong[1,2]

Affiliations: [1] Xiamen University, Xiamen, Fujian 361000; [2] Jiangsu Provincial Key Laboratory for Computer Information Processing Technology (Soochow University), Suzhou, Jiangsu 215006

Source: Journal of Computer Research and Development, 2019, No. 4, pp. 854-865 (12 pages)

Funding: National Natural Science Foundation of China (61672440); Beijing Advanced Innovation Center for Language Resources, Beijing Language and Culture University; General Project of the State Language Commission (YB135-49); Fundamental Research Funds for the Central Universities (ZK1024); Open Project of the Jiangsu Provincial Key Laboratory for Computer Information Processing Technology, Soochow University (KJS1520)

Abstract: Currently, recurrent neural networks (RNNs) are widely used to model the semantic representations of text sequences in natural language processing. For languages without natural word delimiters (e.g., Chinese), RNNs generally take a pre-segmented word sequence as input. However, sub-optimal segmentation granularity and segmentation errors can negatively affect sentence semantic modeling, as well as subsequent natural language processing tasks. To address these issues, this paper proposes a weighted word lattice based RNN, which takes a weighted word lattice as input and, at each time step, produces the current hidden state by fusing multiple input vectors and their corresponding previous hidden states. The weighted word lattice is a compressed data structure that encodes an exponential number of word segmentation results, and its edge weights reflect, to a certain extent, the consistency of different segmentations. In particular, the lattice weights are further exploited as supervision for the weights of the fusion function, leading to better sentence semantic representation learning. Compared with traditional RNNs, the proposed model not only alleviates the negative impact of segmentation errors on sentence semantic modeling but also makes semantic modeling more flexible and expressive. Experimental results on sentiment classification and question classification tasks demonstrate the effectiveness of the proposed model.
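
To make the fusion step described in the abstract concrete, below is a minimal, hypothetical sketch (PyTorch-style; the GRU backbone, the linear scorer, and all names are illustrative assumptions, not the authors' exact formulation) of how one lattice node might fuse several incoming (word embedding, predecessor hidden state) pairs, with the learned fusion weights additionally regularized toward the normalized lattice edge weights:

import torch
import torch.nn as nn
import torch.nn.functional as F

class LatticeFusionRNNCell(nn.Module):
    # Sketch of one lattice time step: a lattice node may have several incoming
    # edges, each carrying a word embedding and the hidden state of its
    # predecessor node. A learned scorer produces fusion weights (alpha) over
    # these edges; alpha can additionally be supervised by the lattice weights.
    def __init__(self, input_size, hidden_size):
        super().__init__()
        self.cell = nn.GRUCell(input_size, hidden_size)        # per-edge recurrent update
        self.scorer = nn.Linear(input_size + hidden_size, 1)   # scores each incoming edge

    def forward(self, edge_inputs, prev_hiddens):
        # edge_inputs:  (k, input_size)  word embeddings on the k incoming edges
        # prev_hiddens: (k, hidden_size) hidden states at the k predecessor nodes
        scores = self.scorer(torch.cat([edge_inputs, prev_hiddens], dim=-1)).squeeze(-1)
        alpha = F.softmax(scores, dim=0)                        # learned fusion weights
        candidates = self.cell(edge_inputs, prev_hiddens)       # (k, hidden_size)
        fused = (alpha.unsqueeze(-1) * candidates).sum(dim=0)   # fused state of this node
        return fused, alpha

# Toy usage: a node with 2 incoming edges and a lattice-weight regularizer.
cell = LatticeFusionRNNCell(input_size=8, hidden_size=16)
inputs, hiddens = torch.randn(2, 8), torch.randn(2, 16)
lattice_weights = torch.tensor([0.7, 0.3])                      # normalized edge weights
state, alpha = cell(inputs, hiddens)
reg_loss = F.mse_loss(alpha, lattice_weights)                   # supervise alpha with lattice weights

The MSE term between alpha and the normalized edge weights is one plausible reading of the supervised use of lattice weights mentioned in the abstract; a KL-divergence penalty would be an equally reasonable choice under the same assumption.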

Keywords: weighted word lattice; recurrent neural network; sentence semantic modeling; sentiment classification; question classification

CLC number: TP391 (Automation and Computer Technology: Computer Application Technology)

 
