Smart Chinese word segmentation method based on Bi-LSTM-6Tags

Cited by: 6

Author: WANG Wei (王玮), Graduate School, Academy of Military Sciences, Beijing 100091, China

Affiliation: [1] Graduate School, Academy of Military Sciences, Beijing 100091

Source: Journal of Computer Applications (《计算机应用》), 2018, No. A02, pp. 107-110 (4 pages)

Abstract: Current deep-learning-based Chinese word segmentation algorithms suffer from incomplete semantic understanding and insufficient word-position information. To address these problems, this paper proposes a Chinese word segmentation method that combines a Bidirectional Long Short-Term Memory (Bi-LSTM) neural network with a six-tag word-position tag set. First, a Bi-LSTM network automatically learns features from the text. Then, the six-tag word-position tag set is used to segment the text efficiently and accurately on the basis of its deep semantics. Finally, the method is evaluated on the Bakeoff 2005 corpus released by SIGHAN for the Second International Chinese Word Segmentation Bakeoff. Under the same experimental conditions, the proposed method improves segmentation accuracy by 3%, 4%, and 1% over a Conditional Random Field (CRF) method, a unidirectional LSTM method, and a Bi-LSTM method with a four-tag word-position set, respectively, which demonstrates that the method is reasonable and effective.
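The abstract does not spell out the six tags. As an illustration only, the sketch below assumes the B/B2/B3/M/E/S six-tag set that is common in the Chinese word segmentation literature (B, B2, B3 for the first three characters of a word, M for any later middle character, E for the last character, S for a single-character word); the paper's own definitions may differ. It contrasts that set with the four-tag B/M/E/S set used by the Bi-LSTM baseline and shows how predicted tags are decoded back into words. The helper names `tags_4`, `tags_6`, `encode`, and `decode` are hypothetical and do not come from the paper.

```python
from typing import Callable, List, Tuple


def tags_4(word: str) -> List[str]:
    """Four-tag (BMES) labels for one word."""
    n = len(word)
    if n == 1:
        return ["S"]
    return ["B"] + ["M"] * (n - 2) + ["E"]


def tags_6(word: str) -> List[str]:
    """Six-tag labels for one word, ASSUMING the B/B2/B3/M/E/S scheme.

    B, B2, B3 mark the 1st, 2nd, 3rd characters; M marks any later
    middle character; E marks the last character; S a one-character word.
    """
    n = len(word)
    if n == 1:
        return ["S"]
    labels = []
    for i in range(n):
        if i == n - 1:
            labels.append("E")
        elif i == 0:
            labels.append("B")
        elif i == 1:
            labels.append("B2")
        elif i == 2:
            labels.append("B3")
        else:
            labels.append("M")
    return labels


def encode(words: List[str],
           scheme: Callable[[str], List[str]]) -> Tuple[List[str], List[str]]:
    """Turn a segmented sentence into per-character (char, tag) sequences."""
    chars: List[str] = []
    labels: List[str] = []
    for w in words:
        chars.extend(w)
        labels.extend(scheme(w))
    return chars, labels


def decode(chars: List[str], labels: List[str]) -> List[str]:
    """Rebuild words from a well-formed tag sequence (either tag set).

    A new word starts at every B or S tag; any other tag extends the
    current word.
    """
    words: List[str] = []
    for ch, tag in zip(chars, labels):
        if tag in ("B", "S") or not words:
            words.append(ch)
        else:
            words[-1] += ch
    return words


if __name__ == "__main__":
    sentence = ["中华人民共和国", "成立", "了"]
    chars, six = encode(sentence, tags_6)
    _, four = encode(sentence, tags_4)
    print(four)  # ['B', 'M', 'M', 'M', 'M', 'M', 'E', 'B', 'E', 'S']
    print(six)   # ['B', 'B2', 'B3', 'M', 'M', 'M', 'E', 'B', 'E', 'S']
    assert decode(chars, six) == sentence
```

In a Bi-LSTM tagger, the per-character tag sequence is what the network is trained to predict. Under this assumed scheme, the six-tag set gives each of the first three characters of a long word its own label where the four-tag set collapses them into M, which is one plausible reading of the "insufficient word-position information" that the abstract attributes to four-tag methods.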

Keywords: bidirectional LSTM; six-tag word-position labeling; Chinese word segmentation

CLC number: TP391.1 [Automation and Computer Technology: Computer Application Technology]
