Author: WANG Wei (王玮), Graduate School, Academy of Military Sciences, Beijing 100091, China
Source: Journal of Computer Applications (《计算机应用》), 2018, Issue A02, pp. 107-110 (4 pages)
Abstract: To address the incomplete semantic understanding and insufficient word-position information in current deep-learning-based Chinese word segmentation algorithms, this paper proposes a Chinese word segmentation method that combines a bidirectional long short-term memory (Bi-LSTM) neural network with a six-word-position tag set. First, the Bi-LSTM network automatically discovers text features. Then, the six-word-position tag set is used to carry out segmentation efficiently and accurately on the basis of the deep semantics of the text. Finally, the method is validated experimentally on the Bakeoff 2005 corpus provided by the Second International Chinese Word Segmentation Bakeoff (SIGHAN). Under the same experimental conditions, it improves segmentation accuracy by 3%, 4%, and 1% over the conditional random field (CRF) method, the unidirectional LSTM method, and the Bi-LSTM method with a four-word-position tag set, respectively, which indicates that the proposed method is reasonable and effective.
Classification number: TP391.1 [Automation and Computer Technology: Computer Application Technology]
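
The abstract does not list the six tags it uses. As a rough illustration of what a six-word-position tag set looks like, the sketch below assumes the commonly used scheme B, B2, B3, M, E, S (first, second, third, further-interior, and last character of a multi-character word, plus single-character word). The function names `words_to_tags` and `tags_to_words` are hypothetical and are not taken from the paper; the Bi-LSTM model that would predict these tags is not shown.

```python
# Assumed six-tag scheme (not confirmed by the abstract): B, B2, B3, M, E, S.

def words_to_tags(words):
    """Convert a segmented sentence (list of words) into per-character tags."""
    tags = []
    for word in words:
        n = len(word)
        if n == 1:
            tags.append("S")  # single-character word
            continue
        # Interior positions: B2, then B3, then M for anything deeper.
        interior = ["B2", "B3"][: n - 2] + ["M"] * max(0, n - 4)
        tags.extend(["B"] + interior + ["E"])
    return tags


def tags_to_words(chars, tags):
    """Rebuild words from characters and tags; a word ends at S or E."""
    words, current = [], ""
    for ch, tag in zip(chars, tags):
        current += ch
        if tag in ("S", "E"):
            words.append(current)
            current = ""
    if current:  # tolerate a tag sequence that ends mid-word
        words.append(current)
    return words


segmented = ["我", "喜欢", "图书馆"]
tags = words_to_tags(segmented)
print(tags)  # ['S', 'B', 'E', 'B', 'B2', 'E']
print(tags_to_words("".join(segmented), tags))  # ['我', '喜欢', '图书馆']
```

Compared with a four-tag set (B, M, E, S), the extra B2 and B3 tags tell a sequence labeler how far into a word each character sits, which is one plausible reading of the "word-position information" the abstract refers to.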