Tibetan word segmentation method based on BiLSTM_CRF model (cited by: 9)

Authors: WANG Lili (王莉莉); WANG Hongyuan (王宏渊); BAIMA Quzhen (白玛曲珍); YANG Hongwu (杨鸿武) [1,2,3]

Affiliations: [1] College of Physics and Electronic Engineering, Northwest Normal University, Lanzhou 730070, P.R. China; [2] Engineering Research Center of Gansu Province for Intelligent Information Technology and Application, Lanzhou 730070, P.R. China; [3] National and Local Joint Engineering Laboratory of Data Learning and Analysis Technology for Internet Education, Lanzhou 730070, P.R. China

Source: Journal of Chongqing University of Posts and Telecommunications (Natural Science Edition), 2020, No. 4, pp. 648-654 (7 pages)

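Funding: National Natural Science Foundation of China (11664036, 61263036); Science and Technology Innovation Team Project of Gansu Provincial Higher Education Institutions (2017C-03).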

Abstract: Tibetan word segmentation is one of the key technologies for Tibetan speech synthesis and Tibetan speech recognition. This paper proposes a Tibetan word segmentation method based on a bidirectional long short-term memory network with a conditional random field (BiLSTM_CRF) model. First, a manually segmented corpus is converted to word vectors and fed into the bidirectional long short-term memory (BiLSTM) network. The past-input features learned by the forward long short-term memory (LSTM) network and the future-input features learned by the backward LSTM are then added together and passed through a linear layer and a softmax layer, which apply a nonlinear operation to produce rough prediction information. Finally, a conditional random field (CRF) model applies constraint-based correction, yielding a Tibetan word segmentation model optimized by word vectors and the CRF model. Experimental results show that the proposed method achieves good segmentation performance, with a precision of 94.33%, a recall of 93.89%, and an F-score of 94.11%.
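The CRF correction step described in the abstract can be illustrated with a small decoding sketch. The abstract does not state the tag set or the CRF parameters, so everything below is an assumption made for illustration: a four-tag B/M/E/S scheme over syllables, hard allow/forbid transition constraints standing in for learned CRF transition scores, and made-up softmax probabilities in `probs`. It is not the authors' implementation; it only shows how a constrained Viterbi search can repair an invalid per-syllable (greedy) prediction.

```python
import math

# Assumed tag set (not stated in the abstract): Begin, Middle, End, Single.
TAGS = ("B", "M", "E", "S")

# Hard constraints standing in for learned CRF transition scores:
# ALLOWED[p] lists the tags that may follow tag p.
ALLOWED = {"B": {"M", "E"}, "M": {"M", "E"}, "E": {"B", "S"}, "S": {"B", "S"}}
START_OK = ("B", "S")  # a sentence cannot start inside a word
END_OK = ("E", "S")    # a sentence cannot end inside a word
NEG_INF = float("-inf")


def viterbi(emissions):
    """Return the highest-scoring valid tag path.

    emissions: one dict per syllable, mapping each tag to a log-score.
    """
    if not emissions:
        return []
    score = {t: emissions[0][t] if t in START_OK else NEG_INF for t in TAGS}
    backptrs = []
    for emit in emissions[1:]:
        new_score, ptr = {}, {}
        for t in TAGS:
            # Best previous tag among those allowed to precede t.
            prev = max((p for p in TAGS if t in ALLOWED[p]), key=lambda p: score[p])
            new_score[t] = score[prev] + emit[t]
            ptr[t] = prev
        score = new_score
        backptrs.append(ptr)
    # Pick the best valid final tag, then follow back-pointers to the start.
    path = [max(END_OK, key=lambda t: score[t])]
    for ptr in reversed(backptrs):
        path.append(ptr[path[-1]])
    return path[::-1]


def to_words(units, tags):
    """Group syllables into words: a word closes on an E or S tag."""
    words, current = [], []
    for unit, tag in zip(units, tags):
        current.append(unit)
        if tag in ("E", "S"):
            words.append("".join(current))
            current = []
    if current:  # trailing unfinished word, kept as-is
        words.append("".join(current))
    return words


# Hypothetical softmax outputs for four syllables (illustrative numbers only).
probs = [
    {"B": 0.60, "M": 0.10, "E": 0.10, "S": 0.20},
    {"B": 0.40, "M": 0.10, "E": 0.35, "S": 0.15},
    {"B": 0.10, "M": 0.10, "E": 0.70, "S": 0.10},
    {"B": 0.10, "M": 0.10, "E": 0.20, "S": 0.60},
]
emissions = [{t: math.log(p[t]) for t in TAGS} for p in probs]

greedy = [max(TAGS, key=p.get) for p in probs]  # ['B', 'B', 'E', 'S'], B->B is invalid
best = viterbi(emissions)                       # ['S', 'B', 'E', 'S']
words = to_words(["a", "b", "c", "d"], best)    # ['a', 'bc', 'd']
```

Taking the per-syllable maximum yields B, B, E, S, which contains the impossible transition B to B. The constrained search instead returns S, B, E, S, the most probable sequence that respects the tag constraints. A trained CRF would use real-valued transition scores rather than the allow/forbid table assumed here.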

Keywords: text word segmentation; long short-term memory network; deep neural network; word vector; ethnic minority language

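Classification: TP391.1 [Automation and Computer Technology: Computer Application Technology]; TN912.33 [Automation and Computer Technology: Computer Science and Technology]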
