Authors: WANG Lili (王莉莉); WANG Hongyuan (王宏渊); BAIMA Quzhen (白玛曲珍); YANG Hongwu (杨鸿武) [1,2,3]
Affiliations: [1] College of Physics and Electronic Engineering, Northwest Normal University, Lanzhou 730070, P.R. China; [2] Engineering Research Center of Gansu Province for Intelligent Information Technology and Application, Lanzhou 730070, P.R. China; [3] National and Local Joint Engineering Laboratory of Data Learning and Analysis Technology for Internet Education, Lanzhou 730070, P.R. China
Source: Journal of Chongqing University of Posts and Telecommunications (Natural Science Edition) (《重庆邮电大学学报(自然科学版)》), 2020, No. 4, pp. 648-654 (7 pages)
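Funding: National Natural Science Foundation of China (11664036, 61263036); Science and Technology Innovation Team Project of Gansu Provincial Higher Education Institutions (2017C-03).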
Abstract: Tibetan word segmentation is one of the key technologies for Tibetan speech synthesis and Tibetan speech recognition. This paper proposes a Tibetan word segmentation method based on a bidirectional long short-term memory network with a conditional random field (BiLSTM_CRF). A manually segmented corpus is first converted to word vectors and fed into the bidirectional long short-term memory (BiLSTM) network. The past-input features learned by the forward long short-term memory (LSTM) network and the future-input features learned by the backward LSTM are added together and passed through a linear layer and a softmax layer, whose nonlinear operation yields a coarse prediction. A conditional random field (CRF) model then applies a constraint-based correction to that prediction, giving a Tibetan word segmentation model optimized with word vectors and a CRF. Experimental results show that the proposed method segments well, achieving a segmentation accuracy of 94.33%, a recall of 93.89%, and an F value of 94.11%.
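The constraint-correction step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes a B/M/E/S tagging scheme (begin, middle, end, single-syllable word), which the abstract does not state, and it replaces the learned CRF transition scores with hard allow/forbid rules. The probabilities and the syllable placeholders `s1` to `s4` are made up for illustration. The sketch shows how a per-syllable argmax over softmax outputs can produce an invalid tag sequence, and how Viterbi decoding under transition constraints repairs it.

```python
import math

# Assumed tag set (not given in the abstract): Begin, Middle, End, Single.
TAGS = ["B", "M", "E", "S"]

# Hard transition constraints standing in for learned CRF transition scores.
ALLOWED = {("B", "M"), ("B", "E"), ("M", "M"), ("M", "E"),
           ("E", "B"), ("E", "S"), ("S", "B"), ("S", "S")}
START_OK = ("B", "S")   # a sentence must start a word
END_OK = ("E", "S")     # a sentence must finish a word
NEG_INF = float("-inf")


def safe_log(p):
    """Log-probability with a floor so that p == 0 does not raise."""
    return math.log(max(p, 1e-12))


def greedy_decode(probs):
    """Per-syllable argmax: the 'coarse prediction' with no constraints."""
    return [max(TAGS, key=p.get) for p in probs]


def viterbi_decode(probs):
    """Best tag sequence that respects START_OK, ALLOWED and END_OK."""
    if not probs:
        return []
    score = {t: safe_log(probs[0][t]) if t in START_OK else NEG_INF
             for t in TAGS}
    backpointers = []
    for p in probs[1:]:
        new_score, ptr = {}, {}
        for t in TAGS:
            best_prev, best = None, NEG_INF
            for prev in TAGS:
                if (prev, t) in ALLOWED and score[prev] > best:
                    best_prev, best = prev, score[prev]
            new_score[t] = best + safe_log(p[t]) if best_prev else NEG_INF
            ptr[t] = best_prev
        score = new_score
        backpointers.append(ptr)
    tags = [max(END_OK, key=lambda t: score[t])]
    for ptr in reversed(backpointers):
        tags.append(ptr[tags[-1]])
    return tags[::-1]


def to_words(syllables, tags):
    """Group syllables into words: a word closes on an E or S tag."""
    words, current = [], []
    for syl, tag in zip(syllables, tags):
        current.append(syl)
        if tag in ("E", "S"):
            words.append(current)
            current = []
    if current:                 # only reachable for invalid tag sequences
        words.append(current)
    return words


# Made-up softmax outputs for four syllables (illustrative only).
coarse = [
    {"B": 0.70, "M": 0.10, "E": 0.10, "S": 0.10},
    {"B": 0.40, "M": 0.35, "E": 0.15, "S": 0.10},
    {"B": 0.10, "M": 0.10, "E": 0.70, "S": 0.10},
    {"B": 0.10, "M": 0.10, "E": 0.10, "S": 0.70},
]
greedy_tags = greedy_decode(coarse)      # contains the invalid step B -> B
corrected_tags = viterbi_decode(coarse)  # valid under the constraints
words = to_words(["s1", "s2", "s3", "s4"], corrected_tags)
```

In a trained BiLSTM_CRF the transition scores are learned jointly with the network rather than fixed to allow or forbid, so the hard constraints here are only the simplest case of the same decoding procedure.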
Keywords: text word segmentation; long short-term memory network; deep neural network; word vector; ethnic minority languages
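Classification codes: TP391.1 [Automation and Computer Technology: Computer Application Technology]; TN912.33 [Automation and Computer Technology: Computer Science and Technology]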