Authors: 帕丽旦.木合塔尔 (MUHETAER Palidan); 吾守尔.斯拉木 (SILAMU Wushouer); 买买提阿依甫 (Maimaitayifu)
Affiliation: [1] College of Information Science and Engineering, Xinjiang University, Urumqi, Xinjiang 830046, China
Source: Computer Simulation (《计算机仿真》), 2019, No. 1, pp. 268-273 (6 pages)
Funding: National "973" Key Basic Research Program of China (2014CB340506); National Natural Science Foundation of China (61363063); Open Project of the Multilingual Key Laboratory of Xinjiang University (XJDX0905-2013-01)
Abstract: Uyghur part-of-speech (POS) tagging is one of the most important tasks in lexical analysis, and the accuracy of its output directly affects downstream natural language processing. The main difficulty in Uyghur POS tagging is correctly determining the part of speech of multi-category words (words that can belong to more than one part of speech) and out-of-vocabulary words. This paper proposes a hybrid BiLSTM-CNN-CRF model for Uyghur POS tagging. The model has a three-layer structure. First, a CNN is trained to produce character-level morphological feature vectors for Uyghur words. Second, the skip-gram method is trained on a large-scale corpus to generate low-dimensional, dense, real-valued word vectors that carry semantic information. Finally, each character-level feature vector is concatenated with the corresponding word vector, and the combined vectors are used as input to train a BiLSTM-CRF deep neural network, yielding a BiLSTM-CNN-CRF hybrid neural network model suited to Uyghur POS tagging. Experimental results show that the new hybrid model achieves the best tagging results on the dataset provided by the laboratory, with an F1 score of 97.01%, and clearly improves the tagging of Uyghur multi-category and out-of-vocabulary words.
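Keywords: recurrent neural network; convolutional neural network; conditional random field; Uyghur; part-of-speech tagging
Classification: TP302 [Automation and Computer Technology: Computer System Architecture]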
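The abstract describes two steps that are easy to illustrate in isolation: concatenating a character-level feature vector with a word vector, and choosing the best tag sequence in the CRF layer. The following is a minimal sketch in plain Python, not the authors' implementation. The function names `concat_features` and `viterbi_decode`, the two-tag setup, and all scores are assumptions made for illustration; in the model the abstract describes, the per-position scores would come from the BiLSTM and the vectors from the CNN and skip-gram training, none of which is reproduced here.

```python
from typing import List, Sequence


def concat_features(char_vec: Sequence[float], word_vec: Sequence[float]) -> List[float]:
    """Join a character-level feature vector and a word vector into one input vector."""
    return list(char_vec) + list(word_vec)


def viterbi_decode(emissions: Sequence[Sequence[float]],
                   transitions: Sequence[Sequence[float]]) -> List[int]:
    """Return the highest-scoring sequence of tag indices.

    emissions[t][j]   -- score of tag j at position t (hypothetical values here)
    transitions[i][j] -- score of moving from tag i to tag j
    """
    if not emissions:
        return []
    num_tags = len(emissions[0])
    score = list(emissions[0])          # best score of any path ending in each tag
    backpointers: List[List[int]] = []  # best previous tag for each (position, tag)
    for t in range(1, len(emissions)):
        new_score = []
        pointers = []
        for j in range(num_tags):
            best_i = max(range(num_tags), key=lambda i: score[i] + transitions[i][j])
            new_score.append(score[best_i] + transitions[best_i][j] + emissions[t][j])
            pointers.append(best_i)
        score = new_score
        backpointers.append(pointers)
    # Trace back from the best final tag to recover the full path.
    path = [max(range(num_tags), key=lambda j: score[j])]
    for pointers in reversed(backpointers):
        path.append(pointers[path[-1]])
    path.reverse()
    return path


# Illustrative scores only: two tags, three words.
emissions = [[2.0, 0.0], [0.4, 0.6], [0.3, 1.5]]
transitions = [[0.5, -1.0], [-1.0, 0.5]]
print(viterbi_decode(emissions, transitions))
```

With these made-up scores, picking the top-scoring tag at each position independently would select tag 1 for the second and third words, whereas decoding over the whole sequence also weighs the transition scores between adjacent tags. That sequence-level decision is the usual motivation for placing a CRF layer on top of a BiLSTM.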