Authors: GUO Xiaoran (郭晓然) [1]; WANG Weilan (王维兰) [2]; LUO Ping (罗平) [3]
Affiliations: [1] School of Mathematics and Computer Science, Northwest Minzu University, Lanzhou 730030, Gansu, China; [2] Key Laboratory of China's Ethnic Languages and Information Technology of the Ministry of Education, Northwest Minzu University, Lanzhou 730030, Gansu, China; [3] School of Electronic and Information Engineering, Lanzhou Jiaotong University, Lanzhou 730070, Gansu, China
Source: Plateau Science Research (《高原科学研究》), 2020, No. 4, pp. 87-94 (8 pages)
Funding: National Natural Science Foundation of China (61162021); Innovative Research Team Program of the State Ethnic Affairs Commission (No. [2018] 98); Innovation Project for Young Teachers of the Central Universities (31920200067)
Abstract: Named entity recognition is a fundamental task in natural language processing and an important basis for knowledge graph construction. To address the difficulty of recognizing the names of various deities in Tibetan Buddhist classics translated into Chinese, whose naming patterns are not fixed, a multi-neural-network fusion method, BERT-BiLSTM-CRF-a, is proposed, combining the BERT pre-trained language model, a bidirectional long short-term memory network (BiLSTM), and a conditional random field (CRF). The method uses BERT instead of a shallow network to train character vectors, fully representing the polysemy of characters; it introduces attention-style weighting, in which the forward and backward hidden vectors of the BiLSTM layer are weighted before being concatenated, further improving the effective use of context features; finally, a CRF layer outputs the optimal label sequence. Experiments show that the method achieves 95.2% accuracy on the test set, 7.6% higher than the traditional BiLSTM-CRF model, and its recall is also 8.7% higher. It can therefore be applied to the task of recognizing deity names in Tibetan Buddhist classics translated into Chinese.
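The architecture described in the abstract can be summarized in a short sketch. The following is a minimal, illustrative PyTorch implementation of the BERT-BiLSTM-CRF-a idea, not the authors' code: it assumes the Hugging Face transformers BertModel, the third-party pytorch-crf package (torchcrf.CRF), and a simple learned scalar weighting of the forward and backward BiLSTM states as one plausible reading of the attention-style weighting; all names, hidden sizes, and hyperparameters are hypothetical.

```python
# Minimal sketch of the BERT-BiLSTM-CRF-a idea, under the assumptions stated above.
import torch
import torch.nn as nn
from transformers import BertModel
from torchcrf import CRF  # third-party package: pip install pytorch-crf


class BertBiLstmCrfA(nn.Module):
    def __init__(self, num_tags: int, bert_name: str = "bert-base-chinese",
                 lstm_hidden: int = 256):
        super().__init__()
        self.bert = BertModel.from_pretrained(bert_name)          # character-level contextual embeddings
        self.bilstm = nn.LSTM(self.bert.config.hidden_size, lstm_hidden,
                              batch_first=True, bidirectional=True)
        # Learned scalar weights for the forward / backward directions,
        # one plausible reading of "weighting before concatenation".
        self.direction_logits = nn.Parameter(torch.zeros(2))
        self.emission = nn.Linear(2 * lstm_hidden, num_tags)
        self.crf = CRF(num_tags, batch_first=True)

    def _features(self, input_ids, attention_mask):
        ctx = self.bert(input_ids, attention_mask=attention_mask).last_hidden_state
        out, _ = self.bilstm(ctx)                                  # (B, T, 2*H)
        fwd, bwd = out.chunk(2, dim=-1)                            # split forward / backward halves
        w = torch.softmax(self.direction_logits, dim=0)            # attention-style direction weights
        weighted = torch.cat([w[0] * fwd, w[1] * bwd], dim=-1)     # weight, then concatenate
        return self.emission(weighted)                             # per-token tag scores for the CRF

    def loss(self, input_ids, attention_mask, tags):
        feats = self._features(input_ids, attention_mask)
        return -self.crf(feats, tags, mask=attention_mask.bool(), reduction="mean")

    def predict(self, input_ids, attention_mask):
        feats = self._features(input_ids, attention_mask)
        return self.crf.decode(feats, mask=attention_mask.bool())  # best tag sequence per sentence
```

In use, a tokenizer such as BertTokenizerFast for "bert-base-chinese" would supply input_ids and attention_mask, with tags typically encoded character by character in a BIO-style scheme over the deity-name spans.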
Keywords: Tibetan Buddhist deities; named entity recognition; BERT pre-trained model; attention mechanism
Classification: TP391.4 (Automation and Computer Technology: Computer Application Technology)