Authors: HU Wan-ting (胡万亭), GUO Jian-ying (郭建英), ZHANG Ji-yong (张继永)[2]
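Affiliations: [1] Puyang Institute of Technology, Henan University, Puyang 457000, Henan, China; [2] School of Information Science and Technology, Southwest Jiaotong University, Chengdu 610000, Sichuan, China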
Source: Computer Technology and Development (《计算机技术与发展》), 2020, No. 11, pp. 25-29 (5 pages)
Funding: Key Scientific Research Project of Higher Education Institutions of Henan Province (18B510014).
Abstract: Organization name recognition is one of the core tasks of named entity recognition, and also the most difficult. In recent years, pre-trained models have been widely used in Chinese natural language processing, and pre-trained word embedding models have achieved very good results in Chinese named entity recognition, but there is still much room for improvement in organization name recognition. To address this problem, the ELMO (embedding from language models) pre-training model is improved and combined with a bidirectional LSTM (Bi-LSTM) neural network and a conditional random field model to recognize organization names. The improvement to ELMO consists of filtering high-frequency organization words, adding them to the Chinese character dictionary, and training the ELMO model to generate organization word vectors alongside ordinary character vectors. Character vectors avoid the out-of-vocabulary problem, while organization word vectors introduce prior knowledge; combining the two lets the resulting character and word vectors represent organization names better. Experiments show that when the data set used for pre-training is relatively small, the proposed method outperforms the character vector embedding method, improving the F1 value by 1.3%.
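The vocabulary step described in the abstract (keep every character, and additionally treat high-frequency organization words as single tokens) can be illustrated with a minimal sketch. This is not the authors' implementation: the candidate word list, the `min_count` threshold, the greedy longest-match tokenizer, and the function names `build_vocab` and `tokenize` are all assumptions made for illustration, and the ELMO, Bi-LSTM, and CRF stages are omitted.

```python
from collections import Counter


def build_vocab(corpus, candidate_org_words, min_count=2):
    """Return (vocab, frequent_words).

    vocab holds every character in the corpus plus the candidate
    organization words that occur at least min_count times.
    The threshold and the candidate list are hypothetical.
    """
    counts = Counter()
    for sentence in corpus:
        for word in candidate_org_words:
            counts[word] += sentence.count(word)
    frequent_words = {w for w, c in counts.items() if c >= min_count}
    chars = {ch for sentence in corpus for ch in sentence}
    return chars | frequent_words, frequent_words


def tokenize(sentence, frequent_words):
    """Greedy longest match: emit a frequent organization word when one
    starts at the current position, otherwise emit a single character."""
    by_length = sorted(frequent_words, key=len, reverse=True)
    tokens, i = [], 0
    while i < len(sentence):
        for word in by_length:
            if sentence.startswith(word, i):
                tokens.append(word)
                i += len(word)
                break
        else:
            # No organization word starts here, so fall back to one character.
            tokens.append(sentence[i])
            i += 1
    return tokens


# Toy corpus and candidate list, invented for this example.
corpus = ["他在北京大学工作", "北京大学和某公司合作", "某公司成立"]
candidates = ["大学", "公司", "研究院"]
vocab, frequent = build_vocab(corpus, candidates)
print(tokenize("他在北京大学工作", frequent))
```

Because single characters remain in the vocabulary, any sentence can still be tokenized even when it contains no frequent organization word, which mirrors the abstract's point that character vectors avoid the out-of-vocabulary problem.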
Keywords: ELMO model; LSTM model; organization words; conditional random field; organization name recognition
Classification number: TP301.6 [Automation and Computer Technology: Computer System Architecture]