Authors: 刘志豪 (Liu Zhihao); 金相国 (Jin Xiangguo); 邱芹军 (Qiu Qinjun); 陶留锋 (Tao Liufeng); 黄振 (Huang Zhen); 谢忠 (Xie Zhong)
Affiliations: [1] National Engineering Research Center of Geographic Information System, Wuhan 430074; [2] School of Computer Science, China University of Geosciences (Wuhan), Wuhan 430074; [3] Key Laboratory of Urban Land Resources Monitoring and Simulation, Ministry of Natural Resources, Shenzhen, Guangdong 518034; [4] National and Local Joint Engineering Laboratory of Geographic Information System, Wuhan 430074
Source: Chinese Journal of Geology (Scientia Geologica Sinica) 《地质科学》, 2023, No. 4, pp. 1535-1553 (19 pages)
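Funding: Supported by the National Key Research and Development Program of China (No. 2022YFF0711601); the Natural Science Foundation of Hubei Province (No. 2022CFB640); the China Postdoctoral Science Foundation (No. 2021M702991); the Director's Fund of the Key Laboratory of Geological Survey and Evaluation, Ministry of Education (No. GLAB2023ZR01); and the Open Fund of the Key Laboratory of Urban Land Resources Monitoring and Simulation, Ministry of Natural Resources (No. KF-2022-07-014).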
Abstract: Geological reports on mineral resources contain a large body of expert experience and basic geological knowledge. Extracting structured knowledge quickly and accurately from massive volumes of mineral resource text has become an active research topic, and named entity recognition (NER) is a key step in information extraction and knowledge mining. Mineral resource geological texts are characterized by long entities, dense technical terminology, and nested entities, so existing deep learning NER methods perform poorly when applied directly to this domain. This paper proposes a deep learning model for mineral resource NER: ALBERT (A Lite Bidirectional Encoder Representations from Transformers)-BiLSTM (Bi-directional Long Short-Term Memory)-CRF (Conditional Random Field). The ALBERT pre-trained language model captures rich semantic features of geological text, and these are combined with Chinese pinyin, glyph (character form), and word boundary features to form the embedding layer, which improves recognition of complex entities. Experiments were run on the People's Daily dataset, the Resume dataset, and a purpose-built mineral resources dataset; the results show that the proposed method reaches a precision of 70.97%, a recall of 64.33%, and an F1 score of 67.49%.
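The three figures reported in the abstract can be checked against one another. The short sketch below assumes the first figure (准确率) is precision rather than overall accuracy, and that F1 is the usual harmonic mean of precision and recall. The function name `f1_score` is illustrative only and is not taken from the paper.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (both in the same units)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Values as reported in the abstract, in percent.
reported_precision = 70.97
reported_recall = 64.33
reported_f1 = 67.49

# Assumption: the first reported figure is precision, not accuracy.
computed_f1 = f1_score(reported_precision, reported_recall)
print(f"computed F1 = {computed_f1:.2f}%, reported F1 = {reported_f1:.2f}%")
```

Under that assumption the computed value rounds to the reported F1, which supports reading the first figure as precision.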
Keywords: mineral resource reports; named entity recognition; pre-trained model; multi-feature fusion
Classification: P628.4 [Astronomy and Earth Sciences: Geological and Mineral Exploration]; TP391 [Astronomy and Earth Sciences: Geology]