Authors: QIU Qinjun (邱芹军); TIAN Miao (田苗); MA Kai (马凯); XIE Zhong (谢忠) [1,2]; JIN Xiangguo (金相国); DUAN Yuxi (段雨希) [5]; TAO Liufeng (陶留锋)
Affiliations: [1] School of Computer Science, China University of Geosciences (Wuhan), Wuhan 430074 [2] National Local Joint Engineering Laboratory of Geographic Information System, China University of Geosciences (Wuhan), Wuhan 430074 [3] Hubei Key Laboratory of Intelligent Vision Based Monitoring for Hydroelectric Engineering, China Three Gorges University, Yichang, Hubei 443002 [4] College of Computer and Information Technology, China Three Gorges University, Yichang, Hubei 443002 [5] National Engineering Research Center for Geographic Information System, China University of Geosciences (Wuhan), Wuhan 430074
Source: Geological Review (《地质论评》), 2023, No. 4, pp. 1423-1433 (11 pages)
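Funding: National Key Research and Development Program of China (No. 2022YFF0711601); National Natural Science Foundation of China (No. 42050101); China Postdoctoral Science Foundation (No. 2021M702991).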
Abstract: Geological survey reports are among the most important data sources in China's geological survey field. They contain rich geoscience knowledge and key information such as descriptions of geological bodies, and accurate, high-quality extraction of geological named entities provides the basis for geoscience knowledge graph construction, knowledge reasoning, and knowledge evolution. After describing the geological named entity recognition (NER) task, the authors observe that geological entities not only include many technical terms but also exhibit domain characteristics such as entity nesting and a large number of long entities, which make recognition harder. They propose a geological NER method that combines a lightweight pre-trained model (ALBERT), a bidirectional long short-term memory network (BiLSTM), and a conditional random field (CRF). ALBERT first models the contextual features of the input characters, BiLSTM then further encodes these contextual features, and the CRF finally predicts the label sequence. Experiments on a geological NER dataset constructed by the authors show that the proposed method achieves better extraction performance than mainstream NER models. The model can serve as a reference for entity recognition in other domains and supports entity relation extraction and knowledge graph construction in the geosciences.
Keywords: geological named entity recognition; lightweight pre-trained model; ALBERT; knowledge graph; geological reports
Classification number: P622 [Astronomy and Earth Sciences: Geological and Mineral Exploration]
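Note: the abstract names a CRF as the final step that predicts the label sequence. The sketch below illustrates that step only, using standard Viterbi decoding over per-character emission scores (which in the described pipeline would come from ALBERT followed by BiLSTM) and tag-to-tag transition scores. It is not the authors' implementation: the tag set `B-ROCK`/`I-ROCK`/`O`, the example string, and every score are made-up assumptions for illustration.

```python
from typing import Dict, List, Tuple

def viterbi_decode(emissions: List[Dict[str, float]],
                   transitions: Dict[Tuple[str, str], float],
                   tags: List[str]) -> List[str]:
    """Return the highest-scoring tag sequence for one sentence."""
    if not emissions:
        return []
    # best[t][tag] = score of the best path that ends in `tag` at position t
    best = [{tag: emissions[0][tag] for tag in tags}]
    back: List[Dict[str, str]] = []
    for t in range(1, len(emissions)):
        scores: Dict[str, float] = {}
        pointers: Dict[str, str] = {}
        for cur in tags:
            # Missing transition entries default to a score of 0.0.
            prev = max(tags, key=lambda p: best[-1][p] + transitions.get((p, cur), 0.0))
            scores[cur] = best[-1][prev] + transitions.get((prev, cur), 0.0) + emissions[t][cur]
            pointers[cur] = prev
        best.append(scores)
        back.append(pointers)
    # Follow the back-pointers from the best final tag.
    path = [max(tags, key=lambda tag: best[-1][tag])]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    return path[::-1]

def bio_to_spans(chars: List[str], tags: List[str]) -> List[Tuple[str, str]]:
    """Collect (entity text, entity type) pairs from character-level BIO tags."""
    spans: List[Tuple[str, str]] = []
    start, label = None, None
    for i, tag in enumerate(list(tags) + ["O"]):  # sentinel "O" flushes the last span
        continues = tag.startswith("I-") and tag[2:] == label
        if not continues:
            if start is not None:
                spans.append(("".join(chars[start:i]), label))
                start, label = None, None
            if tag.startswith("B-"):
                start, label = i, tag[2:]
    return spans

# Hypothetical demo data (not from the paper).
TAGS = ["B-ROCK", "I-ROCK", "O"]
CHARS = list("花岗岩发育")
EMISSIONS = [
    {"B-ROCK": 2.0, "I-ROCK": 0.1, "O": 0.5},
    {"B-ROCK": 0.2, "I-ROCK": 1.5, "O": 0.3},
    {"B-ROCK": 0.1, "I-ROCK": 1.2, "O": 0.4},
    {"B-ROCK": 0.0, "I-ROCK": 0.2, "O": 1.0},
    {"B-ROCK": 0.0, "I-ROCK": 0.6, "O": 0.5},
]
# A strong penalty makes the invalid move O -> I-ROCK effectively impossible.
TRANSITIONS = {("O", "I-ROCK"): -10.0}

if __name__ == "__main__":
    decoded = viterbi_decode(EMISSIONS, TRANSITIONS, TAGS)
    print(decoded)
    print(bio_to_spans(CHARS, decoded))
```

In this toy input, picking the best tag per character independently would label the last character `I-ROCK` right after an `O`, which is not a valid BIO sequence; the transition penalty steers the decoder to `O` instead. This is the usual motivation for placing a CRF on top of a BiLSTM encoder.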