Chinese named entity recognition for regional geological survey text  Cited by: 6

Authors: QIU Qinjun; TIAN Miao; MA Kai; XIE Zhong[1,2]; JIN Xiangguo; DUAN Yuxi[5]; TAO Liufeng

Affiliations: [1] School of Computer Science, China University of Geosciences (Wuhan), Wuhan 430074; [2] National Local Joint Engineering Laboratory of Geographic Information System, China University of Geosciences (Wuhan), Wuhan 430074; [3] Hubei Key Laboratory of Intelligent Vision Based Monitoring for Hydroelectric Engineering, China Three Gorges University, Yichang, Hubei 443002; [4] College of Computer and Information Technology, China Three Gorges University, Yichang, Hubei 443002; [5] National Engineering Research Center for Geographic Information System, China University of Geosciences (Wuhan), Wuhan 430074

Source: Geological Review, 2023, No. 4, pp. 1423-1433 (11 pages)

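Funding: This work is an output of the National Key Research and Development Program of China (No. 2022YFF0711601), the National Natural Science Foundation of China (No. 42050101), and the China Postdoctoral Science Foundation (No. 2021M702991).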

Abstract: Geological survey reports are among the most important data sources in China's geological survey field. They contain rich geoscience knowledge and key information such as descriptions of geological bodies. Accurate, high-quality extraction of geological named entities provides a foundation for building geoscience knowledge graphs and for knowledge reasoning and knowledge evolution. The authors first describe the geological named entity recognition (NER) task. They then show that geological entities include many domain-specific terms and also exhibit domain characteristics such as nested entities and a large share of long entities, which make geological NER harder still. The authors propose a geological NER method that combines three components: a lightweight pre-trained model (ALBERT), a bidirectional long short-term memory network (BiLSTM), and a conditional random field (CRF). ALBERT first models the contextual features of the input characters. BiLSTM then further encodes these contextual features, and finally the CRF predicts the label sequence. Experiments on the authors' geological NER dataset show that the proposed method extracts entities better than mainstream NER models. The proposed model can serve as a reference for domain-specific entity recognition. It also provides strong methodological support for entity relation extraction and knowledge graph construction in the geoscience domain.
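The abstract states that a CRF layer turns the BiLSTM's per-character features into a label sequence, but it does not give the decoding details. The following is a minimal sketch of Viterbi decoding for a linear-chain CRF, under several assumptions:

* The sketch assumes a character-level BIO tagging scheme.
* The tag set (including the `ROCK` label) is hypothetical and chosen for illustration only.
* The emission, transition, and start scores are also hypothetical. They stand in for BiLSTM outputs and learned CRF parameters and are not taken from the paper.
* ALBERT and the BiLSTM are not modelled.

```python
from typing import List, Optional, Sequence, Tuple


def viterbi_decode(
    emissions: Sequence[Sequence[float]],
    transitions: Sequence[Sequence[float]],
    start: Sequence[float],
) -> List[int]:
    """Highest-scoring tag path for a linear-chain CRF (assumes >= 1 character).

    emissions[t][j]   score of tag j at character t (stand-in for a BiLSTM output)
    transitions[i][j] score of tag i being followed by tag j
    start[j]          score of the sequence starting with tag j
    """
    n_tags = len(start)
    # score[j]: best total score of any partial path ending in tag j
    score = [start[j] + emissions[0][j] for j in range(n_tags)]
    backpointers: List[List[int]] = []
    for t in range(1, len(emissions)):
        new_score: List[float] = []
        pointers: List[int] = []
        for j in range(n_tags):
            # Best previous tag i for reaching tag j at step t
            best_i = max(range(n_tags), key=lambda i: score[i] + transitions[i][j])
            new_score.append(score[best_i] + transitions[best_i][j] + emissions[t][j])
            pointers.append(best_i)
        score = new_score
        backpointers.append(pointers)
    # Start from the best final tag and follow the back-pointers
    path = [max(range(n_tags), key=score.__getitem__)]
    for pointers in reversed(backpointers):
        path.append(pointers[path[-1]])
    path.reverse()
    return path


def bio_to_spans(tags: Sequence[str]) -> List[Tuple[str, int, int]]:
    """Turn BIO tags into (type, start, end_exclusive) spans.

    An I- tag that does not continue an open entity of the same type is
    treated like O (one common convention, assumed here).
    """
    spans: List[Tuple[str, int, int]] = []
    current: Optional[Tuple[str, int]] = None  # (type, start) of the open entity
    for i, tag in enumerate(tags):
        if tag.startswith("I-") and current is not None and current[0] == tag[2:]:
            continue  # the open entity keeps growing
        if current is not None:
            spans.append((current[0], current[1], i))
            current = None
        if tag.startswith("B-"):
            current = (tag[2:], i)
    if current is not None:
        spans.append((current[0], current[1], len(tags)))
    return spans


# Hypothetical tag set and scores, chosen only to illustrate decoding.
TAGS = ["O", "B-ROCK", "I-ROCK"]
START = [0.0, 0.0, -10.0]   # a sequence should not open with I-
TRANSITIONS = [
    [0.0, 0.0, -10.0],      # from O: O -> I is effectively forbidden
    [0.0, -1.0, 1.0],       # from B-ROCK
    [0.0, -1.0, 1.0],       # from I-ROCK
]
text = "出露花岗岩"          # "exposed granite"
EMISSIONS = [
    [2.0, 0.0, 0.0],
    [2.0, 0.5, 0.0],
    [0.0, 2.0, 0.5],
    [0.5, 0.0, 1.0],
    [1.2, 0.0, 1.0],        # on its own, the last character leans towards O
]

best = [TAGS[k] for k in viterbi_decode(EMISSIONS, TRANSITIONS, START)]
greedy = [TAGS[max(range(len(TAGS)), key=row.__getitem__)] for row in EMISSIONS]
print(best, [text[s:e] for _, s, e in bio_to_spans(best)])
print(greedy, [text[s:e] for _, s, e in bio_to_spans(greedy)])
```

With these made-up scores, taking the best tag for each character independently stops the entity at 花岗, because the last character alone scores higher as O. Viterbi decoding keeps the I-ROCK to I-ROCK transition bonus and recovers the full span 花岗岩. This is one illustration of how a CRF layer can help with the long entities the abstract mentions. A flat BIO scheme like this one represents only a single level of entities, so it does not by itself handle nesting.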

Keywords: geological named entity recognition; lightweight pre-trained model; ALBERT; knowledge graph; geological report

CLC number: P622 [Astronomy and Earth Sciences / Geology and Mineral Exploration]
