Development of a Named Entity Recognition Dataset Based on Four Regional Geological Survey Reports  Cited by: 5


Authors: Ma Kai (马凯) [1], Tian Miao (田苗), Tan Yongjian (谭永健), Wang Shu (王曙), Xie Zhong (谢忠), Qiu Qinjun (邱芹军)

Affiliations: [1] College of Computer and Information Technology, China Three Gorges University, Yichang 443002, China [2] School of Computer Science, China University of Geosciences (Wuhan), Wuhan 430074, China [3] National Engineering Research Center of Geographic Information System, Wuhan 430074, China [4] State Key Laboratory of Resources and Environmental Information System, Institute of Geographic Sciences and Natural Resources Research, Chinese Academy of Sciences, Beijing 100101, China

Source: Journal of Global Change Data & Discovery (《全球变化数据学报(中英文)》), 2022, No. 1, pp. 78-84, I0080-I0086 (14 pages)

Funding: National Natural Science Foundation of China (42050101, 41871311, U1711267).

Abstract: Regional geological survey reports are key technical documents that comprehensively record the results of regional geological survey work. The National Geological Archives of China has accumulated a very large collection of such reports, and applying information extraction and mining to them can uncover their implicit value and support the discovery of new knowledge. Targeting the named entity recognition (NER) task in natural language processing, this paper constructs an experimental NER dataset from four regional geological survey reports; the dataset can be used to train and test geological named entity models. Six typical categories of geological named entities are annotated: geological time, geological structure, stratum, rock, mineral, and location. Consistency checks, testing, and evaluation were carried out on the dataset to ensure its quality. The dataset is 4.84 MB in size and is stored as .txt text files.
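To make the annotation scheme concrete, the following is a minimal sketch of how entity spans might be read from such a text file. The abstract does not describe the file layout, so everything here is an assumption: one character and one BIO tag per line, blank lines between sentences, and hypothetical tag names (`TIME`, `ROCK`, and so on) standing in for the six categories. The sample sentence is invented for illustration and is not taken from the dataset.

```python
from collections import Counter
from typing import Iterable, Iterator, List, Tuple


def parse_bio_lines(lines: Iterable[str]) -> Iterator[Tuple[List[str], List[str]]]:
    """Yield (tokens, tags) per sentence from 'char tag' lines (assumed layout)."""
    tokens: List[str] = []
    tags: List[str] = []
    for line in lines:
        line = line.strip()
        if not line:  # a blank line is assumed to end a sentence
            if tokens:
                yield tokens, tags
                tokens, tags = [], []
            continue
        token, tag = line.split()
        tokens.append(token)
        tags.append(tag)
    if tokens:
        yield tokens, tags


def extract_entities(tokens: List[str], tags: List[str]) -> List[Tuple[str, str]]:
    """Collect (entity_text, label) pairs from parallel token and BIO tag lists."""
    entities: List[Tuple[str, str]] = []
    current: List[str] = []
    label = None
    for token, tag in zip(tokens, tags):
        if tag.startswith("B-"):
            if current:
                entities.append(("".join(current), label))
            current, label = [token], tag[2:]
        elif tag.startswith("I-") and current and tag[2:] == label:
            current.append(token)
        else:  # an 'O' tag or an inconsistent 'I-' tag closes the open span
            if current:
                entities.append(("".join(current), label))
            current, label = [], None
    if current:
        entities.append(("".join(current), label))
    return entities


def count_by_label(lines: Iterable[str]) -> Counter:
    """Count entities per category across all sentences."""
    counts: Counter = Counter()
    for tokens, tags in parse_bio_lines(lines):
        counts.update(label for _, label in extract_entities(tokens, tags))
    return counts


# Invented sample with hypothetical tag names, not real dataset content.
SAMPLE = [
    "侏 B-TIME", "罗 I-TIME", "纪 I-TIME",
    "花 B-ROCK", "岗 I-ROCK", "岩 I-ROCK",
    "",
    "含 O", "石 B-MINERAL", "英 I-MINERAL",
]

if __name__ == "__main__":
    print(count_by_label(SAMPLE))
```

Per-category counts of this kind are one simple way to inspect class balance before training; if the actual files use a different tagging scheme or separator, only `parse_bio_lines` would need to change.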

Keywords: regional geological survey report; named entity recognition; consistency check; testing; evaluation

Classification: P628 [Astronomy and Earth Sciences: Geological and Mineral Exploration]

 
