Authors: 马凯 [1], 田苗, 谭永健, 王曙, 谢忠, 邱芹军 (Ma, K.; Tian, M.; Tan, Y.J.; Wang, S.; Xie, Z.; Qiu, Q.J.)
Affiliations: [1] College of Computer and Information Technology, China Three Gorges University, Yichang 443002, China; [2] School of Computer Science, China University of Geosciences (Wuhan), Wuhan 430074, China; [3] National Engineering Research Center of Geographic Information System, Wuhan 430074, China; [4] State Key Laboratory of Resources and Environmental Information System, Institute of Geographic Sciences and Natural Resources Research, Chinese Academy of Sciences, Beijing 100101, China
Source: Journal of Global Change Data & Discovery (《全球变化数据学报(中英文)》), 2022, No. 1, pp. 78-84, I0080-I0086 (14 pages in total)
Funding: National Natural Science Foundation of China (42050101, 41871311, U1711267).
Abstract: Regional geological survey reports are important technical documents that comprehensively present the results of regional geological survey work. The National Geological Archives of China has accumulated a vast number of geological reports; extracting and mining information from them can uncover their implicit value and promote the discovery of new knowledge. For the named entity recognition (NER) task in natural language processing, this paper constructs an experimental NER dataset from four regional geological survey reports, which can be used to train and test geological named entity models. Six typical categories of geological named entities are annotated: geological time, geological structure, stratum, rock, mineral, and location. The dataset underwent consistency checking, testing, and evaluation to ensure its quality. The dataset is 4.84 MB in size and is stored as .txt text files.
Keywords: regional geological survey report; named entity recognition; consistency check; testing; evaluation
Classification: P628 [Astronomy and Earth Sciences: Geological and Mineral Exploration]