Authors: CHEN Zhongliang (陈忠良), YUAN Feng (袁峰) [1], LI Xiaohui (李晓晖) [1], ZHANG Mingming (张明明) [1]
Affiliations: [1] School of Resources and Environment Engineering, Hefei University of Technology, Hefei 230009; [2] Geological Survey of Anhui Province, Hefei 230001
Source: Geological Review (《地质论评》), 2022, No. 2, pp. 742-750 (9 pages)
Funding: Supported by the National Natural Science Foundation of China (Nos. 41820104007, 42072321, 41872247).
Abstract: Geological survey is moving from digitization towards intelligent processing. Guided by big data thinking, this shift calls for machine reading of unstructured data and automatic extraction of geological knowledge. Joint extraction of geological named entities and relations is a core problem in this area and remains difficult. This paper applies a BERT-BiLSTM-CRF method, built on BERT, a large-scale pre-trained Chinese language model, to the joint extraction of named entities and relations from lithological description text. First, a sentence-level corpus of rock descriptions was built from the section measurement and route geological observation data produced in digital geological mapping with the digital geological survey information system designed by the China Geological Survey (CGS). Second, guided by petrological theory, the composition of rock knowledge was analyzed, a schema of named entities and relations for a rock knowledge graph was designed, and the corpus was manually annotated. Third, deep learning training and comparative ablation experiments for knowledge extraction were carried out on the annotated corpus. The results show that the pre-trained BERT model is well suited to knowledge extraction from lithological description text. The recommended BERT-BiLSTM-CRF model achieved an F1 score of 91.75% on the joint extraction of lithological named entities and relations, and an F1 score of 97.38% on named entity recognition. The ablation experiments show that the BERT-based embedding layer contributes markedly to knowledge extraction performance, and that the bidirectional long short-term memory (BiLSTM) layer improves the performance of joint entity and relation extraction.
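The abstract reports results as F1 scores but does not say how they were computed. As a rough illustration only, the sketch below assumes BIO-style tags and exact-match scoring of entity spans, which is a common convention for named entity recognition. The tag names `ROCK` and `COLOR` and the function names are hypothetical and are not taken from the paper.

```python
from typing import List, Set, Tuple

# (start index, end index exclusive, entity type); all labels here are hypothetical
Span = Tuple[int, int, str]


def bio_to_spans(tags: List[str]) -> Set[Span]:
    """Collect entity spans from one BIO-tagged sentence (assumed scheme)."""
    spans: Set[Span] = set()
    start, label = None, None
    for i, tag in enumerate(tags + ["O"]):  # sentinel "O" closes a trailing entity
        continues = tag.startswith("I-") and label == tag[2:]
        if start is not None and not continues:
            spans.add((start, i, label))
            start, label = None, None
        if tag.startswith("B-"):
            start, label = i, tag[2:]
    return spans


def entity_f1(gold: List[List[str]], pred: List[List[str]]) -> float:
    """Micro-averaged F1 over exactly matching entity spans."""
    gold_spans, pred_spans = set(), set()
    for idx, (g, p) in enumerate(zip(gold, pred)):
        gold_spans |= {(idx, *s) for s in bio_to_spans(g)}
        pred_spans |= {(idx, *s) for s in bio_to_spans(p)}
    true_pos = len(gold_spans & pred_spans)
    precision = true_pos / len(pred_spans) if pred_spans else 0.0
    recall = true_pos / len(gold_spans) if gold_spans else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Toy example: two gold entities, one of them predicted correctly.
gold = [["B-ROCK", "I-ROCK", "O", "B-COLOR"]]
pred = [["B-ROCK", "I-ROCK", "O", "O"]]
print(round(entity_f1(gold, pred), 3))  # prints 0.667
```

In the toy example precision is 1.0 and recall is 0.5, so F1 is 2/3. A span counts as correct only when its boundaries and type both match the gold annotation.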
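Keywords: big data thinking; deep learning; pre-trained Chinese language model; named entity recognition; relation extraction
Classification code: P628 [Astronomy and Earth Sciences: Geological and Mineral Exploration]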