GeoNER: Geological Named Entity Recognition with Enriched Domain Pre-Training Model and Adversarial Training


Authors: MA Kai, HU Xinxin, TIAN Miao, TAN Yongjian, ZHENG Shuai, TAO Liufeng, QIU Qinjun

Affiliations: [1] Hubei Key Laboratory of Intelligent Vision Based Monitoring for Hydroelectric Engineering, China Three Gorges University, Yichang, Hubei 443002, China; [2] College of Computer and Information Technology, China Three Gorges University, Yichang, Hubei 443002, China; [3] Key Laboratory of Geological Survey and Evaluation of Ministry of Education, China University of Geosciences, Wuhan 430074, China; [4] School of Computer Science, China University of Geosciences, Wuhan 430074, China; [5] Key Laboratory of Quantitative Resource Evaluation and Information Engineering, Ministry of Natural Resources, China University of Geosciences, Wuhan 430074, China

Source: Acta Geologica Sinica (English Edition) [地质学报(英文版)], 2024, Issue 5, pp. 1404-1417 (14 pages)

Funding: This work was financially supported by the Natural Science Foundation of China (Grant No. 42301492); the National Key R&D Program of China (Grant Nos. 2022YFF0711600, 2022YFF0801201, 2022YFF0801200); the Major Special Project of Xinjiang (Grant No. 2022A03009-3); the Open Fund of Key Laboratory of Urban Land Resources Monitoring and Simulation, Ministry of Natural Resources (Grant No. KF-2022-07014); the Opening Fund of the Key Laboratory of the Geological Survey and Evaluation of the Ministry of Education (Grant No. GLAB 2023ZR01); and the Fundamental Research Funds for the Central Universities.

Abstract: As important geological data, geological reports contain rich expert and geological knowledge, but a central challenge for current research on geological knowledge extraction and mining is how to understand these reports accurately under the guidance of domain knowledge. Generic named entity recognition models and tools can be used to process geoscience reports and documents, but they lack domain-specific knowledge, which leads to a pronounced decline in recognition accuracy. This study summarizes six types of typical geological entities with reference to the ontological system of the geological domain and builds a high-quality corpus for the task of geological named entity recognition (GNER). In addition, GeoWoBERT-advBGP (Geological Word-base BERT-adversarial training Bi-directional Long Short-Term Memory Global Pointer) is proposed to address the ambiguity, diversity and nesting of geological entities. The model first uses the fine-tuned, word-granularity-based pre-training model GeoWoBERT (Geological Word-base BERT) and combines the text features extracted by the BiLSTM (Bi-directional Long Short-Term Memory). An adversarial training algorithm is then applied to improve the robustness of the model and its resistance to interference, and decoding is finally performed with a global association pointer algorithm. The experimental results show that the proposed model achieves high performance on the constructed dataset and is capable of mining the rich geological information.
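To make the decoding step more concrete, the following is a minimal sketch of how a global-pointer-style decoder can return nested entities. It assumes the decoder yields one score per entity type for every (start, end) token pair, which is how Global Pointer-style heads are commonly described; the abstract does not give these details. The function name `decode_spans`, the entity labels `TIME`, `ROCK` and `STRATUM`, and all score values are hypothetical and are not taken from the paper.

```python
from typing import List, Tuple

# (entity type, start index, end index, surface text); names are hypothetical.
Span = Tuple[str, int, int, str]


def decode_spans(
    tokens: List[str],
    scores: List[List[List[float]]],
    labels: List[str],
    threshold: float = 0.0,
) -> List[Span]:
    """Return every span whose score exceeds the threshold.

    scores[t][i][j] is the score that tokens[i..j] (inclusive) form an
    entity of type labels[t]. Each span is scored independently, so nested
    and overlapping entities can be returned together.
    """
    spans: List[Span] = []
    for t, label in enumerate(labels):
        for i in range(len(tokens)):
            # Only the upper triangle is read: a span needs start <= end.
            for j in range(i, len(tokens)):
                if scores[t][i][j] > threshold:
                    spans.append((label, i, j, " ".join(tokens[i:j + 1])))
    return spans


# Toy input with made-up scores (a trained model would produce these).
tokens = ["Upper", "Jurassic", "sandstone"]
labels = ["TIME", "ROCK", "STRATUM"]
n = len(tokens)
scores = [[[-1.0] * n for _ in range(n)] for _ in labels]
scores[0][0][1] = 2.3  # "Upper Jurassic" as TIME
scores[1][2][2] = 1.7  # "sandstone" as ROCK
scores[2][0][2] = 0.9  # "Upper Jurassic sandstone" as STRATUM (contains both)

for span in decode_spans(tokens, scores, labels):
    print(span)
```

Because each (start, end) pair is decided on its own, the outer span and the two spans nested inside it are all returned, which a single tag-per-token sequence labeller cannot express directly.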

Keywords: geological named entity recognition; geological report; adversarial training; confrontation training; global pointer; pre-training model

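Classification codes: TP391.1 [Automation and Computer Technology: Computer Application Technology]; P628 [Automation and Computer Technology: Computer Science and Technology]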

 
