Construction Method of Interactive Geological Entity Annotation Corpus Based on BERT  (Cited by: 6)


Authors: ZHANG Chun-ju [1,2]; ZHANG Lei; CHEN Yu-bing [3]; LIU Wen-cong; BO Jia-chen; XIAO Hong-fei

Affiliations: [1] College of Civil Engineering, Hefei University of Technology, Hefei 230009, Anhui, China; [2] Planning and Natural Resources Bureau of Shenzhen Municipality, Shenzhen 518034, Guangdong, China; [3] Fiberhome Communication Technology Co., Ltd., Nanjing 210019, Jiangsu, China

Source: Geography and Geo-Information Science, 2022, No. 4, pp. 7-12 (6 pages)

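Funding: Open Fund of the Key Laboratory of Urban Land Resources Monitoring and Simulation, Ministry of Natural Resources (KF-2020-05-084); National Natural Science Foundation of China (42171453).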

Abstract: Geological entity recognition is an important foundation for geological text mining and geological knowledge graph construction, and a high-quality geological entity corpus is a key factor in recognition performance. However, few annotated corpora exist for Chinese geological entity recognition, and those that do cover only a limited range of domains; traditional manual annotation is time-consuming, labor-intensive, and dependent on expert knowledge. This paper therefore studies a BERT-based interactive method for annotating geological entities: a BERT-BiLSTM-CRF model automatically annotates the geological entities in text, the results are corrected through human-computer interaction, and the annotated text is then used to expand the original corpus and improve the recognition model. Experiments show that the BERT-BiLSTM-CRF model outperforms three commonly used models (CRF, Word2vec-BiLSTM-CRF, and Lattice-LSTM-CRF), reaching an F1 value of 91.47% on the authors' self-built initial geological entity corpus; after the corpus is expanded, the F1 value increases by 1.36%. The method reduces manual annotation work while maintaining quality, and can support the construction of large-scale, high-quality geological entity annotation corpora.
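The annotate-then-correct workflow and the F1 evaluation mentioned in the abstract can be illustrated with a minimal sketch. It makes several assumptions that are not stated in this record: labels follow the BIO scheme, F1 is computed at the entity level with exact span matching, and the entity types `ROCK` and `MIN` are hypothetical. The `tagger` and `review` callables are placeholders standing in for the trained BERT-BiLSTM-CRF model and the human annotator; model training and retraining are outside the sketch.

```python
from typing import Callable, List, Set, Tuple

Span = Tuple[int, int, str]  # (start, end_exclusive, entity_type)
Example = Tuple[List[str], List[str]]  # (characters, BIO tags)


def bio_to_spans(tags: List[str]) -> Set[Span]:
    """Collect entity spans from a BIO tag sequence (assumed scheme)."""
    spans: Set[Span] = set()
    start, label = None, None
    for i, tag in enumerate(tags + ["O"]):  # sentinel closes a trailing entity
        inside = tag.startswith("I-") and label == tag[2:]
        if not inside:
            if label is not None:
                spans.add((start, i, label))
            # Only a B- tag opens a new entity; stray I- tags are ignored.
            start, label = (i, tag[2:]) if tag.startswith("B-") else (None, None)
    return spans


def entity_f1(gold: List[List[str]], pred: List[List[str]]) -> float:
    """Entity-level F1 with exact span and type matching."""
    tp = n_gold = n_pred = 0
    for g, p in zip(gold, pred):
        gs, ps = bio_to_spans(g), bio_to_spans(p)
        tp += len(gs & ps)
        n_gold += len(gs)
        n_pred += len(ps)
    if tp == 0:
        return 0.0
    precision, recall = tp / n_pred, tp / n_gold
    return 2 * precision * recall / (precision + recall)


def annotate_round(
    tagger: Callable[[List[str]], List[str]],
    sentences: List[List[str]],
    review: Callable[[List[str], List[str]], List[str]],
    corpus: List[Example],
) -> List[Example]:
    """One interactive round: auto-tag, let a reviewer correct, grow the corpus."""
    for chars in sentences:
        proposed = tagger(chars)            # model proposes labels
        corrected = review(chars, proposed)  # human accepts or fixes them
        corpus.append((chars, corrected))
    return corpus  # the enlarged corpus would then be used to retrain the tagger
```

In use, each call to `annotate_round` would be followed by retraining the tagger on the enlarged corpus and re-scoring it with `entity_f1` on held-out data, so that later rounds need fewer manual corrections.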

Keywords: BERT; geological entity recognition; interactive; geological entity corpus

Classification: TP391.1 [Automation and Computer Technology: Computer Application Technology]

 
