Authors: ZHANG Chun-ju (张春菊) [1,2], ZHANG Lei (张磊), CHEN Yu-bing (陈玉冰) [3], LIU Wen-cong (刘文聪), BO Jia-chen (薄嘉晨), XIAO Hong-fei (肖鸿飞)
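Affiliations: [1] College of Civil Engineering, Hefei University of Technology, Hefei 230009, Anhui, China; [2] Planning and Natural Resources Bureau of Shenzhen Municipality, Shenzhen 518034, Guangdong, China; [3] Fiberhome Communication Technology Co., Ltd., Nanjing 210019, Jiangsu, China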
Source: Geography and Geo-Information Science (《地理与地理信息科学》), 2022, No. 4, pp. 7-12 (6 pages)
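Funding: Open Fund of the Key Laboratory of Urban Land Resources Monitoring and Simulation, Ministry of Natural Resources (KF-2020-05-084); National Natural Science Foundation of China (42171453).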
Abstract: Geological entity recognition underpins geological text mining and the construction of geological knowledge graphs, and a high-quality geological entity corpus is an important factor in recognition performance. However, annotated corpora for Chinese geological entity recognition are scarce and limited to a narrow range of domains, while traditional manual annotation is time-consuming, labor-intensive, and dependent on expert knowledge. This paper therefore studies a BERT-based interactive method for annotating geological entities: a BERT-BiLSTM-CRF model automatically annotates the geological entities in text, the results are corrected through human-computer interaction, and the annotated corpus is then used to enlarge the original corpus and improve the recognition model. Experiments show that the BERT-BiLSTM-CRF model outperforms three commonly used models (CRF, Word2vec-BiLSTM-CRF, and Lattice-LSTM-CRF), reaching an F1 score of 91.47% on the authors' self-built initial geological entity corpus; after the corpus was enlarged, the F1 score rose by a further 1.36%. The method reduces manual annotation effort while maintaining quality, and can support the construction of large-scale, high-quality annotated geological entity corpora.
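The annotate, correct, and expand cycle described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the abstract gives no code, so every name here (`stub_tagger`, `annotate_round`, `update_lexicon`, `simulated_review`, the `ROCK` label) is hypothetical. A longest-match lexicon lookup stands in for the BERT-BiLSTM-CRF model, and adding corrected entities to the lexicon stands in for retraining on the enlarged corpus.

```python
def stub_tagger(tokens, lexicon):
    """Stand-in for the trained model: longest-match lexicon lookup, BIO output."""
    tags = ["O"] * len(tokens)
    i = 0
    while i < len(tokens):
        end, label = None, None
        # Try the longest candidate span first.
        for j in range(len(tokens), i, -1):
            phrase = "".join(tokens[i:j])
            if phrase in lexicon:
                end, label = j, lexicon[phrase]
                break
        if end is None:
            i += 1
            continue
        tags[i] = "B-" + label
        for k in range(i + 1, end):
            tags[k] = "I-" + label
        i = end
    return tags


def extract_entities(tokens, tags):
    """Collect (surface form, label) pairs from a BIO-tagged sentence."""
    entities, start, label = [], None, None
    for i, tag in enumerate(list(tags) + ["O"]):  # sentinel closes a trailing entity
        if start is not None and not tag.startswith("I-"):
            entities.append(("".join(tokens[start:i]), label))
            start = None
        if tag.startswith("B-"):
            start, label = i, tag[2:]
    return entities


def annotate_round(sentences, lexicon, review):
    """Pre-annotate each sentence, then let a reviewer correct the proposal."""
    corrected = []
    for tokens in sentences:
        proposed = stub_tagger(tokens, lexicon)
        final = review(tokens, proposed)  # human-in-the-loop step
        corrected.append((tokens, final))
    return corrected


def update_lexicon(lexicon, examples):
    """Stand-in for retraining: absorb entities from the corrected examples."""
    for tokens, tags in examples:
        for surface, label in extract_entities(tokens, tags):
            lexicon[surface] = label


def simulated_review(tokens, proposed):
    """Simulated annotator who adds one entity (玄武岩, basalt) the tagger missed."""
    fixed = list(proposed)
    pos = "".join(tokens).find("玄武岩")
    if pos >= 0:
        fixed[pos:pos + 3] = ["B-ROCK", "I-ROCK", "I-ROCK"]
    return fixed


lexicon = {"花岗岩": "ROCK"}  # initial knowledge: granite only
corpus = annotate_round([list("该区出露花岗岩和玄武岩")], lexicon, simulated_review)
update_lexicon(lexicon, corpus)
print(extract_entities(*corpus[0]))
print(stub_tagger(list("玄武岩分布广泛"), lexicon))
```

In a real pipeline the reviewer would be an interactive annotation interface rather than a function, and `update_lexicon` would be replaced by fine-tuning the sequence labeling model on the corrected sentences. The sketch only shows how each round's corrections feed the next round's automatic proposals.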
Classification: TP391.1 [Automation and Computer Technology: Computer Application Technology]