A new land cover named entity recognition method considering limited corpus


Authors: CHENG Dayu; QIAO Jie; ZHU Xiuli[2,3]; ZHANG Zhaojiang; LIU Wanzeng

Affiliations: [1] School of Mining and Geomatics Engineering, Hebei University of Engineering, Handan, Hebei 056038, China; [2] National Geomatics Center of China, Beijing 100036, China; [3] Key Laboratory of Spatiotemporal Information and Intelligent Services, Beijing 100036, China

Source: Science of Surveying and Mapping (《测绘科学》), 2024, Issue 8, pp. 153-163 (11 pages)

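Funding: Special Project for Science and Technology Basic Resources Investigation (2019FY202503); Hebei Province Major Scientific and Technological Achievements Transformation Project (22287401Z).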

Abstract: To address the shortage of labeled data for land cover named entity recognition (NER), this paper proposes a few-shot NER method based on a pre-trained model, in which domain keywords are introduced to strengthen the model's understanding of entity naming patterns. The method uses a MacBERT layer to extract deep semantic information, a BiLSTM layer to capture long-distance contextual dependencies, and a CRF layer to optimize the global consistency of the sequence labels. In the early stage of model training, land cover domain keywords and text data are collected, and data augmentation is applied to increase data diversity and avoid overfitting. A training strategy that fuses domain keywords with text data is then introduced to improve the model's generalization in the land cover domain and its ability to identify entity boundaries. Experiments show that, compared with the BERT model, the proposed model improves precision by 2.5%, recall by 1.62%, and the F1 score by 2.04%; compared with the baseline MacBERT-BiLSTM-CRF model, it improves precision by 0.43%, recall by 1.67%, and the F1 score by 2.38%.
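The abstract does not say which data augmentation technique is applied. As a rough illustration of how domain keywords could diversify a small labeled corpus, the sketch below assumes entity replacement: each labeled entity in a character-level BIO-tagged sentence is swapped for a keyword drawn from a domain list, and the tags are rebuilt to match. The keyword list, the `LC` label, and the function names are hypothetical and are not taken from the paper.

```python
import random

# Hypothetical land cover keyword list; the paper's actual keyword set is not given.
# Glosses: cropland, forest land, grassland, shrubland, wetland, water body.
LAND_COVER_KEYWORDS = ["耕地", "林地", "草地", "灌木地", "湿地", "水体"]


def extract_spans(tags):
    """Return (start, end) pairs (end exclusive) for each entity span in BIO tags."""
    spans, start = [], None
    for i, tag in enumerate(tags):
        if tag.startswith("B-"):
            if start is not None:
                spans.append((start, i))  # close the previous entity
            start = i
        elif tag.startswith("I-") and start is not None:
            continue  # still inside the current entity
        else:
            if start is not None:
                spans.append((start, i))
            start = None
    if start is not None:
        spans.append((start, len(tags)))
    return spans


def replace_entities(chars, tags, keywords, rng):
    """Swap each labeled entity for a random keyword and re-tag it per character."""
    new_chars, new_tags, cursor = [], [], 0
    for start, end in extract_spans(tags):
        # Copy the unlabeled context before the entity unchanged.
        new_chars.extend(chars[cursor:start])
        new_tags.extend(tags[cursor:start])
        label = tags[start][2:]  # keep the original entity type
        keyword = rng.choice(keywords)
        new_chars.extend(keyword)
        new_tags.extend([f"B-{label}"] + [f"I-{label}"] * (len(keyword) - 1))
        cursor = end
    # Copy whatever follows the last entity.
    new_chars.extend(chars[cursor:])
    new_tags.extend(tags[cursor:])
    return new_chars, new_tags


# Example sentence: "The forest land area in this region has decreased."
chars = list("该区域林地面积减少")
tags = ["O", "O", "O", "B-LC", "I-LC", "O", "O", "O", "O"]
rng = random.Random(0)  # fixed seed so the augmentation is reproducible
aug_chars, aug_tags = replace_entities(chars, tags, LAND_COVER_KEYWORDS, rng)
print("".join(aug_chars), aug_tags)
```

Because the replacement keyword may differ in length from the original entity, the tags are regenerated rather than copied, which keeps characters and labels aligned for the downstream sequence labeler.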

Keywords: land cover; named entity recognition; few-shot; domain keywords; deep learning; pre-trained language model

CLC number: P208 [Astronomy and Earth Sciences: Cartography and Geographic Information Engineering]

 
