Geological Entity Recognition Based on ELMO-CNN-BiLSTM-CRF Model

Times cited: 23

Authors: Chu Deping, Wan Bo[1,2], Li Hong, Fang Fang[1,2], Wang Run[1,2] (School of Geography and Information Engineering, China University of Geosciences, Wuhan 430078, China; National Engineering Research Center of Geographic Information System, Wuhan 430078, China)

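Affiliations: [1] School of Geography and Information Engineering, China University of Geosciences, Wuhan 430078, Hubei, China; [2] National Engineering Research Center of Geographic Information System, Wuhan 430078, Hubei, China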

Source: Earth Science, 2021, No. 8, pp. 3039-3048 (10 pages)

Funding: National Key Research and Development Program of China (No. 2016YFB0502300); China Geological Survey project (No. 12120114074001).

Abstract: Geological entities carry the key, core information in geological texts. Recognizing them accurately is therefore an important prerequisite for geological information extraction and mining. This paper designs an ELMO-CNN-BiLSTM-CRF model, which builds a deep BiLSTM-CRF neural network on pre-trained character vectors. The model adds dynamic (context-dependent) word features from ELMO and character-level word features from a CNN. These additions compensate for the lack of specificity in character vectors. They improve both the recognition of complex polysemous words in geological text and the extraction of local features of geological entities. The model was evaluated on the Exploration Geological Report of the Xiongcun Copper Mine, Xietongmen County, Xizang Autonomous Region. It achieved a precision of 95.15%, a recall of 95.26% and an F1 score of 95.21%. Experiments show that the model outperforms the BiLSTM-CRF and CNN-BiLSTM-CRF models on geological entity recognition with a small-scale corpus. It also effectively recognizes long geological entity terms and polysemous geological terms.
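Named entity recognition results such as the precision, recall and F1 above are usually computed at the entity level rather than per character. F1 is the harmonic mean, F1 = 2PR / (P + R). The following is a minimal sketch, assuming BIO tagging and exact span matching; the abstract does not state the tagging scheme or the matching criterion. The entity types `ROCK` and `MINERAL` are hypothetical placeholders, not the paper's label set.

```python
from typing import List, Optional, Set, Tuple

# (entity type, start index, end index exclusive)
Span = Tuple[str, int, int]


def bio_spans(tags: List[str]) -> Set[Span]:
    """Collect entity spans from one BIO tag sequence, e.g. ["B-ROCK", "I-ROCK", "O"].

    An I- tag that does not continue an entity of the same type is treated
    leniently as the start of a new entity.
    """
    spans: Set[Span] = set()
    etype: Optional[str] = None
    start = 0
    for i, tag in enumerate(tags + ["O"]):  # trailing "O" closes the last open entity
        if tag.startswith("I-") and tag[2:] == etype:
            continue  # still inside the current entity
        if etype is not None:
            spans.add((etype, start, i))
        etype = None if tag == "O" else tag[2:]
        start = i
    return spans


def prf1(gold: List[List[str]], pred: List[List[str]]) -> Tuple[float, float, float]:
    """Micro-averaged, exact-match entity precision, recall and F1 over sentences."""
    tp = n_pred = n_gold = 0
    for g, p in zip(gold, pred):
        gold_spans, pred_spans = bio_spans(g), bio_spans(p)
        tp += len(gold_spans & pred_spans)  # same type and same boundaries
        n_pred += len(pred_spans)
        n_gold += len(gold_spans)
    precision = tp / n_pred if n_pred else 0.0
    recall = tp / n_gold if n_gold else 0.0
    # F1 is the harmonic mean of precision and recall
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


if __name__ == "__main__":
    gold = [["B-ROCK", "I-ROCK", "O", "B-MINERAL"]]
    pred = [["B-ROCK", "I-ROCK", "O", "O"]]  # misses the MINERAL entity
    print(prf1(gold, pred))  # -> (1.0, 0.5, 0.6666666666666666)
```

Under exact matching, a long entity whose boundary is off by a single character counts as both a false positive and a false negative. This is one reason long geological terms are hard to score well on.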
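In a BiLSTM-CRF tagger, the BiLSTM produces a score for each tag at each position (here, each character). The CRF layer adds tag-to-tag transition scores, so decoding picks the best whole tag sequence rather than the best tag at each position. The following sketches a generic linear-chain Viterbi decoder, not the authors' code. The toy scores, the tag names, and the omission of start/end transition scores are all assumptions for illustration.

```python
from typing import Dict, List, Tuple


def viterbi(emissions: List[Dict[str, float]],
            transitions: Dict[Tuple[str, str], float],
            tags: List[str]) -> List[str]:
    """Highest-scoring tag path for a linear-chain CRF.

    emissions[t][y]: score of tag y at position t (e.g. produced by a BiLSTM).
    transitions[(a, b)]: score for tag a followed by tag b; missing pairs score 0.
    Start/end transition scores are omitted for brevity.
    """
    score = {y: emissions[0][y] for y in tags}  # best path score ending in y
    backptrs: List[Dict[str, str]] = []
    for t in range(1, len(emissions)):
        new_score: Dict[str, float] = {}
        ptr: Dict[str, str] = {}
        for y in tags:
            # best previous tag a for reaching tag y at position t
            prev, val = max(
                ((a, score[a] + transitions.get((a, y), 0.0)) for a in tags),
                key=lambda pair: pair[1],
            )
            new_score[y] = val + emissions[t][y]
            ptr[y] = prev
        score = new_score
        backptrs.append(ptr)
    # follow back-pointers from the best final tag
    path = [max(tags, key=lambda y: score[y])]
    for ptr in reversed(backptrs):
        path.append(ptr[path[-1]])
    return path[::-1]


if __name__ == "__main__":
    tags = ["O", "B-ROCK", "I-ROCK"]
    emissions = [
        {"O": 1.0, "B-ROCK": 0.8, "I-ROCK": 0.1},
        {"O": 0.2, "B-ROCK": 0.1, "I-ROCK": 1.0},
    ]
    # large penalty forbids the invalid transition O -> I-ROCK
    transitions = {("O", "I-ROCK"): -1e4, ("B-ROCK", "I-ROCK"): 0.5}
    print(viterbi(emissions, transitions, tags))  # -> ['B-ROCK', 'I-ROCK']
```

Taking the best tag at each position would give `O` followed by `I-ROCK`, which is an invalid BIO sequence. The transition scores let the decoder return `B-ROCK`, `I-ROCK` instead.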

Keywords: geological big data; geological entity; named entity recognition; ELMO-CNN-BiLSTM-CRF; geological text; mathematical geology

CLC number: P628.4 [Astronomy and Earth Sciences: Geological and Mineral Exploration]

 
