Chinese named entity recognition with multi-network fusion of multi-scale lexical information  

在线阅读下载全文

作  者:Yan Guo Hong-Chen Liu Fu-Jiang Liu Wei-Hua Lin Quan-Sen Shao Jun-Shun Su 

机构地区:[1]School of Computer Science,China University of Geosciences,Wuhan,430078,China [2]School of Geography and Information Engineering,China University of Geosciences,Wuhan,430078,China [3]Xining Comprehensive Natural Resources Survey Centre,China Geological Survey(CGS),Xining,810000,China

出  处:《Journal of Electronic Science and Technology》2024年第4期53-80,共28页电子科技学刊(英文版)

基  金:supported by the International Research Center of Big Data for Sustainable Development Goals under Grant No.CBAS2022GSP05;the Open Fund of State Key Laboratory of Remote Sensing Science under Grant No.6142A01210404;the Hubei Key Laboratory of Intelligent Geo-Information Processing under Grant No.KLIGIP-2022-B03.

摘  要:Named entity recognition(NER)is an important part in knowledge extraction and one of the main tasks in constructing knowledge graphs.In today’s Chinese named entity recognition(CNER)task,the BERT-BiLSTM-CRF model is widely used and often yields notable results.However,recognizing each entity with high accuracy remains challenging.Many entities do not appear as single words but as part of complex phrases,making it difficult to achieve accurate recognition using word embedding information alone because the intricate lexical structure often impacts the performance.To address this issue,we propose an improved Bidirectional Encoder Representations from Transformers(BERT)character word conditional random field(CRF)(BCWC)model.It incorporates a pre-trained word embedding model using the skip-gram with negative sampling(SGNS)method,alongside traditional BERT embeddings.By comparing datasets with different word segmentation tools,we obtain enhanced word embedding features for segmented data.These features are then processed using the multi-scale convolution and iterated dilated convolutional neural networks(IDCNNs)with varying expansion rates to capture features at multiple scales and extract diverse contextual information.Additionally,a multi-attention mechanism is employed to fuse word and character embeddings.Finally,CRFs are applied to learn sequence constraints and optimize entity label annotations.A series of experiments are conducted on three public datasets,demonstrating that the proposed method outperforms the recent advanced baselines.BCWC is capable to address the challenge of recognizing complex entities by combining character-level and word-level embedding information,thereby improving the accuracy of CNER.Such a model is potential to the applications of more precise knowledge extraction such as knowledge graph construction and information retrieval,particularly in domain-specific natural language processing tasks that require high entity recognition precision.

关 键 词:Bi-directional long short-term memory(BiLSTM) Chinese named entity recognition(CNER) Iterated dilated convolutional neural network(IDCNN) Multi-network integration Multi-scale lexical features 

分 类 号:TP18[自动化与计算机技术—控制理论与控制工程] TP391.1[自动化与计算机技术—控制科学与工程]

 

参考文献:

正在载入数据...

 

二级参考文献:

正在载入数据...

 

耦合文献:

正在载入数据...

 

引证文献:

正在载入数据...

 

二级引证文献:

正在载入数据...

 

同被引文献:

正在载入数据...

 

相关期刊文献:

正在载入数据...

相关的主题
相关的作者对象
相关的机构对象