Fast recognition method of short text named entities considering feature sparsity

Authors: MA Yue-kun (马月坤), HAO Yi-feng (郝益锋)

Affiliations: [1] College of Artificial Intelligence, North China University of Science and Technology, Tangshan 063210, Hebei, China; [2] Hebei Provincial Key Laboratory of Industrial Intelligent Perception, North China University of Science and Technology, Tangshan 063210, Hebei, China; [3] School of Computer & Communication Engineering, University of Science & Technology Beijing, Beijing 100083, China; [4] Beijing Key Laboratory of Knowledge Engineering for Materials Science, University of Science & Technology Beijing, Beijing 100083, China

Source: Journal of Jilin University (Engineering and Technology Edition), 2023, No. 12, pp. 3529-3535 (7 pages)

Funding: Fundamental Research Funds for the Central Universities (FRF-DF-20-04); Hebei Province "333 Talent Project" (A201803083).

Abstract: First, appropriate features are selected by filtering punctuation marks and are used to construct vectors; sequences of two or more words that form a specific meaning are segmented, parts of speech are labeled, and the corresponding word classes are identified. Second, a latent Dirichlet allocation (LDA) model is used to relate topics to documents, which clarifies document topics and reduces the sparsity of the data features. Third, the proposed bidirectional long short-term memory conditional random field (BR-BiLSTM-CRF) model detects the boundaries of named entities with a bidirectional LSTM and combines them with the entity types output by a chain conditional random field layer; with lexical and part-of-speech features added, entity boundaries are detected over the whole text sequence. Finally, the network parameters are corrected using cross entropy and gradient descent until the error does not exceed a specified value, which completes named entity recognition. Experimental results show that the proposed method is fast and accurate and performs well overall; it allows a computer to determine the parts of speech in a text more reliably and improves the accuracy and efficiency of named entity recognition.
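The abstract does not give the model's equations, so the following is only a minimal sketch of how a chain CRF layer can combine per-token scores with tag-transition scores to recover entity boundaries. It uses standard Viterbi decoding, not the authors' implementation. The tag set, the tokens, and every number in `emissions` and `transitions` are hypothetical values chosen for illustration; in a BiLSTM-CRF the emission scores would come from the bidirectional LSTM.

```python
from typing import Dict, List

# Hypothetical BIO tag set for a single entity type (assumption, not from the paper).
TAGS = ["O", "B-LOC", "I-LOC"]


def viterbi(emissions: List[Dict[str, float]],
            transitions: Dict[str, Dict[str, float]],
            tags: List[str]) -> List[str]:
    """Return the highest-scoring tag sequence under a chain model.

    emissions[t][tag]      : score of `tag` at position t (stand-in for BiLSTM output)
    transitions[prev][cur] : score of moving from tag `prev` to tag `cur`
    """
    if not emissions:
        return []
    # best[t][tag] holds the best score of any path that ends in `tag` at position t.
    best = [{tag: emissions[0][tag] for tag in tags}]
    back = []  # back[t - 1][tag] is the best previous tag for `tag` at position t
    for t in range(1, len(emissions)):
        scores, pointers = {}, {}
        for cur in tags:
            prev = max(tags, key=lambda p: best[-1][p] + transitions[p][cur])
            scores[cur] = best[-1][prev] + transitions[prev][cur] + emissions[t][cur]
            pointers[cur] = prev
        best.append(scores)
        back.append(pointers)
    # Follow the back-pointers from the best final tag to the start.
    path = [max(tags, key=lambda tag: best[-1][tag])]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    return path[::-1]


# Toy short text and made-up scores (all values are illustrative assumptions).
tokens = ["visit", "New", "York", "today"]
emissions = [
    {"O": 2.0, "B-LOC": -1.0, "I-LOC": -1.0},
    {"O": 0.5, "B-LOC": 0.4, "I-LOC": 0.6},
    {"O": 0.0, "B-LOC": 0.2, "I-LOC": 1.5},
    {"O": 2.0, "B-LOC": -1.0, "I-LOC": -1.0},
]
transitions = {
    "O": {"O": 0.0, "B-LOC": 0.0, "I-LOC": -10.0},  # an entity cannot start with I-LOC
    "B-LOC": {"O": 0.0, "B-LOC": -1.0, "I-LOC": 1.0},
    "I-LOC": {"O": 0.0, "B-LOC": -1.0, "I-LOC": 0.5},
}

# Per-token argmax ignores transitions; Viterbi scores the whole sequence.
greedy = [max(TAGS, key=lambda tag: e[tag]) for e in emissions]
decoded = viterbi(emissions, transitions, TAGS)
print(list(zip(tokens, greedy)))
print(list(zip(tokens, decoded)))
```

With these toy scores, per-token argmax labels "New" as `I-LOC` directly after an `O` tag, which is not a valid entity start. Sequence-level decoding assigns `B-LOC` instead, because the transition scores penalize that move. This is the sense in which a chain CRF layer can sharpen boundary detection over the whole sequence.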

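Keywords: natural language processing; feature sparsity; short text naming; fast recognition of short text entities; text preprocessing; feature weight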

Classification number: TP391.1 [Automation and Computer Technology: Computer Application Technology]

 
