Chinese Word Segmentation Model Based on Attention-BIGRU-CRF

Cited by: 2

Authors: ZHOU Hui; XU Ming-hai [1]; XU Xiao-dong (College of Communication and Information Engineering, Nanjing University of Posts and Telecommunications, Nanjing 210003, China)

Affiliation: [1] College of Communication and Information Engineering, Nanjing University of Posts and Telecommunications, Nanjing 210003, Jiangsu, China

Source: Computer and Modernization (《计算机与现代化》), 2022, Issue 8, pp. 7-12 and 19 (7 pages)

Abstract: Natural language processing (NLP) is an important branch of artificial intelligence, and Chinese word segmentation is the first step of NLP; improving segmentation performance improves the accuracy of downstream NLP results. This paper therefore proposes an Attention-BIGRU-CRF model. First, the Chinese text is converted into vector form through word embeddings; a bidirectional GRU (BiGRU) then performs sequence learning over these vectors. Next, an attention mechanism computes the correlation between the BiGRU's input and output to obtain more accurate vector representations. Finally, the attention output is concatenated with the BiGRU's sequence output and fed into the CRF layer, which produces the tag predictions. Simulation results show that the Attention-BIGRU-CRF model achieves F1 values of 97.34% and 98.25% on the People's Daily 2014 and MSRA corpora, respectively, and segments text at 248.1 KB/s. The model fusing the attention mechanism with the BIGRU-CRF network thus improves both segmentation accuracy and segmentation speed.
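The pipeline the abstract describes (embedding, BiGRU, attention between the BiGRU's input and output, concatenation, CRF) can be expressed compactly. Below is a minimal sketch in PyTorch using the third-party pytorch-crf package; the four-tag BMES scheme, all layer sizes, and the dot-product attention between projected embeddings and BiGRU states are assumptions, since the abstract does not specify these details.

```python
# Minimal sketch of the Attention-BIGRU-CRF pipeline described in the abstract.
# Assumptions (not stated in the abstract): BMES tagging (4 tags), layer sizes,
# and dot-product attention between projected embeddings and BiGRU outputs.
import torch
import torch.nn as nn
from torchcrf import CRF  # pip install pytorch-crf


class AttentionBiGRUCRF(nn.Module):
    def __init__(self, vocab_size, embed_dim=128, hidden_dim=128, num_tags=4):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim, padding_idx=0)
        self.bigru = nn.GRU(embed_dim, hidden_dim,
                            bidirectional=True, batch_first=True)
        # Project embeddings so they match the 2*hidden_dim BiGRU outputs.
        self.query_proj = nn.Linear(embed_dim, 2 * hidden_dim)
        # Emission scores over tags from [BiGRU output ; attention context].
        self.emit = nn.Linear(4 * hidden_dim, num_tags)
        self.crf = CRF(num_tags, batch_first=True)

    def _emissions(self, token_ids):
        x = self.embed(token_ids)                    # (B, T, E)  BiGRU input
        h, _ = self.bigru(x)                         # (B, T, 2H) BiGRU output
        q = self.query_proj(x)                       # (B, T, 2H)
        # Attention: correlate the BiGRU input (queries) with its output.
        scores = torch.bmm(q, h.transpose(1, 2))     # (B, T, T)
        context = torch.bmm(scores.softmax(-1), h)   # (B, T, 2H)
        # Concatenate BiGRU output with the attention context for the CRF.
        return self.emit(torch.cat([h, context], dim=-1))  # (B, T, num_tags)

    def loss(self, token_ids, tags, mask):
        # Negative log-likelihood under the CRF; mask is a bool (B, T) tensor.
        return -self.crf(self._emissions(token_ids), tags,
                         mask=mask, reduction='mean')

    def decode(self, token_ids, mask):
        # Viterbi decoding returns one tag list per (unpadded) sequence.
        return self.crf.decode(self._emissions(token_ids), mask=mask)
```

In the usual character-tagging formulation of Chinese word segmentation, each character receives one of the BMES tags (begin, middle, end, single-character word), and the CRF's learned transition matrix penalizes illegal sequences such as an M tag immediately following an S tag.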

Keywords: natural language processing; bidirectional gated recurrent unit; conditional random field; attention mechanism; Chinese word segmentation

Classification: TP301 [Automation and Computer Technology - Computer System Architecture]
