基于BERT的中文地址分词方法 (cited by: 3)

Chinese address segment method based on BERT

Authors: SUN Shiqi (孙士琦); TANG Kun (汤鲲) (Wuhan Research Institute of Posts and Telecommunications, Wuhan 430074, China; Fiber Home Starry Sky Co., Ltd., Nanjing 210000, China)

Affiliations: [1] Wuhan Research Institute of Posts and Telecommunications, Wuhan, Hubei 430074 [2] Fiber Home Starry Sky Co., Ltd., Nanjing, Jiangsu 210000

Source: Electronic Design Engineering (《电子设计工程》), 2021, No. 9, pp. 155-159 (5 pages)

Abstract: To address the poor accuracy and low recognition rate of traditional Chinese address segmentation, this paper proposes a Chinese address segmentation method based on BERT. Address labels for non-administrative levels are redesigned, and a BERT-BiLSTM-CRF model is built to recast Chinese address segmentation as a named entity recognition task. BERT is trained on a large volume of nationwide address data to obtain abstract text features; a bidirectional long short-term memory (BiLSTM) network then processes the text as a sequence and uses context to extract further features; finally, a conditional random field (CRF) selects the optimal label sequence, from which the correct address levels are extracted. On the training dataset used, the method achieves a precision of 98.21% and an F1 score of 98.23, demonstrating its effectiveness.
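
As a rough illustration of the last two ideas in the abstract (treating segmentation as per-character tagging, and having a CRF pick the best label sequence), the sketch below runs standard Viterbi decoding over toy scores and then groups the resulting tags into address segments. Everything concrete here is an assumption made for the example: the BIO label set (`B-CITY`, `I-CITY`, `B-DIST`, `I-DIST`), the hand-written emission and transition scores, and the helper names are hypothetical and are not taken from the paper. In the described model the emission scores would come from the BERT-BiLSTM layers and the transition scores would be learned by the CRF.

```python
from typing import Dict, List, Tuple

# Hypothetical BIO label set, for illustration only (not the paper's tag scheme).
LABELS = ["B-CITY", "I-CITY", "B-DIST", "I-DIST"]


def viterbi(emissions: List[Dict[str, float]],
            transitions: Dict[Tuple[str, str], float],
            labels: List[str] = LABELS) -> List[str]:
    """Return the highest-scoring label path for a linear-chain model."""
    if not emissions:
        return []
    # best[i][y]: best score of any path that ends with label y at position i.
    best = [{y: emissions[0][y] for y in labels}]
    back: List[Dict[str, str]] = []
    for emit_scores in emissions[1:]:
        scores, pointers = {}, {}
        for y in labels:
            prev = max(labels,
                       key=lambda p: best[-1][p] + transitions.get((p, y), 0.0))
            scores[y] = (best[-1][prev] + transitions.get((prev, y), 0.0)
                         + emit_scores[y])
            pointers[y] = prev
        best.append(scores)
        back.append(pointers)
    # Follow the back-pointers from the best final label.
    path = [max(labels, key=lambda y: best[-1][y])]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    return path[::-1]


def tags_to_segments(text: str, tags: List[str]) -> List[Tuple[str, str]]:
    """Group per-character BIO tags into (segment, label) pairs."""
    if len(text) != len(tags):
        raise ValueError("text and tags must have the same length")
    segments: List[Tuple[str, str]] = []
    for ch, tag in zip(text, tags):
        label = "O" if tag == "O" else tag[2:]
        continues = (tag.startswith("I-") and bool(segments)
                     and segments[-1][1] == label)
        if continues:
            segments[-1] = (segments[-1][0] + ch, label)
        else:
            segments.append((ch, label))
    return segments


def emit(favoured: Dict[str, float]) -> Dict[str, float]:
    """Build one character's emission scores; unlisted labels score 0."""
    return {y: favoured.get(y, 0.0) for y in LABELS}


TEXT = "武汉市洪山区"
# Toy emission scores. The fifth character slightly prefers the wrong label.
EMISSIONS = [
    emit({"B-CITY": 2.0}), emit({"I-CITY": 2.0}), emit({"I-CITY": 2.0}),
    emit({"B-DIST": 2.0}), emit({"I-CITY": 1.0, "I-DIST": 0.8}),
    emit({"I-DIST": 2.0}),
]
# Toy transition scores that penalise switching entity type inside a segment.
TRANSITIONS = {
    ("B-CITY", "I-DIST"): -10.0, ("I-CITY", "I-DIST"): -10.0,
    ("B-DIST", "I-CITY"): -10.0, ("I-DIST", "I-CITY"): -10.0,
}

tags = viterbi(EMISSIONS, TRANSITIONS)
segments = tags_to_segments(TEXT, tags)
print(segments)
```

In this toy input, picking the top-scoring label for each character independently would tag the fifth character `I-CITY`; the transition penalties lead Viterbi to `I-DIST` instead, which is the kind of sequence-level consistency a CRF layer is meant to supply.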

Keywords: BERT; Chinese address segmentation; long short-term memory network; conditional random field; named entity recognition

Classification code: TP391.1 [Automation and Computer Technology - Computer Application Technology]
