Authors: SUN Shiqi (孙士琦); TANG Kun (汤鲲) (Wuhan Research Institute of Posts and Telecommunications, Wuhan 430074, China; Fiber Home Starry Sky Co., Ltd., Nanjing 210000, China)
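Affiliations: [1] Wuhan Research Institute of Posts and Telecommunications (武汉邮电科学研究院), Wuhan, Hubei 430074 [2] Fiber Home Starry Sky Co., Ltd. (南京烽火星空通信发展有限公司), Nanjing, Jiangsu 210000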
Source: Electronic Design Engineering (《电子设计工程》), 2021, No. 9, pp. 155-159 (5 pages)
Abstract: To address the poor accuracy and low recognition rate of traditional Chinese address segmentation, a BERT-based Chinese address segmentation method is proposed. The labels for non-administrative address levels are redesigned, and a BERT-BiLSTM-CRF model is constructed to recast Chinese address segmentation as a named entity recognition task. BERT is trained on a large volume of nationwide address data to obtain abstract text features; a bidirectional long short-term memory network models the text as a sequence and draws on context to extract further features; a conditional random field then yields the optimal tag sequence, from which the correct address levels are extracted. On the training dataset used, the method achieves a precision of 98.21% and an F1 value of 98.23, demonstrating its effectiveness.
Keywords: BERT; Chinese address segmentation; long short-term memory network; conditional random field; named entity recognition
Classification code: TP391.1 [Automation and Computer Technology: Computer Application Technology]
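Illustrative note: the abstract describes recasting address segmentation as named entity recognition, which in practice means assigning one tag per character. The sketch below shows one common way to do that with a BIO scheme and how to recover segments from predicted tags. It is only a minimal illustration: the level names (`PROV`, `CITY`, `DISTRICT`), the helper functions, and the sample address are hypothetical and are not taken from the paper, whose actual label set is not given in this record.

```python
from typing import List, Tuple

Segment = Tuple[str, str]   # (text, address level), e.g. ("武汉市", "CITY")
Tagged = Tuple[str, str]    # (character, BIO tag), e.g. ("武", "B-CITY")


def segments_to_bio(segments: List[Segment]) -> List[Tagged]:
    """Expand labelled address segments into per-character BIO tags."""
    tagged: List[Tagged] = []
    for text, level in segments:
        for i, ch in enumerate(text):
            prefix = "B" if i == 0 else "I"  # B marks the first character
            tagged.append((ch, f"{prefix}-{level}"))
    return tagged


def bio_to_segments(tagged: List[Tagged]) -> List[Segment]:
    """Merge per-character BIO tags back into labelled segments."""
    segments: List[Segment] = []
    for ch, tag in tagged:
        prefix, _, level = tag.partition("-")
        # Start a new segment on "B" or whenever the level changes.
        if prefix == "B" or not segments or segments[-1][1] != level:
            segments.append((ch, level))
        else:
            text, lvl = segments[-1]
            segments[-1] = (text + ch, lvl)
    return segments


# Hypothetical sample address, split by hand into three levels.
example = [("湖北省", "PROV"), ("武汉市", "CITY"), ("洪山区", "DISTRICT")]
tags = segments_to_bio(example)
restored = bio_to_segments(tags)
```

In a BERT-BiLSTM-CRF pipeline of the kind the abstract outlines, a tag sequence such as `tags` would serve as the training target, and a function like `bio_to_segments` would turn the decoded tag sequence back into address segments.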