Authors: RUAN Qiming; GUO Yi [1,2,3]; ZHENG Nan; WANG Yexiang
Affiliations: [1] School of Information Science and Engineering, East China University of Science and Technology, Shanghai 200237, China; [2] National Engineering Laboratory for Big Data Distribution and Exchange Technologies - Business Intelligence and Visualization Research Center, Shanghai 200436, China; [3] Shanghai Engineering Research Center of Big Data & Internet Audience, Shanghai 200072, China
Source: Journal of Computer Applications, 2022, Issue 1, pp. 71-77 (7 pages)
Fund: Scientific Research Program of the Shanghai Science and Technology Commission (17DZ1101003, 180Z2252300).
Abstract: In customs goods declaration scenarios, a classification model is needed to map declared goods to standardized Harmonized System (HS) codes. However, existing customs goods classification models ignore the positional information of words in the text to be classified, and because HS codes number in the tens of thousands, they suffer from sparse class vectors and slow model convergence. To address these problems, a classification model based on Hierarchical Multi-task BERT (HM-BERT) was proposed, which draws on the manual level-by-level classification strategy used in real business scenarios and makes full use of the hierarchical structure of HS codes. On the one hand, the dynamic word vectors of the BERT (Bidirectional Encoder Representations from Transformers) model capture the positional information in the declared goods text; on the other hand, the category information at the different levels of the HS code is used for multi-task training of the BERT model, improving both classification accuracy and convergence. The model was validated on the 2019 declaration dataset of a domestic customs declaration service provider: compared with the BERT model, HM-BERT improves accuracy by 2 percentage points and also trains faster; compared with the likewise hierarchical H-fastText, it improves accuracy by 7.1 percentage points. Experimental results show that the HM-BERT model can effectively improve the classification of customs declaration goods.
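To illustrate the hierarchical multi-task idea described in the abstract, the following is a minimal sketch, not the authors' released implementation: a shared BERT encoder with one classification head per HS-code level, trained with a weighted sum of per-level cross-entropy losses. The pretrained checkpoint name, the three level sizes, and the loss weights are placeholder assumptions for illustration only.

```python
import torch
import torch.nn as nn
from transformers import BertModel, BertTokenizer


class HierarchicalMultiTaskBert(nn.Module):
    """One shared BERT encoder with one classification head per HS-code level (sketch)."""

    def __init__(self, pretrained="bert-base-chinese",
                 num_classes_per_level=(97, 1200, 10000)):  # placeholder level sizes
        super().__init__()
        self.bert = BertModel.from_pretrained(pretrained)
        hidden = self.bert.config.hidden_size
        # Coarse-to-fine heads, e.g. HS chapter / heading / full code.
        self.heads = nn.ModuleList(nn.Linear(hidden, n) for n in num_classes_per_level)

    def forward(self, input_ids, attention_mask):
        # Pooled [CLS] vector from BERT's contextual, position-aware embeddings.
        pooled = self.bert(input_ids=input_ids,
                           attention_mask=attention_mask).pooler_output
        return [head(pooled) for head in self.heads]


def multi_task_loss(logits_per_level, labels_per_level, weights=(1.0, 1.0, 1.0)):
    # Weighted sum of one cross-entropy loss per hierarchy level.
    ce = nn.CrossEntropyLoss()
    return sum(w * ce(logits, labels)
               for w, logits, labels in zip(weights, logits_per_level, labels_per_level))


if __name__ == "__main__":
    tokenizer = BertTokenizer.from_pretrained("bert-base-chinese")
    model = HierarchicalMultiTaskBert()
    batch = tokenizer(["stainless steel vacuum flask, 500 ml"],
                      padding=True, truncation=True, return_tensors="pt")
    level_logits = model(batch["input_ids"], batch["attention_mask"])
    labels = [torch.tensor([0]), torch.tensor([0]), torch.tensor([0])]  # dummy labels
    print(multi_task_loss(level_logits, labels))
```

Sharing a single encoder across the coarse and fine heads is what lets the coarse-level supervision act as an auxiliary signal for the sparse fine-grained HS labels, which is the mechanism the abstract credits for the accuracy and convergence gains.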
Keywords: customs code (HS code); multi-task learning; text classification; BERT; vector sparsity
Classification code: TP391.1 [Automation and Computer Technology - Computer Application Technology]