Authors: RUAN Qiming; GUO Yi [1,2,3]; ZHENG Nan; WANG Yexiang
Affiliations: [1] School of Information Science and Engineering, East China University of Science and Technology, Shanghai 200237, China; [2] National Engineering Laboratory for Big Data Distribution and Exchange Technologies - Business Intelligence and Visualization Research Center, Shanghai 200436, China; [3] Shanghai Engineering Research Center of Big Data & Internet Audience, Shanghai 200072, China
Source: Journal of Computer Applications, 2022, Issue 1, pp. 71-77 (7 pages)
Fund: Scientific Research Program of the Shanghai Science and Technology Commission (17DZ1101003, 180Z2252300).
Abstract: In customs goods declaration scenarios, a classification model is needed to map declared goods to standardized Harmonized System (HS) codes. However, existing customs goods classification models ignore the positional information of words in the text to be classified, and because HS codes number in the tens of thousands, they suffer from sparse class vectors and slow model convergence. To address these problems, a classification model based on Hierarchical Multi-task BERT (HM-BERT) was proposed, which draws on the manual level-by-level classification strategy used in real business scenarios and makes full use of the hierarchical structure of HS codes. On the one hand, the dynamic word vectors of the BERT (Bidirectional Encoder Representations from Transformers) model capture the positional information in the declared goods text; on the other hand, the category information at the different levels of the HS code is used for multi-task training of the BERT model, improving both classification accuracy and convergence. The model was validated on the 2019 declaration dataset of a domestic customs declaration service provider: compared with the BERT model, HM-BERT improves accuracy by 2 percentage points and also trains faster; compared with the likewise hierarchical H-fastText, it improves accuracy by 7.1 percentage points. Experimental results show that the HM-BERT model can effectively improve the classification of customs declaration goods.
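To illustrate the hierarchical multi-task idea described in the abstract, the following is a minimal sketch, not the authors' released implementation: a shared BERT encoder with one classification head per HS-code level, trained with a weighted sum of per-level cross-entropy losses. The pretrained checkpoint name, the three level sizes, and the loss weights are placeholder assumptions for illustration only.

```python
import torch
import torch.nn as nn
from transformers import BertModel, BertTokenizer


class HierarchicalMultiTaskBert(nn.Module):
    """One shared BERT encoder with one classification head per HS-code level (sketch)."""

    def __init__(self, pretrained="bert-base-chinese",
                 num_classes_per_level=(97, 1200, 10000)):  # placeholder level sizes
        super().__init__()
        self.bert = BertModel.from_pretrained(pretrained)
        hidden = self.bert.config.hidden_size
        # Coarse-to-fine heads, e.g. HS chapter / heading / full code.
        self.heads = nn.ModuleList(nn.Linear(hidden, n) for n in num_classes_per_level)

    def forward(self, input_ids, attention_mask):
        # Pooled [CLS] vector from BERT's contextual, position-aware embeddings.
        pooled = self.bert(input_ids=input_ids,
                           attention_mask=attention_mask).pooler_output
        return [head(pooled) for head in self.heads]


def multi_task_loss(logits_per_level, labels_per_level, weights=(1.0, 1.0, 1.0)):
    # Weighted sum of one cross-entropy loss per hierarchy level.
    ce = nn.CrossEntropyLoss()
    return sum(w * ce(logits, labels)
               for w, logits, labels in zip(weights, logits_per_level, labels_per_level))


if __name__ == "__main__":
    tokenizer = BertTokenizer.from_pretrained("bert-base-chinese")
    model = HierarchicalMultiTaskBert()
    batch = tokenizer(["stainless steel vacuum flask, 500 ml"],
                      padding=True, truncation=True, return_tensors="pt")
    level_logits = model(batch["input_ids"], batch["attention_mask"])
    labels = [torch.tensor([0]), torch.tensor([0]), torch.tensor([0])]  # dummy labels
    print(multi_task_loss(level_logits, labels))
```

Sharing a single encoder across the coarse and fine heads is what lets the coarse-level supervision act as an auxiliary signal for the sparse fine-grained HS labels, which is the mechanism the abstract credits for the accuracy and convergence gains.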
Keywords: customs code (HS code); multi-task learning; text classification; BERT; vector sparsity
Classification code: TP391.1 [Automation and Computer Technology - Computer Application Technology]