Authors: 陈进东 (CHEN Jin-dong), 胡超 (HU Chao), 郝凌霄 (HAO Ling-xiao), 曹丽娜 (CAO Li-na)
Affiliations: [1] School of Economics and Management, Beijing Information Science and Technology University, Beijing 100192, China; [2] Beijing International Science and Technology Cooperation Base for Intelligent Decision and Big Data Application, Beijing 100192, China; [3] Computer School, Beijing Information Science and Technology University, Beijing 100192, China; [4] School of Economics and Management, North China Institute of Science and Technology, Langfang 065201, China
Source: Science Technology and Engineering (《科学技术与工程》), 2024, No. 34, pp. 14754-14764 (11 pages)
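Funding: National Key R&D Program of China (2019YFB1405303); Beijing Municipal Universities Outstanding Young Talent Cultivation Program (BPHR202203233); National Natural Science Foundation of China (72174018).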
Abstract: Identifying the entity information in clothing quality sampling inspection notices is important for assessing clothing quality in different regions and for formulating macro-level policies. Named entity recognition (NER) on these notices suffers from information loss over long text sequences and from incomplete feature learning for entity classes with few samples. To address these problems, a domain NER model centered on the attention mechanism was proposed, based on BERT (bidirectional encoder representations from transformers) and TENER (transformer encoder for NER). The BERT-TENER model first obtains dynamic character vectors from the pre-trained BERT model. These vectors are then fed into the TENER module, where the attention mechanism allows the same character to undergo different learning processes, and an improved Transformer further captures the distance and direction information between characters, strengthening the model's understanding of texts of different lengths and of small entity categories. Finally, a conditional random field (CRF) assigns an entity label to each character. On the domain dataset, the BERT-TENER model reaches an entity recognition F1 of 92.45% in the clothing sampling inspection domain, effectively improving the recognition rate compared with traditional methods, and it also performs well on long texts and on imbalanced entity categories. The model's applicability to other domains has not yet been demonstrated.
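The abstract states that the TENER module captures both the distance and the direction between characters. A minimal sketch of how this is done in the original TENER paper's relative-position attention (not this article's "improved Transformer", whose details are not given in this record): the score between positions t and j adds terms built from a sinusoidal encoding of the signed offset t - j, and the usual 1/sqrt(d) scaling is dropped. Assumptions: a single head, Q, K and V supplied directly as plain lists (no learned projections), and the two bias vectors fixed as inputs although they are learned in practice; the second bias is named `w` here (hypothetical name) to avoid clashing with the value matrix `v`.

```python
import math
from typing import List

Vector = List[float]
Matrix = List[Vector]


def relative_encoding(offset: int, dim: int) -> Vector:
    """Sinusoidal encoding of the signed offset t - j (dim must be even).

    sin is odd and cos is even, so the encodings of +k and -k differ in their
    sin components; this is what lets the scores distinguish left from right.
    """
    enc: Vector = []
    for i in range(dim // 2):
        freq = 10000 ** (-2 * i / dim)
        enc.append(math.sin(offset * freq))
        enc.append(math.cos(offset * freq))
    return enc


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def softmax(scores: Vector) -> Vector:
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def relative_attention(q: Matrix, k: Matrix, v: Matrix, u: Vector, w: Vector) -> Matrix:
    """Un-scaled relative-position attention for one head (sketch).

    score[t][j] = q_t.k_j + q_t.R(t-j) + u.k_j + w.R(t-j), softmax over j,
    with no 1/sqrt(d) factor. Returns one output row per position t.
    """
    n, d = len(q), len(q[0])
    out: Matrix = []
    for t in range(n):
        scores = []
        for j in range(n):
            r = relative_encoding(t - j, d)
            scores.append(dot(q[t], k[j]) + dot(q[t], r) + dot(u, k[j]) + dot(w, r))
        weights = softmax(scores)
        # each output row is a weighted average of the value rows
        out.append([sum(weights[j] * v[j][c] for j in range(n)) for c in range(len(v[0]))])
    return out
```

As a quick sanity check, passing identical value rows returns that same row at every position, since each output is a convex combination of the values.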
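The final step, in which a conditional random field assigns an entity label to each character, is typically carried out at inference time by Viterbi decoding over per-character emission scores and tag-to-tag transition scores. A minimal sketch, assuming a plain linear-chain CRF without start/end transitions, a BIO tag scheme, and hypothetical tag names such as `B-ORG` (the article's actual entity types are not listed in this record); in the full model the emission scores would come from the TENER encoder and the transitions would be learned.

```python
from typing import List


def viterbi_decode(emissions: List[List[float]],
                   transitions: List[List[float]],
                   tags: List[str]) -> List[str]:
    """Highest-scoring tag sequence under a linear-chain CRF (sketch).

    emissions[t][y]: score of tag y at position t
    transitions[a][b]: score of moving from tag a to tag b
    """
    n_tags = len(tags)
    score = list(emissions[0])  # best path score ending in each tag so far
    backpointers: List[List[int]] = []
    for t in range(1, len(emissions)):
        new_score, ptrs = [], []
        for b in range(n_tags):
            # best previous tag a for reaching tag b at position t
            best_a = max(range(n_tags), key=lambda a: score[a] + transitions[a][b])
            new_score.append(score[best_a] + transitions[best_a][b] + emissions[t][b])
            ptrs.append(best_a)
        score = new_score
        backpointers.append(ptrs)
    # follow the back-pointers from the best final tag
    best = max(range(n_tags), key=lambda y: score[y])
    path = [best]
    for ptrs in reversed(backpointers):
        best = ptrs[best]
        path.append(best)
    path.reverse()
    return [tags[i] for i in path]


tags = ["O", "B-ORG", "I-ORG"]  # hypothetical tag set
emissions = [[2.0, 0.5, 0.1], [0.1, 1.0, 1.2], [0.2, 0.1, 1.5]]
transitions = [[0.0, 0.0, -10.0],  # from O: O -> I-ORG is heavily penalized
               [0.0, 0.0, 0.0],
               [0.0, 0.0, 0.0]]
print(viterbi_decode(emissions, transitions, tags))  # ['O', 'B-ORG', 'I-ORG']
```

Per-character argmax would label the second character `I-ORG`, producing an entity with no beginning; the transition penalty makes the decoder choose `B-ORG` instead. This sequence-level consistency is the usual reason for placing a CRF on top of the encoder.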
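This record does not state exactly how the reported F1 of 92.45% was computed. One common convention for NER is micro-averaged, exact-match entity-level scoring, where precision P and recall R are counted over whole entities and F1 = 2PR / (P + R). A minimal sketch of that convention, assuming BIO tags (the article's tagging scheme and evaluation protocol are assumptions here, and the helper names are hypothetical):

```python
from typing import List, Optional, Set, Tuple

Span = Tuple[str, int, int]  # (entity type, start index, end index exclusive)


def bio_spans(tags: List[str]) -> Set[Span]:
    """Collect entity spans from a BIO tag sequence (assumed scheme)."""
    spans: Set[Span] = set()
    etype: Optional[str] = None
    start = 0
    for i, tag in enumerate(tags + ["O"]):  # sentinel "O" flushes the last entity
        if tag.startswith("I-") and tag[2:] == etype:
            continue  # a matching I- tag extends the current entity
        if etype is not None:
            spans.add((etype, start, i))
        if tag == "O":
            etype = None
        else:  # B- tag, or an I- tag that starts a new entity
            etype, start = tag[2:], i
    return spans


def entity_f1(gold: List[str], pred: List[str]) -> float:
    """Micro-averaged F1 over exactly matching entity spans."""
    g, p = bio_spans(gold), bio_spans(pred)
    tp = len(g & p)
    precision = tp / len(p) if p else 0.0
    recall = tp / len(g) if g else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)
```

Under this convention a prediction that finds one of two gold entities exactly, with no false positives, has P = 1, R = 0.5 and F1 = 2/3, so a missed entity in a rare category lowers recall even when everything predicted is correct.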
Keywords: named entity recognition; clothing quality sampling inspection notice; BERT (bidirectional encoder representations from transformers); TENER (transformer encoder for NER)
CLC number: TP391.1 [Automation and Computer Technology / Computer Application Technology]