Multi-label Classification Model of Chinese Short Texts Based on Deep Learning


Authors: CAO Zhen [1]; GUO Panfeng (Wuhan Research Institute of Posts and Telecommunications, Wuhan 430074)

Affiliation: [1] Wuhan Research Institute of Posts and Telecommunications, Wuhan 430074

Source: Computer & Digital Engineering, 2024, No. 6, pp. 1809-1814 (6 pages)

Abstract: Chinese short texts are difficult for conventional multi-label classification algorithms to distinguish effectively because of their short length, diverse structure, and lack of context. To address this problem, this paper proposes CRC-MHA, a deep-learning-based multi-label classification model for Chinese short texts. In the text representation layer, CRC-MHA abandons the conventional static word embedding with Word2vec and instead uses BERT to produce dynamic embeddings for the input sentence, drawing on massive pre-training corpora to better represent contextual semantics. In the feature extraction layer, it designs a parallel feature extraction strategy that combines a CNN, an RCNN, and multi-head self-attention, strengthening the capture of key features inside the text to improve multi-label classification. Experimental results show that CRC-MHA improves the weighted average F1 score by 1.95% over the BERT model, 0.42% over BERT-CNN, and 0.34% over BERT-RCNN, verifying the model's effectiveness.
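The parallel feature-extraction idea described above can be sketched as follows. This is a minimal NumPy illustration of the data flow only, under several assumptions: weights are random rather than trained, the BERT embeddings are replaced by random vectors, and the RCNN branch is simplified to a left/right context-mean concatenation. Each branch is pooled to a fixed-size vector, the three vectors are concatenated, and a sigmoid output layer yields independent per-label probabilities, as is standard for multi-label classification.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def cnn_branch(X, kernel_size=3, n_filters=8):
    """1-D convolution over the token axis, ReLU, then max-over-time pooling."""
    seq_len, d = X.shape
    W = rng.standard_normal((kernel_size * d, n_filters)) / np.sqrt(kernel_size * d)
    windows = np.stack([X[i:i + kernel_size].ravel()
                        for i in range(seq_len - kernel_size + 1)])
    feats = np.maximum(windows @ W, 0.0)           # ReLU
    return feats.max(axis=0)                       # (n_filters,)

def rcnn_branch(X, n_out=8):
    """Simplified RCNN: each token is concatenated with the running means of
    its left and right context (a stand-in for bidirectional recurrence),
    projected through tanh, then max-pooled."""
    seq_len, d = X.shape
    def left_means(M):
        cum = np.cumsum(M, axis=0) / np.arange(1, len(M) + 1)[:, None]
        return np.vstack([np.zeros((1, M.shape[1])), cum[:-1]])
    left = left_means(X)
    right = left_means(X[::-1])[::-1]
    ctx = np.hstack([left, X, right])              # (seq_len, 3*d)
    W = rng.standard_normal((3 * d, n_out)) / np.sqrt(3 * d)
    return np.tanh(ctx @ W).max(axis=0)            # (n_out,)

def mha_branch(X, num_heads=4):
    """Scaled dot-product multi-head self-attention, then mean pooling."""
    seq_len, d = X.shape
    assert d % num_heads == 0
    d_k = d // num_heads
    heads = []
    for _ in range(num_heads):
        Wq = rng.standard_normal((d, d_k)) / np.sqrt(d)
        Wk = rng.standard_normal((d, d_k)) / np.sqrt(d)
        Wv = rng.standard_normal((d, d_k)) / np.sqrt(d)
        Q, K, V = X @ Wq, X @ Wk, X @ Wv
        A = softmax(Q @ K.T / np.sqrt(d_k))        # (seq_len, seq_len)
        heads.append(A @ V)
    H = np.concatenate(heads, axis=1)              # (seq_len, d)
    return H.mean(axis=0)                          # (d,)

def crc_mha_forward(X, n_labels=5):
    """Concatenate the three parallel branches; sigmoid gives one independent
    probability per label (threshold, e.g. at 0.5, to assign labels)."""
    f = np.concatenate([cnn_branch(X), rcnn_branch(X), mha_branch(X)])
    Wo = rng.standard_normal((f.size, n_labels)) / np.sqrt(f.size)
    return sigmoid(f @ Wo)

X = rng.standard_normal((12, 16))   # 12 tokens, 16-dim embeddings (BERT stand-in)
probs = crc_mha_forward(X)          # one probability per label
```

In the paper's setting the input embeddings would come from BERT's final hidden states and all weights would be trained jointly with a per-label binary cross-entropy loss; the sketch only shows how the three branches run in parallel over the same sequence and are fused before the output layer.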

Keywords: multi-label classification; Chinese short text; dynamic word embedding; feature extraction

CLC number: TN301.6 (Electronics and Telecommunications: Physical Electronics)

 
