Text Classification of Civil Aviation Supervision Based on Dilated Convolution and Self-Attention


Authors: WANG Xin (王欣)[1], GAN Zu-rui (干镞锐), XU Ya-xi (许雅玺)[2], SHI Ke (史珂)[3]

Affiliations: [1] School of Computer, Civil Aviation Flight University of China, Guanghan, Sichuan 618307, China; [2] School of Economics and Management, Civil Aviation Flight University of China, Guanghan, Sichuan 618307, China; [3] Institute of Civil Aviation Supervisor Training, Civil Aviation Flight University of China, Guanghan, Sichuan 618307, China

Source: Computer Simulation (《计算机仿真》), 2024, No. 11, pp. 53-57 (5 pages)

Abstract: For text classification on imbalanced short-text datasets, this paper proposes a short-text classification method that combines data augmentation, dilated convolution, and ProbSparse self-attention. First, RoFormer-Sim is used to address the class imbalance among samples. Second, RoBERTa is used in the embedding layer to obtain character embedding vectors. Third, a TextRCNN structure performs feature extraction to capture the information contained in the text. In addition, dilated convolution is applied in the pooling layer to prevent the loss of important information, and ProbSparse self-attention is used to assign weights to the different character embedding vectors. The proposed model achieves a classification F1 score of 96.31% on the Dataset of Inspection Records of Civil Aviation Regulatory Matters. Comparative experiments against other classic deep learning algorithms show that the model performs well on short-text datasets.
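The abstract names dilated convolution and ProbSparse self-attention without giving their formulas. As a rough, hedged illustration only, the pure-Python sketch below shows (a) a valid-padding 1-D dilated convolution, where kernel taps are spaced `dilation` positions apart so each output covers a wider span without discarding positions, and (b) ProbSparse-style self-attention in the spirit of Informer, where it was introduced: each query is scored by the sparsity measurement (max minus mean of its scaled dot products with the keys), only the top `factor * ceil(ln L_Q)` queries receive full softmax attention, and the remaining "lazy" queries output the mean of V. The function names, the `factor` parameter, and computing the sparsity measurement exactly over all keys (Informer approximates it by sampling keys) are assumptions for illustration; this is not the authors' implementation, which the abstract does not describe at this level.

```python
import math


def dilated_conv1d(x, weights, dilation):
    """Valid (no padding) 1-D dilated convolution, computed as cross-correlation
    like deep-learning libraries do: y[t] = sum_k weights[k] * x[t + k * dilation]."""
    span = dilation * (len(weights) - 1)  # distance between first and last tap
    if span >= len(x):
        raise ValueError("input shorter than the dilated kernel span")
    return [
        sum(w * x[t + k * dilation] for k, w in enumerate(weights))
        for t in range(len(x) - span)
    ]


def _dot(a, b):
    return sum(p * q for p, q in zip(a, b))


def _softmax(scores):
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def probsparse_attention(Q, K, V, factor=1):
    """ProbSparse-style attention sketch (hypothetical helper, exact sparsity
    measurement instead of Informer's key sampling).
    Returns (outputs, indices of the queries that received full attention)."""
    scale = 1.0 / math.sqrt(len(Q[0]))
    n_q = len(Q)
    # Number of "active" queries: factor * ceil(ln L_Q), clamped to [1, L_Q].
    n_top = min(n_q, max(1, int(factor * math.ceil(math.log(n_q)))))
    scores = [[_dot(q, k) * scale for k in K] for q in Q]
    # Sparsity measurement M(q_i, K) = max_j score_ij - mean_j score_ij.
    sparsity = [max(row) - sum(row) / len(row) for row in scores]
    active = sorted(range(n_q), key=lambda i: sparsity[i], reverse=True)[:n_top]
    # Lazy queries fall back to the mean of the value vectors.
    d_v = len(V[0])
    mean_v = [sum(v[j] for v in V) / len(V) for j in range(d_v)]
    out = [list(mean_v) for _ in range(n_q)]
    for i in active:
        weights = _softmax(scores[i])
        out[i] = [sum(w * v[j] for w, v in zip(weights, V)) for j in range(d_v)]
    return out, active


if __name__ == "__main__":
    print(dilated_conv1d([1, 2, 3, 4, 5, 6], [1, 1, 1], dilation=2))  # → [9, 12]
    Q = [[0, 0], [3, 0], [0, 0], [0, 2]]
    K = V = [[1, 0], [0, 1]]
    out, active = probsparse_attention(Q, K, V)
    print(active)   # → [1, 3]
    print(out[0])   # → [0.5, 0.5]  (lazy query: mean of V)
```

With `dilation=2` and a 3-tap kernel, each output covers 2 × (3 − 1) + 1 = 5 input positions while every position still contributes, which is the kind of property the abstract appeals to when it says dilated convolution helps avoid losing information in pooling. How the paper actually wires these components into TextRCNN is not stated in the abstract.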

Keywords: imbalanced text; text classification; data augmentation; dilated convolution; ProbSparse self-attention

CLC number: TP391.9 [Automation and Computer Technology: Computer Application Technology]

 
