Research on Cross-Type Text Classification Technology Based on Multi-Task Learning

Authors: Song Donghuan; Hu Maodi; Ding Jielan [1,2,3]; Qu Zihao; Chang Zhijun; Qian Li (National Science Library, Chinese Academy of Sciences, Beijing 100190, China; Department of Information Resources Management, School of Economics and Management, University of Chinese Academy of Sciences, Beijing 100190, China; Key Laboratory of New Publishing and Knowledge Services for Scholarly Journals, National Press and Publication Administration, Beijing 100190, China)

Affiliations: [1] National Science Library, Chinese Academy of Sciences, Beijing 100190; [2] Department of Information Resources Management, School of Economics and Management, University of Chinese Academy of Sciences, Beijing 100190; [3] Key Laboratory of New Publishing and Knowledge Services for Scholarly Journals, National Press and Publication Administration, Beijing 100190

Source: Data Analysis and Knowledge Discovery, 2025, No. 2, pp. 12-25 (14 pages)

Funding: Supported by the National Key R&D Program of China (Grant No. 2022YFF0711900).

Abstract: [Objective] This study addresses the low classification accuracy of conventional text classification tasks caused by sparse domain-specific training data and large differences between text types. [Methods] We constructed a classification model based on the BERT-DPCNN-MMOE framework, integrating a deep pyramid convolutional network (DPCNN) with a multi-gate mixture-of-experts (MMoE) mechanism, and designed multi-task and transfer learning experiments to validate the model against eight baseline models. [Results] We independently constructed cross-type, multi-task data as the basis for training and testing. The BERT-DPCNN-MMOE model outperformed all eight baselines in both the multi-task and transfer learning experiments, with F1 score improvements exceeding 4.7 percentage points. [Limitations] The model's adaptability to other domains requires further study. [Conclusions] The BERT-DPCNN-MMOE model performs better on multi-task, cross-type text classification tasks and is significant for future specialized intelligence classification work.
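The [Methods] section names a multi-gate mixture-of-experts (MMoE) layer on top of BERT-DPCNN features: each task has its own softmax gate that mixes a shared pool of experts, which is what lets tasks share representations without forcing identical ones. The NumPy sketch below shows only that routing step; every dimension, weight, and activation here is illustrative and not taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(z):
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

# Illustrative sizes: input features, expert width, expert count, task count
d_in, d_expert, n_experts, n_tasks = 16, 8, 4, 2

# Random parameters stand in for trained weights
W_experts = rng.normal(size=(n_experts, d_in, d_expert))
W_gates = rng.normal(size=(n_tasks, d_in, n_experts))

def mmoe_forward(x):
    """x: (batch, d_in) shared features (e.g. BERT-DPCNN outputs).
    Returns one mixed representation per task."""
    # Each expert transforms the shared representation independently
    expert_out = np.stack([np.tanh(x @ W) for W in W_experts])  # (E, B, d_expert)
    task_reprs = []
    for t in range(n_tasks):
        gate = softmax(x @ W_gates[t])              # (B, E): per-sample expert weights
        mixed = np.einsum('be,ebd->bd', gate, expert_out)  # weighted sum of experts
        task_reprs.append(mixed)                    # fed to task-specific tower
    return task_reprs

x = rng.normal(size=(3, d_in))   # a batch of 3 feature vectors
outs = mmoe_forward(x)
print(len(outs), outs[0].shape)  # one representation per task
```

Because the gates are input-dependent, different samples can lean on different experts for each task, which is the property multi-task setups with heterogeneous text types rely on.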

Keywords: multi-task learning; cross-type text classification; transfer learning; ensemble learning

Classification codes: TP391 [Automation and Computer Technology: Computer Application Technology]; G250 [Automation and Computer Technology: Computer Science and Technology]
