Authors: HUANG Jin-feng (黄金凤); GAO Yan (高岩)[1]; XU Tong (徐童)[1]; CHEN En-hong (陈恩红)[1] (School of Computer Science, University of Science and Technology of China, Hefei 230027, China)
Affiliation: [1] School of Computer Science, University of Science and Technology of China, Hefei 230027, Anhui, China
Source: Frontiers of Science and Technology of Engineering Management (《工程管理科技前沿》), 2022, No. 3, pp. 23-30 (8 pages)
Funding: National Key Research and Development Program of China (2018YFB1402600).
Abstract: The rapid development of science and technology has produced a massive volume of technical documents, and managing and retrieving them effectively depends on accurate automatic classification. Because disciplines are numerous and develop unevenly, however, the number of documents per field is severely imbalanced, which weakens the effectiveness of classification techniques. Prior work has shown that pre-trained language models can perform well on text classification, but the strongly domain-specific nature of technical documents limits what general-purpose pre-trained models achieve. More importantly, the imbalanced classification problem that results from these differences in document counts across fields has not yet been adequately solved. To address these issues, this paper introduces and improves multiple data augmentation strategies to increase the data diversity and classification robustness of few-shot categories, and then uses several groups of experiments to examine the best combination of data augmentation strategies under different pre-trained models. The results show that the proposed framework effectively improves the accuracy of imbalanced classification of technical documents, laying a foundation for automatic classification and intelligent applications of technical documents.
Classification number: TP391.1 [Automation and Computer Technology: Computer Application Technology]