Intelligent Text Classification Method for Open-Source Technology Intelligence Analysis


Authors: PENG Peng, XU HongJiao[1] (Institute of Scientific and Technical Information of China, Beijing 100038, P.R. China)

Affiliation: [1] Institute of Scientific and Technical Information of China, Beijing 100038

Source: Digital Library Forum, 2025, No. 2, pp. 65-72 (8 pages)

Abstract: With the explosive growth of online information, identifying valuable technology intelligence in massive volumes of web text and classifying it intelligently has become key to open-source technology intelligence analysis. Based on the characteristics of open-source technology intelligence texts, this paper builds an integrated model that performs both text denoising and classification for open-source technology intelligence analysis. First, an automatic annotation method combining large language models with prompt engineering is used to label both noise data and text-classification data. Second, a pre-trained language model identifies and filters noise, removing texts that are not technology intelligence. Third, a multilingual pre-trained model is combined with distillation, and an improved loss function is designed to address imbalanced class distributions and insufficient training data; together these improve, to a certain extent, the accuracy and stability of multi-label technology intelligence text classification. Experimental results show that, compared with TextCNN and BERT, the proposed method achieves higher classification performance, better robustness, and better adaptability.
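The abstract does not give the annotation prompt, so the following is only a minimal Python sketch of LLM-plus-prompt annotation under stated assumptions: the category list `CATEGORIES`, the prompt wording, the JSON reply format, and the `llm` callable (any function that maps a prompt string to the model's reply) are all hypothetical. It illustrates the two parts such a step usually needs: a prompt that asks for the noise judgment and the multi-label judgment together, and a defensive parser that discards malformed replies and any label outside the allowed set.

```python
import json
from typing import Callable, Dict, List, Sequence

# Hypothetical category set; the paper's real taxonomy is not given in the abstract.
CATEGORIES = ["Artificial Intelligence", "Biotechnology", "New Energy", "Aerospace"]


def build_prompt(text: str, categories: Sequence[str]) -> str:
    """Ask for the noise decision and the multi-label decision in one call."""
    options = "\n".join(f"- {c}" for c in categories)
    return (
        "You are annotating open-source technology intelligence.\n"
        "1. Decide whether the text is technology intelligence or noise.\n"
        "2. If it is technology intelligence, list ALL matching categories.\n"
        f"Categories:\n{options}\n"
        'Reply with JSON only: {"is_noise": true or false, "labels": [...]}\n'
        f"Text: {text}"
    )


def parse_annotation(raw: str, categories: Sequence[str]) -> Dict[str, object]:
    """Extract the JSON object from the reply; return an empty record if malformed."""
    empty: Dict[str, object] = {"is_noise": None, "labels": []}
    start, end = raw.find("{"), raw.rfind("}")
    if start == -1 or end <= start:
        return empty
    try:
        obj = json.loads(raw[start:end + 1])
    except json.JSONDecodeError:
        return empty
    if not isinstance(obj, dict):
        return empty
    is_noise = obj.get("is_noise")
    raw_labels = obj.get("labels", [])
    if not isinstance(raw_labels, list):
        raw_labels = []
    allowed = set(categories)
    # Keep only labels from the allowed set, so invented categories are dropped.
    labels = [lab for lab in raw_labels if isinstance(lab, str) and lab in allowed]
    return {"is_noise": is_noise if isinstance(is_noise, bool) else None, "labels": labels}


def annotate(texts: Sequence[str], llm: Callable[[str], str],
             categories: Sequence[str] = CATEGORIES) -> List[Dict[str, object]]:
    """`llm` is any prompt -> reply function (hypothetical; no specific API assumed)."""
    return [parse_annotation(llm(build_prompt(t, categories)), categories) for t in texts]
```

Records whose `is_noise` comes back as `None` can be routed to manual review instead of being silently added to the training data.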
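The denoise-then-classify design can be pictured as a two-stage filter. In this Python sketch, `noise_scorer` and `classifier` are hypothetical stand-ins for the fine-tuned pre-trained models (one returns the probability that a text is noise, the other returns one probability per category), and the 0.5 thresholds are illustrative defaults, not values reported in the paper.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Sequence

# Hypothetical interfaces for the two fine-tuned models.
NoiseScorer = Callable[[str], float]                  # returns P(text is noise)
MultiLabelScorer = Callable[[str], Dict[str, float]]  # returns P(label) per category


@dataclass
class Classified:
    text: str
    labels: List[str] = field(default_factory=list)


def denoise_then_classify(
    texts: Sequence[str],
    noise_scorer: NoiseScorer,
    classifier: MultiLabelScorer,
    noise_threshold: float = 0.5,
    label_threshold: float = 0.5,
) -> List[Classified]:
    results: List[Classified] = []
    for text in texts:
        # Stage 1: drop texts judged to be non-technology-intelligence noise.
        if noise_scorer(text) >= noise_threshold:
            continue
        # Stage 2: multi-label decision, each category thresholded independently.
        probs = classifier(text)
        labels = sorted(name for name, p in probs.items() if p >= label_threshold)
        results.append(Classified(text=text, labels=labels))
    return results
```

Because each category is thresholded independently, a text can receive zero, one, or several labels, which is what separates multi-label classification from picking the single highest-scoring class.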
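The abstract states that the loss function was improved to cope with uneven class distribution but does not say how. One common option, assumed here purely for illustration, is per-label binary cross-entropy with (a) a positive-class weight w_j = negatives_j / positives_j, so rare labels count more (the same idea as the `pos_weight` argument of PyTorch's `BCEWithLogitsLoss`), and (b) a focal factor (1 - p_t)^gamma that down-weights examples the model already classifies confidently. With `gamma=0` and all weights equal to 1 it reduces to ordinary mean binary cross-entropy.

```python
import math
from typing import List, Sequence


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def positive_weights(label_matrix: Sequence[Sequence[int]]) -> List[float]:
    """w_j = negatives_j / positives_j over the training set (1.0 if a label never occurs)."""
    n = len(label_matrix)
    weights = []
    for j in range(len(label_matrix[0])):
        pos = sum(row[j] for row in label_matrix)
        weights.append((n - pos) / pos if pos > 0 else 1.0)
    return weights


def weighted_focal_bce(logits: Sequence[float], targets: Sequence[int],
                       pos_weight: Sequence[float], gamma: float = 2.0) -> float:
    """Mean over labels of -w * (1 - p_t)^gamma * log(p_t) for one example."""
    if not (len(logits) == len(targets) == len(pos_weight)):
        raise ValueError("logits, targets and pos_weight must have the same length")
    total = 0.0
    for z, y, w in zip(logits, targets, pos_weight):
        p = sigmoid(z)
        p_t = p if y == 1 else 1.0 - p     # probability assigned to the true outcome
        weight = w if y == 1 else 1.0      # up-weight positives of rare labels only
        total += -weight * (1.0 - p_t) ** gamma * math.log(max(p_t, 1e-12))
    return total / len(logits)
```

The weights are computed once from the annotated training labels and then passed to every loss call; `gamma` trades off how strongly easy examples are suppressed.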
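The distillation setup is likewise not described beyond its use with a multilingual pre-trained model. The Python sketch below assumes a standard teacher-student arrangement adapted to multi-label outputs: teacher and student logits are softened with a temperature T through a sigmoid, the student is trained toward the teacher's soft probabilities as well as the hard labels, and the soft term is scaled by T² as is conventional. Which model acts as teacher or student, and the values of `temperature` and `alpha`, are illustrative assumptions.

```python
import math
from typing import Sequence


def _sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def _bce(p: float, y: float) -> float:
    """Binary cross-entropy between predicted p and a (possibly soft) target y."""
    eps = 1e-12
    p = min(max(p, eps), 1.0 - eps)
    return -(y * math.log(p) + (1.0 - y) * math.log(1.0 - p))


def distillation_loss(student_logits: Sequence[float], teacher_logits: Sequence[float],
                      hard_targets: Sequence[int], temperature: float = 2.0,
                      alpha: float = 0.5) -> float:
    """alpha * hard-label BCE + (1 - alpha) * T^2 * BCE(student_T, teacher_T)."""
    k = len(student_logits)
    if not (k == len(teacher_logits) == len(hard_targets)):
        raise ValueError("all inputs must have one entry per label")
    # Hard term: ordinary multi-label BCE against the annotated labels.
    hard = sum(_bce(_sigmoid(s), y) for s, y in zip(student_logits, hard_targets)) / k
    # Soft term: match the teacher's temperature-softened per-label probabilities.
    soft = sum(
        _bce(_sigmoid(s / temperature), _sigmoid(t / temperature))
        for s, t in zip(student_logits, teacher_logits)
    ) / k
    return alpha * hard + (1.0 - alpha) * temperature ** 2 * soft
```

The soft term is smallest when the student's softened probabilities equal the teacher's, so it transfers the teacher's graded confidence on each label, which can help when labeled data for some categories is scarce.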

Keywords: open-source technology intelligence; text classification; information filtering; pre-trained language model

CLC number: TP391.1 [Automation and Computer Technology / Computer Application Technology]

 
