Author: WEN Fei (文飞), ZhongZhuoxin (Beijing) Technology Co., Ltd., Beijing 100085, China
Affiliation: [1] ZhongZhuoxin (Beijing) Technology Co., Ltd., Beijing 100085
Source: Intelligent Computer and Applications (《智能计算机与应用》), 2024, No. 6, pp. 88-94 (7 pages)
Abstract: This paper focuses on the evolution of Chinese text classification technology in both exploration and practice. Through a rigorous empirical comparative study, it examines the performance differences between traditional methods and advanced algorithms based on large models across various text classification tasks. The study is conducted on foundational sentiment-analysis datasets and on multi-class text datasets rich in complex professional information, systematically comparing traditional statistical learning approaches, classical deep learning algorithms, and the currently influential pre-trained large models such as BERT and LLMs. The core objective is improving classification accuracy, while also assessing each model's resource efficiency and training-time effectiveness. For the pre-trained large models, prompt engineering and model fine-tuning are employed to optimize performance. Experimental results demonstrate the substantial advantages of large models in understanding and leveraging linguistic context and in improving generalization, reducing error rates by more than 10% across different datasets and validation sets, while also confirming that traditional techniques retain unique and effective application value in specific scenarios. Through systematic comparative analysis, this paper aims to provide a sound basis and direction for the scientific selection and future development of Chinese text classification technology.
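To make the "traditional statistical learning" side of the comparison concrete, the following is a minimal illustrative sketch (not the paper's actual implementation, whose datasets and models are not reproduced here) of a classic baseline for Chinese text classification: character-bigram features with multinomial naive Bayes. Character n-grams are a common workaround for the lack of word boundaries in Chinese; the sample texts below are invented for illustration.

```python
import math
from collections import Counter, defaultdict

def char_ngrams(text, n=2):
    """Character bigrams sidestep word segmentation, since Chinese has no spaces."""
    return [text[i:i + n] for i in range(len(text) - n + 1)]

class NaiveBayesClassifier:
    """Multinomial naive Bayes over character-bigram counts, with add-one smoothing."""

    def __init__(self):
        self.class_counts = Counter()                 # documents per class
        self.feature_counts = defaultdict(Counter)    # bigram counts per class
        self.vocab = set()

    def fit(self, texts, labels):
        for text, label in zip(texts, labels):
            self.class_counts[label] += 1
            for g in char_ngrams(text):
                self.feature_counts[label][g] += 1
                self.vocab.add(g)

    def predict(self, text):
        total_docs = sum(self.class_counts.values())
        best, best_lp = None, float("-inf")
        for label in self.class_counts:
            # log prior + smoothed log likelihood of each bigram
            lp = math.log(self.class_counts[label] / total_docs)
            denom = sum(self.feature_counts[label].values()) + len(self.vocab)
            for g in char_ngrams(text):
                lp += math.log((self.feature_counts[label][g] + 1) / denom)
            if lp > best_lp:
                best, best_lp = label, lp
        return best

# Invented sentiment examples, not drawn from the paper's datasets.
train_texts = ["这部电影非常好看", "服务态度很好非常满意", "质量太差很失望", "电影很无聊太差了"]
train_labels = ["pos", "pos", "neg", "neg"]

clf = NaiveBayesClassifier()
clf.fit(train_texts, train_labels)
print(clf.predict("非常好看很满意"))  # → pos
```

Baselines of this kind train in milliseconds on CPU, which is the resource-efficiency advantage the paper weighs against the accuracy and generalization gains of fine-tuned pre-trained models.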
Keywords: text classification; BERT; pre-trained large language models; prompt engineering; fine-tuning; few-shot learning
Classification code: TP391.1 [Automation and Computer Technology / Computer Application Technology]