Author: WEN Fei (文飞), ZhongZhuoxin (Beijing) Technology Co., Ltd., Beijing 100085, China
Affiliation: [1] ZhongZhuoxin (Beijing) Technology Co., Ltd., Beijing 100085
Source: Intelligent Computer and Applications (《智能计算机与应用》), 2024, No. 6, pp. 88-94 (7 pages)
Abstract: This paper focuses on the evolution of Chinese text classification technology in both exploration and practice. Through a rigorous empirical comparative study, it examines the performance differences between traditional methods and advanced algorithms based on large models across various text classification tasks. The study is conducted on foundational sentiment-analysis datasets and on multi-class text datasets rich in complex professional information, systematically comparing traditional statistical learning approaches, classical deep learning algorithms, and the currently influential pre-trained large models such as BERT and LLMs. The core objective is improving classification accuracy, while also assessing each model's resource efficiency and training-time effectiveness. For the pre-trained large models, prompt engineering and model fine-tuning are employed to optimize performance. Experimental results demonstrate the substantial advantages of large models in understanding and leveraging linguistic context and in improving generalization, reducing error rates by more than 10% across different datasets and validation sets, while also confirming that traditional techniques retain unique and effective application value in specific scenarios. Through systematic comparative analysis, this paper aims to provide a sound basis and direction for the scientific selection and future development of Chinese text classification technology.
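To make the "traditional statistical learning" side of the comparison concrete, the following is a minimal illustrative sketch (not the paper's actual implementation, whose datasets and models are not reproduced here) of a classic baseline for Chinese text classification: character-bigram features with multinomial naive Bayes. Character n-grams are a common workaround for the lack of word boundaries in Chinese; the sample texts below are invented for illustration.

```python
import math
from collections import Counter, defaultdict

def char_ngrams(text, n=2):
    """Character bigrams sidestep word segmentation, since Chinese has no spaces."""
    return [text[i:i + n] for i in range(len(text) - n + 1)]

class NaiveBayesClassifier:
    """Multinomial naive Bayes over character-bigram counts, with add-one smoothing."""

    def __init__(self):
        self.class_counts = Counter()                 # documents per class
        self.feature_counts = defaultdict(Counter)    # bigram counts per class
        self.vocab = set()

    def fit(self, texts, labels):
        for text, label in zip(texts, labels):
            self.class_counts[label] += 1
            for g in char_ngrams(text):
                self.feature_counts[label][g] += 1
                self.vocab.add(g)

    def predict(self, text):
        total_docs = sum(self.class_counts.values())
        best, best_lp = None, float("-inf")
        for label in self.class_counts:
            # log prior + smoothed log likelihood of each bigram
            lp = math.log(self.class_counts[label] / total_docs)
            denom = sum(self.feature_counts[label].values()) + len(self.vocab)
            for g in char_ngrams(text):
                lp += math.log((self.feature_counts[label][g] + 1) / denom)
            if lp > best_lp:
                best, best_lp = label, lp
        return best

# Invented sentiment examples, not drawn from the paper's datasets.
train_texts = ["这部电影非常好看", "服务态度很好非常满意", "质量太差很失望", "电影很无聊太差了"]
train_labels = ["pos", "pos", "neg", "neg"]

clf = NaiveBayesClassifier()
clf.fit(train_texts, train_labels)
print(clf.predict("非常好看很满意"))  # → pos
```

Baselines of this kind train in milliseconds on CPU, which is the resource-efficiency advantage the paper weighs against the accuracy and generalization gains of fine-tuned pre-trained models.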
Keywords: text classification; BERT; pre-trained large language models; prompt engineering; fine-tuning; few-shot learning
Classification code: TP391.1 [Automation and Computer Technology / Computer Application Technology]