Text Summarization Quality Evaluation Based on Large Language Model

Authors: TAN Chen-Han (谭琛瀚); JIA Ke-Bin (贾克斌); WANG Hao-Yu (王浩宇)

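Affiliations: [1] Faculty of Information Technology, Beijing University of Technology, Beijing 100124, China; [2] Beijing Laboratory of Advanced Information Network, Beijing 100876, China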

Source: Computer Systems & Applications (《计算机系统应用》), 2025, No. 2, pp. 28-36 (9 pages)

Funding: Beijing Natural Science Foundation (4212001).

Abstract: Automatic text summarization is an important branch of natural language processing (NLP), and one of its main difficulties is evaluating the quality of generated summaries quickly, objectively, and accurately. Existing methods for evaluating summary quality suffer from low evaluation accuracy, a dependence on reference texts, and high consumption of computing resources. To address these problems, this study proposes a summary quality evaluation method based on large language models. It designs a prompt construction method based on the chain-of-thought (CoT) principle to improve the performance of large language models on the summary quality evaluation task, and it generates a chain-of-thought dataset that is used to fine-tune a small-scale large language model, which significantly reduces the computing requirements. The method first determines the evaluation dimensions according to the characteristics of text summaries and constructs prompts based on the CoT principle. The prompts then guide a large-scale large language model to generate a chain-of-thought process and an evaluation result for each summary sample, and these outputs form the chain-of-thought dataset. The dataset is used to fine-tune the small-scale large language model, and the fine-tuned model finally performs the summary quality evaluation task. Comparative experiments and analyses on the SummEval dataset show that the method significantly improves the evaluation accuracy of the small-scale large language model on this task. The result is a summary quality evaluation method that requires no reference texts, achieves high evaluation accuracy, has low computing requirements, and is easy to deploy.
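
The pipeline described in the abstract (a CoT prompt per evaluation dimension, a teacher response containing reasoning and a score, and a fine-tuning record built from both) can be sketched roughly as follows. This is an illustrative sketch only, not the authors' implementation. The abstract gives neither the prompt wording nor the dimension names; the four dimensions below are the ones used in SummEval's human annotations, and the prompt template, the 1-5 scale, the `Score: N` output convention, and all function names are assumptions. No model is called here; the teacher response in the usage example is hand-written.

```python
import json
import re

# Assumed dimensions (the four used in SummEval's human annotations);
# the paper's actual choice is not stated in the abstract.
DIMENSIONS = {
    "coherence": "whether the sentences form a well-organized whole",
    "consistency": "whether every claim is supported by the source text",
    "fluency": "whether each sentence is grammatical and readable",
    "relevance": "whether the summary keeps the important content",
}


def build_cot_prompt(source: str, summary: str, dimension: str) -> str:
    """Build a reference-free, step-by-step evaluation prompt (hypothetical wording)."""
    definition = DIMENSIONS[dimension]
    return (
        f"You are evaluating a summary for {dimension}: {definition}.\n"
        "Think step by step:\n"
        "1. Identify the key points of the source text.\n"
        "2. Compare the summary against those points.\n"
        f"3. List any problems that affect {dimension}.\n"
        "4. Finish with a line of the form 'Score: N', where N is 1 to 5.\n\n"
        f"Source text:\n{source}\n\nSummary:\n{summary}\n"
    )


def parse_score(response: str):
    """Return the last 'Score: N' value (1-5) in a response, or None if absent."""
    matches = re.findall(r"Score:\s*([1-5])\b", response)
    return int(matches[-1]) if matches else None


def make_finetune_record(source: str, summary: str, dimension: str,
                         teacher_response: str):
    """Pair a prompt with the teacher's reasoning; skip responses with no score."""
    score = parse_score(teacher_response)
    if score is None:
        return None
    return {
        "prompt": build_cot_prompt(source, summary, dimension),
        "completion": teacher_response.strip(),
        "score": score,
    }


if __name__ == "__main__":
    # Hand-written stand-in for a large model's output (not real model output).
    teacher = (
        "1. The source reports a road closure and its cause.\n"
        "2. The summary mentions the closure but not the cause.\n"
        "3. One important detail is missing.\n"
        "Score: 3"
    )
    record = make_finetune_record(
        "The bridge was closed on Monday after inspectors found cracks.",
        "The bridge was closed on Monday.",
        "relevance",
        teacher,
    )
    print(json.dumps(record, indent=2))
```

Records of this shape, one per summary and dimension, would form the chain-of-thought dataset used to fine-tune the smaller model. Dropping teacher responses without a parseable score is one simple filtering choice; the paper may handle such cases differently.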

Keywords: text summarization; quality evaluation; large language model; chain of thought; fine-tuning

Classification: TP391.1 [Automation and Computer Technology: Computer Application Technology]
