A Study on the Evaluation of Large Language Models' Capabilities in Chinese Text Simplification (大语言模型的中文文本简化能力研究)

Authors: Yang Erhong (杨尔弘) [1]; Zhu Junhui (朱君辉); Zhu Haonan (朱浩楠); Zong Xuquan (宗绪泉); Yang Lin'er (杨麟儿)

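Affiliation: [1] National Language Resources Monitoring and Research Center for Print Media / School of Information Science, Beijing Language and Culture University, Beijing 100083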

Source: Chinese Journal of Language Policy and Planning (《语言战略研究》), 2024, No. 5, pp. 34-47 (14 pages)

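Funding: Major Research Project of the State Language Commission, "Research on Evaluation Techniques and Methods for Large Language Models" (ZDA145-17).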

Abstract: Large language models (LLMs) offer new approaches for automatic text simplification. To explore the capabilities of LLMs in simplifying Chinese texts, this study constructed a Chinese passage-level text simplification dataset and conducted a feature analysis of the parallel text pairs within it. Based on this, an experiment was designed to assess the automatic text simplification performance of LLMs using four prompting strategies: zero-shot, few-shot, few-shot with lexicon, and few-shot with rules. The study evaluated six commonly used domestic and international LLMs on Chinese text simplification under each prompting strategy, combining existing linguistic feature evaluation metrics with metrics specific to this study. The findings revealed that the few-shot prompting strategy performed best in terms of text features and significantly enhanced information retention. Incorporating an external lexicon in the prompt helped the LLMs use relatively simpler words, while integrating simplification rules enabled the LLMs to employ more concise syntactic structures. Different LLMs exhibited distinct strengths and limitations in controlling difficulty and preserving semantics, but all showed a noticeable gap compared to human experts in discourse cohesion and coherence and in paragraph segmentation, and all exhibited hallucination to varying degrees. Future research should focus on constructing larger-scale, high-quality Chinese simplification datasets and on eliciting the text simplification capabilities of LLMs from multiple angles.
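The four prompting strategies named in the abstract can be illustrated with a minimal sketch. The prompt wording, the `PromptConfig` container, the `build_prompt` function, and the strategy identifiers below are all hypothetical and chosen only for illustration; the record does not reproduce the prompts actually used in the study.

```python
from dataclasses import dataclass, field

# Hypothetical identifiers for the four strategies listed in the abstract.
STRATEGIES = ("zero_shot", "few_shot", "few_shot_lexicon", "few_shot_rules")


@dataclass
class PromptConfig:
    """Illustrative container for the optional prompt ingredients."""
    examples: list = field(default_factory=list)  # (original, simplified) pairs
    lexicon: list = field(default_factory=list)   # preferred simple words
    rules: list = field(default_factory=list)     # simplification rules


def build_prompt(strategy: str, text: str, cfg: PromptConfig) -> str:
    """Assemble a prompt string for one strategy (assumed wording)."""
    if strategy not in STRATEGIES:
        raise ValueError(f"unknown strategy: {strategy}")

    parts = ["Simplify the following Chinese passage while preserving its meaning."]

    # Every strategy except zero-shot includes worked examples.
    if strategy != "zero_shot":
        for i, (src, simp) in enumerate(cfg.examples, 1):
            parts.append(f"Example {i}\nOriginal: {src}\nSimplified: {simp}")

    # Few-shot + lexicon: add an external word list to the prompt.
    if strategy == "few_shot_lexicon":
        parts.append("Prefer words from this list: " + ", ".join(cfg.lexicon))

    # Few-shot + rules: add explicit simplification rules to the prompt.
    if strategy == "few_shot_rules":
        parts.append("Follow these rules:\n" + "\n".join(f"- {r}" for r in cfg.rules))

    parts.append(f"Original: {text}\nSimplified:")
    return "\n\n".join(parts)
```

Under these assumptions, calling `build_prompt` once per strategy on the same passage yields four prompts that differ only in the ingredients added, which is the kind of controlled comparison the abstract describes.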

Keywords: Chinese text simplification; large language models; linguistic feature analysis

Classification number: H002 [Language and Script: Linguistics]
