Synaesthesia metaphor analysis based on large language model and data augmentation

Authors: SHENG Kun; WANG Zhongqing [1]

Affiliation: [1] School of Computer Science and Technology, Soochow University, Suzhou, Jiangsu 215006, China

Source: Journal of Computer Applications (《计算机应用》), 2025, No. 3, pp. 794-800 (7 pages)

Funding: National Natural Science Foundation of China (62076175, 61976146).

Abstract: Chinese synaesthesia metaphor analysis is a specific subtask in the metaphor domain. Because sensory words are unevenly distributed in synaesthesia corpora, Chinese synaesthesia metaphor datasets suffer from data sparsity. To address this issue, sparse sensory word data from the real training data were used as prompts, and a large language model generated additional synthetic samples for data augmentation. To keep the noise introduced by synthetic data from degrading model performance, a data augmentation framework based on a large language model was constructed, and a scoring mechanism and a label error optimization mechanism were applied to reduce the distribution differences between synthetic and real data. Experimental results show that the proposed framework can generate high-quality synthetic data to expand the dataset, achieving an overall F1 value of 68.5% on the sensory word extraction and sensory domain classification tasks, 2.7 percentage points higher than the baseline model T5 (Text-To-Text Transfer Transformer) trained only on real training data.

Keywords: large language model; data augmentation; synaesthesia metaphor; data sparsity; data synthesis

Classification code: TP391 [Automation and Computer Technology: Computer Application Technology]
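The augmentation pipeline outlined in the abstract (prompting with sparse sensory words, then filtering synthetic samples by score) might be sketched roughly as follows. This is only an illustration under stated assumptions, not the authors' implementation: the `Sample` layout, the prompt wording, the `max_count` and `min_score` thresholds, and the `generate` and `score` callables are all hypothetical, and the label error optimization mechanism is not sketched because the abstract gives no detail on it.

```python
from collections import Counter
from typing import Callable

# Assumed sample layout: sentence text plus (sensory word, sensory domain) pairs.
Sample = tuple[str, list[tuple[str, str]]]


def rare_sensory_words(train: list[Sample], max_count: int) -> list[tuple[str, str]]:
    """Return (word, domain) pairs seen at most `max_count` times in the real data."""
    counts = Counter(pair for _, pairs in train for pair in pairs)
    return sorted(pair for pair, n in counts.items() if n <= max_count)


def build_prompt(word: str, domain: str) -> str:
    """Hypothetical prompt template; the paper's actual prompt is not given here."""
    return (
        f"Write one sentence that uses the sensory word '{word}' "
        f"(sensory domain: {domain}) as a synaesthetic metaphor."
    )


def augment(
    train: list[Sample],
    generate: Callable[[str], str],      # stand-in for a large language model call
    score: Callable[[str, str], float],  # stand-in for the scoring mechanism
    max_count: int = 1,                  # assumed sparsity threshold
    min_score: float = 0.5,              # assumed acceptance threshold
) -> list[Sample]:
    """Add one synthetic sample per rare pair, keeping only those that score well."""
    kept: list[Sample] = []
    for word, domain in rare_sensory_words(train, max_count):
        sentence = generate(build_prompt(word, domain))
        if score(sentence, word) >= min_score:
            kept.append((sentence, [(word, domain)]))
    return train + kept
```

In practice `generate` would wrap whichever large language model is used and `score` would implement the quality criterion; both are passed in as plain callables here so that the selection and filtering logic can be exercised with simple stubs.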
