Authors: SUN Ze-Chen; XIAO Yi-Sheng; LI Jun-Tao; ZHANG Min [1]; ZHOU Guo-Dong [1] (School of Computer Science and Technology, Soochow University, Suzhou 215008, China)
Affiliation: [1] School of Computer Science and Technology, Soochow University, Suzhou 215008, Jiangsu, China
Source: Journal of Software (《软件学报》), 2025, No. 4, pp. 1604-1619 (16 pages)
Funding: National Natural Science Foundation of China (62206194); Natural Science Foundation of Jiangsu Province (BK20220488).
Abstract: Previous pre-trained language models (PLMs) have demonstrated excellent performance on numerous natural language understanding (NLU) tasks. However, they often suffer from shortcut learning: they learn spurious correlations between non-robust features and labels, which leads to poor generalization in test scenarios that differ from the training distribution (out-of-distribution, OOD). Recently, the outstanding performance of generative large language models (LLMs) on understanding tasks has attracted widespread attention, but the extent to which they are affected by shortcut learning has not been fully studied. This paper investigates, for the first time, the shortcut learning behavior of generative LLMs on multiple NLU tasks, using the LLaMA series and FLAN-T5 models as representatives. The results show that the shortcut learning problem persists in these recent generative LLMs. As a mitigation strategy, a hybrid data augmentation framework based on controllable explanations is proposed. The framework is data-centric: it constructs a small-scale mixed dataset from model-generated controllable explanation data and a portion of the original prompting data, and fine-tunes the model on this dataset. Extensive experiments on three representative NLU tasks show that training on datasets constructed with this framework effectively mitigates shortcut learning and improves the robustness and generalization of the model in OOD test scenarios, without sacrificing, and in some cases improving, in-distribution performance. The code is publicly available at https://github.com/Mint9996/HEDA.
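The abstract only sketches the mixed-dataset construction step. The following is a minimal, illustrative Python sketch of that data-centric idea, not the authors' implementation (see the HEDA repository above for the actual code). The task template, field layout, explanation wording, and the keep_ratio parameter are all assumptions made for illustration.

# Illustrative sketch of the hybrid data augmentation idea described in the
# abstract: mix (a) explanation-augmented examples, where a model-generated
# controllable explanation ties the label to robust evidence, with (b) a
# sampled portion of the original prompting data, then fine-tune on the mix.
# All names and the mixing ratio below are assumptions, not from the paper.
import random
from dataclasses import dataclass

@dataclass
class Example:
    prompt: str   # task input rendered as an instruction-style prompt
    target: str   # expected output (label, optionally with an explanation)

def make_explanation_example(text: str, label: str, explanation: str) -> Example:
    """Wrap an instance so the target links the label to an explicit rationale
    rather than to shortcut features."""
    prompt = f"Classify the sentiment and explain your reasoning.\nText: {text}"
    target = f"Label: {label}\nExplanation: {explanation}"
    return Example(prompt, target)

def make_original_example(text: str, label: str) -> Example:
    """Keep a slice of the original prompting data unchanged."""
    return Example(f"Classify the sentiment.\nText: {text}", f"Label: {label}")

def build_mixed_dataset(originals, explained, keep_ratio=0.5, seed=0):
    """Combine explanation-augmented data with a sampled portion of the
    originals. keep_ratio (an assumed hyperparameter) controls how much of
    the original data is retained in the small-scale mixed set."""
    rng = random.Random(seed)
    kept = rng.sample(originals, k=int(len(originals) * keep_ratio))
    mixed = [make_explanation_example(*e) for e in explained] + \
            [make_original_example(*o) for o in kept]
    rng.shuffle(mixed)
    return mixed

if __name__ == "__main__":
    originals = [("The plot was gripping.", "positive"),
                 ("A tedious, overlong film.", "negative")]
    explained = [("The plot was gripping.", "positive",
                  "The word 'gripping' directly praises the plot.")]
    for ex in build_mixed_dataset(originals, explained):
        print(ex.prompt, "->", ex.target.replace("\n", " | "))

Retaining part of the unmodified prompting data alongside the explanation-augmented examples is what keeps in-distribution performance from degrading while the explanations discourage reliance on spurious features.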
Keywords: shortcut learning; generative pre-trained language models; natural language understanding
Classification Code: TP18 [Automation and Computer Technology - Control Theory and Control Engineering]