Authors: ZHOU Kun (周昆); ZHU Yu-Tao (朱余韬); CHEN Zhi-Peng (陈志朋); MAO Ke-Long (毛科龙); CHEN Wen-Tong (陈文通); CHEN Yu-Shuo (陈昱硕); SUN Yi-Ding (孙一丁); CAO Qian (曹乾); WANG Lei (王磊)[2]; ZHANG Lei (张蕾)[2]; PANG Xin-Cheng (庞新程); XIE Shu-Fang (谢曙方); ZHAO Xin (赵鑫)[2]; DOU Zhi-Cheng (窦志成)[2]; LIN Yan-Kai (林衍凯); MAO Jia-Xin (毛佳昕); SONG Rui-Hua (宋睿华); CHEN Xu (陈旭)[2]; XU Jun (徐君)[2]; HU Di (胡迪); YAN Rui (严睿); HUANG Wen-Bing (黄文炳); WEI Zhe-Wei (魏哲巍); WEN Ji-Rong (文继荣)[1,2]
Affiliations: [1] School of Information, Renmin University of China, Beijing 100872; [2] Gaoling School of Artificial Intelligence, Renmin University of China, Beijing 100872
Source: Chinese Journal of Computers (《计算机学报》), 2025, No. 1, pp. 1-18 (18 pages)
Funding: Supported by the National Natural Science Foundation of China (62222215, U2001212) and the Beijing Natural Science Foundation (4222027).
Abstract: In recent years, large language models (LLMs) have become a research hotspot in natural language processing. After pre-training on large-scale data, these models exhibit strong few-shot and zero-shot in-context learning capabilities, making them readily applicable to complex tasks in real-world scenarios. However, few reference implementations exist for developing and training an LLM from scratch, and some data are inherently hard for the model to learn, such as data involving long-tail knowledge, complex instructions, and hard-to-distinguish negatives. To fill this gap and strengthen learning on such data, this paper proposes a multi-stage curriculum learning approach that addresses the three types of challenging data with three curriculum strategies: (1) an iterative pre-training curriculum that reinforces long-tail knowledge; (2) a simple-to-complex curriculum for instruction tuning; and (3) an easy-to-difficult curriculum for human alignment. These curricula are used to train our YuLan-Chat model from scratch. YuLan-Chat is evaluated on four Chinese and English benchmarks covering both the foundational capabilities of LLMs and human alignment, and the results show that it outperforms baseline models in most settings. Further analysis shows that the three-stage curriculum improves answer prediction accuracy by 9.7% (GAOKAO) in the pre-training stage and by 22.2% and 18.9% (AlignBench) in the instruction tuning and human alignment stages, respectively.
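All three training stages described above share one mechanism: ordering the training data from easier to harder before it is fed to the model. The sketch below is purely illustrative of that generic idea; the difficulty proxy (instruction length), the cumulative staging, and every name in it are assumptions for demonstration and are not the scoring or schedule actually used for YuLan-Chat.

    # Illustrative sketch of a generic easy-to-difficult curriculum scheduler.
    # Difficulty proxy and staging are assumptions, not the paper's method.
    from dataclasses import dataclass
    from typing import Callable, Iterator, List

    @dataclass
    class Example:
        instruction: str
        response: str

    def difficulty(ex: Example) -> float:
        # Hypothetical proxy: longer instructions are treated as harder.
        return len(ex.instruction.split())

    def curriculum_batches(data: List[Example],
                           n_stages: int = 3,
                           batch_size: int = 2,
                           score: Callable[[Example], float] = difficulty
                           ) -> Iterator[List[Example]]:
        """Yield batches stage by stage, from the easiest slice to the hardest."""
        ranked = sorted(data, key=score)
        stage_size = (len(ranked) + n_stages - 1) // n_stages
        for stage in range(n_stages):
            # Cumulative slices: later stages keep easy data and add harder data.
            stage_data = ranked[: (stage + 1) * stage_size]
            for i in range(0, len(stage_data), batch_size):
                yield stage_data[i : i + batch_size]

    if __name__ == "__main__":
        toy = [
            Example("Translate 'hello' to French.", "bonjour"),
            Example("Summarize the following three-paragraph article about ...", "..."),
            Example("Prove that the sum of two even integers is even, step by step.", "..."),
            Example("What is 2 + 2?", "4"),
        ]
        for batch in curriculum_batches(toy, n_stages=2, batch_size=2):
            print([ex.instruction[:30] for ex in batch])

Under these assumptions, the scheduler first trains only on the examples scored easiest and then progressively mixes in harder ones, which mirrors the simple-to-complex and easy-to-difficult ordering the abstract describes for the instruction tuning and alignment stages.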
Keywords: large language model; curriculum learning; pre-training; instruction tuning; human alignment
Classification code: TP391 [Automation and Computer Technology — Computer Application Technology]