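Authors: XUE Dejun (薛德军); SHI Qinghui (师庆辉); BI Yanhong (毕琰虹); LU Xiaofei (芦筱菲); CHEN Jing (陈婧); WANG Xu (王旭); WANG Haishan (王海山); GENG Chong (耿崇); WU Chen (吴晨)
Affiliation: [1] Tongfang Knowledge Network Digital Publishing Technology Co., Ltd, Beijing 100192, China
Source: Digital Publishing Research (《数字出版研究》), 2024, No. 3, pp. 122-132 (11 pages)
Funding: National Key R&D Program of China, "Research on Key Technologies of Knowledge Fusion and Intelligent Interaction for Case-Oriented Legal Supervision by Procuratorial Organs" (Project No. 2020YFC0833003); National Excellence Action Plan, "International Platform Service Project for the Digital Operation of Scientific and Technical Journals" (Project No. WKZB1911BJM501173/02).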
Abstract: Large-scale, high-quality data is essential to building high-performing large models. This paper examines this core element and systematically evaluates its practical effect and potential value in professional domains. Drawing on a large body of professional literature from China National Knowledge Infrastructure (CNKI), the authors construct AcaDS, an academic resource dataset of 131.645 billion tokens, and AcaDSI, a downstream fine-tuning dataset of 27 million instructions, and design and train AcaLM-7B, a generative academic large model with 7 billion parameters based on the Transformer architecture. In experimental evaluation across six core application scenarios for academic research, AcaLM-7B ranks first in total score, first in three individual categories, and second in two, which supports the central role of large-scale, high-quality data in building professional large models. The work also has practical value for the digital publishing industry, helping to improve content production efficiency and optimize user experience.