Overview of the Development of AI Pre-trained Large Models    Cited: 10


Authors: CAI Rui [1,2,3]; GE Jun [1,2,3]; SUN Zhe [1,2,3]; HU Bing [1,2,3]; XU Yu-hua [1,2,3]; SUN Zhi-xin [1,2,3]

Affiliations: [1] Post Big Data Technology and Application Engineering Research Center of Jiangsu Province, Nanjing University of Posts and Telecommunications, Nanjing 210003, China; [2] Post Industry Technology Research and Development Center of the State Posts Bureau (Internet of Things Technology), Nanjing University of Posts and Telecommunications, Nanjing 210003, China; [3] Key Lab of Broadband Wireless Communication and Sensor Network Technology, Ministry of Education, Nanjing University of Posts and Telecommunications, Nanjing 210003, China

Source: Journal of Chinese Computer Systems, 2024, Issue 10, pp. 2327-2337 (11 pages)

Funding: Supported by the National Natural Science Foundation of China (62272239, 61972208) and the National Natural Science Foundation of China Youth Program (62302237).

Abstract: This paper first introduces core technologies underlying AI pre-trained large models, including the Transformer architecture, reinforcement learning from human feedback (RLHF), and proximal policy optimization (PPO). It then surveys the development of general-purpose large models, focusing on the GPT and LLaMA series built on the Transformer-decoder architecture and on the BERT, ALBERT, DeBERTa, and RoBERTa models built on the Transformer-encoder architecture; it analyzes their architectures and training methods, summarizes their characteristics, and discusses their applications in different fields. It also reviews large models in vertical domains such as finance, medicine, law, natural science, and code programming. In finance, it studies BloombergGPT, GPT-InvestAR, and TradingGPT; in medicine, it explores Med-PaLM and PMC-LLaMA; in law, it analyzes Lawformer and ChatLaw; in natural science, it introduces Huawei Cloud's Pangu-Weather model and FLUID-GPT; and in code programming, it studies CodeGeeX and PanGu-Coder2. Finally, the paper discusses the limitations and future development of current AI pre-trained large models with respect to intellectual property, discrimination, and cost.

Keywords: artificial intelligence; AI large models; general-purpose large models; vertical-domain large models

Classification: TP18 [Automation and Computer Technology: Control Theory and Control Engineering]

 
