A survey on the safety of large language models: classification, evaluation, attribution, mitigation, and prospects

Authors: HUANG Heyan [1]; LI Silin; LAN Tianwei; QIU Yuli; LIU Zeming; YAO Jiashu; ZENG Li; SHAN Yingyu; SHI Xiaoming; GUO Yuhang [1]

Affiliations: [1] School of Computer Science and Technology, Beijing Institute of Technology, Beijing 100081, China; [2] School of Computer Science and Engineering, Beihang University, Beijing 100191, China; [3] Research Center for Social Computing and Information Retrieval, Harbin Institute of Technology, Harbin 150001, Heilongjiang, China

Source: CAAI Transactions on Intelligent Systems (智能系统学报), 2025, No. 1, pp. 2-32 (31 pages)

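Funding: National Natural Science Foundation of China (U21B2009); Science and Technology Innovation 2030 "New Generation Artificial Intelligence" Major Project (2020AAA0106601).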

Abstract: Large language models can give answers comparable to human-level performance across many fields and tasks, and they exhibit rich emergent abilities in fields and tasks they were not trained on. However, artificial intelligence systems built on large language models currently carry many safety hazards. For example, such systems are vulnerable to attacks that are hard to detect, and the content they generate may be illegal, leak confidential information, or contain hatred, bias, or errors. Moreover, in practical applications large language models may be abused, and the generated content may cause trouble at multiple levels, including countries, social groups, and fields. This paper aims to explore in depth and classify the safety risks faced by large language models, review existing evaluation methods, study the causal mechanisms behind these risks, and summarize existing solutions. Specifically, it identifies 10 safety risks and groups them into two categories: safety risks of the model itself and safety risks of the generated content, with a detailed analysis of each risk. It also systematically analyzes the safety risks of large language models from two perspectives, life cycle and hazard level, and introduces existing methods for evaluating these risks, the causes of their occurrence, and the corresponding mitigation measures. The safety risk of large language models is an important problem that urgently needs to be solved.

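Keywords: large language models; safety of the model itself; safety of generated content; safety classification; safety risk evaluation; safety risk attribution; safety risk mitigation measures; prospects for safety research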

Classification number: TP39 [Automation and Computer Technology: Computer Application Technology]
