Authors: Tai Jianwei (台建玮), Yang Shuangning (杨双宁), Wang Jiajia (王佳佳), Li Yakai (李亚凯), Liu Qixu (刘奇旭)[2], Jia Xiaoqi (贾晓启)[2]
Affiliations: [1] School of Internet, Anhui University, Hefei 230039; [2] Institute of Information Engineering, Chinese Academy of Sciences, Beijing 100093
Source: Journal of Computer Research and Development (《计算机研究与发展》), 2025, Issue 3, pp. 563-588 (26 pages)
Funding: General Program of the National Natural Science Foundation of China (71971002); Natural Science Foundation of Anhui Province (2108085QA35).
Abstract: With the rapid development of natural language processing and deep learning, large language models (LLMs) are being applied ever more deeply in text processing, language understanding, image generation, and code auditing, and have become a research hotspot in both academia and industry. However, attackers can use adversarial attacks to steer LLMs into producing erroneous, unethical, or false content, so the security threats facing LLMs and their wide-ranging applications are growing increasingly severe. This paper systematically reviews recent adversarial attack methods and defense strategies for LLMs, summarizing the fundamental principles, implementation techniques, and main findings of the relevant studies. Building on this, it presents an in-depth technical discussion of four mainstream attack modes (prompt injection attacks, indirect prompt injection attacks, jailbreak attacks, and backdoor attacks), analyzing each in terms of its mechanisms, impacts, and potential risks. Finally, it discusses the current research status and future directions of LLM security, and outlines the application prospects of LLMs combined with multimodal data analysis and integration technologies. The review aims to deepen understanding of the field and to foster more secure and reliable LLM applications.
Keywords: large language models; adversarial attacks; defense strategies; cyberspace security; generative artificial intelligence
Classification number: TP18 [Automation and Computer Technology / Control Theory and Control Engineering]