GUARDIAN: A Multi-Tiered Defense Architecture for Thwarting Prompt Injection Attacks on LLMs

GUARDIAN: A Multi-Tiered Defense Architecture for Thwarting Prompt Injection Attacks on LLMs

作　　者：Parijat Rai Saumil Sood Vijay K. Madisetti Arshdeep Bahga Parijat Rai;Saumil Sood;Vijay K. Madisetti;Arshdeep Bahga(School of Computer Science Engineering & Technology, Bennett University, Greater Noida, India;School of Cybersecurity and Privacy, Georgia Institute of Technology, Atlanta, USA;Cloudemy Technology Labs, Chandigarh, India)

机构地区：[1]School of Computer Science Engineering & Technology, Bennett University, Greater Noida, India [2]School of Cybersecurity and Privacy, Georgia Institute of Technology, Atlanta, USA [3]Cloudemy Technology Labs, Chandigarh, India

出　　处：《Journal of Software Engineering and Applications》2024年第1期43-68,共26页软件工程与应用（英文）

摘　　要：This paper introduces a novel multi-tiered defense architecture to protect language models from adversarial prompt attacks. We construct adversarial prompts using strategies like role emulation and manipulative assistance to simulate real threats. We introduce a comprehensive, multi-tiered defense framework named GUARDIAN (Guardrails for Upholding Ethics in Language Models) comprising a system prompt filter, pre-processing filter leveraging a toxic classifier and ethical prompt generator, and pre-display filter using the model itself for output screening. Extensive testing on Meta’s Llama-2 model demonstrates the capability to block 100% of attack prompts. The approach also auto-suggests safer prompt alternatives, thereby bolstering language model security. Quantitatively evaluated defense layers and an ethical substitution mechanism represent key innovations to counter sophisticated attacks. The integrated methodology not only fortifies smaller LLMs against emerging cyber threats but also guides the broader application of LLMs in a secure and ethical manner.This paper introduces a novel multi-tiered defense architecture to protect language models from adversarial prompt attacks. We construct adversarial prompts using strategies like role emulation and manipulative assistance to simulate real threats. We introduce a comprehensive, multi-tiered defense framework named GUARDIAN (Guardrails for Upholding Ethics in Language Models) comprising a system prompt filter, pre-processing filter leveraging a toxic classifier and ethical prompt generator, and pre-display filter using the model itself for output screening. Extensive testing on Meta’s Llama-2 model demonstrates the capability to block 100% of attack prompts. The approach also auto-suggests safer prompt alternatives, thereby bolstering language model security. Quantitatively evaluated defense layers and an ethical substitution mechanism represent key innovations to counter sophisticated attacks. The integrated methodology not only fortifies smaller LLMs against emerging cyber threats but also guides the broader application of LLMs in a secure and ethical manner.

关键词：Large Language Models (LLMs) Adversarial Attack Prompt Injection Filter Defense Artificial Intelligence Machine Learning CYBERSECURITY

分类号：H31[语言文字—英语]

参考文献：

正在载入数据...

二级参考文献：

正在载入数据...

耦合文献：

正在载入数据...

引证文献：

正在载入数据...

二级引证文献：

正在载入数据...

同被引文献：

正在载入数据...

高级检索 检索式检索

时间限定

期刊范围

学科限定全选

高级检索 检索式检索

时间限定

期刊范围

学科限定全选

GUARDIAN: A Multi-Tiered Defense Architecture for Thwarting Prompt Injection Attacks on LLMs

我的收藏

参考文献：

二级参考文献：

耦合文献：

引证文献：

二级引证文献：

同被引文献：

相关期刊文献：

相关的主题

相关的作者对象

相关的机构对象

下载全文

高级检索检索式检索

时间限定

期刊范围

学科限定全选

高级检索 检索式检索

时间限定

期刊范围

学科限定全选

GUARDIAN: A Multi-Tiered Defense Architecture for Thwarting Prompt Injection Attacks on LLMs

我的收藏

参考文献：

二级参考文献：

耦合文献：

引证文献：

二级引证文献：

同被引文献：

相关期刊文献：

相关的主题

相关的作者对象

相关的机构对象

下载全文

用户登录

高级检索检索式检索