公开可验证的模型遗忘方案  

A Publicly Accountable Machine Unlearning Method


Authors: WENG Jia-Si (翁嘉思), GU Yan-Yun (辜燕云), LIU Jia-Nan (刘家男), LI Ming (李明), WENG Jian (翁健)

Affiliations: [1] College of Cyber Security, Jinan University, Guangzhou 510632, China; [2] School of Computer Science and Technology, Dongguan University of Technology, Dongguan, Guangdong 523808, China

Source: Chinese Journal of Computers (《计算机学报》), 2025, Issue 2, pp. 477-496 (20 pages)

Funding: Supported by the National Natural Science Foundation of China Young Scientists Fund (62302192, 62102166); the Key Program of the National Natural Science Foundation of China (62332007); the Joint Fund of the National Natural Science Foundation of China (U23A20303); the General Program of the Natural Science Foundation of Guangdong Province (2024A1515010086); the Guangzhou Science and Technology Plan Projects (2024A04J3691, 2024A03J0464); the 17th Batch of Special Funding of the China Postdoctoral Science Foundation (2024T170348); the Jiangsu Engineering Center for Interdisciplinary Research on Machine Learning and Cyberspace Security; the Fundamental Research Funds for the Central Universities; and the Dongguan Key Project of Science and Technology for Social Development (20231800940342)

Abstract: The rapid pace of global digitalization has brought about increasingly significant challenges related to the loss of control over personal information. In response, laws and regulations for protecting data security have been introduced both domestically and internationally. Among these, the rule of "the Right to Be Forgotten", first established under the General Data Protection Regulation (GDPR), grants data owners the right to request the removal of their data from data users. Machine Unlearning is a technique in machine learning that embodies this right, enabling the model owner (i.e., the data user) to remove specific data from a trained model to fulfill the data owner's request to withdraw their data. However, implementing and verifying the effectiveness of machine unlearning presents a number of challenges. One critical issue is determining whether the specified data has indeed been removed from the trained model. Current verification methods often rely on the assumption of a baseline model, a version of the model that has never been trained on the data in question. Verification is then conducted by comparing the parameter distribution or the output distribution of the unlearned model with that of the baseline model; if the two distributions closely match, it is inferred that the data has been effectively forgotten. This approach, however, has several limitations. In scenarios where the model owner acts maliciously, they may forge the parameters or the output distribution of the unlearned model, creating the appearance that the data has been removed even when it has not. Furthermore, tracing model parameters back to specific training data is inherently challenging, making it difficult for verifiers to definitively confirm whether the target model has truly forgotten the data in question. These challenges underscore the need for more robust verification mechanisms.

To address these limitations, this paper introduces a new publicly verifiable machine unlearning scheme that leverages cryptographic primitives. The scheme is executed between the data owner and the model owner, and whenever the model owner misbehaves, the data owner can generate non-repudiable evidence that any third party can verify. Concretely, the data owner first uses a dynamic universal accumulator to authenticate the data authorized for use, or to delete data whose authorization is withdrawn. The model owner then proves, in the publicly verifiable covert model, that training used the accumulated data or did not use the non-accumulated data. Finally, the data owner verifies the validity of the proof; if the model owner is found to have used unauthorized data, the data owner generates publicly verifiable evidence to hold the model owner accountable for the violation. Experiments evaluate the computational overhead of proof generation and verification under different data volumes, as well as the impact of deleting different data points on the model's predictions.
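
To make the protocol's first step concrete, the following toy sketch (in Python) shows a dynamic accumulator in the RSA style with addition (authorizing a record), trapdoor deletion (withdrawing authorization), and membership-witness verification. This is a minimal sketch under assumptions: the abstract only states that a dynamic universal accumulator is used, so the RSA instantiation, the tiny parameters, and the hash_to_prime mapping below are illustrative placeholders, not the paper's construction.

# Toy dynamic accumulator in the RSA style (illustrative only; the paper's
# actual accumulator construction and parameters are not specified here).
import hashlib

# Trusted-setup parameters. Real deployments use a ~2048-bit RSA modulus;
# tiny primes are chosen here so the example runs instantly.
P, Q = 1000003, 1000033  # secret factors (held by the trapdoor owner)
N = P * Q                # public modulus
PHI = (P - 1) * (Q - 1)  # trapdoor: enables deletion without re-accumulating
G = 65537                # public base value

def is_prime(n: int) -> bool:
    """Naive trial division; adequate for the 32-bit toy values used here."""
    if n < 2:
        return False
    d = 2
    while d * d <= n:
        if n % d == 0:
            return False
        d += 1
    return True

def hash_to_prime(data: bytes) -> int:
    """Map a record to a prime representative (truncated to 32 bits for the toy)."""
    x = int.from_bytes(hashlib.sha256(data).digest()[:4], "big") | 1
    while not is_prime(x):
        x += 2
    return x

def add(acc: int, data: bytes):
    """Authorize a record. The pre-addition value serves as its membership
    witness (in a real system, witnesses must be refreshed as the set changes)."""
    return pow(acc, hash_to_prime(data), N), acc

def delete(acc: int, data: bytes) -> int:
    """Withdraw a record using the trapdoor PHI; no re-accumulation needed."""
    return pow(acc, pow(hash_to_prime(data), -1, PHI), N)

def verify_member(acc: int, data: bytes, wit: int) -> bool:
    """A witness wit is valid iff wit^e == acc (mod N) for the record's prime e."""
    return pow(wit, hash_to_prime(data), N) == acc

acc = G
acc, wit = add(acc, b"record-1")                 # data owner authorizes record-1
assert verify_member(acc, b"record-1", wit)      # membership provable
acc = delete(acc, b"record-1")                   # data owner withdraws record-1
assert not verify_member(acc, b"record-1", wit)  # stale witness now fails

Running the snippet accumulates a record, checks its witness, deletes the record via the trapdoor, and confirms that the stale witness no longer verifies, mirroring the authorize/withdraw cycle the data owner performs. On top of this, the remaining two steps of the scheme can be pictured as the skeleton below; prove_unlearning, verify_unlearning_proof, and make_public_evidence are hypothetical stand-ins for the paper's proof generation in the publicly verifiable covert model, its verification, and its evidence generation, not the authors' actual API.

from dataclasses import dataclass

@dataclass
class UnlearningProof:
    accumulator: int   # accumulator snapshot that the training run commits to
    transcript: bytes  # proof that only accumulated (authorized) data was used

def prove_unlearning(model_params: bytes, accumulator: int) -> UnlearningProof:
    """Step 2 (model owner, hypothetical stub): prove in the publicly
    verifiable covert model that training used accumulated data and none
    of the deleted data."""
    return UnlearningProof(accumulator, transcript=b"<proof transcript>")

def verify_unlearning_proof(proof: UnlearningProof, expected_acc: int) -> bool:
    """Step 3 (data owner, hypothetical stub): check the proof against the
    accumulator value reflecting current authorizations and withdrawals."""
    return proof.accumulator == expected_acc and len(proof.transcript) > 0

def make_public_evidence(proof: UnlearningProof) -> bytes:
    """On a failed check (hypothetical stub): emit non-repudiable evidence
    that any third party can verify to hold the model owner accountable."""
    return b"evidence|" + proof.transcript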

Keywords: machine learning; data security; the right to be forgotten; machine unlearning; verifiability

CLC Number: TP309 [Automation and Computer Technology - Computer System Architecture]

 
