🧬 Ethics & Safety

Privacy in ML

Privacy-preserving machine learning

100 papers · 2,121 total citations
Feb '24 – Jan '26: 757 papers in topic
Also includes: differential privacy, privacy, private learning, membership inference
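
Many of the papers below lean on one shared primitive: differential privacy reduces to bounding each example's influence and then adding calibrated noise. As a rough orientation only (this sketch is not taken from any paper in this list; the function name and parameter defaults are illustrative assumptions), here is a minimal NumPy rendering of one DP-SGD-style aggregation step:

```python
import numpy as np

def dp_sgd_step(per_example_grads, clip_norm=1.0, noise_multiplier=1.1, rng=None):
    """Illustrative DP-SGD-style aggregation: clip each per-example gradient
    to clip_norm, sum, add Gaussian noise scaled to the clipping bound, then
    average. Names and defaults are assumptions, not from any listed paper."""
    rng = np.random.default_rng(0) if rng is None else rng
    clipped = [
        g * min(1.0, clip_norm / (np.linalg.norm(g) + 1e-12))  # bound each example's influence
        for g in per_example_grads
    ]
    noisy_sum = np.sum(clipped, axis=0) + rng.normal(
        0.0, noise_multiplier * clip_norm, size=clipped[0].shape  # noise calibrated to the clip bound
    )
    return noisy_sum / len(per_example_grads)

# Toy usage: eight fake 4-dimensional gradients; in practice these come from autodiff.
grads = [np.random.randn(4) for _ in range(8)]
print(dp_sgd_step(grads))
```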

Top Papers

#1

SalUn: Empowering Machine Unlearning via Gradient-based Weight Saliency in Both Image Classification and Generation

Chongyu Fan, Jiancheng Liu, Yihua Zhang et al.

ICLR 2024
263 citations
#2

Fast Machine Unlearning without Retraining through Selective Synaptic Dampening

Jack Foster, Stefan Schoepf, Alexandra Brintrup

AAAI 2024 · arXiv:2308.07707
Tags: machine unlearning, selective synaptic dampening, fisher information matrix, post hoc unlearning (+3)
170 citations
#3

Can LLMs Keep a Secret? Testing Privacy Implications of Language Models via Contextual Integrity Theory

Niloofar Mireshghallah, Hyunwoo Kim, Xuhui Zhou et al.

ICLR 2024
158 citations
#4

MUSE: Machine Unlearning Six-Way Evaluation for Language Models

Weijia Shi, Jaechan Lee, Yangsibo Huang et al.

ICLR 2025 · arXiv:2407.06460
Tags: machine unlearning, language models, privacy leakage, verbatim memorization (+4)
157 citations
#5

EIA: Environmental Injection Attack on Generalist Web Agents for Privacy Leakage

Zeyi Liao, Lingbo Mo, Chejian Xu et al.

ICLR 2025 · arXiv:2409.11295
Tags: web agent security, privacy leakage attacks, environmental injection attack, adversarial threat modeling (+4)
106 citations
#6

AI Sandbagging: Language Models can Strategically Underperform on Evaluations

Teun van der Weij, Felix Hofstätter, Oliver Jaffe et al.

ICLR 2025
58 citations
#7

Challenging Forgets: Unveiling the Worst-Case Forget Sets in Machine Unlearning

Chongyu Fan, Jiancheng Liu, Alfred Hero et al.

ECCV 2024
50 citations
#8

Data Shapley in One Training Run

Jiachen (Tianhao) Wang, Prateek Mittal, Dawn Song et al.

ICLR 2025 · arXiv:2406.11011
Tags: data attribution, data shapley, foundation model pretraining, generative ai copyright (+3)
44 citations
#9

Investigating and Mitigating the Side Effects of Noisy Views for Self-Supervised Clustering Algorithms in Practical Multi-View Scenarios

Jie Xu, Yazhou Ren, Xiaolong Wang et al.

CVPR 2024
41 citations
#10

No Prejudice! Fair Federated Graph Neural Networks for Personalized Recommendation

Nimesh Agrawal, Anuj Sirohi, Sandeep Kumar et al.

AAAI 2024 · arXiv:2312.10080
Tags: federated learning, graph neural networks, recommendation systems, fairness constraints (+3)
39 citations
#11

SimAC: A Simple Anti-Customization Method for Protecting Face Privacy against Text-to-Image Synthesis of Diffusion Models

Feifei Wang, Zhentao Tan, Tianyi Wei et al.

CVPR 2024
37 citations
#12

DiffAM: Diffusion-based Adversarial Makeup Transfer for Facial Privacy Protection

Yuhao Sun, Lingyun Yu, Hongtao Xie et al.

CVPR 2024
36 citations
#13

Persistent Pre-training Poisoning of LLMs

Yiming Zhang, Javier Rando, Ivan Evtimov et al.

ICLR 2025
34 citations
#14

AA-CLIP: Enhancing Zero-Shot Anomaly Detection via Anomaly-Aware CLIP

Wenxin Ma, Xu Zhang, Qingsong Yao et al.

CVPR 2025
33 citations
#15

On the Relation between Trainability and Dequantization of Variational Quantum Learning Models

Elies Gil-Fuster, Casper Gyurik, Adrian Perez-Salinas et al.

ICLR 2025 · arXiv:2406.07072
Tags: variational quantum machine learning, parametrized quantum circuits, quantum kernel methods, trainability (+3)
33 citations
#16

Localization Is All You Evaluate: Data Leakage in Online Mapping Datasets and How to Fix It

Adam Lilja, Junsheng Fu, Erik Stenborg et al.

CVPR 2024
30 citations
#17

Mitigating Hallucinations in Large Vision-Language Models via DPO: On-Policy Data Hold the Key

Zhihe Yang, Xufang Luo, Dongqi Han et al.

CVPR 2025
29 citations
#18

Machine Unlearning Fails to Remove Data Poisoning Attacks

Martin Pawelczyk, Jimmy Di, Yiwei Lu et al.

ICLR 2025
28 citations
#19

A Closer Look at Machine Unlearning for Large Language Models

Xiaojian Yuan, Tianyu Pang, Chao Du et al.

ICLR 2025
28 citations
#20

Generalization Analysis of Machine Learning Algorithms via the Worst-Case Data-Generating Probability Measure

Xinying Zou, Samir Perlaza, Inaki Esnaola et al.

AAAI 2024 · arXiv:2312.12236
Tags: worst-case probability measure, generalization gap analysis, gibbs probability measure, expected loss sensitivity (+4)
26 citations
#21

Unmasking and Improving Data Credibility: A Study with Datasets for Training Harmless Language Models

Zhaowei Zhu, Jialu Wang, Hao Cheng et al.

ICLR 2024
26 citations
#22

T2ISafety: Benchmark for Assessing Fairness, Toxicity, and Privacy in Image Generation

Lijun Li, Zhelun Shi, Xuhao Hu et al.

CVPR 2025
25 citations
#23

Towards Scalable Exact Machine Unlearning Using Parameter-Efficient Fine-Tuning

Somnath Basu Roy Chowdhury, Krzysztof Choromanski, Arijit Sehanobish et al.

ICLR 2025 · arXiv:2406.16257
Tags: machine unlearning, exact unlearning, parameter-efficient fine-tuning, parameter isolation (+4)
22 citations
#24

PerAda: Parameter-Efficient Federated Learning Personalization with Generalization Guarantees

Chulin Xie, De-An Huang, Wenda Chu et al.

CVPR 2024
20 citations
#25

CrAM: Credibility-Aware Attention Modification in LLMs for Combating Misinformation in RAG

Boyi Deng, Wenjie Wang, Fengbin Zhu et al.

AAAI 2025
19 citations
#26

Encryption-Friendly LLM Architecture

Donghwan Rho, Taeseong Kim, Minje Park et al.

ICLR 2025
18 citations
#27

Progressive Poisoned Data Isolation for Training-Time Backdoor Defense

Yiming Chen, Haiwei Wu, Jiantao Zhou

AAAI 2024 · arXiv:2312.12724
Tags: backdoor attacks, data poisoning, training-time defense, poisoned data isolation (+2)
16 citations
#28

Tighter Privacy Auditing of DP-SGD in the Hidden State Threat Model

Tudor Cebere, Aurélien Bellet, Nicolas Papernot

ICLR 2025
16 citations
#29

The VLLM Safety Paradox: Dual Ease in Jailbreak Attack and Defense

Yangyang Guo, Fangkai Jiao, Liqiang Nie et al.

NeurIPS 2025
15 citations
#30

Position: Editing Large Language Models Poses Serious Safety Risks

Paul Youssef, Zhixue Zhao, Daniel Braun et al.

ICML 2025
15 citations
#31

IDProtector: An Adversarial Noise Encoder to Protect Against ID-Preserving Image Generation

Yiren Song, Pei Yang, Hai Ci et al.

CVPR 2025
14 citations
#32

MERGE: Fast Private Text Generation

Zi Liang, Pinghui Wang, Ruofei Zhang et al.

AAAI 2024 · arXiv:2305.15769
Tags: private inference, transformer-based models, natural language generation, cloud model deployment (+4)
14 citations
#33

LlavaGuard: An Open VLM-based Framework for Safeguarding Vision Datasets and Models

Lukas Helff, Felix Friedrich, Manuel Brack et al.

ICML 2025
14 citations
#34

Regroup Median Loss for Combating Label Noise

Fengpeng Li, Kemou Li, Jinyu Tian et al.

AAAI 2024 · arXiv:2312.06273
Tags: label noise, small-loss criterion, robust loss estimation, semi-supervised learning (+3)
14 citations
#35

Benchmarking Vision Language Model Unlearning via Fictitious Facial Identity Dataset

Yingzi Ma, Jiongxiao Wang, Fei Wang et al.

ICLR 2025
13 citations
#36

Mitigating the Backdoor Effect for Multi-Task Model Merging via Safety-Aware Subspace

Jinluan Yang, Anke Tang, Didi Zhu et al.

ICLR 2025
12 citations
#37

Backdoor Cleaning without External Guidance in MLLM Fine-tuning

Xuankun Rong, Wenke Huang, Jian Liang et al.

NeurIPS 2025
12 citations
#38

A Generalized Shuffle Framework for Privacy Amplification: Strengthening Privacy Guarantees and Enhancing Utility

Chen E, Yang Cao, Ge Yifei

AAAI 2024 · arXiv:2312.14388
Tags: local differential privacy, privacy amplification, shuffle model, personalized ldp (+3)
12 citations
#39

Minimum-Norm Interpolation Under Covariate Shift

Neil Mallinar, Austin Zane, Spencer Frei et al.

ICML 2024
Tags: transfer learning, covariate shift, benign overfitting, linear interpolation (+3)
12 citations
#40

SLIP: Spoof-Aware One-Class Face Anti-Spoofing with Language Image Pretraining

Pei-Kai Huang, Jun-Xiong Chong, Cheng-Hsuan Chiang et al.

AAAI 2025
11 citations
#41

DP-SGD Without Clipping: The Lipschitz Neural Network Way

Louis Béthune, Thomas Massena, Thibaut Boissin et al.

ICLR 2024
11 citations
#42

Prompt Risk Control: A Rigorous Framework for Responsible Deployment of Large Language Models

Thomas Zollo, Todd Morrill, Zhun Deng et al.

ICLR 2024
11 citations
#43

Privacy-Preserving Optics for Enhancing Protection in Face De-Identification

Jhon Lopez, Carlos Hinojosa, Henry Arguello et al.

CVPR 2024
11 citations
#44

Causal Fairness under Unobserved Confounding: A Neural Sensitivity Framework

Maresa Schröder, Dennis Frauen, Stefan Feuerriegel

ICLR 2024
11 citations
#45

Emerging Property of Masked Token for Effective Pre-training

Hyesong Choi, Hunsang Lee, Seyoung Joung et al.

ECCV 2024 · arXiv:2404.08330
Tags: masked image modeling, self-supervised learning, masked token optimization, pre-training efficiency (+3)
10 citations
#46

Breach By A Thousand Leaks: Unsafe Information Leakage in 'Safe' AI Responses

David Glukhov, Ziwen Han, I Shumailov et al.

ICLR 2025
10 citations
#47

Rewind-to-Delete: Certified Machine Unlearning for Nonconvex Functions

Siqiao Mu, Diego Klabjan

NeurIPS 2025
10 citations
#48

Poincaré Differential Privacy for Hierarchy-Aware Graph Embedding

Yuecen Wei, Haonan Yuan, Xingcheng Fu et al.

AAAI 2024 · arXiv:2312.12183
Tags: graph neural networks, differential privacy, hyperbolic geometry, graph embedding (+3)
10 citations
#49

Unlearn and Burn: Adversarial Machine Unlearning Requests Destroy Model Accuracy

Yangsibo Huang, Daogao Liu, Lynn Chua et al.

ICLR 2025
10 citations
#50

$Q\sharp$: Provably Optimal Distributional RL for LLM Post-Training

Jin Zhou, Kaiwen Wang, Jonathan Chang et al.

NeurIPS 2025 · arXiv:2502.20548
Tags: distributional reinforcement learning, kl-regularized rl, llm post-training, value-based algorithms (+4)
10 citations
#51

RSafe: Incentivizing proactive reasoning to build robust and adaptive LLM safeguards

Jingnan Zheng, Xiangtian Ji, Yijun Lu et al.

NeurIPS 2025
9 citations
#52

Synthesizing Privacy-Preserving Text Data via Finetuning *without* Finetuning Billion-Scale LLMs

Bowen Tan, Zheng Xu, Eric Xing et al.

ICML 2025
9 citations
#53

Scaling Laws for Differentially Private Language Models

Ryan McKenna, Yangsibo Huang, Amer Sinha et al.

ICML 2025
9 citations
#54

Multi-Dimensional Fair Federated Learning

Cong Su, Guoxian Yu, Jun Wang et al.

AAAI 2024 · arXiv:2312.05551
Tags: federated learning, group fairness, client fairness, differential multipliers (+3)
9 citations
#55

Data Taggants: Dataset Ownership Verification Via Harmless Targeted Data Poisoning

Wassim Bouaziz, Nicolas Usunier, El-Mahdi El-Mhamdi

ICLR 2025
8 citations
#56

On Harmonizing Implicit Subpopulations

Feng Hong, Jiangchao Yao, Yueming Lyu et al.

ICLR 2024
8 citations
#57

SLIM: Spuriousness Mitigation with Minimal Human Annotations

Xiwei Xuan, Ziquan Deng, Hsuan-Tien Lin et al.

ECCV 2024
8 citations
#58

Differentially Private Steering for Large Language Model Alignment

Anmol Goel, Yaxi Hu, Iryna Gurevych et al.

ICLR 2025
8 citations
#59

Steganographic Passport: An Owner and User Verifiable Credential for Deep Model IP Protection Without Retraining

Qi Cui, Ruohan Meng, Chaohui Xu et al.

CVPR 2024
7 citations
#60

Humanizing the Machine: Proxy Attacks to Mislead LLM Detectors

Tianchun Wang, Yuanzhou Chen, Zichuan Liu et al.

ICLR 2025
7 citations
#61

Privacy Attacks on Image AutoRegressive Models

Antoni Kowalczuk, Jan Dubiński, Franziska Boenisch et al.

ICML 2025
7 citations
#62

PPIDSG: A Privacy-Preserving Image Distribution Sharing Scheme with GAN in Federated Learning

Yuting Ma, Yuanzhi Yao, Xiaohua Xu

AAAI 2024 · arXiv:2312.10380
Tags: federated learning, privacy-preserving sharing, reconstruction attacks, inference attacks (+4)
7 citations
#63

Semantic Shield: Defending Vision-Language Models Against Backdooring and Poisoning via Fine-grained Knowledge Alignment

Alvi Md Ishmam, Chris Thomas

CVPR 2024
7 citations
#64

Privacy amplification by random allocation

Moshe Shenfeld, Vitaly Feldman

NeurIPS 2025
7 citations
#65

From Judgment to Interference: Early Stopping LLM Harmful Outputs via Streaming Content Monitoring

Yang Li, Qiang Sheng, Yehan Yang et al.

NeurIPS 2025
7 citations
#66

Improving Robustness to Model Inversion Attacks via Sparse Coding Architectures

Sayanton Vhaduri Dibbo, Adam Breuer, Juston Moore et al.

ECCV 2024
7 citations
#67

SEMU: Singular Value Decomposition for Efficient Machine Unlearning

Marcin Sendera, Łukasz Struski, Kamil Książek et al.

ICML 2025
7 citations
#68

Contrastive Private Data Synthesis via Weighted Multi-PLM Fusion

Tianyuan Zou, Yang Liu, Peng Li et al.

ICML 2025
7 citations
#69

Robustness Auditing for Linear Regression: To Singularity and Beyond

Ittai Rubinstein, Samuel Hopkins

ICLR 2025 · arXiv:2410.07916
Tags: robustness auditing, linear regression, ordinary least squares, sample removal (+3)
7 citations
#70

OpticalDR: A Deep Optical Imaging Model for Privacy-Protective Depression Recognition

Yuchen Pan, Junjun Jiang, Kui Jiang et al.

CVPR 2024
6 citations
#71

Position: The Artificial Intelligence and Machine Learning Community Should Adopt a More Transparent and Regulated Peer Review Process

Jing Yang

ICML 2025
6 citations
#72

Mask in the Mirror: Implicit Sparsification

Tom Jacobs, Rebekka Burkholz

ICLR 2025 · arXiv:2408.09966
Tags: continuous sparsification, implicit regularization, mirror flow framework, underdetermined linear regression (+2)
6 citations
#73

Enhancing Privacy-Utility Trade-offs to Mitigate Memorization in Diffusion Models

Chen Chen, Daochang Liu, Mubarak Shah et al.

CVPR 2025
6 citations
#74

Stealthy Shield Defense: A Conditional Mutual Information-Based Approach against Black-Box Model Inversion Attacks

Tianqu Zhuang, Hongyao Yu, Yixiang Qiu et al.

ICLR 2025
6 citations
#75

The Lipschitz-Variance-Margin Tradeoff for Enhanced Randomized Smoothing

Blaise Delattre, Alexandre Araujo, Quentin Barthélemy et al.

ICLR 2024
6 citations
#76

Data-Free Hard-Label Robustness Stealing Attack

Xiaojian Yuan, Kejiang Chen, Wen Huang et al.

AAAI 2024 · arXiv:2312.05924
Tags: model stealing attacks, hard-label queries, robustness stealing, data-free attacks (+4)
6 citations
#77

Differentially Private Federated Learning with Time-Adaptive Privacy Spending

Shahrzad Kianidehkordi, Nupur Kulkarni, Adam Dziedzic et al.

ICLR 2025
5 citations
#78

Strategic Classification With Externalities

Safwan Hossain, Evi Micha, Yiling Chen et al.

ICLR 2025
5 citations
#79

Protect Your Score: Contact-Tracing with Differential Privacy Guarantees

Rob Romijnders, Christos Louizos, Yuki Asano et al.

AAAI 2024 · arXiv:2312.11581
Tags: contact tracing algorithms, differential privacy guarantees, risk score communication, privacy-preserving mechanisms (+4)
5 citations
#80

Understanding Generalization in Quantum Machine Learning with Margins

Tak Hur, Daniel Kyungdeock Park

ICML 2025
5 citations
#81

Hessian-Free Online Certified Unlearning

Xinbao Qiao, Meng Zhang, Ming Tang et al.

ICLR 2025 · arXiv:2404.01712
Tags: machine unlearning, certified unlearning, online unlearning, hessian-free optimization (+4)
5 citations
#82

Near-Optimal Resilient Aggregation Rules for Distributed Learning Using 1-Center and 1-Mean Clustering with Outliers

Yuhao Yi, Ronghui You, Hong Liu et al.

AAAI 2024 · arXiv:2312.12835
Tags: byzantine machine learning, resilient aggregation mechanisms, distributed learning systems, outlier-robust clustering (+4)
5 citations
#83

Learning Safe Action Models with Partial Observability

Hai Le, Brendan Juba, Roni Stern

AAAI 2024
5 citations
#84

Unraveling the Enigma of Double Descent: An In-depth Analysis through the Lens of Learned Feature Space

Yufei Gu, Xiaoqing Zheng, Tomaso Aste

ICLR 2024
5 citations
#85

Data-adaptive Differentially Private Prompt Synthesis for In-Context Learning

Fengyu Gao, Ruida Zhou, Tianhao Wang et al.

ICLR 2025
5 citations
#86

Towards Trustworthy Federated Learning with Untrusted Participants

Youssef Allouah, Rachid Guerraoui, John Stephan

ICML 2025
5 citations
#87

Automatically Identify and Rectify: Robust Deep Contrastive Multi-view Clustering in Noisy Scenarios

Xihong Yang, Siwei Wang, Fangdi Wang et al.

ICML 2025
5 citations
#88

A Generic Framework for Conformal Fairness

Aditya Vadlamani, Anutam Srinivasan, Pranav Maneriker et al.

ICLR 2025
5 citations
#89

Privacy-Preserving Personalized Federated Prompt Learning for Multimodal Large Language Models

Linh Tran, Wei Sun, Stacy Patterson et al.

ICLR 2025
5 citations
#90

SAP: Corrective Machine Unlearning with Scaled Activation Projection for Label Noise Robustness

Sangamesh Kodge, Deepak Ravikumar, Gobinda Saha et al.

AAAI 2025
5 citations
#91

DF-MIA: A Distribution-Free Membership Inference Attack on Fine-Tuned Large Language Models

Zhiheng Huang, Yannan Liu, Daojing He et al.

AAAI 2025
4 citations
#92

X-Hacking: The Threat of Misguided AutoML

Rahul Sharma, Sumantrak Mukherjee, Andrea Šipka et al.

ICML 2025
4 citations
#93

How Far Are We from True Unlearnability?

Kai Ye, Liangcai Su, Chenxiong Qian

ICLR 2025 · arXiv:2509.08058
Tags: unlearnable examples, data poisoning, loss landscape analysis, multi-task learning (+4)
4 citations
#94

Prediction Exposes Your Face: Black-box Model Inversion via Prediction Alignment

Yufan Liu, Wanqian Zhang, Dayan Wu et al.

ECCV 2024 · arXiv:2407.08127
Tags: model inversion attack, black-box attack, prediction alignment, latent code search (+4)
4 citations
#95

Personalized Privacy Protection Mask Against Unauthorized Facial Recognition

Ka Ho Chow, Sihao Hu, Tiansheng Huang et al.

ECCV 2024 · arXiv:2407.13975
Tags: facial recognition privacy, privacy protection mask, cross-image optimization, perceptibility optimization (+3)
4 citations
#96

DCT-CryptoNets: Scaling Private Inference in the Frequency Domain

Arjun Roy, Kaushik Roy

ICLR 2025
4 citations
#97

Towards Establishing Guaranteed Error for Learned Database Operations

Sepanta Zeighami, Cyrus Shahabi

ICLR 2024
4 citations
#98

Differential Privacy Under Class Imbalance: Methods and Empirical Insights

Lucas Rosenblatt, Yuliia Lut, Ethan Turok et al.

ICML 2025
4 citations
#99

ExcluIR: Exclusionary Neural Information Retrieval

Wenhao Zhang, Mengqi Zhang, Shiguang Wu et al.

AAAI 2025
4 citations
#100

Bayesian Low-Rank Learning (Bella): A Practical Approach to Bayesian Neural Networks

Bao Gia Doan, Afshar Shamsi, Xiao-Yu Guo et al.

AAAI 2025
4 citations