Prateek Mittal
8
Papers
463
Total Citations
Papers (8)
Safety Alignment Should be Made More Than Just a Few Tokens Deep
ICLR 2025
277
citations
SORRY-Bench: Systematically Evaluating Large Language Model Safety Refusal
ICLR 2025, arXiv
141
citations
Data Shapley in One Training Run
ICLR 2025, arXiv
44
citations
ReliabilityRAG: Effective and Provably Robust Defense for RAG-based Web-Search
NeurIPS 2025
1
citations
Assessing the Brittleness of Safety Alignment via Pruning and Low-Rank Modifications
ICML 2024
0
citations
Adapting to Evolving Adversaries with Regularized Continual Robust Training
ICML 2025
0
citations
A New Linear Scaling Rule for Private Adaptive Hyperparameter Optimization
ICML 2024
0
citations
PatchDEMUX: A Certifiably Robust Framework for Multi-label Classifiers Against Adversarial Patches
CVPR 2025
0
citations