Maksym Andriushchenko
9 papers · 905 total citations

Papers (9)
Formal Guarantees on the Robustness of a Classifier against Adversarial Manipulation
NeurIPS 2017 (arXiv) · 530 citations
Jailbreaking Leading Safety-Aligned LLMs with Simple Adaptive Attacks
ICLR 2025 · 375 citations
Why ReLU Networks Yield High-Confidence Predictions Far Away From the Training Data and How to Mitigate the Problem
CVPR 2019
Square Attack: a query-efficient black-box adversarial attack via random search
ECCV 2020
Long Is More for Alignment: A Simple but Tough-to-Beat Baseline for Instruction Fine-Tuning
ICML 2024
Provably robust boosted decision stumps and trees against adversarial attacks
NeurIPS 2019
Understanding and Improving Fast Adversarial Training
NeurIPS 2020
Transferable Adversarial Robustness for Categorical Data via Universal Robust Embeddings
NeurIPS 2023
Sharpness-Aware Minimization Leads to Low-Rank Features
NeurIPS 2023