Maksym Andriushchenko
8 papers · 375 total citations

Papers (8)
Jailbreaking Leading Safety-Aligned LLMs with Simple Adaptive Attacks
ICLR 2025 · arXiv — 375 citations
OS-Harm: A Benchmark for Measuring Safety of Computer Use Agents
NeurIPS 2025 · arXiv — 0 citations
Is In-Context Learning Sufficient for Instruction Following in LLMs?
ICLR 2025 · arXiv — 0 citations
Long Is More for Alignment: A Simple but Tough-to-Beat Baseline for Instruction Fine-Tuning
ICML 2024 — 0 citations
Square Attack: a query-efficient black-box adversarial attack via random search
ECCV 2020 — 0 citations
Understanding and Improving Fast Adversarial Training
NeurIPS 2020 · arXiv — 0 citations
Transferable Adversarial Robustness for Categorical Data via Universal Robust Embeddings
NeurIPS 2023 · arXiv — 0 citations
Sharpness-Aware Minimization Leads to Low-Rank Features
NeurIPS 2023 · arXiv — 0 citations