ICML
5,975 papers tracked across 2 years
Top Papers in ICML 2025
View all papers →WorldSimBench: Towards Video Generation Models as World Simulators
Yiran Qin, Zhelun Shi, Jiwen Yu et al.
From Crowdsourced Data to High-quality Benchmarks: Arena-Hard and Benchbuilder Pipeline
Tianle Li, Wei-Lin Chiang, Evan Frick et al.
SparseVLM: Visual Token Sparsification for Efficient Vision-Language Model Inference
Yuan Zhang, Chun-Kai Fan, Junpeng Ma et al.
Aguvis: Unified Pure Vision Agents for Autonomous GUI Interaction
Yiheng Xu, Zekun Wang, Junli Wang et al.
Training Software Engineering Agents and Verifiers with SWE-Gym
Jiayi Pan, Xingyao Wang, Graham Neubig et al.
AdvPrompter: Fast Adaptive Adversarial Prompting for LLMs
Anselm Paulus, Arman Zharmagambetov, Chuan Guo et al.
Layer by Layer: Uncovering Hidden Representations in Language Models
Oscar Skean, Md Rifat Arefin, Dan Zhao et al.
Imagine While Reasoning in Space: Multimodal Visualization-of-Thought
Chengzu Li, Wenshan Wu, Huanyu Zhang et al.
Emergent Misalignment: Narrow finetuning can produce broadly misaligned LLMs
Jan Betley, Daniel Tan, Niels Warncke et al.
Taming Rectified Flow for Inversion and Editing
Jiangshan Wang, Junfu Pu, Zhongang Qi et al.
A General Framework for Inference-time Scaling and Steering of Diffusion Models
Raghav Singhal, Zachary Horvitz, Ryan Teehan et al.
Freeze-Omni: A Smart and Low Latency Speech-to-speech Dialogue Model with Frozen LLM
Xiong Wang, Yangze Li, Chaoyou Fu et al.
AxBench: Steering LLMs? Even Simple Baselines Outperform Sparse Autoencoders
Zhengxuan Wu, Aryaman Arora, Atticus Geiger et al.
MedXpertQA: Benchmarking Expert-Level Medical Reasoning and Understanding
Yuxin Zuo, Shang Qu, Yifei Li et al.
OR-Bench: An Over-Refusal Benchmark for Large Language Models
Jiaxing Cui, Wei-Lin Chiang, Ion Stoica et al.
Theoretical guarantees on the best-of-n alignment policy
Ahmad Beirami, Alekh Agarwal, Jonathan Berant et al.
MME-CoT: Benchmarking Chain-of-Thought in Large Multimodal Models for Reasoning Quality, Robustness, and Efficiency
Dongzhi Jiang, Renrui Zhang, Ziyu Guo et al.
Learning Smooth and Expressive Interatomic Potentials for Physical Property Prediction
Xiang Fu, Brandon Wood, Luis Barroso-Luque et al.
Towards World Simulator: Crafting Physical Commonsense-Based Benchmark for Video Generation
Fanqing Meng, Jiaqi Liao, Xinyu Tan et al.
Scaling Test-Time Compute Without Verification or RL is Suboptimal
Amrith Setlur, Nived Rajaraman, Sergey Levine et al.