ICML

5,975 papers tracked across 2 years

Top Papers in ICML 2025

View all papers →
#1

WorldSimBench: Towards Video Generation Models as World Simulators

Yiran Qin, Zhelun Shi, Jiwen Yu et al.

ICML 2025
806
citations
#2

From Crowdsourced Data to High-quality Benchmarks: Arena-Hard and Benchbuilder Pipeline

Tianle Li, Wei-Lin Chiang, Evan Frick et al.

ICML 2025
329
citations
#3

SparseVLM: Visual Token Sparsification for Efficient Vision-Language Model Inference

Yuan Zhang, Chun-Kai Fan, Junpeng Ma et al.

ICML 2025
190
citations
#4

Aguvis: Unified Pure Vision Agents for Autonomous GUI Interaction

Yiheng Xu, Zekun Wang, Junli Wang et al.

ICML 2025
165
citations
#5

Training Software Engineering Agents and Verifiers with SWE-Gym

Jiayi Pan, Xingyao Wang, Graham Neubig et al.

ICML 2025
130
citations
#6

AdvPrompter: Fast Adaptive Adversarial Prompting for LLMs

Anselm Paulus, Arman Zharmagambetov, Chuan Guo et al.

ICML 2025
123
citations
#7

Layer by Layer: Uncovering Hidden Representations in Language Models

Oscar Skean, Md Rifat Arefin, Dan Zhao et al.

ICML 2025
118
citations
#8

Imagine While Reasoning in Space: Multimodal Visualization-of-Thought

Chengzu Li, Wenshan Wu, Huanyu Zhang et al.

ICML 2025
115
citations
#9

Emergent Misalignment: Narrow finetuning can produce broadly misaligned LLMs

Jan Betley, Daniel Tan, Niels Warncke et al.

ICML 2025
110
citations
#10

Taming Rectified Flow for Inversion and Editing

Jiangshan Wang, Junfu Pu, Zhongang Qi et al.

ICML 2025
110
citations
#11

A General Framework for Inference-time Scaling and Steering of Diffusion Models

Raghav Singhal, Zachary Horvitz, Ryan Teehan et al.

ICML 2025
103
citations
#12

Freeze-Omni: A Smart and Low Latency Speech-to-speech Dialogue Model with Frozen LLM

Xiong Wang, Yangze Li, Chaoyou Fu et al.

ICML 2025
103
citations
#13

AxBench: Steering LLMs? Even Simple Baselines Outperform Sparse Autoencoders

Zhengxuan Wu, Aryaman Arora, Atticus Geiger et al.

ICML 2025
100
citations
#14

MedXpertQA: Benchmarking Expert-Level Medical Reasoning and Understanding

Yuxin Zuo, Shang Qu, Yifei Li et al.

ICML 2025
98
citations
#15

OR-Bench: An Over-Refusal Benchmark for Large Language Models

Jiaxing Cui, Wei-Lin Chiang, Ion Stoica et al.

ICML 2025
97
citations
#16

Theoretical guarantees on the best-of-n alignment policy

Ahmad Beirami, Alekh Agarwal, Jonathan Berant et al.

ICML 2025
89
citations
#17

MME-CoT: Benchmarking Chain-of-Thought in Large Multimodal Models for Reasoning Quality, Robustness, and Efficiency

Dongzhi Jiang, Renrui Zhang, Ziyu Guo et al.

ICML 2025
88
citations
#18

Learning Smooth and Expressive Interatomic Potentials for Physical Property Prediction

Xiang Fu, Brandon Wood, Luis Barroso-Luque et al.

ICML 2025
87
citations
#19

Towards World Simulator: Crafting Physical Commonsense-Based Benchmark for Video Generation

Fanqing Meng, Jiaqi Liao, Xinyu Tan et al.

ICML 2025
72
citations
#20

Scaling Test-Time Compute Without Verification or RL is Suboptimal

Amrith Setlur, Nived Rajaraman, Sergey Levine et al.

ICML 2025
68
citations