Most Cited 2025 Oral "centerline graph" Papers

22,274 papers found • Page 1 of 112

#1

SAM 2: Segment Anything in Images and Videos

Nikhila Ravi, Valentin Gabeur, Yuan-Ting Hu et al.

ICLR 2025arXiv:2408.00714
2393
citations
#2

Beyond the Imitation Game: Quantifying and extrapolating the capabilities of language models

Clemencia Siro, Guy Gur-Ari, Gaurav Mishra et al.

ICLR 2025oralarXiv:2206.04615
2226
citations
#3

CogVideoX: Text-to-Video Diffusion Models with An Expert Transformer

Zhuoyi Yang, Jiayan Teng, Wendi Zheng et al.

ICLR 2025oralarXiv:2408.06072
1409
citations
#4

MME: A Comprehensive Evaluation Benchmark for Multimodal Large Language Models

Chaoyou Fu, Peixian Chen, Yunhang Shen et al.

NEURIPS 2025spotlightarXiv:2306.13394
1277
citations
#5

DAPO: An Open-Source LLM Reinforcement Learning System at Scale

Qiying Yu, Zheng Zhang, Ruofei Zhu et al.

NEURIPS 2025arXiv:2503.14476
1211
citations
#6

LiveCodeBench: Holistic and Contamination Free Evaluation of Large Language Models for Code

Naman Jain, Han, Alex Gu et al.

ICLR 2025arXiv:2403.07974
1108
citations
#7

YOLOv12: Attention-Centric Real-Time Object Detectors

Yunjie Tian, Qixiang Ye, DAVID DOERMANN

NEURIPS 2025arXiv:2502.12524
938
citations
#8

Video-MME: The First-Ever Comprehensive Evaluation Benchmark of Multi-modal LLMs in Video Analysis

Chaoyou Fu, Yuhan Dai, Yongdong Luo et al.

CVPR 2025highlightarXiv:2405.21075
917
citations
#9

WorldSimBench: Towards Video Generation Models as World Simulators

Yiran Qin, Zhelun Shi, Jiwen Yu et al.

ICML 2025arXiv:2410.18072
842
citations
#10

Open Problems and Fundamental Limitations of Reinforcement Learning from Human Feedback

Javier Rando, Tony Wang, Stewart Slocum et al.

ICLR 2025arXiv:2307.15217
750
citations
#11

Understanding R1-Zero-Like Training: A Critical Perspective

Zichen Liu, Changyu Chen, Wenjun Li et al.

COLM 2025paperarXiv:2503.20783
714
citations
#12

Search-R1: Training LLMs to Reason and Leverage Search Engines with Reinforcement Learning

Bowen Jin, Hansi Zeng, Zhenrui Yue et al.

COLM 2025paperarXiv:2503.09516
694
citations
#13

WizardMath: Empowering Mathematical Reasoning for Large Language Models via Reinforced Evol-Instruct

Haipeng Luo, Qingfeng Sun, Can Xu et al.

ICLR 2025arXiv:2308.09583
655
citations
#14

RAFT: Reward rAnked FineTuning for Generative Foundation Model Alignment

Jipeng Zhang, Hanze Dong, Tong Zhang et al.

ICLR 2025
642
citations
#15

VGGT: Visual Geometry Grounded Transformer

Jianyuan Wang, Minghao Chen, Nikita Karaev et al.

CVPR 2025arXiv:2503.11651
612
citations
#16

Does Reinforcement Learning Really Incentivize Reasoning Capacity in LLMs Beyond the Base Model?

Yang Yue, Zhiqi Chen, Rui Lu et al.

NEURIPS 2025oralarXiv:2504.13837
540
citations
#17

Gymnasium: A Standard Interface for Reinforcement Learning Environments

Mark Towers, Ariel Kwiatkowski, John Balis et al.

NEURIPS 2025spotlightarXiv:2407.17032
534
citations
#18

Tulu 3: Pushing Frontiers in Open Language Model Post-Training

Nathan Lambert, Jacob Morrison, Valentina Pyatkin et al.

COLM 2025paperarXiv:2411.15124
494
citations
#19

Show-o: One Single Transformer to Unify Multimodal Understanding and Generation

Jinheng Xie, Weijia Mao, Zechen Bai et al.

ICLR 2025arXiv:2408.12528
483
citations
#20

SFT Memorizes, RL Generalizes: A Comparative Study of Foundation Model Post-training

Tianzhe Chu, Yuexiang Zhai, Jihan Yang et al.

ICML 2025arXiv:2501.17161
442
citations
#21

GSM-Symbolic: Understanding the Limitations of Mathematical Reasoning in Large Language Models

Iman Mirzadeh, Keivan Alizadeh-Vahid, Hooman Shahrokhi et al.

ICLR 2025arXiv:2410.05229
436
citations
#22

Structured 3D Latents for Scalable and Versatile 3D Generation

Jianfeng XIANG, Zelong Lv, Sicheng Xu et al.

CVPR 2025highlightarXiv:2412.01506
434
citations
#23

NV-Embed: Improved Techniques for Training LLMs as Generalist Embedding Models

Chankyu Lee, Rajarshi Roy, Mengyao Xu et al.

ICLR 2025arXiv:2405.17428
419
citations
#24

BigCodeBench: Benchmarking Code Generation with Diverse Function Calls and Complex Instructions

Terry Yue Zhuo, Minh Chien Vu, Jenny Chim et al.

ICLR 2025arXiv:2406.15877
410
citations
#25

RDT-1B: a Diffusion Foundation Model for Bimanual Manipulation

Songming Liu, Lingxuan Wu, Bangguo Li et al.

ICLR 2025arXiv:2410.07864
409
citations
#26

Large Language Diffusion Models

Shen Nie, Fengqi Zhu, Zebin You et al.

NEURIPS 2025oralarXiv:2502.09992
403
citations
#27

Causal Reasoning and Large Language Models: Opening a New Frontier for Causality

Chenhao Tan, Robert Ness, Amit Sharma et al.

ICLR 2025arXiv:2305.00050
403
citations
#28

Jailbreaking Leading Safety-Aligned LLMs with Simple Adaptive Attacks

Maksym Andriushchenko, francesco croce, Nicolas Flammarion

ICLR 2025arXiv:2404.02151
401
citations
#29

SimpleRL-Zoo: Investigating and Taming Zero Reinforcement Learning for Open Base Models in the Wild

Weihao Zeng, Yuzhen Huang, Qian Liu et al.

COLM 2025paperarXiv:2503.18892
390
citations
#30

OpenHands: An Open Platform for AI Software Developers as Generalist Agents

Xingyao Wang, Boxuan Li, Yufan Song et al.

ICLR 2025arXiv:2407.16741
387
citations
#31

LIMO: Less is More for Reasoning

Yixin Ye, Zhen Huang, Yang Xiao et al.

COLM 2025paperarXiv:2502.03387
383
citations
#32

Generative Verifiers: Reward Modeling as Next-Token Prediction

Lunjun Zhang, Arian Hosseini, Hritik Bansal et al.

ICLR 2025arXiv:2408.15240
375
citations
#33

Thinking in Space: How Multimodal Large Language Models See, Remember, and Recall Spaces

Jihan Yang, Shusheng Yang, Anjali W. Gupta et al.

CVPR 2025arXiv:2412.14171
371
citations
#34

LLaVA-CoT: Let Vision Language Models Reason Step-by-Step

Guowei Xu, Peng Jin, ZiangWu ZiangWu et al.

ICCV 2025arXiv:2411.10440
360
citations
#35

From Crowdsourced Data to High-quality Benchmarks: Arena-Hard and Benchbuilder Pipeline

Tianle Li, Wei-Lin Chiang, Evan Frick et al.

ICML 2025arXiv:2406.11939
357
citations
#36

Visual-RFT: Visual Reinforcement Fine-Tuning

Ziyu Liu, Zeyi Sun, Yuhang Zang et al.

ICCV 2025arXiv:2503.01785
357
citations
#37

Training Large Language Models to Reason in a Continuous Latent Space

Shibo Hao, Sainbayar Sukhbaatar, DiJia Su et al.

COLM 2025paperarXiv:2412.06769
357
citations
#38

U-KAN Makes Strong Backbone for Medical Image Segmentation and Generation

Chenxin Li, Xinyu Liu, Wuyang Li et al.

AAAI 2025paperarXiv:2406.02918
356
citations
#39

Open-Reasoner-Zero: An Open Source Approach to Scaling Up Reinforcement Learning on the Base Model

Jingcheng Hu, Yinmin Zhang, Qi Han et al.

NEURIPS 2025arXiv:2503.24290
347
citations
#40

Representation Alignment for Generation: Training Diffusion Transformers Is Easier Than You Think

Sihyun Yu, Sangkyung Kwak, Huiwon Jang et al.

ICLR 2025arXiv:2410.06940
342
citations
#41

Scaling and evaluating sparse autoencoders

Leo Gao, Tom Dupre la Tour, Henk Tillman et al.

ICLR 2025arXiv:2406.04093
326
citations
#42

Training Language Models to Self-Correct via Reinforcement Learning

Aviral Kumar, Vincent Zhuang, Rishabh Agarwal et al.

ICLR 2025arXiv:2409.12917
324
citations
#43

Cognitive Behaviors that Enable Self-Improving Reasoners, or, Four Habits of Highly Effective STaRs

Kanishk Gandhi, Ayush K Chakravarthy, Anikait Singh et al.

COLM 2025paperarXiv:2503.01307
318
citations
#44

Transfusion: Predict the Next Token and Diffuse Images with One Multi-Modal Model

Chunting Zhou, Lili Yu, Arun Babu et al.

ICLR 2025arXiv:2408.11039
318
citations
#45

Depth Pro: Sharp Monocular Metric Depth in Less Than a Second

Alexey Bochkovskiy, Amaël Delaunoy, Hugo Germain et al.

ICLR 2025arXiv:2410.02073
316
citations
#46

Beyond the 80/20 Rule: High-Entropy Minority Tokens Drive Effective Reinforcement Learning for LLM Reasoning

Shenzhi Wang, Le Yu, Chang Gao et al.

NEURIPS 2025arXiv:2506.01939
305
citations
#47

Safety Alignment Should be Made More Than Just a Few Tokens Deep

Xiangyu Qi, Ashwinee Panda, Kaifeng Lyu et al.

ICLR 2025arXiv:2406.05946
303
citations
#48

FigStep: Jailbreaking Large Vision-Language Models via Typographic Visual Prompts

Yichen Gong, Delong Ran, Jinyuan Liu et al.

AAAI 2025paperarXiv:2311.05608
302
citations
#49

Mixture-of-Agents Enhances Large Language Model Capabilities

Junlin Wang, Jue Wang, Ben Athiwaratkun et al.

ICLR 2025arXiv:2406.04692
294
citations
#50

Janus: Decoupling Visual Encoding for Unified Multimodal Understanding and Generation

Chengyue Wu, Xiaokang Chen, Zhiyu Wu et al.

CVPR 2025arXiv:2410.13848
293
citations
#51

Can LLMs Generate Novel Research Ideas? A Large-Scale Human Study with 100+ NLP Researchers

Chenglei Si, Diyi Yang, Tatsunori Hashimoto

ICLR 2025arXiv:2409.04109
285
citations
#52

The Illusion of Thinking: Understanding the Strengths and Limitations of Reasoning Models via the Lens of Problem Complexity

Parshin Shojaee, Iman Mirzadeh, Keivan Alizadeh vahid et al.

NEURIPS 2025arXiv:2506.06941
277
citations
#53

FacTool: Factuality Detection in Generative AI -- A Tool Augmented Framework for Multi-Task and Multi-Domain Scenarios

Ethan Chern, Steffi Chern, Shiqi Chen et al.

COLM 2025paperarXiv:2307.13528
276
citations
#54

MonST3R: A Simple Approach for Estimating Geometry in the Presence of Motion

Junyi Zhang, Charles Herrmann, Junhwa Hur et al.

ICLR 2025arXiv:2410.03825
276
citations
#55

Magpie: Alignment Data Synthesis from Scratch by Prompting Aligned LLMs with Nothing

Zhangchen Xu, Fengqing Jiang, Luyao Niu et al.

ICLR 2025arXiv:2406.08464
276
citations
#56

OmniGen: Unified Image Generation

Shitao Xiao, Yueze Wang, Junjie Zhou et al.

CVPR 2025arXiv:2409.11340
271
citations
#57

rStar-Math: Small LLMs Can Master Math Reasoning with Self-Evolved Deep Thinking

Xinyu Guan, Li Lyna Zhang, Yifei Liu et al.

ICML 2025oralarXiv:2501.04519
268
citations
#58

SpinQuant: LLM Quantization with Learned Rotations

Zechun Liu, Changsheng Zhao, Igor Fedorov et al.

ICLR 2025arXiv:2405.16406
268
citations
#59

JudgeLM: Fine-tuned Large Language Models are Scalable Judges

Lianghui Zhu, Xinggang Wang, Xinlong Wang

ICLR 2025arXiv:2310.17631
265
citations
#60

R1-Onevision: Advancing Generalized Multimodal Reasoning through Cross-Modal Formalization

yi yang, Xiaoxuan He, Hongkun Pan et al.

ICCV 2025arXiv:2503.10615
265
citations
#61

MambaVision: A Hybrid Mamba-Transformer Vision Backbone

Ali Hatamizadeh, Jan Kautz

CVPR 2025arXiv:2407.08083
264
citations
#62

Sparse Feature Circuits: Discovering and Editing Interpretable Causal Graphs in Language Models

Samuel Marks, Can Rager, Eric Michaud et al.

ICLR 2025arXiv:2403.19647
263
citations
#63

Video-R1: Reinforcing Video Reasoning in MLLMs

Kaituo Feng, Kaixiong Gong, Bohao Li et al.

NEURIPS 2025oralarXiv:2503.21776
257
citations
#64

Navigating the Digital World as Humans Do: Universal Visual Grounding for GUI Agents

Boyu Gou, Demi Ruohan Wang, Boyuan Zheng et al.

ICLR 2025arXiv:2410.05243
255
citations
#65

LoRA Learns Less and Forgets Less

Jonathan Frankle, Jose Javier Gonzalez Ortiz, Cody Blakeney et al.

ICLR 2025arXiv:2405.09673
252
citations
#66

Continuous 3D Perception Model with Persistent State

Qianqian Wang, Yifei Zhang, Aleksander Holynski et al.

CVPR 2025arXiv:2501.12387
250
citations
#67

A-Mem: Agentic Memory for LLM Agents

Wujiang Xu, Zujie Liang, Kai Mei et al.

NEURIPS 2025arXiv:2502.12110
250
citations
#68

L1: Controlling How Long A Reasoning Model Thinks With Reinforcement Learning

Pranjal Aggarwal, Sean Welleck

COLM 2025paperarXiv:2503.04697
250
citations
#69

To CoT or not to CoT? Chain-of-thought helps mainly on math and symbolic reasoning

Zayne Sprague, Fangcong Yin, Juan Rodriguez et al.

ICLR 2025arXiv:2409.12183
250
citations
#70

Generating with Confidence: Uncertainty Quantification for Black-box Large Language Models

Jimeng Sun, Shubhendu Trivedi, Zhen Lin

ICLR 2025arXiv:2305.19187
248
citations
#71

CoT-VLA: Visual Chain-of-Thought Reasoning for Vision-Language-Action Models

Qingqing Zhao, Yao Lu, Moo Jin Kim et al.

CVPR 2025arXiv:2503.22020
245
citations
#72

mPLUG-Owl3: Towards Long Image-Sequence Understanding in Multi-Modal Large Language Models

Jiabo Ye, Haiyang Xu, Haowei Liu et al.

ICLR 2025arXiv:2408.04840
243
citations
#73

LLaVA-PruMerge: Adaptive Token Reduction for Efficient Large Multimodal Models

Yuzhang Shang, Mu Cai, Bingxin Xu et al.

ICCV 2025arXiv:2403.15388
234
citations
#74

Justice or Prejudice? Quantifying Biases in LLM-as-a-Judge

Jiayi Ye, Yanbo Wang, Yue Huang et al.

ICLR 2025arXiv:2410.02736
229
citations
#75

LVBench: An Extreme Long Video Understanding Benchmark

Weihan Wang, zehai he, Wenyi Hong et al.

ICCV 2025highlightarXiv:2406.08035
229
citations
#76

Pyramidal Flow Matching for Efficient Video Generative Modeling

Yang Jin, Zhicheng Sun, Ningyuan Li et al.

ICLR 2025oralarXiv:2410.05954
227
citations
#77

OminiControl: Minimal and Universal Control for Diffusion Transformer

Zhenxiong Tan, Songhua Liu, Xingyi Yang et al.

ICCV 2025highlightarXiv:2411.15098
225
citations
#78

CoTracker3: Simpler and Better Point Tracking by Pseudo-Labelling Real Videos

Nikita Karaev, Iurii Makarov, Jianyuan Wang et al.

ICCV 2025highlightarXiv:2410.11831
223
citations
#79

Generative Representational Instruction Tuning

Niklas Muennighoff, Hongjin SU, Liang Wang et al.

ICLR 2025arXiv:2402.09906
222
citations
#80

Self-Play Preference Optimization for Language Model Alignment

Yue Wu, Zhiqing Sun, Rina Hughes et al.

ICLR 2025arXiv:2405.00675
221
citations
#81

Flow-GRPO: Training Flow Matching Models via Online RL

Jie Liu, Gongye Liu, Jiajun Liang et al.

NEURIPS 2025arXiv:2505.05470
221
citations
#82

R1-VL: Learning to Reason with Multimodal Large Language Models via Step-wise Group Relative Policy Optimization

Jingyi Zhang, Jiaxing Huang, Huanjin Yao et al.

ICCV 2025arXiv:2503.12937
220
citations
#83

SparseVLM: Visual Token Sparsification for Efficient Vision-Language Model Inference

Yuan Zhang, Chun-Kai Fan, Junpeng Ma et al.

ICML 2025arXiv:2410.04417
214
citations
#84

LongVILA: Scaling Long-Context Visual Language Models for Long Videos

Yukang Chen, Fuzhao Xue, Dacheng Li et al.

ICLR 2025arXiv:2408.10188
214
citations
#85

VILA-U: a Unified Foundation Model Integrating Visual Understanding and Generation

Yecheng Wu, Zhuoyang Zhang, Junyu Chen et al.

ICLR 2025arXiv:2409.04429
208
citations
#86

AndroidWorld: A Dynamic Benchmarking Environment for Autonomous Agents

Chris Rawles, Sarah Clinckemaillie, Yifan Chang et al.

ICLR 2025arXiv:2405.14573
207
citations
#87

OpenVid-1M: A Large-Scale High-Quality Dataset for Text-to-video Generation

Kepan Nan, Rui Xie, Penghao Zhou et al.

ICLR 2025arXiv:2407.02371
207
citations
#88

PyramidKV: Dynamic KV Cache Compression based on Pyramidal Information Funneling

Zefan Cai, Yichi Zhang, Bofei Gao et al.

COLM 2025paperarXiv:2406.02069
204
citations
#89

Why Do Multi-Agent LLM Systems Fail?

Mert Cemri, Melissa Z Pan, Shuyi Yang et al.

NEURIPS 2025spotlightarXiv:2503.13657
204
citations
#90

Your Absorbing Discrete Diffusion Secretly Models the Conditional Distributions of Clean Data

Jingyang Ou, Shen Nie, Kaiwen Xue et al.

ICLR 2025arXiv:2406.03736
202
citations
#91

Infinity∞: Scaling Bitwise AutoRegressive Modeling for High-Resolution Image Synthesis

Jian Han, Jinlai Liu, Yi Jiang et al.

CVPR 2025arXiv:2412.04431
201
citations
#92

Learning to (Learn at Test Time): RNNs with Expressive Hidden States

Yu Sun, Xinhao Li, Karan Dalal et al.

ICML 2025spotlightarXiv:2407.04620
199
citations
#93

WebThinker: Empowering Large Reasoning Models with Deep Research Capability

Xiaoxi Li, Jiajie Jin, Guanting Dong et al.

NEURIPS 2025arXiv:2504.21776
198
citations
#94

Block Diffusion: Interpolating Between Autoregressive and Diffusion Language Models

Marianne Arriola, Aaron Gokaslan, Justin Chiu et al.

ICLR 2025arXiv:2503.09573
197
citations
#95

Revisiting Feature Prediction for Learning Visual Representations from Video

Quentin Garrido, Yann LeCun, Michael Rabbat et al.

ICLR 2025arXiv:2404.08471
196
citations
#96

One Step Diffusion via Shortcut Models

Kevin Frans, Danijar Hafner, Sergey Levine et al.

ICLR 2025arXiv:2410.12557
195
citations
#97

Time-MoE: Billion-Scale Time Series Foundation Models with Mixture of Experts

Xiaoming Shi, Shiyu Wang, Yuqi Nie et al.

ICLR 2025arXiv:2409.16040
194
citations
#98

MambaOut: Do We Really Need Mamba for Vision?

Weihao Yu, Xinchao Wang

CVPR 2025arXiv:2405.07992
193
citations
#99

EfficientVMamba: Atrous Selective Scan for Light Weight Visual Mamba

Xiaohuan Pei, Tao Huang, Chang Xu

AAAI 2025paperarXiv:2403.09977
192
citations
#100

Reinforcement Learning for Reasoning in Large Language Models with One Training Example

Yiping Wang, Qing Yang, Zhiyuan Zeng et al.

NEURIPS 2025arXiv:2504.20571
190
citations
#101

Inverse Scaling: When Bigger Isn't Better

Joe Cavanagh, Andrew Gritsevskiy, Najoung Kim et al.

ICLR 2025arXiv:2306.09479
186
citations
#102

Shape of Motion: 4D Reconstruction from a Single Video

Qianqian Wang, Vickie Ye, Hang Gao et al.

ICCV 2025highlightarXiv:2407.13764
186
citations
#103

Mean Flows for One-step Generative Modeling

Zhengyang Geng, Mingyang Deng, Xingjian Bai et al.

NEURIPS 2025oralarXiv:2505.13447
185
citations
#104

Reconstruction vs. Generation: Taming Optimization Dilemma in Latent Diffusion Models

Jingfeng Yao, Bin Yang, Xinggang Wang

CVPR 2025arXiv:2501.01423
184
citations
#105

LongVU: Spatiotemporal Adaptive Compression for Long Video-Language Understanding

Xiaoqian Shen, Yunyang Xiong, Changsheng Zhao et al.

ICML 2025oralarXiv:2410.17434
184
citations
#106

Advancing LLM Reasoning Generalists with Preference Trees

Lifan Yuan, Ganqu Cui, Hanbin Wang et al.

ICLR 2025arXiv:2404.02078
183
citations
#107

VL-Rethinker: Incentivizing Self-Reflection of Vision-Language Models with Reinforcement Learning

Haozhe Wang, Chao Qu, Zuming Huang et al.

NEURIPS 2025spotlightarXiv:2504.08837
183
citations
#108

Aguvis: Unified Pure Vision Agents for Autonomous GUI Interaction

Yiheng Xu, Zekun Wang, Junli Wang et al.

ICML 2025arXiv:2412.04454
182
citations
#109

VACE: All-in-One Video Creation and Editing

Zeyinzi Jiang, Zhen Han, Chaojie Mao et al.

ICCV 2025arXiv:2503.07598
181
citations
#110

Fast3R: Towards 3D Reconstruction of 1000+ Images in One Forward Pass

Jianing "Jed" Yang, Alexander Sax, Kevin Liang et al.

CVPR 2025arXiv:2501.13928
180
citations
#111

DuoAttention: Efficient Long-Context LLM Inference with Retrieval and Streaming Heads

Guangxuan Xiao, Jiaming Tang, Jingwei Zuo et al.

ICLR 2025arXiv:2410.10819
179
citations
#112

Training Language Models to Reason Efficiently

Daman Arora, Andrea Zanette

NEURIPS 2025arXiv:2502.04463
178
citations
#113

ToolRL: Reward is All Tool Learning Needs

Cheng Qian, Emre Can Acikgoz, Qi He et al.

NEURIPS 2025oralarXiv:2504.13958
178
citations
#114

Gated Delta Networks: Improving Mamba2 with Delta Rule

Songlin Yang, Jan Kautz, Ali Hatamizadeh

ICLR 2025arXiv:2412.06464
177
citations
#115

MMAU: A Massive Multi-Task Audio Understanding and Reasoning Benchmark

Sakshi, Utkarsh Tyagi, Sonal Kumar et al.

ICLR 2025arXiv:2410.19168
176
citations
#116

G-LLaVA: Solving Geometric Problem with Multi-Modal Large Language Model

Jiahui Gao, Renjie Pi, Jipeng Zhang et al.

ICLR 2025arXiv:2312.11370
174
citations
#117

Diffusion Models Are Real-Time Game Engines

Dani Valevski, Yaniv Leviathan, Moab Arar et al.

ICLR 2025arXiv:2408.14837
172
citations
#118

The Unreasonable Ineffectiveness of the Deeper Layers

Andrey Gromov, Kushal Tirumala, Hassan Shapourian et al.

ICLR 2025arXiv:2403.17887
172
citations
#119

EchoMimic: Lifelike Audio-Driven Portrait Animations through Editable Landmark Conditions

Zhiyuan Chen, Jiajiong Cao, Zhiquan Chen et al.

AAAI 2025paperarXiv:2407.08136
171
citations
#120

Rewarding Progress: Scaling Automated Process Verifiers for LLM Reasoning

Amrith Setlur, Chirag Nagpal, Adam Fisch et al.

ICLR 2025arXiv:2410.08146
169
citations
#121

OLMoE: Open Mixture-of-Experts Language Models

Niklas Muennighoff, Luca Soldaini, Dirk Groeneveld et al.

ICLR 2025arXiv:2409.02060
169
citations
#122

SVD-LLM: Truncation-aware Singular Value Decomposition for Large Language Model Compression

Xin Wang, Yu Zheng, Zhongwei Wan et al.

ICLR 2025arXiv:2403.07378
168
citations
#123

MUSE: Machine Unlearning Six-Way Evaluation for Language Models

Weijia Shi, Jaechan Lee, Yangsibo Huang et al.

ICLR 2025arXiv:2407.06460
168
citations
#124

Latent Action Pretraining from Videos

Seonghyeon Ye, Joel Jang, Byeongguk Jeon et al.

ICLR 2025arXiv:2410.11758
168
citations
#125

DiffusionDrive: Truncated Diffusion Model for End-to-End Autonomous Driving

Bencheng Liao, Shaoyu Chen, haoran yin et al.

CVPR 2025highlightarXiv:2411.15139
164
citations
#126

StreamingT2V: Consistent, Dynamic, and Extendable Long Video Generation from Text

Roberto Henschel, Levon Khachatryan, Hayk Poghosyan et al.

CVPR 2025arXiv:2403.14773
164
citations
#127

JudgeBench: A Benchmark for Evaluating LLM-Based Judges

Sijun Tan, Siyuan Zhuang, Kyle Montgomery et al.

ICLR 2025arXiv:2410.12784
163
citations
#128

MoGe: Unlocking Accurate Monocular Geometry Estimation for Open-Domain Images with Optimal Training Supervision

Ruicheng Wang, Sicheng Xu, Cassie Lee Dai et al.

CVPR 2025arXiv:2410.19115
162
citations
#129

MaskGCT: Zero-Shot Text-to-Speech with Masked Generative Codec Transformer

Yuancheng Wang, Haoyue Zhan, Liwei Liu et al.

ICLR 2025arXiv:2409.00750
161
citations
#130

MimicMotion: High-Quality Human Motion Video Generation with Confidence-aware Pose Guidance

Yuang Zhang, Jiaxi Gu, Li-Wen Wang et al.

ICML 2025oralarXiv:2406.19680
161
citations
#131

Scaling up Test-Time Compute with Latent Reasoning: A Recurrent Depth Approach

Jonas Geiping, Sean McLeish, Neel Jain et al.

NEURIPS 2025spotlightarXiv:2502.05171
158
citations
#132

DepthCrafter: Generating Consistent Long Depth Sequences for Open-world Videos

Wenbo Hu, Xiangjun Gao, Xiaoyu Li et al.

CVPR 2025highlightarXiv:2409.02095
158
citations
#133

NVILA: Efficient Frontier Visual Language Models

Zhijian Liu, Ligeng Zhu, Baifeng Shi et al.

CVPR 2025arXiv:2412.04468
157
citations
#134

Scaling Diffusion Language Models via Adaptation from Autoregressive Models

Shansan Gong, Shivam Agarwal, Yizhe Zhang et al.

ICLR 2025arXiv:2410.17891
157
citations
#135

Training Software Engineering Agents and Verifiers with SWE-Gym

Jiayi Pan, Xingyao Wang, Graham Neubig et al.

ICML 2025arXiv:2412.21139
156
citations
#136

VideoTree: Adaptive Tree-based Video Representation for LLM Reasoning on Long Videos

Ziyang Wang, Shoubin Yu, Elias Stengel-Eskin et al.

CVPR 2025arXiv:2405.19209
156
citations
#137

SWE-RL: Advancing LLM Reasoning via Reinforcement Learning on Open Software Evolution

Yuxiang Wei, Olivier Duchenne, Jade Copet et al.

NEURIPS 2025arXiv:2502.18449
156
citations
#138

Video-XL: Extra-Long Vision Language Model for Hour-Scale Video Understanding

Yan Shu, Zheng Liu, Peitian Zhang et al.

CVPR 2025arXiv:2409.14485
155
citations
#139

Simplifying, Stabilizing and Scaling Continuous-time Consistency Models

Cheng Lu, Yang Song

ICLR 2025arXiv:2410.11081
155
citations
#140

AlphaEdit: Null-Space Constrained Knowledge Editing for Language Models

Junfeng Fang, Houcheng Jiang, Kun Wang et al.

ICLR 2025arXiv:2410.02355
154
citations
#141

AFlow: Automating Agentic Workflow Generation

Jiayi Zhang, Jinyu Xiang, Zhaoyang Yu et al.

ICLR 2025arXiv:2410.10762
153
citations
#142

TraceVLA: Visual Trace Prompting Enhances Spatial-Temporal Awareness for Generalist Robotic Policies

Ruijie Zheng, Yongyuan Liang, Shuaiyi Huang et al.

ICLR 2025oralarXiv:2412.10345
153
citations
#143

SORRY-Bench: Systematically Evaluating Large Language Model Safety Refusal

Tinghao Xie, Xiangyu Qi, Yi Zeng et al.

ICLR 2025arXiv:2406.14598
151
citations
#144

WildBench: Benchmarking LLMs with Challenging Tasks from Real Users in the Wild

Bill Yuchen Lin, Yuntian Deng, Khyathi Chandu et al.

ICLR 2025arXiv:2406.04770
151
citations
#145

Navigation World Models

Amir Bar, Gaoyue Zhou, Danny Tran et al.

CVPR 2025arXiv:2412.03572
151
citations
#146

MetaMorph: Multimodal Understanding and Generation via Instruction Tuning

Shengbang Tong, David Fan, Jiachen Zhu et al.

ICCV 2025arXiv:2412.14164
150
citations
#147

Retrieval Head Mechanistically Explains Long-Context Factuality

Wenhao Wu, Yizhong Wang, Guangxuan Xiao et al.

ICLR 2025arXiv:2404.15574
150
citations
#148

World Model on Million-Length Video And Language With Blockwise RingAttention

Hao Liu, Wilson Yan, Matei Zaharia et al.

ICLR 2025oralarXiv:2402.08268
149
citations
#149

Omni-MATH: A Universal Olympiad Level Mathematic Benchmark for Large Language Models

Bofei Gao, Feifan Song, Zhe Yang et al.

ICLR 2025arXiv:2410.07985
149
citations
#150

From Slow Bidirectional to Fast Autoregressive Video Diffusion Models

Tianwei Yin, Qiang Zhang, Richard Zhang et al.

CVPR 2025arXiv:2412.07772
149
citations
#151

Absolute Zero: Reinforced Self-play Reasoning with Zero Data

Andrew Zhao, Yiran Wu, Yang Yue et al.

NEURIPS 2025spotlightarXiv:2505.03335
147
citations
#152

Titans: Learning to Memorize at Test Time

Ali Behrouz, Peilin Zhong, Vahab Mirrokni

NEURIPS 2025arXiv:2501.00663
147
citations
#153

MME-RealWorld: Could Your Multimodal LLM Challenge High-Resolution Real-World Scenarios that are Difficult for Humans?

YiFan Zhang, Huanyu Zhang, Haochen Tian et al.

ICLR 2025arXiv:2408.13257
147
citations
#154

HQ-Edit: A High-Quality Dataset for Instruction-based Image Editing

MUDE HUI, Siwei Yang, Bingchen Zhao et al.

ICLR 2025arXiv:2404.09990
146
citations
#155

DriveDreamer-2: LLM-Enhanced World Models for Diverse Driving Video Generation

Guosheng Zhao, Xiaofeng Wang, Zheng Zhu et al.

AAAI 2025paperarXiv:2403.06845
146
citations
#156

Inference Scaling Laws: An Empirical Analysis of Compute-Optimal Inference for LLM Problem-Solving

Yangzhen Wu, Zhiqing Sun, Shanda Li et al.

ICLR 2025
146
citations
#157

Scaling Large Language Model-based Multi-Agent Collaboration

Chen Qian, Zihao Xie, YiFei Wang et al.

ICLR 2025arXiv:2406.07155
146
citations
#158

Diffusion Policy Policy Optimization

Allen Ren, Justin Lidard, Lars Ankile et al.

ICLR 2025arXiv:2409.00588
146
citations
#159

Layer by Layer: Uncovering Hidden Representations in Language Models

Oscar Skean, Md Rifat Arefin, Dan Zhao et al.

ICML 2025oralarXiv:2502.02013
145
citations
#160

Segment Any 3D Gaussians

Jiazhong Cen, Jiemin Fang, Chen Yang et al.

AAAI 2025paperarXiv:2312.00860
145
citations
#161

Self Forcing: Bridging the Train-Test Gap in Autoregressive Video Diffusion

Xun Huang, Zhengqi Li, Guande He et al.

NEURIPS 2025spotlightarXiv:2506.08009
145
citations
#162

GEN3C: 3D-Informed World-Consistent Video Generation with Precise Camera Control

Xuanchi Ren, Tianchang Shen, Jiahui Huang et al.

CVPR 2025highlightarXiv:2503.03751
144
citations
#163

Physics of Language Models: Part 3.2, Knowledge Manipulation

Zeyuan Allen-Zhu, Yuanzhi Li

ICLR 2025arXiv:2309.14402
143
citations
#164

AgentHarm: A Benchmark for Measuring Harmfulness of LLM Agents

Maksym Andriushchenko, Alexandra Souly, Mateusz Dziemian et al.

ICLR 2025arXiv:2410.09024
143
citations
#165

MLE-bench: Evaluating Machine Learning Agents on Machine Learning Engineering

Jun Shern Chan, Neil Chowdhury, Oliver Jaffe et al.

ICLR 2025arXiv:2410.07095
142
citations
#166

DeepSeek-Prover-V1.5: Harnessing Proof Assistant Feedback for Reinforcement Learning and Monte-Carlo Tree Search

Huajian Xin, Z.Z. Ren, Junxiao Song et al.

ICLR 2025arXiv:2408.08152
142
citations
#167

Spider 2.0: Evaluating Language Models on Real-World Enterprise Text-to-SQL Workflows

Fangyu Lei, Jixuan Chen, Yuxiao Ye et al.

ICLR 2025arXiv:2411.07763
141
citations
#168

OpenMathInstruct-2: Accelerating AI for Math with Massive Open-Source Instruction Data

Shubham Toshniwal, Wei Du, Ivan Moshkov et al.

ICLR 2025arXiv:2410.01560
141
citations
#169

SatCLIP: Global, General-Purpose Location Embeddings with Satellite Imagery

Konstantin Klemmer, Esther Rolf, Caleb Robinson et al.

AAAI 2025paperarXiv:2311.17179
141
citations
#170

Deep Compression Autoencoder for Efficient High-Resolution Diffusion Models

Junyu Chen, Han Cai, Junsong Chen et al.

ICLR 2025arXiv:2410.10733
138
citations
#171

VITA-1.5: Towards GPT-4o Level Real-Time Vision and Speech Interaction

Chaoyou Fu, Haojia Lin, Xiong Wang et al.

NEURIPS 2025spotlightarXiv:2501.01957
138
citations
#172

Oryx MLLM: On-Demand Spatial-Temporal Understanding at Arbitrary Resolution

Zuyan Liu, Yuhao Dong, Ziwei Liu et al.

ICLR 2025oralarXiv:2409.12961
138
citations
#173

Language Prompt for Autonomous Driving

Dongming Wu, Wencheng Han, Yingfei Liu et al.

AAAI 2025paperarXiv:2309.04379
138
citations
#174

OOTDiffusion: Outfitting Fusion Based Latent Diffusion for Controllable Virtual Try-On

Yuhao Xu, Tao Gu, Weifeng Chen et al.

AAAI 2025paperarXiv:2403.01779
138
citations
#175

MegaSaM: Accurate, Fast and Robust Structure and Motion from Casual Dynamic Videos

Zhengqi Li, Richard Tucker, Forrester Cole et al.

CVPR 2025arXiv:2412.04463
136
citations
#176

C3oT: Generating Shorter Chain-of-Thought Without Compromising Effectiveness

Yu Kang, Xianghui Sun, Liangyu Chen et al.

AAAI 2025paperarXiv:2412.11664
136
citations
#177

AnyEdit: Mastering Unified High-Quality Image Editing for Any Idea

Qifan Yu, Wei Chow, Zhongqi Yue et al.

CVPR 2025arXiv:2411.15738
135
citations
#178

IRIS: LLM-Assisted Static Analysis for Detecting Security Vulnerabilities

Ziyang Li, Saikat Dutta, Mayur Naik

ICLR 2025arXiv:2405.17238
135
citations
#179

LLaMA-Omni: Seamless Speech Interaction with Large Language Models

Qingkai Fang, Shoutao Guo, Yan Zhou et al.

ICLR 2025arXiv:2409.06666
135
citations
#180

MMaDA: Multimodal Large Diffusion Language Models

Ling Yang, Ye Tian, Bowen Li et al.

NEURIPS 2025arXiv:2505.15809
135
citations
#181

Automated Design of Agentic Systems

Shengran Hu, Cong Lu, Jeff Clune

ICLR 2025arXiv:2408.08435
133
citations
#182

AdvPrompter: Fast Adaptive Adversarial Prompting for LLMs

Anselm Paulus, Arman Zharmagambetov, Chuan Guo et al.

ICML 2025arXiv:2404.16873
132
citations
#183

WavTokenizer: an Efficient Acoustic Discrete Codec Tokenizer for Audio Language Modeling

Shengpeng Ji, Ziyue Jiang, Wen Wang et al.

ICLR 2025oralarXiv:2408.16532
132
citations
#184

Adjoint Matching: Fine-tuning Flow and Diffusion Generative Models with Memoryless Stochastic Optimal Control

Carles Domingo i Enrich, Michal Drozdzal, Brian Karrer et al.

ICLR 2025arXiv:2409.08861
131
citations
#185

DepthSplat: Connecting Gaussian Splatting and Depth

Haofei Xu, Songyou Peng, Fangjinhua Wang et al.

CVPR 2025arXiv:2410.13862
131
citations
#186

Imagine While Reasoning in Space: Multimodal Visualization-of-Thought

Chengzu Li, Wenshan Wu, Huanyu Zhang et al.

ICML 2025arXiv:2501.07542
131
citations
#187

ShowUI: One Vision-Language-Action Model for GUI Visual Agent

Kevin Qinghong Lin, Linjie Li, Difei Gao et al.

CVPR 2025arXiv:2411.17465
131
citations
#188

LLMs Know More Than They Show: On the Intrinsic Representation of LLM Hallucinations

Hadas Orgad, Michael Toker, Zorik Gekhman et al.

ICLR 2025arXiv:2410.02707
129
citations
#189

Mutual Reasoning Makes Smaller LLMs Stronger Problem-Solver

Zhenting Qi, Mingyuan MA, Jiahang Xu et al.

ICLR 2025arXiv:2408.06195
129
citations
#190

Perception Encoder: The best visual embeddings are not at the output of the network

Daniel Bolya, Po-Yao Huang, Peize Sun et al.

NEURIPS 2025oralarXiv:2504.13181
129
citations
#191

Learning to Reason under Off-Policy Guidance

Jianhao Yan, Yafu Li, Zican Hu et al.

NEURIPS 2025arXiv:2504.14945
129
citations
#192

Pinwheel-shaped Convolution and Scale-based Dynamic Loss for Infrared Small Target Detection

Jiangnan Yang, Shuangli Liu, Jingjun Wu et al.

AAAI 2025paperarXiv:2412.16986
129
citations
#193

TTRL: Test-Time Reinforcement Learning

Yuxin Zuo, Kaiyan Zhang, Li Sheng et al.

NEURIPS 2025arXiv:2504.16084
129
citations
#194

TokenFlow: Unified Image Tokenizer for Multimodal Understanding and Generation

Liao Qu, Huichao Zhang, Yiheng Liu et al.

CVPR 2025arXiv:2412.03069
128
citations
#195

LongMemEval: Benchmarking Chat Assistants on Long-Term Interactive Memory

Di Wu, Hongwei Wang, Wenhao Yu et al.

ICLR 2025oralarXiv:2410.10813
128
citations
#196

WonderWorld: Interactive 3D Scene Generation from a Single Image

Hong-Xing Yu, Haoyi Duan, Charles Herrmann et al.

CVPR 2025highlightarXiv:2406.09394
128
citations
#197

VisRAG: Vision-based Retrieval-augmented Generation on Multi-modality Documents

Shi Yu, Chaoyue Tang, Bokai Xu et al.

ICLR 2025arXiv:2410.10594
127
citations
#198

LLaVA-3D: A Simple yet Effective Pathway to Empowering LMMs with 3D Capabilities

CHENMING ZHU, Tai Wang, Wenwei Zhang et al.

ICCV 2025
127
citations
#199

Lotus: Diffusion-based Visual Foundation Model for High-quality Dense Prediction

Jing He, Haodong Li, Wei Yin et al.

ICLR 2025arXiv:2409.18124
127
citations
#200

Improving Video Generation with Human Feedback

Jie Liu, Gongye Liu, Jiajun Liang et al.

NEURIPS 2025arXiv:2501.13918
127
citations
PreviousNext