Most Cited 2025 Highlight "rank estimation" Papers

22,274 papers found

#1

Beyond the Imitation Game: Quantifying and extrapolating the capabilities of language models

Clemencia Siro, Guy Gur-Ari, Gaurav Mishra et al.

ICLR 2025 • oral • arXiv:2206.04615
2192 citations
#2

CogVideoX: Text-to-Video Diffusion Models with An Expert Transformer

Zhuoyi Yang, Jiayan Teng, Wendi Zheng et al.

ICLR 2025 • oral • arXiv:2408.06072
1355 citations
#3

MME: A Comprehensive Evaluation Benchmark for Multimodal Large Language Models

Chaoyou Fu, Peixian Chen, Yunhang Shen et al.

NEURIPS 2025 • spotlight • arXiv:2306.13394
1237 citations
#4

LiveCodeBench: Holistic and Contamination Free Evaluation of Large Language Models for Code

Naman Jain, King Han, Alex Gu et al.

ICLR 2025 • poster • arXiv:2403.07974
1016 citations
#5

Video-MME: The First-Ever Comprehensive Evaluation Benchmark of Multi-modal LLMs in Video Analysis

Chaoyou Fu, Yuhan Dai, Yongdong Luo et al.

CVPR 2025 • highlight • arXiv:2405.21075
858 citations
#6

WorldSimBench: Towards Video Generation Models as World Simulators

Yiran Qin, Zhelun Shi, Jiwen Yu et al.

ICML 2025 • poster • arXiv:2410.18072
806 citations
#7

Open Problems and Fundamental Limitations of Reinforcement Learning from Human Feedback

Javier Rando, Tony Wang, Stewart Slocum et al.

ICLR 2025 • poster • arXiv:2307.15217
733 citations
#8

RAFT: Reward rAnked FineTuning for Generative Foundation Model Alignment

Jipeng Zhang, Hanze Dong, Tong Zhang et al.

ICLR 2025 • poster
642 citations
#9

WizardMath: Empowering Mathematical Reasoning for Large Language Models via Reinforced Evol-Instruct

Haipeng Luo, Qingfeng Sun, Can Xu et al.

ICLR 2025 • poster • arXiv:2308.09583
637 citations
#10

VGGT: Visual Geometry Grounded Transformer

Jianyuan Wang, Minghao Chen, Nikita Karaev et al.

CVPR 2025 • poster • arXiv:2503.11651
552 citations
#11

Does Reinforcement Learning Really Incentivize Reasoning Capacity in LLMs Beyond the Base Model?

Yang Yue, Zhiqi Chen, Rui Lu et al.

NEURIPS 2025 • oral • arXiv:2504.13837
483 citations
#12

Show-o: One Single Transformer to Unify Multimodal Understanding and Generation

Jinheng Xie, Weijia Mao, Zechen Bai et al.

ICLR 2025 • poster • arXiv:2408.12528
455 citations
#13

BigCodeBench: Benchmarking Code Generation with Diverse Function Calls and Complex Instructions

Terry Yue Zhuo, Minh Chien Vu, Jenny Chim et al.

ICLR 2025 • poster • arXiv:2406.15877
397 citations
#14

Causal Reasoning and Large Language Models: Opening a New Frontier for Causality

Chenhao Tan, Robert Ness, Amit Sharma et al.

ICLR 2025 • poster • arXiv:2305.00050
390 citations
#15

Jailbreaking Leading Safety-Aligned LLMs with Simple Adaptive Attacks

Maksym Andriushchenko, Francesco Croce, Nicolas Flammarion

ICLR 2025 • poster • arXiv:2404.02151
375 citations
#16

RDT-1B: a Diffusion Foundation Model for Bimanual Manipulation

Songming Liu, Lingxuan Wu, Bangguo Li et al.

ICLR 2025 • poster • arXiv:2410.07864
365 citations
#17

OpenHands: An Open Platform for AI Software Developers as Generalist Agents

Xingyao Wang, Boxuan Li, Yufan Song et al.

ICLR 2025 • poster • arXiv:2407.16741
351 citations
#18

Generative Verifiers: Reward Modeling as Next-Token Prediction

Lunjun Zhang, Arian Hosseini, Hritik Bansal et al.

ICLR 2025 • poster • arXiv:2408.15240
348 citations
#19

Visual-RFT: Visual Reinforcement Fine-Tuning

Ziyu Liu, Zeyi Sun, Yuhang Zang et al.

ICCV 2025 • poster • arXiv:2503.01785
347 citations
#20

Thinking in Space: How Multimodal Large Language Models See, Remember, and Recall Spaces

Jihan Yang, Shusheng Yang, Anjali W. Gupta et al.

CVPR 2025 • poster • arXiv:2412.14171
342 citations
#21

LLaVA-CoT: Let Vision Language Models Reason Step-by-Step

Guowei Xu, Peng Jin, Ziang Wu et al.

ICCV 2025 • poster • arXiv:2411.10440
338 citations
#22

From Crowdsourced Data to High-quality Benchmarks: Arena-Hard and Benchbuilder Pipeline

Tianle Li, Wei-Lin Chiang, Evan Frick et al.

ICML 2025 • poster • arXiv:2406.11939
329 citations
#23

Representation Alignment for Generation: Training Diffusion Transformers Is Easier Than You Think

Sihyun Yu, Sangkyung Kwak, Huiwon Jang et al.

ICLR 2025 • poster • arXiv:2410.06940
308 citations
#24

Training Language Models to Self-Correct via Reinforcement Learning

Aviral Kumar, Vincent Zhuang, Rishabh Agarwal et al.

ICLR 2025 • poster • arXiv:2409.12917
305 citations
#25

Depth Pro: Sharp Monocular Metric Depth in Less Than a Second

Alexey Bochkovskiy, Amaël Delaunoy, Hugo Germain et al.

ICLR 2025 • poster • arXiv:2410.02073
299 citations
#26

Scaling and evaluating sparse autoencoders

Leo Gao, Tom Dupre la Tour, Henk Tillman et al.

ICLR 2025 • poster • arXiv:2406.04093
298 citations
#27

Transfusion: Predict the Next Token and Diffuse Images with One Multi-Modal Model

Chunting Zhou, Lili Yu, Arun Babu et al.

ICLR 2025 • poster • arXiv:2408.11039
294 citations
#28

FigStep: Jailbreaking Large Vision-Language Models via Typographic Visual Prompts

Yichen Gong, Delong Ran, Jinyuan Liu et al.

AAAI 2025 • paper • arXiv:2311.05608
283 citations
#29

Safety Alignment Should be Made More Than Just a Few Tokens Deep

Xiangyu Qi, Ashwinee Panda, Kaifeng Lyu et al.

ICLR 2025 • poster • arXiv:2406.05946
277 citations
#30

Mixture-of-Agents Enhances Large Language Model Capabilities

Junlin Wang, Jue Wang, Ben Athiwaratkun et al.

ICLR 2025 • poster • arXiv:2406.04692
274 citations
#31

Can LLMs Generate Novel Research Ideas? A Large-Scale Human Study with 100+ NLP Researchers

Chenglei Si, Diyi Yang, Tatsunori Hashimoto

ICLR 2025 • poster • arXiv:2409.04109
272 citations
#32

MonST3R: A Simple Approach for Estimating Geometry in the Presence of Motion

Junyi Zhang, Charles Herrmann, Junhwa Hur et al.

ICLR 2025 • poster • arXiv:2410.03825
262 citations
#33

Magpie: Alignment Data Synthesis from Scratch by Prompting Aligned LLMs with Nothing

Zhangchen Xu, Fengqing Jiang, Luyao Niu et al.

ICLR 2025 • poster • arXiv:2406.08464
261 citations
#34

JudgeLM: Fine-tuned Large Language Models are Scalable Judges

Lianghui Zhu, Xinggang Wang, Xinlong Wang

ICLR 2025 • poster • arXiv:2310.17631
258 citations
#35

The Illusion of Thinking: Understanding the Strengths and Limitations of Reasoning Models via the Lens of Problem Complexity

Parshin Shojaee, Iman Mirzadeh, Keivan Alizadeh Vahid et al.

NEURIPS 2025 • poster • arXiv:2506.06941
257 citations
#36

OmniGen: Unified Image Generation

Shitao Xiao, Yueze Wang, Junjie Zhou et al.

CVPR 2025 • poster • arXiv:2409.11340
253 citations
#37

Sparse Feature Circuits: Discovering and Editing Interpretable Causal Graphs in Language Models

Samuel Marks, Can Rager, Eric Michaud et al.

ICLR 2025 • poster • arXiv:2403.19647
252 citations
#38

SpinQuant: LLM Quantization with Learned Rotations

Zechun Liu, Changsheng Zhao, Igor Fedorov et al.

ICLR 2025 • poster • arXiv:2405.16406
248 citations
#39

R1-Onevision: Advancing Generalized Multimodal Reasoning through Cross-Modal Formalization

Yi Yang, Xiaoxuan He, Hongkun Pan et al.

ICCV 2025 • poster • arXiv:2503.10615
247 citations
#40

To CoT or not to CoT? Chain-of-thought helps mainly on math and symbolic reasoning

Zayne Sprague, Fangcong Yin, Juan Rodriguez et al.

ICLR 2025 • poster • arXiv:2409.12183
239 citations
#41

mPLUG-Owl3: Towards Long Image-Sequence Understanding in Multi-Modal Large Language Models

Jiabo Ye, Haiyang Xu, Haowei Liu et al.

ICLR 2025 • poster • arXiv:2408.04840
237 citations
#42

Continuous 3D Perception Model with Persistent State

Qianqian Wang, Yifei Zhang, Aleksander Holynski et al.

CVPR 2025 • poster • arXiv:2501.12387
236 citations
#43

Video-R1: Reinforcing Video Reasoning in MLLMs

Kaituo Feng, Kaixiong Gong, Bohao Li et al.

NEURIPS 2025 • oral • arXiv:2503.21776
236 citations
#44

Generating with Confidence: Uncertainty Quantification for Black-box Large Language Models

Jimeng Sun, Shubhendu Trivedi, Zhen Lin

ICLR 2025 • poster • arXiv:2305.19187
233 citations
#45

LoRA Learns Less and Forgets Less

Jonathan Frankle, Jose Javier Gonzalez Ortiz, Cody Blakeney et al.

ICLR 2025 • poster • arXiv:2405.09673
233 citations
#46

Pyramidal Flow Matching for Efficient Video Generative Modeling

Yang Jin, Zhicheng Sun, Ningyuan Li et al.

ICLR 2025 • oral • arXiv:2410.05954
215 citations
#47

OminiControl: Minimal and Universal Control for Diffusion Transformer

Zhenxiong Tan, Songhua Liu, Xingyi Yang et al.

ICCV 2025 • highlight • arXiv:2411.15098
214 citations
#48

Generative Representational Instruction Tuning

Niklas Muennighoff, Hongjin Su, Liang Wang et al.

ICLR 2025 • poster • arXiv:2402.09906
214 citations
#49

CoTracker3: Simpler and Better Point Tracking by Pseudo-Labelling Real Videos

Nikita Karaev, Iurii Makarov, Jianyuan Wang et al.

ICCV 2025 • highlight • arXiv:2410.11831
213 citations
#50

LVBench: An Extreme Long Video Understanding Benchmark

Weihan Wang, Zehai He, Wenyi Hong et al.

ICCV 2025 • highlight • arXiv:2406.08035
208 citations
#51

Justice or Prejudice? Quantifying Biases in LLM-as-a-Judge

Jiayi Ye, Yanbo Wang, Yue Huang et al.

ICLR 2025 • poster • arXiv:2410.02736
207 citations
#52

Self-Play Preference Optimization for Language Model Alignment

Yue Wu, Zhiqing Sun, Rina Hughes et al.

ICLR 2025 • poster • arXiv:2405.00675
207 citations
#53

R1-VL: Learning to Reason with Multimodal Large Language Models via Step-wise Group Relative Policy Optimization

Jingyi Zhang, Jiaxing Huang, Huanjin Yao et al.

ICCV 2025 • poster • arXiv:2503.12937
206 citations
#54

CoT-VLA: Visual Chain-of-Thought Reasoning for Vision-Language-Action Models

Qingqing Zhao, Yao Lu, Moo Jin Kim et al.

CVPR 2025 • poster • arXiv:2503.22020
203 citations
#55

OpenVid-1M: A Large-Scale High-Quality Dataset for Text-to-video Generation

Kepan Nan, Rui Xie, Penghao Zhou et al.

ICLR 2025 • poster • arXiv:2407.02371
200 citations
#56

SparseVLM: Visual Token Sparsification for Efficient Vision-Language Model Inference

Yuan Zhang, Chun-Kai Fan, Junpeng Ma et al.

ICML 2025 • poster • arXiv:2410.04417
190 citations
#57

Infinity∞: Scaling Bitwise AutoRegressive Modeling for High-Resolution Image Synthesis

Jian Han, Jinlai Liu, Yi Jiang et al.

CVPR 2025 • poster • arXiv:2412.04431
189 citations
#58

Why Do Multi-Agent LLM Systems Fail?

Mert Cemri, Melissa Z Pan, Shuyi Yang et al.

NEURIPS 2025 • spotlight • arXiv:2503.13657
188 citations
#59

MambaOut: Do We Really Need Mamba for Vision?

Weihao Yu, Xinchao Wang

CVPR 2025 • poster • arXiv:2405.07992
186 citations
#60

Your Absorbing Discrete Diffusion Secretly Models the Conditional Distributions of Clean Data

Jingyang Ou, Shen Nie, Kaiwen Xue et al.

ICLR 2025 • poster • arXiv:2406.03736
182 citations
#61

One Step Diffusion via Shortcut Models

Kevin Frans, Danijar Hafner, Sergey Levine et al.

ICLR 2025 • poster • arXiv:2410.12557
181 citations
#62

Inverse Scaling: When Bigger Isn't Better

Joe Cavanagh, Andrew Gritsevskiy, Najoung Kim et al.

ICLR 2025 • poster • arXiv:2306.09479
180 citations
#63

AndroidWorld: A Dynamic Benchmarking Environment for Autonomous Agents

Chris Rawles, Sarah Clinckemaillie, Yifan Chang et al.

ICLR 2025 • poster • arXiv:2405.14573
180 citations
#64

Advancing LLM Reasoning Generalists with Preference Trees

Lifan Yuan, Ganqu Cui, Hanbin Wang et al.

ICLR 2025 • poster • arXiv:2404.02078
179 citations
#65

Revisiting Feature Prediction for Learning Visual Representations from Video

Quentin Garrido, Yann LeCun, Michael Rabbat et al.

ICLR 2025 • poster • arXiv:2404.08471
178 citations
#66

Time-MoE: Billion-Scale Time Series Foundation Models with Mixture of Experts

Xiaoming Shi, Shiyu Wang, Yuqi Nie et al.

ICLR 2025 • poster • arXiv:2409.16040
178 citations
#67

WebThinker: Empowering Large Reasoning Models with Deep Research Capability

Xiaoxi Li, Jiajie Jin, Guanting Dong et al.

NEURIPS 2025 • poster • arXiv:2504.21776
174 citations
#68

VACE: All-in-One Video Creation and Editing

Zeyinzi Jiang, Zhen Han, Chaojie Mao et al.

ICCV 2025 • poster • arXiv:2503.07598
169 citations
#69

G-LLaVA: Solving Geometric Problem with Multi-Modal Large Language Model

Jiahui Gao, Renjie Pi, Jipeng Zhang et al.

ICLR 2025 • poster • arXiv:2312.11370
169 citations
#70

VL-Rethinker: Incentivizing Self-Reflection of Vision-Language Models with Reinforcement Learning

Haozhe Wang, Chao Qu, Zuming Huang et al.

NEURIPS 2025 • spotlight • arXiv:2504.08837
169 citations
#71

Block Diffusion: Interpolating Between Autoregressive and Diffusion Language Models

Marianne Arriola, Aaron Gokaslan, Justin Chiu et al.

ICLR 2025 • poster • arXiv:2503.09573
166 citations
#72

DuoAttention: Efficient Long-Context LLM Inference with Retrieval and Streaming Heads

Guangxuan Xiao, Jiaming Tang, Jingwei Zuo et al.

ICLR 2025 • poster • arXiv:2410.10819
165 citations
#73

Aguvis: Unified Pure Vision Agents for Autonomous GUI Interaction

Yiheng Xu, Zekun Wang, Junli Wang et al.

ICML 2025 • poster • arXiv:2412.04454
165 citations
#74

The Unreasonable Ineffectiveness of the Deeper Layers

Andrey Gromov, Kushal Tirumala, Hassan Shapourian et al.

ICLR 2025 • poster • arXiv:2403.17887
160 citations
#75

Reconstruction vs. Generation: Taming Optimization Dilemma in Latent Diffusion Models

Jingfeng Yao, Bin Yang, Xinggang Wang

CVPR 2025 • poster • arXiv:2501.01423
159 citations
#76

MUSE: Machine Unlearning Six-Way Evaluation for Language Models

Weijia Shi, Jaechan Lee, Yangsibo Huang et al.

ICLR 2025 • poster • arXiv:2407.06460
157 citations
#77

MaskGCT: Zero-Shot Text-to-Speech with Masked Generative Codec Transformer

Yuancheng Wang, Haoyue Zhan, Liwei Liu et al.

ICLR 2025 • poster • arXiv:2409.00750
156 citations
#78

Diffusion Models Are Real-Time Game Engines

Dani Valevski, Yaniv Leviathan, Moab Arar et al.

ICLR 2025 • poster • arXiv:2408.14837
156 citations
#79

Training Language Models to Reason Efficiently

Daman Arora, Andrea Zanette

NEURIPS 2025 • poster • arXiv:2502.04463
155 citations
#80

StreamingT2V: Consistent, Dynamic, and Extendable Long Video Generation from Text

Roberto Henschel, Levon Khachatryan, Hayk Poghosyan et al.

CVPR 2025 • poster • arXiv:2403.14773
154 citations
#81

ToolRL: Reward is All Tool Learning Needs

Cheng Qian, Emre Can Acikgoz, Qi He et al.

NEURIPS 2025 • oral • arXiv:2504.13958
152 citations
#82

JudgeBench: A Benchmark for Evaluating LLM-Based Judges

Sijun Tan, Siyuan Zhuang, Kyle Montgomery et al.

ICLR 2025 • poster • arXiv:2410.12784
150 citations
#83

Inference Scaling Laws: An Empirical Analysis of Compute-Optimal Inference for LLM Problem-Solving

Yangzhen Wu, Zhiqing Sun, Shanda Li et al.

ICLR 2025 • poster
146 citations
#84

Gated Delta Networks: Improving Mamba2 with Delta Rule

Songlin Yang, Jan Kautz, Ali Hatamizadeh

ICLR 2025 • poster • arXiv:2412.06464
145 citations
#85

World Model on Million-Length Video And Language With Blockwise RingAttention

Hao Liu, Wilson Yan, Matei Zaharia et al.

ICLR 2025 • oral • arXiv:2402.08268
144 citations
#86

Video-XL: Extra-Long Vision Language Model for Hour-Scale Video Understanding

Yan Shu, Zheng Liu, Peitian Zhang et al.

CVPR 2025 • poster • arXiv:2409.14485
144 citations
#87

Mean Flows for One-step Generative Modeling

Zhengyang Geng, Mingyang Deng, Xingjian Bai et al.

NEURIPS 2025 • oral • arXiv:2505.13447
143 citations
#88

Physics of Language Models: Part 3.2, Knowledge Manipulation

Zeyuan Allen-Zhu, Yuanzhi Li

ICLR 2025 • poster • arXiv:2309.14402
142 citations
#89

WildBench: Benchmarking LLMs with Challenging Tasks from Real Users in the Wild

Bill Yuchen Lin, Yuntian Deng, Khyathi Chandu et al.

ICLR 2025 • poster • arXiv:2406.04770
142 citations
#90

SORRY-Bench: Systematically Evaluating Large Language Model Safety Refusal

Tinghao Xie, Xiangyu Qi, Yi Zeng et al.

ICLR 2025 • poster • arXiv:2406.14598
141 citations
#91

Retrieval Head Mechanistically Explains Long-Context Factuality

Wenhao Wu, Yizhong Wang, Guangxuan Xiao et al.

ICLR 2025 • poster • arXiv:2404.15574
140 citations
#92

GEN3C: 3D-Informed World-Consistent Video Generation with Precise Camera Control

Xuanchi Ren, Tianchang Shen, Jiahui Huang et al.

CVPR 2025 • highlight • arXiv:2503.03751
138 citations
#93

AlphaEdit: Null-Space Constrained Knowledge Editing for Language Models

Junfeng Fang, Houcheng Jiang, Kun Wang et al.

ICLR 2025 • poster • arXiv:2410.02355
138 citations
#94

SatCLIP: Global, General-Purpose Location Embeddings with Satellite Imagery

Konstantin Klemmer, Esther Rolf, Caleb Robinson et al.

AAAI 2025 • paper • arXiv:2311.17179
137 citations
#95

Diffusion Policy Policy Optimization

Allen Ren, Justin Lidard, Lars Ankile et al.

ICLR 2025 • poster • arXiv:2409.00588
137 citations
#96

Navigation World Models

Amir Bar, Gaoyue Zhou, Danny Tran et al.

CVPR 2025 • poster • arXiv:2412.03572
136 citations
#97

AFlow: Automating Agentic Workflow Generation

Jiayi Zhang, Jinyu Xiang, Zhaoyang Yu et al.

ICLR 2025 • poster • arXiv:2410.10762
135 citations
#98

Scaling Diffusion Language Models via Adaptation from Autoregressive Models

Shansan Gong, Shivam Agarwal, Yizhe Zhang et al.

ICLR 2025 • poster • arXiv:2410.17891
135 citations
#99

Omni-MATH: A Universal Olympiad Level Mathematic Benchmark for Large Language Models

Bofei Gao, Feifan Song, Zhe Yang et al.

ICLR 2025 • poster • arXiv:2410.07985
135 citations
#100

Scaling up Test-Time Compute with Latent Reasoning: A Recurrent Depth Approach

Jonas Geiping, Sean McLeish, Neel Jain et al.

NEURIPS 2025 • spotlight • arXiv:2502.05171
134 citations
#101

DeepSeek-Prover-V1.5: Harnessing Proof Assistant Feedback for Reinforcement Learning and Monte-Carlo Tree Search

Huajian Xin, Z.Z. Ren, Junxiao Song et al.

ICLR 2025 • poster • arXiv:2408.08152
134 citations
#102

Training Software Engineering Agents and Verifiers with SWE-Gym

Jiayi Pan, Xingyao Wang, Graham Neubig et al.

ICML 2025 • poster • arXiv:2412.21139
130 citations
#103

VITA-1.5: Towards GPT-4o Level Real-Time Vision and Speech Interaction

Chaoyou Fu, Haojia Lin, Xiong Wang et al.

NEURIPS 2025 • spotlight • arXiv:2501.01957
130 citations
#104

AgentHarm: A Benchmark for Measuring Harmfulness of LLM Agents

Maksym Andriushchenko, Alexandra Souly, Mateusz Dziemian et al.

ICLR 2025 • poster • arXiv:2410.09024
127 citations
#105

LLaVA-3D: A Simple yet Effective Pathway to Empowering LMMs with 3D Capabilities

Chenming Zhu, Tai Wang, Wenwei Zhang et al.

ICCV 2025 • poster
127 citations
#106

Mutual Reasoning Makes Smaller LLMs Stronger Problem-Solver

Zhenting Qi, Mingyuan Ma, Jiahang Xu et al.

ICLR 2025 • poster • arXiv:2408.06195
127 citations
#107

MLE-bench: Evaluating Machine Learning Agents on Machine Learning Engineering

Jun Shern Chan, Neil Chowdhury, Oliver Jaffe et al.

ICLR 2025 • poster • arXiv:2410.07095
127 citations
#108

LLaMA-Omni: Seamless Speech Interaction with Large Language Models

Qingkai Fang, Shoutao Guo, Yan Zhou et al.

ICLR 2025 • poster • arXiv:2409.06666
127 citations
#109

WavTokenizer: an Efficient Acoustic Discrete Codec Tokenizer for Audio Language Modeling

Shengpeng Ji, Ziyue Jiang, Wen Wang et al.

ICLR 2025 • oral • arXiv:2408.16532
125 citations
#110

Interpreting Emergent Planning in Model-Free Reinforcement Learning

Thomas Bush, Stephen Chung, Usman Anwar et al.

ICLR 2025 • poster • arXiv:1901.03559
124 citations
#111

Self Forcing: Bridging the Train-Test Gap in Autoregressive Video Diffusion

Xun Huang, Zhengqi Li, Guande He et al.

NEURIPS 2025 • spotlight • arXiv:2506.08009
123 citations
#112

AdvPrompter: Fast Adaptive Adversarial Prompting for LLMs

Anselm Paulus, Arman Zharmagambetov, Chuan Guo et al.

ICML 2025 • poster • arXiv:2404.16873
123 citations
#113

ShowUI: One Vision-Language-Action Model for GUI Visual Agent

Kevin Qinghong Lin, Linjie Li, Difei Gao et al.

CVPR 2025 • poster • arXiv:2411.17465
123 citations
#114

TTRL: Test-Time Reinforcement Learning

Yuxin Zuo, Kaiyan Zhang, Li Sheng et al.

NEURIPS 2025 • poster • arXiv:2504.16084
122 citations
#115

IRIS: LLM-Assisted Static Analysis for Detecting Security Vulnerabilities

Ziyang Li, Saikat Dutta, Mayur Naik

ICLR 2025 • poster • arXiv:2405.17238
122 citations
#116

MoSca: Dynamic Gaussian Fusion from Casual Videos via 4D Motion Scaffolds

Jiahui Lei, Yijia Weng, Adam W Harley et al.

CVPR 2025 • highlight • arXiv:2405.17421
121 citations
#117

VisRAG: Vision-based Retrieval-augmented Generation on Multi-modality Documents

Shi Yu, Chaoyue Tang, Bokai Xu et al.

ICLR 2025 • poster • arXiv:2410.10594
121 citations
#118

TokenFlow: Unified Image Tokenizer for Multimodal Understanding and Generation

Liao Qu, Huichao Zhang, Yiheng Liu et al.

CVPR 2025 • poster • arXiv:2412.03069
120 citations
#119

Automated Design of Agentic Systems

Shengran Hu, Cong Lu, Jeff Clune

ICLR 2025 • poster • arXiv:2408.08435
120 citations
#120

WonderWorld: Interactive 3D Scene Generation from a Single Image

Hong-Xing Yu, Haoyi Duan, Charles Herrmann et al.

CVPR 2025 • highlight • arXiv:2406.09394
120 citations
#121

Lotus: Diffusion-based Visual Foundation Model for High-quality Dense Prediction

Jing He, Haodong Li, Wei Yin et al.

ICLR 2025 • poster • arXiv:2409.18124
120 citations
#122

From Slow Bidirectional to Fast Autoregressive Video Diffusion Models

Tianwei Yin, Qiang Zhang, Richard Zhang et al.

CVPR 2025 • poster • arXiv:2412.07772
119 citations
#123

Layer by Layer: Uncovering Hidden Representations in Language Models

Oscar Skean, Md Rifat Arefin, Dan Zhao et al.

ICML 2025 • oral • arXiv:2502.02013
118 citations
#124

LLMs Know More Than They Show: On the Intrinsic Representation of LLM Hallucinations

Hadas Orgad, Michael Toker, Zorik Gekhman et al.

ICLR 2025 • poster • arXiv:2410.02707
118 citations
#125

SuperGPQA: Scaling LLM Evaluation across 285 Graduate Disciplines

Xeron Du, Yifan Yao, Kaijing Ma et al.

NEURIPS 2025 • poster • arXiv:2502.14739
118 citations
#126

Eagle: Exploring The Design Space for Multimodal LLMs with Mixture of Encoders

Min Shi, Fuxiao Liu, Shihao Wang et al.

ICLR 2025 • poster • arXiv:2408.15998
116 citations
#127

Adjoint Matching: Fine-tuning Flow and Diffusion Generative Models with Memoryless Stochastic Optimal Control

Carles Domingo i Enrich, Michal Drozdzal, Brian Karrer et al.

ICLR 2025 • poster • arXiv:2409.08861
116 citations
#128

CHASE-SQL: Multi-Path Reasoning and Preference Optimized Candidate Selection in Text-to-SQL

Mohammadreza Pourreza, Hailong Li, Ruoxi Sun et al.

ICLR 2025 • poster • arXiv:2410.01943
116 citations
#129

Data Scaling Laws in Imitation Learning for Robotic Manipulation

Fanqi Lin, Yingdong Hu, Pingyue Sheng et al.

ICLR 2025 • poster • arXiv:2410.18647
115 citations
#130

SpikeGPT: Generative Pre-trained Language Model with Spiking Neural Networks

Rui-Jie Zhu, Qihang Zhao, Jason Eshraghian et al.

ICLR 2025 • poster • arXiv:2302.13939
115 citations
#131

Imagine While Reasoning in Space: Multimodal Visualization-of-Thought

Chengzu Li, Wenshan Wu, Huanyu Zhang et al.

ICML 2025 • poster • arXiv:2501.07542
115 citations
#132

Samba: Simple Hybrid State Space Models for Efficient Unlimited Context Language Modeling

Liliang Ren, Yang Liu, Yadong Lu et al.

ICLR 2025 • poster • arXiv:2406.07522
115 citations
#133

Rethinking Joint Maximum Mean Discrepancy for Visual Domain Adaptation

Wei Wang, Haifeng Xia, Chao Huang et al.

NEURIPS 2025 • oral
115 citations
#134

C3oT: Generating Shorter Chain-of-Thought Without Compromising Effectiveness

Yu Kang, Xianghui Sun, Liangyu Chen et al.

AAAI 2025 • paper • arXiv:2412.11664
115 citations
#135

Pinwheel-shaped Convolution and Scale-based Dynamic Loss for Infrared Small Target Detection

Jiangnan Yang, Shuangli Liu, Jingjun Wu et al.

AAAI 2025 • paper • arXiv:2412.16986
115 citations
#136

ToolACE: Winning the Points of LLM Function Calling

Weiwen Liu, Xu Huang, Xingshan Zeng et al.

ICLR 2025 • poster • arXiv:2409.00920
114 citations
#137

LongMemEval: Benchmarking Chat Assistants on Long-Term Interactive Memory

Di Wu, Hongwei Wang, Wenhao Yu et al.

ICLR 2025 • oral • arXiv:2410.10813
114 citations
#138

VD3D: Taming Large Video Diffusion Transformers for 3D Camera Control

Sherwin Bahmani, Ivan Skorokhodov, Aliaksandr Siarohin et al.

ICLR 2025 • oral • arXiv:2407.12781
114 citations
#139

WebRL: Training LLM Web Agents via Self-Evolving Online Curriculum Reinforcement Learning

Zehan Qi, Xiao Liu, Iat Long Iong et al.

ICLR 2025 • poster • arXiv:2411.02337
113 citations
#140

MuirBench: A Comprehensive Benchmark for Robust Multi-image Understanding

Fei Wang, Xingyu Fu, James Y. Huang et al.

ICLR 2025 • oral • arXiv:2406.09411
113 citations
#141

Improving Video Generation with Human Feedback

Jie Liu, Gongye Liu, Jiajun Liang et al.

NEURIPS 2025 • poster • arXiv:2501.13918
112 citations
#142

Fluid: Scaling Autoregressive Text-to-image Generative Models with Continuous Tokens

Lijie Fan, Tianhong Li, Siyang Qin et al.

ICLR 2025 • poster • arXiv:2410.13863
112 citations
#143

Physics of Language Models: Part 3.3, Knowledge Capacity Scaling Laws

Zeyuan Allen-Zhu, Yuanzhi Li

ICLR 2025 • poster • arXiv:2404.05405
111 citations
#144

Taming Rectified Flow for Inversion and Editing

Jiangshan Wang, Junfu Pu, Zhongang Qi et al.

ICML 2025 • poster • arXiv:2411.04746
110 citations
#145

Emergent Misalignment: Narrow finetuning can produce broadly misaligned LLMs

Jan Betley, Daniel Tan, Niels Warncke et al.

ICML 2025 • oral • arXiv:2502.17424
110 citations
#146

Scaling up Masked Diffusion Models on Text

Shen Nie, Fengqi Zhu, Chao Du et al.

ICLR 2025 • oral • arXiv:2410.18514
110 citations
#147

Tamper-Resistant Safeguards for Open-Weight LLMs

Rishub Tamirisa, Bhrugu Bharathi, Long Phan et al.

ICLR 2025 • poster • arXiv:2408.00761
108 citations
#148

Cobra: Extending Mamba to Multi-Modal Large Language Model for Efficient Inference

Han Zhao, Min Zhang, Wei Zhao et al.

AAAI 2025 • paper • arXiv:2403.14520
106 citations
#149

LLaVA-Mini: Efficient Image and Video Large Multimodal Models with One Vision Token

Shaolei Zhang, Qingkai Fang, Zhe Yang et al.

ICLR 2025 • poster • arXiv:2501.03895
106 citations
#150

AutoDAN-Turbo: A Lifelong Agent for Strategy Self-Exploration to Jailbreak LLMs

Xiaogeng Liu, Peiran Li, G. Edward Suh et al.

ICLR 2025 • poster • arXiv:2410.05295
106 citations
#151

EIA: Environmental Injection Attack on Generalist Web Agents for Privacy Leakage

Zeyi Liao, Lingbo Mo, Chejian Xu et al.

ICLR 2025 • poster • arXiv:2409.11295
106 citations
#152

CAT4D: Create Anything in 4D with Multi-View Video Diffusion Models

Rundi Wu, Ruiqi Gao, Ben Poole et al.

CVPR 2025 • poster • arXiv:2411.18613
105 citations
#153

MS-Diffusion: Multi-subject Zero-shot Image Personalization with Layout Guidance

Xierui Wang, Siming Fu, Qihan Huang et al.

ICLR 2025 • poster • arXiv:2406.07209
104 citations
#154

A General Framework for Inference-time Scaling and Steering of Diffusion Models

Raghav Singhal, Zachary Horvitz, Ryan Teehan et al.

ICML 2025 • poster • arXiv:2501.06848
103 citations
#155

Agent Security Bench (ASB): Formalizing and Benchmarking Attacks and Defenses in LLM-based Agents

Hanrong Zhang, Jingyuan Huang, Kai Mei et al.

ICLR 2025 • poster • arXiv:2410.02644
103 citations
#156

Masked Diffusion Models are Secretly Time-Agnostic Masked Models and Exploit Inaccurate Categorical Sampling

Kaiwen Zheng, Yongxin Chen, Hanzi Mao et al.

ICLR 2025 • poster • arXiv:2409.02908
103 citations
#157

HelpSteer2-Preference: Complementing Ratings with Preferences

Zhilin Wang, Alexander Bukharin, Olivier Delalleau et al.

ICLR 2025 • poster • arXiv:2410.01257
103 citations
#158

OmniRe: Omni Urban Scene Reconstruction

Ziyu Chen, Jiawei Yang, Jiahui Huang et al.

ICLR 2025 • poster • arXiv:2408.16760
103 citations
#159

Freeze-Omni: A Smart and Low Latency Speech-to-speech Dialogue Model with Frozen LLM

Xiong Wang, Yangze Li, Chaoyou Fu et al.

ICML 2025 • poster • arXiv:2411.00774
103 citations
#160

VideoPhy: Evaluating Physical Commonsense for Video Generation

Hritik Bansal, Zongyu Lin, Tianyi Xie et al.

ICLR 2025 • poster • arXiv:2406.03520
102 citations
#161

AuroraCap: Efficient, Performant Video Detailed Captioning and a New Benchmark

Wenhao Chai, Enxin Song, Yilun Du et al.

ICLR 2025 • oral • arXiv:2410.03051
102 citations
#162

Mulberry: Empowering MLLM with o1-like Reasoning and Reflection via Collective Monte Carlo Tree Search

Huanjin Yao, Jiaxing Huang, Wenhao Wu et al.

NEURIPS 2025 • spotlight • arXiv:2412.18319
102 citations
#163

EAGLE-3: Scaling up Inference Acceleration of Large Language Models via Training-Time Test

Yuhui Li, Fangyun Wei, Chao Zhang et al.

NEURIPS 2025 • poster • arXiv:2503.01840
102 citations
#164

Autoregressive Video Generation without Vector Quantization

Haoge Deng, Ting Pan, Haiwen Diao et al.

ICLR 2025 • oral • arXiv:2412.14169
101 citations
#165

SWE-bench Multimodal: Do AI Systems Generalize to Visual Software Domains?

John Yang, Carlos E Jimenez, Alex Zhang et al.

ICLR 2025 • poster • arXiv:2410.03859
101 citations
#166

MeshAnything: Artist-Created Mesh Generation with Autoregressive Transformers

Yiwen Chen, Tong He, Di Huang et al.

ICLR 2025 • poster • arXiv:2406.10163
101 citations
#167

AxBench: Steering LLMs? Even Simple Baselines Outperform Sparse Autoencoders

Zhengxuan Wu, Aryaman Arora, Atticus Geiger et al.

ICML 2025 • spotlight • arXiv:2501.17148
100 citations
#168

On the self-verification limitations of large language models on reasoning and planning tasks

Kaya Stechly, Karthik Valmeekam, Subbarao Kambhampati

ICLR 2025 • poster • arXiv:2402.08115
100 citations
#169

LongWriter: Unleashing 10,000+ Word Generation from Long Context LLMs

Yushi Bai, Jiajie Zhang, Xin Lv et al.

ICLR 2025 • poster • arXiv:2408.07055
100 citations
#170

Agent S: An Open Agentic Framework that Uses Computers Like a Human

Saaket Agashe, Jiuzhou Han, Shuyu Gan et al.

ICLR 2025 • poster • arXiv:2410.08164
100 citations
#171

TableBench: A Comprehensive and Complex Benchmark for Table Question Answering

Xianjie Wu, Jian Yang, Linzheng Chai et al.

AAAI 2025 • paper • arXiv:2408.09174
99 citations
#172

ProRL: Prolonged Reinforcement Learning Expands Reasoning Boundaries in Large Language Models

Mingjie Liu, Shizhe Diao, Ximing Lu et al.

NEURIPS 2025 • poster • arXiv:2505.24864
99 citations
#173

RegMix: Data Mixture as Regression for Language Model Pre-training

Qian Liu, Xiaosen Zheng, Niklas Muennighoff et al.

ICLR 2025 • poster • arXiv:2407.01492
99 citations
#174

Physics of Language Models: Part 2.1, Grade-School Math and the Hidden Reasoning Process

Tian Ye, Zicheng Xu, Yuanzhi Li et al.

ICLR 2025 • poster • arXiv:2407.20311
98 citations
#175

FoundationStereo: Zero-Shot Stereo Matching

Bowen Wen, Matthew Trepte, Oluwaseun Joseph Aribido et al.

CVPR 2025 • poster • arXiv:2501.09898
98 citations
#176

Feast Your Eyes: Mixture-of-Resolution Adaptation for Multimodal Large Language Models

Gen Luo, Yiyi Zhou, Yuxin Zhang et al.

ICLR 2025 • poster • arXiv:2403.03003
98 citations
#177

MedXpertQA: Benchmarking Expert-Level Medical Reasoning and Understanding

Yuxin Zuo, Shang Qu, Yifei Li et al.

ICML 2025 • poster • arXiv:2501.18362
98 citations
#178

RM-Bench: Benchmarking Reward Models of Language Models with Subtlety and Style

Yantao Liu, Zijun Yao, Rui Min et al.

ICLR 2025 • poster • arXiv:2410.16184
97 citations
#179

Semantic Image Inversion and Editing using Rectified Stochastic Differential Equations

Litu Rout, Yujia Chen, Nataniel Ruiz et al.

ICLR 2025 • poster • arXiv:2410.10792
97 citations
#180

OR-Bench: An Over-Refusal Benchmark for Large Language Models

Jiaxing Cui, Wei-Lin Chiang, Ion Stoica et al.

ICML 2025 • poster • arXiv:2405.20947
97 citations
#181

Molmo and PixMo: Open Weights and Open Data for State-of-the-Art Vision-Language Models

Matt Deitke, Christopher Clark, Sangho Lee et al.

CVPR 2025 • poster • arXiv:2409.17146
96 citations
#182

Transformers without Normalization

Jiachen Zhu, Xinlei Chen, Kaiming He et al.

CVPR 2025 • poster • arXiv:2503.10622
96 citations
#183

GUIOdyssey: A Comprehensive Dataset for Cross-App GUI Navigation on Mobile Devices

Quanfeng Lu, Wenqi Shao, Zitao Liu et al.

ICCV 2025 • poster • arXiv:2406.08451
96 citations
#184

Towards Thinking-Optimal Scaling of Test-Time Compute for LLM Reasoning

Wenkai Yang, Shuming Ma, Yankai Lin et al.

NEURIPS 2025 • poster • arXiv:2502.18080
96 citations
#185

Ada-KV: Optimizing KV Cache Eviction by Adaptive Budget Allocation for Efficient LLM Inference

Yuan Feng, Junlin Lv, Yukun Cao et al.

NEURIPS 2025 • poster • arXiv:2407.11550
95 citations
#186

LLaVA-Critic: Learning to Evaluate Multimodal Models

Tianyi Xiong, Xiyao Wang, Dong Guo et al.

CVPR 2025 • poster • arXiv:2410.02712
95 citations
#187

AReaL: A Large-Scale Asynchronous Reinforcement Learning System for Language Reasoning

Wei Fu, Jiaxuan Gao, Xujie Shen et al.

NEURIPS 2025 • poster • arXiv:2505.24298
95 citations
#188

MoBA: Mixture of Block Attention for Long-Context LLMs

Enzhe Lu, Zhejun Jiang, Jingyuan Liu et al.

NEURIPS 2025 • spotlight • arXiv:2502.13189
94 citations
#189

DEIM: DETR with Improved Matching for Fast Convergence

Shihua Huang, Zhichao Lu, Xiaodong Cun et al.

CVPR 2025 • poster • arXiv:2412.04234
93 citations
#190

MLVU: Benchmarking Multi-task Long Video Understanding

Junjie Zhou, Yan Shu, Bo Zhao et al.

CVPR 2025 • poster • arXiv:2406.04264
93 citations
#191

FLARE: Feed-forward Geometry, Appearance and Camera Estimation from Uncalibrated Sparse Views

Shangzhan Zhang, Jianyuan Wang, Yinghao Xu et al.

CVPR 2025 • poster • arXiv:2502.12138
92 citations
#192

T2I-R1: Reinforcing Image Generation with Collaborative Semantic-level and Token-level CoT

Dongzhi Jiang, Ziyu Guo, Renrui Zhang et al.

NEURIPS 2025 • poster • arXiv:2505.00703
91 citations
#193

DreamBench++: A Human-Aligned Benchmark for Personalized Image Generation

Yuang Peng, Yuxin Cui, Haomiao Tang et al.

ICLR 2025 • poster • arXiv:2406.16855
91 citations
#194

Deconstructing Denoising Diffusion Models for Self-Supervised Learning

Xinlei Chen, Zhuang Liu, Saining Xie et al.

ICLR 2025 • poster • arXiv:2401.14404
91 citations
#195

ColPali: Efficient Document Retrieval with Vision Language Models

Manuel Faysse, Hugues Sibille, Tony Wu et al.

ICLR 2025 • poster • arXiv:2407.01449
91 citations
#196

When Attention Sink Emerges in Language Models: An Empirical View

Xiangming Gu, Tianyu Pang, Chao Du et al.

ICLR 2025 • poster • arXiv:2410.10781
90 citations
#197

Show-o2: Improved Native Unified Multimodal Models

Jinheng Xie, Zhenheng Yang, Mike Zheng Shou

NEURIPS 2025 • oral • arXiv:2506.15564
90 citations
#198

SVDQuant: Absorbing Outliers by Low-Rank Component for 4-Bit Diffusion Models

Muyang Li, Yujun Lin, Zhekai Zhang et al.

ICLR 2025 • poster • arXiv:2411.05007
90 citations
#199

LVSM: A Large View Synthesis Model with Minimal 3D Inductive Bias

Haian Jin, Hanwen Jiang, Hao Tan et al.

ICLR 2025 • poster • arXiv:2410.17242
90 citations
#200

Not All Language Model Features Are One-Dimensionally Linear

Josh Engels, Eric Michaud, Isaac Liao et al.

ICLR 2025 • poster • arXiv:2405.14860
89 citations