Most Cited 2025 "gated memory unit" Papers

22,274 papers found • Page 2 of 112

#201

Not All Language Model Features Are One-Dimensionally Linear

Josh Engels, Eric Michaud, Isaac Liao et al.

ICLR 2025 poster arXiv:2405.14860
89 citations
#202

Remarkable Robustness of LLMs: Stages of Inference?

Vedang Lad, Jin Hwa Lee, Wes Gurnee et al.

NEURIPS 2025 oral arXiv:2406.19384
89 citations
#203

Theoretical guarantees on the best-of-n alignment policy

Ahmad Beirami, Alekh Agarwal, Jonathan Berant et al.

ICML 2025 poster arXiv:2401.01879
89 citations
#204

RoboBrain: A Unified Brain Model for Robotic Manipulation from Abstract to Concrete

Yuheng Ji, Huajie Tan, Jiayu Shi et al.

CVPR 2025 poster arXiv:2502.21257
89 citations
#205

Loopy: Taming Audio-Driven Portrait Avatar with Long-Term Motion Dependency

Jianwen Jiang, Chao Liang, Jiaqi Yang et al.

ICLR 2025 oral arXiv:2409.02634
89 citations
#206

CBraMod: A Criss-Cross Brain Foundation Model for EEG Decoding

Jiquan Wang, Sha Zhao, Zhiling Luo et al.

ICLR 2025 oral arXiv:2412.07236
88 citations
#207

OmniEdit: Building Image Editing Generalist Models Through Specialist Supervision

Cong Wei, Zheyang Xiong, Weiming Ren et al.

ICLR 2025 poster arXiv:2411.07199
88 citations
#208

MME-CoT: Benchmarking Chain-of-Thought in Large Multimodal Models for Reasoning Quality, Robustness, and Efficiency

Dongzhi Jiang, Renrui Zhang, Ziyu Guo et al.

ICML 2025 poster arXiv:2502.09621
88 citations
#209

Learning Smooth and Expressive Interatomic Potentials for Physical Property Prediction

Xiang Fu, Brandon Wood, Luis Barroso-Luque et al.

ICML 2025 oral arXiv:2502.12147
87 citations
#210

AdaIR: Adaptive All-in-One Image Restoration via Frequency Mining and Modulation

Yuning Cui, Syed Waqas Zamir, Salman Khan et al.

ICLR 2025 poster arXiv:2403.14614
86 citations
#211

Draw-and-Understand: Leveraging Visual Prompts to Enable MLLMs to Comprehend What You Want

Weifeng Lin, Xinyu Wei, Ruichuan An et al.

ICLR 2025 poster arXiv:2403.20271
86 citations
#212

OmniHuman-1: Rethinking the Scaling-Up of One-Stage Conditioned Human Animation Models

Gaojie Lin, Jianwen Jiang, Jiaqi Yang et al.

ICCV 2025 highlight arXiv:2502.01061
86 citations
#213

Vision-LSTM: xLSTM as Generic Vision Backbone

Benedikt Alkin, Maximilian Beck, Korbinian Pöppel et al.

ICLR 2025 poster arXiv:2406.04303
85 citations
#214

TimeMixer++: A General Time Series Pattern Machine for Universal Predictive Analysis

Shiyu Wang, Jiawei Li, Xiaoming Shi et al.

ICLR 2025 oral arXiv:2410.16032
85 citations
#215

Improving Diffusion Inverse Problem Solving with Decoupled Noise Annealing

Bingliang Zhang, Wenda Chu, Julius Berner et al.

CVPR 2025 poster arXiv:2407.01521
85 citations
#216

Making Text Embedders Few-Shot Learners

Chaofan Li, Minghao Qin, Shitao Xiao et al.

ICLR 2025 poster arXiv:2409.15700
85 citations
#217

Predictive Inverse Dynamics Models are Scalable Learners for Robotic Manipulation

Yang Tian, Sizhe Yang, Jia Zeng et al.

ICLR 2025 poster arXiv:2412.15109
85 citations
#218

Training-free Camera Control for Video Generation

Chen Hou, Zhibo Chen

ICLR 2025 poster arXiv:2406.10126
84 citations
#219

ImgEdit: A Unified Image Editing Dataset and Benchmark

Yang Ye, Xianyi He, Zongjian Li et al.

NEURIPS 2025 poster arXiv:2505.20275
84 citations
#220

DriveDreamer4D: World Models Are Effective Data Machines for 4D Driving Scene Representation

Guosheng Zhao, Chaojun Ni, Xiaofeng Wang et al.

CVPR 2025 poster arXiv:2410.13571
83 citations
#221

Kolmogorov-Arnold Transformer

Xingyi Yang, Xinchao Wang

ICLR 2025 poster arXiv:2409.10594
83 citations
#222

RetrievalAttention: Accelerating Long-Context LLM Inference via Vector Retrieval

Di Liu, Meng Chen, Baotong Lu et al.

NEURIPS 2025 poster arXiv:2409.10516
83 citations
#223

LiveBench: A Challenging, Contamination-Limited LLM Benchmark

Colin White, Samuel Dooley, Manley Roberts et al.

ICLR 2025 poster arXiv:2406.19314
83 citations
#224

Stable Virtual Camera: Generative View Synthesis with Diffusion Models

Jensen Zhou, Hang Gao, Vikram Voleti et al.

ICCV 2025 poster arXiv:2503.14489
83 citations
#225

Turning Up the Heat: Min-p Sampling for Creative and Coherent LLM Outputs

Minh Nguyen, Andrew Baker, Clement Neo et al.

ICLR 2025 poster arXiv:2407.01082
82 citations
#226

Unlocking Guidance for Discrete State-Space Diffusion and Flow Models

Hunter Nisonoff, Junhao Xiong, Stephan Allenspach et al.

ICLR 2025 poster arXiv:2406.01572
82 citations
#227

DynaMath: A Dynamic Visual Benchmark for Evaluating Mathematical Reasoning Robustness of Vision Language Models

Chengke Zou, Xingang Guo, Rui Yang et al.

ICLR 2025 poster arXiv:2411.00836
82 citations
#228

Robust Watermarking Using Generative Priors Against Image Editing: From Benchmarking to Advances

Shilin Lu, Zihan Zhou, Jiayou Lu et al.

ICLR 2025 poster arXiv:2410.18775
82 citations
#229

MambaIRv2: Attentive State Space Restoration

Hang Guo, Yong Guo, Yaohua Zha et al.

CVPR 2025 poster arXiv:2411.15269
82 citations
#230

DepthFM: Fast Generative Monocular Depth Estimation with Flow Matching

Ming Gui, Johannes Schusterbauer, Ulrich Prestel et al.

AAAI 2025 paper
82 citations
#231

OmniDrive: A Holistic Vision-Language Dataset for Autonomous Driving with Counterfactual Reasoning

Shihao Wang, Zhiding Yu, Xiaohui Jiang et al.

CVPR 2025 poster arXiv:2504.04348
82 citations
#232

Point Cloud Mamba: Point Cloud Learning via State Space Model

Tao Zhang, Haobo Yuan, Lu Qi et al.

AAAI 2025 paper arXiv:2403.00762
81 citations
#233

Teaching Large Language Models to Regress Accurate Image Quality Scores Using Score Distribution

Zhiyuan You, Xin Cai, Jinjin Gu et al.

CVPR 2025 poster arXiv:2501.11561
81 citations
#234

WebDancer: Towards Autonomous Information Seeking Agency

Jialong Wu, Baixuan Li, Runnan Fang et al.

NEURIPS 2025 poster arXiv:2505.22648
81 citations
#235

Soft Merging of Experts with Adaptive Routing

Haokun Liu, Muqeeth Mohammed, Colin Raffel

ICLR 2025 poster arXiv:2306.03745
81 citations
#236

Consistency Models Made Easy

Zhengyang Geng, Ashwini Pokle, Weijian Luo et al.

ICLR 2025 poster arXiv:2406.14548
81 citations
#237

AHA: A Vision-Language-Model for Detecting and Reasoning Over Failures in Robotic Manipulation

Jiafei Duan, Wilbert Pumacay, Nishanth Kumar et al.

ICLR 2025 poster arXiv:2410.00371
81 citations
#238

MV-DUSt3R+: Single-Stage Scene Reconstruction from Sparse Views In 2 Seconds

Zhenggang Tang, Yuchen Fan, Dilin Wang et al.

CVPR 2025 poster arXiv:2412.06974
80 citations
#239

Real-Time Video Generation with Pyramid Attention Broadcast

Xuanlei Zhao, Xiaolong Jin, Kai Wang et al.

ICLR 2025 poster arXiv:2408.12588
79 citations
#240

MLLMs Know Where to Look: Training-free Perception of Small Visual Details with Multimodal LLMs

Jiarui Zhang, Mahyar Khayatkhoei, Prateek Chhikara et al.

ICLR 2025 poster arXiv:2502.17422
79 citations
#241

AnalogCoder: Analog Circuit Design via Training-Free Code Generation

Yao Lai, Sungyoung Lee, Guojin Chen et al.

AAAI 2025 paper arXiv:2405.14918
79 citations
#242

LMFusion: Adapting Pretrained Language Models for Multimodal Generation

Weijia Shi, Xiaochuang Han, Chunting Zhou et al.

NEURIPS 2025 poster arXiv:2412.15188
79 citations
#243

MM-Embed: Universal Multimodal Retrieval with Multimodal LLMs

Sheng-Chieh Lin, Chankyu Lee, Mohammad Shoeybi et al.

ICLR 2025 poster arXiv:2411.02571
78 citations
#244

RoboCat: A Self-Improving Generalist Agent for Robotic Manipulation

Sergio Gómez Colmenarejo, Jost Springenberg, Jose Enrique Chen et al.

ICLR 2025 poster
78 citations
#245

OWL: Optimized Workforce Learning for General Multi-Agent Assistance in Real-World Task Automation

Mengkang Hu, Yuhang Zhou, Wendong Fan et al.

NEURIPS 2025 poster arXiv:2505.23885
78 citations
#246

AC3D: Analyzing and Improving 3D Camera Control in Video Diffusion Transformers

Sherwin Bahmani, Ivan Skorokhodov, Guocheng Qian et al.

CVPR 2025 poster arXiv:2411.18673
78 citations
#247

GraphRouter: A Graph-based Router for LLM Selections

Tao Feng, Yanzhen Shen, Jiaxuan You

ICLR 2025 poster arXiv:2410.03834
77 citations
#248

Dissecting Adversarial Robustness of Multimodal LM Agents

Chen Wu, Rishi Shah, Jing Yu Koh et al.

ICLR 2025 poster arXiv:2406.12814
77 citations
#249

A is for Absorption: Studying Feature Splitting and Absorption in Sparse Autoencoders

David Chanin, James Wilken-Smith, Tomáš Dulka et al.

NEURIPS 2025 oral arXiv:2409.14507
77 citations
#250

Do I Know This Entity? Knowledge Awareness and Hallucinations in Language Models

Javier Ferrando, Oscar Obeso, Senthooran Rajamanoharan et al.

ICLR 2025 poster arXiv:2411.14257
77 citations
#251

CFG++: Manifold-constrained Classifier Free Guidance for Diffusion Models

Hyungjin Chung, Jeongsol Kim, Geon Yeong Park et al.

ICLR 2025 poster arXiv:2406.08070
77 citations
#252

Language models scale reliably with over-training and on downstream tasks

Samir Yitzhak Gadre, Georgios Smyrnis, Vaishaal Shankar et al.

ICLR 2025 poster arXiv:2403.08540
77 citations
#253

Generalization v.s. Memorization: Tracing Language Models’ Capabilities Back to Pretraining Data

Xinyi Wang, Antonis Antoniades, Yanai Elazar et al.

ICLR 2025 poster arXiv:2407.14985
76 citations
#254

Eliciting Human Preferences with Language Models

Belinda Li, Alex Tamkin, Noah Goodman et al.

ICLR 2025 oral arXiv:2310.11589
76 citations
#255

d1: Scaling Reasoning in Diffusion Large Language Models via Reinforcement Learning

Siyan Zhao, Devaansh Gupta, Qinqing Zheng et al.

NEURIPS 2025 spotlight arXiv:2504.12216
75 citations
#256

FakeShield: Explainable Image Forgery Detection and Localization via Multi-modal Large Language Models

Zhipei Xu, Xuanyu Zhang, Runyi Li et al.

ICLR 2025 poster arXiv:2410.02761
75 citations
#257

Speculative RAG: Enhancing Retrieval Augmented Generation through Drafting

Zilong (Ryan) Wang, Zifeng Wang, Long Le et al.

ICLR 2025 poster arXiv:2407.08223
75 citations
#258

Beyond Autoregression: Discrete Diffusion for Complex Reasoning and Planning

Jiacheng Ye, Jiahui Gao, Shansan Gong et al.

ICLR 2025 poster arXiv:2410.14157
75 citations
#259

Audio Flamingo 3: Advancing Audio Intelligence with Fully Open Large Audio Language Models

Sreyan Ghosh, Arushi Goel, Jaehyeon Kim et al.

NEURIPS 2025 spotlight arXiv:2507.08128
74 citations
#260

Improved Techniques for Optimization-Based Jailbreaking on Large Language Models

Xiaojun Jia, Tianyu Pang, Chao Du et al.

ICLR 2025 poster arXiv:2405.21018
74 citations
#261

General-Reasoner: Advancing LLM Reasoning Across All Domains

Xueguang Ma, Qian Liu, Dongfu Jiang et al.

NEURIPS 2025 poster arXiv:2505.14652
74 citations
#262

The Surprising Effectiveness of Negative Reinforcement in LLM Reasoning

Xinyu Zhu, Mengzhou Xia, Zhepei Wei et al.

NEURIPS 2025 poster arXiv:2506.01347
74 citations
#263

WebPilot: A Versatile and Autonomous Multi-Agent System for Web Task Execution with Strategic Exploration

Yao Zhang, Zijian Ma, Yunpu Ma et al.

AAAI 2025 paper arXiv:2408.15978
74 citations
#264

MAVIS: Mathematical Visual Instruction Tuning with an Automatic Data Engine

Renrui Zhang, Xinyu Wei, Dongzhi Jiang et al.

ICLR 2025 poster arXiv:2407.08739
74 citations
#265

MMTEB: Massive Multilingual Text Embedding Benchmark

Kenneth Enevoldsen, Isaac Chung, Imene Kerboua et al.

ICLR 2025 poster arXiv:2502.13595
74 citations
#266

OGBench: Benchmarking Offline Goal-Conditioned RL

Seohong Park, Kevin Frans, Benjamin Eysenbach et al.

ICLR 2025 poster arXiv:2410.20092
74 citations
#267

Remasking Discrete Diffusion Models with Inference-Time Scaling

Guanghan Wang, Yair Schiff, Subham Sahoo et al.

NEURIPS 2025 poster arXiv:2503.00307
74 citations
#268

MaskBit: Embedding-free Image Generation via Bit Tokens

Mark Weber, Lijun Yu, Qihang Yu et al.

ICLR 2025 poster arXiv:2409.16211
73 citations
#269

MV-Adapter: Multi-View Consistent Image Generation Made Easy

Zehuan Huang, Yuan-Chen Guo, Haoran Wang et al.

ICCV 2025 poster arXiv:2412.03632
73 citations
#270

REPA-E: Unlocking VAE for End-to-End Tuning of Latent Diffusion Transformers

Xingjian Leng, Jaskirat Singh, Yunzhong Hou et al.

ICCV 2025 poster arXiv:2504.10483
73 citations
#271

Divide, Conquer and Combine: A Training-Free Framework for High-Resolution Image Perception in Multimodal Large Language Models

Wenbin Wang, Liang Ding, Minyan Zeng et al.

AAAI 2025 paper arXiv:2408.15556
73 citations
#272

Language Models Learn to Mislead Humans via RLHF

Jiaxin Wen, Ruiqi Zhong, Akbir Khan et al.

ICLR 2025 poster arXiv:2409.12822
73 citations
#273

Offline Actor-Critic for Average Reward MDPs

William Powell, Jeongyeol Kwon, Qiaomin Xie et al.

NEURIPS 2025 poster
73 citations
#274

Planning in Natural Language Improves LLM Search for Code Generation

Evan Wang, Federico Cassano, Catherine Wu et al.

ICLR 2025 poster arXiv:2409.03733
72 citations
#275

VerilogCoder: Autonomous Verilog Coding Agents with Graph-based Planning and Abstract Syntax Tree (AST)-based Waveform Tracing Tool

Chia-Tung Ho, Haoxing Ren, Brucek Khailany

AAAI 2025 paper arXiv:2408.08927
72 citations
#276

Towards World Simulator: Crafting Physical Commonsense-Based Benchmark for Video Generation

Fanqing Meng, Jiaqi Liao, Xinyu Tan et al.

ICML 2025 poster arXiv:2410.05363
72 citations
#277

ChatTime: A Unified Multimodal Time Series Foundation Model Bridging Numerical and Textual Data

Chengsen Wang, Qi Qi, Jingyu Wang et al.

AAAI 2025 paper arXiv:2412.11376
72 citations
#278

APIGen-MT: Agentic Pipeline for Multi-Turn Data Generation via Simulated Agent-Human Interplay

Akshara Prabhakar, Zuxin Liu, Ming Zhu et al.

NEURIPS 2025 poster arXiv:2504.03601
71 citations
#279

Are VLMs Ready for Autonomous Driving? An Empirical Study from the Reliability, Data and Metric Perspectives

Shaoyuan Xie, Lingdong Kong, Yuhao Dong et al.

ICCV 2025 poster arXiv:2501.04003
71 citations
#280

MMVU: Measuring Expert-Level Multi-Discipline Video Understanding

Yilun Zhao, Lujing Xie, Haowei Zhang et al.

CVPR 2025 poster arXiv:2501.12380
70 citations
#281

InstructRAG: Instructing Retrieval-Augmented Generation via Self-Synthesized Rationales

Zhepei Wei, Wei-Lin Chen, Yu Meng

ICLR 2025 poster arXiv:2406.13629
70 citations
#282

BALROG: Benchmarking Agentic LLM and VLM Reasoning On Games

Davide Paglieri, Bartłomiej Cupiał, Samuel Coward et al.

ICLR 2025 poster arXiv:2411.13543
70 citations
#283

Fine-tuning can cripple your foundation model; preserving features may be the solution

Philip Torr, Puneet Dokania, Jishnu Mukhoti et al.

ICLR 2025 poster arXiv:2308.13320
70 citations
#284

Right Question is Already Half the Answer: Fully Unsupervised LLM Reasoning Incentivization

Qingyang Zhang, Haitao Wu, Changqing Zhang et al.

NEURIPS 2025 spotlight arXiv:2504.05812
70 citations
#285

Programming Refusal with Conditional Activation Steering

Bruce W. Lee, Inkit Padhi, Karthikeyan Natesan Ramamurthy et al.

ICLR 2025 poster arXiv:2409.05907
70 citations
#286

Simplicity Prevails: Rethinking Negative Preference Optimization for LLM Unlearning

Chongyu Fan, Jiancheng Liu, Licong Lin et al.

NEURIPS 2025 poster arXiv:2410.07163
70 citations
#287

UniReal: Universal Image Generation and Editing via Learning Real-world Dynamics

Xi Chen, Zhifei Zhang, He Zhang et al.

CVPR 2025 highlight arXiv:2412.07774
70 citations
#288

MedTrinity-25M: A Large-scale Multimodal Dataset with Multigranular Annotations for Medicine

Yunfei Xie, Ce Zhou, Lang Gao et al.

ICLR 2025 poster arXiv:2408.02900
70 citations
#289

EasyControl: Adding Efficient and Flexible Control for Diffusion Transformer

Yuxuan Zhang, Yirui Yuan, Yiren Song et al.

ICCV 2025 poster arXiv:2503.07027
70 citations
#290

Video-RAG: Visually-aligned Retrieval-Augmented Long Video Comprehension

Yongdong Luo, Xiawu Zheng, Guilin Li et al.

NEURIPS 2025 poster arXiv:2411.13093
69 citations
#291

Diffusion-Based Planning for Autonomous Driving with Flexible Guidance

Yinan Zheng, Ruiming Liang, Kexin Zheng et al.

ICLR 2025 poster arXiv:2501.15564
69 citations
#292

DiT4Edit: Diffusion Transformer for Image Editing

Kunyu Feng, Yue Ma, Bingyuan Wang et al.

AAAI 2025 paper arXiv:2411.03286
69 citations
#293

Mono-InternVL: Pushing the Boundaries of Monolithic Multimodal Large Language Models with Endogenous Visual Pre-training

Luo, Xue Yang, Wenhan Dou et al.

CVPR 2025 poster arXiv:2410.08202
68 citations
#294

VideoDPO: Omni-Preference Alignment for Video Diffusion Generation

Runtao Liu, Haoyu Wu, Zheng Ziqiang et al.

CVPR 2025 poster arXiv:2412.14167
68 citations
#295

Scaling Test-Time Compute Without Verification or RL is Suboptimal

Amrith Setlur, Nived Rajaraman, Sergey Levine et al.

ICML 2025 spotlight arXiv:2502.12118
68 citations
#296

Adaptive Keyframe Sampling for Long Video Understanding

Xi Tang, Jihao Qiu, Lingxi Xie et al.

CVPR 2025 poster arXiv:2502.21271
68 citations
#297

VisualAgentBench: Towards Large Multimodal Models as Visual Foundation Agents

Xiao Liu, Tianjie Zhang, Yu Gu et al.

ICLR 2025 poster arXiv:2408.06327
67 citations
#298

Spatial-MLLM: Boosting MLLM Capabilities in Visual-based Spatial Intelligence

Diankun Wu, Fangfu Liu, Yi-Hsin Hung et al.

NEURIPS 2025 spotlight arXiv:2505.23747
67 citations
#299

Cradle: Empowering Foundation Agents towards General Computer Control

Weihao Tan, Wentao Zhang, Xinrun Xu et al.

ICML 2025 poster arXiv:2403.03186
67 citations
#300

HAMSTER: Hierarchical Action Models for Open-World Robot Manipulation

Yi Li, Yuquan Deng, Jesse Zhang et al.

ICLR 2025 poster arXiv:2502.05485
67 citations
#301

History-Guided Video Diffusion

Kiwhan Song, Boyuan Chen, Max Simchowitz et al.

ICML 2025 oral arXiv:2502.06764
66 citations
#302

dKV-Cache: The Cache for Diffusion Language Models

Xinyin Ma, Runpeng Yu, Gongfan Fang et al.

NEURIPS 2025 poster arXiv:2505.15781
66 citations
#303

AgentOccam: A Simple Yet Strong Baseline for LLM-Based Web Agents

Ke Yang, Yao Liu, Sapana Chaudhary et al.

ICLR 2025 poster arXiv:2410.13825
66 citations
#304

Does Refusal Training in LLMs Generalize to the Past Tense?

Maksym Andriushchenko, Nicolas Flammarion

ICLR 2025 poster arXiv:2407.11969
66 citations
#305

GuardAgent: Safeguard LLM Agents via Knowledge-Enabled Reasoning

Zhen Xiang, Linzhi Zheng, Yanjie Li et al.

ICML 2025 poster
66 citations
#306

One-Minute Video Generation with Test-Time Training

Jiarui Xu, Shihao Han, Karan Dalal et al.

CVPR 2025 poster arXiv:2504.05298
66 citations
#307

Scaling Laws for Precision

Tanishq Kumar, Zachary Ankner, Benjamin Spector et al.

ICLR 2025 poster arXiv:2411.04330
65 citations
#308

ChartMimic: Evaluating LMM's Cross-Modal Reasoning Capability via Chart-to-Code Generation

Cheng Yang, Chufan Shi, Yaxin Liu et al.

ICLR 2025 poster arXiv:2406.09961
65 citations
#309

Accelerating Diffusion Transformers with Token-wise Feature Caching

Chang Zou, Xuyang Liu, Ting Liu et al.

ICLR 2025 poster arXiv:2410.05317
65 citations
#310

Reasoning with Latent Thoughts: On the Power of Looped Transformers

Nikunj Saunshi, Nishanth Dikkala, Zhiyuan Li et al.

ICLR 2025 poster arXiv:2502.17416
65 citations
#311

Codec Does Matter: Exploring the Semantic Shortcoming of Codec for Audio Language Model

Zhen Ye, Peiwen Sun, Jiahe Lei et al.

AAAI 2025 paper arXiv:2408.17175
65 citations
#312

SWE-Lancer: Can Frontier LLMs Earn $1 Million from Real-World Freelance Software Engineering?

Samuel Miserendino, Michele Wang, Tejal Patwardhan et al.

ICML 2025 oral arXiv:2502.12115
64 citations
#313

ELLA-V: Stable Neural Codec Language Modeling with Alignment-Guided Sequence Reordering

Yakun Song, Zhuo Chen, Xiaofei Wang et al.

AAAI 2025 paper arXiv:2401.07333
64 citations
#314

SWE-smith: Scaling Data for Software Engineering Agents

John Yang, Kilian Lieret, Carlos Jimenez et al.

NEURIPS 2025 spotlight arXiv:2504.21798
64 citations
#315

SIDA: Social Media Image Deepfake Detection, Localization and Explanation with Large Multimodal Model

Zhenglin Huang, Jinwei Hu, Yiwei He et al.

CVPR 2025 poster arXiv:2412.04292
64 citations
#316

MovieDreamer: Hierarchical Generation for Coherent Long Visual Sequences

Canyu Zhao, Mingyu Liu, Wen Wang et al.

ICLR 2025 poster arXiv:2407.16655
64 citations
#317

ImageFolder: Autoregressive Image Generation with Folded Tokens

Xiang Li, Kai Qiu, Hao Chen et al.

ICLR 2025 poster arXiv:2410.01756
63 citations
#318

GameFactory: Creating New Games with Generative Interactive Videos

Jiwen Yu, Yiran Qin, Xintao Wang et al.

ICCV 2025 highlight arXiv:2501.08325
63 citations
#319

Towards Principled Evaluations of Sparse Autoencoders for Interpretability and Control

Aleksandar Makelov, Georg Lange, Neel Nanda

ICLR 2025 poster arXiv:2405.08366
63 citations
#320

Smaller, Weaker, Yet Better: Training LLM Reasoners via Compute-Optimal Sampling

Hritik Bansal, Arian Hosseini, Rishabh Agarwal et al.

ICLR 2025 poster arXiv:2408.16737
63 citations
#321

FreDF: Learning to Forecast in the Frequency Domain

Hao Wang, Lichen Pan, Yuan Shen et al.

ICLR 2025 poster arXiv:2402.02399
63 citations
#322

HealthGPT: A Medical Large Vision-Language Model for Unifying Comprehension and Generation via Heterogeneous Knowledge Adaptation

Tianwei Lin, Wenqiao Zhang, Sijing Li et al.

ICML 2025 spotlight arXiv:2502.09838
63 citations
#323

Arithmetic Without Algorithms: Language Models Solve Math with a Bag of Heuristics

Yaniv Nikankin, Anja Reusch, Aaron Mueller et al.

ICLR 2025 poster arXiv:2410.21272
63 citations
#324

LMRL Gym: Benchmarks for Multi-Turn Reinforcement Learning with Language Models

Marwa Abdulhai, Isadora White, Charlie Snell et al.

ICML 2025 oral arXiv:2311.18232
63 citations
#325

Key-Point-Driven Data Synthesis with Its Enhancement on Mathematical Reasoning

Yiming Huang, Xiao Liu, Yeyun Gong et al.

AAAI 2025 paper arXiv:2403.02333
63 citations
#326

StreamDiffusion: A Pipeline-level Solution for Real-Time Interactive Generation

Akio Kodaira, Chenfeng Xu, Toshiki Hazama et al.

ICCV 2025 poster arXiv:2312.12491
62 citations
#327

Boosting Consistency in Story Visualization with Rich-Contextual Conditional Diffusion Models

Fei Shen, Hu Ye, Sibo Liu et al.

AAAI 2025 paper arXiv:2407.02482
62 citations
#328

Simple Guidance Mechanisms for Discrete Diffusion Models

Yair Schiff, Subham Sahoo, Hao Phung et al.

ICLR 2025 poster arXiv:2412.10193
62 citations
#329

CycleResearcher: Improving Automated Research via Automated Review

Yixuan Weng, Minjun Zhu, Guangsheng Bao et al.

ICLR 2025 poster arXiv:2411.00816
62 citations
#330

ORION: A Holistic End-to-End Autonomous Driving Framework by Vision-Language Instructed Action Generation

Haoyu Fu, Diankun Zhang, Zongchuang Zhao et al.

ICCV 2025 poster arXiv:2503.19755
62 citations
#331

UMA: A Family of Universal Models for Atoms

Brandon Wood, Misko Dzamba, Xiang Fu et al.

NEURIPS 2025 spotlight arXiv:2506.23971
62 citations
#332

DSBench: How Far Are Data Science Agents from Becoming Data Science Experts?

Liqiang Jing, Zhehui Huang, Xiaoyang Wang et al.

ICLR 2025 poster arXiv:2409.07703
62 citations
#333

UniScene: Unified Occupancy-centric Driving Scene Generation

Bohan Li, Jiazhe Guo, Hongsi Liu et al.

CVPR 2025 poster arXiv:2412.05435
62 citations
#334

MagicPIG: LSH Sampling for Efficient LLM Generation

Zhuoming Chen, Ranajoy Sadhukhan, Zihao Ye et al.

ICLR 2025 poster arXiv:2410.16179
62 citations
#335

SF3D: Stable Fast 3D Mesh Reconstruction with UV-unwrapping and Illumination Disentanglement

Mark Boss, Zixuan Huang, Aaryaman Vasishta et al.

CVPR 2025 poster arXiv:2408.00653
62 citations
#336

RandAR: Decoder-only Autoregressive Visual Generation in Random Orders

Ziqi Pang, Tianyuan Zhang, Fujun Luan et al.

CVPR 2025 poster arXiv:2412.01827
61 citations
#337

Learning Dynamics of LLM Finetuning

Yi Ren, Danica Sutherland

ICLR 2025 poster arXiv:2407.10490
61 citations
#338

Self-Introspective Decoding: Alleviating Hallucinations for Large Vision-Language Models

Fushuo Huo, Wenchao Xu, Zhong Zhang et al.

ICLR 2025 poster arXiv:2408.02032
61 citations
#339

MagicDec: Breaking the Latency-Throughput Tradeoff for Long Context Generation with Speculative Decoding

Ranajoy Sadhukhan, Jian Chen, Zhuoming Chen et al.

ICLR 2025 poster arXiv:2408.11049
61 citations
#340

CSGO: Content-Style Composition in Text-to-Image Generation

Peng Xing, Haofan Wang, Yanpeng Sun et al.

NEURIPS 2025 poster arXiv:2408.16766
60 citations
#341

Image and Video Tokenization with Binary Spherical Quantization

Yue Zhao, Yuanjun Xiong, Philipp Krähenbühl

ICLR 2025 poster arXiv:2406.07548
60 citations
#342

SafeDiffuser: Safe Planning with Diffusion Probabilistic Models

Wei Xiao, Johnson (Tsun-Hsuan) Wang, Chuang Gan et al.

ICLR 2025 poster arXiv:2306.00148
60 citations
#343

GoT: Unleashing Reasoning Capability of MLLM for Visual Generation and Editing

Rongyao Fang, Chengqi Duan, Kun Wang et al.

NEURIPS 2025 poster
60 citations
#344

Go-with-the-Flow: Motion-Controllable Video Diffusion Models Using Real-Time Warped Noise

Ryan Burgert, Yuancheng Xu, Wenqi Xian et al.

CVPR 2025 poster arXiv:2501.08331
59 citations
#345

DIFIX3D+: Improving 3D Reconstructions with Single-Step Diffusion Models

Jay Zhangjie Wu, Yuxuan Zhang, Haithem Turki et al.

CVPR 2025 poster arXiv:2503.01774
59 citations
#346

CATCH: Channel-Aware Multivariate Time Series Anomaly Detection via Frequency Patching

Xingjian Wu, Xiangfei Qiu, Zhengyu Li et al.

ICLR 2025 poster arXiv:2410.12261
59 citations
#347

Animate-X: Universal Character Image Animation with Enhanced Motion Representation

Shuai Tan, Biao Gong, Xiang Wang et al.

ICLR 2025 oral arXiv:2410.10306
59 citations
#348

StableAnimator: High-Quality Identity-Preserving Human Image Animation

Shuyuan Tu, Zhen Xing, Xintong Han et al.

CVPR 2025 poster arXiv:2411.17697
59 citations
#349

Web Agents with World Models: Learning and Leveraging Environment Dynamics in Web Navigation

Hyungjoo Chae, Namyoung Kim, Kai Ong et al.

ICLR 2025 poster arXiv:2410.13232
59 citations
#350

AI Sandbagging: Language Models can Strategically Underperform on Evaluations

Teun van der Weij, Felix Hofstätter, Oliver Jaffe et al.

ICLR 2025 poster arXiv:2406.07358
58 citations
#351

Perception-R1: Pioneering Perception Policy with Reinforcement Learning

En Yu, Kangheng Lin, Liang Zhao et al.

NEURIPS 2025 poster arXiv:2504.07954
58 citations
#352

Repetition Improves Language Model Embeddings

Jacob Springer, Suhas Kotha, Daniel Fried et al.

ICLR 2025 poster arXiv:2402.15449
58 citations
#353

DriveArena: A Closed-loop Generative Simulation Platform for Autonomous Driving

Xuemeng Yang, Licheng Wen, Tiantian Wei et al.

ICCV 2025 poster arXiv:2408.00415
58 citations
#354

Towards Semantic Equivalence of Tokenization in Multimodal LLM

Shengqiong Wu, Hao Fei, Xiangtai Li et al.

ICLR 2025 poster arXiv:2406.05127
58 citations
#355

Matryoshka Multimodal Models

Mu Cai, Jianwei Yang, Jianfeng Gao et al.

ICLR 2025 poster arXiv:2405.17430
58 citations
#356

Thinkless: LLM Learns When to Think

Gongfan Fang, Xinyin Ma, Xinchao Wang

NEURIPS 2025 poster arXiv:2505.13379
58 citations
#357

Math-PUMA: Progressive Upward Multimodal Alignment to Enhance Mathematical Reasoning

Wenwen Zhuang, Xin Huang, Xiantao Zhang et al.

AAAI 2025 paper arXiv:2408.08640
58 citations
#358

Stereo4D: Learning How Things Move in 3D from Internet Stereo Videos

Linyi Jin, Richard Tucker, Zhengqi Li et al.

CVPR 2025 poster arXiv:2412.09621
58 citations
#359

LoRA vs Full Fine-tuning: An Illusion of Equivalence

Reece Shuttleworth, Jacob Andreas, Antonio Torralba et al.

NEURIPS 2025 poster arXiv:2410.21228
58 citations
#360

Simple is Effective: The Roles of Graphs and Large Language Models in Knowledge-Graph-Based Retrieval-Augmented Generation

Mufei Li, Siqi Miao, Pan Li

ICLR 2025 poster arXiv:2410.20724
57 citations
#361

MJ-Bench: Is Your Multimodal Reward Model Really a Good Judge for Text-to-Image Generation?

Zhaorun Chen, Zichen Wen, Yichao Du et al.

NEURIPS 2025 poster arXiv:2407.04842
57 citations
#362

Align3R: Aligned Monocular Depth Estimation for Dynamic Videos

Edward Loo, Tianyu Huang, Peng Li et al.

CVPR 2025 highlight arXiv:2412.03079
57 citations
#363

Booster: Tackling Harmful Fine-tuning for Large Language Models via Attenuating Harmful Perturbation

Tiansheng Huang, Sihao Hu, Fatih Ilhan et al.

ICLR 2025 poster arXiv:2409.01586
57 citations
#364

MUSt3R: Multi-view Network for Stereo 3D Reconstruction

Yohann Cabon, Lucas Stoffl, Leonid Antsfeld et al.

CVPR 2025 highlight arXiv:2503.01661
57 citations
#365

What Matters When Repurposing Diffusion Models for General Dense Perception Tasks?

Guangkai Xu, Yongtao Ge, Mingyu Liu et al.

ICLR 2025 poster arXiv:2403.06090
56 citations
#366

ShadowKV: KV Cache in Shadows for High-Throughput Long-Context LLM Inference

Hanshi Sun, Li-Wen Chang, Wenlei Bao et al.

ICML 2025 spotlight arXiv:2410.21465
56 citations
#367

Frame Context Packing and Drift Prevention in Next-Frame-Prediction Video Diffusion Models

Lvmin Zhang, Shengqu Cai, Muyang Li et al.

NEURIPS 2025 spotlight arXiv:2504.12626
56 citations
#368

Long-LRM: Long-sequence Large Reconstruction Model for Wide-coverage Gaussian Splats

Chen Ziwen, Hao Tan, Kai Zhang et al.

ICCV 2025 highlight arXiv:2410.12781
56 citations
#369

Long Context Tuning for Video Generation

Yuwei Guo, Ceyuan Yang, Ziyan Yang et al.

ICCV 2025 poster arXiv:2503.10589
56 citations
#370

RE-Bench: Evaluating Frontier AI R&D Capabilities of Language Model Agents against Human Experts

Hjalmar Wijk, Tao Lin, Joel Becker et al.

ICML 2025 spotlight arXiv:2411.15114
56 citations
#371

SAM2Long: Enhancing SAM 2 for Long Video Segmentation with a Training-Free Memory Tree

Shuangrui Ding, Rui Qian, Xiaoyi Dong et al.

ICCV 2025 poster arXiv:2410.16268
56 citations
#372

ReSearch: Learning to Reason with Search for LLMs via Reinforcement Learning

Mingyang Chen, Linzhuang Sun, Tianpeng Li et al.

NEURIPS 2025 poster arXiv:2503.19470
56 citations
#373

FBRT-YOLO: Faster and Better for Real-Time Aerial Image Detection

Yao Xiao, Tingfa Xu, Yu Xin et al.

AAAI 2025 paper arXiv:2504.20670
55 citations
#374

Controlling Space and Time with Diffusion Models

Daniel Watson, Saurabh Saxena, Lala Li et al.

ICLR 2025 poster arXiv:2407.07860
55 citations
#375

Sundial: A Family of Highly Capable Time Series Foundation Models

Yong Liu, Guo Qin, Zhiyuan Shi et al.

ICML 2025 oral arXiv:2502.00816
55 citations
#376

Hymba: A Hybrid-head Architecture for Small Language Models

Xin Dong, Yonggan Fu, Shizhe Diao et al.

ICLR 2025 poster arXiv:2411.13676
55 citations
#377

Apollo: An Exploration of Video Understanding in Large Multimodal Models

Orr Zohar, Xiaohan Wang, Yann Dubois et al.

CVPR 2025 poster arXiv:2412.10360
55 citations
#378

SynCamMaster: Synchronizing Multi-Camera Video Generation from Diverse Viewpoints

Jianhong Bai, Menghan Xia, Xintao Wang et al.

ICLR 2025 poster arXiv:2412.07760
55 citations
#379

Self-Improvement in Language Models: The Sharpening Mechanism

Audrey Huang, Adam Block, Dylan Foster et al.

ICLR 2025 poster arXiv:2412.01951
55 citations
#380

AutoML-Agent: A Multi-Agent LLM Framework for Full-Pipeline AutoML

Patara Trirat, Wonyong Jeong, Sung Ju Hwang

ICML 2025 poster arXiv:2410.02958
55 citations
#381

Phantom: Subject-Consistent Video Generation via Cross-Modal Alignment

Lijie Liu, Tianxiang Ma, Bingchuan Li et al.

ICCV 2025 highlight arXiv:2502.11079
55 citations
#382

LLM-SR: Scientific Equation Discovery via Programming with Large Language Models

Parshin Shojaee, Kazem Meidani, Shashank Gupta et al.

ICLR 2025 poster arXiv:2404.18400
55 citations
#383

ReconDreamer: Crafting World Models for Driving Scene Reconstruction via Online Restoration

Chaojun Ni, Guosheng Zhao, Xiaofeng Wang et al.

CVPR 2025 poster arXiv:2411.19548
54 citations
#384

AgentSquare: Automatic LLM Agent Search in Modular Design Space

Yu Shang, Yu Li, Keyu Zhao et al.

ICLR 2025 poster arXiv:2410.06153
54 citations
#385

Stable Flow: Vital Layers for Training-Free Image Editing

Omri Avrahami, Or Patashnik, Ohad Fried et al.

CVPR 2025 poster arXiv:2411.14430
54 citations
#386

Q-Insight: Understanding Image Quality via Visual Reinforcement Learning

Weiqi Li, Xuanyu Zhang, Shijie Zhao et al.

NEURIPS 2025 spotlight arXiv:2503.22679
54 citations
#387

Inductive Moment Matching

Linqi (Alex) Zhou, Stefano Ermon, Jiaming Song

ICML 2025 oral arXiv:2503.07565
54 citations
#388

FasterCache: Training-Free Video Diffusion Model Acceleration with High Quality

Zhengyao Lyu, Chenyang Si, Junhao Song et al.

ICLR 2025 oral arXiv:2410.19355
54 citations
#389

Not All Heads Matter: A Head-Level KV Cache Compression Method with Integrated Retrieval and Reasoning

Yu Fu, Zefan Cai, Abedelkadir Asi et al.

ICLR 2025 poster arXiv:2410.19258
54 citations
#390

Scaling Transformers for Low-Bitrate High-Quality Speech Coding

Julian Parker, Anton Smirnov, Jordi Pons et al.

ICLR 2025 poster arXiv:2411.19842
54 citations
#391

Q-DiT: Accurate Post-Training Quantization for Diffusion Transformers

Lei Chen, Yuan Meng, Chen Tang et al.

CVPR 2025 poster arXiv:2406.17343
54 citations
#392

VideoEspresso: A Large-Scale Chain-of-Thought Dataset for Fine-Grained Video Reasoning via Core Frame Selection

Songhao Han, Wei Huang, Hairong Shi et al.

CVPR 2025 poster arXiv:2411.14794
54 citations
#393

Benchmarking Multimodal Retrieval Augmented Generation with Dynamic VQA Dataset and Self-adaptive Planning Agent

Yangning Li, Yinghui Li, Xinyu Wang et al.

ICLR 2025 poster arXiv:2411.02937
54 citations
#394

Wonderland: Navigating 3D Scenes from a Single Image

Hanwen Liang, Junli Cao, Vidit Goel et al.

CVPR 2025 poster arXiv:2412.12091
54 citations
#395

RLAIF-V: Open-Source AI Feedback Leads to Super GPT-4V Trustworthiness

Tianyu Yu, Haoye Zhang, Qiming Li et al.

CVPR 2025 highlight arXiv:2405.17220
54 citations
#396

MindSearch: Mimicking Human Minds Elicits Deep AI Searcher

Zehui Chen, Kuikun Liu, Qiuchen Wang et al.

ICLR 2025 poster arXiv:2407.20183
53 citations
#397

Proteina: Scaling Flow-based Protein Structure Generative Models

Tomas Geffner, Kieran Didi, Zuobai Zhang et al.

ICLR 2025 poster arXiv:2503.00710
53 citations
#398

Goku: Flow Based Video Generative Foundation Models

Shoufa Chen, Chongjian Ge, Yuqi Zhang et al.

CVPR 2025 highlight arXiv:2502.04896
53 citations
#399

Scaling Computer-Use Grounding via User Interface Decomposition and Synthesis

Tianbao Xie, Jiaqi Deng, Xiaochuan Li et al.

NEURIPS 2025 spotlight arXiv:2505.13227
53 citations
#400

Tell me about yourself: LLMs are aware of their learned behaviors

Jan Betley, Xuchan Bao, Martín Soto et al.

ICLR 2025 oral arXiv:2501.11120
53 citations